Schlagwort-Archive: JavaEE

eXtended Objects: ORM for No-SQL Databases in JavaEE

Since the appearance of object relational mappers (ORM) the programming of persistence layers became easier and less error prone. ORMs store objects in SQL databases by interpreting the object’s fields as columns of a table to store them. With the standardization of the Java Persistence API the success was programmed and a lot of Java EE and also Java SE projects rely on ORM.

For about ten years now, a new class (or better new classes) of databases appear which were developed in large companies like Amazon, Google and Facebook to solve storage issues which cannot be handled by SQL databases efficiently. These databases are divided into the classes: Big Table Databases, Graph Databases, Key-Value Databases and Document Databases.

This article is not about the databases itself, but about another issue: ORM is for SQL databases and does not work very well with No-SQL databases. Additionally, the ORM implementations for Java are, as far as I know, all JDBC based, which is also not very handy when it comes to migration to other databases which do not have a JDBC like driver implementation. A new approach needs to be thought of and also some short comings need to be extended to overcome struggles still in ORM.

A new approach comes with a project called eXtended Objects. Version 0.4.0 of eXtended Objects was released on Github: https://github.com/buschmais/extended-objects and even it is a quite stable release and the first implementations for databases are out, it is worth mentioning it.

eXtended Objects in JavaEE

For a PureSol Technologies‘ project the Titan database (http://thinkaurelius.github.io/titan) is used and therefore, a Titan storage back end was developed (https://github.com/PureSolTechnologies/extended-objects-titan) for eXtended Objects to make it work with Titan. The current release  is version 0.1.1 and it is still based on eXtended Objects 0.3.0, but a new release may follow quickly for version 0.4.0.

Due to the release is hosted in Maven Central, a simple

is suitable to get eXtended Objects for Titan for Maven.

eXtended Objects is well suitable for JavaSE, but in JavaEE the current approach to get the mapper working is not suitable for that purpose we want to use @Inject to get a new xoManager. This can be done via a producer method (and also a destroyer method to close the xoManager). This may be another blog in future.

Supported Databases

As described in the features section already, several data store implementations are already out, in development or planned. One of the first implementation was done for another graph database to check the eXtended Objects SPI and to proof the concept.Due to the current usage the first  implementation was done for the Titan graph database which is provided by PureSol Technologies.

Other implementations are planned or already available in development state. Here is the current list of (known) projects:

  1. eXtended Objects for Neo4j by Buschmais GbR (stable): https://github.com/buschmais/extended-objects (included in eXtended Objects base library)
  2. eXtended Objects for JSON by Buschmais GbR (development): https://github.com/buschmais/extended-objects (included in eXtended Objects base library)
  3. eXtended Objects for Titan by PureSol Technologies  (development): https://github.com/PureSolTechnologies/extended-objects-titan (project page: http://opensource.puresol-technologies.com/xo-titan)
  4. eXtended Objects for Blueprints (development): https://github.com/BluWings/xo-tinkerpop-blueprints

Hadoop Client in WildFly – A Difficult Marriage

(This article was triggered by a question „Hadoop Jersey conflicts with Wildfly resteasy“ on StackOverflow, because I hit the same wall…)

For a current project, I evaluate the usage of Hadoop 2.7.1 for handling data. The current idea is to use Hadoop’s HDFS, HBase and Spark to handle bigger amount of data (1 TB range, no real Big Data).

The current demonstrator implementation uses Cassandra and Titan as databases. Due to some developments with Cassandra and Titan (Aurelius was aquired by DataStax.), the stack seems not to be future-proof. An alternative is to be evaluated.

The first goal is to use the Hadoop client

in WildFly 9.0.1. (The content of this artical should be also valid for WildFly >=8.1.0.) HDFS is to be used at first to store and retrieve raw files.

Setting up Hadoop in a pseudo distributed mode as it is described in „Hadoop: The Definitive Guide“ was a breeze. I was full of hope and added the dependency above to an EJB Maven module and wanted to use the client to connect to HDFS to store and retrieve single files. Here it is, where the problems started…

I cannot provide all stack traces and error messages anymore, but roughly, this is what happend (one after another; when I removed one obstacle, the next came up):

  • Duplicate providers for Weld where brought up as errors due to multiple providers in the Hadoop client. Several JARs are loaded as implicit bean archives, because JavaEE annotations are included. I did not expect that and it seems strange to have it in a client library which is mainly used in Java SE context.
  • The client dependency is not self contained. During compile time an issue arised due to missing libraries.
  • The client libraries contains depencencies which provide web applications. These applications are also loaded and WildFly try to initialize them, but fails due to missing libraries which are set to provided, but not included in WildFly (but maybe in Geronimo?). Again, I am puzzled, why something like that is packaged in a client library.
  • Due to providers delivered in sub-dependencies of Hadoop client, the JSON provider was switched from Jackson 2 (default since WildFly 8.1.0) back to Jackson 1 causing infinite recursions in trees I need to marshall into JSON, because the com.fasterxml.jackson.*  annotations were not recognized anymore and the org.codehaus.jackson.*  annotations were not provided.

The issues are manyfold and  is caused by a very strange, no to say messy packaging of Hadoop client.

Following are the solutions so far:

Broken implicte bean archives

Several JARs contain JavaEE annotations which leads to an implicit bean archive loading (see: http://weld.cdi-spec.org/documentation/#4). Implicit bean archive support needs to be switched off. For WildFly, it looks like this:

Change the Weld settings in WildFly’s standalone.conf from

to

This switches the implicit bean archive handling off. All libraries used for CDI need to have a /META-INF/beans.xml  file now. (After switching off the implicit archives, I found a lot of libraries with missing beans.xml  files.)

Missing dependencies

I added the following dependencies to fix the compile/linkage issues:

Services provided, but not working

After switching off the implicit bean archives and added new dependencies to get the project compilied, I run into issues during deployment. Mostly, the issues were missing runtime dependencies due to missing injection providers.

The first goal was to shut off all (hopefully) not needed stuff which was disturbing. I excluded the Hadoop MapReduce Client App and JobClient (no idea what these are for). Additionally, I excluded Jackson dependencies, beacause they are already provided in the WildFly container.

Broken JSON marshalling in RestEasy

After all the fixes above, the project compiled and deployed successfully. During test I found that JSON marshalling was broken due to infinite recursions I got during marshalling of my file trees. I drove me cracy to find out the issue. I was almost sure that WildFly 9 switched the default Jackson implementation back to Jackson 1, but I did not find any release note for that. After a long while and some good luck, I found a  YarnJacksonJaxbJsonProvider class which forces the container to use Jackson 1 instead of Jackson 2 messing up my application…

That was the final point to decide (maybe too late), that I need a kind of calvanic isolation. Hadoop client and WildFly need to talk through a proxy of some kind not sharing any dependencies except of one common interface.

Current Solution

I created now one Hadoop connector EAR archive which contains the above mentioned and fixed Hadoop client dependencies. Additionally, I create a Remote EJB and add it to the EAR which provides the proxy to use Hadoop. The proxy implements a Remote Interface which is also used by the client. The client performs a lookup on the remote interface of the EJB. That setup seems to work so far…

The only drawback in this scenario at the moment is, that I cannot use stream throug EJB, because streams cannot be serialized. I think about creating a REST interface for Hadoop, but I have no idea about the performance. Additionally, will the integration of HBase be as difficult as this!?

For the next versions, maybe a fix can be in place. I found a Jira ticket HDFS-2261 „AOP unit tests are not getting compiled or run“.

JavaEE: Arquillian Tests support Multi-WAR-EARs

Before version 1.0.2 Arquillian did not support EAR deployments which contained multiple WAR files. The issue was the selection of the WAR into which the arquillian artifacts are to be placed for testing. The trick until then was to remove all WAR files not needed and to leave only one WAR within the EAR.

Starting from version 1.0.2 Arquillian does support Multi-WAR-EAR-Deployments as described in http://arquillian.org/blog/2012/07/25/arquillian-core-1-0-2-Final.

If you have an EAR which is used via

you can select a WAR with the following lines:

That’s it. After this selection Arquillian stops complaining about multiple WAR files within the EAR and the selected WAR is enriched and tested.