After writing about my first impressions of Hadoop and its WildFly integration, I did some more research and work to bring Hadoop into Java EE on WildFly in a more elegant way.
After putting Hadoop into an EAR for isolation, the follow-up idea was to isolate the Hadoop and HBase clients in WildFly modules. Both clients work very well in a Java SE environment, so the idea was to use modules, which are not container managed (much like Java SE), to provide the functionality for use inside the container.
The resulting module.xml for the Hadoop client (so far) is:
<?xml version="1.0" encoding="UTF-8"?>
<module xmlns="urn:jboss:module:1.3" name="org.apache.hadoop.client">
    <resources>
        <resource-root path="htrace-core-3.1.0-incubating.jar" />
        <resource-root path="hadoop-hdfs-2.7.1.jar" />
        <resource-root path="commons-configuration-1.6.jar" />
        <resource-root path="hadoop-auth-2.7.1.jar" />
        <resource-root path="hadoop-common-2.7.1.jar" />
        <resource-root path="protobuf-java-2.5.0.jar" />
        <resource-root path="service-loader-resources" />
    </resources>
    <dependencies>
        <module name="sun.jdk" />
        <module name="javax.api" />
        <module name="javaee.api" />
        <module name="com.google.guava" />
        <module name="org.apache.commons.cli" />
        <module name="org.apache.commons.collections" />
        <module name="org.apache.commons.io" />
        <module name="org.apache.commons.lang" />
        <module name="org.apache.commons.logging" />
        <module name="org.slf4j" />
    </dependencies>
</module>
For the HBase client, the module.xml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<module xmlns="urn:jboss:module:1.3" name="org.apache.hbase.client">
    <resources>
        <resource-root path="phoenix-core-4.5.1-HBase-1.1.jar" />
        <resource-root path="hbase-client-1.1.1.jar" />
        <resource-root path="hbase-common-1.1.1.jar" />
        <resource-root path="hbase-server-1.1.1.jar" />
        <resource-root path="hbase-protocol-1.1.1.jar" />
        <resource-root path="zookeeper-3.4.6.jar" />
        <resource-root path="guava-16.0.1.jar" />
    </resources>
    <dependencies>
        <module name="sun.jdk" />
        <module name="io.netty" />
        <module name="javax.api" />
        <module name="javax.transaction.api" />
        <module name="org.antlr.antlr-runtime" slot="3.4"/>
        <module name="org.apache.commons.codec" />
        <module name="org.apache.commons.lang" />
        <module name="org.apache.commons.logging" />
        <module name="org.apache.log4j" />
        <module name="org.jboss.as.security" />
        <module name="org.joda.time" />
        <module name="org.slf4j" />
        <!-- Own modules... -->
        <module name="org.apache.hadoop.client" />
    </dependencies>
</module>
I cannot guarantee that the dependency lists of the modules above are complete, but they work for the use cases I have at hand.
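To actually use these modules from an application, the deployment has to declare a dependency on them. One way to do this is a jboss-deployment-structure.xml in the deployment (META-INF for an EAR, WEB-INF for a WAR). The following is only a minimal sketch, not the configuration used for this post: it assumes the two module names defined above, and the `services="import"` attribute is my assumption that the deployment should also see the service loader entries the Hadoop module ships in its service-loader-resources root.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<jboss-deployment-structure xmlns="urn:jboss:deployment-structure:1.2">
    <deployment>
        <dependencies>
            <!-- Module names match the module.xml files above. -->
            <!-- services="import" is an assumption: it exposes the modules'
                 META-INF/services entries to the deployment. -->
            <module name="org.apache.hadoop.client" services="import" />
            <module name="org.apache.hbase.client" services="import" />
        </dependencies>
    </deployment>
</jboss-deployment-structure>
```

With a descriptor like this, the Hadoop and HBase jars do not need to be packaged inside the EAR or WAR; the classes come from the modules instead. Whether both dependencies are needed depends on which client classes the application code touches directly.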