Ninad Raut wrote:
OSGi provides navigability to your components and creates a life cycle for
each of those components, viz. install, start, stop, undeploy, etc.
This is the reason why we are thinking of creating components using OSGi.
The problem we are facing is that our components use MapReduce and HDFS, and
the OSGi container cannot detect the Hadoop MapReduce engine or HDFS.

I have searched the net, and it looks like people are working on, or have
succeeded in, running Hadoop in an OSGi container...

Ninad


1. I am working on a simple lifecycle for the services (start/stop/ping) which is not OSGi — OSGi worries a lot about classloading and versioning; check out HADOOP-3628 for this.

2. You can run it under OSGi systems, such as the OSGi branch of SmartFrog (http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/branches/core-branch-osgi/), or under non-OSGi tools. Either way, those tools are left dealing with classloading and the like.

3. Any container is going to have to deal with the fact that bits of all the services call System.exit(); the usual fix is running under a security manager, trapping the call, and raising an exception instead.
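A minimal sketch of that trick, in plain Java (my own illustration, not Hadoop or SmartFrog code): override checkExit() so a hosted service's System.exit() raises an exception the container can catch. Note that SecurityManager is deprecated in recent JDKs.

```java
// Sketch: trap System.exit() from hosted code by supplying a
// SecurityManager whose checkExit() throws instead of letting the JVM die.
public class ExitTrap {

    /** Thrown in place of JVM shutdown when hosted code calls System.exit(). */
    static class ExitTrappedException extends SecurityException {
        final int status;
        ExitTrappedException(int status) {
            super("System.exit(" + status + ") trapped");
            this.status = status;
        }
    }

    /** A manager that vetoes exit but permits everything else. */
    static SecurityManager exitTrappingManager() {
        return new SecurityManager() {
            @Override
            public void checkExit(int status) {
                throw new ExitTrappedException(status);
            }
            @Override
            public void checkPermission(java.security.Permission perm) {
                // Permit everything else; a real container would be stricter.
            }
        };
    }

    public static void main(String[] args) {
        SecurityManager sm = exitTrappingManager();
        // A container would install this via System.setSecurityManager(sm)
        // (deprecated and restricted on newer JDKs) before starting services.
        try {
            sm.checkExit(42); // what the JVM would invoke on System.exit(42)
        } catch (ExitTrappedException e) {
            System.out.println("trapped: " + e.getMessage());
        }
    }
}
```

As point 4 below notes, this only works while your manager doesn't have to coexist with Hadoop's own security-policy expectations.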

4. Any container then has to deal with the fact that, from 0.20 onwards, Hadoop does things with security policy that are incompatible with normal Java security managers: whatever security manager you use for trapping system exits can't extend the default one.

5. Any container also has to deal with the fact that every service (namenode, job tracker, etc.) makes a lot of assumptions about singletons — that it has exclusive use of filesystem objects retrieved through FileSystem.get(), and the like. While OSGi can handle that with its classloading work, it's still fairly complex.
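To see why the singleton cache is a problem, here is a toy model (my own sketch, not Hadoop's actual FileSystem code) of the semantics: callers asking for the same URI share one cached instance, so two services co-hosted in one JVM can break each other through it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a per-JVM filesystem cache with FileSystem.get()-like
// semantics: same URI => same shared instance across all callers.
public class SharedFsDemo {

    /** Stand-in for a client-side filesystem handle. */
    static class Fs {
        private boolean closed = false;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    private static final Map<String, Fs> CACHE = new HashMap<>();

    /** Returns the cached instance for this URI, creating it on first use. */
    static synchronized Fs get(String uri) {
        return CACHE.computeIfAbsent(uri, u -> new Fs());
    }

    public static void main(String[] args) {
        Fs a = get("hdfs://cluster/"); // "namenode" service in this JVM
        Fs b = get("hdfs://cluster/"); // "job tracker" service, same JVM
        System.out.println(a == b);    // true: one shared instance
        a.close();                     // service A shuts down "its" handle...
        System.out.println(b.isClosed()); // true: ...and breaks service B
    }
}
```

Per-bundle classloaders can give each service its own copy of the cache, which is exactly the OSGi classloading work referred to above.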

6. There are also lots of JVM memory/thread management issues; see the various Hadoop bugs.

If you look at the slides on what I've been up to, you can see that it can be done:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/hadoop/doc/dynamic_hadoop_clusters.ppt

However,
* You really need to run every service in its own process, for memory and reliability alone
* It's pretty leading edge
* You will have to invest the time and effort to get it working

If you want to do the work, start with what I've been doing and bring it up under the OSGi container of your choice. You can come and play with our tooling; I'm cutting a release today of this week's Hadoop trunk merged with my branch. It is, of course, experimental, as even the trunk is a bit up-and-down on feature stability.

-steve
