Ninad Raut wrote:
OSGi provides manageability for your components and a lifecycle for
each of them, viz. install, start, stop, undeploy, etc.
This is the reason why we are thinking of creating components using OSGi.
The problem we are facing is that our components use MapReduce and HDFS,
and the OSGi container cannot detect the Hadoop MapReduce engine or HDFS.
I have searched the net, and it looks like people are working on, or have
succeeded in, running Hadoop in an OSGi container...
Ninad
1. I am doing work on a simple lifecycle for the services
(start/stop/ping) which is not OSGi; OSGi worries a lot about
classloading and versioning. Check out HADOOP-3628 for this.
2. You can run it under OSGi systems, such as the OSGi branch of
SmartFrog:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/branches/core-branch-osgi/,
or under non-OSGi tools. Either way, these tools are left dealing with
classloading and the like.
3. Any container is going to have to deal with the problem that bits of
all the services call System.exit(); that means running under a security
manager, trapping the call, raising an exception instead, etc.
4. Any container then has to deal with the fact that from 0.20 onwards,
Hadoop does things with security policy that are incompatible with
normal Java security managers: whatever security manager you use for
trapping system exits can't extend the default one.
5. Any container also has to deal with the fact that every service
(namenode, job tracker, etc.) makes a lot of assumptions about
singletons: that it has exclusive use of filesystem objects retrieved
through FileSystem.get(), and the like. While OSGi can handle that with
its classloading work, it's still fairly complex.
6. There are also lots of JVM memory/thread management issues; see the
various Hadoop bugs.
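The simple lifecycle in point 1 can be sketched in plain Java. The Service interface and DummyService class here are illustrative names I've made up, not code from the Hadoop branch:

```java
// Minimal sketch of a start/stop/ping service lifecycle.
// Names are illustrative, not from the actual Hadoop work.
interface Service {
    void start() throws Exception;  // bring the service up
    void stop();                    // shut it down
    boolean ping();                 // liveness check
}

class DummyService implements Service {
    private volatile boolean live;

    public void start()   { live = true; }
    public void stop()    { live = false; }
    public boolean ping() { return live; }
}

public class LifecycleDemo {
    public static void main(String[] args) throws Exception {
        Service s = new DummyService();
        s.start();
        System.out.println("alive: " + s.ping());  // alive: true
        s.stop();
        System.out.println("alive: " + s.ping());  // alive: false
    }
}
```

A container that understands this interface can walk every service through the same install/start/stop sequence, which is all the OSGi lifecycle question really needs.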
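The exit-trapping in point 3 looks roughly like this on older JVMs; all names here are invented for illustration. Note that modern JDKs (17 and later) disallow installing a security manager by default, so the sketch falls back gracefully rather than assuming it worked:

```java
public class ExitTrap {
    /** Thrown instead of letting the JVM die. Illustrative name. */
    static class ExitTrappedException extends SecurityException {
        final int status;
        ExitTrappedException(int status) {
            super("System.exit(" + status + ") trapped");
            this.status = status;
        }
    }

    /**
     * Try to install a manager that turns System.exit() into an exception.
     * Returns false on JVMs where the security manager is disallowed
     * (the default from JDK 17 onwards).
     */
    static boolean installExitTrap() {
        try {
            System.setSecurityManager(new SecurityManager() {
                @Override public void checkExit(int status) {
                    throw new ExitTrappedException(status);
                }
                // Permit everything else; a real container needs finer policy.
                @Override public void checkPermission(java.security.Permission p) { }
                @Override public void checkPermission(java.security.Permission p,
                                                      Object ctx) { }
            });
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (!installExitTrap()) {
            System.out.println("security manager disallowed on this JVM");
            return;
        }
        try {
            System.exit(42);  // would normally kill the process
        } catch (ExitTrappedException e) {
            System.out.println("trapped exit status " + e.status);
        }
    }
}
```

The permissive checkPermission override is exactly where point 4 bites: Hadoop 0.20's own security policy work means you can't simply delegate to the default manager instead.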
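The singleton hazard in point 5 can be illustrated without Hadoop itself. This toy cache mimics the way FileSystem.get() hands every caller the same shared instance, so one service closing it breaks every other holder; CachedConn and its methods are invented for the sketch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustration only: a toy JVM-wide cache of shared connections,
// standing in for the FileSystem.get() cache. Names are invented.
class CachedConn {
    private static final Map<String, CachedConn> CACHE = new ConcurrentHashMap<>();
    private volatile boolean open = true;

    static CachedConn get(String uri) {
        // every caller asking for the same URI gets the same object
        return CACHE.computeIfAbsent(uri, u -> new CachedConn());
    }

    void close() { open = false; }  // closes it for *every* holder

    String read() {
        if (!open) throw new IllegalStateException("connection closed");
        return "data";
    }
}

public class SingletonHazard {
    public static void main(String[] args) {
        CachedConn a = CachedConn.get("hdfs://nn:8020");  // "service A"
        CachedConn b = CachedConn.get("hdfs://nn:8020");  // "service B"
        System.out.println(a == b);  // true: same cached object
        a.close();                   // service A shuts down...
        try {
            b.read();                // ...and service B breaks too
        } catch (IllegalStateException e) {
            System.out.println("B failed: " + e.getMessage());
        }
    }
}
```

Hosting several services in one JVM means either giving each its own classloader-scoped cache (the OSGi route) or its own process, which is one reason for the one-process-per-service advice below.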
If you look at the slides of what I've been up to, you can see that it
can be done:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/hadoop/doc/dynamic_hadoop_clusters.ppt
However,
* You really need to run every service in its own process, for memory
and reliability reasons alone
* It's pretty leading-edge
* You will have to invest the time and effort to get it working
If you want to do the work, start with what I've been doing and bring it
up under the OSGi container of your choice. You can come and play with
our tooling; I'm cutting a release today of this week's Hadoop trunk
merged with my branch. It is of course experimental, as even the trunk
is a bit up-and-down on feature stability.
-steve