Right, that would surely be incompatible. The initial work I did was on 1.0.3, and those problems can be solved in a simpler (though less clean) way in that branch, mainly because there is a single jar which contains everything, which causes fewer problems in OSGi.
For trunk, is there any valid reason to create multiple configurations? Or is the idea of a singleton something I could investigate? I'm not very familiar with Hadoop internals, so I may very well be missing some edge cases. If not, I can come up with a patch that would turn Configuration into a singleton, giving more flexibility for OSGi and a performance improvement by avoiding re-parsing the XML configuration multiple times.

On Mon, Jul 9, 2012 at 4:37 PM, Robert Evans <ev...@yahoo-inc.com> wrote:

> Guillaume,
>
> I am not super familiar with OSGi. I have used it a little in the past,
> but that was 5+ years ago. I am in favor of something that will fix the
> CLASSPATH problems that we currently have and would allow for CLASSPATH
> isolation between Hadoop itself and the applications that use Hadoop. If
> OSGi can do this cleanly then I am +1 for moving to OSGi.
>
> However, we are trying to maintain binary compatibility within major
> version numbers, in preparation for rolling upgrades. Many of the things
> you have suggested, like moving classes from one package to another and
> doing some serious rework to Configuration, will break not only binary
> compatibility but also API compatibility.
>
> If we do go this route, just be aware that it is most likely something
> that would have to force a major version bump, which right now means
> trunk (the 3.0 line).
>
> --Bobby Evans
>
> On 7/9/12 8:24 AM, "Guillaume Nodet" <gno...@gmail.com> wrote:
>
> >I'm working with Jean-Baptiste to make hadoop work in OSGi.
> >OSGi works with classloaders in a very specific way which leads to
> >several problems with hadoop.
> >
> >Let me quickly explain how OSGi works. In OSGi, you deploy bundles,
> >which are jars with additional OSGi metadata. This metadata is used by
> >the OSGi framework to create a classloader for the bundle. However, the
> >classloaders are not organized in a tree like in a JEE environment, but
> >rather in some kind of graph, where each classloader has limited
> >visibility and limited exposure. This is controlled at the package
> >level by specifying which packages are exported and which packages are
> >imported by a given bundle. This has mainly two consequences:
> > * OSGi does not support split packages well, where the same package
> >is exported by two different bundles
> > * a classloader does not have visibility on everything as in a usual
> >flat classloader environment or even a JEE-like env
> >
> >The first problem arises for example with the org.apache.hadoop.fs
> >package, which is split across the hadoop-common and hadoop-hdfs jars
> >(the latter defines the Hdfs class). There may be other cases, but I
> >haven't hit them yet. To solve this problem, it'd be better if such
> >classes were moved into a different package.
> >
> >The second problem is much more complicated. I think most of the
> >classloading is done from Configuration. However, Configuration has an
> >internal classloader which is set by the constructor to the thread
> >context classloader (defaulting to the Configuration class'
> >classloader) and new Configuration objects are created everywhere in
> >the code.
> >In addition, creating new Configuration objects forces the parsing of
> >the configuration files several times.
> >Also in OSGi, configuration is better done through the standard OSGi
> >ConfigurationAdmin service, so it would be nice to integrate the
> >configuration into ConfigAdmin when running in OSGi.
> >For the above reasons, I'd like to know what you would think of
> >transforming the Configuration object into a real singleton, or at
> >least replacing the "new Configuration()" calls spread everywhere with
> >access to a singleton Configuration.getInstance().
> >This would allow the hadoop osgi layer to manage the Configuration in a
> >more osgi friendly way, allowing the use of a specific subclass which
> >could better manage the class loading in an OSGi environment and
> >integrate with ConfigAdmin. This may also remove the need for keeping
> >a registry of existing Configuration objects and having to update them
> >when a default resource is added, for example.
> >
> >Some of the above problems have been addressed in some way in
> >HADOOP-7977, but the fixes I've been working on were more related to
> >the hadoop 1.0.x branch, and are mostly inapplicable to trunk.
> >
> >One last point: the two above problems are mainly due to the fact that
> >I've been assuming that individual hadoop jars are transformed into
> >native bundles. This would go away if we had a single bundle
> >containing all the individual jars (as it was with hadoop-core-1.0.x),
> >but having more fine grained jars is better imho.
> >
> >Thoughts welcomed.
> >
> >--
> >------------------------
> >Guillaume Nodet
> >------------------------
> >Blog: http://gnodet.blogspot.com/
> >------------------------
> >FuseSource, Integration everywhere
> >http://fusesource.com
>

--
------------------------
Guillaume Nodet
------------------------
Blog: http://gnodet.blogspot.com/
------------------------
FuseSource, Integration everywhere
http://fusesource.com
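
As a rough illustration of the singleton access discussed above, here is a minimal Java sketch. It is not the real Hadoop Configuration class and not an existing patch: SharedConfig, setInstance, PARSE_COUNT and the simulated resource loading are all hypothetical names chosen for the example, and the single key/value pair is only placeholder data. It shows the configuration being parsed once, the shared instance being handed out on every call, and a hook through which an OSGi layer could install its own subclass.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Hadoop's Configuration (assumption: not the real class).
class SharedConfig {
    // Counts how many times the (simulated) configuration resources are parsed.
    static final AtomicInteger PARSE_COUNT = new AtomicInteger();

    private static volatile SharedConfig instance;

    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final ClassLoader classLoader;

    // Protected so that an environment-specific subclass can supply its own classloader.
    protected SharedConfig(ClassLoader classLoader) {
        this.classLoader = classLoader;
        loadResources();
    }

    // Simulates parsing the default resources; a real version would read the XML files.
    private void loadResources() {
        PARSE_COUNT.incrementAndGet();
        values.put("fs.default.name", "file:///"); // placeholder entry for the example
    }

    // Lazily creates the shared instance (double-checked locking on a volatile field).
    static SharedConfig getInstance() {
        SharedConfig local = instance;
        if (local == null) {
            synchronized (SharedConfig.class) {
                local = instance;
                if (local == null) {
                    local = new SharedConfig(SharedConfig.class.getClassLoader());
                    instance = local;
                }
            }
        }
        return local;
    }

    // Hypothetical hook: lets an OSGi layer install its own subclass as the shared instance.
    static void setInstance(SharedConfig custom) {
        synchronized (SharedConfig.class) {
            instance = custom;
        }
    }

    String get(String key) {
        return values.get(key);
    }

    void set(String key, String value) {
        values.put(key, value);
    }

    // Class lookups go through the classloader chosen when the instance was built.
    Class<?> getClassByName(String name) throws ClassNotFoundException {
        return Class.forName(name, true, classLoader);
    }

    public static void main(String[] args) throws Exception {
        SharedConfig a = SharedConfig.getInstance();
        SharedConfig b = SharedConfig.getInstance();
        System.out.println("same instance: " + (a == b));
        System.out.println("parse count: " + PARSE_COUNT.get());
        System.out.println("loaded: " + a.getClassByName("java.lang.String").getName());
    }
}
```

One consequence visible in the sketch: because every caller shares one object, a value changed through set() by one caller is seen by all others, which is the kind of edge case that would need checking before replacing "new Configuration()" everywhere.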