Right, that would surely be incompatible.  The initial work I did was on
1.0.3, and those problems can be solved in a simpler (though less clean)
way in that branch, mainly because there is a single jar which contains
everything, which causes fewer problems in OSGi.

For trunk, is there any valid reason to create multiple configurations?  Or
is the idea of a singleton something I could investigate?  I'm not very
familiar with Hadoop internals, so I may very well be missing some edge
cases.  If not, I can come up with a patch that would turn Configuration
into a singleton, giving more flexibility for OSGi and a performance
improvement by avoiding parsing the XML configuration multiple times.
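
To make the idea concrete, here is a minimal sketch of what singleton access
could look like.  All names are hypothetical: SharedConfig is a stand-in and
not Hadoop's Configuration class, "example.key" is a placeholder, and the
parse counter only simulates the cost of reading the XML resources.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Configuration; not the real Hadoop class.
class SharedConfig {
    // Counts how often the (simulated) expensive XML parsing runs.
    static final AtomicInteger parseCount = new AtomicInteger();

    private static volatile SharedConfig instance;

    private final Map<String, String> props = new ConcurrentHashMap<>();

    // Protected so that an OSGi-aware subclass could be installed instead.
    protected SharedConfig() {
        parseCount.incrementAndGet();
        // Placeholder for the values that parsing the resources would yield.
        props.put("example.key", "example-value");
    }

    // Lazily creates the shared instance once (double-checked locking).
    static SharedConfig getInstance() {
        SharedConfig local = instance;
        if (local == null) {
            synchronized (SharedConfig.class) {
                local = instance;
                if (local == null) {
                    local = new SharedConfig();
                    instance = local;
                }
            }
        }
        return local;
    }

    // Lets an integration layer (e.g. an OSGi bundle) swap in its own subclass.
    static void setInstance(SharedConfig replacement) {
        synchronized (SharedConfig.class) {
            instance = replacement;
        }
    }

    String get(String key) {
        return props.get(key);
    }
}
```

Callers would then use SharedConfig.getInstance() where they currently write
"new Configuration()", so the simulated parsing runs once, and an OSGi layer
could call setInstance() with a subclass that handles class loading itself.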

On Mon, Jul 9, 2012 at 4:37 PM, Robert Evans <ev...@yahoo-inc.com> wrote:

> Guillaume,
>
> I am not super familiar with OSGi.  I have used it a little in the past,
> but that was 5+ years ago.  I am in favor of something that will fix the
> CLASSPATH problems that we currently have and would allow for CLASSPATH
> isolation between Hadoop itself and the applications that use Hadoop.  If
> OSGi can do this cleanly then I am +1 for moving to OSGi.
>
> However, we are trying to maintain binary compatibility within major
> version numbers, in preparation for rolling upgrades.  Many of the things
> you have suggested like moving classes from one package to another, and
> doing some serious rework to Configuration will break not only binary
> compatibility but also API compatibility.
>
> If we do go this route, just be aware that it is most likely something that
> would have to force a major version bump, which right now means trunk (the
> 3.0 line).
>
> --Bobby Evans
>
> On 7/9/12 8:24 AM, "Guillaume Nodet" <gno...@gmail.com> wrote:
>
> >I'm working with Jean-Baptiste to make Hadoop work in OSGi.
> >OSGi uses classloaders in a very specific way, which leads to several
> >problems with Hadoop.
> >
> >Let me quickly explain how OSGi works.  In OSGi, you deploy bundles, which
> >are jars with additional OSGi metadata.  This metadata is used by the OSGi
> >framework to create a classloader for the bundle.  However, the
> >classloaders are not organized in a tree as in a JEE environment, but
> >rather in a kind of graph, where each classloader has limited visibility
> >and limited exposure.  This is controlled at the package level by
> >specifying which packages are exported and which packages are imported by
> >a given bundle.  This has two main consequences:
> >  * OSGi does not handle split packages well, i.e. cases where the same
> >package is exported by two different bundles
> >  * a classloader cannot see everything, unlike in a usual flat
> >classloader environment or even a JEE-like environment
> >
> >The first problem arises, for example, with the org.apache.hadoop.fs
> >package, which is split across the hadoop-common and hadoop-hdfs jars (the
> >latter defines the Hdfs class).  There may be other cases, but I haven't
> >hit them yet.  To solve this problem, it would be better if such classes
> >were moved into a different package.
> >
> >The second problem is much more complicated.  I think most of the
> >classloading is done from Configuration.  However, Configuration has an
> >internal classloader which is set by the constructor to the thread context
> >classloader (defaulting to the Configuration class' classloader), and new
> >Configuration objects are created everywhere in the code.
> >In addition, creating new Configuration objects forces the configuration
> >files to be parsed several times.
> >Also, in OSGi, configuration is better done through the standard OSGi
> >ConfigurationAdmin service, so it would be nice to integrate the
> >configuration into ConfigAdmin when running in OSGi.
> >For the above reasons, I'd like to know what you would think of
> >transforming the Configuration object into a real singleton, or at least
> >replacing the "new Configuration()" calls spread everywhere with access
> >to a singleton via Configuration.getInstance().
> >This would allow the Hadoop OSGi layer to manage the Configuration in a
> >more OSGi-friendly way, allowing the use of a specific subclass which
> >could better manage class loading in an OSGi environment and integrate
> >with ConfigAdmin.  This may also remove the need to keep a registry of
> >existing Configuration objects and to update them when, for example, a
> >default resource is added.
> >
> >Some of the above problems have been addressed in some way in HADOOP-7977,
> >but the fixes I've been working on were aimed at the hadoop 1.0.x branch
> >and do not apply cleanly to trunk.
> >
> >One last point: the two problems above are mainly due to the fact that
> >I've been assuming that individual hadoop jars are transformed into native
> >bundles.  They would go away if we had a single bundle containing all
> >the individual jars (as it was with hadoop-core-1.0.x), but having more
> >fine-grained jars is better imho.
> >
> >Thoughts welcome.
> >
> >--
> >------------------------
> >Guillaume Nodet
> >------------------------
> >Blog: http://gnodet.blogspot.com/
> >------------------------
> >FuseSource, Integration everywhere
> >http://fusesource.com
>
>


-- 
------------------------
Guillaume Nodet
------------------------
Blog: http://gnodet.blogspot.com/
------------------------
FuseSource, Integration everywhere
http://fusesource.com
