DB Tsai, I do not think userClassPathFirst is working unless the classes
you load don't reference any classes already loaded by the parent
classloader (a mostly hypothetical situation). I filed a JIRA for this
here:
https://issues.apache.org/jira/browse/SPARK-1863
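One way this kind of breakage can show up, as a minimal sketch rather than
necessarily the exact failure tracked in SPARK-1863 (the jar and class names
below are made up): a child-first loader that defines its own copy of a class
the parent classloader has already loaded produces a second, incompatible
runtime type.

    import java.io.File
    import java.net.URLClassLoader

    object ChildFirstPitfall {
      def main(args: Array[String]): Unit = {
        // Hypothetical: "user.jar" bundles its own copy of
        // com.example.SharedType, a class the application classloader has
        // already loaded. Child-first taken to the extreme: no parent at
        // all, so this loader always uses its own copy of what it finds.
        val childFirst = new URLClassLoader(
          Array(new File("user.jar").toURI.toURL), null)

        val childCopy  = childFirst.loadClass("com.example.SharedType")
        val parentCopy = Class.forName("com.example.SharedType")

        // Same name, different defining loaders => different classes at
        // runtime, so any cast or call that crosses the boundary fails
        // with ClassCastException / LinkageError.
        println(childCopy == parentCopy) // false
      }
    }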



On Tue, May 20, 2014 at 1:04 AM, DB Tsai <dbt...@stanford.edu> wrote:

> In 1.0 there is a new option, spark.files.userClassPathFirst, that lets
> users choose which classloader has higher priority, so I decided to submit
> the PR for 0.9 first. We use this patch in our lab, and with it we can use
> the jars added by sc.addJar without reflection.
>
> https://github.com/apache/spark/pull/834
>
> Can anyone comment on whether it's a good approach?
>
> Thanks.
>
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Mon, May 19, 2014 at 7:42 PM, DB Tsai <dbt...@stanford.edu> wrote:
>
> > Good summary! We fixed it in branch 0.9 since our production is still on
> > 0.9. I'm porting it to 1.0 now and hopefully will submit a PR for 1.0
> > tonight.
> >
> >
> > Sincerely,
> >
> > DB Tsai
> > -------------------------------------------------------
> > My Blog: https://www.dbtsai.com
> > LinkedIn: https://www.linkedin.com/in/dbtsai
> >
> >
> > On Mon, May 19, 2014 at 7:38 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> >
> >> It just hit me why this problem is showing up on YARN and not on
> >> standalone.
> >>
> >> The relevant difference between YARN and standalone is that, on YARN,
> >> the app jar is loaded by the system classloader instead of Spark's
> >> custom URL classloader.
> >>
> >> On YARN, the system classloader knows about [the classes in the spark
> >> jars, the classes in the primary app jar]. The custom classloader knows
> >> about [the classes in secondary app jars] and has the system
> >> classloader as its parent.
> >>
> >> A few relevant facts (mostly redundant with what Sean pointed out):
> >> * Every class has a classloader that loaded it.
> >> * When an object of class B is instantiated inside of class A, the
> >> classloader used for loading B is the classloader that was used for
> >> loading A.
> >> * When a classloader fails to load a class, it lets its parent
> >> classloader try. If its parent succeeds, its parent becomes the
> >> "classloader that loaded it".
> >>
> >> So suppose class B is in a secondary app jar and class A is in the
> >> primary app jar:
> >> 1. The custom classloader will try to load class A.
> >> 2. It will fail, because it only knows about the secondary jars.
> >> 3. It will delegate to its parent, the system classloader.
> >> 4. The system classloader will succeed, because it knows about the
> >> primary app jar.
> >> 5. A's classloader will be the system classloader.
> >> 6. A tries to instantiate an instance of class B.
> >> 7. B will be loaded with A's classloader, which is the system
> >> classloader.
> >> 8. Loading B will fail, because A's classloader, which is the system
> >> classloader, doesn't know about the secondary app jars.
> >>
> >> In Spark standalone, A and B are both loaded by the custom
> >> classloader, so this issue doesn't come up.
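For anyone who wants to watch the delegation play out outside of Spark, here
is a minimal sketch of the scenario above; the jar names, and the assumption
that A sits on the system classpath while B lives only in secondary.jar, are
hypothetical:

    import java.io.File
    import java.net.URLClassLoader

    object DelegationDemo {
      def main(args: Array[String]): Unit = {
        // The "custom" loader only knows about the secondary jar; its parent
        // is the system classloader, which (in the YARN case) already has
        // the primary app jar on its classpath.
        val custom = new URLClassLoader(
          Array(new File("secondary.jar").toURI.toURL),
          ClassLoader.getSystemClassLoader)

        // Steps 1-5: A is not in secondary.jar, so the lookup delegates to
        // the parent; A's defining classloader ends up being the system
        // classloader.
        val a = custom.loadClass("A")
        println(a.getClassLoader) // the system classloader

        // Steps 6-8: when code inside A runs "new B()", B is resolved
        // through A's defining loader (the system loader), which cannot see
        // secondary.jar, so this blows up with NoClassDefFoundError.
        a.getDeclaredConstructor().newInstance()
      }
    }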
> >>
> >> -Sandy
> >>
> >> On Mon, May 19, 2014 at 7:07 PM, Patrick Wendell <pwend...@gmail.com>
> >> wrote:
> >>
> >> > Having a user define a custom class inside of an added jar and
> >> > instantiate it directly inside of an executor is definitely supported
> >> > in Spark and has been for a really long time (several years). This is
> >> > something we do all the time in Spark.
> >> >
> >> > DB - I'd hold off on re-architecting this until we identify
> >> > exactly what is causing the bug you are running into.
> >> >
> >> > In a nutshell, when the bytecode "new Foo()" is run on the executor,
> >> > it will ask the driver for the class over HTTP using a custom
> >> > classloader. Something in that pipeline is breaking here, possibly
> >> > related to the YARN deployment stuff.
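To make the HTTP part concrete (a rough sketch only, not Spark's actual
executor code; the host, port, jar path, and class name are all made up):
URLClassLoader accepts http:// URLs, so a loader pointed at a jar served by
the driver fetches class bytes over the network on demand.

    import java.net.{URL, URLClassLoader}

    object HttpClassFetchSketch {
      def main(args: Array[String]): Unit = {
        // A jar served over HTTP by the driver (hypothetical address).
        val driverJar = new URL("http://driver-host:12345/jars/app.jar")
        val executorLoader =
          new URLClassLoader(Array(driverJar), getClass.getClassLoader)

        // When task bytecode runs "new Foo()", the JVM asks the defining
        // loader for Foo; if that chain includes this loader, the class
        // bytes are pulled over HTTP from the driver.
        val foo = executorLoader.loadClass("Foo")
        println(foo.getName)
      }
    }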
> >> >
> >> >
> >> > On Mon, May 19, 2014 at 12:29 AM, Sean Owen <so...@cloudera.com> wrote:
> >> > > I don't think a custom classloader is necessary.
> >> > >
> >> > > Well, it occurs to me that this is no new problem. Hadoop, Tomcat,
> >> > > etc. all run custom user code that creates new user objects without
> >> > > reflection. I should go see how that's done. Maybe it's totally valid
> >> > > to set the thread's context classloader for just this purpose, and I
> >> > > am not thinking clearly.
> >> > >
> >> > > On Mon, May 19, 2014 at 8:26 AM, Andrew Ash <and...@andrewash.com> wrote:
> >> > >> Sounds like the problem is that classloaders always look in their
> >> > >> parents before themselves, and Spark users want executors to pick up
> >> > >> classes from their custom code before the ones in Spark plus its
> >> > >> dependencies.
> >> > >>
> >> > >> Would a custom classloader that delegates to the parent after first
> >> > >> checking itself fix this up?
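A minimal sketch of the kind of "child-first" loader Andrew is describing
(not Spark's implementation, and it skips details a production version would
need, such as always delegating java.* classes and synchronizing on the
class-loading lock):

    import java.net.{URL, URLClassLoader}

    class ChildFirstClassLoader(urls: Array[URL], parent: ClassLoader)
        extends URLClassLoader(urls, parent) {

      override def loadClass(name: String, resolve: Boolean): Class[_] = {
        // Reuse a class this loader has already defined, if any.
        val alreadyLoaded = findLoadedClass(name)
        val clazz =
          if (alreadyLoaded != null) alreadyLoaded
          else {
            try {
              // Check our own URLs first...
              findClass(name)
            } catch {
              // ...and only fall back to the usual parent-first path if we
              // don't have the class ourselves.
              case _: ClassNotFoundException => super.loadClass(name, resolve)
            }
          }
        if (resolve) resolveClass(clazz)
        clazz
      }
    }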
> >> > >>
> >> > >>
> >> > >> On Mon, May 19, 2014 at 12:17 AM, DB Tsai <dbt...@stanford.edu> wrote:
> >> > >>
> >> > >>> Hi Sean,
> >> > >>>
> >> > >>> It's true that the issue here is the classloader, and due to the
> >> > >>> classloader delegation model, users have to use reflection in the
> >> > >>> executors to pick up the classloader in order to use those classes
> >> > >>> added by the sc.addJar API. However, it's very inconvenient for
> >> > >>> users, and not documented in Spark.
> >> > >>>
> >> > >>> I'm working on a patch to solve it by calling the protected method
> >> > >>> addURL in URLClassLoader to update the current default classloader,
> >> > >>> so no custom classloader is needed anymore. I wonder if this is a
> >> > >>> good way to go.
> >> > >>>
> >> > >>>   import java.io.IOException
> >> > >>>   import java.lang.reflect.Method
> >> > >>>   import java.net.{URL, URLClassLoader}
> >> > >>>
> >> > >>>   // Reflectively invoke the protected URLClassLoader.addURL so a
> >> > >>>   // jar can be appended to an existing loader at runtime.
> >> > >>>   private def addURL(url: URL, loader: URLClassLoader): Unit = {
> >> > >>>     try {
> >> > >>>       val method: Method =
> >> > >>>         classOf[URLClassLoader].getDeclaredMethod("addURL", classOf[URL])
> >> > >>>       method.setAccessible(true)
> >> > >>>       method.invoke(loader, url)
> >> > >>>     } catch {
> >> > >>>       case t: Throwable =>
> >> > >>>         throw new IOException("Error, could not add URL to system classloader", t)
> >> > >>>     }
> >> > >>>   }
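A hypothetical call site for the sketch above, just to show the intent (the
jar name and the pattern match are illustrative only, not part of the patch):

    // Illustrative only: push a fetched jar into the loader that loaded this
    // class, assuming that loader happens to be a URLClassLoader.
    val fetchedJar = new java.io.File("fetched-dependency.jar").toURI.toURL
    getClass.getClassLoader match {
      case urlLoader: java.net.URLClassLoader => addURL(fetchedJar, urlLoader)
      case _ => // not a URLClassLoader, so the reflective trick does not apply
    }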
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> Sincerely,
> >> > >>>
> >> > >>> DB Tsai
> >> > >>> -------------------------------------------------------
> >> > >>> My Blog: https://www.dbtsai.com
> >> > >>> LinkedIn: https://www.linkedin.com/in/dbtsai
> >> > >>>
> >> > >>>
> >> > >>> On Sun, May 18, 2014 at 11:57 PM, Sean Owen <so...@cloudera.com> wrote:
> >> > >>>
> >> > >>> > I might be stating the obvious for everyone, but the issue here is
> >> > >>> > not reflection or the source of the JAR, but the ClassLoader. The
> >> > >>> > basic rules are this.
> >> > >>> >
> >> > >>> > "new Foo" will use the ClassLoader that defines Foo. This is
> >> > >>> > usually the ClassLoader that loaded whatever it is that first
> >> > >>> > referenced Foo and caused it to be loaded -- usually the
> >> > >>> > ClassLoader holding your other app classes.
> >> > >>> >
> >> > >>> > ClassLoaders can have a parent-child relationship. ClassLoaders
> >> > >>> > always look in their parent before themselves.
> >> > >>> >
> >> > >>> > (Careful then -- in contexts like Hadoop or Tomcat where your app
> >> > >>> > is loaded in a child ClassLoader, and you reference a class that
> >> > >>> > Hadoop or Tomcat also has (like a lib class) you will get the
> >> > >>> > container's version!)
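A small sketch of that caveat under made-up names (app-bundled.jar and
com.example.LibClass are hypothetical, and the system classloader here just
stands in for the container's loader):

    import java.io.File
    import java.net.URLClassLoader

    object ParentWinsDemo {
      def main(args: Array[String]): Unit = {
        // Pretend the system classloader is the container (Hadoop/Tomcat).
        val container = ClassLoader.getSystemClassLoader

        // The child loader's jar also bundles com.example.LibClass.
        val child = new URLClassLoader(
          Array(new File("app-bundled.jar").toURI.toURL), container)

        // The standard loadClass asks the parent first, so if the container
        // also has com.example.LibClass, the container's copy is what you get.
        val lib = child.loadClass("com.example.LibClass")
        println(lib.getClassLoader eq container)
      }
    }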
> >> > >>> >
> >> > >>> > When you load an external JAR it has a separate ClassLoader which
> >> > >>> > does not necessarily bear any relation to the one containing your
> >> > >>> > app classes, so yeah it is not generally going to make "new Foo"
> >> > >>> > work.
> >> > >>> >
> >> > >>> > Reflection lets you pick the ClassLoader, yes.
> >> > >>> >
> >> > >>> > I would not call setContextClassLoader.
> >> > >>> >
> >> > >>> > On Mon, May 19, 2014 at 12:00 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> >> > >>> > > I spoke with DB offline about this a little while ago and he
> >> > >>> > > confirmed that he was able to access the jar from the driver.
> >> > >>> > >
> >> > >>> > > The issue appears to be a general Java issue: you can't directly
> >> > >>> > > instantiate a class from a dynamically loaded jar.
> >> > >>> > >
> >> > >>> > > I reproduced it locally outside of Spark with:
> >> > >>> > > ---
> >> > >>> > >     URLClassLoader urlClassLoader = new URLClassLoader(
> >> > >>> > >         new URL[] { new File("myotherjar.jar").toURI().toURL() }, null);
> >> > >>> > >     Thread.currentThread().setContextClassLoader(urlClassLoader);
> >> > >>> > >     MyClassFromMyOtherJar obj = new MyClassFromMyOtherJar();
> >> > >>> > > ---
> >> > >>> > >
> >> > >>> > > I was able to load the class with reflection.
> >> > >>> >
> >> > >>>
> >> >
> >>
> >
> >
>
