In 1.0, there is a new option for users to choose which classloader has
higher priority via spark.files.userClassPathFirst, I decided to submit the
PR for 0.9 first. We use this patch in our lab and we can use those jars
added by sc.addJar without reflection.

https://github.com/apache/spark/pull/834

Can anyone comment if it's a good approach?

Thanks.


Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Mon, May 19, 2014 at 7:42 PM, DB Tsai <dbt...@stanford.edu> wrote:

> Good summary! We fixed it in branch 0.9 since our production is still in
> 0.9. I'm porting it to 1.0 now, and hopefully will submit PR for 1.0
> tonight.
>
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Mon, May 19, 2014 at 7:38 PM, Sandy Ryza <sandy.r...@cloudera.com>wrote:
>
>> It just hit me why this problem is showing up on YARN and not on
>> standalone.
>>
>> The relevant difference between YARN and standalone is that, on YARN, the
>> app jar is loaded by the system classloader instead of Spark's custom URL
>> classloader.
>>
>> On YARN, the system classloader knows about [the classes in the spark
>> jars,
>> the classes in the primary app jar].   The custom classloader knows about
>> [the classes in secondary app jars] and has the system classloader as its
>> parent.
>>
>> A few relevant facts (mostly redundant with what Sean pointed out):
>> * Every class has a classloader that loaded it.
>> * When an object of class B is instantiated inside of class A, the
>> classloader used for loading B is the classloader that was used for
>> loading
>> A.
>> * When a classloader fails to load a class, it lets its parent classloader
>> try.  If its parent succeeds, its parent becomes the "classloader that
>> loaded it".
>>
>> So suppose class B is in a secondary app jar and class A is in the primary
>> app jar:
>> 1. The custom classloader will try to load class A.
>> 2. It will fail, because it only knows about the secondary jars.
>> 3. It will delegate to its parent, the system classloader.
>> 4. The system classloader will succeed, because it knows about the primary
>> app jar.
>> 5. A's classloader will be the system classloader.
>> 6. A tries to instantiate an instance of class B.
>> 7. B will be loaded with A's classloader, which is the system classloader.
>> 8. Loading B will fail, because A's classloader, which is the system
>> classloader, doesn't know about the secondary app jars.
>>
>> In Spark standalone, A and B are both loaded by the custom classloader, so
>> this issue doesn't come up.
>>
>> -Sandy
>>
>> On Mon, May 19, 2014 at 7:07 PM, Patrick Wendell <pwend...@gmail.com>
>> wrote:
>>
>> > Having a user add define a custom class inside of an added jar and
>> > instantiate it directly inside of an executor is definitely supported
>> > in Spark and has been for a really long time (several years). This is
>> > something we do all the time in Spark.
>> >
>> > DB - I'd hold off on a re-architecting of this until we identify
>> > exactly what is causing the bug you are running into.
>> >
>> > In a nutshell, when the bytecode "new Foo()" is run on the executor,
>> > it will ask the driver for the class over HTTP using a custom
>> > classloader. Something in that pipeline is breaking here, possibly
>> > related to the YARN deployment stuff.
>> >
>> >
>> > On Mon, May 19, 2014 at 12:29 AM, Sean Owen <so...@cloudera.com> wrote:
>> > > I don't think a customer classloader is necessary.
>> > >
>> > > Well, it occurs to me that this is no new problem. Hadoop, Tomcat, etc
>> > > all run custom user code that creates new user objects without
>> > > reflection. I should go see how that's done. Maybe it's totally valid
>> > > to set the thread's context classloader for just this purpose, and I
>> > > am not thinking clearly.
>> > >
>> > > On Mon, May 19, 2014 at 8:26 AM, Andrew Ash <and...@andrewash.com>
>> > wrote:
>> > >> Sounds like the problem is that classloaders always look in their
>> > parents
>> > >> before themselves, and Spark users want executors to pick up classes
>> > from
>> > >> their custom code before the ones in Spark plus its dependencies.
>> > >>
>> > >> Would a custom classloader that delegates to the parent after first
>> > >> checking itself fix this up?
>> > >>
>> > >>
>> > >> On Mon, May 19, 2014 at 12:17 AM, DB Tsai <dbt...@stanford.edu>
>> wrote:
>> > >>
>> > >>> Hi Sean,
>> > >>>
>> > >>> It's true that the issue here is classloader, and due to the
>> > classloader
>> > >>> delegation model, users have to use reflection in the executors to
>> > pick up
>> > >>> the classloader in order to use those classes added by sc.addJars
>> APIs.
>> > >>> However, it's very inconvenience for users, and not documented in
>> > spark.
>> > >>>
>> > >>> I'm working on a patch to solve it by calling the protected method
>> > addURL
>> > >>> in URLClassLoader to update the current default classloader, so no
>> > >>> customClassLoader anymore. I wonder if this is an good way to go.
>> > >>>
>> > >>>   private def addURL(url: URL, loader: URLClassLoader){
>> > >>>     try {
>> > >>>       val method: Method =
>> > >>> classOf[URLClassLoader].getDeclaredMethod("addURL", classOf[URL])
>> > >>>       method.setAccessible(true)
>> > >>>       method.invoke(loader, url)
>> > >>>     }
>> > >>>     catch {
>> > >>>       case t: Throwable => {
>> > >>>         throw new IOException("Error, could not add URL to system
>> > >>> classloader")
>> > >>>       }
>> > >>>     }
>> > >>>   }
>> > >>>
>> > >>>
>> > >>>
>> > >>> Sincerely,
>> > >>>
>> > >>> DB Tsai
>> > >>> -------------------------------------------------------
>> > >>> My Blog: https://www.dbtsai.com
>> > >>> LinkedIn: https://www.linkedin.com/in/dbtsai
>> > >>>
>> > >>>
>> > >>> On Sun, May 18, 2014 at 11:57 PM, Sean Owen <so...@cloudera.com>
>> > wrote:
>> > >>>
>> > >>> > I might be stating the obvious for everyone, but the issue here is
>> > not
>> > >>> > reflection or the source of the JAR, but the ClassLoader. The
>> basic
>> > >>> > rules are this.
>> > >>> >
>> > >>> > "new Foo" will use the ClassLoader that defines Foo. This is
>> usually
>> > >>> > the ClassLoader that loaded whatever it is that first referenced
>> Foo
>> > >>> > and caused it to be loaded -- usually the ClassLoader holding your
>> > >>> > other app classes.
>> > >>> >
>> > >>> > ClassLoaders can have a parent-child relationship. ClassLoaders
>> > always
>> > >>> > look in their parent before themselves.
>> > >>> >
>> > >>> > (Careful then -- in contexts like Hadoop or Tomcat where your app
>> is
>> > >>> > loaded in a child ClassLoader, and you reference a class that
>> Hadoop
>> > >>> > or Tomcat also has (like a lib class) you will get the container's
>> > >>> > version!)
>> > >>> >
>> > >>> > When you load an external JAR it has a separate ClassLoader which
>> > does
>> > >>> > not necessarily bear any relation to the one containing your app
>> > >>> > classes, so yeah it is not generally going to make "new Foo" work.
>> > >>> >
>> > >>> > Reflection lets you pick the ClassLoader, yes.
>> > >>> >
>> > >>> > I would not call setContextClassLoader.
>> > >>> >
>> > >>> > On Mon, May 19, 2014 at 12:00 AM, Sandy Ryza <
>> > sandy.r...@cloudera.com>
>> > >>> > wrote:
>> > >>> > > I spoke with DB offline about this a little while ago and he
>> > confirmed
>> > >>> > that
>> > >>> > > he was able to access the jar from the driver.
>> > >>> > >
>> > >>> > > The issue appears to be a general Java issue: you can't directly
>> > >>> > > instantiate a class from a dynamically loaded jar.
>> > >>> > >
>> > >>> > > I reproduced it locally outside of Spark with:
>> > >>> > > ---
>> > >>> > >     URLClassLoader urlClassLoader = new URLClassLoader(new
>> URL[] {
>> > new
>> > >>> > > File("myotherjar.jar").toURI().toURL() }, null);
>> > >>> > >
>> Thread.currentThread().setContextClassLoader(urlClassLoader);
>> > >>> > >     MyClassFromMyOtherJar obj = new MyClassFromMyOtherJar();
>> > >>> > > ---
>> > >>> > >
>> > >>> > > I was able to load the class with reflection.
>> > >>> >
>> > >>>
>> >
>>
>
>

Reply via email to