Re: Distribute jar dependencies via sc.AddJar(fileName)

2014-05-16 Thread DB Tsai
After reading the Spark code more carefully, I see that Spark calls `Thread.currentThread().setContextClassLoader` with the custom classloader. However, with this approach the classes have to be loaded via reflection. See http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-with
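The limitation described above can be sketched in plain Java (a sketch, not Spark's actual code; the jar path is a hypothetical placeholder, and `java.util.ArrayList` stands in for a class that would live in the added jar): setting the context classloader does not change how statically compiled class references are resolved, so classes on the new loader have to be obtained reflectively.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

public class ContextLoaderSketch {
    public static String demo() throws Exception {
        // Hypothetical jar path, standing in for a jar shipped via sc.addJar.
        URLClassLoader custom = new URLClassLoader(
                new URL[] { new URL("file:///tmp/extra-dep.jar") },
                ContextLoaderSketch.class.getClassLoader());
        Thread.currentThread().setContextClassLoader(custom);

        // A compiled-in reference like `new Foo()` is resolved by the loader
        // that defined the *calling* class, so it would still fail with
        // NoClassDefFoundError. The workaround is to go through the context
        // classloader reflectively (ArrayList is a stand-in class name here):
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        Class<?> clazz = Class.forName("java.util.ArrayList", true, ctx);
        Object list = clazz.getDeclaredConstructor().newInstance();
        Method add = clazz.getMethod("add", Object.class);
        add.invoke(list, "loaded reflectively");
        return clazz.getName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

This is the same pattern the Stack Overflow link above describes: `Class.forName` with an explicit loader argument works, while a plain class literal or `new` expression does not.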

Re: Distribute jar dependencies via sc.AddJar(fileName)

2014-05-16 Thread DB Tsai
Hi Xiangrui, We're still using the Spark 0.9 branch, and our job is submitted by: ./bin/spark-class org.apache.spark.deploy.yarn.Client \ --jar \ --class \ --args \ --num-workers \ --master-class \ --master-memory \ --worker-memory \ --addJars Based on my understanding of the c

Re: Distribute jar dependencies via sc.AddJar(fileName)

2014-05-16 Thread DB Tsai
The jars are actually there (and on the classpath), but you need to load them through reflection. I have another thread giving the workaround. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Fri, Ma

Re: Distribute jar dependencies via sc.AddJar(fileName)

2014-05-16 Thread Robert James
I've experienced the same bug and had to work around it manually. I posted the details here: http://stackoverflow.com/questions/23687081/spark-workers-unable-to-find-jar-on-ec2-cluster On 5/15/14, DB Tsai wrote: > Hi guys, > > I think it may be a bug in Spark. I wrote some code to demonstrate th

Re: Distribute jar dependencies via sc.AddJar(fileName)

2014-05-16 Thread DB Tsai
Hi guys, I think it may be a bug in Spark. I wrote some code to demonstrate the bug. Example 1) This is how Spark adds jars: basically, it adds jars to a custom URLClassLoader. https://github.com/dbtsai/classloader-experiement/blob/master/calling/src/main/java/Calling1.java It doesn't work for two reaso
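For reference, the add-jars-to-a-custom-classloader pattern under discussion can be sketched as below (my sketch, not the linked experiment; the jar path is a hypothetical placeholder). Subclassing `URLClassLoader` exposes the protected `addURL` so jars can be appended after the loader exists, but classes compiled against the application classloader still cannot reference the jar's classes directly; they must be loaded through this loader reflectively, which is the crux of the problem in this thread.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class AddJarSketch {
    // Expose the protected URLClassLoader#addURL so jars can be appended
    // after the loader is created (similar in spirit to Spark's executor
    // classloader, though this is only an illustrative sketch).
    static class MutableLoader extends URLClassLoader {
        MutableLoader(URL[] urls) { super(urls); }
        void add(URL url) { super.addURL(url); }
    }

    public static int urlCountAfterAdd(String jarUrl) throws Exception {
        MutableLoader loader = new MutableLoader(new URL[0]);
        // URLClassLoader opens jars lazily, so the hypothetical file need
        // not exist for the URL to be registered on the loader's path.
        loader.add(new URL(jarUrl));
        return loader.getURLs().length;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(urlCountAfterAdd("file:///tmp/extra-dep.jar"));
    }
}
```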

Re: Distribute jar dependencies via sc.AddJar(fileName)

2014-05-15 Thread Xiangrui Meng
In SparkContext#addJar, for yarn-standalone mode, the workers should get the jars from the local distributed cache instead of fetching them from the HTTP server. Could you send the command you used to submit the job? -Xiangrui On Wed, May 14, 2014 at 1:26 AM, DB Tsai wrote: > Hi Xiangrui, > > I actua

Re: Distribute jar dependencies via sc.AddJar(fileName)

2014-05-14 Thread DB Tsai
Hi Xiangrui, I actually used `yarn-standalone`; sorry for the confusion. I did some debugging over the last couple of days, and everything up to updateDependency in executor.scala works. I also checked the file size and md5sum in the executors, and they are the same as in the driver. Gonna do more testing

Re: Distribute jar dependencies via sc.AddJar(fileName)

2014-05-14 Thread Xiangrui Meng
I don't know whether this would fix the problem. In v0.9, you need `yarn-standalone` instead of `yarn-cluster`. See https://github.com/apache/spark/commit/328c73d037c17440c2a91a6c88b4258fbefa0c08 On Tue, May 13, 2014 at 11:36 PM, Xiangrui Meng wrote: > Does v0.9 support yarn-cluster mode? I che

Re: Distribute jar dependencies via sc.AddJar(fileName)

2014-05-14 Thread Xiangrui Meng
Does v0.9 support yarn-cluster mode? I checked SparkContext.scala in v0.9.1 and didn't see any special handling of `yarn-cluster`. -Xiangrui On Mon, May 12, 2014 at 11:14 AM, DB Tsai wrote: > We're deploying Spark in yarn-cluster mode (Spark 0.9), and we add jar > dependencies in command line with "-

Distribute jar dependencies via sc.AddJar(fileName)

2014-05-12 Thread DB Tsai
We're deploying Spark in yarn-cluster mode (Spark 0.9), and we add jar dependencies on the command line with the "--addJars" option. However, those external jars are only available in the driver (the application running in Hadoop), not in the executors (workers). After doing some research, we re