Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Sean Owen
I might be stating the obvious for everyone, but the issue here is not reflection or the source of the JAR, but the ClassLoader. The basic rules are this. "new Foo" will use the ClassLoader that defines Foo. This is usually the ClassLoader that loaded whatever it is that first referenced Foo and c
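
To make the ClassLoader rule above concrete, here is a minimal sketch (not from the thread) that prints which loader defined a class and compares it with the thread's context ClassLoader. The names `LoaderInspection`, `Foo`, and `definingLoader` are hypothetical and exist only for illustration; nothing here is Spark-specific.

```scala
object LoaderInspection {
  // Hypothetical stand-in for an application class.
  class Foo

  // Returns the loader that defined the class, or None for classes
  // defined by the bootstrap loader (getClassLoader returns null there).
  def definingLoader(c: Class[_]): Option[ClassLoader] =
    Option(c.getClassLoader)

  def main(args: Array[String]): Unit = {
    // An application class is defined by an application-level loader.
    println(s"Foo defined by:    ${definingLoader(classOf[Foo])}")
    // java.lang.String comes from the bootstrap loader, so this is None.
    println(s"String defined by: ${definingLoader(classOf[String])}")
    // The context loader can differ from both; an executor may replace it.
    println(s"Context loader:    ${Option(Thread.currentThread.getContextClassLoader)}")
  }
}
```

If the context loader differs from the loader that defined the calling class, a plain `new` in that class will not see classes that only the context loader can find.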

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Mridul Muralidharan
My bad ... I was replying via mobile, and I did not realize responses to JIRA mails were not mirrored to JIRA - unlike PR responses ! Regards, Mridul On Sun, May 18, 2014 at 2:50 AM, Matei Zaharia wrote: > We do actually have replicated StorageLevels in Spark. You can use > MEMORY_AND_DISK_2 o

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-18 Thread witgo
How to reproduce this bug? -- Original -- From: "Patrick Wendell" Date: Mon, May 19, 2014 10:08 AM To: "dev@spark.apache.org" Cc: "Tom Graves" Subject: Re: [VOTE] Release Apache Spark 1.0.0 (rc9) Hey Matei - the issue you found is not related to secur

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-18 Thread Patrick Wendell
Hey Matei - the issue you found is not related to security. This patch a few days ago broke builds for Hadoop 1 with YARN support enabled. The patch directly altered the way we deal with commons-lang dependency, which is what is at the base of this stack trace. https://github.com/apache/spark/pull

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-18 Thread Tom Graves
no ideas off hand, I'll take a look tomorrow. Tom On Sunday, May 18, 2014 7:28 PM, Matei Zaharia wrote: Alright, I’ve opened https://github.com/apache/spark/pull/819 with the Windows fixes. I also found one other likely bug, https://issues.apache.org/jira/browse/SPARK-1875, in the binary

Re: can RDD be shared across mutil spark applications?

2014-05-18 Thread qingyang li
thanks for sharing, I am using tachyon to store RDD now. 2014-05-18 12:02 GMT+08:00 Christopher Nguyen : > Qing Yang, Andy is correct in answering your direct question. > > At the same time, depending on your context, you may be able to apply a > pattern where you turn the single Spark applicat

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-18 Thread Matei Zaharia
Alright, I’ve opened https://github.com/apache/spark/pull/819 with the Windows fixes. I also found one other likely bug, https://issues.apache.org/jira/browse/SPARK-1875, in the binary packages for Hadoop1 built in this RC. I think this is due to Hadoop 1’s security code depending on a differen

Fwd: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
Since the additional jars added by sc.addJar are served through an HTTP server, even if it works, we still want a better way for scalability reasons (imagine thousands of workers downloading jars from the driver). If we ignore the fundamental scalability issue, this can be fixed by using the customClasslo

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
The jars are included in my driver, and I can successfully use them in the driver. I'm working on a patch, and it's almost working. Will submit a PR soon. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
The reflection actually works. But you need to get the loader with `val loader = Thread.currentThread.getContextClassLoader`, which is set by the Spark executor. Our team verified this, and uses it as a workaround. Sincerely, DB Tsai --- My Blog: https
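
A minimal sketch of the workaround described above, under stated assumptions: `ContextLoaderReflection` and `newInstanceOf` are hypothetical names, and `java.util.ArrayList` stands in for a class from a dynamically added jar so that the example runs without Spark. It also assumes the target class has a public no-argument constructor.

```scala
object ContextLoaderReflection {
  // Instantiate a class by name through the thread's context ClassLoader
  // rather than the loader that defined the calling class.
  def newInstanceOf(className: String): AnyRef = {
    val loader = Thread.currentThread.getContextClassLoader
    // initialize = true runs the class's static initializers on load.
    val clazz = Class.forName(className, true, loader)
    clazz.getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for a class that only the context loader can see.
    val obj = newInstanceOf("java.util.ArrayList")
    println(obj.getClass.getName) // prints "java.util.ArrayList"
  }
}
```

In an executor the same call would be made inside a task, where, per the message above, the context loader has been set by Spark.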

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Sandy Ryza
Hey Xiangrui, If the jars are placed in the distributed cache and loaded statically, as the primary app jar is in YARN, then it shouldn't be an issue. Other jars, however, including additional jars that are sc.addJar'd and jars specified with the spark-submit --jars argument, are loaded dynamical

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Matei Zaharia
BTW in Spark the consensus so far was that we’d use the dev@ list for high-level discussions (e.g. change in the development process, major features, proposals of new components, release votes) and keep lower-level issue tracking in JIRA. This is just how the project operated before so it was th

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Xiangrui Meng
Hi Sandy, It is hard to imagine that a user needs to create an object in that way. Since the jars are already in distributed cache before the executor starts, is there any reason we cannot add the locally cached jars to classpath directly? Best, Xiangrui On Sun, May 18, 2014 at 4:00 PM, Sandy Ry

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Xiangrui Meng
Hi Patrick, If spark-submit works correctly, the user only needs to specify runtime jars via `--jars` instead of using `sc.addJar`. Is that correct? I checked SparkSubmit and yarn.Client but didn't find any code to handle `args.jars` for YARN mode. So I don't know where in the code the jars in the distr

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Matei Zaharia
Ah, maybe it’s just different in other Apache projects. All the ones I’ve participated in have had their design discussions on JIRA. For example take a look at https://issues.apache.org/jira/browse/HDFS-4949. (Most design discussions in Hadoop are also on JIRA). Hosting it this way is more conv

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Sandy Ryza
I spoke with DB offline about this a little while ago and he confirmed that he was able to access the jar from the driver. The issue appears to be a general Java issue: you can't directly instantiate a class from a dynamically loaded jar. I reproduced it locally outside of Spark with: --- URL

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Jacek Laskowski
On Sun, May 18, 2014 at 8:28 PM, Andrew Ash wrote: > The nice thing about putting discussion on the Jira is that everything > about the bug is in one place. So people looking to understand the > discussion a few years from now only have to look on the jira ticket rather > than also search the mai

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Matei Zaharia
JIRA comments are mirrored to the iss...@spark.apache.org list, so people who want to get them by email can do so. In theory one should also be able to reply to one of those emails and have the message show up in JIRA, but I don’t think ours is configured that way. I’m not sure why it wouldn’t

Re: Matrix Multiplication of two RDD[Array[Double]]'s

2014-05-18 Thread Andrew Ash
Hi Liquan, There is some work being done on implementing linear algebra algorithms on Spark for use in higher-level machine learning algorithms. That work is happening in the MLlib project, which has an org.apache.spark.mllib.linalg package you may find useful. See https://github.com/apache/spa

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-18 Thread Matei Zaharia
I took the always fun task of testing it on Windows, and unfortunately, I found some small problems with the prebuilt packages due to recent changes to the launch scripts: bin/spark-class2.cmd looks in ./jars instead of ./lib for the assembly JAR, and bin/run-example2.cmd doesn’t quite match the

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Patrick Wendell
@xiangrui - we don't expect these to be present on the system classpath, because they get dynamically added by Spark (e.g. your application can call sc.addJar well after the JVMs have started). @db - I'm pretty surprised to see that behavior. It's definitely not intended that users need reflectio

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Patrick Wendell
@db - it's possible that you aren't including the jar in the classpath of your driver program (I think this is what mridul was suggesting). It would be helpful to see the stack trace of the CNFE. - Patrick On Sun, May 18, 2014 at 11:54 AM, Patrick Wendell wrote: > @xiangrui - we don't expect the

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Andrew Ash
The nice thing about putting discussion on the Jira is that everything about the bug is in one place. So people looking to understand the discussion a few years from now only have to look on the jira ticket rather than also search the mailing list archives and hope commenters all put the string "S

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Jacek Laskowski
Hi, I'm curious if it's a common approach to have discussions in JIRA rather than here. I don't think it's the ASF way. Regards, Jacek Laskowski http://blog.japila.pl On 17 May 2014 23:55, "Matei Zaharia" wrote: > We do actually have replicated StorageLevels in Spark. You can use > MEMORY_AND_DISK_

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Xiangrui Meng
Btw, I tried rdd.map { i => System.getProperty("java.class.path") }.collect() but didn't see the jars added via "--jars" on the executor classpath. -Xiangrui On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng wrote: > I can re-produce the error with Spark 1.0-RC and YARN (CDH-5). The > reflecti

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Xiangrui Meng
I created a JIRA: https://issues.apache.org/jira/browse/SPARK-1870 DB, could you add more info to that JIRA? Thanks! -Xiangrui On Sun, May 18, 2014 at 9:46 AM, Xiangrui Meng wrote: > Btw, I tried > > rdd.map { i => > System.getProperty("java.class.path") > }.collect() > > but didn't see the j

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-18 Thread Mridul Muralidharan
So I think I need to clarify a few things here - particularly since this mail went to the wrong mailing list and a much wider audience than I intended it for :-) Most of the issues I mentioned are internal implementation details of Spark core, which means we can enhance them in future without di

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread Xiangrui Meng
I can reproduce the error with Spark 1.0-RC and YARN (CDH-5). The reflection approach mentioned by DB didn't work either. I checked the distributed cache on a worker node and found the jar there. It is also in the Environment tab of the WebUI. The workaround is making an assembly jar. DB, could y

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-18 Thread Mridul Muralidharan
On 18-May-2014 5:05 am, "Mark Hamstra" wrote: > > I don't understand. We never said that interfaces wouldn't change from 0.9 Agreed. > to 1.0. What we are committing to is stability going forward from the > 1.0.0 baseline. Nobody is disputing that backward-incompatible behavior or > interface