Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Kevin Markey
I've discovered that one of the anomalies I encountered was due to an (embarrassing? humorous?) user error. See the user list thread "Failed RC-10 yarn-cluster job for FS closed error when cleaning up staging directory" for my discussion. With the user error corrected, the FS closed exception

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-22 Thread Xiangrui Meng
Hi DB, I found it is a little hard to implement the solution I mentioned: > Do not send the primary jar and secondary jars to executors' > distributed cache. Instead, add them to "spark.jars" in SparkSubmit > and serve them via http by calling sc.addJar in SparkContext. If you look at Application
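The "serve them via http" idea above can be sketched in miniature: publish a local jar file over HTTP so that workers can fetch it by URL. This is a toy Python sketch of the concept only, not Spark's actual file-server code; the function name and file names are made up.

```python
# Toy sketch: serve a directory of "jars" over HTTP and fetch one back,
# roughly as an executor would after sc.addJar publishes it.
import http.server
import os
import socketserver
import tempfile
import threading
import urllib.request

def serve_directory(path):
    """Serve `path` on an ephemeral localhost port; return (server, port)."""
    handler = lambda *a, **kw: http.server.SimpleHTTPRequestHandler(
        *a, directory=path, **kw)          # directory= needs Python >= 3.7
    server = socketserver.TCPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]

with tempfile.TemporaryDirectory() as d:
    # Stand-in for an application jar registered via sc.addJar.
    with open(os.path.join(d, "app.jar"), "wb") as f:
        f.write(b"fake jar bytes")
    server, port = serve_directory(d)
    data = urllib.request.urlopen(f"http://127.0.0.1:{port}/app.jar").read()
    server.shutdown()
    print(data == b"fake jar bytes")  # True
```

The real mechanism additionally handles authentication, timestamps, and per-executor download caching; this only shows the publish-then-fetch shape.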

Contributions to MLlib

2014-05-22 Thread MEETHU MATHEW
Hi, I would like to make some contributions to MLlib. I have a few concerns regarding the same. 1. Is there any reason for implementing the algorithms supported by MLlib in Scala? 2. Will you accept contributions done in Python or Java? Thanks, Meethu M

Re: Should SPARK_HOME be needed with Mesos?

2014-05-22 Thread Gerard Maas
Sure. Should I create a Jira as well? I saw there's already a broader ticket regarding the ambiguous use of SPARK_HOME [1] (cc: Patrick as owner of that ticket). I don't know if it would be more relevant to remove the use of SPARK_HOME when using Mesos and have the assembly as the only way forwa

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Kevin Markey
I retested several different cases... 1. FS closed exception shows up ONLY in RC-10, not in Spark 0.9.1, with both Hadoop 2.2 and 2.3. 2. SPARK-1898 has no effect for my use cases. 3. The failure to report that the underlying application is "RUNNING" and that it has succeeded is due ONLY to my

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Marcelo Vanzin
Hi Kevin, On Thu, May 22, 2014 at 9:49 AM, Kevin Markey wrote: > The FS closed exception only affects the cleanup of the staging directory, > not the final success or failure. I've not yet tested the effect of > changing my application's initialization, use, or closing of FileSystem. Without go

Re: Contributions to MLlib

2014-05-22 Thread Xiangrui Meng
Hi Meethu, Thanks for asking! Scala is the native language in Spark. Implementing algorithms in Scala can utilize the full power of Spark Core. Also, Scala's syntax is very concise. Implementing ML algorithms using different languages would increase the maintenance cost. However, there are still m

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Colin McCabe
The FileSystem cache is something that has caused a lot of pain over the years. Unfortunately we (in Hadoop core) can't change the way it works now because there are too many users depending on the current behavior. Basically, the idea is that when you request a FileSystem with certain options wi
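The cache behavior Colin describes can be modeled in a few lines: FileSystem.get() hands out one shared instance per (scheme, authority, user) key, so a close() by any caller closes it for every other caller of the same key. A toy Python model of that contract, not Hadoop's actual code:

```python
# Toy model of Hadoop's FileSystem cache: get_filesystem() returns ONE
# shared object per (scheme, authority, user) key, so closing it in one
# place silently breaks every other holder of the same handle.
class FileSystem:
    def __init__(self, scheme, authority, user):
        self.scheme, self.authority, self.user = scheme, authority, user
        self.closed = False

    def close(self):
        self.closed = True

_cache = {}

def get_filesystem(scheme, authority, user):
    key = (scheme, authority, user)
    if key not in _cache:
        _cache[key] = FileSystem(scheme, authority, user)
    return _cache[key]

fs_a = get_filesystem("hdfs", "namenode:8020", "kevin")
fs_b = get_filesystem("hdfs", "namenode:8020", "kevin")
print(fs_a is fs_b)   # True: both callers share the cached instance
fs_a.close()          # caller A is "done" with it...
print(fs_b.closed)    # True: ...and caller B's handle is closed too
```

This is exactly why the staging-directory cleanup in the thread above saw an "FS closed" error: some other component had closed the shared instance first. Hadoop's real API also offers FileSystem.newInstance() to bypass the cache when a private handle is needed.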

Re: Should SPARK_HOME be needed with Mesos?

2014-05-22 Thread Andrew Ash
Fixing the immediate issue of requiring SPARK_HOME to be set when it's not actually used is a separate ticket in my mind from a larger cleanup of what SPARK_HOME means across the cluster. I think you should file a new ticket for just this particular issue. On Thu, May 22, 2014 at 11:03 AM, Gerar

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Aaron Davidson
In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly, and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue was fixed: https://issues.apache.org/jira/browse/SPARK-1676 Prior to this fix, each Spark task created and cached its own FileSystems due to a bug

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Kevin Markey
Thank you, all! This is quite helpful. We have been debating how to handle this issue across a growing application. Unfortunately, the Hadoop FileSystem Javadoc should say all this, but it doesn't! Kevin On 05/22/2014 01:48 PM, Aaron Davidson wrote: In Spark 0.9.0 and 0.9.1, we stopped using t

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Tathagata Das
Hey all, On further testing, I came across a bug that breaks execution of pyspark scripts on YARN. https://issues.apache.org/jira/browse/SPARK-1900 This is a blocker and worth cutting a new RC. We also found a fix for a known issue that prevents additional jar files from being specified through spark-

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Colin McCabe
On Thu, May 22, 2014 at 12:48 PM, Aaron Davidson wrote: > In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly, > and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue > was fixed: https://issues.apache.org/jira/browse/SPARK-1676 > Interesting... > Pr

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Henry Saputra
Looks like SPARK-1900 is a blocker for YARN, and we might as well add SPARK-1870 while we're at it. TD or Patrick, could you kindly send out an email with [CANCEL] prefixed in the subject for the RC10 vote, to help people follow the active VOTE threads? The VOTE emails are getting a bit hard to follow. - Henry O

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Tathagata Das
Right! Doing that. TD On Thu, May 22, 2014 at 3:07 PM, Henry Saputra wrote: > Looks like SPARK-1900 is a blocker for YARN and might as well add > SPARK-1870 while at it. > > TD or Patrick, could you kindly send [CANCEL] prefixed in the subject > email out for the RC10 Vote to help people follow

[CANCEL][VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Tathagata Das
Hey all, We are canceling the vote on RC10 because of a blocker bug in pyspark on Yarn. https://issues.apache.org/jira/browse/SPARK-1900 Thanks everyone for testing! We will post RC11 soon. TD

Re: Should SPARK_HOME be needed with Mesos?

2014-05-22 Thread Gerard Maas
ack On Thu, May 22, 2014 at 9:26 PM, Andrew Ash wrote: > Fixing the immediate issue of requiring SPARK_HOME to be set when it's not > actually used is a separate ticket in my mind from a larger cleanup of what > SPARK_HOME means across the cluster. > > I think you should file a new ticket for j

java.lang.OutOfMemoryError while running Shark on Mesos

2014-05-22 Thread prabeesh k
Hi, I am trying to apply an inner join in Shark using 64MB and 27MB files. I am able to run the following queries on Mesos - "SELECT * FROM geoLocation1 " - """ SELECT * FROM geoLocation1 WHERE country = '"US"' """ But while trying the inner join as "SELECT * FROM geoLocation1 g1 INNER
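One reason a join can OOM where single-table scans succeed: a hash join must materialize its build side in memory, so memory pressure jumps even for modest inputs. A minimal Python sketch of that mechanism (the row layout and column names below are made up for illustration, not the actual geoLocation schema, and this is not Shark's join code):

```python
# Minimal in-memory hash join: the build side is held entirely in a dict,
# which is where heap pressure (and an eventual OutOfMemoryError in a
# small JVM) comes from; the probe side can stream row by row.
def hash_join(build_rows, probe_rows, key):
    table = {}
    for row in build_rows:                 # build side: fully materialized
        table.setdefault(row[key], []).append(row)
    out = []
    for row in probe_rows:                 # probe side: streamed
        for match in table.get(row[key], []):
            out.append({**match, **row})   # merge matching rows
    return out

g1 = [{"ip": "1.1.1.1", "country": "US"}, {"ip": "2.2.2.2", "country": "IN"}]
g2 = [{"ip": "1.1.1.1", "city": "Denver"}]
print(hash_join(g1, g2, "ip"))  # [{'ip': '1.1.1.1', 'country': 'US', 'city': 'Denver'}]
```

Putting the smaller table on the build side, or raising the JVM heap as suggested in the reply below this message's date, reduces the chance of exhausting memory.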

Re: java.lang.OutOfMemoryError while running Shark on Mesos

2014-05-22 Thread Akhil Das
Hi Prabeesh, Do an export _JAVA_OPTIONS="-Xmx10g" before starting Shark. Also, you can do a ps aux | grep shark and see how much memory it is being allocated; most likely it is 512 MB, in which case increase the limit. Thanks Best Regards On Fri, May 23, 2014 at 10:22 AM, prabeesh k wrote:
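The suggestion above works because _JAVA_OPTIONS is an environment variable that any subsequently launched JVM reads at startup. A hedged sketch of applying it from a Python launcher script (the "bin/shark" path is illustrative, not a verified location):

```python
# Set _JAVA_OPTIONS in a child environment so any JVM started with that
# environment (e.g. Shark's) picks up a 10 GB max heap.
import os
import subprocess

env = dict(os.environ)                 # copy, don't mutate the parent env
env["_JAVA_OPTIONS"] = "-Xmx10g"       # read by every JVM launched below

# A real launcher would now start Shark with this environment, e.g.:
# subprocess.run(["bin/shark"], env=env)
print(env["_JAVA_OPTIONS"])  # -Xmx10g
```

Equivalently, `export _JAVA_OPTIONS="-Xmx10g"` in the shell before running the startup script has the same effect for every process started from that shell.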