Re: Graphx: GraphLoader.edgeListFile with edge weight

2014-05-21 Thread Reynold Xin
You can submit a pull request on the github mirror: https://github.com/apache/spark Thanks. On Wed, May 21, 2014 at 10:59 PM, npanj wrote: > Hi, > > For my project I needed to load a graph with edge weight; for this I have > updated GraphLoader.edgeListFile to consider third column in input file

Graphx: GraphLoader.edgeListFile with edge weight

2014-05-21 Thread npanj
Hi, For my project I needed to load a graph with edge weights; for this I have updated GraphLoader.edgeListFile to consider a third column in the input file. I would like to submit my patch for review so that it can be merged into the master branch. What is the process for submitting patches?
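The change npanj describes -- reading an optional third column as the edge weight -- can be sketched as a plain parsing function. This is an illustrative sketch of the idea, not the actual patch; defaulting a missing weight to 1.0 is an assumption.

```scala
object WeightedEdgeList {
  // Parse one line of an edge-list file: "srcId dstId [weight]".
  // The weight column is optional; a missing weight defaults to 1.0
  // (an assumption for this sketch, matching GraphLoader's unweighted default of 1).
  def parseLine(line: String): Option[(Long, Long, Double)] = {
    val trimmed = line.trim
    if (trimmed.isEmpty || trimmed.startsWith("#")) None  // skip blanks and comments
    else {
      val cols = trimmed.split("\\s+")
      val weight = if (cols.length >= 3) cols(2).toDouble else 1.0
      Some((cols(0).toLong, cols(1).toLong, weight))
    }
  }
}
```

In GraphX terms, each parsed triple would become an `Edge[Double](srcId, dstId, weight)` instead of the fixed `Edge(srcId, dstId, 1)` that the unweighted loader emits.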

Re: Should SPARK_HOME be needed with Mesos?

2014-05-21 Thread Andrew Ash
Hi Gerard, I agree that your second option seems preferred. You shouldn't have to specify a SPARK_HOME if the executor is going to use the spark.executor.uri instead. Can you send in a pull request that includes your proposed changes? Andrew On Wed, May 21, 2014 at 10:19 AM, Gerard Maas wrote

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-21 Thread Tom Graves
I don't think Kevin's issue would be with an API change in YarnClientImpl since in both cases he says he is using Hadoop 2.3.0.  I'll take a look at his post in the user list. Tom On Wednesday, May 21, 2014 7:01 PM, Colin McCabe wrote: Hi Kevin, Can you try https://issues.apache.org/ji

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-21 Thread Tom Graves
Has anyone tried PySpark on YARN and gotten it to work? I was having issues when I built Spark on Red Hat, but when I built on my Mac it worked; now when I build it on my Mac, it also doesn't work. Tom On Tuesday, May 20, 2014 3:14 PM, Tathagata Das wrote: Please vote on releasing

Re: Re:MLlib ALS-- Errors communicating with MapOutputTracker

2014-05-21 Thread Sue Cai
Hi Witgo, Thanks a lot for your reply. In my second test, the user features and product features were loaded from the file system directly, which means I did not use ALS here, and this problem happened at the data-loading stage. The way I asked the question was a little bit misleading

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-21 Thread Colin McCabe
Hi Kevin, Can you try https://issues.apache.org/jira/browse/SPARK-1898 to see if it fixes your issue? Running in YARN cluster mode, I had a similar issue where Spark was able to create a Driver and an Executor via YARN, but then it stopped making any progress. Note: I was using a pre-release ver

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
@Xiangrui How about we send the primary jar and secondary jars into distributed cache without adding them into the system classloader of executors. Then we add them using custom classloader so we don't need to call secondary jars through reflection in primary jar. This will be consistent to what we

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-21 Thread Kevin Markey
0 Abstaining because I'm not sure if my failures are due to Spark, configuration, or other factors... Compiled and deployed RC10 for YARN, Hadoop 2.3 per Spark 1.0.0 Yarn documentation. No problems. Rebuilt applications against RC10 and Hadoop 2.3.0 (plain vanilla Apache release). Updated

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Patrick Wendell
Hey I just looked at the fix here: https://github.com/apache/spark/pull/848 Given that this is quite simple, maybe it's best to just go with this and just explain that we don't support adding jars dynamically in YARN in Spark 1.0. That seems like a reasonable thing to do. On Wed, May 21, 2014 at

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Patrick Wendell
Of these two solutions I'd definitely prefer 2 in the short term. I'd imagine the fix is very straightforward (it would mostly just be removing code), and we'd be making this more consistent with the standalone mode, which makes things way easier to reason about. In the long term we'll definitely wan

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Koert Kuipers
db tsai, i do not think userClassPathFirst is working, unless the classes you load don't reference any classes already loaded by the parent classloader (a mostly hypothetical situation)... i filed a jira for this here: https://issues.apache.org/jira/browse/SPARK-1863 On Tue, May 20, 2014 at 1:04
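The behavior Koert describes follows from how a child-first ("user classpath first") loader delegates. Below is a minimal sketch of such a loader -- illustrative only, not Spark's implementation. Classes found in the child can still resolve dependencies through the parent, but classes the parent already loaded can never see child-loaded classes, which is the one-way visibility at the root of SPARK-1863.

```scala
import java.net.{URL, URLClassLoader}

// A minimal child-first classloader sketch: try this loader's own URLs before
// delegating to the parent. Child-loaded classes that reference classes only
// the parent knows still resolve (the parent is the fallback), but a class the
// parent loaded cannot see anything loaded here.
class ChildFirstClassLoader(urls: Array[URL], parent: ClassLoader)
    extends URLClassLoader(urls, parent) {
  override def loadClass(name: String, resolve: Boolean): Class[_] = {
    val loaded = findLoadedClass(name)
    val c =
      if (loaded != null) loaded
      else
        try findClass(name)  // child first
        catch { case _: ClassNotFoundException => super.loadClass(name, false) }
    if (resolve) resolveClass(c)
    c
  }
}
```

A production version would also exclude `java.*` and framework packages from child-first lookup; that bookkeeping is omitted here.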

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Xiangrui Meng
That's a good example. If we really want to cover that case, there are two solutions: 1. Follow DB's patch, adding jars to the system classloader. Then we cannot put a user class in front of an existing class. 2. Do not send the primary jar and secondary jars to executors' distributed cache. Inste

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Sandy Ryza
Is that an assumption we can make? I think we'd run into an issue in this situation: *In primary jar:* def makeDynamicObject(clazz: String) = Class.forName(clazz).newInstance() *In app code:* sc.addJar("dynamicjar.jar") ... rdd.map(x => makeDynamicObject("some.class.from.DynamicJar")) It might

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
How about the jars added dynamically? Those will be on the custom loader's classpath but not on the system one. Unfortunately, when we refer to the jars added dynamically from the primary jar, the default classloader will be the system one, not the custom one. It works in standalone mode since the prima
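DB's point can be shown with the three-argument Class.forName: the one-argument form resolves against the loader of the *calling* class (the system loader, for code in the primary jar), so it cannot see dynamically added jars. Passing a loader explicitly -- for example the thread context classloader, which an executor could point at its custom loader -- sidesteps the problem. The helper name below is made up for illustration.

```scala
// Sketch of the fix: resolve the class against an explicitly chosen loader
// rather than the caller's own. With the context classloader set to the
// custom loader, dynamically added jars become visible without per-call
// reflection tricks in the primary jar.
def makeDynamicObject(className: String): Any = {
  val loader = Option(Thread.currentThread.getContextClassLoader)
    .getOrElse(getClass.getClassLoader)
  // Three-arg forName takes the loader to resolve against.
  Class.forName(className, true, loader).getDeclaredConstructor().newInstance()
}
```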

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Xiangrui Meng
I think adding jars dynamically should work as long as the primary jar and the secondary jars do not depend on dynamically added jars, which should be the correct logic. -Xiangrui On Wed, May 21, 2014 at 1:40 PM, DB Tsai wrote: > This will be another separate story. > > Since in the yarn deployme

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
This will be another separate story. Since in the yarn deployment, as Sandy said, the app.jar will always be in the system classloader, which means any object instantiated from app.jar will be loaded by the system classloader instead of the custom one. As a result, the custom class loader in yarn will

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread Sandy Ryza
This will solve the issue for jars added upon application submission, but, on top of this, we need to make sure that anything dynamically added through sc.addJar works as well. To do so, we need to make sure that any jars retrieved via the driver's HTTP server are loaded by the same classloader th
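One way to make jars fetched later (e.g. retrieved via the driver's HTTP server) visible to the same loader is to subclass URLClassLoader and expose its protected addURL, then keep using that one loader instance for all lookups. This is an illustrative pattern under that assumption, not Spark's actual executor code.

```scala
import java.net.{URL, URLClassLoader}

// A single, long-lived loader that can grow: URLClassLoader.addURL is
// protected, so expose it in a subclass. Jars added after startup are then
// resolved by the same loader that handles everything else.
class MutableURLClassLoader(urls: Array[URL], parent: ClassLoader)
    extends URLClassLoader(urls, parent) {
  def addJar(url: URL): Unit = addURL(url)
}
```

The key design point is that there is exactly one loader instance; creating a fresh loader per added jar would recreate the visibility split discussed above.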

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-21 Thread Mark Hamstra
+1 On Tue, May 20, 2014 at 11:09 PM, Henry Saputra wrote: > Signature and hash for source looks good > No external executable package with source - good > Compiled with git and maven - good > Ran examples and sample programs locally and standalone -good > > +1 > > - Henry > > > > On Tue, May 20,

Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Gerard Maas
Hi Tobias, Regarding my comment on closure serialization: I was discussing it with my fellow Sparkers here and I totally overlooked the fact that you need the class files to de-serialize the closures (or whatever) on the workers, so you always need the jar file delivered to the workers in order f

Should SPARK_HOME be needed with Mesos?

2014-05-21 Thread Gerard Maas
Spark devs, I was looking into a question asked on the user list where a ClassNotFoundException was thrown when running a job on Mesos. Curious issue with serialization on Mesos; more details here [1]: When trying to run that simple example on my Mesos installation, I faced another issue: I got
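For reference, the second option discussed in this thread corresponds to configuring the executor distribution URI instead of relying on SPARK_HOME being set on the slaves. The master address and tarball path below are made-up illustrative values, not from the original thread:

```
# spark-defaults.conf (illustrative values)
# With spark.executor.uri set, Mesos executors download and unpack this
# distribution themselves, so a SPARK_HOME on the slaves should not be needed.
spark.master         mesos://zk://host:2181/mesos
spark.executor.uri   hdfs:///tmp/spark-1.0.0-bin-hadoop2.tgz
```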

Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Gerard Maas
Hi Tobias, I was curious about this issue and tried to run your example on my local Mesos. I was able to reproduce your issue using your current config: [error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task 1.0:4 failed 4 times (most recent failure: Exception failure: java.lang.

Re:MLlib ALS-- Errors communicating with MapOutputTracker

2014-05-21 Thread witgo
Lack of hard disk space? If yes, you can try https://github.com/apache/spark/pull/828 -- Original -- From: "Sue Cai" Date: Wed, May 21, 2014 03:31 PM To: "dev" Subject: MLlib ALS-- Errors communicating with MapOutputTracker Hello, I am currently usi

MLlib ALS-- Errors communicating with MapOutputTracker

2014-05-21 Thread Sue Cai
Hello, I am currently using MLlib ALS to process a large volume of data, about 1.2 billion Rating(userId, productId, rates) triples. The dataset was separated into 4000 partitions for parallelized computation on our YARN clusters. I encountered this error "Errors communicating with MapOutputTracker
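For context on the 4000-partition setup: Spark's default HashPartitioner maps a key to a partition via a non-negative modulus of its hashCode. The sketch below mirrors that logic in plain Scala (illustrative, not Spark's source); the Rating case class and key choice are made-up stand-ins.

```scala
// A stand-in for the MLlib Rating triple described above.
case class Rating(userId: Int, productId: Int, rating: Double)

// Map a key to one of numPartitions buckets, keeping the result
// non-negative even when hashCode is negative (Scala's % can return
// a negative value for negative operands).
def partitionFor(key: Any, numPartitions: Int): Int = {
  val raw = key.hashCode % numPartitions
  if (raw < 0) raw + numPartitions else raw
}

val r = Rating(42, 7, 4.5)
val partition = partitionFor(r.userId, 4000)
```

With 1.2 billion triples across 4000 partitions, each partition holds roughly 300,000 ratings, assuming an even hash distribution.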