You can submit a pull request on the GitHub mirror:
https://github.com/apache/spark
Thanks.
On Wed, May 21, 2014 at 10:59 PM, npanj wrote:
> Hi,
>
> For my project I needed to load a graph with edge weight; for this I have
> updated GraphLoader.edgeListFile to consider third column in input fi
Hi,
For my project I needed to load a graph with edge weights; for this I have
updated GraphLoader.edgeListFile to consider the third column in the input file. I
would like to submit my patch for review so that it can be merged into the master
branch. What is the process for submitting patches?
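For reference, a minimal sketch of the idea (not the patch itself; the helper name and the default weight of 1.0 are just illustrative): parse an optional third column as the edge attribute and build the graph from the resulting edges.

import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}

// Minimal sketch (not the submitted patch): build a weighted graph from a
// whitespace-separated edge list whose optional third column is the weight.
def weightedEdgeListFile(sc: SparkContext, path: String): Graph[Int, Double] = {
  val edges = sc.textFile(path)
    .filter(line => !line.isEmpty && !line.startsWith("#"))
    .map { line =>
      val fields = line.split("\\s+")
      // Default to a weight of 1.0 when the third column is absent.
      val weight = if (fields.length > 2) fields(2).toDouble else 1.0
      Edge(fields(0).toLong, fields(1).toLong, weight)
    }
  Graph.fromEdges(edges, 1)
}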
Hi Gerard,
I agree that your second option seems preferred. You shouldn't have to
specify a SPARK_HOME if the executor is going to use the spark.executor.uri
instead. Can you send in a pull request that includes your proposed
changes?
Andrew
On Wed, May 21, 2014 at 10:19 AM, Gerard Maas wrote:
I don't think Kevin's issue would be with an API change in YarnClientImpl,
since in both cases he says he is using Hadoop 2.3.0. I'll take a look at
his post on the user list.
Tom
On Wednesday, May 21, 2014 7:01 PM, Colin McCabe wrote:
Hi Kevin,
Can you try https://issues.apache.org/jira/browse/SPARK-1898 to see if it
fixes your issue?
Has anyone tried PySpark on YARN and gotten it to work? I was having issues
when I built Spark on Red Hat, but when I built it on my Mac it worked; now,
when I build it on my Mac, it doesn't work either.
Tom
On Tuesday, May 20, 2014 3:14 PM, Tathagata Das wrote:
Please vote on releasing
Hi Witgo,
Thanks a lot for your reply.
In my second test, the user features and product features were loaded
from the file system directly, which means I did not use ALS here, and this
problem happened at the data-loading stage. The way I asked the question
was a little bit misleading
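To make the setup concrete, here is a rough sketch of that loading stage (the paths and the "id,v1 v2 v3 ..." file layout are assumptions, not my actual code):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

// Sketch only: load precomputed ALS factors straight from the file system,
// without running ALS again. File layout "id,v1 v2 v3 ..." is assumed.
val sc = new SparkContext(new SparkConf().setAppName("load-factors"))

def loadFeatures(path: String) =
  sc.textFile(path).map { line =>
    val Array(id, values) = line.split(",", 2)
    (id.toInt, values.trim.split(" ").map(_.toDouble))
  }

val userFeatures = loadFeatures("hdfs:///tmp/userFeatures")       // hypothetical path
val productFeatures = loadFeatures("hdfs:///tmp/productFeatures") // hypothetical path

// Score one (user, product) pair as the dot product of its factor vectors.
def predict(user: Int, product: Int): Double = {
  val u = userFeatures.lookup(user).head
  val p = productFeatures.lookup(product).head
  u.zip(p).map { case (a, b) => a * b }.sum
}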
Hi Kevin,
Can you try https://issues.apache.org/jira/browse/SPARK-1898 to see if it
fixes your issue?
Running in YARN cluster mode, I had a similar issue where Spark was able to
create a Driver and an Executor via YARN, but then it stopped making any
progress.
Note: I was using a pre-release version
@Xiangrui
How about we send the primary jar and secondary jars into the distributed
cache without adding them to the system classloader of the executors? Then
we add them using a custom classloader, so we don't need to call secondary
jars through reflection in the primary jar. This will be consistent with
what we
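Roughly, the executor-side part of this could look like the following sketch (the names and paths are made up, not actual Spark internals):

import java.net.URLClassLoader

// Rough sketch of the proposal: resolve classes from the cached secondary
// jars through a child classloader instead of appending them to the
// executor's system classpath.
def secondaryJarLoader(jarPaths: Seq[String]): ClassLoader = {
  val urls = jarPaths.map(p => new java.io.File(p).toURI.toURL).toArray
  new URLClassLoader(urls, Thread.currentThread().getContextClassLoader)
}

// Hypothetical usage on an executor:
// val loader = secondaryJarLoader(Seq("/local/cache/secondary.jar"))
// val clazz = Class.forName("some.user.Class", true, loader)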
0
Abstaining because I'm not sure if my failures are due to Spark,
configuration, or other factors...
Compiled and deployed RC10 for YARN, Hadoop 2.3 per the Spark 1.0.0 YARN
documentation. No problems.
Rebuilt applications against RC10 and Hadoop 2.3.0 (plain vanilla Apache
release).
Updated
Hey, I just looked at the fix here:
https://github.com/apache/spark/pull/848
Given that this is quite simple, maybe it's best to just go with this and
explain that we don't support adding jars dynamically in YARN in Spark 1.0.
That seems like a reasonable thing to do.
On Wed, May 21, 2014 at
Of these two solutions I'd definitely prefer 2 in the short term. I'd
imagine the fix is very straightforward (it would mostly just be removing
code), and we'd be making this more consistent with the standalone mode,
which makes things much easier to reason about.
In the long term we'll definitely want
DB Tsai, I do not think userClassPathFirst is working, unless the classes
you load don't reference any classes already loaded by the parent
classloader (a mostly hypothetical situation)... I filed a JIRA for this
here:
https://issues.apache.org/jira/browse/SPARK-1863
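For anyone who wants to reproduce it, the flag is toggled like this; spark.files.userClassPathFirst is the experimental name used in the 1.0-era configuration docs as far as I can tell, so please double-check it against your version:

import org.apache.spark.{SparkConf, SparkContext}

// Assumed (experimental) flag name from the 1.0-era docs: prefer user-added
// jars over Spark's own jars when loading classes in executors.
val conf = new SparkConf()
  .setAppName("user-classpath-first-test")
  .set("spark.files.userClassPathFirst", "true")
val sc = new SparkContext(conf)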
On Tue, May 20, 2014 at 1:04
That's a good example. If we really want to cover that case, there are
two solutions:
1. Follow DB's patch, adding jars to the system classloader. Then we
cannot put a user class in front of an existing class.
2. Do not send the primary jar and secondary jars to executors'
distributed cache. Inste
Is that an assumption we can make? I think we'd run into an issue in this
situation:
*In primary jar:*
def makeDynamicObject(clazz: String) = Class.forName(clazz).newInstance()
*In app code:*
sc.addJar("dynamicjar.jar")
...
rdd.map(x => makeDynamicObject("some.class.from.DynamicJar"))
It might
How about the jars added dynamically? Those will be in the custom loader's
classpath but not in the system one. Unfortunately, when we reference those
dynamically added jars from the primary jar, the default classloader will be
the system one, not the custom one.
It works in standalone mode since the prima
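One generic JVM-level workaround for that pattern (a general technique, not a fix from any of the patches in this thread) is to have the primary jar look classes up through the thread context classloader, so whichever loader happens to be installed on the task thread is used:

// General JVM technique, not Spark-specific: resolve dynamically added
// classes via the thread context classloader instead of the caller's loader.
def makeDynamicObject(clazz: String): AnyRef = {
  val loader = Thread.currentThread().getContextClassLoader
  Class.forName(clazz, true, loader).newInstance().asInstanceOf[AnyRef]
}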
I think adding jars dynamically should work as long as the primary jar
and the secondary jars do not depend on dynamically added jars, which
should be the correct logic. -Xiangrui
On Wed, May 21, 2014 at 1:40 PM, DB Tsai wrote:
> This will be another separate story.
>
> Since in the yarn deployme
This will be another separate story.
Since in the YARN deployment, as Sandy said, app.jar will always be in the
system classloader, which means any object instantiated in app.jar will have
the system classloader as its parent loader instead of the custom one. As a
result, the custom classloader in YARN will
This will solve the issue for jars added upon application submission, but,
on top of this, we need to make sure that anything dynamically added
through sc.addJar works as well.
To do so, we need to make sure that any jars retrieved via the driver's
HTTP server are loaded by the same classloader th
+1
On Tue, May 20, 2014 at 11:09 PM, Henry Saputra wrote:
> Signature and hash for source looks good
> No external executable package with source - good
> Compiled with git and maven - good
> Ran examples and sample programs locally and standalone - good
>
> +1
>
> - Henry
>
>
>
> On Tue, May 20,
Hi Tobias,
Regarding my comment on closure serialization:
I was discussing it with my fellow Sparkers here, and I totally overlooked
the fact that you need the class files to deserialize the closures (or
whatever) on the workers, so you always need the jar file delivered to the
workers in order for
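As a small illustration of that point (the jar path and Mesos master URL below are hypothetical), listing the assembly jar when creating the SparkContext is what gets the class files to the workers so the closures can be deserialized:

import org.apache.spark.{SparkConf, SparkContext}

// The workers need the application's class files to deserialize closures,
// so ship the assembly jar with the job (jar path and master URL are hypothetical).
val conf = new SparkConf()
  .setMaster("mesos://zk://zookeeper:2181/mesos")
  .setAppName("closure-serialization-demo")
  .setJars(Seq("target/scala-2.10/my-app-assembly.jar"))
val sc = new SparkContext(conf)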
Spark dev's,
I was looking into a question asked on the user list where a
ClassNotFoundException was thrown when running a job on Mesos. Curious
issue with serialization on Mesos: more details here [1]:
When trying to run that simple example on my Mesos installation, I faced
another issue: I got
Hi Tobias,
I was curious about this issue and tried to run your example on my local
Mesos. I was able to reproduce your issue using your current config:
[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task
1.0:4 failed 4 times (most recent failure: Exception failure:
java.lang.
Lack of hard disk space? If so, you can try
https://github.com/apache/spark/pull/828
-- Original --
From: "Sue Cai"
Date: Wed, May 21, 2014 03:31 PM
To: "dev"
Subject: MLlib ALS -- Errors communicating with MapOutputTracker
Hello,
I am currently usi
Hello,
I am currently using MLlib ALS to process a large volume of data, about 1.2
billion Rating(userId, productId, rating) triples. The dataset was separated
into 4000 partitions for parallelized computation on our YARN clusters.
I encountered this error: "Errors communicating with MapOutputTracker"
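For anyone trying to reproduce this, a minimal sketch of the kind of job described (the input path, column layout, and ALS parameters are assumptions):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Sketch of the described job: ~1.2 billion ratings repartitioned into 4000
// partitions before training. "userId,productId,rating" layout is assumed.
val sc = new SparkContext(new SparkConf().setAppName("als-large"))
val ratings = sc.textFile("hdfs:///path/to/ratings")   // hypothetical path
  .map { line =>
    val Array(user, product, rate) = line.split(",")
    Rating(user.toInt, product.toInt, rate.toDouble)
  }
  .repartition(4000)

val model = ALS.train(ratings, rank = 20, iterations = 10, lambda = 0.01)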