I'm not sure if this is what you're looking for but we have several custom
RDD implementations for internal data format/partitioning schemes.
The Spark API is really simple and primarily consists of implementing three
things:
1) You need a lightweight class that extends RDD
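The preview cuts off after the first item, but the minimal shape of a custom RDD can be sketched as below. This is a sketch, not Spark's documentation: `MyPartition`, `MyCustomRDD`, and `readPartition` are hypothetical names; the only real requirements are overriding `getPartitions` and `compute`.

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical partition type carrying whatever your format needs
// to locate its slice of the data.
class MyPartition(val index: Int) extends Partition

// A minimal custom RDD: describe the partitions, and how to iterate one.
class MyCustomRDD[T: ClassTag](
    sc: SparkContext,
    numParts: Int,
    readPartition: Int => Iterator[T]   // hypothetical reader for your format
) extends RDD[T](sc, Nil) {             // Nil: no parent dependencies

  override protected def getPartitions: Array[Partition] =
    Array.tabulate(numParts)(i => new MyPartition(i))

  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    readPartition(split.index)
}
```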
Yana and Sean,
Thanks for the feedback. I can get it to work a number of ways, I'm just
wondering if there's a preferred means.
One last question. Is there a reason the deployed Spark install doesn't
contain the same version of several classes as the maven dependency. Is this
intentional?
Thanks
Ah. I've avoided using spark-submit primarily because our use of Spark is as
part of an analytics library that's meant to be embedded in other
applications with their own lifecycle management.
One of those applications is a REST app running in Tomcat, which will make
the use of spark-submit difficult
Hi Sean,
I'm packaging Spark with my (standalone) driver app using Maven. Any
assemblies that are used on the Mesos workers through extending the
classpath, or provided by the driver (via the SparkConf), aren't packaged
with Spark (it seems obvious that would be a mistake).
I need, for exa
Hi Sean,
I'm running a Mesos cluster. My driver app is built with Maven against the
Spark 1.4.0 Maven dependency.
The Mesos slave machines have the spark distribution installed from the
distribution link.
I have a hard time understanding how this isn't a standard app deployment
but maybe I'm missing
These jars are simply incompatible. You can see this by looking at that class
in both the Maven repo for 1.4.0 here:
http://central.maven.org/maven2/org/apache/spark/spark-core_2.10/1.4.0/spark-core_2.10-1.4.0.jar
as well as in the spark-assembly jar inside the .tgz file you can get from the
official
Hello all,
I have a strange problem. I have a Mesos Spark cluster with Spark
1.4.0/Hadoop 2.4.0 installed, and a client application that uses Maven to
include the same versions.
However, I'm getting a serialVersionUID problem on:
ERROR Remoting -
org.apache.spark.storage.BlockManagerMessages$RegisterB
Okay.
PR: https://github.com/apache/spark/pull/5669
Jira: https://issues.apache.org/jira/browse/SPARK-7100
Hope that helps.
Let me know if you need anything else.
Jim
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/GradientBoostTrees-leaks-a-per
Hi Sean and Joe,
I have another question.
GradientBoostedTrees.run iterates over the RDD, calling DecisionTree.run on
each iteration with a new random sample of the input RDD. DecisionTree.run
calls RandomForest.run, which also calls persist.
One of these persist calls seems superfluous.
Should I simply re
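One way to make a redundant persist harmless, sketched below under the assumption that the caller owns the unpersist only when it actually triggered the caching (`persistIfNeeded` is a hypothetical helper, not a Spark API):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Persist only if the RDD isn't already persisted upstream; the Boolean
// tells the caller whether it now owns the matching unpersist.
def persistIfNeeded[T](rdd: RDD[T]): Boolean =
  if (rdd.getStorageLevel == StorageLevel.NONE) {
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    true
  } else {
    false  // someone upstream cached it; leave their lifecycle alone
  }
```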
Hi Joe,
Do you want a PR per branch (one for master, one for 1.3)? Are you still
maintaining 1.2? Do you need a Jira ticket per PR or can I submit them all
under the same ticket?
Or should I just submit it to master and let you guys back-port it?
Jim
Hi all,
It appears GradientBoostedTrees.scala can call 'persist' on an RDD and never
unpersist it. In the master branch it's here:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala#L181
In 1.3.1 it's here:
https://github.com/
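The general fix for this class of leak is to pair every persist with an unpersist once the derived result no longer references the cached data. A minimal sketch of that pattern, assuming the result is fully materialized inside the body (`withPersisted` is a hypothetical helper):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Scope the cache to the computation: persist on entry, unpersist on exit,
// even if the body throws.
def withPersisted[T, R](rdd: RDD[T])(body: RDD[T] => R): R = {
  rdd.persist(StorageLevel.MEMORY_AND_DISK)
  try body(rdd)
  finally rdd.unpersist(blocking = false)  // async unpersist; don't stall the caller
}
```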
Hello all,
I worked around this for now using the class (that I already had) that
inherits from RDD and is the one all of our custom RDDs inherit from. I did
the following:
1) Overload all of the transformations (that get used in our app) that don't
change the RDD size, wrapping the results with a
Hi Sean,
Thanks for the response.
I can't imagine a case (though my imagination may be somewhat limited) where
even map side effects could change the number of elements in the resulting
map.
I guess "count" wouldn't officially be an 'action' if it were implemented
this way. At least it wouldn't
Hi all,
I was wondering why the RDD.count call recomputes the RDD in all cases. In
most cases it could simply ask the next dependent RDD. I have several RDD
implementations and was surprised to see that a call like the following never
calls my RDD's count method, but instead recomputes/traverses the entire
dataset
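For context on why this happens: count() is an action, so Spark runs a job over every partition and sums the iterator sizes, recomputing the lineage unless the RDD is cached. The stock remedy, sketched here (`countCached` is a hypothetical helper, not a Spark API), is to persist before counting:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// count() runs a job over every partition, so uncached lineage is recomputed.
// Persisting first means the count job materializes the cache, and any later
// actions on the same RDD reuse it instead of recomputing.
def countCached[T](rdd: RDD[T]): Long = {
  rdd.persist(StorageLevel.MEMORY_AND_DISK)
  rdd.count()
}
```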