For some reason it does not appear when I hit "tab" in the Spark shell, but
when I put everything together on one line, it DOES WORK!
orig_graph.edges.map(_.copy()).cartesian(orig_graph.edges.map(_.copy())).flatMap(
  A => Seq(if (A._1.srcId == A._2.dstId) Edge(A._2.srcId, A._1.dstId, 1)
           else if (A._1.dst ...
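(A runnable reconstruction of the truncated expression above; the second
branch and the empty fallback are assumptions, not the original code:)

import org.apache.spark.graphx.Edge

// Join every edge against every edge and emit a new edge wherever two
// edges chain head-to-tail.
val twoHop = orig_graph.edges.map(_.copy())
  .cartesian(orig_graph.edges.map(_.copy()))
  .flatMap { case (a, b) =>
    if (a.srcId == b.dstId)      Seq(Edge(b.srcId, a.dstId, 1))
    else if (a.dstId == b.srcId) Seq(Edge(a.srcId, b.dstId, 1))  // assumed branch
    else                         Seq.empty                       // assumed fallback
  }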
http://spark.apache.org/docs/0.9.1/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions
It becomes automagically available when your RDD contains pairs.
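(For illustration, not from the original thread: keying the edges by their
endpoints makes the pair functions appear, even though the shell's tab
completion won't list implicitly added methods:)

import org.apache.spark.SparkContext._  // implicit RDD[(K, V)] -> PairRDDFunctions
import org.apache.spark.graphx.Edge

// Hypothetical example: count duplicate edges. The (edge, 1) pairing
// mirrors the snippets later in this thread.
val keyed   = orig_graph.edges.map(e => (Edge(e.srcId, e.dstId, e.attr), 1))
val counted = keyed.reduceByKey(_ + _)  // RDD[(Edge[Int], Int)] -- now compiles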
On Tue, May 20, 2014 at 9:00 PM, GlennStrycker wrote:
> I don't seem to have this function in my Spark installation for this object,
> or
That's all very old functionality in Spark terms, so it shouldn't have
anything to do with your installation being out-of-date. There is also no
need to cast as long as the relevant implicit conversions are in scope:
import org.apache.spark.SparkContext._
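(A minimal check on made-up data that the import brings reduceByKey into
scope on any RDD of pairs in Spark 0.9.x:)

import org.apache.spark.SparkContext._  // the pair-RDD implicits live here in 0.9.x

val r = sc.parallelize(Seq((1, "x"), (1, "y"), (2, "z")))
r.reduceByKey(_ + _).collect()          // Array((1,xy), (2,z)); no cast needed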
On Tue, May 20, 2014 at 1:00 PM, GlennStrycker wrote:
I don't seem to have this function in my Spark installation for this object,
or the classes MappedRDD, FlatMappedRDD, EdgeRDD, VertexRDD, or Graph.
Which class should have the reduceByKey function, and how do I cast my
current RDD to this class?
Perhaps this is still due to my Spark installation?
You are probably looking for reduceByKey in that case.
"reduce" just reduces everything in the collection into a single element.
On Tue, May 20, 2014 at 12:16 PM, GlennStrycker wrote:
> Wait a minute... doesn't a reduce function return 1 element PER key pair?
> For example, word-count mapreduce
Wait a minute... doesn't a reduce function return 1 element PER key pair?
For example, word-count mapreduce functions return a {word, count} element
for every unique word. Is this supposed to be a 1-element RDD object?
The .reduce function for a MappedRDD or a FlatMappedRDD is of the form
reduce(f: (T, T) => T): T.
Oh... ha, good point. Sorry, I'm new to mapreduce programming and forgot
about that... I'll have to adjust my reduce function to output a vector/RDD
as the element to return. Thanks for reminding me of this!
reduce always returns a single element - maybe you are misunderstanding what
the reduce function in collections does.
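(For reference, the plain-collections behaviour being referred to:)

// Plain Scala collections, no Spark. reduce folds the whole collection
// down to ONE value of the element type.
Seq(1, 2, 3, 4).reduce(_ + _)    // 10
Seq("a", "b", "c").reduce(_ + _) // abc
// RDD.reduce behaves the same way: the result is a single value, not an RDD.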
On Mon, May 19, 2014 at 3:32 PM, GlennStrycker wrote:
> I tried adding .copy() everywhere, but still only get one element returned,
> not even an RDD object.
>
> orig_graph.edges.
I tried adding .copy() everywhere, but still only get one element returned,
not even an RDD object.
orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge)).map(edge =>
  (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).reduce(
  (A, B) => { if (A._1.copy().dstId == B._1.copy().srcId ...
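(As the replies above point out, reduce cannot return per-key results. A
sketch of the reduceByKey form, assuming the goal is a count per distinct
edge -- the original reduce body is truncated, so the aggregation is a guess:)

import org.apache.spark.SparkContext._
import org.apache.spark.graphx.Edge

val perEdgeCounts = orig_graph.edges.map(_.copy())
  .map(e => (Edge(e.srcId, e.dstId, e.attr), 1))
  .reduceByKey(_ + _)  // one (edge, count) element per distinct edge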
Yeah, unfortunately you need that as well. When 1.0 is released, you won't
need to do that anymore.
BTW - you can also just check out the source code from GitHub to build 1.0.
The current branch-1.0 branch is already at release candidate status - so it
should be almost identical to the actual release.
Thanks, rxin, this worked!
I am having a similar problem with .reduce... do I need to insert .copy()
functions in that statement as well?
This part works:
orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge)).map(edge =>
  (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).collect
This was an optimization that reuses a triplet object in GraphX, and when
you do a collect directly on triplets, the same object is returned.
It has been fixed in Spark 1.0 here:
https://issues.apache.org/jira/browse/SPARK-1188
To work around it in older versions of Spark, you can add a copy step to the
triplets before collecting them.
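(A sketch of that workaround; the exact projection is an assumption -- the
point is just to stop holding references to the reused EdgeTriplet object:)

// Pre-1.0 GraphX reuses one EdgeTriplet instance while iterating, so copy
// the fields out before collecting.
val tripletData = graph.triplets
  .map(t => (t.srcId, t.dstId, t.attr))  // fresh tuple per triplet
  .collect()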
graph.triplets does not work -- it returns incorrect results
I have a graph with the following edges:
orig_graph.edges.collect
= Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1),
Edge(3,5,1), Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1),
Edge(5,2,1), Edge(5,3,1), ...