Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread GlennStrycker
For some reason it does not appear when I hit "tab" in Spark shell, but when I put everything together in one line, it DOES WORK! orig_graph.edges.map(_.copy()).cartesian(orig_graph.edges.map(_.copy())).flatMap( A => Seq(if (A._1.srcId == A._2.dstId) Edge(A._2.srcId,A._1.dstId,1) else if (A._1.dst

Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread Sean Owen
http://spark.apache.org/docs/0.9.1/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions It becomes automagically available when your RDD contains pairs. On Tue, May 20, 2014 at 9:00 PM, GlennStrycker wrote: > I don't seem to have this function in my Spark installation for this object, > or

Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread Mark Hamstra
That's all very old functionality in Spark terms, so it shouldn't have anything to do with your installation being out-of-date. There is also no need to cast as long as the relevant implicit conversions are in scope: import org.apache.spark.SparkContext._ On Tue, May 20, 2014 at 1:00 PM, GlennSt
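
A minimal sketch of the point above, assuming Spark is on the classpath and a local master is acceptable: with the `org.apache.spark.SparkContext._` import in scope, an RDD of pairs gains `reduceByKey` without any cast. The object name `ReduceByKeySketch`, the helper `countByKey`, and the sample data are hypothetical and used only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Brings the implicit conversion to PairRDDFunctions into scope
// (required on Spark 0.9.x, as mentioned in the message above).
import org.apache.spark.SparkContext._

object ReduceByKeySketch {
  // Hypothetical helper: sums the values for each key using reduceByKey.
  def countByKey(sc: SparkContext, pairs: Seq[(String, Int)]): Map[String, Int] =
    sc.parallelize(pairs)   // RDD[(String, Int)]
      .reduceByKey(_ + _)   // one (key, sum) element per distinct key
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master is enough for this demo.
    val conf = new SparkConf().setMaster("local[2]").setAppName("reduceByKey-sketch")
    val sc = new SparkContext(conf)
    try {
      countByKey(sc, Seq(("a", 1), ("b", 1), ("a", 1))).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```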

Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread GlennStrycker
I don't seem to have this function in my Spark installation for this object, or the classes MappedRDD, FlatMappedRDD, EdgeRDD, VertexRDD, or Graph. Which class should have the reduceByKey function, and how do I cast my current RDD as this class? Perhaps this is still due to my Spark installation

Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread Reynold Xin
You are probably looking for reduceByKey in that case. "reduce" just reduces everything in the collection into a single element. On Tue, May 20, 2014 at 12:16 PM, GlennStrycker wrote: > Wait a minute... doesn't a reduce function return 1 element PER key pair? > For example, word-count mapreduce
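
The difference between the two operations can be illustrated with plain Scala collections, which need no Spark installation. This is only an analogy: `groupBy` followed by a per-group `reduce` stands in for what `reduceByKey` does on a pair RDD, and the object name `ReduceDemo` and the sample data are made up for the example.

```scala
object ReduceDemo {
  val pairs: Seq[(String, Int)] = Seq(("a", 1), ("b", 1), ("a", 1))

  // reduce folds the whole collection into ONE value of the element type,
  // regardless of how many distinct keys there are.
  val single: (String, Int) =
    pairs.reduce((x, y) => (x._1, x._2 + y._2))

  // Per-key reduction: group by key, then reduce the values inside each group.
  // This yields one result per distinct key, as in a word count.
  val perKey: Map[String, Int] =
    pairs.groupBy(_._1).map { case (key, group) =>
      key -> group.map(_._2).reduce(_ + _)
    }

  def main(args: Array[String]): Unit = {
    println(single) // a single tuple
    println(perKey) // one entry per key
  }
}
```

Here `single` is one tuple, while `perKey` holds one count for each distinct key, which is the behaviour a word count expects.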

Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread GlennStrycker
Wait a minute... doesn't a reduce function return 1 element PER key pair? For example, word-count mapreduce functions return a {word, count} element for every unique word. Is this supposed to be a 1-element RDD object? The .reduce functions for MappedRDD and FlatMappedRDD are both of the form

Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread GlennStrycker
Oh... ha, good point. Sorry, I'm new to mapreduce programming and forgot about that... I'll have to adjust my reduce function to output a vector/RDD as the element to return. Thanks for reminding me of this!

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread Reynold Xin
reduce always returns a single element; maybe you are misunderstanding what the reduce function in collections does. On Mon, May 19, 2014 at 3:32 PM, GlennStrycker wrote: > I tried adding .copy() everywhere, but still only get one element returned, > not even an RDD object. > > orig_graph.edges.

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread GlennStrycker
I tried adding .copy() everywhere, but still only get one element returned, not even an RDD object. orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge) ).map(edge => (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).reduce( (A,B) => { if (A._1.copy().dstId == B._1.copy().srcI

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread Reynold Xin
Yes, unfortunately you need that as well. Once 1.0 is released, you won't need to do that anymore. BTW, you can also just check out the source code from GitHub to build 1.0. The current branch-1.0 branch is already at release candidate status, so it should be almost identical to the actua

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread GlennStrycker
Thanks, rxin, this worked! I am having a similar problem with .reduce... do I need to insert .copy() functions in that statement as well? This part works: orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge) ).map(edge => (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).coll

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread Reynold Xin
This was an optimization that reuses a triplet object in GraphX, so when you do a collect directly on triplets, the same object is returned for every element. It has been fixed in Spark 1.0 here: https://issues.apache.org/jira/browse/SPARK-1188 To work around it in older versions of Spark, you can add a copy step to
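
The object-reuse effect described above can be reproduced in plain Scala without GraphX. The sketch below is not GraphX's implementation: `MutableTriplet` and `ReuseDemo` are hypothetical stand-ins showing why collecting from an iterator that recycles one mutable instance yields identical elements, and why copying each element first avoids that.

```scala
// Hypothetical stand-in for a reused triplet object; not GraphX's actual class.
final class MutableTriplet(var srcId: Long, var dstId: Long) {
  def copy(): MutableTriplet = new MutableTriplet(srcId, dstId)
  override def toString: String = s"($srcId -> $dstId)"
}

object ReuseDemo {
  // A few (src, dst) pairs taken from the edge list in the original report.
  val edges: Seq[(Long, Long)] = Seq((1L, 4L), (1L, 5L), (2L, 5L))

  // Overwrites and returns the SAME instance for every element.
  def reusingIterator(): Iterator[MutableTriplet] = {
    val shared = new MutableTriplet(0L, 0L)
    edges.iterator.map { case (src, dst) =>
      shared.srcId = src
      shared.dstId = dst
      shared
    }
  }

  def main(args: Array[String]): Unit = {
    // Every slot references the shared object, so all show the last edge.
    val aliased = reusingIterator().toArray
    // Copying before materializing keeps each element distinct.
    val copied = reusingIterator().map(_.copy()).toArray
    println(aliased.mkString(", ")) // (2 -> 5), (2 -> 5), (2 -> 5)
    println(copied.mkString(", "))  // (1 -> 4), (1 -> 5), (2 -> 5)
  }
}
```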

BUG: graph.triplets does not return proper values

2014-05-19 Thread GlennStrycker
graph.triplets does not work -- it returns incorrect results. I have a graph with the following edges: orig_graph.edges.collect = Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1), Edge(3,5,1), Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1), Edge(5,2,1), Edge(5,3,1), Edge(