
Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Punit Naik

I was actually writing code for the Connected Components algorithm. In it, I have to keep track of a variable called the vertex number, which is incremented by the number of triples encountered in each line. This variable should be available on all nodes and in all partitions.
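One way to read this requirement is that each line needs a starting vertex number equal to the total number of triples in all lines before it, across partitions. The sketch below models that in plain Scala with no Spark dependency: each inner Seq stands in for one partition, and countTriples is a hypothetical helper that assumes whitespace-separated tokens, three per triple. On a real RDD the two passes would roughly correspond to a mapPartitions job that collects per-partition totals, followed by mapPartitionsWithIndex to apply the offsets. This is an illustration under those assumptions, not the poster's actual code.

```scala
object VertexNumbering {
  // Hypothetical helper: assumes whitespace-separated tokens, three per triple.
  def countTriples(line: String): Long =
    (line.trim.split("\\s+").count(_.nonEmpty) / 3).toLong

  // Each inner Seq models one partition (an assumption of this sketch).
  def assignStartVertex(partitions: Seq[Seq[String]]): Seq[Seq[(String, Long)]] = {
    // Pass 1: total triples per partition, turned into exclusive prefix sums.
    val offsets: Seq[Long] =
      partitions.map(_.map(countTriples).sum).scanLeft(0L)(_ + _)
    // Pass 2: running total inside each partition, shifted by that partition's offset.
    partitions.zip(offsets).map { case (part, offset) =>
      val starts = part.map(countTriples).scanLeft(offset)(_ + _)
      part.zip(starts) // zip drops the trailing total, leaving one start per line
    }
  }
}
```

Because the offsets are computed up front and then only read, no mutable counter has to be shared between nodes.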

Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Ted Yu

Since data.length is variable, I am not sure whether mixing data.length and the index makes sense. Can you describe your use case in a bit more detail?
Thanks
On Tue, Jun 28, 2016 at 11:34 AM, Punit Naik wrote:
> Hi Ted
>
> So would the tuple look like: (x._1, split.startIndex + x._2 +
> x.
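One concrete consequence of mixing the two can be sketched in plain Scala. The snippet below applies the expression proposed in the thread, (x._1, split.startIndex + x._2 + x._1.length), to a single modelled partition of strings. The object and method names are hypothetical, and treating the elements as strings is an assumption made only for illustration.

```scala
object ProposedIndexCheck {
  // Applies the expression proposed in the thread to one modelled partition.
  // `startIndex` stands in for split.startIndex.
  def proposed(data: Seq[String], startIndex: Long): Seq[(String, Long)] =
    data.zipWithIndex.map { x => (x._1, startIndex + x._2 + x._1.length) }
}
```

For the input Seq("ab", "c") with startIndex 0, both elements receive the value 2 (0 + 0 + 2 and 0 + 1 + 1), so under these assumptions the result no longer identifies elements uniquely.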

Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Punit Naik

Hi Ted
So would the tuple look like: (x._1, split.startIndex + x._2 + x._1.length)?
On Tue, Jun 28, 2016 at 11:09 PM, Ted Yu wrote:
> Please take a look at:
> core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala
>
> In compute() method:
> val split = splitIn.asInstanceOf[Zippe

Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Ted Yu

Please take a look at:
core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala
In the compute() method:
    val split = splitIn.asInstanceOf[ZippedWithIndexRDDPartition]
    firstParent[T].iterator(split.prev, context).zipWithIndex.map { x =>
      (x._1, split.startIndex + x._2)
    }
You can mo
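To make the role of split.startIndex easier to see, here is a plain-Scala model of the indexing scheme, with no Spark dependency. Each inner Seq stands in for one partition, and the start index of a partition is taken to be the number of elements in all earlier partitions. The object and method names are hypothetical; this is a sketch of the idea, not the Spark source.

```scala
object ZipWithIndexModel {
  // Each inner Seq models one RDD partition (an assumption of this sketch).
  def zipWithIndexModel[T](partitions: Seq[Seq[T]]): Seq[Seq[(T, Long)]] = {
    // startIndex of partition i = number of elements in partitions 0 until i.
    val startIndices: Seq[Long] =
      partitions.map(_.size.toLong).scanLeft(0L)(_ + _)
    partitions.zip(startIndices).map { case (part, startIndex) =>
      // Mirrors the quoted map step: (x._1, split.startIndex + x._2)
      part.zipWithIndex.map { x => (x._1, startIndex + x._2) }
    }
  }
}
```

Changing what gets added inside the map step changes the value attached to each element, but the start indices are still derived from element counts only.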