Actually, I was writing code for the Connected Components algorithm. In it I
have to keep track of a variable called vertex number, which is incremented by
the number of triples encountered in each line.
This variable should be available on all the nodes and in all the partitions.
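A plain var incremented inside a Spark closure is copied into each task, so increments made on one partition are not seen by the others. One way to get a global counter anyway is the two-pass idea ZippedWithIndexRDD uses (a per-partition start offset plus a local position), with each line weighted by its triple count instead of 1. The sketch below is only an illustration under assumptions: partitions are modeled as plain Scala collections so it runs without Spark, and countTriples with its ';'-separated line format is a hypothetical stand-in for the real parsing.

```scala
object VertexNumbering {
  // Hypothetical parser: assumes the triples in a line are separated by ';'.
  def countTriples(line: String): Int =
    line.split(";").count(_.trim.nonEmpty)

  // Pairs each line with the vertex number at which its triples start.
  // `partitions` stands in for the partitions of an RDD[String].
  def assignStartVertices(partitions: Seq[Seq[String]]): Seq[Seq[(String, Long)]] = {
    // Pass 1: total number of triples in each partition.
    val perPartition: Seq[Long] =
      partitions.map(lines => lines.map(line => countTriples(line).toLong).sum)
    // Exclusive prefix sum: the first vertex number of each partition.
    val startIndices: Seq[Long] = perPartition.scanLeft(0L)(_ + _)
    // Pass 2: inside each partition, advance a local counter from its start offset.
    partitions.zip(startIndices).map { case (lines, start) =>
      val offsets = lines.scanLeft(start)((acc, line) => acc + countTriples(line))
      lines.zip(offsets) // zip ignores the final running total
    }
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq("a b c; d e f", "g h i"),
      Seq("j k l; m n o; p q r")
    )
    assignStartVertices(partitions).flatten.foreach { case (line, v) =>
      println(s"$v -> $line")
    }
  }
}
```

With the sample input the three lines start at vertex numbers 0, 2 and 3. On a real RDD, pass 1 would typically be a mapPartitions that emits one sum per partition followed by collect() on the driver, and pass 2 a mapPartitionsWithIndex that looks up its partition's offset; the names above are illustrative, not an existing Spark API.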
Since data.length is variable, I am not sure whether mixing data.length
and the index makes sense.
Can you describe your use case in a bit more detail?
Thanks
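To make the concern about mixing a variable length with the index concrete, here is a small worked sketch in plain Scala (no Spark) of the tuple proposed in the thread, (x._1, split.startIndex + x._2 + x._1.length). The element sizes are hypothetical and stand in for whatever x._1.length measures. Two different positions can end up with the same key, and the keys do not match the running total a counter would produce.

```scala
object IndexPlusLength {
  // Key proposed in the thread: startIndex + position + size of the current element.
  def proposedKeys(startIndex: Long, sizes: Seq[Int]): Seq[Long] =
    sizes.zipWithIndex.map { case (size, i) => startIndex + i + size }

  // Running total: the offset at which each element's numbering begins.
  def cumulativeStarts(startIndex: Long, sizes: Seq[Int]): Seq[Long] =
    sizes.scanLeft(startIndex)(_ + _).init

  def main(args: Array[String]): Unit = {
    // Hypothetical sizes for three elements in a partition whose startIndex is 0.
    val sizes = Seq(2, 1, 3)
    println(proposedKeys(0L, sizes).mkString(", "))     // 2, 2, 5 (duplicate key)
    println(cumulativeStarts(0L, sizes).mkString(", ")) // 0, 2, 3
  }
}
```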
On Tue, Jun 28, 2016 at 11:34 AM, Punit Naik wrote:
Hi Ted
So would the tuple look like (x._1, split.startIndex + x._2 + x._1.length)?
On Tue, Jun 28, 2016 at 11:09 PM, Ted Yu wrote:
Please take a look at:
core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala
In the compute() method:
    val split = splitIn.asInstanceOf[ZippedWithIndexRDDPartition]
    firstParent[T].iterator(split.prev, context).zipWithIndex.map { x =>
      (x._1, split.startIndex + x._2)
    }
You can mo