Re: Incorrect results with reduceByKey

2015-11-18 Thread tovbinm
Deep copying the data solved the issue:

    data.map(r => { val t = SpecificData.get().deepCopy(r.getSchema, r); (t.id, List(t)) }).reduceByKey(_ ++ _)

(noted here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1003)

Thanks Igor Berman, for point
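For anyone finding this thread later, here is a minimal sketch of why the copy matters. It does not use Spark or Avro at all. `Rec`, `copyRec`, `reusingReader` and `grouped` are hypothetical names made up for illustration, and the "reader" only imitates the behaviour described in the SparkContext comment linked above, where the same mutable object is handed back for every record. Under that assumption, collecting references without copying leaves every list element pointing at the last record read:

```scala
// Hypothetical stand-in for an Avro record; NOT a real Avro class.
final class Rec(var id: Int, var payload: String) {
  // Plays the role that SpecificData.get().deepCopy plays in the fix above.
  def copyRec: Rec = new Rec(id, payload)
}

object ReuseDemo {
  // Assumption: imitates a record reader that recycles ONE mutable instance.
  def reusingReader(rows: Seq[(Int, String)]): Iterator[Rec] = {
    val shared = new Rec(0, "")
    rows.iterator.map { case (id, p) =>
      shared.id = id
      shared.payload = p
      shared // the same reference is returned for every row
    }
  }

  val rows: Seq[(Int, String)] = Seq((1, "a"), (2, "b"), (1, "c"))

  // Local equivalent of map(r => (r.id, List(r))).reduceByKey(_ ++ _),
  // returning only the payloads so the result is easy to compare.
  def grouped(copy: Boolean): Map[Int, List[String]] = {
    val pairs = reusingReader(rows).map { r =>
      val t = if (copy) r.copyRec else r
      (t.id, List(t)) // the key is read now; the list keeps a reference
    }.toList
    pairs.groupBy(_._1).map { case (k, vs) =>
      k -> vs.flatMap(_._2).map(_.payload)
    }
  }
}
```

In this sketch `ReuseDemo.grouped(copy = false)` keeps the right keys but every payload is the last one read, while `ReuseDemo.grouped(copy = true)` returns the payloads that actually belong to each key. The real job differs in the details (serialization, shuffle, partitioning), so treat this only as an illustration of the aliasing problem, not as a reproduction of it.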

Re: Incorrect results with reduceByKey

2015-11-17 Thread Igor Berman
You should clone your data after reading Avro.

On 18 November 2015 at 06:28, tovbinm wrote:
> Howdy,
>
> We've noticed a strange behavior with Avro serialized data and reduceByKey
> RDD functionality. Please see below:
>
> // We're reading a bunch of Avro serialized data
> val data: RDD[T] = spa