Hi Sean,

Thanks for your great help! It works correctly if I remove the persist call!

As the next step, I will transform those values before persisting them.
I convert to RDD and back to JavaRDD just for testing purposes.
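
To illustrate the problem and the planned fix, here is a minimal sketch in plain Java, with no Spark or Hadoop dependency. The names ReusedRecord, Point, storeReferences and storeCopies are hypothetical and exist only for this illustration. ReusedRecord stands in for a Writable that the reader refills for every record. In the Spark job, the equivalent fix would presumably be a map over the pairs that copies key.get() and a clone of array.getData() into new objects before calling persist.

```java
import java.util.ArrayList;
import java.util.List;

class WritableReuseDemo {

    /** Hypothetical stand-in for a Writable: one mutable object refilled per record. */
    static final class ReusedRecord {
        int key;
        double[] values = new double[2];
    }

    /** Independent snapshot of one record, safe to keep in a cache. */
    static final class Point {
        final int key;
        final double[] values;

        Point(int key, double[] values) {
            this.key = key;
            this.values = values.clone(); // copy the array, not just the reference
        }
    }

    /** Mimics caching the raw objects: every slot holds the same reference. */
    static List<ReusedRecord> storeReferences(int n) {
        ReusedRecord shared = new ReusedRecord();
        List<ReusedRecord> cached = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            shared.key = i;              // the "reader" overwrites the same object
            shared.values[0] = i * 0.5;
            shared.values[1] = i * 2.0;
            cached.add(shared);          // stores a reference, not a copy
        }
        return cached;
    }

    /** Mimics transforming each record into a fresh object before caching. */
    static List<Point> storeCopies(int n) {
        ReusedRecord shared = new ReusedRecord();
        List<Point> cached = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            shared.key = i;
            shared.values[0] = i * 0.5;
            shared.values[1] = i * 2.0;
            cached.add(new Point(shared.key, shared.values)); // snapshot taken here
        }
        return cached;
    }

    public static void main(String[] args) {
        for (ReusedRecord r : storeReferences(3)) {
            System.out.println("ref key:" + r.key);
        }
        for (Point p : storeCopies(3)) {
            System.out.println("copy key:" + p.key);
        }
    }
}
```

Running main prints "ref key:2" three times and then "copy key:0", "copy key:1", "copy key:2". That is the same pattern as my output below, where every cached element shows key 999.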

Best Regards,
Jia

On Mon, Jul 25, 2016 at 1:01 PM, Sean Owen <so...@cloudera.com> wrote:

> Why are you converting to RDD and back to JavaRDD?
> The problem is storing references to Writable, which are mutated by the
> InputFormat. Somewhere you have 1000 refs to the same key. I think it may
> be the persist. You want to immediately transform these values to something
> besides a Writable.
>
> On Mon, Jul 25, 2016, 18:50 Jia Zou <jacqueline...@gmail.com> wrote:
>
>>
>> My code is as follows:
>>
>>     System.out.println("Initialize points...");
>>     JavaPairRDD<IntWritable, DoubleArrayWritable> data =
>>         sc.sequenceFile(inputFile, IntWritable.class, DoubleArrayWritable.class);
>>     RDD<Tuple2<IntWritable, DoubleArrayWritable>> rdd = JavaPairRDD.toRDD(data);
>>     JavaRDD<Tuple2<IntWritable, DoubleArrayWritable>> points =
>>         JavaRDD.fromRDD(rdd, data.classTag());
>>     points.persist(StorageLevel.MEMORY_ONLY());
>>
>>     int i;
>>     for (i = 0; i < iterations; i++) {
>>         System.out.println("iteration=" + i);
>>         //points.foreach(new ForEachMapPointToCluster(numDimensions, numClusters));
>>         points.foreach(new VoidFunction<Tuple2<IntWritable, DoubleArrayWritable>>() {
>>             public void call(Tuple2<IntWritable, DoubleArrayWritable> tuple) {
>>                 IntWritable key = tuple._1();
>>                 System.out.println("key:" + key.get());
>>                 DoubleArrayWritable array = tuple._2();
>>                 double[] point = array.getData();
>>                 for (int d = 0; d < 20; d++) {
>>                     System.out.println(d + ":" + point[d]);
>>                 }
>>             }
>>         });
>>     }
>>
>>
>> The output is many repetitions of the following; only the last element in
>> the RDD is printed.
>>
>> key:999
>> 0:0.9953839426689233
>> 1:0.12656798341145892
>> 2:0.16621114723289654
>> 3:0.48628049787614236
>> 4:0.476991470215116
>> 5:0.5033640235789054
>> 6:0.09257098597507829
>> 7:0.3153088440494892
>> 8:0.8807426085223242
>> 9:0.2809625780570739
>> 10:0.9584880094505738
>> 11:0.38521222520661547
>> 12:0.5114241334425228
>> 13:0.9524628903835111
>> 14:0.5252549496842003
>> 15:0.5732037830866236
>> 16:0.8632451606583632
>> 17:0.39754347061499895
>> 18:0.2859522809981715
>> 19:0.2659002343432888
>>
>> key:999
>> 0:0.9953839426689233
>> 1:0.12656798341145892
>> 2:0.16621114723289654
>> 3:0.48628049787614236
>> 4:0.476991470215116
>> 5:0.5033640235789054
>> 6:0.09257098597507829
>> 7:0.3153088440494892
>> 8:0.8807426085223242
>> 9:0.2809625780570739
>> 10:0.9584880094505738
>> 11:0.38521222520661547
>> 12:0.5114241334425228
>> 13:0.9524628903835111
>> 14:0.5252549496842003
>> 15:0.5732037830866236
>> 16:0.8632451606583632
>> 17:0.39754347061499895
>> 18:0.2859522809981715
>> 19:0.2659002343432888
>>
>> key:999
>> 0:0.9953839426689233
>> 1:0.12656798341145892
>> 2:0.16621114723289654
>> 3:0.48628049787614236
>> 4:0.476991470215116
>> 5:0.5033640235789054
>> 6:0.09257098597507829
>> 7:0.3153088440494892
>> 8:0.8807426085223242
>> 9:0.2809625780570739
>> 10:0.9584880094505738
>> 11:0.38521222520661547
>> 12:0.5114241334425228
>> 13:0.9524628903835111
>> 14:0.5252549496842003
>> 15:0.5732037830866236
>> 16:0.8632451606583632
>> 17:0.39754347061499895
>> 18:0.2859522809981715
>> 19:0.2659002343432888
>>
>
