Hi Sean,

Thanks for your great help! It works correctly once I remove the persist call.
As a next step, I will transform those values before persisting them. I converted to RDD and back to JavaRDD just for testing purposes.

Best Regards,
Jia

On Mon, Jul 25, 2016 at 1:01 PM, Sean Owen <so...@cloudera.com> wrote:

> Why are you converting to RDD and back to JavaRDD?
> The problem is storing references to Writable, which are mutated by the
> InputFormat. Somewhere you have 1000 refs to the same key. I think it may
> be the persist. You want to immediately transform these values to something
> besides a Writable.
>
> On Mon, Jul 25, 2016, 18:50 Jia Zou <jacqueline...@gmail.com> wrote:
>
>> My code is as follows:
>>
>> System.out.println("Initialize points...");
>>
>> JavaPairRDD<IntWritable, DoubleArrayWritable> data =
>>     sc.sequenceFile(inputFile, IntWritable.class, DoubleArrayWritable.class);
>>
>> RDD<Tuple2<IntWritable, DoubleArrayWritable>> rdd =
>>     JavaPairRDD.toRDD(data);
>>
>> JavaRDD<Tuple2<IntWritable, DoubleArrayWritable>> points =
>>     JavaRDD.fromRDD(rdd, data.classTag());
>>
>> points.persist(StorageLevel.MEMORY_ONLY());
>>
>> int i;
>>
>> for (i = 0; i < iterations; i++) {
>>     System.out.println("iteration=" + i);
>>     //points.foreach(new ForEachMapPointToCluster(numDimensions, numClusters));
>>     points.foreach(new VoidFunction<Tuple2<IntWritable, DoubleArrayWritable>>() {
>>         public void call(Tuple2<IntWritable, DoubleArrayWritable> tuple) {
>>             IntWritable key = tuple._1();
>>             System.out.println("key:" + key.get());
>>             DoubleArrayWritable array = tuple._2();
>>             double[] point = array.getData();
>>             for (int d = 0; d < 20; d++) {
>>                 System.out.println(d + ":" + point[d]);
>>             }
>>         }
>>     });
>> }
>>
>> The output is many repetitions of the following; only the last element
>> in the RDD is printed.
>>
>> key:999
>> 0:0.9953839426689233
>> 1:0.12656798341145892
>> 2:0.16621114723289654
>> 3:0.48628049787614236
>> 4:0.476991470215116
>> 5:0.5033640235789054
>> 6:0.09257098597507829
>> 7:0.3153088440494892
>> 8:0.8807426085223242
>> 9:0.2809625780570739
>> 10:0.9584880094505738
>> 11:0.38521222520661547
>> 12:0.5114241334425228
>> 13:0.9524628903835111
>> 14:0.5252549496842003
>> 15:0.5732037830866236
>> 16:0.8632451606583632
>> 17:0.39754347061499895
>> 18:0.2859522809981715
>> 19:0.2659002343432888
>> key:999
>> 0:0.9953839426689233
>> 1:0.12656798341145892
>> 2:0.16621114723289654
>> 3:0.48628049787614236
>> 4:0.476991470215116
>> 5:0.5033640235789054
>> 6:0.09257098597507829
>> 7:0.3153088440494892
>> 8:0.8807426085223242
>> 9:0.2809625780570739
>> 10:0.9584880094505738
>> 11:0.38521222520661547
>> 12:0.5114241334425228
>> 13:0.9524628903835111
>> 14:0.5252549496842003
>> 15:0.5732037830866236
>> 16:0.8632451606583632
>> 17:0.39754347061499895
>> 18:0.2859522809981715
>> 19:0.2659002343432888
>> key:999
>> 0:0.9953839426689233
>> 1:0.12656798341145892
>> 2:0.16621114723289654
>> 3:0.48628049787614236
>> 4:0.476991470215116
>> 5:0.5033640235789054
>> 6:0.09257098597507829
>> 7:0.3153088440494892
>> 8:0.8807426085223242
>> 9:0.2809625780570739
>> 10:0.9584880094505738
>> 11:0.38521222520661547
>> 12:0.5114241334425228
>> 13:0.9524628903835111
>> 14:0.5252549496842003
>> 15:0.5732037830866236
>> 16:0.8632451606583632
>> 17:0.39754347061499895
>> 18:0.2859522809981715
>> 19:0.2659002343432888
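
The effect Sean describes (many cached references to one object that the reader keeps mutating) can be reproduced without Spark or Hadoop. The sketch below is only an illustration under stated assumptions: `MutableKey` and `ReusingReader` are hypothetical stand-ins for a Writable and for an InputFormat that reuses a single instance, and none of these names come from the thread or from any library.

```java
import java.util.ArrayList;
import java.util.List;

class WritableReuseDemo {

    // Hypothetical stand-in for a Hadoop Writable: a mutable value holder.
    static final class MutableKey {
        private int value;
        void set(int value) { this.value = value; }
        int get() { return value; }
    }

    // Hypothetical stand-in for a reader that reuses ONE instance per record.
    static final class ReusingReader {
        private final MutableKey shared = new MutableKey();
        private final int count;
        private int position = 0;

        ReusingReader(int count) { this.count = count; }

        boolean hasNext() { return position < count; }

        // Overwrites the shared instance and hands out the same reference.
        MutableKey next() {
            shared.set(position++);
            return shared;
        }
    }

    // Caches the references themselves: every entry aliases the same object,
    // so all of them show the last value that was read.
    static List<Integer> keysStoredByReference(int n) {
        ReusingReader reader = new ReusingReader(n);
        List<MutableKey> cached = new ArrayList<>();
        while (reader.hasNext()) {
            cached.add(reader.next());
        }
        List<Integer> seen = new ArrayList<>();
        for (MutableKey key : cached) {
            seen.add(key.get());
        }
        return seen;
    }

    // Copies each value out of the mutable holder as soon as it is read.
    static List<Integer> keysCopiedOnRead(int n) {
        ReusingReader reader = new ReusingReader(n);
        List<Integer> cached = new ArrayList<>();
        while (reader.hasNext()) {
            cached.add(reader.next().get());
        }
        return cached;
    }

    public static void main(String[] args) {
        System.out.println(keysStoredByReference(3)); // prints [2, 2, 2]
        System.out.println(keysCopiedOnRead(3));      // prints [0, 1, 2]
    }
}
```

In the Spark code quoted above, the analogous step would presumably be a mapToPair (or map) placed before persist that copies key.get() and a clone of array.getData() into plain Java values. That is an assumption about the eventual fix, not something shown in this thread.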