Hello. I have a hadoopFile RDD, and when I try to collect its items to the driver program, I get back an array of identical records (each equal to the last record of my file). My code looks like this:
    val rdd = sc.hadoopFile(
      "hdfs://..../data.avro",
      classOf[org.apache.avro.mapred.AvroInputFormat[MyAvroRecord]],
      classOf[org.apache.avro.mapred.AvroWrapper[MyAvroRecord]],
      classOf[org.apache.hadoop.io.NullWritable],
      10)
    val collectedData = rdd.collect()
    for (s <- collectedData) {
      println(s)
    }

This prints the wrong data, but rdd.foreach(println) works as expected. What is wrong with my code, and how can I collect the records of a hadoopFile RDD (actually, I only want to collect parts of it) to the driver program?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/collect-on-hadoopFile-RDD-returns-wrong-results-tp14368.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
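The usual explanation for this symptom is that Hadoop's RecordReader reuses a single AvroWrapper instance for every record in a split: foreach prints each record before the wrapper is overwritten, while collect() ships back many references to that one mutated object, so every element looks like the last record read. A minimal sketch of the common workaround is to copy each datum out on the executors before collecting. MyAvroRecord and the truncated HDFS path are taken from the post above; the use of Avro's SpecificData.deepCopy, and the assumption that MyAvroRecord is an Avro-generated specific-record class, are mine, not the original poster's.

```scala
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.avro.specific.SpecificData
import org.apache.hadoop.io.NullWritable

val rdd = sc.hadoopFile(
  "hdfs://..../data.avro",
  classOf[AvroInputFormat[MyAvroRecord]],
  classOf[AvroWrapper[MyAvroRecord]],
  classOf[NullWritable],
  10)

// Deep-copy each record while the shared wrapper still holds it,
// so collect() receives distinct objects instead of many references
// to the one instance the RecordReader keeps overwriting.
val copied = rdd.map { case (wrapper, _) =>
  val datum = wrapper.datum()
  SpecificData.get().deepCopy(datum.getSchema, datum)
}

val collectedData = copied.collect()
collectedData.foreach(println)
```

The same copy-before-collect step applies to take(n) or any other action that moves records to the driver; only the records that survive the map need to be materialized as independent objects.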