Hello. I have a hadoopFile RDD and I tried to collect its items to the driver
program, but it returns an array of identical records (all equal to the last
record of my file). My code looks like this:

val rdd = sc.hadoopFile(
  "hdfs://..../data.avro",
  classOf[org.apache.avro.mapred.AvroInputFormat[MyAvroRecord]],  // input format
  classOf[org.apache.avro.mapred.AvroWrapper[MyAvroRecord]],      // key class
  classOf[org.apache.hadoop.io.NullWritable],                     // value class
  10)                                                             // minimum number of partitions

val collectedData = rdd.collect()

for (s <- collectedData){
   println(s)
}

It prints the wrong data, but rdd.foreach(println) works as expected.

What is wrong with my code, and how can I collect the records of a hadoopFile
RDD (actually I only want to collect a part of them) to the driver program?
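
Concretely, this is roughly what I am trying to end up with on the driver. It
is only a sketch: it reuses the rdd defined above, and converting each record
to a String is just a placeholder for picking out the fields I actually need:

val sample = rdd
  .map { case (wrapper, _) => wrapper.datum().toString }  // unwrap the AvroWrapper[MyAvroRecord]
  .take(10)                                               // only a small subset, not the whole file

sample.foreach(println)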



