This is because the HadoopRDD (and the underlying Hadoop InputFormat)
reuses objects to avoid allocation. It is somewhat tricky to fix in Spark
itself. However, in most cases you can clone the records yourself to make
sure you are not collecting the same object over and over again.
https://issues.apache.org/jira/browse/
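A minimal sketch of that cloning workaround, assuming the RDD of Avro
records produced by the hadoopFile call quoted further down in this thread
(the deepCopy helper below is illustrative, not from the original reply):

import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.avro.mapred.AvroWrapper
import org.apache.hadoop.io.NullWritable
import org.apache.spark.rdd.RDD

// Copy each Avro record before collecting it, because the Hadoop
// RecordReader keeps handing back the same key/value instances.
def materialize(rdd: RDD[(AvroWrapper[GenericRecord], NullWritable)]): Array[GenericRecord] =
  rdd.map { case (wrapper, _) =>
    val record = wrapper.datum()
    GenericData.get().deepCopy(record.getSchema, record)
  }.collect()

deepCopy gives each element its own backing object, so the array returned
by collect() no longer aliases a single reused record.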
I posted an example in my previous post. It was tested on Spark 1.0.2,
1.2.0-SNAPSHOT and 1.1.0 for Hadoop 2.4.0, on Windows and on Linux servers
with Hortonworks Hadoop 2.4, in local[4] mode. Any ideas about this Spark
behavior?
Akhil Das-2 wrote
> Can you dump out a small piece of data while doing rdd.collect and
> rdd.foreach(println)?
Full code example:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoSerializer

def main(args: Array[String]) {
  val conf = new SparkConf()
    .setAppName("ErrorExample")
    .setMaster("local[8]")
    .set("spark.serializer", classOf[KryoSerializer].getName)
  val sc = new SparkContext(conf)
  val rdd = sc.hadoopFile(
    "hdfs://./user.avro"
Can you dump out a small piece of data while doing rdd.collect and
rdd.foreach(println)?
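For what it's worth, with the rdd from the example above, the mismatch
being described shows up roughly like this (a sketch of the diagnostic,
not taken from the original mail):

// foreach prints each record as it is read, so the output looks correct.
rdd.foreach(println)

// collect() first materializes each partition into an array; every slot
// holds a reference to the same reused object, so the driver sees the
// last record repeated.
rdd.collect().foreach(println)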
Thanks
Best Regards
On Wed, Sep 17, 2014 at 12:26 PM, vasiliy wrote:
> it also appears in streaming hdfs fileStream
It also appears with the streaming HDFS fileStream.
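A sketch of the streaming variant, in case it helps reproduce it there as
well (the Avro classes and the directory path are placeholders, since the
streaming code is not shown in the thread):

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(conf, Seconds(10))
// fileStream goes through the same Hadoop input machinery, so records
// read from it are subject to the same object reuse and need to be
// copied before being collected.
val stream = ssc.fileStream[AvroKey[GenericRecord], NullWritable,
  AvroKeyInputFormat[GenericRecord]]("hdfs://./in")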
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/collect-on-hadoopFile-RDD-returns-wrong-results-tp14368p14425.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.