Re: Spark reads partitions in a wrong order

2014-04-25 Thread Andrew Ash
Have you run the same test with a URL in HDFS rather than the local filesystem? I think order may be preserved in that case, which would make the loss of order on the local filesystem look more like a bug. Sent from my mobile phone On Apr 25, 2014 9:11 AM, "Mingyu Kim" wrote: > If the underlying file syst

Spark reads partitions in a wrong order

2014-04-25 Thread Mingyu Kim
If the underlying file system returns files in a non-alphabetical order to java.io.File.listFiles(), Spark reads the partitions out of order. Here's an example. var sc = new SparkContext("local[3]", "test"); var rdd1 = sc.parallelize([1,2,3,4,5]); rdd1.saveAsTextFile("file://path/to/file"); var rd
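
To illustrate the ordering issue described above: java.io.File.listFiles() makes no guarantee about the order of the entries it returns, so a reader that relies on that order can see part files out of sequence. The following is only a minimal sketch of one possible workaround, not Spark's actual code. It assumes the output directory contains zero-padded part-NNNNN files, so that sorting by name matches partition order; the class and method names (PartFileOrder, sortedPartFileNames) are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Spark: lists part files in a stable order.
class PartFileOrder {

    // Returns the names of the "part-" files in dir, sorted by file name.
    // Sorting explicitly removes the dependence on whatever order the
    // underlying file system hands back to listFiles().
    static List<String> sortedPartFileNames(File dir) {
        File[] files = dir.listFiles((d, name) -> name.startsWith("part-"));
        if (files == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        Arrays.sort(files, Comparator.comparing(File::getName));
        List<String> names = new ArrayList<>();
        for (File f : files) {
            names.add(f.getName());
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Create part files in a deliberately shuffled order.
        File dir = Files.createTempDirectory("parts").toFile();
        for (String name : new String[] {"part-00002", "part-00000", "part-00001"}) {
            new File(dir, name).createNewFile();
        }
        System.out.println(sortedPartFileNames(dir));
    }
}
```

Sorting by name is sufficient here only because the part numbers are assumed to be zero-padded to a fixed width; names without padding would need a numeric comparison instead.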