Re: RDD order guarantees

2020-05-06 Thread Antonin Delpeuch (lists)
Thanks a lot for the reply Steve! If you don't see a way to fix this in Spark itself, then I will try to improve the docs. Antonin On 06/05/2020 17:19, Steve Loughran wrote: > > > On Tue, 7 Apr 2020 at 15:26, Antonin Delpeuch > wrote: > > Hi, > > So

Re: RDD order guarantees

2020-05-06 Thread Steve Loughran
On Tue, 7 Apr 2020 at 15:26, Antonin Delpeuch wrote: > Hi, > > Sorry to dig out this thread but this bug is still present. > > The fix proposed in this thread (creating a new FileSystem implementation > which sorts listed files) was rejected, with the suggestion that it is the > FileInputFormat's

Re: RDD order guarantees

2020-04-07 Thread Antonin Delpeuch
Hi, Sorry to dig out this thread but this bug is still present. The fix proposed in this thread (creating a new FileSystem implementation which sorts listed files) was rejected, with the suggestion that it is the FileInputFormat's responsibility to sort the file names if preserving partition orde

Re: RDD order guarantees

2015-01-19 Thread Ewan Higgs
Hi Reynold. I'll take a look. SPARK-5300 is open for this issue. -Ewan On 19/01/15 08:39, Reynold Xin wrote: Hi Ewan, Not sure if there is a JIRA ticket (there are too many that I lose track). I chatted briefly with Aaron on this. The way we can solve it is to create a new FileSystem impleme

Re: RDD order guarantees

2015-01-18 Thread Reynold Xin
Hi Ewan, Not sure if there is a JIRA ticket (there are too many that I lose track). I chatted briefly with Aaron on this. The way we can solve it is to create a new FileSystem implementation that overrides the listStatus method, and then in Hadoop Conf set the fs.file.impl to that. Shouldn't be

Re: RDD order guarantees

2015-01-16 Thread Ewan Higgs
Yes, I am running on a local file system. Is there a bug open for this? Mingyu Kim reported the problem last April: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-reads-partitions-in-a-wrong-order-td4818.html -Ewan On 01/16/2015 07:41 PM, Reynold Xin wrote: You are running on a local

Re: RDD order guarantees

2015-01-16 Thread Reynold Xin
You are running on a local file system, right? HDFS orders the files based on names, but local file systems often don't. I think that explains the difference. We might be able to do a sort and order the partitions when we create an RDD to make this universal though. On Fri, Jan 16, 2015 at 8:26 AM, Ewan
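
The sorting idea mentioned above can be illustrated outside Spark. This is only a sketch, not Spark's or Hadoop's code: the helper name list_part_files and the "part-" prefix filter are assumptions made for the example. It relies on the fact that os.listdir returns entries in arbitrary order on a local file system, so the names are sorted explicitly before use.

```python
import os
import tempfile


def list_part_files(directory):
    """Return part-file paths in name order (hypothetical helper).

    os.listdir makes no ordering promise, so the names are sorted
    explicitly instead of trusting the order the OS returns.
    """
    names = [n for n in os.listdir(directory) if n.startswith("part-")]
    return [os.path.join(directory, n) for n in sorted(names)]


# Usage: create three part files out of order, then list them sorted.
with tempfile.TemporaryDirectory() as d:
    for i in (2, 0, 1):
        with open(os.path.join(d, "part-%05d" % i), "w") as f:
            f.write("chunk %d\n" % i)
    ordered = [os.path.basename(p) for p in list_part_files(d)]
    print(ordered)
```

Lexicographic sorting matches numeric order here only because the part numbers are zero-padded to a fixed width.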

RDD order guarantees

2015-01-16 Thread Ewan Higgs
Hi all, Quick one: when reading files, is the order of partitions guaranteed to be preserved? I am finding some weird behaviour where I run sortByKey() on an RDD (which has 16-byte keys) and write it to disk. If I open a python shell and run the following: for part in range(29): print
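
A rough way to check for the symptom described in this message, independent of the truncated snippet above: given the keys of each partition in the order the partitions were read, report the first partition that breaks the global sort order. The function name first_out_of_order and the sample keys are made up for illustration; this is not the code from the original post.

```python
def first_out_of_order(partitions):
    """Return the index of the first partition containing a key smaller
    than the key read just before it, or None if all keys are in order.

    `partitions` is a list of key lists, in the order they were read.
    """
    previous = None
    for index, keys in enumerate(partitions):
        for key in keys:
            if previous is not None and key < previous:
                return index
            previous = key
    return None


# Illustrative data: the second list has its last two partitions swapped.
in_order = [[b"aa", b"ab"], [b"ba", b"bb"], [b"ca"]]
swapped = [[b"aa", b"ab"], [b"ca"], [b"ba", b"bb"]]

print(first_out_of_order(in_order))
print(first_out_of_order(swapped))
```

If every partition is sorted internally but the check still reports an index, the partitions themselves were most likely read back in a different order than they were written.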