
Re: Readin from Amazon S3 behaves inconsistently: return different number of lines...

2014-08-30 Thread Chris Fregly

Interesting and possibly related blog post from Netflix earlier this year: http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html

On Fri, Aug 1, 2014 at 8:09 AM, nit wrote:
> @sean - I am using latest code from master branch, up to commit#
> a7d145e98c55fa66a541293930f25d9cdc25f3b

Re: Readin from Amazon S3 behaves inconsistently: return different number of lines...

2014-08-01 Thread nit

@sean - I am using the latest code from the master branch, up to commit a7d145e98c55fa66a541293930f25d9cdc25f3b4. In my case I have multiple directories with 1024 files (the file sizes may differ). For some directories I always get a consistent result... and for others I can reproduce the inc
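The check described above (reading the same directory several times and comparing the line counts) could be scripted roughly as follows. This is only a sketch: `repeated_counts` and `is_consistent` are hypothetical helper names, not part of Spark, and the numbers in the demo are made-up stand-ins rather than measured results.

```python
from collections import Counter
from typing import Callable, Dict


def repeated_counts(count_lines: Callable[[], int], runs: int = 5) -> Dict[int, int]:
    """Call count_lines `runs` times and tally how often each result appears."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return dict(Counter(count_lines() for _ in range(runs)))


def is_consistent(tally: Dict[int, int]) -> bool:
    """A read is consistent when every run returned the same count."""
    return len(tally) == 1


if __name__ == "__main__":
    # Stand-in for a flaky source: hypothetical counts, not measured data.
    fake_results = iter([1000, 1000, 998, 1000, 1000])
    tally = repeated_counts(lambda: next(fake_results), runs=5)
    print(tally, is_consistent(tally))
```

With a live SparkContext `sc`, the callable could be something like `lambda: sc.textFile(path).count()`, where `path` is assumed to point at one of the S3 directories in question. A tally with more than one key would confirm that the same read returns different line counts across runs.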

Re: Readin from Amazon S3 behaves inconsistently: return different number of lines...

2014-08-01 Thread Sean Owen

See https://issues.apache.org/jira/browse/SPARK-2579

It was also mentioned on the mailing list a while ago, and I have heard of this from customers. I am trying to get to the bottom of it too. What version are you using, to start? I am wondering if it was fixed in 1.0.x since I was not able to

Readin from Amazon S3 behaves inconsistently: return different number of lines...

2014-07-31 Thread nit

get all the data with sc.textFile due to the issue I mentioned above)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Readin-from-Amazon-S3-behaves-inconsistently-return-different-number-of-lines-tp11092.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.