Re: Spark reading from S3 getting very slow

2015-11-05 Thread Steve Loughran
On 5 Nov 2015, at 02:03, Younes Naguib <younes.nag...@tritondigital.com> wrote: Hi all, I'm reading large text files from S3, with sizes ranging from 30GB to 40GB. Every stage runs in 8-9s, except the last 32, which jump to 1-2 min for some reason! Here is my sample code: val myDF = sc.textFi

Spark reading from S3 getting very slow

2015-11-04 Thread Younes Naguib
Hi all, I'm reading large text files from S3, with sizes ranging from 30GB to 40GB. Every stage runs in 8-9s, except the last 32, which jump to 1-2 min for some reason! Here is my sample code: val myDF = sc.textFile(input_file).map{ x => val p = x.split("\t", -1) new (
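
Younes's snippet splits each line with x.split("\t", -1). A minimal sketch of why the -1 matters, assuming plain tab-separated lines; the object and method names (TsvSplit, fields, fieldsDefault) are hypothetical and not from the thread. For a String argument Scala calls Java's String.split, which drops trailing empty fields unless the limit is negative:

```scala
// Hypothetical helper names; only the split("\t", -1) call comes from the thread.
object TsvSplit {
  // Negative limit: trailing empty columns are kept.
  def fields(line: String): Array[String] = line.split("\t", -1)

  // Default limit (0): trailing empty columns are silently dropped.
  def fieldsDefault(line: String): Array[String] = line.split("\t")

  def main(args: Array[String]): Unit = {
    val line = "42\talice\t\t" // four columns, the last two empty
    println(fields(line).length)        // 4
    println(fieldsDefault(line).length) // 2
  }
}
```

Keeping the empty columns means every row yields the same number of fields, so positional access such as p(3) does not throw an ArrayIndexOutOfBoundsException on rows whose last columns happen to be empty.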

Re: spark, reading from s3

2015-02-12 Thread Kane Kim
Looks like my clock is in sync:
-bash-4.1$ date && curl -v s3.amazonaws.com
Thu Feb 12 21:40:18 UTC 2015
* About to connect() to s3.amazonaws.com port 80 (#0)
* Trying 54.231.12.24... connected
* Connected to s3.amazonaws.com (54.231.12.24) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.7

Re: spark, reading from s3

2015-02-12 Thread Franc Carter
Check that your timezone is correct as well; an incorrect timezone can make it look like your time is correct when it is actually skewed. cheers On Fri, Feb 13, 2015 at 5:51 AM, Kane Kim wrote: > The thing is that my time is perfectly valid... > > On Tue, Feb 10, 2015 at 10:50 PM, Akhil Das > wrote: >

Re: spark, reading from s3

2015-02-12 Thread Kane Kim
The thing is that my time is perfectly valid... On Tue, Feb 10, 2015 at 10:50 PM, Akhil Das wrote: > It's the timezone, actually. You can either use NTP to maintain > an accurate system clock or adjust your system time to match the > AWS one. You can do it as: > > telnet s3.amazo

Re: spark, reading from s3

2015-02-10 Thread Akhil Das
It's the timezone, actually. You can either use NTP to maintain an accurate system clock or adjust your system time to match the AWS one. You can check it with: telnet s3.amazonaws.com 80, then send GET / HTTP/1.0 Thanks Best Regards On Wed, Feb 11, 2015 at 6:43 AM, Kan
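
A minimal sketch of the clock check Akhil describes, done in code rather than by eye, assuming you have already captured the Date header from an S3 response (for example from the response headers printed by curl -v). ClockSkew and its methods are hypothetical names, and the 15-minute tolerance is an assumption about S3's RequestTimeTooSkewed limit. Because the comparison is between Instants, it does not depend on the local timezone setting, which is the trap Franc points out:

```scala
import java.time.{Duration, Instant, ZonedDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical helper; not part of Spark or any AWS SDK.
object ClockSkew {
  // Parses an HTTP Date header, e.g. "Thu, 12 Feb 2015 21:40:18 GMT".
  def serverInstant(dateHeader: String): Instant =
    ZonedDateTime.parse(dateHeader, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant()

  // local minus server: positive means the local clock is ahead of S3.
  def skew(dateHeader: String, local: Instant): Duration =
    Duration.between(serverInstant(dateHeader), local)

  // Assumed 15-minute tolerance; adjust if your endpoint behaves differently.
  def tooSkewed(d: Duration, limit: Duration = Duration.ofMinutes(15)): Boolean =
    d.abs().compareTo(limit) > 0

  def main(args: Array[String]): Unit = {
    // Pass the Date header value as the first argument.
    val header = args.headOption.getOrElse("Thu, 12 Feb 2015 21:40:18 GMT")
    val d = skew(header, Instant.now())
    println(s"Local clock offset vs S3: ${d.getSeconds()}s, too skewed: ${tooSkewed(d)}")
  }
}
```

Run it with a freshly captured header value as the argument; a large offset means the machine clock (or its timezone setting) needs fixing, for example via NTP as suggested above.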

spark, reading from s3

2015-02-10 Thread Kane Kim
I'm getting this warning when using s3 input:
15/02/11 00:58:37 WARN RestStorageService: Adjusted time offset in response to RequestTimeTooSkewed error. Local machine and S3 server disagree on the time by approximately 0 seconds. Retrying connection.
After that there are tons of 403/forbidden erro