Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist:

2014-10-09, Rahul Kumar Singh
…n Owen To: Date: 08.10.2014 18:05 Subject: Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist: CC: "user@spark.apache.org" Take this as a bit of a guess, since I don't use S3 much an…

Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist:

2014-10-09, jan.zikes
…m the standard EC2 installation? ---- From: Sean Owen To: Date: 08.10.2014 18:05 Subject: Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist: CC: "user@spark.apache.org" Take this as a bit of a…

Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist:

2014-10-08, Sean Owen
Take this as a bit of a guess, since I don't use S3 much and am only a bit aware of the Hadoop+S3 integration issues. But I know that S3's lack of proper directories causes a few issues when used with Hadoop, which wants to list directories. According to http://hadoop.apache.org/docs/r2.3.0/api/o
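A toy model may help with the directory point. S3 has no real directories: objects live under flat keys, and a "directory" is only a shared key prefix that a client has to infer by listing keys with "/" as the delimiter. The pure-Python sketch below illustrates that idea under this assumption. It is not Hadoop's s3n code, and the helper names list_prefix and classify and the sample keys are hypothetical. It shows that the same path can name an object, a pseudo-directory, or nothing at all. That ambiguity is one plausible way a client that expects real directories ends up reporting "File does not exist".

```python
from typing import Iterable, List, Tuple

# Hypothetical helpers modelling a flat key space; NOT the Hadoop s3n implementation.

def list_prefix(keys: Iterable[str], prefix: str) -> Tuple[List[str], List[str]]:
    """Delimiter-style listing: (objects directly under prefix, inferred sub-prefixes)."""
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    objects: List[str] = []
    prefixes: List[str] = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if "/" in rest:
            # Keys that continue past the next "/" imply a pseudo-subdirectory.
            sub = prefix + rest.split("/", 1)[0] + "/"
            if sub not in prefixes:
                prefixes.append(sub)
        elif rest:
            objects.append(key)
    return objects, prefixes


def classify(keys: Iterable[str], path: str) -> str:
    """Report whether path is a stored object, an inferred directory, or missing."""
    keys = list(keys)
    if path in keys:
        return "object"
    objects, prefixes = list_prefix(keys, path)
    return "pseudo-directory" if (objects or prefixes) else "missing"


# Hypothetical bucket contents: there is no key for "wikiinput/" itself.
bucket = ["wikiinput/part-0001.xml", "wikiinput/part-0002.xml", "other/readme.txt"]
print(classify(bucket, "wikiinput"))       # pseudo-directory
print(classify(["wikiinput"], "wikiinput"))  # object
print(classify(bucket, "wikiinput.xml"))   # missing
```

Only listing reveals that "wikiinput" is a directory here. No single object answers to that name, which is the gap Hadoop's directory-oriented code has to paper over.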

Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist:

2014-10-08, jan.zikes
One more update: I've realized that this problem is not specific to Python. I've also tried it in Scala and I'm still getting the same error. My Scala code: val file = sc.wholeTextFiles("s3n://wiki-dump/wikiinput").first() ---- My addit…
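Here is what the call is expected to return. Per the Spark API documentation, wholeTextFiles(path) reads the files at path and yields one (file path, file content) record per file, so .first() hands back the first such pair. By contrast, textFile yields one record per line. The sketch below is a local, pure-Python analogue for illustration only. The function name whole_text_files is hypothetical and this is not Spark's implementation. As simplifying assumptions, it handles a single file or one directory level and sorts file names, whereas Spark makes no such ordering promise.

```python
import os
import tempfile
from typing import List, Tuple


def whole_text_files(path: str) -> List[Tuple[str, str]]:
    """Hypothetical local analogue of wholeTextFiles: one (path, content) pair per file."""
    if os.path.isfile(path):
        files = [path]
    elif os.path.isdir(path):
        # Non-recursive and sorted, purely to keep the sketch deterministic.
        files = sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))
        )
    else:
        # Same symptom as in this thread: nothing resolvable at that path.
        raise FileNotFoundError(f"File does not exist: {path}")
    records = []
    for f in files:
        with open(f, encoding="utf-8") as fh:
            records.append((f, fh.read()))  # the whole file becomes ONE record
    return records


# Usage: two small files in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    for name, text in [("a.txt", "line1\nline2\n"), ("b.txt", "x\n")]:
        with open(os.path.join(d, name), "w", encoding="utf-8") as fh:
            fh.write(text)
    records = whole_text_files(d)

first_path, first_content = records[0]  # analogous to .first()
```

Each value holds a file's entire text, newlines included, rather than being split into lines.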

Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist:

2014-10-08, jan.zikes
My additional question is whether this problem could be caused by the fact that my file is bigger than the total RAM across the whole cluster. ---- Hi, I'm trying to use sc.wholeTextFiles() on a file that is stored on Amazon S3, and I'm getting fol…
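On the memory question, wholeTextFiles turns each file into a single record. The relevant limit is therefore less the cluster's total RAM than whether one file's full content fits in the memory of the single task that reads it. textFile, by contrast, splits input into per-line records. Running out of JVM memory normally surfaces as an OutOfMemoryError rather than a FileNotFoundException, so memory alone seems an unlikely cause of this particular error. The pure-Python sketch below only compares the largest single record each layout would produce for the same input. The helper name largest_record_sizes and the toy data are hypothetical.

```python
from typing import Dict


def largest_record_sizes(files: Dict[str, str]) -> Dict[str, int]:
    """Largest single record (in characters) under two record layouts.

    "whole_file": one record per file, as wholeTextFiles produces.
    "per_line":   one record per line (newline stripped), as textFile produces.
    """
    whole = max((len(content) for content in files.values()), default=0)
    per_line = max(
        (len(line) for content in files.values() for line in content.splitlines()),
        default=0,
    )
    return {"whole_file": whole, "per_line": per_line}


# Toy input (hypothetical): one file made of 1,000 short lines.
toy = {"wikiinput": "".join(f"<page id={i}/>\n" for i in range(1000))}
sizes = largest_record_sizes(toy)
print(sizes)  # {'whole_file': 14890, 'per_line': 14}
```

The per-line layout keeps every record tiny no matter how large the file grows. Under the whole-file layout, the single record grows with the file.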

SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist:

2014-10-08, jan.zikes
Hi, I'm trying to use sc.wholeTextFiles() on a file that is stored on Amazon S3, and I'm getting the following error:
14/10/08 06:09:50 INFO input.FileInputFormat: Total input paths to process : 1
14/10/08 06:09:50 INFO input.FileInputFormat: Total input paths to process : 1
Traceback (most recent call last):