Re: Accessing S3 files with s3n://

2015-08-11 Thread Steve Loughran
On 10 Aug 2015, at 20:17, Akshat Aranya <aara...@gmail.com> wrote: Hi Jerry, Akhil, Thanks for your help. With s3n, the entire file is downloaded even while just creating the RDD with sqlContext.read.parquet(). It seems like even just opening and closing the InputStream causes the en

Re: Accessing S3 files with s3n://

2015-08-10 Thread Akshat Aranya
Hi Jerry, Akhil, Thanks for your help. With s3n, the entire file is downloaded even while just creating the RDD with sqlContext.read.parquet(). It seems like even just opening and closing the InputStream causes the entire data to get fetched. As it turned out, I was able to use s3a and avoid th
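
The scheme switch described in this message can be sketched as a small path helper. This is only an illustration under assumptions: `to_s3a` is a hypothetical name, the bucket and key in the usage comment are made up, and the cluster is assumed to already have the s3a connector and credentials available.

```python
from urllib.parse import urlsplit, urlunsplit


def to_s3a(uri: str) -> str:
    """Rewrite an s3n:// URI to s3a://; return any other URI unchanged.

    Hypothetical helper for illustration, not part of Spark or Hadoop.
    """
    parts = urlsplit(uri)
    if parts.scheme == "s3n":
        # Only the scheme changes; bucket (netloc) and key (path) are kept.
        parts = parts._replace(scheme="s3a")
    return urlunsplit(parts)


# Usage (hypothetical bucket and key):
#   path = to_s3a("s3n://example-bucket/data/events.parquet")
#   df = sqlContext.read.parquet(path)
```

Only s3n is rewritten here; the older block-based s3:// scheme stores data differently and is left alone.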

Re: Accessing S3 files with s3n://

2015-08-09 Thread bo yang
Hi Akshat, I found an open source library that implements an S3 InputFormat for Hadoop, and I use Spark's newAPIHadoopRDD to load data via that S3 InputFormat. The open source library is https://github.com/ATLANTBH/emr-s3-io. It is a little old, so I looked inside it and made some changes. Then it works

Re: Accessing S3 files with s3n://

2015-08-09 Thread Jerry Lam
Hi Akshat, Is there a particular reason you don't use s3a? From my experience, s3a performs much better than the rest. I believe the inefficiency is from the implementation of the s3 interface. Best Regards, Jerry > On 9 Aug, 2015, at 5:48 am, Akhil Das wrote: > > Depend
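
For readers who have not set up s3a before, the settings involved might look roughly like the sketch below. It assumes the `hadoop-aws` module is on the classpath and that static keys are used; `s3a_spark_conf` is a hypothetical helper, and the property names should be checked against the Hadoop version in use.

```python
def s3a_spark_conf(access_key: str, secret_key: str) -> dict:
    """Build Spark properties that pass s3a settings through to Hadoop.

    Hypothetical helper. Spark forwards any `spark.hadoop.*` property to
    the Hadoop configuration with the prefix stripped.
    """
    return {
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }


# Usage: turn the properties into spark-submit arguments.
conf = s3a_spark_conf("EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET_KEY")
submit_args = [f"--conf {key}={value}" for key, value in conf.items()]
```

The key values above are placeholders, not real credentials.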

Re: Accessing S3 files with s3n://

2015-08-09 Thread Akhil Das
Depends on which operation you are doing. If you are doing a .count() on a Parquet file, it might not download the entire file, I think, but if you do a .count() on a normal text file it might pull the entire file. Thanks Best Regards On Sat, Aug 8, 2015 at 3:12 AM, Akshat Aranya wrote: > Hi, > > I'v