Re: Role-based S3 access outside of EMR

2016-08-14 Thread Steve Loughran
On 29 Jul 2016, at 00:07, Everett Anderson <ever...@nuna.com.invalid> wrote: Hey, Just wrapping this up -- I ended up following the instructions to build a custom Spark release with Hadoop 2.7.2, stealing from Steve's SPARK-7481

Re: Role-based S3 access outside of EMR

2016-07-28 Thread Everett Anderson
fs.s3a.S3AFileSystem
> fs.AbstractFileSystem.s3.impl=org.apache.hadoop.fs.s3a.S3A
> fs.AbstractFileSystem.s3a.impl=org.apache.hadoop.fs.s3a.S3A
>
> And make sure the s3a jars are in your classpath
>
> Thanks,
> Ewan
>
> *From:* Everett Anderson [mailto:ever...@nuna.com.INV
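
The two complete property lines quoted above could be passed to Spark at launch time instead of being edited into a Hadoop config file. The following is a minimal sketch, assuming the standard `spark.hadoop.` prefix that Spark uses to forward properties into the Hadoop configuration. The names `S3A_SETTINGS` and `to_spark_submit_args` are hypothetical, and the first entry of the quoted list is cut off in this archive, so it is deliberately left out.

```python
# Hadoop properties copied from the quoted message. The first entry in the
# archive snippet is truncated, so only the two complete ones appear here.
S3A_SETTINGS = {
    "fs.AbstractFileSystem.s3.impl": "org.apache.hadoop.fs.s3a.S3A",
    "fs.AbstractFileSystem.s3a.impl": "org.apache.hadoop.fs.s3a.S3A",
}


def to_spark_submit_args(hadoop_settings):
    """Turn Hadoop properties into `--conf spark.hadoop.<key>=<value>` pairs.

    This helper is hypothetical; it only builds the argument list and does
    not launch Spark.
    """
    args = []
    for key, value in hadoop_settings.items():
        args += ["--conf", f"spark.hadoop.{key}={value}"]
    return args
```

The resulting list can be spliced into a `spark-submit` command line. The s3a jars still have to be on the classpath, as the message says.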

Re: Role-based S3 access outside of EMR

2016-07-23 Thread Steve Loughran
asspath Thanks, Ewan From: Everett Anderson [mailto:ever...@nuna.com.INVALID] Sent: 21 July 2016 17:01 To: Gourav Sengupta <gourav.sengu...@gmail.com> Cc: Teng Qiu <teng...@gmail.com>; Andy Davidson <a...@santacruzintegration.com>; user <user@spark.

RE: Role-based S3 access outside of EMR

2016-07-21 Thread Ewan Leith
Sengupta Cc: Teng Qiu; Andy Davidson; user Subject: Re: Role-based S3 access outside of EMR Hey, FWIW, we are using EMR, actually, in production. The main case I have for wanting to access S3 with Spark outside of EMR is that during development, our developers tend to run EC2 sandbox instances

Re: Role-based S3 access outside of EMR

2016-07-21 Thread Everett Anderson
Hey, FWIW, we are using EMR, actually, in production. The main case I have for wanting to access S3 with Spark outside of EMR is that during development, our developers tend to run EC2 sandbox instances that have all the rest of our code and access to some of the input data on S3. It'd be nice if

Re: Role-based S3 access outside of EMR

2016-07-21 Thread Gourav Sengupta
Hi Teng, This is totally news to me, that people cannot use EMR in production because it's not open source. I think that even Werner is not aware of such a problem. Is EMRFS open source? I am curious to know what HA stands for. Regards, Gourav On Thu, Jul 21, 2016 at 8:37 AM, Ten

Re: Role-based S3 access outside of EMR

2016-07-21 Thread Teng Qiu
There are several reasons that AWS users do not (or cannot) use EMR. One point for us is security compliance: EMR is not open source at all, so we cannot use it in production systems. The second is that EMR does not support HA yet. But to the original question from @Everett : -> Credentials and H

Re: Role-based S3 access outside of EMR

2016-07-20 Thread Gourav Sengupta
But that would mean you would be accessing data over the internet, increasing data read latency and data transmission failures. Why are you not using EMR? Regards, Gourav On Thu, Jul 21, 2016 at 1:06 AM, Everett Anderson wrote: > Thanks, Andy. > > I am indeed often doing something similar, now -- copyi

Re: Role-based S3 access outside of EMR

2016-07-20 Thread Everett Anderson
Thanks, Andy. I am indeed often doing something similar, now -- copying data locally rather than dealing with the S3 impl selection and AWS credentials issues. It'd be nice if it worked a little easier out of the box, though! On Tue, Jul 19, 2016 at 2:47 PM, Andy Davidson < a...@santacruzintegra

Re: Role-based S3 access outside of EMR

2016-07-19 Thread Andy Davidson
Hi Everett, I always do my initial data exploration and all our product development in my local dev env. I typically select a small data set and copy it to my local machine. My main() has an optional command line argument '--runLocal'. Normally I load data from either hdfs:/// or s3n://. If the a
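
The optional flag described above might look roughly like the sketch below. The snippet is cut off before it says what happens when the flag is set, so the switch to a local copy is an assumption. Both paths (`file:///tmp/sample-data` and `s3n://example-bucket/data`) and the helper names are hypothetical placeholders, not taken from the thread.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --runLocal is optional and defaults to False."""
    parser = argparse.ArgumentParser(description="Data exploration job")
    parser.add_argument(
        "--runLocal",
        action="store_true",
        help="read the small sample copied to the local machine",
    )
    return parser.parse_args(argv)


def choose_input_path(
    run_local,
    local_path="file:///tmp/sample-data",    # hypothetical local sample copy
    remote_path="s3n://example-bucket/data",  # hypothetical remote location
):
    """Return the local sample path when run_local is set, else the remote one."""
    return local_path if run_local else remote_path


def main(argv=None):
    args = parse_args(argv)
    path = choose_input_path(args.runLocal)
    print(f"Loading data from {path}")
    return path
```

Calling `main(["--runLocal"])` selects the local sample, and `main([])` selects the remote path. Keeping the choice in one small function means the rest of the job never needs to know which storage backend is in use.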