> On 7 May 2015, at 18:02, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
> We should make sure to update our docs to mention s3a as well, since many
> people won't look at Hadoop's docs for this.
>
> Matei
>
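If the docs do mention s3a, the block-size problem described in point 3 below (HADOOP-11584) may be worth spelling out. Here is a rough sketch of why a reported block size of 0 hurts. It is a simplified model of split sizing, not Hadoop's actual code: the class and method names are hypothetical, and the clamp-between-min-and-max formula is my assumption about how a typical input format divides work.

```java
public class SplitSketch {
    // Hypothetical helper: clamp the reported block size between a
    // minimum and a maximum split size (assumed formula, simplified).
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Hypothetical helper: number of splits needed to cover the file
    // (ceiling division).
    static long splitCount(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long fileLength = 1L << 30; // a 1 GiB file

        // Block size reported correctly as 64 MiB.
        long healthy = splitSize(1, Long.MAX_VALUE, 64L << 20);
        // Block size reported as 0, as in HADOOP-11584.
        long broken = splitSize(1, Long.MAX_VALUE, 0);

        System.out.println(splitCount(fileLength, healthy)); // 16
        System.out.println(splitCount(fileLength, broken));  // 1073741824
    }
}
```

Under these assumptions a zero block size collapses the split size to the 1-byte minimum, so the same file is carved into roughly a billion splits instead of 16. Code that divides by the block size directly would fail outright instead.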
1. To use s3a you'll also need an Amazon toolkit JAR on the classpath.
2. I can add a hadoop-2.6 profile that sets things up for s3a, Azure and OpenStack Swift.
3. TREAT S3A ON HADOOP 2.6 AS BETA-RELEASE

For anyone thinking that putting that in all-caps seems excessive, consult https://issues.apache.org/jira/browse/HADOOP-11571

In particular, anything that queries for the block size of a file before dividing work up is dead in the water due to HADOOP-11584: s3a file block size set to 0 in getFileStatus. There are also thread pooling problems if too many writes are going on in the same JVM; this may hit output operations.

Hadoop 2.7 fixes all the phase I issues, leaving those in HADOOP-11694 to look at.

>> On May 7, 2015, at 12:57 PM, Nicholas Chammas <nicholas.cham...@gmail.com>
>> wrote:
>>
>> Ah, thanks for the pointers.
>>
>> So as far as Spark is concerned, is this a breaking change? Is it possible
>> that people who have working code that accesses S3 will upgrade to use
>> Spark-against-Hadoop-2.6 and find their code is not working all of a sudden?
>>
>> Nick
>>
>> On Thu, May 7, 2015 at 12:48 PM Peter Rudenko <petro.rude...@gmail.com>
>> wrote:
>>
>>> Yep it's a Hadoop issue:
>>> https://issues.apache.org/jira/browse/HADOOP-11863
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org