Re: Spark 1.3.1 / Hadoop 2.6 package has broken S3 access

Steve Loughran Fri, 08 May 2015 01:55:39 -0700

> On 7 May 2015, at 18:02, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> 
> We should make sure to update our docs to mention s3a as well, since many 
> people won't look at Hadoop's docs for this.
> 
> Matei
>

1. To use s3a you'll also need an Amazon toolkit JAR (the AWS SDK) on the 
classpath.
2. I can add a hadoop-2.6 profile that sets things up for s3a, Azure and 
OpenStack Swift.
3. TREAT S3A ON HADOOP 2.6 AS BETA-RELEASE

For anyone thinking putting that in all-caps seems excessive, consult

https://issues.apache.org/jira/browse/HADOOP-11571

In particular, anything that queries for the block size of a file before 
dividing work up is dead in the water due to 
HADOOP-11584: s3a file block size set to 0 in getFileStatus. There are also 
thread pooling problems if too many writes are going on in the same JVM; 
this may hit output operations.
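
To make the block-size problem concrete, here is a minimal sketch of the kind 
of split-size arithmetic a consumer might do. It assumes the consumer clamps 
the reported block size between a minimum and a maximum, roughly in the style 
of Hadoop's FileInputFormat; the function names and default values are 
illustrative and are not copied from Hadoop or Spark.

```python
# Simplified sketch of FileInputFormat-style split sizing. This is an
# assumption about how a consumer uses the block size, not code taken from
# Hadoop or Spark; names and defaults are hypothetical.

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Clamp the reported block size between min_size and max_size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_length, block_size, min_size=1, max_size=2**63 - 1):
    """Number of splits needed to cover file_length bytes (ceiling division)."""
    split_size = compute_split_size(block_size, min_size, max_size)
    return -(-file_length // split_size)


one_gib = 1024 ** 3
healthy = count_splits(one_gib, 32 * 1024 ** 2)  # block size reported as 32 MiB
broken = count_splits(one_gib, 0)                # block size reported as 0
print(healthy, broken)
```

Under these assumptions, a block size of 0 collapses the split size to the 
minimum of one byte, so a 1 GiB file would be planned as one split per byte. 
A consumer that divides by the block size directly would fail outright 
instead.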

Hadoop 2.7 fixes all the phase I issues, leaving those in HADOOP-11694 to look 
at.

>> On May 7, 2015, at 12:57 PM, Nicholas Chammas <nicholas.cham...@gmail.com> 
>> wrote:
>> 
>> Ah, thanks for the pointers.
>> 
>> So as far as Spark is concerned, is this a breaking change? Is it possible
>> that people who have working code that accesses S3 will upgrade to use
>> Spark-against-Hadoop-2.6 and find their code is not working all of a sudden?
>> 
>> Nick
>> 
>> On Thu, May 7, 2015 at 12:48 PM Peter Rudenko <petro.rude...@gmail.com>
>> wrote:
>> 
>>> Yep it's a Hadoop issue:
>>> https://issues.apache.org/jira/browse/HADOOP-11863
>>> 

