We do not use EMR. This is deployed on Amazon VMs. We build Spark with Hadoop-2.6.0, but that build does not include the s3a filesystem or the Amazon AWS SDK.
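One option short of rebuilding might be to resolve the two artifacts at submit time. A rough sketch, untested; the coordinates are the Hadoop-2.6-era ones (hadoop-aws 2.6.0 is meant to pair with aws-java-sdk 1.7.4, but exact versions would need checking), and the --class and jar path are hypothetical placeholders:

    # Sketch: fetch the s3a filesystem and AWS SDK at submit time rather
    # than baking them into the Spark distribution. Coordinates, the entry
    # point, and the jar location are illustrative, not verified.
    spark-submit \
      --master spark://master:7077 \
      --deploy-mode cluster \
      --packages org.apache.hadoop:hadoop-aws:2.6.0,com.amazonaws:aws-java-sdk:1.7.4 \
      --class com.example.Main \
      s3a://my-bucket/jars/app.jar

The open question is whether --packages resolution happens early enough in standalone cluster mode for the worker that launches the driver to fetch an s3a:// application jar at all; that would need testing.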
On Thu, Oct 15, 2015 at 12:26 PM, Spark Newbie <sparknewbie1...@gmail.com> wrote:

> Are you using EMR?
> You can install Hadoop-2.6.0 along with Spark-1.5.1 in your EMR cluster.
> That brings the s3a jars to the worker nodes, where they become available
> to your application.
>
> On Thu, Oct 15, 2015 at 11:04 AM, Scott Reynolds <sreyno...@twilio.com> wrote:
>
>> List,
>>
>> Right now we build our Spark jobs with the s3a Hadoop client, because our
>> machines are only allowed IAM access to the s3 store. We can build our
>> jars with the s3a filesystem and the AWS SDK just fine, and these jars
>> run great in *client mode*.
>>
>> We would like to move from client mode to cluster mode, as that will make
>> us more resilient to driver failure. To do this, either:
>> 1. the jar file has to be on the worker's local disk, or
>> 2. the jar file has to be in shared storage (s3a).
>>
>> We would like to put the jar file in s3 storage, but when we give the jar
>> path as s3a://......, the worker node doesn't have the Hadoop s3a client
>> and the AWS SDK in its classpath / uber jar.
>>
>> Other than building Spark with those two dependencies, what other options
>> do I have? We are using 1.5.1, so SPARK_CLASSPATH is no longer a thing.
>>
>> We need s3a access for both the master (so that we can log the Spark
>> event log to s3) and the worker processes (driver, executor).
>>
>> Looking for ideas before just adding the dependencies to our Spark build
>> and calling it a day.
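A sketch of the other direction: install the two jars on every node's local disk (e.g. via configuration management) and point both the driver and the executors at them with the extraClassPath settings that replaced SPARK_CLASSPATH in 1.x. The same classpath gives the master/driver the s3a classes needed to write the event log to s3, and s3a's credential chain includes instance-profile credentials, which fits the IAM-only constraint above. Install path, jar versions, and bucket name are all illustrative:

    # Sketch: stage hadoop-aws and the AWS SDK at the same path on every
    # node, then reference them explicitly in spark-defaults.conf.
    # $SPARK_HOME is assumed to be set; paths and bucket are illustrative.
    JARS_DIR=/opt/spark/extra

    cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<EOF
    spark.driver.extraClassPath    $JARS_DIR/hadoop-aws-2.6.0.jar:$JARS_DIR/aws-java-sdk-1.7.4.jar
    spark.executor.extraClassPath  $JARS_DIR/hadoop-aws-2.6.0.jar:$JARS_DIR/aws-java-sdk-1.7.4.jar
    spark.eventLog.enabled         true
    spark.eventLog.dir             s3a://my-bucket/spark-events
    EOF

The extraClassPath settings prepend entries to the JVM classpath of each process, which is why the jars must exist at the same path on every node; in exchange, nothing has to be rebuilt into the Spark distribution or the application uber jar.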