I could get a little further:

- installed spark-1.4.1-without-hadoop
- unpacked hadoop 2.7.1
- added the following in spark-env.sh:

HADOOP_HOME=/opt/hadoop-2.7.1/
SPARK_DIST_CLASSPATH=/opt/hadoop-2.7.1/share/hadoop/tools/lib/*:/opt/hadoop-2.7.1/etc/hadoop:/opt/hadoop-2.7.1/share/hadoop/common/lib/*:/opt/had$

and started spark-shell with:

bin/spark-shell --jars /opt/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-aws-2.7.1.jar

Now spark-shell starts with

"spark.SparkContext: Added JAR file:/opt/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-aws-2.7.1.jar at http://185.19.29.91:46368/jars/hadoop-aws-2.7.1.jar with timestamp 1437575186830"

but when trying to access S3 I get:

java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated

In fact it doesn't even matter whether I try to use s3n or s3a, the error is the same (strange!).
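One thing I still want to rule out (just a guess on my side): hadoop-aws-2.7.1.jar is not self-contained - the AWS SDK it depends on (aws-java-sdk-1.7.4.jar in the 2.7.1 tarball) sits next to it in share/hadoop/tools/lib, and a "Provider ... could not be instantiated" ServiceConfigurationError typically means the provider class was found but one of its dependencies was not. The minimal spark-shell sequence I am aiming for looks like this (endpoint, keys and bucket are placeholders):

// in spark-shell, after starting it with the jars above
Class.forName("com.amazonaws.services.s3.AmazonS3Client")  // throws ClassNotFoundException if the AWS SDK jar is missing from the classpath

sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.example.com")   // placeholder endpoint
sc.hadoopConfiguration.set("fs.s3a.access.key", "ACCESS_KEY")     // placeholder credentials
sc.hadoopConfiguration.set("fs.s3a.secret.key", "SECRET_KEY")
sc.textFile("s3a://some-bucket/some-object.txt").count()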
2015-07-22 12:19 GMT+02:00 Thomas Demoor <thomas.dem...@hgst.com>:
> You need to get the hadoop-aws.jar from hadoop-tools (use hadoop 2.7+) - you
> can get the source and build it with mvn, or take it from prebuilt hadoop
> distros. Then when you run your spark job add --jars path/to/thejar
>
> ________________________________________
> From: Schmirr Wurst <schmirrwu...@gmail.com>
> Sent: Wednesday, July 22, 2015 12:06 PM
> To: Thomas Demoor
> Subject: Re: use S3-Compatible Storage with spark
>
> Hi Thomas, thanks, could you just tell me what exactly I need to do?
> I'm not familiar with Java programming.
> - where do I get the jar from, do I need to compile it with mvn?
> - where should I update the classpath, and how?
>
> 2015-07-22 11:55 GMT+02:00 Thomas Demoor <thomas.dem...@hgst.com>:
>> The classes are not found. Is the jar on your classpath?
>>
>> Take care: there are multiple S3 connectors in hadoop: the legacy s3n, based
>> on a third-party S3 library (Jets3t), and the recent (functional since hadoop 2.7)
>> s3a, based on the Amazon SDK. Make sure you stick to one: so use fs.s3a.endpoint
>> and url s3a://bucket/object, or fs.s3n.endpoint and s3n://bucket/object.
>> I recommend s3a but I'm biased :P
>>
>> Regards,
>> Thomas
>>
>> ________________________________________
>> From: Schmirr Wurst <schmirrwu...@gmail.com>
>> Sent: Tuesday, July 21, 2015 11:59 AM
>> To: Akhil Das
>> Cc: user@spark.apache.org
>> Subject: Re: use S3-Compatible Storage with spark
>>
>> Which version do you have?
>>
>> - I tried with spark 1.4.1 for hdp 2.6, but there the aws module is missing somehow:
>> java.io.IOException: No FileSystem for scheme: s3n
>> and the same for s3a:
>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>
>> - On Spark 1.4.1 for hdp 2.4, the module is there and s3n works out of
>> the box (except for the endpoint),
>> but I get "java.io.IOException: No FileSystem for scheme: s3a"
>>
>> :-|
>>
>> 2015-07-21 11:09 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>> Did you try with s3a? It seems it's more of an issue with hadoop.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Tue, Jul 21, 2015 at 2:31 PM, Schmirr Wurst <schmirrwu...@gmail.com>
>>> wrote:
>>>>
>>>> It seems to work for the credentials, but the endpoint is ignored:
>>>> I've changed it to
>>>> sc.hadoopConfiguration.set("fs.s3n.endpoint","test.com")
>>>>
>>>> and I continue to get my data from Amazon - how can that be?
>>>> (I also use s3n in my text url.)
>>>> 2015-07-21 9:30 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>>> > You can add the jar to the classpath, and you can set the property like:
>>>> >
>>>> > sc.hadoopConfiguration.set("fs.s3a.endpoint","storage.sigmoid.com")
>>>> >
>>>> > Thanks
>>>> > Best Regards
>>>> >
>>>> > On Mon, Jul 20, 2015 at 9:41 PM, Schmirr Wurst <schmirrwu...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Thanks, that is what I was looking for...
>>>> >>
>>>> >> Any idea where I have to store and reference the corresponding
>>>> >> hadoop-aws-2.6.0.jar? Right now I get:
>>>> >>
>>>> >> java.io.IOException: No FileSystem for scheme: s3n
>>>> >>
>>>> >> 2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>>> >> > Not in the URI, but you can specify it in the hadoop configuration:
>>>> >> >
>>>> >> > <property>
>>>> >> >   <name>fs.s3a.endpoint</name>
>>>> >> >   <description>AWS S3 endpoint to connect to. An up-to-date list is
>>>> >> >     provided in the AWS Documentation: regions and endpoints. Without
>>>> >> >     this property, the standard region (s3.amazonaws.com) is assumed.
>>>> >> >   </description>
>>>> >> > </property>
>>>> >> >
>>>> >> > Thanks
>>>> >> > Best Regards
>>>> >> >
>>>> >> > On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst
>>>> >> > <schmirrwu...@gmail.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> I want to use Pithos - where can I specify that endpoint? Is it
>>>> >> >> possible in the url?
>>>> >> >>
>>>> >> >> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>>> >> >> > Could you name the storage service that you are using? Most of them
>>>> >> >> > provide an S3-like REST API endpoint for you to hit.
>>>> >> >> >
>>>> >> >> > Thanks
>>>> >> >> > Best Regards
>>>> >> >> >
>>>> >> >> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst
>>>> >> >> > <schmirrwu...@gmail.com>
>>>> >> >> > wrote:
>>>> >> >> >>
>>>> >> >> >> Hi,
>>>> >> >> >>
>>>> >> >> >> I wonder how to use S3-compatible storage in Spark.
>>>> >> >> >> If I'm using the s3n:// url scheme, it will point to Amazon - is
>>>> >> >> >> there a way I can specify the host somewhere?
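For completeness, here is the kind of end-to-end snippet the advice above points at. It is only a sketch (endpoint, credentials and bucket are placeholders, and nothing here is tested against Pithos), it assumes hadoop-aws and the AWS SDK jars are on the classpath as discussed, it sticks to s3a on both sides (fs.s3a.* properties plus an s3a:// url) as Thomas recommends, and it uses the spark.hadoop.* prefix, which Spark copies into the Hadoop configuration, so the same settings should also work via --conf on spark-submit:

import org.apache.spark.{SparkConf, SparkContext}

object S3aEndpointCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("s3a-endpoint-check")
      // spark.hadoop.* keys end up in sc.hadoopConfiguration
      .set("spark.hadoop.fs.s3a.endpoint", "storage.example.com")   // placeholder endpoint
      .set("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")          // placeholder credentials
      .set("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")

    val sc = new SparkContext(conf)
    // same scheme (s3a) as the properties above
    println(sc.textFile("s3a://some-bucket/some-object.txt").count())
    sc.stop()
  }
}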