I could get a little further:

- installed spark-1.4.1-without-hadoop
- unpacked hadoop 2.7.1
- added the following in spark-env.sh:

HADOOP_HOME=/opt/hadoop-2.7.1/
SPARK_DIST_CLASSPATH=/opt/hadoop-2.7.1/share/hadoop/tools/lib/*:/opt/hadoop-2.7.1/etc/hadoop:/opt/hadoop-2.7.1/share/hadoop/common/lib/*:/opt/had$

and started spark-shell with:

bin/spark-shell --jars /opt/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-aws-2.7.1.jar

Now spark-shell starts with

"spark.SparkContext: Added JAR file:/opt/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-aws-2.7.1.jar at http://185.19.29.91:46368/jars/hadoop-aws-2.7.1.jar with timestamp 1437575186830"

but when trying to access S3 I get:

java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated

In fact it doesn't even matter whether I try to use s3n or s3a, the error is the same (strange!).
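One thing I still want to rule out (just a guess on my side): hadoop-aws-2.7.1.jar is not self-contained - the AWS SDK it depends on (aws-java-sdk-1.7.4.jar in the 2.7.1 tarball) sits next to it in share/hadoop/tools/lib, and a "Provider ... could not be instantiated" ServiceConfigurationError typically means the provider class was found but one of its dependencies was not. The minimal spark-shell sequence I am aiming for looks like this (endpoint, keys and bucket are placeholders):

// in spark-shell, after starting it with the jars above
Class.forName("com.amazonaws.services.s3.AmazonS3Client")  // throws ClassNotFoundException if the AWS SDK jar is missing from the classpath

sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.example.com")   // placeholder endpoint
sc.hadoopConfiguration.set("fs.s3a.access.key", "ACCESS_KEY")     // placeholder credentials
sc.hadoopConfiguration.set("fs.s3a.secret.key", "SECRET_KEY")
sc.textFile("s3a://some-bucket/some-object.txt").count()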
2015-07-22 12:19 GMT+02:00 Thomas Demoor <thomas.dem...@hgst.com>:
> You need to get the hadoop-aws.jar from hadoop-tools (use hadoop 2.7+) - you
> can get the source and build it with mvn, or take it from prebuilt hadoop
> distros. Then when you run your spark job add --jars path/to/thejar
>
> ________________________________________
> From: Schmirr Wurst <schmirrwu...@gmail.com>
> Sent: Wednesday, July 22, 2015 12:06 PM
> To: Thomas Demoor
> Subject: Re: use S3-Compatible Storage with spark
>
> Hi Thomas, thanks, could you just tell me what exactly I need to do?
> I'm not familiar with Java programming.
> - where do I get the jar from, do I need to compile it with mvn?
> - where should I update the classpath, and how?
>
> 2015-07-22 11:55 GMT+02:00 Thomas Demoor <thomas.dem...@hgst.com>:
>> The classes are not found. Is the jar on your classpath?
>>
>> Take care: there are multiple S3 connectors in hadoop: the legacy s3n, based
>> on a third-party S3 library (Jets3t), and the recent (functional since hadoop 2.7)
>> s3a, based on the Amazon SDK. Make sure you stick to one: so use fs.s3a.endpoint
>> and url s3a://bucket/object, or fs.s3n.endpoint and s3n://bucket/object.
>> I recommend s3a but I'm biased :P
>>
>> Regards,
>> Thomas
>>
>> ________________________________________
>> From: Schmirr Wurst <schmirrwu...@gmail.com>
>> Sent: Tuesday, July 21, 2015 11:59 AM
>> To: Akhil Das
>> Cc: user@spark.apache.org
>> Subject: Re: use S3-Compatible Storage with spark
>>
>> Which version do you have?
>>
>> - I tried with spark 1.4.1 for hdp 2.6, but there the aws module is missing somehow:
>> java.io.IOException: No FileSystem for scheme: s3n
>> and the same for s3a:
>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>
>> - On Spark 1.4.1 for hdp 2.4, the module is there and s3n works out of
>> the box (except for the endpoint),
>> but I get "java.io.IOException: No FileSystem for scheme: s3a"
>>
>> :-|
>>
>> 2015-07-21 11:09 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>> Did you try with s3a? It seems it's more of an issue with hadoop.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Tue, Jul 21, 2015 at 2:31 PM, Schmirr Wurst <schmirrwu...@gmail.com>
>>> wrote:
>>>>
>>>> It seems to work for the credentials, but the endpoint is ignored:
>>>> I've changed it to
>>>> sc.hadoopConfiguration.set("fs.s3n.endpoint","test.com")
>>>>
>>>> and I continue to get my data from Amazon - how can that be?
>>>> (I also use s3n in my text url.)
>>>> 2015-07-21 9:30 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>>> > You can add the jar to the classpath, and you can set the property like:
>>>> >
>>>> > sc.hadoopConfiguration.set("fs.s3a.endpoint","storage.sigmoid.com")
>>>> >
>>>> > Thanks
>>>> > Best Regards
>>>> >
>>>> > On Mon, Jul 20, 2015 at 9:41 PM, Schmirr Wurst <schmirrwu...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Thanks, that is what I was looking for...
>>>> >>
>>>> >> Any idea where I have to store and reference the corresponding
>>>> >> hadoop-aws-2.6.0.jar? Right now I get:
>>>> >>
>>>> >> java.io.IOException: No FileSystem for scheme: s3n
>>>> >>
>>>> >> 2015-07-20 8:33 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>>> >> > Not in the URI, but you can specify it in the hadoop configuration:
>>>> >> >
>>>> >> > <property>
>>>> >> >   <name>fs.s3a.endpoint</name>
>>>> >> >   <description>AWS S3 endpoint to connect to. An up-to-date list is
>>>> >> >     provided in the AWS Documentation: regions and endpoints. Without
>>>> >> >     this property, the standard region (s3.amazonaws.com) is assumed.
>>>> >> >   </description>
>>>> >> > </property>
>>>> >> >
>>>> >> > Thanks
>>>> >> > Best Regards
>>>> >> >
>>>> >> > On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst
>>>> >> > <schmirrwu...@gmail.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> I want to use Pithos - where can I specify that endpoint? Is it
>>>> >> >> possible in the url?
>>>> >> >>
>>>> >> >> 2015-07-19 17:22 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:
>>>> >> >> > Could you name the storage service that you are using? Most of them
>>>> >> >> > provide an S3-like REST API endpoint for you to hit.
>>>> >> >> >
>>>> >> >> > Thanks
>>>> >> >> > Best Regards
>>>> >> >> >
>>>> >> >> > On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst
>>>> >> >> > <schmirrwu...@gmail.com>
>>>> >> >> > wrote:
>>>> >> >> >>
>>>> >> >> >> Hi,
>>>> >> >> >>
>>>> >> >> >> I wonder how to use S3-compatible storage in Spark.
>>>> >> >> >> If I'm using the s3n:// url scheme, it will point to Amazon - is
>>>> >> >> >> there a way I can specify the host somewhere?
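For completeness, here is the kind of end-to-end snippet the advice above points at. It is only a sketch (endpoint, credentials and bucket are placeholders, and nothing here is tested against Pithos), it assumes hadoop-aws and the AWS SDK jars are on the classpath as discussed, it sticks to s3a on both sides (fs.s3a.* properties plus an s3a:// url) as Thomas recommends, and it uses the spark.hadoop.* prefix, which Spark copies into the Hadoop configuration, so the same settings should also work via --conf on spark-submit:

import org.apache.spark.{SparkConf, SparkContext}

object S3aEndpointCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("s3a-endpoint-check")
      // spark.hadoop.* keys end up in sc.hadoopConfiguration
      .set("spark.hadoop.fs.s3a.endpoint", "storage.example.com")   // placeholder endpoint
      .set("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")          // placeholder credentials
      .set("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")

    val sc = new SparkContext(conf)
    // same scheme (s3a) as the properties above
    println(sc.textFile("s3a://some-bucket/some-object.txt").count())
    sc.stop()
  }
}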