Try using s3n:// instead of s3:// (for the credential configuration as well).
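Something along these lines, for instance (an untested sketch of the same app switched to the s3n:// scheme; fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey are the standard Hadoop property names for the native S3 filesystem, and the bucket/path are the placeholders from the original post):

---------------------

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TestApp {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("TestApp");
        sparkConf.setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // The s3n filesystem (NativeS3FileSystem) reads its credentials
        // from fs.s3n.*, not fs.s3.*; the values here are placeholders.
        sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "XXXXXX");
        sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "XXXXXX");

        String path = "s3n://bucket/test/testdata";
        JavaRDD<String> textFile = sc.textFile(path);
        System.out.println(textFile.count());
    }
}

---------------------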
On Tue, Oct 7, 2014 at 9:51 AM, Sunny Khatri <sunny.k...@gmail.com> wrote:

> Not sure if it's supposed to work. Can you try newAPIHadoopFile(),
> passing in the required configuration object?
>
> On Tue, Oct 7, 2014 at 4:20 AM, Tomer Benyamini <tomer....@gmail.com>
> wrote:
>
>> Hello,
>>
>> I'm trying to read from s3 using a simple spark java app:
>>
>> ---------------------
>>
>> SparkConf sparkConf = new SparkConf().setAppName("TestApp");
>> sparkConf.setMaster("local");
>> JavaSparkContext sc = new JavaSparkContext(sparkConf);
>> sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XXXXXX");
>> sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "XXXXXX");
>>
>> String path = "s3://bucket/test/testdata";
>> JavaRDD<String> textFile = sc.textFile(path);
>> System.out.println(textFile.count());
>>
>> ---------------------
>>
>> But I'm getting this error:
>>
>> org.apache.hadoop.mapred.InvalidInputException: Input path does not
>> exist: s3://bucket/test/testdata
>>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
>>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
>>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>>     at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
>>     at org.apache.spark.rdd.RDD.count(RDD.scala:861)
>>     at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:365)
>>     at org.apache.spark.api.java.JavaRDD.count(JavaRDD.scala:29)
>>     ....
>>
>> Looking at the debug log, I see that
>> org.jets3t.service.impl.rest.httpclient.RestS3Service returned a 404
>> error trying to locate the file.
>>
>> A simple Java program using com.amazonaws.services.s3.AmazonS3Client
>> works just fine.
>>
>> Any idea?
>>
>> Thanks,
>> Tomer
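For reference, a minimal, untested sketch of Sunny's newAPIHadoopFile() suggestion, combined with the s3n:// scheme from the top reply. The credentials go into a Hadoop Configuration object passed directly to newAPIHadoopFile(); TextInputFormat here is the new-API class from org.apache.hadoop.mapreduce.lib.input, which yields (byte offset, line) pairs. Bucket and path remain the placeholders from the original post:

---------------------

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TestApp {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("TestApp");
        sparkConf.setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Credentials passed explicitly in a Configuration object
        // rather than set on sc.hadoopConfiguration(); placeholders.
        Configuration conf = new Configuration();
        conf.set("fs.s3n.awsAccessKeyId", "XXXXXX");
        conf.set("fs.s3n.awsSecretAccessKey", "XXXXXX");

        JavaPairRDD<LongWritable, Text> lines = sc.newAPIHadoopFile(
                "s3n://bucket/test/testdata",
                TextInputFormat.class,  // new-API (mapreduce) input format
                LongWritable.class,     // key: byte offset of each line
                Text.class,             // value: the line contents
                conf);
        System.out.println(lines.count());
    }
}

---------------------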