Hi Amit,

Have you looked at Amazon EMR? Most people using EMR use S3 for persistence (both as the input and the output of Spark jobs).
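For checkpointing specifically, here is a minimal sketch of what it could look like in PySpark, assuming the hadoop-aws and AWS SDK jars are already on the classpath; the bucket name, credentials, and batch interval below are placeholders, not a tested recipe:

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(conf=SparkConf().setAppName("s3-checkpoint-sketch"))

    # One way to hand credentials to the s3a filesystem; on EMR with
    # IAM instance roles this step is typically unnecessary.
    hadoop_conf = sc._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")  # placeholder
    hadoop_conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")  # placeholder

    ssc = StreamingContext(sc, 10)  # 10-second batches
    ssc.checkpoint("s3a://your-bucket/spark/checkpoints")  # checkpoint dir on S3

    # ... define your DStreams here, then:
    ssc.start()
    ssc.awaitTermination()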
Best Regards,
Jerry

Sent from my iPhone

> On 21 Sep, 2015, at 9:24 pm, Amit Ramesh <a...@yelp.com> wrote:
>
> A lot of places in the documentation mention using s3 for checkpointing;
> however, I haven't found any examples or concrete evidence of anyone
> having done this.
>
> 1. Is this a safe/reliable option given the read-after-write consistency
>    for PUTs in s3?
>
> 2. Is s3 access broken for hadoop 2.6 (SPARK-7442)? If so, is it viable
>    in 2.4?
>
> 3. Related to #2: I did try providing hadoop-aws-2.6.0.jar while
>    submitting the job and got the following stack trace. Is there a fix?
>
> py4j.protocol.Py4JJavaError: An error occurred while calling
> None.org.apache.spark.api.java.JavaSparkContext.
> : java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem:
> Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
>         at java.util.ServiceLoader.fail(ServiceLoader.java:224)
>         at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
>         at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
>         at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
>         at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2563)
>         at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>         at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1354)
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1332)
>         at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:475)
>         at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:475)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:475)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>         at py4j.Gateway.invoke(Gateway.java:214)
>         at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
>         at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NoClassDefFoundError: com/amazonaws/AmazonServiceException
>         at java.lang.Class.getDeclaredConstructors0(Native Method)
>         at java.lang.Class.privateGetDeclaredConstructors(Class.java:2585)
>         at java.lang.Class.getConstructor0(Class.java:2885)
>         at java.lang.Class.newInstance(Class.java:350)
>         at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
>         ... 27 more
> Caused by: java.lang.ClassNotFoundException: com.amazonaws.AmazonServiceException
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>         ... 32 more
>
> Thanks!
> Amit
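Regarding the stack trace above: the root cause is the ClassNotFoundException for com.amazonaws.AmazonServiceException, which usually means hadoop-aws was put on the classpath without the AWS SDK jar it depends on. Two possible fixes at submit time, sketched below; the jar paths and the 1.7.4 SDK version are illustrative and should be checked against your build:

    # supply both jars explicitly
    spark-submit \
      --jars /path/to/hadoop-aws-2.6.0.jar,/path/to/aws-java-sdk-1.7.4.jar \
      your_job.py

    # or let Maven resolve hadoop-aws and its transitive dependencies
    spark-submit --packages org.apache.hadoop:hadoop-aws:2.6.0 your_job.py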