Hi, I did some digging. I believe the error is caused by the jets3t jar: the VerifyError below complains that jets3t's AWSCredentials is not assignable to ProviderCredentials, which suggests the Hadoop classes were compiled against a newer jets3t than the one actually being loaded. I am using the Cloudera CDH 5.2 distro and have symlinked the jets3t jar from hadoop/lib into spark/lib (which I believe is around version 0.9), but I am not sure how to fix this. Is it a jar version issue?
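One way to check is the minimal sketch below, assuming it runs in the same PySpark shell that hits the error and that sc is the live SparkContext; jar_of is just a hypothetical helper name, and the class names are the ones from the trace further down:

    from py4j.protocol import Py4JJavaError

    def jar_of(class_name):
        # Ask the driver JVM which jar this class was actually loaded from.
        try:
            cls = sc._jvm.java.lang.Class.forName(class_name)
        except Py4JJavaError:
            return "<not found on driver classpath>"
        src = cls.getProtectionDomain().getCodeSource()
        return src.getLocation().toString() if src else "<bootstrap classpath>"

    for name in ["org.jets3t.service.security.AWSCredentials",
                 "org.jets3t.service.security.ProviderCredentials",
                 "org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore"]:
        print("%s -> %s" % (name, jar_of(name)))

Note this only inspects the driver JVM; the VerifyError in the trace happened on an executor, whose classpath can differ. If AWSCredentials resolves to an older jets3t jar than expected (or ProviderCredentials is missing entirely), removing the stale jar or pinning the 0.9.x one via spark.driver.extraClassPath / spark.executor.extraClassPath should line the two up.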
Essentially, these lines from the verifier's frame dump show the offending class:

  locals: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore',
            'java/net/URI', 'org/apache/hadoop/conf/Configuration',
            'org/apache/hadoop/fs/s3/S3Credentials',
            'org/jets3t/service/security/AWSCredentials' }
  stack:  { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore',
            uninitialized 32, uninitialized 32,
            'org/jets3t/service/security/AWSCredentials' }

On Wed, Sep 16, 2015 at 4:59 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:

> Hi,
> I have a spark cluster setup and I am trying to write data to s3 in
> parquet format. Here is what I am doing:
>
>     df = sqlContext.load('test', 'com.databricks.spark.avro')
>     df.saveAsParquetFile("s3n://test")
>
> But I get a nasty error:
>
> Py4JJavaError: An error occurred while calling o29.saveAsParquetFile.
> : org.apache.spark.SparkException: Job aborted.
> at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:166)
> at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:139)
> at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
> at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
> at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
> at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:336)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
> at org.apache.spark.sql.DataFrame.saveAsParquetFile(DataFrame.scala:1508)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:207)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage
> failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task
> 3.3 in stage 0.0 (TID 12, srv-110-29.720.rdio):
> org.apache.spark.SparkException: Task failed while writing rows.
> at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:191)
> at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
> at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.VerifyError: Bad type on operand stack
> Exception Details:
>   Location:
>     org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.initialize(Ljava/net/URI;Lorg/apache/hadoop/conf/Configuration;)V @38: invokespecial
>   Reason:
>     Type 'org/jets3t/service/security/AWSCredentials' (current frame, stack[3]) is not assignable to 'org/jets3t/service/security/ProviderCredentials'
>   Current Frame:
>     bci: @38
>     flags: { }
>     locals: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', 'java/net/URI', 'org/apache/hadoop/conf/Configuration', 'org/apache/hadoop/fs/s3/S3Credentials', 'org/jets3t/service/security/AWSCredentials' }
>     stack: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', uninitialized 32, uninitialized 32, 'org/jets3t/service/security/AWSCredentials' }
>   Bytecode:
>     0000000: bb00 3159 b700 324e 2d2b 2cb6 0034 bb00
>     0000010: 3659 2db6 003a 2db6 003d b700 403a 042a
>     0000020: bb00 4259 1904 b700 45b5 0047 a700 0b3a
>     0000030: 042a 1904 b700 4f2a 2c12 5103 b600 55b5
>     0000040: 0057 2a2c 1259 1400 5ab6 005f 1400 1eb8
>     0000050: 0065 b500 672a 2c12 6914 001e b600 5f14
>     0000060: 001e b800 65b5 006b 2a2c 126d b600 71b5
>     0000070: 0073 2abb 0075 592b b600 78b7 007b b500
>     0000080: 7db1
>   Exception Handler Table:
>     bci [14, 44] => handler: 47
>   Stackmap Table:
>     full_frame(@47,{Object[#2],Object[#73],Object[#75],Object[#49]},{Object[#47]})
>     same_frame(@55)
>
> And in s3, I see something like test$folder?
> I am not sure how to fix this. Any ideas?
> Thanks