Hi, I'm trying to save a simple DataFrame to S3 in ORC format. The code is as follows:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._
val df = sc.parallelize(1 to 1000).toDF()
df.write.format("orc").save("s3://logs/dummy")

I ran the above code in spark-shell, and only the _SUCCESS file was saved under the target directory. The last part of the spark-shell log says:

15/08/23 07:38:23 task-result-getter-1 INFO TaskSetManager: Finished task 95.0 in stage 2.0 (TID 295) in 801 ms on ip-*-*-*-*.ec2.internal (100/100)
15/08/23 07:38:23 dag-scheduler-event-loop INFO DAGScheduler: ResultStage 2 (save at <console>:29) finished in 0.834 s
15/08/23 07:38:23 task-result-getter-1 INFO YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/08/23 07:38:23 main INFO DAGScheduler: Job 2 finished: save at <console>:29, took 0.895912 s
15/08/23 07:38:24 main INFO LocalDirAllocator$AllocatorPerContext$DirSelector: Returning directory: /media/ephemeral0/s3/output-
15/08/23 07:38:24 main ERROR NativeS3FileSystem: md5Hash for dummy/_SUCCESS is [-44, 29, -128, -39, -113, 0, -78, 4, -23, -103, 9, -104, -20, -8, 66, 126]
15/08/23 07:38:24 main INFO DefaultWriterContainer: Job job_****_**** committed.

Has anyone experienced this before? Thanks!
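In case it helps with reproducing, here is the same session as a self-contained sketch, with a sanity-check write against HDFS added before the S3 write. The hdfs:/// path and the bucket name are just placeholders, not the real ones:

// Running inside spark-shell on Spark 1.x, so `sc` already exists.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._

val df = sc.parallelize(1 to 1000).toDF()

// Sanity check: write the same data to HDFS first. I'd expect
// part-r-*.orc files plus _SUCCESS under this path. (Placeholder path.)
df.write.format("orc").save("hdfs:///tmp/orc-write-test")

// The failing case: against S3 only _SUCCESS shows up. (Placeholder bucket.)
df.write.format("orc").save("s3://logs/dummy")

// Reading back the HDFS copy would confirm the ORC output itself is valid.
sqlContext.read.format("orc").load("hdfs:///tmp/orc-write-test").count()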