Hi,
  I'm trying to save a simple DataFrame to S3 in ORC format. The code is as
follows:


     val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
     import sqlContext.implicits._
     val df = sc.parallelize(1 to 1000).toDF()
     df.write.format("orc").save("s3://logs/dummy")


I ran the above code in spark-shell, but only the _SUCCESS file was written
under that directory.
The last part of the spark-shell log reads:

15/08/23 07:38:23 task-result-getter-1 INFO TaskSetManager: Finished task 95.0 in stage 2.0 (TID 295) in 801 ms on ip-*-*-*-*.ec2.internal (100/100)
15/08/23 07:38:23 dag-scheduler-event-loop INFO DAGScheduler: ResultStage 2 (save at <console>:29) finished in 0.834 s
15/08/23 07:38:23 task-result-getter-1 INFO YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/08/23 07:38:23 main INFO DAGScheduler: Job 2 finished: save at <console>:29, took 0.895912 s
15/08/23 07:38:24 main INFO LocalDirAllocator$AllocatorPerContext$DirSelector: Returning directory: /media/ephemeral0/s3/output-
15/08/23 07:38:24 main ERROR NativeS3FileSystem: md5Hash for dummy/_SUCCESS is [-44, 29, -128, -39, -113, 0, -78, 4, -23, -103, 9, -104, -20, -8, 66, 126]
15/08/23 07:38:24 main INFO DefaultWriterContainer: Job job_****_**** committed.


Has anyone experienced this before?
Thanks!
