Hi Ryan and Steve,

Thanks very much for your replies.
I was finally able to get Ryan's repo to work by changing the output committer in Spark from the ParquetOutputCommitter to the FileOutputCommitter, as Steve suggested. However, it does not work in append mode when saving the DataFrame:

    import org.apache.spark.storage.StorageLevel

    val hf = spark.read.parquet("/home/user/softwares/spark-2.1.0-bin-hadoop2.7/examples/src/main/resources/users.parquet")
    hf.persist(StorageLevel.DISK_ONLY)
    hf.show()
    hf.write
      .partitionBy("name")
      .mode("append")
      .save(S3Location + "data" + ".parquet")

The above code successfully saves the parquet files the first time I run it, but when I rerun it, the new parquet files are not added to S3. I put a print statement in the constructors of the PartitionedOutputCommitter in Ryan's repo and realized that the partitioned output committer is not called at all on the second run; it is only called the first time. Is there anything I can do to make Spark call the PartitionedOutputCommitter even when the files already exist in S3?
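In case it helps to narrow this down, here is the kind of tracing wrapper I have in mind (an untested sketch; LoggingParquetCommitter is just a name I made up, and I am assuming Spark 2.1's spark.sql.parquet.output.committer.class config key, which expects a ParquetOutputCommitter subclass):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.mapreduce.{JobContext, TaskAttemptContext}
    import org.apache.parquet.hadoop.ParquetOutputCommitter

    // Hypothetical tracing committer: logs when it is constructed and when
    // commitJob runs, then defers to ParquetOutputCommitter, so it changes
    // no behavior.
    class LoggingParquetCommitter(outputPath: Path, context: TaskAttemptContext)
        extends ParquetOutputCommitter(outputPath, context) {

      println(s"committer constructed for $outputPath")

      override def commitJob(jobContext: JobContext): Unit = {
        println(s"commitJob called for $outputPath")
        super.commitJob(jobContext)
      }
    }

    // Wire it in before the write (assumed config key):
    spark.conf.set(
      "spark.sql.parquet.output.committer.class",
      classOf[LoggingParquetCommitter].getName)

If the first run prints both lines and the second run prints neither, that would confirm Spark is skipping the committer entirely on the append path rather than failing somewhere inside it.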