How can I overwrite only a given partition or manually remove a partition before writing?
I don't know of a way to do that using a save mode (and I don't think there is one). But wouldn't manually deleting the directory of that particular partition before writing help? For the directory structure, check this out:
http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery

On Wed, Aug 19, 2015 at 8:18 PM, Romi Kuntsman <r...@totango.com> wrote:
> Hello,
>
> I have a DataFrame, with a date column which I want to use as a partition.
> Each day I want to write the data for the same date in Parquet, and then
> read a dataframe for a date range.
>
> I'm using:
>
> myDataframe.write().partitionBy("date").mode(SaveMode.Overwrite).parquet(parquetDir);
>
> If I use SaveMode.Append, then writing data for the same partition adds
> the same data there again.
> If I use SaveMode.Overwrite, then writing data for a single partition
> removes all the data for all partitions.
>
> How can I overwrite only a given partition or manually remove a partition
> before writing?
>
> Many thanks!
> Romi K.
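A minimal sketch of the manual-deletion idea, under these assumptions: parquetDir is on the local filesystem, the partitions use the key=value directory layout described under partition discovery (e.g. date=2015-08-19), and the date values render as yyyy-MM-dd in the directory names. PartitionCleaner, partitionDir and deletePartition are hypothetical helper names, not Spark API. On HDFS the same step would go through Hadoop's FileSystem.delete(path, true) rather than java.nio. After the delete, the day's data would be written with SaveMode.Append so the other partitions are left alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper: assumes parquetDir is a local path, not HDFS/S3.
public class PartitionCleaner {

    /** Builds the key=value partition directory, e.g. <parquetDir>/date=2015-08-19. */
    static Path partitionDir(String parquetDir, String column, String value) {
        return Paths.get(parquetDir, column + "=" + value);
    }

    /** Recursively deletes one partition directory; returns false if it did not exist. */
    static boolean deletePartition(String parquetDir, String column, String value) throws IOException {
        Path dir = partitionDir(parquetDir, column, value);
        if (!Files.exists(dir)) {
            return false;
        }
        // Reverse order visits children before their parent directories,
        // so every directory is already empty when it is deleted.
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("parquetDir");
        // Simulate two existing partitions, each holding one part file.
        for (String d : new String[] {"2015-08-18", "2015-08-19"}) {
            Path p = partitionDir(root.toString(), "date", d);
            Files.createDirectories(p);
            Files.createFile(p.resolve("part-00000.parquet"));
        }
        deletePartition(root.toString(), "date", "2015-08-19");
        // Next, in the Spark job, append only the new data for that day:
        // myDataframe.write().partitionBy("date").mode(SaveMode.Append).parquet(parquetDir);
        try (Stream<Path> left = Files.list(root)) {
            left.forEach(p -> System.out.println(p.getFileName()));
        }
    }
}
```

Note that the delete and the subsequent append are two separate steps, not one atomic operation, so a reader querying the date range in between may briefly see that day's partition missing.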