How can I overwrite only a given partition or manually remove a partition
before writing?

I don't think there is a way to do that with a save mode, though I'm not
certain. But wouldn't manually deleting the directory of that particular
partition before writing help? For the directory structure, see the
partition discovery section of the docs:

http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
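
A rough sketch of the manual-delete idea, assuming the table lives on a local
filesystem and uses the column=value directory layout described in the link
above (for example parquetDir/date=2015-08-19). PartitionCleaner and
deletePartition are hypothetical names made up for illustration, not Spark
APIs. On HDFS the equivalent would presumably go through Hadoop's
FileSystem.delete(path, true) rather than java.nio.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper (not part of Spark): removes a single partition
// directory such as <tableDir>/date=2015-08-19 from a local filesystem.
class PartitionCleaner {

    // Returns true if the partition directory existed and was deleted,
    // false if there was nothing to delete.
    static boolean deletePartition(Path tableDir, String column, String value)
            throws IOException {
        Path partitionDir = tableDir.resolve(column + "=" + value);
        if (!Files.isDirectory(partitionDir)) {
            return false;
        }
        // Visit the deepest paths first so files go before their parent directory.
        try (Stream<Path> paths = Files.walk(partitionDir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway table layout with two date partitions.
        Path tableDir = Files.createTempDirectory("parquetDir");
        Path today = Files.createDirectories(tableDir.resolve("date=2015-08-19"));
        Path yesterday = Files.createDirectories(tableDir.resolve("date=2015-08-18"));
        Files.createFile(today.resolve("part-00000.parquet"));
        Files.createFile(yesterday.resolve("part-00000.parquet"));

        // Drop only today's partition; yesterday's directory is left alone.
        boolean removed = deletePartition(tableDir, "date", "2015-08-19");
        System.out.println("removed today's partition: " + removed);
        System.out.println("yesterday still present: " + Files.exists(yesterday));
    }
}
```

Once the partition directory is gone, writing that day's data with
SaveMode.Append should not duplicate rows, since the other partitions are
untouched and the deleted one starts out empty.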


On Wed, Aug 19, 2015 at 8:18 PM, Romi Kuntsman <r...@totango.com> wrote:

> Hello,
>
> I have a DataFrame, with a date column which I want to use as a partition.
> Each day I want to write the data for the same date in Parquet, and then
> read a dataframe for a date range.
>
> I'm using:
>
> myDataframe.write().partitionBy("date").mode(SaveMode.Overwrite).parquet(parquetDir);
>
> If I use SaveMode.Append, then writing data for the same partition adds
> the same data there again.
> If I use SaveMode.Overwrite, then writing data for a single partition
> removes all the data for all partitions.
>
> How can I overwrite only a given partition or manually remove a partition
> before writing?
>
> Many thanks!
> Romi K.
>
