Hi peay,

Have you found a better solution yet? I am having the same issue.
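In case anyone finds this thread later: the behaviour asked for below ("append" at the partition level, but overwrite within each partition) was added to Spark itself in 2.3 as `spark.sql.sources.partitionOverwriteMode` (it does not exist in 2.1/2.2, where the insertInto trick from the article is the workaround). A minimal sketch; the DataFrame contents here are purely illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# With "dynamic" mode, an overwrite replaces only the partitions present
# in the written DataFrame; other day= subfolders are left untouched.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

df = spark.createDataFrame(
    [("2017-02-01", 42)], ["day", "value"])  # illustrative data only

(df.write
   .mode("overwrite")
   .partitionBy("day")
   .parquet("dataset.parquet"))
```

Re-running the same monthly job is then idempotent: each run overwrites only the day= partitions it actually produces.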
The following says it works with Spark 2.1 onward, but only when you use sqlContext and not a DataFrame: https://medium.com/@anuvrat/writing-into-dynamic-partitions-using-spark-2e2b818a007a

Thanks,
Nirav

On Mon, Oct 2, 2017 at 4:37 AM, Pavel Knoblokh <knobl...@gmail.com> wrote:
> If your processing task inherently processes input data by month you
> may want to "manually" partition the output data by month as well as
> by day, that is, to save it with a file name including the given month,
> i.e. "dataset.parquet/month=01". Then you will be able to use the
> overwrite mode with each month partition. Hope this could be of some
> help.
>
> --
> Pavel Knoblokh
>
> On Fri, Sep 29, 2017 at 5:31 PM, peay <p...@protonmail.com> wrote:
> > Hello,
> >
> > I am trying to use
> > data_frame.write.partitionBy("day").save("dataset.parquet") to write a
> > dataset while splitting by day.
> >
> > I would like to run a Spark job to process, e.g., a month:
> > dataset.parquet/day=2017-01-01/...
> > ...
> >
> > and then run another Spark job to add another month using the same folder
> > structure, getting me
> > dataset.parquet/day=2017-01-01/
> > ...
> > dataset.parquet/day=2017-02-01/
> > ...
> >
> > However:
> > - with save mode "overwrite", when I process the second month, all of
> > dataset.parquet/ gets removed and I lose whatever was already computed
> > for the previous month.
> > - with save mode "append", I can't get idempotence: if I run the job to
> > process a given month twice, I'll get duplicate data in all the
> > subfolders for that month.
> >
> > Is there a way to do "append" in terms of the subfolders from partitionBy,
> > but overwrite within each such partition? Any help would be appreciated.
> >
> > Thanks!
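Pavel's "manual" partitioning suggestion above boils down to making the month part of the output path yourself, so that overwrite mode only ever touches one month's subtree. A tiny path-layout sketch (the helper name is mine, purely illustrative):

```python
from datetime import date

def month_partition_path(base, day):
    """Build a per-month output directory, e.g. dataset.parquet/month=2017-01,
    so each monthly job can safely use overwrite mode on its own subtree."""
    return f"{base}/month={day.strftime('%Y-%m')}"

# A February re-run only rewrites .../month=2017-02; January's data survives.
print(month_partition_path("dataset.parquet", date(2017, 2, 1)))
# dataset.parquet/month=2017-02
```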
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org