Re: Overwrite support from ParquetIO

2021-01-28 Thread Tao Li
Thanks everyone for your inputs here! Really helpful information! From: Chamikara Jayalath Reply-To: "user@beam.apache.org" Date: Thursday, January 28, 2021 at 10:54 AM To: user Subject: Re: Overwrite support from ParquetIO On Thu, Jan 28, 2021 at 9:14 AM Alexey Romanenko mailto:

Re: Overwrite support from ParquetIO

2021-01-28 Thread Chamikara Jayalath
now if this makes sense to you. Thanks! > > > *From: *Alexey Romanenko > *Reply-To: *"user@beam.apache.org" > *Date: *Wednesday, January 27, 2021 at 9:10 AM > *To: *"user@beam.apache.org" > *Subject: *Re: Overwrite support from ParquetIO > > What do you

Re: Overwrite support from ParquetIO

2021-01-28 Thread Alexey Romanenko
org" > Date: Wednesday, January 27, 2021 at 9:10 AM > To: "user@beam.apache.org" > Subject: Re: Overwrite support from ParquetIO > > What do you mean by “wipe out all existing parquet files before a write > operation”? Are these all files that already exist in

Re: Overwrite support from ParquetIO

2021-01-27 Thread Reuven Lax
r this deletion operation, or maybe a composite >>> PTransform that does deletion first followed by ParquetIO.Write. >>> >>> >>> >>> *From: *Chamikara Jayalath >>> *Reply-To: *"user@beam.apache.org" >>>

Re: Overwrite support from ParquetIO

2021-01-27 Thread Robert Bradshaw
, this can >> be done by performing it in a side-input step (to a ParDo that precedes >> sink) or by adding a GBK/Reshuffle between the two steps. >> >> >> >> Thanks, >> >> Cham >> >> >> >> >> >> >>1. >

Re: Overwrite support from ParquetIO

2021-01-27 Thread Reuven Lax
er > *Cc: *Alexey Romanenko > *Subject: *Re: Overwrite support from ParquetIO > > > > > > > > On Wed, Jan 27, 2021 at 12:06 PM Tao Li wrote: > > @Alexey Romanenko thanks for your response. > Regarding your questions: > > > >1. Yes I can p

Re: Overwrite support from ParquetIO

2021-01-27 Thread Tao Li
Date: Wednesday, January 27, 2021 at 3:45 PM To: user Cc: Alexey Romanenko Subject: Re: Overwrite support from ParquetIO On Wed, Jan 27, 2021 at 12:06 PM Tao Li wrote: @Alexey Romanenko thanks for your response. Regarding your

Re: Overwrite support from ParquetIO

2021-01-27 Thread Chamikara Jayalath
n the two steps. Thanks, Cham > >1. > > > > Please let me know if this makes sense to you. Thanks! > > > > > > *From: *Alexey Romanenko > *Reply-To: *"user@beam.apache.org" > *Date: *Wednesday, January 27, 2021 at 9:10 AM > *To: *"

Re: Overwrite support from ParquetIO

2021-01-27 Thread Tao Li
files from previous run that won’t get overwritten in the current run. Please let me know if this makes sense to you. Thanks! From: Alexey Romanenko Reply-To: "user@beam.apache.org" Date: Wednesday, January 27, 2021 at 9:10 AM To: "user@beam.apache.org" Subject: Re: Overwr

Re: Overwrite support from ParquetIO

2021-01-27 Thread Alexey Romanenko
What do you mean by “wipe out all existing parquet files before a write operation”? Are these all files that already exist in the same output directory? Can you purge this directory before or just use a new output directory for every pipeline run? To write Parquet files you need to use ParquetI
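
The two options raised in the message above (purging the output directory before a run, or using a new output directory for every run) can be sketched for a local filesystem with plain java.nio. This is only an illustration under stated assumptions: the class name OutputDirPrep, both method names, and the "run-" directory prefix are hypothetical, nothing here is part of ParquetIO or Beam, and it assumes the output files end in ".parquet" and sit directly in the output directory. A remote filesystem would need that filesystem's own client instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: prepares a local output directory
// before a pipeline run so files from a previous run do not linger.
public class OutputDirPrep {

    // Option 1: delete every *.parquet file directly inside outputDir.
    // Returns the number of files deleted; other files are left untouched.
    static int purgeParquetFiles(Path outputDir) throws IOException {
        if (!Files.isDirectory(outputDir)) {
            return 0; // nothing to purge on a first run
        }
        // Collect matches first, then delete, so the directory listing
        // is fully read before it is modified.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(outputDir, "*.parquet")) {
            for (Path file : stream) {
                matches.add(file);
            }
        }
        for (Path file : matches) {
            Files.delete(file);
        }
        return matches.size();
    }

    // Option 2: create (if needed) and return a per-run subdirectory,
    // so each run writes to its own location and nothing is overwritten.
    static Path freshRunDir(Path baseDir, String runId) throws IOException {
        return Files.createDirectories(baseDir.resolve("run-" + runId));
    }
}
```

Either helper would be called from the driver program before the pipeline is started, for example OutputDirPrep.purgeParquetFiles(Paths.get("/tmp/out")), with the resulting path then passed to the write step. The purge variant keeps a stable output location; the per-run variant avoids deleting anything but leaves old runs to be cleaned up separately.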