[ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Udi Meiri
The Apache Beam team is pleased to announce the release of version 2.18.0. Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing. See https://beam.apache.org. You can download the release her

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Rui Wang
Thank you Udi for taking care of Beam 2.18.0 release! -Rui On Tue, Jan 28, 2020 at 10:59 AM Udi Meiri wrote: > The Apache Beam team is pleased to announce the release of version 2.18.0. > > Apache Beam is an open source unified programming model to define and > execute data processing pipelin

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Pablo Estrada
Thanks Udi! On Tue, Jan 28, 2020 at 11:08 AM Rui Wang wrote: > Thank you Udi for taking care of Beam 2.18.0 release! > > > > -Rui > > On Tue, Jan 28, 2020 at 10:59 AM Udi Meiri wrote: > >> The Apache Beam team is pleased to announce the release of version 2.18.0. >> >> Apache Beam is an open so

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Hannah Jiang
Thanks Udi! On Tue, Jan 28, 2020 at 11:09 AM Pablo Estrada wrote: > Thanks Udi! > > On Tue, Jan 28, 2020 at 11:08 AM Rui Wang wrote: > >> Thank you Udi for taking care of Beam 2.18.0 release! >> >> >> >> -Rui >> >> On Tue, Jan 28, 2020 at 10:59 AM Udi Meiri wrote: >> >>> The Apache Beam team

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Yichi Zhang
Thanks Udi! On Tue, Jan 28, 2020 at 11:28 AM Hannah Jiang wrote: > Thanks Udi! > > > On Tue, Jan 28, 2020 at 11:09 AM Pablo Estrada wrote: > >> Thanks Udi! >> >> On Tue, Jan 28, 2020 at 11:08 AM Rui Wang wrote: >> >>> Thank you Udi for taking care of Beam 2.18.0 release! >>> >>> >>> >>> -Rui >

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Ankur Goenka
Thanks Udi! On Tue, Jan 28, 2020 at 11:30 AM Yichi Zhang wrote: > Thanks Udi! > > On Tue, Jan 28, 2020 at 11:28 AM Hannah Jiang > wrote: > >> Thanks Udi! >> >> >> On Tue, Jan 28, 2020 at 11:09 AM Pablo Estrada >> wrote: >> >>> Thanks Udi! >>> >>> On Tue, Jan 28, 2020 at 11:08 AM Rui Wang wrot

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Connell O'Callaghan
Well done thank you Udi!!! On Tue, Jan 28, 2020 at 11:47 AM Ankur Goenka wrote: > Thanks Udi! > > On Tue, Jan 28, 2020 at 11:30 AM Yichi Zhang wrote: > >> Thanks Udi! >> >> On Tue, Jan 28, 2020 at 11:28 AM Hannah Jiang >> wrote: >> >>> Thanks Udi! >>> >>> >>> On Tue, Jan 28, 2020 at 11:09 AM P

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread kant kodali
Looks like https://beam.apache.org/documentation/runners/capability-matrix/ needs to be updated, since there seems to be support for Spark Structured Streaming? On Tue, Jan 28, 2020 at 1:47 PM Connell O'Callaghan wrote: > Well done thank you Udi!!! > > On Tue, Jan 28, 2020 at 11:47 AM Ankur Goen

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Ahmet Altay
Thank you Udi! On Tue, Jan 28, 2020 at 2:13 PM kant kodali wrote: > Looks like > https://beam.apache.org/documentation/runners/capability-matrix/ needs to > be updated? since there seems to be support for spark structured streaming? > > On Tue, Jan 28, 2020 at 1:47 PM Connell O'Callaghan > wrot

Re: [ANNOUNCE] Beam 2.18.0 Released

2020-01-28 Thread Reza Rokni
Thank you Udi! On Wed, 29 Jan 2020 at 06:34, Ahmet Altay wrote: > Thank you Udi! > > On Tue, Jan 28, 2020 at 2:13 PM kant kodali wrote: > >> Looks like >> https://beam.apache.org/documentation/runners/capability-matrix/ needs >> to be updated? since there seems to be support for spark structure

Fwd: Unable to reliably have multiple cores working on a dataset with DirectRunner

2020-01-28 Thread Julien Lafaye
Hello, I have a set of tfrecord files, obtained by converting parquet files with Spark. Each file is roughly 1GB and I have 11 of those. I would expect simple statistics gathering (i.e., counting the number of items across all files) to scale linearly with respect to the number of cores on my system. I am

Re: Unable to reliably have multiple cores working on a dataset with DirectRunner

2020-01-28 Thread Hannah Jiang
Hi Julien, Thanks for reaching out to the user community. I will look into it. Can you please share how you checked CPU usage for each core? Thanks, Hannah On Tue, Jan 28, 2020 at 9:48 PM Julien Lafaye wrote: > Hello, > > I have a set of tfrecord files, obtained by converting parquet files with > Spar

Re: Unable to reliably have multiple cores working on a dataset with DirectRunner

2020-01-28 Thread Julien Lafaye
Hi Hannah, I used top. Please let me know if you need any other information that could help me understand the issue. J. On Wed, Jan 29, 2020 at 8:14 AM Hannah Jiang wrote: > Hi Julien > > Thanks for reaching out to the user community. I will look into it. Can you > please share how you checked CPU u