Re: DirectRunner, Fusion, and Triggers

2021-05-19 Thread Bashir Sadjad
roblem is that no matter how many threads I use, Direct Runner keeps all the resources in memory and then writes them all to file (instead of flushing triggered panes). Regards -B > > Brian > > On Mon, May 17, 2021 at 7:12 AM Jan Lukavský wrote: > >> On 5/17/21 3:46 PM,

Re: DirectRunner, Fusion, and Triggers

2021-05-17 Thread Bashir Sadjad
en I think you would see the result > you expect. > > Best, > > Jan > On 5/12/21 7:35 PM, Bashir Sadjad wrote: > > Thanks Kenn. > > On Wed, May 12, 2021 at 12:14 PM Kenneth Knowles wrote: > >> >> On Sat, May 8, 2021 at 12:00 AM Bashir Sadjad wrote: >

Re: DirectRunner, Fusion, and Triggers

2021-05-12 Thread Bashir Sadjad
Thanks Kenn. On Wed, May 12, 2021 at 12:14 PM Kenneth Knowles wrote: > > On Sat, May 8, 2021 at 12:00 AM Bashir Sadjad wrote: > >> However, if I add a dummy S2' after S2 (i.e., S1->S2->S2'->S3) which only >> prints some log messages for each record and

DirectRunner, Fusion, and Triggers

2021-05-07 Thread Bashir Sadjad
Hi Beam-users, *TL;DR;* I wonder if DirectRunner does any fusion optimization and whether this has any impact on triggers/panes? *Details* (the context for everything below is *DirectRunner* and this is a *batch* job): I hav

Re: Setting rowGroupSize in ParquetIO

2021-03-17 Thread Bashir Sadjad
Thank you Alexey for the review and great suggestions. -B On Wed, Mar 17, 2021 at 12:07 PM Alexey Romanenko wrote: > Thank you for your contribution, Bashir! > > Alexey > > On 17 Mar 2021, at 15:32, Bashir Sadjad wrote: > > To close the loop here: The fix is merge

Re: Setting rowGroupSize in ParquetIO

2021-03-17 Thread Bashir Sadjad
> Tbh mate, I reckon it would be quicker if you progress your PR. > > Cheers, > David > > ------ > *From:* Bashir Sadjad > *Sent:* 12 March 2021 16:29 > *To:* user@beam.apache.org > *Subject:* Re: Setting rowGroupSize in ParquetIO > >

Re: Setting rowGroupSize in ParquetIO

2021-03-12 Thread Bashir Sadjad
uncompressed. > I’m sure there are other use cases out there that need this fined grained > control. > > > > Cheers, David > > > > *David Hollands* > > BBC Broadcast Centre, London, W12 > > Email: david.holla...@bbc.co.uk > > > > > > *From: *Bashi

Setting rowGroupSize in ParquetIO

2021-03-11 Thread Bashir Sadjad
Hi all, I wonder how I can set the row group size for files generated by ParquetIO.Sink . It doesn't seem to provide the option for setting that and IIUC from the code

Re: Implementing an IO Connector for Debezium

2020-11-26 Thread Bashir Sadjad
splittable DoFn. > > > Python examples: > >- BoundedSourceWrapper > > <https://github.com/apache/beam/blob/571338b0cc96e2e80f23620fe86de5c92dffaccc/sdks/python/apache_beam/io/iobase.py#L1375> >- A wrapper which converts an existing BoundedSource implemen

Implementing an IO Connector for Debezium

2020-11-25 Thread Bashir Sadjad
Hi, I have a scenario in which a streaming pipeline should read update messages from MySQL binlog (through Debezium). To implement this pipeline using Beam, I understand there is a KafkaIO which I can use. But I also want to support a local mode in which there is no Kafka and the messages are dire

Re: AvroCoder fails on BigDecimal

2020-10-06 Thread Bashir Sadjad
For future reference: This was an issue in Avro type conversion. For the full description and a solution see here <https://lists.apache.org/thread.html/rc4d468169920a81362c23b320752b7073afa4ab1eea3b6bfd6b9c93b%40%3Cuser.avro.apache.org%3E> . -B On Mon, Oct 5, 2020 at 2:33 PM Bashir Sadjad

Re: Managing long-running pipelines

2018-04-29 Thread Bashir Sadjad
Hi Pino, If you have not figured it already, for Dataflow runner, this API may help. -B On Mon, Apr 16, 2018 at 5:21 PM Pino de Candia wrote: > Hi Folks, > > are there any systems that can help manage Beam pipelines? I'm looking for > so