Re: [Question] Processing chunks of data in batch based pipelines

2023-07-18 Thread Alexey Romanenko
Reading from an RDBMS and processing the data downstream in your pipeline are not the same in terms of bundling. The main “issue” with the former is that reading mostly happens in a single thread per SQL query, and a JDBC client is no exception. So, Beam can’t split data that has not yet been read into bundles.
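
[Editor's note: a minimal sketch, not from the thread, of one way to work around the single-reader limitation using JdbcIO.readWithPartitions, which issues several range-bounded queries in parallel. The table name "events", the numeric partition column "id" and all connection details are placeholders.]

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.values.PCollection;

public class PartitionedReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Hypothetical table "events" with a numeric "id" column used for partitioning.
    PCollection<String> rows =
        p.apply(
            JdbcIO.<String>readWithPartitions()
                .withDataSourceConfiguration(
                    JdbcIO.DataSourceConfiguration.create(
                            "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb")
                        .withUsername("user")
                        .withPassword("secret"))
                .withTable("events")
                .withPartitionColumn("id")
                .withNumPartitions(8) // 8 range-bounded queries instead of one big one
                .withRowMapper(rs -> rs.getString("payload"))
                // Depending on the Beam version the coder may be inferred automatically.
                .withCoder(StringUtf8Coder.of()));

    p.run().waitUntilFinish();
  }
}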

Re: [Question] Processing chunks of data in batch based pipelines

2023-07-17 Thread Yomal de Silva
Hi Alexey, Yes, I have tried changing the fetch size for my implementation. What I observed through the Flink dashboard was that the reading transform completes quickly and one of the other transforms takes much longer (due to some logic). Even if Apache Beam processes data in bundles when…

Re: [Question] Processing chunks of data in batch based pipelines

2023-07-17 Thread Alexey Romanenko
Hi Yomal, Actually, all data in a Beam pipeline is usually processed in bundles (or chunks) if it is processed by a DoFn. The size of the bundle is up to your processing engine and, IIRC, there is no way in Beam to change it. Talking about your case - did you try changing the fetch size for Beam’s JdbcIO…
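
[Editor's note: for reference, a minimal sketch, not from the thread, of what setting the fetch size on Beam's JdbcIO looks like; the query, row mapper and connection details are placeholders. Note that the PostgreSQL JDBC driver only streams rows according to the fetch size when autocommit is disabled, so whether this actually chunks the read can depend on the driver/connection configuration.]

// Sketch only: JdbcIO.read() with an explicit fetch size so rows are pulled
// from PostgreSQL in chunks rather than materialised all at once.
PCollection<String> rows =
    pipeline.apply(
        JdbcIO.<String>read()
            .withDataSourceConfiguration(
                JdbcIO.DataSourceConfiguration.create(
                    "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb"))
            .withQuery("SELECT payload FROM events") // placeholder query
            .withFetchSize(10_000)                   // rows per database round trip
            .withRowMapper(rs -> rs.getString(1))
            .withCoder(StringUtf8Coder.of()));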

[Question] Processing chunks of data in batch based pipelines

2023-07-17 Thread Yomal de Silva
Hi all, I have a pipeline which reads data from a database (PostgreSQL), enriches the data through a side input and finally publishes the results to Kafka. Currently I am not using the built-in JdbcIO to read the data, but I think there won't be any difference in using that. With my implementation I have…
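
[Editor's note: not part of the original message, but a minimal sketch of the pipeline shape described above (JDBC read -> enrichment via a map side input -> KafkaIO write). Table names, column names, the enrichment logic, the Kafka topic and the broker address are all placeholders.]

import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.kafka.common.serialization.StringSerializer;

public class EnrichAndPublishSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    JdbcIO.DataSourceConfiguration db =
        JdbcIO.DataSourceConfiguration.create(
                "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb")
            .withUsername("user")
            .withPassword("secret");

    // Side input: a small lookup table materialised as a map view.
    PCollectionView<Map<String, String>> lookup =
        p.apply(
                "ReadLookup",
                JdbcIO.<KV<String, String>>read()
                    .withDataSourceConfiguration(db)
                    .withQuery("SELECT key, value FROM lookup_table")
                    .withRowMapper(rs -> KV.of(rs.getString(1), rs.getString(2)))
                    .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())))
            .apply(View.asMap());

    // Main input: the rows to enrich; bundling of this PCollection is decided by the runner.
    PCollection<KV<String, String>> enriched =
        p.apply(
                "ReadMain",
                JdbcIO.<KV<String, String>>read()
                    .withDataSourceConfiguration(db)
                    .withQuery("SELECT id, payload FROM events")
                    .withFetchSize(10_000)
                    .withRowMapper(rs -> KV.of(rs.getString(1), rs.getString(2)))
                    .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())))
            .apply(
                "Enrich",
                ParDo.of(
                        new DoFn<KV<String, String>, KV<String, String>>() {
                          @ProcessElement
                          public void process(ProcessContext c) {
                            // Placeholder enrichment: append the looked-up value to the payload.
                            Map<String, String> ref = c.sideInput(lookup);
                            String extra = ref.getOrDefault(c.element().getKey(), "");
                            c.output(KV.of(c.element().getKey(), c.element().getValue() + "|" + extra));
                          }
                        })
                    .withSideInputs(lookup));

    enriched.apply(
        "WriteKafka",
        KafkaIO.<String, String>write()
            .withBootstrapServers("localhost:9092")
            .withTopic("enriched-events")
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class));

    p.run().waitUntilFinish();
  }
}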