Re: [Question] Processing chunks of data in batch based pipelines

2023-07-18 Thread Alexey Romanenko
Reading from an RDBMS and processing the data downstream in your pipeline are not the same in terms of bundling. The main “issue” with the former is that reading mostly happens in a single thread per SQL query, and a JDBC client is no exception. So, Beam can’t split data that has not yet been read into bundles.
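
[Editor's note: a minimal sketch, not from the thread, of one way to work around the single-reader limitation using JdbcIO.readWithPartitions, which issues several range-bounded queries in parallel. The table name "events", the numeric partition column "id" and all connection details are placeholders.]

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.values.PCollection;

public class PartitionedReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Hypothetical table "events" with a numeric "id" column used for partitioning.
    PCollection<String> rows =
        p.apply(
            JdbcIO.<String>readWithPartitions()
                .withDataSourceConfiguration(
                    JdbcIO.DataSourceConfiguration.create(
                            "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb")
                        .withUsername("user")
                        .withPassword("secret"))
                .withTable("events")
                .withPartitionColumn("id")
                .withNumPartitions(8) // 8 range-bounded queries instead of one big one
                .withRowMapper(rs -> rs.getString("payload"))
                // Depending on the Beam version the coder may be inferred automatically.
                .withCoder(StringUtf8Coder.of()));

    p.run().waitUntilFinish();
  }
}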

Re: [Question] Processing chunks of data in batch based pipelines

2023-07-17 Thread Yomal de Silva
Hi Alexey, Yes, I have tried changing the fetch size for my implementation. What I observed through the Flink dashboard was that the reading transform completes quickly and one of the other transforms takes much longer (due to some logic). Even if Apache Beam processes data in bundles when…

Re: [Question] Processing chunks of data in batch based pipelines

2023-07-17 Thread Alexey Romanenko
Hi Yomal, Actually, all data in a Beam pipeline is usually processed in bundles (or chunks) if it is processed by a DoFn. The size of the bundle is up to your processing engine and, IIRC, there is no way in Beam to change it. Talking about your case - did you try changing the fetch size for Beam’s JdbcIO…
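
[Editor's note: for reference, a minimal sketch, not from the thread, of what setting the fetch size on Beam's JdbcIO looks like; the query, row mapper and connection details are placeholders. Note that the PostgreSQL JDBC driver only streams rows according to the fetch size when autocommit is disabled, so whether this actually chunks the read can depend on the driver/connection configuration.]

// Sketch only: JdbcIO.read() with an explicit fetch size so rows are pulled
// from PostgreSQL in chunks rather than materialised all at once.
PCollection<String> rows =
    pipeline.apply(
        JdbcIO.<String>read()
            .withDataSourceConfiguration(
                JdbcIO.DataSourceConfiguration.create(
                    "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb"))
            .withQuery("SELECT payload FROM events") // placeholder query
            .withFetchSize(10_000)                   // rows per database round trip
            .withRowMapper(rs -> rs.getString(1))
            .withCoder(StringUtf8Coder.of()));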

[Question] Processing chunks of data in batch based pipelines

2023-07-17 Thread Yomal de Silva
Hi all, I have a pipeline which reads data from a database (PostgreSQL), enriches the data through a side input and finally publishes the results to Kafka. Currently I am not using the built-in JdbcIO to read the data, but I think there won't be any difference in using that. With my implementation I have…
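
[Editor's note: not part of the original message, but a minimal sketch of the pipeline shape described above (JDBC read -> enrichment via a map side input -> KafkaIO write). Table names, column names, the enrichment logic, the Kafka topic and the broker address are all placeholders.]

import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.kafka.common.serialization.StringSerializer;

public class EnrichAndPublishSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    JdbcIO.DataSourceConfiguration db =
        JdbcIO.DataSourceConfiguration.create(
                "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb")
            .withUsername("user")
            .withPassword("secret");

    // Side input: a small lookup table materialised as a map view.
    PCollectionView<Map<String, String>> lookup =
        p.apply(
                "ReadLookup",
                JdbcIO.<KV<String, String>>read()
                    .withDataSourceConfiguration(db)
                    .withQuery("SELECT key, value FROM lookup_table")
                    .withRowMapper(rs -> KV.of(rs.getString(1), rs.getString(2)))
                    .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())))
            .apply(View.asMap());

    // Main input: the rows to enrich; bundling of this PCollection is decided by the runner.
    PCollection<KV<String, String>> enriched =
        p.apply(
                "ReadMain",
                JdbcIO.<KV<String, String>>read()
                    .withDataSourceConfiguration(db)
                    .withQuery("SELECT id, payload FROM events")
                    .withFetchSize(10_000)
                    .withRowMapper(rs -> KV.of(rs.getString(1), rs.getString(2)))
                    .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())))
            .apply(
                "Enrich",
                ParDo.of(
                        new DoFn<KV<String, String>, KV<String, String>>() {
                          @ProcessElement
                          public void process(ProcessContext c) {
                            // Placeholder enrichment: append the looked-up value to the payload.
                            Map<String, String> ref = c.sideInput(lookup);
                            String extra = ref.getOrDefault(c.element().getKey(), "");
                            c.output(KV.of(c.element().getKey(), c.element().getValue() + "|" + extra));
                          }
                        })
                    .withSideInputs(lookup));

    enriched.apply(
        "WriteKafka",
        KafkaIO.<String, String>write()
            .withBootstrapServers("localhost:9092")
            .withTopic("enriched-events")
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class));

    p.run().waitUntilFinish();
  }
}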