Reading from an RDBMS and processing the data downstream in your pipeline
are not the same in terms of bundling.
The main "issue" with the former is that it mostly reads in a single
thread per SQL query, and the JDBC client is no exception. So, Beam can't
split data that has not yet been read into bundles.
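For illustration, a minimal sketch (connection details, query and
transform names are made up) of a common workaround: since the read runs
in a single thread, a Reshuffle placed right after it redistributes the
already-read rows so that downstream DoFns can run in parallel bundles.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.transforms.Reshuffle;

public class ReadAndRedistribute {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    p.apply("ReadRows", JdbcIO.<String>read()
            // Placeholder connection details.
            .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb"))
            .withQuery("SELECT payload FROM events")
            .withRowMapper(rs -> rs.getString(1))
            .withCoder(StringUtf8Coder.of()))
        // The read itself stays single-threaded; reshuffling afterwards lets
        // the runner spread the rows across workers in new bundles.
        .apply("Redistribute", Reshuffle.viaRandomKey());

    p.run().waitUntilFinish();
  }
}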
Hi Alexey,
Yes, I have tried changing the fetch size for my implementation. What I
observed through the Flink dashboard was that the reading transform
completes quickly while one of the other transforms takes much longer
(due to some logic).
Even if Apache Beam processes data in bundles when
Hi Yomal,
Actually, all data in a Beam pipeline is usually processed in bundles (or
chunks) when it is processed by a DoFn. The size of a bundle is up to your
processing engine and, IIRC, there is no way in Beam to change it.
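As an illustration (runner-agnostic; the class name is made up), the
bundle boundaries show up in the DoFn lifecycle methods, and the element
count per bundle is entirely up to the runner:

import org.apache.beam.sdk.transforms.DoFn;

// Counts how many elements the runner put into each bundle.
class LogBundleSizeFn extends DoFn<String, String> {
  private transient int elementsInBundle;

  @StartBundle
  public void startBundle() {
    elementsInBundle = 0; // a new bundle begins
  }

  @ProcessElement
  public void processElement(@Element String row, OutputReceiver<String> out) {
    elementsInBundle++;
    out.output(row); // pass the element through unchanged
  }

  @FinishBundle
  public void finishBundle() {
    System.out.println("Bundle closed after " + elementsInBundle + " elements");
  }
}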
Talking about your case - did you try to change the fetch size for Beam's
JdbcIO?
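Something along these lines (query and connection details are
placeholders; note that the PostgreSQL driver only honors the fetch size
when autocommit is off):

import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;

public class FetchSizeExample {
  // withFetchSize controls how many rows the JDBC driver pulls per round
  // trip; it does not change how Beam bundles the resulting PCollection.
  static JdbcIO.Read<String> readWithFetchSize() {
    return JdbcIO.<String>read()
        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
            "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb"))
        .withQuery("SELECT payload FROM events")
        .withFetchSize(10_000) // placeholder value
        .withRowMapper(rs -> rs.getString(1))
        .withCoder(StringUtf8Coder.of());
  }
}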
Hi all,
I have a pipeline which reads data from a database (PostgreSQL), enriches
the data through a side input, and finally publishes the results to Kafka.
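Roughly, the pipeline has the following shape (all names, the query, the
topic and the lookup data below are placeholders, and I am using the
built-in JdbcIO in this sketch just for brevity):

import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.kafka.common.serialization.StringSerializer;

public class EnrichPipeline {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Side input: a small lookup table materialized as a map.
    PCollectionView<Map<String, String>> regionByUser =
        p.apply("LookupData", Create.of(KV.of("u1", "EU"), KV.of("u2", "US")))
         .apply("AsMap", View.asMap());

    // Main input: rows read from PostgreSQL.
    PCollection<KV<String, String>> rows =
        p.apply("ReadRows", JdbcIO.<KV<String, String>>read()
            .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb"))
            .withQuery("SELECT user_id, payload FROM events")
            .withRowMapper(rs -> KV.of(rs.getString(1), rs.getString(2)))
            .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())));

    // Enrich each row from the side input, then publish to Kafka.
    rows.apply("Enrich", ParDo.of(
            new DoFn<KV<String, String>, KV<String, String>>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                String region = c.sideInput(regionByUser)
                    .getOrDefault(c.element().getKey(), "unknown");
                c.output(KV.of(c.element().getKey(),
                    c.element().getValue() + "|" + region));
              }
            }).withSideInputs(regionByUser))
        .apply("WriteKafka", KafkaIO.<String, String>write()
            .withBootstrapServers("localhost:9092")
            .withTopic("enriched-events")
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class));

    p.run().waitUntilFinish();
  }
}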
Currently I am not using the built-in JdbcIO to read the data, but I think
there won't be any difference in using that. With my implementation I have