Hi Mark - we faced a similar problem at my company (we sink our data into
BigQuery first from Dataflow and then run a PSQL COPY operation to import
in bulk). Moving to bulk writes can help, but the bottleneck is more likely
going to be your MySQL instance itself, so you might have to put your DBA hat
on.
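For reference, the staging half of that pattern looks roughly like this on the
Beam side. This is only a rough sketch assuming the Java SDK and BigQueryIO;
the project, dataset, table and schema below are placeholders, not your actual
setup:

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.transforms.Create;

public class StageToBigQuery {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Stand-in for the PCollection your pipeline actually produces.
    p.apply(Create.of(new TableRow().set("id", 1L).set("value", 0.5))
            .withCoder(TableRowJsonCoder.of()))
     .apply("StageToBigQuery",
         BigQueryIO.writeTableRows()
             .to("my-project:staging_dataset.datapoints")  // placeholder table
             .withSchema(new TableSchema().setFields(Arrays.asList(
                 new TableFieldSchema().setName("id").setType("INT64"),
                 new TableFieldSchema().setName("value").setType("FLOAT64"))))
             .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
             .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run().waitUntilFinish();
  }
}

The bulk import into the relational database then happens outside the pipeline
(a COPY in our case; the MySQL equivalent would be LOAD DATA).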
Hi Mark,
Writing to the DB in bulk would be the first step. Have you looked into
writing to the DB with a larger batch size? I believe mysql-beam-connector
also supports this.
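Something along these lines, as a rough sketch. I'm not certain of the exact
mysql-beam-connector API, so this uses Beam's JdbcIO instead; the table,
columns and connection settings are placeholders:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;

public class BatchedMySqlWrite {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Stand-in for the datapoints the real pipeline produces.
    p.apply(Create.of(KV.of(1L, 0.5), KV.of(2L, 0.7)))
     .apply(JdbcIO.<KV<Long, Double>>write()
         .withDataSourceConfiguration(
             JdbcIO.DataSourceConfiguration.create(
                 "com.mysql.cj.jdbc.Driver",
                 "jdbc:mysql://HOST:3306/mydb")  // placeholder connection string
               .withUsername("user")
               .withPassword("password"))
         .withStatement("INSERT INTO datapoints (id, value) VALUES (?, ?)")
         .withBatchSize(10000)  // flush inserts to MySQL in larger batches
         .withPreparedStatementSetter((element, statement) -> {
             statement.setLong(1, element.getKey());
             statement.setDouble(2, element.getValue());
         }));

    p.run().waitUntilFinish();
  }
}

withBatchSize controls how many prepared statements get grouped into a single
batch execution, which tends to matter more than worker parallelism once the
database is the bottleneck.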
On Wed, May 18, 2022 at 2:13 AM Mark Striebeck wrote:
> Hi,
>
> We have a data pipeline that produces ~400M datapoints
Hi,
We have a data pipeline that produces ~400M datapoints each day. If we run
it without storing them, it finishes in a little over an hour. If we run it
and store the datapoints in a MySQL database, it takes several hours.
We are running on GCP Dataflow; the MySQL instances are hosted on GCP.