Hi,

We have a data pipeline that produces ~400M datapoints each day. If we run
it without storing the datapoints, it finishes in a little over an hour. If
we run it and store the datapoints in a MySQL database, it takes several
hours.

We are running on GCP Dataflow, and the MySQL instances are hosted on GCP.
We are using the beam-mysql-connector
<https://github.com/esakik/beam-mysql-connector>.
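
For readers unfamiliar with the connector, writing through it looks roughly
like the minimal sketch below (host, credentials, table, element format, and
batch_size are placeholders; the WriteToMySQL usage follows the connector's
README as I understand it):

import apache_beam as beam
from beam_mysql.connector.io import WriteToMySQL

# All connection details below are placeholders.
write_to_mysql = WriteToMySQL(
    host="10.0.0.1",      # private IP of the MySQL instance
    database="metrics",
    table="datapoints",
    user="pipeline",
    password="secret",
    port=3306,
    batch_size=1000,      # rows buffered per insert batch
)

with beam.Pipeline() as p:
    (
        p
        # Stand-in for the real source of ~400M datapoints/day.
        | "Produce datapoints" >> beam.Create([{"id": 1, "value": 0.5}])
        | "Write to MySQL" >> write_to_mysql
    )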

The pipeline writes ~5000 datapoints per second.

A few questions:

   - Does this throughput sound reasonable, or could it be significantly
   improved by optimizing the database?
   - The pipeline runs several workers to write this out, and because it's
   a write operation they contend for write access. Is it better to write
   out through just one worker and one connection?
   - Would it actually be faster to write from the pipeline to Pub/Sub or
   Kafka and have a client on the other side that then writes to MySQL in
   bulk (see the sketch below)?
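
To make the last question concrete, below is a rough sketch of the kind of
consumer I have in mind: a plain synchronous Pub/Sub pull loop that inserts
each pulled batch with a single executemany and one commit. The project,
subscription, table, row format, and MySQL credentials are all placeholders,
and it assumes google-cloud-pubsub and mysql-connector-python:

import json

import mysql.connector
from google.cloud import pubsub_v1

# Placeholders: project, subscription, and MySQL connection details.
subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-project", "datapoints-sub")

conn = mysql.connector.connect(
    host="10.0.0.1", user="pipeline", password="secret", database="metrics"
)

INSERT_SQL = "INSERT INTO datapoints (id, value) VALUES (%s, %s)"

while True:
    # Pull up to 1000 messages in one synchronous request.
    response = subscriber.pull(
        request={"subscription": subscription, "max_messages": 1000}
    )
    if not response.received_messages:
        continue

    # Decode each message into a row tuple; the payload format is made up.
    rows = []
    for received in response.received_messages:
        point = json.loads(received.message.data)
        rows.append((point["id"], point["value"]))

    # Insert the whole pulled batch in one executemany, then one commit.
    cursor = conn.cursor()
    cursor.executemany(INSERT_SQL, rows)
    conn.commit()
    cursor.close()

    # Ack only after the rows are committed.
    subscriber.acknowledge(
        request={
            "subscription": subscription,
            "ack_ids": [m.ack_id for m in response.received_messages],
        }
    )

The idea would be that one consumer (or a small number of them) commits
large batches, instead of many Dataflow workers each committing small ones.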

Thanks for any ideas or pointers (no, I'm by no means an experienced DBA!!!)

     Mark
