Hey all,

We are working to scale one of our Flink jobs (mostly Table API, with some
DataStream), which uses a MySQL CDC table as a source for enrichment.

What I've noticed is that when I increase the parallelism of the job (e.g.
to 2), the CDC table source has 2 tasks, but only one of them reads any
events. The other one remains completely idle. This stalls downstream
processing because watermarks do not advance. The only way I've found to
get processing to continue is to set table.exec.source.idle-timeout to a
non-zero value.
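
For reference, the workaround looks roughly like this when set from the SQL
client. This is only a sketch: the 30 s value is an arbitrary example rather
than a recommendation, and the quoted SET syntax assumes a reasonably recent
Flink version (older clients use an unquoted key=value form).

```sql
-- Mark a source subtask as idle after 30 seconds without records, so that
-- downstream watermarks can advance based on the active subtask alone.
-- '30s' is an illustrative value only; the default of 0 disables idleness
-- detection.
SET 'table.exec.source.idle-timeout' = '30s';
```

The same key can also be set programmatically on the table environment's
configuration before the job is submitted.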

My questions are:
  - Is there some setting I can tune to get the CDC source to distribute
events across the different subtasks?
  - If the above isn't possible, is there a way in the Table/SQL API to
reduce the parallelism of the source (e.g. to 1)? The CDC connector doesn't
seem to support scan.parallelism.

If neither of the above works, I think I may be forced to use the
DataStream API, set the parallelism explicitly and then convert to a table.

Thanks!

Cheers,
Mike

-- 

Michael Marino

Principal Data Science & Analytics
