Hey all,

We are working to scale one of our Flink jobs (mostly Table API, some DataStream) that uses a MySQL CDC table as a source for enrichment.

What I've noticed is that when I increase the parallelism of the job (e.g. to 2), the CDC table source has 2 subtasks, but only one of them reads any events. The other one remains completely idle. This stalls downstream processing because we are not getting any watermarks. The only way I've found to get the job to continue is to set table.exec.source.idle-timeout to a non-zero value.

My questions are:
- Is there some setting I can tune to get the CDC source to distribute events across the different subtasks?
- If the above isn't possible, is there a way in the Table/SQL API to reduce the parallelism of the source (e.g. to 1)? CDC doesn't seem to support scan.parallelism.

If neither of the above works, I think I may be forced to use the DataStream API, set the parallelism explicitly, and then convert to a table.

Thanks!

Cheers,
Mike

--
Michael Marino
Principal Data Science & Analytics
tado GmbH
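
P.S. For reference, the idle-timeout workaround mentioned above looks roughly like this when set from the SQL client. The 30 s value is only an illustrative assumption, not a recommendation; a sensible value depends on how long a source subtask can legitimately stay quiet.

```sql
-- Mark a source subtask as idle once it has produced no records for this long,
-- so that it no longer holds back downstream watermarks.
-- '30 s' is an assumed example value.
SET 'table.exec.source.idle-timeout' = '30 s';
```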