Hi, Michael.

The MySQL CDC source reads binlog events with a parallelism of 1 to
preserve their order, so the other subtasks stop reading data.
To address this, you can set the option
'scan.incremental.close-idle-reader.enabled'='true'[1] on your CDC table so
that the source closes the idle subtasks.
P.S. This option comes with a limitation; please see its description in
the docs for details.
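
As a rough sketch of where the option goes, the table definition could
look like the following. The table name, columns, connection values, and
server-id range are all hypothetical placeholders; only the last option
is the one discussed above. As far as I know, the option also depends on
'execution.checkpointing.checkpoints-after-tasks-finish.enabled' being
true, so please check that against the option description in [1].

```sql
-- Hypothetical example table; replace names and connection values with your own.
CREATE TABLE orders_cdc (
  id BIGINT,
  customer_id BIGINT,
  updated_at TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',          -- placeholder
  'port' = '3306',                   -- placeholder
  'username' = 'flink_user',         -- placeholder
  'password' = 'flink_password',     -- placeholder
  'database-name' = 'shop',          -- placeholder
  'table-name' = 'orders',           -- placeholder
  'server-id' = '5400-5408',         -- placeholder range, one id per reader
  -- Lets the source close readers that have nothing left to read.
  'scan.incremental.close-idle-reader.enabled' = 'true'
);
```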

Best,
Hang

[1]
https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/mysql-cdc/

On Thu, Oct 24, 2024 at 19:13, Michael Marino <michael.mar...@tado.com> wrote:

> Let me quickly follow up on this:
>
> - I missed noting that I *was* setting the server-id value to a range.
> - I just realized that if I do a hard restart and start without a
> snapshot, then this works, i.e. the multiple sub-tasks receive events and
> the watermarking/processing progresses. This is, however, not really ideal.
> Is there any way to scale CDC without this hard restart?
>
> Thanks,
> Mike
>
>
> On Thu, Oct 24, 2024 at 12:25 PM Michael Marino <michael.mar...@tado.com>
> wrote:
>
>> Hey all,
>>
>> We are working to scale one of our Flink Jobs (using Table API mostly,
>> some DataStream) where we are using a MySQL CDC table as a source for
>> enrichment.
>>
>> What I've noticed is that, when I increase the parallelism of the job
>> (e.g. to 2), the CDC table source has 2 tasks, but only one of these reads
>> any events. The other one remains completely idle. This stalls downstream
>> processing because we are not getting any watermarks. The only way I've
>> found to get this to continue is to set table.exec.source.idle-timeout to a
>> non-zero value.
>>
>> My questions are:
>>   - is there some setting I can tune to get the CDC to distribute events
>> across the different sub-tasks?
>>   - If the above isn't possible, is there a way in the Table/SQL API to
>> reduce the parallelism (e.g. to 1)? CDC doesn't seem to support
>> scan.parallelism.
>>
>> If neither of the above works, I think I may be forced to use the
>> DataStream API, set the parallelism explicitly and then convert to a table.
>>
>> Thanks!
>>
>> Cheers,
>> Mike
>>
>> --
>>
>> Michael Marino
>>
>> Principal Data Science & Analytics
>>
>> Phone:  +49 89 7167786 - 14
>>
>> linkedin.com/company/tadogmbh <https://www.linkedin.com/company/tadogmbh>
>>  | facebook.com/tado <http://www.facebook.com/tado> | twitter.com/tado
>> <http://www.twitter.com/tado> | youtube.com/tado
>> <http://www.youtube.com/tado>
>>
>> www.tado.com | tado GmbH | Sapporobogen 6-8 | 80637 Munich | Germany
>>
>>  Managing Directors: Dr. Philip Beckmann | Christian Deilmann | Johannes
>> Schwarz | Dr. Frank Siebdrat | Lukas Zyla
>>
>> Registered with the Commercial Register Munich as HRB 194769 | VAT-No: DE
>> 280012558
>>
>
