Flink sets its parallelism independently of the number of partitions in the topic(s) being read. The source parallelism comes from whatever is configured (for example, the default parallelism in the cluster configuration), with no regard for the source's native parallelism. If there are fewer Kafka partitions than the Flink source parallelism, the extra source tasks will sit idle with nothing to read. If the Flink source parallelism is lower than the number of Kafka partitions, each Flink source task will consume from multiple partitions as needed.
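
To make the two cases concrete, here is a rough sketch in plain Python. It is not Flink code: `assign_partitions` is a hypothetical helper, and the simple round-robin rule is an assumption for illustration only (the connector's real assignment logic may differ). It only shows how a fixed source parallelism leads either to idle subtasks or to several partitions per subtask.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, source_parallelism: int) -> Dict[int, List[int]]:
    """Hypothetical helper: spread partitions over source subtasks round-robin.

    This is an illustrative assumption, not the Kafka connector's actual algorithm.
    """
    assignment: Dict[int, List[int]] = {s: [] for s in range(source_parallelism)}
    for partition in range(num_partitions):
        assignment[partition % source_parallelism].append(partition)
    return assignment


# Case 1: fewer partitions than source subtasks, so some subtasks get nothing.
few = assign_partitions(num_partitions=2, source_parallelism=4)
idle = [subtask for subtask, parts in few.items() if not parts]
print(few)   # {0: [0], 1: [1], 2: [], 3: []}
print(idle)  # [2, 3]

# Case 2: more partitions than source subtasks, so each subtask reads several.
many = assign_partitions(num_partitions=6, source_parallelism=2)
print(many)  # {0: [0, 2, 4], 1: [1, 3, 5]}
```

So for a topic with hundreds of partitions, a source parallelism of, say, 8 would simply mean each source task reads from many partitions.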

On Fri, Dec 20, 2024 at 3:32 AM Guillermo Ortiz Fernández <guillermo.ortiz.f...@gmail.com> wrote:

> I'm looking for how Flink defines parallelism for a Kafka source (
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/).
> How is it determined by default? Is it based on the number of partitions in
> the topic? I have some topics with hundreds of partitions, and such a high
> level of parallelism seems excessive.