Hi Mason, your understanding is correct. On Thu, May 28, 2020 at 8:23 AM Chen, Mason <mason.c...@sony.com> wrote:
> I think I may have just answered my own question. There’s only one Kafka > partition, so the maximum parallelism is one and it doesn’t really make > sense to make another kafka consumer under the same group id. What threw me > off is that there’s a 2nd subtask for the kafka source created even > though it’s not actually doing anything. So, it seems a general statement > can be made that (# kafka partitions) >= (# parallelism of flink kafka > source)…well I guess you could have more parallelism than kafka partitions, > but the extra subtasks will not doing anything. > > > > *From: *"Chen, Mason" <mason.c...@sony.com> > *Date: *Wednesday, May 27, 2020 at 11:09 PM > *To: *"user@flink.apache.org" <user@flink.apache.org> > *Subject: *Flink Kafka Connector Source Parallelism > > > > Hi all, > > > > I’m currently trying to understand Flink’s Kafka Connector and how > parallelism affects it. So, I am running the flink playground click count > job and the parallelism is set to 2 by default. > > > However, I don’t see the 2nd subtask of the Kafka Connector sending any > records: https://imgur.com/cA5ucSg. Do I need to rebalance after reading from > kafka? > > ``` > clicks = clicks > .keyBy(ClickEvent::getPage) > .map(*new *BackpressureMap()) > .name(*"Backpressure"*); > ``` > > > > `clicks` is the kafka click stream. From my reading in the operator docs, > it seems counterintuitive to do a `rebalance()` when I am already doing a > `keyBy()`. > > So, my questions: > > 1. How do I make use of the 2nd subtask? > > 2. Does the number of partitions have some sort of correspondence with the > parallelism of the source operator? If so, is there a general statement to > be made about parallelism across all source operators? > > > > Thanks, > > Mason >