Hi Mason,
your understanding is correct.

On Thu, May 28, 2020 at 8:23 AM Chen, Mason <mason.c...@sony.com> wrote:

> I think I may have just answered my own question. There’s only one Kafka
> partition, so the maximum parallelism is one and it doesn’t really make
> sense to make another kafka consumer under the same group id. What threw me
> off is that there’s a 2nd subtask for the kafka source created even
> though it’s not actually doing anything. So, it seems a general statement
> can be made that (# kafka partitions) >= (# parallelism of flink kafka
> source)…well I guess you could have more parallelism than kafka partitions,
> but the extra subtasks will not doing anything.
>
>
>
> *From: *"Chen, Mason" <mason.c...@sony.com>
> *Date: *Wednesday, May 27, 2020 at 11:09 PM
> *To: *"user@flink.apache.org" <user@flink.apache.org>
> *Subject: *Flink Kafka Connector Source Parallelism
>
>
>
> Hi all,
>
>
>
> I’m currently trying to understand Flink’s Kafka Connector and how
> parallelism affects it. So, I am running the flink playground click count
> job and the parallelism is set to 2 by default.
>
>
> However, I don’t see the 2nd subtask of the Kafka Connector sending any 
> records: https://imgur.com/cA5ucSg. Do I need to rebalance after reading from 
> kafka?
>
> ```
> clicks = clicks
>    .keyBy(ClickEvent::getPage)
>    .map(*new *BackpressureMap())
>    .name(*"Backpressure"*);
> ```
>
>
>
> `clicks` is the kafka click stream. From my reading in the operator docs,
> it seems counterintuitive to do a `rebalance()` when I am already doing a
> `keyBy()`.
>
> So, my questions:
>
> 1. How do I make use of the 2nd subtask?
>
> 2. Does the number of partitions have some sort of correspondence with the
> parallelism of the source operator? If so, is there a general statement to
> be made about parallelism across all source operators?
>
>
>
> Thanks,
>
> Mason
>

Reply via email to