Hi, Hongshun,

Thank you for starting this discussion.  I have some problems about the
field `firstDiscoveryDone`.

In the FLIP, why we need firstDiscoveryDone is as follows.
> Why do we need firstDiscoveryDone? Only relying on the
unAssignedInitialPartitons attribute cannot distinguish between the
following two cases (which often occur in pattern mode):
> The first partition discovery is so slow, before which the checkpoint is
executed and then job is restarted . At this time, the restored
unAssignedInitialPartitons is an empty set, which means non-discovery. The
next discovery will be treated as first discovery.
> The first time the partition is discovered is empty, and new partitions
can only be found after multiple partition discoveries. If a restart occurs
between this period, the restored unAssignedInitialPartitons is also an
empty set, which means empty-discovery.The next discovery will be treated
as new discovery.

I don't know when the second case will occur. The partitions must be
greater than 0 when creating topics. And I have read this note in the FLIP.
> Note: The current design only applies to cases where all existing
partitions can be discovered at once. If all old partitions cannot be
discovered at once, the subsequent old partitions discovered will be
treated as new partitions, leading to message duplication. Therefore, this
point needs to be particularly noted.

Then when will we get the empty partition list? I think it should be
treated as the first initial discovery if both `unassignedInitialPartitons`
and `assignedPartitons` are empty without `firstDiscoveryDone`.

Besides that, I think the `unAssignedInitialPartitons` is better to be
named `unassignedInitialPartitons`.

Best,
Hang

Hongshun Wang <loserwang1...@gmail.com> 于2023年3月17日周五 18:42写道:

> Hi everyone,
>
> I would like to start a discussion on FLIP-288:Enable Dynamic Partition
> Discovery by Default in Kafka Source[1].
>
> As described in mail thread[2], dynamic partition discovery is disabled by
> default and users have to explicitly specify the interval of discovery in
> order to turn it on. Besides, if the initial offset strategy is LATEST,
> same strategy is used for new partitions, leading to the loss of some data
> (thinking a new partition is created and might be discovered by Kafka
> source several minutes later, and the message produced into the partition
> within the gap might be dropped if we use for example "latest" as the
> initial offset strategy.)
>
> The goals of this FLIP are as follows:
>
>    1. Enable partition discovery by default.
>    2. Use earliest as the offset strategy for new partitions after the
>    first discovery.
>
> Looking forward to hearing from you.
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source
>
> [2]  <https://lists.apache.org/thread/d7zy46gj3sw0zwzq2rj3fmc0hx8ojtln>
> https://lists.apache.org/thread/d7zy46gj3sw0zwzq2rj3fmc0hx8ojtln
>
>
> Best,
>
> Hongshun
>

Reply via email to