Yes, this is a Kafka-side issue.

Since the affected Kafka versions are all below 1.1.0, ideally we should
upgrade the Kafka minor version in flink-connector-kafka-0.10/0.11 once
the fix is back-ported on the Kafka side.
However, given that the PR has already been merged for two years without
a back-port, I am not sure that will ever happen.
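
For reference, here is a minimal sketch of the two assignment styles
(the topic, partition, and property values are illustrative, not the
connector's actual code). The connector uses manual assignment, which
is the code path affected by the bug:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class AssignVsSubscribe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);

        // subscribe(): group-managed assignment. The client heartbeats to
        // the group coordinator and rediscovers it if it dies.
        // consumer.subscribe(Collections.singletonList("my-topic"));

        // assign(): manual assignment, as the Flink connector does. On
        // clients below 1.1.0 the coordinator is only looked up lazily
        // (e.g. when committing offsets) and is not rediscovered after it
        // dies, so offset commits to the group can stall.
        consumer.assign(Collections.singletonList(
                new TopicPartition("my-topic", 0)));

        consumer.close();
    }
}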

--
Rong

On Fri, Mar 13, 2020 at 6:43 AM Aljoscha Krettek <aljos...@apache.org>
wrote:

> Thanks for the update!
>
> On 13.03.20 13:47, Rong Rong wrote:
> > 1. I think we have finally pinpointed the root cause of this issue:
> > when partitions are assigned manually (i.e. with the assign() API
> > instead of the subscribe() API), the client will not try to rediscover
> > the coordinator if it dies [1]. This is no longer an issue after Kafka
> > 1.1.0.
> > After cherry-picking the PR [2] back to the Kafka 0.11.x branch and
> > packaging it with our Flink application, we haven't seen this issue
> > recur so far.
>
> So the solution to this thread is: we don't do anything because it is a
> Kafka bug that was fixed?
>
> > 2. GROUP_OFFSETS is in fact the default startup mode if checkpointing
> > is not enabled - that's why I was a bit surprised that this problem was
> > reported so many times.
> > To follow up on the question of "whether resuming from GROUP_OFFSETS is
> > useful": there are definitely use cases where users don't want to use
> > checkpointing (e.g. due to resource constraints, storage cost
> > considerations, etc.), but still want to avoid a certain amount of data
> > loss. Most of our analytics use cases fall into this category.
>
> Yes, this is what I assumed. I was not suggesting that we remove the
> feature. So we just leave it as is, right?
>
> Best,
> Aljoscha
>
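
P.S. To make the second point above concrete: resuming from committed
group offsets without checkpointing looks roughly like this with the
0.11 connector (topic, group id, and property values are illustrative):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class GroupOffsetsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Note: no env.enableCheckpointing(...). Without checkpointing the
        // connector falls back to Kafka's periodic auto-commit, configured
        // through the consumer properties below.

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "my-analytics-group");
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "5000");

        FlinkKafkaConsumer011<String> consumer = new FlinkKafkaConsumer011<>(
                "my-topic", new SimpleStringSchema(), props);
        // GROUP_OFFSETS is the default start position; stated explicitly:
        consumer.setStartFromGroupOffsets();

        env.addSource(consumer).print();
        env.execute("resume-from-group-offsets");
    }
}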
