Hi! Done: https://issues.apache.org/jira/browse/KAFKA-10013
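For anyone reading this thread later, the report quoted below is about the `auto.offset.reset=none` contract, where offset-reset situations are supposed to surface to the application as exceptions instead of being handled silently. A minimal sketch of that contract (the bootstrap server, topic, group id, and the seek-to-beginning handling are just my own illustration, not from the ticket):

```
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.InvalidOffsetException;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetResetNoneSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");             // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // "none": never reset silently; surface the problem to client code instead.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));     // placeholder topic
            while (true) {
                try {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    records.forEach(r -> System.out.println(r.key() + " -> " + r.value()));
                } catch (InvalidOffsetException e) {
                    // Covers NoOffsetForPartitionException and OffsetOutOfRangeException.
                    // With auto.offset.reset=none these should reach client code, which can
                    // then decide how to reposition; seeking to the beginning is just one choice.
                    consumer.seekToBeginning(e.partitions());
                }
            }
        }
    }
}
```

In the scenario described below, validation never completes and the truncation exception is swallowed internally, so a handler like the one above is never even reached.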
Fri, 15 May 2020 at 18:36, Sophie Blee-Goldman <sop...@confluent.io>:

> Hey Dmitry,
>
> Can you open a ticket at https://issues.apache.org/jira/issues/ and
> include all this information so we can track and look into it?
>
> Thanks!
> Sophie
>
> On Fri, May 15, 2020 at 2:26 AM Dmitry Sorokin <dmitry.soro...@gmail.com>
> wrote:
>
> > According to the documentation, if `auto.offset.reset` is set to none
> > or not set at all, an exception is thrown to the client code, which can
> > then handle it however it wants. A closer look at this mechanism shows
> > that it does not work.
> >
> > Starting with Kafka 2.3, a new offset reset negotiation algorithm was
> > added
> > (org.apache.kafka.clients.consumer.internals.Fetcher#validateOffsetsAsync).
> > During this validation, the Fetcher's
> > `org.apache.kafka.clients.consumer.internals.SubscriptionState` is held
> > in the `AWAIT_VALIDATION` fetch state.
> > This effectively means that fetch requests are not issued and
> > consumption stops.
> > If an unclean leader election happens during this time,
> > `LogTruncationException` is thrown from the future listener in
> > `validateOffsetsAsync`.
> > The main problem is that this exception (thrown from the future's
> > listener) is effectively swallowed by
> > `org.apache.kafka.clients.consumer.internals.AsyncClient#sendAsyncRequest`
> > in this part of the code:
> > ```
> > } catch (RuntimeException e) {
> >     if (!future.isDone()) {
> >         future.raise(e);
> >     }
> > }
> > ```
> >
> > The end result: the only way to get out of AWAIT_VALIDATION and resume
> > consumption is to finish the validation successfully, but it can never
> > finish. The consumer stays alive but consumes nothing. The only way to
> > resume consumption is to terminate the consumer and start another one.
> >
> > We discovered this situation with a Kafka Streams application, where
> > the valid `auto.offset.reset` value provided by our code is replaced by
> > `None` for the purpose of a position reset
> > (org.apache.kafka.streams.processor.internals.StreamThread#create).
> > With Kafka Streams it is even worse: the application may keep running,
> > logging warning messages of the form `Truncation detected for
> > partition ...`, but no data is produced for a long time and it is
> > eventually lost, making the Kafka Streams application unreliable.
> >
> > *Has anyone seen this already? Are there ways to reconfigure this
> > behavior? I checked the 2.3, 2.4, and trunk client code; the bug is
> > still there.*
> >
> > --
> > Dmitry Sorokin
> > mailto://dmitry.soro...@gmail.com
> >

--
Dmitry Sorokin
mailto://dmitry.soro...@gmail.com