Hey Dmitry,

Can you open a ticket at https://issues.apache.org/jira/issues/ and include all this information so we can track and look into it?
Thanks!
Sophie

On Fri, May 15, 2020 at 2:26 AM Dmitry Sorokin <dmitry.soro...@gmail.com> wrote:

> According to the documentation, if `auto.offset.reset` is set to `none` or
> not set at all, an exception is thrown to the client code, allowing the
> client to handle it however it wants.
> On closer inspection, this mechanism turns out not to work.
>
> Starting from Kafka 2.3, a new offset reset negotiation algorithm was added
> (org.apache.kafka.clients.consumer.internals.Fetcher#validateOffsetsAsync).
> During this validation, the Fetcher's
> `org.apache.kafka.clients.consumer.internals.SubscriptionState` is held in
> the `AWAIT_VALIDATION` fetch state.
> This effectively means that fetch requests are not issued and consumption
> stops.
> If an unclean leader election happens during this time, a
> `LogTruncationException` is thrown from the future listener in
> `validateOffsetsAsync`.
> The main problem is that this exception (thrown from the listener of the
> future) is effectively swallowed by
> `org.apache.kafka.clients.consumer.internals.AsyncClient#sendAsyncRequest`
> in this part of the code:
> ```
> } catch (RuntimeException e) {
>     if (!future.isDone()) {
>         future.raise(e);
>     }
> }
> ```
>
> The end result: the only way to get out of AWAIT_VALIDATION and resume
> consumption is to finish the validation successfully, but it can never
> finish. The consumer stays alive, yet consumes nothing. The only way to
> resume consumption is to terminate the consumer and start another one.
>
> We discovered this situation with a Kafka Streams application, where the
> valid value of `auto.offset.reset` provided by our code is replaced with
> `None` for the purpose of position reset
> (org.apache.kafka.streams.processor.internals.StreamThread#create).
> With Kafka Streams it is even worse: the application appears to keep
> working, logging warning messages of the form
> `Truncation detected for partition ...,` but no data is produced for a long
> time and it is eventually lost, making the Kafka Streams application
> unreliable.
>
> *Has anyone seen this already? Are there ways to reconfigure this
> behavior? I checked the 2.3, 2.4, and trunk clients - the bug is still
> there.*
>
> --
> Dmitry Sorokin
> mailto://dmitry.soro...@gmail.com
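
For reference, below is a minimal sketch of the handling pattern the documentation implies for `auto.offset.reset=none` (the pattern that, per the report above, breaks once the client gets stuck in AWAIT_VALIDATION): the application catches the reset/truncation exceptions from `poll()` and decides itself where to resume. The broker address, group id, topic, and the seek-to-beginning recovery choice are placeholders/assumptions, not taken from the report.

```
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.LogTruncationException;
import org.apache.kafka.clients.consumer.NoOffsetForPartitionException;
import org.apache.kafka.common.serialization.StringDeserializer;

public class NoResetPolicyConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");           // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // "none": no automatic reset; reset decisions are pushed to the application.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // placeholder topic
            while (true) {
                try {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    records.forEach(r -> System.out.println(r.offset() + ": " + r.value()));
                } catch (NoOffsetForPartitionException e) {
                    // No committed offset and no reset policy: choose a starting position explicitly.
                    consumer.seekToBeginning(e.partitions());
                } catch (LogTruncationException e) {
                    // Divergence detected (e.g. after an unclean leader election) and no reset policy:
                    // the application decides how to recover; rewinding is just one possible choice.
                    consumer.seekToBeginning(e.partitions());
                }
            }
        }
    }
}
```

The report's point is that with the post-2.3 validation path, the `LogTruncationException` never reaches this `catch` block: it is raised inside the future listener, swallowed in `sendAsyncRequest` because the future is already completed, and the consumer remains stuck in AWAIT_VALIDATION.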