Well, first of all, in current releases, if any timeout exception happens, the corresponding thread dies. Thus, if a standby task throws, it also impacts the active tasks of the same thread: the thread dies, and all of its active and standby tasks need to be redistributed to the remaining threads/instances via a rebalance.
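To make the failure mode concrete, here is a minimal, Kafka-free sketch of a single thread that serves both active and standby work. It is not the actual `StreamThread` code: the class `SharedThreadSketch`, the `SimulatedTimeout` exception, and the `runThread` parameters are all hypothetical names, and the loop is a deliberate simplification. It only illustrates that an exception on the standby side ends the whole loop, so the active work on that thread stops too.

```java
public class SharedThreadSketch {

    /** Hypothetical stand-in for a timeout raised by the restore consumer. */
    static final class SimulatedTimeout extends RuntimeException {
        SimulatedTimeout(String message) {
            super(message);
        }
    }

    /** Outcome of one simulated thread run. */
    static final class Result {
        final int activeBatchesProcessed;
        final boolean threadDied;

        Result(int activeBatchesProcessed, boolean threadDied) {
            this.activeBatchesProcessed = activeBatchesProcessed;
            this.threadDied = threadDied;
        }
    }

    /**
     * Runs a simplified loop: active work first, then standby work, repeated.
     * restoreFailsAt is the iteration at which the simulated restore consumer
     * times out; pass -1 for a run without failures.
     */
    static Result runThread(int plannedIterations, int restoreFailsAt) {
        int activeBatches = 0;
        try {
            for (int i = 0; i < plannedIterations; i++) {
                // Step 1: poll the "main consumer" and process active tasks.
                activeBatches++;

                // Step 2: poll the "restore consumer" and update standby tasks.
                if (i == restoreFailsAt) {
                    throw new SimulatedTimeout("restore consumer request timed out");
                }
            }
        } catch (SimulatedTimeout e) {
            // The exception escapes the loop, so the thread stops and no further
            // active batches run here. In the real system the tasks would be
            // reassigned via a rebalance.
            return new Result(activeBatches, true);
        }
        return new Result(activeBatches, false);
    }

    public static void main(String[] args) {
        Result r = runThread(5, 2);
        System.out.println("active batches processed: " + r.activeBatchesProcessed
                + ", thread died: " + r.threadDied);
    }
}
```

Calling `runThread(5, -1)` gives the failure-free baseline for comparison.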
We are actually improving this behavior in the upcoming `2.8.0` release via KIP-572.

Besides this, both consumers are used interleaved, i.e., the thread polls the main consumer, processes some records, polls the restore consumer, updates standby tasks, and so forth.

Does this answer your question?

-Matthias

On 2/9/21 6:50 AM, William Hovnanyan wrote:
> Hi,
>
> We are running KStreams application (2.6.1) with standby replicas set to 1.
>
> Recently one of the instances had an unexpected behaviour. We observed
> several DisconnectExceptions & TimeoutException in logs due to request
> timeouts for a single stream thread,
> logged by the internal restore consumer which is used by a standby task to
> consume store changelog topics
>
> Row thread timestamp logger level message exception
> 247
> <applicationName>-StreamThread-8
> 2021-02-08 16:06:05.425439 UTC
> org.apache.kafka.clients.NetworkClient
> DEBUG
> [Consumer clientId=<applicationName>-StreamThread-8-restore-consumer,
> groupId=null] Disconnecting from node 1596506249 due to request timeout.
> null
> 248
> <applicationName>-StreamThread-8
> 2021-02-08 16:06:05.425446 UTC
> org.apache.kafka.clients.NetworkClient
> DEBUG
> [Consumer clientId=<applicationName>-StreamThread-8-restore-consumer,
> groupId=null] Disconnecting from node 1802747700 due to request timeout.
> null
> 249
> <applicationName>-StreamThread-8
> 2021-02-08 16:06:05.425463 UTC
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient
> DEBUG
> [Consumer clientId=<applicationName>-StreamThread-8-restore-consumer,
> groupId=null] Cancelled request with header RequestHeader(apiKey=FETCH,
> apiVersion=11, clientId=<applicationName>-StreamThread-8-restore-consumer,
> correlationId=2102822) due to node 1596506249 being disconnected
> null
> 250
> <applicationName>-StreamThread-8
> 2021-02-08 16:06:05.425472 UTC
> org.apache.kafka.clients.FetchSessionHandler
> INFO
> [Consumer clientId=<applicationName>-StreamThread-8-restore-consumer,
> groupId=null] Error sending fetch request (sessionId=INVALID,
> epoch=INITIAL) to node 1596506249:
> org.apache.kafka.common.errors.DisconnectException: null
>
> After which the restore consumer was able to retry and connect. These are
> DEBUG/INFO level logs since there were no ERROR logs at all.
>
> However, the impact was that we were not processing events for some time
> with some of the active tasks in that instance, since the input message
> delay had spiked (calculated as CurrentTime-EventTime). At the same time we
> were not able to find anything concerning in application logs (even with
> DEBUG enabled) related to active tasks and the main consumer/producer used
> by them.
>
> So the question is, given that the standby and active tasks are sharing a
> thread, in case there is a timeout/disconnect errors in standby restore
> consumer, could that in theory impact the processing latency for active
> tasks as well?
>
> William Hovnanyan
> Software Engineer
> EMAIL whovnan...@twilio.com
>