[ https://issues.apache.org/jira/browse/KAFKA-8994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinoth Chandar resolved KAFKA-8994. ----------------------------------- Resolution: Fixed > Streams should expose standby replication information & allow stale reads of > state store > ---------------------------------------------------------------------------------------- > > Key: KAFKA-8994 > URL: https://issues.apache.org/jira/browse/KAFKA-8994 > Project: Kafka > Issue Type: New Feature > Components: streams > Reporter: Vinoth Chandar > Assignee: Vinoth Chandar > Priority: Major > Labels: needs-kip > > Currently Streams interactive queries (IQ) fail during the time period where > there is a rebalance in progress. > Consider the following scenario in a three node Streams cluster with node A, > node S and node R, executing a stateful sub-topology/topic group with 1 > partition and `_num.standby.replicas=1_` > * *t0*: A is the active instance owning the partition, B is the standby that > keeps replicating the A's state into its local disk, R just routes streams > IQs to active instance using StreamsMetadata > * *t1*: IQs pick node R as router, R forwards query to A, A responds back to > R which reverse forwards back the results. > * *t2:* Active A instance is killed and rebalance begins. IQs start failing > to A > * *t3*: Rebalance assignment happens and standby B is now promoted as active > instance. IQs continue to fail > * *t4*: B fully catches up to changelog tail and rewinds offsets to A's last > commit position, IQs continue to fail > * *t5*: IQs to R, get routed to B, which is now ready to serve results. IQs > start succeeding again > > Depending on Kafka consumer group session/heartbeat timeouts, step t2,t3 can > take few seconds (~10 seconds based on defaults values). Depending on how > laggy the standby B was prior to A being killed, t4 can take few > seconds-minutes. > While this behavior favors consistency over availability at all times, the > long unavailability window might be undesirable for certain classes of > applications (e.g simple caches or dashboards). > This issue aims to also expose information about standby B to R, during each > rebalance such that the queries can be routed by an application to a standby > to serve stale reads, choosing availability over consistency. > > > > > > > > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)