[jira] [Created] (KAFKA-8994) Streams should expose standby replication information & allow stale reads of state store

Vinoth Chandar (Jira) Mon, 07 Oct 2019 16:22:30 -0700

Vinoth Chandar created KAFKA-8994:
-------------------------------------

             Summary: Streams should expose standby replication information & 
allow stale reads of state store
                 Key: KAFKA-8994
                 URL: https://issues.apache.org/jira/browse/KAFKA-8994
             Project: Kafka
          Issue Type: New Feature
          Components: streams
            Reporter: Vinoth Chandar



Currently Streams interactive queries (IQ) fail during the time period where 
there is a rebalance in progress. 

Consider the following scenario in a three node Streams cluster with node A, 
node S and node R, executing a stateful sub-topology/topic group with 1 
partition and `_num.standby.replicas=1_`  
 * *t0*: A is the active instance owning the partition, B is the standby that 
keeps replicating the A's state into its local disk, R just routes streams IQs 
to active instance using StreamsMetadata
 * *t1*: IQs pick node R as router, R forwards query to A, A responds back to R 
which reverse forwards back the results.
 * *t2:* Active A instance is killed and rebalance begins. IQs start failing to 
A
 * *t3*: Rebalance assignment happens and standby B is now promoted as active 
instance. IQs continue to fail
 * *t4*: B fully catches up to changelog tail and rewinds offsets to A's last 
commit position, IQs continue to fail
 * *t5*: IQs to R, get routed to B, which is now ready to serve results. IQs 
start succeeding again

 

Depending on Kafka consumer group session/heartbeat timeouts, step t2,t3 can 
take few seconds (~10 seconds based on defaults values). Depending on how laggy 
the standby B was prior to A being killed, t4 can take few seconds-minutes. 

 

While this behavior favors consistency over availability at all times, the long 
unavailability window might be undesirable for certain classes of applications 
(e.g simple caches or dashboards). 

 

This issue aims to also expose information about standby B to R, during each 
rebalance such that the queries can be routed by an application to a standby to 
serve stale reads, choosing availability over consistency. 

 

 

 

 

 

 

 

 

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (KAFKA-8994) Streams should expose standby replication information & allow stale reads of state store

Reply via email to