[ https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714311#comment-16714311 ]
Guozhang Wang commented on KAFKA-6144:
--------------------------------------

[~NaviBrar] As labeled above, this change is going to touch some public APIs (e.g. `StreamsMetadata`, which is returned from KafkaStreams#metadataForXXX / allMetadata, is a public class), so I'd suggest you start by writing a KIP page to summarize your thoughts. Writing the concrete proposal down can also help you think through edge cases such as the upgrade path and implementation details.

It seems you do not have an account for the wiki space yet (https://cwiki.apache.org/confluence/display/KAFKA/Index; it is separate from your JIRA account). Please ping me with your account and I can then add you to the permission list so that you can create a KIP: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

[~NIzhikov] I saw that you assigned this ticket to yourself. Are you also working on it in parallel?

> Allow state stores to serve stale reads during rebalance
> --------------------------------------------------------
>
>                 Key: KAFKA-6144
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6144
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Antony Stubbs
>            Assignee: Nikolay Izhikov
>            Priority: Major
>              Labels: needs-kip
>
> Currently when expanding the KS cluster, the new node's partitions will be unavailable during the rebalance, which for large states can take a very long time, or for small state stores even more than a few ms can be a deal breaker for micro service use cases.
> One workaround is to allow stale data to be read from the state stores when the use case allows.
> Relates to KAFKA-6145 - Warm up new KS instances before migrating tasks - potentially a two phase rebalance
> This is the description from KAFKA-6031 (keeping this JIRA as the title is more descriptive):
> {quote}
> Currently reads for a key are served by a single replica, which has 2 drawbacks:
>  - if the replica is down, there is downtime in serving reads for the keys it was responsible for until a standby replica takes over
>  - in case of semantic partitioning some replicas might become hot and there is no easy way to scale the read load
> If standby replicas had endpoints exposed in StreamsMetadata, it would enable serving reads from several replicas, which would mitigate the above drawbacks.
> Due to the lag between replicas, reading from multiple replicas simultaneously would have weaker (eventual) consistency compared to reads from a single replica. This however should be an acceptable tradeoff in many cases.
> {quote}
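
To make the standby-read idea from the quoted description concrete, here is a minimal sketch of how a caller might choose which replica serves a read. It is only an illustration under stated assumptions: `StaleReadRouter`, `Replica`, `pick`, and the `allowStale` flag are hypothetical names invented for this example and are not part of the Kafka Streams API or of any proposal on this ticket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model only: none of these names exist in Kafka Streams.
public class StaleReadRouter {

    /** One instance hosting a copy of a store partition (illustrative type). */
    public static final class Replica {
        public final String host;
        public final int port;
        public final boolean active;     // true = active task, false = standby
        public final boolean reachable;  // false while down or still restoring

        public Replica(String host, int port, boolean active, boolean reachable) {
            this.host = host;
            this.port = port;
            this.active = active;
            this.reachable = reachable;
        }
    }

    /**
     * Picks a replica to serve a read for one key's partition.
     * A reachable active replica is always preferred. If none is reachable
     * and the caller accepts stale data, the first reachable standby is used.
     */
    public static Optional<Replica> pick(List<Replica> replicas, boolean allowStale) {
        for (Replica r : replicas) {
            if (r.active && r.reachable) {
                return Optional.of(r);
            }
        }
        if (allowStale) {
            for (Replica r : replicas) {
                if (!r.active && r.reachable) {
                    return Optional.of(r);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Replica> replicas = new ArrayList<>();
        replicas.add(new Replica("host-a", 8080, true, false));  // active, unavailable
        replicas.add(new Replica("host-b", 8080, false, true));  // standby, available

        // Strict read: no reachable active replica, so nothing is returned.
        System.out.println(pick(replicas, false).isPresent());
        // Stale-tolerant read: falls back to the standby.
        System.out.println(pick(replicas, true).map(r -> r.host).orElse("none"));
    }
}
```

The sketch keeps the consistency tradeoff explicit: a caller that passes `allowStale = false` sees the current behaviour (no answer while the active replica is unavailable), while a caller that opts in accepts possibly lagging data from a standby.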