[jira] [Commented] (KAFKA-10357) Handle accidental deletion of repartition-topics as exceptional failure

Rohan Desai (Jira) Thu, 13 Aug 2020 19:07:12 -0700


    [ https://issues.apache.org/jira/browse/KAFKA-10357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177425#comment-17177425 ]


Rohan Desai commented on KAFKA-10357:
-------------------------------------

> So maybe just fixing KAFKA-3370 and resetting the policy to `none` would fix
> it, and we just need an elegant way to shut down the whole application and
> notify the user when this exception gets thrown due to re-creation of the
> repartition topics. WDYT?
 
One issue here is that we're pushing the responsibility for handling this
scenario without data loss onto the application. Typically I'd expect most
applications that see this error to exit, since the app can no longer make
progress. However, most apps running in a production setting are wrapped in
some sort of retry loop. For example, someone just using Streams might run
their service under something like upstart and would typically configure it to
restart the process whenever it exits. Or they might be running in k8s, which
would start a new pod when a pod exits. In ksql we would just try to restart
the query. Even if we added the smarts to detect this case and not restart,
we'd need to persist this information somewhere so that it would still be
known after a restart. It seems preferable to me for Streams itself to be able
to detect when its internal state is invalid. Requiring explicit
initialization would be one way to do this.
 
> a new Streams client could be started before the rebalance that should report
> the error took place
 
I would expect a user to do this initialization as a manual step before
starting their application. I think it's fine for some initial configuration
not to be done automatically by Streams.
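To make the restart-loop argument concrete, here is a minimal, self-contained model (plain Python, no Kafka involved) contrasting the auto-recreate behavior described in the issue with an explicit-initialization approach. Everything here is a hypothetical illustration: `FakeCluster`, `initialize`, `on_rebalance_validate`, `on_rebalance_auto_create`, `run_under_supervisor` and the topic name are made up for this sketch and are not Kafka Streams APIs. The point it shows is that when the check is made against the cluster rather than against app-side memory, the failure is reported again on every restart instead of being papered over.

```python
# Hypothetical model of the "explicit initialization" proposal. None of these
# names are Kafka Streams APIs; the broker's topic list is simulated by a set.


class MissingRepartitionTopicError(Exception):
    """Fatal error raised instead of silently re-creating a repartition topic."""


class FakeCluster:
    """Stand-in for the broker: only tracks which topic names exist."""

    def __init__(self):
        self.topics = set()


def initialize(cluster, repartition_topics):
    """Manual one-time step the operator runs before the first start."""
    cluster.topics.update(repartition_topics)


def on_rebalance_auto_create(cluster, repartition_topics):
    """Behavior described in the issue: missing topics are silently re-created."""
    cluster.topics.update(repartition_topics)
    return "RUNNING"


def on_rebalance_validate(cluster, repartition_topics):
    """Proposed behavior: rebalances never create topics, they only check them."""
    missing = sorted(t for t in repartition_topics if t not in cluster.topics)
    if missing:
        raise MissingRepartitionTopicError(f"missing repartition topics: {missing}")
    return "RUNNING"


def run_under_supervisor(cluster, topics, on_rebalance, max_starts=3):
    """Simulates upstart/k8s restarting the process every time it exits."""
    outcomes = []
    for _ in range(max_starts):
        try:
            outcomes.append(on_rebalance(cluster, topics))
            break  # the app is running; the supervisor has nothing more to do
        except MissingRepartitionTopicError:
            outcomes.append("EXIT")  # process exits; the supervisor restarts it
    return outcomes


topics = {"my-app-agg-repartition"}  # hypothetical topic name

cluster = FakeCluster()
initialize(cluster, topics)
cluster.topics.discard("my-app-agg-repartition")  # accidental deletion
print(run_under_supervisor(cluster, topics, on_rebalance_validate))
# ['EXIT', 'EXIT', 'EXIT']: every restart reports the problem again

cluster = FakeCluster()
initialize(cluster, topics)
cluster.topics.discard("my-app-agg-repartition")  # accidental deletion
print(run_under_supervisor(cluster, topics, on_rebalance_auto_create))
# ['RUNNING']: the topic is re-created (empty, in reality) and processing continues
```

The design choice being illustrated is where the "was this already initialized?" knowledge lives: with the validate-only rebalance it lives in the cluster, so the application does not need to persist anything itself to avoid a silent restart.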

> Handle accidental deletion of repartition-topics as exceptional failure
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-10357
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10357
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Bruno Cadonna
>            Priority: Major
>
> Repartition topics are both written by Streams' producer and read by Streams'
> consumer, so when they are accidentally deleted both clients may be notified.
> But in practice the consumer would react to it much more quickly than the
> producer, since the latter has a delivery timeout expiration period (see
> https://issues.apache.org/jira/browse/KAFKA-10356). When the consumer reacts
> to it, it will re-join the group since the metadata changed, and during the
> triggered rebalance it would auto-recreate the topic and continue, silently
> losing data.
> One idea is to only create all repartition topics *once* in the first
> rebalance and not auto-create them any more in future rebalances; instead, a
> missing repartition topic would be treated similarly to the
> INCOMPLETE_SOURCE_TOPIC_METADATA error code
> (https://issues.apache.org/jira/browse/KAFKA-10355).
> The challenging part would be how to determine whether it is the first-ever
> rebalance, and there are several wild ideas I'd like to throw out here:
> 1) change the thread state transition diagram so that the STARTING state would
> not transition to PARTITION_REVOKED but only to PARTITION_ASSIGNED; then in
> the assign function we can check if the state is still in CREATED and not
> RUNNING.
> 2) augment the subscriptionInfo to encode whether or not this is the
> first-ever rebalance.
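Idea 2 can be sketched as a leader-side decision function. This is a hypothetical model, not the actual `StreamsPartitionAssignor` logic: `Subscription.first_ever_rebalance`, `assign`, and `ERR_MISSING_REPARTITION_TOPIC` are invented for illustration (only the name INCOMPLETE_SOURCE_TOPIC_METADATA comes from the issue). It also assumes that topics are created only when *every* member reports a first-ever rebalance, which is one possible reading of the idea. The second usage case shows the limitation raised in the comment: if the flag is derived from in-memory state, a full restart of all instances makes every member report `True` again, and the deleted topic is silently re-created.

```python
from dataclasses import dataclass

# Hypothetical sketch of idea 2: each member's subscription carries a flag
# saying whether this is its first-ever rebalance. ERR_MISSING_REPARTITION_TOPIC
# is a made-up code modeled on INCOMPLETE_SOURCE_TOPIC_METADATA.
ERR_NONE = 0
ERR_MISSING_REPARTITION_TOPIC = 1


@dataclass(frozen=True)
class Subscription:
    member_id: str
    first_ever_rebalance: bool  # hypothetical field in subscriptionInfo


def assign(subscriptions, required_topics, existing_topics):
    """Leader-side decision. Returns (error_code, topics_to_create)."""
    missing = sorted(set(required_topics) - set(existing_topics))
    if not missing:
        return ERR_NONE, []
    # Assumption: create topics only if *every* member is on its first-ever
    # rebalance, so a single restarted client cannot re-create a deleted topic.
    if all(s.first_ever_rebalance for s in subscriptions):
        return ERR_NONE, missing
    return ERR_MISSING_REPARTITION_TOPIC, []


required = ["my-app-agg-repartition"]  # hypothetical topic name

# Existing group, one member restarted after the topic was deleted: error.
print(assign([Subscription("a", False), Subscription("b", True)], required, []))
# (1, [])

# All members restarted (e.g. a full redeploy): if the flag comes from
# in-memory state, every member says True and the topic is silently re-created.
print(assign([Subscription("a", True), Subscription("b", True)], required, []))
# (0, ['my-app-agg-repartition'])
```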



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
