Hi Vito,

Haha, your archive game is on point!

What Matthias said in that email is essentially what I figured was the rationale. It makes sense, but the point I was making is that this really doesn’t seem like a good way to structure a production app. On the other hand, considering the exception fatal has a good chance of avoiding a frustrating debug session if you just forgot to call start.

Nevertheless, if we omit the categorization, it’s moot. It would be easy to add a categorization layer later if we want it, but not very easy to change it if we get it wrong.

Thanks for your consideration!
-John

On Wed, Jan 15, 2020, at 21:14, Vito Jeng wrote:
> Hi John,
>
> About `StreamsNotStartedException is strange` --
> The original idea came from Matthias, two years ago. :)
> You can reference here:
> https://mail-archives.apache.org/mod_mbox/kafka-dev/201806.mbox/%3c6c32083e-b63c-435b-521d-032d45cc5...@confluent.io%3e
>
> About omitting the categorization --
> It looks reasonable. I'm fine with omitting the categorization but not very
> sure it is a good choice.
> Do any other folks have an opinion?
>
>
> Hi, folks,
>
> I just updated KIP-216, please take a look.
>
> ---
> Vito
>
>
> On Thu, Jan 16, 2020 at 6:35 AM Vito Jeng <v...@is-land.com.tw> wrote:
>
> >
> > Hi, folks,
> >
> > Thank you for the suggestion, I really appreciate it. :)
> > I understand your concern. I'll merge StreamsNotRunningException and
> > StateStoreNotAvailableException.
> >
> >
> > ---
> > Vito
> >
> >
> > On Thu, Jan 16, 2020 at 6:22 AM John Roesler <vvcep...@apache.org> wrote:
> >
> >> Hey Vito,
> >>
> >> Yes, thanks for the KIP. Sorry the discussion has been so long.
> >> Hopefully, we can close it out soon.
> >>
> >> I agree we can drop StreamsNotRunningException in favor of
> >> just StateStoreNotAvailableException.
> >>
> >> Unfortunately, I have some higher-level concerns. The value
> >> of these exceptions is that they tell you how to handle the
> >> various situations that can arise while querying a distributed
> >> data store.
> >>
> >> Ideally, as a caller, I should be able to just catch "retriable" or
> >> "fatal" and handle them appropriately. Otherwise, there's no
> >> point in having categories, and we should just have all the
> >> exceptions extend InvalidStateStoreException.
> >>
> >> Presently, it's not possible to tell from just the
> >> "retriable"/"fatal" distinction what to do. You can tell
> >> from the descriptions of the various exceptions. E.g.:
> >>
> >> Retriable:
> >> * StreamsRebalancingException: the exact same call
> >>   should just be retried until the rebalance is complete
> >> * StateStoreMigratedException: the store handle is
> >>   now invalid, so you need to re-discover the instance
> >>   and get a new handle on that instance. In other words,
> >>   the query itself may be valid, but the particular method
> >>   invocation on this particular instance has encountered
> >>   a fatal exception.
> >>
> >> Fatal:
> >> * UnknownStateStoreException: this is truly fatal. No amount
> >>   of retrying or re-discovering is going to get you a handle on a
> >>   store that doesn't exist in the cluster.
> >> * StateStoreNotAvailableException: this is actually recoverable,
> >>   since the store might exist in the cluster, but isn't available on
> >>   this particular instance (which is shut down or whatever).
> >>
> >> Personally, I'm not a fan of code bureaucracy, so I'm 100% fine
> >> with omitting the categorization and just having 5 subclasses
> >> of InvalidStateStoreException. Each of them would tell you
> >> how to handle them, and it's not too many to really
> >> understand and handle each one.
> >>
> >> If you really want to have a middle tier, I'd recommend:
> >> * RetryableStateStoreException: the exact same call
> >>   should be repeated.
> >> * RecoverableStateStoreException: the store handle
> >>   should be discarded and the caller should re-discover
> >>   the location of the store and repeat the query on the
> >>   correct instance.
> >> * FatalStateStoreException: the query/request is totally
> >>   invalid and will never succeed.
> >>
> >> However, attempting to categorize the proposed exceptions
> >> reveals problems even with this categorization:
> >> Retriable:
> >> * StreamsRebalancingException
> >> Recoverable:
> >> * StateStoreMigratedException
> >> * StreamsNotRunningException
> >> Fatal:
> >> * UnknownStateStoreException
> >>
> >> But StreamsNotStartedException is strange... It means that
> >> one code path got a handle on a specific KafkaStreams object
> >> instance and sent it a query before another code path
> >> invoked the start() method on the exact same object instance.
> >> It seems like the most likely scenario is that whoever wrote
> >> the program just forgot to call start() before querying, in
> >> which case, retrying isn't going to help, and a fatal exception
> >> is more appropriate. I.e., it sounds like a "first 15 minutes
> >> experience" problem, and making it fatal would be more
> >> helpful. Even in a production context, there's no reason not
> >> to sequence your application startup such that you don't
> >> accept queries until after Streams is started. Thus, I guess
> >> I'd categorize it under "fatal".
> >>
> >> Regardless of whether you make it fatal or retriable, you'd
> >> still have a whole category with only one exception in it,
> >> and the other two categories only have two exceptions.
> >> Plus, as you pointed out in the KIP, you can't get all
> >> exceptions in all cases anyway:
> >> * store() can only throw NotStarted, NotRunning,
> >>   and Unknown
> >> * actual store queries can only throw Rebalancing,
> >>   Migrated, and NotRunning
> >>
> >> Thus, in practice also, there are exactly three categories
> >> and also exactly three exception types. It doesn't seem
> >> like there's a great advantage to the categories here.
> >> To avoid the categorization problem and also to clarify what
> >> exceptions can actually be thrown in different circumstances,
> >> it seems like we should just:
> >> * get rid of the middle tier and make all the exceptions
> >>   extend InvalidStateStoreException
> >> * drop StateStoreNotAvailableException in favor of
> >>   StreamsNotRunningException
> >> * clearly document on all public methods which exceptions
> >>   need to be handled
> >>
> >> How do you feel about this?
> >> Thanks,
> >> -John
> >>
> >> On Wed, Jan 15, 2020, at 15:13, Bill Bejeck wrote:
> >> > Thanks for the KIP, Vito.
> >> >
> >> > Overall the KIP LGTM, but I'd have to agree with others on merging the
> >> > `StreamsNotRunningException` and `StateStoreNotAvailableException`
> >> > classes.
> >> >
> >> > Since in both cases, the thread state is in `PENDING_SHUTDOWN ||
> >> > NOT_RUNNING || ERROR` I'm not even sure how we could distinguish when to
> >> > use the different exceptions. Maybe a good middle ground would be to
> >> > have a detailed exception message.
> >> >
> >> > The KIP freeze is close, so I think if we can agree on this, we can
> >> > wrap up the voting soon.
> >> >
> >> > Thanks,
> >> > Bill
> >> >
> >> > On Tue, Jan 14, 2020 at 2:12 PM Matthias J. Sax <matth...@confluent.io>
> >> > wrote:
> >> >
> >> > > Vito,
> >> > >
> >> > > It's still unclear to me what the advantage is, to have both
> >> > > `StreamsNotRunningException` and `StateStoreNotAvailableException`?
> >> > >
> >> > > For both cases, the state is `PENDING_SHUTDOWN / NOT_RUNNING / ERROR`
> >> > > and thus, from a user point of view, why does it matter if the store is
> >> > > closed or not? I don't understand why/how this information would be
> >> > > useful? Do you have a concrete example in mind how a user would react
> >> > > differently to both exceptions?
> >> > >
> >> > >
> >> > > @Vinoth: about `StreamsRebalancingException` -- to me, it seems best
> >> > > to actually do this on a per-query basis, i.e., have an overload
> >> > > `KafkaStreams#store(...)` that takes a boolean flag that allows one to
> >> > > _disable_ the exception and opt in to querying an active store during
> >> > > recovery. However, as KIP-535 actually introduces this change in
> >> > > behavior, I think KIP-216 should not cover this, but KIP-535 should be
> >> > > updated. I'll follow up on the other KIP thread to raise this point.
> >> > >
> >> > >
> >> > > -Matthias
> >> > >
> >> > > On 1/11/20 12:26 AM, Vito Jeng wrote:
> >> > > > Hi, Matthias & Vinoth,
> >> > > >
> >> > > > Thanks for the feedback.
> >> > > >
> >> > > >> What is still unclear to me is, what we gain by having both
> >> > > >> `StreamsNotRunningException` and `StateStoreNotAvailableException`. Both
> >> > > >> exceptions are thrown when KafkaStreams is in state PENDING_SHUTDOWN /
> >> > > >> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know if the
> >> > > >> state store is closed or not -- I can't query it anyway? Maybe I'm
> >> > > >> missing something, though?
> >> > > >
> >> > > > Yes, both `StreamsNotRunningException` and
> >> > > > `StateStoreNotAvailableException` are fatal exceptions.
> >> > > > But `StateStoreNotAvailableException` is a fatal exception related to
> >> > > > the state store.
> >> > > > I think it would be helpful if the user needs to distinguish these two
> >> > > > different cases in order to handle them.
> >> > > >
> >> > > > I'm not very sure, does that make sense?
> >> > > >
> >> > > >
> >> > > > ---
> >> > > > Vito
> >> > > >
> >> > > >
> >> > > > On Fri, Jan 10, 2020 at 3:35 AM Vinoth Chandar <vin...@apache.org>
> >> > > > wrote:
> >> > > >
> >> > > >> +1 on merging `StreamsNotRunningException` and
> >> > > >> `StateStoreNotAvailableException`, both exceptions are fatal anyway.
> >> > > >> IMO it's best to have these exceptions be about the state store (and not
> >> > > >> streams state), for easier understanding.
> >> > > >>
> >> > > >> Additionally, KIP-535 allows for querying of state stores in rebalancing
> >> > > >> state. So do we need the StreamsRebalancingException?
> >> > > >>
> >> > > >>
> >> > > >> On 2020/01/09 03:38:11, "Matthias J. Sax" <matth...@confluent.io> wrote:
> >> > > >>> Sorry that I dropped the ball on this...
> >> > > >>>
> >> > > >>> Thanks for updating the KIP. Overall LGTM now. Feel free to start a VOTE
> >> > > >>> thread.
> >> > > >>>
> >> > > >>> What is still unclear to me is, what we gain by having both
> >> > > >>> `StreamsNotRunningException` and `StateStoreNotAvailableException`. Both
> >> > > >>> exceptions are thrown when KafkaStreams is in state PENDING_SHUTDOWN /
> >> > > >>> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know if the
> >> > > >>> state store is closed or not -- I can't query it anyway? Maybe I'm
> >> > > >>> missing something, though?
> >> > > >>>
> >> > > >>>
> >> > > >>> -Matthias
> >> > > >>>
> >> > > >>>
> >> > > >>> On 11/3/19 6:07 PM, Vito Jeng wrote:
> >> > > >>>> Sorry for the late reply, thanks for the review.
> >> > > >>>>
> >> > > >>>>
> >> > > >>>>> About `StateStoreMigratedException`:
> >> > > >>>>>
> >> > > >>>>> Why is it only thrown if the state is REBALANCING? A store might be
> >> > > >>>>> migrated during a rebalance, and Kafka Streams might resume back to
> >> > > >>>>> RUNNING state and afterward somebody tries to use an old store handle.
> >> > > >>>>> Also, if state is REBALANCING, should we throw
> >> > > >>>>> `StreamThreadRebalancingException`? Hence, I think
> >> > > >>>>> `StateStoreMigratedException` does only make sense during `RUNNING` state.
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>> Thank you for pointing this out; already updated.
> >> > > >>>>
> >> > > >>>>
> >> > > >>>>> Why do we need to distinguish between `KafkaStreamsNotRunningException`
> >> > > >>>>> and `StateStoreNotAvailableException`?
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>> `KafkaStreamsNotRunningException` may be caused by various reasons. I think
> >> > > >>>> it would be helpful if the user could distinguish whether it is caused by
> >> > > >>>> the state store being closed.
> >> > > >>>> (Maybe I am wrong...)
> >> > > >>>>
> >> > > >>>>
> >> > > >>>>> Last, why do we distinguish between `KafkaStreams` instance and
> >> > > >>>>> `StreamsThread`? To me, it seems we should always refer to the instance,
> >> > > >>>>> because that is the level of granularity in which we enable/disable IQ atm.
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>> Totally agree. Do you mean the naming of state store exceptions?
> >> > > >>>> I don't have a special reason to distinguish these two.
> >> > > >>>> Your suggestion looks more reasonable for the exception naming.
> >> > > >>>>
> >> > > >>>>
> >> > > >>>>> Last, for `StateStoreMigratedException`, I would add that a user needs to
> >> > > >>>>> rediscover the store and cannot blindly retry as the store handle is
> >> > > >>>>> invalid and a new store handle must be retrieved. That is a difference
> >> > > >>>>> to `StreamThreadRebalancingException` that allows for "blind" retries
> >> > > >>>>> that either resolve (if the store is still on the same instance after
> >> > > >>>>> rebalancing finishes, or changes to `StateStoreMigratedException` if the
> >> > > >>>>> store was migrated away during rebalancing).
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>> Nice, it's great! Thank you.
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> The KIP is already updated, please take a look. :)
> >> > > >>>>
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> On Wed, Oct 23, 2019 at 1:48 PM Matthias J. Sax <matth...@confluent.io>
> >> > > >>>> wrote:
> >> > > >>>>
> >> > > >>>>> Any update on this KIP?
> >> > > >>>>>
> >> > > >>>>> On 10/7/19 3:35 PM, Matthias J. Sax wrote:
> >> > > >>>>>> Sorry for the late reply. The 2.4 deadline kept us quite busy.
> >> > > >>>>>>
> >> > > >>>>>> About `StateStoreMigratedException`:
> >> > > >>>>>>
> >> > > >>>>>> Why is it only thrown if the state is REBALANCING? A store might be
> >> > > >>>>>> migrated during a rebalance, and Kafka Streams might resume back to
> >> > > >>>>>> RUNNING state and afterward somebody tries to use an old store handle.
> >> > > >>>>>> Also, if state is REBALANCING, should we throw
> >> > > >>>>>> `StreamThreadRebalancingException`? Hence, I think
> >> > > >>>>>> `StateStoreMigratedException` does only make sense during `RUNNING` state.
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>> Why do we need to distinguish between `KafkaStreamsNotRunningException`
> >> > > >>>>>> and `StateStoreNotAvailableException`?
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>> Last, why do we distinguish between `KafkaStreams` instance and
> >> > > >>>>>> `StreamsThread`? To me, it seems we should always refer to the instance,
> >> > > >>>>>> because that is the level of granularity in which we enable/disable IQ atm.
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>> Last, for `StateStoreMigratedException`, I would add that a user needs to
> >> > > >>>>>> rediscover the store and cannot blindly retry as the store handle is
> >> > > >>>>>> invalid and a new store handle must be retrieved.
> >> > > >>>>>> That is a difference
> >> > > >>>>>> to `StreamThreadRebalancingException` that allows for "blind" retries
> >> > > >>>>>> that either resolve (if the store is still on the same instance after
> >> > > >>>>>> rebalancing finishes, or changes to `StateStoreMigratedException` if the
> >> > > >>>>>> store was migrated away during rebalancing).
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>> -Matthias
> >> > > >>>>>>
> >> > > >>>>>> On 8/9/19 10:20 AM, Vito Jeng wrote:
> >> > > >>>>>>> My bad. The short link `https://shorturl.at/CDNT9` seems incorrect.
> >> > > >>>>>>>
> >> > > >>>>>>> Please use the following instead: https://shorturl.at/bkKQU
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>> ---
> >> > > >>>>>>> Vito
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>> On Fri, Aug 9, 2019 at 10:53 AM Vito Jeng <v...@is-land.com.tw> wrote:
> >> > > >>>>>>>
> >> > > >>>>>>>> Thanks, Matthias!
> >> > > >>>>>>>>
> >> > > >>>>>>>>> About `StreamThreadNotStartedException`:
> >> > > >>>>>>>>
> >> > > >>>>>>>> Thank you for the explanation. I agree with your opinion.
> >> > > >>>>>>>> `CompositeReadOnlyXxxStore#get()` would never throw
> >> > > >>>>>>>> `StreamThreadNotStartedException`.
> >> > > >>>>>>>>
> >> > > >>>>>>>> For the case where the corresponding thread crashes after we handed out
> >> > > >>>>>>>> the store handle, we may throw `KafkaStreamsNotRunningException` or
> >> > > >>>>>>>> `StateStoreMigratedException`.
> >> > > >>>>>>>> In `StreamThreadStateStoreProvider`, we would throw
> >> > > >>>>>>>> `KafkaStreamsNotRunningException` when the stream thread is not running
> >> > > >>>>>>>> (https://shorturl.at/CDNT9) or throw `StateStoreMigratedException` when
> >> > > >>>>>>>> the store is closed (https://shorturl.at/hrvAN). So I think we do not
> >> > > >>>>>>>> need to add a new type for this case. Does that make sense?
> >> > > >>>>>>>>
> >> > > >>>>>>>>
> >> > > >>>>>>>>> About `KafkaStreamsNotRunningException` vs
> >> > > >>>>>>>>> `StreamThreadNotRunningException`:
> >> > > >>>>>>>>
> >> > > >>>>>>>> I understand your point. I renamed `StreamThreadNotRunningException` to
> >> > > >>>>>>>> `KafkaStreamsNotRunningException`.
> >> > > >>>>>>>>
> >> > > >>>>>>>>
> >> > > >>>>>>>> About checking unknown state store names:
> >> > > >>>>>>>> Thank you for the hint. I added a new type `UnknownStateStoreException`
> >> > > >>>>>>>> for this case.
> >> > > >>>>>>>>
> >> > > >>>>>>>>
> >> > > >>>>>>>>> Also, we should still have fatal exception
> >> > > >>>>>>>>> `StateStoreNotAvailableException`? Not sure why you remove it?
> >> > > >>>>>>>>
> >> > > >>>>>>>> Thank you for pointing this out; I have added it again.
> >> > > >>>>>>>>
> >> > > >>>>>>>> The KIP is already updated, please take a look.
> >> > > >>>>>>>>
> >> > > >>>>>>>> ---
> >> > > >>>>>>>> Vito
> >> > > >>>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>
> >> > > >>>>>
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>
> >> > > >>>
> >> > > >>
> >> > > >
> >> > >
> >> > >
> >> >
> >
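
For reference, the per-exception handling rules described in this thread (retry the exact same call, discard the handle and re-discover the store, or give up) could be sketched roughly as below. This is only an illustration under assumptions: the exception classes are local stand-ins declared in the snippet, not the real Kafka Streams classes; `Action` and `actionFor` are hypothetical names; and since the thread goes back and forth between `StreamsNotRunningException` and `StateStoreNotAvailableException` for the merged case, the sketch arbitrarily uses the former.

```java
public class StoreQueryHandling {

    // Hypothetical stand-ins for the flat hierarchy discussed in the thread.
    // These are NOT the real Kafka Streams classes.
    public static class InvalidStateStoreException extends RuntimeException {
        public InvalidStateStoreException(String message) { super(message); }
    }
    public static class StreamsNotStartedException extends InvalidStateStoreException {
        public StreamsNotStartedException(String message) { super(message); }
    }
    public static class StreamsRebalancingException extends InvalidStateStoreException {
        public StreamsRebalancingException(String message) { super(message); }
    }
    public static class StateStoreMigratedException extends InvalidStateStoreException {
        public StateStoreMigratedException(String message) { super(message); }
    }
    public static class StreamsNotRunningException extends InvalidStateStoreException {
        public StreamsNotRunningException(String message) { super(message); }
    }
    public static class UnknownStateStoreException extends InvalidStateStoreException {
        public UnknownStateStoreException(String message) { super(message); }
    }

    // What a caller might do next (hypothetical names).
    public enum Action { RETRY_SAME_CALL, REDISCOVER_STORE, FAIL }

    // Maps each exception to the handling described in the thread.
    public static Action actionFor(InvalidStateStoreException e) {
        if (e instanceof StreamsRebalancingException) {
            // The exact same call can be retried until the rebalance completes.
            return Action.RETRY_SAME_CALL;
        }
        if (e instanceof StateStoreMigratedException
                || e instanceof StreamsNotRunningException) {
            // The handle is no longer usable; the store may live on another instance.
            return Action.REDISCOVER_STORE;
        }
        // UnknownStateStoreException, StreamsNotStartedException (most likely a
        // forgotten start()), and anything unrecognized: retrying will not help.
        return Action.FAIL;
    }

    public static void main(String[] args) {
        InvalidStateStoreException[] samples = {
            new StreamsRebalancingException("rebalance in progress"),
            new StateStoreMigratedException("store moved"),
            new StreamsNotRunningException("instance shut down"),
            new UnknownStateStoreException("no such store"),
            new StreamsNotStartedException("start() not called")
        };
        for (InvalidStateStoreException e : samples) {
            System.out.println(e.getClass().getSimpleName() + " -> " + actionFor(e));
        }
    }
}
```

With only five subclasses, the mapping stays short enough to read in one place, which is the argument made above for skipping a middle tier.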