Hi John, About `StreamsNotStartedException is strange` -- The original idea came from Matthias, two years ago. :) You can reference here: https://mail-archives.apache.org/mod_mbox/kafka-dev/201806.mbox/%3c6c32083e-b63c-435b-521d-032d45cc5...@confluent.io%3e
About omitting the categorization -- It looks reasonable. I'm fine with omitting the categorization but not very sure it is a good choice. Does any other folks provide opinion? Hi, folks, Just update the KIP-216, please take a look. --- Vito On Thu, Jan 16, 2020 at 6:35 AM Vito Jeng <v...@is-land.com.tw> wrote: > > Hi, folks, > > Thank you suggestion, really appreciate it. :) > I understand your concern. I'll merge StreamsNotRunningException and > StateStoreNotAvailableException. > > > --- > Vito > > > On Thu, Jan 16, 2020 at 6:22 AM John Roesler <vvcep...@apache.org> wrote: > >> Hey Vito, >> >> Yes, thanks for the KIP. Sorry the discussion has been so long. >> Hopefully, we can close it out soon. >> >> I agree we can drop StreamsNotRunningException in favor of >> just StateStoreNotAvailableException. >> >> Unfortunately, I have some higher-level concerns. The value >> of these exceptions is that they tell you how to handle the >> various situations that can arise while querying a distributed >> data store. >> >> Ideally, as a caller, I should be able to just catch "retriable" or >> "fatal" and handle them appropriately. Otherwise, there's no >> point in having categories, and we should just have all the >> exceptions extend InvalidStateStoreException. >> >> Presently, it's not possible to tell from just the >> "retriable"/"fatal" distinction what to do. You can tell >> from the descriptions of the various exceptions. E.g.: >> >> Retriable: >> * StreamsRebalancingException: the exact same call >> should just be retried until the rebalance is complete >> * StateStoreMigratedException: the store handle is >> now invalid, so you need to re-discover the instance >> and get a new handle on that instance. In other words, >> the query itself may be valid, but the particular method >> invocation on this particular instance has encountered >> a fatal exception. >> >> Fatal: >> * UnknownStateStoreException: this is truly fatal. No amount >> of retrying or re-discovering is going to get you a handle on a >> store that doesn't exist in the cluster. >> * StateStoreNotAvailableException: this is actually recoverable, >> since the store might exist in the cluster, but isn't available on >> this particular instance (which is shut down or whatever). >> >> Personally, I'm not a fan of code bureaucracy, so I'm 100% fine >> with omitting the categorization and just having 5 subclasses >> of InvalidStateStoreException. Each of them would tell you >> how to handle them, and it's not too many to really >> understand and handle each one. >> >> If you really want to have a middle tier, I'd recommend: >> * RetryableStateStoreException: the exact same call >> should be repeated. >> * RecoverableStateStoreException: the store handle >> should be discarded and the caller should re-discover >> the location of the store and repeat the query on the >> correct instance. >> * FatalStateStoreException: the query/request is totally >> invalid and will never succeed. >> >> However, attempting to categorize the proposed exceptions >> reveals even problems with this categorization: >> Retriable: >> * StreamsRebalancingException >> Recoverable: >> * StateStoreMigratedException >> * StreamsNotRunningException >> Fatal: >> * UnknownStateStoreException >> >> But StreamsNotStartedException is strange... It means that >> one code path got a handle on a specific KafkaStreams object >> instance and sent it a query before another code path >> invoked the start() method on the exact same object instance. >> It seems like the most likely scenario is that whoever wrote >> the program just forgot to call start() before querying, in >> which case, retrying isn't going to help, and a fatal exception >> is more appropriate. I.e., it sounds like a "first 15 minutes >> experience" problem, and making it fatal would be more >> helpful. Even in a production context, there's no reason not >> to sequence your application startup such that you don't >> accept queries until after Streams is started. Thus, I guess >> I'd categorize it under "fatal". >> >> Regardless of whether you make it fatal or retriable, you'd >> still have a whole category with only one exception in it, >> and the other two categories only have two exceptions. >> Plus, as you pointed out in the KIP, you can't get all >> exceptions in all cases anyway: >> * store() can only throw NotStarted, NotRunning, >> and Unknown >> * actual store queries can only throw Rebalancing, >> Migrated, and NotRunning >> >> Thus, in practice also, there are exactly three categories >> and also exactly three exception types. It doesn't seem >> like there's a great advantage to the categories here. To >> avoid the categorization problem and also to clarify what >> exceptions can actually be thrown in different circumstances, >> it seems like we should just: >> * get rid of the middle tier and make all the exceptions >> extend InvalidStateStoreException >> * drop StateStoreNotAvailableException in favor of >> StreamsNotRunningException >> * clearly document on all public methods which exceptions >> need to be handled >> >> How do you feel about this? >> Thanks, >> -John >> >> On Wed, Jan 15, 2020, at 15:13, Bill Bejeck wrote: >> > Thanks for KIP Vito. >> > >> > Overall the KIP LGTM, but I'd have to agree with others on merging the >> > `StreamsNotRunningException` and `StateStoreNotAvailableException` >> classes. >> > >> > Since in both cases, the thread state is in `PENDING_SHUTDOWN || >> > NOT_RUNNING || ERROR` I'm not even sure how we could distinguish when to >> > use the different >> > exceptions. Maybe a good middle ground would be to have a detailed >> > exception message. >> > >> > The KIP freeze is close, so I think if we can agree on this, we can >> wrap up >> > the voting soon. >> > >> > Thanks, >> > Bill >> > >> > On Tue, Jan 14, 2020 at 2:12 PM Matthias J. Sax <matth...@confluent.io> >> > wrote: >> > >> > > Vito, >> > > >> > > It's still unclear to me what the advantage is, to have both >> > > `StreamsNotRunningException` and `StateStoreNotAvailableException`? >> > > >> > > For both cased, the state is `PENDING_SHUTDOWN / NOT_RUNNING / ERROR` >> > > and thus, for a user point of view, why does it matter if the store is >> > > closed on not? I don't understand why/how this information would be >> > > useful? Do you have a concrete example in mind how a user would react >> > > differently to both exceptions? >> > > >> > > >> > > @Vinoth: about `StreamsRebalancingException` -- to me, it seems best >> to >> > > actually do this on a per-query basis, ie, have an overload >> > > `KafkaStreams#store(...)` that takes a boolean flag that allow to >> > > _disable_ the exception and opt-in to query a active store during >> > > recovery. However, as KIP-535 actually introduces this change in >> > > behavior, I think KIP-216 should not cover this, but KIP-535 should be >> > > updated. I'll follow up on the other KIP thread to raise this point. >> > > >> > > >> > > -Matthias >> > > >> > > On 1/11/20 12:26 AM, Vito Jeng wrote: >> > > > Hi, Matthias & Vinoth, >> > > > >> > > > Thanks for the feedback. >> > > > >> > > >> What is still unclear to me is, what we gain by having both >> > > >> `StreamsNotRunningException` and >> `StateStoreNotAvailableException`. Both >> > > >> exception are thrown when KafkaStreams is in state >> PENDING_SHUTDOWN / >> > > >> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know if the >> > > >> state store is closed on not -- I can't query it anyway? Maybe I >> miss >> > > >> something thought? >> > > > >> > > > Yes, both `StreamsNotRunningException` and >> > > > `StateStoreNotAvailableException` are fatal exception. >> > > > But `StateStoreNotAvailableException` is fatal exception about state >> > > store >> > > > related. >> > > > I think it would be helpful that if user need to distinguish these >> two >> > > > different case to handle it. >> > > > >> > > > I'm not very sure, does that make sense? >> > > > >> > > > >> > > > --- >> > > > Vito >> > > > >> > > > >> > > > On Fri, Jan 10, 2020 at 3:35 AM Vinoth Chandar <vin...@apache.org> >> > > wrote: >> > > > >> > > >> +1 on merging `StreamsNotRunningException` and >> > > >> `StateStoreNotAvailableException`, both exceptions are fatal >> anyway. IMO >> > > >> its best to have these exceptions be about the state store (and not >> > > streams >> > > >> state), to easier understanding. >> > > >> >> > > >> Additionally, KIP-535 allows for querying of state stores in >> rebalancing >> > > >> state. So do we need the StreamsRebalancingException? >> > > >> >> > > >> >> > > >> On 2020/01/09 03:38:11, "Matthias J. Sax" <matth...@confluent.io> >> > > wrote: >> > > >>> Sorry that I dropped the ball on this... >> > > >>> >> > > >>> Thanks for updating the KIP. Overall LGTM now. Feel free to start >> a >> > > VOTE >> > > >>> thread. >> > > >>> >> > > >>> What is still unclear to me is, what we gain by having both >> > > >>> `StreamsNotRunningException` and >> `StateStoreNotAvailableException`. >> > > Both >> > > >>> exception are thrown when KafkaStreams is in state >> PENDING_SHUTDOWN / >> > > >>> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know if >> the >> > > >>> state store is closed on not -- I can't query it anyway? Maybe I >> miss >> > > >>> something thought? >> > > >>> >> > > >>> >> > > >>> -Matthias >> > > >>> >> > > >>> >> > > >>> On 11/3/19 6:07 PM, Vito Jeng wrote: >> > > >>>> Sorry for the late reply, thanks for the review. >> > > >>>> >> > > >>>> >> > > >>>>> About `StateStoreMigratedException`: >> > > >>>>> >> > > >>>>> Why is it only thrown if the state is REBALANCING? A store >> might be >> > > >>>>> migrated during a rebalance, and Kafka Streams might resume >> back to >> > > >>>>> RUNNING state and afterward somebody tries to use an old store >> > > handle. >> > > >>>>> Also, if state is REBALANCING, should we throw >> > > >>>>> `StreamThreadRebalancingException`? Hence, I think >> > > >>>>> `StateStoreMigratedException` does only make sense during >> `RUNNING` >> > > >> state. >> > > >>>>> >> > > >>>> >> > > >>>> Thank you point this, already updated. >> > > >>>> >> > > >>>> >> > > >>>> Why do we need to distinguish between >> > > `KafkaStreamsNotRunningException` >> > > >>>>> and `StateStoreNotAvailableException`? >> > > >>>>> >> > > >>>> >> > > >>>> `KafkaStreamsNotRunningException` may be caused by various >> reasons, I >> > > >> think >> > > >>>> it would be helpful that the >> > > >>>> user can distinguish whether it is caused by the state store >> closed. >> > > >>>> (Maybe I am wrong...) >> > > >>>> >> > > >>>> >> > > >>>> Last, why do we distinguish between `KafkaStreams` instance and >> > > >>>>> `StreamsThread`? To me, it seems we should always refer to the >> > > >> instance, >> > > >>>>> because that is the level of granularity in which we >> enable/disable >> > > >> IQ atm. >> > > >>>>> >> > > >>>> >> > > >>>> Totally agree. Do you mean the naming of state store exceptions? >> > > >>>> I don't have special reason to distinguish these two. >> > > >>>> Your suggestion look more reasonable for the exception naming. >> > > >>>> >> > > >>>> >> > > >>>> Last, for `StateStoreMigratedException`, I would add that a user >> need >> > > >> to >> > > >>>>> rediscover the store and cannot blindly retry as the store >> handle is >> > > >>>>> invalid and a new store handle must be retrieved. That is a >> > > difference >> > > >>>>> to `StreamThreadRebalancingException` that allows for "blind" >> retries >> > > >>>>> that either resolve (if the store is still on the same instance >> after >> > > >>>>> rebalancing finishes, or changes to >> `StateStoreMigratedException` if >> > > >> the >> > > >>>>> store was migrated away during rebalancing). >> > > >>>>> >> > > >>>> >> > > >>>> Nice, it's great! Thank you. >> > > >>>> >> > > >>>> >> > > >>>> The KIP already updated, please take a look. :) >> > > >>>> >> > > >>>> >> > > >>>> >> > > >>>> On Wed, Oct 23, 2019 at 1:48 PM Matthias J. Sax < >> > > matth...@confluent.io >> > > >>> >> > > >>>> wrote: >> > > >>>> >> > > >>>>> Any update on this KIP? >> > > >>>>> >> > > >>>>> On 10/7/19 3:35 PM, Matthias J. Sax wrote: >> > > >>>>>> Sorry for the late reply. The 2.4 deadline kept us quite busy. >> > > >>>>>> >> > > >>>>>> About `StateStoreMigratedException`: >> > > >>>>>> >> > > >>>>>> Why is it only thrown if the state is REBALANCING? A store >> might be >> > > >>>>>> migrated during a rebalance, and Kafka Streams might resume >> back to >> > > >>>>>> RUNNING state and afterward somebody tries to use an old store >> > > >> handle. >> > > >>>>>> Also, if state is REBALANCING, should we throw >> > > >>>>>> `StreamThreadRebalancingException`? Hence, I think >> > > >>>>>> `StateStoreMigratedException` does only make sense during >> `RUNNING` >> > > >>>>> state. >> > > >>>>>> >> > > >>>>>> >> > > >>>>>> Why do we need to distinguish between >> > > >> `KafkaStreamsNotRunningException` >> > > >>>>>> and `StateStoreNotAvailableException`? >> > > >>>>>> >> > > >>>>>> >> > > >>>>>> Last, why do we distinguish between `KafkaStreams` instance and >> > > >>>>>> `StreamsThread`? To me, it seems we should always refer to the >> > > >> instance, >> > > >>>>>> because that is the level of granularity in which we >> enable/disable >> > > >> IQ >> > > >>>>> atm. >> > > >>>>>> >> > > >>>>>> >> > > >>>>>> Last, for `StateStoreMigratedException`, I would add that a >> user >> > > >> need to >> > > >>>>>> rediscover the store and cannot blindly retry as the store >> handle is >> > > >>>>>> invalid and a new store handle must be retrieved. That is a >> > > >> difference >> > > >>>>>> to `StreamThreadRebalancingException` that allows for "blind" >> > > retries >> > > >>>>>> that either resolve (if the store is still on the same instance >> > > after >> > > >>>>>> rebalancing finishes, or changes to >> `StateStoreMigratedException` if >> > > >> the >> > > >>>>>> store was migrated away during rebalancing). >> > > >>>>>> >> > > >>>>>> >> > > >>>>>> >> > > >>>>>> -Matthias >> > > >>>>>> >> > > >>>>>> On 8/9/19 10:20 AM, Vito Jeng wrote: >> > > >>>>>>> My bad. The short link `https://shorturl.at/CDNT9` >> <https://shorturl.at/CDNT9> >> > > <https://shorturl.at/CDNT9> >> > > >> <https://shorturl.at/CDNT9> >> > > >>>>> <https://shorturl.at/CDNT9> >> > > >>>>>>> <https://shorturl.at/CDNT9> seems incorrect. >> > > >>>>>>> >> > > >>>>>>> Please use the following instead: https://shorturl.at/bkKQU >> > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> --- >> > > >>>>>>> Vito >> > > >>>>>>> >> > > >>>>>>> >> > > >>>>>>> On Fri, Aug 9, 2019 at 10:53 AM Vito Jeng < >> v...@is-land.com.tw> >> > > >> wrote: >> > > >>>>>>> >> > > >>>>>>>> Thanks, Matthias! >> > > >>>>>>>> >> > > >>>>>>>>> About `StreamThreadNotStartedException`: >> > > >>>>>>>> >> > > >>>>>>>> Thank you for explanation. I agree with your opinion. >> > > >>>>>>>> `CompositeReadOnlyXxxStore#get()` would never throw >> > > >>>>>>>> `StreamThreadNotStartedException`. >> > > >>>>>>>> >> > > >>>>>>>> For the case that corresponding thread crashes after we >> handed out >> > > >> the >> > > >>>>>>>> store handle. We may throw `KafkaStreamsNotRunningException` >> or >> > > >>>>>>>> `StateStoreMigratedException`. >> > > >>>>>>>> In `StreamThreadStateStoreProvider`, we would throw >> > > >>>>>>>> `KafkaStreamsNotRunningException` when stream thread is not >> > > >> running( >> > > >>>>>>>> https://shorturl.at/CDNT9) or throw >> `StateStoreMigratedException` >> > > >> when >> > > >>>>>>>> store is closed(https://shorturl.at/hrvAN). So I think we >> do not >> > > >> need >> > > >>>>> to >> > > >>>>>>>> add a new type for this case. Does that make sense? >> > > >>>>>>>> >> > > >>>>>>>> >> > > >>>>>>>>> About `KafkaStreamsNotRunningException` vs >> > > >>>>>>>> `StreamThreadNotRunningException`: >> > > >>>>>>>> >> > > >>>>>>>> I understand your point. I rename >> > > >> `StreamThreadNotRunningException` to >> > > >>>>>>>> `KafkaStreamsNotRunningException`. >> > > >>>>>>>> >> > > >>>>>>>> >> > > >>>>>>>> About check unknown state store names: >> > > >>>>>>>> Thank you for the hint. I add a new type >> > > >> `UnknownStateStoreException` >> > > >>>>> for >> > > >>>>>>>> this case. >> > > >>>>>>>> >> > > >>>>>>>> >> > > >>>>>>>>> Also, we should still have fatal exception >> > > >>>>>>>> `StateStoreNotAvailableException`? Not sure why you remove >> it? >> > > >>>>>>>> >> > > >>>>>>>> Thank you point this, already add it again. >> > > >>>>>>>> >> > > >>>>>>>> The KIP already updated, please take a look. >> > > >>>>>>>> >> > > >>>>>>>> --- >> > > >>>>>>>> Vito >> > > >>>>>>>> >> > > >>>>>>> >> > > >>>>>> >> > > >>>>> >> > > >>>>> >> > > >>>> >> > > >>> >> > > >>> >> > > >> >> > > > >> > > >> > > >> > >> >