Vito, Thanks for the updates, the KIP LGTM.
-Bill On Wed, Jan 15, 2020 at 11:31 PM John Roesler <vvcep...@apache.org> wrote: > Hi Vito, > > Haha, your archive game is on point! > > What Matthias said in that email is essentially what I figured was the > rationale. It makes sense, but the point I was making is that this really > doesn’t seem like a good way to structure a production app. On the other > hand, considering the exception fatal has a good chance of avoiding a > frustrating debug session if you just forgot to call start. > > Nevertheless, if we omit the categorization, it’s moot. > > It would be easy to add a categorization layer later if we want it, but > not very easy to change it if we get it wrong. > > Thanks for your consideration! > -John > > On Wed, Jan 15, 2020, at 21:14, Vito Jeng wrote: > > Hi John, > > > > About `StreamsNotStartedException is strange` -- > > The original idea came from Matthias, two years ago. :) > > You can reference here: > > > https://mail-archives.apache.org/mod_mbox/kafka-dev/201806.mbox/%3c6c32083e-b63c-435b-521d-032d45cc5...@confluent.io%3e > > > > About omitting the categorization -- > > It looks reasonable. I'm fine with omitting the categorization but not > very > > sure it is a good choice. > > Does any other folks provide opinion? > > > > > > Hi, folks, > > > > Just update the KIP-216, please take a look. > > > > --- > > Vito > > > > > > On Thu, Jan 16, 2020 at 6:35 AM Vito Jeng <v...@is-land.com.tw> wrote: > > > > > > > > Hi, folks, > > > > > > Thank you suggestion, really appreciate it. :) > > > I understand your concern. I'll merge StreamsNotRunningException and > > > StateStoreNotAvailableException. > > > > > > > > > --- > > > Vito > > > > > > > > > On Thu, Jan 16, 2020 at 6:22 AM John Roesler <vvcep...@apache.org> > wrote: > > > > > >> Hey Vito, > > >> > > >> Yes, thanks for the KIP. Sorry the discussion has been so long. > > >> Hopefully, we can close it out soon. > > >> > > >> I agree we can drop StreamsNotRunningException in favor of > > >> just StateStoreNotAvailableException. > > >> > > >> Unfortunately, I have some higher-level concerns. The value > > >> of these exceptions is that they tell you how to handle the > > >> various situations that can arise while querying a distributed > > >> data store. > > >> > > >> Ideally, as a caller, I should be able to just catch "retriable" or > > >> "fatal" and handle them appropriately. Otherwise, there's no > > >> point in having categories, and we should just have all the > > >> exceptions extend InvalidStateStoreException. > > >> > > >> Presently, it's not possible to tell from just the > > >> "retriable"/"fatal" distinction what to do. You can tell > > >> from the descriptions of the various exceptions. E.g.: > > >> > > >> Retriable: > > >> * StreamsRebalancingException: the exact same call > > >> should just be retried until the rebalance is complete > > >> * StateStoreMigratedException: the store handle is > > >> now invalid, so you need to re-discover the instance > > >> and get a new handle on that instance. In other words, > > >> the query itself may be valid, but the particular method > > >> invocation on this particular instance has encountered > > >> a fatal exception. > > >> > > >> Fatal: > > >> * UnknownStateStoreException: this is truly fatal. No amount > > >> of retrying or re-discovering is going to get you a handle on a > > >> store that doesn't exist in the cluster. > > >> * StateStoreNotAvailableException: this is actually recoverable, > > >> since the store might exist in the cluster, but isn't available on > > >> this particular instance (which is shut down or whatever). > > >> > > >> Personally, I'm not a fan of code bureaucracy, so I'm 100% fine > > >> with omitting the categorization and just having 5 subclasses > > >> of InvalidStateStoreException. Each of them would tell you > > >> how to handle them, and it's not too many to really > > >> understand and handle each one. > > >> > > >> If you really want to have a middle tier, I'd recommend: > > >> * RetryableStateStoreException: the exact same call > > >> should be repeated. > > >> * RecoverableStateStoreException: the store handle > > >> should be discarded and the caller should re-discover > > >> the location of the store and repeat the query on the > > >> correct instance. > > >> * FatalStateStoreException: the query/request is totally > > >> invalid and will never succeed. > > >> > > >> However, attempting to categorize the proposed exceptions > > >> reveals even problems with this categorization: > > >> Retriable: > > >> * StreamsRebalancingException > > >> Recoverable: > > >> * StateStoreMigratedException > > >> * StreamsNotRunningException > > >> Fatal: > > >> * UnknownStateStoreException > > >> > > >> But StreamsNotStartedException is strange... It means that > > >> one code path got a handle on a specific KafkaStreams object > > >> instance and sent it a query before another code path > > >> invoked the start() method on the exact same object instance. > > >> It seems like the most likely scenario is that whoever wrote > > >> the program just forgot to call start() before querying, in > > >> which case, retrying isn't going to help, and a fatal exception > > >> is more appropriate. I.e., it sounds like a "first 15 minutes > > >> experience" problem, and making it fatal would be more > > >> helpful. Even in a production context, there's no reason not > > >> to sequence your application startup such that you don't > > >> accept queries until after Streams is started. Thus, I guess > > >> I'd categorize it under "fatal". > > >> > > >> Regardless of whether you make it fatal or retriable, you'd > > >> still have a whole category with only one exception in it, > > >> and the other two categories only have two exceptions. > > >> Plus, as you pointed out in the KIP, you can't get all > > >> exceptions in all cases anyway: > > >> * store() can only throw NotStarted, NotRunning, > > >> and Unknown > > >> * actual store queries can only throw Rebalancing, > > >> Migrated, and NotRunning > > >> > > >> Thus, in practice also, there are exactly three categories > > >> and also exactly three exception types. It doesn't seem > > >> like there's a great advantage to the categories here. To > > >> avoid the categorization problem and also to clarify what > > >> exceptions can actually be thrown in different circumstances, > > >> it seems like we should just: > > >> * get rid of the middle tier and make all the exceptions > > >> extend InvalidStateStoreException > > >> * drop StateStoreNotAvailableException in favor of > > >> StreamsNotRunningException > > >> * clearly document on all public methods which exceptions > > >> need to be handled > > >> > > >> How do you feel about this? > > >> Thanks, > > >> -John > > >> > > >> On Wed, Jan 15, 2020, at 15:13, Bill Bejeck wrote: > > >> > Thanks for KIP Vito. > > >> > > > >> > Overall the KIP LGTM, but I'd have to agree with others on merging > the > > >> > `StreamsNotRunningException` and `StateStoreNotAvailableException` > > >> classes. > > >> > > > >> > Since in both cases, the thread state is in `PENDING_SHUTDOWN || > > >> > NOT_RUNNING || ERROR` I'm not even sure how we could distinguish > when to > > >> > use the different > > >> > exceptions. Maybe a good middle ground would be to have a detailed > > >> > exception message. > > >> > > > >> > The KIP freeze is close, so I think if we can agree on this, we can > > >> wrap up > > >> > the voting soon. > > >> > > > >> > Thanks, > > >> > Bill > > >> > > > >> > On Tue, Jan 14, 2020 at 2:12 PM Matthias J. Sax < > matth...@confluent.io> > > >> > wrote: > > >> > > > >> > > Vito, > > >> > > > > >> > > It's still unclear to me what the advantage is, to have both > > >> > > `StreamsNotRunningException` and > `StateStoreNotAvailableException`? > > >> > > > > >> > > For both cased, the state is `PENDING_SHUTDOWN / NOT_RUNNING / > ERROR` > > >> > > and thus, for a user point of view, why does it matter if the > store is > > >> > > closed on not? I don't understand why/how this information would > be > > >> > > useful? Do you have a concrete example in mind how a user would > react > > >> > > differently to both exceptions? > > >> > > > > >> > > > > >> > > @Vinoth: about `StreamsRebalancingException` -- to me, it seems > best > > >> to > > >> > > actually do this on a per-query basis, ie, have an overload > > >> > > `KafkaStreams#store(...)` that takes a boolean flag that allow to > > >> > > _disable_ the exception and opt-in to query a active store during > > >> > > recovery. However, as KIP-535 actually introduces this change in > > >> > > behavior, I think KIP-216 should not cover this, but KIP-535 > should be > > >> > > updated. I'll follow up on the other KIP thread to raise this > point. > > >> > > > > >> > > > > >> > > -Matthias > > >> > > > > >> > > On 1/11/20 12:26 AM, Vito Jeng wrote: > > >> > > > Hi, Matthias & Vinoth, > > >> > > > > > >> > > > Thanks for the feedback. > > >> > > > > > >> > > >> What is still unclear to me is, what we gain by having both > > >> > > >> `StreamsNotRunningException` and > > >> `StateStoreNotAvailableException`. Both > > >> > > >> exception are thrown when KafkaStreams is in state > > >> PENDING_SHUTDOWN / > > >> > > >> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know > if the > > >> > > >> state store is closed on not -- I can't query it anyway? Maybe > I > > >> miss > > >> > > >> something thought? > > >> > > > > > >> > > > Yes, both `StreamsNotRunningException` and > > >> > > > `StateStoreNotAvailableException` are fatal exception. > > >> > > > But `StateStoreNotAvailableException` is fatal exception about > state > > >> > > store > > >> > > > related. > > >> > > > I think it would be helpful that if user need to distinguish > these > > >> two > > >> > > > different case to handle it. > > >> > > > > > >> > > > I'm not very sure, does that make sense? > > >> > > > > > >> > > > > > >> > > > --- > > >> > > > Vito > > >> > > > > > >> > > > > > >> > > > On Fri, Jan 10, 2020 at 3:35 AM Vinoth Chandar < > vin...@apache.org> > > >> > > wrote: > > >> > > > > > >> > > >> +1 on merging `StreamsNotRunningException` and > > >> > > >> `StateStoreNotAvailableException`, both exceptions are fatal > > >> anyway. IMO > > >> > > >> its best to have these exceptions be about the state store > (and not > > >> > > streams > > >> > > >> state), to easier understanding. > > >> > > >> > > >> > > >> Additionally, KIP-535 allows for querying of state stores in > > >> rebalancing > > >> > > >> state. So do we need the StreamsRebalancingException? > > >> > > >> > > >> > > >> > > >> > > >> On 2020/01/09 03:38:11, "Matthias J. Sax" < > matth...@confluent.io> > > >> > > wrote: > > >> > > >>> Sorry that I dropped the ball on this... > > >> > > >>> > > >> > > >>> Thanks for updating the KIP. Overall LGTM now. Feel free to > start > > >> a > > >> > > VOTE > > >> > > >>> thread. > > >> > > >>> > > >> > > >>> What is still unclear to me is, what we gain by having both > > >> > > >>> `StreamsNotRunningException` and > > >> `StateStoreNotAvailableException`. > > >> > > Both > > >> > > >>> exception are thrown when KafkaStreams is in state > > >> PENDING_SHUTDOWN / > > >> > > >>> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know > if > > >> the > > >> > > >>> state store is closed on not -- I can't query it anyway? > Maybe I > > >> miss > > >> > > >>> something thought? > > >> > > >>> > > >> > > >>> > > >> > > >>> -Matthias > > >> > > >>> > > >> > > >>> > > >> > > >>> On 11/3/19 6:07 PM, Vito Jeng wrote: > > >> > > >>>> Sorry for the late reply, thanks for the review. > > >> > > >>>> > > >> > > >>>> > > >> > > >>>>> About `StateStoreMigratedException`: > > >> > > >>>>> > > >> > > >>>>> Why is it only thrown if the state is REBALANCING? A store > > >> might be > > >> > > >>>>> migrated during a rebalance, and Kafka Streams might resume > > >> back to > > >> > > >>>>> RUNNING state and afterward somebody tries to use an old > store > > >> > > handle. > > >> > > >>>>> Also, if state is REBALANCING, should we throw > > >> > > >>>>> `StreamThreadRebalancingException`? Hence, I think > > >> > > >>>>> `StateStoreMigratedException` does only make sense during > > >> `RUNNING` > > >> > > >> state. > > >> > > >>>>> > > >> > > >>>> > > >> > > >>>> Thank you point this, already updated. > > >> > > >>>> > > >> > > >>>> > > >> > > >>>> Why do we need to distinguish between > > >> > > `KafkaStreamsNotRunningException` > > >> > > >>>>> and `StateStoreNotAvailableException`? > > >> > > >>>>> > > >> > > >>>> > > >> > > >>>> `KafkaStreamsNotRunningException` may be caused by various > > >> reasons, I > > >> > > >> think > > >> > > >>>> it would be helpful that the > > >> > > >>>> user can distinguish whether it is caused by the state store > > >> closed. > > >> > > >>>> (Maybe I am wrong...) > > >> > > >>>> > > >> > > >>>> > > >> > > >>>> Last, why do we distinguish between `KafkaStreams` instance > and > > >> > > >>>>> `StreamsThread`? To me, it seems we should always refer to > the > > >> > > >> instance, > > >> > > >>>>> because that is the level of granularity in which we > > >> enable/disable > > >> > > >> IQ atm. > > >> > > >>>>> > > >> > > >>>> > > >> > > >>>> Totally agree. Do you mean the naming of state store > exceptions? > > >> > > >>>> I don't have special reason to distinguish these two. > > >> > > >>>> Your suggestion look more reasonable for the exception > naming. > > >> > > >>>> > > >> > > >>>> > > >> > > >>>> Last, for `StateStoreMigratedException`, I would add that a > user > > >> need > > >> > > >> to > > >> > > >>>>> rediscover the store and cannot blindly retry as the store > > >> handle is > > >> > > >>>>> invalid and a new store handle must be retrieved. That is a > > >> > > difference > > >> > > >>>>> to `StreamThreadRebalancingException` that allows for > "blind" > > >> retries > > >> > > >>>>> that either resolve (if the store is still on the same > instance > > >> after > > >> > > >>>>> rebalancing finishes, or changes to > > >> `StateStoreMigratedException` if > > >> > > >> the > > >> > > >>>>> store was migrated away during rebalancing). > > >> > > >>>>> > > >> > > >>>> > > >> > > >>>> Nice, it's great! Thank you. > > >> > > >>>> > > >> > > >>>> > > >> > > >>>> The KIP already updated, please take a look. :) > > >> > > >>>> > > >> > > >>>> > > >> > > >>>> > > >> > > >>>> On Wed, Oct 23, 2019 at 1:48 PM Matthias J. Sax < > > >> > > matth...@confluent.io > > >> > > >>> > > >> > > >>>> wrote: > > >> > > >>>> > > >> > > >>>>> Any update on this KIP? > > >> > > >>>>> > > >> > > >>>>> On 10/7/19 3:35 PM, Matthias J. Sax wrote: > > >> > > >>>>>> Sorry for the late reply. The 2.4 deadline kept us quite > busy. > > >> > > >>>>>> > > >> > > >>>>>> About `StateStoreMigratedException`: > > >> > > >>>>>> > > >> > > >>>>>> Why is it only thrown if the state is REBALANCING? A store > > >> might be > > >> > > >>>>>> migrated during a rebalance, and Kafka Streams might resume > > >> back to > > >> > > >>>>>> RUNNING state and afterward somebody tries to use an old > store > > >> > > >> handle. > > >> > > >>>>>> Also, if state is REBALANCING, should we throw > > >> > > >>>>>> `StreamThreadRebalancingException`? Hence, I think > > >> > > >>>>>> `StateStoreMigratedException` does only make sense during > > >> `RUNNING` > > >> > > >>>>> state. > > >> > > >>>>>> > > >> > > >>>>>> > > >> > > >>>>>> Why do we need to distinguish between > > >> > > >> `KafkaStreamsNotRunningException` > > >> > > >>>>>> and `StateStoreNotAvailableException`? > > >> > > >>>>>> > > >> > > >>>>>> > > >> > > >>>>>> Last, why do we distinguish between `KafkaStreams` > instance and > > >> > > >>>>>> `StreamsThread`? To me, it seems we should always refer to > the > > >> > > >> instance, > > >> > > >>>>>> because that is the level of granularity in which we > > >> enable/disable > > >> > > >> IQ > > >> > > >>>>> atm. > > >> > > >>>>>> > > >> > > >>>>>> > > >> > > >>>>>> Last, for `StateStoreMigratedException`, I would add that a > > >> user > > >> > > >> need to > > >> > > >>>>>> rediscover the store and cannot blindly retry as the store > > >> handle is > > >> > > >>>>>> invalid and a new store handle must be retrieved. That is a > > >> > > >> difference > > >> > > >>>>>> to `StreamThreadRebalancingException` that allows for > "blind" > > >> > > retries > > >> > > >>>>>> that either resolve (if the store is still on the same > instance > > >> > > after > > >> > > >>>>>> rebalancing finishes, or changes to > > >> `StateStoreMigratedException` if > > >> > > >> the > > >> > > >>>>>> store was migrated away during rebalancing). > > >> > > >>>>>> > > >> > > >>>>>> > > >> > > >>>>>> > > >> > > >>>>>> -Matthias > > >> > > >>>>>> > > >> > > >>>>>> On 8/9/19 10:20 AM, Vito Jeng wrote: > > >> > > >>>>>>> My bad. The short link `https://shorturl.at/CDNT9` > <https://shorturl.at/CDNT9> > > >> <https://shorturl.at/CDNT9> > > >> > > <https://shorturl.at/CDNT9> > > >> > > >> <https://shorturl.at/CDNT9> > > >> > > >>>>> <https://shorturl.at/CDNT9> > > >> > > >>>>>>> <https://shorturl.at/CDNT9> seems incorrect. > > >> > > >>>>>>> > > >> > > >>>>>>> Please use the following instead: > https://shorturl.at/bkKQU > > >> > > >>>>>>> > > >> > > >>>>>>> > > >> > > >>>>>>> --- > > >> > > >>>>>>> Vito > > >> > > >>>>>>> > > >> > > >>>>>>> > > >> > > >>>>>>> On Fri, Aug 9, 2019 at 10:53 AM Vito Jeng < > > >> v...@is-land.com.tw> > > >> > > >> wrote: > > >> > > >>>>>>> > > >> > > >>>>>>>> Thanks, Matthias! > > >> > > >>>>>>>> > > >> > > >>>>>>>>> About `StreamThreadNotStartedException`: > > >> > > >>>>>>>> > > >> > > >>>>>>>> Thank you for explanation. I agree with your opinion. > > >> > > >>>>>>>> `CompositeReadOnlyXxxStore#get()` would never throw > > >> > > >>>>>>>> `StreamThreadNotStartedException`. > > >> > > >>>>>>>> > > >> > > >>>>>>>> For the case that corresponding thread crashes after we > > >> handed out > > >> > > >> the > > >> > > >>>>>>>> store handle. We may throw > `KafkaStreamsNotRunningException` > > >> or > > >> > > >>>>>>>> `StateStoreMigratedException`. > > >> > > >>>>>>>> In `StreamThreadStateStoreProvider`, we would throw > > >> > > >>>>>>>> `KafkaStreamsNotRunningException` when stream thread is > not > > >> > > >> running( > > >> > > >>>>>>>> https://shorturl.at/CDNT9) or throw > > >> `StateStoreMigratedException` > > >> > > >> when > > >> > > >>>>>>>> store is closed(https://shorturl.at/hrvAN). So I think > we > > >> do not > > >> > > >> need > > >> > > >>>>> to > > >> > > >>>>>>>> add a new type for this case. Does that make sense? > > >> > > >>>>>>>> > > >> > > >>>>>>>> > > >> > > >>>>>>>>> About `KafkaStreamsNotRunningException` vs > > >> > > >>>>>>>> `StreamThreadNotRunningException`: > > >> > > >>>>>>>> > > >> > > >>>>>>>> I understand your point. I rename > > >> > > >> `StreamThreadNotRunningException` to > > >> > > >>>>>>>> `KafkaStreamsNotRunningException`. > > >> > > >>>>>>>> > > >> > > >>>>>>>> > > >> > > >>>>>>>> About check unknown state store names: > > >> > > >>>>>>>> Thank you for the hint. I add a new type > > >> > > >> `UnknownStateStoreException` > > >> > > >>>>> for > > >> > > >>>>>>>> this case. > > >> > > >>>>>>>> > > >> > > >>>>>>>> > > >> > > >>>>>>>>> Also, we should still have fatal exception > > >> > > >>>>>>>> `StateStoreNotAvailableException`? Not sure why you > remove > > >> it? > > >> > > >>>>>>>> > > >> > > >>>>>>>> Thank you point this, already add it again. > > >> > > >>>>>>>> > > >> > > >>>>>>>> The KIP already updated, please take a look. > > >> > > >>>>>>>> > > >> > > >>>>>>>> --- > > >> > > >>>>>>>> Vito > > >> > > >>>>>>>> > > >> > > >>>>>>> > > >> > > >>>>>> > > >> > > >>>>> > > >> > > >>>>> > > >> > > >>>> > > >> > > >>> > > >> > > >>> > > >> > > >> > > >> > > > > > >> > > > > >> > > > > >> > > > >> > > > > > >