Makes sense to me :) On Thu, Sep 17, 2020 at 9:34 AM Bruno Cadonna <br...@confluent.io> wrote:
> Hi Sophie, > > Thank you for the feedback! I replied inline. > > Best, > Bruno > > On 16.09.20 19:19, Sophie Blee-Goldman wrote: > >> > >> We guarantee that the metadata of the dead stream threads will be > >> returned by KafkaStreams#localThreadsMetadata() at least until the next > >> call to KafkaStreams#addStreamThread() or > >> KafkaStreams#removeStreamThread() after the stream thread transited to > >> DEAD > > > > > > This seems kind of tricky...personally I would find it pretty odd if I > > queried the > > local thread metadata and found two threads, A (alive) and B (dead), and > > then > > called removeStreamThread() and now suddenly I have zero. Or if I call > > addStreamThread and now I still have two threads. > > > > The behavior might be unusual, but it is well defined and not random by > any means. > > > Both of those results seem to indicate that only live threads "count" and > > are returned > > by localThreadsMetadata(). But in reality we do temporarily keep the dead > > thread, > > but only for the arbitrary amount of time until the next time you want to > > add or > > remove some other stream thread? That seems like a weird side effect of > the > > add/removeStreamThread APIs. > > > > This is not a side effect that just happens to occur. This is a > guarantee that users get. It gives users the possibility to retrieve the > metadata of the dead stream threads since the last call to > add/removeStreamThread. Admittedly, this guarantee overlap with the > current/planned implementation. But that is more a coincidence. > > I would be more concerned about when add/removeStreamThread is called > from different threads which could happen if an uncaught exception > handler is called that wants to replace a stream thread and a thread > that is responsible for automated scaling up is running. > > > If we really think users might want to log the metadata of dead threads, > > then > > let's just do that for them or give them a way to do exactly that. > > > > Logging the metatdata of dead stream threads for the user is a valid > alternative. Giving users the way to do exactly that is hard because the > StreamThread class is not part of the public API. They would always need > to call a method on the KafkaStreams object where we already have > localThreadsMetadata(). > > > I'm not that concerned about the backwards compatibility of removing dead > > threads from the localThreadsMetadata, because I find it hard to believe > > that > > users do anything other than just skip over them in the list (set?) that > > gets > > returned. But maybe someone can chime in with an example use case. > > > > I am also not too much concerned about backwards compatibility. That > would indeed be a side effect of the current proposal. > > > I'm actually even a little skeptical that any users might want to log the > > metadata of a > > dead thread, since all of the metadata is only useful for IQ on live > > threads or > > already covered by other easily discoverable logging elsewhere, or both. > > > > Said all of the above, I actually agree with you that there is not that > much information in the metadata of a dead stream thread that is > interesting. The name of the stream thread is known in the uncaught > exception handler. The names of the clients, like consumer etc., used by > the stream thread can be derived from the name of the stream thread. > Finally, the sets of active and standby tasks should be empty for a dead > stream thread. > > Hence, I backpedal and propose to filter out dead stream threads from > localThreadsMetadata(). WDYT? > > > On Wed, Sep 16, 2020 at 2:07 AM Bruno Cadonna <br...@confluent.io> > wrote: > > > >> Hi again, > >> > >> I just realized that if we filter out DEAD stream threads in > >> localThreadsMetadata(), users cannot log the metadata of dying stream > >> threads in the uncaught exception handler. > >> > >> I realized this thanks to the example Guozhang requested in the KIP. > >> Thank you for that, Guozhang! > >> > >> Hence, I adapted the KIP as follows: > >> > >> - We do not filter out DEAD stream threads in > >> KafkaStreams#localThreadsMetadata() > >> > >> - We guarantee that the metadata of the dead stream threads will be > >> returned by KafkaStreams#localThreadsMetadata() at least until the next > >> call to KafkaStreams#addStreamThread() or > >> KafkaStreams#removeStreamThread() after the stream thread transited to > >> DEAD. Besides giving users the opportunity to log the metadata of a > >> dying stream thread in its uncaught exception handler, this guarantee > >> makes KafkaStreams#localThreadsMetadata() completely backward compatible > >> to the current behavior, because if KafkaStreams#addStreamThread() and > >> KafkaStreams#removeStreamThread() are never called, > >> KafkaStreams#localThreadsMetadata() will also return the metadata of all > >> streams threads that have ever died which corresponds to the current > >> behavior. > >> > >> - We guarantee that dead stream threads are removed from a Kafka Streams > >> client at latest after the next call to KafkaStreams#addStreamThread() > >> or KafkaStreams#removeStreamThread() following the transition of the > >> stream thread to DEAD. This guarantees that the number of maintained > >> stream threads does not grow indefinitely. > >> > >> > >> Best, > >> Bruno > >> > >> > >> > >> On 16.09.20 09:23, Bruno Cadonna wrote: > >>> Hi Guozhang, > >>> > >>> Good point! I would propose to filter out DEAD stream threads in > >>> localThreadsMetadata() to get consistent results that do not depend on > >>> timing. I will update the KIP accordingly. > >>> > >>> Best, > >>> Bruno > >>> > >>> On 16.09.20 06:02, Guozhang Wang wrote: > >>>> Thanks Bruno, your replies make sense to me. As for > >>>> localThreadsMetadata() itself, > >>>> I'd like to clarify if it would return any still-bookkept threads, or > >>>> would > >>>> it specifically filter out those DEAD threads even if they are not yet > >>>> removed. > >>>> > >>>> Otherwise, the KIP LGTM. > >>>> > >>>> Guozhang > >>>> > >>>> On Tue, Sep 15, 2020 at 2:58 AM Bruno Cadonna <br...@confluent.io> > >> wrote: > >>>> > >>>>> Hi Guozhang, > >>>>> > >>>>> Thank you for your feedback. I replied inline. > >>>>> > >>>>> Best, > >>>>> Bruno > >>>>> > >>>>> On 09.09.20 23:43, Guozhang Wang wrote: > >>>>>> Hello Bruno, > >>>>>> > >>>>>> Finally got some time to review your KIP and the discussion thread > >>>>>> now.. > >>>>> a > >>>>>> few comments below: > >>>>>> > >>>>>> 1) I'm with Matthias about the newly added > numberOfAliveStreamThreads > >>>>> v.s. > >>>>>> existing localThreadsMetadata: to me it seems we can always achieve > >> the > >>>>>> first based on the second. It seems not worthy to provide some > "syntax > >>>>>> sugar" to the API but just let users do the filtering themselves. > >>>>> > >>>>> I am not married to that method. I removed it. > >>>>> > >>>>>> Furthermore, I'm wondering what's the rationale behind removing the > >>>>>> DEAD > >>>>>> threads from localThreadsMetadata()? Personally I feel returning all > >>>>>> threads, including those who are ever closed, either due to > >>>>>> exception or > >>>>>> due to removeStreamThread, would be beneficial for debugging > >>>>>> purposes, as > >>>>>> long as within the lifetime of an instance we expect the amount of > >> such > >>>>>> dead threads will not increase linearly --- and if we agree with > that, > >>>>>> maybe we can rename "removeStreamThread" to sth. like > >>>>>> "terminateStreamThread" indicating it is only terminated but not > >>>>>> removed > >>>>>> --- and of course if users do not want to see those DEAD threads > >>>>>> they can > >>>>>> always filter them out. I'm just proposing that we should still > >>>>>> leave the > >>>>>> door open for those who want to check those ever terminated threads. > >>>>>> > >>>>> > >>>>> I actually think the number of dead stream threads might increase > >>>>> linearly. Assume users have a systematic error that continuously > kills > >> a > >>>>> stream thread and they blindly start a new stream thread in the > >> uncaught > >>>>> exception handler. This scenario might be a mistake but if the > >>>>> systematic error does not occur at a high rate, it could also be a > >>>>> strategy to keep the application running during the investigation of > >> the > >>>>> systematic error. > >>>>> > >>>>> IMO, removing dead stream threads makes Kafka Streams more robust > >>>>> because it prevent a possibly unbounded increase of memory usage. If > >>>>> users want to debug the dead stream threads they can monitor the > number > >>>>> of dead threads with the metric proposed in the KIP and they could > >>>>> additionally log the metadata of the dying stream thread in the > >> uncaught > >>>>> exception handler. I do not think that there is need to keep dead > >> stream > >>>>> threads around. > >>>>> > >>>>>> 2) I think it would help to write down some example user code in > >>>>> exception > >>>>>> handler e.g. to illustrate how this would be implemented -- e.g. we > >>>>>> know > >>>>>> that practically the handler need to maintain a "this" reference of > >> the > >>>>>> instance anyways in order to shutdown the whole instance or, > >>>>> add/terminate > >>>>>> threads dynamically, but I want to see if we have listed all > possible > >>>>> call > >>>>>> paths like: a) a thread's handler logic to terminate another thread, > >>>>>> b) a > >>>>>> thread handler to add new threads, etc are all appropriately > supported > >>>>>> without deadlocks. > >>>>>> > >>>>> > >>>>> I added an example for an uncaught exception handler that adds a > stream > >>>>> thread to the KIP. Removing a stream thread in an uncaught exception > >>>>> handler doesn't seem a common use case to me. Nevertheless, we need > to > >>>>> make sure that we do not run in a deadlock in that case. I will > >> consider > >>>>> that during the implementation and write tests to check for > deadlocks. > >>>>> > >>>>> Shutting down the Kafka Streams client from inside an uncaught > >> exception > >>>>> handler is outside the scope of this KIP. In the beginning it was > part > >>>>> of the KIP, but during the discussion it turned out that we can fix > our > >>>>> existing close() method to accomplish the shutdown from inside an > >>>>> uncaught exception handler. But I completely agree with you that we > >> need > >>>>> to ensure that we do not run into a deadlock in this case. > >>>>> > >>>>> > >>>>>> > >>>>>> Guozhang > >>>>>> > >>>>>> > >>>>>> On Wed, Sep 9, 2020 at 11:35 AM Matthias J. Sax <mj...@apache.org> > >>>>> wrote: > >>>>>> > >>>>>>> I would prefer to not add a new method. It seems unnecessary. > >>>>>>> `localThreadMetadata` does return all threads in all states(*) and > >>>>>>> thus > >>>>>>> provides full insight. > >>>>>>> > >>>>>>> (*) A thread in state DEAD could be returned as long as it's not > >>>>>>> removed > >>>>>>> yet. > >>>>>>> > >>>>>>> I don't see any advantage to pre-filter threads and to exclude > >> threads > >>>>>>> in state CREATE or PENDING_SHUTDOWN. Even if a CREATED thread is > not > >>>>>>> started yet, it is still "alive" in a broader sense. For example, > if > >> a > >>>>>>> user wants to scale out to 10 thread, and 8 are RUNNING and 2 are > in > >>>>>>> state CREATED, a user won't need to add 2 more threads -- there are > >>>>>>> already 10 threads. > >>>>>>> > >>>>>>> For PENDING_SHUTDOWN and scale in it would be different I guess, as > >>>>>>> the > >>>>>>> proposal would be to filter them out right away. However, filtering > >>>>>>> them > >>>>>>> seems actually not to be "correct", as a thread in PENDING_SHUTDOWN > >>>>>>> might still process data and it's thus still "alive". > >>>>>>> > >>>>>>> If there is still a need later to add a new method about "alive > >>>>>>> thread" > >>>>>>> we can always add as a follow up -- removing things is much harder. > >>>>>>> > >>>>>>> I also don't think that there is value in returning names of dead > >>>>>>> threads, as we recycle names. > >>>>>>> > >>>>>>> > >>>>>>> -Matthias > >>>>>>> > >>>>>>> > >>>>>>> On 9/9/20 10:04 AM, Sophie Blee-Goldman wrote: > >>>>>>>> I agree that the current behavior of localThreadsMetadata() does > not > >>>>> seem > >>>>>>>> to match, but it seems like we will be forced to change it to only > >>>>> return > >>>>>>>> currently-alive threads. For one thing, we plan to recycle old > >> thread > >>>>>>> names. > >>>>>>>> It would be pretty confusing for a user to get two (or more) > >>>>>>> ThreadMetadata > >>>>>>>> objects returned with the same name, since AFAICT this is the only > >>>>>>>> distinguishing identifier of stream threads. I think we should > >>>>>>>> enforce > >>>>>>> that > >>>>>>>> only live threads are returned by localThreadsMetadata(). Plus, as > >>>>>>> Matthias > >>>>>>>> pointed out, we plan to remove dead threads from the KafkaStreams > >>>>> client, > >>>>>>>> so still returning them in the metadata would be extremely odd. > >>>>>>>> > >>>>>>>> If we think that there might be some use case that requires > knowing > >>>>> which > >>>>>>>> threads have died, we could consider adding a method that returns > >> the > >>>>>>>> names of dead threads. But the only use case I can imagine would > >>>>> probably > >>>>>>>> be better served by a callback that gets invoked when the thread > >>>>>>>> dies, > >>>>>>> which > >>>>>>>> we already have. > >>>>>>>> > >>>>>>>> On Tue, Sep 8, 2020 at 11:46 PM Bruno Cadonna <br...@confluent.io > > > >>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Hi Matthias and Sophie, > >>>>>>>>> > >>>>>>>>> I agree that localThreadsMetadata() can be used here. However, > >>>>>>>>> localThreadsMetadata() returns all stream threads irrespectively > of > >>>>>>>>> their states. Alive stream threads are specified as being in one > of > >>>>> the > >>>>>>>>> following states: RUNNING, STARTING, PARTITIONS_REVOKED, and > >>>>>>>>> PARTITIONS_ASSIGNED. Hence, users would need to filter the result > >> of > >>>>>>>>> localThreadsMetadata(). I thought, it would be neat to have a > >> method > >>>>>>>>> that hides this filtering and returns the number of alive stream > >>>>>>>>> threads, because that is the most basic information you might > >>>>>>>>> need to > >>>>>>>>> decide about adding or removing stream threads. For all more > >>>>>>>>> advanced > >>>>>>>>> use cases users should use localThreadsMetadata(). I am also > happy > >>>>> with > >>>>>>>>> removing the method. WDYT? > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Bruno > >>>>>>>>> > >>>>>>>>> On 09.09.20 03:51, Matthias J. Sax wrote: > >>>>>>>>>> Currently we, don't cleanup dead threads, but the KIP proposes > to > >>>>>>> change > >>>>>>>>>> this: > >>>>>>>>>> > >>>>>>>>>>> Stream threads that are in state DEAD will be removed from the > >>>>> stream > >>>>>>>>> threads of a Kafka Streams client. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> -Matthias > >>>>>>>>>> > >>>>>>>>>> On 9/8/20 2:37 PM, Sophie Blee-Goldman wrote: > >>>>>>>>>>> Ah, I forgot about localThreadsMetadata(). In that. case I > agree, > >>>>>>>>> there's > >>>>>>>>>>> no reason > >>>>>>>>>>> to introduce a new method when we can get both the names and > >>>>>>>>>>> number > >>>>> of > >>>>>>>>> all > >>>>>>>>>>> running threads from this. > >>>>>>>>>>> > >>>>>>>>>>> I assume that we would update localThreadsMetadata to only > return > >>>>>>>>> currently > >>>>>>>>>>> alive threads as part of this KIP -- at a quick glance, it > seems > >>>>> like > >>>>>>> we > >>>>>>>>>>> don't do > >>>>>>>>>>> any pruning of dead threads at the moment > >>>>>>>>>>> > >>>>>>>>>>> On Tue, Sep 8, 2020 at 1:58 PM Matthias J. Sax < > mj...@apache.org > >>> > >>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> I am not sure if we need a new method? There is already > >>>>>>>>>>>> `localThreadsMetadata()`. What do we gain by adding a new one? > >>>>>>>>>>>> > >>>>>>>>>>>> Returning the thread's name (as `Optional<String>`) for both > >>>>>>>>>>>> add() > >>>>>>> and > >>>>>>>>>>>> remove() is fine with me. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> -Matthias > >>>>>>>>>>>> > >>>>>>>>>>>> On 9/8/20 12:58 PM, Sophie Blee-Goldman wrote: > >>>>>>>>>>>>> Sorry Bruno, I think I missed the end of your message with > the > >>>>>>>>>>>>> numberOfAliveStreamThreads() > >>>>>>>>>>>>> proposal. I agree, that would be better than the > alternatives I > >>>>>>>>> listed. > >>>>>>>>>>>>> That said: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> They rather suggest that the method returns a list of > >>>>>>>>>>>>>> handles to > >>>>>>> the > >>>>>>>>>>>>> stream threads. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I hadn't thought of that originally, but now that you mention > >>>>>>>>>>>>> it, > >>>>>>> this > >>>>>>>>>>>>> might be a good idea. > >>>>>>>>>>>>> I don't think we should return actual handles on the threads, > >>>>>>>>>>>>> but > >>>>>>>>> maybe a > >>>>>>>>>>>>> list of the thread > >>>>>>>>>>>>> names rather than a single number of currently alive threads. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Since we seem to think it would be difficult if not > >>>>>>>>>>>>> impossible to > >>>>>>> keep > >>>>>>>>>>>>> track of the number > >>>>>>>>>>>>> of running stream threads, we should apply the same reasoning > >> to > >>>>> the > >>>>>>>>>>>> names > >>>>>>>>>>>>> and not > >>>>>>>>>>>>> assume the user can/will keep track of every thread returned > by > >>>>>>>>>>>>> addStreamThread() or > >>>>>>>>>>>>> removeStreamThread(). Users should generally take any > required > >>>>>>> action > >>>>>>>>>>>>> immediately > >>>>>>>>>>>>> after adding/removing the thread -- eg deregistering the > thread > >>>>>>>>> metrics > >>>>>>>>>>>> -- > >>>>>>>>>>>>> but it might > >>>>>>>>>>>>> still be useful to provide a convenience method listing all > >>>>>>>>>>>>> of the > >>>>>>>>>>>> current > >>>>>>>>>>>>> threads > >>>>>>>>>>>>> > >>>>>>>>>>>>> And of course you could still get the number of threads > >>>>>>>>>>>>> easily by > >>>>>>>>>>>> invoking > >>>>>>>>>>>>> size() on the > >>>>>>>>>>>>> returned list (or ordered set?). > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Tue, Sep 8, 2020 at 12:16 PM Bruno Cadonna > >>>>>>>>>>>>> <br...@confluent.io > >>>>>> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Thank you again for the feedback Sophie! > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> As I tried to point out in my previous e-mail, removing a > >>>>>>>>>>>>>> stream > >>>>>>>>> thread > >>>>>>>>>>>>>> from a Kafka Streams client that does not have alive stream > >>>>> threads > >>>>>>>>> is > >>>>>>>>>>>>>> nothing exceptional for the client per se. However, it can > >>>>>>>>>>>>>> become > >>>>>>>>>>>>>> exceptional within the context of the user. For example, if > >>>>>>>>>>>>>> users > >>>>>>>>> want > >>>>>>>>>>>>>> to remove a stream thread from a client without alive stream > >>>>>>> threads > >>>>>>>>>>>>>> because one if their metrics say so, then this is > >>>>>>>>>>>>>> exceptional in > >>>>>>> the > >>>>>>>>>>>>>> context of that user metric not in the context of the Kafka > >>>>> Streams > >>>>>>>>>>>>>> client. In that case, users should throw an exception and > >>>>>>>>>>>>>> handle > >>>>>>> it. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Regarding returning null, I do not like to return null > because > >>>>>>> from a > >>>>>>>>>>>>>> development point of view there is no distinction between > >>>>> returning > >>>>>>>>> null > >>>>>>>>>>>>>> because we have a bug in the code or returning null because > >>>>>>>>>>>>>> there > >>>>>>>>> are no > >>>>>>>>>>>>>> alive stream threads. Additionally, Optional<String> makes > it > >>>>> more > >>>>>>>>>>>>>> explicit that the result could also be empty. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thank you for the alternative method names! However, with > the > >>>>> names > >>>>>>>>> you > >>>>>>>>>>>>>> propose it is not immediately clear that the method returns > an > >>>>>>>>> amount of > >>>>>>>>>>>>>> stream threads. They rather suggest that the method returns > a > >>>>> list > >>>>>>> of > >>>>>>>>>>>>>> handles to the stream threads. I chose to use > >>>>> "aliveStreamThreads" > >>>>>>>>> to be > >>>>>>>>>>>>>> consistent with the client-level metric > "alive-stream-threads" > >>>>>>> which > >>>>>>>>>>>>>> reports the same number of stream threads that > >>>>>>>>>>>>>> numberOfAliveStreamThreads() should report. If others also > >>>>>>>>>>>>>> think > >>>>>>> that > >>>>>>>>>>>>>> the proposed name in the KIP is too clumsy, I am open to > >> rename > >>>>> it, > >>>>>>>>>>>> though. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>> Bruno > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 08.09.20 20:12, Sophie Blee-Goldman wrote: > >>>>>>>>>>>>>>>> it's never a good sign when the discussion moves into the > >>>>>>>>>>>>>>>> vote > >>>>>>>>> thread > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hah, sorry, the gmail consolidation of [VOTE] and [DISCUSS] > >>>>>>> threads > >>>>>>>>>>>>>> strikes > >>>>>>>>>>>>>>> again. > >>>>>>>>>>>>>>> Thanks for redirecting me Bruno > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I suppose it's unfair to expect the callers to keep perfect > >>>>> track > >>>>>>> of > >>>>>>>>>>>> the > >>>>>>>>>>>>>>> current > >>>>>>>>>>>>>>> number of stream threads, but it also seems like you > >>>>> shouldn't > >>>>>>> be > >>>>>>>>>>>>>> calling > >>>>>>>>>>>>>>> removeStreamThread() when there are no threads left. Either > >>>>> you're > >>>>>>>>> just > >>>>>>>>>>>>>>> haphazardly removing threads and could unintentionally slip > >>>>> into a > >>>>>>>>>>>> state > >>>>>>>>>>>>>> of > >>>>>>>>>>>>>>> no > >>>>>>>>>>>>>>> running threads without realizing it, or more > realistically, > >>>>>>> you're > >>>>>>>>>>>>>>> carefully > >>>>>>>>>>>>>>> removing threads based on some metric(s) that convey > >>>>>>>>>>>>>>> whether the > >>>>>>>>> system > >>>>>>>>>>>>>> is > >>>>>>>>>>>>>>> over or under-provisioned. If your metrics say you're > >>>>>>>>> over-provisioned > >>>>>>>>>>>>>> but > >>>>>>>>>>>>>>> there's > >>>>>>>>>>>>>>> not one thread running, well, that certainly sounds > >>>>>>>>>>>>>>> exceptional > >>>>> to > >>>>>>>>> me. > >>>>>>>>>>>> Or > >>>>>>>>>>>>>>> you might > >>>>>>>>>>>>>>> be right in that the cluster is over-provisioned but have > >> just > >>>>>>> been > >>>>>>>>>>>>>>> directing the > >>>>>>>>>>>>>>> removeStreamThread() and addStreamThread() calls to > >>>>>>>>>>>>>>> instances at > >>>>>>>>>>>> random, > >>>>>>>>>>>>>> and > >>>>>>>>>>>>>>> end up with one massive instance and one with no threads at > >>>>>>>>>>>>>>> all. > >>>>>>>>> Again, > >>>>>>>>>>>>>>> this > >>>>>>>>>>>>>>> probably merits some human intervention (or system > redesign) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> That said, I don't think there's any real harm to just > >>>>>>>>>>>>>>> returning > >>>>>>>>> null > >>>>>>>>>>>> in > >>>>>>>>>>>>>>> this case, but I hope > >>>>>>>>>>>>>>> that users would pay attention to this since it seems > >>>>>>>>>>>>>>> likely to > >>>>>>>>>>>> indicate > >>>>>>>>>>>>>>> something has gone > >>>>>>>>>>>>>>> seriously wrong. I suppose Optional<String> would be a > >>>>> reasonable > >>>>>>>>>>>>>>> compromise. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> As for the method name, what about activeStreamThreads() or > >>>>>>>>>>>>>>> liveStreamThreads() ? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 1:45 AM Bruno Cadonna < > >>>>> br...@confluent.io> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Hi John, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I agree with you except for checking null. I would rather > >>>>> prefer > >>>>>>> to > >>>>>>>>>>>> use > >>>>>>>>>>>>>>>> Optional<String> as the return type to both methods. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I changed the subject from [VOTE] to [DISCUSS] so that we > >> can > >>>>>>>>> follow > >>>>>>>>>>>> up > >>>>>>>>>>>>>>>> in the discussion thread. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>> Bruno > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 04.09.20 23:12, John Roesler wrote: > >>>>>>>>>>>>>>>>> Hi Sophie, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Uh, oh, it's never a good sign when the discussion moves > >>>>>>>>>>>>>>>>> into the vote thread :) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I agree with you, it seems like a good touch for > >>>>>>>>>>>>>>>>> removeStreamThread() to return the name of the thread > that > >>>>>>>>>>>>>>>>> got removed, rather than a boolean flag. Maybe the return > >>>>>>>>>>>>>>>>> value would be `null` if there is no thread to remove. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> If we go that way, I'd suggest that addStreamThread() > also > >>>>>>>>>>>>>>>>> return the name of the newly created thread, or null if > no > >>>>>>>>>>>>>>>>> thread can be created right now. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I'm not completely sure if I think that callers of this > >>>>>>>>>>>>>>>>> method would know exactly how many threads there are. > Sure, > >>>>>>>>>>>>>>>>> if a human being is sitting there looking at the metrics > or > >>>>>>>>>>>>>>>>> logs and decides to call the method, it would work out, > but > >>>>>>>>>>>>>>>>> I'd expect this kind of method to find its way into > >>>>>>>>>>>>>>>>> automated tooling that reacts to things like current > system > >>>>>>>>>>>>>>>>> load or resource saturation. Those kinds of toolchains > >> often > >>>>>>>>>>>>>>>>> are part of a distributed system, and it's probably not > >> that > >>>>>>>>>>>>>>>>> easy to guarantee that the thread count they observe is > >>>>>>>>>>>>>>>>> fully consistent with the number of threads that are > >>>>>>>>>>>>>>>>> actually running. Therefore, an in-situ `int > >>>>>>>>>>>>>>>>> numStreamThreads()` method might not be a bad idea. Then > >>>>>>>>>>>>>>>>> again, it seems sort of optional. A caller can catch an > >>>>>>>>>>>>>>>>> exception or react to a `null` return value just the same > >>>>>>>>>>>>>>>>> either way. Having both add/remove methods behave > similarly > >>>>>>>>>>>>>>>>> is probably more valuable. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>> -John > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Thu, 2020-09-03 at 12:15 -0700, Sophie Blee-Goldman > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> Hey, sorry for the late reply, I just have one minor > >>>>>>> suggestion. > >>>>>>>>>>>> Since > >>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>> don't > >>>>>>>>>>>>>>>>>> make any guarantees about which thread gets removed or > >>>>>>>>>>>>>>>>>> allow > >>>>>>> the > >>>>>>>>>>>> user > >>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>> specify, I think we should return either the index or > full > >>>>> name > >>>>>>>>> of > >>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>> thread > >>>>>>>>>>>>>>>>>> that does get removed by removeThread(). > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I know you just updated the KIP to return true/false if > >>>>>>>>>>>>>>>>>> there > >>>>>>>>>>>>>>>> are/aren't any > >>>>>>>>>>>>>>>>>> threads to be removed, but I think this would be more > >>>>>>>>> appropriate as > >>>>>>>>>>>>>> an > >>>>>>>>>>>>>>>>>> exception than as a return type. I think it's reasonable > >> to > >>>>>>>>> expect > >>>>>>>>>>>>>>>> users to > >>>>>>>>>>>>>>>>>> have some sense to how many threads are remaining, and > not > >>>>> try > >>>>>>> to > >>>>>>>>>>>>>> remove > >>>>>>>>>>>>>>>>>> a thread when there is none left. To me, that indicates > >>>>>>> something > >>>>>>>>>>>>>> wrong > >>>>>>>>>>>>>>>>>> with the user application code and should be treated as > an > >>>>>>>>>>>> exceptional > >>>>>>>>>>>>>>>> case. > >>>>>>>>>>>>>>>>>> I don't think the same code clarify argument applies > >>>>>>>>>>>>>>>>>> here as > >>>>> to > >>>>>>>>> the > >>>>>>>>>>>>>>>>>> addStreamThread() case, as there's no reason for an > >>>>> application > >>>>>>>>> to > >>>>>>>>>>>> be > >>>>>>>>>>>>>>>>>> looping and retrying removeStreamThread() since if that > >>>>> fails, > >>>>>>>>> it's > >>>>>>>>>>>>>>>> because > >>>>>>>>>>>>>>>>>> there are no threads left and thus it will continue to > >>>>>>>>>>>>>>>>>> always > >>>>>>>>> fail. > >>>>>>>>>>>>>> And > >>>>>>>>>>>>>>>> if > >>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>> user actually wants to shut down all threads, they > should > >>>>> just > >>>>>>>>> close > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>> whole application rather than call removeStreamThread() > >>>>>>>>>>>>>>>>>> in a > >>>>>>>>> loop. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> While I generally think it should be straightforward for > >>>>> users > >>>>>>> to > >>>>>>>>>>>>>> track > >>>>>>>>>>>>>>>> how > >>>>>>>>>>>>>>>>>> many stream threads they have running, maybe it would be > >>>>>>>>>>>>>>>>>> nice > >>>>>>> to > >>>>>>>>> add > >>>>>>>>>>>>>>>>>> a small utility method that does this for them. > Something > >>>>> like > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> // Returns the number of currently alive threads > >>>>>>>>>>>>>>>>>> boolean runningStreamThreads(); > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Thu, Sep 3, 2020 at 7:41 AM Matthias J. Sax < > >>>>>>> mj...@apache.org > >>>>>>>>>> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> +1 (binding) > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On 9/3/20 6:16 AM, Bruno Cadonna wrote: > >>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> I would like to start the voting on KIP-663 that > >> proposes > >>>>> to > >>>>>>>>> add > >>>>>>>>>>>>>>>> methods > >>>>>>>>>>>>>>>>>>>> to the Kafka Streams client to add and remove stream > >>>>> threads > >>>>>>>>>>>> during > >>>>>>>>>>>>>>>>>>>> execution. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-663%3A+API+to+Start+and+Shut+Down+Stream+Threads > >>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>> Bruno > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >> > > >