Currently we, don't cleanup dead threads, but the KIP proposes to change this:
> Stream threads that are in state DEAD will be removed from the stream threads > of a Kafka Streams client. -Matthias On 9/8/20 2:37 PM, Sophie Blee-Goldman wrote: > Ah, I forgot about localThreadsMetadata(). In that. case I agree, there's > no reason > to introduce a new method when we can get both the names and number of all > running threads from this. > > I assume that we would update localThreadsMetadata to only return currently > alive threads as part of this KIP -- at a quick glance, it seems like we > don't do > any pruning of dead threads at the moment > > On Tue, Sep 8, 2020 at 1:58 PM Matthias J. Sax <mj...@apache.org> wrote: > >> I am not sure if we need a new method? There is already >> `localThreadsMetadata()`. What do we gain by adding a new one? >> >> Returning the thread's name (as `Optional<String>`) for both add() and >> remove() is fine with me. >> >> >> -Matthias >> >> On 9/8/20 12:58 PM, Sophie Blee-Goldman wrote: >>> Sorry Bruno, I think I missed the end of your message with the >>> numberOfAliveStreamThreads() >>> proposal. I agree, that would be better than the alternatives I listed. >>> That said: >>> >>>> They rather suggest that the method returns a list of handles to the >>> stream threads. >>> >>> I hadn't thought of that originally, but now that you mention it, this >>> might be a good idea. >>> I don't think we should return actual handles on the threads, but maybe a >>> list of the thread >>> names rather than a single number of currently alive threads. >>> >>> Since we seem to think it would be difficult if not impossible to keep >>> track of the number >>> of running stream threads, we should apply the same reasoning to the >> names >>> and not >>> assume the user can/will keep track of every thread returned by >>> addStreamThread() or >>> removeStreamThread(). Users should generally take any required action >>> immediately >>> after adding/removing the thread -- eg deregistering the thread metrics >> -- >>> but it might >>> still be useful to provide a convenience method listing all of the >> current >>> threads >>> >>> And of course you could still get the number of threads easily by >> invoking >>> size() on the >>> returned list (or ordered set?). >>> >>> On Tue, Sep 8, 2020 at 12:16 PM Bruno Cadonna <br...@confluent.io> >> wrote: >>> >>>> Thank you again for the feedback Sophie! >>>> >>>> As I tried to point out in my previous e-mail, removing a stream thread >>>> from a Kafka Streams client that does not have alive stream threads is >>>> nothing exceptional for the client per se. However, it can become >>>> exceptional within the context of the user. For example, if users want >>>> to remove a stream thread from a client without alive stream threads >>>> because one if their metrics say so, then this is exceptional in the >>>> context of that user metric not in the context of the Kafka Streams >>>> client. In that case, users should throw an exception and handle it. >>>> >>>> Regarding returning null, I do not like to return null because from a >>>> development point of view there is no distinction between returning null >>>> because we have a bug in the code or returning null because there are no >>>> alive stream threads. Additionally, Optional<String> makes it more >>>> explicit that the result could also be empty. >>>> >>>> Thank you for the alternative method names! However, with the names you >>>> propose it is not immediately clear that the method returns an amount of >>>> stream threads. They rather suggest that the method returns a list of >>>> handles to the stream threads. I chose to use "aliveStreamThreads" to be >>>> consistent with the client-level metric "alive-stream-threads" which >>>> reports the same number of stream threads that >>>> numberOfAliveStreamThreads() should report. If others also think that >>>> the proposed name in the KIP is too clumsy, I am open to rename it, >> though. >>>> >>>> Best, >>>> Bruno >>>> >>>> >>>> On 08.09.20 20:12, Sophie Blee-Goldman wrote: >>>>>> it's never a good sign when the discussion moves into the vote thread >>>>> >>>>> Hah, sorry, the gmail consolidation of [VOTE] and [DISCUSS] threads >>>> strikes >>>>> again. >>>>> Thanks for redirecting me Bruno >>>>> >>>>> I suppose it's unfair to expect the callers to keep perfect track of >> the >>>>> current >>>>> number of stream threads, but it also seems like you shouldn't be >>>> calling >>>>> removeStreamThread() when there are no threads left. Either you're just >>>>> haphazardly removing threads and could unintentionally slip into a >> state >>>> of >>>>> no >>>>> running threads without realizing it, or more realistically, you're >>>>> carefully >>>>> removing threads based on some metric(s) that convey whether the system >>>> is >>>>> over or under-provisioned. If your metrics say you're over-provisioned >>>> but >>>>> there's >>>>> not one thread running, well, that certainly sounds exceptional to me. >> Or >>>>> you might >>>>> be right in that the cluster is over-provisioned but have just been >>>>> directing the >>>>> removeStreamThread() and addStreamThread() calls to instances at >> random, >>>> and >>>>> end up with one massive instance and one with no threads at all. Again, >>>>> this >>>>> probably merits some human intervention (or system redesign) >>>>> >>>>> That said, I don't think there's any real harm to just returning null >> in >>>>> this case, but I hope >>>>> that users would pay attention to this since it seems likely to >> indicate >>>>> something has gone >>>>> seriously wrong. I suppose Optional<String> would be a reasonable >>>>> compromise. >>>>> >>>>> As for the method name, what about activeStreamThreads() or >>>>> liveStreamThreads() ? >>>>> >>>>> On Mon, Sep 7, 2020 at 1:45 AM Bruno Cadonna <br...@confluent.io> >> wrote: >>>>> >>>>>> Hi John, >>>>>> >>>>>> I agree with you except for checking null. I would rather prefer to >> use >>>>>> Optional<String> as the return type to both methods. >>>>>> >>>>>> I changed the subject from [VOTE] to [DISCUSS] so that we can follow >> up >>>>>> in the discussion thread. >>>>>> >>>>>> Best, >>>>>> Bruno >>>>>> >>>>>> On 04.09.20 23:12, John Roesler wrote: >>>>>>> Hi Sophie, >>>>>>> >>>>>>> Uh, oh, it's never a good sign when the discussion moves >>>>>>> into the vote thread :) >>>>>>> >>>>>>> I agree with you, it seems like a good touch for >>>>>>> removeStreamThread() to return the name of the thread that >>>>>>> got removed, rather than a boolean flag. Maybe the return >>>>>>> value would be `null` if there is no thread to remove. >>>>>>> >>>>>>> If we go that way, I'd suggest that addStreamThread() also >>>>>>> return the name of the newly created thread, or null if no >>>>>>> thread can be created right now. >>>>>>> >>>>>>> I'm not completely sure if I think that callers of this >>>>>>> method would know exactly how many threads there are. Sure, >>>>>>> if a human being is sitting there looking at the metrics or >>>>>>> logs and decides to call the method, it would work out, but >>>>>>> I'd expect this kind of method to find its way into >>>>>>> automated tooling that reacts to things like current system >>>>>>> load or resource saturation. Those kinds of toolchains often >>>>>>> are part of a distributed system, and it's probably not that >>>>>>> easy to guarantee that the thread count they observe is >>>>>>> fully consistent with the number of threads that are >>>>>>> actually running. Therefore, an in-situ `int >>>>>>> numStreamThreads()` method might not be a bad idea. Then >>>>>>> again, it seems sort of optional. A caller can catch an >>>>>>> exception or react to a `null` return value just the same >>>>>>> either way. Having both add/remove methods behave similarly >>>>>>> is probably more valuable. >>>>>>> >>>>>>> Thanks, >>>>>>> -John >>>>>>> >>>>>>> >>>>>>> On Thu, 2020-09-03 at 12:15 -0700, Sophie Blee-Goldman >>>>>>> wrote: >>>>>>>> Hey, sorry for the late reply, I just have one minor suggestion. >> Since >>>>>> we >>>>>>>> don't >>>>>>>> make any guarantees about which thread gets removed or allow the >> user >>>> to >>>>>>>> specify, I think we should return either the index or full name of >> the >>>>>>>> thread >>>>>>>> that does get removed by removeThread(). >>>>>>>> >>>>>>>> I know you just updated the KIP to return true/false if there >>>>>> are/aren't any >>>>>>>> threads to be removed, but I think this would be more appropriate as >>>> an >>>>>>>> exception than as a return type. I think it's reasonable to expect >>>>>> users to >>>>>>>> have some sense to how many threads are remaining, and not try to >>>> remove >>>>>>>> a thread when there is none left. To me, that indicates something >>>> wrong >>>>>>>> with the user application code and should be treated as an >> exceptional >>>>>> case. >>>>>>>> I don't think the same code clarify argument applies here as to the >>>>>>>> addStreamThread() case, as there's no reason for an application to >> be >>>>>>>> looping and retrying removeStreamThread() since if that fails, it's >>>>>> because >>>>>>>> there are no threads left and thus it will continue to always fail. >>>> And >>>>>> if >>>>>>>> the >>>>>>>> user actually wants to shut down all threads, they should just close >>>> the >>>>>>>> whole application rather than call removeStreamThread() in a loop. >>>>>>>> >>>>>>>> While I generally think it should be straightforward for users to >>>> track >>>>>> how >>>>>>>> many stream threads they have running, maybe it would be nice to add >>>>>>>> a small utility method that does this for them. Something like >>>>>>>> >>>>>>>> // Returns the number of currently alive threads >>>>>>>> boolean runningStreamThreads(); >>>>>>>> >>>>>>>> On Thu, Sep 3, 2020 at 7:41 AM Matthias J. Sax <mj...@apache.org> >>>>>> wrote: >>>>>>>> >>>>>>>>> +1 (binding) >>>>>>>>> >>>>>>>>> On 9/3/20 6:16 AM, Bruno Cadonna wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I would like to start the voting on KIP-663 that proposes to add >>>>>> methods >>>>>>>>>> to the Kafka Streams client to add and remove stream threads >> during >>>>>>>>>> execution. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-663%3A+API+to+Start+and+Shut+Down+Stream+Threads >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Bruno >>>>>>> >>>>>> >>>>> >>>> >>> >> >> >
signature.asc
Description: OpenPGP digital signature