Hi Luke, John,

Thanks for bringing up this and also sorting it out!

I have added a note to the KIP.

Thanks,

Jim

On Wed, May 11, 2022 at 9:34 AM Luke Chen <show...@gmail.com> wrote:

> Thanks John!
> It makes sense.
> I have no other questions as long as it is documented in the KIP.
>
> Thank you.
> Luke
>
> On Wed, May 11, 2022 at 9:15 PM John Roesler <vvcep...@apache.org> wrote:
>
> > Hi Luke,
> >
> > It’s not my KIP, but my two cents is that users should not run the reset
> > tool while the application is paused.
> >
> > The reset tool should only be run while the whole app is shut down
> because
> > it messes with a lot of internal state bits without synchronization.
> > Leaving the app running (even while pausing processing) will result in
> the
> > app being in an undefined state, as the members and the tool will be
> > simultaneously trying to set the committed offsets to different values,
> etc.
> >
> > Jim, can you also make it a point to document this? As Luke points out,
> it
> > might be a natural thing to want to do.
> >
> > Thanks,
> > John
> >
> > On Wed, May 11, 2022, at 02:19, Luke Chen wrote:
> > > Hi Jim,
> > >
> > > Thanks for the KIP. Overall LGTM!
> > >
> > > One late question:
> > > Could we run the stream resetter tool (i.e.
> > > kafka-streams-application-reset.sh) during pause state?
> > > I can imagine there's a use case that after pausing for a while, user
> > just
> > > want to continue with the latest offset, and skipping the intermediate
> > > records.
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Wed, May 11, 2022 at 10:12 AM Jim Hughes
> <jhug...@confluent.io.invalid
> > >
> > > wrote:
> > >
> > >> Hi Matthias,
> > >>
> > >> I like it.  I've updated the KIP to reflect that detail; I put the
> > details
> > >> in the docs for pause.
> > >>
> > >> Cheers,
> > >>
> > >> Jim
> > >>
> > >> On Tue, May 10, 2022 at 7:51 PM Matthias J. Sax <mj...@apache.org>
> > wrote:
> > >>
> > >> > Thanks for the KIP. Overall LGTM.
> > >> >
> > >> > Can we clarify one question: would it be allowed to call `pause()`
> > >> > before calling `start()`? I don't see any reason why we would need
> to
> > >> > disallow it?
> > >> >
> > >> > It could be helpful to start a KafkaStreams client in paused state
> --
> > >> > otherwise there is a race between calling `start()` and calling
> > >> `pause()`.
> > >> >
> > >> > If we allow it, we should clearly document it.
> > >> >
> > >> >
> > >> > -Matthias
> > >> >
> > >> > On 5/10/22 12:04 PM, Jim Hughes wrote:
> > >> > > Hi Bill, all,
> > >> > >
> > >> > > Thank you.  I've updated the KIP to reflect pausing standby tasks
> as
> > >> > well.
> > >> > > I think all the outstanding points have been addressed and I'm
> > going to
> > >> > > start the vote thread!
> > >> > >
> > >> > > Cheers,
> > >> > >
> > >> > > Jim
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Tue, May 10, 2022 at 2:43 PM Bill Bejeck <bbej...@gmail.com>
> > wrote:
> > >> > >
> > >> > >> Hi Jim,
> > >> > >>
> > >> > >> After reading the comments on the KIP, I agree that it makes
> sense
> > to
> > >> > pause
> > >> > >> all activities and any changes can be made later on.
> > >> > >>
> > >> > >> Thanks,
> > >> > >> Bill
> > >> > >>
> > >> > >> On Tue, May 10, 2022 at 4:03 AM Bruno Cadonna <
> cado...@apache.org>
> > >> > wrote:
> > >> > >>
> > >> > >>> Hi Jim,
> > >> > >>>
> > >> > >>> Thanks for the KIP!
> > >> > >>>
> > >> > >>> I am fine with the KIP in general.
> > >> > >>>
> > >> > >>> However, I am with Sophie and John to also pause the standbys
> for
> > the
> > >> > >>> reasons they brought up. Is there a specific reason you want to
> > keep
> > >> > >>> standbys going? It feels like premature optimization to me. We
> can
> > >> > still
> > >> > >>> add keeping standby running in a follow up if needed.
> > >> > >>>
> > >> > >>> Best,
> > >> > >>> Bruno
> > >> > >>>
> > >> > >>> On 10.05.22 05:15, Sophie Blee-Goldman wrote:
> > >> > >>>> Thanks Jim, just one note/question on the standby tasks:
> > >> > >>>>
> > >> > >>>> At the minute, my moderately held position is that standby
> tasks
> > >> ought
> > >> > >> to
> > >> > >>>>> continue reading and remain caught up.  If standby tasks would
> > run
> > >> > out
> > >> > >>> of
> > >> > >>>>> space, there are probably bigger problems.
> > >> > >>>>
> > >> > >>>>
> > >> > >>>> For a single node application, or when the #pause API is
> invoked
> > on
> > >> > all
> > >> > >>>> instances,
> > >> > >>>> then there won't be any further active processing and thus
> > nothing
> > >> to
> > >> > >>> keep
> > >> > >>>> up with,
> > >> > >>>> right? So for that case, it's just a matter of whether any
> > standbys
> > >> > >> that
> > >> > >>>> are lagging
> > >> > >>>> will have the chance to catch up to the (paused) active task
> > state
> > >> > >> before
> > >> > >>>> they stop
> > >> > >>>> as well, in which case having them continue feels fine to me.
> > >> However
> > >> > >>> this
> > >> > >>>> is a
> > >> > >>>> relatively trivial benefit and I would only consider it as a
> > >> deciding
> > >> > >>>> factor when all
> > >> > >>>> things are equal otherwise.
> > >> > >>>>
> > >> > >>>> My concern is the more interesting case: when this feature is
> > used
> > >> to
> > >> > >>> pause
> > >> > >>>> only
> > >> > >>>> one nodes, or some subset of the overall application. In this
> > case,
> > >> > >> yes,
> > >> > >>>> the standby
> > >> > >>>> tasks will indeed fall out of sync. But the only reason I can
> > >> imagine
> > >> > >>>> someone using
> > >> > >>>> the pause feature in such a way is because there is something
> > going
> > >> > >>> wrong,
> > >> > >>>> or about
> > >> > >>>> to go wrong, on that particular node. For example as mentioned
> > >> above,
> > >> > >> if
> > >> > >>>> the user
> > >> > >>>> wants to cut down on costs without stopping everything, or if
> the
> > >> node
> > >> > >> is
> > >> > >>>> about to
> > >> > >>>> run out of disk or needs to be debugged or so on. And in this
> > case,
> > >> > >>>> continuing to
> > >> > >>>> process the standby tasks while other instances continue to run
> > >> would
> > >> > >>>> pretty much
> > >> > >>>> defeat the purpose of pausing it entirely, and might have
> > unpleasant
> > >> > >>>> consequences
> > >> > >>>> for the unsuspecting developer.
> > >> > >>>>
> > >> > >>>> All that said, I don't want to block this KIP so if you have
> > strong
> > >> > >>>> feelings about the
> > >> > >>>> standby behavior I'm happy to back down. I'm only pushing back
> > now
> > >> > >>> because
> > >> > >>>> it
> > >> > >>>> felt like there wasn't any particular motivation for the
> > standbys to
> > >> > >>>> continue processing
> > >> > >>>> or not, and I figured I'd try to fill in this gap with my
> > thoughts
> > >> on
> > >> > >> the
> > >> > >>>> matter :)
> > >> > >>>> Either way we should just make sure that this behavior is
> > documented
> > >> > >>>> clearly,
> > >> > >>>> since it may be surprising if we decide to only pause active
> > >> > processing
> > >> > >>>> (another option
> > >> > >>>> is to rename the method something like #pauseProcessing or
> > >> > >>>> #pauseActiveProcessing
> > >> > >>>> so that it's hard to miss).
> > >> > >>>>
> > >> > >>>> Thanks! Sorry for the lengthy response, but hopefully we won't
> > need
> > >> to
> > >> > >>>> debate this any
> > >> > >>>> further. Beyond this I'm satisfied with the latest proposal
> > >> > >>>>
> > >> > >>>> On Mon, May 9, 2022 at 5:16 PM John Roesler <
> vvcep...@apache.org
> > >
> > >> > >> wrote:
> > >> > >>>>
> > >> > >>>>> Thanks for the updates, Jim!
> > >> > >>>>>
> > >> > >>>>> After this discussion and your updates, this KIP looks good to
> > me.
> > >> > >>>>>
> > >> > >>>>> Thanks,
> > >> > >>>>> John
> > >> > >>>>>
> > >> > >>>>> On Mon, May 9, 2022, at 17:52, Jim Hughes wrote:
> > >> > >>>>>> Hi Sophie, all,
> > >> > >>>>>>
> > >> > >>>>>> I've updated the KIP with feedback from the discussion so
> far:
> > >> > >>>>>>
> > >> > >>>>>
> > >> > >>>
> > >> > >>
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832
> > >> > >>>>>>
> > >> > >>>>>> As a terse summary of my current position:
> > >> > >>>>>> Pausing will only stop processing and punctuation (respecting
> > >> > modular
> > >> > >>>>>> topologies).
> > >> > >>>>>> Paused topologies will still a) consume from input topics, b)
> > call
> > >> > >> the
> > >> > >>>>>> usual commit pathways (commits will happen basically as they
> > would
> > >> > >>> have),
> > >> > >>>>>> and c) standBy tasks will still be processed.
> > >> > >>>>>>
> > >> > >>>>>> Shout if the KIP or those details still need some TLC.
> > Responding
> > >> > to
> > >> > >>>>>> Sophie inline below.
> > >> > >>>>>>
> > >> > >>>>>>
> > >> > >>>>>> On Mon, May 9, 2022 at 6:06 PM Sophie Blee-Goldman
> > >> > >>>>>> <sop...@confluent.io.invalid> wrote:
> > >> > >>>>>>
> > >> > >>>>>>> Don't worry, I'm going to be adding the APIs for
> > topology-level
> > >> > >>> pausing
> > >> > >>>>> as
> > >> > >>>>>>> part of the modular topologies KIP,
> > >> > >>>>>>> so we don't need to worry about that for now. That said, I
> > don't
> > >> > >> think
> > >> > >>>>> we
> > >> > >>>>>>> should brush it off entirely and design
> > >> > >>>>>>> this feature in a way that's going to be incompatible or
> > hugely
> > >> > >> raise
> > >> > >>>>> the
> > >> > >>>>>>> LOE on bringing the (mostly already
> > >> > >>>>>>> implemented) modular topologies feature into the public API,
> > just
> > >> > >>>>>>> because it "won the race to write a KIP" :)
> > >> > >>>>>>>
> > >> > >>>>>>
> > >> > >>>>>> Yes, I'm hoping that this is all compatible with modular
> > >> > >> topologies.  I
> > >> > >>>>>> haven't seen anything so far which seems to be a problem;
> this
> > KIP
> > >> > is
> > >> > >>>>> just
> > >> > >>>>>> in a weird state to discuss details of acting on modular
> > >> > >> topologies.:)
> > >> > >>>>>>
> > >> > >>>>>>
> > >> > >>>>>>> I may be biased (ok, I definitely am), but I'm not in favor
> of
> > >> > >> adding
> > >> > >>>>> this
> > >> > >>>>>>> as a state regardless of the modular topologies.
> > >> > >>>>>>> First of all any change to the KafkaStreams state machine
> is a
> > >> > >>> breaking
> > >> > >>>>>>> change, no? So we would have to wait until
> > >> > >>>>>>> the next major release which seems like an unnecessary thing
> > to
> > >> > >> block
> > >> > >>>>> on.
> > >> > >>>>>>> (Whether to add this as a state to the
> > >> > >>>>>>> StreamThread's FSM is an implementation detail).
> > >> > >>>>>>>
> > >> > >>>>>>
> > >> > >>>>>> +1.  I am sold on skipping out on new states.  I had that as
> a
> > >> > >> rejected
> > >> > >>>>>> alternative in the KIP and have added a few more words to
> that
> > >> bit.
> > >> > >>>>>>
> > >> > >>>>>>
> > >> > >>>>>>> Also, the semantics of using an `isPaused` method to
> > distinguish
> > >> a
> > >> > >>>>> paused
> > >> > >>>>>>> instance (or topology) make more sense
> > >> > >>>>>>> to me -- this is a user-specified status, whereas the
> > >> KafkaStreams
> > >> > >>>>> state is
> > >> > >>>>>>> intended to relay the status of the system
> > >> > >>>>>>> itself. For example, if we are going to continue to poll
> > during
> > >> > >> pause,
> > >> > >>>>> then
> > >> > >>>>>>> shouldn't the client transition to REBALANCING?
> > >> > >>>>>>> I believe it makes sense to still allow distinguishing these
> > >> states
> > >> > >>>>> while a
> > >> > >>>>>>> client is paused, whereas making PAUSED its
> > >> > >>>>>>> own state means you can't tell when the client is
> rebalancing
> > vs
> > >> > >>>>> running,
> > >> > >>>>>>> or whether it is paused or dead: presumably
> > >> > >>>>>>> the NOT_RUNNING/ERROR state would trump the PAUSED state,
> > which
> > >> > >> means
> > >> > >>>>> you
> > >> > >>>>>>> would not be able to rely on
> > >> > >>>>>>> checking the state to see if you had called PAUSED on that
> > >> > instance.
> > >> > >>>>>>> Obviously you can work around this by just
> > >> > >>>>>>> maintaining a flag in the usercode, but all this feels very
> > >> > >> unnatural
> > >> > >>>>> to me
> > >> > >>>>>>> vs just checking the `#isPaused` API.
> > >> > >>>>>>>
> > >> > >>>>>>> On that note, I had one question -- at what point would the
> > >> > >>> `#isPaused`
> > >> > >>>>>>> check return true? Would it do so immediately
> > >> > >>>>>>> after pausing the instance, or only once it has finished
> > >> committing
> > >> > >>>>> offsets
> > >> > >>>>>>> and stopped returning records?
> > >> > >>>>>>>
> > >> > >>>>>>
> > >> > >>>>>> Immediately, `#isPaused` tells you about metadata.
> > >> > >>>>>>
> > >> > >>>>>>
> > >> > >>>>>>> Finally, on the note of punctuators I think it would make
> most
> > >> > sense
> > >> > >>> to
> > >> > >>>>>>> either pause these as well or else add this an
> > >> > >>>>>>> an explicit option for the user. If this feature is used to,
> > for
> > >> > >>>>> example,
> > >> > >>>>>>> help save on processing costs while an app is
> > >> > >>>>>>> not in use, then it would probably be surprising and perhaps
> > >> > >> alarming
> > >> > >>> to
> > >> > >>>>>>> see certain kinds of processing still continue.
> > >> > >>>>>>>
> > >> > >>>>>>
> > >> > >>>>>>   From other parts of the discussion, I'm sold on pausing
> > >> > punctuation.
> > >> > >>>>>>
> > >> > >>>>>>
> > >> > >>>>>>> The question of whether to continue fetching for standby
> > tasks is
> > >> > >>> maybe
> > >> > >>>>> a
> > >> > >>>>>>> bit more debatable, as it would certainly be
> > >> > >>>>>>> nice to find your clients all caught up when you go to
> resume
> > the
> > >> > >>>>> instance
> > >> > >>>>>>> again, but I would still strongly suggest
> > >> > >>>>>>> pausing these as well. To use a similar example, imagine if
> > you
> > >> > >> paused
> > >> > >>>>> an
> > >> > >>>>>>> app because it was about to run out of
> > >> > >>>>>>> disk. If the standbys kept processing and filled up the
> > remaining
> > >> > >>> space,
> > >> > >>>>>>> you'd probably feel a bit betrayed by this API.
> > >> > >>>>>>>
> > >> > >>>>>>> WDYT?
> > >> > >>>>>>>
> > >> > >>>>>>
> > >> > >>>>>> At the minute, my moderately held position is that standby
> > tasks
> > >> > >> ought
> > >> > >>> to
> > >> > >>>>>> continue reading and remain caught up.  If standby tasks
> would
> > run
> > >> > >> out
> > >> > >>> of
> > >> > >>>>>> space, there are probably bigger problems.
> > >> > >>>>>>
> > >> > >>>>>> If later it is desirable to manage punctuation or standby
> > tasks,
> > >> > then
> > >> > >>> it
> > >> > >>>>>> should be easy for future folks to modify things.
> > >> > >>>>>>
> > >> > >>>>>> Overall, I'd frame this KIP as "pause processing resulting in
> > >> > >> outputs".
> > >> > >>>>>>
> > >> > >>>>>> Cheers,
> > >> > >>>>>>
> > >> > >>>>>> Jim
> > >> > >>>>>>
> > >> > >>>>>>
> > >> > >>>>>>
> > >> > >>>>>>> On Mon, May 9, 2022 at 10:33 AM Guozhang Wang <
> > >> wangg...@gmail.com>
> > >> > >>>>> wrote:
> > >> > >>>>>>>
> > >> > >>>>>>>> I think for named topology we can leave the scope of this
> > KIP as
> > >> > >> "all
> > >> > >>>>> or
> > >> > >>>>>>>> nothing", i.e. when you pause an instance you pause all of
> > its
> > >> > >>>>>>> topologies.
> > >> > >>>>>>>> I raised this question in my previous email just trying to
> > >> clarify
> > >> > >> if
> > >> > >>>>>>> this
> > >> > >>>>>>>> is what you have in mind. We can leave the question of
> finer
> > >> > >>>>> controlled
> > >> > >>>>>>>> pausing behavior for later when we have named topology
> being
> > >> > >> exposed
> > >> > >>>>> via
> > >> > >>>>>>>> another KIP.
> > >> > >>>>>>>>
> > >> > >>>>>>>>
> > >> > >>>>>>>> Guozhang
> > >> > >>>>>>>>
> > >> > >>>>>>>> On Mon, May 9, 2022 at 7:50 AM John Roesler <
> > >> vvcep...@apache.org>
> > >> > >>>>> wrote:
> > >> > >>>>>>>>
> > >> > >>>>>>>>> Hi Jim,
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> Thanks for the replies. This all sounds good to me. Just
> two
> > >> > >> further
> > >> > >>>>>>>>> comments:
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> 3. It seems like you should aim for the simplest
> semantics.
> > If
> > >> > the
> > >> > >>>>>>> intent
> > >> > >>>>>>>>> is to “pause” the instance, then you’d better pause the
> > whole
> > >> > >>>>> instance.
> > >> > >>>>>>>> If
> > >> > >>>>>>>>> you leave punctuations and standbys running, I expect we’d
> > see
> > >> > bug
> > >> > >>>>>>>> reports
> > >> > >>>>>>>>> come in that the instance isn’t really paused.
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> 5. Since you won the race to write a KIP, I don’t think it
> > >> makes
> > >> > >> too
> > >> > >>>>>>> much
> > >> > >>>>>>>>> sense to worry too much about modular topologies. When
> they
> > >> > >> propose
> > >> > >>>>>>> their
> > >> > >>>>>>>>> KIP, they will have to specify a lot of state management
> > >> > behavior,
> > >> > >>>>> and
> > >> > >>>>>>>>> pause/resume will have to be part of it. If they have some
> > >> > concern
> > >> > >>>>>>> about
> > >> > >>>>>>>>> your KIP, they’ll chime in. It doesn’t make sense for you
> to
> > >> try
> > >> > >> and
> > >> > >>>>>>>> guess
> > >> > >>>>>>>>> what that proposal will look like.
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> To be honest, you’re proposing a KafkaStreams
> runtime-level
> > >> > >>>>>>> pause/resume
> > >> > >>>>>>>>> function, not a topology-level one anyway, so it seems
> > pretty
> > >> > >> clear
> > >> > >>>>>>> that
> > >> > >>>>>>>> it
> > >> > >>>>>>>>> would pause the whole runtime (of a single instance)
> > regardless
> > >> > of
> > >> > >>>>> any
> > >> > >>>>>>>>> modular topologies. If the intent is to pause individual
> > >> > >> topologies
> > >> > >>>>> in
> > >> > >>>>>>>> the
> > >> > >>>>>>>>> future, you’d need a different API anyway.
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> Thanks!
> > >> > >>>>>>>>> -John
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> On Mon, May 9, 2022, at 08:10, Jim Hughes wrote:
> > >> > >>>>>>>>>> Hi John,
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> Long emails are great; responding inline!
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> On Sat, May 7, 2022 at 4:54 PM John Roesler <
> > >> > vvcep...@apache.org
> > >> > >>>
> > >> > >>>>>>>> wrote:
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>> Thanks for the KIP, Jim!
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> This conversation seems to highlight that the KIP needs
> to
> > >> > >>>>> specify
> > >> > >>>>>>>>>>> some of its behavior as well as its APIs, where the
> > behavior
> > >> is
> > >> > >>>>>>>>>>> observable and significant to users.
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> For example:
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> 1. Do you plan to have a guarantee that immediately
> after
> > >> > >>>>>>>>>>> calling KafkaStreams.pause(), users should observe that
> > the
> > >> > >>>>> instance
> > >> > >>>>>>>>>>> stops processing new records? Or should they expect that
> > the
> > >> > >>>>> threads
> > >> > >>>>>>>>>>> will continue to process some records and pause
> > >> asynchronously
> > >> > >>>>>>>>>>> (you already answered this in the thread earlier)?
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> I'm happy to build up to a guarantee of sorts.  My
> current
> > >> idea
> > >> > >> is
> > >> > >>>>>>> that
> > >> > >>>>>>>>>> pause() does not do anything "exceptional" to get control
> > back
> > >> > >>>>> from a
> > >> > >>>>>>>>>> running topology.  A currently running topology would get
> > to
> > >> > >>>>> complete
> > >> > >>>>>>>> its
> > >> > >>>>>>>>>> loop.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> Separately, I'm still piecing together how commits work.
> > By
> > >> > some
> > >> > >>>>>>>>>> mechanism, after a pause, I do agree that the topology
> > needs
> > >> to
> > >> > >>>>>>> commit
> > >> > >>>>>>>>> its
> > >> > >>>>>>>>>> work in some manner.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>> 2. Will the threads continue to poll new records until
> > they
> > >> > >>>>>>> naturally
> > >> > >>>>>>>>> fill
> > >> > >>>>>>>>>>> up the task buffers, or will they immediately pause
> their
> > >> > >>>>> Consumers
> > >> > >>>>>>>>>>> as well?
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> Presently, I'm suggesting that consumers would fill up
> > their
> > >> > >>>>> buffers.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>> 3. Will threads continue to call (system time)
> > punctuators,
> > >> or
> > >> > >>>>> would
> > >> > >>>>>>>>>>> punctuations also be paused?
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> In my first pass at thinking through this, I left the
> > >> > punctuators
> > >> > >>>>>>>>> running.
> > >> > >>>>>>>>>> To be honest, I'm not sure what they do, so my approach
> is
> > >> > either
> > >> > >>>>>>> lucky
> > >> > >>>>>>>>> and
> > >> > >>>>>>>>>> correct or it could be Very Clearly Wrong.;)
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>> I realize that some of those questions simply may not
> have
> > >> > >>>>> occurred
> > >> > >>>>>>> to
> > >> > >>>>>>>>>>> you, so this is not a criticism for leaving them off;
> I'm
> > >> just
> > >> > >>>>>>>> pointing
> > >> > >>>>>>>>> out
> > >> > >>>>>>>>>>> that although we don't tend to mention implementation
> > details
> > >> > in
> > >> > >>>>>>> KIPs,
> > >> > >>>>>>>>>>> we also can't be too high level, since there are a lot
> of
> > >> > >>>>>>> operational
> > >> > >>>>>>>>>>> details that users rely on to achieve various behaviors
> in
> > >> > >>>>> Streams.
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> Ayup, I will add some details as we iron out the
> > guarantees,
> > >> > >>>>>>>>> implementation
> > >> > >>>>>>>>>> details that are at the API level.  This one is tough
> since
> > >> > >>>>> internal
> > >> > >>>>>>>>>> features like NamedTopologies are part of the discussion.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>> A couple more comments:
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> 4. +1 to what Guozhang said. It seems like we should we
> > also
> > >> do
> > >> > >> a
> > >> > >>>>>>>> commit
> > >> > >>>>>>>>>>> before entering the paused state. That way, any open
> > >> > >> transactions
> > >> > >>>>>>>> would
> > >> > >>>>>>>>>>> be closed and not have to worry about timing out. Even
> > under
> > >> > >>>>> ALOS,
> > >> > >>>>>>> it
> > >> > >>>>>>>>>>> seems best to go ahead and complete the processing of
> > >> in-flight
> > >> > >>>>>>>> records
> > >> > >>>>>>>>>>> by committing. That way, if anything happens to die
> while
> > >> it's
> > >> > >>>>>>> paused,
> > >> > >>>>>>>>>>> existing
> > >> > >>>>>>>>>>> work won't have to be repeated. Plus, if there are any
> > >> > >> processors
> > >> > >>>>>>> with
> > >> > >>>>>>>>> side
> > >> > >>>>>>>>>>> effects, users won't have to tolerate weird edge cases
> > where
> > >> a
> > >> > >>>>> pause
> > >> > >>>>>>>>> occurs
> > >> > >>>>>>>>>>> after a processor sees a record, but before the result
> is
> > >> sent
> > >> > >> to
> > >> > >>>>>>> its
> > >> > >>>>>>>>>>> outputs.
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> 5. I noticed that you proposed not to add a PAUSED
> state,
> > >> but I
> > >> > >>>>>>> didn't
> > >> > >>>>>>>>>>> follow
> > >> > >>>>>>>>>>> the rationale. Adding a state seems beneficial for a
> > number
> > >> of
> > >> > >>>>>>>> reasons:
> > >> > >>>>>>>>>>> StreamThreads already use the thread state to determine
> > >> whether
> > >> > >>>>> to
> > >> > >>>>>>>>> process
> > >> > >>>>>>>>>>> or not, so avoiding a new State would just mean adding a
> > >> > >> separate
> > >> > >>>>>>> flag
> > >> > >>>>>>>>> to
> > >> > >>>>>>>>>>> track
> > >> > >>>>>>>>>>> and then checking your new flag in addition to the State
> > in
> > >> the
> > >> > >>>>>>>> thread.
> > >> > >>>>>>>>>>> Also,
> > >> > >>>>>>>>>>> operating Streams applications is a non-trivial task,
> and
> > >> users
> > >> > >>>>> rely
> > >> > >>>>>>>> on
> > >> > >>>>>>>>>>> the State
> > >> > >>>>>>>>>>> (and transitions) to understand Streams's behavior.
> > Adding a
> > >> > >>>>> PAUSED
> > >> > >>>>>>>>> state
> > >> > >>>>>>>>>>> is an elegant way to communicate to operators what is
> > >> happening
> > >> > >>>>> with
> > >> > >>>>>>>> the
> > >> > >>>>>>>>>>> application. Note that the person digging though logs
> and
> > >> > >>>>> metrics,
> > >> > >>>>>>>>> trying
> > >> > >>>>>>>>>>> to understand why the application isn't doing anything
> is
> > >> > >>>>> probably
> > >> > >>>>>>> not
> > >> > >>>>>>>>>>> going
> > >> > >>>>>>>>>>> to be the same person who is calling pause() and
> resume().
> > >> > Also,
> > >> > >>>>> if
> > >> > >>>>>>>> you
> > >> > >>>>>>>>> add
> > >> > >>>>>>>>>>> a state, you don't need `isPaused()`.
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> 5b. If you buy the arguments to go ahead and commit as
> > well
> > >> as
> > >> > >>>>> the
> > >> > >>>>>>>>>>> argument to add a State, then I'd also suggest to follow
> > the
> > >> > >>>>>>> existing
> > >> > >>>>>>>>>>> patterns
> > >> > >>>>>>>>>>> for the shutdown states by also adding PAUSING. That
> > >> > >>>>>>>>>>> way, you'll also expose a way to understand that Streams
> > >> > >> received
> > >> > >>>>>>> the
> > >> > >>>>>>>>>>> signal
> > >> > >>>>>>>>>>> to pause, and that it's still processing and committing
> > some
> > >> > >>>>> records
> > >> > >>>>>>>> in
> > >> > >>>>>>>>>>> preparation to enter a PAUSED state. I'm not sure if a
> > >> RESUMING
> > >> > >>>>>>> state
> > >> > >>>>>>>>> would
> > >> > >>>>>>>>>>> also make sense.
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> I hit a tricky bit when thinking through having a PAUSED
> > >> > >>>>> state...  If
> > >> > >>>>>>>> one
> > >> > >>>>>>>>>> is using Named Topologies, and some of them are paused,
> > what
> > >> > >>>>> state is
> > >> > >>>>>>>> the
> > >> > >>>>>>>>>> Streams instance in?  If we can agree on that, things may
> > >> become
> > >> > >>>>>>>>> clear....
> > >> > >>>>>>>>>> I can see two quick ideas:
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> 1.  The state is RUNNING and NamedTopologies have some
> > other
> > >> way
> > >> > >>>>> to
> > >> > >>>>>>>>>> indicate state.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> 2.  The state is something messy like PARTIALLY_PAUSED to
> > >> > reflect
> > >> > >>>>>>> that
> > >> > >>>>>>>>> the
> > >> > >>>>>>>>>> instance has something interesting going on.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> When I poked at things initially, I did try out having
> > >> different
> > >> > >>>>>>>> states,
> > >> > >>>>>>>>>> and I readily agree that a PAUSING state may make sense.
> > >> > >>>>> (Especially
> > >> > >>>>>>>> if
> > >> > >>>>>>>>>> there's a need to run commits before transitioning all
> the
> > way
> > >> > to
> > >> > >>>>>>>>> PAUSED.)
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>> And that's all I have to say about that. I hope you
> don't
> > >> find
> > >> > >> my
> > >> > >>>>>>>>>>> long message offputting. I'm fundamentally in favor of
> > your
> > >> > KIP,
> > >> > >>>>>>>>>>> and I think with a little more explanation in the KIP,
> > and a
> > >> > few
> > >> > >>>>>>>>>>> small tweaks to the proposal, we'll be able to provide
> > good
> > >> > >>>>>>>>>>> ergonomics to our users.
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> Thanks!
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> Jim
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>> Thanks,
> > >> > >>>>>>>>>>> -John
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> On Sat, May 7, 2022, at 00:06, Guozhang Wang wrote:
> > >> > >>>>>>>>>>>> I'm in favor of the "just pausing the instance itself“
> > >> option
> > >> > >>>>> as
> > >> > >>>>>>>>> well. As
> > >> > >>>>>>>>>>>> for EOS, the point is that when the processing is
> > paused, we
> > >> > >>>>> would
> > >> > >>>>>>>> not
> > >> > >>>>>>>>>>>> trigger any `producer.send` during the time, and the
> > >> > >>>>> transaction
> > >> > >>>>>>>>> timeout
> > >> > >>>>>>>>>>> is
> > >> > >>>>>>>>>>>> sort of relying on that behavior, so my point was that
> > it's
> > >> > >>>>>>> probably
> > >> > >>>>>>>>>>> better
> > >> > >>>>>>>>>>>> to also commit the processing before we pause it.
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>> Guozhang
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>> On Fri, May 6, 2022 at 6:12 PM Jim Hughes
> > >> > >>>>>>>>> <jhug...@confluent.io.invalid>
> > >> > >>>>>>>>>>>> wrote:
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>>> Hi Matthias,
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>> Since the only thing which will be paused is
> processing
> > the
> > >> > >>>>>>>>> topology, I
> > >> > >>>>>>>>>>>>> think we can let commits happen naturally.
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>> Good point about getting the paused state to new
> > members;
> > >> it
> > >> > >>>>> is
> > >> > >>>>>>>>> seeming
> > >> > >>>>>>>>>>>>> like the "building block" approach is a good one to
> keep
> > >> > >>>>> things
> > >> > >>>>>>>>> simple
> > >> > >>>>>>>>>>> at
> > >> > >>>>>>>>>>>>> first.
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>> Cheers,
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>> Jim
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>> On Fri, May 6, 2022 at 8:31 PM Matthias J. Sax <
> > >> > >>>>> mj...@apache.org
> > >> > >>>>>>>>
> > >> > >>>>>>>>>>> wrote:
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> I think it's tricky to propagate a pauseAll() via the
> > >> > >>>>> rebalance
> > >> > >>>>>>>>>>>>>> protocol. New members joining the group would need to
> > get
> > >> > >>>>>>> paused,
> > >> > >>>>>>>>> too?
> > >> > >>>>>>>>>>>>>> Could there be weird race conditions with overlapping
> > >> > >>>>>>> pauseAll()
> > >> > >>>>>>>>> and
> > >> > >>>>>>>>>>>>>> resumeAll() calls on different instanced while there
> > could
> > >> > >>>>> be a
> > >> > >>>>>>>>>>> errors /
> > >> > >>>>>>>>>>>>>> network partitions or similar?
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> I would argue that similar to IQ, we provide the
> basic
> > >> > >>>>> building
> > >> > >>>>>>>>>>> blocks,
> > >> > >>>>>>>>>>>>>> and leave it the user users to implement cross
> instance
> > >> > >>>>>>>> management
> > >> > >>>>>>>>>>> for a
> > >> > >>>>>>>>>>>>>> pauseAll() scenario. -- Also, if there is really
> > demand,
> > >> we
> > >> > >>>>> can
> > >> > >>>>>>>>> always
> > >> > >>>>>>>>>>>>>> add pauseAll()/resumeAll() as follow up work.
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> About named typologies: I agree to Jim to not include
> > them
> > >> > >>>>> in
> > >> > >>>>>>>> this
> > >> > >>>>>>>>> KIP
> > >> > >>>>>>>>>>>>>> as they are not a public feature yet. If we make
> named
> > >> > >>>>>>> typologies
> > >> > >>>>>>>>>>>>>> public, the corresponding KIP should extend the
> > >> pause/resume
> > >> > >>>>>>>>> feature
> > >> > >>>>>>>>>>>>>> (ie, APIs) accordingly. Of course, the code can (and
> > >> should)
> > >> > >>>>>>>>> already
> > >> > >>>>>>>>>>> be
> > >> > >>>>>>>>>>>>>> setup to support it to be future proof.
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> Good call out about commit and EOS -- to simplify
> it, I
> > >> > >>>>> think
> > >> > >>>>>>> it
> > >> > >>>>>>>>> might
> > >> > >>>>>>>>>>>>>> be good to commit also for the at-least-once case?
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> -Matthias
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> On 5/6/22 1:05 PM, Jim Hughes wrote:
> > >> > >>>>>>>>>>>>>>> Hi Bill,
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> Great questions; I'll do my best to reply inline:
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> On Fri, May 6, 2022 at 3:21 PM Bill Bejeck <
> > >> > >>>>>>> bbej...@gmail.com>
> > >> > >>>>>>>>>>> wrote:
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>> Hi Jim,
> > >> > >>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>> Thanks for the KIP.  I have a couple of
> > meta-questions
> > >> as
> > >> > >>>>>>>> well:
> > >> > >>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>> 1) Regarding pausing only a subset of running
> > instances,
> > >> > >>>>> I'm
> > >> > >>>>>>>>>>> thinking
> > >> > >>>>>>>>>>>>>> there
> > >> > >>>>>>>>>>>>>>>> may be a use case for pausing all of them.
> > >> > >>>>>>>>>>>>>>>>       Would it make sense to also allow for pausing
> > all
> > >> > >>>>>>>> instances
> > >> > >>>>>>>>> by
> > >> > >>>>>>>>>>>>>> adding a
> > >> > >>>>>>>>>>>>>>>> method `pauseAll()` or something similar?
> > >> > >>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> Honestly, I'm indifferent on this point.
> Presently, I
> > >> > >>>>> think
> > >> > >>>>>>>>> what I
> > >> > >>>>>>>>>>>>> have
> > >> > >>>>>>>>>>>>>>> proposed is the minimal change to get the ability to
> > >> pause
> > >> > >>>>>>> and
> > >> > >>>>>>>>>>> resume
> > >> > >>>>>>>>>>>>>>> processing.  If adding a 'pauseAll()' is required,
> > I'd be
> > >> > >>>>>>> happy
> > >> > >>>>>>>>> to
> > >> > >>>>>>>>>>> do
> > >> > >>>>>>>>>>>>>> that!
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>    From Guozhang's email, it sounds like this would
> > >> require
> > >> > >>>>>>> using
> > >> > >>>>>>>>> the
> > >> > >>>>>>>>>>>>>>> rebalance protocol to trigger the coordination.
> Would
> > >> > >>>>> there
> > >> > >>>>>>> be
> > >> > >>>>>>>>>>> enough
> > >> > >>>>>>>>>>>>>> room
> > >> > >>>>>>>>>>>>>>> in that approach to indicate that a named topology
> is
> > to
> > >> > >>>>> be
> > >> > >>>>>>>>> paused
> > >> > >>>>>>>>>>>>> across
> > >> > >>>>>>>>>>>>>>> all nodes?
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>> 2) Would pausing affect standby tasks?  For
> example,
> > >> > >>>>> imagine
> > >> > >>>>>>>>> there
> > >> > >>>>>>>>>>>>> are 3
> > >> > >>>>>>>>>>>>>>>> instances A, B, and C.
> > >> > >>>>>>>>>>>>>>>>       A user elects to pause instance C only but it
> > >> hosts
> > >> > >>>>> the
> > >> > >>>>>>>>> standby
> > >> > >>>>>>>>>>>>>> tasks
> > >> > >>>>>>>>>>>>>>>> for A.
> > >> > >>>>>>>>>>>>>>>>       Would the standby tasks on the paused
> > application
> > >> > >>>>>>> continue
> > >> > >>>>>>>>> to
> > >> > >>>>>>>>>>> read
> > >> > >>>>>>>>>>>>>> from
> > >> > >>>>>>>>>>>>>>>> the changelog topic?
> > >> > >>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> Yes, standby tasks would continue reading from the
> > >> > >>>>> changelog
> > >> > >>>>>>>>> topic.
> > >> > >>>>>>>>>>>>> All
> > >> > >>>>>>>>>>>>>>> consumers would continue reading to avoid getting
> > dropped
> > >> > >>>>>>> from
> > >> > >>>>>>>>> their
> > >> > >>>>>>>>>>>>>>> consumer groups.
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> Cheers,
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> Jim
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>> Thanks!
> > >> > >>>>>>>>>>>>>>>> Bill
> > >> > >>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>> On Fri, May 6, 2022 at 2:44 PM Jim Hughes
> > >> > >>>>>>>>>>>>> <jhug...@confluent.io.invalid
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>> wrote:
> > >> > >>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> Hi Guozhang,
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> Thanks for the feedback; responses inline below:
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> On Fri, May 6, 2022 at 1:09 PM Guozhang Wang <
> > >> > >>>>>>>>> wangg...@gmail.com>
> > >> > >>>>>>>>>>>>>> wrote:
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> Hello Jim,
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> Thanks for the proposed KIP. I have some meta
> > >> questions
> > >> > >>>>>>>> about
> > >> > >>>>>>>>> it:
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> 1) Would an instance always pause/resume all of
> its
> > >> > >>>>>>> current
> > >> > >>>>>>>>> owned
> > >> > >>>>>>>>>>>>>>>>>> topologies (i.e. the named topologies), or are
> > there
> > >> > >>>>> any
> > >> > >>>>>>>>>>> scenarios
> > >> > >>>>>>>>>>>>>>>> where
> > >> > >>>>>>>>>>>>>>>>> we
> > >> > >>>>>>>>>>>>>>>>>> only want to pause/resume a subset of them?
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> An instance may wish to pause some of its named
> > >> > >>>>> topologies.
> > >> > >>>>>>>> I
> > >> > >>>>>>>>> was
> > >> > >>>>>>>>>>>>>> unsure
> > >> > >>>>>>>>>>>>>>>>> what to say about named topologies in the KIP
> since
> > >> they
> > >> > >>>>>>> seem
> > >> > >>>>>>>>> to
> > >> > >>>>>>>>>>> be
> > >> > >>>>>>>>>>>>> an
> > >> > >>>>>>>>>>>>>>>>> internal detail at the moment.
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> I intend to add to
> KafkaStreamsNamedTopologyWrapper
> > >> > >>>>> methods
> > >> > >>>>>>>>> like:
> > >> > >>>>>>>>>>>>>>>>>        public void pauseNamedTopology(final String
> > >> > >>>>>>>>> topologyToPause)
> > >> > >>>>>>>>>>>>>>>>>        public boolean isNamedTopologyPaused(final
> > >> String
> > >> > >>>>>>>>> topology)
> > >> > >>>>>>>>>>>>>>>>>        public void resumeNamedTopology(final
> String
> > >> > >>>>>>>>>>> topologyToResume)
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> 2) From a user's perspective, do we want to
> always
> > >> > >>>>> issue a
> > >> > >>>>>>>>>>>>>>>> `pause/resume`
> > >> > >>>>>>>>>>>>>>>>>> to all the instances or not? For example, we can
> > >> define
> > >> > >>>>>>> the
> > >> > >>>>>>>>>>>>> semantics
> > >> > >>>>>>>>>>>>>>>> of
> > >> > >>>>>>>>>>>>>>>>>> the function as "you only need to call this
> > function
> > >> on
> > >> > >>>>>>> any
> > >> > >>>>>>>> of
> > >> > >>>>>>>>>>> the
> > >> > >>>>>>>>>>>>>>>>>> application's instances, and all instances would
> > then
> > >> > >>>>>>> pause
> > >> > >>>>>>>>> (via
> > >> > >>>>>>>>>>> the
> > >> > >>>>>>>>>>>>>>>>>> rebalance error codes)", or as "you would call
> this
> > >> > >>>>>>> function
> > >> > >>>>>>>>> for
> > >> > >>>>>>>>>>> all
> > >> > >>>>>>>>>>>>>>>> the
> > >> > >>>>>>>>>>>>>>>>>> instances of an application". Which one are you
> > >> > >>>>> referring
> > >> > >>>>>>>> to?
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> My initial intent is that one would call this
> > function
> > >> > >>>>> on
> > >> > >>>>>>> any
> > >> > >>>>>>>>>>>>> instances
> > >> > >>>>>>>>>>>>>>>> of
> > >> > >>>>>>>>>>>>>>>>> the application that one wishes to pause.  This
> > should
> > >> > >>>>>>> allow
> > >> > >>>>>>>>> more
> > >> > >>>>>>>>>>>>>> control
> > >> > >>>>>>>>>>>>>>>>> (in case one wanted to pause a portion of the
> > >> > >>>>> instances).
> > >> > >>>>>>> On
> > >> > >>>>>>>>> the
> > >> > >>>>>>>>>>>>> other
> > >> > >>>>>>>>>>>>>>>>> hand, this approach would put more work on the
> > >> > >>>>> implementer
> > >> > >>>>>>> to
> > >> > >>>>>>>>>>>>>> coordinate
> > >> > >>>>>>>>>>>>>>>>> calling pause or resume across instances.
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> If the other option is more suitable, happy to do
> > that
> > >> > >>>>>>>> instead.
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> 3) With EOS, there's a transaction timeout which
> > would
> > >> > >>>>>>>>> determine
> > >> > >>>>>>>>>>> how
> > >> > >>>>>>>>>>>>>>>>> long a
> > >> > >>>>>>>>>>>>>>>>>> transaction can stay idle before it's
> > force-aborted on
> > >> > >>>>> the
> > >> > >>>>>>>>> broker
> > >> > >>>>>>>>>>>>>>>> side. I
> > >> > >>>>>>>>>>>>>>>>>> think when a pause is issued, that means we'd
> need
> > to
> > >> > >>>>>>>>> immediately
> > >> > >>>>>>>>>>>>>>>> commit
> > >> > >>>>>>>>>>>>>>>>>> the current transaction for EOS since we do not
> > know
> > >> > >>>>> how
> > >> > >>>>>>>> long
> > >> > >>>>>>>>> we
> > >> > >>>>>>>>>>>>> could
> > >> > >>>>>>>>>>>>>>>>>> pause for. Is that right? If yes could you please
> > >> > >>>>> clarify
> > >> > >>>>>>>>> that in
> > >> > >>>>>>>>>>>>> the
> > >> > >>>>>>>>>>>>>>>> doc
> > >> > >>>>>>>>>>>>>>>>>> as well.
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> Good point.  My intent is for pause() to wait for
> > the
> > >> > >>>>> next
> > >> > >>>>>>>>>>> iteration
> > >> > >>>>>>>>>>>>>>>>> through `runOnce()` and then only skip over the
> > >> > >>>>> processing
> > >> > >>>>>>>> for
> > >> > >>>>>>>>>>> paused
> > >> > >>>>>>>>>>>>>>>> tasks
> > >> > >>>>>>>>>>>>>>>>> in `taskManager.process(numIterations, time)`.
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> Do commits live inside that call or do they live
> > >> > >>>>>>>>> across/outside of
> > >> > >>>>>>>>>>>>> it?
> > >> > >>>>>>>>>>>>>>>> In
> > >> > >>>>>>>>>>>>>>>>> the former case, I think there shouldn't be any
> > issues
> > >> > >>>>> with
> > >> > >>>>>>>>> EOS.
> > >> > >>>>>>>>>>>>>>>>> Otherwise, we may need to work through some
> details
> > to
> > >> > >>>>> get
> > >> > >>>>>>>> EOS
> > >> > >>>>>>>>>>> right.
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> Once we figure that out, I can update the KIP.
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> Thanks,
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> Jim
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> Guozhang
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> On Wed, May 4, 2022 at 10:51 AM Jim Hughes
> > >> > >>>>>>>>>>>>>>>> <jhug...@confluent.io.invalid
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> wrote:
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>> Hi all,
> > >> > >>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>> I have written up a KIP for adding the ability
> to
> > >> > >>>>> pause
> > >> > >>>>>>> and
> > >> > >>>>>>>>>>> resume
> > >> > >>>>>>>>>>>>>>>> the
> > >> > >>>>>>>>>>>>>>>>>>> processing of a topology in AK Streams.  The KIP
> > is
> > >> > >>>>> here:
> > >> > >>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>
> > >> > >>>>>>>>
> > >> > >>>>>>>
> > >> > >>>>>
> > >> > >>>
> > >> > >>
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832
> > >> > >>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>> Thanks in advance for your feedback!
> > >> > >>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>> Cheers,
> > >> > >>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>> Jim
> > >> > >>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> --
> > >> > >>>>>>>>>>>>>>>>>> -- Guozhang
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>> --
> > >> > >>>>>>>>>>>> -- Guozhang
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>
> > >> > >>>>>>>>
> > >> > >>>>>>>>
> > >> > >>>>>>>> --
> > >> > >>>>>>>> -- Guozhang
> > >> > >>>>>>>>
> > >> > >>>>>>>
> > >> > >>>>>
> > >> > >>>>
> > >> > >>>
> > >> > >>
> > >> > >
> > >> >
> > >>
> >
>

Reply via email to