Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

Jan Lukavský Fri, 23 Feb 2024 00:36:53 -0800

For me it always helps to seek an analogy in our physical reality. Stream processing actually has quite a good analogy for both event-time and processing-time - the simplest model for this being relativity theory. Event-time is the time at which events occur _at distant locations_. Due to the finite and invariant speed of light (which is actually really involved in the explanation of why any stream processing is inevitably unordered) these events are observed (processed) at different times (processing time, different for different observers). It is perfectly possible for an observer to observe events at a rate that is higher than one second per second. This also happens in reality for observers that travel at relativistic speeds (which might be an analogy for fast - batch - (re)processing). Besides the invariant speed, there is also another invariant - the local clock (wall time) always ticks exactly at the rate of one second per second, no matter what. It is not possible to "move faster or slower" through (local) time.

In my understanding the reason why we do not put any guarantees or bounds on the delay of firing processing time timers is purely technical - the processing is (per key) single-threaded, thus any timer has to wait until any element processing finishes. This is only a consequence of a technical solution, not something fundamental.

Having said that, my point is that according to the above analogy, it should be perfectly fine to fire processing time timers in batch based on (local wall) time only. There should be no way of manipulating this local time (excluding tests). Watermarks should be affected the same way as any buffering in a state that would happen in a stateful DoFn (i.e. a set timer holds the output watermark). We should probably pay attention to looping timers, but it seems possible to define a valid stopping condition (input watermark at infinity).
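
As a rough sketch of that stopping condition for a looping timer (plain Python, not the Beam API; `run_looping_timer`, `input_watermarks` and `END_OF_TIME` are made-up names, and local time is simulated by a counter rather than read from a real clock):

```python
import math

END_OF_TIME = math.inf  # stands in for "input watermark at infinity"


def run_looping_timer(period, input_watermarks):
    """Re-arm a processing-time timer every `period` seconds of local time.

    `input_watermarks` is a hypothetical iterable giving the input watermark
    observed at each firing. The timer stops re-arming itself as soon as it
    observes END_OF_TIME. Returns the local times at which the timer fired.
    """
    fired_at = []
    now = 0.0
    for watermark in input_watermarks:
        now += period  # local time only ever moves forward
        fired_at.append(now)
        if watermark == END_OF_TIME:
            break  # stopping condition: no more input can arrive
    return fired_at
```

The timer fires one last time when it sees the watermark at infinity, which gives it a chance to flush, and then the loop ends instead of running forever.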


 Jan

On 2/22/24 19:50, Kenneth Knowles wrote:

Forking this thread.
The state of processing time timers in this mode of processing is not satisfactory and is discussed a lot but we should make everything explicit.
Currently, a state and timer DoFn has a number of logical watermarks: (apologies for fixed width not coming through in email lists). Treat timers as a back edge.
input --(A)----(C)--> ParDo(DoFn) ----(D)---> output
            ^                      |
            |--(B)-----------------|
                           timers
(A) Input Element watermark: this is the watermark that promises there is no incoming element with a timestamp earlier than it. Each input element's timestamp holds this watermark. Note that *event time timers firing is according to this watermark*. But a runner commits changes to this watermark *whenever it wants*, in a way that can be consistent. So the runner can absolutely process *all* the elements before advancing the watermark (A), and only afterwards start firing timers.
(B) Timer watermark: this is a watermark that promises no timer is set with an output timestamp earlier than it. Each timer that has an output timestamp holds this watermark. Note that timers can set new timers, indefinitely, so this may never reach infinity even in a drain scenario.
(C) (derived) total input watermark: this is a watermark that is the minimum of the two above, and ensures that all state for the DoFn for expired windows can be GCd after calling @OnWindowExpiration.
(D) output watermark: this is a promise that the DoFn will not output earlier than the watermark. It is held by the total input watermark.
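
The relationship between (A) through (D) can be sketched in a few lines of plain Python. This is a simplified model, not Beam code: `compute_watermarks` and `Watermarks` are made-up names, (A) is approximated as the minimum timestamp of pending input elements, and a timer without an output timestamp is represented as `None`.

```python
import math
from typing import NamedTuple


class Watermarks(NamedTuple):
    input_element: float  # (A)
    timer: float          # (B)
    total_input: float    # (C) = min(A, B)
    output_bound: float   # upper bound on (D), which is held by (C)


def compute_watermarks(pending_element_timestamps, timer_output_timestamps):
    """Derive the logical watermarks of one stateful DoFn (simplified)."""
    # (A): held by the earliest pending input element.
    a = min(pending_element_timestamps, default=math.inf)
    # (B): held only by timers that carry an output timestamp.
    b = min(
        (t for t in timer_output_timestamps if t is not None),
        default=math.inf,
    )
    # (C): the minimum of the two; (D) cannot advance past it.
    c = min(a, b)
    return Watermarks(a, b, c, c)
```

For example, `compute_watermarks([50], [7])` yields a total input watermark of 7: a single pending timer with output timestamp 7 holds back (C), and therefore (D), even though no input element is that early.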
So any timer, processing or not, holds the total input watermark which prevents window GC, hence the timer must be fired. You can set timers without a timestamp and they will not hold (B) hence not hold the total input / GC watermark (C). Then if a timer fires for an expired window, it is ignored. But in general a timer that sets an output timestamp is saying that it may produce output, so it *must* be fired, even in batch, for data integrity. There was a time before timers had output timestamps that we said that you *always* have to have an @OnWindowExpiration callback for data integrity, and processing time timers could not hold the watermark. That is changed now.
One main purpose of processing time timers in streaming is to be a "timeout" for data buffered in state, to eventually flush. In this case the output timestamp should be the minimum of the elements in state (or equivalent). In batch, of course, this kind of timer is not relevant and we should definitely not wait for it, because the goal is to just get through all the data. We can justify this by saying that the worker really has no business having any idea what time it really is, and the runner can just run the clock at whatever speed it wants.
Another purpose, brought up on the Throttle thread, is to wait or backoff. In this case it would be desired for the timer to actually cause batch processing to pause and wait. This kind of behavior has not been explored much. Notably the runner can absolutely process all elements first, then start to fire any enqueued processing time timers. In the same way that state in batch can just be in memory, this *could* just be a call to sleep(). It all seems a bit sketchy so I'd love clearer opinions.
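
To make the "process all elements first, then fire timers with sleep()" idea concrete, here is a minimal sketch in plain Python. It is not how any Beam runner is implemented; `run_batch`, `process` and `set_timer` are hypothetical names, and the clock and sleep functions are injectable so that a test can substitute a fake clock.

```python
import heapq
import time


def run_batch(elements, process, clock=time.monotonic, sleep=time.sleep):
    """Process every element, then fire processing-time timers in order.

    `process(element, set_timer)` handles one element and may call
    `set_timer(delay_seconds, callback)`. Each callback receives `set_timer`
    as well, so a timer can set further timers.
    """
    timers = []  # min-heap of (fire_time, sequence_number, callback)
    sequence = 0

    def set_timer(delay_seconds, callback):
        nonlocal sequence
        heapq.heappush(timers, (clock() + delay_seconds, sequence, callback))
        sequence += 1  # unique tie-breaker, so callbacks are never compared

    # Phase 1: get through all the data without waiting for any timer.
    for element in elements:
        process(element, set_timer)

    # Phase 2: fire the enqueued timers, sleeping until each one is due.
    while timers:
        fire_time, _, callback = heapq.heappop(timers)
        wait = fire_time - clock()
        if wait > 0:
            sleep(wait)
        callback(set_timer)
```

With the default `clock` and `sleep` this really pauses the batch job, which is the behavior a backoff use case wants. Passing a fake pair lets the clock run at whatever speed the caller chooses, which is the behavior the "timeout" use case wants.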
These two are both operational effects - as you would expect for processing time timers - and they seem to be in conflict. Maybe they just need different features?
I'd love to hear some more uses of processing time timers from the community.
Kenn
