Re: real real-time beam

2019-12-06 Thread Aaron Dixon
> Aaron - do you have the information you need to implement your sink? My impression is that you have quite a good grasp of the issues even before you asked. Yes I do thank you. I really appreciate the thorough help from everyone Thank you On Wed, Dec 4, 2019 at 9:41 AM Jan Lukavský wrote: > Hi

Re: real real-time beam

2019-12-04 Thread Jan Lukavský
Hi Kenn, On 12/4/19 5:38 AM, Kenneth Knowles wrote: Jan - let's try to defrag the threads on your time sorting proposal. This thread may have useful ideas but I want to focus on helping Aaron in this thread. You can link to this thread from other threads or from a design doc. Does this seem OK

Re: real real-time beam

2019-12-03 Thread Kenneth Knowles
Jan - let's try to defrag the threads on your time sorting proposal. This thread may have useful ideas but I want to focus on helping Aaron in this thread. You can link to this thread from other threads or from a design doc. Does this seem OK to you? Aaron - do you have the information you need to

Re: real real-time beam

2019-11-27 Thread Jan Lukavský
> Trigger firings can have decreasing event timestamps w/ the minimum timestamp combiner*. I do think the issue at hand is best analyzed in terms of the explicit ordering on panes. And I do think we need to have an explicit guarantee or annotation strong enough to describe a correct-under-all-a

Re: real real-time beam

2019-11-26 Thread Kenneth Knowles
On Tue, Nov 26, 2019 at 1:00 AM Jan Lukavský wrote: > > I will not try to formalize this notion in this email. But I will note > that since it is universally assured, it would be zero cost and > significantly safer to formalize it and add an annotation noting it was > required. It has nothing to

Re: real real-time beam

2019-11-26 Thread Jan Lukavský
> I will not try to formalize this notion in this email. But I will note that since it is universally assured, it would be zero cost and significantly safer to formalize it and add an annotation noting it was required. It has nothing to do with event time ordering, only trigger firing ordering.

Re: real real-time beam

2019-11-25 Thread Kenneth Knowles
Hi Aaron, Another insightful observation. Whenever an aggregation (GBK / Combine per key) has a trigger firing, there is a per-key sequence number attached. It is included in metadata known as "PaneInfo" [1]. The value of PaneInfo.getIndex() is colloquially referred to as the "pane index". You ca

Re: real real-time beam

2019-11-25 Thread Pablo Estrada
The blog posts on stateful and timely computation with Beam should help clarify a lot about how to use state and timers to do this: https://beam.apache.org/blog/2017/02/13/stateful-processing.html https://beam.apache.org/blog/2017/08/28/timely-processing.html You'll see there how there's an implic

Re: real real-time beam

2019-11-25 Thread Steve Niemitz
If you have a pipeline that looks like Input -> GroupByKey -> ParDo, while it is not guaranteed, in practice the sink will observe the trigger firings in order (per key), since it'll be fused to the output of the GBK operation (in all runners I know of). There have been a couple threads about trig

Re: real real-time beam

2019-11-25 Thread Aaron Dixon
@Jan @Pablo Thank you @Pablo In this case it's a single global windowed Combine/perKey, triggered per element. Keys are few (client accounts) so they can live forever. It looks like just by virtue of using a stateful ParDo I could get this final execution to be "serialized" per key. (Then I could

Re: real real-time beam

2019-11-25 Thread Jan Lukavský
One addition, to make the list of options exhaustive, there is probably one more option  c) create a ParDo keyed by primary key of your sink, cache the last write in there and compare it locally, without the need to query the database It would still need some timer to clear values after wate

Re: real real-time beam

2019-11-25 Thread Jan Lukavský
Hi Aaron, maybe someone else will give another option, but if I understand correctly what you want to solve, then you essentially have to do either:  a) use the compare & swap mechanism in the sink you described  b) use a buffer to buffer elements inside the outputting ParDo and only output

Re: real real-time beam

2019-11-25 Thread Pablo Estrada
If I understand correctly - your pipeline has some kind of windowing, and on every trigger downstream of the combiner, the pipeline updates a cache with a single, non-windowed value. Is that correct? What are your keys for this pipeline? You could work this out with, as you noted, a timer that fir

real real-time beam

2019-11-25 Thread Aaron Dixon
Suppose I trigger a Combine per-element (in a high-volume stream) and use a ParDo as a sink. I assume there is no guarantee about the order that my ParDo will see these triggers, especially as it processes in parallel, anyway. That said, my sink writes to a db or cache and I would not like the ca