> Aaron - do you have the information you need to implement your sink? My
> impression is that you have quite a good grasp of the issues even before
> you asked.
Yes, I do, thank you. I really appreciate the thorough help from everyone.
Thank you
On Wed, Dec 4, 2019 at 9:41 AM Jan Lukavský wrote:
> Hi
Hi Kenn,
On 12/4/19 5:38 AM, Kenneth Knowles wrote:
> Jan - let's try to defrag the threads on your time sorting proposal.
> This thread may have useful ideas but I want to focus on helping Aaron
> in this thread. You can link to this thread from other threads or from
> a design doc. Does this seem OK
Jan - let's try to defrag the threads on your time sorting proposal. This
thread may have useful ideas but I want to focus on helping Aaron in this
thread. You can link to this thread from other threads or from a design
doc. Does this seem OK to you?
Aaron - do you have the information you need to
> Trigger firings can have decreasing event timestamps w/ the minimum
timestamp combiner*. I do think the issue at hand is best analyzed in
terms of the explicit ordering on panes. And I do think we need to have
an explicit guarantee or annotation strong enough to describe a
correct-under-all-a
On Tue, Nov 26, 2019 at 1:00 AM Jan Lukavský wrote:
> > I will not try to formalize this notion in this email. But I will note
> that since it is universally assured, it would be zero cost and
> significantly safer to formalize it and add an annotation noting it was
> required. It has nothing to
> I will not try to formalize this notion in this email. But I will
note that since it is universally assured, it would be zero cost and
significantly safer to formalize it and add an annotation noting it was
required. It has nothing to do with event time ordering, only trigger
firing ordering.
Hi Aaron,
Another insightful observation.
Whenever an aggregation (GBK / Combine per key) has a trigger firing, there
is a per-key sequence number attached. It is included in metadata known as
"PaneInfo" [1]. The value of PaneInfo.getIndex() is colloquially referred
to as the "pane index". You ca
The blog posts on stateful and timely computation with Beam should help
clarify a lot about how to use state and timers to do this:
https://beam.apache.org/blog/2017/02/13/stateful-processing.html
https://beam.apache.org/blog/2017/08/28/timely-processing.html
You'll see there how there's an implic
If you have a pipeline that looks like Input -> GroupByKey -> ParDo, while
it is not guaranteed, in practice the sink will observe the trigger firings
in order (per key), since it'll be fused to the output of the GBK operation
(in all runners I know of).
There have been a couple threads about trig
@Jan @Pablo Thank you
@Pablo In this case it's a single global windowed Combine/perKey, triggered
per element. Keys are few (client accounts) so they can live forever.
It looks like just by virtue of using a stateful ParDo I could get this
final execution to be "serialized" per key. (Then I could
One addition: to make the list of options exhaustive, there is probably
one more option:
c) create a ParDo keyed by primary key of your sink, cache the last
write in there and compare it locally, without the need to query the
database
It would still need some timer to clear values after wate
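
Option (c) above can be sketched in plain Python, outside of Beam. This is only an illustration under assumptions: the class `LastWriteCache`, its `offer` method, and the `writes` list standing in for the database are all hypothetical names, and the per-key sequence number is assumed to be something like the pane index mentioned elsewhere in this thread. In a real pipeline the cache would live in the per-key state of a stateful ParDo, and the timer-based cleanup mentioned above is not shown.

```python
from typing import Dict, List, Tuple


class LastWriteCache:
    """Hypothetical stand-in for the per-key state of a keyed ParDo."""

    def __init__(self) -> None:
        # Highest sequence number already written, per sink primary key.
        self._last_seq: Dict[str, int] = {}
        # Stand-in for the external database or cache being written to.
        self.writes: List[Tuple[str, int, int]] = []

    def offer(self, key: str, seq: int, value: int) -> bool:
        """Write only if seq is newer than the last write for this key."""
        last = self._last_seq.get(key, -1)
        if seq <= last:
            # Stale firing: a newer value was already written, so skip it.
            return False
        self._last_seq[key] = seq
        self.writes.append((key, seq, value))
        return True


cache = LastWriteCache()
# Firings for one key arriving out of order: seq 0, then 2, then a late 1.
results = [cache.offer("account-1", s, v) for s, v in [(0, 10), (2, 30), (1, 20)]]
```

The comparison happens against the locally cached sequence number, so the late firing is dropped without querying the database.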
Hi Aaron,
maybe someone else will give another option, but if I understand
correctly what you want to solve, then you essentially have to do either:
a) use the compare & swap mechanism in the sink you described
b) use a buffer to buffer elements inside the outputting ParDo and
only output
If I understand correctly - your pipeline has some kind of windowing, and
on every trigger downstream of the combiner, the pipeline updates a cache
with a single, non-windowed value. Is that correct?
What are your keys for this pipeline? You could work this out with, as you
noted, a timer that fir
Suppose I trigger a Combine per-element (in a high-volume stream) and use a
ParDo as a sink.
I assume there is no guarantee about the order that my ParDo will see these
triggers, especially as it processes in parallel, anyway.
That said, my sink writes to a db or cache and I would not like the ca