Re: [DESIGN] Beam Triggered side input specification

2023-03-31 Thread Kenneth Knowles
Really helpful discussion. Sounds like we pretty much agree that having a clearer spec will be good? I'm augmenting ViewTest to have enough tests to exercise the proposed spec a bit more. I'm really largely focused on singleton, which I pretty much assume to be the output of a combiner. So in bat

Re: [DESIGN] Beam Triggered side input specification

2023-03-29 Thread Jan Lukavský
> Well yes it was (though as mentioned before, the fact that none of these designs were even written into the spec is a problem), though in some ways not a great one. The only global synchronization method we had was the watermark/end of window, so if the source PCollection was triggered by som

Re: [DESIGN] Beam Triggered side input specification

2023-03-28 Thread Reuven Lax via dev
On Tue, Mar 28, 2023 at 12:39 AM Jan Lukavský wrote:
> On 3/27/23 19:44, Reuven Lax via dev wrote:
> > On Mon, Mar 27, 2023 at 5:43 AM Jan Lukavský wrote:
> > > Hi,
> > >
> > > I'd like to clarify my understanding. Side inputs generally perform a left (outer) join, LHS side is the main input,

Re: [DESIGN] Beam Triggered side input specification

2023-03-28 Thread Jan Lukavský
> Makes sense, is this a design decision? I can imagine that waiting for side input watermark unconditionally adds latency, on the other hand an "unexpected" non-deterministic behavior can confuse users. This type of non-determinism after pipeline failure and recovery is even the most hard to d

Re: [DESIGN] Beam Triggered side input specification

2023-03-28 Thread Jan Lukavský
On 3/27/23 19:44, Reuven Lax via dev wrote:
> On Mon, Mar 27, 2023 at 5:43 AM Jan Lukavský wrote:
> > Hi,
> >
> > I'd like to clarify my understanding. Side inputs generally perform a left (outer) join, LHS side is the main input, RHS is the side input.
>
> Not completely - it's more of wha

Re: [DESIGN] Beam Triggered side input specification

2023-03-27 Thread Reuven Lax via dev
On Mon, Mar 27, 2023 at 5:43 AM Jan Lukavský wrote:
> Hi,
>
> I'd like to clarify my understanding. Side inputs generally perform a left (outer) join, LHS side is the main input, RHS is the side input.

Not completely - it's more of what I would call a nested-loop join. I.e. if the side input
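
The preview above is cut off, so the following is only one reading of the distinction between a keyed left (outer) join and a "nested-loop join". It is a minimal plain-Python sketch, not Beam code: the function names `keyed_left_outer_join` and `nested_loop_with_side_input` are hypothetical, and windowing, triggering, and watermarks are deliberately ignored. In the nested-loop reading, each main-input element is handed the whole side input and the user function decides what to do with it; no key matching is done on the user's behalf.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def keyed_left_outer_join(
    main_input: Iterable[Tuple[str, Any]],
    side_input: Iterable[Tuple[str, Any]],
) -> List[Tuple[str, Any, Optional[Any]]]:
    """Classic left outer join: match rows by key, emit None when no match."""
    side_by_key: Dict[str, List[Any]] = {}
    for key, value in side_input:
        side_by_key.setdefault(key, []).append(value)

    joined: List[Tuple[str, Any, Optional[Any]]] = []
    for key, value in main_input:
        matches = side_by_key.get(key)
        if matches:
            joined.extend((key, value, match) for match in matches)
        else:
            joined.append((key, value, None))  # unmatched LHS row is kept
    return joined


def nested_loop_with_side_input(
    main_input: Iterable[Any],
    side_input: Iterable[Any],
    fn: Callable[[Any, List[Any]], Iterable[Any]],
) -> List[Any]:
    """Nested-loop style: every main element sees the entire side input."""
    side_view = list(side_input)  # stand-in for a materialized side input view
    results: List[Any] = []
    for element in main_input:
        # The user function receives the whole view; any matching or
        # filtering logic is up to the function, not the framework.
        results.extend(fn(element, side_view))
    return results


if __name__ == "__main__":
    main = [("a", 1), ("b", 2)]
    side = [("a", 10), ("a", 11)]
    print(keyed_left_outer_join(main, side))
    print(nested_loop_with_side_input(
        main, side, lambda e, view: [(e, s) for s in view]))
```

With a pairing function like the lambda in the usage lines, the nested-loop form yields the full cross product, whereas the keyed join only pairs rows that share a key and pads unmatched main-input rows with `None`.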

Re: [DESIGN] Beam Triggered side input specification

2023-03-27 Thread Jan Lukavský
Hi, I'd like to clarify my understanding. Side inputs generally perform a left (outer) join, LHS side is the main input, RHS is the side input. Doing streaming left join requires watermark synchronization, thus elements from the main input are buffered until main_input_timestamp > side_input_

[DESIGN] Beam Triggered side input specification

2023-03-23 Thread Kenneth Knowles
Hi all, I had a great chat with +Reza Rokni and +Reuven Lax yesterday about some inconsistencies in side input behavior, both before and after portability was introduced. I wrote up my thoughts about how we should specify the semantics and implement them: https://s.apache.org/beam-triggered-si