Thanks for your responses.

I have a follow-up question, when you say

> elementCountAtLeast means that the runner can buffer as many as it wants
> and can decide to offer a low latency pipeline by triggering often or
> better throughput through the use of buffering.


Does it mean, I as a pipeline developer cannot control how often the runner
triggers?

Kishore

On Mon, Oct 12, 2020 at 2:15 PM Kenneth Knowles <k...@apache.org> wrote:

> Another thing to keep in mind - apologies if it was already clear:
> triggering governs aggregation (GBK / Combine). It does not have any effect
> on stateful DoFn.
>
> On Mon, Oct 12, 2020 at 9:24 AM Luke Cwik <lc...@google.com> wrote:
>
>> The default trigger will only fire when the global window closes which
>> does happen with sources whose watermark goes > GlobalWindow.MAX_TIMESTAMP
>> or during pipeline drain with partial results in streaming. Bounded sources
>> commonly have their watermark advance to the end of time when they complete
>> and some unbounded sources can stop producing output if they detect the end.
>>
>> Parallelization for stateful DoFns are per key and window.
>> Parallelization for GBK is per key and window pane. Note that
>> elementCountAtLeast means that the runner can buffer as many as it wants
>> and can decide to offer a low latency pipeline by triggering often or
>> better throughput through the use of buffering.
>>
>>
>>
>> On Mon, Oct 12, 2020 at 8:22 AM KV 59 <kvajjal...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I'm building a pipeline to process events as they come and do not really
>>> care about the event time and watermark. I'm more interested in not
>>> discarding the events and reducing the latency. The downstream pipeline has
>>> a stateful DoFn. I understand that the default window strategy is Global
>>> Windows,. I did not completely understand the default trigger as per
>>>
>>> https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/transforms/windowing/DefaultTrigger.html
>>> it says Repeatedly.forever(AfterWatermark.pastEndOfWindow()), In case
>>> of global window how does this work (there is no end of window)?.
>>>
>>> My source is Google PubSub and pipeline is running on Dataflow runner I
>>> have defined my window transform as below
>>>
>>> input.apply(TRANSFORM_NAME, Window.<T>into(new GlobalWindows())
>>>> .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
>>>> .withAllowedLateness(Duration.standardDays(1)).discardingFiredPanes())
>>>
>>>
>>> A couple of questions
>>>
>>>    1. Is triggering after each element inefficient in terms of
>>>    persistence(serialization) after each element and also parallelism
>>>    triggering after each looks like a serial execution?
>>>    2. How does Dataflow parallelize in such cases of triggers?
>>>
>>>
>>> Thanks and appreciate the responses.
>>>
>>> Kishore
>>>
>>

Reply via email to