On Mon, Oct 12, 2020 at 9:23 PM KV 59 <kvajjal...@gmail.com> wrote:

> Thanks for your responses.
>
> I have a follow-up question, when you say
>
>> elementCountAtLeast means that the runner can buffer as many as it wants
>> and can decide to offer a low latency pipeline by triggering often or
>> better throughput through the use of buffering.
>
>
> Does it mean, I as a pipeline developer cannot control how often the
> runner triggers?
>
(Please correct if I am wrong):

Yes, as I recall when trigger conditions meet, it allows the runner to fire
the trigger, but the runner can choose when to fire.



>
> Kishore
>
> On Mon, Oct 12, 2020 at 2:15 PM Kenneth Knowles <k...@apache.org> wrote:
>
>> Another thing to keep in mind - apologies if it was already clear:
>> triggering governs aggregation (GBK / Combine). It does not have any effect
>> on stateful DoFn.
>>
>> On Mon, Oct 12, 2020 at 9:24 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> The default trigger will only fire when the global window closes which
>>> does happen with sources whose watermark goes > GlobalWindow.MAX_TIMESTAMP
>>> or during pipeline drain with partial results in streaming. Bounded sources
>>> commonly have their watermark advance to the end of time when they complete
>>> and some unbounded sources can stop producing output if they detect the end.
>>>
>>> Parallelization for stateful DoFns are per key and window.
>>> Parallelization for GBK is per key and window pane. Note that
>>> elementCountAtLeast means that the runner can buffer as many as it wants
>>> and can decide to offer a low latency pipeline by triggering often or
>>> better throughput through the use of buffering.
>>>
>>>
>>>
>>> On Mon, Oct 12, 2020 at 8:22 AM KV 59 <kvajjal...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I'm building a pipeline to process events as they come and do not
>>>> really care about the event time and watermark. I'm more interested in not
>>>> discarding the events and reducing the latency. The downstream pipeline has
>>>> a stateful DoFn. I understand that the default window strategy is Global
>>>> Windows,. I did not completely understand the default trigger as per
>>>>
>>>> https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/transforms/windowing/DefaultTrigger.html
>>>> it says Repeatedly.forever(AfterWatermark.pastEndOfWindow()), In case
>>>> of global window how does this work (there is no end of window)?.
>>>>
>>>> My source is Google PubSub and pipeline is running on Dataflow runner I
>>>> have defined my window transform as below
>>>>
>>>> input.apply(TRANSFORM_NAME, Window.<T>into(new GlobalWindows())
>>>>> .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
>>>>> .withAllowedLateness(Duration.standardDays(1)).discardingFiredPanes())
>>>>
>>>>
>>>> A couple of questions
>>>>
>>>>    1. Is triggering after each element inefficient in terms of
>>>>    persistence(serialization) after each element and also parallelism
>>>>    triggering after each looks like a serial execution?
>>>>    2. How does Dataflow parallelize in such cases of triggers?
>>>>
>>>>
>>>> Thanks and appreciate the responses.
>>>>
>>>> Kishore
>>>>
>>>

Reply via email to