Hi All,

I'm building a pipeline that processes events as they arrive. I do not
really care about event time or watermarks; I'm more interested in not
discarding events and in keeping latency low. The downstream pipeline has
a stateful DoFn. I understand that the default windowing strategy is
GlobalWindows, but I did not completely understand the default trigger.
According to
https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/transforms/windowing/DefaultTrigger.html
it is Repeatedly.forever(AfterWatermark.pastEndOfWindow()). How does this
work for the global window, which has no end of window?

My source is Google Pub/Sub and the pipeline runs on the Dataflow runner.
I have defined my window transform as follows:

input.apply(TRANSFORM_NAME, Window.<T>into(new GlobalWindows())
    .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
    .withAllowedLateness(Duration.standardDays(1))
    .discardingFiredPanes())


A couple of questions:

   1. Is triggering after every element inefficient, both in terms of
   persistence (serialization) after each element and in terms of
   parallelism? Triggering after each element looks like serial execution.
   2. How does Dataflow parallelize the work when such a trigger is used?


Thanks, I appreciate any responses.

Kishore
