The pipeline sees the messages in DoFns before the windowing step. However, any messages published before the pipeline starts never end up in windows. As near as I can tell, windows cannot contain data older than the processing time at which the pipeline was started.
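For reference, here is a stripped-down sketch of how I'm checking this. It is not my real pipeline: `msgs` stands in for the PCollection coming out of pubsubio, the constant key and `AfterCount(1)` trigger are only there to surface panes quickly, and everything is logged rather than written anywhere:

```
package debugwin

import (
	"context"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window/trigger"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/log"
)

func init() {
	beam.RegisterFunction(logBeforeFn)
	beam.RegisterFunction(addKeyFn)
	beam.RegisterFunction(logPaneFn)
}

// logBeforeFn runs ahead of WindowInto; it fires for every message,
// including ones published before the pipeline started.
func logBeforeFn(ctx context.Context, msg []byte) []byte {
	log.Infof(ctx, "before windowing: %s", string(msg))
	return msg
}

// addKeyFn attaches a constant key so GroupByKey aggregates the whole stream.
func addKeyFn(msg []byte) (int, []byte) {
	return 0, msg
}

// logPaneFn runs after GroupByKey, so it only sees data the trigger
// actually emitted. Pre-start messages never show up here.
func logPaneFn(ctx context.Context, key int, iter func(*[]byte) bool) {
	var msg []byte
	for iter(&msg) {
		log.Infof(ctx, "after windowing: %s", string(msg))
	}
}

// debugWindowing wires the two probes around the same global-window setup.
func debugWindowing(s beam.Scope, msgs beam.PCollection) {
	pre := beam.ParDo(s, logBeforeFn, msgs)
	keyed := beam.ParDo(s, addKeyFn, pre)
	windowed := beam.WindowInto(s, window.NewGlobalWindows(), keyed,
		beam.Trigger(trigger.Repeat(trigger.AfterCount(1))),
		beam.PanesDiscard())
	beam.ParDo0(s, logPaneFn, beam.GroupByKey(s, windowed))
}
```

Messages published before startup show up in the "before windowing" logs but never in the "after windowing" ones, which is what led me to the conclusion above.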
On Wed, May 21, 2025 at 12:52 PM Reuven Lax via user <user@beam.apache.org> wrote:

> If you create a Pub/Sub subscription before you start the Beam pipeline,
> the subscription will capture all of those messages (as long as you start
> the Beam pipeline within 7 days). You can then start the Beam pipeline
> against that subscription, which should do what you want.
>
> On Fri, May 16, 2025 at 3:37 PM Jonathan Hope <
> jonathan.douglas.h...@gmail.com> wrote:
>
>> I have a use case where I would like to start publishing messages to a
>> PubSub topic in advance of starting a streaming pipeline that consumes
>> those messages. When the pipeline starts I would like it to "catch up",
>> which is to say read all of the messages that were published even before
>> it started. In this case event time doesn't matter to me. All that matters
>> is that I see every message once.
>>
>> My initial thought was something like this:
>>
>> ```
>> trg := trigger.AfterAny([]trigger.Trigger{
>> 	trigger.Repeat(trigger.AfterProcessingTime().PlusDelay(time.Minute * 1)),
>> 	trigger.AfterCount(1_000),
>> })
>>
>> beam.WindowInto(s, window.NewGlobalWindows(), x, beam.Trigger(trg),
>> 	beam.PanesDiscard())
>> ```
>>
>> Basically process a minute's worth OR 1000 messages. This definitely
>> doesn't work though. Later on I call `databaseio.Write`, and that never
>> does anything. However, if I publish a message _while_ the pipeline is
>> running, `databaseio.Write` seems to pick it up. I will freely admit I
>> don't have much of an idea what I'm doing here.
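For the archives, here is roughly what Reuven's suggestion looks like end to end. This is an untested sketch, not my actual code: it assumes the cloud.google.com/go/pubsub client for the one-time subscription setup and the Go SDK's `pubsubio.Read` with its `ReadOptions.Subscription` field, and `my-project`/`my-topic`/`my-sub` are placeholder names:

```
package main

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/pubsubio"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
)

const (
	projectID = "my-project" // placeholder
	topicID   = "my-topic"   // placeholder
	subID     = "my-sub"     // placeholder
)

func main() {
	ctx := context.Background()
	beam.Init()

	// Step 1: create the subscription before the pipeline exists. From
	// this point on Pub/Sub retains messages for the subscription (by
	// default up to 7 days) even though nothing is consuming them yet.
	// In practice this would be a one-time setup step, separate from
	// the pipeline binary.
	client, err := pubsub.NewClient(ctx, projectID)
	if err != nil {
		log.Fatalf("pubsub client: %v", err)
	}
	if _, err := client.CreateSubscription(ctx, subID, pubsub.SubscriptionConfig{
		Topic: client.Topic(topicID),
	}); err != nil {
		log.Fatalf("create subscription: %v", err)
	}
	client.Close()

	// (Publishers can safely run here, before the pipeline starts.)

	// Step 2: read from the pre-created subscription. The pipeline first
	// drains the backlog, then keeps up with newly published messages.
	p := beam.NewPipeline()
	s := p.Root()
	msgs := pubsubio.Read(s, projectID, topicID, &pubsubio.ReadOptions{
		Subscription: subID,
	})
	_ = msgs // windowing, databaseio.Write, etc. would go here

	if err := beamx.Run(ctx, p); err != nil {
		log.Fatalf("pipeline failed: %v", err)
	}
}
```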