The pipeline sees the messages in DoFns before the windowing. However, any
messages that were published before the pipeline started never end up in
windows. As near as I can tell, windows cannot contain data older than the
processing time at which the pipeline was started.
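One thing worth checking in the original trigger: `Repeat` wraps only the processing-time subtrigger, so once `AfterCount(1_000)` fires, the composite trigger is finished and never fires again. A sketch of the variant where the whole disjunction repeats is below. This is untested; the import paths, the `s`/`col` parameters, and the `windowForCatchUp` name are my assumptions, and the trigger constructors are used the same way as in the quoted snippet:

```go
package main

import (
	"time"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/window/trigger"
)

// windowForCatchUp (hypothetical helper) windows col in the global window,
// firing on whichever comes first -- a minute of processing time or 1,000
// elements -- and, unlike the original snippet, repeats the WHOLE
// disjunction rather than just the processing-time subtrigger.
func windowForCatchUp(s beam.Scope, col beam.PCollection) beam.PCollection {
	trg := trigger.Repeat(trigger.AfterAny([]trigger.Trigger{
		trigger.AfterProcessingTime().PlusDelay(time.Minute),
		trigger.AfterCount(1_000),
	}))
	return beam.WindowInto(s, window.NewGlobalWindows(), col,
		beam.Trigger(trg), beam.PanesDiscard())
}
```

Note this only changes when panes fire; it does not by itself explain whether pre-start messages reach the windows at all.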

On Wed, May 21, 2025 at 12:52 PM Reuven Lax via user <user@beam.apache.org>
wrote:

> If you create a Pub/Sub subscription before you start the Beam pipeline,
> the subscription will capture all of those messages (as long as you start
> the Beam pipeline within 7 days). You can then start the Beam pipeline
> against that subscription, which should do what you want.
>
> On Fri, May 16, 2025 at 3:37 PM Jonathan Hope <
> jonathan.douglas.h...@gmail.com> wrote:
>
>> I have a use case where I would like to start publishing messages to a
>> PubSub topic in advance of starting a streaming pipeline that consumes
>> those messages. When the pipeline starts I would like it to "catch up"
>> which is to say to read all of the messages that were published even before
>> it started. In this case event time doesn't matter to me. All that matters
>> is that I see every message once.
>>
>> My initial thought was something like this:
>>
>> ```
>> trg := trigger.AfterAny([]trigger.Trigger{
>>   trigger.Repeat(trigger.AfterProcessingTime().PlusDelay(time.Minute * 1)),
>>   trigger.AfterCount(1_000),
>> })
>>
>> beam.WindowInto(s, window.NewGlobalWindows(), x, beam.Trigger(trg), beam.PanesDiscard())
>> ```
>>
>> Basically, process a minute's worth OR 1,000 messages. This definitely
>> doesn't work, though. Later on I call `databaseio.Write`, and it never
>> writes anything. However, if I publish a message _while_ the pipeline is
>> running, `databaseio.Write` seems to pick it up. I will freely admit I
>> don't have much of an idea what I'm doing here.
>>
>
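For reference, Reuven's subscription-first approach could be sketched with the gcloud CLI roughly as follows (resource names here are hypothetical placeholders; a subscription only retains messages published after it was created, and unacknowledged messages are retained for up to 7 days by default):

```shell
# Create the topic and, crucially, the subscription BEFORE publishing anything.
gcloud pubsub topics create my-topic
gcloud pubsub subscriptions create my-sub --topic=my-topic

# Publish messages at any point after the subscription exists; they are
# retained on my-sub until acknowledged (up to the retention limit).
gcloud pubsub topics publish my-topic --message="hello"

# Later, start the Beam pipeline reading from the subscription, e.g.
# projects/<your-project>/subscriptions/my-sub, and it will receive the backlog.
```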
