Yes, if you block that bundle will not progress. Generally streaming
pipelines are processing many (hundreds) of bundles in parallel (e.g. one
per key on dataflow), but at the source there may not be as much available
parallelism and it's better to return than wait if there are no elements
left ot read.

On Fri, Sep 15, 2023 at 10:42 AM Ruben Vargas <ruben.var...@metova.com>
wrote:

> Hello thanks for the reply
>
> I was digging into the UnboundedReader interface, and I observed that some
> implementations block the entire progress of the other inputs when they get
> blocked into the advance() method, (probably waiting if there are new
> elements or not), an example of this is the AWS SQSIO  implementation. if I
> return true or false immediately the progress of the main input continues,
> but If I wait for results on the advance() method, all of other inputs get
> blocked
>
>
> Is that assumption correct?
>
>
>
> El El vie, 15 de septiembre de 2023 a la(s) 10:59, Robert Bradshaw via
> user <user@beam.apache.org> escribió:
>
>> Beam will block on side inputs until at least one value is available (or
>> the watermark has advanced such that we can be sure one will never become
>> available, which doesn't really apply to the global window case).
>> After that, workers generally cache the side input value (for performance
>> reasons) but may periodically re-fetch it (the exact cadence probably
>> depends on the runner implementation).
>>
>> On Tue, Sep 12, 2023 at 10:34 PM Ruben Vargas <ruben.var...@metova.com>
>> wrote:
>>
>>> Hello Everyone
>>>
>>> I have a question, I have on my pipeline one side input that
>>> fetches some configurations from an API endpoint each 30 seconds, my
>>> question is this.
>>>
>>>
>>> I have something similar to what is showed in the side input patterns
>>> documentation
>>>
>>>  PCollectionView<Map<String, String>> map =
>>>         p.apply(GenerateSequence.from(0).withRate(1,
>>> Duration.standardSeconds(5L)))
>>>             .apply(
>>>                 ParDo.of(
>>>                     new DoFn<Long, Map<String, String>>() {
>>>
>>>                       @ProcessElement
>>>                       public void process(
>>>                           @Element Long input,
>>>                           @Timestamp Instant timestamp,
>>>                           OutputReceiver<Map<String, String>> o) {
>>>                         call HTTP endpoint here!!
>>>                       }
>>>                     }))
>>>             .apply(
>>>                 Window.<Map<String, String>>into(new GlobalWindows())
>>>
>>> .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()))
>>>                     .discardingFiredPanes())
>>>             .apply(Latest.globally())
>>>             .apply(View.asSingleton());
>>>
>>> What happens if for example the HTTP endpoint takes time to respond due
>>> some network issues and/or the amount of data. Is this gonna introduce
>>> delays on my main pipeline? Is the main pipeline blocked until the pardo
>>> that processes the side input ends?
>>>
>>> I don't care too much about the consistency here, I mean if the
>>> configuration changed in the Time T1 I don't care if some registries with
>>> T2 timestamp are processed with the configuration version of T1.
>>>
>>>
>>> Regards.
>>>
>>>

Reply via email to