Hey Joey, This is Ganesh. I am building a new portable beam runner in rust.
I guess we are working in same territory, as of now the runner can accept
pipeline from client via job service and run greedy fusion and create
execution graph. I am currently working on control and data channel
orchestration with runner and worker(SDK harness.  Lot of work to do.

With your question, based on my understanding your trying send multiple
batches of Elements to worker once and trying to check if worker is ready
to accept it.

I am not at your phase yet so mine may not be the best approach. I guess
you are trying to avoid the setup cost for every batch. One way is to pool
elements and make the batch size larger but the runner might spend some
idle time in collecting the elements and would introduce some latency. 2nd
is to spin up multiple workers and send each batch to each worker so that
they run the batch in parallel and save some time.

Thanks,
Ganesh.

On Fri, 10 Apr, 2026, 3:04 am Joey Tran, <[email protected]> wrote:

> Hey all,
>
> I'm working on extending my pipeline runner so that multiple batches of
> inputs are streamed into a single bundle (to amortize bundle setup time)
> rather than just sending a single batch of inputs into the data channel and
> then closing the data channel.
>
> I'm using the ProcessBundleProgressRequest[1] to poll my workers to see if
> they're idle and ready for more inputs using the `consuming_received_data`
> field, but I noticed that `consuming_received_data` doesn't actually imply
> whether or not the worker is busy or ready for work. Specifically, while
> the worker is setting up (e.g. running `setup_bundle`),
> `consuming_received_data` is set to False even though the worker is not
> actually ready for work.
>
> This makes it a bit awkward for runners to know whether or not it's safe
> to send work to a worker. If `setup_bundle` takes a very long time, then a
> runner might repeatedly send more and more work if it uses
> `consuming_received_data` as an indicator of idleness. Am I misusing
> `consuming_received_data`? Is there another more reliable way to detect
> worker idleness?
>
> cheers,
> Joey
>
>
> https://github.com/apache/beam/blob/357b7206f82f788b55732247c9096527f92adf4f/model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto#L476
>

Reply via email to