Apologies for the last mail and the noise; it was meant to be a direct reply.

On Fri, 10 Apr, 2026, 8:10 am Ganesh Sivakumar, <[email protected]> wrote:
> Hey Joey,
>
> This is Ganesh. I am building a new portable Beam runner in Rust. I guess
> we are working in the same territory. As of now, the runner can accept a
> pipeline from a client via the job service, run greedy fusion, and create
> an execution graph. I am currently working on control and data channel
> orchestration between the runner and the worker (SDK harness). Lots of
> work to do.
>
> Regarding your question: based on my understanding, you're trying to send
> multiple batches of elements to a worker at once and to check whether the
> worker is ready to accept them.
>
> I am not at your phase yet, so mine may not be the best approach. I guess
> you are trying to avoid the setup cost for every batch. One way is to pool
> elements and make the batch size larger, but the runner might spend some
> idle time collecting the elements, which would introduce some latency. A
> second way is to spin up multiple workers and send one batch to each
> worker so that they run the batches in parallel and save some time.
>
> Thanks,
> Ganesh.
>
> On Fri, 10 Apr, 2026, 3:04 am Joey Tran, <[email protected]> wrote:
>
>> Hey all,
>>
>> I'm working on extending my pipeline runner so that multiple batches of
>> inputs are streamed into a single bundle (to amortize bundle setup time)
>> rather than just sending a single batch of inputs into the data channel
>> and then closing the data channel.
>>
>> I'm using the ProcessBundleProgressRequest[1] to poll my workers to see
>> if they're idle and ready for more inputs using the
>> `consuming_received_data` field, but I noticed that
>> `consuming_received_data` doesn't actually indicate whether the worker
>> is busy or ready for work. Specifically, while the worker is setting up
>> (e.g. running `setup_bundle`), `consuming_received_data` is set to False
>> even though the worker is not actually ready for work.
>>
>> This makes it a bit awkward for runners to know whether or not it's safe
>> to send work to a worker. If `setup_bundle` takes a very long time, then
>> a runner might repeatedly send more and more work if it uses
>> `consuming_received_data` as an indicator of idleness. Am I misusing
>> `consuming_received_data`? Is there another, more reliable way to detect
>> worker idleness?
>>
>> cheers,
>> Joey
>>
>> [1] https://github.com/apache/beam/blob/357b7206f82f788b55732247c9096527f92adf4f/model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto#L476
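
To make the failure mode Joey describes concrete, here is a toy sketch in Python. It does not use any Beam API: `ToyWorker`, `poll_progress`, `setup_polls_remaining`, and `naive_runner` are hypothetical names, and the worker model (one batch consumed per poll, setup lasting a fixed number of polls) is an assumption made only for illustration. It shows how a runner that treats a False `consuming_received_data` as "idle" can pile up batches while setup is still running.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorker:
    """Hypothetical stand-in for an SDK harness; not a Beam API."""

    setup_polls_remaining: int  # number of polls during which setup is still running
    queued_batches: list = field(default_factory=list)

    def poll_progress(self) -> bool:
        """Return a toy `consuming_received_data` value and advance one time step."""
        if self.setup_polls_remaining > 0:
            self.setup_polls_remaining -= 1
            return False  # still setting up: not consuming, but not ready either
        if self.queued_batches:
            self.queued_batches.pop(0)  # assumption: one batch consumed per poll
            return True
        return False  # genuinely idle


def naive_runner(worker: ToyWorker, polls: int) -> int:
    """Send a batch whenever the flag is False; return the peak backlog seen."""
    peak = 0
    for i in range(polls):
        if not worker.poll_progress():
            worker.queued_batches.append(f"batch-{i}")
        peak = max(peak, len(worker.queued_batches))
    return peak


slow_setup_backlog = naive_runner(ToyWorker(setup_polls_remaining=5), polls=8)
fast_setup_backlog = naive_runner(ToyWorker(setup_polls_remaining=0), polls=8)
print(slow_setup_backlog, fast_setup_backlog)
```

In this toy model the backlog grows with the length of setup, because the flag alone cannot distinguish "setting up" from "idle". It says nothing about how a real SDK harness schedules work.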
