Thanks, good to know. Ultimately I still want to lobby for a way to
throttle bundles based on progress made further down the pipeline but I
realize that might involve major architectural changes.
Sometimes I'm forced to cancel a streaming pipeline and I'm unable to drain
it, so that can present a
Do individual stages of a beam job exhibit backpressure to the consumer
though? I would think buffering elements with Beam's BagState might lead
to OOM errors on the workers if the consumerIO continues to feed in data.
Or does something else happen?
--Vincent
On Thu, Jun 17, 2021 at 11:42 AM Lu
If the service returns sensible throttling errors you could use a
StatefulDoFn and buffer elements that error out due to throttling from the
service instead of failing the bundle and schedule a timer to replay them.
This will effectively max out the service as long as there is more data
then the se
Thanks for the quick response.
Querying the Dataflow API seems like something that could break easily, but
I can go with that if it turns out to be easier.
The Splittable DoFn way sounds interesting, but I'm not very familiar with
that so I have some questions around it:
Splits seem to operate on
You could implement a Splittable DoFn that generates a limited number of
splits. We do something like this for
GenerateSequence.from(X).withRate(...) via UnboundedCountingSource[1]. It
keeps track of its local EPS, and generates new splits if more EPSs are
wanted. This should help you scale up to t
Could you possibly use a side input with fixed interval triggering[1] to
query the Dataflow API to get the most recent log statement of scaling as
suggested here[2]?
[1]
https://beam.apache.org/documentation/patterns/side-inputs/
[2]
https://stackoverflow.com/a/54406878/6432284
On Thu, Apr 15, 20
Hi folks,
I've been working on a custom PTransform that makes requests to another
service, and would like to add a rate limiting feature there. The
fundamental issue that I'm running into here is that I need a decent
heuristic to estimate the worker count, so that each worker can
independently set