Re: Rate Limiting in Beam

2021-06-18 Thread Vincent Marquez
Thanks, good to know. Ultimately I still want to lobby for a way to throttle bundles based on progress made further down the pipeline but I realize that might involve major architectural changes. Sometimes I'm forced to cancel a streaming pipeline and I'm unable to drain it, so that can present a

Re: Rate Limiting in Beam

2021-06-17 Thread Vincent Marquez
Do individual stages of a Beam job exhibit backpressure to the consumer, though? I would think buffering elements with Beam's BagState might lead to OOM errors on the workers if the consumerIO continues to feed in data. Or does something else happen? --Vincent On Thu, Jun 17, 2021 at 11:42 AM Lu

Re: Rate Limiting in Beam

2021-06-17 Thread Luke Cwik
If the service returns sensible throttling errors, you could use a StatefulDoFn to buffer elements that error out due to throttling from the service instead of failing the bundle, and schedule a timer to replay them. This will effectively max out the service as long as there is more data than the se
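
For readers skimming the thread, the buffer-and-replay idea can be sketched in plain Python without any Beam dependency. This is only an illustration under stated assumptions: `FakeService`, `ThrottledError`, and `BufferAndReplay` are hypothetical names, the deque stands in for buffered state, and `on_timer` stands in for a timer callback. It is not the Beam state or timer API.

```python
from collections import deque


class ThrottledError(Exception):
    """Raised by the hypothetical service when it is over capacity."""


class FakeService:
    """Hypothetical stand-in for the remote service.

    Accepts at most `capacity` calls per tick, then throttles.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.accepted = []

    def next_tick(self):
        # A new time window starts, so the quota is available again.
        self.used = 0

    def call(self, element):
        if self.used >= self.capacity:
            raise ThrottledError(element)
        self.used += 1
        self.accepted.append(element)


class BufferAndReplay:
    """Plays the role of the stateful step: a buffer plus a replay hook."""

    def __init__(self, service):
        self.service = service
        self.buffer = deque()  # stands in for buffered state

    def process(self, element):
        try:
            self.service.call(element)
        except ThrottledError:
            # Keep the element for later instead of failing the bundle.
            self.buffer.append(element)

    def on_timer(self):
        # Replay buffered elements in arrival order; stop at the first throttle.
        while self.buffer:
            try:
                self.service.call(self.buffer[0])
            except ThrottledError:
                return
            self.buffer.popleft()


def run_demo():
    service = FakeService(capacity=2)
    step = BufferAndReplay(service)
    for element in range(5):
        step.process(element)
    pending_after_first_tick = list(step.buffer)
    # Each loop iteration models one timer firing in a fresh quota window.
    while step.buffer:
        service.next_tick()
        step.on_timer()
    return pending_after_first_tick, service.accepted
```

Note that the buffer in this sketch is unbounded, so it grows whenever input outpaces the service, which is the memory concern raised in the reply about BagState.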

Re: Rate Limiting in Beam

2021-04-16 Thread Daniel Thevessen
Thanks for the quick response. Querying the Dataflow API seems like something that could break easily, but I can go with that if it turns out to be easier. The Splittable DoFn way sounds interesting, but I'm not very familiar with that so I have some questions around it: Splits seem to operate on

Re: Rate Limiting in Beam

2021-04-15 Thread Pablo Estrada
You could implement a Splittable DoFn that generates a limited number of splits. We do something like this for GenerateSequence.from(X).withRate(...) via UnboundedCountingSource[1]. It keeps track of its local EPS, and generates new splits if more EPSs are wanted. This should help you scale up to t

Re: Rate Limiting in Beam

2021-04-15 Thread Evan Galpin
Could you possibly use a side input with fixed interval triggering[1] to query the Dataflow API to get the most recent log statement of scaling as suggested here[2]? [1] https://beam.apache.org/documentation/patterns/side-inputs/ [2] https://stackoverflow.com/a/54406878/6432284 On Thu, Apr 15, 20

Rate Limiting in Beam

2021-04-15 Thread Daniel Thevessen
Hi folks, I've been working on a custom PTransform that makes requests to another service, and would like to add a rate limiting feature there. The fundamental issue that I'm running into here is that I need a decent heuristic to estimate the worker count, so that each worker can independently set
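
A minimal sketch of the per-worker approach described in this message, assuming the intent is for each worker to enforce an equal share of a global request budget. `TokenBucket` and `per_worker_rate` are hypothetical helpers, not part of Beam, and the worker count is taken as a given input because estimating it is exactly the open question of the thread.

```python
import time


def per_worker_rate(global_rate_per_sec, estimated_workers):
    """Split a global budget evenly; treat a count below 1 as a single worker."""
    return global_rate_per_sec / max(1, estimated_workers)


class TokenBucket:
    """Simple token bucket: refills at `rate_per_sec`, holds at most `burst` tokens."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.last = clock()

    def try_acquire(self):
        """Return True and consume a token if one is available, else False."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Each worker would build its own bucket, for example `TokenBucket(per_worker_rate(100, 10), burst=2)`, and call `try_acquire()` before each request. If the worker-count estimate is too low, the combined rate exceeds the global budget; if it is too high, the service is underused, which is why the quality of the heuristic matters.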