Hi, You might find this similar thread from the mailing list archive helpful : http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/throttled-stream-td6138.html .
Best, Yassine 2017-01-20 10:53 GMT+01:00 Florian König <florian.koe...@micardo.com>: > Hi, > > i need to limit the rate of processing in a Flink stream application. > Specifically, the number of items processed in a .map() operation has to > stay under a certain maximum per second. > > At the moment, I have another .map() operation before the actual > processing, which just sleeps for a certain time (e.g., 250ms for a limit > of 4 requests / sec) and returns the item unchanged: > > … > > public T map(final T value) throws Exception { > Thread.sleep(delay); > return value; > } > > … > > This works as expected, but is a rather crude approach. Checkpointing the > job takes a very long time: minutes for a state of a few kB, which for > other jobs is done in a few milliseconds. I assume that letting the whole > thread sleep for most of the time interferes with the checkpointing - not > good! > > Would using a different synchronization mechanism (e.g., > https://google.github.io/guava/releases/19.0/api/docs/ > index.html?com/google/common/util/concurrent/RateLimiter.html) help to > make checkpointing work better? > > Or, preferably, is there a mechanism inside Flink that I can use to > accomplish the desired rate limiting? I haven’t found anything in the docs. > > Cheers, > Florian >