Re: Flink Async IO operator tuning / micro-benchmarks

2020-06-12 Thread Arvid Heise
Hi Arti, ouch, 3M is pretty far off the current setup. Flink aside, you would need at least 100 machines with the current approach (AsyncHTTP and the machine you evaluated). That's probably the point where I'd try other libraries first and, most importantly, evaluate different machines.
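The sizing argument above is a ceiling division of the target rate by the per-machine rate. A minimal sketch follows, assuming "3M" means 3,000,000 records per second. The 30,000 records/s per machine used in the usage note is not a measured figure from the thread; it is simply what "3M on at least 100 machines" implies. The class and method names are hypothetical.

```java
/** Hypothetical helper: how many machines a target throughput needs. */
final class MachineEstimate {
    private MachineEstimate() {}

    /**
     * Ceiling division: the smallest machine count whose combined
     * throughput reaches the target, assuming perfectly linear scaling.
     */
    static long machinesNeeded(long targetPerSecond, long perMachinePerSecond) {
        if (targetPerSecond < 0 || perMachinePerSecond <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        return (targetPerSecond + perMachinePerSecond - 1) / perMachinePerSecond;
    }
}
```

For example, `MachineEstimate.machinesNeeded(3_000_000L, 30_000L)` gives 100. Real clusters rarely scale perfectly linearly, so this is a lower bound, which matches the "at the very minimum" wording.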

Re: Flink Async IO operator tuning / micro-benchmarks

2020-06-12 Thread Arti Pande
Hi Arvid, *Shared api client*: Actually, while writing I forgot to mention that we switched to a static shared instance of the async http client for all 7 subtasks of the AsyncIO. The number of threads is therefore not 140 (20 * 7) but just 24 or 32 (16 + 8 or 16 + 16), which includes a static sha
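A static shared client of the kind described above can be sketched with the initialization-on-demand holder idiom. This is not the poster's code: it swaps in the JDK's `java.net.http.HttpClient` (Java 11+) for the async http client library used in the thread, and the class name `SharedHttpClient` and the pool size are hypothetical.

```java
import java.net.http.HttpClient;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical holder for one HTTP client shared by all subtasks in a JVM. */
final class SharedHttpClient {
    // Assumed value for illustration only; not taken from the thread.
    private static final int POOL_SIZE = 16;

    private SharedHttpClient() {}

    // The JVM initializes Holder once, lazily and thread-safely,
    // the first time get() is called.
    private static final class Holder {
        static final ExecutorService EXECUTOR =
            Executors.newFixedThreadPool(POOL_SIZE, runnable -> {
                Thread thread = new Thread(runnable, "shared-http-client");
                thread.setDaemon(true); // do not block JVM shutdown
                return thread;
            });
        static final HttpClient CLIENT =
            HttpClient.newBuilder().executor(EXECUTOR).build();
    }

    /** Returns the same client instance to every caller in this JVM. */
    static HttpClient get() {
        return Holder.CLIENT;
    }
}
```

Each subtask would call `SharedHttpClient.get()` instead of building its own client. Note that a static field is shared only among subtasks running in the same JVM and class loader, so the thread count stays independent of parallelism per process, not across the whole cluster.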

Re: Flink Async IO operator tuning / micro-benchmarks

2020-06-12 Thread Arvid Heise
Hi Arti, Thank you very much for providing so much information. One additional test that you could do is to check how the pipeline performs by mocking the actual HTTP request and directly returning a static response through Async IO. This would give you an exact number including potential serializat
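The mocking test suggested above can be sketched as a lookup that has the same asynchronous shape as a real client call but completes immediately with a fixed value. The class name, method name, and response payload below are hypothetical; the sketch uses only the JDK so it runs without Flink on the classpath.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the real HTTP call: no network I/O at all. */
final class MockedLookup {
    // Assumed payload for illustration; use whatever the real service returns.
    static final String STATIC_RESPONSE = "{\"status\":\"ok\"}";

    private MockedLookup() {}

    /**
     * Same signature shape as an async client call (request in, future out),
     * but the future is already completed with the static response.
     */
    static CompletableFuture<String> lookup(String request) {
        return CompletableFuture.completedFuture(STATIC_RESPONSE);
    }
}
```

Inside the job, the `asyncInvoke(input, resultFuture)` method of the Flink `AsyncFunction` would call `MockedLookup.lookup(input)` and hand the value to `resultFuture.complete(Collections.singleton(value))`. The throughput measured this way covers everything except the external call, so the gap to the real run shows how much the HTTP client and the remote service cost.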

Re: Flink Async IO operator tuning / micro-benchmarks

2020-06-12 Thread Arti Pande
Hi Arvid, Thanks for the quick reply. I totally agree with you on the differences between microbenchmarks and a full benchmark for a specific use case. Thanks for sending the microbenchmark screenshot. For our use case, the streaming pipeline has five main transformations that have business logic, of

Re: Flink Async IO operator tuning / micro-benchmarks

2020-06-11 Thread Arti Pande
Hi Arvid, Thanks for the quick reply. The second reference link (http://codespeed.dak8s.net:8000/timeline/?ben=asyncWait.ORDERED&env=2) from your answer is not accessible, though. Could you share some more numbers from it? Are these benchmarks published somewhere? Without actual IO call, Async IO

Re: Flink Async IO operator tuning / micro-benchmarks

2020-06-10 Thread Arvid Heise
Hi Arti, microbenchmarks for AsyncIO are available [1] and the results are shown in [2]. So you can roughly expect 1.6k records/ms per core to be the upper limit without any actual I/O. That range should hold for Flink 1.10 and the upcoming Flink 1.11. I cannot say much about older versions and you didn't s
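To compare the quoted ceiling with a per-second target, it helps to convert units: 1.6k records/ms is 1,600,000 records/s per core. A minimal sketch, assuming the ceiling scales linearly with the number of cores (the thread does not claim this); the class and method names are hypothetical.

```java
/** Hypothetical helper: converts the quoted micro-benchmark ceiling to records/s. */
final class AsyncIoCeiling {
    // From the thread: roughly 1.6k records per millisecond per core, no real I/O.
    static final long RECORDS_PER_MS_PER_CORE = 1_600L;
    static final long MS_PER_SECOND = 1_000L;

    private AsyncIoCeiling() {}

    /** Upper bound in records per second for the given number of cores. */
    static long recordsPerSecond(int cores) {
        if (cores < 0) {
            throw new IllegalArgumentException("cores must not be negative");
        }
        return RECORDS_PER_MS_PER_CORE * MS_PER_SECOND * cores;
    }
}
```

For the 7 subtasks mentioned elsewhere in the thread, `AsyncIoCeiling.recordsPerSecond(7)` gives 11,200,000 records/s. This bounds only the operator overhead; any real external call will bring the achievable rate well below it.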

Flink Async IO operator tuning / micro-benchmarks

2020-06-10 Thread Arti Pande
As the Flink Async IO operator is designed for external API or DB calls, are there any specific guidelines / tips for scaling up this operator? Particularly for use cases where incoming events are being ingested at a very high speed and the Async IO operator with orderedWait mode cannot keep up with t