Re: Flink auto-scaling feature and documentation suggestions

2021-05-06 Thread vishalovercome
Thank you for answering all my questions. My suggestion would be to start off with exposing an API to allow dynamically changing operator parallelism, as the users of Flink will be better able to decide the right scaling policy. Once this functionality is there, it's just a matter of providing polici

Re: Flink auto-scaling feature and documentation suggestions

2021-05-06 Thread vishalovercome
We do exactly what you mentioned. Unfortunately, it's not that simple. Our services don't have predictable performance, as traffic varies a lot during the day. As I've explained above, increasing source parallelism to 2 was enough to tip over our services, and reducing parallelism of the asy

Re: Flink auto-scaling feature and documentation suggestions

2021-05-06 Thread vishalovercome
I am using the async IO operator. The problem is that increasing source parallelism from 1 to 2 was enough to tip our systems over the edge. Reducing the parallelism of the async IO operator to 2 is not an option, as that would reduce the throughput quite a bit. This means that no matter what we do, we'

Re: Flink auto-scaling feature and documentation suggestions

2021-05-06 Thread Till Rohrmann
Yes, exposing an API to adjust the parallelism of individual operators is definitely a good step towards the auto-scaling feature which we will consider. The missing piece is persisting this information so that in case of recovery you don't recover with a completely different parallelism. I also a

Re: Flink auto-scaling feature and documentation suggestions

2021-05-06 Thread Till Rohrmann
Hi Vishal, thanks a lot for all your feedback on the new reactive mode. I'll try to answer your questions. 0. In order to avoid confusion let me quickly explain a bit of terminology: The reactive mode is the new feature that allows Flink to react to newly available resources and to make use of th

Re: Flink auto-scaling feature and documentation suggestions

2021-05-05 Thread vishalovercome
Yes. While back-pressure would eventually ensure high throughput, hand-tuning parallelism became necessary because the job with high source parallelism would immediately bring down our internal services, not giving Flink enough time to adjust the in-rate. Plus, running all operators at such a hi

Re: Flink auto-scaling feature and documentation suggestions

2021-05-05 Thread Ken Krugler
Hi Vishal, WRT “bring down our internal services” - a common pattern with making requests to external services is to measure latency, and throttle (delay) requests in response to increased latency. You’ll see this discussed frequently on web crawling forums as an auto-tuning approach. Typical
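As a rough illustration of the latency-based throttling pattern described above, here is a minimal Python sketch. It is not Flink code and not taken from this thread: the class name `LatencyThrottle`, the helper `throttled_call`, and every threshold are hypothetical choices made only for the example. The delay doubles while observed latency exceeds a target and shrinks by one step once latency recovers.

```python
import time


class LatencyThrottle:
    """Adjusts a per-request delay from observed latency.

    All thresholds are illustrative assumptions, not recommended values.
    """

    def __init__(self, target_latency_s=0.2, step_s=0.05, max_delay_s=5.0):
        self.target_latency_s = target_latency_s
        self.step_s = step_s
        self.max_delay_s = max_delay_s
        self.delay_s = 0.0

    def record(self, latency_s):
        """Update and return the delay to apply before the next request."""
        if latency_s > self.target_latency_s:
            # Service looks overloaded: back off quickly (double, capped).
            self.delay_s = min(self.max_delay_s, max(self.step_s, self.delay_s * 2))
        else:
            # Service looks healthy: recover slowly, one step at a time.
            self.delay_s = max(0.0, self.delay_s - self.step_s)
        return self.delay_s


def throttled_call(throttle, request_fn):
    """Sleep for the current delay, run request_fn, and feed its latency back."""
    if throttle.delay_s > 0:
        time.sleep(throttle.delay_s)
    start = time.monotonic()
    result = request_fn()
    throttle.record(time.monotonic() - start)
    return result
```

Backing off faster than recovering is one common design choice for this kind of controller; the right constants depend on how quickly the downstream service degrades and recovers.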

Re: Flink auto-scaling feature and documentation suggestions

2021-05-05 Thread David Anderson
Well, I was thinking you could have avoided overwhelming your internal services by using something like Flink's async I/O operator, tuned to limit the total number of concurrent requests. That way the pipeline could have uniform parallelism without overwhelming those services, and then you'd rely o
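The idea of capping the number of in-flight requests can be sketched outside Flink with a semaphore. The Python example below is only an analogy under stated assumptions: `bounded_gather`, `call`, and `max_in_flight` are hypothetical names, and the code does not use any Flink API. In Flink itself, the corresponding knob is the `capacity` argument of `AsyncDataStream.unorderedWait` / `orderedWait`, which applies per parallel operator instance, so the overall bound is roughly parallelism times capacity.

```python
import asyncio


async def bounded_gather(items, call, max_in_flight):
    """Run call(item) for every item with at most max_in_flight calls active."""
    # Created inside the coroutine so it binds to the running event loop.
    semaphore = asyncio.Semaphore(max_in_flight)

    async def run_one(item):
        # Wait here when max_in_flight requests are already outstanding;
        # this waiting is the local equivalent of back-pressure.
        async with semaphore:
            return await call(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(run_one(item) for item in items))
```

A usage sketch: `asyncio.run(bounded_gather(urls, fetch, 10))`, where `fetch` is whatever coroutine function issues the request to the internal service.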

Re: Flink auto-scaling feature and documentation suggestions

2021-05-05 Thread David Anderson
Interesting. So if I understand correctly, basically you limited the parallelism of the sources in order to avoid running the job with constant backpressure, and then scaled up the windows to maximize throughput. On Tue, May 4, 2021 at 11:23 PM vishalovercome wrote: > In one of my jobs, windowin

Re: Flink auto-scaling feature and documentation suggestions

2021-05-04 Thread vishalovercome
In one of my jobs, windowing is the costliest operation while upstream and downstream operators are not as resource intensive. There's another operator in this job that communicates with internal services. This has high parallelism as well but not as much as that of the windowing operation. Running

Re: Flink auto-scaling feature and documentation suggestions

2021-05-04 Thread David Anderson
Could you describe a situation in which hand-tuning the parallelism of individual operators produces significantly better throughput than the default approach? I think it would help this discussion if we could have a specific use case in mind where this is clearly better. Regards, David On Tue, M

Re: Flink auto-scaling feature and documentation suggestions

2021-05-04 Thread vishalovercome
Forgot to add one more question - 7. If maxParallelism needs to be set to control parallelism, then wouldn't that mean that we wouldn't ever be able to take a savepoint and rescale beyond the configured maxParallelism? This would mean that we can never achieve hand-tuned resource efficiency. I will

Re: Flink auto-scaling feature and documentation suggestions

2021-05-04 Thread vishalovercome
Some questions about adaptive scheduling documentation - "If new slots become available the job will be scaled up again, up to the configured parallelism". Does parallelism refer to maxParallelism or parallelism? I'm guessing it's the latter because the doc later mentions - "In Reactive Mode (see