Yeah, at some point I investigated performance issues with AWS K8s. They
have somewhat strict rate limits on the K8s API server.
You run into the rate limits by configuring a very high checkpoint
frequency (I guess something like 500ms) and a high
state.checkpoints.num-retained count (e.g. 10).
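
For reference, a setup like the one described would correspond roughly to
the following Flink configuration. The values are just the guesses from
above, not measured thresholds, and the interval key name is an assumption
about which option was meant by "checkpoint frequency":

```yaml
# Illustrative only: values taken from the guess above, not from a benchmark.
execution.checkpointing.interval: 500ms   # very frequent checkpoints
state.checkpoints.num-retained: 10        # many retained checkpoints
```
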
Thanks for sharing your opinions on the proposal. The concerns sound
reasonable. I guess I'm going to follow up on Chesnay's idea about
combining multiple requests into one for the k8s implementation. Having a
performance test for the k8s API server access sounds like a good idea,
too. Both action
This is a nice FLIP. I particularly like how much background it provides
on the issue; something that other FLIPs could certainly benefit from...
I went over the FLIP and had a chat with Matthias about it.
Somewhat unrelated to the FLIP, we found a flaw in the current cleanup
mechanism of failed
Thanks Matthias for continuously improving the clean-up process.
Given that we depend heavily on the K8s APIServer for the HA implementation,
I am not in favor of storing too many entries in the ConfigMap, or of
adding more update requests to the APIServer. So I lean towards
Proposal #2. It just work
I would like to bring this topic up one more time. I put some more thought
into it and created FLIP-270 [1] as a follow-up of FLIP-194 [2] with an
updated version of what I summarized in my previous email. It would be
interesting to get some additional perspectives on this; more specifically,
the t