Hi Matthias, and thanks for replying to this thread.

> "Hard to answer from a 10,000ft view."

We tried hard to include a detailed explanation of our use case in the Stack Overflow thread (https://stackoverflow.com/questions/71222496/how-to-achieve-high-availability-in-a-kafka-streams-app-during-deployment), while also trying to avoid making it too complicated. We can of course provide more details if needed. Have you read the Stack Overflow thread, and are there any details you would like us to explain?
> "As you are using Kubernetes, using stateful sets that allow you to re-attach disk should be the way to go"

We have considered this approach, and it may very well have to be the way to go. However, it would require us to terminate one replica instance before starting another, which would not meet our initial target of approximately 2 seconds of downtime, a single time, for each task. We would also always be limited by the time it takes Kubernetes to start our JVM pod. We think this is not in the spirit of what you are trying to achieve in KIP-429.

> "Rolling upgrade are only supported if the new program (ie, Topology) is compatible to the old one."

We have tested a rolling upgrade with an identical Topology, replacing the Kafka Streams replicas/EKS pods one by one. Details about this are included in the Stack Overflow thread.

We are really trying to understand how we can achieve high availability and make it work for our use case. We don't think our use case is unique, so a solution should be useful for others as well. If we find a working solution, we are willing to write it up in detail, both on Stack Overflow and in a Medium article.

- Ismar

On Sat, Mar 5, 2022 at 7:42 PM Matthias J. Sax <mj...@mailbox.org.invalid> wrote:

> Hard to answer from a 10,000ft view.
>
> In general, a rolling upgrade (ie, bounce one instance at a time) is
> recommended. If you have state, you would need to ensure that state is
> not lost during a bounce. As you are using Kubernetes, using stateful
> sets that allow you to re-attach disk should be the way to go.
>
> Rolling upgrade are only supported if the new program (ie, Topology) is
> compatible to the old one. The alternative to a rolling upgrade would
> be, to deploy the new version in parallel to the old one (using a
> different application-id), and after the new version is running stable,
> shutting down the old version.
>
> Hope this helps.
> -Matthias
>
> On 2/28/22 12:11, Ismar Slomic wrote:
> > We run Kafka Streams (Java) apps on Kubernetes to *consume*, *process*
> > and *produce* real time data in our Kafka Cluster (running Confluent
> > Community Edition v7.0/Kafka v3.0). How can we do a deployment of our
> > apps in a way that limits downtime on consuming records? Our initial
> > target was approx *2 sec* downtime a single time for each task.
> >
> > We are aiming to do continuous deployments of changes to the production
> > environment, but deployments are too disruptive by causing downtime in
> > record consumption in our apps, leading to latency in produced real
> > time records.
> >
> > Since this question has already been described in detail on Stack
> > Overflow (https://stackoverflow.com/questions/71222496/how-to-achieve-high-availability-in-a-kafka-streams-app-during-deployment),
> > but has not been answered yet, we would like to refer to it instead of
> > copy/pasting the content in this mailing list.
> >
> > Please let me know if you prefer to have the complete question in the
> > mailing list instead.
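For readers following the thread, the parallel-deployment option Matthias describes (run the new version under a different application-id, then shut down the old one) can be sketched as a minimal Java configuration. This is only an illustration: `application.id`, `bootstrap.servers`, and `state.dir` are real Kafka Streams config keys, but the application name, broker address, and state path below are hypothetical placeholders.

```java
import java.util.Properties;

public class ParallelDeployConfig {

    // Sketch of the "deploy in parallel under a new application-id"
    // option. A versioned application.id gives the new deployment its
    // own consumer group and internal topics, so it can warm up while
    // the old version keeps serving; once the new version runs stable,
    // the old one is shut down.
    static Properties streamsConfig(String appVersion) {
        Properties props = new Properties();
        // Hypothetical app name, versioned per deployment.
        props.put("application.id", "my-streams-app-" + appVersion);
        // Hypothetical broker address.
        props.put("bootstrap.servers", "broker:9092");
        // State directory on a re-attachable disk (e.g. a persistent
        // volume in a Kubernetes stateful set) so local state survives
        // a pod restart.
        props.put("state.dir", "/mnt/kafka-streams-state");
        return props;
    }

    public static void main(String[] args) {
        Properties v2 = streamsConfig("v2");
        System.out.println(v2.getProperty("application.id")); // prints "my-streams-app-v2"
    }
}
```

These Properties would then be passed to the KafkaStreams constructor as usual; the trade-off versus a rolling upgrade is that the new application-id starts with empty state and must rebuild it from the changelog/input topics before taking over.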