Hi Michael,

I’m from Azul, provider of CRaC. It is good to hear this mostly positive feedback, and I would like to get in touch with you to resolve the issue you describe below, either here, via my apache.org mail, or at gwiele...@azul.com.

Thanks,
Gj

On Fri, 16 Aug 2024 at 10:08, <michael.hambur...@mail.de.invalid> wrote:

> Hello everyone,
>
> I wanted to share a quick note on our experience using Quarkus as the
> framework for our AWS Lambda functions, with SnapStart enabled, leveraging
> the CRaC API. We've seen significant improvements in cold startup times,
> which has been great for our use case.
>
> However, we've encountered an issue during cold starts where we
> consistently see a "connection to node xx terminated" exception related to
> our use of SmallRye Reactive Messaging with the Apache Kafka connector.
> Although the Kafka client automatically reconnects and retries, and we
> haven't lost any messages, this exception can be misleading and cause
> unnecessary concern. Initially, I suspected a misconfiguration and spent
> time investigating potential network issues (firewalls, subnets, etc.).
>
> I attempted to close or flush Kafka client connections in the
> beforeCheckpoint method to avoid old connections being reused when the
> snapshot is reloaded, but unfortunately, this didn't resolve the issue.
>
> My key takeaway is that while CRaC support might not be widespread across
> all JDK distributions yet, there are still many systems that can benefit
> significantly from it.
>
> Best regards,
> Michael
>
> On 2023/04/20 12:22:13 Radim Vansa wrote:
> > Hello,
> >
> > upon filing a PR [1] with some initial support for OpenJDK CRaC [2][3] I
> > was directed here to raise a KIP (I don't have the permissions in
> > wiki/JIRA to create the KIP page yet, though).
> >
> > In a nutshell, CRaC intends to provide a way to checkpoint (snapshot)
> > and persist a running Java application and later restore it, possibly on
> > a different computer. This can be used to significantly speed up the
> > boot process (from seconds or minutes to tens of milliseconds), live
> > replication or migration of the heated up application.
> > This is not entirely transparent to the application; the application can
> > register for notification when this is happening, and sometimes has to
> > assist with that to prevent unexpected state after restore - e.g. close
> > network connections and files.
> >
> > CRaC is not integrated yet into the mainline JDK; a JEP is being prepared,
> > and users are welcome to try out our builds. However, even when this gets
> > into the JDK we can't expect users to jump onto the latest release
> > immediately; therefore we provide a facade package org.crac [4] that
> > delegates to the implementation, if it is present in the running JDK, or
> > provides a no-op implementation.
> >
> > With or without the implementation, the support for CRaC in the
> > application should be designed to have a minimal impact on performance
> > (few extra objects, some volatile reads...). On the other hand the
> > checkpoint operation itself can be non-trivial in this matter. Therefore
> > the main consideration should be about the maintenance costs - keeping a
> > small JAR in dependencies and some extra code in networking and
> > persistence.
> >
> > The support for CRaC does not have to be all-in for all components -
> > maybe it does not make sense to snapshot a Broker. My PR was for Kafka
> > Clients because the open network connections need to be handled in a web
> > application (in my case I am enabling CRaC in Quarkus Superheros [5]
> > demo). The PR does not handle all possible client-side uses; as I am not
> > familiar with Kafka I follow the whack-a-mole strategy.
> >
> > It is possible that the C/R could be handled in a different layer, e.g.
> > in Quarkus integration code. However our intent is to push the changes
> > as low in the technology stack as possible, to provide the best fanout
> > to users without duplicating maintenance efforts. Also having the
> > support higher up can be fragile and break encapsulation.
> >
> > Thank you for your consideration, I hope that you'll appreciate our
> > attempt to innovate the Java ecosystem.
> >
> > Radim Vansa
> >
> > PS: I'd appreciate if someone could give me the permissions on wiki to
> > create a proper KIP! Username: rvansa (both Confluence and JIRA).
> >
> > [1] https://github.com/apache/kafka/pull/13619
> >
> > [2] https://wiki.openjdk.org/display/crac
> >
> > [3] https://github.com/openjdk/crac
> >
> > [4] https://github.com/CRaC/org.crac
> >
> > [5] https://quarkus.io/quarkus-workshops/super-heroes/
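
For readers following the thread, the checkpoint/restore hooks Radim describes (and the beforeCheckpoint approach Michael tried) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the PR. To stay runnable without the org.crac JAR, it declares a local stand-in interface, CheckpointAware, in place of org.crac.Resource. ManagedClient and its boolean "connection" are hypothetical placeholders for a real network client. In the real facade the two callbacks take a Context argument, and the resource is registered with Core.getGlobalContext().register(...).

```java
import java.util.ArrayList;
import java.util.List;

final class CracSketch {

    // Local stand-in for org.crac.Resource (assumption: simplified, no Context argument).
    interface CheckpointAware {
        void beforeCheckpoint() throws Exception;
        void afterRestore() throws Exception;
    }

    // Hypothetical client: a boolean flag stands in for a real socket.
    static final class ManagedClient implements CheckpointAware {
        private boolean connected;
        final List<String> events = new ArrayList<>();

        void connect() {
            connected = true;
            events.add("connect");
        }

        boolean isConnected() {
            return connected;
        }

        @Override
        public void beforeCheckpoint() {
            // Drop the connection so the snapshot never contains a live socket.
            connected = false;
            events.add("close");
        }

        @Override
        public void afterRestore() {
            // Open a fresh connection in the restored process.
            connect();
        }
    }

    public static void main(String[] args) throws Exception {
        ManagedClient client = new ManagedClient();
        client.connect();

        // A CRaC-enabled runtime would drive these callbacks; here they are called by hand.
        CheckpointAware resource = client;
        resource.beforeCheckpoint();
        System.out.println("connected before snapshot: " + client.isConnected());
        resource.afterRestore();
        System.out.println("connected after restore: " + client.isConnected());
        System.out.println("events: " + client.events);
    }
}
```

Note that Michael reports closing connections in beforeCheckpoint did not remove the exception in his setup, so the sketch shows the general pattern only and is not a fix for that issue.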