Hello,
upon filing a PR [1] with some initial support for OpenJDK CRaC [2][3] I
was directed here to raise a KIP (I don't have the permissions in
wiki/JIRA to create the KIP page yet, though).
In a nutshell, CRaC intends to provide a way to checkpoint (snapshot)
and persist a running Java application and later restore it, possibly on
a different computer. This can be used to significantly speed up the
boot process (from seconds or minutes to tens of milliseconds), or for live
replication or migration of a warmed-up application. This is not
entirely transparent to the application; the application can register
for notification when this is happening, and sometimes has to assist with
that to prevent unexpected state after restore - e.g. close network
connections and files.
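To illustrate the notification mechanism, here is a minimal, self-contained
sketch. All names in it (CheckpointListener, ListenerRegistry,
CheckpointAwareConnection) are hypothetical stand-ins, loosely modeled on the
beforeCheckpoint/afterRestore callbacks of org.crac; they are not the actual
CRaC or Kafka classes, and the "connection" is only a flag.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the callback interface. The real org.crac.Resource
// callbacks also take a Context argument and may throw; both are omitted here.
interface CheckpointListener {
    void beforeCheckpoint();
    void afterRestore();
}

// Hypothetical registry standing in for the runtime's global context.
final class ListenerRegistry {
    private final List<CheckpointListener> listeners = new ArrayList<>();

    void register(CheckpointListener listener) {
        listeners.add(listener);
    }

    // Simulates a checkpoint immediately followed by a restore.
    // Notification ordering is simplified in this sketch.
    void checkpointAndRestore() {
        for (CheckpointListener l : listeners) {
            l.beforeCheckpoint();
        }
        // ... the process image would be persisted and later restored here ...
        for (CheckpointListener l : listeners) {
            l.afterRestore();
        }
    }
}

// Hypothetical connection wrapper: closed before the checkpoint,
// reopened after the restore.
final class CheckpointAwareConnection implements CheckpointListener {
    private boolean open = true;
    private int reconnects = 0;

    @Override
    public void beforeCheckpoint() {
        open = false; // a real client would close sockets and files here
    }

    @Override
    public void afterRestore() {
        open = true; // a real client would re-establish the connection here
        reconnects++;
    }

    boolean isOpen() { return open; }
    int reconnects() { return reconnects; }
}
```

A component would register itself once at construction time; the callbacks
then run only around a checkpoint, not during normal operation.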
CRaC is not integrated into the mainline JDK yet; a JEP is being prepared,
and users are welcome to try out our builds. However, even when this gets
into the JDK we can't expect users to jump onto the latest release immediately;
therefore we provide a facade package org.crac [4] that delegates to the
implementation, if it is present in the running JDK, or provides a no-op
implementation.
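The facade idea can be sketched as follows. This is not the code of org.crac
itself, only an illustration of the "delegate if present, otherwise no-op"
pattern; the class CheckpointFacade is hypothetical, and the implementation
class name "jdk.crac.Core" is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical facade: usable on any JDK, active only when an
// implementation class can be found at runtime.
final class CheckpointFacade {
    // Assumed name of the JDK-side entry point; illustration only.
    static final String ASSUMED_IMPL_CLASS = "jdk.crac.Core";

    private final boolean implementationPresent;
    private final List<Runnable> hooks = new ArrayList<>();

    CheckpointFacade(String implClassName) {
        this.implementationPresent = isPresent(implClassName);
    }

    // Probes for the class without initializing it.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, CheckpointFacade.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    boolean isImplementationPresent() {
        return implementationPresent;
    }

    // Always safe to call. Without an implementation the hook is dropped;
    // a real facade would forward it to the implementation instead of
    // keeping it in a local list.
    void register(Runnable hook) {
        if (implementationPresent) {
            hooks.add(hook);
        }
    }

    int registeredHooks() {
        return hooks.size();
    }
}
```

Application code calls register(...) unconditionally, e.g. on
new CheckpointFacade(CheckpointFacade.ASSUMED_IMPL_CLASS), and does not need
to know which JDK it is running on.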
With or without the implementation, the support for CRaC in the
application should be designed to have a minimal impact on performance
(few extra objects, some volatile reads...). On the other hand, the
checkpoint operation itself can be non-trivial in this respect. Therefore
the main consideration should be about the maintenance costs - keeping a
small JAR in dependencies and some extra code in networking and persistence.
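As a rough illustration of the "some volatile reads" cost on the hot path,
consider the sketch below. GuardedSender is a hypothetical class, not code
from the PR; it only shows that normal operation pays for a single volatile
read, while the more expensive work happens in the checkpoint callbacks.

```java
// Hypothetical sender guarded by a flag that changes only around
// checkpoint and restore.
final class GuardedSender {
    private volatile boolean suspended = false;
    private int sent = 0;

    // Called before the checkpoint; a real client would also close
    // its connections here.
    void beforeCheckpoint() {
        suspended = true;
    }

    // Called after the restore; a real client would reconnect here.
    void afterRestore() {
        suspended = false;
    }

    // Hot path: the only extra cost is one volatile read.
    boolean send(String message) {
        if (suspended) {
            return false; // caller may retry after restore
        }
        sent++; // stands in for the actual network write
        return true;
    }

    int sent() {
        return sent;
    }
}
```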
The support for CRaC does not have to be all-in for all components -
maybe it does not make sense to snapshot a Broker. My PR was for Kafka
Clients because the open network connections need to be handled in a web
application (in my case I am enabling CRaC in the Quarkus Super Heroes [5]
demo). The PR does not handle all possible client-side uses; as I am not
familiar with Kafka, I am following a whack-a-mole strategy.
It is possible that checkpoint/restore (C/R) could be handled in a
different layer, e.g. in Quarkus integration code. However, our intent is
to push the changes as low in the technology stack as possible, to provide
the best fanout to users without duplicating maintenance efforts. Also,
having the support higher up can be fragile and break encapsulation.
Thank you for your consideration, I hope that you'll appreciate our
attempt to innovate the Java ecosystem.
Radim Vansa
PS: I'd appreciate it if someone could give me the permissions on wiki to
create a proper KIP! Username: rvansa (both Confluence and JIRA).
[1] https://github.com/apache/kafka/pull/13619
[2] https://wiki.openjdk.org/display/crac
[3] https://github.com/openjdk/crac
[4] https://github.com/CRaC/org.crac
[5] https://quarkus.io/quarkus-workshops/super-heroes/