Hey Radim

One of the reasons for the slowdown is preparation of upcoming releases
(the community is currently in code freeze/resolve release blockers mode)
and preparation for Kafka Summit next week. I would suggest giving another
2-3 weeks for folks to chime in. I would personally visit this KIP in the
last week of May.

--
Divij Vaidya



On Thu, May 11, 2023 at 1:34 PM Radim Vansa <rva...@azul.com.invalid> wrote:

> Hello all,
>
> it seems that this KIP did not spark much interest; I am not sure if
> people just don't care or whether there are any objections against the
> proposal. What should be the next step? I don't think it has been
> discussed enough to proceed with voting.
>
> Cheers,
>
> Radim
>
> On 27. 04. 23 8:39, Radim Vansa wrote:
> > Caution: This email originated from outside of the organization. Do
> > not click links or open attachments unless you recognize the sender
> > and know the content is safe.
> >
> >
> > Thank you for those questions. As I've mentioned, my knowledge of Kafka
> > is quite limited, so these are the things that need careful thinking!
> > Comments inline.
> >
> > On 26. 04. 23 16:28, Mickael Maison wrote:
> >> Hi Radim,
> >>
> >> Thanks for the KIP! CRaC is an interesting project and it could be a
> >> useful feature in Kafka clients.
> >>
> >> The KIP is pretty vague in terms of the expected behavior of clients
> >> when checkpointing and restoring. For example:
> >>
> >> 1. A consumer may have pre-fetched records in memory. When it is
> >> checkpointed, its group will rebalance and another consumer will
> >> consume the same records. When the initial consumer is restored, will
> >> it process its pre-fetched records and basically reprocess records
> >> already handled by other consumers?
> >
> >
> > How would the broker (?) know what records were really consumed? I think
> > that there must be some form of Two Generals Problem.
> >
> > The checkpoint should keep as much of the application untouched as it
> > could. Here, I would expect that the prefetched records would be
> > consumed after the restore operation as if nothing happened. I can
> > imagine this could cause some trouble if the data is dependent on the
> > 'external' world, e.g. other members of the cluster. But I wouldn't
> > break the general guarantees Kafka provides if we can avoid it. We
> > certainly have an option to do the checkpoint more gracefully,
> > deregistering with the group (the checkpoint is effectively blocked by
> > the notification handler).
> >
> > If we're talking about using CRaC for boot speedup this is not that
> > important - when the app is about to be checkpointed it will likely stop
> > processing data anyway. For other use-cases (e.g. live migration) it
> > might matter.
> >
> >
> >>
> >> 2. Producers may have records in-flight or in the producer buffer when
> >> they are checkpointed. How do you propose to handle these cases?
> >
> >
> > If there's something in flight we can wait for the acks. Alternatively
> > if the receiver guards against double receive using unique ids/sequence
> > numbers we could resend that after restore. As for the data in the
> > buffer, IMO that can wait until restore.
> >
> >
> >>
> >> 3. Clients may have loaded plugins such as serializers. These plugins
> >> may establish network connections too. How are these expected to
> >> automatically reconnect when the application is restored?
> >
> >
> > If there's an independent pool of connections, it's up to the plugin
> > author to support CRaC; I doubt there's anything that the generic code
> > could do. Also it's likely that the plugins won't need any extension to
> > the SPI; they would register their handlers independently (if ordering
> > matters there are ways to prioritize one resource over another).
> >
> > Cheers!
> >
> > Radim Vansa
> >
> >
> >>
> >> Thanks,
> >> Mickael
> >>
> >>
> >> On Wed, Apr 26, 2023 at 8:27 AM Radim Vansa <rva...@azul.com.invalid>
> >> wrote:
> >>> Hi all,
> >>>
> >>> I haven't seen many reactions to this proposal. Is there any general
> >>> policy regarding dependencies, or a prior decision that would give a
> >>> hint on this?
> >>>
> >>> Thanks!
> >>>
> >>> Radim
> >>>
> >>>
> >>> On 21. 04. 23 10:10, Radim Vansa wrote:
> >>>>
> >>>> Thank you,
> >>>>
> >>>> now to be tracked as KIP-921:
> >>>>
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-921%3A+OpenJDK+CRaC+support
> >>>>
> >>>>
> >>>>
> >>>> Radim
> >>>>
> >>>> On 20. 04. 23 15:26, Josep Prat wrote:
> >>>>> Hi Radim,
> >>>>> You should now have permissions to create a KIP.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> On Thu, Apr 20, 2023 at 2:22 PM Radim Vansa <rva...@azul.com.invalid>
> >>>>> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> upon filing a PR [1] with some initial support for OpenJDK CRaC [2][3]
> >>>>>> I was directed here to raise a KIP (I don't have the permissions in
> >>>>>> wiki/JIRA to create the KIP page yet, though).
> >>>>>>
> >>>>>> In a nutshell, CRaC intends to provide a way to checkpoint (snapshot)
> >>>>>> and persist a running Java application and later restore it, possibly
> >>>>>> on a different computer. This can be used to significantly speed up
> >>>>>> the boot process (from seconds or minutes to tens of milliseconds), or
> >>>>>> for live replication or migration of a warmed-up application. This is
> >>>>>> not entirely transparent to the application; the application can
> >>>>>> register for notification when this is happening, and sometimes has to
> >>>>>> assist with that to prevent unexpected state after restore, e.g. by
> >>>>>> closing network connections and files.
> >>>>>>
> >>>>>> CRaC is not integrated into the mainline JDK yet; a JEP is being
> >>>>>> prepared, and users are welcome to try out our builds. However, even
> >>>>>> when this gets into the JDK we can't expect users to jump onto the
> >>>>>> latest release immediately; therefore we provide a facade package
> >>>>>> org.crac [4] that delegates to the implementation if it is present in
> >>>>>> the running JDK, or provides a no-op implementation otherwise.
> >>>>>>
> >>>>>> With or without the implementation, the support for CRaC in the
> >>>>>> application should be designed to have a minimal impact on performance
> >>>>>> (few extra objects, some volatile reads...). On the other hand the
> >>>>>> checkpoint operation itself can be non-trivial in this matter.
> >>>>>> Therefore the main consideration should be about the maintenance
> >>>>>> costs - keeping a small JAR in dependencies and some extra code in
> >>>>>> networking and persistence.
> >>>>>>
> >>>>>> The support for CRaC does not have to be all-in for all components -
> >>>>>> maybe it does not make sense to snapshot a Broker. My PR was for Kafka
> >>>>>> Clients because the open network connections need to be handled in a
> >>>>>> web application (in my case I am enabling CRaC in Quarkus Superheros
> >>>>>> [5] demo). The PR does not handle all possible client-side uses; as I
> >>>>>> am not familiar with Kafka I follow the whack-a-mole strategy.
> >>>>>>
> >>>>>> It is possible that the C/R could be handled in a different layer,
> >>>>>> e.g. in Quarkus integration code. However our intent is to push the
> >>>>>> changes as low in the technology stack as possible, to provide the
> >>>>>> best fanout to users without duplicating maintenance efforts. Also
> >>>>>> having the support higher up can be fragile and break encapsulation.
> >>>>>>
> >>>>>> Thank you for your consideration, I hope that you'll appreciate our
> >>>>>> attempt to innovate the Java ecosystem.
> >>>>>>
> >>>>>> Radim Vansa
> >>>>>>
> >>>>>> PS: I'd appreciate if someone could give me the permissions on wiki
> >>>>>> to create a proper KIP! Username: rvansa (both Confluence and JIRA).
> >>>>>>
> >>>>>> [1] https://github.com/apache/kafka/pull/13619
> >>>>>>
> >>>>>> [2] https://wiki.openjdk.org/display/crac
> >>>>>>
> >>>>>> [3] https://github.com/openjdk/crac
> >>>>>>
> >>>>>> [4] https://github.com/CRaC/org.crac
> >>>>>>
> >>>>>> [5] https://quarkus.io/quarkus-workshops/super-heroes/
> >>>>>>
> >>>>>>
> >>>>> --
> >>>>>
> >>>>> *Josep Prat*
> >>>>> Open Source Engineering Director, *Aiven*
> >>>>> josep.p...@aiven.io   |   +491715557497
> >>>>> *Aiven Deutschland GmbH*
> >>>>> Alexanderufer 3-7, 10117 Berlin
> >>>>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >>>>> Amtsgericht Charlottenburg, HRB 209739 B
> >>>>>
>
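
For readers following the thread, the notification mechanism described in the original proposal (a component closes its network connections before a checkpoint and reopens them after restore) might look roughly like the sketch below. Everything here is hypothetical and self-contained: `CheckpointAware` is a local stand-in loosely modeled on the two callbacks of `org.crac.Resource` (the real interface also receives a context argument and may throw), and `FakeConnection` and `CheckpointRegistry` are illustrative names that appear in neither Kafka nor the PR. The notification order is a choice of this sketch, not a statement about CRaC or the KIP.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the checkpoint/restore callbacks; NOT the real org.crac API.
interface CheckpointAware {
    void beforeCheckpoint();
    void afterRestore();
}

// Hypothetical stand-in for a client's network layer.
class FakeConnection implements CheckpointAware {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    @Override
    public void beforeCheckpoint() {
        // A snapshot must not contain live sockets, so close before checkpointing.
        open = false;
    }

    @Override
    public void afterRestore() {
        // Re-establish the connection once the process has been restored.
        open = true;
    }
}

// Hypothetical registry playing the role of a global checkpoint context.
class CheckpointRegistry {
    private final List<CheckpointAware> resources = new ArrayList<>();

    void register(CheckpointAware resource) {
        resources.add(resource);
    }

    // Sketch choice: notify in reverse registration order before the checkpoint.
    void checkpoint() {
        for (int i = resources.size() - 1; i >= 0; i--) {
            resources.get(i).beforeCheckpoint();
        }
    }

    // Sketch choice: notify in registration order after the restore.
    void restore() {
        for (CheckpointAware resource : resources) {
            resource.afterRestore();
        }
    }
}
```

A plugin that keeps its own connection pool, as discussed in the third question, would register its own `CheckpointAware` instance in the same way, independently of the client.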
