Hey Radim,

One of the reasons for the slowdown is the preparation of upcoming releases (the community is currently in code freeze / release-blocker resolution mode) and the preparation for Kafka Summit next week. I would suggest giving folks another 2-3 weeks to chime in. I will personally look at this KIP in the last week of May.

--
Divij Vaidya

On Thu, May 11, 2023 at 1:34 PM Radim Vansa <rva...@azul.com.invalid> wrote:

> Hello all,
>
> it seems that this KIP did not spark much interest; I'm not sure whether
> people just don't care or whether there are objections to the proposal.
> What should the next step be? I don't think it has been discussed enough
> to proceed with voting.
>
> Cheers,
>
> Radim
>
> On 27. 04. 23 8:39, Radim Vansa wrote:
> > Thank you for those questions. As I've mentioned, my knowledge of Kafka
> > is quite limited, so these are the things that need careful thinking!
> > Comments inline.
> >
> > On 26. 04. 23 16:28, Mickael Maison wrote:
> >> Hi Radim,
> >>
> >> Thanks for the KIP! CRaC is an interesting project and it could be a
> >> useful feature in Kafka clients.
> >>
> >> The KIP is pretty vague in terms of the expected behavior of clients
> >> when checkpointing and restoring. For example:
> >>
> >> 1. A consumer may have pre-fetched records in memory. When it is
> >> checkpointed, its group will rebalance and another consumer will
> >> consume the same records. When the initial consumer is restored, will
> >> it process its pre-fetched records and basically reprocess records
> >> already handled by other consumers?
> >
> > How would the broker (?) know what records were really consumed? I think
> > that some form of the Two Generals' Problem must be involved here.
> >
> > The checkpoint should keep as much of the application untouched as it
> > can. Here, I would expect that the prefetched records would be
> > consumed after the restore operation as if nothing happened. I can
> > imagine this could cause some trouble if the data is dependent on the
> > 'external' world, e.g. other members of the cluster.
> > But I wouldn't break the general guarantees Kafka provides if we can
> > avoid it. We certainly have the option to do the checkpoint more
> > gracefully, deregistering from the group (the checkpoint is effectively
> > blocked by the notification handler).
> >
> > If we're talking about using CRaC for boot speedup, this is not that
> > important: when the app is about to be checkpointed it will likely stop
> > processing data anyway. For other use cases (e.g. live migration) it
> > might matter.
> >
> >> 2. Producers may have records in flight or in the producer buffer when
> >> they are checkpointed. How do you propose to handle these cases?
> >
> > If there's something in flight, we can wait for the acks. Alternatively,
> > if the receiver guards against double receipt using unique IDs/sequence
> > numbers, we could resend it after restore. As for the data in the
> > buffer, IMO that can wait until restore.
> >
> >> 3. Clients may have loaded plugins such as serializers. These plugins
> >> may establish network connections too. How are these expected to
> >> automatically reconnect when the application is restored?
> >
> > If there's an independent pool of connections, it's up to the plugin
> > author to support CRaC; I doubt there's anything that the generic code
> > could do. Also, it's likely that the plugins won't need any extension to
> > the SPI; they would register their handlers independently (if ordering
> > matters, there are ways to prioritize one resource over another).
> >
> > Cheers!
> >
> > Radim Vansa
> >
> >> Thanks,
> >> Mickael
> >>
> >> On Wed, Apr 26, 2023 at 8:27 AM Radim Vansa <rva...@azul.com.invalid>
> >> wrote:
> >>> Hi all,
> >>>
> >>> I haven't seen many reactions to this proposal. Is there any general
> >>> policy regarding dependencies, or a prior decision that would give a
> >>> hint on this?
> >>>
> >>> Thanks!
> >>>
> >>> Radim
> >>>
> >>> On 21. 04. 23 10:10, Radim Vansa wrote:
> >>>> Thank you,
> >>>>
> >>>> now to be tracked as KIP-921:
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-921%3A+OpenJDK+CRaC+support
> >>>>
> >>>> Radim
> >>>>
> >>>> On 20. 04. 23 15:26, Josep Prat wrote:
> >>>>> Hi Radim,
> >>>>> You should now have permissions to create a KIP.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> On Thu, Apr 20, 2023 at 2:22 PM Radim Vansa <rva...@azul.com.invalid>
> >>>>> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> upon filing a PR [1] with some initial support for OpenJDK CRaC [2][3]
> >>>>>> I was directed here to raise a KIP (I don't have the permissions in
> >>>>>> wiki/JIRA to create the KIP page yet, though).
> >>>>>>
> >>>>>> In a nutshell, CRaC intends to provide a way to checkpoint (snapshot)
> >>>>>> and persist a running Java application and later restore it, possibly
> >>>>>> on a different computer. This can be used to significantly speed up
> >>>>>> the boot process (from seconds or minutes to tens of milliseconds),
> >>>>>> or for live replication or migration of the warmed-up application.
> >>>>>> This is not entirely transparent to the application; the application
> >>>>>> can register for notification when this is happening, and sometimes
> >>>>>> has to assist with that to prevent unexpected state after restore,
> >>>>>> e.g. close network connections and files.
> >>>>>>
> >>>>>> CRaC is not integrated into the mainline JDK yet; a JEP is being
> >>>>>> prepared, and users are welcome to try out our builds.
> >>>>>> However, even when this gets into the JDK, we can't expect users to
> >>>>>> jump onto the latest release immediately; therefore we provide a
> >>>>>> facade package org.crac [4] that delegates to the implementation if
> >>>>>> it is present in the running JDK, or provides a no-op implementation
> >>>>>> otherwise.
> >>>>>>
> >>>>>> With or without the implementation, the support for CRaC in the
> >>>>>> application should be designed to have a minimal impact on
> >>>>>> performance (a few extra objects, some volatile reads...). On the
> >>>>>> other hand, the checkpoint operation itself can be non-trivial in
> >>>>>> this respect. Therefore the main consideration should be the
> >>>>>> maintenance costs: keeping a small JAR in dependencies and some extra
> >>>>>> code in networking and persistence.
> >>>>>>
> >>>>>> The support for CRaC does not have to be all-in for all components;
> >>>>>> maybe it does not make sense to snapshot a Broker. My PR was for
> >>>>>> Kafka Clients because the open network connections need to be handled
> >>>>>> in a web application (in my case I am enabling CRaC in the Quarkus
> >>>>>> Super-Heroes [5] demo). The PR does not handle all possible
> >>>>>> client-side uses; as I am not familiar with Kafka, I follow the
> >>>>>> whack-a-mole strategy.
> >>>>>>
> >>>>>> It is possible that the C/R could be handled in a different layer,
> >>>>>> e.g. in Quarkus integration code. However, our intent is to push the
> >>>>>> changes as low in the technology stack as possible, to provide the
> >>>>>> best fanout to users without duplicating maintenance efforts. Also,
> >>>>>> having the support higher up can be fragile and break encapsulation.
> >>>>>>
> >>>>>> Thank you for your consideration. I hope that you'll appreciate our
> >>>>>> attempt to innovate the Java ecosystem.
> >>>>>>
> >>>>>> Radim Vansa
> >>>>>>
> >>>>>> PS: I'd appreciate it if someone could give me the permissions on the
> >>>>>> wiki to create a proper KIP! Username: rvansa (both Confluence and
> >>>>>> JIRA).
> >>>>>>
> >>>>>> [1] https://github.com/apache/kafka/pull/13619
> >>>>>> [2] https://wiki.openjdk.org/display/crac
> >>>>>> [3] https://github.com/openjdk/crac
> >>>>>> [4] https://github.com/CRaC/org.crac
> >>>>>> [5] https://quarkus.io/quarkus-workshops/super-heroes/
> >>>>>>
> >>>>> --
> >>>>> *Josep Prat*
> >>>>> Open Source Engineering Director, *Aiven*
> >>>>> josep.p...@aiven.io
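
For readers following the thread, here is a minimal sketch of the handler pattern discussed above: a client registers for checkpoint/restore notifications, waits for acks on in-flight records and closes its connection before the checkpoint, then reconnects and sends the buffered records after restore. Everything in it is an assumption made for illustration. `CheckpointListener`, `CheckpointRegistry` and `ToyClient` are hypothetical names, not the org.crac facade [4] and not the code in the PR [1], and the "acks" are simulated in memory so that the sketch runs without any dependency.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface; a stand-in for illustration, NOT the org.crac API. */
interface CheckpointListener {
    void beforeCheckpoint();
    void afterRestore();
}

/** Hypothetical registry that simulates one checkpoint followed by one restore. */
class CheckpointRegistry {
    private final List<CheckpointListener> listeners = new ArrayList<>();

    void register(CheckpointListener listener) {
        listeners.add(listener);
    }

    void simulateCheckpointAndRestore() {
        for (CheckpointListener l : listeners) {
            l.beforeCheckpoint();
        }
        // A real implementation would snapshot the process here and resume later.
        for (CheckpointListener l : listeners) {
            l.afterRestore();
        }
    }
}

/** Toy producer-like client; the connection and the acks are simulated in memory. */
class ToyClient implements CheckpointListener {
    private boolean connected = true;
    private final List<String> buffer = new ArrayList<>();   // enqueued, not yet sent
    private final List<String> inFlight = new ArrayList<>(); // sent, not yet acked
    private final List<String> acked = new ArrayList<>();    // acknowledged

    void enqueue(String record) {
        buffer.add(record);
    }

    void sendBuffered() {
        if (!connected) {
            throw new IllegalStateException("not connected");
        }
        inFlight.addAll(buffer);
        buffer.clear();
    }

    // Simulated: treats every in-flight record as acknowledged.
    private void awaitAcks() {
        acked.addAll(inFlight);
        inFlight.clear();
    }

    @Override
    public void beforeCheckpoint() {
        awaitAcks();       // wait for acks on in-flight records
        connected = false; // no open connection may survive into the snapshot
    }

    @Override
    public void afterRestore() {
        connected = true;  // re-establish the (simulated) connection
        sendBuffered();    // records left in the buffer are sent only now
    }

    boolean isConnected() { return connected; }
    int bufferedCount() { return buffer.size(); }
    int inFlightCount() { return inFlight.size(); }
    int ackedCount() { return acked.size(); }
}

class CheckpointSketch {
    public static void main(String[] args) {
        CheckpointRegistry registry = new CheckpointRegistry();
        ToyClient client = new ToyClient();
        registry.register(client);

        client.enqueue("a");
        client.sendBuffered(); // "a" is now in flight
        client.enqueue("b");   // "b" stays in the buffer

        registry.simulateCheckpointAndRestore();
        System.out.println("acked=" + client.ackedCount()
                + " inFlight=" + client.inFlightCount()
                + " buffered=" + client.bufferedCount());
    }
}
```

The sketch covers only the "wait for acks, keep the buffer" option from the producer discussion. The alternative mentioned in the thread, resending after restore when the receiver deduplicates by sequence number, would keep the in-flight records across the checkpoint instead of draining them.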