Hi Assane,

Pluggable compression codecs in the protocol seem likely to greatly fragment 
the ecosystem. I don't see a corresponding benefit that offsets this. 

As Greg says, people who want to use a closed source codec already can, by 
forking Kafka. Open source clients won't be able to talk to them over their 
proprietary protocol, but that would be true even if this proposal were 
adopted. (Or perhaps open source clients can fall back to a different protocol 
-- I don't think that changes the argument much).

People who want to use an open source codec really should be considering 
contributing that back to the community, so anything that makes that harder or 
less likely to happen seems like a negative.

best,
Colin


On Mon, Jun 17, 2024, at 11:35, Diop, Assane wrote:
> Hi Greg, 
> Thank you for your thoughtful response.  
> Our motivation is to enable new compression codecs and to allow quick 
> deployment of those codecs in a scalable way.
> Like most of the Kafka community, we would also like to prevent 
> fragmentation of the ecosystem as a result of introducing incompatible 
> technologies to Kafka. Our thesis is that large users of compression 
> such as cloud service providers will increasingly seek to minimize the 
> resources consumed by the exponential growth of data. These service 
> providers who control their own hardware to some extent could very well 
> fork Kafka for their own purposes without contributing those changes 
> back to Kafka. By enabling pluggable compression, we remove the need to 
> fork Kafka to implement a custom compressor, so this framework could 
> help minimize fragmentation in that case.
> As we’ve asserted before, compression is an ongoing topic of research 
> and is becoming more important as the volume of data increases.  
> Current compression algorithms are mostly based on Huffman tables, 
> which leads to similar behavior. These Huffman-based algorithms have been 
> iterated on for more than 20 years, and we anticipate that new 
> technologies will be required to improve compression beyond these 
> codecs. If we take the inclusion of zstd in Kafka as an example, 
> the time difference between the initial draft of KIP-110 (Jan 6, 2017) 
> [1] and the implementation of zstd in Kafka with v2.1.0 (Nov 20, 2018) 
> [2] [3] is almost two years.  
>
> Our team is currently working on proof-of-concept plugins, some of 
> which are proprietary and some accelerate existing codecs.  Although 
> acceleration of existing codecs could be possible in the short term, 
> Case D is still the most flexible and extensible for the long term. 
> Case D may be more complex than the alternatives you describe, but we 
> are committed to getting it right with help and feedback from the Kafka 
> community. 
> For example, on ensuring plugin compatibility, one of the important 
> points brought up in our discussions was how clients would 
> obtain the correct plugin version for a particular client language and 
> plugin. We’d like to see a process where registration of the 
> plugins with the Kafka cluster also facilitates distribution of the 
> correct plugin versions to the clients.  When registering a new plugin 
> with Kafka, the process would have the admin upload metadata and the 
> corresponding plugins for each language client into language-specific 
> internal topics. When a new client connects to the cluster, it could 
> query available plugins from an internal topic, then download the 
> correct plugin for itself from the correct topic holding binaries or 
> plugin code for that language. This tight coupling would ensure that 
> any client connecting to the plugin-enabled Kafka could either continue 
> to use existing codecs or obtain the required compression codec plugins 
> directly from the cluster they are interacting with. 
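> As a rough sketch of the client side (the topic name, record layout and 
> class below are our illustrations only, not part of the KIP itself), a 
> Java client could read the alias-to-implementation registry like this:
>
> import java.time.Duration;
> import java.util.Collections;
> import java.util.HashMap;
> import java.util.Map;
> import java.util.Properties;
> import org.apache.kafka.clients.consumer.ConsumerConfig;
> import org.apache.kafka.clients.consumer.ConsumerRecord;
> import org.apache.kafka.clients.consumer.ConsumerRecords;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.common.serialization.StringDeserializer;
>
> // Illustrative sketch only: the topic name and record layout are
> // assumptions, not the actual KIP design.
> public class PluginRegistryReader {
>     public static Map<String, String> readRegistry(String bootstrapServers) {
>         Properties props = new Properties();
>         props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
>         props.put(ConsumerConfig.GROUP_ID_CONFIG, "plugin-registry-reader");
>         props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
>         props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
>         props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
>
>         Map<String, String> aliasToClassName = new HashMap<>();
>         try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
>             // Hypothetical language-specific registry topic for Java clients.
>             consumer.subscribe(Collections.singletonList("__compression_plugins_java"));
>             // A real client would read to the end of the topic; one poll is enough for a sketch.
>             ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
>             for (ConsumerRecord<String, String> record : records) {
>                 aliasToClassName.put(record.key(), record.value()); // alias -> implementation class
>             }
>         }
>         return aliasToClassName;
>     }
> }
>
> The plugin binaries themselves would then come from the companion 
> language-specific topics described above.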
>
> [1]: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> [2]: https://archive.apache.org/dist/kafka/2.1.0/RELEASE_NOTES.html
> [3]: https://github.com/apache/kafka/releases/tag/2.1.0
>
>
> -----Original Message-----
> From: Greg Harris <greg.har...@aiven.io.INVALID> 
> Sent: Monday, May 13, 2024 1:23 PM
> To: dev@kafka.apache.org
> Subject: Re: DISCUSS KIP-984 Add pluggable compression interface to Kafka
>
> Hi Assane,
>
> Thank you for the further information about your motivation and 
> intended use-cases; that adds a lot of context.
>
>> Our motivation here is to accelerate compression with the use of hardware 
>> accelerators.
>
> This is a very broad statement, so let me break it down into cases, and 
> what I would recommend in each:
>
> Case A: Open source accelerators for supported compression codecs (e.g. zstd)
> 1. Try to add your accelerator to an existing upstream implementation 
> (e.g. zstd-jni), so that whenever that library is used, people benefit 
> from your accelerator.
> 2. Fork an existing implementation, and propose that the Kafka project 
> use your fork.
>
> Case B: Closed-source accelerators for supported compression codecs (e.g. zstd)
> 1. Fork an existing implementation, and structure your fork such that it 
> can be swapped out at runtime by operators that want a particular 
> accelerator.
> 2. Kafka can add a Java pluggable interface to the broker and clients 
> to pick among the accelerated and non-accelerated plugins, falling back 
> to non-accelerated "reference implementations" as necessary. This 
> wouldn't require protocol changes.
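> For illustration, such an interface might look something like the sketch 
> below (the names here are made up, not an existing Kafka API):
>
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.util.List;
>
> // Sketch only: a runtime-selectable provider for an already-supported codec.
> public interface CompressionProvider {
>     String codecName();       // e.g. "zstd"
>     boolean isAvailable();    // e.g. native library loaded, hardware present
>     OutputStream wrapForOutput(OutputStream out) throws IOException;
>     InputStream wrapForInput(InputStream in) throws IOException;
> }
>
> // The broker/client picks an accelerated provider when available and falls
> // back to the bundled reference implementation -- no wire-protocol change.
> class CompressionProviders {
>     static CompressionProvider select(List<CompressionProvider> candidates,
>                                       CompressionProvider reference) {
>         for (CompressionProvider candidate : candidates) {
>             if (candidate.isAvailable()) {
>                 return candidate;
>             }
>         }
>         return reference;
>     }
> }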
>
> Case C: Accelerators for unsupported open source compression codecs (e.g. brotli)
> 1. I think that these should be proposed as official codecs for Kafka to 
> support, and then the acceleration can be implemented as in Case A or B.
>
> Case D: Accelerators for unsupported closed-source compression codecs 
> These are codecs that would require a fully pluggable implementation, 
> and reserved bits in the binary protocol. They are also the codecs 
> which are most damaging to the ecosystem. If you have a specific 
> proprietary codec in mind please say so, otherwise I want to invoke the 
> YAGNI principle here.
>
> Thanks,
> Greg
>
>
>
>
>
> On Mon, May 13, 2024 at 11:22 AM Diop, Assane <assane.d...@intel.com> wrote:
>>
>> Hi Greg,
>>
>> Thank you for your thoughtful response. Resending this email to continue 
>> engagement on the KIP discussion.
>>
>> Our motivation here is to accelerate compression with the use of hardware 
>> accelerators.
>>
>> If the community prefers, we would be happy to contribute code to support 
>> compression accelerators, but we believe that introducing a pluggable 
>> compression framework is more scalable than enabling new compression 
>> algorithms in an ad hoc manner.
>>
>> A pluggable compression interface would enable hardware accelerators without 
>> requiring vendor-specific code in Kafka code base.
>>
>> We aim to ensure robustness by supporting all possible language-clients. In 
>> this latest iteration, the design provides a path to support other 
>> languages, where each client language has its own topic holding the plugin 
>> information for that language.
>>
>> The pluggable interface does not replace the built-in functionality; rather, 
>> it is an optional compression path seamlessly added for Kafka users who 
>> would like to use custom compression algorithms or simply accelerate current 
>> algorithms. In the latter case, a vendor providing acceleration for 
>> compression will need to support their plugins.
>>
>> As for your concerns, I appreciate you taking the time to respond. Let me 
>> address them as best I can:
>> 1) When an operator adds a plugin to a cluster, they must ensure that the 
>> compression algorithms for all the supported language-clients of that plugin 
>> are compatible. For the plugin to be installed, the language must support 
>> dynamic loading or linking of libraries, and these mechanisms exist in at 
>> least Java, C, Go and Python. Clients written in a language that does not 
>> support dynamic loading or linking can still use built-in codecs and coexist 
>> in a cluster where plugins were registered. This coexistence highlights that 
>> the use of plugins is an optional feature.
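>> For example, in Java the dynamic loading in question is just standard 
>> reflection; a minimal sketch (the plugin interface type is whatever the 
>> KIP would define, passed in as a parameter here):
>>
>> // Minimal sketch of loading a plugin class by name in Java.
>> public class PluginLoader {
>>     public static <T> T load(String className, Class<T> pluginType) throws Exception {
>>         Class<?> clazz = Class.forName(className); // resolve from the classpath / plugin path
>>         Object instance = clazz.getDeclaredConstructor().newInstance();
>>         return pluginType.cast(instance);          // fail fast if it is not a plugin
>>     }
>> }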
>>
>> 2) Plugin source code should come from a reputable developer, as is true of 
>> any dependency. Should an operator register a plugin, the plugin should 
>> have a path for support, including eventual deprecation of that plugin. If the 
>> community finds it useful, there could be an official Kafka repository, and 
>> we are open to discussing ways to provide governance of the plugin ecosystem.
>>
>> 3) We do not see this as a fork of the binary protocol, but rather an 
>> evolution of the protocol to provide additional flexibility for compression.
>> Once a plugin is registered, it is compatible with all the “flavors” of 
>> the plugin, which here means different minor versions of a codec. 
>> Compression algorithms typically follow semantic versioning, where v1.x is 
>> compatible with v1.y and v2.x is not necessarily compatible with v1.x.
>> If a plugin version breaks compatibility with an older version, then it 
>> should be treated as a new plugin with a new plugin alias.
>> In parallel to the plugin topic holding plugin information during 
>> registration, additional topics holding the plugin binaries can be published 
>> by the plugin admin tool during installation to ensure compatibility. We 
>> view this as improving performance at the cost of extra operator work.
>>
>> 4) We only require the operator to register and then install the plugins. 
>> During the registration process, the plugin admin tool takes in the plugin 
>> information (plugin alias and classname/library) and then internally 
>> assigns the pluginID. The operator is only responsible for providing the 
>> plugin alias and the className/library. The plugin admin tool is a new Java 
>> class in Tools that interacts with the operator to set up the plugins in the 
>> cluster. At this stage of the KIP, we have assumed a manual installation of 
>> the plugin. Installing here means deploying the plugin binary, making 
>> it ready to be dynamically loaded/linked when needed.
>> We are looking at an option for dynamic installation of the plugin which 
>> would require the operator to install the binary using the plugin admin 
>> tool. Using the same concept as plugin registration, the operator can 
>> install the plugin binary by publishing it to a topic using the plugin admin 
>> tool. Clients that register a plugin by consuming the plugin list would also 
>> consume the necessary binaries from the cluster.
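>> As a rough sketch of what that registration step could reduce to (the 
>> class name, topic name and record layout here are placeholders, not the 
>> final design):
>>
>> import java.util.Properties;
>> import org.apache.kafka.clients.producer.KafkaProducer;
>> import org.apache.kafka.clients.producer.ProducerConfig;
>> import org.apache.kafka.clients.producer.ProducerRecord;
>> import org.apache.kafka.common.serialization.StringSerializer;
>>
>> // Hypothetical sketch of the plugin admin tool's registration step.
>> public class PluginAdminTool {
>>     public static void register(String bootstrapServers, String alias, String className) {
>>         Properties props = new Properties();
>>         props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
>>         props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
>>         props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
>>         try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
>>             // The pluginID would be assigned internally; the operator only
>>             // supplies the alias and the implementation class/library name.
>>             producer.send(new ProducerRecord<>("__compression_plugins_java", alias, className));
>>             producer.flush();
>>         }
>>     }
>> }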
>>
>> 5) When a plugin is used, the set of built-in codecs is augmented by the set 
>> of plugins described in the plugin topic. The additional set of codecs is 
>> cluster-dependent, so, while a given batch of records stays within a 
>> cluster, they remain self-contained. If these batches are produced into 
>> another cluster, then the operator needs to either recompress data using 
>> builtins/available plugins or install plugins in the dependent cluster. In 
>> this scenario a consumer would decompress the data, and, if the mirrored 
>> data needs the same compression plugin, then the operator is required to 
>> register and install the plugins in the secondary cluster.
>> Our assertion is that the additional work required by an operator could be 
>> justified by improved performance.
>>
>> 6) There is a finite number of pluginIDs available, based on the number of 
>> bits used in the attributes field. If a developer or operator is experimenting 
>> with multiple plugins, they can unregister a plugin if they hit the 
>> limit. The number of attribute bits required to represent the pluginID is 
>> arbitrary, and we are open to community input here. Ultimately, with the 
>> ability to unregister a plugin, fewer bits could be used to represent the 
>> pluginID.
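>> To put rough numbers on it (the bit count and offset below are 
>> placeholders, not a decided layout), 4 attribute bits would allow 
>> 2^4 = 16 plugin IDs:
>>
>> // Hypothetical sizing example -- the real bit count and offset are open questions.
>> public class PluginIdMath {
>>     static final int PLUGIN_ID_BITS = 4;                   // placeholder value
>>     static final int MAX_PLUGIN_IDS = 1 << PLUGIN_ID_BITS; // 2^4 = 16 registrable IDs
>>     static final int PLUGIN_ID_MASK = MAX_PLUGIN_IDS - 1;  // 0x0F
>>
>>     // The bit offset within the int16 attributes field is an assumption for illustration.
>>     static int pluginIdFromAttributes(short attributes) {
>>         return (attributes >> 8) & PLUGIN_ID_MASK;
>>     }
>> }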
>>
>> 7) While plugins add some complexity to a Kafka deployment, that complexity 
>> is mostly the work of the operator to register and install the plugins. 
>> Additionally, this increased complexity is all upfront and out-of-band. We 
>> try to manage it by using existing Kafka mechanisms such as the Kafka plugin 
>> topic described earlier.
>>
>> We have discussed using a custom Serializer/Deserializer but, since Kafka's 
>> compression happens at the batch level, a custom 
>> Serializer/Deserializer would compress each message individually rather than 
>> compressing the whole batch. It seems only large records could benefit from this scheme. 
>> We are open to suggestions or clarification on this topic.
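>> To make the limitation concrete, a compressing Serializer would look 
>> roughly like the sketch below (gzip is only a stand-in codec here); it 
>> sees one record's value at a time, so it cannot exploit redundancy 
>> across a batch:
>>
>> import java.io.ByteArrayOutputStream;
>> import java.io.IOException;
>> import java.util.zip.GZIPOutputStream;
>> import org.apache.kafka.common.errors.SerializationException;
>> import org.apache.kafka.common.serialization.Serializer;
>>
>> // Sketch: per-record compression via a Serializer. Each value is compressed
>> // in isolation, unlike Kafka's batch-level compression in the batch format.
>> public class GzipValueSerializer implements Serializer<byte[]> {
>>     @Override
>>     public byte[] serialize(String topic, byte[] data) {
>>         if (data == null) {
>>             return null;
>>         }
>>         try (ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
>>             try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
>>                 gzip.write(data);
>>             }
>>             return buffer.toByteArray();
>>         } catch (IOException e) {
>>             throw new SerializationException("Failed to gzip record value", e);
>>         }
>>     }
>> }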
>> Again, thank you for sharing your concerns about balancing this proposal 
>> against the impact to the ecosystem. We think the additional performance 
>> that this could provide along with the improved flexibility to add or 
>> accelerate compression codecs outweighs the increased complexity for the 
>> operators.
>>
>> Assane
>>
>>
>> -----Original Message-----
>> From: Greg Harris <greg.har...@aiven.io.INVALID>
>> Sent: Wednesday, May 1, 2024 12:09 PM
>> To: dev@kafka.apache.org
>> Subject: Re: DISCUSS KIP-984 Add pluggable compression interface to 
>> Kafka
>>
>> Hi Assane,
>>
>> Thanks for the update. Unfortunately, I don't think that the design changes 
>> have solved all of the previous concerns, and I feel they have raised new ones.
>>
>> From my earlier email:
>> 1. The KIP has now included Python, but this feature is still 
>> disproportionately difficult for statically-linked languages to support.
>> 2. This is unaddressed.
>> 3. This is unaddressed.
>> 4. The KIP now includes a metadata topic that is used to persist a mapping 
>> from the binary ID to full class name, but requires the operator to manage 
>> this mapping.
>>
>> My new concerns are:
>> 5. It is not possible to interpret a single message without also 
>> reading from this additional metadata (messages are not 
>> self-contained).
>> 6. There are a finite number of pluggable IDs, and this finite number is 
>> baked into the protocol.
>> 6a. This is a problem with the existing binary protocol, but this is 
>> acceptable as the frequency that a new protocol is added is quite low, and 
>> is discussed with the community.
>> 6b. Someone experimenting with compression plugins could easily exhaust this 
>> limit in a single day, and the limit is exhausted for the lifetime of the 
>> cluster. This could be done accidentally or maliciously.
>> 6c. Consuming 4 of the remaining 8 reserved bits feels wasteful, compared to 
>> the benefit that the protocol is receiving from this feature.
>> 7. Implementing support for this feature would require distributing and 
>> caching the metadata, which is a significant increase in complexity compared 
>> to the current compression mechanisms.
>>
>> From your motivation section:
>> > Although compression is not a new problem, it has continued to be an 
>> > important research topic.
>> > The integration and testing of new compression algorithms into Kafka 
>> > currently requires significant code changes and rebuilding of the 
>> > distribution package for Kafka.
>>
>> I think it is completely appropriate for someone testing an experimental 
>> compression algorithm to temporarily fork Kafka, and then discard that fork 
>> and all of the compressed data when the experiment is over.
>> The project has to balance the experience of upstream developers (including 
>> compression researchers), ecosystem developers, and operators, and this 
>> proposal's cost to ecosystem developers and operators is too high to justify 
>> the benefits.
>>
>> As an alternative, have you considered implementing a custom 
>> Serializer/Deserializer that could implement this feature, and just leave 
>> the Kafka compression off?
>> I think an "Add Brotli Compression" KIP is definitely worth pursuing, if 
>> that is the compression algorithm you have in mind currently.
>>
>> Thanks,
>> Greg
>>
>>
>> On Mon, Apr 29, 2024 at 3:10 PM Diop, Assane <assane.d...@intel.com> wrote:
>> >
>> > Hi Divij, Greg and Luke,
>> > I have updated the KIP for Kafka pluggable compression, addressing the 
>> > concerns with the original design.
>> > I believe this new design takes into account many of the concerns and 
>> > resolves them. I would like to receive feedback as I am working on 
>> > getting this KIP accepted. I am not targeting a specific release, but 
>> > accepting the concept will help move the work in this direction.
>> >
>> > The link to the KIP is here:
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-984%3A+Add+pluggable+compression+interface+to+Kafka
>> >
>> > Assane
>> >
>> > -----Original Message-----
>> > From: Diop, Assane <assane.d...@intel.com>
>> > Sent: Wednesday, April 24, 2024 4:58 PM
>> > To: dev@kafka.apache.org
>> > Subject: RE:DISCUSS KIP-984 Add pluggable compression interface to 
>> > Kafka
>> >
>> > Hi,
>> >
>> > I would like to bring back attention to 
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-984%3A+Add+pluggable+compression+interface+to+Kafka
>> > I have made significant changes to the design to accommodate the concerns 
>> > and would like some feedback from the community to get the discussion going.
>> >
>> > Assane
>> >
>> > -----Original Message-----
>> > From: Diop, Assane
>> > Sent: Friday, March 1, 2024 4:45 PM
>> > To: dev@kafka.apache.org
>> > Subject: RE: DISCUSS KIP-984 Add pluggable compression interface to 
>> > Kafka
>> >
>> > Hi Luke,
>> >
>> > The proposal doesn't preclude supporting multiple clients but each client 
>> > would need an implementation of the pluggable architecture.
>> > At the very least we envision other clients such as librdkafka and 
>> > kafka-python could be supported by C implementations.
>> >
>> > We agree with community feedback regarding the need to support these 
>> > clients, and we are looking at alternative approaches for brokers and 
>> > clients to coordinate the plugin.
>> >
>> > One way to do this coordination is for each client to have a configuration 
>> > mapping of the plugin name to its implementation.
>> >
>> > Assane
>> >
>> >
>> >
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Luke Chen <show...@gmail.com>
>> > Sent: Monday, February 26, 2024 7:50 PM
>> > To: dev@kafka.apache.org
>> > Subject: Re: DISCUSS KIP-984 Add pluggable compression interface to 
>> > Kafka
>> >
>> > Hi Assane,
>> >
>> > I share the same concern as Greg, which is that the KIP is not 
>> > Kafka-ecosystem friendly.
>> > It would also create a tight coupling between the Kafka client and broker: 
>> > once you use the pluggable compression interface, the producer must be a 
>> > Java client.
>> > This seems to go against Kafka's original design.
>> >
>> > If the proposal can support all kinds of clients, that would be great.
>> >
>> > Thanks.
>> > Luke
>> >
>> > On Tue, Feb 27, 2024 at 7:44 AM Diop, Assane <assane.d...@intel.com> wrote:
>> >
>> > > Hi Greg,
>> > >
>> > > Thanks for taking the time to give some feedback. It was very insightful.
>> > >
>> > > I have some answers:
>> > >
>> > > 1. The current proposal is Java-centric. We want to figure out the 
>> > > approach with Java first and then later incorporate other languages. We 
>> > > will get there.
>> > >
>> > > 2. The question of where the plugins would live is an important one.
>> > > I would like to get community engagement on where a plugin would 
>> > > live.
>> > >    Officially supported plugins could be part of Kafka and others 
>> > > could live in a plugin repository. Is there currently a way to 
>> > > store plugins in Kafka and load them into the classpath? If such a 
>> > > space could be allowed, it would provide a standard way of 
>> > > installing officially supported plugins.
>> > >    In OpenSearch, for example, there is a plugin utility that takes 
>> > > the jar and installs it across the cluster; privileges can be granted by 
>> > > an admin.
>> > > Such a utility could be implemented in Kafka.
>> > >
>> > > 3. There are many ways to look at this; we could change the message 
>> > > format that uses the pluggable interface to be, for example, v3 and 
>> > > synchronize against that.
>> > >    In order to use the pluggable codec, you would have to be at 
>> > > message version 3, for example.
>> > >
>> > > 4. Passing the class name as metadata is one way to have the 
>> > > producer tell the broker which plugin to use. However, 
>> > > there could be other implementations
>> > >    where everything needed is configured on the topic using 
>> > > topic-level compression. In that case, for example, a rule could be 
>> > > that in order to use the
>> > >    pluggable interface, you must use topic-level compression.
>> > >
>> > >  I would like to have your valuable input on this!
>> > >
>> > > Thanks in advance,
>> > > Assane
>> > >
>> > > -----Original Message-----
>> > > From: Greg Harris <greg.har...@aiven.io.INVALID>
>> > > Sent: Wednesday, February 14, 2024 2:36 PM
>> > > To: dev@kafka.apache.org
>> > > Subject: Re: DISCUSS KIP-984 Add pluggable compression interface 
>> > > to Kafka
>> > >
>> > > Hi Assane,
>> > >
>> > > Thanks for the KIP!
>> > > Looking back, it appears that the project has only ever added 
>> > > compression types twice: lz4 in 2014 and zstd in 2018, and perhaps 
>> > > Kafka has fallen behind the state-of-the-art compression algorithms.
>> > > Thanks for working to fix that!
>> > >
>> > > I do have some concerns:
>> > >
>> > > 1. I think this is a very "java centric" proposal, and doesn't 
>> > > take non-java clients into enough consideration. librdkafka [1] is 
>> > > a great example of an implementation of the Kafka protocol which 
>> > > doesn't have the same classloading and plugin infrastructure that 
>> > > Java has, which would make implementing this feature much more difficult.
>> > >
>> > > 2. By making the interface pluggable, it puts the burden of 
>> > > maintaining individual compression codecs onto external 
>> > > developers, who may not be willing to maintain a codec for the 
>> > > service-lifetime of such a codec.
>> > > An individual developer can easily implement a plugin to allow 
>> > > them to use a cutting-edge compression algorithm without 
>> > > consulting the Kafka project, but as soon as data is compressed 
>> > > using that algorithm, they are on the hook to support that plugin 
>> > > going forward by the organizations using their implementation.
>> > > Part of the collective benefits of the Kafka project is to ensure 
>> > > the ongoing maintenance of such codecs, and provide a long 
>> > > deprecation window when a codec reaches EOL. I think the Kafka 
>> > > project is well-equipped to evaluate the maturity and properties 
>> > > of compression codecs and then maintain them going forward.
>> > >
>> > > 3. Also by making the interface pluggable, it reduces the scope of 
>> > > individual compression codecs. No longer is there a single lineage 
>> > > of Kafka protocols, where vN+1 of a protocol supports a codec that 
>> > > vN does not. Now there will be "flavors" of the protocol, and 
>> > > operators will need to ensure that their servers and their clients 
>> > > support the same "flavors" or else encounter errors.
>> > > This is the sort of protocol forking which can be dangerous to the 
>> > > Kafka community going forward. If there is a single lineage of 
>> > > codecs such that the upstream Kafka vX.Y supports codec Z, it is 
>> > > much simpler for other implementations to check and specify "Kafka 
>> > > vX.Y compatible", than it is to check & specify "Kafka vX.Y & Z 
>> > > compatible".
>> > >
>> > > 4. The Java class namespace is distributed, as anyone can name 
>> > > their class anything. It achieves this by being very verbose, with 
>> > > long fully-qualified names for classes. This is in conflict with a 
>> > > binary protocol, where it is desirable for the overhead to be as small 
>> > > as possible.
>> > > This may incentivise developers to keep their class names short, 
>> > > which also makes conflict more likely. If you have the option of 
>> > > naming your class "B" instead of 
>> > > "org.example.blah.BrotlCompressionCodecVersionOne",
>> > > and meaningfully save a flat 47 bytes on every request, 
>> > > somebody/everybody is going to do that.
>> > > This now increases the likelihood for conflict, as perhaps two 
>> > > developers want the same short name. Yes there are 52 one-letter 
>> > > class names, but to ensure that no two codecs ever conflict 
>> > > requires global coordination that a pluggable interface tries to avoid.
>> > > Operators then take on the burden of ensuring that the "B" codec 
>> > > on the other machine is indeed the "B" codec that they have 
>> > > installed on their machines, or else encounter errors.
>> > >
>> > > I think that having contributors propose that Kafka support their 
>> > > favorite compression type in order to get assigned a globally 
>> > > unique number is much healthier for the ecosystem than making this 
>> > > a pluggable interface and leaving the namespace to be wrangled by 
>> > > operators and client libraries.
>> > >
>> > > Thanks,
>> > > Greg
>> > >
>> > > [1] https://github.com/confluentinc/librdkafka
>> > > [2]
>> > > https://github.com/apache/kafka/blob/e8c70fce26626ed2ab90f2728a45f6e55e907ec1/clients/src/main/java/org/apache/kafka/common/record/DefaultRecordBatch.java#L130
>> > >
>> > > On Wed, Feb 14, 2024 at 12:59 PM Diop, Assane 
>> > > <assane.d...@intel.com>
>> > > wrote:
>> > > >
>> > > > Hi Divij, Mickael,
>> > > > Since Mickael's KIP-390 was accepted, I did not want to respond in 
>> > > > that
>> > > thread so as not to confuse the work.
>> > > >
>> > > > As mentioned in the thread, KIP-390 and KIP-984 do not 
>> > > > supersede
>> > > each other. However, the scope of KIP-984 goes beyond the scope of 
>> > > KIP-390.
>> > > The pluggable compression interface is added as a new codec. The other 
>> > > codecs already implemented are not affected by this change. I 
>> > > believe these 2 KIPs are not the same but they complement each other.
>> > > >
>> > > > As I stated before, the motivation is to give the users the 
>> > > > ability to
>> > > use different compressors without needing future changes in Kafka.
>> > > > Kafka currently supports zstd, snappy, gzip and lz4. However, 
>> > > > other
>> > > opensource compression projects like the Brotli algorithm are also 
>> > > gaining traction. For example the HTTP servers Apache and nginx 
>> > > offer Brotli compression as an option. With a pluggable interface, 
>> > > any Kafka developer could integrate and test Brotli with Kafka 
>> > > simply by writing a plugin. This same motivation can be applied to 
>> > > any other compression algorithm including hardware accelerated 
>> > > compression. There are hardware companies including Intel and AMD that 
>> > > are working on accelerating compression.
>> > > >
>> > > > The main change in itself is an update to the message format to 
>> > > > allow
>> > > metadata to be passed to the broker indicating which plugin to use. 
>> > > This only happens if the user selects the pluggable codec.
>> > > The metadata adds an additional 52 bytes to the message format.
>> > > >
>> > > > Broker recompression is taken care of when the producer and brokers 
>> > > > have
>> > > different codecs, because it is just another codec being added as 
>> > > far as Kafka is concerned.
>> > > > I have added more information to 
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-984%3A+Add+pluggable+compression+interface+to+Kafka
>> > > > I am ready for a PR if this KIP gets accepted.
>> > > >
>> > > > Assane
>> > > >
>> > > > -----Original Message-----
>> > > > From: Diop, Assane <assane.d...@intel.com>
>> > > > Sent: Wednesday, January 31, 2024 10:24 AM
>> > > > To: dev@kafka.apache.org
>> > > > Subject: RE: DISCUSS KIP-984 Add pluggable compression interface 
>> > > > to Kafka
>> > > >
>> > > > Hi Divij,
>> > > > Thank you for your response!
>> > > >
>> > > > Although compression is not a new problem, it has continued to 
>> > > > be an
>> > > important research topic.
>> > > > The integration and testing of new compression algorithms into 
>> > > > Kafka
>> > > currently requires significant code changes and rebuilding of the 
>> > > distribution package for Kafka.
>> > > > This KIP will allow for any compression algorithm to be 
>> > > > seamlessly
>> > > integrated into Kafka by writing a plugin that would bind into the 
>> > > wrapForInput and wrapForOutput methods in Kafka.
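>> > > > As an illustration (the class below is only a sketch, not the 
>> > > > actual KIP API), such a plugin would essentially supply those two 
>> > > > stream wrappers:
>> > > >
>> > > > import java.io.InputStream;
>> > > > import java.io.OutputStream;
>> > > > import java.util.zip.DeflaterOutputStream;
>> > > > import java.util.zip.InflaterInputStream;
>> > > >
>> > > > // Sketch only: deflate stands in for any custom or accelerated codec.
>> > > > public class DeflateCompressionPlugin {
>> > > >     public OutputStream wrapForOutput(OutputStream out) {
>> > > >         return new DeflaterOutputStream(out);
>> > > >     }
>> > > >     public InputStream wrapForInput(InputStream in) {
>> > > >         return new InflaterInputStream(in);
>> > > >     }
>> > > > }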
>> > > >
>> > > > As you mentioned, Kafka currently supports zstd, snappy, gzip and lz4.
>> > > However, other opensource compression projects like the Brotli 
>> > > algorithm are also gaining traction. For example the HTTP servers 
>> > > Apache and nginx offer Brotli compression as an option. With a 
>> > > pluggable interface, any Kafka developer could integrate and test 
>> > > Brotli with Kafka simply by writing a plugin. This same motivation 
>> > > can be applied to any other compression algorithm including 
>> > > hardware accelerated compression. There are hardware companies 
>> > > > including Intel and AMD that are working on accelerating compression.
>> > > >
>> > > > This KIP would certainly complement the current
>> > > https://issues.apache.org/jira/browse/KAFKA-7632 by adding even 
>> > > more flexibility for the users.
>> > > > A plugin could be tailored to arbitrary datasets in response to 
>> > > > a user's
>> > > specific resource requirements.
>> > > >
>> > > > For reference, other open source projects have already started on or
>> > > implemented this type of plugin technology, such as:
>> > > >         1. Cassandra, which has implemented the same concept of a
>> > > pluggable interface.
>> > > >         2. OpenSearch, which is also working on enabling the same type 
>> > > > of
>> > > plugin framework.
>> > > >
>> > > > With respect to message recompression, the plugin interface 
>> > > > would handle
>> > > this use case on the broker side similar to the current 
>> > > recompression process.
>> > > >
>> > > > Assane
>> > > >
>> > > > -----Original Message-----
>> > > > From: Divij Vaidya <divijvaidy...@gmail.com>
>> > > > Sent: Friday, December 22, 2023 2:27 AM
>> > > > To: dev@kafka.apache.org
>> > > > Subject: Re: DISCUSS KIP-984 Add pluggable compression interface 
>> > > > to Kafka
>> > > >
>> > > > Thank you for writing the KIP Assane.
>> > > >
>> > > > In general, exposing a "pluggable" interface is not a decision 
>> > > > made
>> > > lightly because it limits our ability to remove / change that 
>> > > interface in future.
>> > > > Any future changes to the interface will have to remain 
>> > > > compatible with
>> > > existing plugins which limits the flexibility of changes we can 
>> > > make inside Kafka. Hence, we need a strong motivation for adding a 
>> > > pluggable interface.
>> > > >
>> > > > 1\ May I ask the motivation for this KIP? Are the current 
>> > > > compression codecs (zstd, gzip, lz4, snappy) not sufficient for your 
>> > > > use case?
>> > > > Would providing fine-grained compression options as proposed in
>> > > > https://issues.apache.org/jira/browse/KAFKA-7632 and 
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level
>> > > > address your use case?
>> > > > 2\ "This option impacts the following processes" -> This should 
>> > > > also
>> > > include the decompression and compression that occur during 
>> > > message version transformation, i.e. when a client sends a message with 
>> > > V1 and the broker expects V2, we convert the message and recompress it.
>> > > >
>> > > > --
>> > > > Divij Vaidya
>> > > >
>> > > >
>> > > >
>> > > > On Mon, Dec 18, 2023 at 7:22 PM Diop, Assane 
>> > > > <assane.d...@intel.com>
>> > > wrote:
>> > > >
>> > > > > I would like to bring some attention to this KIP. We have 
>> > > > > added an interface to the compression code that allows anyone 
>> > > > > to build their own compression plugin and integrate it easily back 
>> > > > > into Kafka.
>> > > > >
>> > > > > Assane
>> > > > >
>> > > > > -----Original Message-----
>> > > > > From: Diop, Assane <assane.d...@intel.com>
>> > > > > Sent: Monday, October 2, 2023 9:27 AM
>> > > > > To: dev@kafka.apache.org
>> > > > > Subject: DISCUSS KIP-984 Add pluggable compression interface 
>> > > > > to Kafka
>> > > > >
>> > > > >
>> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-984%3A+Add+pluggable+compression+interface+to+Kafka
>> > > > >
>> > >
