Thanks, Liam! I have a mixture of Kafka record sizes: 10% are large (>100 KB) and 90% are smaller than 1 KB. I'm working on a streaming analytics solution that streams impressions, user actions, and serving info and joins them together. End-to-end latency is more important to me than storage size.
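As a quick sanity check on your tiny-batch overhead point, I ran something like the snippet below. To be clear, zlib is just a stand-in for whichever codec we actually pick (lz4/zstd/etc.), and the payloads are made-up approximations of my record shapes, not real data:

```python
import json
import zlib

# Rough check: compressed vs. uncompressed size for the two record shapes
# I described (90% tiny, 10% large). zlib stands in for the real codec.
tiny = b'{"user":"u1","action":"click"}'  # a tiny (~30 B) action record
large = json.dumps({"impression_ids": list(range(5000))}).encode()  # >10 KB

for name, payload in [("tiny", tiny), ("large", large)]:
    compressed = zlib.compress(payload, 6)
    ratio = len(compressed) / len(payload)
    print(f"{name}: {len(payload)} B -> {len(compressed)} B (ratio {ratio:.2f})")
```

For the large payload the ratio is well under 1, but for the tiny record the fixed per-message overhead eats most (or all) of the savings, which matches what you said about tiny batches.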
On Mon, Mar 14, 2022 at 3:27 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
> Hi Dan,
>
> Decompression generally only happens in the broker if the topic has a
> particular compression algorithm set and the producer is using a different
> one - then the broker decompresses records from the producer and
> recompresses them using the topic's configured algorithm. (The LogCleaner
> will also decompress and recompress records when compacting compressed
> topics.)
>
> The consumer decompresses the compressed record batches it receives.
>
> In my opinion, using topic compression instead of producer compression
> only makes sense if the few extra CPU cycles compression uses are not
> tolerable for the producing app. In all of my use cases, network
> throughput becomes a bottleneck long before producer compression CPU
> cost does.
>
> For your "if X, do Y" formulation I'd say: if your producer is sending
> tiny batches, do some analysis of compressed vs. uncompressed size for
> your given compression algorithm - you may find that compression overhead
> increases batch size for tiny batches.
>
> If you're sending a large amount of data, do tune your batching and use
> compression to reduce the data sent over the wire.
>
> If you can tell us more about your problem domain, there might be more
> advice that's applicable :)
>
> Cheers,
>
> Liam Clarke-Hutchinson
>
> On Tue, 15 Mar 2022 at 10:05, Dan Hill <quietgol...@gmail.com> wrote:
>
> > Hi. I looked around for advice about Kafka compression and have seen
> > mixed and conflicting advice.
> >
> > Is there any sort of "if X, do Y" documentation around Kafka
> > compression?
> >
> > Any advice? Any good posts that discuss this trade-off?
> >
> > *Detailed comments*
> > I tried looking for producer vs. topic compression and didn't find much.
> > Some of the information I found is from 2011 (which I'm guessing is
> > pretty stale).
> >
> > I can guess some potential benefits, but I don't know if they are
> > actually real. I've also seen some sites claim certain trade-offs, but
> > it's unclear if they're true.
> >
> > It looks like I can modify an existing topic's compression setting. I
> > don't know if that actually works; I'd assume it would only affect data
> > going forward.
> >
> > I've seen multiple sites say that decompression happens in the broker,
> > and multiple that say it happens in the consumer.
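PS - re: my earlier question about modifying an existing topic's compression, this is roughly what I was planning to test with (the topic name and broker address are placeholders for my setup):

```shell
# Switch an existing topic to lz4 compression; I'd expect this to only
# affect newly produced data, not segments already on disk.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name impressions \
  --add-config compression.type=lz4

# Verify the topic-level override took effect.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --describe --entity-type topics --entity-name impressions
```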