Hi Dan,

Okay, so if you're looking for low latency, I'm guessing that you're using
a very low linger.ms in the producers? Also, what format are the records?
If they're already in a binary format like Protobuf or Avro, compression
may offer little benefit unless they're composed largely of strings.
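
Just for concreteness, a bare-bones producer set up along those lines might
look like the below (a sketch only - the broker address, serializers, and
byte[] values are my assumptions, not from your setup):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.ByteArraySerializer");
    props.put("linger.ms", "0");          // favour latency over batching
    props.put("compression.type", "lz4"); // or "snappy" / "none" per test run
    KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);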

With your small records, I'd suggest running some tests with your current
config under different compression settings - none, snappy, and lz4 (don't
bother with gzip unless that's all you have) - and checking the producer
metrics batch-size-avg and compression-rate-avg (exposed via JMX if you're
using the Java clients).
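
You can also read those metrics straight off the producer without going
through JMX - something like this rough sketch (assumes a producer instance
like the one above):

    import java.util.Map;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;

    for (Map.Entry<MetricName, ? extends Metric> e :
            producer.metrics().entrySet()) {
        String metric = e.getKey().name();
        if (metric.equals("batch-size-avg")
                || metric.equals("compression-rate-avg")) {
            System.out.println(metric + " = " + e.getValue().metricValue());
        }
    }

A compression-rate-avg near 1.0 means compression is barely shrinking your
batches, so you're paying CPU for little gain.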

You may wish to simply start with no compression, and then consider
enabling it if/when network bandwidth becomes a bottleneck.
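
And if you later decide to set compression at the topic level instead, it
can be changed on a live topic via the Admin API - e.g. this sketch
(adminProps and the topic name are placeholders):

    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    // assumes an enclosing method declared "throws Exception"
    try (Admin admin = Admin.create(adminProps)) {
        ConfigResource topic =
            new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
        AlterConfigOp setLz4 = new AlterConfigOp(
            new ConfigEntry("compression.type", "lz4"),
            AlterConfigOp.OpType.SET);
        admin.incrementalAlterConfigs(Map.of(topic, List.of(setLz4)))
            .all().get();
    }

That only affects batches written from then on - existing segments aren't
rewritten.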

Regards,

Liam

On Tue, 15 Mar 2022 at 17:05, Dan Hill <quietgol...@gmail.com> wrote:

> Thanks, Liam!
>
> I have a mixture of Kafka record sizes.  10% of the records are large
> (>100 KB) and 90% are smaller than 1 KB.  I'm working on a streaming
> analytics solution that streams impressions, user actions, and serving
> info and combines them together.  End-to-end latency is more important
> than storage size.
>
>
> On Mon, Mar 14, 2022 at 3:27 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
>
> > Hi Dan,
> >
> > Decompression generally only happens in the broker if the topic has a
> > particular compression algorithm set, and the producer is using a
> > different one - then the broker will decompress records from the
> > producer, then recompress them using the topic's configured algorithm.
> > (The LogCleaner will also decompress then recompress records when
> > compacting compressed topics).
> >
> > The consumer decompresses compressed record batches it receives.
> >
> > In my opinion, using topic compression instead of producer compression
> > would only make sense if the overhead of the few extra CPU cycles
> > compression uses was not tolerable for the producing app. In all of my
> > use cases, network throughput becomes a bottleneck long before producer
> > compression CPU cost does.
> >
> > For your "if X, do Y" formulation I'd say - if your producer is sending
> > tiny batches, do some analysis of compressed vs. uncompressed size for
> > your given compression algorithm - you may find that compression
> > overhead increases batch size for tiny batches.
> >
> > If you're sending a large amount of data, do tune your batching and use
> > compression to reduce data being sent over the wire.
> >
> > If you can tell us more about your problem domain, there might be more
> > advice that's applicable :)
> >
> > Cheers,
> >
> > Liam Clarke-Hutchinson
> >
> > On Tue, 15 Mar 2022 at 10:05, Dan Hill <quietgol...@gmail.com> wrote:
> >
> > > Hi.  I looked around for advice about Kafka compression.  I've seen
> > > mixed and conflicting advice.
> > >
> > > Is there any sorta "if X, do Y" type of documentation around Kafka
> > > compression?
> > >
> > > Any advice?  Any good posts to read that talk about this trade off?
> > >
> > > *Detailed comments*
> > > I tried looking for producer vs topic compression.  I didn't find much.
> > > Some of the information I see is back from 2011 (which I'm guessing is
> > > pretty stale).
> > >
> > > I can guess some potential benefits but I don't know if they are
> > > actually real.  I've also seen some sites claim certain trade offs
> > > but it's unclear if they're true.
> > >
> > > It looks like I can modify an existing topic's compression.  I don't
> > > know if that actually works.  I'd assume it'd just impact data going
> > > forward.
> > >
> > > I've seen multiple sites say that decompression happens in the broker
> > > and multiple that say it happens in the consumer.
> > >
> >
>
