We're using protos but there are still a bunch of custom fields where
clients specify redundant strings.

My local test shows a 75% reduction in size with either zstd or gzip.  I
care most about Kafka storage costs right now.
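
A rough way to reproduce that kind of local measurement is sketched below. It
is only an illustration: the record layout and field names are made up, plain
byte strings stand in for the real protos, and gzip from the Python standard
library stands in for whichever codec the producer would use. The ratio it
prints depends entirely on the test data, so it says nothing about real
topics.

```python
import gzip
import random


def make_records(n, seed=0):
    """Build n hypothetical records that repeat a few client-supplied strings."""
    rng = random.Random(seed)
    platforms = [b"ios-app", b"android-app", b"web-desktop"]
    records = []
    for i in range(n):
        records.append(
            b"platform=" + rng.choice(platforms)
            + b";event=impression;user_id=" + str(rng.randrange(10**9)).encode()
            + b";seq=" + str(i).encode()
        )
    return records


def compression_ratio(records, level=6):
    """Return compressed_size / raw_size for the records compressed as one batch."""
    raw = b"".join(records)
    return len(gzip.compress(raw, compresslevel=level)) / len(raw)


ratio = compression_ratio(make_records(1000))
print(f"compressed/raw = {ratio:.2f}")
```

Running the same function over a sample of real serialized records would give
a number that can be compared against the compression-ratio producer metric.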

On Tue, Mar 15, 2022 at 2:25 PM Liam Clarke-Hutchinson <lclar...@redhat.com>
wrote:

> Hi Dan,
>
> Okay, so if you're looking for low latency, I'm guessing that you're using
> a very low linger.ms in the producers? Also, what format are the records?
> If they're already in a binary format like Protobuf or Avro, unless they're
> composed largely of strings, compression may offer little benefit.
>
> With your small records, I'd suggest running some tests with your current
> config with different compression settings - none, snappy, lz4, (don't
> bother with gzip unless that's all you have) and checking producer metrics
> (available via JMX if you're using the Java clients) for avg-batch-size and
> compression-ratio.
>
> You may just wish to start with no compression, and then consider moving to
> it if/when network bandwidth becomes a bottleneck.
>
> Regards,
>
> Liam
>
> On Tue, 15 Mar 2022 at 17:05, Dan Hill <quietgol...@gmail.com> wrote:
>
> > Thanks, Liam!
> >
> > I have a mixture of Kafka record sizes.  10% are large (>100 KB) and 90%
> > of the records are smaller than 1 KB.  I'm working on a streaming
> > analytics solution that streams impressions, user actions and serving
> > info and combines them together.  End-to-end latency is more important
> > than storage size.
> >
> >
> > On Mon, Mar 14, 2022 at 3:27 PM Liam Clarke-Hutchinson <
> > lclar...@redhat.com>
> > wrote:
> >
> > > Hi Dan,
> > >
> > > Decompression generally only happens in the broker if the topic has a
> > > particular compression algorithm set, and the producer is using a
> > > different one - then the broker will decompress records from the
> > > producer, then recompress them using the topic's configured algorithm.
> > > (The LogCleaner will also decompress then recompress records when
> > > compacting compressed topics).
> > >
> > > The consumer decompresses compressed record batches it receives.
> > >
> > > In my opinion, using topic compression instead of producer compression
> > > would only make sense if the few extra CPU cycles that compression
> > > uses were not tolerable for the producing app. In all of my use cases,
> > > network throughput becomes a bottleneck long before producer
> > > compression CPU cost does.
> > >
> > > For your "if X, do Y" formulation I'd say - if your producer is sending
> > > tiny batches, do some analysis of compressed vs. uncompressed size for
> > > your given compression algorithm - you may find that compression
> > > overhead increases batch size for tiny batches.
> > >
> > > If you're sending a large amount of data, do tune your batching and use
> > > compression to reduce data being sent over the wire.
> > >
> > > If you can tell us more about your problem domain, there might be
> > > more advice that's applicable :)
> > >
> > > Cheers,
> > >
> > > Liam Clarke-Hutchinson
> > >
> > > On Tue, 15 Mar 2022 at 10:05, Dan Hill <quietgol...@gmail.com> wrote:
> > >
> > > > Hi.  I looked around for advice about Kafka compression.  I've seen
> > > > mixed and conflicting advice.
> > > >
> > > > Is there any sorta "if X, do Y" type of documentation around Kafka
> > > > compression?
> > > >
> > > > Any advice?  Any good posts to read that talk about this trade off?
> > > >
> > > > *Detailed comments*
> > > > I tried looking for producer vs topic compression.  I didn't find
> > > > much.  Some of the information I see is back from 2011 (which I'm
> > > > guessing is pretty stale).
> > > >
> > > > I can guess some potential benefits but I don't know if they are
> > > > actually real.  I've also seen some sites claim certain trade offs but
> > > > it's unclear if they're true.
> > > >
> > > > It looks like I can modify an existing topic's compression.  I don't
> > > > know if that actually works.  I'd assume it'd just impact data going
> > > > forward.
> > > >
> > > > I've seen multiple sites say that decompression happens in the broker
> > > > and multiple that say it happens in the consumer.
> > > >
> > >
> >
>
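
On the tiny-batch caveat quoted above, the fixed overhead of a codec is easy to
see in isolation. The sketch below is an assumption-laden illustration, not a
measurement of Kafka itself: the JSON-like payloads are made up, gzip from the
Python standard library is used as the example codec, and Kafka's own record
batch framing is not modelled at all.

```python
import gzip


def gzip_sizes(payload):
    """Return (raw_size, gzip_size) in bytes for a payload."""
    return len(payload), len(gzip.compress(payload))


# A single tiny record versus 500 similar records compressed together.
tiny = b'{"id":1,"ok":true}'
batch = b"".join(b'{"id":%d,"ok":true}' % i for i in range(500))

for name, payload in (("tiny", tiny), ("batch", batch)):
    raw, packed = gzip_sizes(payload)
    print(f"{name}: raw={raw} gzip={packed}")
```

The gzip container alone adds a header and trailer of 18 bytes, so the single
tiny record comes out larger than it went in, while the 500-record payload
shrinks. That is the reason to check avg-batch-size alongside
compression-ratio before turning compression on.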
