As a data point from one system: while Snappy is significantly faster
than gzip, for our system it wasn't enough to offset the
decompress/re-compress on the broker. No matter how fast the compression
scheme is, doing that work on the broker will always be slower than not
doing it at all.

We went the route the original poster is discussing and compressed in our
application on the producer side, allowing the broker to do its in-place
offset management without decompressing. The trade-off is that we have to
do the batching ourselves and our consuming applications have to handle
the decompression, but the result was roughly a 3x increase in throughput
from that change alone. Our application-level messages are around 3KB each.
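
For concreteness, the producer side looks roughly like this. This is a
simplified sketch, not our production code: it assumes the Java producer
client and snappy-java, frames each message with a 4-byte length prefix,
and the class and method names are just illustrative.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.xerial.snappy.Snappy;

public class CompressedBatchProducer {

    private final KafkaProducer<byte[], byte[]> producer;

    public CompressedBatchProducer(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Compression is handled in application code, so the producer
        // sends the pre-compressed payload as-is.
        props.put("compression.type", "none");
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        this.producer = new KafkaProducer<>(props);
    }

    // Length-prefix each application message, Snappy-compress the whole
    // batch, and send it as a single Kafka message.
    public void sendBatch(String topic, List<byte[]> messages) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (byte[] msg : messages) {
            out.writeInt(msg.length);  // 4-byte length prefix
            out.write(msg);
        }
        out.flush();
        byte[] payload = Snappy.compress(buf.toByteArray());
        producer.send(new ProducerRecord<byte[], byte[]>(topic, payload));
    }
}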

I think anyone trying to get the most throughput may have good luck going
this route. The decompress/re-compress on the broker has a significant
cost, regardless of the compression scheme used.
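
On the consuming side, the inverse is just as small: uncompress the
payload and split it back apart on the length prefixes. Again a sketch
under the same assumptions (snappy-java, 4-byte length framing,
illustrative names):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.xerial.snappy.Snappy;

public class BatchDecoder {

    // Turn one Kafka message value back into the original application
    // messages.
    public static List<byte[]> decode(byte[] messageValue) throws IOException {
        byte[] raw = Snappy.uncompress(messageValue);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        List<byte[]> messages = new ArrayList<byte[]>();
        while (in.available() > 0) {
            byte[] msg = new byte[in.readInt()];
            in.readFully(msg);
            messages.add(msg);
        }
        return messages;
    }
}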

-Chris



On Thu, Jun 26, 2014 at 3:23 PM, Neha Narkhede <neha.narkh...@gmail.com>
wrote:

> Using a single Kafka message to contain an application snapshot has the
> upside of getting atomicity for free. Either the snapshot will be written
> as a whole to Kafka or not. This is poor man's transactionality. Care needs
> to be taken to ensure that the message is not too large since that might
> cause memory consumption problems on the server or the consumers.
>
> As far as compression overhead is concerned, have you tried running Snappy?
> Snappy's performance is good enough to offset the decompression-compression
> overhead on the server.
>
> Thanks,
> Neha
>
>
> On Thu, Jun 26, 2014 at 12:42 PM, Bert Corderman <bertc...@gmail.com>
> wrote:
>
> > We are in the process of engineering a system that will be using kafka.
> > The legacy system is using the local file system and  a database as the
> > queue.  In terms of scale we process about 35 billion events per day
> > contained in 15 million files.
> >
> >
> >
> > I am looking for feedback on a design decision we are discussing
> >
> >
> >
> > In our current system we depend heavily on compression as a performance
> > optimization.  In Kafka the use of compression has some overhead, as the
> > broker needs to decompress the data to assign offsets and re-compress it
> > (explained in detail here:
> > http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/
> > )
> >
> >
> >
> > We are thinking about NOT using Kafka compression, but rather compressing
> > multiple rows in our code. For example, let's say we wanted to send data
> > in batches of 5,000 rows.  Using Kafka compression, we would use a batch
> > size of 5,000 rows and enable compression. The other option is using a
> > batch size of 1 in Kafka BUT, in our code, taking 5,000 messages,
> > compressing them, and then sending to Kafka with the Kafka compression
> > setting of none.
> >
> >
> >
> > Is this a pattern others have used?
> >
> >
> >
> > Regardless of compression, I am curious whether others are using a single
> > Kafka message to contain multiple messages from an application standpoint.
> >
> >
> > Bert
> >
>
