Using a single Kafka message to contain an application snapshot has the
upside of getting atomicity for free: either the snapshot is written to
Kafka as a whole or not at all. This is poor man's transactionality. Care
needs to be taken that the message is not too large, since an oversized
message can cause memory consumption problems on the broker or the
consumers.
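
As an aside, packing and compressing a batch in application code and then
sending it as one uncompressed Kafka message looks roughly like this with
the Java producer (topic name, broker address and row format are made up
for illustration, and gzip is just one possible codec):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Properties;
    import java.util.zip.GZIPOutputStream;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BatchedSnapshotProducer {

        // Pack a batch of rows into one payload and compress it in the
        // application, so the broker never has to recompress anything.
        static byte[] compressRows(List<String> rows) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                for (String row : rows) {
                    gz.write(row.getBytes(StandardCharsets.UTF_8));
                    gz.write('\n');
                }
            }
            return buf.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.ByteArraySerializer");
            // Compression already happened above, so the producer sends plain bytes.
            props.put("compression.type", "none");

            try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
                List<String> batch = Arrays.asList("row1", "row2", "row3"); // 5,000 rows in practice
                byte[] payload = compressRows(batch);
                // The whole batch rides in a single message, so it is written atomically.
                producer.send(new ProducerRecord<>("events", payload));
            }
        }
    }

The consumer side then does the reverse: read one message, decompress it,
and split it back into individual rows.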

As far as compression overhead is concerned, have you tried Snappy?
Snappy's performance is good enough to offset the decompression and
re-compression overhead on the broker.
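
With the Java producer that is just a config change, something like the
following (broker address made up; snappy-java needs to be on the classpath):

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");
    // Compress batches with Snappy on the producer; the broker decompresses
    // to assign offsets and recompresses, which is the overhead in question.
    props.put("compression.type", "snappy");
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);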

Thanks,
Neha


On Thu, Jun 26, 2014 at 12:42 PM, Bert Corderman <bertc...@gmail.com> wrote:

> We are in the process of engineering a system that will be using kafka.
> The legacy system is using the local file system and  a database as the
> queue.  In terms of scale we process about 35 billion events per day
> contained in 15 million files.
>
>
>
> I am looking for feedback on a design decision we are discussing
>
>
>
> In our current system we depend heavily on compression as a performance
> optimization.  In Kafka the use of compression has some overhead, as the
> broker needs to decompress the data to assign offsets and re-compress it.
> (explained in detail here
>
> http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/
> )
>
>
>
> We are thinking about NOT using Kafka compression but rather compressing
> multiple rows in our code. For example, say we wanted to send data in
> batches of 5,000 rows.  Using Kafka compression we would use a batch size
> of 5,000 rows and enable compression. The other option is using a batch
> size of 1 in Kafka BUT in our code take 5,000 messages, compress them and
> then send to Kafka with the Kafka compression setting of none.
>
>
>
> Is this a pattern others have used?
>
>
>
> Regardless of compression, I am curious if others are using a single
> message in Kafka to contain multiple messages from an application
> standpoint.
>
>
> Bert
>
