In trying to better understand compression, I came across the following:

http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/


“in Kafka 0.8, messages for a partition are served by the leader broker.
The leader assigns these unique logical offsets to every message it appends
to its log. Now, if the data is compressed, the leader has to decompress
the data in order to assign offsets to the messages inside the compressed
message. So the leader decompresses data, assigns offsets, compresses it
again and then appends the re-compressed data to disk”
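
If I'm reading that correctly, the cycle on the leader is conceptually something like the sketch below. To be clear, this is not Kafka's actual broker code; the Message class, the decode step, and the plain GZIP streams are just stand-ins I made up to illustrate the decompress / assign-offsets / recompress steps the post describes.

// Conceptual sketch only -- not Kafka's broker implementation.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class LeaderAppendSketch {
    static class Message {
        long offset;            // assigned by the leader, not the producer
        final byte[] payload;
        Message(byte[] payload) { this.payload = payload; }
    }

    // Pretend the producer sent one compressed blob containing a whole batch.
    static long appendBatch(byte[] compressedBatch, long nextOffset,
                            ByteArrayOutputStream log) throws IOException {
        // 1. decompress the batch the producer sent
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressedBatch));
        List<Message> messages = decode(in);     // hypothetical wire-format decoding

        // 2. assign a unique logical offset to every message in the batch
        for (Message m : messages) {
            m.offset = nextOffset++;
        }

        // 3. recompress the batch and append it to the log as one unit
        GZIPOutputStream out = new GZIPOutputStream(log);
        for (Message m : messages) {
            out.write(m.payload);                // real code would re-encode offset + payload
        }
        out.finish();
        return nextOffset;
    }

    static List<Message> decode(GZIPInputStream in) throws IOException {
        // stand-in: treat each byte as its own "message" just so the sketch runs
        List<Message> msgs = new ArrayList<Message>();
        int b;
        while ((b = in.read()) != -1) {
            msgs.add(new Message(new byte[] { (byte) b }));
        }
        return msgs;
    }
}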


I am assuming that when the data is re-compressed on the broker, the same rows are batched together.  For example, say I am using a batch size of 400 on the producer; these messages would then be stored compressed on disk in batches of 400.  Does this imply that consumers need to set the same batch size so that their fetch requests align with the stored batches?  For example, if the consumer is set to use batches of 100 and the producer used 400, would the consumer then read 400 messages for each batch of 100 it wants …only to go back and request many of the same rows on the next fetch?
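
For concreteness, this is roughly what I mean by "a batch size of 400 from the producer" (a sketch against the 0.8 Java producer API; the broker address and topic name are just placeholders):

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class BatchProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092");               // placeholder broker
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");                             // batching applies to the async producer
        props.put("compression.codec", "snappy");                        // or "gzip"
        props.put("batch.num.messages", "400");                          // the "batch size of 400" in my question

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("my-topic", "some message"));
        producer.close();
    }
}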


Bert
