In trying to better understand compression, I came across the following:
http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/

"in Kafka 0.8, messages for a partition are served by the leader broker. The leader assigns these unique logical offsets to every message it appends to its log. Now, if the data is compressed, the leader has to decompress the data in order to assign offsets to the messages inside the compressed message. So the leader decompresses data, assigns offsets, compresses it again and then appends the re-compressed data to disk"

I am assuming that when the data is re-compressed on the broker, the same rows are batched together. For example, if I use a batch size of 400 on the producer, those messages would be stored compressed on disk in batches of 400.

Does this imply that consumers need to set the same batch size to ensure their requests align with the stored batch size? For example, if the consumer is set to use batches of 100 and the producer used 400, would the consumer then read 400 messages for each batch of 100 it wants, only to go back and request many of the same rows on the next fetch?

Bert
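P.S. For concreteness, here is roughly how I am setting up the producer side with the 0.8 Java producer API. The broker address, topic name, and payloads are just placeholders for my setup; the batch.num.messages value of 400 is the "batch size of 400" I mention above.

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class CompressedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker list for my cluster
        props.put("metadata.broker.list", "broker1:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Async mode so messages are batched before being sent
        props.put("producer.type", "async");
        // Compress each batch before it goes over the wire (gzip or snappy)
        props.put("compression.codec", "snappy");
        // Number of messages batched (and compressed) together
        props.put("batch.num.messages", "400");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

        // Placeholder topic and messages
        for (int i = 0; i < 2000; i++) {
            producer.send(new KeyedMessage<String, String>("my-topic", "message-" + i));
        }
        producer.close();
    }
}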