Ah, I think I remember a previous discussion on a way to avoid the double compression....
So would it be possible for the producer to send metadata with a compressed batch that includes the logical offset info for the batch? Can this info just be a count of how many messages are in the batch? And to be clear, if uncompressed messages come in, they remain uncompressed in the broker, correct?

Jason

On Tue, Oct 8, 2013 at 10:20 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> The broker only recompresses the messages if the producer sent them
> compressed. And it has to recompress to assign the logical offsets to the
> individual messages inside the compressed message.
>
> Thanks,
> Neha
>
> On Oct 7, 2013 11:36 PM, "Jason Rosenberg" <j...@squareup.com> wrote:
>
> > Neha,
> >
> > Does the broker store messages compressed, even if the producer doesn't
> > compress them when sending them to the broker?
> >
> > Why does the broker re-compress message batches? Does it not have enough
> > info from the producer request to know the number of messages in the batch?
> >
> > Jason
> >
> > On Mon, Oct 7, 2013 at 12:40 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >
> > > the total message size of the batch should be less than
> > > message.max.bytes or is that for each individual message?
> > >
> > > The former is correct.
> > >
> > > When you batch, I am assuming that the producer sends some sort of flag
> > > that this is a batch, and then the broker will split up those messages
> > > into individual messages and store them in the log, correct?
> > >
> > > The broker splits the compressed message into individual messages to
> > > assign the logical offsets to every message, but the data is finally
> > > stored compressed and is delivered in the compressed format to the
> > > consumer.
> > >
> > > Thanks,
> > > Neha
> > >
> > > On Mon, Oct 7, 2013 at 9:26 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> > >
> > > > When you batch things on the producer, say you batch 1000 messages or
> > > > by time or whatever, the total message size of the batch should be less
> > > > than message.max.bytes, or is that for each individual message?
> > > >
> > > > When you batch, I am assuming that the producer sends some sort of flag
> > > > that this is a batch, and then the broker will split up those messages
> > > > into individual messages and store them in the log, correct?
> > > >
> > > > On Mon, Oct 7, 2013 at 12:21 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > > >
> > > > > The message size limit is imposed on the compressed message. To answer
> > > > > your question about the effect of large messages - they cause memory
> > > > > pressure on the Kafka brokers as well as on the consumer, since we
> > > > > re-compress messages on the broker and decompress messages on the
> > > > > consumer.
> > > > >
> > > > > I'm not so sure that large messages will have a hit on latency, since
> > > > > compressing a few large messages vs. compressing lots of small messages
> > > > > with the same content should not be any slower. But you want to be
> > > > > careful about the batch size, since you don't want the compressed
> > > > > message to exceed the message size limit.
> > > > >
> > > > > Thanks,
> > > > > Neha
> > > > >
> > > > > On Mon, Oct 7, 2013 at 9:10 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> > > > >
> > > > > > I see, so one thing to consider is that if I have 20 KB messages,
> > > > > > I shouldn't batch too many together, as that will increase latency
> > > > > > and the memory usage footprint on the producer side of things.
> > > > > >
> > > > > > On Mon, Oct 7, 2013 at 11:55 AM, Jun Rao <jun...@gmail.com> wrote:
> > > > > >
> > > > > > > At LinkedIn, our message size can be 10s of KB. This is mostly
> > > > > > > because we batch a set of messages and send them as a single
> > > > > > > compressed message.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Mon, Oct 7, 2013 at 7:44 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> > > > > > >
> > > > > > > > When people use message queues, the message size is usually
> > > > > > > > pretty small.
> > > > > > > >
> > > > > > > > I want to know who out there is using Kafka with larger payload
> > > > > > > > sizes?
> > > > > > > >
> > > > > > > > In the configuration, the maximum message size is set to 1
> > > > > > > > megabyte by default (message.max.bytes=1000000).
> > > > > > > >
> > > > > > > > My message sizes will probably be around 20-50 KB, but to me
> > > > > > > > that is large for a message payload, so I'm wondering what
> > > > > > > > effects that will have with Kafka.
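
For concreteness, here is a minimal sketch of the batching and compression settings discussed in this thread, written against the 0.8-era Java producer API. The broker address, topic name, and batch size are placeholders, and the property names should be checked against the producer config docs for your Kafka version.

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class BatchedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092");               // placeholder broker
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");                             // buffer messages and send them as batches
        props.put("compression.codec", "gzip");                          // the whole batch is compressed into one wrapper message
        props.put("batch.num.messages", "200");                          // illustrative cap so the compressed batch stays under message.max.bytes

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("my-topic", "a 20-50 KB payload"));
        producer.close();
    }
}

As Neha notes above, the broker's size check applies to the compressed wrapper message (message.max.bytes, 1000000 bytes by default), so it is the batch count times the typical compressed payload size that has to stay under that limit, not the size of each individual message.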