And to be clear, if uncompressed messages come in, they remain uncompressed in the broker, correct?
Correct. Currently, only the broker knows the offsets for a partition, so it is the right place to assign them. Even if the producer sends metadata, the broker still needs to decompress the data to get a handle on each individual message and assign it a logical offset. One of the JIRAs discussing this is here: https://issues.apache.org/jira/browse/KAFKA-595

Thanks,
Neha

On Tue, Oct 8, 2013 at 9:24 AM, Jason Rosenberg <j...@squareup.com> wrote:

> Ah,
>
> I think I remember a previous discussion on a way to avoid the double compression...
>
> So would it be possible for the producer to send metadata with a compressed batch that includes the logical offset info for the batch? Can this info just be a count of how many messages are in the batch?
>
> And to be clear, if uncompressed messages come in, they remain uncompressed in the broker, correct?
>
> Jason
>
> On Tue, Oct 8, 2013 at 10:20 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>
> > The broker only recompresses the messages if the producer sent them compressed. And it has to recompress to assign the logical offsets to the individual messages inside the compressed message.
> >
> > Thanks,
> > Neha
> >
> > On Oct 7, 2013 11:36 PM, "Jason Rosenberg" <j...@squareup.com> wrote:
> >
> > > Neha,
> > >
> > > Does the broker store messages compressed, even if the producer doesn't compress them when sending them to the broker?
> > >
> > > Why does the broker recompress message batches? Does it not have enough info from the producer request to know the number of messages in the batch?
> > >
> > > Jason
> > >
> > > On Mon, Oct 7, 2013 at 12:40 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > >
> > > > > the total message size of the batch should be less than message.max.bytes, or is that for each individual message?
> > > >
> > > > The former is correct.
> > > >
> > > > > When you batch, I am assuming that the producer sends some sort of flag that this is a batch, and then the broker will split up those messages into individual messages and store them in the log, correct?
> > > >
> > > > The broker splits the compressed message into individual messages to assign the logical offsets to every message, but the data is ultimately stored compressed and is delivered to the consumer in the compressed format.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > > On Mon, Oct 7, 2013 at 9:26 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> > > >
> > > > > When you batch things on the producer, say you batch 1000 messages or by time, whatever, the total message size of the batch should be less than message.max.bytes, or is that for each individual message?
> > > > >
> > > > > When you batch, I am assuming that the producer sends some sort of flag that this is a batch, and then the broker will split up those messages into individual messages and store them in the log, correct?
> > > > >
> > > > > On Mon, Oct 7, 2013 at 12:21 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > > > >
> > > > > > The message size limit is imposed on the compressed message. To answer your question about the effect of large messages: they cause memory pressure on the Kafka brokers as well as on the consumer, since we recompress messages on the broker and decompress messages on the consumer.
> > > > > >
> > > > > > I'm not so sure that large messages will have a hit on latency, since compressing a few large messages vs. compressing lots of small messages with the same content should not be any slower.
> > > > > > But you want to be careful with the batch size, since you don't want the compressed message to exceed the message size limit.
> > > > > >
> > > > > > Thanks,
> > > > > > Neha
> > > > > >
> > > > > > On Mon, Oct 7, 2013 at 9:10 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> > > > > >
> > > > > > > I see, so one thing to consider is that if I have 20 KB messages, I shouldn't batch too many together, as that will increase latency and the memory footprint on the producer side.
> > > > > > >
> > > > > > > On Mon, Oct 7, 2013 at 11:55 AM, Jun Rao <jun...@gmail.com> wrote:
> > > > > > >
> > > > > > > > At LinkedIn, our message size can be tens of KB. This is mostly because we batch a set of messages and send them as a single compressed message.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Mon, Oct 7, 2013 at 7:44 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > When people use message queues, the message size is usually pretty small.
> > > > > > > > >
> > > > > > > > > I want to know who out there is using Kafka with larger payload sizes.
> > > > > > > > >
> > > > > > > > > In the configuration, the maximum message size is set to 1 megabyte by default (message.max.bytes=1000000).
> > > > > > > > >
> > > > > > > > > My message sizes will probably be around 20-50 KB, but to me that is large for a message payload, so I'm wondering what effects that will have with Kafka.
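The broker-side flow described in this thread can be illustrated with a toy model. This is a minimal sketch under loudly stated assumptions: it uses Python's `zlib` in place of Kafka's real compression codecs, a made-up length-prefixed framing instead of Kafka's actual message format, and hypothetical function names (`encode_batch`, `decode_batch`, `broker_append`) that are not Kafka APIs. It shows the two points Neha makes: the `message.max.bytes` limit is checked against the compressed batch, and the broker has to decompress the batch to get a handle on each inner message before it can stamp a logical offset on it and recompress.

```python
import struct
import zlib

# Toy stand-in for the broker setting discussed in the thread.
MESSAGE_MAX_BYTES = 1000000

# Hypothetical per-message framing: 8-byte signed offset + 4-byte payload length.
# This is NOT Kafka's real on-disk or wire format.
HEADER = struct.Struct(">qI")


def encode_batch(payloads):
    """Producer side (toy): frame each payload and compress the whole batch.

    Offsets are written as -1 placeholders because only the broker knows
    the partition's next offset.
    """
    raw = b"".join(HEADER.pack(-1, len(p)) + p for p in payloads)
    return zlib.compress(raw)


def decode_batch(blob):
    """Decompress a batch and return a list of (offset, payload) tuples."""
    raw = zlib.decompress(blob)
    entries, pos = [], 0
    while pos < len(raw):
        offset, length = HEADER.unpack_from(raw, pos)
        pos += HEADER.size
        entries.append((offset, raw[pos:pos + length]))
        pos += length
    return entries


def broker_append(blob, next_offset, max_bytes=MESSAGE_MAX_BYTES):
    """Broker side (toy): check the limit against the compressed batch, then
    decompress, assign sequential logical offsets, and recompress.

    Returns (recompressed_blob, next_offset_after_this_batch).
    """
    if len(blob) > max_bytes:
        raise ValueError(f"compressed batch is {len(blob)} bytes; limit is {max_bytes}")
    # Decompress to get a handle on each inner message so its offset can be stamped.
    entries = decode_batch(blob)
    raw = b"".join(
        HEADER.pack(next_offset + i, len(payload)) + payload
        for i, (_, payload) in enumerate(entries)
    )
    # The result stays compressed, mirroring how the data is stored and served.
    return zlib.compress(raw), next_offset + len(entries)


if __name__ == "__main__":
    batch = encode_batch([b"a" * 20_000 for _ in range(3)])
    stored, nxt = broker_append(batch, next_offset=100)
    print([off for off, _ in decode_batch(stored)], nxt)  # [100, 101, 102] 103
```

The model only covers the compressed path; per the thread, uncompressed messages skip the decompress/recompress step. Because the size check applies to the compressed bytes, how many 20-50 KB messages fit in one batch depends on how well they compress.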