And to be clear, if uncompressed messages come in, they remain uncompressed
in the broker, correct?

Correct

Currently, only the broker has knowledge of the offsets for a partition and
hence is the right place to assign the offsets. Even if the producer sends
metadata, the broker still needs to decompress the data to get a handle on
the individual messages and assign the logical offsets.

One of the JIRAs discussing this is here -
https://issues.apache.org/jira/browse/KAFKA-595
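
For illustration, here is a rough sketch of that step, using a gzip'd,
newline-delimited batch as a stand-in for a real compressed message set
(this is not Kafka's broker code; the class and helper structure are
invented for the example):

    import java.io.*;
    import java.util.*;
    import java.util.zip.*;

    public class OffsetAssignmentSketch {
        // Inflates a compressed batch to see the individual messages,
        // assigns sequential logical offsets starting at logEndOffset
        // (a value only the broker knows), and re-compresses the result
        // for storage.
        public static byte[] assignOffsets(byte[] compressedBatch, long logEndOffset)
                throws IOException {
            List<String> messages = new ArrayList<String>();
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(new ByteArrayInputStream(compressedBatch))));
            String line;
            while ((line = reader.readLine()) != null) {
                messages.add(line);              // invisible until decompressed
            }
            reader.close();

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(new GZIPOutputStream(out));
            long offset = logEndOffset;
            for (String message : messages) {
                writer.write(offset++ + ":" + message + "\n");  // one offset per message
            }
            writer.close();                      // stored compressed again
            return out.toByteArray();
        }
    }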

Thanks,
Neha

On Tue, Oct 8, 2013 at 9:24 AM, Jason Rosenberg <j...@squareup.com> wrote:

> Ah,
>
> I think I remember a previous discussion on a way to avoid the double
> compression....
>
> So would it be possible for the producer to send metadata with a compressed
> batch that includes the logical offset info for the batch?  Can this info
> just be a count of how many messages are in the batch?
>
> And to be clear, if uncompressed messages come in, they remain uncompressed
> in the broker, correct?
>
> Jason
>
>
> On Tue, Oct 8, 2013 at 10:20 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>
> > The broker only recompresses the messages if the producer sent them
> > compressed. And it has to decompress and then recompress in order to
> > assign the logical offsets to the individual messages inside the
> > compressed message.
> >
> > Thanks,
> > Neha
> > On Oct 7, 2013 11:36 PM, "Jason Rosenberg" <j...@squareup.com> wrote:
> >
> > > Neha,
> > >
> > > Does the broker store messages compressed, even if the producer doesn't
> > > compress them when sending them to the broker?
> > >
> > > Why does the broker re-compress message batches?  Does it not have
> > > enough info from the producer request to know the number of messages
> > > in the batch?
> > >
> > > Jason
> > >
> > >
> > > On Mon, Oct 7, 2013 at 12:40 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > >
> > > > the total message size of the batch should be less than
> > > > message.max.bytes or is that for each individual message?
> > > >
> > > > The former is correct.
> > > >
> > > > When you batch, I am assuming that the producer sends some sort of
> > > > flag that this is a batch, and then the broker will split up those
> > > > messages to individual messages and store them in the log correct?
> > > >
> > > > The broker splits the compressed message into individual messages to
> > > > assign the logical offsets to every message, but the data is finally
> > > > stored compressed and is delivered in the compressed format to the
> > > > consumer.
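
For the consumer this is transparent: the message set arrives compressed and
the iterator inflates it, so the application just sees individual messages
with their logical offsets. A minimal sketch against the 0.8-era high-level
consumer API (connection strings, group id, and topic are placeholders;
property names may differ in other versions):

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class CompressedConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181");   // placeholder
            props.put("group.id", "compression-test");          // placeholder
            // Should be at least the broker's message.max.bytes, otherwise a
            // large compressed batch cannot be fetched at all.
            props.put("fetch.message.max.bytes", "1000000");

            ConsumerConnector connector =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            Map<String, Integer> topicCount = Collections.singletonMap("my-topic", 1);
            KafkaStream<byte[], byte[]> stream =
                    connector.createMessageStreams(topicCount).get("my-topic").get(0);

            // Decompression happens inside the iterator; each message comes
            // back with the logical offset the broker assigned.
            ConsumerIterator<byte[], byte[]> it = stream.iterator();
            while (it.hasNext()) {
                System.out.println("offset " + it.next().offset());
            }
        }
    }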
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > > On Mon, Oct 7, 2013 at 9:26 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> > > >
> > > > > When you batch things on the producer, say you batch 1000 messages
> > > > > or by time whatever, the total message size of the batch should be
> > > > > less than message.max.bytes or is that for each individual message?
> > > > >
> > > > > When you batch, I am assuming that the producer sends some sort of
> > > > > flag that this is a batch, and then the broker will split up those
> > > > > messages to individual messages and store them in the log correct?
> > > > >
> > > > >
> > > > > On Mon, Oct 7, 2013 at 12:21 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> > > > >
> > > > > > The message size limit is imposed on the compressed message. To
> > > > > > answer your question about the effect of large messages - they
> > > > > > cause memory pressure on the Kafka brokers as well as on the
> > > > > > consumer since we re-compress messages on the broker and
> > > > > > decompress messages on the consumer.
> > > > > >
> > > > > > I'm not so sure that large messages will have a hit on latency,
> > > > > > since compressing a few large messages vs compressing lots of
> > > > > > small messages with the same content should not be any slower.
> > > > > > But you want to be careful with the batch size since you don't
> > > > > > want the compressed message to exceed the message size limit.
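
As a rough sketch of that producer-side tuning against the 0.8-era producer
API (broker address, topic, and the chosen values are placeholders; property
names and defaults may differ in other versions):

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class CompressedBatchProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092");          // placeholder
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "async");      // batch on the client side
            props.put("compression.codec", "gzip");   // batch sent as one compressed message
            // Keep the batch small enough that the compressed message stays
            // under the broker's message.max.bytes.
            props.put("batch.num.messages", "100");

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("my-topic", "payload"));
            producer.close();
        }
    }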
> > > > > >
> > > > > > Thanks,
> > > > > > Neha
> > > > > >
> > > > > >
> > > > > > On Mon, Oct 7, 2013 at 9:10 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> > > > > >
> > > > > > > I see, so one thing to consider is that if I have 20 KB
> > > > > > > messages, I shouldn't batch too many together, as that will
> > > > > > > increase latency and the memory usage footprint on the producer
> > > > > > > side of things.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Oct 7, 2013 at 11:55 AM, Jun Rao <jun...@gmail.com> wrote:
> > > > > > >
> > > > > > > > At LinkedIn, our message size can be 10s of KB. This is
> > > > > > > > mostly because we batch a set of messages and send them as a
> > > > > > > > single compressed message.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Oct 7, 2013 at 7:44 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > When people use message queues, the message size is usually
> > > > > > > > > pretty small.
> > > > > > > > >
> > > > > > > > > I want to know who out there is using kafka with larger
> > > > > > > > > payload sizes?
> > > > > > > > >
> > > > > > > > > In the configuration, the maximum message size by default
> > > > > > > > > is set to 1 megabyte (message.max.bytes=1000000).
> > > > > > > > >
> > > > > > > > > My message sizes will probably be around 20-50 KB, but to
> > > > > > > > > me that is large for a message payload, so I'm wondering
> > > > > > > > > what effects that will have with kafka.
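
One rough way to put numbers on that, assuming the default limit and no
compression at all (a pessimistic bound, since gzip or snappy on 20-50 KB
payloads usually buys considerable headroom):

    public class BatchSizeEstimate {
        public static void main(String[] args) {
            int messageMaxBytes = 1000000;   // broker default message.max.bytes
            int payloadBytes = 50 * 1024;    // worst-case payload from the question
            // ~19 uncompressed 50 KB messages fit in one batch before hitting
            // the limit; compression raises that considerably.
            System.out.println(messageMaxBytes / payloadBytes);
        }
    }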
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
