Hey Becket,

I realized that Apurva has already raised a similar question. I think
you answered it by saying that the request size will not be small. I
agree that there will be no impact on throughput if we can reach the
request size limit with compression estimation disabled. But I am not
sure that will be the case. This is at least a concern when MM is
mirroring traffic for only a few partitions with a high byte-in rate.
Thus I am wondering if we should do the optimization proposed above.

Thanks,
Dong

On Wed, Feb 22, 2017 at 6:39 PM, Dong Lin <lindon...@gmail.com> wrote:

> Hey Becket,
>
> Thanks for the KIP. I have one question here.
>
> Suppose the producer's batch.size=100 KB and max.in.flight.requests.per.connection=1.
> Since each ProduceRequest contains one batch per partition, this means that
> 100 KB of compressed data will be produced per partition per round-trip time
> with the current implementation. If we disable compression estimation with
> this KIP, then the producer can only produce 100 KB of uncompressed data per
> partition per round-trip time. Suppose the average compression ratio is 10;
> then there will be a 10X difference in the bytes that are transmitted per
> round-trip time. The impact on throughput can be big if mirror maker is
> producing to a remote cluster, even though the compression ratio may be the
> same.
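>
> To make the arithmetic above concrete, here is a rough back-of-the-envelope
> sketch (an illustration only; the 10x compression ratio is just the example
> number assumed above):
>
> public class BatchSizeMath {
>     public static void main(String[] args) {
>         long batchSize = 100 * 1024;    // producer batch.size from the example
>         double compressionRatio = 10.0; // assumed ratio, for illustration only
>
>         // Current behavior: a batch fills up to batch.size of *compressed*
>         // bytes, so one round trip can move ~batch.size * ratio of user data
>         // per partition.
>         double perRttToday = batchSize * compressionRatio; // ~1 MB
>
>         // With compression estimation disabled: a batch fills up to
>         // batch.size of *uncompressed* bytes, so one round trip moves at
>         // most ~batch.size of user data per partition.
>         double perRttProposed = batchSize; // ~100 KB
>
>         System.out.printf("difference per round trip: %.0fx%n",
>                 perRttToday / perRttProposed); // 10x
>     }
> }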
>
> Given this observation, we should probably note in the KIP that users
> should bump up the producer's batch.size to the message.max.bytes
> configured on the broker, which by default is roughly 1 MB, to achieve the
> maximum possible throughput when compression estimation is disabled.
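>
> For example, here is a minimal sketch of a producer configured along those
> lines (the bootstrap server and serializers are placeholders; 1000012 is the
> broker's default message.max.bytes):
>
> import java.util.Properties;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerConfig;
>
> public class LargeBatchProducer {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
>         props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
>                 "org.apache.kafka.common.serialization.ByteArraySerializer");
>         props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
>                 "org.apache.kafka.common.serialization.ByteArraySerializer");
>         props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
>         // Bump batch.size up toward the broker's message.max.bytes so that a
>         // batch sized on uncompressed bytes can still fill a reasonably
>         // large request once it is compressed.
>         props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1000012);
>         props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
>
>         try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
>             // ... send records as usual ...
>         }
>     }
> }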
>
> Still, this can impact the throughput of a producer or MM that is producing
> highly compressible data. I think we can get around this problem by allowing
> each request to have multiple batches per partition, as long as the size of
> these batches is <= the producer's batch.size config. Do you think it is
> worth doing?
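>
> To illustrate the idea, here is a simplified sketch (not the actual
> RecordAccumulator code) of draining more than one ready batch per partition
> into a single request, bounded by the request size instead of by "one batch
> per partition":
>
> import java.util.ArrayDeque;
> import java.util.ArrayList;
> import java.util.Deque;
> import java.util.List;
>
> public class MultiBatchDrainSketch {
>     // Drain as many completed batches for one partition as will fit into
>     // the request, instead of stopping after the first batch.
>     static List<byte[]> drainForPartition(Deque<byte[]> readyBatches,
>                                           int maxRequestSize) {
>         List<byte[]> drained = new ArrayList<>();
>         int bytes = 0;
>         while (!readyBatches.isEmpty()
>                 && bytes + readyBatches.peekFirst().length <= maxRequestSize) {
>             byte[] batch = readyBatches.pollFirst();
>             bytes += batch.length;
>             drained.add(batch);
>         }
>         return drained;
>     }
>
>     public static void main(String[] args) {
>         Deque<byte[]> ready = new ArrayDeque<>();
>         for (int i = 0; i < 5; i++)
>             ready.add(new byte[100 * 1024]); // five 100 KB compressed batches
>         List<byte[]> oneRequest = drainForPartition(ready, 1024 * 1024);
>         System.out.println("batches in one request: " + oneRequest.size()); // 5
>     }
> }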
>
> Thanks,
> Dong
>
>
>
> On Tue, Feb 21, 2017 at 7:56 PM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
>> Apurva has a point that can be documented for this config.
>>
>> Overall, LGTM +1.
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Tue, Feb 21, 2017 at 6:41 PM, Becket Qin <becket....@gmail.com> wrote:
>>
>> > Hi Apurva,
>> >
>> > Yes, it is true that the request size might be much smaller if the
>> > batching is based on uncompressed size. I will let the users know about
>> > this. That said, in practice this is probably fine. For example, at
>> > LinkedIn our max message size is 1 MB, and the compressed size is
>> > typically 100 KB or larger. Given that in most cases there are many
>> > partitions, the request size would not be too small (typically around a
>> > few MB).
>> >
>> > At LinkedIn we do have some topics with varying compression ratios. Those
>> > are usually topics shared by different services, so the data may differ a
>> > lot even though the messages are in the same topic and have similar fields.
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> >
>> > On Tue, Feb 21, 2017 at 6:17 PM, Apurva Mehta <apu...@confluent.io>
>> > wrote:
>> >
>> > > Hi Becket, thanks for the KIP.
>> > >
>> > > I think one of the risks here is that when compression estimation is
>> > > disabled, you could have much smaller batches than expected, and
>> > > throughput could be hurt. It would be worth adding this to the
>> > > documentation of this setting.
>> > >
>> > > Also, one of the rejected alternatives states that per-topic estimation
>> > > would not work when the compression of individual messages is variable.
>> > > This is true in theory, but in practice one would expect Kafka topics to
>> > > have fairly homogeneous data, and hence to compress evenly. I was curious
>> > > whether you have data which shows otherwise.
>> > >
>> > > Thanks,
>> > > Apurva
>> > >
>> > > On Tue, Feb 21, 2017 at 12:30 PM, Becket Qin <becket....@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi folks,
>> > > >
>> > > > I would like to start the discussion thread on KIP-126. The KIP
>> > > > proposes adding a new configuration to KafkaProducer to allow batching
>> > > > based on uncompressed message size.
>> > > >
>> > > > Comments are welcome.
>> > > >
>> > > > The KIP wiki is here:
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-126+-+Allow+KafkaProducer+to+batch+based+on+uncompressed+size
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>>
>
>
