Hey Becket, I realized that Apurva has already raised similar questions. I
think you answered his question by saying that the request size will not be
small. I agree that there will be no impact on throughput if we can reach
the request size limit with compression estimation disabled. But I am not
sure that will always be the case; it is at least a concern when
MirrorMaker is mirroring traffic for only a few partitions with a high
byte-in rate. Thus I am wondering whether we should do the optimization
proposed above.
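To put numbers on the concern, here is a quick back-of-the-envelope sketch.
The 100 KB batch size, the compression ratio of 10, and the single in-flight
request are the figures from the discussion below; the 50 ms round-trip time
is a hypothetical value for a cross-datacenter link:

```python
# Back-of-the-envelope: bytes produced per partition per round trip, with
# max.in.flight.requests.per.connection=1 (one request in flight at a time).
batch_size = 100 * 1024     # producer batch.size: 100 KB
compression_ratio = 10      # assumed average compression ratio
rtt_s = 0.050               # hypothetical cross-datacenter RTT: 50 ms

# With compression estimation: batches are sized on estimated *compressed*
# bytes, so ~100 KB compressed (~1 MB uncompressed) ships per round trip.
with_estimation = batch_size * compression_ratio / rtt_s

# With estimation disabled (this KIP): batches are sized on *uncompressed*
# bytes, so only 100 KB of uncompressed data ships per round trip.
without_estimation = batch_size / rtt_s

print(with_estimation / without_estimation)  # 10.0 -> a 10x throughput gap
```

The gap scales directly with the compression ratio, which is why highly
compressible topics are the ones most affected.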
Thanks,
Dong

On Wed, Feb 22, 2017 at 6:39 PM, Dong Lin <lindon...@gmail.com> wrote:

> Hey Becket,
>
> Thanks for the KIP. I have one question here.
>
> Suppose the producer's batch.size=100 KB and
> max.in.flight.requests.per.connection=1. Since each ProduceRequest
> contains one batch per partition, 100 KB of compressed data will be
> produced per partition per round-trip time in the current implementation.
> If we disable compression estimation with this KIP, then the producer can
> only produce 100 KB of uncompressed data per partition per round-trip
> time. If the average compression ratio is 10, there will be a 10X
> difference in the bytes transmitted per round-trip time. The impact on
> throughput can be big if the mirror maker is producing to a remote
> cluster, even though the compression ratio may be the same.
>
> Given this observation, we should probably note in the KIP that users
> should bump up the producer's batch.size to the message.max.bytes
> configured on the broker, which by default is roughly 1 MB, to achieve
> the maximum possible throughput when compression estimation is disabled.
>
> Still, this can impact the throughput of producers or MM instances that
> produce highly compressible data. I think we can get around this problem
> by allowing each request to have multiple batches per partition, as long
> as the total size of these batches <= the producer's batch.size config.
> Do you think it is worth doing?
>
> Thanks,
> Dong
>
> On Tue, Feb 21, 2017 at 7:56 PM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
>> Apurva has a point that can be documented for this config.
>>
>> Overall, LGTM +1.
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Tue, Feb 21, 2017 at 6:41 PM, Becket Qin <becket....@gmail.com> wrote:
>>
>> > Hi Apurva,
>> >
>> > Yes, it is true that the request size might be much smaller if the
>> > batching is based on uncompressed size. I will let the users know
>> > about this. That said, in practice, this is probably fine.
>> > For example, at LinkedIn, our max message size is 1 MB, and the
>> > compressed size would typically be 100 KB or larger. Given that in
>> > most cases there are many partitions, the request size would not be
>> > too small (typically around a few MB).
>> >
>> > At LinkedIn we do have some topics with varying compression ratios.
>> > Those are usually topics shared by different services, so the data may
>> > differ a lot even though the messages are in the same topic and have
>> > similar fields.
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Tue, Feb 21, 2017 at 6:17 PM, Apurva Mehta <apu...@confluent.io>
>> > wrote:
>> >
>> > > Hi Becket, thanks for the KIP.
>> > >
>> > > I think one of the risks here is that when compression estimation is
>> > > disabled, you could have much smaller batches than expected, and
>> > > throughput could be hurt. It would be worth adding this to the
>> > > documentation of this setting.
>> > >
>> > > Also, one of the rejected alternatives states that per-topic
>> > > estimations would not work when the compression of individual
>> > > messages is variable. This is true in theory, but in practice one
>> > > would expect Kafka topics to have fairly homogeneous data, and hence
>> > > they should compress evenly. I was curious if you have data which
>> > > shows otherwise.
>> > >
>> > > Thanks,
>> > > Apurva
>> > >
>> > > On Tue, Feb 21, 2017 at 12:30 PM, Becket Qin <becket....@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi folks,
>> > > >
>> > > > I would like to start the discussion thread on KIP-126. The KIP
>> > > > proposes adding a new configuration to KafkaProducer to allow
>> > > > batching based on uncompressed message size.
>> > > >
>> > > > Comments are welcome.
>> > > >
>> > > > The KIP wiki is here:
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-126+-+Allow+KafkaProducer+to+batch+based+on+uncompressed+size
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
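As a footnote to the thread: Dong's suggestion of bumping batch.size up to
the broker's message.max.bytes could be sketched as below. This is a
hypothetical configuration sketch shown as a plain Python dict, not a
definitive recommendation; the key names are the standard Kafka producer
configs, and 1000012 bytes is the broker's default message.max.bytes
("roughly 1 MB" in the thread):

```python
# Hypothetical producer settings for maximum throughput when batching is
# based on uncompressed size (compression estimation disabled).
producer_config = {
    # Bump batch.size to the broker's default message.max.bytes (~1 MB)
    # so each fully uncompressed batch can still fill a max-size message.
    "batch.size": 1_000_012,
    # Compression is still applied on the wire; only the size estimate
    # used to decide when a batch is "full" changes.
    "compression.type": "gzip",
    "max.in.flight.requests.per.connection": 1,
}

# With the assumed 10x compression ratio from the thread, each ~1 MB
# uncompressed batch compresses to roughly 100 KB on the wire.
assumed_ratio = 10
wire_bytes = producer_config["batch.size"] // assumed_ratio
print(wire_bytes)  # 100001, i.e. ~100 KB per batch on the wire
```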