Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-04-20 Thread Dong Lin
Thanks for the explanation. I agree that it is not easy to have a well-defined and accurate measurement of the split ratio. On Wed, Apr 19, 2017 at 9:38 AM, Becket Qin wrote: > Thanks for the comment, Dong. I think the batch-split-ratio makes sense but > is kind of redundant to batch-split-rate.

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-04-19 Thread Becket Qin
Thanks for the comment, Dong. I think the batch-split-ratio makes sense but is kind of redundant to batch-split-rate. Also the batch-split-ratio may be a little more involved to make right: 1. A all-time batch split ratio is easy to get but not that useful. 2. A time-windowed batch-split-ratio is

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-22 Thread Dong Lin
Never mind about my second comment. I misunderstood the semantics of producer's batch.size. On Wed, Mar 22, 2017 at 10:20 AM, Dong Lin wrote: > Hey Becket, > > In addition to the batch-split-rate, should we also add batch-split-ratio > sensor to gauge the probability that we have to split batch?

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-22 Thread Dong Lin
Hey Becket, In addition to the batch-split-rate, should we also add batch-split-ratio sensor to gauge the probability that we have to split batch? Also, in the case that the batch size configured for the producer is smaller than the max message size configured for the broker, why can't we just sp

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-15 Thread Becket Qin
I see, then we are thinking about the same thing :) On Wed, Mar 15, 2017 at 2:26 PM, Ismael Juma wrote: > I meant finishing what's described in the following section and then > starting a discussion followed by a vote: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > 4+-+Command+line

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-15 Thread Ismael Juma
I meant finishing what's described in the following section and then starting a discussion followed by a vote: https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DescribeConfigsReq

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-15 Thread Becket Qin
Hi Ismael, KIP-4 is also the one that I was thinking about. We have introduced a DescribeConfigRequest there so the producer can easily get the configurations. By "another KIP" do you mean a new (or maybe extended) protocol or using that protocol in clients? Thanks, Jiangjie (Becket) Qin On Wed

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-15 Thread Ismael Juma
Hi Becket, How were you thinking of retrieving the configuration items you mentioned? I am asking because I was planning to post a KIP for Describe Configs (one of the protocols in KIP-4), which would expose such information. But maybe you are thinking of extending Metadata request? Ismael On We

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-15 Thread Becket Qin
Hi Jason, Good point. I was thinking about that, too. I was not sure if that is the right thing to do by default. If we assume people always set the batch size to max message size, splitting the oversized batch makes a lot of sense. But it seems possible that users want to control the memory foot

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-15 Thread Jason Gustafson
Hey Becket, Thanks for the KIP! The approach seems reasonable. One clarification: is the intent to do the splitting after the broker rejects the request with MESSAGE_TOO_LARGE, or prior to sending if the configured batch size is exceeded? -Jason On Mon, Mar 13, 2017 at 8:10 PM, Becket Qin wrote

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-13 Thread Becket Qin
Bump up the thread for further comments. If there is no more comments on the KIP I will start the voting thread on Wed. Thanks, Jiangjie (Becket) Qin On Tue, Mar 7, 2017 at 9:48 AM, Becket Qin wrote: > Hi Dong, > > Thanks for the comments. > > The patch is mostly for proof of concept in case t

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-07 Thread Becket Qin
Hi Dong, Thanks for the comments. The patch is mostly for proof of concept in case there is any concern about the implementation which is indeed a little tricky. The new metric has already been mentioned in the Public Interface Change section. I added the reasoning about how the compression rat

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-06 Thread Dong Lin
Hey Becket, I am wondering if we should first vote for the KIP before reviewing the patch. I have two comments below: - Should we specify the new sensors as part of interface change in the KIP? - The KIP proposes to increase estimated compression ratio by 0.05 for each underestimation and decreme

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-04 Thread Becket Qin
I have updated the KIP based on the latest discussion. Please check and let me know if there is any further concern. Thanks, Jiangjie (Becket) Qin On Sat, Mar 4, 2017 at 10:56 AM, Becket Qin wrote: > Actually second thought on this, rate might be better for two reasons: > 1. Most of the metric

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-04 Thread Becket Qin
Actually second thought on this, rate might be better for two reasons: 1. Most of the metrics in the producer we already have are using rate instead of count. 2. If a service is bounced, the count will be reset to 0, but it does not affect rate. I'll make the change. Thanks, Jiangjie (Becket) Qi

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-04 Thread Becket Qin
Hi Dong, Yes, there is a sensor in the patch about the split occurrence. Currently it is a count instead of rate. In practice, it seems count is easier to use in this case. But I am open to change. Thanks, Jiangjie (Becket) Qin On Fri, Mar 3, 2017 at 7:43 PM, Dong Lin wrote: > Hey Becket, >

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-03 Thread Dong Lin
Hey Becket, I haven't looked at the patch yet. But since we are going to try the split-on-oversize solution, should the KIP also add a sensor that shows the rate of split per second and the probability of split? Thanks, Dong On Fri, Mar 3, 2017 at 6:39 PM, Becket Qin wrote: > Just to clarify,

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-03 Thread Becket Qin
Just to clarify, the implementation is basically what I mentioned above (split/resend + adjusted estimation evolving algorithm) and changing the compression ratio estimation to be per topic. Thanks, Jiangjie (Becket) Qin On Fri, Mar 3, 2017 at 6:36 PM, Becket Qin wrote: > I went ahead and have

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-03-03 Thread Becket Qin
I went ahead and have a patch submitted here: https://github.com/apache/kafka/pull/2638 Per Joel's suggestion, I changed the compression ratio to be per topic as well. It seems working well. Since there is an important behavior change and a new sensor is added, I'll keep the KIP and update it acco

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-27 Thread Joel Koshy
> > Lets say we sent the batch over the wire and received a > RecordTooLargeException, how do we split it as once we add the message to > the batch we loose the message level granularity. We will have to > decompress, do deep iteration and split and again compress. right? This > looks like a perfor

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-27 Thread Becket Qin
Hey Mayuresh, 1) The batch would be split when an RecordTooLargeException is received. 2) Not lower the actual compression ratio, but lower the estimated compression ratio "according to" the Actual Compression Ratio(ACR). An example, let's start with Estimated Compression Ratio (ECR) = 1.0. Say t

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-27 Thread Mayuresh Gharat
Hi Becket, Thanks for the expatiation. Regarding : 1) The batch would be split when an RecordTooLargeException is received. Lets say we sent the batch over the wire and received a RecordTooLargeException, how do we split it as once we add the message to the batch we loose the message level granul

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-27 Thread Mayuresh Gharat
Hi Becket, Seems like an interesting idea. I had couple of questions : 1) How do we decide when the batch should be split? 2) What do you mean by slowly lowering the "actual" compression ratio? An example would really help here. Thanks, Mayuresh On Fri, Feb 24, 2017 at 3:17 PM, Becket Qin wrot

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-24 Thread Becket Qin
Hi Jay, Yeah, I got your point. I think there might be a solution which do not require adding a new configuration. We can start from a very conservative compression ratio say 1.0 and lower it very slowly according to the actual compression ratio until we hit a point that we have to split a batch.

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-23 Thread Jay Kreps
Hey Becket, Yeah that makes sense. I agree that you'd really have to both fix the estimation (i.e. make it per topic or make it better estimate the high percentiles) AND have the recovery mechanism. If you are underestimating often and then paying a high recovery price that won't fly. I think yo

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-23 Thread Becket Qin
@Dong, Thanks for the comments. The default behavior of the producer won't change. If the users want to use the uncompressed message size, they probably will also bump up the batch size to somewhere close to the max message size. This would be in the document. BTW the default batch size is 16K whi

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-22 Thread Jay Kreps
Hey Becket, I get the problem we want to solve with this, but I don't think this is something that makes sense as a user controlled knob that everyone sending data to kafka has to think about. It is basically a bug, right? First, as a technical question is it true that using the uncompressed size

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-22 Thread Dong Lin
Hey Becket, I realized that Apurva has already raised similar questions. I think you answered his question by saying that the request size will not be small. I agree that there will be no impact on throughput if we can reach request size limit with compression estimation disabled. But I am not sur

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-22 Thread Dong Lin
Hey Becket, Thanks for the KIP. I have one question here. Suppose producer's batch.size=100 KB, max.in.flight.requests.per.connection=1. Since each ProduceRequest contains one batch per partition, it means that 100 KB compressed data will be produced per partition per round-trip time as of curren

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-21 Thread Mayuresh Gharat
Apurva has a point that can be documented for this config. Overall, LGTM +1. Thanks, Mayuresh On Tue, Feb 21, 2017 at 6:41 PM, Becket Qin wrote: > Hi Apurva, > > Yes, it is true that the request size might be much smaller if the batching > is based on uncompressed size. I will let the users k

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-21 Thread Becket Qin
Hi Apurva, Yes, it is true that the request size might be much smaller if the batching is based on uncompressed size. I will let the users know about this. That said, in practice, this is probably fine. For example, at LinkedIn, our max message size is 1 MB, typically the compressed size would be

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-21 Thread Apurva Mehta
Hi Becket, Thanks for the kip. I think one of the risks here is that when compression estimation is disabled, you could have much smaller batches than expected, and throughput could be hurt. It would be worth adding this to the documentation of this setting. Also, one of the rejected alternatives

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-21 Thread Onur Karaman
+1. LGTM. No major concerns. On Tue, Feb 21, 2017 at 12:30 PM, Becket Qin wrote: > Hi folks, > > I would like to start the discussion thread on KIP-126. The KIP propose > adding a new configuration to KafkaProducer to allow batching based on > uncompressed message size. > > Comments are welcome.

[DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-21 Thread Becket Qin
Hi folks, I would like to start the discussion thread on KIP-126. The KIP propose adding a new configuration to KafkaProducer to allow batching based on uncompressed message size. Comments are welcome. The KIP wiki is following: https://cwiki.apache.org/confluence/display/KAFKA/KIP-126+-+Allow+K