Thanks for the explanation. I agree that it is not easy to have a
well-defined and accurate measurement of the split ratio.
On Wed, Apr 19, 2017 at 9:38 AM, Becket Qin wrote:
Thanks for the comment, Dong. I think the batch-split-ratio makes sense but
is kind of redundant to batch-split-rate.
Also, the batch-split-ratio may be a little more involved to get right:
1. An all-time batch-split-ratio is easy to get but not that useful.
2. A time-windowed batch-split-ratio is
Never mind about my second comment. I misunderstood the semantics of
producer's batch.size.
On Wed, Mar 22, 2017 at 10:20 AM, Dong Lin wrote:
Hey Becket,
In addition to the batch-split-rate, should we also add a batch-split-ratio
sensor to gauge the probability that we have to split a batch?
Also, in the case that the batch size configured for the producer is
smaller than the max message size configured for the broker, why can't we
just sp
I see, then we are thinking about the same thing :)
On Wed, Mar 15, 2017 at 2:26 PM, Ismael Juma wrote:
I meant finishing what's described in the following section and then
starting a discussion followed by a vote:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DescribeConfigsReq
Hi Ismael,
KIP-4 is also the one that I was thinking about. We have introduced a
DescribeConfigRequest there so the producer can easily get the
configurations. By "another KIP" do you mean a new (or maybe extended)
protocol or using that protocol in clients?
Thanks,
Jiangjie (Becket) Qin
On Wed
Hi Becket,
How were you thinking of retrieving the configuration items you mentioned?
I am asking because I was planning to post a KIP for Describe Configs (one
of the protocols in KIP-4), which would expose such information. But maybe
you are thinking of extending Metadata request?
Ismael
On We
Hi Jason,
Good point. I was thinking about that, too. I was not sure if that is the
right thing to do by default.
If we assume people always set the batch size to max message size,
splitting the oversized batch makes a lot of sense. But it seems possible
that users want to control the memory foot
Hey Becket,
Thanks for the KIP! The approach seems reasonable. One clarification: is
the intent to do the splitting after the broker rejects the request with
MESSAGE_TOO_LARGE, or prior to sending if the configured batch size is
exceeded?
-Jason
On Mon, Mar 13, 2017 at 8:10 PM, Becket Qin wrote
Bumping up the thread for further comments. If there are no more comments on
the KIP I will start the voting thread on Wed.
Thanks,
Jiangjie (Becket) Qin
On Tue, Mar 7, 2017 at 9:48 AM, Becket Qin wrote:
Hi Dong,
Thanks for the comments.
The patch is mostly for proof of concept in case there is any concern about
the implementation, which is indeed a little tricky.
The new metric has already been mentioned in the Public Interface Change
section.
I added the reasoning about how the compression rat
Hey Becket,
I am wondering if we should first vote for the KIP before reviewing the
patch. I have two comments below:
- Should we specify the new sensors as part of interface change in the KIP?
- The KIP proposes to increase estimated compression ratio by 0.05 for each
underestimation and decreme
I have updated the KIP based on the latest discussion. Please check and let
me know if there is any further concern.
Thanks,
Jiangjie (Becket) Qin
On Sat, Mar 4, 2017 at 10:56 AM, Becket Qin wrote:
Actually second thought on this, rate might be better for two reasons:
1. Most of the metrics we already have in the producer use rate
instead of count.
2. If a service is bounced, the count will be reset to 0, but it does not
affect rate.
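The second point can be illustrated with a toy sketch (the class name and window length here are made up for illustration; Kafka's own `Rate` metric is implemented differently):

```python
from collections import deque

class WindowedRate:
    """Toy windowed rate metric: events per second over the last `window` seconds.

    Unlike a cumulative count, a process restart only blanks out one
    window's worth of history instead of resetting the metric to zero.
    """
    def __init__(self, window=10.0):
        self.window = window
        self.events = deque()

    def record(self, now):
        # Remember the timestamp of one event.
        self.events.append(now)

    def rate(self, now):
        # Drop events that fell out of the window, then compute events/sec.
        while self.events and self.events[0] < now - self.window:
            self.events.popleft()
        return len(self.events) / self.window
```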
I'll make the change.
Thanks,
Jiangjie (Becket) Qin
Hi Dong,
Yes, there is a sensor in the patch about the split occurrence.
Currently it is a count instead of rate. In practice, it seems count is
easier to use in this case. But I am open to change.
Thanks,
Jiangjie (Becket) Qin
On Fri, Mar 3, 2017 at 7:43 PM, Dong Lin wrote:
Hey Becket,
I haven't looked at the patch yet. But since we are going to try the
split-on-oversize solution, should the KIP also add a sensor that shows the
rate of splits per second and the probability of a split?
Thanks,
Dong
On Fri, Mar 3, 2017 at 6:39 PM, Becket Qin wrote:
Just to clarify, the implementation is basically what I mentioned above
(split/resend + adjusted estimation evolving algorithm) and changing the
compression ratio estimation to be per topic.
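As a rough illustration of the split part, the split-and-resend step could be sketched like this (a hypothetical Python sketch, not the actual Java patch; the recursive-halving strategy is an assumption for illustration):

```python
def split_batch(records, max_size, size_of=len):
    """Recursively halve a batch until each piece fits under max_size.

    Toy model of split-and-resend: real producer batches carry compressed
    record sets, but the splitting logic has the same shape.
    """
    total = sum(size_of(r) for r in records)
    if total <= max_size or len(records) <= 1:
        return [records]
    mid = len(records) // 2
    return (split_batch(records[:mid], max_size, size_of)
            + split_batch(records[mid:], max_size, size_of))
```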
Thanks,
Jiangjie (Becket) Qin
On Fri, Mar 3, 2017 at 6:36 PM, Becket Qin wrote:
I went ahead and have a patch submitted here:
https://github.com/apache/kafka/pull/2638
Per Joel's suggestion, I changed the compression ratio to be per topic as
well. It seems to work well. Since there is an important behavior change
and a new sensor is added, I'll keep the KIP and update it acco
>
> Let's say we sent the batch over the wire and received a
> RecordTooLargeException. How do we split it, since once we add the message to
> the batch we lose the message-level granularity? We will have to
> decompress, do deep iteration, split, and compress again, right? This
> looks like a perfor
Hey Mayuresh,
1) The batch would be split when a RecordTooLargeException is received.
2) We do not lower the actual compression ratio, but lower the estimated
compression ratio according to the Actual Compression Ratio (ACR).
As an example, let's start with Estimated Compression Ratio (ECR) = 1.0. Say
t
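Although the snippet cuts off here, the adjustment scheme discussed in this thread (bump the estimate on a split, lower it slowly otherwise) might look roughly like the following sketch. The step sizes (+0.05 up, -0.005 down) are illustrative assumptions, not necessarily the committed constants:

```python
class CompressionRatioEstimator:
    """Toy sketch of the estimated-compression-ratio evolution."""

    def __init__(self, step_up=0.05, step_down=0.005):
        self.ratio = 1.0          # start conservative: assume no compression
        self.step_up = step_up
        self.step_down = step_down

    def on_batch_sent(self, actual_ratio):
        # Actual ratio came in below the estimate: creep the estimate down slowly.
        if actual_ratio < self.ratio:
            self.ratio = max(actual_ratio, self.ratio - self.step_down)

    def on_batch_split(self):
        # We underestimated (the compressed batch was too large): back off.
        self.ratio = min(1.0, self.ratio + self.step_up)
```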
Hi Becket,
Thanks for the explanation.
Regarding :
1) The batch would be split when a RecordTooLargeException is received.
Let's say we sent the batch over the wire and received a
RecordTooLargeException. How do we split it, since once we add the message to
the batch we lose the message-level granul
Hi Becket,
Seems like an interesting idea.
I had couple of questions :
1) How do we decide when the batch should be split?
2) What do you mean by slowly lowering the "actual" compression ratio?
An example would really help here.
Thanks,
Mayuresh
On Fri, Feb 24, 2017 at 3:17 PM, Becket Qin wrot
Hi Jay,
Yeah, I got your point.
I think there might be a solution which does not require adding a new
configuration. We can start from a very conservative compression ratio, say
1.0, and lower it very slowly according to the actual compression ratio
until we hit a point that we have to split a batch.
Hey Becket,
Yeah that makes sense.
I agree that you'd really have to both fix the estimation (i.e. make it per
topic or make it better estimate the high percentiles) AND have the
recovery mechanism. If you are underestimating often and then paying a high
recovery price, that won't fly.
I think yo
@Dong,
Thanks for the comments. The default behavior of the producer won't change.
If the users want to use the uncompressed message size, they probably will
also bump up the batch size to somewhere close to the max message size.
This would be in the document. BTW the default batch size is 16K whi
Hey Becket,
I get the problem we want to solve with this, but I don't think this is
something that makes sense as a user-controlled knob that everyone sending
data to kafka has to think about. It is basically a bug, right?
First, as a technical question is it true that using the uncompressed size
Hey Becket,
I realized that Apurva has already raised similar questions. I think you
answered his question by saying that the request size will not be small. I
agree that there will be no impact on throughput if we can reach request
size limit with compression estimation disabled. But I am not sur
Hey Becket,
Thanks for the KIP. I have one question here.
Suppose producer's batch.size=100 KB,
max.in.flight.requests.per.connection=1. Since each ProduceRequest contains
one batch per partition, it means that 100 KB compressed data will be
produced per partition per round-trip time as of curren
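The per-partition throughput bound implied by those settings can be worked out directly; the round-trip time below is an assumed figure for illustration:

```python
batch_size = 100 * 1024        # 100 KB of compressed data per partition per request
rtt = 0.05                     # assumed 50 ms round-trip time, for illustration
# With max.in.flight.requests.per.connection=1, each partition gets at most
# one batch per round trip, so throughput is bounded by batch_size / rtt.
throughput = batch_size / rtt  # bytes per second per partition
print(throughput / (1024 * 1024))  # ~1.95 MB/s per partition
```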
Apurva has a point that can be documented for this config.
Overall, LGTM +1.
Thanks,
Mayuresh
On Tue, Feb 21, 2017 at 6:41 PM, Becket Qin wrote:
Hi Apurva,
Yes, it is true that the request size might be much smaller if the batching
is based on uncompressed size. I will let the users know about this. That
said, in practice, this is probably fine. For example, at LinkedIn, our max
message size is 1 MB, typically the compressed size would be
Hi Becket, thanks for the KIP.
I think one of the risks here is that when compression estimation is
disabled, you could have much smaller batches than expected, and throughput
could be hurt. It would be worth adding this to the documentation of this
setting.
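That risk can be made concrete with some simple arithmetic (the compression ratio below is an assumed figure for illustration): if batching is done on uncompressed size, the bytes actually sent per batch shrink by the compression ratio.

```python
batch_size = 16 * 1024       # producer batch.size, now counted in uncompressed bytes
compression_ratio = 0.25     # assumed: compressed size / uncompressed size
# Each full batch of 16 KB uncompressed data compresses down before sending,
# so the batch that hits the wire is much smaller than batch.size.
wire_batch = batch_size * compression_ratio
print(wire_batch)  # 4096.0 bytes actually sent per batch
```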
Also, one of the rejected alternatives
+1. LGTM. No major concerns.
On Tue, Feb 21, 2017 at 12:30 PM, Becket Qin wrote:
Hi folks,
I would like to start the discussion thread on KIP-126. The KIP proposes
adding a new configuration to KafkaProducer to allow batching based on
uncompressed message size.
Comments are welcome.
The KIP wiki is the following:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-126+-+Allow+K