Hi Sergio, I made a suggestion a few weeks ago about the name about the parameter but haven’t got a response for it. Did you consider it?
Do we need to update the rejected alternatives section to mention the alternative options discussed in this thread? Thanks, David Le ven. 25 mars 2022 à 03:44, Luke Chen <show...@gmail.com> a écrit : > Hi Sergio, > > Thanks for asking. > Since it's been discussed for weeks, you can start the vote anytime. > > Thank you. > Luke > > On Fri, Mar 25, 2022 at 10:40 AM Sergio Troiano > <sergio.troi...@adevinta.com.invalid> wrote: > > > Hey guys, > > > > > > What is the next step? Who decides when it is time for voting? > > > > > > Thanks! > > > > Sent from my iPhone > > > > > On 8 Mar 2022, at 19:57, Sergio Daniel Troiano < > > sergio.troi...@adevinta.com> wrote: > > > > > > > > > Hi Michael, > > > > > > Yes, it's a good idea and I considered it, the main problem is the > > FileRecords class does not accept number of batches as a parameter, it > > accepts bytes instead, so if we want to do so either we redesign a core > > class or we create a new one. > > > One of the pretty things (I consider) about this change which will > bring > > a huge benefit is the change in the code is pretty small and it's on the > > final script, it does not require any deep change in a core library. > > > > > > An alternative which requires a big change as well without touching the > > FileRecords class would be accept number of batches as parameter, then > call > > the FileRecords.slice() with a small value (bytes) count the batches, see > > if we can satisfy the number of batches, if not then we call it again and > > again until we reach the amount of batches. This will require a lot of > code > > as well > > > > > > So long story short, the proposal change is quite small, it uses the > > current classes and has a big benefit. > > > > > > Maybe in the future we could consider the FileRecords class to support > > getting the amount of batches as parameters and we encapsulate this logic > > in the proper class (FileRecords) > > > What do you think? > > > > > > Thanks > > > Sergio > > > Thanks > > > Sergio > > > > > >> On Tue, 8 Mar 2022 at 18:32, Mickael Maison <mickael.mai...@gmail.com > > > > wrote: > > >> Hi Sergio, > > >> > > >> Thanks for the KIP. Instead of specifying the size in bytes, have you > > >> considered specifying it in terms of number of batches? I think it's a > > >> bit more user friendly than a size in raw bytes. > > >> For example: --num-batches: The number of batches to read from the log > > segment. > > >> > > >> Thanks, > > >> Mickael > > >> > > >> > > >> On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano > > >> <sergio.troi...@adevinta.com.invalid> wrote: > > >> > > > >> > Hi Luke, > > >> > > > >> > Make sense, done! > > >> > > > >> > Thank you. > > >> > Sergio Troiano > > >> > > > >> > On Tue, 8 Mar 2022 at 03:02, Luke Chen <show...@gmail.com> wrote: > > >> > > > >> > > Hi Sergio, > > >> > > > > >> > > > I don't want this to minimize the main feature I want to deploy > > as I > > >> > > think the > > >> > > message size limit is not as important as the limiting the amount > of > > >> > > batches. > > >> > > > > >> > > Agree! Let's focus on the feature of limiting the batch amounts. > > >> > > > > >> > > One more comment to the KIP: > > >> > > 1. Could you put the new parameter description into the KIP > > proposed change > > >> > > section? That would make it much clear. > > >> > > > > >> > > > > >> > > Thank you. > > >> > > Luke > > >> > > > > >> > > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano > > >> > > <sergio.troi...@adevinta.com.invalid> wrote: > > >> > > > > >> > > > hey Luke, > > >> > > > > > >> > > > I am interested in expanding the KIP scope but I am a bit > > concerned this > > >> > > > could create a lot of noise and confusion as they look like very > > similar > > >> > > > parameters, I agree this is a small change, so I think if I do > it > > >> > > properly > > >> > > > it should not be a problem at all, I just will need a couple > more > > of days > > >> > > > as I want to create the proper tests as well. > > >> > > > > > >> > > > I have a doubt about editing the KIP, I mean should I add this > as > > a new > > >> > > > feature as well?, should I describe this as a side effect > > finding? I > > >> > > don't > > >> > > > want this to minimize the main feature I want to deploy as I > > think the > > >> > > > message size limit is not as important as the limiting the > amount > > of > > >> > > > batches. > > >> > > > > > >> > > > It is up to you, if you guys consider we must add this in this > > KIP then I > > >> > > > will be happy to do it. 😀 > > >> > > > > > >> > > > Best regards. > > >> > > > Sergio Troiano > > >> > > > > > >> > > > On Mon, 7 Mar 2022 at 02:01, Luke Chen <show...@gmail.com> > wrote: > > >> > > > > > >> > > > > Hi Sergio, > > >> > > > > > > >> > > > > Thanks for your explanation. > > >> > > > > Make sense to me. > > >> > > > > > > >> > > > > > Only interesting thing that I have just found is > > *max-message-size > > >> > > *is > > >> > > > > not > > >> > > > > used while dump logs are requested, instead it is used by > > dumpIndex > > >> > > > > > > >> > > > > Are you interested in expanding the scope of this KIP to > > include the > > >> > > > > *max-message-size* in dumping logs? > > >> > > > > I think it's good because it will also be a small change, and > > no need > > >> > > to > > >> > > > go > > >> > > > > through another KIP discussing/voting process. But I'm fine if > > you want > > >> > > > to > > >> > > > > keep this KIP as is, and create another JIRA ticket for future > > work. > > >> > > > > > > >> > > > > Thank you. > > >> > > > > Luke > > >> > > > > > > >> > > > > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano > > >> > > > > <sergio.troi...@adevinta.com.invalid> wrote: > > >> > > > > > > >> > > > > > hey Luke, > > >> > > > > > > > >> > > > > > Let me answer them: > > >> > > > > > 1. If the *max-batches-size* is too small that results in no > > records > > >> > > > > > output, will we output any information to the user? > > >> > > > > > > > >> > > > > > If the *max-batches-size*is even smaller than the first > > batch then > > >> > > > there > > >> > > > > > won't be any output, this is handled by FileRecords class, I > > think > > >> > > this > > >> > > > > is > > >> > > > > > correct as this is the expected behaviour. > > >> > > > > > > > >> > > > > > 2. After your explanation, I guess the use of > > *max-batches-size* > > >> > > won't > > >> > > > > > conflict with *max-message-size*, right? > > >> > > > > > > > >> > > > > > Only interesting thing that I have just found is > > *max-message-size > > >> > > *is > > >> > > > > not > > >> > > > > > used while dump logs are requested, instead it is used by > > dumpIndex > > >> > > > > > < > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150 > > >> > > > > > > > > >> > > > > > so, > > >> > > > > > this feature is not working for dumping logs, even though I > > checked > > >> > > if > > >> > > > > > there is a unit test for this and there is not any. Maybe we > > could > > >> > > > > create a > > >> > > > > > ticket for this? > > >> > > > > > > > >> > > > > > Regards. > > >> > > > > > > > >> > > > > > > > >> > > > > > On Sat, 5 Mar 2022 at 10:36, Luke Chen <show...@gmail.com> > > wrote: > > >> > > > > > > > >> > > > > > > Hi Sergio, > > >> > > > > > > > > >> > > > > > > Thanks for the explanation! Very clear! > > >> > > > > > > I think we should put this example and explanation into > KIP. > > >> > > > > > > > > >> > > > > > > Other comments: > > >> > > > > > > 1. If the *max-batches-size* is too small that results in > no > > >> > > records > > >> > > > > > > output, will we output any information to the user? > > >> > > > > > > 2. After your explanation, I guess the use of > > *max-batches-size* > > >> > > > won't > > >> > > > > > > conflict with *max-message-size*, right? > > >> > > > > > > That is, user can set the 2 arguments at the same time. Is > > that > > >> > > > > correct? > > >> > > > > > > > > >> > > > > > > Thank you. > > >> > > > > > > Luke > > >> > > > > > > > > >> > > > > > > Thank you. > > >> > > > > > > Luke > > >> > > > > > > > > >> > > > > > > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano > > >> > > > > > > <sergio.troi...@adevinta.com.invalid> wrote: > > >> > > > > > > > > >> > > > > > > > hey Luke, > > >> > > > > > > > > > >> > > > > > > > thanks for the interest, it is a good question, please > > let me > > >> > > > explain > > >> > > > > > > you: > > >> > > > > > > > > > >> > > > > > > > *max-message-size *a filter for the size of each batch, > > so for > > >> > > > > example > > >> > > > > > if > > >> > > > > > > > Iset --max-message-size 1000 bytes and my segment log > has > > 300 > > >> > > > > batches, > > >> > > > > > > 150 > > >> > > > > > > > of them has a size of 500 bytes and the other 150 has a > > size of > > >> > > > 2000 > > >> > > > > > > bytes > > >> > > > > > > > then the script will skip the las 150 ones as each batch > > is > > >> > > heavier > > >> > > > > > than > > >> > > > > > > > the limit. > > >> > > > > > > > > > >> > > > > > > > In the other hand following the same example above with > > >> > > > > > *max-batches-size > > >> > > > > > > > *set > > >> > > > > > > > to 1000 bytes it will only print out the first 2 batches > > (500 > > >> > > bytes > > >> > > > > > each) > > >> > > > > > > > and stop, This will avoid reading the whole file > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > Also if all of them are smaller than 1000 bytes it will > > end up > > >> > > > > printing > > >> > > > > > > out > > >> > > > > > > > all the batches. > > >> > > > > > > > The idea of my change is to limit the *amount* of > batches > > no > > >> > > matter > > >> > > > > > their > > >> > > > > > > > size. > > >> > > > > > > > > > >> > > > > > > > I hope this reply helps. > > >> > > > > > > > Best regards. > > >> > > > > > > > > > >> > > > > > > > On Sat, 5 Mar 2022 at 08:00, Luke Chen < > show...@gmail.com > > > > > >> > > wrote: > > >> > > > > > > > > > >> > > > > > > > > Hi Sergio, > > >> > > > > > > > > > > >> > > > > > > > > Thanks for the KIP! > > >> > > > > > > > > > > >> > > > > > > > > One question: > > >> > > > > > > > > I saw there's a `max-message-size` argument that seems > > to do > > >> > > the > > >> > > > > same > > >> > > > > > > > thing > > >> > > > > > > > > as you want. > > >> > > > > > > > > Could you help explain what's the difference between > > >> > > > > > `max-message-size` > > >> > > > > > > > and > > >> > > > > > > > > `max-batches-size`? > > >> > > > > > > > > > > >> > > > > > > > > Thank you. > > >> > > > > > > > > Luke > > >> > > > > > > > > > > >> > > > > > > > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True < > > >> > > k...@mustardgrain.com> > > >> > > > > > > wrote: > > >> > > > > > > > > > > >> > > > > > > > > > Hi Sergio, > > >> > > > > > > > > > > > >> > > > > > > > > > Thanks for the KIP. I don't know anything about the > > log > > >> > > segment > > >> > > > > > > > > internals, > > >> > > > > > > > > > but the logic and implementation seem sound. > > >> > > > > > > > > > > > >> > > > > > > > > > Three questions: > > >> > > > > > > > > > 1. Since the --max-batches-size unit is bytes, does > > it > > >> > > matter > > >> > > > if > > >> > > > > > > that > > >> > > > > > > > > > size doesn't align to a record boundary? > > >> > > > > > > > > > 2. Can you add a check to make sure that > > --max-batches-size > > >> > > > > > doesn't > > >> > > > > > > > > allow > > >> > > > > > > > > > the user to pass in a negative number? > > >> > > > > > > > > > 3. Can you add/update any unit tests related to the > > >> > > > > > DumpLogSegments > > >> > > > > > > > > > arguments? > > >> > > > > > > > > > Thanks, > > >> > > > > > > > > > Kirk > > >> > > > > > > > > > > > >> > > > > > > > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel > > Troiano wrote: > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >