Thanks, Sergio.
On Fri, Mar 25, 2022 at 7:41 AM Sergio Daniel Troiano <sergio.troi...@adevinta.com.invalid> wrote: > > Hi David, > > I apologize. I missed your suggestion. > By the way I like it and I have applied your suggestion. > > About the rejected alternatives I have updated the KIP as well > > Best regards > Sergio Troiano > > On Fri, 25 Mar 2022 at 06:50, David Jacot <david.ja...@gmail.com> wrote: > > > Hi Sergio, > > > > I made a suggestion a few weeks ago about the name about the parameter but > > haven’t got a response for it. Did you consider it? > > > > Do we need to update the rejected alternatives section to mention the > > alternative options discussed in this thread? > > > > Thanks, > > David > > > > Le ven. 25 mars 2022 à 03:44, Luke Chen <show...@gmail.com> a écrit : > > > > > Hi Sergio, > > > > > > Thanks for asking. > > > Since it's been discussed for weeks, you can start the vote anytime. > > > > > > Thank you. > > > Luke > > > > > > On Fri, Mar 25, 2022 at 10:40 AM Sergio Troiano > > > <sergio.troi...@adevinta.com.invalid> wrote: > > > > > > > Hey guys, > > > > > > > > > > > > What is the next step? Who decides when it is time for voting? > > > > > > > > > > > > Thanks! > > > > > > > > Sent from my iPhone > > > > > > > > > On 8 Mar 2022, at 19:57, Sergio Daniel Troiano < > > > > sergio.troi...@adevinta.com> wrote: > > > > > > > > > > > > > > > Hi Michael, > > > > > > > > > > Yes, it's a good idea and I considered it, the main problem is the > > > > FileRecords class does not accept number of batches as a parameter, it > > > > accepts bytes instead, so if we want to do so either we redesign a core > > > > class or we create a new one. > > > > > One of the pretty things (I consider) about this change which will > > > bring > > > > a huge benefit is the change in the code is pretty small and it's on > > the > > > > final script, it does not require any deep change in a core library. > > > > > > > > > > An alternative which requires a big change as well without touching > > the > > > > FileRecords class would be accept number of batches as parameter, then > > > call > > > > the FileRecords.slice() with a small value (bytes) count the batches, > > see > > > > if we can satisfy the number of batches, if not then we call it again > > and > > > > again until we reach the amount of batches. This will require a lot of > > > code > > > > as well > > > > > > > > > > So long story short, the proposal change is quite small, it uses the > > > > current classes and has a big benefit. > > > > > > > > > > Maybe in the future we could consider the FileRecords class to > > support > > > > getting the amount of batches as parameters and we encapsulate this > > logic > > > > in the proper class (FileRecords) > > > > > What do you think? > > > > > > > > > > Thanks > > > > > Sergio > > > > > Thanks > > > > > Sergio > > > > > > > > > >> On Tue, 8 Mar 2022 at 18:32, Mickael Maison < > > mickael.mai...@gmail.com > > > > > > > > wrote: > > > > >> Hi Sergio, > > > > >> > > > > >> Thanks for the KIP. Instead of specifying the size in bytes, have > > you > > > > >> considered specifying it in terms of number of batches? I think > > it's a > > > > >> bit more user friendly than a size in raw bytes. > > > > >> For example: --num-batches: The number of batches to read from the > > log > > > > segment. > > > > >> > > > > >> Thanks, > > > > >> Mickael > > > > >> > > > > >> > > > > >> On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano > > > > >> <sergio.troi...@adevinta.com.invalid> wrote: > > > > >> > > > > > >> > Hi Luke, > > > > >> > > > > > >> > Make sense, done! > > > > >> > > > > > >> > Thank you. > > > > >> > Sergio Troiano > > > > >> > > > > > >> > On Tue, 8 Mar 2022 at 03:02, Luke Chen <show...@gmail.com> wrote: > > > > >> > > > > > >> > > Hi Sergio, > > > > >> > > > > > > >> > > > I don't want this to minimize the main feature I want to > > deploy > > > > as I > > > > >> > > think the > > > > >> > > message size limit is not as important as the limiting the > > amount > > > of > > > > >> > > batches. > > > > >> > > > > > > >> > > Agree! Let's focus on the feature of limiting the batch amounts. > > > > >> > > > > > > >> > > One more comment to the KIP: > > > > >> > > 1. Could you put the new parameter description into the KIP > > > > proposed change > > > > >> > > section? That would make it much clear. > > > > >> > > > > > > >> > > > > > > >> > > Thank you. > > > > >> > > Luke > > > > >> > > > > > > >> > > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano > > > > >> > > <sergio.troi...@adevinta.com.invalid> wrote: > > > > >> > > > > > > >> > > > hey Luke, > > > > >> > > > > > > > >> > > > I am interested in expanding the KIP scope but I am a bit > > > > concerned this > > > > >> > > > could create a lot of noise and confusion as they look like > > very > > > > similar > > > > >> > > > parameters, I agree this is a small change, so I think if I do > > > it > > > > >> > > properly > > > > >> > > > it should not be a problem at all, I just will need a couple > > > more > > > > of days > > > > >> > > > as I want to create the proper tests as well. > > > > >> > > > > > > > >> > > > I have a doubt about editing the KIP, I mean should I add this > > > as > > > > a new > > > > >> > > > feature as well?, should I describe this as a side effect > > > > finding? I > > > > >> > > don't > > > > >> > > > want this to minimize the main feature I want to deploy as I > > > > think the > > > > >> > > > message size limit is not as important as the limiting the > > > amount > > > > of > > > > >> > > > batches. > > > > >> > > > > > > > >> > > > It is up to you, if you guys consider we must add this in this > > > > KIP then I > > > > >> > > > will be happy to do it. 😀 > > > > >> > > > > > > > >> > > > Best regards. > > > > >> > > > Sergio Troiano > > > > >> > > > > > > > >> > > > On Mon, 7 Mar 2022 at 02:01, Luke Chen <show...@gmail.com> > > > wrote: > > > > >> > > > > > > > >> > > > > Hi Sergio, > > > > >> > > > > > > > > >> > > > > Thanks for your explanation. > > > > >> > > > > Make sense to me. > > > > >> > > > > > > > > >> > > > > > Only interesting thing that I have just found is > > > > *max-message-size > > > > >> > > *is > > > > >> > > > > not > > > > >> > > > > used while dump logs are requested, instead it is used by > > > > dumpIndex > > > > >> > > > > > > > > >> > > > > Are you interested in expanding the scope of this KIP to > > > > include the > > > > >> > > > > *max-message-size* in dumping logs? > > > > >> > > > > I think it's good because it will also be a small change, > > and > > > > no need > > > > >> > > to > > > > >> > > > go > > > > >> > > > > through another KIP discussing/voting process. But I'm fine > > if > > > > you want > > > > >> > > > to > > > > >> > > > > keep this KIP as is, and create another JIRA ticket for > > future > > > > work. > > > > >> > > > > > > > > >> > > > > Thank you. > > > > >> > > > > Luke > > > > >> > > > > > > > > >> > > > > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano > > > > >> > > > > <sergio.troi...@adevinta.com.invalid> wrote: > > > > >> > > > > > > > > >> > > > > > hey Luke, > > > > >> > > > > > > > > > >> > > > > > Let me answer them: > > > > >> > > > > > 1. If the *max-batches-size* is too small that results in > > no > > > > records > > > > >> > > > > > output, will we output any information to the user? > > > > >> > > > > > > > > > >> > > > > > If the *max-batches-size*is even smaller than the first > > > > batch then > > > > >> > > > there > > > > >> > > > > > won't be any output, this is handled by FileRecords > > class, I > > > > think > > > > >> > > this > > > > >> > > > > is > > > > >> > > > > > correct as this is the expected behaviour. > > > > >> > > > > > > > > > >> > > > > > 2. After your explanation, I guess the use of > > > > *max-batches-size* > > > > >> > > won't > > > > >> > > > > > conflict with *max-message-size*, right? > > > > >> > > > > > > > > > >> > > > > > Only interesting thing that I have just found is > > > > *max-message-size > > > > >> > > *is > > > > >> > > > > not > > > > >> > > > > > used while dump logs are requested, instead it is used by > > > > dumpIndex > > > > >> > > > > > < > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150 > > > > >> > > > > > > > > > > >> > > > > > so, > > > > >> > > > > > this feature is not working for dumping logs, even though > > I > > > > checked > > > > >> > > if > > > > >> > > > > > there is a unit test for this and there is not any. Maybe > > we > > > > could > > > > >> > > > > create a > > > > >> > > > > > ticket for this? > > > > >> > > > > > > > > > >> > > > > > Regards. > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > On Sat, 5 Mar 2022 at 10:36, Luke Chen <show...@gmail.com > > > > > > > wrote: > > > > >> > > > > > > > > > >> > > > > > > Hi Sergio, > > > > >> > > > > > > > > > > >> > > > > > > Thanks for the explanation! Very clear! > > > > >> > > > > > > I think we should put this example and explanation into > > > KIP. > > > > >> > > > > > > > > > > >> > > > > > > Other comments: > > > > >> > > > > > > 1. If the *max-batches-size* is too small that results > > in > > > no > > > > >> > > records > > > > >> > > > > > > output, will we output any information to the user? > > > > >> > > > > > > 2. After your explanation, I guess the use of > > > > *max-batches-size* > > > > >> > > > won't > > > > >> > > > > > > conflict with *max-message-size*, right? > > > > >> > > > > > > That is, user can set the 2 arguments at the same time. > > Is > > > > that > > > > >> > > > > correct? > > > > >> > > > > > > > > > > >> > > > > > > Thank you. > > > > >> > > > > > > Luke > > > > >> > > > > > > > > > > >> > > > > > > Thank you. > > > > >> > > > > > > Luke > > > > >> > > > > > > > > > > >> > > > > > > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano > > > > >> > > > > > > <sergio.troi...@adevinta.com.invalid> wrote: > > > > >> > > > > > > > > > > >> > > > > > > > hey Luke, > > > > >> > > > > > > > > > > > >> > > > > > > > thanks for the interest, it is a good question, please > > > > let me > > > > >> > > > explain > > > > >> > > > > > > you: > > > > >> > > > > > > > > > > > >> > > > > > > > *max-message-size *a filter for the size of each > > batch, > > > > so for > > > > >> > > > > example > > > > >> > > > > > if > > > > >> > > > > > > > Iset --max-message-size 1000 bytes and my segment log > > > has > > > > 300 > > > > >> > > > > batches, > > > > >> > > > > > > 150 > > > > >> > > > > > > > of them has a size of 500 bytes and the other 150 > > has a > > > > size of > > > > >> > > > 2000 > > > > >> > > > > > > bytes > > > > >> > > > > > > > then the script will skip the las 150 ones as each > > batch > > > > is > > > > >> > > heavier > > > > >> > > > > > than > > > > >> > > > > > > > the limit. > > > > >> > > > > > > > > > > > >> > > > > > > > In the other hand following the same example above > > with > > > > >> > > > > > *max-batches-size > > > > >> > > > > > > > *set > > > > >> > > > > > > > to 1000 bytes it will only print out the first 2 > > batches > > > > (500 > > > > >> > > bytes > > > > >> > > > > > each) > > > > >> > > > > > > > and stop, This will avoid reading the whole file > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > Also if all of them are smaller than 1000 bytes it > > will > > > > end up > > > > >> > > > > printing > > > > >> > > > > > > out > > > > >> > > > > > > > all the batches. > > > > >> > > > > > > > The idea of my change is to limit the *amount* of > > > batches > > > > no > > > > >> > > matter > > > > >> > > > > > their > > > > >> > > > > > > > size. > > > > >> > > > > > > > > > > > >> > > > > > > > I hope this reply helps. > > > > >> > > > > > > > Best regards. > > > > >> > > > > > > > > > > > >> > > > > > > > On Sat, 5 Mar 2022 at 08:00, Luke Chen < > > > show...@gmail.com > > > > > > > > > >> > > wrote: > > > > >> > > > > > > > > > > > >> > > > > > > > > Hi Sergio, > > > > >> > > > > > > > > > > > > >> > > > > > > > > Thanks for the KIP! > > > > >> > > > > > > > > > > > > >> > > > > > > > > One question: > > > > >> > > > > > > > > I saw there's a `max-message-size` argument that > > seems > > > > to do > > > > >> > > the > > > > >> > > > > same > > > > >> > > > > > > > thing > > > > >> > > > > > > > > as you want. > > > > >> > > > > > > > > Could you help explain what's the difference between > > > > >> > > > > > `max-message-size` > > > > >> > > > > > > > and > > > > >> > > > > > > > > `max-batches-size`? > > > > >> > > > > > > > > > > > > >> > > > > > > > > Thank you. > > > > >> > > > > > > > > Luke > > > > >> > > > > > > > > > > > > >> > > > > > > > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True < > > > > >> > > k...@mustardgrain.com> > > > > >> > > > > > > wrote: > > > > >> > > > > > > > > > > > > >> > > > > > > > > > Hi Sergio, > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > Thanks for the KIP. I don't know anything about > > the > > > > log > > > > >> > > segment > > > > >> > > > > > > > > internals, > > > > >> > > > > > > > > > but the logic and implementation seem sound. > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > Three questions: > > > > >> > > > > > > > > > 1. Since the --max-batches-size unit is bytes, > > does > > > > it > > > > >> > > matter > > > > >> > > > if > > > > >> > > > > > > that > > > > >> > > > > > > > > > size doesn't align to a record boundary? > > > > >> > > > > > > > > > 2. Can you add a check to make sure that > > > > --max-batches-size > > > > >> > > > > > doesn't > > > > >> > > > > > > > > allow > > > > >> > > > > > > > > > the user to pass in a negative number? > > > > >> > > > > > > > > > 3. Can you add/update any unit tests related to > > the > > > > >> > > > > > DumpLogSegments > > > > >> > > > > > > > > > arguments? > > > > >> > > > > > > > > > Thanks, > > > > >> > > > > > > > > > Kirk > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel > > > > Troiano wrote: > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > >