Hi Sergio, Thanks for the explanation! Very clear! I think we should put this example and explanation into KIP.
Other comments: 1. If the *max-batches-size* is too small that results in no records output, will we output any information to the user? 2. After your explanation, I guess the use of *max-batches-size* won't conflict with *max-message-size*, right? That is, user can set the 2 arguments at the same time. Is that correct? Thank you. Luke Thank you. Luke On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano <sergio.troi...@adevinta.com.invalid> wrote: > hey Luke, > > thanks for the interest, it is a good question, please let me explain you: > > *max-message-size *a filter for the size of each batch, so for example if > Iset --max-message-size 1000 bytes and my segment log has 300 batches, 150 > of them has a size of 500 bytes and the other 150 has a size of 2000 bytes > then the script will skip the las 150 ones as each batch is heavier than > the limit. > > In the other hand following the same example above with *max-batches-size > *set > to 1000 bytes it will only print out the first 2 batches (500 bytes each) > and stop, This will avoid reading the whole file > > > Also if all of them are smaller than 1000 bytes it will end up printing out > all the batches. > The idea of my change is to limit the *amount* of batches no matter their > size. > > I hope this reply helps. > Best regards. > > On Sat, 5 Mar 2022 at 08:00, Luke Chen <show...@gmail.com> wrote: > > > Hi Sergio, > > > > Thanks for the KIP! > > > > One question: > > I saw there's a `max-message-size` argument that seems to do the same > thing > > as you want. > > Could you help explain what's the difference between `max-message-size` > and > > `max-batches-size`? > > > > Thank you. > > Luke > > > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True <k...@mustardgrain.com> wrote: > > > > > Hi Sergio, > > > > > > Thanks for the KIP. I don't know anything about the log segment > > internals, > > > but the logic and implementation seem sound. > > > > > > Three questions: > > > 1. Since the --max-batches-size unit is bytes, does it matter if that > > > size doesn't align to a record boundary? > > > 2. Can you add a check to make sure that --max-batches-size doesn't > > allow > > > the user to pass in a negative number? > > > 3. Can you add/update any unit tests related to the DumpLogSegments > > > arguments? > > > Thanks, > > > Kirk > > > > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote: > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output > > > > > > > > > >