Hi Sergio,

Thanks for the explanation! Very clear!
I think we should put this example and explanation into the KIP.

Other comments:
1. If *max-batches-size* is so small that no records are output, will we
print any information for the user?
2. After your explanation, I guess the use of *max-batches-size* won't
conflict with *max-message-size*, right?
That is, the user can set both arguments at the same time. Is that correct?
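
A rough sketch of the difference as I understand it from your explanation.
This is a toy Python model over a list of batch sizes, not the
DumpLogSegments code. Both function names are made up for illustration, and
I am assuming that a batch which would push the total past the limit is not
printed:

```python
def filter_by_max_message_size(batch_sizes, max_message_size):
    """Per-batch filter: scan every batch, skip the ones above the limit."""
    return [size for size in batch_sizes if size <= max_message_size]


def take_by_max_batches_size(batch_sizes, max_batches_size):
    """Cumulative budget: take leading batches, stop once the budget is used."""
    taken = []
    total = 0
    for size in batch_sizes:
        if total + size > max_batches_size:
            break  # stop early; the rest of the segment is never read
        taken.append(size)
        total += size
    return taken


# Segment from the example: 150 batches of 500 bytes, then 150 of 2000 bytes.
segment = [500] * 150 + [2000] * 150

filtered = filter_by_max_message_size(segment, 1000)  # the 150 small batches
limited = take_by_max_batches_size(segment, 1000)     # only the first 2 batches
```

Under the same assumption, a budget below 500 bytes gives an empty result in
the second function, which is the case question 1 above asks about.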

Thank you.
Luke


On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
<sergio.troi...@adevinta.com.invalid> wrote:

> Hey Luke,
>
> Thanks for the interest. It is a good question, so let me explain:
>
> *max-message-size* is a filter on the size of each batch. For example, if
> I set --max-message-size to 1000 bytes and my log segment has 300 batches,
> 150 of which are 500 bytes each and the other 150 are 2000 bytes each,
> then the script will skip the last 150, as each of those batches is larger
> than the limit.
>
> On the other hand, following the same example with *max-batches-size* set
> to 1000 bytes, it will only print the first 2 batches (500 bytes each)
> and stop. This avoids reading the whole file.
>
>
> Also, if all of them are smaller than 1000 bytes, it will end up printing
> all the batches.
> The idea of my change is to limit the *number* of batches regardless of
> their size.
>
> I hope this reply helps.
> Best regards.
>
> On Sat, 5 Mar 2022 at 08:00, Luke Chen <show...@gmail.com> wrote:
>
> > Hi Sergio,
> >
> > Thanks for the KIP!
> >
> > One question:
> > I saw there's a `max-message-size` argument that seems to do the same
> > thing as you want.
> > Could you help explain the difference between `max-message-size` and
> > `max-batches-size`?
> >
> > Thank you.
> > Luke
> >
> > On Sat, Mar 5, 2022 at 3:21 AM Kirk True <k...@mustardgrain.com> wrote:
> >
> > > Hi Sergio,
> > >
> > > > Thanks for the KIP. I don't know anything about the log segment
> > > > internals, but the logic and implementation seem sound.
> > >
> > > Three questions:
> > > >  1. Since the --max-batches-size unit is bytes, does it matter if that
> > > > size doesn't align to a record boundary?
> > > >  2. Can you add a check to make sure that --max-batches-size doesn't
> > > > allow the user to pass in a negative number?
> > > >  3. Can you add/update any unit tests related to the DumpLogSegments
> > > > arguments?
> > > Thanks,
> > > Kirk
> > >
> > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> > > >
> > >
> >
>
