Hi Sergio,

Thanks for asking.
Since it's been discussed for weeks, you can start the vote anytime.

Thank you.
Luke

On Fri, Mar 25, 2022 at 10:40 AM Sergio Troiano
<sergio.troi...@adevinta.com.invalid> wrote:

> Hey guys,
>
>
> What is the next step? Who decides when it is time for voting?
>
>
> Thanks!
>
> Sent from my iPhone
>
> > On 8 Mar 2022, at 19:57, Sergio Daniel Troiano <
> sergio.troi...@adevinta.com> wrote:
> >
> > 
> > Hi Michael,
> >
> > Yes, it's a good idea and I considered it, the main problem is the
> FileRecords class does not accept number of batches as a parameter, it
> accepts bytes instead, so if we want to do so either we redesign a core
> class or we create a new one.
> > One of the pretty things (I consider) about this change which will bring
> a huge benefit is the change in the code is pretty small and it's on the
> final script, it does not require any deep change in a core library.
> >
> > An alternative which requires a big change as well without touching the
> FileRecords class would be accept number of batches as parameter, then call
> the FileRecords.slice() with a small value (bytes) count the batches, see
> if we can satisfy the number of batches, if not then we call it again and
> again until we reach the amount of batches. This will require a lot of code
> as well
> >
> > So long story short, the proposal change is quite small, it uses the
> current classes and has a big benefit.
> >
> > Maybe in the future we could consider the FileRecords class to support
> getting the amount of batches as parameters and we encapsulate this logic
> in the proper class (FileRecords)
> > What do you think?
> >
> > Thanks
> > Sergio
> > Thanks
> > Sergio
> >
> >> On Tue, 8 Mar 2022 at 18:32, Mickael Maison <mickael.mai...@gmail.com>
> wrote:
> >> Hi Sergio,
> >>
> >> Thanks for the KIP. Instead of specifying the size in bytes, have you
> >> considered specifying it in terms of number of batches? I think it's a
> >> bit more user friendly than a size in raw bytes.
> >> For example: --num-batches: The number of batches to read from the log
> segment.
> >>
> >> Thanks,
> >> Mickael
> >>
> >>
> >> On Tue, Mar 8, 2022 at 5:27 AM Sergio Daniel Troiano
> >> <sergio.troi...@adevinta.com.invalid> wrote:
> >> >
> >> > Hi Luke,
> >> >
> >> > Make sense, done!
> >> >
> >> > Thank you.
> >> > Sergio Troiano
> >> >
> >> > On Tue, 8 Mar 2022 at 03:02, Luke Chen <show...@gmail.com> wrote:
> >> >
> >> > > Hi Sergio,
> >> > >
> >> > > > I don't want this to minimize the main feature I want to deploy
> as I
> >> > > think the
> >> > > message size limit is not as important as the limiting the amount of
> >> > > batches.
> >> > >
> >> > > Agree! Let's focus on the feature of limiting the batch amounts.
> >> > >
> >> > > One more comment to the KIP:
> >> > > 1. Could you put the new parameter description into the KIP
> proposed change
> >> > > section? That would make it much clear.
> >> > >
> >> > >
> >> > > Thank you.
> >> > > Luke
> >> > >
> >> > > On Mon, Mar 7, 2022 at 8:44 PM Sergio Daniel Troiano
> >> > > <sergio.troi...@adevinta.com.invalid> wrote:
> >> > >
> >> > > > hey Luke,
> >> > > >
> >> > > > I am interested in expanding the KIP scope but I am a bit
> concerned this
> >> > > > could create a lot of noise and confusion as they look like very
> similar
> >> > > > parameters, I agree this is a small change, so I think if I do it
> >> > > properly
> >> > > > it should not be a problem at all, I just will need a couple more
> of days
> >> > > > as I want to create the proper tests as well.
> >> > > >
> >> > > > I have a doubt about editing the KIP, I mean should I add this as
> a new
> >> > > > feature as well?, should I describe this as a side effect
> finding? I
> >> > > don't
> >> > > > want this to minimize the main feature I want to deploy as I
> think the
> >> > > > message size limit is not as important as the limiting the amount
> of
> >> > > > batches.
> >> > > >
> >> > > > It is up to you, if you guys consider we must add this in this
> KIP then I
> >> > > > will be happy to do it. 😀
> >> > > >
> >> > > > Best regards.
> >> > > > Sergio Troiano
> >> > > >
> >> > > > On Mon, 7 Mar 2022 at 02:01, Luke Chen <show...@gmail.com> wrote:
> >> > > >
> >> > > > > Hi Sergio,
> >> > > > >
> >> > > > > Thanks for your explanation.
> >> > > > > Make sense to me.
> >> > > > >
> >> > > > > > Only interesting thing that I have just found is
> *max-message-size
> >> > > *is
> >> > > > > not
> >> > > > > used while dump logs are requested, instead it is used by
> dumpIndex
> >> > > > >
> >> > > > > Are you interested in expanding the scope of this KIP to
> include the
> >> > > > > *max-message-size* in dumping logs?
> >> > > > > I think it's good because it will also be a small change, and
> no need
> >> > > to
> >> > > > go
> >> > > > > through another KIP discussing/voting process. But I'm fine if
> you want
> >> > > > to
> >> > > > > keep this KIP as is, and create another JIRA ticket for future
> work.
> >> > > > >
> >> > > > > Thank you.
> >> > > > > Luke
> >> > > > >
> >> > > > > On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
> >> > > > > <sergio.troi...@adevinta.com.invalid> wrote:
> >> > > > >
> >> > > > > > hey Luke,
> >> > > > > >
> >> > > > > > Let me answer them:
> >> > > > > > 1. If the *max-batches-size* is too small that results in no
> records
> >> > > > > > output, will we output any information to the user?
> >> > > > > >
> >> > > > > > If the  *max-batches-size*is even smaller than the first
> batch then
> >> > > > there
> >> > > > > > won't be any output, this is handled by FileRecords class, I
> think
> >> > > this
> >> > > > > is
> >> > > > > > correct as this is the expected behaviour.
> >> > > > > >
> >> > > > > > 2. After your explanation, I guess the use of
> *max-batches-size*
> >> > > won't
> >> > > > > > conflict with *max-message-size*, right?
> >> > > > > >
> >> > > > > > Only interesting thing that I have just found is
> *max-message-size
> >> > > *is
> >> > > > > not
> >> > > > > > used while dump logs are requested, instead it is used by
> dumpIndex
> >> > > > > > <
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150
> >> > > > > > >
> >> > > > > > so,
> >> > > > > > this feature is not working for dumping logs, even though I
> checked
> >> > > if
> >> > > > > > there is a unit test for this and there is not any. Maybe we
> could
> >> > > > > create a
> >> > > > > > ticket for this?
> >> > > > > >
> >> > > > > > Regards.
> >> > > > > >
> >> > > > > >
> >> > > > > > On Sat, 5 Mar 2022 at 10:36, Luke Chen <show...@gmail.com>
> wrote:
> >> > > > > >
> >> > > > > > > Hi Sergio,
> >> > > > > > >
> >> > > > > > > Thanks for the explanation! Very clear!
> >> > > > > > > I think we should put this example and explanation into KIP.
> >> > > > > > >
> >> > > > > > > Other comments:
> >> > > > > > > 1. If the *max-batches-size* is too small that results in no
> >> > > records
> >> > > > > > > output, will we output any information to the user?
> >> > > > > > > 2. After your explanation, I guess the use of
> *max-batches-size*
> >> > > > won't
> >> > > > > > > conflict with *max-message-size*, right?
> >> > > > > > > That is, user can set the 2 arguments at the same time. Is
> that
> >> > > > > correct?
> >> > > > > > >
> >> > > > > > > Thank you.
> >> > > > > > > Luke
> >> > > > > > >
> >> > > > > > > Thank you.
> >> > > > > > > Luke
> >> > > > > > >
> >> > > > > > > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
> >> > > > > > > <sergio.troi...@adevinta.com.invalid> wrote:
> >> > > > > > >
> >> > > > > > > > hey Luke,
> >> > > > > > > >
> >> > > > > > > > thanks for the interest, it is a good question, please
> let me
> >> > > > explain
> >> > > > > > > you:
> >> > > > > > > >
> >> > > > > > > > *max-message-size *a filter for the size of each batch,
> so for
> >> > > > > example
> >> > > > > > if
> >> > > > > > > > Iset --max-message-size 1000 bytes and my segment log has
> 300
> >> > > > > batches,
> >> > > > > > > 150
> >> > > > > > > > of them has a size of 500 bytes  and the other 150 has a
> size of
> >> > > > 2000
> >> > > > > > > bytes
> >> > > > > > > > then the script will skip the las 150 ones as each batch
> is
> >> > > heavier
> >> > > > > > than
> >> > > > > > > > the limit.
> >> > > > > > > >
> >> > > > > > > > In the other hand following the same example above with
> >> > > > > > *max-batches-size
> >> > > > > > > > *set
> >> > > > > > > > to 1000 bytes it will only print out the first 2 batches
> (500
> >> > > bytes
> >> > > > > > each)
> >> > > > > > > > and stop, This will avoid reading the whole file
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Also if all of them are smaller than 1000 bytes it will
> end up
> >> > > > > printing
> >> > > > > > > out
> >> > > > > > > > all the batches.
> >> > > > > > > > The idea of my change is to limit the *amount* of batches
> no
> >> > > matter
> >> > > > > > their
> >> > > > > > > > size.
> >> > > > > > > >
> >> > > > > > > > I hope this reply helps.
> >> > > > > > > > Best regards.
> >> > > > > > > >
> >> > > > > > > > On Sat, 5 Mar 2022 at 08:00, Luke Chen <show...@gmail.com
> >
> >> > > wrote:
> >> > > > > > > >
> >> > > > > > > > > Hi Sergio,
> >> > > > > > > > >
> >> > > > > > > > > Thanks for the KIP!
> >> > > > > > > > >
> >> > > > > > > > > One question:
> >> > > > > > > > > I saw there's a `max-message-size` argument that seems
> to do
> >> > > the
> >> > > > > same
> >> > > > > > > > thing
> >> > > > > > > > > as you want.
> >> > > > > > > > > Could you help explain what's the difference between
> >> > > > > > `max-message-size`
> >> > > > > > > > and
> >> > > > > > > > > `max-batches-size`?
> >> > > > > > > > >
> >> > > > > > > > > Thank you.
> >> > > > > > > > > Luke
> >> > > > > > > > >
> >> > > > > > > > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True <
> >> > > k...@mustardgrain.com>
> >> > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi Sergio,
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks for the KIP. I don't know anything about the
> log
> >> > > segment
> >> > > > > > > > > internals,
> >> > > > > > > > > > but the logic and implementation seem sound.
> >> > > > > > > > > >
> >> > > > > > > > > > Three questions:
> >> > > > > > > > > >  1. Since the --max-batches-size unit is bytes, does
> it
> >> > > matter
> >> > > > if
> >> > > > > > > that
> >> > > > > > > > > > size doesn't align to a record boundary?
> >> > > > > > > > > >  2. Can you add a check to make sure that
> --max-batches-size
> >> > > > > > doesn't
> >> > > > > > > > > allow
> >> > > > > > > > > > the user to pass in a negative number?
> >> > > > > > > > > >  3. Can you add/update any unit tests related to the
> >> > > > > > DumpLogSegments
> >> > > > > > > > > > arguments?
> >> > > > > > > > > > Thanks,
> >> > > > > > > > > > Kirk
> >> > > > > > > > > >
> >> > > > > > > > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel
> Troiano wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
>

Reply via email to