Hey Matthias,

I think the idea is for `session.timeout.ms` to start ticking as soon as
the broker responds with the `member.id`. That way, the broker would
properly expire those consumers and not hold on to too many stale ones.
This isn't mentioned in the KIP, though, so it is worth waiting for Boyang
to confirm.
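
To make that concrete, here is a rough sketch of how I read the idea. It is
my own toy model in Python, not Kafka's coordinator code: `ToyCoordinator`,
`pending`, and `expire_pending` are names I made up for illustration, and
starting the timer when the `member.id` is handed out is exactly the
assumption that still needs confirming.

```python
import itertools

UNKNOWN_MEMBER_ID = ""  # placeholder a client sends on its very first join


class ToyCoordinator:
    """Toy model of the two-step join; not Kafka's real coordinator."""

    def __init__(self, session_timeout_ms):
        self.session_timeout_ms = session_timeout_ms
        self.pending = {}     # member.id -> deadline (ms) for the second join
        self.members = set()  # member.ids that completed the second join
        self._counter = itertools.count(1)

    def expire_pending(self, now_ms):
        # Drop every pending member.id whose deadline has passed.
        for member_id in [m for m, d in self.pending.items() if d <= now_ms]:
            del self.pending[member_id]

    def join_group(self, member_id, now_ms):
        self.expire_pending(now_ms)
        if member_id == UNKNOWN_MEMBER_ID:
            new_id = f"member-{next(self._counter)}"
            # Assumption: the timeout starts ticking when the id is handed out.
            self.pending[new_id] = now_ms + self.session_timeout_ms
            return "MEMBER_ID_REQUIRED", new_id
        if member_id in self.pending:
            # Second join with the assigned id: promote to a real member.
            del self.pending[member_id]
            self.members.add(member_id)
        if member_id in self.members:
            return "NONE", member_id
        return "UNKNOWN_MEMBER_ID", member_id


coordinator = ToyCoordinator(session_timeout_ms=10_000)

# First attempt: the response (or the client) is lost, so lost_id is never used.
_, lost_id = coordinator.join_group(UNKNOWN_MEMBER_ID, now_ms=0)

# The client starts over, gets a fresh id, and completes the second join.
error, fresh_id = coordinator.join_group(UNKNOWN_MEMBER_ID, now_ms=1_000)
final_error, _ = coordinator.join_group(fresh_id, now_ms=1_500)

# Once the timeout elapses, the abandoned id is gone from the broker side.
coordinator.expire_pending(now_ms=10_000)
```

Under that assumption, the broker still generates a new `member.id` on every
retry, but an abandoned id only lives for one session timeout, so the number
of stale entries stays bounded.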

On Wed, Nov 28, 2018 at 3:10 AM Matthias J. Sax <matth...@confluent.io> wrote:

> Thanks for the KIP Boyang.
>
> I guess I am missing something, but I am still learning more details
> about the rebalance protocol, so maybe you can help me out?
>
> Assume a client sends UNKNOWN_MEMBER_ID in its first joinGroup request.
> The broker generates a `member.id` and sends it back via
> `MEMBER_ID_REQUIRED` error response. This response might never reach the
> client or the client fails before it can send the second joinGroup
> request. Thus, a client would need to start over with a new
> UNKNOWN_MEMBER_ID in its joinGroup request. Thus, the broker needs to
> generate a new `member.id` again.
>
> So it seems the problem is moved, but not resolved? The motivation of
> the KIP is:
>
> > The edge case is that if initial join group request keeps failing due to
> > connection timeout, or the consumer keeps restarting,
>
> From my understanding, this KIP moves the issue from the first to the
> second joinGroup request (or broker joinGroup response).
>
> But maybe I am missing something. Can you help me out?
>
>
> -Matthias
>
>
> On 11/27/18 6:00 PM, Boyang Chen wrote:
> > Thanks Stanislav and Jason for the suggestions!
> >
> >
> >> Thanks for the KIP. Looks good overall. I think we will need to bump the
> >> version of the JoinGroup protocol in order to indicate compatibility with
> >> the new behavior. The coordinator needs to know when it is safe to assume
> >> the client will handle the error code.
> >>
> >> Also, I was wondering if we could reuse the REBALANCE_IN_PROGRESS error
> >> code. When the client sees this error code, it will take the memberId from
> >> the response and rejoin. We'd still need the protocol bump since older
> >> consumers do not have this logic.
> >
> > I will add the join group protocol version change to the KIP. Meanwhile,
> > I feel it is better, for understandability, to define a separate error
> > code, since REBALANCE_IN_PROGRESS is not the actual cause of the returned
> > error.
> >
> >> One small question I have is now that we have one and a half round-trips
> >> needed to join in a rebalance (1 full RT addition), is it worth it to
> >> consider increasing the default value of
> >> `group.initial.rebalance.delay.ms`?
> > I guess we could keep it for now. After the KIP-345 and incremental
> > cooperative rebalancing work, we should be safe to deprecate
> > `group.initial.rebalance.delay.ms`. Also, one round trip shouldn't
> > increase the latency too much IMO.
> >
> > Best,
> > Boyang
> > ________________________________
> > From: Stanislav Kozlovski <stanis...@confluent.io>
> > Sent: Wednesday, November 28, 2018 2:32 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-394: Require member.id for initial join group request
> >
> > Hi Boyang,
> >
> > The KIP looks very good.
> > One small question I have is now that we have one and a half round-trips
> > needed to join in a rebalance (1 full RT addition), is it worth it to
> > consider increasing the default value of
> > `group.initial.rebalance.delay.ms`?
> >
> > Best,
> > Stanislav
> >
> > On Tue, Nov 27, 2018 at 5:39 PM Jason Gustafson <ja...@confluent.io> wrote:
> >
> >> Hi Boyang,
> >>
> >> Thanks for the KIP. Looks good overall. I think we will need to bump the
> >> version of the JoinGroup protocol in order to indicate compatibility with
> >> the new behavior. The coordinator needs to know when it is safe to assume
> >> the client will handle the error code.
> >>
> >> Also, I was wondering if we could reuse the REBALANCE_IN_PROGRESS error
> >> code. When the client sees this error code, it will take the memberId from
> >> the response and rejoin. We'd still need the protocol bump since older
> >> consumers do not have this logic.
> >>
> >> Thanks,
> >> Jason
> >>
> >> On Mon, Nov 26, 2018 at 5:47 PM Boyang Chen <bche...@outlook.com> wrote:
> >>
> >>> Hey friends,
> >>>
> >>>
> >>> I would like to start a discussion thread for KIP-394, which tries to
> >>> mitigate the broker cache bursting issue caused by anonymous join group
> >>> requests:
> >>>
> >>>
> >>>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-394%3A+Require+member.id+for+initial+join+group+request
> >>>
> >>>
> >>> Thanks!
> >>>
> >>> Boyang
> >>>
> >>
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
>

-- 
Best,
Stanislav
