Thanks for the KIP, Boyang. I guess I am missing something, but I am still learning the details of the rebalance protocol, so maybe you can help me out?

Assume a client sends UNKNOWN_MEMBER_ID in its first joinGroup request. The broker generates a `member.id` and sends it back via a `MEMBER_ID_REQUIRED` error response. This response might never reach the client, or the client might fail before it can send the second joinGroup request. In either case, the client would need to start over with a new UNKNOWN_MEMBER_ID in its joinGroup request, and the broker would need to generate yet another `member.id`. So it seems the problem is moved, but not resolved?

The motivation of the KIP is:

> The edge case is that if initial join group request keeps failing due to
> connection timeout, or the consumer keeps restarting,

From my understanding, this KIP moves the issue from the first to the second joinGroup request (or the broker's joinGroup response). But maybe I am missing something. Can you help me out?

-Matthias

On 11/27/18 6:00 PM, Boyang Chen wrote:
> Thanks Stanislav and Jason for the suggestions!
>
>
>> Thanks for the KIP. Looks good overall. I think we will need to bump the
>> version of the JoinGroup protocol in order to indicate compatibility with
>> the new behavior. The coordinator needs to know when it is safe to assume
>> the client will handle the error code.
>>
>> Also, I was wondering if we could reuse the REBALANCE_IN_PROGRESS error
>> code. When the client sees this error code, it will take the memberId from
>> the response and rejoin. We'd still need the protocol bump since older
>> consumers do not have this logic.
>
> I will add the join group protocol version change to the KIP. Meanwhile I feel for
> understandability it's better to define a separate error code since
> REBALANCE_IN_PROGRESS is not the actual cause of the returned error.
>
>> One small question I have is now that we have one and a half round-trips
>> needed to join in a rebalance (1 full RT addition), is it worth it to
>> consider increasing the default value of `group.initial.rebalance.delay.ms`?
> I guess we could keep it for now.
> After KIP-345 and incremental cooperative rebalancing
> work we should be safe to deprecate `group.initial.rebalance.delay.ms`. Also
> one round trip shouldn't increase the latency too much IMO.
>
> Best,
> Boyang
> ________________________________
> From: Stanislav Kozlovski <stanis...@confluent.io>
> Sent: Wednesday, November 28, 2018 2:32 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-394: Require member.id for initial join group
> request
>
> Hi Boyang,
>
> The KIP looks very good.
> One small question I have is now that we have one and a half round-trips
> needed to join in a rebalance (1 full RT addition), is it worth it to
> consider increasing the default value of `group.initial.rebalance.delay.ms`?
>
> Best,
> Stanislav
>
> On Tue, Nov 27, 2018 at 5:39 PM Jason Gustafson <ja...@confluent.io> wrote:
>
>> Hi Boyang,
>>
>> Thanks for the KIP. Looks good overall. I think we will need to bump the
>> version of the JoinGroup protocol in order to indicate compatibility with
>> the new behavior. The coordinator needs to know when it is safe to assume
>> the client will handle the error code.
>>
>> Also, I was wondering if we could reuse the REBALANCE_IN_PROGRESS error
>> code. When the client sees this error code, it will take the memberId from
>> the response and rejoin. We'd still need the protocol bump since older
>> consumers do not have this logic.
>>
>> Thanks,
>> Jason
>>
>> On Mon, Nov 26, 2018 at 5:47 PM Boyang Chen <bche...@outlook.com> wrote:
>>
>>> Hey friends,
>>>
>>> I would like to start a discussion thread for KIP-394 which is trying to
>>> mitigate broker cache bursting issue due to anonymous join group requests:
>>>
>>> https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-394%253A%2BRequire%2Bmember.id%2Bfor%2Binitial%2Bjoin%2Bgroup%2Brequest&data=02%7C01%7C%7C8c2c54e07967404f0fa808d65496c9c7%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636789403931186848&sdata=oRbPKzwyDx6SodAaVb3Vv%2FXpJoD09E3%2BdTc0p1qKDEo%3D&reserved=0
>>>
>>> Thanks!
>>>
>>> Boyang
>>>
>>
>
> --
> Best,
> Stanislav
>
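The two-phase join discussed in this thread can be sketched as a toy coordinator model. In this model the first JoinGroup is answered with `MEMBER_ID_REQUIRED` plus a generated `member.id`, and the second JoinGroup carries that id back. The sketch also covers the version gate Jason raised and the restart scenario from the top of the thread. This is a minimal Python sketch under stated assumptions. `ToyCoordinator`, `JoinResponse`, `min_two_phase_version`, and the empty-string `UNKNOWN_MEMBER_ID` are hypothetical names, the version threshold is a placeholder, and none of this is Kafka's actual coordinator code.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for "no member.id yet"; the real wire encoding is not modeled.
UNKNOWN_MEMBER_ID = ""


@dataclass
class JoinResponse:
    error: Optional[str]  # None means the join was accepted
    member_id: str


@dataclass
class ToyCoordinator:
    """Toy model of a two-phase JoinGroup handshake (not Kafka's real coordinator)."""

    client_id: str = "consumer"
    # Hypothetical threshold: requests at or above this version are assumed to
    # understand MEMBER_ID_REQUIRED; older clients keep the one-step join.
    min_two_phase_version: int = 4
    pending: set[str] = field(default_factory=set)  # ids handed out but never used
    members: set[str] = field(default_factory=set)  # ids that completed a join

    def join_group(self, member_id: str, request_version: int) -> JoinResponse:
        if member_id == UNKNOWN_MEMBER_ID:
            new_id = f"{self.client_id}-{uuid.uuid4()}"
            if request_version >= self.min_two_phase_version:
                # Phase 1: hand back an id and ask the client to rejoin with it.
                self.pending.add(new_id)
                return JoinResponse("MEMBER_ID_REQUIRED", new_id)
            # Old protocol: the anonymous request is admitted immediately.
            self.members.add(new_id)
            return JoinResponse(None, new_id)
        if member_id in self.pending:
            # Phase 2: the client came back with the id it was given.
            self.pending.discard(member_id)
            self.members.add(member_id)
            return JoinResponse(None, member_id)
        if member_id in self.members:
            return JoinResponse(None, member_id)
        return JoinResponse("UNKNOWN_MEMBER_ID", member_id)


coord = ToyCoordinator()

# Happy path: a new member needs two JoinGroup requests instead of one.
first = coord.join_group(UNKNOWN_MEMBER_ID, request_version=4)
second = coord.join_group(first.member_id, request_version=4)

# Restart scenario: the client never sends its second request, so each
# fresh attempt makes the coordinator generate yet another id.
for _ in range(3):
    coord.join_group(UNKNOWN_MEMBER_ID, request_version=4)

print(first.error, second.error)               # MEMBER_ID_REQUIRED None
print(len(coord.members), len(coord.pending))  # 1 3
```

The `pending` set is where the concern raised at the top of the thread shows up. The sketch deliberately does not bound or expire it, because how the coordinator should handle those never-used ids is exactly the open question being discussed.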