Hey Vahid,

Since there haven't been any additional comments, perhaps start a new vote?
-Jason

On Tue, Jan 3, 2017 at 2:14 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> I also prefer option 1.
>
> Ismael
>
> On Fri, Dec 16, 2016 at 7:11 PM, Jason Gustafson <ja...@confluent.io> wrote:
>
> > Thanks Vahid. To clarify the impact of this issue: since we have no way to
> > send an error code in the OffsetFetchResponse when requesting all offsets,
> > we cannot detect when the coordinator has moved to another broker or when
> > it is still in the process of loading the offsets. This means we cannot
> > tell whether there was an error or there were simply no offsets stored for
> > the group. We've considered a few options:
> >
> > 1. Include an error code at the top level of the response. This seems like
> > the cleanest approach. The downside is that clients need to look for
> > response errors in two locations. One small benefit is that many
> > OffsetFetch errors are group-level, so in that case we can avoid the need
> > to return responses for all the requested partitions.
> > 2. Sort of hacky, but we could insert a "dummy" partition into the
> > response so that we have somewhere to return an error code.
> > 3. Include no error code, but use a null array in the response to indicate
> > that there was some error. If there was no error and the group simply had
> > no partitions, we return an empty array. I guess in this case, if the
> > client receives a null array in the response, it should assume the worst,
> > rediscover the coordinator and try again.
> >
> > My preference is the first one. Not sure if there are any other ideas?
> >
> > -Jason
> >
> > On Thu, Dec 15, 2016 at 3:02 PM, Vahid S Hashemian <vahidhashem...@us.ibm.com> wrote:
> >
> > > Hi all,
> > >
> > > Even though KIP-88 was recently approved, due to a limitation that comes
> > > with the protocol change it proposes, I'll have to re-open it to
> > > address the problem.
> > > I'd like to thank Jason Gustafson for catching this issue.
> > >
> > > I'll explain this in the KIP as well, but to summarize, KIP-88 suggests
> > > adding the option of passing a "null" array in the OffsetFetch request to
> > > query all existing offsets for a consumer group. It does not suggest any
> > > modification to the OffsetFetch response.
> > >
> > > In the existing protocol, group or coordinator related errors are reported
> > > along with each partition in the OffsetFetch response.
> > >
> > > If there are partitions in the request, they are guaranteed to appear in
> > > the response (there could be an error code associated with each). So if
> > > there is an error, it is reported back by being attached to some partition
> > > in the request.
> > > If an empty array is passed, no error is reported (no matter what the
> > > group or coordinator status is). The response comes back with an empty
> > > list.
> > >
> > > With the proposed change in KIP-88 we could have a scenario in which a
> > > null array is sent in the OffsetFetch request and, due to some error (for
> > > example, if the coordinator has just started and hasn't yet caught up with
> > > the offsets topic), an empty list is returned in the OffsetFetch response
> > > (the group may or may not actually be empty). The problem is that in
> > > situations like this no error can be returned in the response, because
> > > there is no partition to attach the error to.
> > >
> > > I'll update the KIP with more details and propose adding a top-level
> > > "error_code" to the OffsetFetch response schema that can be used to
> > > report group-related errors (instead of reporting those errors with each
> > > individual partition).
> > >
> > > I apologize if this causes any inconvenience.
> > >
> > > Feedback and comments are always welcome.
> > >
> > > Thanks.
> > > --Vahid
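To make the difference between option 1 (top-level error code) and option 3 (null array on error) concrete, here is a minimal client-side sketch in Python for a "fetch all offsets" request sent with a null partition array. The classes, function names and exception type are hypothetical and do not come from any Kafka client; only the error-code names and values are meant to mirror the Kafka protocol error table, and they should be checked against the spec before relying on them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Error-code names/values intended to mirror the Kafka protocol error table
# (assumption: verify against the protocol spec).
NONE = 0
COORDINATOR_LOAD_IN_PROGRESS = 14
NOT_COORDINATOR = 16


@dataclass
class PartitionData:
    """One partition entry in a hypothetical OffsetFetch response model."""
    topic: str
    partition: int
    offset: int
    error_code: int = NONE


@dataclass
class OffsetFetchResponse:
    """Option 1: group-level errors are reported once, at the top level."""
    error_code: int = NONE
    partitions: List[PartitionData] = field(default_factory=list)


class OffsetFetchError(Exception):
    """Signals an error the client must handle (e.g. rediscover and retry)."""


def _collect(partitions: List[PartitionData]) -> Dict[Tuple[str, int], int]:
    # Per-partition error codes are still checked, as in the existing protocol.
    offsets: Dict[Tuple[str, int], int] = {}
    for p in partitions:
        if p.error_code != NONE:
            raise OffsetFetchError(p.error_code)
        offsets[(p.topic, p.partition)] = p.offset
    return offsets


def committed_offsets(resp: OffsetFetchResponse) -> Dict[Tuple[str, int], int]:
    """Option 1: check the top-level code first, then the partitions."""
    # With a null request array there may be no partition to attach an error
    # to, so group/coordinator errors can only surface here.
    if resp.error_code != NONE:
        raise OffsetFetchError(resp.error_code)
    # An empty dict now unambiguously means "no offsets stored for the group".
    return _collect(resp.partitions)


def committed_offsets_option3(
    partitions: Optional[List[PartitionData]],
) -> Dict[Tuple[str, int], int]:
    """Option 3: no error code; a null array means "some error occurred"."""
    if partitions is None:
        # The cause is unknown, so assume the worst and rediscover/retry.
        raise OffsetFetchError(None)
    return _collect(partitions)


if __name__ == "__main__":
    ok = OffsetFetchResponse(partitions=[PartitionData("orders", 0, 42)])
    print(committed_offsets(ok))  # {('orders', 0): 42}
    print(committed_offsets(OffsetFetchResponse()))  # {}
    try:
        committed_offsets(OffsetFetchResponse(error_code=COORDINATOR_LOAD_IN_PROGRESS))
    except OffsetFetchError as e:
        print("retry after error", e.args[0])  # retry after error 14
```

The sketch reflects the trade-off described in the thread: option 1 costs clients a second place to look for errors, but an empty result is no longer ambiguous and the specific group-level error is preserved, whereas option 3 only tells the client that something went wrong.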