I also prefer option 1.

Ismael
On Fri, Dec 16, 2016 at 7:11 PM, Jason Gustafson <ja...@confluent.io> wrote:

> Thanks Vahid. To clarify the impact of this issue, since we have no way to
> send an error code in the OffsetFetchResponse when requesting all offsets,
> we cannot detect when the coordinator has moved to another broker or when
> it is still in the process of loading the offsets. This means we cannot
> tell if there was an error or if there were just no offsets stored for
> the group. We've considered a few options:
>
> 1. Include an error code at the top level of the response. This seems like
> the cleanest approach. The downside is that clients need to check for
> errors in two places in the response. One small benefit is that many
> OffsetFetch errors are group-level, so in that case, we can avoid
> returning responses for all the requested partitions.
> 2. Sort of hacky, but we could insert a "dummy" partition into the response
> so that we have somewhere to return an error code.
> 3. Include no error code, but use a null array in the response to indicate
> that there was some error. If there was no error, and the group simply had
> no partitions, then we return an empty array. I guess in this case, if the
> client receives a null array in the response, it should assume the worst
> and rediscover the coordinator and try again.
>
> My preference is the first one. Not sure if there are any other ideas?
>
> -Jason
>
> On Thu, Dec 15, 2016 at 3:02 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
> > Hi all,
> >
> > Even though KIP-88 was recently approved, due to a limitation that comes
> > with the proposed protocol change in KIP-88 I'll have to re-open it to
> > address the problem.
> > I'd like to thank Jason Gustafson for catching this issue.
> >
> > I'll explain this in the KIP as well, but to summarize, KIP-88 suggests
> > adding the option of passing a "null" array in the OffsetFetch request to
> > query all existing offsets for a consumer group. It does not suggest any
> > modification to the OffsetFetch response.
> >
> > In the existing protocol, group or coordinator related errors are
> > reported along with each partition in the OffsetFetch response.
> >
> > If there are partitions in the request, they are guaranteed to appear in
> > the response (there could be an error code associated with each). So if
> > there is an error, it is reported back by being attached to some
> > partition in the request.
> > If an empty array is passed, no error is reported (no matter what the
> > group or coordinator status is). The response comes back with an empty
> > list.
> >
> > With the proposed change in KIP-88 we could have a scenario in which a
> > null array is sent in the OffsetFetch request, and due to some error (for
> > example if the coordinator just started and hasn't caught up yet with the
> > offset topic), an empty list is returned in the OffsetFetch response (the
> > group may or may not actually be empty). The issue is that in situations
> > like this no error can be returned in the response because there is no
> > partition to attach the error to.
> >
> > I'll update the KIP with more details and propose to add to the
> > OffsetFetch response schema an "error_code" at the top level that can be
> > used to report group related errors (instead of reporting those errors
> > with each individual partition).
> >
> > I apologize if this causes any inconvenience.
> >
> > Feedback and comments are always welcome.
> >
> > Thanks.
> > --Vahid
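
For anyone following the thread, here is a rough sketch of how a client might handle option 1 (a top-level error code) when it asks for all offsets of a group. It is only an illustration of the idea discussed above, not the actual client code: the class `OffsetFetchResponse`, the function `next_action`, the `Action` enum, and the numeric error values are all hypothetical names and placeholders made up for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Tuple

# Hypothetical error codes for this sketch; NOT the values used by the
# real Kafka protocol.
NONE = 0
COORDINATOR_LOADING = 1
NOT_COORDINATOR = 2


class Action(Enum):
    USE_OFFSETS = "use offsets"
    RETRY_LATER = "retry later"
    REDISCOVER_COORDINATOR = "rediscover coordinator"


@dataclass
class OffsetFetchResponse:
    # Proposed top-level field for group-level errors (option 1).
    error_code: int = NONE
    # (topic, partition) -> committed offset; empty when the group has none.
    offsets: Dict[Tuple[str, int], int] = field(default_factory=dict)


def next_action(response: OffsetFetchResponse) -> Action:
    """Decide what a client does with an 'all offsets' response."""
    if response.error_code == COORDINATOR_LOADING:
        # Coordinator is still loading the offsets topic: back off and retry.
        return Action.RETRY_LATER
    if response.error_code == NOT_COORDINATOR:
        # Coordinator moved to another broker: look it up again.
        return Action.REDISCOVER_COORDINATOR
    if response.error_code != NONE:
        raise ValueError(f"unexpected error code {response.error_code}")
    # No error: an empty map can now be read as "no offsets stored".
    return Action.USE_OFFSETS
```

With a top-level code, an empty `offsets` map and `error_code == NONE` can be taken to mean the group has no committed offsets, which is the case that cannot be told apart from an error when errors are only attached to partitions.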