Here's another strange bug that we're seeing after upgrading to Kafka
0.10.1.0: one of our consumer groups is appearing twice in the list, and
appears to belong to two different nodes.

% kafka-consumer-groups.sh --bootstrap-server localhost:40172 --list | sort \
    | uniq -c | sort -n | grep -v '^ *1'
      2 details-log-etl

If I manually send a ListGroups request to each node, the offending
consumer group shows up twice (once as owned by broker ID 1 and once as
owned by broker ID 2). If I manually send an OffsetFetchRequest to Broker
#1 and Broker #2 with the given group name, I get back conflicting
responses:

(from Broker #1):
OffsetFetchResponse_v1(topics=[(topic='tracking.details',
partitions=[(partition=0, offset=85606947, metadata='', error_code=0)])])

(from Broker #2):
OffsetFetchResponse_v1(topics=[(topic='tracking.details',
partitions=[(partition=0, offset=83718751, metadata='', error_code=0)])])

The offset=85606947 response is correct.
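
A minimal sketch of the cross-broker check described above, assuming the
per-broker ListGroups results have already been collected into a plain dict.
The function name and the dict shape are hypothetical and not part of any
Kafka client; the sample data only mirrors the two brokers mentioned here.

```python
from collections import defaultdict


def groups_listed_by_multiple_brokers(groups_by_broker):
    """Return {group: sorted broker ids} for groups that more than one broker lists.

    groups_by_broker is an assumed mapping of broker id -> iterable of group
    names, e.g. gathered by sending ListGroups to each broker separately.
    """
    owners = defaultdict(set)
    for broker_id, groups in groups_by_broker.items():
        for group in groups:
            owners[group].add(broker_id)
    # Keep only groups claimed by two or more brokers.
    return {group: sorted(ids) for group, ids in owners.items() if len(ids) > 1}


# Illustrative input mirroring the situation above: both brokers list the group.
listing = {1: ["details-log-etl"], 2: ["details-log-etl"]}
duplicates = groups_listed_by_multiple_brokers(listing)
print(duplicates)
```

A group that appears in the result is listed by every broker ID shown next to
it, which is the same condition the `uniq -c` pipeline above detects.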

If I use the GroupCoordinatorRequest API, both broker 1 and broker 2 report
that broker 1 is the coordinator. The actual consuming application seems
unaffected and is proceeding as expected using broker 1.
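
A rough sketch of how the conflicting OffsetFetch answers could be flagged for
monitoring, assuming the coordinator's answer is the one to trust (which
matches what I see here, but is an assumption in general). The function name
and input shape are hypothetical; the offsets are the two values quoted above.

```python
def offsets_disagreeing_with_coordinator(coordinator_id, offsets_by_broker):
    """Return {broker_id: offset} for non-coordinator brokers whose answer differs.

    offsets_by_broker is an assumed mapping of broker id -> committed offset
    that the broker returned for one group/topic/partition.
    """
    expected = offsets_by_broker[coordinator_id]
    return {
        broker_id: offset
        for broker_id, offset in offsets_by_broker.items()
        if broker_id != coordinator_id and offset != expected
    }


# Offsets for tracking.details partition 0, as returned by each broker.
offsets = {1: 85606947, 2: 83718751}
stale = offsets_disagreeing_with_coordinator(1, offsets)
print(stale)
```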

This isn't actually breaking anything critical (since, like I said, actual
consumers seem to be doing the right thing), but it's breaking monitoring,
and it concerns me that such a duplicate is possible.

I haven't tried bouncing the consumer yet to see if that fixes it; I
figured I'd e-mail out just in case there was anything else folks wanted me
to look at first.
-- 
James Brown
Engineer
