How fast does the in-memory cache grow? As a random datapoint:

10 months ago we set our offsets.retention.minutes to 1 year, so for the past 10 months we have essentially not expired any offsets. Via JMX, one of our brokers reports:

  kafka.coordinator.group:type=GroupMetadataManager,name=NumOffsets  Value=153552

I don't know how that maps into memory usage. Are the keys dependent on topic names and group names? And, of course, that number is highly dependent on cluster usage, so I'm not sure we can generalize anything from it.
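In case it helps anyone gather a similar datapoint, here is roughly how that value can be read programmatically. This is just a sketch: it assumes remote JMX is enabled on the broker (localhost and port 9999 are placeholders; adjust host, port, and security settings for your environment), and the class name is made up.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class NumOffsetsCheck {
    public static void main(String[] args) throws Exception {
        // Assumes the broker JVM exposes remote JMX on localhost:9999
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // The gauge quoted above; Kafka gauges expose a single attribute named "Value"
            ObjectName numOffsets = new ObjectName(
                    "kafka.coordinator.group:type=GroupMetadataManager,name=NumOffsets");
            System.out.println("NumOffsets = " + mbsc.getAttribute(numOffsets, "Value"));
        }
    }
}

The same number should also be visible in jconsole or via kafka.tools.JmxTool if you'd rather not write code.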
-James

> On Nov 15, 2017, at 5:05 PM, Vahid S Hashemian <vahidhashem...@us.ibm.com> wrote:
>
> Thanks Jeff.
>
> I believe the in-memory cache size is currently unbounded.
> As you mentioned, the size of this cache on each broker is a function of
> the number of consumer groups (whose coordinator is on that broker) and
> the number of partitions in each group.
> With compaction in mind, the cache size could be manageable even with the
> current KIP.
> We could also consider implementing KAFKA-4664 to minimize the cache size:
> https://issues.apache.org/jira/browse/KAFKA-5664.
>
> It would be great to hear feedback from others (and committers) on this.
>
> --Vahid
>
>
> From: Jeff Widman <j...@jeffwidman.com>
> To: dev@kafka.apache.org
> Date: 11/15/2017 01:04 PM
> Subject: Re: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets
>
> I thought about this scenario as well.
>
> However, my conclusion was that because __consumer_offsets is a compacted
> topic, this extra clutter from short-lived consumer groups is negligible.
>
> The disk size is the product of the number of consumer groups and the
> number of partitions in the group's subscription. Typically I'd expect
> that for short-lived consumer groups, that number is < 100K.
>
> The one area I wasn't sure of was how the group coordinator's in-memory
> cache of offsets works. Is it a pull-through cache of unbounded size, or
> does it contain all offsets of all groups that use that broker as their
> coordinator? If the latter, there is possibly an OOM risk there. If so, it
> might be worth investigating changing the cache design to a bounded size.
>
> Also, switching to this design means that consumer groups no longer need
> to commit all offsets; they only need to commit the ones that changed. I
> expect in certain cases there will be broker-side performance gains from
> parsing smaller OffsetCommit requests. For example, due to some bad design
> decisions we have a couple of topics with 1500 partitions, of which ~10%
> are regularly used. So 90% of the OffsetCommit request processing is
> unnecessary.
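[Interjecting to make Jeff's point about smaller OffsetCommit requests concrete: with manual commits, the Java consumer can already send only the partitions whose positions actually moved. A rough sketch, with made-up broker address, topic, and group names, and no real processing logic:]

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitOnlyChangedOffsets {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder address
        props.put("group.id", "example-group");            // placeholder group
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                // Collect the next offset to read, but only for partitions that had data
                Map<TopicPartition, OffsetAndMetadata> changed = new HashMap<>();
                for (ConsumerRecord<String, String> record : records) {
                    // ... process the record here, then mark its offset ...
                    changed.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
                }
                if (!changed.isEmpty()) {
                    // The OffsetCommit request carries only these partitions,
                    // not every partition in the assignment.
                    consumer.commitSync(changed);
                }
            }
        }
    }
}

[As far as I understand, auto-commit sends positions for every assigned partition, which is where the wasted parsing Jeff describes comes from.]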
> On Wed, Nov 15, 2017 at 11:27 AM, Vahid S Hashemian <vahidhashem...@us.ibm.com> wrote:
>
>> I'm forwarding this feedback from John to the mailing list, and
>> responding at the same time:
>>
>> John, thanks for the feedback. I agree that the scenario you described
>> could lead to unnecessarily long offset retention for other consumer
>> groups. If we want to address that in this KIP we could either keep the
>> 'retention_time' field in the protocol, or propose a per-group retention
>> configuration.
>>
>> I'd like to ask for feedback from the community on whether we should
>> design and implement a per-group retention configuration as part of this
>> KIP, or keep it simple at this stage and go with one broker-level setting
>> only.
>> Thanks in advance for sharing your opinion.
>>
>> --Vahid
>>
>>
>> From: John Crowley <jdcrow...@gmail.com>
>> To: vahidhashem...@us.ibm.com
>> Date: 11/15/2017 10:16 AM
>> Subject: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets
>>
>> Sorry for the clutter; I first found KAFKA-3806, then KAFKA-4682, and
>> finally this KIP - they have more detail which I'll avoid duplicating here.
>>
>> I think that not starting the expiration until all consumers have ceased,
>> and clearing all offsets at the same time, does clean things up and solves
>> 99% of the original issues - and 100% of my particular concern.
>>
>> A valid use case may still have a periodic application - say, production
>> applications posting to topics all week, and then a weekend batch job
>> which consumes all new messages.
>>
>> Setting offsets.retention.minutes = 10 days does cover this, but at the
>> cost of extra clutter if there are other consumer groups which are truly
>> created/used/abandoned on a frequent basis. Being able to set
>> offsets.retention.minutes on a per-groupId basis allows this to also be
>> covered cleanly, and makes it visible that these groupIds are a special
>> case.
>>
>> But this is relatively minor, and should not delay the original KIP.
>>
>> Thanks,
>>
>> John Crowley
>
>
> --
>
> Jeff Widman
> jeffwidman.com | 740-WIDMAN-J (943-6265)
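P.S. For anyone trying John's weekend-batch example from the quoted thread: 10 days works out to 14,400 minutes (10 * 24 * 60), so with the broker-level knob as it exists today that would be offsets.retention.minutes=14400 in server.properties. The per-groupId override he describes does not exist yet; that is exactly the kind of follow-up this KIP discussion could cover.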