Hey Yash, Thanks for your further comments. Here are my responses:
1) Deleting offsets via updateOffsets. Hmm, I am not sure this is really necessary to be part of the KIP at this point, and we can always add it later on if needed. I say this for the following reasons: - The size of offsets topic can be controlled by setting appropriate topic retention values and that is a standard practice in Kafka. Sure it's not always possible to get the right values but as I said it is a standard practice. For Connect specifically, there is also a KIP (KIP-943 <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073470>) which is trying to solve the problem of a large connect-offsets topic. So, if that is really the motivation, then these are being addressed separately anyways. - Deleting offsets is not something which should be done very frequently and should be handled with care. That is why KIP-875's mechanism to have users/ cluster admin do this externally is the right thing to do. Agreed this involves some toil but it's not something that should be done on a very regular basis. - There is no stopping connector implementations to send tombstone records as offsets but in practice how many connectors actually do it? Maybe 1 or 2 from what we discussed. - The usecases you highlighted are edge cases at best. As I have been saying, if it is needed we can always add it in the future but that doesn't look like a problem we need to solve upfront. Due to these reasons, I don't think this is a point that we need to stress so much upon. I say this because offsets topic's purging/clean up can be handled either via standard Kafka techniques (point #1 above) or via Connect runtime techniques (Pt #2 above). IMO the problem we are trying to solve via this KIP has been solved by connectors using techniques which have been termed as having higher maintenance cost or a high cognitive load (i.e separate topic) and that needs to be addressed upfront. And since you yourself termed it as a nice to have feature, we can leave it to that and take it up as Future Work. Hope that's ok with you and other community members. 2) Purpose of offsets parameter in updateOffsets The main purpose is to provide the task with the visibility into what partitions are getting their offsets committed. It is not necessary that a task might choose to update offsets everytime it sees that a given source partition is missing from the about to be committed offsets. Maybe it chooses to wait for some X iterations or X amount of time and send out an updated offset for a partition only when such thresholds are breached. Even here we could argue that since it's sending the partition/offsets it can do the tracking on it's own, but IMO that is too much work given that the information is already available via offsets to be committed. Thanks! Sagar.