Hi Greg,

Thanks for your thoughts.
RE your design questions:

1. The responses in the REST API may grow fairly large for sink connectors that consume from a large number of Kafka topic partitions, and source connectors that store a wide range of source partitions. If there is a large amount of data for a single source partition for a source connector, we will only show the latest committed offset (that the worker has read from the offsets topic) for that source partition. We could consider adding some kind of pagination API to put a default upper bound on the size of responses and allow (or really, require) users to issue multiple requests in order to get a complete view of the offsets of connectors with a large number of partitions. My initial instinct is to hold off on introducing this complexity, though. I can imagine some of the responses getting a little ugly to view raw in a terminal or a browser, but I'm not sure we'll see more significant issues than that. If we receive feedback that this is a serious-enough issue, then we can revisit it in a follow-up KIP for offsets V2. In the meantime, it's always possible to implement this behavior via a REST extension if it's a blocker for some users. Perhaps to make that alternative slightly easier to implement, we can add a contract to the API that responses will always be sorted lexicographically by source partition for source connectors, or by Kafka topic (sub-sorting by partition) for sink connectors (a rough illustration of what such a sorted response might look like is sketched at the end of this message). What are your thoughts? If this strategy makes sense I can add it to the future work section.

2. The new STOPPED state should "just work" with the rebalancing algorithm, since it'll be implemented under the hood by publishing an empty set of task configs to the config topic for the connector. That should be enough to trigger a rebalance and to cause no tasks for the connector to be allocated across the cluster during that rebalance, regardless of the protocol (eager or incremental) that's in use.

RE your implementation questions:

1. It's mostly a matter of convenience; we can issue a single admin request to delete the group, rather than having to identify every topic partition the consumer group has committed offsets for and then issue a follow-up request to delete the offsets for that group (a minimal sketch of the two approaches is included at the end of this message). I made note of this detail in the KIP to make sure that we were comfortable with completely removing the consumer group instead of wiping its offsets, since it seems possible that some users may find that behavior unexpected.

2. The idea is to only perform zombie fencing when we know that it is necessary (a principle borrowed from KIP-618), so in this case, we'll only do it in response to an offsets reset request, and not when a connector is stopped. After being stopped, it's possible that the connector gets deleted, in which case a proactive round of fencing would have served no benefit. It's also worth noting that publishing an empty set of task configs is not the same as tombstoning existing task configs; putting a connector into the STOPPED state should require no tombstones to be emitted to the config topic.

Cheers,

Chris

On Thu, Oct 13, 2022 at 6:26 PM Greg Harris <greg.har...@aiven.io.invalid> wrote:

> Hey Chris,
>
> Thanks for the KIP!
>
> I think this is an important feature for both development and operations use-cases, and it's an obvious gap in the REST feature set.
> I also appreciate the incremental nature of the KIP and the future extensions that will now be possible.
>
> I had a couple of questions about the design and its extensibility:
>
> 1. How do you imagine the API will behave with connectors that have extremely large numbers of partitions (thousands or more) and/or source connectors with large amounts of data per partition?
>
> 2. Does the new STOPPED state need any special integration with the rebalance subsystem, or can the rebalance algorithms remain ignorant of the target state of connectors?
>
> And about the implementation:
>
> 1. For my own edification, what is the difference between deleting a consumer group and deleting all known offsets for that group? Does deleting the group offer better/easier atomicity?
>
> 2. For EOS sources, will stopping the connector and tombstoning the task configs perform a fence-out, or will that fence-out only occur when performing the offsets DELETE operation?
>
> Thanks!
> Greg
>
> On 2022/10/13 20:52:26 Chris Egerton wrote:
> > Hi all,
> >
> > I noticed a fairly large gap in the first version of this KIP that I published last Friday, which has to do with accommodating connectors that target different Kafka clusters than the one that the Kafka Connect cluster uses for its internal topics, and source connectors with dedicated offsets topics. I've since updated the KIP to address this gap, which has substantially altered the design. Wanted to give a heads-up to anyone that's already started reviewing.
> >
> > Cheers,
> >
> > Chris
> >
> > On Fri, Oct 7, 2022 at 1:29 PM Chris Egerton <ch...@aiven.io> wrote:
> >
> > > Hi all,
> > >
> > > I'd like to begin discussion on a KIP to add offsets support to the Kafka Connect REST API:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect
> > >
> > > Cheers,
> > >
> > > Chris
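To make the sorting contract floated above a bit more concrete, here is a purely illustrative sketch of what a sorted offsets response for a sink connector might look like. The field names and values here are assumptions for the sake of the example and not the format defined by the KIP; the only point is the ordering, lexicographic by Kafka topic with a sub-sort by partition:

{
  "offsets": [
    { "partition": { "kafka_topic": "orders",   "kafka_partition": 0 }, "offset": { "kafka_offset": 512 } },
    { "partition": { "kafka_topic": "orders",   "kafka_partition": 1 }, "offset": { "kafka_offset": 497 } },
    { "partition": { "kafka_topic": "payments", "kafka_partition": 0 }, "offset": { "kafka_offset": 1024 } }
  ]
}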
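And on the question of deleting a consumer group versus deleting all of its known offsets, a minimal sketch using the standard Java Admin client. The class and helper names are made up for illustration; only the Admin calls themselves (deleteConsumerGroups, listConsumerGroupOffsets, deleteConsumerGroupOffsets) are the existing client API. It just shows why group deletion is the more convenient route: it is a single request, whereas wiping offsets requires a list-then-delete round trip.

import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class SinkOffsetResetSketch {

    // Option 1: delete the group outright. A single request removes the
    // group and, with it, every offset it has committed.
    static void deleteGroup(Admin admin, String groupId)
            throws ExecutionException, InterruptedException {
        admin.deleteConsumerGroups(Collections.singleton(groupId)).all().get();
    }

    // Option 2: keep the group but wipe its offsets. This requires first
    // discovering every topic partition the group has committed offsets for,
    // then issuing a follow-up request to delete those offsets.
    static void deleteOffsetsOnly(Admin admin, String groupId)
            throws ExecutionException, InterruptedException {
        Map<TopicPartition, OffsetAndMetadata> committed = admin
                .listConsumerGroupOffsets(groupId)
                .partitionsToOffsetAndMetadata()
                .get();
        admin.deleteConsumerGroupOffsets(groupId, committed.keySet()).all().get();
    }
}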