Hi Ashwin,

Thanks for the KIP! This would be a nice simplification to the process for
migrating connectors enabled by KIP-980, and would also add global support
for a feature I've seen implemented by hand in at least a couple connectors
over the years.


High-level thoughts:

1. If there are leftover offsets from a previous connector, what will the
impact on those offsets (if any) be if a new connector is created with the
same name with initial offsets specified? I can think of at least two
options: we leave those offsets as-are but allow any of the initial offsets
in the new connector request to overwrite them, or we automatically wipe
all existing offsets for the connector first before writing its initial
offsets and then creating it. I have a slight preference for the first
because it's simpler to implement and aligns with existing precedent for
offset handling where we never wipe them unless explicitly requested by the
user or connector, but it could be argued that the second is less likely to
generate footguns for users. Interested in your thoughts!

2. IMO preflight validation (i.e., the "Validate initial_offset before
creating the connector in STOPPED state rejected alternative) is a
must-have for this kind of feature. I acknowledge the possible race
condition (and I don't think it's worth it to track in-flight connector
creation requests in the herder in order to prevent this race, since
ever-blocking Connector operations would be cumbersome to deal with), and
the extra implementation effort. But I don't think either of those tip the
scales far enough to override the benefit of ensuring the submitted offsets
are valid before creating the connector.

3. On the topic of preflight validation--I also think it's important that
we validate both the connector config and the initial offsets before either
creating the connector or storing the initial offsets. I don't think this
point requires any changes to the KIP that aren't already proposed with
point 2. above, but wanted to see if we could adopt this as a primary goal
for the design of the feature and keep it in mind with future changes.

4. In the proposed changes section, the first step is "Create a connector
in STOPPED state", and the second step is "Validate the offset...". What
exactly is entailed by "Create a connector"? Submit the config to the
config topic (presumably after a preflight validation of the config)?
Participate in the ensuing rebalance? I'm a little hesitant to start
performing rebalances inside REST requests, wondering if we can find a
lighter-weight way to implement this.


Nits:

5. You can remove the italicized "This page is meant as a template for
writing a KIP" section after the table of contents.

6. Can you file a JIRA ticket for this change and add a link to it in the
KIP under the status section?

7. What do you think about changing the name for the new field from
"initial_offset" (singular) to "initial_offsets" (plural)? This is
especially relevant for connectors that read from multiple source
partitions, like MM2 and the various JDBC source connectors out there.

8. IMO it's not necessary to include this section: "Please note that sink
and source connectors have different schemas for offset." While it's
technically true that the fields of the objects inside the "partition" and
"offset" fields will likely differ between sink and source connectors,
they'll also likely differ in the exact same way across different sink
connectors. I think it's enough to link to the relevant section in KIP-875
on the details for the format.

9. Instead of "Connector-defined source partition" and "Connector-defined
source offset" in the comments for the sample connector creation body
(which aren't strictly accurate for sink connectors), can we say something
like "Source partition" and "Desired initial offset"?

10. In the compatibility section the KIP states that "This new feature will
use the current OffsetStorageWriter." IMO we should refrain from referring
to internal API class names in KIPs when possible, since those class names
may change and users may also mistakenly assume that they're part of the
public API. Can we say something like "This new feature will reuse existing
logic for storing offsets" instead?


Thanks again for the KIP!

Cheers,

Chris

On Thu, Nov 30, 2023 at 2:26 AM Ashwin <apan...@confluent.io.invalid> wrote:

> Hi all,
>
> I'd like to begin discussion on KIP-995 which proposes to allow users
> to specify initial offset as part of the request to create a connector
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-995%3A+Allow+users+to+specify+initial+offsets+while+creating+connectors
>
> During the discussion for KIP-980
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> >,
> which proposed the creation of connectors in STOPPED state, there was
> a suggestion to also allow setting the initial offset for a connector
> in the connector creation API. The proposal was deemed valid
> <https://lists.apache.org/thread/tq79l3975hy69wsycvwoh9n8gbp5j26f>
> (point no.4) but was deferred to a future KIP. This KIP proposes to
> implement that change and builds upon the changes introduced in
> KIP-875 <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Alteringoffsets(request)
> >
> and KIP-980 <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> >.
>
> Thanks,
> Ashwin
>

Reply via email to