Hi Chris, I've updated the KIP to call out the parsing logic and the user expectations explicitly. Thanks again for all your feedback on this KIP! I'll wait for a few more days to see if anyone else has comments before kicking off a vote thread.
Thanks, Yash On Thu, Oct 5, 2023 at 10:49 PM Chris Egerton <chr...@aiven.io.invalid> wrote: > Hi Yash, > > Yeah, I think just hardcoding with JSON-first, properties-second is fine. > IMO it's worth calling this out explicitly in the KIP. > > Apart from that, no further comments. LGTM, thanks for the KIP! > > Cheers, > > Chris > > On Thu, Oct 5, 2023 at 6:23 AM Yash Mayya <yash.ma...@gmail.com> wrote: > > > Hi Chris, > > > > Thanks for all your feedback so far! > > > > 3. That's a good question. I was thinking we'd do some "intelligent" > > parsing internally during the implementation (i.e. essentially your last > > option - attempting to parse first as one format, then the other) which > is > > why I didn't include any more details in the KIP itself (where I've only > > outlined the contract changes). This would allow for the smoothest user > > experience IMO and all the heavy lifting will be done in the parsing > logic. > > All the other options seemed either clunky or brittle from the user > > experience point of view. In terms of the actual implementation itself, > > we'd probably want to first try parsing it into the supported JSON > > structures before trying to parse it into Java properties since the Java > > properties format is very permissive (i.e. we won't really see any errors > > on attempting to parse a JSON file into Java properties). > > > > Thanks, > > Yash > > > > On Thu, Oct 5, 2023 at 1:39 AM Chris Egerton <chr...@aiven.io.invalid> > > wrote: > > > > > Hi Yash, > > > > > > Looking great! Few more thoughts: > > > > > > > > > 1. (Downgraded to nit) I still prefer dot-delimitation but it's not a > > > blocker; thanks for addressing my concerns about the name of the field > > and > > > how it may be perceived by users. > > > > > > 2. (Addressed) Thanks for looking into this, and sorry it turned out to > > be > > > a bit of a dead end! I'm convinced that the current proposal is good > > > enough. > > > > > > 3. Can you shed a little more light on how we'll determine whether a > > > connector config should be parsed as JSON or as a properties file? Will > > > this be based on file extension, a command-line flag (which might apply > > to > > > all configs, or individual configs), attempting to parse first as one > > > format then the other, something else? > > > > > > 4. (Addressed) Thanks! Looks great. > > > > > > 6. (Addressed) Awesome, great to hear. The point about laggy connector > > > startup is very convincing; my paranoia is satiated. > > > > > > > > > Cheers, > > > > > > Chris > > > > > > On Wed, Oct 4, 2023 at 5:35 AM Yash Mayya <yash.ma...@gmail.com> > wrote: > > > > > > > Hi Chris, > > > > > > > > Thanks for the quick follow up and the continued insightful > discourse! > > > > > > > > 1. Fair point on the need to differentiate it from the actual state > > > > displayed in the status API, I like the prefix of "initial" to make > > that > > > > differentiation (from your suggested alternatives previously). > > Regarding > > > > the dots vs underscores as delimiters - the new state field will be a > > top > > > > level field in the connector creation request body alongside the > > "config" > > > > map (i.e. it won't be a connector configuration itself), so I think > we > > > > should be using the underscore delimiter for consistency. For now, > I've > > > > updated the KIP to use "initial_state" as the new field's name - let > me > > > > know if you disagree, and I'd be happy to reconsider. > > > > > > > > 2. Hm, I actually hadn't considered the downgrade implications with > > your > > > > proposed single record approach. I agree that it's a bigger downside > > than > > > > writing two records to the config topic. I do understand your > concerns > > > with > > > > the potential for config topic inconsistencies which is why I > proposed > > > > writing the target state first (since the presence of a target state > > for > > > a > > > > connector with no configuration is a benign condition). Also, even in > > the > > > > non-transactional config topic producer case - if there is a failure > > > > between the two writes, the user will be notified of the error > > > > synchronously via the API response (ref - > > > > https://github.com/apache/kafka/pull/12984) and will be able to > safely > > > > retry the operation. I don't see how we'd be able to do a single > record > > > > write approach along with supporting clean downgrades since we'd > either > > > > need to introduce a new record type or add a new field to an existing > > > > record type - neither of which would be recognized as such by an > older > > > > Connect worker. > > > > > > > > > Standalone mode has always supported the REST API, > > > > > and so far FWICTwe've maintained feature parity between > > > > > the two modes > > > > > > > > > add support for JSON files with standalone mode. > > > > > > > > 3. Thanks, I wasn't aware about standalone mode always having > supported > > > the > > > > full REST API - I thought I'd seen some references earlier indicating > > > > otherwise. In that case, I do agree that it makes sense to maintain > > > parity > > > > across both methods of connector creation for user experience > > > consistency. > > > > I really like the idea of updating the standalone mode CLI to be able > > to > > > > parse JSON files (in the same format as the connector creation REST > API > > > > endpoint request body) along with Java properties files since I think > > > that > > > > offers two big benefits. One is that users will be able to copy and > use > > > > examples across both the methods of connector creation (REST API > > requests > > > > with JSON request bodies and JSON files passed to the standalone mode > > > > startup CLI). The second benefit is that any future extensions (such > as > > > the > > > > "offsets" field we've discussed in this thread) would be easily > applied > > > > across both the methods consistently instead of introducing new (and > > > likely > > > > ugly) CLI flags. I've updated the KIP to include this change in the > > > > standalone mode CLI. > > > > > > > > 4. Makes sense, I've added this under a new "Future Work" section in > > the > > > > KIP. > > > > > > > > 6. From what I can tell, there shouldn't be any issues with the lack > of > > > > task configurations in the config topic and it seems to be a > supported > > > > assumption across the Connect code-base that a connector > configuration > > > > could exist without any task configurations for the connector (a > > > situation > > > > that could currently manifest with slow starting connectors, > connectors > > > > that fail during startup, connectors that fail to generate task > > > > configurations, connectors that are paused right after being created > > > etc.). > > > > I did also try out a small prototype before publishing this KIP and > > > things > > > > do work as expected when creating a connector in the PAUSED / STOPPED > > > state > > > > by simply writing the appropriate target state along with the > connector > > > > configuration to the config topic. > > > > > > > > Thanks, > > > > Yash > > > > > > > > On Wed, Oct 4, 2023 at 1:44 AM Chris Egerton <chr...@aiven.io.invalid > > > > > > wrote: > > > > > > > > > Hi Yash, > > > > > > > > > > Thanks for the in-depth discussion! Continuations here: > > > > > > > > > > 1. Regarding delimiters (dots vs. underscores), we use dots in > > > connector > > > > > configs for all runtime-recognized properties, including but not > > > limited > > > > to > > > > > connector.class, tasks.max, key.converter, key.converter.*, and > > > > > transforms.*.type. Regarding the choice of name--I think it's > because > > > the > > > > > concept here is different from the "state" that we display in the > > > status > > > > > API that a different name is warranted. Users can't directly write > > the > > > > > connector's actual state, they can still only specify a target > state. > > > And > > > > > there's no guarantee that a connector will ever enter a target > state > > > > > (whether it's specified at creation time or later), since a failure > > can > > > > > always occur. > > > > > > > > > > 2. It still seems simpler to emit one record instead of two, > > especially > > > > > since we're not guaranteed that the leader will be using a > > > transactional > > > > > producer. I guess I'm just wary of knowingly putting the config > topic > > > in > > > > an > > > > > inconsistent state (target state present without accompanying > > connector > > > > > config), even if it's meant to be only for a brief period. Thinking > > > about > > > > > it some more though, a one-record approach comes with drawbacks in > > the > > > > > downgrade scenario: older workers wouldn't know how to handle the > new > > > > > config format and would just fall back to creating the connector in > > the > > > > > running state. I suppose we should favor the two-record approach > > since > > > > the > > > > > downgrade scenario is more likely than the other failure mode, but > > it'd > > > > be > > > > > nice if we could think of a way to satisfy both concerns. Not a > > > blocker, > > > > > though. > > > > > > > > > > 3. Standalone mode has always supported the REST API, and so far > > FWICT > > > > > we've maintained feature parity between the two modes for > everything > > > > except > > > > > exactly-once source connectors, which would have required > significant > > > > > additional work since we'd have to add support for storing source > > > > connector > > > > > offsets in a Kafka topic instead of on local storage like we > > currently > > > > do. > > > > > I'd really prefer if we could try to maintain feature parity > wherever > > > > > possible--one way we could possibly do that with this KIP is to > also > > > add > > > > > support for JSON files with standalone mode. > > > > > > > > > > 4. Yeah, no need to block on that idea since there are other use > > cases > > > > for > > > > > creating stopped connectors. We can treat it like the option to > > delete > > > > > offsets along with the connector discussed in KIP-875: punt for > now, > > > > > possibly implement later pending user feedback and indication of > > > demand. > > > > > Might be worth adding to a "Future work" section as an indication > > that > > > we > > > > > haven't ruled it out (in which case it'd make sense as a rejected > > > > > alternative) but have chosen not to implement yet. > > > > > > > > > > > > > > > And I had one new thought that's pretty implementation-oriented but > > may > > > > > influence the design slightly: > > > > > > > > > > 6. Right now we write an empty set of task configs to the config > > topic > > > > when > > > > > handling requests to stop a connector in distributed mode. Do we > need > > > to > > > > do > > > > > the same when creating connectors in the stopped state, or add any > > > other > > > > > special logic besides noting the new state in the config topic? Or > is > > > it > > > > > sufficient to write a non-running target state to the config topic > > and > > > > then > > > > > rely on existing logic to simply refuse to generate task configs > for > > > the > > > > > newly-created connector? Is there any chance that the lack of task > > > > configs > > > > > (as opposed to an empty list of task configs) in the config topic > > for a > > > > > connector that exists will cause issues? > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > Chris > > > > > > > > > > On Tue, Oct 3, 2023 at 3:29 AM Yash Mayya <yash.ma...@gmail.com> > > > wrote: > > > > > > > > > > > Hi Chris, > > > > > > > > > > > > Thanks for taking a look at this KIP! > > > > > > > > > > > > 1. I chose to go with simply "state" as that exact term is > already > > > > > exposed > > > > > > via some of the existing REST API responses and would be one that > > > users > > > > > are > > > > > > already familiar with (although admittedly something like > > > > "initial_state" > > > > > > wouldn't be much of a jump). Since it's a field in the request > body > > > for > > > > > the > > > > > > connector creation endpoint, wouldn't it be implied that it is > the > > > > > > "initial" state just like the "config" field represents the > > "initial" > > > > > > configuration? Also, I don't think x.y has been established as > the > > > > field > > > > > > naming convention in the Connect REST API right? From what I can > > > tell, > > > > > x_y > > > > > > is the convention being followed for fields in requests > > > ("kafka_topic" > > > > / > > > > > > "kafka_partition" / "kafka_offset" in the offsets APIs for > > instance) > > > > and > > > > > > responses ("error_count", "kafka_cluster_id", > "recommended_values" > > > > etc.). > > > > > > > > > > > > 2. The connector configuration record is currently used for both > > > > > connector > > > > > > create requests as well as connector config update requests. > Since > > > > we're > > > > > > only allowing configuring the target state for newly created > > > > connectors, > > > > > I > > > > > > feel like it'll be a cleaner separation of concerns to use the > > > existing > > > > > > records for connector configurations and connector target states > > > rather > > > > > > than bundling the "state" and "state.v2" (or equivalent) fields > > into > > > > the > > > > > > connector configuration record. The additional write should be > very > > > > > minimal > > > > > > overhead and the two writes would be an atomic operation for > > Connect > > > > > > clusters that are using a transactional producer for the config > > topic > > > > > > anyway. Thoughts? > > > > > > > > > > > > 3. I was thinking that we'd support standalone mode via the same > > > > > connector > > > > > > creation REST API endpoint changes (addition of the "state" > field). > > > If > > > > > > there is further interest in adding similar support to the > command > > > line > > > > > > method of creating connectors as well, perhaps we could do so in > a > > > > small > > > > > > follow-on KIP? I feel like ever since the standalone mode started > > > > > > supporting the full Connect REST API, the command line method of > > > > creating > > > > > > connectors is more of a "legacy" concept. > > > > > > > > > > > > 4. Yeah, connector / offsets migration was just used as a > > > > representative > > > > > > example of how this feature could be useful - I didn't intend for > > it > > > to > > > > > be > > > > > > the sole purpose of this KIP. However, that said, I really like > the > > > > idea > > > > > of > > > > > > accepting an "offsets" field in the connector creation request > > since > > > > it'd > > > > > > reduce the number of user operations required from 3 (create the > > > > > connector > > > > > > in a STOPPED state; alter the offsets; resume the connector) to > 1. > > > I'd > > > > be > > > > > > happy to either create or review a separate small KIP for that if > > it > > > > > sounds > > > > > > acceptable to you? > > > > > > > > > > > > 5. Whoops, thanks, I hadn't even noticed that that's where I had > > > linked > > > > > to. > > > > > > Fixed! > > > > > > > > > > > > Thanks, > > > > > > Yash > > > > > > > > > > > > On Mon, Oct 2, 2023 at 11:14 PM Chris Egerton > > > <chr...@aiven.io.invalid > > > > > > > > > > > wrote: > > > > > > > > > > > > > Hi Yash, > > > > > > > > > > > > > > Thanks for the KIP! Frankly this feels like an oversight in 875 > > and > > > > I'm > > > > > > > glad we're closing that loop ASAP. > > > > > > > > > > > > > > > > > > > > > Here are my thoughts: > > > > > > > > > > > > > > 1. (Nit): IMO "start.state", "initial.state", or "target.state" > > > might > > > > > be > > > > > > > better than just "state" for the field name in the connector > > > creation > > > > > > > request. > > > > > > > > > > > > > > 2. Why implement this in distributed mode with two writes to > the > > > > config > > > > > > > topic? We could augment the existing format for connector > configs > > > in > > > > > the > > > > > > > config topic [1] with a new field instead. > > > > > > > > > > > > > > 3. Although standalone mode is mentioned in the KIP, there's no > > > > detail > > > > > on > > > > > > > the Java properties file format that we support for standalone > > mode > > > > > > (i.e., > > > > > > > `connect-standalone config/connect-standalone.properties > > > > > > > config/connect-file-source.properties > > > > > > > config/connect-file-sink.properties`). Do we plan on adding > > support > > > > for > > > > > > > that mode as well? > > > > > > > > > > > > > > 4. I suspect there will be advantages for this feature beyond > > > offsets > > > > > > > migration, but if offsets migration were the only motivation > for > > > it, > > > > > > > wouldn't it be simpler to accept an "offsets" field in > connector > > > > > creation > > > > > > > requests? That way, users wouldn't have to start a connector > in a > > > > > > different > > > > > > > state and then resume it, they could just create the connector > > like > > > > > > normal. > > > > > > > I think the current proposal is acceptable, but wanted to float > > > this > > > > > > > alternative in case we anticipate lots of connector migrations > > and > > > > want > > > > > > to > > > > > > > optimize for them a bit more. > > > > > > > > > > > > > > 5. (NIt): We can link to our own Javadocs [2] instead of > > > javadoc.io > > > > > > > > > > > > > > > > > > > > > [1] - > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236 > > > > > > > > > > > > > > [2] - > > > > > > > > https://kafka.apache.org/35/javadoc/index.html?overview-summary.html > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > Chris > > > > > > > > > > > > > > On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya < > yash.ma...@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Ashwin, > > > > > > > > > > > > > > > > Thanks for taking a look and sharing your thoughts! > > > > > > > > > > > > > > > > 1. Yes, the request / response formats of the two APIs were > > > > > > intentionally > > > > > > > > made identical for such use-cases. [1] > > > > > > > > > > > > > > > > 2. I'm assuming you're referring to retaining the offset / > > config > > > > > topic > > > > > > > > records for a connector when it is deleted by a user? > Firstly, > > a > > > > > > > > connector's offsets aren't actually currently deleted when > the > > > > > > connector > > > > > > > is > > > > > > > > deleted - it was listed as potential future work in KIP-875 > > [2]. > > > > > > > Secondly, > > > > > > > > the config topic is the source of truth for the state of a > > > Connect > > > > > > > cluster > > > > > > > > and a connector deletion is done by writing a null-valued > > record > > > > > > > > (tombstone) with the connector key to the config topic. We > > could > > > > > > > > potentially modify the delete mechanism to use a special new > > > record > > > > > > > > (instead of a tombstone with the connector key) in order to > > > retain > > > > > the > > > > > > > > latest configuration of a connector while still deleting the > > > actual > > > > > > > > connector - however, this would have its downsides and I > don't > > > see > > > > > too > > > > > > > many > > > > > > > > benefits. Furthermore, connector migration between different > > > Kafka > > > > > > > clusters > > > > > > > > was just used as a representational use case for creating > > > > connectors > > > > > > in a > > > > > > > > stopped state - it isn't the primary focus of this KIP as > such. > > > > > > > > > > > > > > > > 3. KIP-875 goes into some more details about this [3], but > the > > > TLDR > > > > > is > > > > > > > that > > > > > > > > the STOPPED state will be treated like the PAUSED state on > > older > > > > > > workers > > > > > > > > that don't recognize the STOPPED state. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Yash > > > > > > > > > > > > > > > > [1] - > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat > > > > > > > > > > > > > > > > [2] - > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors > > > > > > > > > > > > > > > > [3] - > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate > > > > > > > > > > > > > > > > On Wed, Sep 20, 2023 at 7:24 PM Ashwin > > > > <apan...@confluent.io.invalid > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Thanks Yash! This is very useful for migrating connectors > > from > > > > one > > > > > > > > cluster > > > > > > > > > to another. > > > > > > > > > > > > > > > > > > I had the following comments/questions > > > > > > > > > > > > > > > > > > 1. Is the offset read using `GET /offsets` api always > > > guaranteed > > > > to > > > > > > be > > > > > > > > in a > > > > > > > > > format accepted by `PATCH /offsets` ? > > > > > > > > > 2. I had to tackle a similar migration situation but the > two > > > > > connect > > > > > > > > > clusters in question were using the same backing Kafka > > cluster. > > > > The > > > > > > > > > challenge in this case is that when I delete the original > > > > > connector, > > > > > > I > > > > > > > > want > > > > > > > > > to retain offsets and config topics. Do you think we should > > > > support > > > > > > > > > deletion of a connector without removal of these topics as > > part > > > > of > > > > > > this > > > > > > > > KIP > > > > > > > > > ? > > > > > > > > > 3. In the case of a downgrade, how will Connect worker > handle > > > the > > > > > > > > optional > > > > > > > > > “state” field in config topic ? > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Ashwin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya < > > > > yash.ma...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > I'd like to begin discussion on a KIP to allow creating > > > > > connectors > > > > > > > in a > > > > > > > > > > stopped state - > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Yash > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >