Hi Chris, Thanks for all your feedback so far!
3. That's a good question. I was thinking we'd do some "intelligent" parsing internally during the implementation (i.e. essentially your last option - attempting to parse first as one format, then the other) which is why I didn't include any more details in the KIP itself (where I've only outlined the contract changes). This would allow for the smoothest user experience IMO and all the heavy lifting will be done in the parsing logic. All the other options seemed either clunky or brittle from the user experience point of view. In terms of the actual implementation itself, we'd probably want to first try parsing it into the supported JSON structures before trying to parse it into Java properties since the Java properties format is very permissive (i.e. we won't really see any errors on attempting to parse a JSON file into Java properties). Thanks, Yash On Thu, Oct 5, 2023 at 1:39 AM Chris Egerton <chr...@aiven.io.invalid> wrote: > Hi Yash, > > Looking great! Few more thoughts: > > > 1. (Downgraded to nit) I still prefer dot-delimitation but it's not a > blocker; thanks for addressing my concerns about the name of the field and > how it may be perceived by users. > > 2. (Addressed) Thanks for looking into this, and sorry it turned out to be > a bit of a dead end! I'm convinced that the current proposal is good > enough. > > 3. Can you shed a little more light on how we'll determine whether a > connector config should be parsed as JSON or as a properties file? Will > this be based on file extension, a command-line flag (which might apply to > all configs, or individual configs), attempting to parse first as one > format then the other, something else? > > 4. (Addressed) Thanks! Looks great. > > 6. (Addressed) Awesome, great to hear. The point about laggy connector > startup is very convincing; my paranoia is satiated. > > > Cheers, > > Chris > > On Wed, Oct 4, 2023 at 5:35 AM Yash Mayya <yash.ma...@gmail.com> wrote: > > > Hi Chris, > > > > Thanks for the quick follow up and the continued insightful discourse! > > > > 1. Fair point on the need to differentiate it from the actual state > > displayed in the status API, I like the prefix of "initial" to make that > > differentiation (from your suggested alternatives previously). Regarding > > the dots vs underscores as delimiters - the new state field will be a top > > level field in the connector creation request body alongside the "config" > > map (i.e. it won't be a connector configuration itself), so I think we > > should be using the underscore delimiter for consistency. For now, I've > > updated the KIP to use "initial_state" as the new field's name - let me > > know if you disagree, and I'd be happy to reconsider. > > > > 2. Hm, I actually hadn't considered the downgrade implications with your > > proposed single record approach. I agree that it's a bigger downside than > > writing two records to the config topic. I do understand your concerns > with > > the potential for config topic inconsistencies which is why I proposed > > writing the target state first (since the presence of a target state for > a > > connector with no configuration is a benign condition). Also, even in the > > non-transactional config topic producer case - if there is a failure > > between the two writes, the user will be notified of the error > > synchronously via the API response (ref - > > https://github.com/apache/kafka/pull/12984) and will be able to safely > > retry the operation. I don't see how we'd be able to do a single record > > write approach along with supporting clean downgrades since we'd either > > need to introduce a new record type or add a new field to an existing > > record type - neither of which would be recognized as such by an older > > Connect worker. > > > > > Standalone mode has always supported the REST API, > > > and so far FWICTwe've maintained feature parity between > > > the two modes > > > > > add support for JSON files with standalone mode. > > > > 3. Thanks, I wasn't aware about standalone mode always having supported > the > > full REST API - I thought I'd seen some references earlier indicating > > otherwise. In that case, I do agree that it makes sense to maintain > parity > > across both methods of connector creation for user experience > consistency. > > I really like the idea of updating the standalone mode CLI to be able to > > parse JSON files (in the same format as the connector creation REST API > > endpoint request body) along with Java properties files since I think > that > > offers two big benefits. One is that users will be able to copy and use > > examples across both the methods of connector creation (REST API requests > > with JSON request bodies and JSON files passed to the standalone mode > > startup CLI). The second benefit is that any future extensions (such as > the > > "offsets" field we've discussed in this thread) would be easily applied > > across both the methods consistently instead of introducing new (and > likely > > ugly) CLI flags. I've updated the KIP to include this change in the > > standalone mode CLI. > > > > 4. Makes sense, I've added this under a new "Future Work" section in the > > KIP. > > > > 6. From what I can tell, there shouldn't be any issues with the lack of > > task configurations in the config topic and it seems to be a supported > > assumption across the Connect code-base that a connector configuration > > could exist without any task configurations for the connector (a > situation > > that could currently manifest with slow starting connectors, connectors > > that fail during startup, connectors that fail to generate task > > configurations, connectors that are paused right after being created > etc.). > > I did also try out a small prototype before publishing this KIP and > things > > do work as expected when creating a connector in the PAUSED / STOPPED > state > > by simply writing the appropriate target state along with the connector > > configuration to the config topic. > > > > Thanks, > > Yash > > > > On Wed, Oct 4, 2023 at 1:44 AM Chris Egerton <chr...@aiven.io.invalid> > > wrote: > > > > > Hi Yash, > > > > > > Thanks for the in-depth discussion! Continuations here: > > > > > > 1. Regarding delimiters (dots vs. underscores), we use dots in > connector > > > configs for all runtime-recognized properties, including but not > limited > > to > > > connector.class, tasks.max, key.converter, key.converter.*, and > > > transforms.*.type. Regarding the choice of name--I think it's because > the > > > concept here is different from the "state" that we display in the > status > > > API that a different name is warranted. Users can't directly write the > > > connector's actual state, they can still only specify a target state. > And > > > there's no guarantee that a connector will ever enter a target state > > > (whether it's specified at creation time or later), since a failure can > > > always occur. > > > > > > 2. It still seems simpler to emit one record instead of two, especially > > > since we're not guaranteed that the leader will be using a > transactional > > > producer. I guess I'm just wary of knowingly putting the config topic > in > > an > > > inconsistent state (target state present without accompanying connector > > > config), even if it's meant to be only for a brief period. Thinking > about > > > it some more though, a one-record approach comes with drawbacks in the > > > downgrade scenario: older workers wouldn't know how to handle the new > > > config format and would just fall back to creating the connector in the > > > running state. I suppose we should favor the two-record approach since > > the > > > downgrade scenario is more likely than the other failure mode, but it'd > > be > > > nice if we could think of a way to satisfy both concerns. Not a > blocker, > > > though. > > > > > > 3. Standalone mode has always supported the REST API, and so far FWICT > > > we've maintained feature parity between the two modes for everything > > except > > > exactly-once source connectors, which would have required significant > > > additional work since we'd have to add support for storing source > > connector > > > offsets in a Kafka topic instead of on local storage like we currently > > do. > > > I'd really prefer if we could try to maintain feature parity wherever > > > possible--one way we could possibly do that with this KIP is to also > add > > > support for JSON files with standalone mode. > > > > > > 4. Yeah, no need to block on that idea since there are other use cases > > for > > > creating stopped connectors. We can treat it like the option to delete > > > offsets along with the connector discussed in KIP-875: punt for now, > > > possibly implement later pending user feedback and indication of > demand. > > > Might be worth adding to a "Future work" section as an indication that > we > > > haven't ruled it out (in which case it'd make sense as a rejected > > > alternative) but have chosen not to implement yet. > > > > > > > > > And I had one new thought that's pretty implementation-oriented but may > > > influence the design slightly: > > > > > > 6. Right now we write an empty set of task configs to the config topic > > when > > > handling requests to stop a connector in distributed mode. Do we need > to > > do > > > the same when creating connectors in the stopped state, or add any > other > > > special logic besides noting the new state in the config topic? Or is > it > > > sufficient to write a non-running target state to the config topic and > > then > > > rely on existing logic to simply refuse to generate task configs for > the > > > newly-created connector? Is there any chance that the lack of task > > configs > > > (as opposed to an empty list of task configs) in the config topic for a > > > connector that exists will cause issues? > > > > > > > > > Cheers, > > > > > > Chris > > > > > > On Tue, Oct 3, 2023 at 3:29 AM Yash Mayya <yash.ma...@gmail.com> > wrote: > > > > > > > Hi Chris, > > > > > > > > Thanks for taking a look at this KIP! > > > > > > > > 1. I chose to go with simply "state" as that exact term is already > > > exposed > > > > via some of the existing REST API responses and would be one that > users > > > are > > > > already familiar with (although admittedly something like > > "initial_state" > > > > wouldn't be much of a jump). Since it's a field in the request body > for > > > the > > > > connector creation endpoint, wouldn't it be implied that it is the > > > > "initial" state just like the "config" field represents the "initial" > > > > configuration? Also, I don't think x.y has been established as the > > field > > > > naming convention in the Connect REST API right? From what I can > tell, > > > x_y > > > > is the convention being followed for fields in requests > ("kafka_topic" > > / > > > > "kafka_partition" / "kafka_offset" in the offsets APIs for instance) > > and > > > > responses ("error_count", "kafka_cluster_id", "recommended_values" > > etc.). > > > > > > > > 2. The connector configuration record is currently used for both > > > connector > > > > create requests as well as connector config update requests. Since > > we're > > > > only allowing configuring the target state for newly created > > connectors, > > > I > > > > feel like it'll be a cleaner separation of concerns to use the > existing > > > > records for connector configurations and connector target states > rather > > > > than bundling the "state" and "state.v2" (or equivalent) fields into > > the > > > > connector configuration record. The additional write should be very > > > minimal > > > > overhead and the two writes would be an atomic operation for Connect > > > > clusters that are using a transactional producer for the config topic > > > > anyway. Thoughts? > > > > > > > > 3. I was thinking that we'd support standalone mode via the same > > > connector > > > > creation REST API endpoint changes (addition of the "state" field). > If > > > > there is further interest in adding similar support to the command > line > > > > method of creating connectors as well, perhaps we could do so in a > > small > > > > follow-on KIP? I feel like ever since the standalone mode started > > > > supporting the full Connect REST API, the command line method of > > creating > > > > connectors is more of a "legacy" concept. > > > > > > > > 4. Yeah, connector / offsets migration was just used as a > > representative > > > > example of how this feature could be useful - I didn't intend for it > to > > > be > > > > the sole purpose of this KIP. However, that said, I really like the > > idea > > > of > > > > accepting an "offsets" field in the connector creation request since > > it'd > > > > reduce the number of user operations required from 3 (create the > > > connector > > > > in a STOPPED state; alter the offsets; resume the connector) to 1. > I'd > > be > > > > happy to either create or review a separate small KIP for that if it > > > sounds > > > > acceptable to you? > > > > > > > > 5. Whoops, thanks, I hadn't even noticed that that's where I had > linked > > > to. > > > > Fixed! > > > > > > > > Thanks, > > > > Yash > > > > > > > > On Mon, Oct 2, 2023 at 11:14 PM Chris Egerton > <chr...@aiven.io.invalid > > > > > > > wrote: > > > > > > > > > Hi Yash, > > > > > > > > > > Thanks for the KIP! Frankly this feels like an oversight in 875 and > > I'm > > > > > glad we're closing that loop ASAP. > > > > > > > > > > > > > > > Here are my thoughts: > > > > > > > > > > 1. (Nit): IMO "start.state", "initial.state", or "target.state" > might > > > be > > > > > better than just "state" for the field name in the connector > creation > > > > > request. > > > > > > > > > > 2. Why implement this in distributed mode with two writes to the > > config > > > > > topic? We could augment the existing format for connector configs > in > > > the > > > > > config topic [1] with a new field instead. > > > > > > > > > > 3. Although standalone mode is mentioned in the KIP, there's no > > detail > > > on > > > > > the Java properties file format that we support for standalone mode > > > > (i.e., > > > > > `connect-standalone config/connect-standalone.properties > > > > > config/connect-file-source.properties > > > > > config/connect-file-sink.properties`). Do we plan on adding support > > for > > > > > that mode as well? > > > > > > > > > > 4. I suspect there will be advantages for this feature beyond > offsets > > > > > migration, but if offsets migration were the only motivation for > it, > > > > > wouldn't it be simpler to accept an "offsets" field in connector > > > creation > > > > > requests? That way, users wouldn't have to start a connector in a > > > > different > > > > > state and then resume it, they could just create the connector like > > > > normal. > > > > > I think the current proposal is acceptable, but wanted to float > this > > > > > alternative in case we anticipate lots of connector migrations and > > want > > > > to > > > > > optimize for them a bit more. > > > > > > > > > > 5. (NIt): We can link to our own Javadocs [2] instead of > javadoc.io > > > > > > > > > > > > > > > [1] - > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236 > > > > > > > > > > [2] - > > > > https://kafka.apache.org/35/javadoc/index.html?overview-summary.html > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > Chris > > > > > > > > > > On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya <yash.ma...@gmail.com> > > > wrote: > > > > > > > > > > > Hi Ashwin, > > > > > > > > > > > > Thanks for taking a look and sharing your thoughts! > > > > > > > > > > > > 1. Yes, the request / response formats of the two APIs were > > > > intentionally > > > > > > made identical for such use-cases. [1] > > > > > > > > > > > > 2. I'm assuming you're referring to retaining the offset / config > > > topic > > > > > > records for a connector when it is deleted by a user? Firstly, a > > > > > > connector's offsets aren't actually currently deleted when the > > > > connector > > > > > is > > > > > > deleted - it was listed as potential future work in KIP-875 [2]. > > > > > Secondly, > > > > > > the config topic is the source of truth for the state of a > Connect > > > > > cluster > > > > > > and a connector deletion is done by writing a null-valued record > > > > > > (tombstone) with the connector key to the config topic. We could > > > > > > potentially modify the delete mechanism to use a special new > record > > > > > > (instead of a tombstone with the connector key) in order to > retain > > > the > > > > > > latest configuration of a connector while still deleting the > actual > > > > > > connector - however, this would have its downsides and I don't > see > > > too > > > > > many > > > > > > benefits. Furthermore, connector migration between different > Kafka > > > > > clusters > > > > > > was just used as a representational use case for creating > > connectors > > > > in a > > > > > > stopped state - it isn't the primary focus of this KIP as such. > > > > > > > > > > > > 3. KIP-875 goes into some more details about this [3], but the > TLDR > > > is > > > > > that > > > > > > the STOPPED state will be treated like the PAUSED state on older > > > > workers > > > > > > that don't recognize the STOPPED state. > > > > > > > > > > > > Thanks, > > > > > > Yash > > > > > > > > > > > > [1] - > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat > > > > > > > > > > > > [2] - > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors > > > > > > > > > > > > [3] - > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate > > > > > > > > > > > > On Wed, Sep 20, 2023 at 7:24 PM Ashwin > > <apan...@confluent.io.invalid > > > > > > > > > > wrote: > > > > > > > > > > > > > Thanks Yash! This is very useful for migrating connectors from > > one > > > > > > cluster > > > > > > > to another. > > > > > > > > > > > > > > I had the following comments/questions > > > > > > > > > > > > > > 1. Is the offset read using `GET /offsets` api always > guaranteed > > to > > > > be > > > > > > in a > > > > > > > format accepted by `PATCH /offsets` ? > > > > > > > 2. I had to tackle a similar migration situation but the two > > > connect > > > > > > > clusters in question were using the same backing Kafka cluster. > > The > > > > > > > challenge in this case is that when I delete the original > > > connector, > > > > I > > > > > > want > > > > > > > to retain offsets and config topics. Do you think we should > > support > > > > > > > deletion of a connector without removal of these topics as part > > of > > > > this > > > > > > KIP > > > > > > > ? > > > > > > > 3. In the case of a downgrade, how will Connect worker handle > the > > > > > > optional > > > > > > > “state” field in config topic ? > > > > > > > > > > > > > > Thanks, > > > > > > > Ashwin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya < > > yash.ma...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > I'd like to begin discussion on a KIP to allow creating > > > connectors > > > > > in a > > > > > > > > stopped state - > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Yash > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >