Hi Chris,

I've updated the KIP to call out the parsing logic and the user
expectations explicitly. Thanks again for all your feedback on this KIP!
I'll wait for a few more days to see if anyone else has comments before
kicking off a vote thread.

Thanks,
Yash

On Thu, Oct 5, 2023 at 10:49 PM Chris Egerton <chr...@aiven.io.invalid>
wrote:

> Hi Yash,
>
> Yeah, I think just hardcoding with JSON-first, properties-second is fine.
> IMO it's worth calling this out explicitly in the KIP.
>
> Apart from that, no further comments. LGTM, thanks for the KIP!
>
> Cheers,
>
> Chris
>
> On Thu, Oct 5, 2023 at 6:23 AM Yash Mayya <yash.ma...@gmail.com> wrote:
>
> > Hi Chris,
> >
> > Thanks for all your feedback so far!
> >
> > 3. That's a good question. I was thinking we'd do some "intelligent"
> > parsing internally during the implementation (i.e. essentially your last
> > option - attempting to parse first as one format, then the other) which
> is
> > why I didn't include any more details in the KIP itself (where I've only
> > outlined the contract changes). This would allow for the smoothest user
> > experience IMO and all the heavy lifting will be done in the parsing
> logic.
> > All the other options seemed either clunky or brittle from the user
> > experience point of view. In terms of the actual implementation itself,
> > we'd probably want to first try parsing it into the supported JSON
> > structures before trying to parse it into Java properties since the Java
> > properties format is very permissive (i.e. we won't really see any errors
> > on attempting to parse a JSON file into Java properties).
> >
> > Thanks,
> > Yash
> >
> > On Thu, Oct 5, 2023 at 1:39 AM Chris Egerton <chr...@aiven.io.invalid>
> > wrote:
> >
> > > Hi Yash,
> > >
> > > Looking great! Few more thoughts:
> > >
> > >
> > > 1. (Downgraded to nit) I still prefer dot-delimitation but it's not a
> > > blocker; thanks for addressing my concerns about the name of the field
> > and
> > > how it may be perceived by users.
> > >
> > > 2. (Addressed) Thanks for looking into this, and sorry it turned out to
> > be
> > > a bit of a dead end! I'm convinced that the current proposal is good
> > > enough.
> > >
> > > 3. Can you shed a little more light on how we'll determine whether a
> > > connector config should be parsed as JSON or as a properties file? Will
> > > this be based on file extension, a command-line flag (which might apply
> > to
> > > all configs, or individual configs), attempting to parse first as one
> > > format then the other, something else?
> > >
> > > 4. (Addressed) Thanks! Looks great.
> > >
> > > 6. (Addressed) Awesome, great to hear. The point about laggy connector
> > > startup is very convincing; my paranoia is satiated.
> > >
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Wed, Oct 4, 2023 at 5:35 AM Yash Mayya <yash.ma...@gmail.com>
> wrote:
> > >
> > > > Hi Chris,
> > > >
> > > > Thanks for the quick follow up and the continued insightful
> discourse!
> > > >
> > > > 1. Fair point on the need to differentiate it from the actual state
> > > > displayed in the status API, I like the prefix of "initial" to make
> > that
> > > > differentiation (from your suggested alternatives previously).
> > Regarding
> > > > the dots vs underscores as delimiters - the new state field will be a
> > top
> > > > level field in the connector creation request body alongside the
> > "config"
> > > > map (i.e. it won't be a connector configuration itself), so I think
> we
> > > > should be using the underscore delimiter for consistency. For now,
> I've
> > > > updated the KIP to use "initial_state" as the new field's name - let
> me
> > > > know if you disagree, and I'd be happy to reconsider.
> > > >
> > > > 2. Hm, I actually hadn't considered the downgrade implications with
> > your
> > > > proposed single record approach. I agree that it's a bigger downside
> > than
> > > > writing two records to the config topic. I do understand your
> concerns
> > > with
> > > > the potential for config topic inconsistencies which is why I
> proposed
> > > > writing the target state first (since the presence of a target state
> > for
> > > a
> > > > connector with no configuration is a benign condition). Also, even in
> > the
> > > > non-transactional config topic producer case - if there is a failure
> > > > between the two writes, the user will be notified of the error
> > > > synchronously via the API response (ref -
> > > > https://github.com/apache/kafka/pull/12984) and will be able to
> safely
> > > > retry the operation. I don't see how we'd be able to do a single
> record
> > > > write approach along with supporting clean downgrades since we'd
> either
> > > > need to introduce a new record type or add a new field to an existing
> > > > record type - neither of which would be recognized as such by an
> older
> > > > Connect worker.
> > > >
> > > > > Standalone mode has always supported the REST API,
> > > > > and so far FWICTwe've maintained feature parity between
> > > > > the two modes
> > > >
> > > > > add support for JSON files with standalone mode.
> > > >
> > > > 3. Thanks, I wasn't aware about standalone mode always having
> supported
> > > the
> > > > full REST API - I thought I'd seen some references earlier indicating
> > > > otherwise. In that case, I do agree that it makes sense to maintain
> > > parity
> > > > across both methods of connector creation for user experience
> > > consistency.
> > > > I really like the idea of updating the standalone mode CLI to be able
> > to
> > > > parse JSON files (in the same format as the connector creation REST
> API
> > > > endpoint request body) along with Java properties files since I think
> > > that
> > > > offers two big benefits. One is that users will be able to copy and
> use
> > > > examples across both the methods of connector creation (REST API
> > requests
> > > > with JSON request bodies and JSON files passed to the standalone mode
> > > > startup CLI). The second benefit is that any future extensions (such
> as
> > > the
> > > > "offsets" field we've discussed in this thread) would be easily
> applied
> > > > across both the methods consistently instead of introducing new (and
> > > likely
> > > > ugly) CLI flags. I've updated the KIP to include this change in the
> > > > standalone mode CLI.
> > > >
> > > > 4. Makes sense, I've added this under a new "Future Work" section in
> > the
> > > > KIP.
> > > >
> > > > 6. From what I can tell, there shouldn't be any issues with the lack
> of
> > > > task configurations in the config topic and it seems to be a
> supported
> > > > assumption across the Connect code-base that a connector
> configuration
> > > > could exist without any task configurations for the connector (a
> > > situation
> > > > that could currently manifest with slow starting connectors,
> connectors
> > > > that fail during startup, connectors that fail to generate task
> > > > configurations, connectors that are paused right after being created
> > > etc.).
> > > > I did also try out a small prototype before publishing this KIP and
> > > things
> > > > do work as expected when creating a connector in the PAUSED / STOPPED
> > > state
> > > > by simply writing the appropriate target state along with the
> connector
> > > > configuration to the config topic.
> > > >
> > > > Thanks,
> > > > Yash
> > > >
> > > > On Wed, Oct 4, 2023 at 1:44 AM Chris Egerton <chr...@aiven.io.invalid
> >
> > > > wrote:
> > > >
> > > > > Hi Yash,
> > > > >
> > > > > Thanks for the in-depth discussion! Continuations here:
> > > > >
> > > > > 1. Regarding delimiters (dots vs. underscores), we use dots in
> > > connector
> > > > > configs for all runtime-recognized properties, including but not
> > > limited
> > > > to
> > > > > connector.class, tasks.max, key.converter, key.converter.*, and
> > > > > transforms.*.type. Regarding the choice of name--I think it's
> because
> > > the
> > > > > concept here is different from the "state" that we display in the
> > > status
> > > > > API that a different name is warranted. Users can't directly write
> > the
> > > > > connector's actual state, they can still only specify a target
> state.
> > > And
> > > > > there's no guarantee that a connector will ever enter a target
> state
> > > > > (whether it's specified at creation time or later), since a failure
> > can
> > > > > always occur.
> > > > >
> > > > > 2. It still seems simpler to emit one record instead of two,
> > especially
> > > > > since we're not guaranteed that the leader will be using a
> > > transactional
> > > > > producer. I guess I'm just wary of knowingly putting the config
> topic
> > > in
> > > > an
> > > > > inconsistent state (target state present without accompanying
> > connector
> > > > > config), even if it's meant to be only for a brief period. Thinking
> > > about
> > > > > it some more though, a one-record approach comes with drawbacks in
> > the
> > > > > downgrade scenario: older workers wouldn't know how to handle the
> new
> > > > > config format and would just fall back to creating the connector in
> > the
> > > > > running state. I suppose we should favor the two-record approach
> > since
> > > > the
> > > > > downgrade scenario is more likely than the other failure mode, but
> > it'd
> > > > be
> > > > > nice if we could think of a way to satisfy both concerns. Not a
> > > blocker,
> > > > > though.
> > > > >
> > > > > 3. Standalone mode has always supported the REST API, and so far
> > FWICT
> > > > > we've maintained feature parity between the two modes for
> everything
> > > > except
> > > > > exactly-once source connectors, which would have required
> significant
> > > > > additional work since we'd have to add support for storing source
> > > > connector
> > > > > offsets in a Kafka topic instead of on local storage like we
> > currently
> > > > do.
> > > > > I'd really prefer if we could try to maintain feature parity
> wherever
> > > > > possible--one way we could possibly do that with this KIP is to
> also
> > > add
> > > > > support for JSON files with standalone mode.
> > > > >
> > > > > 4. Yeah, no need to block on that idea since there are other use
> > cases
> > > > for
> > > > > creating stopped connectors. We can treat it like the option to
> > delete
> > > > > offsets along with the connector discussed in KIP-875: punt for
> now,
> > > > > possibly implement later pending user feedback and indication of
> > > demand.
> > > > > Might be worth adding to a "Future work" section as an indication
> > that
> > > we
> > > > > haven't ruled it out (in which case it'd make sense as a rejected
> > > > > alternative) but have chosen not to implement yet.
> > > > >
> > > > >
> > > > > And I had one new thought that's pretty implementation-oriented but
> > may
> > > > > influence the design slightly:
> > > > >
> > > > > 6. Right now we write an empty set of task configs to the config
> > topic
> > > > when
> > > > > handling requests to stop a connector in distributed mode. Do we
> need
> > > to
> > > > do
> > > > > the same when creating connectors in the stopped state, or add any
> > > other
> > > > > special logic besides noting the new state in the config topic? Or
> is
> > > it
> > > > > sufficient to write a non-running target state to the config topic
> > and
> > > > then
> > > > > rely on existing logic to simply refuse to generate task configs
> for
> > > the
> > > > > newly-created connector? Is there any chance that the lack of task
> > > > configs
> > > > > (as opposed to an empty list of task configs) in the config topic
> > for a
> > > > > connector that exists will cause issues?
> > > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Tue, Oct 3, 2023 at 3:29 AM Yash Mayya <yash.ma...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Chris,
> > > > > >
> > > > > > Thanks for taking a look at this KIP!
> > > > > >
> > > > > > 1. I chose to go with simply "state" as that exact term is
> already
> > > > > exposed
> > > > > > via some of the existing REST API responses and would be one that
> > > users
> > > > > are
> > > > > > already familiar with (although admittedly something like
> > > > "initial_state"
> > > > > > wouldn't be much of a jump). Since it's a field in the request
> body
> > > for
> > > > > the
> > > > > > connector creation endpoint, wouldn't it be implied that it is
> the
> > > > > > "initial" state just like the "config" field represents the
> > "initial"
> > > > > > configuration? Also, I don't think x.y has been established as
> the
> > > > field
> > > > > > naming convention in the Connect REST API right? From what I can
> > > tell,
> > > > > x_y
> > > > > > is the convention being followed for fields in requests
> > > ("kafka_topic"
> > > > /
> > > > > > "kafka_partition" / "kafka_offset" in the offsets APIs for
> > instance)
> > > > and
> > > > > > responses ("error_count", "kafka_cluster_id",
> "recommended_values"
> > > > etc.).
> > > > > >
> > > > > > 2. The connector configuration record is currently used for both
> > > > > connector
> > > > > > create requests as well as connector config update requests.
> Since
> > > > we're
> > > > > > only allowing configuring the target state for newly created
> > > > connectors,
> > > > > I
> > > > > > feel like it'll be a cleaner separation of concerns to use the
> > > existing
> > > > > > records for connector configurations and connector target states
> > > rather
> > > > > > than bundling the "state" and "state.v2" (or equivalent) fields
> > into
> > > > the
> > > > > > connector configuration record. The additional write should be
> very
> > > > > minimal
> > > > > > overhead and the two writes would be an atomic operation for
> > Connect
> > > > > > clusters that are using a transactional producer for the config
> > topic
> > > > > > anyway. Thoughts?
> > > > > >
> > > > > > 3. I was thinking that we'd support standalone mode via the same
> > > > > connector
> > > > > > creation REST API endpoint changes (addition of the "state"
> field).
> > > If
> > > > > > there is further interest in adding similar support to the
> command
> > > line
> > > > > > method of creating connectors as well, perhaps we could do so in
> a
> > > > small
> > > > > > follow-on KIP? I feel like ever since the standalone mode started
> > > > > > supporting the full Connect REST API, the command line method of
> > > > creating
> > > > > > connectors is more of a "legacy" concept.
> > > > > >
> > > > > > 4. Yeah, connector / offsets migration was just used as a
> > > > representative
> > > > > > example of how this feature could be useful - I didn't intend for
> > it
> > > to
> > > > > be
> > > > > > the sole purpose of this KIP. However, that said, I really like
> the
> > > > idea
> > > > > of
> > > > > > accepting an "offsets" field in the connector creation request
> > since
> > > > it'd
> > > > > > reduce the number of user operations required from 3 (create the
> > > > > connector
> > > > > > in a STOPPED state; alter the offsets; resume the connector) to
> 1.
> > > I'd
> > > > be
> > > > > > happy to either create or review a separate small KIP for that if
> > it
> > > > > sounds
> > > > > > acceptable to you?
> > > > > >
> > > > > > 5. Whoops, thanks, I hadn't even noticed that that's where I had
> > > linked
> > > > > to.
> > > > > > Fixed!
> > > > > >
> > > > > > Thanks,
> > > > > > Yash
> > > > > >
> > > > > > On Mon, Oct 2, 2023 at 11:14 PM Chris Egerton
> > > <chr...@aiven.io.invalid
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Yash,
> > > > > > >
> > > > > > > Thanks for the KIP! Frankly this feels like an oversight in 875
> > and
> > > > I'm
> > > > > > > glad we're closing that loop ASAP.
> > > > > > >
> > > > > > >
> > > > > > > Here are my thoughts:
> > > > > > >
> > > > > > > 1. (Nit): IMO "start.state", "initial.state", or "target.state"
> > > might
> > > > > be
> > > > > > > better than just "state" for the field name in the connector
> > > creation
> > > > > > > request.
> > > > > > >
> > > > > > > 2. Why implement this in distributed mode with two writes to
> the
> > > > config
> > > > > > > topic? We could augment the existing format for connector
> configs
> > > in
> > > > > the
> > > > > > > config topic [1] with a new field instead.
> > > > > > >
> > > > > > > 3. Although standalone mode is mentioned in the KIP, there's no
> > > > detail
> > > > > on
> > > > > > > the Java properties file format that we support for standalone
> > mode
> > > > > > (i.e.,
> > > > > > > `connect-standalone config/connect-standalone.properties
> > > > > > > config/connect-file-source.properties
> > > > > > > config/connect-file-sink.properties`). Do we plan on adding
> > support
> > > > for
> > > > > > > that mode as well?
> > > > > > >
> > > > > > > 4. I suspect there will be advantages for this feature beyond
> > > offsets
> > > > > > > migration, but if offsets migration were the only motivation
> for
> > > it,
> > > > > > > wouldn't it be simpler to accept an "offsets" field in
> connector
> > > > > creation
> > > > > > > requests? That way, users wouldn't have to start a connector
> in a
> > > > > > different
> > > > > > > state and then resume it, they could just create the connector
> > like
> > > > > > normal.
> > > > > > > I think the current proposal is acceptable, but wanted to float
> > > this
> > > > > > > alternative in case we anticipate lots of connector migrations
> > and
> > > > want
> > > > > > to
> > > > > > > optimize for them a bit more.
> > > > > > >
> > > > > > > 5. (NIt): We can link to our own Javadocs [2] instead of
> > > javadoc.io
> > > > > > >
> > > > > > >
> > > > > > > [1] -
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/dcd8c7d05f2f22f2d815405e7ab3ad7439669239/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaConfigBackingStore.java#L234-L236
> > > > > > >
> > > > > > > [2] -
> > > > > >
> > https://kafka.apache.org/35/javadoc/index.html?overview-summary.html
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > Chris
> > > > > > >
> > > > > > > On Thu, Sep 21, 2023 at 2:37 AM Yash Mayya <
> yash.ma...@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Ashwin,
> > > > > > > >
> > > > > > > > Thanks for taking a look and sharing your thoughts!
> > > > > > > >
> > > > > > > > 1. Yes, the request / response formats of the two APIs were
> > > > > > intentionally
> > > > > > > > made identical for such use-cases. [1]
> > > > > > > >
> > > > > > > > 2. I'm assuming you're referring to retaining the offset /
> > config
> > > > > topic
> > > > > > > > records for a connector when it is deleted by a user?
> Firstly,
> > a
> > > > > > > > connector's offsets aren't actually currently deleted when
> the
> > > > > > connector
> > > > > > > is
> > > > > > > > deleted - it was listed as potential future work in KIP-875
> > [2].
> > > > > > > Secondly,
> > > > > > > > the config topic is the source of truth for the state of a
> > > Connect
> > > > > > > cluster
> > > > > > > > and a connector deletion is done by writing a null-valued
> > record
> > > > > > > > (tombstone) with the connector key to the config topic. We
> > could
> > > > > > > > potentially modify the delete mechanism to use a special new
> > > record
> > > > > > > > (instead of a tombstone with the connector key) in order to
> > > retain
> > > > > the
> > > > > > > > latest configuration of a connector while still deleting the
> > > actual
> > > > > > > > connector - however, this would have its downsides and I
> don't
> > > see
> > > > > too
> > > > > > > many
> > > > > > > > benefits. Furthermore, connector migration between different
> > > Kafka
> > > > > > > clusters
> > > > > > > > was just used as a representational use case for creating
> > > > connectors
> > > > > > in a
> > > > > > > > stopped state - it isn't the primary focus of this KIP as
> such.
> > > > > > > >
> > > > > > > > 3. KIP-875 goes into some more details about this [3], but
> the
> > > TLDR
> > > > > is
> > > > > > > that
> > > > > > > > the STOPPED state will be treated like the PAUSED state on
> > older
> > > > > > workers
> > > > > > > > that don't recognize the STOPPED state.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Yash
> > > > > > > >
> > > > > > > > [1] -
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Request/responseformat
> > > > > > > >
> > > > > > > > [2] -
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Automaticallydeleteoffsetswithconnectors
> > > > > > > >
> > > > > > > > [3] -
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-STOPPEDtargetstate
> > > > > > > >
> > > > > > > > On Wed, Sep 20, 2023 at 7:24 PM Ashwin
> > > > <apan...@confluent.io.invalid
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks Yash! This is very useful for migrating connectors
> > from
> > > > one
> > > > > > > > cluster
> > > > > > > > > to another.
> > > > > > > > >
> > > > > > > > > I had the following comments/questions
> > > > > > > > >
> > > > > > > > > 1. Is the offset read using `GET /offsets` api always
> > > guaranteed
> > > > to
> > > > > > be
> > > > > > > > in a
> > > > > > > > > format accepted by `PATCH /offsets` ?
> > > > > > > > > 2. I had to tackle a similar migration situation but the
> two
> > > > > connect
> > > > > > > > > clusters in question were using the same backing Kafka
> > cluster.
> > > > The
> > > > > > > > > challenge in this case is that when I delete the original
> > > > > connector,
> > > > > > I
> > > > > > > > want
> > > > > > > > > to retain offsets and config topics. Do you think we should
> > > > support
> > > > > > > > > deletion of a connector without removal of these topics as
> > part
> > > > of
> > > > > > this
> > > > > > > > KIP
> > > > > > > > > ?
> > > > > > > > > 3. In the case of a downgrade, how will Connect worker
> handle
> > > the
> > > > > > > > optional
> > > > > > > > > “state” field in config topic ?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Ashwin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Sep 17, 2023 at 11:09 PM Yash Mayya <
> > > > yash.ma...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I'd like to begin discussion on a KIP to allow creating
> > > > > connectors
> > > > > > > in a
> > > > > > > > > > stopped state -
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Yash
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to