Hi Hector, Thanks for the KIP.
One tricky aspect is that currently there's no real way to stop a connector so to do so people often just delete them temporarily. KIP-875 proposes adding a mechanism to properly stop connectors which should reduce the need to deleting them and avoid doing potentially expensive cleanup operations repetitively. This KIP also reminds me of KIP-419: https://cwiki.apache.org/confluence/display/KAFKA/KIP-419%3A+Safely+notify+Kafka+Connect+SourceTask+is+stopped. Is it guaranteed the new delete callback will be the last method called? Thanks, Mickael On Tue, Nov 15, 2022 at 5:40 PM Sagar <sagarmeansoc...@gmail.com> wrote: > > Hey Hector, > > Thanks for the KIP. I have a minor suggestion in terms of naming. Since > this is a callback method, would it make sense to call it onDelete()? > > Also, the failure scenarios discussed by Greg would need handling. Among > other things, I like the idea of having a timeout for graceful shutdown or > else try a force shutdown. What do you think about that approach? > > Thanks! > Sagar. > > On Sat, Nov 12, 2022 at 1:53 AM Hector Geraldino (BLOOMBERG/ 919 3RD A) < > hgerald...@bloomberg.net> wrote: > > > Thanks Greg for taking your time to review not just the KIP but also the > > PR. > > > > 1. You made very valid points regarding the behavior of the destroy() > > callback for connectors that don't follow the happy path. After thinking > > about it, I decided to tweak the implementation a bit and have the > > destroy() method be called during the worker shutdown: this means it will > > share the same guarantees the connector#stop() method has. An alternative > > implementation can be to have an overloaded connector#stop(boolean deleted) > > method that signals a connector that it is being stopped due to deletion, > > but I think that having a separate destroy() method provides clearer > > semantics. > > > > I'll make sure to ammend the KIP with these details. > > > > 3. Without going too deep on the types of operations that can be performed > > by a connector when it's being deleted, I can imagine the > > org.apache.kafka.connect.source.SourceConnector base class having a default > > implementation that deletes the connector's offsets automatically > > (controlled by a property); this is in the context of KIP-875 (first-class > > offsets support in Kafka Connect). Similar behaviors can be introduced for > > the SinkConnector, however I'm not sure if this KIP is the right place to > > discuss all the possibilities, or if we shoold keeping it more > > narrow-focused on providing a callback mechanism for when connectors are > > deleted, and what the expectations are around this newly introduced method. > > What do you think? > > > > > > From: dev@kafka.apache.org At: 11/09/22 16:55:04 UTC-5:00To: > > dev@kafka.apache.org > > Subject: Re: [DISCUSS] KIP-883: Add delete callback method to Connector API > > > > Hi Hector, > > > > Thanks for the KIP! > > > > This is certainly missing functionality from the native Connect framework, > > and we should try to make it possible to inform connectors about this part > > of their lifecycle. > > However, as with most functionality that was left out of the initial > > implementation of the framework, the details are more challenging to work > > out. > > > > 1. What happens when the destroy call throws an error, how does the > > framework respond? > > > > This is unspecified in the KIP, and it appears that your proposed changes > > could cause the herder to fail. > > From the perspective of operators & connector developers, what is a > > reasonable expectation to have for failure of a destroy? > > I could see operators wanting both a graceful-delete to make use of this > > new feature, and a force-delete for when the graceful-delete fails. > > A connector developer could choose to swallow all errors encountered, or > > fail-fast to indicate to the operator that there is an issue with the > > graceful-delete flow. > > If the alternative is crashing the herder, connector developers may choose > > to hide serious errors, which is undesirable. > > > > 2. What happens when the destroy() call takes a long time to complete, or > > is interrupted? > > > > It appears that your implementation serially destroy()s each appropriate > > connector, and may prevent the herder thread from making progress while the > > operation is ongoing. > > We have previously had to patch Connect to perform all connector and task > > operations on a background thread, because some connector method > > implementations can stall indefinitely. > > Connect also has the notion of "cancelling" a connector/task if a graceful > > shutdown timeout operation takes too long. Perhaps some of that design or > > machinery may be useful to protect this method call as well. > > > > More specific to the destroy() call itself, what happens when a connector > > completes part of a destroy operation and then cannot complete the > > remainder, either due to timing out or a worker crashing? > > What is the contract with the connector developer about this method? Is the > > destroy() only started exactly once during the lifetime of the connector, > > or may it be retried? > > > > 3. What should be considered a reasonable custom implementation of the > > destroy() call? What resources should it clean up by default? > > > > I think we can broadly categorize the state a connector mutates among the > > following > > * Framework-managed state (e.g. source offsets, consumer offsets) > > * Implementation detail state (e.g. debezium db history topic, audit > > tables, temporary accounts) > > * Third party system data (e.g. the actual data being written by a sink > > connector) > > * Third party system metadata (e.g. tables in a database, delivery > > receipts, permissions) > > > > I think it's apparent that the framework-managed state cannot/should not be > > interacted with by the destroy() call. However, the framework could be > > changed to clean up these resources at the same time that destroy() is > > called. Is that out-of-scope of this proposal, and better handled by manual > > intervention? > > From the text of the KIP, I think it explicitly includes the Implementation > > detail state, which should not be depended on externally and should be safe > > to clean up during a destroy(). I think this is completely reasonable. > > Are the third-party data and metadata out-of-scope for this proposal? Can > > we officially recommend against it, or should we accommodate users and > > connector developers that wish to clean up data/metadata during destroy()? > > > > 4. How should connector implementations of destroy handle backwards > > compatibility? > > > > In terms of backward-compatibility for the framework vs connector versions, > > I think the default-noop method is very reasonable. > > However, what happens when someone upgrades from a version of a connector > > without a destroy() implementation to one with an implementation, and > > maintain backwards compatibility? > > To replicate the same behavior, the connector might include something like > > an `enable.cleanup` config which allows users to opt-in to the new > > behavior. This could mean the proliferation of many different > > configurations to handle this behavior. > > Maybe we can provide some recommendations to developers, or some mechanism > > to standardize this opt-in behavior. > > > > I'm interested to hear if you have any experience with the above, if you've > > experimented with this feature in your fork. > > > > Thanks, > > Greg > > > > > > On Thu, Nov 3, 2022 at 11:55 AM Hector Geraldino (BLOOMBERG/ 919 3RD A) < > > hgerald...@bloomberg.net> wrote: > > > > > Hi everyone, > > > > > > I've submitted KIP-883, which introduces a callback to the public > > > Connector API called when deleting a connector: > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-883%3A+Add+delete+callback > > +method+to+Connector+API > > > > > > It adds a new `deleted()` method (open to better naming suggestions) to > > > the org.apache.kafka.connect.connector.Connector abstract class, which > > will > > > be invoked by connect Workers when a connector is being deleted. > > > > > > Feedback and comments are welcome. > > > > > > Thank you! > > > Hector > > > > > > > > > > > >