Thanks for raising this issue, Gunnar.

It is a shortcoming that Connect does not differentiate between starting
for the first time and restarting, nor between validating prior to
connector creation vs (re)validating a (potentially modified) connector
configuration while the connector is running. Proposing a KIP certainly
would be fine, though we do need to weigh this against increasing the
complexity of the APIs.

In the meantime, Chris did have some good suggestions for how a connector
might be able to deal with the current limitation. At the moment I can't
think of any other obvious workarounds.
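
To make Chris's idea a bit more concrete, here is a rough sketch in plain
Java, not a worked-out implementation. SlotOwnerStore, InMemorySlotOwnerStore
and slotConfigIsValid are hypothetical names made up for illustration; none
of them are part of the Connect or Debezium APIs. A real store would have to
live in the database or a Kafka topic so that every worker sees the same
state, and the crash window Chris mentions (slot claimed, name not yet
stored) is not handled here.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class SlotValidationSketch {

    /** Hypothetical record of which connector claimed which replication slot. */
    interface SlotOwnerStore {
        void recordOwner(String slotName, String connectorName);
        Optional<String> ownerOf(String slotName);
    }

    /** In-memory stand-in for illustration only: NOT visible to other workers. */
    static class InMemorySlotOwnerStore implements SlotOwnerStore {
        private final Map<String, String> owners = new ConcurrentHashMap<>();

        @Override
        public void recordOwner(String slotName, String connectorName) {
            owners.put(slotName, connectorName);
        }

        @Override
        public Optional<String> ownerOf(String slotName) {
            return Optional.ofNullable(owners.get(slotName));
        }
    }

    /**
     * Returns true if the slot setting should pass validation:
     * either the slot is not active, or it is active but the stored
     * owner is the very connector that is being validated.
     */
    static boolean slotConfigIsValid(String connectorName, String slotName,
                                     boolean slotActive, SlotOwnerStore store) {
        if (!slotActive) {
            return true; // nobody is connected to the slot, nothing to skip
        }
        return store.ownerOf(slotName)
                .filter(owner -> owner.equals(connectorName))
                .isPresent();
    }

    public static void main(String[] args) {
        SlotOwnerStore store = new InMemorySlotOwnerStore();
        // Assumed to happen right after the connector claims the slot.
        store.recordOwner("slot1", "pg-connector-a");

        System.out.println(slotConfigIsValid("pg-connector-a", "slot1", true, store));
        System.out.println(slotConfigIsValid("pg-connector-b", "slot1", true, store));
    }
}
```

The connector would call something like recordOwner once it has claimed the
slot, and consult the store from its config validation, using the connector
name from the submitted configuration.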

Best regards,

Randall


On Thu, Jan 21, 2021 at 9:52 AM Chris Egerton <chr...@confluent.io> wrote:

> Hi Gunnar,
>
> It's not possible to do this in a generalized fashion with the API provided
> by the framework today. Trying to hack your way around things by setting a
> flag or storing the connector name in some shared JVM state wouldn't work
> in a cluster with more than one worker since that state would obviously not
> be available across workers.
>
> With the specific case of the Debezium PostgreSQL connector, I'm wondering
> if you might be able to store the name of the connector in some external
> system (likely either the database itself or a Kafka topic, as I seem to
> recall that Debezium connectors create and consume from topics outside of
> the framework) after successfully claiming the replication slot. Then,
> during config validation, you could skip the replication slot validation if
> that stored name matched the name of the connector being validated. There
> are obviously some edge cases that'd need to be addressed such as sudden
> death of connectors after claiming the replication slot but before storing
> their name; just wanted to share the thought in case it leads somewhere
> useful.
>
> Either way, I think a small, simple KIP for this would be fine, as long as
> we could maintain backwards compatibility for existing connectors and allow
> connectors that use this new API to work on older versions of Connect that
> don't have support for it.
>
> Cheers,
>
> Chris
>
> On Thu, Jan 21, 2021 at 6:00 AM Gunnar Morling <gun...@hibernate.org>
> wrote:
>
> > Hi,
> >
> > In the Debezium community, we ran into an interesting corner case of
> > connector config validation [1].
> >
> > The Debezium Postgres connector requires a database resource called a
> > "replication slot", which identifies this connector to the database and
> > tracks progress it has made reading the TX log. This replication slot must
> > not be shared between multiple clients (Debezium connectors, or others), so
> > we added a validation to make sure that the slot configured by the user
> > isn't active, i.e. no client is connected to it already. This works as
> > expected when setting up, or restarting a connector, but when trying to
> > update the connector configuration, the connector still is running when the
> > configuration is validated, so the slot is active and validation hence
> > fails.
> >
> > Is there a way we can distinguish during config validation whether the
> > connector is (re-)started or whether it's a validation upon
> > re-configuration (allowing us to skip this particular validation in the
> > re-configuration case)?
> >
> > If that's not the case, would there be interest for a KIP for adding such
> > capability to the Kafka Connect API?
> >
> > Thanks for any feedback,
> >
> > --Gunnar
> >
> > [1] https://issues.redhat.com/browse/DBZ-2952
> >
>
