I'm seeing some behavior with the DistributedHerder that I am trying to
understand. I'm setting up a cluster of Kafka Connect nodes and have a
relatively large number of connectors to submit to it (392 connectors right
now, soon to be over 1100). For deployment I am using Chef, and having it
PUT the connector configs at deployment time so that any connectors get
created or updated.
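Concretely, the Chef step boils down to a PUT against the Connect REST
API's /connectors/<name>/config endpoint for each connector. A minimal
sketch in Python (the worker hostname, connector name, and config file
path below are just placeholders):

import json
import requests

connect_url = "http://connect-worker-1:8083"  # any worker in the cluster (placeholder host)

def put_connector(name, config):
    # PUT /connectors/<name>/config creates the connector if it is new
    # and updates its config in place if it already exists.
    resp = requests.put(f"{connect_url}/connectors/{name}/config", json=config)
    resp.raise_for_status()
    return resp.json()

# e.g. one of the ~392 configs rendered out by Chef (path is a placeholder)
with open("connectors/orders-jdbc-source.json") as f:
    put_connector("orders-jdbc-source", json.load(f))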

Every time I PUT a new connector config to the worker, it appears to
initiate an assignment rebalance. I believe this only happens when
submitting a new connector. This causes all existing, running connectors
to stop and restart. My logs end up flooded with exceptions from the JDBC
source tasks (SQL connections being closed) and wakeup exceptions in my
sink tasks when committing offsets. This causes issues beyond just having
to wait for a rebalance, because restarting the JDBC connectors causes
them to re-pull all their data, since they are using bulk mode. Everything
eventually settles down and all the connectors finish successfully, but
each PUT takes progressively longer waiting for a rebalance to finish.
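For context on the bulk-mode point, the JDBC source connectors are shaped
roughly like this (table, topic, and connection details are placeholders).
In bulk mode the whole table is re-queried on every poll and no offsets
are tracked, which is why a task restart means re-pulling everything:

jdbc_source_config = {
    # Rough shape of one JDBC source config; values are placeholders.
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db-host:5432/mydb",
    # "bulk" mode re-reads the entire table on each poll, with no offset
    # tracking, so restarts re-pull all the data.
    "mode": "bulk",
    "table.whitelist": "orders",
    "topic.prefix": "jdbc-",
    "tasks.max": "1",
}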

If I simply restart the worker nodes and let them instantiate only the
connectors that have already been successfully submitted, everything
starts up fine. So this is only an issue when submitting new connectors
over the REST endpoint.

So I'm trying to understand why submitting a new connector causes the
rebalance, and also whether there is a better way to deploy the connector
configs in distributed mode.

Thanks,

Stephen
