I'm seeing some behavior with the DistributedHerder that I am trying to understand. I'm setting up a cluster of Kafka Connect nodes and have a relatively large number of connectors to submit to it (392 connectors right now, which will soon grow to over 1100). For deployment I am using Chef, which PUTs the connector configs at deployment time so I can create or update any connectors.
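For context, the deployment step boils down to a call like the following (a rough Python sketch; the worker URL, connector name, and config values here are placeholders, the real ones come from Chef attributes):

    import json
    import urllib.request

    # Hypothetical worker URL and connector config for illustration only.
    WORKER_URL = "http://localhost:8083"
    connector_name = "jdbc-source-example"
    connector_config = {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "mode": "bulk",
        "connection.url": "jdbc:postgresql://db.example.com/mydb",
        "topic.prefix": "example-",
    }

    # PUT /connectors/{name}/config creates the connector if it does not exist,
    # or updates its config if it does, which is what makes it convenient for
    # idempotent deployment. This is the call that appears to trigger the
    # rebalance described below.
    req = urllib.request.Request(
        url=f"{WORKER_URL}/connectors/{connector_name}/config",
        data=json.dumps(connector_config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode("utf-8"))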
Every time I PUT a new connector config to the worker, it appears to initiate an assignment rebalance; I believe this only happens when submitting a new connector. The rebalance causes all existing, running connectors to stop and restart, and my logs end up flooded with exceptions from the JDBC source tasks (SQL connections being closed) and WakeupExceptions in my sink tasks when they commit offsets. Beyond having to wait for the rebalance, restarting the JDBC connectors causes them to re-pull all of their data, since they are using bulk mode.

Everything eventually settles down and all the connectors finish successfully, but each PUT takes progressively longer while it waits for a rebalance to finish. If I simply restart the worker nodes and let them instantiate only the connectors that have already been submitted, everything starts up fine, so this is only an issue when submitting new connectors over the REST endpoint.

I'm trying to understand why submitting a new connector causes the rebalancing, and also whether there is a better way to deploy the connector configs in distributed mode?

Thanks,
Stephen