Hi,
I was able to reproduce an issue with different source connector plugins,
so the issue seems to be in Kafka Connect itself. I hit it with the
following versions:
1. Connector plugin: com.nordstrom.kafka.connect.sqs.SqsSourceConnector
1.5.0
Kafka Connect cluster version: 3.9.0
2. Connector plugin: Debezium MySQL source 1.9.7
Kafka Connect cluster version: 3.9.0
When the producer client for a particular task is overloaded, changing the
connector config causes the metrics MBeans for that client-id to go
missing.
How I reproduced it:
1. Created an SQS source connector configured for high throughput, so that
it generates SourceRecords faster than the Connect Worker can write them
to Kafka. As a result, the producer buffer runs out of free space, the
average produce request latency increases significantly (30ms -> 3
seconds), and the average time a record spends in the queue also increases
significantly (50ms -> 1.5 minutes).
2. Used jconsole to confirm that the producer MBeans still report metrics.
For simplicity, I looked at this
MBean: kafka.producer:type=producer-metrics,client-id={clientId}
3. Patched the connector config: "producer.override.linger.ms" = "40" ->
"producer.override.linger.ms" = "42"
4. The MBean mentioned above disappeared from jconsole.
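
For anyone who wants to check steps 2 and 4 without jconsole, here is a
minimal sketch of the same lookup done programmatically. It is only an
illustration: the class and method names are hypothetical (not part of
Kafka), and it assumes the client-id contains no characters that would
need quoting in a JMX ObjectName.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper (not part of Kafka): looks up producer-metrics MBeans by client-id.
final class ProducerMBeanCheck {
    // Same MBean name as in step 2, without the client-id value.
    private static final String PREFIX =
            "kafka.producer:type=producer-metrics,client-id=";

    // Builds the ObjectName for one producer client.
    // Assumption: clientId needs no ObjectName quoting.
    static ObjectName producerMetricsName(String clientId)
            throws MalformedObjectNameException {
        return new ObjectName(PREFIX + clientId);
    }

    // True if the producer-metrics MBean for this client-id is registered.
    static boolean isProducerMBeanRegistered(MBeanServerConnection conn, String clientId)
            throws IOException, MalformedObjectNameException {
        return conn.isRegistered(producerMetricsName(clientId));
    }

    // Lists the client-ids of all producer-metrics MBeans currently registered.
    static Set<String> registeredProducerClientIds(MBeanServerConnection conn)
            throws IOException, MalformedObjectNameException {
        Set<String> ids = new TreeSet<>();
        for (ObjectName name : conn.queryNames(new ObjectName(PREFIX + "*"), null)) {
            ids.add(name.getKeyProperty("client-id"));
        }
        return ids;
    }
}
```

The MBeanServerConnection can come from a remote JMX connection to the
worker (JMXConnectorFactory.connect(...).getMBeanServerConnection()) or,
when running inside the worker JVM, from
ManagementFactory.getPlatformMBeanServer(). Calling
registeredProducerClientIds before and after step 3 should show whether
the client-id drops out of the set.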
The tasks are clearly still working, since broker metrics show messages
arriving on the topic.
If I patch the connector to have more tasks, e.g. 1 -> 3, the MBean for
client 0 disappears, while the new clients 1 and 2 are visible. Once
clients 1 and 2 become overloaded in the same way, patching the connector
with any config change makes their MBeans disappear as well.
To fix the issue I have to either restart the Connect Worker systemd
service, or tune the connector to reduce the number of generated
SourceRecords and patch the connector config one more time once the
producers become less saturated.
Is this a known issue?
Thanks