Hello community,

Recently my team hit an issue with our Kafka Connect MirrorMaker cluster in which one partition was being consumed and produced twice. The double consumption and production were confirmed by checking the BytesIn and BytesOut metrics on the brokers.
What happened was: we added a couple of servers to our Connect cluster, and shortly afterwards there was a network partition affecting the rack hosting the newly allocated servers. After some time connectivity was restored and the system appeared to be working fine. A couple of hours later, however, we observed that one topic was receiving twice the amount of data it usually gets, with every message repeated twice. The same topic also showed twice the normal consumption rate from the source cluster.

At that point we suspected MirrorMaker and restarted the connector, but the messages were still being duplicated even after the restart. I then checked the task assignment list and stopped the worker to which this partition was assigned. After stopping that worker, the message rate on the destination topic matched that of the source topic and no messages were duplicated. But after some time the coordinator detected the stopped worker, a rebalance was triggered, and messages were once again consumed and produced twice.

Finally, we stopped all the worker instances on the servers that had faced the network outage and restarted the connector, and everything has run fine since.

Has anyone faced this issue before? Are there any scenarios where this condition can arise, or is it a known issue?

Regards,
Lehar Jain