Hello all, I have two Kafka clusters in operation, test and prod. Each consists of 3 nodes; the two are independent and run their own ZooKeeper setups. My prod cluster is running fine. My test cluster is half-broken and I'm struggling to fix it. I could wipe it, but I would prefer to understand what's wrong and fix it.
I'm not sure what broke my test cluster. I had several network disconnections / split-brains, but Kafka always recovered fine. The causes of the network issues are unrelated and still being investigated (layer-2 storms, etc.). So I upgraded ZooKeeper and Kafka to the latest versions, and when trying to rebalance a topic across brokers I started to notice the problems. I'm not sure when they really started, before or after the upgrade. I ran the upgrade as per the official doc (rolling upgrade, moving up inter.broker.protocol.version and log.message.format.version gradually).

------------------------------------------------------
cat /etc/os-release
NAME="SLES"
VERSION="12-SP4"
VERSION_ID="12.4"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp4"
------------------------------------------------------
versions :
- zookeeper 3.5.6
- kafka 2.12-2.3.1
------------------------------------------------------
many rapid log entries on brokers 1 & 2 (we have 1, 2, 3)

[2019-12-05 09:56:54,967] ERROR [KafkaApi-1] Number of alive brokers '0' does not meet the required replication factor '1' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
------------------------------------------------------
java is java-1_8_0-openjdk-1.8.0.222-27.35.2.x86_64 from SLES. I have tried Oracle Java jdk1.8.0_231 with the same issue.
------------------------------------------------------
when trying to check the status of a reassignment I get this very suspicious error:

root@vmgato701a01:/appl/kafka/bin # ./kafka-reassign-partitions.sh --zookeeper $ZKLIST --reassignment-json-file /tmp/r7.json --verify
Status of partition reassignment:
Partitions reassignment failed due to com.fasterxml.jackson.databind.ext.Java7Support.getDeserializerForJavaNioFilePath(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/JsonDeserializer;
java.lang.AbstractMethodError: com.fasterxml.jackson.databind.ext.Java7Support.getDeserializerForJavaNioFilePath(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/JsonDeserializer;
    at com.fasterxml.jackson.databind.ext.OptionalHandlerFactory.findDeserializer(OptionalHandlerFactory.java:122)
    at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.findOptionalStdDeserializer(BasicDeserializerFactory.java:1589)
    at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.findDefaultDeserializer(BasicDeserializerFactory.java:1812)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.findStdDeserializer(BeanDeserializerFactory.java:161)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:125)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:411)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:349)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:264)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:244)
    at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:142)
    at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:477)
    at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:4178)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3997)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3079)
    at kafka.utils.Json$.parseBytesAs(Json.scala:73)
    at kafka.zk.ReassignPartitionsZNode$.decode(ZkData.scala:407)
    at kafka.zk.KafkaZkClient.getPartitionReassignment(KafkaZkClient.scala:795)
    at kafka.admin.ReassignPartitionsCommand$.checkIfPartitionReassignmentSucceeded(ReassignPartitionsCommand.scala:355)
    at kafka.admin.ReassignPartitionsCommand$.verifyAssignment(ReassignPartitionsCommand.scala:97)
    at kafka.admin.ReassignPartitionsCommand$.verifyAssignment(ReassignPartitionsCommand.scala:90)
    at kafka.admin.ReassignPartitionsCommand$.main(ReassignPartitionsCommand.scala:61)
    at kafka.admin.ReassignPartitionsCommand.main(ReassignPartitionsCommand.scala)

Help appreciated.