
James Cheng commented on KAFKA-4858:

Yup, I agree that it's actually the same issue as KAFKA-3219. The impact is the 
same, but it is larger than KAFKA-3219 originally described. From that JIRA, it 
seemed to me to be only "a weird half-created topic", but the experiment that I 
ran shows that it affects the broker as a whole.

I agree that there is no way to prevent this data from getting into Zookeeper 
if someone uses old client scripts. All we can do is protect the broker itself 
from bad data in Zookeeper. I think a good way to fix it would be for the 
broker to sanity-check the topic information in Zookeeper *before* it tries to 
bring the partition online. If the topic information is not valid, the topic 
should be left offline/ignored, and the broker should write a message to the 
logs saying that this topic is bad.
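
As a rough illustration of that kind of sanity check, here is a minimal Python 
sketch. It is not Kafka code: the function names are hypothetical, the 
249-character limit is the one quoted in the issue below, and the legal 
character set is an assumption added for completeness.

```python
import re

# Limit quoted in the issue for newer brokers; treated here as a constant.
MAX_TOPIC_NAME_LENGTH = 249
# Assumed legal character set: ASCII letters, digits, '.', '_' and '-'.
LEGAL_TOPIC_NAME = re.compile(r"[a-zA-Z0-9._-]+")


def topic_name_problem(name):
    """Return a description of why `name` is invalid, or None if it passes."""
    if not name:
        return "topic name is empty"
    if name in (".", ".."):
        return "topic name cannot be '.' or '..'"
    if len(name) > MAX_TOPIC_NAME_LENGTH:
        return "topic name is %d characters long, limit is %d" % (
            len(name), MAX_TOPIC_NAME_LENGTH)
    if LEGAL_TOPIC_NAME.fullmatch(name) is None:
        return "topic name contains illegal characters"
    return None


def partition_topics(topics_from_zookeeper):
    """Split topic names into (valid, skipped); skipped maps name -> reason."""
    valid, skipped = [], {}
    for name in topics_from_zookeeper:
        problem = topic_name_problem(name)
        if problem is None:
            valid.append(name)
        else:
            # A real broker would log the reason and leave the topic offline.
            skipped[name] = problem
    return valid, skipped
```

The idea is that only the names in `valid` would be brought online, while each 
entry in `skipped` would be logged with its reason and otherwise ignored, so 
one bad topic cannot take down the rest of the broker's partitions.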

Who is in charge of noticing the new topics in Zookeeper? Is it the controller? 
And then who is responsible for notifying all the brokers to create the files 
on disk? Does this happen because the controller sent messages to the brokers 
telling them to do this? If so, then maybe we can just find the right place in 
the controller that responds to the "new topic" messages, and do the validation 
there.

> Long topic names created using old kafka-topics.sh can prevent newer brokers 
> from joining any ISRs
> --------------------------------------------------------------------------------------------------
>                 Key: KAFKA-4858
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4858
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions:,
>            Reporter: James Cheng
>            Assignee: Vahid Hashemian
> I ran into a variant of KAFKA-3219 that resulted in a broker being unable to 
> join any ISRs in the cluster.
> Prior to, the maximum topic length was 255.
> With and beyond, the maximum topic length is 249.
> The check on topic name length is done by kafka-topics.sh prior to topic 
> creation. Thus, it is possible to use a kafka-topics.sh script to 
> create a 255 character topic on a broker.
> When this happens, you will get the following stack trace (the same one seen 
> in KAFKA-3219)
> {code}
> $ TOPIC=$(printf 'd%.0s' {1..255} ) ; bin/kafka-topics.sh --zookeeper 
> --create --topic $TOPIC --partitions 1 --replication-factor 2
> Created topic 
> "ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd".
> {code}
> {code}
> [2017-03-06 22:01:19,011] ERROR [KafkaApi-2] Error when handling request 
> {controller_id=1,controller_epoch=1,partition_states=[{topic=ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd,partition=0,controller_epoch=1,leader=2,leader_epoch=0,isr=[2,1],zk_version=0,replicas=[2,1]}],live_leaders=[{id=2,host=jchengmbpro15,port=9093}]}
>  (kafka.server.KafkaApis)
> java.lang.NullPointerException
>       at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:192)
>       at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:192)
>       at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32)
>       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>       at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>       at kafka.log.Log.loadSegments(Log.scala:155)
>       at kafka.log.Log.<init>(Log.scala:108)
>       at kafka.log.LogManager.createLog(LogManager.scala:362)
>       at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:94)
>       at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>       at kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:174)
>       at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>       at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:174)
>       at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:168)
>       at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>       at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:242)
>       at kafka.cluster.Partition.makeLeader(Partition.scala:168)
>       at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:758)
>       at kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:757)
>       at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>       at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>       at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>       at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>       at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>       at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:757)
>       at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:703)
>       at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
>       at kafka.server.KafkaApis.handle(KafkaApis.scala:82)
>       at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> The topic does not get created on disk, but the broker thinks the topic is 
> ready. The broker seems functional for other topics: I can produce to and 
> consume from them.
> {code}
> $ ./bin/kafka-topics.sh --zookeeper --describe
> Topic:ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
>  PartitionCount:1        ReplicationFactor:2     Configs:
>       Topic: 
> ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
>   Partition: 0    Leader: 2       Replicas: 2,1   Isr: 2,1
> {code}
> If you stop and restart the broker, it again gets that stack trace. This 
> time, the broker fails to join *any* ISRs in the cluster. Notice below that 
> broker 2 is out of all ISRs:
> {code}
> $ ./bin/kafka-topics.sh --zookeeper --describe
> Topic:ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
>  PartitionCount:1        ReplicationFactor:2     Configs:
>       Topic: 
> ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
>   Partition: 0    Leader: 1       Replicas: 2,1   Isr: 1
> Topic:small   PartitionCount:5        ReplicationFactor:2     Configs:
>       Topic: small    Partition: 0    Leader: 1       Replicas: 1,2   Isr: 1
>       Topic: small    Partition: 1    Leader: 1       Replicas: 2,1   Isr: 1
>       Topic: small    Partition: 2    Leader: 1       Replicas: 1,2   Isr: 1
>       Topic: small    Partition: 3    Leader: 1       Replicas: 2,1   Isr: 1
>       Topic: small    Partition: 4    Leader: 1       Replicas: 1,2   Isr: 1
> {code}
> So, it appears that a long topic name that sneaks into the cluster can 
> prevent brokers from participating in the cluster.
> Furthermore, I'm not exactly sure how to delete the offending topic. A 
> kafka-topics.sh --delete won't delete the topic because it can't talk to all 
> replicas, because the replicas are not in the ISR. We ran into this at work 
> today and ended up having to manually delete the topic configuration from 
> zookeeper and then did a bounce of all affected brokers. Until we did that, 
> those brokers weren't able to join the cluster.
