Any error in the controller and state-change log?

Thanks,

Jun

On Thu, Feb 12, 2015 at 7:28 AM, Omid Aladini <omidalad...@gmail.com> wrote:

> Hi,
>
> I'm experimenting with the following scenario:
>
> - 3 brokers are running (0,1 and 2) -- Kafka version 0.8.2.0
> - Continuously: restart broker number 0 by triggering controlled shutdown.
>   Sleep rand(10) seconds. repeat.
> - Continuously: create 'simple-test-topic' (RF=2), write and read messages,
>   then delete the topic. repeat.
>
> After a while, broker 0 doesn't come back up any more due to "corrupt
> index" error (but that's not my question for the moment). Looking at the
> state of the topics:
>
> Topic:simple-test-topic PartitionCount:8 ReplicationFactor:2 Configs:
>     Topic: simple-test-topic Partition: 0 Leader: -1 Replicas: 1,2 Isr: 1
>     Topic: simple-test-topic Partition: 1 Leader: -1 Replicas: 2,0 Isr: 2
>     Topic: simple-test-topic Partition: 2 Leader: -1 Replicas: 0,1 Isr: 1
>     Topic: simple-test-topic Partition: 3 Leader: -1 Replicas: 1,0 Isr: 1
>     Topic: simple-test-topic Partition: 4 Leader: -1 Replicas: 2,1 Isr: 1
>     Topic: simple-test-topic Partition: 5 Leader: -1 Replicas: 0,2 Isr: 2
>     Topic: simple-test-topic Partition: 6 Leader: -1 Replicas: 1,2 Isr: 1
>     Topic: simple-test-topic Partition: 7 Leader: -1 Replicas: 2,0 Isr: 2
> Topic:test PartitionCount:8 ReplicationFactor:3 Configs:
>     Topic: test Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 2,1
>     Topic: test Partition: 1 Leader: 2 Replicas: 2,0,1 Isr: 2,1
>     Topic: test Partition: 2 Leader: 2 Replicas: 0,1,2 Isr: 2,1
>     Topic: test Partition: 3 Leader: 1 Replicas: 1,0,2 Isr: 2,1
>     Topic: test Partition: 4 Leader: 2 Replicas: 2,1,0 Isr: 2,1
>     Topic: test Partition: 5 Leader: 2 Replicas: 0,2,1 Isr: 2,1
>     Topic: test Partition: 6 Leader: 1 Replicas: 1,2,0 Isr: 2,1
>     Topic: test Partition: 7 Leader: 2 Replicas: 2,0,1 Isr: 2,1
>
> .. at which point:
>
> - All 'simple-test-topic' partitions are leaderless.
> - It's not possible to delete "simple-test-topic" any more.
> - Calling 'kafka-preferred-replica-election.sh' successfully starts
>   election but doesn't have any effect.
>
> The other topic, named "test" (RF 3), is just sitting there and not
> actively participating in the test.
>
> Now I'm wondering:
>
> - Which of the steps above could have caused "simple-test-topic" partitions
>   to become leaderless?
> - How to recover in such situation in cases where broker 0 can or cannot be
>   recovered?
>
> Thanks,
> Omid
>
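
For anyone reproducing this scenario, here is a minimal Python sketch for picking the leaderless partitions out of a topic listing shaped like the one quoted above. It is not part of the original thread: the function name and the regular expression are illustrative assumptions, and the sketch relies only on the per-partition line format shown in the listing (Leader: -1 marking a partition without a leader).

```python
import re

# Pattern for the per-partition lines in the quoted listing, e.g.
#   Topic: simple-test-topic Partition: 0 Leader: -1 Replicas: 1,2 Isr: 1
# The per-topic summary lines (PartitionCount:...) do not match it.
PARTITION_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def leaderless_partitions(describe_output):
    """Return (topic, partition, isr) tuples for lines with Leader: -1.

    Hypothetical helper: it parses text only and does not talk to Kafka.
    """
    found = []
    for line in describe_output.splitlines():
        # Drop email quote markers and indentation, if present.
        line = line.lstrip("> \t")
        match = PARTITION_RE.search(line)
        if match is None:
            continue
        if int(match.group("leader")) == -1:
            isr = [int(b) for b in match.group("isr").split(",") if b]
            found.append(
                (match.group("topic"), int(match.group("partition")), isr)
            )
    return found
```

Feeding it the listing above would report all eight 'simple-test-topic' partitions and none of the "test" partitions, which makes it easy to see which broker is still in the ISR for each stuck partition.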