>> Any idea why you have session expirations? This is typically due to GC
>> and/or a flaky network. Regardless, we should be handling that scenario
>> as well. However, your logs seem incomplete. Can you redo this and
>> perhaps keep the setup running a little longer and send over those
>> logs?
>
> I am stress testing my application by doing a large number of reads and
> writes to kafka. My setup consists of many docker instances (of brokers
> and clients) running (intentionally) on a single linux box. Since the
> machine is overloaded, a congested network and long GC pauses are a
> possibility.
>
> I will redo the experiment and keep the kafka brokers running. However,
> I will move to the 0.8.2 release since Jun asked me to try it for
> another issue (topic creation). I hope that is fine.

Joel, Harsha,
Here are the logs from a different run (http://d.pr/f/1giCl/PA6GBhrU).
Everything is the same except that I moved from 0.8.2-beta to 0.8.2.0 (for
clients as well as brokers). My setup consists of 1 zookeeper, 3 kafka
brokers, and 5 of my application nodes. I shut down all my application
nodes after the experiment, but zookeeper and the kafka brokers are still
running in case we need more data. All the dumps of log files on the
brokers were taken *after* I shut down all application nodes, to make sure
there were no consumers for any of the topics.

The current state of the topics on the kafka nodes is below; the topic
with the deletion issue is LAX1-GRIFFIN-r25-1423118905870:

root@69a87c0e7c5b:/# date && $KAFKA_HOME/bin/kafka-topics.sh --zookeeper=$ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT --list
Thu Feb 5 07:15:28 UTC 2015
LAX1-GRIFFIN-CONTROL
LAX1-GRIFFIN-r1-1423119435923
LAX1-GRIFFIN-r10-1423119436039
LAX1-GRIFFIN-r11-1423119436675
LAX1-GRIFFIN-r12-1423119436320
LAX1-GRIFFIN-r13-1423119436162
LAX1-GRIFFIN-r14-1423119436583
LAX1-GRIFFIN-r15-1423119436446
LAX1-GRIFFIN-r16-1423119437195
LAX1-GRIFFIN-r17-1423119436330
LAX1-GRIFFIN-r18-1423119436411
LAX1-GRIFFIN-r19-1423119437150
LAX1-GRIFFIN-r2-1423119435925
LAX1-GRIFFIN-r20-1423119436750
LAX1-GRIFFIN-r21-1423119437717
LAX1-GRIFFIN-r22-1423119436820
LAX1-GRIFFIN-r23-1423119436483
LAX1-GRIFFIN-r24-1423119437210
LAX1-GRIFFIN-r25-1423118905870 - marked for deletion
LAX1-GRIFFIN-r25-1423119437277
LAX1-GRIFFIN-r26-1423119438650
LAX1-GRIFFIN-r27-1423119436855
LAX1-GRIFFIN-r28-1423119436566
LAX1-GRIFFIN-r29-1423119437955
LAX1-GRIFFIN-r3-1423119435925
LAX1-GRIFFIN-r30-1423119437949
LAX1-GRIFFIN-r4-1423119435925
LAX1-GRIFFIN-r5-1423119435923
LAX1-GRIFFIN-r6-1423119436354
LAX1-GRIFFIN-r7-1423119436095
LAX1-GRIFFIN-r8-1423119436100
LAX1-GRIFFIN-r9-1423119436377

Sumit
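For reference, a topic that is "marked for deletion" shows up as a child of
the /admin/delete_topics znode until the controller completes (or never
completes) the delete; the "- marked for deletion" suffix printed by
kafka-topics.sh above comes from that znode. Below is a minimal sketch of
listing that queue with the 0.8.2 ZkUtils API; the ZooKeeper address is a
placeholder and the znode path is hard-coded rather than read from a
constant:

import kafka.utils.{ZkUtils, ZKStringSerializer}
import org.I0Itec.zkclient.ZkClient

object PendingDeletes {
  def main(args: Array[String]): Unit = {
    // Placeholder ZooKeeper address; substitute the real ensemble.
    val zkClient = new ZkClient("zookeeper:2181", 30000, 30000, ZKStringSerializer)
    try {
      // Topics queued for deletion sit under this znode until the controller
      // finishes (or never finishes) deleting them.
      val pending = ZkUtils.getChildrenParentMayNotExist(zkClient, "/admin/delete_topics")
      pending.sorted.foreach(println)
    } finally {
      zkClient.close()
    }
  }
}

A topic that never leaves this list is one that is stuck in the
"marked for deletion" state shown above.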
> Sumit

>> Thanks,
>>
>> Joel
>>
>> On Wed, Feb 04, 2015 at 01:00:46PM -0800, Sumit Rangwala wrote:
>> > > I have since stopped the container so I cannot say if
>> > > LAX1-GRIFFIN-r45-1423000088317 was one of the topics in "marked for
>> > > deletion" forever. However, there were many topics (at least 10 of
>> > > them) that were perennially in the "marked for deletion" state.
>> >
>> > I have the setup to recreate the issue in case the logs are not
>> > sufficient.
>> >
>> > Sumit
>> >
>> > > Sumit
>> > >
>> > >> -Harsha
>> > >>
>> > >> On Tue, Feb 3, 2015, at 09:19 PM, Harsha wrote:
>> > >> > You are probably handling it, but there is a case where you call
>> > >> > deleteTopic and kafka goes through the delete topic process, while a
>> > >> > consumer that is still running may have made a TopicMetadataRequest
>> > >> > for the same topic, which can re-create the topic with the default
>> > >> > num.partitions and replication.factor. Did you try stopping the
>> > >> > consumer first and then issuing the topic delete?
>> > >> > -Harsha
>> > >> >
>> > >> > On Tue, Feb 3, 2015, at 08:37 PM, Sumit Rangwala wrote:
>> > >> > > On Tue, Feb 3, 2015 at 6:48 PM, Harsha <ka...@harsha.io> wrote:
>> > >> > > > Sumit,
>> > >> > > > let's say you are deleting an older topic "test1": do you have
>> > >> > > > any consumers running simultaneously for the topic "test1"
>> > >> > > > while deletion of the topic is going on?
>> > >> > >
>> > >> > > Yes, that is the case. However, after a small period of time (say
>> > >> > > a few minutes) there won't be any consumer running for the deleted
>> > >> > > topic.
>> > >> > >
>> > >> > > Sumit
>> > >> > >
>> > >> > > > -Harsha
>> > >> > > >
>> > >> > > > On Tue, Feb 3, 2015, at 06:17 PM, Joel Koshy wrote:
>> > >> > > > > Thanks for the logs - will take a look tomorrow unless someone
>> > >> > > > > else gets a chance to get to it today.
>> > >> > > > >
>> > >> > > > > Joel
>> > >> > > > >
>> > >> > > > > On Tue, Feb 03, 2015 at 04:11:57PM -0800, Sumit Rangwala wrote:
>> > >> > > > > > On Tue, Feb 3, 2015 at 3:37 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>> > >> > > > > > > Hey Sumit,
>> > >> > > > > > >
>> > >> > > > > > > I thought you would be providing the actual steps to reproduce :)
>> > >> > > > > >
>> > >> > > > > > I want to, but some proprietary code prevents me from doing it.
>> > >> > > > > >
>> > >> > > > > > > Nevertheless, can you get all the relevant logs: state change logs
>> > >> > > > > > > and controller logs at the very least, and if possible server logs,
>> > >> > > > > > > and send those over?
>> > >> > > > > >
>> > >> > > > > > Here are all the logs you requested (there are three brokers in my
>> > >> > > > > > setup: k1, k2, k3): http://d.pr/f/1kprY/2quHBRRT (Gmail has an issue
>> > >> > > > > > with the file)
>> > >> > > > > >
>> > >> > > > > > Sumit
>> > >> > > > > >
>> > >> > > > > > > Joel
>> > >> > > > > > >
>> > >> > > > > > > On Tue, Feb 03, 2015 at 03:27:43PM -0800, Sumit Rangwala wrote:
>> > >> > > > > > > > In my setup kafka brokers are set for auto topic creation. In the
>> > >> > > > > > > > scenario below a node informs other nodes (currently 5 in total)
>> > >> > > > > > > > about a number of new (non-existent) topics, and all the nodes
>> > >> > > > > > > > almost simultaneously open a consumer for each of those topics.
>> > >> > > > > > > > Sometime later another node informs all other nodes of a new list
>> > >> > > > > > > > of topics, and each node, if it finds that an older topic exists
>> > >> > > > > > > > in kafka, goes ahead and deletes the older topic. What I have found
>> > >> > > > > > > > is that many of the topics stay in the "marked for deletion" state
>> > >> > > > > > > > forever.
>> > >> > > > > > > >
>> > >> > > > > > > > I get the list of topics using ZkUtils.getAllTopics(zkClient) and
>> > >> > > > > > > > delete topics using AdminUtils.deleteTopic(zkClient, topic). Since
>> > >> > > > > > > > many nodes might try to delete the same topic at the same time, I
>> > >> > > > > > > > do see a ZkNodeExistsException while deleting the topic, which I
>> > >> > > > > > > > catch and ignore
>> > >> > > > > > > > (e.g., org.apache.zookeeper.KeeperException$NodeExistsException:
>> > >> > > > > > > > KeeperErrorCode = NodeExists for
>> > >> > > > > > > > /admin/delete_topics/LAX1-GRIFFIN-r13-1423001701601).
>> > >> > > > > > > >
>> > >> > > > > > > > # State of one deleted topic on the kafka brokers:
>> > >> > > > > > > > Topic:LAX1-GRIFFIN-r13-1423001701601 PartitionCount:8 ReplicationFactor:1 Configs:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 0 Leader: -1 Replicas: 49558 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 1 Leader: -1 Replicas: 49554 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 2 Leader: -1 Replicas: 49557 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 3 Leader: -1 Replicas: 49558 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 4 Leader: -1 Replicas: 49554 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 5 Leader: -1 Replicas: 49557 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 6 Leader: -1 Replicas: 49558 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 7 Leader: -1 Replicas: 49554 Isr:
>> > >> > > > > > > >
>> > >> > > > > > > > # The controller log says:
>> > >> > > > > > > > [2015-02-03 22:59:03,399] INFO [delete-topics-thread-49554], Deletion for replicas 49557,49554,49558 for partition [LAX1-GRIFFIN-r13-1423001701601,0],[LAX1-GRIFFIN-r13-1423001701601,6],[LAX1-GRIFFIN-r13-1423001701601,5],[LAX1-GRIFFIN-r13-1423001701601,3],[LAX1-GRIFFIN-r13-1423001701601,7],[LAX1-GRIFFIN-r13-1423001701601,1],[LAX1-GRIFFIN-r13-1423001701601,4],[LAX1-GRIFFIN-r13-1423001701601,2] of topic LAX1-GRIFFIN-r13-1423001701601 in progress (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
>> > >> > > > > > > >
>> > >> > > > > > > > current time: Tue Feb 3 23:20:58 UTC 2015
>> > >> > > > > > > >
>> > >> > > > > > > > Since I don't know the delete topic algorithm, I am not sure why
>> > >> > > > > > > > these topics are not garbage collected. I do have the complete
>> > >> > > > > > > > setup running in docker right now on my local box, so please let me
>> > >> > > > > > > > know if any more info is required to troubleshoot this issue.
>> > >> > > > > > > >
>> > >> > > > > > > > Furthermore, does ZkUtils.getAllTopics(zkClient) return "marked for
>> > >> > > > > > > > deletion" topics as well?
>> > >> > > > > > > > If so, is there an easy way to get a list of active topics (other
>> > >> > > > > > > > than looking at all the topics in /admin/delete_topics/ and taking
>> > >> > > > > > > > a set difference with the topics returned by
>> > >> > > > > > > > ZkUtils.getAllTopics(zkClient))?
>> > >> > > > > > > >
>> > >> > > > > > > > Sumit
>> > >> > > > > > > > (More setup info below)
>> > >> > > > > > > >
>> > >> > > > > > > > Setup
>> > >> > > > > > > > --------
>> > >> > > > > > > > Zookeeper: 3.4.6
>> > >> > > > > > > > Kafka broker: 0.8.2-beta
>> > >> > > > > > > > Kafka clients: 0.8.2-beta
>> > >> > > > > > > >
>> > >> > > > > > > > # Kafka broker settings (all other settings are default 0.8.2-beta settings)
>> > >> > > > > > > > kafka.controlled.shutdown.enable: 'FALSE'
>> > >> > > > > > > > kafka.auto.create.topics.enable: 'TRUE'
>> > >> > > > > > > > kafka.num.partitions: 8
>> > >> > > > > > > > kafka.default.replication.factor: 1
>> > >> > > > > > > > kafka.rebalance.backoff.ms: 3000
>> > >> > > > > > > > kafka.rebalance.max.retries: 10
>> > >> > > > > > > > kafka.log.retention.minutes: 1200
>> > >> > > > > > > > kafka.delete.topic.enable: 'TRUE'
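Regarding the two questions quoted above (racing deletes from several nodes,
and listing "active" topics), here is a minimal sketch of one way to do it
against the 0.8.2 Scala admin API. The ZooKeeper address, the object name,
and the example topic name are placeholders, and the /admin/delete_topics
path is hard-coded; treat this as an illustration rather than the exact code
used in the application above.

import kafka.admin.AdminUtils
import kafka.utils.{ZkUtils, ZKStringSerializer}
import org.I0Itec.zkclient.ZkClient
import org.I0Itec.zkclient.exception.ZkNodeExistsException

object TopicCleanupSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder ZooKeeper address; substitute the real ensemble.
    val zkClient = new ZkClient("zookeeper:2181", 30000, 30000, ZKStringSerializer)
    try {
      // getAllTopics returns every topic znode, including topics that are
      // already marked for deletion.
      val allTopics = ZkUtils.getAllTopics(zkClient).toSet

      // Children of /admin/delete_topics are the topics still queued for deletion.
      val pendingDeletes =
        ZkUtils.getChildrenParentMayNotExist(zkClient, "/admin/delete_topics").toSet

      // "Active" topics via the set difference described in the question above.
      val activeTopics = allTopics -- pendingDeletes
      println("active topics: " + activeTopics.toSeq.sorted.mkString(", "))

      // Several nodes may race to mark the same topic for deletion; a
      // ZkNodeExistsException just means another node got there first, so it
      // is safe to swallow.
      def deleteQuietly(topic: String): Unit =
        try AdminUtils.deleteTopic(zkClient, topic)
        catch { case _: ZkNodeExistsException => () }

      // The topic name here is only an example.
      deleteQuietly("LAX1-GRIFFIN-r13-1423001701601")
    } finally {
      zkClient.close()
    }
  }
}

Note that AdminUtils.deleteTopic only marks the topic for deletion in
ZooKeeper; whether the delete ever completes is up to the controller, and it
requires delete.topic.enable=true on the brokers (as configured above).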