>
> Any idea why you have session expirations? This is typically due to GC
>> and/or flaky network. Regardless, we should be handling that scenario
>> as well. However, your logs seem incomplete. Can you redo this and
>> perhaps keep the setup running a little longer and send over those
>> logs?
>>
>>
> I am stress testing my application by doing a large number of reads and
> writes to Kafka. My setup consists of many docker instances (of brokers and
> clients) running (intentionally) on a single Linux box. Since the machine is
> overloaded, a congested network and long GC pauses are a possibility.
>
> I will redo the experiment and keep the Kafka brokers running. However, I
> will move to the 0.8.2 release since Jun asked me to try it for another
> issue (topic creation). I hope that is fine.
>
>
Joel, Harsha,

Here are the logs from a different run ( http://d.pr/f/1giCl/PA6GBhrU ).
Everything is the same except that I moved from 0.8.2-beta to 0.8.2.0 (for
clients as well as brokers). My setup consists of 1 zookeeper node, 3 kafka
brokers, and 5 of my application nodes. I shut down all my application nodes
after the experiment, but the zookeeper and kafka brokers are still running
in case we need more data. All the broker log file dumps were taken *after*
I shut down the application nodes, to make sure there were no consumers left
for any of the topics.
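
If it helps, the consumer registrations can also be double-checked directly
in zookeeper. This is only a sketch: it assumes the application uses the
zookeeper-based high-level consumer, reuses the $ZK_* variables from the
command below, and uses $ZOOKEEPER_HOME merely as a placeholder for wherever
zookeeper 3.4.6 is installed:

$ZOOKEEPER_HOME/bin/zkCli.sh -server $ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT ls /consumers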

Below is the current state of the topics on the kafka nodes; the topic with
the deletion issue is LAX1-GRIFFIN-r25-1423118905870:

root@69a87c0e7c5b:/# date && $KAFKA_HOME/bin/kafka-topics.sh
--zookeeper=$ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT --list

Thu Feb  5 07:15:28 UTC 2015
LAX1-GRIFFIN-CONTROL
LAX1-GRIFFIN-r1-1423119435923
LAX1-GRIFFIN-r10-1423119436039
LAX1-GRIFFIN-r11-1423119436675
LAX1-GRIFFIN-r12-1423119436320
LAX1-GRIFFIN-r13-1423119436162
LAX1-GRIFFIN-r14-1423119436583
LAX1-GRIFFIN-r15-1423119436446
LAX1-GRIFFIN-r16-1423119437195
LAX1-GRIFFIN-r17-1423119436330
LAX1-GRIFFIN-r18-1423119436411
LAX1-GRIFFIN-r19-1423119437150
LAX1-GRIFFIN-r2-1423119435925
LAX1-GRIFFIN-r20-1423119436750
LAX1-GRIFFIN-r21-1423119437717
LAX1-GRIFFIN-r22-1423119436820
LAX1-GRIFFIN-r23-1423119436483
LAX1-GRIFFIN-r24-1423119437210
LAX1-GRIFFIN-r25-1423118905870 - marked for deletion
LAX1-GRIFFIN-r25-1423119437277
LAX1-GRIFFIN-r26-1423119438650
LAX1-GRIFFIN-r27-1423119436855
LAX1-GRIFFIN-r28-1423119436566
LAX1-GRIFFIN-r29-1423119437955
LAX1-GRIFFIN-r3-1423119435925
LAX1-GRIFFIN-r30-1423119437949
LAX1-GRIFFIN-r4-1423119435925
LAX1-GRIFFIN-r5-1423119435923
LAX1-GRIFFIN-r6-1423119436354
LAX1-GRIFFIN-r7-1423119436095
LAX1-GRIFFIN-r8-1423119436100
LAX1-GRIFFIN-r9-1423119436377
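
In case it is useful, the stuck topic and the pending deletions can be
inspected with the two commands below. This is only a sketch: it assumes the
same $KAFKA_HOME and $ZK_* variables as the --list command above, and again
uses $ZOOKEEPER_HOME as a placeholder for the zookeeper install:

$KAFKA_HOME/bin/kafka-topics.sh --zookeeper=$ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT --describe --topic LAX1-GRIFFIN-r25-1423118905870

$ZOOKEEPER_HOME/bin/zkCli.sh -server $ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT ls /admin/delete_topics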

Sumit




>
> Sumit
>
>
>
>> Thanks,
>>
>> Joel
>>
>> On Wed, Feb 04, 2015 at 01:00:46PM -0800, Sumit Rangwala wrote:
>> > >
>> > >
>> > >> I have since stopped the container so I cannot say if
>> > > LAX1-GRIFFIN-r45-1423000088317 was one of the topics in "marked for
>> > > deletion" forever.  However, there were many topics (at least 10 of
>> them)
>> > > that were perennially in "marked for deletion" state.
>> > >
>> > >
>> > I have the setup to recreate the issue in case the logs are not
>> sufficient.
>> >
>> >
>> > Sumit
>> >
>> >
>> >
>> > > Sumit
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >> -Harsha
>> > >>
>> > >> On Tue, Feb 3, 2015, at 09:19 PM, Harsha wrote:
>> > >> > you are probably handling it, but there is a case where you call
>> > >> > deleteTopic and kafka goes through the delete-topic process while your
>> > >> > consumer is still running; the consumer may make a TopicMetadataRequest
>> > >> > for the same topic, which can re-create the topic with the default
>> > >> > num.partitions and replication.factor. Did you try stopping the
>> > >> > consumer first and then issuing the topic delete?
>> > >> > -Harsha
>> > >> >
>> > >> > On Tue, Feb 3, 2015, at 08:37 PM, Sumit Rangwala wrote:
>> > >> > > On Tue, Feb 3, 2015 at 6:48 PM, Harsha <ka...@harsha.io> wrote:
>> > >> > >
>> > >> > > > Sumit,
>> > >> > > >        lets say you are deleting a older topic "test1" do you
>> have
>> > >> any
>> > >> > > >        consumers running simultaneously for the topic "test1"
>> while
>> > >> > > >        deletion of topic going on.
>> > >> > > >
>> > >> > >
>> > >> > > Yes it is the case. However, after a small period of time (say
>> few
>> > >> > > minutes)
>> > >> > > there won't be any consumer running for the deleted topic.
>> > >> > >
>> > >> > >
>> > >> > > Sumit
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > > -Harsha
>> > >> > > >
>> > >> > > > On Tue, Feb 3, 2015, at 06:17 PM, Joel Koshy wrote:
>> > >> > > > > Thanks for the logs - will take a look tomorrow unless
>> someone
>> > >> else
>> > >> > > > > gets a chance to get to it today.
>> > >> > > > >
>> > >> > > > > Joel
>> > >> > > > >
>> > >> > > > > On Tue, Feb 03, 2015 at 04:11:57PM -0800, Sumit Rangwala
>> wrote:
>> > >> > > > > > On Tue, Feb 3, 2015 at 3:37 PM, Joel Koshy <
>> jjkosh...@gmail.com
>> > >> >
>> > >> > > > wrote:
>> > >> > > > > >
>> > >> > > > > > > Hey Sumit,
>> > >> > > > > > >
>> > >> > > > > > > I thought you would be providing the actual steps to
>> > >> reproduce :)
>> > >> > > > > > >
>> > >> > > > > >
>> > >> > > > > > I want to, but some proprietary code prevents me from doing it.
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > > Nevertheless, can you get all the relevant logs: state
>> change
>> > >> logs
>> > >> > > > and
>> > >> > > > > > > controller logs at the very least and if possible server
>> logs
>> > >> and
>> > >> > > > send
>> > >> > > > > > > those over?
>> > >> > > > > > >
>> > >> > > > > >
>> > >> > > > > > Here are all the logs you requested (there are three
>> brokers in
>> > >> my
>> > >> > > > setup
>> > >> > > > > > k1, k2, k3): http://d.pr/f/1kprY/2quHBRRT (Gmail has an issue
>> > >> with the
>> > >> > > > file)
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > Sumit
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > >
>> > >> > > > > > > Joel
>> > >> > > > > > >
>> > >> > > > > > > On Tue, Feb 03, 2015 at 03:27:43PM -0800, Sumit Rangwala
>> > >> wrote:
>> > >> > > > > > > > In my setup kafka brokers are set for auto topic
>> creation.
>> > >> In the
>> > >> > > > > > > scenario
>> > >> > > > > > > > below a node informs other nodes (currently 5 in total)
>> > >> about  a
>> > >> > > > number
>> > >> > > > > > > of
>> > >> > > > > > > > new (non-existent) topics, and  all the nodes almost
>> > >> > > > simultaneously open
>> > >> > > > > > > a
>> > >> > > > > > > > consumer for each of those topics. Sometime later
>> another
>> > >> node
>> > >> > > > informs
>> > >> > > > > > > all
>> > >> > > > > > > > other nodes of a new list of topics and each node, if
>> they
>> > >> find
>> > >> > > > that an
>> > >> > > > > > > > older topic exists in kafka, goes ahead and deletes the
>> > >> older
>> > >> > > > topic.
>> > >> > > > > > > What I
>> > >> > > > > > > > have found is that many of the topics stay in the
>> "marked
>> > >> for
>> > >> > > > deletion"
>> > >> > > > > > > > state forever.
>> > >> > > > > > > >
>> > >> > > > > > > >
>> > >> > > > > > > > I get the list of topics using
>> > >> ZkUtils.getAllTopics(zkClient) and
>> > >> > > > delete
>> > >> > > > > > > > topics using AdminUtils.deleteTopic(zkClient, topic).
>> Since
>> > >> many
>> > >> > > > nodes
>> > >> > > > > > > > might try to delete the same topic at the same time I
>> do
>> > >> > > > > > > > see ZkNodeExistsException while deleting the topic,
>> which I
>> > >> catch
>> > >> > > > and
>> > >> > > > > > > > ignore. (e.g.,
>> > >> > > > org.apache.zookeeper.KeeperException$NodeExistsException:
>> > >> > > > > > > > KeeperErrorCode = NodeExists for
>> > >> > > > > > > > /admin/delete_topics/LAX1-GRIFFIN-r13-1423001701601)
>> > >> > > > > > > >
>> > >> > > > > > > > # State of one deleted topic on kafka brokers:
>> > >> > > > > > > > Topic:LAX1-GRIFFIN-r13-1423001701601 PartitionCount:8
>> > >> > > > ReplicationFactor:1
>> > >> > > > > > > > Configs:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 0
>> Leader:
>> > >> -1
>> > >> > > > Replicas:
>> > >> > > > > > > > 49558 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 1
>> Leader:
>> > >> -1
>> > >> > > > Replicas:
>> > >> > > > > > > > 49554 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 2
>> Leader:
>> > >> -1
>> > >> > > > Replicas:
>> > >> > > > > > > > 49557 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 3
>> Leader:
>> > >> -1
>> > >> > > > Replicas:
>> > >> > > > > > > > 49558 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 4
>> Leader:
>> > >> -1
>> > >> > > > Replicas:
>> > >> > > > > > > > 49554 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 5
>> Leader:
>> > >> -1
>> > >> > > > Replicas:
>> > >> > > > > > > > 49557 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 6
>> Leader:
>> > >> -1
>> > >> > > > Replicas:
>> > >> > > > > > > > 49558 Isr:
>> > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601 Partition: 7
>> Leader:
>> > >> -1
>> > >> > > > Replicas:
>> > >> > > > > > > > 49554 Isr:
>> > >> > > > > > > >
>> > >> > > > > > > >
>> > >> > > > > > > > # Controller log says
>> > >> > > > > > > >
>> > >> > > > > > > > [2015-02-03 22:59:03,399] INFO
>> [delete-topics-thread-49554],
>> > >> > > > Deletion for
>> > >> > > > > > > > replicas 49557,49554,49558 for partition
>> > >> > > > > > > >
>> > >> > > > > > >
>> > >> > > >
>> > >>
>> [LAX1-GRIFFIN-r13-1423001701601,0],[LAX1-GRIFFIN-r13-1423001701601,6],[LAX1-GRIFFIN-r13-1423001701601,5],[LAX1-GRIFFIN-r13-1423001701601,3],[LAX1-GRIFFIN-r13-1423001701601,7],[LAX1-GRIFFIN-r13-1423001701601,1],[LAX1-GRIFFIN-r13-1423001701601,4],[LAX1-GRIFFIN-r13-1423001701601,2]
>> > >> > > > > > > > of topic LAX1-GRIFFIN-r13-1423001701601 in progress
>> > >> > > > > > > >
>> (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
>> > >> > > > > > > >
>> > >> > > > > > > > current time: Tue Feb  3 23:20:58 UTC 2015
>> > >> > > > > > > >
>> > >> > > > > > > >
>> > >> > > > > > > > Since I don't know the delete topic algorithm, I am not
>> > >> sure why
>> > >> > > > > > > these
>> > >> > > > > > > > topics are not garbage collected. I do have the
>> complete
>> > >> setup
>> > >> > > > running in
>> > >> > > > > > > > docker right now on my local box so please let me know
>> if
>> > >> any more
>> > >> > > > info
>> > >> > > > > > > is
>> > >> > > > > > > > required to troubleshoot this issue.
>> > >> > > > > > > >
>> > >> > > > > > > >
>> > >> > > > > > > > Furthermore, does ZkUtils.getAllTopics(zkClient) return
>> > >> "marked for
>> > >> > > > > > > > deletion" topics as well? If so, is there an easy way to
>> get
>> > >> a list
>> > >> > > > of
>> > >> > > > > > > active
>> > >> > > > > > > > topics (other than looking at all the topics in
>> > >> > > > /admin/delete_topics/ and
>> > >> > > > > > > > taking a set difference with the topics returned
>> > >> > > > > > > > by ZkUtils.getAllTopics(zkClient) )
>> > >> > > > > > > >
>> > >> > > > > > > > Sumit
>> > >> > > > > > > > (More setup info below)
>> > >> > > > > > > >
>> > >> > > > > > > >
>> > >> > > > > > > > Setup
>> > >> > > > > > > > --------
>> > >> > > > > > > > Zookeeper: 3.4.6
>> > >> > > > > > > > Kafka broker: 0.8.2-beta
>> > >> > > > > > > > Kafka clients: 0.8.2-beta
>> > >> > > > > > > >
>> > >> > > > > > > > # Kafka broker settings (all other settings are default
>> > >> 0.8.2-beta
>> > >> > > > > > > settings)
>> > >> > > > > > > > kafka.controlled.shutdown.enable: 'FALSE'
>> > >> > > > > > > > kafka.auto.create.topics.enable: 'TRUE'
>> > >> > > > > > > > kafka.num.partitions: 8
>> > >> > > > > > > > kafka.default.replication.factor: 1
>> > >> > > > > > > > kafka.rebalance.backoff.ms: 3000
>> > >> > > > > > > > kafka.rebalance.max.retries: 10
>> > >> > > > > > > > kafka.log.retention.minutes: 1200
>> > >> > > > > > > > kafka.delete.topic.enable: 'TRUE'
>> > >> > > > > > >
>> > >> > > > > > >
>> > >> > > > >
>> > >> > > >
>> > >>
>> > >
>> > >
>>
>>
>
