On Wed, Feb 4, 2015 at 10:16 PM, Harsha <ka...@harsha.io> wrote:

>
>        what's your zookeeper.session.timeout.ms value?
>

30000 (30sec)
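
(i.e., the broker config is effectively

    zookeeper.session.timeout.ms=30000

in server.properties terms.)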


Sumit



>
> On Wed, Feb 4, 2015, at 09:35 PM, Sumit Rangwala wrote:
> > On Wed, Feb 4, 2015 at 6:14 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> >
> > > I took a look at your logs. I agree with Harsha that the logs seem
> > > truncated. The basic issue though is that you have session expirations
> > > and controller failover. Broker 49554 was the controller and hosted
> > > some partition(s) of LAX1-GRIFFIN-r13-1423001701601. After controller
> > > failover the new controller marks it as ineligible for deletion since
> > > 49554 is considered down (until it re-registers in zookeeper) and is
> > > re-elected as the leader - however I don't see those logs.
> > >
> > Ok. Just wondering if the delete logic is documented anywhere.
> >
> >
> >
> > > Any idea why you have session expirations? This is typically due to GC
> > > and/or flaky network. Regardless, we should be handling that scenario
> > > as well. However, your logs seem incomplete. Can you redo this and
> > > perhaps keep the set up running a little longer and send over those
> > > logs?
> > >
> > >
> > I am stress testing my application by doing a large number of reads and
> > writes to kafka. My setup consists of many docker instances (of brokers
> > and clients) running (intentionally) on a single Linux box. Since the
> > machine is overloaded, a congested network and long GCs are a
> > possibility.
> >
> > I will redo the experiment and keep the kafka brokers running. However,
> > I will move to the 0.8.2 release since Jun asked me to try it for another
> > issue (topic creation). I hope that is fine.
> >
> >
> > Sumit
> >
> >
> >
> > > Thanks,
> > >
> > > Joel
> > >
> > > On Wed, Feb 04, 2015 at 01:00:46PM -0800, Sumit Rangwala wrote:
> > > > >
> > > > >
> > > > > I have since stopped the container so I cannot say if
> > > > > LAX1-GRIFFIN-r45-1423000088317 was one of the topics in "marked for
> > > > > deletion" forever. However, there were many topics (at least 10 of
> > > > > them) that were perennially in the "marked for deletion" state.
> > > > >
> > > > >
> > > > I have the setup to recreate the issue in case the logs are not
> > > > sufficient.
> > > >
> > > >
> > > > Sumit
> > > >
> > > >
> > > >
> > > > > Sumit
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >> -Harsha
> > > > >>
> > > > >> On Tue, Feb 3, 2015, at 09:19 PM, Harsha wrote:
> > > > >> > You are probably handling it, but there is a case where you call
> > > > >> > deleteTopic and kafka goes through the delete-topic process while
> > > > >> > your consumer is still running and has probably made a
> > > > >> > TopicMetadataRequest for the same topic, which can re-create the
> > > > >> > topic with the default num.partitions and replication.factor.
> > > > >> > Did you try stopping the consumer first and then issuing the
> > > > >> > topic delete?
> > > > >> > -Harsha
> > > > >> >
> > > > >> > On Tue, Feb 3, 2015, at 08:37 PM, Sumit Rangwala wrote:
> > > > >> > > On Tue, Feb 3, 2015 at 6:48 PM, Harsha <ka...@harsha.io> wrote:
> > > > >> > >
> > > > >> > > > Sumit,
> > > > >> > > >        let's say you are deleting an older topic "test1": do
> > > > >> > > >        you have any consumers running simultaneously for the
> > > > >> > > >        topic "test1" while the deletion of that topic is
> > > > >> > > >        going on?
> > > > >> > > >
> > > > >> > >
> > > > >> > > Yes, that is the case. However, after a small period of time
> > > > >> > > (say, a few minutes) there won't be any consumer running for
> > > > >> > > the deleted topic.
> > > > >> > >
> > > > >> > >
> > > > >> > > Sumit
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > > -Harsha
> > > > >> > > >
> > > > >> > > > On Tue, Feb 3, 2015, at 06:17 PM, Joel Koshy wrote:
> > > > >> > > > > Thanks for the logs - will take a look tomorrow unless
> > > > >> > > > > someone else gets a chance to get to it today.
> > > > >> > > > >
> > > > >> > > > > Joel
> > > > >> > > > >
> > > > >> > > > > On Tue, Feb 03, 2015 at 04:11:57PM -0800, Sumit Rangwala wrote:
> > > > >> > > > > > On Tue, Feb 3, 2015 at 3:37 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Hey Sumit,
> > > > >> > > > > > >
> > > > >> > > > > > > I thought you would be providing the actual steps to
> > > > >> > > > > > > reproduce :)
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > > > I want to, but some proprietary code prevents me from
> > > > >> > > > > > doing it.
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > > Nevertheless, can you get all the relevant logs: state
> > > > >> > > > > > > change logs and controller logs at the very least and
> > > > >> > > > > > > if possible server logs and send those over?
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > > > Here are all the logs you requested (there are three
> > > > >> > > > > > brokers in my setup: k1, k2, k3):
> > > > >> > > > > > http://d.pr/f/1kprY/2quHBRRT (Gmail has an issue with
> > > > >> > > > > > the file)
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > Sumit
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > Joel
> > > > >> > > > > > >
> > > > >> > > > > > > On Tue, Feb 03, 2015 at 03:27:43PM -0800, Sumit Rangwala wrote:
> > > > >> > > > > > > > In my setup kafka brokers are set for auto topic
> > > > >> > > > > > > > creation. In the scenario below a node informs the
> > > > >> > > > > > > > other nodes (currently 5 in total) about a number of
> > > > >> > > > > > > > new (non-existent) topics, and all the nodes almost
> > > > >> > > > > > > > simultaneously open a consumer for each of those
> > > > >> > > > > > > > topics. Sometime later another node informs all other
> > > > >> > > > > > > > nodes of a new list of topics, and each node, if it
> > > > >> > > > > > > > finds that an older topic exists in kafka, goes ahead
> > > > >> > > > > > > > and deletes the older topic. What I have found is
> > > > >> > > > > > > > that many of the topics stay in the "marked for
> > > > >> > > > > > > > deletion" state forever.
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > I get the list of topics using
> > > > >> > > > > > > > ZkUtils.getAllTopics(zkClient) and delete topics
> > > > >> > > > > > > > using AdminUtils.deleteTopic(zkClient, topic). Since
> > > > >> > > > > > > > many nodes might try to delete the same topic at the
> > > > >> > > > > > > > same time, I do see a ZkNodeExistsException while
> > > > >> > > > > > > > deleting the topic, which I catch and ignore (e.g.,
> > > > >> > > > > > > > org.apache.zookeeper.KeeperException$NodeExistsException:
> > > > >> > > > > > > > KeeperErrorCode = NodeExists for
> > > > >> > > > > > > > /admin/delete_topics/LAX1-GRIFFIN-r13-1423001701601).
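
(For context, the delete path on each node is roughly the sketch below.
This is a simplified illustration, not the exact code; zkConnect and
newTopicList are placeholders for values my application supplies.)

    import org.I0Itec.zkclient.ZkClient
    import org.apache.zookeeper.KeeperException.NodeExistsException
    import kafka.admin.AdminUtils
    import kafka.utils.{ZkUtils, ZKStringSerializer}

    val zkConnect = "zookeeper:2181"                    // placeholder address
    val newTopicList: Set[String] = Set("new-topic-1")  // placeholder: newly announced topics

    // session/connection timeouts in ms
    val zkClient = new ZkClient(zkConnect, 30000, 30000, ZKStringSerializer)

    // Topics kafka currently knows about (this may include topics pending deletion)
    val existingTopics = ZkUtils.getAllTopics(zkClient)

    // Request deletion of every existing topic that is not in the new list
    existingTopics.filterNot(newTopicList.contains).foreach { topic =>
      try {
        AdminUtils.deleteTopic(zkClient, topic)  // writes /admin/delete_topics/<topic>
      } catch {
        case _: NodeExistsException =>  // another node already requested deletion; ignore
      }
    }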
> > > > >> > > > > > > >
> > > > >> > > > > > > > # State of one deleted topic on kafka brokers:
> > > > >> > > > > > > > Topic:LAX1-GRIFFIN-r13-1423001701601  PartitionCount:8  ReplicationFactor:1  Configs:
> > > > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601  Partition: 0  Leader: -1  Replicas: 49558  Isr:
> > > > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601  Partition: 1  Leader: -1  Replicas: 49554  Isr:
> > > > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601  Partition: 2  Leader: -1  Replicas: 49557  Isr:
> > > > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601  Partition: 3  Leader: -1  Replicas: 49558  Isr:
> > > > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601  Partition: 4  Leader: -1  Replicas: 49554  Isr:
> > > > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601  Partition: 5  Leader: -1  Replicas: 49557  Isr:
> > > > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601  Partition: 6  Leader: -1  Replicas: 49558  Isr:
> > > > >> > > > > > > > Topic: LAX1-GRIFFIN-r13-1423001701601  Partition: 7  Leader: -1  Replicas: 49554  Isr:
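
(Captured with the topic describe tool, roughly:
bin/kafka-topics.sh --describe --zookeeper <zk>:2181 --topic LAX1-GRIFFIN-r13-1423001701601.
Note that every partition shows Leader: -1 and an empty Isr.)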
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > # Controller log says
> > > > >> > > > > > > >
> > > > >> > > > > > > > [2015-02-03 22:59:03,399] INFO [delete-topics-thread-49554], Deletion for replicas 49557,49554,49558 for partition
> > > > >> > > > > > > > [LAX1-GRIFFIN-r13-1423001701601,0],[LAX1-GRIFFIN-r13-1423001701601,6],[LAX1-GRIFFIN-r13-1423001701601,5],[LAX1-GRIFFIN-r13-1423001701601,3],[LAX1-GRIFFIN-r13-1423001701601,7],[LAX1-GRIFFIN-r13-1423001701601,1],[LAX1-GRIFFIN-r13-1423001701601,4],[LAX1-GRIFFIN-r13-1423001701601,2]
> > > > >> > > > > > > > of topic LAX1-GRIFFIN-r13-1423001701601 in progress (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
> > > > >> > > > > > > >
> > > > >> > > > > > > > current time: Tue Feb  3 23:20:58 UTC 2015
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > Since I don't know the delete-topic algorithm, I am
> > > > >> > > > > > > > not sure why these topics are not garbage collected.
> > > > >> > > > > > > > I do have the complete setup running in docker right
> > > > >> > > > > > > > now on my local box, so please let me know if any
> > > > >> > > > > > > > more info is required to troubleshoot this issue.
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > Furthermore, does ZkUtils.getAllTopics(zkClient)
> > > > >> > > > > > > > return "marked for deletion" topics as well? If so,
> > > > >> > > > > > > > is there an easy way to get a list of active topics
> > > > >> > > > > > > > (other than looking at all the topics in
> > > > >> > > > > > > > /admin/delete_topics/ and taking a set difference
> > > > >> > > > > > > > with the topics returned by
> > > > >> > > > > > > > ZkUtils.getAllTopics(zkClient))?
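
(Concretely, the set-difference approach I have in mind is something
like the sketch below, reusing the zkClient from the earlier sketch. I
am assuming ZkUtils.DeleteTopicsPath points at /admin/delete_topics and
that getAllTopics also returns topics still pending deletion.)

    // All topics known to zookeeper, including any still marked for deletion
    val allTopics = ZkUtils.getAllTopics(zkClient).toSet

    // Topics with a pending delete request under /admin/delete_topics
    val pendingDelete =
      ZkUtils.getChildrenParentMayNotExist(zkClient, ZkUtils.DeleteTopicsPath).toSet

    // Topics that should actually be live
    val activeTopics = allTopics -- pendingDelete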
> > > > >> > > > > > > >
> > > > >> > > > > > > > Sumit
> > > > >> > > > > > > > (More setup info below)
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > Setup
> > > > >> > > > > > > > --------
> > > > >> > > > > > > > Zookeeper: 3.4.6
> > > > >> > > > > > > > Kafka broker: 0.8.2-beta
> > > > >> > > > > > > > Kafka clients: 0.8.2-beta
> > > > >> > > > > > > >
> > > > >> > > > > > > > # Kafka broker settings (all other settings are
> > > > >> > > > > > > > default 0.8.2-beta settings)
> > > > >> > > > > > > > kafka.controlled.shutdown.enable: 'FALSE'
> > > > >> > > > > > > > kafka.auto.create.topics.enable: 'TRUE'
> > > > >> > > > > > > > kafka.num.partitions: 8
> > > > >> > > > > > > > kafka.default.replication.factor: 1
> > > > >> > > > > > > > kafka.rebalance.backoff.ms: 3000
> > > > >> > > > > > > > kafka.rebalance.max.retries: 10
> > > > >> > > > > > > > kafka.log.retention.minutes: 1200
> > > > >> > > > > > > > kafka.delete.topic.enable: 'TRUE'
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >>
> > > > >
> > > > >
> > >
> > >
>
