One last comment: I upgraded to Java 1.7 and restarted Kafka, and it's now
stable. I haven't poked at it yet; I'm just letting it sit for now. Could
running Java 1.6 with 0.8.2.1 have been related to the problem somehow, even
though nothing in the logs pointed to it?
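
For anyone lining the two logs up: the GC log quoted below shows ParNew pauses
of roughly 52 to 65 seconds of real time, far longer than the broker's
ZooKeeper session timeout, which would readily explain the repeated
expirations. A minimal sketch (not something from this thread) for flagging
pauses long enough to expire the session; the log path and the 6000 ms default
for zookeeper.session.timeout.ms are assumptions:

// Sketch: flag GC pauses longer than the ZooKeeper session timeout.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcPauseCheck {
    public static void main(String[] args) throws IOException {
        long sessionTimeoutMs = 6000; // assumed broker default for zookeeper.session.timeout.ms
        Pattern realTime = Pattern.compile("real=(\\d+\\.\\d+) secs");
        // "kafkaServer-gc.log" is a placeholder path; point it at your broker's GC log.
        for (String line : Files.readAllLines(Paths.get("kafkaServer-gc.log"), StandardCharsets.UTF_8)) {
            Matcher m = realTime.matcher(line);
            if (m.find()) {
                double pauseMs = Double.parseDouble(m.group(1)) * 1000;
                if (pauseMs > sessionTimeoutMs) {
                    // A stop-the-world pause this long means the broker cannot heartbeat
                    // to ZooKeeper, so the session expires and the controller re-elects.
                    System.out.printf("%.0f ms pause: %s%n", pauseMs, line);
                }
            }
        }
    }
}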

On Tue, Jan 12, 2016 at 11:19 PM, Dillian Murphey <crackshotm...@gmail.com>
wrote:

>
> [2016-01-12 22:16:59,629] TRACE [Controller 925537]: leader imbalance ratio
> for broker 925537 is 0.000000 (kafka.controller.KafkaController)
>
> [2016-01-12 22:21:07,167] INFO [SessionExpirationListener on 925537], ZK
> expired; shut down all controller components and try to re-elect
> (kafka.controller.KafkaController$SessionExpirationListener)
>
> [2016-01-12 22:21:07,167] INFO [delete-topics-thread-925537], Shutting down
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
>
> [2016-01-12 22:21:07,169] INFO [delete-topics-thread-925537], Shutdown completed
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
>
> [2016-01-12 22:21:07,169] INFO [delete-topics-thread-925537], Stopped
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
>
> This occurs very frequently, even after wiping Kafka back to a clean slate.
> It never happens in our production environment. I've read here and there
> that it could be a GC issue. Here is the tail end of a recent GC log:
>
>
> 20534K(8354560K), 52.5293140 secs] [Times: user=209.09 sys=0.06,
> real=52.53 secs]
>
> 2016-01-11T23:16:05.149+0000: 784.219: [GC 784.219: [ParNew:
> 274263K->1685K(306688K), 54.8993730 secs] 793174K->520803K(8354560K),
> 54.8994450 secs] [Times: user=218.86 sys=0.03, real=54.90 secs]
>
> 2016-01-11T23:17:01.095+0000: 840.165: [GC 840.165: [ParNew:
> 274325K->1896K(306688K), 56.4208930 secs] 793443K->521139K(8354560K),
> 56.4209750 secs] [Times: user=224.88 sys=0.05, real=56.42 secs]
>
> 2016-01-11T23:17:59.024+0000: 898.093: [GC 898.093: [ParNew:
> 274536K->1705K(306688K), 58.1100630 secs] 793779K->521093K(8354560K),
> 58.1101400 secs] [Times: user=231.75 sys=0.05, real=58.12 secs]
>
> 2016-01-11T23:18:58.240+0000: 957.310: [GC 957.310: [ParNew:
> 274345K->1483K(306688K), 64.2820420 secs] 793733K->521047K(8354560K),
> 64.2821180 secs] [Times: user=241.93 sys=0.06, real=64.28 secs]
>
> 2016-01-11T23:20:03.571+0000: 1022.640: [GC 1022.640: [ParNew:
> 274123K->1379K(306688K), 61.5305280 secs] 793687K->521097K(8354560K),
> 61.5305990 secs] [Times: user=245.72 sys=0.01, real=61.53 secs]
>
> 2016-01-11T23:21:06.194+0000: 1085.263: [GC 1085.263: [ParNew:
> 274019K->1508K(306688K), 63.4433440 secs] 793737K->521372K(8354560K),
> 63.4434240 secs] [Times: user=253.33 sys=0.02, real=63.44 secs]
>
> 2016-01-11T23:22:10.413+0000: 1149.482: [GC 1149.483: [ParNew:
> 274148K->1313K(306688K), 65.6956010 secs] 794012K->521330K(8354560K),
> 65.6956660 secs] [Times: user=262.01 sys=0.05, real=65.69 secs]
>
> Heap
>
>  par new generation   total 306688K, used 132112K [0x00000005f5a00000,
> 0x000000060a6c0000, 0x000000060a6c0000)
>
>   eden space 272640K,  47% used [0x00000005f5a00000, 0x00000005fd9bbba0,
> 0x0000000606440000)
>
>   from space 34048K,   3% used [0x0000000606440000, 0x00000006065884a8,
> 0x0000000608580000)
>
>   to   space 34048K,   0% used [0x0000000608580000, 0x0000000608580000,
> 0x000000060a6c0000)
>
>  concurrent mark-sweep generation total 8047872K, used 520016K
> [0x000000060a6c0000, 0x00000007f5a00000, 0x00000007f5a00000)
>
>  concurrent-mark-sweep perm gen total 38760K, used 25768K
> [0x00000007f5a00000, 0x00000007f7fda000, 0x0000000800000000)
>
>
>
> On Tue, Jan 12, 2016 at 6:34 PM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
>> Can you paste the logs?
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Tue, Jan 12, 2016 at 4:58 PM, Dillian Murphey <crackshotm...@gmail.com>
>> wrote:
>>
>> > Possibly running more stably with the 1.7 JVM.
>> >
>> > Can someone explain the ZooKeeper session? Should it never expire unless
>> > the broker becomes unresponsive? I set a massive timeout value in the
>> > broker config, far beyond the interval at which I see the ZK expiration.
>> > Is this entirely on the Kafka side, or could ZooKeeper be doing something?
>> > In my ZK logs I didn't see anything unusual, just exceptions that (my
>> > guess) resulted from the ZK session expiring.
>> >
>> > tnx
>> >
>> > On Tue, Jan 12, 2016 at 3:05 PM, Dillian Murphey <crackshotm...@gmail.com>
>> > wrote:
>> >
>> > > Our 2-node Kafka cluster has become unhealthy. We're running ZooKeeper
>> > > as a 3-node ensemble under very light load.
>> > >
>> > > What seems to be happening: in the controller log we get a ZK session
>> > > expired message, and in the process of re-assigning the leader for the
>> > > partitions (if I'm understanding this right, please correct me), the
>> > > broker goes offline and interrupts our applications that are publishing
>> > > messages.
>> > >
>> > > We don't see this in production, where Kafka has been stable for months,
>> > > since September.
>> > >
>> > > I've searched a lot and found some similar complaints but no real
>> > > solutions.
>> > >
>> > > I'm running 0.8.2 and JVM 1.6.x on Ubuntu.
>> > >
>> > > Thanks for any ideas at all.
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>>
>
>
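
One note on the session timeout question quoted above: raising
zookeeper.session.timeout.ms on the broker is only half of it, because the
ZooKeeper server caps any negotiated session timeout at its maxSessionTimeout,
which defaults to 20 * tickTime (40 seconds with the usual tickTime of 2000 ms).
So a "massive" broker-side value can come back silently capped, and the 52 to
65 second GC pauses above would expire even a 40-second session. A sketch of
the relevant knobs, example values only, not a recommendation:

# broker server.properties
zookeeper.session.timeout.ms=30000
zookeeper.connection.timeout.ms=30000

# ZooKeeper zoo.cfg (only needed if you want sessions longer than 20 * tickTime)
tickTime=2000
maxSessionTimeout=60000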
