Oh interesting. I didn’t know about that log file until now.

The only error that appears on all of the brokers showing this behavior 
is:

ERROR [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)

Then we see many messages like this:

INFO Compaction for partition [__consumer_offsets,30] is resumed 
(kafka.log.LogCleaner)
INFO The cleaning for partition [__consumer_offsets,30] is aborted 
(kafka.log.LogCleaner)

Using VisualVM, I do not see any log-cleaner threads in those brokers.  I do 
see one in the brokers that are not showing this behavior, though.
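For what it's worth, the same check can be done from a shell without attaching VisualVM, by grepping a thread dump. This is just a sketch; the pgrep pattern assumes the broker's command line contains kafka.Kafka, so adjust it for your setup:

```shell
# Find the broker JVM's pid. The "kafka.Kafka" pattern is an assumption --
# change it if your service wrapper names the process differently.
PID=$(pgrep -f 'kafka\.Kafka' | head -1)

# A live cleaner shows up as "kafka-log-cleaner-thread-0" in the dump.
if [ -n "$PID" ]; then
  jstack "$PID" | grep 'kafka-log-cleaner-thread' \
    || echo "no log-cleaner thread running"
fi
```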

Any idea why the LogCleaner failed?

As a temporary fix, should we restart the affected brokers?
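If we do restart them, it might be worth watching the open FD count fall afterward. A minimal way to count the broker's open FDs on Linux via /proc (again assuming the process command line contains kafka.Kafka):

```shell
# Count open file descriptors for the broker JVM (Linux /proc only).
# The pgrep pattern "kafka.Kafka" is an assumption; adjust to your setup.
PID=$(pgrep -f 'kafka\.Kafka' | head -1)
if [ -n "$PID" ]; then
  ls /proc/"$PID"/fd | wc -l
fi
```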

Thanks again!


Lawrence Weikum 

On 7/13/16, 10:34 AM, "Manikumar Reddy" <manikumar.re...@gmail.com> wrote:

Hi,

Are you seeing any errors in log-cleaner.log?  The log-cleaner thread can
crash on certain errors.
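For example, a quick way to pull the crash reason out of that file is to grep for the ERROR line together with the stack trace that usually follows it. The path below is an assumption; log-cleaner.log lives wherever your broker's log4j configuration points it:

```shell
# Print the fatal ERROR line plus the stack trace that usually follows it.
# The default path is an assumption; check your broker's log4j setup.
LOG="${KAFKA_LOG_CLEANER_LOG:-/var/log/kafka/log-cleaner.log}"
if [ -f "$LOG" ]; then
  grep -A 10 'ERROR \[kafka-log-cleaner-thread' "$LOG"
fi
```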

Thanks
Manikumar

On Wed, Jul 13, 2016 at 9:54 PM, Lawrence Weikum <lwei...@pandora.com>
wrote:

> Hello,
>
> We’re seeing a strange behavior in Kafka 0.9.0.1 which occurs about every
> other week.  I’m curious if others have seen it and know of a solution.
>
> Setup and Scenario:
>
> - Brokers were initially set up with log compaction turned off.
>
> - After 30 days, log compaction was turned on.
>
> - At that time, the number of open FDs was ~30K per broker.
>
> - After 2 days, the __consumer_offsets topic was fully compacted.
> Open FDs dropped to ~5K per broker.
>
> - The cluster has been under normal load for roughly 7 days.
>
> - At the 7-day mark, the __consumer_offsets topic seems to have stopped
> compacting on two of the brokers, and on those brokers the FD count is
> back up to ~25K.
>
>
> We have tried rebalancing the partitions before.  The first time, the
> destination broker compacted the data fine and its open FD count stayed
> low.  The second time, the destination broker kept the FDs open.
>
>
> In all the broker logs, we’re seeing this message:
> INFO [Group Metadata Manager on Broker 8]: Removed 0 expired offsets in 0
> milliseconds. (kafka.coordinator.GroupMetadataManager)
>
> There are only 4 consumers at the moment on the cluster; one topic with 92
> partitions.
>
> Is there a reason why log compaction may stop working or why the
> __consumer_offsets topic would start holding thousands of FDs?
>
> Thank you all for your help!
>
> Lawrence Weikum
>
>

