There are quite a few things you can do if you notice Kafka clusters
running out of disk space. A pretty reasonable one is turning down the
retention time on your larger topics (retention.ms is a per-topic config)
and upping the retention checker's polling rate
(log.retention.check.interval.ms, which is a broker setting). Kafka's
retention pruning is fast because it's just file deletion, so you can
reclaim a lot of space very quickly. Another option is to disable writes;
there are lots of ways to do that, from firewalling producers off from the
brokers with iptables to using the 0.9 quotas to limit every producer to 1
byte per second of throughput. Most of these actions are fairly extreme and
require very good knowledge of the applications using Kafka, so I think
leaving them in the operator's hands is a better idea than building
something into Kafka itself.
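
For reference, here's roughly what the config side of that looks like with
the stock kafka-configs.sh tool (the topic name, client id, and ZooKeeper
address below are placeholders, so adjust them for your cluster):

  # Drop retention on a large topic to one hour (retention.ms is per-topic)
  bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
    --entity-type topics --entity-name my-big-topic \
    --add-config retention.ms=3600000

  # Throttle a producer's client id to ~1 byte/sec via the 0.9 quotas
  bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
    --entity-type clients --entity-name my-producer-client \
    --add-config 'producer_byte_rate=1'

The retention check interval (log.retention.check.interval.ms) lives in
server.properties, so changing it means a broker restart.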

That of course assumes you're not running into
https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-1379,
which severely impacts disk usage if you're using time-based retention
on replicated topics and a failover occurs.

On Thu, Apr 21, 2016 at 3:19 PM, Charlie Hull <char...@flax.co.uk> wrote:

> On 21/04/2016 15:16, Tom Crayford wrote:
>
>> Generally, Kafka will crash when the disk fills up (it gets an exception
>> trying to do the write to the disk). That will indeed lead to the
>> behaviour you've talked about where all the brokers end up crashing, and
>> there's no contingency for it in the codebase right now (and generally
>> writing code for such scenarios is very tricky). The real trick is to
>> monitor and ensure you don't run out of disk space at all.
>>
>
> We've been working on a project using Kafka and the ELK stack and ran into
> exactly this problem, although not with Kafka but with Elasticsearch. Hope
> the following is of interest! I'm not sure how you might apply the same to
> Kafka myself, but perhaps there's a way...
>
> http://www.flax.co.uk/blog/2016/04/21/running-disk-space-elasticsearch-solr/
>
> Cheers
>
> Charlie
>
>
>> On Thu, Apr 21, 2016 at 3:11 PM, Jens Rantil <jens.ran...@tink.se> wrote:
>>
>>> Hi Lawrence,
>>>
>>>> As I'm thinking through this a little more, if that is the case and the
>>>> node is removed, some partitions in the system may be marked as
>>>> under-replicated and cause a cascading effect where partitions are
>>>> re-replicated and cause other nodes to fill up.  Has that ever happened?
>>>> Does Kafka have a contingency plan for such a scenario?
>>>
>>> Currently, Kafka doesn't rebalance partitions automagically if there is
>>> an issue with a broker. That excludes the failure scenario you describe.
>>>
>>> Regarding how Kafka handles a full disk, I can't answer that.
>>>
>>> Cheers,
>>> Jens
>>>
>>> On Wed, Apr 20, 2016 at 7:09 PM Lawrence Weikum <lwei...@pandora.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm curious about the expected or default behavior that might occur if a
>>>> broker in the system has filled up.  By that I mean when a broker has
>>>> used all of its memory and disk space.  Is the node simply removed from
>>>> the system until space is cleared?
>>>>
>>>> As I'm thinking through this a little more, if that is the case and the
>>>> node is removed, some partitions in the system may be marked as
>>>> under-replicated and cause a cascading effect where partitions are
>>>> re-replicated and cause other nodes to fill up.  Has that ever happened?
>>>> Does Kafka have a contingency plan for such a scenario?
>>>>
>>>> Thank you so much for your insight and all of your hard work!
>>>>
>>>> Lawrence
>>>>
>>> --
>>> Jens Rantil
>>> Backend Developer @ Tink
>>>
>>> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
>>> For urgent matters you can reach me at +46-708-84 18 32.
>>>
>>>
>>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>
