[ 
https://issues.apache.org/jira/browse/KAFKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109089#comment-14109089
 ] 

Jim Hoagland commented on KAFKA-1489:
-------------------------------------

Good discussion here...

In a steady state, an emergency discard that reduces the retention period by 
10% will free up about 10% of the disk space those topics use.  That assumes 
the topics have been in use long enough for retention to have started deleting 
data, and it assumes a steady rate of incoming messages.  If the topic is new 
relative to its retention period, a 10% cut in the retention period may free 
up 0% of disk, because none of its data is older than the shortened period 
yet.  If there has been a recent increase in messages (either a temporary 
spike or a new normal), then we would get less than 10% of disk freed, since 
the oldest 10% of the retention window holds less than 10% of the bytes.
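
To make the three cases concrete, here is a rough sketch in Python.  It is not 
Kafka code: the hourly byte buckets, the function name freed_fraction and the 
numbers are all made up for illustration, and it ignores the fact that Kafka 
deletes whole log segments rather than individual hours of data.

```python
def freed_fraction(bytes_by_age, retention_hours, cut_percent=10):
    """Fraction of a topic's retained bytes deleted by a retention cut.

    bytes_by_age[i] is the number of bytes written i hours ago (index 0 is
    the most recent hour). This bucketed model is an assumption made for
    illustration only.
    """
    # Data currently on disk: nothing older than the retention period.
    retained = bytes_by_age[:retention_hours]
    total = sum(retained)
    if total == 0:
        return 0.0
    # Integer arithmetic avoids float rounding when shrinking the period.
    new_retention = retention_hours * (100 - cut_percent) // 100
    # Everything older than the shortened period would be discarded.
    freed = sum(retained[new_retention:])
    return freed / total


RETENTION_HOURS = 100

# Steady state: constant rate, topic older than its retention period.
steady = freed_fraction([10] * 100, RETENTION_HOURS)

# New topic: only 50 hours of data, nothing is old enough to be cut.
new_topic = freed_fraction([10] * 50, RETENTION_HOURS)

# Recent spike: the last 10 hours carried 10x the usual volume.
spike = freed_fraction([100] * 10 + [10] * 90, RETENTION_HOURS)

print(steady, new_topic, spike)
```

With these made-up inputs the steady-state topic gives back a tenth of its 
bytes, the new topic gives back nothing, and the topic with the recent spike 
gives back noticeably less than a tenth.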

That said, handling those cases adds complication, so we may want to pass on 
them in the first solution.  Things certainly get significantly more 
complicated and harder to test if the retention cut needs to be iterative and 
based on how much previous attempts helped.  However, if it is feasible to 
predict how much impact a reduced retention period will have, we can take that 
into account when we do the emergency discard and adjust the retention cut 
percentage accordingly.  In fact, we may just want to base the discard 
percentage on how much disk will actually get freed (applied on a per-topic 
basis) rather than on a percent reduction in the retention period.
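
One possible shape for that per-topic calculation, again as a hypothetical 
Python sketch rather than a proposal for the actual implementation: given the 
same made-up hourly byte buckets, walk back from the oldest data until the 
target share of bytes has been covered, and use that point as the new 
retention period.  The name retention_to_free and the bucket model are 
assumptions.

```python
def retention_to_free(bytes_by_age, retention_hours, target_percent=10):
    """Longest retention (in hours) that frees at least target_percent of
    the topic's retained bytes.

    bytes_by_age[i] is the number of bytes written i hours ago (index 0 is
    the most recent hour); this bucketed model is illustrative only.
    """
    retained = bytes_by_age[:retention_hours]
    total = sum(retained)
    freed = 0
    new_retention = len(retained)
    # Drop the oldest bucket one at a time until enough bytes are freed.
    # The comparison is kept in integers to avoid float rounding.
    while new_retention > 0 and freed * 100 < total * target_percent:
        new_retention -= 1
        freed += retained[new_retention]
    return new_retention


RETENTION_HOURS = 100

steady_hours = retention_to_free([10] * 100, RETENTION_HOURS)
new_topic_hours = retention_to_free([10] * 50, RETENTION_HOURS)
spike_hours = retention_to_free([100] * 10 + [10] * 90, RETENTION_HOURS)

print(steady_hours, new_topic_hours, spike_hours)
```

Under these assumptions the steady topic ends up at the familiar 10% cut, 
while the new topic and the topic with a recent spike both need a much deeper 
cut in the retention period to free the same share of their bytes.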

> Global threshold on data retention size
> ---------------------------------------
>
>                 Key: KAFKA-1489
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1489
>             Project: Kafka
>          Issue Type: New Feature
>          Components: log
>    Affects Versions: 0.8.1.1
>            Reporter: Andras Sereny
>            Assignee: Jay Kreps
>              Labels: newbie
>
> Currently, Kafka has per topic settings to control the size of one single log 
> (log.retention.bytes). With lots of topics of different volume and as they 
> grow in number, it could become tedious to maintain topic level settings 
> applying to a single log. 
> Often, a chunk of disk space is dedicated to Kafka that hosts all logs 
> stored, so it'd make sense to have a configurable threshold to control how 
> much space *all* data in one Kafka log data directory can take up.
> See also:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201406.mbox/browser
> http://mail-archives.apache.org/mod_mbox/kafka-users/201311.mbox/%3c20131107015125.gc9...@jkoshy-ld.linkedin.biz%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)
