Thanks Jun.  So if I understand this correctly, there really is no master 
property to control the total aggregate size of all Kafka data files on a 
broker.

log.retention.size and log.file.size are great for managing data at the 
application level.  In our case, application needs change frequently, and 
performance itself is an ever-evolving feature.  This means various configs 
are constantly changing: topics, # of partitions, and so on.

What rarely changes, though, is the provisioned hardware.  So a setting to 
control the total aggregate size of Kafka logs (or persisted data, for better 
clarity) would definitely simplify things at an operational level, regardless 
of what happens at the application level.


On May 2, 2014, at 7:49 AM, Jun Rao <jun...@gmail.com> wrote:

> log.retention.size controls the total size in a log dir (per partition).
> log.file.size controls the size of each log segment in the log dir.
> 
> Thanks,
> 
> Jun
> 
> 
> On Thu, May 1, 2014 at 9:31 PM, vinh <v...@loggly.com> wrote:
> 
>> In the 0.7 docs, the description for log.retention.size and log.file.size
>> sound very much the same.  In particular, that they apply to a single log
>> file (or log segment file).
>> 
>> http://kafka.apache.org/07/configuration.html
>> 
>> I'm beginning to think there is no setting to control the max aggregate
>> size of all logs.  If this is correct, what would be a good approach to
>> enforce this requirement?  In my particular scenario, I have a lot of data
>> being written to Kafka at a very high rate.  So a 1TB disk can easily be
>> filled up in 24hrs or so.  One option is to add more Kafka brokers to add
>> more disk space to the pool, but I'd like to avoid that and see if I can
>> simply configure Kafka to not write more than 1TB aggregate.  Else, Kafka
>> will OOM and kill itself, and possibly crash the node itself because
>> the disk is full.
>> 
>> 
>> On May 1, 2014, at 9:21 PM, vinh <v...@loggly.com> wrote:
>> 
>>> Using Kafka 0.7.2, I have the following in server.properties:
>>> 
>>> log.retention.hours=48
>>> log.retention.size=107374182400
>>> log.file.size=536870912
>>> 
>>> My interpretation of this is:
>>> a) a single log segment file over 48hrs old will be deleted
>>> b) the total combined size of *all* logs is 100GB
>>> c) a single log segment file is limited to 512MB in size before a new
>> segment file is spawned
>>> d) a "log file" can be composed of many "log segment files"
>>> 
>>> But, even after setting the above, I find that the total combined size
>> of all Kafka logs on disk is 200GB right now.  Isn't log.retention.size
>> supposed to limit it to 100GB?  Am I missing something?  The docs are not
>> really clear, especially when it comes to distinguishing between a "log
>> file" and a "log segment file".
>>> 
>>> I have disk monitoring.  But like anything else in software, even
>> monitoring can fail.  Via configuration, I'd like to make sure that Kafka
>> does not write more than the available disk space.  Or something like
>> log4j, where I can set a max number of log files and the max size per file,
>> which essentially allows me to set a max aggregate size limit across all
>> logs.
>>> 
>>> Thanks,
>>> -Vinh
>> 
>> 
