Could you file a JIRA to track this? Thanks,

Jun

On Tue, Jun 10, 2014 at 8:22 AM, András Serény <sereny.and...@gravityrd.com> wrote:

Hi Kafka devs,

are there currently any plans to implement the global threshold feature? Is there a JIRA about it?

We are considering implementing a solution for this issue (either inside or outside of Kafka).

Thanks a lot,
András

On 5/30/2014 11:45 AM, András Serény wrote:

Sorry for the delay on this.

Yes, that's right -- it'd be just another term in the chain of 'or' conditions. Currently it's <time limit> OR <size limit>. With the global condition, it would be

<time limit> OR <size limit> OR <global size limit>

In my view, that's fairly simple and intuitive, hence a fine piece of logic.

Regards,
András

On 5/27/2014 4:34 PM, Jun Rao wrote:

For log.retention.bytes.per.topic and log.retention.hours.per.topic, the current interpretation is that those are tight bounds. In other words, only when those thresholds are violated is a segment deleted. To further satisfy log.retention.bytes.global, the per-topic thresholds may no longer be tight, i.e., we may need to delete a segment even when the per-topic threshold is not violated.

Thanks,

Jun

On Tue, May 27, 2014 at 12:22 AM, András Serény <sereny.and...@gravityrd.com> wrote:

No, I think more specific settings should get a chance first. I'm suggesting that, provided there is a segment rolled for a topic, *any* of log.retention.bytes.per.topic, log.retention.hours.per.topic, and a future log.retention.bytes.global violation would cause segments to be deleted.
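A minimal sketch of that chain of 'or' conditions as a single deletion predicate — parameter names here are illustrative, not actual Kafka config keys:

```python
def segment_eligible_for_deletion(age_hours, topic_size, total_size,
                                  retention_hours, retention_bytes,
                                  global_retention_bytes):
    # The proposed chain of 'or' conditions:
    #   <time limit> OR <size limit> OR <global size limit>
    # A rolled segment may be deleted as soon as ANY limit is violated.
    return (age_hours > retention_hours
            or topic_size > retention_bytes
            or total_size > global_retention_bytes)
```
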
As far as I understand, the current logic says

(1)
for each topic, if there is a segment already rolled {
    mark segments eligible for deletion due to log.retention.hours.for.this.topic
    if log.retention.bytes.for.this.topic is still violated, mark segments eligible for deletion due to log.retention.bytes.for.this.topic
}

After this cleanup cycle, there could be another one, taking into account the global threshold. For instance, something along the lines of

(2)
if after (1) log.retention.bytes.global is still violated, for each topic, if there is a segment already rolled {
    calculate the required size for this topic (e.g. the proportional size, or simply (full size - threshold)/#topics ?)
    mark segments exceeding the required size for deletion
}

Regards,
András

On 5/23/2014 4:46 PM, Jun Rao wrote:

Yes, that's possible. There is a default log.retention.bytes for every topic. By introducing a global threshold, we may have to delete data from logs whose size is smaller than log.retention.bytes. So, are you saying that the global threshold has precedence?

Thanks,

Jun

On Fri, May 23, 2014 at 2:26 AM, András Serény <sereny.and...@gravityrd.com> wrote:

Hi Kafka users,

this feature would also be very useful for us. With lots of topics of different volume (and as they grow in number) it could become tedious to maintain topic-level settings.

As a start, I think uniform reduction is a good idea. Logs wouldn't be retained as long as you want, but that's already the case when a log.retention.bytes setting is specified.
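The global pass (2) proposed above, with the simple equal-share split of (full size - threshold)/#topics, could be sketched like this — the data shapes and names are hypothetical, not Kafka's actual internals:

```python
def global_cleanup(topics, global_limit):
    """topics: dict mapping topic name -> list of segment sizes, oldest first.
    Returns {topic: [segment indexes to delete]} so that aggregate size
    moves toward global_limit, trimming each topic to an equal budget."""
    total = sum(sum(segs) for segs in topics.values())
    if total <= global_limit:
        return {}  # pass (1) already satisfied the global threshold
    # Simple split from the thread: each topic gets an equal share.
    budget = global_limit / len(topics)
    to_delete = {}
    for name, segs in topics.items():
        size = sum(segs)
        marked = []
        # Only already-rolled segments are eligible; the active
        # (last) segment is never deleted.
        i = 0
        while size > budget and i < len(segs) - 1:
            marked.append(i)
            size -= segs[i]
            i += 1
        if marked:
            to_delete[name] = marked
    return to_delete
```

The proportional variant mentioned in the thread would only change how `budget` is computed per topic; the marking loop stays the same.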
As for early rolling, I don't think it's necessary: currently, if there is no log segment eligible for deletion, the log.retention.bytes and log.retention.hours settings won't kick in, so it's possible to exceed these limits, which is completely fine (please correct me if I'm mistaken here).

All in all, introducing a global threshold doesn't seem to induce a considerable change in the current retention logic.

Regards,
András

On 5/8/2014 2:00 AM, vinh wrote:

Agreed…a global knob is a bit tricky for exactly the reason you've identified. Perhaps the problem could be simplified though by considering the context and purpose of Kafka. I would use a persistent message queue because I want to guarantee that data/messages don't get lost. But, since Kafka is not meant to be a long-term storage solution (other products can be used for that), I would clarify that guarantee to apply only to the most recent messages up until a certain configured threshold (i.e. max 24 hrs, max 500GB, etc). Once those thresholds are reached, old messages are deleted first.

To ensure no message loss (up to a limit), I must ensure Kafka is highly available. There's a small chance that the message deletion rate is the same as the receive rate. For example, when the incoming volume is so high that the size threshold is reached before the time threshold. But I may be ok with that, because if Kafka goes down, it can cause upstream applications to fail. This can result in higher losses overall, and particularly of the most *recent* messages.
In other words, in a persistent but ephemeral message queue, I would give higher precedence to recent messages over older ones. On the flip side, by allowing Kafka to go down when a disk is full, applications are forced to deal with the issue. This adds complexity to apps, but perhaps it's not a bad thing. After all, for scalability, all apps should be designed to handle failure.

Having said that, next is to decide which messages to delete first. I believe that's a separate issue and has its own complexities, too.

The main idea though is that a global knob would provide flexibility, even if not used. From an operational perspective, if we can't ensure HA for all applications/components, it would be good if we can for at least some of the core ones, like Kafka. This is much easier said than done, though.

On May 5, 2014, at 9:16 AM, Jun Rao <jun...@gmail.com> wrote:

Yes, your understanding is correct. A global knob that controls aggregate log size may make sense. What would be the expected behavior when that limit is reached? Would you reduce the retention uniformly across all topics? Then it just means that some of the logs may not be retained as long as you want. Also, we need to think through what happens when every log has only 1 segment left and yet the total size still exceeds the limit. Do we roll log segments early?

Thanks,

Jun

On Sun, May 4, 2014 at 4:31 AM, vinh <v...@loggly.com> wrote:

Thanks Jun.
So if I understand this correctly, there really is no master property to control the total aggregate size of all Kafka data files on a broker.

log.retention.size and log.file.size are great for managing data at the application level. In our case, application needs change frequently, and performance itself is an ever-evolving feature. This means various configs are constantly changing, like topics, # of partitions, etc.

What rarely changes though is provisioned hardware resources. So a setting to control the total aggregate size of Kafka logs (or persisted data, for better clarity) would definitely simplify things at an operational level, regardless of what happens at the application level.

On May 2, 2014, at 7:49 AM, Jun Rao <jun...@gmail.com> wrote:

log.retention.size controls the total size in a log dir (per partition). log.file.size controls the size of each log segment in the log dir.

Thanks,

Jun

On Thu, May 1, 2014 at 9:31 PM, vinh <v...@loggly.com> wrote:

In the 0.7 docs, the descriptions for log.retention.size and log.file.size sound very much the same. In particular, that they apply to a single log file (or log segment file).

http://kafka.apache.org/07/configuration.html

I'm beginning to think there is no setting to control the max aggregate size of all logs. If this is correct, what would be a good approach to enforce this requirement?
In my particular scenario, I have a lot of data being written to Kafka at a very high rate. So a 1TB disk can easily be filled up in 24hrs or so. One option is to add more Kafka brokers to add more disk space to the pool, but I'd like to avoid that and see if I can simply configure Kafka to not write more than 1TB aggregate. Else, Kafka will OOM and kill itself, and possibly crash the node itself because the disk is full.

On May 1, 2014, at 9:21 PM, vinh <v...@loggly.com> wrote:

Using Kafka 0.7.2, I have the following in server.properties:

log.retention.hours=48
log.retention.size=107374182400
log.file.size=536870912

My interpretation of this is:
a) a single log segment file over 48hrs old will be deleted
b) the total combined size of *all* logs is 100GB
c) a single log segment file is limited to 500MB in size before a new segment file is spawned
d) a "log file" can be composed of many "log segment files"

But, even after setting the above, I find that the total combined size of all Kafka logs on disk is 200GB right now. Isn't log.retention.size supposed to limit it to 100GB? Am I missing something?
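Jun's reply earlier in the thread (log.retention.size is enforced per partition's log dir, not broker-wide) suggests one possible explanation for the 200GB. A back-of-the-envelope check, where the count of two partition logs is purely illustrative:

```python
# log.retention.size applies per partition log, not across the broker,
# so aggregate disk usage can reach (number of partition logs) x limit.
log_retention_size = 107374182400      # 100 GiB, from server.properties above
partition_logs = 2                     # hypothetical count on this broker
max_total = partition_logs * log_retention_size
print(max_total / 2**30)               # 200.0 (GiB) -- would match the observed usage
```
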
The docs are not really clear, especially when it comes to distinguishing between a "log file" and a "log segment file".

I have disk monitoring. But like anything else in software, even monitoring can fail. Via configuration, I'd like to make sure that Kafka does not write more than the available disk space. Or something like log4j, where I can set a max number of log files and the max size per file, which essentially allows me to set a max aggregate size limit across all logs.

Thanks,
-Vinh