Could you file a JIRA to track this? Thanks,

Jun

On Tue, Jun 10, 2014 at 8:22 AM, András Serény <sereny.and...@gravityrd.com> wrote:

Hi Kafka devs,

are there currently any plans to implement the global threshold feature? Is there a JIRA about it?

We are considering implementing a solution for this issue (either inside or outside of Kafka).

Thanks a lot,
András

On 5/30/2014 11:45 AM, András Serény wrote:

Sorry for the delay on this.

Yes, that's right -- it'd be just another term in the chain of 'or' conditions. Currently it's <time limit> OR <size limit>. With the global condition, it would be

<time limit> OR <size limit> OR <global size limit>

In my view, that's fairly simple and intuitive, hence a fine piece of logic.

Regards,
András

On 5/27/2014 4:34 PM, Jun Rao wrote:

For log.retention.bytes.per.topic and log.retention.hours.per.topic, the current interpretation is that those are tight bounds. In other words, only when those thresholds are violated is a segment deleted. To further satisfy log.retention.bytes.global, the per-topic thresholds may no longer be tight, i.e., we may need to delete a segment even when the per-topic threshold is not violated.

Thanks,

Jun

On Tue, May 27, 2014 at 12:22 AM, András Serény <sereny.and...@gravityrd.com> wrote:

No, I think more specific settings should get a chance first. I'm suggesting that, provided there is a segment rolled for a topic, *any* of log.retention.bytes.per.topic, log.retention.hours.per.topic, and a future log.retention.bytes.global violation would cause segments to be deleted.
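A minimal sketch of that chain of 'or' conditions as a single deletion predicate — parameter names here are illustrative, not actual Kafka config keys:

```python
def segment_eligible_for_deletion(age_hours, topic_size, total_size,
                                  retention_hours, retention_bytes,
                                  global_retention_bytes):
    # The proposed chain of 'or' conditions:
    #   <time limit> OR <size limit> OR <global size limit>
    # A rolled segment may be deleted as soon as ANY limit is violated.
    return (age_hours > retention_hours
            or topic_size > retention_bytes
            or total_size > global_retention_bytes)
```
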
As far as I understand, the current logic says

(1)
for each topic, if there is a segment already rolled {
    mark segments eligible for deletion due to log.retention.hours.for.this.topic
    if log.retention.bytes.for.this.topic is still violated, mark segments eligible for deletion due to log.retention.bytes.for.this.topic
}

After this cleanup cycle, there could be another one, taking into account the global threshold. For instance, something along the lines of

(2)
if after (1) log.retention.bytes.global is still violated, for each topic, if there is a segment already rolled {
    calculate the required size for this topic (e.g. the proportional size, or simply (full size - threshold)/#topics ?)
    mark segments exceeding the required size for deletion
}

Regards,
András

On 5/23/2014 4:46 PM, Jun Rao wrote:

Yes, that's possible. There is a default log.retention.bytes for every topic. By introducing a global threshold, we may have to delete data from logs whose size is smaller than log.retention.bytes. So, are you saying that the global threshold has precedence?

Thanks,

Jun

On Fri, May 23, 2014 at 2:26 AM, András Serény <sereny.and...@gravityrd.com> wrote:

Hi Kafka users,

this feature would also be very useful for us. With lots of topics of different volume (and as they grow in number) it could become tedious to maintain topic-level settings.

As a start, I think uniform reduction is a good idea. Logs wouldn't be retained as long as you want, but that's already the case when a log.retention.bytes setting is specified.
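The global pass (2) proposed above, with the simple equal-share split of (full size - threshold)/#topics, could be sketched like this — the data shapes and names are hypothetical, not Kafka's actual internals:

```python
def global_cleanup(topics, global_limit):
    """topics: dict mapping topic name -> list of segment sizes, oldest first.
    Returns {topic: [segment indexes to delete]} so that aggregate size
    moves toward global_limit, trimming each topic to an equal budget."""
    total = sum(sum(segs) for segs in topics.values())
    if total <= global_limit:
        return {}  # pass (1) already satisfied the global threshold
    # Simple split from the thread: each topic gets an equal share.
    budget = global_limit / len(topics)
    to_delete = {}
    for name, segs in topics.items():
        size = sum(segs)
        marked = []
        # Only already-rolled segments are eligible; the active
        # (last) segment is never deleted.
        i = 0
        while size > budget and i < len(segs) - 1:
            marked.append(i)
            size -= segs[i]
            i += 1
        if marked:
            to_delete[name] = marked
    return to_delete
```

The proportional variant mentioned in the thread would only change how `budget` is computed per topic; the marking loop stays the same.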
As for early rolling, I don't think it's necessary: currently, if there is no log segment eligible for deletion, the log.retention.bytes and log.retention.hours settings won't kick in, so it's possible to exceed these limits, which is completely fine (please correct me if I'm mistaken here).

All in all, introducing a global threshold doesn't seem to induce a considerable change in the current retention logic.

Regards,
András

On 5/8/2014 2:00 AM, vinh wrote:

Agreed…a global knob is a bit tricky for exactly the reason you've identified. Perhaps the problem could be simplified though by considering the context and purpose of Kafka. I would use a persistent message queue because I want to guarantee that data/messages don't get lost. But, since Kafka is not meant to be a long-term storage solution (other products can be used for that), I would clarify that guarantee to apply only to the most recent messages up until a certain configured threshold (i.e. max 24 hrs, max 500GB, etc). Once those thresholds are reached, old messages are deleted first.

To ensure no message loss (up to a limit), I must ensure Kafka is highly available. There's a small chance that the message deletion rate is the same as the receive rate. For example, when the incoming volume is so high that the size threshold is reached before the time threshold. But I may be ok with that, because if Kafka goes down, it can cause upstream applications to fail. This can result in higher losses overall, and particularly of the most *recent* messages.
In other words, in a persistent but ephemeral message queue, I would give higher precedence to recent messages over older ones. On the flip side, by allowing Kafka to go down when a disk is full, applications are forced to deal with the issue. This adds complexity to apps, but perhaps it's not a bad thing. After all, for scalability, all apps should be designed to handle failure.

Having said that, next is to decide which messages to delete first. I believe that's a separate issue and has its own complexities, too.

The main idea though is that a global knob would provide flexibility, even if not used. From an operational perspective, if we can't ensure HA for all applications/components, it would be good if we can for at least some of the core ones, like Kafka. This is much easier said than done, though.

On May 5, 2014, at 9:16 AM, Jun Rao <jun...@gmail.com> wrote:

Yes, your understanding is correct. A global knob that controls aggregate log size may make sense. What would be the expected behavior when that limit is reached? Would you reduce the retention uniformly across all topics? Then it just means that some of the logs may not be retained as long as you want. Also, we need to think through what happens when every log has only 1 segment left and yet the total size still exceeds the limit. Do we roll log segments early?

Thanks,

Jun

On Sun, May 4, 2014 at 4:31 AM, vinh <v...@loggly.com> wrote:

Thanks Jun.
So if I understand this correctly, there really is no master property to control the total aggregate size of all Kafka data files on a broker.

log.retention.size and log.file.size are great for managing data at the application level. In our case, application needs change frequently, and performance itself is an ever-evolving feature. This means various configs are constantly changing, like topics, # of partitions, etc.

What rarely changes though is provisioned hardware resources. So a setting to control the total aggregate size of Kafka logs (or persisted data, for better clarity) would definitely simplify things at an operational level, regardless of what happens at the application level.

On May 2, 2014, at 7:49 AM, Jun Rao <jun...@gmail.com> wrote:

log.retention.size controls the total size in a log dir (per partition). log.file.size controls the size of each log segment in the log dir.

Thanks,

Jun

On Thu, May 1, 2014 at 9:31 PM, vinh <v...@loggly.com> wrote:

In the 0.7 docs, the descriptions for log.retention.size and log.file.size sound very much the same. In particular, that they apply to a single log file (or log segment file).

http://kafka.apache.org/07/configuration.html

I'm beginning to think there is no setting to control the max aggregate size of all logs. If this is correct, what would be a good approach to enforce this requirement?
In my particular scenario, I have a lot of data being written to Kafka at a very high rate. So a 1TB disk can easily be filled up in 24hrs or so. One option is to add more Kafka brokers to add more disk space to the pool, but I'd like to avoid that and see if I can simply configure Kafka to not write more than 1TB aggregate. Else, Kafka will OOM and kill itself, and possibly crash the node itself because the disk is full.

On May 1, 2014, at 9:21 PM, vinh <v...@loggly.com> wrote:

Using Kafka 0.7.2, I have the following in server.properties:

log.retention.hours=48
log.retention.size=107374182400
log.file.size=536870912

My interpretation of this is:
a) a single log segment file over 48hrs old will be deleted
b) the total combined size of *all* logs is 100GB
c) a single log segment file is limited to 500MB in size before a new segment file is spawned
d) a "log file" can be composed of many "log segment files"

But, even after setting the above, I find that the total combined size of all Kafka logs on disk is 200GB right now. Isn't log.retention.size supposed to limit it to 100GB? Am I missing something?
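Jun's reply earlier in the thread (log.retention.size is enforced per partition's log dir, not broker-wide) suggests one possible explanation for the 200GB. A back-of-the-envelope check, where the count of two partition logs is purely illustrative:

```python
# log.retention.size applies per partition log, not across the broker,
# so aggregate disk usage can reach (number of partition logs) x limit.
log_retention_size = 107374182400      # 100 GiB, from server.properties above
partition_logs = 2                     # hypothetical count on this broker
max_total = partition_logs * log_retention_size
print(max_total / 2**30)               # 200.0 (GiB) -- would match the observed usage
```
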
The docs are not really clear, especially when it comes to distinguishing between a "log file" and a "log segment file".

I have disk monitoring. But like anything else in software, even monitoring can fail. Via configuration, I'd like to make sure that Kafka does not write more than the available disk space. Or something like log4j, where I can set a max number of log files and the max size per file, which essentially allows me to set a max aggregate size limit across all logs.

Thanks,
-Vinh