[ https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439827#comment-15439827 ]

ASF GitHub Bot commented on KAFKA-1981:
---------------------------------------

GitHub user ewasserman opened a pull request:

    https://github.com/apache/kafka/pull/1794

    KAFKA-1981 Make log compaction point configurable

    Now uses LogSegment.largestTimestamp to determine age of segment's messages.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ewasserman/kafka feat-1981

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1794.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1794

----
commit 50bcc6036217720a69229868fbd7ab3a18c47ff1
Author: Eric Wasserman <eric.wasser...@gmail.com>
Date:   2016-08-26T19:09:26Z

    merge fixes

commit 7e5da446cee19e2db2f7f7f93306d7d81de4c3aa
Author: Eric Wasserman <eric.wasser...@gmail.com>
Date:   2016-08-26T19:57:58Z

    back out orig files

commit 6e8c1ea8832691f4bd8d0c08460dd24a82f676fc
Author: Eric Wasserman <eric.wasser...@gmail.com>
Date:   2016-08-26T20:48:00Z

    change logs to string interpolation

----

> Make log compaction point configurable
> --------------------------------------
>
>                 Key: KAFKA-1981
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1981
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.8.2.0
>            Reporter: Jay Kreps
>              Labels: newbie++
>         Attachments: KIP for Kafka Compaction Patch.md
>
>
> Currently if you enable log compaction the compactor will kick in whenever
> you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted.
> Other than this we don't give you fine-grained control over when compaction
> occurs. In addition we never compact the active segment (since it is still
> being written to).
> Other than this we don't really give you much control over when compaction
> will happen. The result is that you can't really guarantee that a consumer
> will get every update to a compacted topic--if the consumer falls behind a
> bit it might just get the compacted version.
> This is usually fine, but it would be nice to make this more configurable so
> you could set either a # messages, size, or time bound for compaction.
> This would let you say, for example, "any consumer that is no more than 1
> hour behind will get every message."
> This should be relatively easy to implement since it just impacts the
> end-point the compactor considers available for compaction. I think we
> already have that concept, so this would just be some other overrides to add
> in when calculating that.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
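
For readers of this thread, the time-bound idea in the issue (moving the end-point the compactor treats as available, based on each segment's largest timestamp) can be illustrated with a rough sketch. This is not the code from the pull request: the class `CompactionPointSketch`, the `Segment` stand-in, the method `firstUncleanableOffset`, and the parameter `minLagMs` are all hypothetical names chosen for illustration, and the sketch assumes segments are ordered by base offset with the active segment last.

```java
import java.util.List;

/** Hypothetical illustration only; not Kafka's actual log cleaner code. */
final class CompactionPointSketch {

    /** Minimal stand-in for a log segment: its first offset and its newest message timestamp. */
    static final class Segment {
        final long baseOffset;
        final long largestTimestampMs;

        Segment(long baseOffset, long largestTimestampMs) {
            this.baseOffset = baseOffset;
            this.largestTimestampMs = largestTimestampMs;
        }
    }

    /**
     * Returns the first offset that should not be compacted yet.
     * Assumes segments are sorted by base offset and the last one is the active segment.
     */
    static long firstUncleanableOffset(List<Segment> segments, long nowMs, long minLagMs) {
        if (segments.isEmpty()) {
            throw new IllegalArgumentException("at least the active segment is required");
        }
        Segment active = segments.get(segments.size() - 1);
        for (Segment s : segments.subList(0, segments.size() - 1)) {
            // A segment is held back if its newest message is younger than the lag bound.
            // Everything from this segment onward stays uncompacted.
            if (nowMs - s.largestTimestampMs < minLagMs) {
                return s.baseOffset;
            }
        }
        // No segment is too young: the bound falls back to the active segment,
        // which is never compacted because it is still being written to.
        return active.baseOffset;
    }
}
```

With a one-hour lag, a consumer that is less than an hour behind would still be reading offsets at or beyond the returned bound, so under these assumptions it would see every message rather than the compacted view.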