[ 
https://issues.apache.org/jira/browse/KAFKA-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Warshaw updated KAFKA-3178:
--------------------------------
    Description: 
h3. Description
One of Kafka's officially-described use cases is a distributed commit log 
(http://kafka.apache.org/documentation.html#uses_commitlog).  In this case, a 
distributed service that needs a commit log would use a topic with a single 
partition to guarantee log order, and would use that log to re-sync failed 
nodes.  Kafka is generally an excellent fit for such a system, but it does not 
expose an adequate mechanism for log cleanup in this case.  The built-in log 
cleanup mechanisms are based on time / size thresholds, which do not work well 
with a commit log; data can only be deleted from a commit log when the client 
application determines that it is no longer needed.  Here we propose a new API, 
exposed to clients through AdminUtils, that will delete all messages before a 
certain offset from a specific partition.
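
To make the intended semantics concrete, here is a minimal in-memory sketch in Java.  It is not Kafka code and not the proposed AdminUtils signature; the class PartitionLogSketch and the method deleteBefore are hypothetical names chosen only for illustration.  The sketch assumes that offsets of surviving messages do not change after a delete, and that only the offset of the oldest retained message moves forward.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical in-memory model of a single partition; not Kafka code. */
public class PartitionLogSketch {
    private final Deque<String> messages = new ArrayDeque<>();
    private long logStartOffset = 0L; // offset of the oldest retained message
    private long nextOffset = 0L;     // offset the next append will receive

    /** Appends a message and returns the offset assigned to it. */
    public long append(String message) {
        messages.addLast(message);
        return nextOffset++;
    }

    /**
     * Deletes every message whose offset is strictly less than the given
     * offset and returns how many were removed. Offsets of the remaining
     * messages are unchanged. An offset at or below the current log start
     * is a no-op.
     */
    public long deleteBefore(long offset) {
        if (offset > nextOffset) {
            throw new IllegalArgumentException(
                "offset " + offset + " is beyond the end of the log (" + nextOffset + ")");
        }
        long deleted = 0;
        while (logStartOffset < offset) {
            messages.removeFirst();
            logStartOffset++;
            deleted++;
        }
        return deleted;
    }

    public long logStartOffset() { return logStartOffset; }
    public long nextOffset() { return nextOffset; }
    public int size() { return messages.size(); }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        for (int i = 0; i < 5; i++) {
            log.append("m" + i); // offsets 0..4
        }
        // The client application decides offsets 0..2 are no longer needed.
        long deleted = log.deleteBefore(3);
        System.out.println(deleted + " " + log.logStartOffset() + " " + log.size());
        // prints "3 3 2"
    }
}
```

The point of the sketch is that the client, not a time or size threshold, picks the cut-off, and the call returns only once the delete has taken effect, so the caller knows when cleanup is complete.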

h3. Rejected Alternatives
- Manually setting / resetting time intervals for log retention configs to 
periodically flush messages older than a certain time from the logs.  Doing 
this involves several asynchronous processes, none of which provides any hook 
to know when it has actually completed.
- Rolling a new topic each time we want to clean up the log.  This is the best 
existing approach, but is not ideal.  All incoming writes would be paused while 
waiting for a new topic to be created.



  was:
One of Kafka's officially-described use cases is a distributed commit log 
(http://kafka.apache.org/documentation.html#uses_commitlog).  In this case, for 
a distributed service that needed a commit log, there would be a topic with a 
single partition to guarantee log order.  This service would use the commit log 
to re-sync failed nodes.  Kafka is generally an excellent fit for such a 
system, but it does not expose an adequate mechanism for log cleanup in such a 
case.  The built-in log cleanup mechanisms are based on time / size thresholds, 
which doesn't work well with a commit log; data can only be deleted from a 
commit log when the client application determines that it is no longer needed.  
Here we propose a new API exposed to clients through AdminUtils that will 
delete all messages before a certain offset from a specific partition.



> Expose a method in AdminUtils to manually truncate a specific partition to a 
> particular offset
> ----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-3178
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3178
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Bill Warshaw
>              Labels: kafka
>
> h3. Description
> One of Kafka's officially-described use cases is a distributed commit log 
> (http://kafka.apache.org/documentation.html#uses_commitlog).  In this case, 
> a distributed service that needs a commit log would use a topic with a 
> single partition to guarantee log order, and would use that log to re-sync 
> failed nodes.  Kafka is generally an excellent fit for such a system, but it 
> does not expose an adequate mechanism for log cleanup in this case.  The 
> built-in log cleanup mechanisms are based on time / size thresholds, which 
> do not work well with a commit log; data can only be deleted from a commit 
> log when the client application determines that it is no longer needed.  
> Here we propose a new API, exposed to clients through AdminUtils, that will 
> delete all messages before a certain offset from a specific partition.
> h3. Rejected Alternatives
> - Manually setting / resetting time intervals for log retention configs to 
> periodically flush messages older than a certain time from the logs.  Doing 
> this involves several asynchronous processes, none of which provides any 
> hook to know when it has actually completed.
> - Rolling a new topic each time we want to clean up the log.  This is the 
> best existing approach, but is not ideal.  All incoming writes would be 
> paused while waiting for a new topic to be created.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
