Jun Rao created KAFKA-4545:
------------------------------

             Summary: tombstone needs to be removed after delete.retention.ms 
has passed after it has been cleaned
                 Key: KAFKA-4545
                 URL: https://issues.apache.org/jira/browse/KAFKA-4545
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.8.2.0
            Reporter: Jun Rao


The algorithm for removing a tombstone in a compacted topic is supposed to be 
the following.
1. Tombstone is never removed when it's still in the dirty portion of the log.
2. After the tombstone is in the cleaned portion of the log, we further delay 
the removal of the tombstone by delete.retention.ms, measured from the time the 
tombstone entered the cleaned portion.
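
The two rules above can be sketched roughly as follows. This is only an 
illustration of the intended behaviour, not Kafka's cleaner code: the class 
TombstoneRule, the method canRemove, and all parameter names are hypothetical, 
and treating the boundary as inclusive (>=) is an assumption.

```java
// Hypothetical sketch of the intended tombstone-removal rule.
// None of these names come from the Kafka code base.
final class TombstoneRule {
    private TombstoneRule() {}

    /**
     * @param inCleanedPortion  true once the tombstone is in the cleaned portion of the log
     * @param cleanedAtMs       time (ms) at which the tombstone entered the cleaned portion
     * @param nowMs             current time (ms)
     * @param deleteRetentionMs value of delete.retention.ms
     * @return true if the tombstone may be removed by the cleaner
     */
    static boolean canRemove(boolean inCleanedPortion,
                             long cleanedAtMs,
                             long nowMs,
                             long deleteRetentionMs) {
        // Rule 1: a tombstone in the dirty portion is never removed.
        if (!inCleanedPortion) {
            return false;
        }
        // Rule 2: once cleaned, keep it for delete.retention.ms more,
        // counted from the time it entered the cleaned portion.
        // Inclusive comparison is an assumption of this sketch.
        return nowMs - cleanedAtMs >= deleteRetentionMs;
    }

    public static void main(String[] args) {
        long retentionMs = 86_400_000L; // one day, chosen for illustration
        System.out.println(canRemove(false, 0L, 10 * retentionMs, retentionMs));
        System.out.println(canRemove(true, 0L, retentionMs / 2, retentionMs));
        System.out.println(canRemove(true, 0L, 2 * retentionMs, retentionMs));
    }
}
```

The key point is that the clock in rule 2 starts at cleanedAtMs, not at any 
timestamp carried over from before the cleaning.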

Once the tombstone is in the cleaned portion, we know there can't be any 
message with the same key before the tombstone. Therefore, any consumer that 
reads a non-tombstone message before the tombstone and then reaches the end of 
the log within delete.retention.ms is guaranteed to see the tombstone.

However, the current implementation doesn't seem correct. We delay the removal 
of the tombstone by delete.retention.ms counted from the last modified time of 
the last cleaned segment. That last modified time is inherited from the 
original segment, though, and could be arbitrarily old. So, the tombstone may 
not be preserved as long as it needs to be.
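
To make the gap concrete, here is a small sketch comparing the two deadlines. 
It models the behaviour as described in this report, not the actual cleaner 
code; the class TombstoneDeadline and its methods are hypothetical, and the 
one-day retention and seven-day-old segment are made-up numbers.

```java
// Hypothetical comparison of the removal deadline described as current
// behaviour with the intended one. Names and numbers are illustrative.
final class TombstoneDeadline {
    private TombstoneDeadline() {}

    /** Deadline as described for the current behaviour: based on the
     *  last modified time inherited from the original segment. */
    static long currentDeadlineMs(long inheritedLastModifiedMs, long deleteRetentionMs) {
        return inheritedLastModifiedMs + deleteRetentionMs;
    }

    /** Intended deadline: based on when the tombstone entered the cleaned portion. */
    static long intendedDeadlineMs(long enteredCleanedPortionMs, long deleteRetentionMs) {
        return enteredCleanedPortionMs + deleteRetentionMs;
    }

    public static void main(String[] args) {
        long dayMs = 86_400_000L;
        long nowMs = System.currentTimeMillis();   // segment is cleaned "now"
        long inheritedMs = nowMs - 7 * dayMs;      // original segment is 7 days old
        long retentionMs = dayMs;                  // assumed delete.retention.ms

        long current = currentDeadlineMs(inheritedMs, retentionMs);
        long intended = intendedDeadlineMs(nowMs, retentionMs);

        // With these numbers the current deadline is already in the past,
        // so the tombstone is eligible for removal right away.
        System.out.println("current deadline already passed: " + (current < nowMs));
        System.out.println("intended deadline still ahead:   " + (intended > nowMs));
    }
}
```

With an old enough original segment, the first deadline lies in the past at 
the moment of cleaning, which is exactly the case where a slow consumer could 
miss the tombstone.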



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
