VIkram created KAFKA-9879:
-----------------------------

             Summary: How kafka deletes tombstone messages?
                 Key: KAFKA-9879
                 URL: https://issues.apache.org/jira/browse/KAFKA-9879
             Project: Kafka
          Issue Type: Bug
            Reporter: VIkram


I was able to delete records in kafka using tombstone messages after few 
attempts. However the algorithm (or logic) that kafka uses to delete these 
tombstone messages is still unclear to me.

>From my observations, I could figure out that there is some relation between 
>last modified time of a segment and deletion of tombstone messages. I have 
>looked into this [https://stackoverflow.com/a/48325699/6940279] but it's a bit 
>complex to understand.

*Topic details*

 

{{Topic:reddyvel_13   PartitionCount:1    ReplicationFactor:3 
Configs:cleanup.policy=compact,segment.bytes=2000000,delete.retention.ms=10000
    Topic: reddyvel_13  Partition: 0    Leader: 1   Replicas: 1,5,2 Isr: 1,5,2}}

I have set {{cleanup.policy=compact}}, {{segment.bytes=2000000}}, 
{{delete.retention.ms=10000}}

*Timeline of events*
 * First segment (baseOffset = 0) was closed at {{2020-04-02 07:12:09,908}}

 

{{[2020-04-02 07:12:09,908] INFO [ProducerStateManager partition=reddyvel_13-0] 
Writing producer snapshot at offset 16623 (kafka.log.ProducerStateManager)
[2020-04-02 07:12:09,908] INFO [Log partition=reddyvel_13-0, 
dir=/local/kafka/data] Rolled new log segment at offset 16623 in 1 ms. 
(kafka.log.Log)}}

Compaction has been triggered immediately on this closed segment

 

{{[2020-04-02 07:12:22,989] INFO Cleaner 0: Cleaning log reddyvel_13-0 
(cleaning prior to Thu Apr 02 07:12:09 EDT 2020, discarding tombstones prior to 
Wed Dec 31 19:00:00 EST 1969)... (kafka.l
og.LogCleaner)}}
 * Sent few more messages along with few tombstones (to delete messages present 
in first segment) and Second segment was closed at {{2020-04-02 07:56:50,405}}

 

{{[2020-04-02 07:56:50,405] INFO [ProducerStateManager partition=reddyvel_13-0] 
Writing producer snapshot at offset 33868 (kafka.log.ProducerStateManager)
[2020-04-02 07:56:50,406] INFO [Log partition=reddyvel_13-0, 
dir=/local/kafka/data] Rolled new log segment at offset 33868 in 2 ms. 
(kafka.log.Log)}}

Compaction has been triggered

 

{{[2020-04-02 07:56:53,180] INFO Cleaner 0: Cleaning log reddyvel_13-0 
(cleaning prior to Thu Apr 02 07:56:50 EDT 2020, discarding tombstones prior to 
Thu Apr 02 07:11:59 EDT 2020)... (kafka.l
og.LogCleaner)}}

Here, above log message says {{discarding tombstones prior to Thu Apr 02 
07:11:59 EDT 2020}}. This timestamp is exactly equal to first segment closing 
timestamp ({{2020-04-02 07:12:09,908}})   -   {{delete.retention.ms}} (10 
seconds) of my topic. I'm not able to figure out the link between these.

I want to understand at what time does kafka trigger deletion of tombstone 
messages. Can someone explain the tombstone deletion algorithm in simpler terms 
and the reasoning behind it?

 

It's not a bug but I need more information on this. I have posted this in other 
forums like stackoverflow but did not get any reply. The kafka official 
documentation doesn't have this information. If this is not the correct 
platform for this, kindly guide me to the relevant platform.

 

Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to