Compaction works on keys, not on individual messages. Each time the log
cleaner runs, it goes through the log and, for every key, deletes all
records except the most recent one. I guess, but I'm not entirely sure,
that the null message gets deleted in the same pass, since it's a delete
marker. So, assuming M1 through M6 all share the same key, your example
would behave something like this:
M1, M2, M3, M4 are produced; after compaction, only M4 is kept.
null is added; after the next compaction round, both M4 and null are removed.
Some time later, M5 and M6 are produced; after compaction, only M6 is kept.
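
To make the walk-through above concrete, here is a rough sketch in Python.
It is only a toy model of "keep the latest record per key", not Kafka's
actual cleaner: the compact() function, the drop_tombstones flag and the
key "k" are names I made up for illustration. The flag exists because I'm
not sure the delete marker goes away in the same pass; as far as I know,
how long it lingers is governed by the topic setting delete.retention.ms.

```python
def compact(log, drop_tombstones=False):
    """Toy compaction: keep only the latest record for each key.

    `log` is a list of (key, value) tuples in offset order.
    A value of None stands for a delete marker (null payload).
    """
    # Remember the position of the last record seen for every key.
    last_index = {}
    for i, (key, _value) in enumerate(log):
        last_index[key] = i

    # Keep a record only if it is the latest one for its key.
    kept = [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]

    # Assumption: the delete marker itself is dropped only when asked to.
    if drop_tombstones:
        kept = [(k, v) for k, v in kept if v is not None]
    return kept


# All messages are assumed to share the same (hypothetical) key "k".
log = [("k", "M1"), ("k", "M2"), ("k", "M3"), ("k", "M4")]
after_first = compact(log)

# The null message arrives; this pass also drops the marker itself.
after_delete = compact(after_first + [("k", None)], drop_tombstones=True)

# Later M5 and M6 are produced and compacted.
after_more = compact(after_delete + [("k", "M5"), ("k", "M6")])

print(after_first, after_delete, after_more)
```

With drop_tombstones left at False, the second step would keep the null
record and drop only M4, which is the case where the marker outlives the
pass that applies it.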

On Sat, May 28, 2016 at 7:27 AM Bartosz Konieczny <bartkoniec...@gmail.com>
wrote:

> Hello,
>
> I'm studying the part about log retention. For the delete policy I have
> no problem seeing what's going on. However, compaction is trickier. I
> have some questions about it:
>
> 1) In the documentation I can see that putting null key/payload will be
> used as a 'delete' marker:
> "Compaction also allows for deletes. A message with a key and a null
> payload will be treated as a delete from the log. This delete marker will
> cause any prior message with that key to be removed (as would any new
> message with that key), but delete markers are special in that they will
> themselves be cleaned out of the log after a period of time to free up
> space."
>
> Let's suppose we have following messages: M1, M2, M3, M4, null, M5, M6.
> Now, could you tell me if my understanding is correct for below cases ?
> * For the first case, {M1, M2, M3, M4, null} are in inactive segment.
> Logically, they should be removed, right ?
> * For the second case, {M1, M2, M3, M4, null, M5} are in inactive segment
> and {M6} is in active one. LogCleaner should once again remove M1-null and
> leave only M5 and M6 (with potential merging of these two messages to a
> single one active segment) ?
>
> 2) Log compaction will only 'remove' duplicated messages ? All other
> messages (including the ones after deduplication) will be kept infinitely ?
>
>
>
> Best regards,
> Bartosz.
>
