[GitHub] [kafka] guozhangwang opened a new pull request #11252: KAFKA-13216: Use a KV with list serde for the shared store

GitBox Mon, 23 Aug 2021 21:26:13 -0700


guozhangwang opened a new pull request #11252:
URL: https://github.com/apache/kafka/pull/11252



   This is an alternative approach, developed in parallel to #11235. After
several unsuccessful attempts to improve its efficiency, I've come up with a
"slightly" larger approach: use a KV store as the shared store instead, with
each value stored as a list<v>. The benefits of this approach are:
   
   1) Serialization happens only once, at the outer metered stores, where the
<timestamp, byte, key> composite is built, with fewer byte array copies.
   2) Deletes are straightforward and need no scan reads: a single call
deletes all duplicated <timestamp, byte, key> values.
   3) A KV store has less space amplification than a segmented window
store.
   
   The cons though:
   
   1) Each put call becomes a get-then-write in order to append to the list;
we would also spend a few more bytes to store the list (most likely a
singleton list, and hence just 4 more bytes).
   2) It's definitely more complicated. :)
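
To make the get-then-write trade-off concrete, here is a minimal sketch of a list-valued KV store. It is an in-memory stand-in, not the Kafka Streams API and not the code in this PR; the class `ListValueStoreSketch` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a KV store whose values are lists.
// Illustrative only: this is not the Kafka Streams store interface.
final class ListValueStoreSketch<K, V> {
    private final Map<K, List<V>> store = new HashMap<>();

    // put is a get-then-write: read the current list, append, write it back.
    void put(K key, V value) {
        List<V> current = store.get(key);          // the extra read
        List<V> updated = current == null
                ? new ArrayList<>()
                : new ArrayList<>(current);
        updated.add(value);
        store.put(key, updated);                   // the write
    }

    // Returns a copy of the values stored under the key (empty if absent).
    List<V> get(K key) {
        List<V> values = store.get(key);
        return values == null ? Collections.emptyList() : new ArrayList<>(values);
    }

    // A single delete removes every duplicate under the key; no scan is needed.
    List<V> delete(K key) {
        List<V> removed = store.remove(key);
        return removed == null ? Collections.emptyList() : removed;
    }

    int size() {
        return store.size();
    }
}
```

In this sketch duplicates of the same key share one entry, so the common case is a singleton list and the delete path is a single call.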
   
   The main idea is that since the shared store is actively GC'ed by the
expiration logic rather than by time-based retention, and since the key format
is <timestamp, byte, key>, the range expiration query is quite efficient as
well.
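
The reason a timestamp-first key makes expiration cheap can be sketched with a sorted map. This is a simplified model under assumptions: `RangeExpirationSketch`, `CompositeKey`, and the `flag` field (standing in for the byte in the key) are hypothetical names, and a `TreeMap` stands in for the underlying sorted store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a store keyed by <timestamp, byte, key>.
final class RangeExpirationSketch {

    // Ordered by timestamp first, then the flag byte, then the record key.
    static final class CompositeKey implements Comparable<CompositeKey> {
        final long timestamp;
        final byte flag;
        final String key;

        CompositeKey(long timestamp, byte flag, String key) {
            this.timestamp = timestamp;
            this.flag = flag;
            this.key = key;
        }

        @Override
        public int compareTo(CompositeKey other) {
            int c = Long.compare(timestamp, other.timestamp);
            if (c != 0) return c;
            c = Byte.compare(flag, other.flag);
            if (c != 0) return c;
            return key.compareTo(other.key);
        }
    }

    private final TreeMap<CompositeKey, List<String>> store = new TreeMap<>();

    void put(long timestamp, byte flag, String key, String value) {
        store.computeIfAbsent(new CompositeKey(timestamp, flag, key),
                k -> new ArrayList<>()).add(value);
    }

    // Removes and returns all values whose timestamp is strictly below cutoff.
    // Because keys sort by timestamp first, this is one contiguous range.
    List<String> expireBefore(long cutoff) {
        CompositeKey lowestAtCutoff = new CompositeKey(cutoff, Byte.MIN_VALUE, "");
        Map<CompositeKey, List<String>> expired = store.headMap(lowestAtCutoff, false);
        List<String> values = new ArrayList<>();
        for (List<String> list : expired.values()) {
            values.addAll(list);
        }
        expired.clear(); // clearing the view removes the entries from the store
        return values;
    }

    int size() {
        return store.size();
    }
}
```

Everything older than the cutoff sits at the head of the key space, so expiration reads one prefix range instead of scanning the whole store.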
   
   This is not a final PR: as you can see, it contains many quick hacks on
serdes etc., and is only meant to illustrate the idea. I plan to run the
benchmarks to see how it behaves compared with the other approach, and if
people agree with this approach, I will refine it to be cleaner.
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
