[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015895#comment-16015895
 ] 

Charan Reddy Guttapalem commented on BOOKKEEPER-1041:
-----------------------------------------------------

Issue: 

In the Bookie's EntryLogger, there is only one active entry log, and all 
ledgers' entries go to that same entry log. This is ideal for HDDs, where file 
syncs, seeks, and moving the head all over the disk platter are very expensive. 
But a single active entry log is inefficient for SSDs, since each SSD can handle 
multiple parallel writers. Also, regardless of the LedgerStorage type 
(interleaved or sorted), a single active entry log is inefficient for 
compaction, because entries of many ledgers end up interleaved in the same 
entry log, so its space cannot be reclaimed until all of those ledgers are 
deleted or their live entries are compacted out.

Also, in SortedLedgerStorage, the addEntry request path flushes the 
EntryMemTable when it reaches its size limit. Because of this, we observe 
unpredictable tail latency for addEntry requests. When an EntryMemTable 
snapshot (64 MB) is flushed all at once, it can affect the journal's addEntry 
latency. Also, if the rate of new add requests exceeds the rate at which the 
EntryMemTable's previous snapshot is flushed, the current EntryMemTable map 
will eventually reach its limit while the previous snapshot flush is still in 
progress. EntryMemTable then throttles new add requests, which further 
increases addEntry latency.

Goals:

The main purpose of this feature is a more efficient garbage collection story: 
minimizing the amount of compaction required and reclaiming deleted ledgers' 
space sooner. This feature also lays the groundwork for switching from 
SortedLedgerStorage to InterleavedLedgerStorage and getting predictable tail 
latency.

Proposal:

The proposal is to have multiple active entry logs, which will improve 
compaction performance and make SortedLedgerStorage redundant.

Design Overview:

- Add a server configuration specifying the number of active entry logs per 
ledger dir.
- For backward compatibility (the existing single-entry-log behaviour), that 
config can be set to 0.
- A round-robin method will be used to choose the active entry log for the 
current ledger in the EntryLogger.addEntry method.
- If the total number of active entry logs is greater than or equal to the 
number of active ledgers, each ledger gets an entry log almost exclusively.
- To implement the round-robin approach, we need to maintain state: a mapping 
of ledgerId to slotId.
- There will be numberOfLedgerDirs * numberOfActiveEntryLogsPerLedgerDir 
slots. A slot is mapped to a ledger dir, but the active entry log of that slot 
is rotated when it reaches capacity.
- Given the slotId, we can get the entryLogId currently associated with that 
slot.
- If the map has no entry for the current ledger, we pick the next slot in 
round-robin order and add the mapping to the map.
- Since the Bookie is not informed when a ledger is closed for writing, there 
is no easy way to know when to remove its mapping from the map. Since each 
mapping is just a <long ledgerId, int slotId> entry, we can afford a loose 
eviction policy: use a cache that evicts entries based on the time since their 
last access.
- If a ledger dir becomes full, all the slots whose entry logs are in that 
ledger dir should become inactive. Existing mappings of active ledgers to 
these slots (active entry logs) should be updated to point to available 
active slots.
- When the ledger dir becomes writable again, the slots that were inactive 
should be made active and become eligible for round-robin distribution again.
- This feature requires changes to the checkpoint logic. Currently, with the 
BOOKKEEPER-564 change, a checkpoint is scheduled only when the current entry 
log file is rotated, so we don't call 'flushCurrentLog' when we checkpoint. 
With multiple active entry logs, scheduling the checkpoint on entry log 
rotation is not an option, so flushCurrentLogs needs to be called whenever a 
checkpoint is made, once every 'flushInterval' period.
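The slot bookkeeping in the design above (round-robin assignment, a ledgerId-to-slot map with time-based eviction, and deactivating or reactivating a ledger dir's slots) can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the EntryLogger implementation: the class ActiveLogSlots and all of its methods are hypothetical, slot s is assumed to belong to ledger dir s / activeLogsPerDir, time is passed in explicitly to keep it testable, a single lock guards all state, and the backward-compatible value 0 is not modelled. A stdlib access-ordered LinkedHashMap stands in for the eviction cache; a real implementation might use a cache library instead (for example Guava's CacheBuilder.expireAfterAccess).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;

// Hypothetical sketch, not BookKeeper's EntryLogger API.
// Assumed layout: slot s lives in ledger dir s / activeLogsPerDir.
class ActiveLogSlots {
    private static final class Mapping {
        int slot;
        long lastAccessNanos;
        Mapping(int slot, long t) { this.slot = slot; this.lastAccessNanos = t; }
    }

    private final int activeLogsPerDir;
    private final boolean[] slotActive;
    private final long idleExpiryNanos;
    private int nextSlot = 0; // round-robin cursor
    // accessOrder=true: iteration runs from least to most recently accessed.
    private final LinkedHashMap<Long, Mapping> ledgerToSlot = new LinkedHashMap<>(16, 0.75f, true);

    ActiveLogSlots(int numLedgerDirs, int activeLogsPerDir, long idleExpiryNanos) {
        if (numLedgerDirs < 1 || activeLogsPerDir < 1) throw new IllegalArgumentException("need >= 1");
        this.activeLogsPerDir = activeLogsPerDir;
        this.slotActive = new boolean[numLedgerDirs * activeLogsPerDir];
        Arrays.fill(slotActive, true);
        this.idleExpiryNanos = idleExpiryNanos;
    }

    int dirOf(int slot) {
        return slot / activeLogsPerDir;
    }

    // Returns the ledger's slot, assigning the next active slot round-robin if needed.
    synchronized int slotFor(long ledgerId, long nowNanos) {
        evictIdle(nowNanos);
        Mapping m = ledgerToSlot.get(ledgerId); // also marks the entry most recently used
        if (m != null && slotActive[m.slot]) {
            m.lastAccessNanos = nowNanos;
            return m.slot;
        }
        int slot = nextActiveSlot();
        ledgerToSlot.put(ledgerId, new Mapping(slot, nowNanos));
        return slot;
    }

    // Deactivate all slots of a full dir and eagerly remap ledgers that used them.
    synchronized void markDirFull(int dir) {
        Arrays.fill(slotActive, dir * activeLogsPerDir, (dir + 1) * activeLogsPerDir, false);
        if (!hasActiveSlot()) {
            return; // nowhere to remap to; slotFor throws until a dir recovers
        }
        for (Mapping m : ledgerToSlot.values()) {
            if (!slotActive[m.slot]) {
                m.slot = nextActiveSlot();
            }
        }
    }

    // Make a recovered dir's slots eligible for round-robin again.
    synchronized void markDirWritable(int dir) {
        Arrays.fill(slotActive, dir * activeLogsPerDir, (dir + 1) * activeLogsPerDir, true);
    }

    synchronized int mappedLedgers() {
        return ledgerToSlot.size();
    }

    private boolean hasActiveSlot() {
        for (boolean active : slotActive) if (active) return true;
        return false;
    }

    private int nextActiveSlot() {
        for (int i = 0; i < slotActive.length; i++) {
            int candidate = (nextSlot + i) % slotActive.length;
            if (slotActive[candidate]) {
                nextSlot = (candidate + 1) % slotActive.length;
                return candidate;
            }
        }
        throw new IllegalStateException("all ledger dirs are full");
    }

    // Drop mappings idle longer than idleExpiryNanos; the eldest entries come first.
    private void evictIdle(long nowNanos) {
        Iterator<Mapping> it = ledgerToSlot.values().iterator();
        while (it.hasNext()) {
            if (nowNanos - it.next().lastAccessNanos <= idleExpiryNanos) {
                break;
            }
            it.remove();
        }
    }
}
```

In this sketch, evicting a mapping that is still in use is harmless: the ledger simply gets a fresh round-robin slot on its next write, which is why a loose, time-based eviction policy is acceptable. When a dir becomes writable again, existing mappings stay where they are and only newly mapped ledgers are spread back onto the restored slots.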
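The checkpoint change in the last item can be sketched as a periodic task that flushes every active entry log before recording a checkpoint. Everything here is hypothetical: ActiveEntryLog, PeriodicCheckpointer, and the counter that stands in for "recording the checkpoint" are illustrative names, not BookKeeper's API; only ScheduledExecutorService.scheduleAtFixedRate is a real JDK call. The key assumption shown is the ordering: a checkpoint is recorded only after all active logs have been flushed.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for one active entry log (not BookKeeper's API).
interface ActiveEntryLog {
    void flush() throws IOException; // force this log's buffered entries to disk
}

// Hypothetical checkpointer: flushes all active logs every flushInterval.
class PeriodicCheckpointer {
    private final List<ActiveEntryLog> activeLogs = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private int checkpoints = 0; // stands in for persisting a checkpoint mark

    void register(ActiveEntryLog log) {
        activeLogs.add(log);
    }

    // Flush every active log first; only then is the checkpoint recorded.
    // If any flush throws, no checkpoint is recorded for this round.
    synchronized void checkpoint() throws IOException {
        for (ActiveEntryLog log : activeLogs) {
            log.flush();
        }
        checkpoints++;
    }

    synchronized int checkpointCount() {
        return checkpoints;
    }

    // Run checkpoint() every flushIntervalMs instead of on log rotation.
    void start(long flushIntervalMs) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                checkpoint();
            } catch (IOException e) {
                // A real implementation would log this; here the checkpoint
                // is simply attempted again on the next tick.
            }
        }, flushIntervalMs, flushIntervalMs, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdown();
    }
}
```

A fixed-rate schedule decouples checkpointing from the rotation of any single log, which is the point of the change; in this sketch the interval would come from the existing 'flushInterval' setting.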

> Multiple active entrylogs
> -------------------------
>
>                 Key: BOOKKEEPER-1041
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-1041
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>            Reporter: Venkateswararao Jujjuri (JV)
>            Assignee: Charan Reddy Guttapalem
>
> Currently, BookKeeper is tuned for rotational HDDs. It has one active entry 
> log, and all ledgers' entries go to the same entry log until it is rotated 
> out. This is perfect for HDDs, as seeks and moving the head all over the disk 
> platter are very expensive. But it is very inefficient for SSDs, as each SSD 
> can handle multiple parallel writers. This method is also extremely 
> inefficient for compaction, as it causes write amplification and inefficient 
> disk space usage.
> Our proposal is to have multiple active entry logs and a configuration param 
> for how many parallel entry logs the system can have. This way, one can 
> configure the system to have fewer (maybe one) ledgers per entry log.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
