[jira] [Commented] (BOOKKEEPER-926) Compacted entries are not properly synced before updating index

Venkateswararao Jujjuri (JV) (JIRA) Mon, 02 May 2016 09:06:25 -0700

    [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266884#comment-15266884
 ]


Venkateswararao Jujjuri (JV) commented on BOOKKEEPER-926:
---------------------------------------------------------

[~rithin.shetty] and I noticed this at Salesforce testing.

Our stack looked like this:

"GarbageCollectorThread" prio=10 tid=0x00007f6981152330 nid=0x13da5 in 
Object.wait() [0x00007f68cc5d9000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at 
org.apache.bookkeeper.bookie.GarbageCollectorThread$CompactionScannerFactory.waitEntrylogFlushed(GarbageCollectorThread.java:182)
        - locked <0x0000000480b13168> (a java.lang.Object)
        - locked <0x0000000480b01a38> (a 
org.apache.bookkeeper.bookie.GarbageCollectorThread$CompactionScannerFactory)
        at 
org.apache.bookkeeper.bookie.GarbageCollectorThread$CompactionScannerFactory.flush(GarbageCollectorThread.java:201)
        - locked <0x0000000480b01a38> (a 
org.apache.bookkeeper.bookie.GarbageCollectorThread$CompactionScannerFactory)
        at 
org.apache.bookkeeper.bookie.GarbageCollectorThread.doCompactEntryLogs(GarbageCollectorThread.java:488)
        at 
org.apache.bookkeeper.bookie.GarbageCollectorThread.run(GarbageCollectorThread.java:388)

Also when [~rithin.shetty] added debug in the while loop mentioned above:

[vagrant@gus-mnds7-1-sfm sfslogs]$ grep "GC thread stuck" 
bookkeeper-bookie-gus-mnds7-1-sfm.gus.2015105437.vpod.sfdc.net.log
2016-05-01 19:59:21,271 - INFO 
[GarbageCollectorThread:GarbageCollectorThread$CompactionScannerFactory@187] - 
GC thread stuck waiting: lastOffsetLogId=12234, leastUnflushedLogId=12247
2016-05-01 20:14:21,886 - INFO 
[GarbageCollectorThread:GarbageCollectorThread$CompactionScannerFactory@187] - 
GC thread stuck waiting: lastOffsetLogId=12234, leastUnflushedLogId=12268
2016-05-01 20:29:22,181 - INFO 
[GarbageCollectorThread:GarbageCollectorThread$CompactionScannerFactory@187] - 
GC thread stuck waiting: lastOffsetLogId=12234, leastUnflushedLogId=12284
2016-05-01 20:44:22,940 - INFO 
[GarbageCollectorThread:GarbageCollectorThread$CompactionScannerFactory@187] - 
GC thread stuck waiting: lastOffsetLogId=12234, leastUnflushedLogId=12291
2016-05-01 20:59:28,856 - INFO 
[GarbageCollectorThread:GarbageCollectorThread$CompactionScannerFactory@187] - 
GC thread stuck waiting: lastOffsetLogId=12234, leastUnflushedLogId=12305

----
As you can see lastUnflushedLogId is growing pretty fast. [~merlimat] What I 
see is, leastUnflushedLogId keeps getting updated even after flush started and 
hence a problem too?

> Compacted entries are not properly synced before updating index
> ---------------------------------------------------------------
>
>                 Key: BOOKKEEPER-926
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-926
>             Project: Bookkeeper
>          Issue Type: Bug
>    Affects Versions: 4.3.1
>            Reporter: Matteo Merli
>            Assignee: Matteo Merli
>             Fix For: 4.4.0
>
>
> I have identified a couple of issues in Bookie compaction code. 
> h4. Compacted entries are not properly synced when index is updated
> When compacting, we read "active" entries from an entry log and we re-append 
> to current entry log. After compacting a number of entries, by default 100K, 
> or at the very end, we need to update the index pointing to the new entry log 
> and offset.
> Before updating the index, we need to wait for this entries to be flushed and 
> fsynced, otherwise a bookie crash might leave the index updated, pointing to 
> an invalid offset.
> The current code that is supposed to wait until flushed is:
> {code:java}
> // GarbageCollectorThread.java:178
> EntryLocation lastOffset = offsets.get(offsets.size()-1);
> long lastOffsetLogId = EntryLogger.logIdForOffset(lastOffset.location);
> while (lastOffsetLogId < entryLogger.getLeastUnflushedLogId() && running) {
>     synchronized (flushLock) {
>         flushLock.wait(1000);
>     }
>     lastOffset = offsets.get(offsets.size()-1);
>     lastOffsetLogId = EntryLogger.logIdForOffset(lastOffset.location);
> }
> // update the index 
> {code}
> The condition {{lastOffsetLogId < entryLogger.getLeastUnflushedLogId()}} is 
> wrong, because if the last compacted entry was written in an earlier entry 
> log than the least unflushed log, it means that the entries are already 
> flushed and thus we don't need to wait.
> In the normal case what happens is that {{lastOffsetLogId} is actually the 
> current entryLog and it's equal to {{entryLogger.getLeastUnflushedLogId()}}, 
> so we don't wait. But, in this case the entries appended to the current 
> entrylog are not flushed nor synced, hence the problem. 
> h4. Exception during index flush
> Having an exception when updating the index, combined with the above issue, 
> makes the bookie GC to stop indefinitely. 
> What happens is that the offset list is not cleared, and at the next bookie 
> GC iteration it will find the old compacted entries in that list, for which 
> now the entryLogId is less than the current log id, and that makes the while 
> loop to never exit.
> Another problem is that, any IOException during the index flush, will make 
> the GC thread to bail out and it will not remove even the entry logs that 
> were compacted and flushed. Next time, these entry logs will be compacted 
> again.
> h4. Proposed solution
> I think the best solution is to trigger the {{entryLogger.flush()}} form the 
> bookie GC thread before updating the indexes. That would simplify the code 
> and I don't see any disadvantages in doing that. 
> Another improvement would be to delete compacted entry logs individually, as 
> soon as the compacted data is flush, without waiting the end of the whole 
> compaction process. 
> The advantages are : 
>  * If compaction stop halfway, at least we don't have to re-compact what we 
> just compacted
>  * Compaction won't require significant space overhead. Today a major 
> compaction can end up reappending a large amount of data and then deleting 
> all the entry logs at the very end, requiring twice the size of the active 
> data set to be stored on disk at some point in time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BOOKKEEPER-926) Compacted entries are not properly synced before updating index

Reply via email to