[jira] [Commented] (HBASE-19468) FNFE during scans and flushes

Anoop Sam John (JIRA) Tue, 12 Dec 2017 09:47:57 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287964#comment-16287964
 ]


Anoop Sam John commented on HBASE-19468:
----------------------------------------

To me, Ram's patch looks simpler. Thiruvel's POC patch increments the ref 
count and then decrements it, which looks a bit more complex.
I do have one comment, though. In Ram's patch, the scanners are opened as soon 
as a flush happens (without waiting for the calls that actually need them 
open), so the ref count is already incremented, which prevents the Discharger 
thread from collecting these files. Can we keep the old way of opening scanners 
only when needed, but add a way to notify the scanners at the end of a 
compaction, just as they are already notified about flushed files? That way 
the old model of opening a scanner only when needed would continue. I am not 
sure about the impact, especially the locking needs. Just asking. Still, I 
like the simpler approach of this patch.
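
To make the ref-count idea concrete, here is a minimal sketch of how a held 
reference keeps the Discharger away from a compacted file. SketchStoreFile and 
SketchDischarger are hypothetical names made up for illustration. They are not 
the real HBase classes, and the real locking is likely more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a store file; not the real HBase class. */
class SketchStoreFile {
    private final String name;
    private int refCount;          // scanners currently depending on this file
    private boolean compactedAway; // set once a compaction has replaced the file
    private boolean deleted;       // set once the discharger has removed the file

    SketchStoreFile(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    /** A scanner takes a reference; fails if the file is already gone. */
    synchronized boolean acquire() {
        if (deleted) {
            return false;
        }
        refCount++;
        return true;
    }

    /** A scanner drops its reference once it no longer needs the file. */
    synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release without acquire: " + name);
        }
        refCount--;
    }

    synchronized void markCompactedAway() {
        compactedAway = true;
    }

    /** Deletes the file only if it is compacted away and unreferenced. */
    synchronized boolean deleteIfUnreferenced() {
        if (!compactedAway || deleted || refCount > 0) {
            return false;
        }
        deleted = true;
        return true;
    }
}

/** Hypothetical stand-in for the compaction discharger. */
class SketchDischarger {
    /** Returns the names of the files removed in this pass. */
    static List<String> cleanup(List<SketchStoreFile> files) {
        List<String> removed = new ArrayList<>();
        for (SketchStoreFile file : files) {
            if (file.deleteIfUnreferenced()) {
                removed.add(file.getName());
            }
        }
        return removed;
    }
}
```

In this sketch the reference is taken at flush time, and the discharger skips 
the file until release() is called. The alternative I asked about would 
instead notify open scanners when a compaction finishes, which this sketch 
does not cover.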

> FNFE during scans and flushes
> -----------------------------
>
>                 Key: HBASE-19468
>                 URL: https://issues.apache.org/jira/browse/HBASE-19468
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Thiruvel Thirumoolan
>            Priority: Critical
>             Fix For: 2.0.0, 1.4.1, 1.5.0, 1.3.3
>
>         Attachments: HBASE-19468-poc.patch, HBASE-19468_1.4.patch
>
>
> We see FNFE exceptions on our 1.3 clusters when scans and flushes happen at 
> the same time. This causes the regionserver to throw an 
> UnknownScannerException, and the client retries.
> This happens during the following sequence:
> 1. A scanner is open; the client has fetched some rows from the regionserver 
> and is working on them.
> 2. A flush happens and storeScanner is updated with the flushed files 
> (StoreScanner.updateReaders()).
> 3. A compaction happens on the region while the scanner is still open.
> 4. The compaction discharger runs and cleans up the newly flushed file, as we 
> don't have new scanners on it yet.
> 5. The client issues scan.next and during StoreScanner.resetScannerStack(), we 
> get a FNFE. The RegionServer throws an UnknownScannerException. The client 
> retries in 1.3. With branch-1.4, the scan fails with a DoNotRetryIOException.
> [~ram_krish], My proposal is to increment the reader count during 
> updateReaders() and decrement it during resetScannerStack(), so discharger 
> doesn't clean it up. Scan lease expiries also have to be taken care of. Am I 
> missing anything? Is there a better approach?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
