[
https://issues.apache.org/jira/browse/HBASE-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287964#comment-16287964
]
Anoop Sam John commented on HBASE-19468:
----------------------------------------
To me, Ram's patch looks simpler. Thiruvel's POC patch increments and then
decrements the ref count, and looks a bit more complex.
I do have one comment, though. In Ram's patch, we open the scanners as soon as
the flush happens (without waiting for the calls that need them), so the ref
count is already incremented, which prevents the Discharger thread from
collecting these files. Could we keep the old way of opening only when needed,
but have a way to tell the scanners about it at the end of compaction, much as
the scanners are informed of flushed files? That way the old model of opening
a scanner only when needed would continue. I am not sure about the impact,
especially the locking needs. Just asking. But I like the simpler patch.
> FNFE during scans and flushes
> -----------------------------
>
> Key: HBASE-19468
> URL: https://issues.apache.org/jira/browse/HBASE-19468
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver, Scanners
> Affects Versions: 1.3.1
> Reporter: Thiruvel Thirumoolan
> Priority: Critical
> Fix For: 2.0.0, 1.4.1, 1.5.0, 1.3.3
>
> Attachments: HBASE-19468-poc.patch, HBASE-19468_1.4.patch
>
>
> We see FNFEs (FileNotFoundException) on our 1.3 clusters when scans and
> flushes happen at the same time. This causes the regionserver to throw an
> UnknownScannerException, and the client retries.
> This happens during the following sequence:
> 1. A scanner is open; the client has fetched some rows from the regionserver
> and is working on them.
> 2. A flush happens and the StoreScanner is updated with the flushed files
> (StoreScanner.updateReaders()).
> 3. A compaction happens on the region while the scanner is still open.
> 4. The compaction discharger runs and cleans up the newly flushed file, as we
> don't have new scanners on it yet.
> 5. The client issues scan.next and, during StoreScanner.resetScannerStack(),
> we get a FNFE. The RegionServer throws an UnknownScannerException. The client
> retries in 1.3. With branch-1.4, the scan fails with a DoNotRetryIOException.
> [~ram_krish], my proposal is to increment the reader count during
> updateReaders() and decrement it during resetScannerStack(), so the
> discharger doesn't clean the files up. Scan lease expiries also have to be
> taken care of. Am I missing anything? Is there a better approach?
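
The ref-count idea in the description above can be sketched roughly as
follows. This is only a toy model to illustrate the ordering, not HBase code:
FileRef, ScannerSketch, discharge() and close() are hypothetical stand-ins
(only the method names updateReaders() and resetScannerStack() come from the
issue text), and it ignores locking entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RefCountSketch {
    /** Hypothetical stand-in for a store file; not the HBase class. */
    static final class FileRef {
        final String name;
        final AtomicInteger refCount = new AtomicInteger();
        boolean compactedAway; // set once a compaction has replaced this file
        boolean deleted;       // set once the discharger has removed it
        FileRef(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a scanner that is told about flushed files. */
    static final class ScannerSketch {
        private final List<FileRef> pending = new ArrayList<>();
        private final List<FileRef> open = new ArrayList<>();

        // Step 2 plus the proposal: pin flushed files as soon as we hear of them.
        void updateReaders(List<FileRef> flushed) {
            for (FileRef f : flushed) {
                f.refCount.incrementAndGet();
                pending.add(f);
            }
        }

        // Step 5: on the next scan.next(), open the pending files lazily.
        List<String> resetScannerStack() {
            List<String> opened = new ArrayList<>();
            for (FileRef f : pending) {
                if (f.deleted) {
                    throw new IllegalStateException("file missing: " + f.name);
                }
                f.refCount.incrementAndGet(); // reference held by the open reader
                f.refCount.decrementAndGet(); // release the pin taken earlier
                open.add(f);
                opened.add(f.name);
            }
            pending.clear();
            return opened;
        }

        // Scanner close or lease expiry: release both pins and open readers.
        void close() {
            for (FileRef f : pending) f.refCount.decrementAndGet();
            for (FileRef f : open) f.refCount.decrementAndGet();
            pending.clear();
            open.clear();
        }
    }

    /** Step 4: remove compacted files that nothing references any more. */
    static int discharge(List<FileRef> files) {
        int removed = 0;
        for (FileRef f : files) {
            if (f.compactedAway && !f.deleted && f.refCount.get() == 0) {
                f.deleted = true;
                removed++;
            }
        }
        return removed;
    }
}
```

In this model a file that is flushed, pinned by updateReaders() and then
compacted survives a discharge() run until the scanner is closed. The close()
path is where a lease expiry would also have to release the pin, otherwise the
file would never be collected.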