[ https://issues.apache.org/jira/browse/HBASE-19542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296382#comment-16296382 ]
Chia-Ping Tsai commented on HBASE-19542:
----------------------------------------
bq. So where is the test stuck?
The critical code is shown below.
{code:title=FanOutOneBlockAsyncDFSOutputHelper.java}
static void completeFile(DFSClient client, ClientProtocol namenode, String src,
    String clientName, ExtendedBlock block, long fileId) {
  for (int retry = 0;; retry++) {
    try {
      if (namenode.complete(src, clientName, block, fileId)) {
        endFileLease(client, fileId);
        return;
      } else {
        LOG.warn("complete file " + src + " not finished, retry = " + retry);
      }
    } catch (RemoteException e) {
      IOException ioe = e.unwrapRemoteException();
      if (ioe instanceof LeaseExpiredException) {
        LOG.warn("lease for file " + src + " is expired, give up", e);
        return;
      } else {
        LOG.warn("complete file " + src + " failed, retry = " + retry, e);
      }
    } catch (Exception e) {
      LOG.warn("complete file " + src + " failed, retry = " + retry, e);
    }
    sleepIgnoreInterrupt(retry);
  }
}
{code}
If the file system is in safe mode, namenode.complete throws a RemoteException
that wraps a SafeModeException. That is neither a success nor a
LeaseExpiredException, so the loop has no exit and the test hangs there while
we are closing the WAL.
bq. This means we may leave a WAL always open if a FileSystem is temporarily
unavailable but the RS is not down?
Alternatively, could we shut down the RS once it reaches a retry limit?
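To make the retry-limit idea concrete, here is a minimal sketch of a bounded
version of the loop. It is not the patch attached to this issue. The names
BoundedCompleteRetry, Attempt, completeWithLimit, maxAttempts and baseSleepMs
are all hypothetical, and the Attempt interface stands in for the real
namenode.complete(...) call so that the example compiles without Hadoop on the
classpath.

```java
import java.io.IOException;

// Hypothetical sketch: a bounded variant of the completeFile retry loop.
final class BoundedCompleteRetry {

  /** Stand-in for a call such as namenode.complete(...); an assumption, not a Hadoop API. */
  @FunctionalInterface
  interface Attempt {
    boolean run() throws IOException;
  }

  /**
   * Runs the attempt at most maxAttempts times.
   * Returns true as soon as one attempt succeeds, and false if the limit is
   * reached or the thread is interrupted while sleeping between attempts.
   */
  static boolean completeWithLimit(Attempt attempt, int maxAttempts, long baseSleepMs) {
    for (int retry = 0; retry < maxAttempts; retry++) {
      try {
        if (attempt.run()) {
          return true;
        }
      } catch (IOException e) {
        // For example a safe-mode failure: fall through and retry.
      }
      if (retry + 1 < maxAttempts) {
        try {
          // Linear backoff, chosen only for illustration.
          Thread.sleep(baseSleepMs * (retry + 1));
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
    }
    return false;
  }
}
```

A caller that gets false back could then decide to abort the RS instead of
blocking forever on the close. Whether aborting is the right reaction is the
open question above.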
> fix TestSafemodeBringsDownMaster
> --------------------------------
>
> Key: HBASE-19542
> URL: https://issues.apache.org/jira/browse/HBASE-19542
> Project: HBase
> Issue Type: Bug
> Reporter: Chia-Ping Tsai
> Assignee: Chia-Ping Tsai
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19542.v0.patch
>
>
> We need to check the stability of the underlying file system when closing the
> async WAL. Otherwise, HBase can't shut down gracefully.