[ https://issues.apache.org/jira/browse/HDFS-12821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Zhuge resolved HDFS-12821.
-------------------------------
Resolution: Duplicate
> Block invalid IOException causes the DFSClient domain socket being disabled
> ---------------------------------------------------------------------------
>
> Key: HDFS-12821
> URL: https://issues.apache.org/jira/browse/HDFS-12821
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.4.0, 2.6.0
> Reporter: Gang Xie
>
> We run HDFS 2.4 and 2.6, and recently hit an issue where the DFSClient domain
> socket is disabled when the DataNode throws a "block invalid" exception.
> The block is invalidated on the DataNode for some reason, which is fine. The
> DFSClient then tries to access this block on that DataNode via the domain
> socket, which triggers an IOException. On the DFSClient side, when it gets an
> IOException with error code 'ERROR', it disables the domain socket and falls
> back to TCP. Worse, it never seems to re-enable the socket.
> I think this is a defect: on a "block invalid" exception we should not
> disable the domain socket, because there is nothing wrong with the domain
> socket service.
> Any thoughts?
> The code:
> {code}
> private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
>     Slot slot) throws IOException {
>   ShortCircuitCache cache = clientContext.getShortCircuitCache();
>   final DataOutputStream out =
>       new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
>   SlotId slotId = slot == null ? null : slot.getSlotId();
>   new Sender(out).requestShortCircuitFds(block, token, slotId, 1);
>   DataInputStream in = new DataInputStream(peer.getInputStream());
>   BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
>       PBHelper.vintPrefixed(in));
>   DomainSocket sock = peer.getDomainSocket();
>   switch (resp.getStatus()) {
>   case SUCCESS:
>     byte buf[] = new byte[1];
>     FileInputStream fis[] = new FileInputStream[2];
>     sock.recvFileInputStreams(fis, buf, 0, buf.length);
>     ShortCircuitReplica replica = null;
>     try {
>       ExtendedBlockId key =
>           new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
>       replica = new ShortCircuitReplica(key, fis[0], fis[1], cache,
>           Time.monotonicNow(), slot);
>     } catch (IOException e) {
>       // This indicates an error reading from disk, or a format error. Since
>       // it's not a socket communication problem, we return null rather than
>       // throwing an exception.
>       LOG.warn(this + ": error creating ShortCircuitReplica.", e);
>       return null;
>     } finally {
>       if (replica == null) {
>         IOUtils.cleanup(DFSClient.LOG, fis[0], fis[1]);
>       }
>     }
>     return new ShortCircuitReplicaInfo(replica);
>   case ERROR_UNSUPPORTED:
>     if (!resp.hasShortCircuitAccessVersion()) {
>       LOG.warn("short-circuit read access is disabled for " +
>           "DataNode " + datanode + ". reason: " + resp.getMessage());
>       clientContext.getDomainSocketFactory()
>           .disableShortCircuitForPath(pathInfo.getPath());
>     } else {
>       LOG.warn("short-circuit read access for the file " +
>           fileName + " is disabled for DataNode " + datanode +
>           ". reason: " + resp.getMessage());
>     }
>     return null;
>   case ERROR_ACCESS_TOKEN:
>     String msg = "access control error while " +
>         "attempting to set up short-circuit access to " +
>         fileName + resp.getMessage();
>     if (LOG.isDebugEnabled()) {
>       LOG.debug(this + ":" + msg);
>     }
>     return new ShortCircuitReplicaInfo(new InvalidToken(msg));
>   default:
>     LOG.warn(this + ": unknown response code " + resp.getStatus() +
>         " while attempting to set up short-circuit access. " +
>         resp.getMessage());
>     clientContext.getDomainSocketFactory()
>         .disableShortCircuitForPath(pathInfo.getPath()); // <<<<<<=====
>     return null;
>   }
> }
> {code}
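
To make the proposal in the description concrete, here is a minimal standalone sketch of the decision it asks for. It is not the actual Hadoop change and uses no Hadoop classes. The class ShortCircuitErrorPolicy, the Action enum, and the ERROR_BLOCK_INVALID status are all hypothetical names introduced for illustration; in particular, it assumes the DataNode could report a block-level failure with a code distinct from the generic 'ERROR', which the description does not confirm.

```java
public class ShortCircuitErrorPolicy {
    /** Response codes. ERROR_BLOCK_INVALID is hypothetical, not a real HDFS status. */
    public enum Status {
        SUCCESS, ERROR, ERROR_UNSUPPORTED, ERROR_ACCESS_TOKEN, ERROR_BLOCK_INVALID
    }

    /** What the client does next. These names are illustrative only. */
    public enum Action {
        USE_REPLICA, REPORT_INVALID_TOKEN, SKIP_SHORT_CIRCUIT, DISABLE_PATH
    }

    /**
     * Maps a response status to a client action. The key point is that a
     * block-level failure skips short-circuit for this read only, and leaves
     * the domain socket path enabled.
     */
    public static Action decide(Status status, boolean hasShortCircuitAccessVersion) {
        switch (status) {
            case SUCCESS:
                return Action.USE_REPLICA;
            case ERROR_ACCESS_TOKEN:
                return Action.REPORT_INVALID_TOKEN;
            case ERROR_UNSUPPORTED:
                // Mirrors the quoted code: the path is disabled only when the
                // response carries no short-circuit access version.
                return hasShortCircuitAccessVersion
                        ? Action.SKIP_SHORT_CIRCUIT
                        : Action.DISABLE_PATH;
            case ERROR_BLOCK_INVALID:
                // Assumption: the block is gone, but the socket service is healthy.
                return Action.SKIP_SHORT_CIRCUIT;
            default:
                // Generic ERROR and unknown codes keep the behavior of the quoted code.
                return Action.DISABLE_PATH;
        }
    }

    public static void main(String[] args) {
        // Print the decision for every status, assuming the version field is present.
        for (Status s : Status.values()) {
            System.out.println(s + " -> " + decide(s, true));
        }
    }
}
```

The sketch keeps the generic 'ERROR' case on the disabling path, so the behavior only changes if the invalid-block condition can actually be told apart from other failures on the client side.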