[ https://issues.apache.org/jira/browse/HDFS-12821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Zhuge resolved HDFS-12821.
-------------------------------
    Resolution: Duplicate

> Block invalid IOException causes the DFSClient domain socket being disabled
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-12821
>                 URL: https://issues.apache.org/jira/browse/HDFS-12821
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.4.0, 2.6.0
>            Reporter: Gang Xie
>
> We use HDFS 2.4 and 2.6, and recently hit an issue where the DFSClient domain
> socket is disabled when the datanode throws a block invalid exception.
> The block is invalidated for some reason on the datanode, and that is OK. Then
> DFSClient tries to access this block on this datanode via the domain socket.
> This triggers an IOException. On the DFSClient side, when it gets an
> IOException and the error code 'ERROR', it disables the domain socket and
> falls back to TCP. Worst of all, it seems never to recover the socket.
> I think this is a defect. With such a "block invalid" exception, we should
> not disable the domain socket, because there is nothing wrong with the domain
> socket service.
> Any thoughts?
> The code:
> {code}
>   private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
>           Slot slot) throws IOException {
>     ShortCircuitCache cache = clientContext.getShortCircuitCache();
>     final DataOutputStream out =
>         new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
>     SlotId slotId = slot == null ? null : slot.getSlotId();
>     new Sender(out).requestShortCircuitFds(block, token, slotId, 1);
>     DataInputStream in = new DataInputStream(peer.getInputStream());
>     BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
>         PBHelper.vintPrefixed(in));
>     DomainSocket sock = peer.getDomainSocket();
>     switch (resp.getStatus()) {
>     case SUCCESS:
>       byte buf[] = new byte[1];
>       FileInputStream fis[] = new FileInputStream[2];
>       sock.recvFileInputStreams(fis, buf, 0, buf.length);
>       ShortCircuitReplica replica = null;
>       try {
>         ExtendedBlockId key =
>             new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
>         replica = new ShortCircuitReplica(key, fis[0], fis[1], cache,
>             Time.monotonicNow(), slot);
>       } catch (IOException e) {
>         // This indicates an error reading from disk, or a format error.  Since
>         // it's not a socket communication problem, we return null rather than
>         // throwing an exception.
>         LOG.warn(this + ": error creating ShortCircuitReplica.", e);
>         return null;
>       } finally {
>         if (replica == null) {
>           IOUtils.cleanup(DFSClient.LOG, fis[0], fis[1]);
>         }
>       }
>       return new ShortCircuitReplicaInfo(replica);
>     case ERROR_UNSUPPORTED:
>       if (!resp.hasShortCircuitAccessVersion()) {
>         LOG.warn("short-circuit read access is disabled for " +
>             "DataNode " + datanode + ".  reason: " + resp.getMessage());
>         clientContext.getDomainSocketFactory()
>             .disableShortCircuitForPath(pathInfo.getPath());
>       } else {
>         LOG.warn("short-circuit read access for the file " +
>             fileName + " is disabled for DataNode " + datanode +
>             ".  reason: " + resp.getMessage());
>       }
>       return null;
>     case ERROR_ACCESS_TOKEN:
>       String msg = "access control error while " +
>           "attempting to set up short-circuit access to " +
>           fileName + resp.getMessage();
>       if (LOG.isDebugEnabled()) {
>         LOG.debug(this + ":" + msg);
>       }
>       return new ShortCircuitReplicaInfo(new InvalidToken(msg));
>     default:
>       LOG.warn(this + ": unknown response code " + resp.getStatus() +
>           " while attempting to set up short-circuit access. " +
>           resp.getMessage());
>       clientContext.getDomainSocketFactory()
>           .disableShortCircuitForPath(pathInfo.getPath());
>           <<<<<<=====
>       return null;
>     }
> {code}
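
For illustration only, the behaviour the reporter asks for could be sketched as a small decision table: a per-block failure such as 'ERROR' skips short-circuit access for that block only, while the socket path is disabled only when the response says the DataNode does not support short-circuit reads at all, or when the status is unrecognized. Everything below is a hypothetical stand-in written for this note (the class ShortCircuitFailurePolicy, both enums, and the decide method); it is not Hadoop code and not necessarily how the duplicate issue was resolved.

```java
/**
 * Hypothetical sketch, not Hadoop code: maps a short-circuit response
 * status to the action the client takes for the domain socket path.
 */
final class ShortCircuitFailurePolicy {

  /** Simplified stand-in for the response statuses named in the report. */
  enum Status { SUCCESS, ERROR, ERROR_UNSUPPORTED, ERROR_ACCESS_TOKEN, UNKNOWN }

  /** What the client does after receiving a response (assumed names). */
  enum Action { USE_REPLICA, SKIP_THIS_BLOCK, REPORT_INVALID_TOKEN, DISABLE_PATH }

  private ShortCircuitFailurePolicy() {
  }

  /**
   * @param status response status from the DataNode
   * @param hasShortCircuitAccessVersion whether the response carried a
   *        short-circuit access version, as checked in the quoted code
   */
  static Action decide(Status status, boolean hasShortCircuitAccessVersion) {
    switch (status) {
    case SUCCESS:
      return Action.USE_REPLICA;
    case ERROR_UNSUPPORTED:
      // Same split as the quoted code: disable the path only when the
      // DataNode gives no access version at all.
      return hasShortCircuitAccessVersion
          ? Action.SKIP_THIS_BLOCK : Action.DISABLE_PATH;
    case ERROR_ACCESS_TOKEN:
      return Action.REPORT_INVALID_TOKEN;
    case ERROR:
      // Proposed change: a block-level error (for example an invalidated
      // block) says nothing about the socket, so keep the path enabled.
      return Action.SKIP_THIS_BLOCK;
    default:
      // Unrecognized status: keep the conservative behaviour.
      return Action.DISABLE_PATH;
    }
  }

  public static void main(String[] args) {
    for (Status s : Status.values()) {
      System.out.println(s + " -> " + decide(s, true));
    }
  }
}
```

With a policy like this, a caller would fall back to TCP for the one failing block and try the domain socket again on the next read, instead of disabling short-circuit reads for the whole path.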