Shall I create the JIRA directly?
On Thu, Oct 26, 2017 at 12:34 PM, Xie Gang <[email protected]> wrote:
> Hi,
>
> We use HDFS 2.4 and 2.6, and recently hit an issue where the DFSClient domain
> socket is disabled when a datanode throws a block-invalid exception.
>
> The block is invalidated on the datanode for some reason, which is fine. The
> DFSClient then tries to access this block on that datanode via the domain
> socket, which triggers an IOException. On the DFSClient side, when it gets an
> IOException with error code 'ERROR', it disables the domain socket and falls
> back to TCP. Worst of all, it seems to never recover the socket.
>
> I think this is a defect: with such a "block invalid" exception, we should
> not disable the domain socket, because there is nothing wrong with the
> domain socket service.
>
> Any thoughts?
>
> The code:
>
> private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
>     Slot slot) throws IOException {
>   ShortCircuitCache cache = clientContext.getShortCircuitCache();
>   final DataOutputStream out =
>       new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
>   SlotId slotId = slot == null ? null : slot.getSlotId();
>   new Sender(out).requestShortCircuitFds(block, token, slotId, 1);
>   DataInputStream in = new DataInputStream(peer.getInputStream());
>   BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
>       PBHelper.vintPrefixed(in));
>   DomainSocket sock = peer.getDomainSocket();
>   switch (resp.getStatus()) {
>   case SUCCESS:
>     byte buf[] = new byte[1];
>     FileInputStream fis[] = new FileInputStream[2];
>     sock.recvFileInputStreams(fis, buf, 0, buf.length);
>     ShortCircuitReplica replica = null;
>     try {
>       ExtendedBlockId key =
>           new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
>       replica = new ShortCircuitReplica(key, fis[0], fis[1], cache,
>           Time.monotonicNow(), slot);
>     } catch (IOException e) {
>       // This indicates an error reading from disk, or a format error. Since
>       // it's not a socket communication problem, we return null rather than
>       // throwing an exception.
>       LOG.warn(this + ": error creating ShortCircuitReplica.", e);
>       return null;
>     } finally {
>       if (replica == null) {
>         IOUtils.cleanup(DFSClient.LOG, fis[0], fis[1]);
>       }
>     }
>     return new ShortCircuitReplicaInfo(replica);
>   case ERROR_UNSUPPORTED:
>     if (!resp.hasShortCircuitAccessVersion()) {
>       LOG.warn("short-circuit read access is disabled for " +
>           "DataNode " + datanode + ". reason: " + resp.getMessage());
>       clientContext.getDomainSocketFactory()
>           .disableShortCircuitForPath(pathInfo.getPath());
>     } else {
>       LOG.warn("short-circuit read access for the file " +
>           fileName + " is disabled for DataNode " + datanode +
>           ". reason: " + resp.getMessage());
>     }
>     return null;
>   case ERROR_ACCESS_TOKEN:
>     String msg = "access control error while " +
>         "attempting to set up short-circuit access to " +
>         fileName + resp.getMessage();
>     if (LOG.isDebugEnabled()) {
>       LOG.debug(this + ":" + msg);
>     }
>     return new ShortCircuitReplicaInfo(new InvalidToken(msg));
>   default:
>     // As described above, the generic ERROR response returned for an
>     // invalidated block ends up here, so the socket path gets disabled.
>     LOG.warn(this + ": unknown response code " + resp.getStatus() +
>         " while attempting to set up short-circuit access. " +
>         resp.getMessage());
>     clientContext.getDomainSocketFactory()
>         .disableShortCircuitForPath(pathInfo.getPath());
>     return null;
>   }
> }
>
>
>
> --
> Xie Gang
>
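To make the proposal in the quoted message a bit more concrete, here is a rough, standalone sketch of the behaviour I have in mind. It is not a patch against DFSClient: the class ShortCircuitPolicySketch, its Status enum and the BLOCK_INVALID value are all hypothetical stand-ins I made up for illustration, and how the datanode would actually signal an invalid block (a new status code, a flag, or something else) is an open question. The only point is that a block-level failure should leave the domain socket path enabled, while other unexpected failures keep today's behaviour.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: local stand-ins, not the real HDFS classes or status codes.
class ShortCircuitPolicySketch {

    // Hypothetical status codes. BLOCK_INVALID is an assumed, more specific
    // signal meaning "this replica is no longer valid on this datanode".
    enum Status { SUCCESS, ERROR, ERROR_ACCESS_TOKEN, BLOCK_INVALID }

    // Stand-in for the client's per-path "short-circuit disabled" state.
    private final Set<String> disabledPaths = new HashSet<>();

    // True when the response suggests a problem with the socket service
    // itself, as opposed to a problem with one block or one token.
    static boolean indicatesSocketProblem(Status status) {
        switch (status) {
            case SUCCESS:
            case ERROR_ACCESS_TOKEN:
            case BLOCK_INVALID:
                return false;
            default:
                return true;
        }
    }

    // Disable the path only for failures that point at the socket service.
    void onResponse(Status status, String socketPath) {
        if (indicatesSocketProblem(status)) {
            disabledPaths.add(socketPath);
        }
    }

    boolean isDisabled(String socketPath) {
        return disabledPaths.contains(socketPath);
    }

    public static void main(String[] args) {
        ShortCircuitPolicySketch policy = new ShortCircuitPolicySketch();
        String path = "/var/run/hdfs/dn_socket"; // hypothetical socket path

        policy.onResponse(Status.BLOCK_INVALID, path);
        System.out.println("disabled after BLOCK_INVALID: " + policy.isDisabled(path));

        policy.onResponse(Status.ERROR, path);
        System.out.println("disabled after ERROR: " + policy.isDisabled(path));
    }
}
```

Running main prints false for the BLOCK_INVALID case and true for the generic ERROR case. In the real client the equivalent change would presumably sit in the switch in requestFileDescriptors, but that depends on what the datanode can report.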
--
Xie Gang