David Mollitor created HADOOP-17462:
---------------------------------------
             Summary: Hadoop Client getRpcResponse May Return Wrong Result
                 Key: HADOOP-17462
                 URL: https://issues.apache.org/jira/browse/HADOOP-17462
             Project: Hadoop Common
          Issue Type: Improvement
          Components: common
            Reporter: David Mollitor
            Assignee: David Mollitor


{code:java|Title=Client.java}
  /** @return the rpc response or, in case of timeout, null. */
  private Writable getRpcResponse(final Call call, final Connection connection,
      final long timeout, final TimeUnit unit) throws IOException {
    synchronized (call) {
      while (!call.done) {
        try {
          AsyncGet.Util.wait(call, timeout, unit);
          if (timeout >= 0 && !call.done) {
            return null;
          }
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("Call interrupted");
        }
      }
      ...
    }
  }

  static class Call {
    final int id;               // call id
    final int retry;            // retry count
    ...
    boolean done;               // true when call is done
    ...
  }
{code}

The {{done}} variable is not marked as {{volatile}}, so the thread that checks its status is free to cache the value and never reload it, even though the value is expected to be changed by a different thread. The while loop may be stuck waiting for the change while always looking at a cached value. With the timeout in place, a stale read would instead make the method return {{null}}, as if the call had timed out.

In previous versions of Hadoop, there was no timeout at this level, so this would cause an endless loop. This is a really tough error to track down if it happens.
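
For illustration, here is a minimal sketch of a timed wait on a completion flag. It uses a hypothetical {{DemoCall}} class, not Hadoop's {{Client.Call}}, and every name in it is made up for the example. The flag is marked {{volatile}} so that readers outside the lock, such as {{isDone()}}, also see the update. Under the Java Memory Model, reads and writes made while holding the same monitor are already ordered, so {{volatile}} matters mainly for accesses that do not hold the lock. The excerpt above does not show whether every writer of {{done}} holds that lock.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a pending RPC call; NOT Hadoop's Client.Call. */
class DemoCall {
  // volatile: a write by the completing thread is visible to any later read,
  // even one made outside a synchronized block (see isDone()).
  private volatile boolean done;
  private String response; // only accessed while holding the monitor

  /** Called by the (hypothetical) receiver thread when the reply arrives. */
  synchronized void complete(String value) {
    response = value;
    done = true;
    notifyAll(); // wake up any thread blocked in await()
  }

  /** Lock-free status check; relies on the volatile read. */
  boolean isDone() {
    return done;
  }

  /** Returns the response, or null if timeoutMillis elapses first. */
  synchronized String await(long timeoutMillis) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!done) { // loop guards against spurious wakeups
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return null; // timed out
      }
      // Never pass 0 to wait(): wait(0) means "wait with no timeout".
      wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
    }
    return response;
  }
}

public class DoneFlagDemo {
  public static void main(String[] args) throws Exception {
    DemoCall call = new DemoCall();
    Thread receiver = new Thread(() -> call.complete("pong"));
    receiver.start();
    System.out.println(call.await(5_000)); // response, or null on timeout
    receiver.join();
  }
}
```

Unlike the excerpt, the sketch recomputes the remaining time on each pass, so a spurious wakeup neither returns {{null}} early nor restarts the full timeout.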