Hey Andras,
I had a quick look at HADOOP-18143. The methods in question in
ProtobufRpcEngine2 are identical to the ones in ProtobufRpcEngine, so I am
not sure why the same race condition doesn't also occur in ProtobufRpcEngine.
I need to debug this and spend some more time on it, so I have reverted
HADOOP-18082 for now to unblock YARN. As you said, the underlying issue will
still be there, but the revert gives us some time to analyse it.

Thanks
-Ayush

On Mon, 28 Feb 2022 at 21:26, Gyori Andras <gand...@cloudera.com.invalid>
wrote:

> Hey everyone!
>
> We have been seeing test failures in YARN PRs for a while. We have
> identified the problematic commit, which is HADOOP-18082
> <https://issues.apache.org/jira/browse/HADOOP-18082>. However, this change
> only revealed a race condition in ProtobufRpcEngine2 that was introduced in
> HADOOP-17046 <https://issues.apache.org/jira/browse/HADOOP-17046>. We have
> also fixed the underlying issue with a locking mechanism, proposed in
> HADOOP-18143 <https://issues.apache.org/jira/browse/HADOOP-18143>, but
> since this is outside our area of expertise, we can neither verify nor
> guarantee that it will not cause subtle issues in the RPC system.
> As this is a core part of Hadoop, we would appreciate feedback from someone
> who is proficient in this area.
>
> Regards,
> Andras
>
