fapifta opened a new pull request #81: HDDS-2347 XCeiverClientGrpc's parallel use leads to NPE
URL: https://github.com/apache/hadoop-ozone/pull/81

## What changes were proposed in this pull request?

We found this issue during Hive TPC-DS tests. The root of the problem is that Hive starts an arbitrary number of threads that work on the same file and read it concurrently. In this case the same XCeiverClientGrpc instance is called from several threads, and in certain scenarios the current client is not synchronized properly.

This PR adds the necessary synchronization around the internal closed boolean state, and around the channels and asyncstubs structures.

A fundamental change in behaviour is that XCeiverClientManager now serves XCeiverClientGrpc instances only after they have connected to the first DN, and this initial connection is done in a synchronized fashion. Afterwards, a reconnect (if needed) happens only after checking whether the DN is properly connected; if it is not, the client reconnects inside a synchronized block.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-2347

## How was this patch tested?

As this issue shows up intermittently, and reproducing it depends on how the JVM schedules the different threads, I have not been able to write a reliable test so far.

The patch was tested manually on a 42-node cluster, running the 100 TPC-DS queries on scale 2 and scale 3 large data sets generated with the tools in https://github.com/fapifta/hive-testbench. These tools are derived from https://github.com/hortonworks/hive-testbench, with some modifications so that Ozone and HDFS can be used as file systems in parallel.

After applying the patch on top of the current trunk on the cluster, I have not seen the NPE in 3 runs of the 99 TPC-DS queries. Before the patch, 2-5 queries per run failed with this NPE.
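The synchronization scheme described in this PR can be illustrated with a minimal, self-contained Java sketch. It does not use the real Ozone or gRPC classes: `FakeChannel`, `SketchClient`, and `SketchManager` are hypothetical stand-ins for a gRPC channel, XCeiverClientGrpc, and XCeiverClientManager. It shows the closed flag and the per-DN channel map guarded by one lock, the manager connecting a client to its first DN before handing it out, and a check-then-reconnect that runs under that same lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the locking pattern; these are not the actual Ozone classes.
public class SyncClientSketch {

  /** Hypothetical stand-in for a gRPC channel to a single datanode. */
  static final class FakeChannel {
    private volatile boolean shutdown = false;
    boolean isShutdown() { return shutdown; }
    void shutdown() { shutdown = true; }
  }

  /** Hypothetical client: `closed` and `channels` are only touched while holding `this`. */
  static final class SketchClient {
    private final Map<String, FakeChannel> channels = new HashMap<>();
    private boolean closed = false;
    final AtomicInteger connects = new AtomicInteger();

    /** Opens a channel to the DN unless a healthy one already exists. */
    synchronized void connect(String dnId) {
      if (closed) {
        throw new IllegalStateException("client is closed");
      }
      FakeChannel ch = channels.get(dnId);
      if (ch == null || ch.isShutdown()) {
        channels.put(dnId, new FakeChannel());
        connects.incrementAndGet();
      }
    }

    /** Checks the DN's channel and reconnects under the same lock, so callers never see null. */
    synchronized FakeChannel channelFor(String dnId) {
      connect(dnId); // intrinsic locks are reentrant; no-op when the channel is healthy
      return channels.get(dnId);
    }

    synchronized boolean isClosed() { return closed; }

    synchronized void close() {
      closed = true;
      for (FakeChannel ch : channels.values()) {
        ch.shutdown();
      }
      channels.clear();
    }
  }

  /** Hypothetical manager: hands out a client only after it has connected to the first DN. */
  static final class SketchManager {
    private final Map<String, SketchClient> clients = new HashMap<>();

    synchronized SketchClient acquire(String pipelineId, String firstDn) {
      SketchClient c = clients.get(pipelineId);
      if (c == null || c.isClosed()) {
        c = new SketchClient();
        c.connect(firstDn); // connected before any other thread can obtain it
        clients.put(pipelineId, c);
      }
      return c;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    SketchManager manager = new SketchManager();
    ExecutorService pool = Executors.newFixedThreadPool(16);
    AtomicInteger nullChannels = new AtomicInteger();
    // Many threads share one client, similar to Hive tasks reading the same file.
    for (int t = 0; t < 16; t++) {
      pool.execute(() -> {
        for (int i = 0; i < 1000; i++) {
          SketchClient c = manager.acquire("pipeline-1", "dn-1");
          if (c.channelFor("dn-1") == null) {
            nullChannels.incrementAndGet();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.SECONDS);
    SketchClient c = manager.acquire("pipeline-1", "dn-1");
    // prints "null channels: 0, connects: 1"
    System.out.println("null channels: " + nullChannels.get() + ", connects: " + c.connects.get());
  }
}
```

Making the whole check-and-reconnect path `synchronized` is the simplest way to keep concurrent readers from observing a missing or half-initialized channel entry; it assumes that creating a channel is cheap enough to do while holding the lock, which may not hold for the real client.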
----------------------------------------------------------------
This is an automated message from the Apache Git Service.