Pavel, I don't have the logs for the client node. It has happened twice in our 
cluster in the last 45 days, and it is difficult to reproduce.
But the logs show a null pointer exception on the server nodes: first one 
server node (192.168.1.6) went down, and then the other.

In 12255, it is noted that an assertion error could be seen on the coordinator, 
but what we see is a null pointer exception.
I agree that the race condition described in 12255 seems similar to the logs I 
attached, but it does not explain the null pointer exception.


The race is the following:

A client node (with some configured caches) joins the cluster, sending a 
SingleMessage to the coordinator during the client PME. This SingleMessage 
contains affinity fetch requests for all cluster caches. While the 
SingleMessage is in flight, the server nodes finish the client PME and also 
process and finish a cache destroy PME. When a cache is destroyed, the affinity 
for that cache is cleared. When the SingleMessage is delivered to the 
coordinator, the coordinator doesn’t have affinity for a requested cache 
because the cache is already destroyed. This leads to an assertion error on 
the coordinator and unpredictable behavior on the client node.
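
To make the ordering concrete, here is a minimal single-threaded sketch of the 
sequence described above. It is only a toy model: the class, the method names, 
the cache name "cacheA" and the node name "server-1" are all hypothetical and 
are not Ignite APIs. It just replays the events in the problematic order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the race; every name here is hypothetical, not an Ignite API.
class AffinityRaceSketch {
    // Coordinator-side state: cache name -> nodes owning its partitions.
    private final Map<String, List<String>> affinity = new HashMap<>();

    void startCache(String cache, List<String> owners) {
        affinity.put(cache, owners);
    }

    // Models the cache destroy PME: affinity for the cache is cleared.
    void destroyCache(String cache) {
        affinity.remove(cache);
    }

    // Models the coordinator handling the affinity fetch request that
    // arrived in the late SingleMessage.
    String onSingleMessage(String cache) {
        List<String> owners = affinity.get(cache);
        if (owners == null)
            return "no affinity for " + cache;
        return "affinity for " + cache + ": " + owners;
    }

    // Replays the events in the problematic order.
    static String simulate() {
        AffinityRaceSketch coordinator = new AffinityRaceSketch();
        coordinator.startCache("cacheA", List.of("server-1"));

        // 1. Client builds a SingleMessage requesting affinity (in flight).
        String requestedCache = "cacheA";

        // 2. Servers finish the cache destroy PME before it is delivered.
        coordinator.destroyCache("cacheA");

        // 3. The SingleMessage finally reaches the coordinator.
        return coordinator.onSingleMessage(requestedCache);
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints "no affinity for cacheA"
    }
}
```

In this sketch the lookup result is checked for null explicitly. If a handler 
instead guarded the lookup only with a Java `assert` and then dereferenced the 
result, the same ordering would give an AssertionError with `-ea` and a 
NullPointerException without it, since assertions are disabled by default. 
Whether that is what happens in Ignite in this case is only a guess on my part.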

