Alexey Goncharuk created IGNITE-5528:
----------------------------------------

             Summary: IS_EVICT_DISABLED flag is not cleared when cache store 
throws an exception
                 Key: IGNITE-5528
                 URL: https://issues.apache.org/jira/browse/IGNITE-5528
             Project: Ignite
          Issue Type: Bug
          Components: cache
    Affects Versions: 1.7
            Reporter: Alexey Goncharuk
             Fix For: 2.2


Below is an observation from a live system:
On a large cluster with occasional topology changes, there is a sporadic hang 
which manifests itself with "Failed to evict partition message" for one of the 
caches that has a cache store enabled. I managed to take a heap dump and found 
that on the hanging node there was a single entry with the IS_EVICT_DISABLED 
flag set, and no other threads were performing a store load operation. Earlier 
in the logs I saw that the cache store threw a CacheLoaderException due to an 
interrupted connection to the database.

Currently, the flag is set before the cache store load and cleared after the 
load.
It looks like if the store throws an exception, the clearing step is skipped, 
so the flag stays set (leaks) and the entry cannot be cleared from the 
partition. As a result, on the next topology change, partition exchange will 
freeze with a "Failed to wait for partition eviction" error message.
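
For illustration, here is a minimal sketch of the suspected pattern and one 
possible way to avoid the leak. It is not Ignite's actual code: the class 
SketchEntry, its methods, and the bit value used for IS_EVICT_DISABLED are all 
hypothetical stand-ins, and the store is modeled as a plain 
java.util.function.Function. It only shows that clearing the flag in a finally 
block keeps it from leaking when the load throws.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for a cache entry; NOT Ignite's real entry class. */
final class SketchEntry {
    /** Hypothetical bit value; the real constant in Ignite may differ. */
    static final int IS_EVICT_DISABLED = 0x1;

    /** Bit set holding the entry flags. */
    private final AtomicInteger flags = new AtomicInteger();

    void setFlag(int mask) {
        flags.updateAndGet(f -> f | mask);
    }

    void clearFlag(int mask) {
        flags.updateAndGet(f -> f & ~mask);
    }

    boolean hasFlag(int mask) {
        return (flags.get() & mask) != 0;
    }

    /** Suspected pattern: if the loader throws, the flag is never cleared. */
    <K, V> V loadLeaky(K key, Function<K, V> loader) {
        setFlag(IS_EVICT_DISABLED);

        V val = loader.apply(key); // An exception here skips the next line.

        clearFlag(IS_EVICT_DISABLED);

        return val;
    }

    /** Possible fix: clear the flag in finally so it is reset on every path. */
    <K, V> V loadSafe(K key, Function<K, V> loader) {
        setFlag(IS_EVICT_DISABLED);

        try {
            return loader.apply(key);
        }
        finally {
            clearFlag(IS_EVICT_DISABLED);
        }
    }
}
```

With a loader that throws, loadLeaky leaves the flag set on the entry, which 
matches what the heap dump showed, while loadSafe propagates the same exception 
but leaves the entry evictable.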

Attached is a test reproducing this issue (note that the message appears 
after one minute).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
