Vladimir Ozerov created IGNITE-2876:
---------------------------------------

             Summary: IGFS: System pool starvation is possible during data block write.
                 Key: IGNITE-2876
                 URL: https://issues.apache.org/jira/browse/IGNITE-2876
             Project: Ignite
          Issue Type: Bug
          Components: IGFS
    Affects Versions: 1.5.0.final
            Reporter: Vladimir Ozerov
            Assignee: Ivan Veselovsky
            Priority: Critical
             Fix For: 1.6


*Problem*
IGFS has a set of messages to exchange data and signal events between nodes. 
These are:
- {{IgfsAckMessage}}
- {{IgfsBlocksMessage}}
- {{IgfsDeleteMessage}}
- {{IgfsFragmentizerRequest}}
- {{IgfsFragmentizerResponse}}

Currently these messages are processed in the system pool, which is wrong and 
may lead to starvation, deadlocks, and incorrect behavior.

Several examples:
1) The {{IgfsBlocksMessage}} handling logic performs a "Cache.putAsync" 
operation. This operation involves acquiring a semaphore permit. The semaphore, 
in turn, can only be released from another thread in the same system pool. As a 
result, all system pool threads could hang on permit acquisition forever.
2) If the file system size is exceeded, the same message waits for some time 
in the hope that free space will appear in the cache. However, if all system 
pool threads are waiting at this point, concurrent block removal cannot proceed, 
so these threads are doomed to receive {{IgfsOutOfSpaceException}} whether they 
wait or not.
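
The first example can be reproduced in isolation with plain java.util.concurrent 
classes. The sketch below is not Ignite code: the class name, the pool size, and 
the timeout are assumptions made for illustration, and the semaphore only stands 
in for the permit taken by "Cache.putAsync". Each handler occupies a pool thread 
and then waits for a permit that can only be released by a task queued to the 
system pool. When handlers run in the system pool itself, the releasing tasks 
never get a thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-alone model of the starvation scenario (not Ignite code). */
final class PoolStarvationSketch {
    /** Returns true if every handler obtained its permit before the timeout. */
    static boolean handlersFinish(boolean dedicatedHandlerPool) {
        final int poolSize = 2; // assumed size, small to keep the demo fast
        final ExecutorService sysPool = Executors.newFixedThreadPool(poolSize);
        final ExecutorService handlerPool =
            dedicatedHandlerPool ? Executors.newFixedThreadPool(poolSize) : sysPool;

        final Semaphore permits = new Semaphore(0);
        final CountDownLatch allStarted = new CountDownLatch(poolSize);
        final CountDownLatch allDone = new CountDownLatch(poolSize);

        Callable<Void> handler = () -> {
            // Wait until every handler holds a pool thread, so the outcome is deterministic.
            allStarted.countDown();
            allStarted.await();

            // The permit can only be released by a task running in the system pool.
            sysPool.execute(() -> permits.release());

            permits.acquire(); // blocks forever if no system pool thread is free
            allDone.countDown();
            return null;
        };

        try {
            for (int i = 0; i < poolSize; i++)
                handlerPool.submit(handler);

            return allDone.await(1, TimeUnit.SECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        finally {
            // Interrupt any handler still stuck on acquire() so the JVM can exit.
            handlerPool.shutdownNow();
            sysPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared system pool: finished = " + handlersFinish(false));
        System.out.println("dedicated pool:     finished = " + handlersFinish(true));
    }
}
```

With a shared pool the handlers time out. With a separate handler pool the 
release tasks find free system pool threads and every handler completes.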

*Solution*
1) Introduce a new IO policy for IGFS (see {{GridIoPolicy}}). 
2) Force all IGFS messages to be processed with this policy. No backward 
compatibility is needed.
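
One possible shape of this solution is sketched below. It is only an 
illustration under stated assumptions: the {{Policy}} enum, the 
{{PolicyDispatchSketch}} class, and the string-based message type lookup are 
hypothetical and do not reflect the actual {{GridIoPolicy}} definition or the 
Ignite IO manager. Only the five message names come from this ticket. The point 
is that each policy owns its own thread pool, so IGFS handlers can block without 
consuming system pool threads.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical dispatcher that maps a message type to a policy-specific pool. */
final class PolicyDispatchSketch {
    /** Assumed policy set; the real GridIoPolicy may be defined differently. */
    enum Policy { SYSTEM, IGFS }

    /** IGFS message types listed in this ticket. */
    private static final Set<String> IGFS_MESSAGES = new HashSet<>(Arrays.asList(
        "IgfsAckMessage", "IgfsBlocksMessage", "IgfsDeleteMessage",
        "IgfsFragmentizerRequest", "IgfsFragmentizerResponse"));

    private final Map<Policy, ExecutorService> pools = new EnumMap<>(Policy.class);

    PolicyDispatchSketch(int threadsPerPool) {
        for (Policy p : Policy.values()) {
            // Name threads after their policy so the routing is observable.
            pools.put(p, Executors.newFixedThreadPool(threadsPerPool, r -> {
                Thread t = new Thread(r, p.name() + "-pool");
                t.setDaemon(true);
                return t;
            }));
        }
    }

    /** Every IGFS message is forced onto the IGFS policy. */
    static Policy policyFor(String msgType) {
        return IGFS_MESSAGES.contains(msgType) ? Policy.IGFS : Policy.SYSTEM;
    }

    /** Runs a trivial handler for the message and returns the name of the thread that ran it. */
    String handledBy(String msgType) {
        Callable<String> handler = () -> Thread.currentThread().getName();
        try {
            return pools.get(policyFor(msgType)).submit(handler).get();
        }
        catch (Exception e) {
            throw new IllegalStateException("Handler failed for " + msgType, e);
        }
    }

    void shutdown() {
        for (ExecutorService pool : pools.values())
            pool.shutdownNow();
    }

    public static void main(String[] args) {
        PolicyDispatchSketch dispatcher = new PolicyDispatchSketch(2);
        try {
            System.out.println(dispatcher.handledBy("IgfsBlocksMessage"));
            System.out.println(dispatcher.handledBy("SomeOtherMessage"));
        }
        finally {
            dispatcher.shutdown();
        }
    }
}
```

Here "SomeOtherMessage" is a placeholder for any non-IGFS message type. In this 
sketch it falls back to the system pool, while the five IGFS messages always run 
on the dedicated pool.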



