Alexandr Kuramshin created IGNITE-7134:
------------------------------------------

             Summary: Never-ending timeout in IgniteSpiOperationTimeoutHelper.nextTimeoutChunk()
                 Key: IGNITE-7134
                 URL: https://issues.apache.org/jira/browse/IGNITE-7134
             Project: Ignite
          Issue Type: Bug
          Components: general
    Affects Versions: 2.3
            Reporter: Alexandr Kuramshin
            Priority: Critical
             Fix For: 2.4


{noformat}
org.apache.ignite.spi.IgniteSpiOperationTimeoutHelper#nextTimeoutChunk

long curTs = U.currentTimeMillis();

timeout = timeout - (curTs - lastOperStartTs);
{noformat}

The timeout is not decreased at all if the delay between successive calls to
nextTimeoutChunk() is shorter than the granularity of U.currentTimeMillis().
This is easy to trigger when an operation fails right after nextTimeoutChunk()
returns and is immediately retried.

Only the rare pairs of calls that straddle an update of U.currentTimeMillis()
(the first call just before the value changes, the second just after) decrease
the timeout, so the actual IgniteSpiOperationTimeoutHelper timeout can be much
longer than failureDetectionTimeout.
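The per-call effect can be illustrated with a small deterministic model. This is a sketch under stated assumptions, not Ignite code: the coarse clock is modelled by rounding a simulated time down to a hypothetical 10 ms tick (RESOLUTION_MS), retries are assumed to arrive 1 ms apart, and lastOperStartTs is assumed to be set to curTs after each call (the quoted snippet does not show that assignment). The names CoarseClockDeltas, coarseMillis and perCallDeltas are hypothetical. The model prints the amount each call would subtract from the timeout.

```java
import java.util.ArrayList;
import java.util.List;

class CoarseClockDeltas {
    /** Hypothetical clock resolution, standing in for U.currentTimeMillis() discretization. */
    static final long RESOLUTION_MS = 10;

    /** Models a cached clock: the true time rounded down to the last tick. */
    static long coarseMillis(long trueMillis) {
        return (trueMillis / RESOLUTION_MS) * RESOLUTION_MS;
    }

    /** Returns the delta each successive call would subtract when calls are stepMs apart. */
    static List<Long> perCallDeltas(long startMillis, long stepMs, int calls) {
        List<Long> deltas = new ArrayList<>();
        long lastOperStartTs = coarseMillis(startMillis);

        for (int i = 1; i <= calls; i++) {
            long curTs = coarseMillis(startMillis + i * stepMs);
            deltas.add(curTs - lastOperStartTs);  // amount subtracted from the timeout on this call
            lastOperStartTs = curTs;              // assumed update, not shown in the quoted snippet
        }

        return deltas;
    }

    public static void main(String[] args) {
        // Retries 1 ms apart, starting at a simulated time of 3 ms.
        System.out.println(perCallDeltas(3, 1, 12));
    }
}
```

Under these assumptions the output is `[0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0]`: only the one call that straddles a clock tick subtracts anything from the timeout.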

I propose not splitting failureDetectionTimeout between network operations.
Instead, record the first operation's timestamp on the first call to
nextTimeoutChunk(), and compute the remaining timeout as failureDetectionTimeout
minus the time elapsed between that first timestamp and the current timestamp.
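A minimal sketch of this proposal follows. The class FirstCallTimeoutHelper, its constructor and the injected LongSupplier clock are hypothetical and are not Ignite's API; IllegalStateException stands in for IgniteSpiOperationTimeoutException. The remaining time is always derived from the first call's timestamp, so nothing has to be accumulated across calls.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper sketching the proposal: measure elapsed time from the first call. */
class FirstCallTimeoutHelper {
    private final long failureDetectionTimeout;
    private final LongSupplier clock;  // injected so the sketch can be tested deterministically
    private boolean started;
    private long firstOperStartTs;     // timestamp of the first nextTimeoutChunk() call

    FirstCallTimeoutHelper(long failureDetectionTimeout, LongSupplier clock) {
        this.failureDetectionTimeout = failureDetectionTimeout;
        this.clock = clock;
    }

    /** Returns the time left for the next operation, or throws once the budget is spent. */
    long nextTimeoutChunk() {
        long curTs = clock.getAsLong();

        if (!started) {
            started = true;
            firstOperStartTs = curTs;  // remember when the first operation started
        }

        long remaining = failureDetectionTimeout - (curTs - firstOperStartTs);

        if (remaining <= 0)  // stand-in for throwing IgniteSpiOperationTimeoutException
            throw new IllegalStateException("failureDetectionTimeout exceeded");

        return remaining;
    }

    public static void main(String[] args) {
        long[] now = {1_000};  // simulated clock value in milliseconds
        FirstCallTimeoutHelper helper = new FirstCallTimeoutHelper(100, () -> now[0]);

        System.out.println(helper.nextTimeoutChunk());  // full budget on the first call
        now[0] += 30;
        System.out.println(helper.nextTimeoutChunk());  // 30 ms of the budget used
        now[0] += 70;
        try {
            helper.nextTimeoutChunk();
        } catch (IllegalStateException e) {
            System.out.println("timed out");
        }
    }
}
```

Running main prints 100, then 70, then "timed out". Because the remaining time is recomputed from one fixed start point, a coarse clock can make the result off by less than one clock tick, and that error does not grow with the number of retries.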



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
