Roman Puchkovskiy created IGNITE-19655:
------------------------------------------

             Summary: Distributed Sql keeps mapping query fragments to a node that has already left
                 Key: IGNITE-19655
                 URL: https://issues.apache.org/jira/browse/IGNITE-19655
             Project: Ignite
          Issue Type: Bug
            Reporter: Roman Puchkovskiy
            Assignee: Maksim Zhuravkov
             Fix For: 3.0.0-beta2


There are two test failures. The first is 
org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.entriesKeepAppendedAfterSnapshotInstallation:
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7271211?expandCode+Inspection=true&expandBuildProblemsSection=true&hideProblemsFromDependencies=false&expandBuildTestsSection=true&hideTestsFromDependencies=false]

The second is 
org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.snapshotInstallTimeoutDoesNotBreakSubsequentInstallsWhenSecondAttemptIsIdenticalToFirst:
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7272905?hideProblemsFromDependencies=false&hideTestsFromDependencies=false&expandCode+Inspection=true&expandBuildProblemsSection=true&expandBuildChangesSection=true&expandBuildTestsSection=true]

In both cases, the test code creates a table with 3 replicas on a cluster of 3 
nodes, then stops the last node and tries to make an insert using one of the 
2 remaining nodes. The RAFT majority (2 of 3) is still preserved, so the insert 
should succeed. The insert might be issued before the remaining nodes learn 
that the third node has left, so the test has a retry mechanism in place: it 
makes up to 5 attempts over almost 8 seconds in total.
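
The retry behaviour described above can be sketched roughly as follows. This 
is an illustration only, not the actual test helper: the class RetrySketch, 
the method withRetry and the fixed delay between attempts are assumptions. 
Only the cap on the number of attempts (5) comes from the description.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch; not the helper used by ItTableRaftSnapshotsTest.
final class RetrySketch {
    /**
     * Runs the action up to maxAttempts times, sleeping delayMillis
     * after each failed attempt except the last one.
     * Rethrows the last failure if every attempt fails.
     */
    static <T> T withRetry(int maxAttempts, long delayMillis, Callable<T> action)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }

        Exception last = null;

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;

                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }

        throw last;
    }
}
```

A retry like this only helps if a later attempt can be planned differently 
from the earlier ones. If every attempt maps a fragment to the same departed 
node, all 5 attempts fail in the same way.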

But in both failed runs, each of the 5 attempts failed because a fragment of 
the INSERT query was mapped to the missing node. This looks like bad luck 
(the tests pass most of the time; the fail rate is about 2.5%), but the point 
stands: the SQL engine does not seem to take into account that the node has 
already left.

Probably, the SQL engine should track the Logical Topology events and avoid 
mapping query fragments to the missing nodes.
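
One possible shape of such tracking is sketched below. This is a minimal 
illustration under stated assumptions, not the Ignite API and not a proposed 
patch: the class LiveNodeTracker, its callback methods and the use of plain 
node names are all hypothetical. It assumes the engine can receive "node 
joined" and "node left" notifications from the Logical Topology and consult 
the resulting set when choosing nodes for fragments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names are Ignite APIs.
final class LiveNodeTracker {
    // Names of nodes currently believed to be in the logical topology.
    private final Set<String> liveNodes = ConcurrentHashMap.newKeySet();

    /** Assumed callback for a "node joined" logical topology event. */
    void onNodeJoined(String nodeName) {
        liveNodes.add(nodeName);
    }

    /** Assumed callback for a "node left" logical topology event. */
    void onNodeLeft(String nodeName) {
        liveNodes.remove(nodeName);
    }

    /** Returns the candidates that are still live, preserving their order. */
    List<String> filterLive(List<String> candidates) {
        List<String> result = new ArrayList<>();

        for (String node : candidates) {
            if (liveNodes.contains(node)) {
                result.add(node);
            }
        }

        return result;
    }
}
```

With three nodes joined and the third one reported as left, filterLive 
applied to all three candidates keeps only the first two, so no fragment 
would be mapped to the departed node.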



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
