[ https://issues.apache.org/jira/browse/IGNITE-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730596#comment-17730596 ]
Roman Puchkovskiy commented on IGNITE-19655:
--------------------------------------------

The first weirdness was fixed in IGNITE-19685. To make the removal of an outdated node less confusing, I've changed the logging a bit so that this case stands out and differs from the normal 'node left' case.

> Distributed Sql keeps mapping query fragments to a node that has already left
> -----------------------------------------------------------------------------
>
>                 Key: IGNITE-19655
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19655
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Assignee: Maksim Zhuravkov
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>
> There are two test failures:
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7271211?expandCode+Inspection=true&expandBuildProblemsSection=true&hideProblemsFromDependencies=false&expandBuildTestsSection=true&hideTestsFromDependencies=false]
> and
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7272905?hideProblemsFromDependencies=false&hideTestsFromDependencies=false&expandCode+Inspection=true&expandBuildProblemsSection=true&expandBuildChangesSection=true&expandBuildTestsSection=true]
>
> (org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.entriesKeepAppendedAfterSnapshotInstallation
> and
> org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.snapshotInstallTimeoutDoesNotBreakSubsequentInstallsWhenSecondAttemptIsIdenticalToFirst,
> respectively).
> In both cases, the test code creates a table with 3 replicas on a cluster of
> 3 nodes, then stops the last node and tries to make an insert using one of
> the 2 remaining nodes. The RAFT majority (2 of 3) is still preserved, so the
> insert should succeed. It's understood that the insert might be issued before
> the remaining nodes learn that the third node has left, so we have a
> retry mechanism in place; it makes up to 5 attempts over almost 8 seconds (in
> total).
> But in both failed runs, each of the 5 attempts failed because a fragment of
> the INSERT query was mapped to the missing node. This seems to be bad luck
> (the tests pass most of the time; the failure rate is about 2.5%), but anyway:
> the SQL engine does not seem to care about the fact that the node has already
> left.
> Probably, the SQL engine should track the Logical Topology events and avoid
> mapping query fragments to the missing nodes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)