Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21520 )
Change subject: IMPALA-13159: Fix query cancellation caused by statestore failover ...................................................................... Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/21520/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21520/1//COMMIT_MSG@12 PS1, Line 12: This patch adds code to handle inconsistent : cluster membership state after statestore failover in the same way. Please clarify what is the expected behavior. When is query should cancel and not cancel? Is both failover scenario and regular StateStore restart scenario should not cancel RUNNING query? http://gerrit.cloudera.org:8080/#/c/21520/1/be/src/statestore/statestore-subscriber.cc File be/src/statestore/statestore-subscriber.cc: http://gerrit.cloudera.org:8080/#/c/21520/1/be/src/statestore/statestore-subscriber.cc@1095 PS1, Line 1095: bool has_failed_before = connection_failure_metric_->GetValue() > 0; : bool in_failed_grace_period = MilliSecondsSinceLastRegistration() : < FLAGS_statestore_subscriber_recovery_grace_period_ms; I think 'has_disconnect_before' and 'in_disconnect_grace_period' is a better name to distinguish from failover scenario. My understanding is that, disconnect only means temporary network issue to StateStore, and is not changing from active StateStore to standby StateStore. http://gerrit.cloudera.org:8080/#/c/21520/1/tests/custom_cluster/test_statestored_ha.py File tests/custom_cluster/test_statestored_ha.py: http://gerrit.cloudera.org:8080/#/c/21520/1/tests/custom_cluster/test_statestored_ha.py@677 PS1, Line 677: # Wait for long enough for the standby statestored to detect the failure of active : # statestored and assign itself in active role. Is this 2s (statestore_peer_timeout_seconds)? http://gerrit.cloudera.org:8080/#/c/21520/1/tests/custom_cluster/test_statestored_ha.py@688 PS1, Line 688: # Now kill a backend, and make sure the query fails. Why not let query finish? I thought this test want to show that query completes and not impacted by StateStore failover. -- To view, visit http://gerrit.cloudera.org:8080/21520 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I720bec5199df46475b954558abb0637ca7e6298b Gerrit-Change-Number: 21520 Gerrit-PatchSet: 1 Gerrit-Owner: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Fri, 14 Jun 2024 18:12:59 +0000 Gerrit-HasComments: Yes
