[ https://issues.apache.org/jira/browse/IGNITE-24523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vyacheslav Koptilin reassigned IGNITE-24523:
--------------------------------------------

    Assignee: Vyacheslav Koptilin

> Better raft logging in case of node configuration changes
> ---------------------------------------------------------
>
>                 Key: IGNITE-24523
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24523
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>              Labels: ignite-3
>
> h3. Motivation
> Two messages are printed when a change-peers operation starts:
> {noformat}
> [INFO ][%irt_aconlnj_1%JRaft-Request-Processor-22][BaseCliRequestProcessor] Receive ChangePeersAndLearnersAsyncRequest with term 1 to <17_part_0/irt_aconlnj_1> from 192.168.1.22:3345, new conf is irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_2
> [INFO ][%irt_aconlnj_1%JRaft-Request-Processor-22][NodeImpl] Node <17_part_0/irt_aconlnj_1> change configuration from irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_3 to irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_2.
> {noformat}
> Then a message about the applied configuration is printed on each node that hosts the partition (unfortunately without the RAFT group ID):
> {noformat}
> [INFO ][%irt_aconlnj_3%JRaft-FSMCaller-Disruptor_stripe_1-0][StateMachineAdapter] onConfigurationCommitted: irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_2.
> [INFO ][%irt_aconlnj_2%JRaft-FSMCaller-Disruptor_stripe_1-0][StateMachineAdapter] onConfigurationCommitted: irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_3.
> [INFO ][%irt_aconlnj_2%JRaft-FSMCaller-Disruptor_stripe_1-0][StateMachineAdapter] onConfigurationCommitted: irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_2.
> {noformat}
> Finally, a message about the finished rebalance is printed on the same node where it was started:
> {noformat}
> [INFO ][%irt_aconlnj_1%rebalance-scheduler-0][RebalanceRaftGroupEventsListener] Rebalance finished [tablePartitionId=17_part_0, appliedPeers=[Assignment [consistentId=irt_aconlnj_2, isPeer=true], Assignment [consistentId=irt_aconlnj_0, isPeer=true], Assignment [consistentId=irt_aconlnj_1, isPeer=true]]]
> {noformat}
> After that, a message about the changed and saved assignment is printed on each cluster node:
> {noformat}
> [INFO ][%irt_aconlnj_0%tableManager-io-13][TableManager] Received update on stable assignments [key=assignments.stable.17_part_0, partition=17_part_0, localMemberAddress=192.168.1.22:3344, stableAssignments=[Assignment [consistentId=irt_aconlnj_2, isPeer=true], Assignment [consistentId=irt_aconlnj_0, isPeer=true], Assignment [consistentId=irt_aconlnj_1, isPeer=true]], pendingAssignments=Assignments [nodes=HashSet [], force=false, timestamp=0, fromReset=false], revision=154]
> [INFO ][%irt_aconlnj_2%tableManager-io-16][TableManager] Received update on stable assignments [key=assignments.stable.17_part_0, partition=17_part_0, localMemberAddress=192.168.1.22:3346, stableAssignments=[Assignment [consistentId=irt_aconlnj_2, isPeer=true], Assignment [consistentId=irt_aconlnj_0, isPeer=true], Assignment [consistentId=irt_aconlnj_1, isPeer=true]], pendingAssignments=Assignments [nodes=HashSet [], force=false, timestamp=0, fromReset=false], revision=154]
> [INFO ][%irt_aconlnj_3%tableManager-io-15][TableManager] Received update on stable assignments [key=assignments.stable.17_part_0, partition=17_part_0, localMemberAddress=192.168.1.22:3347, stableAssignments=[Assignment [consistentId=irt_aconlnj_2, isPeer=true], Assignment [consistentId=irt_aconlnj_0, isPeer=true], Assignment [consistentId=irt_aconlnj_1, isPeer=true]], pendingAssignments=Assignments [nodes=HashSet [], force=false, timestamp=0, fromReset=false], revision=154]
> [INFO ][%irt_aconlnj_1%tableManager-io-13][TableManager] Received update on stable assignments [key=assignments.stable.17_part_0, partition=17_part_0, localMemberAddress=192.168.1.22:3345, stableAssignments=[Assignment [consistentId=irt_aconlnj_2, isPeer=true], Assignment [consistentId=irt_aconlnj_0, isPeer=true], Assignment [consistentId=irt_aconlnj_1, isPeer=true]], pendingAssignments=Assignments [nodes=HashSet [], force=false, timestamp=0, fromReset=false], revision=154]
> {noformat}
> In some cases a confusing message is printed:
> {noformat}
> Node <51_part_3/idtt_n_3344> change configuration from idtt_n_3344 to idtt_n_3344
> {noformat}
> Here the set of peers in the old assignment coincides with the new one.
> h3. Definition of done
> * I believe we can change the log level to debug for all the messages that are printed on each node.
> * Choose one message at the rebalance start (move the other one to debug).
> * Investigate the case where the sets of peers match and change the log message so that the reason for the rebalance becomes clear.
> * Use our common log format wherever it is violated.
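
For discussion, the third item of the definition of done could look roughly like the sketch below. It assumes a hypothetical helper class, ConfChangeMessage, that is not part of Ignite or JRaft; the class name, the method, and the message wording are all illustrative. It only shows the idea of telling apart a real peer change from a request whose old and new peer sets coincide, and of rendering both in the "Message [key=value, ...]" style already used by the "Rebalance finished" line above.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not an Ignite or JRaft class. */
final class ConfChangeMessage {
    private ConfChangeMessage() {
    }

    /**
     * Builds the text of a single "configuration change" log message.
     * Peers are sorted so that the output does not depend on input order.
     */
    static String describe(String groupId, String nodeId, List<String> oldPeers, List<String> newPeers) {
        Set<String> oldSet = new TreeSet<>(oldPeers);
        Set<String> newSet = new TreeSet<>(newPeers);

        if (oldSet.equals(newSet)) {
            // Same members on both sides: say so explicitly instead of printing "from X to X".
            return String.format(
                    "Configuration change requested with an unchanged peer set [groupId=%s, node=%s, peers=%s]",
                    groupId, nodeId, newSet);
        }

        return String.format(
                "Configuration change started [groupId=%s, node=%s, oldPeers=%s, newPeers=%s]",
                groupId, nodeId, oldSet, newSet);
    }

    public static void main(String[] args) {
        // The confusing case from the ticket: old and new peers coincide.
        System.out.println(describe("51_part_3", "idtt_n_3344",
                List.of("idtt_n_3344"), List.of("idtt_n_3344")));

        // A regular change, using the peers from the first log excerpt.
        System.out.println(describe("17_part_0", "irt_aconlnj_1",
                List.of("irt_aconlnj_0", "irt_aconlnj_1", "irt_aconlnj_3"),
                List.of("irt_aconlnj_0", "irt_aconlnj_1", "irt_aconlnj_2")));
    }
}
```

The sketch compares consistent IDs only. If the rebalance in the matching-peers case is triggered by something else (for example a learner change or a forced reset), that reason would need to be passed in and logged as an extra key; this is left open here because the ticket itself lists it as a research item.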