[ 
https://issues.apache.org/jira/browse/IGNITE-24523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin reassigned IGNITE-24523:
--------------------------------------------

    Assignee: Vyacheslav Koptilin

> Better raft logging in case of node configuration changes
> ---------------------------------------------------------
>
>                 Key: IGNITE-24523
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24523
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>              Labels: ignite-3
>
> h3. Motivation
> We have two messages that are printed at the change peer op start:
> {noformat}
> [INFO ][%irt_aconlnj_1%JRaft-Request-Processor-22][BaseCliRequestProcessor] 
> Receive ChangePeersAndLearnersAsyncRequest with term 1 to 
> <17_part_0/irt_aconlnj_1> from 192.168.1.22:3345, new conf is 
> irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_2
> [INFO ][%irt_aconlnj_1%JRaft-Request-Processor-22][NodeImpl] Node 
> <17_part_0/irt_aconlnj_1> change configuration from 
> irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_3 to 
> irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_2.
> {noformat}
> Then we are printing a message about the configuration that was applied in 
> each node where the partition holds (unfortunately without RAFT group ID):
> {noformat}
> [INFO 
> ][%irt_aconlnj_3%JRaft-FSMCaller-Disruptor_stripe_1-0][StateMachineAdapter] 
> onConfigurationCommitted: irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_2.
> [INFO 
> ][%irt_aconlnj_2%JRaft-FSMCaller-Disruptor_stripe_1-0][StateMachineAdapter] 
> onConfigurationCommitted: irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_3.
> [INFO 
> ][%irt_aconlnj_2%JRaft-FSMCaller-Disruptor_stripe_1-0][StateMachineAdapter] 
> onConfigurationCommitted: irt_aconlnj_0,irt_aconlnj_1,irt_aconlnj_2.
> {noformat}
> And finally we are printing a message about the rebalance that was finished 
> in the same node where it had been started:
> {noformat}
> [INFO 
> ][%irt_aconlnj_1%rebalance-scheduler-0][RebalanceRaftGroupEventsListener] 
> Rebalance finished [tablePartitionId=17_part_0, appliedPeers=[Assignment 
> [consistentId=irt_aconlnj_2, isPeer=true], Assignment 
> [consistentId=irt_aconlnj_0, isPeer=true], Assignment 
> [consistentId=irt_aconlnj_1, isPeer=true]]]
> {noformat}
> And again, print the message about the assignment that was changed and saved 
> on each cluster node:
> {noformat}
> [INFO ][%irt_aconlnj_0%tableManager-io-13][TableManager] Received update on 
> stable assignments [key=assignments.stable.17_part_0, partition=17_part_0, 
> localMemberAddress=192.168.1.22:3344, stableAssignments=[Assignment 
> [consistentId=irt_aconlnj_2, isPeer=true], Assignment 
> [consistentId=irt_aconlnj_0, isPeer=true], Assignment 
> [consistentId=irt_aconlnj_1, isPeer=true]], pendingAssignments=Assignments 
> [nodes=HashSet [], force=false, timestamp=0, fromReset=false], revision=154]
> [INFO ][%irt_aconlnj_2%tableManager-io-16][TableManager] Received update on 
> stable assignments [key=assignments.stable.17_part_0, partition=17_part_0, 
> localMemberAddress=192.168.1.22:3346, stableAssignments=[Assignment 
> [consistentId=irt_aconlnj_2, isPeer=true], Assignment 
> [consistentId=irt_aconlnj_0, isPeer=true], Assignment 
> [consistentId=irt_aconlnj_1, isPeer=true]], pendingAssignments=Assignments 
> [nodes=HashSet [], force=false, timestamp=0, fromReset=false], revision=154]
> [INFO ][%irt_aconlnj_3%tableManager-io-15][TableManager] Received update on 
> stable assignments [key=assignments.stable.17_part_0, partition=17_part_0, 
> localMemberAddress=192.168.1.22:3347, stableAssignments=[Assignment 
> [consistentId=irt_aconlnj_2, isPeer=true], Assignment 
> [consistentId=irt_aconlnj_0, isPeer=true], Assignment 
> [consistentId=irt_aconlnj_1, isPeer=true]], pendingAssignments=Assignments 
> [nodes=HashSet [], force=false, timestamp=0, fromReset=false], revision=154]
> [INFO ][%irt_aconlnj_1%tableManager-io-13][TableManager] Received update on 
> stable assignments [key=assignments.stable.17_part_0, partition=17_part_0, 
> localMemberAddress=192.168.1.22:3345, stableAssignments=[Assignment 
> [consistentId=irt_aconlnj_2, isPeer=true], Assignment 
> [consistentId=irt_aconlnj_0, isPeer=true], Assignment 
> [consistentId=irt_aconlnj_1, isPeer=true]], pendingAssignments=Assignments 
> [nodes=HashSet [], force=false, timestamp=0, fromReset=false], revision=154]
> {noformat}
> In some cases we can print a message that confuses:
> {noformat}
> Node <51_part_3/idtt_n_3344> change configuration from idtt_n_3344 to 
> idtt_n_3344
> {noformat}
> A set of peers in the old assignment coincides with the new one.
> h3. Definition of done
> * I believe we can change a log level to debug for all the messages that are 
> printed on each node.
> * Chose the one message at the rebalance start (the other one move to debug).
> * Research a case where the sets of peers match and change the log message in 
> order for the reason for the rebalance to become clear.
> * Use our common log format if it is violated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to