weizhouapache opened a new pull request, #9885:
URL: https://github.com/apache/cloudstack/pull/9885

   ### Description
   
   This PR improves how management server peers are tracked and provides more 
accurate peer states based on the mgmt server statistics sync.
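
A minimal sketch of the idea, not the code in this PR: a peer's state is derived from how long ago its statistics were last synced. The class name `PeerStateSketch`, the method `peerState`, and the 3-minute threshold are assumptions for illustration; the threshold is chosen only to match the roughly 3-minute delay reported in the test results.

```java
import java.time.Duration;
import java.time.Instant;

public class PeerStateSketch {
    enum State { Up, Down }

    // Assumption: 3 minutes, chosen to match the delay seen in the test results.
    static final Duration PEER_TIMEOUT = Duration.ofMinutes(3);

    // Hypothetical helper: a peer is Down if it never synced statistics,
    // or if its last sync is older than PEER_TIMEOUT.
    static State peerState(Instant lastStatsSync, Instant now) {
        if (lastStatsSync == null) {
            return State.Down;
        }
        Duration silence = Duration.between(lastStatsSync, now);
        return silence.compareTo(PEER_TIMEOUT) > 0 ? State.Down : State.Up;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T00:10:00Z");
        System.out.println(peerState(now.minusSeconds(60), now));  // prints "Up"
        System.out.println(peerState(now.minusSeconds(240), now)); // prints "Down"
    }
}
```

With a rule like this, a node killed with `kill -9` stops syncing statistics and is eventually marked Down, even though it never got the chance to update its own state.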
   
   Old diagram:
   
![image](https://github.com/user-attachments/assets/730f0a14-c85f-4560-b599-95a1be407782)
   
   New diagram:
   
![image](https://github.com/user-attachments/assets/bd9c15ee-b533-423d-a610-cb1c2e224697)
   
   Test results: 
   
   <table>
   <thead>
   <tr>
   <th rowspan="2">Actions</th>
   <th rowspan="2">Current</th>
   <th rowspan="2">New</th>
   </tr>
   </thead>
   <tbody>
   <tr>
   <td>stop cloudstack-management</td>
   <td>mshost.state is Down immediately, mshost_peer.peer_state is Down after 
3 mins</td>
   <td>Same as current</td>
   </tr>
   <tr>
   <td>restart cloudstack-management</td>
   <td>mshost.state is Down immediately and Up later. mshost_peer.peer_state is 
Up during the period. There are some leftover peers with old peer_runid.</td>
   <td>mshost.state is Down immediately and Up later. mshost_peer.peer_state is 
Up during the period. Records with the old peer_runid are removed.</td>
   </tr>
   <tr>
   <td>pkill java</td>
   <td>same as "stop cloudstack-management"</td>
   <td>Same as current</td>
   </tr>
   <tr>
   <td>kill -9 `pid of java process`</td>
   <td>mshost.state is still Up. mshost_peer.peer_state is Down after 3 
mins.</td>
   <td>mshost.state and mshost_peer.peer_state are Down after 3-4 mins.</td>
   </tr>
   <tr>
   <td>hard shutdown (echo o > /proc/sysrq-trigger)</td>
   <td>same as "kill -9"</td>
   <td>Same as current</td>
   </tr>
   <tr>
   <td>hard reset (echo b > /proc/sysrq-trigger)</td>
   <td>same as "kill -9"</td>
   <td>Same as current</td>
   </tr>
   <tr>
   <td>cannot write to db</td>
   <td>same as "kill -9". When the link is recovered, the management server exits with code 219</td>
   <td>Same as current</td>
   </tr>
   <tr>
   <td>cannot communicate with other mgmt nodes</td>
   <td>nothing changed</td>
   <td>Same as current</td>
   </tr>
   <tr>
   <td>mgmt link is down</td>
   <td>same as "kill -9", because eth0 is also used to connect to the db. Otherwise it 
should be the same as "cannot communicate with other mgmt nodes"</td>
   <td>Same as current</td>
   </tr>
   </tbody>
   </table>
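
The restart row above can be illustrated with a small sketch, assuming a simplified in-memory view of the `mshost_peer` records. This is not the PR's implementation: `PeerCleanupSketch`, `PeerRecord`, `dropStaleRunids`, and the field `peerMshostId` are hypothetical names, and only `peer_runid` and `peer_state` come from the table. After a peer restarts with a new run id, records for that peer carrying any other run id are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PeerCleanupSketch {
    // Hypothetical, simplified stand-in for one mshost_peer record.
    static final class PeerRecord {
        final long peerMshostId; // assumed identifier of the peer management server
        final long peerRunid;    // peer_runid: changes on every restart of the peer
        final String peerState;  // peer_state: "Up" or "Down"

        PeerRecord(long peerMshostId, long peerRunid, String peerState) {
            this.peerMshostId = peerMshostId;
            this.peerRunid = peerRunid;
            this.peerState = peerState;
        }
    }

    // Keep every record except those of the given peer with an outdated run id.
    static List<PeerRecord> dropStaleRunids(List<PeerRecord> peers, long peerMshostId, long currentRunid) {
        List<PeerRecord> kept = new ArrayList<>();
        for (PeerRecord p : peers) {
            boolean stale = p.peerMshostId == peerMshostId && p.peerRunid != currentRunid;
            if (!stale) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<PeerRecord> peers = Arrays.asList(
                new PeerRecord(2, 1000L, "Up"),  // left over from before the restart
                new PeerRecord(2, 2000L, "Up"),  // current run of peer 2
                new PeerRecord(3, 1500L, "Up")); // unrelated peer, untouched
        System.out.println(dropStaleRunids(peers, 2, 2000L).size()); // prints "2"
    }
}
```

Filtering by both peer id and run id leaves the records of other peers alone, so a restart of one management server does not disturb the state recorded for the rest.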
   
   Adds a new "Peers" tab for each management server in the UI.
   
   
![image](https://github.com/user-attachments/assets/5f828d67-16a2-4041-b41a-a707598fd9ed)
   
   
   ### Types of changes
   
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Bug fix (non-breaking change which fixes an issue)
   - [x] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   - [ ] build/CI
   - [ ] test (unit or integration test code)
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Feature/Enhancement Scale
   
   - [ ] Major
   - [ ] Minor
   
   #### Bug Severity
   
   - [ ] BLOCKER
   - [ ] Critical
   - [ ] Major
   - [ ] Minor
   - [ ] Trivial
   
   
   ### Screenshots (if appropriate):
   
   
   ### How Has This Been Tested?
   
   
   #### How did you try to break this feature and the system with this change?
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
