vishesh92 opened a new pull request, #8903:
URL: https://github.com/apache/cloudstack/pull/8903

   ### Description
   
   This PR fixes the errors that occur when the increment/decrement methods are waiting for a lock on the domain tables while `ResourceCountCheckTask` is running. The issue appears when `innodb_lock_wait_timeout` is many times smaller than the time `recalculateDomainResourceCount` takes to complete (see the steps below to reproduce the error).
   ```java
   com.cloud.utils.exception.CloudRuntimeException: DB Exception on: com.mysql.cj.jdbc.ClientPreparedStatement: SELECT resource_count.id, resource_count.type, resource_count.account_id, resource_count.domain_id, resource_count.count, resource_count.tag FROM resource_count WHERE resource_count.id IN (33,4785,3513,4845)  FOR UPDATE
           at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:438)
           at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:366)
           at com.cloud.utils.db.GenericDaoBase.search(GenericDaoBase.java:355)
           at com.cloud.utils.db.GenericDaoBase.lockRows(GenericDaoBase.java:341)
           .....
   
   ```
   
   We do this by removing unnecessary locks and simplifying count updates.
   
   Currently, calculating the resource count for the root domain takes a lock on the entire table.
   This PR also splits the single domain count calculation transaction into multiple transactions, each taking its own locks. The calculation is broken up as follows:
   1. Calculate the resource count for all accounts in a domain
   2. Calculate the resource count for all child domains in a domain
   3. In a transaction, fetch the child domain and account counts and update the domain count if required
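
The three steps above can be sketched as follows. This is a minimal in-memory illustration, not the code in this PR: the class `DomainCountSketch`, its maps, and its method names are hypothetical, and the `synchronized` blocks only stand in for short database transactions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the resource_count table. */
class DomainCountSketch {
    // domain id -> ids of accounts directly in that domain
    final Map<Long, List<Long>> accountsByDomain = new HashMap<>();
    // domain id -> ids of its direct child domains
    final Map<Long, List<Long>> childrenByDomain = new HashMap<>();
    // account id -> actual usage (what a real recount would measure)
    final Map<Long, Long> actualAccountUsage = new HashMap<>();
    // stored counts, standing in for rows in resource_count
    final Map<Long, Long> storedAccountCount = new HashMap<>();
    final Map<Long, Long> storedDomainCount = new HashMap<>();

    /** Refreshes one account's stored count in its own short unit of work. */
    long recalculateAccount(long accountId) {
        long actual = actualAccountUsage.getOrDefault(accountId, 0L);
        synchronized (this) { // stands in for one small transaction
            storedAccountCount.put(accountId, actual);
        }
        return actual;
    }

    long recalculateDomain(long domainId) {
        List<Long> accounts = accountsByDomain.getOrDefault(domainId, List.of());
        List<Long> children = childrenByDomain.getOrDefault(domainId, List.of());

        // Step 1: recalculate every account in the domain, one at a time.
        for (long accountId : accounts) {
            recalculateAccount(accountId);
        }
        // Step 2: recalculate every child domain, outside any parent-level lock.
        for (long childId : children) {
            recalculateDomain(childId);
        }
        // Step 3: in one short "transaction", sum the stored counts and
        // write the domain count only if it changed.
        synchronized (this) {
            long total = 0;
            for (long accountId : accounts) {
                total += storedAccountCount.getOrDefault(accountId, 0L);
            }
            for (long childId : children) {
                total += storedDomainCount.getOrDefault(childId, 0L);
            }
            Long previous = storedDomainCount.get(domainId);
            if (previous == null || previous != total) {
                storedDomainCount.put(domainId, total);
            }
            return total;
        }
    }

    public static void main(String[] args) {
        DomainCountSketch s = new DomainCountSketch();
        s.accountsByDomain.put(1L, List.of(10L));
        s.childrenByDomain.put(1L, List.of(2L));
        s.accountsByDomain.put(2L, List.of(20L));
        s.actualAccountUsage.put(10L, 3L);
        s.actualAccountUsage.put(20L, 5L);
        System.out.println(s.recalculateDomain(1L)); // 3 + 5 = 8
    }
}
```

The point of the structure is that the lock in step 3 covers only a sum over already-stored counts, so it is held briefly, while the expensive per-account and per-child work happens in separate units.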
   
   
   ### Types of changes
   
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Feature/Enhancement Scale
   
   - [ ] Major
   - [ ] Minor
   
   #### Bug Severity
   
   - [ ] BLOCKER
   - [ ] Critical
   - [ ] Major
   - [ ] Minor
   - [ ] Trivial
   
   
   ### Screenshots (if appropriate):
   
   
   ### How Has This Been Tested?
   
   1. Set up multiple domains and networks, and update their limits. I used the command below.
   ```bash
   csbench -create -domain -network -limits
   ```
   ```
   # csbench-config
   numdomains = 10
   numnetworks = 1
   numvms = 100
   startvm = false  # For faster creation of VMs
   ```
   2. Check the time it takes for resource count calculation to run. To 
manually trigger resource count calculation, run this command:
   ```bash
   time cmk update resourcecount domainid=1
   ```
   3. Set `innodb_lock_wait_timeout` to a value a few seconds lower than the time the above request took to complete.
   ```sql
   SET GLOBAL innodb_lock_wait_timeout=3;
   ```
   4. Restart the management server for the `innodb_lock_wait_timeout` change to take effect.
   5. Run the below commands.
   ```
   csbench -create -vm -workers=50
   csbench -teardown -vm -workers=50
   ```
   In parallel with the above commands, execute `cmk update resourcecount domainid=1` to trigger resource count recalculation while VMs are being created or destroyed.
   
   6. Check logs for `ClientPreparedStatement`.
   ```bash
   grep "ClientPreparedStatement" vmops.log
   ```
   
   #### Results
   
   ##### With patch - creation of VM in stopped state
   
   ```
   +----------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
   | TYPE     | COUNT |   MIN |    MAX |   AVG | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
   +----------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
   | vm - All |  1000 | 1.708 | 12.123 | 3.874 |   3.46 |           5.428 |           6.662 |           8.614 |
   +----------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
   ```
   ```
   +------------------+-------+--------+-------+--------+--------+-----------------+-----------------+-----------------+
   | TYPE             | COUNT |    MIN |   MAX |    AVG | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
   +------------------+-------+--------+-------+--------+--------+-----------------+-----------------+-----------------+
   | vm-destroy - All |  1000 | 10.286 | 21.86 | 17.987 | 15.467 |          21.518 |          21.589 |          21.779 |
   +------------------+-------+--------+-------+--------+--------+-----------------+-----------------+-----------------+
   ```
   ##### Without patch
   ```
   +-----------------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
   | TYPE            | COUNT |   MIN |    MAX |   AVG | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
   +-----------------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
   | vm - All        |  1000 | 2.039 | 17.463 | 5.656 |   4.77 |          10.484 |          11.758 |          13.645 |
   | vm - Successful |   988 | 2.039 | 17.463 |  5.67 |  4.773 |          10.489 |          11.791 |          13.753 |
   | vm - Failed     |    12 | 3.181 |  5.414 | 4.493 |  4.679 |            5.21 |           5.313 |           5.313 |
   +-----------------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
   ```
   ```
   +------------------+-------+--------+--------+--------+--------+-----------------+-----------------+-----------------+
   | TYPE             | COUNT |    MIN |    MAX |    AVG | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
   +------------------+-------+--------+--------+--------+--------+-----------------+-----------------+-----------------+
   | vm-destroy - All |   996 | 10.295 | 29.176 | 20.111 | 21.417 |          21.655 |           22.27 |          28.691 |
   +------------------+-------+--------+--------+--------+--------+-----------------+-----------------+-----------------+
   ```
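
To compare the two runs, the relative change in the averages and the failure rate can be computed directly from the tables above. The snippet below is only a sketch of that arithmetic: the class and method names are hypothetical, the inputs are copied from the `AVG` and `COUNT` columns, and a single run per configuration is not a rigorous benchmark.

```java
/** Hypothetical helper for comparing the benchmark tables above. */
class BenchmarkComparison {
    /** Percentage by which {@code after} is lower than {@code before}. */
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    /** Failed requests as a percentage of all requests. */
    static double failureRate(int failed, int total) {
        return 100.0 * failed / total;
    }

    public static void main(String[] args) {
        // AVG of "vm - All": 5.656 without the patch, 3.874 with it.
        System.out.printf("VM create avg reduction: %.1f%%%n",
                percentReduction(5.656, 3.874));
        // AVG of "vm-destroy - All": 20.111 without the patch, 17.987 with it.
        System.out.printf("VM destroy avg reduction: %.1f%%%n",
                percentReduction(20.111, 17.987));
        // "vm - Failed" without the patch: 12 of 1000 requests.
        System.out.printf("VM create failure rate without patch: %.1f%%%n",
                failureRate(12, 1000));
    }
}
```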
   
   
   

