vishesh92 opened a new pull request, #8903: URL: https://github.com/apache/cloudstack/pull/8903
### Description

This PR fixes the issues that occur when the increment/decrement methods are waiting for a lock on the domain tables while `ResourceCountCheckTask` is running at the same time. The issue appears when `innodb_lock_wait_timeout` is much lower than the time `recalculateDomainResourceCount` takes to complete. (See the steps below to reproduce the error.)

```java
com.cloud.utils.exception.CloudRuntimeException: DB Exception on: com.mysql.cj.jdbc.ClientPreparedStatement: SELECT resource_count.id, resource_count.type, resource_count.account_id, resource_count.domain_id, resource_count.count, resource_count.tag FROM resource_count WHERE resource_count.id IN (33,4785,3513,4845) FOR UPDATE
        at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:438)
        at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:366)
        at com.cloud.utils.db.GenericDaoBase.search(GenericDaoBase.java:355)
        at com.cloud.utils.db.GenericDaoBase.lockRows(GenericDaoBase.java:341)
        .....
```

We do this by removing unnecessary locks and simplifying count updates. Currently, calculating the resource count for the root domain takes a lock on the entire table. This PR also splits the domain count calculation transaction into multiple transactions, each with its own locks. The domain count calculation is broken up into the following steps:

1. Calculate the resource count for all accounts in a domain
2. Calculate the resource count for all child domains in a domain
3. In a transaction, fetch the child domain and account counts and update the count if required

### Types of changes

- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)

### Feature/Enhancement Scale or Bug Severity

#### Feature/Enhancement Scale

- [ ] Major
- [ ] Minor

#### Bug Severity

- [ ] BLOCKER
- [ ] Critical
- [ ] Major
- [ ] Minor
- [ ] Trivial

### Screenshots (if appropriate):

### How Has This Been Tested?

1. Set up multiple domains and networks, and update their limits. I used the command below.

```bash
csbench -create -domain -network -limits
```

```
# csbench-config
numdomains = 10
numnetworks = 1
numvms = 100
startvm = false # For faster creation of VMs
```

2. Check how long the resource count calculation takes to run. To manually trigger the resource count calculation, run this command:

```bash
time cmk update resourcecount domainid=1
```

3. Set `innodb_lock_wait_timeout` to a value a few seconds lower than the time the above request took to complete.

```sql
SET GLOBAL innodb_lock_wait_timeout=3;
```

4. Restart the management server for the `innodb_lock_wait_timeout` change to take effect.
5. Run the commands below.

```
csbench -create -vm -workers=50
csbench -teardown -vm -workers=50
```

In parallel with the above requests, execute `cmk update resourcecount domainid=1` to trigger resource count recalculation while VMs are being created or destroyed.

6. Check the logs for `ClientPreparedStatement`.

```bash
grep "ClientPreparedStatement" vmops.log
```

#### Results

##### With patch - creation of VM in stopped state

```
+----------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
| TYPE     | COUNT | MIN   | MAX    | AVG   | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
+----------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
| vm - All | 1000  | 1.708 | 12.123 | 3.874 | 3.46   | 5.428           | 6.662           | 8.614           |
+----------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
```

```
+------------------+-------+--------+-------+--------+--------+-----------------+-----------------+-----------------+
| TYPE             | COUNT | MIN    | MAX   | AVG    | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
+------------------+-------+--------+-------+--------+--------+-----------------+-----------------+-----------------+
| vm-destroy - All | 1000  | 10.286 | 21.86 | 17.987 | 15.467 | 21.518          | 21.589          | 21.779          |
+------------------+-------+--------+-------+--------+--------+-----------------+-----------------+-----------------+
```

##### Without patch

```
+-----------------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
| TYPE            | COUNT | MIN   | MAX    | AVG   | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
+-----------------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
| vm - All        | 1000  | 2.039 | 17.463 | 5.656 | 4.77   | 10.484          | 11.758          | 13.645          |
| vm - Successful | 988   | 2.039 | 17.463 | 5.67  | 4.773  | 10.489          | 11.791          | 13.753          |
| vm - Failed     | 12    | 3.181 | 5.414  | 4.493 | 4.679  | 5.21            | 5.313           | 5.313           |
+-----------------+-------+-------+--------+-------+--------+-----------------+-----------------+-----------------+
```

```
+------------------+-------+--------+--------+--------+--------+-----------------+-----------------+-----------------+
| TYPE             | COUNT | MIN    | MAX    | AVG    | MEDIAN | 90TH PERCENTILE | 95TH PERCENTILE | 99TH PERCENTILE |
+------------------+-------+--------+--------+--------+--------+-----------------+-----------------+-----------------+
| vm-destroy - All | 996   | 10.295 | 29.176 | 20.111 | 21.417 | 21.655          | 22.27           | 28.691          |
+------------------+-------+--------+--------+--------+--------+-----------------+-----------------+-----------------+
```
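
#### Illustrative sketch of the split calculation

The three-step split listed under Description can be illustrated with a small, self-contained Java sketch. This is not the code in this PR: `DomainCountSketch`, `Domain`, the in-memory maps standing in for the `resource_count` table, and the `ReentrantLock` standing in for a short database transaction are all hypothetical names chosen for illustration. The point it tries to show is that account counts and child domain counts are settled first, each in its own short critical section, so the final step only has to sum already-stored values and write the parent row if it changed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical in-memory stand-in for the resource_count table; not CloudStack code. */
final class DomainCountSketch {

    /** A domain with the real usage of its accounts and its child domains. */
    static final class Domain {
        final long id;
        final Map<Long, Long> accountUsage = new HashMap<>(); // accountId -> actual usage
        final List<Domain> children = new ArrayList<>();
        Domain(long id) { this.id = id; }
    }

    // Stored counts, kept separately for account rows and domain rows.
    final Map<Long, Long> accountCounts = new HashMap<>();
    final Map<Long, Long> domainCounts = new HashMap<>();
    // Assumption: this lock plays the role of one short database transaction.
    private final ReentrantLock txn = new ReentrantLock();
    int domainTransactions = 0; // how many separate domain-level "transactions" ran

    /** Updates one account row in its own short critical section. */
    void recalculateAccount(long accountId, long usage) {
        txn.lock();
        try {
            accountCounts.put(accountId, usage);
        } finally {
            txn.unlock();
        }
    }

    long recalculateDomain(Domain domain) {
        // Step 1: recalculate every account in this domain, one at a time.
        for (Map.Entry<Long, Long> e : domain.accountUsage.entrySet()) {
            recalculateAccount(e.getKey(), e.getValue());
        }
        // Step 2: recalculate every child domain; no lock is held across the recursion.
        for (Domain child : domain.children) {
            recalculateDomain(child);
        }
        // Step 3: in one short "transaction", sum the stored counts and update if required.
        txn.lock();
        try {
            domainTransactions++;
            long total = 0;
            for (Long accountId : domain.accountUsage.keySet()) {
                total += accountCounts.getOrDefault(accountId, 0L);
            }
            for (Domain child : domain.children) {
                total += domainCounts.getOrDefault(child.id, 0L);
            }
            Long stored = domainCounts.get(domain.id);
            if (stored == null || stored != total) {
                domainCounts.put(domain.id, total); // write only when the value changed
            }
            return total;
        } finally {
            txn.unlock();
        }
    }

    public static void main(String[] args) {
        Domain root = new Domain(1);
        root.accountUsage.put(10L, 3L);
        Domain child = new Domain(2);
        child.accountUsage.put(20L, 5L);
        child.accountUsage.put(21L, 2L);
        root.children.add(child);
        root.children.add(new Domain(3)); // empty child domain

        DomainCountSketch sketch = new DomainCountSketch();
        sketch.domainCounts.put(1L, 99L); // stale stored value for the root domain
        long total = sketch.recalculateDomain(root);
        System.out.println("root=" + total + " child=" + sketch.domainCounts.get(2L));
        // prints "root=10 child=7"
    }
}
```

In this sketch the lock is released between every step, so a concurrent increment or decrement would only ever wait for one small update rather than for the whole tree walk. A real implementation would also have to decide how to handle counts that change between steps 2 and 3, which this sketch does not attempt to model.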