[ https://issues.apache.org/jira/browse/CLOUDSTACK-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696310#comment-13696310 ]
Nitin Mehta commented on CLOUDSTACK-3294:
-----------------------------------------

CLOUDSTACK-2813 has the short-term fix, but we need to look at cleaning up resources holistically, at least for virtual machines, and have a better failover in case the cleanup fails.

One idea is to add something like a cleanup flag for the case where the cleanup didn't work, and to release the resources before the next retry of VM deployment, the expunge thread, etc., but I am not convinced this is the most elegant solution. Is this OK?

I was talking to Murali, and he suggested that, long term, we could make acquiring resources transactional, or enhance a framework like Journal to keep a log of resources acquired and then release them. Any ideas?

If we go down the path of checking each use case in which resource cleanup can fail, as the fix for CLOUDSTACK-2813 does, we will end up with a lot of flags and if/else conditions. While that fix solves this problem, I still see loopholes in our cleanup approach.

At the minimum, we should start checking the cleanup() response. If it returns false, cleanup is not done yet and needs to be taken care of later (say, before another retry of VM deployment or in an expunge cycle).

The next step could be making the cleanup function itself more robust (for example, _networkMgr.release throws an exception and we just do nothing right now).

> CLONE - System VMs not coming up due to “InsufficientServerCapacityException” (not consistently reproducible)
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-3294
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3294
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>          Components: Management Server
>    Affects Versions: 4.2.0
>            Reporter: Nitin Mehta
>            Priority: Critical
>             Fix For: 4.2.0
>
>         Attachments: management-server.zip
>
>
> Steps:
> 1. Have a CS with an advanced zone.
> 2. Created some user VMs.
> 3. Created VPCs and VMs under VPCs.
> 4. Shut down the Host (Xen) and MS.
> 5. Start the Host and MS.
> Observation:
> The SSVM and CPVM were not coming up, failing with an “InsufficientServerCapacityException” exception.
> The Dashboard was showing exhausted management IPs.
> Deleted all the VMs; still the IPs were not released.
> Below is the table which shows that all the management IPs are reserved.
> mysql> select * from op_dc_ip_address_alloc;
> +----+---------------+----------------+--------+--------+--------------------------------------+---------------------+-------------+
> | id | ip_address    | data_center_id | pod_id | nic_id | reservation_id                       | taken               | mac_address |
> +----+---------------+----------------+--------+--------+--------------------------------------+---------------------+-------------+
> |  1 | 10.147.40.181 |              1 |      1 |     34 | 48d95839-6fb1-4bc4-b23a-c9f1891bf1fa | 2013-05-31 17:10:06 |           1 |
> |  2 | 10.147.40.182 |              1 |      1 |      3 | a7b9610c-9319-478c-84e4-e70be099cd9d | 2013-05-31 17:07:29 |           2 |
> |  3 | 10.147.40.183 |              1 |      1 |      7 | 238830cd-8cbe-411e-8016-352129885df6 | 2013-05-31 17:07:30 |           3 |
> |  4 | 10.147.40.184 |              1 |      1 |      7 | 70f091d4-acb4-435b-bfde-9bdb35bcfa6b | 2013-05-31 17:09:15 |           4 |
> |  5 | 10.147.40.185 |              1 |      1 |     29 | 14690352-e9a0-4695-a834-0552175f7684 | 2013-05-31 17:08:45 |           5 |
> |  6 | 10.147.40.186 |              1 |      1 |     30 | 14690352-e9a0-4695-a834-0552175f7684 | 2013-05-31 17:08:45 |           6 |
> |  7 | 10.147.40.187 |              1 |      1 |      4 | a7b9610c-9319-478c-84e4-e70be099cd9d | 2013-05-31 17:07:29 |           7 |
> |  8 | 10.147.40.188 |              1 |      1 |      7 | ea8644d1-7801-4dbb-aa0c-204f31e922a1 | 2013-05-31 17:08:25 |           8 |
> |  9 | 10.147.40.189 |              1 |      1 |      7 | 245e0082-d697-454d-9689-b36cc3b6e113 | 2013-05-31 17:11:16 |           9 |
> | 10 | 10.147.40.190 |              1 |      1 |      7 | 094e371a-da69-44e0-80fd-14c2d090e935 | 2013-05-31 17:10:15 |          10 |
> +----+---------------+----------------+--------+--------+--------------------------------------+---------------------+-------------+
> As all the IPs were in the reserved state, the SSVM and CPVM were not coming up.
> Was not able to reproduce this issue again.
> Attached is the server log.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
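
The comment's minimum proposal (check the cleanup() response, and treat a false return or a thrown exception as "still to be cleaned up" before the next deployment retry or expunge cycle) could be sketched roughly as below. This is only an illustration under stated assumptions: CleanupTracker, CleanupAction, tryCleanup and retryPending are hypothetical names invented for the sketch and are not CloudStack classes or APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: remembers cleanups that did not complete so they can be retried. */
class CleanupTracker {

    /** A cleanup step that reports success the way cleanup() is assumed to: true when done. */
    interface CleanupAction {
        boolean cleanup() throws Exception;
    }

    // VM id -> cleanup action that has not succeeded yet (insertion order preserved).
    private final Map<Long, CleanupAction> pending = new LinkedHashMap<>();

    /** Runs one action and treats both "false" and an exception as "not cleaned up". */
    private static boolean runSafely(CleanupAction action) {
        try {
            return action.cleanup();
        } catch (Exception e) {
            // Do not swallow the failure silently: report it as incomplete cleanup.
            return false;
        }
    }

    /** Attempts cleanup for a VM; records it as pending if it did not complete. */
    boolean tryCleanup(long vmId, CleanupAction action) {
        boolean done = runSafely(action);
        if (done) {
            pending.remove(vmId);
        } else {
            pending.put(vmId, action);
        }
        return done;
    }

    /**
     * Retries every pending cleanup, e.g. before another deployment retry
     * or during an expunge cycle. Returns the VM ids that are still pending.
     */
    List<Long> retryPending() {
        Iterator<Map.Entry<Long, CleanupAction>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            if (runSafely(it.next().getValue())) {
                it.remove();
            }
        }
        return new ArrayList<>(pending.keySet());
    }

    boolean hasPending(long vmId) {
        return pending.containsKey(vmId);
    }
}
```

A caller would invoke tryCleanup when a deployment fails and retryPending before the next attempt. This keeps the bookkeeping in one place rather than adding a flag per failure case, though it does not by itself make resource acquisition transactional, and the pending map here is in-memory only, so it would not survive a management server restart.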