Hey Jordan,

It does seem to be a race condition. I ran a small scenario which reproduced the failure. This was my setup:
Nodes:
Node84: 8 GB RAM available, 4 GB of that allocated to VM-1
Node85: 8 GB RAM, no VMs.
The following actions were run at the same time using cmk (see the sketch after this list):
Deploy 4GB ram on Node84
Migrate VM-1 from Node84 to Node85
Deploy 4GB (without specifying the host id)
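For reference, this is roughly how I fire the three calls concurrently. It is a minimal sketch, assuming cmk is already configured against the test zone; all UUIDs below (zone, template, offering, hosts, VM-1) are placeholders to be replaced with the real ids:

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder ids - substitute the real UUIDs from your environment.
ZONE = "<zone-uuid>"
TEMPLATE = "<template-uuid>"
OFFERING_4GB = "<4gb-offering-uuid>"
NODE84 = "<node84-host-uuid>"
NODE85 = "<node85-host-uuid>"
VM1 = "<vm-1-uuid>"

COMMANDS = [
    # 1) Deploy 4 GB RAM on Node84 explicitly
    ["cmk", "deploy", "virtualmachine",
     f"zoneid={ZONE}", f"templateid={TEMPLATE}",
     f"serviceofferingid={OFFERING_4GB}", f"hostid={NODE84}"],
    # 2) Migrate VM-1 from Node84 to Node85
    ["cmk", "migrate", "virtualmachine",
     f"virtualmachineid={VM1}", f"hostid={NODE85}"],
    # 3) Deploy 4 GB RAM without specifying the host id
    ["cmk", "deploy", "virtualmachine",
     f"zoneid={ZONE}", f"templateid={TEMPLATE}",
     f"serviceofferingid={OFFERING_4GB}"],
]

def run(cmd):
    # Capture output so each command's result can be reported afterwards.
    return subprocess.run(cmd, capture_output=True, text=True)

# Launch all three at (roughly) the same time.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run, COMMANDS))

for cmd, res in zip(COMMANDS, results):
    print(" ".join(cmd[1:3]), "->", "OK" if res.returncode == 0 else "FAILED")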
First Round:
Deploy 4GB ram on Node84 -> Successful
Migrate VM-1 from Node84 to Node85 -> Migration cancelled because Host does not
have enough capacity for vm
Deploy 4GB (without specifying the host id) -> Successful
Second Round:
Deploy 4GB ram on Node84 -> Unsuccessful, unable to create a deployment for VM
instance
Migrate VM-1 from Node84 to Node85 -> Successful
Deploy 4GB (without specifying the host id) -> Successful
Third Round:
Deploy 4GB ram on Node84 -> Successful
Migrate VM-1 from Node84 to Node85 -> Successful
Deploy 4GB (without specifying the host id) -> Unsuccessful, unable to create a
deployment for VM instance
Fourth Round:
Same as first round.
Kind regards,
Sina
------- Original Message -------
On Thursday, March 30th, 2023 at 2:24 PM, jordan j <[email protected]> wrote:
>
>
> Hey everyone,
>
> This week we are doing performance tests on the environment and we noticed
> something weird.
>
> Setup:
> - CloudStack 4.17.2 + XCP-ng advanced networking with SG.
> - Zone with 30 XCP-ng hosts (in 30 clusters), each with 100 GB RAM and 100 cores.
> - There is one compute offering with the user dispersing planner. The offering
> is bound to local storage (no shared storage on the servers).
>
> Using terraform we tried to deploy 60 instances, each with 49 GB of RAM and 50
> cores.
> Some of them (about 5) were not deployed.
> Running the same task again and again eventually gets the failed instances
> deployed.
>
> Wondering why this happens... looking at the logs I found out that the VMs
> fail because there is not enough memory on the XCP-ng hosts. The error comes
> from XAPI and not from CloudStack, which makes me conclude that CloudStack
> allows the task but for some reason the scheduler/planner does not account for
> the memory resource properly. I wonder if there is a race condition where 2
> instances are assigned to the same host and, as both get created, there is
> memory for just one of them.
>
> I tried to simulate the issue by simultaneously creating instances from the
> GUI on a group of 2 servers, but it seems GUI-created instances, even if
> launched together, are executed in order, so the scheduler detects when there
> is no more RAM and the remaining requests are stopped.
>
> Has anyone experienced such a problem?
>
> Regards,
> Jordan
