Re: AutoScaling group not triggering

Wei ZHOU Thu, 01 Jun 2023 01:47:48 -0700

Hi Ricardo,

ACS gets the VM statistics (including cpu, memory, network, disk
statistics) by sending GetVmStatsCommand to the kvm host, and getting the
answer GetVmStatsAnswer from the kvm host.
Can you check agent.log for errors?


For example, I cannot get memory statistics because of the error below:
```
2023-06-01 08:42:55,925 DEBUG [cloud.agent.Agent] (agentRequest-Handler-4:null) (logid:5cfb0714) Processing command: com.cloud.agent.api.GetVmStatsCommand
2023-06-01 08:42:55,925 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-4:null) (logid:5cfb0714) Looking for libvirtd connection at: qemu:///system
2023-06-01 08:42:55,928 WARN  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) (logid:5cfb0714) Couldn't retrieve free memory, returning -1.
```
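
The agent.log check can be partly automated. Below is a minimal Python sketch, not part of CloudStack, that assumes entries look like the excerpt above: a timestamp, a level, and a `(logid:...)` tag shared by all entries of one request. The names `join_entries` and `stats_problems` are hypothetical. Continuation lines are merged first, so it works whether or not the entries are wrapped.

```python
import re
from collections import defaultdict

# Start of a log entry, e.g. "2023-06-01 08:42:55,928 WARN  [...]".
# Group 1 captures the level (DEBUG, INFO, WARN, ERROR, ...).
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (\w+)")
# Assumption: each request carries a tag of the form "(logid:5cfb0714)".
LOGID = re.compile(r"\(logid:([0-9a-f]+)\)")


def join_entries(lines):
    """Merge wrapped or multi-line output into one string per log entry."""
    entry = None
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line):
            if entry is not None:
                yield entry
            entry = line
        elif entry is not None:
            entry += " " + line.strip()
    if entry is not None:
        yield entry


def stats_problems(lines):
    """Return WARN/ERROR entries that share a logid with a GetVmStatsCommand."""
    by_logid = defaultdict(list)
    for entry in join_entries(lines):
        found = LOGID.search(entry)
        if found:
            by_logid[found.group(1)].append(entry)

    problems = []
    for entries in by_logid.values():
        if any("GetVmStatsCommand" in e for e in entries):
            problems += [e for e in entries
                         if ENTRY_START.match(e).group(1) in ("WARN", "ERROR")]
    return problems
```

To try it on a KVM host, pass an open file, for example `stats_problems(open("/var/log/cloudstack/agent/agent.log"))`. That path is the usual default but may differ on your installation.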

But I did get the CPU load (which is very low):

```
mysql> select * from autoscale_vmgroup_statistics;
+-------+------------+-----------+------------+-------------+------------------+-----------------------+------------------+---------------------+----------+
| id    | vmgroup_id | policy_id | counter_id | resource_id | resource_type    | raw_value             | value_type       | created             | state    |
+-------+------------+-----------+------------+-------------+------------------+-----------------------+------------------+---------------------+----------+
...
| 34142 |          5 |        13 |        101 |        9020 | UserVm           |  0.003534817956875221 | INSTANT_VM       | 2023-06-01 08:39:02 | ACTIVE   |
| 34143 |          5 |        15 |        101 |        9020 | UserVm           |  0.003534817956875221 | INSTANT_VM       | 2023-06-01 08:39:02 | ACTIVE   |
| 34144 |          5 |        13 |        101 |        9021 | UserVm           | 0.0035341933203746245 | INSTANT_VM       | 2023-06-01 08:39:02 | ACTIVE   |
| 34145 |          5 |        15 |        101 |        9021 | UserVm           | 0.0035341933203746245 | INSTANT_VM       | 2023-06-01 08:39:02 | ACTIVE   |
```
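
On why the CPU value can look so low: the figure comes from the VM's CPU time as seen by the hypervisor, not from tools inside the guest. As a rough illustration of that general technique (this is not CloudStack's actual code), the sketch below assumes two samples of a cumulative CPU-time counter in nanoseconds, which is the unit libvirt reports, and divides the consumed time by the time available to all vCPUs. The function name and its parameters are hypothetical.

```python
def cpu_utilization_percent(cpu_time_ns_before, cpu_time_ns_after,
                            elapsed_seconds, vcpus):
    """Percentage of the available vCPU time consumed between two samples.

    Assumption: the counter is cumulative and in nanoseconds, and usage is
    normalised by the number of vCPUs. CloudStack may compute it differently.
    """
    if elapsed_seconds <= 0 or vcpus <= 0:
        raise ValueError("elapsed_seconds and vcpus must be positive")
    used_ns = cpu_time_ns_after - cpu_time_ns_before
    available_ns = elapsed_seconds * 1e9 * vcpus
    return 100.0 * used_ns / available_ns
```

For example, 1.5 s of CPU time consumed over a 60 s interval on a 2-vCPU VM gives 1.25%.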

-Wei


On Tue, 30 May 2023 at 23:03, Ricardo Pertuz
<[email protected]> wrote:

> Here's what I see
>
> 2023-05-30 15:08:35,483 INFO  [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-4:null) (logid:bff92bbe) Fetching health check result for 169.254.82.180 and executing fresh checks: **false**
> 2023-05-30 15:08:35,884 INFO  [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-2:null) (logid:bff92bbe) Fetching health check result for 169.254.116.47 and executing fresh checks: **false**
> 2023-05-30 15:08:36,333 INFO  [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-3:null) (logid:bff92bbe) Fetching health check result for 169.254.166.143 and executing fresh checks: false
> 2023-05-30 15:08:36,739 INFO  [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-1:null) (logid:bff92bbe) Fetching health check result for 169.254.47.117 and executing fresh checks: **false**
>
>
> Ricardo Pertuz
>
>
>
>
>
>
> May 30, 2023 at 3:25 PM, "Wei ZHOU" <[email protected]> wrote:
>
>
> >
> > Hi Ricardo,
> >
> > It looks like the CPU usage (raw_value) is 0. Can you check the agent.log?
> >
> > INACTIVE means the AS VM group was being changed at that time, for
> > example by a create/enable/disable/scaleup/scaledown operation.
> >
> > -Wei
> >
> > On Tuesday, 30 May 2023, Ricardo Pertuz <[email protected]>
> > wrote:
> >
> > >
> > > Hi Wei,
> > >
> > >  Thanks for replying. My threshold is 5%, just to test, and the ACS
> > >  metrics show 28% usage.
> > >
> > >  There seem to be no errors in the logs, however I see this message:
> > >
> > >  **success: Creating file in VR, with ip: 169.254.89.121, file:
> > >  monitor_service.json.ec3acdd8-b1c1-4603-9fde-79eece662390","null -
> > >  success: Invalid unit name "[email protected],172.28.0.83"
> > >  escaped as "[email protected]\x2c172.28.0.83"
> > >  (maybe you should use systemd-escape?)**
> > >
> > >  2023-05-30 15:05:13,988 DEBUG [c.c.s.StatsCollector] (StatsCollector-6:ctx-361217f1) (logid:f59c817a) AutoScaling Monitor is running...
> > >  2023-05-30 15:05:13,989 DEBUG [c.c.s.StatsCollector] (StatsCollector-6:ctx-361217f1) (logid:f59c817a) Skipping AutoScaling Monitor
> > >  2023-05-30 15:05:14,225 DEBUG [c.c.n.a.AutoScaleManagerImpl] (VmGroup-Monitor-4-1:ctx-94401ba2) (logid:1b0d873d) Start monitoring on AutoScale VmGroup AutoScaleVmGroupVO[id=4|name=scaler01|loadBalancerId=93|profileId=5]
> > >  2023-05-30 15:05:14,232 DEBUG [c.c.n.a.AutoScaleManagerImpl] (VmGroup-Monitor-4-1:ctx-94401ba2) (logid:1b0d873d) [AutoScale] Collecting performance data ...
> > >  2023-05-30 15:05:14,239 DEBUG [c.c.n.a.AutoScaleManagerImpl] (VmGroup-Monitor-4-1:ctx-94401ba2) (logid:1b0d873d) [AutoScale] Collecting performance data from hosts ...
> > >
> > >  2023-05-30 15:04:47,539 DEBUG [c.c.s.StatsCollector] (Cluster-Worker-4706:ctx-4eac4d6f) (logid:e5e83be1) StatusUpdate from 262699919842878, json: {"managementServerHostId":202,"managementServerHostUuid":"016a5d17-44ec-429b-acd9-36ee81fbd295","collectionTime":"May 30, 2023, 3:04:47 PM","sessions":0,"cpuUtilization":0.0,"totalJvmMemoryBytes":455081984,"freeJvmMemoryBytes":108107048,"maxJvmMemoryBytes":1908932607,"processJvmMemoryBytes":0,"jvmUptime":594979551,"jvmStartTime":1684882107946,"availableProcessors":16,"loadAverage":6.48,"totalInit":1062535168,"totalUsed":573048008,"totalCommitted":691445760,"pid"
> > >
> > >  Regarding the database, this is what I see. I am not sure why the
> > >  state is **INACTIVE**:
> > >
> > >  MariaDB [cloud]> select * from autoscale_vmgroup_statistics limit 5;
> > >  +-----+------------+-----------+------------+-------------+------------------+-----------+------------------+---------------------+----------+
> > >  | id  | vmgroup_id | policy_id | counter_id | resource_id | resource_type    | raw_value | value_type       | created             | state    |
> > >  +-----+------------+-----------+------------+-------------+------------------+-----------+------------------+---------------------+----------+
> > >  | 294 |          2 |         0 |          0 |           2 | AutoScaleVmGroup |        -1 | INSTANT_VM_GROUP | 2023-05-30 13:48:25 | INACTIVE |
> > >  | 295 |          2 |         0 |          0 |           2 | AutoScaleVmGroup |        -1 | INSTANT_VM_GROUP | 2023-05-30 13:48:31 | INACTIVE |
> > >  | 296 |          2 |         0 |          0 |           2 | AutoScaleVmGroup |        -1 | INSTANT_VM_GROUP | 2023-05-30 13:48:37 | INACTIVE |
> > >  | 297 |          2 |         3 |        106 |        9842 | UserVm           |         0 | INSTANT_VM       | 2023-05-30 13:48:44 | ACTIVE   |
> > >  | 298 |          2 |         4 |        106 |        9842 | UserVm           |         0 | INSTANT_VM       | 2023-05-30 13:48:44 | ACTIVE   |
> > >  +-----+------------+-----------+------------+-------------+------------------+-----------+------------------+---------------------+----------+
> > >
> > >  Regards,
> > >
> > >  Ricardo Pertuz
> > >
> > >  May 30, 2023 at 2:39 PM, "Wei ZHOU" <[email protected]> wrote:
> > >
> > >  Hi Ricardo,
> > >
> > >  We (including dev and QA) have done intensive testing with different
> > >  hypervisors and scenarios. You may have hit a bug, but a
> > >  misconfiguration is more likely.
> > >
> > >  You can check with the following steps:
> > >  (1) check database table "autoscale_vmgroup_statistics" to see if the
> > >  metrics have been collected with correct value and frequency.
> > >  (2) check management-server.log to see if cloudstack checks the
> > >  metrics periodically.
> > >
> > >  I suggest you test with a small threshold. The CPU usage is collected
> > >  from the KVM hypervisor and calculated from the CPU time of the VM,
> > >  so it might differ considerably from what you expect.
> > >
> > >  -Wei
> > >
> > >  On Tuesday, 30 May 2023, Ricardo Pertuz <[email protected]>
> > >  wrote:
> > >
> > >  >
> > >  > Hi,
> > >  >
> > >  > On our env with ACS 4.18 and the KVM hypervisor, we have configured an
> > >  > autoscale VM group with a CPU average counter. However, it does not
> > >  > trigger the scale-up even though the threshold has been exceeded for
> > >  > longer than the stipulated duration. What should we check? Are we
> > >  > missing something?
> > >  >
> > >  > Min Instances 1 (always remains in 1 instance)
> > >  > Max Instances 3
> > >  >
> > >  > Ricardo Pertuz
> > >  >
> > >
> >
