Hey everyone,
I figured it out. A faulty SFP was throttling storage IOPS, so the VRs
could not write to their log directory, which cascaded into the DHCP outage.
Best regards,
Jordan
-----Original Message-----
From: Yordan Kostov <[email protected]>
Sent: 09 August 2021 14:50
To: [email protected]
Subject: slow vm start and dhcp log full?
Hello everyone,
CloudStack 4.15 + XCP-ng 8.2 + virtual router template 4.15. We
have only about 15 VMs running, mostly for backup tests or for people
trying it out.
Recently I noticed considerable sluggishness in our environment.
It took about 5-10 minutes to create a new VM or start an existing one.
One of our networks stopped creating VMs; it seems the
virtual router was not handing out addresses.
After some troubleshooting I found the following issues:
* The virtual router that did not hand out IP addresses had its
/run/log/journal directory fill the whole /run partition with logs. It
seems that when this happens the router stops handing out IP addresses.
* The same virtual router and one more were putting heavy load on the storage
(20-25 MB/s), squeezing out all the IOPS they could get.
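The first finding can be checked on a VR with something like the sketch below. It assumes a POSIX shell with `df`, `awk` and `journalctl` available on the router; the helper names `fs_usage_pct` and `is_nearly_full`, the 90 percent threshold and the 16M vacuum size are illustrative choices, not anything CloudStack ships.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the
# capacity figure of the first filesystem line, without the % sign.
fs_usage_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: exit 0 when usage ($1) is at or above threshold ($2).
is_nearly_full() {
    [ "$1" -ge "$2" ]
}

# Example run on the VR (threshold and vacuum size are assumptions):
#   pct=$(df -P /run | fs_usage_pct)
#   if is_nearly_full "$pct" 90; then
#       journalctl --disk-usage        # space taken by journal files
#       journalctl --vacuum-size=16M   # trim archived journal files
#   fi
```

Vacuuming only frees space after the fact; it does not address whatever is generating the log volume.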
Let's say issue number one is by design. What causes issue number two?
VR logs ( journalctl -p 3 -x --file
/run/log/journal/5212989feea04bb6b13843e7b0c9d2b3/system.journal ) show this
issue repeating:
Aug 09 11:41:22 r-39-VM systemd[1]: Failed to start User Manager for UID 0.
-- Subject: A start job for unit user@0.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit user@0.service has finished with a failure.
--
-- The job identifier is 588 and the job result is failed.
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM _pam_load_conf_file: unable to open config for /etc/pam.d/null
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM error loading (null)
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM _pam_init_handlers: error reading /etc/pam.d/systemd-user
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM _pam_init_handlers: [Critical error - immediate abort]
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM error reading PAM configuration file
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM pam_start: failed to initialize handlers
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM failed: Critical error - immediate abort
Aug 09 11:41:29 r-39-VM systemd[1607]: user@0.service: Failed to set up PAM session: Operation not permitted
Aug 09 11:41:29 r-39-VM systemd[1607]: user@0.service: Failed at step PAM spawning /lib/systemd/systemd: Operation not permitted
-- Subject: Process /lib/systemd/systemd could not be executed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The process /lib/systemd/systemd could not be executed and failed.
--
-- The error number returned by this process is ERRNO.
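To see how often each error repeats, the message text can be tallied. This is a rough sketch, assuming `journalctl -o cat` (message text only) is run against the same journal file as above; the helper name `count_repeats` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count identical lines on stdin and list the
# most frequent ones first.
count_repeats() {
    sort | uniq -c | sort -rn
}

# Example run on the VR:
#   journalctl -p 3 -o cat --file \
#     /run/log/journal/5212989feea04bb6b13843e7b0c9d2b3/system.journal \
#     | count_repeats | head
```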
After rebooting the VMs things are back to normal, at least for
now.
Any advice on why the VRs behave like this and why PAM is
complaining?
Best regards,
Jordan