Re: [4.11] Management to VR connection issues

Rohit Yadav Mon, 26 Feb 2018 03:42:07 -0800

Hi Rene,


- On the general issue of slow iptables rule application: I think we need to 
fix that. Does increasing the aggregation timeouts help?


- If waiting for ssh and apache2 as part of post-init solves the issue, this 
would require a new systemvmtemplate, as the systemd scripts cannot be changed 
or take effect during first boot.


- I think the additional NICs have always shown up on VMware; there is a 
global setting to configure this (extra NICs for VMware, probably because older 
versions did not support dynamic NIC addition on VMware VRs).


- For VR timeouts, check the logs and verify that you can SSH from the 
management server host into the VR using its private IP and port 3922. See the 
troubleshooting wiki: 
https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM%2C+templates%2C+Secondary+storage+troubleshooting
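
The reachability part of that check can be sketched as a small script run on
the management server host. This is only a sketch: it confirms that TCP port
3922 on the VR accepts connections, not that SSH key authentication works,
and the function name can_reach is made up for illustration.

```python
import socket


def can_reach(host, port=3922, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    3922 is the VR SSH port mentioned above; host is the VR's private IP.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

Calling can_reach("<VR private IP>") returns False both for a refused
connection and for a timeout, so a False result still needs the logs to tell
the two cases apart.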


- Can you check and share which processes are consuming the RAM? 256MB RAM is 
usually enough for non-redundant VRs (share the output of top, or check using 
htop). Make sure to use a recent Linux version (any Debian variant such as 
Debian 8, 9 or Ubuntu 16.04+ may also work). The issue is that vCenter/ESXi 
6.5, for some reason, gives lower RAM compared to 6.0 and 5.5 and has poor 
support for legacy OSes. I found this issue while testing redundant VRs, which 
usually take more RAM than normal VRs.
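
If top or htop output is awkward to share, a rough equivalent can be sketched
by reading /proc directly. This assumes a Linux host with Python 3 available;
the helper names parse_rss_kb and top_rss are hypothetical, and resident set
size (VmRSS) is only an approximation of what top reports.

```python
import os
import re


def parse_rss_kb(status_text):
    """Extract (name, VmRSS in kB) from the text of a /proc/<pid>/status file."""
    name = re.search(r"^Name:\s*(.+)$", status_text, re.M)
    rss = re.search(r"^VmRSS:\s*(\d+)\s*kB", status_text, re.M)
    if not name or not rss:
        return None  # e.g. kernel threads, which have no VmRSS line
    return name.group(1).strip(), int(rss.group(1))


def top_rss(limit=10, proc="/proc"):
    """Return up to `limit` tuples (rss_kb, pid, name), largest RSS first."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc, entry, "status")) as fh:
                parsed = parse_rss_kb(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if parsed:
            rows.append((parsed[1], int(entry), parsed[0]))
    rows.sort(reverse=True)
    return rows[:limit]
```

Printing the rows from top_rss() gives a short list that can be pasted into a
reply.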


- Rohit




________________________________
From: Rene Moser <m...@renemoser.net>
Sent: Monday, February 26, 2018 11:22:27 AM
To: us...@cloudstack.apache.org; dev@cloudstack.apache.org
Subject: Re: [4.11] Management to VR connection issues

Hi again

We found the main problem.

== cloud-postinit hang

With many iptables rules, cloud-postinit hung for 10 min until it was
killed by systemd. As a result, the ssh daemon was not started for
10 min, because it is configured to start after cloud-postinit.

It seems the issue was already fixed by
https://github.com/apache/cloudstack/commit/ce67726c6d3db6e7db537e76da6217c5d5f4b10e

== VR still needs manual reboot

However, we still notice adapter changes after a reboot: see the
before/after screenshots of "ip addr" in
https://photos.app.goo.gl/9XsjOJjLqQ9SRjYV2. We still need to manually
reboot the VR to get the network actually working.

== VR has too many adapters?

Next, we noticed that there are many network adapters (NICs) for this
non-VPC router (see the screenshot of vCenter in
https://photos.app.goo.gl/9XsjOJjLqQ9SRjYV2). Adapters 4 and 5 seem
unnecessary. Any comments on that?

== VR with 256 MB RAM does not work

The next issue we found is that the VR must have more than 256MB RAM.
Otherwise systemd complains that the daemon cannot be reloaded, because
the RAM disk at /run has too little free space.

Feb 23 16:24:36 r-413-VM postinit.sh[1089]: Failed to reload daemon:
Refusing to reload, not enough space available on /run/systemd.
Currently, 8.6M are free, but a safety buffer of 16.0M is enforced.
root@r-413-VM:~# df -h /run/
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            16M  7.2M  8.7M  46% /run

Increasing to 512MB RAM helped:

root@r-413-VM:~# df -h /run/
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            41M  7.8M   34M  19% /run

Unsure whether this can be tuned at the systemd level; we have not found a way yet.
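
The condition reported in the log above can be checked ahead of time with a
short sketch like the following. The 16 MiB figure is taken from the log
message; the function names are made up for illustration, and using f_bavail
is an assumption about how closely this matches systemd's own accounting.

```python
import os

# The "safety buffer of 16.0M" quoted in the systemd message above.
SAFETY_BUFFER_BYTES = 16 * 1024 * 1024


def free_bytes(path="/run"):
    """Return the space available to unprivileged users on the filesystem at path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def reload_would_be_refused(free, buffer=SAFETY_BUFFER_BYTES):
    """True if the free space is below the buffer, as in the logged refusal."""
    return free < buffer
```

With the numbers above, 8.6M free is below the buffer and 34M free is not.
Note that a /run of 16M in total can never have a full 16M free once anything
is stored in it, which matches the behaviour seen at 256MB RAM.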

== VR API Command timeouts

When executing commands related to the VR, e.g. restart network or
start/stop router, the command does not reach the vCenter API and times
out. We are not yet sure why.

== VR minor fixes

We also fixed two minor things along the way.

* rsyslogd config syntax issue
* IMHO we should also start apache2 after cloud-postinit

Also see https://github.com/apache/cloudstack/pull/2468

Regards
René

rohit.ya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HS, UK
@shapeblue
