Hi Rene,

- On the general issue of slow iptables rule application: I think we need to fix that. Does increasing the aggregation timeouts help?

- If waiting for ssh and apache2 as part of post-init solves the issue, this would require a new systemvmtemplate, as the systemd scripts cannot be changed or take effect during first boot.

- I think the additional NICs have always shown up for VMware; there is a global setting to configure this (extra NICs for VMware, probably because older versions did not support dynamic NIC addition on VMware VRs).

- For VR timeouts, see the logs and check whether you can SSH from the management server host into the VR using its private IP and port 3922. See the troubleshooting wiki:
  https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM%2C+templates%2C+Secondary+storage+troubleshooting

- Can you share/check which processes are consuming the RAM? 256 MB RAM is usually enough for non-redundant VRs (share the output of top, or check using htop). Make sure to use a recent Linux version (any Debian variant such as Debian 8, 9 or Ubuntu 16.04+ may also work). The issue is that vCenter/ESXi 6.5, for some reason, gives less RAM than 6.0 and 5.5 and has poor support for legacy OSes. I found this issue while testing redundant VRs, which usually take more RAM than normal VRs.

- Rohit

<https://cloudstack.apache.org>

________________________________
From: Rene Moser <m...@renemoser.net>
Sent: Monday, February 26, 2018 11:22:27 AM
To: us...@cloudstack.apache.org; dev@cloudstack.apache.org
Subject: Re: [4.11] Management to VR connection issues

Hi again

We found the main problem.

== cloud-postinit hang

Having many iptables rules resulted in cloud-postinit hanging for 10 min until it was killed by systemd. As a result, the ssh daemon was not started for 10 min, because it is configured to be started after cloud-postinit.

It seems the issue was already fixed by
https://github.com/apache/cloudstack/commit/ce67726c6d3db6e7db537e76da6217c5d5f4b10e

== VR still needs manual reboot

However, we still notice adapter changes after a reboot: see the before/after screenshots of "ip addr" in https://photos.app.goo.gl/9XsjOJjLqQ9SRjYV2. We still need to manually reboot the VR to get the network actually working.

== VR has too many adapters?

Next, we noticed that there are many network adapters (NICs) for this non-VPC router (see the screenshot of the vCenter in https://photos.app.goo.gl/9XsjOJjLqQ9SRjYV2). Adapters 4 and 5 seem unnecessary. Any comments on that?

== VR with 256 MB RAM does not work

The next issue we found is that the VR must have more than 256 MB RAM. Otherwise systemd will complain that the daemon cannot be reloaded, because the RAM disk at /run has too little space.

Feb 23 16:24:36 r-413-VM postinit.sh[1089]: Failed to reload daemon: Refusing to reload, not enough space available on /run/systemd. Currently, 8.6M are free, but a safety buffer of 16.0M is enforced.

root@r-413-VM:~# df -h /run/
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            16M  7.2M  8.7M  46% /run

Increasing to 512 MB RAM helped:

root@r-413-VM:~# df -h /run/
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            41M  7.8M   34M  19% /run

Unsure if this can be tuned at the systemd level; we didn't find a way yet.

== VR API Command timeouts

When executing a command related to the VR, e.g. restart network or start/stop router, the command won't reach the vCenter API and times out. We are not yet sure why.

== VR minor fixes

Along the way we fixed 2 minor things:

* rsyslogd config syntax issue
* IMHO we should start apache2 also after cloud-postinit

Also see https://github.com/apache/cloudstack/pull/2468

Regards
René

rohit.ya...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS UK
@shapeblue
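
For the two checks discussed in this thread (free space on /run versus the 16.0M safety buffer from the systemd error, and reachability of the VR on port 3922), here is a minimal Python sketch. The function names are hypothetical, the `>=` comparison is an assumption about how the buffer is enforced, and the port check only confirms that a TCP connection opens, not that SSH login works.

```python
import os
import socket

# Safety buffer quoted in the systemd error message in this thread (16.0M).
SAFETY_BUFFER_BYTES = 16 * 1024 * 1024


def free_bytes(path="/run"):
    """Return the bytes available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def has_reload_headroom(free, buffer=SAFETY_BUFFER_BYTES):
    """Return True if free space meets the buffer (the >= comparison is an assumption)."""
    return free >= buffer


def port_open(host, port=3922, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run on the VR, `has_reload_headroom(free_bytes("/run"))` compares the current free space with the buffer. Run on the management server, `port_open(vr_private_ip)` tests the connection, where `vr_private_ip` is a placeholder for the router's private IP.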