GitHub user wilderrodrigues opened a pull request: https://github.com/apache/cloudstack/pull/940

    CLOUDSTACK-8952 - The redundant routers are facing a race condition due to several KeepaliveD/ConntrackD restarts

    This PR fixes the following issues:

    * KeepAliveD being restarted for each action performed on the routers
    * ConntrackD configuration being copied for each action performed on the routers, causing several restarts
    * ACS Management Server relying on the JSON file to report which router is Master/Backup
    * Public interface on both routers being in the UP state, due to several places checking if the interface is UP/DOWN and trying to do KeepAliveD
    * Removing all the sleeps from test_vpc_redundant.py - those are no longer needed
    * When KeepAliveD calls master.py during the election, update the cmdline.json to set the router in Backup mode: the election will take care of changing it afterwards.
    * Add LB stats_rules to iptables INPUT chain
    * The RVR public interface is set to eth2 instead of eth1 - as in the rVPC. Make sure the check works in both cases

    These fixes make all the routers very stable, with ACL, FW, PF and LB working just fine!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ekholabs/cloudstack fix/rvr__keepalived_restart

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #940

----
commit 08b983fe022d309c5f49f776cce7c2b4a3f01cfd
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-14T09:21:53Z

    CLOUDSTACK-8952 - Remove the '--vrrp' search criteria from the CsProcess constructor call

       - There is no such process, which makes CsProcess.find return false and restart keepalived all the time.

commit 5a216056b5a325b8abbe6f7c20f98caf202a27bc
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-14T12:13:24Z

    CLOUDSTACK-8952 - Do not replace the conntrackd config file unless it's needed

       - With the new logic, the file will be replaced when the router starts, because the default conntrackd config file will be different.

commit b4920aa028e75c64160988113ac268e5ea5ae69e
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-14T12:24:11Z

    CLOUDSTACK-8952 - Do not restart conntrackd unless it's needed

       - With keepalived fixed, the restarts should not be needed anymore, so first reducing them drastically
       - I am now making a backup of the template file, writing to the template file and comparing it with the existing configuration
       - The template file is recovered after the process
       - I also check if the process is running
       - I fixed a bug in the compare method
       - I am now updating the configuration variable once the file content is flushed to disk

commit d762dc8579a3ee40c762559d62affdf44194e853
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-15T10:44:28Z

    CLOUDSTACK-8952 - The public interface was coming UP in the Backup router

       - There were too many places trying to put the pub interface UP. I centralised it now.

commit 1886c4a1b33c2cd75bd5e49626943b5526894bc6
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-15T10:44:54Z

    CLOUDSTACK-8952 - Make sure we restart dnsmasq if the configuration file changes

       - It was working before because the routers were restarting about 10 times for each operation, e.g. adding a VM to a network or acquiring a new IP.
       - Adding stat_rules of internal LB to iptables. We needed one extra rule in the INPUT chain.

commit 2b286ecd730763a472fff2071a8fd7166692e11f
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-15T14:43:29Z

    CLOUDSTACK-8952 - Make sure the calls to CsFile use the new logic of commit/is_changed methods

       - We now have to check if the file changed before committing. It doesn't make sense to write to disk if there was no change.

commit c7671f3cdd4cb1b52ff44b44288cb843098bccde
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-15T16:31:03Z

    CLOUDSTACK-8952 - Restart dnsmasq every time configure.py runs

commit 41f4d8b58a337dc97526f2acb551c854b3432177
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-16T09:55:31Z

    CLOUDSTACK-8952 - Make the check for master more reliable

       - Do not use the API call because it will read what is in the database, which might not have been updated yet
         * Check the status in the router directly instead
       - Remove all the sleeps

commit 5b3c99031ffa1e2f73fc839d054cb88f6abd802b
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-17T06:09:52Z

    CLOUDSTACK-8952 - Do not rely on the router state in the JSON file to report back to ACS

       - If we stop/start a router, the state in the file will still say MASTER when it actually is not
       - Checking the state based on the interface (eth1) state
       - Once master.py is called by keepalived, save the state in the JSON file as BACKUP just to make sure it's also written there

commit 2a747ca73538325fb24b3eefb95197bc1f8c6222
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-17T10:09:26Z

    CLOUDSTACK-8952 - Reduce retries from 20 to 5

       - We do not need to retry that much

commit 38d03576d61d1ddac8f29b962d9d30bc45d7a39b
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-17T12:47:05Z

    CLOUDSTACK-8952 - Make the tests rely on the interface state rather than the JSON file

commit fb33cb28aba7bfc829651e8881a9a6afa6a70a76
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-17T12:48:08Z

    CLOUDSTACK-8952 - Make the checkrouter.sh compatible with RVR as well

----
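
For reviewers who want the "only write and restart when the file changed" idea from the commits above in concrete form, here is a minimal sketch. It is not the actual CsFile code from this PR: the class `SketchFile`, the helper `apply_config` and the `restart_service` callback are hypothetical names made up for illustration. Only the method names `is_changed` and `commit` are taken from the commit messages.

```python
import os


class SketchFile:
    """Hypothetical stand-in for a config-file wrapper; NOT the real CsFile."""

    def __init__(self, path):
        self.path = path
        self.on_disk = self._read()         # content currently on disk
        self.pending = list(self.on_disk)   # content we intend to write

    def _read(self):
        # A missing file is treated as an empty configuration.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as handle:
            return handle.read().splitlines()

    def set_lines(self, lines):
        self.pending = list(lines)

    def is_changed(self):
        # True only when the pending content differs from what is on disk.
        return self.pending != self.on_disk

    def commit(self):
        """Write only when needed; return True if the file was rewritten."""
        if not self.is_changed():
            return False
        with open(self.path, "w") as handle:
            handle.write("\n".join(self.pending) + "\n")
        # Remember the new content once it has been written out.
        self.on_disk = list(self.pending)
        return True


def apply_config(path, lines, restart_service):
    """Commit the new lines and restart the service only if the file changed."""
    config = SketchFile(path)
    config.set_lines(lines)
    if config.commit():
        restart_service()
        return True
    return False
```

Calling `apply_config` twice with the same lines triggers `restart_service` only on the first call, which is the behaviour the PR aims for with keepalived, conntrackd and dnsmasq: repeated configuration runs should not cause repeated restarts.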