GitHub user wilderrodrigues opened a pull request:

    https://github.com/apache/cloudstack/pull/940

    CLOUDSTACK-8952 - The redundant routers are facing a race condition due to 
several KeepaliveD/ConntrackD restarts

    This PR fixes the following issues:
    
    * KeepAliveD being restarted for each action performed on the routers
    * ConntrackD configuration being copied for each action performed on the 
routers, causing several restarts
    * ACS Management Server relying on the JSON file to report which router is 
Master/Backup
    * Public interface on both routers being in the UP state, due to several places 
checking if the interface is UP/DOWN and trying to do KeepAliveD
    * Removing all the sleeps from the test_vpc_redundant.py - those are no 
longer needed
    * When KeepAliveD calls master.py during the election, update the 
cmdline.json to set the router in Backup mode: the election will take care of 
changing it afterwards.
    * Add LB stats_rules to iptables INPUT chain
    * The RVR public interface is set to eth2 instead of eth1 - as in the rVPC. 
Make sure the check works in both cases
    
    Those fixes make all the routers very stable, with ACL, FW, PF and LB 
working just fine!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ekholabs/cloudstack fix/rvr__keepalived_restart

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #940
    
----
commit 08b983fe022d309c5f49f776cce7c2b4a3f01cfd
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-14T09:21:53Z

    CLOUDSTACK-8952 - Remove the '--vrrp' search criteria from the CsProcess 
constructor call
    
       - There is no such process, so CsProcess.find returns false and 
keepalived is restarted every time.
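
The effect of the extra search term can be sketched as follows. This is only an illustration: `find_process` and the sample command lines are hypothetical and are not the actual CsProcess code. The sketch assumes the lookup requires every search term to appear in a process command line.

```python
def find_process(search_terms, command_lines):
    """Return True if any command line contains every search term."""
    return any(
        all(term in cmdline for term in search_terms)
        for cmdline in command_lines
    )


# Hypothetical process list on a router where keepalived is running.
running = [
    "/usr/sbin/keepalived",
    "/usr/sbin/conntrackd -d",
]

# No command line contains '--vrrp', so the lookup fails and a restart follows.
print(find_process(["keepalived", "--vrrp"], running))  # False
# Without the extra term the running process is found.
print(find_process(["keepalived"], running))            # True
```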

commit 5a216056b5a325b8abbe6f7c20f98caf202a27bc
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-14T12:13:24Z

    CLOUDSTACK-8952 - Do not replace the conntrackd config file unless it's 
needed
    
       - With the new logic, the file will be replaced when the router starts, 
because the default
         conntrackd config file will be different.

commit b4920aa028e75c64160988113ac268e5ea5ae69e
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-14T12:24:11Z

    CLOUDSTACK-8952 - Do not restart conntrackd unless it's needed
    
       - With keepalived fixed, the restarts should not be needed anymore, so first 
reducing them drastically
       - I am now making a backup of the template file, writing to the template 
file and comparing it with the existing configuration
       - The template file is recovered after the process
       - I also check if the process is running
       - I fixed a bug in the compare method
       - I am now updating the configuration variable once the file content is 
flushed to disk

commit d762dc8579a3ee40c762559d62affdf44194e853
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-15T10:44:28Z

    CLOUDSTACK-8952 - The public interface was coming UP in the Backup router
    
       - There were too many places trying to put the pub interface UP. I 
centralised it now.

commit 1886c4a1b33c2cd75bd5e49626943b5526894bc6
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-15T10:44:54Z

    CLOUDSTACK-8952 - Make sure we restart dnsmasq if the configuration file 
changes
    
       - It was working before because the Routers were restarting about 10 
times for each operation
         e.g. adding a VM to a network or acquiring a new IP.
       - Adding stat_rules of internal LB to iptables
         We needed one extra rule in the INPUT chain

commit 2b286ecd730763a472fff2071a8fd7166692e11f
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-15T14:43:29Z

    CLOUDSTACK-8952 - Make sure the calls to CsFile use the new logic of 
commit/is_changed methods
    
       - We now have to check if the file changed before committing. It doesn't 
make sense to write to disk if there was no change.
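
A minimal sketch of the check-before-commit idea, under stated assumptions: `ConfigFile` is a hypothetical stand-in and not the real CsFile class, and the file name and config line are made up for the example.

```python
import os
import tempfile


class ConfigFile:
    """Hypothetical stand-in for a CsFile-like wrapper (not the real class)."""

    def __init__(self, path):
        self.path = path
        self.on_disk = []
        if os.path.exists(path):
            with open(path) as handle:
                self.on_disk = handle.read().splitlines()
        # Callers edit new_config; on_disk remembers what was last written.
        self.new_config = list(self.on_disk)

    def is_changed(self):
        return self.new_config != self.on_disk

    def commit(self):
        """Write to disk only if the content changed; return True if written."""
        if not self.is_changed():
            return False
        with open(self.path, "w") as handle:
            handle.write("\n".join(self.new_config) + "\n")
        # Update the in-memory copy only after the content has been written.
        self.on_disk = list(self.new_config)
        return True


path = os.path.join(tempfile.mkdtemp(), "example.conf")
cfg = ConfigFile(path)
cfg.new_config.append("example-option=1")
first = cfg.commit()   # content differs from disk, so the file is written
second = cfg.commit()  # nothing changed since the last write, so it is skipped
print(first, second)   # True False
```

A caller could use the return value of `commit()` to decide whether a service restart is needed at all.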

commit c7671f3cdd4cb1b52ff44b44288cb843098bccde
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-15T16:31:03Z

    CLOUDSTACK-8952 - Restart dnsmasq every time configure.py runs

commit 41f4d8b58a337dc97526f2acb551c854b3432177
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-16T09:55:31Z

    CLOUDSTACK-8952 - Make the check for master more reliable
    
       - Do not use the API call because it will read what is in the database, 
that might not have been updated yet
         * Check the status in the router directly instead
       - Remove all the sleeps

commit 5b3c99031ffa1e2f73fc839d054cb88f6abd802b
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-17T06:09:52Z

    CLOUDSTACK-8952 - Do not rely on the router state in the json file to 
report back to ACS
    
       - If we stop/start a router, the state in the file will still say 
MASTER, when it is actually not
       - Checking the state based on the interface (eth1) state
       - Once master.py is called by keepalived, save the state in the json 
file to BACKUP just to make sure it's also written there
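
Deriving the state from the interface rather than from the file could look roughly like this. It is a sketch under assumptions: `router_state_from_link` is a hypothetical helper, the sample lines only imitate the first line of `ip addr show` output, and the real scripts may perform the check differently.

```python
def router_state_from_link(ip_output):
    """Map the link state of the public interface to a router role.

    Assumes the text contains a 'state <VALUE>' pair, as in the first line
    printed by `ip addr show <interface>`.
    """
    tokens = ip_output.split()
    if "state" in tokens:
        idx = tokens.index("state")
        if idx + 1 < len(tokens) and tokens[idx + 1] == "UP":
            return "MASTER"
    return "BACKUP"


# Illustrative sample lines, not captured from a real router.
up_line = "3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000"
down_line = "3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000"

print(router_state_from_link(up_line))    # MASTER
print(router_state_from_link(down_line))  # BACKUP
```

Because the helper only looks at the text it is given, the same check applies whether the public interface is eth1 or eth2.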

commit 2a747ca73538325fb24b3eefb95197bc1f8c6222
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-17T10:09:26Z

    CLOUDSTACK-8952 - Reduce retries from 20 to 5
    
       - We do not need to retry that much

commit 38d03576d61d1ddac8f29b962d9d30bc45d7a39b
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-17T12:47:05Z

    CLOUDSTACK-8952 - Make the tests rely on the interface state rather than the 
json file

commit fb33cb28aba7bfc829651e8881a9a6afa6a70a76
Author: Wilder Rodrigues <wrodrig...@schubergphilis.com>
Date:   2015-10-17T12:48:08Z

    CLOUDSTACK-8952 - Make the checkrouter.sh compatible with RVR as well

----

