GitHub user remibergsma opened a pull request: https://github.com/apache/cloudstack/pull/1486
Reimplement router.redundant.vrrp.interval setting

Global setting `router.redundant.vrrp.interval` is no longer used; the value is now hardcoded to 1. This results in a failover from master to backup when the backup doesn't hear from the master for ~3.6 seconds. This is a bit too tight, as we've seen failovers during live migrations. We could reproduce it in about half of the cases. Setting this to 2 (tested by hardcoding it in the systemvms) gives about twice as much time, and we didn't see issues any more.

Instead of updating the hardcoded setting from 1 to 2, I reimplemented the global setting by sending it to the router with the cmd_line, as the non-VPC router also does.

Background:

Why is the maximum failover time in the example 3.6 seconds? This comes from the advertisement interval and the skew time. The default advertisement interval is 1 second (configurable in keepalived.conf). The skew time helps to keep everyone from trying to transition at once. It is a number between 0 and 1, based on the formula (256 - priority) / 256.

As defined in the RFC, the backup must receive an advertisement from the master every (3 * advert_int) + skew_time seconds. If it doesn't hear anything from the master, it takes over. With a backup router priority of 100 (as in the example), the failover will happen at most 3.6 seconds after the master goes down.
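To make the arithmetic above concrete, here is a minimal Python sketch of the quoted formula. The function names `skew_time` and `master_down_interval` are made up for illustration and are not part of CloudStack or keepalived; the sketch also assumes the skew time does not depend on the advertisement interval, as in the formula quoted above.

```python
# Illustrative only: reproduces the failover-time arithmetic quoted above.
# Function names are hypothetical, not CloudStack or keepalived APIs.

def skew_time(priority: int) -> float:
    """Skew time in seconds, per the quoted formula (256 - priority) / 256."""
    return (256 - priority) / 256


def master_down_interval(advert_int: float, priority: int) -> float:
    """Maximum time a backup waits before taking over: (3 * advert_int) + skew_time."""
    return 3 * advert_int + skew_time(priority)


if __name__ == "__main__":
    # Backup router priority of 100, as in the example above.
    for advert_int in (1, 2):
        seconds = master_down_interval(advert_int, priority=100)
        print(f"advert_int={advert_int}: failover after at most {seconds:.1f}s")
```

Under these assumptions, an interval of 1 gives about 3.6 seconds and an interval of 2 gives about 6.6 seconds before the backup takes over.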
Source: http://www.hollenback.net/KeepalivedForNetworkReliability

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/remibergsma/cloudstack reimplement-vrrp-setting-47

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/1486.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1486

----
commit c33358db848faf8c8891e00e0100a2627b177407
Author: Remi Bergsma <git...@remi.nl>
Date:   2016-03-23T15:33:20Z

    Have rVPCs use the router.redundant.vrrp.interval setting

    It defaults to 1, which is hardcoded in the template:
    ./cosmic/cosmic-core/systemvm/patches/debian/config/opt/cloud/templates/keepalived.conf.templ

    As non-VPC redundant routers use this setting, I think it makes sense to use it for rVPCs as well.

    We also need a change to pickup the cmd_line parameter and use it in the Python code that configures the router.

commit 408478413ad0469265dfa0ce9101d6337f558ab2
Author: Remi Bergsma <git...@remi.nl>
Date:   2016-03-23T15:56:54Z

    Configure rVPC for router.redundant.vrrp.interval advert_int setting

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---