GitHub user remibergsma opened a pull request:

    https://github.com/apache/cloudstack/pull/1486

    Reimplement router.redundant.vrrp.interval setting

    The global setting `router.redundant.vrrp.interval` is no longer used; the 
VRRP advertisement interval is now hardcoded to 1 second. 
    
    This results in a failover from master to backup when the backup doesn't 
hear from the master for ~3.6 seconds. That is a bit too tight: we've seen 
failovers during live migrations and could reproduce them in about half of the 
cases. Setting the interval to 2 (tested by hardcoding it in the systemvms) 
gives roughly twice as much time, and we didn't see the issue any more. Instead 
of changing the hardcoded value from 1 to 2, I reimplemented the global setting 
by sending it to the router with the cmd_line, as the non-VPC router also does.
    
    Background:
    Why is the maximum failover time in the example 3.6 seconds? This comes 
from the advertisement interval and the skew time. The default advertisement 
interval is 1 second (configurable in keepalived.conf). The skew time helps to 
keep all backups from trying to transition at once. It is a number between 0 
and 1, given by the formula (256 - priority) / 256.
    
    As defined in the RFC, the backup must receive an advertisement from the 
master every (3 * advert_int) + skew_time seconds. If it doesn't hear anything 
from the master, it takes over. With a backup router priority of 100 (as in the 
example), the failover will happen at most 3.6 seconds after the master goes 
down.
    
    Source: http://www.hollenback.net/KeepalivedForNetworkReliability
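
As a quick check of the arithmetic above, here is a minimal Python sketch of the two formulas. The function names are made up for illustration and are not part of CloudStack or keepalived; the priority of 100 is the one from the example.

```python
def skew_time(priority: int) -> float:
    """Skew time in seconds, per the formula (256 - priority) / 256."""
    # Assumption: valid VRRP priorities are 1..255.
    if not 1 <= priority <= 255:
        raise ValueError("priority must be between 1 and 255")
    return (256 - priority) / 256


def master_down_interval(advert_int: float, priority: int) -> float:
    """Seconds the backup waits without advertisements before taking over."""
    return 3 * advert_int + skew_time(priority)


for advert_int in (1, 2):
    seconds = master_down_interval(advert_int, priority=100)
    print(f"advert_int={advert_int}: failover after at most {seconds:.2f}s")
# advert_int=1: failover after at most 3.61s
# advert_int=2: failover after at most 6.61s
```

With an interval of 2 the backup waits about 6.6 seconds instead of 3.6, which is the extra headroom described above.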


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/remibergsma/cloudstack reimplement-vrrp-setting-47

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/1486.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1486
    
----
commit c33358db848faf8c8891e00e0100a2627b177407
Author: Remi Bergsma <git...@remi.nl>
Date:   2016-03-23T15:33:20Z

    Have rVPCs use the router.redundant.vrrp.interval setting
    
    It defaults to 1, which is hardcoded in the template:
    
./cosmic/cosmic-core/systemvm/patches/debian/config/opt/cloud/templates/keepalived.conf.templ
    
    As non-VPC redundant routers use this setting, I think it makes sense to 
use it for rVPCs as well.
    
    We also need a change to pick up the cmd_line parameter and use it in the 
Python code that configures the router.
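
For illustration only, picking up such a parameter could look roughly like the sketch below. It is not the code from this pull request: the key name `advert_int`, the space-separated `key=value` format, the helper names, and the example string are all assumptions.

```python
DEFAULT_ADVERT_INT = 1  # same value the template hardcodes today


def parse_cmd_line(cmd_line: str) -> dict:
    """Split a space-separated 'key=value' string into a dict (assumed format)."""
    params = {}
    for token in cmd_line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without '='
            params[key] = value
    return params


def get_advert_int(cmd_line: str) -> int:
    """Return the advertisement interval, falling back to the default."""
    raw = parse_cmd_line(cmd_line).get("advert_int")  # hypothetical key name
    try:
        value = int(raw)
    except (TypeError, ValueError):  # key missing or not a number
        return DEFAULT_ADVERT_INT
    return value if value >= 1 else DEFAULT_ADVERT_INT


# Hypothetical cmd_line, not copied from a real router.
example = "name=r-42-VM redundant_router=1 advert_int=2"
print(get_advert_int(example))  # 2
```

Falling back to 1 when the parameter is missing or invalid keeps the current behaviour for routers that were started without it.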

commit 408478413ad0469265dfa0ce9101d6337f558ab2
Author: Remi Bergsma <git...@remi.nl>
Date:   2016-03-23T15:56:54Z

    Configure rVPC for router.redundant.vrrp.interval advert_int setting

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
