Public bug reported: Scheduling ha routers end with two routers in master state. Issue discovered in that bug fix - https://review.openstack.org/#/c/273546 - after preparing new functional test.
ha_router.py in method - _get_state_change_monitor_callback() is starting a neutron-keepalived-state-change process with parameter --monitor-interface as ha_device (ha-xxx) and it's IP address. That application is monitoring using "ip netns exec xxx ip -o monitor address" all changes in that namespace. Each addition of that ha-xxx device produces a call to neutron-server API that this router becomes "master". It's producing false results because that device doesn't tell anything about that router is master or not. Logs from test_ha_router.L3HATestFailover.test_ha_router_lost_gw_connection Agent2: 2016-05-10 16:23:20.653 16067 DEBUG neutron.agent.linux.async_process [-] Launching async process [ip netns exec qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1@agent2 ip -o monitor address]. start /neutron/neutron/agent/linux/async_process.py:109 2016-05-10 16:23:20.654 16067 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1@agent2', 'ip', '-o', 'monitor', 'address'] create_process /neutron/neutron/agent/linux/utils.py:82 2016-05-10 16:23:20.661 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Monitor: ha-8aedf0c6-2a, 169.254.0.1/24 run /neutron/neutron/agent/l3/keepalived_state_change.py:59 2016-05-10 16:23:20.661 16067 INFO neutron.agent.linux.daemon [-] Process runs with uid/gid: 1000/1000 2016-05-10 16:23:20.767 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qr-88c93aa9-5a, fe80::c8fe:deff:fead:beef/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:20.901 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qg-814d252d-26, fe80::c8fe:deff:fead:beee/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:21.324 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-8aedf0c6-2a, fe80::2022:22ff:fe22:2222/64, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:29.807 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-8aedf0c6-2a, 169.254.0.1/24, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:29.808 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Wrote router 962f19e6-f592-49f7-8bc4-add116c0b7a3 state master write_state_change /neutron/neutron/agent/l3/keepalived_state_change.py:87 2016-05-10 16:23:29.808 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] State: master notify_agent /neutron/neutron/agent/l3/keepalived_state_change.py:93 Agent1: 2016-05-10 16:23:19.417 15906 DEBUG neutron.agent.linux.async_process [-] Launching async process [ip netns exec qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1 ip -o monitor address]. start /neutron/neutron/agent/linux/async_process.py:109 2016-05-10 16:23:19.418 15906 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1', 'ip', '-o', 'monitor', 'address'] create_process /neutron/neutron/agent/linux/utils.py:82 2016-05-10 16:23:19.425 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Monitor: ha-22a4d1e0-ad, 169.254.0.1/24 run /neutron/neutron/agent/l3/keepalived_state_change.py:59 2016-05-10 16:23:19.426 15906 INFO neutron.agent.linux.daemon [-] Process runs with uid/gid: 1000/1000 2016-05-10 16:23:19.525 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qr-88c93aa9-5a, fe80::c8fe:deff:fead:beef/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:19.645 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qg-814d252d-26, fe80::c8fe:deff:fead:beee/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:19.927 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-22a4d1e0-ad, fe80::1034:56ff:fe78:2b5d/64, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:28.543 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-22a4d1e0-ad, 169.254.0.1/24, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:28.544 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Wrote router 962f19e6-f592-49f7-8bc4-add116c0b7a3 state master write_state_change /neutron/neutron/agent/l3/keepalived_state_change.py:87 2016-05-10 16:23:28.544 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] State: master notify_agent /neutron/neutron/agent/l3/keepalived_state_change.py:93 Tox logs: > /neutron/neutron/tests/functional/agent/l3/test_ha_router.py(296)test_ha_router_lost_gw_connection() -> utils.wait_until_true(lambda: router1.ha_state == 'master') (Pdb) router1.ha_state, router2.ha_state ('master', 'master') (Pdb) ('master', 'master') (Pdb) ('master', 'master') ** Affects: neutron Importance: Undecided Assignee: Lubosz Kosnik (diltram) Status: New ** Tags: l3-ha ** Changed in: neutron Assignee: (unassigned) => Lubosz Kosnik (diltram) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1580648 Title: Two HA routers in master state Status in neutron: New Bug description: Scheduling ha routers end with two routers in master state. Issue discovered in that bug fix - https://review.openstack.org/#/c/273546 - after preparing new functional test. ha_router.py in method - _get_state_change_monitor_callback() is starting a neutron-keepalived-state-change process with parameter --monitor-interface as ha_device (ha-xxx) and it's IP address. That application is monitoring using "ip netns exec xxx ip -o monitor address" all changes in that namespace. Each addition of that ha-xxx device produces a call to neutron-server API that this router becomes "master". It's producing false results because that device doesn't tell anything about that router is master or not. Logs from test_ha_router.L3HATestFailover.test_ha_router_lost_gw_connection Agent2: 2016-05-10 16:23:20.653 16067 DEBUG neutron.agent.linux.async_process [-] Launching async process [ip netns exec qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1@agent2 ip -o monitor address]. start /neutron/neutron/agent/linux/async_process.py:109 2016-05-10 16:23:20.654 16067 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1@agent2', 'ip', '-o', 'monitor', 'address'] create_process /neutron/neutron/agent/linux/utils.py:82 2016-05-10 16:23:20.661 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Monitor: ha-8aedf0c6-2a, 169.254.0.1/24 run /neutron/neutron/agent/l3/keepalived_state_change.py:59 2016-05-10 16:23:20.661 16067 INFO neutron.agent.linux.daemon [-] Process runs with uid/gid: 1000/1000 2016-05-10 16:23:20.767 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qr-88c93aa9-5a, fe80::c8fe:deff:fead:beef/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:20.901 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qg-814d252d-26, fe80::c8fe:deff:fead:beee/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:21.324 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-8aedf0c6-2a, fe80::2022:22ff:fe22:2222/64, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:29.807 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-8aedf0c6-2a, 169.254.0.1/24, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:29.808 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] Wrote router 962f19e6-f592-49f7-8bc4-add116c0b7a3 state master write_state_change /neutron/neutron/agent/l3/keepalived_state_change.py:87 2016-05-10 16:23:29.808 16067 DEBUG neutron.agent.l3.keepalived_state_change [-] State: master notify_agent /neutron/neutron/agent/l3/keepalived_state_change.py:93 Agent1: 2016-05-10 16:23:19.417 15906 DEBUG neutron.agent.linux.async_process [-] Launching async process [ip netns exec qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1 ip -o monitor address]. start /neutron/neutron/agent/linux/async_process.py:109 2016-05-10 16:23:19.418 15906 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'qrouter-962f19e6-f592-49f7-8bc4-add116c0b7a3@agent1', 'ip', '-o', 'monitor', 'address'] create_process /neutron/neutron/agent/linux/utils.py:82 2016-05-10 16:23:19.425 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Monitor: ha-22a4d1e0-ad, 169.254.0.1/24 run /neutron/neutron/agent/l3/keepalived_state_change.py:59 2016-05-10 16:23:19.426 15906 INFO neutron.agent.linux.daemon [-] Process runs with uid/gid: 1000/1000 2016-05-10 16:23:19.525 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qr-88c93aa9-5a, fe80::c8fe:deff:fead:beef/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:19.645 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: qg-814d252d-26, fe80::c8fe:deff:fead:beee/64, False parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:19.927 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-22a4d1e0-ad, fe80::1034:56ff:fe78:2b5d/64, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:28.543 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Event: ha-22a4d1e0-ad, 169.254.0.1/24, True parse_and_handle_event /neutron/neutron/agent/l3/keepalived_state_change.py:73 2016-05-10 16:23:28.544 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] Wrote router 962f19e6-f592-49f7-8bc4-add116c0b7a3 state master write_state_change /neutron/neutron/agent/l3/keepalived_state_change.py:87 2016-05-10 16:23:28.544 15906 DEBUG neutron.agent.l3.keepalived_state_change [-] State: master notify_agent /neutron/neutron/agent/l3/keepalived_state_change.py:93 Tox logs: > /neutron/neutron/tests/functional/agent/l3/test_ha_router.py(296)test_ha_router_lost_gw_connection() -> utils.wait_until_true(lambda: router1.ha_state == 'master') (Pdb) router1.ha_state, router2.ha_state ('master', 'master') (Pdb) ('master', 'master') (Pdb) ('master', 'master') To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1580648/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp