> On 22 June 2016 at 19:40, Assaf Muller <as...@redhat.com> wrote:
>
> On Wed, Jun 22, 2016 at 12:02 PM, fabrice grelaud
> <fabrice.grel...@u-bordeaux.fr <mailto:fabrice.grel...@u-bordeaux.fr>> wrote:
>>
>> On 22 June 2016 at 17:35, fabrice grelaud <fabrice.grel...@u-bordeaux.fr> wrote:
>>
>>
>> On 22 June 2016 at 15:45, Assaf Muller <as...@redhat.com> wrote:
>>
>> On Wed, Jun 22, 2016 at 9:24 AM, fabrice grelaud
>> <fabrice.grel...@u-bordeaux.fr> wrote:
>>
>> Hi,
>>
>> we deployed our OpenStack infrastructure with your "exciting" project
>> openstack-ansible (mitaka 13.1.2), but we have some problems with L3 HA
>> after creating a router.
>>
>> Our infra (close to the doc):
>> 3 controller nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan, br-vlan))
>> 2 compute nodes (same network layout)
>>
>> We created an external network (vlan type), an internal network (vxlan type)
>> and a router connected to both networks.
>> When we launch an instance (cirros), the VM never receives an IP.
>>
>> We have:
>>
>> root@p-osinfra03-utility-container-783041da:~# neutron l3-agent-list-hosting-router router-bim
>> +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
>> | id                                   | host                                          | admin_state_up | alive | ha_state |
>> +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
>> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 | p-osinfra01-neutron-agents-container-f1ab9c14 | True           | :-)   | active   |
>> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 | p-osinfra02-neutron-agents-container-48142ffe | True           | :-)   | active   |
>> | 55350fac-16aa-488e-91fd-a7db38179c62 | p-osinfra03-neutron-agents-container-2f6557f0 | True           | :-)   | active   |
>> +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
>>
>> I know something is already wrong here, because I should see one ":-) active"
>> and two ":-) standby"... Sniff...
>>
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
>> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
>>
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>        valid_lft forever preferred_lft forever
>>     inet6 ::1/128 scope host
>>        valid_lft forever preferred_lft forever
>> 2: ha-4a5f0287-91@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
>>     link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
>>     inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
>>        valid_lft forever preferred_lft forever
>>     inet 169.254.0.1/24 scope global ha-4a5f0287-91
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::f816:3eff:fec2:67a9/64 scope link
>>        valid_lft forever preferred_lft forever
>> 3: qr-44804d69-88@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
>>     link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.100.254/24 scope global qr-44804d69-88
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
>>        valid_lft forever preferred_lft forever
>> 4: qg-c5c7378e-1d@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
>>     link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
>>     inet 147.210.240.11/23 scope global qg-c5c7378e-1d
>>        valid_lft forever preferred_lft forever
>>     inet 147.210.240.12/32 scope global qg-c5c7378e-1d
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::f816:3eff:feb6:4c97/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>> Same result on infra02 and infra03: the qr and qg interfaces carry the same IPs,
>> and every ha interface carries the VIP 169.254.0.1.
>>
>> If we stop two of the neutron agent containers (p-osinfra02, p-osinfra03) and
>> restart the first one (p-osinfra01), we can reboot the instance: it gets an IP
>> and a floating IP, and we can reach the VM over SSH from the internet.
>> (Note: after a while we lose that connectivity too.)
>>
>> But if we then restart the two other containers, their ha_state stays "standby"
>> only until all three become "active", and then we have the problem again.
>>
>> The three router instances on infra01/02/03 all see themselves as master.
>>
>> If we ping from our instance to the router (internal network, 192.168.100.4
>> to 192.168.100.254), we see ARP requests:
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>>
>> On the compute node we see these frames on the various interfaces
>> (tap / vxlan-89 / br-vxlan / bond1.vxlanvlan / bond1 / em2), but nothing comes back.
>>
>> On the ha interface of each router we also see the VRRP traffic (keepalived
>> heartbeat packets over the hidden project network that connects all the HA
>> routers (vxlan 70)). As expected in that situation, each router believes it
>> is the master.
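
(For anyone hitting the same symptom: a quick way to see what keepalived itself
thinks on each node is to look at the files the L3 agent generates for the HA
router. The paths below assume Neutron's default state_path, /var/lib/neutron,
and are only an illustration; in an openstack-ansible deployment they live
inside each neutron-agents container.)

ROUTER=eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
# per-node state written by the neutron-keepalived-state-change monitor: "master" or "backup"
cat /var/lib/neutron/ha_confs/$ROUTER/state
# the keepalived configuration generated for this router (VIPs, vrid, priority, interval)
grep -A6 vrrp_instance /var/lib/neutron/ha_confs/$ROUTER/keepalived.conf
# the VIP 169.254.0.1 should normally be configured in only one qrouter namespace at a time
ip netns exec qrouter-$ROUTER ip -4 addr show | grep 169.254.0.1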
>>
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -nl -i ha-4a5f0287-91
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on ha-4a5f0287-91, link-type EN10MB (Ethernet), capture size 65535 bytes
>> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>>
>> root@p-osinfra02-neutron-agents-container-48142ffe:~# ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -nt -i ha-4ee5f8d0-7f
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on ha-4ee5f8d0-7f, link-type EN10MB (Ethernet), capture size 65535 bytes
>> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>>
>>
>> Are you seeing VRRP advertisements crossing nodes, though? That tcpdump
>> only shows advertisements from the local node. If nodes aren't
>> receiving VRRP messages from other nodes, keepalived will declare
>> itself master (as expected). Can you ping the 'ha' interface from
>> one router namespace to the other?
>>
>>
>> I stopped the three neutron agent containers,
>> then restarted the one on infra01, then the one on infra02.
>>
>> I can see VRRP frames from infra01 (169.254.192.1 -> 224.0.0.18) being
>> received by infra02:
>>
>> root@p-osinfra02:~# tcpdump -nl -i em2 | grep 169.254
>> tcpdump: WARNING: em2: no IPv4 address assigned
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on em2, link-type EN10MB (Ethernet), capture size 65535 bytes
>> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> ....
>> ....
>> and then I see
>> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>>
>> No more 169.254.192.1 from infra01, only the IP of the ha interface of the
>> router on infra02.
>>
>> From that point on, no VRRP advertisements cross the nodes anymore:
>> on each infra node we see the advertisements from that node itself, but
>> nothing from the others.
>>
>> Apart from that, I can ping the ha interface from one router namespace to the other:
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ping 169.254.192.3
>> PING 169.254.192.3 (169.254.192.3) 56(84) bytes of data.
>> 64 bytes from 169.254.192.3: icmp_seq=1 ttl=64 time=0.297 ms
>> 64 bytes from 169.254.192.3: icmp_seq=2 ttl=64 time=0.239 ms
>> 64 bytes from 169.254.192.3: icmp_seq=3 ttl=64 time=0.264 ms
>>
>> I'm going to test with another version of keepalived (current version here:
>> 1.2.7-1 on Ubuntu 14.04).
>>
>> Thanks for your help.
>>
>>
>> Note:
>> I said I can ping between the ha interfaces, but not for long; at some point
>> I can't anymore... :-(
>
> That's the problem. This becomes normal Neutron troubleshooting: why
> can't one port ping the other? This might help:
> https://assafmuller.com/2015/08/31/neutron-troubleshooting/
> <https://assafmuller.com/2015/08/31/neutron-troubleshooting/>
Hi,

thanks for the link. I had already had a look (more or less) and, in the end,
I suspected a problem on the switch side (Nexus) instead.

After some investigation and tcpdump, we saw that packets to 239.1.1.1 were not
being forwarded by the switch. IGMP snooping is enabled by default on the Nexus
switches, and that was causing the bad behaviour. We disabled IGMP snooping
and... that's all, folks!

Cordially,

>
>>
>>
>>
>>
>>
>> Could someone tell me if they have already encountered this problem?
>> The infra and compute nodes are connected to a Nexus 9000 switch.
>>
>> Thank you in advance for taking the time to study my request.
>>
>> Fabrice Grelaud
>> Université de Bordeaux
>>
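
For anyone who hits the same symptom, here is roughly how this can be checked and
fixed; both snippets are sketches rather than the exact commands we ran. The first
one assumes that 239.1.1.1 is the multicast group configured for VXLAN flooding
(vxlan_group in the linuxbridge agent) and that vxlan-70 is the interface of the
hidden HA network; the interface names are the ones mentioned earlier in this thread.

# does traffic for the VXLAN multicast group actually arrive from the other nodes?
tcpdump -nl -i em2 host 239.1.1.1
# has this node joined the group on the interface carrying the VXLAN local_ip?
ip maddr show dev br-vxlan | grep 239.1.1.1
# the flood (all-zero MAC) entry of the HA network's vxlan interface should point at the group
bridge fdb show dev vxlan-70 | grep 00:00:00:00:00:00

On the switch side, the change amounts to disabling IGMP snooping for the VLAN that
carries the VXLAN underlay traffic. The NX-OS snippet below is only a sketch: the
VLAN id (100) is hypothetical, and "no ip igmp snooping" at the global configuration
level disables it for every VLAN instead.

configure terminal
  vlan configuration 100
    no ip igmp snooping
end
copy running-config startup-config

An alternative worth considering is to keep IGMP snooping enabled and configure an
IGMP querier on that VLAN, so that the switch learns which ports have joined 239.1.1.1.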
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev