Reply: Agent LB for CloudStack failed
I added host.lb.check.interval=0 to agent.properties on all agents and restarted cloudstack-agent. The following is the connection status of the agents after the restart:

mysql> select host.id, host.name, host.mgmt_server_id, host.status, mshost.name from host, mshost where host.mgmt_server_id = mshost.msid;
+----+------------------------------------+----------------+--------+----------+
| id | name                               | mgmt_server_id | status | name     |
+----+------------------------------------+----------------+--------+----------+
|  1 | test-ceph-node01.cs2cloud.internal |  2200502468634 | Up     | acs-mn01 |
|  3 | s-8-VM                             |  2200502468634 | Up     | acs-mn01 |
|  5 | test-ceph-node03.cs2cloud.internal |  2200502468634 | Up     | acs-mn01 |
|  2 | v-9-VM                             |  2199950196764 | Up     | acs-mn02 |
|  4 | test-ceph-node02.cs2cloud.internal |  2199950196764 | Up     | acs-mn02 |
|  6 | test-ceph-node04.cs2cloud.internal |  2199950196764 | Up     | acs-mn02 |
+----+------------------------------------+----------------+--------+----------+
6 rows in set (0.00 sec)

At 2019-07-18 15:10 I forced acs-mn02 to power off and waited. Only after about 15 minutes (2019-07-18 15:26:23) did the agents detect that the management node had failed and begin to switch. So adding host.lb.check.interval=0 to agent.properties doesn't solve the problem. Below is the log:

2019-07-18 15:26:23,414 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Location 1: Socket Socket[addr=/172.17.1.142,port=8250,localport=33190] closed on read. Probably -1 returned: No route to host
2019-07-18 15:26:23,416 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Closing socket Socket[addr=/172.17.1.142,port=8250,localport=33190]
2019-07-18 15:26:23,417 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Clearing watch list: 2
2019-07-18 15:26:23,417 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Lost connection to host: 172.17.1.142. Attempting reconnection while we still have 0 commands in progress.
2019-07-18 15:26:23,420 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) NioClient connection closed
2019-07-18 15:26:23,420 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Reconnecting to host:172.17.1.142
2019-07-18 15:26:23,420 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) Connecting to 172.17.1.142:8250
2019-07-18 15:26:26,427 ERROR [utils.nio.NioConnection] (Agent-Handler-2:null) (logid:) Unable to initialize the threads.
java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.Net.connect0(Native Method)
        at sun.nio.ch.Net.connect(Net.java:454)
        at sun.nio.ch.Net.connect(Net.java:446)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
        at com.cloud.utils.nio.NioClient.init(NioClient.java:56)
        at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
        at com.cloud.agent.Agent.reconnect(Agent.java:517)
        at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1091)
        at com.cloud.utils.nio.Task.call(Task.java:83)
        at com.cloud.utils.nio.Task.call(Task.java:29)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2019-07-18 15:26:26,432 INFO [utils.exception.CSExceptionErrorCode] (Agent-Handler-2:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions
2019-07-18 15:26:26,432 WARN [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: No route to host
2019-07-18 15:26:26,432 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again...
2019-07-18 15:26:26,432 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) NioClient connection closed
2019-07-18 15:26:31,433 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Reconnecting to host:172.17.1.141
2019-07-18 15:26:31,434 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) Connecting to 172.17.1.141:8250
2019-07-18 15:26:31,435 INFO [utils.nio.Link] (Agent-Handler-2:null) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
2019-07-18 15:26:31,545 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) SSL: Handshake done
2019-07-18 15:26:31,546 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) Connected to 172.17.1.141:8250
2019-07-18 15:26:31,564 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: qemu:///system

From: Nicolas Vazquez
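For context on the setting being tested above: the agent-side management-server list and the LB check interval both live in /etc/cloudstack/agent/agent.properties. Below is a minimal, illustrative excerpt only - the two IP addresses are the management servers seen in the log, the interval value is the one tried in this thread, and all other properties of a real file are omitted:

# /etc/cloudstack/agent/agent.properties (illustrative excerpt, not a full file)
# Comma-separated list of management servers this agent may connect to
host=172.17.1.141,172.17.1.142
# Port the agent uses to reach the management server
port=8250
# Periodic management-server LB check; 0 is meant to disable the
# background check, which is what was tested in this thread
host.lb.check.interval=0

With the background check disabled, the agent presumably only notices a dead management server once its existing TCP connection is actually torn down ("closed on read ... No route to host" above), which would explain why the failover still took roughly 15 minutes.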
Re: VXLAN and KVM experiences
On 10/23/18 2:54 PM, Ivan Kudryavtsev wrote:
> Doesn't a solution like this work seamlessly for large VXLAN networks?
>
> https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn

We are using that with CloudStack right now.

We have a modified version of 'modifyvxlan.sh': https://github.com/PCextreme/cloudstack/blob/vxlan-bgp-evpn/scripts/vm/network/vnet/modifyvxlan.sh

Your 'tunnelip' needs to be set on 'lo'; in our case this is 10.255.255.X.

We have the script in /usr/share/modifyvxlan.sh so that it's found by the Agent and we don't overwrite the existing script (which might break after an upgrade). (A rough sketch of the kind of commands such a script ends up running is appended after the quoted thread below.)

Our frr conf on the hypervisor:

frr version 7.1
frr defaults traditional
hostname myfirsthypervisor
log syslog informational
no ipv6 forwarding
service integrated-vtysh-config
!
interface enp81s0f0
 no ipv6 nd suppress-ra
!
interface enp81s0f1
 no ipv6 nd suppress-ra
!
interface lo
 ip address 10.255.255.9/32
 ipv6 address 2001:db8:100::9/128
!
router bgp 4200100123
 bgp router-id 10.255.255.9
 no bgp default ipv4-unicast
 neighbor uplinks peer-group
 neighbor uplinks remote-as external
 neighbor uplinks ebgp-multihop 255
 neighbor enp81s0f0 interface peer-group uplinks
 neighbor enp81s0f1 interface peer-group uplinks
 !
 address-family ipv4 unicast
  network 10.255.255.9/32
  neighbor uplinks activate
  neighbor uplinks next-hop-self
 exit-address-family
 !
 address-family ipv6 unicast
  network 2001:db8:100::9/128
  neighbor uplinks activate
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor uplinks activate
  advertise-all-vni
 exit-address-family
!
line vty
!

Both enp81s0f0 and enp81s0f1 are 100G interfaces connected to Cumulus Linux routers/switches and they use BGP Unnumbered (IPv6 Link Local) for their BGP sessions.

Hope this helps!

Wido

>
> Tue, 23 Oct 2018, 8:34 Simon Weller :
>
>> Linux native VXLAN uses multicast and each host has to participate in multicast in order to see the VXLAN networks. We haven't tried using PIM across an L3 boundary with ACS, although it will probably work fine.
>>
>> Another option is to use an L3 VTEP, but right now there is no native support for that in CloudStack's VXLAN implementation, although we've thought about proposing it as a feature.
>>
>>
>> From: Wido den Hollander
>> Sent: Tuesday, October 23, 2018 7:17 AM
>> To: dev@cloudstack.apache.org; Simon Weller
>> Subject: Re: VXLAN and KVM experiences
>>
>>
>> On 10/23/18 1:51 PM, Simon Weller wrote:
>>> We've also been using VXLAN on KVM for all of our isolated VPC guest networks for quite a long time now. As Andrija pointed out, make sure you increase the max_igmp_memberships param and also put an IP address on each host's VXLAN interface, in the same subnet for all hosts that will share networking, or multicast won't work.
>>>
>>
>> Thanks! So you are saying that all hypervisors need to be in the same L2 network, or are you routing the multicast?
>>
>> My idea was that each POD would be an isolated Layer 3 domain and that a VNI would span over the different Layer 3 networks.
>>
>> I don't like STP and other Layer 2 loop-prevention systems.
>>
>> Wido
>>
>>>
>>> - Si
>>>
>>>
>>> From: Wido den Hollander
>>> Sent: Tuesday, October 23, 2018 5:21 AM
>>> To: dev@cloudstack.apache.org
>>> Subject: Re: VXLAN and KVM experiences
>>>
>>>
>>> On 10/23/18 11:21 AM, Andrija Panic wrote:
>>>> Hi Wido,
>>>>
>>>> I have "pioneered" this one in production for the last 3 years (and suffered the nasty pain of a silent drop of packets on kernel 3.X back in the days, because I was unaware of the max_igmp_memberships kernel parameter, so I updated the manual a long time ago).
>>>>
>>>> I never had any issues (besides the nasty one above...) and it works very well.
>>>
>>> That's what I want to hear!
>>>
>>>> To avoid the issue I described above, you should increase max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships) - otherwise, with more than 20 vxlan interfaces, some of them will stay in a down state and drop traffic hard (with a proper message in agent.log) on kernel > 4.0, or drop packets silently and randomly on kernel 3.X... Also pay attention to MTU size. Anyway, everything is in the manual (I updated everything I thought was missing), so please check it.
>>>
>>> Yes, the underlying network will all be 9000 bytes MTU.
>>>
>>>> Our example setup:
>>>>
>>>> We have e.g. bond0.950 as the main VLAN which will carry all vxlan "tunnels", so this is defined as the KVM traffic label. In our case it didn't make sense to use a bridge on top of this bond0.950 (as the traffic label) - you can test it on your own - since this bridge is used only to extract the child bond0.950 interface name; then, based on the vxlan ID, ACS will provision vxlan...@bond0.xxx and join this new vxlan interface to NEW
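As a rough illustration of the provisioning Andrija describes, and of the EVPN variant behind the modified modifyvxlan.sh linked earlier, here is a minimal sketch of the kind of commands involved. The VNI, bridge name, group address and VTEP address are made-up examples rather than values from this thread, and the real logic lives in scripts/vm/network/vnet/modifyvxlan.sh:

#!/bin/sh
# Hypothetical sketch only - run as root; names and values are examples.
VNI=1001                      # vxlan ID handed down by ACS
PARENT="bond0.950"            # interface behind the KVM traffic label
DEV="vxlan${VNI}"
BRIDGE="brvx-${VNI}"

# Multicast-based approach: the VXLAN device joins a multicast group
# (placeholder address here) on the parent interface, so BUM traffic
# is flooded via multicast between the hypervisors.
ip link add "$DEV" type vxlan id "$VNI" group 239.0.1.1 dev "$PARENT"

# EVPN-style variant (as in the modified script): no multicast group,
# learning disabled, only the local VTEP address that sits on 'lo';
# FRR's BGP EVPN then distributes the MAC/IP reachability.
# ip link add "$DEV" type vxlan id "$VNI" local 10.255.255.9 dstport 4789 nolearning

# Either way, the new vxlan interface is enslaved to a fresh bridge
# that the guest NICs get plugged into.
ip link add "$BRIDGE" type bridge
ip link set "$DEV" master "$BRIDGE"
ip link set "$DEV" up
ip link set "$BRIDGE" up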
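And since the igmp_max_memberships limit bites silently, a quick way to check and raise it on a hypervisor is shown below. The value 200 and the drop-in file name are just examples; pick a value above the number of VXLAN interfaces you expect per host, as the manual Andrija mentions also describes:

# Show the current limit (the kernel default is 20, matching the
# "more than 20 vxlan interfaces" symptom described above)
cat /proc/sys/net/ipv4/igmp_max_memberships

# Raise it at runtime
sysctl -w net.ipv4.igmp_max_memberships=200

# Persist it across reboots (file name is arbitrary)
echo "net.ipv4.igmp_max_memberships = 200" > /etc/sysctl.d/99-vxlan.conf
sysctl --system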