Reply: Agent LB for CloudStack failed

2019-07-18 Thread li jerry

I added host.lb.check.interval=0 to agent.properties on every agent and restarted 
the cloudstack-agent service.
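
For reference, the relevant part of agent.properties now looks roughly like this 
(a sketch only; the management-server addresses and port are the ones visible in 
the agent log below, and host= as a comma-separated list is how the agent is 
pointed at multiple management servers):

# /etc/cloudstack/agent/agent.properties (excerpt, illustrative)
# both management servers, comma-separated (addresses as seen in the agent log)
host=172.17.1.141,172.17.1.142
port=8250
# disable the periodic agent LB check, as suggested
host.lb.check.interval=0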


The following shows the agents' connection status after the restart:

mysql> select host.id, host.name, host.mgmt_server_id, host.status, mshost.name
    -> from host, mshost where host.mgmt_server_id = mshost.msid;
+----+------------------------------------+----------------+--------+----------+
| id | name                               | mgmt_server_id | status | name     |
+----+------------------------------------+----------------+--------+----------+
|  1 | test-ceph-node01.cs2cloud.internal |  2200502468634 | Up     | acs-mn01 |
|  3 | s-8-VM                             |  2200502468634 | Up     | acs-mn01 |
|  5 | test-ceph-node03.cs2cloud.internal |  2200502468634 | Up     | acs-mn01 |
|  2 | v-9-VM                             |  2199950196764 | Up     | acs-mn02 |
|  4 | test-ceph-node02.cs2cloud.internal |  2199950196764 | Up     | acs-mn02 |
|  6 | test-ceph-node04.cs2cloud.internal |  2199950196764 | Up     | acs-mn02 |
+----+------------------------------------+----------------+--------+----------+
6 rows in set (0.00 sec)

2019-07-18 15:10: forced power-off of acs-mn02.

Then I waited.

Only about 15 minutes later (2019-07-18 15:26:23) did the agent detect that the 
management node had failed and begin switching over.
So adding host.lb.check.interval=0 to agent.properties does not solve the problem.

Below is the agent log:




2019-07-18 15:26:23,414 DEBUG [utils.nio.NioConnection] 
(Agent-NioConnectionHandler-1:null) (logid:) Location 1: Socket 
Socket[addr=/172.17.1.142,port=8250,localport=33190] closed on read.  Probably 
-1 returned: No route to host
2019-07-18 15:26:23,416 DEBUG [utils.nio.NioConnection] 
(Agent-NioConnectionHandler-1:null) (logid:) Closing socket 
Socket[addr=/172.17.1.142,port=8250,localport=33190]
2019-07-18 15:26:23,417 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null) 
(logid:) Clearing watch list: 2
2019-07-18 15:26:23,417 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
(logid:) Lost connection to host: 172.17.1.142. Attempting reconnection while 
we still have 0 commands in progress.
2019-07-18 15:26:23,420 INFO  [utils.nio.NioClient] (Agent-Handler-2:null) 
(logid:) NioClient connection closed
2019-07-18 15:26:23,420 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
(logid:) Reconnecting to host:172.17.1.142
2019-07-18 15:26:23,420 INFO  [utils.nio.NioClient] (Agent-Handler-2:null) 
(logid:) Connecting to 172.17.1.142:8250
2019-07-18 15:26:26,427 ERROR [utils.nio.NioConnection] (Agent-Handler-2:null) 
(logid:) Unable to initialize the threads.
java.net.NoRouteToHostException: No route to host
  at sun.nio.ch.Net.connect0(Native Method)
  at sun.nio.ch.Net.connect(Net.java:454)
  at sun.nio.ch.Net.connect(Net.java:446)
  at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
  at com.cloud.utils.nio.NioClient.init(NioClient.java:56)
  at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
  at com.cloud.agent.Agent.reconnect(Agent.java:517)
  at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1091)
  at com.cloud.utils.nio.Task.call(Task.java:83)
  at com.cloud.utils.nio.Task.call(Task.java:29)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
2019-07-18 15:26:26,432 INFO  [utils.exception.CSExceptionErrorCode] 
(Agent-Handler-2:null) (logid:) Could not find exception: 
com.cloud.utils.exception.NioConnectionException in error code list for 
exceptions
2019-07-18 15:26:26,432 WARN  [cloud.agent.Agent] (Agent-Handler-2:null) 
(logid:) NIO Connection Exception  
com.cloud.utils.exception.NioConnectionException: No route to host
2019-07-18 15:26:26,432 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
(logid:) Attempted to connect to the server, but received an unexpected 
exception, trying again...
2019-07-18 15:26:26,432 INFO  [utils.nio.NioClient] (Agent-Handler-2:null) 
(logid:) NioClient connection closed
2019-07-18 15:26:31,433 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
(logid:) Reconnecting to host:172.17.1.141
2019-07-18 15:26:31,434 INFO  [utils.nio.NioClient] (Agent-Handler-2:null) 
(logid:) Connecting to 172.17.1.141:8250
2019-07-18 15:26:31,435 INFO  [utils.nio.Link] (Agent-Handler-2:null) (logid:) 
Conf file found: /etc/cloudstack/agent/agent.properties
2019-07-18 15:26:31,545 INFO  [utils.nio.NioClient] (Agent-Handler-2:null) 
(logid:) SSL: Handshake done
2019-07-18 15:26:31,546 INFO  [utils.nio.NioClient] (Agent-Handler-2:null) 
(logid:) Connected to 172.17.1.141:8250
2019-07-18 15:26:31,564 DEBUG [kvm.resource.LibvirtConnection] 
(Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: 
qemu:///system

From: Nicolas Vazquez

Re: VXLAN and KVm experiences

2019-07-18 Thread Wido den Hollander



On 10/23/18 2:54 PM, Ivan Kudryavtsev wrote:
> Doesn't a solution like this work seamlessly for large VXLAN networks?
> 
> https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn

We are using that with CloudStack right now. We have a modified version
of 'modifyvxlan.sh':
https://github.com/PCextreme/cloudstack/blob/vxlan-bgp-evpn/scripts/vm/network/vnet/modifyvxlan.sh

Your 'tunnelip' needs to be set on 'lo'; in our case this is
10.255.255.X
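
For example, with iproute2 (a sketch; the addresses are the ones used on 'lo' in 
the frr config below):

# assign the VXLAN tunnel/VTEP address to the loopback interface
ip addr add 10.255.255.9/32 dev lo
ip -6 addr add 2001:db8:100::9/128 dev lo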

We have the script at /usr/share/modifyvxlan.sh so that it's found by
the Agent and we don't overwrite the existing script (which might break
after an upgrade).
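
Something like this is enough to put the modified script in place (a sketch; the 
path is the one described above, and the packaged script stays untouched):

# install the BGP-EVPN aware script where the agent will find it
install -m 0755 modifyvxlan.sh /usr/share/modifyvxlan.sh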

Our frr conf on the hypervisor:

frr version 7.1
frr defaults traditional
hostname myfirsthypervisor
log syslog informational
no ipv6 forwarding
service integrated-vtysh-config
!
interface enp81s0f0
 no ipv6 nd suppress-ra
!
interface enp81s0f1
 no ipv6 nd suppress-ra
!
interface lo
 ip address 10.255.255.9/32
 ipv6 address 2001:db8:100::9/128
!
router bgp 4200100123
 bgp router-id 10.255.255.9
 no bgp default ipv4-unicast
 neighbor uplinks peer-group
 neighbor uplinks remote-as external
 neighbor uplinks ebgp-multihop 255
 neighbor enp81s0f0 interface peer-group uplinks
 neighbor enp81s0f1 interface peer-group uplinks
 !
 address-family ipv4 unicast
  network 10.255.255.9/32
  neighbor uplinks activate
  neighbor uplinks next-hop-self
 exit-address-family
 !
 address-family ipv6 unicast
  network 2001:db8:100::9/128
  neighbor uplinks activate
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor uplinks activate
  advertise-all-vni
 exit-address-family
!
line vty
!

Both enp81s0f0 and enp81s0f1 are 100G interfaces connected to Cumulus
Linux routers/switches and they use BGP Unnumbered (IPv6 Link Local) for
their BGP sessions.
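
To check that the sessions and the EVPN address-family came up, the usual FRR 
show commands work (illustrative):

vtysh -c 'show bgp summary'
vtysh -c 'show bgp l2vpn evpn summary'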

Hope this helps!

Wido

> 
> Tue, 23 Oct 2018, 8:34 Simon Weller :
> 
>> Linux native VXLAN uses multicast, and each host has to participate in
>> multicast in order to see the VXLAN networks. We haven't tried using PIM
>> across an L3 boundary with ACS, although it will probably work fine.
>>
>> Another option is to use an L3 VTEP, but right now there is no native
>> support for that in CloudStack's VXLAN implementation, although we've
>> thought about proposing it as a feature.
>>
>>
>> 
>> From: Wido den Hollander 
>> Sent: Tuesday, October 23, 2018 7:17 AM
>> To: dev@cloudstack.apache.org; Simon Weller
>> Subject: Re: VXLAN and KVm experiences
>>
>>
>>
>> On 10/23/18 1:51 PM, Simon Weller wrote:
>>> We've also been using VXLAN on KVM for all of our isolated VPC guest
>>> networks for quite a long time now. As Andrija pointed out, make sure you
>>> increase the max_igmp_memberships param and also put an IP address on each
>>> host's VXLAN interface, in the same subnet for all hosts that will share
>>> networking, or multicast won't work.
>>>
>>
>> Thanks! So you are saying that all hypervisors need to be in the same L2
>> network or are you routing the multicast?
>>
>> My idea was that each POD would be an isolated Layer 3 domain and that a
>> VNI would span over the different Layer 3 networks.
>>
>> I don't like STP and other Layer 2 loop-prevention systems.
>>
>> Wido
>>
>>>
>>> - Si
>>>
>>>
>>> 
>>> From: Wido den Hollander 
>>> Sent: Tuesday, October 23, 2018 5:21 AM
>>> To: dev@cloudstack.apache.org
>>> Subject: Re: VXLAN and KVm experiences
>>>
>>>
>>>
>>> On 10/23/18 11:21 AM, Andrija Panic wrote:
 Hi Wido,

 I have "pioneered" this one in production for the last 3 years (and
 suffered a nasty pain of silent packet drops on kernel 3.X back in the
 days because of being unaware of the max_igmp_memberships kernel
 parameter, so I updated the manual a long time ago).

 I never had any issues (besides the above nasty one...) and it works
 very well.
>>>
>>> That's what I want to hear!
>>>
 To avoid the above issue that I described - you should increase
 max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships) -
 otherwise, with more than 20 vxlan interfaces, some of them will stay in
 a down state and have a hard traffic drop (with a proper message in
 agent.log) with kernel > 4.0 (or a silent, bitchy random packet drop on
 kernel 3.X...) - and also pay attention to MTU size as well - anyway,
 everything is in the manual (I updated everything I thought was missing)
 - so please check it.
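
(For reference, both knobs from the advice above can be applied like this - a 
sketch; the sysctl path is the one quoted above, while the value and the 
interface name are just examples:)

# allow more than the default 20 IGMP memberships, i.e. more VXLAN networks per host
sysctl -w net.ipv4.igmp_max_memberships=200
# make sure the underlying interface carries jumbo frames to absorb the VXLAN overhead
ip link set dev bond0.950 mtu 9000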

>>>
>>> Yes, the underlying network will all be 9000 bytes MTU.
>>>
 Our example setup:

 We have e.g. bond0.950 as the main VLAN which will carry all vxlan
 "tunnels" - so this is defined as the KVM traffic label. In our case it
 didn't make sense to use a bridge on top of this bond0.950 (as the
 traffic label) - you can test it on your own - since this bridge is used
 only to extract the child bond0.950 interface name; then, based on the
 vxlan ID, ACS will provision vxlan...@bond0.xxx and join this new vxlan
 interface to NEW