Re: VXLAN and KVm experiences

2018-10-23 Thread Andrija Panic
Hi Wido,

I have "pioneered" this one in production for the last 3 years (and suffered
the nasty pain of silent packet drops on kernel 3.X back in the days, because
I was unaware of the max_igmp_memberships kernel parameter, so I updated the
manual a long time ago).

I never had any issues (besides the nasty one above...) and it works very well.
To avoid the issue I described, you should increase max_igmp_memberships
(/proc/sys/net/ipv4/igmp_max_memberships) - otherwise, with more than 20 vxlan
interfaces, some of them will stay in the down state and traffic will drop hard
(with a proper message in agent.log) on kernel >4.0 (or drop silently and
randomly on kernel 3.X...). Also pay attention to the MTU size - anyway,
everything is in the manual (I updated everything I thought was missing), so
please check it.
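
For reference, a minimal sketch of raising that limit (the value of 200 is
only an illustrative choice - size it to the number of VXLAN interfaces you
expect per host):

# the kernel default is 20, which is where the ">20 interfaces" limit comes from
cat /proc/sys/net/ipv4/igmp_max_memberships

# raise it at runtime
sysctl -w net.ipv4.igmp_max_memberships=200

# persist it across reboots, then reload
echo "net.ipv4.igmp_max_memberships = 200" > /etc/sysctl.d/99-vxlan.conf
sysctl --system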

Our example setup:

We have e.g. bond0.950 as the main VLAN which carries all vxlan "tunnels" -
so this is defined as the KVM traffic label. In our case it didn't make sense
to use a bridge on top of this bond0.950 (as the traffic label) - you can
test it on your own - since this bridge is used only to extract the child
bond0.950 interface name; then, based on the vxlan ID, ACS will provision
vxlan...@bond0.xxx and join this new vxlan interface to a NEW bridge it
creates (and then of course the vNIC goes to this new bridge), so the
original bridge (to which bond0.xxx belonged) is not used for anything.

Here is a sample of the above for vxlan 867, used for tenant isolation:

root@hostname:~# brctl show brvx-867

bridge name     bridge id           STP enabled     interfaces
brvx-867        8000.2215cfce99ce   no              vnet6
                                                    vxlan867

root@hostname:~# ip -d link show vxlan867

297: vxlan867:  mtu 8142 qdisc noqueue
master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing 300

root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
  UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1

Note how the vxlan interface has a 50-byte smaller MTU than the bond0.950
parent interface (which could affect traffic inside the VM) - so jumbo frames
are needed on the parent interface anyway (bond0.950 in the example above,
with a minimum MTU of 1550).
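
To illustrate, here is a manual sketch of roughly what ACS ends up
provisioning on this setup (interface names and the multicast group are taken
from the output above; these are not the agent's actual scripts):

# the parent needs jumbo frames to absorb the ~50-byte VXLAN overhead
ip link set dev bond0.950 mtu 8192

# VXLAN device for VNI 867 on top of the parent VLAN interface;
# its MTU is derived automatically (8192 - 50 = 8142)
ip link add vxlan867 type vxlan id 867 group 239.0.3.99 dev bond0.950 ttl 10

# per-network bridge carrying the VXLAN device and, later, the guest vNICs
brctl addbr brvx-867
brctl addif brvx-867 vxlan867
ip link set vxlan867 up
ip link set brvx-867 up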

Ping me if more details needed, happy to help.

Cheers
Andrija

On Tue, 23 Oct 2018 at 08:23, Wido den Hollander  wrote:

> Hi,
>
> I just wanted to know if there are people out there using KVM with
> Advanced Networking and using VXLAN for different networks.
>
> Our main goal would be to spawn a VM and based on the network the NIC is
> in attach it to a different VXLAN bridge on the KVM host.
>
> It seems to me that this should work, but I just wanted to check and see
> if people have experience with it.
>
> Wido
>


-- 

Andrija Panić


Re: VXLAN and KVm experiences

2018-10-23 Thread Wido den Hollander



On 10/23/18 11:21 AM, Andrija Panic wrote:
> Hi Wido,
> 
> I have "pioneered" this one in production for last 3 years (and suffered a
> nasty pain of silent drop of packages on kernel 3.X back in the days
> because of being unaware of max_igmp_memberships kernel parameters, so I
> have updated the manual long time ago).
> 
> I never had any issues (beside above nasty one...) and it works very well.

That's what I want to hear!

> To avoid above issue that I described - you should increase
> max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships)  - otherwise
> with more than 20 vxlan interfaces, some of them will stay in down state
> and have a hard traffic drop (with proper message in agent.log) with kernel
>> 4.0 (or I silent, bitchy random packet drop on kernel 3.X...) - and also
> pay attention to MTU size as well - anyway everything is in the manual (I
> updated everything I though was missing) - so please check it.
> 

Yes, the underlying network will all be 9000 bytes MTU.

> Our example setup:
> 
> We have i.e. bond.950 as the main VLAN which will carry all vxlan "tunnels"
> - so this is defined as KVM traffic label. In our case it didn't make sense
> to use bridge on top of this bond0.950 (as the traffic label) - you can
> test it on your own - since this bridge is used only to extract child
> bond0.950 interface name, then based on vxlan ID, ACS will provision
> vxlan...@bond0.xxx and join this new vxlan interface to NEW bridge created
> (and then of course vNIC goes to this new bridge), so original bridge (to
> which bond0.xxx belonged) is not used for anything.
> 

Clear, I indeed thought something like that would happen.

> Here is sample from above for vxlan 867 used for tenant isolation:
> 
> root@hostname:~# brctl show brvx-867
> 
> bridge name bridge id   STP enabled interfaces
> brvx-8678000.2215cfce99ce   no  vnet6
> 
>  vxlan867
> 
> root@hostname:~# ip -d link show vxlan867
> 
> 297: vxlan867:  mtu 8142 qdisc noqueue
> master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
> link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
> vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing 300
> 
> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
>   UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
> 
> So note how the vxlan interface has by 50 bytes smaller MTU than the
> bond0.950 parent interface (which could affects traffic inside VM) - so
> jumbo frames are needed anyway on the parent interface (bond.950 in example
> above with minimum of 1550 MTU)
> 

Yes, thanks! We will be using 1500 MTU inside the VMs, so all the
networks underneath will be ~9k.

> Ping me if more details needed, happy to help.
> 

Awesome! We'll be doing a PoC rather soon. I'll come back with our
experiences later.

Wido

> Cheers
> Andrija
> 
> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander  wrote:
> 
>> Hi,
>>
>> I just wanted to know if there are people out there using KVM with
>> Advanced Networking and using VXLAN for different networks.
>>
>> Our main goal would be to spawn a VM and based on the network the NIC is
>> in attach it to a different VXLAN bridge on the KVM host.
>>
>> It seems to me that this should work, but I just wanted to check and see
>> if people have experience with it.
>>
>> Wido
>>
> 
> 


Re: VXLAN and KVm experiences

2018-10-23 Thread Simon Weller
We've also been using VXLAN on KVM for all of our isolated VPC guest networks
for quite a long time now. As Andrija pointed out, make sure you increase the
max_igmp_memberships param, and also put an IP address on the host VXLAN
interface on each host, in the same subnet for all hosts that will share
networking, or multicast won't work.
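
For example (a sketch only - the underlay interface is the bond0.950 VLAN
interface from Andrija's setup, and 10.10.50.0/24 is just an illustrative
transport subnet):

# on host A
ip addr add 10.10.50.11/24 dev bond0.950
# on host B
ip addr add 10.10.50.12/24 dev bond0.950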


- Si



From: Wido den Hollander 
Sent: Tuesday, October 23, 2018 5:21 AM
To: dev@cloudstack.apache.org
Subject: Re: VXLAN and KVm experiences



On 10/23/18 11:21 AM, Andrija Panic wrote:
> Hi Wido,
>
> I have "pioneered" this one in production for last 3 years (and suffered a
> nasty pain of silent drop of packages on kernel 3.X back in the days
> because of being unaware of max_igmp_memberships kernel parameters, so I
> have updated the manual long time ago).
>
> I never had any issues (beside above nasty one...) and it works very well.

That's what I want to hear!

> To avoid above issue that I described - you should increase
> max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships)  - otherwise
> with more than 20 vxlan interfaces, some of them will stay in down state
> and have a hard traffic drop (with proper message in agent.log) with kernel
>> 4.0 (or I silent, bitchy random packet drop on kernel 3.X...) - and also
> pay attention to MTU size as well - anyway everything is in the manual (I
> updated everything I though was missing) - so please check it.
>

Yes, the underlying network will all be 9000 bytes MTU.

> Our example setup:
>
> We have i.e. bond.950 as the main VLAN which will carry all vxlan "tunnels"
> - so this is defined as KVM traffic label. In our case it didn't make sense
> to use bridge on top of this bond0.950 (as the traffic label) - you can
> test it on your own - since this bridge is used only to extract child
> bond0.950 interface name, then based on vxlan ID, ACS will provision
> vxlan...@bond0.xxx and join this new vxlan interface to NEW bridge created
> (and then of course vNIC goes to this new bridge), so original bridge (to
> which bond0.xxx belonged) is not used for anything.
>

Clear, I indeed thought something like that would happen.

> Here is sample from above for vxlan 867 used for tenant isolation:
>
> root@hostname:~# brctl show brvx-867
>
> bridge name bridge id   STP enabled interfaces
> brvx-8678000.2215cfce99ce   no  vnet6
>
>  vxlan867
>
> root@hostname:~# ip -d link show vxlan867
>
> 297: vxlan867:  mtu 8142 qdisc noqueue
> master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
> link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
> vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing 300
>
> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
>   UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
>
> So note how the vxlan interface has by 50 bytes smaller MTU than the
> bond0.950 parent interface (which could affects traffic inside VM) - so
> jumbo frames are needed anyway on the parent interface (bond.950 in example
> above with minimum of 1550 MTU)
>

Yes, thanks! We will be using 1500 MTU inside the VMs, so all the
networks underneath will be ~9k.

> Ping me if more details needed, happy to help.
>

Awesome! We'll be doing a PoC rather soon. I'll come back with our
experiences later.

Wido

> Cheers
> Andrija
>
> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander  wrote:
>
>> Hi,
>>
>> I just wanted to know if there are people out there using KVM with
>> Advanced Networking and using VXLAN for different networks.
>>
>> Our main goal would be to spawn a VM and based on the network the NIC is
>> in attach it to a different VXLAN bridge on the KVM host.
>>
>> It seems to me that this should work, but I just wanted to check and see
>> if people have experience with it.
>>
>> Wido
>>
>
>


Re: VXLAN and KVm experiences

2018-10-23 Thread Nux!
+1. VXLAN works just fine in my testing; the only gotcha I ever hit, as Si
mentioned, is setting an IP address of sorts on the interface.

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Simon Weller" 
> To: "dev" 
> Sent: Tuesday, 23 October, 2018 12:51:17
> Subject: Re: VXLAN and KVm experiences

> We've also been using VXLAN on KVM for all of our isolated VPC guest networks
> for quite a long time now. As Andrija pointed out, make sure you increase the
> max_igmp_memberships param and also put an ip address on each interface host
> VXLAN interface in the same subnet for all hosts that will share networking, 
> or
> multicast won't work.
> 
> 
> - Si
> 
> 
> 
> From: Wido den Hollander 
> Sent: Tuesday, October 23, 2018 5:21 AM
> To: dev@cloudstack.apache.org
> Subject: Re: VXLAN and KVm experiences
> 
> 
> 
> On 10/23/18 11:21 AM, Andrija Panic wrote:
>> Hi Wido,
>>
>> I have "pioneered" this one in production for last 3 years (and suffered a
>> nasty pain of silent drop of packages on kernel 3.X back in the days
>> because of being unaware of max_igmp_memberships kernel parameters, so I
>> have updated the manual long time ago).
>>
>> I never had any issues (beside above nasty one...) and it works very well.
> 
> That's what I want to hear!
> 
>> To avoid above issue that I described - you should increase
>> max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships)  - otherwise
>> with more than 20 vxlan interfaces, some of them will stay in down state
>> and have a hard traffic drop (with proper message in agent.log) with kernel
>>> 4.0 (or I silent, bitchy random packet drop on kernel 3.X...) - and also
>> pay attention to MTU size as well - anyway everything is in the manual (I
>> updated everything I though was missing) - so please check it.
>>
> 
> Yes, the underlying network will all be 9000 bytes MTU.
> 
>> Our example setup:
>>
>> We have i.e. bond.950 as the main VLAN which will carry all vxlan "tunnels"
>> - so this is defined as KVM traffic label. In our case it didn't make sense
>> to use bridge on top of this bond0.950 (as the traffic label) - you can
>> test it on your own - since this bridge is used only to extract child
>> bond0.950 interface name, then based on vxlan ID, ACS will provision
>> vxlan...@bond0.xxx and join this new vxlan interface to NEW bridge created
>> (and then of course vNIC goes to this new bridge), so original bridge (to
>> which bond0.xxx belonged) is not used for anything.
>>
> 
> Clear, I indeed thought something like that would happen.
> 
>> Here is sample from above for vxlan 867 used for tenant isolation:
>>
>> root@hostname:~# brctl show brvx-867
>>
>> bridge name bridge id   STP enabled interfaces
>> brvx-8678000.2215cfce99ce   no  vnet6
>>
>>  vxlan867
>>
>> root@hostname:~# ip -d link show vxlan867
>>
>> 297: vxlan867:  mtu 8142 qdisc noqueue
>> master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
>> link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
>> vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing 300
>>
>> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
>>   UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
>>
>> So note how the vxlan interface has by 50 bytes smaller MTU than the
>> bond0.950 parent interface (which could affects traffic inside VM) - so
>> jumbo frames are needed anyway on the parent interface (bond.950 in example
>> above with minimum of 1550 MTU)
>>
> 
> Yes, thanks! We will be using 1500 MTU inside the VMs, so all the
> networks underneath will be ~9k.
> 
>> Ping me if more details needed, happy to help.
>>
> 
> Awesome! We'll be doing a PoC rather soon. I'll come back with our
> experiences later.
> 
> Wido
> 
>> Cheers
>> Andrija
>>
>> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander  wrote:
>>
>>> Hi,
>>>
>>> I just wanted to know if there are people out there using KVM with
>>> Advanced Networking and using VXLAN for different networks.
>>>
>>> Our main goal would be to spawn a VM and based on the network the NIC is
>>> in attach it to a different VXLAN bridge on the KVM host.
>>>
>>> It seems to me that this should work, but I just wanted to check and see
>>> if people have experience with it.
>>>
>>> Wido
>>>
>>


Re: VXLAN and KVm experiences

2018-10-23 Thread Wido den Hollander



On 10/23/18 1:51 PM, Simon Weller wrote:
> We've also been using VXLAN on KVM for all of our isolated VPC guest networks 
> for quite a long time now. As Andrija pointed out, make sure you increase the 
> max_igmp_memberships param and also put an ip address on each interface host 
> VXLAN interface in the same subnet for all hosts that will share networking, 
> or multicast won't work.
> 

Thanks! So you are saying that all hypervisors need to be in the same L2
network or are you routing the multicast?

My idea was that each POD would be an isolated Layer 3 domain and that a
VNI would span over the different Layer 3 networks.

I don't like STP and other Layer 2 loop-prevention systems.

Wido

> 
> - Si
> 
> 
> 
> From: Wido den Hollander 
> Sent: Tuesday, October 23, 2018 5:21 AM
> To: dev@cloudstack.apache.org
> Subject: Re: VXLAN and KVm experiences
> 
> 
> 
> On 10/23/18 11:21 AM, Andrija Panic wrote:
>> Hi Wido,
>>
>> I have "pioneered" this one in production for last 3 years (and suffered a
>> nasty pain of silent drop of packages on kernel 3.X back in the days
>> because of being unaware of max_igmp_memberships kernel parameters, so I
>> have updated the manual long time ago).
>>
>> I never had any issues (beside above nasty one...) and it works very well.
> 
> That's what I want to hear!
> 
>> To avoid above issue that I described - you should increase
>> max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships)  - otherwise
>> with more than 20 vxlan interfaces, some of them will stay in down state
>> and have a hard traffic drop (with proper message in agent.log) with kernel
>>> 4.0 (or I silent, bitchy random packet drop on kernel 3.X...) - and also
>> pay attention to MTU size as well - anyway everything is in the manual (I
>> updated everything I though was missing) - so please check it.
>>
> 
> Yes, the underlying network will all be 9000 bytes MTU.
> 
>> Our example setup:
>>
>> We have i.e. bond.950 as the main VLAN which will carry all vxlan "tunnels"
>> - so this is defined as KVM traffic label. In our case it didn't make sense
>> to use bridge on top of this bond0.950 (as the traffic label) - you can
>> test it on your own - since this bridge is used only to extract child
>> bond0.950 interface name, then based on vxlan ID, ACS will provision
>> vxlan...@bond0.xxx and join this new vxlan interface to NEW bridge created
>> (and then of course vNIC goes to this new bridge), so original bridge (to
>> which bond0.xxx belonged) is not used for anything.
>>
> 
> Clear, I indeed thought something like that would happen.
> 
>> Here is sample from above for vxlan 867 used for tenant isolation:
>>
>> root@hostname:~# brctl show brvx-867
>>
>> bridge name bridge id   STP enabled interfaces
>> brvx-8678000.2215cfce99ce   no  vnet6
>>
>>  vxlan867
>>
>> root@hostname:~# ip -d link show vxlan867
>>
>> 297: vxlan867:  mtu 8142 qdisc noqueue
>> master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
>> link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
>> vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing 300
>>
>> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
>>   UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
>>
>> So note how the vxlan interface has by 50 bytes smaller MTU than the
>> bond0.950 parent interface (which could affects traffic inside VM) - so
>> jumbo frames are needed anyway on the parent interface (bond.950 in example
>> above with minimum of 1550 MTU)
>>
> 
> Yes, thanks! We will be using 1500 MTU inside the VMs, so all the
> networks underneath will be ~9k.
> 
>> Ping me if more details needed, happy to help.
>>
> 
> Awesome! We'll be doing a PoC rather soon. I'll come back with our
> experiences later.
> 
> Wido
> 
>> Cheers
>> Andrija
>>
>> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander  wrote:
>>
>>> Hi,
>>>
>>> I just wanted to know if there are people out there using KVM with
>>> Advanced Networking and using VXLAN for different networks.
>>>
>>> Our main goal would be to spawn a VM and based on the network the NIC is
>>> in attach it to a different VXLAN bridge on the KVM host.
>>>
>>> It seems to me that this should work, but I just wanted to check and see
>>> if people have experience with it.
>>>
>>> Wido
>>>
>>
>>
> 


Re: VXLAN and KVm experiences

2018-10-23 Thread Simon Weller
Linux native VXLAN uses multicast, and each host has to participate in
multicast in order to see the VXLAN networks. We haven't tried using PIM
across an L3 boundary with ACS, although it will probably work fine.

Another option is to use an L3 VTEP, but right now there is no native support
for that in CloudStack's VXLAN implementation, although we've thought about
proposing it as a feature.
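
As a quick sanity check that a host has actually joined the group
(illustrative commands; the group address is the one from Andrija's example):

# IGMP memberships per interface
cat /proc/net/igmp

# or look for the group on the underlay interface
ip maddr show dev bond0.950 | grep 239.0.3.99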



From: Wido den Hollander 
Sent: Tuesday, October 23, 2018 7:17 AM
To: dev@cloudstack.apache.org; Simon Weller
Subject: Re: VXLAN and KVm experiences



On 10/23/18 1:51 PM, Simon Weller wrote:
> We've also been using VXLAN on KVM for all of our isolated VPC guest networks 
> for quite a long time now. As Andrija pointed out, make sure you increase the 
> max_igmp_memberships param and also put an ip address on each interface host 
> VXLAN interface in the same subnet for all hosts that will share networking, 
> or multicast won't work.
>

Thanks! So you are saying that all hypervisors need to be in the same L2
network or are you routing the multicast?

My idea was that each POD would be an isolated Layer 3 domain and that a
VNI would span over the different Layer 3 networks.

I don't like STP and other Layer 2 loop-prevention systems.

Wido

>
> - Si
>
>
> 
> From: Wido den Hollander 
> Sent: Tuesday, October 23, 2018 5:21 AM
> To: dev@cloudstack.apache.org
> Subject: Re: VXLAN and KVm experiences
>
>
>
> On 10/23/18 11:21 AM, Andrija Panic wrote:
>> Hi Wido,
>>
>> I have "pioneered" this one in production for last 3 years (and suffered a
>> nasty pain of silent drop of packages on kernel 3.X back in the days
>> because of being unaware of max_igmp_memberships kernel parameters, so I
>> have updated the manual long time ago).
>>
>> I never had any issues (beside above nasty one...) and it works very well.
>
> That's what I want to hear!
>
>> To avoid above issue that I described - you should increase
>> max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships)  - otherwise
>> with more than 20 vxlan interfaces, some of them will stay in down state
>> and have a hard traffic drop (with proper message in agent.log) with kernel
>>> 4.0 (or I silent, bitchy random packet drop on kernel 3.X...) - and also
>> pay attention to MTU size as well - anyway everything is in the manual (I
>> updated everything I though was missing) - so please check it.
>>
>
> Yes, the underlying network will all be 9000 bytes MTU.
>
>> Our example setup:
>>
>> We have i.e. bond.950 as the main VLAN which will carry all vxlan "tunnels"
>> - so this is defined as KVM traffic label. In our case it didn't make sense
>> to use bridge on top of this bond0.950 (as the traffic label) - you can
>> test it on your own - since this bridge is used only to extract child
>> bond0.950 interface name, then based on vxlan ID, ACS will provision
>> vxlan...@bond0.xxx and join this new vxlan interface to NEW bridge created
>> (and then of course vNIC goes to this new bridge), so original bridge (to
>> which bond0.xxx belonged) is not used for anything.
>>
>
> Clear, I indeed thought something like that would happen.
>
>> Here is sample from above for vxlan 867 used for tenant isolation:
>>
>> root@hostname:~# brctl show brvx-867
>>
>> bridge name bridge id   STP enabled interfaces
>> brvx-8678000.2215cfce99ce   no  vnet6
>>
>>  vxlan867
>>
>> root@hostname:~# ip -d link show vxlan867
>>
>> 297: vxlan867:  mtu 8142 qdisc noqueue
>> master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
>> link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
>> vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing 300
>>
>> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
>>   UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
>>
>> So note how the vxlan interface has by 50 bytes smaller MTU than the
>> bond0.950 parent interface (which could affects traffic inside VM) - so
>> jumbo frames are needed anyway on the parent interface (bond.950 in example
>> above with minimum of 1550 MTU)
>>
>
> Yes, thanks! We will be using 1500 MTU inside the VMs, so all the
> networks underneath will be ~9k.
>
>> Ping me if more details needed, happy to help.
>>
>
> Awesome! We'll be doing a PoC rather soon. I'll come back with our
> experiences later.
>
> Wido
>
>> Cheers
>> Andrija
>>
>> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander  wrote:
>>
>>> Hi,
>>>
>>> I just wanted to know if there are people out there using KVM with
>>> Advanced Networking and using VXLAN for different networks.
>>>
>>> Our main goal would be to spawn a VM and based on the network the NIC is
>>> in attach it to a different VXLAN bridge on the KVM host.
>>>
>>> It seems to me that this should work, but I just wanted to check and see
>>> if people have experience with it.
>>>
>>> Wido
>>>
>>
>>
>


Caching - Ehcache

2018-10-23 Thread Marc-Aurèle Brothier
Hi everyone,

While trying to lower the DB load for CloudStack I did some long testing, and
here are my findings on the current cache mechanism in CloudStack.

I would be interested to hear from people who have tried to customize the
ehcache configuration in CS.
A PR (https://github.com/apache/cloudstack/pull/2913) is also open to
deactivate (before deleting) ehcache in CS; read below to understand why.

Problems

The code in CS does not seem to fit any caching mechanism, especially due to
the homemade DAO code. The three main flaws are the following:

1. Entities are not expected to be shared
There is quite a lot of code with method calls passing entity ID values as
long, which does some object fetching. Without caching, this behavior creates
distinct objects each time an entity with the same ID is fetched. With the
cache enabled, the same object is shared among those methods. It has been
seen that this generates side effects where code still expected unchanged
entity attributes after calling different methods, thus producing
exceptions/bugs.

2. DAO update operations use search queries
Some parts of the code update entities based on a search query, therefore the
whole cache must be invalidated (see GenericDaoBase: public int
update(UpdateBuilder ub, final SearchCriteria sc, Integer rows);).

3. Entities based on views joining multiple tables
There are quite a lot of entities based on SQL views joining multiple
entities into a single object. Enabling caching on those would require a
mechanism to link and cross-remove related objects whenever one of the
sub-entities changes.

Final word

Based on the points above, the best approach IMHO would be to move out of the
custom DAO framework in CS and use a well-known one. It would handle caching
properly, as well as the joins made by the views in the code. It's not an
easy change, but along the way it would fix a lot of issues and add a proven,
robust framework to an important part of the code.

The work to change the DAO layer is a huge task; I don't know how / who will
perform it.

What are the proposals for a new DAO framework?

FYI, I will stop working for Exoscale at the end of the month, so I won't be
able to tackle such a challenge as I won't be working with CS anymore. I'll
try my best to keep an eye on the project, give my insights, and share the
experience I have with CS.

Marc-Aurèle


Re: VXLAN and KVm experiences

2018-10-23 Thread Ivan Kudryavtsev
Doesn't a solution like this work seamlessly for large VXLAN networks?

https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn
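
For context, that approach replaces the multicast control plane with BGP
EVPN, so on the Linux side the VXLAN device is created without a multicast
group, roughly like this (a sketch only - the local address is illustrative,
and the EVPN routes would have to be handled by a BGP daemon such as FRR,
which ACS does not drive today):

# no "group ..." here; MAC/VTEP reachability comes from BGP EVPN instead
ip link add vxlan867 type vxlan id 867 dstport 4789 local 203.0.113.1 nolearning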

Tue, 23 Oct 2018, 8:34 Simon Weller :

> Linux native VXLAN uses multicast and each host has to participate in
> multicast in order to see the VXLAN networks. We haven't tried using PIM
> across a L3 boundary with ACS, although it will probably work fine.
>
> Another option is to use a L3 VTEP, but right now there is no native
> support for that in CloudStack's VXLAN implementation, although we've
> thought about proposing it as feature.
>
>
> 
> From: Wido den Hollander 
> Sent: Tuesday, October 23, 2018 7:17 AM
> To: dev@cloudstack.apache.org; Simon Weller
> Subject: Re: VXLAN and KVm experiences
>
>
>
> On 10/23/18 1:51 PM, Simon Weller wrote:
> > We've also been using VXLAN on KVM for all of our isolated VPC guest
> networks for quite a long time now. As Andrija pointed out, make sure you
> increase the max_igmp_memberships param and also put an ip address on each
> interface host VXLAN interface in the same subnet for all hosts that will
> share networking, or multicast won't work.
> >
>
> Thanks! So you are saying that all hypervisors need to be in the same L2
> network or are you routing the multicast?
>
> My idea was that each POD would be an isolated Layer 3 domain and that a
> VNI would span over the different Layer 3 networks.
>
> I don't like STP and other Layer 2 loop-prevention systems.
>
> Wido
>
> >
> > - Si
> >
> >
> > 
> > From: Wido den Hollander 
> > Sent: Tuesday, October 23, 2018 5:21 AM
> > To: dev@cloudstack.apache.org
> > Subject: Re: VXLAN and KVm experiences
> >
> >
> >
> > On 10/23/18 11:21 AM, Andrija Panic wrote:
> >> Hi Wido,
> >>
> >> I have "pioneered" this one in production for last 3 years (and
> suffered a
> >> nasty pain of silent drop of packages on kernel 3.X back in the days
> >> because of being unaware of max_igmp_memberships kernel parameters, so I
> >> have updated the manual long time ago).
> >>
> >> I never had any issues (beside above nasty one...) and it works very
> well.
> >
> > That's what I want to hear!
> >
> >> To avoid above issue that I described - you should increase
> >> max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships)  -
> otherwise
> >> with more than 20 vxlan interfaces, some of them will stay in down state
> >> and have a hard traffic drop (with proper message in agent.log) with
> kernel
> >>> 4.0 (or I silent, bitchy random packet drop on kernel 3.X...) - and
> also
> >> pay attention to MTU size as well - anyway everything is in the manual
> (I
> >> updated everything I though was missing) - so please check it.
> >>
> >
> > Yes, the underlying network will all be 9000 bytes MTU.
> >
> >> Our example setup:
> >>
> >> We have i.e. bond.950 as the main VLAN which will carry all vxlan
> "tunnels"
> >> - so this is defined as KVM traffic label. In our case it didn't make
> sense
> >> to use bridge on top of this bond0.950 (as the traffic label) - you can
> >> test it on your own - since this bridge is used only to extract child
> >> bond0.950 interface name, then based on vxlan ID, ACS will provision
> >> vxlan...@bond0.xxx and join this new vxlan interface to NEW bridge
> created
> >> (and then of course vNIC goes to this new bridge), so original bridge
> (to
> >> which bond0.xxx belonged) is not used for anything.
> >>
> >
> > Clear, I indeed thought something like that would happen.
> >
> >> Here is sample from above for vxlan 867 used for tenant isolation:
> >>
> >> root@hostname:~# brctl show brvx-867
> >>
> >> bridge name bridge id   STP enabled interfaces
> >> brvx-8678000.2215cfce99ce   no  vnet6
> >>
> >>  vxlan867
> >>
> >> root@hostname:~# ip -d link show vxlan867
> >>
> >> 297: vxlan867:  mtu 8142 qdisc noqueue
> >> master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
> >> link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
> >> vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing
> 300
> >>
> >> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
> >>   UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
> >>
> >> So note how the vxlan interface has by 50 bytes smaller MTU than the
> >> bond0.950 parent interface (which could affects traffic inside VM) - so
> >> jumbo frames are needed anyway on the parent interface (bond.950 in
> example
> >> above with minimum of 1550 MTU)
> >>
> >
> > Yes, thanks! We will be using 1500 MTU inside the VMs, so all the
> > networks underneath will be ~9k.
> >
> >> Ping me if more details needed, happy to help.
> >>
> >
> > Awesome! We'll be doing a PoC rather soon. I'll come back with our
> > experiences later.
> >
> > Wido
> >
> >> Cheers
> >> Andrija
> >>
> >> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> I just wanted to know if there are people out th

Re: VXLAN and KVm experiences

2018-10-23 Thread Simon Weller
Yeah, being able to handle EVPN within ACS via FRR would be awesome. FRR has 
added a lot of features since we tested it last. We were having problems with 
FRR honouring route targets and dynamically creating routes based on labels. If 
I recall, it was related to LDP  9.3 not functioning correctly.



From: Ivan Kudryavtsev 
Sent: Tuesday, October 23, 2018 7:54 AM
To: dev
Subject: Re: VXLAN and KVm experiences

Doesn't solution like this works seamlessly for large VXLAN networks?

https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn

Tue, 23 Oct 2018, 8:34 Simon Weller :

> Linux native VXLAN uses multicast and each host has to participate in
> multicast in order to see the VXLAN networks. We haven't tried using PIM
> across a L3 boundary with ACS, although it will probably work fine.
>
> Another option is to use a L3 VTEP, but right now there is no native
> support for that in CloudStack's VXLAN implementation, although we've
> thought about proposing it as feature.
>
>
> 
> From: Wido den Hollander 
> Sent: Tuesday, October 23, 2018 7:17 AM
> To: dev@cloudstack.apache.org; Simon Weller
> Subject: Re: VXLAN and KVm experiences
>
>
>
> On 10/23/18 1:51 PM, Simon Weller wrote:
> > We've also been using VXLAN on KVM for all of our isolated VPC guest
> networks for quite a long time now. As Andrija pointed out, make sure you
> increase the max_igmp_memberships param and also put an ip address on each
> interface host VXLAN interface in the same subnet for all hosts that will
> share networking, or multicast won't work.
> >
>
> Thanks! So you are saying that all hypervisors need to be in the same L2
> network or are you routing the multicast?
>
> My idea was that each POD would be an isolated Layer 3 domain and that a
> VNI would span over the different Layer 3 networks.
>
> I don't like STP and other Layer 2 loop-prevention systems.
>
> Wido
>
> >
> > - Si
> >
> >
> > 
> > From: Wido den Hollander 
> > Sent: Tuesday, October 23, 2018 5:21 AM
> > To: dev@cloudstack.apache.org
> > Subject: Re: VXLAN and KVm experiences
> >
> >
> >
> > On 10/23/18 11:21 AM, Andrija Panic wrote:
> >> Hi Wido,
> >>
> >> I have "pioneered" this one in production for last 3 years (and
> suffered a
> >> nasty pain of silent drop of packages on kernel 3.X back in the days
> >> because of being unaware of max_igmp_memberships kernel parameters, so I
> >> have updated the manual long time ago).
> >>
> >> I never had any issues (beside above nasty one...) and it works very
> well.
> >
> > That's what I want to hear!
> >
> >> To avoid above issue that I described - you should increase
> >> max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships)  -
> otherwise
> >> with more than 20 vxlan interfaces, some of them will stay in down state
> >> and have a hard traffic drop (with proper message in agent.log) with
> kernel
> >>> 4.0 (or I silent, bitchy random packet drop on kernel 3.X...) - and
> also
> >> pay attention to MTU size as well - anyway everything is in the manual
> (I
> >> updated everything I though was missing) - so please check it.
> >>
> >
> > Yes, the underlying network will all be 9000 bytes MTU.
> >
> >> Our example setup:
> >>
> >> We have i.e. bond.950 as the main VLAN which will carry all vxlan
> "tunnels"
> >> - so this is defined as KVM traffic label. In our case it didn't make
> sense
> >> to use bridge on top of this bond0.950 (as the traffic label) - you can
> >> test it on your own - since this bridge is used only to extract child
> >> bond0.950 interface name, then based on vxlan ID, ACS will provision
> >> vxlan...@bond0.xxx and join this new vxlan interface to NEW bridge
> created
> >> (and then of course vNIC goes to this new bridge), so original bridge
> (to
> >> which bond0.xxx belonged) is not used for anything.
> >>
> >
> > Clear, I indeed thought something like that would happen.
> >
> >> Here is sample from above for vxlan 867 used for tenant isolation:
> >>
> >> root@hostname:~# brctl show brvx-867
> >>
> >> bridge name bridge id   STP enabled interfaces
> >> brvx-8678000.2215cfce99ce   no  vnet6
> >>
> >>  vxlan867
> >>
> >> root@hostname:~# ip -d link show vxlan867
> >>
> >> 297: vxlan867:  mtu 8142 qdisc noqueue
> >> master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
> >> link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
> >> vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing
> 300
> >>
> >> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
> >>   UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
> >>
> >> So note how the vxlan interface has by 50 bytes smaller MTU than the
> >> bond0.950 parent interface (which could affects traffic inside VM) - so
> >> jumbo frames are needed anyway on the parent interface (bond.950 in
> example
> >> above with minimum of 1550 MTU)
> >>
> >
> >

Re: KVM CloudStack Agent Hacking proposal

2018-10-23 Thread Ivan Kudryavtsev
Hello, Paul. You have implemented the second part of the proposal, the one
related to the Qemu hook. Unfortunately, Qemu hooks are not always the right
place to implement features. I would like them to be, but they are not,
because of:
https://www.libvirt.org/hooks.html#recursive

In my case, I even implemented a standalone unix-socket server which I called
from the hook without awaiting the result and which, in turn, forked a process
to interact with libvirt - but it still caused a deadlock, because the CS KVM
agent and security_groups.py still do their own interaction with libvirt. So,
hooks are not enough, although they are important to have. It's cool that the
current agent implementation already includes hooks - less work to do)

So, my proposal is to inject this capability into the CS KVM agent. If the
design I introduced in the first e-mail is OK, we can implement it.

> I would like is to introduce a more generic approach, so the
> administrator can specify additional scripts in the
> agent.properties, which will be called the same way "security_groups.py"
called.
> custom.vm.start=/path/to/script1,path/to.script2
> custom.vm.stop=/path/to/script3,path/to.script4

Thank you for your time and opinions.
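
To make the proposal concrete, a sketch of what this could look like (the
property keys are the ones proposed above, not an existing agent setting, and
the path and the assumption that the VM name arrives as the first argument
are illustrative):

# agent.properties (proposed keys)
custom.vm.start=/usr/local/libexec/cs-hooks/vm-start.sh
custom.vm.stop=/usr/local/libexec/cs-hooks/vm-stop.sh

# /usr/local/libexec/cs-hooks/vm-start.sh (hypothetical script)
#!/bin/sh
VM_NAME="$1"
logger -t cs-custom-hook "start hook fired for ${VM_NAME}"
# e.g. hotplug a per-account NIC here via virsh attach-interface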


Tue, 23 Oct 2018 at 2:56, Paul Angus :

> Hi Ivan,
>
> I think that this may already have been added in 4.12 by ShapeBlue
>
>
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+hook+script+include
>
> if nothing else it sounds like you want to build upon this rather than
> rewrite it.
>
>
>
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Wido den Hollander 
> Sent: 23 October 2018 07:46
> To: dev@cloudstack.apache.org
> Subject: Re: KVM CloudStack Agent Hacking proposal
>
>
>
> On 10/22/18 8:02 PM, Ivan Kudryavtsev wrote:
> > Hello, Devs.
> >
> > I would like to introduce a feature and decided to consult with you
> > about its design before implementation. The feature is connected with
> > KVM CloudStack agent. We have found it beneficial to be able to launch
> > custom scripts upon VM start/stop. It can be done using Qemu hook but
> > it has several drawbacks:
> > - the hook is deployed by CS and adding additional lines into it leads
> > to extra efforts when ACS package is updated.
> > - it leads to deadlocks as you cannot effectively and easy to
> > communicate with libvirt from hook even with "fork & exec" because
> > security_groups.py and agent also participate and as a result it causes
> deadlocks.
> >
> > Now, in the code, we have a call for "security_groups.py":
> >
> > Start:
> > https://github.com/apache/cloudstack/blob/65f31f1a9fbc1c20cd752d80a7e1
> > 117efc0248a5/plugins/hypervisors/kvm/src/main/java/com/cloud/hyperviso
> > r/kvm/resource/wrapper/LibvirtStartCommandWrapper.java#L103
> >
> > Stop:
> > https://github.com/apache/cloudstack/blob/65f31f1a9fbc1c20cd752d80a7e1
> > 117efc0248a5/plugins/hypervisors/kvm/src/main/java/com/cloud/hyperviso
> > r/kvm/resource/wrapper/LibvirtStopCommandWrapper.java#L88
> >
> > I would like is to introduce a more generic approach, so the
> > administrator can specify additional scripts in the agent.properties,
> > which will be called the same way "security_groups.py" called.
> >
> > custom.vm.start=/path/to/script1,path/to.script2
> > custom.vm.stop=/path/to/script3,path/to.script4
> >
> > So, this feature will help users to do custom hotplug mechanisms. E.g.
> > we have such implementation which adds per-account VXLAN as a hotplug
> > ethernet device. So, even for a Basic Zone, every VM gets automatic
> > second NIC which helps to build a private network for an account.
> >
> > Currently, we do the job thru adding lines into security_groups.py,
> > which is not a good approach, especially for end users who don't want
> > to hack the system.
> >
> > Also, I'm thinking about changing /etc/libvirt/hooks/qemu the same
> > way, so it was just an entry point to  /etc/libvirt/hooks/qemu.d/*
> located scripts.
> >
> > Let me know about this feature proposal and if its design is good, we
> > start developing it.
> >
>
> Seems like a good thing! It adds flexibility to the VM.
>
> How are you planning on getting things like the VM name and other details
> to the scripts?
>
> Wido
>
> > Have a good day.
> >
>


-- 
With best regards, Ivan Kudryavtsev
Bitworks LLC
Cell RU: +7-923-414-1515
Cell USA: +1-201-257-1512
WWW: http://bitworks.software/ 


Re: KVM CloudStack Agent Hacking proposal

2018-10-23 Thread Ivan Kudryavtsev
Wido,

> How are you planning on getting things like the VM name and other
> details to the scripts?

The agent already passes it to the SG script, so I'm thinking about doing it
the same way:
https://github.com/apache/cloudstack/blob/65f31f1a9fbc1c20cd752d80a7e1117efc0248a5/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtStartCommandWrapper.java#L103

As for the VM details, I think the script can get them from libvirt if
necessary, or from /var/run/libvirt/qemu - it's up to the user how to get
what he/she needs. What I care about is avoiding the deadlocks.
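
For example, an agent-invoked script (unlike a libvirt qemu hook, which must
not call back into libvirt) could pull whatever it needs itself - a sketch,
again assuming the VM name comes in as the first argument:

#!/bin/sh
VM_NAME="$1"
# full domain XML, from which MACs, bridges, disks, etc. can be parsed
virsh dumpxml "${VM_NAME}" > "/tmp/${VM_NAME}.xml"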

Tue, 23 Oct 2018 at 2:46, Wido den Hollander :

>
>
> On 10/22/18 8:02 PM, Ivan Kudryavtsev wrote:
> > Hello, Devs.
> >
> > I would like to introduce a feature and decided to consult with you about
> > its design before implementation. The feature is connected with KVM
> > CloudStack agent. We have found it beneficial to be able to launch custom
> > scripts upon VM start/stop. It can be done using Qemu hook but it has
> > several drawbacks:
> > - the hook is deployed by CS and adding additional lines into it leads to
> > extra efforts when ACS package is updated.
> > - it leads to deadlocks as you cannot effectively and easy to communicate
> > with libvirt from hook even with "fork & exec" because security_groups.py
> > and agent also participate and as a result it causes deadlocks.
> >
> > Now, in the code, we have a call for "security_groups.py":
> >
> > Start:
> >
> https://github.com/apache/cloudstack/blob/65f31f1a9fbc1c20cd752d80a7e1117efc0248a5/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtStartCommandWrapper.java#L103
> >
> > Stop:
> >
> https://github.com/apache/cloudstack/blob/65f31f1a9fbc1c20cd752d80a7e1117efc0248a5/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtStopCommandWrapper.java#L88
> >
> > I would like is to introduce a more generic approach, so the
> administrator
> > can specify additional scripts in the agent.properties, which will be
> > called the same way "security_groups.py" called.
> >
> > custom.vm.start=/path/to/script1,path/to.script2
> > custom.vm.stop=/path/to/script3,path/to.script4
> >
> > So, this feature will help users to do custom hotplug mechanisms. E.g. we
> > have such implementation which adds per-account VXLAN as a hotplug
> ethernet
> > device. So, even for a Basic Zone, every VM gets automatic second NIC
> which
> > helps to build a private network for an account.
> >
> > Currently, we do the job thru adding lines into security_groups.py, which
> > is not a good approach, especially for end users who don't want to hack
> the
> > system.
> >
> > Also, I'm thinking about changing /etc/libvirt/hooks/qemu the same way,
> so
> > it was just an entry point to  /etc/libvirt/hooks/qemu.d/* located
> scripts.
> >
> > Let me know about this feature proposal and if its design is good, we
> start
> > developing it.
> >
>
> Seems like a good thing! It adds flexibility to the VM.
>
> How are you planning on getting things like the VM name and other
> details to the scripts?
>
> Wido
>
> > Have a good day.
> >
>


-- 
With best regards, Ivan Kudryavtsev
Bitworks LLC
Cell RU: +7-923-414-1515
Cell USA: +1-201-257-1512
WWW: http://bitworks.software/ 


[GitHub] AlexBeez opened a new pull request #13: Update Quick Installation Guide

2018-10-23 Thread GitBox
AlexBeez opened a new pull request #13: Update Quick Installation Guide
URL: https://github.com/apache/cloudstack-documentation/pull/13
 
 
   Updated to CentOS7, fixed networking to reflect required configuration


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] rhtyd commented on issue #13: Update Quick Installation Guide

2018-10-23 Thread GitBox
rhtyd commented on issue #13: Update Quick Installation Guide
URL: 
https://github.com/apache/cloudstack-documentation/pull/13#issuecomment-432362171
 
 
   Thanks @AlexBeez, I'll review your changes later this week and get back to 
you. /cc @PaulAngus 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services