Eitan,

What I'm trying to say is:
1. This "assumption" cannot be applied in a production environment.
2. You have the functionality to do TCP segmentation, but I don't think you 
have the functionality to do IPv4 fragmentation, nor to originate ICMPv4 / 
ICMPv6 errors.
3. I've just added a task. I believe we both agree that it is a task to do :)

Sam
________________________________________
From: Eitan Eliahu [elia...@vmware.com]
Sent: Tuesday, August 05, 2014 7:17 PM
To: Samuel Ghinet
Cc: dev@openvswitch.org
Subject: RE: WFP and tunneling and packet fragments

Sam,
The basic assumption is that the combined length of the inner packet and the 
encapsulation header is smaller than the host NIC MTU. We have the 
infrastructure in the driver to split the packet, and we can go ahead and 
implement your suggestion once we add more common cases (multiple buffers in 
an NBL, megaflows, etc.).
Thanks,
Eitan
-----Original Message-----
From: Samuel Ghinet [mailto:sghi...@cloudbasesolutions.com]
Sent: Tuesday, August 05, 2014 7:52 AM
To: Eitan Eliahu
Cc: dev@openvswitch.org
Subject: RE: WFP and tunneling and packet fragments

Hi Eitan,

The approach is different in the sense of either "fragmenting" the packets so 
that after we encapsulate them they are of acceptable size (i.e. <= MTU of the 
external NIC), or convincing the VM to lower its MTU for the destination (Path 
MTU Discovery) to our given value -- which does not change the MTU stored in 
the VM's NIC. So in my case it doesn't matter how big the MTU is set in the VM. 
Or do I misunderstand what you are saying?

MSS setting is an optimization.

Sam
________________________________________
From: Eitan Eliahu [elia...@vmware.com]
Sent: Tuesday, August 05, 2014 5:12 PM
To: Samuel Ghinet
Cc: dev@openvswitch.org
Subject: RE: WFP and tunneling and packet fragments

Hi Sam,
I'm wondering how this case is different from the case when IPsec is being 
used.
I am sure that at the least we can control the MTU of the host NIC and increase 
it to accommodate the encapsulation overhead (but still nothing prevents the VM 
MTU from being increased even further). MSS setting cannot be used for non-TCP 
packets.
We currently do take care of the case of TCP packets which are larger than the 
MSS size (OvsTcpSegmentNBL).
Thanks,
Eitan



-----Original Message-----
From: Samuel Ghinet [mailto:sghi...@cloudbasesolutions.com]
Sent: Tuesday, August 05, 2014 6:23 AM
To: Eitan Eliahu
Cc: dev@openvswitch.org
Subject: RE: WFP and tunneling and packet fragments

Hello Eitan,

I personally do not find that a viable solution: I mean, I don't think we can 
ask the clients (i.e. those using the OSes from within the VMs) to change the 
MTU on each of their OSes. Unless there is a method to automate this from 
within the hypervisor, and not allow the user of the VM to mess things up from 
their OS, I don't think it's a good solution at all.

The approach I had taken for our project was to do IPv4 fragmentation in code.
The issue is actually a bit more complex, and was dealt with in the following 
situations (except one issue I present at the bottom); a rough sketch of the 
decision logic follows the list:
a) IPv4 packet too big if we add the encap bytes, and the DF flag is not set 
for the IPv4 (payload) packet => fragment the buffer, then encapsulate each 
fragment
b) IPv4 packet too big if we add the encap bytes, but the DF flag is set for 
the IPv4 (payload) packet: originate an ICMPv4 "fragmentation needed" error to 
the VM, specifying an MTU value that accounts for the required encap bytes.
c) IPv6 packet too big if we add the encap bytes: originate an ICMPv6 "packet 
too big" error to the VM, specifying an MTU value that accounts for the 
required encap bytes.
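
A minimal sketch of that classification, assuming hypothetical names and 
leaving out the actual fragmentation / ICMP generation code:

    #include <stdint.h>

    /* Hypothetical sketch of deciding what to do with an inner packet that
     * would exceed the external NIC MTU once the encapsulation header is
     * added. Names and types are illustrative only. */
    typedef enum {
        MTU_ACTION_ENCAP_AS_IS,          /* packet fits, just encapsulate */
        MTU_ACTION_FRAGMENT,             /* a) IPv4, DF clear             */
        MTU_ACTION_ICMP4_FRAG_NEEDED,    /* b) IPv4, DF set               */
        MTU_ACTION_ICMP6_PACKET_TOO_BIG  /* c) IPv6                       */
    } MtuAction;

    static MtuAction
    ChooseMtuAction(uint32_t innerLen, uint32_t encapBytes,
                    uint32_t externalMtu, int isIpv6, int dfSet,
                    uint32_t *mtuToReport)
    {
        if (innerLen + encapBytes <= externalMtu) {
            return MTU_ACTION_ENCAP_AS_IS;
        }
        /* The MTU reported back to the VM must leave room for the tunnel
         * header. */
        *mtuToReport = externalMtu - encapBytes;
        if (isIpv6) {
            return MTU_ACTION_ICMP6_PACKET_TOO_BIG;
        }
        return dfSet ? MTU_ACTION_ICMP4_FRAG_NEEDED : MTU_ACTION_FRAGMENT;
    }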

Also, there is one more situation that I had handled: if the payload IP had DF 
set, and:
VM1 on Hypervisor1: MTU = 1500
VM2 on Hypervisor2: MTU = 1480
encap bytes = 60

The problem that happened was:
- Hypervisor1 took a 1440-byte packet and added 60 bytes => packet size = 1500
- an ICMPv4 error came back from VM2, from Hypervisor2, to Hypervisor1 
(max MTU = 1480)
- packet decapsulation: it says "packet too big, max MTU = 1480"
- send the ICMPv4 error to VM1 (switch forwarding)
- VM1 retransmits, but uses packet size = 1440 (1440 < 1480, so it thinks it's 
ok)
- the packet comes again to Hypervisor1 to be encapsulated, and after 
encapsulation it has THE SAME SIZE as before, when it was a problem: 1500
- basically the ICMPv4 error reporting solved nothing, and the packet did not 
reach the destination in the end.

I had dealt with this by intercepting ICMPv4 errors coming from GRE (GRE only, 
I didn't get to do it for VXLAN), so that when the error said "max MTU = 1480", 
I would change the packet and make VM1 receive "max MTU = 1420" (1480 - 60). 
This solved the problem.
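
The rewrite itself amounts to lowering the next-hop MTU field of the 
intercepted ICMPv4 "fragmentation needed" error by the encapsulation overhead 
before forwarding it to the VM. A minimal sketch, with hypothetical names and 
the ICMP checksum update omitted:

    #include <stdint.h>

    /* Hypothetical helper: adjust the next-hop MTU advertised by an
     * intercepted ICMPv4 "fragmentation needed" error so the VM picks a
     * size that still fits after the tunnel header is added.
     * e.g. reportedMtu = 1480, encapBytes = 60 (GRE)  =>  1420 */
    static uint16_t
    AdjustNextHopMtu(uint16_t reportedMtu, uint16_t encapBytes)
    {
        return (uint16_t)(reportedMtu > encapBytes ? reportedMtu - encapBytes
                                                   : reportedMtu);
    }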

Now, the problem that I did not handle, or better said, did not handle 
properly :-P The problem was that I did fragmentation first and encapsulation 
afterwards, when I should have done it the other way around. And I had not 
handled receiving fragmented encapsulated packets - basically, for simple cases 
it worked ok, because the packets were encapsulated after fragmentation, and in 
my test cases packets did not normally reach the other side as fragments :). 
However, if we do encapsulation first and fragmentation after, and deal with 
received fragmented encapsulated packets (e.g. via WFP), we should be set.

Also, one more thing I had taken into account for my project was the TCP 
Maximum Segment Size when used with tunneling: basically, why do the 
fragmentation and all the rest if we can tell the TCP peer on the other side 
what maximum segment size to use?
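
A minimal sketch of that idea (MSS clamping on SYN / SYN-ACK), assuming 
hypothetical names and leaving out the TCP option rewrite and checksum update:

    #include <stdint.h>

    /* Hypothetical helper: cap the MSS advertised in a SYN/SYN-ACK so that
     * MSS + inner IPv4/TCP headers + tunnel header still fits in the
     * external NIC MTU. 40 = minimal inner IPv4 + TCP headers. */
    static uint16_t
    ClampMss(uint16_t advertisedMss, uint16_t externalMtu, uint16_t encapBytes)
    {
        uint16_t maxMss = (uint16_t)(externalMtu - encapBytes - 40);
        return advertisedMss > maxMss ? maxMss : advertisedMss;
    }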

Sam

________________________________________
From: Eitan Eliahu [elia...@vmware.com]
Sent: Tuesday, August 05, 2014 3:25 PM
To: Samuel Ghinet
Cc: dev@openvswitch.org
Subject: RE: WFP and tunneling and packet fragments

Yes.
Eitan

-----Original Message-----
From: Samuel Ghinet [mailto:sghi...@cloudbasesolutions.com]
Sent: Tuesday, August 05, 2014 5:12 AM
To: Eitan Eliahu
Cc: dev@openvswitch.org
Subject: RE: WFP and tunneling and packet fragments

Thanks Eitan!

Regarding point 2: do you mean to set the MTU from within the VM?
As I remember, I had found no PowerShell cmdlet that changes the MTU of a vNIC.

Sam
________________________________________
From: Eitan Eliahu [elia...@vmware.com]
Sent: Monday, August 04, 2014 5:17 AM
To: Samuel Ghinet
Subject: RE: WFP and tunneling and packet fragments

Sam,
Here are some answers for your comments:
[1] WFP is used for Rx only and as you mentioned for fragmented packets only.
[2] Setting the VM MTU to accommodate the tunnel header is the correct 
configuration.
[3] We need to match the external packet in the flow table as other VXLAN 
packets could be received. (The external port is set to promiscuous mode by the 
VM switch). (There might be other reasons as well).
Thanks,
Eitan

-----Original Message-----
From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Samuel Ghinet
Sent: Sunday, August 03, 2014 11:53 AM
To: dev@openvswitch.org
Subject: [ovs-dev] WFP and tunneling and packet fragments

Hello guys,

I have studied the part of your code that deals with tunneling and WFP a bit 
more.

A summary of the flow, as I understand it:

ON RECEIVE (from external):
A. If there's a VXLAN encapsulated packet coming from outside, one that is NOT 
fragmented, the flow is like this:
1. Extract packet info (i.e. flow key)
2. find flow
3. if flow found => out to port X (or to multiple ports)
3.1. else => send to userspace: a flow will be created to handle the VXLAN 
encapsulated packets that are NOT fragmented (but we'll later need to make a 
new flow for VXLAN encapsulated packets that are fragmented)

For the case where we have a flow, we output to port X (which should be the 
management OS NIC). After it is received by the management OS, WFP comes in and 
calls the registered callout / callback. This will decapsulate the VXLAN 
packet, find a flow for it, and then execute the actions on the decapsulated 
packet (e.g. output to port Y).

The problem I find here is that the search for a flow is done twice.

B. If there's a VXLAN encapsulated packet coming from outside, one that IS 
fragmented, the flow is similar:
1. Extract packet info (i.e. flow key)
2. find flow
3. if flow found => out to port X (or to multiple ports)
3.1. else => send to userspace: a flow will be created to handle the VXLAN 
encapsulated packets that are fragmented (but we'll later need to make a new 
flow for VXLAN encapsulated packets that are not fragmented)

For the case where we have a flow, we output to port X (which should be the 
management OS NIC). After it is received by the management OS, WFP comes in, 
reassembles the fragmented VXLAN packets, and then calls the registered 
callout / callback. This will decapsulate the VXLAN packet, find a flow for it, 
and then execute the actions on the decapsulated packet (e.g. output to port Y).

Again, we have two flow lookups for the same packet.
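
To make the double lookup concrete, here is a rough pseudo-C sketch of the Rx 
path as I read it (hypothetical names and opaque types; this is not the actual 
driver code):

    /* Rough sketch of the Rx path described above; all names are hypothetical. */
    typedef struct Packet Packet;
    typedef struct Flow Flow;
    typedef struct FlowKey FlowKey;

    extern FlowKey *ExtractFlowKey(Packet *pkt);
    extern Flow *FlowLookup(const FlowKey *key);
    extern void OutputToPort(Packet *pkt, int port);
    extern void SendToUserspace(Packet *pkt);   /* upcall; userspace installs a flow */
    extern Packet *Decapsulate(Packet *outer);
    extern void ExecuteActions(Packet *pkt, const Flow *flow);

    #define MANAGEMENT_OS_PORT 1   /* hypothetical port number of the management OS NIC */

    /* Lookup #1: on the outer (still encapsulated) packet. */
    void RxFromExternal(Packet *outer)
    {
        Flow *flow = FlowLookup(ExtractFlowKey(outer));
        if (flow != NULL) {
            OutputToPort(outer, MANAGEMENT_OS_PORT);   /* WFP callout fires later */
        } else {
            SendToUserspace(outer);
        }
    }

    /* Lookup #2: in the WFP callout, after reassembly and decapsulation. */
    void WfpCallout(Packet *outer)
    {
        Packet *inner = Decapsulate(outer);
        Flow *flow = FlowLookup(ExtractFlowKey(inner));
        if (flow != NULL) {
            ExecuteActions(inner, flow);               /* e.g. output to port Y */
        }
    }

Deferring lookup #1 for encapsulated packets (see question 1.2 below) would 
leave only the lookup in the WFP callout.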

ON SEND (to external / VXLAN):
There are three situations, as I see them:
1. the packet is small, and thus not an LSO either => encapsulate and output, 
all is perfect

2. the packet is LSO. The only case where I found it in my tests (as I 
remember) was when the packet was coming from the management OS. If LSO is 
enabled in a VM, then, by the time it reaches the switch, it has already been 
segmented and no longer carries LSO (NBL) info.
Regarding LSO packets coming from the management OS: as I remember, packets can 
be LSO here.
However, I believe there is no practical case in which we need to do "if in 
port = management OS => out to VXLAN".
I mean, tunneling is used for outputting packets from VMs only, as I understand 
it.

3. The packet is not an LSO, but packet size + encap additional bytes > MTU 
(e.g. packet size = 1500).
Here we have two cases:
3.1. The packet is coming from the management OS: in this case, if we do a 
netsh to lower the MTU below 1500 (i.e. taking into account the max encap 
additional bytes; see the example command after this list), then, when a packet 
needs to be encapsulated, the MTU in the driver will be 1500 and the packet 
will be, say, 1420 instead of 1500. So it will work ok.
3.2. The packet is coming from a VM: in this case, as I had tested, lowering 
the MTU in the management OS below 1500 did not solve the problem, as the 
packets coming from that VM still had size = 1500, so after being encapsulated 
they were too big.
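
For 3.1, the command I mean is along these lines (the interface name is just an 
example):

    netsh interface ipv4 set subinterface "vEthernet (external)" mtu=1440 store=persistent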

I understand there is no WFP part for the sending of packets (to external) - 
and I actually believe there would be no place for WFP on send to external, 
since WFP callouts are called at a higher level in the driver stack than our 
driver.

So I've got several questions:
1. For receive (from external):
1.1. if we detect that the packet is an encapsulated packet (e.g. VXLAN) and 
also fragmented, wouldn't it be better to match the flow disregarding the 
fragment type?
1.2. Could there be any method to avoid the double flow lookup for received 
encapsulated packets? A way to do this, I'm thinking, would be to defer the 
flow lookup when the packet is encapsulated, simply output it to the management 
OS port (that's where it must go anyway), and do the flow lookup only in the 
WFP callback.
But I'm not sure... do only management OS packets reach the WFP callback, or 
packets from VMs as well?

2. For send (to external / VXLAN):
2.1. Do you deal with non-LSO 1500-byte packets that arrive from VMs and must 
be sent over VXLAN?
2.2. I personally believe there is no practical scenario where packets coming 
from the management OS are sent to a VXLAN port. If you believe otherwise, 
please let me know.

Thanks!
Sam
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
