Hello guys,

While working on the integration between our kernel netlink component and your 
driver, my attention was drawn to your implementation of buffer management.
I expect this has already been discussed, but I'd like to highlight a few 
things myself, hoping that together we can come up with a solution that works 
best going forward.

Namely, in the code pushed upstream, the OvsExtractFlow function uses the first 
NET_BUFFER of the NET_BUFFER_LIST as the basis for flow key extraction.
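
To make the point concrete, here is a minimal sketch of the pattern I am 
referring to. It is not the actual upstream code; OVS_FLOW_KEY and 
OvsParseNetBuffer are hypothetical names used purely for illustration.

/*
 * Sketch only: the flow key is parsed from the first NET_BUFFER, so every
 * later NET_BUFFER in the same NET_BUFFER_LIST is implicitly assumed to
 * carry the same header fields.
 */
#include <ndis.h>

typedef struct _OVS_FLOW_KEY {
    UCHAR opaque[128];      /* placeholder for the real flow key fields */
} OVS_FLOW_KEY;

/* Hypothetical per-frame parser; the real header parsing is elided. */
static NDIS_STATUS
OvsParseNetBuffer(PNET_BUFFER nb, OVS_FLOW_KEY *key)
{
    UNREFERENCED_PARAMETER(nb);
    UNREFERENCED_PARAMETER(key);
    return NDIS_STATUS_SUCCESS;
}

static NDIS_STATUS
ExtractFlow_FirstNbOnly(PNET_BUFFER_LIST nbl, OVS_FLOW_KEY *key)
{
    /* Only the head of the NET_BUFFER chain is ever inspected. */
    PNET_BUFFER nb = NET_BUFFER_LIST_FIRST_NB(nbl);

    return OvsParseNetBuffer(nb, key);
}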

I hope you have read this MSDN documentation:
http://msdn.microsoft.com/en-us/library/windows/hardware/ff570756(v=vs.85).aspx

The points relevant to our project are:
"Each NET_BUFFER structure that is linked to a NET_BUFFER_LIST structure 
describes a single Ethernet frame."

"All NET_BUFFER structures that are associated with a NET_BUFFER_LIST structure 
must have the same Ethernet frame type and IP protocol version (IPv4 or IPv6)."

"All NET_BUFFER structures that are associated with a NET_BUFFER_LIST structure 
must have the same source and destination MAC addresses."

"If a driver is sending TCP or UDP frames, all of the NET_BUFFER structures 
that are associated with a NET_BUFFER_LIST structure must be associated with 
same TCP or UDP connection."

-------

In other words, we have three cases to consider:
a) We have one NET_BUFFER_LIST that contains a single NET_BUFFER: the best case 
for us, nothing special to do.

b) We have a TCP or UDP packet split across multiple NET_BUFFERs in a 
NET_BUFFER_LIST.
This is not a good case for us, because the only packet fields guaranteed to be 
the same for all NET_BUFFERs are:
eth, network layer info (transport protocol only), transport layer (source and 
destination port only), ipv4 info (source and destination IP), arp info, and 
ipv6 info.

Whereas the fields that we have NO guarantee are the same among NET_BUFFERs 
are:
network layer: TOS, TTL, fragment type
transport layer: TCP flags
ipv6 info: flow label.

c) We have something other than a TCP/UDP packet (e.g. icmp4, icmp6, etc.):
NONE of these fields are guaranteed to be the same for each NET_BUFFER in the 
NET_BUFFER_LIST:
network layer: TOS, TTL, fragment type (I don't know where MPLS fits here)
transport layer: ICMP type, ICMP code (icmp4 and icmp6)
ipv4: source and destination address (they are guaranteed to be the same only 
for UDP and TCP)
arp: perhaps the same problem as above, for the IP addresses
ipv6: source address, destination address, flow label
It can also happen (i.e. we have no guarantee otherwise) that we get a 
NET_BUFFER_LIST with two NET_BUFFERs: the first is some generic icmp6 packet, 
while the second contains an ICMP6 Neighbor Discovery. With the current 
implementation of OvsExtractFlow, we would fail to match a flow for the 
Neighbor Discovery packet; instead, the flow for the first icmp6 packet would 
be used for both packets (see the sketch below).
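
As an illustration, here is a sketch of extracting one flow key per NET_BUFFER 
instead of one per NET_BUFFER_LIST (reusing the hypothetical OVS_FLOW_KEY and 
OvsParseNetBuffer names from the sketch above), which would give the generic 
icmp6 frame and the Neighbor Discovery frame each their own key:

static NDIS_STATUS
ExtractFlowPerNetBuffer(PNET_BUFFER_LIST nbl)
{
    PNET_BUFFER nb;

    for (nb = NET_BUFFER_LIST_FIRST_NB(nbl); nb != NULL;
         nb = NET_BUFFER_NEXT_NB(nb)) {
        OVS_FLOW_KEY key;
        NDIS_STATUS status = OvsParseNetBuffer(nb, &key);

        if (status != NDIS_STATUS_SUCCESS) {
            return status;
        }
        /* Flow lookup / creation would happen here, per frame. */
    }
    return NDIS_STATUS_SUCCESS;
}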

----------

As a one-line conclusion: given the current implementation of the buffer 
management mechanism, the flow mechanics can, at any time, create or match the 
wrong flow for a given NET_BUFFER.

Our implementation (create a NET_BUFFER_LIST for each existing NET_BUFFER and 
wrap it in an OVS_NET_BUFFER) may be less efficient, but it covers all of these 
cases, because it treats each NET_BUFFER in a NET_BUFFER_LIST as a completely 
separate packet.
One improvement to our OVS_NET_BUFFER implementation would be to keep a cache 
of pre-allocated buffers, so as not to waste time allocating and de-allocating 
memory for them each time; another would be to take a NET_BUFFER_LIST that 
already holds a single NET_BUFFER as it is, without duplicating it. A rough 
sketch of the per-NET_BUFFER splitting is below.
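
This is only a sketch, not our actual OVS_NET_BUFFER code: it splits a 
multi-NET_BUFFER list into one single-NET_BUFFER list per frame while reusing 
the original MDL chains instead of copying the data. nblPool is assumed to 
come from NdisAllocateNetBufferListPool() with fAllocateNetBuffer set; error 
handling, NET_BUFFER_LIST info cloning, and ownership/completion details are 
omitted.

static PNET_BUFFER_LIST
SplitNblPerNetBuffer(NDIS_HANDLE nblPool, PNET_BUFFER_LIST original)
{
    PNET_BUFFER_LIST head = NULL;
    PNET_BUFFER_LIST *tail = &head;
    PNET_BUFFER nb;

    for (nb = NET_BUFFER_LIST_FIRST_NB(original); nb != NULL;
         nb = NET_BUFFER_NEXT_NB(nb)) {
        /* One new NET_BUFFER_LIST per frame, sharing that frame's MDL chain. */
        PNET_BUFFER_LIST clone =
            NdisAllocateNetBufferAndNetBufferList(nblPool, 0, 0,
                                                  NET_BUFFER_CURRENT_MDL(nb),
                                                  NET_BUFFER_CURRENT_MDL_OFFSET(nb),
                                                  NET_BUFFER_DATA_LENGTH(nb));
        if (clone == NULL) {
            /* Real code would free everything allocated so far. */
            return NULL;
        }
        *tail = clone;
        tail = &NET_BUFFER_LIST_NEXT_NBL(clone);
    }
    return head;
}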

I believe this can be considered an architectural issue, and it is best 
addressed early rather than continuing to build functionality on top of it and 
having to change it later, given that network packets are used in many places 
in the project.

I am eager to hear any suggestions, as well as, perhaps, how you were thinking 
of addressing this issue.
Samuel Ghinet
