Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)

Mike Lovell Mon, 26 Nov 2012 09:19:21 -0800

On 11/24/2012 08:21 AM, Stefan Hajnoczi wrote:

On Mon, Jun 25, 2012 at 7:42 AM, Mike Lovell <m...@dev-zero.net> wrote:

This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I
first had the idea when I was playing with the udp and mcast socket network
backends while exploring how to build a VM infrastructure. I liked the idea of
using the socket backends because they don't require escalated permissions to
configure and run, and because they can talk over IP networks.

Hi Mike,
I was just reading the VXLAN spec and Linux code when I realized this
is similar to your QDES approach:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d342894c5d2f8c7df194c793ec4059656e09ca31
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02


If you're still hacking on QDES you may be interested.

VXLAN is a VLAN mechanism that gets around the 12-bit 802.1Q tag size.
In large deployments it may be necessary to have more than 4096
VLANs; this is where VXLAN comes in.

It's a tiny header with VXLAN Network ID that encapsulates Ethernet inside UDP:

[Outer Ethernet][IP][UDP] [VXLAN] [Inner Ethernet][...]
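
For readers who want to see the header concretely, here is a minimal sketch in Python. It assumes the 8-byte layout from the VXLAN draft: 8 flag bits (0x08 marks the VNI as valid), 24 reserved bits, a 24-bit VXLAN Network ID, and 8 more reserved bits. A 24-bit ID allows 2^24 = 16,777,216 segments, compared with 2^12 = 4096 for an 802.1Q tag. The function names are hypothetical and are not taken from the QDES patch or the Linux code.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid ID


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network ID."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vxlan_header(payload: bytes):
    """Split a UDP payload into (vni, inner_ethernet_frame)."""
    if len(payload) < 8:
        raise ValueError("payload shorter than a VXLAN header")
    word1, word2 = struct.unpack("!II", payload[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return word2 >> 8, payload[8:]
```

The outer Ethernet, IP and UDP headers are left to the kernel here, since an unprivileged UDP socket only sees the UDP payload.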

UDP is used as follows:
1. If the host has already learnt an Inner MAC -> Outer IP mapping,
then it transmits a unicast UDP packet.
2. Otherwise it transmits a multicast UDP packet.

That means all hosts join a multicast group - this enables broadcast
similar to what you've done in your patches.

Typically traffic from a VM on Host A to another VM on Host B will use
unicast UDP because the Inner MAC -> Outer IP mapping has been learnt.
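
The learning and forwarding decision described above can be sketched roughly as follows. This is only an illustration under assumptions: the class name, the method names and the example addresses are hypothetical, and a real implementation would also age out stale entries.

```python
class ForwardingTable:
    """Maps Inner MAC -> Outer IP, falling back to the multicast group."""

    def __init__(self, mcast_group: str):
        self.mcast_group = mcast_group
        self.mac_to_ip = {}

    def learn(self, inner_src_mac: str, outer_src_ip: str) -> None:
        # Called for every received packet: remember which host
        # the inner source MAC was last seen behind.
        self.mac_to_ip[inner_src_mac] = outer_src_ip

    def destination_for(self, inner_dst_mac: str) -> str:
        # Known MAC: unicast UDP to the learnt host.
        # Unknown MAC (or broadcast, which is never learnt as a
        # source): multicast UDP to the group.
        return self.mac_to_ip.get(inner_dst_mac, self.mcast_group)


table = ForwardingTable("239.1.1.1")           # hypothetical group
first = table.destination_for("52:54:00:00:00:02")
table.learn("52:54:00:00:00:02", "192.0.2.20")  # reply arrives from Host B
later = table.destination_for("52:54:00:00:00:02")
```

In this example `first` is the multicast group and `later` is the unicast address of Host B.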

I'm not sure if it makes sense to implement VXLAN in QEMU because the
multicast UDP socket uses a well-known port.  I guess that means
multiple QEMUs running on the same host cannot use VXLAN unless they
bind to unique IP addresses.  At that point we lose the advantage of a
pure userspace implementation and might as well use the kernel
implementation (or OpenVSwitch) with tap devices.

Anyway, it's still interesting and maybe there's a way to solve this.

Stefan

the VXLAN spec gave me some inspiration to write the original patch i
submitted. unfortunately i made a silly decision of using my own header
format and should have used the VXLAN one. but i believe just changing
that would make this compatible with VXLAN.

i do still want to do more work on this such as converting to make it
compatible with VXLAN. there have also been a lot of other changes to
the network subsystem that i would need to update the patch for. i've
been rather busy the past few months with a work project and told myself
i have to finish that before i can go back to this. i also was waiting
to see if the churn in the network subsystem would calm down so i could
make all the changes i need there at once. hopefully around the new year
i'll have time to look at it. since i originally sent the patch to the
list, there have been a few people ask me about it so i think there is
some interest for it.

i think it does still make sense to implement it in QEMU. there isn't a
problem with multiple processes using the same multicast address. the
net_socket_mcast_create function in socket.c already sets the
IP_MULTICAST_LOOP option which makes it so packets get looped back and
also delivered to processes on the same host. that is why there is a
check in qdes_receive to see if the sender is the localAddr and drop it
if it is. the big advantage i see to implementing VXLAN inside QEMU is
that it can be done without any escalated privileges and without
reconfiguring the host's network configuration.
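
as a rough illustration of the point about several processes sharing one multicast address, here is a minimal Python sketch. it is not the QEMU code: the function names are hypothetical, the group and port would be whatever the deployment picks, and comparing the full (ip, port) pair against the local sending address is an assumption about how the self-check could work. SO_REUSEADDR is what lets more than one process bind the same multicast port on Linux (some other systems want SO_REUSEPORT instead).

```python
import socket
import struct


def open_mcast_socket(group: str, port: int) -> socket.socket:
    """Open a UDP socket that several local processes can share."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    # Allow multiple processes on this host to bind the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Join the multicast group on the default interface.
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    # Loop sent packets back so other local processes receive them.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
    return sock


def should_drop(sender, local_addr) -> bool:
    """Drop packets that are our own transmissions looped back.

    Both arguments are (ip, port) tuples, as returned by recvfrom().
    """
    return sender == local_addr
```

a receive loop would call `should_drop(sender, local_addr)` on the address returned by `recvfrom()` and skip the packet when it returns True.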


mike
