On Thu, Nov 29, 2012 at 2:46 PM, Kyle Mestery <kmest...@cisco.com> wrote: > Note: v6 of this patch addresses additional comments from > Jesse. Specifically, I've removed the hash table of > port/socket mappings and moved to a linked list. I've also > cleaned up the code around this quite a bit. > > Note: v5 of this patch addresses comments from Jesse > and Chris. > > Note: v4 of this patch removes VXLAN over IPSEC support, > per an offline conversation with Jesse. > > Actual commit message below. > --- > Add support for VXLAN tunnels to Open vSwitch. Add support > for setting the destination UDP port on a per-port basis. > This is done by adding a "dst_port" parameter to the port > configuration. This is only applicable currently to VXLAN > tunnels. > > Please note this currently does not implement any sort of multicast > learning. With this patch, VXLAN tunnels must be configured similar > to GRE tunnels (e.g. point to point). A subsequent patch will implement > a VXLAN control plane in userspace to handle multicast learning. > > This patch set is based on one posted by Ben Pfaff on Oct. 12, 2011 > to the ovs-dev mailing list: > > http://openvswitch.org/pipermail/dev/2011-October/012051.html > > The patch has been maintained, updated, and freshened by me and a > version of it is available at the following github repository: > > https://github.com/mestery/ovs-vxlan/tree/vxlan > > I've tested this patch with multiple VXLAN tunnels between hosts > using different UDP port numbers. Performance is on par (though > slightly faster) than comparable GRE tunnels. > > See the following IETF draft for additional information about VXLAN: > http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02 > > Signed-off-by: Kyle Mestery <kmest...@cisco.com>
The git style is actually the reverse of the commit message and the comments (the commit message goes on top). If you do that then git will automatically chose the right pieces when it is applied. I received a sparse warning: CHECK /home/jgross/openvswitch/datapath/linux/vport-vxlan.c /home/jgross/openvswitch/datapath/linux/vport-vxlan.c:88:1: warning: symbol 'vxlan_ports' was not declared. Should it be static? > diff --git a/datapath/vport-vxlan.c b/datapath/vport-vxlan.c > new file mode 100644 > index 0000000..b382bb5 > --- /dev/null > +++ b/datapath/vport-vxlan.c > +/** > + * struct if_names - List of names > + * @list: list element. > + * @name: The name of the interface. > + */ > +struct if_name { > + struct list_head list; > + char name[IFNAMSIZ]; > +}; I'm not quite sure why we need to be able to look up by name. I see it is used in vxlan_set_options() to find the socket to release but I think we could just do it by port number as is done in vxlan_tnl_destroy(). > +/** > + * struct vxlan_port - Keeps track of open UDP ports > + * @list: list element. > + * @port: The UDP port number. > + * @net: The net namespace this port is assosciated with. > + * @if_names: List of names this port is assosciated with. > + * @socket: The socket created for this port number. > + * @count: How many ports are using this socket/port. > + */ > +struct vxlan_port { > + struct list_head list; > + u16 port; > + struct net *net; Since we have the socket, you could save a little space and just use sock_net(vxlan_rcv_socket->sk) instead here. > +static struct vxlan_port *vxlan_port_exists(struct net *net, u16 port) > +{ > + struct vxlan_port *vxlan_port; > + > + list_for_each_entry(vxlan_port, &vxlan_ports, list) { > + if (vxlan_port->port == port && vxlan_port->net == net) You could use net_eq() here (the benefit of these different accessors is they compile out if network namespaces aren't configured). > +static struct sk_buff *vxlan_build_header(const struct vport *vport, > + const struct tnl_mutable_config > *mutable, > + struct dst_entry *dst, > + struct sk_buff *skb, > + int tunnel_hlen) > +{ > + struct udphdr *udph = udp_hdr(skb); > + struct vxlanhdr *vxh = (struct vxlanhdr *)(udph + 1); > + const struct ovs_key_ipv4_tunnel *tun_key = OVS_CB(skb)->tun_key; > + __be64 out_key; > + u32 flags; > + > + tnl_get_param(mutable, tun_key, &flags, &out_key); > + > + udph->dest = htons(mutable->dst_port); It would be nice to store this in network byte order rather than doing the swap every time. > +static int vxlan_set_options(struct vport *vport, struct nlattr *options) > +{ > + int err; > + const char *vname = vport->ops->get_name(vport); > + struct net *net = ovs_dp_get_net(vport->dp); > + struct tnl_vport *tnl_vport = tnl_vport_priv(vport); > + struct tnl_mutable_config *config; > + struct vxlan_port *old_port = NULL; > + struct vxlan_port *vxlan_port = NULL; > + > + config = rtnl_dereference(tnl_vport->mutable); > + > + old_port = find_vxlan_port_by_name(net, vname); > + if (old_port && old_port->port == config->dst_port) { > + /* Release old socket */ > + vxlan_tunnel_release(old_port); > + } > + > + err = vxlan_tunnel_setup(net, vname, options, &vxlan_port); > + if (err) > + goto out; > + > + err = ovs_tnl_set_options(vport, options); > + > + if (err) > + vxlan_tunnel_release(vxlan_port); I think we really want to make releasing the old socket the last thing that we do. Otherwise, if part of the operation fails then we'll be left in an inconsistent state without a way to rollback. Also, if an option other than the destination port is being changed then there will be a blip in traffic while we create and destroy the port. In this case, the refcounting it makes it easy to avoid both of the problems. > +static void vxlan_tnl_destroy(struct vport *vport) > +{ > + struct vxlan_port *vxlan_port; > + struct tnl_vport *tnl_vport = tnl_vport_priv(vport); > + struct tnl_mutable_config *config; > + > + config = rtnl_dereference(tnl_vport->mutable); > + > + vxlan_port = vxlan_port_exists(ovs_dp_get_net(vport->dp), > + config->dst_port); > + if (!vxlan_port) > + goto out; It almost seems better to not try to handle the NULL case here. It should never happen so it's good to know if it does. > +static struct vport *vxlan_tnl_create(const struct vport_parms *parms) > +{ > + int err; > + struct vport *vport; > + struct vxlan_port *vxlan_port = NULL; > + > + err = vxlan_tunnel_setup(ovs_dp_get_net(parms->dp), parms->name, > + parms->options, &vxlan_port); > + if (err) { > + vxlan_tunnel_release(vxlan_port); Is it necessary to call release at this point? It looks like setup should clean everything up if it fails (and worse it could cause us to dereference a NULL pointer). > + vport = ovs_tnl_create(parms, &ovs_vxlan_vport_ops, > &ovs_vxlan_tnl_ops); > + > + if (!vport) { > + vxlan_tunnel_release(vxlan_port); > + } The usual kernel convention is drop the braces in this case. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev