On Mon, Mar 9, 2015 at 3:12 PM, Pravin B Shelar <pshe...@nicira.com> wrote: > openvswitch_headers = \ > diff --git a/datapath/linux/.gitignore b/datapath/linux/.gitignore > index 69d6658..1d1aae7 100644 > --- a/datapath/linux/.gitignore > +++ b/datapath/linux/.gitignore > @@ -50,6 +50,7 @@ > /vport-lisp.c > /vport-netdev.c > /vport-patch.c > +/vport-stt.c > /vport-vxlan.c > /vport.c > /vxlan.c
Should we add stt.c as well? > diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c > new file mode 100644 > index 0000000..4cb4ea6 > --- /dev/null > +++ b/datapath/linux/compat/stt.c > +static int skb_list_linearize(struct sk_buff *head, gfp_t gfp_mask) > +{ > + struct sk_buff *skb; > + int tlen = 0; > + int err; > + > + err = skb_linearize(head); > + if (err) > + return err; > + > + skb = head->next; > + while (skb) { > + tlen += skb->len; > + skb = skb->next; > + } > + err = pskb_expand_head(head, 0, tlen, gfp_mask); > + if (err) > + return err; Were you going to try to avoid linearizing and then expanding as Alex had mentioned in his comments? > +static int __build_segments(struct sk_buff **headp, bool ipv4, int l4_offset) > +{ > + struct sk_buff *head = *headp; > + struct sk_buff *nskb = NULL; > + struct sk_buff *rskb, *skb; > + struct tcphdr *tcph; > + int seg_len = 0; > + int hdr_len; > + int tcp_len; > + u32 seq; > + > + /* GRO can produce skbs with only the headers, which we've > + * already pulled off. We can just dump them. > + */ > + while (head->len == 0) { > + nskb = head->next; > + copy_skb_metadata(nskb, head); > + consume_skb(head); > + head = nskb; > + } > + *headp = head; > + tcph = (struct tcphdr *)(head->data + l4_offset); > + tcp_len = tcph->doff * 4; > + hdr_len = l4_offset + tcp_len; I think there is something wrong with the ordering of the calls in stt_rcv(). reassemble() will just produce a linked list of skbs that are expected to be joined here but we've already called functions that do pskb_may_pull(), which won't traverse that list. In particular, it doesn't make sense to both remove zero length skbs above and expect that there is a TCP header available below. > + > + if (unlikely((tcp_len < sizeof(struct tcphdr)) || > + (head->len < hdr_len))) > + return -EINVAL; > + > + if (unlikely(!pskb_may_pull(head, hdr_len))) > + return -ENOMEM; It seems risky to me to combine skb coalescing with TCP header updating as we would need to very carefully check boundary conditions. I feel like there are two halves of the loop anyways, so it seems better to just break them apart. > +static int __segment_skb(struct sk_buff **headp, bool ipv4, bool tcp, bool > csum_partial, int l4_offset) > +{ > + int err; > + > + err = straighten_frag_list(headp); > + if (unlikely(err)) > + return err; > + > + if (__linearize(*headp, ipv4, tcp, csum_partial)) { > + return skb_list_linearize(*headp, GFP_ATOMIC); > + } > + > + if ((*headp)->next) { I think we can push this check up above linearization. If there is no list of skbs, we don't need to linearize either. > +static int segment_skb(struct sk_buff **headp) > +{ Isn't there a check for frag_list for the receive path missing somewhere? It seems like linearization happens somewhat unconditionally. > +static int skb_list_xmit(struct rtable *rt, struct sk_buff *skb, __be32 src, > + __be32 dst, __u8 tos, __u8 ttl, __be16 df) > +{ > + int len = 0; > + > + while (skb) { > + struct sk_buff *next = skb->next; > + > + if (next) > + dst_clone(&rt->dst); > + > + skb->next = NULL; > + len += iptunnel_xmit(NULL, rt, skb, src, dst, IPPROTO_TCP, > + tos, ttl, df, false); I think there may be a problem in our version of ip_local_out() if it thinks that fix_segment is set in the CB, so we need to find a way to make sure that it is cleared. > +int stt_xmit_skb(struct sk_buff *skb, struct rtable *rt, > + __be32 src, __be32 dst, __u8 tos, > + __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port, > + __be64 tun_id) [...] > + while (skb) { > + struct sk_buff *next_skb = skb->next; > + > + skb->next = NULL; > + > + if (next_skb) > + dst_clone(&rt->dst); > + > + /* Push STT and TCP header. */ > + skb = push_stt_header(skb, tun_id, src_port, dst_port, src, > + dst, inner_h_proto, inner_nw_proto, > + dst_mtu(&rt->dst)); > + if (unlikely(!skb)) > + goto next; I think it's somewhat surprising for a function called push_stt_header() to free the skb in the event of an error. I actually think that there is a double free already between __push_stt_header() and push_stt_header() so it seems like it is better to just propagate the error back here and then free. > +static bool set_offloads(struct sk_buff *skb) > +{ [...] > + if (proto_type == (STT_PROTO_IPV4 | STT_PROTO_TCP)) { Should we convert this to a switch statement? It seems mildly cleaner. > +void stt_sock_release(struct net *net, __be16 port) Is there a reason to not just pass in stt_sock here? It seems cleaner and more consistent. > + struct stt_sock *stt_sock; > + > + mutex_lock(&stt_mutex); > + rcu_read_lock(); > + stt_sock = stt_find_sock(net, port); > + rcu_read_unlock(); If we can't pass in stt_sock then I don't think we need rcu_read_lock since we're holding the mutex. > diff --git a/datapath/vport-stt.c b/datapath/vport-stt.c > new file mode 100644 > index 0000000..ccd4c71 > --- /dev/null > +++ b/datapath/vport-stt.c > +#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,5,0) I think we need a check for CONFIG_NETFILTER someplace, otherwise the registration mechanism won't compile. I wonder if both of these checks should be pushed down to stt.c though since that's where they actually matter. > diff --git a/datapath/vport.h b/datapath/vport.h > index e256fc0..50e6289 100644 > --- a/datapath/vport.h > +++ b/datapath/vport.h > @@ -35,6 +35,7 @@ struct vport_parms; > struct vport_net { > struct vport __rcu *gre_vport; > struct vport __rcu *gre64_vport; > + struct vport __rcu *stt_vport; I think this is no longer used now that the port number is configurable. > diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c > index 8e1b542..bd7f146 100644 > --- a/lib/netdev-vport.c > +++ b/lib/netdev-vport.c > @@ -153,7 +154,7 @@ netdev_vport_needs_dst_port(const struct netdev *dev) > > return (class->get_config == get_tunnel_config && > (!strcmp("geneve", type) || !strcmp("vxlan", type) || > - !strcmp("lisp", type))); > + !strcmp("lisp", type) || !strcmp("stt", type)) ); > } I think we need some additional support for configurable STT ports in netdev_vport_construct() and get_tunnel_config(). > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > index dac3756..1b9da1d 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > + <dt><code>stt</code></dt> > + <dd> > + The Stateless TCP Tunnel (STT) protocol encapsulates traffic in > + IPv4/TCP packets. All traffic uses a destination port of 7471. > + The STT protocol does not engage in the usual TCP 3-way > handshake, > + so it will have difficulty traversing stateful firewalls. > + STT is only available in kernel datapath on kernel 3.5 or newer. > + </dd> Can you make this a little more descriptive? I think right now it is hard to understand when this should or should not be used. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev