On Mon, Mar 23, 2015 at 12:23 PM, Jesse Gross <je...@nicira.com> wrote:
> On Mon, Mar 9, 2015 at 3:12 PM, Pravin B Shelar <pshe...@nicira.com> wrote:
>>  openvswitch_headers = \
>> diff --git a/datapath/linux/.gitignore b/datapath/linux/.gitignore
>> index 69d6658..1d1aae7 100644
>> --- a/datapath/linux/.gitignore
>> +++ b/datapath/linux/.gitignore
>> @@ -50,6 +50,7 @@
>>  /vport-lisp.c
>>  /vport-netdev.c
>>  /vport-patch.c
>> +/vport-stt.c
>>  /vport-vxlan.c
>>  /vport.c
>>  /vxlan.c
>
> Should we add stt.c as well?
>
ok.

>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>> new file mode 100644
>> index 0000000..4cb4ea6
>> --- /dev/null
>> +++ b/datapath/linux/compat/stt.c
>> +static int skb_list_linearize(struct sk_buff *head, gfp_t gfp_mask)
>> +{
>> +       struct sk_buff *skb;
>> +       int tlen = 0;
>> +       int err;
>> +
>> +       err = skb_linearize(head);
>> +       if (err)
>> +               return err;
>> +
>> +       skb = head->next;
>> +       while (skb) {
>> +               tlen += skb->len;
>> +               skb = skb->next;
>> +       }
>> +       err = pskb_expand_head(head, 0, tlen, gfp_mask);
>> +       if (err)
>> +               return err;
>
> Were you going to try to avoid linearizing and then expanding as Alex
> had mentioned in his comments?
>
ok.

>> +static int __build_segments(struct sk_buff **headp, bool ipv4, int 
>> l4_offset)
>> +{
>> +       struct sk_buff *head = *headp;
>> +       struct sk_buff *nskb = NULL;
>> +       struct sk_buff *rskb, *skb;
>> +       struct tcphdr *tcph;
>> +       int seg_len = 0;
>> +       int hdr_len;
>> +       int tcp_len;
>> +       u32 seq;
>> +
>> +       /* GRO can produce skbs with only the headers, which we've
>> +        * already pulled off.  We can just dump them.
>> +        */
>> +       while (head->len == 0) {
>> +               nskb = head->next;
>> +               copy_skb_metadata(nskb, head);
>> +               consume_skb(head);
>> +               head = nskb;
>> +       }
>> +       *headp = head;
>
>> +       tcph = (struct tcphdr *)(head->data + l4_offset);
>> +       tcp_len = tcph->doff * 4;
>> +       hdr_len = l4_offset + tcp_len;
>
> I think there is something wrong with the ordering of the calls in
> stt_rcv(). reassemble() will just produce a linked list of skbs that
> are expected to be joined here but we've already called functions that
> do pskb_may_pull(), which won't traverse that list. In particular, it
> doesn't make sense to both remove zero length skbs above and expect
> that there is a TCP header available below.
>
I can remove skb zero len check that is not required. skb should
always have TCP header since reassemble() used it to build the list.

>> +
>> +       if (unlikely((tcp_len < sizeof(struct tcphdr)) ||
>> +                    (head->len < hdr_len)))
>> +               return -EINVAL;
>> +
>> +       if (unlikely(!pskb_may_pull(head, hdr_len)))
>> +               return -ENOMEM;
>
> It seems risky to me to combine skb coalescing with TCP header
> updating as we would need to very carefully check boundary conditions.
> I feel like there are two halves of the loop anyways, so it seems
> better to just break them apart.
>
Do you mean convert it into two separate loops?

>> +static int __segment_skb(struct sk_buff **headp, bool ipv4, bool tcp, bool 
>> csum_partial, int l4_offset)
>> +{
>> +       int err;
>> +
>> +       err = straighten_frag_list(headp);
>> +       if (unlikely(err))
>> +               return err;
>> +
>> +       if (__linearize(*headp, ipv4, tcp, csum_partial)) {
>> +               return skb_list_linearize(*headp, GFP_ATOMIC);
>> +       }
>> +
>> +       if ((*headp)->next) {
>
> I think we can push this check up above linearization. If there is no
> list of skbs, we don't need to linearize either.
>
ok.

>> +static int segment_skb(struct sk_buff **headp)
>> +{
>
> Isn't there a check for frag_list for the receive path missing
> somewhere? It seems like linearization happens somewhat
> unconditionally.
>
I added check for frag_list.

>> +static int skb_list_xmit(struct rtable *rt, struct sk_buff *skb, __be32 src,
>> +                        __be32 dst, __u8 tos, __u8 ttl, __be16 df)
>> +{
>> +       int len = 0;
>> +
>> +       while (skb) {
>> +               struct sk_buff *next = skb->next;
>> +
>> +               if (next)
>> +                       dst_clone(&rt->dst);
>> +
>> +               skb->next = NULL;
>> +               len += iptunnel_xmit(NULL, rt, skb, src, dst, IPPROTO_TCP,
>> +                                    tos, ttl, df, false);
>
> I think there may be a problem in our version of ip_local_out() if it
> thinks that fix_segment is set in the CB, so we need to find a way to
> make sure that it is cleared.
>

right, I will fix STT and I will send another patch for other vport types.

>> +int stt_xmit_skb(struct sk_buff *skb, struct rtable *rt,
>> +                __be32 src, __be32 dst, __u8 tos,
>> +                __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port,
>> +                __be64 tun_id)
> [...]
>> +       while (skb) {
>> +               struct sk_buff *next_skb = skb->next;
>> +
>> +               skb->next = NULL;
>> +
>> +               if (next_skb)
>> +                       dst_clone(&rt->dst);
>> +
>> +               /* Push STT and TCP header. */
>> +               skb = push_stt_header(skb, tun_id, src_port, dst_port, src,
>> +                                     dst, inner_h_proto, inner_nw_proto,
>> +                                     dst_mtu(&rt->dst));
>> +               if (unlikely(!skb))
>> +                       goto next;
>
> I think it's somewhat surprising for a function called
> push_stt_header() to free the skb in the event of an error. I actually
> think that there is a double free already between __push_stt_header()
> and push_stt_header() so it seems like it is better to just propagate
> the error back here and then free.
>

I have to free skb in push_stt header since it can change skb, so I
will fix the double free bug.

>> +static bool set_offloads(struct sk_buff *skb)
>> +{
> [...]
>> +       if (proto_type == (STT_PROTO_IPV4 | STT_PROTO_TCP)) {
>
> Should we convert this to a switch statement? It seems mildly cleaner.
>
ok.

>> +void stt_sock_release(struct net *net, __be16 port)
>
> Is there a reason to not just pass in stt_sock here? It seems cleaner
> and more consistent.
>
I need to take the mutex to do lookup so I can not do it from vport-stt.c.
>> +       struct stt_sock *stt_sock;
>> +
>> +       mutex_lock(&stt_mutex);
>> +       rcu_read_lock();
>> +       stt_sock = stt_find_sock(net, port);
>> +       rcu_read_unlock();
>
> If we can't pass in stt_sock then I don't think we need rcu_read_lock
> since we're holding the mutex.
>
It is just to keep RCU checker happy.

>> diff --git a/datapath/vport-stt.c b/datapath/vport-stt.c
>> new file mode 100644
>> index 0000000..ccd4c71
>> --- /dev/null
>> +++ b/datapath/vport-stt.c
>> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,5,0)
>
> I think we need a check for CONFIG_NETFILTER someplace, otherwise the
> registration mechanism won't compile. I wonder if both of these checks
> should be pushed down to stt.c though since that's where they actually
> matter.
>
ok, I will add stt stub function to handle case without netfilter.

>> diff --git a/datapath/vport.h b/datapath/vport.h
>> index e256fc0..50e6289 100644
>> --- a/datapath/vport.h
>> +++ b/datapath/vport.h
>> @@ -35,6 +35,7 @@ struct vport_parms;
>>  struct vport_net {
>>         struct vport __rcu *gre_vport;
>>         struct vport __rcu *gre64_vport;
>> +       struct vport __rcu *stt_vport;
>
> I think this is no longer used now that the port number is configurable.
>
ok.

>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>> index 8e1b542..bd7f146 100644
>> --- a/lib/netdev-vport.c
>> +++ b/lib/netdev-vport.c
>> @@ -153,7 +154,7 @@ netdev_vport_needs_dst_port(const struct netdev *dev)
>>
>>      return (class->get_config == get_tunnel_config &&
>>              (!strcmp("geneve", type) || !strcmp("vxlan", type) ||
>> -             !strcmp("lisp", type)));
>> +             !strcmp("lisp", type) || !strcmp("stt", type)) );
>>  }
>
> I think we need some additional support for configurable STT ports in
> netdev_vport_construct() and get_tunnel_config().
>
ok.

>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index dac3756..1b9da1d 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> +          <dt><code>stt</code></dt>
>> +          <dd>
>> +             The Stateless TCP Tunnel (STT) protocol encapsulates traffic in
>> +             IPv4/TCP packets.  All traffic uses a destination port of 7471.
>> +             The STT protocol does not engage in the usual TCP 3-way 
>> handshake,
>> +             so it will have difficulty traversing stateful firewalls.
>> +             STT is only available in kernel datapath on kernel 3.5 or 
>> newer.
>> +          </dd>
>
> Can you make this a little more descriptive? I think right now it is
> hard to understand when this should or should not be used.
ok.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to