On Mon, Mar 23, 2015 at 12:23 PM, Jesse Gross <[email protected]> wrote:
> On Mon, Mar 9, 2015 at 3:12 PM, Pravin B Shelar <[email protected]> wrote:
>> openvswitch_headers = \
>> diff --git a/datapath/linux/.gitignore b/datapath/linux/.gitignore
>> index 69d6658..1d1aae7 100644
>> --- a/datapath/linux/.gitignore
>> +++ b/datapath/linux/.gitignore
>> @@ -50,6 +50,7 @@
>> /vport-lisp.c
>> /vport-netdev.c
>> /vport-patch.c
>> +/vport-stt.c
>> /vport-vxlan.c
>> /vport.c
>> /vxlan.c
>
> Should we add stt.c as well?
>
ok.
>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>> new file mode 100644
>> index 0000000..4cb4ea6
>> --- /dev/null
>> +++ b/datapath/linux/compat/stt.c
>> +static int skb_list_linearize(struct sk_buff *head, gfp_t gfp_mask)
>> +{
>> + struct sk_buff *skb;
>> + int tlen = 0;
>> + int err;
>> +
>> + err = skb_linearize(head);
>> + if (err)
>> + return err;
>> +
>> + skb = head->next;
>> + while (skb) {
>> + tlen += skb->len;
>> + skb = skb->next;
>> + }
>> + err = pskb_expand_head(head, 0, tlen, gfp_mask);
>> + if (err)
>> + return err;
>
> Were you going to try to avoid linearizing and then expanding as Alex
> had mentioned in his comments?
>
ok.
>> +static int __build_segments(struct sk_buff **headp, bool ipv4, int
>> l4_offset)
>> +{
>> + struct sk_buff *head = *headp;
>> + struct sk_buff *nskb = NULL;
>> + struct sk_buff *rskb, *skb;
>> + struct tcphdr *tcph;
>> + int seg_len = 0;
>> + int hdr_len;
>> + int tcp_len;
>> + u32 seq;
>> +
>> + /* GRO can produce skbs with only the headers, which we've
>> + * already pulled off. We can just dump them.
>> + */
>> + while (head->len == 0) {
>> + nskb = head->next;
>> + copy_skb_metadata(nskb, head);
>> + consume_skb(head);
>> + head = nskb;
>> + }
>> + *headp = head;
>
>> + tcph = (struct tcphdr *)(head->data + l4_offset);
>> + tcp_len = tcph->doff * 4;
>> + hdr_len = l4_offset + tcp_len;
>
> I think there is something wrong with the ordering of the calls in
> stt_rcv(). reassemble() will just produce a linked list of skbs that
> are expected to be joined here but we've already called functions that
> do pskb_may_pull(), which won't traverse that list. In particular, it
> doesn't make sense to both remove zero length skbs above and expect
> that there is a TCP header available below.
>
I can remove skb zero len check that is not required. skb should
always have TCP header since reassemble() used it to build the list.
>> +
>> + if (unlikely((tcp_len < sizeof(struct tcphdr)) ||
>> + (head->len < hdr_len)))
>> + return -EINVAL;
>> +
>> + if (unlikely(!pskb_may_pull(head, hdr_len)))
>> + return -ENOMEM;
>
> It seems risky to me to combine skb coalescing with TCP header
> updating as we would need to very carefully check boundary conditions.
> I feel like there are two halves of the loop anyways, so it seems
> better to just break them apart.
>
Do you mean convert it into two separate loops?
>> +static int __segment_skb(struct sk_buff **headp, bool ipv4, bool tcp, bool
>> csum_partial, int l4_offset)
>> +{
>> + int err;
>> +
>> + err = straighten_frag_list(headp);
>> + if (unlikely(err))
>> + return err;
>> +
>> + if (__linearize(*headp, ipv4, tcp, csum_partial)) {
>> + return skb_list_linearize(*headp, GFP_ATOMIC);
>> + }
>> +
>> + if ((*headp)->next) {
>
> I think we can push this check up above linearization. If there is no
> list of skbs, we don't need to linearize either.
>
ok.
>> +static int segment_skb(struct sk_buff **headp)
>> +{
>
> Isn't there a check for frag_list for the receive path missing
> somewhere? It seems like linearization happens somewhat
> unconditionally.
>
I added check for frag_list.
>> +static int skb_list_xmit(struct rtable *rt, struct sk_buff *skb, __be32 src,
>> + __be32 dst, __u8 tos, __u8 ttl, __be16 df)
>> +{
>> + int len = 0;
>> +
>> + while (skb) {
>> + struct sk_buff *next = skb->next;
>> +
>> + if (next)
>> + dst_clone(&rt->dst);
>> +
>> + skb->next = NULL;
>> + len += iptunnel_xmit(NULL, rt, skb, src, dst, IPPROTO_TCP,
>> + tos, ttl, df, false);
>
> I think there may be a problem in our version of ip_local_out() if it
> thinks that fix_segment is set in the CB, so we need to find a way to
> make sure that it is cleared.
>
right, I will fix STT and I will send another patch for other vport types.
>> +int stt_xmit_skb(struct sk_buff *skb, struct rtable *rt,
>> + __be32 src, __be32 dst, __u8 tos,
>> + __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port,
>> + __be64 tun_id)
> [...]
>> + while (skb) {
>> + struct sk_buff *next_skb = skb->next;
>> +
>> + skb->next = NULL;
>> +
>> + if (next_skb)
>> + dst_clone(&rt->dst);
>> +
>> + /* Push STT and TCP header. */
>> + skb = push_stt_header(skb, tun_id, src_port, dst_port, src,
>> + dst, inner_h_proto, inner_nw_proto,
>> + dst_mtu(&rt->dst));
>> + if (unlikely(!skb))
>> + goto next;
>
> I think it's somewhat surprising for a function called
> push_stt_header() to free the skb in the event of an error. I actually
> think that there is a double free already between __push_stt_header()
> and push_stt_header() so it seems like it is better to just propagate
> the error back here and then free.
>
I have to free skb in push_stt header since it can change skb, so I
will fix the double free bug.
>> +static bool set_offloads(struct sk_buff *skb)
>> +{
> [...]
>> + if (proto_type == (STT_PROTO_IPV4 | STT_PROTO_TCP)) {
>
> Should we convert this to a switch statement? It seems mildly cleaner.
>
ok.
>> +void stt_sock_release(struct net *net, __be16 port)
>
> Is there a reason to not just pass in stt_sock here? It seems cleaner
> and more consistent.
>
I need to take the mutex to do lookup so I can not do it from vport-stt.c.
>> + struct stt_sock *stt_sock;
>> +
>> + mutex_lock(&stt_mutex);
>> + rcu_read_lock();
>> + stt_sock = stt_find_sock(net, port);
>> + rcu_read_unlock();
>
> If we can't pass in stt_sock then I don't think we need rcu_read_lock
> since we're holding the mutex.
>
It is just to keep RCU checker happy.
>> diff --git a/datapath/vport-stt.c b/datapath/vport-stt.c
>> new file mode 100644
>> index 0000000..ccd4c71
>> --- /dev/null
>> +++ b/datapath/vport-stt.c
>> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,5,0)
>
> I think we need a check for CONFIG_NETFILTER someplace, otherwise the
> registration mechanism won't compile. I wonder if both of these checks
> should be pushed down to stt.c though since that's where they actually
> matter.
>
ok, I will add stt stub function to handle case without netfilter.
>> diff --git a/datapath/vport.h b/datapath/vport.h
>> index e256fc0..50e6289 100644
>> --- a/datapath/vport.h
>> +++ b/datapath/vport.h
>> @@ -35,6 +35,7 @@ struct vport_parms;
>> struct vport_net {
>> struct vport __rcu *gre_vport;
>> struct vport __rcu *gre64_vport;
>> + struct vport __rcu *stt_vport;
>
> I think this is no longer used now that the port number is configurable.
>
ok.
>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>> index 8e1b542..bd7f146 100644
>> --- a/lib/netdev-vport.c
>> +++ b/lib/netdev-vport.c
>> @@ -153,7 +154,7 @@ netdev_vport_needs_dst_port(const struct netdev *dev)
>>
>> return (class->get_config == get_tunnel_config &&
>> (!strcmp("geneve", type) || !strcmp("vxlan", type) ||
>> - !strcmp("lisp", type)));
>> + !strcmp("lisp", type) || !strcmp("stt", type)) );
>> }
>
> I think we need some additional support for configurable STT ports in
> netdev_vport_construct() and get_tunnel_config().
>
ok.
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index dac3756..1b9da1d 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> + <dt><code>stt</code></dt>
>> + <dd>
>> + The Stateless TCP Tunnel (STT) protocol encapsulates traffic in
>> + IPv4/TCP packets. All traffic uses a destination port of 7471.
>> + The STT protocol does not engage in the usual TCP 3-way
>> handshake,
>> + so it will have difficulty traversing stateful firewalls.
>> + STT is only available in kernel datapath on kernel 3.5 or
>> newer.
>> + </dd>
>
> Can you make this a little more descriptive? I think right now it is
> hard to understand when this should or should not be used.
ok.
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev