On Tue, Mar 24, 2015 at 12:58 PM, Jesse Gross <je...@nicira.com> wrote:
> On Tue, Mar 24, 2015 at 1:21 PM, Pravin Shelar <pshe...@nicira.com> wrote:
>> On Mon, Mar 23, 2015 at 12:23 PM, Jesse Gross <je...@nicira.com> wrote:
>>> On Mon, Mar 9, 2015 at 3:12 PM, Pravin B Shelar <pshe...@nicira.com> wrote:
>>>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>>>> new file mode 100644
>>>> index 0000000..4cb4ea6
>>>> --- /dev/null
>>>> +++ b/datapath/linux/compat/stt.c
>>>> +static int __build_segments(struct sk_buff **headp, bool ipv4, int 
>>>> l4_offset)
>>>> +{
>>>> +       struct sk_buff *head = *headp;
>>>> +       struct sk_buff *nskb = NULL;
>>>> +       struct sk_buff *rskb, *skb;
>>>> +       struct tcphdr *tcph;
>>>> +       int seg_len = 0;
>>>> +       int hdr_len;
>>>> +       int tcp_len;
>>>> +       u32 seq;
>>>> +
>>>> +       /* GRO can produce skbs with only the headers, which we've
>>>> +        * already pulled off.  We can just dump them.
>>>> +        */
>>>> +       while (head->len == 0) {
>>>> +               nskb = head->next;
>>>> +               copy_skb_metadata(nskb, head);
>>>> +               consume_skb(head);
>>>> +               head = nskb;
>>>> +       }
>>>> +       *headp = head;
>>>
>>>> +       tcph = (struct tcphdr *)(head->data + l4_offset);
>>>> +       tcp_len = tcph->doff * 4;
>>>> +       hdr_len = l4_offset + tcp_len;
>>>
>>> I think there is something wrong with the ordering of the calls in
>>> stt_rcv(). reassemble() will just produce a linked list of skbs that
>>> are expected to be joined here but we've already called functions that
>>> do pskb_may_pull(), which won't traverse that list. In particular, it
>>> doesn't make sense to both remove zero length skbs above and expect
>>> that there is a TCP header available below.
>>>
>> I can remove skb zero len check that is not required. skb should
>> always have TCP header since reassemble() used it to build the list.
>
> What I mean is that the linked list created by reassemble() isn't
> considered to be a single skb - it's a list of separate skbs.
> Therefore, when we call pskb_may_pull() or similar functions that are
> expected to bring data from different memory regions into the linear
> data area, they won't do that. All of the skb stitching code should
> happen before we start accessing the data region, otherwise it will
> have to deal with the problem itself.
>

ok, I fixed it

>>>> +
>>>> +       if (unlikely((tcp_len < sizeof(struct tcphdr)) ||
>>>> +                    (head->len < hdr_len)))
>>>> +               return -EINVAL;
>>>> +
>>>> +       if (unlikely(!pskb_may_pull(head, hdr_len)))
>>>> +               return -ENOMEM;
>>>
>>> It seems risky to me to combine skb coalescing with TCP header
>>> updating as we would need to very carefully check boundary conditions.
>>> I feel like there are two halves of the loop anyways, so it seems
>>> better to just break them apart.
>>>
>> Do you mean convert it into two separate loops?
>
> Yes, that's what I was thinking.
>
>>>> +static int skb_list_xmit(struct rtable *rt, struct sk_buff *skb, __be32 
>>>> src,
>>>> +                        __be32 dst, __u8 tos, __u8 ttl, __be16 df)
>>>> +{
>>>> +       int len = 0;
>>>> +
>>>> +       while (skb) {
>>>> +               struct sk_buff *next = skb->next;
>>>> +
>>>> +               if (next)
>>>> +                       dst_clone(&rt->dst);
>>>> +
>>>> +               skb->next = NULL;
>>>> +               len += iptunnel_xmit(NULL, rt, skb, src, dst, IPPROTO_TCP,
>>>> +                                    tos, ttl, df, false);
>>>
>>> I think there may be a problem in our version of ip_local_out() if it
>>> thinks that fix_segment is set in the CB, so we need to find a way to
>>> make sure that it is cleared.
>>>
>>
>> right, I will fix STT and I will send another patch for other vport types.
>
> I think the existing vports should be OK because their handle offloads
> compat code already knows how to initialize this.
>
I was thinking when compat handle offload is not used, but ovs calls
compat ip_local_out(). for example on newer kernels.

>>>> +int stt_xmit_skb(struct sk_buff *skb, struct rtable *rt,
>>>> +                __be32 src, __be32 dst, __u8 tos,
>>>> +                __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port,
>>>> +                __be64 tun_id)
>>> [...]
>>>> +       while (skb) {
>>>> +               struct sk_buff *next_skb = skb->next;
>>>> +
>>>> +               skb->next = NULL;
>>>> +
>>>> +               if (next_skb)
>>>> +                       dst_clone(&rt->dst);
>>>> +
>>>> +               /* Push STT and TCP header. */
>>>> +               skb = push_stt_header(skb, tun_id, src_port, dst_port, src,
>>>> +                                     dst, inner_h_proto, inner_nw_proto,
>>>> +                                     dst_mtu(&rt->dst));
>>>> +               if (unlikely(!skb))
>>>> +                       goto next;
>>>
>>> I think it's somewhat surprising for a function called
>>> push_stt_header() to free the skb in the event of an error. I actually
>>> think that there is a double free already between __push_stt_header()
>>> and push_stt_header() so it seems like it is better to just propagate
>>> the error back here and then free.
>>>
>>
>> I have to free skb in push_stt header since it can change skb, so I
>> will fix the double free bug.
>
> Hmm, ok.
>
>>>> +void stt_sock_release(struct net *net, __be16 port)
>>>
>>> Is there a reason to not just pass in stt_sock here? It seems cleaner
>>> and more consistent.
>>>
>> I need to take the mutex to do lookup so I can not do it from vport-stt.c.
>
> I don't think that we need to a look it up at all. In
> stt_tnl_destroy() we already have stt_port->stt_sock so I think we can
> pass this in directly.
>
> As an aside, I don't believe that stt_mutex() is actually required
> because the only user is OVS and the critical section is also
> protected by ovs_mutex. However, I understand if you want to keep
> stt_mutex to more closely mirror the other upstream tunnel types.
>
ok, I changed parameter. I will keep stt mutex.

>>>> +       struct stt_sock *stt_sock;
>>>> +
>>>> +       mutex_lock(&stt_mutex);
>>>> +       rcu_read_lock();
>>>> +       stt_sock = stt_find_sock(net, port);
>>>> +       rcu_read_unlock();
>>>
>>> If we can't pass in stt_sock then I don't think we need rcu_read_lock
>>> since we're holding the mutex.
>>>
>> It is just to keep RCU checker happy.
>
> I think it should be OK without it - list_for_each_entry_rcu() doesn't
> purposely doesn't trigger RCU warnings.
>
ok.

>>>> diff --git a/datapath/vport-stt.c b/datapath/vport-stt.c
>>>> new file mode 100644
>>>> index 0000000..ccd4c71
>>>> --- /dev/null
>>>> +++ b/datapath/vport-stt.c
>>>> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,5,0)
>>>
>>> I think we need a check for CONFIG_NETFILTER someplace, otherwise the
>>> registration mechanism won't compile. I wonder if both of these checks
>>> should be pushed down to stt.c though since that's where they actually
>>> matter.
>>>
>> ok, I will add stt stub function to handle case without netfilter.
>
> I wonder if it is better to just make the whole thing not compile like
> we do if the kernel version is too low. After all, there's not much
> point in a STT port that can't receive packets so it could be better
> to just not allow it's creation.

ok. I will send updated patch soon.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to