On Tue, Mar 24, 2015 at 12:58 PM, Jesse Gross <je...@nicira.com> wrote: > On Tue, Mar 24, 2015 at 1:21 PM, Pravin Shelar <pshe...@nicira.com> wrote: >> On Mon, Mar 23, 2015 at 12:23 PM, Jesse Gross <je...@nicira.com> wrote: >>> On Mon, Mar 9, 2015 at 3:12 PM, Pravin B Shelar <pshe...@nicira.com> wrote: >>>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c >>>> new file mode 100644 >>>> index 0000000..4cb4ea6 >>>> --- /dev/null >>>> +++ b/datapath/linux/compat/stt.c >>>> +static int __build_segments(struct sk_buff **headp, bool ipv4, int >>>> l4_offset) >>>> +{ >>>> + struct sk_buff *head = *headp; >>>> + struct sk_buff *nskb = NULL; >>>> + struct sk_buff *rskb, *skb; >>>> + struct tcphdr *tcph; >>>> + int seg_len = 0; >>>> + int hdr_len; >>>> + int tcp_len; >>>> + u32 seq; >>>> + >>>> + /* GRO can produce skbs with only the headers, which we've >>>> + * already pulled off. We can just dump them. >>>> + */ >>>> + while (head->len == 0) { >>>> + nskb = head->next; >>>> + copy_skb_metadata(nskb, head); >>>> + consume_skb(head); >>>> + head = nskb; >>>> + } >>>> + *headp = head; >>> >>>> + tcph = (struct tcphdr *)(head->data + l4_offset); >>>> + tcp_len = tcph->doff * 4; >>>> + hdr_len = l4_offset + tcp_len; >>> >>> I think there is something wrong with the ordering of the calls in >>> stt_rcv(). reassemble() will just produce a linked list of skbs that >>> are expected to be joined here but we've already called functions that >>> do pskb_may_pull(), which won't traverse that list. In particular, it >>> doesn't make sense to both remove zero length skbs above and expect >>> that there is a TCP header available below. >>> >> I can remove skb zero len check that is not required. skb should >> always have TCP header since reassemble() used it to build the list. > > What I mean is that the linked list created by reassemble() isn't > considered to be a single skb - it's a list of separate skbs. > Therefore, when we call pskb_may_pull() or similar functions that are > expected to bring data from different memory regions into the linear > data area, they won't do that. All of the skb stitching code should > happen before we start accessing the data region, otherwise it will > have to deal with the problem itself. >
ok, I fixed it >>>> + >>>> + if (unlikely((tcp_len < sizeof(struct tcphdr)) || >>>> + (head->len < hdr_len))) >>>> + return -EINVAL; >>>> + >>>> + if (unlikely(!pskb_may_pull(head, hdr_len))) >>>> + return -ENOMEM; >>> >>> It seems risky to me to combine skb coalescing with TCP header >>> updating as we would need to very carefully check boundary conditions. >>> I feel like there are two halves of the loop anyways, so it seems >>> better to just break them apart. >>> >> Do you mean convert it into two separate loops? > > Yes, that's what I was thinking. > >>>> +static int skb_list_xmit(struct rtable *rt, struct sk_buff *skb, __be32 >>>> src, >>>> + __be32 dst, __u8 tos, __u8 ttl, __be16 df) >>>> +{ >>>> + int len = 0; >>>> + >>>> + while (skb) { >>>> + struct sk_buff *next = skb->next; >>>> + >>>> + if (next) >>>> + dst_clone(&rt->dst); >>>> + >>>> + skb->next = NULL; >>>> + len += iptunnel_xmit(NULL, rt, skb, src, dst, IPPROTO_TCP, >>>> + tos, ttl, df, false); >>> >>> I think there may be a problem in our version of ip_local_out() if it >>> thinks that fix_segment is set in the CB, so we need to find a way to >>> make sure that it is cleared. >>> >> >> right, I will fix STT and I will send another patch for other vport types. > > I think the existing vports should be OK because their handle offloads > compat code already knows how to initialize this. > I was thinking when compat handle offload is not used, but ovs calls compat ip_local_out(). for example on newer kernels. >>>> +int stt_xmit_skb(struct sk_buff *skb, struct rtable *rt, >>>> + __be32 src, __be32 dst, __u8 tos, >>>> + __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port, >>>> + __be64 tun_id) >>> [...] >>>> + while (skb) { >>>> + struct sk_buff *next_skb = skb->next; >>>> + >>>> + skb->next = NULL; >>>> + >>>> + if (next_skb) >>>> + dst_clone(&rt->dst); >>>> + >>>> + /* Push STT and TCP header. */ >>>> + skb = push_stt_header(skb, tun_id, src_port, dst_port, src, >>>> + dst, inner_h_proto, inner_nw_proto, >>>> + dst_mtu(&rt->dst)); >>>> + if (unlikely(!skb)) >>>> + goto next; >>> >>> I think it's somewhat surprising for a function called >>> push_stt_header() to free the skb in the event of an error. I actually >>> think that there is a double free already between __push_stt_header() >>> and push_stt_header() so it seems like it is better to just propagate >>> the error back here and then free. >>> >> >> I have to free skb in push_stt header since it can change skb, so I >> will fix the double free bug. > > Hmm, ok. > >>>> +void stt_sock_release(struct net *net, __be16 port) >>> >>> Is there a reason to not just pass in stt_sock here? It seems cleaner >>> and more consistent. >>> >> I need to take the mutex to do lookup so I can not do it from vport-stt.c. > > I don't think that we need to a look it up at all. In > stt_tnl_destroy() we already have stt_port->stt_sock so I think we can > pass this in directly. > > As an aside, I don't believe that stt_mutex() is actually required > because the only user is OVS and the critical section is also > protected by ovs_mutex. However, I understand if you want to keep > stt_mutex to more closely mirror the other upstream tunnel types. > ok, I changed parameter. I will keep stt mutex. >>>> + struct stt_sock *stt_sock; >>>> + >>>> + mutex_lock(&stt_mutex); >>>> + rcu_read_lock(); >>>> + stt_sock = stt_find_sock(net, port); >>>> + rcu_read_unlock(); >>> >>> If we can't pass in stt_sock then I don't think we need rcu_read_lock >>> since we're holding the mutex. >>> >> It is just to keep RCU checker happy. > > I think it should be OK without it - list_for_each_entry_rcu() doesn't > purposely doesn't trigger RCU warnings. > ok. >>>> diff --git a/datapath/vport-stt.c b/datapath/vport-stt.c >>>> new file mode 100644 >>>> index 0000000..ccd4c71 >>>> --- /dev/null >>>> +++ b/datapath/vport-stt.c >>>> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,5,0) >>> >>> I think we need a check for CONFIG_NETFILTER someplace, otherwise the >>> registration mechanism won't compile. I wonder if both of these checks >>> should be pushed down to stt.c though since that's where they actually >>> matter. >>> >> ok, I will add stt stub function to handle case without netfilter. > > I wonder if it is better to just make the whole thing not compile like > we do if the kernel version is too low. After all, there's not much > point in a STT port that can't receive packets so it could be better > to just not allow it's creation. ok. I will send updated patch soon. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev