Re: [ovs-dev] [PATCH v3 15/17] native tunnel: Add support for STT

Jesse Gross Thu, 19 May 2016 12:46:09 -0700

On Tue, May 10, 2016 at 10:30 AM, Pravin B Shelar <pshe...@ovn.org> wrote:
> This patch used userpsace tunneling mechanism for implementing
> STT tunneling protocol.
> For details about STT you can refer the draft:
> https://tools.ietf.org/html/draft-davie-stt-07


Minor but the current version of the draft is -08.

> diff --git a/lib/netdev-native-stt.c b/lib/netdev-native-stt.c
> new file mode 100644
> index 0000000..f50b9ef
> --- /dev/null
> +++ b/lib/netdev-native-stt.c
> +static inline unsigned long long
> +cycles_counter(void)
> +{
> +    struct timespec tm;
> +
> +    if (clock_gettime(CLOCK_REALTIME, &tm) == -1 ) {

I think it would be better to use a monotonic clock to avoid problems
if the time changes on the system. Is there a reason why we can't just
use time_now()?

> +static uint32_t pkt_key_hash(const struct pkt_key *key)
> +{
> +        return hash_2words(hash_bytes(key, offsetof(struct pkt_key, pkt_seq),
> +                                      key->pkt_seq),
> +                           frag_hash_seed);

I guess the reason for having a two stage hashing here is to avoid any
possible padding bytes, right? This presumably isn't for correctness
since we memset the key when initializing it and the comparison looks
at the whole thing. Is it really faster to do it this way?

> +static struct dp_packet *
> +packet_merge(struct dp_packet *pkt)
> +{
> +    struct dp_packet *next;
> +    struct dp_packet *m;
> +
> +    m = dp_packet_clone(pkt);
> +    next = FRAG_DATA(pkt)->next;
> +    dp_packet_delete(pkt);
> +
> +    pkt = next;
> +    while (pkt) {
> +        dp_packet_put(m, STT_PACKET_DATA(pkt), FRAG_DATA(pkt)->pkt_size);

Since we already know the total size of the packet, can we preallocate
space so that we don't need to keep reallocating and copying over and
over again?

> +static struct dp_packet *
> +reassemble(struct dp_packet *packet)
> +{
[...]
> +    if (STT_BASE_HLEN > pkt_size) {
> +        goto out_free;
> +    }

Each individual packet will not have the STT header, so it doesn't
make sense to check it against the length here. We should check after
reassembly instead.

> +    frag = lookup_frag(reasm, &key, hash);
> +    if (!frag->pkts) {
[...]
> +        FRAG_DATA(packet)->first.set_ecn_ce = false;

I think we need to initialize this with the ECN value from this packet
- otherwise we will miss any congestion on the first packet.

> +static bool
> +valid_tcp_checksum(struct dp_packet *packet)
> +{
[...]
> +    if (csum_finish(csum)) {
> +        return false;
> +    }
> +    return true;
> +}

As a minor simplification/optimization, you could just make this
return !csum_finish().

> +void
> +netdev_stt_push_header(struct dp_packet *packet,
> +                       const struct ovs_action_push_tnl *data)
> +{
[...]
> +    /* TODO: Handle vlan header offload. */
> +    stth = (void *) (tcp + 1);

I guess we'll want to finish this - I don't think that it should to be too hard.

> +    if (packet->lso.type) {
> +        tcp->tcp_csum = ~csum_finish(csum);
> +
> +        if (packet->lso.type == DPBUF_LSO_TCPv4) {
> +            stth->flags = STT_PROTO_IPV4 | STT_PROTO_TCP;
> +        } else if (packet->lso.type == DPBUF_LSO_UDPv4) {
> +            stth->flags = STT_PROTO_IPV4;
> +        } else if (packet->lso.type == DPBUF_LSO_TCPv6) {
> +            stth->flags = STT_PROTO_TCP;
> +        } else if (packet->lso.type == DPBUF_LSO_UDPv6) {
> +            stth->flags = 0;
> +        }
> +        stth->flags |= STT_CSUM_PARTIAL;
> +        stth->l4_offset = packet->l4_ofs;
> +        stth->mss = htons(packet->lso.mss);

Are there any conditions that we need to check before encapsulating,
like length/offsets too large?

> +        if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
> +            packet->lso.type = DPBUF_LSO_TCPv6;
> +        } else {
> +            packet->lso.type = DPBUF_LSO_TCPv4;
> +        }

This keeps the original MSS of the packet, which imposes a strict MTU
requirement on the underlay similar to other encapsulation protocols.
In theory we should be able to adjust it to match what is available.

> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index e0a1ad4..cb1ee32 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> +static const void *
> +format_tcp_tnl_push_header(struct ds *ds, const struct tcp_header *tcp)
> +{
[...]

I think I would just format this as a single STT layer rather than
splitting between TCP/STT. The TCP information is already being
interpreted mostly (though not entirely) in the context of STT and
it's not likely we'll have additional forms of TCP based tunnels in
the future.

What about parsing of these actions?

> +    if (tcp->tcp_urg) {
> +        ds_put_format(ds, ",csum=0x%"PRIx16, ntohs(tcp->tcp_urg));
> +    }

This looks like there was some copy and pasted code that wasn't updated.

> @@ -2123,7 +2179,6 @@ format_be64(struct ds *ds, const char *name, ovs_be64 
> key,
>          if (!mask_full) { /* Partially masked. */
>              ds_put_format(ds, "/%#"PRIx64, ntohll(*mask));
>          }
> -        ds_put_char(ds, ',');

This is inconsistent with the other formatting functions, so it is
somewhat confusing. I think it would be better to clip off the comma
in places where we don't want it.

> diff --git a/lib/timeval.h b/lib/timeval.h
> index 7957dad..956ba51 100644
> --- a/lib/timeval.h
> +++ b/lib/timeval.h
> @@ -76,6 +76,7 @@ long long int time_boot_msec(void);
>
>  void timewarp_run(void);
>
> +#define time_before(a, b)  ((long)((a) - (b)) < 0)

What is the expected unit of time for this? I guess it is cycles as
opposed to timespec or timeval as primarily used in this function?
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v3 15/17] native tunnel: Add support for STT

Reply via email to