With few suggestions and questions below: Acked-by: Jarno Rajahalme <jrajaha...@nicira.com>
Jarno > On Jul 29, 2015, at 8:09 PM, Jesse Gross <je...@nicira.com> wrote: > > The kernel implementation of Geneve options stores the TLV option > data in the flow exactly as received, without any further parsing. > This is then translated to known options for the purposes of matching > on flow setup (which will then install a datapath flow in the form > the kernel is expecting). > > The userspace implementation behaves a little bit differently - it > looks up known options as each packet is received. The reason for this > is there is a much tighter coupling between datapath and flow translation > and the representation is generally expected to be the same. This works > but it incurs work on a per-packet basis that could be done per-flow > instead. > > This introduces a small translation step for Geneve packets between > datapath and flow lookup for the userspace datapath in order to > allow the same kind of processing that the kernel does. > > There is a second benefit to this as well: for some operations it is > preferable to keep the options exactly as they were received on the wire, > which this enables. One example is that for packets that are executed from > ofproto-dpif-upcall to the datapath, this avoids the translation of > Geneve metadata. Since this conversion is potentially lossy (for unknown > options), keeping everything in the same format removes the possibility > of dropping options if the packet comes back up to userspace and the > Geneve option translation table has changed. To help with these types of > operations, most functions can understand both formats of data and seamlessly > do the right thing. > > Signed-off-by: Jesse Gross <je...@nicira.com> > --- > lib/automake.mk | 1 + > lib/dpif-netdev.c | 55 ++++++- > lib/flow.c | 48 ++++-- > lib/flow.h | 13 +- > lib/geneve.h | 63 ++++++++ > lib/meta-flow.c | 6 +- > lib/netdev-vport.c | 26 ++-- > lib/odp-execute.c | 2 +- > lib/odp-util.c | 58 ++++--- > lib/odp-util.h | 12 +- > lib/packets.h | 41 +---- > lib/tun-metadata.c | 352 ++++++++++++++++++++++++++++++------------ > lib/tun-metadata.h | 74 ++++++--- > ofproto/ofproto-dpif-sflow.c | 2 +- > ofproto/ofproto-dpif-upcall.c | 2 +- > tests/tunnel-push-pop.at | 2 +- > 16 files changed, 534 insertions(+), 223 deletions(-) > create mode 100644 lib/geneve.h > > diff --git a/lib/automake.mk b/lib/automake.mk > index faca968..5b6e9e8 100644 > --- a/lib/automake.mk > +++ b/lib/automake.mk > @@ -81,6 +81,7 @@ lib_libopenvswitch_la_SOURCES = \ > lib/fatal-signal.h \ > lib/flow.c \ > lib/flow.h \ > + lib/geneve.h \ > lib/guarded-list.c \ > lib/guarded-list.h \ > lib/hash.c \ > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index f587df5..c31a7e0 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -1884,8 +1884,8 @@ dpif_netdev_mask_from_nlattrs(const struct nlattr *key, > uint32_t key_len, > if (mask_key_len) { > enum odp_key_fitness fitness; > > - fitness = odp_flow_key_to_mask(mask_key, mask_key_len, key, key_len, > - &wc->masks, flow); > + fitness = odp_flow_key_to_mask_udpif(mask_key, mask_key_len, key, > + key_len, &wc->masks, flow); > if (fitness) { > /* This should not happen: it indicates that > * odp_flow_key_from_mask() and odp_flow_key_to_mask() > @@ -1919,7 +1919,7 @@ dpif_netdev_flow_from_nlattrs(const struct nlattr *key, > uint32_t key_len, > { > odp_port_t in_port; > > - if (odp_flow_key_to_flow(key, key_len, flow)) { > + if (odp_flow_key_to_flow_udpif(key, key_len, flow)) { > /* This should not happen: it indicates that odp_flow_key_from_flow() > * and odp_flow_key_to_flow() disagree on the acceptable form of a > * flow. Log the problem as an error, with enough details to enable > @@ -3014,11 +3014,25 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, > struct dp_packet *packet_, > struct ofpbuf *actions, struct ofpbuf *put_actions) > { > struct dp_netdev *dp = pmd->dp; > + struct flow_tnl orig_tunnel; > + int err; > > if (OVS_UNLIKELY(!dp->upcall_cb)) { > return ENODEV; > } > > + orig_tunnel.flags = flow->tunnel.flags; > + if (flow->tunnel.flags & FLOW_TNL_F_UDPIF) { > + orig_tunnel.metadata.present.len = flow->tunnel.metadata.present.len; > + memcpy(orig_tunnel.metadata.opts.gnv, flow->tunnel.metadata.opts.gnv, > + flow->tunnel.metadata.present.len); > + err = tun_metadata_from_geneve_udpif(&orig_tunnel, &orig_tunnel, > + &flow->tunnel); > + if (err) { > + return err; > + } > + } > + All these new blocks in dp_netdev_upcall() would benefit from short comments. I.e., “Upcall processing expects the geneve options to be in the translated format, but we need to retain the raw format for datapath use.” > if (OVS_UNLIKELY(!VLOG_DROP_DBG(&upcall_rl))) { > struct ds ds = DS_EMPTY_INITIALIZER; > char *packet_str; > @@ -3046,8 +3060,39 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, > struct dp_packet *packet_, > ds_destroy(&ds); > } > > - return dp->upcall_cb(packet_, flow, ufid, pmd->core_id, type, userdata, > - actions, wc, put_actions, dp->upcall_aux); > + err = dp->upcall_cb(packet_, flow, ufid, pmd->core_id, type, userdata, > + actions, wc, put_actions, dp->upcall_aux); > + if (err && err != ENOSPC) { > + return err; > + } > + /* Translate tunnel metadata masks to datapath format. */ > + if (wc) { > + if (wc->masks.tunnel.metadata.present.map) { > + struct geneve_opt opts[GENEVE_TOT_OPT_SIZE / > + sizeof(struct geneve_opt)]; > + > + tun_metadata_to_geneve_udpif_mask(&flow->tunnel, > + &wc->masks.tunnel, > + orig_tunnel.metadata.opts.gnv, > + > orig_tunnel.metadata.present.len, > + opts); > + > + memset(&wc->masks.tunnel.metadata, 0, > + sizeof wc->masks.tunnel.metadata); > + memcpy(&wc->masks.tunnel.metadata.opts.gnv, opts, > + orig_tunnel.metadata.present.len); > + } > + wc->masks.tunnel.metadata.present.len = 0xff; > + } > + /* Restore tunnel metadata. */ Might add a sentence why restoring the original is better than translation, and how this works with the mask (unknown options will be wildcarded). > + if (orig_tunnel.flags & FLOW_TNL_F_UDPIF) { > + memcpy(&flow->tunnel.metadata.opts.gnv, > orig_tunnel.metadata.opts.gnv, > + orig_tunnel.metadata.present.len); > + flow->tunnel.metadata.present.len = orig_tunnel.metadata.present.len; > + flow->tunnel.flags |= FLOW_TNL_F_UDPIF; > + } > + > + return err; > } > > static inline uint32_t > diff --git a/lib/flow.c b/lib/flow.c > index 352e9b8..d3d25e4 100644 > --- a/lib/flow.c > +++ b/lib/flow.c > @@ -462,9 +462,22 @@ miniflow_extract(struct dp_packet *packet, struct > miniflow *dst) > miniflow_push_words(mf, tunnel, &md->tunnel, > offsetof(struct flow_tnl, metadata) / > sizeof(uint64_t)); > - if (md->tunnel.metadata.opt_map) { > - miniflow_push_words(mf, tunnel.metadata, &md->tunnel.metadata, > - sizeof md->tunnel.metadata / > sizeof(uint64_t)); > + > + if (!(md->tunnel.flags & FLOW_TNL_F_UDPIF)) { > + if (md->tunnel.metadata.present.map) { > + miniflow_push_words(mf, tunnel.metadata, > &md->tunnel.metadata, > + sizeof md->tunnel.metadata / > + sizeof(uint64_t)); > + } > + } else { > + if (md->tunnel.metadata.present.len) { > + miniflow_push_words(mf, tunnel.metadata.present, > + &md->tunnel.metadata.present, 1); > + miniflow_push_words(mf, tunnel.metadata.opts.gnv, > + md->tunnel.metadata.opts.gnv, > + > DIV_ROUND_UP(md->tunnel.metadata.present.len, > + sizeof(uint64_t))); Wrong indentation here. > + } > } > } > if (md->skb_priority || md->pkt_mark) { > @@ -815,7 +828,7 @@ flow_get_metadata(const struct flow *flow, struct match > *flow_metadata) > if (flow->tunnel.gbp_flags) { > match_set_tun_gbp_flags(flow_metadata, flow->tunnel.gbp_flags); > } > - tun_metadata_get_fmd(&flow->tunnel.metadata, flow_metadata); > + tun_metadata_get_fmd(&flow->tunnel, flow_metadata); > if (flow->metadata != htonll(0)) { > match_set_metadata(flow_metadata, flow->metadata); > } > @@ -1161,9 +1174,16 @@ void flow_wildcards_init_for_packet(struct > flow_wildcards *wc, > WC_MASK_FIELD(wc, tunnel.gbp_id); > WC_MASK_FIELD(wc, tunnel.gbp_flags); > > - if (flow->tunnel.metadata.opt_map) { > - wc->masks.tunnel.metadata.opt_map = > flow->tunnel.metadata.opt_map; > - WC_MASK_FIELD(wc, tunnel.metadata.opts); > + if (!(flow->tunnel.flags & FLOW_TNL_F_UDPIF)) { > + if (flow->tunnel.metadata.present.map) { > + wc->masks.tunnel.metadata.present.map = > + > flow->tunnel.metadata.present.map; > + WC_MASK_FIELD(wc, tunnel.metadata.opts.u8); > + } > + } else { > + WC_MASK_FIELD(wc, tunnel.metadata.present.len); > + memset(wc->masks.tunnel.metadata.opts.gnv, 0xff, > + flow->tunnel.metadata.present.len); > } > } else if (flow->tunnel.tun_id) { > WC_MASK_FIELD(wc, tunnel.tun_id); > @@ -1253,9 +1273,17 @@ flow_wc_map(const struct flow *flow, struct miniflow > *map) > > map->tnl_map = 0; > if (flow->tunnel.ip_dst) { > - map->tnl_map = MINIFLOW_TNL_MAP(tunnel); > - if (!flow->tunnel.metadata.opt_map) { > - map->tnl_map &= ~MINIFLOW_TNL_MAP(tunnel.metadata); > + map->tnl_map |= MINIFLOW_TNL_MAP__(tunnel, > + offsetof(struct flow_tnl, > metadata)); > + if (!(flow->tunnel.flags & FLOW_TNL_F_UDPIF)) { > + if (flow->tunnel.metadata.present.map) { > + map->tnl_map |= MINIFLOW_TNL_MAP__(tunnel.metadata, > + > sizeof(flow->tunnel.metadata)); This could be MINIFLOW_TNL_MAP() instead? > + } > + } else { > + map->tnl_map |= MINIFLOW_TNL_MAP(tunnel.metadata.present.len); > + map->tnl_map |= MINIFLOW_TNL_MAP__(tunnel.metadata.opts.gnv, > + > flow->tunnel.metadata.present.len); > } > } > > (snip) > diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at > index bd95c8e..0f1724a 100644 > --- a/tests/tunnel-push-pop.at > +++ b/tests/tunnel-push-pop.at > @@ -132,7 +132,7 @@ AT_CHECK([ovs-ofctl dump-ports int-br | grep 'port 5'], > [0], [dnl > port 5: rx pkts=1, bytes=98, drop=0, errs=0, frame=0, over=0, crc=0 > ]) > AT_CHECK([ovs-appctl dpif/dump-flows int-br], [0], [dnl > -tunnel(tun_id=0x7b,src=1.1.2.92,dst=1.1.2.88,ttl=64,geneve({class=0xffff,type=0x80,len=4,0xa/0xf}),flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(6081),eth_type(0x0800),ipv4(frag=no), > packets:0, bytes:0, used:never, actions:drop > +tunnel(tun_id=0x7b,src=1.1.2.92,dst=1.1.2.88,ttl=64,geneve({class=0xffff,type=0x80,len=4,0xa/0xf}{class=0xffff,type=0,len=4}),flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(6081),eth_type(0x0800),ipv4(frag=no), > packets:0, bytes:0, used:never, actions:drop Why this changed? > ]) > > OVS_VSWITCHD_STOP > -- > 2.1.4 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev