On Fri, Jun 19, 2015 at 04:13:24PM -0700, Jesse Gross wrote: > The current support for Geneve in OVS is exactly equivalent to VXLAN: > it is possible to set and match on the VNI but not on any options > contained in the header. This patch enables the use of options. > > The goal for Geneve support is not to add support for any particular option > but to allow end users or controllers to specify what they would like to > match. That is, the full range of Geneve's capabilities should be exposed > without modifying the code (the one exception being options that require > per-packet computation in the fast path). > > The main issue with supporting Geneve options is how to integrate the > fields into the existing OpenFlow pipeline. All existing operations > are referred to by their NXM/OXM field name - matches, action generation, > arithmetic operations (i.e. tranfer to a register). However, the Geneve > option space is exactly the same as the OXM space, so a direct mapping > is not feasible. Instead, we create a pool of 64 NXMs that are then > dynamically mapped on Geneve option TLVs using OpenFlow. Once mapped, > these fields become first-class citizens in the OpenFlow pipeline. > > An example of how to use Geneve options: > ovs-ofctl add-geneve-map br0 {class=0xffff,type=0,len=4}->tun_metadata0 > ovs-ofctl add-flow br0 > in_port=LOCAL,actions=set_field:0xffffffff->tun_metadata0,1 > > This will add a 4 bytes option (filled will all 1's) to all packets > coming from the LOCAL port and then send then out to port 1. > > A limitation of this patch is that although the option table is specified > for a particular switch over OpenFlow, it is currently global to all > switches. This will be addressed in a future patch.
Out of curiosity, when you say that do you mean "This is left as future work" or "I'm cleaning up a patch as we speak" or somewhere in between? > Based on work originally done by Madhu Challa. > > Signed-off-by: Jesse Gross <je...@nicira.com> Among this series, this patch is clearly the money shot. Bugs ---- Building on i386, this makes the compiler really angry at me because struct tun_metadata is an odd multiple of 32 bits in size, which makes struct flow an odd multiple of 32 bits in size and fails the flow.h assertion BUILD_ASSERT_DECL(sizeof(struct flow) % sizeof(uint64_t) == 0); This fixes it but obviously it's not the right fix: diff --git a/lib/tun-metadata.h b/lib/tun-metadata.h index 8ba0757..4e1e84a 100644 --- a/lib/tun-metadata.h +++ b/lib/tun-metadata.h @@ -37,6 +37,7 @@ struct tun_metadata { uint8_t opts[TUN_METADATA_TOT_OPT_SIZE]; uint64_t opt_map; struct tun_table *tab; + uint8_t pad[4]; }; BUILD_ASSERT_DECL(sizeof(((struct tun_metadata *)0)->opt_map) * 8 >= TUN_METADATA_NUM_OPTS); I see a lot of use of "1 << (something)" in tun-metadata, but I believe that these need to use 1ULL or UINT64_C(1) instead. I'm a little confused about the alloc_offset field in struct tun_metadata_allocation. It seems to be a byte offset but metadata_loc_from_match() compares it against TUN_METADATA_NUM_OPTS. Is that a typo for TUN_METADATA_TOT_OPT_SIZE? Other ----- tun-metadata.c has a few hard tabs. Fragmented tunnel metadata is nasty. Is there a way we can avoid it? I like the implementation approach used in ULONG_FOR_EACH_1 better than the one in PRESENT_OPT_FOR_EACH, since the user doesn't have to provide a scratch variable. Also it protects the macro arguments better. Actually ULONG_FOR_EACH_1 could be made to work with 32- or 64-bit maps just by changing "unsigned long" to "uint64_t" in its definition; maybe we should. I'm not certain whether the tun_metadata layer intends to limit itself to option values that are a multiple of 4 bytes (as Geneve has) or whether it's supposed to be an abstraction that doesn't care. tun_meta_find_key() could probably be a tiny bit faster by switching from HMAP_FOR_EACH_WITH_HASH to HMAP_FOR_EACH_IN_BUCKET. I spent some of my time studying the code by annotating it with comments. I'm appending a patch, hope you'll adopt a lot of it. Acked-by: Ben Pfaff <b...@nicira.com> --8<--------------------------cut here-------------------------->8-- diff --git a/lib/tun-metadata.c b/lib/tun-metadata.c index b09085e..eb7724c 100644 --- a/lib/tun-metadata.c +++ b/lib/tun-metadata.c @@ -31,16 +31,22 @@ #include "tun-metadata.h" struct tun_meta_entry { - struct hmap_node node; - uint32_t key; /* Class and Type */ + struct hmap_node node; /* In struct tun_table's key_hmap. */ + uint32_t key; /* (class << 16) | type. */ struct tun_metadata_loc loc; - bool valid; + bool valid; /* True if allocated to a class and type. */ }; +/* Maps from Geneve option class+type to positions in a struct tun_metadata's + * 'opts' array. */ struct tun_table { + /* TUN_METADATA<i> is stored in element <i>. */ struct tun_meta_entry entries[TUN_METADATA_NUM_OPTS]; - /* Each bit represents 4 bytes of allocated space */ + + /* Each bit represents 4 bytes of space, 0-bits are free space. */ unsigned long alloc_map[BITMAP_N_LONGS(TUN_METADATA_TOT_OPT_SIZE / 4)]; + + /* The valid elements in entries[], indexed by class+type. */ struct hmap key_hmap; }; BUILD_ASSERT_DECL(TUN_METADATA_TOT_OPT_SIZE % 4 == 0); @@ -81,6 +87,8 @@ tun_key_type(uint32_t key) return key & 0xff; } +/* Returns a newly allocated tun_table. If 'old_map' is nonnull then the new + * tun_table is a deep copy of the old one. */ static struct tun_table * table_alloc(const struct tun_table *old_map) OVS_REQUIRES(tab_mutex) { @@ -114,6 +122,7 @@ table_alloc(const struct tun_table *old_map) OVS_REQUIRES(tab_mutex) return new_map; } +/* Frees 'map' and all the memory it owns. */ static void table_free(struct tun_table *map) OVS_REQUIRES(tab_mutex) { @@ -130,6 +139,7 @@ table_free(struct tun_table *map) OVS_REQUIRES(tab_mutex) free(map); } +/* Creates a global tunnel metadata mapping table, if none already exists. */ void tun_metadata_init(void) { @@ -221,6 +231,13 @@ tun_metadata_table_request(struct ofputil_geneve_table_reply *gtr) } } +/* Copies the value of field 'mf' from 'metadata' into 'value'. + * + * 'mf' must be an MFF_TUN_METADATA* field. + * + * This uses the global tunnel metadata mapping table created by + * tun_metadata_init(). If no such table has been created or if 'mf' hasn't + * been allocated in it yet, this just zeros 'value'. */ void tun_metadata_read(const struct tun_metadata *metadata, const struct mf_field *mf, union mf_value *value) @@ -241,6 +258,13 @@ tun_metadata_read(const struct tun_metadata *metadata, metadata, loc); } +/* Copies 'value' into field 'mf' in 'metadata'. + * + * 'mf' must be an MFF_TUN_METADATA* field. + * + * This uses the global tunnel metadata mapping table created by + * tun_metadata_init(). If no such table has been created or if 'mf' hasn't + * been allocated in it yet, this function does nothing. */ void tun_metadata_write(struct tun_metadata *metadata, const struct mf_field *mf, const union mf_value *value) @@ -289,6 +313,21 @@ metadata_loc_from_match(struct tun_table *map, struct match *match, return &match->tun_md.loc[idx]; } +/* Makes 'match' match 'value'/'mask' on field 'mf'. + * + * 'mf' must be an MFF_TUN_METADATA* field. + * + * If there is global tunnel metadata matching table, this function is + * effective only if there is already a mapping for 'mf'. Otherwise, the + * metadata mapping table integrated into 'match' is used, adding 'mf' to its + * mapping table if it isn't already mapped (and if there is room). If 'mf' + * isn't or can't be mapped, this function returns without modifying 'match'. + * + * 'value' may be NULL; if so, then 'mf' is made to match on an all-zeros + * value. + * + * 'mask' may be NULL; if so, then 'mf' is made exact-match. + */ void tun_metadata_set_match(const struct mf_field *mf, const union mf_value *value, const union mf_value *mask, struct match *match) @@ -333,6 +372,7 @@ tun_metadata_set_match(const struct mf_field *mf, const union mf_value *value, memcpy_to_metadata(&match->wc.masks.tunnel.metadata, data.tun_metadata, loc); } +/* Copies all MFF_TUN_METADATA* fields from 'metadata' to 'flow_metadata'. */ void tun_metadata_get_fmd(const struct tun_metadata *metadata, struct match *flow_metadata) diff --git a/lib/tun-metadata.h b/lib/tun-metadata.h index 8ba0757..28ad9e4 100644 --- a/lib/tun-metadata.h +++ b/lib/tun-metadata.h @@ -33,10 +33,18 @@ struct tun_table; #define TUN_METADATA_NUM_OPTS 64 #define TUN_METADATA_TOT_OPT_SIZE 256 + +/* Geneve tunnel option data, plus metadata to aid in their interpretation. + * + * 'opt_map' is indexed by type, that is, by the <i> in TUN_METADATA<i>, so + * that e.g. TUN_METADATA5 is present if 'opt_map & (1ULL << 5)' is nonzero. + * The actual data for TUN_METADATA5, if present, might be anywhere in 'opts' + * (not necessarily even contiguous), and finding it requires referring to + * 'tab'. */ struct tun_metadata { - uint8_t opts[TUN_METADATA_TOT_OPT_SIZE]; - uint64_t opt_map; - struct tun_table *tab; + uint8_t opts[TUN_METADATA_TOT_OPT_SIZE]; /* Values from tunnel TLVs. */ + uint64_t opt_map; /* 1-bit for each present TLV. */ + struct tun_table *tab; /* Types & lengths for 'opts' and 'opt_map'. */ }; BUILD_ASSERT_DECL(sizeof(((struct tun_metadata *)0)->opt_map) * 8 >= TUN_METADATA_NUM_OPTS); @@ -46,16 +54,16 @@ BUILD_ASSERT_DECL(sizeof(((struct tun_metadata *)0)->opt_map) * 8 >= * linked list of these blocks. */ struct tun_metadata_loc_chain { struct tun_metadata_loc_chain *next; - uint8_t offset; - uint8_t len; + uint8_t offset; /* In bytes, from start of 'opts', multiple of 4. */ + uint8_t len; /* In bytes, multiple of 4. */ }; struct tun_metadata_loc { - int len; + int len; /* Sum of 'len' over elements in chain. */ struct tun_metadata_loc_chain c; }; -/* Allocation of options to be used inside a match structure. This is +/* Allocation of options inside struct match. This is * important if we don't have access to a global allocation table - either * because there isn't one (ovs-ofctl) or if we need to keep the allocation * outside of packet processing context (Packet-In). These structures never @@ -63,8 +71,8 @@ struct tun_metadata_loc { * fragmented. */ struct tun_metadata_allocation { struct tun_metadata_loc loc[TUN_METADATA_NUM_OPTS]; - uint8_t alloc_offset; - bool valid; + uint8_t alloc_offset; /* Byte offset into 'opts', multiple of 4. */ + bool valid; /* Set to true after any allocation occurs. */ }; void tun_metadata_init(void); _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev