On Fri, Jun 19, 2015 at 04:13:24PM -0700, Jesse Gross wrote:
> The current support for Geneve in OVS is exactly equivalent to VXLAN:
> it is possible to set and match on the VNI but not on any options
> contained in the header. This patch enables the use of options.
>
> The goal for Geneve support is not to add support for any particular option
> but to allow end users or controllers to specify what they would like to
> match. That is, the full range of Geneve's capabilities should be exposed
> without modifying the code (the one exception being options that require
> per-packet computation in the fast path).
>
> The main issue with supporting Geneve options is how to integrate the
> fields into the existing OpenFlow pipeline. All existing operations
> are referred to by their NXM/OXM field name - matches, action generation,
> arithmetic operations (i.e. tranfer to a register). However, the Geneve
> option space is exactly the same as the OXM space, so a direct mapping
> is not feasible. Instead, we create a pool of 64 NXMs that are then
> dynamically mapped on Geneve option TLVs using OpenFlow. Once mapped,
> these fields become first-class citizens in the OpenFlow pipeline.
>
> An example of how to use Geneve options:
> ovs-ofctl add-geneve-map br0 {class=0xffff,type=0,len=4}->tun_metadata0
> ovs-ofctl add-flow br0
> in_port=LOCAL,actions=set_field:0xffffffff->tun_metadata0,1
>
> This will add a 4 bytes option (filled will all 1's) to all packets
> coming from the LOCAL port and then send then out to port 1.
>
> A limitation of this patch is that although the option table is specified
> for a particular switch over OpenFlow, it is currently global to all
> switches. This will be addressed in a future patch.
Out of curiosity, when you say that do you mean "This is left as future
work" or "I'm cleaning up a patch as we speak" or somewhere in between?
> Based on work originally done by Madhu Challa.
>
> Signed-off-by: Jesse Gross <[email protected]>
Among this series, this patch is clearly the money shot.
Bugs
----
Building on i386, this makes the compiler really angry at me because
struct tun_metadata is an odd multiple of 32 bits in size, which makes
struct flow an odd multiple of 32 bits in size and fails the flow.h
assertion
BUILD_ASSERT_DECL(sizeof(struct flow) % sizeof(uint64_t) == 0);
This fixes it but obviously it's not the right fix:
diff --git a/lib/tun-metadata.h b/lib/tun-metadata.h
index 8ba0757..4e1e84a 100644
--- a/lib/tun-metadata.h
+++ b/lib/tun-metadata.h
@@ -37,6 +37,7 @@ struct tun_metadata {
uint8_t opts[TUN_METADATA_TOT_OPT_SIZE];
uint64_t opt_map;
struct tun_table *tab;
+ uint8_t pad[4];
};
BUILD_ASSERT_DECL(sizeof(((struct tun_metadata *)0)->opt_map) * 8 >=
TUN_METADATA_NUM_OPTS);
I see a lot of use of "1 << (something)" in tun-metadata, but I believe
that these need to use 1ULL or UINT64_C(1) instead.
I'm a little confused about the alloc_offset field in struct
tun_metadata_allocation. It seems to be a byte offset but
metadata_loc_from_match() compares it against TUN_METADATA_NUM_OPTS. Is
that a typo for TUN_METADATA_TOT_OPT_SIZE?
Other
-----
tun-metadata.c has a few hard tabs.
Fragmented tunnel metadata is nasty. Is there a way we can avoid it?
I like the implementation approach used in ULONG_FOR_EACH_1 better than
the one in PRESENT_OPT_FOR_EACH, since the user doesn't have to provide
a scratch variable. Also it protects the macro arguments better.
Actually ULONG_FOR_EACH_1 could be made to work with 32- or 64-bit maps
just by changing "unsigned long" to "uint64_t" in its definition; maybe
we should.
I'm not certain whether the tun_metadata layer intends to limit itself
to option values that are a multiple of 4 bytes (as Geneve has) or
whether it's supposed to be an abstraction that doesn't care.
tun_meta_find_key() could probably be a tiny bit faster by switching
from HMAP_FOR_EACH_WITH_HASH to HMAP_FOR_EACH_IN_BUCKET.
I spent some of my time studying the code by annotating it with
comments. I'm appending a patch, hope you'll adopt a lot of it.
Acked-by: Ben Pfaff <[email protected]>
--8<--------------------------cut here-------------------------->8--
diff --git a/lib/tun-metadata.c b/lib/tun-metadata.c
index b09085e..eb7724c 100644
--- a/lib/tun-metadata.c
+++ b/lib/tun-metadata.c
@@ -31,16 +31,22 @@
#include "tun-metadata.h"
struct tun_meta_entry {
- struct hmap_node node;
- uint32_t key; /* Class and Type */
+ struct hmap_node node; /* In struct tun_table's key_hmap. */
+ uint32_t key; /* (class << 16) | type. */
struct tun_metadata_loc loc;
- bool valid;
+ bool valid; /* True if allocated to a class and type. */
};
+/* Maps from Geneve option class+type to positions in a struct tun_metadata's
+ * 'opts' array. */
struct tun_table {
+ /* TUN_METADATA<i> is stored in element <i>. */
struct tun_meta_entry entries[TUN_METADATA_NUM_OPTS];
- /* Each bit represents 4 bytes of allocated space */
+
+ /* Each bit represents 4 bytes of space, 0-bits are free space. */
unsigned long alloc_map[BITMAP_N_LONGS(TUN_METADATA_TOT_OPT_SIZE / 4)];
+
+ /* The valid elements in entries[], indexed by class+type. */
struct hmap key_hmap;
};
BUILD_ASSERT_DECL(TUN_METADATA_TOT_OPT_SIZE % 4 == 0);
@@ -81,6 +87,8 @@ tun_key_type(uint32_t key)
return key & 0xff;
}
+/* Returns a newly allocated tun_table. If 'old_map' is nonnull then the new
+ * tun_table is a deep copy of the old one. */
static struct tun_table *
table_alloc(const struct tun_table *old_map) OVS_REQUIRES(tab_mutex)
{
@@ -114,6 +122,7 @@ table_alloc(const struct tun_table *old_map)
OVS_REQUIRES(tab_mutex)
return new_map;
}
+/* Frees 'map' and all the memory it owns. */
static void
table_free(struct tun_table *map) OVS_REQUIRES(tab_mutex)
{
@@ -130,6 +139,7 @@ table_free(struct tun_table *map) OVS_REQUIRES(tab_mutex)
free(map);
}
+/* Creates a global tunnel metadata mapping table, if none already exists. */
void
tun_metadata_init(void)
{
@@ -221,6 +231,13 @@ tun_metadata_table_request(struct
ofputil_geneve_table_reply *gtr)
}
}
+/* Copies the value of field 'mf' from 'metadata' into 'value'.
+ *
+ * 'mf' must be an MFF_TUN_METADATA* field.
+ *
+ * This uses the global tunnel metadata mapping table created by
+ * tun_metadata_init(). If no such table has been created or if 'mf' hasn't
+ * been allocated in it yet, this just zeros 'value'. */
void
tun_metadata_read(const struct tun_metadata *metadata,
const struct mf_field *mf, union mf_value *value)
@@ -241,6 +258,13 @@ tun_metadata_read(const struct tun_metadata *metadata,
metadata, loc);
}
+/* Copies 'value' into field 'mf' in 'metadata'.
+ *
+ * 'mf' must be an MFF_TUN_METADATA* field.
+ *
+ * This uses the global tunnel metadata mapping table created by
+ * tun_metadata_init(). If no such table has been created or if 'mf' hasn't
+ * been allocated in it yet, this function does nothing. */
void
tun_metadata_write(struct tun_metadata *metadata,
const struct mf_field *mf, const union mf_value *value)
@@ -289,6 +313,21 @@ metadata_loc_from_match(struct tun_table *map, struct
match *match,
return &match->tun_md.loc[idx];
}
+/* Makes 'match' match 'value'/'mask' on field 'mf'.
+ *
+ * 'mf' must be an MFF_TUN_METADATA* field.
+ *
+ * If there is global tunnel metadata matching table, this function is
+ * effective only if there is already a mapping for 'mf'. Otherwise, the
+ * metadata mapping table integrated into 'match' is used, adding 'mf' to its
+ * mapping table if it isn't already mapped (and if there is room). If 'mf'
+ * isn't or can't be mapped, this function returns without modifying 'match'.
+ *
+ * 'value' may be NULL; if so, then 'mf' is made to match on an all-zeros
+ * value.
+ *
+ * 'mask' may be NULL; if so, then 'mf' is made exact-match.
+ */
void
tun_metadata_set_match(const struct mf_field *mf, const union mf_value *value,
const union mf_value *mask, struct match *match)
@@ -333,6 +372,7 @@ tun_metadata_set_match(const struct mf_field *mf, const
union mf_value *value,
memcpy_to_metadata(&match->wc.masks.tunnel.metadata, data.tun_metadata,
loc);
}
+/* Copies all MFF_TUN_METADATA* fields from 'metadata' to 'flow_metadata'. */
void
tun_metadata_get_fmd(const struct tun_metadata *metadata,
struct match *flow_metadata)
diff --git a/lib/tun-metadata.h b/lib/tun-metadata.h
index 8ba0757..28ad9e4 100644
--- a/lib/tun-metadata.h
+++ b/lib/tun-metadata.h
@@ -33,10 +33,18 @@ struct tun_table;
#define TUN_METADATA_NUM_OPTS 64
#define TUN_METADATA_TOT_OPT_SIZE 256
+
+/* Geneve tunnel option data, plus metadata to aid in their interpretation.
+ *
+ * 'opt_map' is indexed by type, that is, by the <i> in TUN_METADATA<i>, so
+ * that e.g. TUN_METADATA5 is present if 'opt_map & (1ULL << 5)' is nonzero.
+ * The actual data for TUN_METADATA5, if present, might be anywhere in 'opts'
+ * (not necessarily even contiguous), and finding it requires referring to
+ * 'tab'. */
struct tun_metadata {
- uint8_t opts[TUN_METADATA_TOT_OPT_SIZE];
- uint64_t opt_map;
- struct tun_table *tab;
+ uint8_t opts[TUN_METADATA_TOT_OPT_SIZE]; /* Values from tunnel TLVs. */
+ uint64_t opt_map; /* 1-bit for each present TLV. */
+ struct tun_table *tab; /* Types & lengths for 'opts' and 'opt_map'. */
};
BUILD_ASSERT_DECL(sizeof(((struct tun_metadata *)0)->opt_map) * 8 >=
TUN_METADATA_NUM_OPTS);
@@ -46,16 +54,16 @@ BUILD_ASSERT_DECL(sizeof(((struct tun_metadata
*)0)->opt_map) * 8 >=
* linked list of these blocks. */
struct tun_metadata_loc_chain {
struct tun_metadata_loc_chain *next;
- uint8_t offset;
- uint8_t len;
+ uint8_t offset; /* In bytes, from start of 'opts', multiple of 4. */
+ uint8_t len; /* In bytes, multiple of 4. */
};
struct tun_metadata_loc {
- int len;
+ int len; /* Sum of 'len' over elements in chain. */
struct tun_metadata_loc_chain c;
};
-/* Allocation of options to be used inside a match structure. This is
+/* Allocation of options inside struct match. This is
* important if we don't have access to a global allocation table - either
* because there isn't one (ovs-ofctl) or if we need to keep the allocation
* outside of packet processing context (Packet-In). These structures never
@@ -63,8 +71,8 @@ struct tun_metadata_loc {
* fragmented. */
struct tun_metadata_allocation {
struct tun_metadata_loc loc[TUN_METADATA_NUM_OPTS];
- uint8_t alloc_offset;
- bool valid;
+ uint8_t alloc_offset; /* Byte offset into 'opts', multiple of 4. */
+ bool valid; /* Set to true after any allocation occurs. */
};
void tun_metadata_init(void);
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev