Thanks for doing this! Some suggestions below.
This no longer applies cleanly, and I also needed to add this incremental to
make the testsuite pass:
diff --git a/tests/pmd.at b/tests/pmd.at
index 3d219ff..9b6427e 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -144,10 +144,11 @@ dummy@ovs-dummy: hit:0 missed:0
p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>)
])
-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 4], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
pmd thread numa_id <cleared> core_id <cleared>:
emc hits:0
megaflow hits:0
+ avg. subtable lookups per hit: 0.00
miss:0
lost:0
])
@@ -170,10 +171,11 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl
recirc_id(0),in_port(1),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
])
-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 4], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
pmd thread numa_id <cleared> core_id <cleared>:
emc hits:19
megaflow hits:0
+ avg. subtable lookups per hit: 0.00
miss:1
lost:0
])
> On Jun 16, 2016, at 6:56 AM, Jan Scheurich <[email protected]> wrote:
>
> The user-space datapath (dpif-netdev) consists of a first level "exact match
> cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With
> many parallel packet flows (e.g. TCP connections) the EMC becomes inefficient
> and the OVS forwarding performance is determined by the megaflow classifier.
>
> The megaflow classifier (dpcls) consists of a variable number of hash tables
> (aka subtables), each containing megaflow entries with the same mask of
> packet header and metadata fields to match upon. A dpcls lookup matches a
> given packet against all subtables in sequence until it hits a match. As
> megaflow cache entries are by construction non-overlapping, the first match
> is the only match.
>
> Today the order of the subtables in the dpcls is essentially random so that
> on average a dpcls lookup has to visit N/2 subtables for a hit, where N is the
> total number of subtables. Even though every single hash-table lookup is
> fast, the performance of the current dpcls degrades when there are many
> subtables.
>
> How does the patch address this issue:
>
> In reality there is often a strong correlation between the ingress port and a
> small subset of subtables that have hits. The entire megaflow cache typically
> decomposes nicely into partitions that are hit only by packets entering from
> a range of similar ports (e.g. traffic from Phy -> VM vs. traffic from VM ->
> Phy).
>
> Therefore, keeping a separate list of subtables per ingress port, sorted by
> frequency of hits, reduces the average number of subtables lookups in the
> dpcls to a minimum, even if the total number of subtables gets large.
Did you consider keeping a separate dpcls instance per input port instead?
I.e., have an hmap of dpcls'es using the in_port as the hash key, creating and
destroying them on demand. This would avoid the fixed-size array, simplify the
code, and make each insert and remove operation a bit cheaper. A miss would
typically go through a lower number of subtables, even though the upcall cost
would likely hide this benefit in practice. We always exact-match on in_port
(as well as dl_type and recirc_id, see xlate_wc_init()), so this would add no
overhead in terms of the overall number of megaflows.
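For illustration, a rough and untested sketch of what I mean; the wrapper
struct and the pmd_lookup_dpcls() helper below are made-up names:

struct dp_netdev_pmd_cls {
    struct hmap_node node;     /* Within a per-PMD 'classifiers' hmap. */
    odp_port_t in_port;        /* Hash key. */
    struct dpcls cls;
};

/* Returns the dpcls for 'in_port', or NULL if none exists yet. */
static struct dpcls *
pmd_lookup_dpcls(struct hmap *classifiers, odp_port_t in_port)
{
    uint32_t hash = hash_port_no(in_port);
    struct dp_netdev_pmd_cls *pcls;

    HMAP_FOR_EACH_WITH_HASH (pcls, node, hash, classifiers) {
        if (pcls->in_port == in_port) {
            return &pcls->cls;
        }
    }
    return NULL;    /* Created on demand on the flow insert path. */
}

The per-port instance would be created lazily when the first flow for that
in_port is inserted and freed when its last subtable goes away, so memory
stays proportional to the ports actually carrying traffic.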
>
> The patch introduces 32 subtable vectors per dpcls and hashes the ingress
> port to select the subtable vector. The patch also counts matches per 32
> slots in each vector (hashing the subtable pointer to obtain the slot) and
> sorts the vectors according to match frequency every second.
>
Also, it should be possible to expand the pvector API to allow using the
pvector priorities directly as the counts. I posted a patch for this
(https://patchwork.ozlabs.org/patch/638920/), after which you could apply the
following incremental. I haven't tested how this performs, though.
Jarno
From: Jarno Rajahalme <[email protected]>
Date: Tue, 21 Jun 2016 17:17:55 -0700
Subject: [PATCH] dpif-netdev: Use non-concurrent pvector.
Cc: <[email protected]>
PMD threads do not need the concurrent version of the pvector. The
non-concurrent pvector entry priorities can be incremented in place
and periodically sorted.
Signed-off-by: Jarno Rajahalme <[email protected]>
---
lib/dpif-netdev.c | 68 +++++++++++++++++++++++++++----------------------------
1 file changed, 34 insertions(+), 34 deletions(-)
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c491110..647bfb7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -151,12 +151,11 @@ struct emc_cache {
#define NUM_SUBTABLE_VECTORS 32
#define NUM_SUBTABLE_SLOTS 32
-#define DPCLS_OPTIMIATION_INTERVAL 1000
+#define DPCLS_OPTIMIZATION_INTERVAL 1000
struct dpcls {
struct cmap subtables_map;
- struct pvector subtables[NUM_SUBTABLE_VECTORS];
- uint64_t hit_cnt[NUM_SUBTABLE_VECTORS][NUM_SUBTABLE_SLOTS];
+ struct pvector_impl *subtables[NUM_SUBTABLE_VECTORS];
long long int next_optimization;
};
@@ -4371,11 +4370,10 @@ dpcls_init(struct dpcls *cls)
int i;
cmap_init(&cls->subtables_map);
- for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
- pvector_init(&cls->subtables[i]);
+ for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+ cls->subtables[i] = pvector_impl_alloc(PVECTOR_EXTRA_ALLOC);
}
- memset(cls->hit_cnt, 0, sizeof cls->hit_cnt);
- cls->next_optimization = time_msec() + DPCLS_OPTIMIATION_INTERVAL;
+ cls->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL;
}
static void
@@ -4383,8 +4381,8 @@ dpcls_destroy_subtable(struct dpcls *cls, struct dpcls_subtable *subtable)
{
int i;
- for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
- pvector_remove(&cls->subtables[i], subtable);
+ for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+ pvector_impl_remove(cls->subtables[i], subtable);
}
cmap_remove(&cls->subtables_map, &subtable->cmap_node,
subtable->mask.hash);
@@ -4407,8 +4405,8 @@ dpcls_destroy(struct dpcls *cls)
dpcls_destroy_subtable(cls, subtable);
}
cmap_destroy(&cls->subtables_map);
- for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
- pvector_destroy(&cls->subtables[i]);
+ for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+ free(cls->subtables[i]);
}
}
}
@@ -4417,7 +4415,6 @@ static struct dpcls_subtable *
dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask)
{
struct dpcls_subtable *subtable;
- int i;
/* Need to add one. */
subtable = xmalloc(sizeof *subtable
@@ -4425,10 +4422,10 @@ dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask)
cmap_init(&subtable->rules);
netdev_flow_key_clone(&subtable->mask, mask);
cmap_insert(&cls->subtables_map, &subtable->cmap_node, mask->hash);
- for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
+
+ for (int i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
/* Insert new subtable with priority zero (no hits) */
- pvector_insert(&cls->subtables[i], subtable, 0);
- pvector_publish(&cls->subtables[i]);
+ pvector_impl_push_back(&cls->subtables[i], subtable, 0);
}
return subtable;
@@ -4456,22 +4453,22 @@ dpcls_try_optimize(struct dpcls *cls)
long long int now = time_msec();
if (now > cls->next_optimization) {
- struct dpcls_subtable *subtable;
- int port_idx, subtable_idx;
-
- for (port_idx=0; port_idx<NUM_SUBTABLE_VECTORS; port_idx++) {
- struct pvector *pvec = &cls->subtables[port_idx];
- PVECTOR_FOR_EACH (subtable, pvec) {
- subtable_idx = hash_pointer(subtable, 210365) % NUM_SUBTABLE_SLOTS;
- pvector_change_priority(pvec, subtable,
- cls->hit_cnt[port_idx][subtable_idx]);
+ int port_idx;
+
+ for (port_idx = 0; port_idx < NUM_SUBTABLE_VECTORS; port_idx++) {
+ struct pvector_impl *pvec = cls->subtables[port_idx];
+ struct pvector_entry *entry;
+ int index;
+
+ pvector_impl_sort(pvec);
+
+ PVECTOR_IMPL_FOR_EACH_ENTRY (entry, index, pvec) {
+ entry->priority = 0;
}
- pvector_publish(pvec);
}
/* Start new measuring interval */
- memset(cls->hit_cnt, 0, sizeof cls->hit_cnt);
- cls->next_optimization = now + DPCLS_OPTIMIATION_INTERVAL;
+ cls->next_optimization = now + DPCLS_OPTIMIZATION_INTERVAL;
}
}
@@ -4500,8 +4497,9 @@ dpcls_remove(struct dpcls *cls, struct dpcls_rule *rule)
if (cmap_remove(&subtable->rules, &rule->cmap_node, rule->flow.hash)
== 0) {
dpcls_destroy_subtable(cls, subtable);
- for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
- pvector_publish(&cls->subtables[i]);
+
+ for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+ pvector_impl_sort(cls->subtables[i]); /* Removes gaps. */
}
}
}
@@ -4559,10 +4557,11 @@ dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],
/* Select the subtable vector for the in_port */
uint32_t port_idx = hash_port_no(port_no) % NUM_SUBTABLE_VECTORS;
- uint64_t *hit_cnt = cls->hit_cnt[port_idx];
int lookups_match = 0, subtable_pos = 1;
+ struct pvector_entry *entry;
+ int index;
- PVECTOR_FOR_EACH (subtable, &cls->subtables[port_idx]) {
+ PVECTOR_IMPL_FOR_EACH_ENTRY (entry, index, cls->subtables[port_idx]) {
const struct netdev_flow_key *mkeys = keys;
struct dpcls_rule **mrules = rules;
map_type remains = 0;
@@ -4570,6 +4569,8 @@ dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],
BUILD_ASSERT_DECL(sizeof remains == sizeof *maps);
+ subtable = entry->ptr;
+
for (m = 0; m < N_MAPS; m++, mkeys += MAP_BITS, mrules += MAP_BITS) {
uint32_t hashes[MAP_BITS];
const struct cmap_node *nodes[MAP_BITS];
@@ -4594,9 +4595,8 @@ dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],
CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) {
if (OVS_LIKELY(dpcls_rule_matches_key(rule, &mkeys[i]))) {
mrules[i] = rule;
- /* Update the subtable hit statistics for the in_port vector */
- int subtable_idx = hash_pointer(subtable, 210365) % NUM_SUBTABLE_SLOTS;
- hit_cnt[subtable_idx]++;
+ /* Update the subtable hit stats. */
+ entry->priority++;
lookups_match += subtable_pos;
goto next;
}
--
2.1.4