Thanks Jarno, Your patches look good, unfortunately they do not achieve the same performance enhancements. By looking at the perf output it appears that we spend a lot of time in miniflow_extract() storing the miniflow length. I believe this is due to two reasons:
- Storing the length in the same bit field as the map complicates the assembly code. While the compiler is smart enough to understand that 'count:8¹ is 8-bit aligned and can be accessed as a byte, it access the memory several times and uses two different mask. Accoring to perf we lose a lot of time here. - It is not necessary to extract the length in miniflow_extract(). It makes more sense (at least for the userspace datapath) to store the length only for the miniflows in the exact match cache. Reducing further the NETDEV_KEY_BUF_SIZE_U32 is definitely a good idea and we should apply that. Thanks, Daniele On 9/9/14, 4:25 PM, "Jarno Rajahalme" <jrajaha...@nicira.com> wrote: >Daniele, > >I just posted rebased versions of two patches on which I worked earlier >in a similar area as there two patches: > >[PATCH 1/2] lib/flow: Add Œminiflow.count¹ >[PATCH 2/2] lib/flow: Maintain miniflow values as 64-bit aligned. > >I¹d like to see if we could generalize the inlining benefits, e.g., by >defining special ³inline only² functions in lib/flow.h, if necessary. > >So, maybe you could take a look and consider if you could build on those >two in the direction you proposed here? > > Jarno > >On Sep 5, 2014, at 5:10 PM, Daniele Di Proietto <ddiproie...@vmware.com> >wrote: > >> This optimization is done to avoid calling count_1bits(), which, if the >>popcnt >> istruction, is not available might is slow. popcnt may not be available >> because: >> >> - We are running on old hardware >> - (more likely) We're using a generic build (i.e. packaged OVS from a >>distro), >> not tuned for the specific CPU >> >> Signed-off-by: Daniele Di Proietto <ddiproie...@vmware.com> >> --- >> This commit improves 1-flow UDP 64-bytes packets test throughput by 6% >> (compiled without -march=native) >> --- >> lib/dpif-netdev.c | 22 ++++++++++++++-------- >> 1 file changed, 14 insertions(+), 8 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 112cd5a..ea14838 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -129,6 +129,7 @@ struct netdev_flow_key { >> >> struct emc_entry { >> uint32_t hash; >> + uint32_t mf_len; >> struct netdev_flow_key mf; >> struct dp_netdev_flow *flow; >> }; >> @@ -398,6 +399,7 @@ emc_cache_init(struct emc_cache *flow_cache) >> for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { >> flow_cache->entries[i].flow = NULL; >> flow_cache->entries[i].hash = 0; >> + flow_cache->entries[i].mf_len = 0; >> miniflow_initialize(&flow_cache->entries[i].mf.flow, >> flow_cache->entries[i].mf.buf); >> } >> @@ -1171,11 +1173,10 @@ netdev_flow_key_size(uint32_t flow_u32s) >> /* Used to compare 'netdev_flow_key's (miniflows) in the exact match >>cache. */ >> static inline bool >> netdev_flow_key_equal(const struct netdev_flow_key *a, >> - const struct netdev_flow_key *b) >> + const struct netdev_flow_key *b, >> + uint32_t size) >> { >> - uint32_t size = count_1bits(a->flow.map); >> - >> - return !memcmp(a, b, netdev_flow_key_size(size)); >> + return !memcmp(a, b, size); >> } >> >> static inline void >> @@ -1183,7 +1184,7 @@ netdev_flow_key_clone(struct netdev_flow_key *dst, >> const struct netdev_flow_key *src, >> uint32_t size) >> { >> - memcpy(dst, src, netdev_flow_key_size(size)); >> + memcpy(dst, src, size); >> } >> >> static inline bool >> @@ -1217,8 +1218,11 @@ emc_change_entry(struct emc_entry *ce, struct >>dp_netdev_flow *flow, >> } >> } >> if (mf) { >> - netdev_flow_key_clone(&ce->mf, mf, count_1bits(mf->flow.map)); >> + uint32_t mf_len = >>netdev_flow_key_size(count_1bits(mf->flow.map)); >> + >> + netdev_flow_key_clone(&ce->mf, mf, mf_len); >> ce->hash = hash; >> + ce->mf_len = mf_len; >> } >> } >> >> @@ -1232,7 +1236,8 @@ emc_insert(struct emc_cache *cache, const struct >>miniflow *mf, uint32_t hash, >> EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, hash) { >> if (current_entry->hash == hash >> && netdev_flow_key_equal(¤t_entry->mf, >> - miniflow_to_netdev_flow_key(mf))) >>{ >> + miniflow_to_netdev_flow_key(mf), >> + current_entry->mf_len)) { >> >> /* We found the entry with the 'mf' miniflow */ >> emc_change_entry(current_entry, flow, NULL, 0); >> @@ -1263,7 +1268,8 @@ emc_lookup(struct emc_cache *cache, const struct >>miniflow *mf, uint32_t hash) >> EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, hash) { >> if (current_entry->hash == hash && >>emc_entry_alive(current_entry) >> && netdev_flow_key_equal(¤t_entry->mf, >> - miniflow_to_netdev_flow_key(mf))) >>{ >> + miniflow_to_netdev_flow_key(mf), >> + current_entry->mf_len)) { >> >> /* We found the entry with the 'mf' miniflow */ >> return current_entry->flow; >> -- >> 2.1.0.rc1 >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> >>https://urldefense.proofpoint.com/v1/url?u=http://openvswitch.org/mailman >>/listinfo/dev&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=MV9BdLjtFIdhBDBaw5z%2 >>BU6SSA2gAfY4L%2F1HCy3VjlKU%3D%0A&m=p4TD0rGhVHeOkMhx53A29QaTzHSWeNxyJzJP0n >>x3ZBM%3D%0A&s=0a68bb17939a8a2efbbcf04062be52f8ff48233f95d8f09d1e1a4eab283 >>ca1e6 > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev