Daniele, Thanks for the analysis. I did not see that now all miniflows need benefit form the count.
I’ll send a separate patch just for the NETDEV_KEY_BUF_SIZE_U32 changes. Jarno On Sep 11, 2014, at 4:09 PM, Daniele Di Proietto <ddiproie...@vmware.com> wrote: > Thanks Jarno, > > Your patches look good, unfortunately they do not achieve the same > performance enhancements. > By looking at the perf output it appears that we spend a lot of time in > miniflow_extract() storing the miniflow length. I believe this is due to > two reasons: > > - Storing the length in the same bit field as the map complicates the > assembly code. While the compiler is smart enough to understand that > 'count:8¹ is 8-bit aligned and can be accessed as a byte, it access the > memory several times and uses two different mask. Accoring to perf we lose > a lot of time here. > - It is not necessary to extract the length in miniflow_extract(). It > makes more sense (at least for the userspace datapath) to store the length > only for the miniflows in the exact match cache. > > Reducing further the NETDEV_KEY_BUF_SIZE_U32 is definitely a good idea and > we should apply that. > > Thanks, > > Daniele > > On 9/9/14, 4:25 PM, "Jarno Rajahalme" <jrajaha...@nicira.com> wrote: > >> Daniele, >> >> I just posted rebased versions of two patches on which I worked earlier >> in a similar area as there two patches: >> >> [PATCH 1/2] lib/flow: Add Œminiflow.count¹ >> [PATCH 2/2] lib/flow: Maintain miniflow values as 64-bit aligned. >> >> I¹d like to see if we could generalize the inlining benefits, e.g., by >> defining special ³inline only² functions in lib/flow.h, if necessary. >> >> So, maybe you could take a look and consider if you could build on those >> two in the direction you proposed here? >> >> Jarno >> >> On Sep 5, 2014, at 5:10 PM, Daniele Di Proietto <ddiproie...@vmware.com> >> wrote: >> >>> This optimization is done to avoid calling count_1bits(), which, if the >>> popcnt >>> istruction, is not available might is slow. popcnt may not be available >>> because: >>> >>> - We are running on old hardware >>> - (more likely) We're using a generic build (i.e. packaged OVS from a >>> distro), >>> not tuned for the specific CPU >>> >>> Signed-off-by: Daniele Di Proietto <ddiproie...@vmware.com> >>> --- >>> This commit improves 1-flow UDP 64-bytes packets test throughput by 6% >>> (compiled without -march=native) >>> --- >>> lib/dpif-netdev.c | 22 ++++++++++++++-------- >>> 1 file changed, 14 insertions(+), 8 deletions(-) >>> >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >>> index 112cd5a..ea14838 100644 >>> --- a/lib/dpif-netdev.c >>> +++ b/lib/dpif-netdev.c >>> @@ -129,6 +129,7 @@ struct netdev_flow_key { >>> >>> struct emc_entry { >>> uint32_t hash; >>> + uint32_t mf_len; >>> struct netdev_flow_key mf; >>> struct dp_netdev_flow *flow; >>> }; >>> @@ -398,6 +399,7 @@ emc_cache_init(struct emc_cache *flow_cache) >>> for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { >>> flow_cache->entries[i].flow = NULL; >>> flow_cache->entries[i].hash = 0; >>> + flow_cache->entries[i].mf_len = 0; >>> miniflow_initialize(&flow_cache->entries[i].mf.flow, >>> flow_cache->entries[i].mf.buf); >>> } >>> @@ -1171,11 +1173,10 @@ netdev_flow_key_size(uint32_t flow_u32s) >>> /* Used to compare 'netdev_flow_key's (miniflows) in the exact match >>> cache. */ >>> static inline bool >>> netdev_flow_key_equal(const struct netdev_flow_key *a, >>> - const struct netdev_flow_key *b) >>> + const struct netdev_flow_key *b, >>> + uint32_t size) >>> { >>> - uint32_t size = count_1bits(a->flow.map); >>> - >>> - return !memcmp(a, b, netdev_flow_key_size(size)); >>> + return !memcmp(a, b, size); >>> } >>> >>> static inline void >>> @@ -1183,7 +1184,7 @@ netdev_flow_key_clone(struct netdev_flow_key *dst, >>> const struct netdev_flow_key *src, >>> uint32_t size) >>> { >>> - memcpy(dst, src, netdev_flow_key_size(size)); >>> + memcpy(dst, src, size); >>> } >>> >>> static inline bool >>> @@ -1217,8 +1218,11 @@ emc_change_entry(struct emc_entry *ce, struct >>> dp_netdev_flow *flow, >>> } >>> } >>> if (mf) { >>> - netdev_flow_key_clone(&ce->mf, mf, count_1bits(mf->flow.map)); >>> + uint32_t mf_len = >>> netdev_flow_key_size(count_1bits(mf->flow.map)); >>> + >>> + netdev_flow_key_clone(&ce->mf, mf, mf_len); >>> ce->hash = hash; >>> + ce->mf_len = mf_len; >>> } >>> } >>> >>> @@ -1232,7 +1236,8 @@ emc_insert(struct emc_cache *cache, const struct >>> miniflow *mf, uint32_t hash, >>> EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, hash) { >>> if (current_entry->hash == hash >>> && netdev_flow_key_equal(¤t_entry->mf, >>> - miniflow_to_netdev_flow_key(mf))) >>> { >>> + miniflow_to_netdev_flow_key(mf), >>> + current_entry->mf_len)) { >>> >>> /* We found the entry with the 'mf' miniflow */ >>> emc_change_entry(current_entry, flow, NULL, 0); >>> @@ -1263,7 +1268,8 @@ emc_lookup(struct emc_cache *cache, const struct >>> miniflow *mf, uint32_t hash) >>> EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, hash) { >>> if (current_entry->hash == hash && >>> emc_entry_alive(current_entry) >>> && netdev_flow_key_equal(¤t_entry->mf, >>> - miniflow_to_netdev_flow_key(mf))) >>> { >>> + miniflow_to_netdev_flow_key(mf), >>> + current_entry->mf_len)) { >>> >>> /* We found the entry with the 'mf' miniflow */ >>> return current_entry->flow; >>> -- >>> 2.1.0.rc1 >>> >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> >>> https://urldefense.proofpoint.com/v1/url?u=http://openvswitch.org/mailman >>> /listinfo/dev&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=MV9BdLjtFIdhBDBaw5z%2 >>> BU6SSA2gAfY4L%2F1HCy3VjlKU%3D%0A&m=p4TD0rGhVHeOkMhx53A29QaTzHSWeNxyJzJP0n >>> x3ZBM%3D%0A&s=0a68bb17939a8a2efbbcf04062be52f8ff48233f95d8f09d1e1a4eab283 >>> ca1e6 >> > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev