Thanks Jarno,

Your patches look good, unfortunately they do not achieve the same
performance enhancements.
By looking at the perf output it appears that we spend a lot of time in
miniflow_extract() storing the miniflow length. I believe this is due to
two reasons:

- Storing the length in the same bit field as the map complicates the
assembly code. While the compiler is smart enough to understand that
'count:8¹ is 8-bit aligned and can be accessed as a byte, it access the
memory several times and uses two different mask. Accoring to perf we lose
a lot of time here.
- It is not necessary to extract the length in miniflow_extract(). It
makes more sense (at least for the userspace datapath) to store the length
only for the miniflows in the exact match cache.

Reducing further the NETDEV_KEY_BUF_SIZE_U32 is definitely a good idea and
we should apply that.

Thanks,

Daniele

On 9/9/14, 4:25 PM, "Jarno Rajahalme" <jrajaha...@nicira.com> wrote:

>Daniele,
>
>I just posted rebased versions of two patches on which I worked earlier
>in a similar area as there two patches:
>
>[PATCH 1/2] lib/flow: Add Œminiflow.count¹
>[PATCH 2/2] lib/flow: Maintain miniflow values as 64-bit aligned.
>
>I¹d like to see if we could generalize the inlining benefits, e.g., by
>defining special ³inline only² functions in lib/flow.h, if necessary.
>
>So, maybe you could take a look and consider if you could build on those
>two in the direction you proposed here?
>
>  Jarno
>
>On Sep 5, 2014, at 5:10 PM, Daniele Di Proietto <ddiproie...@vmware.com>
>wrote:
>
>> This optimization is done to avoid calling count_1bits(), which, if the
>>popcnt
>> istruction, is not available might is slow. popcnt may not be available
>> because:
>> 
>> - We are running on old hardware
>> - (more likely) We're using a generic build (i.e. packaged OVS from a
>>distro),
>>  not tuned for the specific CPU
>> 
>> Signed-off-by: Daniele Di Proietto <ddiproie...@vmware.com>
>> ---
>> This commit improves 1-flow UDP 64-bytes packets test throughput by 6%
>> (compiled without -march=native)
>> ---
>> lib/dpif-netdev.c | 22 ++++++++++++++--------
>> 1 file changed, 14 insertions(+), 8 deletions(-)
>> 
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 112cd5a..ea14838 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -129,6 +129,7 @@ struct netdev_flow_key {
>> 
>> struct emc_entry {
>>     uint32_t hash;
>> +    uint32_t mf_len;
>>     struct netdev_flow_key mf;
>>     struct dp_netdev_flow *flow;
>> };
>> @@ -398,6 +399,7 @@ emc_cache_init(struct emc_cache *flow_cache)
>>     for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) {
>>         flow_cache->entries[i].flow = NULL;
>>         flow_cache->entries[i].hash = 0;
>> +        flow_cache->entries[i].mf_len = 0;
>>         miniflow_initialize(&flow_cache->entries[i].mf.flow,
>>                             flow_cache->entries[i].mf.buf);
>>     }
>> @@ -1171,11 +1173,10 @@ netdev_flow_key_size(uint32_t flow_u32s)
>> /* Used to compare 'netdev_flow_key's (miniflows) in the exact match
>>cache. */
>> static inline bool
>> netdev_flow_key_equal(const struct netdev_flow_key *a,
>> -                      const struct netdev_flow_key *b)
>> +                      const struct netdev_flow_key *b,
>> +                      uint32_t size)
>> {
>> -    uint32_t size = count_1bits(a->flow.map);
>> -
>> -    return !memcmp(a, b, netdev_flow_key_size(size));
>> +    return !memcmp(a, b, size);
>> }
>> 
>> static inline void
>> @@ -1183,7 +1184,7 @@ netdev_flow_key_clone(struct netdev_flow_key *dst,
>>                       const struct netdev_flow_key *src,
>>                       uint32_t size)
>> {
>> -    memcpy(dst, src, netdev_flow_key_size(size));
>> +    memcpy(dst, src, size);
>> }
>> 
>> static inline bool
>> @@ -1217,8 +1218,11 @@ emc_change_entry(struct emc_entry *ce, struct
>>dp_netdev_flow *flow,
>>         }
>>     }
>>     if (mf) {
>> -        netdev_flow_key_clone(&ce->mf, mf, count_1bits(mf->flow.map));
>> +        uint32_t mf_len =
>>netdev_flow_key_size(count_1bits(mf->flow.map));
>> +
>> +        netdev_flow_key_clone(&ce->mf, mf, mf_len);
>>         ce->hash = hash;
>> +        ce->mf_len = mf_len;
>>     }
>> }
>> 
>> @@ -1232,7 +1236,8 @@ emc_insert(struct emc_cache *cache, const struct
>>miniflow *mf, uint32_t hash,
>>     EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, hash) {
>>         if (current_entry->hash == hash
>>             && netdev_flow_key_equal(&current_entry->mf,
>> -                                     miniflow_to_netdev_flow_key(mf)))
>>{
>> +                                     miniflow_to_netdev_flow_key(mf),
>> +                                     current_entry->mf_len)) {
>> 
>>             /* We found the entry with the 'mf' miniflow */
>>             emc_change_entry(current_entry, flow, NULL, 0);
>> @@ -1263,7 +1268,8 @@ emc_lookup(struct emc_cache *cache, const struct
>>miniflow *mf, uint32_t hash)
>>     EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, hash) {
>>         if (current_entry->hash == hash &&
>>emc_entry_alive(current_entry)
>>             && netdev_flow_key_equal(&current_entry->mf,
>> -                                     miniflow_to_netdev_flow_key(mf)))
>>{
>> +                                     miniflow_to_netdev_flow_key(mf),
>> +                                     current_entry->mf_len)) {
>> 
>>             /* We found the entry with the 'mf' miniflow */
>>             return current_entry->flow;
>> -- 
>> 2.1.0.rc1
>> 
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> 
>>https://urldefense.proofpoint.com/v1/url?u=http://openvswitch.org/mailman
>>/listinfo/dev&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=MV9BdLjtFIdhBDBaw5z%2
>>BU6SSA2gAfY4L%2F1HCy3VjlKU%3D%0A&m=p4TD0rGhVHeOkMhx53A29QaTzHSWeNxyJzJP0n
>>x3ZBM%3D%0A&s=0a68bb17939a8a2efbbcf04062be52f8ff48233f95d8f09d1e1a4eab283
>>ca1e6
>

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to