Background: After preliminary discussion with John (Zhihong) and Tim from Intel, we decided that it would be beneficial to use AVX/SSE instructions for memcmp, similar to how memcpy is being implemented. In addition, we decided to use librte_hash as the test candidate for both functionality and performance.
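
For illustration only, an equality-only comparison of 16-byte blocks with SSE2 could look roughly like the sketch below. It is not the rte_memcmp code from this patch: the names cmp_eq16 and cmp_eq are hypothetical, the 0/1 return convention (0 for match, 1 for mismatch) is chosen for the example, and a plain memcmp fallback is assumed for targets without SSE2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not the rte_memcmp from this patch.
 * Returns 0 if the two 16-byte blocks match, 1 otherwise. */
static inline int
cmp_eq16(const void *a, const void *b)
{
#if defined(__SSE2__)
	/* Unaligned loads: the blocks are not assumed to be 16-byte aligned. */
	const __m128i va = _mm_loadu_si128((const __m128i *)a);
	const __m128i vb = _mm_loadu_si128((const __m128i *)b);
	/* Each byte lane becomes 0xFF where equal, 0x00 where different. */
	const __m128i eq = _mm_cmpeq_epi8(va, vb);
	/* Gather the top bit of every lane; 0xFFFF means all 16 bytes matched. */
	return _mm_movemask_epi8(eq) != 0xFFFF;
#else
	/* Portable fallback so the sketch also builds without SSE2. */
	return memcmp(a, b, 16) != 0;
#endif
}

/* Hypothetical helper for arbitrary lengths: compare 16 bytes at a time,
 * then let memcmp handle any tail shorter than 16 bytes.
 * Returns 0 if all n bytes match, 1 otherwise. */
static inline int
cmp_eq(const void *a, const void *b, size_t n)
{
	const uint8_t *pa = a;
	const uint8_t *pb = b;

	/* Debug-time precondition check only. */
	assert(pa != NULL && pb != NULL);

	while (n >= 16) {
		if (cmp_eq16(pa, pb))
			return 1;
		pa += 16;
		pb += 16;
		n -= 16;
	}
	return n ? (memcmp(pa, pb, n) != 0) : 0;
}
```

Because the result only distinguishes match from mismatch, the sketch never has to locate the first differing byte or decide ordering, which is where a generic memcmp spends extra work.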

Currently, memcmp in librte_hash is used for key comparisons. The key length can vary, and the maximum key length is defined as 64 bytes. Preliminary tests on memory comparison alone show that using AVX/SSE instructions takes about one third of the CPU ticks of the regular memcmp function. Furthermore, hash_perf_autotest shows better results in all categories. Please note that memory comparison is a small portion of the hash functionality, and CPU Ticks/Op is for the hash operations (Add on Empty, Add update, Lookup). Only hash lookup results are shown below; I can send the complete results if anyone is interested.

The test was conducted on an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu 14.04, x86_64, 16GB DDR3 system.

PS: I would like to keep "rte_memcmp" simple, with return codes 0 (match) and 1 (no match), since usage in DPDK is for equality or inequality and I have not seen any instance where a less-than/greater-than comparison is needed. Hence the "if (unlikely(...))" portion in the code will probably be removed, and the function will be made specific to DPDK rather than generic.

/*************Existing code**********************************/

*** Hash table performance test results ***
Hash Func.  , Operation, Key size (bytes), Entries, Entries per bucket, Errors, Avg. bucket entries, Ticks/Op.
rte_hash_crc, Lookup   , 16              , 1024   , 1                 , 10000 , 0.00               , 88.55
rte_hash_crc, Lookup   , 16              , 1024   , 2                 , 10000 , 0.00               , 99.28
rte_hash_crc, Lookup   , 16              , 1024   , 4                 , 10000 , 0.00               , 106.73
rte_hash_crc, Lookup   , 16              , 1024   , 8                 , 10000 , 0.00               , 126.99
rte_hash_crc, Lookup   , 16              , 1024   , 16                , 10000 , 0.00               , 159.80
rte_hash_crc, Lookup   , 16              , 1048576, 1                 , 51    , 0.01               , 175.23
rte_hash_crc, Lookup   , 16              , 1048576, 2                 , 2     , 0.02               , 171.24
rte_hash_crc, Lookup   , 16              , 1048576, 4                 , 0     , 0.04               , 145.48
rte_hash_crc, Lookup   , 16              , 1048576, 8                 , 0     , 0.08               , 162.35
rte_hash_crc, Lookup   , 16              , 1048576, 16                , 0     , 0.15               , 182.42
jhash       , Lookup   , 16              , 1048576, 1                 , 33    , 0.01               , 219.71
jhash       , Lookup   , 16              , 1048576, 2                 , 1     , 0.02               , 216.44
jhash       , Lookup   , 16              , 1048576, 4                 , 0     , 0.04               , 188.29
jhash       , Lookup   , 16              , 1048576, 8                 , 0     , 0.08               , 203.70
jhash       , Lookup   , 16              , 1048576, 16                , 0     , 0.15               , 229.50

/**************New AVX/SSE code******************************/

Hash Func.  , Operation, Key size (bytes), Entries, Entries per bucket, Errors, Avg. bucket entries, Ticks/Op.
rte_hash_crc, Lookup   , 16              , 1024   , 1                 , 10000 , 0.00               , 85.69
rte_hash_crc, Lookup   , 16              , 1024   , 2                 , 10000 , 0.00               , 93.95
rte_hash_crc, Lookup   , 16              , 1024   , 4                 , 10000 , 0.00               , 102.80
rte_hash_crc, Lookup   , 16              , 1024   , 8                 , 10000 , 0.00               , 122.60
rte_hash_crc, Lookup   , 16              , 1024   , 16                , 10000 , 0.00               , 156.58
rte_hash_crc, Lookup   , 16              , 1048576, 1                 , 41    , 0.01               , 156.84
rte_hash_crc, Lookup   , 16              , 1048576, 2                 , 0     , 0.02               , 157.90
rte_hash_crc, Lookup   , 16              , 1048576, 4                 , 0     , 0.04               , 134.92
rte_hash_crc, Lookup   , 16              , 1048576, 8                 , 0     , 0.08               , 150.99
rte_hash_crc, Lookup   , 16              , 1048576, 16                , 0     , 0.15               , 174.08
jhash       , Lookup   , 16              , 1048576, 1                 , 45    , 0.01               , 212.03
jhash       , Lookup   , 16              , 1048576, 2                 , 2     , 0.02               , 210.65
jhash       , Lookup   , 16              , 1048576, 4                 , 0     , 0.04               , 185.90
jhash       , Lookup   , 16              , 1048576, 8                 , 0     , 0.08               , 201.35
jhash       , Lookup   , 16              , 1048576, 16                , 0     , 0.15               , 223.54

Ravi Kerur (1):
  Implement memcmp using AVX/SSE instructions.

 app/test/test_hash_perf.c                          |  36 +-
 .../common/include/arch/ppc_64/rte_memcmp.h        |  62 +++
 .../common/include/arch/x86/rte_memcmp.h           | 421 +++++++++++++++++++++
 lib/librte_eal/common/include/generic/rte_memcmp.h | 131 +++++++
 lib/librte_hash/rte_hash.c                         |  59 ++-
 5 files changed, 675 insertions(+), 34 deletions(-)
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_memcmp.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcmp.h
 create mode 100644 lib/librte_eal/common/include/generic/rte_memcmp.h

-- 
1.9.1