On Mon, Jan 12, 2015 at 07:16:04PM +0000, Konstantin Ananyev wrote:
> v2 changes:
> - When built with compilers that don't support AVX2 instructions,
>   make rte_acl_classify_avx2() do nothing and return an error.
> - Remove unneeded 'ifdef __AVX2__' in acl_run_avx2.*.
> - Reorder the patches in the set, to keep RTE_LIBRTE_ACL_STANDALONE=y
>   always buildable.
>
> This patch series contains several fixes and enhancements for the ACL library.
> See the complete list below.
> Two main changes that are externally visible:
> - Introduce new classify method: RTE_ACL_CLASSIFY_AVX2.
>   It uses AVX2 instructions and 256-bit wide data types
>   to perform internal trie traversal.
>   That helps to increase classify() throughput.
>   This method is selected as the default one on CPUs that support AVX2.
> - Introduce new field in the build config structure: max_size.
>   It specifies the maximum size that the internal RT structure for a given
>   context can reach.
>   The purpose of that is to allow the user to decide about the
>   space/performance trade-off
>   (faster classify() vs less space for RT internal structures)
>   for each given set of rules.
>
> Konstantin Ananyev (17):
>   fix compilation issues with RTE_LIBRTE_ACL_STANDALONE=y
>   app/test: few small fixes for test_acl.c
>   librte_acl: make data_indexes long enough to survive idle transitions.
>   librte_acl: remove build phase heuristic with negative performance
>     effect.
>   librte_acl: fix a bug at build phase that can cause matches being
>     overwritten.
>   librte_acl: introduce DFA nodes compression (group64) for identical
>     entries.
>   librte_acl: build/gen phase - simplify the way match nodes are
>     allocated.
>   librte_acl: make scalar RT code to be more similar to vector one.
>   librte_acl: a bit of RT code deduplication.
>   EAL: introduce rte_ymm and relatives in rte_common_vect.h.
>   librte_acl: add AVX2 as new rte_acl_classify() method
>   test-acl: add ability to manually select RT method.
>   librte_acl: Remove search_sse_2 and relatives.
>   librte_acl: move lo/hi dwords shuffle out from calc_addr
>   librte_acl: make calc_addr a define to deduplicate the code.
>   librte_acl: introduce max_size into rte_acl_config.
>   librte_acl: remove unused macros.
>
>  app/test-acl/main.c                             | 126 +++--
>  app/test/test_acl.c                             |   8 +-
>  examples/l3fwd-acl/main.c                       |   3 +-
>  examples/l3fwd/main.c                           |   2 +-
>  lib/librte_acl/Makefile                         |  18 +
>  lib/librte_acl/acl.h                            |  58 ++-
>  lib/librte_acl/acl_bld.c                        | 392 +++++++---------
>  lib/librte_acl/acl_gen.c                        | 268 +++++++----
>  lib/librte_acl/acl_run.h                        |   7 +-
>  lib/librte_acl/acl_run_avx2.c                   |  54 +++
>  lib/librte_acl/acl_run_avx2.h                   | 284 ++++++++++++
>  lib/librte_acl/acl_run_scalar.c                 |  65 ++-
>  lib/librte_acl/acl_run_sse.c                    | 585 +-----------------------
>  lib/librte_acl/acl_run_sse.h                    | 357 +++++++++++++++
>  lib/librte_acl/acl_vect.h                       | 132 +++---
>  lib/librte_acl/rte_acl.c                        |  47 +-
>  lib/librte_acl/rte_acl.h                        |   4 +
>  lib/librte_acl/rte_acl_osdep_alone.h            |  47 +-
>  lib/librte_eal/common/include/rte_common_vect.h |  39 +-
>  lib/librte_lpm/rte_lpm.h                        |   2 +-
>  20 files changed, 1444 insertions(+), 1054 deletions(-)
>  create mode 100644 lib/librte_acl/acl_run_avx2.c
>  create mode 100644 lib/librte_acl/acl_run_avx2.h
>  create mode 100644 lib/librte_acl/acl_run_sse.h
>
> --
> 1.8.5.3
>

Series Acked-by: Neil Horman <nhorman at tuxdriver.com>
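For anyone skimming the thread, the v2 fallback described in the cover letter (the AVX2 classify routine doing nothing and returning an error when the compiler lacks AVX2 support, and AVX2 being the default only on capable CPUs) can be pictured roughly as below. This is only a sketch under assumptions, not code from the series: the enum, the function names, the SKETCH_CC_AVX2_SUPPORT macro, the choice of -ENOTSUP as the error value and the toy "odd inputs match" rule are all made up for illustration and are not the librte_acl definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical method ids; stand-ins, not the RTE_ACL_CLASSIFY_* values. */
enum classify_alg { ALG_SCALAR, ALG_AVX2, ALG_NUM };

typedef int (*classify_fn)(const int *data, int *results, int num);

/* Portable path; toy rule: an input "matches" when it is odd. */
static int classify_scalar(const int *data, int *results, int num)
{
	assert(data != NULL && results != NULL);
	for (int i = 0; i < num; i++)
		results[i] = data[i] & 1;
	return 0;
}

/*
 * AVX2 path. SKETCH_CC_AVX2_SUPPORT is an assumed macro that the build
 * system would define only when the compiler can emit AVX2 code.
 * Without it the function is a stub: it touches nothing and reports
 * an error (the error code here is an assumption).
 */
static int classify_avx2(const int *data, int *results, int num)
{
#ifdef SKETCH_CC_AVX2_SUPPORT
	/* Placeholder: a real build would run the 256-bit wide code here. */
	return classify_scalar(data, results, num);
#else
	(void)data;
	(void)results;
	(void)num;
	return -ENOTSUP;
#endif
}

/* Dispatch table indexed by method id. */
static const classify_fn classify_fns[ALG_NUM] = {
	[ALG_SCALAR] = classify_scalar,
	[ALG_AVX2] = classify_avx2,
};

/* Pick the default: AVX2 only if both the compiler and the CPU allow it. */
static enum classify_alg select_default(int cpu_has_avx2)
{
#ifdef SKETCH_CC_AVX2_SUPPORT
	if (cpu_has_avx2)
		return ALG_AVX2;
#else
	(void)cpu_has_avx2;
#endif
	return ALG_SCALAR;
}

/* Run the requested method, rejecting ids outside the table. */
static int run_classify(enum classify_alg alg, const int *data,
			int *results, int num)
{
	if ((unsigned int)alg >= ALG_NUM)
		return -EINVAL;
	return classify_fns[alg](data, results, num);
}
```

A caller would normally use select_default() once at setup and pass the result to run_classify(); asking for ALG_AVX2 explicitly on a build without compiler support just yields the error and leaves the results array untouched.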
Nice work
Neil