________________________________________ Od: Kulasek, TomaszX <tomaszx.kulasek at intel.com> Wys?ane: 15 marca 2016 17:06 Do: Thomas Monjalon; Czekaj, Maciej DW: dev at dpdk.org Temat: RE: [dpdk-dev] [PATCH v3] examples/l3fwd: em path performance fix
> -----Original Message----- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Tuesday, March 15, 2016 15:50 > To: Kulasek, TomaszX <tomaszx.kulasek at intel.com>; Maciej Czekaj > <maciej.czekaj at caviumnetworks.com> > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3] examples/l3fwd: em path performance fix > > 2016-03-15 14:31, Kulasek, TomaszX: > > From: Kulasek, TomaszX > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > > > > There is an error: > > > > examples/l3fwd/l3fwd_em_hlm_sse.h:72:38: error: > > > > incompatible type for argument 2 of ?_mm_and_si128? > > > > > > It's caused by > > > > > > commit 64d3955de1de4d7879a0930a6d2f501369d3445a > > > Author: Maciej Czekaj <maciej.czekaj at caviumnetworks.com> > > > Date: Thu Mar 10 17:06:22 2016 +0100 > > > > > > examples/l3fwd: fix ARM build > > > > > > Enable NEON support in exact match mode. > > > l3fwd example did not compile on ARM due to SSE2 instrincics used > > > in generic part. > > > Some instrinsins were used to initialize data structures and > > > those were > > > replaced by ordinary structure initalization. > > > All SSE2 intrinsics used in forwarding, i.e. masking the IP/TCP > header > > > are moved to single inline function and made arch-specific. > > > > > > Signed-off-by: Maciej Czekaj <maciej.czekaj at caviumnetworks.com> > > > > > > Which doesn't include rework of l3fwd_em_hlm_sse.h file. > > > > > > When you compile it now with global "#define HASH_MULTI_LOOKUP 1" > > > and alternative classification is used, and compilation will also fail > now. > > > > > > I need a little bit more time to investigate it, because I'm not an > > > expert in ARM. It will be nice if Maciej will help me in that. > > > > > > Tomasz > > > > Will be that ok for you to disable this path for arm? > > Please, what do you mean? > Maciej, have you looked at this issue? This fix uses platform specific part of code which wasn't reworked in previous patch for ARM. It causes compilation error. What I mean, is to leave current classification path for ARM and turn on alternative only for Intel platform. Like that: 60 +#if defined(NO_HASH_MULTI_LOOKUP) || defined(__ARM_NEON) 61 #include "l3fwd_em_sse.h" 62 #else 63 #include "l3fwd_em_hlm_sse.h" Thanks guys for pointing this out. The issue is that after my patch mask0, mask1 and mask2 are now defined as: static rte_xmm_t mask0; static rte_xmm_t mask1; static rte_xmm_t mask2; rte_xmm_t is a union with xmm_t field inside. Apparently, I overlooked the HASH_MULTI_LOOKUP define I can provide a quick fix for that, I need to rename all maskN references to maskN.x, to point out to xmm_t variable. E.g. the following diff is fixing the compilation. diff --git a/examples/l3fwd/l3fwd_em_hlm_sse.h b/examples/l3fwd/l3fwd_em_hlm_sse.h index d3388da..eb23163 100644 --- a/examples/l3fwd/l3fwd_em_hlm_sse.h +++ b/examples/l3fwd/l3fwd_em_hlm_sse.h @@ -77,14 +77,14 @@ em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf *m[8], sizeof(struct ether_hdr) + offsetof(struct ipv4_hdr, time_to_live))); - key[0].xmm = _mm_and_si128(data[0], mask0); - key[1].xmm = _mm_and_si128(data[1], mask0); - key[2].xmm = _mm_and_si128(data[2], mask0); - key[3].xmm = _mm_and_si128(data[3], mask0); - key[4].xmm = _mm_and_si128(data[4], mask0); - key[5].xmm = _mm_and_si128(data[5], mask0); - key[6].xmm = _mm_and_si128(data[6], mask0); - key[7].xmm = _mm_and_si128(data[7], mask0); + key[0].xmm = _mm_and_si128(data[0], mask0.x); + key[1].xmm = _mm_and_si128(data[1], mask0.x); + key[2].xmm = _mm_and_si128(data[2], mask0.x); + key[3].xmm = _mm_and_si128(data[3], mask0.x); + key[4].xmm = _mm_and_si128(data[4], mask0.x); + key[5].xmm = _mm_and_si128(data[5], mask0.x); + key[6].xmm = _mm_and_si128(data[6], mask0.x); + key[7].xmm = _mm_and_si128(data[7], mask0.x); const void *key_array[8] = {&key[0], &key[1], &key[2], &key[3], &key[4], &key[5], &key[6], &key[7]}; @@ -175,14 +175,14 @@ em_get_dst_port_ipv6x8(struct lcore_conf *qconf, struct rte_mbuf *m[8], int32_t ret[8]; union ipv6_5tuple_host key[8]; - get_ipv6_5tuple(m[0], mask1, mask2, &key[0]); - get_ipv6_5tuple(m[1], mask1, mask2, &key[1]); - get_ipv6_5tuple(m[2], mask1, mask2, &key[2]); - get_ipv6_5tuple(m[3], mask1, mask2, &key[3]); - get_ipv6_5tuple(m[4], mask1, mask2, &key[4]); - get_ipv6_5tuple(m[5], mask1, mask2, &key[5]); - get_ipv6_5tuple(m[6], mask1, mask2, &key[6]); - get_ipv6_5tuple(m[7], mask1, mask2, &key[7]); + get_ipv6_5tuple(m[0], mask1.x, mask2.x, &key[0]); + get_ipv6_5tuple(m[1], mask1.x, mask2.x, &key[1]); + get_ipv6_5tuple(m[2], mask1.x, mask2.x, &key[2]); + get_ipv6_5tuple(m[3], mask1.x, mask2.x, &key[3]); + get_ipv6_5tuple(m[4], mask1.x, mask2.x, &key[4]); + get_ipv6_5tuple(m[5], mask1.x, mask2.x, &key[5]); + get_ipv6_5tuple(m[6], mask1.x, mask2.x, &key[6]); + get_ipv6_5tuple(m[7], mask1.x, mask2.x, &key[7]); const void *key_array[8] = {&key[0], &key[1], &key[2], &key[3], &key[4], &key[5], &key[6], &key[7]}; Would you like me to re-post the patch? Thanks Maciej