> -----Original Message-----
> From: David Christensen <d...@linux.vnet.ibm.com>
> Sent: Friday, July 26, 2019 3:36 AM
> To: Jerin Jacob Kollanukkaran <jer...@marvell.com>; Bruce Richardson
> <bruce.richard...@intel.com>; hgovindh
> <hariprasad.govindhara...@intel.com>
> Cc: Remy Horton <remy.hor...@intel.com>; Marko Kovacevic
> <marko.kovace...@intel.com>; Ori Kam <or...@mellanox.com>; Pablo de
> Lara <pablo.de.lara.gua...@intel.com>; Radu Nicolau
> <radu.nico...@intel.com>; Akhil Goyal <akhil.go...@nxp.com>; Tomasz
> Kantecki <tomasz.kante...@intel.com>; dev@dpdk.org;
> maciej.cze...@caviumnetworks.com; sta...@dpdk.org; Gavin Hu
> <gavin...@arm.com>
> Subject: [EXT] Re: [dpdk-dev] [PATCH v2] examples/l3fwd: fix unaligned
> memory access
>
> >>>> Fix unaligned memory access when reading IPv6 header which leads to
> >>>> segmentation fault by changing aligned memory read to unaligned
> >>>> memory read.
> >>>>
> >>>> Bugzilla ID: 279
> >>>> Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
> >>>> Cc: maciej.cze...@caviumnetworks.com
> >>>> Cc: sta...@dpdk.org
> >>>> Signed-off-by: hgovindh <hariprasad.govindhara...@intel.com>
> >>>> ---
> >>>> V2: Added functions which will do unaligned load based on the
> >>>> underlying architecture
> >>>> ---
> >>>> ---
> >>>> examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
> >>>> 1 file changed, 24 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
> >>>> index fa8f82be6..f2641586b 100644
> >>>> --- a/examples/l3fwd/l3fwd_em.c
> >>>> +++ b/examples/l3fwd/l3fwd_em.c
> >>>> @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t mask)
> >>>>  #error No vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
> >>>>  #endif
> >>>>
> >>>> +#if defined(RTE_MACHINE_CPUFLAG_SSE2)
> >>>> +static inline xmm_t
> >>>> +em_load_key(void *key)
> >>>> +{
> >>>> + return _mm_loadu_si128((__m128i *)(key));
> >>>> +}
> >>>> +#elif defined(RTE_MACHINE_CPUFLAG_NEON)
> >>>> +static inline xmm_t
> >>>> +em_load_key(void *key)
> >>>> +{
> >>>> + return vld1q_s32((int32_t *)key);
> >>>> +}
> >>>> +#elif defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
> >>>> +static inline xmm_t
> >>>> +em_load_key(void *key)
> >>>> +{
> >>>> + return vec_ld(0, (xmm_t *)(key));
> >>>> +}
> >>
> >> Added power pc maintainer
> >
> >> Not sure all architecture need SIMD instructions for access to
> >> unaligned memory location.
> >>
> >> @hgovindh,
> >> Could you provide exact setup details for reproducing this issue, I
> >> can test it on arm64.
> >> Like l3fwd command, Traffic generator traffic pattern
> >
> > The vec_ld() function requires 16 byte alignment. (My understanding
> > is that GCC code will mask the lower four bits of the address to
> > enforce the requirement:
> > https://gcc.gcc.gnu.narkive.com/cJndcMpR/vec-ld-versus-vec-vsx-ld-on-power8)
> > Power 8 and later processors support the vec_vsx_ld() function which
> > does not have the same memory alignment requirements.
> >
> > I'll need to try and reproduce the original bug to see what code is
> > actually being generated. Outside of vector instructions I wouldn't
> > expect to see errors with unaligned data references.
>
> Tested original bugzilla 279 on Power 9 system with RHEL 7.6 and gcc 4.8.5, no
> segmentation fault observed after 30 minutes (observed segmentation fault
> on Intel system immediately).
>
> Code disassembly:
> (gdb) info line l3fwd_em.c:290
> Line 290 of "/home/davec/src/dpdk/examples/l3fwd/l3fwd_em.c" starts at
> address 0x10146fbc <em_main_loop+1660>
> and ends at 0x10146fc0 <em_main_loop+1664>.
> (gdb) disass /m 0x10146fbc,0x10146fc0
> Dump of assembler code from 0x10146fbc to 0x10146fc0:
> 290 key.xmm[1] = *(xmm_t *)data1;
> 0x0000000010146fbc <em_main_loop+1660>: li r7,20
>
> End of assembler dump.
>
> Since vector element ordering is different on Intel vs Power/ARM, suggest
> only applying vector operation to Intel code at this time otherwise additional
> steps may be required to modify MASK values to match the new vector
> operations.
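
For reference, the alignment behaviour described for vec_ld() earlier in
the thread can be illustrated in plain C. This is only a sketch: the helper
name is hypothetical, and it models the address arithmetic rather than the
vector load itself. It assumes the low four address bits are ignored, as
described above.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: returns the address a
 * 16-byte-aligned vector load would effectively read from, assuming the
 * low four bits of the requested address are masked off.
 */
static inline uintptr_t
aligned16_effective_addr(uintptr_t addr)
{
	return addr & ~(uintptr_t)0xF;
}
```

Under that assumption, a load requested at offset 20 from a 16-byte-aligned
base would read from offset 16, so the symptom would be wrong key bytes
rather than a fault.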
On arm64, the generated assembly is shown below. LDUR and STR work with
unaligned memory (i.e. no special handling is needed). I would suggest
adding an EAL function to abstract the difference between x86 and
Power/ARM, to avoid #ifdef clutter in all the applications.

key.xmm[1] = *(xmm_t *)data1;
   0x00000000004ebed4 <+1188>:  60 40 c1 3c  ldur q0, [x3, #20]
   0x00000000004ebedc <+1196>:  a0 73 80 3d  str  q0, [x29, #448]
   0x00000000004ec064 <+1588>:  41 40 c1 3c  ldur q1, [x2, #20]
   0x00000000004ec06c <+1596>:  a1 73 80 3d  str  q1, [x29, #448]

>
> Dave
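
One possible shape for the abstraction suggested above is a single helper
that copies 16 bytes with memcpy(), which has no alignment requirement and
which compilers commonly lower to one unaligned load where the target
allows it. This is a minimal sketch, not an existing EAL API: the type
key128_t (standing in for xmm_t) and the function name are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte key type standing in for xmm_t. */
typedef struct {
	uint8_t bytes[16];
} key128_t;

/*
 * Hypothetical helper: load 16 bytes from an address with any alignment.
 * memcpy() avoids dereferencing a misaligned vector pointer, so the same
 * source works on every architecture without #ifdef blocks.
 */
static inline key128_t
load128_unaligned(const void *p)
{
	key128_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

With such a helper, the read at offset 20 seen in the disassembly would be
written as load128_unaligned(base + 20) instead of a cast to xmm_t *. The
sketch does not address the vector element ordering and MASK concern raised
earlier in the thread.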