<snip>

>
> Thanks for your suggestions. We found that the -fno-tree-vectorize option
> works.
> PS: This option was not successfully added in our earliest test.
>
> Solution:
> 1. Use the -fno-tree-vectorize option to prevent the compiler from
>    generating auto-vectorized code, so that the slow path works fine.
> 2. Add a '-march=armv8-a+sve+crc' line to implementer_generic in
>    arm/meson.build:
>      'part_number_config': {
>          'generic': {'machine_args': ['-march=armv8-a+crc',
>                                       '-march=armv8-a+sve+crc',
>                                       '-moutline-atomics']}
>      }
>    If the compiler doesn't support '-march=armv8-a+sve+crc', it falls back
>    to '-march=armv8-a+crc'.
>    If the compiler supports '-march=armv8-a+sve+crc', it compiles the
>    SVE-related code, so the IO path can support SVE.
>
> Based on the above, we could achieve the initial target.

The 'generic' target is for generating a binary that works on all Armv8
machines. If you build with '-march=armv8-a+sve+crc', the IO path will not
work on non-SVE machines.
>
>
> On 2021/5/1 4:54, Honnappa Nagarahalli wrote:
> > <snip>
> >
> >>
> >> On Fri, Apr 30, 2021 at 5:27 PM fengchengwen
> >> <fengcheng...@huawei.com> wrote:
> >>>
> >>> Hi, ALL
> >>>   We have a question for your help:
> >>>   1. We have two platforms, both of which are ARM64, one of which supports
> >>>      both NEON and SVE, the other only support NEON.
> >>>   2. We want to run on both platforms with a single binary file, and use the
> >>>      highest vector capability of the corresponding platform whenever possible.
> >>
> >> I see VPP has a similar feature. IMO, it is not present in DPDK.
> >> Basically, In order to do this.
> >> - Compile slow-path code(90% of DPDK) with minimal CPU instruction
> >>   set support
> >> - Have fastpath function compile with different CPU instruction set
> >>   levels
> >> - In slowpath, Attach the fastpath function pointer-based on
> >>   CPU instruction-level support.
> > Agree.
> >
> >>
> >>
> >>>   3. So we build the DPDK program with -march=armv8-a+sve+crc (GCC 10.2).
> > This defines the minimum capabilities of the target machine.
> >
> >>>      However, it is found that invalid instructions occur when the program
> >>>      runs on a machine that does not support SVE (pls see below).
> >>>   4. The problem is caused by the introduction of SVE in GCC automatic vector
> >>>      optimization.
> >>>
> >>>   So Is there a way to disable GCC automatic vector optimization or use only
> >>>   NEON to perform automatic vector optimization?
> > I do not think this is safe. Once SVE is enabled, compiler is allowed to use
> > the SVE instructions wherever it finds it fit.
> >
> >>>
> >>>   BTW: we already test -fno-tree-vectorize (as link below) but found no effect.
> >>>
> >>>   https://stackoverflow.com/questions/7778174/how-can-i-disable-vectorization-while-using-gcc
> >>>
> >>>
> >>> The GDB output:
> >>>   EAL: Detected 128 lcore(s)
> >>>   EAL: Detected 4 NUMA nodes
> >>>   Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
> >>>
> >>>   Program received signal SIGILL, Illegal instruction.
> >>>   0x0000000000671b88 in eal_adjust_config ()
> >>>   (gdb)
> >>>   (gdb) where
> >>>   #0  0x0000000000671b88 in eal_adjust_config ()
> >>>   #1  0x0000000000682840 in rte_eal_init ()
> >>>   #2  0x000000000051c870 in main ()
> >>>   (gdb)
> >>>
> >>> The disassembly output of eal_adjust_config:
> >>>   671b7c:  f8237a81  str      x1, [x20, x3, lsl #3]
> >>>   671b80:  f110001f  cmp      x0, #0x400
> >>>   671b84:  54ffff21  b.ne     671b68 <eal_adjust_config+0x1f4>  // b.any
> >>>   671b88:  043357f5  addvl    x21, x19, #-1
> >>>   671b8c:  043457e1  addvl    x1, x20, #-1
> >>>   671b90:  910562b5  add      x21, x21, #0x158
> >>>   671b94:  04e0e3e0  cntd     x0
> >>>   671b98:  914012b5  add      x21, x21, #0x4, lsl #12
> >>>   671b9c:  52800218  mov      w24, #0x10                        // #16
> >>>   671ba0:  25d8e3e1  ptrue    p1.d
> >>>   671ba4:  25f80fe0  whilelo  p0.d, wzr, w24
> >>>   671ba8:  a5e04020  ld1d     {z0.d}, p0/z, [x1, x0, lsl #3]
> >>>
> >>>
> >>> Best regards.
> >>>
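
For reference, the function-pointer approach outlined in the quoted reply (slow path built for the minimal instruction set, fast path built at several levels, pointer attached at startup) might look roughly like the sketch below. This is only an illustration under assumptions, not DPDK code: the names sum_baseline, sum_sve_placeholder, fastpath_init and fastpath_sum are hypothetical, the CPU check uses plain Linux getauxval() rather than any DPDK API, and the "SVE" variant is a scalar stand-in so the sketch compiles on any machine. In a real build the SVE variant would presumably sit in its own source file compiled with +sve, while everything else stays at -march=armv8-a+crc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>            /* getauxval(), AT_HWCAP */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)    /* value from the Linux arm64 <asm/hwcap.h> */
#endif
#endif

/* Hypothetical fast-path signature used only for this sketch. */
typedef uint64_t (*sum_fn_t)(const uint64_t *v, size_t n);

/* Baseline variant: safe on every Armv8 machine. */
static uint64_t sum_baseline(const uint64_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/*
 * Stand-in for the SVE variant. Assumption: the real one would be built
 * in a separate translation unit with +sve. Here it is the same scalar
 * loop so the example stays portable.
 */
static uint64_t sum_sve_placeholder(const uint64_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Runtime check: does the running CPU (and kernel) expose SVE? */
static int cpu_has_sve(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0;                    /* not arm64 Linux: never pick SVE */
#endif
}

static sum_fn_t sum_impl;        /* set once by fastpath_init() */

/* Slow path: attach the fast-path pointer based on CPU support. */
void fastpath_init(void)
{
    sum_impl = cpu_has_sve() ? sum_sve_placeholder : sum_baseline;
}

/* Fast path entry point: one indirect call, no per-call CPU check. */
uint64_t fastpath_sum(const uint64_t *v, size_t n)
{
    assert(sum_impl != NULL);    /* fastpath_init() must run first */
    return sum_impl(v, n);
}
```

A caller would run fastpath_init() once during startup and then use fastpath_sum() on the data path. The key point is that only the file holding the SVE variant is allowed to contain SVE instructions, so code such as eal_adjust_config() never executes them on a NEON-only machine.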