On 2021/5/11 22:10, Honnappa Nagarahalli wrote:
> <snip>
>>>
>>>>
>>>> Thanks for your suggestions. We found that the -fno-tree-vectorize
>>>> option works.
>>>> PS: In our earliest test this option was not actually applied to the
>>>> build.
>>>>
>>>> Solution:
>>>> 1. Use the -fno-tree-vectorize option to prevent the compiler from
>>>>    generating auto-vectorized code, so that the slow-path works fine.
>>>> 2. Add a '-march=armv8-a+sve+crc' line to implementer_generic in
>>>>    arm/meson.build:
>>>>      'part_number_config': {
>>>>          'generic': {'machine_args': ['-march=armv8-a+crc',
>>>>                                       '-march=armv8-a+sve+crc',
>>>>                                       '-moutline-atomics']}
>>>>      }
>>>>    If the compiler doesn't support '-march=armv8-a+sve+crc', it falls
>>>>    back to '-march=armv8-a+crc'.
>>>>    If the compiler supports '-march=armv8-a+sve+crc', it compiles the
>>>>    SVE-related code, so the IO-path can support SVE.
>>>>
>>>> Based on the above, we can achieve our initial target.
>>> The 'generic' target is for generating a binary that would work on all
>>> Armv8 machines. If you are building with '-march=armv8-a+sve+crc', the
>>> IO-path would not work on non-SVE machines.
>>>
>>
>> The 'generic' target is only used in our local CI (note: both platforms
>> are ARMv8 machines).
>>
>> In the IO-path, we support NEON and SVE Rx/Tx. That code is written with
>> ACLE intrinsics, so it is not affected by the -fno-tree-vectorize option.
>>
>> If the compiler supports '-march=armv8-a+sve+crc', it compiles both the
>> NEON and the SVE related code.
> Using '-march=armv8-a+sve+crc' and '-fno-tree-vectorize' does not provide an
> absolute guarantee that the compiler will not use SVE elsewhere.
>
> The safest way to ensure that only specific functions use SVE is to compile
> without +sve (e.g. using -march=armv8-a) and use pragmas around the functions
> that are allowed to use SVE. Ex:
>
> #pragma GCC push_options
> #pragma GCC target ("+sve")
> void f(int *x) {
>     for (int i = 0; i < 100; ++i) x[i] = i;
> }
> #pragma GCC pop_options
> void g(int *x) {
>     for (int i = 0; i < 100; ++i) x[i] = i;
> }
>
> compiles f() using SVE and g() with standard options.
>
> You can also follow the function multiversioning discussed in the other
> thread.
>
Thanks for your suggestions.
Because the SVE code is organized by file, we use the following scheme in the
hns3 meson.build:
if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
    sources += files('hns3_rxtx_vec.c')
    # compile SVE when:
    # a. the minimum instruction set baseline already supports SVE
    # b. the baseline does not include SVE, but the compiler supports it
    if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
        cflags += ['-DCC_SVE_SUPPORT']
        sources += files('hns3_rxtx_vec_sve.c')
    elif cc.has_argument('-march=armv8.2-a+sve')
        cflags += ['-DCC_SVE_SUPPORT']
        hns3_sve_lib = static_library('hns3_sve_lib',
                'hns3_rxtx_vec_sve.c',
                dependencies: [static_rte_ethdev],
                include_directories: includes,
                c_args: [cflags, '-march=armv8.2-a+sve'])
        objs += hns3_sve_lib.extract_objects('hns3_rxtx_vec_sve.c')
    endif
endif
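
For reference, a minimal sketch of how the driver side could consume
CC_SVE_SUPPORT. This is an illustration under assumptions, not the actual hns3
code: rx_burst_fn, rx_burst_neon, rx_burst_sve and select_rx_burst are
hypothetical names, and in the real layout the SVE function would live in
hns3_rxtx_vec_sve.c so that only that file is built with +sve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical burst-function type; not the real hns3 prototype. */
typedef uint16_t (*rx_burst_fn)(void *rxq, void **pkts, uint16_t n);

/* Stand-in for the NEON receive path (always built on arm64). */
static uint16_t rx_burst_neon(void *rxq, void **pkts, uint16_t n)
{
    (void)rxq;
    (void)pkts;
    return n;
}

#ifdef CC_SVE_SUPPORT
/* Stand-in for the SVE receive path; only present when the build
 * system decided the compiler can produce SVE code. */
static uint16_t rx_burst_sve(void *rxq, void **pkts, uint16_t n)
{
    (void)rxq;
    (void)pkts;
    return n;
}
#endif

/* Slow-path selection: default to NEON, upgrade to SVE only when the
 * SVE object was built AND the running CPU reports SVE. */
static rx_burst_fn select_rx_burst(int cpu_has_sve)
{
    rx_burst_fn burst = rx_burst_neon;

#ifdef CC_SVE_SUPPORT
    if (cpu_has_sve)
        burst = rx_burst_sve;
#else
    (void)cpu_has_sve;
#endif
    assert(burst != NULL);
    return burst;
}
```

The point of the split is that select_rx_burst() itself is compiled with the
baseline -march, so it can run safely on a NEON-only machine.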
Ref:
https://patchwork.dpdk.org/project/dpdk/patch/1620808126-18876-3-git-send-email-fengcheng...@huawei.com/
Best regards.
>> At runtime, the driver detects whether the platform supports SVE; if not,
>> it selects NEON.
>>
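
A minimal sketch of such a runtime check, assuming Linux on AArch64 and the
getauxval() interface. cpu_has_sve is a hypothetical helper name; DPDK code
would more likely go through its own CPU-flag API. On any other platform the
sketch simply reports that SVE is absent.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22) /* arm64 Linux hwcap bit for SVE */
#endif
#endif

/* Returns 1 if the kernel reports SVE for this CPU, 0 otherwise. */
static int cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0; /* assumption: treat non-arm64 or non-Linux as "no SVE" */
#endif
}
```

This helper must itself be built without +sve, otherwise the check could
fault on the very machines it is meant to protect.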
>> Best regards.
>>
>>>>
>>>>
>>>> On 2021/5/1 4:54, Honnappa Nagarahalli wrote:
>>>>> <snip>
>>>>>
>>>>>>
>>>>>> On Fri, Apr 30, 2021 at 5:27 PM fengchengwen
>>>>>> <fengcheng...@huawei.com> wrote:
>>>>>>>
>>>>>>> Hi, ALL
>>>>>>> We have a question and would like your help:
>>>>>>> 1. We have two platforms, both of which are ARM64. One supports
>>>>>>>    both NEON and SVE; the other only supports NEON.
>>>>>>> 2. We want to run on both platforms with a single binary file, and
>>>>>>>    use the highest vector capability of the corresponding platform
>>>>>>>    whenever possible.
>>>>>>
>>>>>> I see VPP has a similar feature. IMO, it is not present in DPDK.
>>>>>> Basically, in order to do this:
>>>>>> - Compile slow-path code (90% of DPDK) with minimal CPU instruction
>>>>>>   set support.
>>>>>> - Have fast-path functions compiled with different CPU instruction
>>>>>>   set levels.
>>>>>> - In the slow-path, attach the fast-path function pointer based on
>>>>>>   the CPU instruction-level support.
>>>>> Agree.
>>>>>
>>>>>>
>>>>>>
>>>>>>> 3. So we build the DPDK program with -march=armv8-a+sve+crc
>>>>>>>    (GCC 10.2).
>>>>> This defines the minimum capabilities of the target machine.
>>>>>
>>>>>>> However, we found that illegal instructions occur when the program
>>>>>>> runs on a machine that does not support SVE (please see below).
>>>>>>> 4. The problem is caused by GCC's automatic vectorization
>>>>>>>    introducing SVE instructions.
>>>>>>>
>>>>>>> So is there a way to disable GCC automatic vectorization, or to use
>>>>>>> only NEON for automatic vectorization?
>>>>> I do not think this is safe. Once SVE is enabled, the compiler is
>>>>> allowed to use SVE instructions wherever it sees fit.
>>>>>
>>>>>>>
>>>>>>> BTW: we already tested -fno-tree-vectorize (see the link below) but
>>>>>>> found no effect.
>>>>>>>
>>>>>>> https://stackoverflow.com/questions/7778174/how-can-i-disable-vectorization-while-using-gcc
>>>>>>>
>>>>>>>
>>>>>>> The GDB output:
>>>>>>> EAL: Detected 128 lcore(s)
>>>>>>> EAL: Detected 4 NUMA nodes
>>>>>>> Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
>>>>>>>
>>>>>>> Program received signal SIGILL, Illegal instruction.
>>>>>>> 0x0000000000671b88 in eal_adjust_config ()
>>>>>>> (gdb)
>>>>>>> (gdb) where
>>>>>>> #0 0x0000000000671b88 in eal_adjust_config ()
>>>>>>> #1 0x0000000000682840 in rte_eal_init ()
>>>>>>> #2 0x000000000051c870 in main ()
>>>>>>> (gdb)
>>>>>>>
>>>>>>> The disassembly output of eal_adjust_config:
>>>>>>> 671b7c: f8237a81 str x1, [x20, x3, lsl #3]
>>>>>>> 671b80: f110001f cmp x0, #0x400
>>>>>>> 671b84: 54ffff21 b.ne 671b68 <eal_adjust_config+0x1f4> // b.any
>>>>>>> 671b88: 043357f5 addvl x21, x19, #-1
>>>>>>> 671b8c: 043457e1 addvl x1, x20, #-1
>>>>>>> 671b90: 910562b5 add x21, x21, #0x158
>>>>>>> 671b94: 04e0e3e0 cntd x0
>>>>>>> 671b98: 914012b5 add x21, x21, #0x4, lsl #12
>>>>>>> 671b9c: 52800218 mov w24, #0x10 // #16
>>>>>>> 671ba0: 25d8e3e1 ptrue p1.d
>>>>>>> 671ba4: 25f80fe0 whilelo p0.d, wzr, w24
>>>>>>> 671ba8: a5e04020 ld1d {z0.d}, p0/z, [x1, x0, lsl #3]
>>>>>>>
>>>>>>>
>>>>>>> Best regards.
>>>>>>>
>>>
>