Hi Damjan,
Sorry for the late reply. It took me some time to investigate writing SVE/SVE2 
code for a fixed vector register size.

I understand your concerns about the separate code paths introduced by scalable 
vector code, which differs considerably from the existing SIMD code using NEON, 
AVX2 and AVX512.

The patch https://gerrit.fd.io/r/c/vpp/+/29943/2 completely rewrote the 
ethernet-input node, which makes the code hard to maintain and is probably the 
cause of your concern.

Maybe I should limit the use of SVE/SVE2 to small code segments. Could you 
take a look at patches https://gerrit.fd.io/r/c/vpp/+/29942/2 and 
https://gerrit.fd.io/r/c/vpp/+/30326? Both deploy SVE in the function 
is_dmac_bad_x4().
The former uses scalable types that work for all possible SVE vector register 
sizes, while the latter is written for the 256-bit SVE register size only.

Scalable code works for all possible SVE vector register sizes, while with the 
fixed coding style we have to provide separate code for each possible SVE 
register size.
Another benefit of scalable coding is that no tail loop is required, which 
saves CPU cycles.
Coding for a fixed SVE vector register size loses both of these benefits.
Please let us know your decision or suggestions.
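
As a rough illustration of the scalable style described above (a sketch only, 
not code from either patch): the helper name dmac_bad_flags and its signature 
are made up for this mail, and the real is_dmac_bad_x4() differs in details. 
The SVE branch uses ACLE intrinsics from arm_sve.h; the plain C branch is only 
there so the snippet also builds on targets without SVE.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper, not taken from the VPP patches: set bad[i] to 1 when
 * dmacs[i] differs from if_dmac, else to 0. */
static void
dmac_bad_flags (const uint64_t *dmacs, uint64_t if_dmac, uint8_t *bad,
                size_t n)
{
#if defined(__ARM_FEATURE_SVE)
  /* svcntd() is the number of 64-bit lanes per vector, known only at run
   * time, so the same binary works for any SVE register size. */
  for (uint64_t i = 0; i < n; i += svcntd ())
    {
      /* Lanes at or beyond n are inactive, so no scalar tail loop is needed. */
      svbool_t pg = svwhilelt_b64_u64 (i, (uint64_t) n);
      svuint64_t v = svld1_u64 (pg, dmacs + i);
      svbool_t ne = svcmpne_n_u64 (pg, v, if_dmac);
      /* 1 in mismatching lanes, 0 elsewhere; store the low byte of each lane. */
      svst1b_u64 (pg, bad + i, svdup_n_u64_z (ne, 1));
    }
#else
  /* Portable fallback so the sketch also compiles on non-SVE targets. */
  for (size_t i = 0; i < n; i++)
    bad[i] = dmacs[i] != if_dmac;
#endif
}
```

A fixed-size version would instead hard-code the lane count (for example 4 
lanes for 256-bit registers) and need a separate loop for any leftover 
elements.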

For people who have no access to SVE/SVE2 hardware, a software emulator can be 
set up with the steps below.
[1] Install Arm QEMU/Docker on x86 servers to verify SVE/SVE2 code
# Install the qemu packages
sudo apt-get install qemu binfmt-support qemu-user-static
# Run the registration scripts
sudo docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
# Test the emulation environment; this should print aarch64
sudo docker run --rm -t arm64v8/ubuntu uname -m
# Build a test program with SVE2 enabled
gcc-10 -march=armv8.3-a+crc+crypto+sve2 main.c
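
The main.c in the last command is not part of the patches. As a minimal 
sketch, assuming the compiler defines __ARM_FEATURE_SVE when SVE is enabled, a 
helper like the one below (the name sve_vector_bits is made up here) reports 
the vector register size the emulator provides and can be called from main():

```c
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: SVE register size in bits, or 0 when the file is
 * built without SVE support. */
static uint64_t
sve_vector_bits (void)
{
#if defined(__ARM_FEATURE_SVE)
  return svcntb () * 8; /* svcntb() = bytes per SVE vector */
#else
  return 0;
#endif
}
```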
Thanks.
From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Damjan Marion via 
lists.fd.io
Sent: 17 November 2020 20:15
To: Lijian Zhang <lijian.zh...@arm.com>
Cc: nd <n...@arm.com>; Nitin Saxena <nsax...@marvell.com>; Govindarajan 
Mohandoss <govindarajan.mohand...@arm.com>; Honnappa Nagarahalli 
<honnappa.nagaraha...@arm.com>; Jieqiang Wang <jieqiang.w...@arm.com>; vpp-dev 
<vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] SVE/SVE2 based vectorization optimization


Hi Lijian,

I looked at your patches and I’m quite concerned about this approach, as you 
basically wrote a completely different code path for the feature.
I don't see how we can maintain such code easily, especially because today we 
don't have ARM hardware which can run that code.
If we merge that code, two things can happen:
a) without testing - that code will fall out of sync quickly
b) with testing - people will not be able to modify existing code without 
also updating the SVE code, and that may be a problem if they don't have access 
to hardware

The majority of the code we have always deals with a fixed vector size.
Vector size is mainly a human decision in VPP code and it takes into account 
many factors including the size and locality of the data.
So it makes more sense to me that we provide SVE based VEC256 and VEC512 
functions which will make existing code just work on arm instead of trying 
to implement and maintain separate code paths…

—
Damjan



> On 16.11.2020., at 06:10, Lijian Zhang <lijian.zh...@arm.com> wrote:
>
> Hi Damjan,
> I applied SVE based vectorization in the ethernet-input node functions.
> Could you please take time to review the patches below?
>
> The patches are submitted as a proposal for your comments.
> I have verified the functionality of the code on a software emulation 
> platform, and will do performance benchmarking when CPUs with the SVE feature 
> are available.
>
> https://gerrit.fd.io/r/c/vpp/+/29939 vppinfra: apply SVE/SVE2 based vectorization [NEW]
> https://gerrit.fd.io/r/c/vpp/+/29940 ethernet: determine next[] node using SVE [NEW]
> https://gerrit.fd.io/r/c/vpp/+/29941 ethernet: secondary DMAC check using SVE [NEW]
> https://gerrit.fd.io/r/c/vpp/+/29942 ethernet: DMAC check using SVE [NEW]
> https://gerrit.fd.io/r/c/vpp/+/29943 ethernet: DMAC/ethertype parse using SVE [NEW]
> https://gerrit.fd.io/r/c/vpp/+/29944 vlib: SVE based vlib_buffer operations [NEW]
>
> Thanks.
>
>> -----Original Message-----
>> From: Damjan Marion <dmar...@me.com>
>> Sent: 22 October 2020 20:33
>> To: Lijian Zhang <lijian.zh...@arm.com>
>> Cc: nd <n...@arm.com>; Nitin Saxena <nsax...@marvell.com>; Govindarajan
>> Mohandoss <govindarajan.mohand...@arm.com>; Honnappa Nagarahalli
>> <honnappa.nagaraha...@arm.com>; Jieqiang Wang <jieqiang.w...@arm.com>;
>> vpp-dev <vpp-dev@lists.fd.io>
>> Subject: Re: SVE/SVE2 based vectorization optimization
>>
>>
>> Dear Lijian,
>>
>> You took a very uncommon example of vector usage in the VPP codebase.
>> Common usage is a big packet processing loop which deals with 2, 4 or 8
>> packets in one iteration.
>>
>> E.g. how will we leverage SVE in src/vnet/ethernet/node.c?
>>
>> Thanks,
>>
>> —
>> Damjan
>>
>>
>>
>>> On 22.10.2020., at 14:08, Lijian Zhang <lijian.zh...@arm.com> wrote:
>>>
>>> Hi Damjan,
>>> I committed a patch (https://gerrit.fd.io/r/c/vpp/+/28986) to apply
>>> SVE/SVE2 based vectorization in VPP.
>>> The patch works as a demo, calling for comments from the VPP community.
>>> Could you please review the patch?
>>> If the idea in this proposal is agreed, we will find more opportunities to
>>> deploy SVE based vectorization in VPP.
>>>
>>> If necessary, we can explain the proposed patch to you in the next
>>> VPP/Aarch64 meeting.
>>>
>>> Some SVE/SVE2 references:
>>> 1. ACLE (Arm C Language Extension) for SVE
>>>    https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_00_en.pdf
>>> 2. ARM Compiler Scalable Vector Extension User Guide
>>>    https://developer.arm.com/documentation/100891/0607/coding-considerations/using-sve-intrinsics-directly-in-your-c-code
>>> 3. The Scalable Vector Extension (SVE), Architecture Reference Manual
>>>    Supplement, for ARMv8-A
>>>    https://static.docs.arm.com/ddi0584/a/DDI0584A_a_SVE_supp_armv8A.pdf
>>>
>>>
>>> Thank you.

