Hi Damjan,

Sorry for the late reply. It took me some time to investigate writing SVE/SVE2 code for a fixed vector register size.

I fully understand your concern about the separate code path introduced by scalable vector code, which is quite different from the existing SIMD code using NEON, AVX2 and AVX-512. The patch https://gerrit.fd.io/r/c/vpp/+/29943/2 completely rewrote the ethernet-input node, which makes the code look hard to maintain and is probably what concerns you. Maybe I should limit the use of SVE/SVE2 to small code segments.

Could you take a look at patches https://gerrit.fd.io/r/c/vpp/+/29942/2 and https://gerrit.fd.io/r/c/vpp/+/30326? Both deploy SVE in the function is_dmac_bad_x4(). The former uses scalable types and works for every possible SVE vector register size; the latter is written for a 256-bit SVE register size only. Scalable code works for all SVE vector register sizes, while in the fixed coding style we have to provide separate code for each possible SVE register size. Another benefit of scalable code is that no tail loop is required, which saves CPU cycles. Coding for a fixed SVE vector register size loses both of these benefits.

Please let us know your decision or suggestions.

People without access to SVE/SVE2 hardware can use a software emulator, set up with the steps below.

[1] Install Arm QEMU/Docker on x86 servers to verify SVE/SVE2 code

    # Install the QEMU packages
    sudo apt-get install qemu binfmt-support qemu-user-static
    # Execute the registering scripts
    sudo docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
    # Test the emulation environment; this should print "aarch64"
    sudo docker run --rm -t arm64v8/ubuntu uname -m
    # Compile SVE2 code
    gcc-10 -march=armv8.3-a+crc+crypto+sve2 main.c

Thanks.
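To illustrate the "no tail loop" benefit of scalable code, here is a minimal sketch of a vector-length-agnostic loop that could be compiled with the gcc-10 command above and called from a main() of your own under the emulator. It assumes a GCC with ACLE support (arm_sve.h); add_u32 and sve_vector_bits are hypothetical names for illustration, not code from the patches. With SVE enabled, svcntw() gives the number of 32-bit lanes at run time and the svwhilelt predicate switches off lanes past n, so the last partial vector goes through the same loop body. Without SVE, a plain scalar loop is compiled instead.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* SVE vector length in bits, or 0 when built without SVE support. */
static unsigned
sve_vector_bits (void)
{
#if defined(__ARM_FEATURE_SVE)
  /* svcntb() returns the number of bytes in one SVE register. */
  return (unsigned) (svcntb () * 8);
#else
  return 0;
#endif
}

/* dst[i] = a[i] + b[i] for 0 <= i < n (hypothetical example function). */
static void
add_u32 (uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
#if defined(__ARM_FEATURE_SVE)
  /* Vector-length-agnostic loop: svcntw() is the number of 32-bit lanes
     per register, known only at run time. svwhilelt_b32_u64(i, n) enables
     only lanes whose index is below n, so the last, partial vector is
     handled by the same body and no scalar tail loop is needed. Inactive
     lanes of the predicated loads/stores do not touch memory. */
  for (size_t i = 0; i < n; i += svcntw ())
    {
      svbool_t pg = svwhilelt_b32_u64 ((uint64_t) i, (uint64_t) n);
      svuint32_t va = svld1_u32 (pg, a + i);
      svuint32_t vb = svld1_u32 (pg, b + i);
      svst1_u32 (pg, dst + i, svadd_u32_x (pg, va, vb));
    }
#else
  /* Scalar fallback when SVE is not enabled at compile time. */
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] + b[i];
#endif
}
```

A fixed-size version written for 256-bit registers would instead step by 8 lanes and need a separate remainder loop for the last n % 8 elements, which is the trade-off described above.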
From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Damjan Marion via lists.fd.io
Sent: 17 November 2020 20:15
To: Lijian Zhang <lijian.zh...@arm.com>
Cc: nd <n...@arm.com>; Nitin Saxena <nsax...@marvell.com>; Govindarajan Mohandoss <govindarajan.mohand...@arm.com>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Jieqiang Wang <jieqiang.w...@arm.com>; vpp-dev <vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] SVE/SVE2 based vectorization optimization

Hi Lijian,

I looked at your patches and I'm quite concerned about this approach, as you basically wrote a completely different code path for the feature. I don't see how we can maintain such code easily, especially because today we don't have ARM hardware which can run that code.

If we merge that code, two things can happen:
a) without testing - that code will fall out of sync quickly
b) with testing - people will not be able to modify existing code without also updating the SVE code, and that may be a problem if they don't have access to hardware

The majority of the code we have always deals with a fixed vector size. Vector size is mainly a human decision in VPP code, and it takes into account many factors, including the size and locality of the data. So it makes more sense to me that we provide SVE-based VEC256 and VEC512 functions, which will make existing code just work on Arm, instead of trying to implement and maintain separate code paths…

—
Damjan

> On 16.11.2020., at 06:10, Lijian Zhang <lijian.zh...@arm.com> wrote:
>
> Hi Damjan,
> I applied SVE based vectorization in the ethernet-input node functions.
> Could you please take the time to review the patches below?
>
> The patches are submitted as a proposal for your comments.
> I have verified the functionality of the code on a software emulation platform,
> and will do performance benchmarking when CPUs with the SVE feature are available.
>
> https://gerrit.fd.io/r/c/vpp/+/29939 vppinfra: apply SVE/SVE2 based vectorization [NEW]
> https://gerrit.fd.io/r/c/vpp/+/29940 ethernet: determine next[] node using SVE [NEW]
> https://gerrit.fd.io/r/c/vpp/+/29941 ethernet: secondary DMAC check using SVE [NEW]
> https://gerrit.fd.io/r/c/vpp/+/29942 ethernet: DMAC check using SVE [NEW]
> https://gerrit.fd.io/r/c/vpp/+/29943 ethernet: DMAC/ethertype parse using SVE [NEW]
> https://gerrit.fd.io/r/c/vpp/+/29944 vlib: SVE based vlib_buffer operations [NEW]
>
> Thanks.
>
>> -----Original Message-----
>> From: Damjan Marion <dmar...@me.com>
>> Sent: 22 October 2020 20:33
>> To: Lijian Zhang <lijian.zh...@arm.com>
>> Cc: nd <n...@arm.com>; Nitin Saxena <nsax...@marvell.com>; Govindarajan Mohandoss <govindarajan.mohand...@arm.com>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Jieqiang Wang <jieqiang.w...@arm.com>; vpp-dev <vpp-dev@lists.fd.io>
>> Subject: Re: SVE/SVE2 based vectorization optimization
>>
>> Dear Lijian,
>>
>> You took a very uncommon example of vector usage in the VPP codebase.
>> The common usage is a big packet-processing loop which deals with 2, 4 or 8
>> packets in one iteration.
>>
>> I.e., how will we leverage SVE in src/vnet/ethernet/node.c?
>>
>> Thanks,
>>
>> —
>> Damjan
>>
>>> On 22.10.2020., at 14:08, Lijian Zhang <lijian.zh...@arm.com> wrote:
>>>
>>> Hi Damjan,
>>> I committed a patch (https://gerrit.fd.io/r/c/vpp/+/28986) to apply SVE/SVE2 based vectorization in VPP.
>>> The patch works as a demo, calling for comments from the VPP community.
>>> Could you please review the patch?
>>> If the idea in this proposal is agreed, we will find more opportunities to deploy SVE based vectorization in VPP.
>>>
>>> If necessary, we can explain the proposal patch to you in the next VPP/AArch64 meeting.
>>>
>>> Some SVE/SVE2 references:
>>> 1. ACLE (Arm C Language Extensions) for SVE: https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_00_en.pdf
>>> 2. ARM Compiler Scalable Vector Extension User Guide: https://developer.arm.com/documentation/100891/0607/coding-considerations/using-sve-intrinsics-directly-in-your-c-code
>>> 3. The Scalable Vector Extension (SVE), Architecture Reference Manual Supplement, for ARMv8-A: https://static.docs.arm.com/ddi0584/a/DDI0584A_a_SVE_supp_armv8A.pdf
>>>
>>> Thank you.
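Damjan's two points, that the common VPP pattern is a loop handling 2, 4 or 8 packets per iteration and that fixed-width VEC256/VEC512 functions would let existing code "just work", can be illustrated together. The sketch below is not VPP code and uses no SVE intrinsics: u64x4, mac_to_u64, dmac_bad_x4 and count_bad_dmac are hypothetical names, and the 256-bit type comes from the generic GCC/Clang vector extension. In the spirit of is_dmac_bad_x4(), it compares four destination MACs with one XOR on a 4-lane vector, then handles the remaining 0 to 3 packets one at a time.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-width 256-bit type (4 x u64 lanes) built with the
   GCC/Clang vector extension; not the actual vppinfra definition. */
typedef uint64_t u64x4 __attribute__ ((vector_size (32)));

/* Copy a 6-byte MAC into a zeroed u64, so two MACs are equal exactly when
   their u64 values are equal (independent of byte order). */
static inline uint64_t
mac_to_u64 (const uint8_t *mac)
{
  uint64_t v = 0;
  memcpy (&v, mac, 6);
  return v;
}

/* Bit i of the result is set when frame i's destination MAC (its first six
   bytes) differs from hwaddr. One XOR on the 4-lane vector compares all
   four DMACs; a nonzero lane means a mismatch. */
static inline unsigned
dmac_bad_x4 (const uint8_t *f0, const uint8_t *f1, const uint8_t *f2,
	     const uint8_t *f3, uint64_t hwaddr)
{
  u64x4 dmac = { mac_to_u64 (f0), mac_to_u64 (f1), mac_to_u64 (f2),
		 mac_to_u64 (f3) };
  u64x4 hw = { hwaddr, hwaddr, hwaddr, hwaddr };
  u64x4 diff = dmac ^ hw;
  unsigned mask = 0;
  for (int i = 0; i < 4; i++)
    mask |= (unsigned) (diff[i] != 0) << i;
  return mask;
}

/* Count mismatching frames: four per iteration, then one at a time for the
   remaining 0-3 frames (the tail loop). */
static unsigned
count_bad_dmac (const uint8_t *const *frames, unsigned n_left,
		uint64_t hwaddr)
{
  unsigned n_bad = 0;
  while (n_left >= 4)
    {
      n_bad += (unsigned) __builtin_popcount (
	dmac_bad_x4 (frames[0], frames[1], frames[2], frames[3], hwaddr));
      frames += 4;
      n_left -= 4;
    }
  while (n_left > 0)
    {
      n_bad += mac_to_u64 (frames[0]) != hwaddr;
      frames += 1;
      n_left -= 1;
    }
  return n_bad;
}
```

The single-packet loop at the end is the kind of tail loop that a vector-length-agnostic SVE loop could avoid; the fixed-width version keeps the same structure on every architecture, which is the maintainability argument made above. An SVE backend for a type like u64x4 would, in principle, keep this code unchanged.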