On Fri, 06 Nov 2020, Nick Desaulniers <ndesaulni...@google.com>
wrote:
On Fri, Nov 6, 2020 at 3:50 AM Adrian Ratiu
<adrian.ra...@collabora.com> wrote:
Hi Nathan,
On Fri, 06 Nov 2020, Nathan Chancellor
<natechancel...@gmail.com> wrote:
> + Ard, who wrote this code.
>
> On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote:
>> Due to a Clang bug [1], NEON auto-loop vectorization either does
>> not happen or produces no gains, and given previous experience
>> with GCC, which generated unoptimized code worse than the default
>> asm implementation, it is safer to default Clang builds to the
>> known-good generic implementation. The kernel currently supports
>> a minimum Clang version of v10.0.1, see commit 1f7a44f63e6c
>> ("compiler-clang: add build check for clang 10.0.1"). When the
>> bug eventually gets fixed, this commit could be reverted or, if
>> the minimum Clang version bump takes a long time, a warning could
>> be added asking users to upgrade their compilers, as was done for
>> GCC.
>>
>> [1] https://bugs.llvm.org/show_bug.cgi?id=40976
>>
>> Signed-off-by: Adrian Ratiu <adrian.ra...@collabora.com>
>
> Thank you for the patch! We are also tracking this here:
>
> https://github.com/ClangBuiltLinux/linux/issues/496
>
> It was on my TODO list to revisit getting the warning eliminated,
> which likely would have involved a patch like this as well.
>
> I am curious whether it is worth revisiting or dusting off Arnd's
> patch in the LLVM bug tracker first. I have not tried it
> personally. If that is not a worthwhile option, I am fine with
> this for now. It would be nice to try and get a fix pinned down
> on the LLVM side at some point but alas, finite amount of
> resources and people :(
I tested Arnd's kernel patch from the LLVM bug tracker [1], but
with Clang v10.0.1 I still get warnings like the following, even
though the __restrict workaround does affect the generated
instructions:
./include/asm-generic/xor.h:15:2: remark: the cost-model indicates
    that interleaving is not beneficial [-Rpass-missed=loop-vectorize]
./include/asm-generic/xor.h:11:1: remark: List vectorization was
    possible but not beneficial with cost 0 >= 0
    [-Rpass-missed=slp-vectorizer]
xor_8regs_2(unsigned long bytes, unsigned long *__restrict p1,
            unsigned long *__restrict p2)
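For reference, Arnd's workaround boils down to adding __restrict
qualifiers so the compiler may assume the buffers never alias; a
minimal sketch of one routine (reduced from
include/asm-generic/xor.h, the real patch covers all xor_8regs_*
variants):

```
/* Sketch of the __restrict workaround, reduced to one routine.
 * __restrict promises the compiler that p1 and p2 never alias,
 * which is what normally blocks auto-vectorization of this loop. */
static void
xor_8regs_2(unsigned long bytes, unsigned long *__restrict p1,
	    unsigned long *__restrict p2)
{
	long lines = bytes / (sizeof (long)) / 8;

	do {
		p1[0] ^= p2[0];
		p1[1] ^= p2[1];
		p1[2] ^= p2[2];
		p1[3] ^= p2[3];
		p1[4] ^= p2[4];
		p1[5] ^= p2[5];
		p1[6] ^= p2[6];
		p1[7] ^= p2[7];
		p1 += 8;
		p2 += 8;
	} while (--lines > 0);
}
```

The remarks quoted above were presumably produced by building with
-Rpass-missed=loop-vectorize and -Rpass-missed=slp-vectorizer (the
flags named in the brackets), which make Clang report which loops it
considered but declined to vectorize, and why.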
If it's just a matter of overruling the cost model, then
#pragma clang loop vectorize(enable) will do the trick.
Indeed,

```
diff --git a/include/asm-generic/xor.h b/include/asm-generic/xor.h
index b62a2a56a4d4..8796955498b7 100644
--- a/include/asm-generic/xor.h
+++ b/include/asm-generic/xor.h
@@ -12,6 +12,7 @@ xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
 {
 	long lines = bytes / (sizeof (long)) / 8;
 
+#pragma clang loop vectorize(enable)
 	do {
 		p1[0] ^= p2[0];
 		p1[1] ^= p2[1];
@@ -32,6 +33,7 @@ xor_8regs_3(unsigned long bytes, unsigned long *p1, unsigned long *p2,
 {
 	long lines = bytes / (sizeof (long)) / 8;
 
+#pragma clang loop vectorize(enable)
 	do {
 		p1[0] ^= p2[0] ^ p3[0];
 		p1[1] ^= p2[1] ^ p3[1];
@@ -53,6 +55,7 @@ xor_8regs_4(unsigned long bytes, unsigned long *p1, unsigned long *p2,
 {
 	long lines = bytes / (sizeof (long)) / 8;
 
+#pragma clang loop vectorize(enable)
 	do {
 		p1[0] ^= p2[0] ^ p3[0] ^ p4[0];
 		p1[1] ^= p2[1] ^ p3[1] ^ p4[1];
@@ -75,6 +78,7 @@ xor_8regs_5(unsigned long bytes, unsigned long *p1, unsigned long *p2,
 {
 	long lines = bytes / (sizeof (long)) / 8;
 
+#pragma clang loop vectorize(enable)
 	do {
 		p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0];
 		p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1];
```

seems to generate the vectorized code.
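(A quick way to double-check the result; the command lines below are
an illustrative sketch, not from this thread, so adjust the target
triple, flags, and file names to the actual build:)

```
$ clang --target=arm-linux-gnueabihf -mfpu=neon -O2 \
        -Rpass=loop-vectorize -c xor_test.c
$ llvm-objdump -d xor_test.o | grep veor
```

-Rpass=loop-vectorize is the positive counterpart of -Rpass-missed
and prints a remark for every loop Clang did vectorize; veor is the
NEON exclusive-or instruction the vectorized loops should compile
down to.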
Why don't we find a way to make those pragmas more
toolchain-portable, rather than open coding them like I have done
above, instead of taking this series?
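(One possible shape for such a portability wrapper, as a sketch;
VECTORIZE_LOOP is a hypothetical name, not an existing kernel macro,
and GCC has no exact equivalent of vectorize(enable), so the GCC
branch uses ivdep, which merely asserts the loop has no loop-carried
dependencies:)

```
/* Hypothetical sketch of a toolchain-portable vectorization hint.
 * VECTORIZE_LOOP is an invented name; _Pragma() lets a macro expand
 * to a #pragma directive placed immediately before the loop. */
#ifdef __clang__
/* Overrule Clang's loop-vectorizer cost model. */
#define VECTORIZE_LOOP	_Pragma("clang loop vectorize(enable)")
#elif defined(__GNUC__)
/* GCC has no direct equivalent; ivdep asserts there are no
 * loop-carried dependencies, which also unblocks the vectorizer. */
#define VECTORIZE_LOOP	_Pragma("GCC ivdep")
#else
#define VECTORIZE_LOOP
#endif

static void
xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
{
	long lines = bytes / (sizeof (long)) / 8;

	VECTORIZE_LOOP
	do {
		p1[0] ^= p2[0];
		p1[1] ^= p2[1];
		p1[2] ^= p2[2];
		p1[3] ^= p2[3];
		p1[4] ^= p2[4];
		p1[5] ^= p2[5];
		p1[6] ^= p2[6];
		p1[7] ^= p2[7];
		p1 += 8;
		p2 += 8;
	} while (--lines > 0);
}
```

The catch is that the two pragmas are not semantically identical, so
each compiler's output would still need to be benchmarked separately.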
Hi Nick,
Thank you very much for the suggestion.
I agree. If a toolchain-portable way can be found to reliably
trigger the optimization, I will gladly replace this patch. :)
I will work on it starting Monday and then report back my findings
or, if I can get it to work in a satisfying manner, send a v2
series directly.
The first patch is still needed because it's more of a general
cleanup, as Nathan correctly observed.
Regards,
Adrian