On 29.08.2024 00:03, Andrew Cooper wrote:
> It has existed in x86 CPUs since 2008, so we're only 16 years late adding
> support.  With all the other scafolding in place, implement arch_hweightl()
> for x86.
> 
> The only complication is that the call to arch_generic_hweightl() is behind
> the compilers back.  Address this by writing it in ASM and ensure that it
> preserves all registers.
> 
> Copy the code generation from generic_hweightl().  It's not a complicated
> algorithm, and is easy to regenerate if needs be, but cover it with the same
> unit tests as test_generic_hweightl() just for piece of mind.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> ---
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Roger Pau Monné <roger....@citrix.com>
> 
> v2:
>  * Fix MISRA 8.2 (parameter name) and 8.5 (single declaration) regressions.
>  * Rename {arch->x86}-generic-hweightl.{S->c}
>  * Adjust ASM formating
> 
> The __constructor trick to cause any reference to $foo() to pull in
> test_$foo() only works when both are in the same TU.  i.e. what I did in
> v1 (putting test_arch_generic_hweightl() in the regular generic-hweightl.c)
> didn't work.

I'm afraid I don't understand this. What exactly didn't work, breaking in which
way? Presumably as much as you, I don't really like the global asm() in a C
file, when ideally the same could be written with less clutter in an assembly
one.

> This in turn means that arch_generic_hweightl() needs writing in a global asm
> block, and also that we can't use FUNC()/END().  While we could adjust it to
> work for GCC/binutils, we can't have CPP macros in Clang-IAS strings.

What does Clang different from gcc there? I was hoping that at least their pre-
processors would work in (sufficiently) similar ways.

> --- /dev/null
> +++ b/xen/lib/x86-generic-hweightl.c
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <xen/bitops.h>
> +#include <xen/init.h>
> +#include <xen/self-tests.h>
> +
> +/*
> + * An implementation of generic_hweightl() used on hardware without the 
> POPCNT
> + * instruction.
> + *
> + * This function is called from within an ALTERNATIVE in arch_hweightl().
> + * i.e. behind the back of the compiler.  Therefore all registers are callee
> + * preserved.
> + *
> + * The ASM is what GCC-12 emits for generic_hweightl() in a release build of
> + * Xen, with spilling of %rdi/%rdx to preserve the callers registers.
> + *
> + * Note: When we can use __attribute__((no_caller_saved_registers))
> + *       unconditionally (GCC 7, Clang 5), we can implement this in plain C.
> + */
> +asm (
> +    ".type arch_generic_hweightl, STT_FUNC\n\t"
> +    ".globl arch_generic_hweightl\n\t"
> +    ".hidden arch_generic_hweightl\n\t"
> +    ".balign " STR(CONFIG_FUNCTION_ALIGNMENT) ", 0x90\n\t"

Maybe better avoid open-coding CODE_FILL, in case we want to change that
down the road?

Also could I talk you into dropping the \t there? Canonical assembly code
wants ...

> +    "arch_generic_hweightl:\n\t"

.. labels unindented.

Jan

Reply via email to