On Tue, 1 Jul 2025, Thirumalai Nagalingam wrote:

> Hi all,
>
> This patch fixes existing issues in my earlier commit
> [https://github.com/cygwin/cygwin/commit/f4ba145056dbe99adf4dbe632bec035e006539f8]
> and optimizes the AArch64 thread startup sequence by eliminating the use
> of register x19 and streamlining register usage. The key modifications
> are detailed in the patch's commit description. These changes improve
> register efficiency while ensuring correct thread argument in register
> `x0` after virtual free call, preventing from any segmentation faults.
> The patch has been tested in our internal AArch64 environment where
> pthread related test cases are now passing as expected.
>
> Inlined Patch:
>
> From e197e39452e542d18812f41ac2a3af2fa172b273 Mon Sep 17 00:00:00 2001
> From: Thirumalai Nagalingam <[email protected]>
> Date: Tue, 1 Jul 2025 14:46:52 +0530
> Subject: [PATCH] Aarch64: Optimize pthread_wrapper by eliminating x19 and
>  streamlining register usage
>
> - Removed use of x19 by directly loading the thread func and arg using
> LDP from [WRAPPER_ARG], freeing up one additional register
> - Loaded thread function and argument into x20 and x21 before
> VirtualFree to preserve their values across the virtual free call
> - Used x1 as a temporary register to load stack base, subtract CYGTLS,
> and update SP
> - Moved thread argument back into x0 after VirtualFree and before
> calling the thread function
>

So the problem was that the registers used before were ones not required
to be preserved across function calls in the ABI?  Or was it that
wrapper_arg was on the now-freed stack so could not be accessed after the
VirtualFree?  Pleas include that in your commit message.  Also, please
wrap your commit message at 72 characters, prefix the subject/first line
"Cygwin: ", and include the trailer

Fixes: f4ba145056db ("Aarch64: Add inline assembly pthread wrapper")
Signed-off-by: Thirumalai Nagalingam 
<[email protected]>


> ---
>  winsup/cygwin/create_posix_thread.cc | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/winsup/cygwin/create_posix_thread.cc 
> b/winsup/cygwin/create_posix_thread.cc
> index 592aaf1a5..17bb607f7 100644
> --- a/winsup/cygwin/create_posix_thread.cc
> +++ b/winsup/cygwin/create_posix_thread.cc
> @@ -103,18 +103,19 @@ pthread_wrapper (PVOID arg)
>    /* Sets up a new thread stack, frees the original OS stack,
>     * and calls the thread function with its arg using AArch64 ABI. */
>    __asm__ __volatile__ ("\n\
> -     mov     x19, %[WRAPPER_ARG]  // x19 = &wrapper_arg              \n\
> -     ldp     x0, x10, [x19, #16]  // x0 = stackaddr, x10 = stackbase \n\
> -     sub     sp, x10, %[CYGTLS]   // sp = stackbase - (CYGTLS)       \n\
> -     mov     fp, xzr              // clear frame pointer (x29)       \n\
> -     mov     x1, xzr              // x1 = 0 (dwSize)                 \n\
> -     mov     x2, #0x8000          // x2 = MEM_RELEASE                \n\
> -     bl      VirtualFree          // free original stack             \n\
> -     ldp     x19, x0, [x19]       // x19 = func, x0 = arg            \n\
> -     blr     x19                  // call thread function            \n"
> +     ldp     x20, x21, [%[WRAPPER_ARG]]    // x20 = thread func, x21 = 
> thread arg \n\
> +     ldp     x0, x1, [%[WRAPPER_ARG], #16] // x0 = stackaddr, x1 = stackbase 
> \n\
> +     sub     sp, x1, %[CYGTLS]         // sp = stackbase - (CYGTLS)       \n\
> +     mov     fp, xzr                // clear frame pointer (x29)       \n\
> +                  // x0 already has stackaddr     \n\
> +     mov     x1, xzr                // x1 = 0 (dwSize)                 \n\
> +     mov     x2, #0x8000            // x2 = MEM_RELEASE                \n\
> +     bl      VirtualFree            // free original stack             \n\
> +     mov     x0, x21          // Move arg into x0       \n\
> +     blr     x20                    // call thread function            \n"
>       : : [WRAPPER_ARG] "r" (&wrapper_arg),
>           [CYGTLS] "r" (__CYGTLS_PADSIZE__)
> -     : "x0", "x1", "x2", "x10", "x19", "x29", "memory");
> +     : "x0", "x1", "x2", "x20", "x21", "x29", "memory");
>  #else
>  #error unimplemented for this target
>  #endif
> --
> 2.49.0.windows.1
>
> Thanks,
> Thirumalai Nagalingam
>

-- 
"I'd love to go out with you, but there are important world issues that
need worrying about."

Reply via email to