On Tue, 1 Jul 2025, Thirumalai Nagalingam wrote:
> Hi all,
>
> This patch fixes existing issues in my earlier commit
> [https://github.com/cygwin/cygwin/commit/f4ba145056dbe99adf4dbe632bec035e006539f8]
> and optimizes the AArch64 thread startup sequence by eliminating the use
> of register x19 and streamlining register usage. The key modifications
> are detailed in the patch's commit description. These changes improve
> register efficiency while ensuring correct thread argument in register
> `x0` after virtual free call, preventing from any segmentation faults.
> The patch has been tested in our internal AArch64 environment where
> pthread related test cases are now passing as expected.
>
> Inlined Patch:
>
> From e197e39452e542d18812f41ac2a3af2fa172b273 Mon Sep 17 00:00:00 2001
> From: Thirumalai Nagalingam <[email protected]>
> Date: Tue, 1 Jul 2025 14:46:52 +0530
> Subject: [PATCH] Aarch64: Optimize pthread_wrapper by eliminating x19 and
> streamlining register usage
>
> - Removed use of x19 by directly loading the thread func and arg using
> LDP from [WRAPPER_ARG], freeing up one additional register
> - Loaded thread function and argument into x20 and x21 before
> VirtualFree to preserve their values across the virtual free call
> - Used x1 as a temporary register to load stack base, subtract CYGTLS,
> and update SP
> - Moved thread argument back into x0 after VirtualFree and before
> calling the thread function
>
So the problem was that the registers used before were ones not required
to be preserved across function calls in the ABI? Or was it that
wrapper_arg was on the now-freed stack so could not be accessed after the
VirtualFree? Pleas include that in your commit message. Also, please
wrap your commit message at 72 characters, prefix the subject/first line
"Cygwin: ", and include the trailer
Fixes: f4ba145056db ("Aarch64: Add inline assembly pthread wrapper")
Signed-off-by: Thirumalai Nagalingam
<[email protected]>
> ---
> winsup/cygwin/create_posix_thread.cc | 21 +++++++++++----------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/winsup/cygwin/create_posix_thread.cc
> b/winsup/cygwin/create_posix_thread.cc
> index 592aaf1a5..17bb607f7 100644
> --- a/winsup/cygwin/create_posix_thread.cc
> +++ b/winsup/cygwin/create_posix_thread.cc
> @@ -103,18 +103,19 @@ pthread_wrapper (PVOID arg)
> /* Sets up a new thread stack, frees the original OS stack,
> * and calls the thread function with its arg using AArch64 ABI. */
> __asm__ __volatile__ ("\n\
> - mov x19, %[WRAPPER_ARG] // x19 = &wrapper_arg \n\
> - ldp x0, x10, [x19, #16] // x0 = stackaddr, x10 = stackbase \n\
> - sub sp, x10, %[CYGTLS] // sp = stackbase - (CYGTLS) \n\
> - mov fp, xzr // clear frame pointer (x29) \n\
> - mov x1, xzr // x1 = 0 (dwSize) \n\
> - mov x2, #0x8000 // x2 = MEM_RELEASE \n\
> - bl VirtualFree // free original stack \n\
> - ldp x19, x0, [x19] // x19 = func, x0 = arg \n\
> - blr x19 // call thread function \n"
> + ldp x20, x21, [%[WRAPPER_ARG]] // x20 = thread func, x21 =
> thread arg \n\
> + ldp x0, x1, [%[WRAPPER_ARG], #16] // x0 = stackaddr, x1 = stackbase
> \n\
> + sub sp, x1, %[CYGTLS] // sp = stackbase - (CYGTLS) \n\
> + mov fp, xzr // clear frame pointer (x29) \n\
> + // x0 already has stackaddr \n\
> + mov x1, xzr // x1 = 0 (dwSize) \n\
> + mov x2, #0x8000 // x2 = MEM_RELEASE \n\
> + bl VirtualFree // free original stack \n\
> + mov x0, x21 // Move arg into x0 \n\
> + blr x20 // call thread function \n"
> : : [WRAPPER_ARG] "r" (&wrapper_arg),
> [CYGTLS] "r" (__CYGTLS_PADSIZE__)
> - : "x0", "x1", "x2", "x10", "x19", "x29", "memory");
> + : "x0", "x1", "x2", "x20", "x21", "x29", "memory");
> #else
> #error unimplemented for this target
> #endif
> --
> 2.49.0.windows.1
>
> Thanks,
> Thirumalai Nagalingam
>
--
"I'd love to go out with you, but there are important world issues that
need worrying about."