Hi all,

This patch fixes issues in my earlier commit [https://github.com/cygwin/cygwin/commit/f4ba145056dbe99adf4dbe632bec035e006539f8] and optimizes the AArch64 thread startup sequence by eliminating the use of register x19 and streamlining register usage. The key modifications are detailed in the patch's commit description.

These changes improve register efficiency and ensure that the thread argument is correctly placed in register `x0` after the VirtualFree call, which prevents segmentation faults. The patch has been tested in our internal AArch64 environment, where the pthread-related test cases now pass as expected.
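For anyone reviewing the ordering, here is a rough C++ analogy of it, not the Cygwin code itself. It assumes that wrapper_arg sits in memory that is no longer valid once the original stack has been released, which is why the function pointer and argument are copied out first (the role x20 and x21 play in the patch, both callee-saved under the AArch64 procedure call standard). The names wrapper_arg_sketch, add_one, call_after_release and run_sketch are hypothetical, and std::free merely stands in for VirtualFree.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the wrapper argument; only the two fields
// this sketch needs. Not the real Cygwin structure.
struct wrapper_arg_sketch {
  void *(*func)(void *);
  void *arg;
};

// Toy thread function: increments the int it is handed.
static void *add_one(void *p) {
  int *v = static_cast<int *>(p);
  *v += 1;
  return p;
}

// Copies func and arg out of the block BEFORE releasing it, then calls
// func. Assumption: after the release, the block must not be read again.
void *call_after_release(wrapper_arg_sketch *w) {
  void *(*func)(void *) = w->func;  // plays the role of x20
  void *arg = w->arg;               // plays the role of x21
  std::free(w);                     // stands in for VirtualFree
  return func(arg);                 // like: mov x0, x21 ; blr x20
}

// Small driver: builds the block on the heap and runs the sequence.
int run_sketch() {
  int value = 41;
  auto *w = static_cast<wrapper_arg_sketch *>(
      std::malloc(sizeof(wrapper_arg_sketch)));
  assert(w != nullptr);
  w->func = add_one;
  w->arg = &value;
  call_after_release(w);
  return value;
}
```

Reading w->func or w->arg after the std::free call would be the analogue of loading from [x19] after VirtualFree in the earlier commit.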
Inlined Patch:

From e197e39452e542d18812f41ac2a3af2fa172b273 Mon Sep 17 00:00:00 2001
From: Thirumalai Nagalingam <[email protected]>
Date: Tue, 1 Jul 2025 14:46:52 +0530
Subject: [PATCH] Aarch64: Optimize pthread_wrapper by eliminating x19 and streamlining register usage

- Removed use of x19 by directly loading the thread func and arg using
  LDP from [WRAPPER_ARG], freeing up one additional register
- Loaded thread function and argument into x20 and x21 before
  VirtualFree to preserve their values across the virtual free call
- Used x1 as a temporary register to load stack base, subtract CYGTLS,
  and update SP
- Moved thread argument back into x0 after VirtualFree and before
  calling the thread function

Signed-off-by: Thirumalai Nagalingam <[email protected]>
---
 winsup/cygwin/create_posix_thread.cc | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/winsup/cygwin/create_posix_thread.cc b/winsup/cygwin/create_posix_thread.cc
index 592aaf1a5..17bb607f7 100644
--- a/winsup/cygwin/create_posix_thread.cc
+++ b/winsup/cygwin/create_posix_thread.cc
@@ -103,18 +103,19 @@ pthread_wrapper (PVOID arg)
   /* Sets up a new thread stack, frees the original OS stack,
    * and calls the thread function with its arg using AArch64 ABI. */
   __asm__ __volatile__ ("\n\
-    mov  x19, %[WRAPPER_ARG]            // x19 = &wrapper_arg                 \n\
-    ldp  x0, x10, [x19, #16]            // x0 = stackaddr, x10 = stackbase    \n\
-    sub  sp, x10, %[CYGTLS]             // sp = stackbase - (CYGTLS)          \n\
-    mov  fp, xzr                        // clear frame pointer (x29)          \n\
-    mov  x1, xzr                        // x1 = 0 (dwSize)                    \n\
-    mov  x2, #0x8000                    // x2 = MEM_RELEASE                   \n\
-    bl   VirtualFree                    // free original stack                \n\
-    ldp  x19, x0, [x19]                 // x19 = func, x0 = arg               \n\
-    blr  x19                            // call thread function               \n"
+    ldp  x20, x21, [%[WRAPPER_ARG]]     // x20 = thread func, x21 = thread arg \n\
+    ldp  x0, x1, [%[WRAPPER_ARG], #16]  // x0 = stackaddr, x1 = stackbase     \n\
+    sub  sp, x1, %[CYGTLS]              // sp = stackbase - (CYGTLS)          \n\
+    mov  fp, xzr                        // clear frame pointer (x29)          \n\
+                                        // x0 already has stackaddr           \n\
+    mov  x1, xzr                        // x1 = 0 (dwSize)                    \n\
+    mov  x2, #0x8000                    // x2 = MEM_RELEASE                   \n\
+    bl   VirtualFree                    // free original stack                \n\
+    mov  x0, x21                        // Move arg into x0                   \n\
+    blr  x20                            // call thread function               \n"
     : : [WRAPPER_ARG] "r" (&wrapper_arg),
         [CYGTLS] "r" (__CYGTLS_PADSIZE__)
-    : "x0", "x1", "x2", "x10", "x19", "x29", "memory");
+    : "x0", "x1", "x2", "x20", "x21", "x29", "memory");
 #else
 #error unimplemented for this target
 #endif
-- 
2.49.0.windows.1

Thanks,
Thirumalai Nagalingam
0001-Aarch64-Optimize-pthread_wrapper-by-eliminating-x19-.patch
