Hi all,

This patch fixes issues in my earlier commit
[https://github.com/cygwin/cygwin/commit/f4ba145056dbe99adf4dbe632bec035e006539f8]
and optimizes the AArch64 thread startup sequence by eliminating the use of
register x19 and streamlining register usage.

The key modifications are detailed in the patch's commit description. These
changes improve register efficiency and ensure that the thread argument is
correctly placed in register `x0` after the VirtualFree call, which prevents
segmentation faults. The patch has been tested in our internal AArch64
environment, where the pthread-related test cases now pass as expected.
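
As a rough illustration (not part of the patch) of why the argument has to be
moved back into x0 after the call: under AAPCS64 a callee may overwrite x0 to
x17, while x19 to x28 are preserved across calls. The toy C++ model below is
only a sketch under that assumption. `RegFile`, `simulate_call` and
`arg_survives_call` are hypothetical names made up for this example, and
`simulate_call` merely stands in for `bl VirtualFree`; none of this is Cygwin
code.

```cpp
#include <cstdint>

// Toy model of the AArch64 general-purpose registers x0..x30.
struct RegFile { std::uint64_t x[31]; };

// Hypothetical stand-in for "bl VirtualFree": overwrites x0..x17, which
// AAPCS64 lets a callee change freely, and leaves x19..x28 untouched.
constexpr RegFile simulate_call(RegFile regs)
{
  for (int r = 0; r <= 17; ++r)
    regs.x[r] = 0xDEADBEEF;
  regs.x[0] = 1; // nonzero BOOL return value lands in x0
  return regs;
}

// True if a value parked in register `reg` before the call is still
// there after the call returns.
constexpr bool arg_survives_call(int reg)
{
  constexpr std::uint64_t thread_arg = 0x1234;
  RegFile regs{};
  regs.x[reg] = thread_arg;
  return simulate_call(regs).x[reg] == thread_arg;
}

// Checked at compile time: x0 is lost, x21 is kept.
static_assert(!arg_survives_call(0), "x0 is overwritten by the call");
static_assert(arg_survives_call(21), "x21 is preserved across the call");
```

In this model a value left in x0 does not survive the call, whereas a copy held
in x21 does, which is the reason for the `mov x0, x21` just before `blr x20`.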

Inlined Patch:

From e197e39452e542d18812f41ac2a3af2fa172b273 Mon Sep 17 00:00:00 2001
From: Thirumalai Nagalingam <[email protected]>
Date: Tue, 1 Jul 2025 14:46:52 +0530
Subject: [PATCH] Aarch64: Optimize pthread_wrapper by eliminating x19 and
 streamlining register usage

- Removed use of x19 by directly loading the thread func and arg using
  LDP from [WRAPPER_ARG], freeing up one additional register
- Loaded thread function and argument into x20 and x21 before
  VirtualFree to preserve their values across the virtual free call
- Used x1 as a temporary register to load stack base, subtract CYGTLS,
  and update SP
- Moved thread argument back into x0 after VirtualFree and before
  calling the thread function

Signed-off-by: Thirumalai Nagalingam <[email protected]>
---
 winsup/cygwin/create_posix_thread.cc | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/winsup/cygwin/create_posix_thread.cc b/winsup/cygwin/create_posix_thread.cc
index 592aaf1a5..17bb607f7 100644
--- a/winsup/cygwin/create_posix_thread.cc
+++ b/winsup/cygwin/create_posix_thread.cc
@@ -103,18 +103,19 @@ pthread_wrapper (PVOID arg)
   /* Sets up a new thread stack, frees the original OS stack,
    * and calls the thread function with its arg using AArch64 ABI. */
   __asm__ __volatile__ ("\n\
-     mov     x19, %[WRAPPER_ARG]  // x19 = &wrapper_arg              \n\
-     ldp     x0, x10, [x19, #16]  // x0 = stackaddr, x10 = stackbase \n\
-     sub     sp, x10, %[CYGTLS]   // sp = stackbase - (CYGTLS)       \n\
-     mov     fp, xzr              // clear frame pointer (x29)       \n\
-     mov     x1, xzr              // x1 = 0 (dwSize)                 \n\
-     mov     x2, #0x8000          // x2 = MEM_RELEASE                \n\
-     bl      VirtualFree          // free original stack             \n\
-     ldp     x19, x0, [x19]       // x19 = func, x0 = arg            \n\
-     blr     x19                  // call thread function            \n"
+     ldp     x20, x21, [%[WRAPPER_ARG]]    // x20 = thread func, x21 = thread arg \n\
+     ldp     x0, x1, [%[WRAPPER_ARG], #16] // x0 = stackaddr, x1 = stackbase \n\
+     sub     sp, x1, %[CYGTLS]         // sp = stackbase - (CYGTLS)       \n\
+     mov     fp, xzr                // clear frame pointer (x29)       \n\
+                  // x0 already has stackaddr     \n\
+     mov     x1, xzr                // x1 = 0 (dwSize)                 \n\
+     mov     x2, #0x8000            // x2 = MEM_RELEASE                \n\
+     bl      VirtualFree            // free original stack             \n\
+     mov     x0, x21          // Move arg into x0       \n\
+     blr     x20                    // call thread function            \n"
      : : [WRAPPER_ARG] "r" (&wrapper_arg),
          [CYGTLS] "r" (__CYGTLS_PADSIZE__)
-     : "x0", "x1", "x2", "x10", "x19", "x29", "memory");
+     : "x0", "x1", "x2", "x20", "x21", "x29", "memory");
 #else
 #error unimplemented for this target
 #endif
--
2.49.0.windows.1

Thanks,
Thirumalai Nagalingam

Attachment: 0001-Aarch64-Optimize-pthread_wrapper-by-eliminating-x19-.patch