On Tue, 1 Jul 2025, Thirumalai Nagalingam wrote:

> Hi Jeremy,
>
> Please find this revised version of the previous patch.
> The main issue being addressed is a segfault caused by accessing 
> `wrapper_arg` on the stack after it had been deallocated by VirtualFree.
> This resulted in invalid memory access during thread startup on AArch64. In 
> this version, the thread `func` and `arg` are loaded before the stack is 
> freed, stored in callee-saved registers, and restored before calling the 
> thread func.
>
> Commit message has been updated accordingly and wrapped at 72 characters with 
> trailer. Thanks for the feedback!

Pushed, thanks

>
> In-Lined patch:
>
> From 53215b09e6a19c9493fa5fa58d91a82f6d51e1b2 Mon Sep 17 00:00:00 2001
> From: Thirumalai Nagalingam <[email protected]>
> Date: Tue, 1 Jul 2025 18:17:24 +0000
> Subject: [PATCH] Cygwin: Aarch64: optimize pthread_wrapper register usage
>
> This patch resolves issues related to unsafe access to deallocated
> stack memory in the pthread wrapper for AArch64.
>
> Key changes:
> - Removed use of x19 by directly loading the thread function and
>   argument using LDP from [WRAPPER_ARG], freeing one register.
> - Stored thread function and argument in x20 and x21 before
>   VirtualFree to preserve them across calls.
> - Used x1 as a temporary register to load the stack base,
>   subtract CYGTLS, and update SP.
> - Moved the thread argument back into x0 after VirtualFree and
>   before calling the thread function.
>
> Earlier, `wrapper_arg` lived on the stack, which was freed via
> `VirtualFree`, risking segfaults on later access. Now, the thread
> `func` and `arg` are loaded before the stack is freed, stored in
> callee-saved registers, and restored to `x0` before calling the
> thread function.
>
> Fixes: f4ba145056db ("Aarch64: Add inline assembly pthread wrapper")
> Signed-off-by: Thirumalai Nagalingam 
> <[email protected]>
> ---
>  winsup/cygwin/create_posix_thread.cc | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/winsup/cygwin/create_posix_thread.cc 
> b/winsup/cygwin/create_posix_thread.cc
> index 592aaf1a5..17bb607f7 100644
> --- a/winsup/cygwin/create_posix_thread.cc
> +++ b/winsup/cygwin/create_posix_thread.cc
> @@ -103,18 +103,19 @@ pthread_wrapper (PVOID arg)
>    /* Sets up a new thread stack, frees the original OS stack,
>     * and calls the thread function with its arg using AArch64 ABI. */
>    __asm__ __volatile__ ("\n\
> -        mov     x19, %[WRAPPER_ARG]  // x19 = &wrapper_arg              \n\
> -        ldp     x0, x10, [x19, #16]  // x0 = stackaddr, x10 = stackbase \n\
> -        sub     sp, x10, %[CYGTLS]   // sp = stackbase - (CYGTLS)       \n\
> -        mov     fp, xzr              // clear frame pointer (x29)       \n\
> -        mov     x1, xzr              // x1 = 0 (dwSize)                 \n\
> -        mov     x2, #0x8000          // x2 = MEM_RELEASE                \n\
> -        bl      VirtualFree          // free original stack             \n\
> -        ldp     x19, x0, [x19]       // x19 = func, x0 = arg            \n\
> -        blr     x19                  // call thread function            \n"
> +        ldp     x20, x21, [%[WRAPPER_ARG]]    // x20 = thread func, x21 = 
> thread arg \n\
> +        ldp     x0, x1, [%[WRAPPER_ARG], #16] // x0 = stackaddr, x1 = 
> stackbase      \n\
> +        sub     sp, x1, %[CYGTLS]             // sp = stackbase - (CYGTLS)   
>         \n\
> +        mov     fp, xzr                       // clear frame pointer (x29)   
>         \n\
> +                                              // x0 already has stackaddr    
>         \n\
> +        mov     x1, xzr                       // x1 = 0 (dwSize)             
>         \n\
> +        mov     x2, #0x8000                   // x2 = MEM_RELEASE            
>         \n\
> +        bl      VirtualFree                   // free original stack         
>         \n\
> +        mov     x0, x21                       // Move arg into x0            
>         \n\
> +        blr     x20                           // call thread function        
>         \n"
>          : : [WRAPPER_ARG] "r" (&wrapper_arg),
>              [CYGTLS] "r" (__CYGTLS_PADSIZE__)
> -        : "x0", "x1", "x2", "x10", "x19", "x29", "memory");
> +        : "x0", "x1", "x2", "x20", "x21", "x29", "memory");
>  #else
>  #error unimplemented for this target
>  #endif
>

Reply via email to