On Sunday 01 March 2026 21:53:07 LIU Hao wrote:
> On 2026-2-25 04:46, Pali Rohár wrote:
> >  From a30d95935cf77baf36c1447e3dc984b30ba881ee Mon Sep 17 00:00:00 2001
> > From: =?UTF-8?q?Pali=20Roh=C3=A1r?=<[email protected]>
> > Date: Fri, 21 Nov 2025 18:37:53 +0100
> > Subject: [PATCH 1/4] crt: Fix kernel32 BaseAttachCompleteThunk symbol
> >   decoration
> > 
> > Function BaseAttachCompleteThunk() takes one CONTEXT pointer argument and
> > on WinNT it "jumps" to address specified in the CONTEXT pointer argument.
> > So it does not return to the caller. On Win32s it sets win32 error to
> > ERROR_CALL_NOT_IMPLEMENTED, returns zero and does not touch stack pointer.
> > 
> > So gendef detects decoration for BaseAttachCompleteThunk incorrectly. It is
> > not stdcall @16 (detected for WinNT versions) and neither stdcall @0
> > (detected for Win32s versions). The correct decoration should be cdecl
> > calling convention to match above description.
> > 
> > BaseAttachCompleteThunk is mostly used as a callsite pointer argument
> > for RtlRemoteCall() function call, called by debuggers.
> > 
> 
> There's leaked source code of this function on GitHub [1] which says this
> function is stdcall and takes no argument, so existing code which says `@0`
> should be correct.

Hello, I did not look at that source code, as I do not think doing so
is a good idea.

Anyway, I wrote this change months ago, when I did some basic tests,
and I do not remember all the details right now.

But that claim looks quite suspicious to me. BaseAttachCompleteThunk
certainly takes a CONTEXT parameter, because on success it changes EIP
to the value stored in it. And the descriptions which I found on the
internet suggest that it is used by debuggers.

So I looked at it again to figure out how BaseAttachCompleteThunk
really works and how it can be used.

The function is really small; llvm-objdump -d --print-imm-hex on the
Win2k kernel32.dll shows the following code:

7c50abac <BaseAttachCompleteThunk>:
7c50abac: 89 84 24 b4 00 00 00          movl    %eax, 0xb4(%esp)
7c50abb3: 89 ac 24 b8 00 00 00          movl    %ebp, 0xb8(%esp)
7c50abba: e8 f3 33 00 00                calll   0x7c50dfb2
7c50abbf: 90                            nop

(anonymous function):
7c50dfb2: 55                            pushl   %ebp
7c50dfb3: 8b ec                         movl    %esp, %ebp
7c50dfb5: 51                            pushl   %ecx
7c50dfb6: 56                            pushl   %esi
7c50dfb7: 33 f6                         xorl    %esi, %esi
7c50dfb9: 56                            pushl   %esi
7c50dfba: 8d 45 fc                      leal    -0x4(%ebp), %eax
7c50dfbd: 6a 04                         pushl   $0x4
7c50dfbf: 50                            pushl   %eax
7c50dfc0: 6a 07                         pushl   $0x7
7c50dfc2: 6a ff                         pushl   $-0x1
7c50dfc4: 89 75 fc                      movl    %esi, -0x4(%ebp)
7c50dfc7: ff 15 b8 10 4e 7c             calll   *0x7c4e10b8
7c50dfcd: 3b c6                         cmpl    %esi, %eax
7c50dfcf: 7c 0a                         jl      0x7c50dfdb
7c50dfd1: 39 75 fc                      cmpl    %esi, -0x4(%ebp)
7c50dfd4: 74 05                         je      0x7c50dfdb
7c50dfd6: e8 1b fa ff ff                calll   0x7c50d9f6 <DebugBreak>
7c50dfdb: 39 75 08                      cmpl    %esi, 0x8(%ebp)
7c50dfde: 56                            pushl   %esi
7c50dfdf: 75 0a                         jne     0x7c50dfeb
7c50dfe1: e8 55 7f fd ff                calll   0x7c4e5f3b <ExitThread>
7c50dfe6: 5e                            popl    %esi
7c50dfe7: c9                            leave
7c50dfe8: c2 04 00                      retl    $0x4
7c50dfeb: ff 75 08                      pushl   0x8(%ebp)
7c50dfee: ff 15 18 11 4e 7c             calll   *0x7c4e1118
7c50dff4: eb f0                         jmp     0x7c50dfe6

The first part writes directly to the stack, which means that at least
0xb8 + 4 = 188 bytes at the top of the stack must have been prepared by
the caller. In both the cdecl and stdcall calling conventions the first
4 bytes on the stack are the return address. Therefore, if this really
is a stdcall function, it takes 184 bytes of arguments and hence should
be declared as BaseAttachCompleteThunk@184, and not @0.
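
The arithmetic behind @184 can be written down as a small C sketch. The
helper names are hypothetical; only the 0xb8 offset and the 4-byte store
width are taken from the disassembly above.

```c
#include <assert.h>

/* Size in bytes of one x86 stack slot, and therefore of a return address. */
enum { SLOT_SIZE = 4 };

/* Bytes of stack, counted from esp at entry, that a function touches when
 * its highest store is a 4-byte write at highest_store_offset. */
static unsigned stack_bytes_touched(unsigned highest_store_offset)
{
    return highest_store_offset + SLOT_SIZE;
}

/* Under stdcall the first slot is the return address, so the @N suffix is
 * whatever remains after removing that slot. */
static unsigned stdcall_suffix(unsigned highest_store_offset)
{
    return stack_bytes_touched(highest_store_offset) - SLOT_SIZE;
}

/* movl %ebp, 0xb8(%esp) is the highest store in BaseAttachCompleteThunk. */
static_assert(0xb8 + SLOT_SIZE - SLOT_SIZE == 184,
              "a stdcall reading of the thunk would be @184");
```

With 0xb8 as input, stack_bytes_touched() gives 188 and stdcall_suffix()
gives 184, which is the number used in the paragraph above.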

There is another suspicious thing in the first part: the calll
instruction is executed without any prior modification of the esp
register. This means that the called anonymous function at 0x7c50dfb2
(if it follows some standard calling convention) takes as its first
argument whatever was at esp + 0 on entry, which for a normally called
function would be the return address of the BaseAttachCompleteThunk
caller.

The second part is an anonymous function which looks like a stdcall @4
function. It has a normal prologue and a retl $0x4 epilogue, and it
accesses one 4-byte argument passed by the caller.
This should really be a pointer to the CONTEXT structure which this
function is supposed to restore. The retl probably happens only on error.
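
To make the control flow easier to follow, here is a rough C
transliteration of the anonymous function. It is only a sketch: the two
indirect imports are not resolved in the listing, so they are kept as
placeholders named after their addresses (import_10b8, import_1118), and
the hooks structure and recording stubs are hypothetical scaffolding so
that the code compiles and runs outside Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks standing in for the four calls made by 0x7c50dfb2. */
struct hooks {
    int32_t (*import_10b8)(intptr_t a, uint32_t b, void *out,
                           uint32_t out_size, void *unused);
    void (*import_1118)(void *pctx, uint32_t zero);
    void (*debug_break)(void);
    void (*exit_thread)(uint32_t code);
};

static void anon_7c50dfb2(const struct hooks *h, void *pctx)
{
    uint32_t value = 0;                       /* movl %esi, -0x4(%ebp) */
    int32_t status;

    assert(h != NULL);
    /* pushl 0; pushl 4; pushl &value; pushl 7; pushl -1; calll *0x7c4e10b8 */
    status = h->import_10b8(-1, 7, &value, 4, NULL);
    if (status >= 0 && value != 0)            /* jl / je skip the break */
        h->debug_break();
    if (pctx == NULL)                         /* cmpl %esi, 0x8(%ebp) */
        h->exit_thread(0);
    else
        h->import_1118(pctx, 0);              /* calll *0x7c4e1118 */
    /* Reached only if the call above returns: popl %esi; leave; retl $0x4 */
}

/* Recording stubs (test scaffolding only, not part of kernel32). */
static int g_breaks, g_exit_calls, g_restore_calls;
static uint32_t g_reported_value;

static int32_t stub_10b8(intptr_t a, uint32_t b, void *out,
                         uint32_t out_size, void *unused)
{
    (void)a; (void)b; (void)out_size; (void)unused;
    *(uint32_t *)out = g_reported_value;
    return 0;
}
static void stub_1118(void *pctx, uint32_t zero)
{
    (void)pctx; (void)zero;
    g_restore_calls++;
}
static void stub_break(void) { g_breaks++; }
static void stub_exit(uint32_t code) { (void)code; g_exit_calls++; }

static const struct hooks stubs = { stub_10b8, stub_1118, stub_break, stub_exit };
```

In the real function the second import presumably never returns when it
succeeds, which would be why the retl path looks like an error path.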

What is very suspicious is how the anonymous function is called.
If the anonymous function really is stdcall @4, then it expects the
stack in this form:

  esp + 0 = return address
  esp + 4 = CONTEXT pointer

This anonymous function is called via the "calll" instruction from
BaseAttachCompleteThunk, so before that instruction the stack has to be:

  esp + 0 = CONTEXT pointer

(so that calll can push the return address onto the stack)

Now, if some caller wants to call BaseAttachCompleteThunk, it has to
ensure that the CONTEXT pointer is at esp + 0 on entry. IIRC this is
impossible to achieve with the x86 call instruction, as call always
pushes the return address onto the stack (at esp + 0).

So I think that the only possible way to invoke BaseAttachCompleteThunk
is via a "jmp" instruction, with a properly prepared CONTEXT pointer on
the stack.

Such a function is not stdcall at all, because the stdcall calling
convention requires the return address at esp + 0 (instead of the first
parameter).
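
The stack reasoning above can be replayed with a toy model in C. This is
only an illustration, not real thunk code: the stack is a small array,
and CONTEXT_PTR and CALLER_RET are made-up values. THUNK_RET is the
address of the nop which follows the calll in the listing.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 16 };

/* Toy x86 stack: grows toward index 0, esp is the index of the top word. */
struct stack {
    uint32_t word[STACK_WORDS];
    int esp;
};

static void push(struct stack *s, uint32_t value)
{
    assert(s->esp > 0);
    s->word[--s->esp] = value;
}

/* A stdcall @4 callee finds its argument one word above the return
 * address, i.e. at esp + 4 bytes on entry (0x8(%ebp) after the prologue). */
static uint32_t callee_first_arg(const struct stack *s)
{
    return s->word[s->esp + 1];
}

enum {
    CONTEXT_PTR = 0x0012f000,  /* made-up pointer value */
    CALLER_RET  = 0x00401000,  /* made-up return address of the caller */
    THUNK_RET   = 0x7c50abbf   /* address after calll 0x7c50dfb2 */
};

/* Value the anonymous function sees as its argument, depending on whether
 * BaseAttachCompleteThunk was entered via "call" or via "jmp". */
static uint32_t arg_seen_by_anon(int entered_via_call)
{
    struct stack s = { {0}, STACK_WORDS };

    push(&s, CONTEXT_PTR);         /* caller prepares the CONTEXT pointer */
    if (entered_via_call)
        push(&s, CALLER_RET);      /* "call" pushes the return address */
    push(&s, THUNK_RET);           /* thunk's calll, esp not adjusted before */
    return callee_first_arg(&s);
}
```

In this model arg_seen_by_anon(0) yields CONTEXT_PTR, while
arg_seen_by_anon(1) yields CALLER_RET, so only the "jmp" entry hands the
CONTEXT pointer to the anonymous function.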

And the first two instructions in BaseAttachCompleteThunk suggest that
BaseAttachCompleteThunk takes more arguments on the stack. By the
layout, it looks like the second argument is the content of the CONTEXT
itself, as the two movl instructions store registers into it.
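
This guess can be cross-checked against the 32-bit CONTEXT layout. The
sketch below mirrors that layout with fixed-width types so it compiles on
any host; the struct names and the STACK_OFFSET_OF macro are
hypothetical, and the assumption is that the CONTEXT pointer sits at
esp + 0 and the CONTEXT copy starts at esp + 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable mirror of the 32-bit x86 FLOATING_SAVE_AREA from winnt.h. */
struct x86_floating_save_area {
    uint32_t ControlWord, StatusWord, TagWord;
    uint32_t ErrorOffset, ErrorSelector, DataOffset, DataSelector;
    uint8_t  RegisterArea[80];
    uint32_t Cr0NpxState;
};

/* Portable mirror of the 32-bit x86 CONTEXT from winnt.h. */
struct x86_context {
    uint32_t ContextFlags;
    uint32_t Dr0, Dr1, Dr2, Dr3, Dr6, Dr7;
    struct x86_floating_save_area FloatSave;
    uint32_t SegGs, SegFs, SegEs, SegDs;
    uint32_t Edi, Esi, Ebx, Edx, Ecx, Eax;
    uint32_t Ebp, Eip, SegCs, EFlags, Esp, SegSs;
    uint8_t  ExtendedRegisters[512];
};

/* Assumption: esp + 0 holds the CONTEXT pointer, the copy starts at esp + 4. */
enum { CONTEXT_COPY_AT = 4 };

#define STACK_OFFSET_OF(field) \
    (CONTEXT_COPY_AT + offsetof(struct x86_context, field))

static_assert(STACK_OFFSET_OF(Eax) == 0xb4, "movl %eax, 0xb4(%esp) hits Eax");
static_assert(STACK_OFFSET_OF(Ebp) == 0xb8, "movl %ebp, 0xb8(%esp) hits Ebp");
```

Under that assumption the two stores land exactly on the Eax and Ebp
members of the CONTEXT copy, which fits the layout described above.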

I played a bit with this function on Win2K and wrote a working example
of how to call and use BaseAttachCompleteThunk. The example is in the
attachment.

A possible prototype for BaseAttachCompleteThunk could be:

  jmp_noreturn void BaseAttachCompleteThunk(CONTEXT *pctx, CONTEXT ctx);

where jmp_noreturn means that no return address is passed on the stack
at all. Such a thing is not supported by gcc or clang; I use it here
just to highlight that the function takes (at least) two arguments, not
zero. And the attached example proves that.


For completeness, the Win32s implementation is just a stub:

b00271d4:       6a 78                   push   $0x78
b00271d6:       e8 49 9e fd ff          call   0xb0001024 <SetLastError>
b00271db:       2b c0                   sub    %eax,%eax
b00271dd:       c3                      ret

It does not take any arguments (so it is compatible with both stdcall
and cdecl) and always returns an error.

The NT implementation cannot return, because it has to be entered via
"jmp". So the Win32s and NT implementations are ABI incompatible.

_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
