On Thu, 27 Mar 2025, Corinna Vinschen wrote:

> On Mar 27 10:26, Jeremy Drake via Cygwin-patches wrote:
> > comment, it seems 8.0 is the odd-version-out here.
>
> Yeah, but we don't support 8.0 anymore, only 8.1.
>
> > BTW, something I would *like* to do but haven't figured out how to
> > accomplish cleanly yet is to follow the registers.  What I mean by this is
> > illustrated by what I did in the aarch64 version: I could find the call to
> > RtlEnterCriticalSection, then work backwards, find the add whose Rd was x0
> > (the register for the first (pointer) parameter in the calling
> > convention), then find the adrp whose Rd was the Rn of the add.  What I
> > would do on x86_64 is find the call to RtlEnterCriticalSection, find any
> > mov rcx, <reg> before, then find the lea <reg>, [rip+XXX] (where reg would
> > be rcx if there wasn't a mov rcx after the lea).  Unfortunately, the
> > variable length-ness doesn't lend itself to iterating backwards, so I am
> > not confirming that the lea actually ends up in rcx for the function call.
> > The only register correlation I do is that the register used in the
> > mov <reg>, QWORD PTR [rip+XXX] is then used in the next instruction that
> > must be test <reg>, <reg>.  The old code required that <reg> to be rbx,
> > but I don't see any reason that rbx is required...
>
> Yeah, reading x86_64 backwards will lead to confusion.  And no, rbx
> isn't required, any non-volatile register could do it.  It seems that
> rbx is used because of the way vc++ allocates registers.


After taking out the Windows 8.0 case, I think this should be doable:
* when finding the lea that we're already looking for, save the
  destination register

* if the destination register is not rcx, look for a 64-bit mov into rcx
  from <reg> (where <reg> is the register from the lea) before the call to
  RtlEnterCriticalSection

This won't catch cases where they shuffle it between multiple registers,
or otherwise obfuscate the load into rcx (push/pop, xchg, using some memory
location, ...) but I think this covers every case I've seen (including
those mentioned in comments about preview builds).  It would also allow us
to skip the theoretical-but-legal sequence (Intel syntax):

lea rXX, [rip+XXXX] ; FastPebLock
...
call UnrelatedFunction
mov rcx, rXX
call RtlEnterCriticalSection
mov rYY, QWORD PTR [rip+YYYY] ; RtlpCurDirRef
test rYY, rYY
...
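
To make the two steps above concrete, here is a rough sketch in C. It is not
the code from the patch series: the function names (lea_rip_dest_reg,
is_mov_rcx_from, lea_reaches_rcx) and the idea of checking only the three
bytes directly in front of the call are assumptions made for illustration.
It relies on a register-to-register 64-bit mov always being exactly 3 bytes
(REX.W, opcode 89 or 8B, ModRM), so no backwards decoding is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REG_RCX 1

/* Decode "lea r64, [rip+disp32]" at p.  Returns the destination register
   number (0..15), or -1 if the bytes are not such an instruction. */
static int lea_rip_dest_reg(const uint8_t *p, size_t len)
{
  if (len < 7)
    return -1;
  if ((p[0] & 0xfb) != 0x48)   /* REX.W set, REX.R optional, X and B clear */
    return -1;
  if (p[1] != 0x8d)            /* lea opcode */
    return -1;
  if ((p[2] & 0xc7) != 0x05)   /* ModRM: mod=00, rm=101 means rip+disp32 */
    return -1;
  return ((p[2] >> 3) & 7) | ((p[0] & 0x04) ? 8 : 0);
}

/* True if the 3 bytes at p encode "mov rcx, <src>" in either the 89 /r
   or the 8B /r form. */
static int is_mov_rcx_from(const uint8_t *p, size_t len, int src)
{
  int hi = src >= 8;
  int lo = src & 7;

  if (len < 3)
    return 0;
  if (p[1] == 0x89)   /* mov r/m64, r64: reg field = source, rm = rcx */
    return p[0] == (0x48 | (hi ? 0x04 : 0))
           && p[2] == (0xc0 | (lo << 3) | REG_RCX);
  if (p[1] == 0x8b)   /* mov r64, r/m64: reg field = rcx, rm = source */
    return p[0] == (0x48 | (hi ? 0x01 : 0))
           && p[2] == (0xc0 | (REG_RCX << 3) | lo);
  return 0;
}

/* lea_off: offset of the lea already found; call_off: offset of the call
   to RtlEnterCriticalSection.  Returns 1 if the lea result plausibly ends
   up in rcx for that call. */
static int lea_reaches_rcx(const uint8_t *code, size_t len,
                           size_t lea_off, size_t call_off)
{
  int reg;

  assert(code != NULL);
  if (lea_off > len || call_off > len)
    return 0;
  reg = lea_rip_dest_reg(code + lea_off, len - lea_off);
  if (reg < 0)
    return 0;
  if (reg == REG_RCX)
    return 1;                  /* lea loaded rcx directly */
  if (call_off < lea_off + 7 + 3)
    return 0;                  /* no room for a mov between lea and call */
  return is_mov_rcx_from(code + call_off - 3, 3, reg);
}
```

The same caveats as above apply: a mov that is not immediately in front of
the call, or any shuffling through other registers, is not recognized, and
the three bytes could in principle be the tail of a longer instruction.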

I'll try to find some time to test this latest round on as many released
Windows versions >= 8.1 as I can, and then send a v3 series.  It works on
22631 at least.
