[PATCH 1/2] CI: Remove inheritable permissions from working directory

2025-03-21 Thread Jon Turney
Remove inheritable permissions from the working directory, since they
break assumptions that the testsuite makes about the filemode a given
umask will result in.
---
 .github/workflows/cygwin.yml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.github/workflows/cygwin.yml b/.github/workflows/cygwin.yml
index 803462a01..53dd06d3c 100644
--- a/.github/workflows/cygwin.yml
+++ b/.github/workflows/cygwin.yml
@@ -105,6 +105,9 @@ jobs:
 # endings, but this could still be dangerous e.g if we need symlinks in the
 # repo)
 - run: git config --global core.autocrlf input
+# remove inheritable permissions since they break assumptions testsuite
+# makes about file modes
+- run: icacls . /inheritance:r
 - uses: actions/checkout@v3
 
 # install cygwin and build tools
-- 
2.45.1



[PATCH 2/2] Revert "Cygwin: CI: XFAIL umask03"

2025-03-21 Thread Jon Turney
This reverts commit cbe7543cdfdb7f3d270214877d4a4c3e78710bd3.
---
 winsup/testsuite/Makefile.am | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/winsup/testsuite/Makefile.am b/winsup/testsuite/Makefile.am
index 228955668..8f2967a6d 100644
--- a/winsup/testsuite/Makefile.am
+++ b/winsup/testsuite/Makefile.am
@@ -333,16 +333,12 @@ TESTS = $(check_PROGRAMS) \
mingw/cygload
 
 # expected fail tests
-XFAIL_TESTS_CI_true = \
-   winsup.api/ltp/umask03$(EXEEXT)
-
 XFAIL_TESTS = \
winsup.api/ltp/setgroups01 \
winsup.api/ltp/setuid02 \
winsup.api/ltp/ulimit01 \
winsup.api/ltp/unlink08 \
-   winsup.api/samples/sample-fail \
-   $(XFAIL_TESTS_CI_$(GITHUB_ACTIONS))
+   winsup.api/samples/sample-fail
 
 # cygrun.sh test-runner script, and variables used by it:
 LOG_COMPILER = $(srcdir)/cygrun.sh
-- 
2.45.1



Re: Use udis86 to walk x64 machine code in find_fast_cwd_pointer

2025-03-21 Thread Corinna Vinschen
On Mar 20 13:01, Jeremy Drake via Cygwin-patches wrote:
> On Thu, 20 Mar 2025, Corinna Vinschen wrote:
> 
> > On Mar 20 15:03, Corinna Vinschen wrote:
> > > On Mar 18 22:11, Jeremy Drake via Cygwin-patches wrote:
> > > > On Tue, 18 Mar 2025, Corinna Vinschen wrote:
> > > >
> > > > > Subdir of winsup/cygwin, probably.  What I'm most curious about is the
> > > > > size it adds to the DLL.  I wonder if, say, an extra 32K is really
> > > > > > usefully spent, given it only checks a small part of ntdll.dll, and
> > > > > > only once per process tree, too.
> > > >
> > > > I did this with msys-2.0.dll, but it shouldn't matter as a delta.
> > > > all are stripped msys-2.0.dll size
> > > > start:
> > > > 3,246,118 bytes
> > > > with udis86 vendored, but not called:
> > > > 3,247,142 bytes
> > > > with find_fast_cwd_pointer rewritten to use udis86:
> > > > 3,328,550 bytes
> > > >
> > > > (I know the second one isn't realistic, the linker could exclude unused
> > > > code, I was just kind of curious)
> > > >
> > > > This is with all the "translate to assembly text, intel or at&t syntax"
> > > > and "table of strings for opcodes" stuff removed to try to save space,
> > > > still a net increase of 82,432 bytes.
> > >
> > > > The DLL currently has a size of 3 Megs, optimized, stripped.  82K are
> > > two more allocation granularity slots, 51 instead of 49, about 2%.
> >
> > 4!  4%.  I said 4%, right?
> >
> > *facepalm*
> 
> I'll take that as "patches welcome" :)  I'd also like to take the
> opportunity to add ARM64 support based on my PoC, but I feel bad about
> dropping another blob of code into path.cc.  Would it make sense to rename
> to find_fast_cwd_pointer_x64, move it into a separate source file, and add

find_fast_cwd_pointer_x86_64

> another source file for find_fast_cwd_pointer_arm64?  Or I guess put both

find_fast_cwd_pointer_aarch64

We should use the official tags, not the Windows ones.  It's ok for the
uname output, but other than that...

> into a fastcwd.cc and #ifdef __x86_64__ the x64 variant (that will of
> course always be true at this point)?

A new fastcwd.cc would make sense.  Theoretically the aarch64 code
should go into an aarch64 subdir, just as with the x86_64 subdir,
but I guess this is handling the AMD64 on ARM64 emulation rather than
the native ARM64 mode?  Will there be any difference between the two
later on if we start supporting native ARM64?  If so, the name
of the function should probably express this.


Thanks,
Corinna
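
The size figures quoted in this thread can be checked with a few lines of C. This is only a sketch of the arithmetic: it assumes the usual 64 KiB Windows allocation granularity, and the names `slots_for` and `size_delta` are made up for illustration. File size is also not the same thing as the mapped image size, so the slot count is an approximation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Windows allocation granularity: 64 KiB. */
#define ALLOC_GRANULARITY 65536u

/* Stripped msys-2.0.dll sizes quoted in the thread. */
static const uint32_t size_before = 3246118u; /* start */
static const uint32_t size_after  = 3328550u; /* rewritten with udis86 */

/* Number of 64 KiB slots needed to cover `bytes`, rounded up. */
static uint32_t
slots_for (uint32_t bytes)
{
  return (bytes + ALLOC_GRANULARITY - 1) / ALLOC_GRANULARITY;
}

/* Net growth of the DLL in bytes. */
static uint32_t
size_delta (void)
{
  return size_after - size_before;
}
```

`size_delta ()` gives the 82,432 bytes mentioned above, and `slots_for (size_delta ())` rounds that up to two slots. Two extra slots on top of 49 is roughly 4%, which is the corrected figure in the reply.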


[PATCH 0/2] Fix CI testsuite run with 3.6

2025-03-21 Thread Jon Turney
I think there's been some changes in the way we compute the ACL for files we
create, which is causing a couple of tests to fail in CI.

Get rid of inheritable permissions, so filemodes follow the simple behaviour
(just controlled by umask) that tests expect.

(It seems like there must be something wrong with the contortions we go
through to run the testsuite against the just-built DLL, as otherwise we'd
have noticed these failures earlier?)

Jon Turney (2):
  CI: Remove inheritable permissions from working directory
  Revert "Cygwin: CI: XFAIL umask03"

 .github/workflows/cygwin.yml | 3 +++
 winsup/testsuite/Makefile.am | 6 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

-- 
2.45.1
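
The "simple behaviour" the tests rely on is that a newly created file gets the requested mode with the umask bits cleared. A minimal POSIX sketch of that expectation follows. The helper name `mode_after_umask` and the `/tmp` path are hypothetical, and it assumes a directory without inheritable or default ACLs, which is exactly what the `icacls` step is meant to guarantee in CI.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a scratch file with `requested` mode under umask `mask` and
   return the resulting permission bits, or -1 on error. */
static int
mode_after_umask (mode_t mask, mode_t requested)
{
  char path[64];
  struct stat st;

  /* Hypothetical scratch location, unique per process. */
  snprintf (path, sizeof path, "/tmp/umask-sketch-%ld", (long) getpid ());

  mode_t old = umask (mask);
  int fd = open (path, O_CREAT | O_EXCL | O_WRONLY, requested);
  umask (old);                  /* restore the caller's umask */
  if (fd < 0)
    return -1;

  int rc = fstat (fd, &st);
  close (fd);
  unlink (path);
  return rc == 0 ? (int) (st.st_mode & 0777) : -1;
}
```

With umask 022 and a requested mode of 0666, the expected result is 0644. When the working directory carries inheritable permissions, the computed ACL can map back to a different mode, which would explain a umask test failing only in CI.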



[PATCH 0/4] find_fast_cwd_pointer rewrite

2025-03-21 Thread Jeremy Drake via Cygwin-patches
The second patch of this series might be a little difficult to deal
with, but I included a diff of the changes from the upstream
udis86-1.7.2 tarball (retrieved from
https://downloads.sourceforge.net/udis86/udis86-1.7.2.tar.gz),
and I'm copying it again here.

diff -ur udis86-1.7.2/libudis86/decode.c udis86/decode.c
--- udis86-1.7.2/libudis86/decode.c
+++ udis86/decode.c
@@ -23,8 +23,9 @@
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
  * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
-#include "udint.h"
+#include "winsup.h"
 #include "types.h"
+#include "udint.h"
 #include "decode.h"

 #ifndef __UD_STANDALONE__
@@ -204,7 +205,7 @@
 decode_prefixes(struct ud *u)
 {
   int done = 0;
-  uint8_t curr, last = 0;
+  uint8_t curr = 0, last = 0;
   UD_RETURN_ON_ERROR(u);

   do {
@@ -653,12 +654,12 @@
   break;
 case OP_F:
   u->br_far  = 1;
-  /* intended fall through */
+  fallthrough;
 case OP_M:
   if (MODRM_MOD(modrm(u)) == 3) {
 UDERR(u, "expected modrm.mod != 3\n");
   }
-  /* intended fall through */
+  fallthrough;
 case OP_E:
   decode_modrm_rm(u, operand, REGCLASS_GPR, size);
   break;
@@ -677,7 +678,7 @@
   if (MODRM_MOD(modrm(u)) != 3) {
 UDERR(u, "expected modrm.mod == 3\n");
   }
-  /* intended fall through */
+  fallthrough;
 case OP_Q:
   decode_modrm_rm(u, operand, REGCLASS_MMX, size);
   break;
@@ -688,7 +689,7 @@
   if (MODRM_MOD(modrm(u)) != 3) {
 UDERR(u, "expected modrm.mod == 3\n");
   }
-  /* intended fall through */
+  fallthrough;
 case OP_W:
   decode_modrm_rm(u, operand, REGCLASS_XMM, size);
   break;
diff -ur udis86-1.7.2/libudis86/decode.h udis86/decode.h
--- udis86-1.7.2/libudis86/decode.h
+++ udis86/decode.h
@@ -183,8 +183,8 @@
   return (primary_opcode & 0x02) != 0;
 }

-extern struct ud_itab_entry ud_itab[];
-extern struct ud_lookup_table_list_entry ud_lookup_table_list[];
+extern const struct ud_itab_entry ud_itab[];
+extern const struct ud_lookup_table_list_entry ud_lookup_table_list[];

 #endif /* UD_DECODE_H */

diff -ur udis86-1.7.2/libudis86/extern.h udis86/extern.h
--- udis86-1.7.2/libudis86/extern.h
+++ udis86/extern.h
@@ -60,9 +60,11 @@

 extern unsigned int ud_disassemble(struct ud*);

+#ifndef __INSIDE_CYGWIN__
 extern void ud_translate_intel(struct ud*);

 extern void ud_translate_att(struct ud*);
+#endif /* __INSIDE_CYGWIN__ */

 extern const char* ud_insn_asm(const struct ud* u);

@@ -82,7 +84,9 @@

 extern enum ud_mnemonic_code ud_insn_mnemonic(const struct ud *u);

+#ifndef __INSIDE_CYGWIN__
 extern const char* ud_lookup_mnemonic(enum ud_mnemonic_code c);
+#endif /* __INSIDE_CYGWIN__ */

 extern void ud_set_user_opaque_data(struct ud*, void*);

diff -ur udis86-1.7.2/libudis86/itab.c udis86/itab.c
--- udis86-1.7.2/libudis86/itab.c
+++ udis86/itab.c
@@ -1,4 +1,5 @@
 /* itab.c -- generated by udis86:scripts/ud_itab.py, do no edit */
+#include "winsup.h"
 #include "decode.h"

 #define GROUP(n) (0x8000 | (n))
@@ -5028,7 +5029,7 @@
 };


-struct ud_lookup_table_list_entry ud_lookup_table_list[] = {
+const struct ud_lookup_table_list_entry ud_lookup_table_list[] = {
 /* 000 */ { ud_itab__0, UD_TAB__OPC_TABLE, "table0" },
 /* 001 */ { ud_itab__1, UD_TAB__OPC_MODE, "/m" },
 /* 002 */ { ud_itab__2, UD_TAB__OPC_MODE, "/m" },
@@ -6294,7 +6295,7 @@
 #define O_sIv { OP_sI,   SZ_V }
 #define O_sIz { OP_sI,   SZ_Z }

-struct ud_itab_entry ud_itab[] = {
+const struct ud_itab_entry ud_itab[] = {
   /* 0000 */ { UD_Iinvalid, O_NONE, O_NONE, O_NONE, P_none },
   /* 0001 */ { UD_Iadd, O_Eb, O_Gb, O_NONE, P_aso|P_rexr|P_rexx|P_rexb },
   /* 0002 */ { UD_Iadd, O_Ev, O_Gv, O_NONE, P_aso|P_oso|P_rexw|P_rexr|P_rexx|P_rexb },
@@ -7749,6 +7750,7 @@
 };


+#ifndef __INSIDE_CYGWIN__
 const char * ud_mnemonics_str[] = {
 "invalid",
 "3dnow",
@@ -8399,3 +8401,4 @@
 "movbe",
 "crc32"
 };
+#endif /* __INSIDE_CYGWIN__ */
diff -ur udis86-1.7.2/libudis86/itab.h udis86/itab.h
--- udis86-1.7.2/libudis86/itab.h
+++ udis86/itab.h
@@ -673,6 +673,8 @@
 UD_MAX_MNEMONIC_CODE
 } UD_ATTR_PACKED;

+#ifndef __INSIDE_CYGWIN__
 extern const char * ud_mnemonics_str[];
+#endif /* __INSIDE_CYGWIN__ */

 #endif /* UD_ITAB_H */
Only in udis86-1.7.2/libudis86/: Makefile.am
Only in udis86-1.7.2/libudis86/: Makefile.in
Only in udis86-1.7.2/libudis86/: syn.c
Only in udis86-1.7.2/libudis86/: syn.h
Only in udis86-1.7.2/libudis86/: syn-att.c
Only in udis86-1.7.2/libudis86/: syn-intel.c
diff -ur udis86-1.7.2/libudis86/types.h udis86/types.h
--- udis86-1.7.2/libudis86/types.h
+++ udis86/types.h
@@ -36,6 +36,14 @@
 #endif
 #endif /* __KERNEL__ */

+#ifdef __INSIDE_CYGWIN__
+# include 
+# ifndef __UD_STANDALONE__
+#  define __UD_STANDALONE__ 1
+# endif
+#endif /* __INSIDE_CYGWIN__ */
+
+
 #if defined(_MSC_VER) || defined(__BORLANDC__)
 # include 
 # include 
@@ -221,8 +229,8 @@
   uint8_t   mo

[PATCH 1/4] Cygwin: factor out find_fast_cwd_pointer to arch-specific file

2025-03-21 Thread Jeremy Drake via Cygwin-patches
From: Jeremy Drake 

This is in preparation for rewriting it using udis86, and adding an
implementation for aarch64 hosts.

Signed-off-by: Jeremy Drake 
---
 winsup/cygwin/Makefile.am  |   1 +
 winsup/cygwin/path.cc  | 122 +--
 winsup/cygwin/x86_64/fastcwd_x86_64.cc | 128 +
 3 files changed, 134 insertions(+), 117 deletions(-)
 create mode 100644 winsup/cygwin/x86_64/fastcwd_x86_64.cc

diff --git a/winsup/cygwin/Makefile.am b/winsup/cygwin/Makefile.am
index d47a1a2d11..9ede4249e3 100644
--- a/winsup/cygwin/Makefile.am
+++ b/winsup/cygwin/Makefile.am
@@ -52,6 +52,7 @@ LIB_NAME=libcygwin.a
 if TARGET_X86_64
 TARGET_FILES= \
x86_64/bcopy.S \
+   x86_64/fastcwd_x86_64.cc \
x86_64/memchr.S \
x86_64/memcpy.S \
x86_64/memmove.S \
diff --git a/winsup/cygwin/path.cc b/winsup/cygwin/path.cc
index d2aaed3143..3a5e2ee07e 100644
--- a/winsup/cygwin/path.cc
+++ b/winsup/cygwin/path.cc
@@ -4490,122 +4490,10 @@ fcwd_access_t::SetDirHandleFromBufferPointer (PWCHAR buf_p, HANDLE dir)
   f_cwd->DirectoryHandle = dir;
 }

-/* This function scans the code in ntdll.dll to find the address of the
-   global variable used to access the CWD.  While the pointer is global,
-   it's not exported from the DLL, unfortunately.  Therefore we have to
-   use some knowledge to figure out the address. */
-
-#define peek32(x)  (*(int32_t *)(x))
-
-static fcwd_access_t **
-find_fast_cwd_pointer ()
-{
-  /* Fetch entry points of relevant functions in ntdll.dll. */
-  HMODULE ntdll = GetModuleHandle ("ntdll.dll");
-  if (!ntdll)
-return NULL;
-  const uint8_t *get_dir = (const uint8_t *)
-  GetProcAddress (ntdll, "RtlGetCurrentDirectory_U");
-  const uint8_t *ent_crit = (const uint8_t *)
-   GetProcAddress (ntdll, "RtlEnterCriticalSection");
-  if (!get_dir || !ent_crit)
-return NULL;
-  /* Search first relative call instruction in RtlGetCurrentDirectory_U. */
-  const uint8_t *rcall = (const uint8_t *) memchr (get_dir, 0xe8, 80);
-  if (!rcall)
-return NULL;
-  /* Fetch offset from instruction and compute address of called function.
- This function actually fetches the current FAST_CWD instance and
- performs some other actions, not important to us. */
-  const uint8_t *use_cwd = rcall + 5 + peek32 (rcall + 1);
-  /* Next we search for the locking mechanism and perform a sanity check.
- On Pre-Windows 8 we basically look for the RtlEnterCriticalSection call.
- Windows 8 does not call RtlEnterCriticalSection.  The code manipulates
- the FastPebLock manually, probably because RtlEnterCriticalSection has
- been converted to an inline function.  Either way, we test if the code
- uses the FastPebLock. */
-  const uint8_t *movrbx;
-  const uint8_t *lock = (const uint8_t *)
-memmem ((const char *) use_cwd, 80,
-"\xf0\x0f\xba\x35", 4);
-  if (lock)
-{
-  /* The lock instruction tweaks the LockCount member, which is not at
-the start of the PRTL_CRITICAL_SECTION structure.  So we have to
-subtract the offset of LockCount to get the real address. */
-  PRTL_CRITICAL_SECTION lockaddr =
-(PRTL_CRITICAL_SECTION) (lock + 9 + peek32 (lock + 4)
- - offsetof (RTL_CRITICAL_SECTION, LockCount));
-  /* Test if lock address is FastPebLock. */
-  if (lockaddr != NtCurrentTeb ()->Peb->FastPebLock)
-return NULL;
-  /* Search `mov rel(%rip),%rbx'.  This is the instruction fetching the
- address of the current fcwd_access_t pointer, and it should be pretty
-near to the locking stuff. */
-  movrbx = (const uint8_t *) memmem ((const char *) lock, 40,
- "\x48\x8b\x1d", 3);
-}
-  else
-{
-  /* Usually the callq RtlEnterCriticalSection follows right after
-fetching the lock address. */
-  int call_rtl_offset = 7;
-  /* Search `lea rel(%rip),%rcx'.  This loads the address of the lock into
- %rcx for the subsequent RtlEnterCriticalSection call. */
-  lock = (const uint8_t *) memmem ((const char *) use_cwd, 80,
-   "\x48\x8d\x0d", 3);
-  if (!lock)
-   {
- /* Windows 8.1 Preview calls `lea rel(rip),%r12' then some unrelated
-ops, then `mov %r12,%rcx', then `callq RtlEnterCriticalSection'. */
- lock = (const uint8_t *) memmem ((const char *) use_cwd, 80,
-  "\x4c\x8d\x25", 3);
- call_rtl_offset = 14;
-   }
-
-  if (!lock)
-   {
- /* A recent Windows 11 Preview calls `lea rel(rip),%r13' then
-some unrelated instructions, then `callq RtlEnterCriticalSection'.
-*/
- lock = (const uint8_t *) memmem ((const char *) use_cwd, 80,
-  "\x4c\x

[PATCH 3/4] Cygwin: use udis86 to find fast cwd pointer on x64

2025-03-21 Thread Jeremy Drake via Cygwin-patches
From: Jeremy Drake 

This makes find_fast_cwd_pointer more resilient in the face of changes
to the generated code in ntdll.

Signed-off-by: Jeremy Drake 
---
 winsup/cygwin/x86_64/fastcwd_x86_64.cc | 191 ++---
 1 file changed, 111 insertions(+), 80 deletions(-)

diff --git a/winsup/cygwin/x86_64/fastcwd_x86_64.cc b/winsup/cygwin/x86_64/fastcwd_x86_64.cc
index 7765812f00..4d1af65d56 100644
--- a/winsup/cygwin/x86_64/fastcwd_x86_64.cc
+++ b/winsup/cygwin/x86_64/fastcwd_x86_64.cc
@@ -7,10 +7,17 @@
   details. */

 #include "winsup.h"
+#include "udis86/types.h"
+#include "udis86/extern.h"

 class fcwd_access_t;

-#define peek32(x)  (*(int32_t *)(x))
+static inline const void *
+rip_rel_offset (const ud_t *ud_obj, const ud_operand_t *opr, int additional=0)
+{
+  return (const void *) (ud_insn_off (ud_obj) + ud_insn_len (ud_obj) +
+opr->lval.sdword + additional);
+}

 /* This function scans the code in ntdll.dll to find the address of the
global variable used to access the CWD.  While the pointer is global,
@@ -30,99 +37,123 @@ find_fast_cwd_pointer_x86_64 ()
GetProcAddress (ntdll, "RtlEnterCriticalSection");
   if (!get_dir || !ent_crit)
 return NULL;
+  ud_t ud_obj;
+  ud_init (&ud_obj);
+  ud_set_mode (&ud_obj, 64);
+  ud_set_input_buffer (&ud_obj, get_dir, 80);
+  ud_set_pc (&ud_obj, (const uint64_t) get_dir);
+  const ud_operand_t *opr;
   /* Search first relative call instruction in RtlGetCurrentDirectory_U. */
-  const uint8_t *rcall = (const uint8_t *) memchr (get_dir, 0xe8, 80);
-  if (!rcall)
+  const uint8_t *use_cwd = NULL;
+  while (ud_disassemble (&ud_obj))
+{
+  if (ud_insn_mnemonic (&ud_obj) == UD_Icall)
+   {
+ opr = ud_insn_opr (&ud_obj, 0);
+ if (opr->type == UD_OP_JIMM && opr->size == 32)
+   {
+ /* Fetch offset from instruction and compute address of called
+function.  This function actually fetches the current FAST_CWD
+instance and performs some other actions, not important to us.
+  */
+ use_cwd = (const uint8_t *) rip_rel_offset (&ud_obj, opr);
+ break;
+   }
+   }
+}
+  if (!use_cwd)
 return NULL;
-  /* Fetch offset from instruction and compute address of called function.
- This function actually fetches the current FAST_CWD instance and
- performs some other actions, not important to us. */
-  const uint8_t *use_cwd = rcall + 5 + peek32 (rcall + 1);
+  ud_set_input_buffer (&ud_obj, use_cwd, 120);
+  ud_set_pc (&ud_obj, (const uint64_t) use_cwd);
+
   /* Next we search for the locking mechanism and perform a sanity check.
- On Pre-Windows 8 we basically look for the RtlEnterCriticalSection call.
- Windows 8 does not call RtlEnterCriticalSection.  The code manipulates
- the FastPebLock manually, probably because RtlEnterCriticalSection has
- been converted to an inline function.  Either way, we test if the code
- uses the FastPebLock. */
-  const uint8_t *movrbx;
-  const uint8_t *lock = (const uint8_t *)
-memmem ((const char *) use_cwd, 80,
-"\xf0\x0f\xba\x35", 4);
-  if (lock)
+ On Pre- (or Post-) Windows 8 we basically look for the
+ RtlEnterCriticalSection call.  Windows 8 does not call
+ RtlEnterCriticalSection.  The code manipulates the FastPebLock manually,
+ probably because RtlEnterCriticalSection has been converted to an inline
+ function.  Either way, we test if the code uses the FastPebLock. */
+  PRTL_CRITICAL_SECTION lockaddr = NULL;
+
+  /* both cases have an `lea rel(%rip)` on the lock */
+  while (ud_disassemble (&ud_obj))
 {
-  /* The lock instruction tweaks the LockCount member, which is not at
-the start of the PRTL_CRITICAL_SECTION structure.  So we have to
-subtract the offset of LockCount to get the real address. */
-  PRTL_CRITICAL_SECTION lockaddr =
-(PRTL_CRITICAL_SECTION) (lock + 9 + peek32 (lock + 4)
- - offsetof (RTL_CRITICAL_SECTION, LockCount));
-  /* Test if lock address is FastPebLock. */
-  if (lockaddr != NtCurrentTeb ()->Peb->FastPebLock)
-return NULL;
-  /* Search `mov rel(%rip),%rbx'.  This is the instruction fetching the
- address of the current fcwd_access_t pointer, and it should be pretty
-near to the locking stuff. */
-  movrbx = (const uint8_t *) memmem ((const char *) lock, 40,
- "\x48\x8b\x1d", 3);
+  if (ud_insn_mnemonic (&ud_obj) == UD_Ilea)
+   {
+ /* this seems to follow intel syntax, in that operand 0 is the
+dest and 1 is the src */
+ opr = ud_insn_opr (&ud_obj, 1);
+ if (opr->type == UD_OP_MEM && opr->base == UD_R_RIP &&
+ opr->index == UD_NONE && opr->scale == 0 && opr->offset == 32)
+   {
+ lock
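
Both the old `rcall + 5 + peek32 (rcall + 1)` expression and the new `rip_rel_offset` helper compute the same thing: the address of the next instruction plus a signed 32-bit displacement. A standalone sketch of that calculation for a `call rel32` (opcode 0xe8) is shown here. It does not use udis86, and the names `read_disp32` and `call_rel32_target` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble the little-endian 32-bit displacement byte by byte, so the
   result does not depend on host endianness or alignment. */
static int32_t
read_disp32 (const uint8_t *p)
{
  uint32_t u = (uint32_t) p[0]
	       | ((uint32_t) p[1] << 8)
	       | ((uint32_t) p[2] << 16)
	       | ((uint32_t) p[3] << 24);
  return (int32_t) u;
}

/* Target of `call rel32`: address of the following instruction plus the
   sign-extended displacement.  `insn` points at the instruction bytes,
   `insn_addr` is the address those bytes execute at. */
static uint64_t
call_rel32_target (const uint8_t *insn, uint64_t insn_addr)
{
  assert (insn[0] == 0xe8);
  const uint64_t insn_len = 5;	/* opcode byte + 4 displacement bytes */
  return insn_addr + insn_len + (uint64_t) (int64_t) read_disp32 (insn + 1);
}
```

The advantage of using a real decoder, as the patch does, is that the opcode byte is only matched at instruction boundaries. A `memchr` for 0xe8 can also hit a byte inside another instruction's operand.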

[PATCH 4/4] Cygwin: add find_fast_cwd_pointer_aarch64.

2025-03-21 Thread Jeremy Drake via Cygwin-patches
From: Jeremy Drake 

This works for aarch64 hosts when the target is aarch64, x86_64, or i686,
with only a small #if block in one function that needs to care.

Signed-off-by: Jeremy Drake 
---
 winsup/cygwin/Makefile.am|   1 +
 winsup/cygwin/fastcwd_aarch64.cc | 185 +++
 winsup/cygwin/path.cc|  27 -
 3 files changed, 207 insertions(+), 6 deletions(-)
 create mode 100644 winsup/cygwin/fastcwd_aarch64.cc

diff --git a/winsup/cygwin/Makefile.am b/winsup/cygwin/Makefile.am
index 8ecf25d343..649617ab30 100644
--- a/winsup/cygwin/Makefile.am
+++ b/winsup/cygwin/Makefile.am
@@ -299,6 +299,7 @@ DLL_FILES= \
exceptions.cc \
exec.cc \
external.cc \
+   fastcwd_aarch64.cc \
fcntl.cc \
fenv.c \
flock.cc \
diff --git a/winsup/cygwin/fastcwd_aarch64.cc b/winsup/cygwin/fastcwd_aarch64.cc
new file mode 100644
index 0000000000..c1a9c73536
--- /dev/null
+++ b/winsup/cygwin/fastcwd_aarch64.cc
@@ -0,0 +1,185 @@
+/* fastcwd_aarch64.cc: find the fast cwd pointer on aarch64 hosts.
+
+  This file is part of Cygwin.
+
+  This software is a copyrighted work licensed under the terms of the
+  Cygwin license.  Please consult the file "CYGWIN_LICENSE" for
+  details. */
+
+/* You might well wonder why this file is not in an aarch64 target-specific
+   directory, like fastcwd_x86_64.cc.  It turns out that this code works when
+   built for i686, x86_64, or aarch64 with just the small #if/#elif block in
+   GetArm64ProcAddress below caring which. */
+
+#include "winsup.h"
+#include "assert.h"
+
+class fcwd_access_t;
+
+static LPCVOID
+GetArm64ProcAddress (HMODULE hModule, LPCSTR procname)
+{
+  const BYTE * proc = (const BYTE *) GetProcAddress (hModule, procname);
+#if defined (__aarch64__)
+  return proc;
+#else
+#if defined(__i386__)
+  static const BYTE thunk[] = "\x8b\xff\x55\x8b\xec\x5d\x90\xe9";
+#elif defined(__x86_64__)
+  /* see
+     https://learn.microsoft.com/en-us/windows/arm/arm64ec-abi#fast-forward-sequences */
+  static const BYTE thunk[] = "\x48\x8b\xc4\x48\x89\x58\x20\x55\x5d\xe9";
+#else
+#error "Unhandled architecture for thunk detection"
+#endif
+  if (memcmp (proc, thunk, sizeof (thunk) - 1) == 0)
+{
+  proc += sizeof (thunk) - 1;
+  proc += 4 + *(const int32_t *) proc;
+}
+  return proc;
+#endif
+}
+
+#define IS_INSN(pc, name) ((*(pc) & name##_mask) == name##_id)
+static const uint32_t add_id = 0x11000000;
+static const uint32_t add_mask = 0x7fc00000;
+static const uint32_t adrp_id = 0x90000000;
+static const uint32_t adrp_mask = 0x9f000000;
+static const uint32_t bl_id = 0x94000000;
+static const uint32_t bl_mask = 0xfc000000;
+/* matches both cbz and cbnz */
+static const uint32_t cbz_id = 0x34000000;
+static const uint32_t cbz_mask = 0x7e000000;
+static const uint32_t ldr_id = 0xb9400000;
+static const uint32_t ldr_mask = 0xbfc00000;
+
+static inline LPCVOID
+extract_bl_target (const uint32_t * pc)
+{
+  assert (IS_INSN (pc, bl));
+  int32_t offset = *pc & ~bl_mask;
+  /* sign extend */
+  if (offset & (1 << 25))
+offset |= bl_mask;
+  /* Note uint32_t * arithmetic will implicitly multiply the offset by 4 */
+  return pc + offset;
+}
+
+static inline uint64_t
+extract_adrp_address (const uint32_t * pc)
+{
+  assert (IS_INSN (pc, adrp));
+  uint64_t adrp_base = (uint64_t) pc & ~0xFFF;
+  int64_t  adrp_imm = (*pc >> (5+19+5)) & 0x3;
+  adrp_imm |= ((*pc >> 5) & 0x7FFFF) << 2;
+  /* sign extend */
+  if (adrp_imm & (1 << 20))
+adrp_imm |= ~((1 << 21) - 1);
+  adrp_imm <<= 12;
+  return adrp_base + adrp_imm;
+}
+
+/* This function scans the code in ntdll.dll to find the address of the
+   global variable used to access the CWD.  While the pointer is global,
+   it's not exported from the DLL, unfortunately.  Therefore we have to
+   use some knowledge to figure out the address. */
+
+fcwd_access_t **
+find_fast_cwd_pointer_aarch64 ()
+{
+  LPCVOID proc = GetArm64ProcAddress (GetModuleHandle ("ntdll"),
+ "RtlGetCurrentDirectory_U");
+  const uint32_t *start = (uint32_t *) proc;
+  const uint32_t *pc = start;
+  /* find the call to RtlpReferenceCurrentDirectory, and get its address */
+  for (; pc < start + 20; pc++)
+{
+  if (IS_INSN (pc, bl))
+   {
+ proc = extract_bl_target (pc);
+ break;
+   }
+}
+  if (proc == start)
+return NULL;
+
+  start = pc = (uint32_t *) proc;
+
+  const uint32_t *ldrpc = NULL;
+  uint32_t ldroffset, ldrsz;
+  uint32_t ldrrn, ldrrd;
+
+  /* find the ldr (immediate unsigned offset) for RtlpCurDirRef */
+  for (; pc < start + 20; pc++)
+{
+  if (IS_INSN (pc, ldr))
+   {
+ ldrpc = pc;
+ ldrsz = (*pc & 0x40000000);
+ ldroffset = (*pc >> (5+5)) & 0xFFF;
+ ldroffset <<= ldrsz ? 3 : 2;
+ ldrrn = (*pc >> 5) & 0x1F;
+ ldrrd = *pc & 0x1F;
+ break;
+   }
+}
+  if (ldrpc == NULL)
+return NULL;
+
+  /* the next instruct
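
For reference, the branch and page-address decoding used by `extract_bl_target` and `extract_adrp_address` can be exercised on plain integers. The following is a sketch based on the A64 encodings (BL: 26-bit word offset in bits 25:0; ADRP: immlo in bits 30:29, immhi in bits 23:5). The function names `bl_byte_offset` and `adrp_target` are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset encoded in an A64 BL instruction (imm26, scaled by 4). */
static int64_t
bl_byte_offset (uint32_t insn)
{
  assert ((insn & 0xfc000000u) == 0x94000000u);
  int64_t imm26 = insn & 0x03ffffffu;
  if (imm26 & (1 << 25))	/* sign extend from bit 25 */
    imm26 -= (1 << 26);
  return imm26 * 4;
}

/* Page address computed by an A64 ADRP instruction located at `pc`. */
static uint64_t
adrp_target (uint32_t insn, uint64_t pc)
{
  assert ((insn & 0x9f000000u) == 0x90000000u);
  int64_t imm = (insn >> 29) & 0x3;			/* immlo, 2 bits */
  imm |= (int64_t) ((insn >> 5) & 0x7ffff) << 2;	/* immhi, 19 bits */
  if (imm & (1 << 20))		/* sign extend from bit 20 */
    imm -= (1 << 21);
  /* The immediate counts 4 KiB pages relative to the page of pc. */
  return (pc & ~(uint64_t) 0xfff) + (uint64_t) (imm * 4096);
}
```

For example, `0x94000002` is a BL two instructions ahead (offset 8), and `0xb0000000` is an ADRP with an immediate of one page, so at pc `0x400123` it yields `0x401000`.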