Hi Corinna,
I know you're on vacation, but I'm writing this now while it's fresh in
my mind. We can discuss it when you return.
This is a follow-up to
https://cygwin.com/pipermail/cygwin-developers/2024-December/012715.html
but I'm starting a new thread for clarity, and I'll start from scratch.
I'm back to thinking that mmap_is_attached_or_noreserve is broken. I'm
attaching two test cases to show this. To run them, create a file
hello.txt with one line, and then compile and run the two programs.
They both take a region of size 128K and map the two halves of it
separately with MAP_NORESERVE. The only difference between the two
cases is the order in which the halves are mapped. They then call
read(), passing the whole region of size 128K as buffer. If
mmap_is_attached_or_noreserve is working, the calls to read should
succeed. But only the one in mmap_noreserve2 succeeds, and the other
fails with a bus error. I'll now try to explain what's going on.
The task is to find one or more noreserve records that completely cover
the interval [addr, addr+len].
After a successful call to match(start_addr, len, u_addr, u_len), we
have the following situation:
|---------|---------------------|-----------------|
start_addr u_addr u_addr+u_len start_addr+len
The interval [u_addr, u_addr+u_len] is the intersection of [start_addr,
start_addr+len] with the interval of the mmap_record. Note that u_addr
>= start_addr and u_addr+u_len <= start_addr+len, but I've shown in the
picture the case where both inequalities are strict.
At this point we know that the intersection is contained in a noreserve
mmap_record. But we commit the entire interval [start_addr,
u_addr+u_len], whose length is commit_len, which is written as
u_len - (start_addr - u_addr) = (u_addr + u_len) - start_addr.
There's no problem if u_addr = start_addr. But if u_addr > start_addr,
then the commit includes [start_addr, u_addr], which is not at this
point known to be contained in a noreserve mmap_record. Moreover, even
if it is contained in a noreserve record that we just haven't come to
yet, experiments show that the call to VirtualAlloc in line 794 fails in
this case. Apparently you can't commit in one call to VirtualAlloc a
region that was reserved [Windows terminology] in two separate calls. I
can't find this documented anywhere, but that's what happens in the
attached mmap_noreserve1.c, as I verified by stepping through the code
under gdb.
As I said in an earlier message, I don't see how to fix this except by
making sure that the noreserve maps in map_list occur in order of their
starting address. Then if we ever find u_addr > start_addr, we know
we'll never find a noreserve mmap_record covering [start_addr, u_addr],
and we can break out of the loop. If you agree that there's a problem,
does this seem like the right fix?
I apologize for the length of this message, but I wanted to write
everything out in detail in order to be clear. I've written a patch
(also attached) that attempts to do does what I suggested in the
previous paragraph. Unfortunately, when I install it, mintty hangs on
startup without showing the bash prompt, and a grep process is left
running. So there must be something wrong with my patch, or else it
triggers a latent bug somewhere else.
Takashi, if you're reading this, could you take a look at my patch and
see if you spot any obvious errors? Or do you think this could be
related to your recent changes?
Ken
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
const size_t pagesize = 64 * 1024;
int
main ()
{
/* Map an interval of size 128K, then unmap it. This is just to
find suitable address space for the next calls to mmap. */
void *mem = mmap (NULL, 2 * pagesize, PROT_NONE,
MAP_PRIVATE | MAP_ANON, -1, 0);
if (mem == MAP_FAILED)
{
perror ("mmap");
exit (1);
}
if (munmap (mem, 2 * pagesize) == -1)
{
perror ("munmap");
exit (1);
}
/* Map the two halves of the 128K interval separately with
MAP_NORESERVE, starting with the first half. */
void *mem1 = mmap (mem, pagesize, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANON | MAP_NORESERVE
| MAP_FIXED, -1, 0);
if (mem1 == MAP_FAILED)
{
perror ("mmap");
exit (2);
}
void *mem2 = mmap (mem + pagesize, pagesize, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANON | MAP_NORESERVE
| MAP_FIXED, -1, 0);
if (mem2 == MAP_FAILED)
{
perror ("mmap");
exit (3);
}
/* Now try to access the region, by giving fhandler_base::raw_read
the whole region of size 128K as buffer. */
/* hello.txt is a file containing "hello". */
int fd = open ("hello.txt", O_RDONLY);
if (fd < 0)
{
perror ("open");
exit (1);
}
if (read (fd, mem1, 2 * pagesize) == -1)
{
perror ("read");
exit (1);
}
printf ("The region starting at mem1 contains %s\n", (char *) mem1);
if (read (fd, mem2, pagesize) == -1)
{
perror ("read");
exit (1);
}
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
const size_t pagesize = 64 * 1024;
int
main ()
{
/* Map an interval of size 128K, then unmap it. This is just to
find suitable address space for the next calls to mmap. */
void *mem = mmap (NULL, 2 * pagesize, PROT_NONE,
MAP_PRIVATE | MAP_ANON, -1, 0);
if (mem == MAP_FAILED)
{
perror ("mmap");
exit (1);
}
if (munmap (mem, 2 * pagesize) == -1)
{
perror ("munmap");
exit (1);
}
/* Map the two halves of the 128K interval separately with
MAP_NORESERVE, starting with the second half. */
void *mem2 = mmap (mem + pagesize, pagesize,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANON | MAP_NORESERVE
| MAP_FIXED, -1, 0);
if (mem2 == MAP_FAILED)
{
perror ("mmap");
exit (3);
}
void *mem1 = mmap (mem, pagesize, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANON | MAP_NORESERVE
| MAP_FIXED, -1, 0);
if (mem1 == MAP_FAILED)
{
perror ("mmap");
exit (2);
}
/* Now try to access the region by giving fhandler_base::raw_read
the whole region of size 128K as buffer. */
/* hello.txt is a file containing "hello". */
int fd = open ("hello.txt", O_RDONLY);
if (fd < 0)
{
perror ("open");
exit (1);
}
if (read (fd, mem1, 2 * pagesize) == -1)
{
perror ("read");
exit (1);
}
printf ("The region starting at mem1 contains %s\n", (char *) mem1);
if (read (fd, mem2, pagesize) == -1)
{
perror ("read");
exit (1);
}
}
From e40aa465e274d1d80a2f7676df969506844dab6b Mon Sep 17 00:00:00 2001
From: Ken Brown <kbr...@cornell.edu>
Date: Wed, 25 Dec 2024 12:35:41 -0500
Subject: [PATCH] WIP
---
winsup/cygwin/mm/mmap.cc | 94 ++++++++++++++++++++++++++++++----------
1 file changed, 70 insertions(+), 24 deletions(-)
diff --git a/winsup/cygwin/mm/mmap.cc b/winsup/cygwin/mm/mmap.cc
index 13418d782baf..653e8b13be34 100644
--- a/winsup/cygwin/mm/mmap.cc
+++ b/winsup/cygwin/mm/mmap.cc
@@ -580,18 +580,46 @@ mmap_record::free_fh (fhandler_base *fh)
delete fh;
}
+/* We want the list of anonymous mappings to have all of its noreserve
+ mappings at the beginning of the list and sorted by address. This
+ is used in mmap_is_attached_or_noreserve. */
mmap_record *
mmap_list::add_record (mmap_record &r)
{
- mmap_record *rec = (mmap_record *) ccalloc (HEAP_MMAP,
- sizeof (mmap_record)
- + MAPSIZE (PAGE_CNT (r.get_len ())) * sizeof (DWORD), 1);
- if (!rec)
+ mmap_record *newrec = (mmap_record *) ccalloc (HEAP_MMAP,
+ sizeof (mmap_record)
+ + MAPSIZE (PAGE_CNT (r.get_len ()))
+ * sizeof (DWORD), 1);
+ if (!newrec)
return NULL;
- rec->init_page_map (r);
+ newrec->init_page_map (r);
+
+ LIST_WRITE_LOCK ();
+ if (fd != -1 || LIST_EMPTY (&recs))
+ {
+ LIST_INSERT_HEAD (&recs, newrec, mr_next);
+ goto out;
+ }
- LIST_INSERT_HEAD (&recs, rec, mr_next);
- return rec;
+ /* Now we're working on the anonymous list, which is non-empty. */
+ mmap_record *rec, *prev;
+ LIST_FOREACH (rec, &recs, mr_next)
+ {
+ if (!rec->noreserve ()
+ || (newrec->noreserve ()
+ && newrec->get_address () < rec->get_address ()))
+ {
+ LIST_INSERT_BEFORE (rec, newrec, mr_next);
+ goto out;
+ }
+ prev = rec;
+ }
+ /* If we get here, newrec should be inserted at the end of the list,
+ and prev points to the last record. */
+ LIST_INSERT_AFTER (prev, newrec, mr_next);
+out:
+ LIST_WRITE_UNLOCK ();
+ return newrec;
}
void
@@ -738,18 +766,18 @@ is_mmapped_region (caddr_t start_addr, caddr_t
end_address)
/* This function is called from exception_handler when a segmentation
violation has occurred. It should also be called from all Cygwin
- functions that want to support passing noreserve mmap page addresses
- to Windows system calls. In that case, it should be called only after
- a system call indicates that the application buffer passed had an
- invalid virtual address to avoid any performance impact in non-noreserve
- cases.
+ functions that want to support passing noreserve (anonymous) mmap
+ page addresses to Windows system calls. In that case, it should be
+ called only after a system call indicates that the application
+ buffer passed had an invalid virtual address to avoid any
+ performance impact in non-noreserve cases.
Check if the address range is all within noreserve mmap regions. If so,
call VirtualAlloc to commit the pages and return MMAP_NORESERVE_COMMITED
- on success. If the page has __PROT_ATTACH (SUSv3 memory protection
- extension), or if VirtualAlloc fails, return MMAP_RAISE_SIGBUS.
- Otherwise, return MMAP_NONE if the address range is not covered by an
- attached or noreserve map.
+ on success. If some page in the address range has __PROT_ATTACH
+ (SUSv3 memory protection extension), or if VirtualAlloc fails,
+ return MMAP_RAISE_SIGBUS. Otherwise, return MMAP_NONE if the
+ address range is not covered by noreserve maps.
On MAP_NORESERVE_COMMITED, the exeception handler should return 0 to
allow the application to retry the memory access, or the calling Cygwin
@@ -777,6 +805,11 @@ mmap_is_attached_or_noreserve (void *addr, size_t len)
LIST_FOREACH (rec, &map_list->recs, mr_next)
{
+ if (!rec->noreserve ())
+ /* In view of the way the anonymous map list is sorted, there
+ are no nonreserve map records left in the list. See
+ mmap_list::add_record. */
+ break;
if (!rec->match (start_addr, len, u_addr, u_len))
continue;
if (rec->attached ())
@@ -784,22 +817,35 @@ mmap_is_attached_or_noreserve (void *addr, size_t len)
ret = MMAP_RAISE_SIGBUS;
break;
}
- if (!rec->noreserve ())
- break;
- size_t commit_len = u_len - (start_addr - u_addr);
- if (commit_len > len)
- commit_len = len;
+ /* We now have the following situation:
+
+ |---------|---------------------|-----------------|
+ start_addr u_addr u_addr+u_len start_addr+len
+
+ The interval [u_addr, u_addr+u_len] is the intersection of
+ [start_addr, start_addr+len] with the interval of the
+ mmap_record *rec. Note that u_addr >= start_addr and
+ u_addr+u_len <= start_addr+len. The picture shows the case
+ where both inequalities are strict.
+
+ If u_addr > start_addr, then [start_addr, u_addr] lies to the
+ left of the region of *rec. Again, the sorting of map_list
+ implies that [start_addr, u_addr] is not covered by noreserve
+ maps. */
+
+ if (u_addr > start_addr)
+ break;
- if (!VirtualAlloc (start_addr, commit_len, MEM_COMMIT,
+ if (!VirtualAlloc (start_addr, u_len, MEM_COMMIT,
rec->gen_protect ()))
{
ret = MMAP_RAISE_SIGBUS;
break;
}
- start_addr += commit_len;
- len -= commit_len;
+ start_addr += u_len;
+ len -= u_len;
if (!len)
{
ret = MMAP_NORESERVE_COMMITED;
--
2.45.1