[PATCH unionfs 3/3] Don't use strncat() with length derived from source string
This fixes Wstringop-overflow and Wstringop-truncation GCC warnings. See https://gcc.gnu.org/bugzilla//show_bug.cgi?id=88059 Also, fix a bug where a string was not properly null-terminated. --- lib.c | 4 ++-- stow.c | 5 +++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/lib.c b/lib.c index 01cdbd0..717979b 100644 --- a/lib.c +++ b/lib.c @@ -151,8 +151,8 @@ make_filepath (char *path, char *filename) if (filepath == NULL) return NULL; - strncpy (filepath, path, length); - strncat (filepath, filename, strlen (filename)); + strcpy (filepath, path); + strcat (filepath, filename); return filepath; } diff --git a/stow.c b/stow.c index 812d33b..adfcf53 100644 --- a/stow.c +++ b/stow.c @@ -282,14 +282,15 @@ stow_diradd (char *dir, int flags, struct patternlist *patternlist, { char *tmp; - tmp = (char *) malloc (dir_len + 1); + tmp = (char *) malloc (dir_len + 2); if (tmp == NULL) return ENOMEM; - strncpy (tmp, dir, dir_len); + strcpy (tmp, dir); tmp[dir_len] = '/'; + tmp[dir_len + 1] = 0; dir = tmp; } -- 2.31.1
[PATCH unionfs 2/3] Add some missing includes
This fixes multiple build errors. --- lib.c| 1 + lib.h| 1 + lnode.c | 1 + lnode.h | 1 + node.c | 1 + stow.c | 2 ++ ulfs.c | 1 + update.c | 1 + 8 files changed, 9 insertions(+) diff --git a/lib.c b/lib.c index e7f3aa9..01cdbd0 100644 --- a/lib.c +++ b/lib.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include "lib.h" diff --git a/lib.h b/lib.h index 095adaa..da01f4a 100644 --- a/lib.h +++ b/lib.h @@ -23,6 +23,7 @@ #include #include #include +#include /* Returned directory entries are aligned to blocks this many bytes long. Must be a power of two. */ diff --git a/lnode.c b/lnode.c index 427bc3c..4287489 100644 --- a/lnode.c +++ b/lnode.c @@ -21,6 +21,7 @@ nodes. */ #include +#include #include #include #include diff --git a/lnode.h b/lnode.h index f5a50f7..0df3047 100644 --- a/lnode.h +++ b/lnode.h @@ -24,6 +24,7 @@ #include #include +#include struct lnode { diff --git a/node.c b/node.c index 19228f6..78b5eb8 100644 --- a/node.c +++ b/node.c @@ -19,6 +19,7 @@ /* node management. */ +#include #include #include #include diff --git a/stow.c b/stow.c index a04ffdf..812d33b 100644 --- a/stow.c +++ b/stow.c @@ -21,7 +21,9 @@ /* Stow mode for unionfs. */ #include +#include #include +#include #include "ulfs.h" #include "lib.h" diff --git a/ulfs.c b/ulfs.c index 50affa6..773ebc4 100644 --- a/ulfs.c +++ b/ulfs.c @@ -19,6 +19,7 @@ /* Underlying filesystem management. */ +#include #include #include #include diff --git a/update.c b/update.c index 113b25c..f416d5e 100644 --- a/update.c +++ b/update.c @@ -21,6 +21,7 @@ root node update. */ #include +#include #include #include "ncache.h" -- 2.31.1
[PATCH unionfs 0/3] Fix unionfs build errors
Hello, I'm trying to use unionfs [0]; but I've run into a few build errors and warnings. With the following patches it builds and works on the latest Debian GNU/Hurd with GCC 10. P.S. This is the first time I attempt sending patches over email, please let me know if I've done anything wrong. [0]: https://www.gnu.org/software/hurd/hurd/translator/unionfs.html Sergey Bugaev (3): Implement MiG intran payload support Add some missing includes Don't use strncat() with length derived from source string lib.c| 5 +++-- lib.h| 1 + lnode.c | 1 + lnode.h | 1 + node.c | 1 + stow-mutations.h | 1 + stow-priv.h | 4 stow.c | 16 ++-- ulfs.c | 1 + update.c | 1 + 10 files changed, 28 insertions(+), 4 deletions(-) -- 2.31.1
[PATCH unionfs 1/3] Implement MiG intran payload support
This fixes a build error. --- stow-mutations.h | 1 + stow-priv.h | 4 stow.c | 9 + 3 files changed, 14 insertions(+) diff --git a/stow-mutations.h b/stow-mutations.h index d36280d..95c1e7f 100644 --- a/stow-mutations.h +++ b/stow-mutations.h @@ -21,6 +21,7 @@ /* Only CPP macro definitions should go in this file. */ #define FS_NOTIFY_INTRAN stow_notify_t begin_using_notify_port (fs_notify_t) +#define FS_NOTIFY_INTRAN_PAYLOAD stow_notify_t begin_using_notify_port_payload #define FS_NOTIFY_DESTRUCTOR end_using_notify_port (stow_notify_t) #define FS_NOTIFY_IMPORTS import "stow-priv.h"; diff --git a/stow-priv.h b/stow-priv.h index 2212ac9..eeae7b3 100644 --- a/stow-priv.h +++ b/stow-priv.h @@ -36,6 +36,10 @@ typedef struct stow_notify *stow_notify_t; arranges for this to happen for the fs_notify interfaces. */ stow_notify_t begin_using_notify_port (fs_notify_t port); +/* Called by MiG to translate ports into stow_notify_t when using the + protected payload feature. mutations.h arranges for this to happen + for the fs_notify interfaces. */ +stow_notify_t begin_using_notify_port_payload (unsigned long payload); /* Called by MiG after server routines have been run; this balances begin_using_notify_port, and is arranged for the fs_notify diff --git a/stow.c b/stow.c index cf6366e..a04ffdf 100644 --- a/stow.c +++ b/stow.c @@ -167,6 +167,15 @@ begin_using_notify_port (fs_notify_t port) return ports_lookup_port (stow_port_bucket, port, stow_port_class); } +/* Called by MiG to translate ports into stow_notify_t when using the + protected payload feature. mutations.h arranges for this to happen + for the fs_notify interfaces. */ +stow_notify_t +begin_using_notify_port_payload (unsigned long payload) +{ + return ports_lookup_payload (stow_port_bucket, payload, stow_port_class); +} + /* Called by MiG after server routines have been run; this balances begin_using_notify_port, and is arranged for the fs_notify interfaces by mutations.h. */ -- 2.31.1
Re: [PATCH unionfs 3/3] Don't use strncat() with length derived from source string
On Mon, Apr 26, 2021 at 9:02 PM Jessica Clarke wrote: > > - strncpy (filepath, path, length); > > - strncat (filepath, filename, strlen (filename)); > > + strcpy (filepath, path); > > + strcat (filepath, filename); > > This is dubious. We should be using safe interfaces where possible. To expand a bit on my (an GCC's) reasoning, it makes no sense to use strncpy/strncat with length that is derived from the *source* string length, such as strncat (filepath, filename, strlen (filename)); because str[n]cat would never copy more than strlen (filename) bytes from filename anyway. The intended/safe way to use strncpy is by specifying the *destination* buffer size, e.g. strcpy (buffer, source, sizeof buffer); I could have done just that; but then it makes little sense when the buffer size is, too, directly derived from the source string length, as in int dir_len; dir_len = strlen(dir); char *tmp; tmp = (char *) malloc (dir_len + 1); strncpy (tmp, dir, dir_len); But anyway, I don't care enough about this rather harmless warning to argue further. The actually important part here was the following fix (which I'm going to use as an excuse to try out the "scissors" feature): -- >8 -- Subject: [PATCH unionfs v2 3/3] Properly null-terminate a string --- stow.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/stow.c b/stow.c index 812d33b..ddbcf20 100644 --- a/stow.c +++ b/stow.c @@ -281,16 +281,17 @@ stow_diradd (char *dir, int flags, struct patternlist *patternlist, if (dir[dir_len - 1 ] != '/') { char *tmp; - tmp = (char *) malloc (dir_len + 1); + tmp = (char *) malloc (dir_len + 2); if (tmp == NULL) return ENOMEM; strncpy (tmp, dir, dir_len); tmp[dir_len] = '/'; + tmp[dir_len + 1] = 0; dir = tmp; } -- Sergey
Re: [PATCH unionfs 3/3] Don't use strncat() with length derived from source string
On Mon, Apr 26, 2021 at 11:10 PM Samuel Thibault wrote: > Err, but wouldn't the compiler be able to determine that the size was > properly computed, and avoid emitting a false-positive warning? It is my understanding, based on https://gcc.gnu.org/bugzilla//show_bug.cgi?id=88059, that GCC does not do any sophisticated analysis here, and just warns about any case where the specified length depends on the source size. Which makes sense to me, because either the destination buffer size depends on the source string length, in which case you can be sure it fits and don't need strncpy, or it does not depend on the source string length, in which case the string might not fit and you'd use strncpy, passing the destination buffer size. > The compiler emitting a warning looks to me like a sign that there might > really be something subtly wrong. Actually sending the output of the > compiler on the list would help us to know more about it. Here are the warnings I get (gcc --version is gcc (Debian 10.2.1-6+hurd.1) 10.2.1 20210110): lib.c: In function ‘make_filepath’: lib.c:154:3: warning: ‘strncpy’ specified bound depends on the length of the source argument [-Wstringop-overflow=] 154 | strncpy (filepath, path, length); | ^~~~ lib.c:149:12: note: length computed here 149 | length = strlen (path) + strlen (filename) + 2; |^ lib.c:155:3: warning: ‘strncat’ specified bound depends on the length of the source argument [-Wstringop-overflow=] 155 | strncat (filepath, filename, strlen (filename)); | ^~~ options.c: In function ‘argp_parse_common_options’: options.c:77:5: warning: variable ‘ulfs_match’ set but not used [-Wunused-but-set-variable] 77 | ulfs_match = 0, ulfs_priority = 0; | ^~ stow.c: In function ‘stow_diradd’: stow.c:290:7: warning: ‘strncpy’ output truncated before terminating nul copying as many bytes from a string as its length [-Wstringop-truncation] 290 | strncpy (tmp, dir, dir_len); | ^~~ stow.c:275:13: note: length computed here 275 | dir_len = strlen(dir); | ^~~ The last one is indeed a false positive, because we (with my previous patch merged) now null-terminate the resulting string explicitly. -- Sergey
[RFC PATCH tarfs 5/6] Attempt to implement mmap coherence
When reading from or writing to a node that has an attached pager, use pager_memcpy () instead of accessing the cache directly. This enables tarfs_read_node () and tarfs_write_node () too see and affect the changes made to the file through the memory mapping. --- tarfs.c | 89 - 1 file changed, 76 insertions(+), 13 deletions(-) diff --git a/tarfs.c b/tarfs.c index 3b9bcf590..5195d5771 100644 --- a/tarfs.c +++ b/tarfs.c @@ -759,11 +759,43 @@ tarfs_lookup_node (struct node** node, struct node* dir, const char* name) error_t tarfs_read_node (struct node *node, off_t offset, size_t *len, void* data) { + struct pager *pager; + memory_object_t memobj; + error_t err; + if (S_ISDIR (node->nn_stat.st_mode)) { *len = 0; return EISDIR; } + + if (node->nn->hardlink) +node = node->nn->hardlink; + + if (node->nn_stat.st_size <= offset) +{ + *len = 0; + return 0; +} + if (node->nn_stat.st_size - offset < *len) +*len = node->nn_stat.st_size - offset; + + /* If we have a pager, always go through it. + The pager will call into the cache if needed. */ + pager = NODE_INFO(node)->pager; + if (pager) +{ + ports_port_ref (pager); + pthread_mutex_unlock (&node->lock); + memobj = pager_get_port (pager); + mach_port_insert_right (mach_task_self (), memobj, memobj, + MACH_MSG_TYPE_MAKE_SEND); + err = pager_memcpy (pager, memobj, offset, data, len, VM_PROT_READ); + mach_port_deallocate (mach_task_self (), memobj); + pthread_mutex_lock (&node->lock); + ports_port_deref (pager); + return err; +} else return cache_read (node, offset, *len, data, len); } @@ -774,27 +806,54 @@ error_t tarfs_write_node (struct node *node, off_t offset, size_t *len, void *data) { IF_RWFS; + struct node *original_node; + struct pager *pager; + memory_object_t memobj; + error_t err; if (S_ISDIR (node->nn_stat.st_mode)) { *len = 0; return EISDIR; } - else + + original_node = node; + if (node->nn->hardlink) +node = node->nn->hardlink; + + /* First, grow the node if we need to. This would happen automatically + for the cache path below, but not for the pager path. */ + if (offset + *len > node->nn_stat.st_size) { -error_t err; -/* Checks whether we need to actually write to another node. - (hard links are not handled by cache_write ()). */ -struct node *what = node->nn->hardlink ? node->nn->hardlink : node; - +err = cache_set_size (node, offset + *len); +if (err) + return err; + } + + /* If we have a pager, always go through it. + The pager will call into the cache if needed. */ + pager = NODE_INFO(node)->pager; + if (pager) +{ + ports_port_ref (pager); + pthread_mutex_unlock (&node->lock); + memobj = pager_get_port (pager); + mach_port_insert_right (mach_task_self (), memobj, memobj, + MACH_MSG_TYPE_MAKE_SEND); + err = pager_memcpy (pager, memobj, offset, data, len, + VM_PROT_READ|VM_PROT_WRITE); + mach_port_deallocate (mach_task_self (), memobj); + pthread_mutex_lock (&node->lock); + ports_port_deref (pager); +} + else err = cache_write (node, offset, data, *len, len); -/* Synchronize stat with hard link's target. */ -if ((! err) && (what != node)) - node->nn_stat.st_size = what->nn_stat.st_size; + /* Synchronize stat with hard link's target. */ + if ((! err) && (original_node != node)) +original_node->nn_stat.st_size = node->nn_stat.st_size; -return err; - } + return err; } /* Update NODE stat structure and mark it as dirty. */ @@ -1035,7 +1094,6 @@ tarfs_io_map (struct node *node, memory_object_t *rdobj, memory_object_t *wrobj) return 0; } - /* Rounds SIZE to the upper RECORDSIZE. */ static inline size_t @@ -1171,7 +1229,12 @@ tarfs_sync_fs (int wait) char *path; size_t size; - /* Lock the node first */ + /* If the node has a pager, ask it to flush + its state back into the cache. Do this before + locking the node. */ + if (NODE_INFO(node)->pager) +pager_sync (NODE_INFO(node)->pager, 1); + pthread_mutex_lock (&node->lock); have_to_sync = (tar->offset != file_offs + RECORDSIZE); path = fs_get_path_from_root (netfs_root_node, node); -- 2.31.1
[RFC PATCH tarfs 2/6] Link to libpager and initialize it on startup
--- Makefile | 2 +- main.c | 12 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index e3936bfda..6e2eb422e 100644 --- a/Makefile +++ b/Makefile @@ -29,7 +29,7 @@ CFLAGS += -DDEBUG_ZIP # zip stores debugging # Note: -lz has to be first otherwise inflate() will be the exec server's # inflate function LDFLAGS = -L~ -lz -L. -lnetfs -lfshelp -liohelp -lports \ - -lihash -lshouldbeinlibc -lpthread -lstore -lbz2 + -lihash -lshouldbeinlibc -lpthread -lstore -lbz2 -lpager CTAGS = ctags SRC = main.c netfs.c tarfs.c tarlist.c fs.c cache.c tar.c names.c \ diff --git a/main.c b/main.c index 68268013f..90286de08 100644 --- a/main.c +++ b/main.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -40,6 +41,8 @@ mach_port_t ul_node; /* Has to be defined for libnetfs... */ int netfs_maxsymlinks = 2; +struct port_bucket *tarfs_pager_bucket; + /* Main. */ int main (int argc, char **argv) @@ -47,6 +50,7 @@ main (int argc, char **argv) struct argp fs_argp; mach_port_t bootstrap_port; struct iouser *user; + struct pager_requests *pager_requests; error_t err; /* Defaults to tarfs. */ @@ -68,6 +72,14 @@ main (int argc, char **argv) error (EXIT_FAILURE, err, "cannot create root node"); ul_node = netfs_startup (bootstrap_port, 0); + /* Init libpager. */ + tarfs_pager_bucket = ports_create_bucket (); + if (!tarfs_pager_bucket) +error (EXIT_FAILURE, errno, "cannot create port bucket"); + err = pager_start_workers (tarfs_pager_bucket, &pager_requests); + if (err) +error (1, err, "cannot start libpager"); + for (;;) netfs_server_loop (); -- 2.31.1
[RFC PATCH tarfs 0/6] mmap support for tarfs
Hello, I'm trying to implement mmap support in tarfs. The reason I need it is I want to run some binaries directly form a tarfs mount, without unpacking them to disk (ext2fs). ld.so uses mmap to load shared libraries uncoditionally [0]. So I've been digging through libpager, libnetfs, tarfs, and other code, trying to understand how things fit together and hook them up. The patch series I'm attaching implements mmap by lazily creating a pager for each node. The pager reads and writes data the same way tarfs_read_node () and tarfs_write_node () do: namely, by calling into the existing cache implementation. The most challenging part has been "mmap coherence" (perhaps there is an established term for this concept that I'm not aware of?). Basically, the file system has to guarantee that any modificaitons made to a file via io_write () (and other similar APIs) are immediately visible through any mappings of the file, and similarly any changes made through a mapping have to be immediately visible through io_read (). I peeked at how libdiskfs deals with this (since this does work as expected with ext2fs in my testing), and it forwards any reads/writes to pager_memcpy () instead of accessing the underlying data directly. I've tried to do the same, except that I only forward the calls to a pager if there is one. I've written a tiny program to test mmap behavior (I've uploaded its source here [1]), and used it to verify that my implementation behaves somewhat sanely. I'm also able to load binaries and shared libraries from tarfs with this patchset. In fact, I'm able to run code chrooted to tarfs. Please help me out by commenting on the patches. Does the overall approach look fine, or am I missing some crucial details? Have I messed up reference counting? Is there a better way to deal with a held mutex that unlocking it before calling a function that re-locks it? (Making the mutex recursive, perhaps? Or extracting the inner part into its own function?) [0]: https://www.gnu.org/software/hurd/open_issues/libnetfs_io_map.html [1]: https://paste.gg/p/anonymous/7de6f56e7a1c4babbab17bd6ccc2db46 Sergey Bugaev (6): Plumb io_map () through the backend layer Link to libpager and initialize it on startup Implement basic read-only mmap support Implement basic support for writable mappings Attempt to implement mmap coherence Update TODO and BUGS BUGS | 1 - Makefile | 4 +- TODO | 2 - backend.h | 3 ++ main.c| 13 + netfs.c | 36 +++--- pager.c | 141 ++ pager.h | 9 tarfs.c | 114 ++- tarfs.h | 2 + 10 files changed, 302 insertions(+), 23 deletions(-) create mode 100644 pager.c create mode 100644 pager.h -- 2.31.1
[RFC PATCH tarfs 1/6] Plumb io_map () through the backend layer
--- backend.h | 3 +++ netfs.c | 36 ++-- tarfs.c | 9 + 3 files changed, 42 insertions(+), 6 deletions(-) diff --git a/backend.h b/backend.h index 82bbee77b..0b281f686 100644 --- a/backend.h +++ b/backend.h @@ -110,6 +110,9 @@ struct fs_backend or S_IFCHR). */ error_t (* mkdev_node) (struct node *node, mode_t type, dev_t indexes); + /* Returns memory objects for mapping this node. */ + error_t (* io_map) (struct node *node, memory_object_t *rdobj, memory_object_t *wrobj); + /* Free all resources associated to NODE. */ void (* free_node) (struct node *node); diff --git a/netfs.c b/netfs.c index 6ed7e3f39..6a6f14f94 100644 --- a/netfs.c +++ b/netfs.c @@ -821,14 +821,38 @@ netfs_S_file_syncfs (struct protid *user, return err; } -/* The following stub has been added as a reminder. */ -#if 0 error_t -netfs_S_io_map (struct protid *user, +netfs_S_io_map (struct protid *user, mach_port_t *rdobj, mach_msg_type_name_t *rdobjtype, mach_port_t *wrobj, mach_msg_type_name_t *wrobjtype) { - error (0, 0, "Warning: io_map () not supported"); - return EOPNOTSUPP; + error_t err; + struct node *node; + + if (!user) +return EOPNOTSUPP; + + if (!(user->po->openstat & O_READ)) +{ + *rdobj = MACH_PORT_NULL; + rdobj = NULL; +} + if (!(user->po->openstat & O_WRITE)) +{ + *wrobj = MACH_PORT_NULL; + wrobj = NULL; +} + + if (rdobj || wrobj) +{ + node = user->po->np; + pthread_mutex_lock (&node->lock); + err = backend.io_map (node, rdobj, wrobj); + pthread_mutex_unlock (&node->lock); +} + + *rdobjtype = MACH_MSG_TYPE_MAKE_SEND; + *wrobjtype = MACH_MSG_TYPE_MAKE_SEND; + + return err; } -#endif diff --git a/tarfs.c b/tarfs.c index 1a01040a2..c1d9a5cc1 100644 --- a/tarfs.c +++ b/tarfs.c @@ -1010,6 +1010,14 @@ tarfs_mkdev_node (struct node *node, mode_t type, dev_t indexes) return EOPNOTSUPP; } +error_t +tarfs_io_map (struct node *node, memory_object_t *rdobj, memory_object_t *wrobj) +{ + error (0, 0, "Warning: io_map () not supported"); + return EOPNOTSUPP; +} + + /* Rounds SIZE to the upper RECORDSIZE. */ static inline size_t @@ -1370,6 +1378,7 @@ struct fs_backend tarfs_backend = tarfs_link_node, tarfs_symlink_node, tarfs_mkdev_node, + tarfs_io_map, tarfs_free_node, -- 2.31.1
[RFC PATCH tarfs 3/6] Implement basic read-only mmap support
Each node can have a pager attached. The pager gets created the first time io_map () is invoked on the node, and destroyed once it has no clients. The pager gets its data the same way tarfs_read_node () does: by ask the cache to copy out the data. Note that this ommit only implement read-only mappings, and even then it does nothing to provide coherency between what can be read from the created mappings and subsequent modifications done to the underlying file. The implementation will be enhanced in the following commits. This is enough for ld.so to successfully mmap shared libraries from a tarfs instance. --- Makefile | 2 +- main.c | 1 + pager.c | 117 +++ pager.h | 9 + tarfs.c | 22 ++- tarfs.h | 2 + 6 files changed, 150 insertions(+), 3 deletions(-) create mode 100644 pager.c create mode 100644 pager.h diff --git a/Makefile b/Makefile index 6e2eb422e..49b1950a9 100644 --- a/Makefile +++ b/Makefile @@ -33,7 +33,7 @@ LDFLAGS = -L~ -lz -L. -lnetfs -lfshelp -liohelp -lports \ CTAGS = ctags SRC = main.c netfs.c tarfs.c tarlist.c fs.c cache.c tar.c names.c \ - store-bzip2.c store-gzip.c debug.c + store-bzip2.c store-gzip.c debug.c pager.c OBJ = $(SRC:%.c=%.o) diff --git a/main.c b/main.c index 90286de08..64b48a973 100644 --- a/main.c +++ b/main.c @@ -30,6 +30,7 @@ #include #include "backend.h" +#include "pager.h" /* Choose the right backend here. */ extern struct fs_backend tarfs_backend; diff --git a/pager.c b/pager.c new file mode 100644 index 0..ecd564020 --- /dev/null +++ b/pager.c @@ -0,0 +1,117 @@ +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "pager.h" +#include "cache.h" +#include "backend.h" + +error_t +pager_read_page (struct user_pager_info *upi, vm_offset_t page, + vm_address_t *buf, int *writelock) +{ + error_t err; + struct node *node; + size_t actual; + void *data; + + data = mmap (0, vm_page_size, PROT_READ|PROT_WRITE, MAP_ANON, 0, 0); + if (data == MAP_FAILED) +return ENOSPC; + *buf = (vm_address_t) data; + + node = (struct node *) upi; + + pthread_mutex_lock (&node->lock); + err = cache_read (node, page, vm_page_size, data, &actual); + pthread_mutex_unlock (&node->lock); + + if (err) +{ + munmap (data, vm_page_size); + return err; +} + + memset(data + actual, 0, vm_page_size - actual); + *writelock = 1; + + return 0; +} + +error_t +pager_write_page (struct user_pager_info *upi, + vm_offset_t page, + vm_address_t buf) +{ + assert_backtrace (!"we only create read-only pages"); +} + +error_t +pager_unlock_page (struct user_pager_info *upi, + vm_offset_t address) +{ + return EROFS; +} + +void +pager_notify_evict (struct user_pager_info *upi, +vm_offset_t page) +{ + assert_backtrace (!"unrequested notification on eviction"); +} + +/* Tell how big the file is. */ +error_t +pager_report_extent (struct user_pager_info *upi, + vm_address_t *offset, + vm_size_t *size) +{ + struct node *node; + + node = (struct node *) upi; + *offset = 0; + *size = node->nn_stat.st_size; + return 0; +} + +error_t +create_pager (struct node *node, memory_object_t *obj) +{ + struct pager *pager; + + /* The pager holds a hard reference on the node. */ + netfs_nref (node); + pager = pager_create ((struct user_pager_info *) node, tarfs_pager_bucket, +1, MEMORY_OBJECT_COPY_NONE, 0); + if (!pager) +return errno; + + NODE_INFO(node)->pager = pager; + + *obj = pager_get_port (pager); + ports_port_deref (pager); + + return 0; +} + +void +pager_clear_user_data (struct user_pager_info *upi) +{ + struct node *node; + + node = (struct node *) upi; + NODE_INFO(node)->pager = NULL; + netfs_nrele (node); +} + +void +pager_dropweak (struct user_pager_info *upi) +{ +} + diff --git a/pager.h b/pager.h new file mode 100644 index 0..34447c8c9 --- /dev/null +++ b/pager.h @@ -0,0 +1,9 @@ +#ifndef PAGER_H +#define PAGER_H + +struct user_pager_info; +extern struct port_bucket *tarfs_pager_bucket; + +error_t create_pager (struct node *node, memory_object_t *obj); + +#endif /* PAGER_H_ */ diff --git a/tarfs.c b/tarfs.c index c1d9a5cc1..2f5c53dad 100644 --- a/tarfs.c +++ b/tarfs.c @@ -18,6 +18,7 @@ #include #include +#include #include #include @@ -36,6 +37,7 @@ #include "cache.h" #include "zipstores.h" #include "debug.h" +#include "pager.h" /* New netfs variables */ char *netfs_server_name = "tarfs"; @@ -1013,8 +1015,24 @@ tarfs_mkdev_node (struct node *node, mode_t type, dev_t indexes) error_t tarfs_io_map (struct node *node, memory_object_t *rdobj, memory_object_t *wrobj) { - error (0, 0, "Warning: io_map () not supported"); - return EOPNOTSUPP; + error_t err; + memory_object_t obj; + + if (!NODE_INFO(nod
[RFC PATCH tarfs 4/6] Implement basic support for writable mappings
No mmap coherence is implemented yet. --- pager.c | 30 +++--- tarfs.c | 2 +- 2 files changed, 28 insertions(+), 4 deletions(-) diff --git a/pager.c b/pager.c index ecd564020..8756a2a4b 100644 --- a/pager.c +++ b/pager.c @@ -11,6 +11,9 @@ #include "pager.h" #include "cache.h" #include "backend.h" +#include "tarfs.h" + +extern struct tarfs_opts tarfs_options; error_t pager_read_page (struct user_pager_info *upi, vm_offset_t page, @@ -39,7 +42,7 @@ pager_read_page (struct user_pager_info *upi, vm_offset_t page, } memset(data + actual, 0, vm_page_size - actual); - *writelock = 1; + *writelock = tarfs_options.readonly; return 0; } @@ -49,14 +52,35 @@ pager_write_page (struct user_pager_info *upi, vm_offset_t page, vm_address_t buf) { - assert_backtrace (!"we only create read-only pages"); + error_t err; + struct node *node; + size_t to_write, actual; + + node = (struct node *) upi; + + pthread_mutex_lock (&node->lock); + + if (page + vm_page_size > node->nn_stat.st_size) +to_write = node->nn_stat.st_size - page; + else +to_write = vm_page_size; + err = cache_write (node, page, (void *) buf, to_write, &actual); + + pthread_mutex_unlock (&node->lock); + + if (!err && actual < vm_page_size) +err = EIO; + + vm_deallocate (mach_task_self (), buf, vm_page_size); + + return err; } error_t pager_unlock_page (struct user_pager_info *upi, vm_offset_t address) { - return EROFS; + return tarfs_options.readonly ? EROFS : 0; } void diff --git a/tarfs.c b/tarfs.c index 2f5c53dad..3b9bcf590 100644 --- a/tarfs.c +++ b/tarfs.c @@ -1030,7 +1030,7 @@ tarfs_io_map (struct node *node, memory_object_t *rdobj, memory_object_t *wrobj) if (rdobj) *rdobj = obj; if (wrobj) -*wrobj = MACH_PORT_NULL; +*wrobj = tarfs_options.readonly ? MACH_PORT_NULL : obj; return 0; } -- 2.31.1
[RFC PATCH tarfs 6/6] Update TODO and BUGS
--- BUGS | 1 - TODO | 2 -- 2 files changed, 3 deletions(-) diff --git a/BUGS b/BUGS index 1c946b366..4b88e85d1 100644 --- a/BUGS +++ b/BUGS @@ -4,7 +4,6 @@ Known tarfs bugs 1. General -* io_map () not implemented (gcc uses it unfortunately). * netfs.c (netfs_get_dirents): too much memory may be allocated (mmap'd). diff --git a/TODO b/TODO index c1c826167..4c5a1531f 100644 --- a/TODO +++ b/TODO @@ -9,7 +9,5 @@ TODO List translator set on this node. For symlinks, we would not show any passive translator setting. -* Implement netfs_S_io_map (). - * Update the gzip store to make it more robust (for instance, find *real* specifications of the header format). -- 2.31.1
Re: [RFC PATCH tarfs 0/6] mmap support for tarfs
On Sat, May 1, 2021 at 6:38 PM Samuel Thibault wrote: > Actually I'd say the pager should replace the cache. The pager is > already a cache by itself, we should not need to keep both the pager and > the cache, particularly since it means having to keep both coherent. Well, yes, I've considered that; but I've tried to keep the changes less invasive than that. I'll take a stab at it if you think this is the way to go. This actually brings me to a question: why is tarfs using netfs over diskfs? I see that "a translator which shows a .tar archive in a unpacked way" is mentioned in the wiki [0] as a motivating example of what netfs should be used for, so there must be a good reason. Yet from what I see, libnetfs is suited for filesystems that are either served from a remote location (the net in netfs) or just synthesized on the fly; and on the other hand, the tar format, with its 512-byte blocks, sounds very much like a filesystem image to me. isofs uses diskfs, why doesn't tarfs? [0]: https://www.gnu.org/software/hurd/hurd/libnetfs.html > > Is there a better way to deal with a held mutex that unlocking it > > before calling a function that re-locks it? > > Yes, avoiding the situation. If you unlock a mutex, anything can then > happen, and you can easily get to an incoherent state. I understand; if some code drops a mutex and then re-acquires it, it has to be prepared that everything might have changed in the meantime. Which is admittedly something I have not done for this prototype. > > (Making the mutex recursive, perhaps? > > That's usually frowned upon because it's often a sign that you don't > know at various places whether you hold the lock or not. > > > Or extracting the inner part into its own function?) > > Yes, in some cases that makes sense. See, I cannot easily do either (make the mutex recursive or extract the logic). The flow goes something like this: 1. S_io_write () 2. lock the node, validate size/offset, grow it if needed 3. pager_memcpy (), fault 4. another thread (!) gets to pager_{read,write}_page () This is why the mutex cannot be made recursive: it's being grabbed from the other thread. And this is also why I cannot just extract the logic: I'm not calling the logic inside pager.c directly, I'm calling pager_memcpy (), which faults and *that* causes pager_{read,write}_page () to be called the same way it'd be called for any other task faulting on the mapping. And pager_{read,write}_page () naturally has to lock, to validate the size, access the cache, and whatnot. Perhaps you can think of a solution? How does diskfs cope with this? (I see that _diskfs_rdwr_internal () still has the node locked while calling pager_memcpy (), how come that doesn't deadlock?) -- Sergey
Re: [RFC PATCH tarfs 3/6] Implement basic read-only mmap support
On Sat, May 1, 2021 at 6:23 PM Samuel Thibault wrote: > This is missing the copyright head:ng. Ah yes, sorry. Should I just put myself there, or "Hurd developers", or "Free Software Foundation, Inc."?.. > > + err = create_pager (node, &obj); > > Is the pager getting cleaned at some point? The idea was that NODE_INFO(node)->pager does *not* itself keep a hard reference on the pager (create_pager () calls ports_port_deref (pager) after pager_get_port ()), so once all the pager clients are gone, the memory object would get garbage-collected, which would call pager_clear_user_data (), which sets NODE_INFO(node)->pager = NULL. But I was unable to test this in practice: Mach doesn't seem to want to do memory_object_terminate () even after the clients are gone. Perhaps it wants to keep the page cached while it has enough free memory. -- Sergey
Re: [RFC PATCH tarfs 0/6] mmap support for tarfs
On Sat, May 1, 2021 at 7:38 PM Samuel Thibault wrote: > > on the other hand, the tar format, with its 512-byte > > blocks, sounds very much like a filesystem image to me. isofs uses > > diskfs, why doesn't tarfs? > > It's not exactly the same since you have compression in the way. But > yes, that looks similar enough. Does tar actually do any compression? I was under the impression that tar is about just sticking multiple files together, and then you'd apply gzip or bzip2 on top if you want compression. And in case of tarfs, the compression is handled by the store abstraction, which makes it transparent to the rest of the logic. > Diskfs' pager_read_page does *not* have to lock the node, it just reads > and returns the data. That's again a point where you see that having > also the cache is in the way rather than helping. Hmm, but doesn't the page reading/writing implementation need to access the file size, atime/mtime, etc.? All of which may be changed concurrently. -- Sergey
Re: [RFC PATCH tarfs 0/6] mmap support for tarfs
On Sat, May 1, 2021 at 8:05 PM Samuel Thibault wrote: > > And in case of tarfs, the compression is handled by the store > > abstraction, which makes it transparent to the rest of the logic. > > Ah, ok. > And in the zip case? As I understand it, zip is a different model that basically combines functionality of tar and gzip; you cannot just zip a blob, you need a file tree. So it's not relevant for compressing ar files. tarfs can be run with --bzip2 and --gzip options, but not with --zip. > > Hmm, but doesn't the page reading/writing implementation need to > > access the file size, atime/mtime, etc.? All of which may be changed > > concurrently. > > IIRC it isn't handled by the pager itself. Is it? I don't think pager (as in libpager) knows anything about atime/mtime and other filesystem-specific cruft. The pager knows the size; but the callbacks need to know the size too, to decide how much to read/write in case of the last page. (To quote the Linux manual for mmap: "For a file that is not a multiple of the page size, the remaining bytes in the partial page at the end of the mapping are zeroed when mapped, and modifications to that region are not written out to the file.") -- Sergey
Re: [RFC PATCH tarfs 3/6] Implement basic read-only mmap support
On Sat, May 1, 2021 at 8:48 PM Samuel Thibault wrote: > Since this is not part of the main hurd repo there is no copyright > assignment requirement, so you can put your name. You can also assign > copyright and then put the FSF name. OK, I'll put my own name for now then. I'm fine with assigning copyright to FSF if that's required for contributing to the main repo; but I've heard it's a long complicated process that involves physically sending papeps overseas & getting an ack from my employer. Is that true? Could you (or somebody) guide me through it? > > The idea was that NODE_INFO(node)->pager does *not* itself keep a hard > > reference on the pager (create_pager () calls ports_port_deref (pager) > > after pager_get_port ()), so once all the pager clients are gone, the > > memory object would get garbage-collected, which would call > > pager_clear_user_data (), which sets NODE_INFO(node)->pager = NULL. > > Does that also destroy the pager? Doesn't it? Here's my understanding of how this works (or should work): * Ports (as in struct port_info) are reference counted. One can hold explicit references on them (with ports_port_[de]ref ()); a reference is also acquired and dropped implicitly by MIG intran/outtran. The port also gets ref'ed when you do ports_get_right (); this reference is then automatically dropped once there are no longer any send rights left. (I don't quite understand how that last one works, yet.) So overall, a port is alive while it's either referenced by something in-process, or has clients. * Pagers from libpager use this mechanism. Once nothing in the process references a pager and all the send rights to it are gone (so, after memory_object_terminate ()), the pager gets destroyed by the reference counting mechanism. It is then when pager_clear_user_data () is called, among other things. * In my implementation, nothing in the tarfs process holds references on pagers, so each pager should get destroyed as there are no send rights to it, i.e. as soon as Mach terminates it. -- Sergey
[PATCH 1/6] libpager: Fix mixing up success and error
_pager_pagemap_resize () returns an error or 0 on success, not a boolean. --- libpager/offer-page.c | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/libpager/offer-page.c b/libpager/offer-page.c index 9f090bcb..ddea236a 100644 --- a/libpager/offer-page.c +++ b/libpager/offer-page.c @@ -31,21 +31,22 @@ pager_offer_page (struct pager *p, pthread_mutex_lock (&p->interlock); if (_pager_pagemap_resize (p, offset + vm_page_size)) +goto release_out; + + short *pm_entry = &p->pagemap[offset / vm_page_size]; + + while (*pm_entry & PM_INCORE) { - short *pm_entry = &p->pagemap[offset / vm_page_size]; - - while (*pm_entry & PM_INCORE) - { - pthread_mutex_unlock (&p->interlock); - pager_flush_some (p, offset, vm_page_size, 1); - pthread_mutex_lock (&p->interlock); - } - *pm_entry |= PM_INCORE; - - memory_object_data_supply (p->memobjcntl, offset, buf, vm_page_size, 0, -writelock ? VM_PROT_WRITE : VM_PROT_NONE, -precious, MACH_PORT_NULL); + pthread_mutex_unlock (&p->interlock); + pager_flush_some (p, offset, vm_page_size, 1); + pthread_mutex_lock (&p->interlock); } + *pm_entry |= PM_INCORE; + + memory_object_data_supply (p->memobjcntl, offset, buf, vm_page_size, 0, + writelock ? VM_PROT_WRITE : VM_PROT_NONE, + precious, MACH_PORT_NULL); + release_out: pthread_mutex_unlock (&p->interlock); } -- 2.31.1
[PATCH 2/6] libpager: Do not flush in-core pages on offer
pager_offer_page () is documented to may ignore the offered page if the kernel already has a copy in core (indeed, that's what Mach does). However the current behavior is the inverse of that: it asks the kernel to flush (i.e. drop) its in-core copy, and replace it with the offered one. Fix this by not doing that, and calling memory_object_data_supply () directly. --- libpager/offer-page.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/libpager/offer-page.c b/libpager/offer-page.c index ddea236a..0b0ca2eb 100644 --- a/libpager/offer-page.c +++ b/libpager/offer-page.c @@ -34,13 +34,6 @@ pager_offer_page (struct pager *p, goto release_out; short *pm_entry = &p->pagemap[offset / vm_page_size]; - - while (*pm_entry & PM_INCORE) -{ - pthread_mutex_unlock (&p->interlock); - pager_flush_some (p, offset, vm_page_size, 1); - pthread_mutex_lock (&p->interlock); -} *pm_entry |= PM_INCORE; memory_object_data_supply (p->memobjcntl, offset, buf, vm_page_size, 0, -- 2.31.1
[PATCH 0/6] Various libpager fixes
While hacking further on tarfs mmap support, I've come across some issues in libpager and I thought I'd fix them. The issues range from plain bugs (such as pager_offer_page () always silently failing) to minor things that could be improved. The patches are fairly independent, but they touch a lot of the same code, so they may be hard to apply separately. I can regenerate diffs for individual patches if needed. Most of the patches only modify internal workings of libpager; the 3rd patch slightly modifies the public API by adding return codes to various functions that have previously returned void. This should not break any callers though. I have tested that the modified libpager builds without warnings, and tmpfs seems to work as expected with libpager.so replaced. Sergey Bugaev (6): libpager: Fix mixing up success and error libpager: Do not flush in-core pages on offer libpager: Add error handling to various functions libpager: Store pagemapsize as vm_size_t libpager: Fix overallocating pagemap libpager: Use libc heap for pagemap doc/hurd.texi | 16 +-- libpager/data-unlock.c | 13 + libpager/lock-object.c | 32 ++--- libpager/object-terminate.c | 5 ++-- libpager/offer-page.c | 28 +-- libpager/pagemap.c | 27 +++--- libpager/pager-attr.c | 56 + libpager/pager-flush.c | 21 -- libpager/pager-return.c | 21 -- libpager/pager-sync.c | 19 +++-- libpager/pager.h| 20 ++--- libpager/priv.h | 6 ++-- 12 files changed, 143 insertions(+), 121 deletions(-) -- 2.31.1
[PATCH 3/6] libpager: Add error handling to various functions
Instead of ignoring errors, we might as well return them to the caller. This technically changes the function signatures, but should not break any users since they can continue to ignore the return value. --- doc/hurd.texi | 16 ++-- libpager/data-unlock.c | 13 +- libpager/lock-object.c | 32 +++ libpager/offer-page.c | 14 +++ libpager/pager-attr.c | 56 +++-- libpager/pager-flush.c | 21 +--- libpager/pager-return.c | 21 +--- libpager/pager-sync.c | 19 -- libpager/pager.h| 20 +++ libpager/priv.h | 4 +-- 10 files changed, 125 insertions(+), 91 deletions(-) diff --git a/doc/hurd.texi b/doc/hurd.texi index f63cfcda..c8db86b5 100644 --- a/doc/hurd.texi +++ b/doc/hurd.texi @@ -1434,8 +1434,8 @@ Demultiplex incoming @code{libports} messages on pager ports. The following functions are the body of the pager library, and provide a clean interface to pager functionality: -@deftypefun void pager_sync (@w{struct pager *@var{pager}}, @w{int @var{wait}}) -@deftypefunx void pager_sync_some (@w{struct pager *@var{pager}}, @w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}}) +@deftypefun error_t pager_sync (@w{struct pager *@var{pager}}, @w{int @var{wait}}) +@deftypefunx error_t pager_sync_some (@w{struct pager *@var{pager}}, @w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}}) Write data from pager @var{pager} to its backing store. Wait for all the writes to complete if and only if @var{wait} is set. @@ -1443,8 +1443,8 @@ the writes to complete if and only if @var{wait} is set. data starting at @var{start}, for @var{len} bytes. @end deftypefun -@deftypefun void pager_flush (@w{struct pager *@var{pager}}, @w{int @var{wait}}) -@deftypefunx void pager_flush_some (@w{struct pager *@var{pager}}, @w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}}) +@deftypefun error_t pager_flush (@w{struct pager *@var{pager}}, @w{int @var{wait}}) +@deftypefunx error_t pager_flush_some (@w{struct pager *@var{pager}}, @w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}}) Flush data from the kernel for pager @var{pager} and force any pending delayed copies. Wait for all pages to be flushed if and only if @var{wait} is set. @@ -1453,8 +1453,8 @@ delayed copies. Wait for all pages to be flushed if and only if flushes data starting at @var{start}, for @var{len} bytes. @end deftypefun -@deftypefun void pager_return (@w{struct pager *@var{pager}}, @w{int @var{wait}}) -@deftypefunx void pager_return_some (@w{struct pager *@var{pager}}, @w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}}) +@deftypefun error_t pager_return (@w{struct pager *@var{pager}}, @w{int @var{wait}}) +@deftypefunx error_t pager_return_some (@w{struct pager *@var{pager}}, @w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}}) Flush data from the kernel for pager @var{pager} and force any pending delayed copies. Wait for all pages to be flushed if and only if @var{wait} is set. Have the kernel write back modifications. @@ -1464,14 +1464,14 @@ delayed copies. Wait for all pages to be flushed if and only if @var{start}, for @var{len} bytes. @end deftypefun -@deftypefun void pager_offer_page (@w{struct pager *@var{pager}}, @w{int @var{precious}}, @w{int @var{writelock}}, @w{vm_offset_t @var{page}}, @w{vm_address_t @var{buf}}) +@deftypefun error_t pager_offer_page (@w{struct pager *@var{pager}}, @w{int @var{precious}}, @w{int @var{writelock}}, @w{vm_offset_t @var{page}}, @w{vm_address_t @var{buf}}) Offer a page of data to the kernel. If @var{precious} is set, then this page will be paged out at some future point, otherwise it might be dropped by the kernel. If the page is currently in core, the kernel might ignore this call. @end deftypefun -@deftypefun void pager_change_attributes (@w{struct pager *@var{pager}}, @w{boolean_t @var{may_cache}}, @w{memory_object_copy_strategy_t @var{copy_strategy}}, @w{int @var{wait}}) +@deftypefun error_t pager_change_attributes (@w{struct pager *@var{pager}}, @w{boolean_t @var{may_cache}}, @w{memory_object_copy_strategy_t @var{copy_strategy}}, @w{int @var{wait}}) Change the attributes of the memory object underlying pager @var{pager}. The @var{may_cache} and @var{copy_strategy} arguments are as for @code{memory_object_change_}. Wait for the kernel to report diff --git a/libpager/data-unlock.c b/libpager/data-unlock.c index 077e673f..caaea41a 100644 --- a/libpager/data-unlock.c +++ b/libpager/data-unlock.c @@ -28,7 +28,7 @@ _pager_S_memory_object_data_unlock (struct pager *p, vm_size_t length, vm_prot_t access) { - volatile int err; + error_t err = 0; if (!p || p->port.class != _pager_class) @@
[PATCH 4/6] libpager: Store pagemapsize as vm_size_t
On a 64-bit system, there can be a lot more pages than a 32-bit int can fit. --- libpager/pagemap.c | 2 +- libpager/priv.h| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/libpager/pagemap.c b/libpager/pagemap.c index 963f6567..63c5f44d 100644 --- a/libpager/pagemap.c +++ b/libpager/pagemap.c @@ -29,7 +29,7 @@ _pager_pagemap_resize (struct pager *p, vm_address_t off) if (p->pagemapsize < off) { void *newaddr; - int newsize = round_page (off); + vm_size_t newsize = round_page (off); newaddr = mmap (0, newsize * sizeof (*p->pagemap), PROT_READ|PROT_WRITE, MAP_ANON, 0, 0); diff --git a/libpager/priv.h b/libpager/priv.h index c0a99fe3..d9d76965 100644 --- a/libpager/priv.h +++ b/libpager/priv.h @@ -66,7 +66,7 @@ struct pager #endif short *pagemap; - int pagemapsize; /* number of elements in PAGEMAP */ + vm_size_t pagemapsize; /* number of elements in PAGEMAP */ }; struct lock_request -- 2.31.1
[PATCH 5/6] libpager: Fix overallocating pagemap
The code tried to round up the allocation size to a multiple of page size. But we actually allocate newsize * sizeof (*p->pagemap) bytes, not newsize bytes, which meant allocations were sizeof (*p->pagemap) times larger than they needed to be. --- libpager/pagemap.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/libpager/pagemap.c b/libpager/pagemap.c index 63c5f44d..1570c75b 100644 --- a/libpager/pagemap.c +++ b/libpager/pagemap.c @@ -17,21 +17,22 @@ #include "priv.h" #include - + /* Grow the pagemap of pager P as necessary to deal with address OFF */ error_t _pager_pagemap_resize (struct pager *p, vm_address_t off) { error_t err = 0; - + off /= __vm_page_size; if (p->pagemapsize < off) { void *newaddr; - vm_size_t newsize = round_page (off); + vm_size_t newsize = round_page (off * sizeof (*p->pagemap)) + / sizeof (*p->pagemap); - newaddr = mmap (0, newsize * sizeof (*p->pagemap), + newaddr = mmap (0, newsize * sizeof (*p->pagemap), PROT_READ|PROT_WRITE, MAP_ANON, 0, 0); err = (newaddr == (void *) -1) ? errno : 0; if (! err) -- 2.31.1
[PATCH 6/6] libpager: Use libc heap for pagemap
libc already implements the functionality for allocating and managing memory blocks like the pagemap. Using libc functions gives us some additional niceties like overflow checking in reallocarray (). it also means that we will not allocate a whole page of memory if we need to store just a few integers. --- libpager/object-terminate.c | 5 +++-- libpager/pagemap.c | 26 +- 2 files changed, 12 insertions(+), 19 deletions(-) diff --git a/libpager/object-terminate.c b/libpager/object-terminate.c index 95298217..3e7df167 100644 --- a/libpager/object-terminate.c +++ b/libpager/object-terminate.c @@ -17,6 +17,7 @@ #include "priv.h" #include "memory_object_S.h" +#include #include /* Implement the object termination call from the kernel as described @@ -118,10 +119,10 @@ _pager_free_structure (struct pager *p) /* Free the pagemap */ if (p->pagemapsize) { - munmap (p->pagemap, p->pagemapsize * sizeof (* p->pagemap)); + free (p->pagemap); p->pagemapsize = 0; p->pagemap = 0; } - + p->pager_state = NOTINIT; } diff --git a/libpager/pagemap.c b/libpager/pagemap.c index 1570c75b..7bbb8c56 100644 --- a/libpager/pagemap.c +++ b/libpager/pagemap.c @@ -16,33 +16,25 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ #include "priv.h" +#include #include /* Grow the pagemap of pager P as necessary to deal with address OFF */ error_t _pager_pagemap_resize (struct pager *p, vm_address_t off) { - error_t err = 0; - off /= __vm_page_size; if (p->pagemapsize < off) { - void *newaddr; - vm_size_t newsize = round_page (off * sizeof (*p->pagemap)) - / sizeof (*p->pagemap); - - newaddr = mmap (0, newsize * sizeof (*p->pagemap), - PROT_READ|PROT_WRITE, MAP_ANON, 0, 0); - err = (newaddr == (void *) -1) ? errno : 0; - if (! err) - { - memcpy (newaddr, p->pagemap, p->pagemapsize * sizeof (*p->pagemap)); - munmap (p->pagemap, p->pagemapsize * sizeof (*p->pagemap)); - p->pagemap = newaddr; - p->pagemapsize = newsize; - } + void *newaddr = reallocarray (p->pagemap, off, +sizeof (*p->pagemap)); + if (!newaddr) +return errno; + + p->pagemap = newaddr; + p->pagemapsize = off; } - return err; + return 0; } -- 2.31.1
And another patch...
There's yet another bug: libpager just throws away clean precious pages (those previously offered with pager_offer_page (precious = 1)) upon receiving them from the kernel, since it mistakes them for evicted pages it's being notified about. This is obviously very problematic. I'm attaching a patch that attempts to fix the issue, but I'm less sure about it. What do you think? Sergey -- >8 -- Subject: [PATCH] libpager: Do not throw away precious pages The kernel invokes memory_object_data_return () with dirty = 0 in two cases: if notification on eviction was requested, or when returning precious pages. Properly distinguish between the two cases, and only throw the clean page away in the former case. --- libpager/data-return.c | 69 -- libpager/offer-page.c | 2 ++ libpager/priv.h| 1 + 3 files changed, 29 insertions(+), 43 deletions(-) diff --git a/libpager/data-return.c b/libpager/data-return.c index c0f5aaf7..59e3e956 100644 --- a/libpager/data-return.c +++ b/libpager/data-return.c @@ -38,12 +38,12 @@ _pager_do_write_request (struct pager *p, short *pm_entries; int npages, i; char *notified; + char *omit_data; error_t *pagerrs; struct lock_request *lr; struct lock_list {struct lock_request *lr; struct lock_list *next;} *lock_list, *ll; int wakeup; - int omitdata = 0; if (!p || p->port.class != _pager_class) @@ -78,6 +78,9 @@ _pager_do_write_request (struct pager *p, npages = length / __vm_page_size; pagerrs = alloca (npages * sizeof (error_t)); + omit_data = alloca (npages * (sizeof *omit_data)); + memset (omit_data, 0, npages * (sizeof *omit_data)); + notified = alloca (npages * (sizeof *notified)); #ifndef NDEBUG memset (notified, -1, npages * (sizeof *notified)); @@ -90,23 +93,6 @@ _pager_do_write_request (struct pager *p, pm_entries = &p->pagemap[offset / __vm_page_size]; - if (! dirty) -{ - munmap ((void *) data, length); - if (!kcopy) { -/* Prepare notified array. */ -for (i = 0; i < npages; i++) - notified[i] = (p->notify_on_evict - && ! (pm_entries[i] & PM_PAGEINWAIT)); - -goto notify; - } - else { -_pager_allow_termination (p); -goto release_out; - } -} - /* Make sure there are no other in-progress writes for any of these pages before we begin. This imposes a little more serialization than we really have to require (because *all* future writes on @@ -115,27 +101,24 @@ _pager_do_write_request (struct pager *p, /* XXX: Is this still needed? */ retry: for (i = 0; i < npages; i++) -if (pm_entries[i] & PM_PAGINGOUT) - { - pm_entries[i] |= PM_WRITEWAIT; - pthread_cond_wait (&p->wakeup, &p->interlock); - goto retry; +{ + if ((initializing && (pm_entries[i] & PM_INIT)) || + (!dirty && !(pm_entries[i] & PM_PRECIOUS))) +{ + omit_data[i] = 1; + continue; +} + if (pm_entries[i] & PM_PAGINGOUT) +{ + pm_entries[i] |= PM_WRITEWAIT; + pthread_cond_wait (&p->wakeup, &p->interlock); + goto retry; } +} /* Mark these pages as being paged out. */ - if (initializing) -{ - assert_backtrace (npages <= 32); - for (i = 0; i < npages; i++) - { - if (pm_entries[i] & PM_INIT) -omitdata |= 1 << i; - else -pm_entries[i] |= PM_PAGINGOUT | PM_INIT; - } -} - else -for (i = 0; i < npages; i++) + for (i = 0; i < npages; i++) +if (!omit_data[i]) pm_entries[i] |= PM_PAGINGOUT | PM_INIT; /* If this write occurs while a lock is pending, record @@ -162,7 +145,7 @@ _pager_do_write_request (struct pager *p, but until the pager library interface is changed, this will have to do. */ for (i = 0; i < npages; i++) -if (!(omitdata & (1 << i))) +if (!omit_data[i]) pagerrs[i] = pager_write_page (p->upi, offset + (vm_page_size * i), data + (vm_page_size * i)); @@ -175,9 +158,11 @@ _pager_do_write_request (struct pager *p, wakeup = 0; for (i = 0; i < npages; i++) { - if (omitdata & (1 << i)) + if (omit_data[i]) { - notified[i] = 0; + notified[i] = (p->notify_on_evict + && !kcopy + && ! (pm_entries[i] & PM_PAGEINWAIT)); continue; } @@ -205,14 +190,13 @@ _pager_do_write_request (struct pager *p, } else { - munmap ((void *) (data + (vm_page_size * i)), - vm_page_size); notified[i] = (! kcopy && p->notify_on_evict); if (! kcopy) pm_entries[i] &= ~PM_INCORE; } - pm_entries[i] &= ~(PM_PAGINGOUT | PM_PAGEINWAIT | PM_WRITEWAIT); + pm_entries[i] &= ~(PM_PAGINGOUT | PM_PAGEINWAIT + | PM_WRITEWAIT | PM_PRECIOUS); } for (ll = lock_list; ll; ll = ll->next) @@ -222,7 +206,6 @@ _pager_do_write_request (struct pager *p, if (wakeup) pthread_cond_broadcast (&p->wakeup);
Re: [PATCH 6/6] libpager: Use libc heap for pagemap
On Thu, May 6, 2021 at 3:56 PM Sergey Bugaev wrote: > - newaddr = mmap (0, newsize * sizeof (*p->pagemap), > - PROT_READ|PROT_WRITE, MAP_ANON, 0, 0); > + void *newaddr = reallocarray (p->pagemap, off, > +sizeof (*p->pagemap)); It seems while fixing preexisting issues I accidentally introduces a new one. Unlike mmap, reallocarray does not zero-initialize the newly allocated memory; but other code expects new pagemap entries to be set to zero. This is the cause of those tarfs hangs I've been seeing lately. I'm sorry, and here's a hotfix. Sergey -- >8 -- Subject: [PATCH] libpager: Properly zero-initialize pagemap Unlike mmap () and calloc (), reallocarray () does not automatically zero-fill the newly allocated memory. Do so explicitly. --- libpager/pagemap.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/libpager/pagemap.c b/libpager/pagemap.c index 7bbb8c56..c7c86d60 100644 --- a/libpager/pagemap.c +++ b/libpager/pagemap.c @@ -32,6 +32,8 @@ _pager_pagemap_resize (struct pager *p, vm_address_t off) if (!newaddr) return errno; + memset ((short *) newaddr + p->pagemapsize, 0, + (off - p->pagemapsize) * sizeof (*p->pagemap)); p->pagemap = newaddr; p->pagemapsize = off; } -- 2.31.1
settrans: (os/kern) invalid right
Hello yet again, I'm hitting the following issue when trying to re-set a translator on a node whose translator has previously died: $ settrans -acP /tmp/yes ~/dev-yes/hurd/yes Translator pid: 1039 Pausing... $ kill 1039 $ settrans -ag /tmp/yes ~/dev-yes/hurd/yes settrans: /tmp/yes: (os/kern) invalid right I've traced it to fshelp_fetch_control () returning KERN_INVALID_RIGHT when the control right of the previously set translator turns out to be a dead name. fshelp_fetch_control () actually handles this case with a specific code path which resets the stored port to MACH_PORT_NULL, but it returns an error nevertheless. Digging up git history, the two relevant commits are [0] and [1]. Commit [0] introduces this special handling for dead names, and actually tries to deallocate the dead name and return 0 + MACH_PORT_NULL from the fshelp_fetch_control () call. Commit [1] reverts a part of this logic: fshelp_fetch_control () no longer tries to deallocate the dead name (so the task keeps it forever?..), and no longer returns 0 to the caller, propagating KERN_INVALID_RIGHT instead. Which is the source of the error I'm seeing. Indeed, reverting [1] seems to fix my issue. I don't think I understand the reasoning behind [1]. Perhaps I'm missing something? Sergey [0]: http://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=3319f7f6a238cf88b9f46849e7be84d3c96376d6 [1]: http://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=7b9011628ec0dea3e01b19b75013ce5cf5b9c841
Re: settrans: (os/kern) invalid right
On Sat, May 8, 2021 at 2:39 PM Samuel Thibault wrote: > I pushed the revert, thanks! Thank you! I'm attaching another patch that tries to make settrans handle this case a tiny bit better: if we failed to set the translator (either because of this issue, or for any other reason), let's shut the translator back down instead of leaving it in limbo. Sergey P.S. There must be something wrong with your mailserver. I only received a bunch of replies form you just a few minutes ago, and seemingly so did the archive [0], but your citation indicates that you have sent the reply promptly after my message. [0] https://lists.gnu.org/archive/html/bug-hurd/2021-05/ -- >8 -- Subject: [PATCH] utils/settrans: Ask translator to go away on errors If the translator starts up fine, but we fail to set it as the active translator on the underlying node for whatever reason, ask it to shut down again, instead of leaving it orphaned. This is what mount(1) does already. To test: $ settrans -a /dev/null /hurd/hello settrans: /dev/null: Operation not permitted $ pgrep hello 1141 That means PID 1141, the translator we started, is still alive. With this patch it shuts down, and pgrep returns nothing. --- utils/settrans.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/utils/settrans.c b/utils/settrans.c index 9c9f087e..f07d294b 100644 --- a/utils/settrans.c +++ b/utils/settrans.c @@ -396,7 +396,11 @@ main(int argc, char *argv[]) argz, argz_len, active_control, MACH_MSG_TYPE_COPY_SEND); if (err) - error (5, err, "%s", node_name); +{ + if (active_control != MACH_PORT_NULL) +fsys_goaway (active_control, FSYS_GOAWAY_FORCE); + error (5, err, "%s", node_name); +} } if (chroot_command) -- 2.31.1
Re: And another patch...
On Sat, May 8, 2021 at 2:36 PM Samuel Thibault wrote: > > Hello, > > Just to be sure: this is not only because of your previous patch > libpager: Do not flush in-core pages on offer > ? Yes; this issue is completely orthogonal to that fix (but there's an interesting interaction I haven't thought about, see below) This issue is about the Mach mechanism of "precious pages". Namely, when offering a page to the kernel (with memory_object_data_supply () or pager_offer_page () which wraps it), a memory manager can tell the kernel that the page is "precious". This means the kernel should not silently drop the page when the kernel no longer needs the page, even if the page is not dirtied (written to by clients) in the meantime. The intended use case is the memory manager supplying the kernel with the actual page where it stores its data, not a copy from some other sort of storage. The kernel will eventually return the page through memory_object_data_return (). Now, offering precious pages that are already in core (this is where that other patch is relevant) will result in KERN_MEMORY_PRESENT error being delivered via memory_object_supply_completed ()... which libpager currently ignores. So that might be something to fix, too. Thanks! Now, for something slightly different: > @@ -205,14 +190,13 @@ _pager_do_write_request (struct pager *p, > } >else > { > - munmap ((void *) (data + (vm_page_size * i)), > - vm_page_size); > notified[i] = (! kcopy && p->notify_on_evict); > if (! kcopy) > pm_entries[i] &= ~PM_INCORE; >} I've included this in the patch up-thread; it definitely should have been a separate patch; and now I'm doubting if it's a right change at all. Namely, pager_write_page () is documented to "In addition, mfree (or equivalent) BUF." I don't know what mfree is, but I assume this means munmap () or vm_deallocate (). In other words, pager_write_page () is documented to _assume ownership_ of the passed in page. So it would seem like _pager_do_write_request () *also* trying to unmap the page after the pager_write_page () calls is wrong (and may even lead to accidentally unmapping an unrelated page), which is why I removed it. However. The other branch in _pager_do_write_request () tries to return the page back to the kernel, so it definitely assumes that the page is still valid/available. And looking at various pager_write_page () implementations throughout the Hurd, some do deallocate the page (storeio), but most don't (defpager, ext2fs, fatfs). So overall, it would appear that pager_write_page () is actually *not* supposed to deallocate the page. If that's correct, storeio (and potentially other users outside of the main tree?) needs changing, and the documentation for pager_write_page () needs to be fixed. What do you think? Sergey
Re: [RFC PATCH tarfs 0/6] mmap support for tarfs
On Sat, May 8, 2021 at 2:40 PM Samuel Thibault wrote: > Ah, seeing zipstores.c I thought zipfiles were supported, so I was > wondering how since, as you say, they combine both. By the way, upstream libstore itself supports gzip & bzip2; I have only briefly looked into why tarfs needs to ship its own implementation. Upstream libstore seems to only support store_gunzip and store_bunzip2, which, to quote the docs, "contains a snapshot of the uncompressed contents of the store FROM". The version shipped with tarfs has store_gzip and store_bzip2, seemingly providing both reading and writing support. Any idea why the implementations have not been upstreamed? Sergey
[PATCH 1/4] libpager: pager_write_page () should not unmap page
Clarify this in the documentation, and fix the storeio implementation. --- doc/hurd.texi | 6 +++--- libpager/pager.h| 6 +++--- storeio/pager.c | 8 +++- tmpfs/pager-stubs.c | 6 +++--- 4 files changed, 12 insertions(+), 14 deletions(-) diff --git a/doc/hurd.texi b/doc/hurd.texi index c8db86b5..239d82da 100644 --- a/doc/hurd.texi +++ b/doc/hurd.texi @@ -1536,9 +1536,9 @@ only permissible error returns are @code{EIO}, @code{EDQUOT}, and @deftypefun error_t pager_write_page (@w{struct user_pager_info *@var{pager}}, @w{vm_offset_t @var{page}}, @w{vm_address_t @var{buf}}) For pager @var{pager}, synchronously write one page from @var{buf} to -offset @var{page}. In addition, @code{vm_deallocate} (or equivalent) -@var{buf}. The only permissible error returns are @code{EIO}, -@code{EDQUOT}, and @code{ENOSPC}. +offset @var{page}. Do not deallocate @var{buf}, and do not keep any +references to @var{buf}. The only permissible error returns are +@code{EIO}, @code{EDQUOT}, and @code{ENOSPC}. @end deftypefun @deftypefun error_t pager_unlock_page (@w{struct user_pager_info *@var{pager}}, @w{vm_offset_t @var{address}}) diff --git a/libpager/pager.h b/libpager/pager.h index df7fbfc8..5c03b748 100644 --- a/libpager/pager.h +++ b/libpager/pager.h @@ -191,9 +191,9 @@ pager_read_page (struct user_pager_info *pager, int *write_lock); /* The user must define this function. For pager PAGER, synchronously - write one page from BUF to offset PAGE. In addition, mfree - (or equivalent) BUF. The only permissible error returns are EIO, - EDQUOT, and ENOSPC. */ + write one page from BUF to offset PAGE. Do not deallocate BUF, and do + not keep any references to BUF. The only permissible error returns + are EIO, EDQUOT, and ENOSPC. */ error_t pager_write_page (struct user_pager_info *pager, vm_offset_t page, diff --git a/storeio/pager.c b/storeio/pager.c index 12387939..01a1525c 100644 --- a/storeio/pager.c +++ b/storeio/pager.c @@ -66,9 +66,9 @@ pager_read_page (struct user_pager_info *upi, return 0; } -/* For pager PAGER, synchronously write one page from BUF to offset PAGE. In - addition, vm_deallocate (or equivalent) BUF. The only permissible error - returns are EIO, EDQUOT, and ENOSPC. */ +/* For pager PAGER, synchronously write one page from BUF to offset PAGE. + Do not deallocate BUF, and do not keep any references to BUF. The only + permissible error returns are EIO, EDQUOT, and ENOSPC. */ error_t pager_write_page (struct user_pager_info *upi, vm_offset_t page, vm_address_t buf) @@ -90,8 +90,6 @@ pager_write_page (struct user_pager_info *upi, err = dev_write (dev, page, (char *)buf, want, &written); - munmap ((caddr_t) buf, vm_page_size); - if (err || written < want) return EIO; else diff --git a/tmpfs/pager-stubs.c b/tmpfs/pager-stubs.c index 3299e218..883a635e 100644 --- a/tmpfs/pager-stubs.c +++ b/tmpfs/pager-stubs.c @@ -36,9 +36,9 @@ pager_read_page (struct user_pager_info *pager, } /* The user must define this function. For pager PAGER, synchronously - write one page from BUF to offset PAGE. In addition, mfree - (or equivalent) BUF. The only permissible error returns are EIO, - EDQUOT, and ENOSPC. */ + write one page from BUF to offset PAGE. Do not deallocate BUF, and do + not keep any references to BUF. The only permissible error returns + are EIO, EDQUOT, and ENOSPC. */ error_t pager_write_page (struct user_pager_info *pager, vm_offset_t page, -- 2.31.1
[PATCH 4/4] libpager: Do not throw away precious pages
The kernel invokes memory_object_data_return () with dirty = 0 in two cases: if notification on eviction was requested, or when returning precious pages. Properly distinguish between the two cases, and only throw the clean page away in the former case. --- libpager/data-return.c | 54 ++ 1 file changed, 18 insertions(+), 36 deletions(-) diff --git a/libpager/data-return.c b/libpager/data-return.c index 2cc9f1f8..304e88bc 100644 --- a/libpager/data-return.c +++ b/libpager/data-return.c @@ -93,23 +93,6 @@ _pager_do_write_request (struct pager *p, pm_entries = &p->pagemap[offset / __vm_page_size]; - if (! dirty) -{ - munmap ((void *) data, length); - if (!kcopy) { -/* Prepare notified array. */ -for (i = 0; i < npages; i++) - notified[i] = (p->notify_on_evict - && ! (pm_entries[i] & PM_PAGEINWAIT)); - -goto notify; - } - else { -_pager_allow_termination (p); -goto release_out; - } -} - /* Make sure there are no other in-progress writes for any of these pages before we begin. This imposes a little more serialization than we really have to require (because *all* future writes on @@ -118,26 +101,24 @@ _pager_do_write_request (struct pager *p, /* XXX: Is this still needed? */ retry: for (i = 0; i < npages; i++) -if (pm_entries[i] & PM_PAGINGOUT) - { - pm_entries[i] |= PM_WRITEWAIT; - pthread_cond_wait (&p->wakeup, &p->interlock); - goto retry; +{ + if ((initializing && (pm_entries[i] & PM_INIT)) || + (!dirty && !(pm_entries[i] & PM_PRECIOUS))) +{ + omit_data[i] = 1; + continue; +} + if (pm_entries[i] & PM_PAGINGOUT) +{ + pm_entries[i] |= PM_WRITEWAIT; + pthread_cond_wait (&p->wakeup, &p->interlock); + goto retry; } +} /* Mark these pages as being paged out. */ - if (initializing) -{ - for (i = 0; i < npages; i++) - { - if (pm_entries[i] & PM_INIT) - omit_data[i] = 1; - else - pm_entries[i] |= PM_PAGINGOUT | PM_INIT; - } -} - else -for (i = 0; i < npages; i++) + for (i = 0; i < npages; i++) +if (!omit_data[i]) pm_entries[i] |= PM_PAGINGOUT | PM_INIT; /* If this write occurs while a lock is pending, record @@ -179,7 +160,9 @@ _pager_do_write_request (struct pager *p, { if (omit_data[i]) { - notified[i] = 0; + notified[i] = (p->notify_on_evict + && !kcopy + && ! (pm_entries[i] & PM_PAGEINWAIT)); continue; } @@ -225,7 +208,6 @@ _pager_do_write_request (struct pager *p, if (wakeup) pthread_cond_broadcast (&p->wakeup); - notify: _pager_allow_termination (p); pthread_mutex_unlock (&p->interlock); -- 2.31.1
[PATCH 0/4] _pager_do_write_request fixes
This is the same patch, split into three for readabilty and reviewability. I'm still looking for feedback :) Instead of removing the munmap from _pager_do_write_request (), document that pager_write_page () is *not* supposed to unmap the page (nor keep references to it), and remove the munmap call from storeio. Sergey Bugaev (4): libpager: pager_write_page () should not unmap page libpager: Store omit_data in an array libpager: Track which pages are precious libpager: Do not throw away precious pages doc/hurd.texi | 6 ++-- libpager/data-return.c | 67 -- libpager/offer-page.c | 2 ++ libpager/pager.h | 6 ++-- libpager/priv.h| 1 + storeio/pager.c| 8 ++--- tmpfs/pager-stubs.c| 6 ++-- 7 files changed, 41 insertions(+), 55 deletions(-) -- 2.31.1
[PATCH 2/4] libpager: Store omit_data in an array
While this is less efficient than using a single bit per page, there is no guarantee that the kernel will not send us more than 32 pages at a time. Indeed, one can easily craft a situation where it will send more by unmapping a large mapping. Also, this function already uses on-stack arrays for tracking other information about the pages; so do the same for whether the data should be omited. --- libpager/data-return.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/libpager/data-return.c b/libpager/data-return.c index c0f5aaf7..db2f84e6 100644 --- a/libpager/data-return.c +++ b/libpager/data-return.c @@ -38,12 +38,12 @@ _pager_do_write_request (struct pager *p, short *pm_entries; int npages, i; char *notified; + char *omit_data; error_t *pagerrs; struct lock_request *lr; struct lock_list {struct lock_request *lr; struct lock_list *next;} *lock_list, *ll; int wakeup; - int omitdata = 0; if (!p || p->port.class != _pager_class) @@ -78,6 +78,9 @@ _pager_do_write_request (struct pager *p, npages = length / __vm_page_size; pagerrs = alloca (npages * sizeof (error_t)); + omit_data = alloca (npages * (sizeof *omit_data)); + memset (omit_data, 0, npages * (sizeof *omit_data)); + notified = alloca (npages * (sizeof *notified)); #ifndef NDEBUG memset (notified, -1, npages * (sizeof *notified)); @@ -125,11 +128,10 @@ _pager_do_write_request (struct pager *p, /* Mark these pages as being paged out. */ if (initializing) { - assert_backtrace (npages <= 32); for (i = 0; i < npages; i++) { if (pm_entries[i] & PM_INIT) - omitdata |= 1 << i; + omit_data[i] = 1; else pm_entries[i] |= PM_PAGINGOUT | PM_INIT; } @@ -162,7 +164,7 @@ _pager_do_write_request (struct pager *p, but until the pager library interface is changed, this will have to do. */ for (i = 0; i < npages; i++) -if (!(omitdata & (1 << i))) +if (!omit_data[i]) pagerrs[i] = pager_write_page (p->upi, offset + (vm_page_size * i), data + (vm_page_size * i)); @@ -175,7 +177,7 @@ _pager_do_write_request (struct pager *p, wakeup = 0; for (i = 0; i < npages; i++) { - if (omitdata & (1 << i)) + if (omit_data[i]) { notified[i] = 0; continue; -- 2.31.1
[PATCH 3/4] libpager: Track which pages are precious
This introduces the PM_PRECIOUS bit, which gets set when a precious page is offered to the kernel using pager_offer_page () with the precious argument set, and cleared when the page is returned back. --- libpager/data-return.c | 3 ++- libpager/offer-page.c | 2 ++ libpager/priv.h| 1 + 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/libpager/data-return.c b/libpager/data-return.c index db2f84e6..2cc9f1f8 100644 --- a/libpager/data-return.c +++ b/libpager/data-return.c @@ -214,7 +214,8 @@ _pager_do_write_request (struct pager *p, pm_entries[i] &= ~PM_INCORE; } - pm_entries[i] &= ~(PM_PAGINGOUT | PM_PAGEINWAIT | PM_WRITEWAIT); + pm_entries[i] &= ~(PM_PAGINGOUT | PM_PAGEINWAIT + | PM_WRITEWAIT | PM_PRECIOUS); } for (ll = lock_list; ll; ll = ll->next) diff --git a/libpager/offer-page.c b/libpager/offer-page.c index 26f88ca3..392e83b8 100644 --- a/libpager/offer-page.c +++ b/libpager/offer-page.c @@ -38,6 +38,8 @@ pager_offer_page (struct pager *p, short *pm_entry = &p->pagemap[offset / vm_page_size]; *pm_entry |= PM_INCORE; + if (precious) +*pm_entry |= PM_PRECIOUS; err = memory_object_data_supply (p->memobjcntl, offset, buf, vm_page_size, 0, writelock ? VM_PROT_WRITE : VM_PROT_NONE, diff --git a/libpager/priv.h b/libpager/priv.h index d9d76965..a5e22f36 100644 --- a/libpager/priv.h +++ b/libpager/priv.h @@ -108,6 +108,7 @@ extern int _pager_page_errors[]; /* Pagemap format */ /* These are binary state bits */ +#define PM_PRECIOUS 0x0400 /* return even if not dirty */ #define PM_WRITEWAIT 0x0200 /* queue wakeup once write is done */ #define PM_INIT 0x0100/* data has been written */ #define PM_INCORE 0x0080 /* kernel might have a copy */ -- 2.31.1
glibc THREAD_GSCOPE and excessive gsync_wake ()
I've noticed that even a simple hello world does a lot of gsync calls: $ rpctrace echo hello world |& grep -c gsync 53 These are in fact all identical gsync_wake () calls on a single address: $ rpctrace echo hello world |& grep gsync | uniq task132(pid18549)->gsync_wake (202820 0 0); This immediately screams that something is wrong to me, for two reasons: * Surely there's just a single thread in /bin/echo! Well, there's always the implicit signal/msg thread, but it shouldn't be creating *that* much contention. * gsync_wake () calls are useless unless paired with gsync_wait (). I traced the gsync_wake () calls to the THREAD_GSCOPE macros in glibc (sysdeps/mach/hurd/tls.h). As I understand it from looking at the Linux version, THREAD_GSCOPE is supposed to be an RCU-like mechanism, where a thread can wait until all the other threads have left the "global scope" at least once. The Hurd version is different, it waits for all the other threads to be outside the "global scope" *at the same time*. Which also sounds wrong to me, because this way threads constantly entering and exiting the "global scope" at the same time can completely starve a waiting thread, unlike with RCU. I'm probably misunderstanding something though. Anyway, the Hurd implementation seems pretty straightforward: it keeps a single global counter, increments it when entering "global scope", decrements it when exiting it. If this was the last thread exiting, it calls gsync_wake (), and this is where the excessive gsync_wake () calls come from. In my experience building synchronization primitives on top of futexes, one of the most important things to do is make it so that the fast (uncontended) path never requires calling into the kernel. Any (reasonably small) amount of userspace logic is far cheaper than calling into the kernel. Which is why you add various "have waiters" flags, to only call futex(FUTEX_WAKE_PRIVATE) if there are, in fact, any waiters. So I took a stab at implementing this for THREAD_GSCOPE. I'm currently building glibc to see if it works; but I might have easily messed up ("futexes are tricky"!). Please tell me if I did :) This is not supposed to be a patch ready for merging (I don't even know if this is the right list to send glibc patches to); just an RFC. Sergey -- >8 -- diff --git a/sysdeps/mach/hurd/tls.h b/sysdeps/mach/hurd/tls.h index f83956d3..9fccd9f2 100644 --- a/sysdeps/mach/hurd/tls.h +++ b/sysdeps/mach/hurd/tls.h @@ -55,22 +55,46 @@ /* Global scope switch support. */ #define THREAD_GSCOPE_IN_TCB 0 #define THREAD_GSCOPE_GLOBAL + +#define THREAD_GSCOPE_WAITERS (1 << 31) + #define THREAD_GSCOPE_SET_FLAG() \ atomic_exchange_and_add_acq (&GL(dl_thread_gscope_count), 1) + #define THREAD_GSCOPE_RESET_FLAG() \ - do \ -if (atomic_exchange_and_add_rel (&GL(dl_thread_gscope_count), -1) == 1) \ - lll_wake (GL(dl_thread_gscope_count), 0); \ + do \ +{ \ + int count; \ + count = atomic_exchange_and_add_rel (&GL(dl_thread_gscope_count), -1); \ + if (count == (1 | THREAD_GSCOPE_WAITERS)) \ +{ \ + /* We're going to wake everyone; clear the waiters flag. */ \ + count = atomic_and_val (&GL(dl_thread_gscope_count), ~THREAD_GSCOPE_WAITERS); \ + if (count & THREAD_GSCOPE_WAITERS) \ +lll_wake (GL(dl_thread_gscope_count), GSYNC_BROADCAST); \ +} \ +} \ while (0) + #define THREAD_GSCOPE_WAIT() \ - do \ -{ \ - int count; \ - atomic_write_barrier (); \ - while ((count = GL(dl_thread_gscope_count))) \ -lll_wait (GL(dl_thread_gscope_count), count, 0); \ -} \ - while (0) + do \ +{ \ + int count; \ + count = atomic_load_acquire (&GL(dl_thread_gscope_count));\ + if (count == 0) \ +break; \ + if (!(count & THREAD_GSCOPE_WAITERS)) \ +{ \ + /* Record that we're going to wait. */ \ + if (atomic_compare_and_exchange_bool_acq ( \ +&GL(dl_thread_gscope_count), \ +count | THREAD_GSCOPE_WAITERS, count)) \ +continue; \ + count |= THREAD_GSCOPE_WAITERS; \ +} \ + lll_wait (GL(dl_thread_gscope_count), count, 0); \ +} \ + while (1) #endif /* !ASSEMBLER */
Re: glibc THREAD_GSCOPE and excessive gsync_wake ()
On Sat, May 8, 2021 at 10:54 PM Samuel Thibault wrote: > Don't bother looking at the existing implementation, its roots dates > before we really had gsync working. Interesting! As far as I can see, this implementation was committed (by you) in 2018; and gsync has not seen significant changes since 2017. What am I missing? > While at it, rather try to reproduce the other archs' behavior: use > gscope_flag from the tcb, and switch THREAD_GSCOPE_IN_TCB to 1. That > will make the implementation much simpler and scalable. Ah, I assumed this was on purpose. Thanks for the pointer. Sergey
[RFC PATCH glibc 3/3] XXX: Reimplement gscope
This moves the Hurd gsope implementation to be more like the Linux one. --- sysdeps/generic/ldsodefs.h | 2 -- sysdeps/htl/dl-thread_gscope_wait.c | 36 + sysdeps/mach/hurd/i386/tls.h| 22 ++ sysdeps/mach/hurd/tls.h | 20 4 files changed, 58 insertions(+), 22 deletions(-) create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h index 4604d00b..af16162f 100644 --- a/sysdeps/generic/ldsodefs.h +++ b/sysdeps/generic/ldsodefs.h @@ -1324,10 +1324,8 @@ link_map_audit_state (struct link_map *l, size_t index) } #endif /* SHARED */ -#if THREAD_GSCOPE_IN_TCB void __thread_gscope_wait (void) attribute_hidden; # define THREAD_GSCOPE_WAIT() __thread_gscope_wait () -#endif __END_DECLS diff --git a/sysdeps/htl/dl-thread_gscope_wait.c b/sysdeps/htl/dl-thread_gscope_wait.c new file mode 100644 index ..66b75194 --- /dev/null +++ b/sysdeps/htl/dl-thread_gscope_wait.c @@ -0,0 +1,36 @@ +/* TODO: copyright heading */ + +#include +#include +#include + +void +__thread_gscope_wait (void) +{ + size_t i; + struct __pthread *t; + int *flag_ptr; + + lll_lock (GL (dl_pthread_threads_lock), 0); + + for (i = 0; i < GL (dl_pthread_num_threads); ++i) +{ + t = GL (dl_pthread_threads[i]); + if (t == NULL) +continue; + + flag_ptr = &t->tcb->gscope_flag; + again: + if (atomic_load_acquire (flag_ptr) == THREAD_GSCOPE_FLAG_UNUSED) +continue; + + if (atomic_compare_and_exchange_bool_acq (flag_ptr, +THREAD_GSCOPE_FLAG_WAIT, +THREAD_GSCOPE_FLAG_USED)) +goto again; + + lll_wait (flag_ptr, THREAD_GSCOPE_FLAG_WAIT, 0); +} + + lll_unlock (GL (dl_pthread_threads_lock), 0); +} diff --git a/sysdeps/mach/hurd/i386/tls.h b/sysdeps/mach/hurd/i386/tls.h index 057b2613..0ca4e641 100644 --- a/sysdeps/mach/hurd/i386/tls.h +++ b/sysdeps/mach/hurd/i386/tls.h @@ -369,6 +369,28 @@ _hurd_tls_new (thread_t child, struct i386_thread_state *state, tcbhead_t *tcb) return err; } +/* Global scope switch support. */ +#define THREAD_GSCOPE_LINK_MAP 0 + +# define THREAD_GSCOPE_FLAG_UNUSED 0 +# define THREAD_GSCOPE_FLAG_USED 1 +# define THREAD_GSCOPE_FLAG_WAIT 2 + +#define THREAD_GSCOPE_SET_FLAG() \ + THREAD_SETMEM (THREAD_SELF, gscope_flag, THREAD_GSCOPE_FLAG_USED) + +#define THREAD_GSCOPE_RESET_FLAG() \ + ({ \ +int __flag;\ +asm volatile ("xchgl %0, %%gs:%P1" \ + : "=r" (__flag) \ + : "i" (offsetof (tcbhead_t, gscope_flag)), \ +"0" (THREAD_GSCOPE_FLAG_UNUSED)); \ +if (__flag == THREAD_GSCOPE_FLAG_WAIT) \ + lll_wake (THREAD_SELF->gscope_flag, GSYNC_BROADCAST);\ + }) + + #endif /* !__ASSEMBLER__ */ #endif /* i386/tls.h */ diff --git a/sysdeps/mach/hurd/tls.h b/sysdeps/mach/hurd/tls.h index 8cbf22aa..8e66d5ff 100644 --- a/sysdeps/mach/hurd/tls.h +++ b/sysdeps/mach/hurd/tls.h @@ -52,26 +52,6 @@ # define GET_DTV(descr) \ (((tcbhead_t *) (descr))->dtv) -/* Global scope switch support. */ -#define THREAD_GSCOPE_LINK_MAP0 -#define THREAD_GSCOPE_GLOBAL -#define THREAD_GSCOPE_SET_FLAG() \ - atomic_exchange_and_add_acq (&GL(dl_thread_gscope_count), 1) -#define THREAD_GSCOPE_RESET_FLAG() \ - do \ -if (atomic_exchange_and_add_rel (&GL(dl_thread_gscope_count), -1) == 1) \ - lll_wake (GL(dl_thread_gscope_count), 0); \ - while (0) -#define THREAD_GSCOPE_WAIT() \ - do \ -{\ - int count; \ - atomic_write_barrier (); \ - while ((count = GL(dl_thread_gscope_count))) \ -lll_wait (GL(dl_thread_gscope_count), count, 0); \ -}\ - while (0) - #endif /* !ASSEMBLER */ -- 2.31.1
[RFC PATCH glibc 1/3] XXX: Rename THREAD_GSCOPE_IN_TCB -> THREAD_GSCOPE_LINK_MAP
This patch is probably bogus; but I'm including it anyway. I've noticed that THREAD_GSCOPE_IN_TCB is actually being used not to check whether the thread global scope state is being stored in TCB, but rather as a general flag to distinguish between NPTL and HTL. When THREAD_GSCOPE_IN_TCB is defined, glibc (namely elf/dl-*) uses NPTL-specific internals, such as their stack link map. Since the following patches will move HTL towards having the global scope flag in TCB as well, but the distinction between NPTL and HTL is still useful, I'm trying to rename this flag. This new name is also bad, though. Overall, it was my impression that glibc tries hard not to have ifdef'ed code paths (dependent on the platform being built for) in common code; they seem to use platform-specific versions of headers instead. So perhaps instead of having an environment variable, the right thing to do would be to move this code into platform-specific files. --- elf/dl-reloc.c| 4 ++-- elf/dl-support.c | 2 +- elf/dl-tls.c | 6 +++--- elf/rtld.c| 2 +- sysdeps/aarch64/nptl/tls.h| 2 +- sysdeps/alpha/nptl/tls.h | 2 +- sysdeps/arc/nptl/tls.h| 2 +- sysdeps/arm/nptl/tls.h| 2 +- sysdeps/csky/nptl/tls.h | 2 +- sysdeps/generic/ldsodefs.h| 8 sysdeps/generic/tls.h | 3 ++- sysdeps/hppa/nptl/tls.h | 2 +- sysdeps/i386/nptl/tls.h | 2 +- sysdeps/ia64/nptl/tls.h | 2 +- sysdeps/m68k/nptl/tls.h | 2 +- sysdeps/mach/hurd/tls.h | 2 +- sysdeps/microblaze/nptl/tls.h | 2 +- sysdeps/mips/nptl/tls.h | 2 +- sysdeps/nios2/nptl/tls.h | 2 +- sysdeps/powerpc/nptl/tls.h| 2 +- sysdeps/riscv/nptl/tls.h | 2 +- sysdeps/s390/nptl/tls.h | 2 +- sysdeps/sh/nptl/tls.h | 2 +- sysdeps/sparc/nptl/tls.h | 2 +- sysdeps/x86_64/nptl/tls.h | 2 +- 25 files changed, 32 insertions(+), 31 deletions(-) diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c index bb9ca1a1..77e10895 100644 --- a/elf/dl-reloc.c +++ b/elf/dl-reloc.c @@ -141,7 +141,7 @@ cannot allocate memory in static TLS block")); } } -#if !THREAD_GSCOPE_IN_TCB +#if !THREAD_GSCOPE_LINK_MAP /* Initialize static TLS area and DTV for current (only) thread. libpthread implementations should provide their own hook to handle all threads. */ @@ -160,7 +160,7 @@ _dl_nothread_init_static_tls (struct link_map *map) memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size), '\0', map->l_tls_blocksize - map->l_tls_initimage_size); } -#endif /* !THREAD_GSCOPE_IN_TCB */ +#endif /* !THREAD_GSCOPE_LINK_MAP */ void _dl_relocate_object (struct link_map *l, struct r_scope_elem *scope[], diff --git a/elf/dl-support.c b/elf/dl-support.c index f966a2e7..af340bee 100644 --- a/elf/dl-support.c +++ b/elf/dl-support.c @@ -189,7 +189,7 @@ ElfW(Word) _dl_stack_flags = DEFAULT_STACK_PERMS; int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable; -#if THREAD_GSCOPE_IN_TCB +#if THREAD_GSCOPE_LINK_MAP list_t _dl_stack_used; list_t _dl_stack_user; int _dl_stack_cache_lock; diff --git a/elf/dl-tls.c b/elf/dl-tls.c index 6baff0c1..8a7c534b 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -29,7 +29,7 @@ #include #include -#if THREAD_GSCOPE_IN_TCB +#if THREAD_GSCOPE_LINK_MAP # include #endif @@ -1010,7 +1010,7 @@ cannot create TLS data structures")); } } -#if THREAD_GSCOPE_IN_TCB +#if THREAD_GSCOPE_LINK_MAP static inline void __attribute__((always_inline)) init_one_static_tls (struct pthread *curp, struct link_map *map) { @@ -1043,4 +1043,4 @@ _dl_init_static_tls (struct link_map *map) lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE); } -#endif /* THREAD_GSCOPE_IN_TCB */ +#endif /* THREAD_GSCOPE_LINK_MAP */ diff --git a/elf/rtld.c b/elf/rtld.c index ad325d4c..1857479a 100644 --- a/elf/rtld.c +++ b/elf/rtld.c @@ -1139,7 +1139,7 @@ dl_main (const ElfW(Phdr) *phdr, struct dl_main_state state; dl_main_state_init (&state); -#if !THREAD_GSCOPE_IN_TCB +#if !THREAD_GSCOPE_LINK_MAP GL(dl_init_static_tls) = &_dl_nothread_init_static_tls; #endif diff --git a/sysdeps/aarch64/nptl/tls.h b/sysdeps/aarch64/nptl/tls.h index 6e896207..a21a0826 100644 --- a/sysdeps/aarch64/nptl/tls.h +++ b/sysdeps/aarch64/nptl/tls.h @@ -109,7 +109,7 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -# define THREAD_GSCOPE_IN_TCB 1 +# define THREAD_GSCOPE_LINK_MAP 1 # define THREAD_GSCOPE_FLAG_UNUSED 0 # define THREAD_GSCOPE_FLAG_USED 1 # define THREAD_GSCOPE_FLAG_WAIT 2 diff --git a/sysdeps/alpha/nptl/tls.h b/sysdeps/alpha/nptl/tls.h index 4dbccc52..d15defaa 100644 --- a/sysdeps/alpha/nptl/tls.h +++ b/sysdeps/alpha/nptl/tls.h @@ -103,7 +103,7 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -#define
[RFC PATCH glibc 2/3] XXX: Move __pthread_threads to ld.so
The next commit is going to introduce a new implementation of THREAD_GSCOPE_WAIT, which needs to access the list of threads. Since it must be usable from the dynamic laoder, we have to move the symbols for the list of threads into the loader. This is my attempt at doing so. --- elf/dl-support.c | 4 ++- htl/Versions | 2 +- htl/pt-alloc.c | 46 htl/pt-create.c | 11 ++ htl/pt-internal.h| 28 +-- sysdeps/generic/ldsodefs.h | 8 - sysdeps/htl/pt-key-delete.c | 8 ++--- sysdeps/htl/pthreadP.h | 2 +- sysdeps/htl/raise.c | 9 +++-- sysdeps/htl/thrd_current.c | 8 +++-- sysdeps/mach/hurd/htl/pt-sigstate-init.c | 2 +- sysdeps/mach/hurd/htl/pt-sysdep.c| 2 +- sysdeps/mach/hurd/htl/pt-sysdep.h| 2 +- 13 files changed, 67 insertions(+), 65 deletions(-) diff --git a/elf/dl-support.c b/elf/dl-support.c index af340bee..a55834a1 100644 --- a/elf/dl-support.c +++ b/elf/dl-support.c @@ -194,7 +194,9 @@ list_t _dl_stack_used; list_t _dl_stack_user; int _dl_stack_cache_lock; #else -int _dl_thread_gscope_count; +int _dl_pthread_num_threads; +struct __pthread **_dl_pthread_threads; +int _dl_pthread_threads_lock; void (*_dl_init_static_tls) (struct link_map *) = &_dl_nothread_init_static_tls; #endif struct dl_scope_free_list *_dl_scope_free_list; diff --git a/htl/Versions b/htl/Versions index 9506043c..07a343f9 100644 --- a/htl/Versions +++ b/htl/Versions @@ -168,7 +168,7 @@ libpthread { GLIBC_PRIVATE { __pthread_initialize_minimal; -__pthread_threads; +# __pthread_threads; __cthread_detach; __cthread_fork; diff --git a/htl/pt-alloc.c b/htl/pt-alloc.c index acc67f27..b60daf81 100644 --- a/htl/pt-alloc.c +++ b/htl/pt-alloc.c @@ -29,17 +29,17 @@ corresponding to that specified by the given thread ID." */ /* Thread ID lookup table. */ -struct __pthread **__pthread_threads; +// struct __pthread **__pthread_threads; /* The size of the thread ID lookup table. */ int __pthread_max_threads; /* The total number of thread IDs currently in use, or on the list of available thread IDs. */ -int __pthread_num_threads; +// int __pthread_num_threads; /* A lock for the table, and the other variables above. */ -pthread_rwlock_t __pthread_threads_lock; +// pthread_rwlock_t __pthread_threads_lock; /* List of thread structures corresponding to free thread IDs. */ struct __pthread *__pthread_free_threads; @@ -132,25 +132,25 @@ __pthread_alloc (struct __pthread **pthread) } retry: - __pthread_rwlock_wrlock (&__pthread_threads_lock); + lll_lock (GL (dl_pthread_threads_lock), 0); - if (__pthread_num_threads < __pthread_max_threads) + if (GL (dl_pthread_num_threads) < __pthread_max_threads) { /* We have a free slot. Use the slot number plus one as the thread ID for the new thread. */ - new->thread = 1 + __pthread_num_threads++; - __pthread_threads[new->thread - 1] = NULL; + new->thread = 1 + GL (dl_pthread_num_threads)++; + GL (dl_pthread_threads)[new->thread - 1] = NULL; - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), 0); *pthread = new; return 0; } #ifdef PTHREAD_THREADS_MAX - else if (__pthread_num_threads >= PTHREAD_THREADS_MAX) + else if (GL (dl_pthread_num_threads) >= PTHREAD_THREADS_MAX) { /* We have reached the limit on the number of threads per process. */ - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), 0); free (new); return EAGAIN; @@ -162,7 +162,7 @@ retry: memory allocation, since that's a potentially blocking operation. */ max_threads = __pthread_max_threads; - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), 0); /* Allocate a new lookup table that's twice as large. */ new_max_threads @@ -174,13 +174,13 @@ retry: return ENOMEM; } - __pthread_rwlock_wrlock (&__pthread_threads_lock); + lll_lock (GL (dl_pthread_threads_lock), 0); /* Check if nobody else has already enlarged the table. */ if (max_threads != __pthread_max_threads) { /* Yep, they did. */ - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), 0); /* Free the newly allocated table and try again to allocate a slot. */ free (threads); @@ -188,22 +188,22 @@ retry: } /* Copy over the contents of the old table. */ - memcpy (threads, __pthread_threads, + memcpy (threads, GL (dl_pthread_threads), __pthread_max_threads * sizeof (struct __pthread *)); /* Save the location of the old table. We want to deallocate its
[RFC PATCH glibc 0/3] Rewrite THREAD_GSCOPE
Alright, here's my second attempt. On Sat, May 8, 2021 at 10:54 PM Samuel Thibault wrote: > While at it, rather try to reproduce the other archs' behavior: use > gscope_flag from the tcb, and switch THREAD_GSCOPE_IN_TCB to 1. That > will make the implementation much simpler and scalable. Well, this has been much less traightforward than the patch I sent yesterday (but unlike that patch, it appears to build and work). I am much less familiar with glibc internals than I am with Hurd, and it shows. Please don't throw stones at me :) For one thing, I don't think I understand how the buildsystem works. (For instance, how do I even add a new source file to a library? Or where does it decide which system-specific parts to build, such as whether to build NPTL or HTL? How do I debug the buildsystem when something doesn't work as expected?) It must be that I'm used to either straightforward Makefiles, or higher-level buildsystems (like CMake or Meson) that autogenerate the low-level boilerplate. Next, I was somewhat confused by what the GL macro is. My current understanding is: some data needs to be shared between the dynamic loader and glibc proper, if glibc is being built as a shared library. To support that case, such data is placed into a special struct instance that itself resides in ld.so. Then the GL macros is used to actually access those symbols. It switches, mostly transparently, between accessing said global struct instance in ld.so, and simply declaring global variables for the case of static build. Naming such variable with the dl_ prefix is just a convention. I likely got something wrong there. If I got the general idea wrong, the second patch must be completely bogus. Initially, I tried to get rid of THREAD_GSCOPE_IN_TCB completely, since THREAD_GSCOPE is now in TCB in all ports. But it turned out that the way it was actualy being used was not to check if the thread scope flag is in TCB or not, but simply to distingusish NPTL from HTL. Namely, files like elf/dl-tls.c would directly use NPTL specifics like struct pthread if THREAD_GSCOPE_IN_TCB is 1. The actual implementation of THREAD_GSCOPE_* is similar to the one in NPTL. Each thread uses a flag in its TCB. THREAD_GSCOPE_WAIT() has to iterate over all the threads and wait for each one to leave the global scope. This has to be available from inside ld.so, which is why I had to convert __pthread_threads (and friends) to GL (dl_pthread_threads). NPTL does a similar thing with its global thread lists, too. Now, on to the good parts. It builds, and even appears to work somewhat. And all the gsync_wake () calls are gone! ~/glibc/build$ ./testrun.sh /bin/uname GNU ~/glibc/build$ ./testrun.sh --tool=rpctrace /bin/uname |& grep gsync ~/glibc/build$ Sergey Bugaev (3): XXX: Rename THREAD_GSCOPE_IN_TCB -> THREAD_GSCOPE_LINK_MAP XXX: Move __pthread_threads to ld.so XXX: Reimplement gscope elf/dl-reloc.c | 4 +-- elf/dl-support.c | 6 ++-- elf/dl-tls.c | 6 ++-- elf/rtld.c | 2 +- htl/Versions | 2 +- htl/pt-alloc.c | 46 htl/pt-create.c | 11 ++ htl/pt-internal.h| 28 +-- sysdeps/aarch64/nptl/tls.h | 2 +- sysdeps/alpha/nptl/tls.h | 2 +- sysdeps/arc/nptl/tls.h | 2 +- sysdeps/arm/nptl/tls.h | 2 +- sysdeps/csky/nptl/tls.h | 2 +- sysdeps/generic/ldsodefs.h | 18 ++ sysdeps/generic/tls.h| 3 +- sysdeps/hppa/nptl/tls.h | 2 +- sysdeps/htl/dl-thread_gscope_wait.c | 36 +++ sysdeps/htl/pt-key-delete.c | 8 ++--- sysdeps/htl/pthreadP.h | 2 +- sysdeps/htl/raise.c | 9 +++-- sysdeps/htl/thrd_current.c | 8 +++-- sysdeps/i386/nptl/tls.h | 2 +- sysdeps/ia64/nptl/tls.h | 2 +- sysdeps/m68k/nptl/tls.h | 2 +- sysdeps/mach/hurd/htl/pt-sigstate-init.c | 2 +- sysdeps/mach/hurd/htl/pt-sysdep.c| 2 +- sysdeps/mach/hurd/htl/pt-sysdep.h| 2 +- sysdeps/mach/hurd/i386/tls.h | 22 sysdeps/mach/hurd/tls.h | 20 --- sysdeps/microblaze/nptl/tls.h| 2 +- sysdeps/mips/nptl/tls.h | 2 +- sysdeps/nios2/nptl/tls.h | 2 +- sysdeps/powerpc/nptl/tls.h | 2 +- sysdeps/riscv/nptl/tls.h | 2 +- sysdeps/s390/nptl/tls.h | 2 +- sysdeps/sh/nptl/tls.h| 2 +- sysdeps/sparc/nptl/tls.h | 2 +- sysdeps/x86_64/nptl/tls.h| 2 +- 38 files changed, 156 insertions(+), 117 d
rpctrace output improvements?
Hello, I use rpctrace a lot, but I'm not particularly fond of its output format. Other tracing tools, like strace [0] on Linux and Darling's xtrace [1] can symbolicate many flags and structures, producing something that looks much closer to C code, and comprehensible at a glance. [0]: https://strace.io/ [1]: https://docs.darlinghq.org/contributing/debugging.html#xtrace I thought I'd look if I could start implementing the same for rpctrace. Here's sample rpctrace output: 111<--144(pid1004)->dir_lookup ("proc/loadavg" 1 0) = 0 1 "loadavg" 163<--162(pid1004) task133(pid1004)->mach_port_mod_refs (pn{ 21} 0 1) = 0 163<--162(pid1004)->dir_lookup ("loadavg" 1 0) = 0 1 ""165<--158(pid1004) I was thinking it could look like this instead (mock-up): dir_lookup (file 111 (/), "proc/loadavg", O_READ, 0) -> FS_RETRY_NORMAL, "loadavg", file 163 (/proc) mach_port_mod_refs (task 133 (pid 1004), pn 21, MACH_PORT_RIGHT_SEND, 1) = KERN_SUCCESS dir_lookup (file 163 (/proc), "loadavg", O_READ, 0) -> FS_RETRY_NORMAL, "", file 165 (/proc/loadavg) The few changes I've made here are: * Moved the request port into the parens, like in C and MIG * Added commas * Replaced = with -> for denoting out params * Dropped the 0 return code, except when there are no out params * Symbolicated flags and file paths (potentially, also proc and auth handles) * Some other misc tweaks to how ports are represented I have started working on implementing this; but thought I'd ask for feedback first. What do you think, does this sound worthwhile? Perhaps you like some of the proposed changes but not others? Or is this a terrible idea that I should forget about? Sergey P. S. There are other interesting features that we could copy from strace, such as dynamically attaching to a process (I bet that would be fun to implement!), tracing a subset of all calls (maps very well to different RPC subsystems), and injecting errors (very useful for testing all error handling code paths).
Re: rpctrace output improvements?
On Wed, May 12, 2021 at 11:48 PM Samuel Thibault wrote: > > * Symbolicated flags and file paths (potentially, also proc and auth > > handles) > > For file path it'll probably be much harder since there can be several > path (or even no path!) for a given file port. And as far as I can see, there's no RPC to get the parent directory of a file. I can do that for a dir with dir_lookup ("..") (and that's what _hurd_canonicalize_directory_name_internal () uses), but it won't work with arbitrary files. So my plan is to use _hurd_canonicalize_directory_name_internal () with directories (previously unknown ports when they're used as a request port in a dir_* RPC), and take note of file paths as file ports are received from dir_lookup (). This should catch all files that the tracee opens; but not any fds it inherits. This should not be a big deal though; since when using rpctrace std{in,out,err} will typically be the terminal, and for those we can always get a name. Of course, there'd always be a fallback: if we see an unrecognized flag value, or a port we cannot symbolicate, we'd just print its raw value (as now). > > and injecting errors (very useful for testing all error handling code > > paths). > > That's probably a bit tricky to express. I was thinking it'd use the same format as strace, e.g. --inject=file:error=ENOENT:when=9..10 (inject ENOENT into the 9th and 10th syscalls from the "file" group). Sergey P. S. Could you please take a look at my glibc patches in the other thread?
Re: rpctrace output improvements?
On Wed, May 12, 2021 at 11:48 PM Samuel Thibault wrote: > It'd probably mean introducing types > here and there, but that'd be a really good thing to do. > > The port type is useful indeed and is most often available in the types > already. Oh, do you mean the types specified in .defs? If so, the problem is rpctrace doesn't read those, it only reads the msgids files, which contain lines like this: fs 2 dir_lookup 18 20018 20118 (meaning, subsystem fs whose base msgid is 2, routine name is dir_lookup, msgid relative to subsystem base is 18, absolute msgid is 20018, reply msgid is 20118). There are no types in there. So either: * rpctrace has to learn to parse defs, instead of (or in addition to) msgids * GNU MIG needs to be taught to emit more info into msgids files * rcptrace itself would hardcode knowledge of signatures of some common RPCs My plan has been the latter, since there likely aren't that many RPCs that need specific treatment, and since rpctrace already knows to treat some RPCs (slightly) specially. But perhaps you would recommend something else? Or did you mean something else entirely? On Thu, May 13, 2021 at 12:15 AM Samuel Thibault wrote: > > And as far as I can see, there's no RPC to get the parent directory of > > a file. > > Sure since the file can be hardlinked from various places. Yes; but other systems manage to still make this work. E.g. on my GNU/Linux box: $ touch foo $ ln foo bar $ ls -l /proc/self/fd/{3,4} 3>foo 4>bar l-wx--. 1 sergey sergey 64 May 13 14:59 /proc/self/fd/3 -> /tmp/foo l-wx--. 1 sergey sergey 64 May 13 14:59 /proc/self/fd/4 -> /tmp/bar I don't know how this work in Linux; but in the SerenityOS kernel, an open FileDescription keeps a reference to a Custody (an which is a chain of Inode's + file names), which can then be used to recreate the path the FileDescription has been opened through. And the Hurd also does this, in fact! libnetfs and libdiskfs already have char *path in struct peropen, meaning they already know for which path this individual peropen was created, even if there are several paths that can be used to get to the [i]node. It's just that there's no RPC that exposes this info to clients. > > P. S. Could you please take a look at my glibc patches in the other thread? > > It's piled up in the middle of so many dozens of stuff that people throw > at me to do. Well... thank you for reviewing my patches and otherwise guiding me so far. And if there's a better way to contribute beside throwing even more patches at you, I'd be glad to. Sergey
How do I disclose a vulnerability?
As luck would have it, I have found a serious issue in a core component of the Hurd. It is a denial of service, which can then be turned into privilege escalation. I have developed an exploit. Here is it in action: sergey@sergey-hurd-box:~/hax$ id uid=1000(sergey) gid=1000(sergey) groups=1000(sergey),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),103(netdev) sergey@sergey-hurd-box:~/hax$ ./hax Got root auth port :) root@sergey-hurd-box:~/hax# id uid=0(root) gid=0(root) groups=0(root) root@sergey-hurd-box:~/hax# (To be clear, I'm not the first person to realize that, let's say, _this way of doing things_ could be exploited. I just stumbled on a piece of code, realized that it uses a problematic pattern, thought of possible ramifications, and developed the specific exploit.) As far as I can see from Git history, this vulnerability has been present in the code base for more than 20 years. Is such a vulnerability already known (and am I just late to the party)? If it's not known, how do I responsibly disclose this, so that nobody's system gets hacked? I guess I could send the vulnerability description and the exploit source code in a private e-mail; is there perhaps a dedicated GNU e-mail address for this purpose? How do we ensure that a future commit fixing the vulnerability doesn't immediately disclose what it was? Or, should I just dump the whole thing out in the open on this mailing list? Should we get a CVE ID assigned? Should we notify Debian? Sergey P. S. On a personal note, it has been *very* exciting to find the issue and develop a successful exploit! But now I'm a bit lost as to what to do next. And sorry for throwing more stuff at you. This can certainly wait for a few more days if it hasn't been discovered for 20 years.
Re: How do I disclose a vulnerability?
On Fri, May 14, 2021 at 4:30 PM Samuel Thibault wrote: > We don't have anything set up for disclosures, you can drop me an e-mail > (ciphered if you can). OK, I'll prepare a write-up and send it to you. And I will attempt to use GPG for it. I asked about this on the Fediverse; and got (among other replies) this small guide [0] which sounds like a good plan of action. What do you think? Oh, and you would not believe this, but in the past couple of hours I have discovered *another* vulnerability, unrelated to the first one; it's even easier to exploit and also gives you root: sergey@sergey-hurd-box:~/hax2$ ./hax2 Got root auth port :) root@sergey-hurd-box:~/hax2# id uid=0(root) gid=0(root) groups=0(root) root@sergey-hurd-box:~/hax2# Sergey [0]: https://functional.cafe/@minoru/106234136976353911
Re: rpctrace output improvements?
On Sat, May 15, 2021 at 4:56 PM Samuel Thibault wrote: > > * rpctrace has to learn to parse defs, instead of (or in addition to) msgids > > * GNU MIG needs to be taught to emit more info into msgids files > > Either of those. Probably the latter would be more interesting > long-term. We could introduce a new file format that encodes the RPCs > in a way that is easy to parse for programs, similary to what the msgid > files, does, but more powerful. Well, it just so happens that I have some experience in extending MIG for precisely this purpose. Darwin uses untyped MIG, so we couldn't just make xtrace iterate over data items like rpctrace does. Instead I extended Darling MIG to generate a separate C source file, fooXtraceMig.c, in addition to fooUser.c and fooServer.c. The file consists of routines that can parse a message blob back into its arguments; this is then built into dynamic libraries that are installed into a special directory and dlopened by xtrace on startup. But I don't think we want to go all loadable dynamic libraries for rpctrace; typed MIG makes this much easier to do with just a declarative format. Sergey
Re: [RFC PATCH glibc 2/3] XXX: Move __pthread_threads to ld.so
On Sat, May 15, 2021 at 5:09 PM Samuel Thibault wrote: > This can't go to the generic-purpose dl-support.c file. > Create a sysdeps/mach/hurd/dl-thread_gscope_wait.c file, it's already > getting included by elf/Makefile's routine variable. I'm creating sysdeps/htl/dl-thread_gscope_wait.c in the next commit. Is there a reason this has to go into sysdeps/mach/hurd/dl-thread_gscope_wait.c rather than there, it seems rather specific to HTL? > The rest looks ok, did you try to run make check to make sure nothing > broke? I started it and it ran for some time, apparently without crashing. But then I interrupted it before it could finish, and anyway I don't know where to look for results and how to interpret them. I'll figure it out. Sergey
Re: How do I disclose a vulnerability?
On Fri, May 14, 2021 at 7:33 PM Sergey Bugaev wrote: > Oh, and you would not believe this, but in the past couple of hours I > have discovered *another* vulnerability, unrelated to the first one; > it's even easier to exploit and also gives you root And one more, unrelated to either of the first two: sergey@sergey-hurd-box:~/hax3$ gcc hax3.c -o hax3 sergey@sergey-hurd-box:~/hax3$ ids effective uids: 1000(sergey) effective gids: 1000(sergey) 24(cdrom) 25(floppy) 27(sudo) 29(audio) 30(dip) 44(video) 46(plugdev) 103(netdev) available uids: 1000(sergey) 1000(sergey) available gids: 1000(sergey) 1000(sergey) sergey@sergey-hurd-box:~/hax3$ ./hax3 Got root auth port :) root@sergey-hurd-box:~/hax3# ids effective uids: 0(root) effective gids: 0(root) available uids: 0(root) 0(root) available gids: 0(root) 0(root) root@sergey-hurd-box:~/hax3# There are other obvious issues with [this part of code], many of them should also be exploitable. Sergey
[PATCH] utils/ps.c: Don't limit output width to 80 on non-tty
If no output width limit has been set explicitly, and we're not printing to a tty, do not limit output width. In particular, this fixes grepping ps output. --- utils/ps.c | 5 + 1 file changed, 5 insertions(+) diff --git a/utils/ps.c b/utils/ps.c index 2cf6e4bd..8e39c2c5 100644 --- a/utils/ps.c +++ b/utils/ps.c @@ -367,6 +367,11 @@ main(int argc, char *argv[]) /* Parse our command line. This shouldn't ever return an error. */ argp_parse (&argp, argc, argv, 0, 0, 0); + /* If no output width limit has been set explicitly, and we're not printing + to a tty, do not limit output width. */ + if (output_width == -1 && !isatty (1)) +output_width = 0; + msgids_scan_std (); err = proc_stat_list_create(context, &procset); -- 2.31.1
[PATCH 0/1] Fix that procfs crash
This fixes the procfs crash that happens on shutdown, previously discussed here: https://lists.debian.org/debian-hurd/2021/04/msg00035.html Sergey Bugaev (1): procfs: Fix use-after-free procfs/rootdir.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) -- 2.31.1
[PATCH 1/1] procfs: Fix use-after-free
Fix several issues with reference counting in rootdir_make_translated_node (): * Grab an additional reference while still holding the lock. * Give the node an additional reference for being pointed to by the ops. * Reference the existing node if we find it on the second try, not only if we find it on the first try. * Dereference, not just clean, the created node if it turns out to be unneeded. Fixes a crash with the following backtrace: #2 0x010855c8 in __assert_fail_backtrace ( assertion=0x1051148 "! (r.hard == 1 && r.weak == 0) || !\"refcount detected use-after-free!\"", file=0x1050cec "../../libshouldbeinlibc/refcount.h", line=170, function=0x1051220 <__PRETTY_FUNCTION__.3> "refcounts_ref") at ../../libshouldbeinlibc/assert-backtrace.c:64 #3 0x01050358 in refcounts_ref (result=0x0, ref=0x15009a0) at ../../libshouldbeinlibc/refcount.h:170 #4 netfs_nref (np=0x15008f0) at ../../libnetfs/nref.c:26 #5 0x0804c9f4 in rootdir_make_translated_node (dir_hook=0x10001990, entry_hook=0x8055180 <__compound_literal.8>) at ../../procfs/rootdir.c:674 #6 0x0804ace3 in procfs_dir_lookup (hook=0x10001c10, name=0x2857efc "mounts", np=0x2855d68) at ../../procfs/procfs_dir.c:88 #7 0x0804a410 in procfs_lookup (np=0x10001c40, name=0x2857efc "mounts", npp=0x2855d68) at ../../procfs/procfs.c:185 #8 0x0804cae5 in dircat_lookup (hook=0x10001d50, name=0x2857efc "mounts", np=0x2855d68) at ../../procfs/dircat.c:76 #9 0x0804a410 in procfs_lookup (np=0x10001d80, name=0x2857efc "mounts", npp=0x2855d68) at ../../procfs/procfs.c:185 #10 0x0804a96f in netfs_attempt_lookup (user=0x1401c60, dir=0x10001d80, name=0x2857efc "mounts", np=0x2855d68) at ../../procfs/netfs.c:212 #11 0x0103fd5b in netfs_S_dir_lookup (dircred=, filename=, flags=, mode=, do_retry=, retry_name=, retry_port=, retry_port_type=) at ../../libnetfs/dir-lookup.c:175 --- procfs/rootdir.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/procfs/rootdir.c b/procfs/rootdir.c index 9f860a91..0e7c05c0 100644 --- a/procfs/rootdir.c +++ b/procfs/rootdir.c @@ -667,13 +667,12 @@ rootdir_make_translated_node (void *dir_hook, const void *entry_hook) pthread_spin_lock (&rootdir_translated_node_lock); np = *ops->npp; + if (np != NULL) +netfs_nref (np); pthread_spin_unlock (&rootdir_translated_node_lock); if (np != NULL) -{ - netfs_nref (np); - return np; -} +return np; np = procfs_make_node (entry_hook, (void *) entry_hook); if (np == NULL) @@ -686,11 +685,12 @@ rootdir_make_translated_node (void *dir_hook, const void *entry_hook) prev = *ops->npp; if (*ops->npp == NULL) *ops->npp = np; + netfs_nref (*ops->npp); pthread_spin_unlock (&rootdir_translated_node_lock); if (prev != NULL) { - procfs_cleanup (np); + netfs_nrele (np); np = prev; } -- 2.31.1
[RFC PATCH 0/1] gsync blocking the calling thread considered harmful
Hello, while hacking on rpctrace -p, I once again ran into gsync_wait () calls permanently hanging rpctrace. The reason for this is simple: once rpctrace logs the gsync_wait () call it receives from a traced task, it forwards the same gsync_wait () call to the actual task port of the traced task, and this causes rpctrace itself to block since the gsync_wait () implementation always affects the calling thread. Normally, some time later the blocked thread would be woken up by a gsync_wake () call done by another thread; but since rpctrace itself is single-threaded, other threads in the traced tasks can enqueue gsync_wake_request ()s in vain, as they'll never even get received by the hanging rpctrace, nor forwarded to the kernel. One way to work around this would be to make rpctrace multithreaded, and I've heard there's been some work in that direction. But to me it sounds like gsync_wait () blocking rpctrace is the part that goes wrong. Generally, rpctrace is never supposed to block on an RPC made by a traced task: it forwards the request message without blocking for the reply (that may come later, or never). So I've been long thinking that gsync_wait () should do the same: instead of actually blocking the thread calling gsync_wait (), gsync_wait_request () should return immediately, and the reply message will come once someone calls gsync_wake (). This doesn't change anything for the regular callers of gsync_wait (), since the call will still appear to block the same way other RPCs do, but it will actually block on msg receive, not inside gsync_wait () itself. This must have been discussed before, and there must be a reason why gsync_wait () was made to behave the way that it does and not in the (arguably simpler and more consistent) way I'm proposing; but I can't find any relevant discussion. Anyway, I thought I'd try implementing my idea and seeing what would break. Much like with glibc, I'm not very familiar with Mach-the-kernel internals, but to my surprise the first version that compiled appears to work just fine, booting a full working Hurd system (and rpctracing gsync_wait () totally works). Still, I probably messed something up: some locking or reference counting or somesuch; so please review :) The part that I didn't figure out yet is how the kernel can listen for a right to become a dead name (like the dead-name notification in userspace). Perhaps it amounts to calling ipc_port_dnrequest () and listening for messages just like in userspace, but I have not figured out the details yet. Without this, the kernel cannot really know when the reply port is deallocated, either explicitly by userspace or because the task died, so it means the kernel will leak memory allocated for the waiters. Another to-do item is timeout support; ideally it should just turn into waittime in the MIG definition, but this again requires handling the reply port dying, and would change the message format, and also GNU MIG doesn't yet support conditional timeouts (although I might have a MIG patch or two pending for this). So, what do you think? Sergey P.S. Feels so good to hack on something that I can just post about in public again!
[RFC PATCH 1/1] gsync: Do not block the thread calling gsync_wait ()
Instead, delay the reply message until the corresponding gsync_wake () call is made. TODO: Handle the case of reply port dying. --- include/mach/gnumach.defs | 4 ++ kern/gsync.c | 112 +++--- kern/gsync.h | 8 ++- 3 files changed, 89 insertions(+), 35 deletions(-) diff --git a/include/mach/gnumach.defs b/include/mach/gnumach.defs index 531b5d4d..06859196 100644 --- a/include/mach/gnumach.defs +++ b/include/mach/gnumach.defs @@ -33,6 +33,9 @@ subsystem GNUMACH_IMPORTS #endif +type reply_port_t = MACH_MSG_TYPE_MAKE_SEND_ONCE | polymorphic + ctype: mach_port_t; + type vm_cache_statistics_data_t = struct[11] of integer_t; type vm_wire_t = int; @@ -98,6 +101,7 @@ routine register_new_task_notification( * - GSYNC_TIMED: The call only blocks for MSEC milliseconds. */ routine gsync_wait( task : task_t; + sreplyport reply_port: reply_port_t; addr : vm_address_t; val1 : unsigned; val2 : unsigned; diff --git a/kern/gsync.c b/kern/gsync.c index e73a6cf0..aae39ba6 100644 --- a/kern/gsync.c +++ b/kern/gsync.c @@ -21,9 +21,14 @@ #include #include #include +#include #include #include +#define GSYNC_MSGH_SEQNO 0 + +static struct kmem_cache gsync_waiter_cache; + /* An entry in the global hash table. */ struct gsync_hbucket { @@ -57,12 +62,13 @@ union gsync_key } any; }; -/* A thread that is blocked on an address with 'gsync_wait'. */ +/* An RPC that is blocked on an address with 'gsync_wait'. */ struct gsync_waiter { struct list link; union gsync_key key; - thread_t waiter; + ipc_port_t reply_port; + mach_msg_type_name_t reply_port_type; }; /* Needed data for temporary mappings. */ @@ -83,6 +89,9 @@ void gsync_setup (void) list_init (&gsync_buckets[i].entries); kmutex_init (&gsync_buckets[i].lock); } + + kmem_cache_init (&gsync_waiter_cache, "gsync_waiter", + sizeof (struct gsync_waiter), 0, NULL, 0); } /* Convenience comparison functions for gsync_key's. */ @@ -219,24 +228,38 @@ temp_mapping (struct vm_args *vap, vm_offset_t addr, vm_prot_t prot) return (paddr); } -kern_return_t gsync_wait (task_t task, vm_offset_t addr, - unsigned int lo, unsigned int hi, natural_t msec, int flags) +kern_return_t +gsync_wait (task_t task, +ipc_port_t reply_port, mach_msg_type_name_t reply_port_type, +vm_offset_t addr, +unsigned int lo, unsigned int hi, natural_t msec, int flags) { + struct gsync_waiter *w; + if (task == 0) return (KERN_INVALID_TASK); else if (addr % sizeof (int) != 0) return (KERN_INVALID_ADDRESS); + if (! IP_VALID (reply_port)) +return (KERN_INVALID_CAPABILITY); + if (reply_port_type == MACH_MSG_TYPE_MOVE_RECEIVE) +return (KERN_INVALID_RIGHT); + + w = (struct gsync_waiter *) kmem_cache_alloc (&gsync_waiter_cache); + if (! w) +return (KERN_RESOURCE_SHORTAGE); + vm_map_lock_read (task->map); - struct gsync_waiter w; struct vm_args va; boolean_t remote = task != current_task (); - int bucket = gsync_prepare_key (task, addr, flags, &w.key, &va); + int bucket = gsync_prepare_key (task, addr, flags, &w->key, &va); if (bucket < 0) { vm_map_unlock_read (task->map); + kmem_cache_free (&gsync_waiter_cache, (vm_offset_t) w); return (KERN_INVALID_ADDRESS); } else if (remote) @@ -270,6 +293,7 @@ kern_return_t gsync_wait (task_t task, vm_offset_t addr, vm_map_unlock_read (task->map); /* Make sure to remove the reference we added. */ vm_object_deallocate (va.obj); + kmem_cache_free (&gsync_waiter_cache, (vm_offset_t) w); return (KERN_MEMORY_FAILURE); } @@ -293,6 +317,7 @@ kern_return_t gsync_wait (task_t task, vm_offset_t addr, if (! equal) { kmutex_unlock (&hbp->lock); + kmem_cache_free (&gsync_waiter_cache, (vm_offset_t) w); return (KERN_INVALID_ARGUMENT); } @@ -300,37 +325,18 @@ kern_return_t gsync_wait (task_t task, vm_offset_t addr, * compares strictly greater than this waiter. */ struct list *runp; list_for_each (&hbp->entries, runp) -if (gsync_key_lt (&w.key, &node_to_waiter(runp)->key)) +if (gsync_key_lt (&w->key, &node_to_waiter(runp)->key)) break; - /* Finally, add ourselves to the list and go to sleep. */ - list_add (runp->prev, runp, &w.link); - w.waiter = current_thread (); + /* Finally, add ourselves to the list. */ - if (flags & GSYNC_TIMED) -thread_will_wait_with_timeout (w.waiter, msec); - else -thread_will_wait (w.waiter); + list_add (runp->prev, runp, &w->link); + w->reply_port = reply_port; + w->reply_port_type = reply_port_type; kmutex_unlock (&hbp->lock); - thread_block (thread_no_continuation); - - /* We're back. */ - kern_return_t ret = KERN_SUCCESS; - if (current_thread()->wait_result != THREAD_AWAKENED) -{ - /* We were interrupted or timed out. */ -
[PATCH 0/1] Rewrite /hurd/symlink on top of trivfs
Hello, I came by this issue [0], which basically states that /hurd/symlink was behaving in a weird way. Namely, when opened with O_NOLINK, it would appear to be an empty file, not a symlink. The report talks about broken symlinks, but the issue is more fundamental: $ settrans -ac /tmp/demo /hurd/symlink /etc/hostname $ cat /tmp/demo sergey-hurd-box $ file /tmp/demo /tmp/demo: empty $ ls -l /tmp/demo -rw-r--r-- 1 sergey sergey 0 Jun 21 15:52 /tmp/demo $ fsysopts /tmp/demo ext2fs --writable --relatime --no-inherit-dir-group --store-type=typed device:hd0s1 [0]: https://www.gnu.org/software/hurd/open_issues/symlink_translator.html I took a look at trans/symlink.c, and found out that: * It was not using libtrivs, as I would expect it to. * It was treating O_NOLINK like O_NOTRANS; namely by returning the underlying node. This is the cause of the issue: O_NOLINK has very different semantics. It should open the link itself, and then suppport io_stat () and io_read () on it, returning S_IFLNK | 0777 for the mode, and the link target as contents. * It was leaking ports (should deallocate dotdot when not MOVE_SEND'ing it), and skipping some error checks. So I thought I'd try rewriting /hurd/symlink to use trivfs. This is possible because trivfs has this convenient trivfs_getroot_hook mechanism, which lets me either handle fsys_getroot () any way I want to, or continue with the default implementation found in trivfs (for O_NOLINK). End result: it seems to work great. Absolute, relative; file, ls, ls -l, stat, readlink, cat; everything just does what I'd expect it to. Even fsysopts now shows the invocation properly. Some demos: $ settrans -ac /tmp/demo trans/symlink /etc/hostname $ cat /etc/hostname sergey-hurd-box $ cat /tmp/demo sergey-hurd-box $ file /tmp/demo /tmp/demo: symbolic link to /etc/hostname $ ls -l /tmp/demo lrwxrwxrwx 1 sergey sergey 13 Jun 21 16:11 /tmp/demo -> /etc/hostname $ fsysopts /tmp/demo trans/symlink /etc/hostname $ settrans -ac /tmp/demo trans/symlink no-such-file $ cat /tmp/demo cat: /tmp/demo: No such file or directory $ file /tmp/demo /tmp/demo: broken symbolic link to no-such-file $ ls -l /tmp/demo lrwxrwxrwx 1 sergey sergey 12 Jun 21 16:11 /tmp/demo -> no-such-file Sergey Bugaev (1): trans/symlink.c: Rewrite on top of trivfs trans/Makefile | 10 +- trans/symlink.c | 243 +++- 2 files changed, 137 insertions(+), 116 deletions(-) -- 2.31.1
[PATCH 1/1] trans/symlink.c: Rewrite on top of trivfs
--- trans/Makefile | 10 +- trans/symlink.c | 243 +++- 2 files changed, 137 insertions(+), 116 deletions(-) diff --git a/trans/Makefile b/trans/Makefile index 6cf50e7a..c04220a2 100644 --- a/trans/Makefile +++ b/trans/Makefile @@ -25,7 +25,7 @@ targets = symlink firmlink ifsock magic null fifo new-fifo fwd crash \ SRCS = ifsock.c symlink.c magic.c null.c fifo.c new-fifo.c fwd.c \ crash.c firmlink.c password.c hello.c hello-mt.c streamio.c \ fakeroot.c proxy-defpager.c remap.c mtab.c -OBJS = $(SRCS:.c=.o) fsysServer.o ifsockServer.o passwordServer.o \ +OBJS = $(SRCS:.c=.o) ifsockServer.o passwordServer.o \ crashServer.o crash_replyUser.o msgServer.o \ default_pagerServer.o default_pagerUser.o \ device_replyServer.o elfcore.o startup_notifyServer.o @@ -48,11 +48,6 @@ device_reply-MIGSFLAGS=\ "-DMACH_PAYLOAD_TO_PORT=ports_payload_get_name" \ "-DDEVICE_IMPORTS=import \"$(srcdir)/../libports/ports.h\";" -# fsysServer is only used by the symlink translator which does not use -# libports. Disable the default payload to port conversion. -fsys-MIGSFLAGS = "-DHURD_DEFAULT_PAYLOAD_TO_PORT=1" -fsysServer-CFLAGS = "-DMIG_EOPNOTSUPP=EOPNOTSUPP" - # If we have a configured tree, include the configuration so that we # can conditionally build translators. ifneq (,$(wildcard ../config.make)) @@ -76,11 +71,10 @@ mtab: fsysUser.o password: passwordServer.o proxy-defpager: default_pagerServer.o default_pagerUser.o streamio: device_replyServer.o -symlink: fsysServer.o fakeroot: ../libnetfs/libnetfs.a fifo new-fifo: ../libpipe/libpipe.a -crash fifo firmlink hello hello-mt ifsock magic mtab new-fifo null password proxy-defpager remap streamio: ../libtrivfs/libtrivfs.a +crash fifo firmlink hello hello-mt ifsock magic mtab new-fifo null password proxy-defpager remap streamio symlink: ../libtrivfs/libtrivfs.a $(targets): ../libfshelp/libfshelp.a \ ../libihash/libihash.a \ ../libiohelp/libiohelp.a \ diff --git a/trans/symlink.c b/trans/symlink.c index 80d60f64..63c94048 100644 --- a/trans/symlink.c +++ b/trans/symlink.c @@ -1,5 +1,5 @@ /* Translator for S_IFLNK nodes - Copyright (C) 1994, 2000, 2001, 2002 Free Software Foundation + Copyright (C) 1994, 2000, 2001, 2002, 2021 Free Software Foundation This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as @@ -15,30 +15,137 @@ along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ -#include -#include -#include -#include #include -#include -#include +#include #include #include +#include +#include +#include +#include +#include +#include #include -#include "fsys_S.h" -mach_port_t realnode; - -/* We return this for O_NOLINK lookups */ -mach_port_t realnodenoauth; - -/* We return this for non O_NOLINK lookups */ -char *linktarget; - -extern int fsys_server (mach_msg_header_t *, mach_msg_header_t *); +char *link_target; +size_t link_target_len; const char *argp_program_version = STANDARD_HURD_VERSION (symlink); +int trivfs_fsid = 0; +int trivfs_fstype = FSTYPE_MISC; + +int trivfs_support_read = 1; +int trivfs_support_write = 0; +int trivfs_support_exec = 0; + +int trivfs_allow_open = O_READ; + +void +trivfs_modify_stat (struct trivfs_protid *cred, io_statbuf_t *st) +{ + st->st_size = link_target_len; + st->st_blocks = 0; + st->st_mode = S_IFLNK | 0777; +} + +kern_return_t +trivfs_S_io_read (struct trivfs_protid *cred, + mach_port_t reply, + mach_msg_type_name_t reply_type, + data_t *data, + mach_msg_type_number_t *data_len, + loff_t offset, + vm_size_t amount) +{ + if (!cred) +return EOPNOTSUPP; + else if (!(cred->po->openmodes & O_READ)) +return EBADF; + + if (amount > link_target_len) +amount = link_target_len; + + if (amount > *data_len) +{ + /* Try to make more space. */ + void *buffer; + + buffer = mmap (0, amount, PROT_READ | PROT_WRITE, MAP_ANON, 0, 0); + if (buffer == MAP_FAILED) +amount = *data_len; + else +{ + *data = buffer; + *data_len = amount; +} +} + else +*data_len = amount; + + memcpy (*data, link_target, amount); + return 0; +} + +error_t +get_root (struct trivfs_control *fsys, + mach_port_t reply_port, + mach_msg_type_name_t reply_port_type, + mach_port_t dotdot, + uid_t *uids, size_t nuids, + uid_t *gids, size_t ngids, + int flags, + retry_type *do_retry, + char *retry_name, + mach_port_t *retry_node, + mach_msg_type_name_t *retry_node_type) +{ + /* If O_NOLINK is specified, proceed + with opening the link itself. */ + if (flags & O_NOLINK) +return EAGAIN; +
Re: [PATCH 0/1] Rewrite /hurd/symlink on top of trivfs OFF TOPIC PRAISE
On Wed, Jul 28, 2021 at 10:39 PM wrote: > Thanks for working on this! I don't believe it: I finally got my own "OFF TOPIC PRAISE" letter! Thank you for your kind words :) > I'm wanting to learn more about trivfs for my caesar > cipher eventually, but looking that how to write trivial translators makes > me a little curious...I'll learn more about it someday. :) > > It would be cool to have a gemini translator. It would only take a few 100 > lines of code. Oh yes, a Gemini translator actually sounds like a perfect exercise! It's not as simple as /hurd/hello or /dev/yes, but still simple enough, and would be actually useful for the end users. Why don't you start working on it? If you run into any trouble (or are unsure where to start from), feel free to ask me! — I'd be glad to help you, in so much as I understand things myself (which may not be that much, actually). Perhaps others on this list would be interested in helping you too. One particular thing that may turn out to be complicated: TLS, which is mandatory in Gemini. I admit I don't have any experience interacting with TLS from C myself, and I hear that things are indeed complicated there [0]. So what I would do is ignore the TLS requirement for now and implement just cleartext Gemini, even though it's against the spec; then once things are working figure out TLS. [0]: https://blogs.gnome.org/mcatanzaro/2018/11/11/the-gnome-and-webkitgtk-networking-stack/ Sergey
Re: [PATCH 0/1] Rewrite /hurd/symlink on top of trivfs OFF TOPIC PRAISE
On Wed, Jul 28, 2021 at 11:25 PM Sergey Bugaev wrote: > On Wed, Jul 28, 2021 at 10:39 PM wrote: > > It would be cool to have a gemini translator. It would only take a few 100 > > lines of code. > > Oh yes, a Gemini translator actually sounds like a perfect exercise! > It's not as simple as /hurd/hello or /dev/yes, but still simple > enough, and would be actually useful for the end users. Thinking about it further, while it's true that you should be able to write a very simple trivfs-based Gemini translator that would basically do a request for each open and let the client read the response body, for a more serious, solid Gemini translator you'd want much more than that. In particular, you'd want the ability to talk about directories as filesystem nodes even if there's no document at that URL. So for instance, $ settrans gemini: /hypothetical/gemini $ cat gemini://example.com/foo/bar/hello Hello # You'd also want this to work: $ cd gemini://example.com/foo $ cat bar/hello Hello # ...even if there's nothing (error code 51) at gemini://example.com/foo Ideally, cd'ing into a directory would not even cause a network request, only trying to read a file would. Moreover, you'd want ls(1) to work, at least somewhat. We could learn a trick from httpfs here: when asked to list the contents of a directory, fetch the page and collect links that point to pages inside the directory. For instance, gemini://gemini.circumlunar.space/ contains a bunch of links to its subpages, which we would parse out and display as directory entries. Thankfully, Hurd fully supports treating a single node as both a directory and a readable file. I think this is all implementable, but would require using netfs, probably overriding netfs_S_dir_lookup (), and quite a bit more than a few hundred lines of code. :| Sergey
Re: Hurd Security vulnerabilities, please upgrade!
On Tue, Aug 10, 2021 at 5:04 AM Samuel Thibault wrote: > In the past months, Sergey Bugaev has been working on fixing some > Hurd security vulnerabilities. Well I certainly wasn't doing it alone :) Samuel and me have been working together over the past few months to design and implement fixes for the several severe vulnerabilities in the Hurd. (How many of those vulnerabilities we have fixed is hard to quantify, but it's more than just the three I reported initially.) I worked on: - Actually finding the vulnerabilities and developing exploits for them - Coming up with potential ways we could work towards fixing them - Actually writing most of the code - Testing it in a subhurd Samuel helped with reviewing my changes and making design decisions; towards the end he got some time and joined in with testing, debugging, and writing code. None of the vulnerabilities were as simple as an off-by-one error or a missing check; they all had to do with certain mechanisms being structured in a way that makes them subtly insecure, which is why fixing them required a lot of design work. We ended up switching our approach several times; I believe our final version is much better than what we were trying to do initially. In the end, we managed to make the changes way less invasive than it seemed they had to be, and they complicate things much less than it initially appeared was necessary. Still, the changes touch most of the components of the Hurd. We were aiming to make it in time for the upcoming Debian release, to make sure it already contains the fixed versions. There were some troubles and a change of approach and new bugs discovered (and fixed) in the last few days, but apparently we did make it in time! I urge everybody to upgrade (and reboot!) their systems as soon as possible. I have already updated mine, and can confirm that all my exploits fail now. Sergey
Re: Problems with Xorg
For what it's worth, a few years ago I was able to successfully run Xfce, LXDE, GNUstep, parts of GNOME, and even Plan 9 from User Space. Since then things have regressed. At one point Xorg was failing to find any displays at all, lately it's been starting just fine and displaying simple windows properly, but it's unable to take any input from my mouse and keyboard, so it's still not usable. Sergey
Re: Hurd Security vulnerabilities, please upgrade!
Hello! On Tue, Aug 10, 2021 at 2:53 PM Richard Braun wrote: > I have issues mounting /home (or rather, keeping it mounted) on > darnassus.sceen.net since the reboot. Samuel already has root access > so he can look at it when he has time. That's... not good. I hope we'll figure it out. By the way, is there still a way to get an account on darnassus? I hope I qualify as "people who have already shown some level of involvement in the project". (But I won't be offended or anything if the answer is no.) (A word of warning though: I may have another vulnerability or two up my sleeve, so giving me an account isn't that different from giving me root access.) > I haven't looked at the work, but if Samuel merged it, I trust it's > good. Thanks :). > > -- > Richard Braun It would be great to get code and design review from you too, if you happen to have time for that. Sergey
Re: Problems with Xorg
On Tue, Aug 10, 2021 at 4:42 PM Almudena Garcia wrote: > > > Since then things have regressed. At one point Xorg was failing to > > find any displays at all, lately it's been starting just fine and > > displaying simple windows properly, but it's unable to take any input > > from my mouse and keyboard, so it's still not usable. > > This is not the point here. I am able to run MATE properly in Debian > GNU/Hurd, and it works with the mouse and keyboard without problems. Even I > can play a video in VLC. > I repeated the test in real hardware, and it works without lags. > > But currently there are a "little" bug which avoid to access to filesystem > from files explorer, both in Xfce than MATE. > It seems a coding error, because the error refers to a type problem (check > attached image). > > This is the only bug (that i know) which currently avoid full functionality > in MATE, and this is the reason because I advice to fix it. > > Could you check it? > Thanks I cannot check it because I cannot reproduce it — as I said, I don't get to the stage where I could attempt to open a file explorer. It also sounds quite likely to only be reproducible on your machine, not anyone else's. You should debug or rpctrace it to see what it actually tries to do and how exactly it fails. Sergey
Re: Problems with Xorg
On Tue, Aug 10, 2021 at 11:41 PM Samuel Thibault wrote: > Are you at the console while running this? Normally vcs is a symlink to > /dev/vcs/1 or such. Good point, that works from the console. Is it supposed to be dependent on who's asking, like /dev/console in Unix? (If so, why doesn't it like ptys?) Or does it show the currently active virtual console? (If so, what does it matter whether I'm asking from a console?) Where's that even implemented? — grep vcs -r console/ returns nothing > (WW) Hotplugging requested but the server was compiled without a config > backend. No input devices were configured, the server will start without any > input devices. > > I don't know where that hotplug request comes from, but that cannot work. Thanks, that looks relevant indeed! Amusingly enough, the first result in Google points to someone else having the same issue with X on the Hurd: https://www.mail-archive.com/bug-hurd@gnu.org/msg30877.html Looking at X source code (hw/xfree86/common/xf86Config.c), this seems to refer to an option named AutoAddDevices. And sure enough, I have: sergey@sergey-hurd-box:~$ grep AutoAddDevices -r /etc/X11/ /etc/X11/xorg.conf.d/auto-all.conf:Option "AutoAddDevices" "true" sergey@sergey-hurd-box:~$ dpkg -S /etc/X11/xorg.conf.d/auto-all.conf dpkg-query: no path found matching pattern /etc/X11/xorg.conf.d/auto-all.conf What the hell? How'd that file get there? I certainly didn't write it. Sergey
Re: Problems with Xorg
On Wed, Aug 11, 2021 at 12:04 AM Samuel Thibault wrote: > > What the hell? How'd that file get there? I certainly didn't write it. > > No idea, and > > https://codesearch.debian.net/search?q=auto-all.conf&hl=en > > doesn't know about it either. Well in any case, removing that file made input work once again! Thanks! I can now run rio, 9term, and xterm. LXDE is still having troubles with D-Bus or something. But we can figure that out separately. Sergey
Re: Debian Hurd release this week-end
On Fri, Aug 13, 2021, 01:25 Samuel Thibault wrote: > So as usual, along the Debian release, we will have a Debian Hurd > release. I have drafted the following list of news since the 2019 > release: <...> > Is there some other big line that I am forgetting? > Perhaps "multiple security improvements" deserve a mention? Sergey >
Re: Debian Hurd release this week-end
On Fri, Aug 13, 2021, 10:05 Sergey Bugaev wrote: > Perhaps "multiple security improvements" deserve a mention? > Oh, or is it not going to make it into the release because of the unfinished copyright assignment? :( Sergey >
Re: Regarding copyright assignment to FSF
On Sat, Aug 14, 2021 at 8:43 AM Michael Banck wrote: > The fact that this process potentially or apparently took (or rather, > has been taking) months for Sergey (I don't know when it was initiated), > is a pretty good indicator that it is more than a nuisance. Well, this is partly my own fault: I've been postponing trying to scan the signed papers (I can hack on kernel internals alright, but ask me to deal with a scanner and I'm lost). But that being said, the FSF has also been consistently slow to respond, so it would take months even if I had done everything promptly. Sergey
Re: PCI arbiter memory mapping
On Mon, Aug 16, 2021 at 9:17 PM Samuel Thibault wrote: > > > I'd rather see a > > > hurd-specific libpciaccess function to get the pager. > > > > That's better, but we'd have to add that function in both hurd_pci.c and > > x86_pci.c. I don't like the idea of adding Hurd specific code to x86_module > > but there's already a lot of it and we could avoid the existence of struct > > pci_user_data, which I like even less. > > Actually I'm thinking that this is just another case of mremap(). The > underlying question is getting the memory object of a given memory > range, like vm_region does. We need to be careful since we don't want > any process to be able to get the memory object of any other process, > but it does make sense, and would be useful for mremap. IMO the proper way to get mremap () is vm_remap () — basically like vm_map (), but instead of a memory object to map it takes another (task, address) pair. OSF (or just Apple's?) Mach has it [0], although I'd add 'anywhere' and 'unmap_source' flags to that interface. [0] https://developer.apple.com/documentation/kernel/1585336-vm_remap This makes me think of something cooler though: what if there was an API to create a proxy memory object from a (task, address, size) triple? Then you'd be able to map the new proxy, and there wouldn't be a need for a separate vm_remap () — all without getting direct access to the underlying memory object(s). So, something like this: routine vm_create_proxy ( target_task : vm_task_t; address : vm_address_t; size : vm_size_t; out proxy : memory_object_t); Then mremap () could be implemented like this: vm_create_proxy (mach_task_self (), old_address, old_size, &proxy); vm_deallocate (mach_task_self (), old_address, old_size); vm_map (mach_task_self (), new_address, new_size, ..., proxy, ...); mach_port_deallocate (mach_task_self (), proxy); (Hmm, but what about increasing the map size? This cannot be done securely for non-anonimous mappings, can it?) This could also be used for other fun things, like implementing /proc/pid/mem in a coherent way. (It can be implemented in a sad way without this by making a pager that does vm_read () / vm_write () when asked to retrieve / store a page.) What do you think? Sergey
Re: PCI arbiter memory mapping
On Mon, Aug 16, 2021 at 10:28 PM Samuel Thibault wrote: > Sergey Bugaev, le lun. 16 août 2021 21:59:47 +0300, a ecrit: > > IMO the proper way to get mremap () is vm_remap () > > But that doesn't answer Joan's need for getting a memory object, to be > able to put a proxy memory object on top of it to make it read-only. Ah, I wasn't really following the discussion. I'm talking specifically about mremap. > But then it'll create a pile of proxies if the application calls mremap > several times, which I guess some applications will happily try to do. No, for two reasons: 1. The kernel should just transparently short-circuit nested proxies at proxy creation time — make the new proxy reference the original memory object and not the other proxy; this way the other proxy can be deallocated when it's no longer referenced, even if the new proxy continues to exist. I believe this is what Joan has suggested above: > I think it'd be a better solution to move the call to > memory_object_proxy_valid() and the start value overwrite to > memory_object_create_proxy() 2. As always, when you map a proxy it's the original memory object that actually gets mapped (and referenced, and kept alive); the proxy will then get deallocated when the task does mach_port_deallocate (mach_task_self (), proxy). In other words, the proxy is short-lived and only serves to make this whole operation representable. (This is transparent to userspace though: from its perspective it's the proxy that stays mapped.) Sergey
Re: PCI arbiter memory mapping
On Mon, Aug 16, 2021 at 11:43 PM Samuel Thibault wrote: > Here, possibly proxies can indeed work properly. But please do look at > what Joan's situation is to make sure it does. I don't think I understand enough about the situation. It would help if you or Joan were to kindly give me some more context :) As I understand it, there's the PCI arbiter, which is a translator that arbitrates access to PCI, which is a hardware bus that various devices can be connected to. The hardware devices connected via PCI are available (to the PCI arbiter) as Mach devices, and in particular it's possible to use device_map () and then vm_map () to access the device memory. Then there's libpciaccess whose Hurd backend uses the files exported by the PCI arbiter to get access to the PCI, including mapping device memory. Naturally its user can request read-only or read-write mapping, but the PCI arbiter may decide to only return a read-only memory object (a proxy to the device pager), in which case libpciaccess should deallocate the port and return EPREM, or the PCI arbiter may return the real device pager. Is that right? What am I missing? What's the issue you're trying to solve? Sergey
Re: PCI arbiter memory mapping
On Tue, Aug 17, 2021 at 12:10 AM Samuel Thibault wrote: > Yes, and we want to implement nesting: providing a sub-hurd with a > partial view of the PCI space. Interesting. But I still don't think I understand what the problem you are running into. boot(1) already emulates Mach devices; it would itself use libpciaccess (or the PCI arbiter directly, since it doesn't need portability?) to get access to the host's devices that are to be passed through to the subhurd. Then it'd return whatever memory objects it got from the host's PCI arbiter — or proxies to them — from emulated device_map (). The kernel would take care of short-circuiting the nested proxies, although this is just a tiny optimization in this case and is not required for correctness or anything like that. Is that how you're implementing it? (Hmm, how does I/O port access work in a subhurd? I'm guessing the subhurd has to be privileged? But then it probably has to anyway, to access the PCI devices?) Sergey
Re: Problems trying to install Debian GNU/Hurd 2021 in SATA AHCI mode, over Thinkpad T410
On Tue, Aug 17, 2021 at 7:44 PM Almudena Garcia wrote: > But, in fstab, the drives are registered as /dev/hd0s1 (root partition in > harddisk), and /dev/hd2 (cdron drive). > I can try to change them to sdX naming, but is a bit risky fstab only comes into play later in the boot process (certainly not until the root partition is mounted). You'd need to instruct GRUB to pass different arguments to the boot filesystem. Sergey
Re: PCI arbiter memory mapping
On Tue, Aug 17, 2021 at 12:38 AM Samuel Thibault wrote: > The root pci-arbiter uses libpciaccess' x86 backend to access PCI On Tue, Aug 17, 2021 at 9:47 PM Joan Lledó wrote: > Yes, and the arbiter can play two roles: root arbiter, which uses x86 > module in libpciacces; and nested arbiter, which uses the hurd module in > libpciaccess. > > > The hardware devices connected via PCI are available (to the PCI arbiter) > > as Mach devices > > Actually, the devices are available to the arbiter as libpciaccess devices Thank you both for explaining, this is the part that I was missing: that the arbiter itself uses libpciaccess, and that it cannot map the devices directly, it has to map /dev/mem. To me it sounds like libpciaccess should have a Hurd-specific API addition that would let the user get the memory object backing the mapping created by device_map_region (). I.e., device_map_region () is a cross-platform API that maps the device memory into your address space (right?), but on the Hurd there'd also be a way to actually get the memory object it would map (and map it yourself if you so choose, or do something else). It would be the job of that libpciaccess API to make this object have the right offset and everything, so that the caller wouldn't have to worry about that. If I understand this right, its Hurd backend already gets the memory object with the right offset and size, and would return that directly; while the x86 backend would either have to use the unimplemented-as-of-now offset parameter in device_map (), or create and return an appropriate proxy from that API. In memory_object_create_proxy (), the kernel would take care of short-circuiting nested proxy creation to make that a non-issue. This will allow netfs_get_filemap (VM_PROT_READ) to create another proxy just for enforcing read-only access without worrying that the object might already be a proxy. Does that sound remotely sensible? :) Please keep in mind that while (I think) I understand Mach VM, I have very little idea about PCI. I'm (obviously) not in a position to decide what's best for libpciaccess and the PCI arbiter. Sergey
Re: PCI arbiter memory mapping
On Tue, Aug 17, 2021 at 9:14 PM Joan Lledó wrote: > > The underlying question is getting the memory object of a given memory > > range, like vm_region does. > > Yes, I see that could be useful. Is vm_region not workig for proxies? > > > We need to be careful since we don't want any process to be able to > > get the memory object of any other process > > Is that not being checked yet? can I call vm_region to get another > task's memory object now? It doesn't matter whether it's another task or not, or whether it's a proxy or not (in fact it will never be a proxy, but that doesn't matter): you can no longer get the underlying memory object with vm_region () — for, uh, obvious security reasons. It was a gnumach extension anyway — vanilla Mach doesn't support that. (Well in fact what actually changed is that gnumach at some point allowed the use of "memory object name ports" in vm_map (), and now once again doesn't. It never allowed you to get the actual memory object.) Sergey
Re: PCI arbiter memory mapping
On Wed, Aug 18, 2021 at 8:34 PM Joan Lledó wrote: > El 18/8/21 a les 0:13, Sergey Bugaev ha escrit: > > you can no longer get the underlying memory object with vm_region () > > How so? reading the docs I understood it does: > > https://www.gnu.org/software/hurd/gnumach-doc/Memory-Attributes.html > > "The port object_name identifies the memory object associated with this > region" Yes; the docs could have been clearer about this — and I was myself confused about this not long ago. What it gives you is a _memory object name port_ which "identifies" the memory object, in the sense that you can get name ports for two regions (even from different tasks) and they'll compare equal if it's the same memory object mapped into both. And that's about the only thing you can do with the name port. It's *not* the actual port to the underlying memory object. > > (Well in fact what actually changed is that gnumach at some point > > allowed the use of "memory object name ports" in vm_map (), and now > > once again doesn't. > > vm_map does receive the memory object as parameter: > > https://www.gnu.org/software/hurd/gnumach-doc/Mapping-Memory-Objects.html > > "memory_object is the port that represents the memory object: used by > user tasks in vm_map" Previously, as a GNU Mach extension, you could also pass the memory object *name* port (one you get from vm_region ()), instead of the memory object port. But not anymore. > > It never allowed you to get the actual memory > > object.) > > vm_map receives a memory object, doesn't return it Yes. What I'm saying is vm_region () gives you the name port, which previously could be used in vm_map () much like the memory object port (this was allowed to enable mremap () to be implemented). vm_region () never gave out the actual memory object port, but the name port could be used in vm_map () in a similar way to the real memory object port. Sergey
Re: PCI arbiter memory mapping
On Wed, Aug 18, 2021 at 8:43 PM Joan Lledó wrote: > El 18/8/21 a les 0:02, Sergey Bugaev ha escrit: > > To me it sounds like libpciaccess should have a Hurd-specific API > > addition that would let the user get the memory object > > That's a solution and can be done. But I'd like to know more about > vm_region first. It seems it can return the object, and I don't see why > is a security problem to allow a task to retrieve objects belonging to > itself. Basically because it allowed the task to overcome any limitations imposed onto it by max_protection (and offset/size). If you have page 2 of a memory object mapped into your address space read-only, that shouldn't let you write to page 5. Sergey
[RFC PATCH htl v2 0/4] Rewrite THREAD_GSCOPE
Hello, this is a revised version of the patchset I have sent a few months ago [0]. Sending to both bug-hurd and libc-alpha this time. [0]: https://lists.gnu.org/archive/html/bug-hurd/2021-05/msg00063.html The goal is to get rid of the excessive (and redundant) gsync_wake () calls. The means are rewriting how THREAD_GSCOPE_* is implemented in HTL to work more like its NPTL counterpart. To get this working, I first have to expose HTL's internal thread table to ld.so, a lot like NPTL already does. Before this patchset: $ rpctrace uname |& grep -c gsync_wake 57 With this patchset (first 3 patches): $ ./testrun.sh --tool=rpctrace /bin/uname |& grep -c gsync_wake 0 The unsolved question with this patchset is the fate of THREAD_GSCOPE_IN_TCB. Apparently, when it was introduced in [1], it was indeed only used to distinguish between cases where the GSCOPE flag is in TCB or not. But as of now, it is used mostly as a generic way to conditionally compile either HTL- or NPTL-specific code, mostly for TLS initialization. Now that all ports have the GSCOPE flag in TCB, this definition no longer does what it says. [1]: https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a5df0318ef30a4dcff3fa2cb82265f641813d9ea I can think of two ways this can be handled: * Code can be rewritten and reorganized to use port-specific files instead of conditional compilation. From looking at how glibc source is organized, this appears to be the generally preferred way. I'm sending one more patch that makes a (perhaps misguided) attempt to do this; but I'm not sure what would be good way to handle the definitions in -- please take a look. * A more conservative approach would be to leave the code using THREAD_GSCOPE_IN_TCB as is for now, but change the variable's name to better reflect what it's used for. I haven't really been able to come up with a good name, though. Alternatively, it could just be left as is for now. Sergey Bugaev (4): htl: Move thread table to ld.so htl: Reimplement GSCOPE testrun.sh: Add support for --tool=rpctrace XXX: Attempt to get rid of most THREAD_GSCOPE_IN_TCB usages Makefile | 9 +++- elf/dl-reloc.c | 21 - elf/dl-support.c | 15 -- elf/dl-tls.c | 39 --- htl/Versions | 2 - htl/pt-alloc.c | 50 htl/pt-create.c| 11 ++--- htl/pt-internal.h | 23 +++-- sysdeps/generic/ldsodefs.h | 10 ++-- sysdeps/htl/dl-support.c | 28 +++ sysdeps/htl/dl-thread_gscope_wait.c| 55 ++ sysdeps/htl/dl-tls.c | 39 +++ {elf => sysdeps/htl}/dl-tls_init_tp.c | 4 +- sysdeps/htl/pt-key-delete.c| 8 ++-- sysdeps/htl/pthreadP.h | 2 - sysdeps/htl/raise.c| 8 +++- sysdeps/htl/thrd_current.c | 7 ++- sysdeps/mach/hurd/htl/pt-sigstate-init.c | 2 +- sysdeps/mach/hurd/htl/pt-sysdep.c | 2 +- sysdeps/mach/hurd/htl/pt-sysdep.h | 2 +- sysdeps/mach/hurd/i386/tls.h | 19 sysdeps/mach/hurd/tls.h| 20 +--- sysdeps/nptl/dl-support.c | 26 ++ sysdeps/nptl/dl-tls.c | 53 + sysdeps/unix/sysv/linux/alpha/dl-support.c | 2 +- 25 files changed, 287 insertions(+), 170 deletions(-) create mode 100644 sysdeps/htl/dl-support.c create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c create mode 100644 sysdeps/htl/dl-tls.c rename {elf => sysdeps/htl}/dl-tls_init_tp.c (93%) create mode 100644 sysdeps/nptl/dl-support.c create mode 100644 sysdeps/nptl/dl-tls.c -- 2.31.1
[RFC PATCH htl v2 4/4] XXX: Attempt to get rid of most THREAD_GSCOPE_IN_TCB usages
Try to do this by adding HTL- and NPTL-specific versions of dl-support.c, dl-tls.c, and dl-tls_init_tp.c, and moving HTL and NPTL specifics there. The remaining user of THREAD_GSCOPE_IN_TCB is , which I'm not sure how to handle. I see that there are sysdeps versions of it, so it could be possible to create one for NPTL and one for HTL. However, I don't see an existing way for these port-sepcific ldsodefs versions to add their own members to struct rtld_global. One way to do this would be to introduce a macro like LDSO_GLOBAL_EXTRA_MEMBERS, but I'm not sure whether that's a good or even acceptable approach. Signed-off-by: Sergey Bugaev --- elf/dl-reloc.c | 21 - elf/dl-support.c | 14 -- elf/dl-tls.c | 39 sysdeps/htl/dl-support.c | 5 ++ sysdeps/htl/dl-tls.c | 39 {elf => sysdeps/htl}/dl-tls_init_tp.c | 4 +- sysdeps/nptl/dl-support.c | 26 +++ sysdeps/nptl/dl-tls.c | 53 ++ sysdeps/unix/sysv/linux/alpha/dl-support.c | 2 +- 9 files changed, 125 insertions(+), 78 deletions(-) create mode 100644 sysdeps/htl/dl-tls.c rename {elf => sysdeps/htl}/dl-tls_init_tp.c (93%) create mode 100644 sysdeps/nptl/dl-support.c create mode 100644 sysdeps/nptl/dl-tls.c diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c index e13a672ade..0d97af752a 100644 --- a/elf/dl-reloc.c +++ b/elf/dl-reloc.c @@ -141,27 +141,6 @@ cannot allocate memory in static TLS block")); } } -#if !THREAD_GSCOPE_IN_TCB -/* Initialize static TLS area and DTV for current (only) thread. - libpthread implementations should provide their own hook - to handle all threads. */ -void -_dl_nothread_init_static_tls (struct link_map *map) -{ -#if TLS_TCB_AT_TP - void *dest = (char *) THREAD_SELF - map->l_tls_offset; -#elif TLS_DTV_AT_TP - void *dest = (char *) THREAD_SELF + map->l_tls_offset + TLS_PRE_TCB_SIZE; -#else -# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined" -#endif - - /* Initialize the memory. */ - memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size), - '\0', map->l_tls_blocksize - map->l_tls_initimage_size); -} -#endif /* !THREAD_GSCOPE_IN_TCB */ - void _dl_relocate_object (struct link_map *l, struct r_scope_elem *scope[], int reloc_mode, int consider_profiling) diff --git a/elf/dl-support.c b/elf/dl-support.c index 68abf606ca..28d61bda51 100644 --- a/elf/dl-support.c +++ b/elf/dl-support.c @@ -183,20 +183,6 @@ uint64_t _dl_hwcap_mask; * executable but this isn't true for all platforms. */ ElfW(Word) _dl_stack_flags = DEFAULT_STACK_PERMS; -#if THREAD_GSCOPE_IN_TCB -list_t _dl_stack_used; -list_t _dl_stack_user; -list_t _dl_stack_cache; -size_t _dl_stack_cache_actsize; -uintptr_t _dl_in_flight_stack; -int _dl_stack_cache_lock; -#else -/* If loading a shared object requires that we make the stack executable - when it was not, we do it by calling this function. - It returns an errno code or zero on success. */ -int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable; -void (*_dl_init_static_tls) (struct link_map *) = &_dl_nothread_init_static_tls; -#endif struct dl_scope_free_list *_dl_scope_free_list; #ifdef NEED_DL_SYSINFO diff --git a/elf/dl-tls.c b/elf/dl-tls.c index 423e380f7c..64797d2ed6 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -29,10 +29,6 @@ #include #include -#if THREAD_GSCOPE_IN_TCB -# include -#endif - #define TUNABLE_NAMESPACE rtld #include @@ -1057,38 +1053,3 @@ cannot create TLS data structures")); GL(dl_tls_generation) + 1); } } - -#if THREAD_GSCOPE_IN_TCB -static inline void __attribute__((always_inline)) -init_one_static_tls (struct pthread *curp, struct link_map *map) -{ -# if TLS_TCB_AT_TP - void *dest = (char *) curp - map->l_tls_offset; -# elif TLS_DTV_AT_TP - void *dest = (char *) curp + map->l_tls_offset + TLS_PRE_TCB_SIZE; -# else -# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined" -# endif - - /* Initialize the memory. */ - memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size), - '\0', map->l_tls_blocksize - map->l_tls_initimage_size); -} - -void -_dl_init_static_tls (struct link_map *map) -{ - lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE); - - /* Iterate over the list with system-allocated threads first. */ - list_t *runp; - list_for_each (runp, &GL (dl_stack_used)) -init_one_static_tls (list_entry (runp, struct pthread, list), map); - - /* Now the list with threads using user-allocated stacks. */ - list_for_each (runp, &GL (dl_stack_user)) -init_one_static_tls (list_entry (runp, struct pthread, list), map); - - lll_unlo
[RFC PATCH htl v2 2/4] htl: Reimplement GSCOPE
This is a new implementation of GSCOPE which largely mirrors its NPTL counterpart. Same as in NPTL, instead of a global flag shared between threads, there is now a per-thread GSCOPE flag stored in each thread's TCB. This makes entering and exiting a GSCOPE faster at the expense of making THREAD_GSCOPE_WAIT () slower. The largest win is the elimination of many redundant gsync_wake () RPC calls; previously, even simplest programs would make dozens of fully redundant gsync_wake () calls. Despite all ports now putting GSCOPE flag into TCP, this patch keeps the THREAD_GSCOPE_IN_TCB preprocessor definition, with value 0 for HTL and 1 for NPTL. It turns out that although originally this definition was indeed used to distinguish between the cases where the GSCOPE flag was stored in TCB or not, it has since become used as a general way to distinguish between HTL and NPTL. It may be worthwhile to get rid of this flag, or at least rename it to better reflect what it's used for. This patch does neither, though. Signed-off-by: Sergey Bugaev --- elf/dl-support.c| 1 - sysdeps/generic/ldsodefs.h | 4 --- sysdeps/htl/dl-thread_gscope_wait.c | 55 + sysdeps/mach/hurd/i386/tls.h| 19 ++ sysdeps/mach/hurd/tls.h | 20 +-- 5 files changed, 75 insertions(+), 24 deletions(-) create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c diff --git a/elf/dl-support.c b/elf/dl-support.c index 0155718175..68abf606ca 100644 --- a/elf/dl-support.c +++ b/elf/dl-support.c @@ -195,7 +195,6 @@ int _dl_stack_cache_lock; when it was not, we do it by calling this function. It returns an errno code or zero on success. */ int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable; -int _dl_thread_gscope_count; void (*_dl_init_static_tls) (struct link_map *) = &_dl_nothread_init_static_tls; #endif struct dl_scope_free_list *_dl_scope_free_list; diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h index 31c7d3945b..5a779ae1a3 100644 --- a/sysdeps/generic/ldsodefs.h +++ b/sysdeps/generic/ldsodefs.h @@ -487,8 +487,6 @@ struct rtld_global /* Mutex protecting the stack lists. */ EXTERN int _dl_stack_cache_lock; #else - EXTERN int _dl_thread_gscope_count; - /* The total number of thread IDs currently in use, or on the list of available thread IDs. */ EXTERN int _dl_pthread_num_threads; @@ -1379,10 +1377,8 @@ __rtld_mutex_init (void) } #endif /* !PTHREAD_IN_LIBC */ -#if THREAD_GSCOPE_IN_TCB void __thread_gscope_wait (void) attribute_hidden; # define THREAD_GSCOPE_WAIT() __thread_gscope_wait () -#endif __END_DECLS diff --git a/sysdeps/htl/dl-thread_gscope_wait.c b/sysdeps/htl/dl-thread_gscope_wait.c new file mode 100644 index 00..b277217b8e --- /dev/null +++ b/sysdeps/htl/dl-thread_gscope_wait.c @@ -0,0 +1,55 @@ +/* Out-of-line notification function for the GSCOPE locking mechanism. + Copyright (C) 2007-2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include +#include +#include + +void +__thread_gscope_wait (void) +{ + size_t i; + struct __pthread *t; + int *gscope_flagp; + + lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE); + + /* Iterate over the list of threads. */ + for (i = 0; i < GL (dl_pthread_num_threads); ++i) +{ + t = GL (dl_pthread_threads[i]); + if (t == NULL || t->tcb->gscope_flag == THREAD_GSCOPE_FLAG_UNUSED) +continue; + + gscope_flagp = &t->tcb->gscope_flag; + + /* We have to wait until this thread is done with the global + scope. First tell the thread that we are waiting and + possibly have to be woken. */ + if (atomic_compare_and_exchange_bool_acq (gscope_flagp, +THREAD_GSCOPE_FLAG_WAIT, +THREAD_GSCOPE_FLAG_USED)) +continue; + + do +lll_wait (gscope_flagp, THREAD_GSCOPE_FLAG_WAIT, LLL_PRIVATE); + while (*gscope_flagp == THREAD_GSCOPE_FLAG_WAIT); +} + + lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE); +} diff --git a/sysdeps/mach/hurd/i386/tls.h b/sysdeps/mach/hurd/i386/tls.h index 057b2613f
[RFC PATCH htl v2 1/4] htl: Move thread table to ld.so
The next commit is going to introduce a new implementation of THREAD_GSCOPE_WAIT which needs to access the list of threads. Since it must be usable from the dynamic laoder, we have to move the symbols for the list of threads into the loader. Signed-off-by: Sergey Bugaev --- htl/Versions | 2 - htl/pt-alloc.c | 50 ++-- htl/pt-create.c | 11 ++ htl/pt-internal.h| 23 --- sysdeps/generic/ldsodefs.h | 8 sysdeps/htl/dl-support.c | 23 +++ sysdeps/htl/pt-key-delete.c | 8 ++-- sysdeps/htl/pthreadP.h | 2 - sysdeps/htl/raise.c | 8 +++- sysdeps/htl/thrd_current.c | 7 +++- sysdeps/mach/hurd/htl/pt-sigstate-init.c | 2 +- sysdeps/mach/hurd/htl/pt-sysdep.c| 2 +- sysdeps/mach/hurd/htl/pt-sysdep.h| 2 +- 13 files changed, 81 insertions(+), 67 deletions(-) create mode 100644 sysdeps/htl/dl-support.c diff --git a/htl/Versions b/htl/Versions index 4aea321016..4e0ebac285 100644 --- a/htl/Versions +++ b/htl/Versions @@ -168,8 +168,6 @@ libpthread { GLIBC_PRIVATE { __pthread_initialize_minimal; -__pthread_threads; - __cthread_detach; __cthread_fork; __pthread_detach; diff --git a/htl/pt-alloc.c b/htl/pt-alloc.c index acc67f2711..f6708a60bf 100644 --- a/htl/pt-alloc.c +++ b/htl/pt-alloc.c @@ -28,19 +28,9 @@ of the threads functions "shall fail" if "No thread could be found corresponding to that specified by the given thread ID." */ -/* Thread ID lookup table. */ -struct __pthread **__pthread_threads; - /* The size of the thread ID lookup table. */ int __pthread_max_threads; -/* The total number of thread IDs currently in use, or on the list of - available thread IDs. */ -int __pthread_num_threads; - -/* A lock for the table, and the other variables above. */ -pthread_rwlock_t __pthread_threads_lock; - /* List of thread structures corresponding to free thread IDs. */ struct __pthread *__pthread_free_threads; pthread_mutex_t __pthread_free_threads_lock; @@ -132,25 +122,25 @@ __pthread_alloc (struct __pthread **pthread) } retry: - __pthread_rwlock_wrlock (&__pthread_threads_lock); + lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE); - if (__pthread_num_threads < __pthread_max_threads) + if (GL (dl_pthread_num_threads) < __pthread_max_threads) { /* We have a free slot. Use the slot number plus one as the thread ID for the new thread. */ - new->thread = 1 + __pthread_num_threads++; - __pthread_threads[new->thread - 1] = NULL; + new->thread = 1 + GL (dl_pthread_num_threads)++; + GL (dl_pthread_threads)[new->thread - 1] = NULL; - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE); *pthread = new; return 0; } #ifdef PTHREAD_THREADS_MAX - else if (__pthread_num_threads >= PTHREAD_THREADS_MAX) + else if (GL (dl_pthread_num_threads) >= PTHREAD_THREADS_MAX) { /* We have reached the limit on the number of threads per process. */ - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE); free (new); return EAGAIN; @@ -162,7 +152,7 @@ retry: memory allocation, since that's a potentially blocking operation. */ max_threads = __pthread_max_threads; - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE); /* Allocate a new lookup table that's twice as large. */ new_max_threads @@ -174,13 +164,13 @@ retry: return ENOMEM; } - __pthread_rwlock_wrlock (&__pthread_threads_lock); + lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE); /* Check if nobody else has already enlarged the table. */ if (max_threads != __pthread_max_threads) { /* Yep, they did. */ - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE); /* Free the newly allocated table and try again to allocate a slot. */ free (threads); @@ -188,22 +178,22 @@ retry: } /* Copy over the contents of the old table. */ - memcpy (threads, __pthread_threads, + memcpy (threads, GL (dl_pthread_threads), __pthread_max_threads * sizeof (struct __pthread *)); /* Save the location of the old table. We want to deallocate its storage after we released the lock. */ - old_threads = __pthread_threads; + old_threads = GL (dl_pthread_threads); /* Replace the table with the new one. */ __pthread_max_threads = new_max_threads; - __pthread_threads = threads; + GL (dl_pthread_threads) = threads; /* And allocate ourselves
[RFC PATCH htl v2 3/4] testrun.sh: Add support for --tool=rpctrace
rpctrace(1) is a Hurd RPC tracer tool, which is used similar to how strace(1) is used on GNU/Linux. Signed-off-by: Sergey Bugaev --- Makefile | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index f98d5a9e67..a49870d3d1 100644 --- a/Makefile +++ b/Makefile @@ -141,8 +141,9 @@ usage () { cat << EOF Usage: $$0 [OPTIONS] [ARGUMENTS...] - --tool=TOOL Run with the specified TOOL. It can be strace, valgrind or - container. The container will run within support/test-container. + --tool=TOOL Run with the specified TOOL. It can be strace, rpctrace, + valgrind or container. The container will run within + support/test-container. EOF exit 1 @@ -177,6 +178,10 @@ case "$$toolname" in exec strace $(patsubst %, -E%, $(run-program-env)) \ $(test-via-rtld-prefix) $${1+"$$@"} ;; + rpctrace) +exec rpctrace $(patsubst %, -E%, $(run-program-env)) \ + $(test-via-rtld-prefix) $${1+"$$@"} +;; valgrind) exec env $(run-program-env) valgrind $(test-via-rtld-prefix) $${1+"$$@"} ;; -- 2.31.1
Re: [RFC PATCH htl v2 0/4] Rewrite THREAD_GSCOPE
Thanks for looking into it! On Mon, Aug 30, 2021 at 6:37 PM Florian Weimer wrote: > You could kick the can down the road by switching to PTHREAD_IN_LIBC. > In some cases, that might actually be correct replacement. Ack; that should be easy to do. Nevertheless, please do look at patch 4/4 and say if you would prefer that approach. Do you happen to know if there's a testcase that exercises THREAD_GSCOPE_WAIT ()? Even simple programs evidently use THREAD_GSCOPE_SET_FLAG () / THREAD_GSCOPE_RESET_FLAG (), but I so far haven't been able to construct a program that would trigger the WAIT. I've tried dlopening and dlclosing libraries concurrently from several threads; I will try dlopening hundreds of libraries next (to hopefully trigger a reallocation of some table), but it would be handy if there was an official ready-to-use testcase. Sergey
sudo & addauth woes
Hello! Since this is bug-hurd and not just discuss-hurd, let me actually report some bugs I just ran into. Since recently, sudo has been broken: $ sudo echo hi Sorry, try again. Sorry, try again. sudo: 3 incorrect password attempts It never even asks for the password, just errors out all by itself. I thought, this is a perfect opportunity to break my Unix habits and use addauth instead. So I run: $ addauth root Password: it asks me for the (root) password, which I input, and then hangs. Since unlike sudo it's not setuid, I can actually rpctrace it. Here are the relevant parts: 144<--148(pid773)->proc_getloginpids_request (1) = 0 {1 730 725 694 685 662 648 538 111 97 28 12 2 749 659 547 200 95 3 4 7 5 755 756 748 747 746 745 744 743 724 649 644 643 636 545 544 543 542 519 506 166 44 29 27 26 25 20 16 10 9 6 757 772 773} task135(pid773)->mach_port_deallocate (pn{ 23}) = 0 144<--148(pid773)->proc_getmsgport_request (28) = 0138<--169(pid773) 138<--169(pid773)->msg_add_auth ( 167<--162(pid773)) = 0 task135(pid773)->mach_port_deallocate (pn{ 23}) = 0 144<--148(pid773)->proc_getmsgport_request (12) = 0138<--165(pid773) 138<--165(pid773)->msg_add_auth ( 167<--162(pid773)) = 0 task135(pid773)->mach_port_deallocate (pn{ 23}) = 0 144<--148(pid773)->proc_getmsgport_request (2) = 0138<--169(pid773) 138<--169(pid773)->msg_add_auth ( 167<--162(pid773)) It tries to add the new auth to all the processes in the current login collection (as documented), but I'm somehow not a login collection — or, rather, in the default login collection rooted at PID 1. It hangs on PID 2 — it looks like /hurd/startup does not communicate its message port to proc properly, or something. So next I try only adding auth to the session, not the whole login collection: $ addauth -S root Password: addauth: 757: Cannot get message port: (ipc/send) invalid destination port $ ps 757 PID TT STAT TIME COMMAND 757 p0 Ssow 0:00.10 -bash Eh, this last one is actually my fault: something something reauthentication, something something cached proc server port. The thing that worked in the end was: $ addauth -p $$ root I.e. adding auth only to the shell process. So, the issues here are: - sudo no longer works, - /hurd/startup / PID 2 doesn't respond to msg_add_auth (), - OpenSSH doesn't put me into a login collection, - addauth doesn't quite work once it adds new auth to itself. Sergey
[PATCH htl v3 0/5] Rewrite THREAD_GSCOPE
Hello, this is yet another revision of the same patchset to stop issuing many redundant gsync_wake () calls. Following the brief feedback I got the last time, it switches most users of THREAD_GSCOPE_IN_TCB to PTHREAD_IN_LIBC, then actually introduces the new GSCOPE implementation, and then finally drops the remaining uses of THREAD_GSCOPE_IN_TCB entirely, since it's not always on. Things still seem to build and work somewhat (tested on GNU/Hurd i686 and GNU/Linux x86_64). rpctrace uname still shows that there are no more gsync_wake () calls. Apparently dlopen (RTLD_GLOBAL) is a good way to trigger THREAD_GSCOPE_WAIT (). I have written a test program that spawns 100 threads, then each of the threads dlopen's 100 shared objects, then dlclose's them back. On GNU/Linux, I have verified with GDB that it hits NPTL's __thread_gscope_wait (). I haven't been able to verify the same on the Hurd due to the general flakiness of GDB there, but I have ran the program with the patched glibc multiple times and it doesn't crash, so I assume it's working fine. Sergey Bugaev (5): elf: Replace most uses of THREAD_GSCOPE_IN_TCB htl: Move thread table to ld.so htl: Reimplement GSCOPE elf: Remove THREAD_GSCOPE_IN_TCB testrun.sh: Add support for --tool=rpctrace Makefile | 9 +++- elf/dl-reloc.c | 4 +- elf/dl-support.c | 3 +- elf/dl-tls.c | 6 +-- elf/dl-tls_init_tp.c | 2 +- htl/Versions | 2 - htl/pt-alloc.c | 50 + htl/pt-create.c | 11 ++--- htl/pt-internal.h| 23 +++--- sysdeps/aarch64/nptl/tls.h | 1 - sysdeps/alpha/nptl/tls.h | 1 - sysdeps/arc/nptl/tls.h | 1 - sysdeps/arm/nptl/tls.h | 1 - sysdeps/csky/nptl/tls.h | 1 - sysdeps/generic/ldsodefs.h | 18 +--- sysdeps/generic/tls.h| 6 --- sysdeps/hppa/nptl/tls.h | 1 - sysdeps/htl/dl-support.c | 23 ++ sysdeps/htl/dl-thread_gscope_wait.c | 55 sysdeps/htl/pt-key-delete.c | 8 ++-- sysdeps/htl/pthreadP.h | 2 - sysdeps/htl/raise.c | 8 +++- sysdeps/htl/thrd_current.c | 7 ++- sysdeps/i386/nptl/tls.h | 1 - sysdeps/ia64/nptl/tls.h | 1 - sysdeps/m68k/nptl/tls.h | 1 - sysdeps/mach/hurd/htl/pt-sigstate-init.c | 2 +- sysdeps/mach/hurd/htl/pt-sysdep.c| 2 +- sysdeps/mach/hurd/htl/pt-sysdep.h| 2 +- sysdeps/mach/hurd/i386/tls.h | 19 sysdeps/mach/hurd/tls.h | 20 - sysdeps/microblaze/nptl/tls.h| 1 - sysdeps/mips/nptl/tls.h | 1 - sysdeps/nios2/nptl/tls.h | 1 - sysdeps/powerpc/nptl/tls.h | 1 - sysdeps/riscv/nptl/tls.h | 1 - sysdeps/s390/nptl/tls.h | 1 - sysdeps/sh/nptl/tls.h| 1 - sysdeps/sparc/nptl/tls.h | 1 - sysdeps/x86_64/nptl/tls.h| 1 - 40 files changed, 172 insertions(+), 128 deletions(-) create mode 100644 sysdeps/htl/dl-support.c create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c -- 2.31.1
[PATCH htl v3 5/5] testrun.sh: Add support for --tool=rpctrace
rpctrace(1) is a Hurd RPC tracer tool, which is used similar to how strace(1) is used on GNU/Linux. Signed-off-by: Sergey Bugaev --- Makefile | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index f98d5a9e67..a49870d3d1 100644 --- a/Makefile +++ b/Makefile @@ -141,8 +141,9 @@ usage () { cat << EOF Usage: $$0 [OPTIONS] [ARGUMENTS...] - --tool=TOOL Run with the specified TOOL. It can be strace, valgrind or - container. The container will run within support/test-container. + --tool=TOOL Run with the specified TOOL. It can be strace, rpctrace, + valgrind or container. The container will run within + support/test-container. EOF exit 1 @@ -177,6 +178,10 @@ case "$$toolname" in exec strace $(patsubst %, -E%, $(run-program-env)) \ $(test-via-rtld-prefix) $${1+"$$@"} ;; + rpctrace) +exec rpctrace $(patsubst %, -E%, $(run-program-env)) \ + $(test-via-rtld-prefix) $${1+"$$@"} +;; valgrind) exec env $(run-program-env) valgrind $(test-via-rtld-prefix) $${1+"$$@"} ;; -- 2.31.1
[PATCH htl v3 4/5] elf: Remove THREAD_GSCOPE_IN_TCB
All the ports now have THREAD_GSCOPE_IN_TCB set to 1. Remove all support for !THREAD_GSCOPE_IN_TCB, along with the definition itself. Signed-off-by: Sergey Bugaev --- elf/dl-support.c | 3 --- sysdeps/aarch64/nptl/tls.h| 1 - sysdeps/alpha/nptl/tls.h | 1 - sysdeps/arc/nptl/tls.h| 1 - sysdeps/arm/nptl/tls.h| 1 - sysdeps/csky/nptl/tls.h | 1 - sysdeps/generic/ldsodefs.h| 6 -- sysdeps/generic/tls.h | 6 -- sysdeps/hppa/nptl/tls.h | 1 - sysdeps/i386/nptl/tls.h | 1 - sysdeps/ia64/nptl/tls.h | 1 - sysdeps/m68k/nptl/tls.h | 1 - sysdeps/mach/hurd/i386/tls.h | 2 -- sysdeps/microblaze/nptl/tls.h | 1 - sysdeps/mips/nptl/tls.h | 1 - sysdeps/nios2/nptl/tls.h | 1 - sysdeps/powerpc/nptl/tls.h| 1 - sysdeps/riscv/nptl/tls.h | 1 - sysdeps/s390/nptl/tls.h | 1 - sysdeps/sh/nptl/tls.h | 1 - sysdeps/sparc/nptl/tls.h | 1 - sysdeps/x86_64/nptl/tls.h | 1 - 22 files changed, 35 deletions(-) diff --git a/elf/dl-support.c b/elf/dl-support.c index a317459fae..02e2ed72f5 100644 --- a/elf/dl-support.c +++ b/elf/dl-support.c @@ -197,9 +197,6 @@ int _dl_stack_cache_lock; int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable; void (*_dl_init_static_tls) (struct link_map *) = &_dl_nothread_init_static_tls; #endif -#if !THREAD_GSCOPE_IN_TCB -int _dl_thread_gscope_count; -#endif struct dl_scope_free_list *_dl_scope_free_list; #ifdef NEED_DL_SYSINFO diff --git a/sysdeps/aarch64/nptl/tls.h b/sysdeps/aarch64/nptl/tls.h index 6e896207a6..72f22dc718 100644 --- a/sysdeps/aarch64/nptl/tls.h +++ b/sysdeps/aarch64/nptl/tls.h @@ -109,7 +109,6 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -# define THREAD_GSCOPE_IN_TCB 1 # define THREAD_GSCOPE_FLAG_UNUSED 0 # define THREAD_GSCOPE_FLAG_USED 1 # define THREAD_GSCOPE_FLAG_WAIT 2 diff --git a/sysdeps/alpha/nptl/tls.h b/sysdeps/alpha/nptl/tls.h index 4dbccc5249..6328112135 100644 --- a/sysdeps/alpha/nptl/tls.h +++ b/sysdeps/alpha/nptl/tls.h @@ -103,7 +103,6 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -#define THREAD_GSCOPE_IN_TCB 1 #define THREAD_GSCOPE_FLAG_UNUSED 0 #define THREAD_GSCOPE_FLAG_USED 1 #define THREAD_GSCOPE_FLAG_WAIT 2 diff --git a/sysdeps/arc/nptl/tls.h b/sysdeps/arc/nptl/tls.h index 95300fdd21..e269c0a7a5 100644 --- a/sysdeps/arc/nptl/tls.h +++ b/sysdeps/arc/nptl/tls.h @@ -111,7 +111,6 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -#define THREAD_GSCOPE_IN_TCB 1 #define THREAD_GSCOPE_FLAG_UNUSED 0 #define THREAD_GSCOPE_FLAG_USED 1 #define THREAD_GSCOPE_FLAG_WAIT 2 diff --git a/sysdeps/arm/nptl/tls.h b/sysdeps/arm/nptl/tls.h index 1bd11307ce..699c16acfb 100644 --- a/sysdeps/arm/nptl/tls.h +++ b/sysdeps/arm/nptl/tls.h @@ -100,7 +100,6 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -#define THREAD_GSCOPE_IN_TCB 1 #define THREAD_GSCOPE_FLAG_UNUSED 0 #define THREAD_GSCOPE_FLAG_USED 1 #define THREAD_GSCOPE_FLAG_WAIT 2 diff --git a/sysdeps/csky/nptl/tls.h b/sysdeps/csky/nptl/tls.h index 7a234041ed..b210dfcb76 100644 --- a/sysdeps/csky/nptl/tls.h +++ b/sysdeps/csky/nptl/tls.h @@ -127,7 +127,6 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -# define THREAD_GSCOPE_IN_TCB 1 # define THREAD_GSCOPE_FLAG_UNUSED 0 # define THREAD_GSCOPE_FLAG_USED 1 # define THREAD_GSCOPE_FLAG_WAIT 2 diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h index 3092bbc924..b251fd1733 100644 --- a/sysdeps/generic/ldsodefs.h +++ b/sysdeps/generic/ldsodefs.h @@ -495,10 +495,6 @@ struct rtld_global EXTERN struct __pthread **_dl_pthread_threads; EXTERN int _dl_pthread_threads_lock; #endif - -#if !THREAD_GSCOPE_IN_TCB - EXTERN int _dl_thread_gscope_count; -#endif #ifdef SHARED }; # define __rtld_global_attribute__ @@ -1381,10 +1377,8 @@ __rtld_mutex_init (void) } #endif /* !PTHREAD_IN_LIBC */ -#if THREAD_GSCOPE_IN_TCB void __thread_gscope_wait (void) attribute_hidden; # define THREAD_GSCOPE_WAIT() __thread_gscope_wait () -#endif __END_DECLS diff --git a/sysdeps/generic/tls.h b/sysdeps/generic/tls.h index e86d70e6ce..f581c9a992 100644 --- a/sysdeps/generic/tls.h +++ b/sysdeps/generic/tls.h @@ -71,10 +71,4 @@ This macro returns the address of the DTV of the current thread. This normally is done using the thread register which points to the dtv or the TCB (from which the DTV can found). - - - THREAD_GSCOPE_IN_TCB - - This should be set to 1 if the global scope flag is stored within the TCB. -
[PATCH htl v3 3/5] htl: Reimplement GSCOPE
This is a new implementation of GSCOPE which largely mirrors its NPTL counterpart. Same as in NPTL, instead of a global flag shared between threads, there is now a per-thread GSCOPE flag stored in each thread's TCB. This makes entering and exiting a GSCOPE faster at the expense of making THREAD_GSCOPE_WAIT () slower. The largest win is the elimination of many redundant gsync_wake () RPC calls; previously, even simplest programs would make dozens of fully redundant gsync_wake () calls. Signed-off-by: Sergey Bugaev --- sysdeps/htl/dl-thread_gscope_wait.c | 55 + sysdeps/mach/hurd/i386/tls.h| 21 +++ sysdeps/mach/hurd/tls.h | 20 --- 3 files changed, 76 insertions(+), 20 deletions(-) create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c diff --git a/sysdeps/htl/dl-thread_gscope_wait.c b/sysdeps/htl/dl-thread_gscope_wait.c new file mode 100644 index 00..b277217b8e --- /dev/null +++ b/sysdeps/htl/dl-thread_gscope_wait.c @@ -0,0 +1,55 @@ +/* Out-of-line notification function for the GSCOPE locking mechanism. + Copyright (C) 2007-2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include +#include +#include + +void +__thread_gscope_wait (void) +{ + size_t i; + struct __pthread *t; + int *gscope_flagp; + + lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE); + + /* Iterate over the list of threads. */ + for (i = 0; i < GL (dl_pthread_num_threads); ++i) +{ + t = GL (dl_pthread_threads[i]); + if (t == NULL || t->tcb->gscope_flag == THREAD_GSCOPE_FLAG_UNUSED) +continue; + + gscope_flagp = &t->tcb->gscope_flag; + + /* We have to wait until this thread is done with the global + scope. First tell the thread that we are waiting and + possibly have to be woken. */ + if (atomic_compare_and_exchange_bool_acq (gscope_flagp, +THREAD_GSCOPE_FLAG_WAIT, +THREAD_GSCOPE_FLAG_USED)) +continue; + + do +lll_wait (gscope_flagp, THREAD_GSCOPE_FLAG_WAIT, LLL_PRIVATE); + while (*gscope_flagp == THREAD_GSCOPE_FLAG_WAIT); +} + + lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE); +} diff --git a/sysdeps/mach/hurd/i386/tls.h b/sysdeps/mach/hurd/i386/tls.h index 057b2613f3..2ac65df921 100644 --- a/sysdeps/mach/hurd/i386/tls.h +++ b/sysdeps/mach/hurd/i386/tls.h @@ -369,6 +369,27 @@ _hurd_tls_new (thread_t child, struct i386_thread_state *state, tcbhead_t *tcb) return err; } +/* Global scope switch support. */ +# define THREAD_GSCOPE_IN_TCB 1 + +# define THREAD_GSCOPE_FLAG_UNUSED 0 +# define THREAD_GSCOPE_FLAG_USED 1 +# define THREAD_GSCOPE_FLAG_WAIT 2 + +# define THREAD_GSCOPE_SET_FLAG() \ + THREAD_SETMEM (THREAD_SELF, gscope_flag, THREAD_GSCOPE_FLAG_USED) + +# define THREAD_GSCOPE_RESET_FLAG() \ + ({ \ +int __flag; \ +asm volatile ("xchgl %0, %%gs:%P1" \ + : "=r" (__flag)\ + : "i" (offsetof (tcbhead_t, gscope_flag)), \ +"0" (THREAD_GSCOPE_FLAG_UNUSED));\ +if (__flag == THREAD_GSCOPE_FLAG_WAIT) \ + lll_wake (THREAD_SELF->gscope_flag, LLL_PRIVATE); \ + }) + #endif /* !__ASSEMBLER__ */ #endif /* i386/tls.h */ diff --git a/sysdeps/mach/hurd/tls.h b/sysdeps/mach/hurd/tls.h index f83956d3d7..8e66d5ff53 100644 --- a/sysdeps/mach/hurd/tls.h +++ b/sysdeps/mach/hurd/tls.h @@ -52,26 +52,6 @@ # define GET_DTV(descr) \ (((tcbhead_t *) (descr))->dtv) -/* Global scope switch support. */ -#define THREAD_GSCOPE_IN_TCB 0 -#define THREAD_GSCOPE_GLOBAL -#define THREAD_GSCOPE_SET_FLAG() \ - atomic_exchange_and_add_acq (&GL(dl_thread_gscope_count), 1) -#define THREAD_GSCOPE_RESET_FLAG() \ - do
[PATCH htl v3 2/5] htl: Move thread table to ld.so
The next commit is going to introduce a new implementation of THREAD_GSCOPE_WAIT which needs to access the list of threads. Since it must be usable from the dynamic laoder, we have to move the symbols for the list of threads into the loader. Signed-off-by: Sergey Bugaev --- htl/Versions | 2 - htl/pt-alloc.c | 50 ++-- htl/pt-create.c | 11 ++ htl/pt-internal.h| 23 --- sysdeps/generic/ldsodefs.h | 9 + sysdeps/htl/dl-support.c | 23 +++ sysdeps/htl/pt-key-delete.c | 8 ++-- sysdeps/htl/pthreadP.h | 2 - sysdeps/htl/raise.c | 8 +++- sysdeps/htl/thrd_current.c | 7 +++- sysdeps/mach/hurd/htl/pt-sigstate-init.c | 2 +- sysdeps/mach/hurd/htl/pt-sysdep.c| 2 +- sysdeps/mach/hurd/htl/pt-sysdep.h| 2 +- 13 files changed, 82 insertions(+), 67 deletions(-) create mode 100644 sysdeps/htl/dl-support.c diff --git a/htl/Versions b/htl/Versions index 4aea321016..4e0ebac285 100644 --- a/htl/Versions +++ b/htl/Versions @@ -168,8 +168,6 @@ libpthread { GLIBC_PRIVATE { __pthread_initialize_minimal; -__pthread_threads; - __cthread_detach; __cthread_fork; __pthread_detach; diff --git a/htl/pt-alloc.c b/htl/pt-alloc.c index acc67f2711..f6708a60bf 100644 --- a/htl/pt-alloc.c +++ b/htl/pt-alloc.c @@ -28,19 +28,9 @@ of the threads functions "shall fail" if "No thread could be found corresponding to that specified by the given thread ID." */ -/* Thread ID lookup table. */ -struct __pthread **__pthread_threads; - /* The size of the thread ID lookup table. */ int __pthread_max_threads; -/* The total number of thread IDs currently in use, or on the list of - available thread IDs. */ -int __pthread_num_threads; - -/* A lock for the table, and the other variables above. */ -pthread_rwlock_t __pthread_threads_lock; - /* List of thread structures corresponding to free thread IDs. */ struct __pthread *__pthread_free_threads; pthread_mutex_t __pthread_free_threads_lock; @@ -132,25 +122,25 @@ __pthread_alloc (struct __pthread **pthread) } retry: - __pthread_rwlock_wrlock (&__pthread_threads_lock); + lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE); - if (__pthread_num_threads < __pthread_max_threads) + if (GL (dl_pthread_num_threads) < __pthread_max_threads) { /* We have a free slot. Use the slot number plus one as the thread ID for the new thread. */ - new->thread = 1 + __pthread_num_threads++; - __pthread_threads[new->thread - 1] = NULL; + new->thread = 1 + GL (dl_pthread_num_threads)++; + GL (dl_pthread_threads)[new->thread - 1] = NULL; - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE); *pthread = new; return 0; } #ifdef PTHREAD_THREADS_MAX - else if (__pthread_num_threads >= PTHREAD_THREADS_MAX) + else if (GL (dl_pthread_num_threads) >= PTHREAD_THREADS_MAX) { /* We have reached the limit on the number of threads per process. */ - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE); free (new); return EAGAIN; @@ -162,7 +152,7 @@ retry: memory allocation, since that's a potentially blocking operation. */ max_threads = __pthread_max_threads; - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE); /* Allocate a new lookup table that's twice as large. */ new_max_threads @@ -174,13 +164,13 @@ retry: return ENOMEM; } - __pthread_rwlock_wrlock (&__pthread_threads_lock); + lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE); /* Check if nobody else has already enlarged the table. */ if (max_threads != __pthread_max_threads) { /* Yep, they did. */ - __pthread_rwlock_unlock (&__pthread_threads_lock); + lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE); /* Free the newly allocated table and try again to allocate a slot. */ free (threads); @@ -188,22 +178,22 @@ retry: } /* Copy over the contents of the old table. */ - memcpy (threads, __pthread_threads, + memcpy (threads, GL (dl_pthread_threads), __pthread_max_threads * sizeof (struct __pthread *)); /* Save the location of the old table. We want to deallocate its storage after we released the lock. */ - old_threads = __pthread_threads; + old_threads = GL (dl_pthread_threads); /* Replace the table with the new one. */ __pthread_max_threads = new_max_threads; - __pthread_threads = threads; + GL (dl_pthread_threads) = threads; /* And allocate ourselves
[PATCH htl v3 1/5] elf: Replace most uses of THREAD_GSCOPE_IN_TCB
While originally this definition was indeed used to distinguish between the cases where the GSCOPE flag was stored in TCB or not, it has since become used as a general way to distinguish between HTL and NPTL. THREAD_GSCOPE_IN_TCB will be removed in the following commits, as HTL, which currently is the only port that does not put the flag into TCB, will get ported to put the GSCOPE flag into the TCB as well. To prepare for that change, migrate all code that wants to distinguish between HTL and NPTL to use PTHREAD_IN_LIBC instead, which is a better choice since the distinction mostly has to do with whether libc has access to the list of thread structures and therefore can initialize thread-local storage. The parts of code that actually depend on whether the GSCOPE flag is in TCB are left unchanged. Signed-off-by: Sergey Bugaev --- elf/dl-reloc.c | 4 ++-- elf/dl-support.c | 6 -- elf/dl-tls.c | 6 +++--- elf/dl-tls_init_tp.c | 2 +- sysdeps/generic/ldsodefs.h | 11 ++- 5 files changed, 16 insertions(+), 13 deletions(-) diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c index e13a672ade..6c957456b8 100644 --- a/elf/dl-reloc.c +++ b/elf/dl-reloc.c @@ -141,7 +141,7 @@ cannot allocate memory in static TLS block")); } } -#if !THREAD_GSCOPE_IN_TCB +#if !PTHREAD_IN_LIBC /* Initialize static TLS area and DTV for current (only) thread. libpthread implementations should provide their own hook to handle all threads. */ @@ -160,7 +160,7 @@ _dl_nothread_init_static_tls (struct link_map *map) memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size), '\0', map->l_tls_blocksize - map->l_tls_initimage_size); } -#endif /* !THREAD_GSCOPE_IN_TCB */ +#endif /* !PTHREAD_IN_LIBC */ void _dl_relocate_object (struct link_map *l, struct r_scope_elem *scope[], diff --git a/elf/dl-support.c b/elf/dl-support.c index 0155718175..a317459fae 100644 --- a/elf/dl-support.c +++ b/elf/dl-support.c @@ -183,7 +183,7 @@ uint64_t _dl_hwcap_mask; * executable but this isn't true for all platforms. */ ElfW(Word) _dl_stack_flags = DEFAULT_STACK_PERMS; -#if THREAD_GSCOPE_IN_TCB +#if PTHREAD_IN_LIBC list_t _dl_stack_used; list_t _dl_stack_user; list_t _dl_stack_cache; @@ -195,9 +195,11 @@ int _dl_stack_cache_lock; when it was not, we do it by calling this function. It returns an errno code or zero on success. */ int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable; -int _dl_thread_gscope_count; void (*_dl_init_static_tls) (struct link_map *) = &_dl_nothread_init_static_tls; #endif +#if !THREAD_GSCOPE_IN_TCB +int _dl_thread_gscope_count; +#endif struct dl_scope_free_list *_dl_scope_free_list; #ifdef NEED_DL_SYSINFO diff --git a/elf/dl-tls.c b/elf/dl-tls.c index 423e380f7c..d554ae4497 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -29,7 +29,7 @@ #include #include -#if THREAD_GSCOPE_IN_TCB +#if PTHREAD_IN_LIBC # include #endif @@ -1058,7 +1058,7 @@ cannot create TLS data structures")); } } -#if THREAD_GSCOPE_IN_TCB +#if PTHREAD_IN_LIBC static inline void __attribute__((always_inline)) init_one_static_tls (struct pthread *curp, struct link_map *map) { @@ -1091,4 +1091,4 @@ _dl_init_static_tls (struct link_map *map) lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE); } -#endif /* THREAD_GSCOPE_IN_TCB */ +#endif /* PTHREAD_IN_LIBC */ diff --git a/elf/dl-tls_init_tp.c b/elf/dl-tls_init_tp.c index d84adc992c..e482f3cfa9 100644 --- a/elf/dl-tls_init_tp.c +++ b/elf/dl-tls_init_tp.c @@ -36,7 +36,7 @@ rtld_lock_default_unlock_recursive (void *lock) void __tls_pre_init_tp (void) { -#if !THREAD_GSCOPE_IN_TCB +#if !PTHREAD_IN_LIBC GL(dl_init_static_tls) = &_dl_nothread_init_static_tls; #endif diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h index 9c15259236..02bd579ef8 100644 --- a/sysdeps/generic/ldsodefs.h +++ b/sysdeps/generic/ldsodefs.h @@ -456,7 +456,7 @@ struct rtld_global /* Generation counter for the dtv. */ EXTERN size_t _dl_tls_generation; -#if !THREAD_GSCOPE_IN_TCB +#if !PTHREAD_IN_LIBC EXTERN void (*_dl_init_static_tls) (struct link_map *); #endif @@ -466,7 +466,7 @@ struct rtld_global size_t count; void *list[50]; } *_dl_scope_free_list; -#if THREAD_GSCOPE_IN_TCB +#if PTHREAD_IN_LIBC /* List of active thread stacks, with memory managed by glibc. */ EXTERN list_t _dl_stack_used; @@ -486,7 +486,8 @@ struct rtld_global /* Mutex protecting the stack lists. */ EXTERN int _dl_stack_cache_lock; -#else +#endif +#if !THREAD_GSCOPE_IN_TCB EXTERN int _dl_thread_gscope_count; #endif #ifdef SHARED @@ -1297,13 +1298,13 @@ extern void _dl_aux_init (ElfW(auxv_t) *av) /* Initialize the static TLS space for the link map in all existing threads. */ -#if THREAD_GSCOPE_IN_TCB +#if PTHREAD_IN_LIBC void _dl_init_static_tls (struct link_map *map)
Re: Wrong MIG generated headers
Hello, On Sat, Sep 11, 2021 at 12:27 PM Joan Lledó wrote: > Note the additional parameter *mach_msg_type_name_t objectPoly* in > memory_object_create_proxy. I don't know why is it there but I can't > compile the pci arbiter anymore because of that. What can I do? This is a little change that Samuel has committed recently [0] (at my suggestion). It lets the client specify the "disposition" of the object port(s), i.e. MAKE_SEND, COPY_SEND, or MOVE_SEND. [0]: https://git.savannah.gnu.org/cgit/hurd/gnumach.git/commit/?id=d1c056f2e23b7a9fa91dd5132b040813bfc00487 This doesn't change the RPC format, only the client (user) C routine signature. You'll have to adapt the PCI arbiter code to pass a value for this additional argument. The previous implicit default was MACH_MSG_TYPE_COPY_SEND, so passing it explicitly should be equivalent to the old behavior; but depending on your use case you might find it cleaner to use one of the other two dispositions. Sergey
Re: [PATCH htl v3 2/5] htl: Move thread table to ld.so
On Wed, Sep 15, 2021 at 2:27 AM Samuel Thibault wrote: > Please rather use __libc_rwlock_wrlock etc. Thanks, I did not realize there were actually more internal synchronization primitives implemented on top of lll. I will send out a new version shortly. > Even if for now they are actually locks and not rwlocks, the day we fix > that the rwlocking of dl_pthread_threads_lock will get fixed alongside. May I ask, why are rwlocks unimplemented? Is it just that nobody has done the work, or are there some unobvious complications? I'm asking because I happen to have an implementation of rwlocks [0] that works (passes the test) on the Hurd. It makes use of futex bitsets (which don't seem to be supported by gsync), but seems to work just fine without them. And HTL of course has its own rwlocks implementation. Would it make sense to me to spend some time to try and write a __libc_rwlock based on them, or are there some obstacles? [0]: https://github.com/bugaevc/lets-write-sync-primitives/blob/master/src/rwlock.cpp Sergey
Re: [PATCH htl v3 2/5] htl: Move thread table to ld.so
On Wed, Sep 15, 2021 at 5:34 PM Samuel Thibault wrote: > so it's probably not > worth spending time on making a separate __libc_rwlock implementation, > and rather spend it on making pthread_rwlock use gsync, like was done > for pthread_mutex and sem Oh, it currently doesn't? That's horrifying. Then I'll look into that next. Sergey P.S. There's this other thing that is way more urgent than either eliminating gsync_wake () calls or optimizing rwlocks. I sent you an email a week ago. Please take a look.
[PATCH htl v4 0/4] Rewrite THREAD_GSCOPE
This is v4 of the patchset. The previous versions are at [0][1][2][3]. [0]: https://lists.gnu.org/archive/html/bug-hurd/2021-05/msg00053.html [1]: https://lists.gnu.org/archive/html/bug-hurd/2021-05/msg00063.html [2]: https://sourceware.org/pipermail/libc-alpha/2021-August/130614.html [3]: https://sourceware.org/pipermail/libc-alpha/2021-September/130810.html Changes since v3: * Drop the two patches (1/5 and 5/5) that have been pushed (the remaining three patches are numbered 2/4, 3/4, 4/4) * Use __libc_rwlock for dl_pthread_threads_lock instead of a plain int/lll To test, before: $ rpctrace -o >(grep -c wake) /bin/uname GNU 58 After: $ ./testrun.sh --tool=rpctrace /bin/uname 2> >(grep -c wake) GNU 0 Sergey Bugaev (3): htl: Move thread table to ld.so htl: Reimplement GSCOPE elf: Remove THREAD_GSCOPE_IN_TCB elf/dl-support.c | 3 -- htl/Versions | 2 - htl/pt-alloc.c | 50 + htl/pt-create.c | 8 ++-- htl/pt-internal.h| 23 +++--- sysdeps/aarch64/nptl/tls.h | 1 - sysdeps/alpha/nptl/tls.h | 1 - sysdeps/arc/nptl/tls.h | 1 - sysdeps/arm/nptl/tls.h | 1 - sysdeps/csky/nptl/tls.h | 1 - sysdeps/generic/ldsodefs.h | 13 +++--- sysdeps/generic/tls.h| 6 --- sysdeps/hppa/nptl/tls.h | 1 - sysdeps/htl/dl-support.c | 23 ++ sysdeps/htl/dl-thread_gscope_wait.c | 55 sysdeps/htl/pt-key-delete.c | 8 ++-- sysdeps/htl/pthreadP.h | 2 - sysdeps/htl/raise.c | 8 +++- sysdeps/htl/thrd_current.c | 7 ++- sysdeps/i386/nptl/tls.h | 1 - sysdeps/ia64/nptl/tls.h | 1 - sysdeps/m68k/nptl/tls.h | 1 - sysdeps/mach/hurd/htl/pt-sigstate-init.c | 2 +- sysdeps/mach/hurd/htl/pt-sysdep.c| 2 +- sysdeps/mach/hurd/htl/pt-sysdep.h| 2 +- sysdeps/mach/hurd/i386/tls.h | 19 sysdeps/mach/hurd/tls.h | 20 - sysdeps/microblaze/nptl/tls.h| 1 - sysdeps/mips/nptl/tls.h | 1 - sysdeps/nios2/nptl/tls.h | 1 - sysdeps/powerpc/nptl/tls.h | 1 - sysdeps/riscv/nptl/tls.h | 1 - sysdeps/s390/nptl/tls.h | 1 - sysdeps/sh/nptl/tls.h| 1 - sysdeps/sparc/nptl/tls.h | 1 - sysdeps/x86_64/nptl/tls.h| 1 - 36 files changed, 156 insertions(+), 115 deletions(-) create mode 100644 sysdeps/htl/dl-support.c create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c -- 2.31.1
[PATCH htl v4 2/4] htl: Move thread table to ld.so
The next commit is going to introduce a new implementation of THREAD_GSCOPE_WAIT which needs to access the list of threads. Since it must be usable from the dynamic laoder, we have to move the symbols for the list of threads into the loader. Signed-off-by: Sergey Bugaev --- htl/Versions | 2 - htl/pt-alloc.c | 50 ++-- htl/pt-create.c | 8 ++-- htl/pt-internal.h| 23 --- sysdeps/generic/ldsodefs.h | 9 + sysdeps/htl/dl-support.c | 23 +++ sysdeps/htl/pt-key-delete.c | 8 ++-- sysdeps/htl/pthreadP.h | 2 - sysdeps/htl/raise.c | 8 +++- sysdeps/htl/thrd_current.c | 7 +++- sysdeps/mach/hurd/htl/pt-sigstate-init.c | 2 +- sysdeps/mach/hurd/htl/pt-sysdep.c| 2 +- sysdeps/mach/hurd/htl/pt-sysdep.h| 2 +- 13 files changed, 83 insertions(+), 63 deletions(-) create mode 100644 sysdeps/htl/dl-support.c diff --git a/htl/Versions b/htl/Versions index 4aea321016..4e0ebac285 100644 --- a/htl/Versions +++ b/htl/Versions @@ -168,8 +168,6 @@ libpthread { GLIBC_PRIVATE { __pthread_initialize_minimal; -__pthread_threads; - __cthread_detach; __cthread_fork; __pthread_detach; diff --git a/htl/pt-alloc.c b/htl/pt-alloc.c index acc67f2711..f6e783be10 100644 --- a/htl/pt-alloc.c +++ b/htl/pt-alloc.c @@ -28,19 +28,9 @@ of the threads functions "shall fail" if "No thread could be found corresponding to that specified by the given thread ID." */ -/* Thread ID lookup table. */ -struct __pthread **__pthread_threads; - /* The size of the thread ID lookup table. */ int __pthread_max_threads; -/* The total number of thread IDs currently in use, or on the list of - available thread IDs. */ -int __pthread_num_threads; - -/* A lock for the table, and the other variables above. */ -pthread_rwlock_t __pthread_threads_lock; - /* List of thread structures corresponding to free thread IDs. */ struct __pthread *__pthread_free_threads; pthread_mutex_t __pthread_free_threads_lock; @@ -132,25 +122,25 @@ __pthread_alloc (struct __pthread **pthread) } retry: - __pthread_rwlock_wrlock (&__pthread_threads_lock); + __libc_rwlock_wrlock (GL (dl_pthread_threads_lock)); - if (__pthread_num_threads < __pthread_max_threads) + if (GL (dl_pthread_num_threads) < __pthread_max_threads) { /* We have a free slot. Use the slot number plus one as the thread ID for the new thread. */ - new->thread = 1 + __pthread_num_threads++; - __pthread_threads[new->thread - 1] = NULL; + new->thread = 1 + GL (dl_pthread_num_threads)++; + GL (dl_pthread_threads)[new->thread - 1] = NULL; - __pthread_rwlock_unlock (&__pthread_threads_lock); + __libc_rwlock_unlock (GL (dl_pthread_threads_lock)); *pthread = new; return 0; } #ifdef PTHREAD_THREADS_MAX - else if (__pthread_num_threads >= PTHREAD_THREADS_MAX) + else if (GL (dl_pthread_num_threads) >= PTHREAD_THREADS_MAX) { /* We have reached the limit on the number of threads per process. */ - __pthread_rwlock_unlock (&__pthread_threads_lock); + __libc_rwlock_unlock (GL (dl_pthread_threads_lock)); free (new); return EAGAIN; @@ -162,7 +152,7 @@ retry: memory allocation, since that's a potentially blocking operation. */ max_threads = __pthread_max_threads; - __pthread_rwlock_unlock (&__pthread_threads_lock); + __libc_rwlock_unlock (GL (dl_pthread_threads_lock)); /* Allocate a new lookup table that's twice as large. */ new_max_threads @@ -174,13 +164,13 @@ retry: return ENOMEM; } - __pthread_rwlock_wrlock (&__pthread_threads_lock); + __libc_rwlock_wrlock (GL (dl_pthread_threads_lock)); /* Check if nobody else has already enlarged the table. */ if (max_threads != __pthread_max_threads) { /* Yep, they did. */ - __pthread_rwlock_unlock (&__pthread_threads_lock); + __libc_rwlock_unlock (GL (dl_pthread_threads_lock)); /* Free the newly allocated table and try again to allocate a slot. */ free (threads); @@ -188,22 +178,22 @@ retry: } /* Copy over the contents of the old table. */ - memcpy (threads, __pthread_threads, + memcpy (threads, GL (dl_pthread_threads), __pthread_max_threads * sizeof (struct __pthread *)); /* Save the location of the old table. We want to deallocate its storage after we released the lock. */ - old_threads = __pthread_threads; + old_threads = GL (dl_pthread_threads); /* Replace the table with the new one. */ __pthread_max_threads = new_max_threads; - __pthread_threads = threads; + GL (dl_pthread_threads) = threads; /* And allocate ourselves
[PATCH htl v4 3/4] htl: Reimplement GSCOPE
This is a new implementation of GSCOPE which largely mirrors its NPTL counterpart. Same as in NPTL, instead of a global flag shared between threads, there is now a per-thread GSCOPE flag stored in each thread's TCB. This makes entering and exiting a GSCOPE faster at the expense of making THREAD_GSCOPE_WAIT () slower. The largest win is the elimination of many redundant gsync_wake () RPC calls; previously, even simplest programs would make dozens of fully redundant gsync_wake () calls. Signed-off-by: Sergey Bugaev --- sysdeps/htl/dl-thread_gscope_wait.c | 55 + sysdeps/mach/hurd/i386/tls.h| 21 +++ sysdeps/mach/hurd/tls.h | 20 --- 3 files changed, 76 insertions(+), 20 deletions(-) create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c diff --git a/sysdeps/htl/dl-thread_gscope_wait.c b/sysdeps/htl/dl-thread_gscope_wait.c new file mode 100644 index 00..7557e77aab --- /dev/null +++ b/sysdeps/htl/dl-thread_gscope_wait.c @@ -0,0 +1,55 @@ +/* Out-of-line notification function for the GSCOPE locking mechanism. + Copyright (C) 2007-2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include +#include +#include + +void +__thread_gscope_wait (void) +{ + size_t i; + struct __pthread *t; + int *gscope_flagp; + + __libc_rwlock_rdlock (GL (dl_pthread_threads_lock)); + + /* Iterate over the list of threads. */ + for (i = 0; i < GL (dl_pthread_num_threads); ++i) +{ + t = GL (dl_pthread_threads[i]); + if (t == NULL || t->tcb->gscope_flag == THREAD_GSCOPE_FLAG_UNUSED) +continue; + + gscope_flagp = &t->tcb->gscope_flag; + + /* We have to wait until this thread is done with the global + scope. First tell the thread that we are waiting and + possibly have to be woken. */ + if (atomic_compare_and_exchange_bool_acq (gscope_flagp, +THREAD_GSCOPE_FLAG_WAIT, +THREAD_GSCOPE_FLAG_USED)) +continue; + + do +lll_wait (gscope_flagp, THREAD_GSCOPE_FLAG_WAIT, LLL_PRIVATE); + while (*gscope_flagp == THREAD_GSCOPE_FLAG_WAIT); +} + + __libc_rwlock_unlock (GL (dl_pthread_threads_lock)); +} diff --git a/sysdeps/mach/hurd/i386/tls.h b/sysdeps/mach/hurd/i386/tls.h index 057b2613f3..2ac65df921 100644 --- a/sysdeps/mach/hurd/i386/tls.h +++ b/sysdeps/mach/hurd/i386/tls.h @@ -369,6 +369,27 @@ _hurd_tls_new (thread_t child, struct i386_thread_state *state, tcbhead_t *tcb) return err; } +/* Global scope switch support. */ +# define THREAD_GSCOPE_IN_TCB 1 + +# define THREAD_GSCOPE_FLAG_UNUSED 0 +# define THREAD_GSCOPE_FLAG_USED 1 +# define THREAD_GSCOPE_FLAG_WAIT 2 + +# define THREAD_GSCOPE_SET_FLAG() \ + THREAD_SETMEM (THREAD_SELF, gscope_flag, THREAD_GSCOPE_FLAG_USED) + +# define THREAD_GSCOPE_RESET_FLAG() \ + ({ \ +int __flag; \ +asm volatile ("xchgl %0, %%gs:%P1" \ + : "=r" (__flag)\ + : "i" (offsetof (tcbhead_t, gscope_flag)), \ +"0" (THREAD_GSCOPE_FLAG_UNUSED));\ +if (__flag == THREAD_GSCOPE_FLAG_WAIT) \ + lll_wake (THREAD_SELF->gscope_flag, LLL_PRIVATE); \ + }) + #endif /* !__ASSEMBLER__ */ #endif /* i386/tls.h */ diff --git a/sysdeps/mach/hurd/tls.h b/sysdeps/mach/hurd/tls.h index f83956d3d7..8e66d5ff53 100644 --- a/sysdeps/mach/hurd/tls.h +++ b/sysdeps/mach/hurd/tls.h @@ -52,26 +52,6 @@ # define GET_DTV(descr) \ (((tcbhead_t *) (descr))->dtv) -/* Global scope switch support. */ -#define THREAD_GSCOPE_IN_TCB 0 -#define THREAD_GSCOPE_GLOBAL -#define THREAD_GSCOPE_SET_FLAG() \ - atomic_exchange_and_add_acq (&GL(dl_thread_gscope_count), 1) -#define THREAD_GSCOPE_RESET_FLAG() \ - do \ -if (atomic_
[PATCH htl v4 4/4] elf: Remove THREAD_GSCOPE_IN_TCB
All the ports now have THREAD_GSCOPE_IN_TCB set to 1. Remove all support for !THREAD_GSCOPE_IN_TCB, along with the definition itself. Signed-off-by: Sergey Bugaev --- elf/dl-support.c | 3 --- sysdeps/aarch64/nptl/tls.h| 1 - sysdeps/alpha/nptl/tls.h | 1 - sysdeps/arc/nptl/tls.h| 1 - sysdeps/arm/nptl/tls.h| 1 - sysdeps/csky/nptl/tls.h | 1 - sysdeps/generic/ldsodefs.h| 6 -- sysdeps/generic/tls.h | 6 -- sysdeps/hppa/nptl/tls.h | 1 - sysdeps/i386/nptl/tls.h | 1 - sysdeps/ia64/nptl/tls.h | 1 - sysdeps/m68k/nptl/tls.h | 1 - sysdeps/mach/hurd/i386/tls.h | 2 -- sysdeps/microblaze/nptl/tls.h | 1 - sysdeps/mips/nptl/tls.h | 1 - sysdeps/nios2/nptl/tls.h | 1 - sysdeps/powerpc/nptl/tls.h| 1 - sysdeps/riscv/nptl/tls.h | 1 - sysdeps/s390/nptl/tls.h | 1 - sysdeps/sh/nptl/tls.h | 1 - sysdeps/sparc/nptl/tls.h | 1 - sysdeps/x86_64/nptl/tls.h | 1 - 22 files changed, 35 deletions(-) diff --git a/elf/dl-support.c b/elf/dl-support.c index a317459fae..02e2ed72f5 100644 --- a/elf/dl-support.c +++ b/elf/dl-support.c @@ -197,9 +197,6 @@ int _dl_stack_cache_lock; int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable; void (*_dl_init_static_tls) (struct link_map *) = &_dl_nothread_init_static_tls; #endif -#if !THREAD_GSCOPE_IN_TCB -int _dl_thread_gscope_count; -#endif struct dl_scope_free_list *_dl_scope_free_list; #ifdef NEED_DL_SYSINFO diff --git a/sysdeps/aarch64/nptl/tls.h b/sysdeps/aarch64/nptl/tls.h index 6e896207a6..72f22dc718 100644 --- a/sysdeps/aarch64/nptl/tls.h +++ b/sysdeps/aarch64/nptl/tls.h @@ -109,7 +109,6 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -# define THREAD_GSCOPE_IN_TCB 1 # define THREAD_GSCOPE_FLAG_UNUSED 0 # define THREAD_GSCOPE_FLAG_USED 1 # define THREAD_GSCOPE_FLAG_WAIT 2 diff --git a/sysdeps/alpha/nptl/tls.h b/sysdeps/alpha/nptl/tls.h index 4dbccc5249..6328112135 100644 --- a/sysdeps/alpha/nptl/tls.h +++ b/sysdeps/alpha/nptl/tls.h @@ -103,7 +103,6 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -#define THREAD_GSCOPE_IN_TCB 1 #define THREAD_GSCOPE_FLAG_UNUSED 0 #define THREAD_GSCOPE_FLAG_USED 1 #define THREAD_GSCOPE_FLAG_WAIT 2 diff --git a/sysdeps/arc/nptl/tls.h b/sysdeps/arc/nptl/tls.h index 95300fdd21..e269c0a7a5 100644 --- a/sysdeps/arc/nptl/tls.h +++ b/sysdeps/arc/nptl/tls.h @@ -111,7 +111,6 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -#define THREAD_GSCOPE_IN_TCB 1 #define THREAD_GSCOPE_FLAG_UNUSED 0 #define THREAD_GSCOPE_FLAG_USED 1 #define THREAD_GSCOPE_FLAG_WAIT 2 diff --git a/sysdeps/arm/nptl/tls.h b/sysdeps/arm/nptl/tls.h index 1bd11307ce..699c16acfb 100644 --- a/sysdeps/arm/nptl/tls.h +++ b/sysdeps/arm/nptl/tls.h @@ -100,7 +100,6 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -#define THREAD_GSCOPE_IN_TCB 1 #define THREAD_GSCOPE_FLAG_UNUSED 0 #define THREAD_GSCOPE_FLAG_USED 1 #define THREAD_GSCOPE_FLAG_WAIT 2 diff --git a/sysdeps/csky/nptl/tls.h b/sysdeps/csky/nptl/tls.h index 7a234041ed..b210dfcb76 100644 --- a/sysdeps/csky/nptl/tls.h +++ b/sysdeps/csky/nptl/tls.h @@ -127,7 +127,6 @@ typedef struct descr->member[idx] = (value) /* Get and set the global scope generation counter in struct pthread. */ -# define THREAD_GSCOPE_IN_TCB 1 # define THREAD_GSCOPE_FLAG_UNUSED 0 # define THREAD_GSCOPE_FLAG_USED 1 # define THREAD_GSCOPE_FLAG_WAIT 2 diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h index f01b601d63..fd67871f4b 100644 --- a/sysdeps/generic/ldsodefs.h +++ b/sysdeps/generic/ldsodefs.h @@ -495,10 +495,6 @@ struct rtld_global EXTERN struct __pthread **_dl_pthread_threads; __libc_rwlock_define (EXTERN, _dl_pthread_threads_lock) #endif - -#if !THREAD_GSCOPE_IN_TCB - EXTERN int _dl_thread_gscope_count; -#endif #ifdef SHARED }; # define __rtld_global_attribute__ @@ -1381,10 +1377,8 @@ __rtld_mutex_init (void) } #endif /* !PTHREAD_IN_LIBC */ -#if THREAD_GSCOPE_IN_TCB void __thread_gscope_wait (void) attribute_hidden; # define THREAD_GSCOPE_WAIT() __thread_gscope_wait () -#endif __END_DECLS diff --git a/sysdeps/generic/tls.h b/sysdeps/generic/tls.h index e86d70e6ce..f581c9a992 100644 --- a/sysdeps/generic/tls.h +++ b/sysdeps/generic/tls.h @@ -71,10 +71,4 @@ This macro returns the address of the DTV of the current thread. This normally is done using the thread register which points to the dtv or the TCB (from which the DTV can found). - - - THREAD_GSCOPE_IN_TCB - - This should be set to 1 if the global scope flag is stored within the TCB. -
Re: [PATCH htl v4 0/4] Rewrite THREAD_GSCOPE
On Wed, Sep 15, 2021 at 8:22 PM Samuel Thibault wrote: > Did you run make check to verify that you didn't break the world? :) > > (i.e. no more failures than before the patch) Not yet; it takes a long time here, I'll leave it running overnight and report tomorrow. By the way, for a project as important as glibc, I'd expect there'd be a CI bot that would constantly build and test the latest master, if not all submitted patches. But it looks like this hasn't been done yet? [0] [0]: https://sourceware.org/glibc/wiki/CICDDesign Sergey
On conscription, copyright assignment, and CVEs
Hello everyone. I'm likely getting conscripted for military service some time soon. When exactly, I don't yet know, but it could happen any day now. What does it have to do with the Hurd? Well, it most likely means I will be offline (and so, unable to contribute or to read and respond to any messages) for a whole year; so I want to take care of some unfinished matters. My FSF copyright assignment is still unfinished, and I'm starting to doubt it will ever be. I haven't received a reply from the FSF person I was communicating with for the last two months. As I've stated previously, it's partly my own fault that the process is taking so long, for I have also been very slow to respond to them (in one case). But no matter whose fault it is, it looks like the process will require some more back-and-forth iterations/roundtrips, which are unlikely to happen fast enough to complete before I get conscripted. Recently, several people have asked me what's up with getting official CVEs for those Hurd vulnerabilities I've written about previously. Truth is, I don't really know how this works! Back in May, Amos Jeffries has kindly offered to help me with the CVE process; but we got stuck at exchanging GPG keys, and I haven't heard from him since June. I don't know if Amos is still interested, or if I should seek help elsewhere; but in any case, it's been two months since the fixes have been published. Everybody should have had plenty of time to upgrade. It's also been possible for any attackers to infer what the vulnerabilities were from the patches, which are publicly accessible (if not in the main Hurd tree). I think it would make sense now for me to just publish the details of what the vulnerabilities were. It should be an interesting read for everyone, and it would hopefully help with the CVE process somewhat (assuming someone would be interested in it, perhaps they even would be able to complete the process in my absence?). And also I expect to forget the details in a year's time (I must have already forgotten some!), so I better do it now rather than afterwards. So, if anybody knows of a reason I shall not do this, speak now or forever hold your peace! :) Sergey
Re: gnumach RPC: get info about the calling task
On Tue, Oct 12, 2021 at 3:37 PM Joan Lledó wrote: > Hi, > > I'm working on the gnumach rpc to return a pager from a task and an > address mapped in that task. Cool! What's the RPC signature? > For security reasons I'd like to check if the calling task is the same > as the one the given map belongs to. But I don't know how to do it. Please don't do this. This is very much against how capability systems, and Mach / Hurd in particular, are designed. In capability systems, it should never ever matter *who's* making a call; the only important thing is whether they have a relevant capability or not. In other words, this makes Mach RPC caller-transparent; it matters which messages are sent to which destination, but not who sends them. This (beautiful!) property/invariant is what allows you to manipulate any task that you have a task port of, not just your own task. This is used, for example, by the exec server to destroy and create threads and manipulate memory mappings of the task being exec'd. This is also used to manipulate the just-created "child" task in fork (), and likely in many other places. Finally, consider rpctrace: it intercepts, traces, and re-sends every single message coming from or to the traced task. If the kernel would check who the caller is, it would see that the caller is rpctrace and not the task, and that would presumably affect its behavior somehow (such as make it reject some calls). The whole idea of rpctrace is predicated on the invariant that the caller identity *doesn't* affect the servers' behavior. So in the case of vm_map-backed pager, it should matter whether you have a task port to the target task, not whether you *are* the target task. If someone has a task port to a task, it allows them to completely control the task anyway (including making it invoke any RPC); there's no additional security gains from checking who the caller is, but there will be additional breakage. > In the rpc implementation, the function receives a vm_map_t parameter. > How can I get the task from the vm_map_t? You'd declare the argument as task_t instead of vm_task_t; that would make the kernel-side C handler accept a task_t instead of a vm_map_t. The user code passes a task port in either case. > and how can I compare it with > the calling task to check whether it's the same? You can get the current (calling) task with the current_task () macro (see kern/thread.h). Or in fact you could get the vm_map directly by using the current_map () macro from the same file; then you could compare that against the vm_map_t you already have; so you wouldn't even need to do what I've said above (make your handler to accept a full task_t). But again, please don't do this. Sergey