[PATCH unionfs 3/3] Don't use strncat() with length derived from source string

2021-04-26 Thread Sergey Bugaev
This fixes Wstringop-overflow and Wstringop-truncation GCC warnings.
See https://gcc.gnu.org/bugzilla//show_bug.cgi?id=88059

Also, fix a bug where a string was not properly null-terminated.
---
 lib.c  | 4 ++--
 stow.c | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib.c b/lib.c
index 01cdbd0..717979b 100644
--- a/lib.c
+++ b/lib.c
@@ -151,8 +151,8 @@ make_filepath (char *path, char *filename)
   if (filepath == NULL) 
 return NULL;
   
-  strncpy (filepath, path, length);
-  strncat (filepath, filename, strlen (filename));
+  strcpy (filepath, path);
+  strcat (filepath, filename);
 
   return filepath;
 }
diff --git a/stow.c b/stow.c
index 812d33b..adfcf53 100644
--- a/stow.c
+++ b/stow.c
@@ -282,14 +282,15 @@ stow_diradd (char *dir, int flags, struct patternlist 
*patternlist,
 {
   char *tmp;
 
-  tmp = (char *) malloc (dir_len + 1);
+  tmp = (char *) malloc (dir_len + 2);
 
   if (tmp == NULL)
return ENOMEM;
 
-  strncpy (tmp, dir, dir_len);
+  strcpy (tmp, dir);
 
   tmp[dir_len] = '/';
+  tmp[dir_len + 1] = 0;
 
   dir = tmp;
 }
-- 
2.31.1




[PATCH unionfs 2/3] Add some missing includes

2021-04-26 Thread Sergey Bugaev
This fixes multiple build errors.
---
 lib.c| 1 +
 lib.h| 1 +
 lnode.c  | 1 +
 lnode.h  | 1 +
 node.c   | 1 +
 stow.c   | 2 ++
 ulfs.c   | 1 +
 update.c | 1 +
 8 files changed, 9 insertions(+)

diff --git a/lib.c b/lib.c
index e7f3aa9..01cdbd0 100644
--- a/lib.c
+++ b/lib.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "lib.h"
diff --git a/lib.h b/lib.h
index 095adaa..da01f4a 100644
--- a/lib.h
+++ b/lib.h
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Returned directory entries are aligned to blocks this many bytes
long.  Must be a power of two.  */
diff --git a/lnode.c b/lnode.c
index 427bc3c..4287489 100644
--- a/lnode.c
+++ b/lnode.c
@@ -21,6 +21,7 @@
nodes.  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/lnode.h b/lnode.h
index f5a50f7..0df3047 100644
--- a/lnode.h
+++ b/lnode.h
@@ -24,6 +24,7 @@
 
 #include 
 #include 
+#include 
 
 struct lnode
 {
diff --git a/node.c b/node.c
index 19228f6..78b5eb8 100644
--- a/node.c
+++ b/node.c
@@ -19,6 +19,7 @@
 
 /* node management.  */
 
+#include 
 #include 
 #include 
 #include 
diff --git a/stow.c b/stow.c
index a04ffdf..812d33b 100644
--- a/stow.c
+++ b/stow.c
@@ -21,7 +21,9 @@
 /* Stow mode for unionfs.  */
 
 #include 
+#include 
 #include 
+#include 
 
 #include "ulfs.h"
 #include "lib.h"
diff --git a/ulfs.c b/ulfs.c
index 50affa6..773ebc4 100644
--- a/ulfs.c
+++ b/ulfs.c
@@ -19,6 +19,7 @@
 
 /* Underlying filesystem management.  */
 
+#include 
 #include 
 #include 
 #include 
diff --git a/update.c b/update.c
index 113b25c..f416d5e 100644
--- a/update.c
+++ b/update.c
@@ -21,6 +21,7 @@
root node update.  */
 
 #include 
+#include 
 #include 
 
 #include "ncache.h"
-- 
2.31.1




[PATCH unionfs 0/3] Fix unionfs build errors

2021-04-26 Thread Sergey Bugaev
Hello,

I'm trying to use unionfs [0]; but I've run into a few build
errors and warnings. With the following patches it builds and
works on the latest Debian GNU/Hurd with GCC 10.

P.S. This is the first time I attempt sending patches over email,
please let me know if I've done anything wrong.

[0]: https://www.gnu.org/software/hurd/hurd/translator/unionfs.html

Sergey Bugaev (3):
  Implement MiG intran payload support
  Add some missing includes
  Don't use strncat() with length derived from source string

 lib.c|  5 +++--
 lib.h|  1 +
 lnode.c  |  1 +
 lnode.h  |  1 +
 node.c   |  1 +
 stow-mutations.h |  1 +
 stow-priv.h  |  4 
 stow.c   | 16 ++--
 ulfs.c   |  1 +
 update.c |  1 +
 10 files changed, 28 insertions(+), 4 deletions(-)

-- 
2.31.1




[PATCH unionfs 1/3] Implement MiG intran payload support

2021-04-26 Thread Sergey Bugaev
This fixes a build error.
---
 stow-mutations.h | 1 +
 stow-priv.h  | 4 
 stow.c   | 9 +
 3 files changed, 14 insertions(+)

diff --git a/stow-mutations.h b/stow-mutations.h
index d36280d..95c1e7f 100644
--- a/stow-mutations.h
+++ b/stow-mutations.h
@@ -21,6 +21,7 @@
 /* Only CPP macro definitions should go in this file. */
 
 #define FS_NOTIFY_INTRAN stow_notify_t begin_using_notify_port (fs_notify_t)
+#define FS_NOTIFY_INTRAN_PAYLOAD stow_notify_t begin_using_notify_port_payload
 #define FS_NOTIFY_DESTRUCTOR end_using_notify_port (stow_notify_t)
 
 #define FS_NOTIFY_IMPORTS import "stow-priv.h";
diff --git a/stow-priv.h b/stow-priv.h
index 2212ac9..eeae7b3 100644
--- a/stow-priv.h
+++ b/stow-priv.h
@@ -36,6 +36,10 @@ typedef struct stow_notify *stow_notify_t;
arranges for this to happen for the fs_notify interfaces. */
 stow_notify_t begin_using_notify_port (fs_notify_t port);
 
+/* Called by MiG to translate ports into stow_notify_t when using the
+   protected payload feature.  mutations.h arranges for this to happen
+   for the fs_notify interfaces. */
+stow_notify_t begin_using_notify_port_payload (unsigned long payload);
 
 /* Called by MiG after server routines have been run; this balances
begin_using_notify_port, and is arranged for the fs_notify
diff --git a/stow.c b/stow.c
index cf6366e..a04ffdf 100644
--- a/stow.c
+++ b/stow.c
@@ -167,6 +167,15 @@ begin_using_notify_port (fs_notify_t port)
   return ports_lookup_port (stow_port_bucket, port, stow_port_class);
 }
 
+/* Called by MiG to translate ports into stow_notify_t when using the
+   protected payload feature.  mutations.h arranges for this to happen
+   for the fs_notify interfaces. */
+stow_notify_t
+begin_using_notify_port_payload (unsigned long payload)
+{
+  return ports_lookup_payload (stow_port_bucket, payload, stow_port_class);
+}
+
 /* Called by MiG after server routines have been run; this balances
begin_using_notify_port, and is arranged for the fs_notify
interfaces by mutations.h. */
-- 
2.31.1




Re: [PATCH unionfs 3/3] Don't use strncat() with length derived from source string

2021-04-26 Thread Sergey Bugaev
On Mon, Apr 26, 2021 at 9:02 PM Jessica Clarke  wrote:
> > -  strncpy (filepath, path, length);
> > -  strncat (filepath, filename, strlen (filename));
> > +  strcpy (filepath, path);
> > +  strcat (filepath, filename);
>
> This is dubious. We should be using safe interfaces where possible.

To expand a bit on my (an GCC's) reasoning, it makes no sense to use
strncpy/strncat with length that is derived from the *source* string
length, such as

strncat (filepath, filename, strlen (filename));

because str[n]cat would never copy more than strlen (filename) bytes
from filename anyway. The intended/safe way to use strncpy is by
specifying the *destination* buffer size, e.g.

strcpy (buffer, source, sizeof buffer);

I could have done just that; but then it makes little sense when the
buffer size is, too, directly derived from the source string length,
as in

int dir_len;
dir_len = strlen(dir);
char *tmp;
tmp = (char *) malloc (dir_len + 1);
strncpy (tmp, dir, dir_len);

But anyway, I don't care enough about this rather harmless warning to
argue further.

The actually important part here was the following fix (which I'm
going to use as an excuse to try out the "scissors" feature):

-- >8 --
Subject: [PATCH unionfs v2 3/3] Properly null-terminate a string

---
 stow.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/stow.c b/stow.c
index 812d33b..ddbcf20 100644
--- a/stow.c
+++ b/stow.c
@@ -281,16 +281,17 @@ stow_diradd (char *dir, int flags, struct
patternlist *patternlist,
   if (dir[dir_len - 1 ] != '/')
 {
   char *tmp;

-  tmp = (char *) malloc (dir_len + 1);
+  tmp = (char *) malloc (dir_len + 2);

   if (tmp == NULL)
return ENOMEM;

   strncpy (tmp, dir, dir_len);

   tmp[dir_len] = '/';
+  tmp[dir_len + 1] = 0;

   dir = tmp;
 }
-- 

Sergey



Re: [PATCH unionfs 3/3] Don't use strncat() with length derived from source string

2021-04-27 Thread Sergey Bugaev
On Mon, Apr 26, 2021 at 11:10 PM Samuel Thibault
 wrote:
> Err, but wouldn't the compiler be able to determine that the size was
> properly computed, and avoid emitting a false-positive warning?

It is my understanding, based on
https://gcc.gnu.org/bugzilla//show_bug.cgi?id=88059, that GCC does not
do any sophisticated analysis here, and just warns about any case
where the specified length depends on the source size. Which makes
sense to me, because either the destination buffer size depends on the
source string length, in which case you can be sure it fits and don't
need strncpy, or it does not depend on the source string length, in
which case the string might not fit and you'd use strncpy, passing the
destination buffer size.

> The compiler emitting a warning looks to me like a sign that there might
> really be something subtly wrong. Actually sending the output of the
> compiler on the list would help us to know more about it.

Here are the warnings I get (gcc --version is gcc (Debian
10.2.1-6+hurd.1) 10.2.1 20210110):

lib.c: In function ‘make_filepath’:
lib.c:154:3: warning: ‘strncpy’ specified bound depends on the length
of the source argument [-Wstringop-overflow=]
  154 |   strncpy (filepath, path, length);
  |   ^~~~
lib.c:149:12: note: length computed here
  149 |   length = strlen (path) + strlen (filename) + 2;
  |^
lib.c:155:3: warning: ‘strncat’ specified bound depends on the length
of the source argument [-Wstringop-overflow=]
  155 |   strncat (filepath, filename, strlen (filename));
  |   ^~~
options.c: In function ‘argp_parse_common_options’:
options.c:77:5: warning: variable ‘ulfs_match’ set but not used
[-Wunused-but-set-variable]
   77 | ulfs_match = 0, ulfs_priority = 0;
  | ^~
stow.c: In function ‘stow_diradd’:
stow.c:290:7: warning: ‘strncpy’ output truncated before terminating
nul copying as many bytes from a string as its length
[-Wstringop-truncation]
  290 |   strncpy (tmp, dir, dir_len);
  |   ^~~
stow.c:275:13: note: length computed here
  275 |   dir_len = strlen(dir);
  | ^~~

The last one is indeed a false positive, because we (with my previous
patch merged) now null-terminate the resulting string explicitly.

-- 
Sergey



[RFC PATCH tarfs 5/6] Attempt to implement mmap coherence

2021-04-29 Thread Sergey Bugaev
When reading from or writing to a node that has an attached pager,
use pager_memcpy () instead of accessing the cache directly. This
enables tarfs_read_node () and tarfs_write_node () too see and affect
the changes made to the file through the memory mapping.
---
 tarfs.c | 89 -
 1 file changed, 76 insertions(+), 13 deletions(-)

diff --git a/tarfs.c b/tarfs.c
index 3b9bcf590..5195d5771 100644
--- a/tarfs.c
+++ b/tarfs.c
@@ -759,11 +759,43 @@ tarfs_lookup_node (struct node** node, struct node* dir, 
const char* name)
 error_t
 tarfs_read_node (struct node *node, off_t offset, size_t *len, void* data)
 {
+  struct pager *pager;
+  memory_object_t memobj;
+  error_t err;
+
   if (S_ISDIR (node->nn_stat.st_mode))
   {
 *len = 0;
 return EISDIR;
   }
+
+  if (node->nn->hardlink)
+node = node->nn->hardlink;
+
+  if (node->nn_stat.st_size <= offset)
+{
+  *len = 0;
+  return 0;
+}
+  if (node->nn_stat.st_size - offset < *len)
+*len = node->nn_stat.st_size - offset;
+
+  /* If we have a pager, always go through it.
+ The pager will call into the cache if needed. */
+  pager = NODE_INFO(node)->pager;
+  if (pager)
+{
+  ports_port_ref (pager);
+  pthread_mutex_unlock (&node->lock);
+  memobj = pager_get_port (pager);
+  mach_port_insert_right (mach_task_self (), memobj, memobj,
+  MACH_MSG_TYPE_MAKE_SEND);
+  err = pager_memcpy (pager, memobj, offset, data, len, VM_PROT_READ);
+  mach_port_deallocate (mach_task_self (), memobj);
+  pthread_mutex_lock (&node->lock);
+  ports_port_deref (pager);
+  return err;
+}
   else
 return cache_read (node, offset, *len, data, len);
 }
@@ -774,27 +806,54 @@ error_t
 tarfs_write_node (struct node *node, off_t offset, size_t *len, void *data)
 {
   IF_RWFS;
+  struct node *original_node;
+  struct pager *pager;
+  memory_object_t memobj;
+  error_t err;
 
   if (S_ISDIR (node->nn_stat.st_mode))
   {
 *len = 0;
 return EISDIR;
   }
-  else
+
+  original_node = node;
+  if (node->nn->hardlink)
+node = node->nn->hardlink;
+
+  /* First, grow the node if we need to. This would happen automatically
+ for the cache path below, but not for the pager path. */
+  if (offset + *len > node->nn_stat.st_size)
   {
-error_t err;
-/* Checks whether we need to actually write to another node.
-   (hard links are not handled by cache_write ()).  */
-struct node *what = node->nn->hardlink ? node->nn->hardlink : node;
-
+err = cache_set_size (node, offset + *len);
+if (err)
+  return err;
+  }
+
+  /* If we have a pager, always go through it.
+ The pager will call into the cache if needed. */
+  pager = NODE_INFO(node)->pager;
+  if (pager)
+{
+  ports_port_ref (pager);
+  pthread_mutex_unlock (&node->lock);
+  memobj = pager_get_port (pager);
+  mach_port_insert_right (mach_task_self (), memobj, memobj,
+  MACH_MSG_TYPE_MAKE_SEND);
+  err = pager_memcpy (pager, memobj, offset, data, len,
+  VM_PROT_READ|VM_PROT_WRITE);
+  mach_port_deallocate (mach_task_self (), memobj);
+  pthread_mutex_lock (&node->lock);
+  ports_port_deref (pager);
+}
+  else
 err = cache_write (node, offset, data, *len, len);
 
-/* Synchronize stat with hard link's target.  */
-if ((! err) && (what != node))
-  node->nn_stat.st_size = what->nn_stat.st_size;
+  /* Synchronize stat with hard link's target.  */
+  if ((! err) && (original_node != node))
+original_node->nn_stat.st_size = node->nn_stat.st_size;
 
-return err;
-  }
+  return err;
 }
 
 /* Update NODE stat structure and mark it as dirty.  */
@@ -1035,7 +1094,6 @@ tarfs_io_map (struct node *node, memory_object_t *rdobj, 
memory_object_t *wrobj)
   return 0;
 }
 
-
 
 /* Rounds SIZE to the upper RECORDSIZE.  */
 static inline size_t
@@ -1171,7 +1229,12 @@ tarfs_sync_fs (int wait)
   char *path;
   size_t size;
 
-  /* Lock the node first */
+  /* If the node has a pager, ask it to flush
+ its state back into the cache. Do this before
+ locking the node. */
+  if (NODE_INFO(node)->pager)
+pager_sync (NODE_INFO(node)->pager, 1);
+
   pthread_mutex_lock (&node->lock);
   have_to_sync = (tar->offset != file_offs + RECORDSIZE);
   path = fs_get_path_from_root (netfs_root_node, node);
-- 
2.31.1




[RFC PATCH tarfs 2/6] Link to libpager and initialize it on startup

2021-04-29 Thread Sergey Bugaev
---
 Makefile |  2 +-
 main.c   | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index e3936bfda..6e2eb422e 100644
--- a/Makefile
+++ b/Makefile
@@ -29,7 +29,7 @@ CFLAGS += -DDEBUG_ZIP # zip stores debugging
 # Note: -lz has to be first otherwise inflate() will be the exec server's
 #   inflate function
 LDFLAGS = -L~ -lz -L. -lnetfs -lfshelp -liohelp -lports \
-  -lihash -lshouldbeinlibc -lpthread -lstore -lbz2
+  -lihash -lshouldbeinlibc -lpthread -lstore -lbz2 -lpager
 CTAGS   = ctags
 
 SRC = main.c netfs.c tarfs.c tarlist.c fs.c cache.c tar.c names.c \
diff --git a/main.c b/main.c
index 68268013f..90286de08 100644
--- a/main.c
+++ b/main.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -40,6 +41,8 @@ mach_port_t ul_node;
 /* Has to be defined for libnetfs...  */
 int netfs_maxsymlinks = 2;
 
+struct port_bucket *tarfs_pager_bucket;
+
 /* Main.  */
 int
 main (int argc, char **argv)
@@ -47,6 +50,7 @@ main (int argc, char **argv)
   struct argp fs_argp;
   mach_port_t bootstrap_port;
   struct iouser *user;
+  struct pager_requests *pager_requests;
   error_t err;
 
   /* Defaults to tarfs. */
@@ -68,6 +72,14 @@ main (int argc, char **argv)
 error (EXIT_FAILURE, err, "cannot create root node");
   ul_node = netfs_startup (bootstrap_port, 0);
 
+  /* Init libpager. */
+  tarfs_pager_bucket = ports_create_bucket ();
+  if (!tarfs_pager_bucket)
+error (EXIT_FAILURE, errno, "cannot create port bucket");
+  err = pager_start_workers (tarfs_pager_bucket, &pager_requests);
+  if (err)
+error (1, err, "cannot start libpager");
+
   for (;;)
 netfs_server_loop ();
 
-- 
2.31.1




[RFC PATCH tarfs 0/6] mmap support for tarfs

2021-04-29 Thread Sergey Bugaev
Hello,

I'm trying to implement mmap support in tarfs.

The reason I need it is I want to run some binaries directly form a tarfs
mount, without unpacking them to disk (ext2fs). ld.so uses mmap to load shared
libraries uncoditionally [0].

So I've been digging through libpager, libnetfs, tarfs, and other code, trying
to understand how things fit together and hook them up. The patch series I'm
attaching implements mmap by lazily creating a pager for each node. The pager
reads and writes data the same way tarfs_read_node () and tarfs_write_node ()
do: namely, by calling into the existing cache implementation.

The most challenging part has been "mmap coherence" (perhaps there is an
established term for this concept that I'm not aware of?). Basically, the file
system has to guarantee that any modificaitons made to a file via io_write ()
(and other similar APIs) are immediately visible through any mappings of the
file, and similarly any changes made through a mapping have to be immediately
visible through io_read (). I peeked at how libdiskfs deals with this (since
this does work as expected with ext2fs in my testing), and it forwards any
reads/writes to pager_memcpy () instead of accessing the underlying data
directly. I've tried to do the same, except that I only forward the calls to
a pager if there is one.

I've written a tiny program to test mmap behavior (I've uploaded its source
here [1]), and used it to verify that my implementation behaves somewhat
sanely. I'm also able to load binaries and shared libraries from tarfs with
this patchset. In fact, I'm able to run code chrooted to tarfs.

Please help me out by commenting on the patches. Does the overall approach look
fine, or am I missing some crucial details? Have I messed up reference
counting? Is there a better way to deal with a held mutex that unlocking it
before calling a function that re-locks it? (Making the mutex recursive,
perhaps? Or extracting the inner part into its own function?)

[0]: https://www.gnu.org/software/hurd/open_issues/libnetfs_io_map.html
[1]: https://paste.gg/p/anonymous/7de6f56e7a1c4babbab17bd6ccc2db46

Sergey Bugaev (6):
  Plumb io_map () through the backend layer
  Link to libpager and initialize it on startup
  Implement basic read-only mmap support
  Implement basic support for writable mappings
  Attempt to implement mmap coherence
  Update TODO and BUGS

 BUGS  |   1 -
 Makefile  |   4 +-
 TODO  |   2 -
 backend.h |   3 ++
 main.c|  13 +
 netfs.c   |  36 +++---
 pager.c   | 141 ++
 pager.h   |   9 
 tarfs.c   | 114 ++-
 tarfs.h   |   2 +
 10 files changed, 302 insertions(+), 23 deletions(-)
 create mode 100644 pager.c
 create mode 100644 pager.h

-- 
2.31.1




[RFC PATCH tarfs 1/6] Plumb io_map () through the backend layer

2021-04-29 Thread Sergey Bugaev
---
 backend.h |  3 +++
 netfs.c   | 36 ++--
 tarfs.c   |  9 +
 3 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/backend.h b/backend.h
index 82bbee77b..0b281f686 100644
--- a/backend.h
+++ b/backend.h
@@ -110,6 +110,9 @@ struct fs_backend
  or S_IFCHR).  */
   error_t (* mkdev_node)   (struct node *node, mode_t type, dev_t indexes);
 
+  /* Returns memory objects for mapping this node. */
+  error_t (* io_map) (struct node *node, memory_object_t *rdobj, 
memory_object_t *wrobj);
+
   /* Free all resources associated to NODE.  */
   void (* free_node) (struct node *node);
 
diff --git a/netfs.c b/netfs.c
index 6ed7e3f39..6a6f14f94 100644
--- a/netfs.c
+++ b/netfs.c
@@ -821,14 +821,38 @@ netfs_S_file_syncfs (struct protid *user,
   return err;
 }
 
-/* The following stub has been added as a reminder.  */
-#if 0
 error_t
-netfs_S_io_map (struct protid *user, 
+netfs_S_io_map (struct protid *user,
mach_port_t *rdobj, mach_msg_type_name_t *rdobjtype,
mach_port_t *wrobj, mach_msg_type_name_t *wrobjtype)
 {
-  error (0, 0, "Warning: io_map () not supported");
-  return EOPNOTSUPP;
+  error_t err;
+  struct node *node;
+
+  if (!user)
+return EOPNOTSUPP;
+
+  if (!(user->po->openstat & O_READ))
+{
+  *rdobj = MACH_PORT_NULL;
+  rdobj = NULL;
+}
+  if (!(user->po->openstat & O_WRITE))
+{
+  *wrobj = MACH_PORT_NULL;
+  wrobj = NULL;
+}
+
+  if (rdobj || wrobj)
+{
+  node = user->po->np;
+  pthread_mutex_lock (&node->lock);
+  err = backend.io_map (node, rdobj, wrobj);
+  pthread_mutex_unlock (&node->lock);
+}
+
+  *rdobjtype = MACH_MSG_TYPE_MAKE_SEND;
+  *wrobjtype = MACH_MSG_TYPE_MAKE_SEND;
+
+  return err;
 }
-#endif
diff --git a/tarfs.c b/tarfs.c
index 1a01040a2..c1d9a5cc1 100644
--- a/tarfs.c
+++ b/tarfs.c
@@ -1010,6 +1010,14 @@ tarfs_mkdev_node (struct node *node, mode_t type, dev_t 
indexes)
   return EOPNOTSUPP;
 }
 
+error_t
+tarfs_io_map (struct node *node, memory_object_t *rdobj, memory_object_t 
*wrobj)
+{
+  error (0, 0, "Warning: io_map () not supported");
+  return EOPNOTSUPP;
+}
+
+
 
 /* Rounds SIZE to the upper RECORDSIZE.  */
 static inline size_t
@@ -1370,6 +1378,7 @@ struct fs_backend tarfs_backend =
   tarfs_link_node,
   tarfs_symlink_node,
   tarfs_mkdev_node,
+  tarfs_io_map,
 
   tarfs_free_node,
 
-- 
2.31.1




[RFC PATCH tarfs 3/6] Implement basic read-only mmap support

2021-04-29 Thread Sergey Bugaev
Each node can have a pager attached. The pager gets created
the first time io_map () is invoked on the node, and destroyed
once it has no clients. The pager gets its data the same way
tarfs_read_node () does: by ask the cache to copy out the data.

Note that this ommit only implement read-only mappings, and even
then it does nothing to provide coherency between what can be read
from the created mappings and subsequent modifications done to the
underlying file. The implementation will be enhanced in the following
commits.

This is enough for ld.so to successfully mmap shared libraries from
a tarfs instance.
---
 Makefile |   2 +-
 main.c   |   1 +
 pager.c  | 117 +++
 pager.h  |   9 +
 tarfs.c  |  22 ++-
 tarfs.h  |   2 +
 6 files changed, 150 insertions(+), 3 deletions(-)
 create mode 100644 pager.c
 create mode 100644 pager.h

diff --git a/Makefile b/Makefile
index 6e2eb422e..49b1950a9 100644
--- a/Makefile
+++ b/Makefile
@@ -33,7 +33,7 @@ LDFLAGS = -L~ -lz -L. -lnetfs -lfshelp -liohelp -lports \
 CTAGS   = ctags
 
 SRC = main.c netfs.c tarfs.c tarlist.c fs.c cache.c tar.c names.c \
-  store-bzip2.c store-gzip.c debug.c
+  store-bzip2.c store-gzip.c debug.c pager.c
 
 OBJ = $(SRC:%.c=%.o)
 
diff --git a/main.c b/main.c
index 90286de08..64b48a973 100644
--- a/main.c
+++ b/main.c
@@ -30,6 +30,7 @@
 #include 
 
 #include "backend.h"
+#include "pager.h"
 
 /* Choose the right backend here. */
 extern struct fs_backend tarfs_backend;
diff --git a/pager.c b/pager.c
new file mode 100644
index 0..ecd564020
--- /dev/null
+++ b/pager.c
@@ -0,0 +1,117 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "pager.h"
+#include "cache.h"
+#include "backend.h"
+
+error_t
+pager_read_page (struct user_pager_info *upi, vm_offset_t page,
+ vm_address_t *buf, int *writelock)
+{
+  error_t err;
+  struct node *node;
+  size_t actual;
+  void *data;
+
+  data = mmap (0, vm_page_size, PROT_READ|PROT_WRITE, MAP_ANON, 0, 0);
+  if (data == MAP_FAILED)
+return ENOSPC;
+  *buf = (vm_address_t) data;
+
+  node = (struct node *) upi;
+
+  pthread_mutex_lock (&node->lock);
+  err = cache_read (node, page, vm_page_size, data, &actual);
+  pthread_mutex_unlock (&node->lock);
+
+  if (err)
+{
+  munmap (data, vm_page_size);
+  return err;
+}
+
+  memset(data + actual, 0, vm_page_size - actual);
+  *writelock = 1;
+
+  return 0;
+}
+
+error_t
+pager_write_page (struct user_pager_info *upi,
+  vm_offset_t page,
+  vm_address_t buf)
+{
+  assert_backtrace (!"we only create read-only pages");
+}
+
+error_t
+pager_unlock_page (struct user_pager_info *upi,
+   vm_offset_t address)
+{
+  return EROFS;
+}
+
+void
+pager_notify_evict (struct user_pager_info *upi,
+vm_offset_t page)
+{
+  assert_backtrace (!"unrequested notification on eviction");
+}
+
+/* Tell how big the file is. */
+error_t
+pager_report_extent (struct user_pager_info *upi,
+ vm_address_t *offset,
+ vm_size_t *size)
+{
+  struct node *node;
+
+  node = (struct node *) upi;
+  *offset = 0;
+  *size = node->nn_stat.st_size;
+  return 0;
+}
+
+error_t
+create_pager (struct node *node, memory_object_t *obj)
+{
+  struct pager *pager;
+
+  /* The pager holds a hard reference on the node. */
+  netfs_nref (node);
+  pager = pager_create ((struct user_pager_info *) node, tarfs_pager_bucket,
+1, MEMORY_OBJECT_COPY_NONE, 0);
+  if (!pager)
+return errno;
+
+  NODE_INFO(node)->pager = pager;
+
+  *obj = pager_get_port (pager);
+  ports_port_deref (pager);
+
+  return 0;
+}
+
+void
+pager_clear_user_data (struct user_pager_info *upi)
+{
+  struct node *node;
+
+  node = (struct node *) upi;
+  NODE_INFO(node)->pager = NULL;
+  netfs_nrele (node);
+}
+
+void
+pager_dropweak (struct user_pager_info *upi)
+{
+}
+
diff --git a/pager.h b/pager.h
new file mode 100644
index 0..34447c8c9
--- /dev/null
+++ b/pager.h
@@ -0,0 +1,9 @@
+#ifndef PAGER_H
+#define PAGER_H
+
+struct user_pager_info;
+extern struct port_bucket *tarfs_pager_bucket;
+
+error_t create_pager (struct node *node, memory_object_t *obj);
+
+#endif  /* PAGER_H_ */
diff --git a/tarfs.c b/tarfs.c
index c1d9a5cc1..2f5c53dad 100644
--- a/tarfs.c
+++ b/tarfs.c
@@ -18,6 +18,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -36,6 +37,7 @@
 #include "cache.h"
 #include "zipstores.h"
 #include "debug.h"
+#include "pager.h"
 
 /* New netfs variables */
 char *netfs_server_name = "tarfs";
@@ -1013,8 +1015,24 @@ tarfs_mkdev_node (struct node *node, mode_t type, dev_t 
indexes)
 error_t
 tarfs_io_map (struct node *node, memory_object_t *rdobj, memory_object_t 
*wrobj)
 {
-  error (0, 0, "Warning: io_map () not supported");
-  return EOPNOTSUPP;
+  error_t err;
+  memory_object_t obj;
+
+  if (!NODE_INFO(nod

[RFC PATCH tarfs 4/6] Implement basic support for writable mappings

2021-04-29 Thread Sergey Bugaev
No mmap coherence is implemented yet.
---
 pager.c | 30 +++---
 tarfs.c |  2 +-
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/pager.c b/pager.c
index ecd564020..8756a2a4b 100644
--- a/pager.c
+++ b/pager.c
@@ -11,6 +11,9 @@
 #include "pager.h"
 #include "cache.h"
 #include "backend.h"
+#include "tarfs.h"
+
+extern struct tarfs_opts tarfs_options;
 
 error_t
 pager_read_page (struct user_pager_info *upi, vm_offset_t page,
@@ -39,7 +42,7 @@ pager_read_page (struct user_pager_info *upi, vm_offset_t 
page,
 }
 
   memset(data + actual, 0, vm_page_size - actual);
-  *writelock = 1;
+  *writelock = tarfs_options.readonly;
 
   return 0;
 }
@@ -49,14 +52,35 @@ pager_write_page (struct user_pager_info *upi,
   vm_offset_t page,
   vm_address_t buf)
 {
-  assert_backtrace (!"we only create read-only pages");
+  error_t err;
+  struct node *node;
+  size_t to_write, actual;
+
+  node = (struct node *) upi;
+
+  pthread_mutex_lock (&node->lock);
+
+  if (page + vm_page_size > node->nn_stat.st_size)
+to_write = node->nn_stat.st_size - page;
+  else
+to_write = vm_page_size;
+  err = cache_write (node, page, (void *) buf, to_write, &actual);
+
+  pthread_mutex_unlock (&node->lock);
+
+  if (!err && actual < vm_page_size)
+err = EIO;
+
+  vm_deallocate (mach_task_self (), buf, vm_page_size);
+
+  return err;
 }
 
 error_t
 pager_unlock_page (struct user_pager_info *upi,
vm_offset_t address)
 {
-  return EROFS;
+  return tarfs_options.readonly ? EROFS : 0;
 }
 
 void
diff --git a/tarfs.c b/tarfs.c
index 2f5c53dad..3b9bcf590 100644
--- a/tarfs.c
+++ b/tarfs.c
@@ -1030,7 +1030,7 @@ tarfs_io_map (struct node *node, memory_object_t *rdobj, 
memory_object_t *wrobj)
   if (rdobj)
 *rdobj = obj;
   if (wrobj)
-*wrobj = MACH_PORT_NULL;
+*wrobj = tarfs_options.readonly ? MACH_PORT_NULL : obj;
 
   return 0;
 }
-- 
2.31.1




[RFC PATCH tarfs 6/6] Update TODO and BUGS

2021-04-29 Thread Sergey Bugaev
---
 BUGS | 1 -
 TODO | 2 --
 2 files changed, 3 deletions(-)

diff --git a/BUGS b/BUGS
index 1c946b366..4b88e85d1 100644
--- a/BUGS
+++ b/BUGS
@@ -4,7 +4,6 @@ Known tarfs bugs
 
 1. General
 
-* io_map () not implemented (gcc uses it unfortunately).
 * netfs.c (netfs_get_dirents): too much memory may be allocated (mmap'd).
 
 
diff --git a/TODO b/TODO
index c1c826167..4c5a1531f 100644
--- a/TODO
+++ b/TODO
@@ -9,7 +9,5 @@ TODO List
   translator set on this node.  For symlinks, we would not show any passive
   translator setting.
 
-* Implement netfs_S_io_map ().
-
 * Update the gzip store to make it more robust (for instance, find *real*
   specifications of the header format).
-- 
2.31.1




Re: [RFC PATCH tarfs 0/6] mmap support for tarfs

2021-05-01 Thread Sergey Bugaev
On Sat, May 1, 2021 at 6:38 PM Samuel Thibault  wrote:
> Actually I'd say the pager should replace the cache. The pager is
> already a cache by itself, we should not need to keep both the pager and
> the cache, particularly since it means having to keep both coherent.

Well, yes, I've considered that; but I've tried to keep the changes
less invasive than that. I'll take a stab at it if you think this is
the way to go.

This actually brings me to a question: why is tarfs using netfs over
diskfs? I see that "a translator which shows a .tar archive in a
unpacked way" is mentioned in the wiki [0] as a motivating example of
what netfs should be used for, so there must be a good reason. Yet
from what I see, libnetfs is suited for filesystems that are either
served from a remote location (the net in netfs) or just synthesized
on the fly; and on the other hand, the tar format, with its 512-byte
blocks, sounds very much like a filesystem image to me. isofs uses
diskfs, why doesn't tarfs?

[0]: https://www.gnu.org/software/hurd/hurd/libnetfs.html

> > Is there a better way to deal with a held mutex that unlocking it
> > before calling a function that re-locks it?
>
> Yes, avoiding the situation. If you unlock a mutex, anything can then
> happen, and you can easily get to an incoherent state.

I understand; if some code drops a mutex and then re-acquires it, it
has to be prepared that everything might have changed in the meantime.
Which is admittedly something I have not done for this prototype.

> > (Making the mutex recursive, perhaps?
>
> That's usually frowned upon because it's often a sign that you don't
> know at various places whether you hold the lock or not.
>
> > Or extracting the inner part into its own function?)
>
> Yes, in some cases that makes sense.

See, I cannot easily do either (make the mutex recursive or extract
the logic). The flow goes something like this:

1. S_io_write ()
2. lock the node, validate size/offset, grow it if needed
3. pager_memcpy (), fault
4. another thread (!) gets to pager_{read,write}_page ()

This is why the mutex cannot be made recursive: it's being grabbed
from the other thread. And this is also why I cannot just extract the
logic: I'm not calling the logic inside pager.c directly, I'm calling
pager_memcpy (), which faults and *that* causes
pager_{read,write}_page () to be called the same way it'd be called
for any other task faulting on the mapping. And
pager_{read,write}_page () naturally has to lock, to validate the
size, access the cache, and whatnot.

Perhaps you can think of a solution? How does diskfs cope with this?
(I see that _diskfs_rdwr_internal () still has the node locked while
calling pager_memcpy (), how come that doesn't deadlock?)

-- 
Sergey



Re: [RFC PATCH tarfs 3/6] Implement basic read-only mmap support

2021-05-01 Thread Sergey Bugaev
On Sat, May 1, 2021 at 6:23 PM Samuel Thibault  wrote:
> This is missing the copyright head:ng.

Ah yes, sorry. Should I just put myself there, or "Hurd developers",
or "Free Software Foundation, Inc."?..

> > +  err = create_pager (node, &obj);
>
> Is the pager getting cleaned at some point?

The idea was that NODE_INFO(node)->pager does *not* itself keep a hard
reference on the pager (create_pager () calls ports_port_deref (pager)
after pager_get_port ()), so once all the pager clients are gone, the
memory object would get garbage-collected, which would call
pager_clear_user_data (), which sets NODE_INFO(node)->pager = NULL.

But I was unable to test this in practice: Mach doesn't seem to want
to do memory_object_terminate () even after the clients are gone.
Perhaps it wants to keep the page cached while it has enough free
memory.

-- 
Sergey



Re: [RFC PATCH tarfs 0/6] mmap support for tarfs

2021-05-01 Thread Sergey Bugaev
On Sat, May 1, 2021 at 7:38 PM Samuel Thibault  wrote:
> > on the other hand, the tar format, with its 512-byte
> > blocks, sounds very much like a filesystem image to me. isofs uses
> > diskfs, why doesn't tarfs?
>
> It's not exactly the same since you have compression in the way. But
> yes, that looks similar enough.

Does tar actually do any compression? I was under the impression that
tar is about just sticking multiple files together, and then you'd
apply gzip or bzip2 on top if you want compression. And in case of
tarfs, the compression is handled by the store abstraction, which
makes it transparent to the rest of the logic.

> Diskfs' pager_read_page does *not* have to lock the node, it just reads
> and returns the data. That's again a point where you see that having
> also the cache is in the way rather than helping.

Hmm, but doesn't the page reading/writing implementation need to
access the file size, atime/mtime, etc.? All of which may be changed
concurrently.

-- 
Sergey



Re: [RFC PATCH tarfs 0/6] mmap support for tarfs

2021-05-01 Thread Sergey Bugaev
On Sat, May 1, 2021 at 8:05 PM Samuel Thibault  wrote:
> > And in case of tarfs, the compression is handled by the store
> > abstraction, which makes it transparent to the rest of the logic.
>
> Ah, ok.
> And in the zip case?

As I understand it, zip is a different model that basically combines
functionality of tar and gzip; you cannot just zip a blob, you need a
file tree. So it's not relevant for compressing ar files.

tarfs can be run with --bzip2 and --gzip options, but not with --zip.

> > Hmm, but doesn't the page reading/writing implementation need to
> > access the file size, atime/mtime, etc.? All of which may be changed
> > concurrently.
>
> IIRC it isn't handled by the pager itself.

Is it? I don't think pager (as in libpager) knows anything about
atime/mtime and other filesystem-specific cruft. The pager knows the
size; but the callbacks need to know the size too, to decide how much
to read/write in case of the last page.

(To quote the Linux manual for mmap: "For a file that is not a
multiple of the page size, the remaining bytes in the partial page at
the end of the mapping are zeroed when mapped, and modifications to
that region are not written out to the file.")

-- 
Sergey



Re: [RFC PATCH tarfs 3/6] Implement basic read-only mmap support

2021-05-01 Thread Sergey Bugaev
On Sat, May 1, 2021 at 8:48 PM Samuel Thibault  wrote:
> Since this is not part of the main hurd repo there is no copyright
> assignment requirement, so you can put your name. You can also assign
> copyright and then put the FSF name.

OK, I'll put my own name for now then.

I'm fine with assigning copyright to FSF if that's required for
contributing to the main repo; but I've heard it's a long complicated
process that involves physically sending papeps overseas & getting an
ack from my employer. Is that true? Could you (or somebody) guide me
through it?

> > The idea was that NODE_INFO(node)->pager does *not* itself keep a hard
> > reference on the pager (create_pager () calls ports_port_deref (pager)
> > after pager_get_port ()), so once all the pager clients are gone, the
> > memory object would get garbage-collected, which would call
> > pager_clear_user_data (), which sets NODE_INFO(node)->pager = NULL.
>
> Does that also destroy the pager?

Doesn't it?

Here's my understanding of how this works (or should work):

* Ports (as in struct port_info) are reference counted. One can hold
explicit references on them (with ports_port_[de]ref ()); a reference
is also acquired and dropped implicitly by MIG intran/outtran. The
port also gets ref'ed when you do ports_get_right (); this reference
is then automatically dropped once there are no longer any send rights
left. (I don't quite understand how that last one works, yet.) So
overall, a port is alive while it's either referenced by something
in-process, or has clients.
* Pagers from libpager use this mechanism. Once nothing in the process
references a pager and all the send rights to it are gone (so, after
memory_object_terminate ()), the pager gets destroyed by the reference
counting mechanism. It is then when pager_clear_user_data () is
called, among other things.
* In my implementation, nothing in the tarfs process holds references
on pagers, so each pager should get destroyed as there are no send
rights to it, i.e. as soon as Mach terminates it.

-- 
Sergey



[PATCH 1/6] libpager: Fix mixing up success and error

2021-05-06 Thread Sergey Bugaev
_pager_pagemap_resize () returns an error or 0 on success,
not a boolean.
---
 libpager/offer-page.c | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/libpager/offer-page.c b/libpager/offer-page.c
index 9f090bcb..ddea236a 100644
--- a/libpager/offer-page.c
+++ b/libpager/offer-page.c
@@ -31,21 +31,22 @@ pager_offer_page (struct pager *p,
   pthread_mutex_lock (&p->interlock);
 
   if (_pager_pagemap_resize (p, offset + vm_page_size))
+goto release_out;
+
+  short *pm_entry = &p->pagemap[offset / vm_page_size];
+
+  while (*pm_entry & PM_INCORE)
 {
-  short *pm_entry = &p->pagemap[offset / vm_page_size];
-
-  while (*pm_entry & PM_INCORE)
-   {
- pthread_mutex_unlock (&p->interlock);
- pager_flush_some (p, offset, vm_page_size, 1);
- pthread_mutex_lock (&p->interlock);
-   }
-  *pm_entry |= PM_INCORE;
-
-  memory_object_data_supply (p->memobjcntl, offset, buf, vm_page_size, 0,
-writelock ? VM_PROT_WRITE : VM_PROT_NONE, 
-precious, MACH_PORT_NULL);
+  pthread_mutex_unlock (&p->interlock);
+  pager_flush_some (p, offset, vm_page_size, 1);
+  pthread_mutex_lock (&p->interlock);
 }
+  *pm_entry |= PM_INCORE;
+
+  memory_object_data_supply (p->memobjcntl, offset, buf, vm_page_size, 0,
+ writelock ? VM_PROT_WRITE : VM_PROT_NONE,
+ precious, MACH_PORT_NULL);
 
+ release_out:
   pthread_mutex_unlock (&p->interlock);
 }
-- 
2.31.1




[PATCH 2/6] libpager: Do not flush in-core pages on offer

2021-05-06 Thread Sergey Bugaev
pager_offer_page () is documented to may ignore the offered page if
the kernel already has a copy in core (indeed, that's what Mach does).

However the current behavior is the inverse of that: it asks the kernel
to flush (i.e. drop) its in-core copy, and replace it with the offered
one. Fix this by not doing that, and calling memory_object_data_supply ()
directly.
---
 libpager/offer-page.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/libpager/offer-page.c b/libpager/offer-page.c
index ddea236a..0b0ca2eb 100644
--- a/libpager/offer-page.c
+++ b/libpager/offer-page.c
@@ -34,13 +34,6 @@ pager_offer_page (struct pager *p,
 goto release_out;
 
   short *pm_entry = &p->pagemap[offset / vm_page_size];
-
-  while (*pm_entry & PM_INCORE)
-{
-  pthread_mutex_unlock (&p->interlock);
-  pager_flush_some (p, offset, vm_page_size, 1);
-  pthread_mutex_lock (&p->interlock);
-}
   *pm_entry |= PM_INCORE;
 
   memory_object_data_supply (p->memobjcntl, offset, buf, vm_page_size, 0,
-- 
2.31.1




[PATCH 0/6] Various libpager fixes

2021-05-06 Thread Sergey Bugaev
While hacking further on tarfs mmap support, I've come across some
issues in libpager and I thought I'd fix them. The issues range from
plain bugs (such as pager_offer_page () always silently failing) to
minor things that could be improved.

The patches are fairly independent, but they touch a lot of the same
code, so they may be hard to apply separately. I can regenerate diffs
for individual patches if needed.

Most of the patches only modify internal workings of libpager; the
3rd patch slightly modifies the public API by adding return codes
to various functions that have previously returned void. This should
not break any callers though.

I have tested that the modified libpager builds without warnings, and
tmpfs seems to work as expected with libpager.so replaced.

Sergey Bugaev (6):
  libpager: Fix mixing up success and error
  libpager: Do not flush in-core pages on offer
  libpager: Add error handling to various functions
  libpager: Store pagemapsize as vm_size_t
  libpager: Fix overallocating pagemap
  libpager: Use libc heap for pagemap

 doc/hurd.texi   | 16 +--
 libpager/data-unlock.c  | 13 +
 libpager/lock-object.c  | 32 ++---
 libpager/object-terminate.c |  5 ++--
 libpager/offer-page.c   | 28 +--
 libpager/pagemap.c  | 27 +++---
 libpager/pager-attr.c   | 56 +
 libpager/pager-flush.c  | 21 --
 libpager/pager-return.c | 21 --
 libpager/pager-sync.c   | 19 +++--
 libpager/pager.h| 20 ++---
 libpager/priv.h |  6 ++--
 12 files changed, 143 insertions(+), 121 deletions(-)

-- 
2.31.1




[PATCH 3/6] libpager: Add error handling to various functions

2021-05-06 Thread Sergey Bugaev
Instead of ignoring errors, we might as well return them to the caller.

This technically changes the function signatures, but should not break
any users since they can continue to ignore the return value.
---
 doc/hurd.texi   | 16 ++--
 libpager/data-unlock.c  | 13 +-
 libpager/lock-object.c  | 32 +++
 libpager/offer-page.c   | 14 +++
 libpager/pager-attr.c   | 56 +++--
 libpager/pager-flush.c  | 21 +---
 libpager/pager-return.c | 21 +---
 libpager/pager-sync.c   | 19 --
 libpager/pager.h| 20 +++
 libpager/priv.h |  4 +--
 10 files changed, 125 insertions(+), 91 deletions(-)

diff --git a/doc/hurd.texi b/doc/hurd.texi
index f63cfcda..c8db86b5 100644
--- a/doc/hurd.texi
+++ b/doc/hurd.texi
@@ -1434,8 +1434,8 @@ Demultiplex incoming @code{libports} messages on pager 
ports.
 The following functions are the body of the pager library, and provide a
 clean interface to pager functionality:
 
-@deftypefun void pager_sync (@w{struct pager *@var{pager}}, @w{int @var{wait}})
-@deftypefunx void pager_sync_some (@w{struct pager *@var{pager}}, 
@w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}})
+@deftypefun error_t pager_sync (@w{struct pager *@var{pager}}, @w{int 
@var{wait}})
+@deftypefunx error_t pager_sync_some (@w{struct pager *@var{pager}}, 
@w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}})
 Write data from pager @var{pager} to its backing store.  Wait for all
 the writes to complete if and only if @var{wait} is set.
 
@@ -1443,8 +1443,8 @@ the writes to complete if and only if @var{wait} is set.
 data starting at @var{start}, for @var{len} bytes.
 @end deftypefun
 
-@deftypefun void pager_flush (@w{struct pager *@var{pager}}, @w{int 
@var{wait}})
-@deftypefunx void pager_flush_some (@w{struct pager *@var{pager}}, 
@w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}})
+@deftypefun error_t pager_flush (@w{struct pager *@var{pager}}, @w{int 
@var{wait}})
+@deftypefunx error_t pager_flush_some (@w{struct pager *@var{pager}}, 
@w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}})
 Flush data from the kernel for pager @var{pager} and force any pending
 delayed copies.  Wait for all pages to be flushed if and only if
 @var{wait} is set.
@@ -1453,8 +1453,8 @@ delayed copies.  Wait for all pages to be flushed if and 
only if
 flushes data starting at @var{start}, for @var{len} bytes.
 @end deftypefun
 
-@deftypefun void pager_return (@w{struct pager *@var{pager}}, @w{int 
@var{wait}})
-@deftypefunx void pager_return_some (@w{struct pager *@var{pager}}, 
@w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}})
+@deftypefun error_t pager_return (@w{struct pager *@var{pager}}, @w{int 
@var{wait}})
+@deftypefunx error_t pager_return_some (@w{struct pager *@var{pager}}, 
@w{vm_address_t @var{start}}, @w{vm_size_t @var{len}}, @w{int @var{wait}})
 Flush data from the kernel for pager @var{pager} and force any pending
 delayed copies.  Wait for all pages to be flushed if and only if
 @var{wait} is set.  Have the kernel write back modifications.
@@ -1464,14 +1464,14 @@ delayed copies.  Wait for all pages to be flushed if 
and only if
 @var{start}, for @var{len} bytes.
 @end deftypefun
 
-@deftypefun void pager_offer_page (@w{struct pager *@var{pager}}, @w{int 
@var{precious}}, @w{int @var{writelock}}, @w{vm_offset_t @var{page}}, 
@w{vm_address_t @var{buf}})
+@deftypefun error_t pager_offer_page (@w{struct pager *@var{pager}}, @w{int 
@var{precious}}, @w{int @var{writelock}}, @w{vm_offset_t @var{page}}, 
@w{vm_address_t @var{buf}})
 Offer a page of data to the kernel.  If @var{precious} is set, then this
 page will be paged out at some future point, otherwise it might be
 dropped by the kernel.  If the page is currently in core, the kernel
 might ignore this call.
 @end deftypefun
 
-@deftypefun void pager_change_attributes (@w{struct pager *@var{pager}}, 
@w{boolean_t @var{may_cache}}, @w{memory_object_copy_strategy_t 
@var{copy_strategy}}, @w{int @var{wait}})
+@deftypefun error_t pager_change_attributes (@w{struct pager *@var{pager}}, 
@w{boolean_t @var{may_cache}}, @w{memory_object_copy_strategy_t 
@var{copy_strategy}}, @w{int @var{wait}})
 Change the attributes of the memory object underlying pager @var{pager}.
 The @var{may_cache} and @var{copy_strategy} arguments are as for
 @code{memory_object_change_}.  Wait for the kernel to report
diff --git a/libpager/data-unlock.c b/libpager/data-unlock.c
index 077e673f..caaea41a 100644
--- a/libpager/data-unlock.c
+++ b/libpager/data-unlock.c
@@ -28,7 +28,7 @@ _pager_S_memory_object_data_unlock (struct pager *p,
 vm_size_t length,
 vm_prot_t access)
 {
-  volatile int err;
+  error_t err = 0;
 
   if (!p
   || p->port.class != _pager_class)
@@ 

[PATCH 4/6] libpager: Store pagemapsize as vm_size_t

2021-05-06 Thread Sergey Bugaev
On a 64-bit system, there can be a lot more pages
than a 32-bit int can fit.
---
 libpager/pagemap.c | 2 +-
 libpager/priv.h| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libpager/pagemap.c b/libpager/pagemap.c
index 963f6567..63c5f44d 100644
--- a/libpager/pagemap.c
+++ b/libpager/pagemap.c
@@ -29,7 +29,7 @@ _pager_pagemap_resize (struct pager *p, vm_address_t off)
   if (p->pagemapsize < off)
 {
   void *newaddr;
-  int newsize = round_page (off);
+  vm_size_t newsize = round_page (off);
 
   newaddr = mmap (0, newsize * sizeof (*p->pagemap), 
  PROT_READ|PROT_WRITE, MAP_ANON, 0, 0);
diff --git a/libpager/priv.h b/libpager/priv.h
index c0a99fe3..d9d76965 100644
--- a/libpager/priv.h
+++ b/libpager/priv.h
@@ -66,7 +66,7 @@ struct pager
 #endif
 
   short *pagemap;
-  int pagemapsize; /* number of elements in PAGEMAP */
+  vm_size_t pagemapsize;   /* number of elements in PAGEMAP */
 };
 
 struct lock_request
-- 
2.31.1




[PATCH 5/6] libpager: Fix overallocating pagemap

2021-05-06 Thread Sergey Bugaev
The code tried to round up the allocation size to a multiple of page size.
But we actually allocate newsize * sizeof (*p->pagemap) bytes, not newsize
bytes, which meant allocations were sizeof (*p->pagemap) times larger than
they needed to be.
---
 libpager/pagemap.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/libpager/pagemap.c b/libpager/pagemap.c
index 63c5f44d..1570c75b 100644
--- a/libpager/pagemap.c
+++ b/libpager/pagemap.c
@@ -17,21 +17,22 @@
 
 #include "priv.h"
 #include 
-  
+
 /* Grow the pagemap of pager P as necessary to deal with address OFF */
 error_t
 _pager_pagemap_resize (struct pager *p, vm_address_t off)
 {
   error_t err = 0;
-  
+
   off /= __vm_page_size;
 
   if (p->pagemapsize < off)
 {
   void *newaddr;
-  vm_size_t newsize = round_page (off);
+  vm_size_t newsize = round_page (off * sizeof (*p->pagemap))
+  / sizeof (*p->pagemap);
 
-  newaddr = mmap (0, newsize * sizeof (*p->pagemap), 
+  newaddr = mmap (0, newsize * sizeof (*p->pagemap),
  PROT_READ|PROT_WRITE, MAP_ANON, 0, 0);
   err = (newaddr == (void *) -1) ? errno : 0;
   if (! err)
-- 
2.31.1




[PATCH 6/6] libpager: Use libc heap for pagemap

2021-05-06 Thread Sergey Bugaev
libc already implements the functionality for allocating and managing
memory blocks like the pagemap. Using libc functions gives us some
additional niceties like overflow checking in reallocarray (). it also
means that we will not allocate a whole page of memory if we need to
store just a few integers.
---
 libpager/object-terminate.c |  5 +++--
 libpager/pagemap.c  | 26 +-
 2 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/libpager/object-terminate.c b/libpager/object-terminate.c
index 95298217..3e7df167 100644
--- a/libpager/object-terminate.c
+++ b/libpager/object-terminate.c
@@ -17,6 +17,7 @@
 
 #include "priv.h"
 #include "memory_object_S.h"
+#include 
 #include 
 
 /* Implement the object termination call from the kernel as described
@@ -118,10 +119,10 @@ _pager_free_structure (struct pager *p)
   /* Free the pagemap */
   if (p->pagemapsize)
 {
-  munmap (p->pagemap, p->pagemapsize * sizeof (* p->pagemap));
+  free (p->pagemap);
   p->pagemapsize = 0;
   p->pagemap = 0;
 }
-  
+
   p->pager_state = NOTINIT;
 }
diff --git a/libpager/pagemap.c b/libpager/pagemap.c
index 1570c75b..7bbb8c56 100644
--- a/libpager/pagemap.c
+++ b/libpager/pagemap.c
@@ -16,33 +16,25 @@
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
 
 #include "priv.h"
+#include 
 #include 
 
 /* Grow the pagemap of pager P as necessary to deal with address OFF */
 error_t
 _pager_pagemap_resize (struct pager *p, vm_address_t off)
 {
-  error_t err = 0;
-
   off /= __vm_page_size;
 
   if (p->pagemapsize < off)
 {
-  void *newaddr;
-  vm_size_t newsize = round_page (off * sizeof (*p->pagemap))
-  / sizeof (*p->pagemap);
-
-  newaddr = mmap (0, newsize * sizeof (*p->pagemap),
- PROT_READ|PROT_WRITE, MAP_ANON, 0, 0);
-  err = (newaddr == (void *) -1) ? errno : 0;
-  if (! err)
-   {
- memcpy (newaddr, p->pagemap, p->pagemapsize * sizeof (*p->pagemap));
- munmap (p->pagemap, p->pagemapsize * sizeof (*p->pagemap));
- p->pagemap = newaddr;
- p->pagemapsize = newsize;
-   }
+  void *newaddr = reallocarray (p->pagemap, off,
+sizeof (*p->pagemap));
+  if (!newaddr)
+return errno;
+
+  p->pagemap = newaddr;
+  p->pagemapsize = off;
 }
 
-  return err;
+  return 0;
 }
-- 
2.31.1




And another patch...

2021-05-06 Thread Sergey Bugaev
There's yet another bug: libpager just throws away clean precious
pages (those previously offered with pager_offer_page (precious = 1))
upon receiving them from the kernel, since it mistakes them for
evicted pages it's being notified about. This is obviously very
problematic.

I'm attaching a patch that attempts to fix the issue, but I'm less
sure about it. What do you think?

Sergey

-- >8 --
Subject: [PATCH] libpager: Do not throw away precious pages

The kernel invokes memory_object_data_return () with dirty = 0 in two
cases: if notification on eviction was requested, or when returning
precious pages. Properly distinguish between the two cases, and only
throw the clean page away in the former case.
---
 libpager/data-return.c | 69 --
 libpager/offer-page.c  |  2 ++
 libpager/priv.h|  1 +
 3 files changed, 29 insertions(+), 43 deletions(-)

diff --git a/libpager/data-return.c b/libpager/data-return.c
index c0f5aaf7..59e3e956 100644
--- a/libpager/data-return.c
+++ b/libpager/data-return.c
@@ -38,12 +38,12 @@ _pager_do_write_request (struct pager *p,
   short *pm_entries;
   int npages, i;
   char *notified;
+  char *omit_data;
   error_t *pagerrs;
   struct lock_request *lr;
   struct lock_list {struct lock_request *lr;
 struct lock_list *next;} *lock_list, *ll;
   int wakeup;
-  int omitdata = 0;

   if (!p
   || p->port.class != _pager_class)
@@ -78,6 +78,9 @@ _pager_do_write_request (struct pager *p,
   npages = length / __vm_page_size;
   pagerrs = alloca (npages * sizeof (error_t));

+  omit_data = alloca (npages * (sizeof *omit_data));
+  memset (omit_data, 0, npages * (sizeof *omit_data));
+
   notified = alloca (npages * (sizeof *notified));
 #ifndef NDEBUG
   memset (notified, -1, npages * (sizeof *notified));
@@ -90,23 +93,6 @@ _pager_do_write_request (struct pager *p,

   pm_entries = &p->pagemap[offset / __vm_page_size];

-  if (! dirty)
-{
-  munmap ((void *) data, length);
-  if (!kcopy) {
-/* Prepare notified array.  */
-for (i = 0; i < npages; i++)
-  notified[i] = (p->notify_on_evict
- && ! (pm_entries[i] & PM_PAGEINWAIT));
-
-goto notify;
-  }
-  else {
-_pager_allow_termination (p);
-goto release_out;
-  }
-}
-
   /* Make sure there are no other in-progress writes for any of these
  pages before we begin.  This imposes a little more serialization
  than we really have to require (because *all* future writes on
@@ -115,27 +101,24 @@ _pager_do_write_request (struct pager *p,
   /* XXX: Is this still needed?  */
  retry:
   for (i = 0; i < npages; i++)
-if (pm_entries[i] & PM_PAGINGOUT)
-  {
- pm_entries[i] |= PM_WRITEWAIT;
- pthread_cond_wait (&p->wakeup, &p->interlock);
- goto retry;
+{
+  if ((initializing && (pm_entries[i] & PM_INIT)) ||
+  (!dirty && !(pm_entries[i] & PM_PRECIOUS)))
+{
+  omit_data[i] = 1;
+  continue;
+}
+  if (pm_entries[i] & PM_PAGINGOUT)
+{
+  pm_entries[i] |= PM_WRITEWAIT;
+  pthread_cond_wait (&p->wakeup, &p->interlock);
+  goto retry;
   }
+}

   /* Mark these pages as being paged out.  */
-  if (initializing)
-{
-  assert_backtrace (npages <= 32);
-  for (i = 0; i < npages; i++)
- {
-  if (pm_entries[i] & PM_INIT)
-omitdata |= 1 << i;
-  else
-pm_entries[i] |= PM_PAGINGOUT | PM_INIT;
- }
-}
-  else
-for (i = 0; i < npages; i++)
+  for (i = 0; i < npages; i++)
+if (!omit_data[i])
   pm_entries[i] |= PM_PAGINGOUT | PM_INIT;

   /* If this write occurs while a lock is pending, record
@@ -162,7 +145,7 @@ _pager_do_write_request (struct pager *p,
  but until the pager library interface is changed, this will have to do. */

   for (i = 0; i < npages; i++)
-if (!(omitdata & (1 << i)))
+if (!omit_data[i])
   pagerrs[i] = pager_write_page (p->upi,
  offset + (vm_page_size * i),
  data + (vm_page_size * i));
@@ -175,9 +158,11 @@ _pager_do_write_request (struct pager *p,
   wakeup = 0;
   for (i = 0; i < npages; i++)
 {
-  if (omitdata & (1 << i))
+  if (omit_data[i])
  {
-  notified[i] = 0;
+  notified[i] = (p->notify_on_evict
+ && !kcopy
+ && ! (pm_entries[i] & PM_PAGEINWAIT));
   continue;
  }

@@ -205,14 +190,13 @@ _pager_do_write_request (struct pager *p,
  }
   else
  {
-  munmap ((void *) (data + (vm_page_size * i)),
-  vm_page_size);
   notified[i] = (! kcopy && p->notify_on_evict);
   if (! kcopy)
 pm_entries[i] &= ~PM_INCORE;
  }

-  pm_entries[i] &= ~(PM_PAGINGOUT | PM_PAGEINWAIT | PM_WRITEWAIT);
+  pm_entries[i] &= ~(PM_PAGINGOUT | PM_PAGEINWAIT
+   | PM_WRITEWAIT | PM_PRECIOUS);
 }

   for (ll = lock_list; ll; ll = ll->next)
@@ -222,7 +206,6 @@ _pager_do_write_request (struct pager *p,
   if (wakeup)
 pthread_cond_broadcast (&p->wakeup);

Re: [PATCH 6/6] libpager: Use libc heap for pagemap

2021-05-07 Thread Sergey Bugaev
On Thu, May 6, 2021 at 3:56 PM Sergey Bugaev  wrote:
> -  newaddr = mmap (0, newsize * sizeof (*p->pagemap),
> - PROT_READ|PROT_WRITE, MAP_ANON, 0, 0);
> +  void *newaddr = reallocarray (p->pagemap, off,
> +sizeof (*p->pagemap));

It seems while fixing preexisting issues I accidentally introduces a
new one. Unlike mmap, reallocarray does not zero-initialize the newly
allocated memory; but other code expects new pagemap entries to be set
to zero. This is the cause of those tarfs hangs I've been seeing
lately.

I'm sorry, and here's a hotfix.

Sergey

-- >8 --
Subject: [PATCH] libpager: Properly zero-initialize pagemap

Unlike mmap () and calloc (), reallocarray () does not automatically
zero-fill the newly allocated memory. Do so explicitly.
---
 libpager/pagemap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libpager/pagemap.c b/libpager/pagemap.c
index 7bbb8c56..c7c86d60 100644
--- a/libpager/pagemap.c
+++ b/libpager/pagemap.c
@@ -32,6 +32,8 @@ _pager_pagemap_resize (struct pager *p, vm_address_t off)
   if (!newaddr)
 return errno;

+  memset ((short *) newaddr + p->pagemapsize, 0,
+  (off - p->pagemapsize) * sizeof (*p->pagemap));
   p->pagemap = newaddr;
   p->pagemapsize = off;
 }
-- 
2.31.1



settrans: (os/kern) invalid right

2021-05-07 Thread Sergey Bugaev
Hello yet again,

I'm hitting the following issue when trying to re-set a translator on
a node whose translator has previously died:

$ settrans -acP /tmp/yes ~/dev-yes/hurd/yes
Translator pid: 1039
Pausing...
$ kill 1039
$ settrans -ag /tmp/yes ~/dev-yes/hurd/yes
settrans: /tmp/yes: (os/kern) invalid right

I've traced it to fshelp_fetch_control () returning KERN_INVALID_RIGHT
when the control right of the previously set translator turns out to
be a dead name. fshelp_fetch_control () actually handles this case
with a specific code path which resets the stored port to
MACH_PORT_NULL, but it returns an error nevertheless.

Digging up git history, the two relevant commits are [0] and [1].
Commit [0] introduces this special handling for dead names, and
actually tries to deallocate the dead name and return 0 +
MACH_PORT_NULL from the fshelp_fetch_control () call. Commit [1]
reverts a part of this logic: fshelp_fetch_control () no longer tries
to deallocate the dead name (so the task keeps it forever?..), and no
longer returns 0 to the caller, propagating KERN_INVALID_RIGHT
instead. Which is the source of the error I'm seeing.

Indeed, reverting [1] seems to fix my issue.

I don't think I understand the reasoning behind [1]. Perhaps I'm
missing something?

Sergey

[0]: 
http://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=3319f7f6a238cf88b9f46849e7be84d3c96376d6
[1]: 
http://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=7b9011628ec0dea3e01b19b75013ce5cf5b9c841



Re: settrans: (os/kern) invalid right

2021-05-08 Thread Sergey Bugaev
On Sat, May 8, 2021 at 2:39 PM Samuel Thibault  wrote:
> I pushed the revert, thanks!

Thank you!

I'm attaching another patch that tries to make settrans handle this
case a tiny bit better: if we failed to set the translator (either
because of this issue, or for any other reason), let's shut the
translator back down instead of leaving it in limbo.

Sergey

P.S. There must be something wrong with your mailserver. I only
received a bunch of replies form you just a few minutes ago, and
seemingly so did the archive [0], but your citation indicates that you
have sent the reply promptly after my message.

[0] https://lists.gnu.org/archive/html/bug-hurd/2021-05/

-- >8 --
Subject: [PATCH] utils/settrans: Ask translator to go away on errors

If the translator starts up fine, but we fail to set it as the active
translator on the underlying node for whatever reason, ask it to shut
down again, instead of leaving it orphaned. This is what mount(1)
does already.

To test:

$ settrans -a /dev/null /hurd/hello
settrans: /dev/null: Operation not permitted
$ pgrep hello
1141

That means PID 1141, the translator we started, is still alive. With
this patch it shuts down, and pgrep returns nothing.
---
 utils/settrans.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/utils/settrans.c b/utils/settrans.c
index 9c9f087e..f07d294b 100644
--- a/utils/settrans.c
+++ b/utils/settrans.c
@@ -396,7 +396,11 @@ main(int argc, char *argv[])
  argz, argz_len,
  active_control, MACH_MSG_TYPE_COPY_SEND);
   if (err)
- error (5, err, "%s", node_name);
+{
+  if (active_control != MACH_PORT_NULL)
+fsys_goaway (active_control, FSYS_GOAWAY_FORCE);
+  error (5, err, "%s", node_name);
+}
 }

   if (chroot_command)
-- 
2.31.1



Re: And another patch...

2021-05-08 Thread Sergey Bugaev
On Sat, May 8, 2021 at 2:36 PM Samuel Thibault  wrote:
>
> Hello,
>
> Just to be sure: this is not only because of your previous patch
> libpager: Do not flush in-core pages on offer
> ?

Yes; this issue is completely orthogonal to that fix (but there's an
interesting interaction I haven't thought about, see below)

This issue is about the Mach mechanism of "precious pages". Namely,
when offering a page to the kernel (with memory_object_data_supply ()
or pager_offer_page () which wraps it), a memory manager can tell the
kernel that the page is "precious". This means the kernel should not
silently drop the page when the kernel no longer needs the page, even
if the page is not dirtied (written to by clients) in the meantime.
The intended use case is the memory manager supplying the kernel with
the actual page where it stores its data, not a copy from some other
sort of storage. The kernel will eventually return the page through
memory_object_data_return ().

Now, offering precious pages that are already in core (this is where
that other patch is relevant) will result in KERN_MEMORY_PRESENT error
being delivered via memory_object_supply_completed ()... which
libpager currently ignores. So that might be something to fix, too.
Thanks!

Now, for something slightly different:

> @@ -205,14 +190,13 @@ _pager_do_write_request (struct pager *p,
> }
>else
> {
> - munmap ((void *) (data + (vm_page_size * i)),
> - vm_page_size);
>   notified[i] = (! kcopy && p->notify_on_evict);
>   if (! kcopy)
> pm_entries[i] &= ~PM_INCORE;
>}

I've included this in the patch up-thread; it definitely should have
been a separate patch; and now I'm doubting if it's a right change at
all.

Namely, pager_write_page () is documented to "In addition, mfree (or
equivalent) BUF." I don't know what mfree is, but I assume this means
munmap () or vm_deallocate (). In other words, pager_write_page () is
documented to _assume ownership_ of the passed in page. So it would
seem like _pager_do_write_request () *also* trying to unmap the page
after the pager_write_page () calls is wrong (and may even lead to
accidentally unmapping an unrelated page), which is why I removed it.

However.

The other branch in _pager_do_write_request () tries to return the
page back to the kernel, so it definitely assumes that the page is
still valid/available. And looking at various pager_write_page ()
implementations throughout the Hurd, some do deallocate the page
(storeio), but most don't (defpager, ext2fs, fatfs).

So overall, it would appear that pager_write_page () is actually *not*
supposed to deallocate the page. If that's correct, storeio (and
potentially other users outside of the main tree?) needs changing, and
the documentation for pager_write_page () needs to be fixed.

What do you think?

Sergey



Re: [RFC PATCH tarfs 0/6] mmap support for tarfs

2021-05-08 Thread Sergey Bugaev
On Sat, May 8, 2021 at 2:40 PM Samuel Thibault  wrote:
> Ah, seeing zipstores.c I thought zipfiles were supported, so I was
> wondering how since, as you say, they combine both.

By the way, upstream libstore itself supports gzip & bzip2; I have
only briefly looked into why tarfs needs to ship its own
implementation. Upstream libstore seems to only support store_gunzip
and store_bunzip2, which, to quote the docs, "contains a snapshot of
the uncompressed contents of the store FROM". The version shipped with
tarfs has store_gzip and store_bzip2, seemingly providing both reading
and writing support.

Any idea why the implementations have not been upstreamed?

Sergey



[PATCH 1/4] libpager: pager_write_page () should not unmap page

2021-05-08 Thread Sergey Bugaev
Clarify this in the documentation, and fix the storeio implementation.
---
 doc/hurd.texi   | 6 +++---
 libpager/pager.h| 6 +++---
 storeio/pager.c | 8 +++-
 tmpfs/pager-stubs.c | 6 +++---
 4 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/doc/hurd.texi b/doc/hurd.texi
index c8db86b5..239d82da 100644
--- a/doc/hurd.texi
+++ b/doc/hurd.texi
@@ -1536,9 +1536,9 @@ only permissible error returns are @code{EIO}, 
@code{EDQUOT}, and
 
 @deftypefun error_t pager_write_page (@w{struct user_pager_info *@var{pager}}, 
@w{vm_offset_t @var{page}}, @w{vm_address_t @var{buf}})
 For pager @var{pager}, synchronously write one page from @var{buf} to
-offset @var{page}.  In addition, @code{vm_deallocate} (or equivalent)
-@var{buf}.  The only permissible error returns are @code{EIO},
-@code{EDQUOT}, and @code{ENOSPC}.
+offset @var{page}.  Do not deallocate @var{buf}, and do not keep any
+references to @var{buf}.  The only permissible error returns are
+@code{EIO}, @code{EDQUOT}, and @code{ENOSPC}.
 @end deftypefun
 
 @deftypefun error_t pager_unlock_page (@w{struct user_pager_info 
*@var{pager}}, @w{vm_offset_t @var{address}})
diff --git a/libpager/pager.h b/libpager/pager.h
index df7fbfc8..5c03b748 100644
--- a/libpager/pager.h
+++ b/libpager/pager.h
@@ -191,9 +191,9 @@ pager_read_page (struct user_pager_info *pager,
 int *write_lock);
 
 /* The user must define this function.  For pager PAGER, synchronously
-   write one page from BUF to offset PAGE.  In addition, mfree
-   (or equivalent) BUF.  The only permissible error returns are EIO,
-   EDQUOT, and ENOSPC. */
+   write one page from BUF to offset PAGE.  Do not deallocate BUF, and do
+   not keep any references to BUF.  The only permissible error returns
+   are EIO, EDQUOT, and ENOSPC. */
 error_t
 pager_write_page (struct user_pager_info *pager,
  vm_offset_t page,
diff --git a/storeio/pager.c b/storeio/pager.c
index 12387939..01a1525c 100644
--- a/storeio/pager.c
+++ b/storeio/pager.c
@@ -66,9 +66,9 @@ pager_read_page (struct user_pager_info *upi,
 return 0;
 }
 
-/* For pager PAGER, synchronously write one page from BUF to offset PAGE.  In
-   addition, vm_deallocate (or equivalent) BUF.  The only permissible error
-   returns are EIO, EDQUOT, and ENOSPC. */
+/* For pager PAGER, synchronously write one page from BUF to offset PAGE.
+   Do not deallocate BUF, and do not keep any references to BUF.  The only
+   permissible error returns are EIO, EDQUOT, and ENOSPC. */
 error_t
 pager_write_page (struct user_pager_info *upi,
  vm_offset_t page, vm_address_t buf)
@@ -90,8 +90,6 @@ pager_write_page (struct user_pager_info *upi,
 
   err = dev_write (dev, page, (char *)buf, want, &written);
 
-  munmap ((caddr_t) buf, vm_page_size);
-
   if (err || written < want)
return EIO;
   else
diff --git a/tmpfs/pager-stubs.c b/tmpfs/pager-stubs.c
index 3299e218..883a635e 100644
--- a/tmpfs/pager-stubs.c
+++ b/tmpfs/pager-stubs.c
@@ -36,9 +36,9 @@ pager_read_page (struct user_pager_info *pager,
 }
 
 /* The user must define this function.  For pager PAGER, synchronously
-   write one page from BUF to offset PAGE.  In addition, mfree
-   (or equivalent) BUF.  The only permissible error returns are EIO,
-   EDQUOT, and ENOSPC. */
+   write one page from BUF to offset PAGE.  Do not deallocate BUF, and do
+   not keep any references to BUF.  The only permissible error returns
+   are EIO, EDQUOT, and ENOSPC. */
 error_t
 pager_write_page (struct user_pager_info *pager,
  vm_offset_t page,
-- 
2.31.1




[PATCH 4/4] libpager: Do not throw away precious pages

2021-05-08 Thread Sergey Bugaev
The kernel invokes memory_object_data_return () with dirty = 0 in two
cases: if notification on eviction was requested, or when returning
precious pages. Properly distinguish between the two cases, and only
throw the clean page away in the former case.
---
 libpager/data-return.c | 54 ++
 1 file changed, 18 insertions(+), 36 deletions(-)

diff --git a/libpager/data-return.c b/libpager/data-return.c
index 2cc9f1f8..304e88bc 100644
--- a/libpager/data-return.c
+++ b/libpager/data-return.c
@@ -93,23 +93,6 @@ _pager_do_write_request (struct pager *p,
 
   pm_entries = &p->pagemap[offset / __vm_page_size];
 
-  if (! dirty)
-{
-  munmap ((void *) data, length);
-  if (!kcopy) {
-/* Prepare notified array.  */
-for (i = 0; i < npages; i++)
-  notified[i] = (p->notify_on_evict
- && ! (pm_entries[i] & PM_PAGEINWAIT));
-
-goto notify;
-  }
-  else {
-_pager_allow_termination (p);
-goto release_out;
-  }
-}
-
   /* Make sure there are no other in-progress writes for any of these
  pages before we begin.  This imposes a little more serialization
  than we really have to require (because *all* future writes on
@@ -118,26 +101,24 @@ _pager_do_write_request (struct pager *p,
   /* XXX: Is this still needed?  */
  retry:
   for (i = 0; i < npages; i++)
-if (pm_entries[i] & PM_PAGINGOUT)
-  {
-   pm_entries[i] |= PM_WRITEWAIT;
-   pthread_cond_wait (&p->wakeup, &p->interlock);
-   goto retry;
+{
+  if ((initializing && (pm_entries[i] & PM_INIT)) ||
+  (!dirty && !(pm_entries[i] & PM_PRECIOUS)))
+{
+  omit_data[i] = 1;
+  continue;
+}
+  if (pm_entries[i] & PM_PAGINGOUT)
+{
+  pm_entries[i] |= PM_WRITEWAIT;
+  pthread_cond_wait (&p->wakeup, &p->interlock);
+  goto retry;
   }
+}
 
   /* Mark these pages as being paged out.  */
-  if (initializing)
-{
-  for (i = 0; i < npages; i++)
-   {
- if (pm_entries[i] & PM_INIT)
-   omit_data[i] = 1;
- else
-   pm_entries[i] |= PM_PAGINGOUT | PM_INIT;
-   }
-}
-  else
-for (i = 0; i < npages; i++)
+  for (i = 0; i < npages; i++)
+if (!omit_data[i])
   pm_entries[i] |= PM_PAGINGOUT | PM_INIT;
 
   /* If this write occurs while a lock is pending, record
@@ -179,7 +160,9 @@ _pager_do_write_request (struct pager *p,
 {
   if (omit_data[i])
{
- notified[i] = 0;
+ notified[i] = (p->notify_on_evict
+ && !kcopy
+ && ! (pm_entries[i] & PM_PAGEINWAIT));
  continue;
}
 
@@ -225,7 +208,6 @@ _pager_do_write_request (struct pager *p,
   if (wakeup)
 pthread_cond_broadcast (&p->wakeup);
 
- notify:
   _pager_allow_termination (p);
   pthread_mutex_unlock (&p->interlock);
 
-- 
2.31.1




[PATCH 0/4] _pager_do_write_request fixes

2021-05-08 Thread Sergey Bugaev
This is the same patch, split into three for readabilty and
reviewability. I'm still looking for feedback :)

Instead of removing the munmap from _pager_do_write_request (),
document that pager_write_page () is *not* supposed to unmap the
page (nor keep references to it), and remove the munmap call from
storeio.

Sergey Bugaev (4):
  libpager: pager_write_page () should not unmap page
  libpager: Store omit_data in an array
  libpager: Track which pages are precious
  libpager: Do not throw away precious pages

 doc/hurd.texi  |  6 ++--
 libpager/data-return.c | 67 --
 libpager/offer-page.c  |  2 ++
 libpager/pager.h   |  6 ++--
 libpager/priv.h|  1 +
 storeio/pager.c|  8 ++---
 tmpfs/pager-stubs.c|  6 ++--
 7 files changed, 41 insertions(+), 55 deletions(-)

-- 
2.31.1




[PATCH 2/4] libpager: Store omit_data in an array

2021-05-08 Thread Sergey Bugaev
While this is less efficient than using a single bit per page,
there is no guarantee that the kernel will not send us more than
32 pages at a time. Indeed, one can easily craft a situation where
it will send more by unmapping a large mapping.

Also, this function already uses on-stack arrays for tracking other
information about the pages; so do the same for whether the data
should be omited.
---
 libpager/data-return.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/libpager/data-return.c b/libpager/data-return.c
index c0f5aaf7..db2f84e6 100644
--- a/libpager/data-return.c
+++ b/libpager/data-return.c
@@ -38,12 +38,12 @@ _pager_do_write_request (struct pager *p,
   short *pm_entries;
   int npages, i;
   char *notified;
+  char *omit_data;
   error_t *pagerrs;
   struct lock_request *lr;
   struct lock_list {struct lock_request *lr;
struct lock_list *next;} *lock_list, *ll;
   int wakeup;
-  int omitdata = 0;
 
   if (!p
   || p->port.class != _pager_class)
@@ -78,6 +78,9 @@ _pager_do_write_request (struct pager *p,
   npages = length / __vm_page_size;
   pagerrs = alloca (npages * sizeof (error_t));
 
+  omit_data = alloca (npages * (sizeof *omit_data));
+  memset (omit_data, 0, npages * (sizeof *omit_data));
+
   notified = alloca (npages * (sizeof *notified));
 #ifndef NDEBUG
   memset (notified, -1, npages * (sizeof *notified));
@@ -125,11 +128,10 @@ _pager_do_write_request (struct pager *p,
   /* Mark these pages as being paged out.  */
   if (initializing)
 {
-  assert_backtrace (npages <= 32);
   for (i = 0; i < npages; i++)
{
  if (pm_entries[i] & PM_INIT)
-   omitdata |= 1 << i;
+   omit_data[i] = 1;
  else
pm_entries[i] |= PM_PAGINGOUT | PM_INIT;
}
@@ -162,7 +164,7 @@ _pager_do_write_request (struct pager *p,
  but until the pager library interface is changed, this will have to do. */
 
   for (i = 0; i < npages; i++)
-if (!(omitdata & (1 << i)))
+if (!omit_data[i])
   pagerrs[i] = pager_write_page (p->upi,
 offset + (vm_page_size * i),
 data + (vm_page_size * i));
@@ -175,7 +177,7 @@ _pager_do_write_request (struct pager *p,
   wakeup = 0;
   for (i = 0; i < npages; i++)
 {
-  if (omitdata & (1 << i))
+  if (omit_data[i])
{
  notified[i] = 0;
  continue;
-- 
2.31.1




[PATCH 3/4] libpager: Track which pages are precious

2021-05-08 Thread Sergey Bugaev
This introduces the PM_PRECIOUS bit, which gets set when a precious
page is offered to the kernel using pager_offer_page () with the
precious argument set, and cleared when the page is returned back.
---
 libpager/data-return.c | 3 ++-
 libpager/offer-page.c  | 2 ++
 libpager/priv.h| 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/libpager/data-return.c b/libpager/data-return.c
index db2f84e6..2cc9f1f8 100644
--- a/libpager/data-return.c
+++ b/libpager/data-return.c
@@ -214,7 +214,8 @@ _pager_do_write_request (struct pager *p,
pm_entries[i] &= ~PM_INCORE;
}
 
-  pm_entries[i] &= ~(PM_PAGINGOUT | PM_PAGEINWAIT | PM_WRITEWAIT);
+  pm_entries[i] &= ~(PM_PAGINGOUT | PM_PAGEINWAIT
+   | PM_WRITEWAIT | PM_PRECIOUS);
 }
 
   for (ll = lock_list; ll; ll = ll->next)
diff --git a/libpager/offer-page.c b/libpager/offer-page.c
index 26f88ca3..392e83b8 100644
--- a/libpager/offer-page.c
+++ b/libpager/offer-page.c
@@ -38,6 +38,8 @@ pager_offer_page (struct pager *p,
 
   short *pm_entry = &p->pagemap[offset / vm_page_size];
   *pm_entry |= PM_INCORE;
+  if (precious)
+*pm_entry |= PM_PRECIOUS;
 
   err = memory_object_data_supply (p->memobjcntl, offset, buf, vm_page_size, 0,
writelock ? VM_PROT_WRITE : VM_PROT_NONE,
diff --git a/libpager/priv.h b/libpager/priv.h
index d9d76965..a5e22f36 100644
--- a/libpager/priv.h
+++ b/libpager/priv.h
@@ -108,6 +108,7 @@ extern int _pager_page_errors[];
 
 /* Pagemap format */
 /* These are binary state bits */
+#define PM_PRECIOUS   0x0400   /* return even if not dirty */
 #define PM_WRITEWAIT  0x0200   /* queue wakeup once write is done */
 #define PM_INIT   0x0100/* data has been written */
 #define PM_INCORE 0x0080   /* kernel might have a copy */
-- 
2.31.1




glibc THREAD_GSCOPE and excessive gsync_wake ()

2021-05-08 Thread Sergey Bugaev
I've noticed that even a simple hello world does a lot of gsync calls:

$ rpctrace echo hello world |& grep -c gsync
53

These are in fact all identical gsync_wake () calls on a single address:

$ rpctrace echo hello world |& grep gsync | uniq
task132(pid18549)->gsync_wake (202820 0 0);

This immediately screams that something is wrong to me, for two reasons:
* Surely there's just a single thread in /bin/echo! Well, there's
always the implicit signal/msg thread, but it shouldn't be creating
*that* much contention.
* gsync_wake () calls are useless unless paired with gsync_wait ().

I traced the gsync_wake () calls to the THREAD_GSCOPE macros in glibc
(sysdeps/mach/hurd/tls.h). As I understand it from looking at the
Linux version, THREAD_GSCOPE is supposed to be an RCU-like mechanism,
where a thread can wait until all the other threads have left the
"global scope" at least once. The Hurd version is different, it waits
for all the other threads to be outside the "global scope" *at the
same time*. Which also sounds wrong to me, because this way threads
constantly entering and exiting the "global scope" at the same time
can completely starve a waiting thread, unlike with RCU. I'm probably
misunderstanding something though.

Anyway, the Hurd implementation seems pretty straightforward: it keeps
a single global counter, increments it when entering "global scope",
decrements it when exiting it. If this was the last thread exiting, it
calls gsync_wake (), and this is where the excessive gsync_wake ()
calls come from.

In my experience building synchronization primitives on top of
futexes, one of the most important things to do is make it so that the
fast (uncontended) path never requires calling into the kernel. Any
(reasonably small) amount of userspace logic is far cheaper than
calling into the kernel. Which is why you add various "have waiters"
flags, to only call futex(FUTEX_WAKE_PRIVATE) if there are, in fact,
any waiters.

So I took a stab at implementing this for THREAD_GSCOPE. I'm currently
building glibc to see if it works; but I might have easily messed up
("futexes are tricky"!). Please tell me if I did :) This is not
supposed to be a patch ready for merging (I don't even know if this is
the right list to send glibc patches to); just an RFC.

Sergey

-- >8 --
diff --git a/sysdeps/mach/hurd/tls.h b/sysdeps/mach/hurd/tls.h
index f83956d3..9fccd9f2 100644
--- a/sysdeps/mach/hurd/tls.h
+++ b/sysdeps/mach/hurd/tls.h
@@ -55,22 +55,46 @@
 /* Global scope switch support.  */
 #define THREAD_GSCOPE_IN_TCB  0
 #define THREAD_GSCOPE_GLOBAL
+
+#define THREAD_GSCOPE_WAITERS (1 << 31)
+
 #define THREAD_GSCOPE_SET_FLAG() \
   atomic_exchange_and_add_acq (&GL(dl_thread_gscope_count), 1)
+
 #define THREAD_GSCOPE_RESET_FLAG() \
-  do  \
-if (atomic_exchange_and_add_rel (&GL(dl_thread_gscope_count), -1) == 1)   \
-  lll_wake (GL(dl_thread_gscope_count), 0);  \
+  do \
+{ \
+  int count; \
+  count = atomic_exchange_and_add_rel (&GL(dl_thread_gscope_count), -1); \
+  if (count == (1 | THREAD_GSCOPE_WAITERS)) \
+{ \
+  /* We're going to wake everyone; clear the waiters flag.  */ \
+  count = atomic_and_val (&GL(dl_thread_gscope_count),
~THREAD_GSCOPE_WAITERS); \
+  if (count & THREAD_GSCOPE_WAITERS) \
+lll_wake (GL(dl_thread_gscope_count), GSYNC_BROADCAST); \
+} \
+} \
   while (0)
+
 #define THREAD_GSCOPE_WAIT() \
-  do  \
-{  \
-  int count;  \
-  atomic_write_barrier ();  \
-  while ((count = GL(dl_thread_gscope_count)))  \
-lll_wait (GL(dl_thread_gscope_count), count, 0);  \
-}  \
-  while (0)
+  do \
+{ \
+  int count; \
+  count = atomic_load_acquire (&GL(dl_thread_gscope_count));\
+  if (count == 0) \
+break; \
+  if (!(count & THREAD_GSCOPE_WAITERS)) \
+{ \
+  /* Record that we're going to wait.  */ \
+  if (atomic_compare_and_exchange_bool_acq ( \
+&GL(dl_thread_gscope_count), \
+count | THREAD_GSCOPE_WAITERS, count)) \
+continue; \
+  count |= THREAD_GSCOPE_WAITERS; \
+} \
+  lll_wait (GL(dl_thread_gscope_count), count, 0); \
+} \
+  while (1)

 #endif /* !ASSEMBLER */



Re: glibc THREAD_GSCOPE and excessive gsync_wake ()

2021-05-08 Thread Sergey Bugaev
On Sat, May 8, 2021 at 10:54 PM Samuel Thibault  wrote:
> Don't bother looking at the existing implementation, its roots dates
> before we really had gsync working.

Interesting!

As far as I can see, this implementation was committed (by you) in
2018; and gsync has not seen significant changes since 2017. What am I
missing?

> While at it, rather try to reproduce the other archs' behavior: use
> gscope_flag from the tcb, and switch THREAD_GSCOPE_IN_TCB to 1. That
> will make the implementation much simpler and scalable.

Ah, I assumed this was on purpose. Thanks for the pointer.

Sergey



[RFC PATCH glibc 3/3] XXX: Reimplement gscope

2021-05-09 Thread Sergey Bugaev
This moves the Hurd gsope implementation to be more like the Linux
one.
---
 sysdeps/generic/ldsodefs.h  |  2 --
 sysdeps/htl/dl-thread_gscope_wait.c | 36 +
 sysdeps/mach/hurd/i386/tls.h| 22 ++
 sysdeps/mach/hurd/tls.h | 20 
 4 files changed, 58 insertions(+), 22 deletions(-)
 create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c

diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 4604d00b..af16162f 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -1324,10 +1324,8 @@ link_map_audit_state (struct link_map *l, size_t index)
 }
 #endif /* SHARED */
 
-#if THREAD_GSCOPE_IN_TCB
 void __thread_gscope_wait (void) attribute_hidden;
 # define THREAD_GSCOPE_WAIT() __thread_gscope_wait ()
-#endif
 
 __END_DECLS
 
diff --git a/sysdeps/htl/dl-thread_gscope_wait.c 
b/sysdeps/htl/dl-thread_gscope_wait.c
new file mode 100644
index ..66b75194
--- /dev/null
+++ b/sysdeps/htl/dl-thread_gscope_wait.c
@@ -0,0 +1,36 @@
+/* TODO: copyright heading */
+
+#include 
+#include 
+#include 
+
+void
+__thread_gscope_wait (void)
+{
+  size_t i;
+  struct __pthread *t;
+  int *flag_ptr;
+
+  lll_lock (GL (dl_pthread_threads_lock), 0);
+
+  for (i = 0; i < GL (dl_pthread_num_threads); ++i)
+{
+  t = GL (dl_pthread_threads[i]);
+  if (t == NULL)
+continue;
+
+  flag_ptr = &t->tcb->gscope_flag;
+ again:
+  if (atomic_load_acquire (flag_ptr) == THREAD_GSCOPE_FLAG_UNUSED)
+continue;
+
+  if (atomic_compare_and_exchange_bool_acq (flag_ptr,
+THREAD_GSCOPE_FLAG_WAIT,
+THREAD_GSCOPE_FLAG_USED))
+goto again;
+
+  lll_wait (flag_ptr, THREAD_GSCOPE_FLAG_WAIT, 0);
+}
+
+  lll_unlock (GL (dl_pthread_threads_lock), 0);
+}
diff --git a/sysdeps/mach/hurd/i386/tls.h b/sysdeps/mach/hurd/i386/tls.h
index 057b2613..0ca4e641 100644
--- a/sysdeps/mach/hurd/i386/tls.h
+++ b/sysdeps/mach/hurd/i386/tls.h
@@ -369,6 +369,28 @@ _hurd_tls_new (thread_t child, struct i386_thread_state 
*state, tcbhead_t *tcb)
   return err;
 }
 
+/* Global scope switch support.  */
+#define THREAD_GSCOPE_LINK_MAP  0
+
+# define THREAD_GSCOPE_FLAG_UNUSED 0
+# define THREAD_GSCOPE_FLAG_USED   1
+# define THREAD_GSCOPE_FLAG_WAIT   2
+
+#define THREAD_GSCOPE_SET_FLAG() \
+  THREAD_SETMEM (THREAD_SELF, gscope_flag, THREAD_GSCOPE_FLAG_USED)
+
+#define THREAD_GSCOPE_RESET_FLAG() \
+  ({   \
+int __flag;\
+asm volatile ("xchgl %0, %%gs:%P1" \
+  : "=r" (__flag)  \
+  : "i" (offsetof (tcbhead_t, gscope_flag)),   \
+"0" (THREAD_GSCOPE_FLAG_UNUSED));  \
+if (__flag == THREAD_GSCOPE_FLAG_WAIT) \
+  lll_wake (THREAD_SELF->gscope_flag, GSYNC_BROADCAST);\
+  })
+
+
 #endif /* !__ASSEMBLER__ */
 
 #endif /* i386/tls.h */
diff --git a/sysdeps/mach/hurd/tls.h b/sysdeps/mach/hurd/tls.h
index 8cbf22aa..8e66d5ff 100644
--- a/sysdeps/mach/hurd/tls.h
+++ b/sysdeps/mach/hurd/tls.h
@@ -52,26 +52,6 @@
 # define GET_DTV(descr) \
   (((tcbhead_t *) (descr))->dtv)
 
-/* Global scope switch support.  */
-#define THREAD_GSCOPE_LINK_MAP0
-#define THREAD_GSCOPE_GLOBAL
-#define THREAD_GSCOPE_SET_FLAG() \
-  atomic_exchange_and_add_acq (&GL(dl_thread_gscope_count), 1)
-#define THREAD_GSCOPE_RESET_FLAG() \
-  do \
-if (atomic_exchange_and_add_rel (&GL(dl_thread_gscope_count), -1) == 1)   \
-  lll_wake (GL(dl_thread_gscope_count), 0);
  \
-  while (0)
-#define THREAD_GSCOPE_WAIT() \
-  do \
-{\
-  int count; \
-  atomic_write_barrier ();   \
-  while ((count = GL(dl_thread_gscope_count)))   \
-lll_wait (GL(dl_thread_gscope_count), count, 0); \
-}\
-  while (0)
-
 #endif /* !ASSEMBLER */
 
 
-- 
2.31.1




[RFC PATCH glibc 1/3] XXX: Rename THREAD_GSCOPE_IN_TCB -> THREAD_GSCOPE_LINK_MAP

2021-05-09 Thread Sergey Bugaev
This patch is probably bogus; but I'm including it anyway.

I've noticed that THREAD_GSCOPE_IN_TCB is actually being used not to
check whether the thread global scope state is being stored in TCB,
but rather as a general flag to distinguish between NPTL and HTL.

When THREAD_GSCOPE_IN_TCB is defined, glibc (namely elf/dl-*) uses
NPTL-specific internals, such as their stack link map. Since the
following patches will move HTL towards having the global scope flag
in TCB as well, but the distinction between NPTL and HTL is still
useful, I'm trying to rename this flag.

This new name is also bad, though.

Overall, it was my impression that glibc tries hard not to have
ifdef'ed code paths (dependent on the platform being built for) in
common code; they seem to use platform-specific versions of headers
instead. So perhaps instead of having an environment variable, the
right thing to do would be to move this code into platform-specific
files.
---
 elf/dl-reloc.c| 4 ++--
 elf/dl-support.c  | 2 +-
 elf/dl-tls.c  | 6 +++---
 elf/rtld.c| 2 +-
 sysdeps/aarch64/nptl/tls.h| 2 +-
 sysdeps/alpha/nptl/tls.h  | 2 +-
 sysdeps/arc/nptl/tls.h| 2 +-
 sysdeps/arm/nptl/tls.h| 2 +-
 sysdeps/csky/nptl/tls.h   | 2 +-
 sysdeps/generic/ldsodefs.h| 8 
 sysdeps/generic/tls.h | 3 ++-
 sysdeps/hppa/nptl/tls.h   | 2 +-
 sysdeps/i386/nptl/tls.h   | 2 +-
 sysdeps/ia64/nptl/tls.h   | 2 +-
 sysdeps/m68k/nptl/tls.h   | 2 +-
 sysdeps/mach/hurd/tls.h   | 2 +-
 sysdeps/microblaze/nptl/tls.h | 2 +-
 sysdeps/mips/nptl/tls.h   | 2 +-
 sysdeps/nios2/nptl/tls.h  | 2 +-
 sysdeps/powerpc/nptl/tls.h| 2 +-
 sysdeps/riscv/nptl/tls.h  | 2 +-
 sysdeps/s390/nptl/tls.h   | 2 +-
 sysdeps/sh/nptl/tls.h | 2 +-
 sysdeps/sparc/nptl/tls.h  | 2 +-
 sysdeps/x86_64/nptl/tls.h | 2 +-
 25 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index bb9ca1a1..77e10895 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -141,7 +141,7 @@ cannot allocate memory in static TLS block"));
 }
 }
 
-#if !THREAD_GSCOPE_IN_TCB
+#if !THREAD_GSCOPE_LINK_MAP
 /* Initialize static TLS area and DTV for current (only) thread.
libpthread implementations should provide their own hook
to handle all threads.  */
@@ -160,7 +160,7 @@ _dl_nothread_init_static_tls (struct link_map *map)
   memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
  '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
 }
-#endif /* !THREAD_GSCOPE_IN_TCB */
+#endif /* !THREAD_GSCOPE_LINK_MAP */
 
 void
 _dl_relocate_object (struct link_map *l, struct r_scope_elem *scope[],
diff --git a/elf/dl-support.c b/elf/dl-support.c
index f966a2e7..af340bee 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -189,7 +189,7 @@ ElfW(Word) _dl_stack_flags = DEFAULT_STACK_PERMS;
 int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
 
 
-#if THREAD_GSCOPE_IN_TCB
+#if THREAD_GSCOPE_LINK_MAP
 list_t _dl_stack_used;
 list_t _dl_stack_user;
 int _dl_stack_cache_lock;
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 6baff0c1..8a7c534b 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -29,7 +29,7 @@
 #include 
 #include 
 
-#if THREAD_GSCOPE_IN_TCB
+#if THREAD_GSCOPE_LINK_MAP
 # include 
 #endif
 
@@ -1010,7 +1010,7 @@ cannot create TLS data structures"));
 }
 }
 
-#if THREAD_GSCOPE_IN_TCB
+#if THREAD_GSCOPE_LINK_MAP
 static inline void __attribute__((always_inline))
 init_one_static_tls (struct pthread *curp, struct link_map *map)
 {
@@ -1043,4 +1043,4 @@ _dl_init_static_tls (struct link_map *map)
 
   lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
 }
-#endif /* THREAD_GSCOPE_IN_TCB */
+#endif /* THREAD_GSCOPE_LINK_MAP */
diff --git a/elf/rtld.c b/elf/rtld.c
index ad325d4c..1857479a 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -1139,7 +1139,7 @@ dl_main (const ElfW(Phdr) *phdr,
   struct dl_main_state state;
   dl_main_state_init (&state);
 
-#if !THREAD_GSCOPE_IN_TCB
+#if !THREAD_GSCOPE_LINK_MAP
   GL(dl_init_static_tls) = &_dl_nothread_init_static_tls;
 #endif
 
diff --git a/sysdeps/aarch64/nptl/tls.h b/sysdeps/aarch64/nptl/tls.h
index 6e896207..a21a0826 100644
--- a/sysdeps/aarch64/nptl/tls.h
+++ b/sysdeps/aarch64/nptl/tls.h
@@ -109,7 +109,7 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-# define THREAD_GSCOPE_IN_TCB  1
+# define THREAD_GSCOPE_LINK_MAP  1
 # define THREAD_GSCOPE_FLAG_UNUSED 0
 # define THREAD_GSCOPE_FLAG_USED   1
 # define THREAD_GSCOPE_FLAG_WAIT   2
diff --git a/sysdeps/alpha/nptl/tls.h b/sysdeps/alpha/nptl/tls.h
index 4dbccc52..d15defaa 100644
--- a/sysdeps/alpha/nptl/tls.h
+++ b/sysdeps/alpha/nptl/tls.h
@@ -103,7 +103,7 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-#define 

[RFC PATCH glibc 2/3] XXX: Move __pthread_threads to ld.so

2021-05-09 Thread Sergey Bugaev
The next commit is going to introduce a new implementation of
THREAD_GSCOPE_WAIT, which needs to access the list of threads. Since
it must be usable from the dynamic laoder, we have to move the
symbols for the list of threads into the loader.

This is my attempt at doing so.
---
 elf/dl-support.c |  4 ++-
 htl/Versions |  2 +-
 htl/pt-alloc.c   | 46 
 htl/pt-create.c  | 11 ++
 htl/pt-internal.h| 28 +--
 sysdeps/generic/ldsodefs.h   |  8 -
 sysdeps/htl/pt-key-delete.c  |  8 ++---
 sysdeps/htl/pthreadP.h   |  2 +-
 sysdeps/htl/raise.c  |  9 +++--
 sysdeps/htl/thrd_current.c   |  8 +++--
 sysdeps/mach/hurd/htl/pt-sigstate-init.c |  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.c|  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.h|  2 +-
 13 files changed, 67 insertions(+), 65 deletions(-)

diff --git a/elf/dl-support.c b/elf/dl-support.c
index af340bee..a55834a1 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -194,7 +194,9 @@ list_t _dl_stack_used;
 list_t _dl_stack_user;
 int _dl_stack_cache_lock;
 #else
-int _dl_thread_gscope_count;
+int _dl_pthread_num_threads;
+struct __pthread **_dl_pthread_threads;
+int _dl_pthread_threads_lock;
 void (*_dl_init_static_tls) (struct link_map *) = 
&_dl_nothread_init_static_tls;
 #endif
 struct dl_scope_free_list *_dl_scope_free_list;
diff --git a/htl/Versions b/htl/Versions
index 9506043c..07a343f9 100644
--- a/htl/Versions
+++ b/htl/Versions
@@ -168,7 +168,7 @@ libpthread {
   GLIBC_PRIVATE {
 __pthread_initialize_minimal;
 
-__pthread_threads;
+# __pthread_threads;
 
 __cthread_detach;
 __cthread_fork;
diff --git a/htl/pt-alloc.c b/htl/pt-alloc.c
index acc67f27..b60daf81 100644
--- a/htl/pt-alloc.c
+++ b/htl/pt-alloc.c
@@ -29,17 +29,17 @@
corresponding to that specified by the given thread ID."  */
 
 /* Thread ID lookup table.  */
-struct __pthread **__pthread_threads;
+// struct __pthread **__pthread_threads;
 
 /* The size of the thread ID lookup table.  */
 int __pthread_max_threads;
 
 /* The total number of thread IDs currently in use, or on the list of
available thread IDs.  */
-int __pthread_num_threads;
+// int __pthread_num_threads;
 
 /* A lock for the table, and the other variables above.  */
-pthread_rwlock_t __pthread_threads_lock;
+// pthread_rwlock_t __pthread_threads_lock;
 
 /* List of thread structures corresponding to free thread IDs.  */
 struct __pthread *__pthread_free_threads;
@@ -132,25 +132,25 @@ __pthread_alloc (struct __pthread **pthread)
 }
 
 retry:
-  __pthread_rwlock_wrlock (&__pthread_threads_lock);
+  lll_lock (GL (dl_pthread_threads_lock), 0);
 
-  if (__pthread_num_threads < __pthread_max_threads)
+  if (GL (dl_pthread_num_threads) < __pthread_max_threads)
 {
   /* We have a free slot.  Use the slot number plus one as the
  thread ID for the new thread.  */
-  new->thread = 1 + __pthread_num_threads++;
-  __pthread_threads[new->thread - 1] = NULL;
+  new->thread = 1 + GL (dl_pthread_num_threads)++;
+  GL (dl_pthread_threads)[new->thread - 1] = NULL;
 
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), 0);
 
   *pthread = new;
   return 0;
 }
 #ifdef PTHREAD_THREADS_MAX
-  else if (__pthread_num_threads >= PTHREAD_THREADS_MAX)
+  else if (GL (dl_pthread_num_threads) >= PTHREAD_THREADS_MAX)
 {
   /* We have reached the limit on the number of threads per process.  */
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), 0);
 
   free (new);
   return EAGAIN;
@@ -162,7 +162,7 @@ retry:
  memory allocation, since that's a potentially blocking operation.  */
   max_threads = __pthread_max_threads;
 
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), 0);
 
   /* Allocate a new lookup table that's twice as large.  */
   new_max_threads
@@ -174,13 +174,13 @@ retry:
   return ENOMEM;
 }
 
-  __pthread_rwlock_wrlock (&__pthread_threads_lock);
+  lll_lock (GL (dl_pthread_threads_lock), 0);
 
   /* Check if nobody else has already enlarged the table.  */
   if (max_threads != __pthread_max_threads)
 {
   /* Yep, they did.  */
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), 0);
 
   /* Free the newly allocated table and try again to allocate a slot.  */
   free (threads);
@@ -188,22 +188,22 @@ retry:
 }
 
   /* Copy over the contents of the old table.  */
-  memcpy (threads, __pthread_threads,
+  memcpy (threads, GL (dl_pthread_threads),
  __pthread_max_threads * sizeof (struct __pthread *));
 
   /* Save the location of the old table.  We want to deallocate its
  

[RFC PATCH glibc 0/3] Rewrite THREAD_GSCOPE

2021-05-09 Thread Sergey Bugaev
Alright, here's my second attempt.

On Sat, May 8, 2021 at 10:54 PM Samuel Thibault  wrote:
> While at it, rather try to reproduce the other archs' behavior: use
> gscope_flag from the tcb, and switch THREAD_GSCOPE_IN_TCB to 1. That
> will make the implementation much simpler and scalable.

Well, this has been much less traightforward than the patch I sent
yesterday (but unlike that patch, it appears to build and work). I am
much less familiar with glibc internals than I am with Hurd, and it
shows. Please don't throw stones at me :)

For one thing, I don't think I understand how the buildsystem works.
(For instance, how do I even add a new source file to a library? Or
where does it decide which system-specific parts to build, such as
whether to build NPTL or HTL? How do I debug the buildsystem when
something doesn't work as expected?) It must be that I'm used to
either straightforward Makefiles, or higher-level buildsystems (like
CMake or Meson) that autogenerate the low-level boilerplate.

Next, I was somewhat confused by what the GL macro is. My current
understanding is: some data needs to be shared between the dynamic
loader and glibc proper, if glibc is being built as a shared library.
To support that case, such data is placed into a special struct
instance that itself resides in ld.so. Then the GL macros is used
to actually access those symbols. It switches, mostly transparently,
between accessing said global struct instance in ld.so, and simply
declaring global variables for the case of static build. Naming
such variable with the dl_ prefix is just a convention.

I likely got something wrong there. If I got the general idea wrong,
the second patch must be completely bogus.

Initially, I tried to get rid of THREAD_GSCOPE_IN_TCB completely,
since THREAD_GSCOPE is now in TCB in all ports. But it turned out
that the way it was actualy being used was not to check if the
thread scope flag is in TCB or not, but simply to distingusish NPTL
from HTL. Namely, files like elf/dl-tls.c would directly use NPTL
specifics like struct pthread if THREAD_GSCOPE_IN_TCB is 1.

The actual implementation of THREAD_GSCOPE_* is similar to the one
in NPTL. Each thread uses a flag in its TCB. THREAD_GSCOPE_WAIT()
has to iterate over all the threads and wait for each one to leave
the global scope. This has to be available from inside ld.so, which
is why I had to convert __pthread_threads (and friends) to
GL (dl_pthread_threads). NPTL does a similar thing with its global
thread lists, too.

Now, on to the good parts. It builds, and even appears to work
somewhat. And all the gsync_wake () calls are gone!

~/glibc/build$ ./testrun.sh /bin/uname
GNU
~/glibc/build$ ./testrun.sh --tool=rpctrace /bin/uname |& grep gsync
~/glibc/build$

Sergey Bugaev (3):
  XXX: Rename THREAD_GSCOPE_IN_TCB -> THREAD_GSCOPE_LINK_MAP
  XXX: Move __pthread_threads to ld.so
  XXX: Reimplement gscope

 elf/dl-reloc.c   |  4 +--
 elf/dl-support.c |  6 ++--
 elf/dl-tls.c |  6 ++--
 elf/rtld.c   |  2 +-
 htl/Versions |  2 +-
 htl/pt-alloc.c   | 46 
 htl/pt-create.c  | 11 ++
 htl/pt-internal.h| 28 +--
 sysdeps/aarch64/nptl/tls.h   |  2 +-
 sysdeps/alpha/nptl/tls.h |  2 +-
 sysdeps/arc/nptl/tls.h   |  2 +-
 sysdeps/arm/nptl/tls.h   |  2 +-
 sysdeps/csky/nptl/tls.h  |  2 +-
 sysdeps/generic/ldsodefs.h   | 18 ++
 sysdeps/generic/tls.h|  3 +-
 sysdeps/hppa/nptl/tls.h  |  2 +-
 sysdeps/htl/dl-thread_gscope_wait.c  | 36 +++
 sysdeps/htl/pt-key-delete.c  |  8 ++---
 sysdeps/htl/pthreadP.h   |  2 +-
 sysdeps/htl/raise.c  |  9 +++--
 sysdeps/htl/thrd_current.c   |  8 +++--
 sysdeps/i386/nptl/tls.h  |  2 +-
 sysdeps/ia64/nptl/tls.h  |  2 +-
 sysdeps/m68k/nptl/tls.h  |  2 +-
 sysdeps/mach/hurd/htl/pt-sigstate-init.c |  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.c|  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.h|  2 +-
 sysdeps/mach/hurd/i386/tls.h | 22 
 sysdeps/mach/hurd/tls.h  | 20 ---
 sysdeps/microblaze/nptl/tls.h|  2 +-
 sysdeps/mips/nptl/tls.h  |  2 +-
 sysdeps/nios2/nptl/tls.h |  2 +-
 sysdeps/powerpc/nptl/tls.h   |  2 +-
 sysdeps/riscv/nptl/tls.h |  2 +-
 sysdeps/s390/nptl/tls.h  |  2 +-
 sysdeps/sh/nptl/tls.h|  2 +-
 sysdeps/sparc/nptl/tls.h |  2 +-
 sysdeps/x86_64/nptl/tls.h|  2 +-
 38 files changed, 156 insertions(+), 117 d

rpctrace output improvements?

2021-05-12 Thread Sergey Bugaev
Hello,

I use rpctrace a lot, but I'm not particularly fond of its output
format. Other tracing tools, like strace [0] on Linux and Darling's
xtrace [1] can symbolicate many flags and structures, producing
something that looks much closer to C code, and comprehensible at a
glance.

[0]: https://strace.io/
[1]: https://docs.darlinghq.org/contributing/debugging.html#xtrace

I thought I'd look if I could start implementing the same for
rpctrace. Here's sample rpctrace output:

  111<--144(pid1004)->dir_lookup ("proc/loadavg" 1 0) = 0 1 "loadavg"
  163<--162(pid1004)
task133(pid1004)->mach_port_mod_refs (pn{ 21} 0 1) = 0
  163<--162(pid1004)->dir_lookup ("loadavg" 1 0) = 0 1 ""165<--158(pid1004)

I was thinking it could look like this instead (mock-up):

dir_lookup (file 111 (/), "proc/loadavg", O_READ, 0) ->
FS_RETRY_NORMAL, "loadavg", file 163 (/proc)
mach_port_mod_refs (task 133 (pid 1004), pn 21, MACH_PORT_RIGHT_SEND,
1) = KERN_SUCCESS
dir_lookup (file 163 (/proc), "loadavg", O_READ, 0) ->
FS_RETRY_NORMAL, "", file 165 (/proc/loadavg)

The few changes I've made here are:
* Moved the request port into the parens, like in C and MIG
* Added commas
* Replaced = with -> for denoting out params
* Dropped the 0 return code, except when there are no out params
* Symbolicated flags and file paths (potentially, also proc and auth handles)
* Some other misc tweaks to how ports are represented

I have started working on implementing this; but thought I'd ask for
feedback first. What do you think, does this sound worthwhile? Perhaps
you like some of the proposed changes but not others? Or is this a
terrible idea that I should forget about?

Sergey

P. S. There are other interesting features that we could copy from
strace, such as dynamically attaching to a process (I bet that would
be fun to implement!), tracing a subset of all calls (maps very well
to different RPC subsystems), and injecting errors (very useful for
testing all error handling code paths).



Re: rpctrace output improvements?

2021-05-12 Thread Sergey Bugaev
On Wed, May 12, 2021 at 11:48 PM Samuel Thibault
 wrote:
> > * Symbolicated flags and file paths (potentially, also proc and auth 
> > handles)
>
> For file path it'll probably be much harder since there can be several
> path (or even no path!) for a given file port.

And as far as I can see, there's no RPC to get the parent directory of
a file. I can do that for a dir with dir_lookup ("..") (and that's
what _hurd_canonicalize_directory_name_internal () uses), but it won't
work with arbitrary files.

So my plan is to use _hurd_canonicalize_directory_name_internal ()
with directories (previously unknown ports when they're used as a
request port in a dir_* RPC), and take note of file paths as file
ports are received from dir_lookup (). This should catch all files
that the tracee opens; but not any fds it inherits. This should not be
a big deal though; since when using rpctrace std{in,out,err} will
typically be the terminal, and for those we can always get a name.

Of course, there'd always be a fallback: if we see an unrecognized
flag value, or a port we cannot symbolicate, we'd just print its raw
value (as now).

> > and injecting errors (very useful for testing all error handling code
> > paths).
>
> That's probably a bit tricky to express.

I was thinking it'd use the same format as strace, e.g.
--inject=file:error=ENOENT:when=9..10 (inject ENOENT into the 9th and
10th syscalls from the "file" group).

Sergey

P. S. Could you please take a look at my glibc patches in the other thread?



Re: rpctrace output improvements?

2021-05-13 Thread Sergey Bugaev
On Wed, May 12, 2021 at 11:48 PM Samuel Thibault
 wrote:
> It'd probably mean introducing types
> here and there, but that'd be a really good thing to do.
>
> The port type is useful indeed and is most often available in the types
> already.

Oh, do you mean the types specified in .defs? If so, the problem is
rpctrace doesn't read those, it only reads the msgids files, which
contain lines like this:

fs 2 dir_lookup 18 20018 20118

(meaning, subsystem fs whose base msgid is 2, routine name is
dir_lookup, msgid relative to subsystem base is 18, absolute msgid is
20018, reply msgid is 20118). There are no types in there.

So either:
* rpctrace has to learn to parse defs, instead of (or in addition to) msgids
* GNU MIG needs to be taught to emit more info into msgids files
* rcptrace itself would hardcode knowledge of signatures of some common RPCs

My plan has been the latter, since there likely aren't that many RPCs
that need specific treatment, and since rpctrace already knows to
treat some RPCs (slightly) specially.

But perhaps you would recommend something else? Or did you mean
something else entirely?

On Thu, May 13, 2021 at 12:15 AM Samuel Thibault
 wrote:
> > And as far as I can see, there's no RPC to get the parent directory of
> > a file.
>
> Sure since the file can be hardlinked from various places.

Yes; but other systems manage to still make this work. E.g. on my GNU/Linux box:

$ touch foo
$ ln foo bar
$ ls -l /proc/self/fd/{3,4} 3>foo 4>bar
l-wx--. 1 sergey sergey 64 May 13 14:59 /proc/self/fd/3 -> /tmp/foo
l-wx--. 1 sergey sergey 64 May 13 14:59 /proc/self/fd/4 -> /tmp/bar

I don't know how this work in Linux; but in the SerenityOS kernel, an
open FileDescription keeps a reference to a Custody (an which is a
chain of Inode's + file names), which can then be used to recreate the
path the FileDescription has been opened through.

And the Hurd also does this, in fact! libnetfs and libdiskfs already
have char *path in struct peropen, meaning they already know for which
path this individual peropen was created, even if there are several
paths that can be used to get to the [i]node. It's just that there's
no RPC that exposes this info to clients.

> > P. S. Could you please take a look at my glibc patches in the other thread?
>
> It's piled up in the middle of so many dozens of stuff that people throw
> at me to do.

Well... thank you for reviewing my patches and otherwise guiding me so
far. And if there's a better way to contribute beside throwing even
more patches at you, I'd be glad to.

Sergey



How do I disclose a vulnerability?

2021-05-14 Thread Sergey Bugaev
As luck would have it, I have found a serious issue in a core
component of the Hurd. It is a denial of service, which can then be
turned into privilege escalation.

I have developed an exploit. Here is it in action:

sergey@sergey-hurd-box:~/hax$ id
uid=1000(sergey) gid=1000(sergey)
groups=1000(sergey),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),103(netdev)
sergey@sergey-hurd-box:~/hax$ ./hax
Got root auth port :)
root@sergey-hurd-box:~/hax# id
uid=0(root) gid=0(root) groups=0(root)
root@sergey-hurd-box:~/hax#

(To be clear, I'm not the first person to realize that, let's say,
_this way of doing things_ could be exploited. I just stumbled on a
piece of code, realized that it uses a problematic pattern, thought of
possible ramifications, and developed the specific exploit.)

As far as I can see from Git history, this vulnerability has been
present in the code base for more than 20 years. Is such a
vulnerability already known (and am I just late to the party)?

If it's not known, how do I responsibly disclose this, so that
nobody's system gets hacked? I guess I could send the vulnerability
description and the exploit source code in a private e-mail; is there
perhaps a dedicated GNU e-mail address for this purpose? How do we
ensure that a future commit fixing the vulnerability doesn't
immediately disclose what it was?

Or, should I just dump the whole thing out in the open on this mailing list?

Should we get a CVE ID assigned? Should we notify Debian?

Sergey

P. S. On a personal note, it has been *very* exciting to find the
issue and develop a successful exploit! But now I'm a bit lost as to
what to do next.

And sorry for throwing more stuff at you. This can certainly wait for
a few more days if it hasn't been discovered for 20 years.



Re: How do I disclose a vulnerability?

2021-05-14 Thread Sergey Bugaev
On Fri, May 14, 2021 at 4:30 PM Samuel Thibault  wrote:
> We don't have anything set up for disclosures, you can drop me an e-mail
> (ciphered if you can).

OK, I'll prepare a write-up and send it to you. And I will attempt to
use GPG for it.

I asked about this on the Fediverse; and got (among other replies)
this small guide [0] which sounds like a good plan of action. What do
you think?

Oh, and you would not believe this, but in the past couple of hours I
have discovered *another* vulnerability, unrelated to the first one;
it's even easier to exploit and also gives you root:

sergey@sergey-hurd-box:~/hax2$ ./hax2
Got root auth port :)
root@sergey-hurd-box:~/hax2# id
uid=0(root) gid=0(root) groups=0(root)
root@sergey-hurd-box:~/hax2#

Sergey

[0]: https://functional.cafe/@minoru/106234136976353911



Re: rpctrace output improvements?

2021-05-15 Thread Sergey Bugaev
On Sat, May 15, 2021 at 4:56 PM Samuel Thibault  wrote:
> > * rpctrace has to learn to parse defs, instead of (or in addition to) msgids
> > * GNU MIG needs to be taught to emit more info into msgids files
>
> Either of those. Probably the latter would be more interesting
> long-term. We could introduce a new file format that encodes the RPCs
> in a way that is easy to parse for programs, similary to what the msgid
> files, does, but more powerful.

Well, it just so happens that I have some experience in extending MIG
for precisely this purpose.

Darwin uses untyped MIG, so we couldn't just make xtrace iterate over
data items like rpctrace does. Instead I extended Darling MIG to
generate a separate C source file, fooXtraceMig.c, in addition to
fooUser.c and fooServer.c. The file consists of routines that can
parse a message blob back into its arguments; this is then built into
dynamic libraries that are installed into a special directory and
dlopened by xtrace on startup.

But I don't think we want to go all loadable dynamic libraries for
rpctrace; typed MIG makes this much easier to do with just a
declarative format.

Sergey



Re: [RFC PATCH glibc 2/3] XXX: Move __pthread_threads to ld.so

2021-05-15 Thread Sergey Bugaev
On Sat, May 15, 2021 at 5:09 PM Samuel Thibault  wrote:
> This can't go to the generic-purpose dl-support.c file.
> Create a sysdeps/mach/hurd/dl-thread_gscope_wait.c file, it's already
> getting included by elf/Makefile's routine variable.

I'm creating sysdeps/htl/dl-thread_gscope_wait.c in the next commit.
Is there a reason this has to go into
sysdeps/mach/hurd/dl-thread_gscope_wait.c rather than there, it seems
rather specific to HTL?

> The rest looks ok, did you try to run make check to make sure nothing
> broke?

I started it and it ran for some time, apparently without crashing.
But then I interrupted it before it could finish, and anyway I don't
know where to look for results and how to interpret them. I'll figure
it out.

Sergey



Re: How do I disclose a vulnerability?

2021-05-18 Thread Sergey Bugaev
On Fri, May 14, 2021 at 7:33 PM Sergey Bugaev  wrote:
> Oh, and you would not believe this, but in the past couple of hours I
> have discovered *another* vulnerability, unrelated to the first one;
> it's even easier to exploit and also gives you root

And one more, unrelated to either of the first two:

sergey@sergey-hurd-box:~/hax3$ gcc hax3.c -o hax3
sergey@sergey-hurd-box:~/hax3$ ids
effective uids: 1000(sergey)
effective gids: 1000(sergey) 24(cdrom) 25(floppy) 27(sudo) 29(audio)
30(dip) 44(video) 46(plugdev) 103(netdev)
available uids: 1000(sergey) 1000(sergey)
available gids: 1000(sergey) 1000(sergey)
sergey@sergey-hurd-box:~/hax3$ ./hax3
Got root auth port :)
root@sergey-hurd-box:~/hax3# ids
effective uids: 0(root)
effective gids: 0(root)
available uids: 0(root) 0(root)
available gids: 0(root) 0(root)
root@sergey-hurd-box:~/hax3#

There are other obvious issues with [this part of code], many of them
should also be exploitable.

Sergey



[PATCH] utils/ps.c: Don't limit output width to 80 on non-tty

2021-05-18 Thread Sergey Bugaev
If no output width limit has been set explicitly, and we're not printing
to a tty, do not limit output width.

In particular, this fixes grepping ps output.
---
 utils/ps.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/utils/ps.c b/utils/ps.c
index 2cf6e4bd..8e39c2c5 100644
--- a/utils/ps.c
+++ b/utils/ps.c
@@ -367,6 +367,11 @@ main(int argc, char *argv[])
   /* Parse our command line.  This shouldn't ever return an error.  */
   argp_parse (&argp, argc, argv, 0, 0, 0);
 
+  /* If no output width limit has been set explicitly, and we're not printing
+ to a tty, do not limit output width.  */
+  if (output_width == -1 && !isatty (1))
+output_width = 0;
+
   msgids_scan_std ();
 
   err = proc_stat_list_create(context, &procset);
-- 
2.31.1




[PATCH 0/1] Fix that procfs crash

2021-05-24 Thread Sergey Bugaev
This fixes the procfs crash that happens on shutdown, previously discussed
here: https://lists.debian.org/debian-hurd/2021/04/msg00035.html

Sergey Bugaev (1):
  procfs: Fix use-after-free

 procfs/rootdir.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

-- 
2.31.1




[PATCH 1/1] procfs: Fix use-after-free

2021-05-24 Thread Sergey Bugaev
Fix several issues with reference counting in
rootdir_make_translated_node ():

* Grab an additional reference while still holding the lock.

* Give the node an additional reference for being pointed to
  by the ops.

* Reference the existing node if we find it on the second try,
  not only if we find it on the first try.

* Dereference, not just clean, the created node if it turns out
  to be unneeded.

Fixes a crash with the following backtrace:

 #2  0x010855c8 in __assert_fail_backtrace (
 assertion=0x1051148 "! (r.hard == 1 && r.weak == 0) || !\"refcount 
detected use-after-free!\"",
 file=0x1050cec "../../libshouldbeinlibc/refcount.h", line=170, 
function=0x1051220 <__PRETTY_FUNCTION__.3> "refcounts_ref")
 at ../../libshouldbeinlibc/assert-backtrace.c:64
 #3  0x01050358 in refcounts_ref (result=0x0, ref=0x15009a0) at 
../../libshouldbeinlibc/refcount.h:170
 #4  netfs_nref (np=0x15008f0) at ../../libnetfs/nref.c:26
 #5  0x0804c9f4 in rootdir_make_translated_node (dir_hook=0x10001990, 
entry_hook=0x8055180 <__compound_literal.8>)
 at ../../procfs/rootdir.c:674
 #6  0x0804ace3 in procfs_dir_lookup (hook=0x10001c10, name=0x2857efc "mounts", 
np=0x2855d68) at ../../procfs/procfs_dir.c:88
 #7  0x0804a410 in procfs_lookup (np=0x10001c40, name=0x2857efc "mounts", 
npp=0x2855d68) at ../../procfs/procfs.c:185
 #8  0x0804cae5 in dircat_lookup (hook=0x10001d50, name=0x2857efc "mounts", 
np=0x2855d68) at ../../procfs/dircat.c:76
 #9  0x0804a410 in procfs_lookup (np=0x10001d80, name=0x2857efc "mounts", 
npp=0x2855d68) at ../../procfs/procfs.c:185
 #10 0x0804a96f in netfs_attempt_lookup (user=0x1401c60, dir=0x10001d80, 
name=0x2857efc "mounts", np=0x2855d68)
 at ../../procfs/netfs.c:212
 #11 0x0103fd5b in netfs_S_dir_lookup (dircred=, 
filename=, flags=,
 mode=, do_retry=, retry_name=, retry_port=,
 retry_port_type=) at ../../libnetfs/dir-lookup.c:175
---
 procfs/rootdir.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/procfs/rootdir.c b/procfs/rootdir.c
index 9f860a91..0e7c05c0 100644
--- a/procfs/rootdir.c
+++ b/procfs/rootdir.c
@@ -667,13 +667,12 @@ rootdir_make_translated_node (void *dir_hook, const void 
*entry_hook)
 
   pthread_spin_lock (&rootdir_translated_node_lock);
   np = *ops->npp;
+  if (np != NULL)
+netfs_nref (np);
   pthread_spin_unlock (&rootdir_translated_node_lock);
 
   if (np != NULL)
-{
-  netfs_nref (np);
-  return np;
-}
+return np;
 
   np = procfs_make_node (entry_hook, (void *) entry_hook);
   if (np == NULL)
@@ -686,11 +685,12 @@ rootdir_make_translated_node (void *dir_hook, const void 
*entry_hook)
   prev = *ops->npp;
   if (*ops->npp == NULL)
 *ops->npp = np;
+  netfs_nref (*ops->npp);
   pthread_spin_unlock (&rootdir_translated_node_lock);
 
   if (prev != NULL)
 {
-  procfs_cleanup (np);
+  netfs_nrele (np);
   np = prev;
 }
 
-- 
2.31.1




[RFC PATCH 0/1] gsync blocking the calling thread considered harmful

2021-06-10 Thread Sergey Bugaev
Hello,

while hacking on rpctrace -p, I once again ran into gsync_wait () calls 
permanently hanging rpctrace.

The reason for this is simple: once rpctrace logs the gsync_wait () call it 
receives from a traced task, it forwards the same gsync_wait () call to the 
actual task port of the traced task, and this causes rpctrace itself to block 
since the gsync_wait () implementation always affects the calling thread. 
Normally, some time later the blocked thread would be woken up by a gsync_wake 
() call done by another thread; but since rpctrace itself is single-threaded, 
other threads in the traced tasks can enqueue gsync_wake_request ()s in vain, 
as they'll never even get received by the hanging rpctrace, nor forwarded to 
the kernel.

One way to work around this would be to make rpctrace multithreaded, and I've 
heard there's been some work in that direction.

But to me it sounds like gsync_wait () blocking rpctrace is the part that goes 
wrong. Generally, rpctrace is never supposed to block on an RPC made by a 
traced task: it forwards the request message without blocking for the reply 
(that may come later, or never).

So I've been long thinking that gsync_wait () should do the same: instead of 
actually blocking the thread calling gsync_wait (), gsync_wait_request () 
should return immediately, and the reply message will come once someone calls 
gsync_wake (). This doesn't change anything for the regular callers of 
gsync_wait (), since the call will still appear to block the same way other 
RPCs do, but it will actually block on msg receive, not inside gsync_wait () 
itself.

This must have been discussed before, and there must be a reason why gsync_wait 
() was made to behave the way that it does and not in the (arguably simpler and 
more consistent) way I'm proposing; but I can't find any relevant discussion.

Anyway, I thought I'd try implementing my idea and seeing what would break. 
Much like with glibc, I'm not very familiar with Mach-the-kernel internals, but 
to my surprise the first version that compiled appears to work just fine, 
booting a full working Hurd system (and rpctracing gsync_wait () totally 
works). Still, I probably messed something up: some locking or reference 
counting or somesuch; so please review :)

The part that I didn't figure out yet is how the kernel can listen for a right 
to become a dead name (like the dead-name notification in userspace). Perhaps 
it amounts to calling ipc_port_dnrequest () and listening for messages just 
like in userspace, but I have not figured out the details yet. Without this, 
the kernel cannot really know when the reply port is deallocated, either 
explicitly by userspace or because the task died, so it means the kernel will 
leak memory allocated for the waiters. Another to-do item is timeout support; 
ideally it should just turn into waittime in the MIG definition, but this again 
requires handling the reply port dying, and would change the message format, 
and also GNU MIG doesn't yet support conditional timeouts (although I might 
have a MIG patch or two pending for this).

So, what do you think?

Sergey

P.S. Feels so good to hack on something that I can just post about in public 
again!



[RFC PATCH 1/1] gsync: Do not block the thread calling gsync_wait ()

2021-06-10 Thread Sergey Bugaev
Instead, delay the reply message until the corresponding
gsync_wake () call is made.

TODO: Handle the case of reply port dying.
---
 include/mach/gnumach.defs |   4 ++
 kern/gsync.c  | 112 +++---
 kern/gsync.h  |   8 ++-
 3 files changed, 89 insertions(+), 35 deletions(-)

diff --git a/include/mach/gnumach.defs b/include/mach/gnumach.defs
index 531b5d4d..06859196 100644
--- a/include/mach/gnumach.defs
+++ b/include/mach/gnumach.defs
@@ -33,6 +33,9 @@ subsystem
 GNUMACH_IMPORTS
 #endif
 
+type reply_port_t = MACH_MSG_TYPE_MAKE_SEND_ONCE | polymorphic
+   ctype: mach_port_t;
+
 type vm_cache_statistics_data_t = struct[11] of integer_t;
 
 type vm_wire_t = int;
@@ -98,6 +101,7 @@ routine register_new_task_notification(
  * - GSYNC_TIMED: The call only blocks for MSEC milliseconds. */
 routine gsync_wait(
   task : task_t;
+  sreplyport reply_port: reply_port_t;
   addr : vm_address_t;
   val1 : unsigned;
   val2 : unsigned;
diff --git a/kern/gsync.c b/kern/gsync.c
index e73a6cf0..aae39ba6 100644
--- a/kern/gsync.c
+++ b/kern/gsync.c
@@ -21,9 +21,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
+#define GSYNC_MSGH_SEQNO 0
+
+static struct kmem_cache gsync_waiter_cache;
+
 /* An entry in the global hash table. */
 struct gsync_hbucket
 {
@@ -57,12 +62,13 @@ union gsync_key
 } any;
 };
 
-/* A thread that is blocked on an address with 'gsync_wait'. */
+/* An RPC that is blocked on an address with 'gsync_wait'. */
 struct gsync_waiter
 {
   struct list link;
   union gsync_key key;
-  thread_t waiter;
+  ipc_port_t reply_port;
+  mach_msg_type_name_t reply_port_type;
 };
 
 /* Needed data for temporary mappings. */
@@ -83,6 +89,9 @@ void gsync_setup (void)
   list_init (&gsync_buckets[i].entries);
   kmutex_init (&gsync_buckets[i].lock);
 }
+
+  kmem_cache_init (&gsync_waiter_cache, "gsync_waiter",
+   sizeof (struct gsync_waiter), 0, NULL, 0);
 }
 
 /* Convenience comparison functions for gsync_key's. */
@@ -219,24 +228,38 @@ temp_mapping (struct vm_args *vap, vm_offset_t addr, 
vm_prot_t prot)
   return (paddr);
 }
 
-kern_return_t gsync_wait (task_t task, vm_offset_t addr,
-  unsigned int lo, unsigned int hi, natural_t msec, int flags)
+kern_return_t
+gsync_wait (task_t task,
+ipc_port_t reply_port, mach_msg_type_name_t reply_port_type,
+vm_offset_t addr,
+unsigned int lo, unsigned int hi, natural_t msec, int flags)
 {
+  struct gsync_waiter *w;
+
   if (task == 0)
 return (KERN_INVALID_TASK);
   else if (addr % sizeof (int) != 0)
 return (KERN_INVALID_ADDRESS);
 
+  if (! IP_VALID (reply_port))
+return (KERN_INVALID_CAPABILITY);
+  if (reply_port_type == MACH_MSG_TYPE_MOVE_RECEIVE)
+return (KERN_INVALID_RIGHT);
+
+  w = (struct gsync_waiter *) kmem_cache_alloc (&gsync_waiter_cache);
+  if (! w)
+return (KERN_RESOURCE_SHORTAGE);
+
   vm_map_lock_read (task->map);
 
-  struct gsync_waiter w;
   struct vm_args va;
   boolean_t remote = task != current_task ();
-  int bucket = gsync_prepare_key (task, addr, flags, &w.key, &va);
+  int bucket = gsync_prepare_key (task, addr, flags, &w->key, &va);
 
   if (bucket < 0)
 {
   vm_map_unlock_read (task->map);
+  kmem_cache_free (&gsync_waiter_cache, (vm_offset_t) w);
   return (KERN_INVALID_ADDRESS);
 }
   else if (remote)
@@ -270,6 +293,7 @@ kern_return_t gsync_wait (task_t task, vm_offset_t addr,
   vm_map_unlock_read (task->map);
   /* Make sure to remove the reference we added. */
   vm_object_deallocate (va.obj);
+  kmem_cache_free (&gsync_waiter_cache, (vm_offset_t) w);
   return (KERN_MEMORY_FAILURE);
 }
 
@@ -293,6 +317,7 @@ kern_return_t gsync_wait (task_t task, vm_offset_t addr,
   if (! equal)
 {
   kmutex_unlock (&hbp->lock);
+  kmem_cache_free (&gsync_waiter_cache, (vm_offset_t) w);
   return (KERN_INVALID_ARGUMENT);
 }
 
@@ -300,37 +325,18 @@ kern_return_t gsync_wait (task_t task, vm_offset_t addr,
* compares strictly greater than this waiter. */
   struct list *runp;
   list_for_each (&hbp->entries, runp)
-if (gsync_key_lt (&w.key, &node_to_waiter(runp)->key))
+if (gsync_key_lt (&w->key, &node_to_waiter(runp)->key))
   break;
 
-  /* Finally, add ourselves to the list and go to sleep. */
-  list_add (runp->prev, runp, &w.link);
-  w.waiter = current_thread ();
+  /* Finally, add ourselves to the list. */
 
-  if (flags & GSYNC_TIMED)
-thread_will_wait_with_timeout (w.waiter, msec);
-  else
-thread_will_wait (w.waiter);
+  list_add (runp->prev, runp, &w->link);
+  w->reply_port = reply_port;
+  w->reply_port_type = reply_port_type;
 
   kmutex_unlock (&hbp->lock);
-  thread_block (thread_no_continuation);
-
-  /* We're back. */
-  kern_return_t ret = KERN_SUCCESS;
-  if (current_thread()->wait_result != THREAD_AWAKENED)
-{
-  /* We were interrupted or timed out. */
-

[PATCH 0/1] Rewrite /hurd/symlink on top of trivfs

2021-06-21 Thread Sergey Bugaev
Hello,

I came by this issue [0], which basically states that /hurd/symlink was
behaving in a weird way. Namely, when opened with O_NOLINK, it would
appear to be an empty file, not a symlink. The report talks about broken
symlinks, but the issue is more fundamental:

$ settrans -ac /tmp/demo /hurd/symlink /etc/hostname
$ cat /tmp/demo
sergey-hurd-box
$ file /tmp/demo
/tmp/demo: empty
$ ls -l /tmp/demo
-rw-r--r-- 1 sergey sergey 0 Jun 21 15:52 /tmp/demo
$ fsysopts /tmp/demo
ext2fs --writable --relatime --no-inherit-dir-group --store-type=typed 
device:hd0s1

[0]: https://www.gnu.org/software/hurd/open_issues/symlink_translator.html

I took a look at trans/symlink.c, and found out that:

* It was not using libtrivs, as I would expect it to.

* It was treating O_NOLINK like O_NOTRANS; namely by returning the underlying
  node. This is the cause of the issue: O_NOLINK has very different semantics.
  It should open the link itself, and then suppport io_stat () and io_read ()
  on it, returning S_IFLNK | 0777 for the mode, and the link target as
  contents.

* It was leaking ports (should deallocate dotdot when not MOVE_SEND'ing it),
  and skipping some error checks.

So I thought I'd try rewriting /hurd/symlink to use trivfs. This is possible
because trivfs has this convenient trivfs_getroot_hook mechanism, which lets me
either handle fsys_getroot () any way I want to, or continue with the default
implementation found in trivfs (for O_NOLINK).

End result: it seems to work great. Absolute, relative; file, ls, ls -l, stat,
readlink, cat; everything just does what I'd expect it to. Even fsysopts now
shows the invocation properly. Some demos:

$ settrans -ac /tmp/demo trans/symlink /etc/hostname
$ cat /etc/hostname
sergey-hurd-box
$ cat /tmp/demo
sergey-hurd-box
$ file /tmp/demo
/tmp/demo: symbolic link to /etc/hostname
$ ls -l /tmp/demo
lrwxrwxrwx 1 sergey sergey 13 Jun 21 16:11 /tmp/demo -> /etc/hostname
$ fsysopts /tmp/demo
trans/symlink /etc/hostname

$ settrans -ac /tmp/demo trans/symlink no-such-file
$ cat /tmp/demo
cat: /tmp/demo: No such file or directory
$ file /tmp/demo
/tmp/demo: broken symbolic link to no-such-file
$ ls -l /tmp/demo
lrwxrwxrwx 1 sergey sergey 12 Jun 21 16:11 /tmp/demo -> no-such-file

Sergey Bugaev (1):
  trans/symlink.c: Rewrite on top of trivfs

 trans/Makefile  |  10 +-
 trans/symlink.c | 243 +++-
 2 files changed, 137 insertions(+), 116 deletions(-)

-- 
2.31.1




[PATCH 1/1] trans/symlink.c: Rewrite on top of trivfs

2021-06-21 Thread Sergey Bugaev
---
 trans/Makefile  |  10 +-
 trans/symlink.c | 243 +++-
 2 files changed, 137 insertions(+), 116 deletions(-)

diff --git a/trans/Makefile b/trans/Makefile
index 6cf50e7a..c04220a2 100644
--- a/trans/Makefile
+++ b/trans/Makefile
@@ -25,7 +25,7 @@ targets = symlink firmlink ifsock magic null fifo new-fifo 
fwd crash \
 SRCS = ifsock.c symlink.c magic.c null.c fifo.c new-fifo.c fwd.c \
crash.c firmlink.c password.c hello.c hello-mt.c streamio.c \
fakeroot.c proxy-defpager.c remap.c mtab.c
-OBJS = $(SRCS:.c=.o) fsysServer.o ifsockServer.o passwordServer.o \
+OBJS = $(SRCS:.c=.o) ifsockServer.o passwordServer.o \
crashServer.o crash_replyUser.o msgServer.o \
default_pagerServer.o default_pagerUser.o \
device_replyServer.o elfcore.o startup_notifyServer.o
@@ -48,11 +48,6 @@ device_reply-MIGSFLAGS=\
 "-DMACH_PAYLOAD_TO_PORT=ports_payload_get_name" \
 "-DDEVICE_IMPORTS=import \"$(srcdir)/../libports/ports.h\";"
 
-# fsysServer is only used by the symlink translator which does not use
-# libports.  Disable the default payload to port conversion.
-fsys-MIGSFLAGS = "-DHURD_DEFAULT_PAYLOAD_TO_PORT=1"
-fsysServer-CFLAGS = "-DMIG_EOPNOTSUPP=EOPNOTSUPP"
-
 # If we have a configured tree, include the configuration so that we
 # can conditionally build translators.
 ifneq (,$(wildcard ../config.make))
@@ -76,11 +71,10 @@ mtab: fsysUser.o
 password: passwordServer.o
 proxy-defpager: default_pagerServer.o default_pagerUser.o
 streamio: device_replyServer.o
-symlink: fsysServer.o
 
 fakeroot: ../libnetfs/libnetfs.a
 fifo new-fifo: ../libpipe/libpipe.a
-crash fifo firmlink hello hello-mt ifsock magic mtab new-fifo null password 
proxy-defpager remap streamio: ../libtrivfs/libtrivfs.a
+crash fifo firmlink hello hello-mt ifsock magic mtab new-fifo null password 
proxy-defpager remap streamio symlink: ../libtrivfs/libtrivfs.a
 $(targets): ../libfshelp/libfshelp.a \
../libihash/libihash.a \
../libiohelp/libiohelp.a \
diff --git a/trans/symlink.c b/trans/symlink.c
index 80d60f64..63c94048 100644
--- a/trans/symlink.c
+++ b/trans/symlink.c
@@ -1,5 +1,5 @@
 /* Translator for S_IFLNK nodes
-   Copyright (C) 1994, 2000, 2001, 2002 Free Software Foundation
+   Copyright (C) 1994, 2000, 2001, 2002, 2021 Free Software Foundation
 
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
@@ -15,30 +15,137 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
-#include 
+#include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include 
-#include "fsys_S.h"
 
-mach_port_t realnode;
-
-/* We return this for O_NOLINK lookups */
-mach_port_t realnodenoauth;
-
-/* We return this for non O_NOLINK lookups */
-char *linktarget;
-
-extern int fsys_server (mach_msg_header_t *, mach_msg_header_t *);
+char *link_target;
+size_t link_target_len;
 
 const char *argp_program_version = STANDARD_HURD_VERSION (symlink);
 
+int trivfs_fsid = 0;
+int trivfs_fstype = FSTYPE_MISC;
+
+int trivfs_support_read = 1;
+int trivfs_support_write = 0;
+int trivfs_support_exec = 0;
+
+int trivfs_allow_open = O_READ;
+
+void
+trivfs_modify_stat (struct trivfs_protid *cred, io_statbuf_t *st)
+{
+  st->st_size = link_target_len;
+  st->st_blocks = 0;
+  st->st_mode = S_IFLNK | 0777;
+}
+
+kern_return_t
+trivfs_S_io_read (struct trivfs_protid *cred,
+  mach_port_t reply,
+  mach_msg_type_name_t reply_type,
+  data_t *data,
+  mach_msg_type_number_t *data_len,
+  loff_t offset,
+  vm_size_t amount)
+{
+  if (!cred)
+return EOPNOTSUPP;
+  else if (!(cred->po->openmodes & O_READ))
+return EBADF;
+
+  if (amount > link_target_len)
+amount = link_target_len;
+
+  if (amount > *data_len)
+{
+  /* Try to make more space.  */
+  void *buffer;
+
+  buffer = mmap (0, amount, PROT_READ | PROT_WRITE, MAP_ANON, 0, 0);
+  if (buffer == MAP_FAILED)
+amount = *data_len;
+  else
+{
+  *data = buffer;
+  *data_len = amount;
+}
+}
+  else
+*data_len = amount;
+
+  memcpy (*data, link_target, amount);
+  return 0;
+}
+
+error_t
+get_root (struct trivfs_control *fsys,
+  mach_port_t reply_port,
+  mach_msg_type_name_t reply_port_type,
+  mach_port_t dotdot,
+  uid_t *uids, size_t nuids,
+  uid_t *gids, size_t ngids,
+  int flags,
+  retry_type *do_retry,
+  char *retry_name,
+  mach_port_t *retry_node,
+  mach_msg_type_name_t *retry_node_type)
+{
+  /* If O_NOLINK is specified, proceed
+ with opening the link itself.  */
+  if (flags & O_NOLINK)
+return EAGAIN;
+

Re: [PATCH 0/1] Rewrite /hurd/symlink on top of trivfs OFF TOPIC PRAISE

2021-07-28 Thread Sergey Bugaev
On Wed, Jul 28, 2021 at 10:39 PM  wrote:
> Thanks for working on this!

I don't believe it: I finally got my own "OFF TOPIC PRAISE" letter!
Thank you for your kind words :)

> I'm wanting to learn more about trivfs for my caesar
> cipher eventually, but looking that how to write trivial translators makes
> me a little curious...I'll learn more about it someday.  :)
>
> It would be cool to have a gemini translator.  It would only take a few 100
> lines of code.

Oh yes, a Gemini translator actually sounds like a perfect exercise!
It's not as simple as /hurd/hello or /dev/yes, but still simple
enough, and would be actually useful for the end users.

Why don't you start working on it? If you run into any trouble (or are
unsure where to start from), feel free to ask me! — I'd be glad to
help you, in so much as I understand things myself (which may not be
that much, actually). Perhaps others on this list would be interested
in helping you too.

One particular thing that may turn out to be complicated: TLS, which
is mandatory in Gemini. I admit I don't have any experience
interacting with TLS from C myself, and I hear that things are indeed
complicated there [0]. So what I would do is ignore the TLS
requirement for now and implement just cleartext Gemini, even though
it's against the spec; then once things are working figure out TLS.

[0]: 
https://blogs.gnome.org/mcatanzaro/2018/11/11/the-gnome-and-webkitgtk-networking-stack/

Sergey



Re: [PATCH 0/1] Rewrite /hurd/symlink on top of trivfs OFF TOPIC PRAISE

2021-07-29 Thread Sergey Bugaev
On Wed, Jul 28, 2021 at 11:25 PM Sergey Bugaev  wrote:
> On Wed, Jul 28, 2021 at 10:39 PM  wrote:
> > It would be cool to have a gemini translator.  It would only take a few 100
> > lines of code.
>
> Oh yes, a Gemini translator actually sounds like a perfect exercise!
> It's not as simple as /hurd/hello or /dev/yes, but still simple
> enough, and would be actually useful for the end users.

Thinking about it further,

while it's true that you should be able to write a very simple
trivfs-based Gemini translator that would basically do a request for
each open and let the client read the response body,

for a more serious, solid Gemini translator you'd want much more than
that. In particular, you'd want the ability to talk about directories
as filesystem nodes even if there's no document at that URL. So for
instance,

$ settrans gemini: /hypothetical/gemini
$ cat gemini://example.com/foo/bar/hello
Hello
# You'd also want this to work:
$ cd gemini://example.com/foo
$ cat bar/hello
Hello
# ...even if there's nothing (error code 51) at gemini://example.com/foo

Ideally, cd'ing into a directory would not even cause a network
request, only trying to read a file would. Moreover, you'd want ls(1)
to work, at least somewhat. We could learn a trick from httpfs here:
when asked to list the contents of a directory, fetch the page and
collect links that point to pages inside the directory. For instance,
gemini://gemini.circumlunar.space/ contains a bunch of links to its
subpages, which we would parse out and display as directory entries.
Thankfully, Hurd fully supports treating a single node as both a
directory and a readable file.

I think this is all implementable, but would require using netfs,
probably overriding netfs_S_dir_lookup (), and quite a bit more than a
few hundred lines of code. :|

Sergey



Re: Hurd Security vulnerabilities, please upgrade!

2021-08-10 Thread Sergey Bugaev
On Tue, Aug 10, 2021 at 5:04 AM Samuel Thibault  wrote:
> In the past months, Sergey Bugaev has been working on fixing some
> Hurd security vulnerabilities.

Well I certainly wasn't doing it alone :)

Samuel and me have been working together over the past few months to
design and implement fixes for the several severe vulnerabilities in
the Hurd. (How many of those vulnerabilities we have fixed is hard to
quantify, but it's more than just the three I reported initially.)

I worked on:
- Actually finding the vulnerabilities and developing exploits for them
- Coming up with potential ways we could work towards fixing them
- Actually writing most of the code
- Testing it in a subhurd

Samuel helped with reviewing my changes and making design decisions;
towards the end he got some time and joined in with testing,
debugging, and writing code.

None of the vulnerabilities were as simple as an off-by-one error or a
missing check; they all had to do with certain mechanisms being
structured in a way that makes them subtly insecure, which is why
fixing them required a lot of design work. We ended up switching our
approach several times; I believe our final version is much better
than what we were trying to do initially. In the end, we managed to
make the changes way less invasive than it seemed they had to be, and
they complicate things much less than it initially appeared was
necessary. Still, the changes touch most of the components of the
Hurd.

We were aiming to make it in time for the upcoming Debian release, to
make sure it already contains the fixed versions. There were some
troubles and a change of approach and new bugs discovered (and fixed)
in the last few days, but apparently we did make it in time!

I urge everybody to upgrade (and reboot!) their systems as soon as
possible. I have already updated mine, and can confirm that all my
exploits fail now.

Sergey



Re: Problems with Xorg

2021-08-10 Thread Sergey Bugaev
For what it's worth, a few years ago I was able to successfully run
Xfce, LXDE, GNUstep, parts of GNOME, and even Plan 9 from User Space.

Since then things have regressed. At one point Xorg was failing to
find any displays at all, lately it's been starting just fine and
displaying simple windows properly, but it's unable to take any input
from my mouse and keyboard, so it's still not usable.

Sergey



Re: Hurd Security vulnerabilities, please upgrade!

2021-08-10 Thread Sergey Bugaev
Hello!

On Tue, Aug 10, 2021 at 2:53 PM Richard Braun  wrote:
> I have issues mounting /home (or rather, keeping it mounted) on
> darnassus.sceen.net since the reboot. Samuel already has root access
> so he can look at it when he has time.

That's... not good. I hope we'll figure it out.

By the way, is there still a way to get an account on darnassus? I
hope I qualify as "people who have already shown some level of
involvement in the project". (But I won't be offended or anything if
the answer is no.)

(A word of warning though: I may have another vulnerability or two up
my sleeve, so giving me an account isn't that different from giving me
root access.)

> I haven't looked at the work, but if Samuel merged it, I trust it's
> good. Thanks :).
>
> --
> Richard Braun

It would be great to get code and design review from you too, if you
happen to have time for that.

Sergey



Re: Problems with Xorg

2021-08-10 Thread Sergey Bugaev
On Tue, Aug 10, 2021 at 4:42 PM Almudena Garcia
 wrote:
>
> > Since then things have regressed. At one point Xorg was failing to
> > find any displays at all, lately it's been starting just fine and
> > displaying simple windows properly, but it's unable to take any input
> > from my mouse and keyboard, so it's still not usable.
>
> This is not the point here. I am able to run MATE properly in Debian 
> GNU/Hurd, and it works with the mouse and keyboard without problems. Even I 
> can play a video in VLC.
> I repeated the test in real hardware, and it works without lags.
>
> But currently there are a "little" bug which avoid to access to filesystem 
> from files explorer, both in Xfce than MATE.
> It seems a coding error, because the error refers to a type problem (check 
> attached image).
>
> This is the only bug (that i know) which currently avoid full functionality 
> in MATE, and this is the reason because I advice to fix it.
>
> Could you check it?
> Thanks

I cannot check it because I cannot reproduce it — as I said, I don't
get to the stage where I could attempt to open a file explorer. It
also sounds quite likely to only be reproducible on your machine, not
anyone else's.

You should debug or rpctrace it to see what it actually tries to do
and how exactly it fails.

Sergey



Re: Problems with Xorg

2021-08-10 Thread Sergey Bugaev
On Tue, Aug 10, 2021 at 11:41 PM Samuel Thibault
 wrote:
> Are you at the console while running this? Normally vcs is a symlink to
> /dev/vcs/1 or such.

Good point, that works from the console. Is it supposed to be
dependent on who's asking, like /dev/console in Unix? (If so, why
doesn't it like ptys?) Or does it show the currently active virtual
console? (If so, what does it matter whether I'm asking from a
console?)

Where's that even implemented? — grep vcs -r console/ returns nothing

> (WW) Hotplugging requested but the server was compiled without a config 
> backend. No input devices were configured, the server will start without any 
> input devices.
>
> I don't know where that hotplug request comes from, but that cannot work.

Thanks, that looks relevant indeed!

Amusingly enough, the first result in Google points to someone else
having the same issue with X on the Hurd:
https://www.mail-archive.com/bug-hurd@gnu.org/msg30877.html

Looking at X source code (hw/xfree86/common/xf86Config.c), this seems
to refer to an option named AutoAddDevices. And sure enough, I have:

sergey@sergey-hurd-box:~$ grep AutoAddDevices -r /etc/X11/
/etc/X11/xorg.conf.d/auto-all.conf:Option "AutoAddDevices" "true"
sergey@sergey-hurd-box:~$ dpkg -S /etc/X11/xorg.conf.d/auto-all.conf
dpkg-query: no path found matching pattern /etc/X11/xorg.conf.d/auto-all.conf

What the hell? How'd that file get there? I certainly didn't write it.

Sergey



Re: Problems with Xorg

2021-08-10 Thread Sergey Bugaev
On Wed, Aug 11, 2021 at 12:04 AM Samuel Thibault
 wrote:
> > What the hell? How'd that file get there? I certainly didn't write it.
>
> No idea, and
>
> https://codesearch.debian.net/search?q=auto-all.conf&hl=en
>
> doesn't know about it either.

Well in any case, removing that file made input work once again! Thanks!

I can now run rio, 9term, and xterm. LXDE is still having troubles
with D-Bus or something. But we can figure that out separately.

Sergey



Re: Debian Hurd release this week-end

2021-08-13 Thread Sergey Bugaev
On Fri, Aug 13, 2021, 01:25 Samuel Thibault  wrote:

> So as usual, along the Debian release, we will have a Debian Hurd
> release. I have drafted the following list of news since the 2019
> release: <...>
> Is there some other big line that I am forgetting?
>

Perhaps "multiple security improvements" deserve a mention?

Sergey

>


Re: Debian Hurd release this week-end

2021-08-13 Thread Sergey Bugaev
On Fri, Aug 13, 2021, 10:05 Sergey Bugaev  wrote:

> Perhaps "multiple security improvements" deserve a mention?
>

Oh, or is it not going to make it into the release because of the
unfinished copyright assignment? :(

Sergey

>


Re: Regarding copyright assignment to FSF

2021-08-14 Thread Sergey Bugaev
On Sat, Aug 14, 2021 at 8:43 AM Michael Banck  wrote:
> The fact that this process potentially or apparently took (or rather,
> has been taking) months for Sergey (I don't know when it was initiated),
> is a pretty good indicator that it is more than a nuisance.

Well, this is partly my own fault: I've been postponing trying to scan
the signed papers (I can hack on kernel internals alright, but ask me
to deal with a scanner and I'm lost). But that being said, the FSF has
also been consistently slow to respond, so it would take months even
if I had done everything promptly.

Sergey



Re: PCI arbiter memory mapping

2021-08-16 Thread Sergey Bugaev
On Mon, Aug 16, 2021 at 9:17 PM Samuel Thibault  wrote:
> > > I'd rather see a
> > > hurd-specific libpciaccess function to get the pager.
> >
> > That's better, but we'd have to add that function in both hurd_pci.c and
> > x86_pci.c. I don't like the idea of adding Hurd specific code to x86_module
> > but there's already a lot of it and we could avoid the existence of struct
> > pci_user_data, which I like even less.
>
> Actually I'm thinking that this is just another case of mremap(). The
> underlying question is getting the memory object of a given memory
> range, like vm_region does. We need to be careful since we don't want
> any process to be able to get the memory object of any other process,
> but it does make sense, and would be useful for mremap.

IMO the proper way to get mremap () is vm_remap () — basically like
vm_map (), but instead of a memory object to map it takes another
(task, address) pair. OSF (or just Apple's?) Mach has it [0], although
I'd add 'anywhere' and 'unmap_source' flags to that interface.

[0] https://developer.apple.com/documentation/kernel/1585336-vm_remap

This makes me think of something cooler though: what if there was an
API to create a proxy memory object from a (task, address, size)
triple? Then you'd be able to map the new proxy, and there wouldn't be
a need for a separate vm_remap () — all without getting direct access
to the underlying memory object(s).

So, something like this:

routine vm_create_proxy (
target_task : vm_task_t;
address : vm_address_t;
size : vm_size_t;
out proxy : memory_object_t);

Then mremap () could be implemented like this:

vm_create_proxy (mach_task_self (), old_address, old_size, &proxy);
vm_deallocate (mach_task_self (), old_address, old_size);
vm_map (mach_task_self (), new_address, new_size, ..., proxy, ...);
mach_port_deallocate (mach_task_self (), proxy);

(Hmm, but what about increasing the map size? This cannot be done
securely for non-anonimous mappings, can it?)

This could also be used for other fun things, like implementing
/proc/pid/mem in a coherent way. (It can be implemented in a sad way
without this by making a pager that does vm_read () / vm_write () when
asked to retrieve / store a page.)

What do you think?

Sergey



Re: PCI arbiter memory mapping

2021-08-16 Thread Sergey Bugaev
On Mon, Aug 16, 2021 at 10:28 PM Samuel Thibault
 wrote:
> Sergey Bugaev, le lun. 16 août 2021 21:59:47 +0300, a ecrit:
> > IMO the proper way to get mremap () is vm_remap ()
>
> But that doesn't answer Joan's need for getting a memory object, to be
> able to put a proxy memory object on top of it to make it read-only.

Ah, I wasn't really following the discussion. I'm talking specifically
about mremap.

> But then it'll create a pile of proxies if the application calls mremap
> several times, which I guess some applications will happily try to do.

No, for two reasons:

1. The kernel should just transparently short-circuit nested proxies
at proxy creation time — make the new proxy reference the original
memory object and not the other proxy; this way the other proxy can be
deallocated when it's no longer referenced, even if the new proxy
continues to exist. I believe this is what Joan has suggested above:

> I think it'd be a better solution to move the call to
> memory_object_proxy_valid() and the start value overwrite to
> memory_object_create_proxy()

2. As always, when you map a proxy it's the original memory object
that actually gets mapped (and referenced, and kept alive); the proxy
will then get deallocated when the task does mach_port_deallocate
(mach_task_self (), proxy). In other words, the proxy is short-lived
and only serves to make this whole operation representable. (This is
transparent to userspace though: from its perspective it's the proxy
that stays mapped.)

Sergey



Re: PCI arbiter memory mapping

2021-08-16 Thread Sergey Bugaev
On Mon, Aug 16, 2021 at 11:43 PM Samuel Thibault
 wrote:
> Here, possibly proxies can indeed work properly. But please do look at
> what Joan's situation is to make sure it does.

I don't think I understand enough about the situation. It would help
if you or Joan were to kindly give me some more context :)

As I understand it, there's the PCI arbiter, which is a translator
that arbitrates access to PCI, which is a hardware bus that various
devices can be connected to. The hardware devices connected via PCI
are available (to the PCI arbiter) as Mach devices, and in particular
it's possible to use device_map () and then vm_map () to access the
device memory. Then there's libpciaccess whose Hurd backend uses the
files exported by the PCI arbiter to get access to the PCI, including
mapping device memory. Naturally its user can request read-only or
read-write mapping, but the PCI arbiter may decide to only return a
read-only memory object (a proxy to the device pager), in which case
libpciaccess should deallocate the port and return EPREM, or the PCI
arbiter may return the real device pager.

Is that right? What am I missing? What's the issue you're trying to solve?

Sergey



Re: PCI arbiter memory mapping

2021-08-16 Thread Sergey Bugaev
On Tue, Aug 17, 2021 at 12:10 AM Samuel Thibault
 wrote:
> Yes, and we want to implement nesting: providing a sub-hurd with a
> partial view of the PCI space.

Interesting. But I still don't think I understand what the problem you
are running into.

boot(1) already emulates Mach devices; it would itself use
libpciaccess (or the PCI arbiter directly, since it doesn't need
portability?) to get access to the host's devices that are to be
passed through to the subhurd. Then it'd return whatever memory
objects it got from the host's PCI arbiter — or proxies to them — from
emulated device_map (). The kernel would take care of short-circuiting
the nested proxies, although this is just a tiny optimization in this
case and is not required for correctness or anything like that.

Is that how you're implementing it?

(Hmm, how does I/O port access work in a subhurd? I'm guessing the
subhurd has to be privileged? But then it probably has to anyway, to
access the PCI devices?)

Sergey



Re: Problems trying to install Debian GNU/Hurd 2021 in SATA AHCI mode, over Thinkpad T410

2021-08-17 Thread Sergey Bugaev
On Tue, Aug 17, 2021 at 7:44 PM Almudena Garcia
 wrote:
> But, in fstab, the drives are registered as /dev/hd0s1 (root partition in 
> harddisk), and /dev/hd2 (cdron drive).
> I can try to change them to sdX naming, but is a bit risky

fstab only comes into play later in the boot process (certainly not
until the root partition is mounted). You'd need to instruct GRUB to
pass different arguments to the boot filesystem.

Sergey



Re: PCI arbiter memory mapping

2021-08-17 Thread Sergey Bugaev
On Tue, Aug 17, 2021 at 12:38 AM Samuel Thibault
 wrote:
> The root pci-arbiter uses libpciaccess' x86 backend to access PCI

On Tue, Aug 17, 2021 at 9:47 PM Joan Lledó  wrote:
> Yes, and the arbiter can play two roles: root arbiter, which uses x86
> module in libpciacces; and nested arbiter, which uses the hurd module in
> libpciaccess.
>
> > The hardware devices connected via PCI are available (to the PCI arbiter)
> > as Mach devices
>
> Actually, the devices are available to the arbiter as libpciaccess devices

Thank you both for explaining, this is the part that I was missing:
that the arbiter itself uses libpciaccess, and that it cannot map the
devices directly, it has to map /dev/mem.

To me it sounds like libpciaccess should have a Hurd-specific API
addition that would let the user get the memory object backing the
mapping created by device_map_region (). I.e., device_map_region () is
a cross-platform API that maps the device memory into your address
space (right?), but on the Hurd there'd also be a way to actually get
the memory object it would map (and map it yourself if you so choose,
or do something else).

It would be the job of that libpciaccess API to make this object have
the right offset and everything, so that the caller wouldn't have to
worry about that. If I understand this right, its Hurd backend already
gets the memory object with the right offset and size, and would
return that directly; while the x86 backend would either have to use
the unimplemented-as-of-now offset parameter in device_map (), or
create and return an appropriate proxy from that API.

In memory_object_create_proxy (), the kernel would take care of
short-circuiting nested proxy creation to make that a non-issue. This
will allow netfs_get_filemap (VM_PROT_READ) to create another proxy
just for enforcing read-only access without worrying that the object
might already be a proxy.

Does that sound remotely sensible? :)
Please keep in mind that while (I think) I understand Mach VM, I have
very little idea about PCI. I'm (obviously) not in a position to
decide what's best for libpciaccess and the PCI arbiter.

Sergey



Re: PCI arbiter memory mapping

2021-08-17 Thread Sergey Bugaev
On Tue, Aug 17, 2021 at 9:14 PM Joan Lledó  wrote:
> > The underlying question is getting the memory object of a given memory
> > range, like vm_region does.
>
> Yes, I see that could be useful. Is vm_region not workig for proxies?
>
>  > We need to be careful since we don't want any process to be able to
>  > get the memory object of any other process
>
> Is that not being checked yet? can I call vm_region to get another
> task's memory object now?

It doesn't matter whether it's another task or not, or whether it's a
proxy or not (in fact it will never be a proxy, but that doesn't
matter): you can no longer get the underlying memory object with
vm_region () — for, uh, obvious security reasons. It was a gnumach
extension anyway — vanilla Mach doesn't support that.

(Well in fact what actually changed is that gnumach at some point
allowed the use of "memory object name ports" in vm_map (), and now
once again doesn't. It never allowed you to get the actual memory
object.)

Sergey



Re: PCI arbiter memory mapping

2021-08-18 Thread Sergey Bugaev
On Wed, Aug 18, 2021 at 8:34 PM Joan Lledó  wrote:
> El 18/8/21 a les 0:13, Sergey Bugaev ha escrit:
> > you can no longer get the underlying memory object with vm_region ()
>
> How so? reading the docs I understood it does:
>
> https://www.gnu.org/software/hurd/gnumach-doc/Memory-Attributes.html
>
> "The port object_name identifies the memory object associated with this
> region"

Yes; the docs could have been clearer about this — and I was myself
confused about this not long ago. What it gives you is a _memory
object name port_ which "identifies" the memory object, in the sense
that you can get name ports for two regions (even from different
tasks) and they'll compare equal if it's the same memory object mapped
into both. And that's about the only thing you can do with the name
port. It's *not* the actual port to the underlying memory object.

> > (Well in fact what actually changed is that gnumach at some point
> > allowed the use of "memory object name ports" in vm_map (), and now
> > once again doesn't.
>
> vm_map does receive  the memory object as parameter:
>
> https://www.gnu.org/software/hurd/gnumach-doc/Mapping-Memory-Objects.html
>
> "memory_object is the port that represents the memory object: used by
> user tasks in vm_map"

Previously, as a GNU Mach extension, you could also pass the memory
object *name* port (one you get from vm_region ()), instead of the
memory object port. But not anymore.

> > It never allowed you to get the actual memory
> > object.)
>
> vm_map receives a memory object, doesn't return it

Yes. What I'm saying is vm_region () gives you the name port, which
previously could be used in vm_map () much like the memory object port
(this was allowed to enable mremap () to be implemented). vm_region ()
never gave out the actual memory object port, but the name port could
be used in vm_map () in a similar way to the real memory object port.

Sergey



Re: PCI arbiter memory mapping

2021-08-18 Thread Sergey Bugaev
On Wed, Aug 18, 2021 at 8:43 PM Joan Lledó  wrote:
> El 18/8/21 a les 0:02, Sergey Bugaev ha escrit:
> > To me it sounds like libpciaccess should have a Hurd-specific API
> > addition that would let the user get the memory object
>
> That's a solution and can be done. But I'd like to know more about
> vm_region first. It seems it can return the object, and I don't see why
> is a security problem to allow a task to retrieve objects belonging to
> itself.

Basically because it allowed the task to overcome any limitations
imposed onto it by max_protection (and offset/size). If you have page
2 of a memory object mapped into your address space read-only, that
shouldn't let you write to page 5.

Sergey



[RFC PATCH htl v2 0/4] Rewrite THREAD_GSCOPE

2021-08-30 Thread Sergey Bugaev
Hello,

this is a revised version of the patchset I have sent a few months ago [0].
Sending to both bug-hurd and libc-alpha this time.

[0]: https://lists.gnu.org/archive/html/bug-hurd/2021-05/msg00063.html

The goal is to get rid of the excessive (and redundant) gsync_wake () calls.
The means are rewriting how THREAD_GSCOPE_* is implemented in HTL to work more
like its NPTL counterpart. To get this working, I first have to expose HTL's
internal thread table to ld.so, a lot like NPTL already does.

Before this patchset:
$ rpctrace uname |& grep -c gsync_wake
57

With this patchset (first 3 patches):
$ ./testrun.sh --tool=rpctrace /bin/uname |& grep -c gsync_wake
0

The unsolved question with this patchset is the fate of THREAD_GSCOPE_IN_TCB.
Apparently, when it was introduced in [1], it was indeed only used to
distinguish between cases where the GSCOPE flag is in TCB or not. But as of
now, it is used mostly as a generic way to conditionally compile either HTL-
or NPTL-specific code, mostly for TLS initialization. Now that all ports have
the GSCOPE flag in TCB, this definition no longer does what it says.

[1]: 
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a5df0318ef30a4dcff3fa2cb82265f641813d9ea

I can think of two ways this can be handled:

* Code can be rewritten and reorganized to use port-specific files instead of
  conditional compilation. From looking at how glibc source is organized, this
  appears to be the generally preferred way. I'm sending one more patch that
  makes a (perhaps misguided) attempt to do this; but I'm not sure what would
  be good way to handle the definitions in  -- please take a look.

* A more conservative approach would be to leave the code using
  THREAD_GSCOPE_IN_TCB as is for now, but change the variable's name to better
  reflect what it's used for. I haven't really been able to come up with a good
  name, though.

Alternatively, it could just be left as is for now.

Sergey Bugaev (4):
  htl: Move thread table to ld.so
  htl: Reimplement GSCOPE
  testrun.sh: Add support for --tool=rpctrace
  XXX: Attempt to get rid of most THREAD_GSCOPE_IN_TCB usages

 Makefile   |  9 +++-
 elf/dl-reloc.c | 21 -
 elf/dl-support.c   | 15 --
 elf/dl-tls.c   | 39 ---
 htl/Versions   |  2 -
 htl/pt-alloc.c | 50 
 htl/pt-create.c| 11 ++---
 htl/pt-internal.h  | 23 +++--
 sysdeps/generic/ldsodefs.h | 10 ++--
 sysdeps/htl/dl-support.c   | 28 +++
 sysdeps/htl/dl-thread_gscope_wait.c| 55 ++
 sysdeps/htl/dl-tls.c   | 39 +++
 {elf => sysdeps/htl}/dl-tls_init_tp.c  |  4 +-
 sysdeps/htl/pt-key-delete.c|  8 ++--
 sysdeps/htl/pthreadP.h |  2 -
 sysdeps/htl/raise.c|  8 +++-
 sysdeps/htl/thrd_current.c |  7 ++-
 sysdeps/mach/hurd/htl/pt-sigstate-init.c   |  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.c  |  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.h  |  2 +-
 sysdeps/mach/hurd/i386/tls.h   | 19 
 sysdeps/mach/hurd/tls.h| 20 +---
 sysdeps/nptl/dl-support.c  | 26 ++
 sysdeps/nptl/dl-tls.c  | 53 +
 sysdeps/unix/sysv/linux/alpha/dl-support.c |  2 +-
 25 files changed, 287 insertions(+), 170 deletions(-)
 create mode 100644 sysdeps/htl/dl-support.c
 create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c
 create mode 100644 sysdeps/htl/dl-tls.c
 rename {elf => sysdeps/htl}/dl-tls_init_tp.c (93%)
 create mode 100644 sysdeps/nptl/dl-support.c
 create mode 100644 sysdeps/nptl/dl-tls.c

-- 
2.31.1




[RFC PATCH htl v2 4/4] XXX: Attempt to get rid of most THREAD_GSCOPE_IN_TCB usages

2021-08-30 Thread Sergey Bugaev
Try to do this by adding HTL- and NPTL-specific versions of
dl-support.c, dl-tls.c, and dl-tls_init_tp.c, and moving HTL and NPTL
specifics there.

The remaining user of THREAD_GSCOPE_IN_TCB is , which I'm
not sure how to handle. I see that there are sysdeps versions of it,
so it could be possible to create one for NPTL and one for HTL.

However, I don't see an existing way for these port-sepcific ldsodefs
versions to add their own members to struct rtld_global. One way to do
this would be to introduce a macro like LDSO_GLOBAL_EXTRA_MEMBERS, but
I'm not sure whether that's a good or even acceptable approach.

Signed-off-by: Sergey Bugaev 
---
 elf/dl-reloc.c | 21 -
 elf/dl-support.c   | 14 --
 elf/dl-tls.c   | 39 
 sysdeps/htl/dl-support.c   |  5 ++
 sysdeps/htl/dl-tls.c   | 39 
 {elf => sysdeps/htl}/dl-tls_init_tp.c  |  4 +-
 sysdeps/nptl/dl-support.c  | 26 +++
 sysdeps/nptl/dl-tls.c  | 53 ++
 sysdeps/unix/sysv/linux/alpha/dl-support.c |  2 +-
 9 files changed, 125 insertions(+), 78 deletions(-)
 create mode 100644 sysdeps/htl/dl-tls.c
 rename {elf => sysdeps/htl}/dl-tls_init_tp.c (93%)
 create mode 100644 sysdeps/nptl/dl-support.c
 create mode 100644 sysdeps/nptl/dl-tls.c

diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index e13a672ade..0d97af752a 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -141,27 +141,6 @@ cannot allocate memory in static TLS block"));
 }
 }
 
-#if !THREAD_GSCOPE_IN_TCB
-/* Initialize static TLS area and DTV for current (only) thread.
-   libpthread implementations should provide their own hook
-   to handle all threads.  */
-void
-_dl_nothread_init_static_tls (struct link_map *map)
-{
-#if TLS_TCB_AT_TP
-  void *dest = (char *) THREAD_SELF - map->l_tls_offset;
-#elif TLS_DTV_AT_TP
-  void *dest = (char *) THREAD_SELF + map->l_tls_offset + TLS_PRE_TCB_SIZE;
-#else
-# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
-#endif
-
-  /* Initialize the memory.  */
-  memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
- '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
-}
-#endif /* !THREAD_GSCOPE_IN_TCB */
-
 void
 _dl_relocate_object (struct link_map *l, struct r_scope_elem *scope[],
 int reloc_mode, int consider_profiling)
diff --git a/elf/dl-support.c b/elf/dl-support.c
index 68abf606ca..28d61bda51 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -183,20 +183,6 @@ uint64_t _dl_hwcap_mask;
  * executable but this isn't true for all platforms.  */
 ElfW(Word) _dl_stack_flags = DEFAULT_STACK_PERMS;
 
-#if THREAD_GSCOPE_IN_TCB
-list_t _dl_stack_used;
-list_t _dl_stack_user;
-list_t _dl_stack_cache;
-size_t _dl_stack_cache_actsize;
-uintptr_t _dl_in_flight_stack;
-int _dl_stack_cache_lock;
-#else
-/* If loading a shared object requires that we make the stack executable
-   when it was not, we do it by calling this function.
-   It returns an errno code or zero on success.  */
-int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
-void (*_dl_init_static_tls) (struct link_map *) = 
&_dl_nothread_init_static_tls;
-#endif
 struct dl_scope_free_list *_dl_scope_free_list;
 
 #ifdef NEED_DL_SYSINFO
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 423e380f7c..64797d2ed6 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -29,10 +29,6 @@
 #include 
 #include 
 
-#if THREAD_GSCOPE_IN_TCB
-# include 
-#endif
-
 #define TUNABLE_NAMESPACE rtld
 #include 
 
@@ -1057,38 +1053,3 @@ cannot create TLS data structures"));
GL(dl_tls_generation) + 1);
 }
 }
-
-#if THREAD_GSCOPE_IN_TCB
-static inline void __attribute__((always_inline))
-init_one_static_tls (struct pthread *curp, struct link_map *map)
-{
-# if TLS_TCB_AT_TP
-  void *dest = (char *) curp - map->l_tls_offset;
-# elif TLS_DTV_AT_TP
-  void *dest = (char *) curp + map->l_tls_offset + TLS_PRE_TCB_SIZE;
-# else
-#  error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
-# endif
-
-  /* Initialize the memory.  */
-  memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
- '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
-}
-
-void
-_dl_init_static_tls (struct link_map *map)
-{
-  lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE);
-
-  /* Iterate over the list with system-allocated threads first.  */
-  list_t *runp;
-  list_for_each (runp, &GL (dl_stack_used))
-init_one_static_tls (list_entry (runp, struct pthread, list), map);
-
-  /* Now the list with threads using user-allocated stacks.  */
-  list_for_each (runp, &GL (dl_stack_user))
-init_one_static_tls (list_entry (runp, struct pthread, list), map);
-
-  lll_unlo

[RFC PATCH htl v2 2/4] htl: Reimplement GSCOPE

2021-08-30 Thread Sergey Bugaev
This is a new implementation of GSCOPE which largely mirrors its NPTL
counterpart. Same as in NPTL, instead of a global flag shared between
threads, there is now a per-thread GSCOPE flag stored in each thread's
TCB. This makes entering and exiting a GSCOPE faster at the expense of
making THREAD_GSCOPE_WAIT () slower.

The largest win is the elimination of many redundant gsync_wake () RPC
calls; previously, even simplest programs would make dozens of fully
redundant gsync_wake () calls.

Despite all ports now putting GSCOPE flag into TCP, this patch keeps
the THREAD_GSCOPE_IN_TCB preprocessor definition, with value 0 for HTL
and 1 for NPTL. It turns out that although originally this definition
was indeed used to distinguish between the cases where the GSCOPE flag
was stored in TCB or not, it has since become used as a general way to
distinguish between HTL and NPTL. It may be worthwhile to get rid of
this flag, or at least rename it to better reflect what it's used for.
This patch does neither, though.

Signed-off-by: Sergey Bugaev 
---
 elf/dl-support.c|  1 -
 sysdeps/generic/ldsodefs.h  |  4 ---
 sysdeps/htl/dl-thread_gscope_wait.c | 55 +
 sysdeps/mach/hurd/i386/tls.h| 19 ++
 sysdeps/mach/hurd/tls.h | 20 +--
 5 files changed, 75 insertions(+), 24 deletions(-)
 create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c

diff --git a/elf/dl-support.c b/elf/dl-support.c
index 0155718175..68abf606ca 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -195,7 +195,6 @@ int _dl_stack_cache_lock;
when it was not, we do it by calling this function.
It returns an errno code or zero on success.  */
 int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
-int _dl_thread_gscope_count;
 void (*_dl_init_static_tls) (struct link_map *) = 
&_dl_nothread_init_static_tls;
 #endif
 struct dl_scope_free_list *_dl_scope_free_list;
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 31c7d3945b..5a779ae1a3 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -487,8 +487,6 @@ struct rtld_global
   /* Mutex protecting the stack lists.  */
   EXTERN int _dl_stack_cache_lock;
 #else
-  EXTERN int _dl_thread_gscope_count;
-
   /* The total number of thread IDs currently in use, or on the list of
  available thread IDs.  */
   EXTERN int _dl_pthread_num_threads;
@@ -1379,10 +1377,8 @@ __rtld_mutex_init (void)
 }
 #endif /* !PTHREAD_IN_LIBC */
 
-#if THREAD_GSCOPE_IN_TCB
 void __thread_gscope_wait (void) attribute_hidden;
 # define THREAD_GSCOPE_WAIT() __thread_gscope_wait ()
-#endif
 
 __END_DECLS
 
diff --git a/sysdeps/htl/dl-thread_gscope_wait.c 
b/sysdeps/htl/dl-thread_gscope_wait.c
new file mode 100644
index 00..b277217b8e
--- /dev/null
+++ b/sysdeps/htl/dl-thread_gscope_wait.c
@@ -0,0 +1,55 @@
+/* Out-of-line notification function for the GSCOPE locking mechanism.
+   Copyright (C) 2007-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include 
+#include 
+#include 
+
+void
+__thread_gscope_wait (void)
+{
+  size_t i;
+  struct __pthread *t;
+  int *gscope_flagp;
+
+  lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
+
+  /* Iterate over the list of threads.  */
+  for (i = 0; i < GL (dl_pthread_num_threads); ++i)
+{
+  t = GL (dl_pthread_threads[i]);
+  if (t == NULL || t->tcb->gscope_flag == THREAD_GSCOPE_FLAG_UNUSED)
+continue;
+
+  gscope_flagp = &t->tcb->gscope_flag;
+
+  /* We have to wait until this thread is done with the global
+ scope.  First tell the thread that we are waiting and
+ possibly have to be woken.  */
+  if (atomic_compare_and_exchange_bool_acq (gscope_flagp,
+THREAD_GSCOPE_FLAG_WAIT,
+THREAD_GSCOPE_FLAG_USED))
+continue;
+
+  do
+lll_wait (gscope_flagp, THREAD_GSCOPE_FLAG_WAIT, LLL_PRIVATE);
+  while (*gscope_flagp == THREAD_GSCOPE_FLAG_WAIT);
+}
+
+  lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
+}
diff --git a/sysdeps/mach/hurd/i386/tls.h b/sysdeps/mach/hurd/i386/tls.h
index 057b2613f

[RFC PATCH htl v2 1/4] htl: Move thread table to ld.so

2021-08-30 Thread Sergey Bugaev
The next commit is going to introduce a new implementation of
THREAD_GSCOPE_WAIT which needs to access the list of threads.
Since it must be usable from the dynamic laoder, we have to move
the symbols for the list of threads into the loader.

Signed-off-by: Sergey Bugaev 
---
 htl/Versions |  2 -
 htl/pt-alloc.c   | 50 ++--
 htl/pt-create.c  | 11 ++
 htl/pt-internal.h| 23 ---
 sysdeps/generic/ldsodefs.h   |  8 
 sysdeps/htl/dl-support.c | 23 +++
 sysdeps/htl/pt-key-delete.c  |  8 ++--
 sysdeps/htl/pthreadP.h   |  2 -
 sysdeps/htl/raise.c  |  8 +++-
 sysdeps/htl/thrd_current.c   |  7 +++-
 sysdeps/mach/hurd/htl/pt-sigstate-init.c |  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.c|  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.h|  2 +-
 13 files changed, 81 insertions(+), 67 deletions(-)
 create mode 100644 sysdeps/htl/dl-support.c

diff --git a/htl/Versions b/htl/Versions
index 4aea321016..4e0ebac285 100644
--- a/htl/Versions
+++ b/htl/Versions
@@ -168,8 +168,6 @@ libpthread {
   GLIBC_PRIVATE {
 __pthread_initialize_minimal;
 
-__pthread_threads;
-
 __cthread_detach;
 __cthread_fork;
 __pthread_detach;
diff --git a/htl/pt-alloc.c b/htl/pt-alloc.c
index acc67f2711..f6708a60bf 100644
--- a/htl/pt-alloc.c
+++ b/htl/pt-alloc.c
@@ -28,19 +28,9 @@
of the threads functions "shall fail" if "No thread could be found
corresponding to that specified by the given thread ID."  */
 
-/* Thread ID lookup table.  */
-struct __pthread **__pthread_threads;
-
 /* The size of the thread ID lookup table.  */
 int __pthread_max_threads;
 
-/* The total number of thread IDs currently in use, or on the list of
-   available thread IDs.  */
-int __pthread_num_threads;
-
-/* A lock for the table, and the other variables above.  */
-pthread_rwlock_t __pthread_threads_lock;
-
 /* List of thread structures corresponding to free thread IDs.  */
 struct __pthread *__pthread_free_threads;
 pthread_mutex_t __pthread_free_threads_lock;
@@ -132,25 +122,25 @@ __pthread_alloc (struct __pthread **pthread)
 }
 
 retry:
-  __pthread_rwlock_wrlock (&__pthread_threads_lock);
+  lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
-  if (__pthread_num_threads < __pthread_max_threads)
+  if (GL (dl_pthread_num_threads) < __pthread_max_threads)
 {
   /* We have a free slot.  Use the slot number plus one as the
  thread ID for the new thread.  */
-  new->thread = 1 + __pthread_num_threads++;
-  __pthread_threads[new->thread - 1] = NULL;
+  new->thread = 1 + GL (dl_pthread_num_threads)++;
+  GL (dl_pthread_threads)[new->thread - 1] = NULL;
 
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
   *pthread = new;
   return 0;
 }
 #ifdef PTHREAD_THREADS_MAX
-  else if (__pthread_num_threads >= PTHREAD_THREADS_MAX)
+  else if (GL (dl_pthread_num_threads) >= PTHREAD_THREADS_MAX)
 {
   /* We have reached the limit on the number of threads per process.  */
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
   free (new);
   return EAGAIN;
@@ -162,7 +152,7 @@ retry:
  memory allocation, since that's a potentially blocking operation.  */
   max_threads = __pthread_max_threads;
 
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
   /* Allocate a new lookup table that's twice as large.  */
   new_max_threads
@@ -174,13 +164,13 @@ retry:
   return ENOMEM;
 }
 
-  __pthread_rwlock_wrlock (&__pthread_threads_lock);
+  lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
   /* Check if nobody else has already enlarged the table.  */
   if (max_threads != __pthread_max_threads)
 {
   /* Yep, they did.  */
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
   /* Free the newly allocated table and try again to allocate a slot.  */
   free (threads);
@@ -188,22 +178,22 @@ retry:
 }
 
   /* Copy over the contents of the old table.  */
-  memcpy (threads, __pthread_threads,
+  memcpy (threads, GL (dl_pthread_threads),
  __pthread_max_threads * sizeof (struct __pthread *));
 
   /* Save the location of the old table.  We want to deallocate its
  storage after we released the lock.  */
-  old_threads = __pthread_threads;
+  old_threads = GL (dl_pthread_threads);
 
   /* Replace the table with the new one.  */
   __pthread_max_threads = new_max_threads;
-  __pthread_threads = threads;
+  GL (dl_pthread_threads) = threads;
 
   /* And allocate ourselves 

[RFC PATCH htl v2 3/4] testrun.sh: Add support for --tool=rpctrace

2021-08-30 Thread Sergey Bugaev
rpctrace(1) is a Hurd RPC tracer tool, which is used similar to how
strace(1) is used on GNU/Linux.

Signed-off-by: Sergey Bugaev 
---
 Makefile | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index f98d5a9e67..a49870d3d1 100644
--- a/Makefile
+++ b/Makefile
@@ -141,8 +141,9 @@ usage () {
 cat << EOF
 Usage: $$0 [OPTIONS]  [ARGUMENTS...]
 
-  --tool=TOOL  Run with the specified TOOL. It can be strace, valgrind or
-   container. The container will run within support/test-container.
+  --tool=TOOL  Run with the specified TOOL. It can be strace, rpctrace,
+   valgrind or container. The container will run within
+   support/test-container.
 EOF
 
   exit 1
@@ -177,6 +178,10 @@ case "$$toolname" in
 exec strace $(patsubst %, -E%, $(run-program-env)) \
   $(test-via-rtld-prefix) $${1+"$$@"}
 ;;
+  rpctrace)
+exec rpctrace $(patsubst %, -E%, $(run-program-env)) \
+  $(test-via-rtld-prefix) $${1+"$$@"}
+;;
   valgrind)
 exec env $(run-program-env) valgrind $(test-via-rtld-prefix) $${1+"$$@"}
 ;;
-- 
2.31.1




Re: [RFC PATCH htl v2 0/4] Rewrite THREAD_GSCOPE

2021-08-31 Thread Sergey Bugaev
Thanks for looking into it!

On Mon, Aug 30, 2021 at 6:37 PM Florian Weimer  wrote:
> You could kick the can down the road by switching to PTHREAD_IN_LIBC.
> In some cases, that might actually be correct replacement.

Ack; that should be easy to do. Nevertheless, please do look at patch
4/4 and say if you would prefer that approach.

Do you happen to know if there's a testcase that exercises
THREAD_GSCOPE_WAIT ()? Even simple programs evidently use
THREAD_GSCOPE_SET_FLAG () / THREAD_GSCOPE_RESET_FLAG (), but I so far
haven't been able to construct a program that would trigger the WAIT.
I've tried dlopening and dlclosing libraries concurrently from several
threads; I will try dlopening hundreds of libraries next (to hopefully
trigger a reallocation of some table), but it would be handy if there
was an official ready-to-use testcase.

Sergey



sudo & addauth woes

2021-09-07 Thread Sergey Bugaev
Hello!

Since this is bug-hurd and not just discuss-hurd, let me actually
report some bugs I just ran into.

Since recently, sudo has been broken:

$ sudo echo hi
Sorry, try again.
Sorry, try again.
sudo: 3 incorrect password attempts

It never even asks for the password, just errors out all by itself.

I thought, this is a perfect opportunity to break my Unix habits and
use addauth instead. So I run:

$ addauth root
Password:

it asks me for the (root) password, which I input, and then hangs.
Since unlike sudo it's not setuid, I can actually rpctrace it. Here
are the relevant parts:


  144<--148(pid773)->proc_getloginpids_request (1) = 0 {1 730 725 694
685 662 648 538 111 97 28 12 2 749 659 547 200 95 3 4 7 5 755 756 748
747 746 745 744 743 724 649 644 643 636 545 544 543 542 519 506 166 44
29 27 26 25 20 16 10 9 6 757 772 773}

task135(pid773)->mach_port_deallocate (pn{ 23}) = 0
  144<--148(pid773)->proc_getmsgport_request (28) = 0138<--169(pid773)
  138<--169(pid773)->msg_add_auth (   167<--162(pid773)) = 0
task135(pid773)->mach_port_deallocate (pn{ 23}) = 0
  144<--148(pid773)->proc_getmsgport_request (12) = 0138<--165(pid773)
  138<--165(pid773)->msg_add_auth (   167<--162(pid773)) = 0
task135(pid773)->mach_port_deallocate (pn{ 23}) = 0
  144<--148(pid773)->proc_getmsgport_request (2) = 0138<--169(pid773)
  138<--169(pid773)->msg_add_auth (   167<--162(pid773))

It tries to add the new auth to all the processes in the current login
collection (as documented), but I'm somehow not a login collection —
or, rather, in the default login collection rooted at PID 1. It hangs
on PID 2 — it looks like /hurd/startup does not communicate its
message port to proc properly, or something.

So next I try only adding auth to the session, not the whole login collection:

$ addauth -S root
Password:
addauth: 757: Cannot get message port: (ipc/send) invalid destination port
$ ps 757
  PID TT STAT TIME COMMAND
  757 p0 Ssow  0:00.10 -bash

Eh, this last one is actually my fault: something something
reauthentication, something something cached proc server port.

The thing that worked in the end was:

$ addauth -p $$ root

I.e. adding auth only to the shell process.

So, the issues here are:
- sudo no longer works,
- /hurd/startup / PID 2 doesn't respond to msg_add_auth (),
- OpenSSH doesn't put me into a login collection,
- addauth doesn't quite work once it adds new auth to itself.

Sergey



[PATCH htl v3 0/5] Rewrite THREAD_GSCOPE

2021-09-07 Thread Sergey Bugaev
Hello,

this is yet another revision of the same patchset to stop issuing many
redundant gsync_wake () calls. Following the brief feedback I got the last
time, it switches most users of THREAD_GSCOPE_IN_TCB to PTHREAD_IN_LIBC, then
actually introduces the new GSCOPE implementation, and then finally drops the
remaining uses of THREAD_GSCOPE_IN_TCB entirely, since it's not always on.

Things still seem to build and work somewhat (tested on GNU/Hurd i686 and
GNU/Linux x86_64). rpctrace uname still shows that there are no more
gsync_wake () calls.

Apparently dlopen (RTLD_GLOBAL) is a good way to trigger THREAD_GSCOPE_WAIT ().
I have written a test program that spawns 100 threads, then each of the threads
dlopen's 100 shared objects, then dlclose's them back. On GNU/Linux, I have
verified with GDB that it hits NPTL's __thread_gscope_wait (). I haven't been
able to verify the same on the Hurd due to the general flakiness of GDB there,
but I have ran the program with the patched glibc multiple times and it doesn't
crash, so I assume it's working fine.

Sergey Bugaev (5):
  elf: Replace most uses of THREAD_GSCOPE_IN_TCB
  htl: Move thread table to ld.so
  htl: Reimplement GSCOPE
  elf: Remove THREAD_GSCOPE_IN_TCB
  testrun.sh: Add support for --tool=rpctrace

 Makefile |  9 +++-
 elf/dl-reloc.c   |  4 +-
 elf/dl-support.c |  3 +-
 elf/dl-tls.c |  6 +--
 elf/dl-tls_init_tp.c |  2 +-
 htl/Versions |  2 -
 htl/pt-alloc.c   | 50 +
 htl/pt-create.c  | 11 ++---
 htl/pt-internal.h| 23 +++---
 sysdeps/aarch64/nptl/tls.h   |  1 -
 sysdeps/alpha/nptl/tls.h |  1 -
 sysdeps/arc/nptl/tls.h   |  1 -
 sysdeps/arm/nptl/tls.h   |  1 -
 sysdeps/csky/nptl/tls.h  |  1 -
 sysdeps/generic/ldsodefs.h   | 18 +---
 sysdeps/generic/tls.h|  6 ---
 sysdeps/hppa/nptl/tls.h  |  1 -
 sysdeps/htl/dl-support.c | 23 ++
 sysdeps/htl/dl-thread_gscope_wait.c  | 55 
 sysdeps/htl/pt-key-delete.c  |  8 ++--
 sysdeps/htl/pthreadP.h   |  2 -
 sysdeps/htl/raise.c  |  8 +++-
 sysdeps/htl/thrd_current.c   |  7 ++-
 sysdeps/i386/nptl/tls.h  |  1 -
 sysdeps/ia64/nptl/tls.h  |  1 -
 sysdeps/m68k/nptl/tls.h  |  1 -
 sysdeps/mach/hurd/htl/pt-sigstate-init.c |  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.c|  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.h|  2 +-
 sysdeps/mach/hurd/i386/tls.h | 19 
 sysdeps/mach/hurd/tls.h  | 20 -
 sysdeps/microblaze/nptl/tls.h|  1 -
 sysdeps/mips/nptl/tls.h  |  1 -
 sysdeps/nios2/nptl/tls.h |  1 -
 sysdeps/powerpc/nptl/tls.h   |  1 -
 sysdeps/riscv/nptl/tls.h |  1 -
 sysdeps/s390/nptl/tls.h  |  1 -
 sysdeps/sh/nptl/tls.h|  1 -
 sysdeps/sparc/nptl/tls.h |  1 -
 sysdeps/x86_64/nptl/tls.h|  1 -
 40 files changed, 172 insertions(+), 128 deletions(-)
 create mode 100644 sysdeps/htl/dl-support.c
 create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c

-- 
2.31.1




[PATCH htl v3 5/5] testrun.sh: Add support for --tool=rpctrace

2021-09-07 Thread Sergey Bugaev
rpctrace(1) is a Hurd RPC tracer tool, which is used similar to how
strace(1) is used on GNU/Linux.

Signed-off-by: Sergey Bugaev 
---
 Makefile | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index f98d5a9e67..a49870d3d1 100644
--- a/Makefile
+++ b/Makefile
@@ -141,8 +141,9 @@ usage () {
 cat << EOF
 Usage: $$0 [OPTIONS]  [ARGUMENTS...]
 
-  --tool=TOOL  Run with the specified TOOL. It can be strace, valgrind or
-   container. The container will run within support/test-container.
+  --tool=TOOL  Run with the specified TOOL. It can be strace, rpctrace,
+   valgrind or container. The container will run within
+   support/test-container.
 EOF
 
   exit 1
@@ -177,6 +178,10 @@ case "$$toolname" in
 exec strace $(patsubst %, -E%, $(run-program-env)) \
   $(test-via-rtld-prefix) $${1+"$$@"}
 ;;
+  rpctrace)
+exec rpctrace $(patsubst %, -E%, $(run-program-env)) \
+  $(test-via-rtld-prefix) $${1+"$$@"}
+;;
   valgrind)
 exec env $(run-program-env) valgrind $(test-via-rtld-prefix) $${1+"$$@"}
 ;;
-- 
2.31.1




[PATCH htl v3 4/5] elf: Remove THREAD_GSCOPE_IN_TCB

2021-09-07 Thread Sergey Bugaev
All the ports now have THREAD_GSCOPE_IN_TCB set to 1. Remove all
support for !THREAD_GSCOPE_IN_TCB, along with the definition itself.

Signed-off-by: Sergey Bugaev 
---
 elf/dl-support.c  | 3 ---
 sysdeps/aarch64/nptl/tls.h| 1 -
 sysdeps/alpha/nptl/tls.h  | 1 -
 sysdeps/arc/nptl/tls.h| 1 -
 sysdeps/arm/nptl/tls.h| 1 -
 sysdeps/csky/nptl/tls.h   | 1 -
 sysdeps/generic/ldsodefs.h| 6 --
 sysdeps/generic/tls.h | 6 --
 sysdeps/hppa/nptl/tls.h   | 1 -
 sysdeps/i386/nptl/tls.h   | 1 -
 sysdeps/ia64/nptl/tls.h   | 1 -
 sysdeps/m68k/nptl/tls.h   | 1 -
 sysdeps/mach/hurd/i386/tls.h  | 2 --
 sysdeps/microblaze/nptl/tls.h | 1 -
 sysdeps/mips/nptl/tls.h   | 1 -
 sysdeps/nios2/nptl/tls.h  | 1 -
 sysdeps/powerpc/nptl/tls.h| 1 -
 sysdeps/riscv/nptl/tls.h  | 1 -
 sysdeps/s390/nptl/tls.h   | 1 -
 sysdeps/sh/nptl/tls.h | 1 -
 sysdeps/sparc/nptl/tls.h  | 1 -
 sysdeps/x86_64/nptl/tls.h | 1 -
 22 files changed, 35 deletions(-)

diff --git a/elf/dl-support.c b/elf/dl-support.c
index a317459fae..02e2ed72f5 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -197,9 +197,6 @@ int _dl_stack_cache_lock;
 int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
 void (*_dl_init_static_tls) (struct link_map *) = 
&_dl_nothread_init_static_tls;
 #endif
-#if !THREAD_GSCOPE_IN_TCB
-int _dl_thread_gscope_count;
-#endif
 struct dl_scope_free_list *_dl_scope_free_list;
 
 #ifdef NEED_DL_SYSINFO
diff --git a/sysdeps/aarch64/nptl/tls.h b/sysdeps/aarch64/nptl/tls.h
index 6e896207a6..72f22dc718 100644
--- a/sysdeps/aarch64/nptl/tls.h
+++ b/sysdeps/aarch64/nptl/tls.h
@@ -109,7 +109,6 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-# define THREAD_GSCOPE_IN_TCB  1
 # define THREAD_GSCOPE_FLAG_UNUSED 0
 # define THREAD_GSCOPE_FLAG_USED   1
 # define THREAD_GSCOPE_FLAG_WAIT   2
diff --git a/sysdeps/alpha/nptl/tls.h b/sysdeps/alpha/nptl/tls.h
index 4dbccc5249..6328112135 100644
--- a/sysdeps/alpha/nptl/tls.h
+++ b/sysdeps/alpha/nptl/tls.h
@@ -103,7 +103,6 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-#define THREAD_GSCOPE_IN_TCB  1
 #define THREAD_GSCOPE_FLAG_UNUSED 0
 #define THREAD_GSCOPE_FLAG_USED   1
 #define THREAD_GSCOPE_FLAG_WAIT   2
diff --git a/sysdeps/arc/nptl/tls.h b/sysdeps/arc/nptl/tls.h
index 95300fdd21..e269c0a7a5 100644
--- a/sysdeps/arc/nptl/tls.h
+++ b/sysdeps/arc/nptl/tls.h
@@ -111,7 +111,6 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-#define THREAD_GSCOPE_IN_TCB  1
 #define THREAD_GSCOPE_FLAG_UNUSED 0
 #define THREAD_GSCOPE_FLAG_USED   1
 #define THREAD_GSCOPE_FLAG_WAIT   2
diff --git a/sysdeps/arm/nptl/tls.h b/sysdeps/arm/nptl/tls.h
index 1bd11307ce..699c16acfb 100644
--- a/sysdeps/arm/nptl/tls.h
+++ b/sysdeps/arm/nptl/tls.h
@@ -100,7 +100,6 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-#define THREAD_GSCOPE_IN_TCB  1
 #define THREAD_GSCOPE_FLAG_UNUSED 0
 #define THREAD_GSCOPE_FLAG_USED   1
 #define THREAD_GSCOPE_FLAG_WAIT   2
diff --git a/sysdeps/csky/nptl/tls.h b/sysdeps/csky/nptl/tls.h
index 7a234041ed..b210dfcb76 100644
--- a/sysdeps/csky/nptl/tls.h
+++ b/sysdeps/csky/nptl/tls.h
@@ -127,7 +127,6 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-# define THREAD_GSCOPE_IN_TCB  1
 # define THREAD_GSCOPE_FLAG_UNUSED 0
 # define THREAD_GSCOPE_FLAG_USED   1
 # define THREAD_GSCOPE_FLAG_WAIT   2
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 3092bbc924..b251fd1733 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -495,10 +495,6 @@ struct rtld_global
   EXTERN struct __pthread **_dl_pthread_threads;
   EXTERN int _dl_pthread_threads_lock;
 #endif
-
-#if !THREAD_GSCOPE_IN_TCB
-  EXTERN int _dl_thread_gscope_count;
-#endif
 #ifdef SHARED
 };
 # define __rtld_global_attribute__
@@ -1381,10 +1377,8 @@ __rtld_mutex_init (void)
 }
 #endif /* !PTHREAD_IN_LIBC */
 
-#if THREAD_GSCOPE_IN_TCB
 void __thread_gscope_wait (void) attribute_hidden;
 # define THREAD_GSCOPE_WAIT() __thread_gscope_wait ()
-#endif
 
 __END_DECLS
 
diff --git a/sysdeps/generic/tls.h b/sysdeps/generic/tls.h
index e86d70e6ce..f581c9a992 100644
--- a/sysdeps/generic/tls.h
+++ b/sysdeps/generic/tls.h
@@ -71,10 +71,4 @@
  This macro returns the address of the DTV of the current thread.
  This normally is done using the thread register which points
  to the dtv or the TCB (from which the DTV can found).
-
-
- THREAD_GSCOPE_IN_TCB
-
- This should be set to 1 if the global scope flag is stored within the TCB.
-   

[PATCH htl v3 3/5] htl: Reimplement GSCOPE

2021-09-07 Thread Sergey Bugaev
This is a new implementation of GSCOPE which largely mirrors its NPTL
counterpart. Same as in NPTL, instead of a global flag shared between
threads, there is now a per-thread GSCOPE flag stored in each thread's
TCB. This makes entering and exiting a GSCOPE faster at the expense of
making THREAD_GSCOPE_WAIT () slower.

The largest win is the elimination of many redundant gsync_wake () RPC
calls; previously, even simplest programs would make dozens of fully
redundant gsync_wake () calls.

Signed-off-by: Sergey Bugaev 
---
 sysdeps/htl/dl-thread_gscope_wait.c | 55 +
 sysdeps/mach/hurd/i386/tls.h| 21 +++
 sysdeps/mach/hurd/tls.h | 20 ---
 3 files changed, 76 insertions(+), 20 deletions(-)
 create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c

diff --git a/sysdeps/htl/dl-thread_gscope_wait.c 
b/sysdeps/htl/dl-thread_gscope_wait.c
new file mode 100644
index 00..b277217b8e
--- /dev/null
+++ b/sysdeps/htl/dl-thread_gscope_wait.c
@@ -0,0 +1,55 @@
+/* Out-of-line notification function for the GSCOPE locking mechanism.
+   Copyright (C) 2007-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include 
+#include 
+#include 
+
+void
+__thread_gscope_wait (void)
+{
+  size_t i;
+  struct __pthread *t;
+  int *gscope_flagp;
+
+  lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
+
+  /* Iterate over the list of threads.  */
+  for (i = 0; i < GL (dl_pthread_num_threads); ++i)
+{
+  t = GL (dl_pthread_threads[i]);
+  if (t == NULL || t->tcb->gscope_flag == THREAD_GSCOPE_FLAG_UNUSED)
+continue;
+
+  gscope_flagp = &t->tcb->gscope_flag;
+
+  /* We have to wait until this thread is done with the global
+ scope.  First tell the thread that we are waiting and
+ possibly have to be woken.  */
+  if (atomic_compare_and_exchange_bool_acq (gscope_flagp,
+THREAD_GSCOPE_FLAG_WAIT,
+THREAD_GSCOPE_FLAG_USED))
+continue;
+
+  do
+lll_wait (gscope_flagp, THREAD_GSCOPE_FLAG_WAIT, LLL_PRIVATE);
+  while (*gscope_flagp == THREAD_GSCOPE_FLAG_WAIT);
+}
+
+  lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
+}
diff --git a/sysdeps/mach/hurd/i386/tls.h b/sysdeps/mach/hurd/i386/tls.h
index 057b2613f3..2ac65df921 100644
--- a/sysdeps/mach/hurd/i386/tls.h
+++ b/sysdeps/mach/hurd/i386/tls.h
@@ -369,6 +369,27 @@ _hurd_tls_new (thread_t child, struct i386_thread_state 
*state, tcbhead_t *tcb)
   return err;
 }
 
+/* Global scope switch support.  */
+# define THREAD_GSCOPE_IN_TCB  1
+
+# define THREAD_GSCOPE_FLAG_UNUSED 0
+# define THREAD_GSCOPE_FLAG_USED   1
+# define THREAD_GSCOPE_FLAG_WAIT   2
+
+# define THREAD_GSCOPE_SET_FLAG() \
+  THREAD_SETMEM (THREAD_SELF, gscope_flag, THREAD_GSCOPE_FLAG_USED)
+
+# define THREAD_GSCOPE_RESET_FLAG() \
+  ({ \
+int __flag;  \
+asm volatile ("xchgl %0, %%gs:%P1"   \
+  : "=r" (__flag)\
+  : "i" (offsetof (tcbhead_t, gscope_flag)), \
+"0" (THREAD_GSCOPE_FLAG_UNUSED));\
+if (__flag == THREAD_GSCOPE_FLAG_WAIT)   \
+  lll_wake (THREAD_SELF->gscope_flag, LLL_PRIVATE);  \
+  })
+
 #endif /* !__ASSEMBLER__ */
 
 #endif /* i386/tls.h */
diff --git a/sysdeps/mach/hurd/tls.h b/sysdeps/mach/hurd/tls.h
index f83956d3d7..8e66d5ff53 100644
--- a/sysdeps/mach/hurd/tls.h
+++ b/sysdeps/mach/hurd/tls.h
@@ -52,26 +52,6 @@
 # define GET_DTV(descr) \
   (((tcbhead_t *) (descr))->dtv)
 
-/* Global scope switch support.  */
-#define THREAD_GSCOPE_IN_TCB  0
-#define THREAD_GSCOPE_GLOBAL
-#define THREAD_GSCOPE_SET_FLAG() \
-  atomic_exchange_and_add_acq (&GL(dl_thread_gscope_count), 1)
-#define THREAD_GSCOPE_RESET_FLAG() \
-  do  

[PATCH htl v3 2/5] htl: Move thread table to ld.so

2021-09-07 Thread Sergey Bugaev
The next commit is going to introduce a new implementation of
THREAD_GSCOPE_WAIT which needs to access the list of threads.
Since it must be usable from the dynamic laoder, we have to move
the symbols for the list of threads into the loader.

Signed-off-by: Sergey Bugaev 
---
 htl/Versions |  2 -
 htl/pt-alloc.c   | 50 ++--
 htl/pt-create.c  | 11 ++
 htl/pt-internal.h| 23 ---
 sysdeps/generic/ldsodefs.h   |  9 +
 sysdeps/htl/dl-support.c | 23 +++
 sysdeps/htl/pt-key-delete.c  |  8 ++--
 sysdeps/htl/pthreadP.h   |  2 -
 sysdeps/htl/raise.c  |  8 +++-
 sysdeps/htl/thrd_current.c   |  7 +++-
 sysdeps/mach/hurd/htl/pt-sigstate-init.c |  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.c|  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.h|  2 +-
 13 files changed, 82 insertions(+), 67 deletions(-)
 create mode 100644 sysdeps/htl/dl-support.c

diff --git a/htl/Versions b/htl/Versions
index 4aea321016..4e0ebac285 100644
--- a/htl/Versions
+++ b/htl/Versions
@@ -168,8 +168,6 @@ libpthread {
   GLIBC_PRIVATE {
 __pthread_initialize_minimal;
 
-__pthread_threads;
-
 __cthread_detach;
 __cthread_fork;
 __pthread_detach;
diff --git a/htl/pt-alloc.c b/htl/pt-alloc.c
index acc67f2711..f6708a60bf 100644
--- a/htl/pt-alloc.c
+++ b/htl/pt-alloc.c
@@ -28,19 +28,9 @@
of the threads functions "shall fail" if "No thread could be found
corresponding to that specified by the given thread ID."  */
 
-/* Thread ID lookup table.  */
-struct __pthread **__pthread_threads;
-
 /* The size of the thread ID lookup table.  */
 int __pthread_max_threads;
 
-/* The total number of thread IDs currently in use, or on the list of
-   available thread IDs.  */
-int __pthread_num_threads;
-
-/* A lock for the table, and the other variables above.  */
-pthread_rwlock_t __pthread_threads_lock;
-
 /* List of thread structures corresponding to free thread IDs.  */
 struct __pthread *__pthread_free_threads;
 pthread_mutex_t __pthread_free_threads_lock;
@@ -132,25 +122,25 @@ __pthread_alloc (struct __pthread **pthread)
 }
 
 retry:
-  __pthread_rwlock_wrlock (&__pthread_threads_lock);
+  lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
-  if (__pthread_num_threads < __pthread_max_threads)
+  if (GL (dl_pthread_num_threads) < __pthread_max_threads)
 {
   /* We have a free slot.  Use the slot number plus one as the
  thread ID for the new thread.  */
-  new->thread = 1 + __pthread_num_threads++;
-  __pthread_threads[new->thread - 1] = NULL;
+  new->thread = 1 + GL (dl_pthread_num_threads)++;
+  GL (dl_pthread_threads)[new->thread - 1] = NULL;
 
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
   *pthread = new;
   return 0;
 }
 #ifdef PTHREAD_THREADS_MAX
-  else if (__pthread_num_threads >= PTHREAD_THREADS_MAX)
+  else if (GL (dl_pthread_num_threads) >= PTHREAD_THREADS_MAX)
 {
   /* We have reached the limit on the number of threads per process.  */
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
   free (new);
   return EAGAIN;
@@ -162,7 +152,7 @@ retry:
  memory allocation, since that's a potentially blocking operation.  */
   max_threads = __pthread_max_threads;
 
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
   /* Allocate a new lookup table that's twice as large.  */
   new_max_threads
@@ -174,13 +164,13 @@ retry:
   return ENOMEM;
 }
 
-  __pthread_rwlock_wrlock (&__pthread_threads_lock);
+  lll_lock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
   /* Check if nobody else has already enlarged the table.  */
   if (max_threads != __pthread_max_threads)
 {
   /* Yep, they did.  */
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  lll_unlock (GL (dl_pthread_threads_lock), LLL_PRIVATE);
 
   /* Free the newly allocated table and try again to allocate a slot.  */
   free (threads);
@@ -188,22 +178,22 @@ retry:
 }
 
   /* Copy over the contents of the old table.  */
-  memcpy (threads, __pthread_threads,
+  memcpy (threads, GL (dl_pthread_threads),
  __pthread_max_threads * sizeof (struct __pthread *));
 
   /* Save the location of the old table.  We want to deallocate its
  storage after we released the lock.  */
-  old_threads = __pthread_threads;
+  old_threads = GL (dl_pthread_threads);
 
   /* Replace the table with the new one.  */
   __pthread_max_threads = new_max_threads;
-  __pthread_threads = threads;
+  GL (dl_pthread_threads) = threads;
 
   /* And allocate ourselves 

[PATCH htl v3 1/5] elf: Replace most uses of THREAD_GSCOPE_IN_TCB

2021-09-07 Thread Sergey Bugaev
While originally this definition was indeed used to distinguish between
the cases where the GSCOPE flag was stored in TCB or not, it has since
become used as a general way to distinguish between HTL and NPTL.

THREAD_GSCOPE_IN_TCB will be removed in the following commits, as HTL,
which currently is the only port that does not put the flag into TCB,
will get ported to put the GSCOPE flag into the TCB as well. To prepare
for that change, migrate all code that wants to distinguish between HTL
and NPTL to use PTHREAD_IN_LIBC instead, which is a better choice since
the distinction mostly has to do with whether libc has access to the
list of thread structures and therefore can initialize thread-local
storage.

The parts of code that actually depend on whether the GSCOPE flag is in
TCB are left unchanged.

Signed-off-by: Sergey Bugaev 
---
 elf/dl-reloc.c |  4 ++--
 elf/dl-support.c   |  6 --
 elf/dl-tls.c   |  6 +++---
 elf/dl-tls_init_tp.c   |  2 +-
 sysdeps/generic/ldsodefs.h | 11 ++-
 5 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index e13a672ade..6c957456b8 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -141,7 +141,7 @@ cannot allocate memory in static TLS block"));
 }
 }
 
-#if !THREAD_GSCOPE_IN_TCB
+#if !PTHREAD_IN_LIBC
 /* Initialize static TLS area and DTV for current (only) thread.
libpthread implementations should provide their own hook
to handle all threads.  */
@@ -160,7 +160,7 @@ _dl_nothread_init_static_tls (struct link_map *map)
   memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
  '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
 }
-#endif /* !THREAD_GSCOPE_IN_TCB */
+#endif /* !PTHREAD_IN_LIBC */
 
 void
 _dl_relocate_object (struct link_map *l, struct r_scope_elem *scope[],
diff --git a/elf/dl-support.c b/elf/dl-support.c
index 0155718175..a317459fae 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -183,7 +183,7 @@ uint64_t _dl_hwcap_mask;
  * executable but this isn't true for all platforms.  */
 ElfW(Word) _dl_stack_flags = DEFAULT_STACK_PERMS;
 
-#if THREAD_GSCOPE_IN_TCB
+#if PTHREAD_IN_LIBC
 list_t _dl_stack_used;
 list_t _dl_stack_user;
 list_t _dl_stack_cache;
@@ -195,9 +195,11 @@ int _dl_stack_cache_lock;
when it was not, we do it by calling this function.
It returns an errno code or zero on success.  */
 int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
-int _dl_thread_gscope_count;
 void (*_dl_init_static_tls) (struct link_map *) = 
&_dl_nothread_init_static_tls;
 #endif
+#if !THREAD_GSCOPE_IN_TCB
+int _dl_thread_gscope_count;
+#endif
 struct dl_scope_free_list *_dl_scope_free_list;
 
 #ifdef NEED_DL_SYSINFO
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 423e380f7c..d554ae4497 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -29,7 +29,7 @@
 #include 
 #include 
 
-#if THREAD_GSCOPE_IN_TCB
+#if PTHREAD_IN_LIBC
 # include 
 #endif
 
@@ -1058,7 +1058,7 @@ cannot create TLS data structures"));
 }
 }
 
-#if THREAD_GSCOPE_IN_TCB
+#if PTHREAD_IN_LIBC
 static inline void __attribute__((always_inline))
 init_one_static_tls (struct pthread *curp, struct link_map *map)
 {
@@ -1091,4 +1091,4 @@ _dl_init_static_tls (struct link_map *map)
 
   lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
 }
-#endif /* THREAD_GSCOPE_IN_TCB */
+#endif /* PTHREAD_IN_LIBC */
diff --git a/elf/dl-tls_init_tp.c b/elf/dl-tls_init_tp.c
index d84adc992c..e482f3cfa9 100644
--- a/elf/dl-tls_init_tp.c
+++ b/elf/dl-tls_init_tp.c
@@ -36,7 +36,7 @@ rtld_lock_default_unlock_recursive (void *lock)
 void
 __tls_pre_init_tp (void)
 {
-#if !THREAD_GSCOPE_IN_TCB
+#if !PTHREAD_IN_LIBC
   GL(dl_init_static_tls) = &_dl_nothread_init_static_tls;
 #endif
 
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 9c15259236..02bd579ef8 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -456,7 +456,7 @@ struct rtld_global
   /* Generation counter for the dtv.  */
   EXTERN size_t _dl_tls_generation;
 
-#if !THREAD_GSCOPE_IN_TCB
+#if !PTHREAD_IN_LIBC
   EXTERN void (*_dl_init_static_tls) (struct link_map *);
 #endif
 
@@ -466,7 +466,7 @@ struct rtld_global
 size_t count;
 void *list[50];
   } *_dl_scope_free_list;
-#if THREAD_GSCOPE_IN_TCB
+#if PTHREAD_IN_LIBC
   /* List of active thread stacks, with memory managed by glibc.  */
   EXTERN list_t _dl_stack_used;
 
@@ -486,7 +486,8 @@ struct rtld_global
 
   /* Mutex protecting the stack lists.  */
   EXTERN int _dl_stack_cache_lock;
-#else
+#endif
+#if !THREAD_GSCOPE_IN_TCB
   EXTERN int _dl_thread_gscope_count;
 #endif
 #ifdef SHARED
@@ -1297,13 +1298,13 @@ extern void _dl_aux_init (ElfW(auxv_t) *av)
 
 /* Initialize the static TLS space for the link map in all existing
threads. */
-#if THREAD_GSCOPE_IN_TCB
+#if PTHREAD_IN_LIBC
 void _dl_init_static_tls (struct link_map *map) 

Re: Wrong MIG generated headers

2021-09-11 Thread Sergey Bugaev
Hello,

On Sat, Sep 11, 2021 at 12:27 PM Joan Lledó  wrote:
> Note the additional parameter *mach_msg_type_name_t objectPoly* in
> memory_object_create_proxy. I don't know why is it there but I can't
> compile the pci arbiter anymore because of that. What can I do?

This is a little change that Samuel has committed recently [0] (at my
suggestion). It lets the client specify the "disposition" of the
object port(s), i.e. MAKE_SEND, COPY_SEND, or MOVE_SEND.

[0]: 
https://git.savannah.gnu.org/cgit/hurd/gnumach.git/commit/?id=d1c056f2e23b7a9fa91dd5132b040813bfc00487

This doesn't change the RPC format, only the client (user) C routine
signature. You'll have to adapt the PCI arbiter code to pass a value
for this additional argument. The previous implicit default was
MACH_MSG_TYPE_COPY_SEND, so passing it explicitly should be equivalent
to the old behavior; but depending on your use case you might find it
cleaner to use one of the other two dispositions.

Sergey



Re: [PATCH htl v3 2/5] htl: Move thread table to ld.so

2021-09-15 Thread Sergey Bugaev
On Wed, Sep 15, 2021 at 2:27 AM Samuel Thibault  wrote:
> Please rather use __libc_rwlock_wrlock etc.

Thanks, I did not realize there were actually more internal
synchronization primitives implemented on top of lll. I will send out
a new version shortly.

> Even if for now they are actually locks and not rwlocks, the day we fix
> that the rwlocking of dl_pthread_threads_lock will get fixed alongside.

May I ask, why are rwlocks unimplemented? Is it just that nobody has
done the work, or are there some unobvious complications?

I'm asking because I happen to have an implementation of rwlocks [0]
that works (passes the test) on the Hurd. It makes use of futex
bitsets (which don't seem to be supported by gsync), but seems to work
just fine without them. And HTL of course has its own rwlocks
implementation. Would it make sense to me to spend some time to try
and write a __libc_rwlock based on them, or are there some obstacles?

[0]: 
https://github.com/bugaevc/lets-write-sync-primitives/blob/master/src/rwlock.cpp

Sergey



Re: [PATCH htl v3 2/5] htl: Move thread table to ld.so

2021-09-15 Thread Sergey Bugaev
On Wed, Sep 15, 2021 at 5:34 PM Samuel Thibault  wrote:
> so it's probably not
> worth spending time on making a separate __libc_rwlock implementation,
> and rather spend it on making pthread_rwlock use gsync, like was done
> for pthread_mutex and sem

Oh, it currently doesn't? That's horrifying. Then I'll look into that next.

Sergey

P.S. There's this other thing that is way more urgent than either
eliminating gsync_wake () calls or optimizing rwlocks. I sent you an
email a week ago. Please take a look.



[PATCH htl v4 0/4] Rewrite THREAD_GSCOPE

2021-09-15 Thread Sergey Bugaev
This is v4 of the patchset. The previous versions are at [0][1][2][3].

[0]: https://lists.gnu.org/archive/html/bug-hurd/2021-05/msg00053.html
[1]: https://lists.gnu.org/archive/html/bug-hurd/2021-05/msg00063.html
[2]: https://sourceware.org/pipermail/libc-alpha/2021-August/130614.html
[3]: https://sourceware.org/pipermail/libc-alpha/2021-September/130810.html

Changes since v3:
* Drop the two patches (1/5 and 5/5) that have been pushed
  (the remaining three patches are numbered 2/4, 3/4, 4/4)
* Use __libc_rwlock for dl_pthread_threads_lock instead of a plain int/lll

To test, before:
$ rpctrace -o >(grep -c wake) /bin/uname
GNU
58

After:
$ ./testrun.sh --tool=rpctrace /bin/uname 2> >(grep -c wake)
GNU
0

Sergey Bugaev (3):
  htl: Move thread table to ld.so
  htl: Reimplement GSCOPE
  elf: Remove THREAD_GSCOPE_IN_TCB

 elf/dl-support.c |  3 --
 htl/Versions |  2 -
 htl/pt-alloc.c   | 50 +
 htl/pt-create.c  |  8 ++--
 htl/pt-internal.h| 23 +++---
 sysdeps/aarch64/nptl/tls.h   |  1 -
 sysdeps/alpha/nptl/tls.h |  1 -
 sysdeps/arc/nptl/tls.h   |  1 -
 sysdeps/arm/nptl/tls.h   |  1 -
 sysdeps/csky/nptl/tls.h  |  1 -
 sysdeps/generic/ldsodefs.h   | 13 +++---
 sysdeps/generic/tls.h|  6 ---
 sysdeps/hppa/nptl/tls.h  |  1 -
 sysdeps/htl/dl-support.c | 23 ++
 sysdeps/htl/dl-thread_gscope_wait.c  | 55 
 sysdeps/htl/pt-key-delete.c  |  8 ++--
 sysdeps/htl/pthreadP.h   |  2 -
 sysdeps/htl/raise.c  |  8 +++-
 sysdeps/htl/thrd_current.c   |  7 ++-
 sysdeps/i386/nptl/tls.h  |  1 -
 sysdeps/ia64/nptl/tls.h  |  1 -
 sysdeps/m68k/nptl/tls.h  |  1 -
 sysdeps/mach/hurd/htl/pt-sigstate-init.c |  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.c|  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.h|  2 +-
 sysdeps/mach/hurd/i386/tls.h | 19 
 sysdeps/mach/hurd/tls.h  | 20 -
 sysdeps/microblaze/nptl/tls.h|  1 -
 sysdeps/mips/nptl/tls.h  |  1 -
 sysdeps/nios2/nptl/tls.h |  1 -
 sysdeps/powerpc/nptl/tls.h   |  1 -
 sysdeps/riscv/nptl/tls.h |  1 -
 sysdeps/s390/nptl/tls.h  |  1 -
 sysdeps/sh/nptl/tls.h|  1 -
 sysdeps/sparc/nptl/tls.h |  1 -
 sysdeps/x86_64/nptl/tls.h|  1 -
 36 files changed, 156 insertions(+), 115 deletions(-)
 create mode 100644 sysdeps/htl/dl-support.c
 create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c

-- 
2.31.1




[PATCH htl v4 2/4] htl: Move thread table to ld.so

2021-09-15 Thread Sergey Bugaev
The next commit is going to introduce a new implementation of
THREAD_GSCOPE_WAIT which needs to access the list of threads.
Since it must be usable from the dynamic laoder, we have to move
the symbols for the list of threads into the loader.

Signed-off-by: Sergey Bugaev 
---
 htl/Versions |  2 -
 htl/pt-alloc.c   | 50 ++--
 htl/pt-create.c  |  8 ++--
 htl/pt-internal.h| 23 ---
 sysdeps/generic/ldsodefs.h   |  9 +
 sysdeps/htl/dl-support.c | 23 +++
 sysdeps/htl/pt-key-delete.c  |  8 ++--
 sysdeps/htl/pthreadP.h   |  2 -
 sysdeps/htl/raise.c  |  8 +++-
 sysdeps/htl/thrd_current.c   |  7 +++-
 sysdeps/mach/hurd/htl/pt-sigstate-init.c |  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.c|  2 +-
 sysdeps/mach/hurd/htl/pt-sysdep.h|  2 +-
 13 files changed, 83 insertions(+), 63 deletions(-)
 create mode 100644 sysdeps/htl/dl-support.c

diff --git a/htl/Versions b/htl/Versions
index 4aea321016..4e0ebac285 100644
--- a/htl/Versions
+++ b/htl/Versions
@@ -168,8 +168,6 @@ libpthread {
   GLIBC_PRIVATE {
 __pthread_initialize_minimal;
 
-__pthread_threads;
-
 __cthread_detach;
 __cthread_fork;
 __pthread_detach;
diff --git a/htl/pt-alloc.c b/htl/pt-alloc.c
index acc67f2711..f6e783be10 100644
--- a/htl/pt-alloc.c
+++ b/htl/pt-alloc.c
@@ -28,19 +28,9 @@
of the threads functions "shall fail" if "No thread could be found
corresponding to that specified by the given thread ID."  */
 
-/* Thread ID lookup table.  */
-struct __pthread **__pthread_threads;
-
 /* The size of the thread ID lookup table.  */
 int __pthread_max_threads;
 
-/* The total number of thread IDs currently in use, or on the list of
-   available thread IDs.  */
-int __pthread_num_threads;
-
-/* A lock for the table, and the other variables above.  */
-pthread_rwlock_t __pthread_threads_lock;
-
 /* List of thread structures corresponding to free thread IDs.  */
 struct __pthread *__pthread_free_threads;
 pthread_mutex_t __pthread_free_threads_lock;
@@ -132,25 +122,25 @@ __pthread_alloc (struct __pthread **pthread)
 }
 
 retry:
-  __pthread_rwlock_wrlock (&__pthread_threads_lock);
+  __libc_rwlock_wrlock (GL (dl_pthread_threads_lock));
 
-  if (__pthread_num_threads < __pthread_max_threads)
+  if (GL (dl_pthread_num_threads) < __pthread_max_threads)
 {
   /* We have a free slot.  Use the slot number plus one as the
  thread ID for the new thread.  */
-  new->thread = 1 + __pthread_num_threads++;
-  __pthread_threads[new->thread - 1] = NULL;
+  new->thread = 1 + GL (dl_pthread_num_threads)++;
+  GL (dl_pthread_threads)[new->thread - 1] = NULL;
 
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  __libc_rwlock_unlock (GL (dl_pthread_threads_lock));
 
   *pthread = new;
   return 0;
 }
 #ifdef PTHREAD_THREADS_MAX
-  else if (__pthread_num_threads >= PTHREAD_THREADS_MAX)
+  else if (GL (dl_pthread_num_threads) >= PTHREAD_THREADS_MAX)
 {
   /* We have reached the limit on the number of threads per process.  */
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  __libc_rwlock_unlock (GL (dl_pthread_threads_lock));
 
   free (new);
   return EAGAIN;
@@ -162,7 +152,7 @@ retry:
  memory allocation, since that's a potentially blocking operation.  */
   max_threads = __pthread_max_threads;
 
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  __libc_rwlock_unlock (GL (dl_pthread_threads_lock));
 
   /* Allocate a new lookup table that's twice as large.  */
   new_max_threads
@@ -174,13 +164,13 @@ retry:
   return ENOMEM;
 }
 
-  __pthread_rwlock_wrlock (&__pthread_threads_lock);
+  __libc_rwlock_wrlock (GL (dl_pthread_threads_lock));
 
   /* Check if nobody else has already enlarged the table.  */
   if (max_threads != __pthread_max_threads)
 {
   /* Yep, they did.  */
-  __pthread_rwlock_unlock (&__pthread_threads_lock);
+  __libc_rwlock_unlock (GL (dl_pthread_threads_lock));
 
   /* Free the newly allocated table and try again to allocate a slot.  */
   free (threads);
@@ -188,22 +178,22 @@ retry:
 }
 
   /* Copy over the contents of the old table.  */
-  memcpy (threads, __pthread_threads,
+  memcpy (threads, GL (dl_pthread_threads),
  __pthread_max_threads * sizeof (struct __pthread *));
 
   /* Save the location of the old table.  We want to deallocate its
  storage after we released the lock.  */
-  old_threads = __pthread_threads;
+  old_threads = GL (dl_pthread_threads);
 
   /* Replace the table with the new one.  */
   __pthread_max_threads = new_max_threads;
-  __pthread_threads = threads;
+  GL (dl_pthread_threads) = threads;
 
   /* And allocate ourselves 

[PATCH htl v4 3/4] htl: Reimplement GSCOPE

2021-09-15 Thread Sergey Bugaev
This is a new implementation of GSCOPE which largely mirrors its NPTL
counterpart. Same as in NPTL, instead of a global flag shared between
threads, there is now a per-thread GSCOPE flag stored in each thread's
TCB. This makes entering and exiting a GSCOPE faster at the expense of
making THREAD_GSCOPE_WAIT () slower.

The largest win is the elimination of many redundant gsync_wake () RPC
calls; previously, even simplest programs would make dozens of fully
redundant gsync_wake () calls.

Signed-off-by: Sergey Bugaev 
---
 sysdeps/htl/dl-thread_gscope_wait.c | 55 +
 sysdeps/mach/hurd/i386/tls.h| 21 +++
 sysdeps/mach/hurd/tls.h | 20 ---
 3 files changed, 76 insertions(+), 20 deletions(-)
 create mode 100644 sysdeps/htl/dl-thread_gscope_wait.c

diff --git a/sysdeps/htl/dl-thread_gscope_wait.c 
b/sysdeps/htl/dl-thread_gscope_wait.c
new file mode 100644
index 00..7557e77aab
--- /dev/null
+++ b/sysdeps/htl/dl-thread_gscope_wait.c
@@ -0,0 +1,55 @@
+/* Out-of-line notification function for the GSCOPE locking mechanism.
+   Copyright (C) 2007-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include 
+#include 
+#include 
+
+void
+__thread_gscope_wait (void)
+{
+  size_t i;
+  struct __pthread *t;
+  int *gscope_flagp;
+
+  __libc_rwlock_rdlock (GL (dl_pthread_threads_lock));
+
+  /* Iterate over the list of threads.  */
+  for (i = 0; i < GL (dl_pthread_num_threads); ++i)
+{
+  t = GL (dl_pthread_threads[i]);
+  if (t == NULL || t->tcb->gscope_flag == THREAD_GSCOPE_FLAG_UNUSED)
+continue;
+
+  gscope_flagp = &t->tcb->gscope_flag;
+
+  /* We have to wait until this thread is done with the global
+ scope.  First tell the thread that we are waiting and
+ possibly have to be woken.  */
+  if (atomic_compare_and_exchange_bool_acq (gscope_flagp,
+THREAD_GSCOPE_FLAG_WAIT,
+THREAD_GSCOPE_FLAG_USED))
+continue;
+
+  do
+lll_wait (gscope_flagp, THREAD_GSCOPE_FLAG_WAIT, LLL_PRIVATE);
+  while (*gscope_flagp == THREAD_GSCOPE_FLAG_WAIT);
+}
+
+  __libc_rwlock_unlock (GL (dl_pthread_threads_lock));
+}
diff --git a/sysdeps/mach/hurd/i386/tls.h b/sysdeps/mach/hurd/i386/tls.h
index 057b2613f3..2ac65df921 100644
--- a/sysdeps/mach/hurd/i386/tls.h
+++ b/sysdeps/mach/hurd/i386/tls.h
@@ -369,6 +369,27 @@ _hurd_tls_new (thread_t child, struct i386_thread_state 
*state, tcbhead_t *tcb)
   return err;
 }
 
+/* Global scope switch support.  */
+# define THREAD_GSCOPE_IN_TCB  1
+
+# define THREAD_GSCOPE_FLAG_UNUSED 0
+# define THREAD_GSCOPE_FLAG_USED   1
+# define THREAD_GSCOPE_FLAG_WAIT   2
+
+# define THREAD_GSCOPE_SET_FLAG() \
+  THREAD_SETMEM (THREAD_SELF, gscope_flag, THREAD_GSCOPE_FLAG_USED)
+
+# define THREAD_GSCOPE_RESET_FLAG() \
+  ({ \
+int __flag;  \
+asm volatile ("xchgl %0, %%gs:%P1"   \
+  : "=r" (__flag)\
+  : "i" (offsetof (tcbhead_t, gscope_flag)), \
+"0" (THREAD_GSCOPE_FLAG_UNUSED));\
+if (__flag == THREAD_GSCOPE_FLAG_WAIT)   \
+  lll_wake (THREAD_SELF->gscope_flag, LLL_PRIVATE);  \
+  })
+
 #endif /* !__ASSEMBLER__ */
 
 #endif /* i386/tls.h */
diff --git a/sysdeps/mach/hurd/tls.h b/sysdeps/mach/hurd/tls.h
index f83956d3d7..8e66d5ff53 100644
--- a/sysdeps/mach/hurd/tls.h
+++ b/sysdeps/mach/hurd/tls.h
@@ -52,26 +52,6 @@
 # define GET_DTV(descr) \
   (((tcbhead_t *) (descr))->dtv)
 
-/* Global scope switch support.  */
-#define THREAD_GSCOPE_IN_TCB  0
-#define THREAD_GSCOPE_GLOBAL
-#define THREAD_GSCOPE_SET_FLAG() \
-  atomic_exchange_and_add_acq (&GL(dl_thread_gscope_count), 1)
-#define THREAD_GSCOPE_RESET_FLAG() \
-  do \
-if (atomic_

[PATCH htl v4 4/4] elf: Remove THREAD_GSCOPE_IN_TCB

2021-09-15 Thread Sergey Bugaev
All the ports now have THREAD_GSCOPE_IN_TCB set to 1. Remove all
support for !THREAD_GSCOPE_IN_TCB, along with the definition itself.

Signed-off-by: Sergey Bugaev 
---
 elf/dl-support.c  | 3 ---
 sysdeps/aarch64/nptl/tls.h| 1 -
 sysdeps/alpha/nptl/tls.h  | 1 -
 sysdeps/arc/nptl/tls.h| 1 -
 sysdeps/arm/nptl/tls.h| 1 -
 sysdeps/csky/nptl/tls.h   | 1 -
 sysdeps/generic/ldsodefs.h| 6 --
 sysdeps/generic/tls.h | 6 --
 sysdeps/hppa/nptl/tls.h   | 1 -
 sysdeps/i386/nptl/tls.h   | 1 -
 sysdeps/ia64/nptl/tls.h   | 1 -
 sysdeps/m68k/nptl/tls.h   | 1 -
 sysdeps/mach/hurd/i386/tls.h  | 2 --
 sysdeps/microblaze/nptl/tls.h | 1 -
 sysdeps/mips/nptl/tls.h   | 1 -
 sysdeps/nios2/nptl/tls.h  | 1 -
 sysdeps/powerpc/nptl/tls.h| 1 -
 sysdeps/riscv/nptl/tls.h  | 1 -
 sysdeps/s390/nptl/tls.h   | 1 -
 sysdeps/sh/nptl/tls.h | 1 -
 sysdeps/sparc/nptl/tls.h  | 1 -
 sysdeps/x86_64/nptl/tls.h | 1 -
 22 files changed, 35 deletions(-)

diff --git a/elf/dl-support.c b/elf/dl-support.c
index a317459fae..02e2ed72f5 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -197,9 +197,6 @@ int _dl_stack_cache_lock;
 int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
 void (*_dl_init_static_tls) (struct link_map *) = 
&_dl_nothread_init_static_tls;
 #endif
-#if !THREAD_GSCOPE_IN_TCB
-int _dl_thread_gscope_count;
-#endif
 struct dl_scope_free_list *_dl_scope_free_list;
 
 #ifdef NEED_DL_SYSINFO
diff --git a/sysdeps/aarch64/nptl/tls.h b/sysdeps/aarch64/nptl/tls.h
index 6e896207a6..72f22dc718 100644
--- a/sysdeps/aarch64/nptl/tls.h
+++ b/sysdeps/aarch64/nptl/tls.h
@@ -109,7 +109,6 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-# define THREAD_GSCOPE_IN_TCB  1
 # define THREAD_GSCOPE_FLAG_UNUSED 0
 # define THREAD_GSCOPE_FLAG_USED   1
 # define THREAD_GSCOPE_FLAG_WAIT   2
diff --git a/sysdeps/alpha/nptl/tls.h b/sysdeps/alpha/nptl/tls.h
index 4dbccc5249..6328112135 100644
--- a/sysdeps/alpha/nptl/tls.h
+++ b/sysdeps/alpha/nptl/tls.h
@@ -103,7 +103,6 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-#define THREAD_GSCOPE_IN_TCB  1
 #define THREAD_GSCOPE_FLAG_UNUSED 0
 #define THREAD_GSCOPE_FLAG_USED   1
 #define THREAD_GSCOPE_FLAG_WAIT   2
diff --git a/sysdeps/arc/nptl/tls.h b/sysdeps/arc/nptl/tls.h
index 95300fdd21..e269c0a7a5 100644
--- a/sysdeps/arc/nptl/tls.h
+++ b/sysdeps/arc/nptl/tls.h
@@ -111,7 +111,6 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-#define THREAD_GSCOPE_IN_TCB  1
 #define THREAD_GSCOPE_FLAG_UNUSED 0
 #define THREAD_GSCOPE_FLAG_USED   1
 #define THREAD_GSCOPE_FLAG_WAIT   2
diff --git a/sysdeps/arm/nptl/tls.h b/sysdeps/arm/nptl/tls.h
index 1bd11307ce..699c16acfb 100644
--- a/sysdeps/arm/nptl/tls.h
+++ b/sysdeps/arm/nptl/tls.h
@@ -100,7 +100,6 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-#define THREAD_GSCOPE_IN_TCB  1
 #define THREAD_GSCOPE_FLAG_UNUSED 0
 #define THREAD_GSCOPE_FLAG_USED   1
 #define THREAD_GSCOPE_FLAG_WAIT   2
diff --git a/sysdeps/csky/nptl/tls.h b/sysdeps/csky/nptl/tls.h
index 7a234041ed..b210dfcb76 100644
--- a/sysdeps/csky/nptl/tls.h
+++ b/sysdeps/csky/nptl/tls.h
@@ -127,7 +127,6 @@ typedef struct
   descr->member[idx] = (value)
 
 /* Get and set the global scope generation counter in struct pthread.  */
-# define THREAD_GSCOPE_IN_TCB  1
 # define THREAD_GSCOPE_FLAG_UNUSED 0
 # define THREAD_GSCOPE_FLAG_USED   1
 # define THREAD_GSCOPE_FLAG_WAIT   2
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index f01b601d63..fd67871f4b 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -495,10 +495,6 @@ struct rtld_global
   EXTERN struct __pthread **_dl_pthread_threads;
   __libc_rwlock_define (EXTERN, _dl_pthread_threads_lock)
 #endif
-
-#if !THREAD_GSCOPE_IN_TCB
-  EXTERN int _dl_thread_gscope_count;
-#endif
 #ifdef SHARED
 };
 # define __rtld_global_attribute__
@@ -1381,10 +1377,8 @@ __rtld_mutex_init (void)
 }
 #endif /* !PTHREAD_IN_LIBC */
 
-#if THREAD_GSCOPE_IN_TCB
 void __thread_gscope_wait (void) attribute_hidden;
 # define THREAD_GSCOPE_WAIT() __thread_gscope_wait ()
-#endif
 
 __END_DECLS
 
diff --git a/sysdeps/generic/tls.h b/sysdeps/generic/tls.h
index e86d70e6ce..f581c9a992 100644
--- a/sysdeps/generic/tls.h
+++ b/sysdeps/generic/tls.h
@@ -71,10 +71,4 @@
  This macro returns the address of the DTV of the current thread.
  This normally is done using the thread register which points
  to the dtv or the TCB (from which the DTV can found).
-
-
- THREAD_GSCOPE_IN_TCB
-
- This should be set to 1 if the global scope flag is stored within the TCB.
-   

Re: [PATCH htl v4 0/4] Rewrite THREAD_GSCOPE

2021-09-15 Thread Sergey Bugaev
On Wed, Sep 15, 2021 at 8:22 PM Samuel Thibault  wrote:
> Did you run make check to verify that you didn't break the world? :)
>
> (i.e. no more failures than before the patch)

Not yet; it takes a long time here, I'll leave it running overnight
and report tomorrow.

By the way, for a project as important as glibc, I'd expect there'd be
a CI bot that would constantly build and test the latest master, if
not all submitted patches. But it looks like this hasn't been done
yet? [0]

[0]: https://sourceware.org/glibc/wiki/CICDDesign

Sergey



On conscription, copyright assignment, and CVEs

2021-10-07 Thread Sergey Bugaev
Hello everyone.

I'm likely getting conscripted for military service some time soon.
When exactly, I don't yet know, but it could happen any day now.

What does it have to do with the Hurd? Well, it most likely means I
will be offline (and so, unable to contribute or to read and respond
to any messages) for a whole year; so I want to take care of some
unfinished matters.

My FSF copyright assignment is still unfinished, and I'm starting to
doubt it will ever be. I haven't received a reply from the FSF person
I was communicating with for the last two months. As I've stated
previously, it's partly my own fault that the process is taking so
long, for I have also been very slow to respond to them (in one case).
But no matter whose fault it is, it looks like the process will
require some more back-and-forth iterations/roundtrips, which are
unlikely to happen fast enough to complete before I get conscripted.

Recently, several people have asked me what's up with getting official
CVEs for those Hurd vulnerabilities I've written about previously.
Truth is, I don't really know how this works!

Back in May, Amos Jeffries has kindly offered to help me with the CVE
process; but we got stuck at exchanging GPG keys, and I haven't heard
from him since June. I don't know if Amos is still interested, or if I
should seek help elsewhere;

but in any case, it's been two months since the fixes have been
published. Everybody should have had plenty of time to upgrade. It's
also been possible for any attackers to infer what the vulnerabilities
were from the patches, which are publicly accessible (if not in the
main Hurd tree). I think it would make sense now for me to just
publish the details of what the vulnerabilities were. It should be an
interesting read for everyone, and it would hopefully help with the
CVE process somewhat (assuming someone would be interested in it,
perhaps they even would be able to complete the process in my
absence?). And also I expect to forget the details in a year's time (I
must have already forgotten some!), so I better do it now rather than
afterwards.

So, if anybody knows of a reason I shall not do this, speak now or
forever hold your peace! :)

Sergey



Re: gnumach RPC: get info about the calling task

2021-10-12 Thread Sergey Bugaev
On Tue, Oct 12, 2021 at 3:37 PM Joan Lledó  wrote:
> Hi,
>
> I'm working on the gnumach rpc to return a pager from a task and an
> address mapped in that task.

Cool! What's the RPC signature?

> For security reasons I'd like to check if the calling task is the same
> as the one the given map belongs to. But I don't know how to do it.

Please don't do this. This is very much against how capability
systems, and Mach / Hurd in particular, are designed. In capability
systems, it should never ever matter *who's* making a call; the only
important thing is whether they have a relevant capability or not. In
other words, this makes Mach RPC caller-transparent; it matters which
messages are sent to which destination, but not who sends them.

This (beautiful!) property/invariant is what allows you to manipulate
any task that you have a task port of, not just your own task. This is
used, for example, by the exec server to destroy and create threads
and manipulate memory mappings of the task being exec'd. This is also
used to manipulate the just-created "child" task in fork (), and
likely in many other places.

Finally, consider rpctrace: it intercepts, traces, and re-sends every
single message coming from or to the traced task. If the kernel would
check who the caller is, it would see that the caller is rpctrace and
not the task, and that would presumably affect its behavior somehow
(such as make it reject some calls). The whole idea of rpctrace is
predicated on the invariant that the caller identity *doesn't* affect
the servers' behavior.

So in the case of vm_map-backed pager, it should matter whether you
have a task port to the target task, not whether you *are* the target
task. If someone has a task port to a task, it allows them to
completely control the task anyway (including making it invoke any
RPC); there's no additional security gains from checking who the
caller is, but there will be additional breakage.

> In the rpc implementation, the function receives a vm_map_t parameter.
> How can I get the task from the vm_map_t?

You'd declare the argument as task_t instead of vm_task_t; that would
make the kernel-side C handler accept a task_t instead of a vm_map_t.
The user code passes a task port in either case.

> and how can I compare it with
> the calling task to check whether it's the same?

You can get the current (calling) task with the current_task () macro
(see kern/thread.h). Or in fact you could get the vm_map directly by
using the current_map () macro from the same file; then you could
compare that against the vm_map_t you already have; so you wouldn't
even need to do what I've said above (make your handler to accept a
full task_t).

But again, please don't do this.

Sergey



  1   2   3   4   5   6   7   8   9   >