Re: doc: Improve documentation of previous POSIX functions.

2024-06-28 Thread Bruno Haible
Collin Funk wrote:
> I've committed this patch doing the same for the rest of the files in
> doc/pastposix-functions/*.texi since I think it is helpful.

Thanks.

> Are POSIX functions always removed the following release that they are
> marked obsolete?

I would not make such an assumption. But the changes you made are OK;
I cross-checked by looking at the section "CONFORMING TO" of the
corresponding Linux man-pages.

Bruno






Re: doc: Improve documentation of previous POSIX functions.

2024-06-28 Thread Paul Eggert

On 6/28/24 04:31, Collin Funk wrote:

> Are POSIX functions always removed the following release that they are
> marked obsolete?

Typically, but no guarantee.



time: Fix test failure on FreeBSD

2024-06-28 Thread Bruno Haible
The test-time test fails on FreeBSD 14.0/x86_64, in a VirtualBox VM, but
only if 2 or more CPUs are assigned to the VM.

This patch fixes it.


2024-06-28  Bruno Haible  

time: Fix test failure on FreeBSD.
* m4/time.m4 (gl_FUNC_TIME): Guess no for FreeBSD in general.
* doc/posix-functions/time.texi: Mention FreeBSD in general.

diff --git a/doc/posix-functions/time.texi b/doc/posix-functions/time.texi
index 39a00d4370..804c2a3bfa 100644
--- a/doc/posix-functions/time.texi
+++ b/doc/posix-functions/time.texi
@@ -12,7 +12,10 @@
 This function is not consistent with @code{gettimeofday} and @code{timespec_get}
 on some platforms:
 @c https://sourceware.org/bugzilla/show_bug.cgi?id=30200
-glibc 2.31 or newer on Linux, FreeBSD 12.2/sparc64, AIX 7.2, native Windows.
+glibc 2.31 or newer on Linux,
+@c Only seen on machines with 2 or more CPUs.
+FreeBSD 14.0,
+AIX 7.2, native Windows.
 Namely, in the first 1 to 2.5 milliseconds of every second (or, on AIX and
 Windows, in the first 5 milliseconds of every second), @code{time} returns
 a value that is one less than the @code{tv_sec} part of the return value of
diff --git a/m4/time.m4 b/m4/time.m4
index ba5b65cead..dd346419ff 100644
--- a/m4/time.m4
+++ b/m4/time.m4
@@ -1,5 +1,5 @@
 # time.m4
-# serial 5
+# serial 6
 dnl Copyright (C) 2023-2024 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -17,7 +17,7 @@ AC_DEFUN([gl_FUNC_TIME]
  dnl   - glibc >= 2.31 with Linux. And binaries produced on glibc < 2.31
  dnl need to run fine on newer glibc versions as well; therefore ignore
  dnl __GLIBC_MINOR__.
- dnl   - FreeBSD/sparc,
+ dnl   - FreeBSD, on machines with 2 or more CPUs,
  dnl   - AIX,
  dnl   - native Windows.
  case "$host_os" in
@@ -33,12 +33,7 @@ AC_DEFUN([gl_FUNC_TIME]
                         [gl_cv_func_time_works="guessing no"],
                         [gl_cv_func_time_works="guessing yes"])
                       ;;
-       freebsd*)
-         case "$host_cpu" in
-           sparc*) gl_cv_func_time_works="guessing no";;
-           *)      gl_cv_func_time_works="guessing yes";;
-         esac
-         ;;
+       freebsd*)  gl_cv_func_time_works="guessing no";;
        aix*)      gl_cv_func_time_works="guessing no";;
        mingw* | windows*) gl_cv_func_time_works="guessing no";;
        *)         gl_cv_func_time_works="guessing yes";;
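
For readers who want to observe the inconsistency that the time.texi hunk
describes, here is a minimal C sketch. It is not part of the patch and not
the Gnulib test. It reads gettimeofday() first and time() second, and counts
how often the later time() value is smaller than the earlier tv_sec. The
helper names time_lags and count_time_lags are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Return true if T, read from time() after TV was read from
   gettimeofday(), lies before the seconds part of TV.  */
bool
time_lags (time_t t, const struct timeval *tv)
{
  return t < tv->tv_sec;
}

/* Take back-to-back readings for roughly SECONDS seconds and return
   the number of pairs in which time() lagged behind gettimeofday().  */
long
count_time_lags (int seconds)
{
  long lags = 0;
  time_t stop = time (NULL) + seconds;

  for (;;)
    {
      struct timeval tv;
      gettimeofday (&tv, NULL);
      time_t t = time (NULL);
      if (time_lags (t, &tv))
        lags++;
      if (t >= stop)
        break;
    }
  return lags;
}
```

A nonzero result from, say, count_time_lags (5) would indicate the behaviour
documented above. A zero result on one run does not prove that a platform is
unaffected.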






Re: test-pthread-rwlock failure on Pop!_OS 22.04 LTS

2024-06-28 Thread Paul Eggert

On 6/28/24 01:44, Bruno Haible wrote:

> Paul Eggert wrote:
>> On my Pop!_OS laptop it's way slower than it was on any
>> of your measured VMs:
>>
>> $ time ./test-pthread-rwlock
>> Starting test_rwlock ...Alarm clock
>>
>> real    10m0.002s
>> user    90m18.341s
>> sys     9m38.634s
>
> Indeed, that's 5 times slower...


It's worse than that, possibly much worse. The test timed out after 10 
minutes real time, due to the 'alarm' call. We don't know how long the 
test would take if it were to run to completion.






Re: doc: Improve documentation of previous POSIX functions.

2024-06-28 Thread Collin Funk
Bruno Haible  writes:

>> Are POSIX functions always removed the following release that they are
>> marked obsolete?
>
> I would not make such an assumption. But the changes you made are OK;
> I cross-checked by looking at the section "CONFORMING TO" of the
> corresponding Linux man-pages.

That's what I did, plus checking the POSIX pages to verify them.

I'll probably add some commentary to gethostbyname and gethostbyaddr in
a similar manner to inet_ntoa. It seems like those are used in many
places too.

Paul Eggert  writes:

> Typically, but no guarantee.

Thanks.

Collin



Re: test-pthread-rwlock failure on Pop!_OS 22.04 LTS

2024-06-28 Thread Bruno Haible
I wrote:
> 3) This topic has been discussed in the glibc bug
> https://sourceware.org/bugzilla/show_bug.cgi?id=13701
> where I have raised my voice for a writer-preferring implementation.

The rwlock implementation being not writer-preferring appears to be
only one contributing factor.

I did "time ./test-pthread-rwlock"

* in virtual machines with 8 CPUs:

  - Ubuntu 22.04               60 sec      (4 CPUs: 15 sec)
  - musl libc on Ubuntu 22.04  0.3 sec
  - Alpine Linux               0.3 sec
  - FreeBSD 14                 1.4 sec
  - OpenBSD 7.5                2.5 sec
  - Solaris 11.4               22 sec      (4 CPUs: 20 sec)
  - Cygwin                     33..35 sec  (4 CPUs: 25..27 sec)
  - mingw                      12 sec
  - MSVC                       16 sec

* outside VirtualBox:

  - Ubuntu 22.04 (12 CPUs)     6 sec
  - macOS 12                   3.6 sec
  - AIX 7.1                    4 sec

Intermediate results:

* While gl_cv_pthread_rwlock_rdlock_prefer_writer is
  - yes on macOS, FreeBSD, NetBSD, OpenBSD <= 6.0, Solaris, Cygwin,
  - no  on glibc, musl libc, OpenBSD >= 6.5, AIX, Android,
  only glibc has the problem; musl libc, OpenBSD 7.5, AIX don't.

* The first two measurements were done on the same Ubuntu 22.04 VM.
  This hints that kernel customization (scheduler, policies etc.)
  is not the cause; otherwise musl libc on Ubuntu 22.04 would have
  the same problem.

* I don't understand why the same Ubuntu 22.04 outside VirtualBox
  works fine (6 seconds), whereas inside VirtualBox it's 60 seconds.
  But it's not entirely a VirtualBox or virtualization problem,
  since in Paul's case the system is running on the naked hardware
  (no virtualization).

So:
  - It seems to be a glibc problem.
  - But a large amount of CPU time is spent in the kernel.

Which performance tools exist that can help understand the situation
across the glibc / kernel boundary?
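
As an illustration of what "writer-preferring" means in the list above, here
is a rough C probe. It is a sketch under stated assumptions, not Gnulib's
configure test. The main thread holds a read lock, a second thread blocks in
pthread_rwlock_wrlock, and the main thread then calls
pthread_rwlock_tryrdlock, which POSIX specifies to fail exactly when the
equivalent rdlock call would have blocked. The 100 ms pause is an assumption
that the writer has had time to start waiting. The names probe_lock,
writer_thread and probe_prefers_writer are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_rwlock_t probe_lock;

/* Writer: blocks in wrlock for as long as a read lock is held.  */
static void *
writer_thread (void *arg)
{
  (void) arg;
  if (pthread_rwlock_wrlock (&probe_lock) == 0)
    pthread_rwlock_unlock (&probe_lock);
  return NULL;
}

/* Return 1 if a new reader is refused while a writer is waiting,
   0 if the reader is admitted, -1 if the probe could not be run.  */
int
probe_prefers_writer (void)
{
  struct timespec pause = { 0, 100000000L };  /* 100 ms, an assumption */
  pthread_t writer;
  int result;
  int rc;

  if (pthread_rwlock_init (&probe_lock, NULL) != 0)
    return -1;
  if (pthread_rwlock_rdlock (&probe_lock) != 0)
    {
      pthread_rwlock_destroy (&probe_lock);
      return -1;
    }
  if (pthread_create (&writer, NULL, writer_thread, NULL) != 0)
    {
      pthread_rwlock_unlock (&probe_lock);
      pthread_rwlock_destroy (&probe_lock);
      return -1;
    }

  nanosleep (&pause, NULL);     /* give the writer time to block */
  rc = pthread_rwlock_tryrdlock (&probe_lock);
  if (rc == 0)
    {
      result = 0;               /* reader admitted past the waiting writer */
      pthread_rwlock_unlock (&probe_lock);
    }
  else
    result = (rc == EBUSY ? 1 : -1);

  pthread_rwlock_unlock (&probe_lock);  /* release the first read lock */
  pthread_join (writer, NULL);
  pthread_rwlock_destroy (&probe_lock);
  return result;
}
```

Build with the compiler's thread option (for example -pthread). Because both
read locks are requested by the same thread, an implementation that treats
recursive read locks specially may answer differently than it would for two
distinct reader threads.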

Bruno






Re: test-pthread-rwlock failure on Pop!_OS 22.04 LTS

2024-06-28 Thread Bruno Haible
Some more measurements in virtual machines with 8 CPUs:
  Debian 9 (glibc 2.24)   4 CPUs: 0.33 sec  8 CPUs: 0.34 sec
  Debian 10 (glibc 2.28)  4 CPUs: 23 sec    8 CPUs: 29 sec

So, while glibc < 2.25 did not prefer writers (like musl libc, OpenBSD 7.5,
AIX), it's only starting with Torvald Riegel's rewrite of the rwlocks in
glibc 2.25 [1] that writer starvation occurs massively.

Bruno

[1] 
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=cc25c8b4c1196a8c29e9a45b1e096b99a87b7f8c






a pthread_cond_timedwait bug

2024-06-28 Thread Bruno Haible
While testing a 'pthread-rwlock' testdir on various VMs with 8 CPUs,
there was also a failure of 'test-pthread-cond' on Debian 10.

It's reproducible: On that Debian 10.7 VM (glibc 2.28, Linux 4.19.0)
with 8 CPUs, 'test-pthread-cond' with a probability of ca. 20%..30%
aborts. Stack trace:
  test-pthread-cond.c:218
-> test-pthread-cond.c:186

'ltrace' and 'strace' can be used to get a log of the two threads
(attached):
  $ ltrace -f -r -o llog05 ./test-pthread-cond
  $ strace -f -r -ttt -o slog08 ./test-pthread-cond

The relevant lines in llog05:

2409   0.000040 nanosleep(0x7fff1f9ee690, 0x7fff1f9ee6e0, 0x7fff1f9ee6e0, 0x7ff543b6e917 <unfinished ...>
2411   0.000142 gettimeofday(0x7ff543a89ea0, 0)= 0
2411   0.000125 pthread_cond_timedwait(0x5647187a4140, 0x5647187a4180, 0x7ff543a89ed0, 0x20c49ba5e353f7cf <unfinished ...>
2409   2.034394 <... nanosleep resumed> )  = 0
2409   0.000087 pthread_mutex_lock(0x5647187a4180, 0x7fff1f9ee6e0, 0, 0x7ff543c60bf0) = 0
2409   0.000612 pthread_cond_signal(0x5647187a4140, 0, 0, 0)   = 0
2409   0.000618 pthread_mutex_unlock(0x5647187a4180, 129, 1, 0x7ff543c5d5d6 <unfinished ...>
2411   0.000242 <... pthread_cond_timedwait resumed> ) = 0
2409   0.000026 <... pthread_mutex_unlock resumed> )   = 0

They show that while the main thread sleeps for more than 2 seconds, the
pthread_cond_timedwait invocation in the other thread, that has a timeout
of 1 second as argument
  - takes more than 2 seconds to complete,
  - when it completes, returns 0, not ETIMEDOUT.
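
For reference, the expected behaviour can be sketched in a few lines of C.
This is not the Gnulib test (which builds its deadline from gettimeofday and
involves a second thread). It is a single-threaded sketch in which nobody
ever signals the condition variable, so the only correct outcome is
ETIMEDOUT shortly after the deadline. The function name timedwait_unsignaled
is hypothetical, and the sketch assumes the default condition-variable clock,
CLOCK_REALTIME.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Wait up to TIMEOUT_MS milliseconds on a condition variable that is
   never signaled.  Store the elapsed wall-clock time, in milliseconds,
   in *ELAPSED_MS and return the final pthread_cond_timedwait result.  */
int
timedwait_unsignaled (long timeout_ms, long *elapsed_ms)
{
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  struct timespec start, deadline, end;
  int rc;

  pthread_mutex_init (&mutex, NULL);
  pthread_cond_init (&cond, NULL);

  /* The timeout argument is an absolute time, so add the relative
     timeout to "now" and normalize the nanoseconds field.  */
  clock_gettime (CLOCK_REALTIME, &start);
  deadline = start;
  deadline.tv_sec += timeout_ms / 1000;
  deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L)
    {
      deadline.tv_sec++;
      deadline.tv_nsec -= 1000000000L;
    }

  pthread_mutex_lock (&mutex);
  /* Nobody signals, so a return value of 0 can only be a spurious
     wakeup: wait again with the same absolute deadline.  */
  do
    rc = pthread_cond_timedwait (&cond, &mutex, &deadline);
  while (rc == 0);
  pthread_mutex_unlock (&mutex);

  clock_gettime (CLOCK_REALTIME, &end);
  *elapsed_ms = (long) (end.tv_sec - start.tv_sec) * 1000L
                + (end.tv_nsec - start.tv_nsec) / 1000000L;

  pthread_cond_destroy (&cond);
  pthread_mutex_destroy (&mutex);
  return rc;
}
```

With a timeout of 1000 ms, a healthy system should return ETIMEDOUT with an
elapsed time only slightly above 1000 ms. The log above shows the wait ending
after about 2 seconds, and only because it was signaled.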

Similarly, in the strace log. Here you can see the absolute time stamps:

2550  1719588804.808799 (+ 0.000024) nanosleep({tv_sec=2, tv_nsec=0}, <unfinished ...>
2552  1719588804.808810 (+ 0.000011) futex(0x562c763d816c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1719588805, tv_nsec=808799000}, FUTEX_BITSET_MATCH_ANY <unfinished ...>
2550  1719588806.863535 (+ 2.054725) <... nanosleep resumed> 0x7ffe334d1a90) = 0
2550  1719588806.863658 (+ 0.000122) futex(0x562c763d816c, FUTEX_WAKE_PRIVATE, 1) = 1

Even though the futex was supposed to time out at  1719588805.808799
it terminated only at  1719588806.863658
which is more than 1 second too late.
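
The lateness can be computed directly from the two time stamps. The helper
below is a small sketch; the name timespec_diff_ns is hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Return A - B in nanoseconds.  The result is negative if A is
   earlier than B.  */
int64_t
timespec_diff_ns (struct timespec a, struct timespec b)
{
  return (int64_t) (a.tv_sec - b.tv_sec) * 1000000000
         + (a.tv_nsec - b.tv_nsec);
}
```

Applied to the wake-up time 1719588806.863658 and the requested deadline
1719588805.808799, it yields 1054859000 ns, that is, about 1.055 seconds
after the deadline.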

It seems that this is the same bug as
.

Evidently, 'test-cond' (the test of the Gnulib condition variables) and
'test-cnd' (the test of the ISO C11 condition variables) fail as well,
occasionally, since they use pthread_cond_*.

2409   0.000000 signal(SIGALRM, 0) = 0
2409   0.000173 alarm(600) = 0
2409   0.000123 pthread_cond_init(0x5647187a4140, 0, 0, 0x7ff543b545c7) = 0
2409   0.000110 pthread_mutexattr_init(0x7fff1f9ee718, 0, 0, 0x7ff543b545c7) = 0
2409   0.000093 pthread_mutexattr_settype(0x7fff1f9ee718, 0, 0, 0x7ff543b545c7) = 0
2409   0.000105 pthread_mutex_init(0x5647187a4180, 0x7fff1f9ee718, 0, 0x7ff543b545c7) = 0
2409   0.000122 pthread_mutexattr_destroy(0x7fff1f9ee718, 0x7fff1f9ee718, 640, 0) = 0
2409   0.000126 printf("Starting test_pthread_cond_wait "...)  = 35
2409   0.000476 fflush(0x7ff543c4a760) = 0
2409   0.000995 pthread_create(0x7fff1f9ee6d8, 0, 0x5647187a1295, 0) = 0
2409   0.000474 sched_yield(0x3d0f00, 0, 0, 0 <unfinished ...>
2410   0.000208 pthread_mutex_lock(0x5647187a4180, 0, 0xb6bb9194910ce38a, 0x7ff543a8a700 <unfinished ...>
2409   0.000162 <... sched_yield resumed> )= 0
2410   0.000030 <... pthread_mutex_lock resumed> ) = 0
2409   0.000017 nanosleep(0x7fff1f9ee690, 0x7fff1f9ee6e0, 0x7fff1f9ee6e0, 0x7ff543b6e917 <unfinished ...>
2410   0.000125 pthread_cond_wait(0x5647187a4140, 0x5647187a4180, 0, 0 <unfinished ...>
2409   2.040120 <... nanosleep resumed> )  = 0
2409   0.000090 pthread_mutex_lock(0x5647187a4180, 0x7fff1f9ee6e0, 0, 0x7ff543c60bf0) = 0
2409   0.000688 pthread_cond_signal(0x5647187a4140, 0, 0, 0)   = 0
2409   0.000493 pthread_mutex_unlock(0x5647187a4180, 129, 1, 0x7ff543c5d5d6 <unfinished ...>
2410   0.000350 <... pthread_cond_wait resumed> )  = 0
2409   0.000040 <... pthread_mutex_unlock resumed> )   = 0
2410   0.000040 pthread_mutex_unlock(0x5647187a4180, 128, 0, 0 <unfinished ...>
2409   0.000247 pthread_join(0x7ff543a8a700, 0, 0x5647187a4180, 0x7ff543c6033a <unfinished ...>
2410   0.000327 <... pthread_mutex_unlock resumed> )   = 0
2410   0.000296 +++ exited (status 0) +++
2409   0.000039 <... pthread_join resumed> )   = 0
2409   0.000089 puts(" OK")= 4
2409   0.000389 fflush(0x7ff543c4a760) = 0
2409   0.000270 printf("Starting test_pthread_cond_timed"...)  = 40
2409   0.000394 fflush(0x7ff543c4a760) = 0
2409   0.000360 pthread_create(0x7fff1f9ee6d8, 0, 0x5647187a1669, 0) = 0
2409   0.000509 sched_yield(0x3d0f00, 0, 0, 0x7ff543a8a700 <unfinished ...>
2411   0.000267 pthread_mutex_lock(0x5647187a4180, 0, 0xb6bb9194910ce38a, 0x7ff543a8a700 <unfinished ...>
2409   0.000240 <... sched_yield resumed> ) 

[PATCH] tests: switch nap() to use file creation to gauge delay

2024-06-28 Thread Jeff Layton
The current test in nap.h tries to gauge the minimum delay that results
in a timestamp change when successively writing to a file.

There are some proposed changes [1] to track finer-grained timestamps in
the Linux kernel that will break the assumptions that nap() uses to
gauge the delay. In particular, writing to a file will almost always
show a change in the timestamp now, so usually this method will settle
on a delay of 1ns.

Switch this code over to use file creation of two files to gauge the
delay. Since that is operating on two different files, it should still
be possible to gauge the coarse-grained timer tick from that.
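
The idea can be sketched on its own in C, independently of the patch below.
This is a simplified illustration, not the code from tests/nap.h: it creates
two temporary files with a delay in between and reports whether their
modification times differ. The function name creation_mtime_differs and the
use of mkstemp with /tmp templates are assumptions made for this sketch
(nap.h works in the current directory), and the file system holding /tmp may
have a different time stamp granularity than the one under test.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Create two temporary files DELAY_NS nanoseconds apart.  Return 1 if
   their mtimes differ, 0 if they are equal, -1 on error.  */
int
creation_mtime_differs (long delay_ns)
{
  char name1[] = "/tmp/nap1-XXXXXX";
  char name2[] = "/tmp/nap2-XXXXXX";
  struct timespec delay = { delay_ns / 1000000000L, delay_ns % 1000000000L };
  struct stat st1, st2;
  int result = -1;
  int fd1, fd2;

  fd1 = mkstemp (name1);
  if (fd1 < 0)
    return -1;
  close (fd1);

  nanosleep (&delay, NULL);

  fd2 = mkstemp (name2);
  if (fd2 >= 0)
    {
      close (fd2);
      if (stat (name1, &st1) == 0 && stat (name2, &st2) == 0)
        result = (st1.st_mtim.tv_sec != st2.st_mtim.tv_sec
                  || st1.st_mtim.tv_nsec != st2.st_mtim.tv_nsec);
      unlink (name2);
    }
  unlink (name1);
  return result;
}
```

An adaptive caller would start with a small delay and increase it until the
function returns 1, which is the approach the nap() comment describes.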

[1]: 
https://lore.kernel.org/linux-fsdevel/20240626-mgtime-v1-0-a189352d0...@kernel.org/

Signed-off-by: Jeff Layton 
---
Failure of the test-stat-time test is what triggered us to revert the
multigrain timestamp series from the Linux kernel last October.  With
that failure, we'd sometimes see timestamps showing files being modified
in reverse order (if one got a fine-grained and another got a
coarse-grained timestamp).  That problem has now been addressed, but
test-stat-time still fails without this change.
---
 ChangeLog   |  5 +++++
 tests/nap.h | 84 +
 2 files changed, 33 insertions(+), 56 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index a9453b0b4513..6524d717cf7c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2024-06-28  Jeff Layton  
+   * tests/nap.h: change the method used to gauge the coarse-grained
+   timestamp to use creation of two files instead of writes to a
+   single file.
+
 2024-06-28  Bruno Haible  
 
time: Fix test failure on FreeBSD.
diff --git a/tests/nap.h b/tests/nap.h
index cf7d998b800e..323df6264a02 100644
--- a/tests/nap.h
+++ b/tests/nap.h
@@ -31,10 +31,8 @@
 # endif
 
 /* Name of the witness file.  */
-#define TEMPFILE BASE "nap.tmp"
-
-/* File descriptor used for the witness file.  */
-static int nap_fd = -1;
+#define TEMPFILE1 BASE "nap1.tmp"
+#define TEMPFILE2 BASE "nap2.tmp"
 
 /* Return A - B, in ns.
Return 0 if the true result would be negative.
@@ -62,56 +60,46 @@ diff_timespec (struct timespec a, struct timespec b)
   return sdiff;
 }
 
-/* If DO_WRITE, bump the modification time of the file designated by NAP_FD.
-   Then fetch the new STAT information of NAP_FD.  */
-static void
-nap_get_stat (struct stat *st, int do_write)
+void clear_tempfiles(void)
 {
-  if (do_write)
-{
-  ASSERT (write (nap_fd, "\n", 1) == 1);
-#if defined _WIN32 || defined __CYGWIN__
-  /* On Windows, the modification times are not changed until NAP_FD
- is closed. See
- */
-  close (nap_fd);
-  nap_fd = open (TEMPFILE, O_RDWR, 0600);
-  ASSERT (nap_fd != -1);
-  lseek (nap_fd, 0, SEEK_END);
-#endif
-}
-  ASSERT (fstat (nap_fd, st) == 0);
+   unlink(TEMPFILE1);
+   unlink(TEMPFILE2);
 }
 
 /* Given a file whose descriptor is FD, see whether delaying by DELAY
nanoseconds causes a change in a file's mtime.
OLD_ST is the file's status, recently gotten.  */
 static bool
-nap_works (int delay, struct stat old_st)
+nap_works (int delay)
 {
-  struct stat st;
+  struct stat st1, st2;
   struct timespec delay_spec;
+  int fd1, fd2;
+  int ret;
+
+  ret = unlink (TEMPFILE1);
+  ASSERT(ret == 0 || errno == ENOENT);
+  ret = unlink (TEMPFILE2);
+  ASSERT(ret == 0 || errno == ENOENT);
+
   delay_spec.tv_sec = delay / 1000000000;
   delay_spec.tv_nsec = delay % 1000000000;
-  ASSERT (nanosleep (&delay_spec, 0) == 0);
-  nap_get_stat (&st, 1);
 
-  if (diff_timespec (get_stat_mtime (&st), get_stat_mtime (&old_st)))
+  ASSERT ((fd1 = creat (TEMPFILE1, 0600)) != -1);
+  close(fd1);
+  ASSERT (nanosleep (&delay_spec, 0) == 0);
+  ASSERT ((fd2 = creat (TEMPFILE2, 0600)) != -1);
+  close(fd2);
+  ASSERT (stat (TEMPFILE1, &st1) != -1);
+  ASSERT (stat (TEMPFILE2, &st2) != -1);
+  ASSERT (unlink (TEMPFILE1) == 0);
+  ASSERT (unlink (TEMPFILE2) == 0);
+
+  if (diff_timespec (get_stat_mtime (&st2), get_stat_mtime (&st1)))
 return true;
-
   return false;
 }
 
-static void
-clear_temp_file (void)
-{
-  if (0 <= nap_fd)
-{
-  ASSERT (close (nap_fd) != -1);
-  ASSERT (unlink (TEMPFILE) != -1);
-}
-}
-
 /* Sleep long enough to notice a timestamp difference on the file
system in the current directory.  Use an adaptive approach, trying
to find the smallest delay which works on the current file system
@@ -122,28 +110,12 @@ clear_temp_file (void)
 static void
 nap (void)
 {
-  struct stat old_st;
   static int delay = 1;
 
-  if (-1 == nap_fd)
-{
-  atexit (clear_temp_file);
-  ASSERT ((nap_fd = creat (TEMPFILE, 0600)) != -1);
-  nap_get_stat (&old_st, 0);
-}
-  else
-{
-  ASSERT (0 <= nap_fd);
-  nap_get_stat (&old_st, 1);
-}
-
-  if (1 < delay)
-delay = delay / 2;  /* Try half of the previ

Re: a pthread_cond_timedwait bug

2024-06-28 Thread Bruno Haible
I wrote:
> Even though the futex was supposed to time out at  1719588805.808799
> it terminated only at  1719588806.863658
> which is more than 1 second too late.
> 
> It seems that this is the same bug as
> .

Wrong guess. It is *not* this glibc bug.

First, I polished this test (tests/test-pthread-cond.c) and its two
siblings, fixing bugs in the test:
  - Use of an 'int' variable to communicate between two threads,
where at least one of the accesses was not in the scope of a lock.
I marked it 'volatile'. (Also tried '_Atomic int volatile', but
that did not make a difference.)
  - Fix a race condition that could occur if one thread does not get
woken up for 2 seconds (likely would only occur under very high
load).
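
For comparison only: in ISO C11, 'volatile' by itself gives no inter-thread
ordering guarantees, and the standard tool for sharing an 'int' between
threads outside a lock is <stdatomic.h>. The sketch below shows that
alternative. It is not what the committed tests do (they mark the variables
'volatile'), and the names shared_value, publish_value and observe_value are
hypothetical.

```c
#include <stdatomic.h>

/* Value shared between two threads without holding a lock.  */
static atomic_int shared_value;

/* Store V so that a thread which later observes it also sees all
   writes made before this call (release ordering).  */
void
publish_value (int v)
{
  atomic_store_explicit (&shared_value, v, memory_order_release);
}

/* Load the shared value with acquire ordering, pairing with the
   release store in publish_value.  */
int
observe_value (void)
{
  return atomic_load_explicit (&shared_value, memory_order_acquire);
}
```

One thread would call publish_value and the other would poll observe_value.
The acquire/release pair is what makes data written before the store visible
to the observer.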

The test still sometimes failed on Debian 10.7.

On an Alpine Linux 3.20 VM, also with 8 threads, the test fails in the
same way.

So, if it was a glibc bug, it would have to be a musl bug as well. That
seemed unlikely, since musl libc generally has simpler code.

The suspicion thus moved to the hypervisor. I tried a couple of VMs under
QEMU (with '-smp 8' for 8 CPUs, and on x86_64 with '-enable-kvm'), and
there the test succeeds.

So, it looked to be specific to VirtualBox.

I turned to the VirtualBox settings, more precisely to the "Paravirtualization"
acceleration setting. Result:
  Default, KVM   -> test fails occasionally (about 1 in 3 times)
  None, Legacy, Minimal, Hyper-V -> test succeeds (no failure in 20 runs)

With that, I'm documenting it as a bug specific to VirtualBox.


2024-06-28  Bruno Haible  

doc: Mention pthread_cond_timedwait bug caused by hypervisor.
* doc/posix-functions/pthread_cond_timedwait.texi: Mention VirtualBox
specific bug.
* doc/posix-functions/cnd_timedwait.texi: Likewise.

2024-06-28  Bruno Haible  

cond tests: Avoid theoretically possible failure.
* tests/test-cnd.c (cnd_wait_routine, cnd_timedwait_routine): Report to
the main thread if this thread comes too late.
(test_cnd_wait, test_cnd_timedwait): Return a 'skipped' value.
(main): Print SKIP instead of OK if the essence of the test was skipped.
* tests/test-cond.c (cond_routine, timedcond_routine): Report to the
main thread if this thread comes too late.
(test_cond, test_timedcond): Return a 'skipped' value.
(main): Print SKIP instead of OK if the essence of the test was skipped.
* tests/test-pthread-cond.c (pthread_cond_wait_routine,
pthread_cond_timedwait_routine): Report to the main thread if this
thread comes too late.
(test_pthread_cond_wait, test_pthread_cond_timedwait): Return a
'skipped' value.
(main): Print SKIP instead of OK if the essence of the test was skipped.

cond tests: Improve multithread-safety.
* tests/test-cnd.c (cond_value, cond_timed_out): Mark as volatile.
* tests/test-cond.c (cond_value, cond_timed_out): Likewise.
* tests/test-pthread-cond.c (cond_value, cond_timed_out): Likewise.

cond tests: Improve comments.
* tests/test-cnd.c: Improve comments.
(cond_value): Remove initializer.
(cond_timed_out): Renamed from cond_timeout.
* tests/test-cond.c: Improve comments.
(cond_value): Remove initializer.
(cond_timed_out): Renamed from cond_timeout.
* tests/test-pthread-cond.c: Improve comments.
(cond_value): Remove initializer.
(cond_timed_out): Renamed from cond_timeout.

From d7b8b1ccc988290277945f52e1582dd669a97d7f Mon Sep 17 00:00:00 2001
From: Bruno Haible 
Date: Fri, 28 Jun 2024 21:40:16 +0200
Subject: [PATCH 1/4] cond tests: Improve comments.

* tests/test-cnd.c: Improve comments.
(cond_value): Remove initializer.
(cond_timed_out): Renamed from cond_timeout.
* tests/test-cond.c: Improve comments.
(cond_value): Remove initializer.
(cond_timed_out): Renamed from cond_timeout.
* tests/test-pthread-cond.c: Improve comments.
(cond_value): Remove initializer.
(cond_timed_out): Renamed from cond_timeout.
---
 ChangeLog | 13 +++
 tests/test-cnd.c  | 71 +++
 tests/test-cond.c | 55 ++
 tests/test-pthread-cond.c | 71 +++
 4 files changed, 131 insertions(+), 79 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index a9453b0b45..a309c59a50 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,16 @@
+2024-06-28  Bruno Haible  
+
+	cond tests: Improve comments.
+	* tests/test-cnd.c: Improve comments.
+	(cond_value): Remove initializer.
+	(cond_timed_out): Renamed from cond_timeout.
+	* tests/test-cond.c: Improve comments.
+	(cond_value): Remove initializer.
+	(cond_timed_out): Renamed from cond_timeout.
+	* tests/test-pthread-cond.c: Improve comments.
+	(cond_value): Remove initializer.
+	(

Re: [PATCH] tests: switch nap() to use file creation to gauge delay

2024-06-28 Thread Bruno Haible
Jeff Layton wrote:
> Failure of the test-stat-time test is what triggered us to revert the
> multigrain timestamp series from the Linux kernel last October.  With
> that failure, we'd sometimes see timestamps showing files being modified
> in reverse order (if one got a fine-grained and another got a
> coarse-grained timestamp).

I remember. Thanks for having listened to Jan Kara, Paul, me, and others.

> Switch this code over to use file creation of two files to gauge the
> delay. Since that is operating on two different files, it should still
> be possible to gauge the coarse-grained timer tick from that.

So, if I understand it right, there will now be two concepts of "file
system time stamp resolution":
  a. regarding writes in a file,
  b. regarding file creation,
and   a <= b. Your patch to make nap() return b, not a, seems reasonable.

Let's apply it when the copyright assignment process is complete.

Thanks for your contribution!

Bruno






Re: test-pthread-rwlock failure on Pop!_OS 22.04 LTS

2024-06-28 Thread Bruno Haible
Hi Paul,

These measurements:

> * in virtual machines with 8 CPUs:
> 
>   - Ubuntu 22.0460 sec  (4 CPUs: 15 sec)
> 
> * outside VirtualBox:
> 
>   - Ubuntu 22.04 (12 CPUs)   6 sec

together with the investigation of the pthread_cond_timedwait "bug",
make it seem possible that a hypervisor is exercising some influence.

I would therefore be interested in the result of the following
experiment:
  1. Install a toolchain for creating binaries linked against musl-libc:
 $ sudo apt install musl-tools   # provides the musl-gcc wrapper
  2. Create a testdir for the module 'pthread-rwlock'.
  3. Build it with CC="gcc". You should still see the test failure after
 10 minutes.
  4. Build it with CC="musl-gcc". Run
 $ time gltests/test-pthread-rwlock

If 4. takes more than 30 seconds, your OS is probably running on top of
a hypervisor, and that is causing the problem.

If 4. takes just a few seconds, it looks like a glibc bug.

Bruno






Re: a pthread_cond_timedwait bug

2024-06-28 Thread Collin Funk
Hi Bruno,

Bruno Haible  writes:

> The suspicion thus moved to the hypervisor. I tried a couple of VMs under
> QEMU (with '-smp 8' for 8 CPUs, and on x86_64 with '-enable-kvm'), and
> there the test succeeds.

Unrelated to the test, but you might like 'virt-manager' for setting up
KVM virtual machines [1]. I can never remember all the correct flags for
qemu. But 'virt-manager' has an easy UI and configuration similar to
VirtualBox.

Collin

[1] https://virt-manager.org/
It has been packaged in every GNU/Linux distribution I have used.



[PATCH] doc: Mention the byteswap module in function documentation.

2024-06-28 Thread Collin Funk
* doc/glibc-functions/bswap_16.texi (bswap_16): Mention the byteswap
module.
* doc/glibc-functions/bswap_32.texi (bswap_32): Likewise.
* doc/glibc-functions/bswap_64.texi (bswap_64): Likewise.
---
 ChangeLog                         | 8 ++++++++
 doc/glibc-functions/bswap_16.texi | 2 +-
 doc/glibc-functions/bswap_32.texi | 2 +-
 doc/glibc-functions/bswap_64.texi | 2 +-
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index effda3a4a9..6d5caed394 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2024-06-28  Collin Funk  
+
+   doc: Mention the byteswap module in function documentation.
+   * doc/glibc-functions/bswap_16.texi (bswap_16): Mention the byteswap
+   module.
+   * doc/glibc-functions/bswap_32.texi (bswap_32): Likewise.
+   * doc/glibc-functions/bswap_64.texi (bswap_64): Likewise.
+
 2024-06-28  Bruno Haible  
 
doc: Mention pthread_cond_timedwait bug caused by hypervisor.
diff --git a/doc/glibc-functions/bswap_16.texi b/doc/glibc-functions/bswap_16.texi
index 63462e14fc..32ef44671a 100644
--- a/doc/glibc-functions/bswap_16.texi
+++ b/doc/glibc-functions/bswap_16.texi
@@ -4,7 +4,7 @@ @node bswap_16
 
 Documentation:@* @uref{https://www.kernel.org/doc/man-pages/online/pages/man3/bswap_16.3.html,,man bswap_16}
 
-Gnulib module: ---
+Gnulib module: byteswap
 
 Portability problems fixed by Gnulib:
 @itemize
diff --git a/doc/glibc-functions/bswap_32.texi b/doc/glibc-functions/bswap_32.texi
index 4606162feb..a1f84269e8 100644
--- a/doc/glibc-functions/bswap_32.texi
+++ b/doc/glibc-functions/bswap_32.texi
@@ -4,7 +4,7 @@ @node bswap_32
 
 Documentation:@* @uref{https://www.kernel.org/doc/man-pages/online/pages/man3/bswap_32.3.html,,man bswap_32}
 
-Gnulib module: ---
+Gnulib module: byteswap
 
 Portability problems fixed by Gnulib:
 @itemize
diff --git a/doc/glibc-functions/bswap_64.texi b/doc/glibc-functions/bswap_64.texi
index d12e885882..b4d72e75a3 100644
--- a/doc/glibc-functions/bswap_64.texi
+++ b/doc/glibc-functions/bswap_64.texi
@@ -4,7 +4,7 @@ @node bswap_64
 
 Documentation:@* @uref{https://www.kernel.org/doc/man-pages/online/pages/man3/bswap_64.3.html,,man bswap_64}
 
-Gnulib module: ---
+Gnulib module: byteswap
 
 Portability problems fixed by Gnulib:
 @itemize
-- 
2.45.2
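
As a reminder of what the three functions do, here is a short C sketch using
<byteswap.h> as provided by glibc. The patch above documents that the Gnulib
'byteswap' module supplies this header, so the same code is expected to work
on other platforms with that module. The struct header type and the function
header_byteswap are hypothetical and exist only for illustration.

```c
#include <byteswap.h>
#include <stdint.h>

/* A hypothetical record with one field of each width.  */
struct header
{
  uint16_t tag;
  uint32_t length;
  uint64_t offset;
};

/* Return a copy of H with the byte order of every field reversed,
   e.g. to convert between big-endian and little-endian layouts.  */
struct header
header_byteswap (struct header h)
{
  h.tag = bswap_16 (h.tag);
  h.length = bswap_32 (h.length);
  h.offset = bswap_64 (h.offset);
  return h;
}
```

Applying header_byteswap twice returns the original record, since each
bswap_* call is its own inverse.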