Hello Andres,

14.04.2025 19:06, Andres Freund wrote:
> Unfortunately I'm several hundred iterations in, without reproducing the
> issue. I'm bad at statistics, but I think that makes it rather unlikely that I
> will, without changing some aspect.
>
> Was this an assert enabled build? What compiler and what optimization settings
> did you use? Do you have huge pages configured (so that the default
> huge_pages=try would end up with huge pages)?

Yes, I used --enable-cassert; no explicit optimization setting and no huge
pages configured. pg_config says:
CONFIGURE =  '--enable-debug' '--enable-cassert' '--enable-tap-tests' '--with-liburing'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2

Please look at the complete script attached. I've just run it and got:
iteration 56 (jobs: 44)
Tue Apr 15 06:30:52 PM CEST 2025
dropdb: error: database removal failed: ERROR:  could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-15 18:31:00.650 CEST [1612266] LOG:  could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-15 18:31:00.650 CEST [1612266] CONTEXT:  completing I/O on behalf of process 1612271
2025-04-15 18:31:00.650 CEST [1612266] STATEMENT:  DROP DATABASE db3;

I used gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, but now I've also
reproduced the issue with CC=clang (18.1.3 (1ubuntu1)).

Please also take a look at the simple reproducer for the crash inside
pg_get_aios() that I mentioned upthread:
for i in {1..100}; do
  numjobs=12
  echo "iteration $i"
  date
  for ((j=1;j<=numjobs;j++)); do
    ( createdb db$j; for k in {1..300}; do
        echo "CREATE TABLE t (a INT); CREATE INDEX ON t (a); VACUUM t;
              SELECT COUNT(*) >= 0 AS ok FROM pg_aios; " \
        | psql -d db$j >/dev/null 2>&1;
      done; dropdb db$j; ) &
  done
  wait
  psql -c 'SELECT 1' || break;
done

It fails for me as follows:
iteration 20
Tue Apr 15 07:21:29 PM EEST 2025
dropdb: error: connection to server on socket "/tmp/.s.PGSQL.55432" failed: No such file or directory
       Is the server running locally and accepting connections on that socket?
...
2025-04-15 19:21:30.675 EEST [3111699] LOG:  client backend (PID 3320979) was terminated by signal 11: Segmentation fault
2025-04-15 19:21:30.675 EEST [3111699] DETAIL:  Failed process was running: SELECT COUNT(*) >= 0 AS ok FROM pg_aios;
2025-04-15 19:21:30.675 EEST [3111699] LOG:  terminating any other active server processes

>> I reproduced this error on three different machines (all are running
>> Ubuntu 24.04, two with kernel version 6.8, one with 6.11), with PGDATA
>> located on tmpfs.
> That's another variable to try - so far I've been trying this on 6.15.0-rc1
> [1].  I guess I'll have to set up a ubuntu 24.04 VM and try with that.
>
> Greetings,
>
> Andres Freund
>
>
> [1] I wanted to play with io_uring changes that were recently merged. Namely
> support for readv/writev of "fixed" buffers. That avoids needing to pin/unpin
> buffers while IO is ongoing, which turns out to be a noticeable bottleneck in
> some workloads, particularly when using 1GB huge pages.

Best regards,
Alexander Lakhin
Neon (https://neon.tech)

Attachment: repro.tar.gz
Description: application/gzip
