Re: AIO v2.5

Andres Freund Fri, 25 Apr 2025 09:44:41 -0700

Hi,

On 2025-04-15 21:00:00 +0300, Alexander Lakhin wrote:
> Please take a look also at the simple reproducer for the crash inside
> pg_get_aios() I mentioned upthread:
> for i in {1..100}; do
>   numjobs=12
>   echo "iteration $i"
>   date
>   for ((j=1;j<=numjobs;j++)); do
>     ( createdb db$j; for k in {1..300}; do
>         echo "CREATE TABLE t (a INT); CREATE INDEX ON t (a); VACUUM t;
>               SELECT COUNT(*) >= 0 AS ok FROM pg_aios; " \
>         | psql -d db$j >/dev/null 2>&1;
>       done; dropdb db$j; ) &
>   done
>   wait
>   psql -c 'SELECT 1' || break;
> done
> 
> it fails for me as follows:
> iteration 20
> Tue Apr 15 07:21:29 PM EEST 2025
> dropdb: error: connection to server on socket "/tmp/.s.PGSQL.55432" failed: 
> No such file or directory
>        Is the server running locally and accepting connections on that socket?
> ...
> 2025-04-15 19:21:30.675 EEST [3111699] LOG:  client backend (PID 3320979) was 
> terminated by signal 11: Segmentation fault
> 2025-04-15 19:21:30.675 EEST [3111699] DETAIL:  Failed process was running: 
> SELECT COUNT(*) >= 0 AS ok FROM pg_aios;
> 2025-04-15 19:21:30.675 EEST [3111699] LOG:  terminating any other active 
> server processes


Thanks for that.  The bug turns out to be pretty stupid - pgaio_io_reclaim()
resets the fields in PgAioHandle *before* updating the generation/state. That
opens up a window in which pg_get_aios() thinks the copied PgAioHandle is
valid, even though it was taken while the fields were being reset.

Once I had figured that out, it was easy to make it more reproducible - put a
pg_usleep() between the fields being reset in pgaio_io_reclaim() and the
generation increase / state update.

The fix is simple: increment the generation and update the state before
resetting the fields.
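
To illustrate the ordering problem, here is a minimal standalone sketch. It is
a toy model, not the actual PostgreSQL code: ToyHandle, reclaim_buggy(),
reclaim_fixed() and the midpoint hook are all hypothetical names. The hook
stands in for a concurrent reader that copies the handle in the window between
the two steps (the same window the pg_usleep() widens), and the validity check
is only assumed to resemble what a reader like pg_get_aios() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for PgAioHandle. */
typedef enum { TOY_IDLE, TOY_COMPLETED } ToyState;

typedef struct ToyHandle
{
    uint64_t    generation; /* bumped each time the handle is recycled */
    ToyState    state;
    int         owner;      /* -1 once reset */
    const char *target;     /* NULL once reset */
} ToyHandle;

/* Called between the two reclaim steps; models a concurrent reader. */
typedef void (*MidpointHook) (const ToyHandle *live);

static void
reset_fields(ToyHandle *h)
{
    h->owner = -1;
    h->target = NULL;
}

static void
publish_recycled(ToyHandle *h)
{
    h->generation++;
    h->state = TOY_IDLE;
}

/* Buggy order: fields are reset while the handle still looks in use. */
static void
reclaim_buggy(ToyHandle *h, MidpointHook hook)
{
    reset_fields(h);
    hook(h);
    publish_recycled(h);
}

/* Fixed order: generation/state change first, then the fields are reset. */
static void
reclaim_fixed(ToyHandle *h, MidpointHook hook)
{
    publish_recycled(h);
    hook(h);
    reset_fields(h);
}

static uint64_t observed_generation;    /* generation the reader started with */
static bool     accepted_reset_copy;    /* reader trusted a copy with reset fields */

/* Reader: copy the handle, then decide whether the copy can be trusted. */
static void
reader_hook(const ToyHandle *live)
{
    ToyHandle   copy;
    bool        looks_valid;

    memcpy(&copy, live, sizeof(copy));
    looks_valid = (copy.generation == observed_generation &&
                   copy.state != TOY_IDLE);
    /* A real reader would now dereference fields like copy.target. */
    accepted_reset_copy = looks_valid && copy.target == NULL;
}

/* Returns true if the reader accepted a copy whose fields were already reset. */
static bool
reader_accepts_reset_copy(void (*reclaim) (ToyHandle *, MidpointHook))
{
    ToyHandle   h = {.generation = 7, .state = TOY_COMPLETED,
                     .owner = 42, .target = "some relation"};

    assert(reclaim != NULL);
    observed_generation = h.generation;
    accepted_reset_copy = false;
    reclaim(&h, reader_hook);
    return accepted_reset_copy;
}
```

In this model reader_accepts_reset_copy(reclaim_buggy) is true, because the
copy still carries the old generation and a non-idle state even though its
fields are gone, while reader_accepts_reset_copy(reclaim_fixed) is false,
because the generation has already moved on by the time the copy is made. The
sketch is single-threaded and ignores memory barriers, which real shared-memory
code would also need.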

Will push the fix for that soon.

Greetings,

Andres Freund
