On 12/06/2025 4:57 pm, Andres Freund wrote:
The problem appears to be in that switch between "when submitted, by the IO
worker" and "then again by the backend".  It's not concurrent access in the
sense of two processes writing to the same value; it's that, when switching
from the worker updating ->distilled_result to the issuer looking at it, the
issuer didn't ensure that no outdated version of ->distilled_result could be
used.

Basically, the problem is that the worker would

1) set ->distilled_result
2) perform a write memory barrier
3) set ->state to COMPLETED_SHARED

and then the issuer of the IO would:

4) check ->state is COMPLETED_SHARED
5) use ->distilled_result

The problem is that there currently is no barrier between 4 & 5, which means
an outdated ->distilled_result could be used.


This also explains why the issue looked so weird - eventually, after fprintfs,
after a core dump, etc, the updated ->distilled_result result would "arrive"
in the issuing process, and suddenly look correct.

Thank you very much for the explanation.
Everything seems so simple once explained that it's hard to believe that, beforehand, such behavior looked like it could only be caused by "black magic" or an "OS bug" :)

Using an outdated result can certainly explain such behavior.
But in which particular place do we lose the read barrier between 4 and 5?
I see `pgaio_io_wait`, which I expect is called by the backend to wait for completion of the IO. It calls `pgaio_io_was_recycled` to get the state, which in turn enforces a read barrier:
```
bool
pgaio_io_was_recycled(PgAioHandle *ioh, uint64 ref_generation, PgAioHandleState *state)
{
    *state = ioh->state;
    pg_read_barrier();

    return ioh->generation != ref_generation;
}
```




