On Tue, 23 Feb 2021 at 05:04, Andres Freund <and...@anarazel.de> wrote:
>
> ## Callbacks
>
> In the core AIO pieces there are two different types of callbacks at the
> moment:
>
> Shared callbacks, which can be invoked by any backend (normally the issuing
> backend / the AIO workers, but can be other backends if they are waiting for
> the IO to complete). For operations on shared resources (e.g. shared buffer
> reads/writes, or WAL writes) these shared callbacks need to transition the
> state of the object the IO is being done for to completion. E.g. for a shared
> buffer read that means setting BM_VALID / unsetting BM_IO_IN_PROGRESS.
>
> The main reason these callbacks exist is that they make it safe for a backend
> to issue non-blocking IO on buffers (see the deadlock section above). As any
> blocked backend can cause the IO to complete, the deadlock danger is gone.
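
Before I get to my questions: my mental model of the shared completion
callback for a buffer read is roughly the below. To be clear, this is my
own sketch, not code from the patchset: the function name, the "success"
argument and WakeIOWaiters() are inventions of mine, and the flag handling
is modeled on what TerminateBufferIO() in bufmgr.c does today.

#include "postgres.h"
#include "storage/buf_internals.h"

static void
shared_buffer_read_complete(BufferDesc *buf, bool success)
{
	uint32		buf_state;

	buf_state = LockBufHdr(buf);
	Assert(buf_state & BM_IO_IN_PROGRESS);

	buf_state &= ~BM_IO_IN_PROGRESS;
	if (success)
		buf_state |= BM_VALID;		/* page is now usable by everyone */
	else
		buf_state |= BM_IO_ERROR;	/* stays !BM_VALID so it can be retried */
	UnlockBufHdr(buf, buf_state);

	/* hypothetical helper: wake any backend blocked waiting on this IO */
	WakeIOWaiters(buf);
}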
So firstly, this is all just awesome work, and I have questions, but I
don't want them to come across in any way as criticism or as a demand for
more work. This is really great stuff, thank you so much!

The callbacks make me curious about two things:

1) Is there a chance that a backend issues I/O, the I/O completes in some
other backend, and by the time this backend gets around to looking at the
buffer it's already been overwritten again? Do we have to initiate I/O
again, or have you found a way to arrange that this backend keeps the
buffer pinned from the time the I/O starts, even though it doesn't handle
the completion? (I've sketched the pattern I'm hoping for at the end of
this mail.)

2) Have you made (or considered making) things like sequential scans (or,
more likely, bitmap index scans) asynchronous at a higher level? That is,
issue a bunch of asynchronous I/Os and then handle the pages and return
the tuples as the pages arrive. Since sequential scans and bitmap scans
don't guarantee to read the pages in order, they're generally free to
return tuples from any page in any order. I'm not sure how much of a win
that would actually be, since all the same I/O would still get executed
and the savings in shared buffers would be small. But if the pages are
mostly hot you could imagine interleaving work on the many in-memory
pages with the few I/Os instead of sitting idle waiting for the async I/O
to return. (Also sketched below.)

> ## Stats
>
> There are two new views: pg_stat_aios showing AIOs that are currently
> in-progress, pg_stat_aio_backends showing per-backend statistics about AIO.

This is impressive. How easy is it to correlate with system AIO stats?
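
To make question 1 concrete, the usage pattern I'm hoping is possible
looks something like this. Everything here apart from the bufmgr calls is
hypothetical; GetBufferForAsyncRead(), StartBufferReadAio(), WaitForAio()
and process_page() are names I invented purely to show where the pin
lives relative to the I/O:

static void
demo_async_read(Relation rel, BlockNumber blkno)
{
	Buffer		buf;
	PgAioHandle *aio;	/* hypothetical handle type */

	/* hypothetical: allocate and PIN the buffer, but issue no I/O yet */
	buf = GetBufferForAsyncRead(rel, blkno);

	/* hypothetical: issue the read; the completion may run in any backend */
	aio = StartBufferReadAio(buf);

	/* ... go do other useful work ... */

	/* hypothetical: returns once the shared callback has set BM_VALID */
	WaitForAio(aio);

	/*
	 * Since we've held the pin since before the read was issued, the
	 * buffer can't have been evicted and reused under us, even though
	 * another backend may have run the completion.  Is that the guarantee?
	 */
	process_page(BufferGetPage(buf));

	ReleaseBuffer(buf);		/* pin dropped only here */
}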
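
And for question 2, roughly the shape I was imagining for a bitmap heap
scan: issue the whole batch up front, then consume pages in completion
order rather than bitmap order. All of the AioBatch* names and the
helpers are likewise invented; this is just to illustrate the
interleaving of hot pages with the outstanding I/Os:

static void
bitmap_scan_async(Relation rel, BlockNumber *blocks, int nblocks)
{
	AioBatch   *batch = AioBatchBegin();	/* hypothetical */

	for (int i = 0; i < nblocks; i++)
	{
		Buffer		buf = GetBufferForAsyncRead(rel, blocks[i]);

		if (page_already_valid(buf))
		{
			emit_tuples_from(buf);		/* hot page: no I/O needed */
			ReleaseBuffer(buf);
		}
		else
			AioBatchAddRead(batch, buf);	/* cold page: queue the read */
	}

	/* Return tuples in whatever order the reads complete. */
	while (!AioBatchDone(batch))
	{
		Buffer		buf = AioBatchNextCompleted(batch);	/* blocks only if
														 * nothing is ready */

		emit_tuples_from(buf);
		ReleaseBuffer(buf);
	}
}

--
greg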