On Thu, Sep 4, 2025 at 6:50 AM Jeff Davis <pg...@j-davis.com> wrote:
> I like the idea of some kind of fallback for multiple reasons. I
> noticed that if I set io_workers=1, and then I SIGSTOP that worker,
> then sequential scans make no progress at all until I send SIGCONT. A
> fallback to synchronous sounds more robust, and more similar to what we
> do with walwriter and bgwriter. (That may be 19 material, though.)

This seems like a non-problem. Robustness against SIGSTOP of random
backends is not a project goal, or even a reasonable goal, is it? You can
SIGSTOP a backend doing IO in any historical release, possibly blocking
other backends too, depending on the locks it holds and so on.

That said, it is quite reasonable to ask why it doesn't just start a new
worker, and that's just code maturity:

* I have a patch basically ready to commit for v19 (CF #5913) that
would automatically add more workers if you did that. But even then
you could be unlucky and SIGSTOP a worker while it holds the
submission queue lock.

* I also had experimental versions that use a lock-free queue, but it
didn't seem necessary given how hard it is to measure meaningful lock
contention so far. I guess contention must be easier to measure on a
NUMA system, and one might wonder about per-NUMA-node queues, but that
also feels a bit questionable, because if you had a very high-end
system you'd probably be looking into better tuning, including
io_method=io_uring.
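
To make the "just start a new worker" idea concrete, here is a rough
sketch of one possible on-demand growth rule. It is purely illustrative:
it is not the logic of the patch mentioned above, and the function name,
parameters and the rule itself are all made up for this example.

```python
def should_launch_worker(queue_depth: int, idle_workers: int,
                         current_workers: int, max_workers: int) -> bool:
    """Hypothetical policy (an assumption, not PostgreSQL's): add a worker
    when IOs are queued, no worker is idle to pick them up, and the pool
    is still below its cap."""
    if current_workers >= max_workers:
        return False
    return queue_depth > 0 and idle_workers == 0


# One worker that has stopped consuming while IOs pile up: from the
# pool's point of view it is busy, not idle, so the pool would grow.
print(should_launch_worker(queue_depth=4, idle_workers=0,
                           current_workers=1, max_workers=8))
```

A rule like this only covers a worker that has stopped consuming from
the queue. It does nothing for the unlucky case where the stopped worker
holds the submission queue lock, since nobody could then inspect or
drain the queue at all.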