On Mon, Dec 9, 2024 at 7:31 PM Andres Freund <and...@anarazel.de> wrote:
> Pretty unexcited about all of these - XFS is fairly widely used for PG, but
> this problem doesn't seem very common. It seems to me that we're missing
> something that causes this to only happen in a small subset of cases.

I wonder if this is actually pretty common on XFS. I mean, we've already hit this with at least one EDB customer, and Michael's report is, as far as I know, independent of that; and he points to a pgsql-general thread which, AFAIK, is also independent. We don't get three (or more?) independent reports of that many bugs, so I think it's not crazy to think that the problem is actually pretty common. It's probably workload dependent somehow, but for all we know today it seems like the workload could be as simple as "do enough file extension and you'll get into trouble eventually" or maybe "do enough file extension with some level of concurrency and you'll get into trouble eventually".

> I think the source of this needs to be debugged further before we try to apply
> workarounds in postgres.

Why? It seems to me that this has to be a filesystem bug, and we should almost certainly adopt one of these ideas from Michael Harris:

- Providing a way to configure PG not to use posix_fallocate at runtime

- In the case of posix_fallocate failing with ENOSPC, fall back to FileZero (worst case that will fail as well, in which case we will know that we really are out of space)

Maybe we need some more research to figure out which of those two things we should do -- I suspect the second one is better but if that fails then we might need to do the first one -- but I doubt that we can wait for XFS to fix whatever the issue is here. Our usage of posix_fallocate doesn't look to be anything more than plain vanilla, so as between these competing hypotheses:

(1) posix_fallocate is and always has been buggy and you can't rely on it, or

(2) we use posix_fallocate in a way that nobody else has and have hit an incredibly obscure bug as a result, which will be swiftly patched

...the first seems much more likely.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
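
For illustration only, the second idea (fall back to zero-filling when posix_fallocate reports ENOSPC) could look roughly like the standalone sketch below. This is not PostgreSQL code and does not use FileZero or any other fd.c routine; extend_file, zero_fill, alloc_fn, and always_enospc are hypothetical names invented for the example, and the 8 kB write chunk is an arbitrary assumption. The allocator is passed in as a function pointer only so that the spurious ENOSPC can be simulated with a stub.

```c
#define _XOPEN_SOURCE 700       /* for pwrite, posix_fallocate, mkstemp */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Same shape as posix_fallocate(): returns 0 or an error number. */
typedef int (*alloc_fn)(int fd, off_t offset, off_t len);

/* Hypothetical stub that simulates the spurious failure from the reports. */
static int always_enospc(int fd, off_t offset, off_t len)
{
    (void) fd; (void) offset; (void) len;
    return ENOSPC;
}

/* Write len zero bytes starting at offset; returns 0 or an error number. */
static int zero_fill(int fd, off_t offset, off_t len)
{
    static const char zeros[8192];  /* static storage, so zero-initialized */

    assert(len >= 0);
    while (len > 0)
    {
        size_t  chunk = len < (off_t) sizeof(zeros) ? (size_t) len : sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, offset);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;           /* interrupted, retry the same chunk */
            return errno;           /* a real ENOSPC would surface here */
        }
        if (n == 0)
            return ENOSPC;          /* no progress; avoid looping forever */
        offset += n;
        len -= n;
    }
    return 0;
}

/*
 * Try the allocator first; only if it reports ENOSPC, fall back to writing
 * zeros. If the fallback also fails, the disk is presumably really full.
 */
static int extend_file(int fd, off_t offset, off_t len, alloc_fn try_alloc)
{
    int rc = try_alloc(fd, offset, len);

    if (rc == ENOSPC)
        rc = zero_fill(fd, offset, len);
    return rc;
}
```

In real use one would pass posix_fallocate itself as try_alloc; passing always_enospc exercises the fallback path. Note that posix_fallocate returns the error number directly rather than setting errno, which is why the sketch compares the return value against ENOSPC.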