On 1/3/25 09:47, Greg Sabino Mullane wrote:
On Fri, Jan 3, 2025 at 8:33 AM Robert Haas <robertmh...@gmail.com> wrote:

    We tried to make our code as robust as it could be in the face of
    kernel code that behaved in a manner that was fairly ridiculous
    relative to our needs. This case doesn't seem that different to me.


+1. Seems a shame that FreeBSD chooses such "optimizations", but making our code do various workarounds and jump through hoops to support various OS quirks (hello Win32 fans!) seems a burden we agreed to take on a long time ago.

FWIW, we observed this issue in pgBackRest a few years ago -- as you can imagine we do a lot of scanning so readdir gets a real workout.

We had one issue reported [1] involving Alpine Linux and CIFS and another [2] with SLES and NFS. We also had at least one internal report that involved RHEL and a proprietary storage appliance. I'm not certain if we received any reports for FreeBSD but it kind of rings a bell.

Over time, as the reports accumulated, it seemed that any storage was potentially a problem. I resisted the notion that we would have to work around something that seemed to be an obvious kernel bug, but in the end I capitulated.

We fixed this by making a snapshot of each directory before performing any operations on that directory (as has been suggested upthread). One advantage we have is that our storage layer is very centralized, since we deal with a number of storage types, so there are no readdirs in the general code base. It was still a pretty major patch [3], but a lot of it was removing the callbacks that we had used previously and adding optimizations to reduce memory consumption.
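
To make the snapshot idea concrete, here is a minimal sketch in Python. It is not the pgBackRest implementation (that is C, see [3]); the function names snapshot_dir and remove_matching are made up for illustration. The only point is that the directory stream is read to the end and closed before anything in the directory is modified.

```python
import os


def snapshot_dir(path):
    """Read the complete listing of `path` and return it as a sorted list.

    The directory stream is fully consumed and closed before this returns,
    so later changes to the directory cannot disturb the iteration.
    """
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)


def remove_matching(path, predicate):
    """Delete every file in `path` whose name satisfies `predicate`.

    Hypothetical helper: the loop runs over the snapshot, never over a
    live directory stream, so unlinking files cannot cause entries to be
    skipped or repeated.
    """
    removed = []
    for name in snapshot_dir(path):
        if predicate(name):
            os.remove(os.path.join(path, name))
            removed.append(name)
    return removed
```

The cost is holding the full list of names in memory for each directory, which is presumably why memory optimizations were part of the patch.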

One more thing to note -- we are still assuming that Postgres is running on storage that is not subject to this issue. Even with our new methodology, if Postgres is deleting files while we are trying to build a backup manifest, that could cause problems for us (and for base backup). The only solution I came up with for that problem was to keep reading the directory until we get two snapshots that match -- not very attractive but probably workable for pgBackRest. I doubt the same could be said for Postgres.
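
A rough sketch of the "read until two snapshots match" approach, again in Python and purely illustrative. The name stable_snapshot and the max_attempts cap are assumptions of mine, not something pgBackRest does today. It compares names only; two matching reads make a skipped entry less likely, but they do not stop the directory from changing right after the function returns.

```python
import os


def stable_snapshot(path, max_attempts=10):
    """Re-read `path` until two consecutive listings are identical.

    `max_attempts` is an arbitrary cap (an assumption for this sketch) so
    that a directory under constant churn cannot keep us looping forever.
    """
    previous = sorted(os.listdir(path))
    for _ in range(max_attempts):
        current = sorted(os.listdir(path))
        if current == previous:
            return current
        previous = current
    raise RuntimeError(
        f"{path!r} kept changing across {max_attempts} re-reads"
    )
```

Every directory is read at least twice, so the scan cost at least doubles, which is a large part of why this is unattractive.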

Regards,
-David

---

[1] https://github.com/pgbackrest/pgbackrest/issues/1754
[2] https://github.com/pgbackrest/pgbackrest/issues/1423
[3] https://github.com/pgbackrest/pgbackrest/commit/75623d45


