On 5/31/25 16:00, Thomas Munro wrote:
> On Fri, May 30, 2025 at 3:58 AM Dimitrios Apostolou <ji...@gmx.net> wrote:
>> All I'm saying is that this is a regression for PostgreSQL users that keep
>> tablespaces on compressed Btrfs. What could be done from postgres, is to
>> provide a runtime setting for avoiding fallocate(), going instead through
>> the old code path. Ideally this would be an option per tablespace, but even
>> a global one is better than nothing.
>
> Here's an initial sketch of such a setting. Better name, design,
> words welcome. Would need a bit more work to cover temp tables too.
> It's slightly tricky to get smgr to behave differently because of the
> contents of a system catalogue! I couldn't think of a better way than
> exposing it as a flag that the buffer manager layer has to know about
> and compute earlier, but that also seems a bit strange, as fallocate
> is a highly md.c specific concern. Hmm.
>
I find the definition of io_min_fallocate confusing, or rather that 0
means "never" instead of "always". It's described as a "threshold at
which to start using fallocate", so I'd expect 0 to mean "always",
because (len >= 0) is always true. I suggest using "-1" to mean never
and "0" to mean always, as for other similar settings (e.g.
log_min_duration_statement or log_lock_waits).

> I suppose something like the 0001 part could be back-patched if this
> is considered a serious enough problem without other workarounds, so I
> did this in two steps. I wonder if there are good reasons to want to
> change the number on other file systems. I suppose it at least allows
> experimentation.

Maybe. It'd need to get some of the 0002 bits too, ofc.

I'm not sure we really want all these special GUCs tailored for
different filesystems. We already have a few such GUCs, it's getting
tricky to know which ones to set / not set, and it also changes with the
filesystem version ... I personally don't know which ones to set, a lot
of the knowledge is somewhat outdated I think.

Wouldn't it be better for btrfs to just start returning EOPNOTSUPP
(maybe with a mount option), in which case we already do the right
thing automatically? Sure, it means the admin needs to be aware of this
in both cases.

regards

--
Tomas Vondra
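
For illustration only, the threshold semantics suggested above (-1 = never, 0 = always, N = only for extensions of at least N blocks) could look roughly like the sketch below. The helper name use_fallocate() and its parameters are hypothetical and not taken from the posted patch; only the GUC name io_min_fallocate comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, NOT from the posted patch: decide whether an
 * extension by nblocks should go through fallocate(), assuming the
 * suggested io_min_fallocate semantics:
 *   -1  never use fallocate()
 *    0  always use fallocate()
 *    N  use fallocate() only when nblocks >= N
 */
bool
use_fallocate(int io_min_fallocate, int nblocks)
{
	/* an extension always adds at least one block */
	assert(nblocks > 0);

	/* negative values disable fallocate() entirely */
	if (io_min_fallocate < 0)
		return false;

	/* with a threshold of 0 this comparison is always true */
	return nblocks >= io_min_fallocate;
}
```

With this shape, 0 needs no special case: it is simply the smallest threshold, which every extension satisfies.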