On 5/31/25 16:00, Thomas Munro wrote:
> On Fri, May 30, 2025 at 3:58 AM Dimitrios Apostolou <ji...@gmx.net> wrote:
>> All I'm saying is that this is a regression for PostgreSQL users that keep
>> tablespaces on compressed Btrfs. What could be done from postgres, is to
>> provide a runtime setting for avoiding fallocate(), going instead through
>> the old code path. Ideally this would be an option per tablespace, but even
>> a global one is better than nothing.
> 
> Here's an initial sketch of such a setting.  Better name, design,
> words welcome.  Would need a bit more work to cover temp tables too.
> It's slightly tricky to get smgr to behave differently because of the
> contents of a system catalogue!  I couldn't think of a better way than
> exposing it as a flag that the buffer manager layer has to know about
> and compute earlier, but that also seems a bit strange, as fallocate
> is a highly md.c specific concern.  Hmm.
> 

I find the definition of io_min_fallocate confusing, or rather that 0
means "never" instead of "always". It's described as a "threshold at
which to start using fallocate", so I'd expect 0 to mean "always"
because (len >= 0).

I suggest using "-1" to mean never and "0" to mean always, as for other
similar settings (e.g. log_min_duration_statement or
log_autovacuum_min_duration).
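
To make the semantics I have in mind concrete, here is a minimal sketch.
The helper name use_fallocate() and its signature are made up for
illustration and are not taken from the patch; I'm also assuming the
setting and the extension length are expressed in the same unit.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): decide whether an extension
 * of "len" units should go through fallocate().
 *
 *   io_min_fallocate = -1  -> never use fallocate()
 *   io_min_fallocate =  0  -> always use it, since (len >= 0) always holds
 *   io_min_fallocate =  N  -> use it once len reaches N
 */
static bool
use_fallocate(int io_min_fallocate, size_t len)
{
    if (io_min_fallocate < 0)
        return false;

    return len >= (size_t) io_min_fallocate;
}
```

With this reading the "threshold" wording in the docs holds for every
non-negative value, and only the negative value is a special case.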

> I suppose something like the 0001 part could be back-patched if this
> is considered a serious enough problem without other workarounds, so I
> did this in two steps.  I wonder if there are good reasons to want to
> change the number on other file systems.  I suppose it at least allows
> experimentation.

Maybe. It'd need to get some of the 0002 bits too, ofc.

I'm not sure we really want all these special GUCs tailored for different
filesystems. We already have a few such GUCs, and it's getting tricky to
know which ones to set / not set, especially as that also changes with
the filesystem version ... I personally don't know which ones to set; a
lot of the knowledge is somewhat outdated, I think.

Wouldn't it be better for btrfs to just start returning EOPNOTSUPP
(maybe with a mount option), in which case we already do the right thing
automatically? Sure, it means the admin needs to be aware of this in
both cases.
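
For anyone not familiar with that fallback, the general pattern looks
roughly like the sketch below. This is not the actual md.c / fd.c code;
extend_file() and zero_fill() are names invented for this example, and
treating EINVAL the same way as EOPNOTSUPP is an assumption on my part.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: extend the file by writing zeroes explicitly. */
static int
zero_fill(int fd, off_t offset, off_t len)
{
    char        zeros[8192];

    memset(zeros, 0, sizeof(zeros));
    while (len > 0)
    {
        size_t      chunk = len < (off_t) sizeof(zeros) ? (size_t) len : sizeof(zeros);
        ssize_t     written = pwrite(fd, zeros, chunk, offset);

        if (written < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return errno;
        }
        if (written == 0)
            return EIO;         /* no progress, give up */
        offset += written;
        len -= written;
    }
    return 0;
}

/*
 * Hypothetical helper: try posix_fallocate() first and fall back to
 * writing zeroes if the filesystem reports it is not supported.
 * Returns 0 on success or an errno value on failure.
 */
static int
extend_file(int fd, off_t offset, off_t len)
{
    /* posix_fallocate() returns the error number, it does not set errno */
    int         rc = posix_fallocate(fd, offset, len);

    if (rc == EOPNOTSUPP || rc == EINVAL)
        return zero_fill(fd, offset, len);

    return rc;
}
```

The point is that the caller needs no setting at all here: the
filesystem's answer alone selects the code path.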


regards

-- 
Tomas Vondra


