On Tue, Dec 10, 2024 at 7:34 AM Michael Harris <har...@gmail.com> wrote:
Hi Michael,

1. Well, it doesn't look like XFS AG fragmentation to me (we had a customer
with a huge number of AGs, each with only a little free space, reporting such
errors after upgrading to 16 but not on earlier versions; somehow
posix_fallocate() had to be the culprit). If you want to double-check that
angle, there is a freesp sketch after the links at the bottom of this mail.

2.
> # xfs_info /dev/mapper/ippvg-ipplv
> meta-data=/dev/mapper/ippvg-ipplv isize=512    agcount=4, agsize=262471424 blks
>          =                        sectsz=512   attr=2, projid32bit=1
>          =                        crc=1        finobt=0, sparse=0, rmapbt=0
>          =                        reflink=0    bigtime=0 inobtcount=0 nrext64=0

Yay, reflink=0 - that's a pretty old fs?!

> ERROR: could not extend file "pg_tblspc/16401/PG_16_202307071/17643/1249.1" with FileFallocate(): No space left on device

This indicates it was allocating 1GB for such a table (".1") on a tablespace
that was created more than a year ago. Could you maybe also get us the output
of the commands below (from this or any other directory exhibiting such
errors)?

stat pg_tblspc/16401/PG_16_202307071/17643/
ls -1 pg_tblspc/16401/PG_16_202307071/17643/ | wc -l
time ls -1 pg_tblspc/16401/PG_16_202307071/17643/ | wc -l # to assess the timing of the getdents() call, as that may indirectly tell us something about that directory

3. Maybe there is somehow a bigger interaction between posix_fallocate() and
XFS's delayed dynamic speculative preallocation when many processes are all
writing into different partitions? Maybe try the "allocsize=1m" mount option
for that fs and see if that helps. I'm going to speculate about XFS's
speculative :) preallocations here, but if we have an fd cache and are *not*
closing fds, how would XFS know to abort its own speculation about a streaming
write? (Multiply that by potentially the number of open fds to get an
avalanche of "preallocations".)

4. You can also try compiling with Alvaro's patch from [2],
"0001-Add-some-debugging-around-mdzeroextend.patch", so we might end up with
more clarity about the offsets involved. If not, then you could use
'strace -e fallocate -p <pid>' to get the exact syscall.

5. Another idea could be catching the kernel-side stack trace of fallocate()
when it hits ENOSPC. E.g. with an XFS fs and the attached bpftrace eBPF tracer
(a minimal sketch of such a tracer is also included at the very end of this
message) I could get to the source of the problem in my artificial
reproducer, e.g.:

# bpftrace ./track_enospc2.bt
# wait for "START" and then start reproducing in the second session (sess2),
# but try to minimize the time period, as eBPF might make things really slow

$ dd if=/dev/zero of=/fs/test1 bs=1M count=200
$ fallocate /fs/test -l 30000000
fallocate: fallocate failed: No space left on device
$ df -h /fs
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      236M  217M   20M  92% /fs

# CTRL+C in bpftrace will then print:
@errors[-28, kretprobe:xfs_file_fallocate,
    xfs_alloc_file_space+665
    xfs_alloc_file_space+665
    xfs_file_fallocate+869
    vfs_fallocate+319
    __x64_sys_fallocate+68
    do_syscall_64+130
    entry_SYSCALL_64_after_hwframe+118
]: 1

-28 = ENOSPC, and xfs_alloc_file_space() was the root-cause routine; the stack
shows the full logic behind it. The exact symbols/offsets might be different on
your side due to kernel variations. The tracer could be enhanced, and it might
print too much (so you need to look for that -28 in the output). If you get any
sensible output from it, you could also involve OS support (because if
posix_fallocate() fails while there is still space, then it's pretty odd
anyway).

-J.

[1] - https://www.postgresql.org/message-id/50a117b6.5030...@optionshouse.com
[2] - https://www.postgresql.org/message-id/202409110955.6njbwzm4ocus%40alvherre.pgsql
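
A hedged sketch for the AG-fragmentation check mentioned in point 1 - the
device name is simply taken from the xfs_info output above, and the freesp
output details vary a bit between xfsprogs versions:

# read-only xfs_db session: prints a histogram of free-space extent sizes,
# which shows whether the remaining free space is split into many tiny extents
# (results on a mounted fs can be slightly stale/inconsistent)
xfs_db -r -c 'freesp -s' /dev/mapper/ippvg-ipplv

If most of the free space sits in very small extents even though df still
shows room, that would point back at fragmentation rather than a genuinely
full filesystem.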
track_enospc2.bt
Description: Binary data
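
The attached track_enospc2.bt only comes through the archive as binary data,
so here is a minimal sketch of a tracer in the same spirit; it is not the
original attachment, just an assumption of what such a tracer might look like,
and it assumes a kernel where kretprobe:xfs_file_fallocate can be attached. It
keys the @errors map the same way as the output quoted in point 5:

#!/usr/bin/env bpftrace
// Minimal sketch, not the original track_enospc2.bt: count kernel stacks for
// xfs_file_fallocate() calls that return an error, keyed by errno, probe name
// and kernel stack, so ENOSPC shows up as a -28 key like in the output above.

BEGIN
{
	printf("START\n");
}

kretprobe:xfs_file_fallocate
/(int64)retval < 0/
{
	@errors[(int64)retval, probe, kstack] = count();
}

Run it as root, reproduce the failure, then CTRL+C so bpftrace dumps the map,
and look for the -28 entries.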