On Tue, Dec 10, 2024 at 7:34 AM Michael Harris <har...@gmail.com> wrote:
Hi Michael,

1. Well, it doesn't look like XFS AG fragmentation to me (we had a customer
with a huge number of AGs, each with only a little free space, reporting such
errors after upgrading to 16 but not on earlier versions; somehow
posix_fallocate() had to be the culprit). If you want to double-check that
angle, there is a freesp sketch after the links at the bottom of this mail.

2.
> # xfs_info /dev/mapper/ippvg-ipplv
> meta-data=/dev/mapper/ippvg-ipplv isize=512    agcount=4, agsize=262471424 blks
>          =                        sectsz=512   attr=2, projid32bit=1
>          =                        crc=1        finobt=0, sparse=0, rmapbt=0
>          =                        reflink=0    bigtime=0 inobtcount=0 nrext64=0

Yay, reflink=0 - that's a pretty old fs?!

> ERROR: could not extend file "pg_tblspc/16401/PG_16_202307071/17643/1249.1" with FileFallocate(): No space left on device

This indicates it was allocating 1GB for such a table (".1") on a tablespace
that was created more than a year ago. Could you maybe also get us the output
of the commands below (from this or any other directory exhibiting such
errors)?

stat pg_tblspc/16401/PG_16_202307071/17643/
ls -1 pg_tblspc/16401/PG_16_202307071/17643/ | wc -l
time ls -1 pg_tblspc/16401/PG_16_202307071/17643/ | wc -l # to assess the timing of the getdents() call, as that may indirectly tell us something about that directory

3. Maybe there is somehow a bigger interaction between posix_fallocate() and
XFS's delayed dynamic speculative preallocation when many processes are all
writing into different partitions? Maybe try the "allocsize=1m" mount option
for that fs and see if that helps. I'm going to speculate about XFS's
speculative :) preallocations here, but if we have an fd cache and are *not*
closing fds, how would XFS know to abort its own speculation about a streaming
write? (Multiply that by potentially the number of open fds to get an
avalanche of "preallocations".)

4. You can also try compiling with Alvaro's patch from [2],
"0001-Add-some-debugging-around-mdzeroextend.patch", so we might end up with
more clarity about the offsets involved. If not, then you could use
'strace -e fallocate -p <pid>' to get the exact syscall.

5. Another idea could be catching the kernel-side stack trace of fallocate()
when it hits ENOSPC. E.g. with an XFS fs and the attached bpftrace eBPF tracer
(a minimal sketch of such a tracer is also included at the very end of this
message) I could get to the source of the problem in my artificial
reproducer, e.g.:

# bpftrace ./track_enospc2.bt
# wait for "START" and then start reproducing in the second session (sess2),
# but try to minimize the time period, as eBPF might make things really slow

$ dd if=/dev/zero of=/fs/test1 bs=1M count=200
$ fallocate /fs/test -l 30000000
fallocate: fallocate failed: No space left on device
$ df -h /fs
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      236M  217M   20M  92% /fs

# CTRL+C in bpftrace will then print:
@errors[-28, kretprobe:xfs_file_fallocate,
    xfs_alloc_file_space+665
    xfs_alloc_file_space+665
    xfs_file_fallocate+869
    vfs_fallocate+319
    __x64_sys_fallocate+68
    do_syscall_64+130
    entry_SYSCALL_64_after_hwframe+118
]: 1

-28 = ENOSPC, and xfs_alloc_file_space() was the root-cause routine; the stack
shows the full logic behind it. The exact symbols/offsets might be different on
your side due to kernel variations. The tracer could be enhanced, and it might
print too much (so you need to look for that -28 in the output). If you get any
sensible output from it, you could also involve OS support (because if
posix_fallocate() fails while there is still space, then it's pretty odd
anyway).

-J.

[1] - https://www.postgresql.org/message-id/50a117b6.5030...@optionshouse.com
[2] - https://www.postgresql.org/message-id/202409110955.6njbwzm4ocus%40alvherre.pgsql
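
A hedged sketch for the AG-fragmentation check mentioned in point 1 - the
device name is simply taken from the xfs_info output above, and the freesp
output details vary a bit between xfsprogs versions:

# read-only xfs_db session: prints a histogram of free-space extent sizes,
# which shows whether the remaining free space is split into many tiny extents
# (results on a mounted fs can be slightly stale/inconsistent)
xfs_db -r -c 'freesp -s' /dev/mapper/ippvg-ipplv

If most of the free space sits in very small extents even though df still
shows room, that would point back at fragmentation rather than a genuinely
full filesystem.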
track_enospc2.bt
Description: Binary data
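
The attached track_enospc2.bt only comes through the archive as binary data,
so here is a minimal sketch of a tracer in the same spirit; it is not the
original attachment, just an assumption of what such a tracer might look like,
and it assumes a kernel where kretprobe:xfs_file_fallocate can be attached. It
keys the @errors map the same way as the output quoted in point 5:

#!/usr/bin/env bpftrace
// Minimal sketch, not the original track_enospc2.bt: count kernel stacks for
// xfs_file_fallocate() calls that return an error, keyed by errno, probe name
// and kernel stack, so ENOSPC shows up as a -28 key like in the output above.

BEGIN
{
	printf("START\n");
}

kretprobe:xfs_file_fallocate
/(int64)retval < 0/
{
	@errors[(int64)retval, probe, kstack] = count();
}

Run it as root, reproduce the failure, then CTRL+C so bpftrace dumps the map,
and look for the -28 entries.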