Hi Jakub

On Tue, 10 Dec 2024 at 22:36, Jakub Wartak <jakub.war...@enterprisedb.com> wrote:
> Yay, reflink=0, that's pretty old fs ?!

This particular filesystem was created on Centos 7, and retained when the
system was upgraded to RL9. So yes probably pretty old!

> Could you get us maybe those below commands too? (or from any other directory
> exhibiting such errors)
>
> stat pg_tblspc/16401/PG_16_202307071/17643/
> ls -1 pg_tblspc/16401/PG_16_202307071/17643/ | wc -l
> time ls -1 pg_tblspc/16401/PG_16_202307071/17643/ | wc -l # to assess timing
> of getdents() call as that may something about that directory indirectly

# stat pg_tblspc/16402/PG_16_202307071/49163/
  File: pg_tblspc/16402/PG_16_202307071/49163/
  Size: 5177344    Blocks: 14880    IO Block: 4096    directory
Device: fd02h/64770d    Inode: 4299946593    Links: 2
Access: (0700/drwx------)  Uid: ( 26/postgres)  Gid: ( 26/postgres)
Access: 2024-12-11 09:39:42.467802419 +0900
Modify: 2024-12-11 09:51:19.813948673 +0900
Change: 2024-12-11 09:51:19.813948673 +0900
 Birth: 2024-11-25 17:37:11.812374672 +0900

# time ls -1 pg_tblspc/16402/PG_16_202307071/49163/ | wc -l
179000

real    0m0.474s
user    0m0.439s
sys     0m0.038s

> 3. Maybe somehow there is a bigger interaction between posix_fallocate() and
> delayed XFS's dynamic speculative preallocation from many processes all
> writing into different partitions ? Maybe try "allocsize=1m" mount option for
> that /fs and see if that helps. I'm going to speculate about XFS speculative
> :) pre allocations, but if we have fdcache and are *not* closing fds, how XFS
> might know to abort its own speculation about streaming write ? (multiply
> that up to potentially the number of opened fds to get an avalanche of
> "preallocations").

I will try to organize that. They are production systems so it might take
some time.

> 4. You can also try compiling with patch from Alvaro from [2]
> "0001-Add-some-debugging-around-mdzeroextend.patch", so we might end up
> having more clarity in offsets involved. If not then you could use 'strace -e
> fallocate -p <pid>' to get the exact syscall.

I'll take a look at Alvaro's patch.
strace sounds good, but how to arrange to start it on the correct PG
backends? There will be a large-ish number of PG backends going at a time,
only some of which are performing imports, and they will be coming and going
every so often as the ETL application scales up and down with the load.

> 5. Another idea could be catching the kernel side stacktrace of fallocate()
> when it is hitting ENOSPC. E.g. with XFS fs and attached bpftrace eBPF tracer
> I could get the source of the problem in my artificial reproducer, e.g

OK, I will look into that also.

Cheers
Mike
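
On the question of starting strace on the correct backends, one possible approach is to pick the PIDs out of pg_stat_activity and attach to all of them in a single strace invocation. The following is only a rough sketch, not something tested on these systems. It assumes the importing connections can be told apart by application_name; the value 'etl_import', the output path, and the helper names parse_pids and strace_command are all hypothetical.

```python
import shlex
from typing import List

# Hypothetical filter: assumes the ETL application sets application_name to
# 'etl_import' on its importing connections. Adjust to whatever actually
# identifies those backends (user name, client address, query text, ...).
PID_QUERY = (
    "SELECT pid FROM pg_stat_activity "
    "WHERE backend_type = 'client backend' "
    "AND application_name = 'etl_import'"
)


def parse_pids(psql_output: str) -> List[int]:
    """Parse the output of `psql -At -c "<PID_QUERY>"`, one pid per line."""
    return [int(tok) for tok in psql_output.split() if tok.isdigit()]


def strace_command(pids: List[int], out_file: str = "/tmp/fallocate.trace") -> str:
    """Build one strace command line that attaches to every given backend.

    Only fallocate() calls are traced. When several processes are attached,
    strace prefixes each output line with the pid it came from.
    """
    if not pids:
        raise ValueError("no backend pids to attach to")
    args = ["strace", "-tt", "-e", "trace=fallocate", "-o", out_file]
    for pid in pids:
        args += ["-p", str(pid)]
    return " ".join(shlex.quote(a) for a in args)
```

For example, feeding the output of `psql -At -c "$PID_QUERY"` through parse_pids() and then strace_command() gives a command line to run as root or as the postgres user. Backends started after that point are not covered, so with the ETL application scaling up and down it would have to be re-run periodically.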