Re: FileFallocate misbehaving on XFS

2025-01-03 Thread Michael Harris
Hi Andres On Wed, 1 Jan 2025 at 02:31, Andres Freund wrote: > Note that there's > a) a few hours between messages, whereas previously they were more frequent > b) f_bfree increased substantially. > > I assume that somewhere around 2AM some script prunes old partitions? Correct. Data is imported co

Re: FileFallocate misbehaving on XFS

2025-01-02 Thread Andres Freund
Hi, On 2024-12-20 11:39:42 -0500, Andres Freund wrote: > On 2024-12-19 17:47:13 +1100, Michael Harris wrote: > > This is a different system to those I previously provided logs from. > > It is also running RHEL8 with a similar configuration to the other > > system. > > Given it's a RHEL system, hav

Re: FileFallocate misbehaving on XFS

2025-01-02 Thread Andres Freund
Hi, On 2025-01-02 11:41:56 +0100, Andrea Gelmini wrote: > On Tue, 31 Dec 2024 at 16:31, Andres Freund > wrote: > > 2024-12-19 04:47:04 CET [2646363]: ERROR: could not extend file > > "pg_tblspc/107724/PG_16_202307071/465960/3232056651.2" by 11 blocks, from > > 29850 to 29861, u

Re: FileFallocate misbehaving on XFS

2025-01-02 Thread Andrea Gelmini
On Tue, 31 Dec 2024 at 16:31, Andres Freund wrote: 2024-12-19 04:47:04 CET [2646363]: ERROR: could not extend file > "pg_tblspc/107724/PG_16_202307071/465960/3232056651.2" by 11 blocks, from > 29850 to 29861, using FileFallocate(): No space left on device Dunno if it helps, bu

Re: FileFallocate misbehaving on XFS

2024-12-31 Thread Andres Freund
Hi, On 2024-12-19 17:47:13 +1100, Michael Harris wrote: > I have attached a file containing all the errors I collected. The > error is happening pretty regularly - over 400 times in a ~6 hour > period. The number of blocks being extended varies from ~9 to ~15, and > the statfs result shows plenty

Re: FileFallocate misbehaving on XFS

2024-12-27 Thread Bruce Momjian
On Fri, Dec 20, 2024 at 01:25:41PM +0100, Jakub Wartak wrote: > On Thu, Dec 19, 2024 at 7:49 AM Michael Harris wrote: > No one else has responded, so I'll try. My take is that we got a very limited > number of reports (2-3) of this stuff happening and it always seems to be >90% > space used, yet the

Re: FileFallocate misbehaving on XFS

2024-12-20 Thread Andres Freund
Hi, On 2024-12-19 17:47:13 +1100, Michael Harris wrote: > I finally managed to get the patched version installed in a production > database where the error is occurring very regularly. Thanks! > Here is a sample of the output: > > 2024-12-19 01:08:50 CET [2533222]: LOG: mdzeroextend FileFall

Re: FileFallocate misbehaving on XFS

2024-12-20 Thread Jakub Wartak
On Thu, Dec 19, 2024 at 7:49 AM Michael Harris wrote: > Hello, > > I finally managed to get the patched version installed in a production > database where the error is occurring very regularly. > > Here is a sample of the output: > > 2024-12-19 01:08:50 CET [2533222]: LOG: mdzeroextend FileFall

Re: FileFallocate misbehaving on XFS

2024-12-16 Thread Robert Haas
On Mon, Dec 16, 2024 at 12:52 PM Andres Freund wrote: > I don't see what we gain by requiring guesswork (what does allocating vs > zeroing mean, zeroing also allocates disk space after all) to interpret the > main error message. My experience is that it's often harder to get the DETAIL > than the

Re: FileFallocate misbehaving on XFS

2024-12-16 Thread Andres Freund
Hi, On 2024-12-16 18:05:59 +0100, Alvaro Herrera wrote: > On 2024-Dec-16, Robert Haas wrote: > > > On Mon, Dec 16, 2024 at 9:12 AM Andres Freund wrote: > > > Personally I don't like the obfuscation of "allocate" and "zero" vs just > > > naming the function names. But I guess that's just taste th

Re: FileFallocate misbehaving on XFS

2024-12-16 Thread Alvaro Herrera
On 2024-Dec-16, Robert Haas wrote: > On Mon, Dec 16, 2024 at 9:12 AM Andres Freund wrote: > > Personally I don't like the obfuscation of "allocate" and "zero" vs just > > naming the function names. But I guess that's just taste thing. > > > > When looking for problems it's considerably more work

Re: FileFallocate misbehaving on XFS

2024-12-16 Thread Andres Freund
Hi, On 2024-12-16 14:45:37 +0100, Jakub Wartak wrote: > On Thu, Dec 12, 2024 at 12:50 AM Andres Freund wrote: > An extended cycle of 80 backends copying into relations and occasionally > > truncating them (to simulate the partitions being dropped and new ones > > created). For this I ran a 4TB fi

Re: FileFallocate misbehaving on XFS

2024-12-16 Thread Robert Haas
On Mon, Dec 16, 2024 at 9:12 AM Andres Freund wrote: > Personally I don't like the obfuscation of "allocate" and "zero" vs just > naming the function names. But I guess that's just taste thing. > > When looking for problems it's considerably more work with bytes, because - at > least for me - the

Re: FileFallocate misbehaving on XFS

2024-12-16 Thread Andres Freund
Hi, On 2024-12-14 09:29:12 +0100, Alvaro Herrera wrote: > On 2024-Dec-11, Andres Freund wrote: > > > One thing that I think we should definitely do is to include more detail in > > the error message. mdzeroextend()'s error messages don't include how many > > blocks the relation was to be extended

Re: FileFallocate misbehaving on XFS

2024-12-16 Thread Jakub Wartak
On Thu, Dec 12, 2024 at 12:50 AM Andres Freund wrote: > Hi, > > FWIW, I tried fairly hard to reproduce this. > Same, but without PG and also without much success. I've also tried to push the AGs (with just one or two AGs created via mkfs) to contain only small size extents (by creating hundreds

Re: FileFallocate misbehaving on XFS

2024-12-16 Thread Robert Haas
On Sat, Dec 14, 2024 at 4:20 AM Thomas Munro wrote: > On Sat, Dec 14, 2024 at 9:29 PM Alvaro Herrera > wrote: > > On 2024-Dec-11, Andres Freund wrote: > > > One thing that I think we should definitely do is to include more detail > > > in > > > the error message. mdzeroextend()'s error messages

Re: FileFallocate misbehaving on XFS

2024-12-14 Thread Thomas Munro
On Sat, Dec 14, 2024 at 9:29 PM Alvaro Herrera wrote: > On 2024-Dec-11, Andres Freund wrote: > > One thing that I think we should definitely do is to include more detail in > > the error message. mdzeroextend()'s error messages don't include how many > > blocks the relation was to be extended by.

Re: FileFallocate misbehaving on XFS

2024-12-14 Thread Alvaro Herrera
On 2024-Dec-11, Andres Freund wrote: > One thing that I think we should definitely do is to include more detail in > the error message. mdzeroextend()'s error messages don't include how many > blocks the relation was to be extended by. Neither mdextend() nor > mdzeroextend() include the offset at

Re: FileFallocate misbehaving on XFS

2024-12-12 Thread Michael Harris
Hi Andres On Fri, 13 Dec 2024 at 08:38, Andres Freund wrote: > > Another interesting snippet: the application has a number of ETL > > workers going at once. The actual number varies depending on a number > > of factors but might be somewhere from 10 - 150. Each worker will have > > a single postg

Re: FileFallocate misbehaving on XFS

2024-12-12 Thread Andres Freund
Hi, On 2024-12-12 14:14:20 +1100, Michael Harris wrote: > On Thu, 12 Dec 2024 at 10:50, Andres Freund wrote: > > Just to make sure - you're absolutely certain that you actually have space > > at > > the time of the errors? > > As sure as I can be. The RHEL8 system that I took prints from > yest

Re: FileFallocate misbehaving on XFS

2024-12-11 Thread Michael Harris
Hi Andres On Thu, 12 Dec 2024 at 10:50, Andres Freund wrote: > Just to make sure - you're absolutely certain that you actually have space at > the time of the errors? As sure as I can be. The RHEL8 system that I took prints from yesterday has > 1.5TB free. I can't see it varying by that much. I

Re: FileFallocate misbehaving on XFS

2024-12-11 Thread Andres Freund
Hi, FWIW, I tried fairly hard to reproduce this. An extended cycle of 80 backends copying into relations and occasionally truncating them (to simulate the partitions being dropped and new ones created). For this I ran a 4TB filesystem very close to fully filled (peaking at 99.998 % full). I did

Re: FileFallocate misbehaving on XFS

2024-12-11 Thread Andres Freund
Hi, On 2024-12-10 16:33:06 -0500, Andres Freund wrote: > Maybe. I think we would have gotten a lot more reports if it were common. I > know of quite a few very busy installs using xfs. > > I think there must be some as-of-yet-unknown condition gating it. E.g. that > the filesystem has been created

Re: FileFallocate misbehaving on XFS

2024-12-11 Thread Jakub Wartak
On Wed, Dec 11, 2024 at 4:00 AM Michael Harris wrote: > Hi Jakub > > On Tue, 10 Dec 2024 at 22:36, Jakub Wartak > wrote: [..] > > > 3. Maybe somehow there is a bigger interaction between posix_fallocate() > and delayed XFS's dynamic speculative preallocation from many processes all > writing

Re: FileFallocate misbehaving on XFS

2024-12-10 Thread Michael Harris
Hi again On Wed, 11 Dec 2024 at 12:09, Michael Harris wrote: > But another system I can access has multiple databases with ongoing > imports, yet all the errors bar one relate to one directory. > I will collect some data from that system and post it shortly. I've attached the same set of data co

Re: FileFallocate misbehaving on XFS

2024-12-10 Thread Michael Harris
Hi Jakub On Tue, 10 Dec 2024 at 22:36, Jakub Wartak wrote: > Yay, reflink=0, that's pretty old fs ?! This particular filesystem was created on Centos 7, and retained when the system was upgraded to RL9. So yes probably pretty old! > Could you get us maybe those below commands too? (or from any

Re: FileFallocate misbehaving on XFS

2024-12-10 Thread Michael Harris
Hi Andres On Wed, 11 Dec 2024 at 03:09, Andres Freund wrote: > I think it's implied, but I just want to be sure: This was one of the affected > systems? Yes, correct. > Any chance to get df output? I'm mainly curious about the number of used > inodes. Sorry, I could swear I had included that a

Re: FileFallocate misbehaving on XFS

2024-12-10 Thread Andres Freund
Hi, On 2024-12-10 12:36:40 -0500, Robert Haas wrote: > On Mon, Dec 9, 2024 at 7:31 PM Andres Freund wrote: > > Pretty unexcited about all of these - XFS is fairly widely used for PG, but > > this problem doesn't seem very common. It seems to me that we're missing > > something that causes this to

Re: FileFallocate misbehaving on XFS

2024-12-10 Thread Robert Haas
On Mon, Dec 9, 2024 at 7:31 PM Andres Freund wrote: > Pretty unexcited about all of these - XFS is fairly widely used for PG, but > this problem doesn't seem very common. It seems to me that we're missing > something that causes this to only happen in a small subset of cases. I wonder if this is

Re: FileFallocate misbehaving on XFS

2024-12-10 Thread Andres Freund
Hi, On 2024-12-10 12:36:33 +0100, Jakub Wartak wrote: > On Tue, Dec 10, 2024 at 7:34 AM Michael Harris wrote: > 1. Well it doesn't look like XFS AG fragmentation to me (we had a customer > with a huge number of AGs with small space in them) reporting such errors > after upgrading to 16, but not

Re: FileFallocate misbehaving on XFS

2024-12-10 Thread Andres Freund
On 2024-12-10 11:34:15 -0500, Andres Freund wrote: > On 2024-12-10 12:36:33 +0100, Jakub Wartak wrote: > > On Tue, Dec 10, 2024 at 7:34 AM Michael Harris wrote: > > 2. > > > > > # xfs_info /dev/mapper/ippvg-ipplv > > > meta-data=/dev/mapper/ippvg-ipplv isize=512 agcount=4, > > agsize=26247142

Re: FileFallocate misbehaving on XFS

2024-12-10 Thread Andres Freund
Hi, On 2024-12-10 17:28:21 +1100, Michael Harris wrote: > On Tue, 10 Dec 2024 at 11:31, Andres Freund wrote: > > It'd be useful to get the xfs_info output that Jakub asked for. Perhaps also > > xfs_spaceman -c 'freesp -s' /mountpoint > > xfs_spaceman -c 'health' /mountpoint > > and df. > > I gath

Re: FileFallocate misbehaving on XFS

2024-12-10 Thread Jakub Wartak
On Tue, Dec 10, 2024 at 7:34 AM Michael Harris wrote: Hi Michael, 1. Well it doesn't look like XFS AG fragmentation to me (we had a customer with a huge number of AGs with small space in them) reporting such errors after upgrading to 16, but not for earlier versions (somehow posix_fallocate() h

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Michael Harris
Hi Andres Following up on the earlier question about OS upgrade paths - all the cases reported so far are either on RL8 (Kernel 4.18.0) or were upgraded to RL9 (kernel 5.14.0) and the affected filesystems were preserved. In fact the RL9 systems were initially built as Centos 7, and then when that

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Michael Harris
Hi again One extra piece of information: I had said that all the machines were Rocky Linux 8 or Rocky Linux 9, but actually a large number of them are RHEL8. Sorry for the confusion. Of course RL8 is a rebuild of RHEL8 so it is not surprising they would be behaving similarly. Cheers Mike On Tu

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Andres Freund
Hi, On 2024-12-10 10:00:43 +1100, Michael Harris wrote: > On Mon, 9 Dec 2024 at 21:06, Tomas Vondra wrote: > > Sounds more like an XFS bug/behavior, so it's not clear to me what we > > could do about it. I mean, if the filesystem reports bogus out-of-space, > > is there even something we can do?

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Andres Freund
Hi, On 2024-12-10 09:34:08 +1100, Michael Harris wrote: > On Tue, 10 Dec 2024 at 03:31, Andres Freund wrote: > > I found some references for bugs that were fixed in 5.13. But I think at > > least > > some of this would persist if the filesystem ran into the issue with a > > kernel > > before th

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Michael Harris
Hi Tomas On Mon, 9 Dec 2024 at 21:06, Tomas Vondra wrote: > Sounds more like an XFS bug/behavior, so it's not clear to me what we > could do about it. I mean, if the filesystem reports bogus out-of-space, > is there even something we can do? I don't disagree that it's most likely an XFS issue. H

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Michael Harris
Hi Andres On Tue, 10 Dec 2024 at 03:31, Andres Freund wrote: > Were those pg_upgrades done with pg_upgrade --clone? Or have been, on the same > filesystem, in the past? No, our procedure is to use --link. > I found some references for bugs that were fixed in 5.13. But I think at least > some of

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Andres Freund
Hi, On 2024-12-09 18:34:22 +1100, Michael Harris wrote: > Our application has recently migrated to PG16, and we have experienced > some failed upgrades. The upgrades are performed using pg_upgrade and > have failed during the phase where the schema is restored into the new > cluster, with the foll

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Andres Freund
Hi, On 2024-12-09 15:47:55 +0100, Tomas Vondra wrote: > On 12/9/24 11:27, Jakub Wartak wrote: > > On Mon, Dec 9, 2024 at 10:19 AM Michael Harris > > wrote: > > > > Hi Michael, > > > > We found this thread describing similar issues: > > > > https://www.postgresq

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Tomas Vondra
On 12/9/24 11:27, Jakub Wartak wrote: > On Mon, Dec 9, 2024 at 10:19 AM Michael Harris > wrote: > > Hi Michael, > > We found this thread describing similar issues: > > https://www.postgresql.org/message-id/flat/ > > AS1PR05MB91059AC8B525910A5FCD6E699F9A2

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Jakub Wartak
On Mon, Dec 9, 2024 at 10:19 AM Michael Harris wrote: Hi Michael, We found this thread describing similar issues: > > > https://www.postgresql.org/message-id/flat/AS1PR05MB91059AC8B525910A5FCD6E699F9A2%40AS1PR05MB9105.eurprd05.prod.outlook.com > We've got some case in the past here in EDB, wher

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Tomas Vondra
On 12/9/24 10:47, Andrea Gelmini wrote: > > > On Mon, 9 Dec 2024, 10:19 Michael Harris > wrote: > > > Is this already being looked into? > > > Funny, I guess it's the same reason I see random complaints from the WhatsApp > web interface, on Chrome, since I switch

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Tomas Vondra
On 12/9/24 08:34, Michael Harris wrote: > Hello PG Hackers > > Our application has recently migrated to PG16, and we have experienced > some failed upgrades. The upgrades are performed using pg_upgrade and > have failed during the phase where the schema is restored into the new > cluster, with

Re: FileFallocate misbehaving on XFS

2024-12-09 Thread Andrea Gelmini
On Mon, 9 Dec 2024, 10:19 Michael Harris wrote: > > Is this already being looked into? > Funny, I guess it's the same reason I see random complaints from the WhatsApp web interface, on Chrome, since I switched to XFS. It says something like "no more space on disk" and logs out, with more than 300GB

FileFallocate misbehaving on XFS

2024-12-09 Thread Michael Harris
Hello PG Hackers Our application has recently migrated to PG16, and we have experienced some failed upgrades. The upgrades are performed using pg_upgrade and have failed during the phase where the schema is restored into the new cluster, with the following error: pg_restore: error: could not exec
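
The symptom discussed throughout this thread (an extension via posix_fallocate() failing with "No space left on device" while statfs reports plenty of free space) can be probed outside PostgreSQL with a small standalone script. The following is only a minimal sketch under stated assumptions: the helper name try_extend, the 8192-byte block size and the temporary file path are hypothetical choices made for illustration, and this is not PostgreSQL's FileFallocate() code path.

```python
import errno
import os
import tempfile

# Assumption: 8192 bytes, PostgreSQL's default block size.
BLCKSZ = 8192


def try_extend(path, nblocks, blcksz=BLCKSZ):
    """Try to extend `path` by `nblocks` blocks using posix_fallocate().

    Hypothetical helper for illustration only.
    Returns (ok, free_bytes): ok is False only if the call failed with
    ENOSPC; free_bytes is what statvfs reported just before the attempt.
    """
    # Free space as seen by unprivileged users, sampled before allocating.
    st = os.statvfs(os.path.dirname(os.path.abspath(path)))
    free_bytes = st.f_bavail * st.f_frsize

    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Extend from the current end of file, like a relation extension.
        offset = os.fstat(fd).st_size
        try:
            os.posix_fallocate(fd, offset, nblocks * blcksz)
        except OSError as e:
            if e.errno == errno.ENOSPC:
                return False, free_bytes
            raise  # any other error is unexpected here
        return True, free_bytes
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ok, free = try_extend(os.path.join(d, "segment"), 11)
        print("extended:", ok, "free bytes before attempt:", free)
```

A result of ok == False together with a large free_bytes value would match the reports above. To be meaningful, the path would have to be on the affected XFS filesystem rather than in a temporary directory.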