Re: Handing off SLRU fsyncs to the checkpointer

2021-01-04 Thread Thomas Munro
On Mon, Jan 4, 2021 at 3:35 AM Tomas Vondra wrote: > Seems this commit left behind a couple unnecessary prototypes in a bunch > of header files. In particular, it removed these functions > > - ShutdownCLOG(); > - ShutdownCommitTs(); > - ShutdownSUBTRANS(); > - ShutdownMultiXact(); Thanks. Fixed.

Re: Handing off SLRU fsyncs to the checkpointer

2021-01-03 Thread Tomas Vondra
On 9/25/20 9:09 AM, Thomas Munro wrote: On Fri, Sep 25, 2020 at 12:53 PM Thomas Munro wrote: Here's a new version. The final thing I'm contemplating before pushing this is whether there may be hidden magical dependencies in the order of operations in CheckPointGuts(), which I've changed around

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-25 Thread Thomas Munro
On Fri, Sep 25, 2020 at 12:53 PM Thomas Munro wrote: > Here's a new version. The final thing I'm contemplating before > pushing this is whether there may be hidden magical dependencies in > the order of operations in CheckPointGuts(), which I've changed > around. Andres, any comments? I nagged

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-24 Thread Thomas Munro
On Fri, Sep 25, 2020 at 12:05 PM Tom Lane wrote: > Thomas Munro writes: > > Tom, do you have any thoughts on ShutdownCLOG() etc? > > Hm, if we cannot reach that without first completing a shutdown checkpoint, > it does seem a little pointless. Thanks for the sanity check. > It'd likely be a goo

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-24 Thread Tom Lane
Thomas Munro writes: > Tom, do you have any thoughts on ShutdownCLOG() etc? Hm, if we cannot reach that without first completing a shutdown checkpoint, it does seem a little pointless. It'd likely be a good idea to add a comment to CheckPointCLOG et al explaining that we expect $what-exactly to

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-24 Thread Thomas Munro
On Wed, Sep 23, 2020 at 1:56 PM Thomas Munro wrote: > As for the ShutdownXXX() functions, I haven't yet come up with any > reason for this code to exist. Emboldened by a colleague's inability > to explain to me what that code is doing for us, here is a new version > that just rips it all out. Re

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-22 Thread Thomas Munro
On Tue, Sep 22, 2020 at 9:08 AM Thomas Munro wrote: > On Mon, Sep 21, 2020 at 2:19 PM Thomas Munro wrote: > > While scanning for comments and identifier names that needed updating, > > I realised that this patch changed the behaviour of the ShutdownXXX() > > functions, since they currently flush

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-21 Thread Thomas Munro
On Mon, Sep 21, 2020 at 2:19 PM Thomas Munro wrote: > While scanning for comments and identifier names that needed updating, > I realised that this patch changed the behaviour of the ShutdownXXX() > functions, since they currently flush the SLRUs but are not followed > by a checkpoint. I'm not en

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-20 Thread Thomas Munro
On Sun, Sep 20, 2020 at 12:40 PM Thomas Munro wrote: > On Sat, Sep 19, 2020 at 5:06 PM Thomas Munro wrote: > > In the meantime, from the low-hanging-fruit department, here's a new > > version of the SLRU-fsync-offload patch. The only changes are a > > tweaked commit message, and adoption of C99

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-19 Thread Thomas Munro
On Sat, Sep 19, 2020 at 5:06 PM Thomas Munro wrote: > In the meantime, from the low-hanging-fruit department, here's a new > version of the SLRU-fsync-offload patch. The only changes are a > tweaked commit message, and adoption of C99 designated initialisers > for the function table, so { [SYNC_H
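For readers unfamiliar with the C99 feature mentioned above, here is a minimal sketch of a function table built with designated initialisers. Every name in it (DemoHandler, DemoSyncOps, demo_ops, demo_sync and the demo_sync_a/b callbacks) is hypothetical and chosen for illustration only; none of them are the identifiers used in the patch, whose table entry is cut off in the snippet above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler IDs; NOT the identifiers from the patch. */
typedef enum DemoHandler
{
    DEMO_HANDLER_A = 0,
    DEMO_HANDLER_B,
    DEMO_HANDLER_COUNT          /* number of handlers, not a handler itself */
} DemoHandler;

/* One row of the (hypothetical) function table. */
typedef struct DemoSyncOps
{
    int         (*sync_file) (int fd);  /* returns 0 on success, -1 on error */
} DemoSyncOps;

/* Stand-in callbacks; a real handler would open and fsync a file. */
static int
demo_sync_a(int fd)
{
    (void) fd;
    return 0;
}

static int
demo_sync_b(int fd)
{
    return fd >= 0 ? 0 : -1;
}

/*
 * C99 designated initialisers: each row is keyed by its enum value and each
 * member by its name, so reordering the enum or the struct members cannot
 * silently misalign the table.
 */
static const DemoSyncOps demo_ops[] = {
    [DEMO_HANDLER_A] = {.sync_file = demo_sync_a},
    [DEMO_HANDLER_B] = {.sync_file = demo_sync_b},
};

/* Dispatch a request through the table. */
static int
demo_sync(DemoHandler handler, int fd)
{
    return demo_ops[handler].sync_file(fd);
}
```

Calling demo_sync(DEMO_HANDLER_B, fd) dispatches to demo_sync_b. A positional initialiser list would compile just the same, but it would depend on the rows appearing in enum order.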

Re: Handing off SLRU fsyncs to the checkpointer

2020-09-18 Thread Thomas Munro
On Mon, Aug 31, 2020 at 8:50 PM Jakub Wartak wrote: > - IO_URING - gives a lot of promise here I think, is it even planned to be > shown for PgSQL14 cycle ? Or it's more like PgSQL15? I can't answer that, but I've played around with the prototype quite a bit, and thought quite a lot about how to

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-31 Thread Jakub Wartak
Hi Thomas, hackers, >> ... %CPU ... COMMAND >> ... 97.4 ... postgres: startup recovering 00010089 > So, what else is pushing this thing off CPU, anyway? For one thing, I > guess it might be stalling while reading the WAL itself, because (1) > we only read it 8KB at a time, relying

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-28 Thread Thomas Munro
On Sat, Aug 29, 2020 at 12:43 AM Jakub Wartak wrote: > ... %CPU ... COMMAND > ... 97.4 ... postgres: startup recovering 00010089 So, what else is pushing this thing off CPU, anyway? For one thing, I guess it might be stalling while reading the WAL itself, because (1) we only read

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-28 Thread Thomas Munro
On Sat, Aug 29, 2020 at 12:43 AM Jakub Wartak wrote: > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > postgres 120935 0.9 0.0 866052 3824 ? Ss 09:47 0:00 postgres: > checkpointer > postgres 120936 61.9 0.0 865796 3824 ? Rs 09:47 0:22 postgre

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-28 Thread Jakub Wartak
Hi Thomas, hackers, >> > To move these writes out of recovery's way, we should probably just >> > run the bgwriter process during crash recovery. I'm going to look >> > into that. >> >> Sounds awesome. > >I wrote a quick and dirty experimental patch to try that. I can't see >any benefit from it

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-27 Thread Thomas Munro
On Thu, Aug 27, 2020 at 8:48 PM Jakub Wartak wrote: > >> 29.62% postgres [kernel.kallsyms] [k] > >> copy_user_enhanced_fast_string > >> ---copy_user_enhanced_fast_string > >>|--17.98%--copyin > >> [..] > >>| __pwrite_nocancel > >>

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-27 Thread Thomas Munro
On Thu, Aug 27, 2020 at 8:48 PM Jakub Wartak wrote: > I've tried to get cache misses ratio via PMCs, apparently on EC2 they are > (even on bigger) reporting as not-supported or zeros. I heard some of the counters are only allowed on their dedicated instance types. > However interestingly the wo

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-27 Thread Jakub Wartak
Hi Alvaro, Thomas, hackers >> 14.69% postgres postgres [.] hash_search_with_hash_value >> ---hash_search_with_hash_value >>|--9.80%--BufTableLookup [..] >> --4.90%--smgropen >> |--2.86%--ReadBufferWithoutRelcach

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-27 Thread Jakub Wartak
Hi Thomas / hackers, >> The append-only bottleneck appears to be limited by syscalls/s due to small >> block size even with everything in FS cache (but not in shared buffers, >> please compare with TEST1 as there was no such bottleneck at all): >> >> 29.62% postgres [kernel.kallsyms] [k]

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-26 Thread Thomas Munro
On Thu, Aug 27, 2020 at 6:15 AM Alvaro Herrera wrote: > > --4.90%--smgropen > > |--2.86%--ReadBufferWithoutRelcache > > Looking at an earlier report of this problem I was thinking whether it'd > make sense to replace SMgrRelationHash with a simplehash tabl

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-26 Thread Alvaro Herrera
On 2020-Aug-25, Jakub Wartak wrote: > Turning on/off the defer SLRU patch and/or fsync doesn't seem to make > any difference, so if anyone is curious the next set of append-only > bottlenecks is as below: > > 14.69% postgres postgres [.] hash_search_with_hash_value >

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-26 Thread Alvaro Herrera
On 2020-Aug-25, Andres Freund wrote: > Hi, > > On 2020-08-26 15:58:14 +1200, Thomas Munro wrote: > > > --12.51%--compactify_tuples > > > PageRepairFragmentation > > > heap2_redo > > > StartupXLOG > > >

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-25 Thread Andres Freund
Hi, On 2020-08-26 15:58:14 +1200, Thomas Munro wrote: > > --12.51%--compactify_tuples > > PageRepairFragmentation > > heap2_redo > > StartupXLOG > > I wonder if there is something higher level that could

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-25 Thread Thomas Munro
On Tue, Aug 25, 2020 at 9:16 PM Jakub Wartak wrote: > I just wanted to help test this patch (defer SLRU fsyncs during recovery) > and also the faster compactify_tuples() patch [2], as both are related to the WAL > recovery performance that I'm interested in. This is my first message to > this

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-25 Thread Jakub Wartak
On Wed, Aug 12, 2020 at 6:06 PM Thomas Munro wrote: > [patch] Hi Thomas / hackers, I just wanted to help test this patch (defer SLRU fsyncs during recovery) and also the faster compactify_tuples() patch [2], as both are related to the WAL recovery performance that I'm interested in. This is

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-12 Thread Thomas Munro
On Wed, Aug 12, 2020 at 6:06 PM Thomas Munro wrote: > [patch] Bitrot, rebased, no changes. > Yeah, the combined effect of these two patches is better than I > expected. To be clear though, I was only measuring the time between > the "redo starts at ..." and "redo done at ..." messages, since I'

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-11 Thread Thomas Munro
On Sat, Aug 8, 2020 at 2:44 AM Robert Haas wrote: > On Wed, Aug 5, 2020 at 2:01 AM Thomas Munro wrote: > > * Master is around 11% faster than last week before commit c5315f4f > > "Cache smgrnblocks() results in recovery." > > * This patch gives a similar speedup, bringing the total to around 25%

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-07 Thread Robert Haas
On Wed, Aug 5, 2020 at 2:01 AM Thomas Munro wrote: > * Master is around 11% faster than last week before commit c5315f4f > "Cache smgrnblocks() results in recovery." > * This patch gives a similar speedup, bringing the total to around 25% > faster than last week (the time is ~20% less, the WAL pro

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-04 Thread Thomas Munro
On Tue, Aug 4, 2020 at 6:02 PM Thomas Munro wrote: > ... speedup of around 6% ... I did some better testing. OS: Linux, storage: consumer SSD. I repeatedly ran crash recovery on 3.3GB worth of WAL generated with 8M pgbench transactions. I tested 3 different builds 7 times each and used "minist

Re: Handing off SLRU fsyncs to the checkpointer

2020-08-03 Thread Thomas Munro
On Wed, Feb 12, 2020 at 9:54 PM Thomas Munro wrote: > In commit 3eb77eba we made it possible for any subsystem that wants a > file to be flushed as part of the next checkpoint to ask the > checkpointer to do that, as previously only md.c could do. Hello, While working on recovery performance, I
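The idea quoted above, a subsystem asking for a file to be flushed as part of the next checkpoint instead of calling fsync itself, can be illustrated with a toy model. This is only a sketch under stated assumptions: the names DemoSyncQueue, demo_register_sync and demo_checkpoint are hypothetical, the fixed-size array and the merging of duplicate requests are simplifications made for the example, and none of this is PostgreSQL's actual sync-request code.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_PENDING 8      /* arbitrary capacity for the example */

/* Hypothetical queue of files waiting to be flushed at the next checkpoint. */
typedef struct DemoSyncQueue
{
    int         pending[DEMO_MAX_PENDING];  /* made-up file identifiers */
    int         npending;       /* number of distinct files queued */
    int         nsynced;        /* running count of simulated fsync calls */
} DemoSyncQueue;

/*
 * Record that a file needs flushing.  A file that is already queued is not
 * added twice (an assumption of this sketch).  Returns false when the queue
 * is full, in which case the caller would have to sync the file itself.
 */
static bool
demo_register_sync(DemoSyncQueue *q, int file_id)
{
    for (int i = 0; i < q->npending; i++)
    {
        if (q->pending[i] == file_id)
            return true;
    }
    if (q->npending == DEMO_MAX_PENDING)
        return false;
    q->pending[q->npending++] = file_id;
    return true;
}

/*
 * Checkpoint time: "flush" every queued file once and empty the queue.
 * Returns how many files were flushed.  A real implementation would call
 * fsync() on each file here; this sketch only counts.
 */
static int
demo_checkpoint(DemoSyncQueue *q)
{
    int         flushed = q->npending;

    q->nsynced += flushed;
    q->npending = 0;
    return flushed;
}
```

In this model a writer that dirties the same file many times between checkpoints pays for one flush at checkpoint time instead of one per write, and the flush happens outside the writer's own code path.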