On Thu, Nov 1, 2018 at 4:25 AM Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Wed, Oct 31, 2018 at 1:38 AM Torsten Förtsch <tfoertsch...@gmail.com> wrote:
>> I am working on restoring a database from a base backup + WAL. With the
>> default settings the database replays about 3-4 WAL files per second. The
>> startup process takes about 65% of a CPU and writes data with something
>> between 50 and 100 MB/sec.
>>
>> Is there a way to speed that up? The disk can easily sustain 400-500 MB/sec.
>
> WAL replay is single-threaded, so the most you would be able to speed it up
> is 50%, to where it would be taking 100% CPU.
>
> Is the time spent not on the CPU being spent waiting for WAL files to arrive
> from the restore_command, or waiting for the blocks it needs to replay into
> to get read from disk, or waiting for dirty pages to get written to disk so
> they can be evicted to make way for new ones?
>
> One way I found to speed up restore_command is to have another program run a
> few WAL files ahead of it, copying the WAL from the real archive into a
> scratch space which is on the same filesystem as pg_xlog/pg_wal. Then have
> restore_command simply move (not copy) the requested files from the scratch
> space onto %p. The intra-filesystem move completes much faster than a copy.
>
> If it spends time waiting for blocks that need to be recovered into to get
> read from disk, and you have enough RAM, you could speed it up by pre-warming
> the file system cache. Something like:
>
> tar -cf - $PGDATA | wc -c
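That scratch-space trick for restore_command only needs a little scripting
around it. A rough, untested sketch of the run-ahead copier (all paths, the
look-ahead depth and the restore_command line in the comment are made up,
and it assumes a single timeline so that WAL file names sort in replay
order):

#!/usr/bin/env python3
# Hypothetical run-ahead copier for the scratch-space trick quoted above.
# It keeps a scratch directory on the same filesystem as pg_wal topped up
# with the next few segments from the archive, so that restore_command only
# has to do a cheap intra-filesystem rename, e.g. something like
#   restore_command = 'mv /pgdata/wal_scratch/%f "%p" || cp /archive/%f "%p"'

import os
import shutil
import time

ARCHIVE = '/archive'               # where the real WAL archive lives (placeholder)
SCRATCH = '/pgdata/wal_scratch'    # same filesystem as pg_wal (placeholder)
LOOKAHEAD = 8                      # how many segments to keep staged

def segments(path):
    # 24-character hex WAL names sort in replay order on a single timeline
    return sorted(name for name in os.listdir(path) if len(name) == 24)

os.makedirs(SCRATCH, exist_ok=True)
last_staged = ''                   # could be seeded with the first segment recovery needs
while True:
    want = LOOKAHEAD - len(segments(SCRATCH))
    for name in [s for s in segments(ARCHIVE) if s > last_staged][:max(want, 0)]:
        # copy under a temporary name, then rename, so restore_command never
        # sees a half-copied segment
        tmp = os.path.join(SCRATCH, name + '.tmp')
        shutil.copy(os.path.join(ARCHIVE, name), tmp)
        os.rename(tmp, os.path.join(SCRATCH, name))
        last_staged = name
    time.sleep(1)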
For more targeted prewarming of large systems that don't fit in RAM and to
get all the way into PostgreSQL's buffer pool, I suppose you could write a
small Python/whatever script that extracts the relfilenode + block
references from the output of pg_waldump (one file ahead, or whatever),
sorts and uniques them, merges them into block ranges, converts the
relfilenode reference to relation OID, and then calls pg_prewarm() for each
range.
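Here's a rough sketch of what that could look like, just to show the shape
of it. Completely untested: the WAL file path and the psycopg2 connection
defaults are placeholders, it only picks up main-fork block references, it
resolves relfilenodes with a plain pg_class lookup (so shared/mapped
catalogs are skipped; pg_filenode_relation() would be more thorough), and
it assumes the pg_prewarm extension is installed and the server is already
accepting connections:

#!/usr/bin/env python3
# Hypothetical prewarm-from-WAL script, roughly as described above.
# Scans one WAL segment with pg_waldump, merges the referenced blocks into
# ranges, and asks pg_prewarm() to pull them into shared_buffers.

import re
import subprocess
from collections import defaultdict

import psycopg2   # any driver would do; connection settings come from the environment

WAL_FILE = '/archive/000000010000001200000034'   # segment to scan ahead of replay (placeholder)

# Main-fork block references look like "blkref #0: rel 1663/16384/16385 blk 7"
BLKREF = re.compile(r'blkref #\d+: rel (\d+)/(\d+)/(\d+) blk (\d+)')

# 1. Extract, sort and unique (tablespace, database, relfilenode, block) tuples.
out = subprocess.run(['pg_waldump', WAL_FILE], capture_output=True, text=True, check=True)
refs = sorted({tuple(int(g) for g in m.groups()) for m in BLKREF.finditer(out.stdout)})

# 2. Merge consecutive blocks of the same relation into ranges.
ranges = defaultdict(list)                     # (spc, db, filenode) -> [[first, last], ...]
for spc, db, filenode, blk in refs:
    key = (spc, db, filenode)
    if ranges[key] and blk == ranges[key][-1][1] + 1:
        ranges[key][-1][1] = blk               # extend the current range
    else:
        ranges[key].append([blk, blk])         # start a new range

# 3. Map database OIDs to names so we can connect to the right database.
with psycopg2.connect(dbname='postgres') as conn, conn.cursor() as cur:
    cur.execute("SELECT oid, datname FROM pg_database")
    datnames = dict(cur.fetchall())

# 4. Per database: convert relfilenode to relation OID and prewarm each range.
by_db = defaultdict(list)
for (spc, db, filenode), rngs in ranges.items():
    by_db[db].extend((filenode, first, last) for first, last in rngs)

for db, items in by_db.items():
    with psycopg2.connect(dbname=datnames[db]) as conn, conn.cursor() as cur:
        for filenode, first, last in items:
            try:
                cur.execute(
                    "SELECT pg_prewarm(c.oid::regclass, 'buffer', 'main', %s, %s)"
                    "  FROM pg_class c WHERE c.relfilenode = %s",
                    (first, last, filenode))
            except psycopg2.Error:
                conn.rollback()   # e.g. a range running past the current end of the relation

--
Thomas Munro
http://www.enterprisedb.com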