Hi,

This thread started at 
https://www.postgresql.org/message-id/20220213021746.GM31460%40telsasoft.com
but is mostly independent, so I split the thread off.

On 2022-02-12 20:17:46 -0600, Justin Pryzby wrote:
> On Sat, Feb 12, 2022 at 06:00:44PM -0800, Andres Freund wrote:
> > I bet using COW file copies would speed up our own regression tests
> > noticeably - on slower systems we spend a fair bit of time and space
> > creating template0 and postgres, with the bulk of the data never changing.
> > 
> > Template databases are also fairly commonly used by application developers
> > to avoid the cost of rerunning all the setup DDL & initial data loading for
> > different tests. Making that measurably cheaper would be a significant win.
> 
> +1
> 
> I ran into this last week and was still thinking about proposing it.
> 
> Would this help CI

It could theoretically help linux - but currently I think the filesystem for
CI is ext4, which doesn't support FICLONE. I assume it'd help macos, but I
don't know the performance characteristics of copyfile(). I don't think any of
the other OSs have working reflink / file clone support.

You could prototype it for CI on macos by using the "template initdb" patch
and passing -c to cp.

On linux it might be worth using copy_file_range(), if supported, even when
not file cloning. But that's kind of an even more separate topic...
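
To make the fallback order concrete, here is a rough sketch in Python (not
PostgreSQL code, and not what any patch does): try a FICLONE clone first, then
copy_file_range(), then an ordinary copy. The helper name copy_file_cheaply is
made up for illustration, and the FICLONE value is hard-coded from
<linux/fs.h> on the assumption that the running Python does not export it.

```python
import os
import shutil
import sys

# Linux ioctl request for FICLONE, i.e. _IOW(0x94, 9, int) in <linux/fs.h>.
# Hard-coded as an assumption; older Python versions do not export it.
FICLONE = 0x40049409


def copy_file_cheaply(src, dst):
    """Copy src to dst (hypothetical helper).

    Returns the method used: 'clone', 'copy_file_range' or 'plain'.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        # 1. Reflink clone: only attempted on linux, and only succeeds on
        #    filesystems that support it. Any failure falls through.
        if sys.platform.startswith("linux"):
            try:
                import fcntl
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
                return "clone"
            except (ImportError, OSError):
                pass

        # 2. In-kernel copy, where Python and the kernel provide it.
        if hasattr(os, "copy_file_range"):
            try:
                remaining = os.fstat(fsrc.fileno()).st_size
                while remaining > 0:
                    n = os.copy_file_range(fsrc.fileno(), fdst.fileno(),
                                           remaining)
                    if n == 0:
                        break  # unexpected EOF, use the plain copy instead
                    remaining -= n
                if remaining == 0:
                    return "copy_file_range"
            except OSError:
                pass

    # 3. Plain copy. Reopening dst truncates any partial output from above.
    shutil.copyfile(src, dst)
    return "plain"
```

Calling copy_file_cheaply("a", "b") returns which path was taken, so it is
easy to check what a given filesystem actually supports.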


>  or any significant fraction of buildfarm ?

Not sure how many are on new enough linux / mac to benefit and use a suitable
filesystem. There are a few animals with slow-ish storage but running fairly
new linux; those would likely benefit the most. I don't think we can see which
filesystem they use.


> Or just tests run locally on supporting filesystems.

Probably depends on your storage subsystem. If it's not that fast and you're
running tests concurrently, it'd likely help.


On my workstation, with lots of cores and very fast storage, using the initdb
caching patch modified to do cp --reflink=never / always yields the following
times for concurrent check-world (-j40 PROVE_FLAGS=-j4):

cp --reflink=never:

96.64user 61.74system 1:04.69elapsed 244%CPU (0avgtext+0avgdata 97544maxresident)k
0inputs+34124296outputs (2584major+7247038minor)pagefaults 0swaps
pcheck-world-success

cp --reflink=always:

91.79user 56.16system 1:04.21elapsed 230%CPU (0avgtext+0avgdata 97716maxresident)k
189328inputs+16361720outputs (2674major+7229696minor)pagefaults 0swaps
pcheck-world-success

Seems roughly stable across three runs.


Just comparing the time for cp -a of a fresh initdb'd cluster:
cp -a --reflink=never
real    0m0.043s
user    0m0.000s
sys     0m0.043s
cp -a --reflink=always
real    0m0.021s
user    0m0.004s
sys     0m0.018s

so that's a pretty nice win.


> Note that pg_upgrade already supports copy/link/clone.  (Obviously, link
> wouldn't do anything desirable for CREATE DATABASE).

Yea. We'd likely have to move relevant code into src/port.


Greetings,

Andres Freund

