On 27.10.2016 14:39, Mithun Cy wrote:
# pg_autoprewarm.

This is a PostgreSQL contrib module which automatically dumps all of the block numbers present in the buffer pool at the time of server shutdown (smart and fast mode only; to be enhanced to dump at regular intervals) and loads these blocks when the server restarts.

Design:
------
We have created a BG Worker, Auto Pre-warmer, which during shutdown dumps all the
block numbers in the buffer pool in sorted order.
The format of each entry is <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>. Auto Pre-warmer is started as soon as the postmaster is started; we do not wait for recovery to finish and the database to reach a consistent state. If there is a
"dump_file" to load, we load each block entry into the buffer pool for as long as
there is a free buffer. This way we do not replace any new blocks which were loaded either by the recovery process or by querying clients. Then it waits until it receives
SIGTERM to dump the block information in the buffer pool.
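
To make the dump and load rules above concrete, here is a minimal Python sketch. It is only an illustration and not the module's actual C code: the names `BlockEntry`, `dump_entries` and `load_entries` are hypothetical, and the buffer pool is modelled as a plain set with a fixed capacity.

```python
from typing import Iterable, List, NamedTuple, Set


class BlockEntry(NamedTuple):
    # Field order mirrors <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>.
    database_id: int
    tablespace_id: int
    relation_id: int
    forknum: int
    blocknum: int


def dump_entries(buffer_pool: Iterable[BlockEntry]) -> List[BlockEntry]:
    """Return the entries in sorted order, as they would be written at shutdown."""
    # Tuples compare field by field, so this sorts by database, tablespace,
    # relation, fork and finally block number.
    return sorted(buffer_pool)


def load_entries(dumped: Iterable[BlockEntry],
                 resident: Set[BlockEntry],
                 capacity: int) -> int:
    """Load dumped entries only while a free buffer exists; never evict."""
    loaded = 0
    for entry in dumped:
        if len(resident) >= capacity:
            break  # no free buffer left: stop rather than replace newer blocks
        if entry in resident:
            continue  # already brought in by recovery or by a client
        resident.add(entry)
        loaded += 1
    return loaded
```

With a capacity of three buffers and one block already resident, only the first two dumped entries are loaded and the resident block is left untouched.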

HOW TO USE:
-----------
Build pg_autoprewarm and add it to shared_preload_libraries. The Auto Pre-warmer process automatically dumps the buffer pool's block info and loads it when
the server is restarted.

TO DO:
------
Add functionality to dump at regular intervals, based on a timer.
And some cleanups.

I wonder if you have considered parallel prewarming of a table?
Right now, with either pg_prewarm or pg_autoprewarm, preloading a table's data is performed by one backend. That certainly makes sense if there is just one HDD and we want to minimize the impact of pg_prewarm on normal DBMS activity. But sometimes we need to load data into memory as soon as possible. And modern systems have a large number of CPU cores, and
RAID devices make it possible to efficiently load data in parallel.

I am asking this question in the context of my CFS (compressed file system) for Postgres. The customer's complaint was that there are 64 cores in his system, but when he is building an index, decompression of heap data is performed by only one core. This is why I thought about prewarm... (parallel index construction is a separate story...)

pg_prewarm makes it possible to specify a range of blocks, so, in principle, it is possible to manually preload a table in parallel by spawning pg_prewarm with different subranges in several backends. But it is definitely not a user-friendly approach. And as far as I understand, pg_autoprewarm has all the necessary infrastructure to do a parallel load. We just need to spawn more than one background worker and specify
a separate block range for each worker.
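
To illustrate the manual approach, here is a rough Python sketch that splits a table's blocks into contiguous subranges and builds one pg_prewarm call per subrange, each meant to be run in its own backend. The helper names `split_block_range` and `prewarm_statements` are hypothetical, and the sketch assumes the relation's block count is already known; the generated SQL uses the existing pg_prewarm(relation, mode, fork, first_block, last_block) form.

```python
from typing import List, Tuple


def split_block_range(total_blocks: int, workers: int) -> List[Tuple[int, int]]:
    """Split blocks 0..total_blocks-1 into contiguous, inclusive subranges."""
    if total_blocks <= 0 or workers <= 0:
        return []
    workers = min(workers, total_blocks)  # never create an empty subrange
    base, extra = divmod(total_blocks, workers)
    ranges = []
    first = 0
    for i in range(workers):
        # The first `extra` workers take one additional block each.
        size = base + (1 if i < extra else 0)
        ranges.append((first, first + size - 1))
        first += size
    return ranges


def prewarm_statements(relation: str, total_blocks: int, workers: int) -> List[str]:
    """Build one pg_prewarm call per subrange, to be run in separate sessions."""
    return [
        f"SELECT pg_prewarm('{relation}', 'buffer', 'main', {first}, {last});"
        for first, last in split_block_range(total_blocks, workers)
    ]
```

A parallel autoprewarm could presumably do the same split internally and hand each subrange to a separate background worker.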

Do you think that such functionality (parallel autoprewarm) would be useful and could be easily added?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


