On 27.10.2016 14:39, Mithun Cy wrote:
# pg_autoprewarm.
This is a PostgreSQL contrib module which automatically dumps all of the
block numbers present in the buffer pool at server shutdown (smart and
fast mode only; to be enhanced to dump at regular intervals) and loads
these blocks when the server restarts.
Design:
------
We have created a BG worker, Auto Pre-warmer, which during shutdown dumps
all the block numbers in the buffer pool in sorted order.
The format of each entry is
<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>.
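
To make the entry format and the "sorted order" concrete, here is a
minimal sketch of how one dump entry might be represented and sorted.
The struct, its field types, and the function names are assumptions
made for illustration; they are not taken from the patch. The sort
order used here is simply field by field, in the order the entry
format lists them.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical in-memory form of one dump entry. Field names follow the
 * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> format; the
 * integer types are assumptions.
 */
typedef struct BlockEntry
{
    uint32_t database_id;
    uint32_t tablespace_id;
    uint32_t relation_id;
    int32_t  forknum;
    uint32_t blocknum;
} BlockEntry;

/* Compare one field; return from the comparator as soon as it differs. */
#define CMP_FIELD(a, b, f) \
    do { if ((a)->f != (b)->f) return ((a)->f < (b)->f) ? -1 : 1; } while (0)

/* qsort comparator: orders entries field by field, left to right. */
static int
block_entry_cmp(const void *pa, const void *pb)
{
    const BlockEntry *a = pa;
    const BlockEntry *b = pb;

    CMP_FIELD(a, b, database_id);
    CMP_FIELD(a, b, tablespace_id);
    CMP_FIELD(a, b, relation_id);
    CMP_FIELD(a, b, forknum);
    CMP_FIELD(a, b, blocknum);
    return 0;
}

/* Sort the entries in place before writing them to the dump file. */
void
sort_block_entries(BlockEntry *entries, size_t n)
{
    qsort(entries, n, sizeof(BlockEntry), block_entry_cmp);
}
```

Sorting this way keeps blocks of the same relation fork adjacent and in
ascending block order, which should let the loader read each file mostly
sequentially.
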
Auto Pre-warmer is started as soon as the postmaster is started; we do
not wait for recovery to finish and the database to reach a consistent
state. If there is a "dump_file" to load, we load each block entry into
the buffer pool for as long as there is a free buffer. This way we do
not replace any new blocks which were loaded either by the recovery
process or by querying clients. Then it waits until it receives
SIGTERM to dump the block information in the buffer pool.
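
The "load only while a free buffer exists" rule can be sketched as
below. This is a toy simulation, not the patch's code: SimPool and
sim_prewarm are hypothetical names, and the pool is reduced to a
counter of free slots.

```c
#include <stddef.h>

/* Hypothetical stand-in for the buffer pool: tracks free slots only. */
typedef struct SimPool
{
    size_t free_buffers;
} SimPool;

/*
 * Walk the dump entries in order and stop at the first entry for which
 * no free buffer remains, so that no block already in the pool is ever
 * evicted. Returns the number of entries loaded.
 */
size_t
sim_prewarm(SimPool *pool, size_t n_entries)
{
    size_t loaded = 0;

    while (loaded < n_entries && pool->free_buffers > 0)
    {
        pool->free_buffers--;   /* a real worker would read the block here */
        loaded++;
    }
    return loaded;
}
```

If recovery or client queries fill the pool first, the loop simply ends
early and the remaining dump entries are skipped.
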
HOW TO USE:
-----------
Build pg_autoprewarm and add it to shared_preload_libraries. The Auto
Pre-warmer process automatically dumps the buffer pool's block info and
loads it again on restart.
TO DO:
------
Add functionality to dump at regular intervals, based on a timer.
And some cleanups.
I wonder if you have considered parallel prewarming of a table?
Right now, with either pg_prewarm or pg_autoprewarm, preloading a
table's data is performed by a single backend.
That certainly makes sense if there is just one HDD and we want to
minimize the impact of pg_prewarm on normal DBMS activity.
But sometimes we need to load data into memory as soon as possible.
Modern systems have a large number of CPU cores, and RAID devices make
it possible to load data efficiently in parallel.
I am asking this question in the context of my CFS (compressed file
system) for Postgres. The customer's complaint was that there are 64
cores in his system, but when he builds an index, decompression of heap
data is performed by only one core. This is why I thought about
prewarm... (parallel index construction is a separate story...)
pg_prewarm makes it possible to specify a range of blocks, so, in
principle, it is possible to preload a table in parallel manually, by
spawning pg_prewarm with different subranges in several backends. But
it is definitely not a user-friendly approach.
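
To illustrate the manual approach, here is a rough sketch that splits a
relation's blocks into one contiguous subrange per worker and formats
the pg_prewarm call each backend would run. The helper names are
hypothetical. The relation name is pasted into the statement without
any escaping, and the 'buffer' mode and 'main' fork are just the
pg_prewarm defaults written out explicitly.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Compute the inclusive block range [*first, *last] for worker w out of
 * nworkers. The first (nblocks % nworkers) workers get one extra block.
 * Returns 1 if the worker has blocks to load, 0 otherwise.
 */
int
worker_block_range(uint64_t nblocks, unsigned nworkers, unsigned w,
                   uint64_t *first, uint64_t *last)
{
    uint64_t base, extra, start, len;

    if (nworkers == 0 || w >= nworkers)
        return 0;

    base = nblocks / nworkers;
    extra = nblocks % nworkers;
    start = (uint64_t) w * base + (w < extra ? w : extra);
    len = base + (w < extra ? 1 : 0);
    if (len == 0)
        return 0;

    *first = start;
    *last = start + len - 1;
    return 1;
}

/* Format the statement one backend would run for its subrange. */
int
format_prewarm_sql(char *buf, size_t buflen, const char *relname,
                   uint64_t first, uint64_t last)
{
    return snprintf(buf, buflen,
                    "SELECT pg_prewarm('%s', 'buffer', 'main', %llu, %llu);",
                    relname,
                    (unsigned long long) first,
                    (unsigned long long) last);
}
```

Each generated statement then has to be run in its own connection,
which is exactly the part that is not user friendly.
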
And as far as I understand, pg_autoprewarm has all the necessary
infrastructure to do a parallel load. We just need to spawn more than
one background worker and specify a separate block range for each
worker.
Do you think that such functionality (parallel autoprewarm) would be
useful and could be easily added?
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company