On Wed, Feb 27, 2019 at 7:05 AM Jeff Janes <jeff.ja...@gmail.com> wrote:
> > > After doing a truncation and '\copy ... with (freeze)' of a table with > long data, I find that the associated toast table has a handful of unfrozen > blocks. I don't know if that is an actual problem, but it does seem a bit > odd, and thus suspicious. > > Hi Jeff, thanks for looking at it and the test. I can reproduce the problem and quite curiously block number 1 and then every 32672th block is getting skipped. postgres=# select * from pg_visibility('pg_toast.pg_toast_16384') where all_visible = 'f'; blkno | all_visible | all_frozen | pd_all_visible --------+-------------+------------+---------------- 1 | f | f | f 32673 | f | f | f 65345 | f | f | f 98017 | f | f | f 130689 | f | f | f 163361 | f | f | f <snip> Having investigated this a bit, I see that a relcache invalidation arrives after 1st and then after every 32672th block is filled. That clears the rel->rd_smgr field and we lose the information about the saved target block. The code then moves to extend the relation again and thus skips the previously less-than-half filled block, losing the free space in that block. postgres=# SELECT * FROM page_header(get_raw_page('pg_toast.pg_toast_16384', 0)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid ------------+----------+-------+-------+-------+---------+----------+---------+----------- 1/15B37748 | 0 | 4 | 40 | 64 | 8192 | 8192 | 4 | 0 (1 row) postgres=# SELECT * FROM page_header(get_raw_page('pg_toast.pg_toast_16384', 1)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid ------------+----------+-------+-------+-------+---------+----------+---------+----------- 1/15B39A28 | 0 | 4 | 28 | 7640 | 8192 | 8192 | 4 | 0 (1 row) postgres=# SELECT * FROM page_header(get_raw_page('pg_toast.pg_toast_16384', 2)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid ------------+----------+-------+-------+-------+---------+----------+---------+----------- 1/15B3BE08 | 0 | 4 | 40 | 64 | 8192 | 8192 | 4 | 0 (1 row) So the block 1 has a large amount of free space (upper - lower), which never gets filled. I am not yet sure what causes the relcache invalidation at regular intervals. But if I have to guess, it could be because of a new VM (or FSM?) page getting allocated. I am bit puzzled because this issue seems to only occur with toast tables since I tested the patch while writing it on a regular table and did not see any block remaining unfrozen. I tested only upto 450 blocks, but that shouldn't matter because with your test, we see the problem with block 1 as well. So something to look into in more detail. While we could potentially fix this by what you'd done in the original patch and what Kuntal also suggested, i.e. by setting the PD_ALL_VISIBLE bit during page initialisation itself, I am a bit circumspect about that approach for two reasons: 1. It requires us to then add extra logic to avoid clearing the bit during insertions 2. It requires us to also update the VM bit during page init or risk having divergent views on the page-level bit and the VM bit. And even if we do that, this newly discovered problem of less-than-half filled intermediate blocks remain. I wonder if we should instead track the last used block in BulkInsertState and if the relcache invalidation flushes smgr, start inserting again from the last saved block. In fact, we already track the last used buffer in BulkInsertState and that's enough to know the last used block. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services