Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

Pavan Deolasee Tue, 26 Feb 2019 20:57:31 -0800

On Wed, Feb 27, 2019 at 7:05 AM Jeff Janes <jeff.ja...@gmail.com> wrote:


>
>
> After doing a truncation and '\copy ... with (freeze)' of a table with
> long data, I find that the associated toast table has a handful of unfrozen
> blocks.  I don't know if that is an actual problem, but it does seem a bit
> odd, and thus suspicious.
>
>
Hi Jeff, thanks for looking at it and the test. I can reproduce the problem
and quite curiously block number 1 and then every 32672th block is getting
skipped.

postgres=# select * from pg_visibility('pg_toast.pg_toast_16384') where
all_visible = 'f';
 blkno  | all_visible | all_frozen | pd_all_visible
--------+-------------+------------+----------------
      1 | f           | f          | f
  32673 | f           | f          | f
  65345 | f           | f          | f
  98017 | f           | f          | f
 130689 | f           | f          | f
 163361 | f           | f          | f
 <snip>

Having investigated this a bit, I see that a relcache invalidation arrives
after 1st and then after every 32672th block is filled. That clears the
rel->rd_smgr field and we lose the information about the saved target
block. The code then moves to extend the relation again and thus skips the
previously less-than-half filled block, losing the free space in that block.

postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 0));
    lsn     | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
 1/15B37748 |        0 |     4 |    40 |    64 |    8192 |     8192 |
 4 |         0
(1 row)

postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 1));
    lsn     | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
 1/15B39A28 |        0 |     4 |    28 |  7640 |    8192 |     8192 |
 4 |         0
(1 row)

postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 2));
    lsn     | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
 1/15B3BE08 |        0 |     4 |    40 |    64 |    8192 |     8192 |
 4 |         0
(1 row)

So the block 1 has a large amount of free space (upper - lower), which
never gets filled.

I am not yet sure what causes the relcache invalidation at regular
intervals. But if I have to guess, it could be because of a new VM (or
FSM?) page getting allocated. I am bit puzzled because this issue seems to
only occur with toast tables since I tested the patch while writing it on a
regular table and did not see any block remaining unfrozen. I tested only
upto 450 blocks, but that shouldn't matter because with your test, we see
the problem with block 1 as well. So something to look into in more detail.

While we could potentially fix this by what you'd done in the original
patch and what Kuntal also suggested, i.e. by setting the PD_ALL_VISIBLE
bit during page initialisation itself, I am a bit circumspect about that
approach for two reasons:

1. It requires us to then add extra logic to avoid clearing the bit during
insertions
2. It requires us to also update the VM bit during page init or risk having
divergent views on the page-level bit and the VM bit.

And even if we do that, this newly discovered problem of less-than-half
filled intermediate blocks remain. I wonder if we should instead track the
last used block in BulkInsertState and if the relcache invalidation flushes
smgr, start inserting again from the last saved block. In fact, we already
track the last used buffer in BulkInsertState and that's enough to know the
last used block.

Thanks,
Pavan

-- 
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

Reply via email to