On Fri, Aug 23, 2024 at 02:20:06PM +0300, Nazir Bilal Yavuz wrote:
> On Tue, 20 Aug 2024 at 21:47, Noah Misch <n...@leadboat.com> wrote:
> > On Tue, Aug 13, 2024 at 03:22:27PM +0300, Nazir Bilal Yavuz wrote:
> > > The downside of this approach is there are too many "vmbuffer is valid
> > > but BufferGetBlockNumber(*vmbuffer) is not equal to mapBlock, so
> > > vmbuffer needs to be read again" cases in the read stream version (700
> > > vs 20 for the 6 GB table). This is caused by the callback function of
> > > the read stream reading a new vmbuf while getting next block numbers.
> > > So, vmbuf is wrong when we are checking visibility map bits that might
> > > have changed while we were acquiring the page lock.
> >
> > Did the test that found 700 "read again" use a different procedure than the
> > one shared as setup.sh down-thread?  Using that script alone, none of the vm
> > bits are set, so I'd not expect any heap page reads.
>
> Yes, it is basically the same script but the query is 'SELECT
> pg_check_visible('${TABLE}'::regclass);'.
I gather the scripts deal exclusively in tables with no vm bits set, so
pg_visibility did no heap page reads under those scripts. But the '700 "read
again"' result suggests either I'm guessing wrong, or that result came from a
different test procedure. Thoughts?

> > The "vmbuffer needs to be read again" regression fits what I would expect the
> > v1 patch to do with a table having many vm bits set. In general, I think the
> > fix is to keep two heap Buffer vars, so the callback can work on one vmbuffer
> > while collect_corrupt_items() works on another vmbuffer. Much of the time,
> > they'll be the same buffer. It could be as simple as that, but you could
> > consider further optimizations like these:
>
> Thanks for the suggestion. Keeping two vmbuffers lowered the count of
> "read-again" cases to ~40. I ran the test again and the timing
> improvement is ~5% now.
>
> I think the number of "read-again" cases are low enough now.

Agreed. The increase from 20 to 40 is probably an increase in buffer mapping
lookups, not an increase in I/O.

> Does creating something like
> pg_general_read_stream_next_block() in read_stream code and exporting
> it makes sense?

To me, that name isn't conveying the function's behavior, and the words "pg_"
and "general_" aren't adding information there. How about one of these names
or similar:

  seq_read_stream_cb
  sequential_read_stream_cb
  block_range_read_stream_cb

> > The callback doesn't return blocks having zero vm bits, so the blkno variable
> > is not accurate. I didn't test, but I think the loop's "Recheck to avoid
> > returning spurious results." looks at the bit for the wrong block. If that's
> > what v1 does, could you expand the test file to catch that? For example, make
> > a two-block table with only the second block all-visible.
>
> Yes, it was not accurate. I am getting blockno from the buffer now.
> I checked and confirmed it is working as expected by manually logging
> blocknos returned from the read stream. I am not sure how to add a
> test case for this.

VACUUM FREEZE makes an all-visible, all-frozen table. DELETE of a particular
TID, even if rolled back, clears both vm bits for the TID's page. Past tests
like that had instability problems. One cause is a concurrent session's XID or
snapshot, which can prevent VACUUM setting vm bits. Using a TEMP table may
have been one of the countermeasures, but I don't remember clearly. Hence,
please search the archives or the existing pg_visibility tests for how we
dealt with that. It may not be a problem for this particular test.

> There is one behavioral change introduced with the patches. It could
> happen when collect_corrupt_items() is called with both all_visible
> and all_frozen being true.
>
> -> Let's say VM_ALL_FROZEN() returns true but VM_ALL_VISIBLE() returns
> false in callback. Callback returns this block number because
> VM_ALL_FROZEN() is true.
>
> -> At the /* Recheck to avoid returning spurious results. */ part, we
> should only check VM_ALL_FROZEN() again as this was returning true in
> the callback. But we check both VM_ALL_FROZEN() and VM_ALL_VISIBLE().
> VM_ALL_FROZEN() returns false and VM_ALL_VISIBLE() returns true now
> (vice versa of callback).
>
> -> We were skipping this block in the master but the patched version
> does not skip that.
>
> Is this a problem? If this is a problem, a solution might be to
> callback return values of VM_ALL_FROZEN() and VM_ALL_VISIBLE() in the
> per_buffer_data.

No, I don't consider that a problem. That's not making us do additional I/O,
just testing more bits within the pages we're already loading. The difference
in behavior is user-visible, but it's the same behavior change the user would
get if the timing had been a bit different.