On Fri, Aug 23, 2024 at 02:20:06PM +0300, Nazir Bilal Yavuz wrote:
> On Tue, 20 Aug 2024 at 21:47, Noah Misch <n...@leadboat.com> wrote:
> > On Tue, Aug 13, 2024 at 03:22:27PM +0300, Nazir Bilal Yavuz wrote:
> > > The downside of this approach is there are too many "vmbuffer is valid
> > > but BufferGetBlockNumber(*vmbuffer) is not equal to mapBlock, so
> > > vmbuffer needs to be read again" cases in the read stream version (700
> > > vs 20 for the 6 GB table). This is caused by the callback function of
> > > the read stream reading a new vmbuf while getting next block numbers.
> > > So, vmbuf is wrong when we are checking visibility map bits that might
> > > have changed while we were acquiring the page lock.
> >
> > Did the test that found 700 "read again" use a different procedure than the
> > one shared as setup.sh down-thread?  Using that script alone, none of the vm
> > bits are set, so I'd not expect any heap page reads.
>
> Yes, it is basically the same script but the query is 'SELECT
> pg_check_visible('${TABLE}'::regclass);'.
I gather the scripts deal exclusively in tables with no vm bits set, so
pg_visibility did no heap page reads under those scripts. But the '700 "read
again"' result suggests either I'm guessing wrong, or that result came from a
different test procedure. Thoughts?

> > The "vmbuffer needs to be read again" regression fits what I would expect the
> > v1 patch to do with a table having many vm bits set. In general, I think the
> > fix is to keep two heap Buffer vars, so the callback can work on one vmbuffer
> > while collect_corrupt_items() works on another vmbuffer. Much of the time,
> > they'll be the same buffer. It could be as simple as that, but you could
> > consider further optimizations like these:
>
> Thanks for the suggestion. Keeping two vmbuffers lowered the count of
> "read-again" cases to ~40. I ran the test again and the timing
> improvement is ~5% now.
>
> I think the number of "read-again" cases are low enough now.

Agreed. The increase from 20 to 40 is probably an increase in buffer mapping
lookups, not an increase in I/O.

> Does creating something like
> pg_general_read_stream_next_block() in read_stream code and exporting
> it makes sense?

To me, that name isn't conveying the function's behavior, and the words "pg_"
and "general_" aren't adding information there. How about one of these names
or similar:

  seq_read_stream_cb
  sequential_read_stream_cb
  block_range_read_stream_cb

> > The callback doesn't return blocks having zero vm bits, so the blkno variable
> > is not accurate. I didn't test, but I think the loop's "Recheck to avoid
> > returning spurious results." looks at the bit for the wrong block. If that's
> > what v1 does, could you expand the test file to catch that? For example, make
> > a two-block table with only the second block all-visible.
>
> Yes, it was not accurate. I am getting blockno from the buffer now.
> I checked and confirmed it is working as expected by manually logging
> blocknos returned from the read stream. I am not sure how to add a
> test case for this.

VACUUM FREEZE makes an all-visible, all-frozen table. DELETE of a particular
TID, even if rolled back, clears both vm bits for the TID's page. Past tests
like that had instability problems. One cause is a concurrent session's XID or
snapshot, which can prevent VACUUM setting vm bits. Using a TEMP table may
have been one of the countermeasures, but I don't remember clearly. Hence,
please search the archives or the existing pg_visibility tests for how we
dealt with that. It may not be a problem for this particular test.

> There is one behavioral change introduced with the patches. It could
> happen when collect_corrupt_items() is called with both all_visible
> and all_frozen being true.
>
> -> Let's say VM_ALL_FROZEN() returns true but VM_ALL_VISIBLE() returns
> false in callback. Callback returns this block number because
> VM_ALL_FROZEN() is true.
>
> -> At the /* Recheck to avoid returning spurious results. */ part, we
> should only check VM_ALL_FROZEN() again as this was returning true in
> the callback. But we check both VM_ALL_FROZEN() and VM_ALL_VISIBLE().
> VM_ALL_FROZEN() returns false and VM_ALL_VISIBLE() returns true now
> (vice versa of callback).
>
> -> We were skipping this block in the master but the patched version
> does not skip that.
>
> Is this a problem? If this is a problem, a solution might be to
> callback return values of VM_ALL_FROZEN() and VM_ALL_VISIBLE() in the
> per_buffer_data.

No, I don't consider that a problem. That's not making us do additional I/O,
just testing more bits within the pages we're already loading. The difference
in behavior is user-visible, but it's the same behavior change the user would
get if the timing had been a bit different.