On Fri, Jul 22, 2022 at 1:22 AM 王伟(学弈) <rogers...@alibaba-inc.com> wrote:
> I recently found this problem while testing PG14 with sysbench.

The line numbers from your stack trace don't match up with
REL_14_STABLE. Is this actually a fork of Postgres 14? (Oh, looks like
it's an old beta release.)

> Then I looked through the emails from pgsql-hackers and found a previous,
> similar bug:
> https://www.postgresql.org/message-id/flat/2247102.1618008027%40sss.pgh.pa.us.
> But the bugfix commit (34f581c39e97e2ea237255cf75cccebccc02d477) is already
> applied to PG14.

It does seem possible that there is another similar bug somewhere --
another case where we were protected by the fact that VACUUM acquired
a full cleanup lock (not just an exclusive buffer lock) during its
second heap pass. That changed in Postgres 14 (commit 8523492d4e). But
I really don't know -- almost anything is possible.

> I'm wondering whether there's another code path that leads to this
> problem. I dug deeper with gdb, which showed that newbuffer is not equal
> to buffer. In other words, the function RelationGetBufferForTuple must
> have just been called.
> Besides, why didn't we re-check the flag after RelationGetBufferForTuple
> was called?

Recheck what flag? And at what point? It's not easy to figure this out
from your stack trace, because of the line number issues.

It would also be helpful if you told us about the specific table
involved. Though the important thing (the essential thing) is to test
today's REL_14_STABLE. There have been *lots* of bug fixes since
Postgres 14 beta2 was current.

-- 
Peter Geoghegan

