On Sun, Jan 5, 2020 at 11:00 PM chenhj <chjis...@163.com> wrote:
> According to the above information, the flags of the heap page (163363) with the
> problem tuple (163363, 9) are 0x0001 (HAS_FREE_LINES), that is, ALL_VISIBLE is
> not set.
>
> However, according to the hexdump content of the corresponding vm file, that
> block (location is 9F88 + 6bit) has the VISIBILITYMAP_ALL_FROZEN and
> VISIBILITYMAP_ALL_VISIBLE flags set. That is, the heap file and the vm file are
> inconsistent.

That's not supposed to happen, and represents data corruption. Your
previous report of a too-old xmin surviving in the heap is also
corruption.  There is no guarantee that both problems have the same
cause, but suppose they do. One possibility is that a write to the
heap page may have gotten lost or undone. Suppose that, while this
page was in shared_buffers, VACUUM came through and froze it, setting
the bits in the VM and later truncating CLOG. Then, suppose that when
that page was evicted from shared_buffers, it didn't really get
written back to disk, or alternatively it did, but then later somehow
the old version reappeared. I think that would produce these symptoms.
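
To make that sequence concrete, here is a rough sketch of it as a toy
model. The flag values are the ones from the PostgreSQL headers; the
simulate() and vm_consistent() helpers and the "disk" dictionary are
made up purely for illustration and are not how the server is
implemented.

```python
# Flag values as defined in the PostgreSQL headers.
PD_HAS_FREE_LINES = 0x0001          # heap page header flag
PD_ALL_VISIBLE = 0x0004             # heap page header flag
VISIBILITYMAP_ALL_VISIBLE = 0x01    # visibility map bit
VISIBILITYMAP_ALL_FROZEN = 0x02     # visibility map bit


def vm_consistent(pd_flags, vm_bits):
    """VM bits may be set only if the heap page has PD_ALL_VISIBLE set."""
    if vm_bits & (VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN):
        return bool(pd_flags & PD_ALL_VISIBLE)
    return True


def simulate(write_lost):
    """Hypothetical model of the scenario; returns (heap flags, VM bits) on disk."""
    # On-disk state before VACUUM touches the page.
    disk = {"heap_flags": PD_HAS_FREE_LINES, "vm_bits": 0}

    # The page is read into shared_buffers.
    buffered_flags = disk["heap_flags"]

    # VACUUM freezes the buffered page and sets both VM bits;
    # assume the VM change does reach disk.
    buffered_flags |= PD_ALL_VISIBLE
    disk["vm_bits"] = VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN

    # The page is evicted. If the write is lost (or later undone),
    # the old heap page version is what remains on disk.
    if not write_lost:
        disk["heap_flags"] = buffered_flags

    return disk["heap_flags"], disk["vm_bits"]


if __name__ == "__main__":
    for lost in (False, True):
        heap_flags, vm_bits = simulate(lost)
        print(f"write_lost={lost}: heap flags=0x{heap_flags:04x}, "
              f"vm bits=0x{vm_bits:02x}, "
              f"consistent={vm_consistent(heap_flags, vm_bits)}")
```

With the write lost, the model ends up with heap flags 0x0001 and both
VM bits set, which matches what you are seeing. It says nothing about
which layer actually dropped the write.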

I think that bad hardware could cause this, or running two copies of
the server on the same data files at the same time, or maybe some kind
of filesystem-related flakiness, especially if, for example, you are
using a network filesystem like NFS, or maybe a broken iSCSI stack.
There is also no reason it couldn't be a bug in PostgreSQL itself,
although if we lost page writes routinely somebody would surely have
noticed by now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

