On Sun, Jan 5, 2020 at 11:00 PM chenhj <chjis...@163.com> wrote:
> According to above information, the flags of the heap page (163363) with the
> problem tuple (163363, 9) are 0x0001 (HAS_FREE_LINES), that is, ALL_VISIBLE is
> not set.
>
> However, according to the hexdump content of the corresponding vm file, that
> block (location is 9F88 + 6bit) has the VISIBILITYMAP_ALL_FROZEN and
> VISIBILITYMAP_ALL_VISIBLE flags set. That is, the heap file and the vm file are
> inconsistent.
That's not supposed to happen, and represents data corruption. Your previous report of a too-old xmin surviving in the heap is also corruption. There is no guarantee that both problems have the same cause, but suppose they do.

One possibility is that a write to the heap page got lost or undone. Suppose that, while this page was in shared_buffers, VACUUM came through and froze it, setting the bits in the VM and later truncating CLOG. Then suppose that when the page was evicted from shared_buffers, it didn't really get written back to disk, or it did, but the old version later reappeared somehow. The VM would still say the page is all-visible and all-frozen, while the heap page on disk would be the pre-freeze image: PD_ALL_VISIBLE clear, and the old, unfrozen xmin still present even though its CLOG data has since been truncated away. I think that would produce these symptoms.

Bad hardware could cause this, as could running two copies of the server on the same data files at the same time, or some kind of filesystem-related flakiness, especially if you are using a network filesystem like NFS or a broken iSCSI stack. There is also no reason it couldn't be a bug in PostgreSQL itself, although if we lost page writes routinely, somebody would surely have noticed by now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company