Hi hackers,

We have some problems on our production with hint bits and frozen tuples.
More and more following errors began to appear on master after switchover:
ERROR:  58P01: could not access status of transaction 1952523525
DETAIL:  Could not open file "pg_clog/0746": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:896

We investigated the problem with pageinspect and found the tuples that are the 
cause:

xdb311g(master)=# select * from mytable where ctid = '(4,21)';
ERROR:  58P01: could not access status of transaction 1951521353
DETAIL:  Could not open file "pg_clog/0745": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:896

But the same query successfully executed on replica.

We found some difference in hint bits between master and replica:

xdb311g(master)=# SELECT t_xmin, t_infomask::bit(32) & X'0300'::int::bit(32) 
FROM heap_page_items(get_raw_page(‘mytable',4)) where lp=21;
-[ RECORD 1 ]------------------------------
t_xmin   | 1951521353
?column? | 00000000000000000000000000000000

old master, now replica:
xdb311e(replica)=# SELECT t_xmin, t_infomask::bit(32) & X'0300'::int::bit(32) 
FROM heap_page_items(get_raw_page(‘mytable',4)) where lp=21;
-[ RECORD 1 ]------------------------------
t_xmin   | 1951521353
?column? | 00000000000000000000001100000000

X’0300’ = HEAP_XMIN_FROZEN according to

#define HEAP_XMIN_COMMITTED     0x0100  /* t_xmin committed */
#define HEAP_XMIN_INVALID       0x0200  /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN        (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)

xdb311g(master)=# select relfrozenxid from pg_class where relname = ‘mytable';
relfrozenxid
--------------
2266835605
(1 row)

This tuple must be frozen but there are no set bits HEAP_XMIN_COMMITTED and 
HEAP_XMIN_INVALID on master

Another interesting thing that LSN of this page on master and replica are not 
the same:
xdb311g(master)=# select lsn from page_header(get_raw_page(‘mytable',4));
   lsn
---------------
8092/6A26DD08
(1 row)

xdb311e(replica)=# select lsn from page_header(get_raw_page(‘mytable',4));
   lsn
---------------
838D/C4A0D280
(1 row)

And LSN on replica is greater that LSN on master (838D/C4A0D280 > 8092/6A26DD08)
How can this be possible?

We wrote a query which returns ctid of frozen tuples, which must be frozen but 
not actually frozen.

xdb311e(replica)=# select t_ctid from generate_series(0, 
pg_relation_size(‘mytable')/8192 - 1 ) s(i) left join lateral 
heap_page_items(get_raw_page(‘mytable',s.i::int)) on true where 
t_xmin::text::bigint < (select relfrozenxid::text::bigint from pg_class where 
relname = ‘mytable') and t_infomask & X'0300'::int < 1;
t_ctid
-----------
(400,16)
(2837,71)
(2837,72)
(2837,73)
(2837,75)
(2837,76)
(3042,40)
(4750,80)
(4750,81)
(5214,60)
(5214,65)
(6812,31)
(6912,63)
(7329,8)
(7374,26)
(7374,27)
(16 rows)
Same query on master returns 317 rows.

Our thoughts:
1) We think that it is related to switchover.
2) Any WAL-logged modification of this page on master will replace this page on 
replica due to full page writes.
 And all replicas will have broken hint bits too. It’s dangerous.

Where to dig further?

RHEL6, PostgreSQL 9.6.3, wal_log_hints=off, full_page_writes=on, fsync=on, 
checksums disabled.
We don’t think that it is any hardware-related problems because this databases 
started from 9.4
and they survived 2 upgrades with pg_upgrade. And any hardware-related problems 
was not detected. 
Problem appears not only in this shard.
Size of each shard is around 5TB and we can’t provide data.

Regards
Dmitriy Sarafannikov

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to