On 1/20/19 5:07 PM, chenhj wrote: > In our PG 10.2(CentOS 7.3) database, the following error is reported when > querying a table. We have already restored the production data through > backup, but i want to confirm what may be the reason and how to avoid it in > the future. > > lma=# select count(*) from bi_dm.tdm_ttk_site_on_way_rt; > ERROR: could not access status of transaction 3250922107 > DETAIL: Could not open file "pg_xact/0C1C": No such file or directory. > > Here are some related information > > The CLOG files in pg_xact diractory is as follow: > > 0C4A(Last update date: 2018/12/29) > ... > 0D09(Last update date: 2019/01/13) >
Yes, that very much looks like a data corruption, probably due to truncating the clog too early or something like that. > ... > > A similar problem has been reported in 9.0, but there is no reason to mention > it. > > https://www.postgresql.org/message-id/flat/1300970362.2349.27.camel%40stevie > The symptoms are the same, but that's far from sufficient to conclude it's the same root cause. > Currently I suspect that it may be the same problem as the bug below. is it > possible? > > The bug will cause some sessions to cache the wrong relfrozenxid of the > table. The session that may call vac_truncate_clog() will clean up the clog > after the actual relfrozenxid due to reading the wrong relfrozenxid. > > https://www.postgresql.org/message-id/flat/20180809161148.GB22623%40momjian.us#a7cc4d41464064b7752a5574eb74a06d > Maybe. But it'll be hard to confirm it's what happened. It also shows why it's important to keep up with minor updates (you're running 10.3, which is almost 1 year old). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services