On 1/20/19 5:07 PM, chenhj wrote:
> In our PG 10.2 (CentOS 7.3) database, the following error is reported when
> querying a table. We have already restored the production data from a
> backup, but I want to confirm what the cause might be and how to avoid it
> in the future.
> 
>       lma=# select count(*) from bi_dm.tdm_ttk_site_on_way_rt;
>       ERROR:  could not access status of transaction 3250922107
>       DETAIL:  Could not open file "pg_xact/0C1C": No such file or directory.
> 
> Here is some related information:
>
> The CLOG files in the pg_xact directory are as follows:
> 
>       0C4A (Last update date: 2018/12/29)
>       ...
>       0D09 (Last update date: 2019/01/13)
> 

Yes, that very much looks like data corruption, probably due to
truncating the clog too early or something like that.
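
The file name in the error can be cross-checked against the listing
above with a bit of arithmetic. This is only a minimal sketch, assuming
a default build (8 kB blocks, 2 status bits per transaction, 32 pages
per segment); the helper names are made up for illustration and are not
PostgreSQL functions.

```python
# Assumed default-build constants; a non-default block size changes the math.
BLCKSZ = 8192                   # bytes per clog page
CLOG_XACTS_PER_BYTE = 4         # 2 status bits per transaction
SLRU_PAGES_PER_SEGMENT = 32     # pages per pg_xact segment file

XACTS_PER_SEGMENT = BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT


def clog_segment_name(xid: int) -> str:
    """Hypothetical helper: pg_xact file name holding the status of xid."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


def first_xid_in_segment(name: str) -> int:
    """Hypothetical helper: lowest xid position covered by a segment file."""
    return int(name, 16) * XACTS_PER_SEGMENT


print(clog_segment_name(3250922107))   # 0C1C
print(first_xid_in_segment("0C4A"))    # 3298820096
```

Under those assumptions, transaction 3250922107 maps to segment 0C1C,
while the oldest remaining file (0C4A) starts at 3298820096. So the
tuple references a transaction older than anything still kept in
pg_xact, which fits the "truncated too early" theory.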

> ...
> 
> A similar problem was reported in 9.0, but the cause was not mentioned
> there.
> 
> https://www.postgresql.org/message-id/flat/1300970362.2349.27.camel%40stevie
> 

The symptoms are the same, but that's far from sufficient to conclude
it's the same root cause.

> Currently I suspect that it may be the same problem as the bug below. Is it
> possible?
>
> The bug causes some sessions to cache the wrong relfrozenxid of the
> table. A session that then calls vac_truncate_clog() may truncate the clog
> past the actual relfrozenxid because it read the wrong relfrozenxid.
> 
> https://www.postgresql.org/message-id/flat/20180809161148.GB22623%40momjian.us#a7cc4d41464064b7752a5574eb74a06d
> 

Maybe. But it'll be hard to confirm it's what happened. It also shows
why it's important to keep up with minor updates (you're running 10.2,
which is almost 1 year old).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
