On Fri, Aug 31, 2018 at 8:48 PM Dave Peticolas <d...@krondo.com> wrote:

> On Fri, Aug 31, 2018 at 5:19 PM Adrian Klaver <adrian.kla...@aklaver.com>
> wrote:
>
>> On 08/31/2018 08:51 AM, Dave Peticolas wrote:
>> > On Fri, Aug 31, 2018 at 8:14 AM Adrian Klaver
>> > <adrian.kla...@aklaver.com> wrote:
>> >
>> > > On 08/31/2018 08:02 AM, Dave Peticolas wrote:
>> > > > Hello, I'm running into the following error running a large query
>> > > > on a database restored from WAL replay:
>> > > >
>> > > > could not access status of transaction 330569126
>> > > > DETAIL: Could not open file "pg_clog/0C68": No such file or directory
>> > >
>> > > Postgres version?
>> >
>> > Right! Sorry, that original email didn't have a lot of info. This is
>> > 9.6.9 restoring a backup from 9.6.8.
>> >
>> > > Where is the replay coming from?
>> >
>> > From a snapshot and WAL files stored in Amazon S3.
>>
>> Seems the process is not creating a consistent backup.
>
> This time, yes. This setup has been working for almost two years with
> probably hundreds of restores in that time. But nothing's perfect I guess :)
>
>> How are they being generated?
>
> The snapshots are sent to S3 via a tar process after calling the start
> backup function. I am following the postgres docs here. The WAL files are
> just copied to S3.
>
>> > > Are you sure you are not working across versions?
>> >
>> > I am sure, they are all 9.6.
>> >
>> > > If not do pg_clog/ and 0C68 actually exist?
>> >
>> > pg_clog definitely exists, but 0C68 does not. I think I have
>> > subsequently found the precise row in the specific table that seems to
>> > be the problem. Specifically I can select * from TABLE where id = BADID
>> > - 1 or id = BADID + 1 and the query returns. I get the error if I select
>> > the row with the bad ID.
>> >
>> > Now what I'm not sure of is how to fix.
>>
>> One thing I can think of is to rebuild from a later version of your S3
>> data and see if it has all the necessary files.
>
> Yes, I think that's a good idea, I'm trying that.
>
>> There is also pg_resetxlog:
>>
>> https://www.postgresql.org/docs/9.6/static/app-pgresetxlog.html
>>
>> I have not used it, so I can not offer much in the way of tips. Just
>> from reading the docs I would suggest stopping the server and then
>> creating a backup of $PG_DATA(if possible) before using pg_resetxlog.
>
> Thanks, I didn't know about that. The primary DB seems OK so hopefully it
> won't be needed.

Well restoring from a backup of the primary does seem to have fixed the
issue with the corrupt table.
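
For anyone who hits the same error: the row-by-row check described earlier in this thread (selecting id = BADID - 1 and id = BADID + 1 around the failing row) can be sketched as a small scan over a range of ids. This is only an illustration under stated assumptions. `fetch_row` is a hypothetical callable standing in for `select * from TABLE where id = ...`, and `fake_fetch` simulates the error instead of talking to a real server; neither name comes from the thread.

```python
from typing import Callable, Iterable, List


def find_unreadable_ids(
    ids: Iterable[int], fetch_row: Callable[[int], object]
) -> List[int]:
    """Return the ids for which fetch_row raises, i.e. rows that cannot be read."""
    bad = []
    for row_id in ids:
        try:
            fetch_row(row_id)
        except Exception:
            # Any failure is recorded; a real script would catch the
            # database driver's own error class instead.
            bad.append(row_id)
    return bad


def fake_fetch(row_id: int) -> dict:
    """Hypothetical stand-in for a real query; row 42 plays the bad row."""
    if row_id == 42:
        raise RuntimeError("could not access status of transaction")
    return {"id": row_id}


if __name__ == "__main__":
    print(find_unreadable_ids(range(40, 45), fake_fetch))
```

Against a real database, `fetch_row` would run the single-row select, ideally in its own transaction so that one failure does not abort the rest of the scan.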