Hi Magnus,

Thanks for your update. To identify how many tables are corrupted in the database, if I run the command below, will it have any impact on other tables in the production environment?

pg_dump -f /dev/null database
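
For example, would a per-table loop along these lines be a sensible way to narrow down which tables are affected? This is only a rough sketch: "database" is a placeholder for the real database name, and it simply reads every user table with pg_dump and discards the output, so a read error should point at the broken table.

#!/bin/sh
# Rough sketch: dump each user table to /dev/null so that a read error
# identifies the broken table. "database" is a placeholder name.
for t in $(psql -At -d database -c "SELECT schemaname || '.' || tablename
                                    FROM pg_tables
                                    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')")
do
    echo "checking $t"
    pg_dump -d database -t "$t" -f /dev/null || echo "READ ERROR in $t"
done
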
"pg_dump -f /dev/null database" Thanks in advance. Regards, Vasu Madhineni On Fri, Sep 18, 2020 at 3:42 PM Magnus Hagander <mag...@hagander.net> wrote: > That depends on what the problem is and how they fix it. Most likely yes > -- especially since if you haven't enabled data checksums you won't *know* > if things are OK or not. So I'd definitely recommend it even if things > *look* OK. > > //Magnus > > > On Wed, Sep 16, 2020 at 5:06 AM Vasu Madhineni <vasumdba1...@gmail.com> > wrote: > >> I could see block read I/O errors in /var/log/syslog. if those error >> fixed by OS team, will it require recovery. >> >> Also can i use LIMIT and OFFSET to locate corrupted rows? >> >> Thanks in advance >> >> Regards, >> Vasu Madhineni >> >> On Wed, Sep 16, 2020, 01:58 Magnus Hagander <mag...@hagander.net> wrote: >> >>> Try reading them "row by row" until it breaks. That is, SELECT * FROM >>> ... LIMIT 1, then LIMIT 2 etc. For more efficiency use a binary search >>> starting at what seems like a reasonable place looking at the size of the >>> table vs the first failed block to make it faster, but the principle is the >>> same. Once it fails, you've found a corrupt block... >>> >>> //Magnus >>> >>> >>> On Tue, Sep 15, 2020 at 12:46 PM Vasu Madhineni <vasumdba1...@gmail.com> >>> wrote: >>> >>>> Is it possible to identify which rows are corrupted in particular >>>> tables. >>>> >>>> On Tue, Sep 15, 2020 at 5:36 PM Magnus Hagander <mag...@hagander.net> >>>> wrote: >>>> >>>>> >>>>> >>>>> On Tue, Sep 15, 2020 at 11:15 AM Vasu Madhineni < >>>>> vasumdba1...@gmail.com> wrote: >>>>> >>>>>> Hi All, >>>>>> >>>>>> In one of my postgres databases multiple tables got corrupted and >>>>>> followed the below steps but still the same error. >>>>>> >>>>>> 1.SET zero_damaged_pages = on >>>>>> 2. VACUUM ANALYZE, VACUUM FULL >>>>>> but still same error. >>>>>> >>>>> >>>>> >>>>> That is a very destructive first attempt. I hope you took a full >>>>> disk-level backup of the database before you did that, as it can ruin your >>>>> chances for forensics and data recovery for other issues. >>>>> >>>>> >>>>> moh_fa=# VACUUM FULL; >>>>>> ERROR: could not read block 9350 in file "base/1156523/1270812": >>>>>> Input/output error >>>>>> >>>>>> Tried to take backup of tables with pg_dump but same error. files >>>>>> exist physically in base location. >>>>>> >>>>>> How to proceed on this, no backup to restore. >>>>>> >>>>>> >>>>> This is clearly some sort of disk error, and with no backups to >>>>> restore you will definitely be losing data. >>>>> >>>>> I'd start by figuring out which tables have no corruption and do work, >>>>> and back those up (with pg_dump for example) as soon as possible to a >>>>> different machine -- since it's not exactly unlikely that further disk >>>>> errors will appear. >>>>> >>>>> Once you've done that, identify the tables, and then try to do partial >>>>> recovery. For example, if you look at the file 1270812, how big it is? >>>>> PostgreSQL is failing to read block 9350 which is 76595200 bytes into the >>>>> file. If this is at the very end of the file, you can for example try to >>>>> get the data out until that point with LIMIT. If it's in the middle of the >>>>> file, it gets more ticky, but similar approaches can be done. >>>>> >>>>> Also, unless you are running with data checksums enabled, I wouldn't >>>>> fully trust the data in the tables that you *can* read either. 
>>>>> Also, unless you are running with data checksums enabled, I wouldn't fully trust the data in the tables that you *can* read either. Since you clearly have disk issues, they may have caused corruption elsewhere as well, so whatever verification you can do against other tables, you should do as well.
>>>>>
>>>>> You'll of course also want to check any kernel logs or storage system logs to see if they can give you a hint as to what happened, but they are unlikely to actually give you something that will help you fix the problem.
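
On our side, this is roughly what I have been checking so far: the syslog path is the one where I saw the block read I/O errors earlier in the thread, and the database name is from the VACUUM output above.

# Kernel / storage messages about the failing device:
grep -i 'i/o error' /var/log/syslog | tail -n 20
dmesg | grep -i 'i/o error' | tail -n 20

# Whether data checksums are enabled for this cluster:
psql -d moh_fa -At -c "SHOW data_checksums;"
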