Hi Chiru, I am trying to pg_dump the database to have a snapshot of the current state. I've turned on 'zero_damaged_pages' but pg_dump is still failing with an "invalid page header" error - this time from what looks like a sequence object that is auto-setting IDs on a table. Any advice on how to remove this error?
Here is the full query that's failing: SELECT sequence_name, start_value, last_value, increment_by, CASE WHEN increment_by > 0 AND max_value = 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value = -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0 AND min_value = -9223372036854775807 THEN NULL ELSE min_value END AS min_value, cache_value, is_cycled, is_called from unfinishedsubmissionstub_id_seq On Fri, Dec 26, 2014 at 2:35 PM, chiru r <chir...@gmail.com> wrote: > Hi Cory, > > After recovering table turn off *zero_damaged_pages *parameter. > > > On Fri, Dec 26, 2014 at 9:13 PM, Cory Zue <c...@dimagi.com> wrote: > >> Hi all, >> >> Thanks for the responses. Chiru, I'm looking into your suggestion. >> >> Sameer, here is the kernel version info: >> >> Linux dimagi 2.6.32-431.20.5.el6.x86_64 #1 SMP Wed Jul 16 05:26:53 EDT >> 2014 x86_64 x86_64 x86_64 GNU/Linux >> >> Does that seem like it could be a problematic version? >> >> More generally - I'm still wondering whether I should chalk this failure >> up to a transient/random issue, or whether I should be more worried about >> the hardware on the machine. According to our diagnostic tools, disk and >> memory are fine, but it's still not clear to me how it got into this state. >> Any general bits of information regarding the potential causes of these >> types of issues would be much appreciated. >> >> thanks, >> Cory >> >> >> On Fri, Dec 26, 2014 at 6:55 AM, Sameer Kumar <sameer.ku...@ashnik.com> >> wrote: >> >>> On 23 Dec 2014 12:05, "Cory Zue" <c...@dimagi.com> wrote: >>> > >>> > Hi all, >>> > >>> > Our postgres instance on one of our production machines has recently >>> been returning errors of the form "DatabaseError: invalid page header in >>> block 1 of relation base/16384/76623" from normal queries. I've been >>> reading that these are often linked to hardware errors, but I would like to >>> better understand what else it could be or how to determine that for sure. >>> I've filled out the standard issue reporting template below. Any feedback >>> or troubleshooting instructions would be much appreciated. >>> > >>> > --- >>> > A description of what you are trying to achieve and what results you >>> expect.: >>> > >>> > Intermittent queries are failing with the error "DatabaseError: >>> invalid page header in block 1 of relation base/16384/76623" >>> > >>> > PostgreSQL version number you are running: >>> > >>> > PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc >>> (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit >>> > >>> > How you installed PostgreSQL: >>> > >>> > from standard package installer >>> > >>> > Changes made to the settings in the postgresql.conf file: >>> > >>> > >>> > name | current_setting | >>> source >>> > >>> ------------------------------+-----------------------------+---------------------- >>> > checkpoint_completion_target | 0.9 | >>> configuration file >>> > checkpoint_segments | 32 | >>> configuration file >>> > checkpoint_timeout | 15min | >>> configuration file >>> > DateStyle | ISO, MDY | >>> configuration file >>> > default_text_search_config | pg_catalog.english | >>> configuration file >>> > effective_cache_size | 1GB | >>> configuration file >>> > lc_messages | en_US.UTF-8 | >>> configuration file >>> > lc_monetary | en_US.UTF-8 | >>> configuration file >>> > lc_numeric | en_US.UTF-8 | >>> configuration file >>> > lc_time | en_US.UTF-8 | >>> configuration file >>> > log_checkpoints | on | >>> configuration file >>> > log_connections | off | >>> configuration file >>> > log_destination | csvlog | >>> configuration file >>> > log_directory | /opt/data/pgsql/data/pg_log | >>> configuration file >>> > log_disconnections | off | >>> configuration file >>> > log_duration | on | >>> configuration file >>> > log_filename | postgres-%Y-%m-%d_%H%M%S | >>> configuration file >>> > log_lock_waits | on | >>> configuration file >>> > log_min_duration_statement | 250ms | >>> configuration file >>> > log_rotation_age | 1d | >>> configuration file >>> > log_rotation_size | 1GB | >>> configuration file >>> > log_temp_files | 0 | >>> configuration file >>> > log_timezone | Asia/Kolkata | command >>> line >>> > log_truncate_on_rotation | on | >>> configuration file >>> > logging_collector | on | >>> configuration file >>> > maintenance_work_mem | 768MB | >>> configuration file >>> > max_connections | 500 | >>> configuration file >>> > max_stack_depth | 2MB | >>> environment variable >>> > port | 5432 | command >>> line >>> > shared_buffers | 4GB | >>> configuration file >>> > ssl | on | >>> configuration file >>> > TimeZone | Asia/Kolkata | command >>> line >>> > timezone_abbreviations | Default | command >>> line >>> > wal_buffers | 16MB | >>> configuration file >>> > work_mem | 48MB | >>> configuration file >>> > >>> > It's also probably worth noting that postgres is installed on an >>> encrypted volume which is mounted using ecryptfs. >>> > >>> > Operating system and version: >>> > >>> > RedHatEnterpriseServer, version 6.6 >>> > >>> > What program you're using to connect to PostgreSQL: >>> > >>> > Python (django) >>> > >>> > Is there anything relevant or unusual in the PostgreSQL server logs?: >>> > >>> > I see lots of instances of this error (and similar). I'm not sure what >>> else I should be looking for. >>> > >>> > What you were doing when the error happened / how to cause the error: >>> > >>> > I haven't explicitly tried to reproduce it, but it seems to >>> consistently happen with certain queries. However, the system was rebooted >>> shortly before the errors started occuring. The system was rebooted because >>> another database (elasticsearch) was having problems on the same machine >>> and the reboot was to attempt to resolve things. >>> > >>> > The EXACT TEXT of the error message you're getting, if there is one: >>> > >>> > DatabaseError: invalid page header in block 1 of relation >>> base/16384/76623 >>> > >>> > (block and relation numbers change) >>> > >>> > Unfortunately, I'm not completely familiar with the CPU and disk/RAID >>> configurations used on the server. However it is storing to a (software) >>> encrypted volume as mentioned above. >>> > >>> > Have you ever set fsync=off in the postgresql config file? >>> > No >>> > Have you had any unexpected power loss lately? Replaced a failed RAID >>> disk? Had an operating system crash? >>> > Not recently, though the system did reboot normally as described above. >>> > Have you run a file system check? (chkdsk / fsck) >>> > No. >>> > Are there any error messages in the system logs? >>> (unix/linux: dmesg, /var/log/syslog ; >>> > I haven't seen anything obvious but I wasn't sure what to look for. >>> > >>> >>> I guess you missed to provide the details and kernel version (rhel >>> version and kernel level). >>> This will give you kernel patch level- >>> >>> uname -a >>> >>> I had once faced this issue and I was on a buggy patch of Linux kernel. >>> I just had to update to latest patch. That worked for me. >>> >> >> >