Re: [BUGS] BUG #3484: Missing pg_clog file / corrupt index

Decibel! Mon, 30 Jul 2007 22:23:32 -0700

On Jul 25, 2007, at 4:02 PM, Marc Schablewski wrote:

The following bug has been logged online:
Bug reference:      3484
Logged by:          Marc Schablewski
Email address:      [EMAIL PROTECTED]
PostgreSQL version: 8.1.8
Operating system:   SuSE Linux 10.0 / Kernel 2.6.13-15.8-smp (x86-64)
Description:        Missing pg_clog file / corrupt index
Details:
Our application receives and processes payment information thatcomes inplain text files and stores the processed data into differenttables in ourdatabase. There are about 5 tables involved, three of them with 35millionrecords so far. We get approximately 150000 payments a day. Eachpayment ishandled in a single transaction, because in case of an error, wewant tostore as many payments as possible. We have about 500000 INSERT anda fewUPDATE statements each day. The whole application runs on twoservers (seespecs below) which are nearly identical. One is the productionsystem, the
other is for testing.

A few months ago we had some trouble with the test system. Postgres
complained about a missing pg_clog file during nightly routine
VACUUM/ANALYZE. Some days later, the same error occurred on theproductionsystem, even on the same table! The corrupted table is one of thosebiggerones involved into the file processing. After searching the web wefound ahint that this problem could be related to a bug in 8.1.3, so weupgraded to8.1.8 and restored the databases on both servers. This was aroundMay, butnow we discovered the same problem on our production system again.Actually,the error occurred four weeks ago, but it was not discovered untilnow –
if you do error logging, you should look at it from time to time ;)

When trying to backup or vacuum the database, we receive one of the
following error messages:

**BACKUP**
pg_dump: ERROR:  could not access status of transaction 2134240
DETAIL: could not open file "pg_clog/0002": Datei oder Verzeichnisnicht
gefunden
pg_dump: SQL command to dump the contents of table "dateiblock"failed:
PQendcopy() failed.
pg_dump: Error message from server: ERROR:  could not access status of
transaction 2134240
DETAIL: could not open file "pg_clog/0002": Datei oder Verzeichnisnicht
gefunden
pg_dump: The command was: COPY public.dateiblock (id, eda_id,dbt_id, lfdnr,
binaer_offset, laenge) TO stdout;

**VACUUM**
INFO:  vacuuming "public.dateiblock"
ERROR:  could not access status of transaction 2134240
DETAIL: could not open file "pg_clog/0002": Datei oder Verzeichnisnicht
gefunden
(For those not familiar to the German language: ‘Datei oderVerzeichnis
nicht gefunden’ means ‘file or directory not found’).
Current pg_clogs range from 005A to 008F. The oldest one is datedto April
30th.
We narrowed it down to a few records in that table. Some recordscontain
unreasonable values, others produce the same message about the missing
pg_clog file when selected and some are simply missing. But theymust haveexisted, because there are still records in a second tablereferencing them.One strange thing about this is, that the referencing records areabout two
and a half months old and shouldn’t been touched since then.

We don’t think this is a hardware issue, because we had it on two
different servers and within a short period of time.

Luckily, the loss of data is minimal. There are only about 30 records
affected. Otherwise this would have been fatal, because as saidbefore, our
backup was not working either.
In addition, we had another problem today. One of the indexes on asecond
table became corrupted, causing the database backend to rollback all
transactions and restart the processes. A REINDEX fixed it, but itleaves me
with a bad feeling about what will break next.
2007-07-25 08:07:00 CEST PANIC: right sibling's left-link doesn'tmatch
2007-07-25 08:07:00 CEST STATEMENT:  insert into transaktion (kaz_id,
dbl_id, sta_id, kss_id, summe, zeitpunkt, tracenr, terminalid, status,
kartennr
, wae_id, kassenschnitt, freigabe, flo_id, importdatum) VALUES($1, $2, $3,
$4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15)
2007-07-25 08:07:00 CEST LOG: server process (PID 5699) wasterminated by
signal 6
2007-07-25 08:07:00 CEST LOG:  terminating any other active server
processes
2007-07-25 08:07:00 CEST WARNING: terminating connection becauseof crash
of another server process
2007-07-25 08:07:00 CEST DETAIL: The postmaster has commanded thisserver
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.


Kind regards,

Marc Schablewski



System:

OS:     SUSE LINUX 10.0 (x86-64), 2.6.13-15.8-smp x86_64
System: 2x Intel(R) Xeon(TM) CPU 2.80GHz Dual Core, 4GB RAM (HPproliant
server)
3Ware 9500S-4LP, 2xRAID1 (one for OS & one for database)

Postgres 8.1.8

Postgres was compiled manually but with default parameters. The
configuration has been tuned to improve performance.

shared_buffers = 30000
maintenance_work_mem = 131072
max_fsm_pages = 500000
default_statistics_target = 200
redirect_stderr = on
log_directory = '/var/log/pg_log'

log_min_duration_statement = 10
log_duration = on
log_line_prefix = '%t '
stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = off

VACUUM ANALYZE runs once a day, auto_vacuum is off.

Actually, this does sound like a hardware problem to me. You onlyhave 5 tables that get hit heavily, so you've likely got somewherearound a 20% chance that corruption would hit the same table on twodifferent machines.

So far you haven't said anything that sounds unusual about how you'reusing the database, and the hardware certainly seems pretty common-place, so I'm rather doubtful that it's software. But if you stillhave copies of the bad database, someone might be able to help you.

--
Decibel!, aka Jim Nasby                        [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: [BUGS] BUG #3484: Missing pg_clog file / corrupt index

Reply via email to