Hi Kevin!
Thanks for your reply. You make me feel that this is more serious than I
thought.
This is a development server that is also used as a pre-live server.
The pre-live database is restored from a dump of the live database every
night. So far the errors have been in the pre-live database, which is why
I didn't worry too much: it is overwritten every night from backup
anyway. Usually the error was gone the next day. I mostly blamed badly
timed backup and restore scripts, although that shouldn't result in errors.
The errors started on 07.09.2010, when I was still running PostgreSQL
8.1. A few examples:
07.09.2010:
Warning: pg_dump: ERROR: could not open relation with OID 339815468
pg_dump: SQL command to dump the contents of table "kannete_read" failed:
    PQendcopy() failed.
pg_dump: Error message from server: ERROR: could not open relation with
    OID 339815468
pg_dump: The command was: COPY public.kannete_read (yhistu_id,
    kande_rea_id, kande_id, konto_nr, alamkonto_nr, deebetsumma,
    kreeditsumma, deebetsaldo, kreeditsaldo, alamkonto_deebetsaldo,
    alamkonto_kreeditsaldo, looja, loomise_aeg, muutja, muutmise_aeg,
    kuupaev, kande_nr, kinnitatud, deebetprotsent, kreeditprotsent)
    TO stdout;
pg_dumpall: pg_dump failed on database "korteriy_histu", exiting
19.09.2010:
Warning: pg_dump: ERROR: unexpected chunk number 926884437 (expected 514)
    for toast value 1736426835
pg_dump: SQL command to dump the contents of table "failid" failed:
    PQendcopy() failed.
pg_dump: Error message from server: ERROR: unexpected chunk number
    926884437 (expected 514) for toast value 1736426835
pg_dump: The command was: COPY public.failid (faili_id, yhistu_id,
    perioodi_id, arve_id, dokumendi_id, tyyp, sisu, laius, korgus, pikkus,
    faili_nimi, sisu_tyyp, looja, loomise_aeg, muutja, muutmise_aeg)
    TO stdout;
pg_dumpall: pg_dump failed on database "arvetest", exiting
24.09.2010:
Warning: pg_dump: socket not open
pg_dump: SQL command to dump the contents of table "failid" failed:
    PQendcopy() failed.
pg_dump: Error message from server: socket not open
pg_dump: The command was: COPY public.failid (faili_id, yhistu_id,
    perioodi_id, arve_id, dokumendi_id, tyyp, sisu, laius, korgus, pikkus,
    faili_nimi, sisu_tyyp, looja, loomise_aeg, muutja, muutmise_aeg)
    TO stdout;
pg_dumpall: pg_dump failed on database "arvetest", exiting
9.11.2010:
Warning: pg_dump: Dumping the contents of table "maaramised" failed:
    PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection
    unexpectedly
    This probably means the server terminated abnormally before or while
    processing the request.
pg_dump: The command was: COPY public.maaramised (maaramise_id,
    kululiigi_id, perioodi_id, yhistu_id, korteri_id, kogus, yhik, hind,
    summa, looja, loomise_aeg, muutja, muutmise_aeg) TO stdout;
pg_dumpall: pg_dump failed on database "arvetest", exiting
More recently, after I upgraded to 8.4, on 11.02.2011:
Warning: pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: compressed data is corrupt
pg_dump: The command was: COPY public.failid (faili_id, yhistu_id,
    perioodi_id, arve_id, dokumendi_id, tyyp, sisu, laius, korgus, pikkus,
    faili_nimi, sisu_tyyp, looja, loomise_aeg, muutja, muutmise_aeg)
    TO stdout;
pg_dumpall: pg_dump failed on database "korteriy_histu", exiting
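
To see at a glance which table and database each nightly failure hit,
the messages can be summarized mechanically. What follows is only a
minimal Python sketch, not part of the backup script: the function name
summarize_pg_dump_failure and the regular expressions are assumptions
derived solely from the message shapes quoted above, and other pg_dump
versions may word things differently.

```python
import re


def summarize_pg_dump_failure(message):
    """Pull the server error, table and database out of one failure message."""
    # Mail wrapping breaks the message at arbitrary points, so collapse
    # all runs of whitespace before matching.
    text = " ".join(message.split())
    error = re.search(
        r"Error message from server: (.*?) pg_dump: The command was:", text
    )
    table = re.search(r"The command was: COPY (\S+)", text)
    database = re.search(r'pg_dump failed on database "([^"]+)"', text)
    return {
        "error": error.group(1) if error else None,
        "table": table.group(1) if table else None,
        "database": database.group(1) if database else None,
    }


if __name__ == "__main__":
    sample = (
        "Warning: pg_dump: SQL command failed pg_dump: Error message from "
        "server: ERROR: compressed data is corrupt pg_dump: The command was: "
        "COPY public.failid (faili_id, yhistu_id) TO stdout; pg_dumpall: "
        'pg_dump failed on database "korteriy_histu", exiting'
    )
    print(summarize_pg_dump_failure(sample))
```

Any field that cannot be matched comes back as None, so an unfamiliar
message shape shows up as a gap rather than as a wrong value.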
The current error has occurred 3 days in a row - 13-15.03.2011:
Warning: pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: found toasted toast chunk for
    toast value 260340218 in pg_toast_260339342
pg_dump: The command was: COPY public.yhistud_urlcache (id, url, params,
    sess_id, content) TO stdout;
pg_dumpall: pg_dump failed on database "yhistud", exiting
This time the error is not in the pre-live database, and therefore it
doesn't go away.
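
Since the error persists, one way to narrow it down might be to read the
table one row at a time and note which rows fail. The Python sketch below
shows only that loop. The names find_unreadable_rows and fetch_row are
hypothetical, and the commented-out fetch_row assumes the third-party
psycopg2 driver, an autocommit connection, and that the "id" column
identifies a row in yhistud_urlcache; none of this has been run against
the damaged database.

```python
def find_unreadable_rows(row_ids, fetch_row):
    """Return (row_id, error text) for every row whose fetch raises."""
    failures = []
    for row_id in row_ids:
        try:
            fetch_row(row_id)
        except Exception as exc:  # a real run would catch the driver's error class
            failures.append((row_id, str(exc)))
    return failures


# Hypothetical fetch_row for PostgreSQL (assumes psycopg2 and a connection
# "conn" with conn.autocommit = True, so one failed query does not abort
# the queries that follow):
#
#   def fetch_row(row_id):
#       with conn.cursor() as cur:
#           cur.execute(
#               "SELECT content FROM public.yhistud_urlcache WHERE id = %s",
#               (row_id,),
#           )
#           cur.fetchall()  # transferring the value forces it to be read


if __name__ == "__main__":
    # Stand-in fetch function that simulates two damaged rows.
    def fake_fetch(row_id):
        if row_id in (3, 7):
            raise RuntimeError("found toasted toast chunk")
        return "ok"

    print(find_unreadable_rows(range(1, 10), fake_fetch))
```

If the server itself terminates on a bad row, as in the 9.11.2010
failure, the connection is lost and the loop would have to reconnect
before continuing.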
I have not noticed any unusual errors in other services. The server is
also running Subversion, Trac, Apache, Samba, MySQL, Oracle, Tomcat and
so on. PostgreSQL, Subversion, Trac and Apache+PHP are used actively
every day.
Both fsync and full_page_writes are on. OK, I don't have a UPS for this
machine, but power has been stable. Current uptime is 32 days, which I
bet dates from the last kernel update. I run Debian testing on that
machine.
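
For anyone wanting to double-check the same two settings from the
configuration file, here is a rough Python sketch. It is a simplified
reader for postgresql.conf-style text, not PostgreSQL's own parser: the
function name effective_bool_settings is made up, include files and
abbreviated boolean spellings are ignored, and the value the running
server reports (SHOW fsync; SHOW full_page_writes;) remains the
authoritative answer.

```python
TRUE_WORDS = {"on", "true", "yes", "1"}
FALSE_WORDS = {"off", "false", "no", "0"}


def effective_bool_settings(conf_text, names=("fsync", "full_page_writes")):
    """Return the value each named boolean setting would take from conf_text."""
    # Both fsync and full_page_writes default to on when the file does not
    # set them.
    values = {name: True for name in names}
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        # Accept "name = value", "name=value" and "name value".
        key, _, value = line.replace("=", " ", 1).partition(" ")
        key = key.lower()
        value = value.strip().strip("'").lower()
        if key in values:
            if value in TRUE_WORDS:
                values[key] = True
            elif value in FALSE_WORDS:
                values[key] = False
    return values


if __name__ == "__main__":
    example = "#fsync = off\nfsync = on\nfull_page_writes = on\n"
    print(effective_bool_settings(example))
```

A later occurrence of a setting overrides an earlier one, matching the
way the last entry in the file takes effect.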
Currently I blame either faulty memory or a faulty software RAID driver.
I can easily rule out memory by running memtest86 for a few hours. But
how do I rule out the software RAID driver? PostgreSQL has always been
solid for me, so I suspect it least, but you never know...
Now, off to buy a UPS...
Tambet
On 15.03.2011 19:47, Kevin Grittner wrote:
> "Tambet Matiisen" <tambet.matii...@gmail.com> wrote:
>
>> For a few days I've been getting this error from my nightly backup
>> script:
>>
>> Warning: pg_dump: SQL command failed pg_dump: Error message from
>> server: ERROR: found toasted toast chunk for toast value 260340218
>> in pg_toast_260339342 pg_dump: The command was: COPY
>> public.yhistud_urlcache (id, url, params, sess_id, content) TO
>> stdout; pg_dumpall: pg_dump failed on database "yhistud", exiting
>> Warning: Failed to dump pgsql cluster
>
> So you don't have a current backup, and your database is corrupted.
>
> (1) If you still have a backup from before you started getting
> backup failures, keep it safe until everything has settled down and
> is running well for several months.
>
> (2) Stop PostgreSQL and do a full copy of the data directory and
> everything under it to a backup medium or another machine. Keep
> this copy safe for months, too.
>
>> Yesterday I upgraded the server from 8.4.5 to 8.4.7, hoping that
>> this error will go away, but no success.
>
> Newer versions with more bug fixes may be less likely to contain
> bugs which could cause corruption, but an upgrade like that is
> unlikely to "heal" data which is already corrupted.
>
>> I've been getting occasional errors from backup script for several
>> months,
>
> Do you know what those were?
>
>> I have upgraded Linux kernel to 2.6.32, hoping that maybe the
>> problem is in software RAID driver, but no changes, occasionally I
>> still get errors.
>
> Occasionally get what errors?
>
>> I still have to do memory test on the server, but I doubt faulty
>> memories are the problem, because otherwise the server behaves
>> well.
>
> So, no problems other than months of errors on backups? Never any
> OS lockups, power losses, or other abrupt terminations of
> operations?
>
> Also, do you now or have you ever run the database with fsync = off
> or full_page_writes = off?
>
> It is very important to figure out how your data got corrupted;
> otherwise you can't really trust this machine.
>
> -Kevin
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)