On 13-9-2005 16:25, Tom Lane wrote:
> Arjen van der Meijden <[EMAIL PROTECTED]> writes:
>
> It's highly unlikely that that query has anything to do with it, since
> it's not touching anything but system catalogs and not trying to write
> them either.

Indeed, other things trigger it as well.

> The first thing you ought to find out is which table
> 1663/2013826/9975789 is, and look to see if the corrupted LSN value is
> already present on disk in that block.

Well, it's an index, not a table. It was the index
"pg_class_relname_nsp_index" on pg_class(relname, relnamespace).
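For readers following along: in 8.0 the triple in the error message is the relation's RelFileNode (tablespace OID / database OID / relfilenode), and 1663 is the OID of the default tablespace, pg_default. The following is a minimal sketch, assuming that layout, that splits the triple and builds the catalog lookups you would run in psql to name the relation. All helper names here are hypothetical, and it ignores the ".1", ".2" segment files of relations larger than 1GB.

```python
from typing import List, NamedTuple

DEFAULT_TABLESPACE_OID = 1663  # OID of pg_default


class RelFileNode(NamedTuple):
    spc_node: int  # tablespace OID
    db_node: int   # database OID
    rel_node: int  # relfilenode, i.e. the file name on disk


def parse_relfilenode(text: str) -> RelFileNode:
    """Split 'spc/db/rel' as printed in the server's error message."""
    spc, db, rel = (int(part) for part in text.split("/"))
    return RelFileNode(spc, db, rel)


def catalog_queries(node: RelFileNode) -> List[str]:
    """SQL that names the database and, once connected to it, the relation."""
    return [
        f"SELECT datname FROM pg_database WHERE oid = {node.db_node};",
        f"SELECT relname, relkind FROM pg_class WHERE relfilenode = {node.rel_node};",
    ]


def on_disk_path(node: RelFileNode) -> str:
    """Path relative to $PGDATA; only handles the default tablespace."""
    if node.spc_node != DEFAULT_TABLESPACE_OID:
        raise ValueError("non-default tablespace: look under pg_tblspc/ instead")
    return f"base/{node.db_node}/{node.rel_node}"


node = parse_relfilenode("1663/2013826/9975789")
print(on_disk_path(node))
for query in catalog_queries(node):
    print(query)
```

Note that when the damaged file is itself a pg_class index, as here, the pg_class lookup may trip over the same error, which matches what happened with \d and oid2name.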

Using pg_filedump, I extracted the LSN for block 21, and indeed, it was already 67713428 instead of something below 2E73E53C. It wasn't only that block, though; here are a few LSN lines from the same file:

 LSN:  logid     41 recoff 0x676f5174      Special  8176 (0x1ff0)
 LSN:  logid     25 recoff 0x3c6c5504      Special  8176 (0x1ff0)
 LSN:  logid     41 recoff 0x2ea8a270      Special  8176 (0x1ff0)
 LSN:  logid     41 recoff 0x2ea88190      Special  8176 (0x1ff0)
 LSN:  logid      1 recoff 0x68e2f660      Special  8176 (0x1ff0)
 LSN:  logid     41 recoff 0x2ea8a270      Special  8176 (0x1ff0)
 LSN:  logid      1 recoff 0x68e2f6a4      Special  8176 (0x1ff0)
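pg_filedump prints each page LSN as a log id plus a byte offset within that log ("recoff"), and LSNs are ordered by log id first, then offset. The following is a minimal sketch that parses lines like the ones above and lists the pages whose LSN lies beyond a given WAL flush point. The flush point 41/0x2E73E53C is an assumption: it takes the 2E73E53C offset mentioned above and assumes it belongs to log id 41 (0x29). The names `Lsn`, `parse_lsns` and `beyond` are hypothetical.

```python
import re
from typing import List, NamedTuple

# Matches pg_filedump page-header lines such as
# " LSN:  logid     41 recoff 0x676f5174      Special  8176 (0x1ff0)"
LSN_RE = re.compile(r"LSN:\s+logid\s+(\d+)\s+recoff\s+0x([0-9a-fA-F]+)")


class Lsn(NamedTuple):
    # Tuple comparison checks logid first, then recoff,
    # which is how two 8.0-style WAL positions are ordered.
    logid: int
    recoff: int

    def __str__(self) -> str:
        return f"{self.logid:X}/{self.recoff:08X}"


def parse_lsns(dump: str) -> List[Lsn]:
    """Extract every page LSN from pg_filedump output."""
    return [Lsn(int(m.group(1)), int(m.group(2), 16)) for m in LSN_RE.finditer(dump)]


def beyond(lsns: List[Lsn], flushed: Lsn) -> List[Lsn]:
    """Page LSNs ahead of the WAL flush point, i.e. the suspicious pages."""
    return [lsn for lsn in lsns if lsn > flushed]


SAMPLE = """\
 LSN:  logid     41 recoff 0x676f5174      Special  8176 (0x1ff0)
 LSN:  logid     25 recoff 0x3c6c5504      Special  8176 (0x1ff0)
 LSN:  logid     41 recoff 0x2ea8a270      Special  8176 (0x1ff0)
 LSN:  logid     41 recoff 0x2ea88190      Special  8176 (0x1ff0)
 LSN:  logid      1 recoff 0x68e2f660      Special  8176 (0x1ff0)
 LSN:  logid     41 recoff 0x2ea8a270      Special  8176 (0x1ff0)
 LSN:  logid      1 recoff 0x68e2f6a4      Special  8176 (0x1ff0)
"""

FLUSHED = Lsn(41, 0x2E73E53C)  # assumed flush point, see above

for lsn in beyond(parse_lsns(SAMPLE), FLUSHED):
    print(lsn)  # 29/676F5174, 29/2EA8A270, 29/2EA88190, 29/2EA8A270
```

Under that assumption, four of the seven pages shown would be ahead of the flushed WAL position, not just block 21.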

I tried other files, and each one I tried had only LSNs of 0.

When I tried to determine which table that index belonged to (\d indexname in psql), I got the errors again, but for another file (pg_index this time). Another try after that (oid2name ...) reported yet another file (the pg_class table). All those files were last changed around August 25, so there have been no new changes.

On that day I did some active query tuning, and a few times the selects took too long, so I issued immediate shutdowns. There were no warnings about broken records in the log afterwards, though, so I don't believe anything got damaged.

After that I loaded some fresh data from a production database, using either pg_restore or psql < some-file-from-pg_dump.sql (I don't remember which anymore). A few days later I shut down that postgres, installed 8.1-beta and used that instead (in another directory, of course). This 8.0.3 installation only came back up because of a reboot and hasn't been used since.

I guess those system tables got mixed up during that reload?

> If it is, then we've probably
> not got much chance of finding out how it got there.  If it is *not* on
> disk, but you have a repeatable way of causing this to happen starting
> from a clean postmaster start, then that's pretty interesting --- but
> I don't know any way of figuring it out short of groveling through the
> code with a debugger.  If you're not already pretty familiar with the PG
> code, coaching you remotely isn't going to work very well :-(.  I'd be
> glad to look into it if you can get me access to the machine though.

Well, I can very probably give you that access. But as you say, finding out what went wrong is very hard to do.

Best regards,

Arjen van der Meijden

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match
