On Thu, Aug 04, 2011 at 04:16:08PM -0400, Tom Lane wrote:
> daveg <da...@sonic.net> writes:
> > We are seeing "cannot read" and "cannot open" errors too that would be
> > consistent with trying to use a vanished file.
> 
> Yeah, these all seem consistent with the idea that the failing backend
> somehow missed an update for the relation mapping file.  You would get
> the "could not find pg_class tuple" syndrome if the process was holding
> an open file descriptor for the now-deleted file, and otherwise cannot
> open/cannot read type errors.  And unless it later received another
> sinval message for the relation mapping file, the errors would persist.
> 
> If this theory is correct then all of the file-related errors ought to
> match up to recently-vacuumed mapped catalogs or indexes (those are the
> ones with relfilenode = 0 in pg_class).  Do you want to expand your
> logging of the VACUUM FULL actions and see if you can confirm that idea?

At your service, what would you like to see?
 
> Since the machine is running RHEL, I think we can use glibc's
> backtrace() function to get simple stack traces without too much effort.
> I'll write and test a patch and send it along in a bit.

Great.
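
For anyone following along, here is a rough sketch of the sort of thing I
assume such a patch would do. This is only my guess, not the actual patch:
dump_backtrace() and MAX_FRAMES are names I made up for illustration, and
only backtrace() and backtrace_symbols_fd() are real glibc calls.

```c
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): glibc-specific */
#include <unistd.h>     /* STDERR_FILENO, for callers */

#define MAX_FRAMES 64   /* hypothetical cap on captured stack depth */

/*
 * Hypothetical helper (not taken from any PostgreSQL patch): write the
 * current call stack to the file descriptor fd, one frame per line, and
 * return the number of frames captured.
 */
int
dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int   nframes;

    /* Fill frames[] with return addresses, innermost call first. */
    nframes = backtrace(frames, MAX_FRAMES);

    /*
     * backtrace_symbols_fd() writes straight to fd without calling
     * malloc(), which makes it a reasonable fit for error paths.
     */
    backtrace_symbols_fd(frames, nframes, fd);

    return nframes;
}
```

Calling dump_backtrace(STDERR_FILENO) at the point where the error is
raised would put the trace in whatever captures stderr. With glibc,
function names from the main executable generally show up only if it was
linked with -rdynamic; otherwise you get raw addresses.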

Is there any point in trying to capture SI events somehow?

-dg

-- 
David Gould       da...@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
