Tom Lane <[EMAIL PROTECTED]> writes:

> Doug McNaught <[EMAIL PROTECTED]> writes:
> > I'm running VACUUM, then VACUUM ANALYZE (the docs seem to suggest that
> > you need both).  Basically my script is:
>
> VACUUM ANALYZE is a superset of VACUUM; you do not need both.

Good to know.

> > The example I sent was a crash during VACUUM.
>
> Hm.  Another perfectly good theory shot to heck ;-).  It seems unlikely
> that VACUUM would fail because of corrupted data inside a tuple ...
> although corrupted tuple headers could kill it.  Again, though, one
> would think such a crash would be repeatable.

Agreed, given what you've said.

> > Another thing that springs to mind--once the crash happens, the
> > database doesn't respond (or gives fatal errors) to new connections
> > and to queries on existing connections.  Killing the postmaster does
> > nothing--I have to send SIGTERM to all backends and the postmaster in
> > order to get it to exit.  I don't know if this helps...
>
> Now *this* is interesting.  Normally the system recovers quite nicely
> from an elog(FATAL), or even from a backend coredump.  I now suspect
> something must be getting corrupted in shared memory.  The next time
> it happens, would you proceed as follows:
>       1. kill -INT the postmaster.
>       2. The backends *should* exit in response to the SIGTERM the
>          postmaster will have sent them.  Any backend that survives
>          more than a fraction of a second is stuck somehow.  For each
>          stuck backend, in turn:
>       3. kill -ABORT the backend, to create a corefile, and collect
>          a gdb backtrace from the corefile.  Be careful to get the
>          right corefile, if you are dealing with more than one
>          database.
>
> That should give us some idea of what's stuck (especially if you compile
> with -g).

I don't remember if I did or not, and (like a moron) I blew away the
source tree.  I'll see what gdb tells me about the presence of symbols.
From what I've seen so far, all the backends (other than the one that
actually crashes) seem to survive the SIGTERM I send to the postmaster.
How do I tell which one is which?  The command line?

> BTW, which version did you say you were running?  If it's less than
> 7.0.3 I'd recommend an update before we pursue this much further ...

Just double-checked and it is indeed 7.0.3.  I'll be back with more info
once I get another crash...

-Doug