Tom Lane <[EMAIL PROTECTED]> writes:

> Doug McNaught <[EMAIL PROTECTED]> writes:
> > I'm running VACUUM, then VACUUM ANALYZE (the docs seem to suggest that 
> > you need both).  Basically my script is:
> 
> VACUUM ANALYZE is a superset of VACUUM; you do not need both.

Good to know.

> > The example I sent was a crash during VACUUM.
> 
> Hm.  Another perfectly good theory shot to heck ;-).  It seems unlikely
> that VACUUM would fail because of corrupted data inside a tuple ...
> although corrupted tuple headers could kill it.  Again, though, one
> would think such a crash would be repeatable.

Agreed, given what you've said. 

> > Another thing that springs to mind--once the crash happens, the
> > database doesn't respond (or gives fatal errors) to new connections
> > and to queries on existing connections.  Killing the postmaster does
> > nothing--I have to send SIGTERM to all backends and the postmaster in
> > order to get it to exit.  I don't know if this helps...
> 
> Now *this* is interesting.  Normally the system recovers quite nicely
> from an elog(FATAL), or even from a backend coredump.  I now suspect
> something must be getting corrupted in shared memory.  The next time
> it happens, would you proceed as follows:
>       1. kill -INT the postmaster.
>       2. The backends *should* exit in response to the SIGTERM the
>          postmaster will have sent them.  Any backend that survives
>          more than a fraction of a second is stuck somehow.  For each
>          stuck backend, in turn:
>       3. kill -ABRT the backend, to create a corefile, and collect
>          a gdb backtrace from the corefile.  Be careful to get the
>          right corefile, if you are dealing with more than one
>          database.
> 
> That should give us some idea of what's stuck (especially if you compile
> with -g).

I don't remember if I did or not, and (like a moron) I blew away the
source tree.  I'll see what gdb tells me about the presence of
symbols.

From what I've seen so far, all the backends (other than the one that
actually crashes) seem to survive the SIGTERM I send to the
postmaster.  How do I tell which one is which?  The command line?
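
For reference, the procedure quoted above could be scripted roughly as
follows.  This is only a sketch and has not been run against 7.0.3.
The function names, the grace period, and the idea of identifying
backends as children of the postmaster PID via ps are assumptions of
mine, not something from the PostgreSQL docs.  It prints nothing by
itself; it returns each stuck backend's PID with its command line so
the two can be matched up.

```python
from __future__ import annotations

import os
import signal
import subprocess
import time


def child_processes(ps_output: str, parent_pid: int) -> list[tuple[int, str]]:
    """Return (pid, command line) for every process whose parent is parent_pid.

    ps_output is expected to hold one "PID PPID ARGS" row per line,
    with no header line.
    """
    children = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # skip blank or malformed rows
        if int(fields[1]) == parent_pid:
            args = fields[2] if len(fields) > 2 else ""
            children.append((int(fields[0]), args))
    return children


def is_alive(pid: int) -> bool:
    """Check whether a process exists by sending it signal 0."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def dump_stuck_backends(postmaster_pid: int, grace_seconds: float = 2.0):
    """Hypothetical helper: SIGINT the postmaster, then SIGABRT any survivors.

    Assumes the backends are direct children of the postmaster and that
    core dumps are enabled (e.g. "ulimit -c unlimited") where it was started.
    """
    ps_output = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    backends = child_processes(ps_output, postmaster_pid)

    os.kill(postmaster_pid, signal.SIGINT)   # step 1
    time.sleep(grace_seconds)                # step 2: give backends time to exit

    stuck = [(pid, args) for pid, args in backends if is_alive(pid)]
    for pid, _args in stuck:
        os.kill(pid, signal.SIGABRT)         # step 3: force a corefile
    return stuck
```

Each corefile would still have to be opened in gdb by hand to get the
backtrace.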

> BTW, which version did you say you were running?  If it's less than
> 7.0.3 I'd recommend an update before we pursue this much further ...

Just double-checked and it is indeed 7.0.3.

I'll be back with more info once I get another crash...

-Doug
