RE: [HACKERS] Quite strange crash

2001-01-09 Thread Mikheev, Vadim
> Yup. I had just come to the realization that we'd be best > off to treat the *entire* period from SpinAcquire to SpinRelease > as a critical section for the purposes of die(). That is, response > to SIGTERM will be held off until we have released the spinlock. > Most of the places where we gra

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Tom Lane
[EMAIL PROTECTED] (Nathan Myers) writes: > If a backend dies while holding a lock, doesn't that imply that > the shared memory may be in an inconsistent state? Yup. I had just come to the realization that we'd be best off to treat the *entire* period from SpinAcquire to SpinRelease as a critical
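The idea in this message, holding off the response to SIGTERM until the spinlock has been released, can be sketched roughly as below. This is only an illustration under stated assumptions, not the backend's actual code: CritSectionCount is the only name taken from the thread, while DiePending, DieHandled, process_die(), die_handler(), start_crit_section() and end_crit_section() are hypothetical names invented for the example.

```c
#include <assert.h>
#include <signal.h>

/* CritSectionCount is named in the thread; every other name is hypothetical. */
static volatile sig_atomic_t CritSectionCount = 0; /* nesting depth of critical sections */
static volatile sig_atomic_t DiePending = 0;       /* SIGTERM arrived while inside one */
static volatile sig_atomic_t DieHandled = 0;       /* stand-in for "backend has exited" */

/* Stand-in for the real termination path (die() --> elog(FATAL) in the thread). */
static void process_die(void)
{
    DieHandled = 1;
}

/* SIGTERM handler: act at once only when no critical section is open. */
static void die_handler(int signo)
{
    (void) signo;
    if (CritSectionCount > 0)
        DiePending = 1;     /* remember the request, honor it later */
    else
        process_die();
}

/* Entered before SpinAcquire in this sketch. */
static void start_crit_section(void)
{
    CritSectionCount++;
}

/* Left after SpinRelease; services a SIGTERM that was held off meanwhile. */
static void end_crit_section(void)
{
    assert(CritSectionCount > 0);
    CritSectionCount--;
    if (CritSectionCount == 0 && DiePending)
    {
        DiePending = 0;
        process_die();
    }
}
```

In a real process the handler would be installed with signal() or sigaction(); the point of the sketch is only that a backend can never exit between acquiring and releasing a spinlock.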

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Nathan Myers
On Wed, Jan 10, 2001 at 12:46:50AM +0600, Denis Perchine wrote: > > > Didn't you get my mail with a piece of Linux kernel code? I think all is > > > clear there. > > > > That was implementing CPU-time-exceeded kill, which is a different > > issue. > > Oops.. You are talking about OOM killer. > >

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Tom Lane
Denis Perchine <[EMAIL PROTECTED]> writes: > You will get SIGKILL in most cases. Well, a SIGKILL will cause the postmaster to shut down and restart the other backends, so we should be safe if that happens. (Annoyed as heck, maybe, but safe.) Anyway, this is looking more and more like the SIGTER

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Tom Lane
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: >> Yeah, I suppose. We already do record locking of all the fixed >> spinlocks (BufMgrLock etc), it's just the per-buffer spinlocks that >> are missing from that (and CRIT_SECTION calls). Would it be >> reasonable to assume that only one buffer spinlo

RE: [HACKERS] Quite strange crash

2001-01-09 Thread Mikheev, Vadim
> > START_/END_CRIT_SECTION is mostly CritSectionCount++/--. > > Recording could be made as > > LockedSpinLocks[LockedSpinCounter++] = &spinlock > > in pre-allocated array. > > Yeah, I suppose. We already do record locking of all the fixed > spinlocks (BufMgrLock etc), it's just the per-buffer
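The recording scheme proposed here, LockedSpinLocks[LockedSpinCounter++] = &spinlock in a pre-allocated array, might look something like the sketch below. LockedSpinLocks and LockedSpinCounter come from the message; the array bound, the sketch_slock type, and the three function names are assumptions made for illustration, and the plain assignment stands in for a real test-and-set loop.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCKED_SPINLOCKS 8      /* assumed bound; the thread gives no figure */

typedef volatile int sketch_slock;  /* hypothetical stand-in for a real spinlock type */

/* Pre-allocated record of the spinlocks this process currently holds. */
static sketch_slock *LockedSpinLocks[MAX_LOCKED_SPINLOCKS];
static int LockedSpinCounter = 0;

/* Take the lock and record it. */
static void spin_acquire_recorded(sketch_slock *lock)
{
    assert(LockedSpinCounter < MAX_LOCKED_SPINLOCKS);
    *lock = 1;                      /* stand-in for the test-and-set loop */
    LockedSpinLocks[LockedSpinCounter++] = lock;
}

/* Drop the lock from the record, then release it. */
static void spin_release_recorded(sketch_slock *lock)
{
    int i;

    for (i = LockedSpinCounter - 1; i >= 0; i--)
    {
        if (LockedSpinLocks[i] == lock)
        {
            /* Overwrite the freed slot with the last entry to keep the array dense. */
            LockedSpinLocks[i] = LockedSpinLocks[--LockedSpinCounter];
            break;
        }
    }
    *lock = 0;
}

/* Cleanup path for an exiting process: free whatever is still recorded. */
static int release_all_recorded(void)
{
    int released = LockedSpinCounter;

    while (LockedSpinCounter > 0)
        *LockedSpinLocks[--LockedSpinCounter] = 0;
    return released;
}
```

With such a record, an exit path can release every spinlock the process still holds, per-buffer ones included, rather than leaving them stuck for other backends.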

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Denis Perchine
> > Didn't you get my mail with a piece of Linux kernel code? I think all is > > clear there. > > That was implementing CPU-time-exceeded kill, which is a different > issue. Oops.. You are talking about OOM killer. /* This process has hardware access, be more careful. */ if (cap_t(p->cap_effecti

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Tom Lane
Denis Perchine <[EMAIL PROTECTED]> writes: > Didn't you get my mail with a piece of Linux kernel code? I think all is > clear there. That was implementing CPU-time-exceeded kill, which is a different issue. regards, tom lane

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Tom Lane
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: > START_/END_CRIT_SECTION is mostly CritSectionCount++/--. > Recording could be made as LockedSpinLocks[LockedSpinCounter++] = &spinlock > in pre-allocated array. Yeah, I suppose. We already do record locking of all the fixed spinlocks (BufMgrLock etc

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Denis Perchine
> > The relevance to the issue at hand is that processes dying during > > heavy memory load is a documented feature of our supported platforms. > > Ugh. Do you know anything about *how* they get killed --- ie, with > what signal? Didn't you get my mail with a piece of Linux kernel code? I think

RE: [HACKERS] Quite strange crash

2001-01-09 Thread Mikheev, Vadim
> > Is it true that elog(FATAL) doesn't clean up shmem etc? > > This would be very bad... > > It tries, but I don't think it's possible to make a complete guarantee > without an unreasonable amount of overhead. The case at hand was a > stuck spinlock because die() --> elog(FATAL) had neglected t

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Tom Lane
[EMAIL PROTECTED] (Nathan Myers) writes: > The relevance to the issue at hand is that processes dying during > heavy memory load is a documented feature of our supported platforms. Ugh. Do you know anything about *how* they get killed --- ie, with what signal? regards,

Re: [HACKERS] Quite strange crash

2001-01-09 Thread Vadim Mikheev
> > Well, it's not a good idea because SIGTERM is used for ABORT + EXIT > > (pg_ctl -m fast stop), but shouldn't ABORT clean up everything? > > Er, shouldn't ABORT leave the system in the exact state that it's > in so that one can get a crashdump/traceback on a wedged process > without it trying

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Alfred Perlstein
* Mikheev, Vadim <[EMAIL PROTECTED]> [010108 23:08] wrote: > > >> Killing an individual backend with SIGTERM is bad luck. > > >> The backend will assume that it's being killed by the postmaster, > > >> and will exit without a whole lot of concern for cleaning up shared > > >> memory --- the >

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: > Killing an individual backend with SIGTERM is bad luck. > SIGTERM --> die() --> elog(FATAL) > Is it true that elog(FATAL) doesn't clean up shmem etc? > This would be very bad... It tries, but I don't think it's possible to make a complete gua

RE: [HACKERS] Quite strange crash

2001-01-08 Thread Mikheev, Vadim
> >> Killing an individual backend with SIGTERM is bad luck. > >> The backend will assume that it's being killed by the postmaster, > >> and will exit without a whole lot of concern for cleaning up shared > >> memory --- the SIGTERM --> die() --> elog(FATAL) Is it true that elog(FATAL) doesn'

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes: >> Killing an individual backend with SIGTERM is bad luck. The backend >> will assume that it's being killed by the postmaster, and will exit >> without a whole lot of concern for cleaning up shared memory --- the > What code will be returned to postm

RE: [HACKERS] Quite strange crash

2001-01-08 Thread Mikheev, Vadim
> Killing an individual backend with SIGTERM is bad luck. The backend > will assume that it's being killed by the postmaster, and will exit > without a whole lot of concern for cleaning up shared memory --- the What code will be returned to postmaster in this case? Vadim

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
Denis Perchine <[EMAIL PROTECTED]> writes: > Hmmm... actually this is a real problem with vacuum lazy. Sometimes it > just does something for an enormous amount of time (I have mailed a sample > database to Vadim, but did not get any response yet). It is possible > that it was me who killed the backend

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine
On Monday 08 January 2001 23:21, Tom Lane wrote: > Denis Perchine <[EMAIL PROTECTED]> writes: > >>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. > > > > Were there any errors before that? > > > > Actually you can have a look on the logs yourself. > > Well, I foun

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
Denis Perchine <[EMAIL PROTECTED]> writes: >> It's worth noting here that modern Unixes run around killing user-level >> processes more or less at random when free swap space (and sometimes >> just RAM) runs low. > That's not the case for sure. There are 512Mb on the machine, and when I had > th

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine
> > Well, I found a smoking gun: ... > > What seems to have happened is that 2501 curled up and died, leaving > > one or more buffer spinlocks locked. ... > > There is something pretty fishy about this. You aren't by any chance > > running the postmaster under a ulimit setting that might cut off

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Nathan Myers
On Mon, Jan 08, 2001 at 12:21:38PM -0500, Tom Lane wrote: > Denis Perchine <[EMAIL PROTECTED]> writes: > >>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. > > > > Were there any errors before that? > > > Actually you can have a look on the logs yourself. > > We

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
Denis Perchine <[EMAIL PROTECTED]> writes: >>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. > > Were there any errors before that? > Actually you can have a look on the logs yourself. Well, I found a smoking gun: Jan 7 04:27:51 mx postgres[2501]: FATAL 1: T

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine
> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. > >> > >> Were there any errors before that? > > > > No... Just clean log (I redirect log from stderr/out to file, and all > > other to syslog). > > The error messages would be in the syslog then, not in stderr. Hmmm... The

Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane
Denis Perchine <[EMAIL PROTECTED]> writes: > On Monday 08 January 2001 00:08, Tom Lane wrote: FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. >> >> Were there any errors before that? > No... Just clean log (I redirect log from stderr/out to file, and all > other to syslog

Re: [HACKERS] Quite strange crash

2001-01-07 Thread Denis Perchine
On Monday 08 January 2001 00:08, Tom Lane wrote: > Denis Perchine <[EMAIL PROTECTED]> writes: > > Has anyone seen this on PostgreSQL 7.0.3? > > FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. > > Were there any errors before that? No... Just clean log (I redirect log from

Re: [HACKERS] Quite strange crash

2001-01-07 Thread Tom Lane
Denis Perchine <[EMAIL PROTECTED]> writes: > Has anyone seen this on PostgreSQL 7.0.3? > FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. Were there any errors before that? I've been suspicious for a while that the system might neglect to release buffer cntx_lock spinlocks