> Yup. I had just come to the realization that we'd be best
> off to treat the *entire* period from SpinAcquire to SpinRelease
> as a critical section for the purposes of die(). That is, response
> to SIGTERM will be held off until we have released the spinlock.
> Most of the places where we grab spinlocks ...
[EMAIL PROTECTED] (Nathan Myers) writes:
> If a backend dies while holding a lock, doesn't that imply that
> the shared memory may be in an inconsistent state?
Yup. I had just come to the realization that we'd be best off to treat
the *entire* period from SpinAcquire to SpinRelease as a critical
section for the purposes of die(). That is, response to SIGTERM will be
held off until we have released the spinlock.
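To make the idea concrete, a minimal sketch of holding off die() between
SpinAcquire and SpinRelease might look like the code below. The counter,
flag, function names, and GCC __sync builtins are illustrative assumptions,
not the actual backend code:

#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t SpinHeldCount = 0;  /* spinlocks currently held */
static volatile sig_atomic_t die_pending   = 0;  /* SIGTERM arrived while holding one */

static void
do_die(void)
{
    /* stand-in for die() -> elog(FATAL): clean up and exit */
    _exit(1);
}

static void
die_handler(int signo)
{
    (void) signo;
    if (SpinHeldCount > 0)
        die_pending = 1;        /* inside SpinAcquire..SpinRelease: just remember it */
    else
        do_die();               /* at a safe point: act on the signal immediately */
}

static void
SpinAcquireSketch(volatile int *lock)
{
    SpinHeldCount++;                        /* enter the "critical section" first */
    while (__sync_lock_test_and_set(lock, 1))
        ;                                   /* spin until the lock is ours */
}

static void
SpinReleaseSketch(volatile int *lock)
{
    __sync_lock_release(lock);
    if (--SpinHeldCount == 0 && die_pending)
        do_die();                           /* honor the SIGTERM we postponed */
}

int
main(void)
{
    static int lock = 0;

    signal(SIGTERM, die_handler);
    SpinAcquireSketch(&lock);
    /* ... shared-memory work; a SIGTERM arriving here is only recorded ... */
    SpinReleaseSketch(&lock);               /* any deferred die() fires here */
    return 0;
}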
On Wed, Jan 10, 2001 at 12:46:50AM +0600, Denis Perchine wrote:
> > > Didn't you get my mail with a piece of Linux kernel code? I think all is
> > > clear there.
> >
> > That was implementing CPU-time-exceeded kill, which is a different
> > issue.
>
> Oops... You are talking about the OOM killer.
>
>
Denis Perchine <[EMAIL PROTECTED]> writes:
> You will get SIGKILL in most cases.
Well, a SIGKILL will cause the postmaster to shut down and restart the
other backends, so we should be safe if that happens. (Annoyed as heck,
maybe, but safe.)
Anyway, this is looking more and more like the SIGTERM ...
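As an illustration of how the parent can tell the difference, here is a
minimal sketch (assumed names, not the postmaster's actual code) of
detecting a signal-killed child and deciding to reinitialize:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Returns 1 if the child died in a way that makes shared state suspect,
 * so every backend should be restarted; 0 if its exit looks orderly. */
int
child_exit_requires_reinit(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0)
        return 1;                      /* on error, assume the worst */

    if (WIFSIGNALED(status)) {
        fprintf(stderr, "child %d killed by signal %d\n",
                (int) pid, WTERMSIG(status));
        return 1;                      /* e.g. SIGKILL from the OOM killer */
    }
    return WEXITSTATUS(status) != 0;   /* otherwise trust the reported exit code */
}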
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
>> Yeah, I suppose. We already do record locking of all the fixed
>> spinlocks (BufMgrLock etc), it's just the per-buffer spinlocks that
>> are missing from that (and CRIT_SECTION calls). Would it be
>> reasonable to assume that only one buffer spinlock ...
> > START_/END_CRIT_SECTION is mostly CritSectionCount++/--.
> > Recording could be made as
> > LockedSpinLocks[LockedSpinCounter++] = &spinlock
> > in pre-allocated array.
>
> Yeah, I suppose. We already do record locking of all the fixed
> spinlocks (BufMgrLock etc), it's just the per-buffer spinlocks that
> are missing from that (and CRIT_SECTION calls).
> > Didn't you get my mail with a piece of Linux kernel code? I think all is
> > clear there.
>
> That was implementing CPU-time-exceeded kill, which is a different
> issue.
Oops... You are talking about the OOM killer.
/* This process has hardware access, be more careful. */
if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO))
        force_sig(SIGTERM, p);
else
        force_sig(SIGKILL, p);
Denis Perchine <[EMAIL PROTECTED]> writes:
> Didn't you get my mail with a piece of Linux kernel code? I think all is
> clear there.
That was implementing CPU-time-exceeded kill, which is a different
issue.
regards, tom lane
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> START_/END_CRIT_SECTION is mostly CritSectionCount++/--.
> Recording could be made as LockedSpinLocks[LockedSpinCounter++] = &spinlock
> in pre-allocated array.
Yeah, I suppose. We already do record locking of all the fixed
spinlocks (BufMgrLock etc), it's just the per-buffer spinlocks that
are missing from that (and CRIT_SECTION calls). Would it be reasonable
to assume that only one buffer spinlock ...
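Rendered as code, the suggestion might look roughly like the sketch below.
The array size, the LIFO assumption, and the GCC __sync builtins are
illustrative choices, not the real spin.c/bufmgr.c:

/* All names here are illustrative. */
#define MAX_HELD_SPINLOCKS 16

typedef volatile int slock_t;

static slock_t *LockedSpinLocks[MAX_HELD_SPINLOCKS]; /* pre-allocated: no malloc on this path */
static int      LockedSpinCounter = 0;
static int      CritSectionCount  = 0;

/* Vadim's point: these are cheap, mostly a counter bump. */
#define START_CRIT_SECTION()  (CritSectionCount++)
#define END_CRIT_SECTION()    (CritSectionCount--)

static void
RecordSpinAcquire(slock_t *lock)
{
    START_CRIT_SECTION();                          /* hold off die()/cancel while we spin */
    while (__sync_lock_test_and_set(lock, 1))
        ;                                          /* spin */
    LockedSpinLocks[LockedSpinCounter++] = lock;   /* remember that we hold it */
}

static void
RecordSpinRelease(slock_t *lock)
{
    LockedSpinCounter--;                           /* assumes locks are released LIFO */
    __sync_lock_release(lock);
    END_CRIT_SECTION();
}

/* Error/abort path: drop whatever is still recorded as held, so other
 * backends never see a "stuck" spinlock left behind by a dead process. */
static void
ReleaseHeldSpinLocks(void)
{
    while (LockedSpinCounter > 0)
        __sync_lock_release(LockedSpinLocks[--LockedSpinCounter]);
}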
> > The relevance to the issue at hand is that processes dying during
> > heavy memory load is a documented feature of our supported platforms.
>
> Ugh. Do you know anything about *how* they get killed --- ie, with
> what signal?
Didn't you get my mail with a piece of Linux kernel code? I think all is
clear there.
> > Is it true that elog(FATAL) doesn't clean up shmem etc?
> > This would be very bad...
>
> It tries, but I don't think it's possible to make a complete guarantee
> without an unreasonable amount of overhead. The case at hand was a
> stuck spinlock because die() --> elog(FATAL) had neglected to ...
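For a sense of what "it tries" can mean, here is a rough sketch of an
exit-time cleanup hook list of the kind being discussed. The names and the
fixed-size table are assumptions for illustration, not the actual
elog/proc_exit code:

#include <stdlib.h>

#define MAX_EXIT_HOOKS 32

typedef void (*exit_hook)(void *arg);

static struct { exit_hook fn; void *arg; } exit_hooks[MAX_EXIT_HOOKS];
static int n_exit_hooks = 0;

/* Modules register what they know how to undo: release buffer pins,
 * drop lock-table entries, detach shared memory, and so on. */
void
register_exit_hook(exit_hook fn, void *arg)
{
    if (n_exit_hooks < MAX_EXIT_HOOKS) {
        exit_hooks[n_exit_hooks].fn  = fn;
        exit_hooks[n_exit_hooks].arg = arg;
        n_exit_hooks++;
    }
}

/* The FATAL exit path: run the hooks in reverse order, then exit.
 * Whatever nobody registered a hook for (say, a buffer spinlock grabbed
 * moments before the signal arrived) simply stays behind. */
void
fatal_exit(int code)
{
    while (n_exit_hooks > 0) {
        n_exit_hooks--;
        exit_hooks[n_exit_hooks].fn(exit_hooks[n_exit_hooks].arg);
    }
    exit(code);
}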
[EMAIL PROTECTED] (Nathan Myers) writes:
> The relevance to the issue at hand is that processes dying during
> heavy memory load is a documented feature of our supported platforms.
Ugh. Do you know anything about *how* they get killed --- ie, with
what signal?
regards, tom lane
> > Well, it's not a good idea, because SIGTERM is used for ABORT + EXIT
> > (pg_ctl -m fast stop), but shouldn't ABORT clean up everything?
>
> Er, shouldn't ABORT leave the system in the exact state that it's
> in, so that one can get a crashdump/traceback on a wedged process
> without it trying ...
* Mikheev, Vadim <[EMAIL PROTECTED]> [010108 23:08] wrote:
> > >> Killing an individual backend with SIGTERM is bad luck.
> > >> The backend will assume that it's being killed by the postmaster,
> > >> and will exit without a whole lot of concern for cleaning up shared
> > >> memory --- the
>
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> Killing an individual backend with SIGTERM is bad luck.
> SIGTERM --> die() --> elog(FATAL)
> Is it true that elog(FATAL) doesn't clean up shmem etc?
> This would be very bad...
It tries, but I don't think it's possible to make a complete guarantee
without an unreasonable amount of overhead.
> >> Killing an individual backend with SIGTERM is bad luck.
> >> The backend will assume that it's being killed by the postmaster,
> >> and will exit without a whole lot of concern for cleaning up shared
> >> memory --- the
SIGTERM --> die() --> elog(FATAL)
Is it true that elog(FATAL) doesn't clean up shmem etc?
This would be very bad...
"Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
>> Killing an individual backend with SIGTERM is bad luck. The backend
>> will assume that it's being killed by the postmaster, and will exit
>> without a whole lot of concern for cleaning up shared memory --- the
> What code will be returned to postmaster in this case?
> Killing an individual backend with SIGTERM is bad luck. The backend
> will assume that it's being killed by the postmaster, and will exit
> without a whole lot of concern for cleaning up shared memory --- the
What code will be returned to postmaster in this case?
Vadim
Denis Perchine <[EMAIL PROTECTED]> writes:
> Hmmm... actually this is a real problem with vacuum lazy. Sometimes it
> just does something for an enormous amount of time (I have mailed a
> sample database to Vadim, but did not get any response yet). It is
> possible that it was me who killed the backend
On Monday 08 January 2001 23:21, Tom Lane wrote:
> Denis Perchine <[EMAIL PROTECTED]> writes:
> >>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> >
> > Were there any errors before that?
> >
> > Actually you can have a look on the logs yourself.
>
> Well, I found a smoking gun: ...
Denis Perchine <[EMAIL PROTECTED]> writes:
>> It's worth noting here that modern Unixes run around killing user-level
>> processes more or less at random when free swap space (and sometimes
>> just RAM) runs low.
> That's not the case for sure. There are 512Mb on the machine, and when I had
> th
> > Well, I found a smoking gun: ...
> > What seems to have happened is that 2501 curled up and died, leaving
> > one or more buffer spinlocks locked. ...
> > There is something pretty fishy about this. You aren't by any chance
> > running the postmaster under a ulimit setting that might cut off
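One quick way to check the ulimit theory is to look at the limits the
server process actually inherited. A small sketch using the standard
getrlimit() call (the output format and the set of limits shown are just
illustrative):

#include <stdio.h>
#include <sys/resource.h>

static void
show_limit(const char *name, int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) == 0)
        printf("%-12s soft=%ld hard=%ld\n", name,
               (long) rl.rlim_cur, (long) rl.rlim_max);  /* RLIM_INFINITY prints as -1 */
}

int
main(void)
{
    show_limit("RLIMIT_CPU",  RLIMIT_CPU);   /* CPU seconds before SIGXCPU */
    show_limit("RLIMIT_DATA", RLIMIT_DATA);  /* data segment (heap) size */
    show_limit("RLIMIT_AS",   RLIMIT_AS);    /* total address space */
    return 0;
}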
On Mon, Jan 08, 2001 at 12:21:38PM -0500, Tom Lane wrote:
> Denis Perchine <[EMAIL PROTECTED]> writes:
> >>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> >
> > Were there any errors before that?
>
> > Actually you can have a look on the logs yourself.
>
> Well, I found a smoking gun: ...
Denis Perchine <[EMAIL PROTECTED]> writes:
>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
>
> Were there any errors before that?
> Actually you can have a look on the logs yourself.
Well, I found a smoking gun:
Jan 7 04:27:51 mx postgres[2501]: FATAL 1: T
> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> >>
> >> Were there any errors before that?
> >
> > No... Just clean log (I redirect log from stderr/out to file, and all
> > other to syslog).
>
> The error messages would be in the syslog then, not in stderr.
Hmmm... The
Denis Perchine <[EMAIL PROTECTED]> writes:
> On Monday 08 January 2001 00:08, Tom Lane wrote:
>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
>>
>> Were there any errors before that?
> No... Just clean log (I redirect log from stderr/out to file, and all
> other to syslog).
On Monday 08 January 2001 00:08, Tom Lane wrote:
> Denis Perchine <[EMAIL PROTECTED]> writes:
> > Has anyone seen this on PostgreSQL 7.0.3?
> > FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
>
> Were there any errors before that?
No... Just clean log (I redirect log from stderr/out to file, and all
other to syslog).
Denis Perchine <[EMAIL PROTECTED]> writes:
> Has anyone seen this on PostgreSQL 7.0.3?
> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
Were there any errors before that?
I've been suspicious for a while that the system might neglect to release
buffer cntx_lock spinlocks ...
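For context on where the FATAL message itself comes from, a bounded spin
loop that eventually reports a stuck spinlock might look like the sketch
below. The retry counts, delays, and names are illustrative, not the real
s_lock.c:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SPINS_PER_DELAY  100
#define MAX_WAIT_USEC    (60 * 1000000)    /* give up after about a minute */

void
s_lock_sketch(volatile int *lock, const char *file, int line)
{
    long waited = 0;

    for (;;) {
        int i;

        for (i = 0; i < SPINS_PER_DELAY; i++) {
            if (__sync_lock_test_and_set(lock, 1) == 0)
                return;                    /* got the lock */
        }
        usleep(10000);                     /* back off 10 ms, then try again */
        waited += 10000;

        if (waited >= MAX_WAIT_USEC) {
            /* The holder never let go: most likely it died while holding
             * the lock, which is exactly the failure discussed above. */
            fprintf(stderr,
                    "FATAL: s_lock(%p) at %s:%d, stuck spinlock. Aborting.\n",
                    (void *) lock, file, line);
            abort();
        }
    }
}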