On 2013-12-12 20:45:17 -0500, Tom Lane wrote:
> Memo to hackers: I think the SIGSTOP stuff is rather obsolete now that
> most systems dump core files with process IDs embedded in the names.
> What would be more useful today is an option to send SIGABRT, or some
> other signal that would force core
On Thu, Dec 26, 2013 at 03:18:23PM -0800, Robert Haas wrote:
> On Thu, Dec 26, 2013 at 11:54 AM, Peter Eisentraut wrote:
> > On 12/12/13, 8:45 PM, Tom Lane wrote:
> >> Memo to hackers: I think the SIGSTOP stuff is rather obsolete now that
> >> most systems dump core files with process IDs embedded in the names.
On Thu, Dec 26, 2013 at 11:54 AM, Peter Eisentraut wrote:
> On 12/12/13, 8:45 PM, Tom Lane wrote:
>> Memo to hackers: I think the SIGSTOP stuff is rather obsolete now that
>> most systems dump core files with process IDs embedded in the names.
>
> Which systems are those?
MacOS X dumps core files
On 12/12/13, 8:45 PM, Tom Lane wrote:
> Memo to hackers: I think the SIGSTOP stuff is rather obsolete now that
> most systems dump core files with process IDs embedded in the names.
Which systems are those?
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
On Mon, Dec 16, 2013 at 6:46 AM, Tom Lane wrote:
> Andres Freund writes:
> > Hard to say, the issues fixed in the release are quite important as
> > well. I'd tend to say they are more important. I think we just need to
> > release 9.3.3 pretty soon.
>
> Yeah.
>
Has there been any talk about wh
Tom Lane escribió:
> Andres Freund writes:
> > On 2013-12-16 09:46:19 -0500, Tom Lane wrote:
> >> Are they complete now?
>
> > Hm. There's two issues I know of left, both discovered in #8673:
> > - slru.c:SlruScanDirectory() doesn't support long enough
> > filenames. Afaics that should be a fairly easy fix.
Andres Freund writes:
> On 2013-12-16 09:46:19 -0500, Tom Lane wrote:
>> Are they complete now?
> Hm. There's two issues I know of left, both discovered in #8673:
> - slru.c:SlruScanDirectory() doesn't support long enough
> filenames. Afaics that should be a fairly easy fix.
> - multixact/membe
On 2013-12-16 09:46:19 -0500, Tom Lane wrote:
> Andres Freund writes:
> > The multixact fixes in 9.3.2 weren't complete either... (see recent push)
>
> Are they complete now?
Hm. There's two issues I know of left, both discovered in #8673:
- slru.c:SlruScanDirectory() doesn't support long enough
  filenames. Afaics that should be a fairly easy fix.
Andres Freund writes:
> Hard to say, the issues fixed in the release are quite important as
> well. I'd tend to say they are more important. I think we just need to
> release 9.3.3 pretty soon.
Yeah.
> The multixact fixes in 9.3.2 weren't complete either... (see recent push)
Are they complete now?
On 2013-12-16 08:36:51 -0600, Merlin Moncure wrote:
> On Sat, Dec 14, 2013 at 6:20 AM, Andres Freund wrote:
> > On 2013-12-13 15:49:45 -0600, Merlin Moncure wrote:
> >> Is this an edge case or something that will hit a lot of users?
> >> Arbitrary server panics seems pretty serious...
> >
> > Is y
On Sat, Dec 14, 2013 at 6:20 AM, Andres Freund wrote:
> On 2013-12-13 15:49:45 -0600, Merlin Moncure wrote:
>> On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas wrote:
>> > On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane wrote:
>> >> And while we're on the subject ... isn't bgworker_die() utterly and
>>
On 2013-12-13 13:39:42 -0500, Robert Haas wrote:
> On Fri, Dec 13, 2013 at 1:15 PM, Andres Freund wrote:
> > Agreed on not going forward like now, but I don't really see how they
> > could usefully use die(). I think we should just mandate that every
> > bgworker connected to shared memory register
Hi,
On 2013-12-13 15:57:14 -0300, Alvaro Herrera wrote:
> If there was a way for raising an #error at compile time whenever a
> worker relies on the existing signal handler, I would vote for doing
> that. (But then I have no idea how to do such a thing.)
I don't see a way either given how discon
On 2013-12-13 15:49:45 -0600, Merlin Moncure wrote:
> On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas wrote:
> > On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane wrote:
> >> And while we're on the subject ... isn't bgworker_die() utterly and
> >> completely broken? That unconditional elog(FATAL) means that no process
> >> using that handler can do anything remotely interesting, like say touch
> >> shared memory.
On Dec 13, 2013, at 8:52 AM, Tom Lane wrote:
> Please apply commit 478af9b79770da43a2d89fcc5872d09a2d8731f8 and see
> if that doesn't fix it for you.
It appears to fix it. Thanks!
--
-- Christophe Pettus
x...@thebuild.com
On Dec 13, 2013, at 1:49 PM, Merlin Moncure wrote:
> Is this an edge case or something that will hit a lot of users?
My understanding (Tom can correct me if I'm wrong, I'm sure) is that it is an
issue for servers on 9.3.2 where there are a lot of query cancellations due to
facilities like statement_timeout.
On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas wrote:
> On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane wrote:
>> And while we're on the subject ... isn't bgworker_die() utterly and
>> completely broken? That unconditional elog(FATAL) means that no process
>> using that handler can do anything remotely interesting, like say touch
>> shared memory.
Robert Haas writes:
> It seems to me that we should change every place that temporarily
> changes ImmediateInterruptOK to restore the original value instead of
> making assumptions about what it must have been.
No, that's backwards. The problem isn't that it could be sane to enter,
say, PGSemaph
Robert Haas escribió:
> On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane wrote:
> > And while we're on the subject ... isn't bgworker_die() utterly and
> > completely broken? That unconditional elog(FATAL) means that no process
> > using that handler can do anything remotely interesting, like say touch
> > shared memory.
On Fri, Dec 13, 2013 at 1:15 PM, Andres Freund wrote:
> On 2013-12-13 12:54:09 -0500, Tom Lane wrote:
>> Andres Freund writes:
>> > I wonder what to do about bgworker's bgworker_die()? I don't really see
>> > how that can be fixed without breaking the API?
>>
>> IMO it should be flushed and bgworkers should use the same die() handler
>> as every other backend, or else one like the one in worker_spi
On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane wrote:
> And while we're on the subject ... isn't bgworker_die() utterly and
> completely broken? That unconditional elog(FATAL) means that no process
> using that handler can do anything remotely interesting, like say touch
> shared memory.
Yeah, but f
On 2013-12-13 12:54:09 -0500, Tom Lane wrote:
> Andres Freund writes:
> > I wonder what to do about bgworker's bgworker_die()? I don't really see
> > how that can be fixed without breaking the API?
>
> IMO it should be flushed and bgworkers should use the same die() handler
> as every other backend, or else one like the one in worker_spi
Andres Freund writes:
> I wonder what to do about bgworker's bgworker_die()? I don't really see
> how that can be fixed without breaking the API?
IMO it should be flushed and bgworkers should use the same die() handler
as every other backend, or else one like the one in worker_spi, which just
set
On 2013-12-13 12:19:56 -0500, Tom Lane wrote:
> Andres Freund writes:
> > Shouldn't the HOLD_INTERRUPTS() in handle_sig_alarm() prevent any
> > eventual ProcessInterrupts() in the timeout handlers from doing anything
> > harmful?
>
> Sorry, I misspoke there. The case I'm worried about is doing s
Christophe Pettus writes:
> On Dec 13, 2013, at 8:52 AM, Tom Lane wrote:
>> Please apply commit 478af9b79770da43a2d89fcc5872d09a2d8731f8 and see
>> if that doesn't fix it for you.
> Great, thanks. Would the statement_timeout firing invoke this path? (I'm
> wondering why this particular installation was experiencing this.)
Andres Freund writes:
> On 2013-12-13 11:26:44 -0500, Tom Lane wrote:
>> On closer inspection, I'm thinking that actually it'd be a good idea if
>> handle_sig_alarm did what we do in, for example, HandleCatchupInterrupt:
>> it should save, clear, and restore ImmediateInterruptOK, so as to make
>>
On Dec 13, 2013, at 8:52 AM, Tom Lane wrote:
> Please apply commit 478af9b79770da43a2d89fcc5872d09a2d8731f8 and see
> if that doesn't fix it for you.
Great, thanks. Would the statement_timeout firing invoke this path? (I'm
wondering why this particular installation was experiencing this.)
Christophe Pettus writes:
> Yes, that's what is happening there (I had to check with the client's
> developers). It's possible that the one-minute repeat is due to the
> application reissuing the query, rather than specifically related to the
> spinlock issue. What this does reveal is that al
On 2013-12-13 11:26:44 -0500, Tom Lane wrote:
> On closer inspection, I'm thinking that actually it'd be a good idea if
> handle_sig_alarm did what we do in, for example, HandleCatchupInterrupt:
> it should save, clear, and restore ImmediateInterruptOK, so as to make
> the world safe for timeout handlers to do things that might include a
> CHECK_FOR_INTERRUPTS().
On closer inspection, I'm thinking that actually it'd be a good idea if
handle_sig_alarm did what we do in, for example, HandleCatchupInterrupt:
it should save, clear, and restore ImmediateInterruptOK, so as to make
the world safe for timeout handlers to do things that might include a
CHECK_FOR_INTERRUPTS().
On 2013-12-13 10:30:48 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2013-12-13 09:52:06 -0500, Tom Lane wrote:
> >> I think you're probably right:
> >> what should be in the interrupt handler is something like
> >> "if (ImmediateInterruptOK) CHECK_FOR_INTERRUPTS();"
>
> > Yea, that sounds right. Or just don't set process interrupts there
Andres Freund writes:
> On 2013-12-13 09:52:06 -0500, Tom Lane wrote:
>> I think you're probably right:
>> what should be in the interrupt handler is something like
>> "if (ImmediateInterruptOK) CHECK_FOR_INTERRUPTS();"
> Yea, that sounds right. Or just don't set process interrupts there, it
> do
On 2013-12-13 09:52:06 -0500, Tom Lane wrote:
> Andres Freund writes:
> > Tom, could this be caused by c357be2cd9434c70904d871d9b96828b31a50cc5?
> > Specifically the added CHECK_FOR_INTERRUPTS() in handle_sig_alarm()?
> > ISTM nothing is preventing us from jumping out of code holding a
> > spinlock?
Andres Freund writes:
> Tom, could this be caused by c357be2cd9434c70904d871d9b96828b31a50cc5?
> Specifically the added CHECK_FOR_INTERRUPTS() in handle_sig_alarm()?
> ISTM nothing is preventing us from jumping out of code holding a
> spinlock?
Hm ... what should stop it is that ImmediateInterruptOK
Hi,
On 2013-12-12 19:35:36 -0800, Christophe Pettus wrote:
> On Dec 12, 2013, at 6:41 PM, Andres Freund wrote:
>
> > Christophe: are there any "unusual" ERROR messages preceding the crash,
> > possibly some minutes before?
>
> Interestingly, each spinlock PANIC is *followed*, about one minute later (+/-
> five seconds) by a "canceling statement due to statement timeout"
On Dec 12, 2013, at 7:40 PM, Peter Geoghegan wrote:
> Couldn't that just be the app setting it locally?
Yes, that's what is happening there (I had to check with the client's
developers). It's possible that the one-minute repeat is due to the
application reissuing the query, rather than specifically related to the
spinlock issue.
On Thu, Dec 12, 2013 at 7:35 PM, Christophe Pettus wrote:
> There are a *lot* of "canceling statement due to statement timeout" messages,
> which is interesting, because:
>
> postgres=# show statement_timeout;
> statement_timeout
> ---
> 0
> (1 row)
Couldn't that just be the app setting it locally?
On Dec 12, 2013, at 6:41 PM, Andres Freund wrote:
> Christophe: are there any "unusual" ERROR messages preceding the crash,
> possibly some minutes before?
Interestingly, each spinlock PANIC is *followed*, about one minute later (+/-
five seconds) by a "canceling statement due to statement timeout"
On Dec 12, 2013, at 6:24 PM, Andres Freund wrote:
> Is it really a regular pattern like hourly? What's your
> checkpoint_segments?
No, it's not a pattern like that; that's an approximation. Sometimes, they
come in clusters, sometimes, 2-3 hours past without one. They don't happen
exclusively
On 2013-12-12 21:15:29 -0500, Tom Lane wrote:
> Christophe Pettus writes:
> > On Dec 12, 2013, at 5:45 PM, Tom Lane wrote:
> >> Presumably, we are seeing the victim rather than the perpetrator of
> >> whatever is going wrong.
>
> > This is probing about a bit blindly, but the only thing I can see about this
> > system that is in some way unique
On Thu, Dec 12, 2013 at 5:45 PM, Tom Lane wrote:
> Memo to hackers: I think the SIGSTOP stuff is rather obsolete now that
> most systems dump core files with process IDs embedded in the names.
> What would be more useful today is an option to send SIGABRT, or some
> other signal that would force core
Hi,
On 2013-12-12 13:50:06 -0800, Christophe Pettus wrote:
> Immediately after an upgrade from 9.3.1 to 9.3.2, we have a client getting
> frequent (hourly) errors of the form:
Is it really a regular pattern like hourly? What's your
checkpoint_segments?
Could you, around the time of a crash, c
On Dec 12, 2013, at 6:15 PM, Tom Lane wrote:
> Are you possibly using any nonstandard extensions?
No, totally stock PostgreSQL.
--
-- Christophe Pettus
x...@thebuild.com
Christophe Pettus writes:
> On Dec 12, 2013, at 5:45 PM, Tom Lane wrote:
>> Presumably, we are seeing the victim rather than the perpetrator of
>> whatever is going wrong.
> This is probing about a bit blindly, but the only thing I can see about this
> system that is in some way unique (and this is happening on multiple machines
On Dec 12, 2013, at 5:45 PM, Tom Lane wrote:
> Presumably, we are seeing the victim rather than the perpetrator of
> whatever is going wrong.
This is probing about a bit blindly, but the only thing I can see about this
system that is in some way unique (and this is happening on multiple machines
Christophe Pettus writes:
> On Dec 12, 2013, at 4:23 PM, Andres Freund wrote:
>> Could you install the -dbg package and regenerate?
> Here's another, same system, different crash:
Both of these look like absolutely run-of-the-mill buffer access attempts.
Presumably, we are seeing the victim rather than the perpetrator of
whatever is going wrong.
On Dec 12, 2013, at 4:23 PM, Andres Freund wrote:
> Could you install the -dbg package and regenerate?
Here's another, same system, different crash:
#0 0x7fa03faf5425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x7fa03faf8b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2
On Dec 12, 2013, at 4:23 PM, Andres Freund wrote:
> Could you install the -dbg package and regenerate?
Of course!
#0 0x7f699a4fa425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x7f699a4fdb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x7f699c81991b in errfinish (
On 2013-12-12 16:22:28 -0800, Christophe Pettus wrote:
>
> On Dec 12, 2013, at 4:04 PM, Tom Lane wrote:
> > If you aren't getting a core file for a PANIC, then core
> > files are disabled.
>
> And just like that, we get one. Stack trace:
>
> #0 0x7f699a4fa425 in raise () from /lib/x86_64-
On Dec 12, 2013, at 4:04 PM, Tom Lane wrote:
> If you aren't getting a core file for a PANIC, then core
> files are disabled.
And just like that, we get one. Stack trace:
#0 0x7f699a4fa425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x7f699a4fa425 in raise () from /l
Christophe Pettus writes:
> On Dec 12, 2013, at 3:18 PM, Tom Lane wrote:
>> Hm, a PANIC really ought to result in a core file. You sure you don't
>> have that disabled (perhaps via a ulimit setting)?
> Since it's using the Ubuntu packaging, we have pg_ctl_options = '-c' in
> /etc/postgresql/9.
On Dec 12, 2013, at 3:18 PM, Tom Lane wrote:
> Hm, a PANIC really ought to result in a core file. You sure you don't
> have that disabled (perhaps via a ulimit setting)?
Since it's using the Ubuntu packaging, we have pg_ctl_options = '-c' in
/etc/postgresql/9.3/main/pg_ctl.conf.
> As for the
On Dec 12, 2013, at 3:33 PM, Andres Freund wrote:
> Any other changes but the upgrade? Maybe a different compiler version?
Just the upgrade; they're using the Ubuntu packages from apt.postgresql.org.
> Also, could you share some details about the workload? Highly
> concurrent? Standby? ...
Th
On Dec 12, 2013, at 3:37 PM, Peter Geoghegan wrote:
> Show pg_config output.
Below; it's the Ubuntu package.
BINDIR = /usr/lib/postgresql/9.3/bin
DOCDIR = /usr/share/doc/postgresql-doc-9.3
HTMLDIR = /usr/share/doc/postgresql-doc-9.3
INCLUDEDIR = /usr/include/postgresql
PKGINCLUDEDIR = /usr/incl
On Thu, Dec 12, 2013 at 3:33 PM, Andres Freund wrote:
> Any other changes but the upgrade? Maybe a different compiler version?
Show pg_config output.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.pos
On 2013-12-12 13:50:06 -0800, Christophe Pettus wrote:
> Immediately after an upgrade from 9.3.1 to 9.3.2, we have a client getting
> frequent (hourly) errors of the form:
>
> /var/lib/postgresql/9.3/main/pg_log/postgresql-2013-12-12_211710.csv:2013-12-12
> 21:40:10.328
> UTC,"n","n",32376,"10.
Christophe Pettus writes:
> Immediately after an upgrade from 9.3.1 to 9.3.2, we have a client getting
> frequent (hourly) errors of the form:
> /var/lib/postgresql/9.3/main/pg_log/postgresql-2013-12-12_211710.csv:2013-12-12
> 21:40:10.328
> UTC,"n","n",32376,"10.2.1.142:52451",52aa24eb.7e78,5
Greetings,
Immediately after an upgrade from 9.3.1 to 9.3.2, we have a client getting
frequent (hourly) errors of the form:
/var/lib/postgresql/9.3/main/pg_log/postgresql-2013-12-12_211710.csv:2013-12-12
21:40:10.328
UTC,"n","n",32376,"10.2.1.142:52451",52aa24eb.7e78,5,"SELECT",2013-12-12
21:
Interesting numbers --- thanks for sending them along.
Looks like I was mistaken to think that most platforms would allow
tv_usec >= 1 sec. Ah well, another day, another bug...
regards, tom lane
Tom Lane wrote:
> Judging from the line number, this is in CreateCheckPoint. I'm
> betting that your platform (Solaris 2.7, you said?) has the same odd
> behavior that I discovered a couple days ago on HPUX: a select with
> a delay of tv_sec = 0, tv_usec = 1000000 doesn't delay 1 second like
> a
Peter Schindler <[EMAIL PROTECTED]> writes:
> FATAL: s_lock(fcc01067) at xlog.c:2088, stuck spinlock. Aborting.
Judging from the line number, this is in CreateCheckPoint. I'm
betting that your platform (Solaris 2.7, you said?) has the same odd
behavior that I discovered a couple days ago on HPUX: a select with a
delay of tv_sec = 0, tv_usec = 1000000 doesn't delay 1 second
Can anyone tell me what is going on, when I get a stuck spinlock?
Is there a data corruption or anything else to worry about ?
I've found some references about spinlocks in the -hackers list,
so is that fixed with a later version than beta4 already?
Actually I was running a stack of pgbench jobs
"Oliver Elphick" <[EMAIL PROTECTED]> writes:
> Has anyone got PostgreSQL 7.0.3 working on m68k architecture?
> Russell is trying to install it on m68k and is consistently getting a
> stuck spinlock in initdb. He used to have 6.3.2 working. Both 6.5.3
> and 7.0.3 fail.
His message shows that the first attempt to set a lock fails.
Has anyone got PostgreSQL 7.0.3 working on m68k architecture?
Russell is trying to install it on m68k and is consistently getting a
stuck spinlock in initdb. He used to have 6.3.2 working. Both 6.5.3
and 7.0.3 fail.
His message shows that the first attempt to set a lock fails.
--- Forward