Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-09 Thread Duncan Rance
On 9 Feb 2012, at 15:02, Tom Lane wrote: > Duncan Rance writes: >> Our customers are keen to get the official release as soon as possible. They >> are on 9.0.6, so I guess this'll be 9.0.7? I'm new here so I don't know how >> long this might take, and I promised I'll find out for them. Any idea

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-09 Thread Tom Lane
Duncan Rance writes: > Our customers are keen to get the official release as soon as possible. They > are on 9.0.6, so I guess this'll be 9.0.7? I'm new here so I don't know how > long this might take, and I promised I'll find out for them. Any ideas? There's no firm plan at the moment. The ea

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-09 Thread Duncan Rance
On 8 Feb 2012, at 10:01, Duncan Rance wrote: > On 6 Feb 2012, at 20:48, Tom Lane wrote: > >> bug reports. Please see if you can break REL9_0_STABLE branch tip > > Just to let you know that I built this yesterday and I'm giving it a good > battering in our Solaris 10 Sparc test environment. In

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-08 Thread Duncan Rance
On 6 Feb 2012, at 20:48, Tom Lane wrote: > bug reports. Please see if you can break REL9_0_STABLE branch tip Just to let you know that I built this yesterday and I'm giving it a good battering in our Solaris 10 Sparc test environment. D -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgre

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-07 Thread Bridget Frey
Just a quick update, we have now deployed the patch to all three of our production slave databases, and none has experienced an alloc error or segfault since receiving the patch. So it's looking very good! We would not be able to deploy the whole 9.1 stable build to our production environment sin

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-06 Thread Tom Lane
[ in re bugs 6200 and 6425 ] I've committed patches for all the issues I could find pursuant to these bug reports. Please see if you can break REL9_0_STABLE branch tip (or 9.1 if that's what you're working with). regards, tom lane

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-04 Thread Bridget Frey
We deployed the patch to one of our production slaves at 3:30 PM yesterday (so roughly 20 hours ago), and since then we have not seen any alloc errors. On Feb 2nd, the last full day in which we ran without the patch, we saw 13 alloc errors. We're going to continue monitoring this slave, but we're

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-04 Thread Simon Riggs
On Fri, Feb 3, 2012 at 6:45 AM, Tom Lane wrote: > I wrote: >> I have not gotten very far with the coredump, except to observe that >> gdb says the Assert ought to have passed: ... >> This suggests very strongly that indeed the buffer was changing under >> us. > > I probably ought to let the test c

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-03 Thread Tom Lane
Duncan Rance writes: > On 3 Feb 2012, at 06:45, Tom Lane wrote: >> I probably ought to let the test case run overnight before concluding >> anything, but at this point it's run for two-plus hours with no errors >> after applying this patch: > Thanks Tom! I've had this running for a few hours now w

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-03 Thread Duncan Rance
On 3 Feb 2012, at 06:45, Tom Lane wrote: > > I probably ought to let the test case run overnight before concluding > anything, but at this point it's run for two-plus hours with no errors > after applying this patch: > > diff --git a/src/backend/access/transam/xlog.c > b/src/backend/access/trans

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-03 Thread Bridget Frey
I just wanted to say thanks to everyone who has been working so hard on this issue. I realize it's not certain that this would fix the issues we're seeing, but we'd be willing to try it out and report back. The only caveat is we would need to deploy it to production, so if someone could let us kn

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-02 Thread Tom Lane
Bridget Frey writes: > I just wanted to say thanks to everyone who has been working so hard on > this issue. I realize it's not certain that this would fix the issues > we're seeing, but we'd be willing to try it out and report back. The only > caveat is we would need to deploy it to production,

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-02 Thread Tom Lane
I wrote: > I have not gotten very far with the coredump, except to observe that > gdb says the Assert ought to have passed: ... > This suggests very strongly that indeed the buffer was changing under > us. I probably ought to let the test case run overnight before concluding anything, but at this

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-02 Thread Tom Lane
I wrote: > So far no luck reproducing any issue with this test case. And I swear my finger had barely left the "send" key when: TRAP: FailedAssertion("!(((lpp)->lp_flags == 1))", File: "heapam.c", Line: 735) LOG: server process (PID 24740) was terminated by signal 6: Aborted DETAIL: Failed proc

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-02 Thread Tom Lane
Duncan Rance writes: > At last I have been able to reproduce this problem in a relatively simple > (yet contrived) way. > I've put together a tarball with a few scripts, some to be run on the primary > and others to be run on the hot-standby. There's a README in there explaining > what to do.

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-02 Thread Duncan Rance
On 2 Feb 2012, at 18:02, Duncan Rance wrote: > > At last I have been able to reproduce this problem in a relatively simple > (yet contrived) way. Doh! Should have mentioned this already, but in case a Sparc is not available, the latest on the debugging is as follows: As well as the bus error,

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-02 Thread Duncan Rance
On 1 Feb 2012, at 22:37, Duncan Rance wrote: > On 1 Feb 2012, at 21:43, Tom Lane wrote: > >> If you could post complete instructions for duplicating this, we >> could probably find the cause fairly quickly. > > I've been on this for over a week now, and much of that has been trying to > simplif

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-01 Thread Duncan Rance
On 1 Feb 2012, at 21:43, Tom Lane wrote: >> Client 87 aborted in state 8: ERROR: wrong hoff: 134 > > Yowza. Is this just the standard pgbench test, or something else? This is pgbench with a custom script (-f option.) > If you could post complete instructions for duplicating this, we > could pr

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-01 Thread Tom Lane
Duncan Rance writes: > I mentioned in the bug report that I had asserts in places where t_hoff is > set. I've been doing it like so: > if (hoff % 4 != 0) { > elog(ERROR, "wrong hoff: %d",hoff); > abort(); > } > I've been sitting here waiting for the server to abort and only just realised >

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-01 Thread Alvaro Herrera
Excerpts from Duncan Rance's message of Wed Feb 01 17:43:48 -0300 2012: > I mentioned in the bug report that I had asserts in places where t_hoff is > set. I've been doing it like so: > > if (hoff % 4 != 0) { > elog(ERROR, "wrong hoff: %d",hoff); > abort(); > } > > I've been sitting here wa

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-01 Thread Duncan Rance
On 1 Feb 2012, at 18:10, Robert Haas wrote: > I went looking for commits that might be relevant to this that are new > in 9.0.6, also present in 9.1.2 (per 6200), and related to t_hoff, and > came up with this one: > > Branch: master [039680aff] 2011-11-04 23:22:50 -0400 I looked at this and it s

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-01 Thread Robert Haas
On Wed, Feb 1, 2012 at 11:04 AM, Tom Lane wrote: > Have you read the thread about bug #6200?  I'm suspicious that this is > the same or similar problem, with a slightly different visible symptom > because of pickier hardware.  I'm afraid we don't know what's going on > yet there either, but the id

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-01 Thread Duncan Rance
On 1 Feb 2012, at 16:04, Tom Lane wrote: > postg...@dunquino.com writes: >> This is intermittent and hard to reproduce but crashes consistently in the >> same place. That place is backend/access/common/heaptuple.c line 1104: >> ... >> This system is using streaming replication, and the problem alw

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-01 Thread Tom Lane
postg...@dunquino.com writes: > This is intermittent and hard to reproduce but crashes consistently in the > same place. That place is backend/access/common/heaptuple.c line 1104: > ... > This system is using streaming replication, and the problem always occurs > on the secondary. Have you read t