On 9 Feb 2012, at 15:02, Tom Lane wrote:
> Duncan Rance writes:
>> Our customers are keen to get the official release as soon as possible. They
>> are on 9.0.6, so I guess this'll be 9.0.7? I'm new here so I don't know how
>> long this might take, and I promised I'll find out for them. Any idea
Duncan Rance writes:
> Our customers are keen to get the official release as soon as possible. They
> are on 9.0.6, so I guess this'll be 9.0.7? I'm new here so I don't know how
> long this might take, and I promised I'll find out for them. Any ideas?
There's no firm plan at the moment. The ea
On 8 Feb 2012, at 10:01, Duncan Rance wrote:
> On 6 Feb 2012, at 20:48, Tom Lane wrote:
>
>> bug reports. Please see if you can break REL9_0_STABLE branch tip
>
> Just to let you know that I built this yesterday and I'm giving it a good
> battering in our Solaris 10 Sparc test environment.
In
On 6 Feb 2012, at 20:48, Tom Lane wrote:
> bug reports. Please see if you can break REL9_0_STABLE branch tip
Just to let you know that I built this yesterday and I'm giving it a good
battering in our Solaris 10 Sparc test environment.
D
Just a quick update, we have now deployed the patch to all three of our
production slave databases, and none has experienced an alloc error or
segfault since receiving the patch. So it's looking very good! We would
not be able to deploy the whole 9.1 stable build to our production
environment sin
[ in re bugs 6200 and 6425 ]
I've committed patches for all the issues I could find pursuant to these
bug reports. Please see if you can break REL9_0_STABLE branch tip
(or 9.1 if that's what you're working with).
regards, tom lane
We deployed the patch to one of our production slaves at 3:30 PM yesterday
(so roughly 20 hours ago), and since then we have not seen any alloc
errors. On Feb 2nd, the last full day in which we ran without the patch,
we saw 13 alloc errors. We're going to continue monitoring this slave, but
we're
On Fri, Feb 3, 2012 at 6:45 AM, Tom Lane wrote:
> I wrote:
>> I have not gotten very far with the coredump, except to observe that
>> gdb says the Assert ought to have passed: ...
>> This suggests very strongly that indeed the buffer was changing under
>> us.
>
> I probably ought to let the test c
Duncan Rance writes:
> On 3 Feb 2012, at 06:45, Tom Lane wrote:
>> I probably ought to let the test case run overnight before concluding
>> anything, but at this point it's run for two-plus hours with no errors
>> after applying this patch:
> Thanks, Tom! I've had this running for a few hours now w
On 3 Feb 2012, at 06:45, Tom Lane wrote:
>
> I probably ought to let the test case run overnight before concluding
> anything, but at this point it's run for two-plus hours with no errors
> after applying this patch:
>
> diff --git a/src/backend/access/transam/xlog.c
> b/src/backend/access/trans
I just wanted to say thanks to everyone who has been working so hard on
this issue. I realize it's not certain that this would fix the issues
we're seeing, but we'd be willing to try it out and report back. The only
caveat is we would need to deploy it to production, so if someone could let
us kn
Bridget Frey writes:
> I just wanted to say thanks to everyone who has been working so hard on
> this issue. I realize it's not certain that this would fix the issues
> we're seeing, but we'd be willing to try it out and report back. The only
> caveat is we would need to deploy it to production,
I wrote:
> I have not gotten very far with the coredump, except to observe that
> gdb says the Assert ought to have passed: ...
> This suggests very strongly that indeed the buffer was changing under
> us.
I probably ought to let the test case run overnight before concluding
anything, but at this
I wrote:
> So far no luck reproducing any issue with this test case.
And I swear my finger had barely left the "send" key when:
TRAP: FailedAssertion("!(((lpp)->lp_flags == 1))", File: "heapam.c", Line: 735)
LOG: server process (PID 24740) was terminated by signal 6: Aborted
DETAIL: Failed proc
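The assertion in that TRAP is the backend's check that a line pointer is in the
LP_NORMAL state (lp_flags == 1) before the tuple it points at is touched. As a
rough, self-contained sketch of what is being tested (the real definitions live
in src/include/storage/itemid.h; the layout and names below are an illustration
reproduced from memory, not a copy of the headers):

#include <assert.h>
#include <stdio.h>

/* Line-pointer states; LP_NORMAL is the value 1 seen in the TRAP above. */
#define LP_UNUSED   0   /* unused, no storage */
#define LP_NORMAL   1   /* in use, points at a live tuple */
#define LP_REDIRECT 2   /* HOT redirect */
#define LP_DEAD     3   /* dead */

/* Simplified line-pointer layout: offset, state bits, length. */
typedef struct ItemIdData
{
    unsigned lp_off:15,
             lp_flags:2,
             lp_len:15;
} ItemIdData;

#define ItemIdIsNormal(itemId) ((itemId)->lp_flags == LP_NORMAL)

int
main(void)
{
    ItemIdData lp = { .lp_off = 7936, .lp_flags = LP_NORMAL, .lp_len = 128 };

    /* heapam.c asserts this before following lp_off to the tuple data. */
    assert(ItemIdIsNormal(&lp));
    printf("LP_NORMAL line pointer: off=%u len=%u\n",
           (unsigned) lp.lp_off, (unsigned) lp.lp_len);
    return 0;
}

If the buffer is being rewritten underneath a hot-standby reader, lp_flags can
momentarily hold some other value at the instant of the test, which is the
"buffer was changing under us" scenario described above.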
Duncan Rance writes:
> At last I have been able to reproduce this problem in a relatively simple
> (yet contrived) way.
> I've put together a tarball with a few scripts, some to be run on the primary
> and others to be run on the hot-standby. There's a README in there explaining
> what to do.
On 2 Feb 2012, at 18:02, Duncan Rance wrote:
>
> At last I have been able to reproduce this problem in a relatively simple
> (yet contrived) way.
Doh! Should have mentioned this already, but in case a Sparc is not available,
the latest on the debugging is as follows:
As well as the bus error,
On 1 Feb 2012, at 22:37, Duncan Rance wrote:
> On 1 Feb 2012, at 21:43, Tom Lane wrote:
>
>> If you could post complete instructions for duplicating this, we
>> could probably find the cause fairly quickly.
>
> I've been on this for over a week now, and much of that has been trying to
> simplif
On 1 Feb 2012, at 21:43, Tom Lane wrote:
>> Client 87 aborted in state 8: ERROR: wrong hoff: 134
>
> Yowza. Is this just the standard pgbench test, or something else?
This is pgbench with a custom script (-f option).
> If you could post complete instructions for duplicating this, we
> could pr
Duncan Rance writes:
> I mentioned in the bug report that I had asserts in places where t_hoff is
> set. I've been doing it like so:
> if (hoff % 4 != 0) {
>     elog(ERROR, "wrong hoff: %d", hoff);
>     abort();
> }
> I've been sitting here waiting for the server to abort and only just realised
>
Excerpts from Duncan Rance's message of Wed Feb 01 17:43:48 -0300 2012:
> I mentioned in the bug report that I had asserts in places where t_hoff is
> set. I've been doing it like so:
>
> if (hoff % 4 != 0) {
>     elog(ERROR, "wrong hoff: %d", hoff);
>     abort();
> }
>
> I've been sitting here wa
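To spell out the ad-hoc check quoted in the two messages above: t_hoff (the
offset from the start of a heap tuple to its user data) is expected to be
aligned, and on SPARC an unaligned header access raises SIGBUS, so trapping a
bad value in software leaves a usable core file instead of a hardware fault.
A standalone version of the same idea (check_hoff and the 4-byte constant are
illustrative, matching the "% 4" test above; this is not PostgreSQL source):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define HOFF_ALIGN 4    /* matches the "% 4" test in the quoted check */

static void
check_hoff(uint8_t hoff)
{
    if (hoff % HOFF_ALIGN != 0)
    {
        /* the backend version used elog(ERROR, ...) before abort(),
           so that gdb had a core dump to inspect */
        fprintf(stderr, "wrong hoff: %d\n", hoff);
        abort();
    }
}

int
main(void)
{
    check_hoff(24);     /* fine: 24 is a multiple of 4 */
    check_hoff(134);    /* the value reported above: misaligned, aborts */
    return 0;
}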
On 1 Feb 2012, at 18:10, Robert Haas wrote:
> I went looking for commits that might be relevant to this that are new
> in 9.0.6, also present in 9.1.2 (per 6200), and related to t_hoff, and
> came up with this one:
>
> Branch: master [039680aff] 2011-11-04 23:22:50 -0400
I looked at this and it s
On Wed, Feb 1, 2012 at 11:04 AM, Tom Lane wrote:
> Have you read the thread about bug #6200? I'm suspicious that this is
> the same or similar problem, with a slightly different visible symptom
> because of pickier hardware. I'm afraid we don't know what's going on
> yet there either, but the id
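The "pickier hardware" remark is about SPARC's strict alignment rules: a
misaligned word load that x86 performs silently raises SIGBUS on SPARC, so the
same underlying corruption can surface as a bus error on Solaris/SPARC and as
alloc errors or assertion failures elsewhere. A toy illustration (not code from
the thread; it will only fault on strict-alignment machines):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int
main(void)
{
    char buf[8] = {0};
    int32_t v = 0x01020304;

    memcpy(buf + 1, &v, sizeof(v));        /* store the value at an odd offset */

    int32_t *p = (int32_t *) (buf + 1);    /* deliberately misaligned pointer */
    printf("read back: 0x%08x\n", (unsigned) *p);  /* SIGBUS on SPARC; works on x86 */
    return 0;
}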
On 1 Feb 2012, at 16:04, Tom Lane wrote:
> postg...@dunquino.com writes:
>> This is intermittent and hard to reproduce but crashes consistently in the
>> same place. That place is backend/access/common/heaptuple.c line 1104:
>> ...
>> This system is using streaming replication, and the problem alw
postg...@dunquino.com writes:
> This is intermittent and hard to reproduce but crashes consistently in the
> same place. That place is backend/access/common/heaptuple.c line 1104:
> ...
> This system is using streaming replication, and the problem always occurs
> on the secondary.
Have you read t