Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-01 Thread Duncan Rance
On 1 Feb 2012, at 16:04, Tom Lane wrote:

> postg...@dunquino.com writes:
>> This is intermittent and hard to reproduce but crashes consistently in the
>> same place. That place is backend/access/common/heaptuple.c line 1104:
>> ...
>> This system is using streaming replication, and the problem always occurs
>> on the secondary.
> 
> Have you read the thread about bug #6200?  I'm suspicious that this is
> the same or similar problem, with a slightly different visible symptom
> because of pickier hardware.  I'm afraid we don't know what's going on
> yet there either, but the idea that t_hoff is wrong gives us a new line
> of thought.
> 
>   regards, tom lane

I didn't find #6200 when searching for mentions of this problem, so thanks for 
pointing me to it.

I have read the thread now, and I guess it could be the same kind of thing. I 
have tried creating a cut-down version of the real workload, but that didn't 
cause the problem.

I do have a bunch of core files, but I'm not (yet!) familiar enough with the pg 
code to analyse them usefully.

One idea I saw mentioned in #6200 is to use another hot standby (HS), which I 
hadn't thought of before. I may try creating one or more extra standbys to see 
if I can reproduce the problem more quickly.

Regards,
orval


Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-01 Thread Duncan Rance
On 1 Feb 2012, at 18:10, Robert Haas wrote:
> I went looking for commits that might be relevant to this that are new
> in 9.0.6, also present in 9.1.2 (per 6200), and related to t_hoff, and
> came up with this one:
> 
> Branch: master [039680aff] 2011-11-04 23:22:50 -0400

I looked at this, and it seems specific to doing an ALTER TABLE ADD COLUMN, 
which we're not doing in this case.

I mentioned in the bug report that I have asserts in the places where t_hoff is 
set. I've been doing it like so:

if (hoff % 4 != 0) {
  elog(ERROR, "wrong hoff: %d",hoff);
  abort();
}

I've been sitting here waiting for the server to abort and only just realised 
there are some interesting entries in my pgbench logs. I'm using pgbench to 
hammer the server with queries, and I have a handful of these:

Client 87 aborted in state 8: ERROR:  wrong hoff: 134
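For context on why 134 is suspicious: t_hoff is the offset from the start of the tuple header to the user data, and PostgreSQL pads it to a MAXALIGN boundary. 134 is not a multiple of 4, nor of 8 (assumed here to be the maximum alignment on this 64-bit SPARC build), so data read at tuple + t_hoff can sit at a misaligned address, which SPARC traps as a bus error where x86 would usually tolerate it. The helpers below are a hypothetical standalone sketch that rounds the same way as PostgreSQL's TYPEALIGN macro; they are not the backend's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round len up to the next multiple of align (align must be a power of two).
 * Hypothetical helper; same arithmetic as PostgreSQL's TYPEALIGN macro. */
uintptr_t align_up(uintptr_t len, uintptr_t align)
{
    return (len + (align - 1)) & ~(align - 1);
}

/* True if a header offset is already a multiple of align. */
bool hoff_is_aligned(unsigned hoff, unsigned align)
{
    return align_up(hoff, align) == hoff;
}
```

With these helpers, align_up(134, 8) is 136 and hoff_is_aligned(134, 4) is false, which is what the check above is catching.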

I have these abort() calls in:

backend/access/common/heaptuple.c
backend/access/heap/heapam.c
backend/access/heap/tuptoaster.c

But I know from the text that it must have been from either 
slot_deform_tuple(), heap_form_tuple() or heap_form_minimal_tuple() in 
heaptuple.c.

What I don't get is why this is causing the client to abort, and not the 
backend.

What can I do to get the server to abort at this point? Use PANIC instead of 
ERROR in the elog call perhaps?




Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-01 Thread Duncan Rance
On 1 Feb 2012, at 21:43, Tom Lane wrote:
>> Client 87 aborted in state 8: ERROR:  wrong hoff: 134
> 
> Yowza.  Is this just the standard pgbench test, or something else?

This is pgbench with a custom script (the -f option).

> If you could post complete instructions for duplicating this, we
> could probably find the cause fairly quickly.

I'd love to, really I would! If I did, the instructions would be War & Peace 
length :)

I've been on this for over a week now, and much of that time has gone into 
trying to simplify the test case. I have a lot more to go on now, though, so I 
may make more progress with that soon. (Although it's 10:30pm, so I'm calling 
it a day!)

>> What I don't get is why this is causing the client to abort, and not the 
>> backend.
> 
> As Alvaro said, it's not reaching the abort().  You should use PANIC
> instead.


Yes, thanks, and to Álvaro too. I changed it to PANIC and I now have many, many 
core files to choose from!
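Why the original check never got as far as abort(): elog(ERROR, ...) does not return. The backend's error handling reports the message, longjmps back to its top-level recovery point, aborts the current transaction and keeps the process running, so all pgbench sees is an ERROR for that client. PANIC, on the other hand, finishes with abort() inside the error-reporting code, which is what produces the core files. The sketch below imitates only the ERROR path with plain setjmp/longjmp; fake_elog_error, check_hoff and run_check are hypothetical names, and the real backend uses sigsetjmp/siglongjmp via PG_exception_stack rather than a single global jmp_buf.

```c
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical stand-in for the backend's top-level error recovery point. */
static jmp_buf error_recovery;

/* Set if control ever gets past the error report (it should not). */
int reached_abort_line = 0;

/* Hypothetical stand-in for elog(ERROR, ...): report, then jump to the
 * recovery point. Like the real thing, it never returns to its caller. */
static void fake_elog_error(const char *msg, int value)
{
    fprintf(stderr, "ERROR:  %s: %d\n", msg, value);
    longjmp(error_recovery, 1);
}

/* The hoff check from earlier in the thread, with abort() replaced by a flag. */
static void check_hoff(int hoff)
{
    if (hoff % 4 != 0) {
        fake_elog_error("wrong hoff", hoff);
        reached_abort_line = 1;   /* never executed: the longjmp skips it */
    }
}

/* Returns 1 if the check raised an "error" and we recovered from it,
 * 0 if it passed. The process keeps running either way. */
int run_check(int hoff)
{
    if (setjmp(error_recovery) != 0)
        return 1;   /* like the backend: abandon the query, keep the process */
    check_hoff(hoff);
    return 0;
}
```

run_check(134) returns 1 while reached_abort_line stays 0: the line standing in for abort() is skipped, just as in the pgbench runs.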

Cheers,
Duncan




Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2012-02-02 Thread Duncan Rance
I recently raised "BUG #6425: Bus error in slot_deform_tuple". During the last 
reproduction of the problem I saw this:

Client 2 aborted in state 0: ERROR:  invalid memory alloc request size 18446744073709551613

So like Tom said, these two issues could well be related. I just wanted to 
mention it here in this thread, FYI.
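The huge request size is itself a clue. 18446744073709551613 is 2^64 - 3, which is exactly what a small negative length such as -3 turns into when converted to a 64-bit unsigned size. The sketch below performs only that conversion; as_alloc_request is a hypothetical name, and it uses uint64_t explicitly on the assumption that Size is 64 bits on this build (which the number itself implies).

```c
#include <stdint.h>

/* How a signed length looks to an allocator that takes a 64-bit unsigned
 * size (assumption: 64-bit Size). Conversion wraps modulo 2^64, which is
 * well-defined for unsigned targets in C. */
uint64_t as_alloc_request(int64_t len)
{
    return (uint64_t)len;
}
```

For example, as_alloc_request(-3) returns 18446744073709551613, the value in the error message.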




Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-02 Thread Duncan Rance
On 1 Feb 2012, at 22:37, Duncan Rance wrote:

> On 1 Feb 2012, at 21:43, Tom Lane wrote:
> 
>> If you could post complete instructions for duplicating this, we
>> could probably find the cause fairly quickly.
> 
> I've been on this for over a week now, and much of that has been trying to 
> simplify the test case.

At last I have been able to reproduce this problem in a relatively simple (yet 
contrived) way.

I've put together a tarball with a few scripts, some to be run on the primary 
and others to be run on the hot standby. There's a README in there explaining 
what to do.

I'm going to try attaching it here, although I wouldn't be surprised if 
attachments aren't allowed on the list. Any suggestions for where else to put 
it would be gratefully received.

Cheers,
Duncan



bug_6425.tar.gz
Description: GNU Zip compressed data



Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-02 Thread Duncan Rance
On 2 Feb 2012, at 18:02, Duncan Rance wrote:
> 
> At last I have been able to reproduce this problem in a relatively simple 
> (yet contrived) way.

Doh! I should have mentioned this already, but in case you don't have a Sparc 
available, here is the latest on the debugging:

As well as the bus error, I also saw the same symptom as described in BUG 
#6200. I changed the four places that did an elog ERROR "invalid memory alloc 
request size" to PANIC instead and got a raft of core files.

I have not dug any further as yet, but this is what I see in the following 
function on the stack:

char *
text_to_cstring(const text *t)

The values t and tunpacked are the same, so pg_detoast_datum_packed() did not 
modify t. And len comes out as -4.

A couple of bits from dbx:

(dbx) print -fx t->vl_len_[0]
t->vl_len_[0] = 0xff84
(dbx) examine tunpacked /2x
0x01ceb9dc:  0x8474 0x776f
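A hedged reading of the dbx output above. On a big-endian machine such as SPARC, a first header byte with the high bit set, like 0x84, marks a 1-byte varlena header whose low seven bits hold the total length (here 4, so 3 data bytes), and 0x74 0x77 0x6f are the ASCII bytes "two". So when the core was written, the datum looks like a perfectly valid short text value, and 0xff84 is most likely just the signed char 0x84 sign-extended by the print. A len of -4, by contrast, is what VARSIZE_ANY_EXHDR() would yield for a 4-byte header whose length field is 0, because the 4-byte length includes the header itself. If that is what happened, the bytes changed between computing len and dumping core, and len + 1 = -3 would also account for the 18446744073709551613 alloc request seen earlier, assuming text_to_cstring() allocates len + 1 bytes. The sketch below re-implements only the big-endian length rules, as assumed from 9.0-era postgres.h; varlena_payload_len_be is a hypothetical name, and external TOAST pointers are deliberately left undecoded.

```c
#include <stdint.h>

/* Sentinel for the one header form this sketch does not decode. */
#define PAYLOAD_LEN_UNHANDLED INT32_MIN

/*
 * Payload length (total size minus header) of a varlena starting at p,
 * following the big-endian header rules (assumed from 9.0-era postgres.h):
 *   - high bit set, byte != 0x80: 1-byte header, total size in the low
 *     7 bits, header counts as 1 byte;
 *   - byte == 0x80: 1-byte header of an external TOAST pointer (not handled);
 *   - high bit clear: 4-byte header, total size in the low 30 bits of a
 *     big-endian uint32, header counts as 4 bytes.
 * Hypothetical helper for illustration only.
 */
int32_t varlena_payload_len_be(const unsigned char *p)
{
    if (p[0] & 0x80) {
        if (p[0] == 0x80)
            return PAYLOAD_LEN_UNHANDLED;
        return (int32_t)(p[0] & 0x7F) - 1;
    }
    uint32_t hdr = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                   ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return (int32_t)(hdr & 0x3FFFFFFF) - 4;
}
```

Fed the bytes 0x84 0x74 0x77 0x6f it gives 3; fed an all-zero header it gives -4.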

Going to have a look further up the stack now.

Cheers,
Dunc

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-03 Thread Duncan Rance
On 3 Feb 2012, at 06:45, Tom Lane wrote:
> 
> I probably ought to let the test case run overnight before concluding
> anything, but at this point it's run for two-plus hours with no errors
> after applying this patch:
> 
> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c

Thanks Tom! I've had this running for a few hours now without problems. 
Previously, on Sparc, the problem would occur in less than a minute.

I did try a build with --enable-cassert, and it didn't actually reproduce the 
problem; I think I left it running for about an hour. Although it's a relatively 
modern machine, this Sparc box I am using is painfully slow. My guess is that 
the extra time taken to execute the Assert code is hiding the problem.

Now it's time to persuade the customer to use a patched version of pg ;)

Cheers,
Duncan

P.S. I've been looking for an open-source project to contribute to, and I think 
I'll see if I can help with pg. Time to look at the TODO list :)

Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-08 Thread Duncan Rance
On 6 Feb 2012, at 20:48, Tom Lane wrote:

> bug reports.  Please see if you can break REL9_0_STABLE branch tip


Just to let you know that I built this yesterday and I'm giving it a good 
battering in our Solaris 10 Sparc test environment.

D


Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-09 Thread Duncan Rance
On 8 Feb 2012, at 10:01, Duncan Rance wrote:

> On 6 Feb 2012, at 20:48, Tom Lane wrote:
> 
>> bug reports.  Please see if you can break REL9_0_STABLE branch tip
> 
> Just to let you know that I built this yesterday and I'm giving it a good 
> battering in our Solaris 10 Sparc test environment.

In this environment my bug repro scripts would produce the problem within 
seconds. It has now been running for 24 hours, so I'm confident the problem is 
solved.

Our customers are keen to get the official release as soon as possible. They 
are on 9.0.6, so I guess this will be 9.0.7? I'm new here, so I don't know how 
long this might take, and I promised I'd find out for them. Any ideas?

Thanks,
Duncan


Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple

2012-02-09 Thread Duncan Rance
On 9 Feb 2012, at 15:02, Tom Lane wrote:

> Duncan Rance  writes:
>> Our customers are keen to get the official release as soon as possible. They 
>> are on 9.0.6, so I guess this'll be 9.0.7? I'm new here so I don't know how 
>> long this might take, and I promised I'll find out for them. Any ideas?
> 
> There's no firm plan at the moment.  The earliest it could happen is
> around the end of the month, since various key people have other
> commitments in the next couple weeks.  I'm not promising it *will*
> happen then, but that's the way things look right now.
> 
> (Since you're new around here, I'll explain that the way this works
> is that the pgsql-core and pgsql-packagers lists agree on a release
> date in advance.  We've had some preliminary discussions, and people
> seem to agree that this is a bad enough bug to force a release, but
> no date's been set.  Once a schedule decision is made, some core
> member --- often me --- will announce it on pgsql-hackers, so you
> can keep an eye on that list if you want advance notice.)
> 
>   regards, tom lane

Good explanation. Thanks Tom!


Re: [BUGS] BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)

2012-02-15 Thread Duncan Rance
On 14 Feb 2012, at 18:28, Tom Lane wrote:
> 
> Oh, I see the reason for this: the code in cclass() in regc_locale.c
> doesn't go further up than U+00FF, so no codes above that will be
> thought to be letters (or members of any other character class).
> Clearly we need to go further when we are dealing with UTF8.
> I'm not sure what a sane limit would be though.

The Basic Multilingual Plane goes up to :

https://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Planes
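To make Tom's explanation concrete: a class built by scanning code points up to a fixed limit can only ever contain characters at or below that limit, however the underlying "is letter" test would answer for higher ones. The sketch below is a hypothetical simplification, not the actual cclass() code in regc_locale.c; toy_is_letter is a made-up, deliberately rough predicate standing in for the locale's classification functions, covering only ASCII letters, the Latin-1 letters from U+00C0 up, and Latin Extended-A.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*class_pred)(uint32_t cp);

/* Made-up predicate: ASCII letters, U+00C0..U+00FF minus the multiplication
 * and division signs, and Latin Extended-A (U+0100..U+017F). */
int toy_is_letter(uint32_t cp)
{
    if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z'))
        return 1;
    if (cp >= 0xC0 && cp <= 0xFF && cp != 0xD7 && cp != 0xF7)
        return 1;
    return cp >= 0x100 && cp <= 0x17F;
}

/* Collect every code point in [0, limit] accepted by is_member, up to cap
 * entries. Anything above limit is never tested, so it can never end up
 * in the class. (limit must be below UINT32_MAX or the counter would wrap.) */
size_t build_class(class_pred is_member, uint32_t limit,
                   uint32_t *out, size_t cap)
{
    size_t n = 0;
    for (uint32_t cp = 0; cp <= limit && n < cap; cp++)
        if (is_member(cp))
            out[n++] = cp;
    return n;
}

/* Linear membership test over a built class. */
int class_contains(const uint32_t *cls, size_t n, uint32_t cp)
{
    for (size_t i = 0; i < n; i++)
        if (cls[i] == cp)
            return 1;
    return 0;
}
```

With a limit of 0xFF, U+00E9 (é) is in the class but U+0105 (ą) is not, even though the predicate accepts it; raising the limit fixes that, at the cost of a longer scan, which is why the choice of a sane limit matters.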
