Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple
On 1 Feb 2012, at 16:04, Tom Lane wrote:
> postg...@dunquino.com writes:
>> This is intermittent and hard to reproduce but crashes consistently in the
>> same place. That place is backend/access/common/heaptuple.c line 1104:
>> ...
>> This system is using streaming replication, and the problem always occurs
>> on the secondary.
>
> Have you read the thread about bug #6200? I'm suspicious that this is
> the same or similar problem, with a slightly different visible symptom
> because of pickier hardware. I'm afraid we don't know what's going on
> yet there either, but the idea that t_hoff is wrong gives us a new line
> of thought.
>
> regards, tom lane

I didn't find #6200 when looking for mentions of this problem, so thanks for that. I have read the thread now and I guess it could be the same kind of thing.

I have tried creating a cut-down version of what is happening for real, but that didn't cause the problem. I do have a bunch of core files, but I'm not (yet!) familiar with the pg code, so I am unable to usefully analyse them.

One idea I saw mentioned in #6200 is to use another hot standby (HS). I didn't think of that before. I may try creating another (or more) of them to see if I can reproduce it more quickly.

Regards,
orval

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple
On 1 Feb 2012, at 18:10, Robert Haas wrote:
> I went looking for commits that might be relevant to this that are new
> in 9.0.6, also present in 9.1.2 (per 6200), and related to t_hoff, and
> came up with this one:
>
> Branch: master [039680aff] 2011-11-04 23:22:50 -0400

I looked at this and it seems specific to doing an ALTER TABLE ADD COLUMN, which we're not doing in this case.

I mentioned in the bug report that I have asserts in places where t_hoff is set. I've been doing it like so:

    if (hoff % 4 != 0) {
        elog(ERROR, "wrong hoff: %d", hoff);
        abort();
    }

I've been sitting here waiting for the server to abort and only just realised there are some interesting entries in my pgbench logs. I'm using pgbench to hammer the server with queries, and I have a handful of these:

    Client 87 aborted in state 8: ERROR: wrong hoff: 134

I have these abort() calls in:

    backend/access/common/heaptuple.c
    backend/access/heap/heapam.c
    backend/access/heap/tuptoaster.c

But I know from the text that it must have come from either slot_deform_tuple(), heap_form_tuple() or heap_form_minimal_tuple() in heaptuple.c.

What I don't get is why this is causing the client to abort, and not the backend. What can I do to get the server to abort at this point? Use PANIC instead of ERROR in the elog call perhaps?
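The alignment test described in the message above can be sketched as a standalone function. This is only an illustration: `hoff_is_aligned` and its `alignment` parameter are hypothetical names, not PostgreSQL code, and the divisor of 4 is simply the value used in the message. The alignment a real build requires depends on the platform.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of PostgreSQL): reports whether a tuple
 * header offset is a multiple of the given alignment. An alignment of 0
 * is rejected so the modulo below can never divide by zero.
 */
bool hoff_is_aligned(uint8_t hoff, unsigned alignment)
{
    return alignment != 0 && (hoff % alignment) == 0;
}
```

With the value from the pgbench log, `hoff_is_aligned(134, 4)` is false, since 134 = 4 * 33 + 2.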
Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple
On 1 Feb 2012, at 21:43, Tom Lane wrote:
>> Client 87 aborted in state 8: ERROR: wrong hoff: 134
>
> Yowza. Is this just the standard pgbench test, or something else?

This is pgbench with a custom script (-f option).

> If you could post complete instructions for duplicating this, we
> could probably find the cause fairly quickly.

I'd love to, really I would! If I did, the instructions would be War & Peace length :) I've been on this for over a week now, and much of that has been trying to simplify the test case. I have a lot more to go on now though, so I may make more progress with that soon. (Although it's 10:30pm so I'm calling it a day!)

>> What I don't get is why this is causing the client to abort, and not the
>> backend.
>
> As Alvaro said, it's not reaching the abort(). You should use PANIC
> instead.

Yes thanks, and to Álvaro too. I changed it to PANIC and I now have many many core files to choose from!

Cheers,
Duncan
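Why the abort() was never reached can be illustrated with a small simulation. In the backend, elog(ERROR) does not return to its caller: control jumps to the error recovery point, the error is reported to the client, and the backend carries on. The sketch below imitates that with setjmp/longjmp. Every name in it (`sim_elog_error`, `sim_check_hoff`, `sim_run_query`, `sim_abort_reached`) is hypothetical, and it is a toy model of the control flow, not PostgreSQL's actual error machinery.

```c
#include <setjmp.h>
#include <stdio.h>

/* Stands in for the backend's error recovery point (hypothetical). */
jmp_buf sim_error_handler;

/* Set to 1 only if the line standing in for abort() is ever executed. */
int sim_abort_reached = 0;

/* Simulated elog(ERROR, ...): reports the message, then jumps away.
 * It never returns to its caller. */
void sim_elog_error(int hoff)
{
    fprintf(stderr, "ERROR: wrong hoff: %d\n", hoff);
    longjmp(sim_error_handler, 1);
}

/* The check from the thread, with abort() replaced by a flag. */
void sim_check_hoff(int hoff)
{
    if (hoff % 4 != 0)
    {
        sim_elog_error(hoff);
        sim_abort_reached = 1;  /* stands in for abort(); unreachable */
    }
}

/* Returns 1 if an error was raised and recovered from, 0 otherwise. */
int sim_run_query(int hoff)
{
    if (setjmp(sim_error_handler) != 0)
        return 1;               /* arrived here via longjmp */
    sim_check_hoff(hoff);
    return 0;
}
```

Calling `sim_run_query(134)` returns 1 and leaves `sim_abort_reached` at 0: the "client" sees the error while the process keeps running, which matches the pgbench log entry. Raising the level to PANIC, as suggested in the thread, makes the server itself abort and leave a core file.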
Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT
I recently raised "BUG #6425: Bus error in slot_deform_tuple". During the last reproduction of the problem I saw this:

    Client 2 aborted in state 0: ERROR: invalid memory alloc request size 18446744073709551613

So like Tom said, these two issues could well be related. I just wanted to mention it here in this thread, FYI.
Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple
On 1 Feb 2012, at 22:37, Duncan Rance wrote:
> On 1 Feb 2012, at 21:43, Tom Lane wrote:
>
>> If you could post complete instructions for duplicating this, we
>> could probably find the cause fairly quickly.
>
> I've been on this for over a week now, and much of that has been trying to
> simplify the test case.

At last I have been able to reproduce this problem in a relatively simple (yet contrived) way.

I've put together a tarball with a few scripts, some to be run on the primary and others to be run on the hot standby. There's a README in there explaining what to do.

I'm going to try attaching it here, although I wouldn't be surprised if one is not allowed to send attachments to the list. Any suggestions of where to put it would be gratefully received.

Cheers,
Duncan

bug_6425.tar.gz
Description: GNU Zip compressed data
Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple
On 2 Feb 2012, at 18:02, Duncan Rance wrote:
>
> At last I have been able to reproduce this problem in a relatively simple
> (yet contrived) way.

Doh! Should have mentioned this already, but in case a Sparc is not available, the latest on the debugging is as follows:

As well as the bus error, I also saw the same symptom as described in BUG #6200. I changed the four places that did an elog ERROR "invalid memory alloc request size" to PANIC instead and got a raft of core files.

I have not dug any further as yet, but in the following function on the stack:

    char *
    text_to_cstring(const text *t)

the values t and tunpacked are the same, so pg_detoast_datum_packed() did not modify t. And len comes out as -4.

A couple of bits from dbx:

    (dbx) print -fx t->vl_len_[0]
    t->vl_len_[0] = 0xff84
    (dbx) examine tunpacked /2x
    0x01ceb9dc: 0x8474 0x776f

Going to have a look further up the stack now.

Cheers,
Dunc
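The len of -4 seen in the debugger lines up with the size in the "invalid memory alloc request size 18446744073709551613" error reported earlier in the thread, if one assumes the result buffer is requested as len + 1 bytes (room for a terminating NUL) and that the allocator takes a 64-bit unsigned size. Both are assumptions about this code path and build, not something the thread states. The helper name `alloc_request_for_len` is hypothetical; the sketch only shows the arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the allocation request that results when a signed
 * payload length is incremented for the terminating NUL and then
 * converted to an unsigned 64-bit size. Conversion of a negative value
 * to an unsigned type wraps modulo 2^64, which is well defined in C.
 */
uint64_t alloc_request_for_len(int32_t len)
{
    int64_t request = (int64_t) len + 1;   /* -4 becomes -3 */
    return (uint64_t) request;             /* -3 wraps to 2^64 - 3 */
}
```

For len = -4 this gives 2^64 - 3 = 18446744073709551613, the size in the error message.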
Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple
On 3 Feb 2012, at 06:45, Tom Lane wrote:
>
> I probably ought to let the test case run overnight before concluding
> anything, but at this point it's run for two-plus hours with no errors
> after applying this patch:
>
> diff --git a/src/backend/access/transam/xlog.c
> b/src/backend/access/transam/xlog.c

Thanks Tom! I've had this running for a few hours now without problems. Previously, on Sparc, the problem would occur in less than a minute.

I did try a build with --enable-cassert and it didn't actually cause the problem. I think I left it for about an hour. Although a relatively modern machine, this Sparc box I am using is painfully slow. My guess is that the extra time taken to perform the Assert code is hiding the problem.

Now it's time to persuade the customer to use a patched version of pg ;)

Cheers,
Duncan

P.S. I've been looking for an OS project to contribute to, and I think I'll see if I can help with pg. Time to look at the TODO list :)
Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple
On 6 Feb 2012, at 20:48, Tom Lane wrote:
> bug reports. Please see if you can break REL9_0_STABLE branch tip

Just to let you know that I built this yesterday and I'm giving it a good battering in our Solaris 10 Sparc test environment.

D
Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple
On 8 Feb 2012, at 10:01, Duncan Rance wrote:
> On 6 Feb 2012, at 20:48, Tom Lane wrote:
>
>> bug reports. Please see if you can break REL9_0_STABLE branch tip
>
> Just to let you know that I built this yesterday and I'm giving it a good
> battering in our Solaris 10 Sparc test environment.

In this environment my bug repro scripts would produce the problem within seconds. It has now been running for 24 hours, so I'm confident the problem is solved.

Our customers are keen to get the official release as soon as possible. They are on 9.0.6, so I guess this'll be 9.0.7? I'm new here so I don't know how long this might take, and I promised I'd find out for them. Any ideas?

Thanks,
Duncan
Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple
On 9 Feb 2012, at 15:02, Tom Lane wrote:
> Duncan Rance writes:
>> Our customers are keen to get the official release as soon as possible. They
>> are on 9.0.6, so I guess this'll be 9.0.7? I'm new here so I don't know how
>> long this might take, and I promised I'll find out for them. Any ideas?
>
> There's no firm plan at the moment. The earliest it could happen is
> around the end of the month, since various key people have other
> commitments in the next couple weeks. I'm not promising it *will*
> happen then, but that's the way things look right now.
>
> (Since you're new around here, I'll explain that the way this works
> is that the pgsql-core and pgsql-packagers lists agree on a release
> date in advance. We've had some preliminary discussions, and people
> seem to agree that this is a bad enough bug to force a release, but
> no date's been set. Once a schedule decision is made, some core
> member --- often me --- will announce it on pgsql-hackers, so you
> can keep an eye on that list if you want advance notice.)
>
> regards, tom lane

Good explanation. Thanks Tom!
Re: [BUGS] BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)
On 14 Feb 2012, at 18:28, Tom Lane wrote:
>
> Oh, I see the reason for this: the code in cclass() in regc_locale.c
> doesn't go further up than U+00FF, so no codes above that will be
> thought to be letters (or members of any other character class).
> Clearly we need to go further when we are dealing with UTF8.
> I'm not sure what a sane limit would be though.

The Basic Multilingual Plane goes up to :

https://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Planes