Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-30 Thread Alvaro Herrera
Alvaro Herrera wrote: > Alvaro Herrera wrote: > > > Before pushing, I'll give a look to the regular autovacuum path to see > > if it needs a similar fix. > > Reading that one, my conclusion is that it doesn't have the same problem > because the strings are allocated in AutovacuumMemCxt which is n

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-24 Thread Alvaro Herrera
Alvaro Herrera wrote: > Before pushing, I'll give a look to the regular autovacuum path to see > if it needs a similar fix. Reading that one, my conclusion is that it doesn't have the same problem because the strings are allocated in AutovacuumMemCxt which is not reset by error recovery. This ga

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-23 Thread Alvaro Herrera
Tom Lane wrote: > What I'm suspicious of as the actual bug cause is the comment in > perform_work_item about how we need to be sure that we're allocating these > strings in a long-lived context. If, in fact, they were allocated in some > context that could get reset during the PG_TRY (particularl
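The suspicion quoted above (strings allocated in a context that can be reset during the PG_TRY) can be illustrated with a toy model. This is only a sketch under stated assumptions: `Arena`, `arena_strdup`, `arena_reset` and `name_survives_reset` are hypothetical names invented for illustration and are not PostgreSQL's memory-context API. The bump allocator reuses the same bytes after a reset, so a pointer saved before the reset ends up reading whatever was allocated afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a resettable memory context (hypothetical, not PostgreSQL code). */
typedef struct Arena {
    char   buf[256];
    size_t used;
} Arena;

/* "Resetting" only rewinds the allocation offset; old bytes get reused later. */
static void arena_reset(Arena *a)
{
    assert(a != NULL);
    a->used = 0;
}

/* Copy a string into the arena; returns NULL if it does not fit. */
static char *arena_strdup(Arena *a, const char *s)
{
    size_t n = strlen(s) + 1;
    char  *p;

    assert(a != NULL);
    if (a->used + n > sizeof(a->buf))
        return NULL;
    p = a->buf + a->used;
    memcpy(p, s, n);
    a->used += n;
    return p;
}

/*
 * Save `name` in name_ctx, then simulate error recovery by resetting
 * work_ctx and allocating something else there. Returns 1 if the saved
 * pointer still reads back as `name`, 0 if it was clobbered.
 */
static int name_survives_reset(Arena *name_ctx, Arena *work_ctx, const char *name)
{
    char *saved = arena_strdup(name_ctx, name);

    if (saved == NULL)
        return 0;
    arena_reset(work_ctx);
    arena_strdup(work_ctx, "unrelated data written after the reset");
    return strcmp(saved, name) == 0;
}
```

With two separate, freshly initialised arenas the saved name survives; passing the same arena for both arguments makes the saved pointer read the unrelated string instead, which resembles the corrupted `cur_datname` values in the backtraces. Whether this is what actually happened in autovacuum is the question the thread is discussing, not something this sketch establishes.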

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Tom Lane
Alvaro Herrera writes: > And the previous code crashes in 45 minutes? That's solid enough for > me; I'll clean up the patch and push in the next few days. I think what > you have now should be sufficient for the time being for your production > system. I'm still of the opinion that the presente

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Justin Pryzby
On Wed, Oct 18, 2017 at 07:22:27PM +0200, Alvaro Herrera wrote: > Do you still have those core dumps? If so, would you please verify the > database that autovacuum was running in? Just open each with gdb (using > the original postgres binary, not the one you just installed) and do > "print MyData

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Alvaro Herrera
Justin Pryzby wrote: > On Wed, Oct 18, 2017 at 06:54:09PM +0200, Alvaro Herrera wrote: > > And the previous code crashes in 45 minutes? That's solid enough for > > me; I'll clean up the patch and push in the next few days. I think what > > you have now should be sufficient for the time being for

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Justin Pryzby
On Wed, Oct 18, 2017 at 06:54:09PM +0200, Alvaro Herrera wrote: > Justin Pryzby wrote: > > > No crashes in ~28hr. It occurs to me that it's a weaker test due to not > > preserving most compilation options. > > And the previous code crashes in 45 minutes? That's solid enough for > me; I'll clean

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Alvaro Herrera
Justin Pryzby wrote: > No crashes in ~28hr. It occurs to me that it's a weaker test due to not > preserving most compilation options. And the previous code crashes in 45 minutes? That's solid enough for me; I'll clean up the patch and push in the next few days. I think what you have now should

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Justin Pryzby
On Tue, Oct 17, 2017 at 09:07:40AM -0500, Justin Pryzby wrote: > On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote: > > Justin Pryzby writes: > > > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: > > >> Anyway, can give this patch a try? > > > > The trick in this sort of si

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Tom Lane
Alvaro Herrera writes: > cur_datname here seems corrupted -- it points halfway into cur_nspname, > which is also a corrupt value. Yeah. > And I think that's because we're not > checking that the namespace OID is a valid value before calling > get_namespace_name on it. No, because get_namespace_

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Craig Ringer
On 17 October 2017 at 22:39, Tom Lane wrote: > Justin Pryzby writes: >> On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote: >>> So: where did you get the existing binaries? If it's from some vendor >>> packaging system, what you should do is fetch the package source, add >>> the patch to t

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Alvaro Herrera
Justin Pryzby wrote: > I'm happy to try the patch, but in case it makes any difference, we have few > DBs/schemas: I don't expect that it does.

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Tom Lane
Justin Pryzby writes: > On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote: >> So: where did you get the existing binaries? If it's from some vendor >> packaging system, what you should do is fetch the package source, add >> the patch to the probably-nonempty set of patches the vendor is ap

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Justin Pryzby
On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote: > Justin Pryzby writes: > > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: > >> Anyway, can give this patch a try? > > > I've only compiled postgres once before and this is a production environment > > (although nothing s

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Tom Lane
Justin Pryzby writes: > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: >> Anyway, can give this patch a try? > I've only compiled postgres once before and this is a production environment > (although nothing so important that the crashes are a serious concern > either). > Is i

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Tomas Vondra
On 10/17/2017 02:29 PM, Justin Pryzby wrote: > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: >> Anyway, can give this patch a try? > > I've only compiled postgres once before and this is a production environment > (although nothing so important that the crashes are a serious con

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Justin Pryzby
On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: > Justin Pryzby wrote: > > > #1 0x006a52e9 in perform_work_item (workitem=0x7f8ad1f94824) at > > autovacuum.c:2676 > > cur_datname = 0x298c740 "no 1 :vartype 1184 :vartypmod -1 > > :varcollid 0 :varlevelsup 0 :varno

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Justin Pryzby
On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: > Anyway, can give this patch a try? I've only compiled postgres once before and this is a production environment (although nothing so important that the crashes are a serious concern either). Is it reasonable to wget the postgres t

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Alvaro Herrera
Justin Pryzby wrote: > #1 0x006a52e9 in perform_work_item (workitem=0x7f8ad1f94824) at > autovacuum.c:2676 > cur_datname = 0x298c740 "no 1 :vartype 1184 :vartypmod -1 :varcollid > 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 146} {CONST :consttype > 1184 :consttypmod -1

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Alvaro Herrera
Justin Pryzby wrote: > On Sun, Oct 15, 2017 at 02:44:58PM +0200, Tomas Vondra wrote: > > Thanks, but I'm not sure that'll help, at this point. We already know > > what happened (corrupted memory), we don't know "how". And core files > > are mostly just "snapshots" so are not very useful in answerin

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-16 Thread Justin Pryzby
On Sun, Oct 15, 2017 at 02:44:58PM +0200, Tomas Vondra wrote: > Thanks, but I'm not sure that'll help, at this point. We already know > what happened (corrupted memory), we don't know "how". And core files > are mostly just "snapshots" so are not very useful in answering that :-( Is there anything

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-15 Thread Justin Pryzby
On Sat, Oct 14, 2017 at 08:56:56PM -0500, Justin Pryzby wrote: > On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote: > > > Also notice the vacuum process was interrupted, same as yesterday (thank > > > goodness for full logs). Our INSERT script is using python > > > multiprocessing.pool

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-15 Thread Tomas Vondra
Hi, On 10/15/2017 03:56 AM, Justin Pryzby wrote: > On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote: ... >> It's a bit difficult to guess what went wrong from this backtrace. For >> me gdb typically prints a bunch of lines immediately before the frames, >> explaining what went wrong -

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-14 Thread Justin Pryzby
On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote: > > Also notice the vacuum process was interrupted, same as yesterday (thank > > goodness for full logs). Our INSERT script is using python > > multiprocessing.pool() with "maxtasksperchild=1", which I think means we > > load > > one

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-14 Thread Tomas Vondra
Hi, On 10/15/2017 12:42 AM, Justin Pryzby wrote: > On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote: >> I don't have any reason to believe there's a memory issue on the server, so I >> suppose this is just a "heads up" to early adopters until/in case it happens >> again and I can at lea

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-14 Thread Justin Pryzby
On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote: > I don't have any reason to believe there's a memory issue on the server, so I > suppose this is just a "heads up" to early adopters until/in case it happens > again and I can at least provide a stack trace. I'm back; find stacktrace be

[HACKERS] SIGSEGV in BRIN autosummarize

2017-10-13 Thread Justin Pryzby
I upgraded one of our customers to PG10 Tuesday night, and Wednesday replaced a BTREE index with a BRIN index (WITH autosummarize). Today I see: < 2017-10-13 17:22:47.839 -04 >LOG: server process (PID 32127) was terminated by signal 11: Segmentation fault < 2017-10-13 17:22:47.839 -04 >DETAIL: