Re: longfin and tamandua aren't too happy but I'm not sure why

2022-10-01 Thread Justin Pryzby
On Sat, Oct 01, 2022 at 03:15:14PM -0700, Andres Freund wrote: > On 2022-10-01 11:14:20 -0500, Justin Pryzby wrote: > > (I still suggest my patches to run all tests using vcregress. The number of > > people who remember that, for v15, cirrusci runs incomplete tests is > > probably > > fewer than

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-10-01 Thread Andres Freund
Hi, On 2022-10-01 11:14:20 -0500, Justin Pryzby wrote: > I just tried this, which works fine at least for v11-v14: > | git checkout origin/REL_15_STABLE .cirrus.yml src/tools/ci > > https://cirrus-ci.com/task/5742859943936000 v15a > https://cirrus-ci.com/task/6725412431593472 v15b > https://cirru

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-10-01 Thread Justin Pryzby
On Wed, Sep 28, 2022 at 08:45:31PM -0700, Andres Freund wrote: > Hi, > > On 2022-09-28 21:22:26 +0200, Alvaro Herrera wrote: > > I have an additional, unrelated complaint about CI, which is that we > > don't have anything for past branches. I have a partial hack(*), but > > I wish we had somethin

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-29 Thread Andres Freund
Hi, On 2022-09-29 22:16:10 -0400, Tom Lane wrote: > Andres Freund writes: > > Any opinions about whether to do this only in head or backpatch to 15? > > HEAD should be sufficient, IMO. Pushed. I think we should add some more divergent options to increase the coverage. E.g. a different xlog page

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-29 Thread Tom Lane
Andres Freund writes: > Any opinions about whether to do this only in head or backpatch to 15? HEAD should be sufficient, IMO. regards, tom lane

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-29 Thread Andres Freund
Hi, On 2022-09-29 21:24:44 -0400, Tom Lane wrote: > Andres Freund writes: > > Tom, wasn't there something recently that made you complain about not having > > coverage around collations due to system settings? > > We found there was a gap for ICU plus LANG=C, IIRC. Using that then. Any opinio

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-29 Thread Tom Lane
Andres Freund writes: > Tom, wasn't there something recently that made you complain about not having > coverage around collations due to system settings? We found there was a gap for ICU plus LANG=C, IIRC. regards, tom lane

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-29 Thread Andres Freund
Hi, On 2022-09-29 18:16:51 -0700, Peter Geoghegan wrote: > On Thu, Sep 29, 2022 at 5:40 PM Andres Freund wrote: > > Wonder if we should vary some build options like ICU or the system collation? > > Tom, wasn't there something recently that made you complain about not having > > coverage around col

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-29 Thread Peter Geoghegan
On Thu, Sep 29, 2022 at 5:40 PM Andres Freund wrote: > Wonder if we should vary some build options like ICU or the system collation? > Tom, wasn't there something recently that made you complain about not having > coverage around collations due to system settings? That was related to TRUST_STRXFRM

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-29 Thread Andres Freund
Hi, On 2022-09-29 17:31:35 -0700, Andres Freund wrote: > I already added the necessary packages to the image. I didn't install llvm for > 32bit because that'd have a) bloated the image unduly b) they can't currently > be installed in parallel afaics. > Attached is the patch adding it to CI. To av

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-29 Thread Andres Freund
Hi, On 2022-09-28 16:07:13 -0400, Tom Lane wrote: > I agree that it'd be good if CI did some 32-bit testing so it could have > caught (5) and (6), but that's being worked on. I wasn't aware of anybody doing so, thus here's a patch for that. I already added the necessary packages to the image. I

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Andres Freund
Hi, On 2022-09-28 21:22:26 +0200, Alvaro Herrera wrote: > I have an additional, unrelated complaint about CI, which is that we > don't have anything for past branches. I have a partial hack(*), but > I wish we had something we could readily use. > > (*) I just backpatched the commit that added t

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Andres Freund
Hi, On 2022-09-28 22:14:11 -0400, Tom Lane wrote: > I was thinking of meson when I wrote that ... but re-reading it, > I think Robert meant CI. FWIW, I had planned to put the "translation table" between autoconf and meson into the docs, but Peter E. argued that the wiki is better for that. Happy

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Tom Lane
Andres Freund writes: > On 2022-09-28 16:07:13 -0400, Tom Lane wrote: >> Robert Haas writes: >>> And like the existing buildfarm, it's severely under-documented. >> That complaint I agree with. A wiki page is a pretty poor substitute >> for in-tree docs. > I assume we're talking about CI? I w

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Andres Freund
Hi, On 2022-09-28 16:07:13 -0400, Tom Lane wrote: > Robert Haas writes: > > And like the existing buildfarm, it's severely under-documented. > > That complaint I agree with. A wiki page is a pretty poor substitute > for in-tree docs. I assume we're talking about CI? What would you like to see

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Peter Geoghegan
On Wed, Sep 28, 2022 at 12:20 PM Alvaro Herrera wrote: > What do you think would constitute a test here? I would start with something simple. Focus on the record types that we know are the most common. It's very skewed towards heap and nbtree record types, plus some transaction rmgr types. > Say

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Tom Lane
Robert Haas writes: > Yeah, I suppose I have to get in the habit of looking at CI before > committing anything. It's sort of annoying to me, though. Here's a > list of the follow-up fixes I've so far committed: > 1. headerscheck > 2. typos > 3. pg_buffercache's meson.build > 4. compiler warning >

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Peter Geoghegan
On Wed, Sep 28, 2022 at 12:32 PM Thomas Munro wrote: > I don't agree with this. The build farm clearly has more ways to > break than CI, because it has more CPUs, compilers, operating systems, > combinations of configure options and rolls of the timing dice, but CI > now catches a lot and, import

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Thomas Munro
On Thu, Sep 29, 2022 at 1:27 AM Robert Haas wrote: > ... Here's a > list of the follow-up fixes I've so far committed: > > 1. headerscheck > 2. typos > 3. pg_buffercache's meson.build > 4. compiler warning > 5. alignment problem > 6. F_INTEQ/F_OIDEQ problem > > CI caught (1), (3), and (4). The bui

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Alvaro Herrera
On 2022-Sep-28, Robert Haas wrote: > The number of buildfarm failures that I would have avoided by checking > CI is less than the number of extra things I had to fix to keep CI > happy, and the serious problems were caught by the buildfarm, not by > CI. [...] So I guess the way you're supposed to

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Alvaro Herrera
On 2022-Sep-28, Peter Geoghegan wrote: > It would be useful if there were generic tests that caught issues like > this. There are various subtle effects related to how struct layout > can impact WAL record size that might easily be missed. It's not like > there are a huge number of truly critical

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Peter Geoghegan
On Wed, Sep 28, 2022 at 6:48 AM Robert Haas wrote: > On second thought, I'm going to revert the whole thing. There's a > bigger mess here than can be cleaned up on the fly. The > alignment-related mess in ParseCommitRecord is maybe something for > which I could just hack a quick fix, but what I've

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Robert Haas
On Wed, Sep 28, 2022 at 9:16 AM Robert Haas wrote: > I agree. I should have gone through and checked that every place where > RelFileLocator got embedded in some larger struct, there was no > problem with making it bigger and increasing the alignment > requirement. I'll go back and do that as soon

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Robert Haas
On Tue, Sep 27, 2022 at 5:50 PM Tom Lane wrote: > Maybe it wouldn't have any great impact. I don't know, but I don't > think it's incumbent on me to measure that. You or the patch author > should have had a handle on that question *before* committing. I agree. I should have gone through and che

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Robert Haas
On Tue, Sep 27, 2022 at 5:29 PM Tom Lane wrote: > ... also, lapwing's not too happy [1]. The alter_table test > expects this to yield zero rows, but it doesn't: > > SELECT m.* FROM filenode_mapping m LEFT JOIN pg_class c ON c.oid = m.oid > WHERE c.oid IS NOT NULL OR m.mapped_oid IS NOT NULL; >

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Robert Haas
On Wed, Sep 28, 2022 at 1:15 AM Tom Lane wrote: > Dilip Kumar writes: > > wrasse is also failing with a bus error, > > Yeah. At this point I think it's time to call for this patch > to get reverted. It should get tested *off line* on some > non-Intel, non-64-bit, alignment-picky architectures b

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Robert Haas
On Wed, Sep 28, 2022 at 1:48 AM Thomas Munro wrote: > FTR CI reported that cpluspluscheck failure and more[1], so perhaps we > just need to get clearer agreement on the status of CI, ie a policy > that CI had better be passing before you get to the next stage. It's > still pretty new... Yeah, I

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Dilip Kumar
On Wed, Sep 28, 2022 at 11:57 AM Tom Lane wrote: > > Dilip Kumar writes: > > Btw, I think the reason for the bus error on wrasse is the same as > > what is creating failure on longfin[1], I mean this unaligned access > > is causing Bus error during startup, IMHO. > > Maybe, but there's not a lot

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Thomas Munro
On Wed, Sep 28, 2022 at 9:26 PM Dilip Kumar wrote: > It was a silly mistake, I used the F_OIDEQ function instead of > F_INT8EQ. Although this was correct on the 0003 patch where we have > removed the tablespace from key, but got missed in this :( > > I have locally reproduced this in a 32 bit mach

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-28 Thread Dilip Kumar
On Wed, Sep 28, 2022 at 9:40 AM Dilip Kumar wrote: > > On Wed, Sep 28, 2022 at 2:59 AM Tom Lane wrote: > > > > ... also, lapwing's not too happy [1]. The alter_table test > > expects this to yield zero rows, but it doesn't: > > By looking at regression diff as shown below, it seems that we are >

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Tom Lane
Dilip Kumar writes: > Btw, I think the reason for the bus error on wrasse is the same as > what is creating failure on longfin[1], I mean this unaligned access > is causing Bus error during startup, IMHO. Maybe, but there's not a lot of evidence for that. wrasse got through the test_decoding che

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Dilip Kumar
On Wed, Sep 28, 2022 at 10:45 AM Tom Lane wrote: > > Dilip Kumar writes: > > wrasse is also failing with a bus error, > > Yeah. At this point I think it's time to call for this patch > to get reverted. It should get tested *off line* on some > non-Intel, non-64-bit, alignment-picky architecture

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Thomas Munro
On Wed, Sep 28, 2022 at 6:15 PM Tom Lane wrote: > There may be a larger conversation to be had here about how > much our CI infrastructure should be detecting. There seems > to be a depressingly large gap between what that found and > what the buildfarm is finding --- not only in portability > is

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Tom Lane
Dilip Kumar writes: > wrasse is also failing with a bus error, Yeah. At this point I think it's time to call for this patch to get reverted. It should get tested *off line* on some non-Intel, non-64-bit, alignment-picky architectures before the rest of us have to deal with it any more. There m

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Dilip Kumar
wrasse is also failing with a bus error, but I cannot get the stack trace. So it seems it is hitting some alignment issues during startup [1]. Is it possible to get the backtrace or lineno? [1] 2022-09-28 03:19:26.228 CEST [180:4] LOG: redo starts at 0/30FE9D8 2022-09-28 03:19:27.674 CEST [177:

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Dilip Kumar
On Wed, Sep 28, 2022 at 2:59 AM Tom Lane wrote: > > ... also, lapwing's not too happy [1]. The alter_table test > expects this to yield zero rows, but it doesn't: By looking at regression diff as shown below, it seems that we are able to get the relfilenode from the Oid using pg_relation_filenod

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Tom Lane
Robert Haas writes: > On Tue, Sep 27, 2022 at 4:50 PM Tom Lane wrote: >> There is a second problem that I am going to hold your feet to the >> fire about: >> (lldb) p sizeof(SharedInvalidationMessage) >> (unsigned long) $1 = 24 > Also, I don't really know what problem you think it's going to cau

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Tom Lane
... also, lapwing's not too happy [1]. The alter_table test expects this to yield zero rows, but it doesn't: SELECT m.* FROM filenode_mapping m LEFT JOIN pg_class c ON c.oid = m.oid WHERE c.oid IS NOT NULL OR m.mapped_oid IS NOT NULL; I've reproduced that symptom in a 32-bit FreeBSD VM buildin

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Robert Haas
On Tue, Sep 27, 2022 at 4:50 PM Tom Lane wrote: > I wrote: > > * frame #0: 0x00010a36af8c postgres`ParseCommitRecord(info='\x80', > > xlrec=0x7fa0678a8090, parsed=0x7ff7b5c50e78) at xactdesc.c:102:30 > > Okay, so the problem is this: by widening RelFileNumber to 64 bits, > you have

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Tom Lane
I wrote: > * frame #0: 0x00010a36af8c postgres`ParseCommitRecord(info='\x80', > xlrec=0x7fa0678a8090, parsed=0x7ff7b5c50e78) at xactdesc.c:102:30 Okay, so the problem is this: by widening RelFileNumber to 64 bits, you have increased the alignment requirement of struct RelFileLocator
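
The alignment effect described in the message above can be sketched in plain C. This is only an illustration under stated assumptions: NarrowLocator, WideLocator, their field names, and the helper functions are hypothetical stand-ins, not PostgreSQL's actual RelFileLocator definition or any code from the patch under discussion. The sketch shows that widening one member to 64 bits can raise the alignment the whole struct demands, and that copying with memcpy is one way to read such a struct from a buffer that may not be suitably aligned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: every member is 32 bits wide. */
typedef struct NarrowLocator
{
    uint32_t spc_oid;
    uint32_t db_oid;
    uint32_t rel_number;
} NarrowLocator;

/* Hypothetical struct: same layout idea, one member widened to 64 bits. */
typedef struct WideLocator
{
    uint32_t spc_oid;
    uint32_t db_oid;
    uint64_t rel_number;
} WideLocator;

/* A struct is at least as strictly aligned as its strictest member. */
static_assert(alignof(WideLocator) >= alignof(NarrowLocator),
              "widening a member never lowers the struct's alignment");

/* Alignment required by the all-32-bit struct. */
size_t narrow_locator_align(void)
{
    return alignof(NarrowLocator);
}

/* Alignment required once a 64-bit member is present. */
size_t wide_locator_align(void)
{
    return alignof(WideLocator);
}

/*
 * Read a WideLocator from a byte buffer that may start at any address.
 * Casting buf to (const WideLocator *) and dereferencing it would be
 * undefined behavior if buf is not suitably aligned; memcpy into a
 * properly aligned local avoids that.
 */
WideLocator read_wide_locator(const unsigned char *buf)
{
    WideLocator out;

    memcpy(&out, buf, sizeof(out));
    return out;
}
```

On many 64-bit platforms wide_locator_align() returns 8 while narrow_locator_align() returns 4, although the exact values are ABI-dependent (some 32-bit ABIs align uint64_t to 4 bytes). A pointer into a record that was laid out with only 4-byte alignment in mind is therefore not necessarily valid for the widened struct, which is the kind of access that -fsanitize=alignment reports and that alignment-picky hardware can turn into a bus error.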

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Tom Lane
Justin Pryzby writes: > On Tue, Sep 27, 2022 at 02:55:18PM -0400, Robert Haas wrote: >> Both animals are running with -fsanitize=alignment and it's not >> difficult to believe that the commit mentioned above could have >> introduced an alignment problem where we didn't have one before, but >> with

Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Justin Pryzby
On Tue, Sep 27, 2022 at 02:55:18PM -0400, Robert Haas wrote: > Both animals are running with -fsanitize=alignment and it's not > difficult to believe that the commit mentioned above could have > introduced an alignment problem where we didn't have one before, but > without a stack backtrace I don't