Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-03-03 Noah Misch
On Sun, Jan 16, 2022 at 01:02:41PM -0800, Noah Misch wrote: > My next steps: > > - Report a Debian bug for the sparc64+ext4 zeros problem. Reported to Debian, then upstream: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006157 https://marc.info/?t=16453926991 Last week, someone confirme

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-02-01 Noah Misch
On Mon, Jan 24, 2022 at 12:02:43AM -0800, Noah Misch wrote: > For 003_cic_2pc.pl, I'm > fine using $TODO so we continue to run all test commands and quietly log their > results. For 027_stream_regress.pl, which would need deep changes to use > $TODO, it works to use any of todo_skip, skip, or skip

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-24 Noah Misch
On Sun, Jan 23, 2022 at 06:34:32PM -0800, Andres Freund wrote: > On 2022-01-23 18:10:07 -0800, Noah Misch wrote: > > On Sun, Jan 23, 2022 at 05:40:54PM -0800, Andres Freund wrote: > > > Test::more's description: "If it's something the programmer hasn't done > > > yet, > > > use TODO. This is for a

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Andres Freund
On 2022-01-23 21:25:04 -0500, Tom Lane wrote: > Michael Paquier writes: > > On Sun, Jan 23, 2022 at 06:10:07PM -0800, Noah Misch wrote: > >> Could do that. Every run that doesn't get the flaky failure will print a > >> message like "TODO passed: 3-5", though the test file could mitigate that >

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Andres Freund
Hi, On 2022-01-23 18:10:07 -0800, Noah Misch wrote: > On Sun, Jan 23, 2022 at 05:40:54PM -0800, Andres Freund wrote: > > Test::more's description: "If it's something the programmer hasn't done yet, > > use TODO. This is for any code you haven't written yet, or bugs you have yet > > to fix, but wan

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Tom Lane
Michael Paquier writes: > On Sun, Jan 23, 2022 at 06:10:07PM -0800, Noah Misch wrote: >> Could do that. Every run that doesn't get the flaky failure will print a >> message like "TODO passed: 3-5", though the test file could mitigate that by >> declaring the TODO only on configurations where we

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Michael Paquier
On Sun, Jan 23, 2022 at 06:10:07PM -0800, Noah Misch wrote: > Could do that. Every run that doesn't get the flaky failure will print a > message like "TODO passed: 3-5", though the test file could mitigate that by > declaring the TODO only on configurations where we expect a failure. The > 027_s
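Test::More's TODO and skip mechanisms surface in the TAP stream as directives on each result line. As a hedged illustration in C (the helper names format_tap_line and directive_for are hypothetical, and the line shapes follow the general TAP convention rather than PostgreSQL's test code), a TODO result still records whether the test passed, while a skip result records only that the test did not run:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Which TAP directive, if any, to attach to a result line. */
enum tap_directive
{
    TAP_NONE,
    TAP_TODO,   /* test runs; a failure does not fail the suite */
    TAP_SKIP    /* test body never ran */
};

/*
 * Hypothetical helper: format one TAP result line into out.
 * Returns the snprintf() result (length the full line would need).
 */
static int
format_tap_line(char *out, size_t outlen, bool ok, int num,
                const char *desc, enum tap_directive dir, const char *reason)
{
    assert(out != NULL && desc != NULL);

    switch (dir)
    {
        case TAP_TODO:
            /* A TODO test that passes is what prove reports as "TODO passed". */
            return snprintf(out, outlen, "%s %d - %s # TODO %s",
                            ok ? "ok" : "not ok", num, desc, reason);
        case TAP_SKIP:
            /* A skipped test is always reported as "ok". */
            return snprintf(out, outlen, "ok %d # skip %s", num, reason);
        case TAP_NONE:
        default:
            return snprintf(out, outlen, "%s %d - %s",
                            ok ? "ok" : "not ok", num, desc);
    }
}

/*
 * Declare TODO only on configurations expected to fail, so healthy
 * ones do not print "TODO passed" after every run.
 */
static enum tap_directive
directive_for(bool expected_to_fail)
{
    return expected_to_fail ? TAP_TODO : TAP_NONE;
}
```

Under this scheme, TODO keeps executing the test commands and logging their results, which is why it suited 003_cic_2pc.pl, while skip avoids running the affected commands at all.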

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Noah Misch
On Sun, Jan 23, 2022 at 05:40:54PM -0800, Andres Freund wrote: > On 2022-01-23 17:17:59 -0800, Noah Misch wrote: > > On Sun, Jan 23, 2022 at 05:03:04PM -0800, Andres Freund wrote: > > > On January 23, 2022 3:29:27 PM PST > > > >(a) Modify the tests so the affected animals can skip affected tests by

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Andres Freund
Hi, On 2022-01-23 17:17:59 -0800, Noah Misch wrote: > On Sun, Jan 23, 2022 at 05:03:04PM -0800, Andres Freund wrote: > > On January 23, 2022 3:29:27 PM PST > > >(a) Modify the tests so the affected animals can skip affected tests by > > >setting an environment variable, named PG_TEST_HAS_WAL_READ_

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Noah Misch
On Sun, Jan 23, 2022 at 05:03:04PM -0800, Andres Freund wrote: > On January 23, 2022 3:29:27 PM PST > >(a) Modify the tests so the affected animals can skip affected tests by > >setting an environment variable, named PG_TEST_HAS_WAL_READ_BUG or similar. > > Why not just detect the problem in the t

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Andres Freund
On January 23, 2022 3:29:27 PM PST >(a) Modify the tests so the affected animals can skip affected tests by >setting an environment variable, named PG_TEST_HAS_WAL_READ_BUG or similar. Why not just detect the problem in the tap test and skip, rather than requiring multiple buildfarm configs to

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Tom Lane
Noah Misch writes: > On Mon, Jan 24, 2022 at 12:49:16PM +1300, Thomas Munro wrote: >> Trying out a new idea: what if we could tell the buildfarm website >> that a certain test is currently expected to fail for reasons we can't >> fix yet (configuration change needed but owner not responding, or >>

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Noah Misch
On Mon, Jan 24, 2022 at 12:49:16PM +1300, Thomas Munro wrote: > On Mon, Jan 24, 2022 at 12:29 PM Noah Misch wrote: > > On Mon, Jan 24, 2022 at 09:42:13AM +1300, Thomas Munro wrote: > > > I'm less > > > sure it makes sense to do anything to support the presumed bogus > > > zeroes bug for (probably)

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Thomas Munro
On Mon, Jan 24, 2022 at 12:29 PM Noah Misch wrote: > On Mon, Jan 24, 2022 at 09:42:13AM +1300, Thomas Munro wrote: > > I'm less > > sure it makes sense to do anything to support the presumed bogus > > zeroes bug for (probably) no real users, especially before we've even > > reported it and heard s

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Noah Misch
On Mon, Jan 24, 2022 at 09:42:13AM +1300, Thomas Munro wrote: > I'm less > sure it makes sense to do anything to support the presumed bogus > zeroes bug for (probably) no real users, especially before we've even > reported it and heard some analysis, for example acceptance that it's > broken and co

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Andres Freund
Hi, On 2022-01-24 09:42:13 +1300, Thomas Munro wrote: > On Sun, Jan 23, 2022 at 7:52 AM Noah Misch wrote: > > Future work can benchmark the new behavior and, if it performs well, make > > it unconditional in v15+. I would expect performance to be unchanged or > > slightly better, because the new

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-23 Thomas Munro
On Sun, Jan 23, 2022 at 7:52 AM Noah Misch wrote: > Attached. With this, kittiwake has survived 8.5hr of 003_cic_2pc.pl. Without > the patch, it failed many times, always within 1.3hr. For easier review, this > patch uses the new behavior on all platforms. Before commit and back-patch, I > pla

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-22 Noah Misch
On Sun, Jan 16, 2022 at 01:02:41PM -0800, Noah Misch wrote: > My next steps: > > - Report a Debian bug for the sparc64+ext4 zeros problem. (Not done yet.) > - Try to falsify the idea that "write only the not-already-written portion of > a WAL block" is an effective workaround. Specifically, m
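The workaround idea quoted here is to avoid rewriting WAL bytes that are already on file: when a partially filled block grows, write only its new tail. A minimal C sketch under that assumption follows. write_unwritten_portion and demo_two_step_flush are hypothetical names, not PostgreSQL functions, and the sketch omits the bookkeeping a real WAL writer would need to know how much of each block it has already written.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write bytes [written, valid) of an in-memory
 * block that begins at file offset block_off.  Bytes [0, written) are
 * assumed to be on file already and are not rewritten.
 * Returns 0 on success, -1 on error with errno set.
 */
static int
write_unwritten_portion(int fd, const char *block, off_t block_off,
                        size_t written, size_t valid)
{
    assert(written <= valid);

    while (written < valid)
    {
        ssize_t n = pwrite(fd, block + written, valid - written,
                           block_off + (off_t) written);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return -1;
        }
        if (n == 0)
        {
            errno = EIO;        /* no progress; avoid looping forever */
            return -1;
        }
        written += (size_t) n;  /* short write: continue from here */
    }
    return 0;
}

/*
 * Usage demo: flush a block in two steps, then read it back.
 * Returns 0 if the file ends up holding the full block.
 */
static int
demo_two_step_flush(void)
{
    char path[] = "/tmp/walblock-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file persists while fd is open */

    const char block[] = "HEADER|first record|second record";
    size_t first_end = strlen("HEADER|first record|");
    size_t full_end = strlen(block);
    char back[sizeof block] = {0};
    int rc = -1;

    /* First flush: only the header and first record existed. */
    if (write_unwritten_portion(fd, block, 0, 0, first_end) == 0 &&
        /* Second flush: append without rewriting the earlier bytes. */
        write_unwritten_portion(fd, block, 0, first_end, full_end) == 0 &&
        pread(fd, back, full_end, 0) == (ssize_t) full_end &&
        memcmp(back, block, full_end) == 0)
        rc = 0;

    close(fd);
    return rc;
}
```

The design choice being tested is that the second flush never touches bytes a concurrent reader may already be reading; whether that actually sidesteps the kernel bug is the hypothesis the message sets out to falsify.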

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-20 Noah Misch
On Fri, Jan 21, 2022 at 08:34:22AM +1300, Thomas Munro wrote: > On Mon, Jan 17, 2022 at 10:02 AM Noah Misch wrote: > > - Report a Debian bug for the sparc64+ext4 zeros problem. > > I suspect that 027_stream_regress.pl hits this kernel bug with high > probability[1]. I wonder if the owner of kitt

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-20 Thomas Munro
On Mon, Jan 17, 2022 at 10:02 AM Noah Misch wrote: > - Report a Debian bug for the sparc64+ext4 zeros problem. I suspect that 027_stream_regress.pl hits this kernel bug with high probability[1]. I wonder if the owner of kittiwake and tadarida would consider setting up an xfs file system? Or alt

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-16 Noah Misch
Cancel that kernel upgrade idea. I no longer expect it to help... On Sun, Jan 16, 2022 at 10:19:30PM +1300, Thomas Munro wrote: > On Sun, Jan 16, 2022 at 8:12 PM Noah Misch wrote: > > For specifics of the kernel bug, see the attached test program. In brief, > > the > > bug arises if one proces

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-16 Thomas Munro
On Sun, Jan 16, 2022 at 8:12 PM Noah Misch wrote: > For specifics of the kernel bug, see the attached test program. In brief, the > bug arises if one process is write()ing or pwrite()ing a file at about the > same time that another process is read()ing or pread()ing the same. POSIX > says the re
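The race described in this message can be exercised with a small program like the one below. It is a hypothetical sketch, not the test program attached to the thread, and it assumes an 8 kB block with the fill bytes 'A' and 'B'. It tolerates a read that mixes old and new bytes and flags only bytes matching neither, such as the zeros reported for sparc64+ext4; on an unaffected system, run_race() should return 0.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* assumed block size */

/* Count bytes that match neither the old nor the new fill value. */
static size_t
count_foreign_bytes(const unsigned char *buf, size_t len,
                    unsigned char old_val, unsigned char new_val)
{
    size_t foreign = 0;

    for (size_t i = 0; i < len; i++)
        if (buf[i] != old_val && buf[i] != new_val)
            foreign++;
    return foreign;
}

/*
 * Hypothetical reproducer: a child process repeatedly pwrite()s one
 * block, alternating all-'A' and all-'B' contents, while the parent
 * pread()s the same block `reads` times.  Returns the total number of
 * bytes that were neither 'A' nor 'B', or -1 on a setup failure.
 */
static long
run_race(int reads)
{
    assert(reads >= 0);

    char path[] = "/tmp/zerorace-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file persists while fds are open */

    static unsigned char a[BLOCK_SIZE], b[BLOCK_SIZE], buf[BLOCK_SIZE];
    memset(a, 'A', sizeof a);
    memset(b, 'B', sizeof b);

    /* Fill the block first, so no legitimate read can return zeros. */
    if (pwrite(fd, a, sizeof a, 0) != (ssize_t) sizeof a)
    {
        close(fd);
        return -1;
    }

    pid_t child = fork();
    if (child < 0)
    {
        close(fd);
        return -1;
    }
    if (child == 0)
    {
        /* Writer: overwrite the block until the parent kills us. */
        for (;;)
            if (pwrite(fd, a, sizeof a, 0) != (ssize_t) sizeof a ||
                pwrite(fd, b, sizeof b, 0) != (ssize_t) sizeof b)
                _exit(1);
    }

    /* Reader: count any byte that no writer ever wrote. */
    long foreign = 0;
    for (int i = 0; i < reads; i++)
    {
        ssize_t n = pread(fd, buf, sizeof buf, 0);
        if (n > 0)
            foreign += (long) count_foreign_bytes(buf, (size_t) n, 'A', 'B');
    }

    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    close(fd);
    return foreign;
}
```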

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2022-01-15 Noah Misch
On Fri, Nov 19, 2021 at 09:18:23PM -0800, Noah Misch wrote: > On Wed, Nov 17, 2021 at 11:05:06PM -0800, Noah Misch wrote: > > On Wed, Nov 17, 2021 at 05:47:10PM -0500, Tom Lane wrote: > > > Noah Misch writes: > > > > Each of the three failures happened on a sparc64 Debian+gcc machine. I > > > >

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-19 Tom Lane
I wrote: > snapper just exhibited the same failure, too: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2021-11-18%2016%3A09%3A49 I grepped the buildfarm logs for all recent (last 3 months) occurrences of 'could not read two-phase state'. Here's the results: sysname |

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-19 Tom Lane
Noah Misch writes: > Tom Turelinckx, are you able to provide remote access to kittiwake or > tadarida? I'd use it to attempt the above things. All else being equal, > kittiwake is more relevant since it's still supported upstream. snapper just exhibited the same failure, too: https://buildfarm

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-19 Noah Misch
On Wed, Nov 17, 2021 at 11:05:06PM -0800, Noah Misch wrote: > On Wed, Nov 17, 2021 at 05:47:10PM -0500, Tom Lane wrote: > > Noah Misch writes: > > > Each of the three failures happened on a sparc64 Debian+gcc machine. I > > > had > > > tried ~8000 iterations on thorntail, another sparc64 Debian+

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-18 Tom Lane
Andrey Borodin writes: > Let's add more tests that check survival of 2PC through crash recovery? We do > now only one restart. Maybe it worth to do 4 or 8? That seems a little premature when we can't explain the failure we have. Also, buildfarm cycles aren't free. regar

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-18 Andrey Borodin
> On 18 Nov 2021, at 12:05, Noah Misch wrote: > > What else might help? Let's add more tests that check survival of 2PC through crash recovery? We do now only one restart. Maybe it worth to do 4 or 8? Best regards, Andrey Borodin.

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-17 Noah Misch
On Wed, Nov 17, 2021 at 05:47:10PM -0500, Tom Lane wrote: > Noah Misch writes: > > Each of the three failures happened on a sparc64 Debian+gcc machine. I had > > tried ~8000 iterations on thorntail, another sparc64 Debian+gcc animal, > > without reproducing this. > # 'pgbench:

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-17 Tom Lane
Noah Misch writes: > Tom Lane reported another instance today: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-11%2013%3A29%3A58 > Each of the three failures happened on a sparc64 Debian+gcc machine. I had > tried ~8000 iterations on thorntail, another sparc64 Debia

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-11 Noah Misch
On Mon, Nov 08, 2021 at 01:42:46PM +0900, Michael Paquier wrote: > On Sat, Nov 06, 2021 at 06:31:57PM -0700, Noah Misch wrote: > > On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote: > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-10-24%2012%3A01%3A10 > > > g

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-08 Michael Paquier
On Mon, Nov 08, 2021 at 01:42:46PM +0900, Michael Paquier wrote: > Indeed. Looking closer, I think that we'd better improve > DecodingContextFindStartpoint(), > pg_logical_replication_slot_advance(), XLogSendLogical() as well as > pg_logical_slot_get_changes_guts() to follow a format closer to wha

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-08 Andrey Borodin
> On 7 Nov 2021, at 06:31, Noah Misch wrote: > > As a first step, let's report the actual XLogReadRecord() error message. > Attached. All the other sites that expect no error already do this. BTW some time ago I've spotted a good number of related unreported errors [0]. [0] https://

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-07 Michael Paquier
On Sat, Nov 06, 2021 at 06:31:57PM -0700, Noah Misch wrote: > As a first step, let's report the actual XLogReadRecord() error message. > Attached. Good catch! This looks good. > All the other sites that expect no error already do this. Indeed. Looking closer, I think that we'd better improve D
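The proposal here is for the caller to include XLogReadRecord()'s own error message in its report rather than only a generic failure. The C sketch below shows just that pattern. mock_read_record, MockRecord and describe_read_failure are hypothetical stand-ins rather than PostgreSQL's API, and the message wording is modeled on the "could not read two-phase state" error quoted in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mock stand-in for a decoded WAL record; not PostgreSQL's type. */
typedef struct MockRecord
{
    int         payload;
} MockRecord;

/*
 * Mock reader following XLogReadRecord()'s convention: return NULL on
 * failure and point *errormsg at a description when one is available.
 * mode 0 succeeds, mode 1 fails with a message, mode 2 fails without.
 */
static MockRecord *
mock_read_record(int mode, char **errormsg)
{
    static MockRecord record = {42};
    static char message[] = "mock reader error";

    assert(errormsg != NULL);
    *errormsg = NULL;
    if (mode == 0)
        return &record;
    if (mode == 1)
        *errormsg = message;
    return NULL;
}

/*
 * Build the caller's error text, appending the reader's message when
 * there is one instead of reporting only the generic text.
 */
static void
describe_read_failure(char *out, size_t outlen, const char *lsn_text,
                      const char *errormsg)
{
    if (errormsg != NULL)
        snprintf(out, outlen,
                 "could not read two-phase state from WAL at %s: %s",
                 lsn_text, errormsg);
    else
        snprintf(out, outlen,
                 "could not read two-phase state from WAL at %s",
                 lsn_text);
}
```

Keeping the generic branch matters because the reader may fail without setting a message, so the caller cannot assume one is present.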