Re: Changing the state of data checksums in a running cluster

2025-04-04 Thread Alexander Korotkov
Hi! On Sat, Mar 15, 2025 at 7:33 PM Tomas Vondra wrote: > On 3/15/25 17:26, Andres Freund wrote: > > Jo. > > > > On 2025-03-15 16:50:02 +0100, Tomas Vondra wrote: > >> Thanks, here's an updated patch version > > > > FWIW, this fails in CI; > > > > https://cirrus-ci.com/build/4678473324691456 > >

Re: Changing the state of data checksums in a running cluster

2025-03-15 Thread Tomas Vondra
On 3/15/25 17:26, Andres Freund wrote: > Jo. > > On 2025-03-15 16:50:02 +0100, Tomas Vondra wrote: >> Thanks, here's an updated patch version > > FWIW, this fails in CI; > > https://cirrus-ci.com/build/4678473324691456 > On all OSs: > [16:08:36.331] # Failed test 'options --locale-provider=icu

Re: Changing the state of data checksums in a running cluster

2025-03-15 Thread Andres Freund
Jo. On 2025-03-15 16:50:02 +0100, Tomas Vondra wrote: > Thanks, here's an updated patch version FWIW, this fails in CI; https://cirrus-ci.com/build/4678473324691456 On all OSs: [16:08:36.331] # Failed test 'options --locale-provider=icu --locale=und --lc-*=C: no stderr' [16:08:36.331] # at

Re: Changing the state of data checksums in a running cluster

2025-03-15 Thread Daniel Gustafsson
> On 14 Mar 2025, at 13:20, Tomas Vondra wrote: > I experimented with this a little bit, and unfortunately I ran into not > one, but two race conditions in this :-( I don't have reproducers, all > of this was done by manually adding sleep() calls / gdb breakpoints to > pause the processes for a w

Re: Changing the state of data checksums in a running cluster

2025-03-14 Thread Tomas Vondra
On 3/14/25 00:11, Tomas Vondra wrote: > ... >> One issue I ran into is the postmaster does not seem to be processing >> the barriers, and thus not getting info about the data_checksum_version >> changes. > > Makes sense, that seems like a pretty reasonable constraint for the >>

Re: Changing the state of data checksums in a running cluster

2025-03-13 Thread Tomas Vondra
On 3/13/25 17:26, Tomas Vondra wrote: > On 3/13/25 13:32, Daniel Gustafsson wrote: >>> On 13 Mar 2025, at 12:03, Tomas Vondra wrote: >>> >>> ... >>> >>> This also reminds me I had a question about the barrier - can't it >>> happen a process gets to process multiple barriers at the same time? I >>>

Re: Changing the state of data checksums in a running cluster

2025-03-13 Thread Tomas Vondra
On 3/13/25 13:32, Daniel Gustafsson wrote: >> On 13 Mar 2025, at 12:03, Tomas Vondra wrote: >> On 3/13/25 10:54, Daniel Gustafsson wrote: On 12 Mar 2025, at 14:16, Tomas Vondra wrote: > I believe the approach is correct, but the number of possible states (e.g. after a crash/restar

Re: Changing the state of data checksums in a running cluster

2025-03-13 Thread Daniel Gustafsson
> On 13 Mar 2025, at 12:03, Tomas Vondra wrote: > On 3/13/25 10:54, Daniel Gustafsson wrote: >>> On 12 Mar 2025, at 14:16, Tomas Vondra wrote: >>> I believe the approach is correct, but the number of possible states >>> (e.g. after a crash/restart) seems a bit complex. I wonder if there's a >>>

Re: Changing the state of data checksums in a running cluster

2025-03-13 Thread Tomas Vondra
On 3/13/25 10:54, Daniel Gustafsson wrote: >> On 12 Mar 2025, at 14:16, Tomas Vondra wrote: > >> I continued investigating this and experimenting with alternative >> approaches, and I think the way the patch relies on ControlFile is not >> quite right. That is, it always sets data_checksum_versio

Re: Changing the state of data checksums in a running cluster

2025-03-13 Thread Daniel Gustafsson
> On 12 Mar 2025, at 14:16, Tomas Vondra wrote: > I continued investigating this and experimenting with alternative > approaches, and I think the way the patch relies on ControlFile is not > quite right. That is, it always sets data_checksum_version to the last > ("current") value, but that's not

Re: Changing the state of data checksums in a running cluster

2025-03-12 Thread Dagfinn Ilmari Mannsåker
Tomas Vondra writes: > On 3/11/25 14:07, Dagfinn Ilmari Mannsåker wrote: >> As the resident perl style pedant, I'd just like to complain about the >> below: >> >> Tomas Vondra writes: >> >>> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm >>> b/src/test/perl/PostgreSQL/Test/Cluster.pm >

Re: Changing the state of data checksums in a running cluster

2025-03-12 Thread Tomas Vondra
On 3/11/25 14:07, Dagfinn Ilmari Mannsåker wrote: > As the resident perl style pedant, I'd just like to complain about the > below: > > Tomas Vondra writes: > >> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm >> b/src/test/perl/PostgreSQL/Test/Cluster.pm >> index 666bd2a2d4c..1c66360c16c

Re: Changing the state of data checksums in a running cluster

2025-03-11 Thread Tomas Vondra
Hi, I continued stress testing this, as I was rather unsure why the assert failures reported in [1] disappeared. And I managed to reproduce that again, and I think I actually understand why it happens. I modified the test script (attached) to setup replication, not just a single instance. And the

Re: Changing the state of data checksums in a running cluster

2025-03-11 Thread Dagfinn Ilmari Mannsåker
As the resident perl style pedant, I'd just like to complain about the below: Tomas Vondra writes: > diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm > b/src/test/perl/PostgreSQL/Test/Cluster.pm > index 666bd2a2d4c..1c66360c16c 100644 > --- a/src/test/perl/PostgreSQL/Test/Cluster.pm > +++

Re: Changing the state of data checksums in a running cluster

2025-03-10 Thread Tomas Vondra
One thing I forgot to mention is the progress reporting only updates blocks for the FORK_MAIN. It wouldn't be difficult to report blocks for each fork, but it'd be confusing - the relation counters would remain the same, but the block counters would change for each fork. I guess we could report th

Re: Changing the state of data checksums in a running cluster

2025-03-10 Thread Daniel Gustafsson
> On 10 Mar 2025, at 12:17, Tomas Vondra wrote: > > On 3/10/25 10:46, Tomas Vondra wrote: >> On 3/10/25 01:18, Tomas Vondra wrote: Thank you so much for picking up and fixing the blockers, it's highly appreciated! >> For me, this passes all CI tests, hopefully cfbot will be happy too. Confirm

Re: Changing the state of data checksums in a running cluster

2024-12-10 Thread Michael Paquier
On Tue, Nov 26, 2024 at 11:07:12PM +0100, Tomas Vondra wrote: > I spent a bit more time doing some testing on the last version of the > patch from [1]. And I ran into this assert in PostmasterStateMachine() > when stopping the cluster: > > /* All types should be included in targetMask or remainM

Re: Changing the state of data checksums in a running cluster

2024-11-26 Thread Tomas Vondra
Hi, I spent a bit more time doing some testing on the last version of the patch from [1]. And I ran into this assert in PostmasterStateMachine() when stopping the cluster: /* All types should be included in targetMask or remainMask */ Assert((remainMask.mask | targetMask.mask) == BTYPE_MASK_A

Re: Changing the state of data checksums in a running cluster

2024-11-07 Thread Tomas Vondra
Hi, Unfortunately it seems we're not out of the woods yet :-( I started doing some more testing on the v8 patch. My plan was to do some stress testing with physical replication, random restarts and stuff like that. But I ran into issues before that. Attached is a reproducer script, that does thi

Re: Changing the state of data checksums in a running cluster

2024-10-09 Thread Tomas Vondra
On 10/8/24 22:38, Daniel Gustafsson wrote: > >> 7) controldata.c - maybe this >> >> if (oldctrl->data_checksum_version == 2) >> >> should use PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION instead of the magic >> constant? For "off" we use "0" which seems somewhat acceptable, but for >> other values

Re: Changing the state of data checksums in a running cluster

2024-10-08 Thread Daniel Gustafsson
> On 7 Oct 2024, at 20:42, Dagfinn Ilmari Mannsåker wrote: > > Tomas Vondra writes: > >> 3) unnecessary ListCell in DataChecksumsWorkerMain() on line 1345, >> shadowing earlier variable > > All the ListCell variables can be eliminated by using the foreach_ptr > and foreach_oid macros instead

Re: Changing the state of data checksums in a running cluster

2024-10-07 Thread Dagfinn Ilmari Mannsåker
Tomas Vondra writes: > 3) unnecessary ListCell in DataChecksumsWorkerMain() on line 1345, >shadowing earlier variable All the ListCell variables can be eliminated by using the foreach_ptr and foreach_oid macros instead of plain foreach. - ilmari

Re: Changing the state of data checksums in a running cluster

2024-10-07 Thread Tomas Vondra
Hi, I did a quick review today. First a couple minor comments: 1) monitoring.sgml typos: number of database -> number of databases calcuated -> calculated 2) unnecessary newline in heapam.c (no other changes) 3) unnecessary ListCell in DataChecksumsWorkerMain() on line 1345, shadowing e

Re: Changing the state of data checksums in a running cluster

2024-10-01 Thread Daniel Gustafsson
> On 1 Oct 2024, at 00:43, Michael Banck wrote: > > Hi, > > On Mon, Sep 30, 2024 at 11:21:30PM +0200, Daniel Gustafsson wrote: >>> Yeah, I think a view like pg_stat_progress_checksums would work. >> >> Added in the attached version. It probably needs some polish (the docs for >> sure do) but i

Re: Changing the state of data checksums in a running cluster

2024-09-30 Thread Michael Banck
Hi, On Mon, Sep 30, 2024 at 11:21:30PM +0200, Daniel Gustafsson wrote: > > Yeah, I think a view like pg_stat_progress_checksums would work. > > Added in the attached version. It probably needs some polish (the docs for > sure do) but it's at least a start. Just a nitpick, but we call it data_ch

Re: Changing the state of data checksums in a running cluster

2024-07-05 Thread Bruce Momjian
On Wed, Jul 3, 2024 at 01:20:10PM +0200, Tomas Vondra wrote: > > * Restartability - the initial version of the patch did not support stateful > > restarts, a shutdown performed (or crash) before checksums were enabled > > would > > result in a need to start over from the beginning. This was deem

Re: Changing the state of data checksums in a running cluster

2024-07-03 Thread Tomas Vondra
Hi Daniel, Thanks for rebasing the patch and submitting it again! On 7/3/24 08:41, Daniel Gustafsson wrote: > After some off-list discussion about the desirability of this feature, where > several hackers opined that it's something that we should have, I've decided > to > rebase this patch and s