On Mon, Apr 8, 2024 at 10:26 AM Melanie Plageman <melanieplage...@gmail.com> wrote: > On Sun, Apr 07, 2024 at 03:00:00PM -0700, Andres Freund wrote: > > On 2024-04-07 16:59:26 -0400, Melanie Plageman wrote: > > > From 862b7ac81cdafcda7b525e02721da14e46265509 Mon Sep 17 00:00:00 2001 > > > From: Melanie Plageman <melanieplage...@gmail.com> > > > Date: Sun, 7 Apr 2024 15:38:41 -0400 > > > Subject: [PATCH v10 3/3] Obsolete BlockSampler_HasMore() > > > > > > A previous commit stopped using BlockSampler_HasMore() for flow control > > > in acquire_sample_rows(). There seems little use now for > > > BlockSampler_HasMore(). It should be sufficient to return > > > InvalidBlockNumber from BlockSampler_Next() when BlockSample_HasMore() > > > would have returned false. Remove BlockSampler_HasMore(). > > > > > > Author: Melanie Plageman, Nazir Bilal Yavuz > > > Discussion: > > > https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com > > > > The justification here seems somewhat odd. Sure, the previous commit stopped > > using BlockSampler_HasMore in acquire_sample_rows - but only because it was > > moved to block_sampling_streaming_read_next()? > > It didn't stop using it. It stopped being useful. The reason it existed, > as far as I can tell, was to use it as the while() loop condition in > acquire_sample_rows(). I think it makes much more sense for > BlockSampler_Next() to return InvalidBlockNumber when there are no more > blocks -- not to assert you don't call it when there aren't any more > blocks. > > I didn't want to change BlockSampler_Next() in the same commit as the > streaming read user and we can't remove BlockSampler_HasMore() without > changing BlockSampler_Next().
I agree that the code looks useless if one condition implies the other, but isn't it good to keep that cross-check, perhaps reformulated as an assertion? I didn't look too hard at the maths, I just saw the words "It is not obvious that this code matches Knuth's Algorithm S ..." and realised I'm not sure I have time to develop a good opinion about this today. So I'll leave the 0002 change out for now, as it's a tidy-up that can easily be applied in the next cycle.
From c3b8df8e4720d8b0dfb4c892c0aa3ddaef8f401f Mon Sep 17 00:00:00 2001 From: Thomas Munro <tmu...@postgresql.org> Date: Mon, 8 Apr 2024 14:38:58 +1200 Subject: [PATCH v12] Remove obsolete BlockSampler_HasMore(). Commit 041b9680 stopped using BlockSampler_HasMore() for flow control in acquire_sample_rows(). There seems to be little use for it now. We can just return InvalidBlockNumber from BlockSampler_Next() when BlockSample_HasMore() would have returned false. Author: Melanie Plageman <melanieplage...@gmail.com> Author: Nazir Bilal Yavuz <byavu...@gmail.com> Reviewed-by: Andres Freund <and...@anarazel.de> Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com --- src/backend/commands/analyze.c | 4 +--- src/backend/utils/misc/sampling.c | 10 +++------- src/include/utils/sampling.h | 1 - 3 files changed, 4 insertions(+), 11 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index da27a13a3f..e9fa3470cf 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -1111,9 +1111,7 @@ block_sampling_read_stream_next(ReadStream *stream, void *callback_private_data, void *per_buffer_data) { - BlockSamplerData *bs = callback_private_data; - - return BlockSampler_HasMore(bs) ? BlockSampler_Next(bs) : InvalidBlockNumber; + return BlockSampler_Next(callback_private_data); } /* diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c index 933db06702..6e2bca9739 100644 --- a/src/backend/utils/misc/sampling.c +++ b/src/backend/utils/misc/sampling.c @@ -54,12 +54,6 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize, return Min(bs->n, bs->N); } -bool -BlockSampler_HasMore(BlockSampler bs) -{ - return (bs->t < bs->N) && (bs->m < bs->n); -} - BlockNumber BlockSampler_Next(BlockSampler bs) { @@ -68,7 +62,9 @@ BlockSampler_Next(BlockSampler bs) double p; /* probability to skip block */ double V; /* random */ - Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */ + /* Return if no remaining blocks or no blocks to sample */ + if (K <= 0 || k <= 0) + return InvalidBlockNumber; if ((BlockNumber) k >= K) { diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h index be48ee52ba..fb5d6820a2 100644 --- a/src/include/utils/sampling.h +++ b/src/include/utils/sampling.h @@ -38,7 +38,6 @@ typedef BlockSamplerData *BlockSampler; extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize, uint32 randseed); -extern bool BlockSampler_HasMore(BlockSampler bs); extern BlockNumber BlockSampler_Next(BlockSampler bs); /* Reservoir sampling methods */ -- 2.44.0