On 30/12/2020 11:12, Denis Smirnov wrote:
But why do you pass int natts and VacAttrStats **stats to
acquire_sample_rows()? Is it of any use? It seems to break
abstraction too.
Yes, it is really a kluge that should be discussed. The main problem
is that we don’t pass projection information to analyze scan (analyze
begin scan relise only on relation information during
initialization). And as a result we can’t benefit from column AMs
during «analyze t(col)» and consume data only from target columns.
This parameters were added to solve this problem.
The documentation needs to be updated accordingly, see
AcquireSampleRowsFunc in fdwhandler.sgml.
This part of the patch, adding the list of columns being analyzed, seems
a bit unfinished. I'd suggest to leave that out for now, and add it as
part of the "Table AM modifications to accept column projection lists"
patch that's also in this commitfest [1]
This suggestion also ... removes PROGRESS_ANALYZE_BLOCKS_TOTAL and
PROGRESS_ANALYZE_BLOCKS_DONE definitions as not all AMs can be
block-oriented.
We shouldn't just remove it, a progress indicator is nice. It's true
that not all AMs are block-oriented, but those that are can still use
those. Perhaps we can add ther PROGRESS_ANALYZE_* states for
non-block-oriented AMs, but that can wait until there is a concrete use
for it.
static int
acquire_sample_rows(Relation onerel, int elevel,
HeapTuple *rows, int targrows,
double *totalrows, double
*totaldeadrows)
{
int numrows = 0; /* # rows now in reservoir */
TableScanDesc scan;
Assert(targrows > 0);
scan = table_beginscan_analyze(onerel);
numrows = table_acquire_sample_rows(scan, elevel,
natts, stats,
vac_strategy, rows,
targrows, totalrows,
totaldeadrows);
table_endscan(scan);
/*
* If we didn't find as many tuples as we wanted then we're done. No
sort
* is needed, since they're already in order.
*
* Otherwise we need to sort the collected tuples by position
* (itempointer). It's not worth worrying about corner cases where the
* tuples are already sorted.
*/
if (numrows == targrows)
qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);
return numrows;
}
Perhaps better to move the qsort() into heapam_acquire_sample_rows(),
and document that the acquire_sample_rows() AM function must return the
tuples in 'ctid' order. In a generic API, it seems like a shady
assumption that they must be in order if we didn't find as many rows as
we wanted. Or always call qsort(); if the tuples are already in order,
that should be pretty quick.
The table_beginscan_analyze() call seems a bit pointless now. Let's
remove it, and pass the Relation to table_acquire_sample_rows directly.
/*
* This callback needs to fill reservour with sample rows during analyze
* scan.
*/
int (*acquire_sample_rows) (TableScanDesc scan,
The "reservoir" is related to the block sampler, but that's just an
implementation detail of the function. Perhaps something like "Acquire a
sample of rows from the table, for ANALYZE". And explain the arguments
here, or in table_acquire_sample_rows().
Overall, I like where this patch is going.
[1] https://commitfest.postgresql.org/31/2922/
- Heikki