Re: PoC Refactor AM analyse API

Heikki Linnakangas Fri, 22 Jan 2021 05:12:51 -0800

On 30/12/2020 11:12, Denis Smirnov wrote:

But why do you pass int natts and VacAttrStats **stats to
acquire_sample_rows()? Is it of any use? It seems to break
abstraction too.


Yes, it is really a kluge that should be discussed. The main problem
is that we don’t pass projection information to analyze scan (analyze
begin scan relise only on relation information during
initialization). And as a result we can’t benefit from column AMs
during «analyze t(col)» and consume data only from target columns.
This parameters were added to solve this problem.

The documentation needs to be updated accordingly, seeAcquireSampleRowsFunc in fdwhandler.sgml.

This part of the patch, adding the list of columns being analyzed, seemsa bit unfinished. I'd suggest to leave that out for now, and add it aspart of the "Table AM modifications to accept column projection lists"patch that's also in this commitfest [1]

This suggestion also ... removes PROGRESS_ANALYZE_BLOCKS_TOTAL and
PROGRESS_ANALYZE_BLOCKS_DONE definitions as not all AMs can be
block-oriented.

We shouldn't just remove it, a progress indicator is nice. It's truethat not all AMs are block-oriented, but those that are can still usethose. Perhaps we can add ther PROGRESS_ANALYZE_* states fornon-block-oriented AMs, but that can wait until there is a concrete usefor it.

static int
acquire_sample_rows(Relation onerel, int elevel,
                                        HeapTuple *rows, int targrows,
                                        double *totalrows, double 
*totaldeadrows)
{
        int                     numrows = 0;    /* # rows now in reservoir */
        TableScanDesc scan;

        Assert(targrows > 0);

        scan = table_beginscan_analyze(onerel);

        numrows = table_acquire_sample_rows(scan, elevel,
                                                                                
natts, stats,
                                                                                
vac_strategy, rows,
                                                                                
targrows, totalrows,
                                                                                
totaldeadrows);

        table_endscan(scan);

        /*
         * If we didn't find as many tuples as we wanted then we're done. No 
sort
         * is needed, since they're already in order.
         *
         * Otherwise we need to sort the collected tuples by position
         * (itempointer). It's not worth worrying about corner cases where the
         * tuples are already sorted.
         */
        if (numrows == targrows)
                qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);

        return numrows;
}

Perhaps better to move the qsort() into heapam_acquire_sample_rows(),and document that the acquire_sample_rows() AM function must return thetuples in 'ctid' order. In a generic API, it seems like a shadyassumption that they must be in order if we didn't find as many rows aswe wanted. Or always call qsort(); if the tuples are already in order,that should be pretty quick.

The table_beginscan_analyze() call seems a bit pointless now. Let'sremove it, and pass the Relation to table_acquire_sample_rows directly.

        /*
         * This callback needs to fill reservour with sample rows during analyze
         * scan.
         */
        int                     (*acquire_sample_rows) (TableScanDesc scan,

The "reservoir" is related to the block sampler, but that's just animplementation detail of the function. Perhaps something like "Acquire asample of rows from the table, for ANALYZE". And explain the argumentshere, or in table_acquire_sample_rows().


Overall, I like where this patch is going.

[1] https://commitfest.postgresql.org/31/2922/

- Heikki

Re: PoC Refactor AM analyse API

Reply via email to