Hello,

On Thu, Dec 5, 2019 at 11:14 AM Pengzhou Tang <pt...@pivotal.io> wrote:
>
> While hacking on Zedstore, we needed more accurate statistics for it and
> ran into some restrictions:
> 1) acquire_sample_rows() always uses RelationGetNumberOfBlocks to generate
>    the sampling block numbers. This is unfriendly to zedstore, which wants
>    to use logical block numbers, and might also be unfriendly to
>    non-block-oriented table AMs.
> 2) Zedstore stores the columns of a table separately, so the columns of a
>    row have different physical positions. The tid in a tuple is invalid
>    for zedstore, which means the correlation statistic is incorrect.
> 3) RelOptInfo->pages is not correct for zedstore when only a subset of the
>    columns is accessed, which makes the estimated IO cost much higher than
>    the actual cost.
>
> For 1) and 2), we propose extending the existing ANALYZE-scan table AM
> routines in patch "0001-ANALYZE-tableam-API-change.patch", which adds three
> more APIs: scan_analyze_beginscan(), scan_analyze_sample_tuple(), and
> scan_analyze_endscan(). This is more convenient and lets table AMs take
> more control of every step of sampling rows. In addition, with the new
> structure named "AcquireSampleContext", we can acquire extra info (e.g.
> physical position, physical size) besides the actual column values.
>
> For 3), we hope to have a mechanism similar to RelOptInfo->rows, which is
> calculated as (RelOptInfo->tuples * Selectivity): we can calculate
> RelOptInfo->pages with a page selectivity that is based on the selected
> zedstore columns.
> 0002-Planner-can-estimate-the-pages-based-on-the-columns-.patch shows one
> idea: add `stadiskfrac` to pg_statistic and have the planner use it to
> estimate RelOptInfo->pages.
>
> 0003-ZedStore-use-extended-ANAlYZE-API.patch is attached only to show how
> Zedstore uses the previous patches to:
> 1. Use logical block ids to acquire the sample rows.
> 2. Acquire sample rows from only the specified column c1; this is used when
>    the user analyzes a table on specified columns only, e.g.
>    "analyze zs (c1)".
> 3. During ANALYZE, the zedstore table AM provides extra disk size info;
>    ANALYZE then computes the physical fraction statistic of each column,
>    and the planner uses it to estimate the IO cost based on the selected
>    columns.

I couldn't find an entry for that patchset in the next commitfest.
Could you register it so that it won't be forgotten?

