Hello,

On Thu, Dec 5, 2019 at 11:14 AM Pengzhou Tang <pt...@pivotal.io> wrote:
>
> When hacking on Zedstore, we needed more accurate statistics for
> zedstore tables and ran into some restrictions:
> 1) acquire_sample_rows() always uses RelationGetNumberOfBlocks to
> generate sampling block numbers. This is not friendly to zedstore, which
> wants to use logical block numbers, and probably not friendly to
> non-block-oriented table AMs either.
> 2) The columns of a zedstore table are stored separately, so the columns
> of a row have different physical positions. The tid in a tuple is
> invalid for zedstore, which means the correlation statistic is incorrect
> for zedstore.
> 3) RelOptInfo->pages is not correct for Zedstore if we only access a
> subset of the columns, which makes the estimated IO cost much higher
> than the actual cost.
>
> For 1) and 2), we propose to extend the existing ANALYZE-scan table AM
> routines in patch "0001-ANALYZE-tableam-API-change.patch", which adds
> three more APIs: scan_analyze_beginscan(), scan_analyze_sample_tuple(),
> scan_analyze_endscan(). This is more convenient and lets table AMs take
> more control of every step of sampling rows. Meanwhile, with the new
> structure named "AcquireSampleContext", we can acquire extra info (e.g.
> physical position, physical size) besides the real column values.
>
> For 3), we hope to have a mechanism similar to RelOptInfo->rows, which
> is calculated from (RelOptInfo->tuples * Selectivity): we can calculate
> RelOptInfo->pages with a page selectivity that is based on the selected
> zedstore columns.
> 0002-Planner-can-estimate-the-pages-based-on-the-columns-.patch shows
> one idea: add `stadiskfrac` to pg_statistic and have the planner use it
> to estimate RelOptInfo->pages.
>
> 0003-ZedStore-use-extended-ANAlYZE-API.patch is attached only to show
> how Zedstore uses the previous patches to:
> 1. Use logical block ids to acquire the sample rows.
> 2. Acquire sample rows only from a specified column c1; this is used
> when the user analyzes the table on specified columns only, e.g.
> "analyze zs (c1)".
> 3. During ANALYZE, the zedstore table AM provides extra disk size info;
> ANALYZE then computes the physical fraction statistic of each column
> and the planner uses it to estimate the IO cost based on the selected
> columns.

I couldn't find an entry for that patchset in the next commitfest. Could you register it so that it won't be forgotten?