Hi hackers, I have a question about `use_physical_tlist()` which is applied in `create_scan_plan()`: ``` if (flags == CP_IGNORE_TLIST) { tlist = NULL; } else if (use_physical_tlist(root, best_path, flags)) { if (best_path->pathtype == T_IndexOnlyScan) { /* For index-only scan, the preferred tlist is the index's */ tlist = copyObject(((IndexPath *) best_path)->indexinfo->indextlist);
/* * Transfer sortgroupref data to the replacement tlist, if * requested (use_physical_tlist checked that this will work). */ if (flags & CP_LABEL_TLIST) apply_pathtarget_labeling_to_tlist(tlist, best_path->pathtarget); } else { tlist = build_physical_tlist(root, rel); …… ``` And the comment above the code block says: ``` /* * For table scans, rather than using the relation targetlist (which is * only those Vars actually needed by the query), we prefer to generate a * tlist containing all Vars in order. This will allow the executor to * optimize away projection of the table tuples, if possible. * * But if the caller is going to ignore our tlist anyway, then don't * bother generating one at all. We use an exact equality test here, so * that this only applies when CP_IGNORE_TLIST is the only flag set. */ ``` But for some column-oriented database based on Postgres, it may help a lot in case of projection of the table tuples in execution? And is there any other optimization considerations behind this design? e.g. If we have such table definition and a query: ``` CREATE TABLE partsupp (PS_PARTKEY INT, PS_SUPPKEY INT, PS_AVAILQTY INTEGER, PS_SUPPLYCOST DECIMAL(15,2), PS_COMMENT VARCHAR(199), dummy text); explain analyze verbose select sum(ps_supplycost * ps_availqty) from partsupp; ``` And the planner would give such plan: ``` QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Aggregate (cost=12.80..12.81 rows=1 width=32) (actual time=0.013..0.015 rows=1 loops=1) Output: sum((ps_supplycost * (ps_availqty)::numeric)) -> Seq Scan on public.partsupp (cost=0.00..11.60 rows=160 width=22) (actual time=0.005..0.005 rows=0 loops=1) Output: ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, ps_comment, dummy Planning Time: 0.408 ms Execution Time: 0.058 ms (6 rows) ``` It looks the columns besides `ps_supplycost` and `ps_availqty` are not necessary, but fetched from tuples all at once. For the row-based storage such as heap, it looks fine, but for column-based storage, it would result into unnecessary overhead and impact performance. Is there any plan to optimize here? Thanks.