While hacking on zedstore, we needed to get a list of the columns to be projected--basically, all of the columns needed to satisfy the query. The two use cases we have for this are 1) to pass this column list down to the AM layer for the AM to leverage it and 2) to use it during planning to improve costing. In other threads, such as [1], there has been discussion about the possible benefits, for all table types, of having access to this set of columns.
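For concreteness, the shape we have been imagining for this list is one set of attribute numbers per range table entry, represented the way pull_varattnos() and RangeTblEntry->selectedCols represent column sets--attribute numbers offset by FirstLowInvalidHeapAttributeNumber, so that system attributes and whole-row references can be included. A rough sketch (the struct and field names here are hypothetical, not existing PostgreSQL API):

#include "nodes/bitmapset.h"

/*
 * Per-query context mapping each scan relation (by range table index)
 * to the set of attributes the query needs from it.  Attribute numbers
 * are stored offset by FirstLowInvalidHeapAttributeNumber, as in
 * pull_varattnos() and RangeTblEntry->selectedCols.
 */
typedef struct NeededColumnsContext
{
	int         nrels;	/* number of range table entries */
	Bitmapset **cols;	/* per-RTE set of needed attributes */
} NeededColumnsContext;

For example, SELECT a.x FROM a JOIN b ON a.id = b.id WHERE b.y > 0 would need {id, x} from a and {id, y} from b.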
Focusing on how to get this used-columns list (as opposed to how to pass it down to the AM), we have tried a few approaches to constructing it and wanted to get some ideas on how best to do it. Mainly, we are trying to determine in which phase to collect the columns -- after parsing, after planning, or during execution right before calling the AM.

Approach A: right before calling the AM

Leverage expression_tree_walker() right before calling beginscan(), collecting the columns into a needed-columns context. This approach is what is currently in the zedstore patch mentioned in this thread [2]; a rough sketch of such a walker is included at the end of this mail. The benefit of this approach is that it walks the tree right before the used-column set will be used, which makes it easy to skip the walk for queries or AMs that don't benefit from the list.

Approach B: after parsing and/or after planning

Add a new member 'used_cols' to PlannedStmt which contains the attributes for each relation present in the query. Construct 'used_cols' at the end of planning using the PathTargets in the RelOptInfos in PlannerInfo->simple_rel_array and the RangeTblEntries in PlannerInfo->simple_rte_array (also sketched at the end of this mail). The nice thing about this is that it does not require a full walk of the plan tree; Approach A could be more expensive if the tree is quite large. I'm not sure, however, whether just getting the PathTargets from the RelOptInfos is sufficient to obtain the whole set of columns used in the query.

Approach B, however, does not work for utility statements, which do not go through planning. One potential solution we tried was getting the columns from the query tree after parse analysis and then, in exec_simple_query(), adding the column list to the PlannedStmt. This turned out to be as messy as Approach A, or messier, because each kind of utility statement has its own data structure that is composed of elements taken from the Query tree but does not directly include the original PlannedStmt created for the query (for utility statements the PlannedStmt doesn't contain anything except the query tree, since they do not go through planning). So, for each type of utility statement, we would have to separately copy over the column list from the PlannedStmt in its own way. It is worth noting that Approach A also requires special handling for each type of utility statement.

We are wondering about specific benefits of Approach B -- that is, is there some use case (maybe plan re-use) for having the column set accessible in the PlannedStmt itself? One potential benefit of Approach B could be for scans of partitioned tables: the used-column list could be collected once for the query instead of once for each partition.

Neither approach, however, addresses our second use case, since we would not have the column list during planning for non-utility statements. To satisfy that, we would likely have to extract the columns from the query tree after parse analysis for non-utility statements as well. An approach which extracted this list before planning and saved it somewhere would help avoid having to do the same walk during planning and then again during execution -- though a list constructed after parsing may not be ideal when some columns could have been eliminated during planning.

Melanie & Ashwin

[1] https://www.postgresql.org/message-id/20190409010440.bqdikgpslh42pqit%40alap3.anarazel.de
[2] https://www.postgresql.org/message-id/CALfoeiuuLe12PuQ%3DzvM_L7B5qegBqGHYENHUGbLOsjAnG%3Dmi4A%40mail.gmail.com
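To make the two approaches above more concrete, here are rough, untested sketches. Everything other than existing PostgreSQL functions and fields (expression_tree_walker(), pull_varattnos(), bms_add_member(), and the Var, PlannerInfo, RelOptInfo, and RestrictInfo members) is hypothetical, reuses the NeededColumnsContext struct sketched earlier, and uses header locations as of current master.

Approach A sketch -- a walker run over the scan node's targetlist and quals right before beginscan(), adding every Var it finds to the per-relation sets:

#include "postgres.h"
#include "access/sysattr.h"
#include "nodes/nodeFuncs.h"
#include "nodes/primnodes.h"

static bool
needed_columns_walker(Node *node, NeededColumnsContext *context)
{
	if (node == NULL)
		return false;

	if (IsA(node, Var))
	{
		Var *var = (Var *) node;

		/* only Vars of this query level that refer to real RTEs */
		if (var->varlevelsup == 0 &&
			var->varno >= 1 && var->varno <= context->nrels)
			context->cols[var->varno - 1] =
				bms_add_member(context->cols[var->varno - 1],
							   var->varattno - FirstLowInvalidHeapAttributeNumber);
		return false;
	}

	return expression_tree_walker(node, needed_columns_walker,
								  (void *) context);
}

Approach B sketch -- at the end of planning, gather the attributes referenced by each base relation's PathTarget and its restriction clauses (this is the part I'm not sure is sufficient to cover every column used in the query), with the result to be stored in the hypothetical 'used_cols' member of PlannedStmt:

#include "postgres.h"
#include "nodes/pathnodes.h"
#include "optimizer/optimizer.h"

static void
collect_used_cols(PlannerInfo *root, NeededColumnsContext *context)
{
	int		rti;

	for (rti = 1; rti < root->simple_rel_array_size; rti++)
	{
		RelOptInfo *rel = root->simple_rel_array[rti];
		ListCell   *lc;

		/* skip RTEs with no RelOptInfo or that aren't plain base rels */
		if (rel == NULL || rel->reloptkind != RELOPT_BASEREL)
			continue;

		/* columns the scan must emit */
		pull_varattnos((Node *) rel->reltarget->exprs, rel->relid,
					   &context->cols[rti - 1]);

		/* columns used only in quals applied at the scan */
		foreach(lc, rel->baserestrictinfo)
		{
			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

			pull_varattnos((Node *) rinfo->clause, rel->relid,
						   &context->cols[rti - 1]);
		}
	}
}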