On Sat, May 3, 2025 at 7:48 PM Chengpeng Yan <chengpeng_...@outlook.com> wrote:
> I've been following the V4 patches (focusing on 1 and 2 for now): Patch 2's
> preprocess_relation_rtes is a nice improvement for efficiently gathering
> early catalog info like inh and attgenerated definitions in one pass.
>
> However, Patch 1 needs to add expansion calls inside specific pull-up
> functions (like convert_EXISTS_sublink_to_join) because the main expansion
> work was moved before pull_up_sublinks.
>
> Could we perhaps simplify this? What if preprocess_relation_rtes only
> collected the attgenerated definitions (storing them, maybe in a hashtable
> like planned for attnotnull in Patch 3), but didn't perform the actual
> expansion (Var replacement)?
>
> Then, we could perform the actual expansion (Var replacement) in a separate,
> single, global step later on. Perhaps after pull_up_sublinks (closer to the
> original timing), or maybe even later still, for instance after
> flatten_simple_union_all, once the main query structure including pulled-up
> subqueries/links has stabilized? A unified expansion after the major
> structural changes seems cleaner. I'm not sure where is the better position
> now.
>
> This might avoid the need for the extra expansion calls within
> convert_EXISTS_sublink_to_join, etc., keeping the information gathering
> separate from the expression transformation and potentially making the
> overall flow a bit cleaner.
>
> Any thoughts?

This approach is possible, but I chose not to go that route because
1) it would require an additional loop over the rangetable; 2) it
would involve collecting and storing in a hash table a lot more
information that is only used during the expansion of virtual
generated columns. This includes not only the attgenerated attributes
of columns that you mentioned, but also the default values of columns
and the total number of attributes in the tuple. Therefore, it seems
to me that expanding the virtual generated columns within the same
loop is cleaner and more efficient.

Please note that even if we move the expansion of virtual generated
columns into a separate loop, it still needs to occur before subquery
pull-up. This is because we must ensure that RTE_RELATION RTEs do not
have lateral markers. In other words, the expansion still needs to
take place within the subquery pull-up function.

Thanks
Richard