Kaigai-san,

On 2019/05/29 12:13, Kohei KaiGai wrote:
> One interesting data type in Apache Arrow is "Struct" data type. It is
> equivalent to composite type in PostgreSQL. The "Struct" type has
> sub-fields, and individual sub-fields have its own values array for each.
>
> It means we can skip to load the sub-fields unreferenced, if
> query-planner can handle referenced and unreferenced sub-fields correctly.
> On the other hands, it looks to me RelOptInfo or other optimizer
> related structure don't have this kind of information.
> RelOptInfo->attr_needed tells extension which attributes are referenced
> by other relation, however, its granularity is not sufficient for
> sub-fields.

Isn't that true for some other cases as well, like when a query accesses
only some sub-fields of a json(b) column?  In that case too, the planner
itself can't optimize away access to the other sub-fields.  What it can do,
though, is match a suitable index to the operator used to access the
individual sub-fields, so that the index (if one is matched and chosen)
can optimize away accessing unnecessary sub-fields.  IOW, it seems to me
that the optimizer leaves it up to the indexes (and plan nodes) to further
optimize access within a field.  How is this case any different?

> Probably, all we can do right now is walk-on the RelOptInfo list to
> lookup FieldSelect node to see the referenced sub-fields. Do we have a
> good idea instead of this expensive way?
> # Right now, PG-Strom loads all the sub-fields of Struct column from
> # arrow_fdw foreign-table regardless of referenced / unreferenced
> # sub-fields. Just a second best.

Maybe I'm missing something, but if PG-Strom/arrow_fdw does look at the
FieldSelect nodes to see which sub-fields are referenced, why doesn't it
generate a plan that will only access those sub-fields, or why can't it?

Thanks,
Amit
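
To make the FieldSelect walk discussed above concrete, here is a minimal
sketch in Python over a toy expression tree. It is not PostgreSQL or
PG-Strom code: the classes Var, FieldSelect, OpExpr and Const only borrow
the names of the corresponding planner nodes, and collect_subfields and
WHOLE_COLUMN are hypothetical names introduced for illustration. It assumes
the extension can see the target list and quals of the scan, and it records,
per column, which sub-fields are referenced, falling back to "whole column"
when the column is used without a field selection.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


@dataclass(frozen=True)
class Var:
    """Toy column reference: (relation index, attribute number)."""
    varno: int
    varattno: int


@dataclass(frozen=True)
class Const:
    """Toy constant."""
    value: object


@dataclass(frozen=True)
class FieldSelect:
    """Toy field selection: picks sub-field `fieldnum` (1-based) of `arg`."""
    arg: "Expr"
    fieldnum: int


@dataclass(frozen=True)
class OpExpr:
    """Toy operator call with positional arguments."""
    opname: str
    args: Tuple["Expr", ...]


Expr = Union[Var, Const, FieldSelect, OpExpr]

# Hypothetical marker meaning "the whole column is needed".
WHOLE_COLUMN = 0

Needed = Dict[Tuple[int, int], Set[int]]


def collect_subfields(expr: Expr, needed: Needed) -> Needed:
    """Record which sub-fields of which columns `expr` references."""
    if isinstance(expr, FieldSelect):
        if isinstance(expr.arg, Var):
            # (col).field: only this sub-field of the column is needed.
            key = (expr.arg.varno, expr.arg.varattno)
            needed.setdefault(key, set()).add(expr.fieldnum)
        else:
            # Nested selection: stay conservative and record what the
            # inner expression needs.
            collect_subfields(expr.arg, needed)
    elif isinstance(expr, Var):
        # Bare column reference: every sub-field must be loaded.
        key = (expr.varno, expr.varattno)
        needed.setdefault(key, set()).add(WHOLE_COLUMN)
    elif isinstance(expr, OpExpr):
        for arg in expr.args:
            collect_subfields(arg, needed)
    # Const references no columns.
    return needed


# Roughly: SELECT (s).a FROM t WHERE (s).c > 10 AND id = 1
# with id as attribute 1 and the Struct column s as attribute 2.
s = Var(varno=1, varattno=2)
targetlist = [FieldSelect(s, 1)]
quals = [
    OpExpr(">", (FieldSelect(s, 3), Const(10))),
    OpExpr("=", (Var(1, 1), Const(1))),
]

needed: Needed = {}
for e in targetlist + quals:
    collect_subfields(e, needed)
```

In this toy run, only sub-fields 1 and 3 of column s are recorded, so a
scan built from this map could skip the values array of sub-field 2. The
cost concern raised in the quoted text still applies: the walk touches
every expression once.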