On Wed, May 29, 2019 at 4:51 PM Kohei KaiGai <kai...@heterodb.com> wrote:
> Hi Amit, > > 2019年5月29日(水) 13:26 Amit Langote <langote_amit...@lab.ntt.co.jp>: > > > > Kaigai-san, > > > > On 2019/05/29 12:13, Kohei KaiGai wrote: > > > One interesting data type in Apache Arrow is "Struct" data type. It is > > > equivalent to composite > > > type in PostgreSQL. The "Struct" type has sub-fields, and individual > > > sub-fields have its own > > > values array for each. > > > > > > It means we can skip to load the sub-fields unreferenced, if > > > query-planner can handle > > > referenced and unreferenced sub-fields correctly. > > > On the other hands, it looks to me RelOptInfo or other optimizer > > > related structure don't have > > > this kind of information. RelOptInfo->attr_needed tells extension > > > which attributes are referenced > > > by other relation, however, its granularity is not sufficient for > sub-fields. > > > > Isn't that true for some other cases as well, like when a query accesses > > only some sub-fields of a json(b) column? In that case too, planner > > itself can't optimize away access to other sub-fields. What it can do > > though is match a suitable index to the operator used to access the > > individual sub-fields, so that the index (if one is matched and chosen) > > can optimize away accessing unnecessary sub-fields. IOW, it seems to me > > that the optimizer leaves it up to the indexes (and plan nodes) to > further > > optimize access to within a field. How is this case any different? > > > I think it is a little bit different scenario. > Even if an index on sub-fields can indicate the tuples to be fetched, > the fetched tuple contains all the sub-fields because heaptuple is > row-oriented data. > For example, if WHERE-clause checks a sub-field: "x" then aggregate > function references other sub-field "y", Scan/Join node has to return > a tuple that contains both "x" and "y". IndexScan also pops up a tuple > with a full composite type, so here is no problem if we cannot know > which sub-fields are referenced in the later stage. > Maybe, if IndexOnlyScan supports to return a partial composite type, > it needs similar infrastructure that can be used for a better composite > type support on columnar storage. > There is another issue related to the columnar store that needs targeted columns for projection from the scan is discussed in zedstore [1]. Projecting all columns from a columnar store is quite expensive than the row store. [1] - https://www.postgresql.org/message-id/CALfoeivu-n5o8Juz9wW%2BkTjnis6_%2BrfMf%2BzOTky1LiTVk-ZFjA%40mail.gmail.com Regards, Haribabu Kommi Fujitsu Australia