Hi,

On 2021-05-30 15:14:33 +0200, Andreas Karlsson wrote:
> I did the first part since it seemed easy enough and an obvious win for all
> workloads.
Cool!

> +typedef struct GIST_COL_STATE
> +{
> +    FmgrInfo    consistentFn;
> +    FmgrInfo    unionFn;
> +    FmgrInfo    compressFn;
> +    FmgrInfo    decompressFn;
> +    FmgrInfo    penaltyFn;
> +    FmgrInfo    picksplitFn;
> +    FmgrInfo    equalFn;
> +    FmgrInfo    distanceFn;
> +    FmgrInfo    fetchFn;
> +
> +    /* Collations to pass to the support functions */
> +    Oid         supportCollation;
> +} GIST_COL_STATE;
> +
>  /*
>   * GISTSTATE: information needed for any GiST index operation
>   *
> @@ -83,18 +99,7 @@ typedef struct GISTSTATE
>      TupleDesc   fetchTupdesc;   /* tuple descriptor for tuples returned in an
>                                   * index-only scan */
>
> -    FmgrInfo    consistentFn[INDEX_MAX_KEYS];
> -    FmgrInfo    unionFn[INDEX_MAX_KEYS];
> -    FmgrInfo    compressFn[INDEX_MAX_KEYS];
> -    FmgrInfo    decompressFn[INDEX_MAX_KEYS];
> -    FmgrInfo    penaltyFn[INDEX_MAX_KEYS];
> -    FmgrInfo    picksplitFn[INDEX_MAX_KEYS];
> -    FmgrInfo    equalFn[INDEX_MAX_KEYS];
> -    FmgrInfo    distanceFn[INDEX_MAX_KEYS];
> -    FmgrInfo    fetchFn[INDEX_MAX_KEYS];
> -
> -    /* Collations to pass to the support functions */
> -    Oid         supportCollation[INDEX_MAX_KEYS];
> +    GIST_COL_STATE column_state[FLEXIBLE_ARRAY_MEMBER];
>  } GISTSTATE;

This makes me wonder if the better design would be to keep the layout of
dense arrays for each type of function, but to make it more dense by
allocating dynamically.

As above, GIST_COL_STATE is 436 bytes (I think), i.e. *well* above a cache
line - far enough apart that accessing different columns' equalFn or such
will be hard for the hardware prefetcher to understand.

Presumably not all functions are accessed all the time. So we could lay it
out as

struct GISTSTATE
{
...
    FmgrInfo   *consistentFn;
    FmgrInfo   *unionFn;
...
}
[ncolumns consistentFns follow]
[ncolumns unionFns follow]

which'd likely end up with better cache locality for gist indexes with a few
columns. At the expense of a pointer indirection, of course. (A rough sketch
of what I mean is at the end of this mail.)

Another angle: I don't know how it is for GIST, but for btree, the
FunctionCall2Coll() etc. overhead shows up significantly - directly
allocating the FunctionCallInfo and initializing it once, instead of on
every call, reduces overhead noticeably (but is a bit complicated to
implement, due to the insertion scan and stuff).

I'd be surprised if we didn't get better performance for gist if it had
initialized-once FunctionCallInfos instead of the FmgrInfos. And that's not
even just true because of needing to re-initialize the FunctionCallInfo on
every call, but also because the function call itself rarely accesses the
data from the FmgrInfo, but always accesses the FunctionCallInfo. And a
FunctionCallInfo with 1 argument is the same size as an FmgrInfo, with 2
it's 16 bytes more. So storing the once-initialized FunctionCallInfo results
in considerably better locality, by not having to pull all the FmgrInfos
into cache as well.

One annoying bit: Right now it's not trivial to declare arrays of
specific-number-of-arguments FunctionCallInfos. See the ugliness of
LOCAL_FCINFO. I think we should consider having a bunch of typedefs for
1..3 argument FunctionCallInfo structs to make that easier. Probably would
still need union trickery, but it'd not be *too* bad, I think.

Greetings,

Andres Freund
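
PS: To make the dense-array layout above a bit more concrete, here's a rough,
uncompiled sketch. It assumes GISTSTATE were reshaped to hold one pointer per
support function, all pointing into a single allocation; field and function
names are just illustrative, and the unrelated parts of the struct are elided:

#include "postgres.h"
#include "fmgr.h"
#include "utils/rel.h"

/* hypothetical reshaped GISTSTATE; unrelated fields elided */
typedef struct GISTSTATE
{
    /* ... scanCxt, tempCxt, tuple descriptors, etc. as today ... */
    FmgrInfo   *consistentFn;       /* ncolumns entries */
    FmgrInfo   *unionFn;            /* ncolumns entries */
    /* ... pointers for the remaining seven support functions ... */
    Oid        *supportCollation;   /* ncolumns entries */
} GISTSTATE;

static GISTSTATE *
initGISTstate_dense(Relation index)
{
    int         ncolumns = IndexRelationGetNumberOfKeyAttributes(index);
    Size        size;
    char       *ptr;
    GISTSTATE  *giststate;

    /* header, nine ncolumns-sized FmgrInfo arrays, then the collations */
    size = MAXALIGN(sizeof(GISTSTATE));
    size += 9 * ncolumns * sizeof(FmgrInfo);
    size += ncolumns * sizeof(Oid);

    giststate = (GISTSTATE *) palloc0(size);
    ptr = (char *) giststate + MAXALIGN(sizeof(GISTSTATE));

    /* all columns' consistentFn entries end up adjacent in memory, etc. */
    giststate->consistentFn = (FmgrInfo *) ptr;
    ptr += ncolumns * sizeof(FmgrInfo);
    giststate->unionFn = (FmgrInfo *) ptr;
    ptr += ncolumns * sizeof(FmgrInfo);
    /* ... same for the remaining seven support functions ... */

    /* collations live after all nine FmgrInfo arrays */
    giststate->supportCollation =
        (Oid *) ((char *) giststate + MAXALIGN(sizeof(GISTSTATE)) +
                 9 * ncolumns * sizeof(FmgrInfo));

    /* the per-column fmgr_info_copy() etc. would follow, as today */
    return giststate;
}

That way e.g. all the penaltyFn entries for neighboring columns sit next to
each other, which is the locality benefit I was alluding to, at the cost of
one extra pointer hop.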
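
And a sketch of the initialized-once FunctionCallInfo idea (also uncompiled,
helper names made up). The point is that InitFunctionCallInfoData() runs once
at setup, and each call only sets the arguments and jumps straight to the
function, without touching the FmgrInfo at all:

#include "postgres.h"
#include "fmgr.h"

/*
 * Set up a heap-allocated FunctionCallInfo once, instead of re-running
 * InitFunctionCallInfoData() inside FunctionCall2Coll() and friends on
 * every call.
 */
static FunctionCallInfo
build_fcinfo_once(FmgrInfo *flinfo, Oid collation, int nargs)
{
    FunctionCallInfo fcinfo;

    fcinfo = (FunctionCallInfo) palloc0(SizeForFunctionCallInfo(nargs));
    InitFunctionCallInfoData(*fcinfo, flinfo, nargs, collation, NULL, NULL);
    return fcinfo;
}

/*
 * Per call, only the arguments and the isnull flag need resetting before
 * invoking the function directly via FunctionCallInvoke().
 */
static Datum
call_prepared_2args(FunctionCallInfo fcinfo, Datum arg0, Datum arg1)
{
    fcinfo->args[0].value = arg0;
    fcinfo->args[0].isnull = false;
    fcinfo->args[1].value = arg1;
    fcinfo->args[1].isnull = false;
    fcinfo->isnull = false;

    return FunctionCallInvoke(fcinfo);
}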
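
For the "annoying bit", something like the following is what I had in mind:
LOCAL_FCINFO's union trick wrapped in named types, so that fixed-size
FunctionCallInfos can be embedded in structs and arrays. Whether embedding
these is acceptable everywhere is exactly the union-trickery question; the
struct at the end and its argument counts are purely illustrative:

#include "postgres.h"
#include "fmgr.h"

/*
 * Fixed-size FunctionCallInfo variants, same union trick as LOCAL_FCINFO,
 * but as named types instead of an on-the-spot local declaration.
 */
typedef union FunctionCallInfo2Data
{
    FunctionCallInfoBaseData fcinfo;
    char        fcinfo_data[SizeForFunctionCallInfo(2)];
} FunctionCallInfo2Data;

typedef union FunctionCallInfo3Data
{
    FunctionCallInfoBaseData fcinfo;
    char        fcinfo_data[SizeForFunctionCallInfo(3)];
} FunctionCallInfo3Data;

/* which would then allow per-column, once-initialized call state, e.g.: */
typedef struct GistColumnCallState
{
    FunctionCallInfo3Data penaltyFc;    /* argument counts illustrative */
    FunctionCallInfo2Data unionFc;
    /* ... */
} GistColumnCallState;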