Short version: In what I'm currently working on I had a few questions about arrays and the execExpr/execExprInterp framework that didn't seem obviously answered in the code or README.
- Does the execExpr/execExprInterp framework allow a scalar array op to get an already expanded array (unless I'm missing something we can't easily lookup a given index in a flattened array)? - If not, is there a way in that framework to know if the array expr has stayed the same through multiple evaluations of the expression tree (i.e., so you could expand and sort it just once)? Long version/background: I've been looking at how a query like: select * from t where i in (<long array>); executes as a seq scan the execution time increases linearly with respect to the length of the array. That's because in execExprInterp.c's ExecEvalScalarArrayOp() we do a linear search through the array. In contrast, when setting up a btree scan with a similar saop, we first sort the array, remove duplicates, and remove nulls. Of course with btree scans this has other values, like allowing us to return results in array order. I've been considering approaches to improve the seq scan case. We might: - At plan time rewrite as a hash join to a deduped array values relation (gross, but included here since that's an approach we can take rewriting the SQL itself as a proof of concept). - At execution time build a hash and lookup. - At execution time sort the array and binary search through it. I've been working other last approach to see what results I might get (it seemed like the easiest to hack together). Putting that together left me with the questions mentioned above in the "short version". Obviously if anyone has thoughts on the above approaches I'd be interested in that too. Side question: when we do: arr = DatumGetArrayTypeP(*op->resvalue); in ExecEvalScalarArrayOp() is that going to be expensive each time through a seq scan? Or is it (likely) going to resolve to an already in-memory array and effectively be the cost of retrieving that pointer? James