Hi hackers, While trying to measure if there is any gain from the change as discussed in [1], I happened to notice another place that is slowed down by list_delete_first. I'm using the query as below:
(n=1000000; printf "explain (summary on) select * from t where " for ((i=1;i<$n;i++)); do printf "a = $i or "; done; printf "a = $n;" ) | psql And I notice that a large part of planning time is spent on the list_delete_first calls inside simplify_or_arguments(). I think the issue here is clear and straightforward: list_delete_first has an O(N) cost due to data movement. And I believe similar issue has been discussed several times before. I wonder if we can improve it by using list_delete_last instead, so I tried the following change: --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -3612,9 +3612,9 @@ simplify_or_arguments(List *args, unprocessed_args = list_copy(args); while (unprocessed_args) { - Node *arg = (Node *) linitial(unprocessed_args); + Node *arg = (Node *) llast(unprocessed_args); - unprocessed_args = list_delete_first(unprocessed_args); + unprocessed_args = list_delete_last(unprocessed_args); With this change, in my box the planning time for the query above is reduced from 64257.784 ms to 1411.666 ms, a big improvement. The side effect is that it results in a lot of plan diffs in regression tests, but they are all about different order of OR arguments. I believe simplify_and_arguments() can also benefit from similar changes. But I'm not sure if we could have such a long AND/OR arguments in real world. So is this worth doing? [1] https://www.postgresql.org/message-id/CAMbWs4-RXhgz0i4O1z62gt%2BbTLTM5vXYyYhgnius0j_txLH7hg%40mail.gmail.com Thanks Richard