James Coleman <jtc...@gmail.com> writes: > But I think that still leaves something missing: after all, > prepare_sort_from_pathkeys does know how to insert new target list > entries, so something either there (or in the caller/how its called) > has to be enforcing an apparently implicit rule about what point in > the tree it's safe to do such. Or even just no path generation had > ever considered it before (that feels unsatisfactory as an explanation > to me, because it feels more implicit than I'd like, but...)
Hi guys, I hadn't been paying any attention to this thread, but Robert asked me to take a look. A few comments: 1. James was wondering, far upthread, why we would do projections pre-sort or post-sort. I think the answers can be found by studying planner.c's make_sort_input_target(), which separates out what we want to do pre-sort and post-sort. (There, the "sort" is envisioned as a standalone Sort node atop all the scan/join steps; but its decisions nonetheless constrain what will be considered for sorting that's implemented further down.) It has a *very* long introductory comment laying out all the considerations. 2. Robert is correct that under normal circumstances, the targetlists of both baserels and join rels contain only Vars. Any computations we have to do are postponed to the final top-level targetlist, whether they are volatile or not. The fact that this policy is applied regardless of volatility may explain why James isn't seeing volatility checks where he expected them. The core (and, I think, only) exception to this is that an expression that is a sort or group key has to be evaluated earlier. But even then, it won't be pushed down as far as the reltargets of any scan or join relations; it'll be computed after the final join. 3. If we have a volatile sort/group key, that constrains the set of plans we can validly generate, because the key expression mustn't be evaluated redundantly. With the addition of parallelism, a non-parallel-safe sort/group key expression likewise constrains the set of valid plans, even if it's not considered volatile. 4. In the past, the only way we'd consider non-Var sort keys in any way during scan/join planning was if (a) a relation had an expression index matching a requested sort key; of course, that's a-fortiori going to be a non-volatile expression. Or (b) we needed to sort in order to provide suitable input for a merge join ... but again, volatile expressions aren't considered candidates for mergejoin. So there basically wasn't any need to consider sorting by volatiles below the top level sort/group processing, and that again contributes to why you don't see explicit tests in that area. I'm not sure offhand whether the parallel-query patches added anything to guard against nonvolatile-but-parallel-unsafe sort expressions. If not, there might be live bugs there independently of incremental sort; or that might be unreachable because of overall limitations on parallelism. 5. While I've not dug far enough into the code to identify the exact issue, the impression I have about this bug and the parallel-unsafe subplan bug is that the incremental sort code is generating Paths without due regard to point 3. If it is making paths based on explicit sorts for final-stage pathkeys, independently of either expression indexes or mergejoin requirements, that could well explain why it's got problems that weren't seen before. 6. I did look at ebb7ae839, and I suspect that the last part of find_em_expr_usable_for_sorting_rel(), ie the search of the reltarget list, is dead code. There is no case where a volatile expression would appear there, per point 2. The committed regression test cases certainly do not provide a counterexample: a look at the code coverage report will show you that that loop never finds a match in any regression test case. I'd be inclined to flush that code and just say that we won't return volatile expressions here. regards, tom lane