I wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: >> A full EXPLAIN ANALYZE is always desirable - we agree on that. The >> question is what we do when one is not available.
> The least bad alternative I've heard is to let EXPLAIN ANALYZE print > out stats-so-far if the query is canceled by control-C or statement > timeout. The objection to this is you may mistake startup transients > for full query behavior ... but at least the numbers will be good as > far as they go. I thought some more about this, and it seems doable but tricky (ie, there are many wrong ways to do it). Here are my conclusions: We can't just use the normal QueryCancel logic that throws an elog(ERROR) from someplace down inside the query. This would leave the backend in an unclean state from which we could only certainly recover by doing AbortTransaction. And once we've aborted the transaction we can't do catalog accesses, which gets in the way of producing the EXPLAIN printout. Running the test query inside a subtransaction would fix that, but aborting the subtransaction would throw away the executor state, including the Instrumentation nodes we need. So it seems we need a way to stop the query without raising an error per se. What I'm thinking is that while EXPLAIN ANALYZE is running, timeout or SIGINT should not set QueryCancelPending, but instead set a separate flag "ExplainCancelPending", which we then test in ExecProcNode(), say if (node->instrument) { + if (ExplainCancelPending) + return NULL; InstrStartNode(node->instrument); } There might be one or two other places to check it, but basically we'd only notice the flag in very limited circumstances where it's definitely safe to force early termination of ExecutorRun. When control comes back to explain.c, we just print the results as normal (but probably adding a line explicitly noting that the query was abandoned before completion). Note that we won't have any "running" Instrumentation nodes to contend with, since the change doesn't cause nodes to drop out after they've started timing. So the data is good, just incomplete. After we've printed the results, we have a bit of a problem: if ExplainCancelPending is set, we now need to abort the transaction. It would not do at all to allow an incompletely executed UPDATE to commit. I experimented with throwing an elog() at the bottom of ExplainQuery() after end_tup_output(), but this does not work: psql only prints the error and not the data, because libpq throws away the query result upon seeing the error. We could probably hack psql to print the results before noting the error, but I'm concerned about making a change that would change the behavior for other error-at-end-of-statement cases. Also, it's not clear what might have to happen to get non-psql clients to play along. It seems like the best solution is to establish a subtransaction around the entire EXPLAIN command (not just the test query), which we can abort after we've printed the results. A possible objection to this is that running a query inside a subtransaction might have different/worse performance than running it at top level. I don't recall any severe bottlenecks of that kind but that doesn't mean there aren't any (Alvaro, any comments?) Note that this would mean that ending an EXPLAIN ANALYZE early, via either control-C or statement timeout, would be treated as a non-error situation from the point of view of the outer transaction. This bothers me a bit, because in fact the effects if any of the tested query would have been rolled back. Not sure we have any choice though. If we expose the error then we'll have problems with clients not showing the EXPLAIN results. Thoughts? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match