Robert Haas <robertmh...@gmail.com> writes: > Thanks for working on this. Some review comments:
> - I think all of the new path creation functions should bubble up > parallel_degree from their subpath. Ah, thanks, I didn't have any clue how to set that (though in my defense, the documentation about it is next to nonexistent). Just to confirm, I assume the rules are: * parallel_aware: indicates whether the plan node itself has any parallelism behavior * parallel_safe: indicates that the entire plan tree rooted at this node is safe to execute in a parallel worker * parallel_degree: indicates number of parallel threads potentially useful for this plan tree (0 if not parallel-safe) This leads me to the conclusion that all these new node types should set parallel_aware to false and copy up the other two fields from the child, except for LockRows and ModifyTable which should set them all to false/0. Correct? If so, I'll go fix. > - RollupPath seems like a poor choice of name, if nothing else. You > would expect that it would be related to GROUP BY ROLLUP() but of > course that's really the same thing as GROUP BY GROUPING SETS () or > GROUP BY CUBE (), and the fundamental operation is actually GROUPING > SETS, not ROLLUP. As I noted to David, that thing seems to me to be in need of refactoring, but I'd just as soon leave untangling the grouping-sets mess for later. I don't mind substituting a different name if you have a better idea, but don't really want to do more work than that right now. > - It's not entirely related to this patch, but I'm starting to wonder > if we've made the wrong bet about target lists. It seems to me that > there's a huge difference between a projection which simply throws > away columns we don't need and one which actually computes something, > and maybe those cases ought to be treated differently instead of > saying "well, it's a target list". It strikes me that there are > probably execution-time optimizations that are possible in the former > case, and maybe a more compact representation of the projection > operation as well. I can't shake the feeling that our extensive use > of lists can't be the best thing ever for performance. We do already have the "physical tlist" optimization. I agree that there's more work to be done here, but again would rather leave that to a later patch. > - A related point that is more connected to this patch is that you've > added 13 (!) new calls to disuse_physical_tlist, and 8 of those are > marked with /* XXX hack: need original tlist with sortgroupref marking > */. I don't quite understand what's going on there. I think if we're > going to be stuck with that hack we at least need some comments > explaining what is going on there. What has caused us to suddenly > need these calls when we didn't before, and why these places and not > some others? Yeah, that's a hack to get things working. The problem is that these node types need to be able to identify sort/group columns in their inputs, but if the child has switched to a "physical tlist" then the ressortgroupref marking isn't there, and indeed the needed column might not be there at all if it's a computed expression not a Var. So what I did for the moment was to force the inputs back to their nominal tlists. In the old code we didn't have this problem because none of the upper-level plan node types could see a physical tlist unless make_subplanTargetList had allowed it, and then we applied locate_grouping_columns() to re-identify the grouping columns. That logic probably needs to be transposed into createplan.c, but I've not taken the time yet to figure out exactly how. I don't know if it's reasonable to do that separately from rethinking how the whole disuse_physical_tlist thing works. > - For SortPath, you mention that the SortGroupClauses representation > isn't currently used. It's not clear to me that it ever would be; > what would be the case where that might be useful? At any rate, I'd > be inclined to rip it out for now; you can always put it back later. Yeah, I was dithering about that. It seems like createplan.c now has a few too many ways to identify sort/group columns, and I was hoping to consolidate them somehow. That might lead to wanting to use SortGroupClauses not PathKeys in some cases. But until that's worked out, I agree the extra field is useless and we can just drop it. > - create_distinct_paths() disables the hash path if it seems like it > would exceed work_mem, unless the sort path isn't viable. But there's > something that feels a bit uncomfortable about this. Suppose the sort > path isn't viable but some other kind of future path is viable. It > seems like it would be better to restructure this a bit so that the > decision about whether to add the hash path is based on whether there > are any other paths in the rel when we reach the bottom of the > function. create_grouping_paths() has a similar issue. OK, I'll take a look. Quite a lot of these functions can probably stand more local rearrangements; I've been mainly concerned about getting the overall structure right. (BTW, I am also suspicious that there's now dead code in places, but I've not gone looking for that either. There is a lot of rather boring mop-up to be done, which I left out of the v1 patch mostly to keep it from being even more unreviewably huge than it had to be.) > In general, and I'm sure this is not a huge surprise, most of this > looks very good to me. I think the design is sound and that, if the > performance is OK, we ought to move forward with it. Thanks. As I told Teodor last night, I can't reproduce a performance issue here with pgbench-style queries. Do you have any thoughts about how we might satisfy ourselves whether there is or isn't a performance problem? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers