On Sun, 8 Oct 2023 at 23:52, Richard Guo <guofengli...@gmail.com> wrote: > On Thu, Oct 5, 2023 at 2:26 PM David Rowley <dgrowle...@gmail.com> wrote: >> >> So in short, I propose the attached fix without any regression tests >> because I feel that any regression test would just mark that there was >> a big in create_agg_path() and not really help with ensuring we don't >> end up with some similar problem in the future. > > > If the pathkeys that were added by adjust_group_pathkeys_for_groupagg() > are computable from the targetlist, it seems that we do not need to trim > them off, because prepare_sort_from_pathkeys() will add resjunk target > entries for them. But it's also no harm if we trim them off. So I > think the patch is a pretty safe fix. +1 to it.
hmm, I think one of us does not understand what is going on here. I tried to explain in [1] why we *need* to strip off the pathkeys added by adjust_group_pathkeys_for_groupagg(). Given the following example: create table ab (a int,b int); explain (costs off) select a,count(distinct b) from ab group by a; QUERY PLAN ---------------------------- GroupAggregate Group Key: a -> Sort Sort Key: a, b -> Seq Scan on ab (5 rows) adjust_group_pathkeys_for_groupagg() will add the pathkey for the "b" column and that results in the Sort node sorting on {a,b}. It's simply not at all valid to have the GroupAggregate path claim that its pathkeys are also (effectively) {a,b}" as "b" does not and cannot legally exist after the aggregation takes place. We cannot put a resjunk "b" in the targetlist of the GroupAggregate either as there could be any number "b" values aggregated. Can you explain why you think we can put a resjunk "b" in the target list of the GroupAggregate in the above case? >> >> I have some concerns that the assert_pathkeys_in_target() function >> might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm >> not proposing to commit that without further discussion. > > > Yeah, it looks like some heavy to call assert_pathkeys_in_target() for > each path node. Can we run some benchmarks to see how much overhead it > would bring to USE_ASSERT_CHECKING build? I think it'll be easy to show that there is an overhead to it. It'll be in the realm of the ~41ms patched vs ~24ms unpatched that I showed in [2]. That's quite an extreme case. Maybe it's worth checking the total planning time spent in a run of the regression tests with and without the patch to see how much overhead it adds to the "average case". David [1] https://postgr.es/m/CAApHDvpJJigQRW29TppTOPYp+Aui0mtd3MpfRxyKv=n-tb6...@mail.gmail.com [2] https://postgr.es/m/caaphdvo7rzcqyw-gnkzr6qcijcqf8vjlkj4xfk-kawvyaw1...@mail.gmail.com