Em dom., 20 de set. de 2020 às 21:10, Tomas Vondra < tomas.von...@2ndquadrant.com> escreveu:
> On Sun, Sep 20, 2020 at 08:09:56PM -0300, Ranier Vilela wrote: > >Em sex., 18 de set. de 2020 às 10:37, Tomas Vondra < > >tomas.von...@2ndquadrant.com> escreveu: > > > >> On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote: > >> >Em qui., 17 de set. de 2020 às 17:09, Tomas Vondra < > >> >tomas.von...@2ndquadrant.com> escreveu: > >> > > >> >> On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote: > >> >> >Hi, > >> >> > > >> >> >In case gd->any_hashable is FALSE, grouping_is_hashable is never > >> called. > >> >> >In this case, the planner could use HASH for groupings, but will > never > >> >> know. > >> >> > > >> >> > >> >> The condition is pretty simple - if the query has grouping sets, > look at > >> >> grouping sets, otherwise look at groupClause. For this to be an issue > >> >> the query would need to have both grouping sets and (independent) > group > >> >> clause - which AFAIK is not possible. > >> >> > >> >Uh? > >> >(parse->groupClause != NIL) If different from NIL we have > ((independent) > >> >group clause), grouping_is_hashable should check? > >> >(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause)))) > >> >If gd is not NIL and gd->any_hashtable is FALSE? > >> > > >> > >> Sorry, I'm not quite sure I understand what this is meant to say :-( > >> > >> Anyway, (groupClause != NIL) does not mean the groupClause is somehow > >> independent (of what?). Add some debugging to create_grouping_paths and > >> you'll see that e.g. this query ends up with groupClause with 3 items: > >> > >> select 1 from t group by grouping sets ((a), (b), (c)); > >> > >> and so does this one: > >> > >> select 1 from t group by grouping sets ((a,c), (a,b)); > >> > >> I'm no expert in grouping sets, but I'd bet this means we transform the > >> grouping sets into a groupClause by extracting the keys. I haven't > >> investigated why exactly we do this, but I'm sure there's a reason (e.g. > >> it gives us SortGroupClause). > >> > >> You seem to believe a query can have both grouping sets and regular > >> grouping at the same level - but how would such query look like? Surely > >> you can't have two GROuP BY clauses. You can do > >> > >Translating into code: > >gd is grouping sets and > >parse->groupClause is regular grouping > >and we cannot have both at the same time. > > > > Have you verified the claim that we can't have both at the same time? As > I already explained, we build groupClause from grouping sets. I suggest > you do some debugging using the queries I used as examples, and you'll > see the claim is not true. > I think we already agreed that we cannot have both at the same time. > > > > >> select 1 from t group by a, grouping sets ((b), (c)); > >> > >> which is however mostly equivalent to (AFAICS) > >> > >> select 1 from t group by grouping sets ((a,b), (a,c)) > >> > >> so it's not like we have an independent groupClause either I think. > >> > >> The whole point of the original condition is - if there are grouping > >> sets, check if at least one can be executed using hashing (i.e. all keys > >> are hashable). Otherwise (without grouping sets) look at the grouping as > >> a whole. > >> > >Or if gd is NULL check parse->groupClause. > > > > Which is what I'm saying. > > > > >> I don't see how your change improves this - if there are grouping sets, > >> it's futile to look at the whole groupClause if at least one grouping > >> set can't be hashed. > >> > >> But there's a simple way to disprove this - show us a case (table and a > >> query) where your code correctly allows hashing while the current one > >> does not. > > > >I'm not an expert in grouping either. > >The question I have here is whether gd is populated and has gd-> > >any_hashable as FALSE, > >Its mean no use checking parse-> groupClause, it's a waste of time, Ok. > > > > I'm sorry, I don't follow your logic. Those are two separate cases. If > we have grouping sets, we have to check if at least one can be hashed. > If we don't have grouping sets, we have to check groupClause directly. > Why would that be a waste of time is unclear to me. > This is clear to me. The problem is how it was implemented in create_grouping_paths. > > > > >> > >> > > >> >> For hashing to be worth considering, at least one grouping set has > to be > >> >> hashable - otherwise it's pointless. > >> >> > >> >> Granted, you could have something like > >> >> > >> >> GROUP BY GROUPING SETS ((a), (b)), c > >> >> > >> >> which I think essentially says "add c to every grouping set" and that > >> >> will be covered by the any_hashable check. > >> >> > >> >I am not going into the merit of whether or not it is worth hashing. > >> >grouping_is_hashable as a last resort, must decide. > >> > > >> > >> I don't know what this is supposed to say either. The whole point of > >> this check is to simply skip construction of hash-based paths in cases > >> when it's obviously futile (i.e. when some of the keys don't support > >> hashing). We do this as early as possible, because the whole point is to > >> save precious CPU cycles during query planning. > >> > > > >> > > >> >> >Apparently gd pointer, will never be NULL there, verified with > >> Assert(gd > >> >> != > >> >> >NULL). > >> >> > > >> >> > >> >> Um, what? If you add the assert right before the if condition, you > won't > >> >> even be able to do initdb. It's pretty clear it'll crash for any > query > >> >> without grouping sets. > >> >> > >> >Here not: > >> >Assert(gd != NULL); > >> >create_ordinary_grouping_paths(root, input_rel, grouped_rel, > >> > agg_costs, gd, &extra, > >> > &partially_grouped_rel); > >> > > >> > >> I have no idea where you're adding this assert. But simply adding it to > >> create_grouping_paths (right before the if condition changed by your > >> patch) will trigger a failure during initdb. Simply because for queries > >> without grouping sets (and with regular grouping) we pass gs=NULL. > >> > >> Try applying the attached patch and do "pg_ctl -D ... init" - you'll get > >> a failure proving that gd=NULL. > >> > >Well, I did a test right now, downloaded the latest HEAD and added the > >Assertive. > >I compiled everything, ran the regression tests, installed the binaries, > >loaded the server and installed a client database. > >Everything is working. > >Maybe in all these steps, there is no grouping that would trigger the code > >in question, but I doubt it. > > > > For me doing this leads to reliable crashes when pg_regress does initdb > (so before the actual checks): > > patch -p1 < ~/grouping-assert.patch > ./configure --enable-debug --enable-cassert > make -s clean && make -s -j4 && make check > > And the "make check" it immediately crashes like this: > > ============== creating temporary instance ============== > ============== initializing database system ============== > > pg_regress: initdb failed > Examine /home/user/work/postgres/src/test/regress/log/initdb.log for > the reason. > > on the assert. So yeah, I'd guess there's something wrong with your > build. What does pg_config say? > Default. I never change anything. I simply clone the last head: git clone --single-branch https://github.com/postgres/postgres/ and have it compiled. > > > > >> > >> >Otherwise Coverity would be right. > >> >CID 1412604 (#1 of 1): Dereference after null check (FORWARD_NULL)13. > >> >var_deref_model: Passing null pointer gd to > >> create_ordinary_grouping_paths, > >> >which dereferences it. [show details > >> >< > >> > https://scan6.coverity.com/eventId=31389494-14&modelId=31389494-0&fileInstanceId=105740687&filePath=%2Fdll%2Fpostgres%2Fsrc%2Fbackend%2Foptimizer%2Fplan%2Fplanner.c&fileStart=4053&fileEnd=4180 > >> >] > >> > > >> > > >> >Which cannot be true, gd is never NULL here. > >> > > >> > >> Or maybe coverity simply does not realize the NULL-dereference can't > >> happen, because it's guarded by other conditions, or something like > >> that ... As the assert failure demonstrates, we do indeed get NULLs > >> here, and it does not crash so coverity is missing something. > >> > >1. With Assert. > >All works. > > > > No, it doesn't. > Here yes. Latest head, msvc 2019 (64 bits), Windows 10 64 bits. vcregress check install c:\postgres_debug pg_ctl -D c:\postgres_debug\data -l c:\postgres_debug\log\logfile start locmaq.db=# select 1 from tbPersons group by grouping sets ((a), (b), (c)); ERROR: column "a" does not exist LINE 1: select 1 from tbPersons group by grouping sets ((a), (b), (c... None crash. > >2. create_ordinary_grouping_paths is called with gd var. > > > >3. create_ordinary_grouping_paths call get_number_of_groups > > > >4. get_number_of_groups dereference gd pointer (line 3705). > >foreach(lc, gd->rollups) > > > > Yes. And that only happens in branch > > if (parse->groupingSets) { ... } > > which means we know gd is not null here. But coverity does not realize > that, and produces bogus report. > > >5. line (3701: > >Assert(gd); /* keep Coverity happy */ > > > >Of course, this works, gd is never NULL here. > > > >6. grouping_is_hashable is never called here, because gd is never NULL > here. > >if ((parse->groupClause != NIL && > >agg_costs->numOrderedAggs == 0 && > >(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause)))) > >flags |= GROUPING_CAN_USE_HASH; > > > >The question is is it worth it or not worth calling > (grouping_is_hashable), > >at that point? > > > > Here crash. Line (2199, planner.c): if (have_grouping) { Assert(gset_data != NULL); ^Clocmaq.db=?# select 1 from tbpersons group by grouping sets ((a), (b), (c)); no connection to the server The connection to the server was lost. Attempting reset: Failed. !?> !?> \q regards, Ranier Vilela