Side effect of remove_useless_groupby_columns
Hi,

When looking at [1], I realized we may have a side effect when removing
redundant columns in the GROUP BY clause. Suppose we have a query with
ORDER BY 'b', and meanwhile column 'b' is also a group key. If we decide
that 'b' is redundant due to being functionally dependent on other GROUP
BY columns, we would remove it from group keys. This will make us lose
the opportunity to leverage the index on 'b'.

Here is an example for illustration.

# create table t (a int primary key, b int);
# insert into t select i, i%1000 from generate_series(1,100)i;
# create index on t(b);

By default, we remove 'b' from group keys and generate a plan as below:

# explain (costs off) select b from t group by a, b order by b limit 10;
                   QUERY PLAN
------------------------------------------------
 Limit
   ->  Sort
         Sort Key: b
         ->  Group
               Group Key: a
               ->  Index Scan using t_pkey on t
(6 rows)

The index on 'b' is not being used and we'll have to retrieve all the
data underneath to perform the sort work. On the other hand, if we keep
'b' as a group column, we can get such a plan as:

# explain (costs off) select b from t group by a, b order by b limit 10;
                    QUERY PLAN
--------------------------------------------------
 Limit
   ->  Group
         Group Key: b, a
         ->  Incremental Sort
               Sort Key: b, a
               Presorted Key: b
               ->  Index Scan using t_b_idx on t
(7 rows)

With the help of 't_b_idx', we can avoid the full scan on 't' and it
would run much faster.

Any thoughts?

[1] https://www.postgresql.org/message-id/flat/16869-26346b77d6ccaeec%40postgresql.org

Thanks
Richard
Re: Side effect of remove_useless_groupby_columns
Hi,

On Sun, Feb 28, 2021 at 5:15 PM David Rowley wrote:
> On Sun, 28 Feb 2021 at 20:52, Richard Guo wrote:
> > When looking at [1], I realized we may have a side effect when removing
> > redundant columns in the GROUP BY clause. Suppose we have a query with
> > ORDER BY 'b', and meanwhile column 'b' is also a group key. If we decide
> > that 'b' is redundant due to being functionally dependent on other GROUP
> > BY columns, we would remove it from group keys. This will make us lose
> > the opportunity to leverage the index on 'b'.
> >
> > Any thoughts?
>
> I had initially thought that this was a non-issue before incremental
> sort was added, but the following case highlights that's wrong.
>
> # create table t (a int not null, b int);
> # insert into t select i, i%1000 from generate_series(1,100)i;
> # create index on t(b,a);
>
> # explain (analyze) select b from t group by a, b order by b,a limit 10;
> # alter table t add constraint t_pkey primary key (a);
> # explain (analyze) select b from t group by a, b order by b,a limit 10;
>
> Execution slows down after adding the primary key since that allows
> the functional dependency to be detected and the "b" column removed
> from the GROUP BY clause.
>
> Perhaps we could do something like add:
>
> diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
> index 545b56bcaf..7e52212a13 100644
> --- a/src/backend/optimizer/plan/planner.c
> +++ b/src/backend/optimizer/plan/planner.c
> @@ -3182,7 +3182,8 @@ remove_useless_groupby_columns(PlannerInfo *root)
>  			if (!IsA(var, Var) ||
>  				var->varlevelsup > 0 ||
>  				!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
> -							   surplusvars[var->varno]))
> +							   surplusvars[var->varno]) ||
> +				list_member(parse->sortClause, sgc))
>  				new_groupby = lappend(new_groupby, sgc);
>  		}
>
> to remove_useless_groupby_columns().
> It's not exactly perfect though
> since it's still good to remove the useless GROUP BY columns if
> there's no useful index to implement the ORDER BY. We just don't know
> that when we call remove_useless_groupby_columns().

Or, rather than thinking of it as a side effect of removing useless
groupby columns, how about we do an additional optimization of 'adding
useful groupby columns', i.e., adding to the group keys some column
which matches the ORDER BY and meanwhile is functionally dependent on
other GROUP BY columns?

What I'm thinking of is a scenario like below.

# create table t (a int primary key, b int, c int);
# insert into t select i, i%1000, i%1000 from generate_series(1,100)i;
# create index on t(b);

# explain (analyze) select b from t group by a, c order by b limit 10;

For this query, we can first remove 'c' from the groupby columns with
the existing optimization of 'removing useless groupby columns', and
then add 'b' to the groupby columns with the newly-added optimization
of 'adding useful groupby columns'. We can do that because we know 'b'
is functionally dependent on 'a', we are required to be sorted by 'b',
and meanwhile there is an index on 'b' that we can leverage.

Thanks
Richard
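To visualize the proposal above: the 'adding useful groupby columns' step would in effect let the planner treat the query as if 'b' were a leading group key. The query below is a hand-written sketch of that intended equivalent (not output from any actual patch); since 'a' is the primary key, grouping by (b, a) produces the same groups as grouping by (a, c).

```sql
-- Hypothetical hand-rewritten equivalent of the proposed optimization:
-- 'b' is added as a leading group key, exposing it as a sort key that
-- the index on t(b) can provide, while the grouping result is unchanged
-- because 'a' alone already determines each group.
explain (costs off)
select b from t group by b, a order by b limit 10;
```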
Re: A problem about partitionwise join
On Fri, Nov 6, 2020 at 11:26 PM Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:
> Status update for a commitfest entry.
>
> According to CFbot this patch fails to apply. Richard, can you send an
> update, please?
>
> Also, I see that the thread was inactive for a while.
> Are you going to continue this work? I think it would be helpful, if you
> could write a short recap about current state of the patch and list open
> questions for reviewers.
>
> The new status of this patch is: Waiting on Author

Thanks Anastasia. I've rebased the patch with latest master.

To recap, the problem we are fixing here is that when generating join
clauses from equivalence classes, we only select the join clause with the
'best score', or the first join clause with a score of 3. This may cause
us to miss some join clauses on partition keys and thus fail to generate
a partitionwise join.

The initial idea for the fix was to create all the RestrictInfos from ECs
in order to check whether there exist equi-join conditions involving
pairs of matching partition keys of the relations being joined, for all
partition keys. Then Tom proposed a much better idea, which leverages the
function exprs_known_equal() to tell whether the partkeys can be found in
the same eclass; that is the current implementation in the latest patch.

Thanks
Richard

v4-0001-Fix-up-partitionwise-join.patch
Description: Binary data
A spot of redundant initialization of brin memtuple
Happened to notice this when reading around the code. The BrinMemTuple
would be initialized in brin_new_memtuple(), right after being created,
so we don't need to initialize it again outside.

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index ccc9fa0959..67a277e1f9 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1261,8 +1261,6 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
 	state->bs_bdesc = brin_build_desc(idxRel);
 	state->bs_dtuple = brin_new_memtuple(state->bs_bdesc);
 
-	brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
-
 	return state;
 }

Thanks
Richard
Re: A spot of redundant initialization of brin memtuple
On Sat, Nov 20, 2021 at 12:23 AM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> On Fri, Nov 19, 2021 at 1:13 PM Richard Guo wrote:
> >
> > Happened to notice this when reading around the codes. The BrinMemTuple
> > would be initialized in brin_new_memtuple(), right after being created.
> > So we don't need to initialize it again outside.
> >
> > diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
> > index ccc9fa0959..67a277e1f9 100644
> > --- a/src/backend/access/brin/brin.c
> > +++ b/src/backend/access/brin/brin.c
> > @@ -1261,8 +1261,6 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
> >  	state->bs_bdesc = brin_build_desc(idxRel);
> >  	state->bs_dtuple = brin_new_memtuple(state->bs_bdesc);
> >
> > -	brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
> > -
> >  	return state;
> > }
>
> Good catch. +1 for the change. Please submit a patch.

Thanks for the review. Attached is the patch.

Thanks
Richard

v1-0001-Remove-redundant-initialization-of-brin-memtuple.patch
Description: Binary data
Re: A spot of redundant initialization of brin memtuple
On Mon, Nov 22, 2021 at 12:52 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> On Mon, Nov 22, 2021 at 8:53 AM Richard Guo wrote:
> >
> > On Sat, Nov 20, 2021 at 12:23 AM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> >>
> >> On Fri, Nov 19, 2021 at 1:13 PM Richard Guo wrote:
> >> >
> >> > Happened to notice this when reading around the codes. The BrinMemTuple
> >> > would be initialized in brin_new_memtuple(), right after being created.
> >> > So we don't need to initialize it again outside.
> >> >
> >> > diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
> >> > index ccc9fa0959..67a277e1f9 100644
> >> > --- a/src/backend/access/brin/brin.c
> >> > +++ b/src/backend/access/brin/brin.c
> >> > @@ -1261,8 +1261,6 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
> >> >  	state->bs_bdesc = brin_build_desc(idxRel);
> >> >  	state->bs_dtuple = brin_new_memtuple(state->bs_bdesc);
> >> >
> >> > -	brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
> >> > -
> >> >  	return state;
> >> > }
> >>
> >> Good catch. +1 for the change. Please submit a patch.
> >
> > Thanks for the review. Attached is the patch.
>
> Thanks. The patch looks good to me. Let's add it to the commitfest to
> not lose track of it.

Done. Here it is: https://commitfest.postgresql.org/36/3424/

Thanks again for the review.

Thanks
Richard
Re: A problem about partitionwise join
On Wed, Oct 6, 2021 at 1:19 AM Jaime Casanova wrote:
> On Wed, Jul 21, 2021 at 04:44:53PM +0800, Richard Guo wrote:
> > On Fri, Nov 27, 2020 at 8:05 PM Ashutosh Bapat <ashutosh.bapat@gmail.com> wrote:
> > >
> > > In the example you gave earlier, the equi join on partition key was
> > > there but it was replaced by individual constant assignment clauses.
> > > So if we keep the original restrictclause in there with a new flag
> > > indicating that it's redundant, have_partkey_equi_join will still be
> > > able to use it without much change. Depending upon where all we need
> > > to use avoid restrictclauses with the redundant flag, this might be an
> > > easier approach. However, with Tom's idea partition-wise join may be
> > > used even when there is no equi-join between partition keys but there
> > > are clauses like pk = const for all tables involved and const is the
> > > same for all such tables.
> >
> > Correct. So with Tom's idea partition-wise join can cope with clauses
> > such as 'foo.k1 = bar.k1 and foo.k2 = 16 and bar.k2 = 16'.
> >
> > > In the spirit of small improvement made to the performance of
> > > have_partkey_equi_join(), pk_has_clause should be renamed as
> > > pk_known_equal and pks_known_equal as num_equal_pks.
> >
> > Thanks for the suggestion. Will do that in the new version of patch.
>
> Hi Richard,
>
> We are marking this CF entry as "Returned with Feedback", which means
> you are encouraged to send a new patch (and create a new entry for a
> future CF for it) with the suggested changes.

Hi,

The suggested changes have already been included in the v5 patch. Sorry
for the confusion.

I've verified that the patch still applies and works on latest master.
So I'm moving it to the next CF (which is Commitfest 2022-01). Please
correct me if this is not the right thing to do.

Thanks
Richard
Inconsistent results from seqscan and gist indexscan
Here is how it can be reproduced.

create table point_tbl (f1 point);
insert into point_tbl(f1) values ('(5.1, 34.5)');
insert into point_tbl(f1) values (' ( Nan , NaN ) ');
analyze;
create index gpointind on point_tbl using gist (f1);

set enable_seqscan to on;
set enable_indexscan to off;

# select * from point_tbl where f1 <@ polygon '(0,0),(0,100),(100,100),(100,0),(0,0)';
     f1
------------
 (5.1,34.5)
 (NaN,NaN)
(2 rows)

set enable_seqscan to off;
set enable_indexscan to on;

# select * from point_tbl where f1 <@ polygon '(0,0),(0,100),(100,100),(100,0),(0,0)';
     f1
------------
 (5.1,34.5)
(1 row)

Seems point_inside() does not handle NaN properly.

BTW, I'm using 15devel, but this issue can be seen in at least version 12
as well.

Thanks
Richard
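Consistent with the seqscan result above, the inconsistency can also be demonstrated without the table or the index, by evaluating the operator on constants directly (this is an illustration of the same repro, relying only on what the seqscan output already shows: the operator evaluates to true for the NaN point on affected versions):

```sql
-- Direct evaluation of the containment operator, bypassing any index.
-- On the affected versions, point_inside() treats (NaN,NaN) as inside
-- the polygon, which is why the seqscan returns that row while the
-- gist indexscan does not.
select point '(NaN,NaN)' <@ polygon '(0,0),(0,100),(100,100),(100,0),(0,0)';
```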
Re: Inconsistent results from seqscan and gist indexscan
On Fri, Nov 26, 2021 at 5:23 PM Julien Rouhaud wrote:
> Hi,
>
> On Fri, Nov 26, 2021 at 2:10 PM Richard Guo wrote:
> >
> > Seems point_inside() does not handle NaN properly.
>
> This is unfortunately a known issue, which was reported twice ([1] and
> [2]) already. There's a patch proposed for it:
> https://commitfest.postgresql.org/32/2710/ (adding Horiguchi-san in
> Cc).

Ah, I missed the previous threads. Good to know there is a patch fixing
it.

Thanks
Richard
Re: d25ea01275 and partitionwise join
Hi Amit,

On Wed, Sep 4, 2019 at 10:01 AM Amit Langote wrote:
> Fujita-san,
>
> To avoid losing track of this, I've added this to November CF.
>
> https://commitfest.postgresql.org/25/2278/
>
> I know there is one more patch beside the partitionwise join fix, but
> I've set the title to suggest that this is related mainly to
> partitionwise joins.

Thank you for working on this. Currently partitionwise join does not
take COALESCE exprs into consideration when matching to partition keys.
This is a problem.

BTW, a rebase is needed for the patch set.

Thanks
Richard
Re: d25ea01275 and partitionwise join
Hi Amit,

On Wed, Sep 4, 2019 at 3:30 PM Richard Guo wrote:
> Hi Amit,
>
> On Wed, Sep 4, 2019 at 10:01 AM Amit Langote wrote:
>> Fujita-san,
>>
>> To avoid losing track of this, I've added this to November CF.
>>
>> https://commitfest.postgresql.org/25/2278/
>>
>> I know there is one more patch beside the partitionwise join fix, but
>> I've set the title to suggest that this is related mainly to
>> partitionwise joins.
>
> Thank you for working on this. Currently partitionwise join does not
> take COALESCE expr into consideration when matching to partition keys.
> This is a problem.
>
> BTW, a rebase is needed for the patch set.

I'm reviewing v2-0002 and I have a concern about how COALESCE exprs are
processed in match_join_arg_to_partition_keys().

If there is a COALESCE expr with the first arg being a non-partition-key
expr and the second arg being a partition key, the patch would match it
to the partition key, which may result in wrong results in some cases.

For instance, consider the partitioned table below:

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);

With patch v2-0002, the following query will be planned with
partitionwise join.

# explain (costs off)
select * from (p as t1 full join p as t2 on t1.k = t2.k) as t12(k1,val1,k2,val2)
  full join p as t3 on COALESCE(t12.val1, t12.k1) = t3.k;
                       QUERY PLAN
--------------------------------------------------------
 Append
   ->  Hash Full Join
         Hash Cond: (COALESCE(t1.val, t1.k) = t3.k)
         ->  Hash Full Join
               Hash Cond: (t1.k = t2.k)
               ->  Seq Scan on p_1 t1
               ->  Hash
                     ->  Seq Scan on p_1 t2
         ->  Hash
               ->  Seq Scan on p_1 t3
   ->  Hash Full Join
         Hash Cond: (COALESCE(t1_1.val, t1_1.k) = t3_1.k)
         ->  Hash Full Join
               Hash Cond: (t1_1.k = t2_1.k)
               ->  Seq Scan on p_2 t1_1
               ->  Hash
                     ->  Seq Scan on p_2 t2_1
         ->  Hash
               ->  Seq Scan on p_2 t3_1
(19 rows)

But as t1.val is not a partition key, we actually cannot use
partitionwise join here. If we insert the data below into the table, we
will get wrong results for the query above.

insert into p select 5,15;
insert into p select 15,5;

Thanks
Richard
Pulling up direct-correlated ANY_SUBLINK
Hi,

Currently we do not try to pull up a sub-select of type ANY_SUBLINK if
it refers to any Vars of the parent query, as indicated in the code
snippet below:

JoinExpr *
convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
							Relids available_rels)
{
	...
	if (contain_vars_of_level((Node *) subselect, 1))
		return NULL;

Why do we have this check? Can we try to pull up a direct-correlated ANY
SubLink with the help of LATERAL? That is, do the pull-up in the same
way as for an uncorrelated ANY SubLink, by adding the SubLink's
subselect to the query's rangetable, but explicitly setting LATERAL for
its RangeTblEntry, like:

--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1226,13 +1226,6 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	Assert(sublink->subLinkType == ANY_SUBLINK);
 
 	/*
-	 * The sub-select must not refer to any Vars of the parent query. (Vars of
-	 * higher levels should be okay, though.)
-	 */
-	if (contain_vars_of_level((Node *) subselect, 1))
-		return NULL;
-
-	/*
 	 * The test expression must contain some Vars of the parent query, else
 	 * it's not gonna be a join. (Note that it won't have Vars referring to
 	 * the subquery, rather Params.)
@@ -1267,7 +1260,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	rte = addRangeTableEntryForSubquery(pstate,
 										subselect,
 										makeAlias("ANY_subquery", NIL),
-										false,
+										contain_vars_of_level((Node *) subselect, 1), /* lateral */
 										false);
 	parse->rtable = lappend(parse->rtable, rte);
 	rtindex = list_length(parse->rtable);

By this way, we can convert the query:

select * from a where a.i = ANY(select i from b where a.j > b.j);

to:

select * from a SEMI JOIN lateral(select * from b where a.j > b.j) sub
  on a.i = sub.i;

Does this make sense?

Thanks
Richard
Re: A problem about partitionwise join
Hi Alvaro,

Thank you for reviewing this patch.

On Wed, Sep 11, 2019 at 4:48 AM Alvaro Herrera from 2ndQuadrant <alvhe...@alvh.no-ip.org> wrote:
> So in this patch, the input restrictlist is modified to include the
> clauses generated by generate_join_implied_equalities_for_all. That
> doesn't seem okay -- doesn't it affect downstream usage of the
> restrictlist in the caller of set_joinrel_size_estimates?

Actually the joinclauses generated by
generate_join_implied_equalities_for_all only affect the restrictlist
used in have_partkey_equi_join to check equi-join conditions for
partition keys. The input restrictlist would not be altered.

> I wonder if it's possible to do this by using the ECs directly in
> have_partkey_equi_join instead of using them to create fake join
> clauses.

Hmm.. I thought about this option and concluded in the end that what we
would need to do in have_partkey_equi_join with the ECs is actually the
same as what generate_join_implied_equalities_for_all does. Maybe I
didn't think it through correctly.

Thanks
Richard
Re: Pulling up direct-correlated ANY_SUBLINK
Hi Antonin,

On Tue, Sep 10, 2019 at 4:31 PM Antonin Houska wrote:
> Richard Guo wrote:
>
> > Can we try to pull up direct-correlated ANY SubLink with the help of
> > LATERAL?
> >
> > By this way, we can convert the query:
> >
> > select * from a where a.i = ANY(select i from b where a.j > b.j);
> >
> > To:
> >
> > select * from a SEMI JOIN lateral(select * from b where a.j > b.j)
> > sub on a.i = sub.i;
>
> I tried this a few years ago. This is where the problems started:
>
> https://www.postgresql.org/message-id/1386716782.5203.YahooMailNeo%40web162905.mail.bf1.yahoo.com

Thank you for this link. Good to know the discussions from years ago.

> I'm not sure I remember enough, but the problem has something to do
> with one possible strategy to plan SEMI JOIN: unique-ify the inner path
> and then perform plain INNER JOIN instead.
>
> I think the problem was that the WHERE clause of the subquery didn't
> participate in the SEMI JOIN evaluation and was used as a filter
> instead. Thus the clause's Vars were not used in unique keys of the
> inner path and so the SEMI JOIN didn't work well.

This used to be a problem until it was fixed by commit 043f6ff0, which
includes the postponed qual from a LATERAL subquery into the quals seen
by make_outerjoininfo().

Thanks
Richard
Re: Pulling up direct-correlated ANY_SUBLINK
Hi Antonin,

On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska wrote:
>
> Nevertheless, I don't know how to overcome the problems that I
> mentioned upthread.

Do you mean the problem "the WHERE clause of the subquery didn't
participate in the SEMI JOIN evaluation"? Good news is it has been fixed
by commit 043f6ff0, as I mentioned upthread.

Thanks
Richard
Re: Pulling up direct-correlated ANY_SUBLINK
Hi Tom,

On Tue, Sep 10, 2019 at 9:48 PM Tom Lane wrote:
>
> > Can we try to pull up direct-correlated ANY SubLink with the help of
> > LATERAL?
>
> Perhaps. But what's the argument that you'd end up with a better
> plan? LATERAL pretty much constrains things to use a nestloop,
> so I'm not sure there's anything fundamentally different.

This is a point I didn't think of. In that case, if the pull-up mostly
results in a nestloop, then we cannot be sure we will get a better plan.
Thank you for pointing it out.

Thanks
Richard
Re: Pulling up direct-correlated ANY_SUBLINK
On Thu, Sep 12, 2019 at 11:35 PM Antonin Houska wrote:
> Richard Guo wrote:
>
> > On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska wrote:
> > >
> > > Nevertheless, I don't know how to overcome the problems that I
> > > mentioned upthread.
> >
> > Do you mean the problem "the WHERE clause of the subquery didn't
> > participate in the SEMI JOIN evaluation"? Good news is it has been
> > fixed by commit 043f6ff0 as I mentioned upthread.
>
> Do you say that my old patch (rebased) no longer breaks the regression
> tests?

I think so.

> (I noticed your other email in the thread which seems to indicate that
> you're no longer interested to work on the feature, but asking out of
> curiosity.)

Tom pointed out that even if we pull up the subquery with the help of
LATERAL, we cannot make sure we will end up with a better plan, since
LATERAL pretty much constrains things to use a nestloop. Hmm, I think
what he said makes sense.

Thanks
Richard
Re: A problem about partitionwise join
Hi Dilip,

Thank you for reviewing this patch.

On Fri, Sep 20, 2019 at 12:48 PM Dilip Kumar wrote:
> On Thu, Aug 29, 2019 at 3:15 PM Richard Guo wrote:
> >
> > Attached is a patch as an attempt to address this issue. The idea is
> > quite straightforward. When building partition info for joinrel, we
> > generate any possible EC-derived joinclauses of form 'outer_em =
> > inner_em', which will be used together with the original restrictlist to
> > check if there exists an equi-join condition for each pair of partition
> > keys.
> >
> > Any comments are welcome!
>
> /*
> + * generate_join_implied_equalities_for_all
> + *	  Create any EC-derived joinclauses of form 'outer_em = inner_em'.
> + *
> + * This is used when building partition info for joinrel.
> + */
> +List *
> +generate_join_implied_equalities_for_all(PlannerInfo *root,
> +										 Relids join_relids,
> +										 Relids outer_relids,
> +										 Relids inner_relids)
>
> I think we need to have more detailed comments about why we need this
> separate function. We can also explain that the
> generate_join_implied_equalities function will avoid the join clause if
> the EC has a constant, but for partition-wise join we need that clause
> too.

Thank you for the suggestion.

> + while ((i = bms_next_member(matching_ecs, i)) >= 0)
> + {
> +	EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
> +	List	   *outer_members = NIL;
> +	List	   *inner_members = NIL;
> +	ListCell   *lc1;
> +
> +	/* Do not consider this EC if it's ec_broken */
> +	if (ec->ec_broken)
> +		continue;
> +
> +	/* Single-member ECs won't generate any deductions */
> +	if (list_length(ec->ec_members) <= 1)
> +		continue;
>
> I am wondering isn't it possible to just process the missing join
> clauses? I mean 'generate_join_implied_equalities' has only skipped
> the ECs which have consts, so can't we create join clauses only for
> those ECs and append them to the "restrictlist" we already have? I
> might be missing something?

For ECs without consts, 'generate_join_implied_equalities' also skips
some join clauses, since it only selects the join clause with the
'best_score' between outer members and inner members. And the missing
join clauses are needed to generate a partitionwise join. That's why the
query below cannot be planned as a partitionwise join: we have missed
the joinclause 'foo.k = bar.k'.

select * from p as foo join p as bar on foo.k = bar.val and foo.k = bar.k;

And yes, 'generate_join_implied_equalities_for_all' will create join
clauses that already exist in the restrictlist. I think that's OK, since
the same RestrictInfo deduced from an EC will share the same pointer,
and list_concat_unique_ptr will make sure there are no duplicates in the
restrictlist used by have_partkey_equi_join.

Thanks
Richard
Re: d25ea01275 and partitionwise join
On Thu, Sep 19, 2019 at 4:15 PM Amit Langote wrote:
> Hi Richard,
>
> Thanks a lot for taking a close look at the patch and sorry about the
> delay.
>
> On Wed, Sep 4, 2019 at 5:29 PM Richard Guo wrote:
> > On Wed, Sep 4, 2019 at 10:01 AM Amit Langote wrote:
> >
> > I'm reviewing v2-0002 and I have concern about how COALESCE expr is
> > processed in match_join_arg_to_partition_keys().
> >
> > If there is a COALESCE expr with first arg being non-partition key expr
> > and second arg being partition key, the patch would match it to the
> > partition key, which may result in wrong results in some cases.
> >
> > For instance, consider the partition table below:
> >
> > create table p (k int, val int) partition by range(k);
> > create table p_1 partition of p for values from (1) to (10);
> > create table p_2 partition of p for values from (10) to (100);
> >
> > So with patch v2-0002, the following query will be planned with
> > partitionwise join.
> >
> > # explain (costs off)
> > select * from (p as t1 full join p as t2 on t1.k = t2.k) as t12(k1,val1,k2,val2)
> >   full join p as t3 on COALESCE(t12.val1, t12.k1) = t3.k;
> >                        QUERY PLAN
> > --------------------------------------------------------
> >  Append
> >    ->  Hash Full Join
> >          Hash Cond: (COALESCE(t1.val, t1.k) = t3.k)
> >          ->  Hash Full Join
> >                Hash Cond: (t1.k = t2.k)
> >                ->  Seq Scan on p_1 t1
> >                ->  Hash
> >                      ->  Seq Scan on p_1 t2
> >          ->  Hash
> >                ->  Seq Scan on p_1 t3
> >    ->  Hash Full Join
> >          Hash Cond: (COALESCE(t1_1.val, t1_1.k) = t3_1.k)
> >          ->  Hash Full Join
> >                Hash Cond: (t1_1.k = t2_1.k)
> >                ->  Seq Scan on p_2 t1_1
> >                ->  Hash
> >                      ->  Seq Scan on p_2 t2_1
> >          ->  Hash
> >                ->  Seq Scan on p_2 t3_1
> > (19 rows)
> >
> > But as t1.val is not a partition key, actually we cannot use
> > partitionwise join here.
> >
> > If we insert below data into the table, we will get wrong results for
> > the query above.
> >
> > insert into p select 5,15;
> > insert into p select 15,5;
>
> Good catch! It's quite wrong to use COALESCE(t12.val1, t12.k1) = t3.k
> for partitionwise join, as the COALESCE expression might as well output
> the value of val1, which doesn't conform to the partitioning.
>
> I've fixed match_join_arg_to_partition_keys() to catch that case and
> fail. Added a test case as well.
>
> Please find attached updated patches.

Thank you for the fix. Will review.

Thanks
Richard
Re: NOT IN subquery optimization
On Tue, Feb 26, 2019 at 6:51 AM Li, Zheng wrote:
> Resend the patch with a whitespace removed so that "git apply patch"
> works directly.

Hi Zheng,

I have reviewed your patch. Good job, except for two issues I can find:

1. The patch would give wrong results when the inner side is empty. In
this case, all the data from the outer side should be in the output, but
with the patch, we will lose the NULLs from the outer side.

2. Because of the newly-added predicate 'OR (var is NULL)', we cannot
use hash join or merge join to do the ANTI JOIN. Nested loop becomes the
only choice, which is inefficient.

Thanks
Richard
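To make issue 1 above concrete, here is a minimal sketch of the expected NOT IN semantics when the inner side is empty (the table and column names are illustrative, not taken from the patch under review):

```sql
-- With an empty inner relation, the subquery returns no rows, so
-- "a = ANY(...)" is plainly false for every outer row -- even where
-- 'a' is NULL -- and NOT IN is therefore true for every outer row.
create table foo (a int);
create table bar (b int);
insert into foo values (1), (NULL);

select * from foo where a not in (select b from bar);
-- Per SQL semantics this must return both rows of foo, (1) and (NULL);
-- the reported defect is that the transformed anti-join loses the NULL
-- row in exactly this empty-inner case.
```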
Re: Parallel grouping sets
On Wed, Jun 12, 2019 at 10:58 AM Richard Guo wrote:
> Hi all,
>
> Paul and I have been hacking recently to implement parallel grouping
> sets, and here we have two implementations.
>
> Implementation 1
>
> Attached is the patch and also there is a github branch [1] for this
> work.

Rebased with the latest master.

Thanks
Richard

v3-0001-Implementing-parallel-grouping-sets.patch
Description: Binary data
Re: Parallel grouping sets
On Tue, Jul 30, 2019 at 11:05 PM Tomas Vondra wrote:
> On Tue, Jul 30, 2019 at 03:50:32PM +0800, Richard Guo wrote:
> > On Wed, Jun 12, 2019 at 10:58 AM Richard Guo wrote:
> >
> >> Hi all,
> >>
> >> Paul and I have been hacking recently to implement parallel grouping
> >> sets, and here we have two implementations.
> >>
> >> Implementation 1
> >>
> >> Attached is the patch and also there is a github branch [1] for this
> >> work.
> >
> > Rebased with the latest master.
>
> Hi Richard,
>
> thanks for the rebased patch. I think the patch is mostly fine (at least I
> don't see any serious issues). A couple minor comments:

Hi Tomas,

Thank you for reviewing this patch.

> 1) I think get_number_of_groups() would deserve a short explanation why
> it's OK to handle (non-partial) grouping sets and regular GROUP BY in the
> same branch. Before these cases were clearly separated, now it seems a bit
> mixed up and it may not be immediately obvious why it's OK.

Added a short comment in get_number_of_groups() explaining the behavior
when doing partial aggregation for grouping sets.

> 2) There are new regression tests, but they are not added to any schedule
> (parallel or serial), and so are not executed as part of "make check". I
> suppose this is a mistake.

Yes, thanks. Added the new regression tests to parallel_schedule and
serial_schedule.

> 3) The regression tests do check plan and results like this:
>
>    EXPLAIN (COSTS OFF, VERBOSE) SELECT ...;
>    SELECT ... ORDER BY 1, 2, 3;
>
> which however means that the query might easily use a different plan than
> what's verified in the explain (thanks to the additional ORDER BY clause).
> So I think this should explain and execute the same query.
>
> (In this case the plans seem to be the same, but that may easily change
> in the future, and we could miss it here, failing to verify the results.)

Thank you for pointing this out. Fixed it in the v4 patch.

> 4) It might be a good idea to check the negative case too, i.e. a query
> on a data set that we should not parallelize (because the number of
> partial groups would be too high).

Yes, agreed. Added a negative case.

> Do you have any plans to hack on the second approach too? AFAICS those two
> approaches are complementary (address different data sets / queries), and
> it would be nice to have both. One of the things I've been wondering is if
> we need to invent gset_id as a new concept, or if we could simply use the
> existing GROUPING() function - that uniquely identifies the grouping set.

Yes, I'm planning to hack on the second approach in the near future. I'm
also reconsidering the gset_id stuff, since it brings a lot of
complexity to the second approach. I agree with you that we can try the
GROUPING() function to see if it can replace gset_id.

Thanks
Richard

v4-0001-Implementing-parallel-grouping-sets.patch
Description: Binary data
Re: How to retain lesser paths at add_path()?
On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai wrote:
> 2019年8月1日(木) 1:41 Tom Lane:
> >
> > Robert Haas writes:
> > > Yeah, but I have to admit that this whole design makes me kinda
> > > uncomfortable. Every time somebody comes up with a new figure of
> > > merit, it increases not only the number of paths retained but also the
> > > cost of comparing two paths to possibly reject one of them. A few
> > > years ago, you came up with the (good) idea of rejecting some join
> > > paths before actually creating the paths, and I wonder if we ought to
> > > try to go further with that somehow. Or maybe, as Peter Geoghegan has
> > > been saying, we ought to think about planning top-down with
> > > memoization instead of bottom up (yeah, I know that's a huge change).
> > > It just feels like the whole idea of a list of paths ordered by cost
> > > breaks down when there are so many ways that a not-cheapest path can
> > > still be worth keeping. Not sure exactly what would be better, though.
> >
> > Yeah, I agree that add_path is starting to feel creaky. I don't
> > know what to do instead though. Changing to a top-down design
> > sounds like it would solve some problems while introducing others
> > (not to mention the amount of work and breakage involved).
>
> Hmm... It looks like the problem we ought to revise about path
> construction is much larger than my expectation, and it is uncertain
> for me how much work is needed.
>
> Although it might be a workaround until a fundamental reconstruction,
> how about having a margin of estimated cost to reject paths?
> Current add_path() immediately rejects lesser paths if its cost is
> even a little more expensive than the compared one. On the other hand,

Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
the costs of two paths, although the fuzz factor (1%) is hard coded and
not user-controllable.

> I understand it is not an essential re-design of path-construction
> logic, and may have limitations. However, the amount of work is
> reasonable and there are no side-effects (current behavior = 0%
> threshold).
> How about your opinions?

How about Tom's suggestion of adding another dimension to be considered
in add_path(), just like how it considers paths of better sort order or
parallel safety?

Thanks
Richard
Re: Partial join
On Thu, Aug 1, 2019 at 5:38 PM Arne Roland wrote:
> Hello,
>
> I attached one example of a partitioned table with a multi-column
> partition key. I also attached the output.
> Disabling the hash join is not really necessary, it just shows the more
> drastic result in the case of low work_mem.
>
> Comparing the first and the second query I was surprised to see that SET
> enable_partitionwise_join could cause the costs to go up. Shouldn't the
> paths of the first query be generated as well?
>
> The third query seems to have a different issue. That one is close to my
> original performance problem. It looks to me like the push down of the sl
> condition stops the optimizer considering a partial join.
> If so, would it be sane to keep a copy of the original quals to make the
> partial join possible? Do you have better ideas?

For the third query, a rough investigation shows that the quals 'sl = 5' and 'sc.sl = sg.sl' form an equivalence class and generate two implied equalities, 'sc.sl = 5' and 'sg.sl = 5', which can be pushed down to the base rels. One consequence of this deduction is that when constructing restrict lists for the joinrel, we lose the original restriction 'sc.sl = sg.sl', and this fails the have_partkey_equi_join() check, which verifies that there exists an equi-join condition for each pair of partition keys. As a result, this joinrel is not considered as an input to further partitionwise joins.

We need to fix this.

Thanks
Richard
Re: Partial join
On Thu, Aug 1, 2019 at 10:15 PM Tom Lane wrote:
> Richard Guo writes:
> > For the third query, a rough investigation shows that, the qual 'sl =
> > 5' and 'sc.sl = sg.sl' will form an equivalence class and generate two
> > implied equalities: 'sc.sl = 5' and 'sg.sl = 5', which can be pushed
> > down to the base rels. One consequence of the deduction is when
> > constructing restrict lists for the joinrel, we lose the original
> > restrict 'sc.sl = sg.sl', and this would fail the check
> > have_partkey_equi_join(), which checks if there exists an equi-join
> > condition for each pair of partition keys. As a result, this joinrel
> > would not be considered as an input to further partitionwise joins.
>
> > We need to fix this.
>
> Uh ... why? The pushed-down restrictions should result in pruning
> away any prunable partitions at the scan level, leaving nothing for
> the partitionwise join code to do.

Hmm.. In the case of multiple partition keys, for range partitioning, if we have no clauses for a given key, any later keys are not considered for partition pruning. That is to say, for table 'p partition by range (k1, k2)', quals like 'k2 = Const' would not prune partitions.

For this query:

select * from p as t1
  join p as t2 on t1.k1 = t2.k1 and t1.k2 = t2.k2 and t1.k2 = 2;

Since we don't consider ECs containing consts when generating join clauses, we don't have the restriction 't1.k2 = t2.k2' when building the joinrel. As a result, partitionwise join is not considered, as it requires an equi-join condition for each pair of partition keys.

Is this a problem? What's your opinion?

Thanks
Richard
Re: Partial join
On Thu, Aug 1, 2019 at 7:46 PM Arne Roland wrote:
> Hello Richard,
>
> thanks for your quick reply.
>
> > We need to fix this.
>
> Do you have a better idea than just keeping the old quals - possibly just
> the ones that get eliminated - in a separate data structure? Is the push
> down of quals the only case of elimination of quals, only counting the ones
> which happen before the restrict lists are generated?

In your case, the restriction 'sl = sl' is just not generated for the join, because it forms an EC with a const, and such ECs are not considered when generating join clauses. Please refer to the code snippet below:

@@ -1164,8 +1164,8 @@ generate_join_implied_equalities(PlannerInfo *root,
        List       *sublist = NIL;

        /* ECs containing consts do not need any further enforcement */
        if (ec->ec_has_const)
            continue;

Thanks
Richard
Re: Serialization questions
On Wed, Aug 21, 2019 at 9:30 AM Alex wrote:
>
> first issue "set default_transaction_isolation to 'serializable';" on both
> sessions, then run:
>
> Session 1: begin; select * from t; (2 rows selected)
> Session 2: delete from t; (committed automatically)
> Session 1: commit; (commits successfully)
>
> it looks like the reads in session 1 have no impact on session 2 at all,
> which conflicts with the documentation

This behavior makes sense to me. The effect is the same as if we had executed the two sessions in a serial order: first session 1 and then session 2.

Thanks
Richard
A problem about partitionwise join
Hi All,

To generate partitionwise join, we need to make sure there exists an equi-join condition for each pair of partition keys, which is performed by have_partkey_equi_join(). This makes sense and works well.

But if, let's say, one certain pair of partition keys (foo.k = bar.k) has formed an equivalence class containing consts, no join clause would be generated for it, since we have already generated 'foo.k = const' and 'bar.k = const' and pushed them into the proper restrictions earlier.

This will make partitionwise join fail to be planned if there are multiple partition keys and the pushed-down restrictions 'xxx = const' fail to prune away any partitions.

Consider the examples below:

create table p (k1 int, k2 int, val int) partition by range(k1,k2);
create table p_1 partition of p for values from (1,1) to (10,100);
create table p_2 partition of p for values from (10,100) to (20,200);

If we are joining on each pair of partition keys, we can generate partitionwise join:

# explain (costs off)
select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2;
                           QUERY PLAN
----------------------------------------------------------------
 Append
   ->  Hash Join
         Hash Cond: ((foo.k1 = bar.k1) AND (foo.k2 = bar.k2))
         ->  Seq Scan on p_1 foo
         ->  Hash
               ->  Seq Scan on p_1 bar
   ->  Hash Join
         Hash Cond: ((foo_1.k1 = bar_1.k1) AND (foo_1.k2 = bar_1.k2))
         ->  Seq Scan on p_2 foo_1
         ->  Hash
               ->  Seq Scan on p_2 bar_1
(11 rows)

But if we add another qual 'foo.k2 = const', we will be unable to generate partitionwise join any more, because have_partkey_equi_join() thinks not every partition key has an equi-join condition.

# explain (costs off)
select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2 and foo.k2 = 16;
              QUERY PLAN
---------------------------------------
 Hash Join
   Hash Cond: (foo.k1 = bar.k1)
   ->  Append
         ->  Seq Scan on p_1 foo
               Filter: (k2 = 16)
         ->  Seq Scan on p_2 foo_1
               Filter: (k2 = 16)
   ->  Hash
         ->  Append
               ->  Seq Scan on p_1 bar
                     Filter: (k2 = 16)
               ->  Seq Scan on p_2 bar_1
                     Filter: (k2 = 16)
(13 rows)

Is this a problem?

Thanks
Richard
Re: A problem about partitionwise join
On Tue, Aug 27, 2019 at 8:51 AM Amit Langote wrote: > Hi Richard, > > On Mon, Aug 26, 2019 at 6:33 PM Richard Guo wrote: > > > > Hi All, > > > > To generate partitionwise join, we need to make sure there exists an > > equi-join condition for each pair of partition keys, which is performed > > by have_partkey_equi_join(). This makes sense and works well. > > > > But if, let's say, one certain pair of partition keys (foo.k = bar.k) > > has formed an equivalence class containing consts, no join clause would > > be generated for it, since we have already generated 'foo.k = const' and > > 'bar.k = const' and pushed them into the proper restrictions earlier. > > > > This will make partitionwise join fail to be planned if there are > > multiple partition keys and the pushed-down restrictions 'xxx = const' > > fail to prune away any partitions. > > > > Consider the examples below: > > > > create table p (k1 int, k2 int, val int) partition by range(k1,k2); > > create table p_1 partition of p for values from (1,1) to (10,100); > > create table p_2 partition of p for values from (10,100) to (20,200); > > > > If we are joining on each pair of partition keys, we can generate > > partitionwise join: > > > > # explain (costs off) > > select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = > bar.k2; > > QUERY PLAN > > -- > > Append > >-> Hash Join > > Hash Cond: ((foo.k1 = bar.k1) AND (foo.k2 = bar.k2)) > > -> Seq Scan on p_1 foo > > -> Hash > >-> Seq Scan on p_1 bar > >-> Hash Join > > Hash Cond: ((foo_1.k1 = bar_1.k1) AND (foo_1.k2 = bar_1.k2)) > > -> Seq Scan on p_2 foo_1 > > -> Hash > >-> Seq Scan on p_2 bar_1 > > (11 rows) > > > > But if we add another qual 'foo.k2 = const', we will be unable to > > generate partitionwise join any more, because have_partkey_equi_join() > > thinks not every partition key has an equi-join condition. 
> >
> > # explain (costs off)
> > select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 =
> > bar.k2 and foo.k2 = 16;
> >               QUERY PLAN
> > ---------------------------------------
> >  Hash Join
> >    Hash Cond: (foo.k1 = bar.k1)
> >    ->  Append
> >          ->  Seq Scan on p_1 foo
> >                Filter: (k2 = 16)
> >          ->  Seq Scan on p_2 foo_1
> >                Filter: (k2 = 16)
> >    ->  Hash
> >          ->  Append
> >                ->  Seq Scan on p_1 bar
> >                      Filter: (k2 = 16)
> >                ->  Seq Scan on p_2 bar_1
> >                      Filter: (k2 = 16)
> > (13 rows)
> >
> > Is this a problem?
>
> Perhaps. Maybe it has to do with the way have_partkey_equi_join() has
> been coded. If it was coded such that it figured out on its own that
> the equivalence (foo.k2, bar.k2, ...) does exist, then that would
> allow partitionwise join to occur, which I think would be OK to do.
> But maybe I'm missing something.

This should be caused by how we deduce join clauses from equivalence classes. ECs containing consts will not be considered, so we cannot generate (foo.k2 = bar.k2) for the query above.

In addition, when generating join clauses from equivalence classes, we only select the joinclause with the 'best score', or the first joinclause with a score of 3. This may make us miss some joinclause on partition keys. Check the query below as a more illustrative example:

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);

If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate partitionwise join:

# explain (costs off)
select * from p as foo join p as bar on foo.k = bar.k and foo.k = bar.val;
              QUERY PLAN
---------------------------------------
 Append
   ->  Hash Join
         Hash Cond: (foo.k = bar.k)
         ->  Seq Scan on p_1 foo
         ->  Hash
               ->  Seq Scan on p_1 bar
                     Filter: (k = val)
   ->  Hash Join
         Hash Cond: (foo_1.k = bar_1.k)
         ->  Seq Scan on p_2 foo_1
         ->  Hash
               ->  Seq Scan on p_2 bar_1
                     Filter: (k = val)
(13 rows)

But if we exchange the order of the two quals
Re: A problem about partitionwise join
On Wed, Aug 28, 2019 at 6:49 PM Etsuro Fujita wrote: > Hi, > > On Tue, Aug 27, 2019 at 4:57 PM Richard Guo wrote: > > Check the query below as a more illustrative example: > > > > create table p (k int, val int) partition by range(k); > > create table p_1 partition of p for values from (1) to (10); > > create table p_2 partition of p for values from (10) to (100); > > > > If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate > > partitionwise join: > > > > # explain (costs off) > > select * from p as foo join p as bar on foo.k = bar.k and foo.k = > bar.val; > >QUERY PLAN > > - > > Append > >-> Hash Join > > Hash Cond: (foo.k = bar.k) > > -> Seq Scan on p_1 foo > > -> Hash > >-> Seq Scan on p_1 bar > > Filter: (k = val) > >-> Hash Join > > Hash Cond: (foo_1.k = bar_1.k) > > -> Seq Scan on p_2 foo_1 > > -> Hash > >-> Seq Scan on p_2 bar_1 > > Filter: (k = val) > > (13 rows) > > > > But if we exchange the order of the two quals to 'foo.k = bar.val and > > foo.k = bar.k', then partitionwise join cannot be generated any more, > > because we only have joinclause 'foo.k = bar.val' as it first reached > > score of 3. We have missed the joinclause on the partition key although > > it does exist. > > > > # explain (costs off) > > select * from p as foo join p as bar on foo.k = bar.val and foo.k = > bar.k; > >QUERY PLAN > > - > > Hash Join > >Hash Cond: (foo.k = bar.val) > >-> Append > > -> Seq Scan on p_1 foo > > -> Seq Scan on p_2 foo_1 > >-> Hash > > -> Append > >-> Seq Scan on p_1 bar > > Filter: (val = k) > >-> Seq Scan on p_2 bar_1 > > Filter: (val = k) > > (11 rows) > > I think it would be nice if we can address this issue. > Thank you. Attached is a patch as an attempt to address this issue. The idea is quite straightforward. 
When building partition info for joinrel, we generate any possible EC-derived joinclauses of form 'outer_em = inner_em', which will be used together with the original restrictlist to check if there exists an equi-join condition for each pair of partition keys. Any comments are welcome! Thanks Richard v1-0001-Fix-up-partitionwise-join.patch Description: Binary data
Re: A problem about partitionwise join
On Fri, Aug 30, 2019 at 2:08 AM Etsuro Fujita wrote: > On Thu, Aug 29, 2019 at 6:45 PM Richard Guo wrote: > > On Wed, Aug 28, 2019 at 6:49 PM Etsuro Fujita > wrote: > >> On Tue, Aug 27, 2019 at 4:57 PM Richard Guo wrote: > >> > Check the query below as a more illustrative example: > >> > > >> > create table p (k int, val int) partition by range(k); > >> > create table p_1 partition of p for values from (1) to (10); > >> > create table p_2 partition of p for values from (10) to (100); > >> > > >> > If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate > >> > partitionwise join: > >> > > >> > # explain (costs off) > >> > select * from p as foo join p as bar on foo.k = bar.k and foo.k = > bar.val; > >> >QUERY PLAN > >> > - > >> > Append > >> >-> Hash Join > >> > Hash Cond: (foo.k = bar.k) > >> > -> Seq Scan on p_1 foo > >> > -> Hash > >> >-> Seq Scan on p_1 bar > >> > Filter: (k = val) > >> >-> Hash Join > >> > Hash Cond: (foo_1.k = bar_1.k) > >> > -> Seq Scan on p_2 foo_1 > >> > -> Hash > >> >-> Seq Scan on p_2 bar_1 > >> > Filter: (k = val) > >> > (13 rows) > >> > > >> > But if we exchange the order of the two quals to 'foo.k = bar.val and > >> > foo.k = bar.k', then partitionwise join cannot be generated any more, > >> > because we only have joinclause 'foo.k = bar.val' as it first reached > >> > score of 3. We have missed the joinclause on the partition key > although > >> > it does exist. > >> > > >> > # explain (costs off) > >> > select * from p as foo join p as bar on foo.k = bar.val and foo.k = > bar.k; > >> >QUERY PLAN > >> > - > >> > Hash Join > >> >Hash Cond: (foo.k = bar.val) > >> >-> Append > >> > -> Seq Scan on p_1 foo > >> > -> Seq Scan on p_2 foo_1 > >> >-> Hash > >> > -> Append > >> >-> Seq Scan on p_1 bar > >> > Filter: (val = k) > >> >-> Seq Scan on p_2 bar_1 > >> > Filter: (val = k) > >> > (11 rows) > >> > >> I think it would be nice if we can address this issue. > > > Attached is a patch as an attempt to address this issue. 
The idea is > > quite straightforward. When building partition info for joinrel, we > > generate any possible EC-derived joinclauses of form 'outer_em = > > inner_em', which will be used together with the original restrictlist to > > check if there exists an equi-join condition for each pair of partition > > keys. > > Thank you for the patch! Will review. Could you add the patch to the > upcoming CF so that it doesn’t get lost? > Added this patch: https://commitfest.postgresql.org/24/2266/ Thanks Richard
Re: A problem about partitionwise join
On Fri, Nov 29, 2019 at 11:03 AM Michael Paquier wrote: > On Tue, Nov 26, 2019 at 08:35:33PM +0900, Etsuro Fujita wrote: > > I've just started reviewing this patch. One comment I have for now > > is: this is categorized into Bug Fixes, but we have a workaround at > > least to the regression test case in the patch (ie, just reorder join > > clauses), so this seems to me more like an improvement than a bug fix. > > Hmm. Agreed. Changed the category and moved to next CF. > Thanks Etsuro for the comment and thanks Michael for the change. Thanks Richard
Re: Parallel grouping sets
On Sun, Dec 1, 2019 at 10:03 AM Michael Paquier wrote: > On Thu, Nov 28, 2019 at 07:07:22PM +0800, Pengzhou Tang wrote: > > Richard pointed out that he get incorrect results with the patch I > > attached, there are bugs somewhere, > > I fixed them now and attached the newest version, please refer to [1] for > > the fix. > > Mr Robot is reporting that the latest patch fails to build at least on > Windows. Could you please send a rebase? I have moved for now the > patch to next CF, waiting on author. Thanks for reporting this issue. Here is the rebase. Thanks Richard v3-0001-Support-for-parallel-grouping-sets.patch Description: Binary data
Re: A problem about partitionwise join
Rebased the patch with latest master and also addressed the test case failure reported by PostgreSQL Patch Tester. Thanks Richard On Fri, Nov 29, 2019 at 11:35 AM Etsuro Fujita wrote: > On Fri, Nov 29, 2019 at 12:08 PM Richard Guo wrote: > > On Fri, Nov 29, 2019 at 11:03 AM Michael Paquier > wrote: > >> On Tue, Nov 26, 2019 at 08:35:33PM +0900, Etsuro Fujita wrote: > >> > I've just started reviewing this patch. One comment I have for now > >> > is: this is categorized into Bug Fixes, but we have a workaround at > >> > least to the regression test case in the patch (ie, just reorder join > >> > clauses), so this seems to me more like an improvement than a bug fix. > >> > >> Hmm. Agreed. Changed the category and moved to next CF. > > > thanks Michael for the change. > > +1 > > Best regards, > Etsuro Fujita > v2-0001-Fix-up-partitionwise-join.patch Description: Binary data
Re: Parallel grouping sets
I realized that there are two patches in this thread that are implemented according to different methods, which causes confusion. So I decided to update this thread with only one patch, i.e. the patch for 'Implementation 1' as described in the first email, and to move the other patch to a separate thread.

With this idea, here is the patch for 'Implementation 1', rebased with the latest master.

Thanks
Richard

On Wed, Jan 8, 2020 at 3:24 PM Richard Guo wrote:
>
> On Sun, Dec 1, 2019 at 10:03 AM Michael Paquier wrote:
>
>> On Thu, Nov 28, 2019 at 07:07:22PM +0800, Pengzhou Tang wrote:
>> > Richard pointed out that he got incorrect results with the patch I
>> > attached, there are bugs somewhere,
>> > I fixed them now and attached the newest version, please refer to [1]
>> for the fix.
>>
>> Mr Robot is reporting that the latest patch fails to build at least on
>> Windows. Could you please send a rebase? I have moved for now the
>> patch to next CF, waiting on author.
>
> Thanks for reporting this issue. Here is the rebase.
>
> Thanks
> Richard

v5-0001-Implementing-parallel-grouping-sets.patch Description: Binary data
Restore CurrentUserId only if 'prevUser' is valid when abort transaction
Hi, This is a follow-up to the issue described in thread https://www.postgresql.org/message-id/CAMbWs4-Mys%3DhBQSevTA8Zpd-TYFnb%3DXuHhN2TnktXMsfMUbjiQ%40mail.gmail.com In short, during the first transaction starting phase within a backend, if there is an 'ereport' after setting transaction state but before saving CurrentUserId into 'prevUser' in TransactionStateData, 'prevUser' will remain as InvalidOid. Then in AbortTransaction(), CurrentUserId is restored with 'prevUser'. As a result, CurrentUserId will be InvalidOid in the rest of the session. Attached is a patch that fixes this issue. Thanks Richard 0001-Restore-CurrentUserId-only-if-prevUser-is-valid.patch Description: Binary data
Re: Restore CurrentUserId only if 'prevUser' is valid when abort transaction
Hi Michael, Thanks for your input. On Thu, Oct 11, 2018 at 11:38 AM, Michael Paquier wrote: > On Wed, Oct 10, 2018 at 03:37:50PM +0800, Richard Guo wrote: > > This is a follow-up to the issue described in thread > > https://www.postgresql.org/message-id/CAMbWs4-Mys%3DhBQSevTA8Zpd-TYFnb% > 3DXuHhN2TnktXMsfMUbjiQ%40mail.gmail.com > > > > In short, during the first transaction starting phase within a backend, > if > > there is an 'ereport' after setting transaction state but before saving > > CurrentUserId into 'prevUser' in TransactionStateData, 'prevUser' will > > remain as InvalidOid. Then in AbortTransaction(), CurrentUserId is > > restored with 'prevUser'. As a result, CurrentUserId will be > > InvalidOid in the rest of the session. > > > > Attached is a patch that fixes this issue. > > I guess that's an issue showing up with Greenplum as you folks are > likely tweaking how a transaction start happens? > > It is as easy as doing something like that in StartTransaction() to get > into a failed state: > s->state = TRANS_START; > s->transactionId = InvalidTransactionId;/* until assigned */ > > + { > + struct stat statbuf; > + if (stat("/tmp/hoge", &statbuf) == 0) > + elog(ERROR, "hoge invalid state!"); > + } > > Then do something like the following: > 1) Start a session > 2) touch /tmp/hoge > 3) Issue BEGIN, which fails and initializes CurrentUserId to InvalidOid. > 4) rm /tmp/hoge > 3) any DDL causes the system to crash. > Yes, you're right. This issue shows up when we were adding inside StartTransaction() some resource-group related logic which would error out in some cases. Your example reproduces the same scene. > > Anyway, looking at the patch, I am poked by the comment on top of > GetUserIdAndSecContext which states that InvalidOid can be a possible > value. 
> It seems to me that the root of the problem is that TRANS_STATE
> is enforced to TRANS_INPROGRESS when aborting a transaction in a
> starting state, in which case we should not have to reset CurrentUserId
> as it has never been set. The main reason why this was done is to
> prevent a warning message to show up.

From the comment, Get/SetUserIdAndSecContext() has considered the case of InvalidOid, but fails to handle it properly in AbortTransaction(). I think it is a better idea to avoid adjusting the state to TRANS_INPROGRESS from TRANS_START when aborting a transaction, as your patch does, since its only purpose is to suppress the warning message.

> Tom, eedb068c0 is the cause here, and that's your commit. Could you
> check if something like the attached is adapted? I am pretty sure that
> we still want the sub-transaction part to still reset CurrentUserId
> unconditionally by the way.
>
> Thanks,
> --
> Michael
Pull up sublink of type 'NOT NOT (expr)'
Hi hackers,

Currently for quals in the form of "NOT NOT (SubLink)", this SubLink would not be considered when pulling up sublinks. For instance:

gpadmin=# explain select * from a where NOT NOT (a.i in (select b.i from b));
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on a  (cost=51.50..85.62 rows=1005 width=8)
   Filter: (hashed SubPlan 1)
   SubPlan 1
     ->  Seq Scan on b  (cost=0.00..44.00 rows=3000 width=4)
(4 rows)

Should we give it a chance, like the attached does?

Thanks
Richard

0001-Pull-up-sublink-of-type-NOT-NOT-expr.patch Description: Binary data
Re: Pull up sublink of type 'NOT NOT (expr)'
Hi Alex,

Yes, hashed SubPlan preserves order and may be faster than hash join in some cases. But I don't think that is a good enough reason to prevent the subplan from being converted to a join.

Let's suppose the subplan is uncorrelated, otherwise hashed SubPlan would not be used. Hashed SubPlan can only be applied when the subquery result fits in work_mem. When the data size increases, or work_mem is set to a smaller value so that more than one batch is needed, hashed SubPlan will not be used and the total cost will increase dramatically. Hash join, by contrast, handles multiple batches better.

Below is an example to show the behavior of hash join and hashed SubPlan for a single batch and for multiple batches.

create table a (i int);
create table b (i int);
insert into a select i from generate_series(1,1)i;
insert into b select i from generate_series(1,1)i;

*For single batch:*

*Hash Join*

gpadmin=# explain analyze select * from a where a.i in (select b.i from b);
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=270.00..552.50 rows=1 width=4) (actual time=4.332..9.499 rows=1 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..145.00 rows=1 width=4) (actual time=0.016..1.328 rows=1 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=1 width=4) (actual time=4.292..4.292 rows=1 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 480kB
         ->  Seq Scan on b  (cost=0.00..145.00 rows=1 width=4) (actual time=0.013..1.817 rows=1 loops=1)
 Planning Time: 0.236 ms
 Execution Time: 10.099 ms
(8 rows)

*Hashed SubPlan*

gpadmin=# explain analyze select * from a where not not a.i in (select b.i from b);
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=170.00..340.00 rows=5000 width=4) (actual time=6.359..11.843 rows=1 loops=1)
   Filter: (hashed SubPlan 1)
   SubPlan 1
     ->  Seq Scan on b  (cost=0.00..145.00 rows=1 width=4) (actual time=0.017..1.749 rows=1 loops=1)
 Planning Time: 0.109 ms
 Execution Time: 12.905 ms
(6 rows)

insert into b select i from generate_series(1,10)i;
insert into b select i from generate_series(1,10)i;
*For multiple batches:*

*Hash Join*

gpadmin=# explain analyze select * from a where a.i in (select b.i from b);
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=6476.00..7659.50 rows=1 width=4) (actual time=89.915..121.016 rows=1 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..145.00 rows=1 width=4) (actual time=0.020..1.779 rows=1 loops=1)
   ->  Hash  (cost=3030.00..3030.00 rows=21 width=4) (actual time=89.666..89.667 rows=21 loops=1)
         Buckets: 131072  Batches: 4  Memory Usage: 2860kB
         ->  Seq Scan on b  (cost=0.00..3030.00 rows=21 width=4) (actual time=0.015..37.686 rows=21 loops=1)
 Planning Time: 0.256 ms
 Execution Time: 121.769 ms
(8 rows)

*Plain SubPlan*

gpadmin=# explain analyze select * from a where not not a.i in (select b.i from b);
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..27130170.00 rows=5000 width=4) (actual time=0.030..9994.036 rows=1 loops=1)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Materialize  (cost=0.00..4901.00 rows=21 width=4) (actual time=0.000..0.359 rows=5000 loops=1)
           ->  Seq Scan on b  (cost=0.00..3030.00 rows=21 width=4) (actual time=0.010..1.673 rows=1 loops=1)
 Planning Time: 0.077 ms
 Execution Time: 9995.556 ms
(7 rows)

Thanks
Richard

On Wed, Oct 24, 2018 at 12:39 AM, Alexey Bashtanov wrote:
> Hello Richard,
>
> > Currently for quals in the form of "NOT NOT (SubLink)", this SubLink
> > would not be considered when pulling up sublinks. For instance:
> >
> > gpadmin=# explain select * from a where NOT NOT (a.i in (select b.i from b));
> >                          QUERY PLAN
> > -------------------------------------------------------------
> >  Seq Scan on a  (cost=51.50..85.62 rows=1005 width=8)
> >    Filter: (hashed SubPlan 1)
> >    SubPlan 1
> >      ->  Seq Scan on b  (cost=0.00..44.00 rows=3000 width=4)
> > (4 rows)
> >
> > Should we give it a chance, like the attached does?
>
> Sometimes hashed subplan is faster than hash join and than all the other
> options, as it preserves the order.
> Using NOT NOT, one can control whether to use it or not:
> https://pgblog.bashtanov.com/2017/12/08/double-negative-and-query-performance/
> (test case and results in the bottom of
Re: Pull up sublink of type 'NOT NOT (expr)'
Hi Tom,

Thanks for reviewing.

On Tue, Nov 13, 2018 at 10:05 AM, Tom Lane wrote:
> Richard Guo writes:
> > Currently for quals in the form of "NOT NOT (SubLink)", this SubLink
> > would not be considered when pulling up sublinks.
>
> Yup.
>
> > Should we give it a chance, like the attached does?
>
> What is the argument that this occurs often enough to be worth expending
> extra cycles and code space on?
>
> If we do do something like this, I'd be inclined to make it handle
> any-number-of-consecutive-NOTs, and maybe remove NOT NOT over an ANY,
> not just EXISTS. But I don't honestly think that it's worth troubling
> over. Do even the dumbest ORMs generate such code?

What this patch does is recursively remove NOT NOT over a SubLink, so it actually can handle any number of consecutive NOTs, both over ANY and over EXISTS.

Over ANY:

gpadmin=# explain select * from a where not not not not a.i in (select i from b);
                             QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=42.75..93.85 rows=1130 width=8)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=40.25..40.25 rows=200 width=4)
         ->  HashAggregate  (cost=38.25..40.25 rows=200 width=4)
               Group Key: b.i
               ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=4)
(7 rows)

Over EXISTS:

gpadmin=# explain select * from a where not not not not exists (select 1 from b where a.i = b.i);
                             QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=42.75..93.85 rows=1130 width=8)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=40.25..40.25 rows=200 width=4)
         ->  HashAggregate  (cost=38.25..40.25 rows=200 width=4)
               Group Key: b.i
               ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=4)
(7 rows)

I am not using an ORM, but was just considering that maybe it would be better if PostgreSQL could do such a pull-up.

Tom, what's your suggestion? Is it worthwhile expending several lines of code to do this pull-up?

Thanks
Richard
Re: Pull up sublink of type 'NOT NOT (expr)'
Hi Tom, Thanks for your input. On Fri, Nov 16, 2018 at 6:06 AM, Tom Lane wrote: > Richard Guo writes: > > On Tue, Nov 13, 2018 at 10:05 AM, Tom Lane wrote: > >> What is the argument that this occurs often enough to be worth expending > >> extra cycles and code space on? > > > I am not using an ORM, but just considering maybe it would be better if > > PostgreSQL can do such pull-up. > > Tom, what's your suggestion? Is it worthwhile expending several lines of > > codes to do this pull-up? > > Well, really, if we're just doing this on speculation and don't even have > any concrete indication that anybody writes code like that, I can't get > excited about expending even a few lines of code on it. > > The reason why we perform optimizations similar to this in places like > eval_const_expressions is (IMO anyway) that transformations such as > function inlining and subquery pullup can create parse trees that look > like this, even when the original query was perfectly sane and without > obvious redundancy. However, because pull_up_sublinks runs before any > of that, it would only ever see NOT NOT if someone had actually written > such a thing. So to justify expending any code space or cycles on > optimizing this, you have to make the case that that actually happens > in the wild, and does so often enough to justify wasting some (admittedly > small) number of cycles for everybody else. I'd kind of like to see some > actual field occurrence of NOT NOT over an optimizable IN/EXISTS before > deciding that it's worth doing. > > It's sort of annoying that we have to run pull_up_sublinks before we do > scalar expression simplification. If we could do that in the other order, > NOT NOT elimination would fall out automatically, and we'd also be able > to recognize some other cases where it initially seems that an IN or > EXISTS is not at top level, but it becomes so after we const-fold, apply > DeMorgan's laws, etc. 
> However, according to the point I made above,
> it makes more sense to apply expression simplification after we've
> completed pullup-like operations. So we can't really get there from
> here unless we wanted to do (parts of?) expression simplification twice,
> which is unlikely to win often enough to justify the cost.

Agreed! If we did expression simplification in the other order, we would have to do it twice, which is not a good idea.

> So I'm inclined to reject this patch as probably being a net loss in
> almost all cases. If you can show any real-world cases where it wins,
> we could reconsider.

I am convinced. I do not see this happening often enough in the real world either. Feel free to reject this patch.

Thanks
Richard
Re: An inefficient query caused by unnecessary PlaceHolderVar
Updated this patch over 29f114b6ff, which indicates that we should apply the same rules for PHVs. Thanks Richard v3-0001-Avoid-unnecessary-PlaceHolderVars-for-Vars-PHVs.patch Description: Binary data
Re: POC: GROUP BY optimization
On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov wrote: > Thank you for providing the test case relevant for this code change. > The revised patch incorporating this change is attached. Now the > patchset looks good to me. I'm going to push it if there are no > objections. Seems I'm late for the party. Can we hold for several more days? I'd like to have a review on this patch. Thanks Richard
Re: POC: GROUP BY optimization
On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov wrote: > On Mon, Jan 15, 2024 at 8:42 AM Richard Guo > wrote: > > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov > wrote: > >> > >> Thank you for providing the test case relevant for this code change. > >> The revised patch incorporating this change is attached. Now the > >> patchset looks good to me. I'm going to push it if there are no > >> objections. > > > > Seems I'm late for the party. Can we hold for several more days? I'd > > like to have a review on this patch. > > Sure, NP. I'll hold it till your review. Thanks. Appreciate that. I have briefly reviewed this patch and here are some comments. * When trying to match the ordering of GROUP BY to that of ORDER BY in get_useful_group_keys_orderings, this patch checks against the length of path->pathkeys. This does not make sense. I guess this is a typo and the real intention is to check against root->sort_pathkeys. --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path) root->num_groupby_pathkeys); if (n > 0 && - (enable_incremental_sort || n == list_length(path->pathkeys))) + (enable_incremental_sort || n == list_length(root->sort_pathkeys))) However, I think this is also incorrect. When incremental sort is disabled, it is reasonable to consider reordering the GROUP BY keys only if the number of matching pathkeys is equal to the length of root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'. * When the original ordering of GROUP BY keys matches the path's pathkeys or ORDER BY keys, get_useful_group_keys_orderings would generate duplicate PathKeyInfos. Consequently, this duplication would lead to the creation and addition of identical paths multiple times. This is not great. 
Consider the query below: create table t (a int, b int); create index on t (a, b); set enable_hashagg to off; explain select count(*) from t group by a, b; QUERY PLAN -- GroupAggregate (cost=0.15..97.27 rows=226 width=16) Group Key: a, b -> Index Only Scan using t_a_b_idx on t (cost=0.15..78.06 rows=2260 width=8) (3 rows) The same path with group keys {a, b} is created and added twice. * Part of the work performed in this patch overlaps with that of preprocess_groupclause. They are both trying to adjust the ordering of the GROUP BY keys to match ORDER BY. I wonder if it would be better to perform this work only once. * When reordering the GROUP BY keys, I think the following checks need some improvements. + /* +* Since 1349d27 pathkey coming from underlying node can be in the +* root->group_pathkeys but not in the processed_groupClause. So, we +* should be careful here. +*/ + sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref, + *group_clauses); + if (!sgc) + /* The grouping clause does not cover this pathkey */ + break; I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is valid before calling get_sortgroupref_clause_noerr with it. Also, how can we guarantee that the returned GROUP BY item is sortable? Should we check that with OidIsValid(sgc->sortop)? Thanks Richard
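The first review point above can be sketched with a small model. The helper names and the int stand-ins for pathkeys are assumptions for illustration only; this is not the actual get_useful_group_keys_orderings() code. The point is the condition itself: with incremental sort disabled, a reordering pays off only when the match covers the whole GROUP BY list, i.e. n == list_length(root->group_pathkeys).

```c
#include <assert.h>
#include <stdbool.h>

/* Count the leading pathkeys shared by the GROUP BY keys and the sort
 * keys; ints stand in for pathkeys here. */
static int
common_prefix_len(const int *group_keys, int ngroup,
                  const int *sort_keys, int nsort)
{
    int n = 0;

    while (n < ngroup && n < nsort && group_keys[n] == sort_keys[n])
        n++;
    return n;
}

/* With incremental sort disabled, a reordering is only useful when it
 * covers the whole GROUP BY list (n == ngroup); a partial match would
 * still require a full sort on top. */
static bool
reordering_useful(int n, int ngroup, bool enable_incremental_sort)
{
    return n > 0 && (enable_incremental_sort || n == ngroup);
}
```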
Re: Revise the Asserts added to bimapset manipulation functions
On Tue, Jan 16, 2024 at 11:08 AM David Rowley wrote: > I've now adjusted the patch to have all modifications to Bitmapsets > return a newly allocated set. There are a few cases missing in master > that need to be fixed (items 6-10 below): > > The patch now includes changes for the following: > > 1. Document what REALLOCATE_BITMAPSETS is for. > 2. Provide a reusable function to check a set is valid and use it in > cassert builds and use it to validate sets (Richard) > 3. Provide a reusable function to copy a set and pfree the original > and use that instead of duplicating that code all over bitmapset.c > 4. Fix Assert duplication (Richard) > 5. Make comments in bms_union, bms_intersect, bms_difference clear > that a new set is allocated. (This has annoyed me for a while) > 6. Fix failure to duplicate the set in bms_add_members() when b == NULL. > 7. Fix failure to duplicate the set in bms_add_range() when upper < lower > 8. Fix failure to duplicate the set in bms_add_range() when the set > has enough words already. > 9. Fix failure to duplicate the set in bms_del_members() when b == NULL > 10. Fix failure to duplicate the set in bms_join() when a == NULL and > also fix the b == NULL case too > 11. Fix hazard in bms_add_members(), bms_int_members(), > bms_del_members() and bms_join(), where the code added in 7d58f2342 > could crash if a == b due to the REALLOCATE_BITMAPSETS code pfreeing > 'a'. This happens in knapsack.c:93, although I propose we adjust that > code now that empty sets are always represented as NULL. Thank you so much for all the work you have put into making this patch perfect. I reviewed through the v3 patch and I do not have further comments. I think it's in good shape now. Thanks Richard
Re: Strange Bitmapset manipulation in DiscreteKnapsack()
On Tue, Jan 16, 2024 at 11:32 AM David Rowley wrote: > While working on [1], I noticed some strange code in > DiscreteKnapsack() which seems to be aiming to copy the Bitmapset. > > It's not that obvious at a casual glance, but: > > sets[j] = bms_del_members(sets[j], sets[j]); > > this is aiming to zero all the words in the set by passing the same > set in both parameters. > > Now that 00b41463c changed Bitmapset to have NULL be the only valid > representation of an empty set, this code no longer makes sense. We > may as well just bms_free() the original set and bms_copy() in the new > set as the bms_del_members() call will always pfree the set anyway. > > I've done that in the attached. +1. This is actually what happens with the original code, i.e. bms_del_members() will pfree sets[j] and bms_add_members() will bms_copy sets[ow] to sets[j]. But the new code looks more natural. I also checked other callers of bms_del_members() and did not find another case that passes the same set in both parameters. Thanks Richard
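The free-then-copy pattern discussed above can be modeled with a toy set type. Everything here is a hypothetical stand-in (ToySet, toy_copy, toy_replace are not the bitmapset.c API); what it mirrors is the convention, since 00b41463c, that NULL is the only valid representation of an empty set, which is exactly why zeroing words in place no longer makes sense.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy word-array set; NULL is the only representation of an empty set,
 * mirroring the Bitmapset convention since 00b41463c. */
typedef struct ToySet
{
    int           nwords;
    unsigned long words[1];     /* variable-length in spirit */
} ToySet;

static ToySet *
toy_copy(const ToySet *s)
{
    ToySet *r;
    size_t  sz;

    if (s == NULL)
        return NULL;            /* an empty set stays NULL */
    sz = offsetof(ToySet, words) + s->nwords * sizeof(unsigned long);
    r = malloc(sz);
    memcpy(r, s, sz);
    return r;
}

/* The pattern the patch switches to: free the old set and copy in the
 * new one, rather than zeroing the old set's words in place. */
static ToySet *
toy_replace(ToySet *old, const ToySet *src)
{
    free(old);
    return toy_copy(src);
}
```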
Re: Fix a typo of func DecodeInsert()
On Wed, Jan 17, 2024 at 8:47 AM Yongtao Huang wrote: > Hi all, > I think the comment above the function DecodeInsert() > in src/backend/replication/logical/decode.c should be > + * Inserts can contain the new tuple. > rather than > - * Deletes can contain the new tuple. > Nice catch. +1. I kind of wonder if it would be clearer to state that "XLOG_HEAP_INSERT can contain the new tuple", in order to differentiate it from XLOG_HEAP2_MULTI_INSERT. Thanks Richard
Re: Oversight in reparameterize_path_by_child leading to executor crash
On Tue, Jan 16, 2024 at 2:30 AM Robert Haas wrote: > On Mon, Jan 8, 2024 at 3:32 AM Richard Guo wrote: > > Thanks for the suggestion. Attached is an updated patch which is added > > with a commit message that tries to explain the problem and the fix. > > This is great. The references to "the sampling infos" are a little bit > confusing to me. I'm not sure if this is referring to > TableSampleClause objects or what. Yeah, it's referring to TableSampleClause objects. I've updated the commit message to clarify that. In passing I also updated the test cases a bit. Please see v10-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch > > Fair point. I think we can back-patch a more minimal fix, as Tom > > proposed in [1], which disallows the reparameterization if the path > > contains sampling info that references the outer rel. But I think we > > need also to disallow the reparameterization of a path if it contains > > restriction clauses that reference the outer rel, as such paths have > > been found to cause incorrect results, or planning errors as in [2]. > > Do you want to produce a patch for that, to go along with this patch for > master? Sure, here it is: v10-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch Thanks Richard v10-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch Description: Binary data v10-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch Description: Binary data
Re: Oversight in reparameterize_path_by_child leading to executor crash
On Wed, Jan 17, 2024 at 5:01 PM Richard Guo wrote: > On Tue, Jan 16, 2024 at 2:30 AM Robert Haas wrote: > >> On Mon, Jan 8, 2024 at 3:32 AM Richard Guo >> wrote: > > > Fair point. I think we can back-patch a more minimal fix, as Tom >> > proposed in [1], which disallows the reparameterization if the path >> > contains sampling info that references the outer rel. But I think we >> > need also to disallow the reparameterization of a path if it contains >> > restriction clauses that reference the outer rel, as such paths have >> > been found to cause incorrect results, or planning errors as in [2]. >> >> Do you want to produce a patch for that, to go along with this patch for >> master? > > > Sure, here it is: > v10-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch > I forgot to mention that this patch applies on v16 not master. Thanks Richard
Re: Strange Bitmapset manipulation in DiscreteKnapsack()
On Thu, Jan 18, 2024 at 8:35 AM David Rowley wrote: > The functions's header comment mentions "The bitmapsets are all > pre-initialized with an unused high bit so that memory allocation is > done only once.". Ah, I neglected to notice this when reviewing the v1 patch. I guess it's implemented this way due to performance considerations, right? > I've attached a patch which adds bms_replace_members(). It's basically > like bms_copy() but attempts to reuse the member from another set. I > considered if the new function should be called bms_copy_inplace(), > but left it as bms_replace_members() for now. Do you think we can use 'memcpy(a, b, BITMAPSET_SIZE(b->nwords))' directly in the new bms_replace_members() instead of copying the bitmapwords one by one, like: - i = 0; - do - { - a->words[i] = b->words[i]; - } while (++i < b->nwords); - - a->nwords = b->nwords; + memcpy(a, b, BITMAPSET_SIZE(b->nwords)); But I'm not sure if this is an improvement or not. Thanks Richard
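The two copy strategies being compared above can be put side by side in a toy model. WSet is a made-up stand-in with a fixed-size word array (the real Bitmapset uses a flexible array member and BITMAPSET_SIZE()), so this only illustrates that the word-by-word loop and a single memcpy of header plus used words produce the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXWORDS 4

/* Toy stand-in for Bitmapset with a fixed-size word array. */
typedef struct WSet
{
    int           nwords;
    unsigned long words[MAXWORDS];
} WSet;

/* Word-by-word copy, as in the proposed bms_replace_members();
 * assumes a has room for b->nwords words and b->nwords > 0. */
static void
replace_loop(WSet *a, const WSet *b)
{
    int i = 0;

    do
    {
        a->words[i] = b->words[i];
    } while (++i < b->nwords);
    a->nwords = b->nwords;
}

/* One memcpy of header plus used words, analogous to
 * memcpy(a, b, BITMAPSET_SIZE(b->nwords)): nwords and the words are
 * copied in a single call. */
static void
replace_memcpy(WSet *a, const WSet *b)
{
    memcpy(a, b, offsetof(WSet, words) + b->nwords * sizeof(unsigned long));
}
```

Whether the single memcpy is actually faster than the loop for typical small nwords is exactly the open question in the mail above; the sketch only shows the two are equivalent in effect.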
Reordering DISTINCT keys to match input path's pathkeys
Similar to what we did for GROUP BY keys in 0452b461bc, I think we can do the same for DISTINCT keys, i.e. reorder DISTINCT keys to match the input path's pathkeys, which can help reduce cost by avoiding an unnecessary re-sort, or by allowing us to use incremental sort to save effort. For instance, create table t (a int, b int); create index on t (a, b); explain (costs off) select distinct b, a from t limit 10; QUERY PLAN -- Limit -> Unique -> Index Only Scan using t_a_b_idx on t (3 rows) Please note that the parser has ensured that the DISTINCT pathkeys match the order of the ORDER BY clauses, so there is no need to do that part again. In principle, we could perform such reordering for DISTINCT ON too, but we would need to make sure that the resulting pathkeys match the initial ORDER BY keys, which seems not trivial. So it doesn't seem worth the effort. Attached is a patch for this optimization. Any thoughts? Thanks Richard v1-0001-Reordering-DISTINCT-keys-to-match-input-path-s-pathkeys.patch Description: Binary data
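The reordering idea can be sketched outside the planner. In this illustration ints stand in for pathkeys, keys are assumed to contain no duplicates, and reorder_distinct_keys() is a hypothetical helper, not the patch's code: keys that match a prefix of the input ordering are moved to the front, and the remaining DISTINCT keys follow.

```c
#include <assert.h>

/* Reorder DISTINCT keys so that keys matching a prefix of the input
 * path's ordering come first. Ints stand in for pathkeys; keys are
 * assumed distinct. Illustration only, not the planner code. */
static int
reorder_distinct_keys(const int *distinct, int ndistinct,
                      const int *pathkeys, int npath,
                      int *out)
{
    int n = 0;

    /* take input-order keys as long as each one is a DISTINCT key */
    for (int p = 0; p < npath; p++)
    {
        int found = 0;

        for (int d = 0; d < ndistinct; d++)
        {
            if (distinct[d] == pathkeys[p])
            {
                found = 1;
                break;
            }
        }
        if (!found)
            break;
        out[n++] = pathkeys[p];
    }

    /* append the remaining DISTINCT keys in their original order */
    for (int d = 0; d < ndistinct; d++)
    {
        int seen = 0;

        for (int i = 0; i < n; i++)
        {
            if (out[i] == distinct[d])
            {
                seen = 1;
                break;
            }
        }
        if (!seen)
            out[n++] = distinct[d];
    }
    return n;
}
```

In the example query, DISTINCT keys {b, a} against an input ordered by (a, b) would come out as {a, b}, letting the Unique node sit directly on the index scan.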
Re: Removing const-false IS NULL quals and redundant IS NOT NULL quals
On Tue, Jan 23, 2024 at 1:11 PM David Rowley wrote: > I went over this again. I did a little more work adjusting comments > and pushed it. > > Thanks for all your assistance with this, Richard. Thanks for pushing! This is really great. Thanks Richard
A compiling warning in jsonb_populate_record_valid
I came across a warning when building master (a044e61f1b) on old GCC (4.8.5). jsonfuncs.c: In function ‘jsonb_populate_record_valid’: ../../../../src/include/nodes/miscnodes.h:53:15: warning: the comparison will always evaluate as ‘true’ for the address of ‘escontext’ will never be NULL [-Waddress] ((escontext) != NULL && IsA(escontext, ErrorSaveContext) && \ ^ jsonfuncs.c:2481:23: note: in expansion of macro ‘SOFT_ERROR_OCCURRED’ return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext)); This was introduced in commit 1edb3b491b, and can be observed on several systems in the buildfarm too, such as: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana&dt=2024-01-25%2004%3A54%3A38&stg=build Thanks Richard
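The warning pattern can be reproduced in miniature. The types below are toys (not the real ErrorSaveContext or SOFT_ERROR_OCCURRED): applying the macro to the address of a local makes the NULL test vacuous, which is what -Waddress flags at compile time; reading the field directly sidesteps the macro entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for ErrorSaveContext and SOFT_ERROR_OCCURRED. */
typedef struct ToyContext
{
    bool error_occurred;
} ToyContext;

#define TOY_ERROR_OCCURRED(ctx) \
    ((ctx) != NULL && (ctx)->error_occurred)

/* &ctx can never be NULL, so the macro's NULL test is always true --
 * this is the comparison -Waddress complains about. */
static bool
check_via_macro(void)
{
    ToyContext ctx = {false};

    return TOY_ERROR_OCCURRED(&ctx);
}

/* Reading the field directly avoids the pointless NULL test on the
 * address of a local, silencing the warning. */
static bool
check_direct(void)
{
    ToyContext ctx = {false};

    return ctx.error_occurred;
}
```

Both functions compute the same answer; the only difference is that a compiler with -Waddress enabled warns on the first.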
Re: Trivial revise for the check of parameterized partial paths
On Sun, Jan 21, 2024 at 8:36 PM vignesh C wrote: > I'm seeing that there has been no activity in this thread for nearly 7 > months, I'm planning to close this in the current commitfest unless > someone is planning to take it forward. This patch fixes incorrect comments in try_partial_hashjoin_path, and also simplifies and enhances the checks for parameterized partial paths. I think it's worth moving forward. I've rebased the patch over current master, added a commit message describing what it does, and updated the comment a little bit in the v2 patch. Thanks Richard v2-0001-Revisions-to-the-checks-for-parameterized-partial-paths.patch Description: Binary data
Re: A compiling warning in jsonb_populate_record_valid
On Thu, Jan 25, 2024 at 2:28 PM Amit Langote wrote: > On Thu, Jan 25, 2024 at 2:59 PM Richard Guo > wrote: > > I came across a warning when building master (a044e61f1b) on old GCC > > (4.8.5). > > > > jsonfuncs.c: In function ‘jsonb_populate_record_valid’: > > ../../../../src/include/nodes/miscnodes.h:53:15: warning: the comparison > will always evaluate as ‘true’ for the address of ‘escontext’ will never be > NULL [-Waddress] > > ((escontext) != NULL && IsA(escontext, ErrorSaveContext) && \ > >^ > > jsonfuncs.c:2481:23: note: in expansion of macro ‘SOFT_ERROR_OCCURRED’ > > return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext)); > > > > This was introduced in commit 1edb3b491b, and can be observed on several > > systems in the buildfarm too, such as: > > > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana&dt=2024-01-25%2004%3A54%3A38&stg=build > > Thanks for the report. > > Will apply the attached, which does this: > > - return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext)); > + return BoolGetDatum(!escontext.error_occurred); Looks good to me. Thanks Richard
Re: A performance issue with Memoize
On Fri, Jan 26, 2024 at 2:32 AM Tom Lane wrote: > I'm fairly sure I thought it wouldn't matter because of the Param > de-duplication done in paramassign.c. However, Richard's example > shows that's not so, because process_subquery_nestloop_params is > picky about the param ID assigned to a particular Var while > replace_nestloop_params is not. So flipping the order makes sense. > I'd change the comment though, maybe to > > /* > * Replace any outer-relation variables with nestloop params. > * > * We must provide nestloop params for both lateral references of > * the subquery and outer vars in the scan_clauses. It's better > * to assign the former first, because that code path requires > * specific param IDs, while replace_nestloop_params can adapt > * to the IDs assigned by process_subquery_nestloop_params. > * This avoids possibly duplicating nestloop params when the > * same Var is needed for both reasons. > */ +1. It's much better. > However ... it seems like we're not out of the woods yet. Why > is Richard's proposed test case still showing > > + -> Memoize (actual rows=5000 loops=N) > + Cache Key: t1.two, t1.two > > Seems like there is missing de-duplication logic, or something. When we collect the cache keys in paraminfo_get_equal_hashops() we search param_info's ppi_clauses as well as innerrel's lateral_vars for outer expressions. We do not perform de-duplication on the collected outer expressions there. In my proposed test case, the same Var 't1.two' appears both in the param_info->ppi_clauses and in the innerrel->lateral_vars, so we see two identical cache keys in the plan. I noticed this before and wondered if we should do de-duplication on the cache keys, but somehow I did not chase this to the ground. Thanks Richard
Re: A performance issue with Memoize
On Fri, Jan 26, 2024 at 1:22 AM Tom Lane wrote: > Apologies for not having noticed this thread before. I'm taking > a look at it now. However, while sniffing around this I found > what seems like an oversight in paramassign.c's > assign_param_for_var(): it says it should compare all the same > fields as _equalVar except for varlevelsup, but it's failing to > compare varnullingrels. Is that a bug? It's conceivable that > it's not possible to get here with varnullingrels different and > all else the same, but I don't feel good about that proposition. > > I tried adding > > @@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var) > pvar->vartype == var->vartype && > pvar->vartypmod == var->vartypmod && > pvar->varcollid == var->varcollid) > +{ > +Assert(bms_equal(pvar->varnullingrels, > var->varnullingrels)); > return pitem->paramId; > +} > } > } Yeah, I think it should be safe to assert that the varnullingrels is equal here. The Var is supposed to be an upper-level Var, and two same such Vars should not have different varnullingrels at this point, although the varnullingrels might be adjusted later in identify_current_nestloop_params according to which form of identity 3 we end up applying. Thanks Richard
Re: A performance issue with Memoize
On Fri, Jan 26, 2024 at 12:18 PM David Rowley wrote: > On Fri, 26 Jan 2024 at 16:51, Tom Lane wrote: > > >> However ... it seems like we're not out of the woods yet. Why > > >> is Richard's proposed test case still showing > > >> + -> Memoize (actual rows=5000 loops=N) > > >> + Cache Key: t1.two, t1.two > > >> Seems like there is missing de-duplication logic, or something. > > > > > This seems separate and isn't quite causing the same problems as what > > > Richard wants to fix so I didn't touch this for now. > > > > Fair enough, but I think it might be worth pursuing later. > > Here's a patch for that. At first I wondered if we should assume that the same param expr must have the same equality operator. If not, we should also check the operator to tell if the cache key is a duplicate, like - if (!list_member(*param_exprs, expr)) + if (!list_member(*param_exprs, expr) || + !list_member_oid(*operators, hasheqoperator)) But after looking at how rinfo->left_hasheqoperator/right_hasheqoperator is set, it seems we can assume that: the operator is from the type cache entry which is fetched according to the expr datatype. So I think the patch makes sense. +1. Thanks Richard
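The de-duplication question above comes down to what "same cache key" means. The sketch below uses ints as stand-ins for expression identity and operator OIDs and a hypothetical key_present() helper (not the Memoize path-building code): if the equality operator always follows from the expression's type, matching on the expression alone suffices; otherwise the (expr, operator) pair would have to match.

```c
#include <assert.h>
#include <stdbool.h>

/* Ints stand in for expression identity and operator OIDs. */
typedef struct CacheKey
{
    int expr_id;
    int op_id;
} CacheKey;

/* Membership test for a candidate cache key. When the operator is
 * fully determined by the expression's type, compare_operator can be
 * false; otherwise both fields must match for the key to count as a
 * duplicate. */
static bool
key_present(const CacheKey *keys, int nkeys, CacheKey k,
            bool compare_operator)
{
    for (int i = 0; i < nkeys; i++)
    {
        if (keys[i].expr_id != k.expr_id)
            continue;
        if (!compare_operator || keys[i].op_id == k.op_id)
            return true;
    }
    return false;
}
```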
Apply the "LIMIT 1" optimization to partial DISTINCT
In 5543677ec9 we introduced an optimization that uses Limit instead of Unique to implement DISTINCT when all the DISTINCT pathkeys have been marked as redundant. I happened to notice that this optimization was not applied to partial DISTINCT, which I think should be. This can improve plans in some cases, such as -- on master explain (costs off) select distinct four from tenk1 where four = 4; QUERY PLAN -- Limit -> Gather Workers Planned: 4 -> Unique -> Parallel Seq Scan on tenk1 Filter: (four = 4) (6 rows) -- patched explain (costs off) select distinct four from tenk1 where four = 4; QUERY PLAN -- Limit -> Gather Workers Planned: 4 -> Limit -> Parallel Seq Scan on tenk1 Filter: (four = 4) (6 rows) Such queries might not be that common, but it's very cheap to apply this optimization. Attached is a patch for that. Thanks Richard v1-0001-Apply-the-LIMIT-1-optimization-to-partial-DISTINCT.patch Description: Binary data
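The reason a Limit can replace a Unique here is worth spelling out: when the WHERE clause pins every DISTINCT key to a constant, all qualifying rows agree on those keys, so returning the first qualifying row is enough. A toy scan over a single int column (illustration only, not planner code):

```c
#include <assert.h>

/* If every DISTINCT key is equated to a constant, all qualifying rows
 * are identical on those keys, so DISTINCT degenerates to "return the
 * first qualifying row", i.e. a LIMIT 1 over the scan. */
static int
distinct_via_limit1(const int *col, int nrows, int constval, int *out)
{
    for (int i = 0; i < nrows; i++)
    {
        if (col[i] == constval)
        {
            *out = col[i];
            return 1;           /* one row is enough */
        }
    }
    return 0;                   /* no qualifying rows at all */
}
```

This is why the patched plan can stop each parallel worker at its first matching row instead of deduplicating the whole scan.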
Re: Reordering DISTINCT keys to match input path's pathkeys
On Tue, Jan 23, 2024 at 5:03 PM David Rowley wrote: > I've not caught up on the specifics of 0452b461b, but I just wanted to > highlight that there was some work done in [1] in this area. It seems > Ankit didn't ever add that to a CF, so that might explain why it's > been lost. > > Anyway, just pointing it out as there may be useful code or discussion > in the corresponding threads. Thanks for pointing it out. I looked at the patch there and noticed several problems with it. * That patch is incomplete and does not work as expected. It at least needs to modify truncate_useless_pathkeys() to account for the DISTINCT clause (I think this has been mentioned in that thread). * That patch would not consider the original DISTINCT pathkeys if it could do some reordering, which is not great and can generate inefficient plans. For instance (after fixing the first problem): create table t (a int, b int); create index on t(a); set enable_hashagg to off; set enable_incremental_sort to off; set enable_seqscan to off; explain (costs off) select distinct b, a from t order by b, a; QUERY PLAN - Sort Sort Key: b, a -> Unique -> Sort Sort Key: a, b -> Index Scan using t_a_idx on t (6 rows) Using DISTINCT pathkeys {b, a} is more efficient for this plan, because only one Sort would be required. But that patch is not able to do that, because it does not consider the original DISTINCT pathkeys after reordering. Thanks Richard
Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
On Sat, Jan 27, 2024 at 10:08 AM vignesh C wrote: > On Mon, 8 Jan 2024 at 22:21, Tom Lane wrote: > > > > Richard Guo writes: > > > On Sun, Jan 7, 2024 at 6:41 AM Tom Lane wrote: > > >> Thanks for the report! I guess we need something like the attached. > > > > > +1. > > > > Pushed, thanks for looking at it. > I have changed the status of the commitfest entry to "Committed" as I > noticed the patch has already been committed. Well, the situation seems a little complex here. At first, this thread was dedicated to discussing the 'Examine-simple-variable-for-Var-in-CTE' patch, which has already been pushed in [1]. Subsequently, I proposed another patch 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query' in [2], which is currently under review and is what the commitfest entry is for. Later on, within the same thread, another patch was posted as a fix to the first patch and was subsequently pushed in [3]. I believe this sequence of events might have led to confusion. What is the usual practice in such situations? I guess I'd better fork a new thread to discuss my proposed patch, which is about the 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query'. [1] https://www.postgresql.org/message-id/754093.1700250120%40sss.pgh.pa.us [2] https://www.postgresql.org/message-id/CAMbWs49gAHeEOn0rpdUUYXryaa60KZ8JKwk1aSERttY9caCYkA%40mail.gmail.com [3] https://www.postgresql.org/message-id/1941515.1704732682%40sss.pgh.pa.us Thanks Richard
Propagate pathkeys from CTEs up to the outer query
In [1] we've reached a conclusion that for a MATERIALIZED CTE it's okay to 'allow our statistics or guesses for the sub-query to subsequently influence what the upper planner does'. Commit f7816aec23 exposes column statistics to the upper planner. In the light of that, here is a patch that exposes info about the ordering of the CTE result to the upper planner. This patch was initially posted in that same thread and has received some comments from Tom in [2]. Due to the presence of multiple patches in that thread, it has led to confusion. So fork a new thread here specifically dedicated to discussing the patch about exposing pathkeys from CTEs to the upper planner. Comments/thoughts? [1] https://www.postgresql.org/message-id/482957.1700192299%40sss.pgh.pa.us [2] https://www.postgresql.org/message-id/1339607.1700502358%40sss.pgh.pa.us Thanks Richard v3-0001-Propagate-pathkeys-from-CTEs-up-to-the-outer-query.patch Description: Binary data
Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
On Mon, Jan 29, 2024 at 11:20 AM vignesh C wrote: > On Mon, 29 Jan 2024 at 08:01, Richard Guo wrote: > > On Sat, Jan 27, 2024 at 10:08 AM vignesh C wrote: > >> I have changed the status of the commitfest entry to "Committed" as I > >> noticed the patch has already been committed. > > > > Well, the situation seems a little complex here. At first, this thread > > was dedicated to discussing the 'Examine-simple-variable-for-Var-in-CTE' > > patch, which has already been pushed in [1]. Subsequently, I proposed > > another patch 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query' in > > [2], which is currently under review and is what the commitfest entry > > for. Later on, within the same thread, another patch was posted as a > > fix to the first patch and was subsequently pushed in [3]. I believe > > this sequence of events might have led to confusion. > > > > What is the usual practice in such situations? I guess I'd better to > > fork a new thread to discuss my proposed patch which is about the > > 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query'. > > Sorry I missed to notice that there was one pending patch yet to be > committed, I feel you can continue discussing here itself just to > avoid losing any historical information about the issue and the > continuation of the discussion. You can add a new commitfest entry for > this. It seems to me that a fresh new thread is a better option. I have just started a new thread in [1], and have tried to migrate the necessary context over there. I have also updated the commitfest entry accordingly. [1] https://www.postgresql.org/message-id/flat/CAMbWs49xYd3f8CrE8-WW3--dV1zH_sDSDn-vs2DzHj81Wcnsew%40mail.gmail.com Thanks Richard
Re: Some revises in adding sorting path
On Mon, Jul 17, 2023 at 4:55 PM Richard Guo wrote: > But I did not find a query that makes an incremental sort in this case. > After trying for a while it seems to me that we do not need to consider > incremental sort in this case, because for a partial path of a grouped > or partially grouped relation, it is either unordered (HashAggregate or > Append), or it has been ordered by the group_pathkeys (GroupAggregate). > It seems there is no case that we'd have a partial path that is > partially sorted. > Since now we'd try to reorder the group by keys (see 0452b461bc), it is possible that we have a partial path that is partially sorted. So this conclusion is not true any more. For instance, create table t (a int, b int, c int, d int); insert into t select i%10, i%10, i%10, i%10 from generate_series(1,100)i; create index on t (a, b); analyze t; set enable_hashagg to off; set enable_seqscan to off; explain (costs off) select count(*) from t group by a, c, b, parallel_safe_volatile(d); QUERY PLAN -- Finalize GroupAggregate Group Key: a, c, b, (parallel_safe_volatile(d)) -> Gather Merge Workers Planned: 2 -> Incremental Sort Sort Key: a, c, b, (parallel_safe_volatile(d)) Presorted Key: a -> Partial GroupAggregate Group Key: a, b, c, (parallel_safe_volatile(d)) -> Incremental Sort Sort Key: a, b, c, (parallel_safe_volatile(d)) Presorted Key: a, b -> Parallel Index Scan using t_a_b_idx on t (13 rows) If we do not consider incremental sort on partial paths in gather_grouping_paths(), we'd get a plan that looks like: explain (costs off) select count(*) from t group by a, c, b, parallel_safe_volatile(d); QUERY PLAN Finalize GroupAggregate Group Key: a, c, b, (parallel_safe_volatile(d)) -> Incremental Sort Sort Key: a, c, b, (parallel_safe_volatile(d)) Presorted Key: a, c, b -> Gather Merge Workers Planned: 2 -> Incremental Sort Sort Key: a, c, b Presorted Key: a -> Partial GroupAggregate Group Key: a, b, c, (parallel_safe_volatile(d)) -> Incremental Sort Sort Key: a, b, c, 
(parallel_safe_volatile(d)) Presorted Key: a, b -> Parallel Index Scan using t_a_b_idx on t (16 rows) So in the v3 patch I've brought back the logic that considers incremental sort on partial paths in gather_grouping_paths(). However, I failed to compose a test case for this scenario without having to generate a huge table. So in the v3 patch I did not include a test case for this aspect. Thanks Richard v3-0001-Revise-how-we-sort-partial-paths-in-create_ordered_paths.patch Description: Binary data v3-0002-Revise-how-we-sort-partial-paths-in-gather_grouping_paths.patch Description: Binary data
Re: Retiring is_pushed_down
On Sun, Jan 21, 2024 at 8:37 PM vignesh C wrote: > I'm seeing that there has been no activity in this thread for nearly 6 > months, I'm planning to close this in the current commitfest unless > someone is planning to take it forward. It can be opened again when > there is more interest. I'm planning to take it forward. The v2 patch does not compile any more. Attached is an updated patch that is rebased over master and fixes the compiling issues. Nothing else has changed. Thanks Richard v3-0001-Retiring-is_pushed_down.patch Description: Binary data
Re: Support run-time partition pruning for hash join
On Sat, Jan 27, 2024 at 11:29 AM vignesh C wrote: > CFBot shows that the patch does not apply anymore as in [1]: > > Please post an updated version for the same. Attached is an updated patch. Nothing else has changed. Thanks Richard v6-0001-Support-run-time-partition-pruning-for-hash-join.patch Description: Binary data
Re: Some revises in adding sorting path
On Tue, Jan 30, 2024 at 7:00 PM David Rowley wrote: > On Mon, 29 Jan 2024 at 22:39, Richard Guo wrote: > > So in the v3 patch I've brought back the logic that considers > > incremental sort on partial paths in gather_grouping_paths(). However, > > I failed to compose a test case for this scenario without having to > > generate a huge table. So in the v3 patch I did not include a test case > > for this aspect. > > Can you share the test with the huge table? The test had been shown in upthread [1]. Pasting it here: create table t (a int, b int, c int, d int); insert into t select i%10, i%10, i%10, i%10 from generate_series(1,100)i; create index on t (a, b); analyze t; set enable_hashagg to off; set enable_seqscan to off; explain (costs off) select count(*) from t group by a, c, b, parallel_safe_volatile(d); QUERY PLAN -- Finalize GroupAggregate Group Key: a, c, b, (parallel_safe_volatile(d)) -> Gather Merge Workers Planned: 2 -> Incremental Sort Sort Key: a, c, b, (parallel_safe_volatile(d)) Presorted Key: a -> Partial GroupAggregate Group Key: a, b, c, (parallel_safe_volatile(d)) -> Incremental Sort Sort Key: a, b, c, (parallel_safe_volatile(d)) Presorted Key: a, b -> Parallel Index Scan using t_a_b_idx on t (13 rows) > I tried and failed as, if I'm not mistaken, you're talking about a > parallel aggregate query with an incremental sort between the Partial > Aggregate node and the Finalize Group Aggregate node. If the partial > aggregate was a Group Aggregate, then it would already be correctly > sorted. We don't need a more strict sort ordering to perform the > Finalize Group Aggregate, the results must already be sorted by at > least the GROUP BY clause. If the partial aggregate had opted to Hash > Aggregate, then there'd be no presorted keys, so we could only get a > full sort. I can't see any way to get an incremental sort between the > 2 aggregate phases. > > What am I missing? This was true before 0452b461bc, and I reached the same conclusion in [2]. 
Quote it here: " But I did not find a query that makes an incremental sort in this case. After trying for a while it seems to me that we do not need to consider incremental sort in this case, because for a partial path of a grouped or partially grouped relation, it is either unordered (HashAggregate or Append), or it has been ordered by the group_pathkeys (GroupAggregate). It seems there is no case that we'd have a partial path that is partially sorted. " But if we've reordered the group by keys to match the input path's pathkeys, we might have a partial GroupAggregate that is partially sorted. See the plan above. > I also tried reverting your changes to planner.c to see if your new > tests would fail. They all passed. So it looks like none of these > tests are testing anything new. This patchset does not aim to introduce anything new; it simply refactors the existing code. The newly added tests are used to show that the code that is touched here is not redundant, but rather essential for generating certain paths. I remember the tests were added per your comment in [3]. [1] https://www.postgresql.org/message-id/CAMbWs4_iDwMAf5mp%2BG-tXq-gYzvR6koSHvNUqBPK4pt7%2B11tJw%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAMbWs497h5jVCVwNDb%2BBX31Z_K8iBaPQKOcsTvpFQ7kF18a2%2Bg%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAApHDvo%2BFagxVSGmvt-LUrhLZQ0KViiLvX8dPaG3ZzWLNd-Zpg%40mail.gmail.com Thanks Richard
Re: Oversight in reparameterize_path_by_child leading to executor crash
On Wed, Jan 31, 2024 at 5:12 AM Tom Lane wrote: > Richard Guo writes: > > On Wed, Jan 17, 2024 at 5:01 PM Richard Guo > wrote: > >> Sure, here it is: > >> v10-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch > > > I forgot to mention that this patch applies on v16 not master. > > I spent some time looking at this patch (which seems more urgent than > the patch for master, given that we have back-branch releases coming > up). Thanks for looking at this patch! > There are two things I'm not persuaded about: > > * Why is it okay to just use pull_varnos on a tablesample expression, > when contain_references_to() does something different? Hmm, the main reason is that the expression_tree_walker() function does not handle T_RestrictInfo nodes. So we cannot just use pull_varnos or pull_var_clause on a restrictinfo list. So contain_references_to() calls extract_actual_clauses() first before it calls pull_var_clause(). > * Is contain_references_to() doing the right thing by specifying > PVC_RECURSE_PLACEHOLDERS? That causes it to totally ignore a > PlaceHolderVar's ph_eval_at, and I'm not convinced that's correct. I was thinking that we should recurse into the PHV's contents to see if there are any lateral references to the other join relation. But now I realize that checking phinfo->ph_lateral, as you suggested, might be a better way to do that. But I'm not sure about checking phinfo->ph_eval_at. It seems to me that the ph_eval_at could not overlap the other join relation; otherwise the clause would not be a restriction clause but rather a join clause. > Ideally it seems to me that we want to reject a PlaceHolderVar > if either its ph_eval_at or ph_lateral overlap the other join > relation; if it was coded that way then we'd not need to recurse > into the PHV's contents. pull_varnos isn't directly amenable > to this, but I think we could use pull_var_clause with > PVC_INCLUDE_PLACEHOLDERS and then iterate through the resulting > list manually. 
> (If this patch were meant for HEAD, I'd think
> about extending the var.c code to support this usage more directly.
> But as things stand, this is a one-off so I think we should just do
> what we must in reparameterize_path_by_child.)

Thanks for the suggestion.  I've coded it this way in the v11 patch, and
left an XXX comment about checking phinfo->ph_eval_at.

> BTW, it shouldn't be necessary to write either PVC_RECURSE_AGGREGATES
> or PVC_RECURSE_WINDOWFUNCS, because neither kind of node should ever
> appear in a scan-level expression.  I'd leave those out so that we
> get an error if something unexpected happens.

Good point.

Thanks
Richard

v11-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch Description: Binary data
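As an illustrative aside, the check Tom suggests above — reject a PlaceHolderVar if either its ph_eval_at or its ph_lateral overlaps the other join relation, and reject a plain Var that belongs to it — can be sketched as a toy model. This is not PostgreSQL code; nodes are plain dicts and all field names are borrowed from the discussion purely for illustration:

```python
def overlaps(a, b):
    """Set-overlap test, standing in for bms_overlap()."""
    return bool(a & b)

def references_other_rel(nodes, other_relids):
    """Toy model of the suggested check: reparameterization is unsafe
    if any Var belongs to the other join relation, or any
    PlaceHolderVar is evaluated at -- or laterally references -- it."""
    for node in nodes:
        if node["kind"] == "Var":
            if node["varno"] in other_relids:
                return True
        elif node["kind"] == "PlaceHolderVar":
            if overlaps(node["ph_eval_at"], other_relids) or \
               overlaps(node["ph_lateral"], other_relids):
                return True
    return False

phv = {"kind": "PlaceHolderVar", "ph_eval_at": {2}, "ph_lateral": {3}}
print(references_other_rel([phv], {3}))  # True: lateral ref into the other rel
```

With this shape there is no need to recurse into the PHV's contained expression at all, which is the simplification Tom's suggestion buys.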
Re: Some revises in adding sorting path
On Wed, Jan 31, 2024 at 5:13 AM David Rowley wrote: > On Wed, 31 Jan 2024 at 00:44, Richard Guo wrote: > > This patchset does not aim to introduce anything new; it simply > > refactors the existing code. The newly added tests are used to show > > that the code that is touched here is not redundant, but rather > > essential for generating certain paths. I remember the tests were added > > per your comment in [3]. > > > > [3] > https://www.postgresql.org/message-id/CAApHDvo%2BFagxVSGmvt-LUrhLZQ0KViiLvX8dPaG3ZzWLNd-Zpg%40mail.gmail.com > > OK. I've pushed the patched based on it being a simplification of the > partial path generation. Thanks for pushing it! Thanks Richard
Re: Apply the "LIMIT 1" optimization to partial DISTINCT
On Wed, Jan 31, 2024 at 12:26 PM David Rowley wrote: > On Fri, 26 Jan 2024 at 21:14, David Rowley wrote: > > However, having said that. Parallel plans are often picked when there > > is some highly selective qual as parallel_tuple_cost has to be applied > > to fewer tuples for such plans, so probably this is worth doing. > > I was messing around with your test case and didn't manage to get any > plan that had any rows to use the partial path with the LIMIT. I > ended up dropping the test that was checking the results were empty as > I didn't think it added much more value over the EXPLAIN output. > > I pushed the result. Thanks for pushing it! Thanks Richard
Re: Oversight in reparameterize_path_by_child leading to executor crash
On Wed, Jan 31, 2024 at 11:21 PM Tom Lane wrote:
> Richard Guo writes:
> > On Wed, Jan 31, 2024 at 5:12 AM Tom Lane wrote:
> >> * Why is it okay to just use pull_varnos on a tablesample expression,
> >> when contain_references_to() does something different?
>
> > Hmm, the main reason is that the expression_tree_walker() function does
> > not handle T_RestrictInfo nodes.  So we cannot just use pull_varnos or
> > pull_var_clause on a restrictinfo list.
>
> Right, the extract_actual_clauses step is not wanted for the
> tablesample expression.  But my point is that surely the handling of
> Vars and PlaceHolderVars should be the same, which it's not, and your
> v11 makes it even less so.  How can it be OK to ignore Vars in the
> restrictinfo case?

Hmm, in this specific scenario it seems that it's not possible to have
Vars in the restrictinfo list that laterally reference the outer join
relation; otherwise the clause containing such Vars would not be a
restriction clause but rather a join clause.  So in v11 I did not check
Vars in contain_references_to().  But I agree that we'd better handle
Vars and PlaceHolderVars in the same way.

> I think the code structure we need to end up with is a routine that
> takes a RestrictInfo-free node tree (and is called directly in the
> tablesample case) with a wrapper routine that strips the RestrictInfos
> (for use on restriction lists).

How about the attached v12 patch?  But I'm a little concerned about
omitting PVC_RECURSE_AGGREGATES and PVC_RECURSE_WINDOWFUNCS in this
case.  Is it possible that aggregates or window functions appear in the
tablesample expression?

Thanks
Richard

v12-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch Description: Binary data
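As an aside, the code structure Tom describes — a core routine that works on a RestrictInfo-free expression tree (called directly for a tablesample expression), plus a wrapper that strips RestrictInfos from a restriction list — can be sketched in miniature. This is not PostgreSQL code; expressions are modeled as nested dicts, and the wrapper plays the role extract_actual_clauses plays in the real code:

```python
def expr_refs_relids(expr, relids):
    """Core routine: walk a RestrictInfo-free expression tree looking
    for Vars of the given rels.  This is the piece that would be called
    directly on a tablesample expression."""
    if expr is None:
        return False
    if expr["kind"] == "Var":
        return expr["varno"] in relids
    return any(expr_refs_relids(arg, relids) for arg in expr.get("args", ()))

def restrictlist_refs_relids(restrictinfos, relids):
    """Wrapper for restriction lists: strip the RestrictInfo wrapper
    first, then hand each bare clause to the core routine."""
    return any(expr_refs_relids(ri["clause"], relids) for ri in restrictinfos)

clause = {"kind": "OpExpr", "args": ({"kind": "Var", "varno": 2},
                                     {"kind": "Const"})}
print(restrictlist_refs_relids([{"clause": clause}], {2}))  # True
```

Splitting the walk this way keeps the Var/PlaceHolderVar handling identical for both callers, which is exactly the symmetry being asked for upthread.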
Re: set_cheapest without checking pathlist
On Thu, Feb 1, 2024 at 10:04 AM James Coleman wrote: > I don't see any inherent reason why we must always assume that > gather_grouping_paths will always result in having at least one entry > in pathlist. If, for example, we've disabled incremental sort and the > cheapest partial path happens to already be sorted, then I don't > believe we'll add any paths. And if that happens then set_cheapest > will error with the message "could not devise a query plan for the > given query". So I propose we add a similar guard to this call point. I don't believe that would happen. It seems to me that we should, at the very least, have a path which is Gather on top of the cheapest partial path (see generate_gather_paths), as long as the partially_grouped_rel has partial paths. Thanks Richard
Re: set_cheapest without checking pathlist
On Thu, Feb 1, 2024 at 11:37 AM David Rowley wrote: > On Thu, 1 Feb 2024 at 16:29, Richard Guo wrote: > > On Thu, Feb 1, 2024 at 10:04 AM James Coleman wrote: > >> I don't see any inherent reason why we must always assume that > >> gather_grouping_paths will always result in having at least one entry > >> in pathlist. If, for example, we've disabled incremental sort and the > >> cheapest partial path happens to already be sorted, then I don't > >> believe we'll add any paths. And if that happens then set_cheapest > >> will error with the message "could not devise a query plan for the > >> given query". So I propose we add a similar guard to this call point. > > > > > > I don't believe that would happen. It seems to me that we should, at > > the very least, have a path which is Gather on top of the cheapest > > partial path (see generate_gather_paths), as long as the > > partially_grouped_rel has partial paths. > > It will have partial paths because it's nested inside "if > (partially_grouped_rel && partially_grouped_rel->partial_pathlist)" Right. And that leads to the conclusion that gather_grouping_paths will always generate at least one path for partially_grouped_rel. So it seems to me that the check added in the patch is not necessary. But maybe we can add an Assert or a comment clarifying that the pathlist of partially_grouped_rel cannot be empty here. Thanks Richard
Re: A performance issue with Memoize
On Thu, Feb 1, 2024 at 3:43 PM Anthonin Bonnefoy
<anthonin.bonne...@datadoghq.com> wrote:
> Hi,
>
> I've seen a similar issue with the following query (tested on the
> current head):
>
> EXPLAIN ANALYZE SELECT * FROM tenk1 t1
> LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
> ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
>                                  QUERY PLAN
> --------------------------------------------------------------------
>  Nested Loop Left Join  (cost=8.46..1495.10 rows=5 width=256) (actual time=0.860..111.013 rows=5 loops=1)
>    ->  Bitmap Heap Scan on tenk1 t1  (cost=8.16..376.77 rows=500 width=244) (actual time=0.798..1.418 rows=500 loops=1)
>          Recheck Cond: (hundred < 5)
>          Heap Blocks: exact=263
>          ->  Bitmap Index Scan on tenk1_hundred  (cost=0.00..8.04 rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
>                Index Cond: (hundred < 5)
>    ->  Memoize  (cost=0.30..4.89 rows=100 width=12) (actual time=0.009..0.180 rows=100 loops=500)
>          Cache Key: t1.hundred
>          Cache Mode: logical
>          Hits: 0  Misses: 500  Evictions: 499  Overflows: 0  Memory Usage: 5kB
>          ->  Index Scan using tenk2_hundred on tenk2  (cost=0.29..4.88 rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
>                Index Cond: (hundred = t1.hundred)
>  Planning Time: 0.661 ms
>  Execution Time: 113.076 ms
> (14 rows)
>
> The memoize's cache key only uses t1.hundred while the nested loop has
> two changed parameters: the lateral var t1.two and t1.hundred.  This
> leads to a chgParam that is always different and the cache is purged
> on each rescan.

Thanks for the report!  This issue is caused by the fact that we fail to
check PHVs for lateral references, and hence cannot know that t1.two
should also be included in the cache keys.  I reported exactly the same
issue in [1], and verified that it can be fixed by the patch in [2]
(which seems to require a rebase).
[1] https://www.postgresql.org/message-id/CAMbWs4_imG5C8rXt7xdU7zf6whUDc2rdDun%2BVtrowcmxb41CzA%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAMbWs49%2BCjoy0S0xkCRDcHXGHvsYLOdvr9jq9OTONOBnsgzXOg%40mail.gmail.com Thanks Richard
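As an illustrative aside, the "Hits: 0" behavior in the plan above falls out of a simple rule: when a parameter changes that is not part of the cache key, cached entries cannot be trusted and must be purged. The toy Python model below (not executor code; the parameter names just echo the example query) shows how an incomplete cache key turns every rescan into a purge, while a complete key lets repeats hit the cache:

```python
class ToyMemoize:
    """Toy model of a memoize cache whose key may cover only some of
    the parameters the inner scan actually depends on."""
    def __init__(self, key_cols):
        self.key_cols = key_cols
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def rescan(self, params, changed, compute):
        # If a parameter changed that is not in the cache key, existing
        # entries cannot be trusted -- purge them all.  This is what
        # produces "Hits: 0" in the reported plan.
        if not set(changed) <= set(self.key_cols):
            self.cache.clear()
        key = tuple(params[c] for c in self.key_cols)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        self.cache[key] = compute(params)
        return self.cache[key]

def drive(key_cols, rows):
    """Run one rescan per outer row, diffing params against the
    previous row to build the set of changed parameters (chgParam)."""
    memo = ToyMemoize(key_cols)
    prev = None
    for row in rows:
        changed = [c for c in row if prev is None or row[c] != prev[c]]
        memo.rescan(row, changed, lambda p: p["hundred"])
        prev = row
    return memo.hits, memo.misses

rows = [{"hundred": i % 5, "two": i % 2} for i in range(20)]
print(drive(["hundred"], rows))         # (0, 20): "two" purges every rescan
print(drive(["hundred", "two"], rows))  # (10, 10): repeats hit the cache
```

The fix discussed in the thread amounts to making the planner include the lateral reference hidden inside the PHV (t1.two here) in the cache key, i.e. moving from the first call to the second.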
Re: set_cheapest without checking pathlist
On Thu, Feb 1, 2024 at 5:03 PM David Rowley wrote: > On Thu, 1 Feb 2024 at 19:36, Richard Guo wrote: > > maybe we can add an Assert or a comment clarifying that the pathlist of > > partially_grouped_rel cannot be empty here. > > There'd be no need to Assert that as set_cheapest() will raise an > error when given a rel with an empty pathlist. > > I don't think putting an Assert that would fail in the same situation > that we'd later ERROR out on would be a good idea. It'd make it harder > to debug the issue. Fair point. Thanks Richard
Re: Oversight in reparameterize_path_by_child leading to executor crash
On Fri, Feb 2, 2024 at 1:45 AM Tom Lane wrote: > I pushed v12 (with some cosmetic adjustments) into the back branches, > since we're getting close to the February release freeze. We still > need to deal with the larger fix for HEAD. Please re-post that > one so that the cfbot knows which is the patch-of-record. Thanks for the adjustments and pushing! Here is the patch for HEAD. I simply re-posted v10. Nothing has changed. Thanks Richard v10-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch Description: Binary data
Re: POC: GROUP BY optimization
On Fri, Feb 2, 2024 at 10:02 AM Tom Lane wrote:
> Alexander Korotkov writes:
> > I'm going to push this if there are no objections.
>
> One of the test cases added by this commit has not been very
> stable in the buildfarm.  Latest example is here:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
>
> and I've seen similar failures intermittently on other machines.
>
> I'd suggest building this test atop a table that is more stable
> than pg_class.  You're just waving a red flag in front of a bull
> if you expect stable statistics from that during a regression run.
> Nor do I see any particular reason for pg_class to be especially
> suited to the test.

Yeah, it's not a good practice to use pg_class in this place.  While
looking through the test cases added by this commit, I noticed some
other minor issues that are not great.  Such as:

* The table 'btg' is inserted with 1 tuples, which seems a bit
expensive for a test.  I don't think we need such a big table to test
what we want.

* I don't see why we need to manipulate GUC max_parallel_workers and
max_parallel_workers_per_gather.

* I think we'd better write the tests with the keywords being all upper
or all lower.  A mixed use of upper and lower is not great.  Such as in

explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;

* Some comments for the test queries are not easy to read.

* For this statement

CREATE INDEX idx_y_x_z ON btg(y,x,w);

I think the index name would cause confusion.  It creates an index on
columns y, x and w, but the name indicates an index on y, x and z.

I'd like to write a draft patch for the fixes.

Thanks
Richard
Re: POC: GROUP BY optimization
On Fri, Feb 2, 2024 at 11:32 AM Richard Guo wrote: > On Fri, Feb 2, 2024 at 10:02 AM Tom Lane wrote: > >> One of the test cases added by this commit has not been very >> stable in the buildfarm. Latest example is here: >> >> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04 >> >> and I've seen similar failures intermittently on other machines. >> >> I'd suggest building this test atop a table that is more stable >> than pg_class. You're just waving a red flag in front of a bull >> if you expect stable statistics from that during a regression run. >> Nor do I see any particular reason for pg_class to be especially >> suited to the test. > > > Yeah, it's not a good practice to use pg_class in this place. While > looking through the test cases added by this commit, I noticed some > other minor issues that are not great. Such as > > * The table 'btg' is inserted with 1 tuples, which seems a bit > expensive for a test. I don't think we need such a big table to test > what we want. > > * I don't see why we need to manipulate GUC max_parallel_workers and > max_parallel_workers_per_gather. > > * I think we'd better write the tests with the keywords being all upper > or all lower. A mixed use of upper and lower is not great. Such as in > > explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w; > > * Some comments for the test queries are not easy to read. > > * For this statement > > CREATE INDEX idx_y_x_z ON btg(y,x,w); > > I think the index name would cause confusion. It creates an index on > columns y, x and w, but the name indicates an index on y, x and z. > > I'd like to write a draft patch for the fixes. > Here is the draft patch that fixes the issues I complained about in upthread. Thanks Richard v1-0001-Multiple-revises-for-the-GROUP-BY-reordering-tests.patch Description: Binary data
Re: An improvement on parallel DISTINCT
On Fri, Feb 2, 2024 at 11:26 AM David Rowley wrote:
> In light of this, do you still think it's worthwhile making this change?
>
> For me, I think all it's going to result in is extra planner work
> without any performance gains.

Hmm, with the query below, I can see that the new plan is cheaper than
the old plan, and the cost difference exceeds STD_FUZZ_FACTOR.

create table t (a int, b int);
insert into t select i%10, i from generate_series(1,1000)i;
analyze t;

-- on master
explain (costs on) select distinct a from t order by a limit 1;
                                       QUERY PLAN
--------------------------------------------------------------------------------------
 Limit  (cost=120188.50..120188.51 rows=1 width=4)
   ->  Sort  (cost=120188.50..120436.95 rows=99379 width=4)
         Sort Key: a
         ->  HashAggregate  (cost=118697.82..119691.61 rows=99379 width=4)
               Group Key: a
               ->  Gather  (cost=97331.33..118200.92 rows=198758 width=4)
                     Workers Planned: 2
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=417 width=4)
(10 rows)

-- on patched
explain (costs on) select distinct a from t order by a limit 1;
                                       QUERY PLAN
--------------------------------------------------------------------------------------
 Limit  (cost=106573.93..106574.17 rows=1 width=4)
   ->  Unique  (cost=106573.93..130260.88 rows=99379 width=4)
         ->  Gather Merge  (cost=106573.93..129763.98 rows=198758 width=4)
               Workers Planned: 2
               ->  Sort  (cost=105573.91..105822.35 rows=99379 width=4)
                     Sort Key: a
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=417 width=4)
(9 rows)

It seems that including a LIMIT clause can potentially favor the new plan.

Thanks
Richard
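As an aside on the STD_FUZZ_FACTOR remark above: add_path() treats costs within 1% of each other as "fuzzily the same" (STD_FUZZ_FACTOR is 1.01 in pathnode.c), so a new path only clearly wins when it beats the old one by more than that margin. A toy restatement, using the total costs of the two Limit nodes shown above (this simplification looks at total cost only, whereas the real compare_path_costs_fuzzily() weighs startup cost and other criteria too):

```python
STD_FUZZ_FACTOR = 1.01  # costs within 1% are treated as "fuzzily the same"

def clearly_cheaper(new_total, old_total, fuzz=STD_FUZZ_FACTOR):
    """True if the new cost beats the old one by more than the fuzz
    factor, so the difference would not be dismissed as noise."""
    return new_total * fuzz < old_total

# Total costs of the two Limit nodes in the plans above:
print(clearly_cheaper(106574.17, 120188.51))  # True
```

Here 106574.17 * 1.01 is still well under 120188.51, which is the sense in which "the cost difference exceeds STD_FUZZ_FACTOR" in the message above.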
Re: Commitfest 2024-01 is now closed
On Fri, Feb 2, 2024 at 2:41 PM Michael Paquier wrote: > On Fri, Feb 02, 2024 at 11:56:36AM +0530, Amit Kapila wrote: > > On Fri, Feb 2, 2024 at 10:58 AM Tom Lane wrote: > >> Thanks for all the work you did running it! CFM is typically a > >> thankless exercise in being a nag, but I thought you put in more > >> than the usual amount of effort to keep things moving. > >> > > > > +1. Thanks a lot for all the hard work. > > +1. Thanks! +1. Thanks! Thanks Richard
Re: Check lateral references within PHVs for memoize cache keys
On Mon, Dec 25, 2023 at 3:01 PM Richard Guo wrote: > On Thu, Jul 13, 2023 at 3:12 PM Richard Guo > wrote: > >> So I'm wondering if it'd be better that we move all this logic of >> computing additional lateral references within PHVs to get_memoize_path, >> where we can examine only PHVs that are evaluated at innerrel. And >> considering that these lateral refs are only used by Memoize, it seems >> more sensible to compute them there. But I'm a little worried that >> doing this would make get_memoize_path too expensive. >> >> Please see v4 patch for this change. >> > > I'd like to add that not checking PHVs for lateral references can lead > to performance regressions with Memoize node. > The v4 patch does not apply any more. I've rebased it on master. Nothing else has changed. Thanks Richard v5-0001-Check-lateral-refs-within-PHVs-for-memoize-cache-keys.patch Description: Binary data
Re: An improvement on parallel DISTINCT
On Fri, Feb 2, 2024 at 6:39 PM David Rowley wrote: > So the gains increase with more parallel workers due to pushing more > work to the worker. Amdahl's law approves of this. > > I'll push the patch shortly. Thanks for the detailed testing and pushing the patch! Thanks Richard
Re: An improvement on parallel DISTINCT
On Fri, Feb 2, 2024 at 7:36 PM David Rowley wrote: > Now for the other stuff you had. I didn't really like this part: > > + /* > + * Set target for partial_distinct_rel as generate_useful_gather_paths > + * requires that the input rel has a valid reltarget. > + */ > + partial_distinct_rel->reltarget = cheapest_partial_path->pathtarget; > > I think we should just make it work the same way as > create_grouping_paths(), where grouping_target is passed as a > parameter. > > I've done it that way in the attached. The change looks good to me. BTW, I kind of doubt that 'create_partial_distinct_paths' is a proper function name given what it actually does. It not only generates distinct paths based on input_rel's partial paths, but also adds Gather/GatherMerge on top of these partially distinct paths, followed by a final unique/aggregate path to ensure uniqueness of the final result. So maybe 'create_parallel_distinct_paths' or something like that would be better? I asked because I noticed that in create_partial_grouping_paths(), we only generate partially aggregated paths, and any subsequent FinalizeAggregate step is called in the caller. Thanks Richard
Re: Reordering DISTINCT keys to match input path's pathkeys
cfbot reminds that this patch does not apply any more. So I've rebased it on master, and also adjusted the test cases a bit. Thanks Richard v2-0001-Reordering-DISTINCT-keys-to-match-input-path-s-pathkeys.patch Description: Binary data
cfbot does not list patches in 'Current commitfest'
... and the patches in CF #47 (currently open) are listed in 'Next commitfest'. I guess we need to manually create a "next" commitfest? Thanks Richard
Re: Properly pathify the union planner
On Fri, Nov 24, 2023 at 6:29 AM David Rowley wrote:
> I've attached the updated patch.  This one is probably ready for
> someone to test out.  There will be more work to do, however.

I just started reviewing this patch and haven't looked through all the
details yet.  Here is some feedback that came to mind.  I'm posting it
now so that I don't forget it after the holidays.

* I think we should update truncate_useless_pathkeys() to account for
the ordering requested by the query's set operation; otherwise we may
not get a subquery's path with the expected pathkeys.  For instance,

create table t (a int, b int);
create index on t (a, b);
set enable_hashagg to off;

-- on v1 patch
explain (costs off)
(select * from t order by a) UNION (select * from t order by a);
                      QUERY PLAN
------------------------------------------------------
 Unique
   ->  Merge Append
         Sort Key: t.a, t.b
         ->  Incremental Sort
               Sort Key: t.a, t.b
               Presorted Key: t.a
               ->  Index Only Scan using t_a_b_idx on t
         ->  Incremental Sort
               Sort Key: t_1.a, t_1.b
               Presorted Key: t_1.a
               ->  Index Only Scan using t_a_b_idx on t t_1
(11 rows)

-- after accounting for setop_pathkeys in truncate_useless_pathkeys()
explain (costs off)
(select * from t order by a) UNION (select * from t order by a);
                   QUERY PLAN
------------------------------------------------
 Unique
   ->  Merge Append
         Sort Key: t.a, t.b
         ->  Index Only Scan using t_a_b_idx on t
         ->  Index Only Scan using t_a_b_idx on t t_1
(5 rows)

* I understand that we need to sort (full or incremental) the paths of
the subqueries to meet the ordering required for setop_pathkeys, so that
MergeAppend has something to work with.  Currently in the v1 patch this
sorting is performed during the planning phase of the subqueries (in
grouping_planner).  And we want to add the subquery's
cheapest_total_path as-is to allow that path to be used in hash-based
UNIONs, and we also want to add a sorted path on top of
cheapest_total_path.  And then we may encounter the issue you mentioned
earlier regarding add_path() potentially freeing the
cheapest_total_path, leaving the Sort's subpath gone.
I'm thinking that maybe it'd be better to move the work of sorting the
subquery's paths to the outer query level, specifically within the
build_setop_child_paths() function, just before we stick
SubqueryScanPath on top of the subquery's paths.  I think this is better
because:

1. This minimizes the impact on subquery planning and reduces the
footprint within the grouping_planner() function as much as possible.

2. This can help avoid the aforementioned add_path() issue because the
two involved paths will be structured as:

cheapest_path -> subqueryscan
and
cheapest_path -> sort -> subqueryscan

If the two paths cost fuzzily the same and add_path() decides to keep
the second one due to it having better pathkeys and pfree the first one,
it would not be a problem.

BTW, I haven't looked through the part involving partial paths, but I
think we can do the same for partial paths.

* I noticed that in generate_union_paths() we use a new function
build_setop_pathkeys() to build the 'union_pathkeys'.  I wonder why we
don't simply utilize make_pathkeys_for_sortclauses() since we already
have the grouplist for the setop's output columns.

To assist the discussion I've attached a diff file that includes all the
changes above.

Thanks
Richard

v1-0001-review-diff.patch Description: Binary data
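As an aside, the add_path() scenario described in point 2 — two paths of fuzzily the same cost where the better-sorted one survives and the other is freed — can be sketched with a toy model. This is heavily simplified and is not PostgreSQL code (the real add_path() weighs startup cost, parameterization, parallel safety, and row counts too); path names and costs are invented:

```python
FUZZ = 1.01  # costs within 1% are "fuzzily the same"

def is_prefix(short, long_):
    """True if `short` is a leading prefix of `long_` (pathkey usefulness)."""
    return long_[:len(short)] == short

def add_path(pathlist, new):
    """Toy add_path(): drop the newcomer if an existing path is no more
    expensive (within the fuzz factor) and at least as well sorted;
    otherwise keep it and discard paths it now makes redundant."""
    for old in pathlist:
        if old["cost"] <= new["cost"] * FUZZ and is_prefix(new["keys"], old["keys"]):
            return pathlist
    return [old for old in pathlist
            if not (new["cost"] <= old["cost"] * FUZZ
                    and is_prefix(old["keys"], new["keys"]))] + [new]

plain = {"name": "cheapest -> SubqueryScan", "cost": 100.0, "keys": ()}
sorted_scan = {"name": "cheapest -> Sort -> SubqueryScan",
               "cost": 100.5, "keys": ("a",)}

# Fuzzily the same cost, better pathkeys: the unsorted path is dropped.
paths = add_path([plain], sorted_scan)
print([p["name"] for p in paths])  # ['cheapest -> Sort -> SubqueryScan']
```

With the path structure proposed in point 2, the discarded path is a sibling of the surviving one rather than the Sort's own subpath, so freeing it leaves nothing dangling.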
Re: Wrong query results caused by loss of join quals
On Thu, Feb 23, 2023 at 4:50 AM Tom Lane wrote: > I thought about this and decided that it's not really a problem. > have_relevant_joinclause is just a heuristic, and I don't think we > need to prioritize forming a join if the only relevant clauses look > like this. We won't be able to use such clauses for merge or hash, > so we're going to end up with an unconstrained nestloop, which isn't > something to be eager to form. The join ordering rules will take > care of forcing us to make the join when necessary. Agreed. And as I tried, in lots of cases joins with such clauses would be accepted by have_join_order_restriction(), which always appears with have_relevant_joinclause(). > The only easy improvement I can see to make here is to apply the old > rules at inner joins. Maybe it's worth complicating the data structures > to be smarter at outer joins, but I rather doubt it: we could easily > expend more overhead than we'll save here by examining irrelevant ECs. > In any case, if there is a useful optimization here, it can be pursued > later. This makes sense. > I changed it anyway after noting that (a) passing in the ojrelid is > needful to be able to distinguish inner and outer joins, and > (b) the existing comment about the join_relids input is now wrong. > Even if it happens to not be borked for current callers, that seems > like a mighty fragile assumption. Agreed. This is reasonable. > Less-hasty v2 patch attached. I think the patch is in good shape now. Thanks Richard
Re: Making empty Bitmapsets always be NULL
On Thu, Mar 2, 2023 at 6:59 AM Tom Lane wrote:
> Yeah.  I split out those executor fixes as 0002; 0003 is the changes
> to bitmapsets proper, and then 0004 removes now-dead code.

+1 to all these patches.  Some minor comments from me.

* 0003

It seems that the Bitmapset checked by bms_is_empty_internal cannot be
NULL, judging from how it is computed.  So I wonder if we can remove the
check of 'a' being NULL in that function, or reduce it to an Assert.

-   if (a == NULL)
-       return true;
+   Assert(a != NULL);

* 0004

It seems that in create_lateral_join_info around line 689, the
bms_is_empty check of lateral_relids is not necessary, since we've
checked that lateral_relids cannot be NULL several lines earlier.

@@ -682,12 +682,6 @@ create_lateral_join_info(PlannerInfo *root)
        if (lateral_relids == NULL)
            continue;

-       /*
-        * We should not have broken the invariant that lateral_relids is
-        * exactly NULL if empty.
-        */
-       Assert(!bms_is_empty(lateral_relids));
-

Thanks
Richard
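As an illustrative aside, the invariant behind this patch series — an empty Bitmapset is always represented as NULL, so set operations normalize their results and emptiness becomes a simple NULL test — can be modeled in a few lines. This is a toy (an int bitmask with None standing for NULL), not PostgreSQL's bitmapset code:

```python
def bms_del_member(a, x):
    """Delete bit x, normalizing an all-zero result to None -- the
    'empty is always NULL' invariant the patches enforce."""
    if a is None:
        return None
    a &= ~(1 << x)
    return None if a == 0 else a

def bms_is_empty(a):
    # With the invariant in place, emptiness is just a NULL test; the
    # internal emptiness check never needs to accept a NULL argument.
    return a is None

print(bms_is_empty(bms_del_member(0b100, 2)))  # True
```

This is also why the Assert removed in 0004 above became dead code: once every operation normalizes, `lateral_relids == NULL` and "empty" are the same condition.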
Assert failure of the cross-check for nullingrels
While looking into issue [1], I came across $subject on master.  Below
is how to reproduce it.

DROP TABLE IF EXISTS t1,t2,t3,t4 CASCADE;
CREATE TABLE t1 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t2 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t3 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t4 AS SELECT true AS x FROM generate_series(0,1) x;
ANALYZE;

explain (costs off)
select * from t1
    left join (t2 left join t3 on t2.x) on t2.x
    left join t4 on t3.x and t2.x
where t1.x = coalesce(t2.x,true);

I've looked into this a little bit.  For the join of t2/t3 to t4, since
it can commute with the join of t1 to t2/t3 according to identity 3, we
would generate multiple versions for its joinquals.  In particular, the
qual 't3.x' would have two versions, one with varnullingrels as {t2/t3,
t1/t2}, the other one with varnullingrels as {t2/t3}.  So far so good.

Assume we've determined to build the join of t2/t3 to t4 after we've
built t1/t2 and t2/t3, then we'd find that both versions of qual 't3.x'
would be accepted by clause_is_computable_at.  This is not correct.  We
are supposed to accept only the one marked as {t2/t3, t1/t2}.  The other
one is not rejected mainly because we found that the qual 't3.x' does
not mention any nullable Vars of outer join t1/t2.

I wonder if we should consider syn_xxxhand rather than min_xxxhand in
clause_is_computable_at when we check if clause mentions any nullable
Vars.  But I'm not sure about that.

[1] https://www.postgresql.org/message-id/flat/0b819232-4b50-f245-1c7d-c8c61bf41827%40postgrespro.ru

Thanks
Richard
Re: Assert failure of the cross-check for nullingrels
On Fri, Mar 10, 2023 at 4:13 PM Richard Guo wrote:
> I wonder if we should consider syn_xxxhand rather than min_xxxhand in
> clause_is_computable_at when we check if clause mentions any nullable
> Vars.  But I'm not sure about that.

No, considering syn_xxxhand is not right.  After some join order
commutation we may form the join with only its min_lefthand and
min_righthand.  In this case, if we check against syn_xxxhand rather
than min_xxxhand in clause_is_computable_at, we may end up being unable
to find a proper place for some quals.  I can see this problem in the
query below.

select * from t1
    left join
        ((select t2.x from t2 left join t3 on t2.x where t3.x is null) s
         left join t4 on s.x)
    on s.x = t1.x;

Suppose we've formed join t1/t2 and go ahead to form the join of t1/t2
to t3.  If we consider t1/t2 join's syn_xxxhand, then the pushed down
qual 't3.x is null' would not be computable at this level because it
mentions nullable Vars from t1/t2 join's syn_righthand and meanwhile is
not marked with t1/t2 join.  This is not correct and would trigger an
Assert.

Back to the original issue, if a join has more than one qual, actually
we treat them as a whole when we check if identity 3 applies, as well as
when we adjust them to be suitable for commutation according to identity
3.  So when we check if a qual is computable at a given level, I think
we should also consider the join's quals as a whole.  I'm thinking that
we use a 'group' notion for RestrictInfos and then use the clause_relids
of the 'group' in clause_is_computable_at.  Does this make sense?

Thanks
Richard
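As an illustrative aside, the 'group' idea proposed above can be sketched abstractly: instead of testing each qual's clause_relids against the evaluation level on its own, test the union of the clause_relids of all quals belonging to the same join. The toy Python model below is not planner code, and all the relid numbers are invented (with 12 standing in for the t1/t2 outer-join relid):

```python
def clause_is_computable_at(clause_relids, eval_relids):
    """A clause version can be computed at a join level that covers all
    the relids (including outer-join relids) it was adjusted for."""
    return clause_relids <= eval_relids

def group_is_computable_at(group, eval_relids):
    """The 'group' notion: treat all quals belonging to the same join
    as a unit by testing the union of their clause_relids."""
    group_relids = set().union(*(q["clause_relids"] for q in group))
    return group_relids <= eval_relids

# Two versions of the same join's quals: a bare one and one adjusted
# for commutation with the t1/t2 join (relid 12).
quals = [{"clause_relids": {3}}, {"clause_relids": {2, 3, 12}}]

# Checked individually, the bare {3} version is accepted too early...
print(clause_is_computable_at({3}, {2, 3, 4}))          # True
# ...while checking the group as a whole rejects it at that level.
print(group_is_computable_at(quals, {2, 3, 4}))         # False
print(group_is_computable_at(quals, {1, 2, 3, 4, 12}))  # True
```

This mirrors the failure described upthread: the version of 't3.x' not marked with the t1/t2 join was wrongly accepted when judged in isolation, but is correctly filtered out when its whole group's relids are considered.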
Re: Assert failure of the cross-check for nullingrels
On Mon, Mar 13, 2023 at 5:03 PM Richard Guo wrote: > Back to the original issue, if a join has more than one quals, actually > we treat them as a whole when we check if identity 3 applies as well as > when we adjust them to be suitable for commutation according to identity > 3. So when we check if a qual is computable at a given level, I think > we should also consider the join's quals as a whole. I'm thinking that > we use a 'group' notion for RestrictInfos and then use the clause_relids > of the 'group' in clause_is_computable_at. Does this make sense? > I'm imagining something like attached (no comments and test cases yet). Thanks Richard v1-0001-Draft-group-RestrictInfos.patch Description: Binary data
Re: Using each rel as both outer and inner for JOIN_ANTI
On Wed, Mar 15, 2023 at 2:25 AM Gregory Stark (as CFM) wrote: > So what is the status of this patch? > > It looks like you received some feedback from Emre, Tom, Ronan, and > Alvaro but it also looks like you responded to most or all of that. > Are you still blocked waiting for feedback? Anything specific you need > help with? > > Or is the patch ready for commit now? In which case it would be good > to rebase it since it's currently failing to apply. Well it would be > good to rebase regardless but it would be especially important if we > want to get it committed :) Thanks for reminding. Attached is the rebased patch, with no other changes. I think the patch is ready for commit. Thanks Richard v6-0001-Using-each-rel-as-both-outer-and-inner-for-anti-joins.patch Description: Binary data
Re: postgres_fdw: Useless if-test in GetConnection()
On Wed, Mar 15, 2023 at 6:18 PM Etsuro Fujita wrote: > While working on something else, I noticed that the “if (entry->conn > == NULL)” test after doing disconnect_pg_server() when re-establishing > a given connection in GetConnection() is pointless, because the former > function ensures that entry->conn is NULL. So I removed the if-test. > Attached is a patch for that. I think we could instead add an > assertion, but I did not, because we already have it in > make_new_connection(). +1. Good catch. Thanks Richard
Re: Assert failure of the cross-check for nullingrels
On Mon, Mar 13, 2023 at 5:44 PM Richard Guo wrote: > On Mon, Mar 13, 2023 at 5:03 PM Richard Guo > wrote: > >> Back to the original issue, if a join has more than one quals, actually >> we treat them as a whole when we check if identity 3 applies as well as >> when we adjust them to be suitable for commutation according to identity >> 3. So when we check if a qual is computable at a given level, I think >> we should also consider the join's quals as a whole. I'm thinking that >> we use a 'group' notion for RestrictInfos and then use the clause_relids >> of the 'group' in clause_is_computable_at. Does this make sense? >> > > I'm imagining something like attached (no comments and test cases yet). > Here is an updated patch with comments and test case. I also change the code to store 'group_clause_relids' directly in RestrictInfo. Thanks Richard v2-0001-Check-group_clause_relids-to-see-if-a-clause-is-computable.patch Description: Binary data
Re: A problem about ParamPathInfo for an AppendPath
On Fri, Mar 17, 2023 at 6:15 AM Tom Lane wrote: > Pushed. I thought the comment needed to be completely rewritten not just > tweaked, and I felt it was probably reasonable to continue to exclude > dummy paths from getting the more expensive treatment. Yes agreed. Thanks for the changes and pushing. Thanks Richard
Re: An oversight in ExecInitAgg for grouping sets
On Mon, Jan 9, 2023 at 5:21 PM David Rowley wrote: > On Thu, 5 Jan 2023 at 20:06, Richard Guo wrote: > > I reviewed this patch and have some comments. > > Thanks for looking at this. I think I've fixed all the issues you > mentioned. > > One extra thing I noticed was that I had to add a new > VALGRIND_MAKE_MEM_DEFINED in AllocSetAlloc when grabbing an item off > the freelist. I didn't quite manage to figure out why that's needed as > when we do AllocSetFree() we don't mark the pfree'd memory with > NOACCESS, and it also looks like AllocSetReset() sets the keeper > block's memory to NOACCESS, but that function also clears the > freelists too, so the freelist chunk is not coming from a recently > reset context. > > I might need to spend a bit more time on this to see if I can figure > out why this is happening. On the other hand, maybe we should just > mark pfree'd memory as NOACCESS as that might find another class of > issues. It occurred to me that this hasn't been applied. Should we add it to the CF to not lose track of it? Thanks Richard
Re: Missing rules for queryjumblefuncs.{funcs,switch}.c for maintainer-clean
On Mon, Mar 20, 2023 at 2:43 PM Michael Paquier wrote: > Nathan has reported to me offlist that maintainer-clean was not doing > its job for the files generated by gen_node_support.pl in > src/backend/nodes/ for the query jumbling. Attached is a patch to > take care of this issue. > > While on it, I have found a comment in the related README that was > missing a refresh. > > Any objections or comments? A minor comment on the README: we now have five support functions, not four.

- outcome.  (For some classes of node types, you don't need all four support
+ outcome.  (For some classes of node types, you don't need all five support

Thanks Richard
Re: Missing rules for queryjumblefuncs.{funcs,switch}.c for maintainer-clean
On Mon, Mar 20, 2023 at 3:49 PM Daniel Gustafsson wrote: > > On 20 Mar 2023, at 08:46, Michael Paquier wrote: > > How about removing the "fout/five" entirely here > > and make that simpler? I would propose: > > "For some classes of node types, you don't need all the support > > functions." > > Yes please, keeping such counts in sync is always error-prone. Agreed. +1 to remove the counts. Thanks Richard
Re: Fix typo plgsql to plpgsql.
On Tue, Mar 21, 2023 at 12:26 AM Zhang Mingli wrote: > Found several typos like plgsql, I think it should be plpgsql. > +1. I believe these are indeed typos, and a grep search shows that the patch addresses all typos of this kind. Thanks Richard
Re: Comment in preptlist.c
On Tue, Mar 21, 2023 at 5:41 PM Etsuro Fujita wrote: > While working on something else, I noticed $SUBJECT added by commit > 86dc90056: > > * For UPDATE and DELETE queries, the targetlist must also contain "junk" > * tlist entries needed to allow the executor to identify the rows to be > * updated or deleted; for example, the ctid of a heap row. (The planner > * adds these; they're not in what we receive from the planner/rewriter.) > > I think that “planner/rewriter” should be parser/rewriter. Attached > is a patch for that. Yes of course. It should be parser/rewriter here. Thanks Richard