Side effect of remove_useless_groupby_columns
Hi,

When looking at [1], I realized we may have a side effect when removing
redundant columns in the GROUP BY clause. Suppose we have a query with
ORDER BY 'b', and meanwhile column 'b' is also a group key. If we decide
that 'b' is redundant due to being functionally dependent on other GROUP
BY columns, we would remove it from group keys. This will make us lose
the opportunity to leverage the index on 'b'.

Here is an example for illustration.

# create table t (a int primary key, b int);
# insert into t select i, i%1000 from generate_series(1,100)i;
# create index on t(b);

By default, we remove 'b' from group keys and generate a plan as below:

# explain (costs off) select b from t group by a, b order by b limit 10;
                   QUERY PLAN
------------------------------------------------
 Limit
   ->  Sort
         Sort Key: b
         ->  Group
               Group Key: a
               ->  Index Scan using t_pkey on t
(6 rows)

The index on 'b' is not being used and we'll have to retrieve all the
data underneath to perform the sort work. On the other hand, if we keep
'b' as a group column, we can get such a plan as:

# explain (costs off) select b from t group by a, b order by b limit 10;
                    QUERY PLAN
--------------------------------------------------
 Limit
   ->  Group
         Group Key: b, a
         ->  Incremental Sort
               Sort Key: b, a
               Presorted Key: b
               ->  Index Scan using t_b_idx on t
(7 rows)

With the help of 't_b_idx', we can avoid the full scan on 't' and it
would run much faster.

Any thoughts?

[1] https://www.postgresql.org/message-id/flat/16869-26346b77d6ccaeec%40postgresql.org

Thanks
Richard
Re: Side effect of remove_useless_groupby_columns
Hi,

On Sun, Feb 28, 2021 at 5:15 PM David Rowley wrote:
> On Sun, 28 Feb 2021 at 20:52, Richard Guo wrote:
> > When looking at [1], I realized we may have a side effect when removing
> > redundant columns in the GROUP BY clause. Suppose we have a query with
> > ORDER BY 'b', and meanwhile column 'b' is also a group key. If we decide
> > that 'b' is redundant due to being functionally dependent on other GROUP
> > BY columns, we would remove it from group keys. This will make us lose
> > the opportunity to leverage the index on 'b'.
> >
> > Any thoughts?
>
> I had initially thought that this was a non-issue before incremental
> sort was added, but the following case highlights that's wrong.
>
> # create table t (a int not null, b int);
> # insert into t select i, i%1000 from generate_series(1,100)i;
> # create index on t(b,a);
>
> # explain (analyze) select b from t group by a, b order by b,a limit 10;
> # alter table t add constraint t_pkey primary key (a);
> # explain (analyze) select b from t group by a, b order by b,a limit 10;
>
> Execution slows down after adding the primary key since that allows
> the functional dependency to be detected and the "b" column removed
> from the GROUP BY clause.
>
> Perhaps we could do something like add:
>
> diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
> index 545b56bcaf..7e52212a13 100644
> --- a/src/backend/optimizer/plan/planner.c
> +++ b/src/backend/optimizer/plan/planner.c
> @@ -3182,7 +3182,8 @@ remove_useless_groupby_columns(PlannerInfo *root)
>  			if (!IsA(var, Var) ||
>  				var->varlevelsup > 0 ||
>  				!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
> -							   surplusvars[var->varno]))
> +							   surplusvars[var->varno]) ||
> +				list_member(parse->sortClause, sgc))
>  				new_groupby = lappend(new_groupby, sgc);
>  		}
>
> to remove_useless_groupby_columns().
> It's not exactly perfect though
> since it's still good to remove the useless GROUP BY columns if
> there's no useful index to implement the ORDER BY. We just don't know
> that when we call remove_useless_groupby_columns().

Or, rather than thinking of it as a side effect of removing useless
groupby columns, how about we do an additional optimization of 'adding
useful groupby columns', i.e., adding to the group keys some column
which matches the ORDER BY and meanwhile is functionally dependent on
other GROUP BY columns?

What I'm thinking of is a scenario like below.

# create table t (a int primary key, b int, c int);
# insert into t select i, i%1000, i%1000 from generate_series(1,100)i;
# create index on t(b);

# explain (analyze) select b from t group by a, c order by b limit 10;

For this query, we can first remove 'c' from the groupby columns with
the existing optimization of 'removing useless groupby columns', and
then add 'b' to the groupby columns with the newly-added optimization
of 'adding useful groupby columns'. We can do that because we know 'b'
is functionally dependent on 'a', we are required to be sorted by 'b',
and meanwhile there is an index on 'b' that we can leverage.

Thanks
Richard
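To visualize the proposal above: the 'adding useful groupby columns' step would in effect let the planner treat the query as if 'b' were a leading group key. The query below is a hand-written sketch of that intended equivalent (not output from any actual patch); since 'a' is the primary key, grouping by (b, a) produces the same groups as grouping by (a, c).

```sql
-- Hypothetical hand-rewritten equivalent of the proposed optimization:
-- 'b' is added as a leading group key, exposing it as a sort key that
-- the index on t(b) can provide, while the grouping result is unchanged
-- because 'a' alone already determines each group.
explain (costs off)
select b from t group by b, a order by b limit 10;
```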
Re: A problem about partitionwise join
On Fri, Nov 6, 2020 at 11:26 PM Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:
> Status update for a commitfest entry.
>
> According to CFbot this patch fails to apply. Richard, can you send an
> update, please?
>
> Also, I see that the thread was inactive for a while.
> Are you going to continue this work? I think it would be helpful, if you
> could write a short recap about current state of the patch and list open
> questions for reviewers.
>
> The new status of this patch is: Waiting on Author

Thanks Anastasia. I've rebased the patch with latest master.

To recap, the problem we are fixing here is that when generating join
clauses from equivalence classes, we only select the join clause with the
'best score', or the first join clause with a score of 3. This may cause
us to miss some join clauses on partition keys and thus fail to generate
a partitionwise join.

The initial idea for the fix was to create all the RestrictInfos from ECs
in order to check whether there exist equi-join conditions involving
pairs of matching partition keys of the relations being joined, for all
partition keys. Then Tom proposed a much better idea, which leverages the
function exprs_known_equal() to tell whether the partkeys can be found in
the same eclass; that is the current implementation in the latest patch.

Thanks
Richard

v4-0001-Fix-up-partitionwise-join.patch
Description: Binary data
A spot of redundant initialization of brin memtuple
Happened to notice this when reading around the code. The BrinMemTuple
would be initialized in brin_new_memtuple(), right after being created,
so we don't need to initialize it again outside.

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index ccc9fa0959..67a277e1f9 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1261,8 +1261,6 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
 	state->bs_bdesc = brin_build_desc(idxRel);
 	state->bs_dtuple = brin_new_memtuple(state->bs_bdesc);
 
-	brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
-
 	return state;
 }

Thanks
Richard
Re: A spot of redundant initialization of brin memtuple
On Sat, Nov 20, 2021 at 12:23 AM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> On Fri, Nov 19, 2021 at 1:13 PM Richard Guo wrote:
> >
> > Happened to notice this when reading around the codes. The BrinMemTuple
> > would be initialized in brin_new_memtuple(), right after being created.
> > So we don't need to initialize it again outside.
> >
> > diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
> > index ccc9fa0959..67a277e1f9 100644
> > --- a/src/backend/access/brin/brin.c
> > +++ b/src/backend/access/brin/brin.c
> > @@ -1261,8 +1261,6 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
> >  	state->bs_bdesc = brin_build_desc(idxRel);
> >  	state->bs_dtuple = brin_new_memtuple(state->bs_bdesc);
> >
> > -	brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
> > -
> >  	return state;
> > }
>
> Good catch. +1 for the change. Please submit a patch.

Thanks for the review. Attached is the patch.

Thanks
Richard

v1-0001-Remove-redundant-initialization-of-brin-memtuple.patch
Description: Binary data
Re: A spot of redundant initialization of brin memtuple
On Mon, Nov 22, 2021 at 12:52 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> On Mon, Nov 22, 2021 at 8:53 AM Richard Guo wrote:
> >
> > On Sat, Nov 20, 2021 at 12:23 AM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> >>
> >> On Fri, Nov 19, 2021 at 1:13 PM Richard Guo wrote:
> >> >
> >> > Happened to notice this when reading around the codes. The BrinMemTuple
> >> > would be initialized in brin_new_memtuple(), right after being created.
> >> > So we don't need to initialize it again outside.
> >> >
> >> > diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
> >> > index ccc9fa0959..67a277e1f9 100644
> >> > --- a/src/backend/access/brin/brin.c
> >> > +++ b/src/backend/access/brin/brin.c
> >> > @@ -1261,8 +1261,6 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
> >> >  	state->bs_bdesc = brin_build_desc(idxRel);
> >> >  	state->bs_dtuple = brin_new_memtuple(state->bs_bdesc);
> >> >
> >> > -	brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
> >> > -
> >> >  	return state;
> >> > }
> >>
> >> Good catch. +1 for the change. Please submit a patch.
> >
> > Thanks for the review. Attached is the patch.
>
> Thanks. The patch looks good to me. Let's add it to the commitfest to
> not lose track of it.

Done. Here it is: https://commitfest.postgresql.org/36/3424/

Thanks again for the review.

Thanks
Richard
Re: A problem about partitionwise join
On Wed, Oct 6, 2021 at 1:19 AM Jaime Casanova wrote:
> On Wed, Jul 21, 2021 at 04:44:53PM +0800, Richard Guo wrote:
> > On Fri, Nov 27, 2020 at 8:05 PM Ashutosh Bapat <ashutosh.bapat@gmail.com> wrote:
> > >
> > > In the example you gave earlier, the equi join on partition key was
> > > there but it was replaced by individual constant assignment clauses.
> > > So if we keep the original restrictclause in there with a new flag
> > > indicating that it's redundant, have_partkey_equi_join will still be
> > > able to use it without much change. Depending upon where all we need
> > > to use avoid restrictclauses with the redundant flag, this might be an
> > > easier approach. However, with Tom's idea partition-wise join may be
> > > used even when there is no equi-join between partition keys but there
> > > are clauses like pk = const for all tables involved and const is the
> > > same for all such tables.
> >
> > Correct. So with Tom's idea partition-wise join can cope with clauses
> > such as 'foo.k1 = bar.k1 and foo.k2 = 16 and bar.k2 = 16'.
> >
> > > In the spirit of small improvement made to the performance of
> > > have_partkey_equi_join(), pk_has_clause should be renamed as
> > > pk_known_equal and pks_known_equal as num_equal_pks.
> >
> > Thanks for the suggestion. Will do that in the new version of patch.
>
> Hi Richard,
>
> We are marking this CF entry as "Returned with Feedback", which means
> you are encouraged to send a new patch (and create a new entry for a
> future CF for it) with the suggested changes.

Hi,

The suggested changes have already been included in the v5 patch. Sorry
for the confusion.

I've verified that the patch still applies and works on latest master.
So I'm moving it to the next CF (which is Commitfest 2022-01). Please
correct me if this is not the right thing to do.

Thanks
Richard
Inconsistent results from seqscan and gist indexscan
Here is how it can be reproduced.

create table point_tbl (f1 point);
insert into point_tbl(f1) values ('(5.1, 34.5)');
insert into point_tbl(f1) values (' ( Nan , NaN ) ');
analyze;
create index gpointind on point_tbl using gist (f1);

set enable_seqscan to on;
set enable_indexscan to off;

# select * from point_tbl where f1 <@ polygon '(0,0),(0,100),(100,100),(100,0),(0,0)';
     f1
------------
 (5.1,34.5)
 (NaN,NaN)
(2 rows)

set enable_seqscan to off;
set enable_indexscan to on;

# select * from point_tbl where f1 <@ polygon '(0,0),(0,100),(100,100),(100,0),(0,0)';
     f1
------------
 (5.1,34.5)
(1 row)

Seems point_inside() does not handle NaN properly.

BTW, I'm using 15devel, but this issue can be seen in at least version 12
as well.

Thanks
Richard
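Consistent with the seqscan result above, the inconsistency can also be demonstrated without the table or the index, by evaluating the operator on constants directly (this is an illustration of the same repro, relying only on what the seqscan output already shows: the operator evaluates to true for the NaN point on affected versions):

```sql
-- Direct evaluation of the containment operator, bypassing any index.
-- On the affected versions, point_inside() treats (NaN,NaN) as inside
-- the polygon, which is why the seqscan returns that row while the
-- gist indexscan does not.
select point '(NaN,NaN)' <@ polygon '(0,0),(0,100),(100,100),(100,0),(0,0)';
```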
Re: Inconsistent results from seqscan and gist indexscan
On Fri, Nov 26, 2021 at 5:23 PM Julien Rouhaud wrote:
> Hi,
>
> On Fri, Nov 26, 2021 at 2:10 PM Richard Guo wrote:
> >
> > Seems point_inside() does not handle NaN properly.
>
> This is unfortunately a known issue, which was reported twice ([1] and
> [2]) already. There's a patch proposed for it:
> https://commitfest.postgresql.org/32/2710/ (adding Horiguchi-san in
> Cc).

Ah, I missed the previous threads. Good to know there is a patch fixing
it.

Thanks
Richard
Re: d25ea01275 and partitionwise join
Hi Amit,

On Wed, Sep 4, 2019 at 10:01 AM Amit Langote wrote:
> Fujita-san,
>
> To avoid losing track of this, I've added this to November CF.
>
> https://commitfest.postgresql.org/25/2278/
>
> I know there is one more patch beside the partitionwise join fix, but
> I've set the title to suggest that this is related mainly to
> partitionwise joins.

Thank you for working on this. Currently partitionwise join does not
take COALESCE exprs into consideration when matching to partition keys.
This is a problem.

BTW, a rebase is needed for the patch set.

Thanks
Richard
Re: d25ea01275 and partitionwise join
Hi Amit,

On Wed, Sep 4, 2019 at 3:30 PM Richard Guo wrote:
> Hi Amit,
>
> On Wed, Sep 4, 2019 at 10:01 AM Amit Langote wrote:
>> Fujita-san,
>>
>> To avoid losing track of this, I've added this to November CF.
>>
>> https://commitfest.postgresql.org/25/2278/
>>
>> I know there is one more patch beside the partitionwise join fix, but
>> I've set the title to suggest that this is related mainly to
>> partitionwise joins.
>
> Thank you for working on this. Currently partitionwise join does not
> take COALESCE expr into consideration when matching to partition keys.
> This is a problem.
>
> BTW, a rebase is needed for the patch set.

I'm reviewing v2-0002 and I have a concern about how COALESCE exprs are
processed in match_join_arg_to_partition_keys().

If there is a COALESCE expr with the first arg being a non-partition-key
expr and the second arg being a partition key, the patch would match it
to the partition key, which may result in wrong results in some cases.

For instance, consider the partitioned table below:

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);

With patch v2-0002, the following query will be planned with
partitionwise join.

# explain (costs off)
select * from (p as t1 full join p as t2 on t1.k = t2.k) as t12(k1,val1,k2,val2)
  full join p as t3 on COALESCE(t12.val1, t12.k1) = t3.k;
                       QUERY PLAN
--------------------------------------------------------
 Append
   ->  Hash Full Join
         Hash Cond: (COALESCE(t1.val, t1.k) = t3.k)
         ->  Hash Full Join
               Hash Cond: (t1.k = t2.k)
               ->  Seq Scan on p_1 t1
               ->  Hash
                     ->  Seq Scan on p_1 t2
         ->  Hash
               ->  Seq Scan on p_1 t3
   ->  Hash Full Join
         Hash Cond: (COALESCE(t1_1.val, t1_1.k) = t3_1.k)
         ->  Hash Full Join
               Hash Cond: (t1_1.k = t2_1.k)
               ->  Seq Scan on p_2 t1_1
               ->  Hash
                     ->  Seq Scan on p_2 t2_1
         ->  Hash
               ->  Seq Scan on p_2 t3_1
(19 rows)

But as t1.val is not a partition key, we actually cannot use
partitionwise join here. If we insert the data below into the table, we
will get wrong results for the query above.

insert into p select 5,15;
insert into p select 15,5;

Thanks
Richard
Pulling up direct-correlated ANY_SUBLINK
Hi,

Currently we do not try to pull up a sub-select of type ANY_SUBLINK if
it refers to any Vars of the parent query, as indicated in the code
snippet below:

JoinExpr *
convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
							Relids available_rels)
{
	...
	if (contain_vars_of_level((Node *) subselect, 1))
		return NULL;

Why do we have this check? Can we try to pull up a direct-correlated ANY
SubLink with the help of LATERAL? That is, do the pull-up in the same
way as for an uncorrelated ANY SubLink, by adding the SubLink's
subselect to the query's rangetable, but explicitly setting LATERAL for
its RangeTblEntry, like:

--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1226,13 +1226,6 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	Assert(sublink->subLinkType == ANY_SUBLINK);
 
 	/*
-	 * The sub-select must not refer to any Vars of the parent query. (Vars of
-	 * higher levels should be okay, though.)
-	 */
-	if (contain_vars_of_level((Node *) subselect, 1))
-		return NULL;
-
-	/*
 	 * The test expression must contain some Vars of the parent query, else
 	 * it's not gonna be a join. (Note that it won't have Vars referring to
 	 * the subquery, rather Params.)
@@ -1267,7 +1260,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	rte = addRangeTableEntryForSubquery(pstate,
 										subselect,
 										makeAlias("ANY_subquery", NIL),
-										false,
+										contain_vars_of_level((Node *) subselect, 1), /* lateral */
 										false);
 	parse->rtable = lappend(parse->rtable, rte);
 	rtindex = list_length(parse->rtable);

By this way, we can convert the query:

select * from a where a.i = ANY(select i from b where a.j > b.j);

to:

select * from a SEMI JOIN lateral(select * from b where a.j > b.j) sub
  on a.i = sub.i;

Does this make sense?

Thanks
Richard
Re: A problem about partitionwise join
Hi Alvaro,

Thank you for reviewing this patch.

On Wed, Sep 11, 2019 at 4:48 AM Alvaro Herrera from 2ndQuadrant <alvhe...@alvh.no-ip.org> wrote:
> So in this patch, the input restrictlist is modified to include the
> clauses generated by generate_join_implied_equalities_for_all. That
> doesn't seem okay -- doesn't it affect downstream usage of the
> restrictlist in the caller of set_joinrel_size_estimates?

Actually the joinclauses generated by
generate_join_implied_equalities_for_all only affect the restrictlist
used in have_partkey_equi_join to check equi-join conditions for
partition keys. The input restrictlist would not be altered.

> I wonder if it's possible to do this by using the ECs directly in
> have_partkey_equi_join instead of using them to create fake join
> clauses.

Hmm.. I thought about this option and concluded in the end that what we
would need to do in have_partkey_equi_join with the ECs is actually the
same as what generate_join_implied_equalities_for_all does. Maybe I
didn't think it through correctly.

Thanks
Richard
Re: Pulling up direct-correlated ANY_SUBLINK
Hi Antonin,

On Tue, Sep 10, 2019 at 4:31 PM Antonin Houska wrote:
> Richard Guo wrote:
>
> > Can we try to pull up direct-correlated ANY SubLink with the help of
> > LATERAL?
> >
> > By this way, we can convert the query:
> >
> > select * from a where a.i = ANY(select i from b where a.j > b.j);
> >
> > To:
> >
> > select * from a SEMI JOIN lateral(select * from b where a.j > b.j)
> > sub on a.i = sub.i;
>
> I tried this a few years ago. This is where the problems started:
>
> https://www.postgresql.org/message-id/1386716782.5203.YahooMailNeo%40web162905.mail.bf1.yahoo.com

Thank you for this link. Good to know the discussions from years ago.

> I'm not sure I remember enough, but the problem has something to do
> with one possible strategy to plan SEMI JOIN: unique-ify the inner path
> and then perform plain INNER JOIN instead.
>
> I think the problem was that the WHERE clause of the subquery didn't
> participate in the SEMI JOIN evaluation and was used as a filter
> instead. Thus the clause's Vars were not used in unique keys of the
> inner path and so the SEMI JOIN didn't work well.

This used to be a problem until it was fixed by commit 043f6ff0, which
includes the postponed qual from a LATERAL subquery into the quals seen
by make_outerjoininfo().

Thanks
Richard
Re: Pulling up direct-correlated ANY_SUBLINK
Hi Antonin,

On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska wrote:
>
> Nevertheless, I don't know how to overcome the problems that I
> mentioned upthread.

Do you mean the problem "the WHERE clause of the subquery didn't
participate in the SEMI JOIN evaluation"? Good news is it has been fixed
by commit 043f6ff0, as I mentioned upthread.

Thanks
Richard
Re: Pulling up direct-correlated ANY_SUBLINK
Hi Tom,

On Tue, Sep 10, 2019 at 9:48 PM Tom Lane wrote:
>
> > Can we try to pull up direct-correlated ANY SubLink with the help of
> > LATERAL?
>
> Perhaps. But what's the argument that you'd end up with a better
> plan? LATERAL pretty much constrains things to use a nestloop,
> so I'm not sure there's anything fundamentally different.

This is a point I didn't think of. In that case, if the pull-up mostly
results in a nestloop, then we cannot be sure we will get a better plan.
Thank you for pointing it out.

Thanks
Richard
Re: Pulling up direct-correlated ANY_SUBLINK
On Thu, Sep 12, 2019 at 11:35 PM Antonin Houska wrote:
> Richard Guo wrote:
>
> > On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska wrote:
> > >
> > > Nevertheless, I don't know how to overcome the problems that I
> > > mentioned upthread.
> >
> > Do you mean the problem "the WHERE clause of the subquery didn't
> > participate in the SEMI JOIN evaluation"? Good news is it has been
> > fixed by commit 043f6ff0 as I mentioned upthread.
>
> Do you say that my old patch (rebased) no longer breaks the regression
> tests?

I think so.

> (I noticed your other email in the thread which seems to indicate that
> you're no longer interested to work on the feature, but asking out of
> curiosity.)

Tom pointed out that even if we pull up the subquery with the help of
LATERAL, we cannot make sure we will end up with a better plan, since
LATERAL pretty much constrains things to use a nestloop. Hmm, I think
what he said makes sense.

Thanks
Richard
Re: A problem about partitionwise join
Hi Dilip,

Thank you for reviewing this patch.

On Fri, Sep 20, 2019 at 12:48 PM Dilip Kumar wrote:
> On Thu, Aug 29, 2019 at 3:15 PM Richard Guo wrote:
> >
> > Attached is a patch as an attempt to address this issue. The idea is
> > quite straightforward. When building partition info for joinrel, we
> > generate any possible EC-derived joinclauses of form 'outer_em =
> > inner_em', which will be used together with the original restrictlist to
> > check if there exists an equi-join condition for each pair of partition
> > keys.
> >
> > Any comments are welcome!
>
> /*
> + * generate_join_implied_equalities_for_all
> + *	  Create any EC-derived joinclauses of form 'outer_em = inner_em'.
> + *
> + * This is used when building partition info for joinrel.
> + */
> +List *
> +generate_join_implied_equalities_for_all(PlannerInfo *root,
> +										 Relids join_relids,
> +										 Relids outer_relids,
> +										 Relids inner_relids)
>
> I think we need to have more detailed comments about why we need this
> separate function. We can also explain that the
> generate_join_implied_equalities function will avoid the join clause if
> the EC has a constant, but for partition-wise join we need that clause
> too.

Thank you for the suggestion.

> + while ((i = bms_next_member(matching_ecs, i)) >= 0)
> + {
> +	EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
> +	List	   *outer_members = NIL;
> +	List	   *inner_members = NIL;
> +	ListCell   *lc1;
> +
> +	/* Do not consider this EC if it's ec_broken */
> +	if (ec->ec_broken)
> +		continue;
> +
> +	/* Single-member ECs won't generate any deductions */
> +	if (list_length(ec->ec_members) <= 1)
> +		continue;
>
> I am wondering isn't it possible to just process the missing join
> clauses? I mean 'generate_join_implied_equalities' has only skipped
> the ECs which have consts, so can't we create join clauses only for
> those ECs and append them to the "restrictlist" we already have? I
> might be missing something?

For ECs without consts, 'generate_join_implied_equalities' also skips
some join clauses, since it only selects the join clause with the
'best_score' between outer members and inner members. And the missing
join clauses are needed to generate a partitionwise join. That's why the
query below cannot be planned as a partitionwise join: we have missed
the joinclause 'foo.k = bar.k'.

select * from p as foo join p as bar on foo.k = bar.val and foo.k = bar.k;

And yes, 'generate_join_implied_equalities_for_all' will create join
clauses that already exist in the restrictlist. I think that's OK, since
the same RestrictInfo deduced from an EC will share the same pointer,
and list_concat_unique_ptr will make sure there are no duplicates in the
restrictlist used by have_partkey_equi_join.

Thanks
Richard
Re: d25ea01275 and partitionwise join
On Thu, Sep 19, 2019 at 4:15 PM Amit Langote wrote:
> Hi Richard,
>
> Thanks a lot for taking a close look at the patch and sorry about the
> delay.
>
> On Wed, Sep 4, 2019 at 5:29 PM Richard Guo wrote:
> > On Wed, Sep 4, 2019 at 10:01 AM Amit Langote wrote:
> >
> > I'm reviewing v2-0002 and I have concern about how COALESCE expr is
> > processed in match_join_arg_to_partition_keys().
> >
> > If there is a COALESCE expr with first arg being non-partition key expr
> > and second arg being partition key, the patch would match it to the
> > partition key, which may result in wrong results in some cases.
> >
> > For instance, consider the partition table below:
> >
> > create table p (k int, val int) partition by range(k);
> > create table p_1 partition of p for values from (1) to (10);
> > create table p_2 partition of p for values from (10) to (100);
> >
> > So with patch v2-0002, the following query will be planned with
> > partitionwise join.
> >
> > # explain (costs off)
> > select * from (p as t1 full join p as t2 on t1.k = t2.k) as t12(k1,val1,k2,val2)
> >   full join p as t3 on COALESCE(t12.val1, t12.k1) = t3.k;
> >                        QUERY PLAN
> > --------------------------------------------------------
> >  Append
> >    ->  Hash Full Join
> >          Hash Cond: (COALESCE(t1.val, t1.k) = t3.k)
> >          ->  Hash Full Join
> >                Hash Cond: (t1.k = t2.k)
> >                ->  Seq Scan on p_1 t1
> >                ->  Hash
> >                      ->  Seq Scan on p_1 t2
> >          ->  Hash
> >                ->  Seq Scan on p_1 t3
> >    ->  Hash Full Join
> >          Hash Cond: (COALESCE(t1_1.val, t1_1.k) = t3_1.k)
> >          ->  Hash Full Join
> >                Hash Cond: (t1_1.k = t2_1.k)
> >                ->  Seq Scan on p_2 t1_1
> >                ->  Hash
> >                      ->  Seq Scan on p_2 t2_1
> >          ->  Hash
> >                ->  Seq Scan on p_2 t3_1
> > (19 rows)
> >
> > But as t1.val is not a partition key, actually we cannot use
> > partitionwise join here.
> >
> > If we insert below data into the table, we will get wrong results for
> > the query above.
> >
> > insert into p select 5,15;
> > insert into p select 15,5;
>
> Good catch! It's quite wrong to use COALESCE(t12.val1, t12.k1) = t3.k
> for partitionwise join, as the COALESCE expression might as well output
> the value of val1, which doesn't conform to the partitioning.
>
> I've fixed match_join_arg_to_partition_keys() to catch that case and
> fail. Added a test case as well.
>
> Please find attached updated patches.

Thank you for the fix. Will review.

Thanks
Richard
Re: NOT IN subquery optimization
On Tue, Feb 26, 2019 at 6:51 AM Li, Zheng wrote:
> Resend the patch with a whitespace removed so that "git apply patch"
> works directly.

Hi Zheng,

I have reviewed your patch. Good job, except for two issues I can find:

1. The patch would give wrong results when the inner side is empty. In
this case, all the data from the outer side should be in the output, but
with the patch, we will lose the NULLs from the outer side.

2. Because of the newly-added predicate 'OR (var is NULL)', we cannot
use hash join or merge join to do the ANTI JOIN. Nested loop becomes the
only choice, which is inefficient.

Thanks
Richard
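To make issue 1 above concrete, here is a minimal sketch of the expected NOT IN semantics when the inner side is empty (the table and column names are illustrative, not taken from the patch under review):

```sql
-- With an empty inner relation, the subquery returns no rows, so
-- "a = ANY(...)" is plainly false for every outer row -- even where
-- 'a' is NULL -- and NOT IN is therefore true for every outer row.
create table foo (a int);
create table bar (b int);
insert into foo values (1), (NULL);

select * from foo where a not in (select b from bar);
-- Per SQL semantics this must return both rows of foo, (1) and (NULL);
-- the reported defect is that the transformed anti-join loses the NULL
-- row in exactly this empty-inner case.
```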
Re: Parallel grouping sets
On Wed, Jun 12, 2019 at 10:58 AM Richard Guo wrote:
> Hi all,
>
> Paul and I have been hacking recently to implement parallel grouping
> sets, and here we have two implementations.
>
> Implementation 1
>
> Attached is the patch and also there is a github branch [1] for this
> work.

Rebased with the latest master.

Thanks
Richard

v3-0001-Implementing-parallel-grouping-sets.patch
Description: Binary data
Re: Parallel grouping sets
On Tue, Jul 30, 2019 at 11:05 PM Tomas Vondra wrote:
> On Tue, Jul 30, 2019 at 03:50:32PM +0800, Richard Guo wrote:
> > On Wed, Jun 12, 2019 at 10:58 AM Richard Guo wrote:
> >
> >> Hi all,
> >>
> >> Paul and I have been hacking recently to implement parallel grouping
> >> sets, and here we have two implementations.
> >>
> >> Implementation 1
> >>
> >> Attached is the patch and also there is a github branch [1] for this
> >> work.
> >
> > Rebased with the latest master.
>
> Hi Richard,
>
> thanks for the rebased patch. I think the patch is mostly fine (at least I
> don't see any serious issues). A couple minor comments:

Hi Tomas,

Thank you for reviewing this patch.

> 1) I think get_number_of_groups() would deserve a short explanation why
> it's OK to handle (non-partial) grouping sets and regular GROUP BY in the
> same branch. Before these cases were clearly separated, now it seems a bit
> mixed up and it may not be immediately obvious why it's OK.

Added a short comment in get_number_of_groups() explaining the behavior
when doing partial aggregation for grouping sets.

> 2) There are new regression tests, but they are not added to any schedule
> (parallel or serial), and so are not executed as part of "make check". I
> suppose this is a mistake.

Yes, thanks. Added the new regression tests to parallel_schedule and
serial_schedule.

> 3) The regression tests do check plan and results like this:
>
>    EXPLAIN (COSTS OFF, VERBOSE) SELECT ...;
>    SELECT ... ORDER BY 1, 2, 3;
>
> which however means that the query might easily use a different plan than
> what's verified in the explain (thanks to the additional ORDER BY clause).
> So I think this should explain and execute the same query.
>
> (In this case the plans seem to be the same, but that may easily change
> in the future, and we could miss it here, failing to verify the results.)

Thank you for pointing this out. Fixed it in the v4 patch.

> 4) It might be a good idea to check the negative case too, i.e. a query
> on a data set that we should not parallelize (because the number of
> partial groups would be too high).

Yes, agreed. Added a negative case.

> Do you have any plans to hack on the second approach too? AFAICS those two
> approaches are complementary (address different data sets / queries), and
> it would be nice to have both. One of the things I've been wondering is if
> we need to invent gset_id as a new concept, or if we could simply use the
> existing GROUPING() function - that uniquely identifies the grouping set.

Yes, I'm planning to hack on the second approach in the near future. I'm
also reconsidering the gset_id stuff, since it brings a lot of
complexity to the second approach. I agree with you that we can try the
GROUPING() function to see if it can replace gset_id.

Thanks
Richard

v4-0001-Implementing-parallel-grouping-sets.patch
Description: Binary data
Re: How to retain lesser paths at add_path()?
On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai wrote:
> 2019年8月1日(木) 1:41 Tom Lane:
> >
> > Robert Haas writes:
> > > Yeah, but I have to admit that this whole design makes me kinda
> > > uncomfortable. Every time somebody comes up with a new figure of
> > > merit, it increases not only the number of paths retained but also the
> > > cost of comparing two paths to possibly reject one of them. A few
> > > years ago, you came up with the (good) idea of rejecting some join
> > > paths before actually creating the paths, and I wonder if we ought to
> > > try to go further with that somehow. Or maybe, as Peter Geoghegan has
> > > been saying, we ought to think about planning top-down with
> > > memoization instead of bottom up (yeah, I know that's a huge change).
> > > It just feels like the whole idea of a list of paths ordered by cost
> > > breaks down when there are so many ways that a not-cheapest path can
> > > still be worth keeping. Not sure exactly what would be better, though.
> >
> > Yeah, I agree that add_path is starting to feel creaky. I don't
> > know what to do instead though. Changing to a top-down design
> > sounds like it would solve some problems while introducing others
> > (not to mention the amount of work and breakage involved).
>
> Hmm... It looks like the problem we ought to revise about path
> construction is much larger than my expectation, and it is uncertain
> for me how much work is needed.
>
> Although it might be a workaround until a fundamental reconstruction,
> how about having a margin of estimated cost to reject paths?
> Current add_path() immediately rejects lesser paths if its cost is
> even a little more expensive than the compared one. On the other hand,

Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
the costs of two paths, although the fuzz factor (1%) is hard coded and
not user-controllable.

> I understand it is not an essential re-design of path-construction
> logic, and may have limitations. However, the amount of work is
> reasonable and there are no side-effects (current behavior = 0%
> threshold).
> How about your opinions?

How about Tom's suggestion of adding another dimension to be considered
in add_path(), just like how it considers paths of better sort order or
parallel safety?

Thanks
Richard
Re: Partial join
On Thu, Aug 1, 2019 at 5:38 PM Arne Roland wrote:
> Hello,
>
> I attached one example of a partitioned table with a multi-column
> partition key. I also attached the output.
> Disabling the hash join is not really necessary, it just shows the more
> drastic result in the case of low work_mem.
>
> Comparing the first and the second query I was surprised to see that SET
> enable_partitionwise_join could cause the costs to go up. Shouldn't the
> paths of the first query be generated as well?
>
> The third query seems to have a different issue. That one is close to my
> original performance problem. It looks to me like the push down of the sl
> condition stops the optimizer considering a partial join.
> If so, would it be sane to keep a copy of the original quals to make the
> partial join possible? Do you have better ideas?

For the third query, a rough investigation shows that the quals 'sl = 5' and 'sc.sl = sg.sl' form an equivalence class and generate two implied equalities, 'sc.sl = 5' and 'sg.sl = 5', which can be pushed down to the base rels. One consequence of this deduction is that when constructing restrict lists for the joinrel, we lose the original restriction 'sc.sl = sg.sl', and this fails the have_partkey_equi_join() check, which verifies that there exists an equi-join condition for each pair of partition keys. As a result, this joinrel is not considered as an input to further partitionwise joins.

We need to fix this.

Thanks
Richard
Re: Partial join
On Thu, Aug 1, 2019 at 10:15 PM Tom Lane wrote:
> Richard Guo writes:
> > For the third query, a rough investigation shows that, the qual 'sl =
> > 5' and 'sc.sl = sg.sl' will form an equivalence class and generate two
> > implied equalities: 'sc.sl = 5' and 'sg.sl = 5', which can be pushed
> > down to the base rels. One consequence of the deduction is when
> > constructing restrict lists for the joinrel, we lose the original
> > restrict 'sc.sl = sg.sl', and this would fail the check
> > have_partkey_equi_join(), which checks if there exists an equi-join
> > condition for each pair of partition keys. As a result, this joinrel
> > would not be considered as an input to further partitionwise joins.
>
> > We need to fix this.
>
> Uh ... why? The pushed-down restrictions should result in pruning
> away any prunable partitions at the scan level, leaving nothing for
> the partitionwise join code to do.

Hmm.. In the case of multiple partition keys, for range partitioning, if we have no clauses for a given key, any later keys are not considered for partition pruning. That is to say, for table 'p partition by range (k1, k2)', quals like 'k2 = Const' would not prune partitions.

For this query:

select * from p as t1
  join p as t2 on t1.k1 = t2.k1 and t1.k2 = t2.k2 and t1.k2 = 2;

Since we don't consider ECs containing consts when generating join clauses, we don't have the restriction 't1.k2 = t2.k2' when building the joinrel. As a result, partitionwise join is not considered, as it requires an equi-join condition for each pair of partition keys.

Is this a problem? What's your opinion?

Thanks
Richard
Re: Partial join
On Thu, Aug 1, 2019 at 7:46 PM Arne Roland wrote:
> Hello Richard,
>
> thanks for your quick reply.
>
> > We need to fix this.
>
> Do you have a better idea than just keeping the old quals - possibly just
> the ones that get eliminated - in a separate data structure? Is the push
> down of quals the only case of elimination of quals, only counting the ones
> which happen before the restrict lists are generated?

In your case, the restriction 'sl = sl' is just not generated for the join, because it forms an EC with a const, and such ECs are not considered when generating join clauses. Please refer to the code snippet below:

@@ -1164,8 +1164,8 @@ generate_join_implied_equalities(PlannerInfo *root,
        List       *sublist = NIL;

        /* ECs containing consts do not need any further enforcement */
        if (ec->ec_has_const)
            continue;

Thanks
Richard
Re: Serialization questions
On Wed, Aug 21, 2019 at 9:30 AM Alex wrote:
>
> first issue "set default_transaction_isolation to 'serializable';" on both
> sessions, then run:
>
> Session 1: begin; select * from t; (2 rows selected)
> Session 2: delete from t; (committed automatically)
> Session 1: commit; (commits successfully)
>
> it looks like the reads in session 1 have no impact on session 2 at all,
> which conflicts with the documentation

This behavior makes sense to me. The effect is the same as if we had executed the two sessions in a serial order: first session 1 and then session 2.

Thanks
Richard
A problem about partitionwise join
Hi All,

To generate partitionwise join, we need to make sure there exists an equi-join condition for each pair of partition keys, which is performed by have_partkey_equi_join(). This makes sense and works well.

But if, let's say, one certain pair of partition keys (foo.k = bar.k) has formed an equivalence class containing consts, no join clause would be generated for it, since we have already generated 'foo.k = const' and 'bar.k = const' and pushed them into the proper restrictions earlier.

This will make partitionwise join fail to be planned if there are multiple partition keys and the pushed-down restrictions 'xxx = const' fail to prune away any partitions.

Consider the examples below:

create table p (k1 int, k2 int, val int) partition by range(k1,k2);
create table p_1 partition of p for values from (1,1) to (10,100);
create table p_2 partition of p for values from (10,100) to (20,200);

If we are joining on each pair of partition keys, we can generate partitionwise join:

# explain (costs off)
select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2;
                           QUERY PLAN
----------------------------------------------------------------
 Append
   ->  Hash Join
         Hash Cond: ((foo.k1 = bar.k1) AND (foo.k2 = bar.k2))
         ->  Seq Scan on p_1 foo
         ->  Hash
               ->  Seq Scan on p_1 bar
   ->  Hash Join
         Hash Cond: ((foo_1.k1 = bar_1.k1) AND (foo_1.k2 = bar_1.k2))
         ->  Seq Scan on p_2 foo_1
         ->  Hash
               ->  Seq Scan on p_2 bar_1
(11 rows)

But if we add another qual 'foo.k2 = const', we will be unable to generate partitionwise join any more, because have_partkey_equi_join() thinks not every partition key has an equi-join condition.

# explain (costs off)
select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2 and foo.k2 = 16;
              QUERY PLAN
---------------------------------------
 Hash Join
   Hash Cond: (foo.k1 = bar.k1)
   ->  Append
         ->  Seq Scan on p_1 foo
               Filter: (k2 = 16)
         ->  Seq Scan on p_2 foo_1
               Filter: (k2 = 16)
   ->  Hash
         ->  Append
               ->  Seq Scan on p_1 bar
                     Filter: (k2 = 16)
               ->  Seq Scan on p_2 bar_1
                     Filter: (k2 = 16)
(13 rows)

Is this a problem?

Thanks
Richard
Re: A problem about partitionwise join
On Tue, Aug 27, 2019 at 8:51 AM Amit Langote wrote: > Hi Richard, > > On Mon, Aug 26, 2019 at 6:33 PM Richard Guo wrote: > > > > Hi All, > > > > To generate partitionwise join, we need to make sure there exists an > > equi-join condition for each pair of partition keys, which is performed > > by have_partkey_equi_join(). This makes sense and works well. > > > > But if, let's say, one certain pair of partition keys (foo.k = bar.k) > > has formed an equivalence class containing consts, no join clause would > > be generated for it, since we have already generated 'foo.k = const' and > > 'bar.k = const' and pushed them into the proper restrictions earlier. > > > > This will make partitionwise join fail to be planned if there are > > multiple partition keys and the pushed-down restrictions 'xxx = const' > > fail to prune away any partitions. > > > > Consider the examples below: > > > > create table p (k1 int, k2 int, val int) partition by range(k1,k2); > > create table p_1 partition of p for values from (1,1) to (10,100); > > create table p_2 partition of p for values from (10,100) to (20,200); > > > > If we are joining on each pair of partition keys, we can generate > > partitionwise join: > > > > # explain (costs off) > > select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = > bar.k2; > > QUERY PLAN > > -- > > Append > >-> Hash Join > > Hash Cond: ((foo.k1 = bar.k1) AND (foo.k2 = bar.k2)) > > -> Seq Scan on p_1 foo > > -> Hash > >-> Seq Scan on p_1 bar > >-> Hash Join > > Hash Cond: ((foo_1.k1 = bar_1.k1) AND (foo_1.k2 = bar_1.k2)) > > -> Seq Scan on p_2 foo_1 > > -> Hash > >-> Seq Scan on p_2 bar_1 > > (11 rows) > > > > But if we add another qual 'foo.k2 = const', we will be unable to > > generate partitionwise join any more, because have_partkey_equi_join() > > thinks not every partition key has an equi-join condition. 
> >
> > # explain (costs off)
> > select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 =
> > bar.k2 and foo.k2 = 16;
> >               QUERY PLAN
> > ---------------------------------------
> >  Hash Join
> >    Hash Cond: (foo.k1 = bar.k1)
> >    ->  Append
> >          ->  Seq Scan on p_1 foo
> >                Filter: (k2 = 16)
> >          ->  Seq Scan on p_2 foo_1
> >                Filter: (k2 = 16)
> >    ->  Hash
> >          ->  Append
> >                ->  Seq Scan on p_1 bar
> >                      Filter: (k2 = 16)
> >                ->  Seq Scan on p_2 bar_1
> >                      Filter: (k2 = 16)
> > (13 rows)
> >
> > Is this a problem?
>
> Perhaps. Maybe it has to do with the way have_partkey_equi_join() has
> been coded. If it was coded such that it figured out on its own that
> the equivalence (foo.k2, bar.k2, ...) does exist, then that would
> allow partitionwise join to occur, which I think would be OK to do.
> But maybe I'm missing something.

This should be caused by how we deduce join clauses from equivalence classes. ECs containing consts will not be considered, so we cannot generate (foo.k2 = bar.k2) for the query above.

In addition, when generating join clauses from equivalence classes, we only select the joinclause with the 'best score', or the first joinclause with a score of 3. This may make us miss some joinclause on partition keys. Check the query below as a more illustrative example:

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);

If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate partitionwise join:

# explain (costs off)
select * from p as foo join p as bar on foo.k = bar.k and foo.k = bar.val;
              QUERY PLAN
---------------------------------------
 Append
   ->  Hash Join
         Hash Cond: (foo.k = bar.k)
         ->  Seq Scan on p_1 foo
         ->  Hash
               ->  Seq Scan on p_1 bar
                     Filter: (k = val)
   ->  Hash Join
         Hash Cond: (foo_1.k = bar_1.k)
         ->  Seq Scan on p_2 foo_1
         ->  Hash
               ->  Seq Scan on p_2 bar_1
                     Filter: (k = val)
(13 rows)

But if we exchange the order of the two quals
Re: A problem about partitionwise join
On Wed, Aug 28, 2019 at 6:49 PM Etsuro Fujita wrote: > Hi, > > On Tue, Aug 27, 2019 at 4:57 PM Richard Guo wrote: > > Check the query below as a more illustrative example: > > > > create table p (k int, val int) partition by range(k); > > create table p_1 partition of p for values from (1) to (10); > > create table p_2 partition of p for values from (10) to (100); > > > > If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate > > partitionwise join: > > > > # explain (costs off) > > select * from p as foo join p as bar on foo.k = bar.k and foo.k = > bar.val; > >QUERY PLAN > > - > > Append > >-> Hash Join > > Hash Cond: (foo.k = bar.k) > > -> Seq Scan on p_1 foo > > -> Hash > >-> Seq Scan on p_1 bar > > Filter: (k = val) > >-> Hash Join > > Hash Cond: (foo_1.k = bar_1.k) > > -> Seq Scan on p_2 foo_1 > > -> Hash > >-> Seq Scan on p_2 bar_1 > > Filter: (k = val) > > (13 rows) > > > > But if we exchange the order of the two quals to 'foo.k = bar.val and > > foo.k = bar.k', then partitionwise join cannot be generated any more, > > because we only have joinclause 'foo.k = bar.val' as it first reached > > score of 3. We have missed the joinclause on the partition key although > > it does exist. > > > > # explain (costs off) > > select * from p as foo join p as bar on foo.k = bar.val and foo.k = > bar.k; > >QUERY PLAN > > - > > Hash Join > >Hash Cond: (foo.k = bar.val) > >-> Append > > -> Seq Scan on p_1 foo > > -> Seq Scan on p_2 foo_1 > >-> Hash > > -> Append > >-> Seq Scan on p_1 bar > > Filter: (val = k) > >-> Seq Scan on p_2 bar_1 > > Filter: (val = k) > > (11 rows) > > I think it would be nice if we can address this issue. > Thank you. Attached is a patch as an attempt to address this issue. The idea is quite straightforward. 
When building partition info for joinrel, we generate any possible EC-derived joinclauses of form 'outer_em = inner_em', which will be used together with the original restrictlist to check if there exists an equi-join condition for each pair of partition keys. Any comments are welcome! Thanks Richard v1-0001-Fix-up-partitionwise-join.patch Description: Binary data
Re: A problem about partitionwise join
On Fri, Aug 30, 2019 at 2:08 AM Etsuro Fujita wrote: > On Thu, Aug 29, 2019 at 6:45 PM Richard Guo wrote: > > On Wed, Aug 28, 2019 at 6:49 PM Etsuro Fujita > wrote: > >> On Tue, Aug 27, 2019 at 4:57 PM Richard Guo wrote: > >> > Check the query below as a more illustrative example: > >> > > >> > create table p (k int, val int) partition by range(k); > >> > create table p_1 partition of p for values from (1) to (10); > >> > create table p_2 partition of p for values from (10) to (100); > >> > > >> > If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate > >> > partitionwise join: > >> > > >> > # explain (costs off) > >> > select * from p as foo join p as bar on foo.k = bar.k and foo.k = > bar.val; > >> >QUERY PLAN > >> > - > >> > Append > >> >-> Hash Join > >> > Hash Cond: (foo.k = bar.k) > >> > -> Seq Scan on p_1 foo > >> > -> Hash > >> >-> Seq Scan on p_1 bar > >> > Filter: (k = val) > >> >-> Hash Join > >> > Hash Cond: (foo_1.k = bar_1.k) > >> > -> Seq Scan on p_2 foo_1 > >> > -> Hash > >> >-> Seq Scan on p_2 bar_1 > >> > Filter: (k = val) > >> > (13 rows) > >> > > >> > But if we exchange the order of the two quals to 'foo.k = bar.val and > >> > foo.k = bar.k', then partitionwise join cannot be generated any more, > >> > because we only have joinclause 'foo.k = bar.val' as it first reached > >> > score of 3. We have missed the joinclause on the partition key > although > >> > it does exist. > >> > > >> > # explain (costs off) > >> > select * from p as foo join p as bar on foo.k = bar.val and foo.k = > bar.k; > >> >QUERY PLAN > >> > - > >> > Hash Join > >> >Hash Cond: (foo.k = bar.val) > >> >-> Append > >> > -> Seq Scan on p_1 foo > >> > -> Seq Scan on p_2 foo_1 > >> >-> Hash > >> > -> Append > >> >-> Seq Scan on p_1 bar > >> > Filter: (val = k) > >> >-> Seq Scan on p_2 bar_1 > >> > Filter: (val = k) > >> > (11 rows) > >> > >> I think it would be nice if we can address this issue. > > > Attached is a patch as an attempt to address this issue. 
The idea is > > quite straightforward. When building partition info for joinrel, we > > generate any possible EC-derived joinclauses of form 'outer_em = > > inner_em', which will be used together with the original restrictlist to > > check if there exists an equi-join condition for each pair of partition > > keys. > > Thank you for the patch! Will review. Could you add the patch to the > upcoming CF so that it doesn’t get lost? > Added this patch: https://commitfest.postgresql.org/24/2266/ Thanks Richard
Re: A problem about partitionwise join
On Fri, Nov 29, 2019 at 11:03 AM Michael Paquier wrote: > On Tue, Nov 26, 2019 at 08:35:33PM +0900, Etsuro Fujita wrote: > > I've just started reviewing this patch. One comment I have for now > > is: this is categorized into Bug Fixes, but we have a workaround at > > least to the regression test case in the patch (ie, just reorder join > > clauses), so this seems to me more like an improvement than a bug fix. > > Hmm. Agreed. Changed the category and moved to next CF. > Thanks Etsuro for the comment and thanks Michael for the change. Thanks Richard
Re: Parallel grouping sets
On Sun, Dec 1, 2019 at 10:03 AM Michael Paquier wrote: > On Thu, Nov 28, 2019 at 07:07:22PM +0800, Pengzhou Tang wrote: > > Richard pointed out that he get incorrect results with the patch I > > attached, there are bugs somewhere, > > I fixed them now and attached the newest version, please refer to [1] for > > the fix. > > Mr Robot is reporting that the latest patch fails to build at least on > Windows. Could you please send a rebase? I have moved for now the > patch to next CF, waiting on author. Thanks for reporting this issue. Here is the rebase. Thanks Richard v3-0001-Support-for-parallel-grouping-sets.patch Description: Binary data
Re: A problem about partitionwise join
Rebased the patch with latest master and also addressed the test case failure reported by PostgreSQL Patch Tester. Thanks Richard On Fri, Nov 29, 2019 at 11:35 AM Etsuro Fujita wrote: > On Fri, Nov 29, 2019 at 12:08 PM Richard Guo wrote: > > On Fri, Nov 29, 2019 at 11:03 AM Michael Paquier > wrote: > >> On Tue, Nov 26, 2019 at 08:35:33PM +0900, Etsuro Fujita wrote: > >> > I've just started reviewing this patch. One comment I have for now > >> > is: this is categorized into Bug Fixes, but we have a workaround at > >> > least to the regression test case in the patch (ie, just reorder join > >> > clauses), so this seems to me more like an improvement than a bug fix. > >> > >> Hmm. Agreed. Changed the category and moved to next CF. > > > thanks Michael for the change. > > +1 > > Best regards, > Etsuro Fujita > v2-0001-Fix-up-partitionwise-join.patch Description: Binary data
Re: Parallel grouping sets
I realized that there are two patches in this thread that are implemented according to different methods, which causes confusion. So I decided to update this thread with only one patch, i.e. the patch for 'Implementation 1' as described in the first email, and to move the other patch to a separate thread.

With this idea, here is the patch for 'Implementation 1', rebased with the latest master.

Thanks
Richard

On Wed, Jan 8, 2020 at 3:24 PM Richard Guo wrote:
>
> On Sun, Dec 1, 2019 at 10:03 AM Michael Paquier wrote:
>
>> On Thu, Nov 28, 2019 at 07:07:22PM +0800, Pengzhou Tang wrote:
>> > Richard pointed out that he got incorrect results with the patch I
>> > attached, there are bugs somewhere,
>> > I fixed them now and attached the newest version, please refer to [1]
>> for the fix.
>>
>> Mr Robot is reporting that the latest patch fails to build at least on
>> Windows. Could you please send a rebase? I have moved for now the
>> patch to next CF, waiting on author.
>
> Thanks for reporting this issue. Here is the rebase.
>
> Thanks
> Richard

v5-0001-Implementing-parallel-grouping-sets.patch Description: Binary data
Restore CurrentUserId only if 'prevUser' is valid when abort transaction
Hi, This is a follow-up to the issue described in thread https://www.postgresql.org/message-id/CAMbWs4-Mys%3DhBQSevTA8Zpd-TYFnb%3DXuHhN2TnktXMsfMUbjiQ%40mail.gmail.com In short, during the first transaction starting phase within a backend, if there is an 'ereport' after setting transaction state but before saving CurrentUserId into 'prevUser' in TransactionStateData, 'prevUser' will remain as InvalidOid. Then in AbortTransaction(), CurrentUserId is restored with 'prevUser'. As a result, CurrentUserId will be InvalidOid in the rest of the session. Attached is a patch that fixes this issue. Thanks Richard 0001-Restore-CurrentUserId-only-if-prevUser-is-valid.patch Description: Binary data
Re: Restore CurrentUserId only if 'prevUser' is valid when abort transaction
Hi Michael, Thanks for your input. On Thu, Oct 11, 2018 at 11:38 AM, Michael Paquier wrote: > On Wed, Oct 10, 2018 at 03:37:50PM +0800, Richard Guo wrote: > > This is a follow-up to the issue described in thread > > https://www.postgresql.org/message-id/CAMbWs4-Mys%3DhBQSevTA8Zpd-TYFnb% > 3DXuHhN2TnktXMsfMUbjiQ%40mail.gmail.com > > > > In short, during the first transaction starting phase within a backend, > if > > there is an 'ereport' after setting transaction state but before saving > > CurrentUserId into 'prevUser' in TransactionStateData, 'prevUser' will > > remain as InvalidOid. Then in AbortTransaction(), CurrentUserId is > > restored with 'prevUser'. As a result, CurrentUserId will be > > InvalidOid in the rest of the session. > > > > Attached is a patch that fixes this issue. > > I guess that's an issue showing up with Greenplum as you folks are > likely tweaking how a transaction start happens? > > It is as easy as doing something like that in StartTransaction() to get > into a failed state: > s->state = TRANS_START; > s->transactionId = InvalidTransactionId;/* until assigned */ > > + { > + struct stat statbuf; > + if (stat("/tmp/hoge", &statbuf) == 0) > + elog(ERROR, "hoge invalid state!"); > + } > > Then do something like the following: > 1) Start a session > 2) touch /tmp/hoge > 3) Issue BEGIN, which fails and initializes CurrentUserId to InvalidOid. > 4) rm /tmp/hoge > 3) any DDL causes the system to crash. > Yes, you're right. This issue shows up when we were adding inside StartTransaction() some resource-group related logic which would error out in some cases. Your example reproduces the same scene. > > Anyway, looking at the patch, I am poked by the comment on top of > GetUserIdAndSecContext which states that InvalidOid can be a possible > value. 
> It seems to me that the root of the problem is that TRANS_STATE
> is enforced to TRANS_INPROGRESS when aborting a transaction in a
> starting state, in which case we should not have to reset CurrentUserId
> as it has never been set. The main reason why this was done is to
> prevent a warning message to show up.

From the comment, Get/SetUserIdAndSecContext() has considered the case of InvalidOid, but fails to handle it properly in AbortTransaction(). I think it is a better idea to avoid adjusting the state to TRANS_INPROGRESS from TRANS_START when aborting a transaction, as your patch does, since its only purpose is to suppress the warning message.

> Tom, eedb068c0 is the cause here, and that's your commit. Could you
> check if something like the attached is adapted? I am pretty sure that
> we still want the sub-transaction part to still reset CurrentUserId
> unconditionally by the way.
>
> Thanks,
> --
> Michael
Pull up sublink of type 'NOT NOT (expr)'
Hi hackers,

Currently for quals in the form of "NOT NOT (SubLink)", this SubLink would not be considered when pulling up sublinks. For instance:

gpadmin=# explain select * from a where NOT NOT (a.i in (select b.i from b));
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on a  (cost=51.50..85.62 rows=1005 width=8)
   Filter: (hashed SubPlan 1)
   SubPlan 1
     ->  Seq Scan on b  (cost=0.00..44.00 rows=3000 width=4)
(4 rows)

Should we give it a chance, like the attached does?

Thanks
Richard

0001-Pull-up-sublink-of-type-NOT-NOT-expr.patch Description: Binary data
Re: Pull up sublink of type 'NOT NOT (expr)'
Hi Alex,

Yes, hashed SubPlan preserves order and may be faster than hash join in some cases. But I don't think that is a good enough reason to prevent the subplan from being converted to a join.

Let's suppose the subplan is uncorrelated, otherwise hashed SubPlan would not be used. Hashed SubPlan can only be applied when the subquery result fits in work_mem. When the data size increases, or work_mem is set to a smaller value so that more than one batch is needed, hashed SubPlan will not be used and the total cost will increase dramatically. Hash join, by contrast, handles multiple batches better.

Below is an example to show the behavior of hash join and hashed SubPlan for a single batch and for multiple batches.

create table a (i int);
create table b (i int);
insert into a select i from generate_series(1,1)i;
insert into b select i from generate_series(1,1)i;

*For single batch:*

*Hash Join*

gpadmin=# explain analyze select * from a where a.i in (select b.i from b);
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=270.00..552.50 rows=1 width=4) (actual time=4.332..9.499 rows=1 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..145.00 rows=1 width=4) (actual time=0.016..1.328 rows=1 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=1 width=4) (actual time=4.292..4.292 rows=1 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 480kB
         ->  Seq Scan on b  (cost=0.00..145.00 rows=1 width=4) (actual time=0.013..1.817 rows=1 loops=1)
 Planning Time: 0.236 ms
 Execution Time: 10.099 ms
(8 rows)

*Hashed SubPlan*

gpadmin=# explain analyze select * from a where not not a.i in (select b.i from b);
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=170.00..340.00 rows=5000 width=4) (actual time=6.359..11.843 rows=1 loops=1)
   Filter: (hashed SubPlan 1)
   SubPlan 1
     ->  Seq Scan on b  (cost=0.00..145.00 rows=1 width=4) (actual time=0.017..1.749 rows=1 loops=1)
 Planning Time: 0.109 ms
 Execution Time: 12.905 ms
(6 rows)

insert into b select i from generate_series(1,10)i;
insert into b select i from generate_series(1,10)i;
*For multiple batches:*

*Hash Join*

gpadmin=# explain analyze select * from a where a.i in (select b.i from b);
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=6476.00..7659.50 rows=1 width=4) (actual time=89.915..121.016 rows=1 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..145.00 rows=1 width=4) (actual time=0.020..1.779 rows=1 loops=1)
   ->  Hash  (cost=3030.00..3030.00 rows=21 width=4) (actual time=89.666..89.667 rows=21 loops=1)
         Buckets: 131072  Batches: 4  Memory Usage: 2860kB
         ->  Seq Scan on b  (cost=0.00..3030.00 rows=21 width=4) (actual time=0.015..37.686 rows=21 loops=1)
 Planning Time: 0.256 ms
 Execution Time: 121.769 ms
(8 rows)

*Plain SubPlan*

gpadmin=# explain analyze select * from a where not not a.i in (select b.i from b);
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..27130170.00 rows=5000 width=4) (actual time=0.030..9994.036 rows=1 loops=1)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Materialize  (cost=0.00..4901.00 rows=21 width=4) (actual time=0.000..0.359 rows=5000 loops=1)
           ->  Seq Scan on b  (cost=0.00..3030.00 rows=21 width=4) (actual time=0.010..1.673 rows=1 loops=1)
 Planning Time: 0.077 ms
 Execution Time: 9995.556 ms
(7 rows)

Thanks
Richard

On Wed, Oct 24, 2018 at 12:39 AM, Alexey Bashtanov wrote:
> Hello Richard,
>
> > Currently for quals in the form of "NOT NOT (SubLink)", this SubLink
> > would not be considered when pulling up sublinks. For instance:
> >
> > gpadmin=# explain select * from a where NOT NOT (a.i in (select b.i from b));
> >                          QUERY PLAN
> > -------------------------------------------------------------
> >  Seq Scan on a  (cost=51.50..85.62 rows=1005 width=8)
> >    Filter: (hashed SubPlan 1)
> >    SubPlan 1
> >      ->  Seq Scan on b  (cost=0.00..44.00 rows=3000 width=4)
> > (4 rows)
> >
> > Should we give it a chance, like the attached does?
>
> Sometimes hashed subplan is faster than hash join and than all the other
> options, as it preserves the order.
> Using NOT NOT, one can control whether to use it or not:
> https://pgblog.bashtanov.com/2017/12/08/double-negative-and-query-performance/
> (test case and results in the bottom of
Re: Pull up sublink of type 'NOT NOT (expr)'
Hi Tom,

Thanks for reviewing.

On Tue, Nov 13, 2018 at 10:05 AM, Tom Lane wrote:
> Richard Guo writes:
> > Currently for quals in the form of "NOT NOT (SubLink)", this SubLink
> > would not be considered when pulling up sublinks.
>
> Yup.
>
> > Should we give it a chance, like the attached does?
>
> What is the argument that this occurs often enough to be worth expending
> extra cycles and code space on?
>
> If we do do something like this, I'd be inclined to make it handle
> any-number-of-consecutive-NOTs, and maybe remove NOT NOT over an ANY,
> not just EXISTS. But I don't honestly think that it's worth troubling
> over. Do even the dumbest ORMs generate such code?

What this patch does is recursively remove NOT NOT over a SubLink, so it actually can handle any number of consecutive NOTs, both over ANY and over EXISTS.

Over ANY:

gpadmin=# explain select * from a where not not not not a.i in (select i from b);
                             QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=42.75..93.85 rows=1130 width=8)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=40.25..40.25 rows=200 width=4)
         ->  HashAggregate  (cost=38.25..40.25 rows=200 width=4)
               Group Key: b.i
               ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=4)
(7 rows)

Over EXISTS:

gpadmin=# explain select * from a where not not not not exists (select 1 from b where a.i = b.i);
                             QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=42.75..93.85 rows=1130 width=8)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=40.25..40.25 rows=200 width=4)
         ->  HashAggregate  (cost=38.25..40.25 rows=200 width=4)
               Group Key: b.i
               ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=4)
(7 rows)

I am not using an ORM, but was just considering that maybe it would be better if PostgreSQL could do such a pull-up.

Tom, what's your suggestion? Is it worthwhile expending several lines of code to do this pull-up?

Thanks
Richard
Re: Pull up sublink of type 'NOT NOT (expr)'
Hi Tom, Thanks for your input. On Fri, Nov 16, 2018 at 6:06 AM, Tom Lane wrote: > Richard Guo writes: > > On Tue, Nov 13, 2018 at 10:05 AM, Tom Lane wrote: > >> What is the argument that this occurs often enough to be worth expending > >> extra cycles and code space on? > > > I am not using an ORM, but just considering maybe it would be better if > > PostgreSQL can do such pull-up. > > Tom, what's your suggestion? Is it worthwhile expending several lines of > > codes to do this pull-up? > > Well, really, if we're just doing this on speculation and don't even have > any concrete indication that anybody writes code like that, I can't get > excited about expending even a few lines of code on it. > > The reason why we perform optimizations similar to this in places like > eval_const_expressions is (IMO anyway) that transformations such as > function inlining and subquery pullup can create parse trees that look > like this, even when the original query was perfectly sane and without > obvious redundancy. However, because pull_up_sublinks runs before any > of that, it would only ever see NOT NOT if someone had actually written > such a thing. So to justify expending any code space or cycles on > optimizing this, you have to make the case that that actually happens > in the wild, and does so often enough to justify wasting some (admittedly > small) number of cycles for everybody else. I'd kind of like to see some > actual field occurrence of NOT NOT over an optimizable IN/EXISTS before > deciding that it's worth doing. > > It's sort of annoying that we have to run pull_up_sublinks before we do > scalar expression simplification. If we could do that in the other order, > NOT NOT elimination would fall out automatically, and we'd also be able > to recognize some other cases where it initially seems that an IN or > EXISTS is not at top level, but it becomes so after we const-fold, apply > DeMorgan's laws, etc. 
> However, according to the point I made above,
> it makes more sense to apply expression simplification after we've
> completed pullup-like operations. So we can't really get there from
> here unless we wanted to do (parts of?) expression simplification twice,
> which is unlikely to win often enough to justify the cost.

Agreed! If we did expression simplification in the other order, we would have to do it twice, which is not a good idea.

> So I'm inclined to reject this patch as probably being a net loss in
> almost all cases. If you can show any real-world cases where it wins,
> we could reconsider.

I am convinced. I do not see this happening often enough in the real world either. Feel free to reject this patch.

Thanks
Richard
Re: An inefficient query caused by unnecessary PlaceHolderVar
Updated this patch over 29f114b6ff, which indicates that we should apply the same rules for PHVs. Thanks Richard v3-0001-Avoid-unnecessary-PlaceHolderVars-for-Vars-PHVs.patch Description: Binary data
Re: POC: GROUP BY optimization
On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov wrote: > Thank you for providing the test case relevant for this code change. > The revised patch incorporating this change is attached. Now the > patchset looks good to me. I'm going to push it if there are no > objections. Seems I'm late for the party. Can we hold for several more days? I'd like to have a review on this patch. Thanks Richard
Re: POC: GROUP BY optimization
On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov wrote: > On Mon, Jan 15, 2024 at 8:42 AM Richard Guo > wrote: > > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov > wrote: > >> > >> Thank you for providing the test case relevant for this code change. > >> The revised patch incorporating this change is attached. Now the > >> patchset looks good to me. I'm going to push it if there are no > >> objections. > > > > Seems I'm late for the party. Can we hold for several more days? I'd > > like to have a review on this patch. > > Sure, NP. I'll hold it till your review. Thanks. Appreciate that. I have briefly reviewed this patch and here are some comments. * When trying to match the ordering of GROUP BY to that of ORDER BY in get_useful_group_keys_orderings, this patch checks against the length of path->pathkeys. This does not make sense. I guess this is a typo and the real intention is to check against root->sort_pathkeys. --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path) root->num_groupby_pathkeys); if (n > 0 && - (enable_incremental_sort || n == list_length(path->pathkeys))) + (enable_incremental_sort || n == list_length(root->sort_pathkeys))) However, I think this is also incorrect. When incremental sort is disabled, it is reasonable to consider reordering the GROUP BY keys only if the number of matching pathkeys is equal to the length of root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'. * When the original ordering of GROUP BY keys matches the path's pathkeys or ORDER BY keys, get_useful_group_keys_orderings would generate duplicate PathKeyInfos. Consequently, this duplication would lead to the creation and addition of identical paths multiple times. This is not great. 
Consider the query below: create table t (a int, b int); create index on t (a, b); set enable_hashagg to off; explain select count(*) from t group by a, b; QUERY PLAN -- GroupAggregate (cost=0.15..97.27 rows=226 width=16) Group Key: a, b -> Index Only Scan using t_a_b_idx on t (cost=0.15..78.06 rows=2260 width=8) (3 rows) The same path with group keys {a, b} is created and added twice. * Part of the work performed in this patch overlaps with that of preprocess_groupclause. They are both trying to adjust the ordering of the GROUP BY keys to match ORDER BY. I wonder if it would be better to perform this work only once. * When reordering the GROUP BY keys, I think the following checks need some improvements. + /* +* Since 1349d27 pathkey coming from underlying node can be in the +* root->group_pathkeys but not in the processed_groupClause. So, we +* should be careful here. +*/ + sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref, + *group_clauses); + if (!sgc) + /* The grouping clause does not cover this pathkey */ + break; I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is valid before calling get_sortgroupref_clause_noerr with it. Also, how can we guarantee that the returned GROUP BY item is sortable? Should we check that with OidIsValid(sgc->sortop)? Thanks Richard
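The first review point above can be sketched with a small model. The helper names and the int stand-ins for pathkeys are assumptions for illustration only; this is not the actual get_useful_group_keys_orderings() code. The point is the condition itself: with incremental sort disabled, a reordering pays off only when the match covers the whole GROUP BY list, i.e. n == list_length(root->group_pathkeys).

```c
#include <assert.h>
#include <stdbool.h>

/* Count the leading pathkeys shared by the GROUP BY keys and the sort
 * keys; ints stand in for pathkeys here. */
static int
common_prefix_len(const int *group_keys, int ngroup,
                  const int *sort_keys, int nsort)
{
    int n = 0;

    while (n < ngroup && n < nsort && group_keys[n] == sort_keys[n])
        n++;
    return n;
}

/* With incremental sort disabled, a reordering is only useful when it
 * covers the whole GROUP BY list (n == ngroup); a partial match would
 * still require a full sort on top. */
static bool
reordering_useful(int n, int ngroup, bool enable_incremental_sort)
{
    return n > 0 && (enable_incremental_sort || n == ngroup);
}
```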
Re: Revise the Asserts added to bimapset manipulation functions
On Tue, Jan 16, 2024 at 11:08 AM David Rowley wrote: > I've now adjusted the patch to have all modifications to Bitmapsets > return a newly allocated set. There are a few cases missing in master > that need to be fixed (items 6-10 below): > > The patch now includes changes for the following: > > 1. Document what REALLOCATE_BITMAPSETS is for. > 2. Provide a reusable function to check a set is valid and use it in > cassert builds and use it to validate sets (Richard) > 3. Provide a reusable function to copy a set and pfree the original > and use that instead of duplicating that code all over bitmapset.c > 4. Fix Assert duplication (Richard) > 5. Make comments in bms_union, bms_intersect, bms_difference clear > that a new set is allocated. (This has annoyed me for a while) > 6. Fix failure to duplicate the set in bms_add_members() when b == NULL. > 7. Fix failure to duplicate the set in bms_add_range() when upper < lower > 8. Fix failure to duplicate the set in bms_add_range() when the set > has enough words already. > 9. Fix failure to duplicate the set in bms_del_members() when b == NULL > 10. Fix failure to duplicate the set in bms_join() when a == NULL and > also fix the b == NULL case too > 11. Fix hazard in bms_add_members(), bms_int_members(), > bms_del_members() and bms_join(), where the code added in 7d58f2342 > could crash if a == b due to the REALLOCATE_BITMAPSETS code pfreeing > 'a'. This happens in knapsack.c:93, although I propose we adjust that > code now that empty sets are always represented as NULL. Thank you so much for all the work you have put into making this patch perfect. I reviewed through the v3 patch and I do not have further comments. I think it's in good shape now. Thanks Richard
Re: Strange Bitmapset manipulation in DiscreteKnapsack()
On Tue, Jan 16, 2024 at 11:32 AM David Rowley wrote: > While working on [1], I noticed some strange code in > DiscreteKnapsack() which seems to be aiming to copy the Bitmapset. > > It's not that obvious at a casual glance, but: > > sets[j] = bms_del_members(sets[j], sets[j]); > > this is aiming to zero all the words in the set by passing the same > set in both parameters. > > Now that 00b41463c changed Bitmapset to have NULL be the only valid > representation of an empty set, this code no longer makes sense. We > may as well just bms_free() the original set and bms_copy() in the new > set as the bms_del_members() call will always pfree the set anyway. > > I've done that in the attached. +1. This is actually what happens with the original code, i.e. bms_del_members() will pfree sets[j] and bms_add_members() will bms_copy sets[ow] to sets[j]. But the new code looks more natural. I also checked other callers of bms_del_members() and did not find another case that passes the same set in both parameters. Thanks Richard
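The free-then-copy pattern discussed above can be modeled with a toy set type. Everything here is a hypothetical stand-in (ToySet, toy_copy, toy_replace are not the bitmapset.c API); what it mirrors is the convention, since 00b41463c, that NULL is the only valid representation of an empty set, which is exactly why zeroing words in place no longer makes sense.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy word-array set; NULL is the only representation of an empty set,
 * mirroring the Bitmapset convention since 00b41463c. */
typedef struct ToySet
{
    int           nwords;
    unsigned long words[1];     /* variable-length in spirit */
} ToySet;

static ToySet *
toy_copy(const ToySet *s)
{
    ToySet *r;
    size_t  sz;

    if (s == NULL)
        return NULL;            /* an empty set stays NULL */
    sz = offsetof(ToySet, words) + s->nwords * sizeof(unsigned long);
    r = malloc(sz);
    memcpy(r, s, sz);
    return r;
}

/* The pattern the patch switches to: free the old set and copy in the
 * new one, rather than zeroing the old set's words in place. */
static ToySet *
toy_replace(ToySet *old, const ToySet *src)
{
    free(old);
    return toy_copy(src);
}
```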
Re: Fix a typo of func DecodeInsert()
On Wed, Jan 17, 2024 at 8:47 AM Yongtao Huang wrote: > Hi all, > I think the comment above the function DecodeInsert() > in src/backend/replication/logical/decode.c should be > + * Inserts can contain the new tuple. > rather than > - * Deletes can contain the new tuple. > Nice catch. +1. I kind of wonder if it would be clearer to state that "XLOG_HEAP_INSERT can contain the new tuple", in order to differentiate it from XLOG_HEAP2_MULTI_INSERT. Thanks Richard
Re: Oversight in reparameterize_path_by_child leading to executor crash
On Tue, Jan 16, 2024 at 2:30 AM Robert Haas wrote: > On Mon, Jan 8, 2024 at 3:32 AM Richard Guo wrote: > > Thanks for the suggestion. Attached is an updated patch which is added > > with a commit message that tries to explain the problem and the fix. > > This is great. The references to "the sampling infos" are a little bit > confusing to me. I'm not sure if this is referring to > TableSampleClause objects or what. Yeah, it's referring to TableSampleClause objects. I've updated the commit message to clarify that. In passing I also updated the test cases a bit. Please see v10-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch > > Fair point. I think we can back-patch a more minimal fix, as Tom > > proposed in [1], which disallows the reparameterization if the path > > contains sampling info that references the outer rel. But I think we > > need also to disallow the reparameterization of a path if it contains > > restriction clauses that reference the outer rel, as such paths have > > been found to cause incorrect results, or planning errors as in [2]. > > Do you want to produce a patch for that, to go along with this patch for > master? Sure, here it is: v10-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch Thanks Richard v10-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch Description: Binary data v10-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch Description: Binary data
Re: Oversight in reparameterize_path_by_child leading to executor crash
On Wed, Jan 17, 2024 at 5:01 PM Richard Guo wrote: > On Tue, Jan 16, 2024 at 2:30 AM Robert Haas wrote: > >> On Mon, Jan 8, 2024 at 3:32 AM Richard Guo >> wrote: > > > Fair point. I think we can back-patch a more minimal fix, as Tom >> > proposed in [1], which disallows the reparameterization if the path >> > contains sampling info that references the outer rel. But I think we >> > need also to disallow the reparameterization of a path if it contains >> > restriction clauses that reference the outer rel, as such paths have >> > been found to cause incorrect results, or planning errors as in [2]. >> >> Do you want to produce a patch for that, to go along with this patch for >> master? > > > Sure, here it is: > v10-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch > I forgot to mention that this patch applies on v16 not master. Thanks Richard
Re: Strange Bitmapset manipulation in DiscreteKnapsack()
On Thu, Jan 18, 2024 at 8:35 AM David Rowley wrote: > The functions's header comment mentions "The bitmapsets are all > pre-initialized with an unused high bit so that memory allocation is > done only once.". Ah, I neglected to notice this when reviewing the v1 patch. I guess it's implemented this way due to performance considerations, right? > I've attached a patch which adds bms_replace_members(). It's basically > like bms_copy() but attempts to reuse the member from another set. I > considered if the new function should be called bms_copy_inplace(), > but left it as bms_replace_members() for now. Do you think we can use 'memcpy(a, b, BITMAPSET_SIZE(b->nwords))' directly in the new bms_replace_members() instead of copying the bitmapwords one by one, like: - i = 0; - do - { - a->words[i] = b->words[i]; - } while (++i < b->nwords); - - a->nwords = b->nwords; + memcpy(a, b, BITMAPSET_SIZE(b->nwords)); But I'm not sure if this is an improvement or not. Thanks Richard
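The two copy strategies being compared above can be put side by side in a toy model. WSet is a made-up stand-in with a fixed-size word array (the real Bitmapset uses a flexible array member and BITMAPSET_SIZE()), so this only illustrates that the word-by-word loop and a single memcpy of header plus used words produce the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXWORDS 4

/* Toy stand-in for Bitmapset with a fixed-size word array. */
typedef struct WSet
{
    int           nwords;
    unsigned long words[MAXWORDS];
} WSet;

/* Word-by-word copy, as in the proposed bms_replace_members();
 * assumes a has room for b->nwords words and b->nwords > 0. */
static void
replace_loop(WSet *a, const WSet *b)
{
    int i = 0;

    do
    {
        a->words[i] = b->words[i];
    } while (++i < b->nwords);
    a->nwords = b->nwords;
}

/* One memcpy of header plus used words, analogous to
 * memcpy(a, b, BITMAPSET_SIZE(b->nwords)): nwords and the words are
 * copied in a single call. */
static void
replace_memcpy(WSet *a, const WSet *b)
{
    memcpy(a, b, offsetof(WSet, words) + b->nwords * sizeof(unsigned long));
}
```

Whether the single memcpy is actually faster than the loop for typical small nwords is exactly the open question in the mail above; the sketch only shows the two are equivalent in effect.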
Reordering DISTINCT keys to match input path's pathkeys
Similar to what we did for GROUP BY keys in 0452b461bc, I think we can do the same for DISTINCT keys, i.e. reorder DISTINCT keys to match the input path's pathkeys, which can help reduce cost by avoiding an unnecessary re-sort, or by allowing us to use incremental sort to save effort. For instance, create table t (a int, b int); create index on t (a, b); explain (costs off) select distinct b, a from t limit 10; QUERY PLAN -- Limit -> Unique -> Index Only Scan using t_a_b_idx on t (3 rows) Please note that the parser has ensured that the DISTINCT pathkeys match the order of the ORDER BY clauses, so there is no need to do that part again. In principle, we could perform such reordering for DISTINCT ON too, but we would need to make sure that the resulting pathkeys match the initial ORDER BY keys, which seems not trivial. So it doesn't seem worth the effort. Attached is a patch for this optimization. Any thoughts? Thanks Richard v1-0001-Reordering-DISTINCT-keys-to-match-input-path-s-pathkeys.patch Description: Binary data
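The reordering idea can be sketched outside the planner. In this illustration ints stand in for pathkeys, keys are assumed to contain no duplicates, and reorder_distinct_keys() is a hypothetical helper, not the patch's code: keys that match a prefix of the input ordering are moved to the front, and the remaining DISTINCT keys follow.

```c
#include <assert.h>

/* Reorder DISTINCT keys so that keys matching a prefix of the input
 * path's ordering come first. Ints stand in for pathkeys; keys are
 * assumed distinct. Illustration only, not the planner code. */
static int
reorder_distinct_keys(const int *distinct, int ndistinct,
                      const int *pathkeys, int npath,
                      int *out)
{
    int n = 0;

    /* take input-order keys as long as each one is a DISTINCT key */
    for (int p = 0; p < npath; p++)
    {
        int found = 0;

        for (int d = 0; d < ndistinct; d++)
        {
            if (distinct[d] == pathkeys[p])
            {
                found = 1;
                break;
            }
        }
        if (!found)
            break;
        out[n++] = pathkeys[p];
    }

    /* append the remaining DISTINCT keys in their original order */
    for (int d = 0; d < ndistinct; d++)
    {
        int seen = 0;

        for (int i = 0; i < n; i++)
        {
            if (out[i] == distinct[d])
            {
                seen = 1;
                break;
            }
        }
        if (!seen)
            out[n++] = distinct[d];
    }
    return n;
}
```

In the example query, DISTINCT keys {b, a} against an input ordered by (a, b) would come out as {a, b}, letting the Unique node sit directly on the index scan.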
Re: Removing const-false IS NULL quals and redundant IS NOT NULL quals
On Tue, Jan 23, 2024 at 1:11 PM David Rowley wrote: > I went over this again. I did a little more work adjusting comments > and pushed it. > > Thanks for all your assistance with this, Richard. Thanks for pushing! This is really great. Thanks Richard
A compiling warning in jsonb_populate_record_valid
I came across a warning when building master (a044e61f1b) on old GCC (4.8.5). jsonfuncs.c: In function ‘jsonb_populate_record_valid’: ../../../../src/include/nodes/miscnodes.h:53:15: warning: the comparison will always evaluate as ‘true’ for the address of ‘escontext’ will never be NULL [-Waddress] ((escontext) != NULL && IsA(escontext, ErrorSaveContext) && \ ^ jsonfuncs.c:2481:23: note: in expansion of macro ‘SOFT_ERROR_OCCURRED’ return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext)); This was introduced in commit 1edb3b491b, and can be observed on several systems in the buildfarm too, such as: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana&dt=2024-01-25%2004%3A54%3A38&stg=build Thanks Richard
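The warning pattern can be reproduced in miniature. The types below are toys (not the real ErrorSaveContext or SOFT_ERROR_OCCURRED): applying the macro to the address of a local makes the NULL test vacuous, which is what -Waddress flags at compile time; reading the field directly sidesteps the macro entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for ErrorSaveContext and SOFT_ERROR_OCCURRED. */
typedef struct ToyContext
{
    bool error_occurred;
} ToyContext;

#define TOY_ERROR_OCCURRED(ctx) \
    ((ctx) != NULL && (ctx)->error_occurred)

/* &ctx can never be NULL, so the macro's NULL test is always true --
 * this is the comparison -Waddress complains about. */
static bool
check_via_macro(void)
{
    ToyContext ctx = {false};

    return TOY_ERROR_OCCURRED(&ctx);
}

/* Reading the field directly avoids the pointless NULL test on the
 * address of a local, silencing the warning. */
static bool
check_direct(void)
{
    ToyContext ctx = {false};

    return ctx.error_occurred;
}
```

Both functions compute the same answer; the only difference is that a compiler with -Waddress enabled warns on the first.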
Re: Trivial revise for the check of parameterized partial paths
On Sun, Jan 21, 2024 at 8:36 PM vignesh C wrote: > I'm seeing that there has been no activity in this thread for nearly 7 > months, I'm planning to close this in the current commitfest unless > someone is planning to take it forward. This patch fixes incorrect comments in try_partial_hashjoin_path, and also simplifies and enhances the checks for parameterized partial paths. I think it's worth moving forward. I've rebased the patch over current master, added a commit message describing what it does, and updated the comment a little bit in the v2 patch. Thanks Richard v2-0001-Revisions-to-the-checks-for-parameterized-partial-paths.patch Description: Binary data
Re: A compiling warning in jsonb_populate_record_valid
On Thu, Jan 25, 2024 at 2:28 PM Amit Langote wrote: > On Thu, Jan 25, 2024 at 2:59 PM Richard Guo > wrote: > > I came across a warning when building master (a044e61f1b) on old GCC > > (4.8.5). > > > > jsonfuncs.c: In function ‘jsonb_populate_record_valid’: > > ../../../../src/include/nodes/miscnodes.h:53:15: warning: the comparison > will always evaluate as ‘true’ for the address of ‘escontext’ will never be > NULL [-Waddress] > > ((escontext) != NULL && IsA(escontext, ErrorSaveContext) && \ > >^ > > jsonfuncs.c:2481:23: note: in expansion of macro ‘SOFT_ERROR_OCCURRED’ > > return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext)); > > > > This was introduced in commit 1edb3b491b, and can be observed on several > > systems in the buildfarm too, such as: > > > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana&dt=2024-01-25%2004%3A54%3A38&stg=build > > Thanks for the report. > > Will apply the attached, which does this: > > - return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext)); > + return BoolGetDatum(!escontext.error_occurred); Looks good to me. Thanks Richard
Re: A performance issue with Memoize
On Fri, Jan 26, 2024 at 2:32 AM Tom Lane wrote: > I'm fairly sure I thought it wouldn't matter because of the Param > de-duplication done in paramassign.c. However, Richard's example > shows that's not so, because process_subquery_nestloop_params is > picky about the param ID assigned to a particular Var while > replace_nestloop_params is not. So flipping the order makes sense. > I'd change the comment though, maybe to > > /* > * Replace any outer-relation variables with nestloop params. > * > * We must provide nestloop params for both lateral references of > * the subquery and outer vars in the scan_clauses. It's better > * to assign the former first, because that code path requires > * specific param IDs, while replace_nestloop_params can adapt > * to the IDs assigned by process_subquery_nestloop_params. > * This avoids possibly duplicating nestloop params when the > * same Var is needed for both reasons. > */ +1. It's much better. > However ... it seems like we're not out of the woods yet. Why > is Richard's proposed test case still showing > > + -> Memoize (actual rows=5000 loops=N) > + Cache Key: t1.two, t1.two > > Seems like there is missing de-duplication logic, or something. When we collect the cache keys in paraminfo_get_equal_hashops() we search param_info's ppi_clauses as well as innerrel's lateral_vars for outer expressions. We do not perform de-duplication on the collected outer expressions there. In my proposed test case, the same Var 't1.two' appears both in the param_info->ppi_clauses and in the innerrel->lateral_vars, so we see two identical cache keys in the plan. I noticed this before and wondered if we should do de-duplication on the cache keys, but somehow I did not chase this to the ground. Thanks Richard
Re: A performance issue with Memoize
On Fri, Jan 26, 2024 at 1:22 AM Tom Lane wrote: > Apologies for not having noticed this thread before. I'm taking > a look at it now. However, while sniffing around this I found > what seems like an oversight in paramassign.c's > assign_param_for_var(): it says it should compare all the same > fields as _equalVar except for varlevelsup, but it's failing to > compare varnullingrels. Is that a bug? It's conceivable that > it's not possible to get here with varnullingrels different and > all else the same, but I don't feel good about that proposition. > > I tried adding > > @@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var) > pvar->vartype == var->vartype && > pvar->vartypmod == var->vartypmod && > pvar->varcollid == var->varcollid) > +{ > +Assert(bms_equal(pvar->varnullingrels, > var->varnullingrels)); > return pitem->paramId; > +} > } > } Yeah, I think it should be safe to assert that the varnullingrels is equal here. The Var is supposed to be an upper-level Var, and two same such Vars should not have different varnullingrels at this point, although the varnullingrels might be adjusted later in identify_current_nestloop_params according to which form of identity 3 we end up applying. Thanks Richard
Re: A performance issue with Memoize
On Fri, Jan 26, 2024 at 12:18 PM David Rowley wrote: > On Fri, 26 Jan 2024 at 16:51, Tom Lane wrote: > > >> However ... it seems like we're not out of the woods yet. Why > > >> is Richard's proposed test case still showing > > >> + -> Memoize (actual rows=5000 loops=N) > > >> + Cache Key: t1.two, t1.two > > >> Seems like there is missing de-duplication logic, or something. > > > > > This seems separate and isn't quite causing the same problems as what > > > Richard wants to fix so I didn't touch this for now. > > > > Fair enough, but I think it might be worth pursuing later. > > Here's a patch for that. At first I wondered if we should assume that the same param expr must have the same equality operator. If not, we should also check the operator to tell if the cache key is a duplicate, like - if (!list_member(*param_exprs, expr)) + if (!list_member(*param_exprs, expr) || + !list_member_oid(*operators, hasheqoperator)) But after looking at how rinfo->left_hasheqoperator/right_hasheqoperator is set, it seems we can assume that: the operator is from the type cache entry which is fetched according to the expr datatype. So I think the patch makes sense. +1. Thanks Richard
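The de-duplication question above comes down to what "same cache key" means. The sketch below uses ints as stand-ins for expression identity and operator OIDs and a hypothetical key_present() helper (not the Memoize path-building code): if the equality operator always follows from the expression's type, matching on the expression alone suffices; otherwise the (expr, operator) pair would have to match.

```c
#include <assert.h>
#include <stdbool.h>

/* Ints stand in for expression identity and operator OIDs. */
typedef struct CacheKey
{
    int expr_id;
    int op_id;
} CacheKey;

/* Membership test for a candidate cache key. When the operator is
 * fully determined by the expression's type, compare_operator can be
 * false; otherwise both fields must match for the key to count as a
 * duplicate. */
static bool
key_present(const CacheKey *keys, int nkeys, CacheKey k,
            bool compare_operator)
{
    for (int i = 0; i < nkeys; i++)
    {
        if (keys[i].expr_id != k.expr_id)
            continue;
        if (!compare_operator || keys[i].op_id == k.op_id)
            return true;
    }
    return false;
}
```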
Apply the "LIMIT 1" optimization to partial DISTINCT
In 5543677ec9 we introduced an optimization that uses Limit instead of Unique to implement DISTINCT when all the DISTINCT pathkeys have been marked as redundant. I happened to notice that this optimization was not applied to partial DISTINCT, which I think should be. This can improve plans in some cases, such as -- on master explain (costs off) select distinct four from tenk1 where four = 4; QUERY PLAN -- Limit -> Gather Workers Planned: 4 -> Unique -> Parallel Seq Scan on tenk1 Filter: (four = 4) (6 rows) -- patched explain (costs off) select distinct four from tenk1 where four = 4; QUERY PLAN -- Limit -> Gather Workers Planned: 4 -> Limit -> Parallel Seq Scan on tenk1 Filter: (four = 4) (6 rows) Such queries might not be that common, but it's very cheap to apply this optimization. Attached is a patch for that. Thanks Richard v1-0001-Apply-the-LIMIT-1-optimization-to-partial-DISTINCT.patch Description: Binary data
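The reason a Limit can replace a Unique here is worth spelling out: when the WHERE clause pins every DISTINCT key to a constant, all qualifying rows agree on those keys, so returning the first qualifying row is enough. A toy scan over a single int column (illustration only, not planner code):

```c
#include <assert.h>

/* If every DISTINCT key is equated to a constant, all qualifying rows
 * are identical on those keys, so DISTINCT degenerates to "return the
 * first qualifying row", i.e. a LIMIT 1 over the scan. */
static int
distinct_via_limit1(const int *col, int nrows, int constval, int *out)
{
    for (int i = 0; i < nrows; i++)
    {
        if (col[i] == constval)
        {
            *out = col[i];
            return 1;           /* one row is enough */
        }
    }
    return 0;                   /* no qualifying rows at all */
}
```

This is why the patched plan can stop each parallel worker at its first matching row instead of deduplicating the whole scan.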
Re: Reordering DISTINCT keys to match input path's pathkeys
On Tue, Jan 23, 2024 at 5:03 PM David Rowley wrote: > I've not caught up on the specifics of 0452b461b, but I just wanted to > highlight that there was some work done in [1] in this area. It seems > Ankit didn't ever add that to a CF, so that might explain why it's > been lost. > > Anyway, just pointing it out as there may be useful code or discussion > in the corresponding threads. Thanks for pointing it out. I looked at the patch there and noticed several problems with it. * That patch is incomplete and does not work as expected. It at least needs to modify truncate_useless_pathkeys() to account for the DISTINCT clause (I think this has been mentioned in that thread). * That patch would not consider the original DISTINCT pathkeys if it could do some reordering, which is not great and can generate inefficient plans. For instance (after fixing the first problem): create table t (a int, b int); create index on t(a); set enable_hashagg to off; set enable_incremental_sort to off; set enable_seqscan to off; explain (costs off) select distinct b, a from t order by b, a; QUERY PLAN - Sort Sort Key: b, a -> Unique -> Sort Sort Key: a, b -> Index Scan using t_a_idx on t (6 rows) Using DISTINCT pathkeys {b, a} is more efficient for this plan, because only one Sort would be required. But that patch is not able to do that, because it does not consider the original DISTINCT pathkeys after reordering. Thanks Richard
Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
On Sat, Jan 27, 2024 at 10:08 AM vignesh C wrote: > On Mon, 8 Jan 2024 at 22:21, Tom Lane wrote: > > > > Richard Guo writes: > > > On Sun, Jan 7, 2024 at 6:41 AM Tom Lane wrote: > > >> Thanks for the report! I guess we need something like the attached. > > > > > +1. > > > > Pushed, thanks for looking at it. > I have changed the status of the commitfest entry to "Committed" as I > noticed the patch has already been committed. Well, the situation seems a little complex here. At first, this thread was dedicated to discussing the 'Examine-simple-variable-for-Var-in-CTE' patch, which has already been pushed in [1]. Subsequently, I proposed another patch 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query' in [2], which is currently under review and is what the commitfest entry is for. Later on, within the same thread, another patch was posted as a fix to the first patch and was subsequently pushed in [3]. I believe this sequence of events might have led to confusion. What is the usual practice in such situations? I guess I'd better fork a new thread to discuss my proposed patch, which is about the 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query'. [1] https://www.postgresql.org/message-id/754093.1700250120%40sss.pgh.pa.us [2] https://www.postgresql.org/message-id/CAMbWs49gAHeEOn0rpdUUYXryaa60KZ8JKwk1aSERttY9caCYkA%40mail.gmail.com [3] https://www.postgresql.org/message-id/1941515.1704732682%40sss.pgh.pa.us Thanks Richard
Propagate pathkeys from CTEs up to the outer query
In [1] we've reached a conclusion that for a MATERIALIZED CTE it's okay to 'allow our statistics or guesses for the sub-query to subsequently influence what the upper planner does'. Commit f7816aec23 exposes column statistics to the upper planner. In the light of that, here is a patch that exposes info about the ordering of the CTE result to the upper planner. This patch was initially posted in that same thread and has received some comments from Tom in [2]. Due to the presence of multiple patches in that thread, it has led to confusion. So fork a new thread here specifically dedicated to discussing the patch about exposing pathkeys from CTEs to the upper planner. Comments/thoughts? [1] https://www.postgresql.org/message-id/482957.1700192299%40sss.pgh.pa.us [2] https://www.postgresql.org/message-id/1339607.1700502358%40sss.pgh.pa.us Thanks Richard v3-0001-Propagate-pathkeys-from-CTEs-up-to-the-outer-query.patch Description: Binary data
Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
On Mon, Jan 29, 2024 at 11:20 AM vignesh C wrote: > On Mon, 29 Jan 2024 at 08:01, Richard Guo wrote: > > On Sat, Jan 27, 2024 at 10:08 AM vignesh C wrote: > >> I have changed the status of the commitfest entry to "Committed" as I > >> noticed the patch has already been committed. > > > > Well, the situation seems a little complex here. At first, this thread > > was dedicated to discussing the 'Examine-simple-variable-for-Var-in-CTE' > > patch, which has already been pushed in [1]. Subsequently, I proposed > > another patch 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query' in > > [2], which is currently under review and is what the commitfest entry > > for. Later on, within the same thread, another patch was posted as a > > fix to the first patch and was subsequently pushed in [3]. I believe > > this sequence of events might have led to confusion. > > > > What is the usual practice in such situations? I guess I'd better to > > fork a new thread to discuss my proposed patch which is about the > > 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query'. > > Sorry I missed to notice that there was one pending patch yet to be > committed, I feel you can continue discussing here itself just to > avoid losing any historical information about the issue and the > continuation of the discussion. You can add a new commitfest entry for > this. It seems to me that a fresh new thread is a better option. I have just started a new thread in [1], and have tried to migrate the necessary context over there. I have also updated the commitfest entry accordingly. [1] https://www.postgresql.org/message-id/flat/CAMbWs49xYd3f8CrE8-WW3--dV1zH_sDSDn-vs2DzHj81Wcnsew%40mail.gmail.com Thanks Richard
Re: Some revises in adding sorting path
On Mon, Jul 17, 2023 at 4:55 PM Richard Guo wrote: > But I did not find a query that makes an incremental sort in this case. > After trying for a while it seems to me that we do not need to consider > incremental sort in this case, because for a partial path of a grouped > or partially grouped relation, it is either unordered (HashAggregate or > Append), or it has been ordered by the group_pathkeys (GroupAggregate). > It seems there is no case that we'd have a partial path that is > partially sorted. > Since now we'd try to reorder the group by keys (see 0452b461bc), it is possible that we have a partial path that is partially sorted. So this conclusion is not true any more. For instance, create table t (a int, b int, c int, d int); insert into t select i%10, i%10, i%10, i%10 from generate_series(1,100)i; create index on t (a, b); analyze t; set enable_hashagg to off; set enable_seqscan to off; explain (costs off) select count(*) from t group by a, c, b, parallel_safe_volatile(d); QUERY PLAN -- Finalize GroupAggregate Group Key: a, c, b, (parallel_safe_volatile(d)) -> Gather Merge Workers Planned: 2 -> Incremental Sort Sort Key: a, c, b, (parallel_safe_volatile(d)) Presorted Key: a -> Partial GroupAggregate Group Key: a, b, c, (parallel_safe_volatile(d)) -> Incremental Sort Sort Key: a, b, c, (parallel_safe_volatile(d)) Presorted Key: a, b -> Parallel Index Scan using t_a_b_idx on t (13 rows) If we do not consider incremental sort on partial paths in gather_grouping_paths(), we'd get a plan that looks like: explain (costs off) select count(*) from t group by a, c, b, parallel_safe_volatile(d); QUERY PLAN Finalize GroupAggregate Group Key: a, c, b, (parallel_safe_volatile(d)) -> Incremental Sort Sort Key: a, c, b, (parallel_safe_volatile(d)) Presorted Key: a, c, b -> Gather Merge Workers Planned: 2 -> Incremental Sort Sort Key: a, c, b Presorted Key: a -> Partial GroupAggregate Group Key: a, b, c, (parallel_safe_volatile(d)) -> Incremental Sort Sort Key: a, b, c, 
(parallel_safe_volatile(d)) Presorted Key: a, b -> Parallel Index Scan using t_a_b_idx on t (16 rows) So in the v3 patch I've brought back the logic that considers incremental sort on partial paths in gather_grouping_paths(). However, I failed to compose a test case for this scenario without having to generate a huge table. So in the v3 patch I did not include a test case for this aspect. Thanks Richard v3-0001-Revise-how-we-sort-partial-paths-in-create_ordered_paths.patch Description: Binary data v3-0002-Revise-how-we-sort-partial-paths-in-gather_grouping_paths.patch Description: Binary data
Re: Retiring is_pushed_down
On Sun, Jan 21, 2024 at 8:37 PM vignesh C wrote: > I'm seeing that there has been no activity in this thread for nearly 6 > months, I'm planning to close this in the current commitfest unless > someone is planning to take it forward. It can be opened again when > there is more interest. I'm planning to take it forward. The v2 patch does not compile any more. Attached is an updated patch that is rebased over master and fixes the compiling issues. Nothing else has changed. Thanks Richard v3-0001-Retiring-is_pushed_down.patch Description: Binary data
Re: Support run-time partition pruning for hash join
On Sat, Jan 27, 2024 at 11:29 AM vignesh C wrote: > CFBot shows that the patch does not apply anymore as in [1]: > > Please post an updated version for the same. Attached is an updated patch. Nothing else has changed. Thanks Richard v6-0001-Support-run-time-partition-pruning-for-hash-join.patch Description: Binary data
Re: Some revises in adding sorting path
On Tue, Jan 30, 2024 at 7:00 PM David Rowley wrote: > On Mon, 29 Jan 2024 at 22:39, Richard Guo wrote: > > So in the v3 patch I've brought back the logic that considers > > incremental sort on partial paths in gather_grouping_paths(). However, > > I failed to compose a test case for this scenario without having to > > generate a huge table. So in the v3 patch I did not include a test case > > for this aspect. > > Can you share the test with the huge table? The test had been shown in upthread [1]. Pasting it here: create table t (a int, b int, c int, d int); insert into t select i%10, i%10, i%10, i%10 from generate_series(1,100)i; create index on t (a, b); analyze t; set enable_hashagg to off; set enable_seqscan to off; explain (costs off) select count(*) from t group by a, c, b, parallel_safe_volatile(d); QUERY PLAN -- Finalize GroupAggregate Group Key: a, c, b, (parallel_safe_volatile(d)) -> Gather Merge Workers Planned: 2 -> Incremental Sort Sort Key: a, c, b, (parallel_safe_volatile(d)) Presorted Key: a -> Partial GroupAggregate Group Key: a, b, c, (parallel_safe_volatile(d)) -> Incremental Sort Sort Key: a, b, c, (parallel_safe_volatile(d)) Presorted Key: a, b -> Parallel Index Scan using t_a_b_idx on t (13 rows) > I tried and failed as, if I'm not mistaken, you're talking about a > parallel aggregate query with an incremental sort between the Partial > Aggregate node and the Finalize Group Aggregate node. If the partial > aggregate was a Group Aggregate, then it would already be correctly > sorted. We don't need a more strict sort ordering to perform the > Finalize Group Aggregate, the results must already be sorted by at > least the GROUP BY clause. If the partial aggregate had opted to Hash > Aggregate, then there'd be no presorted keys, so we could only get a > full sort. I can't see any way to get an incremental sort between the > 2 aggregate phases. > > What am I missing? This was true before 0452b461bc, and I reached the same conclusion in [2]. 
Quote it here: " But I did not find a query that makes an incremental sort in this case. After trying for a while it seems to me that we do not need to consider incremental sort in this case, because for a partial path of a grouped or partially grouped relation, it is either unordered (HashAggregate or Append), or it has been ordered by the group_pathkeys (GroupAggregate). It seems there is no case that we'd have a partial path that is partially sorted. " But if we've reordered the group by keys to match the input path's pathkeys, we might have a partial GroupAggregate that is partially sorted. See the plan above. > I also tried reverting your changes to planner.c to see if your new > tests would fail. They all passed. So it looks like none of these > tests are testing anything new. This patchset does not aim to introduce anything new; it simply refactors the existing code. The newly added tests are used to show that the code that is touched here is not redundant, but rather essential for generating certain paths. I remember the tests were added per your comment in [3]. [1] https://www.postgresql.org/message-id/CAMbWs4_iDwMAf5mp%2BG-tXq-gYzvR6koSHvNUqBPK4pt7%2B11tJw%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAMbWs497h5jVCVwNDb%2BBX31Z_K8iBaPQKOcsTvpFQ7kF18a2%2Bg%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAApHDvo%2BFagxVSGmvt-LUrhLZQ0KViiLvX8dPaG3ZzWLNd-Zpg%40mail.gmail.com Thanks Richard
Re: Oversight in reparameterize_path_by_child leading to executor crash
On Wed, Jan 31, 2024 at 5:12 AM Tom Lane wrote: > Richard Guo writes: > > On Wed, Jan 17, 2024 at 5:01 PM Richard Guo > wrote: > >> Sure, here it is: > >> v10-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch > > > I forgot to mention that this patch applies on v16 not master. > > I spent some time looking at this patch (which seems more urgent than > the patch for master, given that we have back-branch releases coming > up). Thanks for looking at this patch! > There are two things I'm not persuaded about: > > * Why is it okay to just use pull_varnos on a tablesample expression, > when contain_references_to() does something different? Hmm, the main reason is that the expression_tree_walker() function does not handle T_RestrictInfo nodes. So we cannot just use pull_varnos or pull_var_clause on a restrictinfo list. So contain_references_to() calls extract_actual_clauses() first before it calls pull_var_clause(). > * Is contain_references_to() doing the right thing by specifying > PVC_RECURSE_PLACEHOLDERS? That causes it to totally ignore a > PlaceHolderVar's ph_eval_at, and I'm not convinced that's correct. I was thinking that we should recurse into the PHV's contents to see if there are any lateral references to the other join relation. But now I realize that checking phinfo->ph_lateral, as you suggested, might be a better way to do that. But I'm not sure about checking phinfo->ph_eval_at. It seems to me that the ph_eval_at could not overlap the other join relation; otherwise the clause would not be a restriction clause but rather a join clause. > Ideally it seems to me that we want to reject a PlaceHolderVar > if either its ph_eval_at or ph_lateral overlap the other join > relation; if it was coded that way then we'd not need to recurse > into the PHV's contents. pull_varnos isn't directly amenable > to this, but I think we could use pull_var_clause with > PVC_INCLUDE_PLACEHOLDERS and then iterate through the resulting > list manually. 
> (If this patch were meant for HEAD, I'd think
> about extending the var.c code to support this usage more directly.
> But as things stand, this is a one-off so I think we should just do
> what we must in reparameterize_path_by_child.)

Thanks for the suggestion.  I've coded it this way in the v11 patch, and
left an XXX comment about checking phinfo->ph_eval_at.

> BTW, it shouldn't be necessary to write either PVC_RECURSE_AGGREGATES
> or PVC_RECURSE_WINDOWFUNCS, because neither kind of node should ever
> appear in a scan-level expression.  I'd leave those out so that we
> get an error if something unexpected happens.

Good point.

Thanks
Richard

v11-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch Description: Binary data
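As an illustrative aside, the check Tom suggests above — reject a PlaceHolderVar if either its ph_eval_at or its ph_lateral overlaps the other join relation, and reject a plain Var that belongs to it — can be sketched as a toy model. This is not PostgreSQL code; nodes are plain dicts and all field names are borrowed from the discussion purely for illustration:

```python
def overlaps(a, b):
    """Set-overlap test, standing in for bms_overlap()."""
    return bool(a & b)

def references_other_rel(nodes, other_relids):
    """Toy model of the suggested check: reparameterization is unsafe
    if any Var belongs to the other join relation, or any
    PlaceHolderVar is evaluated at -- or laterally references -- it."""
    for node in nodes:
        if node["kind"] == "Var":
            if node["varno"] in other_relids:
                return True
        elif node["kind"] == "PlaceHolderVar":
            if overlaps(node["ph_eval_at"], other_relids) or \
               overlaps(node["ph_lateral"], other_relids):
                return True
    return False

phv = {"kind": "PlaceHolderVar", "ph_eval_at": {2}, "ph_lateral": {3}}
print(references_other_rel([phv], {3}))  # True: lateral ref into the other rel
```

With this shape there is no need to recurse into the PHV's contained expression at all, which is the simplification Tom's suggestion buys.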
Re: Some revises in adding sorting path
On Wed, Jan 31, 2024 at 5:13 AM David Rowley wrote: > On Wed, 31 Jan 2024 at 00:44, Richard Guo wrote: > > This patchset does not aim to introduce anything new; it simply > > refactors the existing code. The newly added tests are used to show > > that the code that is touched here is not redundant, but rather > > essential for generating certain paths. I remember the tests were added > > per your comment in [3]. > > > > [3] > https://www.postgresql.org/message-id/CAApHDvo%2BFagxVSGmvt-LUrhLZQ0KViiLvX8dPaG3ZzWLNd-Zpg%40mail.gmail.com > > OK. I've pushed the patched based on it being a simplification of the > partial path generation. Thanks for pushing it! Thanks Richard
Re: Apply the "LIMIT 1" optimization to partial DISTINCT
On Wed, Jan 31, 2024 at 12:26 PM David Rowley wrote: > On Fri, 26 Jan 2024 at 21:14, David Rowley wrote: > > However, having said that. Parallel plans are often picked when there > > is some highly selective qual as parallel_tuple_cost has to be applied > > to fewer tuples for such plans, so probably this is worth doing. > > I was messing around with your test case and didn't manage to get any > plan that had any rows to use the partial path with the LIMIT. I > ended up dropping the test that was checking the results were empty as > I didn't think it added much more value over the EXPLAIN output. > > I pushed the result. Thanks for pushing it! Thanks Richard
Re: Oversight in reparameterize_path_by_child leading to executor crash
On Wed, Jan 31, 2024 at 11:21 PM Tom Lane wrote:
> Richard Guo writes:
> > On Wed, Jan 31, 2024 at 5:12 AM Tom Lane wrote:
> >> * Why is it okay to just use pull_varnos on a tablesample expression,
> >> when contain_references_to() does something different?
>
> > Hmm, the main reason is that the expression_tree_walker() function does
> > not handle T_RestrictInfo nodes.  So we cannot just use pull_varnos or
> > pull_var_clause on a restrictinfo list.
>
> Right, the extract_actual_clauses step is not wanted for the
> tablesample expression.  But my point is that surely the handling of
> Vars and PlaceHolderVars should be the same, which it's not, and your
> v11 makes it even less so.  How can it be OK to ignore Vars in the
> restrictinfo case?

Hmm, in this specific scenario it seems that it's not possible to have
Vars in the restrictinfo list that laterally reference the outer join
relation; otherwise the clause containing such Vars would not be a
restriction clause but rather a join clause.  So in v11 I did not check
Vars in contain_references_to().  But I agree that we'd better handle
Vars and PlaceHolderVars in the same way.

> I think the code structure we need to end up with is a routine that
> takes a RestrictInfo-free node tree (and is called directly in the
> tablesample case) with a wrapper routine that strips the RestrictInfos
> (for use on restriction lists).

How about the attached v12 patch?  But I'm a little concerned about
omitting PVC_RECURSE_AGGREGATES and PVC_RECURSE_WINDOWFUNCS in this
case.  Is it possible that aggregates or window functions appear in the
tablesample expression?

Thanks
Richard

v12-0001-Avoid-reparameterizing-Paths-when-it-s-not-suitable.patch Description: Binary data
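As an aside, the code structure Tom describes — a core routine that works on a RestrictInfo-free expression tree (called directly for a tablesample expression), plus a wrapper that strips RestrictInfos from a restriction list — can be sketched in miniature. This is not PostgreSQL code; expressions are modeled as nested dicts, and the wrapper plays the role extract_actual_clauses plays in the real code:

```python
def expr_refs_relids(expr, relids):
    """Core routine: walk a RestrictInfo-free expression tree looking
    for Vars of the given rels.  This is the piece that would be called
    directly on a tablesample expression."""
    if expr is None:
        return False
    if expr["kind"] == "Var":
        return expr["varno"] in relids
    return any(expr_refs_relids(arg, relids) for arg in expr.get("args", ()))

def restrictlist_refs_relids(restrictinfos, relids):
    """Wrapper for restriction lists: strip the RestrictInfo wrapper
    first, then hand each bare clause to the core routine."""
    return any(expr_refs_relids(ri["clause"], relids) for ri in restrictinfos)

clause = {"kind": "OpExpr", "args": ({"kind": "Var", "varno": 2},
                                     {"kind": "Const"})}
print(restrictlist_refs_relids([{"clause": clause}], {2}))  # True
```

Splitting the walk this way keeps the Var/PlaceHolderVar handling identical for both callers, which is exactly the symmetry being asked for upthread.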
Re: set_cheapest without checking pathlist
On Thu, Feb 1, 2024 at 10:04 AM James Coleman wrote: > I don't see any inherent reason why we must always assume that > gather_grouping_paths will always result in having at least one entry > in pathlist. If, for example, we've disabled incremental sort and the > cheapest partial path happens to already be sorted, then I don't > believe we'll add any paths. And if that happens then set_cheapest > will error with the message "could not devise a query plan for the > given query". So I propose we add a similar guard to this call point. I don't believe that would happen. It seems to me that we should, at the very least, have a path which is Gather on top of the cheapest partial path (see generate_gather_paths), as long as the partially_grouped_rel has partial paths. Thanks Richard
Re: set_cheapest without checking pathlist
On Thu, Feb 1, 2024 at 11:37 AM David Rowley wrote: > On Thu, 1 Feb 2024 at 16:29, Richard Guo wrote: > > On Thu, Feb 1, 2024 at 10:04 AM James Coleman wrote: > >> I don't see any inherent reason why we must always assume that > >> gather_grouping_paths will always result in having at least one entry > >> in pathlist. If, for example, we've disabled incremental sort and the > >> cheapest partial path happens to already be sorted, then I don't > >> believe we'll add any paths. And if that happens then set_cheapest > >> will error with the message "could not devise a query plan for the > >> given query". So I propose we add a similar guard to this call point. > > > > > > I don't believe that would happen. It seems to me that we should, at > > the very least, have a path which is Gather on top of the cheapest > > partial path (see generate_gather_paths), as long as the > > partially_grouped_rel has partial paths. > > It will have partial paths because it's nested inside "if > (partially_grouped_rel && partially_grouped_rel->partial_pathlist)" Right. And that leads to the conclusion that gather_grouping_paths will always generate at least one path for partially_grouped_rel. So it seems to me that the check added in the patch is not necessary. But maybe we can add an Assert or a comment clarifying that the pathlist of partially_grouped_rel cannot be empty here. Thanks Richard
Re: A performance issue with Memoize
On Thu, Feb 1, 2024 at 3:43 PM Anthonin Bonnefoy
<anthonin.bonne...@datadoghq.com> wrote:
> Hi,
>
> I've seen a similar issue with the following query (tested on the
> current head):
>
> EXPLAIN ANALYZE SELECT * FROM tenk1 t1
> LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
> ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
>                                  QUERY PLAN
> --------------------------------------------------------------------
>  Nested Loop Left Join  (cost=8.46..1495.10 rows=5 width=256) (actual time=0.860..111.013 rows=5 loops=1)
>    ->  Bitmap Heap Scan on tenk1 t1  (cost=8.16..376.77 rows=500 width=244) (actual time=0.798..1.418 rows=500 loops=1)
>          Recheck Cond: (hundred < 5)
>          Heap Blocks: exact=263
>          ->  Bitmap Index Scan on tenk1_hundred  (cost=0.00..8.04 rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
>                Index Cond: (hundred < 5)
>    ->  Memoize  (cost=0.30..4.89 rows=100 width=12) (actual time=0.009..0.180 rows=100 loops=500)
>          Cache Key: t1.hundred
>          Cache Mode: logical
>          Hits: 0  Misses: 500  Evictions: 499  Overflows: 0  Memory Usage: 5kB
>          ->  Index Scan using tenk2_hundred on tenk2  (cost=0.29..4.88 rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
>                Index Cond: (hundred = t1.hundred)
>  Planning Time: 0.661 ms
>  Execution Time: 113.076 ms
> (14 rows)
>
> The memoize's cache key only uses t1.hundred while the nested loop has
> two changed parameters: the lateral var t1.two and t1.hundred.  This
> leads to a chgParam that is always different and the cache is purged
> on each rescan.

Thanks for the report!  This issue is caused by the fact that we fail to
check PHVs for lateral references, and hence cannot know that t1.two
should also be included in the cache keys.  I reported exactly the same
issue in [1], and verified that it can be fixed by the patch in [2]
(which seems to require a rebase).
[1] https://www.postgresql.org/message-id/CAMbWs4_imG5C8rXt7xdU7zf6whUDc2rdDun%2BVtrowcmxb41CzA%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAMbWs49%2BCjoy0S0xkCRDcHXGHvsYLOdvr9jq9OTONOBnsgzXOg%40mail.gmail.com Thanks Richard
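As an illustrative aside, the "Hits: 0" behavior in the plan above falls out of a simple rule: when a parameter changes that is not part of the cache key, cached entries cannot be trusted and must be purged. The toy Python model below (not executor code; the parameter names just echo the example query) shows how an incomplete cache key turns every rescan into a purge, while a complete key lets repeats hit the cache:

```python
class ToyMemoize:
    """Toy model of a memoize cache whose key may cover only some of
    the parameters the inner scan actually depends on."""
    def __init__(self, key_cols):
        self.key_cols = key_cols
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def rescan(self, params, changed, compute):
        # If a parameter changed that is not in the cache key, existing
        # entries cannot be trusted -- purge them all.  This is what
        # produces "Hits: 0" in the reported plan.
        if not set(changed) <= set(self.key_cols):
            self.cache.clear()
        key = tuple(params[c] for c in self.key_cols)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        self.cache[key] = compute(params)
        return self.cache[key]

def drive(key_cols, rows):
    """Run one rescan per outer row, diffing params against the
    previous row to build the set of changed parameters (chgParam)."""
    memo = ToyMemoize(key_cols)
    prev = None
    for row in rows:
        changed = [c for c in row if prev is None or row[c] != prev[c]]
        memo.rescan(row, changed, lambda p: p["hundred"])
        prev = row
    return memo.hits, memo.misses

rows = [{"hundred": i % 5, "two": i % 2} for i in range(20)]
print(drive(["hundred"], rows))         # (0, 20): "two" purges every rescan
print(drive(["hundred", "two"], rows))  # (10, 10): repeats hit the cache
```

The fix discussed in the thread amounts to making the planner include the lateral reference hidden inside the PHV (t1.two here) in the cache key, i.e. moving from the first call to the second.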
Re: set_cheapest without checking pathlist
On Thu, Feb 1, 2024 at 5:03 PM David Rowley wrote: > On Thu, 1 Feb 2024 at 19:36, Richard Guo wrote: > > maybe we can add an Assert or a comment clarifying that the pathlist of > > partially_grouped_rel cannot be empty here. > > There'd be no need to Assert that as set_cheapest() will raise an > error when given a rel with an empty pathlist. > > I don't think putting an Assert that would fail in the same situation > that we'd later ERROR out on would be a good idea. It'd make it harder > to debug the issue. Fair point. Thanks Richard
Re: Oversight in reparameterize_path_by_child leading to executor crash
On Fri, Feb 2, 2024 at 1:45 AM Tom Lane wrote: > I pushed v12 (with some cosmetic adjustments) into the back branches, > since we're getting close to the February release freeze. We still > need to deal with the larger fix for HEAD. Please re-post that > one so that the cfbot knows which is the patch-of-record. Thanks for the adjustments and pushing! Here is the patch for HEAD. I simply re-posted v10. Nothing has changed. Thanks Richard v10-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch Description: Binary data
Re: POC: GROUP BY optimization
On Fri, Feb 2, 2024 at 10:02 AM Tom Lane wrote:
> Alexander Korotkov writes:
> > I'm going to push this if there are no objections.
>
> One of the test cases added by this commit has not been very
> stable in the buildfarm.  Latest example is here:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
>
> and I've seen similar failures intermittently on other machines.
>
> I'd suggest building this test atop a table that is more stable
> than pg_class.  You're just waving a red flag in front of a bull
> if you expect stable statistics from that during a regression run.
> Nor do I see any particular reason for pg_class to be especially
> suited to the test.

Yeah, it's not a good practice to use pg_class in this place.  While
looking through the test cases added by this commit, I noticed some
other minor issues that are not great.  Such as:

* The table 'btg' is inserted with 1 tuples, which seems a bit
expensive for a test.  I don't think we need such a big table to test
what we want.

* I don't see why we need to manipulate GUC max_parallel_workers and
max_parallel_workers_per_gather.

* I think we'd better write the tests with the keywords being all upper
or all lower.  A mixed use of upper and lower is not great.  Such as in

explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;

* Some comments for the test queries are not easy to read.

* For this statement

CREATE INDEX idx_y_x_z ON btg(y,x,w);

I think the index name would cause confusion.  It creates an index on
columns y, x and w, but the name indicates an index on y, x and z.

I'd like to write a draft patch for the fixes.

Thanks
Richard
Re: POC: GROUP BY optimization
On Fri, Feb 2, 2024 at 11:32 AM Richard Guo wrote: > On Fri, Feb 2, 2024 at 10:02 AM Tom Lane wrote: > >> One of the test cases added by this commit has not been very >> stable in the buildfarm. Latest example is here: >> >> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04 >> >> and I've seen similar failures intermittently on other machines. >> >> I'd suggest building this test atop a table that is more stable >> than pg_class. You're just waving a red flag in front of a bull >> if you expect stable statistics from that during a regression run. >> Nor do I see any particular reason for pg_class to be especially >> suited to the test. > > > Yeah, it's not a good practice to use pg_class in this place. While > looking through the test cases added by this commit, I noticed some > other minor issues that are not great. Such as > > * The table 'btg' is inserted with 1 tuples, which seems a bit > expensive for a test. I don't think we need such a big table to test > what we want. > > * I don't see why we need to manipulate GUC max_parallel_workers and > max_parallel_workers_per_gather. > > * I think we'd better write the tests with the keywords being all upper > or all lower. A mixed use of upper and lower is not great. Such as in > > explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w; > > * Some comments for the test queries are not easy to read. > > * For this statement > > CREATE INDEX idx_y_x_z ON btg(y,x,w); > > I think the index name would cause confusion. It creates an index on > columns y, x and w, but the name indicates an index on y, x and z. > > I'd like to write a draft patch for the fixes. > Here is the draft patch that fixes the issues I complained about in upthread. Thanks Richard v1-0001-Multiple-revises-for-the-GROUP-BY-reordering-tests.patch Description: Binary data
Re: An improvement on parallel DISTINCT
On Fri, Feb 2, 2024 at 11:26 AM David Rowley wrote:
> In light of this, do you still think it's worthwhile making this change?
>
> For me, I think all it's going to result in is extra planner work
> without any performance gains.

Hmm, with the query below, I can see that the new plan is cheaper than
the old plan, and the cost difference exceeds STD_FUZZ_FACTOR.

create table t (a int, b int);
insert into t select i%10, i from generate_series(1,1000)i;
analyze t;

-- on master
explain (costs on) select distinct a from t order by a limit 1;
                                       QUERY PLAN
--------------------------------------------------------------------------------------
 Limit  (cost=120188.50..120188.51 rows=1 width=4)
   ->  Sort  (cost=120188.50..120436.95 rows=99379 width=4)
         Sort Key: a
         ->  HashAggregate  (cost=118697.82..119691.61 rows=99379 width=4)
               Group Key: a
               ->  Gather  (cost=97331.33..118200.92 rows=198758 width=4)
                     Workers Planned: 2
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=417 width=4)
(10 rows)

-- on patched
explain (costs on) select distinct a from t order by a limit 1;
                                       QUERY PLAN
--------------------------------------------------------------------------------------
 Limit  (cost=106573.93..106574.17 rows=1 width=4)
   ->  Unique  (cost=106573.93..130260.88 rows=99379 width=4)
         ->  Gather Merge  (cost=106573.93..129763.98 rows=198758 width=4)
               Workers Planned: 2
               ->  Sort  (cost=105573.91..105822.35 rows=99379 width=4)
                     Sort Key: a
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=417 width=4)
(9 rows)

It seems that including a LIMIT clause can potentially favor the new plan.

Thanks
Richard
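As an aside on the STD_FUZZ_FACTOR remark above: add_path() treats costs within 1% of each other as "fuzzily the same" (STD_FUZZ_FACTOR is 1.01 in pathnode.c), so a new path only clearly wins when it beats the old one by more than that margin. A toy restatement, using the total costs of the two Limit nodes shown above (this simplification looks at total cost only, whereas the real compare_path_costs_fuzzily() weighs startup cost and other criteria too):

```python
STD_FUZZ_FACTOR = 1.01  # costs within 1% are treated as "fuzzily the same"

def clearly_cheaper(new_total, old_total, fuzz=STD_FUZZ_FACTOR):
    """True if the new cost beats the old one by more than the fuzz
    factor, so the difference would not be dismissed as noise."""
    return new_total * fuzz < old_total

# Total costs of the two Limit nodes in the plans above:
print(clearly_cheaper(106574.17, 120188.51))  # True
```

Here 106574.17 * 1.01 is still well under 120188.51, which is the sense in which "the cost difference exceeds STD_FUZZ_FACTOR" in the message above.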
Re: Commitfest 2024-01 is now closed
On Fri, Feb 2, 2024 at 2:41 PM Michael Paquier wrote: > On Fri, Feb 02, 2024 at 11:56:36AM +0530, Amit Kapila wrote: > > On Fri, Feb 2, 2024 at 10:58 AM Tom Lane wrote: > >> Thanks for all the work you did running it! CFM is typically a > >> thankless exercise in being a nag, but I thought you put in more > >> than the usual amount of effort to keep things moving. > >> > > > > +1. Thanks a lot for all the hard work. > > +1. Thanks! +1. Thanks! Thanks Richard
Re: Check lateral references within PHVs for memoize cache keys
On Mon, Dec 25, 2023 at 3:01 PM Richard Guo wrote: > On Thu, Jul 13, 2023 at 3:12 PM Richard Guo > wrote: > >> So I'm wondering if it'd be better that we move all this logic of >> computing additional lateral references within PHVs to get_memoize_path, >> where we can examine only PHVs that are evaluated at innerrel. And >> considering that these lateral refs are only used by Memoize, it seems >> more sensible to compute them there. But I'm a little worried that >> doing this would make get_memoize_path too expensive. >> >> Please see v4 patch for this change. >> > > I'd like to add that not checking PHVs for lateral references can lead > to performance regressions with Memoize node. > The v4 patch does not apply any more. I've rebased it on master. Nothing else has changed. Thanks Richard v5-0001-Check-lateral-refs-within-PHVs-for-memoize-cache-keys.patch Description: Binary data
Re: An improvement on parallel DISTINCT
On Fri, Feb 2, 2024 at 6:39 PM David Rowley wrote: > So the gains increase with more parallel workers due to pushing more > work to the worker. Amdahl's law approves of this. > > I'll push the patch shortly. Thanks for the detailed testing and pushing the patch! Thanks Richard
Re: An improvement on parallel DISTINCT
On Fri, Feb 2, 2024 at 7:36 PM David Rowley wrote: > Now for the other stuff you had. I didn't really like this part: > > + /* > + * Set target for partial_distinct_rel as generate_useful_gather_paths > + * requires that the input rel has a valid reltarget. > + */ > + partial_distinct_rel->reltarget = cheapest_partial_path->pathtarget; > > I think we should just make it work the same way as > create_grouping_paths(), where grouping_target is passed as a > parameter. > > I've done it that way in the attached. The change looks good to me. BTW, I kind of doubt that 'create_partial_distinct_paths' is a proper function name given what it actually does. It not only generates distinct paths based on input_rel's partial paths, but also adds Gather/GatherMerge on top of these partially distinct paths, followed by a final unique/aggregate path to ensure uniqueness of the final result. So maybe 'create_parallel_distinct_paths' or something like that would be better? I asked because I noticed that in create_partial_grouping_paths(), we only generate partially aggregated paths, and any subsequent FinalizeAggregate step is called in the caller. Thanks Richard
Re: Reordering DISTINCT keys to match input path's pathkeys
cfbot reminds that this patch does not apply any more. So I've rebased it on master, and also adjusted the test cases a bit. Thanks Richard v2-0001-Reordering-DISTINCT-keys-to-match-input-path-s-pathkeys.patch Description: Binary data
cfbot does not list patches in 'Current commitfest'
... and the patches in CF #47 (currently open) are listed in 'Next commitfest'. I guess we need to manually create a "next" commitfest? Thanks Richard
Re: Properly pathify the union planner
On Fri, Nov 24, 2023 at 6:29 AM David Rowley wrote:
> I've attached the updated patch.  This one is probably ready for
> someone to test out.  There will be more work to do, however.

I just started reviewing this patch and haven't looked through all the
details yet.  Here is some feedback that came to mind.  I'm posting it
now so that I don't forget it after the holidays.

* I think we should update truncate_useless_pathkeys() to account for
the ordering requested by the query's set operation; otherwise we may
not get a subquery's path with the expected pathkeys.  For instance,

create table t (a int, b int);
create index on t (a, b);
set enable_hashagg to off;

-- on v1 patch
explain (costs off)
(select * from t order by a) UNION (select * from t order by a);
                      QUERY PLAN
------------------------------------------------------
 Unique
   ->  Merge Append
         Sort Key: t.a, t.b
         ->  Incremental Sort
               Sort Key: t.a, t.b
               Presorted Key: t.a
               ->  Index Only Scan using t_a_b_idx on t
         ->  Incremental Sort
               Sort Key: t_1.a, t_1.b
               Presorted Key: t_1.a
               ->  Index Only Scan using t_a_b_idx on t t_1
(11 rows)

-- after accounting for setop_pathkeys in truncate_useless_pathkeys()
explain (costs off)
(select * from t order by a) UNION (select * from t order by a);
                   QUERY PLAN
------------------------------------------------
 Unique
   ->  Merge Append
         Sort Key: t.a, t.b
         ->  Index Only Scan using t_a_b_idx on t
         ->  Index Only Scan using t_a_b_idx on t t_1
(5 rows)

* I understand that we need to sort (full or incremental) the paths of
the subqueries to meet the ordering required for setop_pathkeys, so that
MergeAppend has something to work with.  Currently in the v1 patch this
sorting is performed during the planning phase of the subqueries (in
grouping_planner).  And we want to add the subquery's
cheapest_total_path as-is to allow that path to be used in hash-based
UNIONs, and we also want to add a sorted path on top of
cheapest_total_path.  And then we may encounter the issue you mentioned
earlier regarding add_path() potentially freeing the
cheapest_total_path, leaving the Sort's subpath gone.
I'm thinking that maybe it'd be better to move the work of sorting the
subquery's paths to the outer query level, specifically within the
build_setop_child_paths() function, just before we stick
SubqueryScanPath on top of the subquery's paths.  I think this is better
because:

1. This minimizes the impact on subquery planning and reduces the
footprint within the grouping_planner() function as much as possible.

2. This can help avoid the aforementioned add_path() issue because the
two involved paths will be structured as:

cheapest_path -> subqueryscan
and
cheapest_path -> sort -> subqueryscan

If the two paths cost fuzzily the same and add_path() decides to keep
the second one due to it having better pathkeys and pfree the first one,
it would not be a problem.

BTW, I haven't looked through the part involving partial paths, but I
think we can do the same for partial paths.

* I noticed that in generate_union_paths() we use a new function
build_setop_pathkeys() to build the 'union_pathkeys'.  I wonder why we
don't simply utilize make_pathkeys_for_sortclauses() since we already
have the grouplist for the setop's output columns.

To assist the discussion I've attached a diff file that includes all the
changes above.

Thanks
Richard

v1-0001-review-diff.patch Description: Binary data
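As an aside, the add_path() scenario described in point 2 — two paths of fuzzily the same cost where the better-sorted one survives and the other is freed — can be sketched with a toy model. This is heavily simplified and is not PostgreSQL code (the real add_path() weighs startup cost, parameterization, parallel safety, and row counts too); path names and costs are invented:

```python
FUZZ = 1.01  # costs within 1% are "fuzzily the same"

def is_prefix(short, long_):
    """True if `short` is a leading prefix of `long_` (pathkey usefulness)."""
    return long_[:len(short)] == short

def add_path(pathlist, new):
    """Toy add_path(): drop the newcomer if an existing path is no more
    expensive (within the fuzz factor) and at least as well sorted;
    otherwise keep it and discard paths it now makes redundant."""
    for old in pathlist:
        if old["cost"] <= new["cost"] * FUZZ and is_prefix(new["keys"], old["keys"]):
            return pathlist
    return [old for old in pathlist
            if not (new["cost"] <= old["cost"] * FUZZ
                    and is_prefix(old["keys"], new["keys"]))] + [new]

plain = {"name": "cheapest -> SubqueryScan", "cost": 100.0, "keys": ()}
sorted_scan = {"name": "cheapest -> Sort -> SubqueryScan",
               "cost": 100.5, "keys": ("a",)}

# Fuzzily the same cost, better pathkeys: the unsorted path is dropped.
paths = add_path([plain], sorted_scan)
print([p["name"] for p in paths])  # ['cheapest -> Sort -> SubqueryScan']
```

With the path structure proposed in point 2, the discarded path is a sibling of the surviving one rather than the Sort's own subpath, so freeing it leaves nothing dangling.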
Re: Wrong query results caused by loss of join quals
On Thu, Feb 23, 2023 at 4:50 AM Tom Lane wrote: > I thought about this and decided that it's not really a problem. > have_relevant_joinclause is just a heuristic, and I don't think we > need to prioritize forming a join if the only relevant clauses look > like this. We won't be able to use such clauses for merge or hash, > so we're going to end up with an unconstrained nestloop, which isn't > something to be eager to form. The join ordering rules will take > care of forcing us to make the join when necessary. Agreed. And as I tried, in lots of cases joins with such clauses would be accepted by have_join_order_restriction(), which always appears with have_relevant_joinclause(). > The only easy improvement I can see to make here is to apply the old > rules at inner joins. Maybe it's worth complicating the data structures > to be smarter at outer joins, but I rather doubt it: we could easily > expend more overhead than we'll save here by examining irrelevant ECs. > In any case, if there is a useful optimization here, it can be pursued > later. This makes sense. > I changed it anyway after noting that (a) passing in the ojrelid is > needful to be able to distinguish inner and outer joins, and > (b) the existing comment about the join_relids input is now wrong. > Even if it happens to not be borked for current callers, that seems > like a mighty fragile assumption. Agreed. This is reasonable. > Less-hasty v2 patch attached. I think the patch is in good shape now. Thanks Richard
Re: Making empty Bitmapsets always be NULL
On Thu, Mar 2, 2023 at 6:59 AM Tom Lane wrote:
> Yeah.  I split out those executor fixes as 0002; 0003 is the changes
> to bitmapsets proper, and then 0004 removes now-dead code.

+1 to all these patches.  Some minor comments from me.

* 0003

It seems that the Bitmapset checked by bms_is_empty_internal cannot be
NULL, judging from how it is computed.  So I wonder if we can remove the
check of 'a' being NULL in that function, or reduce it to an Assert.

-   if (a == NULL)
-       return true;
+   Assert(a != NULL);

* 0004

It seems that in create_lateral_join_info around line 689, the
bms_is_empty check of lateral_relids is not necessary, since we've
checked that lateral_relids cannot be NULL several lines earlier.

@@ -682,12 +682,6 @@ create_lateral_join_info(PlannerInfo *root)
        if (lateral_relids == NULL)
            continue;

-       /*
-        * We should not have broken the invariant that lateral_relids is
-        * exactly NULL if empty.
-        */
-       Assert(!bms_is_empty(lateral_relids));
-

Thanks
Richard
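As an illustrative aside, the invariant behind this patch series — an empty Bitmapset is always represented as NULL, so set operations normalize their results and emptiness becomes a simple NULL test — can be modeled in a few lines. This is a toy (an int bitmask with None standing for NULL), not PostgreSQL's bitmapset code:

```python
def bms_del_member(a, x):
    """Delete bit x, normalizing an all-zero result to None -- the
    'empty is always NULL' invariant the patches enforce."""
    if a is None:
        return None
    a &= ~(1 << x)
    return None if a == 0 else a

def bms_is_empty(a):
    # With the invariant in place, emptiness is just a NULL test; the
    # internal emptiness check never needs to accept a NULL argument.
    return a is None

print(bms_is_empty(bms_del_member(0b100, 2)))  # True
```

This is also why the Assert removed in 0004 above became dead code: once every operation normalizes, `lateral_relids == NULL` and "empty" are the same condition.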
Assert failure of the cross-check for nullingrels
While looking into issue [1], I came across $subject on master.  Below
is how to reproduce it.

DROP TABLE IF EXISTS t1,t2,t3,t4 CASCADE;
CREATE TABLE t1 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t2 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t3 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t4 AS SELECT true AS x FROM generate_series(0,1) x;
ANALYZE;

explain (costs off)
select * from t1
    left join (t2 left join t3 on t2.x) on t2.x
    left join t4 on t3.x and t2.x
where t1.x = coalesce(t2.x,true);

I've looked into this a little bit.  For the join of t2/t3 to t4, since
it can commute with the join of t1 to t2/t3 according to identity 3, we
would generate multiple versions for its joinquals.  In particular, the
qual 't3.x' would have two versions, one with varnullingrels as {t2/t3,
t1/t2}, the other one with varnullingrels as {t2/t3}.  So far so good.

Assume we've determined to build the join of t2/t3 to t4 after we've
built t1/t2 and t2/t3, then we'd find that both versions of qual 't3.x'
would be accepted by clause_is_computable_at.  This is not correct.  We
are supposed to accept only the one marked as {t2/t3, t1/t2}.  The other
one is not rejected mainly because we found that the qual 't3.x' does
not mention any nullable Vars of outer join t1/t2.

I wonder if we should consider syn_xxxhand rather than min_xxxhand in
clause_is_computable_at when we check if clause mentions any nullable
Vars.  But I'm not sure about that.

[1] https://www.postgresql.org/message-id/flat/0b819232-4b50-f245-1c7d-c8c61bf41827%40postgrespro.ru

Thanks
Richard
Re: Assert failure of the cross-check for nullingrels
On Fri, Mar 10, 2023 at 4:13 PM Richard Guo wrote:
> I wonder if we should consider syn_xxxhand rather than min_xxxhand in
> clause_is_computable_at when we check if clause mentions any nullable
> Vars.  But I'm not sure about that.

No, considering syn_xxxhand is not right.  After some join order
commutation we may form the join with only its min_lefthand and
min_righthand.  In this case, if we check against syn_xxxhand rather
than min_xxxhand in clause_is_computable_at, we may end up being unable
to find a proper place for some quals.  I can see this problem in the
query below.

select * from t1
    left join
        ((select t2.x from t2 left join t3 on t2.x where t3.x is null) s
         left join t4 on s.x)
    on s.x = t1.x;

Suppose we've formed join t1/t2 and go ahead to form the join of t1/t2
to t3.  If we consider t1/t2 join's syn_xxxhand, then the pushed down
qual 't3.x is null' would not be computable at this level because it
mentions nullable Vars from t1/t2 join's syn_righthand and meanwhile is
not marked with t1/t2 join.  This is not correct and would trigger an
Assert.

Back to the original issue, if a join has more than one qual, actually
we treat them as a whole when we check if identity 3 applies, as well as
when we adjust them to be suitable for commutation according to identity
3.  So when we check if a qual is computable at a given level, I think
we should also consider the join's quals as a whole.  I'm thinking that
we use a 'group' notion for RestrictInfos and then use the clause_relids
of the 'group' in clause_is_computable_at.  Does this make sense?

Thanks
Richard
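As an illustrative aside, the 'group' idea proposed above can be sketched abstractly: instead of testing each qual's clause_relids against the evaluation level on its own, test the union of the clause_relids of all quals belonging to the same join. The toy Python model below is not planner code, and all the relid numbers are invented (with 12 standing in for the t1/t2 outer-join relid):

```python
def clause_is_computable_at(clause_relids, eval_relids):
    """A clause version can be computed at a join level that covers all
    the relids (including outer-join relids) it was adjusted for."""
    return clause_relids <= eval_relids

def group_is_computable_at(group, eval_relids):
    """The 'group' notion: treat all quals belonging to the same join
    as a unit by testing the union of their clause_relids."""
    group_relids = set().union(*(q["clause_relids"] for q in group))
    return group_relids <= eval_relids

# Two versions of the same join's quals: a bare one and one adjusted
# for commutation with the t1/t2 join (relid 12).
quals = [{"clause_relids": {3}}, {"clause_relids": {2, 3, 12}}]

# Checked individually, the bare {3} version is accepted too early...
print(clause_is_computable_at({3}, {2, 3, 4}))          # True
# ...while checking the group as a whole rejects it at that level.
print(group_is_computable_at(quals, {2, 3, 4}))         # False
print(group_is_computable_at(quals, {1, 2, 3, 4, 12}))  # True
```

This mirrors the failure described upthread: the version of 't3.x' not marked with the t1/t2 join was wrongly accepted when judged in isolation, but is correctly filtered out when its whole group's relids are considered.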
Re: Assert failure of the cross-check for nullingrels
On Mon, Mar 13, 2023 at 5:03 PM Richard Guo wrote: > Back to the original issue, if a join has more than one quals, actually > we treat them as a whole when we check if identity 3 applies as well as > when we adjust them to be suitable for commutation according to identity > 3. So when we check if a qual is computable at a given level, I think > we should also consider the join's quals as a whole. I'm thinking that > we use a 'group' notion for RestrictInfos and then use the clause_relids > of the 'group' in clause_is_computable_at. Does this make sense? > I'm imagining something like attached (no comments and test cases yet). Thanks Richard v1-0001-Draft-group-RestrictInfos.patch Description: Binary data
Re: Using each rel as both outer and inner for JOIN_ANTI
On Wed, Mar 15, 2023 at 2:25 AM Gregory Stark (as CFM) wrote: > So what is the status of this patch? > > It looks like you received some feedback from Emre, Tom, Ronan, and > Alvaro but it also looks like you responded to most or all of that. > Are you still blocked waiting for feedback? Anything specific you need > help with? > > Or is the patch ready for commit now? In which case it would be good > to rebase it since it's currently failing to apply. Well it would be > good to rebase regardless but it would be especially important if we > want to get it committed :) Thanks for reminding. Attached is the rebased patch, with no other changes. I think the patch is ready for commit. Thanks Richard v6-0001-Using-each-rel-as-both-outer-and-inner-for-anti-joins.patch Description: Binary data
Re: postgres_fdw: Useless if-test in GetConnection()
On Wed, Mar 15, 2023 at 6:18 PM Etsuro Fujita wrote: > While working on something else, I noticed that the “if (entry->conn > == NULL)” test after doing disconnect_pg_server() when re-establishing > a given connection in GetConnection() is pointless, because the former > function ensures that entry->conn is NULL. So I removed the if-test. > Attached is a patch for that. I think we could instead add an > assertion, but I did not, because we already have it in > make_new_connection(). +1. Good catch. Thanks Richard
Re: Assert failure of the cross-check for nullingrels
On Mon, Mar 13, 2023 at 5:44 PM Richard Guo wrote: > On Mon, Mar 13, 2023 at 5:03 PM Richard Guo > wrote: > >> Back to the original issue, if a join has more than one quals, actually >> we treat them as a whole when we check if identity 3 applies as well as >> when we adjust them to be suitable for commutation according to identity >> 3. So when we check if a qual is computable at a given level, I think >> we should also consider the join's quals as a whole. I'm thinking that >> we use a 'group' notion for RestrictInfos and then use the clause_relids >> of the 'group' in clause_is_computable_at. Does this make sense? >> > > I'm imagining something like attached (no comments and test cases yet). > Here is an updated patch with comments and test case. I also change the code to store 'group_clause_relids' directly in RestrictInfo. Thanks Richard v2-0001-Check-group_clause_relids-to-see-if-a-clause-is-computable.patch Description: Binary data
Re: A problem about ParamPathInfo for an AppendPath
On Fri, Mar 17, 2023 at 6:15 AM Tom Lane wrote: > Pushed. I thought the comment needed to be completely rewritten not just > tweaked, and I felt it was probably reasonable to continue to exclude > dummy paths from getting the more expensive treatment. Yes agreed. Thanks for the changes and pushing. Thanks Richard
Re: An oversight in ExecInitAgg for grouping sets
On Mon, Jan 9, 2023 at 5:21 PM David Rowley wrote: > On Thu, 5 Jan 2023 at 20:06, Richard Guo wrote: > > I reviewed this patch and have some comments. > > Thanks for looking at this. I think I've fixed all the issues you > mentioned. > > One extra thing I noticed was that I had to add a new > VALGRIND_MAKE_MEM_DEFINED in AllocSetAlloc when grabbing an item off > the freelist. I didn't quite manage to figure out why that's needed as > when we do AllocSetFree() we don't mark the pfree'd memory with > NOACCESS, and it also looks like AllocSetReset() sets the keeper > block's memory to NOACCESS, but that function also clears the > freelists too, so the freelist chunk is not coming from a recently > reset context. > > I might need to spend a bit more time on this to see if I can figure > out why this is happening. On the other hand, maybe we should just > mark pfree'd memory as NOACCESS as that might find another class of > issues. It occurred to me that this hasn't been applied. Should we add it to the CF to not lose track of it? Thanks Richard
Re: Missing rules for queryjumblefuncs.{funcs,switch}.c for maintainer-clean
On Mon, Mar 20, 2023 at 2:43 PM Michael Paquier wrote: > Nathan has reported to me offlist that maintainer-clean was not doing > its job for the files generated by gen_node_support.pl in > src/backend/nodes/ for the query jumbling. Attached is a patch to > take care of this issue. > > While on it, I have found a comment in the related README that was > missing a refresh. > > Any objections or comments? A minor comment on the README: we now have five support functions, not four.

- outcome.  (For some classes of node types, you don't need all four support
+ outcome.  (For some classes of node types, you don't need all five support

Thanks Richard
Re: Missing rules for queryjumblefuncs.{funcs,switch}.c for maintainer-clean
On Mon, Mar 20, 2023 at 3:49 PM Daniel Gustafsson wrote: > > On 20 Mar 2023, at 08:46, Michael Paquier wrote: > > How about removing the "fout/five" entirely here > > and make that simpler? I would propose: > > "For some classes of node types, you don't need all the support > > functions." > > Yes please, keeping such counts in sync is always error-prone. Agreed. +1 to remove the counts. Thanks Richard
Re: Fix typo plgsql to plpgsql.
On Tue, Mar 21, 2023 at 12:26 AM Zhang Mingli wrote: > Found several typos like plgsql, I think it should be plpgsql. > +1. I believe these are indeed typos, and a grep search shows that the patch addresses all typos of this kind. Thanks Richard
Re: Comment in preptlist.c
On Tue, Mar 21, 2023 at 5:41 PM Etsuro Fujita wrote: > While working on something else, I noticed $SUBJECT added by commit > 86dc90056: > > * For UPDATE and DELETE queries, the targetlist must also contain "junk" > * tlist entries needed to allow the executor to identify the rows to be > * updated or deleted; for example, the ctid of a heap row. (The planner > * adds these; they're not in what we receive from the planner/rewriter.) > > I think that “planner/rewriter” should be parser/rewriter. Attached > is a patch for that. Yes of course. It should be parser/rewriter here. Thanks Richard