Re: Removing unneeded self joins

2022-03-04 Thread Andrey Lepikhov

On 1/3/2022 03:03, Greg Stark wrote:

On Thu, 1 Jul 2021 at 02:38, Ronan Dunklau  wrote:


Well in some cases they can't, when the query is not emitting redundant
predicates by itself but they are added by something else like a view or a RLS
policy.
Maybe it would be worth it to allow spending a bit more time planning for
those cases ?


Yeah, I'm generally in favour of doing more work in the optimizer to
save query authors work writing queries.

My question is whether it handles cases like:

select b.x,c.y
   from t
join t2 as b on (b.id = t.id)
   join t2 as c on (c.id = t.id)

That is, if you join against the same table twice on the same qual.
Does the EC mechanism turn this into a qual on b.id = c.id and then
turn this into a self-join that can be removed?
Yes, the self-join removal machinery uses the EC mechanism as usual to get
all the join clauses. So this case works (see the demo in the attachment).
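For reference, a minimal sketch of such a demo (hypothetical table
definitions; it assumes the patch is applied and the enable_self_join_removal
GUC it adds is enabled):

-- Both joins against t2 use the same qual on t.id, so the EC machinery
-- derives b.id = c.id and one of the t2 scans can be removed.
CREATE TABLE t  (id int PRIMARY KEY);
CREATE TABLE t2 (id int PRIMARY KEY, x int, y int);

EXPLAIN (COSTS OFF)
SELECT b.x, c.y
  FROM t
  JOIN t2 AS b ON (b.id = t.id)
  JOIN t2 AS c ON (c.id = t.id);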


Also, in the new version of the patch I fixed one silly bug in the check of a
self-join candidate's expression operator: we can remove only expressions of
the form F(arg1) = G(arg2).


--
regards,
Andrey Lepikhov
Postgres Professional

From 70398361a0a0d9c6c3c7ddd1fd305ac11138e7b1 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

Remove inner joins of a relation to itself if we can prove that the join
can be replaced with a scan. We can prove the uniqueness
using the existing innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x.
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then each outer row matches only the same row from the inner
relation. This proves that the join is a self-join and can be replaced by
a scan.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 888 +-
 src/backend/optimizer/plan/planmain.c |   5 +
 src/backend/optimizer/util/joininfo.c |   3 +
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/backend/utils/misc/guc.c  |  10 +
 src/include/optimizer/pathnode.h  |   4 +
 src/include/optimizer/planmain.h  |   2 +
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 426 +++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/equivclass.sql   |  16 +
 src/test/regress/sql/join.sql | 197 +
 12 files changed, 1583 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..c5ac8e2bd4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
+bool enable_self_join_removal;
+
 /* local functions */
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
-                                  Relids joinrelids);
+                                  Relids joinrelids, int subst_relid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
                                    RelOptInfo *innerrel,
                                    JoinType jointype,
                                    List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
 
 
 /*
@@ -86,7 +92,7 @@ restart:
 
remove_rel_from_query(root, innerrelid,
  
bms_union(sjinfo->min_lefthand,
-   
sjinfo->min

Re: [POC] Fast COPY FROM command for the table with foreign partitions

2020-12-29 Thread Andrey Lepikhov

On 29.12.2020 16:20, Hou, Zhijie wrote:

see new version in attachment.


I took a look into the patch, and have some comments.

1.
+   PG_FINALLY();
+   {
+   copy_fmstate = NULL; /* Detect problems */
I don't quite understand this comment;
does it mean we want to detect something like a null reference?


2.
+   PG_FINALLY();
+   {
...
+   if (!OK)
+   PG_RE_THROW();
+   }
Is this PG_RE_THROW() necessary ?
IMO, PG_FINALLY will reproduce the PG_RE_THROW action if we get to the code 
block due to an error being thrown.


This is a leftover from the debugging stage. Fixed.


3.
+   ereport(ERROR,
+           (errmsg("unexpected extra results during COPY of table: %s",
+                   PQerrorMessage(conn))));

I found some similar message like the following:

pg_log_warning("unexpected extra results during COPY of table \"%s\"",
               tocEntryTag);
How about using the existing message style?


This style is intended for use in frontend utilities, not for contrib
extensions, I think.


4.
I noticed some non-standard code comments[1].
I think it's better to format comments like:
/*
  * line 1
  * line 2
  */

[1]---
+   /* Finish COPY IN protocol. It is needed to do after successful copy or
+    * after an error.
+    */


+/*
+ *
+ * postgresExecForeignCopy

+/*
+ *
+ * postgresBeginForeignCopy

Thanks, fixed.
The patch in the attachment is rebased on 107a2d4204.


--
regards,
Andrey Lepikhov
Postgres Professional
From 1c5439d802b7654ee50dc4326b9bc24fc7f44677 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Mon, 14 Dec 2020 13:37:40 +0500
Subject: [PATCH 2/2] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert is
possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free the CopyState
used for bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, iteratively sends
tuples via the CopyTo() machinery, and finally sends EOF on that connection.

The code that constructed the list of columns for a given foreign relation
in deparseAnalyzeSql() is moved into deparseRelColumnList() and reused
in deparseCopyFromSql().

TAP tests are added for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure is extended with a
data_dest_cb callback used to send the text representation of a tuple to
a custom destination.
The PgFdwModifyState structure is extended with a cstate field to avoid
repeated initialization of the CopyState. For the same reason, the CopyTo()
routine is split into the set of routines CopyToStart()/CopyTo()/CopyToFinish().

The CopyInsertMethod enum is removed; this logic is now implemented by the
ri_usesMultiInsert field of the ResultRelInfo structure.

Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c|  60 ++--
 .../postgres_fdw/expected/postgres_fdw.out|  46 ++-
 contrib/postgres_fdw/postgres_fdw.c   | 130 ++
 contrib/postgres_fdw/postgres_fdw.h   |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql |  45 ++
 doc/src/sgml/fdwhandler.sgml  |  73 ++
 src/backend/commands/copy.c   |   4 +-
 src/backend/commands/copyfrom.c   | 126 ++---
 src/backend/commands/copyto.c |  84 ---
 src/backend/executor/execMain.c   |   8 +-
 src/backend/executor/execPartition.c  |  27 +++-
 src/include/commands/copy.h   |   8 +-
 src/include/foreign/fdwapi.h  |  15 ++
 13 files changed, 533 insertions(+), 94 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ca2f9f3215..b2a71faabc 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List 
*targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,

deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+                                  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1763,6 +1765,20 @@ d

Increase value of OUTER_VAR

2021-03-03 Thread Andrey Lepikhov

Hi,

Playing with a large number of partitions, I hit the limit of 65000 range
table entries in a query plan:


	if (IS_SPECIAL_VARNO(list_length(glob->finalrtable)))
		ereport(ERROR,
				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
				 errmsg("too many range table entries")));

Postgres otherwise works well with that many partitions.
The constants INNER_VAR, OUTER_VAR and INDEX_VAR are used as values of the
integer field var->varno. As far as I can see, they were introduced
with commit 1054097464, authored by Marc G. Fournier in 1996.

The value 65000 was related to the size of the int type at that time.

Maybe we should change these values to the INT_MAX range? (See the attached
patch, and the reproduction sketch below.)
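For reference, a sketch of how the limit can be hit (hypothetical table
names; creating this many partitions is slow and memory-hungry):

-- A query over ~66000 partitions puts more than 65000 entries into the
-- final range table and triggers the error above.
CREATE TABLE p (id int, payload text) PARTITION BY HASH (id);

DO $$
BEGIN
    FOR i IN 0..65999 LOOP
        EXECUTE format('CREATE TABLE p_%s PARTITION OF p
                        FOR VALUES WITH (MODULUS 66000, REMAINDER %s)', i, i);
    END LOOP;
END $$;

-- Without a pruning clause every partition gets a range table entry:
-- SELECT count(*) FROM p;   -- ERROR: too many range table entries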

--
regards,
Andrey Lepikhov
Postgres Professional
From 98da77cdefd53dbf4ce0c740d1a0f356da970648 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 3 Mar 2021 11:22:32 +0300
Subject: [PATCH] Set values of special varnos to the upper bound of the int
 type

---
 src/include/nodes/primnodes.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index d4ce037088..9038d97886 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -168,9 +168,9 @@ typedef struct Expr
  * in the planner and doesn't correspond to any simple relation column may
  * have varnosyn = varattnosyn = 0.
  */
-#defineINNER_VAR   65000   /* reference to inner subplan */
-#defineOUTER_VAR   65001   /* reference to outer subplan */
-#defineINDEX_VAR   65002   /* reference to index column */
+#defineINNER_VAR   (INT_MAX - 2)   /* reference to inner subplan */
+#defineOUTER_VAR   (INT_MAX - 1)   /* reference to outer subplan */
+#defineINDEX_VAR   (INT_MAX)   /* reference to index column */
 
 #define IS_SPECIAL_VARNO(varno)((varno) >= INNER_VAR)
 
-- 
2.29.2



Re: Increase value of OUTER_VAR

2021-03-03 Thread Andrey Lepikhov

On 3/3/21 12:52, Julien Rouhaud wrote:

On Wed, Mar 3, 2021 at 4:57 PM Amit Langote  wrote:


On Wed, Mar 3, 2021 at 5:52 PM David Rowley  wrote:

Something like 1 million seems like a more realistic limit to me.
That might still be on the high side, but it'll likely mean we'd not
need to revisit this for quite a while.


+1

Also, I got reminded of this discussion from not so long ago:

https://www.postgresql.org/message-id/flat/16302-e45634e2c0e34e97%40postgresql.org

Thank you


+1

Ok. I changed the value to 1 million and explained this decision in the 
comment.

This issue is typically caused by two cases:
1. Range partitioning on a timestamp column.
2. Hash partitioning.
Users choose range partitioning by timestamp because they want to insert
new data quickly and analyze the entire data set.
Also, in some discussions I have seen Oracle users reporting issues with
more than 1e5 partitions.


--
regards,
Andrey Lepikhov
Postgres Professional
From a3c1ee9d2e197dee40aed81cb6a08695a8fa2917 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 3 Mar 2021 11:22:32 +0300
Subject: [PATCH] Increase values of special varnos to 1 million. Use this
 value as a realistic limit on the number of range table entries. Restricting
 it helps detect possible errors which could cause a palloc() over an array of
 these elements to exceed MaxAllocSize. This value can be changed later to
 suit real-world needs.

---
 src/include/nodes/primnodes.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index d4ce037088..06016340a3 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -168,9 +168,9 @@ typedef struct Expr
  * in the planner and doesn't correspond to any simple relation column may
  * have varnosyn = varattnosyn = 0.
  */
-#defineINNER_VAR   65000   /* reference to inner subplan */
-#defineOUTER_VAR   65001   /* reference to outer subplan */
-#defineINDEX_VAR   65002   /* reference to index column */
+#defineINNER_VAR   100 /* reference to inner subplan */
+#defineOUTER_VAR   101 /* reference to outer subplan */
+#defineINDEX_VAR   102 /* reference to index column */
 
 #define IS_SPECIAL_VARNO(varno)((varno) >= INNER_VAR)
 
-- 
2.29.2



Re: [POC] Fast COPY FROM command for the table with foreign partitions

2020-11-23 Thread Andrey Lepikhov




On 11/23/20 7:49 AM, tsunakawa.ta...@fujitsu.com wrote:

Hi Andrey-san,

From: Tomas Vondra 

I needed to look at this patch while working on something related, and I found 
it
got broken by 6973533650c a couple days ago. So here's a fixed version, to keep
cfbot happy. I haven't done any serious review yet.


Could I or my colleague continue this patch in a few days?  It looks it's 
stalled over one month.


I haven't found any problems with this patch that need to be corrected.
It is waiting for action from the committers' side, I think.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: [POC] Fast COPY FROM command for the table with foreign partitions

2020-11-23 Thread Andrey Lepikhov




On 11/24/20 9:27 AM, tsunakawa.ta...@fujitsu.com wrote:

Andrey-san, Fujita-san,

From: Etsuro Fujita 

On Mon, Nov 23, 2020 at 5:39 PM Andrey Lepikhov
 wrote:

On 11/23/20 7:49 AM, tsunakawa.ta...@fujitsu.com wrote:

Could I or my colleague continue this patch in a few days?  It looks it's

stalled over one month.


I haven't found any problems with this patch that need to be corrected.
It is waiting for action from the committers' side, I think.


I'm planning to review this patch.  I think it would be better for
another pair of eyes to take a look at it, though.



There are the following two issues left untouched.

https://www.postgresql.org/message-id/TYAPR01MB2990DC396B338C98F27C8ED3FE1F0%40TYAPR01MB2990.jpnprd01.prod.outlook.com


I disagree with your opinion about changing the interface of the 
ExecRelationAllowsMultiInsert routine. If you insist on the need for 
this change, we need another opinion.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Removing unneeded self joins

2020-11-28 Thread Andrey Lepikhov

Thank you for the review,

On 27.11.2020 21:49, Heikki Linnakangas wrote:

On 31/10/2020 11:26, Andrey V. Lepikhov wrote:

+    /*
+ * Process restrictlist to seperate out the self join 
quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a 
= b to

+ * otherjoinquals.
+ */
+    split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+    if (list_length(selfjoinquals) == 0)
+    {
+    /*
+ * Have a chance to remove join if target list 
contains vars from

+ * the only one relation.
+ */


I don't understand the logic here. If 'selfjoinquals' is empty, it means 
that there is no join qual between the two relations, right? How can we 
ever remove the join in that case? And how does the target list affect 
that? Can you give an example query of that?


Maybe it is a problem of variable naming. Following David Rowley's idea, we
split the quals into two subsets: {x = x} and the rest, for example {x = y}.
The first set is the trivial case of a self-join: if we have a unique index
on the attribute 'x', then this join is a self-join.
The second set gives us another chance: if the inner relation is unique over
the right-hand side of the qual and no vars from the inner side end up in the
target list of the join, then this is also a self-join case. Example:


CREATE TABLE a(x int, y int);
CREATE UNIQUE INDEX ON a(x);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x;  -- self-join
CREATE UNIQUE INDEX ON a(y);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y;  -- self-join too




--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4553,11 +4553,13 @@ explain (costs off)
 select p.* from
   (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
   where p.k = 1 and p.k = 2;
-        QUERY PLAN        
---------------------------
+                   QUERY PLAN                    
+--------------------------------------------------
  Result
    One-Time Filter: false
-(2 rows)
+   ->  Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)

 -- bug 5255: this is not optimizable by join removal
 begin;


That doesn't seem like an improvement...


I investigated this case. It is a planner peculiarity: it simplifies dummy
joins and dummy scans into different plans. Maybe it deserves some
discussion, but this example is so rare that it doesn't seem to be a real problem.


My general impression of this patch is that it's a lot of code to deal 
with a rare special case. At the beginning of this thread there was 
discussion on the overhead that this might add to planning queries that 
don't benefit, but adding a lot of code isn't nice either, even if the 
performance is acceptable. That's not necessarily a show-stopper, but it 
does make me much less excited about this. I'm not sure what to suggest 
to do about that, except a really vague "Can you make it simpler?"


Currently the planner removes useless outer joins and unique semijoins.
The self-join removal feature continues the development of the planner in
the same direction; for example, it is useful for queries generated by ORM
software.
Most of the code is dedicated to removing the unnecessary relation and
replacing one oid with another. We are trying to reuse the
remove_rel_from_query() machinery; perhaps this will allow us to make
the code shorter.


--
regards,
Andrey Lepikhov
Postgres Professional




Cost overestimation of foreign JOIN

2020-11-30 Thread Andrey Lepikhov

Hi,

While testing the Asynchronous Append feature with TPC-H queries, I found
that the JOIN push-down technique is rarely used.

On my servers, fdw_tuple_cost = 0.2 and fdw_startup_cost = 100.
Exploring the code, I found the following in postgres_fdw,
estimate_path_cost_size(), lines 2908-2909:

run_cost += nrows * join_cost.per_tuple;
nrows = clamp_row_est(nrows * fpinfo->joinclause_sel);

Above:
nrows = fpinfo_i->rows * fpinfo_o->rows;

Maybe lines 2908 and 2909 should be swapped (see the attachment)?

In my case of two big partitioned tables and a small join result, this
strongly influences the choice of the JOIN push-down strategy.


--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/contrib/postgres_fdw/postgres_fdw.c 
b/contrib/postgres_fdw/postgres_fdw.c
index b6c72e1d1e..3047300c4b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2905,8 +2905,8 @@ estimate_path_cost_size(PlannerInfo *root,
 */
 		run_cost = fpinfo_i->rel_total_cost - fpinfo_i->rel_startup_cost;
 		run_cost += fpinfo_o->rel_total_cost - fpinfo_o->rel_startup_cost;
-		run_cost += nrows * join_cost.per_tuple;
 		nrows = clamp_row_est(nrows * fpinfo->joinclause_sel);
+		run_cost += nrows * join_cost.per_tuple;
 		run_cost += nrows * remote_conds_cost.per_tuple;
 		run_cost += fpinfo->local_conds_cost.per_tuple * retrieved_rows;
 


Re: Cost overestimation of foreign JOIN

2020-11-30 Thread Andrey Lepikhov

On 30.11.2020 22:38, Tom Lane wrote:

Andrey Lepikhov  writes:

Maybe it is needed to swap lines 2908 and 2909 (see attachment)?


No; as explained in the comment immediately above here, we're assuming
that the join conditions will be applied on the cross product of the
input relations.


Thank you. Now it is clear to me.


Now admittedly, that's a worst-case assumption, since it amounts to
expecting that the remote server will do the join in the dumbest
possible nested-loop way.  If the remote can use a merge or hash
join, for example, the cost is likely to be a lot less.


My goal is scaling Postgres across a set of identical servers running the
same Postgres instances. If one server uses a hash-join node for this join,
I think it is most likely that the other server will also use a hash-join
node for it (as you may remember, I also use the statistics-copying
technique to provide up-to-date statistics on partitions). Tests show good
results with such an approach, but maybe this is just my special case.



 But it is
not the job of this code path to outguess the remote planner.  It's
certainly not appropriate to invent an unprincipled cost estimate
as a substitute for trying to guess that.


Agreed.


If you're unhappy with the planning results you get for this,
why don't you have use_remote_estimate turned on?


I have a mixed workload. Large queries can tolerate the additional
estimation queries, but for the many simple SELECTs that touch a small
portion of the data, the latency increases significantly. And I don't know
how to switch the use_remote_estimate setting in such a case.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: [POC] Fast COPY FROM command for the table with foreign partitions

2021-02-02 Thread Andrey Lepikhov

On 2/2/21 11:57, tsunakawa.ta...@fujitsu.com wrote:

Hello, Andrey-san,


From: Tang, Haiying 

Some time ago I suggested an additional optimization [1] which can further
speed up COPY by 2-4 times. Maybe you can run the benchmark for this
solution too?

...

But the patch no longer applies, I tried to apply on e42b3c3bd6(2021/1/26) but
failed.

Can you send a rebased version?

When do you think you can submit the rebased version of them?  (IIUC at the 
off-list HighGo meeting, you're planning to post them late this week after the 
global snapshot patch.)  Just in case you are not going to do them for the 
moment, can we rebase and/or further modify them so that they can be committed 
in PG 14?

Of course, you can rebase it.

--
regards,
Andrey Lepikhov
Postgres Professional




Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno

2021-12-22 Thread Andrey Lepikhov

On 5/2/2020 01:24, Tom Lane wrote:

I've not written any actual code, but am close to being ready to.
This thread gives us hope of getting started on solving some of the basic
planner problems. But there has been no activity for a long time, as far as
I can see. Have you tried to implement this idea? Is it still relevant?


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno

2021-12-22 Thread Andrey Lepikhov

On 22/12/2021 20:42, Tom Lane wrote:

Andrey Lepikhov  writes:

On 5/2/2020 01:24, Tom Lane wrote:

I've not written any actual code, but am close to being ready to.



This thread gives us hope to get started on solving some of the basic
planner problems.
But there is no activity for a long time, as I see. Have You tried to
implement this idea? Is it actual now?


It's been on the back burner for awhile :-(.  I've not forgotten
about it though.
I would like to try to develop this feature. The idea is clear to me, but
the definition of the NullableVars structure is not obvious. Maybe you could
prepare a sketch of this struct, or do you already have some draft code?


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Add index scan progress to pg_stat_progress_vacuum

2021-12-23 Thread Andrey Lepikhov

On 21/12/2021 00:05, Peter Geoghegan wrote:

* Some index AMs don't work like nbtree and GiST in that they cannot
do their scan sequentially -- they have to do something like a
logical/keyspace order scan instead, which is *totally* different to
heapam (not just a bit different). There is no telling how many times
each page will be accessed in these other index AMs, and in what
order, even under optimal conditions. We should arguably not even try
to provide any granular progress information here, since it'll
probably be too messy.


Maybe we could add callbacks to the AM interface for the
send/receive/representation parts of progress reporting?
Then each AM would define the set of parameters to send to the stats
collector and show to users.


--
regards,
Andrey Lepikhov
Postgres Professional




Multiple Query IDs for a rewritten parse tree

2022-01-08 Thread Andrey Lepikhov

On 5/1/2022 10:13, Tom Lane wrote:
> I feel like we need to get away from the idea that there is just
> one query hash, and somehow let different extensions attach
> differently-calculated hashes to a query.  I don't have any immediate
> ideas about how to do that in a reasonably inexpensive way.

Now the queryId field represents a query class (depending on the jumbling
implementation). It is used by extensions as a simple way to track a query
from the point where the parse tree is created to the end of its life,
across all the hook calls an extension uses (keeping possible plan caching
in mind).


I know of at least two different use cases for queryId:
1) Monitoring purposes: pg_stat_statements watches how often queries of a
class show up in the database and collects statistics on each class.
2) Adaptive purposes: some extensions influence planner decisions during the
optimization stage and want to learn from the performance shift at the end
of the execution stage.


Different purposes may require different jumbling implementations, but users
may want to use such extensions at the same time. So we should allow many
different query IDs to be attached to a query (maybe it is better to call
them 'query labels'?).


After thinking about it for a while, I came up with three different ways to
implement this:
1. Make queryId a trivial 64-bit counter. Each extension can then
distinguish every query from any other, track it across all hooks, run its
own jumbling code and store its queryId internally. The one big problem I
see here is the growing overhead when there are many consumers of the
queryId feature.


2. Instead of a single queryId, store a list of (queryId, funcOid) pairs. An
extension registers a callback for queryId generation and the core builds
the list of queryIds right after query tree rewriting; funcOid is needed to
distinguish jumbling implementations. Here we would have to invent an
additional node type for a list element.


3. Instead of queryId, add a multi-purpose 'private' list to the Query
struct. Any extension can add additional object(s) to this list (with a
registered object type, of course). As an example, I can imagine a
convention for queryIds in this case: store a String node with the value
'<generator function OID> - <queryId>'.

This approach requires a registered callback mechanism too.

I think the third way is the cheapest, most flexible and simplest to implement.

Any thoughts, comments, criticism?

--
regards,
Andrey Lepikhov
Postgres Professional




Re: Multiple Query IDs for a rewritten parse tree

2022-01-10 Thread Andrey Lepikhov

On 10/1/2022 13:56, Julien Rouhaud wrote:

On Mon, Jan 10, 2022 at 12:37:34PM +0500, Andrey V. Lepikhov wrote:

I think pg_stat_statements can live with the queryId generator of the
sr_plan extension. But it replaces all constants with $XXX parameters in the
query string. In our extension, the user defines which plan is optimal and
which constants can be used as parameters in the plan.


I don't know the details of that extension, but I think that it should work as
long as you have the constants information in the jumble state, whatever the
resulting normalized query string is right?
Yes. The same input query string doesn't prove that the frozen query plan
can be used, because the rewrite rules could have changed. So we use only
the query tree; here we must have a custom jumbling implementation.

queryId in this extension determines two things:
1. How many input queries will be compared against the query tree template
of the frozen statement.

2. As a result, the performance overhead of unsuccessful comparisons.



One drawback I see here: creating or dropping my extension changes the
behavior of pg_stat_statements, which distorts the DB load profile. Also, we
have no guarantees that another extension will work correctly (or in an
optimal way) with such a queryId.


But then, if generating 2 queryid is a better for performance, does the
extension really need an additional jumble state and/or normalized query
string?
An additional JumbleState isn't necessary, but it would be better for
performance to collect pointers to all constant nodes during hash
generation.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Multiple Query IDs for a rewritten parse tree

2022-01-10 Thread Andrey Lepikhov

On 10/1/2022 15:39, Julien Rouhaud wrote:

On Mon, Jan 10, 2022 at 03:24:46PM +0500, Andrey Lepikhov wrote:

On 10/1/2022 13:56, Julien Rouhaud wrote:
Yes. the same input query string doesn't prove that frozen query plan can be
used, because rewrite rules could be changed. So we use only a query tree.


Yes, I'm fully aware of that.  I wasn't asking about using the input query
string but the need for generating a dedicated normalized output query string
that would be potentially different from the one generated by
pg_stat_statements (or similar).

Thanks, now I get it. I don't remember a single situation where we would
need to normalize a query string.

But then, if generating 2 queryid is a better for performance, does the
extension really need an additional jumble state and/or normalized query
string?

An additional JumbleState isn't necessary, but it would be better for
performance to collect pointers to all constant nodes during hash
generation.


I agree, but it's even better for performance if this is collected only once.
I still don't know if this extension (or any extension) would require something
different from a common jumble state that would serve for all purpose or if we
can work with a single jumble state and only handle multiple queryid.
I think JumbleState is highly monitoring-specific, maybe even
pg_stat_statements-specific (perhaps you can point to other examples).
I have no idea how it could be used by an arbitrary extension.
But it is not so difficult to imagine an implementation (as part of Tom's
approach, described earlier) of this kind of storage for each generation
method.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: [POC] Fast COPY FROM command for the table with foreign partitions

2021-03-22 Thread Andrey Lepikhov

On 5/3/21 21:54, tsunakawa.ta...@fujitsu.com wrote:

I've managed to rebased it, although it took unexpectedly long.  The patch is 
attached.  It passes make check against core and postgres_fdw.  I'll turn the 
CF status back to ready for committer shortly.


The _() macro in the postgresExecForeignCopy routine:
if (PQputCopyEnd(conn, OK ? NULL : _("canceled by server")) <= 0)

uses gettext. Under Linux it compiles fine because (as I understand it) it
uses the standard implementation of gettext:

objdump -t contrib/postgres_fdw/postgres_fdw.so | grep 'gettext'
gettext@@GLIBC_2.2.5

but on macOS (and maybe elsewhere) we need to explicitly link the libintl
library in the Makefile:

SHLIB_LINK += $(filter -lintl, $(LIBS))

Alternatively, we could avoid using gettext at all in this part of the code.

--
regards,
Andrey Lepikhov
Postgres Professional




Re: [POC] Fast COPY FROM command for the table with foreign partitions

2020-07-27 Thread Andrey Lepikhov




On 27.07.2020 21:34, Alexey Kondratov wrote:

Hi Andrey,

On 2020-07-23 09:23, Andrey V. Lepikhov wrote:

On 7/16/20 2:14 PM, Amit Langote wrote:

Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Version 5 of the patch. With changes caused by Amit's comments.


Just got a segfault with your v5 patch by deleting from a foreign table. 
It seems that ShutdownForeignScan inside node->fdwroutine doesn't have a 
correct pointer to the required function.


I haven't had a chance to look closer on the code, but you can easily 
reproduce this error with the attached script (patched Postgres binaries 
should be available in the PATH). It works well with master and fails 
with your patch applied.


I used master at a3ab7a707d and the v5 version of the patch with your script.
No errors found. Can you re-check your test case?

--
regards,
Andrey Lepikhov
Postgres Professional




Re: [POC] Fast COPY FROM command for the table with foreign partitions

2020-08-21 Thread Andrey Lepikhov



On 8/7/20 2:14 PM, Amit Langote wrote:

I was playing around with v5 and I noticed an assertion failure which
I concluded is due to improper setting of ri_usesBulkModify.  You can
reproduce it with these steps.

create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create table foo (a int, b int) partition by list (a);
create table foo1 (like foo);
create foreign table ffoo1 partition of foo for values in (1) server
lb options (table_name 'foo1');
create table foo2 (like foo);
create foreign table ffoo2 partition of foo for values in (2) server
lb options (table_name 'foo2');
create function print_new_row() returns trigger language plpgsql as $$
begin raise notice '%', new; return new; end; $$;
create trigger ffoo1_br_trig before insert on ffoo1 for each row
execute function print_new_row();
copy foo from stdin csv;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

1,2
2,3
\.

NOTICE:  (1,2)
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.

#0  0x7f2d5e266337 in raise () from /lib64/libc.so.6
#1  0x7f2d5e267a28 in abort () from /lib64/libc.so.6
#2  0x00aafd5d in ExceptionalCondition
(conditionName=0x7f2d37b468d0 "!resultRelInfo->ri_usesBulkModify ||
resultRelInfo->ri_FdwRoutine->BeginForeignCopyIn == NULL",
 errorType=0x7f2d37b46680 "FailedAssertion",
fileName=0x7f2d37b4677f "postgres_fdw.c", lineNumber=1863) at
assert.c:67
#3  0x7f2d37b3b0fe in postgresExecForeignInsert (estate=0x2456320,
resultRelInfo=0x23a8f58, slot=0x23a9480, planSlot=0x0) at
postgres_fdw.c:1862
#4  0x0066362a in CopyFrom (cstate=0x23a8d40) at copy.c:3331

The problem is that partition ffoo1's BR trigger prevents it from
using multi-insert, but its ResultRelInfo.ri_usesBulkModify is true,
which is copied from its parent.  We should really check the same
things for a partition that CopyFrom() checks for the main target
relation (root parent) when deciding whether to use multi-insert.
Thanks, I added a TAP test for this problem.

However instead of duplicating
the same logic to do so in two places

(CopyFrom and ExecInitPartitionInfo), I think it might be a good idea
to refactor the code to decide if multi-insert mode can be used for a
given relation by checking its properties and put it in some place
that both the main target relation and partitions need to invoke.
InitResultRelInfo() seems to be one such place.

+1


Also, it might be a good idea to use ri_usesBulkModify more generally
than only for foreign relations as the patch currently does, because I
can see that it can replace the variable insertMethod in CopyFrom().
Having both insertMethod and ri_usesBulkModify in each ResultRelInfo
seems confusing and bug-prone.

Finally, I suggest renaming ri_usesBulkModify to ri_usesMultiInsert to
reflect its scope.

Please check the attached delta patch that applies on top of v5 to see
what that would look like.

I merged your delta patch into the main patch (see v6 in the attachment).
It now seems more committable than before.

--
regards,
Andrey Lepikhov
Postgres Professional
From da6c4fe8262df58164e2c4ab80e085a019c9c6c1 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Thu, 9 Jul 2020 11:16:56 +0500
Subject: [PATCH] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table when multi-insert is
possible and the foreign table has a non-zero number of columns.

The FDW API is extended with the following routines:
* BeginForeignCopyIn
* EndForeignCopyIn
* ExecForeignCopyIn

BeginForeignCopyIn and EndForeignCopyIn initialize and free the CopyState
used for bulk COPY. The ExecForeignCopyIn routine sends a
'COPY ... FROM STDIN' command to the foreign server, iteratively sends
tuples via the CopyTo() machinery, and finally sends EOF on that connection.

The code that constructed the list of columns for a given foreign relation
in deparseAnalyzeSql() is moved into deparseRelColumnList() and reused
in deparseCopyFromSql().

TAP tests are added for specific corner cases of the COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure is extended with a
data_dest_cb callback used to send the text representation of a tuple to
a custom destination.
The PgFdwModifyState structure is extended with a cstate field to avoid
repeated initialization of the CopyState. For the same reason, the CopyTo()
routine is split into the set of routines CopyToStart()/CopyTo()/CopyToFinish().

The CopyInsertMethod enum is removed; this logic is now implemented by the
ri_usesMultiInsert field of the ResultRelInfo structure.

Discussion: https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepik

Re: Ideas about a better API for postgres_fdw remote estimates

2020-08-28 Thread Andrey Lepikhov



On 7/14/20 6:32 AM, Bruce Momjian wrote:

On Mon, Jul  6, 2020 at 11:28:28AM -0400, Stephen Frost wrote:

Yeah, thinking about it as a function that inspects partial planner
results, it might be useful for other purposes besides postgres_fdw.
As I said before, I don't think this necessarily has to be bundled as
part of postgres_fdw.  That still doesn't make it part of EXPLAIN.


Providing it as a function rather than through EXPLAIN does make a bit
more sense if we're going to skip things like the lookups you mention
above.  I'm still inclined to have it be a part of core rather than
having it as postgres_fdw though.  I'm not completely against it being
part of postgres_fdw... but I would think that would really be
appropriate if it's actually using something in postgres_fdw, but if
everything that it's doing is part of core and nothing related
specifically to the postgres FDW, then having it as part of core makes
more sense to me.  Also, having it as part of core would make it more
appropriate for other tools to look at and adding that kind of
inspection capability for partial planner results could be very
interesting for tools like pgAdmin and such.


I agree the statistics extraction should probably be part of core.
There is the goal if FDWs returning data, and returning the data
quickly.  I think we can require all-new FDW servers to get improved
performance.  I am not even clear if we have a full understanding of the
performance characteristics of FDWs yet.  I know Tomas did some research
on its DML behavior, but other than that, I haven't seen much.

On a related note, I have wished to be able to see all the costs
associated with plans not chosen, and I think others would like that as
well.  Getting multiple costs for a query goes in that direction.



During the implementation of sharding-related improvements I noticed that if
we use a lot of foreign partitions, we get bad plans because vacuum doesn't
update the statistics of foreign tables. This is done by the ANALYZE
command, but it is a very expensive operation for a foreign table.
The problem with statistics is demonstrated by the TAP test in the first
attached patch.


I implemented some FDW + PG core machinery to reduce the weight of the
problem. The ANALYZE command on a foreign table executes a query on the
foreign server that extracts the statistics tuple, serializes it into a
JSON-formatted string and returns it to the caller. The caller deserializes
this string, generates statistics for the foreign table and updates them.
The second patch is a proof of concept.


This patch speeds up the ANALYZE command and keeps the statistics of a
foreign table relevant after an autovacuum operation. Its effectiveness
depends on the relevance of the statistics on the remote server, but still.


--
regards,
Andrey Lepikhov
Postgres Professional
From 329954981959ee3fc97e52266c89a436d02ddf5e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 4 Aug 2020 09:29:37 +0500
Subject: [PATCH 1/2] TAP-Test on bad statistics.

Add dummy postgres_fdw_stat() routine. It will return stat tuples
for each input attribute.
---
 contrib/postgres_fdw/Makefile  |  4 ++--
 contrib/postgres_fdw/expected/foreign_stat.out | 18 ++
 .../postgres_fdw/postgres_fdw--1.0--1.1.sql| 11 +++
 contrib/postgres_fdw/postgres_fdw.c| 17 +
 contrib/postgres_fdw/postgres_fdw.control  |  2 +-
 contrib/postgres_fdw/sql/foreign_stat.sql  | 10 ++
 6 files changed, 59 insertions(+), 3 deletions(-)
 create mode 100644 contrib/postgres_fdw/expected/foreign_stat.out
 create mode 100644 contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql
 create mode 100644 contrib/postgres_fdw/sql/foreign_stat.sql

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..15a6f6c353 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -14,9 +14,9 @@ PG_CPPFLAGS = -I$(libpq_srcdir)
 SHLIB_LINK_INTERNAL = $(libpq)
 
 EXTENSION = postgres_fdw
-DATA = postgres_fdw--1.0.sql
+DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql
 
-REGRESS = postgres_fdw
+REGRESS = postgres_fdw foreign_stat
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/postgres_fdw/expected/foreign_stat.out b/contrib/postgres_fdw/expected/foreign_stat.out
new file mode 100644
index 00..28a470bccc
--- /dev/null
+++ b/contrib/postgres_fdw/expected/foreign_stat.out
@@ -0,0 +1,18 @@
+CREATE TABLE ltable (a int, b real);
+CREATE FOREIGN TABLE ftable (a int) server loopback options (table_name 'ltable');
+INSERT INTO ltable (a, b) (SELECT *, 1.01 FROM generate_series(1, 1E4));
+VACUUM;
+-- Check statistic interface routine
+SELECT * FROM postgres_fdw_stat('public', 'test', 'a');
+ postgres_fdw_stat 
+---
+ 
+(1 row)
+
+EXPLAIN (TIMING OFF, SUMMARY O

Re: Ideas about a better API for postgres_fdw remote estimates

2020-08-29 Thread Andrey Lepikhov




On 8/29/20 9:22 PM, Stephen Frost wrote:



I implemented some FDW + pg core machinery to reduce weight of the problem.
The ANALYZE command on foreign table executes query on foreign server that
extracts statistics tuple, serializes it into json-formatted string and
returns to the caller. The caller deserializes this string, generates
statistics for this foreign table and update it. The second patch is a
proof-of-concept.


Isn't this going to create a version dependency that we'll need to deal
with..?  What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?

When I was considering the issue with ANALYZE and FDWs, I had been
thinking it'd make sense to just change the query that's built in
deparseAnalyzeSql() to have a TABLESAMPLE clause, but otherwise run in
more-or-less the same manner as today.  If we don't like the available
TABLESAMPLE methods then we could add a new one that's explicitly the
'right' sample for an ANALYZE call and use that when it's available on
the remote side.  Not sure if it'd make any sense for ANALYZE itself to
start using that same TABLESAMPLE code, but maybe?  Not that I think
it'd be much of an issue if it's independent either, with appropriate
comments to note that we should probably try to make them match up for
the sake of FDWs.
This approach does not contradict your idea. It is a lightweight opportunity
to reduce the cost of the analysis if we have a set of servers with
up-to-date versions of the system catalog and the FDW.



This patch speedup analyze command and provides statistics relevance on a
foreign table after autovacuum operation. Its effectiveness depends on
relevance of statistics on the remote server, but still.


If we do decide to go down this route, wouldn't it mean we'd have to
solve the problem of what to do when it's a 9.6 foreign server being
queried from a v12 server and dealing with any difference in the
statistics structures of the two?

Seems like we would... in which case I would say that we should pull
that bit out and make it general, and use it for pg_upgrade too, which
would benefit a great deal from having the ability to upgrade stats
between major versions also.  That's a much bigger piece to take on, of
course, but seems to be what's implied with this approach for the FDW.



Thank you for this use case.

We can add a "version" field to the statistics string (btree uses versioning
too). As you can see, in this patch we only try to fetch the statistics. If
for some reason this does not work out, then we go down the hard route.


Moreover, I believe this strategy should only be used if we analyze a
relation implicitly. If the user runs the analysis explicitly with the
command "ANALYZE <relname>", we need to perform a fair analysis of the
table.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Ideas about a better API for postgres_fdw remote estimates

2020-08-31 Thread Andrey Lepikhov

On 8/31/20 6:19 PM, Ashutosh Bapat wrote:

On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
 wrote:

I agreed that this task needs to implement an API for
serialization/deserialization of statistics:
pg_load_relation_statistics(json_string text);
pg_get_relation_statistics(relname text);
We can use a version number for resolving conflicts with different
statistics implementations.
"Load" function will validate the values[] anyarray while deserializing
the input json string to the datatype of the relation column.


This is a valuable feature. Analysing a foreign table by fetching rows
from the foreign server isn't very efficient. In fact the current FDW
API for doing that forges that in-efficiency by requiring the FDW to
return a sample of rows that will be analysed by the core. That's why
I see that your patch introduces a new API to get foreign rel stat. I
don't think there's any point in maintaining these two APIs just for



ANALYSING table. Instead we should have only one FDW API which will do
whatever it wants and return statistics that can be understood by the
core and let core install it in the catalogs. I believe that's doable.

I think the same.


In case of PostgreSQL it could get the stats available as is from the
foreign server, convert it into a form that the core understands and
returns. The patch introduces a new function postgres_fdw_stat() which
will be available only from version 14 onwards. Can we use
row_to_json(), which is available in all the supported versions,
instead?
I started from there. But we need to convert the starelid, staop[] and
stacoll[] OIDs into a portable format. We also need to explicitly specify
the type of each values[] array. And nobody guarantees that an anyarray
values[] can't contain an array of composite-type values, containing OIDs,
that can't be correctly converted to database objects on another server...
These considerations required me to add the new postgres_fdw_stat() routine,
which could be moved into the core.


In case of some other foreign server, an FDW will be responsible to
return statistics in a form that the core will understand. It may
fetch rows from the foreign server or be a bit smart and fetch the
statistics and convert.
I don't think I fully understood your idea. Please explain in more 
detail if possible.


This also means that FDWs will have to deal with the statistics format
that the core understands and thus will need changes in their code
with every version in the worst case. But AFAIR, PostgreSQL supports
different forms of statistics so the problem may not remain that
severe if FDWs and core agree on some bare minimum format that the
core supports for long.
I don't think the FDW needs to know anything about the internals of the
statistics. It only needs to execute a query like

"SELECT extract_statistics(namespace.relation);"
and apply the returned text representation with a function call like
store_statistics(const char *stat);
All validation and pg_statistic update operations would be performed in the
core.


I think the patch has some other problems like it works only for
regular tables on foreign server but a foreign table can be pointing
to any relation like a materialized view, partitioned table or a
foreign table on the foreign server all of which have statistics
associated with them.

OK. It was implemented for discussion and testing, and as a base for development.


I didn't look closely but it does not consider
that the foreign table may not have all the columns from the relation
on the foreign server or may have different names.
Here we get the full statistics from the remote server and extract
statistics only for the columns included in the tuple descriptor of the
foreign table.



But I think those
problems are kind of secondary. We have to agree on the design first. 

+1.
I only want to point out the following: in previous threads, statistics were
converted row by row. I suggest serializing all the statistics tuples for a
relation into a single JSON string; at the apply phase we can filter out
unneeded attributes.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Make query ID more portable

2021-10-12 Thread Andrey Lepikhov

On 12/10/21 13:35, Julien Rouhaud wrote:

Hi,

On Tue, Oct 12, 2021 at 4:12 PM Andrey V. Lepikhov
 wrote:

See the patch in attachment as an POC. Main idea here is to break
JumbleState down to a 'clocations' part that can be really interested in
a post parse hook and a 'context data', that needed to build query or
subquery signature (hash) and, I guess, isn't really needed in any
extensions.


There have been quite a lot of threads about that in the past, and
almost every time people wanted to change how the hash was computed.
So it seems to me that extensions would actually be quite interested
in that.  This is even more the case now that an extension can be used
to replace the queryid calculation only and keep the rest of the
extension relying on it as is.
Yes, I know. I have been using such a self-made queryId for four years, and
I will keep using it.
But the core jumbling code is good, fast and much easier to maintain. The
purpose of this work is to extend the jumbling so it can be used in a more
flexible way and to avoid pointlessly copying this code into an extension.

I think it does not add much complexity or overhead.


I think the biggest change in your patch is:

   case RTE_RELATION:
- APP_JUMB(rte->relid);
- JumbleExpr(jstate, (Node *) rte->tablesample);
+ {
+ char *relname = regclassout_ext(rte->relid, true);
+
+ APP_JUMB_STRING(relname);
+ JumbleExpr(jstate, (Node *) rte->tablesample, ctx);
   APP_JUMB(rte->inh);
   break;

Have you done any benchmark on OLTP workload?  Adding catalog access
there is likely to add significant overhead.
Yes, I should do benchmarking. But I guess the main goal of the query ID is
monitoring, which can be switched off if necessary.

This part was made for the demo. It can be replaced by a hook, for example.


Also, why only using the fully qualified relation name for stable
hashes?  At least operators and functions should also be treated the
same way.  If you do that you will probably have way too much overhead
to be usable in a busy production environment.  Why not using the new
possibility of 3rd party extension for the queryid calculation that
exactly suits your need?

I fully agree with these arguments. This code is a POC. The main part here
is breaking down JumbleState, using a local context for subqueries and
sorting the hashes of range table entries.
I think we could call a single routine (APP_JUMB_OBJECT(), for example) for
all OIDs in this code. That would allow an extension to intercept this call
and replace the OID with an arbitrary value.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Make query ID more portable

2021-10-13 Thread Andrey Lepikhov

On 12/10/21 18:45, Bruce Momjian wrote:

On Tue, Oct 12, 2021 at 09:40:47AM -0400, Tom Lane wrote:

Andrey Lepikhov  writes:

But core jumbling code is good, fast and much easier in support.

Also, the current code handles renames of schemas and objects, but this
would not.
Yes, it is a good option if an extension works only in the context of one
node. But my efforts are directed at cross-instance use of monitoring data;
as an example, it may be useful for sharding.
Also, I guess it is natural for a user to expect that if they rename an
object, they will also have to change their queries and reset the monitoring
data related to that object.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Make query ID more portable

2021-10-13 Thread Andrey Lepikhov

On 12/10/21 18:40, Tom Lane wrote:

Andrey Lepikhov  writes:

But core jumbling code is good, fast and much easier in support.

A bigger issue is that query ID stability isn't something we are going
to promise on a large scale --- for example, what if a new release adds
some new fields to struct Query?  So I'm not sure that "query IDs should
survive dump/reload" is a useful goal to consider.  It's certainly not
something that could be reached by anything even remotely like the
existing code.

Thank you for the explanation.
I think the problem with queryId is that it encapsulates two different
meanings:
1. It allows an extension to match a query between the post-parse and
execution stages. In this sense, queryId should be as unique as possible for
each query.
2. For pg_stat_statements purposes (and my project too) it represents a
query class and should be stable against permutations of range table
entries, clauses, etc. For example:

"SELECT * FROM a,b;" and "SELECT * FROM b,a;" should have the same queryId.

This issue may be solved in an extension with the following approach:
1. Force whatever unique queryId value the extension wants in the post-parse
hook (a minimal sketch of this follows below).
2. Generalize the JumbleQuery routine code to generate a kind of query
class signature.
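For (1), a minimal sketch of what an extension could do, assuming the
current (PG 14) post_parse_analyze_hook signature; the per-backend counter
is just the simplest possible source of unique values:

/* Minimal sketch: override queryId with a per-backend unique value. */
#include "postgres.h"

#include "fmgr.h"
#include "parser/analyze.h"
#include "utils/queryjumble.h"

PG_MODULE_MAGIC;

static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;
static uint64 query_counter = 0;	/* per-backend; a cluster-wide counter
									 * would need shared memory */

static void
force_unique_queryid(ParseState *pstate, Query *query, JumbleState *jstate)
{
	if (prev_post_parse_analyze_hook)
		prev_post_parse_analyze_hook(pstate, query, jstate);

	/* Replace the core-computed class id with a per-query unique value. */
	query->queryId = ++query_counter;
}

void
_PG_init(void)
{
	prev_post_parse_analyze_hook = post_parse_analyze_hook;
	post_parse_analyze_hook = force_unique_queryid;
}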


The attached patch is a first sketch of such a change.

--
regards,
Andrey Lepikhov
Postgres ProfessionalFrom ef4033935b25c90fd8c5b6f10a0646a7ede385e3 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 13 Oct 2021 10:22:53 +0500
Subject: [PATCH] Add callback for jumbling of an OID value.

The query jumbling machinery depends on the OIDs of objects. That is fine
for a specific instance, because it allows surviving an object rename. But
it is bad for the cross-instance case: even if you have the same code
version and the same schema, you can't guarantee the stability of OIDs.

So an extension can't rely on this technique.

This patch changes the interface of the JumbleQuery routine to allow
an extension to use it to generate a query signature and to use a
specific algorithm for OID jumbling.
---
 src/backend/commands/explain.c   |  6 +--
 src/backend/parser/analyze.c | 12 ++---
 src/backend/tcop/postgres.c  |  6 +--
 src/backend/utils/misc/queryjumble.c | 81 ++--
 src/include/utils/queryjumble.h  | 14 -
 5 files changed, 67 insertions(+), 52 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 10644dfac4..b2f10e828e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -166,7 +166,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
ExplainState *es = NewExplainState();
TupOutputState *tstate;
-   JumbleState *jstate = NULL;
+   JumbleState jstate;
Query  *query;
List   *rewritten;
ListCell   *lc;
@@ -246,10 +246,10 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 
query = castNode(Query, stmt->query);
if (IsQueryIdEnabled())
-   jstate = JumbleQuery(query, pstate->p_sourcetext);
+	query->queryId = JumbleQuery(query, pstate->p_sourcetext, NULL, &jstate);
 
if (post_parse_analyze_hook)
-   (*post_parse_analyze_hook) (pstate, query, jstate);
+   (*post_parse_analyze_hook) (pstate, query, &jstate);
 
/*
 * Parse analysis was done already, but we still have to run the rule
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 146ee8dd1e..f9c3991a94 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -113,7 +113,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
ParseState *pstate = make_parsestate(NULL);
Query  *query;
-   JumbleState *jstate = NULL;
+   JumbleState jstate;
 
Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -127,10 +127,10 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
query = transformTopLevelStmt(pstate, parseTree);
 
if (IsQueryIdEnabled())
-   jstate = JumbleQuery(query, sourceText);
+   query->queryId = JumbleQuery(query, sourceText, NULL, &jstate);
 
if (post_parse_analyze_hook)
-   (*post_parse_analyze_hook) (pstate, query, jstate);
+   (*post_parse_analyze_hook) (pstate, query, &jstate);
 
free_parsestate(pstate);
 
@@ -152,7 +152,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char 
*sourceText,
 {
ParseState *pstate = make_parsestate(NULL);
Query  *query;
-   JumbleState *jstate = NULL;
+   JumbleState jstate;
 
Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -166,10 +166,10 @@ parse_analyze_varparams(RawStmt *parseTree, const char 
*sourceText,
check_variable_parameters(pstate, query);
 
if (IsQueryIdEnabled())
-

Re: Make query ID more portable

2021-10-14 Thread Andrey Lepikhov

On 14/10/21 10:40, Julien Rouhaud wrote:

On Thu, Oct 14, 2021 at 12:37 PM Andrey Lepikhov
 wrote:


On 12/10/21 18:45, Bruce Momjian wrote:

On Tue, Oct 12, 2021 at 09:40:47AM -0400, Tom Lane wrote:

Andrey Lepikhov  writes:

I think that there are just too many arbitrary decisions that could be
made on what exactly should be a query identifier to have a single
in-core implementation.
Yes, and I use such a custom decision too. But the core jumbling code 
implements a good idea and can be generalized for reuse. The patch from 
the previous letter, together with a breakdown of JumbleState, can allow 
coders to implement their code based on the queryjumble.c module with 
smaller changes.



 If you do sharding, you already have to
properly configure each node, so configuring your custom query id
extension shouldn't be a big problem.
My project is about adaptive query optimization techniques. It is not 
obvious how to match (without a field in the Query struct) the post-parse 
and execution phases because of nested queries.
Also, if we use queryId in an extension, we interfere with 
pg_stat_statements.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Suggestions on message transfer among backends

2019-03-11 Thread Andrey Lepikhov

On 11/03/2019 18:36, Andy Fan wrote:

Hi:
   I need some functionality which requires message exchange among 
different back-ends (connections).

Specifically, I need a shared hash map and a message queue.

Message queue: it should have many writers and one reader. It looks like 
a POSIX message queue would be OK, but Postgres doesn't use it. Is there 
any equivalent in PG?


Shared hash map: the number of items can be fixed and the value size can 
be fixed as well.


Any keywords or explanation will be extremely helpful.
You may use shm_mq (a shared memory queue) and hash tables (dynahash.c) in 
shared memory (see ShmemInitHash() + shmem_startup_hook); a minimal sketch 
follows below.
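
As an illustration only (the names here are mine, not from this thread), a
minimal sketch of the shared hash table half of that advice, i.e.
ShmemInitHash() called from a shmem_startup_hook. The library must be
loaded via shared_preload_libraries; the shm_mq part is set up separately:

#include "postgres.h"

#include "fmgr.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

typedef struct MyEntry
{
	uint32		key;			/* fixed-size hash key */
	uint64		value;			/* fixed-size payload */
} MyEntry;

#define MY_MAX_ENTRIES 1024

static HTAB *my_shared_hash = NULL;
static shmem_startup_hook_type prev_shmem_startup_hook = NULL;

static void
my_shmem_startup(void)
{
	HASHCTL		info;

	if (prev_shmem_startup_hook)
		prev_shmem_startup_hook();

	memset(&info, 0, sizeof(info));
	info.keysize = sizeof(uint32);
	info.entrysize = sizeof(MyEntry);

	/* Attach to (or create) the table in shared memory. */
	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
	my_shared_hash = ShmemInitHash("my shared hash",
								   MY_MAX_ENTRIES, MY_MAX_ENTRIES,
								   &info,
								   HASH_ELEM | HASH_BLOBS);
	LWLockRelease(AddinShmemInitLock);
}

void
_PG_init(void)
{
	if (!process_shared_preload_libraries_in_progress)
		return;

	/* Reserve the shared memory the hash table will need. */
	RequestAddinShmemSpace(hash_estimate_size(MY_MAX_ENTRIES, sizeof(MyEntry)));

	prev_shmem_startup_hook = shmem_startup_hook;
	shmem_startup_hook = my_shmem_startup;
}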


Thanks


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist

2019-03-26 Thread Andrey Lepikhov




On 25/03/2019 15:21, Heikki Linnakangas wrote:

On 25/03/2019 09:57, David Steele wrote:

On 2/6/19 2:08 PM, Andrey Lepikhov wrote:

The patchset had a problem with all-zero pages that appeared at the index
build stage: the generic_log_relation() routine sends all pages into the
WAL. So the lsn field of an all-zero page was initialized and the
PageIsVerified() routine detected it as a bad page.
The solution may be:
1. To improve the index build algorithms and eliminate the possibility of
unused pages appearing.
2. To mark each page as 'dirty' right after initialization. In this case
we will get an 'empty' page instead of an all-zero one.
3. To not write all-zero pages into the WAL.


Hmm. When do we create all-zero pages during index build? That seems 
pretty surprising.
GiST uses buffered pages. During a GiST build it is possible (very rarely) 
that no index tuple is written to a page before a new block is allocated, 
so the page remains an all-zero page.
You can't have problems in the current GiST code, because it writes only 
changed pages into the WAL.
But the idea of the patch is to traverse the blocks of the index relation 
one by one after the index build finishes and write these blocks to the 
WAL. In that case we will see all-zero pages and need to check for them.



On 04.02.2019 10:04, Michael Paquier wrote:

On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:

Ok. It is used only for demonstration.


The latest patch set needs a rebase, so moved to next CF, waiting on
author as this got no reviews.


The patch no longer applies so marked Waiting on Author.

Alexander, Heikki, are either of you planning to review the patch in
this CF?


I had another quick look.

I still think using the "generic xlog AM" for this is a wrong level of 
abstraction, and we should use the XLOG_FPI records for this directly. 
We can extend XLOG_FPI so that it can store multiple pages in a single 
record, if it doesn't already handle it.


Another counter-point to using the generic xlog record is that you're 
currently doing unnecessary two memcpy's of all pages in the index, in 
GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.


I guess the generic_log_relation() function can stay where it is, but it 
should use XLogRegisterBuffer() and XLogInsert() directly.
OK. This patch has waited for feedback for a long time. I will check the 
GiST code changes from the previous review and try to follow your advice.


- Heikki


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist

2019-04-01 Thread Andrey Lepikhov



On 26/03/2019 15:59, Heikki Linnakangas wrote:

On 26/03/2019 11:29, Andrey Lepikhov wrote:

On 25/03/2019 15:21, Heikki Linnakangas wrote:

Hmm. When do we create all-zero pages during index build? That seems
pretty surprising.


GiST uses buffered pages. During a GiST build it is possible (very rarely)
that no index tuple is written to a page before a new block is allocated,
so the page remains an all-zero page.
You can't have problems in the current GiST code, because it writes only
changed pages into the WAL.


Looking at the code, I don't see how that could happen. The only place 
where the GiST index file is extended is in gistNewBuffer(), and all 
callers of that initialize the page immediately after the call. What am 
I missing?

Sorry, this issue was found in the SP-GiST AM. You can reproduce it:
1. Apply the v2 version of the patch set (see attachment).
2. In the generic_log_relation() routine, add logging on PageIsNew(buf).
3. Run the script t1.sql (in attachment).

This problem could be resolved by calling MarkBufferDirty() after 
SpGistInitBuffer() in the allocNewBuffer() routine. But in this case we 
would write more pages to the WAL than necessary.
To avoid that, in the patch '0001-Relation-into-WAL-function' I do not 
write new pages to the WAL.


The attached patch set is not the final version. It is needed only to 
demonstrate the 'all-zero pages' issue. The suggestion to use XLOG_FPI 
records directly will be addressed in v3.


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company


t1.sql
Description: application/sql
>From d3093aa9a7628979b892d31449eda6228ef169ce Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 1 Apr 2019 08:33:46 +0500
Subject: [PATCH 1/4] Relation-into-WAL-function

---
 src/backend/access/transam/generic_xlog.c | 48 +++
 src/include/access/generic_xlog.h |  3 ++
 2 files changed, 51 insertions(+)

diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index 5b00b7275b..c22e361747 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -542,3 +542,51 @@ generic_mask(char *page, BlockNumber blkno)
 
 	mask_unused_space(page);
 }
+
+/*
+ * Function to write generic xlog for every existing block of a relation.
+ * Caller is responsible for locking the relation exclusively.
+ */
+void
+generic_log_relation(Relation rel)
+{
+	BlockNumber 		blkno;
+	BlockNumber 		nblocks;
+	int	npbuf = 0;
+	GenericXLogState	*state = NULL;
+	Bufferbufpack[MAX_GENERIC_XLOG_PAGES];
+
+	CHECK_FOR_INTERRUPTS();
+	nblocks = RelationGetNumberOfBlocks(rel);
+
+	/*
+	 * Iterate over all index pages and WAL-logging it. Pages are grouping into
+	 * the packages before adding to a WAL-record. Zero-pages are
+	 * not logged.
+	 */
+	for (blkno = 0; blkno < nblocks; blkno++)
+	{
+		Buffer	buf;
+
+		buf = ReadBuffer(rel, blkno);
+		if (!PageIsNew(BufferGetPage(buf)))
+		{
+			if (npbuf == 0)
+state = GenericXLogStart(rel);
+
+			LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+			GenericXLogRegisterBuffer(state, buf, GENERIC_XLOG_FULL_IMAGE);
+			bufpack[npbuf++] = buf;
+		}
+		else
+			ReleaseBuffer(buf);
+
+		if ((npbuf == MAX_GENERIC_XLOG_PAGES) || (blkno == nblocks-1))
+		{
+			GenericXLogFinish(state);
+
+			for (; npbuf > 0; npbuf--)
+UnlockReleaseBuffer(bufpack[npbuf-1]);
+		}
+	}
+}
diff --git a/src/include/access/generic_xlog.h b/src/include/access/generic_xlog.h
index cb5b5b713a..e3bbf014cc 100644
--- a/src/include/access/generic_xlog.h
+++ b/src/include/access/generic_xlog.h
@@ -42,4 +42,7 @@ extern const char *generic_identify(uint8 info);
 extern void generic_desc(StringInfo buf, XLogReaderState *record);
 extern void generic_mask(char *pagedata, BlockNumber blkno);
 
+/* other utils */
+extern void generic_log_relation(Relation rel);
+
 #endif			/* GENERIC_XLOG_H */
-- 
2.17.1

>From 9a0172346c8a942b6a493aca8c47452256a2932f Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 1 Apr 2019 08:37:32 +0500
Subject: [PATCH 2/4] GIN-Optimal-WAL-Usage

---
 src/backend/access/gin/ginbtree.c |  6 ++---
 src/backend/access/gin/gindatapage.c  |  9 
 src/backend/access/gin/ginentrypage.c |  2 +-
 src/backend/access/gin/gininsert.c| 30 ++--
 src/backend/access/gin/ginutil.c  |  4 ++--
 src/backend/access/gin/ginvacuum.c|  2 +-
 src/backend/access/gin/ginxlog.c  | 33 ---
 src/backend/access/rmgrdesc/gindesc.c |  6 -
 src/include/access/gin.h  |  3 ++-
 src/include/access/ginxlog.h  |  2 --
 10 files changed, 26 insertions(+), 71 deletions(-)

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 533949e46a..9f82eef8c3 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -396,7 +396,7 @@ ginP

Re: Removing unneeded self joins

2019-08-05 Thread Andrey Lepikhov



On 02/08/2019 04:54, Thomas Munro wrote:

On Thu, Jun 27, 2019 at 6:42 PM Andrey Lepikhov
 wrote:

Version v.17 of the patch that fix the bug see in attachment.


While moving this to the September CF, I noticed that it needs to be
updated for the recent pg_list.h API changes.

The patch was updated:
1. Changes caused by the pg_list.h API changes.
2. Fixed the problem of joint clause_relids and required_relids changes [1].
3. Added eclass mentions of the removed relation to the kept relation (the 
eclass_indexes field was introduced by commit 3373c71553).


[1] 
https://www.postgresql.org/message-id/flat/5c21029d-81a2-c999-6744-6a898fcc9a19%40postgrespro.ru


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From 72ac38de7fb55fe741677ea75b280eecb443e978 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 2 Aug 2019 11:01:47 +0500
Subject: [PATCH] Remove self joins v18

---
 src/backend/nodes/outfuncs.c  |   19 +-
 src/backend/optimizer/path/indxpath.c |   28 +-
 src/backend/optimizer/path/joinpath.c |6 +-
 src/backend/optimizer/plan/analyzejoins.c | 1021 +++--
 src/backend/optimizer/plan/planmain.c |7 +-
 src/backend/optimizer/util/pathnode.c |3 +-
 src/backend/optimizer/util/relnode.c  |   26 +-
 src/include/nodes/nodes.h |1 +
 src/include/nodes/pathnodes.h |   22 +-
 src/include/optimizer/pathnode.h  |4 +
 src/include/optimizer/paths.h |3 +-
 src/include/optimizer/planmain.h  |7 +-
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  256 +-
 src/test/regress/sql/equivclass.sql   |   17 +
 src/test/regress/sql/join.sql |  127 +++
 16 files changed, 1480 insertions(+), 99 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 86c31a48c9..9d511d1d1b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2270,7 +2270,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_OID_FIELD(userid);
 	WRITE_BOOL_FIELD(useridiscurrent);
 	/* we don't try to print fdwroutine or fdw_private */
-	/* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+	WRITE_NODE_FIELD(unique_for_rels);
+	/* can't print non_unique_for_rels; BMSes aren't Nodes */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_UINT_FIELD(baserestrict_min_security);
 	WRITE_NODE_FIELD(joininfo);
@@ -2342,6 +2343,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
 	WRITE_BITMAPSET_FIELD(keys);
 }
 
+static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+	WRITE_NODE_TYPE("UNIQUERELINFO");
+
+	WRITE_BITMAPSET_FIELD(outerrelids);
+	if (node->index)
+	{
+		WRITE_OID_FIELD(index->indexoid);
+		WRITE_NODE_FIELD(column_values);
+	}
+}
+
 static void
 _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 {
@@ -4108,6 +4122,9 @@ outNode(StringInfo str, const void *obj)
 			case T_StatisticExtInfo:
 _outStatisticExtInfo(str, obj);
 break;
+			case T_UniqueRelInfo:
+_outUniqueRelInfo(str, obj);
+break;
 			case T_ExtensibleNode:
 _outExtensibleNode(str, obj);
 break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 5f339fdfde..b57be54df6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3568,7 +3568,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
  * relation_has_unique_index_for
  *	  Determine whether the relation provably has at most one row satisfying
  *	  a set of equality conditions, because the conditions constrain all
- *	  columns of some unique index.
+ *	  columns of some unique index. If index_info is not null, it is set to
+ *	  point to a new UniqueRelInfo containing the index and conditions.
  *
  * The conditions can be represented in either or both of two ways:
  * 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3589,12 +3590,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
-			  List *exprlist, List *oprlist)
+			  List *exprlist, List *oprlist,
+			  UniqueRelInfo **unique_info)
 {
 	ListCell   *ic;
 
 	Assert(list_length(exprlist) == list_length(oprlist));
 
+	if (unique_info)
+		*unique_info = NULL;
+
 	/* Short-circuit if no indexes... */
 	if (rel->indexlist == NIL)
 		return false;
@@ -3645,6 +3650,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List *column_values = NIL;
 
 		/*
 		 * If the index is not unique, or not immediately enforced, or if it's
@@ -3693,6 +3699,9 @@ relation_ha

Re: Removing unneeded self joins

2019-11-05 Thread Andrey Lepikhov

v.21 in the attachment fixes a small bug:
now we move all non-mergejoinable clauses from joininfo to baserestrictinfo 
because the relation will not be joined.


On 30/09/2019 13:29, Konstantin Knizhnik wrote:


Slightly refactored version of the patch with more comments.



--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From 4540345416f1d5ac71395773621d022d15b529f1 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 5 Nov 2019 17:57:23 +0500
Subject: [PATCH] Remove self joins v21

---
 src/backend/nodes/outfuncs.c  |   19 +-
 src/backend/optimizer/path/indxpath.c |   28 +-
 src/backend/optimizer/path/joinpath.c |6 +-
 src/backend/optimizer/plan/analyzejoins.c | 1155 -
 src/backend/optimizer/plan/planmain.c |7 +-
 src/backend/optimizer/util/pathnode.c |3 +-
 src/backend/optimizer/util/relnode.c  |   26 +-
 src/backend/utils/mmgr/aset.c |1 -
 src/include/nodes/nodes.h |1 +
 src/include/nodes/pathnodes.h |   22 +-
 src/include/optimizer/pathnode.h  |4 +
 src/include/optimizer/paths.h |3 +-
 src/include/optimizer/planmain.h  |7 +-
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  275 
 src/test/regress/expected/updatable_views.out |   15 +-
 src/test/regress/sql/equivclass.sql   |   17 +
 src/test/regress/sql/join.sql |  139 ++
 18 files changed, 1653 insertions(+), 107 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b0dcd02ff6..dc8cb9cec0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2274,7 +2274,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_OID_FIELD(userid);
 	WRITE_BOOL_FIELD(useridiscurrent);
 	/* we don't try to print fdwroutine or fdw_private */
-	/* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+	WRITE_NODE_FIELD(unique_for_rels);
+	/* can't print non_unique_for_rels; BMSes aren't Nodes */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_UINT_FIELD(baserestrict_min_security);
 	WRITE_NODE_FIELD(joininfo);
@@ -2346,6 +2347,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
 	WRITE_BITMAPSET_FIELD(keys);
 }
 
+static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+	WRITE_NODE_TYPE("UNIQUERELINFO");
+
+	WRITE_BITMAPSET_FIELD(outerrelids);
+	if (node->index)
+	{
+		WRITE_OID_FIELD(index->indexoid);
+		WRITE_NODE_FIELD(column_values);
+	}
+}
+
 static void
 _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 {
@@ -4122,6 +4136,9 @@ outNode(StringInfo str, const void *obj)
 			case T_StatisticExtInfo:
 _outStatisticExtInfo(str, obj);
 break;
+			case T_UniqueRelInfo:
+_outUniqueRelInfo(str, obj);
+break;
 			case T_ExtensibleNode:
 _outExtensibleNode(str, obj);
 break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 37b257cd0e..3d0d03b887 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3564,7 +3564,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
  * relation_has_unique_index_for
  *	  Determine whether the relation provably has at most one row satisfying
  *	  a set of equality conditions, because the conditions constrain all
- *	  columns of some unique index.
+ *	  columns of some unique index. If index_info is not null, it is set to
+ *	  point to a new UniqueRelInfo containing the index and conditions.
  *
  * The conditions can be represented in either or both of two ways:
  * 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3585,12 +3586,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
-			  List *exprlist, List *oprlist)
+			  List *exprlist, List *oprlist,
+			  UniqueRelInfo **unique_info)
 {
 	ListCell   *ic;
 
 	Assert(list_length(exprlist) == list_length(oprlist));
 
+	if (unique_info)
+		*unique_info = NULL;
+
 	/* Short-circuit if no indexes... */
 	if (rel->indexlist == NIL)
 		return false;
@@ -3641,6 +3646,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List *column_values = NIL;
 
 		/*
 		 * If the index is not unique, or not immediately enforced, or if it's
@@ -3689,6 +3695,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 if (match_index_to_operand(rexpr, c, ind))
 {
 	matched = true; /* column is unique */
+	column_values = lappend(column_values, rinfo->outer_is_left
+	

Re: pg_waldump and PREPARE

2019-11-07 Thread Andrey Lepikhov




On 08/11/2019 05:53, Fujii Masao wrote:

On Fri, Nov 8, 2019 at 9:41 AM Michael Paquier  wrote:


On Tue, Sep 03, 2019 at 10:00:08AM +0900, Fujii Masao wrote:

Sorry for the long delay... Yes, I will update the patch if necessary.


Fujii-san, are you planning to update this patch then?  I have
switched it as waiting on author.


No because there has been nothing to update in the latest patch for now
unless I'm missing something. So I'm just waiting for some new review
comments against the latest patch to come :)
Can I switch the status back to "Needs review"?

Regards,



One issue is that your patch provides little information. WAL error 
investigation often requires information on xid, subxacts, and 
delete-on-abort/commit rels; rarely, invalidation messages, etc.


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company




Re: pg_waldump and PREPARE

2019-11-07 Thread Andrey Lepikhov




On 08/11/2019 09:26, Kyotaro Horiguchi wrote:

Hello.

At Fri, 8 Nov 2019 08:23:41 +0500, Andrey Lepikhov  
wrote in

Can I switch the status back to "Needs review"?
Regards,



One issue is that your patch provides little information. WAL error
investigation often requires information on xid, subxacts, and
delete-on-abort/commit rels; rarely, invalidation messages, etc.


Basically agreed, but it can be very large in certain cases, even if
that is rare.


Maybe this is the reason for introducing a “verbose” option?

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company




Re: pg_waldump and PREPARE

2019-11-12 Thread Andrey Lepikhov




12.11.2019 12:41, Fujii Masao пишет:

Ok, I changed the patch that way.
Attached is the latest version of the patch.

Regards,


I did not see any problems in this version of the patch. The information 
displayed by pg_waldump for the PREPARE record is sufficient for use.


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company




Optimization of NestLoop join in the case of guaranteed empty inner subtree

2019-12-11 Thread Andrey Lepikhov
During NestLoop execution we have a bad corner case: if the outer subtree 
contains tuples, the join node will rescan the inner subtree for each 
outer tuple even if the inner never returns any tuples.


To reproduce the problem see 'problem.sql' in the attachment;
the output of EXPLAIN ANALYZE is in 'problem_explain.txt'.

As you can see, the executor scans each of the 1e5 outer tuples despite 
the fact that the inner side can't return any tuples. A hypothetical 
reproduction of the same shape is sketched below.
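
-- A hypothetical reproduction of the same shape (not the attached
-- problem.sql, just my guess from the plan in problem_explain.txt):
-- the filter removes every row of "small", so the inner side is
-- guaranteed empty, yet the nested loop still pulls outer tuples
-- one by one.
CREATE TABLE big AS
  SELECT id, repeat('x', 32) AS payload
  FROM generate_series(1, 100000) AS id;
ALTER TABLE big ADD PRIMARY KEY (id);
CREATE TABLE small AS
  SELECT id, repeat('y', 24) AS payload
  FROM generate_series(1, 1000) AS id;
VACUUM ANALYZE big;
VACUUM ANALYZE small;

EXPLAIN ANALYZE
SELECT * FROM big JOIN small ON big.id = small.id
WHERE small.id::numeric > 1000
ORDER BY big.id DESC
LIMIT 1;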


Teodor Sigaev and I developed a patch to solve this problem. The result of 
EXPLAIN ANALYZE with it can be found in 'optimized_execution.txt'.


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company


problem.sql
Description: application/sql
QUERY PLAN  
  
--
 Limit  (cost=0.29..1515.83 rows=1 width=80) (actual time=25.194..25.194 rows=0 
loops=1)
   ->  Nested Loop  (cost=0.29..504674.12 rows=333 width=80) (actual 
time=25.193..25.193 rows=0 loops=1)
 Join Filter: (big.id = small.id)
 ->  Index Scan Backward using big_pkey on big  (cost=0.29..5152.29 
rows=10 width=44) (actual time=0.005..12.933 rows=10 loops=1)
 ->  Materialize  (cost=0.00..22.66 rows=333 width=36) (actual 
time=0.000..0.000 rows=0 loops=10)
   ->  Seq Scan on small  (cost=0.00..21.00 rows=333 width=36) 
(actual time=0.145..0.145 rows=0 loops=1)
 Filter: ((id)::numeric > '1000'::numeric)
 Rows Removed by Filter: 1000
 Planning Time: 0.215 ms
 Execution Time: 25.214 ms
(10 rows)

>From 40ffbd2d9ee5954e5ae09f1e5a929ae2f2b17b8c Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 9 Dec 2019 18:25:04 +0500
Subject: [PATCH] Skip scan of outer subtree if inner of the NestedLoop node is
 guaranteed empty.

---
 src/backend/executor/nodeMaterial.c   | 11 +++
 src/backend/executor/nodeNestloop.c   |  5 +
 src/include/nodes/execnodes.h |  4 
 src/test/regress/expected/partition_prune.out |  8 
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index cc93bbe45b..98771647f6 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -135,6 +135,8 @@ ExecMaterial(PlanState *pstate)
 		if (TupIsNull(outerslot))
 		{
 			node->eof_underlying = true;
+			if (tuplestorestate && tuplestore_tuple_count(tuplestorestate) == 0)
+node->ss.ps.guaranteed_empty = true;
 			return NULL;
 		}
 
@@ -348,6 +350,9 @@ ExecReScanMaterial(MaterialState *node)
 			node->tuplestorestate = NULL;
 			if (outerPlan->chgParam == NULL)
 ExecReScan(outerPlan);
+			else
+/* At first look no cases can lead to this code. But still. */
+node->ss.ps.guaranteed_empty = false;
 			node->eof_underlying = false;
 		}
 		else
@@ -363,6 +368,12 @@ ExecReScanMaterial(MaterialState *node)
 		 */
 		if (outerPlan->chgParam == NULL)
 			ExecReScan(outerPlan);
+
+		/*
+		 * The node suppress tuplestore and guaranteed_empty trick isn't work.
+		 */
+		Assert(node->ss.ps.guaranteed_empty == false);
+
 		node->eof_underlying = false;
 	}
 }
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index fc6667ef82..ea26a7d9d1 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -164,6 +164,11 @@ ExecNestLoop(PlanState *pstate)
 		{
 			ENL1_printf("no inner tuple, need new outer tuple");
 
+			if (innerPlan->guaranteed_empty &&
+(node->js.jointype == JOIN_INNER ||
+ node->js.jointype == JOIN_SEMI))
+return NULL;
+
 			node->nl_NeedNewOuter = true;
 
 			if (!node->nl_MatchedOuter &&
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 692438d6df..c7f1a14526 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1007,6 +1007,9 @@ typedef struct PlanState
 	 * ExecConditionalAssignProjectionInfo()).  If no projection is necessary
 	 * ExecConditionalAssignProjectionInfo() defaults those fields to the scan
 	 * operations.
+	 *
+	 * If guaranteed_empty field is true, this means that no tuple will be
+	 * returned by the node.
 	 */
 	const TupleTableSlotOps *scanops;
 	const TupleTableSlotOps *outerops;
@@ -1020,6 +1023,7 @@ typedef struct PlanState
 	bool		outeropsset;
 	bool		inneropsset;
 	bool		resultopsset;
+	bool		guaranteed_empty;
 } PlanState;
 
 /* 
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index f9eeda60e6..04cfe0944e 100644
--- a/src/test/re

Re: Optimization of NestLoop join in the case of guaranteed empty inner subtree

2019-12-15 Thread Andrey Lepikhov



On 12/11/19 8:49 PM, Tom Lane wrote:

Andrey Lepikhov  writes:

During NestLoop execution we have bad corner case: if outer subtree
contains tuples the join node will scan inner subtree even if it does
not return any tuples.


So the first question about corner-case optimizations like this is always
"how much overhead does it add in the normal case where it fails to gain
anything?".  I see no performance numbers in your proposal.


I thought it was trivial. But a quick study shows no differences that can 
be seen.




I do not much like anything about the code, either: as written it's
only helpful for an especially narrow corner case (so narrow that
I wonder if it really ever helps at all: surely calling a nodeMaterial
whose tuplestore is empty doesn't cost much).


Scanning a large outer relation can be very costly. If you play with 
analytical queries, you can find cases where a nested loop uses 
materialization of zero tuples. At least one such case is finding data 
gaps.

Also, this optimization already exists in the hash join logic.


 But that doesn't stop it
from adding a bool to the generic PlanState struct, with global
implications.  What I'd expected from your text description is that
nodeNestLoop would remember whether its inner child had returned zero rows
the first time, and assume that subsequent executions could be skipped
unless the inner child's parameters change.


This is the note I was waiting for. I agree with you that adding a bool 
variable to PlanState is excessive. See another version of the 
optimization in the attachment.


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From a92617b82d922d5ebac7342bd8c212e0eb5b4553 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 9 Dec 2019 18:25:04 +0500
Subject: [PATCH] Skip scan of outer subtree if inner of the NestedLoop node is
 guaranteed empty.

---
 src/backend/executor/nodeNestloop.c   | 8 
 src/include/nodes/execnodes.h | 1 +
 src/test/regress/expected/partition_prune.out | 8 
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index fc6667ef82..4a7da5406d 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -164,6 +164,11 @@ ExecNestLoop(PlanState *pstate)
 		{
 			ENL1_printf("no inner tuple, need new outer tuple");
 
+			if (node->nl_InnerEmpty && list_length(nl->nestParams) == 0 &&
+(node->js.jointype == JOIN_INNER ||
+ node->js.jointype == JOIN_SEMI))
+return NULL;
+
 			node->nl_NeedNewOuter = true;
 
 			if (!node->nl_MatchedOuter &&
@@ -200,6 +205,8 @@ ExecNestLoop(PlanState *pstate)
 			 */
 			continue;
 		}
+		else
+			node->nl_InnerEmpty = false;
 
 		/*
 		 * at this point we have a new pair of inner and outer tuples so we
@@ -327,6 +334,7 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	{
 		case JOIN_INNER:
 		case JOIN_SEMI:
+			nlstate->nl_InnerEmpty = true;
 			break;
 		case JOIN_LEFT:
 		case JOIN_ANTI:
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 692438d6df..8829433347 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1847,6 +1847,7 @@ typedef struct NestLoopState
 	JoinState	js;/* its first field is NodeTag */
 	bool		nl_NeedNewOuter;
 	bool		nl_MatchedOuter;
+	bool		nl_InnerEmpty;
 	TupleTableSlot *nl_NullInnerTupleSlot;
 } NestLoopState;
 
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index f9eeda60e6..04cfe0944e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2455,9 +2455,9 @@ update ab_a1 set b = 3 from ab where ab.a = 1 and ab.a = ab_a1.a;
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
Index Cond: (a = 1)
-   ->  Bitmap Heap Scan on ab_a1_b3 ab_2 (actual rows=0 loops=1)
+   ->  Bitmap Heap Scan on ab_a1_b3 ab_2 (never executed)
  Recheck Cond: (a = 1)
- ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=0 loops=1)
+ ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
Index Cond: (a = 1)
  ->  Materialize (actual rows=0 loops=1)
->  Bitmap Heap Scan on ab_a1_b1 ab_a1_1 (actual rows=0 loops=1)
@@ -2496,9 +2496,9 @@ update ab_a1 set b = 3 from ab where ab.a = 1 and ab.a = ab_a1.a;
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
Index Cond: (a = 1)
-   -

Re: NOT IN subquery optimization

2020-01-04 Thread Andrey Lepikhov
At the top of the thread your co-author justified starting this work with 
"findings about the performance of PostgreSQL, MySQL, and Oracle on 
various subqueries":


https://antognini.ch/2017/12/how-well-a-query-optimizer-handles-subqueries/

I ran this test set with your "not_in ..." patch. Your optimization 
improves only the results of the D10-D13 queries. Nothing has changed for 
the bad plans of the E20-E27 and F20-F27 queries.


For example, we can replace E20 query:
SELECT * FROM large WHERE n IN (SELECT n FROM small WHERE small.u = large.u);
- execution time: 1370 ms, by
SELECT * FROM large WHERE EXISTS (SELECT n,u FROM small WHERE (small.u = 
large.u) AND (large.n = small.n)) AND n IS NOT NULL;
- execution time: 0.112 ms

E21 query:
SELECT * FROM large WHERE n IN (SELECT nn FROM small WHERE small.u = 
large.u); - 1553 ms, by
SELECT * FROM large WHERE EXISTS (SELECT nn FROM small WHERE (small.u = 
large.u) AND (small.nn = large.n)); - 0.194 ms


F27 query:
SELECT * FROM large WHERE nn NOT IN (SELECT nn FROM small WHERE small.nu 
= large.u); - 1653.048 ms, by
SELECT * FROM large WHERE NOT EXISTS (SELECT nn,nu FROM small WHERE 
(small.nu = large.u) AND (small.nn = large.nn)); - 274.019 ms


Are you planning to make another patch for these cases?

Also, I tried increasing work_mem up to 2GB: maybe hashed subqueries can 
improve the situation? But hashing did not improve the execution time of 
the queries significantly.


On your test cases (from the comments of the patch), subquery hashing 
gives the same execution time for queries No.13-17. For queries No.1-12 
it is not as slow as without hashing, but still works more slowly (by up 
to three orders of magnitude) than the NOT IN optimization.


On 12/2/19 9:25 PM, Li, Zheng wrote:

Here is the latest rebased patch.


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company




Re: NOT IN subquery optimization

2020-01-08 Thread Andrey Lepikhov




On 1/7/20 12:34 AM, Li, Zheng wrote:

Hi Andrey,

Thanks for the comment!

The unimproved cases you mentioned all fall into the category “correlated 
subquery”. This category is explicitly disallowed by existing code to convert 
to join in convert_ANY_sublink_to_join:
 /*
  * The sub-select must not refer to any Vars of the parent query. (Vars of
  * higher levels should be okay, though.)
  */
 if (contain_vars_of_level((Node *) subselect, 1))
 return NULL;

I think this is also the reason why hashed subplan is not used for such 
subqueries.

It's probably not always safe to convert a correlated subquery to join. We need 
to find out/prove when it’s safe/unsafe to convert such ANY subquery if we were 
to do so.



Maybe this part of the code contains a logical error?
You optimize only the special case of the "NOT IN" expression that is 
equivalent to NOT EXISTS. The convert_EXISTS_sublink_to_join() routine 
can handle vars of the parent query.

Could you give a trivial example of this problem?

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company




Re: Removing unneeded self joins

2020-01-27 Thread Andrey Lepikhov

Rebased version v.22:
- Added the enable_self_join_removal GUC (true by default); see the usage 
sketch below.
- The joinquals of the relation being removed are redistributed in 
accordance with the remove_rel_from_query() machinery.
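
-- A minimal usage sketch; the table "t" here is hypothetical, with a
-- unique "id" column. With the GUC enabled, the second scan of t should
-- disappear from the plan.
CREATE TABLE t (id int PRIMARY KEY, payload text);
SET enable_self_join_removal = on;  -- the default
EXPLAIN (COSTS OFF)
SELECT a.* FROM t a JOIN t b ON a.id = b.id;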


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From 2d194aab7f8e43805a61901de34b28fcc4e136b4 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 27 Jan 2020 14:51:20 +0500
Subject: [PATCH] Remove self joins v.22

---
 src/backend/nodes/outfuncs.c  |   19 +-
 src/backend/optimizer/path/indxpath.c |   28 +-
 src/backend/optimizer/path/joinpath.c |6 +-
 src/backend/optimizer/plan/analyzejoins.c | 1174 -
 src/backend/optimizer/plan/planmain.c |7 +-
 src/backend/optimizer/util/pathnode.c |3 +-
 src/backend/optimizer/util/relnode.c  |   26 +-
 src/backend/utils/misc/guc.c  |   11 +
 src/backend/utils/mmgr/aset.c |1 -
 src/include/nodes/nodes.h |1 +
 src/include/nodes/pathnodes.h |   22 +-
 src/include/optimizer/pathnode.h  |4 +
 src/include/optimizer/paths.h |3 +-
 src/include/optimizer/planmain.h  |7 +-
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  275 
 src/test/regress/expected/updatable_views.out |   15 +-
 src/test/regress/sql/equivclass.sql   |   17 +
 src/test/regress/sql/join.sql |  139 ++
 19 files changed, 1683 insertions(+), 107 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d76fae44b8..7b2b2641cd 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2278,7 +2278,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_OID_FIELD(userid);
 	WRITE_BOOL_FIELD(useridiscurrent);
 	/* we don't try to print fdwroutine or fdw_private */
-	/* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+	WRITE_NODE_FIELD(unique_for_rels);
+	/* can't print non_unique_for_rels; BMSes aren't Nodes */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_UINT_FIELD(baserestrict_min_security);
 	WRITE_NODE_FIELD(joininfo);
@@ -2350,6 +2351,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
 	WRITE_BITMAPSET_FIELD(keys);
 }
 
+static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+	WRITE_NODE_TYPE("UNIQUERELINFO");
+
+	WRITE_BITMAPSET_FIELD(outerrelids);
+	if (node->index)
+	{
+		WRITE_OID_FIELD(index->indexoid);
+		WRITE_NODE_FIELD(column_values);
+	}
+}
+
 static void
 _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 {
@@ -4131,6 +4145,9 @@ outNode(StringInfo str, const void *obj)
 			case T_StatisticExtInfo:
 _outStatisticExtInfo(str, obj);
 break;
+			case T_UniqueRelInfo:
+_outUniqueRelInfo(str, obj);
+break;
 			case T_ExtensibleNode:
 _outExtensibleNode(str, obj);
 break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..0c6e7eee74 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3564,7 +3564,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
  * relation_has_unique_index_for
  *	  Determine whether the relation provably has at most one row satisfying
  *	  a set of equality conditions, because the conditions constrain all
- *	  columns of some unique index.
+ *	  columns of some unique index. If index_info is not null, it is set to
+ *	  point to a new UniqueRelInfo containing the index and conditions.
  *
  * The conditions can be represented in either or both of two ways:
  * 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3585,12 +3586,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
-			  List *exprlist, List *oprlist)
+			  List *exprlist, List *oprlist,
+			  UniqueRelInfo **unique_info)
 {
 	ListCell   *ic;
 
 	Assert(list_length(exprlist) == list_length(oprlist));
 
+	if (unique_info)
+		*unique_info = NULL;
+
 	/* Short-circuit if no indexes... */
 	if (rel->indexlist == NIL)
 		return false;
@@ -3641,6 +3646,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List *column_values = NIL;
 
 		/*
 		 * If the index is not unique, or not immediately enforced, or if it's
@@ -3689,6 +3695,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 if (match_index_to_operand(rexpr, c, ind))
 {
 	matched = true; /* column is unique */
+	column_values = lappend(column_values, rinfo->outer_is_left
+	

Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist

2019-04-01 Thread Andrey Lepikhov



On 25/03/2019 15:21, Heikki Linnakangas wrote:

I had another quick look.

I still think using the "generic xlog AM" for this is a wrong level of 
abstraction, and we should use the XLOG_FPI records for this directly. 
We can extend XLOG_FPI so that it can store multiple pages in a single 
record, if it doesn't already handle it.


Another counter-point to using the generic xlog record is that you're 
currently doing unnecessary two memcpy's of all pages in the index, in 
GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.


I guess the generic_log_relation() function can stay where it is, but it 
should use XLogRegisterBuffer() and XLogInsert() directly.


Patch set v.3 uses XLOG_FPI records directly.

As a benchmark I use a script (test.sql in the attachment) which shows the 
WAL size increment during index build; a hypothetical sketch of such a 
measurement is below. In the table below you can see the influence of the 
patch on WAL growth.
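
-- For illustration, a hypothetical psql snippet of the same kind (not
-- the attached test.sql): compare WAL insert positions around the
-- index build.
CREATE TABLE t AS
  SELECT ARRAY[(random() * 1000)::int, (random() * 1000)::int] AS arr
  FROM generate_series(1, 1000000);

SELECT pg_current_wal_insert_lsn() AS start_lsn \gset

CREATE INDEX t_gin_idx ON t USING gin (arr);

SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_insert_lsn(), :'start_lsn')
       ) AS wal_generated_by_index_build;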


Results
===
AM  | master | patch |
GIN | 347 MB | 66 MB |
GiST| 157 MB | 43 MB |
SP-GiST | 119 MB | 38 MB |

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From 2edd0ada13b4749487d0f046191ef2bcf8b11ca3 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 2 Apr 2019 09:42:59 +0500
Subject: [PATCH 1/4] Relation-into-WAL-function

---
 src/backend/access/transam/generic_xlog.c | 63 +++
 src/include/access/generic_xlog.h |  3 ++
 2 files changed, 66 insertions(+)

diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index 5b00b7275b..0d5025f78a 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -16,6 +16,7 @@
 #include "access/bufmask.h"
 #include "access/generic_xlog.h"
 #include "access/xlogutils.h"
+#include "catalog/pg_control.h"
 #include "miscadmin.h"
 #include "utils/memutils.h"
 
@@ -542,3 +543,65 @@ generic_mask(char *page, BlockNumber blkno)
 
 	mask_unused_space(page);
 }
+
+/*
+ * Function for WAL-logging all pages of a relation.
+ * Caller is responsible for locking the relation exclusively.
+ */
+void
+log_relation(Relation rel)
+{
+	BlockNumber 		blkno = 0;
+	BlockNumber 		nblocks;
+	Bufferbufpack[XLR_MAX_BLOCK_ID];
+
+	CHECK_FOR_INTERRUPTS();
+
+	/*
+	 * Iterate over all index pages and WAL-logging it. Pages are grouping into
+	 * the packages before adding to a WAL-record. Zero pages are not logged.
+	 */
+	nblocks = RelationGetNumberOfBlocks(rel);
+	while (blkno < nblocks)
+	{
+		XLogRecPtr recptr;
+		int8 nbufs = 0;
+		int8 i;
+
+		/*
+		 * Assemble package of relation blocks. Try to combine the maximum
+		 * possible number of blocks in one record.
+		 */
+		while (nbufs < XLR_MAX_BLOCK_ID && blkno < nblocks)
+		{
+			Buffer buf = ReadBuffer(rel, blkno);
+
+			if (!PageIsNew(BufferGetPage(buf)))
+bufpack[nbufs++] = buf;
+			else
+ReleaseBuffer(buf);
+			blkno++;
+		}
+
+		XLogBeginInsert();
+		XLogEnsureRecordSpace(nbufs, 0);
+
+		START_CRIT_SECTION();
+		for (i = 0; i < nbufs; i++)
+		{
+			LockBuffer(bufpack[i], BUFFER_LOCK_EXCLUSIVE);
+			XLogRegisterBuffer(i, bufpack[i], REGBUF_FORCE_IMAGE | REGBUF_STANDARD);
+		}
+
+		recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI);
+
+		for (i = 0; i < nbufs; i++)
+		{
+			Page page = BufferGetPage(bufpack[i]);
+			PageSetLSN(page, recptr);
+			MarkBufferDirty(bufpack[i]);
+			UnlockReleaseBuffer(bufpack[i]);
+		}
+		END_CRIT_SECTION();
+	}
+}
diff --git a/src/include/access/generic_xlog.h b/src/include/access/generic_xlog.h
index cb5b5b713a..8abfa486c7 100644
--- a/src/include/access/generic_xlog.h
+++ b/src/include/access/generic_xlog.h
@@ -42,4 +42,7 @@ extern const char *generic_identify(uint8 info);
 extern void generic_desc(StringInfo buf, XLogReaderState *record);
 extern void generic_mask(char *pagedata, BlockNumber blkno);
 
+/* other utils */
+extern void log_relation(Relation rel);
+
 #endif			/* GENERIC_XLOG_H */
-- 
2.17.1

>From 8857657efa8f3347010d3b251315f800dd03bf8d Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 2 Apr 2019 09:43:20 +0500
Subject: [PATCH 2/4] GIN-Optimal-WAL-Usage

---
 src/backend/access/gin/ginbtree.c |  6 ++---
 src/backend/access/gin/gindatapage.c  |  9 
 src/backend/access/gin/ginentrypage.c |  2 +-
 src/backend/access/gin/gininsert.c| 30 ++--
 src/backend/access/gin/ginutil.c  |  4 ++--
 src/backend/access/gin/ginvacuum.c|  2 +-
 src/backend/access/gin/ginxlog.c  | 33 ---
 src/backend/access/rmgrdesc/gindesc.c |  6 -
 src/include/access/gin.h  |  3 ++-
 src/include/access/ginxlog.h  |  2 --
 10 files changed, 26 insertions(+), 71 deletions(-)

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 533949e46a..9f82eef8

Re: Failure in contrib test _int on loach

2019-04-10 Thread Andrey Lepikhov



On 10/04/2019 20:25, Heikki Linnakangas wrote:

On 09/04/2019 19:11, Anastasia Lubennikova wrote:

05.04.2019 19:41, Anastasia Lubennikova writes:

In attachment, you can find patch with a test that allows to reproduce
the bug not randomly, but on every run.
Now I'm trying to find a way to fix the issue.


The problem was caused by incorrect detection of the page to insert new
tuple after split.
If gistinserttuple() of the tuple formed by gistgetadjusted() had to
split the page, we must go back to the parent and
descend back to the child that's a better fit for the new tuple.

Previously this was handled by the code block with the following comment:

* Concurrent split detected. There's no guarantee that the
* downlink for this page is consistent with the tuple we're
* inserting anymore, so go back to parent and rechoose the best
* child.

After introducing GistBuildNSN this code path became unreachable.
To fix it, I added new flag to detect such splits during indexbuild.


Isn't it possible that the grandparent page is also split, so that we'd 
need to climb further up?
Based on Anastasia's idea I prepared an alternative solution to fix the 
bug (see attachment).
It utilizes the idea of a linear increment of LSN/NSN. The WAL write 
process is used to change the NSN value to 1 for each block of the index 
relation.

I hope this can be a fairly clear and safe solution.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From 59e1519a0a48b879777820ff68116c68ed31e684 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 11 Apr 2019 10:52:39 +0500
Subject: [PATCH] Alt fix for gist_optimized_wal_intarray_test bug

---
 src/backend/access/gin/gininsert.c  |  2 +-
 src/backend/access/gist/gist.c  |  4 ++--
 src/backend/access/gist/gistbuild.c | 17 +++--
 src/backend/access/spgist/spginsert.c   |  2 +-
 src/backend/access/transam/xloginsert.c |  4 +++-
 src/include/access/gist.h   |  6 --
 src/include/access/gist_private.h   |  2 ++
 src/include/access/xloginsert.h |  6 +-
 8 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 55eab14617..a63f33b429 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -413,7 +413,7 @@ ginbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	{
 		log_newpage_range(index, MAIN_FORKNUM,
 		  0, RelationGetNumberOfBlocks(index),
-		  true);
+		  true, NULL);
 	}
 
 	/*
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 2db790c840..56f6ce04db 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -492,7 +492,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 		 * we don't need to be able to detect concurrent splits yet.)
 		 */
 		if (is_build)
-			recptr = GistBuildLSN;
+			recptr = gistBuildLSN++;
 		else
 		{
 			if (RelationNeedsWAL(rel))
@@ -559,7 +559,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 			MarkBufferDirty(leftchildbuf);
 
 		if (is_build)
-			recptr = GistBuildLSN;
+			recptr = gistBuildLSN++;
 		else
 		{
 			if (RelationNeedsWAL(rel))
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 8e81eda517..31118b54cf 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -76,6 +76,13 @@ typedef struct
 	GistBufferingMode bufferingMode;
 } GISTBuildState;
 
+/*
+ * A bogus LSN / NSN value used during index build. Must be smaller than any
+ * real or fake unlogged LSN, so that after an index build finishes, all the
+ * splits are considered completed.
+ */
+XLogRecPtr gistBuildLSN = 0;
+
 /* prototypes for private functions */
 static void gistInitBuffering(GISTBuildState *buildstate);
 static int	calculatePagesPerBuffer(GISTBuildState *buildstate, int levelStep);
@@ -107,6 +114,12 @@ static void gistMemorizeParent(GISTBuildState *buildstate, BlockNumber child,
 static void gistMemorizeAllDownlinks(GISTBuildState *buildstate, Buffer parent);
 static BlockNumber gistGetParent(GISTBuildState *buildstate, BlockNumber child);
 
+static void
+gistbuild_log_mask(char *page)
+{
+	GistPageSetNSN((Page) page, (uint64) 1);
+}
+
 /*
  * Main entry point to GiST index build. Initially calls insert over and over,
  * but switches to more efficient buffering build algorithm after a certain
@@ -180,7 +193,7 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	GISTInitBuffer(buffer, F_LEAF);
 
 	MarkBufferDirty(buffer);
-	PageSetLSN(page, GistBuildLSN);
+	PageSetLSN(page, gistBuildLSN++);
 
 	UnlockReleaseBuffer(buffer);
 
@@ -222,7 +235,7 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	{
 		log_newpage_range(index, MAIN_FORKNUM,
 		  0, RelationGetNumberOfBlocks(inde

Re: Failure in contrib test _int on loach

2019-04-11 Thread Andrey Lepikhov




On 11/04/2019 13:14, Heikki Linnakangas wrote:

On 11/04/2019 09:10, Andrey Lepikhov wrote:

On 10/04/2019 20:25, Heikki Linnakangas wrote:

On 09/04/2019 19:11, Anastasia Lubennikova wrote:

After introducing GistBuildNSN this code path became unreachable.
To fix it, I added new flag to detect such splits during indexbuild.


Isn't it possible that the grandparent page is also split, so that we'd
need to climb further up?


Based on Anastasia's idea I prepared an alternative solution to fix the bug
(see attachment).
It utilizes the idea of a linear increment of LSN/NSN. The WAL write process
is used to change the NSN value to 1 for each block of the index relation.
I hope this can be a fairly clear and safe solution.


That's basically the same idea as always using the "fake LSN" during 
index build, like the original version of this patch did. It's got the 
problem that I mentioned at 
https://www.postgresql.org/message-id/090fb3cb-1ca4-e173-ecf7-47d41ebac...@iki.fi: 



* Using "fake" unlogged LSNs for GiST index build seemed fishy. I 
could not convince myself that it was safe in all corner cases. In a 
recently initdb'd cluster, it's theoretically possible that the fake 
LSN counter overtakes the real LSN value, and that could lead to 
strange behavior. For example, how would the buffer manager behave, if 
there was a dirty page in the buffer cache with an LSN value that's 
greater than the current WAL flush pointer? I think you'd get "ERROR: 
xlog flush request %X/%X is not satisfied --- flushed only to %X/%X".


Perhaps the risk is theoretical; the real WAL begins at XLOG_SEG_SIZE, 
so with defaults WAL segment size, the index build would have to do 
about 16 million page splits. The index would have to be at least 150 GB 
for that. But it seems possible, and with non-default segment and page 
size settings more so.
As I see in bufmgr.c, XLogFlush() can't be called during index build. In 
the log_newpage_range() call we can use a mask to set the value of NSN 
(and LSN) to 1.


Perhaps we could start at 1, but instead of using a global counter, 
whenever a page is split, we take the parent's LSN value and increment 
it by one. So different branches of the tree could use the same values, 
which would reduce the consumption of the counter values.


Yet another idea would be to start the counter at 1, but check that it 
doesn't overtake the WAL insert pointer. If it's about to overtake it, 
just generate some dummy WAL.


But it seems best to deal with this in gistdoinsert(). I think 
Anastasia's approach of adding a flag to GISTInsertStack can be made to 
work, if we set the flag somewhere in gistinserttuples() or 
gistplacetopage(), whenever a page is split. That way, if it needs to 
split multiple levels, the flag is set on all of the corresponding 
GISTInsertStack entries.


Yet another trivial fix would be just always start the tree descend from 
the root in gistdoinsert(), if a page is split. Not as efficient, but 
probably negligible in practice.

Agreed.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company




Do CustomScan as not projection capable node

2019-04-18 Thread Andrey Lepikhov
Can we include the CustomScan node in the list of nodes that do not 
support projection?
The reason is that a custom node can contain quite arbitrary logic that 
does not guarantee projection support.
Secondly, if the planner does not need a separate Result node, it just 
assigns the tlist to the subplan (i.e. changes the targetlist of the 
custom node) and does not change custom_scan_tlist.
Perhaps I do not fully understand the logic of using the 
custom_scan_tlist field. But if in the PlanCustomPath() routine our 
custom node does not build its own custom_scan_tlist (maybe it will use 
tlist as a base for custom_scan_tlist), we will get errors in the 
set_customscan_references() call.


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From 938904d179e0a4e31cbb20fb70243d2b980d8dc2 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 19 Apr 2019 09:34:09 +0500
Subject: [PATCH] Add CustomScan node in the list of nodes that do not support
 projection

---
 src/backend/optimizer/plan/createplan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index efe073a3ee..682d4d4429 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -6691,6 +6691,7 @@ is_projection_capable_path(Path *path)
 		case T_ModifyTable:
 		case T_MergeAppend:
 		case T_RecursiveUnion:
+		case T_CustomScan:
 			return false;
 		case T_Append:
 
-- 
2.17.1



Re: Do CustomScan as not projection capable node

2019-04-22 Thread Andrey Lepikhov




On 22/04/2019 18:40, Robert Haas wrote:

On Fri, Apr 19, 2019 at 12:45 AM Tom Lane  wrote:

I don't buy this for a minute.  Where do you think projection is
going to happen?  There isn't any existing node type that *couldn't*
support projection if we insisted that that be done across-the-board.
I think it's mostly just a legacy thing that some don't.


I think there may actually be some good reasons for that.  If
something like an Append or Material node projects, it seems to me
that this means that we built the wrong tlist for its input(s).

That justification doesn't apply to custom scans, though.
The main reason for my question was incomplete information about the 
custom_scan_tlist / fdw_scan_tlist parameter.
While testing my custom node, I encountered an error in setrefs.c caused 
by optimization of the projection operation. To reliably understand how 
to use custom_scan_tlist properly, I had to study the mechanics of the 
FDW plan generator in detail, and now the problem is solved.
We have only three references to this parameter in the hackers mailing 
list, a brief reference on postgresql.org, and limited comments in two 
patches: 1a8a4e5 and e7cb7ee.
It is possible that custom_scan_tlist is designed too nontrivially, and 
it may need some comments describing in more detail how to use it.


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company




Re: [PATCH] XLogReadRecord returns pointer to currently read page

2018-10-22 Thread Andrey Lepikhov




On 22.10.2018 2:06, Heikki Linnakangas wrote:

On 17/08/2018 06:47, Andrey Lepikhov wrote:

I propose a patch to fix one small code defect.
The XLogReadRecord() function reads the pages of a WAL segment that
contain a WAL record. Then it creates a readRecordBuf buffer in the private
memory of a backend and copies the record from the pages to the readRecordBuf
buffer. The 'record' pointer is set to the beginning of the readRecordBuf
buffer.

But if the WAL record is placed entirely on one page, the XLogReadRecord()
function forgets to bind the "record" pointer to the beginning of the
readRecordBuf buffer. In this case, XLogReadRecord() returns a pointer
to an internal xlog page. This patch fixes the defect.


Hmm. I agree it looks weird the way it is. But I wonder, why do we even 
copy the record to readRecordBuf, if it fits on the WAL page? Returning 
a pointer to the xlog page buffer seems OK in that case. What if we 
remove the memcpy(), instead?


It depends on the PostgreSQL roadmap. If we want to compress the main data 
and encode something to reduce the WAL-record size, then the 
representation of the WAL record in shared buffers (packed) and in local 
memory (unpacked) will be different and the patch is needed.
If something like this should not be in the PostgreSQL core, then it is 
more efficient to remove the memcpy(), of course.

I vote for the patch.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: [PATCH] XLogReadRecord returns pointer to currently read page

2018-10-22 Thread Andrey Lepikhov


On 23.10.2018 0:53, Heikki Linnakangas wrote:
I'd expect the decompression to read from the on-disk buffer, and unpack 
to readRecordBuf; I still don't see a need to copy the packed record to 
readRecordBuf. If there is a need for that, though, the patch that 
implements the packing or compression can add the memcpy() where it 
needs it.


I agree with that. Eventually, the placement of the WAL record can be 
determined by comparing the record, readBuf and readRecordBuf pointers.

A new version of the patch is in the attachment.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From 36fd35dc75658f471efbc64fe2a3f204f0aa27e4 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 23 Oct 2018 10:17:55 +0500
Subject: [PATCH] WAL-record-buffer-pointer-fix

---
 src/backend/access/transam/xlogreader.c | 27 -
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 0768ca7822..c5e019bf77 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -353,19 +353,6 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 		gotheader = false;
 	}
 
-	/*
-	 * Enlarge readRecordBuf as needed.
-	 */
-	if (total_len > state->readRecordBufSize &&
-		!allocate_recordbuf(state, total_len))
-	{
-		/* We treat this as a "bogus data" condition */
-		report_invalid_record(state, "record length %u at %X/%X too long",
-			  total_len,
-			  (uint32) (RecPtr >> 32), (uint32) RecPtr);
-		goto err;
-	}
-
 	len = XLOG_BLCKSZ - RecPtr % XLOG_BLCKSZ;
 	if (total_len > len)
 	{
@@ -375,6 +362,19 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 		char	   *buffer;
 		uint32		gotlen;
 
+		/*
+		 * Enlarge readRecordBuf as needed.
+		 */
+		if (total_len > state->readRecordBufSize &&
+			!allocate_recordbuf(state, total_len))
+		{
+			/* We treat this as a "bogus data" condition */
+			report_invalid_record(state, "record length %u at %X/%X too long",
+  total_len,
+  (uint32) (RecPtr >> 32), (uint32) RecPtr);
+			goto err;
+		}
+
 		/* Copy the first fragment of the record from the first page. */
 		memcpy(state->readRecordBuf,
 			   state->readBuf + RecPtr % XLOG_BLCKSZ, len);
@@ -479,7 +479,6 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 		state->EndRecPtr = RecPtr + MAXALIGN(total_len);
 
 		state->ReadRecPtr = RecPtr;
-		memcpy(state->readRecordBuf, record, total_len);
 	}
 
 	/*
-- 
2.17.1



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2018-10-23 Thread Andrey Lepikhov




On 19.10.2018 0:54, Peter Geoghegan wrote:

I would welcome any theories as to what could be the problem here. I
think that this is fixable, since the picture for the patch is very
positive, provided you only focus on bgwriter/checkpoint activity and
on-disk sizes. It seems likely that there is a very specific gap in my
understanding of how the patch affects buffer cleaning.


I have the same problem with the background heap & index cleaner (based 
on your patch). In this case the bottleneck is the WAL record that I need 
to write for each cleaned block, and the locks that are held while the 
WAL record is being written.

Maybe you could run a test without writing any data to disk?

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: [PATCH] XLogReadRecord returns pointer to currently read page

2018-10-25 Thread Andrey Lepikhov



On 26.10.2018 10:33, Kyotaro HORIGUCHI wrote:

Hello.

At Tue, 23 Oct 2018 10:25:27 +0500, Andrey Lepikhov  wrote 
in <2553f2b0-0e39-eb0e-d382-6c0ed08ca...@postgrespro.ru>


On 23.10.2018 0:53, Heikki Linnakangas wrote:

I'd expect the decompression to read from the on-disk buffer, and
unpack to readRecordBuf, I still don't see a need to copy the packed
record to readRecordBuf. If there is a need for that, though, the
patch that implements the packing or compression can add the memcpy()
where it needs it.


I agree with that. Eventually, the placement of the WAL record can be
determined by comparing the record, readBuf and readRecordBuf pointers.
A new version of the patch is attached.


This looks quite clear and what it does is reasonable to me.
Applies cleanly on top of current master and no regression seen.


Just one comment. This patch leaves the following code.

  >  /* Record does not cross a page boundary */
  >  if (!ValidXLogRecord(state, record, RecPtr))
  >  goto err;
  >
  >  state->EndRecPtr = RecPtr + MAXALIGN(total_len);
!>
  >  state->ReadRecPtr = RecPtr;
  >  }

The empty line (marked by '!') looks a bit too much standing out
after this patch. Could you remove the line? Then could you
transpose the two assignments if you don't mind?


Thanks, see attachment.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From ec1df6c2b41fdfe2c79e3f0944653057e394c535 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 23 Oct 2018 10:17:55 +0500
Subject: [PATCH] WAL record buffer pointer fix

---
 src/backend/access/transam/xlogreader.c | 30 -
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 0768ca7822..82a16e78f3 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -353,19 +353,6 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 		gotheader = false;
 	}
 
-	/*
-	 * Enlarge readRecordBuf as needed.
-	 */
-	if (total_len > state->readRecordBufSize &&
-		!allocate_recordbuf(state, total_len))
-	{
-		/* We treat this as a "bogus data" condition */
-		report_invalid_record(state, "record length %u at %X/%X too long",
-			  total_len,
-			  (uint32) (RecPtr >> 32), (uint32) RecPtr);
-		goto err;
-	}
-
 	len = XLOG_BLCKSZ - RecPtr % XLOG_BLCKSZ;
 	if (total_len > len)
 	{
@@ -375,6 +362,19 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 		char	   *buffer;
 		uint32		gotlen;
 
+		/*
+		 * Enlarge readRecordBuf as needed.
+		 */
+		if (total_len > state->readRecordBufSize &&
+			!allocate_recordbuf(state, total_len))
+		{
+			/* We treat this as a "bogus data" condition */
+			report_invalid_record(state, "record length %u at %X/%X too long",
+  total_len,
+  (uint32) (RecPtr >> 32), (uint32) RecPtr);
+			goto err;
+		}
+
 		/* Copy the first fragment of the record from the first page. */
 		memcpy(state->readRecordBuf,
 			   state->readBuf + RecPtr % XLOG_BLCKSZ, len);
@@ -476,10 +476,8 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 		if (!ValidXLogRecord(state, record, RecPtr))
 			goto err;
 
-		state->EndRecPtr = RecPtr + MAXALIGN(total_len);
-
 		state->ReadRecPtr = RecPtr;
-		memcpy(state->readRecordBuf, record, total_len);
+		state->EndRecPtr = RecPtr + MAXALIGN(total_len);
 	}
 
 	/*
-- 
2.17.1



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2018-11-02 Thread Andrey Lepikhov

I have done a code review.
For now it covers the first patch - v6-0001... - dedicated to the 
ordering of logical duplicates.


The documentation is complete and clear. All non-trivial logic is 
commented accurately.


The patch applies cleanly on top of current master. Regression tests pass, 
and my "Retail Indextuple deletion" use cases work without mistakes.

But I have two comments on the code.
The new BTScanInsert structure reduces the parameter lists of many 
functions and looks fine. But it contains an optimization part (the 
'restorebinsrch' field et al.) that is used very locally in the code - 
the _bt_findinsertloc()->_bt_binsrch() call path. Maybe you could move 
this logic into a separate struct passed to _bt_binsrch() as a pointer; 
other callers could simply pass NULL. That might make the struct easier 
to use.


Also, because of this optimization, _bt_binsrch() has roughly doubled in 
size. Maybe you could move that code into a separate service routine?



--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2018-11-03 Thread Andrey Lepikhov




On 03.11.2018 5:00, Peter Geoghegan wrote:

The DESC heap TID sort order thing probably needs to go. I'll probably
have to go fix the regression test failures that occur when ASC heap
TID order is used. (Technically those failures are a pre-existing
problem, a problem that I mask by using DESC order...which is weird.
The problem is masked in the master branch by accidental behaviors
around nbtree duplicates, which is something that deserves to die.
DESC order is closer to the accidental current behavior.)


I applied your patches on top of master. After correcting some tests 
(related to TID ordering in index relations during DROP ... CASCADE), 
'make check-world' passed successfully many times.
As for the 'create view' regression test - the 'drop cascades to 62 
other objects' problem - I verified Álvaro Herrera's hypothesis [1] and 
it holds. You can confirm it by tracking the return value of the 
object_address_present_add_flags() routine.

Some doubts remain, however, about the 'triggers' test.
Could you specify which test failures you mean?

[1] 
https://www.postgresql.org/message-id/20180504022601.fflymidf7eoencb2%40alvherre.pgsql


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2018-11-04 Thread Andrey Lepikhov




On 04.11.2018 9:31, Peter Geoghegan wrote:

On Sat, Nov 3, 2018 at 8:52 PM Andrey Lepikhov
 wrote:

I applied your patches on top of master. After correcting some tests
(related to TID ordering in index relations during DROP ... CASCADE),
'make check-world' passed successfully many times.
As for the 'create view' regression test - the 'drop cascades to 62
other objects' problem - I verified Álvaro Herrera's hypothesis [1] and
it holds. You can confirm it by tracking the return value of the
object_address_present_add_flags() routine.


I'll have to go and fix the problem directly, so that ASC sort order
can be used.


Some doubts remain, however, about the 'triggers' test.
Could you specify which test failures you mean?


Not sure what you mean. The order of items that are listed in the
DETAIL for a cascading DROP can have an "implementation defined"
order. I think that this is an example of the more general problem --
what you call the 'drop cascades to 62 other objects' problem is a
more specific subproblem, or, if you prefer, a more specific symptom
of the same problem.


I mean that your code has no problems that I can detect with the 
regression tests or with the retail index tuple deletion patch.
The difference in the number of dropped objects is not a problem. It is 
caused by line 2293 - 'else if (thisobj->objectSubId == 0)' - in 
catalog/dependency.c, and it is legitimate behavior: a column object is 
deleted without any report because we already decided to drop its whole 
table.


Also, I checked the triggers test. The difference in the ERROR message 
'cannot drop trigger trg1' is caused by a different order of tuples in 
the relation scanned with the DependDependerIndexId relid. This is 
legitimate behavior, and we can simply update the expected test results.


Do you know of any other problems with the patch?

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2018-11-05 Thread Andrey Lepikhov
 to fixing the
processing order of this findDependentObjects() pg_depend scan so that
we reliably get the user-visible behavior we already tacitly expect?

--
Peter Geoghegan



--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: [PATCH] XLogReadRecord returns pointer to currently read page

2018-11-18 Thread Andrey Lepikhov



On 16.11.2018 11:23, Michael Paquier wrote:

On Thu, Nov 15, 2018 at 06:12:38PM +0900, Kyotaro HORIGUCHI wrote:

This patch eliminates unnecessary copying that was done for
non-continued records. Now the return value of XLogReadRecord
directly points into the page buffer held in XLogReaderState. It is 
safe because no caller site uses the returned pointer beyond the
replacement of buffer content at the next call to the same
function.


I was looking at this patch, and shouldn't we worry about compatibility
with plugins or utilities which look directly at what's in readRecordBuf
for the record contents?  Let's not forget that the contents of
XLogReaderState are public.


Based on my experience, I have clarified some comments to avoid such 
mistakes in the future (see attachment).


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From 7de252405ef5d5979fe2711515c0c6402abc0e06 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 19 Nov 2018 07:08:28 +0500
Subject: [PATCH] Some clarifications on readRecordBuf comments

---
 src/backend/access/transam/xlogreader.c | 4 ++--
 src/include/access/xlogreader.h | 6 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index c5e019bf77..88d0bcf48a 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -209,8 +209,8 @@ allocate_recordbuf(XLogReaderState *state, uint32 reclength)
  * If the reading fails for some other reason, NULL is also returned, and
  * *errormsg is set to a string with details of the failure.
  *
- * The returned pointer (or *errormsg) points to an internal buffer that's
- * valid until the next call to XLogReadRecord.
+ * The returned pointer (or *errormsg) points to an internal read-only buffer
+ * that's valid until the next call to XLogReadRecord.
  */
 XLogRecord *
 XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 40116f8ecb..0837625b95 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -185,7 +185,11 @@ struct XLogReaderState
 	 */
 	TimeLineID	nextTLI;
 
-	/* Buffer for current ReadRecord result (expandable) */
+	/*
+	 * Buffer for current ReadRecord result (expandable).
+	 * Used in the case, than current ReadRecord result don't fit on the
+	 * currently read WAL page.
+	 */
 	char	   *readRecordBuf;
 	uint32		readRecordBufSize;
 
-- 
2.17.1



Re: [PATCH] XLogReadRecord returns pointer to currently read page

2018-11-19 Thread Andrey Lepikhov




On 20.11.2018 6:30, Michael Paquier wrote:

On Mon, Nov 19, 2018 at 10:48:06AM +0500, Andrey Lepikhov wrote:

Based on my experience, I have clarified some comments to avoid such
mistakes in the future (see attachment).


No objections from here.


- * The returned pointer (or *errormsg) points to an internal buffer that's
- * valid until the next call to XLogReadRecord.
+ * The returned pointer (or *errormsg) points to an internal read-only buffer
+ * that's valid until the next call to XLogReadRecord.


Not sure that this bit adds much.

Ok



-   /* Buffer for current ReadRecord result (expandable) */
+   /*
+* Buffer for current ReadRecord result (expandable).
+* Used in the case, than current ReadRecord result don't fit on the
+* currently read WAL page.
+*/


Yeah, this one is not entirely true now.  What about the following
instead:
-   /* Buffer for current ReadRecord result (expandable) */
+   /*
+* Buffer for current ReadRecord result (expandable), used when a record
+* crosses a page boundary.
+*/


I agree


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: Removing unneeded self joins

2022-12-15 Thread Andrey Lepikhov

On 12/6/22 23:46, Andres Freund wrote:

This doesn't pass the main regression tests due to a plan difference.
https://cirrus-ci.com/task/5537518245380096
https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out 
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 
19:11:52.453920838 +
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 
2022-12-05 19:15:21.864183651 +
@@ -5806,7 +5806,7 @@
   Nested Loop
 Join Filter: (sj_t3.id = sj_t1.id)
 ->  Nested Loop
- Join Filter: (sj_t3.id = sj_t2.id)
+ Join Filter: (sj_t2.id = sj_t3.id)
   ->  Nested Loop Semi Join
 ->  Nested Loop
   ->  HashAggregate

This change in the test behaviour is induced by commit a5fc4641,
"Avoid making commutatively-duplicate clauses in EquivalenceClasses."
Nothing special, as far as I can see. The attached patch fixes this.

--
Regards
Andrey Lepikhov
Postgres Professional
From 3e546637561bf4c6d195bc7c95b1e05263e797e2 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 5 Oct 2022 16:58:34 +0500
Subject: [PATCH] Remove self-joins.

The Self Join Removal (SJR) feature removes an inner join of a plain table to itself
in a query plan if it can be proved that the join can be replaced with a scan.
The proof is based on the innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If the uniqueness of the outer relation is deduced from a baserestrictinfo clause
like 'x=const' on unique field(s), check that the inner relation has the same clause
with the same constant(s).
4. Compare the RowMarks of the inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1046 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  774 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  340 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2278 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8e4145979d..2f9948d5f8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5311,6 +5311,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 914bfd90bc..8d57c68b1f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
 			  List *exprlist, List *oprlist)
+{
+	return relation_has_unique_index_ext(root, rel, restrictlist,
+		 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+			  List *restrictlist,
+			  List *exprlist, List *oprlist,
+			  List **extra_clauses)
 {
 	ListCell   *ic;
 
@@ -3549,6 +3564,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List	   *exprs = NIL;
 
 		/*
 		 * 

Optimization issue of branching UNION ALL

2022-12-20 Thread Andrey Lepikhov

Hi,

A client report on a corner case has shown up a possible minor 
non-optimality in the procedure that transforms a simple UNION ALL 
statement tree.
The complaint is about an auto-generated query with 1E4 simple UNION ALLs 
(see t.sh, which generates a demo script). The reason: REL_11_STABLE plans 
and executes it in about a second, but REL_12_STABLE and beyond make 
matters worse: planning such a query needs tons of gigabytes of RAM.
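
For reference, a hand-written miniature of the auto-generated query shape 
(constant branches for brevity; the real query produced by t.sh has about 
1E4 branches):

EXPLAIN (SUMMARY ON)
SELECT 1 AS x
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4;
-- With ~1E4 branches the reported Planning Time and the planner's memory
-- consumption blow up on REL_12_STABLE and later, as described above.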


A superficial study revealed possibly unnecessary operations that could be 
avoided:
1. Walking across the query by calling substitute_phv_relids() even if 
lastPHId shows that no PHV is present.
2. Iterative passes along the append_rel_list to replace vars in the 
translated_vars field. I can't see a real need to walk the whole 
append_rel_list while flattening a UNION ALL leaf subquery. No one can 
reference this leaf, can they?


The attachment contains a sketch that reduces the number of planner 
cycles and copy operations.


--
Regards
Andrey Lepikhov
Postgres Professional

t.sh
Description: application/shellscript
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 08a73fb9d86..3739e3fe7ba 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -131,7 +131,7 @@ static bool find_dependent_phvs_in_jointree(PlannerInfo *root,
 			Node *node, int varno);
 static void substitute_phv_relids(Node *node,
   int varno, Relids subrelids);
-static void fix_append_rel_relids(List *append_rel_list, int varno,
+static void fix_append_rel_relids(PlannerInfo *root, int varno,
   Relids subrelids);
 static Node *find_jointree_node_for_rel(Node *jtnode, int relid);
 
@@ -1156,8 +1156,9 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 		Relids		subrelids;
 
 		subrelids = get_relids_in_jointree((Node *) subquery->jointree, false);
-		substitute_phv_relids((Node *) parse, varno, subrelids);
-		fix_append_rel_relids(root->append_rel_list, varno, subrelids);
+		if (root->glob->lastPHId != 0)
+			substitute_phv_relids((Node *) parse, varno, subrelids);
+		fix_append_rel_relids(root, varno, subrelids);
 	}
 
 	/*
@@ -2064,17 +2065,25 @@ perform_pullup_replace_vars(PlannerInfo *root,
 	 * use PHVs for safety.  (This analysis could be made tighter but it seems
 	 * unlikely to be worth much trouble.)
 	 */
-	foreach(lc, root->append_rel_list)
+	if (containing_appendrel)
 	{
-		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
-		bool		save_need_phvs = rvcontext->need_phvs;
+		bool save_need_phvs = rvcontext->need_phvs;
 
-		if (appinfo == containing_appendrel)
-			rvcontext->need_phvs = false;
-		appinfo->translated_vars = (List *)
-			pullup_replace_vars((Node *) appinfo->translated_vars, rvcontext);
+		rvcontext->need_phvs = false;
+		containing_appendrel->translated_vars = (List *)
+			pullup_replace_vars((Node *) containing_appendrel->translated_vars,
+rvcontext);
 		rvcontext->need_phvs = save_need_phvs;
 	}
+	else
+	{
+		foreach(lc, root->append_rel_list)
+		{
+			AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
+			appinfo->translated_vars = (List *)
+pullup_replace_vars((Node *) appinfo->translated_vars, rvcontext);
+		}
+	}
 
 	/*
 	 * Replace references in the joinaliasvars lists of join RTEs.
@@ -3273,7 +3282,7 @@ remove_result_refs(PlannerInfo *root, int varno, Node *newjtloc)
 		subrelids = get_relids_in_jointree(newjtloc, false);
 		Assert(!bms_is_empty(subrelids));
 		substitute_phv_relids((Node *) root->parse, varno, subrelids);
-		fix_append_rel_relids(root->append_rel_list, varno, subrelids);
+		fix_append_rel_relids(root, varno, subrelids);
 	}
 
 	/*
@@ -3492,7 +3501,7 @@ substitute_phv_relids(Node *node, int varno, Relids subrelids)
  * We assume we may modify the AppendRelInfo nodes in-place.
  */
 static void
-fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
+fix_append_rel_relids(PlannerInfo *root, int varno, Relids subrelids)
 {
 	ListCell   *l;
 	int			subvarno = -1;
@@ -3503,7 +3512,7 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
 	 * AppendRelInfo nodes refer to it.  So compute it on first use. Note that
 	 * bms_singleton_member will complain if set is not singleton.
 	 */
-	foreach(l, append_rel_list)
+	foreach(l, root->append_rel_list)
 	{
 		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
 
@@ -3518,8 +3527,9 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
 		}
 
 		/* Also fix up any PHVs in its translated vars */
-		substitute_phv_relids((Node *) appinfo->translated_vars,
-			  varno, subrelids);
+		if (root->glob->lastPHId != 0)
+			substitute_phv_relids((Node *) appinfo->translated_vars,
+  varno, subrelids);
 	}
 }
 


Re: Optimization issue of branching UNION ALL

2022-12-22 Thread Andrey Lepikhov

On 22/12/2022 06:50, Tom Lane wrote:

2. Iterative passes along the append_rel_list to replace vars in the
translated_vars field. I can't see a real need to walk the whole
append_rel_list while flattening a UNION ALL leaf subquery. No one
can reference this leaf, can they?


After thinking about that for awhile, I believe we can go further:
the containing_appendrel is actually the *only* part of the upper
query that needs to be adjusted.  So we could do something like
the attached.

This passes check-world, but I don't have quite enough confidence
in it to just commit it.
Thanks, I wrote that letter because of some doubts too. The only weak 
point I can imagine is if the SQL standard changes someday.

Your code looks better than my previous attempt.

--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2022-12-27 Thread Andrey Lepikhov

On 12/26/15 23:04, Teodor Sigaev wrote:
I'd like to present OR-clause support for indexes. Although OR-clauses 
could be supported by a BitmapOr index scan, it isn't very effective and 
such a scan loses any ordering existing in the index. We (with Alexander 
Korotkov) presented results on this at the Vienna conference this year. 
In short, it provides a performance improvement:


EXPLAIN ANALYZE
SELECT count(*) FROM tst WHERE id = 5 OR id = 500 OR id = 5000;
...
The problems on the way which I see for now:
1 Calculating cost. Right now it's just a simple transformation of the 
costs computed for the BitmapOr path. I'd like to hope that's possible, 
so that the indexes' estimation functions could be left untouched and 
could keep believing that all clauses are implicitly ANDed.
2 I'd like to add such support to btree, but it seems that it should be a 
separate patch. The btree search algorithm doesn't use any kind of stack 
of pages, and the algorithm for walking over the btree isn't clear to me 
for now.
3 I could miss some places which still assume an implicitly-ANDed list of 
clauses, although the regression tests pass fine.
I support such a cunning approach. But the specific case you demonstrated 
above could be optimized independently at an earlier stage. If we convert:


(F(A) = ConstStableExpr_1) OR (F(A) = ConstStableExpr_2)
to
F(A) IN (ConstStableExpr_1, ConstStableExpr_2)

we see a significant execution speedup. For example, using demo.sql to 
estimate the maximum positive effect, we see about 40% execution and 100% 
planning speedup.
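
To illustrate the manual rewrite, here is a minimal sketch (the table and 
index definitions here are just for illustration and are not taken from 
demo.sql):

CREATE TABLE tst (id int);
CREATE INDEX ON tst (id);
-- With enough data, the OR form is typically planned as a BitmapOr:
EXPLAIN SELECT count(*) FROM tst WHERE id = 5 OR id = 500 OR id = 5000;
-- The transformation discussed above produces an equivalent query that
-- the planner can satisfy with a single index scan (a ScalarArrayOpExpr
-- qual) instead of a BitmapOr:
EXPLAIN SELECT count(*) FROM tst WHERE id IN (5, 500, 5000);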


To avoid unnecessary overhead induced by the optimization, such a 
transformation may be made at the planning stage (when we have cardinality 
estimates and have pruned partitions) but before the relation scan paths 
are created. That way we can avoid planning overhead and a non-optimal 
BitmapOr in the case of many ORs, possibly aggravated by many indexes on 
the relation.
For example, such an operation could be executed in create_index_paths() 
before rel->indexlist is processed.


--
Regards
Andrey Lepikhov
Postgres Professional


demo.sql
Description: application/sql


Re: [PATCH] random_normal function

2023-01-18 Thread Andrey Lepikhov

On 1/9/23 23:52, Tom Lane wrote:

BTW, if this does bring the probability of failure down to the
one-in-a-billion range, I think we could also nuke the whole
"ignore:" business, simplifying pg_regress and allowing the
random test to be run in parallel with others.
With the 'ignore' option we have been able to cover with tests some 
time-dependent features, such as "statement_timeout", 
"idle_in_transaction_session_timeout", the use of user timeouts in 
extensions, and so on.


We have used the pg_sleep() function to interrupt a query at a certain 
execution phase. But on some platforms, especially in containers, query 
execution time varies so widely that the pg_sleep() timeout needed to 
remove the dependency on execution time has become unacceptable. So the 
"ignore" option was the best choice.
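
For example, the kind of time-dependent test meant here looks roughly like 
this (a minimal sketch, not an existing regression test):

SET statement_timeout = '10ms';
SELECT pg_sleep(1); -- expected to fail: canceling statement due to statement timeout
RESET statement_timeout;

On a slow or overloaded container even a generous pg_sleep() argument 
cannot guarantee stable timing, which is why such tests ended up under 
the "ignore:" rule.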


For now, are the "isolation tests" the only option for creating stable 
execution-time-dependent tests? Or am I not aware of some other test 
machinery?


--
Regards
Andrey Lepikhov
Postgres Professional





Re: [PATCH] random_normal function

2023-01-18 Thread Andrey Lepikhov

On 1/19/23 11:01, Tom Lane wrote:

Andrey Lepikhov  writes:

On 1/9/23 23:52, Tom Lane wrote:

BTW, if this does bring the probability of failure down to the
one-in-a-billion range, I think we could also nuke the whole
"ignore:" business, simplifying pg_regress and allowing the
random test to be run in parallel with others.



We have used the pg_sleep() function to interrupt a query at a certain
execution phase. But on some platforms, especially in containers, query
execution time varies so widely that the pg_sleep() timeout needed to
remove the dependency on execution time has become unacceptable. So the
"ignore" option was the best choice.


But does such a test have any actual value?  If your test infrastructure
ignores the result, what makes you think you'd notice if the test did
indeed detect a problem?
Yes, it is still good for catching SEGFAULTs and assertion failures, which 
may be frequent because of the logic complexity of the timeout code paths.




I think "ignore:" was a kluge we put in twenty-plus years ago when our
testing standards were a lot lower, and it's way past time we got
rid of it.
OK, I will try to invent an alternative way for deep (and stable) testing 
of timeouts. Thank you for the answer.


--
Regards
Andrey Lepikhov
Postgres Professional





[POC] Allow an extension to add data into Query and PlannedStmt nodes

2023-03-29 Thread Andrey Lepikhov

Hi,

Previously, we have read in this mailing list some controversial opinions 
on queryid generation and the jumbling technique. Here we don't intend to 
solve those problems, but to help an extension at least not conflict with 
others over the queryId value.


Extensions may need to pass some query-related data through all stages 
of query planning and execution. As a trivial example, 
pg_stat_statements uses the queryid at the end of execution to save some 
statistics. One more reason: extensions currently conflict over the 
queryid value and the logic of changing it. With this patch, that can be 
managed.


This patch introduces the structure 'ExtensionData', which manages a 
list of entries through a couple of interface functions, 
AddExtensionDataToNode() and GetExtensionData(). Keep in mind that this 
structure may be hidden from the public interface in the future.
An extension should invent a symbolic key to identify its data. It may 
invent as many additional keys as it wants, but the best option here is 
no more than one entry per extension.
Usage of this machinery is demonstrated by the pg_stat_statements 
example - here we introduce a Bigint node just for storing the queryId 
value natively.


A ruthless pgbench benchmark shows that we get some overhead:
1.6% - in default mode
4% - in prepared mode
~0.1% - in extended mode.

An optimization that avoids copying the queryId, by storing it directly in 
the node pointer field, keeps this overhead within about 0.5% for all 
these modes but increases complexity. So here we demonstrate the 
non-optimized variant.


Some questions still cause doubts:
- QueryRewrite: should we copy extension fields from the parent parse 
tree to the rewritten ones?
- Do we need to invent a registration procedure to do away with the 
names of entries and use compact integer IDs instead?
- Do we need to optimize this structure to avoid a copy for simple data 
types, for example by inventing something like A_Const?


All in all, in our opinion, this issue will only grow with the 
increasing number of extensions that use planner and executor hooks for 
their purposes. So, any thoughts will be useful.


--
Regards
Andrey Lepikhov
Postgres Professional
From 944ce61d7ff934727240d90ee7620bfb69ad3a5a Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Wed, 22 Mar 2023 16:59:30 +0500
Subject: [PATCH] Add one more field into Query and PlannedStmt structures to
 allow an extension to pass some data across all the execution stages of
 specific instance of the query.

---
 .../pg_stat_statements/pg_stat_statements.c   |  57 --
 src/backend/commands/extension.c  | 103 ++
 src/backend/executor/execParallel.c   |   1 +
 src/backend/nodes/value.c |  12 ++
 src/backend/optimizer/plan/planner.c  |   1 +
 src/backend/rewrite/rewriteHandler.c  |   6 +
 src/backend/tcop/postgres.c   |   2 +
 src/include/commands/extension.h  |   4 +
 src/include/nodes/parsenodes.h|  23 
 src/include/nodes/plannodes.h |   2 +
 src/include/nodes/value.h |  10 ++
 11 files changed, 209 insertions(+), 12 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 5285c3f7fa..fc47661c86 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -49,6 +49,7 @@
 
 #include "access/parallel.h"
 #include "catalog/pg_authid.h"
+#include "commands/extension.h"
 #include "common/hashfn.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
@@ -107,6 +108,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 	!IsA(n, PrepareStmt) && \
 	!IsA(n, DeallocateStmt))
 
+#define GET_QUERYID(node) \
+	(Bigint *) GetExtensionData(node->ext_field, "pg_stat_statements")
+
+#define INSERT_QUERYID(node, queryid) \
+	castNode(Bigint, AddExtensionDataToNode((Node *) node, \
+			"pg_stat_statements", \
+			(Node *) makeBigint((int64) queryid), \
+			true))
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -821,6 +830,13 @@ error:
 static void
 pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
+	Bigint *queryId;
+
+	if ((queryId = GET_QUERYID(query)) == NULL)
+		queryId = INSERT_QUERYID(query, query->queryId);
+
+	Assert(queryId);
+
 	if (prev_post_parse_analyze_hook)
 		prev_post_parse_analyze_hook(pstate, query, jstate);
 
@@ -837,7 +853,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 	{
 		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
 		{
-			query->queryId = UINT64CONST(0);
+			queryId->val = UINT64CONST(0);
 			return;

Re: [POC] Allow an extension to add data into Query and PlannedStmt nodes

2023-03-30 Thread Andrey Lepikhov

On 30/3/2023 12:57, Julien Rouhaud wrote:

Extensions may need to pass some query-related data through all stages of
query planning and execution. As a trivial example, pg_stat_statements
uses the queryid at the end of execution to save some statistics. One more
reason: extensions currently conflict over the queryid value and the logic
of changing it. With this patch, that can be managed.


I just had a quick lookc at the patch, and IIUC it doesn't really help on that
side, as there's still a single official "queryid" that's computed, stored
everywhere and later used by pg_stat_statements (which does then store in
additionally to that official queryid).

Thank you for your attention!
This patch doesn't try to solve the problem of the single official 
queryId. In this patch we change pg_stat_statements so that it no longer 
sets the queryId field to 0 according to its internal logic. Any other 
extension should do the same - use queryId as you wish, but don't erase 
it; erase your private copy in the ext_field instead.



A ruthless pgbench benchmark shows that we get some overhead:
1.6% - in default mode
4% - in prepared mode
~0.1% - in extended mode.


That's a quite significant overhead.  But the only reason to accept such a
change is to actually use it to store additional data, so it would be
interesting to see what the overhead is like once you store at least 2
different values there.
Yeah, but as I said earlier, it can be reduced to a much smaller value 
with a simple optimization. Here I intentionally avoided it in order to 
discuss the core concept.



- Do we need to invent a registration procedure to do away with the names
of entries and use compact integer IDs instead?


Note that the patch as proposed doesn't have any defense for two extensions
trying to register something with the same name, or update a stored value, as
AddExtensionDataToNode() simply prepend the new value to the list.  You can
actually update the value by just storing the new value, but it will add a
significant overhead to every other extension that want to read another value.
Thanks a lot! The attached patch implements such an idea - an extension 
can allocate some entries and use these private IDs to add its data. I 
hope an extension would prefer to use only one entry for all its data to 
keep the overhead down, but still.


The API is also quite limited as each stored value has a single identifier.
What if your extension needs to store multiple things?  Since it's all based on
Node you can't really store some custom struct, so you probably have to end up
with things like "my_extension.my_val1", "my_extension.my_val2" which isn't
great.
The main idea here: if an extension holds a custom struct and wants to 
pass it through all planning and execution stages, it should use an 
extensible node with custom read/write/copy routines.


--
regards,
Andrey Lepikhov
Postgres Professional
From ab101322330684e9839e46c26f70ad5462e40dac Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 30 Mar 2023 21:49:32 +0500
Subject: [PATCH 2/2] Add ExtendedData entry registration routines to use
 entryId instead of symbolic name.

---
 .../pg_stat_statements/pg_stat_statements.c   |  7 +-
 src/backend/commands/extension.c  | 69 +++
 src/include/commands/extension.h  |  6 +-
 src/include/nodes/parsenodes.h|  2 +-
 4 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index fc47661c86..5b21163365 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -109,11 +109,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = 
PG_VERSION_NUM / 100;
!IsA(n, 
DeallocateStmt))
 
 #define GET_QUERYID(node) \
-   (Bigint *) GetExtensionData(node->ext_field, "pg_stat_statements")
+   (Bigint *) GetExtensionData(node->ext_field, extendedEntryID)
 
 #define INSERT_QUERYID(node, queryid) \
castNode(Bigint, AddExtensionDataToNode((Node *) node, \
-   
"pg_stat_statements", \
+   
extendedEntryID, \

(Node *) makeBigint((int64) queryid), \

true))
 /*
@@ -298,6 +298,7 @@ static bool pgss_track_utility = true;  /* whether to 
track utility commands */
 static bool pgss_track_planning = false;   /* whether to track planning

 * duration */
 static bool pgss_save = true;  /* wh

Re: Removing unneeded self joins

2023-09-12 Thread Andrey Lepikhov

On 5/7/2023 21:28, Andrey Lepikhov wrote:

Hi,

During the significant code revision in v.41 I lost some replacement 
operations. Here is the fix, plus extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added 
code to replace relids in that place too.
One more thing: I found out that we didn't remove self-joins that are 
established only by baserestrictinfo clauses, when no explicit self-join 
clause exists for the join. Now it is fixed, and a test has been added.

To understand the changes readily, see the delta file in the attachment.
Here is the new patch, attached. It is rebased on current master, and 
some minor gaffes are fixed.
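
For reference, a minimal sketch of the baserestrictinfo-only case mentioned 
above (a hypothetical table, not one of the added regression tests):

CREATE TABLE sj (id int PRIMARY KEY, payload int);
-- No explicit a.id = b.id join clause; the uniqueness of both sides
-- follows from the identical 'id = 42' restrictions on the primary key.
EXPLAIN (COSTS OFF)
SELECT a.payload, b.payload
FROM sj a, sj b
WHERE a.id = 42 AND b.id = 42;
-- With SJE enabled, the planner can prove that a and b are the same row
-- and replace the join with a single scan of sj.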


--
regards,
Andrey Lepikhov
Postgres Professional
From 70bb5cf3d11b2797f1a9c7b04740435135229d29 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 12 Sep 2023 18:25:51 +0700
Subject: [PATCH] Remove self-joins.

The Self Join Elimination (SJE) feature removes an inner join of a plain table to
itself from the query tree if it is proved that the join can be replaced with
a scan without affecting the query result. The self join and the inner relation are
replaced with the outer one in the query, equivalence
classes and planner info structures. Also, the inner restrictlist moves to the
outer one, with duplicated clauses removed. Thus, this optimization reduces the
length of the range table list (which especially makes sense for partitioned
relations), reduces the number of restriction clauses - and hence of selectivity
estimations - and potentially can improve the overall planner prediction for the query.

The SJE proof is based on the innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.

In this patch we use the following approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x.
2. Add the baserestrictinfo of the inner table to the list above.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip this
pair of joining tables.
4. Check the uniqueness proved by the baserestrictinfo clauses. To prove that
self-join elimination is possible, the inner and outer clauses must match exactly.

The relation replacement procedure is not trivial, and it is partly combined
with the one used to remove useless left joins.

Tests covering this feature have been added to join.sql.
Some regression tests changed due to the self-join removal logic.
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1094 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  824 -
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/expected/updatable_views.out |   17 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  360 ++
 src/tools/pgindent/typedefs.list  |2 +
 14 files changed, 2375 insertions(+), 64 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bc1b215db..43c07b0d3b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5299,6 +5299,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration 
parameter
+  
+  
+  
+   
+   Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5..508285d1ef 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  List *restrictlist,
  List *exprlist, List 
*oprlist)
+{
+   return relation_has_unique_index_ext(root, rel, restrictlist,
+   
 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+  

Re: POC: GROUP BY optimization

2023-09-12 Thread Andrey Lepikhov

Hi,
Here is the patch rebased on the current master. I also fixed some 
minor slips and one static analyzer warning.
This is just to get it into the next commitfest and to push the work on 
this patch forward.


One extra difference in the newly added postgres_fdw tests is caused by 
this patch - see the query plan changes in the attachment.


--
regards,
Andrey Lepikhov
Postgres Professional
From 33953655c9ac3f9ec64b80c9f2a2ff38bd178745 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 13 Sep 2023 11:20:03 +0700
Subject: [PATCH] Explore alternative orderings of group-by pathkeys during
 optimization.

When evaluating a query with a multi-column GROUP BY clause using sort,
the cost may depend heavily on the order in which the keys are compared when
building the groups. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative ordering of the keys, trying to find a cheaper one.

In principle, we might generate grouping paths for all permutations of the keys
and leave the rest to the optimizer. But that might get very expensive, so we
try to pick only a couple interesting orderings based on both local and global
information.

When planning the grouping path, we explore statistics (number of distinct
values, cost of the comparison function) for the keys and reorder them
to minimize comparison costs. Intuitively, it may be better to perform more
expensive comparisons (for complex data types, etc.) last because maybe
the cheaper comparisons will be enough. Similarly, the higher the cardinality
of a key, the lower the probability we'll need to compare more keys. The patch
generates and costs various orderings, picking the cheapest ones.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using either incremental
sort or even eliminating the sort entirely.

The patch generates orderings and picks those, minimizing the comparison cost
(for various path keys), and then adds orderings that might be useful for
operations higher up in the plan (ORDER BY, etc.). Finally, it always keeps
the ordering specified in the query, assuming the user might have additional
insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
---
 .../postgres_fdw/expected/postgres_fdw.out|  36 +-
 src/backend/optimizer/path/costsize.c | 363 ++-
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 602 ++
 src/backend/optimizer/plan/planner.c  | 465 --
 src/backend/optimizer/util/pathnode.c |   4 +-
 src/backend/utils/adt/selfuncs.c  |  38 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/cost.h  |   4 +-
 src/include/optimizer/paths.h |   7 +
 src/include/utils/selfuncs.h  |   5 +
 src/test/regress/expected/aggregates.out  | 244 ++-
 .../regress/expected/incremental_sort.out |   2 +-
 src/test/regress/expected/join.out|  51 +-
 src/test/regress/expected/merge.out   |  15 +-
 .../regress/expected/partition_aggregate.out  |  44 +-
 src/test/regress/expected/partition_join.out  |  75 +--
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/union.out   |  60 +-
 src/test/regress/sql/aggregates.sql   |  99 +++
 src/test/regress/sql/incremental_sort.sql |   2 +-
 23 files changed, 1771 insertions(+), 382 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 144c114d0f..63af7feabe 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2319,18 +2319,21 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT 
DISTINCT t2.c1, t3.c1 FROM
 -- join with pseudoconstant quals
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1 AND CURRENT_USER 
= SESSION_USER) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
- 

Re: POC: GROUP BY optimization

2023-09-18 Thread Andrey Lepikhov

On 20/7/2023 18:46, Tomas Vondra wrote:

On 7/20/23 08:37, Andrey Lepikhov wrote:

On 3/10/2022 21:56, Tom Lane wrote:

Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
several follow-on fixes.
...
Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.


It may be time to restart the project. As a first step, I rebased the
patch on the current master. It wasn't trivial because of some latest
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, Let's repeat the review and rewrite the current path according to
the reasons uttered in the revert commit.

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.
OK, some thoughts on this part of the task. As I see it, we don't have 
that many different operators: 26 with fixed width and 16 with variable width:


SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
  JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen>0
ORDER BY o.oid;

SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
  JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen<0
ORDER BY o.oid;

A benchmarking procedure for fixed-length types can look something like 
this:


CREATE OR REPLACE FUNCTION pass_sort(typ regtype) RETURNS TABLE (
nrows integer,
exec_time float
)  AS $$
DECLARE
  data json;
  step integer;
BEGIN
  SET work_mem='1GB';

  FOR step IN 0..3 LOOP
SELECT pow(100, step)::integer INTO nrows;
DROP TABLE IF EXISTS test CASCADE;
EXECUTE format('CREATE TABLE test AS SELECT gs::%s AS x
FROM generate_series(1,%s) AS gs;', typ, nrows);

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, FORMAT JSON)
  SELECT * FROM test ORDER BY (x) DESC INTO data;
SELECT data->0->'Execution Time' INTO exec_time;
RETURN NEXT;
  END LOOP;
END;
$$ LANGUAGE plpgsql VOLATILE;

Executing SELECT * FROM pass_sort('integer'); shows quite linear growth 
of the execution time. So, using '2.0 * cpu_operator_cost' as the cost of 
the integer type (as a baseline), we can calculate the costs of the other 
operators. Variable-width types, I think, could require a more complex 
technique to account for the dependency on length.
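
A minimal sketch of the normalization step (I picked 'numeric' as the 
second type and the largest data set just for illustration):

SELECT 2.0 * n.exec_time / i.exec_time AS numeric_cmp_cost_units
FROM pass_sort('integer') AS i,
     pass_sort('numeric')  AS n
WHERE i.nrows = 1000000 AND n.nrows = 1000000;

The result would be the candidate comparison cost for the numeric type, 
expressed in the same cpu_operator_cost units as the integer baseline.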


Does this way look worthwhile?

--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2023-09-19 Thread Andrey Lepikhov

On 25/8/2023 14:39, Yuya Watari wrote:

Hello,

On Wed, Aug 9, 2023 at 8:54 PM David Rowley  wrote:

I think the best way to move this forward is to explore not putting
partitioned table partitions in EMs and instead see if we can
translate to top-level parent before lookups.  This might just be too
complex to translate the Exprs all the time and it may add overhead
unless we can quickly determine somehow that we don't need to attempt
to translate the Expr when the given Expr is already from the
top-level parent. If that can't be made to work, then maybe that shows
the current patch has merit.


Based on your suggestion, I have experimented with not putting child
EquivalenceMembers in an EquivalenceClass. I have attached a new
patch, v20, to this email. The following is a summary of v20.
Working on self-join removal in the nearby thread [1], I ran into a 
problem that gives an additional argument for this new direction compared 
to the previous couple of approaches.
If we index positions in the list of equivalence members, we make some 
optimizations like join elimination more complicated - they may need to 
remove some clauses and equivalence class members.
To change the lists of derived clauses or ec_members, we would have to go 
through all the index lists and fix them, which is a non-trivial operation.


[1] 
https://www.postgresql.org/message-id/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-09-20 Thread Andrey Lepikhov

On 23/8/2023 12:37, Richard Guo wrote:

If we go with the "tablesample scans can't be reparameterized" approach
in the back branches, I'm a little concerned about what happens if we find more
cases in the future where we need to modify RTEs for reparameterization.
So I spent some time seeking and have managed to find one: there might
be lateral references in a scan path's restriction clauses, and
currently reparameterize_path_by_child fails to adjust them.
This may help you somehow: in [1], we designed a feature where the 
partitionwise join technique can be applied to a JOIN of partitioned and 
non-partitioned tables. Unfortunately, it has fallen out of the community 
discussions, but we still support it for sharding use - it is helpful 
for implementing 'global' tables in a distributed configuration. There we 
got stuck on the same problem with lateral relids adjustment. So that 
patch may give you a more general view of the problem.


[1] Asymmetric partition-wise JOIN
https://www.postgresql.org/message-id/flat/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ%40mail.gmail.com

--
regards,
Andrey Lepikhov
Postgres Professional





Re: disfavoring unparameterized nested loops

2023-09-20 Thread Andrey Lepikhov

On 29/9/2022 21:32, Benjamin Coutu wrote:

I'd like to revamp this important discussion.

As is well described in this fairly recent paper here 
https://www.vldb.org/pvldb/vol9/p204-leis.pdf (which also looks at Postgres) 
"estimation errors quickly grow as the number of joins increases, and that these 
errors are usually the reason for bad plans" - I think we can all get behind that 
statement.

While nested loop joins work great when cardinality estimates are correct, they 
are notoriously bad when the optimizer underestimates and they degrade very 
fast in such cases - the good vs. bad here is very asymmetric. On the other 
hand hash joins degrade much more gracefully - they are considered very robust 
against underestimation. The above mentioned paper illustrates that all mayor 
DBMS (including Postgres) tend to underestimate and usually that 
underestimation increases drastically with the number of joins (see Figures 3+4 
of the paper).

Now, a simple approach to guarding against bad plans that arise from 
underestimation could be to use what I would call a 
nested-loop-conviction-multiplier based on the current depth of the join tree, 
e.g. for a base table that multiplier would obviously be 1, but would then grow 
(e.g.) quadratically. That conviction-multiplier would *NOT* be used to skew 
the cardinality estimates themselves, but rather be applied to the overall 
nested loop join cost at each particular stage of the plan when comparing it to 
other more robust join strategies like hash or sort-merge joins. That way when 
we can be sure to have a good estimate at the bottom of the join tree we treat 
all things equal, but favor nested loops less and less as we move up the join 
tree for the sake of robustness.
Also, we can expand the multiplier whenever we fall back to using the default 
cardinality constant as surely all bets are off at that point - we should 
definitely treat nested loop joins as out of favor in this instance and that 
could easily be incorporated by simply increasing the conviction-mutliplier.

What are your thoughts on this simple idea - is it perhaps too simple?
In my practice, a parameterized nested loop reduces execution time, 
sometimes drastically. If your query touches a lot of tables but extracts 
only a tiny part of the data, and you have good index coverage, PNL works 
great.
Moreover, I have pondered extending parameterization through subqueries 
and groupings.
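
A minimal illustration of that case (hypothetical tables; with suitable 
data and statistics the planner can choose the plan shape described in 
the comment):

CREATE TABLE customers (id int PRIMARY KEY, name text);
CREATE TABLE orders (id int PRIMARY KEY, customer_id int REFERENCES customers);
CREATE INDEX ON orders (customer_id);
EXPLAIN
SELECT o.id
FROM customers c JOIN orders o ON o.customer_id = c.id
WHERE c.name = 'Alice';
-- Expected shape: a Nested Loop whose inner Index Scan on orders has the
-- qual (customer_id = c.id) parameterized by the current outer row.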


What would you say about a different approach: a hybrid join? MS SQL Server 
has such a feature [1], and, according to their description, it requires 
low overhead. They start with a HashJoin and switch to a NestLoop if the 
inner input contains too few tuples. That solves the issue, doesn't it?


[1] 
https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-batch-mode-adaptive-joins/ba-p/385411


--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PATCH] Add extra statistics to explain for Nested Loop

2023-09-22 Thread Andrey Lepikhov

On 31/7/2022 10:49, Julien Rouhaud wrote:

On Sat, Jul 30, 2022 at 08:54:33PM +0800, Julien Rouhaud wrote:
Anyway, 1% is in my opinion still too much overhead for extensions that won't
get any extra information.
I have read the whole thread and still can't understand something: what 
valuable data can I find in these extra statistics if the plan contains no 
parameterized node at all?
Also, thinking about min/max time in EXPLAIN, I guess it would be needed only 
in rare cases. Usually the execution time correlates with the number of tuples 
scanned, doesn't it? So maybe we can skip the time boundaries in the 
instrumentation structure?
In my experience, it is enough to know the total number of tuples bubbled up 
from a parameterized node to decide on further optimizations. Maybe we should 
simplify this feature down to a single total_rows field, shown when nloops > 1 
and parameters are present? See the toy case below.
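
For illustration, here is a made-up case where the per-loop average rounds 
down to zero although the parameterized node returns a few dozen tuples in 
total:

CREATE TABLE outer_t (id int);
CREATE TABLE inner_t (id int PRIMARY KEY);
INSERT INTO outer_t SELECT q FROM generate_series(1, 100) q;
INSERT INTO inner_t SELECT q FROM generate_series(1, 1000000) q;
ANALYZE outer_t, inner_t;

-- Only ~33 of the 100 index probes find a row, so the parameterized index
-- scan is likely to be reported as "rows=0 loops=100" even though it returned
-- 33 tuples in total.
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
SELECT * FROM outer_t o JOIN inner_t i ON i.id = o.id * 30000;
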
And finally: if someone wants a lot of additional statistics, why not provide 
them via an extension? That only requires a hook at the point where a node is 
explained, plus some effort to make instrumentation extensible. But here, 
honestly, I don't have code or concrete ideas so far.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Postgres picks suboptimal index after building of an extended statistics

2023-09-24 Thread Andrey Lepikhov

On 12/8/2021 06:26, Tomas Vondra wrote:

On 8/11/21 2:48 AM, Peter Geoghegan wrote:

On Wed, Jun 23, 2021 at 7:19 AM Andrey V. Lepikhov
 wrote:

Ivan Frolkov reported a problem with choosing a non-optimal index during
a query optimization. This problem appeared after building of an
extended statistics.


Any thoughts on this, Tomas?



Thanks for reminding me, I missed / forgot about this thread.

I agree the current behavior is unfortunate, but I'm not convinced the 
proposed patch is fixing the right place - doesn't this mean the index 
costing won't match the row estimates displayed by EXPLAIN?


I wonder if we should teach clauselist_selectivity about UNIQUE indexes, 
and improve the cardinality estimates directly, not just costing for 
index scans.


Also, is it correct that the patch calculates num_sa_scans only when 
(numIndexTuples >= 0.0)?
I can't stop thinking about this issue. It is bizarre that Postgres chooses a 
non-unique index when a unique index gives us proof of a minimal scan.
I don't see a reason to teach the clauselist_selectivity() routine to estimate 
UNIQUE indexes: we would add some cycles, and it would work with btree indexes 
only.
Maybe we could change compare_path_costs_fuzzily() and add a heuristic, for 
example:
"If the selectivity of both paths gives us no more than one row, prefer a 
unique index, or the index with the lowest selectivity." A schematic example is 
shown below.
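
A schematic example of the kind of case I mean (this is not Ivan's original 
reproducer, and the toy data below may or may not trigger the wrong choice; it 
only shows the shape of the problem):

CREATE TABLE t (id int, payload int);
CREATE UNIQUE INDEX t_id_uniq ON t (id);
CREATE INDEX t_id_payload ON t (id, payload);
INSERT INTO t SELECT q, q % 100 FROM generate_series(1, 100000) q;
CREATE STATISTICS t_sts (dependencies, ndistinct) ON id, payload FROM t;
ANALYZE t;

-- Both indexes restrict this query to at most one row; the heuristic above
-- would make compare_path_costs_fuzzily() prefer t_id_uniq, which proves
-- uniqueness, over the wider non-unique index.
EXPLAIN SELECT * FROM t WHERE id = 42 AND payload = 42;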


--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC: GUC option for skipping shared buffers in core dumps

2023-09-25 Thread Andrey Lepikhov

Hi,

The current approach could be better: we also want this feature to work on 
Windows, macOS and other systems. So I've tried to develop another strategy - 
detaching shared memory and DSM segments before executing the abort() routine.

As far as I can see, it helps and reduces the size of the core file.

--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/bootstrap/bootstrap.c 
b/src/backend/bootstrap/bootstrap.c
index 5810f8825e..4d7bf2c0e4 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -325,7 +325,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
{
SetProcessingMode(NormalProcessing);
CheckerModeMain();
-   abort();
+   pg_abort();
}
 
/*
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index ed11e8be7f..34ac874ad0 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -197,7 +197,7 @@ main(int argc, char *argv[])
else
PostmasterMain(argc, argv);
/* the functions above should not return */
-   abort();
+   pg_abort();
 }
 
 
diff --git a/src/backend/postmaster/postmaster.c 
b/src/backend/postmaster/postmaster.c
index 54e9bfb8c4..fc32a6bb1b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1469,7 +1469,7 @@ PostmasterMain(int argc, char *argv[])
 */
ExitPostmaster(status != STATUS_OK);
 
-   abort();/* not reached */
+   pg_abort(); /* not reached */
 }
 
 
diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index e250b0567e..3123b388ab 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -385,7 +385,7 @@ WalSndShutdown(void)
whereToSendOutput = DestNone;
 
proc_exit(0);
-   abort();/* keep the compiler 
quiet */
+   pg_abort(); /* keep the compiler 
quiet */
 }
 
 /*
diff --git a/src/backend/utils/error/assert.c b/src/backend/utils/error/assert.c
index 719dd7b309..f422d42440 100644
--- a/src/backend/utils/error/assert.c
+++ b/src/backend/utils/error/assert.c
@@ -19,6 +19,32 @@
 #include 
 #endif
 
+#include 
+#include 
+
+int core_dump_no_shared_buffers = COREDUMP_INCLUDE_ALL;
+
+/*
+ * Remember that, at the same time, other processes may still be working with
+ * shared memory, writing it to disk and so on.
+ */
+void
+pg_abort(void)
+{
+   if (core_dump_no_shared_buffers != COREDUMP_INCLUDE_ALL)
+   {
+   if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL ||
+   core_dump_no_shared_buffers == COREDUMP_EXCLUDE_DSM)
+   dsm_detach_all();
+
+   if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL ||
+   core_dump_no_shared_buffers == COREDUMP_EXCLUDE_SHMEM)
+   PGSharedMemoryDetach();
+   }
+
+   abort();
+}
+
 /*
  * ExceptionalCondition - Handles the failure of an Assert()
  *
@@ -63,5 +89,5 @@ ExceptionalCondition(const char *conditionName,
sleep(100);
 #endif
 
-   abort();
+   pg_abort();
 }
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8e1f3e8521..f6c863ca68 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -601,7 +601,7 @@ errfinish(const char *filename, int lineno, const char 
*funcname)
 * children...
 */
fflush(NULL);
-   abort();
+   pg_abort();
}
 
/*
diff --git a/src/backend/utils/misc/guc_tables.c 
b/src/backend/utils/misc/guc_tables.c
index bdb26e2b77..95e205e8d1 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -427,6 +427,14 @@ static const struct config_enum_entry 
debug_logical_replication_streaming_option
{NULL, 0, false}
 };
 
+static const struct config_enum_entry core_dump_no_shared_buffers_options[] = {
+   {"none", COREDUMP_INCLUDE_ALL, false},
+   {"shmem", COREDUMP_EXCLUDE_SHMEM, false},
+   {"dsm", COREDUMP_EXCLUDE_DSM, false},
+   {"all", COREDUMP_EXCLUDE_ALL, false},
+   {NULL, 0, false}
+};
+
 StaticAssertDecl(lengthof(ssl_protocol_versions_info) == (PG_TLS1_3_VERSION + 
2),
 "array length mismatch");
 
@@ -4971,6 +4979,17 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
 
+   {
+   {"core_dump_no_shared_buffers", PGC_POSTMASTER, 
DEVELOPER_OPTIONS,
+   NULL,
+   NULL,
+   GUC_NOT_IN_SAMPLE
+   },
+   &core_dump_no_shared_buffers,
+   COREDUMP_I

Re: RFC: Logging plan of the running query

2023-09-25 Thread Andrey Lepikhov

On 25/9/2023 14:21, torikoshia wrote:

On 2023-09-20 14:39, Lepikhov Andrei wrote:
Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on 
all CFI using 
v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then 
ran the following query but did not cause any problems.


```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
   PERFORM pg_sleep(5);
END; $$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?


I didn't find a problem either. I just feel uncomfortable if, at the moment of 
interruption, we hold a descriptor of a query other than the one that has been 
executing and holding resources.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2023-09-25 Thread Andrey Lepikhov

On 20/7/2023 18:46, Tomas Vondra wrote:

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

Regarding this issue, I see two options:
1. Go through the grouping column list and find the most reliable column. If 
we have a unique column, or a column whose estimated number of distinct values 
is significantly higher than the ndistinct of the other grouping columns, we 
can place that column first in the grouping. That should make the decision 
reliable, shouldn't it?
2. If we have extended statistics on distinct values, and these statistics 
cover some set of leading columns in the grouping list, we can optimize the 
order of those positions. That also looks reliable; a sketch is shown below.
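
For option 2, I mean something along these lines (just a sketch, not a test 
from the patch):

CREATE TABLE t (a int, b int, c int);
INSERT INTO t SELECT q, q % 100, q % 10 FROM generate_series(1, 100000) q;
CREATE STATISTICS t_nd (ndistinct) ON a, b, c FROM t;
ANALYZE t;

-- pg_stats_ext now stores ndistinct for (a,b), (a,c), (b,c) and (a,b,c); with
-- such coverage of the grouping columns, reordering the keys (e.g. putting the
-- most distinct column first) looks much more reliable than with per-column
-- ndistinct alone.
EXPLAIN SELECT count(*) FROM t GROUP BY c, b, a;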


Any thoughts?

--
regards,
Andrey Lepikhov
Postgres Professional





Re: RFC: Logging plan of the running query

2023-09-27 Thread Andrey Lepikhov

On 28/9/2023 09:04, torikoshia wrote:

On 2023-09-25 18:49, Andrey Lepikhov wrote:

On 25/9/2023 14:21, torikoshia wrote:

On 2023-09-20 14:39, Lepikhov Andrei wrote:
Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on 
all CFI using 
v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and 
then ran the following query but did not cause any problems.


```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
   PERFORM pg_sleep(5);
END; $$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?


I didn't find a problem either. I just feel uncomfortable if, at the
moment of interruption, we have a descriptor of another query than the
query have been executing and holding resources.


I think that "descriptor" here refers to ActiveQueryDesc, in which case
it is updated at the beginning of ExecutorRun(), so I am wondering if
the situation you're worried about would not occur.
As you can see, in my example we have only DDL and no queries with plans. In 
this case Postgres doesn't call ExecutorRun() at all, simply because there is 
no plan - but locks are still obtained.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: NOT IN subquery optimization

2020-04-01 Thread Andrey Lepikhov
You should do a small rebase (there is a conflict with 911e7020770) and run 
pgindent on the patch to fix the problems with long lines and whitespace.


I am reviewing your patch in small steps. Questions:
1. In the find_innerjoined_rels() routine you stop descending at the JOIN_FULL 
node type. I think that is wrong, because if a var has a NOT NULL constraint, a 
full join can't change it to NULL.
2. The convert_NOT_IN_to_join() routine is OK, but its name is misleading. 
Maybe you could use something like make_NOT_IN_to_join_quals()?

3. pull_up_sublinks_qual_recurse(). Maybe change the comment
"Return pullout predicate (x is NOT NULL)..."
to
"Return pullout predicate (x is NOT NULL or NOT EXISTS...)"?
4. is_node_nonnullable():
I think one more case of a non-nullable var may be a foreign key constraint.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company





Re: Removing unneeded self joins

2020-04-02 Thread Andrey Lepikhov

On 4/1/20 8:34 PM, David Steele wrote:
This patch no longer applies cleanly on 
src/test/regress/sql/equivclass.sql: 
http://cfbot.cputube.org/patch_27_1712.log


The CF entry has been updated to Waiting on Author.


v.23 is attached:

1. The patch applies cleanly.
2. Added checks: two potentially self-joined relations may be subject to 
different join-order restrictions in join_info_list.

3. Added a test for item 2.

The CF entry has been updated to Needs review.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From e74d3c2549737305419b3c29301d29c1e191d6ae Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 3 Apr 2020 09:31:58 +0500
Subject: [PATCH] Remove Self Joins v. 23

---
 src/backend/nodes/outfuncs.c  |   19 +-
 src/backend/optimizer/path/indxpath.c |   28 +-
 src/backend/optimizer/path/joinpath.c |6 +-
 src/backend/optimizer/plan/analyzejoins.c | 1178 -
 src/backend/optimizer/plan/planmain.c |7 +-
 src/backend/optimizer/util/pathnode.c |3 +-
 src/backend/optimizer/util/relnode.c  |   26 +-
 src/backend/utils/misc/guc.c  |   11 +
 src/backend/utils/mmgr/aset.c |1 -
 src/include/nodes/nodes.h |1 +
 src/include/nodes/pathnodes.h |   22 +-
 src/include/optimizer/pathnode.h  |4 +
 src/include/optimizer/paths.h |3 +-
 src/include/optimizer/planmain.h  |   10 +-
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  330 +
 src/test/regress/expected/updatable_views.out |   15 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  169 +++
 19 files changed, 1773 insertions(+), 108 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index eb168ffd6d..a1a9ae1ac1 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2281,7 +2281,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_OID_FIELD(userid);
 	WRITE_BOOL_FIELD(useridiscurrent);
 	/* we don't try to print fdwroutine or fdw_private */
-	/* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+	WRITE_NODE_FIELD(unique_for_rels);
+	/* can't print non_unique_for_rels; BMSes aren't Nodes */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_UINT_FIELD(baserestrict_min_security);
 	WRITE_NODE_FIELD(joininfo);
@@ -2353,6 +2354,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
 	WRITE_BITMAPSET_FIELD(keys);
 }
 
+static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+	WRITE_NODE_TYPE("UNIQUERELINFO");
+
+	WRITE_BITMAPSET_FIELD(outerrelids);
+	if (node->index)
+	{
+		WRITE_OID_FIELD(index->indexoid);
+		WRITE_NODE_FIELD(column_values);
+	}
+}
+
 static void
 _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 {
@@ -4135,6 +4149,9 @@ outNode(StringInfo str, const void *obj)
 			case T_StatisticExtInfo:
 _outStatisticExtInfo(str, obj);
 break;
+			case T_UniqueRelInfo:
+_outUniqueRelInfo(str, obj);
+break;
 			case T_ExtensibleNode:
 _outExtensibleNode(str, obj);
 break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..0c6e7eee74 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3564,7 +3564,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
  * relation_has_unique_index_for
  *	  Determine whether the relation provably has at most one row satisfying
  *	  a set of equality conditions, because the conditions constrain all
- *	  columns of some unique index.
+ *	  columns of some unique index. If index_info is not null, it is set to
+ *	  point to a new UniqueRelInfo containing the index and conditions.
  *
  * The conditions can be represented in either or both of two ways:
  * 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3585,12 +3586,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
-			  List *exprlist, List *oprlist)
+			  List *exprlist, List *oprlist,
+			  UniqueRelInfo **unique_info)
 {
 	ListCell   *ic;
 
 	Assert(list_length(exprlist) == list_length(oprlist));
 
+	if (unique_info)
+		*unique_info = NULL;
+
 	/* Short-circuit if no indexes... */
 	if (rel->indexlist == NIL)
 		return false;
@@ -3641,6 +3646,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List *column_values = NIL;
 
 		/*
 		 * If the index is not unique, or not immediately enforced, or if it's
@@ -368

Re: POC: GROUP BY optimization

2022-01-21 Thread Andrey Lepikhov

On 7/22/21 3:58 AM, Tomas Vondra wrote:

4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to
estimate_num_groups_incremental. In principle yes, if we use "group
size" from the previous step, then the returned value is the number of
new groups after adding the "new" pathkey.
But even if we ignore the issues with amplification mentioned in (3),
there's an issue with non-linear behavior in estimate_num_groups,
because at some point it's calculating

D(N,n,p) = n * (1 - ((N-p)/N)^(N/n))

where N - total rows, p - sample size, n - number of distinct values.
And if we have (N1,n1) and (N2,n2) then the ratio of calculated
estimated (which is pretty much what calculating group size does)

D(N2,n2,p2) / D(N1,n1,p1)

which will differ depending on p1 and p2. And if we're tweaking the
tuplesPerPrevGroup all the time, that's really annoying, as it may make
the groups smaller or larger, which is unpredictable and annoying, and I
wonder if it might go against the idea of penalizing tuplesPerPrevGroup
to some extent.
We could simply use the input "tuples" value here, and then divide the
current and previous estimate to calculate the number of new groups.
tuplesPerPrevGroup is only an upper bound for the estimation. I think we 
should use the result of the previous estimation as a limit for the next one 
during incremental estimation. Maybe we can simply clamp the tuplesPerPrevGroup 
value to guarantee that it never grows from one step to the next? See the 
attached patch on top of the previous fixes.


--
regards,
Andrey Lepikhov
Postgres Professionaldiff --git a/src/backend/optimizer/path/costsize.c 
b/src/backend/optimizer/path/costsize.c
index 70af9c91d5..4e26cd275d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2025,8 +2025,10 @@ compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, 
int nPresortedKeys,
/*
 * Real-world distribution isn't uniform but now we don't have 
a way to
 * determine that, so, add multiplier to get closer to worst 
case.
+* Guard this value against irrational decisions.
 */
-   tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);
+   tuplesPerPrevGroup = Min(tuplesPerPrevGroup,
+ceil(1.5 * 
tuplesPerPrevGroup / nGroups));
 
/*
 * We could skip all following columns for cost estimation, 
because we


Re: POC: GROUP BY optimization

2022-01-23 Thread Andrey Lepikhov

On 22/1/2022 01:34, Tomas Vondra wrote:
The other thing we could do is reduce the coefficient gradually - so 
it'd be 1.5 for the first pathkey, then 1.25 for the next one, and so 
on. But it seems somewhat arbitrary (I certainly don't have some sound 
theoretical justification ...).


I think there is no reason to increase complexity without any theory behind 
it. A simple solution lets us notice problems much faster, if they arise.


... I've skipped the path_save 
removal in planner.c, because that seems incorrect - if there are 
multiple pathkeys, we must start with the original path, not the 
modified one we built in the last iteration. Or am I missing something

You are right, I misunderstood the idea of the path_save variable.

--
regards,
Andrey Lepikhov
Postgres Professional




Re: Multiple Query IDs for a rewritten parse tree

2022-01-31 Thread Andrey Lepikhov

On 29/1/2022 12:51, Julien Rouhaud wrote:

4. We should reserve position of default in-core generator


 From the discussion above I was under the impression that the core
generator should be distinguished by a predefined kind.


I don't really like this approach.  IIUC, this patch as-is is meant to break
pg_stat_statements extensibility.  If kind == 0 is reserved for in-core queryid
then you can't use pg_stat_statement with a different queryid generator
anymore.


Yes, that is one more problem. Maybe, if we want to make it extensible, we 
should think about hooks in pg_stat_statements too?



The patch also reserves kind == -1 for pg_stat_statements internal purpose, and
I'm not really sure why that's needed.
My idea is that tags with positive numbers are reserved for generator results, 
which are performance-critical.
As I saw during the implementation, pg_stat_statements makes additional 
changes to the queryId (no matter which ones). Because our purpose is to avoid 
interference in this place, I introduced negative values, where extensions can 
store their own queryIds, based on any generator or none. Maybe it is 
redundant - the main idea here was to highlight the issue.


I'm also unsure of how are extensions supposed to cooperate in general, as
I feel that queryids should be implemented for some "intent" (like monitoring,
planning optimization...).  That being said I'm not sure if e.g. AQO heuristics
are too specific for its need or if it could be shared with other extension
that might be dealing with similar concerns.

I think it depends on the specific purpose of the extension.

--
regards,
Andrey Lepikhov
Postgres Professional




Re: Merging statistics from children instead of re-sampling everything

2022-02-10 Thread Andrey Lepikhov

On 21/1/2022 01:25, Tomas Vondra wrote:

But I don't have a very good idea what to do about statistics that we
can't really merge. For some types of statistics it's rather tricky to
reasonably merge the results - ndistinct is a simple example, although
we could work around that by building and merging hyperloglog counters.
I think, as a first step in this direction, we can reduce the number of pulled 
tuples. We don't really need to pull all tuples from a remote server: to 
construct a reservoir, pulling only a sample of the tuples is enough. The 
reservoir method needs only a few parameters to return a sample, just as when 
reading tuples locally. Also, to fetch these partial samples asynchronously, we 
can obtain the size of each partition in a preliminary step of the analysis; a 
rough sketch of that step is shown below.
In my opinion, even this solution could reduce the severity of the problem 
drastically.
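
A rough sketch of that preliminary step, allocating sample rows to partitions 
proportionally to their size (the 30000-row target and the parent table name 
are made up):

SELECT c.relname AS partition,
       ceil(30000 * c.reltuples / sum(c.reltuples) OVER ())::int AS sample_rows
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'parent_tbl'::regclass;

Each partition (or remote server) would then be asked for only that many 
sample rows, and those requests could be issued asynchronously.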


--
regards,
Andrey Lepikhov
Postgres Professional




Re: explain analyze rows=%.0f

2022-11-06 Thread Andrey Lepikhov

On 22/7/2022 16:47, Amit Kapila wrote:

I feel the discussion has slightly deviated which makes it unclear
whether this patch is required or not?


After a quick review I want to express my thoughts.
First of all, we have been waiting for this feature for years. Clients often 
send us an EXPLAIN where we see something like

"rows=0, loops=100".
Without verbose mode, I can't even tell whether this node produces any rows at 
all.

So I think this feature is useful mostly for parameterized plans.
Also, printing two or even three decimal digits isn't meaningful - sometimes 
we have a plan where the number of loops is about 1E6 or even 1E7 while the 
real number of rows is only 100 or 1000; for example, 1000 rows over 1E6 loops 
would still be displayed as 0.00.

To overcome this issue, I see two options:
1. Show the exact number of tuples without dividing by loops (fair, but 
invasive and bad for automation tools).

2. Show rows in scientific format, like X.XXEXX.
I vote for the second option.

--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2022-11-06 Thread Andrey Lepikhov

On 2/11/2022 15:27, Yuya Watari wrote:

Hello,

I noticed that the previous patch does not apply to the current HEAD.
I attached the rebased version to this email.

I'm still reviewing your patch. For the most part it seems OK, but do you 
really need both the eq_sources and eq_derives lists now? As far as I can see, 
every access to these lists is guided by eclass_source_indexes and 
eclass_derive_indexes respectively. Maybe merge them?


--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2022-11-08 Thread Andrey Lepikhov

On 2/11/2022 15:27, Yuya Watari wrote:

I noticed that the previous patch does not apply to the current HEAD.
I attached the rebased version to this email.

Looking into the find_em_for_rel() changes, I see that you replaced
if (bms_is_subset(em->em_relids, rel->relids)
with an assertion.
According to get_ecmember_indexes(), the em_relids field of the returned 
equivalence members can contain relids not mentioned in the relation.
I don't understand why this works now. For example, we can sort by t1.x but 
have an expression t1.x = t1.y*t2.z. Or have I missed something? If it is not 
a mistake, maybe add a comment explaining why the assertion doesn't fail here?


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2021-05-07 Thread Andrey Lepikhov

On 12/3/21 14:05, Hywel Carver wrote:

I've built and tested this, and it seems to function correctly to me. One question I have is 
whether the added "IS NOT NULL" filters can be omitted when they're unnecessary. Some of 
the resulting plans included an "IS NOT NULL" filter on a non-nullable column. To be 
clear, this is still an improvement (to me) without that.


A new version of the feature: deeply refactored (with the main goal of 
reducing the code size) and rebased on current master.

Here I haven't worked on the 'unnecessary IS NOT NULL filter' issue; a minimal 
example of it is shown below.
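
For reference, a minimal query of the kind Hywel describes (my own toy 
example, not taken from his message):

CREATE TABLE t (a int PRIMARY KEY, b int);

-- With the patch applied, the join collapses into a single scan of t; per
-- Hywel's report, the plan may then carry an "a IS NOT NULL" filter even
-- though a, as a primary-key column, can never be NULL.
EXPLAIN (COSTS OFF)
SELECT t1.b FROM t t1 JOIN t t2 USING (a);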

--
regards,
Andrey Lepikhov
Postgres Professional
From eee0c5f0de35d8cb83e6c4ca7749020acb18a4d1 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 28 Apr 2021 18:27:53 +0500
Subject: [PATCH] Remove self-joins.

Remove inner joins of a relation to itself if can be proven that such
join can be replaced with a scan. We can build the required proofs of
uniqueness using the existing innerrel_is_unique machinery.

We can remove a self-join when for each outer row, if:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars then the inner row
is (physically) same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals looks like a.x = b.x
2. Collect all another join quals.
3. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. Proved, that this join is self-join and can be replaced by
a scan.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 942 +-
 src/backend/optimizer/plan/planmain.c |   5 +
 src/backend/optimizer/util/joininfo.c |   3 +
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/backend/utils/misc/guc.c  |  10 +
 src/include/optimizer/pathnode.h  |   4 +
 src/include/optimizer/planmain.h  |   2 +
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 331 
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/equivclass.sql   |  16 +
 src/test/regress/sql/join.sql | 166 
 12 files changed, 1511 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index 37eb64bcef..dd5c4d2bd3 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
+bool enable_self_join_removal;
+
 /* local functions */
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids 
joinrelids);
+ Relids 
joinrelids, int subst_relid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int 
*nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
   RelOptInfo 
*innerrel,
   JoinType 
jointype,
   List 
*restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
 
 
 /*
@@ -86,7 +92,7 @@ restart:
 
remove_rel_from_query(root, innerrelid,
  
bms_union(sjinfo->min_lefthand,
-   
sjinfo->min_righthand));
+   
sjinfo->min_righthand), 0);
 
/* We verify that exactly one reference gets removed from 
joinlist */
nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo 
*sjinfo)
 
 /*
  * Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the

Re: Asynchronous Append on postgres_fdw nodes.

2021-05-07 Thread Andrey Lepikhov

On 6/5/21 22:12, Stephen Frost wrote:

* Etsuro Fujita (etsuro.fuj...@gmail.com) wrote:

I think the user should be careful about this.  How about adding a
note about it to the “Asynchronous Execution Options” section in
postgres-fdw.sgml, like the attached?

+1

... then again, it'd really be better if we could figure out a way to
just do the right thing here.  I haven't looked at this in depth but I
would think that the overhead of async would be well worth it just about
any time there's more than one foreign server involved.  Is it not
reasonable to have a heuristic where we disable async in the cases where
there's only one foreign server, but have it enabled all the other time?
While continuing to allow users to manage it explicitly if they want.
Benchmarking a SELECT from foreign partitions hosted on the same server, I see 
the following results:

With async append:
1 partition - 178 ms; 4 - 263; 8 - 450; 16 - 860; 32 - 1740.

Without:
1 - 178 ms; 4 - 583; 8 - 1140; 16 - 2302; 32 - 4620.

So these results show that we have a reason to use async append even in the 
case where there's only one foreign server.
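
For reference, a minimal way to build such a layout looks roughly like this 
(heavily simplified, with made-up names; not my exact script):

CREATE EXTENSION postgres_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'localhost', port '5432', dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

CREATE TABLE parent (a int) PARTITION BY hash (a);
CREATE TABLE base_p0 (a int);  -- storage for the "remote" side
CREATE FOREIGN TABLE parent_p0 PARTITION OF parent
  FOR VALUES WITH (modulus 4, remainder 0)
  SERVER loopback OPTIONS (table_name 'base_p0', async_capable 'true');
-- ...the same for the remaining partitions, then:
INSERT INTO parent SELECT q FROM generate_series(1, 1000000) q;
EXPLAIN (ANALYZE, COSTS OFF) SELECT count(*) FROM parent;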


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Asynchronous Append on postgres_fdw nodes.

2021-05-07 Thread Andrey Lepikhov

On 6/5/21 11:45, Etsuro Fujita wrote:

On Tue, Apr 27, 2021 at 9:31 PM Etsuro Fujita  wrote:
The patch fixes the issue, but I don’t think it’s the right way to go,
because it requires an extra ExecProcNode() call, which wouldn’t be
efficient.  Also, the patch wouldn’t address another issue I noticed
in EXPLAIN ANALYZE for async-capable nodes that the command wouldn’t
measure the time spent in such nodes accurately.  For the case of
async-capable node using postgres_fdw, it only measures the time spent
in ExecProcNode() in ExecAsyncRequest()/ExecAsyncNotify(), missing the
time spent in other things such as creating a cursor in
ExecAsyncRequest().  :-(.  To address both issues, I’d like to propose
the attached, in which I added instrumentation support to
ExecAsyncRequest()/ExecAsyncConfigureWait()/ExecAsyncNotify().  I
think this would not only address the reported issue more efficiently,
but allow to collect timing for async-capable nodes more accurately.


Ok, I agree with the approach, but the following test case fails:

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT * FROM (
(SELECT * FROM f1) UNION ALL (SELECT * FROM f2)
) q1 LIMIT 100;
ERROR:  InstrUpdateTupleCount called on node not yet executed

The initialization script is in the attachment.

--
regards,
Andrey Lepikhov
Postgres Professional


t1.sql
Description: application/sql


Re: Asynchronous Append on postgres_fdw nodes.

2021-05-07 Thread Andrey Lepikhov

On 6/5/21 14:11, Etsuro Fujita wrote:

On Tue, Apr 27, 2021 at 3:57 PM Andrey V. Lepikhov
 wrote:

One more question. Append choose async plans at the stage of the Append
plan creation.
Later, the planner performs some optimizations, such as eliminating
trivial Subquery nodes. So, AsyncAppend is impossible in some
situations, for example:

(SELECT * FROM f1 WHERE a < 10)
UNION ALL
(SELECT * FROM f2 WHERE a < 10);

But works for the query:

SELECT *
FROM (SELECT * FROM f1 UNION ALL SELECT * FROM f2) AS q1
WHERE a < 10;

As far as I understand, this is not a hard limit.


I think so, but IMO I think this would be an improvement rather than a bug fix.


We can choose async
subplans at the beginning of the execution stage.
For a demo, I prepared the patch (see in attachment).
It solves the problem and passes the regression tests.


Thanks for the patch!  IIUC, another approach to this would be the
patch you proposed before [1].  Right?

Yes. I think the new solution will be better.

--
regards,
Andrey Lepikhov
Postgres Professional




Re: Asynchronous Append on postgres_fdw nodes.

2021-05-10 Thread Andrey Lepikhov

On 10/5/21 08:03, Etsuro Fujita wrote:

On Fri, May 7, 2021 at 7:32 PM Andrey Lepikhov
I think a simple fix for this would be just remove the check whether
the instr->running flag is set or not in InstrUpdateTupleCount().
Attached is an updated patch, in which I also updated a comment in
execnodes.h and docs in fdwhandler.sgml to match the code in
nodeAppend.c, and fixed typos in comments in nodeAppend.c.

Your patch fixes the problem. But I found two more problems:

1.
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT * FROM (
  (SELECT * FROM f1)
    UNION ALL
  (SELECT * FROM f2)
    UNION ALL
  (SELECT * FROM l3)
) q1 LIMIT 6709;

  QUERY PLAN
--
 Limit (actual rows=6709 loops=1)
   ->  Append (actual rows=6709 loops=1)
 ->  Async Foreign Scan on f1 (actual rows=1 loops=1)
 ->  Async Foreign Scan on f2 (actual rows=1 loops=1)
 ->  Seq Scan on l3 (actual rows=6708 loops=1)

Here we scan 6710 tuples at the lower level, but the Append node returned only 
6709. Where did we lose one tuple?


2.
SELECT * FROM (
  (SELECT * FROM f1)
    UNION ALL
  (SELECT * FROM f2)
    UNION ALL
  (SELECT * FROM f3 WHERE a > 0)
) q1 LIMIT 3000;
  QUERY PLAN
--
 Limit (actual rows=3000 loops=1)
   ->  Append (actual rows=3000 loops=1)
 ->  Async Foreign Scan on f1 (actual rows=0 loops=1)
 ->  Async Foreign Scan on f2 (actual rows=0 loops=1)
 ->  Foreign Scan on f3 (actual rows=3000 loops=1)

Here we give preference to the synchronous scan. Why?

--
regards,
Andrey Lepikhov
Postgres Professional




Defer selection of asynchronous subplans until the executor initialization stage

2021-05-10 Thread Andrey Lepikhov

On 7/5/21 21:05, Etsuro Fujita wrote:

I think it would be better to start a new thread for this, and add the
patch to the next CF so that it doesn’t get lost.


The current implementation of async Append chooses asynchronous subplans at 
the time the Append plan is created. This is a safe approach, but we lose some 
optimizations, such as the flattening of trivial subqueries, and cannot 
execute some simple queries asynchronously. For example:

EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF, COSTS OFF)
(SELECT * FROM f1 WHERE a < 10) UNION ALL
(SELECT * FROM f2 WHERE a < 10);

But, as far as I understand, we can choose these subplans later, at the Append 
initialization phase, when all optimizations have already been applied.

The attachment contains an implementation of the proposed approach.

The initialization script for the example can be found in the parent thread [1].


[1] 
https://www.postgresql.org/message-id/a38bb206-8340-9528-5ef6-37de2d5cb1a3%40postgrespro.ru



--
regards,
Andrey Lepikhov
Postgres Professional
From 395b1d62389cf40520a4afd87c11301aa2b17df2 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 11 May 2021 08:43:03 +0500
Subject: [PATCH] Defer selection of asynchronous subplans to the executor
 initial phase.

---
 contrib/postgres_fdw/postgres_fdw.c | 10 +-
 src/backend/executor/execAmi.c  |  7 +++
 src/backend/executor/nodeAppend.c   | 19 +++
 src/backend/nodes/copyfuncs.c   |  1 -
 src/backend/nodes/outfuncs.c|  1 -
 src/backend/nodes/readfuncs.c   |  1 -
 src/backend/optimizer/plan/createplan.c | 17 +
 src/include/nodes/plannodes.h   |  1 -
 src/include/optimizer/planmain.h|  1 +
 9 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c 
b/contrib/postgres_fdw/postgres_fdw.c
index 4ff58d9c27..3e151a6790 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1245,6 +1245,7 @@ postgresGetForeignPlan(PlannerInfo *root,
boolhas_final_sort = false;
boolhas_limit = false;
ListCell   *lc;
+   ForeignScan *fsplan;
 
/*
 * Get FDW private data created by postgresGetForeignUpperPaths(), if 
any.
@@ -1429,7 +1430,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 * field of the finished plan node; we can't keep them in private state
 * because then they wouldn't be subject to later planner processing.
 */
-   return make_foreignscan(tlist,
+   fsplan = make_foreignscan(tlist,
local_exprs,
scan_relid,
params_list,
@@ -1437,6 +1438,13 @@ postgresGetForeignPlan(PlannerInfo *root,
fdw_scan_tlist,
fdw_recheck_quals,
outer_plan);
+
+   /* If appropriate, consider participation in async operations */
+   fsplan->scan.plan.async_capable = (enable_async_append &&
+  
best_path->path.pathkeys == NIL &&
+  
!fsplan->scan.plan.parallel_safe &&
+  
is_async_capable_path((Path *)best_path));
+   return fsplan;
 }
 
 /*
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index b3726a54f3..4e70f4eb54 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -524,6 +524,9 @@ ExecSupportsBackwardScan(Plan *node)
if (node->parallel_aware)
return false;
 
+   if (node->async_capable)
+   return false;
+
switch (nodeTag(node))
{
case T_Result:
@@ -536,10 +539,6 @@ ExecSupportsBackwardScan(Plan *node)
{
ListCell   *l;
 
-   /* With async, tuples may be interleaved, so 
can't back up. */
-   if (((Append *) node)->nasyncplans > 0)
-   return false;
-
foreach(l, ((Append *) node)->appendplans)
{
if (!ExecSupportsBackwardScan((Plan *) 
lfirst(l)))
diff --git a/src/backend/executor/nodeAppend.c 
b/src/backend/executor/nodeAppend.c
index 3c1f12adaf..363cf9f4a5 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -117,6 +117,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
int firstvalid;
int  

Re: Defer selection of asynchronous subplans until the executor initialization stage

2021-05-11 Thread Andrey Lepikhov

On 11/5/21 08:55, Zhihong Yu wrote:

+           /* Check to see if subplan can be executed asynchronously */
+           if (subplan->async_capable)
+           {
+               subplan->async_capable = false;

It seems the if statement is not needed: you can directly assign false to 
subplan->async_capable.

Thank you, I agree with you.
A closer look into the postgres_fdw regression tests shows at least one open 
problem with this approach: we need to handle the situation where all 
partitions except one are pruned and no Append node exists at all.
partition doesn't pruned and append isn't exist at all.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Asynchronous Append on postgres_fdw nodes.

2021-05-11 Thread Andrey Lepikhov

On 11/5/21 12:24, Etsuro Fujita wrote:

On Tue, May 11, 2021 at 11:58 AM Andrey Lepikhov
The extra tuple, which is from f1 or f2, would have been kept in the
Append node's as_asyncresults, not returned from the Append node to
the Limit node.  The async Foreign Scan nodes would fetch tuples
before the Append node ask the tuples, so the fetched tuples may or
may not be used.

Ok.

 ->  Append (actual rows=3000 loops=1)
   ->  Async Foreign Scan on f1 (actual rows=0 loops=1)
   ->  Async Foreign Scan on f2 (actual rows=0 loops=1)
   ->  Foreign Scan on f3 (actual rows=3000 loops=1)

Here we give preference to the synchronous scan. Why?


This would be expected behavior, and the reason is avoid performance
degradation; you might think it would be better to execute the async
Foreign Scan nodes more aggressively, but it would require
waiting/polling for file descriptor events many times, which is
expensive and might cause performance degradation.  I think there is
room for improvement, though.
Yes, I agree with you. Maybe you could add a note about this to the 
documentation of async_capable, for example:
"... Synchronous and asynchronous scanning strategies can be mixed by the 
optimizer within one scan plan of a partitioned table or a UNION ALL command. 
For performance reasons, synchronous scans are executed before the first 
asynchronous scan. ..."


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Asymmetric partition-wise JOIN

2021-05-26 Thread Andrey Lepikhov

Here is the next version of the patch.
To hunt for problems, I force-enabled this feature during the 'make check' 
tests. Some bugs were found and fixed.


--
regards,
Andrey Lepikhov
Postgres Professional
From 101614b504b0b17e201d2375c8af61cfc671e51d Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Fri, 2 Apr 2021 11:02:20 +0500
Subject: [PATCH] Asymmetric partitionwise join.

Teach optimizer to consider partitionwise join of non-partitioned
table with each partition of partitioned table.

Disallow asymmetric machinery for joining of two partitioned (or appended)
relations because it could cause CPU and memory huge consumption
during reparameterization of NestLoop path.
---
 src/backend/optimizer/path/joinpath.c|   9 +
 src/backend/optimizer/path/joinrels.c| 184 +++
 src/backend/optimizer/plan/setrefs.c |  13 +-
 src/backend/optimizer/util/appendinfo.c  |  37 ++-
 src/backend/optimizer/util/pathnode.c|   9 +-
 src/backend/optimizer/util/relnode.c |  19 +-
 src/include/optimizer/paths.h|   7 +-
 src/test/regress/expected/partition_join.out | 306 +++
 src/test/regress/sql/partition_join.sql  | 126 
 9 files changed, 682 insertions(+), 28 deletions(-)

diff --git a/src/backend/optimizer/path/joinpath.c 
b/src/backend/optimizer/path/joinpath.c
index b67b517770..7a1a7b2e86 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -335,6 +335,15 @@ add_paths_to_joinrel(PlannerInfo *root,
if (set_join_pathlist_hook)
set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
   jointype, &extra);
+
+   /*
+* 7. If outer relation is delivered from partition-tables, consider
+* distributing inner relation to every partition-leaf prior to
+* append these leafs.
+*/
+   try_asymmetric_partitionwise_join(root, joinrel,
+ 
outerrel, innerrel,
+ 
jointype, &extra);
 }
 
 /*
diff --git a/src/backend/optimizer/path/joinrels.c 
b/src/backend/optimizer/path/joinrels.c
index 0dbe2ac726..e6bd2ea5fe 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
 
 #include "miscadmin.h"
 #include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
 #include "optimizer/joininfo.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -1551,6 +1552,189 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo 
*rel1, RelOptInfo *rel2,
}
 }
 
+/*
+ * Build RelOptInfo on JOIN of each partition of the outer relation and the 
inner
+ * relation. Return List of such RelOptInfo's. Return NIL, if at least one of
+ * these JOINs are impossible to build.
+ */
+static List *
+extract_asymmetric_partitionwise_subjoin(PlannerInfo *root,
+   
 RelOptInfo *joinrel,
+   
 AppendPath *append_path,
+   
 RelOptInfo *inner_rel,
+   
 JoinType jointype,
+   
 JoinPathExtraData *extra)
+{
+   List*result = NIL;
+   ListCell*lc;
+
+   foreach (lc, append_path->subpaths)
+   {
+   Path*child_path = lfirst(lc);
+   RelOptInfo  *child_rel = child_path->parent;
+   Relids  child_join_relids;
+   Relids  parent_relids;
+   RelOptInfo  *child_join_rel;
+   SpecialJoinInfo *child_sjinfo;
+   List*child_restrictlist;
+
+   child_join_relids = bms_union(child_rel->relids, 
inner_rel->relids);
+   parent_relids = bms_union(append_path->path.parent->relids,
+ 
inner_rel->relids);
+
+   child_sjinfo = build_child_join_sjinfo(root, extra->sjinfo,
+   
   child_rel->relids,
+   
   inner_rel->relids);
+   child_restrictlist = (List *)
+   adjust_appendrel_attrs_multilevel(root, (Node 
*)extra->restrictlist,
+   
  child_join_r

Re: [PoC] Reducing planning time when tables have many partitions

2023-02-14 Thread Andrey Lepikhov

On 2/6/23 06:47, Yuya Watari wrote:

Of course, I'm not sure if my approach in v16-0003 is ideal, but it
may help solve your concern above. Since simple_rel_array[0] is no
longer necessary with my patch, I removed the setup_append_rel_entry()
function in v16-0004. However, to work the patch, I needed to change
some assertions in v16-0005. For more details, please see the commit
message of v16-0005. After these works, the attached patches passed
all regression tests in my environment.

Instead of my approach, imitating the following change to
get_eclass_indexes_for_relids() is also a possible solution. Ignoring
NULL RelOptInfos enables us to avoid the segfault, but we have to
adjust EquivalenceMemberIterator to match the result, and I'm not sure
if this idea is correct.
As far as I can see, you moved the indexes from RelOptInfo to PlannerInfo. 
Might it be better to move them into RangeTblEntry instead?


--
Regards
Andrey Lepikhov
Postgres Professional





Re: [POC] Allow flattening of subquery with a link to upper query

2022-10-03 Thread Andrey Lepikhov

On 9/13/22 16:40, Andrey Lepikhov wrote:

On 5/9/2022 12:22, Richard Guo wrote:
On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov  wrote:
To resolve both issues, lower outer join passes through pull_sublinks_* 
into flattening routine (see attachment).

I've added these cases into subselect.sql

Attached is a new version of the patch, rebased onto current master. A small 
illustration of the targeted query shape is shown below.
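
The class of queries targeted here is, roughly, a correlated sublink that uses 
'=' or IN and no aggregation, e.g. (an illustration only, not taken from the 
patch's regression tests):

CREATE TABLE emp  (id int, dept_id int, salary int);
CREATE TABLE dept (id int, name text, bonus_base int);

-- The sublink references the upper query block (d.bonus_base); after
-- flattening it becomes an ordinary join between dept and emp.
SELECT d.name
FROM dept d
WHERE d.id IN (SELECT e.dept_id FROM emp e WHERE e.salary = d.bonus_base);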

--
Regards
Andrey Lepikhov
Postgres Professional
From 6462578f8789cb831f45ebfad65308d6afabb833 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 28 Sep 2022 13:42:12 +0300
Subject: [PATCH] Transform correlated subquery of type N-J [1] into ordinary
 join query. With many restrictions, check each subquery and pull up
 expressions, references upper query block. Works for operators '=' and 'IN'.
 Machinery of converting ANY subquery to JOIN stays the same with minor
 changes in walker function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pass a lower outer join through the pullup_sublink recursive procedure to find
out some situations when qual references outer side of an outer join.

[1] Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. Database Syst. 7 (1982): 443-469.
---
 src/backend/optimizer/plan/subselect.c| 320 +++-
 src/backend/optimizer/prep/prepjointree.c |  71 ++--
 src/backend/optimizer/util/tlist.c|   2 +-
 src/backend/optimizer/util/var.c  |   8 +
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/include/optimizer/optimizer.h |   1 +
 src/include/optimizer/subselect.h |   3 +-
 src/include/optimizer/tlist.h |   1 +
 src/test/regress/expected/prepare.out |  18 +
 src/test/regress/expected/subselect.out   | 438 ++
 src/test/regress/sql/prepare.sql  |   7 +
 src/test/regress/sql/subselect.sql| 162 
 12 files changed, 1004 insertions(+), 37 deletions(-)

diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 92e3338584..29da43bb4d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -32,6 +32,7 @@
 #include "optimizer/planner.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
+#include "optimizer/tlist.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
@@ -65,6 +66,8 @@ typedef struct inline_cte_walker_context
 } inline_cte_walker_context;
 
 
+bool optimize_correlated_subqueries = true;
+
 static Node *build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot,
 		   List *plan_params,
 		   SubLinkType subLinkType, int subLinkId,
@@ -1229,6 +1232,288 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
 	return expression_tree_walker(node, inline_cte_walker, context);
 }
 
+static bool
+contain_placeholders(Node *node, inline_cte_walker_context *context)
+{
+	if (node == NULL)
+		return false;
+
+	if (IsA(node, Query))
+		return query_tree_walker((Query *) node, contain_placeholders, context, 0);
+	else if (IsA(node, PlaceHolderVar))
+	{
+		return true;
+	}
+
+	return expression_tree_walker(node, contain_placeholders, context);
+}
+
+/*
+ * To be pullable all clauses of flattening correlated subquery should be
+ * anded and mergejoinable (XXX: really necessary?)
+ */
+static bool
+quals_is_pullable(Node *quals)
+{
+	if (!contain_vars_of_level(quals, 1))
+		return true;
+
+	if (quals && IsA(quals, OpExpr))
+	{
+		OpExpr *expr = (OpExpr *) quals;
+		Node   *leftarg;
+
+		/* Contains only one expression */
+		leftarg = linitial(expr->args);
+		if (!op_mergejoinable(expr->opno, exprType(leftarg))) /* Is it really necessary ? */
+			return false;
+
+		if (contain_placeholders(quals, NULL))
+			return false;
+
+		return true;
+	}
+	else if (is_andclause(quals))
+	{
+		ListCell   *l;
+
+		foreach(l, ((BoolExpr *) quals)->args)
+		{
+			Node *andarg = (Node *) lfirst(l);
+
+			if (!IsA(andarg, OpExpr))
+return false;
+			if (!quals_is_pullable(andarg))
+return false;
+		}
+
+		return true;
+	}
+
+	return false;
+}
+
+typedef struct
+{
+	Query  *subquery;
+	int		newvarno;
+	List   *pulling_quals;
+	bool	varlevel_up;
+} correlated_t;
+
+static Node *
+pull_subquery_clauses_mutator(Node *node, correlated_t *ctx)
+{
+	if (node == NULL)
+		return NULL;
+
+	if (IsA(node, OpExpr) && !ctx->varlevel_up)
+	{
+		if (!contain_vars_of_level(node, 1))
+			return node;
+
+		/*
+		 * The expression contains links to upper relation. It will be pulled to
+		 * uplevel. All links into vars of upper levels must be changed.
+		 */
+
+		ctx->varlevel_up = true;
+		ctx->pulling_quals =
+			lappend(ctx->pulling_quals,
+	expression_tree_mutator(node,
+			pull_subquery_clauses_mutator,
+			(

Re: [POC] Allow flattening of subquery with a link to upper query

2022-10-05 Thread Andrey Lepikhov

On 5/10/2022 02:45, Zhihong Yu wrote:

Hi,
For contain_placeholders():

+   if (IsA(node, Query))
+       return query_tree_walker((Query *) node, contain_placeholders, 
context, 0);

+   else if (IsA(node, PlaceHolderVar))

Fixed


The `else` is not needed.

For correlated_t struct, it would be better if the fields have comments.

Ok, I've added some comments.


+                    * (for grouping, as an example). So, revert its 
status to

+                    * a full valued entry.

full valued -> fully valued

Fixed

--
regards,
Andrey Lepikhov
Postgres Professional
From da570914e53bc34e6bf3291649725a041a8b929c Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 28 Sep 2022 13:42:12 +0300
Subject: [PATCH] Transform correlated subquery of type N-J [1] into ordinary
 join query. With many restrictions, check each subquery and pull up
 expressions, references upper query block. Works for operators '=' and 'IN'.
 Machinery of converting ANY subquery to JOIN stays the same with minor
 changes in walker function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pass a lower outer join through the pullup_sublink recursive procedure to find
out some situations when qual references outer side of an outer join.

[1] Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. Database 
Syst. 7 (1982): 443-469.
---
 src/backend/optimizer/plan/subselect.c| 328 +++-
 src/backend/optimizer/prep/prepjointree.c |  71 ++--
 src/backend/optimizer/util/tlist.c|   2 +-
 src/backend/optimizer/util/var.c  |   8 +
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/include/optimizer/optimizer.h |   1 +
 src/include/optimizer/subselect.h |   3 +-
 src/include/optimizer/tlist.h |   1 +
 src/test/regress/expected/prepare.out |  18 +
 src/test/regress/expected/subselect.out   | 438 ++
 src/test/regress/sql/prepare.sql  |   7 +
 src/test/regress/sql/subselect.sql| 162 
 src/tools/pgindent/typedefs.list  |   1 +
 13 files changed, 1013 insertions(+), 37 deletions(-)

diff --git a/src/backend/optimizer/plan/subselect.c 
b/src/backend/optimizer/plan/subselect.c
index 92e3338584..be19d85524 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -32,6 +32,7 @@
 #include "optimizer/planner.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
+#include "optimizer/tlist.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
@@ -65,6 +66,8 @@ typedef struct inline_cte_walker_context
 } inline_cte_walker_context;
 
 
+bool optimize_correlated_subqueries = true;
+
 static Node *build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot,
   List *plan_params,
   SubLinkType subLinkType, int 
subLinkId,
@@ -1229,6 +1232,296 @@ inline_cte_walker(Node *node, inline_cte_walker_context 
*context)
return expression_tree_walker(node, inline_cte_walker, context);
 }
 
+static bool
+contain_placeholders(Node *node, inline_cte_walker_context *context)
+{
+   if (node == NULL)
+   return false;
+
+   if (IsA(node, Query))
+   return query_tree_walker((Query *) node, contain_placeholders, 
context, 0);
+   if (IsA(node, PlaceHolderVar))
+   return true;
+
+   return expression_tree_walker(node, contain_placeholders, context);
+}
+
+/*
+ * To be pullable all clauses of flattening correlated subquery should be
+ * anded and mergejoinable (XXX: really necessary?)
+ */
+static bool
+quals_is_pullable(Node *quals)
+{
+   if (!contain_vars_of_level(quals, 1))
+   return true;
+
+   if (quals && IsA(quals, OpExpr))
+   {
+   OpExpr *expr = (OpExpr *) quals;
+   Node   *leftarg;
+
+   /* Contains only one expression */
+   leftarg = linitial(expr->args);
+   if (!op_mergejoinable(expr->opno, exprType(leftarg))) /* Is it 
really necessary ? */
+   return false;
+
+   if (contain_placeholders(quals, NULL))
+   return false;
+
+   return true;
+   }
+   else if (is_andclause(quals))
+   {
+   ListCell   *l;
+
+   foreach(l, ((BoolExpr *) quals)->args)
+   {
+   Node *andarg = (Node *) lfirst(l);
+
+   if (!IsA(andarg, OpExpr))
+   return false;
+   if (!quals_is_pullable(andarg))
+   return false;
+   }
+
+   return true;
+   }
+
+   return false;
+}
+
+/*

Re: Removing unneeded self joins

2022-10-05 Thread Andrey Lepikhov

New version, rebased onto current master.
Nothing special, just a rebase.

--
regards,
Andrey Lepikhov
Postgres Professional
From 03aab7a2431032166c9ea5f52fbcccaf7168abec Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 5 Oct 2022 16:58:34 +0500
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes inner join of plain table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1046 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  774 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  340 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2278 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d750290f13..5ce2d4d2fa 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5290,6 +5290,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration 
parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index c31fcc917d..51f672a65c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3513,6 +3513,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  List *restrictlist,
  List *exprlist, List *oprlist)
+{
+   return relation_has_unique_index_ext(root, rel, restrictlist,
+   exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
 {
ListCell   *ic;
 
@@ -3568,6 +3583,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+   List   *exprs = NIL;
 
/*
 * If the index is not unique, or not immediately enforced, or if it's
@@ -3616,6 +3632,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+   if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+   {
+   MemoryContext oldMemCtx =
+   

Re: document the need to analyze partitioned tables

2022-10-05 Thread Andrey Lepikhov

On 10/5/22 13:37, Laurenz Albe wrote:

On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:

I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).


I'd like to suggest an improvement to this.  The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.


By the way, where are the statistics of a partitioned tables used?  The actual
tables scanned are always the partitions, and in the execution plans that
I have seen, the optimizer always used the statistics of the partitions.

For example, it is used to estimate the selectivity of a join clause:

CREATE TABLE test (id integer, val integer) PARTITION BY hash (id);
CREATE TABLE test_0 PARTITION OF test
  FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE test_1 PARTITION OF test
  FOR VALUES WITH (modulus 2, remainder 1);

INSERT INTO test (SELECT q, q FROM generate_series(1,10) AS q);
VACUUM ANALYZE test;
INSERT INTO test (SELECT q, q%2 FROM generate_series(11,200) AS q);
VACUUM ANALYZE test_0,test_1;
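-- Only the partitions were re-analyzed here, so the parent's own statistics are now stale.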

EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;
VACUUM ANALYZE test;
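-- After re-analyzing the parent as well, the second estimate should be much closer.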
EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;

Here, without up-to-date statistics on the parent table, the planner makes a wrong prediction for the first query.

--
Regards
Andrey Lepikhov
Postgres Professional





Re: Fast COPY FROM based on batch insert

2022-10-10 Thread Andrey Lepikhov

On 10/7/22 11:18, Etsuro Fujita wrote:

On Tue, Sep 27, 2022 at 6:03 PM Etsuro Fujita  wrote:

I will review the patch a bit more, but I feel that it is
in good shape.


One thing I noticed is this bit added to CopyMultiInsertBufferFlush()
to run triggers on the foreign table.

+   /* Run AFTER ROW INSERT triggers */
+   if (resultRelInfo->ri_TrigDesc != NULL &&
+   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+   {
+   Oid relid =
RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+   for (i = 0; i < inserted; i++)
+   {
+   TupleTableSlot *slot = rslots[i];
+
+   /*
+* AFTER ROW Triggers might reference the tableoid column,
+* so (re-)initialize tts_tableOid before evaluating them.
+*/
+   slot->tts_tableOid = relid;
+
+   ExecARInsertTriggers(estate, resultRelInfo,
+slot, NIL,
+cstate->transition_capture);
+   }
+   }

Since foreign tables cannot have transition tables, we have
trig_insert_new_table=false.  So I simplified the if test and added an
assertion ensuring trig_insert_new_table=false.  Attached is a new
version of the patch.  I tweaked some comments a bit as well.  I think
the patch is committable.  So I plan on committing it next week if
there are no objections.
I reviewed the patch one more time. Only one question: bistate and 
ri_FdwRoutine are tightly coupled (exactly one of them should be set). Maybe 
add an assertion on (ri_FdwRoutine XOR bistate)? Just to prevent possible 
errors in the future.


--
Regards
Andrey Lepikhov
Postgres Professional





Re: Fast COPY FROM based on batch insert

2022-10-12 Thread Andrey Lepikhov

On 10/12/22 07:56, Etsuro Fujita wrote:

On Tue, Oct 11, 2022 at 3:06 PM Andrey Lepikhov
 wrote:

I reviewed the patch one more time. Only one question: bistate and
ri_FdwRoutine are tightly coupled (exactly one of them should be set). Maybe
add an assertion on (ri_FdwRoutine XOR bistate)? Just to prevent possible
errors in the future.


You mean the bistate member of CopyMultiInsertBuffer?

Yes


We do not use that member at all for foreign tables, so the patch
avoids initializing that member in CopyMultiInsertBufferInit() when
called for a foreign table.  So we have bistate = NULL for foreign
tables (and bistate != NULL for plain tables), as you mentioned above.
I think it is a good idea to add such assertions.  How about adding
them to CopyMultiInsertBufferFlush() and
CopyMultiInsertBufferCleanup() like the attached?  In the attached I
updated comments a bit further as well.

Yes, that is quite enough.

--
Regards
Andrey Lepikhov
Postgres Professional





Re: Fast COPY FROM based on batch insert

2022-10-28 Thread Andrey Lepikhov

On 28/10/2022 16:12, Etsuro Fujita wrote:

On Thu, Oct 13, 2022 at 6:58 PM Etsuro Fujita  wrote:

I have committed the patch after tweaking comments a little bit further.


I think there is another patch that improves performance of COPY FROM
for foreign tables using COPY FROM STDIN, but if Andrey (or anyone
else) want to work on it again, I think it would be better to create a
new CF entry for it (and start a new thread for it).  So I plan to
close this in the November CF unless they think otherwise.

Anyway, thanks for the patch, Andrey!  Thanks for reviewing, Ian and Zhihong!

Thanks,

I studied the performance of this code in comparison to bulk INSERTs.
This patch seems to improve insertion speed by about 20%. Also, this 
patch is very invasive. So, I don't have any plans to work on it now.


--
regards,
Andrey Lepikhov
Postgres Professional




