Re: [Proposal] Global temporary tables
On 12.01.2020 4:51, Tomas Vondra wrote: On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote: On 09.01.2020 19:48, Tomas Vondra wrote: The most complex and challenging task is to support GTT for all kinds of indexes. Unfortunately I cannot propose a good universal solution for it. Just patching all existing index implementations seems to be the only choice. I haven't looked at the indexing issue closely, but IMO we need to ensure that every session sees/uses only indexes on GTT that were defined before the session started using the table. Why? It contradicts the behavior of normal tables. Assume that you have active clients and at some point the DBA recognizes that they are spending too much time scanning some GTT. The DBA can create an index for this GTT, but if existing clients are not able to use this index, do we then have to somehow make these clients restart their sessions? In my patch I have implemented building indexes for GTT on demand: if an accessed index on a GTT is not yet initialized, it is filled with local data. Yes, I know the behavior would be different from behavior for regular tables. And yes, it would not allow fixing slow queries in sessions without interrupting those sessions. I proposed just ignoring those new indexes because it seems much simpler than alternative solutions that I can think of, and it's not like those other solutions don't have other issues. Quite the opposite: restricting sessions to the indexes created before the session first accessed the GTT requires more effort. We need to somehow maintain and check the GTT first-access time. For example, I've looked at the "on demand" building as implemented in global_private_temp-8.patch, and I kinda doubt it's the right approach; adding a bunch of index build calls into various places in index code seems somewhat suspicious. We in any case have to initialize GTT indexes on demand, even if we prohibit use of indexes created after the session's first access to the GTT. So the difference is only in one thing: should we just initialize an empty index or populate it with local data (if the rules for index usability are the same for GTT as for normal tables). From the implementation point of view there is no big difference. Actually, building the index in the standard way is even simpler than constructing an empty index. Originally I implemented the first approach (I just forgot to consider the case when the GTT was already used by a session). Then I rewrote it using the second approach and the patch even became simpler.

* brinbuild is added to brinRevmapInitialize, which is meant to initialize state for scanning. It seems wrong to build the index we're scanning from this function (layering and all that).
* btbuild is called from _bt_getbuf. That seems a bit ... suspicious?

As I already mentioned, support of indexes for GTT is one of the most challenging things in my patch. I didn't find a good and universal solution. So I agree that the call of btbuild from _bt_getbuf may be considered suspicious. I will be pleased if you or somebody else can propose a better alternative, and not only for B-Tree but for all other indexes. But as I already wrote above, prohibiting a session from using indexes created after its first access to the GTT doesn't solve the problem. For normal tables (and for local temp tables) indexes are initialized at the time of their creation. With GTT this doesn't work, because each session has its own local data for the GTT. We should either initialize/build the index on demand (when it is first accessed), or initialize indexes for all existing GTTs at the moment of session start.
The latter option seems much worse from my point of view: there may be a huge number of GTTs, and a session may not need to access GTTs at all. ... and so on for other index types. Also, what about custom indexes implemented in extensions? It seems a bit strange each of them has to support this separately. I have already complained about this: my patch supports GTT for all built-in indexes, but custom indexes have to handle it themselves. It looks like providing some generic solution requires extending the index API with two different operations: creation and initialization. But extending the index API is a very critical change... And it also doesn't solve the problem for all existing extensions: they would in any case have to be rewritten to implement the new API version in order to support GTT. IMHO if this really is the right solution, we need to make it work for existing indexes without having to tweak them individually. Why don't we track a flag whether an index on GTT was initialized in a given session, and if it was not then call the build function before calling any other function from the index AM? But let's talk about other issues caused by "on demand" build. Imagine you have 50 sessions, each using the same GTT with a GB of per-session data. Now you create a new index on the GTT, which forces each session to build its "local" index. Those builds will use maintenance_work_mem each
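To make the flag idea above a bit more concrete, here is a minimal sketch of what such a check might look like before entering any index AM routine. The GTT test macro, the per-session "initialized" flag, and the place where this helper would be called are assumptions for illustration only; they are not taken from any of the posted patches, and the index_build() call details are only approximate.

    /*
     * Hypothetical helper: make sure the session-local part of an index on a
     * global temporary table exists before any index AM function touches it.
     * RelationIsGlobalTemp() and rd_gtt_index_built are invented names, not
     * symbols from the actual GTT patches.
     */
    static void
    EnsureGTTIndexIsBuilt(Relation heapRel, Relation indexRel)
    {
        if (RelationIsGlobalTemp(indexRel) && !indexRel->rd_gtt_index_built)
        {
            IndexInfo  *indexInfo = BuildIndexInfo(indexRel);

            /* Populate the session-local index from the session-local heap. */
            index_build(heapRel, indexRel, indexInfo, false, false);
            indexRel->rd_gtt_index_built = true;
        }
    }

The point of such a wrapper would be to keep the "build on first use" logic in one place instead of scattering build calls through the individual AMs, which is exactly the layering concern raised above.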
Re: How to make a OpExpr check compatible among different versions
On 2020-01-13 08:29, Andy Fan wrote: During one of my works for logical rewrite, I want to check if the expr is a given Expr. so the simplest way is: if (expr->opno == 418 && nodeTag(linitial(expr->args)) == T_xxx && nodeTag(lsecond(expr->args)) == T_ ) { .. } if we write code like above, we may have issues if the oid changed in the future version. Generally, you would do this by using a preprocessor symbol. For example, instead of hardcoding the OID of the text type, you would use the symbol TEXTOID instead. Symbols like that exist for many catalog objects that one might reasonably need to hardcode. However, hardcoding an OID reference to an operator looks like a design mistake to me. Operators should normally be looked up via operator classes or similar structures that convey the meaning of the operator. Also, instead of nodeTag() == T_xxx you should use the IsA() macro. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
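As an illustrative aside, a hedged sketch of the style suggested above — a symbolic type OID plus IsA() checks, with the operator looked up through the type cache (utils/typcache.h) rather than hardcoded — might look like this; the variable names are invented for the example:

    /* expr is the Node * being tested; names here are illustrative only */
    if (IsA(expr, OpExpr))
    {
        OpExpr     *opexpr = (OpExpr *) expr;
        TypeCacheEntry *tce;

        /*
         * Look up the equality operator for the text type via the type
         * cache instead of hardcoding its OID.
         */
        tce = lookup_type_cache(TEXTOID, TYPECACHE_EQ_OPR);

        if (opexpr->opno == tce->eq_opr &&
            IsA(linitial(opexpr->args), Var) &&
            IsA(lsecond(opexpr->args), Const))
        {
            /* ... */
        }
    }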
Re: Comment fix in session.h
On Mon, Jan 13, 2020 at 12:51 PM Antonin Houska wrote: > > This diff fixes what I consider a typo. > LGTM. I'll push this in some time. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Question regarding heap_multi_insert documentation
On Mon, Jan 13, 2020 at 12:40:20AM +0100, Daniel Gustafsson wrote: > Thanks for clarifying. PFA tiny patch for this. Thanks, pushed. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Block level parallel vacuum
Hello > I just thought they were concerned > that the variable name skip_index might be confusing because we skip > if skip_index is NOT true. Right. >> > - bool skip_index = (get_indstats(lps->lvshared, i) == NULL || >> > - skip_parallel_vacuum_index(Irel[i], lps->lvshared)); >> > + bool can_parallel = (get_indstats(lps->lvshared, i) == NULL || >> > + skip_parallel_vacuum_index(Irel[i], >> > + lps->lvshared)); >> > >> > The above condition is true when the index can *not* do parallel index >> vacuum. Ouch, right. I was wrong. (or the variable name and the comment really confused me) > Okay, would it better if we get rid of this variable and have code like below? > > /* Skip the indexes that can be processed by parallel workers */ > if ( !(get_indstats(lps->lvshared, i) == NULL || > skip_parallel_vacuum_index(Irel[i], lps->lvshared))) > continue; Complex condition... Not sure. > How about changing it to skipped_index and change the comment to something > like “We are interested in only index skipped parallel vacuum”? I prefer this idea. > Today, again thinking about it, it seems > the idea Mahendra is suggesting that is giving an error if the > parallel degree is not specified seems reasonable to me. +1 regards, Sergei
[PATCH] distinct aggregates within a window function WIP
Hi hackers, I want to propose to you an old patch for Postgres 11, off-site developed by Oliver Ford, but I have permission from him to publish it and to continue its development, that allows distinct aggregates, like select sum(distinct nums), within a window function. I have rebased it on the current git master branch and have made the necessary changes for it to work with Postgres 13devel. It's a WIP, because it doesn't have tests yet (I will add them later) and also, it works for int, float, and numeric types, but the distinct check can probably be rewritten to store the distinct elements in a hash table, which should give a performance improvement. If you find the implementation of the patch acceptable from a committer's perspective, I will address all your design and review notes and will try to move ahead with it; I will also add this patch to the March commitfest. As an example usage of the patch: if you have time series data, with current Postgres you will get an error:

postgres=# CREATE TABLE t_demo AS
postgres-#    SELECT ordinality, day, date_part('week', day) AS week
postgres-#    FROM generate_series('2020-01-02', '2020-01-15', '1 day'::interval)
postgres-#         WITH ORDINALITY AS day;
SELECT 14
postgres=# SELECT * FROM t_demo;
 ordinality |          day           | week
------------+------------------------+------
          1 | 2020-01-02 00:00:00+02 |    1
          2 | 2020-01-03 00:00:00+02 |    1
          3 | 2020-01-04 00:00:00+02 |    1
          4 | 2020-01-05 00:00:00+02 |    1
          5 | 2020-01-06 00:00:00+02 |    2
          6 | 2020-01-07 00:00:00+02 |    2
          7 | 2020-01-08 00:00:00+02 |    2
          8 | 2020-01-09 00:00:00+02 |    2
          9 | 2020-01-10 00:00:00+02 |    2
         10 | 2020-01-11 00:00:00+02 |    2
         11 | 2020-01-12 00:00:00+02 |    2
         12 | 2020-01-13 00:00:00+02 |    3
         13 | 2020-01-14 00:00:00+02 |    3
         14 | 2020-01-15 00:00:00+02 |    3
(14 rows)

postgres=# SELECT *,
postgres-#        array_agg(DISTINCT week) OVER (ORDER BY day ROWS
postgres(#        BETWEEN 2 PRECEDING AND 2 FOLLOWING)
postgres-# FROM t_demo;
ERROR:  DISTINCT is not implemented for window functions
LINE 2:        array_agg(DISTINCT week) OVER (ORDER BY day ROWS
               ^

So you will need to write something like this:

postgres=# SELECT *, (SELECT array_agg(DISTINCT unnest) FROM unnest(x)) AS b
postgres-# FROM
postgres-# (
postgres(#    SELECT *,
postgres(#           array_agg(week) OVER (ORDER BY day ROWS
postgres(#           BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS x
postgres(#    FROM t_demo
postgres(# ) AS a;
 ordinality |          day           | week |      x      |   b
------------+------------------------+------+-------------+-------
          1 | 2020-01-02 00:00:00+02 |    1 | {1,1,1}     | {1}
          2 | 2020-01-03 00:00:00+02 |    1 | {1,1,1,1}   | {1}
          3 | 2020-01-04 00:00:00+02 |    1 | {1,1,1,1,2} | {1,2}
          4 | 2020-01-05 00:00:00+02 |    1 | {1,1,1,2,2} | {1,2}
          5 | 2020-01-06 00:00:00+02 |    2 | {1,1,2,2,2} | {1,2}
          6 | 2020-01-07 00:00:00+02 |    2 | {1,2,2,2,2} | {1,2}
          7 | 2020-01-08 00:00:00+02 |    2 | {2,2,2,2,2} | {2}
          8 | 2020-01-09 00:00:00+02 |    2 | {2,2,2,2,2} | {2}
          9 | 2020-01-10 00:00:00+02 |    2 | {2,2,2,2,2} | {2}
         10 | 2020-01-11 00:00:00+02 |    2 | {2,2,2,2,3} | {2,3}
         11 | 2020-01-12 00:00:00+02 |    2 | {2,2,2,3,3} | {2,3}
         12 | 2020-01-13 00:00:00+02 |    3 | {2,2,3,3,3} | {2,3}
         13 | 2020-01-14 00:00:00+02 |    3 | {2,3,3,3}   | {2,3}
         14 | 2020-01-15 00:00:00+02 |    3 | {3,3,3}     | {3}
(14 rows)

With the attached version, you will get the desired results:

postgres=# SELECT *,
postgres-#        array_agg(DISTINCT week) OVER (ORDER BY day ROWS
postgres(#        BETWEEN 2 PRECEDING AND 2 FOLLOWING)
postgres-# FROM t_demo;
 ordinality |          day           | week | array_agg
------------+------------------------+------+-----------
          1 | 2020-01-02 00:00:00+02 |    1 | {1}
          2 | 2020-01-03 00:00:00+02 |    1 | {1}
          3 | 2020-01-04 00:00:00+02 |    1 | {1,2}
          4 | 2020-01-05 00:00:00+02 |    1 | {1,2}
          5 | 2020-01-06 00:00:00+02 |    2 | {1,2}
          6 | 2020-01-07 00:00:00+02 |    2 | {1,2}
          7 | 2020-01-08 00:00:00+02 |    2 | {2}
          8 | 2020-01-09 00:00:00+02 |    2 | {2}
          9 | 2020-01-10 00:00:00+02 |    2 | {2}
         10 | 2020-01-11 00:00:00+02 |    2 | {2,3}
         11 | 2020-01-12 00:00:00+02 |    2 | {2,3}
         12 | 2020-01-13 00:00:00+02 |    3 | {2,3}
         13 | 2020-01-14 00:00:00+02 |    3 | {2,3}
         14 | 2020-01-15 00:00:00+02 |    3 | {3}
(14 rows)

diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 4cc7da268d..66
Re: How to make a OpExpr check compatible among different versions
On Mon, Jan 13, 2020 at 4:09 PM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2020-01-13 08:29, Andy Fan wrote: > > During one of my works for logical rewrite, I want to check if the expr > > is a given Expr. > > > > so the simplest way is: > > if (expr->opno == 418 && nodeTag(linitial(expr->args)) == T_xxx && > > nodeTag(lsecond(expr->args)) == T_ ) > > { > > .. > > } > > > > if we write code like above, we may have issues if the oid changed in > > the future version. > > Generally, you would do this by using a preprocessor symbol. For > example, instead of hardcoding the OID of the text type, you would use > the symbol TEXTOID instead. Symbols like that exist for many catalog > objects that one might reasonably need to hardcode. > > However, hardcoding an OID reference to an operator looks like a design > mistake to me. Operators should normally be looked up via operator > classes or similar structures that convey the meaning of the operator. > Yes, I just realized this. Thanks for your point! > Also, instead of nodeTag() == T_xxx you should use the IsA() macro. > > Thank you for this as well. > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
isTempNamespaceInUse() is incorrect with its handling of MyBackendId
Hi all, While reviewing some code in namespace.c, I have bumped into the following issue introduced by 246a6c8:

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index c82f9fc4b5..e70243a008 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3235,8 +3235,8 @@ isTempNamespaceInUse(Oid namespaceId)
 
 	backendId = GetTempNamespaceBackendId(namespaceId);
 
-	if (backendId == InvalidBackendId ||
-		backendId == MyBackendId)
+	/* No such temporary namespace? */
+	if (backendId == InvalidBackendId)
 		return false;

The current logic of isTempNamespaceInUse() would cause a session calling the routine to always return false when checking whether its own temporary namespace is in use, but that's incorrect. It is actually safe to remove the check on MyBackendId, as the code falls back on a check equivalent to MyProc->tempNamespaceId a bit further down, as per the attached, so let's fix it. Thoughts? -- Michael

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index c82f9fc4b5..e70243a008 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3235,8 +3235,8 @@ isTempNamespaceInUse(Oid namespaceId)
 
 	backendId = GetTempNamespaceBackendId(namespaceId);
 
-	if (backendId == InvalidBackendId ||
-		backendId == MyBackendId)
+	/* No such temporary namespace? */
+	if (backendId == InvalidBackendId)
 		return false;
 
 	/* Is the backend alive? */

signature.asc Description: PGP signature
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Jan 9, 2020 at 12:09 PM Amit Kapila wrote: > > On Thu, Jan 9, 2020 at 10:30 AM Dilip Kumar wrote: > > > > On Thu, Jan 9, 2020 at 9:35 AM Amit Kapila wrote: > > > > > > On Wed, Jan 8, 2020 at 1:12 PM Dilip Kumar wrote: > > > > > > > > I have observed one more design issue. > > > > > > > > > > Good observation. > > > > > > > The problem is that when we > > > > get a toasted chunks we remember the changes in the memory(hash table) > > > > but don't stream until we get the actual change on the main table. > > > > Now, the problem is that we might get the change of the toasted table > > > > and the main table in different streams. So basically, in a stream, > > > > if we have only got the toasted tuples then even after > > > > ReorderBufferStreamTXN the memory usage will not be reduced. > > > > > > > > > > I think we can't split such changes in a different stream (unless we > > > design an entirely new solution to send partial changes of toast > > > data), so we need to send them together. We can keep a flag like > > > data_complete in ReorderBufferTxn and mark it complete only when we > > > are able to assemble the entire tuple. Now, whenever, we try to > > > stream the changes once we reach the memory threshold, we can check > > > whether the data_complete flag is true, if so, then only send the > > > changes, otherwise, we can pick the next largest transaction. I think > > > we can retry it for few times and if we get the incomplete data for > > > multiple transactions, then we can decide to spill the transaction or > > > maybe we can directly spill the first largest transaction which has > > > incomplete data. > > > > > Yeah, we might do something on this line. Basically, we need to mark > > the top-transaction as data-incomplete if any of its subtransaction is > > having data-incomplete (it will always be the latest sub-transaction > > of the top transaction). Also, for streaming, we are checking the > > largest top transaction whereas for spilling we just need the larget > > (sub) transaction. So we also need to decide while picking the > > largest top transaction for streaming, if we get a few transactions > > with in-complete data then how we will go for the spill. Do we spill > > all the sub-transactions under this top transaction or we will again > > find the larget (sub) transaction for spilling. > > > > I think it is better to do later as that will lead to the spill of > only required (minimum changes to get the memory below threshold) > changes. I think instead of doing this can't we just spill the changes which are in toast_hash. Basically, at the end of the stream, we have some toast tuple which we could not stream because we did not have the insert for the main table then we can spill only those changes which are in tuple hash. And, in the subsequent stream whenever we get the insert for the main table at that time we can restore those changes and stream. We can also maintain some flag saying data is not complete (with some change LSN number) and after that LSN we can spill any toast change to disk until we get the change for the main table, by this way we can avoid building tuple hash until we get the change for the main table. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
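For what it's worth, a very rough sketch of the "pick the largest transaction whose data is complete" idea discussed above could look like the following. The data_complete flag is purely hypothetical and the size accounting is simplified, so this only illustrates the control flow being discussed, not the actual patch:

    /* Sketch only: find the largest streamable top-level transaction. */
    static ReorderBufferTXN *
    ReorderBufferLargestCompleteTXN(ReorderBuffer *rb)
    {
        ReorderBufferTXN *largest = NULL;
        dlist_iter  iter;

        dlist_foreach(iter, &rb->toplevel_by_lsn)
        {
            ReorderBufferTXN *txn;

            txn = dlist_container(ReorderBufferTXN, node, iter.cur);

            /* Skip it if its toast chunks are not fully assembled yet. */
            if (!txn->data_complete)    /* hypothetical flag */
                continue;

            if (largest == NULL || txn->size > largest->size)
                largest = txn;
        }

        return largest;
    }

If no transaction qualifies, the caller would then fall back to spilling, which is the part of the design still under discussion above.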
Re: ALTER TABLE support for dropping generation expression
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

Thank you! Looks good to me. I have no further comments. I'll mark as ready for committer.

The new status of this patch is: Ready for Committer
Re: [HACKERS] Block level parallel vacuum
On Thu, Jan 9, 2020 at 4:03 PM Amit Kapila wrote: > > On Thu, Jan 9, 2020 at 10:41 AM Masahiko Sawada > wrote: > > > > On Wed, 8 Jan 2020 at 22:16, Amit Kapila wrote: > > > > > > > > > What do you think of the attached? Sawada-san, kindly verify the > > > changes and let me know your opinion. > > > > I agreed to not include both the FAST option patch and > > DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus > > on the main part and we can discuss and add them later if want. > > > > I've looked at the latest version patch you shared. Overall it looks > > good and works fine. I have a few small comments: > > > > I have addressed all your comments and slightly change nearby comments > and ran pgindent. I think we can commit the first two preparatory > patches now unless you or someone else has any more comments on those. > I have pushed the first one (4e514c6) and I am planning to commit the next one (API: v46-0001-Introduce-IndexAM-fields-for-parallel-vacuum) patch on Wednesday. We are still discussing a few things for the main parallel vacuum patch (v46-0002-Allow-vacuum-command-to-process-indexes-in-parallel) which we should reach conclusion soon. In the attached, I have made a few changes in the comments of patch v46-0002-Allow-vacuum-command-to-process-indexes-in-parallel. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v46-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch Description: Binary data v46-0002-Allow-vacuum-command-to-process-indexes-in-parallel.patch Description: Binary data
Re: Comment fix in session.h
On Mon, Jan 13, 2020 at 2:08 PM Amit Kapila wrote: > > On Mon, Jan 13, 2020 at 12:51 PM Antonin Houska wrote: > > > > This diff fixes what I consider a typo. > > > > LGTM. I'll push this in some time. > Pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Fri, Jan 10, 2020 at 9:43 PM Alvaro Herrera wrote: > On 2020-Jan-09, Alvaro Herrera wrote: > > > I looked at this a little while and was bothered by the perl changes; it > > seems out of place to have RecursiveCopy be thinking about tablespaces, > > which is way out of its league. So I rewrote that to use a callback: > > the PostgresNode code passes a callback that's in charge to handle the > > case of a symlink. Things look much more in place with that. I didn't > > verify that all places that should use this are filled. > > > > In 0002 I found adding a new function unnecessary: we can keep backwards > > compat by checking 'ref' of the third argument. With that we don't have > > to add a new function. (POD changes pending.) > > I forgot to add that something in these changes is broken (probably the > symlink handling callback) so the tests fail, but I couldn't stay away > from my daughter's birthday long enough to figure out what or how. I'm > on something else today, so if one of you can research and submit fixed > versions, that'd be great. > > Thanks, > I spent some time on this before getting off work today. With the fix below, the 4th test is now OK, but the 5th (the last one) hangs due to a panic.

(gdb) bt
#0 0x003397e32625 in raise () from /lib64/libc.so.6
#1 0x003397e33e05 in abort () from /lib64/libc.so.6
#2 0x00a90506 in errfinish (dummy=0) at elog.c:590
#3 0x00a92b4b in elog_finish (elevel=22, fmt=0xb2d580 "cannot find directory %s tablespace %d database %d") at elog.c:1465
#4 0x0057aa0a in XLogLogMissingDir (spcNode=16384, dbNode=0, path=0x1885100 "pg_tblspc/16384/PG_13_202001091/16389") at xlogutils.c:104
#5 0x0065e92e in dbase_redo (record=0x1841568) at dbcommands.c:2225
#6 0x0056ac94 in StartupXLOG () at xlog.c:7200

diff --git a/src/include/commands/dbcommands.h b/src/include/commands/dbcommands.h
index b71b400e700..f8f6d5ffd03 100644
--- a/src/include/commands/dbcommands.h
+++ b/src/include/commands/dbcommands.h
@@ -19,8 +19,6 @@
 #include "lib/stringinfo.h"
 #include "nodes/parsenodes.h"
 
-extern void CheckMissingDirs4DbaseRedo(void);
-
 extern Oid createdb(ParseState *pstate, const CreatedbStmt *stmt);
 extern void dropdb(const char *dbname, bool missing_ok, bool force);
 extern void DropDatabase(ParseState *pstate, DropdbStmt *stmt);
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index e6e7ea505d9..4eef8bb1985 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -615,11 +615,11 @@ sub _srcsymlink
 	my $srcrealdir = readlink($srcpath);
 
 	opendir(my $dh, $srcrealdir);
-	while (readdir $dh)
+	while (my $entry = (readdir $dh))
 	{
-		next if (/^\.\.?$/);
-		my $spath = "$srcrealdir/$_";
-		my $dpath = "$dstrealdir/$_";
+		next if ($entry eq '.' or $entry eq '..');
+		my $spath = "$srcrealdir/$entry";
+		my $dpath = "$dstrealdir/$entry";
 		RecursiveCopy::copypath($spath, $dpath);
 	}
 	closedir $dh;
Re: benchmarking Flex practices
On Mon, Jan 13, 2020 at 7:57 AM Tom Lane wrote: > > Hmm ... after a bit of research I agree that these functions are not > a portability hazard. They are present at least as far back as flex > 2.5.33 which is as old as we've got in the buildfarm. > > However, I'm less excited about them from a performance standpoint. > The BEGIN() macro expands to (ordinarily) > > yyg->yy_start = integer-constant > > which is surely pretty cheap. However, yy_push_state is substantially > more expensive than that, not least because the first invocation in > a parse cycle will involve a malloc() or palloc(). Likewise yy_pop_state > is multiple times more expensive than plain BEGIN(). > > Now, I agree that this is negligible for ECPG's usage, so if > pushing/popping state is helpful there, let's go for it. But I am > not convinced it's negligible for the backend, and I also don't > see that we actually need to track any nested scanner states there. > So I'd rather stick to using BEGIN in the backend. Not sure about > psql. Okay, removed in v11. The advantage of stack functions in ECPG was to avoid having the two variables state_before_str_start and state_before_str_stop. But if we don't use stack functions in the backend, then consistency wins in my mind. Plus, it was easier for me to revert the stack functions for all 3 scanners. > BTW, while looking through the latest patch it struck me that > "UCONST" is an underspecified and potentially confusing name. > It doesn't indicate what kind of constant we're talking about, > for instance a C programmer could be forgiven for thinking > it means something like "123U". What do you think of "USCONST", > following UIDENT's lead of prefixing U onto whatever the > underlying token type is? Makes perfect sense. Grepping through the source tree, indeed it seems the replication command scanner is using UCONST for digits. Some other cosmetic adjustments in ECPG parser.c: -Previously I had a WIP comment in about 2 functions that are copies from elsewhere. In v11 I just noted that they are copied. -I thought it'd be nicer if ECPG spelled UESCAPE in caps when reconstructing the string. -Corrected copy-paste-o in comment Also: -reverted some spurious whitespace changes -revised scan.l comment about the performance benefits of no backtracking -split the ECPG C-comment scanning cleanup into a separate patch, as I did for v6. I include it here since it's related (merging scanner states), but not relevant to making the core scanner smaller. -wrote draft commit messages -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services v11-0001-Reduce-size-of-backend-scanner-transition-array.patch Description: Binary data v11-0002-Merge-ECPG-scanner-states-regarding-C-comments.patch Description: Binary data
Re: our checks for read-only queries are not great
On 2020-01-10 14:41, Robert Haas wrote: This rule very nearly matches the current behavior: it explains why temp table operations are allowed, and why ALTER SYSTEM is allowed, and why REINDEX etc. are allowed. However, there's a notable exception: PREPARE, COMMIT PREPARED, and ROLLBACK PREPARED are allowed in a read-only transaction. Under the "doesn't change pg_dump output" criteria, the first and third ones should be permitted but COMMIT PREPARED should be denied, except maybe if the prepared transaction didn't do any writes (and in that case, why did we bother preparing it?). Despite that, this rule does a way better job explaining the current behavior than anything else suggested so far. I don't follow. Does pg_dump dump prepared transactions? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Allow to_date() and to_timestamp() to accept localized names
On Sat, Jan 11, 2020 at 5:06 PM Tomas Vondra wrote: > > Thanks. I did a quick review of this patch, and I think it's almost RFC. > > Thanks for reviewing. > - In func.sgml, it seems we've lost this bit: > > > TM does not include trailing blanks. > to_timestamp and to_date > ignore > the TM modifier. > > >Does that mean the function no longer ignore the TM modifier? That >would be somewhat problematic (i.e. it might break some user code). >But looking at the code I don't see why the patch would have this >effect, so I suppose it's merely a doc bug. > > It is intentional. This patch uses the TM modifier to identify the usage of localized names as input for to_timestamp() and to_date(). > - I don't think we need to include examples how to_timestmap ignores >case, I'd say just stating the fact is clear enough. But if we want to >have examples, I think we should not inline in the para but use the >established pattern: > > > Some examples: > > ... > > > >which is used elsewhere in the func.sgml file. > I was trying to match the style surrounding the usage notes for date/time formatting [1]. Agreed that it is not worth an example on its own, so dropped. > > - In formatting.c the "common/unicode_norm.h" should be right after >includes from "catalog/" to follow the alphabetic order (unless >there's a reason why that does not work). > Fixed. > > - I rather dislike the "dim" parameter name, because I immediately think >"dimension" which makes little sense. I suggest renaming to "nitems" >or "nelements" or something like that. > Agreed, using "nelements" as a better style matchup. Please, find attached a version addressing the above mentioned. [1] https://www.postgresql.org/docs/current/functions-formatting.html Regards, Juan José Santamaría Flecha > 0001-Allow-localized-month-names-to_date-v5.patch Description: Binary data
Re: isTempNamespaceInUse() is incorrect with its handling of MyBackendId
On Mon, Jan 13, 2020 at 06:37:03PM +0900, Michael Paquier wrote: > Hi all, > > While reviewing some code in namespace.c, I have bumped into the > following issue introduced by 246a6c8: > diff --git a/src/backend/catalog/namespace.c > b/src/backend/catalog/namespace.c > index c82f9fc4b5..e70243a008 100644 > --- a/src/backend/catalog/namespace.c > +++ b/src/backend/catalog/namespace.c > @@ -3235,8 +3235,8 @@ isTempNamespaceInUse(Oid namespaceId) > > backendId = GetTempNamespaceBackendId(namespaceId); > > - if (backendId == InvalidBackendId || > - backendId == MyBackendId) > + /* No such temporary namespace? */ > + if (backendId == InvalidBackendId) > return false; > > The current logic of isTempNamespaceInUse() would cause a session > calling the routine to return always false if trying to check if its > own temporary session is in use, but that's incorrect. Indeed. > It is actually > safe to remove the check on MyBackendId as the code would fall back on > a check equivalent to MyProc->tempNamespaceId a bit down as per the > attached, so let's fix it. > > Thoughts? But that means an extraneous call to BackendIdGetProc() in that case, it seems better to avoid it if we already have the information.
Re: Add FOREIGN to ALTER TABLE in pg_dump
On Thu, Sep 26, 2019 at 7:17 PM Luis Carril wrote: > > > I don't disagree with adding FOREIGN, though. > > Your patch is failing the pg_dump TAP tests. Please use > configure --enable-tap-tests, fix the problems, then resubmit. > > Fixed, I've attached a new version. Will it be possible to add a test case for this? Can we validate it by adding one test? Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: pg_basebackup fails on databases with high OIDs
On 2020-01-11 17:47, Magnus Hagander wrote: On Sat, Jan 11, 2020 at 5:44 PM Julien Rouhaud wrote: On Sat, Jan 11, 2020 at 08:21:11AM +0100, Peter Eisentraut wrote: On 2020-01-06 21:00, Magnus Hagander wrote: +0.5 to avoid calling OidInputFunctionCall() Or just directly using atol() instead of atoi()? Well maybe not directly but in a small wrapper that verifies it's not bigger than an unsigned? Unlike in cases where we use oidin etc, we are dealing with data that is "mostly trusted" here, aren't we? Meaning we could call atol() on it, and throw an error if it overflows, and be done with it? Subdirectories in the data directory aren't exactly "untrusted enduser data"... Yeah, it looks like we are using strtoul() without additional error checking in similar situations, so here is a patch doing it like that. - true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid); + true, isDbDir ? (Oid) strtoul(lastDir + 1, NULL, 10) : InvalidOid); Looking at some other code, I just discovered the atooid() macro that already does the same, maybe it'd be better for consistency to use that instead? +1. Whie it does the same thing, consistency is good! :) committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
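For reference, the atooid() macro mentioned above is essentially a one-line wrapper around the same strtoul() call, roughly:

    #define atooid(x)  ((Oid) strtoul((x), NULL, 10))

so switching to it changes no behavior, only the spelling, which is the consistency point being made here.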
Re: pg_basebackup fails on databases with high OIDs
On Mon, Jan 13, 2020 at 1:49 PM Peter Eisentraut wrote: > > On 2020-01-11 17:47, Magnus Hagander wrote: > > On Sat, Jan 11, 2020 at 5:44 PM Julien Rouhaud wrote: > >> > >> On Sat, Jan 11, 2020 at 08:21:11AM +0100, Peter Eisentraut wrote: > >>> On 2020-01-06 21:00, Magnus Hagander wrote: > > +0.5 to avoid calling OidInputFunctionCall() > > Or just directly using atol() instead of atoi()? Well maybe not > directly but in a small wrapper that verifies it's not bigger than an > unsigned? > > Unlike in cases where we use oidin etc, we are dealing with data that > is "mostly trusted" here, aren't we? Meaning we could call atol() on > it, and throw an error if it overflows, and be done with it? > Subdirectories in the data directory aren't exactly "untrusted enduser > data"... > >>> > >>> Yeah, it looks like we are using strtoul() without additional error > >>> checking > >>> in similar situations, so here is a patch doing it like that. > >> > >>> - true, > >>> isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid); > >>> + true, > >>> isDbDir ? (Oid) strtoul(lastDir + 1, NULL, 10) : InvalidOid); > >> > >> Looking at some other code, I just discovered the atooid() macro that > >> already > >> does the same, maybe it'd be better for consistency to use that instead? > > > > +1. Whie it does the same thing, consistency is good! :) > > committed Thanks!
Re: isTempNamespaceInUse() is incorrect with its handling of MyBackendId
On Mon, Jan 13, 2020 at 01:09:01PM +0100, Julien Rouhaud wrote: > But that means an extraneous call to BackendIdGetProc() in that > case, it seems better to avoid it if we already have the information. Note that you cannot make a direct comparison of the result from GetTempNamespaceBackendId() with MyBackendId, because it is critical to be able to handle the case of a session which has not yet created an object in its own temp namespace (this way any temp objects from previous connections which used the same backend slot can be marked as orphaned and discarded by autovacuum, the whole point of 246a6c8). So in order to get a fast-exit path we could do the following:
- Add a routine GetTempNamespace which returns myTempNamespace (the result can be InvalidOid).
- Add an assertion at the beginning of isTempNamespaceInUse() to make sure that it never gets called with InvalidOid as input argument.
- Return true as a first step of GetTempNamespaceBackendId() if namespaceId matches GetTempNamespace().
What do you think? -- Michael signature.asc Description: PGP signature
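Purely as an illustration of the fast-exit idea listed above, the function could end up looking roughly like this; the new GetTempNamespace() routine is part of the proposal rather than existing code, and the tail of the function is only paraphrased from the current implementation:

    bool
    isTempNamespaceInUse(Oid namespaceId)
    {
        PGPROC     *proc;
        int         backendId;

        Assert(OidIsValid(namespaceId));

        /* Fast path: our own active temporary namespace is in use. */
        if (namespaceId == GetTempNamespace())
            return true;

        backendId = GetTempNamespaceBackendId(namespaceId);

        /* No such temporary namespace? */
        if (backendId == InvalidBackendId)
            return false;

        /* Is the backend alive and still owning this namespace? */
        proc = BackendIdGetProc(backendId);
        if (proc == NULL || proc->databaseId != MyDatabaseId ||
            proc->tempNamespaceId != namespaceId)
            return false;

        return true;
    }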
Re: Add pg_file_sync() to adminpack
On Sat, Jan 11, 2020 at 02:12:15AM +0900, Fujii Masao wrote: > I'm not sure if returning false with WARNING only in some error cases > is really good idea or not. At least for me, it's more intuitive to > return true on success and emit an ERROR otherwise. I'd like to hear > more opinions about this. Also if returning true on success is rather > confusing, we can change its return type to void. An advantage of not issuing an ERROR is that when working on a list of files (for example a WITH RECURSIVE on the whole data directory?), you can then know which files could not be synced, instead of seeing one ERROR about one file while being unsure about the state of the others. > Could you elaborate why? But if it's not good to sync the existing directory > in the regression test, we may need to give up testing the sync of directory. > Another idea is to add another function like pg_mkdir() into adminpack > and use the directory that we newly created by using that function, > for the test. Or better idea? We should avoid potentially costly tests in any regression scenario if we have a way to do so. I like your idea of having a pg_mkdir(); that feels more natural to have, as there is already pg_file_write(). -- Michael signature.asc Description: PGP signature
Re: isTempNamespaceInUse() is incorrect with its handling of MyBackendId
On Mon, Jan 13, 2020 at 10:14:52PM +0900, Michael Paquier wrote: > On Mon, Jan 13, 2020 at 01:09:01PM +0100, Julien Rouhaud wrote: > > But that means an extraneous call to BackendIdGetProc() in that > > case, it seems better to avoid it if we already have the information. > > Note that you cannot make a direct comparison of the result from > GetTempNamespaceBackendId() with MyBackendId, because it is critical > to be able to handle the case of a session which has not yet created > an object in its own temp namespace (this way any temp objects from > previous connections which used the same backend slot can be marked as > orphaned and discarded by autovacuum, the whole point of 246a6c8). Oh right. > So in order to get a fast-exit path we could do the following: > - Add a routine GetTempNamespace which returns myTempNamespace (the > result can be InvalidOid). > - Add an assertion at the beginning of isTempNamespaceInUse() to make > sure that it never gets called with InvalidOid as input argument. > - Return true as a first step of GetTempNamespaceBackendId() if > namespaceId matches GetTempNamespace(). > > What do you think? Well, since isTempNamespaceInUse is for now only called by autovacuum, and shouldn't change soon, this really feels like premature optimization, so your original approach looks better.
Re: [PATCH] distinct aggregates within a window function WIP
Krasiyan Andreev writes: > I want to propose to you an old patch for Postgres 11, off-site developed > by Oliver Ford, > but I have permission from him to publish it and to continue it's > development, > that allow distinct aggregates, like select sum(distinct nums) within a > window function. I started to respond by asking whether that's well-defined, but reading down further I see that that's not actually what the feature is: what it is is attaching DISTINCT to a window function itself. I'd still ask whether it's well-defined though, or even minimally sensible. Window functions are generally supposed to produce one row per input row --- how does that square with the implicit row merging of DISTINCT? They're also typically row-order-sensitive --- how does that work with DISTINCT? Also, to the extent that this is sensible, can't you get the same results already today with appropriate use of window framing options? > It's a WIP, because it doesn't have tests yet (I will add them later) and > also, it works for a int, float, and numeric types, As a rule of thumb, operations like this should not be coded to be datatype-specific. We threw out some features in the original window function patch until they could be rewritten to not be limited to a hard-coded set of data types (cf commit 0a459cec9), and I don't see why we'd apply a lesser standard here. Certainly DISTINCT for aggregates has no such limitation. regards, tom lane
Re: Add FOREIGN to ALTER TABLE in pg_dump
vignesh C writes: > On Thu, Sep 26, 2019 at 7:17 PM Luis Carril wrote: >>> Your patch is failing the pg_dump TAP tests. Please use >>> configure --enable-tap-tests, fix the problems, then resubmit. >> Fixed, I've attached a new version. > Will it be possible to add a test case for this, can we validate by > adding one test? Isn't the change in the TAP test output sufficient? regards, tom lane
Re: Add pg_file_sync() to adminpack
On Mon, Jan 13, 2020 at 2:46 PM Michael Paquier wrote: > > On Sat, Jan 11, 2020 at 02:12:15AM +0900, Fujii Masao wrote: > > I'm not sure if returning false with WARNING only in some error cases > > is really good idea or not. At least for me, it's more intuitive to > > return true on success and emit an ERROR otherwise. I'd like to hear > > more opinions about this. Also if returning true on success is rather > > confusing, we can change its return type to void. > > An advantage of not issuing an ERROR is that when working on a list of > files (for example a WITH RECURSIVE on the whole data directory?), you > can then know which files could not be synced instead of seeing one > ERROR about one file, while being unsure about the state of the > others. Actually, can't it create a security hazard, for instance if you call pg_file_sync() on a heap file and the call errors out, since it's bypassing data_sync_retry?
Re: [PATCH] distinct aggregates within a window function WIP
Tom Lane schrieb am 13.01.2020 um 15:19: > what it is is attaching DISTINCT to a window function itself. > I'd still ask whether it's well-defined though, or even minimally > sensible. Window functions are generally supposed to produce one > row per input row --- how does that square with the implicit row > merging of DISTINCT? They're also typically row-order-sensitive > --- how does that work with DISTINCT? Also, to the extent that > this is sensible, can't you get the same results already today > with appropriate use of window framing options? I find the example using array_agg() and cumulative window functions a bit confusing as well, but I think there are situations where having this is really helpful, e.g.: count(distinct some_column) over (partition by something) I know it's not an argument, but Oracle supports this and porting queries like that from Oracle to Postgres isn't really fun. Thomas
Re: How to retain lesser paths at add_path()?
The v2 patch is attached. This adds two dedicated lists to the RelOptInfo to preserve lesser paths when an extension asks to retain a path-node that would otherwise be removed in the usual manner. These lesser paths are kept in the separate lists, so they never expand the length of pathlist and partial_pathlist. That was the arguable point in the discussion last October. The new hook is called just before the path-node removal operation and gives the extension a chance to make an extra decision. If the extension considers that the path-node to be removed can still be used in the upper path construction stage, it can return 'true' as a signal to preserve this lesser path-node. In case the same kind of path-node already exists in the preserved_pathlist and the supplied lesser path-node is cheaper than the old one, the extension can remove the worse one arbitrarily to keep the length of preserved_pathlist under control. (E.g., PG-Strom may need one GpuJoin path-node in either pathlist or preserved_pathlist for a further opportunity of combined usage with a GpuPreAgg path-node. It just needs "the best GpuJoin path-node" somewhere, not two or more.) Because the PostgreSQL core has no information about which preserved path-nodes can be removed, extensions that use path_removal_decision_hook() have the responsibility to keep the length of preserved_(partial_)pathlist reasonable. BTW, add_path() now removes the lesser path-node with pfree(), not only detaching it from the path list (IndexPath is an exception). Does that really make sense? It only releases the path-node itself, so it may not release the entire object, and the efficiency of memory usage is therefore limited. And ForeignScan / CustomScan may reference the path-node to be removed. It seems to me there is no guarantee that lesser path-nodes other than IndexPath nodes are safe to release. Best regards, On Sat, Jan 11, 2020 at 21:27 Tomas Vondra wrote: > > On Sat, Jan 11, 2020 at 05:07:11PM +0900, Kohei KaiGai wrote: > >Hi, > > > >The proposition I posted at 10th-Oct proposed to have a separate list to > >retain > >lesser paths not to expand the path_list length, but here are no comments by > >others at that time. > >Indeed, the latest patch has not been updated yet. > >Please wait for a few days. I'll refresh the patch again. > > > > OK, thanks for the update. I've marked the patch as "waiting on author". > > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei pgsql13-path_removal_decision_hook.v2.patch Description: Binary data
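To make the mechanism above a bit more tangible, here is a rough sketch of how an extension might hook in. The hook signature and the preserved_pathlist name are assumptions based only on the description in this mail, not the actual definitions from the attached patch:

    /* Sketch of an extension-side callback for the proposed hook. */
    static bool
    my_path_removal_decision(RelOptInfo *rel, Path *old_path)
    {
        /*
         * Ask the core to move our custom join path into the preserved
         * list instead of freeing it, so it can still be combined with
         * other custom paths at an upper planning stage.
         */
        if (IsA(old_path, CustomPath))
            return true;        /* keep it in rel->preserved_pathlist */

        return false;           /* discard as add_path() normally would */
    }

    void
    _PG_init(void)
    {
        path_removal_decision_hook = my_path_removal_decision;
    }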
Re: [PATCH] distinct aggregates within a window function WIP
I understand your note about datatype-specific operations, so I need to think about it more generically. About your additional note, I think that it is not easily possible to get the same result with appropriate use of window framing options, because "exclude ties" will not exclude the "current row" itself, only its peers. So that is the only difference, and the reason for the DISTINCT aggregate. Maybe, if we could specify "exclude ties" and "exclude current row" at the same time, there would be no need for DISTINCT, but right now I think that neither "exclude ties" nor "exclude groups" nor "exclude current row" can express it, because they can't be nested or used at the same time. On Mon, Jan 13, 2020 at 16:19 Tom Lane wrote: > Krasiyan Andreev writes: > > I want to propose to you an old patch for Postgres 11, off-site developed > > by Oliver Ford, > > but I have permission from him to publish it and to continue it's > > development, > > that allow distinct aggregates, like select sum(distinct nums) within a > > window function. > > I started to respond by asking whether that's well-defined, but > reading down further I see that that's not actually what the feature > is: what it is is attaching DISTINCT to a window function itself. > I'd still ask whether it's well-defined though, or even minimally > sensible. Window functions are generally supposed to produce one > row per input row --- how does that square with the implicit row > merging of DISTINCT? They're also typically row-order-sensitive > --- how does that work with DISTINCT? Also, to the extent that > this is sensible, can't you get the same results already today > with appropriate use of window framing options? > > > It's a WIP, because it doesn't have tests yet (I will add them later) and > > also, it works for a int, float, and numeric types, > > As a rule of thumb, operations like this should not be coded to be > datatype-specific. We threw out some features in the original window > function patch until they could be rewritten to not be limited to a > hard-coded set of data types (cf commit 0a459cec9), and I don't see > why we'd apply a lesser standard here. Certainly DISTINCT for > aggregates has no such limitation. > > regards, tom lane >
Re: [PATCH] distinct aggregates within a window function WIP
On 13/01/2020 15:19, Tom Lane wrote: > Krasiyan Andreev writes: >> I want to propose to you an old patch for Postgres 11, off-site developed >> by Oliver Ford, >> but I have permission from him to publish it and to continue it's >> development, >> that allow distinct aggregates, like select sum(distinct nums) within a >> window function. > I started to respond by asking whether that's well-defined, but > reading down further I see that that's not actually what the feature > is: what it is is attaching DISTINCT to a window function itself. > I'd still ask whether it's well-defined though, or even minimally > sensible. Window functions are generally supposed to produce one > row per input row --- how does that square with the implicit row > merging of DISTINCT? They're also typically row-order-sensitive > --- how does that work with DISTINCT? It's a little strange because the spec says: If the window ordering clause or the window framing clause of the window structure descriptor that describes the is present, then no simply contained in shall specify DISTINCT or . So it seems to be well defined if all you have is a partition. But then it also says: DENSE_RANK() OVER WNS is equivalent to the : COUNT (DISTINCT ROW ( VE 1 , ..., VE N ) ) OVER (WNS1 RANGE UNBOUNDED PRECEDING) And that kind of looks like a framing clause there. > Also, to the extent that > this is sensible, can't you get the same results already today > with appropriate use of window framing options? I don't see how. I have sometimes wanted this feature so I am +1 on us getting at least a minimal form of it. -- Vik
Re: [Proposal] Global temporary tables
On Mon, Jan 13, 2020 at 11:08:40AM +0300, Konstantin Knizhnik wrote: On 12.01.2020 4:51, Tomas Vondra wrote: On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote: On 09.01.2020 19:48, Tomas Vondra wrote: The most complex and challenging task is to support GTT for all kinds of indexes. Unfortunately I cannot propose a good universal solution for it. Just patching all existing index implementations seems to be the only choice. I haven't looked at the indexing issue closely, but IMO we need to ensure that every session sees/uses only indexes on GTT that were defined before the session started using the table. Why? It contradicts the behavior of normal tables. Assume that you have active clients and at some point the DBA recognizes that they are spending too much time scanning some GTT. The DBA can create an index for this GTT, but if existing clients are not able to use this index, do we then have to somehow make these clients restart their sessions? In my patch I have implemented building indexes for GTT on demand: if an accessed index on a GTT is not yet initialized, it is filled with local data. Yes, I know the behavior would be different from behavior for regular tables. And yes, it would not allow fixing slow queries in sessions without interrupting those sessions. I proposed just ignoring those new indexes because it seems much simpler than alternative solutions that I can think of, and it's not like those other solutions don't have other issues. Quite the opposite: restricting sessions to the indexes created before the session first accessed the GTT requires more effort. We need to somehow maintain and check the GTT first-access time.

Hmmm, OK. I'd expect such a check to be much simpler than the on-demand index building, but I admit I haven't tried implementing either of those options.

For example, I've looked at the "on demand" building as implemented in global_private_temp-8.patch, and I kinda doubt it's the right approach; adding a bunch of index build calls into various places in index code seems somewhat suspicious. We in any case have to initialize GTT indexes on demand, even if we prohibit use of indexes created after the session's first access to the GTT. So the difference is only in one thing: should we just initialize an empty index or populate it with local data (if the rules for index usability are the same for GTT as for normal tables). From the implementation point of view there is no big difference. Actually, building the index in the standard way is even simpler than constructing an empty index. Originally I implemented the first approach (I just forgot to consider the case when the GTT was already used by a session). Then I rewrote it using the second approach and the patch even became simpler.

* brinbuild is added to brinRevmapInitialize, which is meant to initialize state for scanning. It seems wrong to build the index we're scanning from this function (layering and all that).
* btbuild is called from _bt_getbuf. That seems a bit ... suspicious?

As I already mentioned, support of indexes for GTT is one of the most challenging things in my patch. I didn't find a good and universal solution. So I agree that the call of btbuild from _bt_getbuf may be considered suspicious. I will be pleased if you or somebody else can propose a better alternative, and not only for B-Tree but for all other indexes. But as I already wrote above, prohibiting a session from using indexes created after its first access to the GTT doesn't solve the problem. For normal tables (and for local temp tables) indexes are initialized at the time of their creation.
With GTT this doesn't work, because each session has its own local data for the GTT. We should either initialize/build the index on demand (when it is first accessed), or initialize indexes for all existing GTTs at the moment of session start. The latter option seems much worse from my point of view: there may be a huge number of GTTs, and a session may not need to access GTTs at all. ... and so on for other index types. Also, what about custom indexes implemented in extensions? It seems a bit strange each of them has to support this separately. I have already complained about this: my patch supports GTT for all built-in indexes, but custom indexes have to handle it themselves. It looks like providing some generic solution requires extending the index API with two different operations: creation and initialization. But extending the index API is a very critical change... And it also doesn't solve the problem for all existing extensions: they would in any case have to be rewritten to implement the new API version in order to support GTT.

Why not allow creating only indexes implementing this new API method (on GTT)?

IMHO if this really is the right solution, we need to make it work for existing indexes without having to tweak them individually. Why don't we track a flag whether an index on GTT was initialized in a given session, and if it was not then call the build function before calling any other function from the index AM?
Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes
On 2020-01-06 07:06, Tom Lane wrote: Hence, the attached revision only forces quoting in a SQL COPY command, or if the user already typed a quote. Yes, that seems better. Users tend to not like if tab completion messes with what they have already typed unless strictly necessary. The file name completion portion of this patch seems to work quite well now. I have found a weird behavior with identifier quoting, which is not the subject of this patch, but it appears to be affected by it. The good thing is that the new code will behave sensibly with select * from "pg_cl which the old code didn't offer anything on. The problem is that if you have create table "test""1" (a int); then the new code responds to select * from "te by making that select * from "te" whereas the old code curiously handled that perfectly. Neither the old nor the new code will produce anything from select * from te -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Block level parallel vacuum
On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov wrote: > > Hi > Thank you for update! I looked again > > (vacuum_indexes_leader) > + /* Skip the indexes that can be processed by parallel workers > */ > + if (!skip_index) > + continue; > > Does the variable name skip_index not confuse here? Maybe rename to something > like can_parallel? > I looked into the code again and thought that if we can add a boolean flag (can_parallel) to the IndexBulkDeleteResult structure to identify whether an index supports parallel vacuum or not, then it will be easy to skip those indexes, and we will not call skip_parallel_vacuum_index multiple times (from vacuum_indexes_leader and parallel_vacuum_index). We can have a linked list of indexes that don't support parallel vacuum, which we can then pass directly to vacuum_indexes_leader. Ex: suppose we have 5 indexes on a table. Before launching parallel workers, we can add the boolean flag (can_parallel) to the IndexBulkDeleteResult structure to identify whether each index supports parallel vacuum or not. Say indexes 1 and 4 do not support parallel vacuum; then we already have the info in a linked list that 1->4 do not support parallel vacuum, so parallel_vacuum_index will process these indexes and the rest will be processed by parallel workers. If a parallel worker finds that can_parallel is false, it will skip that index. As per my understanding, if we implement this, then we can avoid calling skip_parallel_vacuum_index multiple times, and if there is no index that can't perform parallel vacuum, then we will not call vacuum_indexes_leader at all, as the head of the list will be pointing to NULL (we can save the unnecessary call of vacuum_indexes_leader). Thoughts? -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: Removing pg_pltemplate and creating "trustable" extensions
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > To be clear, I was advocating for a NEW DB-level privilege ('INSTALL' or > > 'CREATE EXTENSION' if we could make that work), so that we have it be > > distinct from CREATE (which, today, really means 'CREATE SCHEMA'). > > I still say this is wrong, or at least pointless, because it'd be a > right that any DB owner could grant to himself. Yes, of course it is, that the DB owner would have this privilege was something you agreed to in the prior email- I'd rather not just have a "if (DBOwner())" check, I'd rather use our actual privilege system and have this be a right that the DB owner has but can then GRANT out to others if they wish to. I'm certainly not suggesting that such a privilege wouldn't be controlled by the DB owner. Forcing it to only be allowed for the DB owner and not be something that the DB owner can GRANT out isn't much better than "if (superuser())"-style checks. > If we're to have any > meaningful access control on extension installation, the privilege > would have to be attached to some other object ... and there's no clear > candidate for what. Extensions are installed at the DB level, not at any other level, and therefore that's the appropriate place to attach them, which is exactly what I'm suggesting we do here. > As someone noted awhile back, if we could somehow > attach ACLs to potentially-installable extensions, that might be an > interesting avenue to pursue. That's well beyond what I'm willing > to pursue for v13, though. Sure, having some catalog of installable extensions where someone (in my thinking, the DB owner) could GRANT out access to install certain extensions to others might be interesting, but it's not what I'm suggesting here. > In the meantime, though, this idea as stated doesn't do anything except > let a DB owner grant install privileges to someone else. I'm not even > convinced that we want that, or that anyone needs it (I can recall zero > such requests related to PLs in the past). And for sure it does not > belong in a minimal implementation of this feature. Yes, that's what this approach would do. I suppose an alternative would be to lump it in with "CREATE" rights on the DB, but I've advocated and will continue to advocate for splitting up of such broad rights. DB-level CREATE rights currently cover both schemas and publications, for example, even though the two have rather little to do with each other. If the only agreeable option is a if (DBOwner())-type check, or lumping the privilege to CREATE (trusted) EXTENSION in with other DB-level CREATE rights, then I'll go along with one of those. I'll be happy enough with that, since it avoids having an additional default role that has to be GRANT'd by a superuser. Ideally, down the road, we'll split out the CREATE privilege (both at DB and at schema level) to be more fine grained, but that can certainly be done later. Thanks, Stephen signature.asc Description: PGP signature
Re: our checks for read-only queries are not great
Greetings, * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > On Fri, 2020-01-10 at 09:29 -0500, Tom Lane wrote: > > > ALTER SYSTEM is read only in my mind. > > > > I'm still having trouble with this conclusion. I think it can only > > be justified by a very narrow reading of "reflected in pg_dump" > > that relies on the specific factorization we've chosen for upgrade > > operations, ie that postgresql.conf mods have to be carried across > > by hand. But that's mostly historical baggage, rather than a sane > > basis for defining "read only". If somebody comes up with a patch > > that causes "pg_dumpall -g" to include ALTER SYSTEM SET commands for > > whatever is in postgresql.auto.conf (not an unreasonable idea BTW), > > will you then decide that ALTER SYSTEM SET is no longer read-only? > > I think that having ALTER SYSTEM commands in pg_dumpall output > would be a problem. It would cause all kinds of problems whenever > parameters change. Thinking of the transition "checkpoint_segments" > -> "max_wal_size", you'd have to build some translation magic into pg_dump. > Besides, such a feature would make it harder to restore a dump taken > with version x into version x + n for n > 0. pg_dump already specifically has an understanding of how to deal with old options in other things when constructing a dump for a given version- and we already have issues where a dump taken with pg_dump X has a good chance of not being able to be restored into a PG X+1; that's why it's recommended to use the pg_dump for the version of PG you're intending to restore into, so I don't particularly agree with any of the arguments presented above. > > Or, perhaps, reject such a patch on the grounds that it breaks this > > arbitrary definition of read-only-ness? > > I agree with Robert that such a patch should be rejected on other > grounds. > > Concerning the topic of the thread, I personally have come to think > that changing GUCs is *not* writing to the database. But that is based > on the fact that you can change GUCs on streaming replication standbys, > and it may be surprising to a newcomer. > > Perhaps it would be good to consider this question: > Do we call something "read-only" if it changes nothing, or do we call it > "read-only" if it is allowed on a streaming replication standby? > The first would be more correct, but the second may be more convenient. The two are distinct from each other and one doesn't imply the other. I don't think we need to, or really want to, force them to be the same. When we're talking about a "read-only" transaction that the user has specifically asked be "read-only" then, imv anyway, we should be pretty stringent regarding what that's allowed to do and shouldn't be allowing that to change state in the system which other processes will see after the transaction is over. A transaction (on a primary or a replica) doesn't need to be started as explicitly "read-only" and perhaps we should change the language when we are starting up to say "database is ready to accept replica connections" or something instead of "read-only" connections to clarify that. Thanks, Stephen signature.asc Description: PGP signature
Re: our checks for read-only queries are not great
On Mon, Jan 13, 2020 at 5:57 AM Peter Eisentraut wrote: > On 2020-01-10 14:41, Robert Haas wrote: > > This rule very nearly matches the current behavior: it explains why > > temp table operations are allowed, and why ALTER SYSTEM is allowed, > > and why REINDEX etc. are allowed. However, there's a notable > > exception: PREPARE, COMMIT PREPARED, and ROLLBACK PREPARED are allowed > > in a read-only transaction. Under the "doesn't change pg_dump output" > > criteria, the first and third ones should be permitted but COMMIT > > PREPARED should be denied, except maybe if the prepared transaction > > didn't do any writes (and in that case, why did we bother preparing > > it?). Despite that, this rule does a way better job explaining the > > current behavior than anything else suggested so far. > > I don't follow. Does pg_dump dump prepared transactions? No, but committing one changes the database contents as seen by a subsequent pg_dump. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: our checks for read-only queries are not great
On Mon, 2020-01-13 at 13:56 -0500, Stephen Frost wrote: > > I think that having ALTER SYSTEM commands in pg_dumpall output > > would be a problem. It would cause all kinds of problems whenever > > parameters change. Thinking of the transition "checkpoint_segments" > > -> "max_wal_size", you'd have to build some translation magic into pg_dump. > > Besides, such a feature would make it harder to restore a dump taken > > with version x into version x + n for n > 0. > > pg_dump already specifically has understanding of how to deal with old > options in other things when constructing a dump for a given version- > and we already have issues that a dump taken with pg_dump X has a good > chance of now being able to be restoreding into a PG X+1, that's why > it's recommended to use the pg_dump for the version of PG you're > intending to restore into, so I don't particularly agree with any of the > arguments presented above. Right. But increasing the difficulty of restoring a version x pg_dump with version x + 1 is still not a thing we should do lightly. Note that the docs currently say "it is recommended to use pg_dumpall from the newer version". They don't say "it is not supported". Yours, Laurenz Albe
Re: our checks for read-only queries are not great
Greetings, * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > On Mon, 2020-01-13 at 13:56 -0500, Stephen Frost wrote: > > > I think that having ALTER SYSTEM commands in pg_dumpall output > > > would be a problem. It would cause all kinds of problems whenever > > > parameters change. Thinking of the transition "checkpoint_segments" > > > -> "max_wal_size", you'd have to build some translation magic into > > > pg_dump. > > > Besides, such a feature would make it harder to restore a dump taken > > > with version x into version x + n for n > 0. > > > > pg_dump already specifically has understanding of how to deal with old > > options in other things when constructing a dump for a given version- > > and we already have issues that a dump taken with pg_dump X has a good > > chance of now being able to be restoreding into a PG X+1, that's why > > it's recommended to use the pg_dump for the version of PG you're > > intending to restore into, so I don't particularly agree with any of the > > arguments presented above. > > Right. > But increasing the difficulty of restoring a version x pg_dump with > version x + 1 is still not a thing we should lightly do. I've never heard that and I don't agree with it being a justification for blocking sensible progress. > Not that the docs currently say "it is recommended to use pg_dumpall > from the newer version". They don't say "is is not supported". It's recommended due to exactly the reasons presented and no one is saying that such isn't supported- but we don't and aren't going to guarantee that it's going to work. We absolutely know of cases where it just won't work, today. If that's justification for saying it's not supported, then fine, let's change the language to say that. Thanks, Stephen signature.asc Description: PGP signature
Re: [Proposal] Global temporary tables
On Mon, Jan 13, 2020 at 05:32:53PM +0100, Tomas Vondra wrote: > On Mon, Jan 13, 2020 at 11:08:40AM +0300, Konstantin Knizhnik wrote: > > > >"if any code tried to access the statistics directly from the table, > >rather than via the caches". > > > >Currently optimizer is accessing statistic though caches. So this > >approach works. If somebody will rewrite optimizer or provide own > >custom optimizer in extension which access statistic directly > >then it we really be a problem. But I wonder why bypassing catalog > >cache may be needed. > > > > I don't know, but it seems extensions like hypopg do it. AFAIR, hypopg only opens pg_statistic to use its tupledesc when creating statistics on hypothetical partitions, but it should otherwise never reads or need plain pg_statistic rows.
Re: benchmarking Flex practices
John Naylor writes: > [ v11 patch ] I pushed this with some small cosmetic adjustments. One non-cosmetic adjustment I experimented with was to change str_udeescape() to overwrite the source string in-place, since we know that's modifiable storage and de-escaping can't make the string longer. I reasoned that saving a palloc() might help reduce the extra cost of UESCAPE processing. It didn't seem to move the needle much though, so I didn't commit it that way. A positive reason to keep the API as it stands is that if we do something about the idea of allowing Unicode strings in non-UTF8 backend encodings, that'd likely break the assumption about how the string can't get longer. I'm about to go off and look at the non-UTF8 idea, btw. regards, tom lane
Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes
Peter Eisentraut writes: > The file name completion portion of this patch seems to work quite well now. Thanks for testing! > I have found a weird behavior with identifier quoting, which is not the > subject of this patch, but it appears to be affected by it. > The good thing is that the new code will behave sensibly with > select * from "pg_cl > which the old code didn't offer anything on. Check. > The problem is that if you have > create table "test""1" (a int); > then the new code responds to > select * from "te > by making that > select * from "te" > whereas the old code curiously handled that perfectly. Right. The underlying cause of both these changes seems to be that what readline passes to psql_completion is just the contents of the string, now that we've told it that double-quote is a string quoting character. So that works fine with 'pg_cl' which didn't need quoting anyway. In your second example, because we generate possible matches that are already quoted-if-necessary, we fail to find anything that bare 'te' is a prefix of, where before we were considering '"te' and it worked. I'll think about how to improve this. Given that we're dequoting filenames explicitly anyway, maybe we don't need to tell readline that double-quote is a quoting character. Another idea is that maybe we ought to refactor things so that identifier dequoting and requoting is handled explicitly, similarly to the way filenames work now. That would make the patch a lot bigger though. regards, tom lane
Re: Removing pg_pltemplate and creating "trustable" extensions
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> In the meantime, though, this idea as stated doesn't do anything except >> let a DB owner grant install privileges to someone else. I'm not even >> convinced that we want that, or that anyone needs it (I can recall zero >> such requests related to PLs in the past). And for sure it does not >> belong in a minimal implementation of this feature. > Yes, that's what this approach would do. I suppose an alternative would > be to lump it in with "CREATE" rights on the DB, but I've advocated and > will continue to advocate for splitting up of such broad rights. > DB-level CREATE rights currently cover both schemas and publications, > for example, even though the two have rather little to do with each > other. The patch as I'm proposing it has nothing to do with "CREATE" rights. You're attacking something different from what I actually want to do. regards, tom lane
Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
On 31.12.2019 01:40, Peter Geoghegan wrote: On Mon, Dec 30, 2019 at 9:45 AM Robert Haas wrote: For example, float and numeric types are "never bitwise equal", while array, text, and other container types are "maybe bitwise equal". An array of integers or text with C collation can be treated as bitwise equal attributes, and it would be too harsh to restrict them from deduplication. We might as well support container types (like array) in the first Postgres version that has nbtree deduplication, I suppose. Even still, I don't think that it actually matters much to users. B-Tree indexes on arrays are probably very rare. Note that I don't consider text to be a container type here -- obviously btree/text_ops is a very important opclass for the deduplication feature. It may be the most important opclass overall. Recursively invoking a support function for the "contained" data type in the btree/array_ops support function seems like it might be messy. Not sure about that, though. What bothers me is that this option will unlikely be helpful on its own and we should also provide some kind of recheck function along with opclass, which complicates this idea even further and doesn't seem very clear. It seems like the simplest thing might be to forget about the 'char' column and just have a support function which can be used to assess whether a given opclass's notion of equality is bitwise. I like the idea of relying only on a support function. In attachment you can find the WIP patch that adds support function for btree opclasses. Before continuing, I want to ensure that I understood the discussion above correctly. Current version of the patch adds: 1) new syntax, which allow to provide support function: CREATE OPERATOR CLASS int4_ops_test FOR TYPE int4 USING btree AS OPERATOR 1 =(int4, int4), FUNCTION 1 btint4cmp(int4, int4), SUPPORT datum_image_eqisbitwise; We probably can add more words to specify the purpose of the support function. Do you have any other objections about the place of this new element in CreateOplcass syntax structure? 2) trivial support function that always returns true 'datum_image_eqisbitwise'. It is named after 'datum_image_eq', because we define this support function via its behavior. If this prototype is fine, I will continue this work and add support functions for other opclasses, update pg_dump and documentation. Thoughts? 
commit 9eb37de922c699208e52349494653241ddeb0116 Author: anastasia Date: Mon Jan 13 23:28:20 2020 +0300 v5-WIP-Opclass-bitwise-equality-0001.patch diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c index e2c6de457c..143ecd1e13 100644 --- a/src/backend/commands/opclasscmds.c +++ b/src/backend/commands/opclasscmds.c @@ -335,7 +335,8 @@ DefineOpClass(CreateOpClassStmt *stmt) storageoid, /* storage datatype oid, if any */ namespaceoid, /* namespace to create opclass in */ opfamilyoid, /* oid of containing opfamily */ -opclassoid; /* oid of opclass we create */ +opclassoid, /* oid of opclass we create */ +eqisbitwise_procoid; /* oid of opclass support function */ int maxOpNumber, /* amstrategies value */ maxProcNumber; /* amsupport value */ bool amstorage; /* amstorage flag */ @@ -564,6 +565,9 @@ DefineOpClass(CreateOpClassStmt *stmt) aclcheck_error_type(ACLCHECK_NOT_OWNER, storageoid); #endif break; + case OPCLASS_ITEM_SUPPORT_FUNCTION: +eqisbitwise_procoid = LookupFuncName(item->name->objname, 0, false, false); +break; default: elog(ERROR, "unrecognized item type: %d", item->itemtype); break; @@ -653,6 +657,7 @@ DefineOpClass(CreateOpClassStmt *stmt) values[Anum_pg_opclass_opcintype - 1] = ObjectIdGetDatum(typeoid); values[Anum_pg_opclass_opcdefault - 1] = BoolGetDatum(stmt->isDefault); values[Anum_pg_opclass_opckeytype - 1] = ObjectIdGetDatum(storageoid); + values[Anum_pg_opclass_opceqisbitwise - 1] = ObjectIdGetDatum(eqisbitwise_procoid); tup = heap_form_tuple(rel->rd_att, values, nulls); @@ -707,6 +712,15 @@ DefineOpClass(CreateOpClassStmt *stmt) recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL); } + /* dependency on support function */ + if (OidIsValid(eqisbitwise_procoid)) + { + referenced.classId = ProcedureRelationId; + referenced.objectId = eqisbitwise_procoid; + referenced.objectSubId = 0; + recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL); + } + /* dependency on owner */ recordDependencyOnOwner(OperatorClassRelationId, opclassoid, GetUserId()); diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 54ad62bb7f..b0cae4e0aa 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -3793,6 +3793,7 @@ _copyCreateOpClassStmt(const CreateOpClassStmt *from) COPY_NODE_FIELD(datatype); COPY_NODE_FIELD(items); COPY_SCALAR_FIELD(isDefault); + COPY_STRING_FIELD(eqisbitwise_fn); return newnode; } d
Re: Removing pg_pltemplate and creating "trustable" extensions
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > >> In the meantime, though, this idea as stated doesn't do anything except > >> let a DB owner grant install privileges to someone else. I'm not even > >> convinced that we want that, or that anyone needs it (I can recall zero > >> such requests related to PLs in the past). And for sure it does not > >> belong in a minimal implementation of this feature. > > > Yes, that's what this approach would do. I suppose an alternative would > > be to lump it in with "CREATE" rights on the DB, but I've advocated and > > will continue to advocate for splitting up of such broad rights. > > DB-level CREATE rights currently cover both schemas and publications, > > for example, even though the two have rather little to do with each > > other. > > The patch as I'm proposing it has nothing to do with "CREATE" rights. > You're attacking something different from what I actually want to do. Yes, as an aside, I'm argueing that we should split up the general CREATE privileges, but I also said that's not required for this. You're asking "what's the best way to add this privilege to PG?". I'm saying that it should be done through the privilege system, similar to publications. I'd prefer it not be lumped into CREATE, but that at least makes sense to me- adding a default role for this doesn't. I suppose making it akin to ALTER DATABASE and having it be limited to the DB owner is also alright (as I said in my last email) but it means that someone has to be given DB ownership rights in order to install extensions. I don't really see CREATE EXTENSION as being like ALTER DATABASE from a privilege perspective, but having it be DB owner still makes more sense than a default role for this. Thanks, Stephen signature.asc Description: PGP signature
Re: [Proposal] Global temporary tables
On Mon, Jan 13, 2020 at 09:12:38PM +0100, Julien Rouhaud wrote: On Mon, Jan 13, 2020 at 05:32:53PM +0100, Tomas Vondra wrote: On Mon, Jan 13, 2020 at 11:08:40AM +0300, Konstantin Knizhnik wrote: > >"if any code tried to access the statistics directly from the table, >rather than via the caches". > >Currently optimizer is accessing statistic though caches. So this >approach works. If somebody will rewrite optimizer or provide own >custom optimizer in extension which access statistic directly >then it we really be a problem. But I wonder why bypassing catalog >cache may be needed. > I don't know, but it seems extensions like hypopg do it. AFAIR, hypopg only opens pg_statistic to use its tupledesc when creating statistics on hypothetical partitions, but it should otherwise never reads or need plain pg_statistic rows. Ah, OK! Thanks for the clarification. I knew it does something with the catalog, didn't realize it only gets the descriptor. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add FOREIGN to ALTER TABLE in pg_dump
On 2020-Jan-11, Tomas Vondra wrote: > Hi, > > On Thu, Sep 26, 2019 at 01:47:28PM +, Luis Carril wrote: > > > > I don't disagree with adding FOREIGN, though. > > > > Your patch is failing the pg_dump TAP tests. Please use > > configure --enable-tap-tests, fix the problems, then resubmit. > > > > Fixed, I've attached a new version. > > This seems like a fairly small and non-controversial patch (I agree with > Alvaro that having the optional FOREIGN seems won't hurt). So barring > objections I'll polish it a bit and push sometime next week. If we're messing with that code, we may as well reduce cognitive load a little bit and unify all those multiple consecutive appendStringInfo calls into one. (My guess is that this was previously not possible because there were multiple fmtId() calls in the argument list, but that's no longer the case.) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From e369a559b2f96e3e73cf1bc4a1fefc9890243a7c Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 13 Jan 2020 12:33:21 -0300 Subject: [PATCH v3] pg_dump: Add FOREIGN to ALTER statements, if appropriate Author: Luis Carril Discussion: https://postgr.es/m/lejpr01mb0185a19b2e7c98e5e2a031f5e7...@lejpr01mb0185.deuprd01.prod.outlook.de --- src/bin/pg_dump/pg_dump.c | 90 --- 1 file changed, 47 insertions(+), 43 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 799b6988b7..2866c55e96 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -15583,6 +15583,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) { char *ftoptions = NULL; char *srvname = NULL; + char *foreign = ""; switch (tbinfo->relkind) { @@ -15616,6 +15617,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) ftoptions = pg_strdup(PQgetvalue(res, 0, i_ftoptions)); PQclear(res); destroyPQExpBuffer(query); + + foreign = "FOREIGN "; break; } case RELKIND_MATVIEW: @@ -15957,11 +15960,10 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) continue; appendPQExpBufferStr(q, "\n-- For binary upgrade, set up inherited constraint.\n"); -appendPQExpBuffer(q, "ALTER TABLE ONLY %s ", - qualrelname); -appendPQExpBuffer(q, " ADD CONSTRAINT %s ", - fmtId(constr->dobj.name)); -appendPQExpBuffer(q, "%s;\n", constr->condef); +appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s ADD CONSTRAINT %s %s;\n", + foreign, qualrelname, + fmtId(constr->dobj.name), + constr->condef); appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_constraint\n" "SET conislocal = false\n" "WHERE contype = 'c' AND conname = "); @@ -15978,7 +15980,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) { TableInfo *parentRel = parents[k]; - appendPQExpBuffer(q, "ALTER TABLE ONLY %s INHERIT %s;\n", + appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s INHERIT %s;\n", foreign, qualrelname, fmtQualifiedDumpable(parentRel)); } @@ -16084,9 +16086,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) if (!shouldPrintColumn(dopt, tbinfo, j) && tbinfo->notnull[j] && !tbinfo->inhNotNull[j]) { -appendPQExpBuffer(q, "ALTER TABLE ONLY %s ", - qualrelname); -appendPQExpBuffer(q, "ALTER COLUMN %s SET NOT NULL;\n", +appendPQExpBuffer(q, + "ALTER %sTABLE ONLY %s ALTER COLUMN %s SET NOT NULL;\n", + foreign, qualrelname, fmtId(tbinfo->attnames[j])); } @@ -16097,11 +16099,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) */ if (tbinfo->attstattarget[j] >= 0) { -appendPQExpBuffer(q, "ALTER TABLE ONLY %s ", - qualrelname); -appendPQExpBuffer(q, "ALTER COLUMN %s ", - fmtId(tbinfo->attnames[j])); 
-appendPQExpBuffer(q, "SET STATISTICS %d;\n", +appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s ALTER COLUMN %s SET STATISTICS %d;\n", + foreign, qualrelname, + fmtId(tbinfo->attnames[j]), tbinfo->attstattarget[j]); } @@ -16134,11 +16134,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) */ if (storage != NULL) { - appendPQExpBuffer(q, "ALTER TABLE ONLY %s ", - qualrelname); - appendPQExpBuffer(q, "ALTER COLUMN %s ", - fmtId(tbinfo->attnames[j])); - appendPQExpBuffer(q, "SET STORAGE %s;\n", + appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s ALTER COLUMN %s SET STORAGE %s;\n", + foreign, qualrelname, + fmtId(tbinfo->attnames[j]), storage); } } @@ -16148,11 +16146,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) */ if (tbinfo->attoptions[j][0] != '\0') { -appendPQExpBuffer(q, "ALTER TABLE ONLY %s ", - qualrelname); -appendPQExpBuffer(q, "ALTER COLUMN %s ", - fmtId(tbinfo->attnam
Re: isTempNamespaceInUse() is incorrect with its handling of MyBackendId
On Mon, Jan 13, 2020 at 02:56:13PM +0100, Julien Rouhaud wrote: > Well, since isTempNamespaceInUse is for now only called by autovacuum, and > shouldn't change soon, this really feels premature optimzation, so your > original approach looks better. Yes, I'd rather keep this routine in its simplest shape for now. If the optimization makes sense, though in most cases it won't because it just helps sessions to detect faster their own temp schema, then let's do it. I'll let this patch aside for a couple of days to let others comment on it, and if there are no objections, I'll commit the fix. Thanks for the lookup! -- Michael signature.asc Description: PGP signature
Re: DROP OWNED CASCADE vs Temp tables
On 2020-Jan-07, Mithun Cy wrote: > I have a test where a user creates a temp table and then disconnect, > concurrently we try to do DROP OWNED BY CASCADE on the same user. Seems > this causes race condition between temptable deletion during disconnection > (@RemoveTempRelations(myTempNamespace)) and DROP OWNED BY CASCADE operation > which will try to remove same temp table when they find them as part of > pg_shdepend. Cute. This seems fiddly to handle better; maybe you'd have to have a new PERFORM_DELETION_* flag that says to ignore "missing" objects; so when you go from shdepDropOwned, you pass that flag all the way down to doDeletion(), so the objtype-specific function is called with "missing_ok", and ignore if the object has already gone away. That's tedious because none of the Remove* functions have the concept of missing_ok. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
On Mon, Jan 13, 2020 at 12:49 PM Anastasia Lubennikova wrote: > In attachment you can find the WIP patch that adds support function for > btree opclasses. Cool. Thanks! > Current version of the patch adds: > > 1) new syntax, which allow to provide support function: > > CREATE OPERATOR CLASS int4_ops_test > FOR TYPE int4 USING btree AS > OPERATOR 1 =(int4, int4), > FUNCTION 1 btint4cmp(int4, int4), > SUPPORT datum_image_eqisbitwise; Hmm. Do we really need these grammar changes? If so, why? I think that you wanted to make this something that could work with any opclass of any index access method, but I don't see that as a useful goal. (If it was useful, it could be considered later, on a case by case basis.) I imagined that this infrastructure would consist of inventing a new variety of B-Tree opclass support function -- something like sortsupport. You could generalize from the example of commit c6e3ac11b60, which added SortSupport functions (also known as "B-Tree support function 2" functions). You might also take a look at the much more recent commit 0a459cec96d, which added in_range functions (also known as "B-Tree support function 3"). Note that neither of those two commits had grammar changes for CREATE OPERATOR CLASS, or anything like that. What I have in mind is a "B-Tree support function 4", obviously. You should probably add a C function that's similar to PrepareSortSupportFromIndexRel()/FinishSortSupportFunction() that will be called from the B-Tree code. This will give a simple yes/no answer to the question: "Is it safe to apply deduplication to this Relation"? This C function will know to return false for an opclass that doesn't have any support function 4 set for any single attribute. It can provide a general overview of what we're telling the caller about the opclass here, etc. Another patch could add a similar function that works with a plain operator, a bit like PrepareSortSupportFromOrderingOp() -- but that isn't necessary now. I suppose that this approach requires something a bit like a struct SortSupportData, with filled-out collation information, etc. The nbtree code expects a simple yes/no answer based on all columns in the index, so it will be necessary to serialize that information to send it across the SQL function interface -- the pg_proc support function will have one argument of type "internal". And, I suppose that you'll also need some basic btvalidate() validation code. > We probably can add more words to specify the purpose of the support > function. Right -- some documentation is needed in btree.sgml, alongside the existing stuff for support functions 1, 2, and 3. > 2) trivial support function that always returns true > 'datum_image_eqisbitwise'. > It is named after 'datum_image_eq', because we define this support > function via its behavior. I like the idea of a generic, trivial SQL-callable function that all simple scalar types can use -- one that just returns true. Maybe we should call this general class of function an "image_equal" function, and refer to "image equality" in the btree.sgml docs. I don't think that using the term "bitwise" is helpful, since it sounds very precise but is actually slightly inaccurate (since TOASTable datums can be "image equal", but not bitwise equal according to datumIsEqual()). How does everyone feel about "image equality" as a name? As I said before, it seems like a good idea to tie this new infrastructure to existing infrastructure used for things like REFRESH MATERIALIZED VIEW CONCURRENTLY. 
> If this prototype is fine, I will continue this work and add support > functions for other opclasses, update pg_dump and documentation. If this work is structured as a new support function, then it isn't really a user-visible feature -- you won't need pg_dump support, psql support, etc. Most of the documentation will be for operator class authors rather than regular users. We can document the specific opclasses that have support for deduplication later, if at all. I think it would be fine if the deduplication docs (not the docs for this infrastructure) just pointed out specific cases that we *cannot* support -- there are not many exceptions (numeric, text with a nondeterministic collation, a few others like that). -- Peter Geoghegan
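To make that concrete, registering such a routine would look like registering any other numbered B-Tree support function, with no new CREATE OPERATOR CLASS syntax. A hypothetical sketch, reusing the trivial function from the posted patch and the single argument of type "internal" suggested above (neither the support number nor this signature exists yet):

ALTER OPERATOR FAMILY integer_ops USING btree
    ADD FUNCTION 4 (int4) datum_image_eqisbitwise(internal);

-- i.e. an "image equality" function would sit alongside the comparison (1),
-- sortsupport (2) and in_range (3) support functions.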
Additional improvements to extended statistics
Hi, Now that I've committed [1] which allows us to use multiple extended statistics per table, I'd like to start a thread discussing a couple of additional improvements for extended statistics. I've considered starting a separate patch for each, but that would be messy as those changes will touch roughly the same places. So I've organized it into a single patch series, with the simpler parts at the beginning. There are three main improvements: 1) improve estimates of OR clauses Until now, OR clauses pretty much ignored extended statistics, based on the experience that they're less vulnerable to misestimates. But it's a bit weird that AND clauses are handled while OR clauses are not, so this extends the logic to OR clauses. Status: I think this is fairly OK. 2) support estimating clauses (Var op Var) Currently, we only support clauses with a single Var, i.e. clauses like - Var op Const - Var IS [NOT] NULL - [NOT] Var - ... and AND/OR clauses built from those simple ones. This patch adds support for clauses of the form (Var op Var), of course assuming both Vars come from the same relation. Status: This works, but it feels a bit hackish. Needs more work. 3) support extended statistics on expressions Currently we only allow simple references to columns in extended stats, so we can do CREATE STATISTICS s ON a, b, c FROM t; but not CREATE STATISTICS s ON (a+b), (c + 1) FROM t; This patch aims to allow this. At the moment it's a WIP - it does most of the catalog changes and stats building, but with some hacks/bugs. And it does not even try to use those statistics during estimation. The first question is how to extend the current pg_statistic_ext catalog to support expressions. I've been planning to do it the way we support expressions for indexes, i.e. have two catalog fields - one for keys, one for expressions. One difference is that for statistics we don't care about order of the keys, so that we don't need to bother with storing 0 keys in place for expressions - we can simply assume keys are first, then expressions. And this is what the patch does now. I'm however wondering whether to keep this split - why not to just treat everything as expressions, and be done with it? A key just represents a Var expression, after all. And it would massively simplify a lot of code that now has to care about both keys and expressions. Of course, expressions are a bit more expensive, but I wonder how noticeable that would be. Opinions? 
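To make the three improvements a bit more tangible, here is roughly the kind of query each one targets (a sketch only; the expression syntax in (3) is of course hypothetical until that part of the series lands):

CREATE TABLE t (a int, b int);
INSERT INTO t SELECT i % 100, i % 100 FROM generate_series(1, 100000) i;
CREATE STATISTICS s (mcv) ON a, b FROM t;
ANALYZE t;

-- (1) apply the MCV list to OR clauses, not just AND clauses
EXPLAIN SELECT * FROM t WHERE a = 1 OR b = 2;

-- (2) estimate (Var op Var) clauses using the same statistics
EXPLAIN SELECT * FROM t WHERE a = b;

-- (3) hypothetical, once statistics on expressions are supported
-- CREATE STATISTICS s2 ON (a + b), (a % 10) FROM t;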
ragards [1] https://commitfest.postgresql.org/26/2320/ -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From e8714d7edbfbafd3203623680e290d00ec3f1f8c Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 2 Dec 2019 23:02:17 +0100 Subject: [PATCH 1/3] Support using extended stats for parts of OR clauses --- src/backend/optimizer/path/clausesel.c| 88 +++ src/backend/statistics/extended_stats.c | 56 +--- src/backend/statistics/mcv.c | 5 +- .../statistics/extended_stats_internal.h | 3 +- src/include/statistics/statistics.h | 3 +- 5 files changed, 120 insertions(+), 35 deletions(-) diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c index a3ebe10592..8ff756bb31 100644 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -92,7 +92,7 @@ clauselist_selectivity(PlannerInfo *root, */ s1 *= statext_clauselist_selectivity(root, clauses, varRelid, jointype, sjinfo, rel, - &estimatedclauses); + &estimatedclauses, false); } /* @@ -104,6 +104,68 @@ clauselist_selectivity(PlannerInfo *root, estimatedclauses); } +static Selectivity +clauselist_selectivity_or(PlannerInfo *root, + List *clauses, + int varRelid, + JoinType jointype, + SpecialJoinInfo *sjinfo) +{ + ListCell *lc; + Selectivity s1 = 0.0; + RelOptInfo *rel; + Bitmapset *estimatedclauses = NULL; + int idx; + + /* +* Determine if these clauses reference a single relation. If so, and if +* it has extended statistics, try to apply those. +*/ + rel = find_single_rel_for_clauses(root, clauses); + if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL) + { +
Re: Why is pq_begintypsend so slow?
Hi, On 2020-01-11 22:32:45 -0500, Tom Lane wrote: > I wrote: > > I saw at this point that the remaining top spots were > > enlargeStringInfo and appendBinaryStringInfo, so I experimented > > with inlining them (again, see patch below). That *did* move > > the needle: down to 72691 ms, or 20% better than HEAD. > > Oh ... marking the test in the inline part of enlargeStringInfo() > as unlikely() helps quite a bit more: 66100 ms, a further 9% gain. > Might be over-optimizing for this particular case, perhaps > > (But ... I'm not finding these numbers to be super reproducible > across different ASLR layouts. So take it with a grain of salt.) FWIW, I've also observed, in another thread (the node func generation thing [1]), that inlining enlargeStringInfo() helps a lot, especially when inlining some of its callers. Moving e.g. appendStringInfo() inline allows the compiler to sometimes optimize away the strlen. But if e.g. an inlined appendBinaryStringInfo() still calls enlargeStringInfo() unconditionally, successive appends cannot optimize away memory accesses for ->len/->data. For the case of send functions, we really ought to have at least pq_begintypsend(), enlargeStringInfo() and pq_endtypsend() inline. That way the compiler ought to be able to avoid repeatedly loading/storing ->len, after the initial initStringInfo() call. Might even make sense to also have initStringInfo() inline, because then the compiler would probably never actually materialize the StringInfoData (and would automatically have good aliasing information too). The commit referenced above is obviously quite WIP-ey, and contains things that should be split into separate commits. But I think it might be worth moving more functions into the header, like I've done in that commit. The commit also adds appendStringInfoU?Int(32,64) operations - I've unsuprisingly found these to be *considerably* faster than going through appendStringInfoString(). > but I think that's a reasonable marking given that we overallocate > the stringinfo buffer for most uses. Wonder if it's worth providing a function to initialize the stringinfo differently for the many cases where we have at least a very good idea of how long the string will be. It's sad to allocate 1kb just for e.g. int4send to send an integer plus length. Greetings, Andres Freund [1] https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=commit;h=127e860cf65f50434e0bb97acbba4b0ea6f38cfd
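That last idea might amount to something like the sketch below; initStringInfoSized() is a made-up name for illustration, not an existing or proposed API:

/*
 * Sketch only: initialize a StringInfo with a caller-chosen initial
 * allocation instead of the hard-wired 1024 bytes, for callers (such as
 * the typsend functions) that know the result will be tiny.
 */
static inline void
initStringInfoSized(StringInfo str, int initsize)
{
    Assert(initsize > 0);
    str->data = (char *) palloc(initsize);
    str->maxlen = initsize;
    resetStringInfo(str);
}

int4send() could then start from, say, a 16-byte buffer instead of 1 kB, while enlargeStringInfo() would still cover the (unexpected) case of needing more.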
Unicode escapes with any backend encoding
I threatened to do this in another thread [1], so here it is. This patch removes the restriction that the server encoding must be UTF-8 in order to write any Unicode escape with a value outside the ASCII range. Instead, we'll allow the notation and convert to the server encoding if that's possible. (If it isn't, of course you get an encoding conversion failure.) In the cases that were already supported, namely ASCII characters or UTF-8 server encoding, this should be only immeasurably slower than before. Otherwise, it calls the appropriate encoding conversion procedure, which of course will take a little time. But that's better than failing, surely. One way in which this is slightly less good than before is that you no longer get a syntax error cursor pointing at the problematic escape when conversion fails. If we were really excited about that, something could be done with setting up an errcontext stack entry. But that would add a few cycles, so I wasn't sure whether to do it. Grepping for other direct uses of unicode_to_utf8(), I notice that there are a couple of places in the JSON code where we have a similar restriction that you can only write a Unicode escape in UTF8 server encoding. I'm not sure whether these same semantics could be applied there, so I didn't touch that. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/flat/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw%40mail.gmail.com diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index c908e0b..e134877 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -189,6 +189,23 @@ UPDATE "my_table" SET "a" = 5; ampersands. The length limitation still applies. + +Quoting an identifier also makes it case-sensitive, whereas +unquoted names are always folded to lower case. For example, the +identifiers FOO, foo, and +"foo" are considered the same by +PostgreSQL, but +"Foo" and "FOO" are +different from these three and each other. (The folding of +unquoted names to lower case in PostgreSQL is +incompatible with the SQL standard, which says that unquoted names +should be folded to upper case. Thus, foo +should be equivalent to "FOO" not +"foo" according to the standard. If you want +to write portable applications you are advised to always quote a +particular name or never quote it.) + + Unicode escape in identifiers @@ -230,7 +247,8 @@ U&"d!0061t!+61" UESCAPE '!' The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character. Note that the escape character is -written in single quotes, not double quotes. +written in single quotes, not double quotes, +after UESCAPE. @@ -239,32 +257,18 @@ U&"d!0061t!+61" UESCAPE '!' -The Unicode escape syntax works only when the server encoding is -UTF8. When other server encodings are used, only code -points in the ASCII range (up to \007F) can be -specified. Both the 4-digit and the 6-digit form can be used to +Either the 4-digit or the 6-digit escape form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+, although the availability of the 6-digit form technically makes this unnecessary. (Surrogate -pairs are not stored directly, but combined into a single -code point that is then encoded in UTF-8.) +pairs are not stored directly, but are combined into a single +code point.) -Quoting an identifier also makes it case-sensitive, whereas -unquoted names are always folded to lower case. 
For example, the -identifiers FOO, foo, and -"foo" are considered the same by -PostgreSQL, but -"Foo" and "FOO" are -different from these three and each other. (The folding of -unquoted names to lower case in PostgreSQL is -incompatible with the SQL standard, which says that unquoted names -should be folded to upper case. Thus, foo -should be equivalent to "FOO" not -"foo" according to the standard. If you want -to write portable applications you are advised to always quote a -particular name or never quote it.) +If the server encoding is not UTF-8, the Unicode code point identified +by one of these escape sequences is converted to the actual server +encoding; an error is reported if that's not possible. @@ -427,25 +431,11 @@ SELECT 'foo' 'bar'; It is your responsibility that the byte sequences you create, especially when using the octal or hexadecimal escapes, compose - valid characters in the server character set encoding. When the - server encoding is UTF-8, then the Unicode escapes or the + valid characters in the server character set encoding. + A useful alternative
Re: Amcheck: do rightlink verification with lock coupling
On Sat, Jan 11, 2020 at 4:25 AM Tomas Vondra wrote: > Understood. Is that a reason to not commit of this patch now, though? It could use some polishing. Are you interested in committing it? -- Peter Geoghegan
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
On Fri, Jan 10, 2020 at 1:36 PM Peter Geoghegan wrote: > * HEIKKI: Do we only generate one posting list in one WAL record? I > would assume it's better to deduplicate everything on the page, since > we're modifying it anyway. Still thinking about this one. > * HEIKKI: Does xl_btree_vacuum WAL record store a whole copy of the > remaining posting list on an updated tuple? Wouldn't it be simpler and > more space-efficient to store just the deleted TIDs? This probably makes sense. The btreevacuumposting() code that generates "updated" index tuples (tuples that VACUUM uses to replace existing ones when some but not all of the TIDs need to be removed) was derived from GIN's ginVacuumItemPointers(). That approach works well enough, but we can do better now. It shouldn't be that hard. My preferred approach is a little different to your suggested approach of storing the deleted TIDs directly. I would like to make it work by storing an array of uint16 offsets into a posting list, one array per "updated" tuple (with one offset per deleted TID within each array). These arrays (which must include an array size indicator at the start) can appear in the xl_btree_vacuum record, at the same place the patch currently puts a raw IndexTuple. They'd be equivalent to a raw IndexTuple -- the REDO routine would reconstruct the same raw posting list tuple on its own. This approach seems simpler, and is clearly very space efficient. This approach is similar to the approach used by REDO routines to handle posting list splits. Posting list splits must call _bt_swap_posting() on the primary, while the corresponding REDO routines also call _bt_swap_posting(). For space efficient "updates", we'd have to invent a sibling utility function -- we could call it _bt_delete_posting(), and put it next to _bt_swap_posting() within nbtdedup.c. How do you feel about that approach? (And how do you feel about the existing "REDO routines call _bt_swap_posting()" business that it's based on?) > * HEIKKI: Would it be more clear to have a separate struct for the > posting list split case? (i.e. don't reuse xl_btree_insert) I've concluded that this one probably isn't worthwhile. We'd have to carry a totally separate record on the stack within _bt_insertonpg(). If you feel strongly about it, I will reconsider. -- Peter Geoghegan
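A rough sketch of what that xl_btree_vacuum payload could look like; the struct and macro names here are made up for illustration and may not match what the patch ends up doing:

/*
 * Sketch only: per-"updated"-tuple metadata in the xl_btree_vacuum record.
 * One of these, followed by ndeletedtids uint16 offsets into the existing
 * posting list, would replace the full replacement IndexTuple currently
 * logged; the REDO routine would rebuild the smaller posting list itself,
 * e.g. via a _bt_delete_posting() helper next to _bt_swap_posting().
 */
typedef struct xl_btree_update
{
    uint16      ndeletedtids;   /* number of uint16 offsets that follow */

    /* DELETED TID OFFSETS INTO POSTING LIST FOLLOW */
} xl_btree_update;

#define SizeOfBtreeUpdate   (offsetof(xl_btree_update, ndeletedtids) + sizeof(uint16))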
Re: DROP OWNED CASCADE vs Temp tables
On Mon, Jan 13, 2020 at 07:45:06PM -0300, Alvaro Herrera wrote: > This seems fiddly to handle better; maybe you'd have to have a new > PERFORM_DELETION_* flag that says to ignore "missing" objects; so when > you go from shdepDropOwned, you pass that flag all the way down to > doDeletion(), so the objtype-specific function is called with > "missing_ok", and ignore if the object has already gone away. That's > tedious because none of the Remove* functions have the concept of > missing_ok. Yes, that would be invasive and I'd rather not backpatch such a change but I don't see a better or cleaner way to handle that correctly either than the way you are describing. Looking at all the subroutines removing the objects by OID, a patch among those lines is repetitive, though not complicated to do. -- Michael signature.asc Description: PGP signature
Re: DROP OWNED CASCADE vs Temp tables
Alvaro Herrera writes: > On 2020-Jan-07, Mithun Cy wrote: >> I have a test where a user creates a temp table and then disconnect, >> concurrently we try to do DROP OWNED BY CASCADE on the same user. Seems >> this causes race condition between temptable deletion during disconnection >> (@RemoveTempRelations(myTempNamespace)) and DROP OWNED BY CASCADE operation >> which will try to remove same temp table when they find them as part of >> pg_shdepend. > Cute. Is this really any worse than any other attempt to issue two DROPs against the same object concurrently? Maybe we can just call it pilot error. > This seems fiddly to handle better; maybe you'd have to have a new > PERFORM_DELETION_* flag that says to ignore "missing" objects; so when > you go from shdepDropOwned, you pass that flag all the way down to > doDeletion(), so the objtype-specific function is called with > "missing_ok", and ignore if the object has already gone away. That's > tedious because none of the Remove* functions have the concept of > missing_ok. That seems fundamentally wrong. By the time we've queued an object for deletion in dependency.c, we have a lock on it, and we've verified that the object is still there (cf. systable_recheck_tuple calls). If shdepDropOwned is doing it differently, I'd say shdepDropOwned is doing it wrong. regards, tom lane
Re: Add FOREIGN to ALTER TABLE in pg_dump
On Mon, Jan 13, 2020 at 7:52 PM Tom Lane wrote: > > vignesh C writes: > > On Thu, Sep 26, 2019 at 7:17 PM Luis Carril wrote: > >>> Your patch is failing the pg_dump TAP tests. Please use > >>> configure --enable-tap-tests, fix the problems, then resubmit. > > >> Fixed, I've attached a new version. > > > Will it be possible to add a test case for this, can we validate by > > adding one test? > > Isn't the change in the TAP test output sufficient? > I could not see any expected file output changes in the patch. Should we modify the existing test to validate this? Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: Option to dump foreign data in pg_dump
On Fri, Nov 29, 2019 at 2:10 PM Luis Carril wrote: > > Luis, > > It seems you've got enough support for this concept, so let's move > forward with this patch. There are some comments from Tom about the > patch; would you like to send an updated version perhaps? > > Thanks > > Hi, > > I've attached a new version (v6) removing the superfluous JOIN that Tom > identified, and not collecting the oids (avoiding the query) if the option is > not used at all. > > About the testing issues that Tom mentioned: > I do not see how can we have a pure SQL dummy FDW that tests the > functionality. Because the only way to identify if the data of a foreign > table for the chosen server is dumped is if the COPY statement appears in the > output, but if the C callbacks of the FDW are not implemented, then the > SELECT that dumps the data to generate the COPY cannot be executed. > Also, to test that the include option chooses only the data of the specified > foreign servers we would need some negative testing, i.e. that the COPY > statement for the non-desired table does not appear. But I do not find these > kind of tests in the test suite, even for other selective options like > --table or --exclude-schema. > Can you have a look at dump with parallel option. Parallel option will take a lock on table while invoking lockTableForWorker. May be this is not required for foreign tables. Thoughts? Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: benchmarking Flex practices
On Tue, Jan 14, 2020 at 4:12 AM Tom Lane wrote: > > John Naylor writes: > > [ v11 patch ] > > I pushed this with some small cosmetic adjustments. Thanks for your help hacking on the token filter. -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Amcheck: do rightlink verification with lock coupling
On Mon, Jan 13, 2020 at 03:49:40PM -0800, Peter Geoghegan wrote: On Sat, Jan 11, 2020 at 4:25 AM Tomas Vondra wrote: Understood. Is that a reason to not commit of this patch now, though? It could use some polishing. Are you interested in committing it? Not really - as a CFM I was trying to revive patches that seem in good shape but not moving. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes
I wrote: > Peter Eisentraut writes: >> I have found a weird behavior with identifier quoting, which is not the >> subject of this patch, but it appears to be affected by it. > I'll think about how to improve this. Given that we're dequoting > filenames explicitly anyway, maybe we don't need to tell readline that > double-quote is a quoting character. Another idea is that maybe > we ought to refactor things so that identifier dequoting and requoting > is handled explicitly, similarly to the way filenames work now. > That would make the patch a lot bigger though. On reflection, it seems like the best bet for the moment is to remove double-quote from the rl_completer_quote_characters string, which should restore all behavior around double-quoted strings to what it was before. (We have to keep single-quote in that string, though, or quoted file names fail entirely.) The only impact this has on filename completion is that we're no longer bright enough to convert a double-quoted filename to single-quoted for you, which would be a nice-to-have but it's hardly core functionality. At some point it'd be good to undo that and upgrade quoted-identifier processing, but I don't really want to include such changes in this patch. I'm about burned out on tab completion for right now. regards, tom lane diff --git a/config/programs.m4 b/config/programs.m4 index 90ff944..68ab823 100644 --- a/config/programs.m4 +++ b/config/programs.m4 @@ -209,17 +209,20 @@ fi -# PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER -# --- +# PGAC_READLINE_VARIABLES +# --- # Readline versions < 2.1 don't have rl_completion_append_character +# Libedit lacks rl_filename_quote_characters and rl_filename_quoting_function -AC_DEFUN([PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER], +AC_DEFUN([PGAC_READLINE_VARIABLES], [AC_CACHE_CHECK([for rl_completion_append_character], pgac_cv_var_rl_completion_append_character, [AC_LINK_IFELSE([AC_LANG_PROGRAM([#include -#ifdef HAVE_READLINE_READLINE_H -# include +#if defined(HAVE_READLINE_READLINE_H) +#include +#elif defined(HAVE_EDITLINE_READLINE_H) +#include #elif defined(HAVE_READLINE_H) -# include +#include #endif ], [rl_completion_append_character = 'x';])], @@ -228,7 +231,42 @@ AC_DEFUN([PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER], if test x"$pgac_cv_var_rl_completion_append_character" = x"yes"; then AC_DEFINE(HAVE_RL_COMPLETION_APPEND_CHARACTER, 1, [Define to 1 if you have the global variable 'rl_completion_append_character'.]) -fi])# PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER +fi +AC_CACHE_CHECK([for rl_filename_quote_characters], pgac_cv_var_rl_filename_quote_characters, +[AC_LINK_IFELSE([AC_LANG_PROGRAM([#include +#if defined(HAVE_READLINE_READLINE_H) +#include +#elif defined(HAVE_EDITLINE_READLINE_H) +#include +#elif defined(HAVE_READLINE_H) +#include +#endif +], +[rl_filename_quote_characters = "x";])], +[pgac_cv_var_rl_filename_quote_characters=yes], +[pgac_cv_var_rl_filename_quote_characters=no])]) +if test x"$pgac_cv_var_rl_filename_quote_characters" = x"yes"; then +AC_DEFINE(HAVE_RL_FILENAME_QUOTE_CHARACTERS, 1, + [Define to 1 if you have the global variable 'rl_filename_quote_characters'.]) +fi +AC_CACHE_CHECK([for rl_filename_quoting_function], pgac_cv_var_rl_filename_quoting_function, +[AC_LINK_IFELSE([AC_LANG_PROGRAM([#include +#if defined(HAVE_READLINE_READLINE_H) +#include +#elif defined(HAVE_EDITLINE_READLINE_H) +#include +#elif defined(HAVE_READLINE_H) +#include +#endif +], +[rl_filename_quoting_function = 0;])], +[pgac_cv_var_rl_filename_quoting_function=yes], 
+[pgac_cv_var_rl_filename_quoting_function=no])]) +if test x"$pgac_cv_var_rl_filename_quoting_function" = x"yes"; then +AC_DEFINE(HAVE_RL_FILENAME_QUOTING_FUNCTION, 1, + [Define to 1 if you have the global variable 'rl_filename_quoting_function'.]) +fi +])# PGAC_READLINE_VARIABLES diff --git a/configure b/configure index 25cfbcb..a46ba40 100755 --- a/configure +++ b/configure @@ -16316,10 +16316,12 @@ else cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #include -#ifdef HAVE_READLINE_READLINE_H -# include +#if defined(HAVE_READLINE_READLINE_H) +#include +#elif defined(HAVE_EDITLINE_READLINE_H) +#include #elif defined(HAVE_READLINE_H) -# include +#include #endif int @@ -16345,6 +16347,85 @@ if test x"$pgac_cv_var_rl_completion_append_character" = x"yes"; then $as_echo "#define HAVE_RL_COMPLETION_APPEND_CHARACTER 1" >>confdefs.h fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for rl_filename_quote_characters" >&5 +$as_echo_n "checking for rl_filename_quote_characters... " >&6; } +if ${pgac_cv_var_rl_filename_quote_characters+:} false; then : + $as_echo_n "(cached) " >&6 +else + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +#include +#if defined(HAVE_READLINE_READLINE_H) +#include +#elif defined(HAVE_EDITLINE_READLINE_H) +#include +#e
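The immediate adjustment described above boils down to something like this in psql's readline initialization (a sketch of the intent, not the exact hunk from the patch):

/*
 * Tell readline that only single quotes delimit quoted strings (still
 * needed so quoted file names complete properly); leaving double quotes
 * out restores the old behavior for quoted identifiers.
 */
rl_completer_quote_characters = "'";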
Re: Unicode escapes with any backend encoding
On Tue, Jan 14, 2020 at 10:02 AM Tom Lane wrote: > > > Grepping for other direct uses of unicode_to_utf8(), I notice that > there are a couple of places in the JSON code where we have a similar > restriction that you can only write a Unicode escape in UTF8 server > encoding. I'm not sure whether these same semantics could be > applied there, so I didn't touch that. > Off the cuff I'd be inclined to say we should keep the text escape rules the same. We've already extended the JSON standard by allowing non-UTF8 encodings. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Unicode escapes with any backend encoding
Andrew Dunstan writes: > On Tue, Jan 14, 2020 at 10:02 AM Tom Lane wrote: >> Grepping for other direct uses of unicode_to_utf8(), I notice that >> there are a couple of places in the JSON code where we have a similar >> restriction that you can only write a Unicode escape in UTF8 server >> encoding. I'm not sure whether these same semantics could be >> applied there, so I didn't touch that. > Off the cuff I'd be inclined to say we should keep the text escape > rules the same. We've already extended the JSON standard y allowing > non-UTF8 encodings. Right. I'm just thinking though that if you can write "é" literally in a JSON string, even though you're using LATIN1 not UTF8, then why not allow writing that as "\u00E9" instead? The latter is arguably truer to spec. However, if JSONB collapses "\u00E9" to LATIN1 "é", that would be bad, unless we have a way to undo it on printout. So there might be some more moving parts here than I thought. regards, tom lane
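Concretely, the JSON cases in question look like this, assuming a LATIN1 database:

-- Accepted in any server encoding today, because json merely validates
-- the escape without decoding it:
SELECT '"\u00e9"'::json;

-- Currently rejected unless the server encoding is UTF8; the open question
-- is whether it should instead collapse to the same value as '"é"', and
-- how that would be undone on printout:
SELECT '"\u00e9"'::jsonb;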
Re: Making psql error out on output failures
Hi Daniel, I agree with you that if psql's output doesn't indicate any error when the disk is full, then it is obviously not nice. In some situations, people may end up losing data permanently. However, after quickly applying your idea/patch to "commit bf65f3c8871bcc95a3b4d5bcb5409d3df05c8273 (HEAD -> REL_12_STABLE, origin/REL_12_STABLE)", I found the behaviours/results are different. Here are the steps and output, $ sudo mkdir -p /mnt/ramdisk $ sudo mount -t tmpfs -o rw,size=1M tmpfs /mnt/ramdisk Test-1: delete the "file", and run the psql command from a terminal directly, $ rm /mnt/ramdisk/file $ psql -d postgres -At -c "select repeat('111', 100)" > /mnt/ramdisk/file Error printing tuples then dump the file, $ rm /mnt/ramdisk/file $ hexdump -C /mnt/ramdisk/file 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 || * 0010 Test-2: delete the "file", run the command within the psql console, $ rm /mnt/ramdisk/file $ psql -d postgres psql (12.1) Type "help" for help. postgres=# select repeat('111', 100) \g /mnt/ramdisk/file Error printing tuples postgres=# Then dump the file again, $ hexdump -C /mnt/ramdisk/file 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 || * 0010 As you can see, the contents are different after applying the patch. David The new status of this patch is: Waiting on Author
Re: Avoid full GIN index scan when possible
Updated patch is attached. It contains more comments as well as a commit message.

On Sun, Jan 12, 2020 at 12:10 AM Alexander Korotkov wrote:
> On Sat, Jan 11, 2020 at 3:19 PM Julien Rouhaud wrote:
> > While at it, IIUC only excludeOnly key can have nrequired == 0 (if
> > that's the case, this could be explicitly said in startScanKey
> > relevant comment), so it'd be more consistent to also use excludeOnly
> > rather than nrequired in this assert?
>
> Makes sense. I'll adjust the assert as well as the comment.

The changes to this assertion are not actually needed. I just accidentally forgot to revert them.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Avoid-GIN-full-scan-for-empty-ALL-keys-v12.patch
Description: Binary data
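To make the target of the patch a bit more concrete, here is a hedged sketch of the kind of query involved (table and column names are made up; whether and when the optimization kicks in depends on the final form of the patch):

  CREATE TABLE docs (body tsvector);
  CREATE INDEX docs_body_gin ON docs USING gin (body);

  -- The negated term produces an "exclude only" scan key with no required
  -- entries; without the patch such keys could degenerate into a full scan
  -- of the GIN index.
  SELECT count(*) FROM docs WHERE body @@ to_tsquery('foo & !bar');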
Re: Protect syscache from bloating with negative cache entries
This is a new complete workable patch after a long time of struggling with benchmarking.

At Tue, 19 Nov 2019 19:48:10 +0900 (JST), Kyotaro Horiguchi wrote in
> I ran the run2.sh script attached, which runs catcachebench2(), which
> asks SearchSysCache3() for cached entries (almost) 24 times per
> run. The number of each output line is the mean of 3 times runs, and
> stddev. Lines are in "time" order and edited to fit here. "gen_tbl.pl
> | psql" creates a database for the benchmark. catcachebench2() runs
> the shortest path in the three in the attached benchmark program.
>
> (pg_ctl start)
> $ perl gen_tbl.pl | psql
> ...
> (pg_ctl stop)

I wonder why I took the average of the times instead of choosing the fastest one. This benchmark is extremely CPU intensive, so the fastest run reliably represents the performance. I changed the benchmark so that it shows the time of the fastest run (run4.sh).

Based on the latest result, I used pattern 3 (SearchSysCacheN indirection, wrongly referred to as 1 in the last mail) in the latest version. I took the fastest time among 3 iterations of 5 runs of both the master and patched -O2 binaries.

 version | min
---------+---------
 master  | 7986.65
 patched | 7984.47   = 'indirect' below

I would say this version doesn't get degraded by indirect calls. So, I applied the other parts of the catcache expiration patch as the succeeding patches. After that I got a somewhat strange but very stable result: just adding struct members accelerates the benchmark. The numbers are the fastest time of 20 runs of the benchmark over 10 iterations.

                   ms
master     7980.79  # the master with the benchmark extension (0001)
base       7340.96  # add only struct members and a GUC variable. (0002)
indirect   7998.68  # call SearchCatCacheN indirectly (0003)
expire-off 7422.30  # CatCache expiration (0004)
                    #   (catalog_cache_prune_min_age = -1)
expire-on  7861.13  # CatCache expiration (catalog_cache_prune_min_age = 0)

The patch accelerates CatCacheSearch for uncertain reasons. I'm not sure what makes the difference between about 8000 ms and about 7400 ms, though. Building all versions and running the benchmark several times gave me results with the same tendency. I will stop this work at this point and continue later.

The following files are attached.

0001-catcache-benchmark-extension.patch:
  The benchmark extension used for the benchmarking here. The test tables
  are generated using gentbl2.pl attached. (perl gentbl2.pl | psql)

0002-base_change.patch:
  Preliminarily adds some struct members and a GUC variable to see if they
  cause any extent of degradation.

0003-Make-CatCacheSearchN-indirect-functions.patch:
  Rewrite to change the CatCacheSearchN functions to be called indirectly.

0004-CatCache-expiration-feature.patch:
  Add the CatCache expiration feature.

gentbl2.pl: A script that emits SQL statements to generate test tables.
run4.sh   : The test script I used for benchmarking here.
build2.sh : A script I used to build the four types of binaries used here.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From dacf4a2ac9eb49099e744ee24066b94e9f78aa61 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 14 Nov 2019 19:24:36 +0900
Subject: [PATCH 1/4] catcache benchmark extension

Provides the function catcachebench(bench_no int), which runs a CPU-intensive benchmark on catcache search. The test table is created by a script separately provided.

catcachebench(0): prewarm catcache with provided test tables.
catcachebench(1): fetches all attribute stats of all tables.
This benchmark loads a vast number of unique entries. Expiration doesn't work since it runs in a transaction.

catcachebench(2): fetches all attribute stats of a table many times.
This benchmark repeatedly accesses already-loaded entries. Expiration doesn't work since it runs in a transaction.

catcachebench(3): fetches all attribute stats of all tables four times.
Different from the other modes, this runs expiration by forcibly updating the reference clock variable every 1000 entries.

At this point, the variables needed for the expiration feature are not added, so SetCatCacheClock is a dummy macro that just replaces it with its parameter.
---
 contrib/catcachebench/Makefile               |  17 +
 contrib/catcachebench/catcachebench--0.0.sql |  14 +
 contrib/catcachebench/catcachebench.c        | 330 +++
 contrib/catcachebench/catcachebench.control  |   6 +
 src/backend/utils/cache/catcache.c           |  35 ++
 src/backend/utils/cache/syscache.c           |   2 +-
 src/include/utils/catcache.h                 |   3 +
 7 files changed, 406 insertions(+), 1 deletion(-)
 create mode 100644 contrib/catcachebench/Makefile
 create mode 100644 contrib/catcachebench/catcachebench--0.0.sql
 create mode 100644 contrib/catcachebench/catcachebench.c
 create mode 100644 contrib/catcachebench/catcachebench.control
dif
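For reference, once the extension and the generated test tables are in place, the benchmark is driven from SQL; a minimal session might look like the following (timings are obviously machine-dependent):

  -- generate the test tables beforehand with: perl gentbl2.pl | psql
  CREATE EXTENSION catcachebench;
  SELECT catcachebench(0);   -- prewarm the catcache with the test tables
  SELECT catcachebench(2);   -- hammer already-loaded entries via SearchSysCache3()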
Improve errors when setting incorrect bounds for SSL protocols
Hi all, (Daniel G. in CC.) As discussed on the thread to be able to set the min/max SSL protocols with libpq, when mixing incorrect bounds the user experience is not that good: https://www.postgresql.org/message-id/9cfa34ee-f670-419d-b92c-cb7943a27...@yesql.se It happens that the error generated with incorrect combinations depends solely on what OpenSSL thinks is fine, and that's the following: psql: error: could not connect to server: SSL error: tlsv1 alert internal error It is hard for users to understand what such an error means and how to act on it. Please note that OpenSSL 1.1.0 has added two routines to be able to get the min/max protocols set in a context, called SSL_CTX_get_min/max_proto_version. Thinking about older versions of OpenSSL I think that it is better to use ssl_protocol_version_to_openssl to do the parsing work. I also found that it is easier to check for compatible versions after setting both bounds in the SSL context, so as there is no need to worry about invalid values depending on the build of OpenSSL used. So attached is a patch to improve the detection of incorrect combinations. Once applied, we get a complain about an incorrect version at server startup (FATAL) or reload (LOG). The patch includes new regression tests. Thoughts? -- Michael diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index 62f1fcab2b..75fdb2e91b 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -67,8 +67,7 @@ static bool SSL_initialized = false; static bool dummy_ssl_passwd_cb_called = false; static bool ssl_is_server_start; -static int ssl_protocol_version_to_openssl(int v, const char *guc_name, - int loglevel); +static int ssl_protocol_version_to_openssl(int v); #ifndef SSL_CTX_set_min_proto_version static int SSL_CTX_set_min_proto_version(SSL_CTX *ctx, int version); static int SSL_CTX_set_max_proto_version(SSL_CTX *ctx, int version); @@ -84,6 +83,8 @@ be_tls_init(bool isServerStart) { STACK_OF(X509_NAME) *root_cert_list = NULL; SSL_CTX*context; + int ssl_ver_min = -1; + int ssl_ver_max = -1; /* This stuff need be done only once. */ if (!SSL_initialized) @@ -192,13 +193,19 @@ be_tls_init(bool isServerStart) if (ssl_min_protocol_version) { - int ssl_ver = ssl_protocol_version_to_openssl(ssl_min_protocol_version, - "ssl_min_protocol_version", - isServerStart ? FATAL : LOG); + ssl_ver_min = ssl_protocol_version_to_openssl(ssl_min_protocol_version); - if (ssl_ver == -1) + if (ssl_ver_min == -1) + { + ereport(isServerStart ? FATAL : LOG, + (errmsg("%s setting %s not supported by this build", + "ssl_min_protocol_version", + GetConfigOption("ssl_min_protocol_version", + false, false; goto error; - if (!SSL_CTX_set_min_proto_version(context, ssl_ver)) + } + + if (!SSL_CTX_set_min_proto_version(context, ssl_ver_min)) { ereport(isServerStart ? FATAL : LOG, (errmsg("could not set minimum SSL protocol version"))); @@ -208,13 +215,19 @@ be_tls_init(bool isServerStart) if (ssl_max_protocol_version) { - int ssl_ver = ssl_protocol_version_to_openssl(ssl_max_protocol_version, - "ssl_max_protocol_version", - isServerStart ? FATAL : LOG); + ssl_ver_max = ssl_protocol_version_to_openssl(ssl_max_protocol_version); - if (ssl_ver == -1) + if (ssl_ver_max == -1) + { + ereport(isServerStart ? 
FATAL : LOG, + (errmsg("%s setting %s not supported by this build", + "ssl_max_protocol_version", + GetConfigOption("ssl_max_protocol_version", + false, false; goto error; - if (!SSL_CTX_set_max_proto_version(context, ssl_ver)) + } + + if (!SSL_CTX_set_max_proto_version(context, ssl_ver_max)) { ereport(isServerStart ? FATAL : LOG, (errmsg("could not set maximum SSL protocol version"))); @@ -222,6 +235,20 @@ be_tls_init(bool isServerStart) } } + /* Check compatibility of min/max protocols */ + if (ssl_min_protocol_version && + ssl_max_protocol_version) + { + /* + * No need to check for invalid values (-1) for each protocol number + * as the code above would have already generated an error. + */ + if (ssl_ver_min > ssl_ver_max) + ereport(isServerStart ? FATAL : LOG, + (errmsg("incompatible min/max SSL protocol versions set"))); + goto error; + } + /* disallow SSL session tickets */ SSL_CTX_set_options(context, SSL_OP_NO_TICKET); @@ -1275,12 +1302,11 @@ X509_NAME_to_cstring(X509_NAME *name) * guc.c independent of OpenSSL availability and version. * * If a version is passed that is not supported by the current OpenSSL - * version, then we log with the given loglevel and return (if we return) -1. - * If a nonnegative value is returned, subsequent code can assume it's working - * with a supported version. + * version, then we return -1. If a nonnegative value is returned, + * subsequen
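To illustrate the intended user-visible effect (a sketch; the exact message wording is whatever the patch finally uses), setting inverted bounds would now be reported by the server itself instead of surfacing as an opaque TLS alert on the client side:

  ALTER SYSTEM SET ssl_min_protocol_version = 'TLSv1.2';
  ALTER SYSTEM SET ssl_max_protocol_version = 'TLSv1.1';
  SELECT pg_reload_conf();
  -- server log on reload: LOG:  incompatible min/max SSL protocol versions set
  -- at server startup the same condition would be reported as FATAL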
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
On Sun, Jan 12, 2020 at 8:18 AM Amit Kapila wrote: > > On Sat, Jan 11, 2020 at 11:16 AM Tom Lane wrote: > > > > > * The seeming bug in v10 suggests that we aren't testing large enough > > logical-decoding cases, or at least aren't noticing leaks in that > > area. I'm not sure what a good design is for testing that. I'm not > > thrilled with just using a larger (and slower) test case, but it's > > not clear to me how else to attack it. > > > > It is not clear to me either at this stage, but I think we can decide > that after chasing the issue in v10. My current plan is to revert > this test and make a note of the memory leak problem found (probably > track in Older Bugs section of PostgreSQL 12 Open Items). > Pushed the revert and added an open item in the 'Older Bugs' section of PostgreSQL 12 Open Items [1]. [1] - https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Block level parallel vacuum
On Mon, 13 Jan 2020 at 12:50, Amit Kapila wrote: > > On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada > wrote: > > > > On Sat, 11 Jan 2020 at 13:18, Amit Kapila wrote: > > > > > > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada > > > wrote: > > > > > > > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor > > > > wrote: > > > > > > > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov wrote: > > > > > > > > > > > > Hi > > > > > > Thank you for update! I looked again > > > > > > > > > > > > (vacuum_indexes_leader) > > > > > > + /* Skip the indexes that can be processed by > > > > > > parallel workers */ > > > > > > + if (!skip_index) > > > > > > + continue; > > > > > > > > > > > > Does the variable name skip_index not confuse here? Maybe rename to > > > > > > something like can_parallel? > > > > > > > > > > I also agree with your point. > > > > > > > > I don't think the change is a good idea. > > > > > > > > - boolskip_index = > > > > (get_indstats(lps->lvshared, i) == NULL || > > > > - > > > > skip_parallel_vacuum_index(Irel[i], lps->lvshared)); > > > > + boolcan_parallel = > > > > (get_indstats(lps->lvshared, i) == NULL || > > > > + > > > > skip_parallel_vacuum_index(Irel[i], > > > > + > > > >lps->lvshared)); > > > > > > > > The above condition is true when the index can *not* do parallel index > > > > vacuum. How about changing it to skipped_index and change the comment > > > > to something like “We are interested in only index skipped parallel > > > > vacuum”? > > > > > > > > > > Hmm, I find the current code and comment better than what you or > > > Sergei are proposing. I am not sure what is the point of confusion in > > > the current code? > > > > Yeah the current code is also good. I just thought they were concerned > > that the variable name skip_index might be confusing because we skip > > if skip_index is NOT true. > > > > Okay, would it better if we get rid of this variable and have code like below? > > /* Skip the indexes that can be processed by parallel workers */ > if ( !(get_indstats(lps->lvshared, i) == NULL || > skip_parallel_vacuum_index(Irel[i], lps->lvshared))) > continue; Make sense to me. > ... > > > > > > > > > > > > > > > > > > > > > Another question about behavior on temporary tables. Use case: the > > > > > > user commands just "vacuum;" to vacuum entire database (and has > > > > > > enough maintenance workers). Vacuum starts fine in parallel, but on > > > > > > first temporary table we hit: > > > > > > > > > > > > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= > > > > > > 0) > > > > > > + { > > > > > > + ereport(WARNING, > > > > > > + (errmsg("disabling parallel option > > > > > > of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel", > > > > > > + > > > > > > RelationGetRelationName(onerel; > > > > > > + params->nworkers = -1; > > > > > > + } > > > > > > > > > > > > And therefore we turn off the parallel vacuum for the remaining > > > > > > tables... Can we improve this case? > > > > > > > > > > Good point. > > > > > Yes, we should improve this. I tried to fix this. > > > > > > > > +1 > > > > > > > > > > Yeah, we can improve the situation here. I think we don't need to > > > change the value of params->nworkers at first place if allow > > > lazy_scan_heap to take care of this. Also, I think we shouldn't > > > display warning unless the user has explicitly asked for parallel > > > option. See the fix in the attached patch. > > > > Agreed. 
But with the updated patch the PARALLEL option without the > > parallel degree doesn't display warning because params->nworkers = 0 > > in that case. So how about restoring params->nworkers at the end of > > vacuum_rel()? > > > > I had also thought on those lines, but I was not entirely sure about > this resetting of workers. Today, again thinking about it, it seems > the idea Mahendra is suggesting that is giving an error if the > parallel degree is not specified seems reasonable to me. This means > Vacuum (parallel), Vacuum (parallel) , etc. will give an > error "parallel degree must be specified". This idea has merit as now > we are supporting a parallel vacuum by default, so a 'parallel' option > without a parallel degree doesn't have any meaning. If we do that, > then we don't need to do anything additional about the handling of > temp tables (other than what patch is already doing) as well. What do > you think? > Good point! Agreed. Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.
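A sketch of the user-visible behavior being proposed here (the table name is arbitrary and the exact error wording is still open):

  VACUUM (PARALLEL 2) test_tbl;  -- explicit degree: index vacuum may use 2 workers
  VACUUM (PARALLEL) test_tbl;    -- would now fail: parallel degree must be specified
  VACUUM test_tbl;               -- parallel vacuum is chosen by default where it applies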
Re: [HACKERS] Block level parallel vacuum
On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor wrote: > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov wrote: > > > > Hi > > Thank you for update! I looked again > > > > (vacuum_indexes_leader) > > + /* Skip the indexes that can be processed by parallel > > workers */ > > + if (!skip_index) > > + continue; > > > > Does the variable name skip_index not confuse here? Maybe rename to > > something like can_parallel? > > > > Again I looked into code and thought that somehow if we can add a > boolean flag(can_parallel) in IndexBulkDeleteResult structure to > identify that this index is supporting parallel vacuum or not, then it > will be easy to skip those indexes and multiple time we will not call > skip_parallel_vacuum_index (from vacuum_indexes_leader and > parallel_vacuum_index) > We can have a linked list of non-parallel supported indexes, then > directly we can pass to vacuum_indexes_leader. > > Ex: let suppose we have 5 indexes into a table. If before launching > parallel workers, if we can add boolean flag(can_parallel) > IndexBulkDeleteResult structure to identify that this index is > supporting parallel vacuum or not. > Let index 1, 4 are not supporting parallel vacuum so we already have > info in a linked list that 1->4 are not supporting parallel vacuum, so > parallel_vacuum_index will process these indexes and rest will be > processed by parallel workers. If parallel worker found that > can_parallel is false, then it will skip that index. > > As per my understanding, if we implement this, then we can avoid > multiple function calling of skip_parallel_vacuum_index and if there > is no index which can't performe parallel vacuum, then we will not > call vacuum_indexes_leader as head of list pointing to null. (we can > save unnecessary calling of vacuum_indexes_leader) > > Thoughts? > We skip not only indexes that don't support parallel index vacuum but also indexes supporting it depending on vacuum phase. That is, we could skip different indexes at different vacuum phase. Therefore with your idea, we would need to have at least three linked lists for each possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is that right? I think we can check if there are indexes that should be processed by the leader process before entering the loop in vacuum_indexes_leader by comparing nindexes_parallel_XXX of LVParallelState to the number of indexes but I'm not sure it's effective since the number of indexes on a table should be small. Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Amcheck: do rightlink verification with lock coupling
Hi Peter! Sorry for taking so long to answer.

> On 11 Jan 2020, at 7:49, Peter Geoghegan wrote:
>
> I'm curious why Andrey's corruption problems were not detected by the
> cross-page amcheck test, though. We compare the first non-pivot tuple
> on the right sibling leaf page with the last one on the target page,
> towards the end of bt_target_page_check() -- isn't that almost as good
> as what you have here in practice? I probably would have added
> something like this myself earlier, if I had reason to think that
> verification would be a lot more effective that way.

We were dealing with corruption caused by a lost page update. Consider two pages: A->B. If A is split into A` and C we get: A`->C->B. But if the update of A is lost we still have A->B, while B's backward pointer points to C. B's smallest key is bigger than the high key of A`, but this does not violate the cross-page invariant.

Page updates may be lost due to a bug in backup software with incremental backups, a bug in the storage layer of an Aurora-style system, a bug in the page cache, incorrect fsync error handling, a bug in SSD firmware, etc. And our data checksums do not detect this kind of corruption. BTW, I think that if our checksums were not stored on the page itself, they could detect this kind of fault.

We were able to detect all those problems on primaries in our testing environment in a timely manner, but found out much later that some standbys were corrupted; the problem appeared only when they were promoted. Also, in a nearby thread Grygory Rylov (g0djan) is trying to enable one more invariant check on standbys.

Thanks!

Best regards, Andrey Borodin.
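For context, a rough sketch of how these checks are typically invoked (the index name is hypothetical; which checks are allowed on a standby depends on the outcome of this and the nearby thread):

  CREATE EXTENSION IF NOT EXISTS amcheck;
  -- Lightweight check taking only an AccessShareLock; the proposed
  -- rightlink/lock-coupling verification would extend this path.
  SELECT bt_index_check('my_btree_index'::regclass);
  -- Stronger parent/child verification, requiring a heavier lock.
  SELECT bt_index_parent_check('my_btree_index'::regclass);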
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Jan 13, 2020 at 3:18 PM Dilip Kumar wrote: > > On Thu, Jan 9, 2020 at 12:09 PM Amit Kapila wrote: > > > > On Thu, Jan 9, 2020 at 10:30 AM Dilip Kumar wrote: > > > > > > On Thu, Jan 9, 2020 at 9:35 AM Amit Kapila > > > wrote: > > > > > > > > > The problem is that when we > > > > > get a toasted chunks we remember the changes in the memory(hash table) > > > > > but don't stream until we get the actual change on the main table. > > > > > Now, the problem is that we might get the change of the toasted table > > > > > and the main table in different streams. So basically, in a stream, > > > > > if we have only got the toasted tuples then even after > > > > > ReorderBufferStreamTXN the memory usage will not be reduced. > > > > > > > > > > > > > I think we can't split such changes in a different stream (unless we > > > > design an entirely new solution to send partial changes of toast > > > > data), so we need to send them together. We can keep a flag like > > > > data_complete in ReorderBufferTxn and mark it complete only when we > > > > are able to assemble the entire tuple. Now, whenever, we try to > > > > stream the changes once we reach the memory threshold, we can check > > > > whether the data_complete flag is true Here, we can also consider streaming the changes when data_complete is false, but some additional changes have been added to the same txn as the new changes might complete the tuple. > > > > , if so, then only send the > > > > changes, otherwise, we can pick the next largest transaction. I think > > > > we can retry it for few times and if we get the incomplete data for > > > > multiple transactions, then we can decide to spill the transaction or > > > > maybe we can directly spill the first largest transaction which has > > > > incomplete data. > > > > > > > Yeah, we might do something on this line. Basically, we need to mark > > > the top-transaction as data-incomplete if any of its subtransaction is > > > having data-incomplete (it will always be the latest sub-transaction > > > of the top transaction). Also, for streaming, we are checking the > > > largest top transaction whereas for spilling we just need the larget > > > (sub) transaction. So we also need to decide while picking the > > > largest top transaction for streaming, if we get a few transactions > > > with in-complete data then how we will go for the spill. Do we spill > > > all the sub-transactions under this top transaction or we will again > > > find the larget (sub) transaction for spilling. > > > > > > > I think it is better to do later as that will lead to the spill of > > only required (minimum changes to get the memory below threshold) > > changes. > I think instead of doing this can't we just spill the changes which > are in toast_hash. Basically, at the end of the stream, we have some > toast tuple which we could not stream because we did not have the > insert for the main table then we can spill only those changes which > are in tuple hash. > Hmm, I think this can turn out to be inefficient because we can easily end up spilling the data even when we don't need to so. Consider cases, where part of the streamed changes are for toast, and remaining are the changes which we would have streamed and hence can be removed. In such cases, we could have easily consumed remaining changes for toast without spilling. 
Also, I am not sure if spilling changes from the hash table is a good idea, as they are no longer in the same order as they were in ReorderBuffer, which means the order in which we would normally serialize the changes would change, and that might have some impact. So we would need some more study if we want to pursue this idea.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
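For concreteness, the scenario is the one produced by a transaction like the following (hypothetical table): the TOAST chunk insertions are decoded as separate changes that only become usable once the change on the main table arrives, so a stream boundary can fall between them.

  CREATE TABLE t (id int PRIMARY KEY, payload text);
  BEGIN;
  -- Wide enough to be TOASTed; its chunks enter the reorder buffer
  -- before the INSERT on the main table itself.
  INSERT INTO t VALUES (1, repeat('x', 1000000));
  COMMIT;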
Re: Add pg_file_sync() to adminpack
On Mon, Jan 13, 2020 at 03:39:32PM +0100, Julien Rouhaud wrote:
> Actually, can't it create a security hazard, for instance if you call
> pg_file_sync() on a heap file and the call errors out, since it's
> bypassing data_sync_retry?

Are you mistaking security for durability here? By default, the function proposed is only executable by a superuser, so that's not really a security concern. But I agree that failing to PANIC on an fsync failure for a sensitive Postgres file could lead to corruption. That's why we PANIC these days.

--
Michael
Re: Add pg_file_sync() to adminpack
On Tue, Jan 14, 2020 at 7:18 AM Michael Paquier wrote:
>
> On Mon, Jan 13, 2020 at 03:39:32PM +0100, Julien Rouhaud wrote:
> > Actually, can't it create a security hazard, for instance if you call
> > pg_file_sync() on a heap file and the call errors out, since it's
> > bypassing data_sync_retry?
>
> Are you mistaking security for durability here?

Yes, data durability, sorry.

> By default, the
> function proposed is only executable by a superuser, so that's not
> really a security concern. But I agree that failing to PANIC on an
> fsync failure for a sensitive Postgres file could lead to corruption.
> That's why we PANIC these days.

Exactly. My concern is that some superusers may not be aware that pg_file_sync could actually corrupt data, so there should be a big red warning explaining that.
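For reference, usage would look roughly like this (a sketch: the path is made up, and the exact signature and error semantics are whatever the adminpack patch finally defines):

  CREATE EXTENSION IF NOT EXISTS adminpack;
  -- Superuser-only by default; flushes the given file (or directory) to disk.
  SELECT pg_file_sync('postgresql.auto.conf');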
Re: base backup client as auxiliary backend process
On Sat, 11 Jan 2020 at 18:52, Peter Eisentraut wrote:
>
> On 2020-01-10 04:32, Masahiko Sawada wrote:
> > I agreed that these patches are useful on its own and 0001 patch and
>
> committed 0001
>
> > 0002 patch look good to me. For 0003 patch,
> >
> > + linkend="guc-primary-slot-name"/>. Otherwise, the WAL receiver may
> > use
> > + a temporary replication slot (determined by
> > + linkend="guc-wal-receiver-create-temp-slot"/>), but these are not
> > shown
> > + here.
> >
> > I think it's better to show the temporary slot name on
> > pg_stat_wal_receiver view. Otherwise user would have no idea about
> > what wal receiver is using the temporary slot.
>
> Makes sense. It makes the code a bit more fiddly, but it seems worth
> it. New patches attached.

Thank you for updating the patch!

-     Replication slot name used by this WAL receiver
+
+     Replication slot name used by this WAL receiver. This is only set if a
+     permanent replication slot is set using <xref linkend="guc-primary-slot-name"/>.
+     Otherwise, the WAL receiver may use a temporary replication slot
+     (determined by <xref linkend="guc-wal-receiver-create-temp-slot"/>),
+     but these are not shown here.
+

Now that the slot name is shown even if it's a temporary slot, the above documentation change needs to be updated. Other changes look good to me.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
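A quick way to see the effect of the change on a standby once the patch is applied (a sketch; only a subset of the view's columns is shown):

  -- slot_name is now populated even when the WAL receiver is using a
  -- temporary slot rather than one named by primary_slot_name.
  SELECT pid, status, slot_name, sender_host, sender_port
    FROM pg_stat_wal_receiver;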
Re: Implementing Incremental View Maintenance
On Sat, 11 Jan 2020 09:27:58 +0900 nuko yokohama wrote: > LIMIT clause without ORDER BY should be prohibited when creating > incremental materialized views. > > In SQL, the result of a LIMIT clause without ORDER BY is undefined. > If the LIMIT clause is allowed when creating an incremental materialized > view, incorrect results will be obtained when the view is updated after > updating the source table. Thank you for your advice. It's just as you said. LIMIT/OFFSET clause should is prohibited. We will add this to next patch. Best Regards, Takuma Hoshiai > > ``` > [ec2-user@ip-10-0-1-10 ivm]$ psql --version > psql (PostgreSQL) 13devel-ivm-3bf6953688153fa72dd48478a77e37cf3111a1ee > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f limit-problem.sql > DROP TABLE IF EXISTS test CASCADE; > psql:limit-problem.sql:1: NOTICE: drop cascades to materialized view > test_imv > DROP TABLE > CREATE TABLE test (id int primary key, data text); > CREATE TABLE > INSERT INTO test VALUES (generate_series(1, 10), 'foo'); > INSERT 0 10 > CREATE INCREMENTAL MATERIALIZED VIEW test_imv AS SELECT * FROM test LIMIT 1; > SELECT 1 >Materialized view "public.test_imv" > Column | Type | Collation | Nullable | Default | Storage | > Stats target | Description > ---+-+---+--+-+--+--+- > id| integer | | | | plain| > | > data | text| | | | extended | > | > __ivm_count__ | bigint | | | | plain| > | > View definition: > SELECT test.id, > test.data >FROM test > LIMIT 1; > Access method: heap > Incremental view maintenance: yes > > SELECT * FROM test LIMIT 1; > id | data > +-- > 1 | foo > (1 row) > > TABLE test_imv; > id | data > +-- > 1 | foo > (1 row) > > UPDATE test SET data = 'bar' WHERE id = 1; > UPDATE 1 > SELECT * FROM test LIMIT 1; > id | data > +-- > 2 | foo > (1 row) > > TABLE test_imv; > id | data > +-- > 1 | bar > (1 row) > > DELETE FROM test WHERE id = 1; > DELETE 1 > SELECT * FROM test LIMIT 1; > id | data > +-- > 2 | foo > (1 row) > > TABLE test_imv; > id | data > +-- > (0 rows) > ``` > > ORDER BY clause is not allowed when executing CREATE INCREMENTAL > MATELIARIZED VIEW. > We propose not to allow LIMIT clauses as well. > > > 2018年12月27日(木) 21:57 Yugo Nagata : > > > Hi, > > > > I would like to implement Incremental View Maintenance (IVM) on > > PostgreSQL. > > IVM is a technique to maintain materialized views which computes and > > applies > > only the incremental changes to the materialized views rather than > > recomputate the contents as the current REFRESH command does. > > > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > > [1]. > > Our implementation uses row OIDs to compute deltas for materialized > > views. > > The basic idea is that if we have information about which rows in base > > tables > > are contributing to generate a certain row in a matview then we can > > identify > > the affected rows when a base table is updated. This is based on an idea of > > Dr. Masunaga [2] who is a member of our group and inspired from ID-based > > approach[3]. > > > > In our implementation, the mapping of the row OIDs of the materialized view > > and the base tables are stored in "OID map". When a base relation is > > modified, > > AFTER trigger is executed and the delta is recorded in delta tables using > > the transition table feature. The accual udpate of the matview is triggerd > > by REFRESH command with INCREMENTALLY option. > > > > However, we realize problems of our implementation. First, WITH OIDS will > > be removed since PG12, so OIDs are no longer available. 
Besides this, it > > would > > be hard to implement this since it needs many changes of executor nodes to > > collect base tables's OIDs during execuing a query. Also, the cost of > > maintaining > > OID map would be high. > > > > For these reasons, we started to think to implement IVM without relying on > > OIDs > > and made a bit more surveys. > > > > We also looked at Kevin Grittner's discussion [4] on incremental matview > > maintenance. In this discussion, Kevin proposed to use counting algorithm > > [5] > > to handle projection views (using DISTNICT) properly. This algorithm need > > an > > additional system column, count_t, in materialized views and delta tables > > of > > base tables. > > > > However, the discussion about IVM is now stoped, so we would like to > > restart and > > progress this. > > > > > > Through our PoC inplementation and surveys, I think we need to think at > > least > > the followings for implementing IVM. > > > > 1. How to extract changes on base tables > > > > I think there would be at least two approaches for it. > > > > - Using transition table in AFTER trigge