Re: Global temporary tables

2019-08-18 Thread Konstantin Knizhnik



On 16.08.2019 20:17, Pavel Stehule wrote:



On Fri, Aug 16, 2019 at 16:12, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:


I did more investigations of performance of global temp tables
with shared buffers vs. vanilla (local) temp tables.

1. Combination of persistent and temporary tables in the same query.

Preparation:
create table big(pk bigint primary key, val bigint);
insert into big values
(generate_series(1,1),generate_series(1,1));
create temp table lt(key bigint, count bigint);
create global temp table gt(key bigint, count bigint);

The size of the table is about 6GB. I ran this test on a desktop with
16GB of RAM and Postgres with 1GB of shared buffers.
I ran two queries:

insert into T (select count(*),pk/P as key from big group by key);
select sum(count) from T;

where P is (100,10,1) and T is name of temp table (lt or gt).
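For concreteness, with P = 100 (which corresponds to the 1% column,
since grouping by pk/P leaves one row per 100 rows of big) and the
local table, the two timed statements are just the template above
instantiated:

insert into lt (select count(*),pk/100 as key from big group by key);
select sum(count) from lt;
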
The table below contains times of both queries in msec:

Percent of selected data:    1%             10%            100%
Local temp table        44610 / 90    47920 / 891    63414 / 21612
Global temp table       44669 / 35    47939 / 298    59159 / 26015
(each cell shows insert time / select time)


As you can see, the time of insertion into the temporary table is almost
the same, and the time of traversal of the temporary table is about half
for the global temp table when it fits in RAM together with the
persistent table, and slightly worse when it doesn't fit.



2. Temporary table only access.
The same system, but Postgres is configured with
shared_buffers=10GB, max_parallel_workers = 4,
max_parallel_workers_per_gather = 4

Local temp tables:
create temp table local_temp(x1 bigint, x2 bigint, x3 bigint, x4
bigint, x5 bigint, x6 bigint, x7 bigint, x8 bigint, x9 bigint);
insert into local_temp values
(generate_series(1,1),0,0,0,0,0,0,0,0);
select sum(x1) from local_temp;

Global temp tables:
create global temporary table global_temp(x1 bigint, x2 bigint, x3
bigint, x4 bigint, x5 bigint, x6 bigint, x7 bigint, x8 bigint, x9
bigint);
insert into global_temp values
(generate_series(1,1),0,0,0,0,0,0,0,0);
select sum(x1) from global_temp;

Results (msec):

                      Insert    Select
Local temp table      37489     48322
Global temp table     44358      3003


So insertion into the local temp table is slightly faster, but the
select is 16 times slower!

Conclusion:
Assuming the temp table fits in memory, global temp tables with
shared buffers provide better performance than local temp tables.
I didn't consider global temp tables with local buffers here
because their results should be similar to those of local temp tables.


There is probably no reason why shared buffers should be slower
than local buffers when the system is under low load.


Access to shared memory is protected by spinlocks (which are cheap for a
few processes), so tests with one or a few processes are not too
important (or they show just one side of the story).


Another topic could be performance on MS systems - there are stories about
less than perfect performance of shared memory there.


Regards

Pavel

One more test, which is used to simulate access to temp tables under
high load.

I am using "upsert" into a temp table from multiple connections.

create global temp table gtemp (x integer primary key, y bigint);

upsert.sql:
insert into gtemp values (random() * 100, 0) on conflict(x) do 
update set y=gtemp.y+1;


pgbench -c 10 -M prepared -T 100 -P 1 -n -f upsert.sql postgres


I failed to find a standard way in pgbench to perform per-session
initialization to create the local temp table,

so I just inserted this code into pgbench:

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 570cf33..af6a431 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5994,6 +5994,7 @@ threadRun(void *arg)
    {
    if ((state[i].con = doConnect()) == NULL)
    goto done;
+   executeStatement(state[i].con, "create temp table ltemp(x integer primary key, y bigint)");

    }
    }


Results are the following:
Global temp table: 117526 TPS
Local temp table:   107802 TPS


So even for this workload, global temp tables with shared buffers are a
little bit faster.

I will be pleased if you can propose some other testing scenario.



Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements

2019-08-18 Thread Daniel Migowski

Am 17.08.2019 um 19:10 schrieb Ibrar Ahmed:
On Sat, Aug 17, 2019 at 6:58 PM Daniel Migowski wrote:



attached you find a patch that adds a new GUC:


Quick questions before looking at the patch.


prepared_statement_limit:

 - Do we have a consensus about the name of GUC? I don't think it is
the right name for that.
No, it is a proposal. It could also be named plancache_mem or
cachedplansource_maxmem or anything else. It was intended to keep
prepared statements from using up all my memory, but development has
shown that it could also be used for other CachedPlans, as long as it is
a saved plan.
 - Is this a WIP patch or the final patch? Because I can see TODO and 
non-standard

comments in the patch.


Definitely work in progress! The current implementation seems to work
for me but might be improved; I wanted some input from the mailing
list before I optimize things.


The most important question is whether such a feature would find some love
here. Personally, it is essential for me because a single prepared
statement uses up to 45MB in my application, and there were cases where
ORM-generated prepared statements would crash my server after some time.
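To illustrate the kind of unbounded growth such a limit is meant to cap,
here is a minimal sketch (not from the patch; the statement text and
count are made up) that fills a session's plan cache with saved prepared
statements - the backend's resident memory keeps growing until
DEALLOCATE ALL or session end:

DO $$
BEGIN
    FOR i IN 1..10000 LOOP
        EXECUTE format(
            'PREPARE stmt_%s (oid) AS SELECT * FROM pg_class WHERE oid = $1',
            i);
    END LOOP;
END $$;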


Then I would like to know if the current implementation would at least
not crash (even if it might be a bit slow) or if I have to take more care
about locking in some places. I believe a backend is a single thread of
execution, but there were notes about invalidation messages that seem to
be run asynchronously to the main thread. Would someone care to explain
the threading model of a single backend to me? Does it react to signals,
and what would those signals change? Can I assume that only
ResetPlanCache, PlanCacheRelCallback and PlanCacheObjectCallback will be
called asynchronously?



     Specifies the maximum amount of memory used in each session to cache
     parsed-and-rewritten queries and execution plans. This affects the
     maximum memory a backend will reserve when many prepared statements
     are used. The default value of 0 disables this setting, but it is
     recommended to set this value a bit lower than the maximum memory a
     backend worker should reserve permanently.

If the GUC is configured, the function EnforcePreparedStatementLimit is
now called after each save of a CachedPlanSource and after creating a
CachedPlan from it. It checks the memory usage of the existing saved
CachedPlanSources and invalidates the query_list and the gplan, if
available, until the memory limit is met again.

CachedPlanSources are removed and re-added at the tail of the
saved_plan_list every time GetCachedPlan is called on them, so it can be
used as an LRU list.


This could be a single move-to-tail function, which I would add.


I also reworked ResetPlanCache, PlanCacheRelCallback and
PlanCacheObjectCallback a bit so when a CachedPlanSource is
invalidated
the query_list is not only marked as invalid but it is also fully
released to free memory here.

Because this seems to work I was able to reuse the new ReleaseQueryList 
in my implementation.



Regards,
Daniel Migowski

PS@Konstantin: This patch also includes the CachedPlanMemoryUsage
function you like; maybe you would like to review the patch for me?


Daniel Migowski


Re: Global temporary tables

2019-08-18 Thread Pavel Stehule
On Sun, Aug 18, 2019 at 9:02, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> [...]
>
> pgbench -c 10 -M prepared -T 100 -P 1 -n -f upsert.sql postgres
>
> Results are the following:
> Global temp table: 117526 TPS
> Local temp table:   107802 TPS
>
> So even for this workload, global temp tables with shared buffers are a
> little bit faster.
> I will be pleased if you can propose some other testing scenario.
>

Please try to increase the number of connections.

Regards

Pavel


Allow to_date() and to_timestamp() to accept localized names

2019-08-18 Thread Juan José Santamaría Flecha
Hello,

Going through the current items in the wiki's todo list, I have been
looking into: "Allow to_date () and to_timestamp () to accept
localized month names".

It seems to me that the code is almost there, so I would like to know
if something like the attached patch would be a reasonable way to go.
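
To make the intent concrete, with such a patch applied something along
these lines should work (just a sketch; the locale name and the use of
the TM prefix for localized input are my assumptions, not taken from
the patch):

SET lc_time = 'es_ES.UTF-8';
SELECT to_date('31 diciembre 2019', 'DD TMMonth YYYY');
-- expected result: 2019-12-31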

Regards,

Juan José Santamaría Flecha


0001-Allow-localized-month-names-to_date.patch
Description: Binary data


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-18 Thread Peter Eisentraut
On 2019-08-17 08:16, Antonin Houska wrote:
> One problem that occurs to me is that PG may need to send some sort of
> credentials to the KMS. If it runs a separate process to execute the command,
> it needs to pass those credentials to it. Whether it does so via parameters or
> environment variables, both can be seen by other users.

You could do it via stdin or a file, perhaps.

Where would the PostgreSQL server ultimately get the KMS credentials from?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Implement INSERT SET syntax

2019-08-18 Thread Peter Eisentraut
On 2019-08-16 05:19, Amit Kapila wrote:
> I think this can be a handy feature in some cases as pointed by you,
> but do we really want it for PostgreSQL?  In the last round of
> discussions as pointed by you, there doesn't seem to be a consensus
> that we want this feature.  I guess before spending too much time into
> reviewing this feature, we should first build a consensus on whether
> we need this.

I think the problem this is attempting to solve is valid.

What I don't like about the syntax is that it kind of breaks the
notional processing model of INSERT in a fundamental way.  The model is

INSERT INTO $target $table_source

where $table_source could be VALUES, SELECT, possibly others in theory.

The proposed syntax changes this to only allow a single row to be
specified via the SET syntax, and the SET syntax does not function as a
row or table source in other contexts.

Let's think about how we can achieve this using existing concepts in
SQL.  What we really need here at a fundamental level is an option to
match $target to $table_source by column *name* rather than column
*position*.  There is existing syntax in SQL for that, namely

a UNION b

vs

a UNION CORRESPONDING b

I think this could be used for INSERT as well.

And then you need a syntax to assign column names inside the VALUES
rows.  I think you could do either of the following:

VALUES (a => 1, b => 2)

or

VALUES (1 AS a, 2 AS b)

Another nice effect of this would be that you could do something like

INSERT INTO tbl2 CORRESPONDING SELECT * FROM tbl1;

which copies the contents of tbl1 to tbl2 if they have the same column
names but allowing for a different column order.
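
For comparison, without CORRESPONDING the same copy has to be spelled
today with an explicit, order-matching column list (assuming, say,
columns a, b, c in both tables):

INSERT INTO tbl2 (a, b, c)
SELECT a, b, c FROM tbl1;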

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Creating partitions automatically at least on HASH?

2019-08-18 Thread Fabien COELHO


Hello Robert & Robert,


- no partitions are created immediately (current case)
  but will have to be created manually later

- static partitions are created automatically, based on provided
  parameters

- dynamic partitions will be created later, when needed, based
  on provided parameters again.

Even if all that is not implemented immediately.

We need something that will let you specify just a modulus for hash 
partitions, a start, end, and interval for range partitions, and a list of 
bounds for list partitions.  If we're willing to create a new keyword, we 
could make PARTITIONS a keyword. Then:


PARTITION BY HASH (whatever) PARTITIONS 8


I think that it should reuse already existing keywords, i.e. MODULUS should 
appear somewhere.
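
For reference, this is the manual boilerplate that automatic creation
would generate, and where MODULUS already appears today (hash
partitioning, 8 partitions):

CREATE TABLE stuff (whatever int) PARTITION BY HASH (whatever);
CREATE TABLE stuff_0 PARTITION OF stuff
  FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE stuff_1 PARTITION OF stuff
  FOR VALUES WITH (MODULUS 8, REMAINDER 1);
-- ... and so on up to REMAINDER 7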


Maybe:

 ... PARTITION BY HASH (whatever)
 [ CREATE [IMMEDIATE | DEFERRED] PARTITIONS (MODULUS 8) |
   NOCREATE or maybe NO CREATE ];


I have given a small go at the parser part of that.

There are 3 types of partitions with 3 dedicated syntax structures to
handle their associated parameters (WITH …, FROM … TO …, IN …). ISTM that
it is a "looks good from far away" idea, but when trying to extend it, it
starts to be a pain. If a 4th partition type is added, should it be
yet another syntax? So I'm looking for a generic and extensible syntax
that could accommodate all cases for automatic creation of partitions.


Second problem: adding a "CREATE" after "PARTITION BY … (…)" creates
shift-reduce conflicts with some other potential CREATE TABLE option
specification syntax. Not sure which one, but anyway. So the current
generic syntax I'm considering uses "DO" as a trigger to start the
optional automatic partition creation stuff:


  CREATE TABLE Stuff (...)
PARTITION BY [HASH | RANGE | LIST] (…)
  DO NONE -- this is the default
  DO [IMMEDIATE|DEFERRED] USING (…)

Where the USING part would be generic keword value pairs, eg:

For HASH: (MODULUS 8) and/or (NPARTS 10)

For RANGE: (START '1970-01-01', STOP '2020-01-01', INCREMENT '1 year')
and/or (START 1970, STOP 2020, NPARTS 50)

And possibly for LIST: (IN (…), IN (…), …), or possibly some other 
keyword.


The "DEFERRED" could be used as an open syntax for dynamic partitioning, 
if later someone would feel like doing it.


ISTM that "USING" is better than "WITH" because WITH is already used 
specifically for HASH and other optional stuff in CREATE TABLE.


The text constant would be interpreted depending on the partitioning 
expression/column type.


Any opinion about the overall approach?

--
Fabien.

Re: [PATCH] Implement INSERT SET syntax

2019-08-18 Thread Vik Fearing
On 18/08/2019 11:03, Peter Eisentraut wrote:
>
> a UNION b
>
> vs
>
> a UNION CORRESPONDING b


I have a WIP patch for CORRESPONDING [BY].  Is there any interest in me
continuing it?  If so, I'll start another thread for it.

-- 

Vik Fearing





Re: [PATCH] Implement INSERT SET syntax

2019-08-18 Thread Tom Lane
Vik Fearing  writes:
> On 18/08/2019 11:03, Peter Eisentraut wrote:
>> a UNION b
>> vs
>> a UNION CORRESPONDING b

> I have a WIP patch for CORRESPONDING [BY].  Is there any interest in me
> continuing it?  If so, I'll start another thread for it.

CORRESPONDING is in the SQL standard, so in theory we ought to provide
it.  I think the hard question is how big/complicated the patch would be
--- if the answer is "complicated", maybe it's not worth it.  People
have submitted patches for it before that didn't go anywhere, suggesting
that the tradeoffs are not very good ... but maybe you'll think of a
better way.

regards, tom lane




Re: [PATCH] Implement INSERT SET syntax

2019-08-18 Thread Tom Lane
Peter Eisentraut  writes:
> What I don't like about the syntax is that it kind of breaks the
> notional processing model of INSERT in a fundamental way.

Agreed.  I really don't like that this only works for a VALUES-like case
(and only the one-row form at that).  It's hard to see it as anything
but a wart pasted onto the syntax.

> Let's think about how we can achieve this using existing concepts in
> SQL.  What we really need here at a fundamental level is an option to
> match $target to $table_source by column *name* rather than column
> *position*.  There is existing syntax in SQL for that, namely
> a UNION b
> vs
> a UNION CORRESPONDING b

A potential issue here --- and something that applies to Vik's question
as well, now that I think about it --- is that CORRESPONDING breaks down
in the face of ALTER TABLE RENAME COLUMN.  Something that had been a
legal query before the rename might be invalid, or mean something quite
different, afterwards.  This is really nasty for stored views/rules,
because we have neither a mechanism for forbidding input-table renames
nor a mechanism for revalidating views/rules afterwards.  Maybe we could
make it go by resolving CORRESPONDING in the rewriter or planner, rather
than in parse analysis; but that seems quite unpleasant as well.
Changing our conclusions about the data types coming out of a UNION
really shouldn't happen later than parse analysis.

The SET-style syntax doesn't have that problem, since it's explicit
about which values go into which columns.

Perhaps the way to resolve Peter's objection is to make the syntax
more fully like UPDATE:

INSERT INTO target SET c1 = x, c2 = y+z, ... FROM tables-providing-x-y-z

(with the patch as-submitted corresponding to the case with an empty
FROM clause, hence no variables in the expressions-to-be-assigned).

Of course, this is not functionally distinct from

INSERT INTO target(c1,c2,...) SELECT x, y+z, ... FROM tables-providing-x-y-z

and it's fair to question whether it's worth supporting a nonstandard
syntax just to allow the target column names to be written closer to
the expressions-to-be-assigned.

regards, tom lane




Re: pgsql: doc: Add some images

2019-08-18 Thread Jürgen Purtz

On 16.08.19 23:00, Alvaro Herrera wrote:

On 2019-Mar-27, Peter Eisentraut wrote:


doc: Add some images

Add infrastructure for having images in the documentation, in SVG
format.  Add two images to start with.  See the included README file
for instructions.
Author: Jürgen Purtz 
Author: Peter Eisentraut 

Now when I test Jürgen's new proposed image genetic-algorithm I find
that this stuff doesn't work in VPATH builds, at least for PDF -- I
don't get a build failure, but instead I get just a section title that
doesn't precede any actual image.  (There's a very small warning hidden
in the tons of other fop output).  If I edit the .fo file by hand to
make the path to .svg absolute, the image appears correctly.

I don't see any way in the fop docs to specify the base path for images.

I'm not sure what's a good way to fix this problem in a general way.
Would some new rule in the xslt would work?



Hello Alvaro,

it is possible that you face the following situation: the image
subdirectory contains all ditaa and graphviz source files, but not all
corresponding svg files. Those svg files are created by the Makefile of
this subdirectory and should be included in git (and in patches - which
was not the case in one of my patches).


Can you confirm that this is your starting situation when the graphic is
missing in the PDF? If not, please give us more information: operating
system, ... . If yes, we have the following options:


a) Make sure that all svg files exist in addition to the original
source files (as it was originally planned)


b) Run make in the images subdirectory manually

c) Append a line "    $(MAKE) -C images" to the Makefile of the sgml
directory for the PDF, HTML and EPUB targets to check the dependencies
within the images subdirectory.


Kind regards, Jürgen






[PATCH] ltree, lquery, and ltxtquery binary protocol support

2019-08-18 Thread Nino Floris
Hi,

Attached is a patch to support send/recv on ltree, lquery and ltxtquery.
I'm a contributor to the Npgsql .NET PostgreSQL driver and we'll be
the first to have official support once those ltree changes have been
released.
You can find the driver support work here; the tests verify that a
roundtrip for each of the types is successful.
https://github.com/NinoFloris/npgsql/tree/label-tree-support

Any feedback would be highly appreciated.

Greetings,
Nino Floris


0001-Implements-ltree-lquery-and-ltxtquery-binary-protoco.patch
Description: Binary data


Re: [PATCH] minor doc fix for create-role

2019-08-18 Thread Tom Lane
t...@anne.cat writes:
> Attached is a minor patch to fix the name param documentation for create 
> role, just adding a direct quote from user-manag.sgml talking about what the 
> role name is allowed to be.  I was searching for this information and figured 
> the reference page should have it as well.

Hm, I guess my reaction to this proposal is "why just here?".  We have
an awful lot of different CREATE commands, and none of them say more
about the target name than this one does.  (Not to mention ALTER, DROP,
etc.)  Perhaps it's worth adding some boilerplate text to all those
places, but I'm dubious.

Also, the specific text proposed for addition doesn't seem that helpful,
since it doesn't define which characters are "special characters".
I'd rather see something like "The name must be a valid SQL identifier
as defined in ."  But, while that would work fine
in HTML output, it would not be particularly short or useful in man-page
output.

Perhaps the ideal solution would be further markup on the synopsis
sections that somehow identifies each term as an "identifier" or
other appropriate syntactic category, and provides a hyperlink to
a definition (in output formats that are friendly to that).  Seems
like a lot of work though :-(

regards, tom lane




Re: Unused header file inclusion

2019-08-18 Thread Tom Lane
Andres Freund  writes:
> I've pushed the other ones.

Checking whether header files compile standalone shows you were overly
aggressive about removing fmgr.h includes:

In file included from /tmp/headerscheck.Ss8bVx/test.c:3:
./src/include/utils/selfuncs.h:143: error: expected declaration specifiers or 
'...' before 'FmgrInfo'
./src/include/utils/selfuncs.h:146: error: expected declaration specifiers or 
'...' before 'FmgrInfo'
./src/include/utils/selfuncs.h:152: error: expected declaration specifiers or 
'...' before 'FmgrInfo'

That's with a script I use that's like cpluspluscheck except it tests
with plain gcc not g++.  I attached it for the archives' sake.

Oddly, cpluspluscheck does not complain about those cases, but it
does complain about

In file included from /tmp/cpluspluscheck.FgX2SW/test.cpp:4:
./src/bin/scripts/scripts_parallel.h:18: error: ISO C++ forbids declaration of 
'PGconn' with no type
./src/bin/scripts/scripts_parallel.h:18: error: expected ';' before '*' token
./src/bin/scripts/scripts_parallel.h:29: error: 'PGconn' has not been declared

(My headerscheck script is missing that header; I need to update it to
match the latest version of cpluspluscheck.)

regards, tom lane

#!/bin/sh

# Check all exported PostgreSQL include files for standalone build.

# Run this from the top-level source directory after performing a build.
# No output if everything is OK, else compiler errors.

me=`basename $0`

tmp=`mktemp -d /tmp/$me.XXXXXX`

trap 'rm -rf $tmp' 0 1 2 3 15

# This should match what configure chooses, though we only care about -Wxxx
CFLAGS="-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -D_GNU_SOURCE -I . -I src/include -I src/interfaces/libpq $CFLAGS"

# Omit src/include/port/, because it's platform specific, and c.h includes
# the relevant file anyway.
# rusagestub.h is also platform-specific, and will be included by
# utils/pg_rusage.h if necessary.
# access/rmgrlist.h is not meant to be included standalone.
# regex/regerrs.h is not meant to be included standalone.
# parser/gram.h will be included by parser/gramparse.h.
# parser/kwlist.h is not meant to be included standalone.
# pg_trace.h and utils/probes.h can include sys/sdt.h from SystemTap,
# which itself contains C++ code and so won't compile with a C++
# compiler under extern "C" linkage.

for f in `find src/include src/interfaces/libpq/libpq-fe.h src/interfaces/libpq/libpq-events.h -name '*.h' -print | \
grep -v -e ^src/include/port/ \
	-e ^src/include/rusagestub.h -e ^src/include/regex/regerrs.h \
	-e ^src/include/access/rmgrlist.h \
	-e ^src/include/parser/gram.h -e ^src/include/parser/kwlist.h \
	-e ^src/include/pg_trace.h -e ^src/include/utils/probes.h`
do
	{
	test $f != "src/include/postgres_fe.h" && echo '#include "postgres.h"'
	echo "#include \"$f\""
	} >$tmp/test.c

	${CC:-gcc} $CFLAGS -c $tmp/test.c -o $tmp/test.o
done


Re: Rethinking opclass member checks and dependency strength

2019-08-18 Thread Tom Lane
Alexander Korotkov  writes:
> On Wed, Aug 7, 2019 at 7:28 PM Tom Lane  wrote:
>> This leads to the thought that maybe we could put some intelligence
>> into an index-AM-specific callback instead.  For example, for btree
>> and hash the appropriate rule is probably that cross-type operators
>> and functions should be tied to the opfamily while single-type
>> members are internally tied to the associated opclass.  For GiST,
>> GIN, and SPGiST it's not clear to me that *any* operator deserves
>> an INTERNAL dependency; only the implementation functions do.
>> 
>> Furthermore, if we had an AM callback that were charged with
>> deciding the right dependency links for all the operators/functions,
>> we could also have it do some validity checking on those things,
>> thus moving some of the checks made by amvalidate into a more
>> useful place.

> +1, sounds like a plan for me.

Here's a preliminary patch along these lines.  It adds an AM callback
that can adjust the dependency types before they're entered into
pg_depend.  There's a lot of stuff that's open for debate and/or
remains to be done:

* Is the parameter list of amcheckmembers() sufficient, or should we
pass more info (if so, what)?  In principle, the AM can always look
up anything else it needs to know from the provided OIDs, but that
would be cumbersome if many AMs need the same info.

* Do we need any more flexibility in the set of ways that the pg_depend
entries can be set up than what I've provided here?

* Are the specific ways that the entries are getting set up appropriate?
Note in particular that I left btree/hash alone, feeling that the default
(historical) behavior was designed for them and is not unreasonable; but
maybe we should switch them to the cross-type-vs-not-cross-type behavior
proposed above.  Also I didn't touch BRIN because I don't know enough
about it to be sure what would be best, and I didn't touch contrib/bloom
because I don't care too much about it.

* I didn't add any actual error checking to the checkmembers functions.
I figure that can be done in a followup patch, and it'll just be tedious
boilerplate anyway.

* I refactored things a little bit in opclasscmds.c, mostly by adding
an is_func boolean to OpFamilyMember and getting rid of parameters
equivalent to that.  This is based on the thought that AMs might prefer
to process the structs based on such a flag rather than by keeping them
in separate lists.  We could go further and merge the operator and
function structs into one list, forcing the is_func flag to be used;
but I'm not sure whether that'd be an improvement.

* I'm not at all impressed with the name, location, or concept of
opfam_internal.h.  I think we should get rid of that header and put
the OpFamilyMember struct somewhere else.  Given that this patch
makes it part of the AM API, it wouldn't be unreasonable to move it
to amapi.h.  But I've not done that here.

I'll add this to the upcoming commitfest.

regards, tom lane

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index cc16709..5f664ca 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -134,6 +134,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->amproperty = NULL;
 	amroutine->ambuildphasename = NULL;
 	amroutine->amvalidate = blvalidate;
+	amroutine->amcheckmembers = NULL;
 	amroutine->ambeginscan = blbeginscan;
 	amroutine->amrescan = blrescan;
 	amroutine->amgettuple = NULL;
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index dd54c68..2815098 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -137,6 +137,7 @@ typedef struct IndexAmRoutine
 amproperty_function amproperty; /* can be NULL */
 ambuildphasename_function ambuildphasename;   /* can be NULL */
 amvalidate_function amvalidate;
+amcheckmembers_function amcheckmembers; /* can be NULL */
 ambeginscan_function ambeginscan;
 amrescan_function amrescan;
 amgettuple_function amgettuple; /* can be NULL */
@@ -496,7 +497,47 @@ amvalidate (Oid opclassoid);
the access method can reasonably do that.  For example, this might include
testing that all required support functions are provided.
The amvalidate function must return false if the opclass is
-   invalid.  Problems should be reported with ereport messages.
+   invalid.  Problems should be reported with ereport
+   messages, typically at INFO level.
+  
+
+  
+
+void
+amcheckmembers (Oid opfamilyoid,
+Oid opclassoid,
+List *operators,
+List *functions);
+
+   Validate proposed operator and function members of an operator family,
+   so far as the access method can reasonably do that, and set their
+   dependency types if the default is not satisfactory.  This is called
+   during CREATE OPERATOR CLASS and during
+   ALTER OPERATOR FAMILY ADD; in the latter
+   case opclassoid is InvalidOid.
+   The List arguments are lists
+   of OpFamilyMember structs, as def

Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos

2019-08-18 Thread Thomas Munro
On Wed, Apr 17, 2019 at 1:04 PM Thomas Munro  wrote:
> On Mon, Apr 15, 2019 at 7:57 PM  wrote:
> > I forgot to mention that this is happening in a docker container.
>
> Huh, so there may be some configuration of Linux container that can
> fail here with EPERM, even though that error does not appear in
> the man page and doesn't make much intuitive sense.  Would be good to
> figure out how that happens.

Steve Dodd ran into the same problem in Borg[1].  It looks like what's
happening here is that on PowerPC and ARM systems, there is a second
system call sync_file_range2 that has the arguments arranged in a
better order for their calling conventions (see Notes section of man
sync_file_range), and glibc helpfully translates for you, but some
container technologies forgot to include sync_file_range2 in their
syscall forwarding table.  Perhaps we should just handle this with the
not_implemented_by_kernel mechanism I added for WSL.

[1] https://lists.freedesktop.org/archives/systemd-devel/2019-August/043276.html

-- 
Thomas Munro
https://enterprisedb.com




Re: Zedstore - compressed in-core columnar storage

2019-08-18 Thread Justin Pryzby
On Thu, Aug 15, 2019 at 01:05:49PM +0300, Heikki Linnakangas wrote:
> We've continued hacking on Zedstore, here's a new patch version against
> current PostgreSQL master (commit f1bf619acdf). If you want to follow the
> development in real-time, we're working on this branch:
> https://github.com/greenplum-db/postgres/tree/zedstore

Thanks for pursuing this.  It's an exciting development and I started
looking at how we'd put it to use.  I imagine we'd use it in favour of ZFS
tablespaces, which I hope to retire.

I've just done very brief experiment so far.  Some thoughts:

 . I was missing a way to check for compression ratio; it looks like zedstore
   with lz4 gets ~4.6x for our largest customer's largest table.  zfs using
   compress=gzip-1 gives 6x compression across all their partitioned tables,
   and I'm surprised it beats zedstore.  (A size-comparison sketch follows
   this list.)

 . What do you think about pg_restore --no-tableam; similar to 
   --no-tablespaces, it would allow restoring a table to a different AM:
   PGOPTIONS='-c default_table_access_method=zedstore' pg_restore --no-tableam 
./pg_dump.dat -d postgres
   Otherwise, the dump says "SET default_table_access_method=heap", which
   overrides any value from PGOPTIONS and precludes restoring to new AM.

 . It occurred to me that indices won't be compressed.  That's no fault of
   zedstore, but it may mean that some sites would need to retain their ZFS
   tablespace, and suggests the possibility of a separate, future project
   (I wonder if there's some way a new meta-AM could "enable" compression of
   other index AMs, to avoid the need to implement zbtree, zhash, zgin, ...).

 . it'd be nice if there was an ALTER TABLE SET ACCESS METHOD, to allow
   migrating data.  Otherwise I think the alternative is:
begin; lock t;
CREATE TABLE new_t LIKE (t INCLUDING ALL) USING (zedstore);
INSERT INTO new_t SELECT * FROM t;
for index; do CREATE INDEX...; done
DROP t; RENAME new_t (and all its indices). attach/inherit, etc.
commit;

 . Speaking of which, I think LIKE needs a new option for ACCESS METHOD, which
   is otherwise lost.
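
Here is the rough size comparison I had in mind for the first point
above - load the same data under both AMs and compare on-disk sizes
("big_table" stands for whichever table is being tested):

CREATE TABLE t_heap (LIKE big_table) USING heap;
INSERT INTO t_heap SELECT * FROM big_table;
CREATE TABLE t_zed (LIKE big_table) USING zedstore;
INSERT INTO t_zed SELECT * FROM big_table;
SELECT pg_size_pretty(pg_total_relation_size('t_heap')) AS heap_size,
       pg_size_pretty(pg_total_relation_size('t_zed'))  AS zedstore_size;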

Cheers,
Justin




Re: Improve search for missing parent downlinks in amcheck

2019-08-18 Thread Alexander Korotkov
On Tue, Aug 13, 2019 at 11:44 PM Peter Geoghegan  wrote:
> > In this revision check for missing downlinks is combined with
> > bt_downlink_check().  So, pages visited by bt_downlink_check() patch
> > doesn't cause extra accessed.  It only causes following additional
> > page accesses:
> > 1) Downlinks corresponding to "negative infinity" keys,
> > 2) Pages of child level, which are not referenced by downlinks.
> >
> > But I think these two kinds are very minority, and those accesses
> > could be trade off with more precise missing downlink check without
> > bloom filter.  What do you think?
>
> I am generally in favor of making the downlink check absolutely
> reliable, and am not too worried about the modest additional overhead.
> After all, bt_index_parent_check() is supposed to be thorough though
> expensive. The only reason that I used a Bloom filter for
> fingerprinting downlink blocks was that it seemed important to quickly
> get amcheck coverage for subtle multi-level page deletion bugs just
> after v11 feature freeze. We can now come up with a better design for
> that.

Great!

> I was confused about how this patch worked at first. But then I
> remembered that Lehman and Yao consider downlinks to be distinct
> things to separator keys in internal pages. The high key of an
> internal page is the final separator key, so you have n downlinks and
> n + 1 separator keys per internal page -- two distinct things that
> appear in alternating order (the negative infinity item is not
> considered to have any separator key here). So, while internal page
> items are explicitly "(downlink, separator)" pairs that are grouped
> into a single tuple in nbtree, with a separate tuple just for the high
> key, Lehman and Yao would find it just as natural to treat them as
> "(separator, downlink)" pairs. You have to skip the first downlink on
> each internal level to make that work, but this makes our
> bt_downlink_check() check have something from the target page (child's
> parent page) that is like the high key in the child.
>
> It's already really confusing that we don't quite mean the same thing
> as Lehman and Yao when we say downlink (See also my long "why a
> highkey is never truly a copy of another item" comment block within
> bt_target_page_check()), and that is not your patch's fault. But maybe
> we need to fix that to make your patch easier to understand. (i.e.
> maybe we need to go over every use of the word "downlink" in nbtree,
> and make it say something more precise, to make everything less
> confusing.)

I agree that the current terms nbtree uses to describe downlinks and
separator keys may be confusing.  I'll try to fix this and come up
with a patch if I succeed.

> Other feedback:
>
> * Did you miss the opportunity to verify that every high key matches
> its right sibling page's downlink tuple in parent page? We talked
> about this already, but you don't seem to match the key (you only
> match the downlink block).
>
> * You are decoupling the new check from the bt_index_parent_check()
> "heapallindexed" option. That's probably a good thing, but you need to
> update the sgml docs.
>
> * Didn't you forget to remove the BtreeCheckState.rightsplit flag?
>
> * You've significantly changed the behavior of bt_downlink_check() --
> I would note this in its header comments. This is where ~99% of the
> new work happens.
>
> * I don't like that you use the loaded term "target" here -- anything
> called "target" should be BtreeCheckState.target:
>
> >  static void
> > -bt_downlink_missing_check(BtreeCheckState *state)
> > +bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
> > + BlockNumber targetblock, Page target)
> >  {
>
> If it's unclear what I mean, this old comment should make it clearer:
>
> /*
>  * State associated with verifying a B-Tree index
>  *
>  * target is the point of reference for a verification operation.
>  *
>  * Other B-Tree pages may be allocated, but those are always auxiliary (e.g.,
>  * they are current target's child pages).  Conceptually, problems are only
>  * ever found in the current target page (or for a particular heap tuple 
> during
>  * heapallindexed verification).  Each page found by verification's 
> left/right,
>  * top/bottom scan becomes the target exactly once.
>  */

The revised patch seems to fix all of above.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


amcheck-btree-improve-missing-parent-downlinks-check-3.patch
Description: Binary data


Re: Support for jsonpath .datetime() method

2019-08-18 Thread Alexander Korotkov
On Tue, Aug 13, 2019 at 12:08 AM Alexander Korotkov
 wrote:
> On Thu, Aug 1, 2019 at 1:31 PM Thomas Munro  wrote:
> > On Sat, Jul 27, 2019 at 2:43 AM Andrew Dunstan
> >  wrote:
> > > On 7/23/19 6:48 PM, Nikita Glukhov wrote:
> > > > Some concrete pieces of review:
> > > >> +   
> > > >> +FF1
> > > >> +decisecond (0-9)
> > > >> +   
> > > >>
> > > >> Let's not use such weird terms as "deciseconds".  We could say
> > > >> "fractional seconds, 1 digit" etc. or something like that.
> > > > And what about "tenths of seconds", "hundredths of seconds"?
> > >
> > > Yes, those are much better.
> >
> > I've moved this to the September CF, still in "Waiting on Author" state.
>
> I'd like to summarize differences between standard datetime parsing
> and our to_timestamp()/to_date().

Let me describe my proposal to overcome these differences.

> 1) Standard defines much less datetime template parts.  Namely it defines:
> YYYY | YYY | YY | Y
> RRRR | RR
> MM
> DD
> DDD
> HH | HH12
> HH24
> MI
> SS
> SSSSS
> FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
> A.M. | P.M.
> TZH
> TZM
>
> We support majority of them and much more.

Regarding non-contradicting template parts, we can support them in the
.datetime() method too.  That would be our extension to the standard.
I see no problem here.

> Incompatibilities are:
>  * SSSSS (our name is SSSS),

Since SSSSS is not reserved, I'd propose to make SSSSS an alias for SSSS.

>  * We don't support RRRR | RR,
>  * Our handling of YYYY | YYY | YY | Y is different.  What we have
> here is more like RRRR | RR in the standard (Nikita explained that
> upthread [1]),

I'd like to make YYYY | YYY | YY | Y and RRRR | RR behavior standard
conforming in both to_timestamp()/to_date() and .datetime().  Handling
these template parts differently in different functions would be
confusing for users.

>  * We don't support FF[1-9].  FF[1-6] are implemented in patch.  We
> can't support FF[7-9], because our binary representation of timestamp
> datatype don't have enough of precision.

I propose to postpone implementation of FF[7-9].  We can support them
later once we have precise enough datatypes.

> 2) Standard defines only the following delimiters: minus sign, period,
> solidus, comma, apostrophe, semicolon, colon and space.  And
> it requires strict matching of separators between template and input
> strings.  We don't do so either in FX or non-FX mode.
>
> For instance, we allow both to_date('2019/12/31', '-MM-DD') and
> to_date('2019/12/31', 'FX-MM-DD').  But according to standard this
> date should be written only as '2019-12-31' to match given template
> string.
>
> 4) For non-delimited template parts standard requires matching to
> digit sequences of lengths between 1 and maximum number of characters
> of that template part.  We don't always do so.  For instance, we allow
> more than 4 digits to correspond to YYYY, more than 3 digits to
> correspond to YYY and so on.
>
> # select to_date('2019-12-31', 'YYY-MM-DD');
>   to_date
> 
>  2019-12-31
> (1 row)

In order to implement these I'd like to propose introducing a
special do_to_timestamp() flag, which would define standard conforming
parsing.  This flag would be used in the .datetime() jsonpath method.
Later we should also use it for the CAST(... FORMAT ...) expression,
which should also do standard conforming parsing.

> 3) Standard prescribes recognition of digits according to \p{Nd}
> regex.  \p{Nd} matches to "a digit zero through nine in any script
> except ideographic scripts".  As far as I remember, we currently do
> recognize only ASCII digits.

Supporting all unicode digit scripts would be cool for both
to_timestamp()/to_date() and standard parsing.  However, I think this
could be postponed.  Personally I haven't met non-ASCII digits in
databases yet.  If needed, one can implement this later; it shouldn't be
hard.

If no objections, Nikita and me will work on revised patchset based on
this proposal.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: FETCH FIRST clause PERCENT option

2019-08-18 Thread Ryan Lambert
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

The latest patch [1] and the cleanup patch [2] apply cleanly to the latest 
master (5f110933).  I reviewed the conversation and don't see any outstanding 
questions or concerns.  Updating status to ready for committer.

[1] 
https://www.postgresql.org/message-id/attachment/103028/percent-incremental-v6.patch
[2] 
https://www.postgresql.org/message-id/attachment/103157/percent-incremental-v6-comment-cleanup.patch

Ryan Lambert

The new status of this patch is: Ready for Committer


Re: pgsql: doc: Add some images

2019-08-18 Thread Alvaro Herrera
Hi Jürgen,

On 2019-Aug-18, Jürgen Purtz wrote:

> it is be possible that you face the following situation: the image
> subdirectory contains all ditaa and graphviz source files, but not all
> corresponding svg files. Those svg files are created by the given Makefile
> of this subdirectory resp. should be included in git (and patches - what was
> not the case in one of my patches).

Not really ... I did create the .svg file by invoking the make rule for
it manually.

The files do exist, but they are in the wrong directory: they're in
/pgsql/source/master/doc/src/sgml/images
and the "make postgres-A4.pdf" was invoked in
/pgsql/build/master/doc/src/sgml/images
   ^

As I said, if I edit the .fo file to change the "images/foobar.svg"
entry to read "/pgsql/source/master/doc/src/sgml/images/foobar.svg" then
the built docco correctly includes the image.

A "VPATH" build is one where I invoke "configure" from a different
directory.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Unused header file inclusion

2019-08-18 Thread Tom Lane
I wrote:
> (My headerscheck script is missing that header; I need to update it to
> match the latest version of cpluspluscheck.)

I did that, and ended up with the attached.  I'm rather tempted to stick
this into src/tools/ alongside cpluspluscheck, because it seems to find
rather different trouble spots than cpluspluscheck does.  Thoughts?

As of HEAD (927f34ce8), I get clean passes from both this and
cpluspluscheck, except for the FmgrInfo references in selfuncs.h.
I tried it on RHEL/Fedora, FreeBSD 12, and current macOS.

regards, tom lane

#!/bin/sh

# Check (almost) all PostgreSQL include files for standalone build.
#
# Argument 1 is the top-level source directory, argument 2 the
# top-level build directory (they might be the same). If not set, they
# default to the current directory.
#
# Needs to be run after configuring and creating all generated headers.
# It's advisable to configure --with-perl --with-python, or you're
# likely to get errors from associated headers.
#
# No output if everything is OK, else compiler errors.

if [ -z "$1" ]; then
srcdir="."
else
srcdir="$1"
fi

if [ -z "$2" ]; then
builddir="."
else
builddir="$2"
fi

me=`basename $0`

# Pull some info from configure's results.
MGLOB="$builddir/src/Makefile.global"
CFLAGS=`sed -n 's/^CFLAGS[ 	]*=[ 	]*//p' "$MGLOB"`
CPPFLAGS=`sed -n 's/^CPPFLAGS[ 	]*=[ 	]*//p' "$MGLOB"`
PG_SYSROOT=`sed -n 's/^PG_SYSROOT[ 	]*=[ 	]*//p' "$MGLOB"`
CC=`sed -n 's/^CC[ 	]*=[ 	]*//p' "$MGLOB"`
perl_includespec=`sed -n 's/^perl_includespec[ 	]*=[ 	]*//p' "$MGLOB"`
python_includespec=`sed -n 's/^python_includespec[ 	]*=[ 	]*//p' "$MGLOB"`

# needed on Darwin
CPPFLAGS=`echo "$CPPFLAGS" | sed "s|\\$(PG_SYSROOT)|$PG_SYSROOT|g"`

# (EXTRAFLAGS is not set here, but user can pass it in if need be.)

# Create temp directory.
tmp=`mktemp -d /tmp/$me.XXXXXX`

trap 'rm -rf $tmp' 0 1 2 3 15

# Scan all of src/ and contrib/ for header files.
for f in `cd "$srcdir" && find src contrib -name '*.h' -print`
do
	# Ignore files that are unportable or intentionally not standalone.

	# These files are platform-specific, and c.h will include the
	# one that's relevant for our current platform anyway.
	test "$f" = src/include/port/aix.h && continue
	test "$f" = src/include/port/cygwin.h && continue
	test "$f" = src/include/port/darwin.h && continue
	test "$f" = src/include/port/freebsd.h && continue
	test "$f" = src/include/port/hpux.h && continue
	test "$f" = src/include/port/linux.h && continue
	test "$f" = src/include/port/netbsd.h && continue
	test "$f" = src/include/port/openbsd.h && continue
	test "$f" = src/include/port/solaris.h && continue
	test "$f" = src/include/port/win32.h && continue

	# Additional Windows-specific headers.
	test "$f" = src/include/port/win32_port.h && continue
	test "$f" = src/include/port/win32/sys/socket.h && continue
	test "$f" = src/include/port/win32_msvc/dirent.h && continue
	test "$f" = src/port/pthread-win32.h && continue

	# Likewise, these files are platform-specific, and the one
	# relevant to our platform will be included by atomics.h.
	test "$f" = src/include/port/atomics/arch-arm.h && continue
	test "$f" = src/include/port/atomics/arch-hppa.h && continue
	test "$f" = src/include/port/atomics/arch-ia64.h && continue
	test "$f" = src/include/port/atomics/arch-ppc.h && continue
	test "$f" = src/include/port/atomics/arch-x86.h && continue
	test "$f" = src/include/port/atomics/fallback.h && continue
	test "$f" = src/include/port/atomics/generic.h && continue
	test "$f" = src/include/port/atomics/generic-acc.h && continue
	test "$f" = src/include/port/atomics/generic-gcc.h && continue
	test "$f" = src/include/port/atomics/generic-msvc.h && continue
	test "$f" = src/include/port/atomics/generic-sunpro.h && continue
	test "$f" = src/include/port/atomics/generic-xlc.h && continue

	# rusagestub.h is also platform-specific, and will be included
	# by utils/pg_rusage.h if necessary.
	test "$f" = src/include/rusagestub.h && continue

	# sepgsql.h depends on headers that aren't there on most platforms.
	test "$f" = contrib/sepgsql/sepgsql.h && continue

	# These files are not meant to be included standalone, because
	# they contain lists that might have multiple use-cases.
	test "$f" = src/include/access/rmgrlist.h && continue
	test "$f" = src/include/parser/kwlist.h && continue
	test "$f" = src/pl/plpgsql/src/pl_reserved_kwlist.h && continue
	test "$f" = src/pl/plpgsql/src/pl_unreserved_kwlist.h && continue
	test "$f" = src/interfaces/ecpg/preproc/c_kwlist.h && continue
	test "$f" = src/interfaces/ecpg/preproc/ecpg_kwlist.h && continue
	test "$f" = src/include/regex/regerrs.h && continue
	test "$f" = src/pl/plpgsql/src/plerrcodes.h && continue
	test "$f" = src/pl/plpython/spiexceptions.h && continue
	test "$f" = src/pl/tcl/pltclerrcodes.h && continue

	# We can't make these Bison output files compilable standalone
	# without using "%code require", which old Bison versions lack.
	# parser/gram.h will be include

RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-18 Thread Smith, Peter
> From: Ibrar Ahmed  Sent: Sunday, 18 August 2019 2:43 AM
> +1 for voice call, bruce we usually have a weekly TDE call. I will include 
> you in

If you don't mind, please also add me to that TDE call list.

Thanks/Regards,
---
Peter Smith
Fujitsu Australia


Re: Resume vacuum and autovacuum from interruption and cancellation

2019-08-18 Thread Masahiko Sawada
On Thu, Aug 8, 2019 at 10:42 PM Rafia Sabih  wrote:
>
> On Tue, 16 Jul 2019 at 13:57, Masahiko Sawada  wrote:
> >
> > On Wed, Jun 12, 2019 at 1:30 PM Masahiko Sawada  
> > wrote:
> > >
> > > Hi all,
> > >
> > > Long-running vacuum could be sometimes cancelled by administrator. And
> > > autovacuums could be cancelled by concurrent processes. Even if it
> > > retries after cancellation, since it always restart from the first
> > > block of table it could vacuums blocks again that we vacuumed last
> > > time. We have visibility map to skip scanning all-visible blocks but
> > > in case where the table is large and often modified, we're more likely
> > > to reclaim more garbage from blocks other than we processed last time
> > > than scanning from the first block.
> > >
> > > So I'd like to propose to make vacuums save its progress and resume
> > > vacuuming based on it. The mechanism I'm thinking is simple; vacuums
> > > periodically report the current block number to the stats collector.
> > > If table has indexes, reports it after heap vacuum whereas reports it
> > > every certain amount of blocks (e.g. 1024 blocks = 8MB) if no indexes.
> > > We can see that value on new column vacuum_resume_block of
> > > pg_stat_all_tables. I'm going to add one vacuum command option RESUME
> > > and one new reloption vacuum_resume. If the option is true vacuums
> > > fetch the block number from stats collector before starting and start
> > > vacuuming from that block. I wonder if we could make it true by
> > > default for autovacuums but it must be false when aggressive vacuum.
> > >
> > > If we start to vacuum from not first block, we can update neither
> > > relfrozenxid nor relfrozenxmxid. And we might not be able to update
> > > even relation statistics.
> > >
>
> Sounds like an interesting idea, but does it really help? Because if
> vacuum was interrupted previously, wouldn't it already know the dead
> tuples, etc in the next run quite quickly, as the VM, FSM is already
> updated for the page in the previous run.

Since tables are modified even during vacuum, if vacuum runs again
after an interruption it could need to vacuum again the part of the
table that has already been cleaned by the last vacuum. But the rest of
the table is likely to have more garbage in many cases. Therefore I
think this would be helpful especially for cases where the table is
large and heavily updated. Even if the table has not gotten dirtied
since the last vacuum, it can skip already-vacuumed pages by looking at
the VM or the last vacuumed block. I think that it doesn't make things
worse than today's vacuum in many cases.

>
> A few minor things I noticed in the first look,

Thanks for reviewing the patch.

> +/*
> + * When a table has no indexes, save the progress every 8GB so that we can
> + * resume vacuum from the middle of table. When table has indexes we save it
> + * after the second heap pass finished.
> + */
> +#define VACUUM_RESUME_BLK_INTERVAL 1024 /* 8MB */
> Discrepancy with the memory unit here.
>

Fixed.

> /* No found valid saved block number, resume from the first block */
> Can be better framed.

Fixed.

Attached the updated version patch.
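
For reference, with the patch applied the intended usage looks roughly
like this (a sketch based on the description above; "big_tbl" is just a
placeholder):

ALTER TABLE big_tbl SET (vacuum_resume = true);
VACUUM (RESUME) big_tbl;
SELECT vacuum_resume_block FROM pg_stat_all_tables
 WHERE relname = 'big_tbl';   -- last saved block, used as the restart point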


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From 4b38130a7ab369eceeb12f3b4153f47a83d96e23 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Wed, 10 Jul 2019 19:02:56 +0900
Subject: [PATCH v2] Add RESUME option to VACUUM and autovacuum.

This commit adds a new reloption, vacuum_resume, which controls
whether vacuum attempts to resume vacuuming from the last vacuumed
block saved in the vacuum_resume_block column of pg_stat_all_tables. It
also adds a new option to the VACUUM command, RESUME, which can be used
to override the reloption.
---
 doc/src/sgml/monitoring.sgml   |  5 ++
 doc/src/sgml/ref/vacuum.sgml   | 18 +
 src/backend/access/common/reloptions.c | 13 +++-
 src/backend/access/heap/vacuumlazy.c   | 95 --
 src/backend/catalog/system_views.sql   |  1 +
 src/backend/commands/vacuum.c  | 13 
 src/backend/postmaster/pgstat.c| 42 
 src/backend/utils/adt/pgstatfuncs.c| 14 
 src/include/catalog/pg_proc.dat|  5 ++
 src/include/commands/vacuum.h  |  5 +-
 src/include/pgstat.h   | 14 
 src/include/utils/rel.h|  2 +
 src/test/regress/expected/rules.out|  3 +
 src/test/regress/expected/vacuum.out   | 20 ++
 src/test/regress/sql/vacuum.sql| 21 ++
 15 files changed, 262 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..fe68113b02 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2784,6 +2784,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
  bigint
  Estimated number of rows modified since this table was last analyzed
 
+
+ vacuum_resume_blo

RE: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-18 Thread Smith, Peter
> -Original Message-
> From: Stephen Frost  Sent: Friday, 16 August 2019 11:01 AM

> Having direct integration with a KMS would certainly be valuable, and I don't 
> see a reason to deny users that option if someone would like to spend time
> implementing it- in addition to a simpler mechanism such as a passphrase 
> command, which I believe is what was being suggested here.

Yes. We recently made an internal PoC for FEP to enable it to reach out to AWS 
KMS whenever the MKEY was rotated or a TDKEY was created. This was achieved by 
inserting some hooks into our TDE code; these hooks were implemented by a 
contrib module loaded via the shared_preload_libraries GUC variable. So when no 
special "tdekey_aws" module is loaded, our TDE functionality simply falls back 
to its default (random) MDEK/TDEK keys. 

Even if the OSS community chooses not to implement any KMS integration, the TDE 
design could still provide hooks in a few appropriate places to make it easy 
for people who need to add their own integration later.
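
To make the hook idea concrete, here is a minimal standalone sketch of the
pattern: core consults a function pointer that a preloaded module may have
filled in, and falls back to its built-in random key generation otherwise.
The hook name, its signature, and the stub KMS call are invented for this
sketch; they are not taken from our PoC or from any proposed patch:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Invented hook: called whenever a new data encryption key is needed.
 * Returns true and fills "key" if an external KMS supplied the key material.
 */
typedef bool (*tde_generate_key_hook_type) (unsigned char *key, size_t len);

static tde_generate_key_hook_type tde_generate_key_hook = NULL;

/* Core path: ask the hook first, fall back to locally generated randomness. */
static void
generate_data_key(unsigned char *key, size_t len)
{
    if (tde_generate_key_hook && tde_generate_key_hook(key, len))
        return;                         /* key came from the external KMS */

    for (size_t i = 0; i < len; i++)    /* stand-in for a real CSPRNG */
        key[i] = (unsigned char) rand();
}

/* What a "tdekey_aws"-style module would install from its _PG_init(). */
static bool
aws_kms_generate_key(unsigned char *key, size_t len)
{
    /* the real module would call out to the KMS; stubbed for this sketch */
    for (size_t i = 0; i < len; i++)
        key[i] = 0xAB;
    return true;
}

int
main(void)
{
    unsigned char key[32];

    generate_data_key(key, sizeof(key));        /* built-in behaviour */

    tde_generate_key_hook = aws_kms_generate_key;
    generate_data_key(key, sizeof(key));        /* KMS-backed behaviour */

    printf("first key byte: 0x%02X\n", key[0]);
    return 0;
}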

Regards,
---
Peter Smith
Fujitsu Australia





Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements

2019-08-18 Thread Kyotaro Horiguchi
Hello,

At Sun, 18 Aug 2019 09:43:09 +0200, Daniel Migowski  
wrote in <6e25ca12-9484-8994-a1ee-40fdbe6af...@ikoffice.de>
> Am 17.08.2019 um 19:10 schrieb Ibrar Ahmed:
> > On Sat, Aug 17, 2019 at 6:58 PM Daniel Migowski wrote:
> >
> >
> > attached you find a patch that adds a new GUC:
> >
> >
> > Quick questions before looking at the patch.
> >
> >
> > prepared_statement_limit:
> >
> >  - Do we have a consensus about the name of GUC? I don't think it is
> > the right name for that.

Almost the same thing was proposed [1] as part of the syscache-pruning
patch [2], but it was removed to concentrate on defining how to do that
for the first case, syscache.  We have several mechanisms with the same
characteristics: they can bloat and have no means of keeping themselves
within a certain size. It would be better to treat them the same way, or
at least on the same principle.

[1] 
https://www.postgresql.org/message-id/20180315.141246.130742928.horiguchi.kyotaro%40lab.ntt.co.jp
[2] https://commitfest.postgresql.org/23/931/

Pruning plan caches by any means is valuable, but we haven't reached a
consensus on how to do it. My old patch does it based on the number of
entries, because precise memory accounting of memory contexts is too
expensive. I haven't looked at this patch closely, but it seems to use
MemoryContext->methods->stats to count memory usage, which would be too
expensive for this purpose; we currently use it only for debug output on
critical errors like OOM.

> No, it is a proposal. It could also be named plancache_mem or
> cachedplansource_maxmem or anything else. It was intended to keep
> prepared statements from using up all my memory, but development has shown
> that it could also be used for other CachedPlans, as long as it is a
> saved plan.
> >  - Is this a WIP patch or the final patch? Because I can see TODO and
> > non-standard
> > comments in the patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Fix typos and inconsistencies for HEAD (take 11)

2019-08-18 Thread Alexander Lakhin
Hello hackers,

Please consider fixing the last group (for now) of typos and
inconsistencies in the tree:

11.1 peforming -> performing
11.2 table_freeze_age -> freeze_table_age
11.3 TableInfoData -> TableDataInfo
11.4 TableStatus -> PgStat_TableStatus
11.5 TAG_VALID -> BM_TAG_VALID
11.6 TarDirectoryMethod -> WalTarMethod
11.7 targetss -> targets
11.8 tarReadRaw -> _tarReadRaw
11.9 Tcl_SetVar2 -> Tcl_SetVar2Ex
11.10 tcvn_to_utf8, utf8_to_tcvn -> windows_1258_to_utf8,
utf8_to_windows_1258
11.11 test_rls_hook_permissive, test_rls_hook_restrictive ->
test_rls_hooks_permissive, test_rls_hooks_restrictive
11.12 test_thread_funcs.c -> thread_test.c
11.13 threated -> treated
11.14 TidNode -> TidScan
11.15 tidPtr -> tss_TidPtr
11.16 TimestampTZ -> TimestampTz
11.17 TM_ZONE -> tm_zone
11.18 TOAST_DEBUG -> remove (orphaned since the introduction in 57d8080a)
11.19 TopTransactionState -> TopTransactionStateData
11.20 tpe -> type
11.21 tranasctions -> transactions
11.22 TransactionOid -> TransactionId
11.23 TupleLockUpdate -> LockTupleNoKeyExclusive
11.24 TUPLES_OK -> PGRES_TUPLES_OK
11.25 typstore -> tupstore
11.26 uniquePairs -> hstoreUniquePairs
11.27 upperIndex -> upperIndx
11.28 USER_NAME_SIZE -> remove (not used since the introduction in
c2e9b2f28)
11.29 VacLimit -> xidVacLimit
11.30 verfies -> verifies
11.31 verifymbstr -> pg_verifymbstr
11.32 VfdArray -> VfdCache
11.33 visca -> vice
11.34 v_spl_left, v_spl_right -> v->spl_left, v->spl_right
11.35 WaitforMultipleObjects -> WaitForMultipleObjects
11.36 walrcv_query -> walrcv_exec
11.37 max wal_size_mb -> max_wal_size_mb
11.38 WalWriteDelay -> WalWriterDelay
11.39 whichChkpti -> whichChkpt
11.40 win32_crashdump.c -> crashdump.c
11.41 worker_i -> worker_num
11.42 WorkingDirectorY -> WorkingDirectory
11.43 XactLockTableWaitErrorContextCb -> XactLockTableWaitErrorCb
11.44 xlblock -> xlblocks
11.45 xl_multi_insert -> xl_multi_insert_tuple
11.46 xlog_brin_update -> xl_brin_insert
11.47 XLogFakeRelcacheEntry -> CreateFakeRelcacheEntry
11.48 XLOGFileSlop -> XLOGfileslop
11.49 XLogreader -> XLogReader
11.50 XLogReadPage -> XLogRead
11.51 xl_rem_len -> xlp_rem_len
11.52 XMAX_EXCL_LOCK, XMAX_KEYSHR_LOCK -> HEAP_XMAX_EXCL_LOCK,
HEAP_XMAX_KEYSHR_LOCK
11.53 yyscanner_t -> yyscan_t
11.54 cdir -> cidr
11.55 examine_operator_expression -> examine_opclause_expression
11.56 list_qsort_comparator -> list_sort_comparator (renamed in 569ed7f4)
11.57 PASSBYVALUE -> PASSEDBYVALUE

Best regards,
Alexander
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f1d1005ae2..a7abf8c2ee 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -2875,7 +2875,7 @@
   
 
   
-   tcvn_to_utf8
+   windows_1258_to_utf8
WIN1258
UTF8
   
@@ -3037,7 +3037,7 @@
   
 
   
-   utf8_to_tcvn
+   utf8_to_windows_1258
UTF8
WIN1258
   
diff --git a/src/backend/access/nbtree/nbtsplitloc.c b/src/backend/access/nbtree/nbtsplitloc.c
index a7882fd874..c83016af71 100644
--- a/src/backend/access/nbtree/nbtsplitloc.c
+++ b/src/backend/access/nbtree/nbtsplitloc.c
@@ -814,7 +814,7 @@ _bt_bestsplitloc(FindSplitData *state, int perfectpenalty,
 		final->firstoldonright < state->newitemoff + MAX_LEAF_INTERVAL)
 	{
 		/*
-		 * Avoid the problem by peforming a 50:50 split when the new item is
+		 * Avoid the problem by performing a 50:50 split when the new item is
 		 * just to the right of the would-be "many duplicates" split point.
 		 */
 		final = &state->splits[0];
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 7d6c50b49d..e154507ecd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -863,7 +863,7 @@ get_all_vacuum_rels(int options)
  *	 DEAD or RECENTLY_DEAD (see HeapTupleSatisfiesVacuum).
  * - freezeLimit is the Xid below which all Xids are replaced by
  *	 FrozenTransactionId during vacuum.
- * - xidFullScanLimit (computed from table_freeze_age parameter)
+ * - xidFullScanLimit (computed from freeze_table_age parameter)
  *	 represents a minimum Xid value; a table whose relfrozenxid is older than
  *	 this will have a full-table vacuum applied to it, to freeze tuples across
  *	 the whole table.  Vacuuming a table younger than this value can use a
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 58e2432aac..20ee1d3fb4 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1776,7 +1776,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
 scratch.d.rowcompare_final.rctype = rcexpr->rctype;
 ExprEvalPushStep(state, &scratch);
 
-/* adjust jump targetss */
+/* adjust jump targets */
 foreach(lc, adjust_jumps)
 {
 	ExprEvalStep *as = &state->steps[lfirst_int(lc)];
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 2bb14cdd02..d362e7f7d7 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -893,7 +893,7 @

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-08-18 Thread Ahsan Hadi
On Mon, 19 Aug 2019 at 6:23 AM, Smith, Peter 
wrote:

> > From: Ibrar Ahmed  Sent: Sunday, 18 August 2019
> 2:43 AM
> > +1 for voice call, bruce we usually have a weekly TDE call. I will
> include you in
>
> If you don't mind, please also add me to that TDE call list.
>

Sure will do.


> Thanks/Regards,
> ---
> Peter Smith
> Fujitsu Australia
>


Re: POC: Cleaning up orphaned files using undo logs

2019-08-18 Thread Dilip Kumar
On Sat, Aug 17, 2019 at 9:35 PM Robert Haas  wrote:
>
> On Wed, Aug 14, 2019 at 12:39 PM Andres Freund  wrote:
> > > > Again, I think it's not ok to just assume you can lock an essentially
> > > > unbounded number of buffers. This seems almost guaranteed to result in
> > > > deadlocks. And there's limits on how many lwlocks one can hold etc.
> > >
> > > I think for controlling that we need to put a limit on max prepared
> > > undo?  I am not sure any other way of limiting the number of buffers
> > > because we must lock all the buffer in which we are going to insert
> > > the undo record under one WAL logged operation.
> >
> > I heard that a number of times. But I still don't know why that'd
> > actually be true. Why would it not be sufficient to just lock the buffer
> > currently being written to, rather than all buffers? It'd require a bit
> > of care updating the official current "logical end" of a log, but
> > otherwise ought to not be particularly hard? Only one backend can extend
> > the log after all, and until the log is externally visibly extended,
> > nobody can read or write those buffers, no?
>
> Well, I don't understand why you're on about this.  We've discussed it
> a number of times but I'm still confused.  I'll repeat my previous
> arguments on-list:
>
> 1. It's absolutely fine to just put a limit on this, because the
> higher-level facilities that use this shouldn't be doing a single
> WAL-logged operation that touches a zillion buffers.  We have been
> careful to avoid having WAL-logged operations touch an unbounded
> number of buffers in plenty of other places, like the btree code, and
> we are going to have to be similarly careful here for multiple
> reasons, deadlock avoidance being one.  So, saying, "hey, you're going
> to lock an unlimited number of buffers" is a straw man.  We aren't.
> We can't.

Right.  So basically, we need to put a limit on how many undo records
can be prepared under a single WAL-logged operation, and that will
internally limit the maximum number of undo buffers.  Suppose we limit
max_prepared_undo to 2; then we need to lock at most 5 undo buffers.
We also need to somehow deal with the multi-insert code in zheap,
because there, when inserting into a single page, we write one undo
record per range if the tuples we are inserting on that page are
interleaved.  But maybe we can handle that by just inserting one undo
record that can have multiple ranges.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [proposal] de-TOAST'ing using a iterator

2019-08-18 Thread John Naylor
On Fri, Aug 16, 2019 at 10:48 PM Binguo Bao  wrote:
> [v8 patch with cosmetic changes]

Okay, looks good. I'll make a few style suggestions and corrections.
In the course of looking at this again, I have a few other questions
below as well.

It looks like you already do this for the most part, but I'll mention
that we try to keep lines, including comments, less than 80 characters
long. pgindent can try to fix that, but the results don't always look
nice.

About variable names: The iterator pointers are variously called
"iter", "iterator", and "fetch_iter". I found this confusing the first
time I read this code. I think we should use "iter" if we have only
one kind in the function, and "detoast_iter" and "fetch_iter" if we
have both kinds.
--

init_detoast_iterator():

+ * The "iterator" variable is normally just a local variable in the caller.

I don't think this comment is helpful to understand this function or its use.

+ * It only make sense to initialize de-TOAST iterator for external
on-disk value.

s/make/makes/
"a" de-TOAST iterator
s/value/values/

The comments in this function that start with "This is a ..." could be
shortened like this:

/* indirect pointer -- dereference it */

While looking at this again, I noticed we no longer need to test for
the in-line compressed case at all. I also tried some other cosmetic
rearrangements. Let me know what you think about the attached patch.
Also, I wonder if the VARATT_IS_EXTERNAL_INDIRECT case should come
first. Then the two normal cases are next to each other.


free_detoast_iterator(), free_fetch_datum_iterator(), and free_toast_buffer():

These functions should return void.

+ * Free the memory space occupied by the de-TOAST iterator include buffers and
+ * fetch datum iterator.

Perhaps "Free memory used by the de-TOAST iterator, including buffers
and fetch datum iterator."

The check

if (iter->buf != iter->fetch_datum_iterator->buf)

is what we need to do for the compressed case. Could we use this
directly instead of having a separate state variable iter->compressed,
with a macro like this?

#define TOAST_ITER_COMPRESSED(iter) \
(iter->buf != iter->fetch_datum_iterator->buf)

Or maybe that's too clever?


detoast_iterate():

+ * As long as there is another data chunk in compression or external storage,

We no longer use the iterator with in-line compressed values.

+ * de-TOAST it into toast buffer in iterator.

Maybe "into the iterator's toast buffer"


fetch_datum_iterate():

My remarks for detoast_iterate() also apply here.


init_toast_buffer():

+ * Note the constrain buf->position <= buf->limit may be broken
+ * at initialization. Make sure that the constrain is satisfied
+ * when consume chars.

s/constrain/constraint/ (2 times)
s/consume/consuming/

Also, this comment might be better at the top of the whole function?


pglz_decompress_iterate():

+ * Decompresses source into dest until the source is exhausted.

This comment is from the original function, but I think it would be
better to highlight the differences from the original, something like:

"This function is based on pglz_decompress(), with these additional
requirements:

1. We need to save the current control byte and byte position for the
caller's next iteration.

2. In pglz_decompress(), we can assume we have all the source bytes
available. This is not the case when we decompress one chunk at a
time, so we have to make sure that we only read bytes available in the
current chunk."

(I'm not sure about the term 'byte position', maybe there's a better one.)

+ * In the while loop, sp may go beyond the srcend, provides a four-byte
+ * buffer to prevent sp from reading unallocated bytes from source buffer.
+ * When source->limit reaches source->capacity, don't worry about reading
+ * unallocated bytes.

Here's my suggestion:

"In the while loop, sp may be incremented such that it points beyond
srcend. To guard against reading beyond the end of the current chunk,
we set srcend such that we exit the loop when we are within four bytes
of the end of the current chunk. When source->limit reaches
source->capacity, we are decompressing the last chunk, so we can (and
need to) read every byte."

+ for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)

Note you can also replace 8 with INVALID_CTRLC here.

tuptoaster.h:
+ * Constrains that need to be satisfied:

s/constrains/constraints/

+ * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
+ * the field is invalid and need to read the control byte from the
+ * source buffer in the next iteration, see pglz_decompress_iterate().
+ */
+#define INVALID_CTRLC 8

I think the macro might be better placed in pg_lzcompress.h, and for
consistency used in pglz_decompress(). Then the comment can be shorter
and more general. With my additional comment in
init_detoast_iterator(), hopefully it will be clear to readers.


--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Traini

Re: Add "password_protocol" connection parameter to libpq

2019-08-18 Thread Michael Paquier
On Fri, Aug 16, 2019 at 02:11:57PM -0400, Jonathan S. Katz wrote:
> To be pedantic, +1 on the channel_binding param.

Seems like we are moving in this direction then.  I don't object to
the introduction of this parameter.  We would likely want to do
something about downgrade attacks in other cases where channel binding
is not used; still, there is verify-ca/verify-full even in this case,
which offers similar protection against MITM and eavesdropping.

> I would ask if scram-sha-256-plus makes the list if we have the
> channel_binding param?

No in my opinion.

> If channel_binding = require, it would essentially ignore any non-plus
> options in password_protocol and require scram-sha-256-plus. In a future
> scram-sha-512 world, then having scram-sha-256-plus or
> scram-sha-512-plus options in "password_protocol" then could be
> necessary based on what the user would prefer or require in their
> application.

Not including scram-sha-512-plus or scram-sha-256-plus in the
comma-separated list would be a problem for users who want to give, for
example, scram-sha-256,scram-sha-512-plus as an authorized list of
protocols, but I don't think that makes much sense, as such a list would
basically require an SSL connection for tls-server-end-point because of
the second element.
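
For illustration only, a libpq snippet using the parameter names discussed
in this thread; neither channel_binding nor password_protocol is an existing
connection option at this point, so this merely shows the intended interplay:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /*
     * Both keywords below are only proposals from this thread.  The intent:
     * channel_binding=require forces a SCRAM "-PLUS" mechanism over SSL,
     * no matter what the password_protocol list alone would permit.
     */
    PGconn *conn = PQconnectdb("host=localhost dbname=postgres "
                               "channel_binding=require "
                               "password_protocol=scram-sha-256");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

    PQfinish(conn);
    return 0;
}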
--
Michael


signature.asc
Description: PGP signature


Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements

2019-08-18 Thread Daniel Migowski

Am 19.08.2019 um 05:57 schrieb Kyotaro Horiguchi:

At Sun, 18 Aug 2019 09:43:09 +0200, Daniel Migowski  wrote in 
<6e25ca12-9484-8994-a1ee-40fdbe6af...@ikoffice.de>

Am 17.08.2019 um 19:10 schrieb Ibrar Ahmed:

On Sat, Aug 17, 2019 at 6:58 PM Daniel Migowski wrote:


 attached you find a patch that adds a new GUC:


Quick questions before looking at the patch.


 prepared_statement_limit:

  - Do we have a consensus about the name of GUC? I don't think it is
the right name for that.

The almost same was proposed [1] as a part of syscache-pruning
patch [2], but removed to concentrate on defining how to do that
on the first instance - syscache.  We have some mechanisms that
have the same characteristics - can be bloat and no means to keep
it in a certain size. It is better that they are treated the same
way, or at least on the same principle.

Correct. However, I don't know the backend well enough to see how to 
unify this. Also, time-based eviction isn't that important for me; I'd 
rather work with the memory used. I agree that memory leaks are all bad, 
and time-based eviction might suffice for small cache entries, but 
CachedPlanSources take up to 45MB EACH here, and looking at the memory 
directly seems desirable in that case.

[1] 
https://www.postgresql.org/message-id/20180315.141246.130742928.horiguchi.kyotaro%40lab.ntt.co.jp
[2] https://commitfest.postgresql.org/23/931/

Pruning plancaches in any means is valuable, but we haven't
reached a concsensus on how to do that. My old patch does that
based on the number of entries because precise memory accounting
of memory contexts is too expensive. I didn't look this patch
closer but it seems to use MemoryContext->methods->stats to count
memory usage, which would be too expensive for the purpose. We
currently use it only for debug output on critical errors like
OOM.


Yes, this looks like a place to improve. I hadn't looked at the stats 
function before and wasn't aware it iterates over all chunks of the 
context. We really don't need all the memory stats, just the total usage, 
and that can be calculated solely from the size of the blocks. Maybe we 
should add such a counter (totalmemusage) to the MemoryContexts themselves 
so that decisions based on memory usage can be made quickly. It shouldn't 
be too slow, because blocks seem to be acquired much less often than 
chunks, and when they are acquired, an additional add to a variable in 
memory wouldn't hurt. On the other hand, memory contexts should be as fast 
as possible given how often they are used in the database, so it might 
still be a concern.


One could also precompute the memory usage of a CachedPlanSource when it 
is saved, when the query_list gets calculated (for plans about to be 
saved), and when the generic plan is stored in it. In combination with a 
fast stats function that only calculates the total usage, this looks good 
to me. One could also store the sum in a session-global variable to make 
checking whether a prune run is needed faster.
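
A standalone sketch of that idea, keeping a running total at block
allocation time so that the total usage is available in O(1); the names are
invented for illustration and this is not the MemoryContext API:

#include <stdio.h>
#include <stdlib.h>

/* Toy context that tracks total block bytes as blocks are allocated/freed. */
typedef struct ToyContext
{
    size_t      total_block_bytes;  /* updated per block, not per chunk */
} ToyContext;

static void *
toy_alloc_block(ToyContext *cxt, size_t size)
{
    void       *block = malloc(size);

    if (block)
        cxt->total_block_bytes += size;
    return block;
}

static void
toy_free_block(ToyContext *cxt, void *block, size_t size)
{
    free(block);
    cxt->total_block_bytes -= size;
}

int
main(void)
{
    ToyContext  cxt = {0};
    void       *a = toy_alloc_block(&cxt, 8192);
    void       *b = toy_alloc_block(&cxt, 65536);

    /* O(1) check against a limit; no walk over blocks or chunks needed */
    printf("total = %zu bytes\n", cxt.total_block_bytes);

    toy_free_block(&cxt, b, 65536);
    toy_free_block(&cxt, a, 8192);
    return 0;
}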


While time-based eviction also makes sense when the database is not alone 
on a server, constraining the memory used directly attacks the problem 
that time-based eviction tries to solve.






Re: POC: Cleaning up orphaned files using undo logs

2019-08-18 Thread Dilip Kumar
On Wed, Aug 14, 2019 at 10:35 PM Andres Freund  wrote:
>

>
> > >   - When reading an undo record, the whole stage of UnpackUndoData()
> > > reading data into the UndoPackContext is omitted, reading directly
> > > into the UnpackedUndoRecord. That removes one further copy of the
> > > record format.
> > So we will read member by member to UnpackedUndoRecord?  because in
> > context we have at least a few headers packed and we can memcpy one
> > header at a time like UndoRecordHeader, UndoRecordBlock.
>
> Well, right now you then copy them again later, so not much is gained by
> that (although that later copy can happen without the content lock
> held). As I think I suggested before, I suspect that the best way would
> be to just memcpy() the data from the page(s) into an appropriately
> sized buffer with the content lock held, and then perform unpacking
> directly into UnpackedUndoRecord. Especially with the bulk API that will
> avoid having to do much work with locks held, and reduce memory usage by
> only unpacking the record(s) in a batch that are currently being looked
> at.
>
>
> > But that's just a few of them, so if we copy field by field into the
> > UnpackedUndoRecord then we can get rid of copying into the context and
> > then copying it back to the UnpackedUndoRecord.  Is this what you have in
> > mind, or do you want to store these structures (UndoRecordHeader,
> > UndoRecordBlock) directly in UnpackedUndoRecord?
>
> I at the moment see no reason not to?

Currently, in UnpackedUndoRecord we store directly all members that are
set by the caller.  We store pointers to some headers that are allocated
internally by the undo layer, and the caller need not worry about setting
them.  So now you are suggesting putting the other headers into
UnpackedUndoRecord as structures as well.  As such I don't have much of a
problem with doing that, but I think Robert originally designed the
UnpackedUndoRecord structure this way, so it would be good if Robert
provides his opinion on this.
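
For what it's worth, here is a minimal standalone sketch of the
copy-then-unpack pattern being discussed (copy the raw bytes while the
content lock would be held, unpack into the record afterwards). The record
layout and names are invented for illustration and do not reflect the
actual undo record format:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Invented on-page layout: fixed header followed by a 4-byte payload length. */
typedef struct ToyRecordHeader
{
    uint32_t    xid;
    uint16_t    type;
    uint16_t    flags;
} ToyRecordHeader;

typedef struct ToyUnpackedRecord
{
    uint32_t    xid;
    uint16_t    type;
    uint16_t    flags;
    uint32_t    payload_len;
} ToyUnpackedRecord;

/* Unpack field by field from a private copy of the raw bytes. */
static void
unpack_record(const char *raw, ToyUnpackedRecord *rec)
{
    ToyRecordHeader hdr;

    memcpy(&hdr, raw, sizeof(hdr));
    rec->xid = hdr.xid;
    rec->type = hdr.type;
    rec->flags = hdr.flags;
    memcpy(&rec->payload_len, raw + sizeof(hdr), sizeof(uint32_t));
}

int
main(void)
{
    char        page[64] = {0};
    char        private_copy[64];
    ToyRecordHeader hdr = {42, 1, 0};
    uint32_t    len = 16;
    ToyUnpackedRecord rec;

    /* pretend this is the undo page content */
    memcpy(page, &hdr, sizeof(hdr));
    memcpy(page + sizeof(hdr), &len, sizeof(len));

    /* step 1: raw copy (the part that would hold the content lock) */
    memcpy(private_copy, page, sizeof(private_copy));

    /* step 2: unpack after the lock would have been released */
    unpack_record(private_copy, &rec);
    printf("xid=%u payload_len=%u\n", (unsigned) rec.xid,
           (unsigned) rec.payload_len);
    return 0;
}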

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com