Re: proposal: plpgsql pragma statement

2018-12-16 Thread Pavel Stehule
Hi

On Wed, Dec 12, 2018 at 9:03, Pavel Stehule wrote:

>
>
> On Thu, Dec 6, 2018 at 18:27, Pavel Stehule wrote:
>
>>
>>
>> On Thu, Dec 6, 2018 at 18:17, Robert Haas wrote:
>>
>>> On Thu, Dec 6, 2018 at 12:13 PM Pavel Stehule 
>>> wrote:
>>> > My idea of a plpgsql PRAGMA is very close to the PL/SQL or Ada PRAGMA.
>>> This is not a runtime statement - the information from this command will be
>>> assigned to the related object - function, block, command - at parse time.
>>>
>>> That's sensible, but the syntax you were proposing didn't look like it
>>> was related to a specific statement.  I was objecting to the idea that
>>> PRAGMA whatever; should be construed as an annotation of,
>>> specifically, the following statement.
>>>
>>
>> please, can you propose something that you would like?
>>
>> For my purpose I can imagine PRAGMA at the function level with the same
>> syntax as PL/SQL - I need to push some information somewhere that
>> plpgsql_check can use to protect users against false alarms. Locality is
>> not too important for me at this moment. But I prefer a solution that
>> doesn't look too strange and is possible with just a change to the
>> plpgsql parser.
>>
>
> I looked at some documentation - and, for example, the PL/SQL PRAGMA
> INLINE looks very similar to what I propose.
>
> For me, a block-level pragma used in the DECLARE section is good enough:
>
> DECLARE
>   pragma plpgsql_check(OFF);
> BEGIN
>   .. this part will not be checked ..
> END;
>
> The syntax would then be ready for future PL/SQL pragmas like
> AUTONOMOUS_TRANSACTION, ..
>

here is a block-level-only PRAGMA - available only from the DECLARE part.

The information is attached to PLpgSQL_stmt_block as the List field "pragmas".

Note: if somebody later writes support for autonomous transactions, the
PL/SQL syntax will already be in place. But my primary motivation is to
provide a way to push parameters to plpgsql extensions in a user-friendly,
persistent, naturally readable way.

Regards

Pavel


>
> Regards
>
> Pavel
>
>>
>> Regards
>>
>> Pavel
>>
>>
>>> --
>>> Robert Haas
>>> EnterpriseDB: http://www.enterprisedb.com
>>> The Enterprise PostgreSQL Company
>>>
>>
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 1f2abbb5d1..fc95d3e950 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -814,6 +814,49 @@ $$ LANGUAGE plpgsql;
 happen in a plain SQL command.

   
+
+  
+   Block level PRAGMA
+
+   
+A PL/pgSQL function supports pragmas at block
+level. A pragma is a compiler directive that can be used by the
+PL/pgSQL compiler, or by any extension that
+works with PL/pgSQL code.
+   
+
+
+PRAGMA name;
+PRAGMA name (  argument_name =>  value );
+
+
+   
+The pragma can be used by plpgsql_check
+to enable/disable code checking, or to store additional information:
+
+
+DECLARE
+  PRAGMA plpgsql_check(off);
+BEGIN
+  -- code inside block will not be checked by plpgsql_check
+  ...
+
+
+DECLARE
+  -- force routine volatility detection
+  PRAGMA plpgsql_check(volatility => volatile);
+  PRAGMA plpgsql_check(temporary_table => 'tmp_tab', '(a int, b int, c int)');
+BEGIN
+  ...
+
+
+More details are described in the related extension's documentation.
+   
+
+   
+Unknown pragmas are ignored.
+   
+  
   
 
   
diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile
index 25a5a9d448..0c97ddbb12 100644
--- a/src/pl/plpgsql/src/Makefile
+++ b/src/pl/plpgsql/src/Makefile
@@ -27,7 +27,7 @@ DATA = plpgsql.control plpgsql--1.0.sql plpgsql--unpackaged--1.0.sql
 REGRESS_OPTS = --dbname=$(PL_TESTDB)
 
 REGRESS = plpgsql_call plpgsql_control plpgsql_domain plpgsql_record \
-	plpgsql_cache plpgsql_transaction plpgsql_varprops
+	plpgsql_cache plpgsql_transaction plpgsql_varprops plpgsql_pragma
 
 all: all-lib
 
diff --git a/src/pl/plpgsql/src/expected/plpgsql_pragma.out b/src/pl/plpgsql/src/expected/plpgsql_pragma.out
new file mode 100644
index 00..ffe5c7664a
--- /dev/null
+++ b/src/pl/plpgsql/src/expected/plpgsql_pragma.out
@@ -0,0 +1,11 @@
+do $$
+DECLARE
+  var int;
+  PRAGMA xxx;
+  PRAGMA do;
+  PRAGMA var; -- name can be any identifier
+  PRAGMA xxx(10, 10.1, '', "a"., off, on); -- supported types
+  PRAGMA xxx(label => value);
+BEGIN
+END;
+$$;
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index a979a5109d..53e707bdd5 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -111,6 +111,11 @@ static	PLpgSQL_expr	*read_cursor_args(PLpgSQL_var *cursor,
 static	List			*read_raise_options(void);
 static	void			check_raise_parameters(PLpgSQL_stmt_raise *stmt);
 
+/*
+ * local variable for collecting pragmas inside one declare block
+ */
+static List		   *pragmas;
+
 %}
 
 %expect 0
@@ -146,6 +151,7 @@ static	void			check_raise_parameters(PLpgSQL_stmt_raise *stmt);
 			char *label;
 			int  n_initvars;
 			int  *initvarnos;
+			List *pragmas;
 		}		declhdr;
 		struct
 		{
@@ -166,6 +172,8 @@ static	void			ch

Re: plpgsql plugin - stmt_beg/end is not called for top level block of statements

2018-12-16 Thread Pavel Stehule
On Mon, Nov 19, 2018 at 19:37, Pavel Stehule wrote:

> Hi
>
> I am playing with plpgsql profiling and the plpgsql plugin API. I found that
> the callbacks stmt_beg and stmt_end are not called for the top-level
> statement, due to a direct call of the exec_stmt_block function:
>
> 	estate.err_text = NULL;
> 	estate.err_stmt = (PLpgSQL_stmt *) (func->action);
> 	rc = exec_stmt_block(&estate, func->action);
> 	if (rc != PLPGSQL_RC_RETURN)
> 	{
> 		estate.err_stmt = NULL;
> 		estate.err_text = NULL;
>
> Isn't it better to call exec_stmt there? Then the plpgsql plugin functions
> would really be called for every plpgsql statement.
>

Now, the statement hooks are not called for the top-level
plpgsql_stmt_block statement. It is not a big issue, but it is not
consistent - and this inconsistency has to be worked around inside each
extension. Better to be consistent and call the hooks identically for
every plpgsql statement.

Patch attached - all regression tests passed. This patch has an effect
only on plpgsql extensions.
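
For context, here is a minimal sketch (not part of the patch; names are
illustrative) of a plugin that registers these hooks via the standard
PLpgSQL_plugin rendezvous-variable mechanism - with the patch, its stmt_beg
also fires for the outermost block:

#include "postgres.h"
#include "fmgr.h"
#include "plpgsql.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static void
my_stmt_beg(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt)
{
	/* called before each statement executes */
	elog(DEBUG1, "stmt_beg: type %d, line %d",
		 (int) stmt->cmd_type, stmt->lineno);
}

static PLpgSQL_plugin plugin_funcs = {
	NULL,						/* func_setup */
	NULL,						/* func_beg */
	NULL,						/* func_end */
	my_stmt_beg,				/* stmt_beg */
	NULL						/* stmt_end */
};

void
_PG_init(void)
{
	PLpgSQL_plugin **plugin_ptr;

	plugin_ptr = (PLpgSQL_plugin **) find_rendezvous_variable("PLpgSQL_plugin");
	*plugin_ptr = &plugin_funcs;
}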

Regards

Pavel


> Regards
>
> Pavel
>
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 42358c2ebe..f292fcad2d 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -585,7 +585,7 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo,
 	 */
 	estate.err_text = NULL;
 	estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-	rc = exec_stmt_block(&estate, func->action);
+	rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
 	if (rc != PLPGSQL_RC_RETURN)
 	{
 		estate.err_stmt = NULL;
@@ -955,7 +955,7 @@ plpgsql_exec_trigger(PLpgSQL_function *func,
 	 */
 	estate.err_text = NULL;
 	estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-	rc = exec_stmt_block(&estate, func->action);
+	rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
 	if (rc != PLPGSQL_RC_RETURN)
 	{
 		estate.err_stmt = NULL;
@@ -1116,7 +1116,7 @@ plpgsql_exec_event_trigger(PLpgSQL_function *func, EventTriggerData *trigdata)
 	 */
 	estate.err_text = NULL;
 	estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-	rc = exec_stmt_block(&estate, func->action);
+	rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
 	if (rc != PLPGSQL_RC_RETURN)
 	{
 		estate.err_stmt = NULL;


Re: select limit error in file_fdw

2018-12-16 Thread Erik Rijkers

On 2018-12-16 07:03, Tom Lane wrote:

Erik Rijkers  writes:

I have noticed that since ffa4cbd623, a foreign table that pulls data
from a PROGRAM (in this case an unzip call) will fail if there is a
LIMIT on the SELECT
(while succeeding without LIMIT). Below is an example.


Um ... this example works for me, in both HEAD and v11 branch tip.
Moreover, the behavior you describe is exactly what ffa4cbd623 was
intended to fix.  Is there any chance that you got 11.1 and v11
branch tip mixed up?


I admit it's suspicious. I assume I pull the latest from
REL_11_STABLE, but I will have another hard look at my build stuff.


On the other hand, in that test.sh, have you tried enlarging the test 
table? It works for me too with small enough values in that 
generate_series.



If not, there must be some platform-specific behavior involved.
What are you testing on, exactly?


This is debian 9/Stretch:

/etc/os-release:
"Debian GNU/Linux 9 (stretch)"

uname -a
Linux gulo 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 
GNU/Linux


/proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 42
model name  : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping: 7
microcode   : 0x25
cpu MHz : 2299.645
cache size  : 6144 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 4
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx rdtscp l
bugs: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass 
l1tf

bogomips: 6185.58
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:


$ (PGSERVICE=dev11 psql -c "select version()")
 PostgreSQL 11.1_REL_11_STABLE_20181216_0458_171c on 
x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 
20170516, 64-bit

(1 row)

(note the '171c' suffix, tacked on via --with-extra-version)


What other info could be useful?






Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit

2018-12-16 Thread Thomas Munro
On Fri, Dec 14, 2018 at 4:14 PM Tom Lane  wrote:
> Andres Freund  writes:
> > On December 13, 2018 6:01:04 PM PST, Tom Lane  wrote:
> >> Has anyone tried to reproduce this on other platforms?
>
> > I recently also hit this locally, but since that's also Debian unstable... 
> > Note that removing openssl "fixed" the issue for me.
>
> FWIW, I tried to reproduce this on Fedora 28 and RHEL6, without success.
> It's possible that there's some significant detail of your configuration
> that I didn't match, but on the whole "bug in Debian unstable" seems
> like the most probable theory right now.

I was keen to try to bisect this, but I couldn't reproduce it on a
freshly upgraded Debian unstable VM, with --with-openssl, using "make
installcheck" under src/test/authentication.  I even tried using the
gold linker as skink does.  Maybe I'm using the wrong checker
options... Andres, can we see your exact valgrind invocation?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: select limit error in file_fdw

2018-12-16 Thread Erik Rijkers

On 2018-12-16 11:19, Erik Rijkers wrote:

On 2018-12-16 07:03, Tom Lane wrote:

Erik Rijkers  writes:

I have noticed that since ffa4cbd623, a foreign table that pulls data
from a PROGRAM (in this case an unzip call) will fail if there is a
LIMIT on the SELECT
(while succeeding without LIMIT). Below is an example.


Um ... this example works for me, in both HEAD and v11 branch tip.
Moreover, the behavior you describe is exactly what ffa4cbd623 was
intended to fix.  Is there any chance that you got 11.1 and v11
branch tip mixed up?


I admit it's suspicious. I assume I pull the latest from
REL_11_STABLE, but I will have another hard look at my build stuff.


To circumvent a possible bug in my normal build stuff, I built
an instance from scratch; maybe someone could check it for any errors
that I may have overlooked?

The instance built with this still has the LIMIT error.

I did notice that the error (as provoked by the earlier posted test.sh)
can be avoided by adding 'count(*) over ()' to the select list.
Not really surprising, I suppose.

Here is my scratch_build.sh:


#!/bin/bash

git --version

project=scratch

# shutdown - just in case it's running
/home/aardvark/pg_stuff/pg_installations/pgsql.$project/bin.fast/pg_ctl \
  -D /home/aardvark/pg_stuff/pg_installations/pgsql.scratch/data \
  -l logfile -w stop

cd ~/tmp/

# if [[ 0 -eq 1 ]]; then
  git clone https://git.postgresql.org/git/postgresql.git
  cd postgresql
  git checkout REL_11_STABLE
# else
#   cd postgresql
# # git pull
# fi

echo "deleting stuff"
make distclean &> /dev/null

echo "rebuilding stuff"
time ( ./configure \
  --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.$project \
  --bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.$project/bin.fast \
  --libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.$project/lib.fast \
  --with-pgport=6011 --quiet --enable-depend --with-openssl --with-libxml \
  --with-libxslt --with-zlib --enable-tap-tests \
  --with-extra-version=_$project \
 && make -j 6 && ( cd contrib; make ) \
 && make check \
 && make install && ( cd contrib; make install ) \
 && /home/aardvark/pg_stuff/pg_installations/pgsql.$project/bin.fast/initdb \
    -D /home/aardvark/pg_stuff/pg_installations/pgsql.$project/data -E UTF8 \
    -A scram-sha-256 --pwfile=/home/aardvark/pg_stuff/.11devel --data-checksums \
    --waldir=/home/aardvark/pg_stuff/pg_installations_wal/pgsql.$project
  rc=$?
  echo "rc [$rc]"
)

/home/aardvark/pg_stuff/pg_installations/pgsql.$project/bin.fast/pg_ctl \
  -D /home/aardvark/pg_stuff/pg_installations/pgsql.scratch/data \
  -l logfile -w start

echo "select current_setting('port'), version()" \
  | /home/aardvark/pg_stuff/pg_installations/pgsql.$project/bin.fast/psql \
    -qX service=scratch

echo "
   create extension file_fdw;
   \\dx
" | /home/aardvark/pg_stuff/pg_installations/pgsql.$project/bin.fast/psql \
    -qX service=scratch

/home/aardvark/pg_stuff/pg_installations/pgsql.$project/bin.fast/pg_ctl \
  -D /home/aardvark/pg_stuff/pg_installations/pgsql.scratch/data \
  -l logfile -w stop



comments welcome.


thanks,

 Erik Rijkers



Re: [HACKERS] logical decoding of two-phase transactions

2018-12-16 Thread Tomas Vondra
Hi Nikhil,

Thanks for the updated patch - I've started working on a review, with
the hope of getting it committed sometime in 2019-01. But the patch
bit-rotted again a bit (probably due to d3c09b9b), which broke the last
part. Can you post a fixed version?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Alternative to \copy in psql modelled after \g

2018-12-16 Thread Daniel Verite
I wrote:

> I admit that if we could improve \g to handle COPY, it would be more
> elegant than the current proposal adding two meta-commands.
> 
> But the copy-workflow and non-copy-workflow are different, and in
> order to know which one to start, \g would need to analyze the query

It turns out I was wrong on this. The workflows are different but when
psql receives the first PGresult, it's still time to handle the I/O
redirection. It just differs from \copy in the case of an error:
\copy won't even send a command to the server if the local output
stream can't be opened, whereas COPY \g would send the command, and
will have to deal with the client-side error afterwards.

Well, except that up to now, COPY ... TO STDOUT \g file or \g |command
has been ignoring "file" or "|command", which is arguably a bug as Tom
puts it in another discussion in [1].

So as a replacement for the \copyto I was proposing earlier, PFA a patch
for COPY TO STDOUT to make use of the argument to \g.


[1] bug #15535
https://www.postgresql.org/message-id/6309.1544031...@sss.pgh.pa.us


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 62c2928..1488bb9 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1095,14 +1095,31 @@ ProcessResult(PGresult **results)
 			 * If pset.copyStream is set, use that as data source/sink,
 			 * otherwise use queryFout or cur_cmd_source as appropriate.
 			 */
-			FILE	   *copystream = pset.copyStream;
+			FILE	   *copystream = NULL;
 			PGresult   *copy_result;
 
 			SetCancelConn();
 			if (result_status == PGRES_COPY_OUT)
 			{
-				if (!copystream)
+				bool		is_pipe;
+				if (pset.gfname)
+				{
+					/*
+					 * COPY TO STDOUT \g [|]file may be used as an alternative
+					 * to \copy
+					 */
+					if (!openQueryOutputFile(pset.gfname, &copystream, &is_pipe))
+					{
+						copystream = NULL;	/* will discard the COPY data entirely */
+					}
+					if (is_pipe)
+						disable_sigpipe_trap();
+				}
+				else if (pset.copyStream)
+					copystream = pset.copyStream;
+				else
 					copystream = pset.queryFout;
+
 				success = handleCopyOut(pset.db,
										copystream,
										&copy_result) && success;
@@ -1117,11 +1134,25 @@ ProcessResult(PGresult **results)
 					PQclear(copy_result);
 					copy_result = NULL;
 				}
+
+				if (pset.gfname && copystream != NULL)
+				{
+					/* close \g argument file/pipe */
+					if (is_pipe)
+					{
+						pclose(copystream);
+						restore_sigpipe_trap();
+					}
+					else
+					{
+						fclose(copystream);
+					}
+				}
 			}
 			else
 			{
-				if (!copystream)
-					copystream = pset.cur_cmd_source;
+				/* COPY IN */
+				copystream = pset.copyStream ? pset.copyStream : pset.cur_cmd_source;
 				success = handleCopyIn(pset.db,
									   copystream,
									   PQbinaryTuples(*results),
diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
index 555c633..645970e 100644
--- a/src/bin/psql/copy.c
+++ b/src/bin/psql/copy.c
@@ -426,6 +426,7 @@ do_copy(const char *args)
  * conn should be a database connection th

Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-12-16 Thread Alvaro Herrera
On 2018-Dec-15, Peter Eisentraut wrote:

> An example:
> 
> select pg_stat_statements_reset(
> 10,
> (select min(oid) from pg_database where datname like 'test%'),
> 1234);
> 
> (This is obviously a weird example, but it illustrates the
> language-theoretic point.)

This is not at all weird.  Lack of clarity on this point is the reason
that spawned this whole sub-thread, because a reset was removing data
that was not about the query that was intended.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2018-12-16 Thread Tomas Vondra
Hi,

Attached is an updated version of this patch series. It's meant to be
applied on top of the 2pc decoding patch [1], because streaming of
in-progress transactions requires handling of concurrent aborts. So it
may or may not apply directly to master, I'm not sure - unfortunately
that's likely to confuse the cputube thing, but I don't want to include
the 2pc decoding bits here because that would be just confusing.

If needed, the part introducing logical_work_mem limit for ReorderBuffer
can be separated and committed independently, but I do expect this to be
committed after the 2pc decoding patch so I've left it like this.

This new version is mostly just a rebase to current master (or almost,
because 2pc decoding only applies to 29180e5d78 due to minor bitrot),
but it also addresses the new stuff committed since last version (most
importantly decoding of TRUNCATE). It also fixes a bug in WAL-logging of
subxact assignments, where the assignment was included in records with
XID=0, essentially failing to track the subxact properly.

For the logical_work_mem part, I think this is quite solid. The main
question is how to pick transactions for eviction. For now it uses the
same approach as master (i.e. picking the largest top-level transaction,
although measured by amount of memory and not just number of changes).
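
As a rough illustration, that eviction choice boils down to something like
the following sketch (toplevel_by_lsn is the existing list in
reorderbuffer.c; the per-transaction "size" field is an assumption from this
patch series, not in master):

static ReorderBufferTXN *
ReorderBufferLargestTXN(ReorderBuffer *rb)
{
	ReorderBufferTXN *largest = NULL;
	dlist_iter	iter;

	/* scan all top-level transactions, remember the biggest one */
	dlist_foreach(iter, &rb->toplevel_by_lsn)
	{
		ReorderBufferTXN *txn;

		txn = dlist_container(ReorderBufferTXN, node, iter.cur);

		if (largest == NULL || txn->size > largest->size)
			largest = txn;
	}

	return largest;
}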

But I've realized that may not work with Generation context that great,
because unlike AllocSet it does not reuse the memory. That's nice as it
allows freeing old blocks (which AllocSet can't), but it means a small
transaction can have a change on old blocks preventing free(). That is
something we have in pg11 already, because that's where Generation
context got introduced - I haven't seen this issue in practice, but we
might need to do something about it.

In any case, I'm thinking we may need to pick a different eviction
algorithm - say using a transaction with the oldest change (and loop
until we release at least one block in the Generation context), or maybe
look for block mixing changes from the smallest number of transactions,
or something like that. Other ideas are welcome. I don't think the exact
algorithm is particularly critical, because it's meant to be triggered
only very rarely (i.e. pick logical_work_mem high enough).

The in-progress streaming is mostly mechanical extension of existing
functionality (new methods in various APIs, ...) and refactoring of
ReorderBuffer to handle incremental decoding. I'm sure it'd benefit from
reviews, of course.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Add-logical_work_mem-to-limit-ReorderBuffer-20181216.patch.gz
Description: application/gzip


0002-Immediately-WAL-log-assignments-20181216.patch.gz
Description: application/gzip


0003-Issue-individual-invalidations-with-wal_lev-20181216.patch.gz
Description: application/gzip


0004-Extend-the-output-plugin-API-with-stream-me-20181216.patch.gz
Description: application/gzip


0005-Implement-streaming-mode-in-ReorderBuffer-20181216.patch.gz
Description: application/gzip


0006-Add-support-for-streaming-to-built-in-repli-20181216.patch.gz
Description: application/gzip


0007-Track-statistics-for-streaming-spilling-20181216.patch.gz
Description: application/gzip


0008-BUGFIX-set-final_lsn-for-subxacts-before-cl-20181216.patch.gz
Description: application/gzip


reorderbuffer: memory overconsumption with medium-size subxacts

2018-12-16 Thread Alvaro Herrera
Hello

Found this on Postgres 9.6, but I think it affects back to 9.4.

I've seen a case where reorderbuffer keeps very large amounts of memory
in use, without spilling to disk, if the main transaction does little or
no changes and many subtransactions execute changes just below the
threshold to spill to disk.

The particular case we've seen is the main transaction does one UPDATE,
then a subtransaction does something between 300 and 4000 changes.
Since all these are below max_changes_in_memory, nothing gets spilled to
disk.  (To make matters worse: even if there are some subxacts that do
more than max_changes_in_memory, only that subxact is spilled, not the
whole transaction.)  This was causing a 16GB-machine to die, unable to
process the long transaction; had to add additional 16 GB of physical
RAM for the machine to be able to process the transaction.

I think there's a one-line fix, attached: just add the number of changes
in a subxact to nentries_mem when the transaction is assigned to the
parent.  Since a WAL ASSIGNMENT record happens once every 32 subxacts,
this accumulates just that number of subxact changes in memory before
spilling, which is much more reasonable.  (Hmm, I wonder why this
happens every 32 subxacts, if the code seems to be using
PGPROC_MAX_CACHED_SUBXIDS which is 64.)

Hmm, while writing this I wonder whether this affects cases with many
levels of subtransactions.  I am not sure how nested subxacts are handled
by reorderbuffer.c, but reading the code I think it is okay.

Of course, there's Tomas logical_work_mem too, but that's too invasive
to backpatch.

-- 
Álvaro HerreraPostgreSQL Expert, https://www.2ndQuadrant.com/
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index f6ea315422..d5fdd7c6a6 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -869,6 +869,8 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
 	/* Possibly transfer the subtxn's snapshot to its top-level txn. */
 	ReorderBufferTransferSnapToParent(txn, subtxn);
 
+	txn->nentries_mem += subtxn->nentries_mem;
+
 	/* Verify LSN-ordering invariant */
 	AssertTXNLsnOrder(rb);
 }
@@ -2333,7 +2335,6 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		spilled++;
 	}
 
-	Assert(spilled == txn->nentries_mem);
 	Assert(dlist_is_empty(&txn->changes));
 	txn->nentries_mem = 0;
 	txn->serialized = true;


Grant documentation about "all tables"

2018-12-16 Thread Lætitia Avrot
Hi all,

When you look at Postgres' SQL reference documentation for `GRANT`, the
`ALL TABLES` clause is explained as :

> ALL TABLES also affects views and foreign tables, just like
the specific-object GRANT command.

A colleague of mine was asking himself if it included materialized views or
not (well, yes it does).

I made that tiny patch to add materialized views to the list. It builds on
my laptop.

Then another question crossed my mind... What about partitioned tables?
I'm pretty sure it works for them too (because they're basically tables),
but should we add them as well? I couldn't decide, so I refrained from
doing it and am asking you the question.

What do you think ?

Cheers,

Lætitia
-- 
*Think! Do you really need to print this email ? *
*There is no Planet B.*
diff --git a/doc/src/sgml/ref/grant.sgml b/doc/src/sgml/ref/grant.sgml
index e98fe86052..37b5ccdf13 100644
--- a/doc/src/sgml/ref/grant.sgml
+++ b/doc/src/sgml/ref/grant.sgml
@@ -206,7 +206,7 @@ GRANT role_name [, ...] TO ALL
-   TABLES also affects views and foreign tables, just like the
+   TABLES also affects views, materialized views and foreign tables, just like the
specific-object GRANT command.  ALL
FUNCTIONS also affects aggregate and window functions, but not
procedures, again just like the specific-object GRANT


Re: Improving collation-dependent indexes in system catalogs

2018-12-16 Thread Alvaro Herrera
On 2018-Dec-15, Tom Lane wrote:

> I wrote:
> > While fooling with the idea of making type "name" collation
> > aware, it occurred to me that there's a better, more general
> > answer, which is to insist that collation-aware system catalog
> > columns must be marked with C collation.

> Concretely, this ...

Looks sane in a quick once-over.

I notice that some information_schema view columns end up with C
collation after this patch, and others remain with default collation.
Is that sensible?  (I think the only cases where this might matter
at all are information_schema.parameters.parameter_name,
information_schema.routines.external_name and
information_schema.foreign_servers.foreign_server_type.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: select limit error in file_fdw

2018-12-16 Thread Tom Lane
Erik Rijkers  writes:
> On 2018-12-16 07:03, Tom Lane wrote:
>> Um ... this example works for me, in both HEAD and v11 branch tip.
>> Moreover, the behavior you describe is exactly what ffa4cbd623 was
>> intended to fix.  Is there any chance that you got 11.1 and v11
>> branch tip mixed up?

> [ nope ]
> On the other hand, in that test.sh, have you tried enlarging the test 
> table?

Yeah, I tried sizes ranging up to 1m tuples without success.

However, something else occurred to me this morning, and a bit
later I can reproduce the problem!  I did it by changing the
table's definition to use a shell pipeline:

 program   'unzip -p \"/tmp/t.zip\" \"tmp/t.txt\" | cat'

ERROR:  program "unzip -p "/tmp/t.zip" "tmp/t.txt" | cat" failed
DETAIL:  child process exited with exit code 141

What is happening for me, I think, is that if the PROGRAM is
just "unzip" then the shell exec's it directly, and so the
SIGPIPE result is reported directly to the PG backend.  But
if the PROGRAM is too complicated to handle that way, then
unzip and cat are children of a shell process, and one or both
of them get SIGPIPE, and the shell reports that as exit status
128 + SIGPIPE.  So we need to consider that result as indicating
a sigpipe failure, too.

It remains unclear why you had an intervening shell process when
I didn't, but perhaps that can be chalked up to use of a different
shell?
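
For anyone wanting to see the 128 + signal convention in action, here is a
small standalone C sketch (not from the thread, and shell-dependent): the
simple command may be exec'd directly by the shell and report the raw
signal, while the pipeline keeps a shell process around that reports exit
code 128 + SIGPIPE = 141:

#include <stdio.h>
#include <signal.h>
#include <sys/wait.h>

static void
run(const char *cmd)
{
	char		buf[64];
	FILE	   *fp = popen(cmd, "r");
	int			status;

	if (fp == NULL)
		return;
	if (fgets(buf, sizeof(buf), fp) == NULL)	/* read a little ... */
		buf[0] = '\0';
	status = pclose(fp);		/* ... then hang up early */

	if (WIFSIGNALED(status))
		printf("%s: killed by signal %d\n", cmd, WTERMSIG(status));
	else if (WIFEXITED(status))
		printf("%s: exit code %d (128 + SIGPIPE = %d)\n",
			   cmd, WEXITSTATUS(status), 128 + SIGPIPE);
}

int
main(void)
{
	run("seq 1 10000000");			/* shell may exec this directly */
	run("seq 1 10000000 | cat");	/* pipeline keeps the shell around */
	return 0;
}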

regards, tom lane



Re: select limit error in file_fdw

2018-12-16 Thread Tom Lane
I wrote:
> It remains unclear why you had an intervening shell process when
> I didn't, but perhaps that can be chalked up to use of a different
> shell?

To provide some data on that: popen() is presumably invoking /bin/sh,
which on my box is

$ /bin/sh --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)

$ rpm -qf /bin/sh
bash-4.1.2-48.el6.x86_64

regards, tom lane



Re: Alternative to \copy in psql modelled after \g

2018-12-16 Thread Tom Lane
"Daniel Verite"  writes:
> So as a replacement for the \copyto I was proposing earlier, PFA a patch
> for COPY TO STDOUT to make use of the argument to \g.

Sounds plausible, please add to next commitfest so we don't forget it.

regards, tom lane



Re: select limit error in file_fdw

2018-12-16 Thread Erik Rijkers

On 2018-12-16 16:52, Tom Lane wrote:

Erik Rijkers  writes:

On 2018-12-16 07:03, Tom Lane wrote:

Um ... this example works for me, in both HEAD and v11 branch tip.
Moreover, the behavior you describe is exactly what ffa4cbd623 was
intended to fix.  Is there any chance that you got 11.1 and v11
branch tip mixed up?



[ nope ]
On the other hand, in that test.sh, have you tried enlarging the test
table?


Yeah, I tried sizes ranging up to 1m tuples without success.

However, something else occurred to me this morning, and a bit
later I can reproduce the problem!  I did it by changing the
table's definition to use a shell pipeline:

 program   'unzip -p \"/tmp/t.zip\" \"tmp/t.txt\" | cat'

ERROR:  program "unzip -p "/tmp/t.zip" "tmp/t.txt" | cat" failed
DETAIL:  child process exited with exit code 141


curious...

Adding that ' | cat' tail makes all 3 instances (that I have tried) 
fail:

  11.1 as released,
  REL_11_STABLE, and
  instance from d56e0fde

/bin/sh seems to be dash, here.

bash version is:
$ /bin/bash --version
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)




reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-16 Thread John Naylor
On 10/15/18, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2018-10-15 16:36:26 -0400, Tom Lane wrote:
>>> We could possibly fix these by changing the data structure so that
>>> what's in a ScanKeywords entry is an offset into some giant string
>>> constant somewhere.  No idea how that would affect performance, but
>>> I do notice that we could reduce the sizeof(ScanKeyword), which can't
>>> hurt.
>
>> Yea, that might even help performancewise. Alternatively we could change
>> ScanKeyword to store the keyword name inline, but that'd be a measurable
>> size increase...
>
> Yeah.  It also seems like doing it this way would improve locality of
> access: the pieces of the giant string would presumably be in the same
> order as the ScanKeywords entries, whereas with the current setup,
> who knows where the compiler has put 'em or in what order.
>
> We'd need some tooling to generate the constants that way, though;
> I can't see how to make it directly from kwlist.h.

A few months ago I was looking into faster search algorithms for
ScanKeywordLookup(), so this is interesting to me. While an optimal
full replacement would be a lot of work, the above ideas are much less
invasive and would still have some benefit. Unless anyone intends to
work on this, I'd like to flesh out the offset-into-giant-string
approach a bit further:

Since there are several callers of the current approach that don't use
the core keyword list, we'd have to keep the existing struct and
lookup function, to keep the complexity manageable. Once we have an
offset-based struct and function, it makes sense to use it for all
searches of core keywords. This includes not only the core scanner,
but also adt/rule_utils.c, fe_utils/string_utils.c, and
ecpg/preproc/keywords.c.

There would need to be a header with offsets replacing name strings,
generated from parser/kwlist.h, maybe kwlist_offset.h. It'd probably
be convenient if it was emitted into the common/ dir. The giant string
would likely need its own header (kwlist_string.h?).
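
As a rough sketch of what the generated data and lookup could look like
(hypothetical data and token values; the real arrays would be generated
from kwlist.h):

#include <stdint.h>
#include <string.h>

/* All keyword names concatenated, NUL-terminated, in sorted order. */
static const char kw_string[] =
	"abort\0" "absolute\0" "access\0" "action\0";	/* ... */

/* Offset of each keyword in kw_string, parallel to the token array. */
static const uint16_t kw_offsets[] = {0, 6, 15, 22};
static const int16_t kw_tokens[] = {100, 101, 102, 103};	/* made-up tokens */
static const int num_keywords = 4;

/* Binary-search the offsets; returns the token, or -1 if not a keyword. */
static int
kw_lookup(const char *word)
{
	int			low = 0;
	int			high = num_keywords - 1;

	while (low <= high)
	{
		int			mid = (low + high) / 2;
		int			cmp = strcmp(word, kw_string + kw_offsets[mid]);

		if (cmp == 0)
			return kw_tokens[mid];
		else if (cmp < 0)
			high = mid - 1;
		else
			low = mid + 1;
	}
	return -1;
}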

Since PL/pgSQL uses the core scanner, we'd need to use offsets in its
reserved_keywords[], too. Those don't change much, so we can probably
get away with hard-coding the offsets and the giant string in that
case. (If that's not acceptable, we could separate that out to
pl_reserved_kwlist.h and reuse the above tooling to generate
pl_reserved_kwlist_{offset,string}.h, but that's more complex.)

The rest should be just a SMOP. Any issues I left out?

-John Naylor



Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2018-12-16 Thread Tomas Vondra
FWIW the original CF entry in 2018-07 [1] was marked as RWF. I'm not
sure what's the right way to resubmit such patches, so I've created a
new entry in 2019-01 [2] referencing the same hackers thread (and with
the same authors/reviewers metadata).

[1] https://commitfest.postgresql.org/19/1429/
[2] https://commitfest.postgresql.org/21/1927/

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Why aren't we using strsignal(3) ?

2018-12-16 Thread Tom Lane
While poking at the signal-reporting bug just pointed out by
Erik Rijkers, I couldn't help noticing how many places we have
that are doing some equivalent of this ugly dance:

#if defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
{
charstr2[256];

snprintf(str2, sizeof(str2), "%d: %s", WTERMSIG(exitstatus),
 WTERMSIG(exitstatus) < NSIG ?
 sys_siglist[WTERMSIG(exitstatus)] : "(unknown)");
snprintf(str, sizeof(str),
 _("child process was terminated by signal %s"), str2);
}
#else
snprintf(str, sizeof(str),
 _("child process was terminated by signal %d"),
 WTERMSIG(exitstatus));
#endif

(Plus, there's at least one place that *should* be doing this and is not.)

Not only is this repetitive and unreadable, but it's also obsolete:
at least as far back as POSIX:2008, there's a function strsignal()
that you're supposed to use instead.

I propose to replace all these places with code like

snprintf(str, sizeof(str),
 _("child process was terminated by signal %d: %s"),
 WTERMSIG(exitstatus), pg_strsignal(WTERMSIG(exitstatus)));

where pg_strsignal is a trivial wrapper around strsignal() if that
exists, else it uses sys_siglist[] if that exists, else it just
returns "unrecognized signal".
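
A minimal sketch of such a wrapper (assuming the usual configure probes for
strsignal() and sys_siglist[]):

#include <signal.h>
#include <string.h>

const char *
pg_strsignal(int signum)
{
#if defined(HAVE_STRSIGNAL)
	const char *result = strsignal(signum);

	if (result)
		return result;
#elif defined(HAVE_DECL_SYS_SIGLIST) && HAVE_DECL_SYS_SIGLIST
	if (signum > 0 && signum < NSIG)
		return sys_siglist[signum];
#endif
	return "unrecognized signal";
}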

regards, tom lane



Re: select limit error in file_fdw

2018-12-16 Thread Tom Lane
Erik Rijkers  writes:
> On 2018-12-16 16:52, Tom Lane wrote:
>> However, something else occurred to me this morning, and a bit
>> later I can reproduce the problem!  I did it by changing the
>> table's definition to use a shell pipeline:

> /bin/sh seems to be dash, here.

Hm.  That must be the relevant difference.  I just repeated the experiment
on a Fedora 28 box with reasonably up-to-date bash:

$ /bin/sh --version
GNU bash, version 4.4.23(1)-release (x86_64-redhat-linux-gnu)

and it behaves the same as my RHEL6 box: no problem with the direct
invocation of unzip, problem if use a pipeline.

Anyway, we know what to do, so I'll go do it.

regards, tom lane



Re: reorderbuffer: memory overconsumption with medium-size subxacts

2018-12-16 Thread Tomas Vondra
On 12/16/18 4:06 PM, Alvaro Herrera wrote:
> Hello
> 
> Found this on Postgres 9.6, but I think it affects back to 9.4.
> 
> I've seen a case where reorderbuffer keeps very large amounts of memory
> in use, without spilling to disk, if the main transaction does little or
> no changes and many subtransactions execute changes just below the
> threshold to spill to disk.
> 
> The particular case we've seen is the main transaction does one UPDATE,
> then a subtransaction does something between 300 and 4000 changes.
> Since all these are below max_changes_in_memory, nothing gets spilled to
> disk.  (To make matters worse: even if there are some subxacts that do
> more than max_changes_in_memory, only that subxact is spilled, not the
> whole transaction.)  This was causing a 16GB-machine to die, unable to
> process the long transaction; had to add additional 16 GB of physical
> RAM for the machine to be able to process the transaction.
> 

Yeah. We do the check for each xact separately, so it's vulnerable to
this scenario (subxact large just below max_changes_in_memory).

> I think there's a one-line fix, attached: just add the number of changes
> in a subxact to nentries_mem when the transaction is assigned to the
> parent.  Since a wal ASSIGNMENT records happens once every 32 subxacts,
> this accumulates just that number of subxact changes in memory before
> spilling, which is much more reasonable.  (Hmm, I wonder why this
> happens every 32 subxacts, if the code seems to be using
> PGPROC_MAX_CACHED_SUBXIDS which is 64.)
> 

Not sure, for a couple of reasons ...

Doesn't that essentially mean we'd always evict toplevel xact(s),
including all subxacts, no matter how tiny those subxacts are? That
seems like it might easily cause regressions for the common case with
many small subxacts and one huge subxact (which is the only one we'd
currently spill, I think). That seems annoying.

But even if we decide it's the right approach, isn't the proposed patch
a couple of bricks shy? It redefines the nentries_mem field from "per
(sub)xact" to "total" for the toplevel xact, but then updates it only in
when assigning the child. But the subxacts may receive changes after
that, so ReorderBufferQueueChange() probably needs to update the value
for toplevel xact too, I guess. And ReorderBufferCheckSerializeTXN()
should probably check the toplevel xact too ...

Maybe a simpler solution would be to simply track total number of
changes in memory (from all xacts), and then evict the largest one. But
I doubt that would be backpatchable - it's pretty much what the
logical_work_mem patch does.

And of course, addressing this on pg11 is a bit more complicated due to
the behavior of Generation context (see the logical_work_mem thread for
details).

> Hmm, while writing this I am wonder if this affects cases with many
> levels of subtransactions.  Not sure how are nested subxacts handled by
> reorderbuffer.c, but reading code I think it is okay.
> 

That should be OK. The assignments don't care about the intermediate
subxacts, they only care about the toplevel xact and current subxact.

> Of course, there's Tomas logical_work_mem too, but that's too invasive
> to backpatch.
> 

Yep.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Should new partitions inherit their tablespace from their parent?

2018-12-16 Thread David Rowley
On Tue, 11 Dec 2018 at 15:43, Michael Paquier  wrote:
> +   parentrel = heap_openrv(parent, AccessExclusiveLock);
> So, in order to determine which tablespace should be used here, an
> exclusive lock is taken on the parent because its partition descriptor
> gets updated by the addition of the new partition.  This process is
> actually done already in MergeAttributes() as well, but it won't get
> triggered if a tablespace is defined directly in the CREATE TABLE
> statement.  I think that we should add a comment to explain the
> dependency between both, as well as why an exclusive lock is needed, so
> something among those lines perhaps?  Here is an idea:
> +   /*
> +* Grab an exclusive lock on the parent because its partition
> +* descriptor will be changed by the addition of the new partition.
> +* The same lock level is taken as when merging attributes below
> +* in MergeAttributes() to protect from lock upgrade deadlocks.
> +*/
>
> The position where the tablespace is chosen is definitely the good one.
>
> What do you think?

I think a comment in that location is a good idea. There's work being
done to reduce the lock level required for attaching a partition so a
comment here will help show that it's okay to reduce the lock level
for fetching the tablespace too.

I've attached an updated patch that includes the new comment. I didn't
use your proposed words though.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v4-0001-Allow-newly-created-partitions-to-inherit-their-p.patch
Description: Binary data


Re: Why aren't we using strsignal(3) ?

2018-12-16 Thread Alvaro Herrera
On 2018-Dec-16, Tom Lane wrote:

> I propose to replace all these places with code like
> 
> snprintf(str, sizeof(str),
>  _("child process was terminated by signal %d: %s"),
>  WTERMSIG(exitstatus), pg_strsignal(WTERMSIG(exitstatus)));
> 
> where pg_strsignal is a trivial wrapper around strsignal() if that
> exists, else it uses sys_siglist[] if that exists, else it just
> returns "unrecognized signal".

LGTM.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: reorderbuffer: memory overconsumption with medium-size subxacts

2018-12-16 Thread Andres Freund
Hi,

On 2018-12-16 12:06:16 -0300, Alvaro Herrera wrote:
> Found this on Postgres 9.6, but I think it affects back to 9.4.
> 
> I've seen a case where reorderbuffer keeps very large amounts of memory
> in use, without spilling to disk, if the main transaction does little or
> no changes and many subtransactions execute changes just below the
> threshold to spill to disk.
> 
> The particular case we've seen is the main transaction does one UPDATE,
> then a subtransaction does something between 300 and 4000 changes.
> Since all these are below max_changes_in_memory, nothing gets spilled to
> disk.  (To make matters worse: even if there are some subxacts that do
> more than max_changes_in_memory, only that subxact is spilled, not the
> whole transaction.)  This was causing a 16GB-machine to die, unable to
> process the long transaction; had to add additional 16 GB of physical
> RAM for the machine to be able to process the transaction.
> 
> I think there's a one-line fix, attached: just add the number of changes
> in a subxact to nentries_mem when the transaction is assigned to the
> parent.

Isn't this going to cause significant breakage, because we rely on
nentries_mem to be accurate?

/* try to load changes from disk */
if (entry->txn->nentries != entry->txn->nentries_mem)


Greetings,

Andres Freund



Remove double trailing semicolons

2018-12-16 Thread David Rowley
While looking over some recent commits, I noticed there are some code
lines with a double trailing semicolon instead of a single one.  The
attached fixes these.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_double_trailing_semicolons.patch
Description: Binary data


Re: reorderbuffer: memory overconsumption with medium-size subxacts

2018-12-16 Thread Alvaro Herrera
On 2018-Dec-16, Andres Freund wrote:

> > I think there's a one-line fix, attached: just add the number of changes
> > in a subxact to nentries_mem when the transaction is assigned to the
> > parent.
> 
> Isn't this going to cause significant breakage, because we rely on
> nentries_mem to be accurate?
> 
>   /* try to load changes from disk */
>   if (entry->txn->nentries != entry->txn->nentries_mem)

Bahh.

Are you suggesting I should try a more thorough rewrite of
reorderbuffer.c?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: reorderbuffer: memory overconsumption with medium-size subxacts

2018-12-16 Thread Andres Freund
Hi,

On 2018-12-16 17:30:30 -0300, Alvaro Herrera wrote:
> On 2018-Dec-16, Andres Freund wrote:
> 
> > > I think there's a one-line fix, attached: just add the number of changes
> > > in a subxact to nentries_mem when the transaction is assigned to the
> > > parent.
> > 
> > Isn't this going to cause significant breakage, because we rely on
> > nentries_mem to be accurate?
> > 
> > /* try to load changes from disk */
> > if (entry->txn->nentries != entry->txn->nentries_mem)
> 
> Bahh.
> 
> Are you suggesting I should try a more thorough rewrite of
> reorderbuffer.c?

I'm saying you can't just randomly poke at one variable without actually
checking how it's used.  I kinda doubt you're going to find something
that's immediately backpatchable here. Perhaps something for master
that's then backpatched after it matured for a while.

Greetings,

Andres Freund



Re: don't create storage when unnecessary

2018-12-16 Thread Alvaro Herrera
On 2018-Dec-07, Michael Paquier wrote:

> A macro makes sense to control that.

I added Ashutosh's RELKIND_HAS_STORAGE, but renamed it to
RELKIND_CAN_HAVE_STORAGE, because some of the relkinds can be mapped and
thus would have relfilenode set to 0.  I think this is a bit misleading
either way.

> Now I have to admit that I don't
> like your solution.  Wouldn't it be cleaner to assign InvalidOid to
> relfilenode in such cases?  The callers of heap_create would need to be
> made smarter when they now pass down a relfilenode (looking at you,
> DefineIndex!), but that seems way more consistent to me.

I don't follow.  When a relfilenode is passed by callers, they indicate
that the storage has already been created.  Contrariwise, when a
relation kind that *does* have storage but is not yet created, they
pass InvalidOid as relfilenode, and heap_create is in charge of creating
storage.  Maybe I'm not quite seeing what problem you mean.  Or I could
add a separate boolean, but that seems pointless.

Another possible improvement is to remove the create_storage boolean 

> Some tests would also be welcome.

Added a test in sanity_check.sql that there's no relation with the
relkinds that aren't supposed to have storage.  Without the code fix it
fails in current regression database, but in the failure result set
there isn't any relation of kinds 'p' or 'I', so this isn't a terribly
comprehensive test -- the query runs too early in the regression
sequence.  I'm not sure it's worth bothering further.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 11debaa780..8aeddfa3d3 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -324,17 +324,13 @@ heap_create(const char *relname,
 		get_namespace_name(relnamespace), relname),
  errdetail("System catalog modifications are currently disallowed.")));
 
-	/*
-	 * Decide if we need storage or not, and handle a couple other special
-	 * cases for particular relkinds.
-	 */
+	/* Handle reltablespace for specific relkinds. */
 	switch (relkind)
 	{
 		case RELKIND_VIEW:
 		case RELKIND_COMPOSITE_TYPE:
 		case RELKIND_FOREIGN_TABLE:
 		case RELKIND_PARTITIONED_TABLE:
-			create_storage = false;
 
 			/*
 			 * Force reltablespace to zero if the relation has no physical
@@ -343,16 +339,7 @@ heap_create(const char *relname,
 			reltablespace = InvalidOid;
 			break;
 
-		case RELKIND_PARTITIONED_INDEX:
-			/*
-			 * Preserve tablespace so that it's used as tablespace for indexes
-			 * on future partitions.
-			 */
-			create_storage = false;
-			break;
-
 		case RELKIND_SEQUENCE:
-			create_storage = true;
 
 			/*
 			 * Force reltablespace to zero for sequences, since we don't
@@ -361,19 +348,21 @@ heap_create(const char *relname,
 			reltablespace = InvalidOid;
 			break;
 		default:
-			create_storage = true;
 			break;
 	}
 
 	/*
-	 * Unless otherwise requested, the physical ID (relfilenode) is initially
-	 * the same as the logical ID (OID).  When the caller did specify a
-	 * relfilenode, it already exists; do not attempt to create it.
+	 * Decide whether to create storage. If caller passed a valid relfilenode,
+	 * storage is already created, so don't do it here.  Also don't create it
+	 * for relkinds without physical storage.
 	 */
-	if (OidIsValid(relfilenode))
+	if (!RELKIND_CAN_HAVE_STORAGE(relkind) || OidIsValid(relfilenode))
 		create_storage = false;
 	else
+	{
+		create_storage = true;
 		relfilenode = relid;
+	}
 
 	/*
 	 * Never allow a pg_class entry to explicitly specify the database's
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index c3071db1cd..911686e6bb 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1253,6 +1253,10 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 static void
 RelationInitPhysicalAddr(Relation relation)
 {
+	/* these relations kinds never have storage */
+	if (!RELKIND_CAN_HAVE_STORAGE(relation->rd_rel->relkind))
+		return;
+
 	if (relation->rd_rel->reltablespace)
 		relation->rd_node.spcNode = relation->rd_rel->reltablespace;
 	else
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 84e63c6d06..321c9a1973 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -122,6 +122,19 @@ typedef FormData_pg_class *Form_pg_class;
  */
 #define		  REPLICA_IDENTITY_INDEX	'i'
 
+/*
+ * Relation kinds that have physical storage. These relations normally have
+ * relfilenode set to non-zero, but it can also be zero if the relation is
+ * mapped.
+ */
+#define RELKIND_CAN_HAVE_STORAGE(relkind) \
+	((relkind) == RELKIND_RELATION || \
+	 (relkind) == RELKIND_INDEX || \
+	 (relkind) == RELKIND_SEQUENCE || \
+	 (relkind) == RELKIND_TOASTVALUE || \
+	 (relkind) == RELKIND_MATVIEW)
+
+
 #endif			/* EXPOSE_TO_CLIENT_CODE */
 
 #

Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit

2018-12-16 Thread Andres Freund
Hi,

On 2018-12-16 22:33:00 +1100, Thomas Munro wrote:
> On Fri, Dec 14, 2018 at 4:14 PM Tom Lane  wrote:
> > Andres Freund  writes:
> > > On December 13, 2018 6:01:04 PM PST, Tom Lane  wrote:
> > >> Has anyone tried to reproduce this on other platforms?
> >
> > > I recently also hit this locally, but since that's also Debian 
> > > unstable... Note that removing openssl "fixed" the issue for me.
> >
> > FWIW, I tried to reproduce this on Fedora 28 and RHEL6, without success.
> > It's possible that there's some significant detail of your configuration
> > that I didn't match, but on the whole "bug in Debian unstable" seems
> > like the most probable theory right now.
> 
> I was keen to try to bisect this, but I couldn't reproduce it on a
> freshly upgraded Debian unstable VM, with --with-openssl, using "make
> installcheck" under src/test/authentication.  I even tried using the
> gold linker as skink does.  Maybe I'm using the wrong checker
> options... Andres, can we see your exact valgrind invocation?

Ok, I think I've narrowed this down a bit further. But far from
completely.  I don't think you need particularly special options, but
it's easy to miss the error, because it doesn't cause postmaster to exit
with an error.

It only happens when a bgworker is shutdown with SIGQUIT (be it
directly, or via postmaster immediate shutdown):

$ valgrind --quiet --error-exitcode=55 
--suppressions=/home/andres/src/postgresql/src/tools/valgrind.supp 
--suppressions=/home/andres/tmp/valgrind-global.supp --trace-children=yes 
--track-origins=yes --read-var-info=no --num-callers=20 --leak-check=no 
--gen-suppressions=all 
/home/andres/build/postgres/dev-assert/vpath/src/backend/postgres -D 
/srv/dev/pgdev-dev
2018-12-16 12:53:26.274 PST [1187] LOG:  listening on IPv4 address "127.0.0.1", 
port 5433

$ kill -QUIT 1187

==1194== Invalid read of size 8
==1194==at 0x4C3B5A5: check_free (dlerror.c:188)
==1194==by 0x4C3BAB1: free_key_mem (dlerror.c:221)
==1194==by 0x4C3BAB1: __dlerror_main_freeres (dlerror.c:239)
==1194==by 0x53D6F81: __libc_freeres (in /lib/x86_64-linux-gnu/libc-2.28.so)
==1194==by 0x482D19E: _vgnU_freeres (vg_preloaded.c:77)
==1194==by 0x567F54: bgworker_quickdie (bgworker.c:662)
==1194==by 0x48A86AF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
==1194==by 0x5367B76: epoll_wait (epoll_wait.c:30)
==1194==by 0x5EE7CC: WaitEventSetWaitBlock (latch.c:1078)
==1194==by 0x5EE6A5: WaitEventSetWait (latch.c:1030)
==1194==by 0x5EDDBC: WaitLatchOrSocket (latch.c:407)
==1194==by 0x5EDC23: WaitLatch (latch.c:347)
==1194==by 0x5992D7: ApplyLauncherMain (launcher.c:1062)
==1194==by 0x568245: StartBackgroundWorker (bgworker.c:835)
==1194==by 0x57C295: do_start_bgworker (postmaster.c:5742)
==1194==by 0x57C631: maybe_start_bgworkers (postmaster.c:5955)
==1194==by 0x578C3C: reaper (postmaster.c:2940)
==1194==by 0x48A86AF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
==1194==by 0x535F3B6: select (select.c:41)
==1194==by 0x576A9F: ServerLoop (postmaster.c:1677)
==1194==by 0x57642A: PostmasterMain (postmaster.c:1386)
==1194==  Address 0x708d488 is 12 bytes after a block of size 12 alloc'd
==1194==at 0x483577F: malloc (vg_replace_malloc.c:299)
==1194==by 0x4AD8D38: CRYPTO_zalloc (mem.c:230)
==1194==by 0x4AD4F8D: ossl_init_get_thread_local (init.c:66)
==1194==by 0x4AD4F8D: ossl_init_get_thread_local (init.c:59)
==1194==by 0x4AD4F8D: ossl_init_thread_start (init.c:426)
==1194==by 0x4AFE5B9: RAND_DRBG_get0_public (drbg_lib.c:1118)
==1194==by 0x4AFE5EF: drbg_bytes (drbg_lib.c:963)
==1194==by 0x7F6DD9: pg_strong_random (pg_strong_random.c:135)
==1194==by 0x57B70F: RandomCancelKey (postmaster.c:5251)
==1194==by 0x57C367: assign_backendlist_entry (postmaster.c:5822)
==1194==by 0x57C0F2: do_start_bgworker (postmaster.c:5692)
==1194==by 0x57C631: maybe_start_bgworkers (postmaster.c:5955)
==1194==by 0x578C3C: reaper (postmaster.c:2940)
==1194==by 0x48A86AF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
==1194==by 0x535F3B6: select (select.c:41)
==1194==by 0x576A9F: ServerLoop (postmaster.c:1677)
==1194==by 0x57642A: PostmasterMain (postmaster.c:1386)
==1194==by 0x4997E0: main (main.c:228)


I now suspect this is a longer-running issue than I thought. Not all
my valgrind buildfarm branches have ssl enabled (due to an ssl issue a
while back). And previously this wouldn't have been caught, because it
doesn't cause postmaster to fail, it's just that Andrew added a script
that checks logs for valgrind bleats.


The interesting bit is that if I replace the _exit(2) in
bgworker_quickdie() with an exit(2) (i.e. processing atexit handlers),
or manully add an OPENSSL_cleanup() before the _exit(2), valgrind
doesn't find errors.
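
To illustrate the difference that makes (a standalone sketch, not
PostgreSQL code): atexit() handlers - which is where OpenSSL's cleanup
runs - execute on exit() but are skipped by _exit():

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void
cleanup(void)
{
	/* stands in for OPENSSL_cleanup() and other atexit teardown */
	fprintf(stderr, "atexit handler ran\n");
}

int
main(int argc, char **argv)
{
	atexit(cleanup);
	if (argc > 1)
		_exit(2);				/* like bgworker_quickdie(): no handlers run */
	exit(2);					/* runs the atexit handler first */
}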

The fact that one needs an immediate shutdown in a bgworker, with
openssl enabled, explains why this is hard to hit...

Greetings,

Andres Freund



Re: Improving collation-dependent indexes in system catalogs

2018-12-16 Thread Tom Lane
Alvaro Herrera  writes:
> I notice that some information_schema view columns end up with C
> collation after this patch, and others remain with default collation.
> Is that sensible?  (I think the only two cases where this might matter
> at all are information_schema.parameters.parameter_name,
> information_schema.routines.external_name and
> information_schema.foreign_servers.foreign_server_type.)

Yeah.  Looking closer at that, there are no collation-sensitive indexes
in information_schema (if there were, the existing opr_sanity test would
have caught 'em).  But there are collation-sensitive table columns, which
do have pg_statistic entries, and those entries are at least nominally
broken by copying them into a database with a different default collation.

We could think of two ways to deal with that.  One is to plaster
COLLATE "C" on each textual table column in the information_schema.
A more aggressive approach is to attach COLLATE "C" to each of the
domain types that information_schema defines, which fixes the table
columns a fortiori, and also causes all of the exposed information_schema
view columns to acquire database-independent collations.
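
For illustration, the domain-level approach amounts to changes of this
shape in information_schema.sql (a sketch of the idea, not the attached
patch hunks themselves):

CREATE DOMAIN character_data AS character varying COLLATE "C";
CREATE DOMAIN yes_or_no AS character varying(3) COLLATE "C"
    CONSTRAINT yes_or_no_check CHECK (value IN ('YES', 'NO'));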

I tried both ways, as in the attached patches below (each meant to be
applied on top of my patch upthread), and they both pass check-world.

A possible advantage of the second approach is that it could end up
allowing comparisons on information_schema view columns to be translated
to indexable comparisons on the underlying "name" columns, which would
be a pleasant outcome.  On the other hand, people might be annoyed by
the semantics change, if they'd previously been doing that with the
expectation of getting database-collation-based comparisons.

I'm not sure whether the SQL standard says anything that either patch
would be violating.  I see that it does say that these domains have
CHARACTER SET SQL_TEXT, and that the collation of that character set
is implementation-defined, so I think we could get away with the change
so far as spec compliance is concerned.

regards, tom lane

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 6227a8f3..742e2b6 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -1576,13 +1576,13 @@ GRANT SELECT ON sequences TO PUBLIC;
  */
 
 CREATE TABLE sql_features (
-    feature_id          character_data,
-    feature_name        character_data,
-    sub_feature_id      character_data,
-    sub_feature_name    character_data,
-    is_supported        yes_or_no,
-    is_verified_by      character_data,
-    comments            character_data
+    feature_id          character_data COLLATE "C",
+    feature_name        character_data COLLATE "C",
+    sub_feature_id      character_data COLLATE "C",
+    sub_feature_name    character_data COLLATE "C",
+    is_supported        yes_or_no COLLATE "C",
+    is_verified_by      character_data COLLATE "C",
+    comments            character_data COLLATE "C"
 );
 
 -- Will be filled with external data by initdb.
@@ -1599,11 +1599,11 @@ GRANT SELECT ON sql_features TO PUBLIC;
 -- clause 9.1.
 
 CREATE TABLE sql_implementation_info (
-    implementation_info_id      character_data,
-    implementation_info_name    character_data,
+    implementation_info_id      character_data COLLATE "C",
+    implementation_info_name    character_data COLLATE "C",
     integer_value               cardinal_number,
-    character_value             character_data,
-    comments                    character_data
+    character_value             character_data COLLATE "C",
+    comments                    character_data COLLATE "C"
 );
 
 INSERT INTO sql_implementation_info VALUES ('10003', 'CATALOG NAME', NULL, 'Y', NULL);
@@ -1628,13 +1628,13 @@ GRANT SELECT ON sql_implementation_info TO PUBLIC;
  */
 
 CREATE TABLE sql_languages (
-    sql_language_source         character_data,
-    sql_language_year           character_data,
-    sql_language_conformance    character_data,
-    sql_language_integrity      character_data,
-    sql_language_implementation character_data,
-    sql_language_binding_style  character_data,
-    sql_language_programming_language character_data
+    sql_language_source         character_data COLLATE "C",
+    sql_language_year           character_data COLLATE "C",
+    sql_language_conformance    character_data COLLATE "C",
+    sql_language_integrity      character_data COLLATE "C",
+    sql_language_implementation character_data COLLATE "C",
+    sql_language_binding_style  character_data COLLATE "C",
+    sql_language_programming_language character_data COLLATE "C"
 );
 
 INSERT INTO sql_languages VALUES ('ISO 9075', '1999', 'CORE', NULL, NULL, 'DIRECT', NULL);
@@ -1651,11 +1651,11 @@ GRANT SELECT ON sql_languages TO PUBLIC;
  */
 
 CREATE TABLE sql_packages (
-    feature_id          character_data,
-    feature_name        character_data,
-is_supported  

Re: select limit error in file_fdw

2018-12-16 Thread Erik Rijkers

On 2018-12-16 19:10, Tom Lane wrote:


Anyway, we know what to do, so I'll go do it.



Thank you very much. I've now also tested with the original, much larger
file, which no longer gives any problem. I am really glad this works again;
we use this stuff a lot.



Erik Rijkers






Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit

2018-12-16 Thread Thomas Munro
On Mon, Dec 17, 2018 at 7:57 AM Andres Freund  wrote:
> The interesting bit is that if I replace the _exit(2) in
> bgworker_quickdie() with an exit(2) (i.e. processing atexit handlers),
> or manually add an OPENSSL_cleanup() before the _exit(2), valgrind
> doesn't find errors.

Weird.  Well I can see that there were bugs last year where OpenSSL
failed to clean up its thread locals[1], and after they fixed that,
cases where it bogusly cleaned up someone else's thread locals[2].
Maybe there is some interference between pthread keys or something
like that.

[1] https://github.com/openssl/openssl/issues/3033
[2] https://github.com/openssl/openssl/issues/3584


-- 
Thomas Munro
http://www.enterprisedb.com



Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit

2018-12-16 Thread Andres Freund
Hi,

On 2018-12-17 08:25:38 +1100, Thomas Munro wrote:
> On Mon, Dec 17, 2018 at 7:57 AM Andres Freund  wrote:
> > The interesting bit is that if I replace the _exit(2) in
> > bgworker_quickdie() with an exit(2) (i.e. processing atexit handlers),
> > or manually add an OPENSSL_cleanup() before the _exit(2), valgrind
> > doesn't find errors.
> 
> Weird.  Well I can see that there were bugs last year where OpenSSL
> failed to clean up its thread locals[1], and after they fixed that,
> cases where it bogusly cleaned up someone else's thread locals[2].
> Maybe there is some interference between pthread keys or something
> like that.
> 
> [1] https://github.com/openssl/openssl/issues/3033
> [2] https://github.com/openssl/openssl/issues/3584

What confuses the heck out of me is that it happens on _exit(). Those
issues ought to be only visible when doing exit(), no?

I guess there's also a good argument to be made that valgrind running its
intercept in the _exit() case is a bit dubious (given that _exit() is going
to be used in cases where e.g. a signal handler might have interrupted a
malloc), but given the stacktraces here I don't think that can be the
cause.

Greetings,

Andres Freund



Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-16 Thread Alexander Korotkov
On Thu, Dec 13, 2018 at 7:28 AM Alexander Korotkov
 wrote:
> On Thu, Dec 13, 2018 at 1:45 AM Andres Freund  wrote:
> > Is there any reason something like that isn't necessary for gist? If so,
> > where's that documented? If not, uh, isn't that a somewhat serious bug
> > in gist?
>
> Thank you for pointing this out!  This looks like a bug to me too.  I'm going
> to investigate more on this and provide a fix in the next couple of days.

Sorry for the delay.  The attached patch implements conflict handling for gist
microvacuum like btree and hash.  I'm going to push it if there are no
objections.

Note that it implements a new WAL record type, so new WAL can't be
replayed on an old minor release.  I'm not sure whether we claim that's
usually possible.  Should we state something explicitly for this case?
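
For context, btree/hash-style handling means the standby side of replay
does roughly this (a sketch with hypothetical record and field names for
gist, following the existing btree pattern):

/* in the redo routine for the new gist "delete" record */
if (InHotStandby)
{
	RelFileNode	rnode;

	XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
	/* cancel standby queries whose snapshots could still see the
	 * tuples being removed */
	ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
}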

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


gist-microvacuum-conflict-handling.patch
Description: Binary data


Re: Should new partitions inherit their tablespace from their parent?

2018-12-16 Thread Alvaro Herrera
I didn't like this, so I rewrote it a little bit.

First, I changed the Assert() to use the macro for relations with
storage that I just posted in the other thread that Michael mentioned.

I then noticed that we're doing a heap_openrv() on the parent relation
and closing it before MergeAttributes() does the same thing all over
again; also MergeAttributes() has the silly array-of-OIDs return value for
the parents so that DefineRelation can handle it again later.  Rube
Goldberg says hi.  I changed this so that *before* doing anything with
the parent list, we transform it to a list of OIDs, and lock them; so
MergeAttributes() now receives the list of OIDs of the parents rather than
RangeVars.  We can also move the important comment (about the lock level of
parent rels) buried in the bowels of MergeAttributes() to the place where
it belongs in DefineRelation; and we no longer have to mess with
transforming names to OIDs multiple times.

Proposed patch attached.

I'll self-review this again tomorrow, 'cause I now have to run.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index d3e33132f3..94f7651c34 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1216,7 +1216,11 @@ WITH ( MODULUS numeric_literal, REM
   of the tablespace in which the new table is to be created.
   If not specified,
   <xref linkend="guc-default-tablespace"/> is consulted, or
-   <xref linkend="guc-temp-tablespaces"/> if the table is temporary.
+   <xref linkend="guc-temp-tablespaces"/> if the table is temporary.  For
+  partitioned tables, since no storage is required for the table itself,
+  the tablespace specified here only serves to mark the default tablespace
+  for any newly created partitions when no other tablespace is explicitly
+  specified.
  
 

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 651e86b71e..4d5b82aaa9 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -333,7 +333,6 @@ heap_create(const char *relname,
 		case RELKIND_VIEW:
 		case RELKIND_COMPOSITE_TYPE:
 		case RELKIND_FOREIGN_TABLE:
-		case RELKIND_PARTITIONED_TABLE:
 			create_storage = false;
 
 			/*
@@ -343,10 +342,11 @@ heap_create(const char *relname,
 			reltablespace = InvalidOid;
 			break;
 
+		case RELKIND_PARTITIONED_TABLE:
 		case RELKIND_PARTITIONED_INDEX:
 			/*
-			 * Preserve tablespace so that it's used as tablespace for indexes
-			 * on future partitions.
+			 * For partitioned tables and indexes, preserve tablespace so that
+			 * it's used as the tablespace for future partitions.
 			 */
 			create_storage = false;
 			break;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d6d0de1b01..bc98a0b12e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -305,7 +305,7 @@ static void truncate_check_activity(Relation rel);
 static void RangeVarCallbackForTruncate(const RangeVar *relation,
 			Oid relId, Oid oldRelId, void *arg);
 static List *MergeAttributes(List *schema, List *supers, char relpersistence,
-bool is_partition, List **supOids, List **supconstr);
+bool is_partition, List **supconstr);
 static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
 static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel);
 static void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel);
@@ -446,7 +446,7 @@ static bool ATPrepChangePersistence(Relation rel, bool toLogged);
 static void ATPrepSetTableSpace(AlteredTableInfo *tab, Relation rel,
 	const char *tablespacename, LOCKMODE lockmode);
 static void ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode);
-static void ATExecPartedIdxSetTableSpace(Relation rel, Oid newTableSpace);
+static void ATExecSetTableSpaceNoStorage(Relation rel, Oid newTableSpace);
 static void ATExecSetRelOptions(Relation rel, List *defList,
 	AlterTableType operation,
 	LOCKMODE lockmode);
@@ -536,6 +536,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	static char *validnsps[] = HEAP_RELOPT_NAMESPACES;
 	Oid			ofTypeId;
 	ObjectAddress address;
+	LOCKMODE	parentLockmode;
 
 	/*
 	 * Truncate relname to appropriate length (probably a waste of time, as
@@ -581,6 +582,38 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
  errmsg("cannot create temporary table within security-restricted operation")));
 
 	/*
+	 * Determine the lockmode to use when scanning parents.  A self-exclusive
+	 * lock is needed here.
+	 *
+	 * For regular inheritance, if two backends attempt to add children to the
+	 * same parent simultaneously, and that parent has no pre-existing
+	 * children, then both will attempt to update the parent's relhassubclass
+	 * field, leading to a "tuple concurrently updated" error.  Also, this
+	 * interlocks against a concurrent ANALYZE on the p

Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-16 Thread Andres Freund
Hi,

On 2018-12-17 01:03:52 +0300, Alexander Korotkov wrote:
> On Thu, Dec 13, 2018 at 7:28 AM Alexander Korotkov
>  wrote:
> > On Thu, Dec 13, 2018 at 1:45 AM Andres Freund  wrote:
> > > Is there any reason something like that isn't necessary for gist? If so,
> > > where's that documented? If not, uh, isn't that a somewhat serious bug
> > > in gist?
> >
> > Thank you for pointing this out!  This looks like a bug to me too.  I'm going
> > to investigate more on this and provide a fix in the next couple of days.
> 
> Sorry for the delay.  The attached patch implements conflict handling for gist
> microvacuum like btree and hash.  I'm going to push it if there are no
> objections.
> 
> Note that it implements a new WAL record type, so new WAL can't be
> replayed on an old minor release.  I'm not sure whether we claim that's
> usually possible.  Should we state something explicitly for this case?

Please hold off committing for a bit. Adding new WAL records in a minor
release ought to be very well considered and a measure of last resort.

Couldn't we determine the xid horizon on the primary, and reuse an
existing WAL record to trigger the conflict?  Or something along those
lines?

Greetings,

Andres Freund



Re: slight tweaks to documentation about runtime pruning

2018-12-16 Thread Alvaro Herrera
On 2018-Dec-10, David Rowley wrote:

> On Wed, 5 Dec 2018 at 20:24, Amit Langote  
> wrote:

> > However, for pruned partitions' subplans, what's actually shown is the
> > string "(never executed)", not loops.  So, wouldn't it be better to tell
> > the readers to look for that instead of "loops"?
> 
> I don't really see any issues with the existing documentation here.
> Remember that pruning can be performed multiple times when a parameter
> changes that was found to match the partition key and the
> Append/MergeAppend is rescanned.

I lean towards Amit's side.  I think we're too laconic about many
details of EXPLAIN's output.  This is two lines about an interesting
detail that's pretty obscure.  It doesn't hurt to have it there.
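
For anyone reading along, here is a minimal illustration of the
"(never executed)" marker (hypothetical table names; output abbreviated):

CREATE TABLE p (a int) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);
CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
  SELECT * FROM p WHERE a = (SELECT 1);
--  Append (actual rows=0 loops=1)
--    InitPlan 1 (returns $0)
--      ->  Result (actual rows=1 loops=1)
--    ->  Seq Scan on p1 (actual rows=0 loops=1)
--          Filter: (a = $0)
--    ->  Seq Scan on p2 (never executed)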

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: select limit error in file_fdw

2018-12-16 Thread Tom Lane
Erik Rijkers  writes:
> Thank you very much. I've now also tested with the original, much larger
> file, which no longer gives any problem. I am really glad this works again;
> we use this stuff a lot.

We appreciate you noticing the problem before 11.2 got out ...

regards, tom lane



Re: Should new partitions inherit their tablespace from their parent?

2018-12-16 Thread Michael Paquier
On Sun, Dec 16, 2018 at 07:07:35PM -0300, Alvaro Herrera wrote:
> I'll self-review this again tomorrow, 'cause I now have to run.

> - if (!is_partition)
> - relation = heap_openrv(parent, 
> ShareUpdateExclusiveLock);
> - else
> - relation = heap_openrv(parent, AccessExclusiveLock);
> + /* caller already got lock */
> + relation = heap_open(parent, NoLock);

Okay, I think that you should add an assertion on
CheckRelationLockedByMe() as MergeAttributes()'s only caller is
DefineRelation().  Better safe than sorry later.
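
For clarity, the suggested guard is just this one line (its placement
inside MergeAttributes() being hypothetical here):

	Assert(CheckRelationLockedByMe(relation, ShareUpdateExclusiveLock, true));

The third argument accepts any stronger lock as well, which also covers
the AccessExclusiveLock taken for partitions.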
--
Michael




Re: gist microvacuum doesn't appear to care about hot standby?

2018-12-16 Thread Alexander Korotkov
On Mon, Dec 17, 2018 at 1:25 AM Andres Freund  wrote:
> On 2018-12-17 01:03:52 +0300, Alexander Korotkov wrote:
> > On Thu, Dec 13, 2018 at 7:28 AM Alexander Korotkov
> >  wrote:
> > > On Thu, Dec 13, 2018 at 1:45 AM Andres Freund  wrote:
> > > > Is there any reason something like that isn't necessary for gist? If so,
> > > > where's that documented? If not, uh, isn't that a somewhat serious bug
> > > > in gist?
> > >
> > > Thank you for pointing this out!  This looks like a bug to me too.  I'm going
> > > to investigate more on this and provide a fix in the next couple of days.
> >
> > Sorry for the delay.  The attached patch implements conflict handling for gist
> > microvacuum like btree and hash.  I'm going to push it if there are no
> > objections.
> >
> > Note that it implements a new WAL record type, so new WAL can't be
> > replayed on an old minor release.  I'm not sure whether we claim that's
> > usually possible.  Should we state something explicitly for this case?
>
> Please hold off committing for a bit. Adding new WAL records in a minor
> release ought to be very well considered and a measure of last resort.
>
> Couldn't we determine the xid horizon on the primary, and reuse an
> existing WAL record to trigger the conflict?  Or something along those
> lines?

I thought about that, but decided it was better to mimic the B-tree and hash
behavior rather than invent new logic in a minor release.  But given
that a new WAL record in a minor release is a substantial problem, that
argument doesn't matter.

Yes, it seems to be possible.  We can determine the xid horizon on the primary
in the same way you proposed for B-tree and hash [1] and use the
XLOG_HEAP2_CLEANUP_INFO record to trigger the conflict.  Would you like
me to make such a patch for GiST based on your patch?

1. 
https://www.postgresql.org/message-id/20181214014235.dal5ogljs3bmlq44%40alap3.anarazel.de

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Catalog views failed to show partitioned table information.

2018-12-16 Thread Amit Langote
Hi,

On 2018/12/15 8:00, Michael Paquier wrote:
> On Fri, Dec 14, 2018 at 05:21:49PM +0530, Suraj Kharage wrote:
>> There are some catalog views which do not show partitioned tables and
>> their index entries.
>> One of them is "pg_indexes", which fails to show partitioned indexes.
>> Attached is a patch which fixes the same.
> 
> I tend to agree with your comment here.  pg_tables lists partitioned
> tables, but pg_indexes is forgetting about partitioned indexes.  So this
> is a good thing to add.

+1

>> Other views such as pg_stat_* and pg_statio_* have the same problem for
>> partitioned tables and indexes.
>> Since partitioned tables and their indexes are considered dummies, they do
>> not have any significance in the stat tables;
>> can we still consider adding relkind=p in these pg_stat_* views? Thoughts?
> 
> I am less sure about that as partitioned relations do not have a
> physical presence.

Hmm, although most of the fields of pg_stat_user_tables would be NULL or 0
for partitioned tables/indexes, values of at least some of the fields of
pg_stat_user_tables, like last_vacuum, last_analyze, etc., might be useful
to users.  Also, we cannot assume that these views will continue to be
mostly useless as far as partitioned relations are concerned.

Thanks,
Amit




Re: Should new partitions inherit their tablespace from their parent?

2018-12-16 Thread David Rowley
On Mon, 17 Dec 2018 at 12:59, Michael Paquier  wrote:
> Okay, I think that you should add an assertion on
> CheckRelationLockedByMe() as MergeAttributes()'s only caller is
> DefineRelation().  Better safe than sorry later.

Would that not just double up the work that's already done in relation_open()?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Catalog views failed to show partitioned table information.

2018-12-16 Thread Michael Paquier
On Mon, Dec 17, 2018 at 10:22:28AM +0900, Amit Langote wrote:
> On 2018/12/15 8:00, Michael Paquier wrote:
>> I tend to agree with your comment here.  pg_tables lists partitioned
>> tables, but pg_indexes is forgetting about partitioned indexes.  So this
>> is a good thing to add.
> 
> +1

I'll go commit something close to what Suraj is proposing if there are
no objections from others.  At least we agree on that part ;)
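
For reference, the change under discussion amounts to something of this
shape against the pg_indexes definition in system_views.sql (a sketch,
not necessarily Suraj's exact patch):

CREATE OR REPLACE VIEW pg_indexes AS
    SELECT n.nspname AS schemaname,
           c.relname AS tablename,
           i.relname AS indexname,
           t.spcname AS tablespace,
           pg_get_indexdef(i.oid) AS indexdef
      FROM pg_index x
           JOIN pg_class c ON c.oid = x.indrelid
           JOIN pg_class i ON i.oid = x.indexrelid
           LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
           LEFT JOIN pg_tablespace t ON t.oid = i.reltablespace
     WHERE c.relkind IN ('r', 'm', 'p')   -- include partitioned tables
       AND i.relkind IN ('i', 'I');       -- include partitioned indexes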

>> I am less sure about that as partitioned relations do not have a
>> physical presence.
> 
> Hmm, although most of the fields of pg_stat_user_tables would be NULL or 0
> for partitioned tables/indexes, values of at least some of the fields of
> pg_stat_user_tables, like last_vacuum, last_analyze, etc., might be useful
> to users.  Also, we cannot assume that these views will continue to be
> mostly useless as far as partitioned relations are concerned.

Well, when VACUUM or ANALYZE is given a partitioned table, the
processing decomposes it into a list of the actual relations it can work
on, and it never processes the partitioned parts themselves,
so last_vacuum & friends remain set at 0/NULL.

We had a similar discussion about that a couple of months ago, and it
was not really clear to me how it is possible to define aggregates for
partitioned tables when analyzing them, and if stat tables should show
them or not:
https://www.postgresql.org/message-id/152922564661.24801.3078728743990100...@wrigleys.postgresql.org

Listing only NULL/0 is also confusing, I think, because this would mean
for the end-user that VACUUM and/or ANALYZE have never been run for a
given relation.

pg_partition_tree has been added since then, so compiling stats for full
partition trees has become easier; perhaps the documentation could be
improved on that point.
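
As a hypothetical usage example ('parent_tbl' being a made-up name),
stats can be aggregated over a whole partition tree like this:

SELECT sum(s.n_live_tup) AS live_tuples
  FROM pg_partition_tree('parent_tbl') p
       JOIN pg_stat_user_tables s ON s.relid = p.relid
 WHERE p.isleaf;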
--
Michael




Re: 'infinity'::Interval should be added

2018-12-16 Thread Robert Haas
On Sat, Dec 15, 2018 at 3:02 PM Tom Lane  wrote:
> Andres Freund  writes:
> > On 2018-12-15 14:43:49 -0500, Tom Lane wrote:
> >> Note that timestamp_lt etc don't actually need any special case for
> >> infinity, and we could hope that the infinity representation for interval
> >> makes it possible to likewise not special-case it in interval comparisons.
> >> But I think it's silly to argue that infinity handling is a significant
> >> fraction of something like timestamp_pl_interval or timestamp_part.
>
> > I'm inclined to agree that if done carefully the overhead here is
> > probably acceptable.
>
> Backing up to look at the actual code ... if the infinity representation
> is the max or min value in each of the three fields, then the conversion
> done by interval_cmp_value would yield an int128 value that's certainly
> greater than or less than any other interval value, and I don't think it
> can overflow, so there's no need to add any code to the comparison cases.

OK, well, I stand corrected.  Not sure that answers the question of
whether we want this, but it's nice to know that if we do, it can be
done without causing too much slowdown.
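
For concreteness, the representation Tom describes could be probed like
this (a hypothetical sketch; the struct mirrors the Interval layout in
src/include/datatype/timestamp.h, and the all-max encoding is an
assumption):

#include <stdint.h>
#include <stdbool.h>

typedef struct Interval
{
	int64_t		time;	/* microseconds */
	int32_t		day;
	int32_t		month;
} Interval;

/* assumes +infinity is encoded as all-max fields */
static bool
interval_is_pos_infinity(const Interval *span)
{
	return span->time == INT64_MAX &&
		span->day == INT32_MAX &&
		span->month == INT32_MAX;
}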

Simon's argument for adding this is that we support 'infinity' for
timestamp, but is that a good argument for making 'interval' do it,
given that there are many other types like date for which we don't
support it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 'infinity'::Interval should be added

2018-12-16 Thread Tom Lane
Robert Haas  writes:
> Simon's argument for adding this is that we support 'infinity' for
> timestamp, but is that a good argument for making 'interval' do it,
> given that there are many other types like date for which we don't
> support it?

My feeling is that date is to timestamp as integer is to float.
We have infinities in the latter types but not the former, and
that seems just fine: infinity is one of the additional values
that you get to have with the bigger/more expensive type.
So I don't really feel that the lack of infinity in date is an
argument against whether to provide it for interval.

The positive argument for adding infinity to interval is that
we define operations such as timestamp minus timestamp as
yielding interval.  That's why this has to fail right now:

regression=# select timestamp 'infinity' - timestamp 'now';
ERROR:  cannot subtract infinite timestamps

But if we had infinite intervals then that would have a well
defined result, just as this works:

regression=# select extract(epoch from timestamp 'infinity');
 date_part 
-----------
  Infinity
(1 row)

Of course, there are still cases like timestamp 'infinity' -
timestamp 'infinity' that would need to fail, but that has a
semantic basis rather than "the output type can't represent it".
(No, I don't want to invent an interval equivalent of NaN
to make that not fail.)

[ wanders away wondering why type numeric has NaN but not infinity ]

regards, tom lane



Re: 'infinity'::Interval should be added

2018-12-16 Thread Isaac Morland
On Sun, 16 Dec 2018 at 22:27, Robert Haas  wrote:


> Simon's argument for adding this is that we support 'infinity' for
> timestamp, but is that a good argument for making 'interval' do it,
> given that there are many other types like date for which we don't
> support it?
>

postgres=> select 'infinity'::date, '-infinity'::date;
   date   |   date    
----------+-----------
 infinity | -infinity
(1 row)

postgres=>

Works since 8.4:

https://www.postgresql.org/docs/8.4/datatype-datetime.html

I think that in principle an argument can be made for having infinity and
-infinity in every data type, sort of like how every data type has null,
but based on the nature of the system as it exists I'm not going to advance
that argument.

I do think that decimal/numeric probably ought to have infinity, assuming
there is a reasonable way to fit them into the representation,
but integer types are currently exactly 2/4/8-byte 2's complement values so
I don't see how to squeeze in infinite values in a way that wouldn't mess
up existing code (or cause handling of integer values to be much slower).


Re: ALTER INDEX ... ALTER COLUMN not present in dump

2018-12-16 Thread Michael Paquier
On Fri, Dec 14, 2018 at 08:08:45AM +, Amul Sul wrote:
> dump-alter-index-stats-v2.patch looks pretty much reasonable to me; passing
> it on to committer.
> 
> The new status of this patch is: Ready for Committer

Thanks Amul for the review.  I got the occasion to look again at this
patch, and I have read again the original thread which has added the new
grammar for ALTER INDEX SET STATISTICS:
https://www.postgresql.org/message-id/CAPpHfdsSYo6xpt0F=ngadqmpfjjhc7zapde9h1qwkdphpwf...@mail.gmail.com

As Alexander and others state on this thread, it looks a bit weird to
use internally-produced attribute names in those SQL queries, which is
why the new grammar has been added.  At the same time, it looks more
solid to me to represent the dumps with those column names instead of
column numbers.  Tom, Alexander, as you have commented on the original
thread, perhaps you have an opinion here to share?

For now, attached is an updated patch which has a simplified test list
in the TAP test.  I have also added two free() calls for the arrays
getting allocated when statistics are present for an index.
--
Michael




Re: ALTER INDEX ... ALTER COLUMN not present in dump

2018-12-16 Thread Tom Lane
Michael Paquier  writes:
> As Alexander and others state on this thread, it looks a bit weird to
> use internally-produced attribute names in those SQL queries, which is
> why the new grammar has been added.  At the same time, it looks more
> solid to me to represent the dumps with those column names instead of
> column numbers.  Tom, Alexander, as you have commented on the original
> thread, perhaps you have an opinion here to share?

The problem is that there's no guarantee that the new server would
generate the same column name for an index column --- and I don't
want to try to lock things down so much that there would be such
a guarantee.  So I'd go with the column-number form.

As an example:

regression=# create table foo (expr int, f1 int, f2 int);
CREATE TABLE
regression=# create index on foo ((f1+f2));
CREATE INDEX
regression=# create index on foo (expr, (f1+f2));
CREATE INDEX
regression=# \d foo
Table "public.foo"
 Column |  Type   | Collation | Nullable | Default 
+-+---+--+-
 expr   | integer |   |  | 
 f1 | integer |   |  | 
 f2 | integer |   |  | 
Indexes:
"foo_expr_expr1_idx" btree (expr, (f1 + f2))
"foo_expr_idx" btree ((f1 + f2))

regression=# \d foo_expr_idx
 Index "public.foo_expr_idx"
 Column |  Type   | Key? | Definition 
+-+--+
 expr   | integer | yes  | (f1 + f2)
btree, for table "public.foo"

regression=# \d foo_expr_expr1_idx
  Index "public.foo_expr_expr1_idx"
 Column |  Type   | Key? | Definition 
+-+--+
 expr   | integer | yes  | expr
 expr1  | integer | yes  | (f1 + f2)
btree, for table "public.foo"

If we were to rename the "foo.expr" column at this point,
and then dump and reload, the expression column in the
second index would presumably acquire the name "expr"
not "expr1", because "expr" would no longer be taken.
So if pg_dump were to try to use that index column name
in ALTER ... SET STATISTICS, it'd fail.
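
For illustration, the two statement shapes being weighed look like this
(index and column names taken from the example above):

-- name-based form; fragile across dump/reload, per the example:
ALTER INDEX foo_expr_expr1_idx ALTER COLUMN expr1 SET STATISTICS 1000;
-- number-based form; the column's ordinal position in the index:
ALTER INDEX foo_expr_expr1_idx ALTER COLUMN 2 SET STATISTICS 1000;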

regards, tom lane



Re: ALTER INDEX ... ALTER COLUMN not present in dump

2018-12-16 Thread amul sul
On Mon, Dec 17, 2018 at 10:44 AM Michael Paquier  wrote:
>
> On Fri, Dec 14, 2018 at 08:08:45AM +, Amul Sul wrote:
> > dump-alter-index-stats-v2.patch looks pretty much reasonable to me; passing
> > it on to committer.
> >
> > The new status of this patch is: Ready for Committer
>
> Thanks Amul for the review.  I got the occasion to look again at this
> patch, and I have read again the original thread which has added the new
> grammar for ALTER INDEX SET STATISTICS:
> https://www.postgresql.org/message-id/CAPpHfdsSYo6xpt0F=ngadqmpfjjhc7zapde9h1qwkdphpwf...@mail.gmail.com
>
> As Alexander and others state on this thread, it looks a bit weird to
> use internally-produced attribute names in those SQL queries, which is
> why the new grammar has been added.  At the same time, it looks more
> solid to me to represent the dumps with those column names instead of
> column numbers.  Tom, Alexander, as you have commented on the original
> thread, perhaps you have an opinion here to share?
>

Oh I see -- understood the problem. I missed this discussion; thanks for
letting me know.

> For now, attached is an updated patch which has a simplified test list
> in the TAP test.  I have also added two free() calls for the arrays
> getting allocated when statistics are present for an index.

Patch is missing?

Regards,
Amul



Re: ALTER INDEX ... ALTER COLUMN not present in dump

2018-12-16 Thread Michael Paquier
On Mon, Dec 17, 2018 at 10:59:08AM +0530, amul sul wrote:
> Patch is missing?

Here you go.  The patch is still using attribute names, which is a bad
idea ;)
--
Michael
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..09e90ea62c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -6712,7 +6712,9 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 i_conoid,
 i_condef,
 i_tablespace,
-i_indreloptions;
+i_indreloptions,
+i_indstatcols,
+i_indstatvals;
 	int			ntups;
 
 	for (i = 0; i < numTables; i++)
@@ -6766,7 +6768,15 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			  "c.oid AS conoid, "
 			  "pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, "
 			  "(SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, "
-			  "t.reloptions AS indreloptions "
+			  "t.reloptions AS indreloptions, "
+			  "(SELECT pg_catalog.array_agg(attname ORDER BY attname::text COLLATE \"C\") "
+			  "  FROM pg_catalog.pg_attribute "
+			  "  WHERE attrelid = i.indexrelid AND "
+			  "attstattarget >= 0) AS indstatcols,"
+			  "(SELECT pg_catalog.array_agg(attstattarget ORDER BY attname::text COLLATE \"C\") "
+			  "  FROM pg_catalog.pg_attribute "
+			  "  WHERE attrelid = i.indexrelid AND "
+			  "attstattarget >= 0) AS indstatvals "
 			  "FROM pg_catalog.pg_index i "
 			  "JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) "
 			  "JOIN pg_catalog.pg_class t2 ON (t2.oid = i.indrelid) "
@@ -6803,7 +6813,9 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			  "c.oid AS conoid, "
 			  "pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, "
 			  "(SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, "
-			  "t.reloptions AS indreloptions "
+			  "t.reloptions AS indreloptions, "
+			  "'' AS indstatcols, "
+			  "'' AS indstatvals "
 			  "FROM pg_catalog.pg_index i "
 			  "JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) "
 			  "LEFT JOIN pg_catalog.pg_constraint c "
@@ -6836,7 +6848,9 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			  "c.oid AS conoid, "
 			  "pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, "
 			  "(SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, "
-			  "t.reloptions AS indreloptions "
+			  "t.reloptions AS indreloptions, "
+			  "'' AS indstatcols, "
+			  "'' AS indstatvals "
 			  "FROM pg_catalog.pg_index i "
 			  "JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) "
 			  "LEFT JOIN pg_catalog.pg_constraint c "
@@ -6865,7 +6879,9 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			  "c.oid AS conoid, "
 			  "null AS condef, "
 			  "(SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, "
-			  "t.reloptions AS indreloptions "
+			  "t.reloptions AS indreloptions, "
+			  "'' AS indstatcols, "
+			  "'' AS indstatvals "
 			  "FROM pg_catalog.pg_index i "
 			  "JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) "
 			  "LEFT JOIN pg_catalog.pg_depend d "
@@ -6897,7 +6913,9 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			  "c.oid AS conoid, "
 			  "null AS condef, "
 			  "(SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, "
-			  "null AS indreloptions "
+			  "null AS indreloptions, "
+			  "'' AS indstatcols, "
+			  "'' AS indstatvals "
 			  "FROM pg_catalog.pg_index i "
 			  "JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) "
 			  "LEFT JOIN pg_catalog.pg_depend d "
@@ -6935,6 +6953,8 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 		i_condef = PQfnumber(res, "condef");
 		i_tablespace = PQfnumber(res, "tablespace");
 		i_indreloptions = PQfnumber(res, "indreloptions");
+		i_indstatcols = PQfnumber(res, "indstatcols");
+		i_indstatvals = PQfnumber(res, "indstatvals");
 
 		tbinfo->indexes = indxinfo =
 			(IndxInfo *) pg_malloc(ntups * sizeof(IndxInfo));
@@ -6958,6 +6978,8 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			indxinfo[j].indnattrs = atoi(PQgetvalue(res, j, i_indnatts));
 			indxinfo[j].tablespace = pg_strdup(PQgetvalue(res, j, i_tablespace));
 			indxinfo[j].indreloptions = pg_strdup(PQgetvalue(res, j, i_indreloptions));
+			indxinfo[j].indstatcols = pg_strdup(PQgetvalue(res, j, i_indstatcols));
+			indxinfo[j].indstatvals = pg_strdup(PQgetvalue(res, j, i_indstatvals));
 			indxinfo[j].indkeys = (Oid *) pg_malloc(indxinfo[j].indnattrs * sizeof(Oid));
 			parseOidArray(PQgetvalue(res, j, i_indkey),
 		  indxinfo[j].indkeys, indxinfo[j].indnattrs);
@@ -16148,6 +16170,13 @@ dumpIndex(

Re: ALTER INDEX ... ALTER COLUMN not present in dump

2018-12-16 Thread Michael Paquier
On Mon, Dec 17, 2018 at 12:24:15AM -0500, Tom Lane wrote:
> If we were to rename the "foo.expr" column at this point,
> and then dump and reload, the expression column in the
> second index would presumably acquire the name "expr"
> not "expr1", because "expr" would no longer be taken.
> So if pg_dump were to try to use that index column name
> in ALTER ... SET STATISTICS, it'd fail.

Good point, thanks!  I did not think about the case where a table uses
an attribute name matching what would be generated for indexes.

So this settles the argument that we had better not do anything before
v11.  Switching the dump code to use column numbers has not proved to be
complicated as only the query and some comments had to be tweaked.
Attached is an updated patch, and I am switching back the patch to
"Needs review" to have an extra pair of eyes look at that in case I
missed something.
--
Michael
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 637c79af48..a46dd4c8e6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -6712,7 +6712,9 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 i_conoid,
 i_condef,
 i_tablespace,
-i_indreloptions;
+i_indreloptions,
+i_indstatcols,
+i_indstatvals;
 	int			ntups;
 
 	for (i = 0; i < numTables; i++)
@@ -6766,7 +6768,15 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			  "c.oid AS conoid, "
 			  "pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, "
 			  "(SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, "
-			  "t.reloptions AS indreloptions "
+			  "t.reloptions AS indreloptions, "
+			  "(SELECT pg_catalog.array_agg(attnum ORDER BY attnum) "
+			  "  FROM pg_catalog.pg_attribute "
+			  "  WHERE attrelid = i.indexrelid AND "
+			  "attstattarget >= 0) AS indstatcols,"
+			  "(SELECT pg_catalog.array_agg(attstattarget ORDER BY attnum) "
+			  "  FROM pg_catalog.pg_attribute "
+			  "  WHERE attrelid = i.indexrelid AND "
+			  "attstattarget >= 0) AS indstatvals "
 			  "FROM pg_catalog.pg_index i "
 			  "JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) "
 			  "JOIN pg_catalog.pg_class t2 ON (t2.oid = i.indrelid) "
@@ -6803,7 +6813,9 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			  "c.oid AS conoid, "
 			  "pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, "
 			  "(SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, "
-			  "t.reloptions AS indreloptions "
+			  "t.reloptions AS indreloptions, "
+			  "'' AS indstatcols, "
+			  "'' AS indstatvals "
 			  "FROM pg_catalog.pg_index i "
 			  "JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) "
 			  "LEFT JOIN pg_catalog.pg_constraint c "
@@ -6836,7 +6848,9 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			  "c.oid AS conoid, "
 			  "pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, "
 			  "(SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, "
-			  "t.reloptions AS indreloptions "
+			  "t.reloptions AS indreloptions, "
+			  "'' AS indstatcols, "
+			  "'' AS indstatvals "
 			  "FROM pg_catalog.pg_index i "
 			  "JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) "
 			  "LEFT JOIN pg_catalog.pg_constraint c "
@@ -6865,7 +6879,9 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			  "c.oid AS conoid, "
 			  "null AS condef, "
 			  "(SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, "
-			  "t.reloptions AS indreloptions "
+			  "t.reloptions AS indreloptions, "
+			  "'' AS indstatcols, "
+			  "'' AS indstatvals "
 			  "FROM pg_catalog.pg_index i "
 			  "JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) "
 			  "LEFT JOIN pg_catalog.pg_depend d "
@@ -6897,7 +6913,9 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			  "c.oid AS conoid, "
 			  "null AS condef, "
 			  "(SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, "
-			  "null AS indreloptions "
+			  "null AS indreloptions, "
+			  "'' AS indstatcols, "
+			  "'' AS indstatvals "
 			  "FROM pg_catalog.pg_index i "
 			  "JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) "
 			  "LEFT JOIN pg_catalog.pg_depend d "
@@ -6935,6 +6953,8 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 		i_condef = PQfnumber(res, "condef");
 		i_tablespace = PQfnumber(res, "tablespace");
 		i_indreloptions = PQfnumber(res, "indreloptions");
+		i_indstatcols = PQfnumber(res, "indstatcols");
+		i_indstatvals = PQfnumber(res, "indstatvals");
 
 		tbinfo->indexes = indxinfo =
 			(IndxInfo *) pg_malloc(ntups * sizeof(I

Fixing typos in tests of partition_info.sql

2018-12-16 Thread Michael Paquier
Hi Amit,
(CC: -hackers)

I was just going through some of the tests, when I noticed that the
tests of partition_info.sql have two typos and that the last set of
tests is imprecise about the expected behavior of the functions.

Do you think that something like the attached is an improvement?

Thanks,
--
Michael




Re: ALTER INDEX ... ALTER COLUMN not present in dump

2018-12-16 Thread amul sul
On Mon, Dec 17, 2018 at 11:54 AM Michael Paquier  wrote:
>
> On Mon, Dec 17, 2018 at 12:24:15AM -0500, Tom Lane wrote:
> > If we were to rename the "foo.expr" column at this point,
> > and then dump and reload, the expression column in the
> > second index would presumably acquire the name "expr"
> > not "expr1", because "expr" would no longer be taken.
> > So if pg_dump were to try to use that index column name
> > in ALTER ... SET STATISTICS, it'd fail.
>
> Good point, thanks!  I did not think about the case where a table uses
> an attribute name matching what would be generated for indexes.
>
> So this settles the argument that we had better not do anything before
> v11.  Switching the dump code to use column numbers has not proved to be
> complicated as only the query and some comments had to be tweaked.
> Attached is an updated patch, and I am switching back the patch to
> "Needs review" to have an extra pair of eyes look at that in case I
> missed something.

+1, will have a look, thanks.

Regards,
Amul



Re: Fixing typos in tests of partition_info.sql

2018-12-16 Thread Michael Paquier
On Mon, Dec 17, 2018 at 03:40:28PM +0900, Michael Paquier wrote:
> I was just going through some of the tests, when I noticed that the
> tests of partition_info.sql have two typos and that the last set of
> tests is imprecise about the expected behavior of the functions.
> 
> Do you think that something like the attached is an improvement?

Meuh.  Patch forgotten.
--
Michael
diff --git a/src/test/regress/expected/partition_info.out b/src/test/regress/expected/partition_info.out
index 202d820827..6bd2d96fb6 100644
--- a/src/test/regress/expected/partition_info.out
+++ b/src/test/regress/expected/partition_info.out
@@ -99,7 +99,7 @@ SELECT relid, parentrelid, level, isleaf
 (1 row)
 
 DROP TABLE ptif_test;
--- A table not part of a partition tree works is the only member listed.
+-- A table not part of a partition tree is the only member listed.
 CREATE TABLE ptif_normal_table(a int);
 SELECT relid, parentrelid, level, isleaf
   FROM pg_partition_tree('ptif_normal_table');
@@ -109,7 +109,8 @@ SELECT relid, parentrelid, level, isleaf
 (1 row)
 
 DROP TABLE ptif_normal_table;
--- Views and materialized viewS cannot be part of a partition tree.
+-- Views and materialized views are not part of a partition tree,
+-- causing the functions to return NULL.
 CREATE VIEW ptif_test_view AS SELECT 1;
 CREATE MATERIALIZED VIEW ptif_test_matview AS SELECT 1;
 SELECT * FROM pg_partition_tree('ptif_test_view');
diff --git a/src/test/regress/sql/partition_info.sql b/src/test/regress/sql/partition_info.sql
index 9b55a7fe5c..0a491d295e 100644
--- a/src/test/regress/sql/partition_info.sql
+++ b/src/test/regress/sql/partition_info.sql
@@ -54,13 +54,14 @@ SELECT relid, parentrelid, level, isleaf
 
 DROP TABLE ptif_test;
 
--- A table not part of a partition tree works is the only member listed.
+-- A table not part of a partition tree is the only member listed.
 CREATE TABLE ptif_normal_table(a int);
 SELECT relid, parentrelid, level, isleaf
   FROM pg_partition_tree('ptif_normal_table');
 DROP TABLE ptif_normal_table;
 
--- Views and materialized viewS cannot be part of a partition tree.
+-- Views and materialized views are not part of a partition tree,
+-- causing the functions to return NULL.
 CREATE VIEW ptif_test_view AS SELECT 1;
 CREATE MATERIALIZED VIEW ptif_test_matview AS SELECT 1;
 SELECT * FROM pg_partition_tree('ptif_test_view');




Re: Fixing typos in tests of partition_info.sql

2018-12-16 Thread Amit Langote
Hi,

On 2018/12/17 15:40, Michael Paquier wrote:
> Hi Amit,
> (CC: -hackers)
> 
> I was just going through some of the tests, when I noticed that the
> tests of partition_info.sql have two typos and that the last set of
> tests is imprecise about the expected behavior of the functions.
> 
> Do you think that something like the attached is an improvement?

You forgot to attach something. :)

Thanks,
Amit




Re: Fixing typos in tests of partition_info.sql

2018-12-16 Thread Amit Langote
On 2018/12/17 15:52, Michael Paquier wrote:
> On Mon, Dec 17, 2018 at 03:40:28PM +0900, Michael Paquier wrote:
>> I was just going through some of the tests, when I noticed that the
>> tests of partition_info.sql have two typos and that the last set of
>> tests is imprecise about the expected behavior of the functions.
>>
>> Do you think that something like the attached is an improvement?
> 
> Meuh.  Patch forgotten.

Thanks.

--- A table not part of a partition tree works is the only member listed.
+-- A table not part of a partition tree is the only member listed.

How about:

-- Table that is not part of any partition tree is the only member listed

--- Views and materialized viewS cannot be part of a partition tree.
+-- Views and materialized views are not part of a partition tree,
+-- causing the functions to return NULL.

How about:

-- Function returns NULL for relation types that cannot be part of a
-- partition tree; for example, views, materialized views, etc.

Thanks,
Amit




Re: ALTER INDEX ... ALTER COLUMN not present in dump

2018-12-16 Thread amul sul
Hi Michael,

On Mon, Dec 17, 2018 at 11:54 AM Michael Paquier  wrote:
>
[...]
> So this settles the argument that we had better not do anything before
> v11.  Switching the dump code to use column numbers has not proved to be
> complicated as only the query and some comments had to be tweaked.
> Attached is an updated patch, and I am switching back the patch to
> "Needs review" to have an extra pair of eyes look at that in case I
> missed something.

This v4-patch needs a rebase against the latest master head(#67915fb).

Regards,
Amul



Re: Remove double trailing semicolons

2018-12-16 Thread Amit Kapila
On Mon, Dec 17, 2018 at 1:53 AM David Rowley
 wrote:
>
> While looking over some recent commits, I noticed there are some code
> lines with a double trailing semicolon instead of a single one.  The
> attached fixes these.
>

LGTM.  I will commit it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: ALTER INDEX ... ALTER COLUMN not present in dump

2018-12-16 Thread Michael Paquier
On Mon, Dec 17, 2018 at 12:49:03PM +0530, amul sul wrote:
> On Mon, Dec 17, 2018 at 11:54 AM Michael Paquier  wrote:
>> So this settles the argument that we had better not do anything before
>> v11.  Switching the dump code to use column numbers has not proved to be
>> complicated as only the query and some comments had to be tweaked.
>> Attached is an updated patch, and I am switching back the patch to
>> "Needs review" to have an extra pair of eyes look at that in case I
>> missed something.
> 
> This v4-patch needs a rebase against the latest master head(#67915fb).

I am on top of the master branch at 67915fb8, and this applies fine for
me:
$ patch -p1 < dump-alter-index-stats-v4.patch
patching file src/bin/pg_dump/pg_dump.c
patching file src/bin/pg_dump/pg_dump.h
patching file src/bin/pg_dump/t/002_pg_dump.pl

Thanks,
--
Michael




Re: Fixing typos in tests of partition_info.sql

2018-12-16 Thread Michael Paquier
On Mon, Dec 17, 2018 at 04:14:07PM +0900, Amit Langote wrote:
> --- A table not part of a partition tree works is the only member listed.
> +-- A table not part of a partition tree is the only member listed.
> 
> How about:
> 
> -- Table that is not part of any partition tree is the only member listed
> 
> --- Views and materialized viewS cannot be part of a partition tree.
> +-- Views and materialized views are not part of a partition tree,
> +-- causing the functions to return NULL.
> 
> How about:
> 
> -- Function returns NULL for relation types that cannot be part of a
> -- partition tree; for example, views, materialized views, etc.

Your two suggestions look fine to me, so let's just reuse your wording.
I would just use the plural for the last comment, "Functions return",
as pg_partition_tree gets called multiple times, and later on
pg_partition_root will likely be added.

Perhaps there are other suggestions?
--
Michael




Re: Fixing typos in tests of partition_info.sql

2018-12-16 Thread Amit Langote
On 2018/12/17 16:38, Michael Paquier wrote:
> On Mon, Dec 17, 2018 at 04:14:07PM +0900, Amit Langote wrote:
>> --- A table not part of a partition tree works is the only member listed.
>> +-- A table not part of a partition tree is the only member listed.
>>
>> How about:
>>
>> -- Table that is not part of any partition tree is the only member listed
>>
>> --- Views and materialized viewS cannot be part of a partition tree.
>> +-- Views and materialized views are not part of a partition tree,
>> +-- causing the functions to return NULL.
>>
>> How about:
>>
>> -- Function returns NULL for relation types that cannot be part of a
>> -- partition tree; for example, views, materialized views, etc.
> 
> Your two suggestions look fine to me, so let's just reuse your wording.
> I would just use the plural for the last comment, "Functions return",
> as pg_partition_tree gets called multiple times, and later on
> pg_partition_root will likely be added.

Okay, let's use "Functions" then, although I wonder if you shouldn't tweak
it later when you commit the pg_partition_root patch?

Thanks,
Amit




Re: ALTER INDEX ... ALTER COLUMN not present in dump

2018-12-16 Thread amul sul
On Mon, Dec 17, 2018 at 1:04 PM Michael Paquier  wrote:
>
> On Mon, Dec 17, 2018 at 12:49:03PM +0530, amul sul wrote:
> > On Mon, Dec 17, 2018 at 11:54 AM Michael Paquier  wrote:
> >> So this settles the argument that we had better not do anything before
> >> v11.  Switching the dump code to use column numbers has not proved to be
> >> complicated as only the query and some comments had to be tweaked.
> >> Attached is an updated patch, and I am switching back the patch to
> >> "Needs review" to have an extra pair of eyes look at that in case I
> >> missed something.
> >
> > This v4-patch needs a rebase against the latest master head(#67915fb).
>
> I am on top of the master branch at 67915fb8, and this applies fine for
> me:
> $ patch -p1 < dump-alter-index-stats-v4.patch
> patching file src/bin/pg_dump/pg_dump.c
> patching file src/bin/pg_dump/pg_dump.h
> patching file src/bin/pg_dump/t/002_pg_dump.pl
>

Thanks, the patch command works for me as well; with git I was getting the
following failure:

Laptop215:PG edb$ git apply ~/Downloads/dump-alter-index-stats-v4.patch

/Users/edb/Downloads/dump-alter-index-stats-v4.patch:10: trailing whitespace.
i_indreloptions,
/Users/edb/Downloads/dump-alter-index-stats-v4.patch:11: trailing whitespace.
i_indstatcols,
/Users/edb/Downloads/dump-alter-index-stats-v4.patch:12: trailing whitespace.
i_indstatvals;
/Users/edb/Downloads/dump-alter-index-stats-v4.patch:21: trailing whitespace.
  "t.reloptions AS indreloptions, "
/Users/edb/Downloads/dump-alter-index-stats-v4.patch:22: trailing whitespace.
  "(SELECT pg_catalog.array_agg(attnum ORDER BY attnum) "
error: patch failed: src/bin/pg_dump/pg_dump.c:6712
error: src/bin/pg_dump/pg_dump.c: patch does not apply
error: patch failed: src/bin/pg_dump/pg_dump.h:360
error: src/bin/pg_dump/pg_dump.h: patch does not apply
error: patch failed: src/bin/pg_dump/t/002_pg_dump.pl:2343
error: src/bin/pg_dump/t/002_pg_dump.pl: patch does not apply

Laptop215:PG edb$ git --version
git version 2.14.1

Regards,
Amul