RE: speeding up planning with partitions

2019-03-29 Thread Imai, Yoshikazu
On Fri, Mar 29, 2019 at 3:45 PM, Amit Langote wrote:
> Thanks a lot for hacking on the patch.  I'm really happy with the direction
> you took for inheritance_planner, as it allows UPDATE/DELETE to use
> partition pruning.

I was astonished by Tom's awesome work, and I really thank him.

> Certainly.  Note that previously we'd always scan *all* hash partitions
> for UPDATE and DELETE queries, because constraint exclusion can't exclude
> hash partitions due to the shape of their partition constraint.
> 
> I ran my usual benchmark with up to 8192 partitions.
> 
> N: 2..8192
> 
> create table rt (a int, b int, c int) partition by range (a); select 'create
> table rt' || x::text || ' partition of rt for values from (' || (x)::text
> || ') to (' || (x+1)::text || ');' from generate_series(1,
> N) x;
> \gexec
> 
> update.sql:
> 
> \set param random(1, N)
> update rt set a = 0 where a = :param;
> 
> pgbench -n -T 120 -f update.sql
> 
> nparts   v38    HEAD
> ======   ====   ====
> 2        2971   2969
> 8        2980   1949
> 32       2955    733
> 128      2946    145
> 512      2924     11
> 1024     2986      3
> 4096     2702      0
> 8192     2531    OOM
> 
> Obviously, you'll get similar numbers with hash or list partitioning.

I also ran the test for hash partitioning, just to make sure.


N: 2..8192

create table ht (a int, b int, c int) partition by hash (a);
select 'create table ht' || x::text ||
' partition of ht for values with (MODULUS N, REMAINDER ' || (x)::text || ');'
from generate_series(0, N-1) x;
\gexec

update.sql:

\set param random(1, N * 100)
update ht set b = b + 1 where a = :param;

pgbench -n -T 60 -f update.sql


[updating one partition]
nparts   v38    HEAD
======   =====  =====
0        10538  10487
2         6942   7028
4         7043   5645
8         6981   3954
16        6932   2440
32        6897   1243
64        6897    309
128       6753    120
256       6727     46
512       6708     12
1024      6063      3
2048      5894      1
4096      5374    OOM
8192      4572    OOM


Performance for hash partitioning is also improved, though the rate at which
performance drops off with many partitions seems higher than for range partitioning.
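As an aside, the asymmetry between pruning and constraint exclusion for hash partitions mentioned above can be sketched like this (Python's hash() stands in for PostgreSQL's partition hash function, and the names are mine; this is an illustration, not the real algorithm):

```python
# Hypothetical sketch: why partition pruning can prune hash partitions
# while constraint exclusion cannot.
NPARTS = 8

def route(a):
    # Pruning computes the single matching partition directly from the value.
    return hash(a) % NPARTS

# Constraint exclusion instead tries to *refute* each partition's constraint
# ("satisfies_hash_partition(modulus, remainder, a)") from the predicate
# "a = 5".  That implication cannot be proven without evaluating the hash,
# so no partition is ever excluded and all NPARTS must be scanned.
predicate_value = 5
pruned_to = route(predicate_value)            # exactly one partition survives
scanned_by_exclusion = list(range(NPARTS))    # all partitions survive

print(pruned_to in scanned_by_exclusion)      # True; pruning scans a strict subset
```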

Thanks
--
Imai Yoshikazu





RE: Timeout parameters

2019-03-29 Thread Nagaura, Ryohei
Hi, Kirk-san,

> From: Jamison, Kirk [mailto:k.jami...@jp.fujitsu.com]
> In addition, regarding socket_timeout parameter.
> I referred to the doc in libpq.sgml, corrected misspellings, and rephrased the
> doc a little bit as below:
You aimed for consistency with connect_timeout, didn't you?
If so, that is a nice idea!

Best regards,
-
Ryohei Nagaura



socket_timeout_v11.patch
Description: socket_timeout_v11.patch


TCP_backend_v17.patch
Description: TCP_backend_v17.patch


TCP_interface_v17.patch
Description: TCP_interface_v17.patch


Re: idle-in-transaction timeout error does not give a hint

2019-03-29 Thread Tatsuo Ishii
> Personally, I don't find this hint particularly necessary.  The session
> was terminated because nothing was happening, so the real fix on the
> application side is probably more involved than just retrying.  This is
> different from some of the other cases that were cited, such as
> serialization conflicts, where you just got unlucky due to concurrent
> events.  In the case of idle-in-transaction-timeout, the fault is
> entirely your own.

Hum. Sounds like a fair argument. Ideriha-san, what do you think?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: REINDEX CONCURRENTLY 2.0

2019-03-29 Thread Peter Eisentraut
On 2019-03-28 09:07, Sergei Kornilov wrote:
> Unfortunately the patch does not apply due to recent commits. Any chance this can be 
> fixed (and even committed in pg12)?

Committed :)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Pluggable Storage - Andres's take

2019-03-29 Thread Haribabu Kommi
On Wed, Mar 27, 2019 at 11:17 AM Andres Freund  wrote:

> Hi,
>
> On 2019-02-22 14:52:08 -0500, Robert Haas wrote:
> > On Fri, Feb 22, 2019 at 11:19 AM Amit Khandekar 
> wrote:
> > > Thanks for the review. Attached v2.
> >
> > Thanks.  I took this, combined it with Andres's
> > v12-0040-WIP-Move-xid-horizon-computation-for-page-level-.patch, did
> > some polishing of the code and comments, and pgindented.  Here's what
> > I ended up with; see what you think.
>
> I pushed this after some fairly minor changes, directly including the
> patch to route the horizon computation through tableam. The only real
> change is that I removed the table relfilenode from the nbtree/hash
> deletion WAL record - it was only required to access the heap without
> accessing the catalog and was unused now.  Also added a WAL version
> bump.
>
> It seems possible that some other AM might want to generalize the
> prefetch logic from heapam.c, but I think it's fair to defer that until
> such an AM wants to do so
>

As I see that you are fixing some typos in the committed code,
I just want to share some more corrections that I found in the patches
committed so far.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-dA-to-show-Table-type-access-method.patch
Description: Binary data


0002-Typos-and-comemnt-corrections.patch
Description: Binary data


Re: REINDEX CONCURRENTLY 2.0

2019-03-29 Thread Sergei Kornilov
>>  Unfortunately the patch does not apply due to recent commits. Any chance this can 
>> be fixed (and even committed in pg12)?
>
> Committed :)

wow! Congratulations! This was very long way

my favorite pg12 feature

regards, Sergei




Re: Timeout parameters

2019-03-29 Thread Kyotaro HORIGUCHI
Hi, thank you for the new version.

This compiles on Windows. (I haven't confirmed that it works
correctly..)

# ȁ  (I inserted this garbage to force my mailer to send in utf-8).

At Fri, 29 Mar 2019 06:52:41 +, "Nagaura, Ryohei" 
 wrote in 

> Hi all.
> 
> I found my mistake in backend patch.
> 
> I modified from
> + port->keepalives_count = count;
> to
> + port->tcp_user_timeout = timeout;
> in line 113.

I hadn't noticed^^;

It looks good, but I still have some comments.

+Specifies the number of milliseconds that transmitted data may
+remain unacknowledged before a connection is forcefully closed.

Hmm. "forcefully" means powerful or assertive, in Japanese "力強く
" or "強硬に".  "forcibly" means accomplished through force, in
Japanese "無理やり" or "強引に". Actually the latter is used in
man 7 tcp.

http://man7.org/linux/man-pages/man7/tcp.7.html
>  time in milliseconds that transmitted data may remain
>  unacknowledged before TCP will forcibly close the
>  corresponding connection and return ETIMEDOUT to the

+#tcp_user_timeout = 0  # TCP_USER_TIMEOUT, in milliseconds;

The comment protrudes left.

+# - TCP USER TIMEOUT -

Just above in the sample file, there are lines like this:

> # - TCP Keepalives -
> # see "man 7 tcp" for details
> 
> #tcp_keepalives_idle = 0  # TCP_KEEPIDLE, in seconds;

Couldn't we have the new variable in the section, like this?

-# - TCP Keepalives -
+# - TCP timeout settings -
 # see "man 7 tcp" for details
..
 #tcp_keepalives_count = 0  # TCP_KEEPCNT;
# 0 selects the system default
+#tcp_user_timeout = 0  # TCP_USER_TIMEOUT, in milliseconds;
# 0 selects the system default
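For reference, a minimal Python sketch of what the proposed tcp_user_timeout boils down to at the socket level on Linux (the option and the 0-means-system-default convention follow man 7 tcp; the helper name is mine, not the patch's):

```python
import socket

# TCP_USER_TIMEOUT is a Linux-specific option; Python exposes the constant
# on platforms that support it (it is 18 on Linux).
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)

def set_tcp_user_timeout(sock, timeout_ms):
    """Set the user timeout; 0 keeps the system default, mirroring the GUC."""
    if timeout_ms > 0:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_tcp_user_timeout(s, 5000)   # close if data stays unacknowledged for 5s
s.close()
```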


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


RE: Timeout parameters

2019-03-29 Thread Nagaura, Ryohei
Hi all.

I found that connect_timeout uses pqWaitTimed().
socket_timeout is related to pqWait(), not pqWaitTimed().
Thus, I removed the connect_timeout handling from my socket_timeout patch.

FYI, I summarized a use case of this parameter.
The connection is built successfully.
Suppose that the server is then hit by some kind of accident (e.g., one of
those I or Tsunakawa-san suggested).
At this time, there is no guarantee that the server OS can pass control to the
postmaster.
Therefore, it is natural to disconnect the connection from the
client side.
The disconnection is implemented in pqWait() because that is where the
client's infinite wait occurs.
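To make the intended behavior concrete, here is a minimal Python sketch of the client-side pattern (Python's settimeout() stands in for the pqWait() timeout; this is an analogy, not libpq's actual code):

```python
import socket

# A pair of connected sockets; the "server" side never answers, so without a
# timeout the client's recv() would wait forever -- the case socket_timeout
# is meant to bound.
client, silent_server = socket.socketpair()
client.settimeout(0.2)          # the socket_timeout analogue, in seconds

timed_out = False
try:
    client.recv(1)              # server never replies
except socket.timeout:
    timed_out = True
    client.close()              # disconnect from the client side

silent_server.close()
print(timed_out)  # True
```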

Best regards,
-
Ryohei Nagaura




socket_timeout_v12.patch
Description: socket_timeout_v12.patch


TCP_backend_v17.patch
Description: TCP_backend_v17.patch


TCP_interface_v17.patch
Description: TCP_interface_v17.patch


Re: REINDEX CONCURRENTLY 2.0

2019-03-29 Thread Michael Paquier
On Fri, Mar 29, 2019 at 10:39:23AM +0300, Sergei Kornilov wrote:
> wow! Congratulations! This was very long way
> 
> my favorite pg12 feature

So this has been committed, nice!  Thanks a lot to all for keeping
alive this patch over the ages, with particular thanks to Andreas and
Peter.
--
Michael


signature.asc
Description: PGP signature


RE: Libpq support to connect to standby server as priority

2019-03-29 Thread Tsunakawa, Takayuki
Hi Hari-san,

I've reviewed all the files.  The patch would be OK when the following have 
been fixed, except for the complexity of fe-connect.c (which probably cannot be 
improved.)

Unfortunately, I'll be absent next week.  The earliest date I can do the test 
will be April 8 or 9.  I hope someone could take care of this patch...


(1) 0001
With this read-only option type, application can connect to
to a read-only server in the list of hosts, in case
...
before issuing the SHOW transaction_readonly to find out whether


"to" appears twice in a row.
transaction_readonly -> transaction_read_only


(2) 0001
+succesful connection or failure.

succesful -> successful


(3) 0008
to conenct to a standby server with a faster check instead of

conenct -> connect


(4) 0008
Logically, recovery exit should be notified after the following statement:

XLogCtl->SharedRecoveryInProgress = false;


(5) 0008
+   /* Update in_recovery status. */
+   if (LocalRecoveryInProgress)
+   SetConfigOption("in_recovery",
+   "on",
+   PGC_INTERNAL, 
PGC_S_OVERRIDE);
+

This SetConfigOption() is called for every RecoveryInProgress() call on the 
standby.  It may cause undesirable overhead.  How about just calling 
SetConfigOption() once in InitPostgres() to set the initial value for 
in_recovery?  InitPostgres() and its subsidiary functions call 
SetConfigOption() likewise.


(6) 0008
async.c is for LISTEN/UNLISTEN/NOTIFY.  How about adding the new functions in 
postgres.c like RecoveryConflictInterrupt()?


(7) 0008
+   if (pid != 0)
+   {
+   (void) SendProcSignal(pid, reason, procvxid.backendId);
+   }

The braces are not necessary because the block only contains a single statement.


Regards
Takayuki Tsunakawa



Re: Multitenancy optimization

2019-03-29 Thread Hadi Moshayedi
On Thu, Mar 28, 2019 at 5:40 AM Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> Certainly it is possible to create multicolumn statistics to notify
> Postgres about columns correlation.
> But unfortunately it is not good and working solution.
>
> First of all we have to create multicolumn statistic for all possible
> combinations of table's attributes including "tenant_id".
> It is very inconvenient and inefficient.
>

On the inconvenient part: doesn't postgres itself automatically create
functional dependencies on combinations? i.e. it seems to me if we create
statistics on (a, b, c), then we don't need to create statistics on (a, b)
or (a, c) or (b, c), because the pg_statistic_ext entry for (a, b, c)
already includes enough information.

On the inefficient part, I think there's some areas of improvement here.
For example, if (product_id) -> seller_id correlation is 1.0, then
(product_id, product_name) -> seller_id correlation is definitely 1.0 and
we don't need to store it. So we can reduce the amount of information
stored in pg_statistic_ext -> stxdependencies, without losing any data
points.

More generally, if (a) -> b correlation is X, then (a, c) -> b correlation
is >= X. Maybe we can have a threshold to reduce number of entries in
pg_statistic_ext -> stxdependencies.
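To illustrate the monotonicity claim, here is a toy Python sketch of a dependency "degree" (PostgreSQL's dependency_degree() in the extended-statistics code is more refined; the helper and the data set here are made up):

```python
from collections import Counter, defaultdict

def dependency_degree(rows, determinant, dependent):
    # For each value of the determinant columns, count the rows agreeing
    # with the most common dependent value; 1.0 means a perfect dependency.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[c] for c in determinant)
        groups[key][row[dependent]] += 1
    consistent = sum(c.most_common(1)[0][1] for c in groups.values())
    return consistent / len(rows)

# product_id fully determines seller_id in this toy data set.
rows = [
    {"product_id": 1, "product_name": "pen", "seller_id": 10},
    {"product_id": 1, "product_name": "pen", "seller_id": 10},
    {"product_id": 2, "product_name": "ink", "seller_id": 20},
    {"product_id": 3, "product_name": "pad", "seller_id": 20},
]
d1 = dependency_degree(rows, ("product_id",), "seller_id")
d2 = dependency_degree(rows, ("product_id", "product_name"), "seller_id")
print(d1, d2)   # 1.0 1.0 -- widening the determinant never lowers the degree
```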

-- Hadi


Re: REINDEX CONCURRENTLY 2.0

2019-03-29 Thread Peter Eisentraut
On 2019-03-29 09:04, Michael Paquier wrote:
> On Fri, Mar 29, 2019 at 10:39:23AM +0300, Sergei Kornilov wrote:
>> wow! Congratulations! This was very long way
>>
>> my favorite pg12 feature
> 
> So this has been committed, nice!  Thanks a lot to all for keeping
> alive this patch over the ages, with particular thanks to Andreas and
> Peter.

So, we're getting buildfarm failures, only with clang.  I can reproduce
those (with clang).

It seems the issue is somewhere near indexcmds.c "Phase 6 of REINDEX
CONCURRENTLY".  More eyes welcome.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: Timeout parameters

2019-03-29 Thread Nagaura, Ryohei
Hi Horiguchi-san,
Thank you so much for your review!

> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> Hmm. "forcefully" means powerful or assertive, in Japanese "力強く
> " or "強硬に".  "forcibly" means accomplished through force, in Japanese "
> 無理やり" or "強引に". Actually the latter is used in man 7 tcp.
Oops, I adopted the latter in both patches about tcp user timeout.

> The comment protrudes left.
Fixed.

> Couldn't we have the new variable in the section, like this?
ok!

Best regards,
-
Ryohei Nagaura




socket_timeout_v12.patch
Description: socket_timeout_v12.patch


TCP_backend_v19.patch
Description: TCP_backend_v19.patch


TCP_interface_v18.patch
Description: TCP_interface_v18.patch


Re: Syntax diagrams in user documentation

2019-03-29 Thread Peter Eisentraut
On 2019-03-29 03:53, Corey Huinker wrote:
> #3b As long as I live, I will never properly memorize the syntax for
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. I will google this
> and copy-paste it. I suspect I'm not alone. If it's available only in an
> image, then I can't copy paste, and I /will/ mistype some part of that
> at least twice.

I doubt that we would remove the current textual synopses.  The graphics
would just be an addition.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Protect syscache from bloating with negative cache entries

2019-03-29 Thread Kyotaro HORIGUCHI
Hello. Sorry for being a bit late.

At Wed, 27 Mar 2019 17:30:37 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20190327.173037.40342566.horiguchi.kyot...@lab.ntt.co.jp>
> > I don't see much point in continuing to review this patch at this
> > point.  There's been no new version of the patch in 3 weeks, and there
> > is -- in my view at least -- a rather frustrating lack of evidence
> > that the complexity this patch introduces is actually beneficial.  No
> > matter how many people +1 the idea of making this more complicated, it
> > can't be justified unless you can provide a test result showing that
> > the additional complexity solves a problem that does not get solved
> > without that complexity.  And even then, who is going to commit a
> > patch that uses a design which Tom Lane says was tried before and
> > stunk?
> 
> Hmm. Anyway it is hit by recent commit. I'll post a rebased
> version and a version reverted to do hole-scan. Then I'll take
> numbers as far as I can and will show the result.. tomorrow.

I took performance numbers for master and three versions of the
patch: LRU, full-scan, and modified full-scan. I noticed that
useless scans can be skipped in the full-scan version, so I added
that last version.

I ran three artificial test cases. The database is created by
gen_tbl.pl. Numbers are the average of the fastest five runs in
successive 15 runs.

Test cases are listed below.

1_0. About 3,000,000 negative entries are created in the pg_statistic
  cache by scanning that many distinct columns (3000 tables
  * 1001 columns). Pruning scans happen several times during a run
  but no entries are removed. This emulates the bloating phase of
  the cache. catalog_cache_prune_min_age is default (300s).
  (access_tbl1.pl)

1_1. Same as 1_0, except that catalog_cache_prune_min_age is 0,
  which means pruning is turned off.

2_0. Repeatedly access 1001 of the 3,000,000 entries 6000
  times. This emulates the stable cache case without having
  pruning. catalog_cache_prune_min_age is default (300s).
 (access_tbl2.pl)

2_1. Same as 2_0, except that catalog_cache_prune_min_age is 0,
  which means pruning is turned off.

3_0. Scan over the 3,000,000 entries twice with setting prune_age
  to 10s. A run takes about 18 seconds on my box so fair amount
  of old entries are removed. This emulates the stable case with
  continuous pruning. (access_tbl3.pl)

3_1. Same as 3_0, except that catalog_cache_prune_min_age is 0,
  which means pruning is turned off.


The result follows.

 | master |  LRU   |  Full  |Full-mod|
-|++++
 1_0 | 17.287 | 17.370 | 17.255 | 16.623 |
 1_1 | 17.287 | 17.063 | 16.336 | 17.192 |
 2_0 | 15.695 | 18.769 | 18.563 | 15.527 |
 2_1 | 15.695 | 18.603 | 18.498 | 18.487 |
 3_0 | 26.576 | 33.817 | 34.384 | 34.971 |
 3_1 | 26.576 | 27.462 | 26.202 | 26.368 |

The results for 2_0 and 2_1 seem strange, but I show you the
numbers as they are for now.

- Full-scan seems to have the smallest impact when turned off.

- Full-scan-mod seems to perform best in total (as long as the
  Full-mod number for 2_0 is a bogus value..).

- LRU doesn't seem to outperform full scanning.

For your information I measured how long pruning takes time.

LRU:        318318 out of 2097153 entries in 26ms:   0.08us/entry.
Full-scan:  443443 out of 2097153 entries in 184ms:  0.4us/entry.

LRU is actually fast to remove entries but the difference seems
to be canceled by the complexity of LRU maintenance.
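As a toy model of that trade-off (not the catcache code itself; the data and helper names are invented), both strategies evict entries idle longer than prune_age, but LRU can stop at the first young entry while paying a per-access bookkeeping cost:

```python
from collections import OrderedDict

def prune_full_scan(cache, now, prune_age):
    # Visit every entry, remove the stale ones.
    victims = [k for k, last in cache.items() if now - last > prune_age]
    for k in victims:
        del cache[k]
    return len(victims)

def prune_lru(cache, now, prune_age):
    # cache is an OrderedDict ordered oldest-access first; every cache hit
    # must do cache.move_to_end(key) -- that maintenance is the cost that
    # cancels LRU's faster eviction in the numbers above.
    removed = 0
    while cache:
        key, last = next(iter(cache.items()))
        if now - last <= prune_age:
            break               # everything after this is younger: stop early
        cache.popitem(last=False)
        removed += 1
    return removed

now = 1000.0
entries = {f"entry{i}": now - i for i in range(100)}   # idle times 0..99
full = dict(entries)
lru = OrderedDict(sorted(entries.items(), key=lambda kv: kv[1]))
removed_full = prune_full_scan(full, now, 50)
removed_lru = prune_lru(lru, now, 50)
print(removed_full, removed_lru)   # 49 49 -- same victims, different scan cost
```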

As my conclusion, we should go with the Full-scan or
Full-scan-mod version. I will conduct a further overnight test and
see which is better.

I attached the test script set. It is used in the following manner.

(start server)
# perl gen_tbl.pl | psql postgres
(stop server)
# sh run.sh 30 > log.txt   # 30 is repeat count
# perl process.pl
 | master |  LRU   |  Full  |Full-mod|
-|++++
 1_0 | 16.711 | 17.647 | 16.767 | 17.256 |
...


The attached files are as follows.

LRU versions patches.
  LRU-0001-Add-dlist_move_tail.patch
  LRU-0002-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

Fullscn version patch.
  FullScan-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

Fullscn-mod version patch.
  FullScan-mod-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

test scripts.
  test_script.tar.gz


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 1c397d118a65d6b76282cc904c43ecfe97ee5329 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 7 Feb 2019 14:56:07 +0900
Subject: [PATCH 1/2] Add dlist_move_tail

We have dlist_push_head/tail and dlist_move_head but not
dlist_move_tail. Add it.
---
 src/include/lib/ilist.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index b1a5974ee4..659ab1ac87 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move elemen

Re: Syntax diagrams in user documentation

2019-03-29 Thread Peter Eisentraut
On 2019-03-28 23:45, Jeremy Schneider wrote:
> We're just gearing up for the Google Season of Docs and I think this
> would be a great task for a doc writer to help with.  Any reason to
> expect serious objections to syntax diagram graphics in the docs?

It's worth a thought, but I tend to think that this would not be a good
task for a "technical writer".  It's either an automation task or
busywork transcribing the syntax to whatever new intermediate format.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: Timeout parameters

2019-03-29 Thread Jamison, Kirk
Hi Nagaura-san,

Thank you for the updated patches.
It became a long thread now, but it's okay, you've done a good amount of work.
There are 3 patches in total: 2 for tcp_user_timeout parameter, 1 for 
socket_timeout.


A. TCP_USER_TIMEOUT
Since I've been following the updates, I compiled a summary of the changes
you made to address the reviewers' comments for TCP_USER_TIMEOUT:

For both client and server: apply ok, build ok, compile ok.

[DOCS]
I confirmed that docs were patterned from keepalives in both
config.sgml (backend) and libpq.sgml (interface). 
- changed "forcefully" to "forcibly"

It seems OK now to me.

[CODE]
1. SERVER  TCP_backend patch (v19)

- as per advice of Fabien, default postgres configuration file
has been modified (postgresql.conf.sample);
- as per advice of Horiguchi-san, I confirmed that the variable
is now grouped along with keepalives* under the TCP timeout settings.

- confirmed the removal of "errno != ENOPROTOOPT" in backend's 
pq_settcpusertimeout()

- confirmed the hook from show_tcp_user_timeout to pq_gettcpusertimeout(), 
based from the comment of Horiguchi-san to follow the tcp_keepalive* options' 
behavior.

- as per advice of Tsunakawa-san, GUC_NOT_IN_SAMPLE has been removed
and the default value is changed to 0.

- confirmed the modification of pq_settcpusertimeout():
a. to follow keepalives' pq_set* behavior as well
b. correct the error of sizeof(timeout)) << 0 to just a single '<'; 
'port->keepalives_count = count' to 'port->tcp_user_timeout = timeout'


2. CLIENT   TCP_interface patch (v18)

- confirmed that tcp_user_timeout is placed after keepalives options
in src/interfaces/libpq/fe-connect.c, 

- confirmed the change / integration for the comment below in 
fe-connect.c:PQconnectPoll():
>I don't see a point in the added part having own "if
>(!IS_AF_UNIX" block separately from tcp_keepalive options.

- confirmed the removal of the following:
a. "errno != ENOPROTOOPT" in setTCPUserTimeout()
b. unused parameter: char *tcp_user_timeout;
c. error for non-supported systems


Overall, I tested the patch using the same procedure I did in the above email,
and it worked well. Given that you addressed the comments from all
the reviewers, I've judged these 2 patches as committable now.



B. socket_timeout parameter

> FYI, I summarized a use case of this parameter.
> The connection is built successfully.
> Suppose that the server is then hit by some kind of accident (e.g., one of 
> those I or Tsunakawa-san suggested).
> At this time, there is no guarantee that the server OS can pass control to 
> the postmaster.
> Therefore, it is natural to disconnect the connection from the 
> client side.
> The disconnection is implemented in pqWait() because that is where the 
> client's infinite wait occurs.

apply, build, compile --> OK
Doc seems OK now to me.
The patch looks good to me as you addressed the code comments from 
Tsunakawa-san.
Although it's still the committer who will have the final say.



That said, I am now marking this one "Ready for Committer".
If others have new comments by weekend, feel free to change the status.


Regards,
Kirk Jamison


Re: Multitenancy optimization

2019-03-29 Thread Konstantin Knizhnik



On 29.03.2019 11:06, Hadi Moshayedi wrote:
On Thu, Mar 28, 2019 at 5:40 AM Konstantin Knizhnik 
mailto:k.knizh...@postgrespro.ru>> wrote:


Certainly it is possible to create multicolumn statistics to notify
Postgres about columns correlation.
But unfortunately it is not good and working solution.

First of all we have to create multicolumn statistic for all possible
combinations of table's attributes including "tenant_id".
It is very inconvenient and inefficient.

On the inconvenient part: doesn't postgres itself automatically create 
functional dependencies on combinations? i.e. it seems to me if we 
create statistics on (a, b, c), then we don't need to create 
statistics on (a, b) or (a, c) or (b, c), because the pg_statistic_ext 
entry for (a, b, c) already includes enough information.


On the inefficient part, I think there's some areas of improvement 
here. For example, if (product_id) -> seller_id correlation is 1.0, 
then (product_id, product_name) -> seller_id correlation is definitely 
1.0 and we don't need to store it. So we can reduce the amount of 
information stored in pg_statistic_ext -> stxdependencies, without 
losing any data points.


More generally, if (a) -> b correlation is X, then (a, c) -> b 
correlation is >= X. Maybe we can have a threshold to reduce number of 
entries in pg_statistic_ext -> stxdependencies.


-- Hadi


Yes, Postgres automatically creates functional dependencies on combinations.
But actually you do not need ALL combinations. A table can contain 
hundreds of attributes: the number of combinations in that case will not 
fit in a bigint.
This is why Postgres doesn't allow creating multicolumn statistics for 
more than 8 columns.
So if you have a table with a hundred attributes plus tenant_id, you will 
have to manually create statistics for each pair.
And that is very inconvenient (and, as I already mentioned, doesn't 
completely solve the problem of join selectivity estimation).


Maybe there are some other ways of addressing this problem (although I 
do not know of them).
But I think that in any case, if the number of distinct values is 
explicitly specified for an attribute, then this value should be used 
by the optimizer instead of the dummy DEFAULT_NUM_DISTINCT.
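A back-of-the-envelope sketch of that point (the constant's value of 200 is from selfuncs.h; the table and tenant numbers are made up):

```python
# Equality selectivity is roughly 1/ndistinct; with no usable statistics
# PostgreSQL falls back to DEFAULT_NUM_DISTINCT.  An explicitly declared
# n_distinct should win over the fallback.
DEFAULT_NUM_DISTINCT = 200

def estimated_rows(reltuples, n_distinct=None):
    ndistinct = n_distinct if n_distinct else DEFAULT_NUM_DISTINCT
    return reltuples / ndistinct

# 1,000,000-row table, "tenant_id = ?" predicate, 50 actual tenants:
fallback = estimated_rows(1_000_000)                  # 5000 rows, 4x too low
declared = estimated_rows(1_000_000, n_distinct=50)   # 20000 rows, correct
print(fallback, declared)
```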


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: SET LOCAL ROLE NO RESET -- sandbox transactions

2019-03-29 Thread Eric Hanson
These seem like much better ideas than mine. :-)  Thanks.

Did anything ever come of these ideas?  Do you have a sense of the level of
community support around these ideas?

Thanks,
Eric

On Wed, Mar 27, 2019 at 11:23 AM Chapman Flack 
wrote:

> On 3/27/19 2:40 AM, Eric Hanson wrote:
>
> > What would be the implications of adding a NO RESET clause to SET LOCAL
> > ROLE?
>
> There's a part of this that seems to be a special case of the
> GUC-protected-by-cookie idea discussed a bit in [1] and [2]
> (which is still an idea that I like).
>
> Regards,
> -Chap
>
> [1]
> https://www.postgresql.org/message-id/59127E4E.8090705%40anastigmatix.net
>
> [2]
>
> https://www.postgresql.org/message-id/CA%2BTgmoYOz%2BZmOteahrduJCc8RT8GEgC6PNXLwRzJPObmHGaurg%40mail.gmail.com
>


RE: Timeout parameters

2019-03-29 Thread Nagaura, Ryohei
Hi all.

I found my mistake in backend patch.

I modified from
+   port->keepalives_count = count;
to
+   port->tcp_user_timeout = timeout;
in line 113.

Sorry for making mistakes like this again and again...
Best regards,
-
Ryohei Nagaura




socket_timeout_v12.patch
Description: socket_timeout_v12.patch


TCP_backend_v18.patch
Description: TCP_backend_v18.patch


TCP_interface_v17.patch
Description: TCP_interface_v17.patch


Re: FETCH FIRST clause PERCENT option

2019-03-29 Thread Surafel Temesgen
Hi,
On Thu, Feb 28, 2019 at 2:50 PM Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> 
> -* previous time we got a different result.
> +* previous time we got a different result.In PERCENTAGE option
> there are
> +* no bound on the number of output tuples */
>  */
> 
>
>
 Thank you for testing. Fixed.
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 06d611b64c..e3ce4d7e36 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1430,7 +1430,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY
 
 In this syntax, the start
 or count value is required by
@@ -1440,7 +1440,8 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+With PERCENT, count specifies the maximum number of rows to return 
+as a percentage.ROW
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..7bfd7fd034 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -21,6 +21,8 @@
 
 #include "postgres.h"
 
+#include 
+
 #include "executor/executor.h"
 #include "executor/nodeLimit.h"
 #include "miscadmin.h"
@@ -44,6 +46,7 @@ ExecLimit(PlanState *pstate)
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
+	TupleDesc   tupleDescriptor;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -52,6 +55,8 @@ ExecLimit(PlanState *pstate)
 	 */
 	direction = node->ps.state->es_direction;
 	outerPlan = outerPlanState(node);
+	slot = node->subSlot;
+	tupleDescriptor = node->ps.ps_ResultTupleDesc;
 
 	/*
 	 * The main logic is a simple state machine.
@@ -81,7 +86,15 @@ ExecLimit(PlanState *pstate)
 			/*
 			 * Check for empty window; if so, treat like empty subplan.
 			 */
-			if (node->count <= 0 && !node->noCount)
+			if (node->limitOption == PERCENTAGE)
+			{
+if (node->percent == 0.0)
+{
+	node->lstate = LIMIT_EMPTY;
+	return NULL;
+}
+			}
+			else if (node->count <= 0 && !node->noCount)
 			{
 node->lstate = LIMIT_EMPTY;
 return NULL;
@@ -107,6 +120,15 @@ ExecLimit(PlanState *pstate)
 	break;
 			}
 
+			/*
+			 * We may need this tuple in a backward scan, so put it into the tuplestore.
+			 */
+			if (node->limitOption == PERCENTAGE)
+			{
+tuplestore_puttupleslot(node->tupleStore, slot);
+tuplestore_advance(node->tupleStore, true);
+			}
+
 			/*
 			 * Okay, we have the first tuple of the window.
 			 */
@@ -124,6 +146,72 @@ ExecLimit(PlanState *pstate)
 		case LIMIT_INWINDOW:
 			if (ScanDirectionIsForward(direction))
 			{
+
+/*
+ * In case of coming back from backward scan the tuple is already
+ * in tuple store.
+ */
+if (node->limitOption == PERCENTAGE && node->backwardPosition > 0)
+{
+	slot = MakeSingleTupleTableSlot(tupleDescriptor, &TTSOpsMinimalTuple);
+	if (tuplestore_gettupleslot(node->tupleStore, true, false, slot))
+	{
+		node->subSlot = slot;
+		node->position++;
+		node->backwardPosition--;
+		return slot;
+	}
+	else
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+}
+
+/*
+ * In the PERCENTAGE case there is no need to execute outerPlan multiple times.
+ */
+if (node->limitOption ==  PERCENTAGE && node->reachEnd)
+{
+	node->lstate = LIMIT_WINDOWEND;
+
+	/*
+	 * If we know we won't need to back up, we can release
+	 * resources at this point.
+	 */
+	if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+		(void) ExecShutdownNode(outerPlan);
+
+	return NULL;
+}
+
+/*
+ * When in percentage mode, we need to see if we can get any
+ * additional rows from the subplan (enough to increase the
+ * node->count value).
+ */
+if (node->limitOption == PERCENTAGE)
+{
+	/* loop until the node->count increments */
+	while (node->position - node->offset >= node->count)
+	{
+		int64	cnt;
+
+		slot = ExecProcNode(outerPlan);
+		if (TupIsNull(slot))
+		{
+			node->reachEnd = true;
+			break;
+		}
+
+		tuplestore_puttupleslot(node->tupleStore, slot);
+
+		cnt = tuple

Re: Berserk Autovacuum (let's save next Mandrill)

2019-03-29 Thread Michael Banck
Hi,

On Thu, Mar 28, 2019 at 12:36:24PM +1300, David Rowley wrote:
> On Thu, 28 Mar 2019 at 11:01, Alvaro Herrera  wrote:
> > I wonder if Mandrill's problem is related to Mailchimp raising the
> > freeze_max_age to a point where autovac did not have enough time to
> > react with an emergency vacuum.  If you keep raising that value because
> > the vacuums cause problems for you (they block DDL), there's something
> > wrong.
>
> I have seen some very high autovacuum_freeze_max_age settings
> recently. It would be interesting to know what they had theirs set to.
> I see they mentioned "Search and Url tables". I can imagine "search"
> never needs any UPDATEs, so quite possibly those were append-only, in
> which case the anti-wraparound vacuum would have had quite a lot of
> work on its hands, since possibly every page needed freezing. A table
> receiving regular auto-vacuums from dead tuples would likely get some
> pages frozen during those.

By the way, the Routine Vacuuming chapter of the documentation says:

"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more space

[...]

If [pg_xact and pg_commit_ts taking 0.5 and 20 GB, respectively]
is trivial compared to your total database size, setting
autovacuum_freeze_max_age to its maximum allowed value is recommended."

Maybe this should be qualified with "unless you have trouble with your
autovacuum keeping up" or so; or generally reworded?


Michael




Re: speeding up planning with partitions

2019-03-29 Thread Amit Langote
Here are some comments on v38.

On 2019/03/29 12:44, Amit Langote wrote:
> Thanks again for the new patch.  I'm reading it now and will send comments
> later today if I find something.

-Assert(rte->rtekind == RTE_RELATION ||
-   rte->rtekind == RTE_SUBQUERY);
-add_appendrel_other_rels(root, rel, rti);
+if (rte->rtekind == RTE_RELATION)
+expand_inherited_rtentry(root, rel, rte, rti);
+else
+expand_appendrel_subquery(root, rel, rte, rti);

Wouldn't it be a good idea to keep the Assert?

+ * It's possible that the RTIs we just assigned for the child rels in the
+ * final rtable are different from where they were in the SELECT query.

In the 2nd sentence, maybe you meant "...from what they were"

+forboth(lc, old_child_rtis, lc2, new_child_rtis)
+{
+int old_child_rti = lfirst_int(lc);
+int new_child_rti = lfirst_int(lc2);
+
+if (old_child_rti == new_child_rti)
+continue;   /* nothing to do */
+
+Assert(old_child_rti > new_child_rti);
+
+ChangeVarNodes((Node *) child_appinfos,
+   old_child_rti, new_child_rti, 0);
+}

This seems unnecessary?  RTEs of children of the target table should be in
the same position in the final_rtable as they are in the select_rtable.
It seems that they can be added to parse->rtable simply as:

  orig_rtable_len = list_length(parse->rtable);
  parse->rtable = list_concat(parse->rtable,
  list_copy_tail(select_rtable,
 orig_rtable_len));

That is, after the block of code that plans the query as SELECT.

+ * about the childrens' reltargets, they'll be made later

Should it be children's?

+/*
+ * If the partitioned table has no partitions, treat this as the
+ * non-inheritance case.
+ */
+if (partdesc->nparts == 0)
+{
+/* XXX wrong? */
+parentrte->inh = false;
+return;
+}

About the XXX: I think resetting inh flag is unnecessary, so we should
just remove the line.  If we do that, we can also get rid of the following
code in set_rel_size():

else if (rte->relkind == RELKIND_PARTITIONED_TABLE)
{
/*
 * A partitioned table without any partitions is marked as
 * a dummy rel.
 */
set_dummy_rel_pathlist(rel);
}


Finally, it's not in the patch, but how about visiting
get_relation_constraints() for revising this block of code:

/*
 * Append partition predicates, if any.
 *
 * For selects, partition pruning uses the parent table's partition bound
 * descriptor, instead of constraint exclusion which is driven by the
 * individual partition's partition constraint.
 */
if (enable_partition_pruning && root->parse->commandType != CMD_SELECT)
{
List   *pcqual = RelationGetPartitionQual(relation);

if (pcqual)
{
/*
 * Run the partition quals through const-simplification similar to
 * check constraints.  We skip canonicalize_qual, though, because
 * partition quals should be in canonical form already; also,
 * since the qual is in implicit-AND format, we'd have to
 * explicitly convert it to explicit-AND format and back again.
 */
pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);

/* Fix Vars to have the desired varno */
if (varno != 1)
ChangeVarNodes((Node *) pcqual, 1, varno, 0);

result = list_concat(result, pcqual);
}
}

We will no longer need to load the partition constraints for "other rel"
partitions, not even for UPDATE and DELETE queries.  Now, we won't load
them with the patch applied, because we're cheating by first planning the
query as SELECT, so that's not an issue.  But we should change the
condition here to check if the input relation is a "baserel", in which
case, this should still load the partition constraint so that constraint
exclusion can use it when running with constraint_exclusion = on.  In
fact, I recently reported [1] on -hackers that, due to a bug introduced in
PG 11, we don't load the partition constraint even when the partition is
being accessed directly.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/9813f079-f16b-61c8-9ab7-4363cab28d80%40lab.ntt.co.jp





Re: REINDEX CONCURRENTLY 2.0

2019-03-29 Thread Peter Eisentraut
On 2019-03-29 09:13, Peter Eisentraut wrote:
> On 2019-03-29 09:04, Michael Paquier wrote:
>> On Fri, Mar 29, 2019 at 10:39:23AM +0300, Sergei Kornilov wrote:
>>> wow! Congratulations! This was very long way
>>>
>>> my favorite pg12 feature
>>
>> So this has been committed, nice!  Thanks a lot to all for keeping
>> alive this patch over the ages, with particular thanks to Andreas and
>> Peter.
> 
> So, we're getting buildfarm failures, only with clang.  I can reproduce
> those (with clang).
> 
> It seems the issue is somewhere near indexcmds.c "Phase 6 of REINDEX
> CONCURRENTLY".  More eyes welcome.

I think I found a fix.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: REINDEX CONCURRENTLY 2.0

2019-03-29 Thread Michael Paquier
On Fri, Mar 29, 2019 at 09:13:35AM +0100, Peter Eisentraut wrote:
> So, we're getting buildfarm failures, only with clang.  I can reproduce
> those (with clang).

Indeed, I can reproduce the failures using -O2 with clang.  I am
wondering if we are not missing a volatile flag somewhere and that
some code reordering is at cause here.

> It seems the issue is somewhere near indexcmds.c "Phase 6 of REINDEX
> CONCURRENTLY".  More eyes welcome.

Here is a short reproducer:
create materialized view aam as select 1 AS a;
create index aai on aam(a);
reindex table CONCURRENTLY aam;
--
Michael




Re: patch to allow disable of WAL recycling

2019-03-29 Thread Peter Eisentraut
On 2019-03-29 01:09, Thomas Munro wrote:
>> I would like to fix these problems and commit the patch.  First, I'm
>> going to go and do some project-style tidying, write some proposed doc
>> tweaks, and retest these switches on the machine where I saw
>> beneficial effects from the patch before.  I'll post a new version
>> shortly to see if anyone has objections.
> Here's a new version of the patch.

I like the way the documentation is written in this patch version.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: Compute XID horizon for page level index vacuum on primary.

2019-03-29 Thread Simon Riggs
On Wed, 27 Mar 2019 at 00:06, Andres Freund  wrote:

> Compute XID horizon for page level index vacuum on primary.
>
> Previously the xid horizon was only computed during WAL replay.


This commit message was quite confusing. It took me a while to realize this
relates to btree index deletes and that what you mean is that we are
calculating the latestRemovedXid for index entries. That is related to, but
not the same thing as, the horizon itself. So now I understand the "was
computed only during WAL replay", since it seemed obvious that the xmin
horizon was calculated regularly on the master, but as you say the
latestRemovedXid was not.

Now that I understand, I'm happy that you've moved this from redo into the
mainline. And you've optimized it, which is also important (since
performance was the original objection and why it was placed in redo). I
can see you've removed duplicate code in hash indexes as well, which is
good.

The term "xid horizon" was only used once in the code in PG11. That usage
appears to be a typo, since in many other places the term "xmin horizon" is
used to mean the point at which we can finally mark tuples as dead. Now we
have some new, undocumented APIs that use the term "xid horizon" yet still
code that refers to "xmin horizon", with neither term being defined. I'm
hoping you'll do some later cleanup of that to avoid confusion.

While trying to understand this, I see there is an even better way to
optimize this. Since we are removing dead index tuples, we could alter the
killed index tuple interface so that it returns the xmax of the tuple being
marked as killed, rather than just a boolean to say it is dead. Indexes can
then mark the killed tuples with the xmax that killed them rather than just
a hint bit. This is possible since the index tuples are dead and cannot be
used to follow the htid to the heap, so the htid is redundant and so the
block number of the tid could be overwritten with the xmax, zeroing the
itemid. Each killed item we mark with its xmax means one less heap fetch we
need to perform when we delete the page - it's possible we optimize that
away completely by doing this.

Since this point of the code is clearly going to be a performance issue it
seems like something we should do now.

-- 
Simon Riggs  http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: propagating replica identity to partitions

2019-03-29 Thread Peter Eisentraut
On 2019-03-28 17:46, Alvaro Herrera wrote:
> Thanks, Michael and Peter, for responding; however there is a second
> part to the question, which is "should I change the recursivity of
> REPLICA IDENTITY, while not simultaneously changing the recursivity of
> the TABLESPACE and OWNER TO forms of ALTER TABLE?"
> 
> I think everyone agrees that REPLICA IDENTITY should be changed.  The
> question is whether it's bad behavior on my part to change it
> separately from changing every other non-currently-recursive form.

If this were new functionality, I would tend to think that ALTER TABLE
should by default recurse for everything.  I don't know the history of
why it is not done for some things.  It might be a mistake, just like
the replica identity case apparently is.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: propagating replica identity to partitions

2019-03-29 Thread Peter Eisentraut
On 2019-03-28 18:16, Simon Riggs wrote:
> SET TABLESPACE should not recurse because it copies the data, while
> holding long locks. If that was ever fixed so it happened concurrently,
> I would agree this could recurse by default.

Since a partitioned table has no storage, what is the meaning of moving
a partitioned table to a different tablespace without recursing?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid

2019-03-29 Thread Daniel Gustafsson
On Saturday, March 9, 2019 8:16 AM, Noah Misch  wrote:

This patch is not really in my wheelhouse, so I might very well be testing it
in the wrong way, but what's the fun in staying in the shallow end? Below is
my attempt at reviewing this patchset.

Both patches apply with a little bit of fuzz and pass check-world. No new
perlcritic warnings are introduced, but 016_shm.pl needs to be renamed to 017
since commit b0825d28ea83e44139bd319e6d1db2c499c claimed 016. They absolutely
seem to do what they say on the tin, and to my understanding this seems like
an improvement we definitely want.

> I renamed IpcMemoryAnalyze() to PGSharedMemoryAttach() and deleted the old
> function of that name. Now, this function never calls shmdt(); the caller is
> responsible for that. I do like things better this way. What do you think?

I think it makes for a good API for the caller to be responsible, but it does
warrant a comment on the function to explicitly state that.

A few other small comments:

+   state = PGSharedMemoryAttach((IpcMemoryId) id2, &memAddress);
+   if (memAddress)
+   shmdt(memAddress);

This seems like a case where it would be useful to log a shmdt() error, or
perhaps to Assert() on the success of the operation?

+* Loop till we find a free IPC key.  Trust CreateDataDirLockFile() to
+* ensure no more than one postmaster per data directory can enter this
+* loop simultaneously.  (CreateDataDirLockFile() does not ensure that,
+* but prefer fixing it over coping here.)

This comment makes it seem like there is a fix to CreateLockFile() missing
from this patch; is that correct? If so, do you have an idea for that patch?

> I tested on Red Hat and on Windows Server 2016; I won't be shocked
> if the test (not the code under test) breaks on other Windows configurations.

IIRC there are Windows versions where Win32::Process::KillProcess is required
for this, but that's admittedly somewhat dated knowledge. If the buildfarm
goes red on older Windows animals, it might be something to look at.

Switching this to Ready for Committer since I can't see anything but tiny 
things.

cheers ./daniel




Re: pg_upgrade: Pass -j down to vacuumdb

2019-03-29 Thread Peter Eisentraut
On 2019-03-28 02:43, Jeff Janes wrote:
> At first blush I thought it was obvious that you would not want to run
> analyze-in-stages in parallel.  But after thinking about it some more
> and reflecting on experience doing some troublesome upgrades, I would
> reverse that and say it is now obvious you do want at least the first
> stage of analyze-in-stages, and probably the first two, to run in
> parallel.  That is not currently an option it supports, so we can't
> really recommend it in the script or the docs.

So do you think we should copy down the -j option from pg_upgrade, or
make some other arrangement?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: propagating replica identity to partitions

2019-03-29 Thread Simon Riggs
On Fri, 29 Mar 2019 at 09:51, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-03-28 18:16, Simon Riggs wrote:
> > SET TABLESPACE should not recurse because it copies the data, while
> > holding long locks. If that was ever fixed so it happened concurrently,
> > I would agree this could recurse by default.
>
> Since a partitioned table has no storage, what is the meaning of moving
> a partitioned table to a different tablespace without recursing?
>

My point was that people will misuse it and regret it at length, but the data
copying behavior is clearly documented.

If you are saying there is no other possible interpretation of that
command, then I relent. It is important we have a way to do this, if people
wish it.

-- 
Simon Riggs  http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_basebackup ignores the existing data directory permissions

2019-03-29 Thread Peter Eisentraut
On 2019-03-26 03:26, Michael Paquier wrote:
> Do we really want to extend the replication protocol to control that?

Perhaps we are losing sight of the original problem, which is that if
you create the target directory with the wrong permissions then ... it
has the wrong permissions.  And you are free to change the permissions
at any time.  Many of the proposed solutions sound excessively
complicated relative to that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Log a sample of transactions

2019-03-29 Thread Adrien NAYRAT

On 3/29/19 3:06 AM, Kuroda, Hayato wrote:

Dear Adrien,


Hello Kuroda-san,



I understood the cost of randomizing is very low. Maybe it's OK..
I'll change the status to "Ready For Committer."


Thanks, I hope it will be committed in PG12, as the feature is close to
log_statement_sample_rate.




Finally, I apologize for the delayed review.
Thank you for this good opportunity.


Don't worry, we all do what we can. Thanks for your review ;)


Best Regards,




Re: Enable data checksums by default

2019-03-29 Thread Bernd Helmle
Am Dienstag, den 26.03.2019, 16:14 +0100 schrieb Christoph Berg:
> select 92551.0/97363;
> 0.9506
> 
> So the cost is 5% in this very contrived case. In almost any other
> setting, the cost would be lower, I'd think.

Well, my machine (Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz, 32 GByte
RAM) tells me this:

pgbench -s 50 -i pgbench
pg_ctl -o "--shared-buffers=128kB" restart
pgbench -r -P4 -Mprepared -T60 -c $clients -j $clients -n -S

...prewarm...

Clients checksums
1   20110
2   35338
4   67207
8   96627
16  110091

Clients no checksums
1   21716
2   38543
4   72118
8   117545
16  121415

Clients Impact
1   0.926045312212194
2   0.916846119918014
4   0.931903269641421
8   0.822042621974563
16  0.906733105464728

So the impact with checksums is between ~7% and ~18% in this specific case.

Bernd






Re: Add exclusive backup deprecation notes to documentation

2019-03-29 Thread David Steele

Hi Robert,

On 3/20/19 5:08 PM, David Steele wrote:


I'll revise the patch if Peter thinks this approach looks reasonable.


Hopefully Peter's silence can be interpreted as consent.  Probably just 
busy, though.


I used your suggestions with minor editing.  After some reflection, I 
agree that the inline warnings are likely to be more effective than 
something at the end, at least for those working on a new implementation.


Regards,
--
-David
da...@pgmasters.net
From bd1754696ca503795ca69386d1d29a7347a77276 Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Fri, 29 Mar 2019 10:12:09 +
Subject: [PATCH] Add exclusive backup deprecation notes to documentation.

Update the exclusive backup documentation to explain the limitations of
the exclusive backup mode and make it clear that the feature is
deprecated.

Update the log message when the backup_label is found but recovery
cannot proceed.
---
 doc/src/sgml/backup.sgml  | 52 +++
 src/backend/access/transam/xlog.c | 10 --
 2 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index a73fd4d044..b67da8916a 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -948,13 +948,26 @@ SELECT * FROM pg_stop_backup(false, true);


 Making an exclusive low level backup
+
+
+ 
+  The exclusive backup method is deprecated and should be avoided.
+  Prior to PostgreSQL 9.6, this was the only
+  low-level method available, but it is now recommended that all users
+  upgrade their scripts to use non-exclusive backups.
+ 
+
+
 
  The process for an exclusive backup is mostly the same as for a
- non-exclusive one, but it differs in a few key steps. This type of backup
- can only be taken on a primary and does not allow concurrent backups.
- Prior to PostgreSQL 9.6, this
- was the only low-level method available, but it is now recommended that
- all users upgrade their scripts to use non-exclusive backups if possible.
+ non-exclusive one, but it differs in a few key steps. This type of
+ backup can only be taken on a primary and does not allow concurrent
+ backups.  Moreover, because it writes a backup_label file on the
+ master, it can cause the master to fail to restart automatically after
+ a crash.  On the other hand, the erroneous removal of a backup_label
+ file from a backup or standby is a common mistake which can result
+ in serious data corruption.  If it is necessary to use this method,
+ the following steps may be used.
 
 
   
@@ -1011,9 +1024,17 @@ SELECT pg_start_backup('label', true);
  consider during this backup.
 
 
-  Note that if the server crashes during the backup it may not be
-  possible to restart until the backup_label file has 
been
-  manually deleted from the PGDATA directory.
+ As noted above, if the server crashes during the backup it may not be
+ possible to restart until the backup_label file has
+ been manually deleted from the PGDATA directory.  Note
+ that it is very important to never remove the
+ backup_label file when restoring a backup, because
+ this will result in corruption.  Confusion about when it is appropriate
+ to remove this file is a common cause of data corruption when using this
+ method; be very certain that you remove the file only on an existing
+ master and never when building a standby or restoring a backup, even if
+ you are building a standby that will subsequently be promoted to a new
+ master.
 


@@ -1045,11 +1066,16 @@ SELECT pg_stop_backup();
  If the archive process has fallen behind
  because of failures of the archive command, it will keep retrying
  until the archive succeeds and the backup is complete.
- If you wish to place a time limit on the execution of
- pg_stop_backup, set an appropriate
- statement_timeout value, but make note that if
- pg_stop_backup terminates because of this your backup
- may not be valid.
+
+
+
+ When using exclusive backup mode, it is absolutely imperative to ensure
+ that pg_stop_backup completes successfully at the
+ end of the backup.  Even if the backup itself fails, for example due to
+ lack of disk space, failure to call pg_stop_backup
+ will leave the server in backup mode indefinitely, causing future backups
+ to fail and increasing the risk of a restart failure during the time that
+ backup_label exists.
 

   
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 19d7911ec5..9840a55c10 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6364,14 +6364,20 @@ StartupXLOG(void)
if (!ReadRecord(xlogreader, checkPoint.redo, 
LOG, false))
erepo

table_privileges view always show object owner as a grantor

2019-03-29 Thread Ashutosh Sharma
Hi All,

I noticed that irrespective of whoever grants privileges on an object,
it's always the object owner who is seen as a grantor in the output of
table_privileges view. As an example, consider the following case.

create user u1;
create user u2 with superuser;
create user u3;

\c postgres u1
create table t1(a integer);

\c postgres u2
grant select on t1 to u3; -- it's u2 who is granting select privileges
on t1 to u3

\c postgres u3
select * from table_privileges where table_name = 't1';

postgres=# \c postgres u3
You are now connected to database "postgres" as user "u3".

postgres=> select * from information_schema.table_privileges where
table_name = 't1';
 grantor | grantee | table_catalog | table_schema | table_name |
privilege_type | is_grantable | with_hierarchy
---------+---------+---------------+--------------+------------+----------------+--------------+----------------
 u1  | u3  | postgres  | public   | t1 |
SELECT | NO   | YES
(1 row)

postgres=> select * from t1;
 a
---
(0 rows)

Above output of table_privileges shows 'u1' (who is the object owner of
t1) as the grantor instead of u2. Isn't that wrong information? If
that isn't wrong, then may I know why the PostgreSQL
documentation on "table_privileges" describes grantor as "Name of the
role that granted the privilege"? Here is the documentation link for
the table_privileges view.

https://www.postgresql.org/docs/current/infoschema-table-privileges.html

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: Re: log bind parameter values on error

2019-03-29 Thread David Steele

Hi Alexey,

On 3/14/19 12:38 PM, Peter Eisentraut wrote:


Meanwhile, I have committed a patch that refactors the ParamListInfo
initialization a bit, so you don't have to initialize hasTextValues in a
bunch of places unrelated to your core code.  So please rebase your
patch on that.


It's been two weeks with no new patch or answers to Peter's questions. 
Since we are nearly at the end of the CF I'll target this patch for PG13 
and mark it Returned with Feedback at the end of the CF if there's no 
new patch by then.


Regards,
--
-David
da...@pgmasters.net




Re: Re: Row Level Security − leakproof-ness and performance implications

2019-03-29 Thread David Steele

Hi Pierre,

On 3/18/19 8:13 PM, Joe Conway wrote:


I have no idea what the other entry is all about as I have not had the
time to look.


There doesn't seem to be consensus on your patch, either -- I'm planning 
to mark it rejected at the end of the CF unless you have a new patch for 
consideration.


This thread got a bit hijacked and is hard to follow.  If you do submit 
a new patch I recommend creating a new thread.


Regards,
--
-David
da...@pgmasters.net




Re: pg_ctl on windows can't open postmaster.pid: Permission denied

2019-03-29 Thread Thomas Munro
On Wed, Dec 13, 2017 at 5:01 PM Thomas Munro
 wrote:
> On Wed, Dec 13, 2017 at 4:24 PM, Andres Freund  wrote:
> > Hi,
> >
> > Buildfarm animal thrips just failed with a curious error:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thrips&dt=2017-12-13%2002%3A27%3A27
> >
> > == shutting down postmaster   ==
> > pg_ctl: could not open PID file 
> > "C:/buildfarm/buildenv/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid":
> >  Permission denied
> >
> > otherwise there were no failures.
> >
> > I wonder if we're not opening the file with the right options to allow
> > us to open it while it's opened for writes in another backend? In the
> > backend we do so via FILE_SHARE_READ pgwin32_open which backs open and
> > fopen in backend code.
>
> Yeah, pg_ctl.c must be using fopen directly (from Windows' libc/crt).
> A sharing violation would indeed appear as errno == EACCES by my
> reading of the docs so that matches your theory.  I think this code
> should use pgwin32_fopen on Windows.

dory failed like this today:

== shutting down postmaster   ==
pg_ctl: could not open PID file
"c:/pgbuildfarm/pgbuildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid":
Permission denied

This is probably a stupid question, but after commit 0ba06e0,
shouldn't pg_ctl.c have fopen defined as pgwin32_fopen by port.h,
because it was included by c.h, because it was included by
postgres_fe.h?

-- 
Thomas Munro
https://enterprisedb.com




Re: Enable data checksums by default

2019-03-29 Thread Ants Aasma
On Thu, Mar 28, 2019 at 10:38 AM Christoph Berg  wrote:

> Re: Ants Aasma 2019-03-27 <
> ca+csw_twxdrzdn2xsszbxej63dez+f6_hs3qf7hmxfenxsq...@mail.gmail.com>
> > Can you try with postgres compiled with CFLAGS="-O2 -march=native"?
> There's
> > a bit of low hanging fruit there to use a runtime CPU check to pick a
> > better optimized checksum function.
>
> Frankly, no. This is with the apt.pg.o packages which are supposed to
> be usable by everyone. If there is a better per-CPU checksum function,
> PG should pick it at runtime. Special compiler flags are a no-go here.
>

I went ahead and tested it on the count(*) test, same settings as upthread.
Median of 5 runs of 20txs on Intel i5-2500k @ 4GHz.

No checksum: 344ms
Checksums: 384ms (+12%)
No checksum march=native: 344ms
Checksums march=native: 369ms (+7%)

The checksum code was written to be easily auto-vectorized by the compiler.
So if we just compile the same function with different compiler flags and
pick between them at runtime the overhead can be approximately halved. Not
saying that this needs to be done before enabling checksums by default,
just that when considering overhead, we can foresee it being much lower in
future versions.

Regards,
Ants Aasma


Inconsistencies in the behavior of CHR() function in PG.

2019-03-29 Thread Prabhat Sahu
Hi All,

While trying to explore on CHR() function in PG,
I found that few of the ASCII values are returning hex number values(like
'\x08', '\x0B')
and few are executing within SQL (i.e. chr(9) => Horizontal tab,  chr(10)
=> Line feed) as below example.

postgres=# select 1|| chr(8)|| 2 || chr(9)||3 || chr(10)||4 || chr(11)||5
|| chr(12)||6 || chr(13)||7  as col1;
  col1

 1*\x08*2  3* +*
 4*\x0B*5*\x0C*6*\r*7
(1 row)

My question here is, why these inconsistencies in the behavior of CHR()
function?

-- 


With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Corporation

The Postgres Database Company


Re: Re: pg_basebackup ignores the existing data directory permissions

2019-03-29 Thread David Steele

On 3/26/19 3:59 AM, Haribabu Kommi wrote:


I am really questioning if we should keep this stuff isolated within
pg_basebackup or not.  At the same time, it may be confusing to have
BASE_BACKUP only use the permissions inherited from the data
directory, so some input from folks maintaining an external backup
tool is welcome.


That would be good to hear what other external backup tool authors
think of this change.


I'm OK with the -g (inherit|none|group) option as implemented.  I prefer 
the default as it is (inherit), which makes sense since I wrote it that way.


Having BASE_BACKUP set the permissions inside the tar file seems OK as 
well.  I'm not aware of any external solutions that are using the 
replication protocol directly - I believe they all use pg_basebackup, so 
I don't think they would need to change anything.


Having said that, I think setting permissions is a pretty trivial thing 
to do and there are plenty of possible scenarios that are still not 
covered here.  I have no objections to the patch but it seems like overkill.


Regards,
--
-David
da...@pgmasters.net




fsync error handling in pg_receivewal, pg_recvlogical

2019-03-29 Thread Peter Eisentraut
Do we need to review the fsync error handling in pg_receivewal and
pg_recvlogical, following the recent backend changes?  The current
default behavior is that these tools will log fsync errors and then
reconnect and proceed with the next data streaming in.  As a result, you
might then have some files in the accumulated WAL that have not been
fsynced.  Perhaps a hard exit would be more appropriate?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2019-03-29 Thread Michael Paquier
On Sat, Mar 09, 2019 at 10:15:37AM +0900, Michael Paquier wrote:
> I am adding an open item about that.  I think I could commit the
> patch, but I need to study it a bit more first.

So, coming back to this thread and studying the problem again, the
diagnosis looks to be that a non-aggressive, anti-wraparound vacuum
could be triggered because the worker sees trouble in the force caused
by some activity happening in parallel.  Hence, if we face this case,
it looks right to skip the vacuum for this relation.

Attached is an updated patch with a better error message, more
comments, and the removal of the anti-wraparound non-aggressive log
which was added in 28a8fa9.  The error message could be better I
guess.  Suggestions are welcome.

Thoughts?
--
Michael
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5c554f9465..82be8c81f3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,6 +248,25 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 	if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
 		aggressive = true;
 
+	/*
+	 * When running an anti-wraparound vacuum, we expect relfrozenxid to be
+	 * old enough so as aggressive is always set.  If this is not the case,
+	 * it could be possible that another concurrent vacuum process has done
+	 * the work for this relation so that relfrozenxid in relcache has
+	 * already been moved forward enough, causing this vacuum run to be
+	 * non-aggressive.  If that happens, note that this relation no longer
+	 * needs to be vacuumed, so just skip it.
+	 */
+	if (params->is_wraparound && !aggressive)
+	{
+		ereport(LOG,
+(errmsg_internal("found vacuum to prevent wraparound of table \"%s.%s.%s\" to be not aggressive, so skipping",
+ get_database_name(MyDatabaseId),
+ get_namespace_name(RelationGetNamespace(onerel)),
+ RelationGetRelationName(onerel))));
+		return;
+	}
+
 	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
 
 	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
@@ -375,10 +394,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 			initStringInfo(&buf);
 			if (params->is_wraparound)
 			{
-if (aggressive)
-	msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
-else
-	msgfmt = _("automatic vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
+/* an anti-wraparound vacuum has to be aggressive */
+Assert(aggressive);
+msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
 			}
 			else
 			{


signature.asc
Description: PGP signature


Re: Inconsistencies in the behavior of CHR() function in PG.

2019-03-29 Thread Christoph Berg
Re: Prabhat Sahu 2019-03-29 

> While trying to explore on CHR() function in PG,
> I found that few of the ASCII values are returning hex number values(like
> '\x08', '\x0B')
> and few are executing within SQL (i.e. chr(9) => Horizontal tab,  chr(10)
> => Line feed) as below example.

That's not a property of chr(), but generally of the "text" datatype:

# select E'\002'::text;
 text
──
 \x02

Non-printable characters are quoted. See also:

# select i, chr(i) from generate_series(1, 256) g(i);

Christoph




Re: Add exclusive backup deprecation notes to documentation

2019-03-29 Thread Peter Eisentraut
On 2019-03-20 14:42, Magnus Hagander wrote:
> But that would be factually incorrect and backwards, so it seems like a
> terrible idea, at least when it comes to manual. If you are doing it
> manually, it's a lot *easier* to do it right with the non-exclusive
> mode, because you can easily keep one psql and one shell open. And
> that's safe.

The scenario I have in mind is, a poorly maintained server, nothing
installed, can't install anything (no internet connection, license
expired), flaky network, you fear it's going to fail soon, you need to
take a backup.  The simplest procedure would appear to be: start backup
mode, copy files away, stop backup mode.  Anything else that involves
holding a session open over there for the whole time is way more fragile
unless proper preparations have been made (and even then).  So I don't
know what you want to call that scenario, but I would feel more
comfortable having these basic tools available in a bind.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Should the docs have a warning about pg_stat_reset()?

2019-03-29 Thread Robert Haas
On Wed, Mar 27, 2019 at 7:49 PM David Rowley
 wrote:
> Yeah, analyze, not vacuum.  It is a bit scary to add new ways for
> auto-vacuum to suddenly have a lot of work to do.  When all workers
> are busy it can lead to neglect of other duties.  It's true that there
> won't be much in the way of routine vacuuming work for the database
> the stats were just reset on, as of course, all the n_dead_tup
> counters were just reset. However, it could starve other databases of
> vacuum attention.  Anti-wraparound vacuums on the current database may
> get neglected too.
>
> I'm not saying let's not do it, I'm just saying we need to think of
> what bad things could happen as a result of such a change.

+1.  I think that if we documented that pg_stat_reset() is going to
trigger an auto-analyze of every table in your system, we'd have some
people who didn't read the documentation and unleashed a storm of
auto-analyze activity, and other people who did read the documentation
and then intentionally used it to unleash a storm of auto-analyze
activity.  Neither sounds that great.

I really wish somebody had the time and energy to put some serious
work on the problem of autovacuum scheduling in general.  Our current
algorithm is a huge improvement over what what we had before 8.3, but
that was a decade ago.  This particular issue strikes me as something
that is likely to be hard to solve with an isolated tweak.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GSoC 2019] Proposal: Develop Performance Farm Database and Website

2019-03-29 Thread Robert Haas
On Tue, Mar 26, 2019 at 9:10 AM Ila B.  wrote:
> I am Ilaria Battiston, an aspiring GSoC student, and I would love to have a 
> feedback on the first draft of my Google Summer of Code proposal. The project 
> is "Develop Performance Farm Database and Website”. You can find any other 
> detail in the attached PDF file :)

I think there's probably a very large amount of work to be done in
this area.  Nobody is going to finish it in a summer.  Still, there's
probably some useful things you could get done in a summer.  I think a
lot will depend on finding a good mentor who is familiar with these
areas (which I am not).  Has anyone expressed an interest?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Online verification of checksums

2019-03-29 Thread Magnus Hagander
On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra 
wrote:

> On Thu, Mar 28, 2019 at 01:11:40PM -0700, Andres Freund wrote:
> >Hi,
> >
> >On 2019-03-28 21:09:22 +0100, Michael Banck wrote:
> >> I agree that the current patch might have some corner-cases where it
> >> does not guarantee 100% accuracy in online mode, but I hope the current
> >> version at least has no more false negatives.
> >
> >False positives are *bad*. We shouldn't integrate code that has them.
> >
>
> Yeah, I agree. I'm a bit puzzled by the reluctance to make the online mode
> communicate with the server, which would presumably address these issues.
> Can someone explain why not to do that?
>

I agree that this effort seems better spent on fixing those issues there
(of which many are the same), and then re-using that.


> FWIW I initially argued against that, believing that we can address
> those issues in some other way, and I'd love it if that was possible. But
> considering we're still trying to make that work reliably, I think the
> reasonable conclusion is that Andres was right: communicating with the
> server is necessary.
>
> Of course, I definitely appreciate people are working on this, otherwise
> we wouldn't be having this discussion ...
>

+1.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Add exclusive backup deprecation notes to documentation

2019-03-29 Thread Robert Haas
On Fri, Mar 29, 2019 at 6:27 AM David Steele  wrote:
> I used your suggestions with minor editing.  After some reflection, I
> agree that the inline warnings are likely to be more effective than
> something at the end, at least for those working on a new implementation.

I'm glad we could agree on something.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[PG 9.6]make is failing for test_decoding contrib module.

2019-03-29 Thread tushar

Hi,

Found by one of my colleagues - Kashif Jeeshan - in PG 9.6, make is 
failing for the test_decoding contrib module.


[centos@centos-cpula test_decoding]$ make
gcc -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -O2 -fPIC -I. -I. 
-I../../src/include -D_GNU_SOURCE   -c -o test_decoding.o test_decoding.c

In file included from ../../src/include/postgres.h:48,
 from test_decoding.c:13:
../../src/include/utils/elog.h:71:28: error: utils/errcodes.h: No such 
file or directory

In file included from ../../src/include/replication/slot.h:15,
 from ../../src/include/replication/logical.h:12,
 from test_decoding.c:23:
../../src/include/storage/lwlock.h:129:33: error: storage/lwlocknames.h: 
No such file or directory

test_decoding.c: In function ‘pg_decode_startup’:
test_decoding.c:127: error: ‘ERRCODE_INVALID_PARAMETER_VALUE’ undeclared 
(first use in this function)
test_decoding.c:127: error: (Each undeclared identifier is reported only 
once

test_decoding.c:127: error: for each function it appears in.)
make: *** [test_decoding.o] Error 1
[centos@centos-cpula test_decoding]$

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: Add exclusive backup deprecation notes to documentation

2019-03-29 Thread Magnus Hagander
On Fri, Mar 29, 2019 at 1:19 PM Robert Haas  wrote:

> On Fri, Mar 29, 2019 at 6:27 AM David Steele  wrote:
> > I used your suggestions with minor editing.  After some reflection, I
> > agree that the inline warnings are likely to be more effective than
> > something at the end, at least for those working on a new implementation.
>
> I'm glad we could agree on something.  Committed.
>

+1, thanks.

Minor nitpick:
+ backup can only be taken on a primary and does not allow concurrent
+ backups.  Moreover, because it writes a backup_label file on the
+ master, it can cause the master to fail to restart automatically after

Let's be consistent in whether we call it a primary or a master, at least
within the same paragraph :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Block level parallel vacuum

2019-03-29 Thread Robert Haas
On Thu, Mar 28, 2019 at 10:27 PM Masahiko Sawada  wrote:
> You're right, the previous patches are wrong. Attached the updated
> version patches.

0001 looks good now.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add exclusive backup deprecation notes to documentation

2019-03-29 Thread Peter Eisentraut
On 2019-03-29 11:27, David Steele wrote:
>> I'll revise the patch if Peter thinks this approach looks reasonable.
> 
> Hopefully Peter's silence can be interpreted as consent.  Probably just 
> busy, though.
> 
> I used your suggestions with minor editing.  After some reflection, I 
> agree that the inline warnings are likely to be more effective than 
> something at the end, at least for those working on a new implementation.

It looks very sensible now, I think.  Thanks.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add exclusive backup deprecation notes to documentation

2019-03-29 Thread David Steele

On 3/29/19 11:58 AM, Peter Eisentraut wrote:
> On 2019-03-20 14:42, Magnus Hagander wrote:
>> But that would be factually incorrect and backwards, so it seems like a
>> terrible idea, at least when it comes to manual. If you are doing it
>> manually, it's a lot *easier* to do it right with the non-exclusive
>> mode, because you can easily keep one psql and one shell open. And
>> that's safe.
>
> The scenario I have in mind is, a poorly maintained server, nothing
> installed, can't install anything (no internet connection, license
> expired), flaky network, you fear it's going to fail soon, you need to
> take a backup.  The simplest procedure would appear to be: start backup
> mode, copy files away, stop backup mode.  Anything else that involves
> holding a session open over there for the whole time is way more fragile
> unless proper preparations have been made (and even then).  So I don't
> know what you want to call that scenario, but I would feel more
> comfortable having these basic tools available in a bind.

I would argue the best thing in this scenario is to use pg_basebackup. 
It's a solid tool and likely far better than any script the user might 
cook up on the spot.


Regards,
--
-David
da...@pgmasters.net




Re: House style for DocBook documentation?

2019-03-29 Thread Peter Eisentraut
On 2019-03-08 15:38, Chapman Flack wrote:
> Perhaps:
> 
> o  For an internal link, use  if you will supply text, else 
> o  For an external link, use  with or without link text

I have pushed an update to this effect.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PG 9.6]make is failing for test_decoding contrib module.

2019-03-29 Thread Robert Haas
On Fri, Mar 29, 2019 at 8:24 AM tushar  wrote:
> Found by one of my colleagues - Kashif Jeeshan - in PG 9.6, make is

Kashif Jeeshan?

> failing for test_decoding contrib module.
>
> [centos@centos-cpula test_decoding]$ make
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv -O2 -fPIC -I. -I.
> -I../../src/include -D_GNU_SOURCE   -c -o test_decoding.o test_decoding.c
> In file included from ../../src/include/postgres.h:48,
>   from test_decoding.c:13:
> ../../src/include/utils/elog.h:71:28: error: utils/errcodes.h: No such
> file or directory
> In file included from ../../src/include/replication/slot.h:15,
>   from ../../src/include/replication/logical.h:12,
>   from test_decoding.c:23:
> ../../src/include/storage/lwlock.h:129:33: error: storage/lwlocknames.h:
> No such file or directory
> test_decoding.c: In function ‘pg_decode_startup’:
> test_decoding.c:127: error: ‘ERRCODE_INVALID_PARAMETER_VALUE’ undeclared
> (first use in this function)
> test_decoding.c:127: error: (Each undeclared identifier is reported only
> once
> test_decoding.c:127: error: for each function it appears in.)
> make: *** [test_decoding.o] Error 1
> [centos@centos-cpula test_decoding]$

I think your tree is not clean, or you haven't built the server
correctly first.  If this were actually broken, the buildfarm would be
red:

https://buildfarm.postgresql.org/cgi-bin/show_status.pl

Try 'git clean -dfx'.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add exclusive backup deprecation notes to documentation

2019-03-29 Thread David Steele

On 3/29/19 12:25 PM, Magnus Hagander wrote:
> On Fri, Mar 29, 2019 at 1:19 PM Robert Haas  wrote:
>
>> On Fri, Mar 29, 2019 at 6:27 AM David Steele  wrote:
>>> I used your suggestions with minor editing.  After some reflection, I
>>> agree that the inline warnings are likely to be more effective than
>>> something at the end, at least for those working on a new
>>> implementation.
>>
>> I'm glad we could agree on something.  Committed.

Me, too.  Thanks!

> Minor nitpick:
> +     backup can only be taken on a primary and does not allow concurrent
> +     backups.  Moreover, because it writes a backup_label file on the
> +     master, it can cause the master to fail to restart automatically after
>
> Let's be consistent in whether we call it a primary or a master, at least
> within the same paragraph :)

Agreed, let's stick with "primary".

Are we planning to back-patch this?  The deprecation was added to the
docs in 9.6 -- I think these clarifications would be helpful.

Thanks,
--
-David
da...@pgmasters.net




Re: Add exclusive backup deprecation notes to documentation

2019-03-29 Thread Robert Haas
On Fri, Mar 29, 2019 at 8:45 AM David Steele  wrote:
> Are we planning to back-patch this?  The deprecation was added to the
> docs in 9.6 -- I think these clarifications would be helpful.

I wasn't planning to, but I guess we could consider it.  I'd be more
inclined to back-patch the documentation changes than the message text
changes, but what do other people think?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add exclusive backup deprecation notes to documentation

2019-03-29 Thread David Steele

On 3/29/19 12:46 PM, Robert Haas wrote:
> On Fri, Mar 29, 2019 at 8:45 AM David Steele  wrote:
>> Are we planning to back-patch this?  The deprecation was added to the
>> docs in 9.6 -- I think these clarifications would be helpful.
>
> I wasn't planning to, but I guess we could consider it.  I'd be more
> inclined to back-patch the documentation changes than the message text
> changes, but what do other people think?

I think we should definitely do the docs, I'm 50% on the message.  You 
could argue that it is a behavioral change, but it is pretty important info.


Regards,
--
-David
da...@pgmasters.net




Re: [GSoC 2019] Proposal: Develop Performance Farm Database and Website

2019-03-29 Thread Peter Eisentraut
On 2019-03-29 13:04, Robert Haas wrote:
> On Tue, Mar 26, 2019 at 9:10 AM Ila B.  wrote:
>> I am Ilaria Battiston, an aspiring GSoC student, and I would love to have a 
>> feedback on the first draft of my Google Summer of Code proposal. The 
>> project is "Develop Performance Farm Database and Website”. You can find any 
>> other detail in the attached PDF file :)
> 
> I think there's probably a very large amount of work to be done in
> this area.  Nobody is going to finish it in a summer.  Still, there's
> probably some useful things you could get done in a summer.  I think a
> lot will depend on finding a good mentor who is familiar with these
> areas (which I am not).  Has anyone expressed an interest?

Moreover, I have a feeling that I have been hearing about work on a
performance farm for many years.  Perhaps it should be investigated what
became of that work and what the problems were in getting it to a working
state.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add exclusive backup deprecation notes to documentation

2019-03-29 Thread Magnus Hagander
On Fri, Mar 29, 2019 at 1:49 PM David Steele  wrote:

> On 3/29/19 12:46 PM, Robert Haas wrote:
> > On Fri, Mar 29, 2019 at 8:45 AM David Steele 
> wrote:
> >> Are we planning to back-patch this?  The deprecation was added to the
> >> docs in 9.6 -- I think these clarifications would be helpful.
> >
> > I wasn't planning to, but I guess we could consider it.  I'd be more
> > inclined to back-patch the documentation changes than the message text
> > changes, but what do other people think?
>
> I think we should definitely do the docs, I'm 50% on the message.  You
> could argue that it is a behavioral change, but it is pretty important
> info.
>

+1 on the docs.

Are the changes to the messages going to cause issues or weirdness for
translators? That would be a reason not to backpatch them. Without that, I'm
leaning towards backpatching them.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Add exclusive backup deprecation notes to documentation

2019-03-29 Thread Peter Eisentraut
On 2019-03-29 13:54, Magnus Hagander wrote:
> Are the changes to the messages going to cause issues or weirdness for
> translators? That would be a reason not to backpatch them. Without that,
> I'm leaning towards backpatching them.

Note that the messages refer to recovery.signal, so a backpatch would
have to rephrase the message.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: fsync error handling in pg_receivewal, pg_recvlogical

2019-03-29 Thread Michael Paquier
On Fri, Mar 29, 2019 at 12:48:09PM +0100, Peter Eisentraut wrote:
> Do we need to review the fsync error handling in pg_receivewal and
> pg_recvlogical, following the recent backend changes?  The current
> default behavior is that these tools will log fsync errors and then
> reconnect and proceed with the next data streaming in.  As a result, you
> might then have some files in the accumulated WAL that have not been
> fsynced.  Perhaps a hard exit would be more appropriate?

Yes, I think that we are going to need an equivalent of that for all
frontend tools.  At various degrees, making sure that a fsync happens
is also important for pg_dump, pg_basebackup, pg_rewind and
pg_checksums so it is not only a problem of the two tools you mention.
It seems to me that the most correct way to have those failures would
be to use directly exit(EXIT_FAILURE) in file_utils.c where
appropriate.
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2019-03-29 Thread Andrew Dunstan


On 3/29/19 7:51 AM, Michael Paquier wrote:
> On Sat, Mar 09, 2019 at 10:15:37AM +0900, Michael Paquier wrote:
>> I am adding an open item about that.  I think I could commit the
>> patch, but I need to study it a bit more first.
> So, coming back to this thread and studying the problem again, the
> diagnosis is that a non-aggressive, anti-wraparound vacuum can be
> triggered because the worker sees trouble in the force due to some
> activity happening in parallel.  Hence, if we face this case, it looks
> right to skip the vacuum for this relation.
>
> Attached is an updated patch with a better error message, more
> comments, and the removal of the anti-wraparound non-aggressive log
> which was added in 28a8fa9.  The error message could be better I
> guess.  Suggestions are welcome.
>
> Thoughts?


+                (errmsg_internal("found vacuum to prevent wraparound of
table \"%s.%s.%s\" to be not aggressive, so skipping",

This might convey something to hackers, but I doubt it will convey much
to regular users. Perhaps something like "skipping redundant
anti-wraparound vacuum of table ..." would be better.


The comment is also a bit too tentative. Perhaps something like this
would do:


Normally the relfrozenxid for an anti-wraparound vacuum will be old
enough to force an aggressive vacuum. However, a concurrent vacuum
might have already done this work, so that the relfrozenxid in relcache
has been updated. If that happens, this vacuum is redundant, so skip it.


cheers


andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: jsonpath

2019-03-29 Thread Alexander Korotkov
On Thu, Mar 28, 2019 at 7:43 PM Andrew Dunstan
 wrote:
> On 3/28/19 9:50 AM, Tom Lane wrote:
> > Andres Freund  writes:
> >> On March 28, 2019 9:31:14 AM EDT, Tom Lane  wrote:
> >>> Has anybody gotten through a valgrind run on this code yet?
> >> Skink has successfully passed since - but that's x86...
> > Yeah, there is a depressingly high chance that this is somehow specific
> > to the bison version, flex version, and/or compiler in use on jacana.
>
> lousyjack has also passed it (x64).
>
> git bisect on jacana blames commit 550b9d26f.

Hmm... 550b9d26f just makes jsonpath_gram.y and jsonpath_scan.l
compile at once.  I've re-read this commit and didn't find anything
suspicious.
I've asked Andrew for access to jacana in order to investigate this myself.



--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Block level parallel vacuum

2019-03-29 Thread Masahiko Sawada
On Fri, Mar 29, 2019 at 9:28 PM Robert Haas  wrote:
>
> On Thu, Mar 28, 2019 at 10:27 PM Masahiko Sawada  
> wrote:
> > You're right, the previous patches are wrong. Attached the updated
> > version patches.
>
> 0001 looks good now.  Committed.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [PG 9.6]make is failing for test_decoding contrib module.

2019-03-29 Thread tushar

On 03/29/2019 06:12 PM, Robert Haas wrote:
> On Fri, Mar 29, 2019 at 8:24 AM tushar  wrote:
>> Found by one of my colleagues - Kashif Jeeshan - in PG 9.6, make is
>
> Kashif Jeeshan?

:-) Actually, he is also working on testing logical replication on 
standbys whenever he has some bandwidth (on/off).  He found one issue, 
and I suggested that he check the behavior on PG 9.6 / PG 10; while 
doing so he got this issue when he performed make against test_decoding.

>> test_decoding.c:127: error: for each function it appears in.)
>> make: *** [test_decoding.o] Error 1
>> [centos@centos-cpula test_decoding]$
>
> I think your tree is not clean, or you haven't built the server
> correctly first.  If this were actually broken, the buildfarm would be
> red:
>
> https://buildfarm.postgresql.org/cgi-bin/show_status.pl
>
> Try 'git clean -dfx'.

Yes, you are right.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: New vacuum option to do only freezing

2019-03-29 Thread Robert Haas
On Fri, Mar 29, 2019 at 2:16 AM Masahiko Sawada  wrote:
> Attached updated patches. These patches are applied on top of 0001
> patch on parallel vacuum thread[1].

+bool index_cleanup = true;  /* by default */

I think we should instead initialize index_cleanup to the reloption
value, if there is one, or true if none, and then let it be overridden
by the loop that follows, where whatever the user specifies in the SQL
command is processed.  That way, any explicitly-specified option
within the command itself wins, and the reloption sets the default.
As you have it, index cleanup is disabled when the reloption is set to
false even if the user wrote VACUUM (INDEX_CLEANUP TRUE).

+appendStringInfo(&buf,
+ _("%.0f tuples and %.0f item identifiers are left as dead.\n"),
+ vacrelstats->nleft_dead_tuples,
+ vacrelstats->nleft_dead_itemids);

I tend to think we should omit this line entirely if both values are
0, as will very often be the case.

+if ((params->options & VACOPT_FULL) != 0 &&
+(params->options & VACOPT_INDEX_CLEANUP) == 0)
+ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("VACUUM option INDEX_CLEANUP cannot be set to false with FULL")));

I think it would be better to just ignore the INDEX_CLEANUP option
when FULL is specified.

I wasn't all that happy with the documentation changes you proposed.
Please find attached a proposed set of doc changes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


vacuum-index-cleanup-doc-rmh.patch
Description: Binary data


Re: partitioned tables referenced by FKs

2019-03-29 Thread Robert Haas
On Wed, Mar 20, 2019 at 11:58 AM Alvaro Herrera
 wrote:
> constraint is dropped.  I can only think of ugly data structure to
> support this, and adding enough hooks in dependency.c to support this is
> going to be badly received.

I don't know why dependency.c doesn't handle this internally.  If I
say that I want to delete a list of objects, some of which can't be
dropped without dropping certain other things, dependency.c ought to
be able to suspend judgement on what the problems are until it's
looked over that whole list.  It seems to me that we've had to fight
with this problem before, and I don't know why every individual bit of
code that calls into that file has to be responsible for working
around it individually.

> I think a better idea is to prevent DROP of a partition that's
> referenced by a foreign key, period.  We would only allow ALTER..DETACH
> (they can drop the partition once it's detached).  It seems to me that
> the check can be done more naturally during detach than during drop,
> because we're not constrained to do it within the dependency.c system.

IMHO, that sucks a lot.  Maybe we should live with it anyway, but it does suck.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: partitioned tables referenced by FKs

2019-03-29 Thread Robert Haas
On Thu, Mar 28, 2019 at 2:59 PM Alvaro Herrera  wrote:
> I ended up revising the dependencies that we give to the constraint in
> the partition -- instead of giving it partition-type dependencies, we
> give it an INTERNAL dependency.  Now when you request to drop the
> partition, it says this:
>
> create table pk (a int primary key) partition by list (a);
> create table fk (a int references pk);
> create table pk1 partition of pk for values in (1);
>
> alvherre=# drop table pk1;
> ERROR:  cannot drop table pk1 because other objects depend on it
> DETAIL:  constraint fk_a_fkey on table fk depends on table pk1
> HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Hmm.  I guess that's strictly better than blocking the drop
completely, but definitely not ideal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Offline enabling/disabling of data checksums

2019-03-29 Thread Michael Paquier
On Tue, Mar 26, 2019 at 01:41:38PM +0100, Fabien COELHO wrote:
>> I am not sure that "checksum status" is a correct term.  It seems to
>> me that "same configuration for data checksums as before the tool ran"
>> or something like that would be more correct.
> 
> Possibly, I cannot say.

I have put more thoughts into this part, and committed the
reorganization as you mainly suggested.
--
Michael


signature.asc
Description: PGP signature


Re: [GSoC 2019] Proposal: Develop Performance Farm Database and Website

2019-03-29 Thread Ilaria
Hello,

Thanks for the answer. This project is on the official PostgreSQL project list 
of GSoC 2019, and potential mentors are stated there.

I trust the mentors’ judgement on outlining the work and the tasks to be done in 
three months, and there is the previous student’s work to use as an example if 
needed. The project consists of building a database and a website on top of it 
for users to browse performance data.

Let me know whether there are any specific issues you’re concerned about.

Ilaria

> Am 29.03.2019 um 13:52 schrieb Peter Eisentraut 
> :
> 
>> On 2019-03-29 13:04, Robert Haas wrote:
>>> On Tue, Mar 26, 2019 at 9:10 AM Ila B.  wrote:
>>> I am Ilaria Battiston, an aspiring GSoC student, and I would love to have a 
>>> feedback on the first draft of my Google Summer of Code proposal. The 
>>> project is "Develop Performance Farm Database and Website”. You can find 
>>> any other detail in the attached PDF file :)
>> 
>> I think there's probably a very large amount of work to be done in
>> this area.  Nobody is going to finish it in a summer.  Still, there's
>> probably some useful things you could get done in a summer.  I think a
>> lot will depend on finding a good mentor who is familiar with these
>> areas (which I am not).  Has anyone expressed an interest?
> 
> Moreover, I have a feeling that I have been hearing about work on a
> performance farm for many years.  Perhaps it should be investigated what
> became of that work and what the problems were in getting it to a working
> state.
> 
> -- 
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2019-03-29 Thread Michael Paquier
On Fri, Mar 29, 2019 at 09:11:47AM -0400, Andrew Dunstan wrote:
> +                (errmsg_internal("found vacuum to prevent wraparound of
> table \"%s.%s.%s\" to be not aggressive, so skipping",
> 
> This might convey something to hackers, but I doubt it will convey much
> to regular users. Perhaps something like "skipping redundant
> anti-wraparound vacuum of table ..." would be better.

"skipping redundant" is much better.

> The comment is also a bit too tentative. Perhaps something like this
> would do:
> 
> Normally the relfrozenxid for an anti-wraparound vacuum will be old
> enough to force an aggressive vacuum. However, a concurrent vacuum
> might have already done this work, so that the relfrozenxid in relcache
> has been updated. If that happens, this vacuum is redundant, so skip it.

That works for me.
--
Michael


signature.asc
Description: PGP signature


Re: Enable data checksums by default

2019-03-29 Thread Michael Paquier
On Fri, Mar 29, 2019 at 11:16:11AM +0100, Bernd Helmle wrote:
> So between ~7% to 18% impact with checksums in this specific case here.

I can't really believe that many people set up shared_buffers at 128kB
which would cause such a large number of page evictions, but I can
believe that many users have shared_buffers set to its default value
and that we are going to get complaints about "performance drop after
upgrade to v12" if we switch data checksums to on by default.
--
Michael


signature.asc
Description: PGP signature


Re: table_privileges view always show object owner as a grantor

2019-03-29 Thread Laurenz Albe
Ashutosh Sharma wrote:
> I noticed that irrespective of whoever grants privileges on an object,
> it's always the object owner who is seen as a grantor in the output of
> table_privileges view.

> Isn't that wrong information?  If that isn't wrong, then may I know why
> the PostgreSQL documentation on "table_privileges" describes the grantor
> as "Name of the role that granted the privilege"?  Here is the
> documentation link for the table_privileges view.
> 
> https://www.postgresql.org/docs/current/infoschema-table-privileges.html

Currently the grantor of a privilege is the owner if a superuser
grants a privilege on the object.

If that were not so, how would you disambiguate between privileges
granted by a superuser and privileges passed on by somebody
who has been granted the privilege WITH GRANT OPTION?

Or, with an example:
If A grants SELECT on a table to B WITH GRANT OPTION, and
B grants the privilege to C, A cannot directly revoke the
privilege from C.  All A can do is revoke the privilege from
B with the CASCADE option.

This distinction would be lost if B could appear as grantor
just because he was superuser at some time in the past
(and doesn't hold the privilege himself).
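Laurenz's example can be sketched in SQL (the role and table names here are illustrative, not from the thread):

```sql
-- As owner A: grant to B, allowing B to pass the privilege on.
GRANT SELECT ON t TO b WITH GRANT OPTION;

-- As B: pass it to C; B is recorded as C's grantor.
GRANT SELECT ON t TO c;

-- As A: a direct REVOKE from C has no effect, because A is not C's
-- grantor.  Revoking from B with CASCADE also removes C's privilege.
REVOKE SELECT ON t FROM b CASCADE;
```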

So I'd say the behavior is fine as it is, but it would not harm to
document it better (or at all).

Yours,
Laurenz Albe





Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2019-03-29 Thread Alvaro Herrera
On 2019-Mar-29, Michael Paquier wrote:

> On Fri, Mar 29, 2019 at 09:11:47AM -0400, Andrew Dunstan wrote:
> > +                (errmsg_internal("found vacuum to prevent wraparound of
> > table \"%s.%s.%s\" to be not aggressive, so skipping",
> > 
> > This might convey something to hackers, but I doubt it will convey much
> > to regular users. Perhaps something like "skipping redundant
> > anti-wraparound vacuum of table ..." would be better.
> 
> "skipping redundant" is much better.

Yeah, that looks good to me too.  I wonder if we really need it as LOG
though; we don't say anything for actions unless they take more than the
min duration, so why say something for a no-op that takes almost no time?
Maybe make it DEBUG1.

> > The comment is also a bit too tentative. Perhaps something like this
> > would do:
> > 
> > Normally the relfrozenxid for an anti-wraparound vacuum will be old
> > enough to force an aggressive vacuum. However, a concurrent vacuum
> > might have already done this work that the relfroxzenxid in relcache
> > has been updated. If that happens this vacuum is redundant, so skip it.
> 
> That works for me.

s/relfroxzenxid/relfrozenxid/

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_ctl on windows can't open postmaster.pid: Permission denied

2019-03-29 Thread Michael Paquier
On Sat, Mar 30, 2019 at 12:13:23AM +1300, Thomas Munro wrote:
> This is probably a stupid question, but after commit 0ba06e0,
> shouldn't pg_ctl.c have fopen defined as pgwin32_fopen by port.h,
> because it was included by c.h, because it was included by
> postgres_fe.h?

Yeah, normally pg_ctl should be using the concurrent-safe wrapper.
--
Michael


signature.asc
Description: PGP signature


Re: [PG 9.6]make is failing for test_decoding contrib module.

2019-03-29 Thread tushar

On 03/29/2019 06:12 PM, Robert Haas wrote:

> Kashif Jeeshan?


Ohh, Please read - Kashif Zeeshan.  Sorry for the typo.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company







Re: partitioned tables referenced by FKs

2019-03-29 Thread Alvaro Herrera
On 2019-Mar-29, Robert Haas wrote:

> On Wed, Mar 20, 2019 at 11:58 AM Alvaro Herrera
>  wrote:
> > constraint is dropped.  I can only think of ugly data structure to
> > support this, and adding enough hooks in dependency.c to support this is
> > going to be badly received.
> 
> I don't know why dependency.c doesn't handle this internally.  If I
> say that I want to delete a list of objects, some of which can't be
> dropped without dropping certain other things, dependency.c ought to
> be able to suspend judgement on what the problems are until it's
> looked over that whole list.  It seems to me that we've had to fight
> with this problem before, and I don't know why every individual bit of
> code that calls into that file has to be responsible for working
> around it individually.

Hmm, if you think this kind of situation is more widespread than this
particular case, then maybe we can add the hooks I was talking about,
and simplify those other cases while fixing this problem.  Do you have
in mind what cases those are?  Nothing comes to mind, but I'll
search around and see if I can find anything.

Note that there are two separate problems here.  One is how to run
arbitrary code to see whether the DROP is allowed (in this case, the
arbitrary code to run is RI_Final_Check which searches for referenced
rows and throws ERROR if there are any).  My proposed hook system seems
to work OK for that.

The other problem is how to decide (in dependency.c terms) that the
partition can be dropped.  We have these scenarios to support:

1. ALTER TABLE fk DROP CONSTRAINT                 -- should work cleanly
2. DROP referenced-partition                      -- allow only if no referenced rows
3. ALTER referenced-parent-table DETACH partition -- allow only if no referenced rows
4. DROP referenced-parent-table CASCADE           -- drop the FK and partition

The patch I posted in v8 allowed cases 1, 2 and 3 but failed 4.
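The four scenarios can be sketched as follows (object names are illustrative, and the constraint name assumes PostgreSQL's default naming; cases 2 and 3 are of course alternatives against the same partition):

```sql
CREATE TABLE pk (a int PRIMARY KEY) PARTITION BY LIST (a);
CREATE TABLE pk1 PARTITION OF pk FOR VALUES IN (1);
CREATE TABLE fk (a int REFERENCES pk);

ALTER TABLE fk DROP CONSTRAINT fk_a_fkey;    -- case 1: should work cleanly
DROP TABLE pk1;                              -- case 2: only if no referenced rows
ALTER TABLE pk DETACH PARTITION pk1;         -- case 3: only if no referenced rows
DROP TABLE pk CASCADE;                       -- case 4: drops the FK and the partition
```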

One possibility is to pass the whole list of objects to drop to the
hooks; so if we see (in the pg_class-specific hook) that the FK
constraint is listed as being dropped, we just don't execute
RI_Final_Check.


BTW, RI_Final_Check looks a bit like a silly name, now that I think
about it; it was only a temporary name I picked as a counterpart to
RI_Initial_Check.  Maybe RI_PartitionRemove_Check would be more apropos?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PostgreSQL pollutes the file system

2019-03-29 Thread Fred .Flintstone
I think that would be amazing! It would be great!

On Fri, Mar 29, 2019 at 4:01 AM Tatsuo Ishii  wrote:
>
> > Andreas Karlsson  writes:
> >> On 3/27/19 3:26 PM, Tomas Vondra wrote:
> >>> That is true, of course. But are there actual examples of such conflicts
> >>> in practice? I mean, are there tools/packages that provide commands with
> >>> a conflicting name? I'm not aware of any, and as was pointed before, we'd
> >>> have ~20 years of history on any new ones.
> >
> >> That is a fair argument. Since we squatted those names back in the
> >> mid-90s I think the risk of collision is low.
> >
> > Right.  I think there is a fair argument to be made for user confusion
> > (not actual conflict) with respect to createuser and dropuser.  The
> > argument for renaming any of the other tools is much weaker, IMO.
>
> If we were to invent new command names, what about doing similar to
> git? I mean something like:
>
> pgsql createdb 
>
> Here, "pgsql" is new command name and "createdb" is a sub command name
> to create a database.
>
> This way, we would be free from the command name conflict problem and
> plus, we could do:
>
> pgsql --help
>
> which will print subcommand names when a user is not sure what
> the subcommand name is.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp




clean up pg_checksums.sgml

2019-03-29 Thread Justin Pryzby
PFA patch with minor improvements to documentation.

Also, what do you think about changing the user-facing language from
"check checksum" to "verify checksum"?  I see that commit ed308d78 actually
moved in the other direction, but I prefer "verify".
>From d8e627cf340e5384d59ab4fc3f3d0b4891a5b1c0 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 28 Mar 2019 19:20:52 -0500
Subject: [PATCH] Clean up pg_checksums.sgml

---
 doc/src/sgml/ref/pg_checksums.sgml | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c60695d..9433782 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -39,15 +39,16 @@ PostgreSQL documentation
pg_checksums verifies, enables or disables data
checksums in a PostgreSQL cluster.  The server
must be shut down cleanly before running
-   pg_checksums. The exit status is zero if there
-   are no checksum errors when checking them, and nonzero if at least one
-   checksum failure is detected. If enabling or disabling checksums, the
+   pg_checksums. When verifying checksums, the exit
+   status is zero if there are no checksum errors, and nonzero if at least one
+   checksum failure is detected. When enabling or disabling checksums, the
exit status is nonzero if the operation failed.
   
 
   
-   While verifying or enabling checksums needs to scan or write every file in
-   the cluster, disabling checksums will only update the file
+   When verifying checksums, every file in the cluster is scanned;
+   When enabling checksums, every file in the cluster is also rewritten.
+   Disabling checksums only updates the file
pg_control.
   
  
@@ -195,11 +196,10 @@ PostgreSQL documentation
safe.
   
   
-   If pg_checksums is aborted or killed while
-   enabling or disabling checksums, the cluster will keep the same
-   configuration for data checksums as before the operation attempted.
-   pg_checksums can be restarted to
-   attempt again the same operation.
+   If pg_checksums is aborted or killed
+   while enabling or disabling checksums, the cluster's checksum state
+   will be unchanged, and pg_checksums would need to
+   be rerun and start its operation from scratch.
   
  
 
-- 
2.1.4



Re: table_privileges view always show object owner as a grantor

2019-03-29 Thread Tom Lane
Laurenz Albe  writes:
> Ashutosh Sharma wrote:
>> I noticed that irrespective of whoever grants privileges on an object,
>> it's always the object owner who is seen as a grantor in the output of
>> table_privileges view.

The above is demonstrably false ...

regression=# create user alice;
CREATE ROLE
regression=# create user bob;
CREATE ROLE
regression=# create user charlie;
CREATE ROLE
regression=# \c - alice
You are now connected to database "regression" as user "alice".
regression=> create table a1(f int);
CREATE TABLE
regression=> grant select on table a1 to bob with grant option;
GRANT
regression=> \c - bob
You are now connected to database "regression" as user "bob".
regression=> grant select on table a1 to charlie;  
GRANT
regression=> select * from information_schema.table_privileges where table_name = 'a1';
 grantor | grantee | table_catalog | table_schema | table_name | privilege_type | is_grantable | with_hierarchy
---------+---------+---------------+--------------+------------+----------------+--------------+----------------
 bob     | charlie | regression    | public       | a1         | SELECT         | NO           | YES
 alice   | bob     | regression    | public       | a1         | SELECT         | YES          | YES
(2 rows)

> Currently the grantor of a privilege is the owner if a superuser
> grants a privilege on the object.

Yes, that is true.

> So I'd say the behavior is fine as it is, but it would not harm to
> document it better (or at all).

It is documented, see under GRANT:

If a superuser chooses to issue a GRANT or REVOKE command, the command
is performed as though it were issued by the owner of the affected
object. In particular, privileges granted via such a command will
appear to have been granted by the object owner. (For role membership,
the membership appears to have been granted by the containing role
itself.)

GRANT and REVOKE can also be done by a role that is not the owner of
the affected object, but is a member of the role that owns the object,
or is a member of a role that holds privileges WITH GRANT OPTION on
the object. In this case the privileges will be recorded as having
been granted by the role that actually owns the object or holds the
privileges WITH GRANT OPTION. For example, if table t1 is owned by
role g1, of which role u1 is a member, then u1 can grant privileges on
t1 to u2, but those privileges will appear to have been granted
directly by g1. Any other member of role g1 could revoke them later.

If the role executing GRANT holds the required privileges indirectly
via more than one role membership path, it is unspecified which
containing role will be recorded as having done the grant. In such
cases it is best practice to use SET ROLE to become the specific role
you want to do the GRANT as.

The point about other members of the owning role being able to revoke
the privileges is why it's done this way.

regards, tom lane




Re: log bind parameter values on error

2019-03-29 Thread Alexey Bashtanov

Hello and sorry for weeks of silence.


Hello Andres and Peter,

Thanks for your messages.
Please see the new patch version attached.

> In my testing, I couldn't get this patch to do anything.  Could you
> please share your testing steps?


Sure.  Provided you're in the postgres checkout and you've run make in
src/test/examples/, this should work:


CREATE SCHEMA testlibpq3;
SET search_path = testlibpq3;
CREATE TABLE test1_(i int4, t text, b bytea);
INSERT INTO test1_ VALUES(0, '', '');
CREATE VIEW test1 AS SELECT 1/i i, t, b FROM test1_;

-- will log only statement
\! ./src/test/examples/testlibpq3

ALTER SYSTEM SET log_parameters_on_error TO on;
SELECT pg_reload_conf();

-- will log statement with parameters
\! ./src/test/examples/testlibpq3


> I did
> 
> postgres -D data --log-parameters-on-error=on
> 
> pgbench -i bench
> 
> alter table pgbench_accounts alter column aid type smallint;
> 
> pgbench -M extended -S -T 10 bench
> 
> This will then error out on type overflows, but I don't see any
> parameters being logged:
> 
> ERROR:  value "62812" is out of range for type smallint
> STATEMENT:  SELECT abalance FROM pgbench_accounts WHERE aid = $1;
> 
> (In this case the error message contains the parameter value, so it's
> not a very practical case, but it should work, it seems.)
I guess this error occurred /while/ binding, so the parameters probably 
weren't yet all bound by the time of error reporting.

That's why the error message came without parameters.


> Meanwhile, I have committed a patch that refactors the ParamListInfo
> initialization a bit, so you don't have to initialize hasTextValues in a
> bunch of places unrelated to your core code.  So please rebase your
> patch on that.


Please find rebased patch attached.

I apologize for not replying earlier: I first thought my patch was really
faulty and knew I wouldn't have time to fix it these days, but it seems
to me it actually works.


Anyway, I don't suppose you'll manage to review it within the remaining
few days, so I'll rebase it again at the beginning of the next CF.


Best regards,
  Alexey
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d383de2..1442f30 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6330,6 +6330,23 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   
  
 
+ 
+  log_parameters_on_error (boolean)
+  
+   log_parameters_on_error configuration parameter
+  
+  
+  
+   
+Controls whether the statement is logged with bind parameter values. 
+It adds some overhead, as postgres will cache textual
+representations of parameter values in memory for all statements,
+even if they eventually do not get logged. The default is
+off. Only superusers can change this setting.
+   
+  
+ 
+
  
   log_statement (enum)
   
diff --git a/src/backend/nodes/params.c b/src/backend/nodes/params.c
index f5d5613..5e5ccf1 100644
--- a/src/backend/nodes/params.c
+++ b/src/backend/nodes/params.c
@@ -45,6 +45,7 @@ makeParamList(int numParams)
 	retval->parserSetup = NULL;
 	retval->parserSetupArg = NULL;
 	retval->numParams = numParams;
+	retval->hasTextValues = false;
 
 	return retval;
 }
@@ -58,6 +59,10 @@ makeParamList(int numParams)
  * set of parameter values.  If dynamic parameter hooks are present, we
  * intentionally do not copy them into the result.  Rather, we forcibly
  * instantiate all available parameter values and copy the datum values.
+ *
+ * Since this function is never used for copying parameters into a top portal,
+ * we don't copy textual representations. Neither we set the hasTextValues
+ * flag, so noone will access them.
  */
 ParamListInfo
 copyParamList(ParamListInfo from)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f9ce3d8..e097ce8 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -86,6 +86,12 @@
  */
 const char *debug_query_string; /* client-supplied query string */
 
+/*
+ * The top-level portal that the client is immediately working with:
+ * creating, binding, executing, or all at one using simple protocol
+ */
+Portal current_top_portal = NULL;
+
 /* Note: whereToSendOutput is initialized for the bootstrap/standalone case */
 CommandDest whereToSendOutput = DestDebug;
 
@@ -1694,6 +1700,9 @@ exec_bind_message(StringInfo input_message)
 	else
 		portal = CreatePortal(portal_name, false, false);
 
+	Assert(current_top_portal == NULL);
+	current_top_portal = portal;
+
 	/*
 	 * Prepare to copy stuff into the portal's memory context.  We do all this
 	 * copying first, because it could possibly fail (out-of-memory) and we
@@ -1731,6 +1740,9 @@ exec_bind_message(StringInfo input_message)
 	 */
 	if (numParams > 0)
 	{
+		/* GUC value can change, so we remember its state to be consistent */
+		bool need_text_values = log_parameters_on_error;
+
 		params = makeParamList(numParams);
 
 		for (int paramno = 0; paramno < numParams; paramno++)
@@ -1798,9 +1810,31 @@ exec_bind

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Alvaro Herrera
I just noticed that the CLUSTER calls index_build, which my patch
modifies to include additional progress metrics; this means that during
the index build phase, the metrics set by CLUSTER will be trashed by the
ones my patch introduces.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Andres Freund
Hi,

On 2019-03-29 12:02:18 -0300, Alvaro Herrera wrote:
> I just noticed that the CLUSTER calls index_build, which my patch
> modifies to include additional progress metrics; this means that during
> the index build phase, the metrics set by CLUSTER will be trashed by the
> ones my patch introduces.

Yea, it really seems that the index build infrastructure needs to
support cooperating with the caller's progress reporting. For CLUSTER,
REINDEX, ALTER TABLE rewrites etc, they all would likely want to have
insight into the index build while also having their own progress.

Greetings,

Andres Freund




RE: REINDEX CONCURRENTLY 2.0

2019-03-29 Thread Shinoda, Noriyoshi (PN Japan A&PS Delivery)
Hi hackers,

I tried this great feature on a partitioned table's index.
The first time REINDEX TABLE CONCURRENTLY is executed on the partition, an error occurs.
The second run succeeds, but leaves behind an index marked INVALID.
I think this is not the desired behaviour.

# TEST
postgres=> CREATE TABLE part1(c1 INT) PARTITION BY RANGE(c1);
CREATE TABLE
postgres=> CREATE TABLE part1v1 PARTITION OF part1 FOR VALUES FROM (0) TO (100);
CREATE TABLE
postgres=> CREATE INDEX idx1_part1 ON part1(c1);
CREATE INDEX
postgres=> REINDEX TABLE CONCURRENTLY part1v1;
ERROR:  cannot drop index part1v1_c1_idx_ccold because index idx1_part1 
requires it
HINT:  You can drop index idx1_part1 instead.
postgres=> \d+ part1v1
                                   Table "public.part1v1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              |
Partition of: part1 FOR VALUES FROM (0) TO (100)
Partition constraint: ((c1 IS NOT NULL) AND (c1 >= 0) AND (c1 < 100))
Indexes:
    "part1v1_c1_idx" btree (c1)
    "part1v1_c1_idx_ccold" btree (c1) INVALID
Access method: heap

postgres=> REINDEX TABLE CONCURRENTLY part1v1;
REINDEX
postgres=> \d+ part1v1
                                   Table "public.part1v1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              |
Partition of: part1 FOR VALUES FROM (0) TO (100)
Partition constraint: ((c1 IS NOT NULL) AND (c1 >= 0) AND (c1 < 100))
Indexes:
    "part1v1_c1_idx" btree (c1)
    "part1v1_c1_idx_ccold" btree (c1) INVALID
Access method: heap

Regards,
Noriyoshi Shinoda

-Original Message-
From: Michael Paquier [mailto:mich...@paquier.xyz] 
Sent: Friday, March 29, 2019 6:21 PM
To: Peter Eisentraut 
Cc: Sergei Kornilov ; pgsql-hackers@lists.postgresql.org
Subject: Re: REINDEX CONCURRENTLY 2.0

On Fri, Mar 29, 2019 at 09:13:35AM +0100, Peter Eisentraut wrote:
> So, we're getting buildfarm failures, only with clang.  I can 
> reproduce those (with clang).

Indeed, I can reproduce the failures using -O2 with clang.  I am wondering if 
we are not missing a volatile flag somewhere and that some code reordering is 
at cause here.

> It seems the issue is somewhere near indexcmds.c "Phase 6 of REINDEX 
> CONCURRENTLY".  More eyes welcome.

Here is a short reproducer:

create materialized view aam as select 1 AS a;
create index aai on aam(a);
reindex table CONCURRENTLY aam;
--
Michael




Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Alvaro Herrera
On 2019-Mar-29, Alvaro Herrera wrote:

> I just noticed that the CLUSTER calls index_build, which my patch
> modifies to include additional progress metrics; this means that during
> the index build phase, the metrics set by CLUSTER will be trashed by the
> ones my patch introduces.

Indeed:

  pid  | datid | datname  | relid | command |        phase        | cluster_index_relid | heap_tuples_scanned | heap_tuples_written | heap_blks_total | heap_blks_scanned | index_rebuild_count
-------+-------+----------+-------+---------+---------------------+---------------------+---------------------+---------------------+-----------------+-------------------+---------------------
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |              162402 |              162402 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |              460362 |              460362 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |              754004 |              754004 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             1047058 |             1047058 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             1356296 |             1356296 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             1645321 |             1645321 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             1939920 |             1939920 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             2227450 |             2227450 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             2526116 |             2526116 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             2828468 |             2828468 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             3142982 |             3142982 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             3451494 |             3451494 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             3769799 |             3769799 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             4077513 |             4077513 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             4383255 |             4383255 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             4700286 |             4700286 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             5015468 |             5015468 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             5324951 |             5324951 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             5628172 |             5628172 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             5940862 |             5940862 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             6253778 |             6253778 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             6560474 |             6560474 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | index scanning heap |               16387 |             6881248 |             6881248 |               0 |                 0 |                   0
 28457 | 12750 | alvherre | 16384 | CLUSTER | in
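For reference, a snapshot like the one above would presumably come from polling the patch's progress view, along these lines (the view and column names follow the output shown and may differ in the committed version; in psql one could repeat it with \watch):

```sql
SELECT pid, command, phase, heap_tuples_scanned, heap_tuples_written
FROM pg_stat_progress_cluster;
```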

Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-03-29 Thread Alvaro Herrera
On 2019-Mar-29, Andres Freund wrote:

> Hi,
> 
> On 2019-03-29 12:02:18 -0300, Alvaro Herrera wrote:
> > I just noticed that the CLUSTER calls index_build, which my patch
> > modifies to include additional progress metrics; this means that during
> > the index build phase, the metrics set by CLUSTER will be trashed by the
> > ones my patch introduces.
> 
> Yea, it really seems that the index build infrastructure needs to
> support cooperating with the caller's progress reporting. For CLUSTER,
> REINDEX, ALTER TABLE rewrites etc, they all would likely want to have
> insight into the index build while also having their own progress.

So, CLUSTER and ALTER TABLE rewrites only do non-concurrent index
builds; and REINDEX can reuse pretty much the same wait-for metrics
columns as CIC.  So I think it's okay if I move only the metrics that
conflict for index_build.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: partitioned tables referenced by FKs

2019-03-29 Thread Jesper Pedersen

Hi Alvaro,

On 3/28/19 2:59 PM, Alvaro Herrera wrote:

> I ended up revising the dependencies that we give to the constraint in
> the partition -- instead of giving it partition-type dependencies, we
> give it an INTERNAL dependency.  Now when you request to drop the
> partition, it says this:
> 
> create table pk (a int primary key) partition by list (a);
> create table fk (a int references pk);
> create table pk1 partition of pk for values in (1);
> 
> alvherre=# drop table pk1;
> ERROR:  cannot drop table pk1 because other objects depend on it
> DETAIL:  constraint fk_a_fkey on table fk depends on table pk1
> HINT:  Use DROP ... CASCADE to drop the dependent objects too.
> 
> If you do say CASCADE, the constraint is dropped.  Not really ideal (I
> would prefer that the drop is prevented completely), but at least it's
> not completely bogus.  If you do "DROP TABLE pk", it works sanely.
> Also, if you DETACH the partition that pg_depend row goes away, so a
> subsequent drop of the partition works sanely.
> 
> Fixed the psql issue pointed out by Amit L too.



Could you expand a bit on the change to DEPENDENCY_INTERNAL instead of
DEPENDENCY_PARTITION_PRI / DEPENDENCY_PARTITION_SEC?


If you run "DROP TABLE t2_p32 CASCADE" the foreign key constraint is 
removed from all of t1.


-- ddl.sql --
CREATE TABLE t1 (i1 INT PRIMARY KEY, i2 INT NOT NULL) PARTITION BY HASH 
(i1);
CREATE TABLE t2 (i1 INT PRIMARY KEY, i2 INT NOT NULL) PARTITION BY HASH 
(i1);


\o /dev/null
SELECT 'CREATE TABLE t1_p' || x::text || ' PARTITION OF t1
FOR VALUES WITH (MODULUS 64, REMAINDER ' || x::text || ');'
from generate_series(0,63) x;
\gexec
\o

\o /dev/null
SELECT 'CREATE TABLE t2_p' || x::text || ' PARTITION OF t2
FOR VALUES WITH (MODULUS 64, REMAINDER ' || x::text || ');'
from generate_series(0,63) x;
\gexec
\o

ALTER TABLE t1 ADD CONSTRAINT fk_t1_i2_t2_i1 FOREIGN KEY (i2) REFERENCES 
t2(i1);


INSERT INTO t2 (SELECT i, i FROM generate_series(1, 1000) AS i);
INSERT INTO t1 (SELECT i, i FROM generate_series(1, 1000) AS i);

ANALYZE;
-- ddl.sql --

Detaching the partition for DROP seems safer to me.

Thanks in advance !

Best regards,
 Jesper




Re: partitioned tables referenced by FKs

2019-03-29 Thread Alvaro Herrera
On 2019-Mar-29, Jesper Pedersen wrote:

> Could you expand a bit on the change to DEPENDENCY_INTERNAL instead of
> DEPENDENCY_PARTITION_PRI / DEPENDENCY_PARTITION_SEC?

The PARTITION dependencies work in a way that doesn't do what we want.
Admittedly, neither does INTERNAL, but at least it's less bad.

> If you run "DROP TABLE t2_p32 CASCADE" the foreign key constraint is removed
> from all of t1.

Yes.  CASCADE is always a dangerous tool; if you run the DROP partition
without cascade, it explicitly lists that the constraint is going to be
dropped.

If you get in the habit of adding CASCADE to all your drops, you're going
to lose data pretty quickly.  In this case, no data is lost, only a
constraint.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PostgreSQL pollutes the file system

2019-03-29 Thread Christoph Berg
Re: Tatsuo Ishii 2019-03-29 
<20190329.100407.1159785913847835944.t-is...@sraoss.co.jp>
> If we were to invent new command names, what about doing similar to
> git? I mean something like:
> 
> pgsql createdb 

That is pretty close to "psql" and it will be utterly confusing for
new users. And everyone will have a hard time when talking about the
tools, imagine someone saying "please run psql appdbname" or "please
run pgsql createdb". The difference is just too small.

What might possibly make sense is to add options to psql to
facilitate common tasks:

psql --createdb foo
psql --createuser bar --superuser
psql --reindex foo

Christoph




Re: pgsql: Compute XID horizon for page level index vacuum on primary.

2019-03-29 Thread Andres Freund
Hi,

On 2019-03-29 09:37:11 +, Simon Riggs wrote:
> This commit message was quite confusing. It took me a while to realize this
> relates to btree index deletes and that what you mean is that we are
> calculating the latestRemovedXid for index entries. That is related to but
> not same thing as the horizon itself.

Well, it's the page level horizon...


> While trying to understand this, I see there is an even better way to
> optimize this. Since we are removing dead index tuples, we could alter the
> killed index tuple interface so that it returns the xmax of the tuple being
> marked as killed, rather than just a boolean to say it is dead.

Wouldn't that quite possibly result in additional and unnecessary
conflicts? Right now the page level horizon is computed whenever the
page is actually reused, rather than when an item is marked as
deleted. As it stands right now, the computed horizons are commonly very
"old", because of that delay, leading to lower rates of conflicts.


> Indexes can then mark the killed tuples with the xmax that killed them
> rather than just a hint bit. This is possible since the index tuples
> are dead and cannot be used to follow the htid to the heap, so the
> htid is redundant and so the block number of the tid could be
> overwritten with the xmax, zeroing the itemid. Each killed item we
> mark with its xmax means one less heap fetch we need to perform when
> we delete the page - it's possible we optimize that away completely by
> doing this.

That's far from a trivial feature imo. It seems quite possible that we'd
end up with increased overhead, because the current logic can get away
with only doing hint bit style writes - but would that be true if we
started actually replacing the item pointers? Because I don't see any
guarantee they couldn't cross a page boundary etc? So I think we'd need
to do WAL logging during index searches, which seems prohibitively
expensive.

And I'm also doubtful it's worth it because:

> Since this point of the code is clearly going to be a performance issue it
> seems like something we should do now.

I've tried quite a bit to find a workload where this matters, but after
avoiding redundant buffer accesses by sorting, and prefetching I was
unable to do so.  What workload do you see where this would be really be
bad? Without the performance optimization I'd found a very minor
regression by trying to force the heap visits to happen in a pretty
random order, but after sorting that went away.  I'm sure it's possible
to find a case on overloaded rotational disks where you'd find a small
regression, but I don't think it'd be particularly bad.
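The effect of sorting the heap visits can be illustrated with a toy model of buffer accesses (this is not PostgreSQL code; the function and names here are invented for illustration, assuming only the most recently pinned page is kept):

```python
def visits_needed(tids):
    """Count page loads for a sequence of (block, offset) heap TIDs,
    assuming only the most recently accessed page stays cached."""
    loads, current = 0, None
    for block, _offset in tids:
        if block != current:  # switching pages costs a load
            loads += 1
            current = block
    return loads

# Unsorted TIDs bounce between two heap pages on every visit;
# sorted by block, each page is loaded exactly once.
tids = [(5, 1), (1, 2), (5, 3), (1, 4), (5, 5), (1, 6)]
print(visits_needed(tids))          # 6 loads unsorted
print(visits_needed(sorted(tids)))  # 2 loads sorted
```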

Greetings,

Andres Freund




Re: speeding up planning with partitions

2019-03-29 Thread Tom Lane
Amit Langote  writes:
> Here are some comments on v38.

Thanks for looking it over!  I'll just reply to points worth discussing:

> -Assert(rte->rtekind == RTE_RELATION ||
> -   rte->rtekind == RTE_SUBQUERY);
> -add_appendrel_other_rels(root, rel, rti);
> +if (rte->rtekind == RTE_RELATION)
> +expand_inherited_rtentry(root, rel, rte, rti);
> +else
> +expand_appendrel_subquery(root, rel, rte, rti);

> Wouldn't it be a good idea to keep the Assert?

There's an Assert in expand_appendrel_subquery that what it got is an
RTE_SUBQUERY, so I thought the one at the call site was redundant.
I suppose another way to do this would be like

   if (rte->rtekind == RTE_RELATION)
   expand_inherited_rtentry(root, rel, rte, rti);
   else if (rte->rtekind == RTE_SUBQUERY)
   expand_appendrel_subquery(root, rel, rte, rti);
   else
   Assert(false);

Not sure if that's better or not.  Or we could go back to the
design of just having one function and letting it dispatch the
case it doesn't want to the other function --- though I think
I'd make expand_inherited_rtentry be the primary function,
rather than the other way around as you had it in v37.

> +forboth(lc, old_child_rtis, lc2, new_child_rtis)
> +{
> +int old_child_rti = lfirst_int(lc);
> +int new_child_rti = lfirst_int(lc2);
> +
> +if (old_child_rti == new_child_rti)
> +continue;   /* nothing to do */
> +
> +Assert(old_child_rti > new_child_rti);
> +
> +ChangeVarNodes((Node *) child_appinfos,
> +   old_child_rti, new_child_rti, 0);
> +}

> This seems necessary?  RTEs of children of the target table should be in
> the same position in the final_rtable as they are in the select_rtable.

Well, that's what I'm not very convinced of.  I have observed that
the regression tests don't reach this ChangeVarNodes call, but
I think that might just be lack of test cases rather than a proof
that it's impossible.  The question is whether it'd ever be possible
for the update/delete target to not be the first "inh" table that
gets expanded.  Since that expansion is done in RTE order, it
reduces to "is the target always before any other RTE entries
that could need inheritance expansion?"  Certainly that would typically
be true, but I don't feel very comfortable about assuming that it
must be true, when you start thinking about things like updatable
views, rules, WITH queries, and so on.

It might be worth trying to devise a test case that does reach this
code.  If we could convince ourselves that it's really impossible,
I'd be willing to drop it in favor of putting a test-and-elog check
in the earlier loop that the RTI pairs are all equal.  But I'm not
willing to do it without more investigation.

> +/* XXX wrong? */
> +parentrte->inh = false;

> About the XXX: I think resetting inh flag is unnecessary, so we should
> just remove the line.

Possibly.  I hadn't had time to follow up the XXX annotation.

> If we do that, we can also get rid of the following
> code in set_rel_size():

> else if (rte->relkind == RELKIND_PARTITIONED_TABLE)
> {
> /*
>  * A partitioned table without any partitions is marked as
>  * a dummy rel.
>  */
> set_dummy_rel_pathlist(rel);
> }

Not following?  Surely we need to mark the childless parent as dummy at
some point, and that seems like as good a place as any.

> Finally, it's not in the patch, but how about visiting
> get_relation_constraints() for revising this block of code:

That seems like probably an independent patch --- do you want to write it?

regards, tom lane




Re: Online verification of checksums

2019-03-29 Thread Stephen Frost
Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra 
> wrote:
> 
> > On Thu, Mar 28, 2019 at 01:11:40PM -0700, Andres Freund wrote:
> > >Hi,
> > >
> > >On 2019-03-28 21:09:22 +0100, Michael Banck wrote:
> > >> I agree that the current patch might have some corner-cases where it
> > >> does not guarantee 100% accuracy in online mode, but I hope the current
> > >> version at least has no more false negatives.
> > >
> > >False positives are *bad*. We shouldn't integrate code that has them.
> > >
> >
> > Yeah, I agree. I'm a bit puzzled by the reluctance to make the online mode
> > communicate with the server, which would presumably address these issues.
> > Can someone explain why not to do that?
> 
> I agree that this effort seems better spent on fixing those issues there
> (of which many are the same), and then re-use that.

This really seems like it depends on which of the options we're talking
about..   Connecting to the server and asking what the current insert
point is, so we can check that the LSN isn't completely insane, seems
reasonable, but at least one option being discussed was to have
pg_basebackup actually *lock the page* (even if just for I/O..) and then
re-read it, and having an external tool doing that instead of the
backend seems like a whole different level to me.  That would involve
having an SQL function for "lock this page against I/O" and then another
for "unlock this page", wouldn't it?

> > FWIW I've initially argued against that, believing that we can address
> > those issues in some other way, and I'd love if that was possible. But
> > considering we're still trying to make that work reliably I think the
> > reasonable conclusion is that Andres was right communicating with the
> > server is necessary.

As part of a backup, you could check against the pages written out into
the WAL as a cross-check and be able to be confident that at least
everything which was backed up had been checked.  That doesn't cover
things like unlogged tables though.

For my part, at least, adding additional checks around the LSN seems
like a good solution (though we can't allow those checks to turn into
false positives...) and would seriously reduce the risk that we have
false negatives (we can *not* eliminate false negatives
entirely..  we could possibly get to a point where at least we don't
have any more false negatives than PG itself has but it looks like an
awful lot of work and ends up adding its own risks...).

As I've said before, I'd certainly support a background worker which
performs ongoing checksum validation of pages and that would be able to
use the same approach as what we do with pg_basebackup, but having an
external tool locking pages seems really unlikely to be reasonable.

Thanks!

Stephen




Re: partitioned tables referenced by FKs

2019-03-29 Thread Jesper Pedersen

Hi,

On 3/29/19 11:22 AM, Alvaro Herrera wrote:

On 2019-Mar-29, Jesper Pedersen wrote:


Could expand a bit on the change to DEPENDENCY_INTERNAL instead of
DEPENDENCY_PARTITION_PRI / DEPENDENCY_PARTITION_SEC ?


The PARTITION dependencies work in a way that doesn't do what we want.
Admittedly, neither does INTERNAL, but at least it's less bad.


If you run "DROP TABLE t2_p32 CASCADE" the foreign key constraint is removed
from all of t1.


Yes.  CASCADE is always a dangerous tool; if you run the DROP partition
without cascade, it explicitly lists that the constraint is going to be
dropped.

If you get in the habit of adding CASCADE to all your drops, you're going
to lose data pretty quickly.  In this case, no data is lost, only a
constraint.



Thanks!

Maybe the "(" / ")" in the CASCADE description should be removed from 
ref/drop_table.sgml as part of this patch.


Should catalogs.sgml be updated for this case?

Best regards,
 Jesper




Re: Online verification of checksums

2019-03-29 Thread Andres Freund
Hi,

On 2019-03-29 11:30:15 -0400, Stephen Frost wrote:
> * Magnus Hagander (mag...@hagander.net) wrote:
> > On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra 
> > wrote:
> > > On Thu, Mar 28, 2019 at 01:11:40PM -0700, Andres Freund wrote:
> > > >Hi,
> > > >
> > > >On 2019-03-28 21:09:22 +0100, Michael Banck wrote:
> > > >> I agree that the current patch might have some corner-cases where it
> > > >> does not guarantee 100% accuracy in online mode, but I hope the current
> > > >> version at least has no more false negatives.
> > > >
> > > >False positives are *bad*. We shouldn't integrate code that has them.
> > > >
> > >
> > > Yeah, I agree. I'm a bit puzzled by the reluctance to make the online mode
> > > communicate with the server, which would presumably address these issues.
> > > Can someone explain why not to do that?
> > 
> > I agree that this effort seems better spent on fixing those issues there
> > (of which many are the same), and then re-use that.
> 
> This really seems like it depends on which of the options we're talking
> about..   Connecting to the server and asking what the current insert
> point is, so we can check that the LSN isn't completely insane, seems
> reasonable, but at least one option being discussed was to have
> pg_basebackup actually *lock the page* (even if just for I/O..) and then
> re-read it, and having an external tool doing that instead of the
> backend seems like a whole different level to me.  That would involve
> having an SQL function for "lock this page against I/O" and then another
> for "unlock this page", wouldn't it?

No, I don't think so. And we obviously couldn't have a SQL level
function hold an LWLock after it has finished, that'd make undetected
deadlocks triggerable by users.  The way I'd imagine that being done is
to just perform the checksum test in the commandline tool, and whenever
there's a checksum failure that could plausibly be a torn read, call a
server side function that re-tests the page after locking it. Which then
would just return the error message in a string.
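That control flow can be sketched in Python (a toy model only: the checksum function and the server-side recheck call are placeholders, not actual PostgreSQL or pg_basebackup interfaces):

```python
import zlib

def verify_relation(pages, read_page, server_recheck):
    """Client-side checksum pass.  A first-read mismatch may just be a
    torn read of a concurrently written page, so the page is only
    reported as corrupt if the hypothetical server-side function --
    which re-tests the page after locking it for I/O -- also fails."""
    failures = []
    for pageno in pages:
        raw, stored_checksum = read_page(pageno)
        if zlib.crc32(raw) == stored_checksum:
            continue  # checksum OK on first read
        error = server_recheck(pageno)  # returns None or an error string
        if error is not None:
            failures.append((pageno, error))
    return failures
```

The key property is that the expensive locked re-read on the server only happens for the (rare) pages that fail the cheap client-side test.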

Greetings,

Andres Freund




Re: table_privileges view always show object owner as a grantor

2019-03-29 Thread Ashutosh Sharma
On Fri, Mar 29, 2019 at 8:15 PM Tom Lane  wrote:
>
> Laurenz Albe  writes:
> > Ashutosh Sharma wrote:
> >> I noticed that irrespective of whoever grants privileges on an object,
> >> it's always the object owner who is seen as a grantor in the output of
> >> table_privileges view.
>
> The above is demonstrably false ...
>

Okay. Seems like that is only true when the grantor of a privilege is a superuser.

> regression=# create user alice;
> CREATE ROLE
> regression=# create user bob;
> CREATE ROLE
> regression=# create user charlie;
> CREATE ROLE
> regression=# \c - alice
> You are now connected to database "regression" as user "alice".
> regression=> create table a1(f int);
> CREATE TABLE
> regression=> grant select on table a1 to bob with grant option;
> GRANT
> regression=> \c - bob
> You are now connected to database "regression" as user "bob".
> regression=> grant select on table a1 to charlie;
> GRANT
> regression=> select * from information_schema.table_privileges where table_name = 'a1';
>  grantor | grantee | table_catalog | table_schema | table_name | privilege_type | is_grantable | with_hierarchy
> ---------+---------+---------------+--------------+------------+----------------+--------------+----------------
>  bob     | charlie | regression    | public       | a1         | SELECT         | NO           | YES
>  alice   | bob     | regression    | public       | a1         | SELECT         | YES          | YES
> (2 rows)
>
> > Currently the grantor of a privilege is the owner if a superuser
> > grants a privilege on the object.
>
> Yes, that is true.
>
> > So I'd say the behavior is fine as it is, but it would not harm to
> > document it better (or at all).
>
> It is documented, see under GRANT:
>

Okay, Thanks for the pointer. I was actually referring to the
documentation on table_privileges view where the description for
grantor column says : "Name of the role that granted the privilege"

> If a superuser chooses to issue a GRANT or REVOKE command, the command
> is performed as though it were issued by the owner of the affected
> object. In particular, privileges granted via such a command will
> appear to have been granted by the object owner. (For role membership,
> the membership appears to have been granted by the containing role
> itself.)
>
> GRANT and REVOKE can also be done by a role that is not the owner of
> the affected object, but is a member of the role that owns the object,
> or is a member of a role that holds privileges WITH GRANT OPTION on
> the object. In this case the privileges will be recorded as having
> been granted by the role that actually owns the object or holds the
> privileges WITH GRANT OPTION. For example, if table t1 is owned by
> role g1, of which role u1 is a member, then u1 can grant privileges on
> t1 to u2, but those privileges will appear to have been granted
> directly by g1. Any other member of role g1 could revoke them later.
>
> If the role executing GRANT holds the required privileges indirectly
> via more than one role membership path, it is unspecified which
> containing role will be recorded as having done the grant. In such
> cases it is best practice to use SET ROLE to become the specific role
> you want to do the GRANT as.
>
> The point about other members of the owning role being able to revoke
> the privileges is why it's done this way.
>
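The superuser case Ashutosh observed can be reproduced by extending Tom's session (an untested sketch; the point is that the grant is recorded as if made by the owner):

```sql
-- As a superuser, grant directly on alice's table:
regression=# grant insert on table a1 to charlie;
GRANT
-- table_privileges will now show alice, the owner, as grantor of
-- charlie's INSERT privilege, not the superuser who ran the GRANT.
```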

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



