Re: PG version recommendation

2019-05-11 Thread Peter J. Holzer
On 2019-05-08 13:41:08 +0900, Ian Barwick wrote:
> On Wed, 8 May 2019 at 07:19, Tim Cross  wrote:
> 
> I would find out if the IT team who will maintain the system are running
> a specific Linux distribution, such as RHEL, and just go with the PG
> version that is on that distribution.
> 
> 
> Bear in mind, depending on the OS, the default version available may be
> chronically outdated and no longer supported by the community. For example
> the CentOS 7 out-of-the-box PostgreSQL version is 9.2.24.

RHEL 8 was just released and contains PostgreSQL 10. So depending on how
long the IT team takes to integrate a new base OS version into their
standard procedures, that might be the best option.

RHEL also has "software collections" (which are new to me, although the
guide is from 2013 - so you can guess how long it has been since I last
administered RHEL systems), which include PostgreSQL 10 for RHEL 7
according to https://www.postgresql.org/download/linux/redhat/.

Of course, if the IT team is willing to use the PostgreSQL yum
repository, then that is even better.
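
For example, a minimal sketch of an install from the PGDG repository on
RHEL/CentOS 7 (an untested outline; it assumes the EL-7 repo RPM URL from the
download page is still current, with PostgreSQL 11 as the example version):

sudo yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo yum install postgresql11-server
sudo /usr/pgsql-11/bin/postgresql-11-setup initdb
sudo systemctl enable --now postgresql-11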

All this of course assumes that the IT team insists on Red Hat, which the
OP hasn't said. If there is a wider variety of distributions to choose
from, my preference would be Debian or Ubuntu (in that order).

hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson




Re: Question about Expected rows value in EXPLAIN output for Nested Loop node

2019-05-11 Thread bb ddd


Thanks, Adrian, but I was looking for something that goes into more depth.
For example, there is one case described there where we have a Nested Loop with
rows=33, while its 2 child nodes each have rows=10.
But first of all this is a very exotic join condition (t1.hundred < t2.hundred),
and second of all I cannot find any explanation of how the number 33 is derived.
They literally spend less than 2 sentences on this case.
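
For what it's worth, 33 is consistent with the planner simply falling back to
its hard-wired default selectivity for inequality join clauses
(DEFAULT_INEQ_SEL = 1/3 in src/include/utils/selfuncs.h):

rows ~= outer_rows * inner_rows * selectivity
      = 10 * 10 * (1/3)
     ~= 33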

In my case the Nested Loop's rows value likewise differs from the product
of its 2 children's rows values, but with a normal join condition on a foreign
key.
My guess (again) is that, in the same way it keeps some statistics (I am very
curious which, exactly) about the probability that 2 random rows, one from each
table, satisfy that condition (t1.hundred < t2.hundred), it also keeps
statistics about the probability that 2 random rows satisfy a regular, normal
join condition like the one I have in my case (basically t1.t2_id=t2.id). In
both cases it would then apply that probability to the product of the rows
values of the 2 child nodes, to calculate the expected rows value of the result
of the Nested Loop. But I am just guessing.
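
For an equality join like t1.t2_id = t2.id, a simplified sketch of the
planner's estimate (the real eqjoinsel() in src/backend/utils/adt/selfuncs.c
also matches most-common-value lists and accounts for NULLs; t1 and t2 are the
tables from the message above):

# selectivity ~= 1 / max(n_distinct(t1.t2_id), n_distinct(t2.id))
# rows ~= rows(t1) * rows(t2) * selectivity
# the per-column n_distinct estimates are visible in pg_stats:
psql -c "SELECT tablename, attname, n_distinct
         FROM pg_stats WHERE tablename IN ('t1', 't2');"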

 > Original message 
 > From: Adrian Klaver adrian.kla...@aklaver.com
 > Subject: Re: Question about Expected rows value in EXPLAIN output for Nested Loop node
 > To: bb ddd
 > Sent: 10.05.2019 19:02

> On 5/10/19 8:56 AM, bb ddd wrote:
> > Thanks for the reply. Here it is: https://explain.depesz.com/s/LQCS
> >
> > My main trouble is not with this exact case, but I am looking for a general
> > description of the algorithm for how those numbers are calculated. In the
> > meanwhile I also constructed a couple of artificial tables to experiment
> > with and see how these numbers change, and am pretty sure my guesses there
> > in the original question are quite close to what is going on, but it would
> > be so much easier to read explain plans if there was some documentation
> > about how the expected rows are calculated for different nodes, in
> > different contexts (like when loops=1 vs loops>1), maybe what statistics
> > they use, how they depend on the numbers of their child nodes, etc.
>
> How about?:
>
> https://www.postgresql.org/docs/11/using-explain.html
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com




Re: Question about Expected rows value in EXPLAIN output for Nested Loop node

2019-05-11 Thread Adrian Klaver

On 5/11/19 2:33 AM, bb ddd wrote:
> Thanks, Adrian, but I was looking for something that goes into more depth.
> For example, there is one case described there where we have a Nested Loop
> with rows=33, while its 2 child nodes each have rows=10.
> But first of all this is a very exotic join condition (t1.hundred <
> t2.hundred), and second of all I cannot find any explanation of how the
> number 33 is derived. They literally spend less than 2 sentences on this
> case.
>
> In my case the Nested Loop's rows value likewise differs from the product
> of its 2 children's rows values, but with a normal join condition on a
> foreign key.
> My guess (again) is that, in the same way it keeps some statistics (I am
> very curious which, exactly) about the probability that 2 random rows, one
> from each table, satisfy that condition

https://www.postgresql.org/docs/11/planner-stats-details.html

> (t1.hundred < t2.hundred), it also keeps statistics about the probability
> that 2 random rows satisfy a regular, normal join condition like the one I
> have in my case (basically t1.t2_id=t2.id). In both cases it would then
> apply that probability to the product of the rows values of the 2 child
> nodes, to calculate the expected rows value of the result of the Nested
> Loop. But I am just guessing.

--
Adrian Klaver
adrian.kla...@aklaver.com




Re: perl path issue

2019-05-11 Thread Adrian Klaver

On 5/10/19 6:49 PM, Prakash Ramakrishnan wrote:

Hi Adrian,


If I am following the below correctly you have EDB Postgres and 
pgBackRest running together on the prod server, correct?


If so what was done different/is different on the dev server?



1) How was the production Postgres installed(EDB, native packages) and
on what OS?

Prod was installed from the EDB standard edition .run file, and the OS is CentOS 7.5.

2) The dev Postgres is the one installed using EDB installer, correct?

Yes.

3) Define working fine. In other words were you/are you using a backup
method other than pgBackRest on the production and dev server?

Please find the prod config file and backup details:

==> cat /etc/pgbackrest.conf
[global]
repo1-path=/pgBACKUP/A4_sydcosafpp001
retention-diff=3
retention-full=1
retention-archive=2
start-fast=y
process-max=12
archive-async=y


[global:archive-push]
process-max=4

[A4_sydcosafpp001]
pg1-path=/Postgres/pgDATA/data
postg...@sydcosafpp001.enterprisenet.org:/home/postgres
==> cd /pgBACKUP/A4_sydcosafpp001/
archive/ backup/
postg...@sydcosafpp001.enterprisenet.org:/home/postgres
==> cd /pgBACKUP/A4_sydcosafpp001/backup/A4_sydcosafpp001/
postg...@sydcosafpp001.enterprisenet.org:/pgBACKUP/A4_sydcosafpp001/backup/A4_sydcosafpp001
==> ls -lrth
total 114K
drwxr-x---. 4 postgres postgres   44 Jan  7 01:08 backup.history
drwxr-x---. 3 postgres postgres   96 May  4 21:30 20190504-190001F
drwxr-x---. 3 postgres postgres   96 May  5 19:41 20190504-190001F_20190505-190002I
drwxr-x---. 3 postgres postgres   96 May  6 20:28 20190504-190001F_20190506-190002I
drwxr-x---. 3 postgres postgres   96 May  7 20:26 20190504-190001F_20190507-190004I
drwxr-x---. 3 postgres postgres   96 May  8 20:28 20190504-190001F_20190508-190002I
drwxr-x---. 3 postgres postgres   96 May  9 20:32 20190504-190001F_20190509-190001I
drwxr-x---. 3 postgres postgres   96 May 10 20:32 20190504-190001F_20190510-190001I
lrwxrwxrwx. 1 postgres postgres   33 May 10 20:32 latest -> 20190504-190001F_20190510-190001I
-rw-r-----. 1 postgres postgres 5.7K May 10 20:32 backup.info
-rw-r-----. 1 postgres postgres 5.7K May 10 20:32 backup.info.copy



4) When you say single database are you talking about a particular
database in the Postgres cluster or the entire cluster?

We need to restore a single database from a single cluster.

5) Why is pgBackRest essential to this? In other words why not use the
Postgres pg_dump/pg_restore tools?

The pgBackRest tool is needed to back up and restore larger databases.
If we use pg_dump and pg_restore they won't be accepted, as they take
too much time even with the parallel option, so we use pgBackRest.
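
For the single-database restore itself, pgBackRest's selective restore may be
what is needed; an untested sketch (the stanza name is taken from the config
above, "mydb" is a placeholder, and the other databases in the cluster come
back as unusable sparse copies):

pgbackrest --stanza=A4_sydcosafpp001 --db-include=mydb --delta restore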



On Fri, May 10, 2019 at 9:44 PM Adrian Klaver wrote:


On 5/10/19 7:50 AM, Prakash Ramakrishnan wrote:
 > Hi Adrian,
 >
 > Production backup working fine now and previously dev server also
 > working fine.
 >
 > Now the business team wants to restore the single database using
 > pgbackrest tool so we have setup the pgbackrest.conf file is like to
 > point prod .


The above is not much to go on. So a list of questions below. Please
answer them all, thanks:

1) How was the production Postgres installed(EDB, native packages) and
on what OS?

2) The dev Postgres is the one installed using EDB installer, correct?

3) Define working fine. In other words were you/are you using a backup
method other than pgBackRest on the production and dev server?

4) When you say single database are you talking about a particular
database in the Postgres cluster or the entire cluster?

5) Why is pgBackRest essential to this? In other words why not use the
Postgres pg_dump/pg_restore tools?


As to the below, this is more than a Perl issue; it is a package
incompatibility issue.


 >
 > and directly restore the backup from prod; that is where we hit the
 > Perl issue and were not able to proceed.
 >
 > The database size is huge, so they are putting pressure on us. Below is
 > the config file:
 >
 > ==> cat /etc/pgbackrest.conf
 > #[global]
 > #repo1-path=/Postgres/pgBACKUP/A4_sydcosausd001
 > #retention-diff=4
 > #retention-full=4
 > #retention-archive=2
 > #start-fast=y
 > #process-max=4
 > #archive-async=y
 >
 >
 > #[global:archive-push]
 > #process-max=4
 >
 > #[A4_sydcosausd001]
 > #db-path=/Postgres/pgDATA/data
 >
 >
 > [global]
 > repo1-host=sydcosafpp001.enterprisenet.org
 > repo1-host-user=postgres
 > repo1-host-config=/etc/pgbackrest.conf
 > repo1-path=/pgBACKUP/A4_sydcosafpp001/backup/A4_sydcosafpp001
 >
 > [A4_sydcosafpp001]
 > pg1-path=/Postgres/prakash_pgbackrest
 > postg...@sydcosausd001.enterprisenet.org:/home/postgres
 >
 >
 >
 > On Fri, May 1

Re: Question about Expected rows value in EXPLAIN output for Nested Loop node

2019-05-11 Thread Tom Lane
bb ddd  writes:
> Thanks, Adrian, but i was looking for something that goes into more depth.

Use the source, Luke.

git clone git://git.postgresql.org/git/postgresql.git

Likely places to look for this purpose include

src/backend/optimizer/README
src/backend/optimizer/path/costsize.c
src/backend/optimizer/path/clausesel.c
src/backend/utils/adt/selfuncs.c

The short answer to your question is that no, the size of the join
relation is not estimated by multiplying the sizes of input paths
for some particular nestloop plan.  The join size estimate is made
first and the same size is applied to all paths for the join.

In the case of a nestloop where the entire join condition was pushed
into the RHS path, you'd ideally expect that the LHS size times the
estimated size of the filtered RHS path matches the previously-made
join size estimate ... but those are different code paths using
different estimators, and since this is all very approximate and
full of heuristic guesses, sometimes they don't match very well.

regards, tom lane




Re: Hot Standby Conflict on pg_attribute

2019-05-11 Thread Jeremy Schneider
Just a quick footnote: If autovac truncations are frequently causing replica 
lag, and if this is a problem for you, IIUC one way you can stop autovac from 
doing the truncations even on older versions is setting old_snapshot_threshold 
to any value at all besides zero.  (On 12+ you can directly control the 
truncation behavior.)
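
An untested sketch of both knobs ("some_table" is a placeholder;
old_snapshot_threshold can only be set at server start, and vacuum_truncate is
a per-table storage option on 12 and later):

# pre-12: enabling old_snapshot_threshold also stops autovacuum truncation
psql -c "ALTER SYSTEM SET old_snapshot_threshold = '1h';"
# ...then restart the server
# 12+: control truncation directly, per table
psql -c "ALTER TABLE some_table SET (vacuum_truncate = off);"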

-Jeremy

Sent from my TI-83

> On May 10, 2019, at 12:46, Erik Jones  wrote:
> 
> On Fri, May 10, 2019 at 12:41 PM Tom Lane  wrote:
> 
>> Andres Freund  writes:
>> > On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
>> >> The question then is: Why would these user queries be waiting on an
>> >> AccessShare lock on pg_attribute?
>> 
>> > Queries that access a table for the *first* time after DDL happened
>> > (including truncating the relation), need an AccessShareLock on
>> > pg_attribute (and pg_class, pg_index, ...) for a short time.
>> 
>> Also, it seems likely that what's really triggering the issue is
>> autovacuum on pg_attribute trying to truncate off empty pages
>> in pg_attribute (after a bunch of dead rows were generated there
>> by DDL activity).  That requires exclusive lock on pg_attribute,
>> which would propagate down to the standby.
>> 
>> regards, tom lane
> 
> Right, that part I understood after checking out pg_attribute's insert/delete 
> counts in pg_stat_sys_tables before and after some REFRESH MATERIALIZED VIEW 
> runs on an otherwise idle server.  With them running 2k+ refreshes per day 
> autovac is regularly working on their catalog tables.
> 
> Thanks!
> -- 
> Erik Jones
> mag...@gmail.com


Re: Optimizing Database High CPU

2019-05-11 Thread Scottix
Hey,
So I finally found the culprit. It turns out to be THP fighting with itself.

After running the following on Ubuntu:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

It instantly went from a load average of 30 to 3.

Also make sure these settings are re-applied on reboot, as they do not persist.
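
The echoes above do not survive a reboot; one way to persist them is a small
systemd unit (an untested sketch; passing transparent_hugepage=never on the
kernel command line via GRUB is an alternative):

cat > /etc/systemd/system/disable-thp.service <<'EOF'
[Unit]
Description=Disable transparent huge pages
After=sysinit.target local-fs.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled; echo never > /sys/kernel/mm/transparent_hugepage/defrag'

[Install]
WantedBy=basic.target
EOF
systemctl daemon-reload
systemctl enable disable-thp.service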

Anyway, just wanted to give a follow-up on the issue in case anyone else is
having the same problem.

On Mon, Mar 4, 2019 at 12:03 PM Jeff Janes  wrote:

> On Wed, Feb 27, 2019 at 5:01 PM Michael Lewis  wrote:
>
>> If those 50-100 connections are all active at once, yes, that is high.
>>> They can easily spend more time fighting each other over LWLocks,
>>> spinlocks, or cachelines rather than doing useful work.  This can be
>>> exacerbated when you have multiple sockets rather than all cores in a
>>> single socket.  And these problems are likely to present as high Sys times.
>>>
>>> Perhaps you can put up a connection pooler which will allow 100
>>> connections to all think they are connected at once, but forces only 12 or
>>> so to actually be active at one time, making the others transparently queue.
>>>
>>
>> Can you expound on this or refer me to someplace to read up on this?
>>
>
> Just based on my own experimentation.  This is not a blanket
> recommendation,  but specific to the situation that we already suspect
> there is contention, and the server is too old to have 
> pg_stat_activity.wait_event
> column.
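
A quick aside: on 9.6 and later, where the column exists, the dominant wait
events can be sampled directly, e.g.:

psql -c "SELECT wait_event_type, wait_event, count(*)
         FROM pg_stat_activity
         WHERE state = 'active'
         GROUP BY 1, 2 ORDER BY 3 DESC;"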
>
>
>> Context, I don't want to thread jack though: I think I am seeing similar
>> behavior in our environment at times with queries that normally take
>> seconds taking 5+ minutes at times of high load. I see many queries showing
>> buffer_mapping as the LwLock type in snapshots but don't know if that may
>> be expected.
>>
>
> It sounds like your processes are fighting to reserve buffers in
> shared_buffers in which to read data pages.  But those data pages are
> probably already in the OS page cache, otherwise reading it from disk would
> be slow enough that you would be seeing some type of IO wait, or buffer_io,
> rather than buffer_mapping as the dominant wait type.  So I think that
> means you have most of your data in RAM, but not enough of it in
> shared_buffers.  You might be in a rare situation where setting
> shared_buffers to a high fraction of RAM, rather than the usual low
> fraction, is called for.  Increasing NUM_BUFFER_PARTITIONS might also be
> useful, but that requires a recompilation of the server.  But do these
> spikes correlate with anything known at the application level?  A change in
> the mix of queries, or a long report or maintenance operation?  Maybe the
> query plans briefly toggle over to using seq scans rather than index scans
> or vice versa, which drastically changes the block access patterns?
>
>
>> In our environment PgBouncer will accept several hundred connections and
>> allow up to 100 at a time to be active on the database which are VMs with
>> ~16 CPUs allocated (some more, some less, multi-tenant and manually
>> sharded). It sounds like you are advocating for connection max very close
>> to the number of cores. I'd like to better understand the pros/cons of that
>> decision.
>>
>
> There are good reasons to allow more than that.  For example, your
> application holds some transactions open briefly while it does some
> cogitation on the application-side, rather than immediately committing and
> so returning the connection to the connection pool.  Or your server has a
> very high IO capacity and benefits from lots of read requests in the queue
> at the same time, so it can keep every spindle busy and every rotation
> productive.  But, if you have no reason to believe that any of those
> situations apply to you, but do have evidence that you have lock contention
> between processes, then I think that limiting the number active processes
> to the number of cores is a good starting point.
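
For reference, a minimal pgbouncer.ini sketch of the setup described above
(illustrative names and values, not taken from the thread):

cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
; clients that may connect to the pooler
max_client_conn = 100
; server connections actually active at once
default_pool_size = 12
EOF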
>
> Cheers,
>
> Jeff
>


-- 
T: @Thaumion
IG: Thaumion
scot...@gmail.com


Linked data from upgrade after VACUUM FULL not deleted.

2019-05-11 Thread Josef Šimánek
Hello.

A few months ago we did a successful pg_upgrade --link from 9.6 to 10. I did a
VACUUM FULL of all databases this weekend, and the data in the 9.6 directory
was not released.

I have a tablespace with only one database.

The PostgreSQL 10 folder has a size similar to the actual database size (by
https://wiki.postgresql.org/wiki/Disk_Usage it is 660GB):
681G    ./postgres_raid_tablespace/PG_10_201707211

The PostgreSQL 9.6 folder still has a lot of data in it:
547G    ./postgres_raid_tablespace/PG_9.6_201608131

I did a really naive check for whether any links are still present:

ls -la . | grep "\->"

And nothing was found.

I have also checked which files are waiting to be deleted, via lsof -u
postgres, but nothing was found in those directories.

I'm wondering if I understand this correctly: is this the expected state,
where the old folder is not touched anymore, and is that the reason why
VACUUM FULL is not releasing any space in there?

Is there any way to check whether anything in that folder is really no longer
used, so it can be considered safe to delete?

I'll appreciate any suggestions.

Josef


Re: Hot Standby Conflict on pg_attribute

2019-05-11 Thread Erik Jones

On Sat, May 11, 2019 at 9:15 AM Jeremy Schneider 
wrote:

> Just a quick footnote: If autovac truncations are frequently causing
> replica lag, and if this is a problem for you, IIUC one way you can stop
> autovac from doing the truncations even on older versions is setting
> old_snapshot_threshold to any value at all besides zero.  (On 12+ you can
> directly control the truncation behavior.)
>
> -Jeremy
>

Thanks for the tip!

-- 
Erik Jones
mag...@gmail.com


Re: Linked data from upgrade after VACUUM FULL not deleted.

2019-05-11 Thread Bruce Momjian
On Sat, May 11, 2019 at 07:04:32PM +0200, Josef Šimánek wrote:
> Hello.
> 
> A few months ago we did a successful pg_upgrade --link from 9.6 to 10. I did
> a VACUUM FULL of all databases this weekend, and the data in the 9.6
> directory was not released.
> 
> I have a tablespace with only one database.
> 
> The PostgreSQL 10 folder has a size similar to the actual database size (by
> https://wiki.postgresql.org/wiki/Disk_Usage it is 660GB):
> 681G    ./postgres_raid_tablespace/PG_10_201707211
> 
> The PostgreSQL 9.6 folder still has a lot of data in it:
> 547G    ./postgres_raid_tablespace/PG_9.6_201608131
> 
> I did a really naive check for whether any links are still present:
> 
> ls -la . | grep "\->"

They are _hard_ links and do not show up as ->, but as the _link_ count,
which is the second column of ls -la.
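
A quick way to confirm whether anything under the old directory still shares
an inode with a live file (an untested sketch; a link count greater than 1
means another hard link to the same file exists somewhere):

find ./postgres_raid_tablespace/PG_9.6_201608131 -type f -links +1 | head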

> And nothing was found.
> 
> I have also checked which files are waiting to be deleted, via lsof -u
> postgres, but nothing was found in those directories.
> 
> I'm wondering if I understand this correctly: is this the expected state,
> where the old folder is not touched anymore, and is that the reason why
> VACUUM FULL is not releasing any space in there?
> 
> Is there any way to check whether anything in that folder is really no
> longer used, so it can be considered safe to delete?
> 
> I'll appreciate any suggestions.

There is an output line of pg_upgrade which says:

Running this script will delete the old cluster's data files:

In summary, you can delete the old data directory.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +




Re: Query on pg_stat_activity table got stuck

2019-05-11 Thread Tom Lane
I wrote:
> This patch is against HEAD --- I've not looked at how much adjustment
> it'll need for the back branches, but I'm sure there's some.

I've back-patched this now.  If you want to test the patch, the v10-branch
version is at

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c3d113136bbaa86fbf1edde7aaf70ba06a6166b7

regards, tom lane