Clarification related to BDR

2020-05-14 Thread Santhosh Kumar
Hi,
 I came across a link published on postgresql.org, where BDR is clearly
described as open source. When I tried to install BDR for CentOS from
2ndQuadrant, the yum repository was not reachable, and upon further
enquiry with 2ndQuadrant I got the following reply from them:

"BDR is not open source. We do not have plans to open source this."

 Can you please help me understand why the following news was published on
"postgresql" with an encouraging message acknowledging BDR as open
source? We invested time and effort into using BDR, only to understand at a
later point in time that it is not. Kindly clarify if I am missing
anything.

https://www.postgresql.org/about/news/1689/



Regards,
KRS


Check what has been done for an uncommitted prepared transaction

2020-05-14 Thread Andy Fan
Hi:

I want to see, with pg_waldump, what has been done by an uncommitted
prepared transaction; however, I can't find it.

demo=# begin;
BEGIN
demo=*# select txid_current();
 txid_current
--------------
          608
(1 row)

demo=*# prepare transaction 's';
PREPARE TRANSACTION
demo=# insert into mm select generate_series(1, 1000);
INSERT 0 1000
demo=#

===


pg_wal> ~/postgres/bin/pg_waldump 0001008F | grep 'tx:        608'
rmgr: Transaction len (rec/tot):    138/   138, tx:        608, lsn: 0/8F68C020, prev 0/8F68BFD0, desc: PREPARE gid s: 2020-05-14 15:00:33.212997 CST

I can get the log for the "prepare" command only, but nothing was found for
the insert statement.  What should I do?

My version is 9.4.

Thanks


Re: Practical usage of large objects.

2020-05-14 Thread Andreas Joseph Krogh

On Wednesday 13 May 2020 at 19:53:38, Dmitry Igrishin <dmit...@gmail.com> wrote:
Hello all,

 As you know, PostgreSQL has a large objects facility [1]. I'm curious
 whether there are real systems which use this feature. I'm asking because
 I'm in doubt whether the Pgfe driver [2] should provide a convenient API
 for working with large objects or not.

 Thanks! 

Yea, we use LOs, because bytea over JDBC really doesn't stream (at least
using the pgjdbc-ng driver). When retrieving bytea using JDBC it returns an
InputStream, but it's backed by an in-memory byte[]. With LOs and java.sql.Blob
(which the standard pgjdbc driver doesn't support, but pgjdbc-ng does) it
actually uses streams and memory usage is kept to a minimum.


--
 Andreas Joseph Krogh

PG12.2 Configure cannot enable SSL

2020-05-14 Thread Gavan Schneider


Greetings:

This problem may be related to several threads I have found, e.g.,
https://www.postgresql.org/message-id/29069.1575298784%40sss.pgh.pa.us

Context:
Building PG12.2 and need to enable SSL
Previous version builds have been uneventful but I haven’t attempted 
to enable SSL before


Sources:
PG — standard source distribution 12.2
OpenSSL — Homebrew
macOS Catalina v 10.15.4

Checks:
configure finishes normally if I do not attempt to enable SSL
the header files are at the end of the --with-includes path
they are readable
	searches for this problem seem to relate to older versions and are no
longer relevant, in that I seem to have the include files properly
identified


Problem:
	I really don’t know where to go with this except to hope someone can 
see my =obvious= error

All help appreciated.



Relevant terminal:
-bash-3.2$ cd postgresql-12.2
-bash-3.2$ ls -las
total 1488
   0 drwxrwxrwx@ 18 pendari   staff     576 11 Feb 09:29 .
   0 drwxr-xr-x+  8 postgres  wheel     256 14 May 16:17 ..
   8 -rw-r--r--@  1 pendari   staff     730 11 Feb 09:14 .dir-locals.el
   8 -rw-r--r--@  1 pendari   staff    1547 11 Feb 09:14 .gitattributes
   8 -rw-r--r--@  1 pendari   staff     504 11 Feb 09:14 .gitignore
   8 -rw-r--r--@  1 pendari   staff    1192 11 Feb 09:14 COPYRIGHT
   8 -rw-r--r--@  1 pendari   staff    3909 11 Feb 09:14 GNUmakefile.in
   8 -rw-r--r--@  1 pendari   staff     284 11 Feb 09:14 HISTORY
 128 -rw-r--r--@  1 pendari   staff   61606 11 Feb 09:29 INSTALL
   8 -rw-r--r--@  1 pendari   staff    1665 11 Feb 09:14 Makefile
   8 -rw-r--r--@  1 pendari   staff    1212 11 Feb 09:14 README
   8 -rw-r--r--@  1 pendari   staff     522 11 Feb 09:14 aclocal.m4
   0 drwxrwxrwx@ 23 pendari   staff     736 11 Feb 09:28 config
1120 -rwxr-xr-x@  1 pendari   staff  572134 11 Feb 09:14 configure
 168 -rw-r--r--@  1 pendari   staff   83465 11 Feb 09:14 configure.in
   0 drwxrwxrwx@ 59 pendari   staff    1888 11 Feb 09:28 contrib
   0 drwxrwxrwx@  7 pendari   staff     224 11 Feb 09:28 doc
   0 drwxrwxrwx@ 22 pendari   staff     704 11 Feb 09:29 src
-bash-3.2$
-bash-3.2$ ./configure --with-openssl \
> --with-includes=/usr/local/opt/openssl/include/openssl \
> --with-libraries=/usr/local/opt/openssl/lib
checking build system type... x86_64-apple-darwin19.4.0
checking host system type... x86_64-apple-darwin19.4.0
...
...
...
checking zlib.h usability... yes
checking zlib.h presence... yes
checking for zlib.h... yes
checking openssl/ssl.h usability... no
checking openssl/ssl.h presence... no
checking for openssl/ssl.h... no
configure: error: header file <openssl/ssl.h> is required for OpenSSL
-bash-3.2$

-bash-3.2$ /usr/local/Cellar/openssl@1.1/1.1.1g/bin/openssl version
OpenSSL 1.1.1g  21 Apr 2020
-bash-3.2$



Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong. The ancients, in the case at bar, laid the blame upon the gods: 
sometimes they were remote and surly, and sometimes they were kind. In 
the Middle Ages lesser power took a hand in the matter, and so one reads 
of works of art inspired by Our Lady, by the Blessed Saints, by the 
souls of the departed, and even by the devil.  H. L. Mencken, 1920


Re: Check what has been done for an uncommitted prepared transaction

2020-05-14 Thread Andy Fan
On Thu, May 14, 2020 at 3:38 PM Andy Fan  wrote:

> I can get the log for "prepared command" only,  but nothing was found for
> the insert
> statement.   what should I do?
>
> My version is 9.4.
>
Sorry, my production version is 9.4 and my demo above is v12.  I tried in
9.4, and I still have trouble getting the logs.

Thanks


Re: Check what has been done for an uncommitted prepared transaction

2020-05-14 Thread Michael Paquier
On Thu, May 14, 2020 at 03:38:24PM +0800, Andy Fan wrote:
> I want to know what happens been done for an uncommitted prepared
> transaction with pg_waldump, however I can't find it.
> 
> demo=*# prepare transaction 's';
> PREPARE TRANSACTION
> demo=# insert into mm select generate_series(1, 1000);
> INSERT 0 1000
>
> I can get the log for "prepared command" only,  but nothing was found for
> the insert statement.  what should I do?

Because in your previous sequence you inserted the data after
preparing the transaction and they are part of a completely different
transaction, no?
--
Michael
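
To illustrate the point: for the INSERT's WAL records to carry the prepared
transaction's XID, the INSERT has to happen before PREPARE TRANSACTION. An
untested sketch, reusing the table mm from the session above:

```sql
BEGIN;
SELECT txid_current();                            -- note the XID, e.g. 608
INSERT INTO mm SELECT generate_series(1, 1000);   -- now part of that XID
PREPARE TRANSACTION 's';                          -- PREPARE shares the XID
-- later, from any session:
-- COMMIT PREPARED 's';   or   ROLLBACK PREPARED 's';
```

With this ordering, pg_waldump should show heap INSERT records carrying the
same tx number as the PREPARE record.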




Re: Clarification related to BDR

2020-05-14 Thread Magnus Hagander
On Thu, May 14, 2020 at 9:01 AM Santhosh Kumar 
wrote:

> Hi,
>  I came across a link published in postgresql, where it is clearly
> mentioned BDR as an open source. When I tried to install BDR for CentOS
> from 2ndQuadrant, the yum repository was not reachable and upon further
> enquiring with 2ndQuadrant, I got a reply from them quoting as follows
>
> "BDR is not open source. We do not have plans to open source this."
>
>  Can you please help me understand, why the following news is published in
> "postgresql" with an encouraging message acknowledging BDR as an open
> source? We invested time and effort to use BDR only to understand at a
> later point in time, that it is not. Kindly clarify, if I am missing
> anything.
>
> https://www.postgresql.org/about/news/1689/
>
>
>
>
This news is from 2016. At that time, BDR was open source, but it has since
been closed.

//Magnus


Re: Clarification related to BDR

2020-05-14 Thread Andreas Kretschmer




On 14.05.20 at 06:37, Santhosh Kumar wrote:
Can you please help me understand, why the following news is published 
in "postgresql" with an encouraging message acknowledging BDR as an 
open source? We invested time and effort to use BDR only to understand 
at a later point in time, that it is not. Kindly clarify, if I am 
missing anything. 


BDR version 1 was open source; versions 2 and 3 are not. Version 1
(a patched 9.4) and version 2 (on community PG 9.6) are no longer under
support; the stable and supported version is 3 (PG 10 and 11, 12 soon).
You need a usage license, which is bundled with a Diamond support
subscription.



Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com





Re: Check what has been done for an uncommitted prepared transaction

2020-05-14 Thread Andy Fan
On Thu, May 14, 2020 at 4:05 PM Michael Paquier  wrote:

> On Thu, May 14, 2020 at 03:38:24PM +0800, Andy Fan wrote:
> > I want to know what happens been done for an uncommitted prepared
> > transaction with pg_waldump, however I can't find it.
> >
> > demo=*# prepare transaction 's';
> > PREPARE TRANSACTION
> > demo=# insert into mm select generate_series(1, 1000);
> > INSERT 0 1000
> >
> > I can get the log for "prepared command" only,  but nothing was found for
> > the insert statement.  what should I do?
>
> Because in your previous sequence you inserted the data after
> preparing the transaction and they are part of a completely different
> transaction, no?
>

Thanks.  Actually I don't know how to use prepared transactions or how they
work.  I care about this because a long-lived prepared transaction exists at
one of our customers, and we want to know what this transaction has done
(e.g. any data it changed).  All I know of is the data that comes from
pg_prepared_xacts, but it doesn't help a lot.

Best Regards
Andy Fan


Re: pg_upgrade too slow on vacuum phase

2020-05-14 Thread Kouber Saparev
>
> So the analyze(and freeze) are done before the new cluster are fully
> populated. Is the time being taken maybe for the loading schema/data
> portion?
>
>
No, the upgrade is indeed stuck on these 2 stages, maybe 50% on the first
and 50% (or a little more) on the second. When I run them outside of the
pg_upgrade procedure they are still slow, so pg_upgrade is not to blame. I
guess I will need to repack the biggest tables first.


Re: Clarification related to BDR

2020-05-14 Thread Simon Riggs
On Thu, 14 May 2020 at 08:01, Santhosh Kumar  wrote:

> Hi,
>  I came across a link published in postgresql, where it is clearly
> mentioned BDR as an open source. When I tried to install BDR for CentOS
> from 2ndQuadrant, the yum repository was not reachable and upon further
> enquiring with 2ndQuadrant, I got a reply from them quoting as follows
>
> "BDR is not open source. We do not have plans to open source this."
>
>  Can you please help me understand, why the following news is published in
> "postgresql" with an encouraging message acknowledging BDR as an open
> source? We invested time and effort to use BDR only to understand at a
> later point in time, that it is not. Kindly clarify, if I am missing
> anything.
>
> https://www.postgresql.org/about/news/1689/
>
>

Santhosh,

2ndQuadrant has invested time and effort into the BDR project for the last
8 years and continues to do so.

BDR1 is open source and it continues to be available on 2ndQuadrant's
GitHub: https://github.com/2ndQuadrant/bdr. This version, however, runs on
PostgreSQL 9.4 which has now reached end-of-life. 2ndQuadrant, the
developers of BDR, don't recommend using this version and also no longer
support it, nor do we provide binaries.

The recommended version is BDR3, which has a new architecture and many new
features. BDR3 has been developed under a different and more viable
economic model which allows us to provide rapid response and hot fixes for
high availability and security issues to users, as well as rapid
development of new features. Many companies are now adopting this and new
users are welcome.

We remain committed to the active contribution of major new features and
timely bug fixes to open source PostgreSQL. We will continue to contribute
features from BDR to open source PostgreSQL over time, subject to community
acceptance.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

Mission Critical Databases


Re: Reuse an existing slot with a new initdb

2020-05-14 Thread Support


On 5/13/2020 9:28 PM, David G. Johnston wrote:
On Wednesday, May 13, 2020, Michael Paquier wrote:


On Wed, May 13, 2020 at 02:12:45PM -0700, live-school support wrote:
> I didn't recall that it was not possible to create a hot standby with a
> fresh new install and pg_dumpall :(.
>
> only pg_basebackup or an exact copy of the data folder can do it right?
> is the reason technical or else?

When using physical replication, both the primary and standby need to
have the same system ID, and both instances need to share the same
architectures to work properly as data is physically replayed from one
cluster to the other using WAL, which includes for example copies of
on disk relation 8kB pages (ever heard of full_page_writes?).


This basically hits the nail on the head.  My reading is that the OP 
has two abstractly identical restored databases, one created from a 
physical copy and the other from a logical copy. The issue is why the 
original server cannot use the same replication slot name to continue 
synchronizing with the logically restored one but is able to continue 
with the physically restored one.  The above is why.


The OP asks whether the technical identifier error encountered can be 
overcome.  It cannot but even if it could the attempt would still end 
up failed due to fundamental differences in the physical data layouts 
between physical and logical restoration.  If the OP needs to rebuild 
a physical replication hot standby database they must use a physical 
backup of the original database as a starting point.  To use a 
logically restored database target would require logical replication.


David J.


Thanks Michael and David for your answers.
I think David caught it; the question is why, as long as we have an
exact copy of the master (from pg_dumpall), we cannot start a freshly
initdb'd hot standby with an already existing physical replication slot
without the master complaining that the "identifier doesn't match up",
knowing that everything seems to be synchronized?
Sorry Michael for not showing you more logs; I made these tests weeks ago
and cannot restart them for now, too busy on other jobs.


David


Re: Clarification related to BDR

2020-05-14 Thread Ravi Krishna




On 5/14/20 12:37 AM, Santhosh Kumar wrote:

Can you please help me understand, why the following news is published 
in "postgresql" with an encouraging message acknowledging BDR as an open 
source? 


In my opinion it is not a bright idea to run any product without support.
Support is an indemnity against the blame game. If you face a catastrophic
issue with the product and have no support team/company, the blame will
fall on you as the person associated with the product.




Re: Column reset all values

2020-05-14 Thread otar shavadze
Thanks a lot. Dropping and re-creating the views is not an option, because
there are a lot of views (and materialized views).
Dropping the index is not an option either, because I would need to
re-create it: I use this table in a procedure, so the index is necessary
for further queries, and the total runtime would not decrease.

Thanks  Olivier,  I will test out with second option, you mentioned.


On Wed, May 13, 2020 at 1:15 PM Olivier Gautherot 
wrote:

> Hi Otar,
>
> On Wed, May 13, 2020 at 10:15 AM otar shavadze 
> wrote:
>
>> postgres version 12
>> I have very simple update query, like this:
>>
>> update my_table
>> set
>> col = NULL
>> where
>> col IS NOT NULL;
>>
>> my_table contains few million rows, col is indexed column
>>
>> Fastest way would be   alter table, drop column and then add column
>> again, but I can't do this just because on this column   depends bunch of
>> views and materialized views.
>>
>> No much hope, but still asking, Is there some another way to just reset
>> column all values? ( with NULL in my case)
>>
>
> If views depend on this column, you may need to drop them (in the right
> order...) and then recreate them. Now, if they depend on a column that will
> not contain significant data, you may wish to remove the column, or declare
> it as null if you need to maintain compatibility.
>
> Now, if you have time and down time of the database is an issue, you may
> run the UPDATE in lots of 1000 rows (or whatever number fits you).
> UPDATE is typically an INSERT/DELETE/VACUUM sequence and this copying around
> is the killer - doing it in one go can temporarily increase the disk usage.
> I've had success with the following pseudo code:
>
> SELECT rowid FROM mytable WHERE col IS NOT NULL
>
> and fed the result to something like:
>
> FOR chunk IN chunk_in_1000_rows(query_result)
> DO
> BEGIN
> UPDATE my_table SET col = NULL WHERE rowid IN chunk
> COMMIT
> SLEEP(5)
> DONE
>
> You may wish to run a VACUUM FULL manually at the end.
>
> In my case, I had to compute individual numbers so the processing was a
> bit more complex but it happily processed over 60 million rows in a few
> days.
>
> Hope it helps
> --
> Olivier Gautherot
>
>
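
For instance, the pseudo code above could be written as a PL/pgSQL DO block
that nulls the column in batches. This is an untested sketch: COMMIT inside
DO requires PostgreSQL 11 or later and that the block is not run inside an
explicit transaction.

```sql
DO $$
DECLARE
    n bigint;
BEGIN
    LOOP
        -- Null out the column for up to 1000 rows at a time
        UPDATE my_table
        SET col = NULL
        WHERE ctid IN (SELECT ctid FROM my_table
                       WHERE col IS NOT NULL
                       LIMIT 1000);
        GET DIAGNOSTICS n = ROW_COUNT;
        EXIT WHEN n = 0;
        COMMIT;   -- release row locks and let VACUUM reclaim space per batch
    END LOOP;
END
$$;
```

The per-batch COMMIT keeps the transaction short and lets autovacuum work on
already-processed pages while the loop continues.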


Re: PG12.2 Configure cannot enable SSL

2020-05-14 Thread Tom Lane
"Gavan Schneider"  writes:
>   -bash-3.2$ ./configure --with-openssl \
>   > --with-includes=/usr/local/opt/openssl/include/openssl \
>   > --with-libraries=/usr/local/opt/openssl/lib
>   ...
>   checking openssl/ssl.h usability... no
>   checking openssl/ssl.h presence... no
>   checking for openssl/ssl.h... no
>   configure: error: header file <openssl/ssl.h> is required for OpenSSL

Offhand, I'll guess that you need to shorten the --with-includes
path to

--with-includes=/usr/local/opt/openssl/include

What you are showing here would only work if the header file's full path
is

/usr/local/opt/openssl/include/openssl/openssl/ssl.h

which doesn't seem likely.
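
That is, configure itself looks for the openssl/ subdirectory under each
include path, so the invocation would presumably be:

```
./configure --with-openssl \
    --with-includes=/usr/local/opt/openssl/include \
    --with-libraries=/usr/local/opt/openssl/lib
```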

regards, tom lane




Re: Reuse an existing slot with a new initdb

2020-05-14 Thread Tom Lane
Support  writes:
> I think David caught it, the question is Why, as long as we have an 
> exact copy of the master (from pg_dumpall)

Stop right there.  pg_dumpall does *not* produce an exact copy.
It produces a logically equivalent copy, which is not close enough
for physical replication to work.

regards, tom lane
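
For what it's worth, the system identifier that physical replication checks
can be compared directly on both clusters; pg_control_system() is available
from 9.6 on:

```sql
-- Run on both primary and standby; the values must match
-- for physical replication to be possible.
SELECT system_identifier FROM pg_control_system();
```

A cluster created by a fresh initdb gets a new system identifier, which is
why a pg_dumpall restore can never attach as a physical standby.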




Re: Check what has been done for an uncommitted prepared transaction

2020-05-14 Thread Laurenz Albe
On Thu, 2020-05-14 at 16:26 +0800, Andy Fan wrote:
> Thanks,  actually I don't know how to use prepared transaction and how it 
> works.
> I care about this because there is a long prepared transaction exists in our 
> customer,
> and we want to know what this transaction has done(like any data it changed). 
> All the things I know is the data comes from pg_prepared_xact, but it doesn't 
> help a lot. 

If you have the transaction ID from "pg_prepared_xacts", you could check
what locks are held:

SELECT * FROM pg_locks WHERE transactionid = ...;

That might give you a clue.

Using prepared transactions without a transaction manager that keeps track
of them and cleans up if necessary is dangerous.
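
An untested sketch of such a query, relying on the convention that a
prepared transaction's locks appear in pg_locks with a virtualtransaction of
the form '-1/XID':

```sql
SELECT p.gid, p.prepared, l.locktype, l.relation::regclass, l.mode
FROM pg_prepared_xacts p
JOIN pg_locks l ON l.virtualtransaction = '-1/' || p.transaction::text
ORDER BY p.prepared;
```

RowExclusiveLock entries on a table, for example, suggest the prepared
transaction inserted, updated, or deleted rows there.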

Yours,
Laurenz Albe
-- 
+43-670-6056265
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com





Re: Practical usage of large objects.

2020-05-14 Thread Laurenz Albe
On Thu, 2020-05-14 at 12:59 +0900, Michael Paquier wrote:
> On Wed, May 13, 2020 at 01:55:48PM -0400, Tom Lane wrote:
> > Dmitry Igrishin  writes:
> > > As you know, PostgreSQL has a large objects facility [1]. I'm curious
> > > are there real systems which are use this feature?
> > 
> > We get questions about it regularly, so yeah people use it.
> 
> I recall that some applications where I work make use of it for some
> rather large log-like data.  At the end of the day, it really boils
> down to if you wish to store blobs of data which are larger than 1GB,
> the limit for toasted fields, as LOs can be up to 4TB.  Also, updating
> or reading a LO can be much cheaper than a toasted field, as the
> latter would update/read the value as a whole.

Interesting; only recently I played with that a little and found that
that is not necessarily true:

https://www.cybertec-postgresql.com/en/binary-data-performance-in-postgresql/

Yours,
Laurenz Albe





Re: Reuse an existing slot with a new initdb

2020-05-14 Thread Support



On 5/14/2020 6:33 AM, Tom Lane wrote:

Support  writes:

I think David caught it, the question is Why, as long as we have an
exact copy of the master (from pg_dumpall)

Stop right there.  pg_dumpall does *not* produce an exact copy.
It produces a logically equivalent copy, which is not close enough
for physical replication to work.

regards, tom lane


Ah ok, so there is no way to jump between a physical and a logical copy
then... :(





vacuumdb --jobs deadlock: how to avoid pg_catalog ?

2020-05-14 Thread Eduard Rozenberg
Hello there,

I'm a long-time postgres user but vacuumdb newbie :).

Using vacuumdb (v12.2) with '--jobs' to vacuum a v9.6 database on localhost
with parallel processes:

$ time vacuumdb --full --echo -U postgres --jobs=8 -d mydatabase_test 
-p 5434 --password

As shown below I ran into the (100%?) predictable/expected deadlock with sys 
catalog tables documented in the vacuumdb doc page
(https://www.postgresql.org/docs/12/app-vacuumdb.html).

Question: I specified "-d mydatabase_test" - why is it also doing a full vac on 
the system catalog (postgres db) which is practically guaranteed to cause a 
deadlock? I don't need or want it to do a full vac on the postgres db as I can 
do that myself later if I need to, and it takes no time compared to full vac of 
my own 7 TB database that I am "extremely eager" to parallelize.

Is my only option list all schemas.tables and feed to vacuumdb as "-t 
schema1.table1 -t schema1.table2 " ? That's manageable but unfortunate 
as it creates more work (having to list all tables) and adds additional work 
for automation scripts as well.
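
If it comes to that, the -t list could at least be generated instead of
typed by hand; for example (a sketch that assumes you want every
non-catalog table):

```sql
SELECT string_agg(format('-t %I.%I', schemaname, tablename), ' ')
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
```

The resulting string can be pasted (or fed by a script) into the vacuumdb
command line.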

I've tried searching forums/goog for past issues related to this. I did find an 
old discussion about possible enhancements to "vacuumdb --jobs" to help the 
user avoid these predictable deadlocks, where the conclusion was to let the 
user figure out a way to deal with it such as via "-t table -t table -t table 
...".

Maybe "--jobs" should only apply to the user's own databases, and when vacuumdb 
is working on system catalog tables it should apply a strictly 
serial/sequential approach to handle those tables?

Thanks!


--
output from vacuumdb's --echo
--
...
VACUUM (FULL) mycompanytesting.sometable1;
VACUUM (FULL) index.mydatabasestd_missing;
VACUUM (FULL) mycompanytesting.sometable2;
VACUUM (FULL) mycompanytesting.sometable3;
VACUUM (FULL) pg_catalog.pg_attribute;
VACUUM (FULL) mycompanydata.sometable4;
VACUUM (FULL) pg_catalog.pg_statistic;
VACUUM (FULL) mycompanytesting.sometable5;
VACUUM (FULL) pg_catalog.pg_proc;
VACUUM (FULL) pg_catalog.pg_depend;
VACUUM (FULL) pg_catalog.pg_class;
VACUUM (FULL) pg_catalog.pg_type;
vacuumdb: error: vacuuming of database "mydatabase_test" failed: ERROR:  
deadlock detected
DETAIL:  Process 28183 waits for AccessShareLock on relation 1259 of database 
35239378; blocked by process 28182.
Process 28182 waits for AccessShareLock on relation 1247 of database 35239378; 
blocked by process 28183.
HINT:  See server log for query details.


--
/var/log/postgresql-9.6
--
...
LOG:  received SIGHUP, reloading configuration files
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.pg_toast.pg_toast_35240266"
LOG:  could not receive data from client: Connection timed out
LOG:  received SIGHUP, reloading configuration files
LOG:  received SIGHUP, reloading configuration files
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
LOG:  received SIGHUP, reloading configuration files
LOG:  received SIGHUP, reloading configuration files
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.pg_toast.pg_toast_35240330"
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
ERROR:  deadlock detected
DETAIL:  Process 28183 waits for AccessShareLock on relation 1259 of database 
35239378; blocked by process 28182.
Process 28182 waits for AccessShareLock on relation 1247 of database 
35239378; blocked by process 28183.
Process 28183: VACUUM (FULL) pg_catalog.pg_type;
Process 28182: VACUUM (FULL) pg_catalog.pg_class;
HINT:  See server log for query details.
STATEMENT:  VACUUM (FULL) pg_catalog.pg_type;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) pg_catalog.pg_class;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) pg_catalog.pg_depend;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) someschema.sometable
LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost
LOG:  could not send data to client: Broken pipe
FATAL:  connection to client lost
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) mycompanydata.wildcard_replacement_bkp;
LOG:  could not send data to client: Broken pipe
STATEMENT:  VACUUM (FULL) mycompanydata.wildcard_replacement_bkp;
FATAL:  connection to client lost
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL) pg_catalog.pg_proc;
LOG:  c

Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

2020-05-14 Thread Adrian Klaver

On 5/14/20 8:35 AM, Eduard Rozenberg wrote:

Hello there,

I'm a long-time postgres user but vacuumdb newbie :).

Using vacuumdb (v12.2) with '---jobs' to vacuum a v9.6 database on 
localhost with parallel processes:


$ time vacuumdb --full --echo -U postgres --jobs=8 -d mydatabase_test -p 
5434 --password


As shown below I ran into the (100%?) predictable/expected deadlock with 
sys catalog tables documented in the vacuumdb doc page 
(https://www.postgresql.org/docs/12/app-vacuumdb.html).


Question: I specified "-d mydatabase_test" - why is it also doing a full 
vac on the system catalog (postgres db) which is practically guaranteed 


The postgres database and the system catalogs are two separate things.

to cause a deadlock? I don't need or want it to do a full vac on the 
postgres db as I can do that myself later if I need to, and it takes no 
time compared to full vac of my own 7 TB database that I am "extremely 
eager" to parallelize.


See:

https://www.postgresql.org/docs/12/sql-vacuum.html

"FULL

Selects “full” vacuum, which can reclaim more space, but takes much 
longer and exclusively locks the table. This method also requires extra 
disk space, since it writes a new copy of the table and doesn't release 
the old copy until the operation is complete. Usually this should only 
be used when a significant amount of space needs to be reclaimed from 
within the table.

"

Pretty sure all that rewriting of tables bloats the system catalogs and 
so they get a VACUUM FULL.



The question here is why is it necessary to do a VACUUM FULL in the 
first place?





Re: Practical usage of large objects.

2020-05-14 Thread Thomas Markus



Am 14.05.20 um 15:36 schrieb Laurenz Albe:

Interesting; only recently I played with that a little and found that
that is not necessarily true:

https://www.cybertec-postgresql.com/en/binary-data-performance-in-postgresql/

Yours,
Laurenz Albe
We used lo a lot in a project for large uploads (>4GB files). Really 
useful in a cloud environment.


I was interested in a speed comparison myself and made a similar test with
a network connection and without PG-specific code.

https://github.com/5UtJAjiRWj1q/psql-lob-performance

File access is really fast, and lo access is much slower than bytea (as
expected). But the content size limitation and memory consumption of bytea
are problematic.


regards
Thomas





Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

2020-05-14 Thread Adrian Klaver

On 5/14/20 11:27 AM, Eduard Rozenberg wrote:

@Adrian thanks.

I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql) 
previously on a test db copy and saw the DB size (postgres 9.6) shrink 
from 6.8 TB to 4.4 TB of actual, real disk space usage ($ df -d 1).


I don't know the reason so much space was "locked up" (other than there 
is a lot of data "churn" from daily updates). But I definitely do need 
to do the vac full on the production db to get down to the smaller size 
- cannot afford the 2.4 TB of "wasted" space on an ongoing basis.


It may not be wasted space. A regular VACUUM marks space within a table 
available for reuse (and not returned to the OS) when it removes unneeded 
tuples. It then fills that space up with new tuples, roughly speaking. 
So if the vacuuming is working properly you will reach a steady state 
where space is reused and the database on disk size grows slowly as 
reusable space is occupied. I would monitor the database size on a 
regular basis. My guess is that the VACUUM FULL is dropping the OS used 
space and then it fills up again as the database does those updates.
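One way to do that monitoring (database and column choices here are only a sketch) is to track the on-disk size together with the dead-tuple counts the statistics collector keeps:

```sql
-- overall on-disk size of the database
SELECT pg_size_pretty(pg_database_size('mydatabase_test'));

-- tables with the most dead (reusable) tuples, and when autovacuum last ran
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

If the size plateaus while n_dead_tup cycles up and down, space is being reused as described.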




Based on your email it sounds like the vacuuming of those pg_catalog.* 
tables is done regardless, as a normal part of doing vac full on my own 
database.


Unfortunately I still don't see an ideal way to run vacuum full in 
parallel via vacuumdb without running into the expected and documented 
deadlock. Only method I'm aware of is to list each table individually 
with "-t table1 -t table2..." to "vacuum db --jobs" which is not 
pleasant and not exceedingly beautiful.


Thanks.




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

2020-05-14 Thread Adrian Klaver

On 5/14/20 12:03 PM, Eduard Rozenberg wrote:

@Adrian thanks again.

I read the postgres docs the same way - that previously used space is marked as 
available and therefore no need for vacuum full. Previously used = now 
available space, which gets re-used, in theory.

And yet this same DB is growing at 30-50 GB weekly, despite the fact that 2.4 TB of 
previously used space is clearly available ("clearly available" as proven by 
vacuum full shrinking the DB space usage by 2.4 TB). I did verify postgresql.conf has 
always been properly configured re: autovacuum: 'autovacuum = on' and 'track_counts = on'


Well, if I'm counting zeros right, 50 GB on a 4.4 TB database is ~1.14%. Does 
that sound right for the amount of new data being added?



One place to look to see how aggressively the autovacuum  is being done 
here:


https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-ALL-TABLES-VIEW

What you find there may mean tweaking the values as explained here:

https://www.postgresql.org/docs/12/routine-vacuuming.html#AUTOVACUUM

Could be there are just a few tables that account for most of the churn 
and a manual VACUUM on them is needed. Say if there are regularly 
scheduled large UPDATEs to tables, incorporate a VACUUM after.




I'm not planning on running VACUUM FULL regularly, just "this one time". And I was trying to 
parallelize VACUUM FULL and minimize downtime of the production DB caused by table locking. And then I found 
the option of using "vacuumdb --jobs" which sounded like the perfect solution except for "well 
you can't actually use --jobs because you'll run into a deadlock and everybody knows that and nobody has a 
(good) solution for it" :).

--Ed


On May 14, 2020, at 11:46, Adrian Klaver  wrote:

On 5/14/20 11:27 AM, Eduard Rozenberg wrote:

@Adrian thanks.
I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql) 
previously on a test db copy and saw the DB size (postgres 9.6) shrink from 6.8 TB to 4.4 
TB of actual, real disk space usage ($ df -d 1).
I don't know the reason so much space was "locked up" (other than there is a lot of data 
"churn" from daily updates). But I definitely do need to do the vac full on the production db to 
get down to the smaller size - cannot afford the 2.4 TB of "wasted" space on an ongoing basis.


It may not be wasted space. A regular VACUUM marks space within a table 
available for reuse (and not returned to the OS) when it removes unneeded tuples. It 
then fills that space up with new tuples, roughly speaking. So if the vacuuming 
is working properly you will reach a steady state where space is reused and the 
database on disk size grows slowly as reusable space is occupied. I would 
monitor the database size on a regular basis. My guess is that the VACUUM FULL 
is dropping the OS used space and then it fills up again as the database does 
those updates.


Based on your email it sounds like the vacuuming of those pg_catalog.* tables 
is done regardless, as a normal part of doing vac full on my own database.
Unfortunately I still don't see an ideal way to run vacuum full in parallel via vacuumdb without 
running into the expected and documented deadlock. Only method I'm aware of is to list each table 
individually with "-t table1 -t table2..." to "vacuum db --jobs" which is not 
pleasant and not exceedingly beautiful.
Thanks.



--
Adrian Klaver
adrian.kla...@aklaver.com





--
Adrian Klaver
adrian.kla...@aklaver.com




surprisingly slow creation of gist index used in exclude constraint

2020-05-14 Thread Chris Withers

  
  
Hi,
I'm upgrading a database from 9.4 to 11.5 by dumping from the old
  cluster and loading into the new cluster.
The database is tiny: around 2.3G, but importing this table is
  proving problematic:
Column  |   Type|Modifiers
+---+--
 period | tsrange   | not null
 col1   | character varying | not null
 col2   | character varying | not null
 col3   | integer   | not null
 col4   | character varying | not null default ''::character varying
 id | integer   | not null default nextval('mkt_profile_id_seq'::regclass)
 deleted| boolean   | not null default false
 managed| boolean   | not null default false
 col5   | character varying |
Indexes:
"mkt_profile_pkey" PRIMARY KEY, btree (id)
"mkt_profile_period_col1_col4_col2_chan_excl" EXCLUDE USING gist (period WITH &&, col1 WITH =, col4 WITH =, col2 WITH =, col3 WITH =)
Check constraints:
"mkt_profile_period_check" CHECK (period <> 'empty'::tsrange)
Foreign-key constraints:
"mkt_profile_col1_fkey" FOREIGN KEY (col1) REFERENCES host(name)
It has 4.1 million rows in it and while importing the data only
  takes a couple of minutes, when I did a test load into the new
  cluster, building the mkt_profile_period_col1_col4_col2_chan_excl
  index for the exclude constraint took 15 hours.
I feel like I must be getting something badly wrong here, as the new server
  is pretty decent hardware...
Concrete questions:
- what, if anything, am I getting badly wrong here?
- what can I do to speed up creation of this index?
- failing that, what can I do to import and then create the index
  in the background? 

As you can imagine, a 15hr outage for an upgrade has not met with
  large amounts of happiness from the people whose application it is
  ;-)
Chris

  



Re: surprisingly slow creation of gist index used in exclude constraint

2020-05-14 Thread ktm



Quoting Chris Withers :


Hi,

I'm upgrading a database from 9.4 to 11.5 by dumping from the old
cluster and loading into the new cluster.

The database is tiny: around 2.3G, but importing this table is
proving problematic:
Column  |   Type|Modifiers
...
I feel like asking what I'm doing wrong here? The new server is
pretty decent hardware...

Concrete questions:

- what, if anything, am I getting badly wrong here?

- what can I do to speed up creation of this index?

- failing that, what can I do to import and then create the index   in
the background?

As you can imagine, a 15hr outage for an upgrade has not met with
large amounts of happiness from the people whose application it is
;-)

Chris

Hi Chris,

This sounds like a candidate for pglogical, replicating from the old  
system to the new one.
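A rough sketch of what that setup looks like with the pglogical 2.x extension (host names and the subscription name are hypothetical, and the extension must be installed on both clusters):

```sql
-- on the old 9.4 cluster (provider)
SELECT pglogical.create_node(
    node_name := 'provider',
    dsn := 'host=oldhost port=5432 dbname=mydb');
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

-- on the new 11.5 cluster (subscriber), after loading the schema there
SELECT pglogical.create_node(
    node_name := 'subscriber',
    dsn := 'host=newhost port=5432 dbname=mydb');
SELECT pglogical.create_subscription(
    subscription_name := 'upgrade_sub',
    provider_dsn := 'host=oldhost port=5432 dbname=mydb');
```

The expensive index is then built on the new cluster while the old one keeps serving traffic, and cutover becomes a matter of repointing the application once the subscriber has caught up.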


Regards,
Ken






Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

2020-05-14 Thread Eduard Rozenberg
@Adrian thanks again.

I read the postgres docs the same way - that previously used space is marked as 
available and therefore no need for vacuum full. Previously used = now 
available space, which gets re-used, in theory.

And yet this same DB is growing at 30-50 GB weekly, despite the fact that 2.4 
TB of previously used space is clearly available ("clearly available" as proven 
by vacuum full shrinking the DB space usage by 2.4 TB). I did verify 
postgresql.conf has always been properly configured re: autovacuum:  
'autovacuum = on' and 'track_counts = on'

I'm not planning on running VACUUM FULL regularly, just "this one time". And I 
was trying to parallelize VACUUM FULL and minimize downtime of the 
production DB caused by table locking. And then I found the option of using 
"vacuumdb --jobs" which sounded like the perfect solution except for "well you 
can't actually use --jobs because you'll run into a deadlock and everybody 
knows that and nobody has a (good) solution for it" :).

--Ed

> On May 14, 2020, at 11:46, Adrian Klaver  wrote:
> 
> On 5/14/20 11:27 AM, Eduard Rozenberg wrote:
>> @Adrian thanks.
>> I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql) 
>> previously on a test db copy and saw the DB size (postgres 9.6) shrink from 
>> 6.8 TB to 4.4 TB of actual, real disk space usage ($ df -d 1).
>> I don't know the reason so much space was "locked up" (other than there is a 
>> lot of data "churn" from daily updates). But I definitely do need to do the 
>> vac full on the production db to get down to the smaller size - cannot 
>> afford the 2.4 TB of "wasted" space on an ongoing basis.
> 
> It may not be wasted space. A regular VACUUM marks space within a table 
> available for reuse(and not returned to OS) when it removes unneeded tuples. 
> It then fills that space up with new tuples, roughly speaking. So if the 
> vacuuming is working properly you will reach a steady state where space is 
> reused and the database on disk size grows slowly as reusable space is 
> occupied. I would monitor the database size on a regular basis. My guess is 
> that the VACUUM FULL is dropping the OS used space and then it fills up again 
> as the database does those updates.
> 
>> Based on your email it sounds like the vacuuming of those pg_catalog.* 
>> tables is done regardless, as a normal part of doing vac full on my own 
>> database.
>> Unfortunately I still don't see an ideal way to run vacuum full in parallel 
>> via vacuumdb without running into the expected and documented deadlock. Only 
>> method I'm aware of is to list each table individually with "-t table1 -t 
>> table2..." to "vacuum db --jobs" which is not pleasant and not exceedingly 
>> beautiful.
>> Thanks.
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com





Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

2020-05-14 Thread Eduard Rozenberg
@Adrian thanks.

I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql) 
previously on a test db copy and saw the DB size (postgres 9.6) shrink from 6.8 
TB to 4.4 TB of actual, real disk space usage ($ df -d 1).

I don't know the reason so much space was "locked up" (other than there is a 
lot of data "churn" from daily updates). But I definitely do need to do the vac 
full on the production db to get down to the smaller size - cannot afford the 
2.4 TB of "wasted" space on an ongoing basis.

Based on your email it sounds like the vacuuming of those pg_catalog.* tables 
is done regardless, as a normal part of doing vac full on my own database.

Unfortunately I still don't see an ideal way to run vacuum full in parallel via 
vacuumdb without running into the expected and documented deadlock. Only method 
I'm aware of is to list each table individually with "-t table1 -t table2..." 
to "vacuum db --jobs" which is not pleasant and not exceedingly beautiful.
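For what it's worth, that table list doesn't have to be maintained by hand; a query along these lines (the schema filter is an assumption) can generate the -t flags:

```sql
-- emit one "-t schema.table" flag per user table, space-separated,
-- suitable for pasting into (or piping to) a vacuumdb invocation
SELECT string_agg(format('-t %I.%I', schemaname, tablename), ' ')
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
```

Feeding that output to vacuumdb --full --jobs keeps the parallel workers off the catalog tables that cause the deadlock.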

Thanks.

> On May 14, 2020, at 10:35, Adrian Klaver  wrote:
> 
> On 5/14/20 8:35 AM, Eduard Rozenberg wrote:
>> Hello there,
>> I'm a long-time postgres user but vacuumdb newbie :).
>> Using vacuumdb (v12.2) with '--jobs' to vacuum a v9.6 database on localhost 
>> with parallel processes:
>> $ time vacuumdb --full --echo -U postgres --jobs=8 -d mydatabase_test -p 
>> 5434 --password
>> As shown below I ran into the (100%?) predictable/expected deadlock with sys 
>> catalog tables documented in the vacuumdb doc page 
>> (https://www.postgresql.org/docs/12/app-vacuumdb.html).
>> Question: I specified "-d mydatabase_test" - why is it also doing a full vac 
>> on the system catalog (postgres db) which is practically guaranteed 
> 
> The postgres database and the system catalogs are two separate things.
> 
>> to cause a deadlock? I don't need or want it to do a full vac on the 
>> postgres db as I can do that myself later if I need to, and it takes no time 
>> compared to full vac of my own 7 TB database that I am "extremely eager" to 
>> parallelize.
> 
> See:
> 
> https://www.postgresql.org/docs/12/sql-vacuum.html 
> 
> 
> "FULL
> 
>Selects “full” vacuum, which can reclaim more space, but takes much longer 
> and exclusively locks the table. This method also requires extra disk space, 
> since it writes a new copy of the table and doesn't release the old copy 
> until the operation is complete. Usually this should only be used when a 
> significant amount of space needs to be reclaimed from within the table.
> "
> 
> Pretty sure all that rewriting of tables bloats the system catalogs and so 
> they get a VACUUM FULL.
> 
> 
> The question here is why is it necessary to do a VACUUM FULL in the first 
> place?
> 
>> Is my only option list all schemas.tables and feed to vacuumdb as "-t 
>> schema1.table1 -t schema1.table2 " ? That's manageable but 
>> unfortunate as it creates more work (having to list all tables) and adds 
>> additional work for automation scripts as well.
>> I've tried searching forums/goog for past issues related to this. I did find 
>> an old discussion about possible enhancements to "vacuumdb --jobs" to help 
>> the user avoid these predictable deadlocks, where the conclusion was to let 
>> the user figure out a way to deal with it such as via "-t table -t table -t 
>> table ...".
>> Maybe "--jobs" should only apply to the user's own databases, and when 
>> vacuumdb is working on system catalog tables it should apply a strictly 
>> serial/sequential approach to handle those tables?
>> Thanks!
>> --
>> output from vacuumdb's --echo
>> --
>> ...
>> VACUUM (FULL) mycompanytesting.sometable1;
>> VACUUM (FULL) index.mydatabasestd_missing;
>> VACUUM (FULL) mycompanytesting.sometable2;
>> VACUUM (FULL) mycompanytesting.sometable3;
>> VACUUM (FULL) pg_catalog.pg_attribute;
>> VACUUM (FULL) mycompanydata.sometable4;
>> VACUUM (FULL) pg_catalog.pg_statistic;
>> VACUUM (FULL) mycompanytesting.sometable5;
>> VACUUM (FULL) pg_catalog.pg_proc;
>> VACUUM (FULL) pg_catalog.pg_depend;
>> VACUUM (FULL) pg_catalog.pg_class;
>> VACUUM (FULL) pg_catalog.pg_type;
>> vacuumdb: error: vacuuming of database "mydatabase_test" failed: ERROR:   
>> deadlock detected
>> DETAIL:  Process 28183 waits for AccessShareLock on relation 1259 of 
>> database 35239378; blocked by process 28182.
>> Process 28182 waits for AccessShareLock on relation 1247 of database 
>> 35239378; blocked by process 28183.
>> HINT:  See server log for query details.
>> --
>> /var/log/postgresql-9.6
>> --
>> ...
>> LOG:  received SIGHUP, reloading configuration files
>> ERROR:  canceling autovacuum task
>> CONTEXT:  automatic vacuum of table "mydatabase_test.someschema.sometable"
>> ERROR:  canceling autovacuum task
>> CONTEXT:  automatic vacuum of table "mydatabase_test.somesc

Re: surprisingly slow creation of gist index used in exclude constraint

2020-05-14 Thread Tom Lane
Chris Withers  writes:
>   It has 4.1 million rows in it and while importing the data only
>   takes a couple of minutes, when I did a test load into the new
>   cluster, building the mkt_profile_period_col1_col4_col2_chan_excl
>   index for the exclude constraint took 15 hours.

Don't recall for sure, but I think GIST index build is sensitive
to the maintenance_work_mem setting; did you have that cranked up?

> - failing that, what can I do to import and then create the index
>   in the background?

CREATE INDEX CONCURRENTLY, perhaps.

regards, tom lane
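To make both suggestions concrete (the memory value is only an illustration, and note that an EXCLUDE constraint itself can't be added with CONCURRENTLY, so the concurrent build applies to a plain gist index with the same definition, which won't enforce the constraint but shows the build cost without blocking writes):

```sql
-- 1. give this session's index build more working memory
SET maintenance_work_mem = '2GB';

-- 2. build an equivalent gist index without locking out writes
CREATE INDEX CONCURRENTLY mkt_profile_period_gist
    ON mkt_profile USING gist (period, col1, col4, col2, col3);
```

gist over the varchar/integer columns requires the btree_gist extension, which must already be installed given the existing exclusion constraint.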




Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

2020-05-14 Thread Virendra Kumar
You might have index fragmentation; on PG 12 you can reindex them concurrently, 
and probably should. As everyone agreed, most of the space will be marked for 
re-use later for the table segments, but the indexes in your case could be the 
problem. On previous versions you can use pg_repack to do index rebuilds, 
which will help reduce the space consumed.

Regards,
Virendra 

On Thursday, May 14, 2020, 1:20:41 PM PDT, Eduard Rozenberg 
 wrote:

@Adrian thanks again.

I read the postgres docs the same way - that previously used space is marked as 
available and therefore no need for vacuum full. Previously used = now 
available space, which gets re-used, in theory.

And yet this same DB is growing at 30-50 GB weekly, despite the fact that 2.4 
TB of previously used space is clearly available ("clearly available" as proven 
by vacuum full shrinking the DB space usage by 2.4 TB). I did verify 
postgresql.conf has always been properly configured re: autovacuum:  
'autovacuum = on' and 'track_counts = on'

I'm not planning on running VACUUM FULL regularly, just "this one time". And I 
was trying to parallelize VACUUM FULL and minimize downtime of the 
production DB caused by table locking. And then I found the option of using 
"vacuumdb --jobs" which sounded like the perfect solution except for "well you 
can't actually use --jobs because you'll run into a deadlock and everybody 
knows that and nobody has a (good) solution for it" :).

--Ed

> On May 14, 2020, at 11:46, Adrian Klaver  wrote:
> 
> On 5/14/20 11:27 AM, Eduard Rozenberg wrote:
>> @Adrian thanks.
>> I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql) 
>> previously on a test db copy and saw the DB size (postgres 9.6) shrink from 
>> 6.8 TB to 4.4 TB of actual, real disk space usage ($ df -d 1).
>> I don't know the reason so much space was "locked up" (other than there is a 
>> lot of data "churn" from daily updates). But I definitely do need to do the 
>> vac full on the production db to get down to the smaller size - cannot 
>> afford the 2.4 TB of "wasted" space on an ongoing basis.
> 
> It may not be wasted space. A regular VACUUM marks space within a table 
> available for reuse(and not returned to OS) when it removes unneeded tuples. 
> It then fills that space up with new tuples, roughly speaking. So if the 
> vacuuming is working properly you will reach a steady state where space is 
> reused and the database on disk size grows slowly as reusable space is 
> occupied. I would monitor the database size on a regular basis. My guess is 
> that the VACUUM FULL is dropping the OS used space and then it fills up again 
> as the database does those updates.
> 
>> Based on your email it sounds like the vacuuming of those pg_catalog.* 
>> tables is done regardless, as a normal part of doing vac full on my own 
>> database.
>> Unfortunately I still don't see an ideal way to run vacuum full in parallel 
>> via vacuumdb without running into the expected and documented deadlock. Only 
>> method I'm aware of is to list each table individually with "-t table1 -t 
>> table2..." to "vacuum db --jobs" which is not pleasant and not exceedingly 
>> beautiful.
>> Thanks.
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com


  

Re: vacuumdb --jobs deadlock: how to avoid pg_catalog ?

2020-05-14 Thread Michael Lewis
On Thu, May 14, 2020 at 2:20 PM Eduard Rozenberg 
wrote:

> I did verify postgresql.conf has always been properly configured re:
> autovacuum:  'autovacuum = on'and 'track_counts = on'
>

This may be insufficient to keep up if you have large tables. The default
scale factor allows for 20% of the rows to be dead before the autovacuum
will kick in to mark the space as available for reuse. Assuming you have
the I/O capacity and prefer to do a little cleanup more often rather than
HUGE cleanup work all at once on rare occasions, it may be ideal to look at
turning down the autovacuum_vacuum_scale_factor. You can tweak these
settings on large tables only, or increase the autovacuum_vacuum_threshold
at the same time to compensate a bit for decreasing the scale factor. You
can also look at pg_stat_activity for autovacuums, and if you see that some
are running for hours, then probably they are trying to do too much work
all at once and waiting too long before tidying up. Also, the default
autovacuum_vacuum_cost_delay was changed from 20ms to 2ms with PG12 so that
may be worth considering as a best practice even on older versions.
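As a concrete sketch of the per-table approach described above (the table name and values are illustrative, not recommendations):

```sql
-- autovacuum triggers once dead tuples exceed
--   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples,
-- so with the defaults (50 + 0.2 * reltuples) a 100M-row table waits
-- for ~20M dead tuples. Tighten both knobs on the big, churny tables:
ALTER TABLE big_churn_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 10000,
    autovacuum_vacuum_cost_delay   = 2      -- ms; the PG12 global default
);
```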


Re: Check what has been done for an uncommitted prepared transaction

2020-05-14 Thread Andy Fan
On Thu, May 14, 2020 at 9:33 PM Laurenz Albe 
wrote:

> On Thu, 2020-05-14 at 16:26 +0800, Andy Fan wrote:
> > Thanks,  actually I don't know how to use prepared transaction and how
> it works.
> > I care about this because there is a long prepared transaction exists in
> our customer,
> > and we want to know what this transaction has done(like any data it
> changed).
> > All the things I know is the data comes from pg_prepared_xact, but it
> doesn't help a lot.
>
> If you have the transaction ID from "pg_prepared_xact", you could check
> what locks are held:
>
> SELECT * FROM pg_locks WHERE transactionid = ...;
>
> That might give you a clue.
>
Thanks a lot. This transaction only holds a transactionid lock, so I assume
nothing has been done in this transaction.


   locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid |     mode      | granted | fastpath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+---------------+---------+----------
 transactionid |          |          |      |       |            |    1181845808 |         |       |          | -1/1181845808      |     | ExclusiveLock | t       | f
(1 row)

Best Regards
Andy Fan


view selection during query rewrite

2020-05-14 Thread Vamsi Meduri
Suppose I have the following query and a view v1 defined as follows:

Q1: SELECT * FROM Table1 WHERE Table1.col1 = 5 and Table1.col2 LIKE '%abc%';
create view v1 as select * from Table1 where Table1.col1 = 5;

An effective way to execute Q1 would be to re-write it as: select * from
v1 where col2 LIKE '%abc%'; provided that the selection predicates are
highly selective and the rewrite reduces the query's execution time.

However I see that postgres does not automatically perform query rewriting
using view. I have looked at the plan using explain (analyze) and I see
that Q1 is always executed using the base tables unless I explicitly
mention the view in the query instead of the base table.

Is there a way to enforce automatic view selection in postgres or is it up
to the user to write the query optimally?


Re: view selection during query rewrite

2020-05-14 Thread Tom Lane
Vamsi Meduri  writes:
> Suppose I have the following query and a view v1 defined as follows:
> *Q1: SELECT * FROM Table1 WHERE Table1.col1 = 5 and Table1.col2 LIKE
> '%abc%';*
> *create view v1 as select * from Table1 where Table1.col1 = 5;*

> An effective way to execute Q1 would be to re-write it as: *select * from
> v1 where col2 LIKE '%abc%';* provided that the selection predicates are
> highly selective and also if it is cheaper in execution time of the query.

> However I see that postgres does not automatically perform query rewriting
> using view. I have looked at the plan using explain (analyze) and I see
> that Q1 is always executed using the base tables unless I explicitly
> mention the view in the query instead of the base table.

If you are talking about a regular view, such a transformation would be a
complete waste of time, because the end result would be exactly the same
(after expending a lot of cycles transforming the query and then reversing
it back to the original state during view expansion).

If it's a materialized view, then there'd be a potential for savings ...
but the other side of that coin is that you'd get stale data, since
a matview is not going to be entirely up to date.  It's no business of
the rewriter (or the query planner) to decide that such a tradeoff is OK.
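For the record, the manual flavor of that trade-off (names hypothetical) looks like:

```sql
-- the result set is computed once and stored; it does NOT track Table1
CREATE MATERIALIZED VIEW v1 AS
SELECT * FROM Table1 WHERE col1 = 5;

-- matviews can be indexed like tables
CREATE INDEX ON v1 (col2);

-- you decide when to pay the cost of bringing the data up to date
REFRESH MATERIALIZED VIEW v1;
```

Querying v1 directly then gives the savings, at the price of staleness between refreshes.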

I do recall some discussion of extensions attempting to do such things,
but I doubt we'd ever put it in core Postgres.

regards, tom lane




Logical replication for async service communication?

2020-05-14 Thread Sean Huber
Has anyone attempted to use logical replication with table partitioning for
async service communication?

Proof of concept:
https://gist.github.com/shuber/8e53d42d0de40e90edaf4fb182b59dfc

Services would commit messages to their own databases along with the rest
of their data (with the same transactional guarantees) and then messages
are "realtime" replicated (with all of its features and guarantees) to the
receiving service's database where their workers (e.g. que-rb, SKIP LOCKED
polling, etc.) are waiting to
respond by inserting messages into *their* database to be replicated back.
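The worker side of that sketch, against an assumed messages table, is the standard SKIP LOCKED consumer pattern:

```sql
-- each worker atomically claims and removes one message; concurrent
-- workers skip rows already locked by someone else instead of blocking
DELETE FROM messages
WHERE id = (
    SELECT id FROM messages
    ORDER BY id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id, payload;
```

Because the claim is just a row lock, a crashed worker's message becomes visible to other workers again as soon as its transaction rolls back.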

Throw in a trigger to automatically acknowledge/cleanup/notify messages and
I think we've got something that resembles a queue? Maybe make that same
trigger match incoming messages against a "routes" table (based on message
type, certain JSON schemas in the payload, etc.)
and write matches to the que-rb jobs table instead for some kind of
distributed/replicated work queue hybrid?

My motivations for this line of thinking were mostly based around high
availability and isolating service downtime/failures from each other. Our
PostgreSQL databases are the most critical pieces of infrastructure for all
of our services - if it's down then we don't want the impacted service to
even attempt to be doing work. On the other hand, we don't want a service's
downtime to impact its ability to receive (queued) messages from other
services that it can resume consuming (once, in order) when it's back up.

We're exploring other message queues but keep getting drawn back to
PostgreSQL because we can get the same transactional guarantees with our
messages/jobs as the rest of our data. Even the act of enqueuing a job or
sending a message to another service is something that must be committed
and can be rolled back like everything else.

For our potential use case specifically, we're not dealing with high levels
of realtime traffic etc - we're not even close to 1k jobs/messages per
second.

I'm looking to poke holes in this concept before sinking anymore time
exploring the idea. Any feedback/warnings/concerns would be much
appreciated, thanks for your time!

Sean Huber


Re: surprisingly slow creation of gist index used in exclude constraint

2020-05-14 Thread Chris Withers

On 14/05/2020 21:16, k...@rice.edu wrote:


Hi Chris,

This sounds like a candidate for pg_logical replicating from the old to 
new system.


Can you point me to a good guide on how to set this up easily for one 
database, and will it work between pg 9.4 and pg 11.5?


cheers,

Chris




Re: Check what has been done for an uncommitted prepared transaction

2020-05-14 Thread Laurenz Albe
On Fri, 2020-05-15 at 08:09 +0800, Andy Fan wrote:
> On Thu, May 14, 2020 at 9:33 PM Laurenz Albe  wrote:
> > On Thu, 2020-05-14 at 16:26 +0800, Andy Fan wrote:
> > > Thanks,  actually I don't know how to use prepared transaction and how it 
> > > works.
> > > I care about this because there is a long prepared transaction exists in 
> > > our customer,
> > > and we want to know what this transaction has done(like any data it 
> > > changed). 
> > > All the things I know is the data comes from pg_prepared_xact, but it 
> > > doesn't help a lot. 
> > 
> > If you have the transaction ID from "pg_prepared_xact", you could check
> > what locks are held:
> > 
> > SELECT * FROM pg_locks WHERE transactionid = ...;
> > 
> > That might give you a clue.
> > 
> 
> Thanks a lot.   this transaction only lock a transactionid lock,   so I 
> assume there is nothing
> is done in this transaction. 
> 
> 
>    locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid |     mode      | granted | fastpath
> ---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+---------------+---------+----------
>  transactionid |          |          |      |       |            |    1181845808 |         |       |          | -1/1181845808      |     | ExclusiveLock | t   | f
> (1 row)

Looks like it, yes.
Roll it back then.
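For reference, a prepared transaction is resolved by its GID rather than its XID, so something like (GID 's' is taken from the earlier example; check pg_prepared_xacts for the real one):

```sql
-- list outstanding prepared transactions and how long they've been pending
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- discard the orphaned transaction and release its locks
ROLLBACK PREPARED 's';
```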

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com