postgres: WAL ends before end of online backup
We are running Postgres 9.6 with a 10+ TB database. Backups are taken using a homegrown tool, "pgrsync", which uses S3 as the repository. Both the backup files and the WAL archives are stored on S3.

Problem: While trying to restore, WAL archive replay randomly fails for some backups with the following messages in the logs:

2020-11-12 06:33:32 UTC [10037]: [27988-1] user=,db=LOG: redo done at 5493D/2EFFF568
2020-11-12 06:33:32 UTC [10037]: [27989-1] user=,db=LOG: last completed transaction was at log time 2020-11-06 12:31:27.796805+00
2020-11-12 06:33:34 UTC [10037]: [27990-1] user=,db=LOG: restored log file "00020005493D002E" from archive
2020-11-12 06:33:34 UTC [10037]: [27991-1] user=,db=FATAL: WAL ends before end of online backup
2020-11-12 06:33:34 UTC [10037]: [27992-1] user=,db=HINT: All WAL generated while online backup was taken must be available at recovery.
2020-11-12 06:33:36 UTC [10033]: [3-1] user=,db=LOG: startup process (PID 10037) exited with exit code 1
2020-11-12 06:33:36 UTC [10033]: [4-1] user=,db=LOG: terminating any other active server processes
2020-11-12 06:33:48 UTC [10033]: [5-1] user=,db=LOG: database system is shut down

In this case, the backup start location is 0002000544C6006B and the stop location is 000200054521008D (based on pg_stop_backup() output), but recovery stops in between at 005493D and terminates the restore. If I redo the restore, it stops at exactly the same point. I get similar results from a couple more backups, while other backups restore successfully. This possibly indicates that some specific WAL files got corrupted during the backup/restore process. Is that the correct interpretation?

Questions:
1. Is there any way to identify whether a WAL file is corrupted?
2. Is there a way to move ahead without losing data? (I am wary of using pg_resetxlog.)
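[Editor's note] For question 1, on 9.6 the pg_xlogdump utility (renamed pg_waldump in v10) will stop with an error when it hits a record it cannot decode, which is one way to spot a damaged segment. A complementary check is to scan the archived file names for missing segments, since "WAL ends before end of online backup" can also mean a segment is absent rather than corrupt. Below is a minimal, illustrative sketch of such a gap check; it assumes PostgreSQL 9.3+ segment naming (the low 8 hex digits run 00000000 through 000000FF with no skipped values) and the default 16 MB segment size, and the sample file names are made up:

```python
# Sketch: detect gaps in a sequence of archived WAL segment file names.
# Assumes PostgreSQL >= 9.3 naming (no skipped segment values) and the
# default 16 MB segment size (256 segments per "log" unit).

def wal_sequence_number(name: str) -> int:
    """Convert a 24-hex-digit WAL file name to an absolute segment number."""
    # name[0:8] is the timeline; ordering here is within one timeline.
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return log * 0x100 + seg

def find_gaps(names):
    """Return (previous, next) file-name pairs that bracket missing segments."""
    nums = sorted((wal_sequence_number(n), n) for n in names)
    gaps = []
    for (a_num, a_name), (b_num, b_name) in zip(nums, nums[1:]):
        if b_num - a_num > 1:
            gaps.append((a_name, b_name))
    return gaps

# Hypothetical archive listing (e.g. from an S3 bucket inventory):
archive = [
    "000000020000544C000000FE",
    "000000020000544C000000FF",
    "000000020000544D00000000",
    "000000020000544D00000002",  # 000000020000544D00000001 is missing
]
gaps = find_gaps(archive)
print(gaps)
```

If this reports no gaps across the backup's start/stop range, the failure is more likely a damaged segment, which pg_xlogdump can then pinpoint.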
Is it possible to write a generic UPSERT?
Dear all, I hope this request is not too naive. Please ignore it if it does not match this list's focus, or if it just shows my lack of SQL knowledge. Any pointer to documentation would be appreciated!

My question is about the "ON CONFLICT UPDATE" clause. We often have one dedicated condition that we want to achieve: insert an item if it does not exist, or update all given fields if it does exist. Your excellent documentation makes it easy to implement this in SQL. But the statement is slightly complex to type, and I find that my colleagues and I often spend more time on this than I would hope. Our two main challenges are:
(1) we have to look up the uniqueness constraints on the table, and
(2) we have to duplicate the insert statement in the UPDATE section again, because virtually all fields should get overwritten (except for the conflicting ones). On long inserts this can be quite annoying and error-prone.

I can see how "ON CONFLICT" is very powerful. But that power often seems a burden for us. We would prefer something that is less manual effort for the specific use case. Basically, we would like:
    INSERT if not exists, and
    UPDATE _all_ non-conflicting fields in case of _any_ conflict

In my (naive) thinking, such a construct would cover 99% of our use cases. Or have other people had very different experiences?

Now the question: I fail to implement this generically. I do not necessarily mean that this has to be supported by PostgreSQL. Any solution would work for us, be it in SQL, PostgreSQL or C/C++. For example, an acceptable solution would be a C/C++ wrapper method that generates the statement, given the corresponding INSERT statement and the types and values of all arguments. The wrapper would also have access to the table definitions, because we create all tables in code. We currently do not have concurrency concerns, so a more complex solution using multiple statements would be fine.

Has anybody ever done something like this? Is there an SQL way to achieve this?
Or another programmatic way?

All the best,

Mario Emmenlauer

--
BioDataAnalysis GmbH, Mario Emmenlauer    Tel. Buero: +49-89-74677203
Balanstr. 43                  mailto: memmenlauer * biodataanalysis.de
D-81669 München                      http://www.biodataanalysis.de/
Re: Christopher Browne
On Wed, Nov 4, 2020 at 06:29:18PM -0500, Steve Singer wrote:
> It is with much sadness that I am letting the community know that Chris
> Browne passed away recently.
>
> Chris had been a long time community member and was active on various
> Postgresql mailing lists. He was a member of the funds committee, the PgCon
> program committee and served on the board of the PostgreSQL Community
> Association of Canada. Chris was a maintainer of the Slony replication
> system and worked on various PG related tools.
>
> I worked with Chris for over 10 years and anytime someone had a problem he
> would jump at the chance to help and propose solutions. He always had time
> to listen to your problem and offer ideas or explain how something worked.

FYI, the funeral service is online today at 11am US/Eastern time (in one hour):

https://youtu.be/hOgMwmFYJM4

The URL was listed on the web page of the obituary:

https://www.arbormemorial.ca/capital/obituaries/christopher-bruce-browne/57436/

--
Bruce Momjian  https://momjian.us
EnterpriseDB   https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
Re: Discovering postgres binary directory location
On any given server there could be zero, one, or many PGHOME/bin locations. The OP wants to identify all of them. The default location used by package-based installers is different from the default location of software built from source, and when building from source you can specify a non-default location. My server has PG 9.6, 12.1, and 13 from RPM and also 12.1 from source.

The solution might be as easy as finding all installations of a core PostgreSQL executable like pg_ctl. I would not search for psql, since you will find one or more links in system folders, and so your monitoring tool would need some logic to filter out the links. The word postgres is both an executable and a user directory name. Hence I am using pg_ctl in my example. This example was written on CentOS 7.7.

# find / -name pg_ctl
/usr/pgsql-13/bin/pg_ctl
/usr/local/pgsql/bin/pg_ctl
/usr/pgsql-12/bin/pg_ctl
/usr/pgsql-9.6/bin/pg_ctl
/root/Downloads/postgresql-12.1/src/bin/pg_ctl
/root/Downloads/postgresql-12.1/src/bin/pg_ctl/pg_ctl
You have mail in /var/spool/mail/root

Notice above that my server has three RPM installs and one source code install. Also notice the results contain two false positives (in the Downloads directory) and also a message about mail. You'll have to figure out how to separate the good and bad results.

To strip off the file name and only return the directory, you can use the -printf option as shown below, or if it is not supported on your system, use a host command like sed or awk. This does not remove false positives or system messages.

find / -name 'pg_ctl' -printf "%h\n"
/usr/pgsql-13/bin
/usr/local/pgsql/bin
/usr/pgsql-12/bin
/usr/pgsql-9.6/bin
/root/Downloads/postgresql-12.1/src/bin
/root/Downloads/postgresql-12.1/src/bin/pg_ctl
You have mail in /var/spool/mail/root

Be careful when stripping out the false positives. PostgreSQL installed from source can be pretty much anywhere, including a Downloads directory, /tmp, and so on.
In my case, the Downloads directory has a src sub-directory, which tells me it is just a staging area for source code.

-Mark

On Thu, Nov 12, 2020 at 1:22 AM Paul Förster wrote:
> Hi Raul, hi Adrian,
>
> > On 11. Nov, 2020, at 23:26, Adrian Klaver wrote:
> > On 11/11/20 2:22 PM, Raul Kaubi wrote:
> >> Hi
> >> CentOS 7
> >> Postgres 9 to 12
> >> I am looking for ways to universally discover the postgresql binary
> >> directory for monitoring purposes.
> >> For example postgres 12, it is: /usr/pgsql-12
> >
> > pg_config --bindir
> > /usr/local/pgsql12/bin
>
> or by query:
>
> postgres=# select setting from pg_config where name = 'BINDIR';
>          setting
> -------------------------
>  /data/postgres/12.4/bin
> (1 row)
>
> Cheers,
> Paul
Re: Discovering postgres binary directory location
Hi Mark,

> On 12. Nov, 2020, at 16:19, Mark Johnson wrote:
>
> # find / -name pg_ctl
> /usr/pgsql-13/bin/pg_ctl
> /usr/local/pgsql/bin/pg_ctl
> /usr/pgsql-12/bin/pg_ctl
> /usr/pgsql-9.6/bin/pg_ctl
> /root/Downloads/postgresql-12.1/src/bin/pg_ctl
> /root/Downloads/postgresql-12.1/src/bin/pg_ctl/pg_ctl
> You have mail in /var/spool/mail/root

how about searching for pg_ctl only inside a bin directory:

$ find / -type f -name "pg_ctl" -exec grep "/bin/" {} \; 2>/dev/null
Binary file /data/postgres/12.4/bin/pg_ctl matches
Binary file /data/postgres/13.0/bin/pg_ctl matches

That should also solve your source tree and root mail problems.

Cheers,
Paul
Re: Discovering postgres binary directory location
Hi Mark,

> On 12. Nov, 2020, at 16:37, Paul Förster wrote:
>
> how about searching for pg_ctl only inside a bin directory:
>
> $ find / -type f -name "pg_ctl" -exec grep "/bin/" {} \; 2>/dev/null
> Binary file /data/postgres/12.4/bin/pg_ctl matches
> Binary file /data/postgres/13.0/bin/pg_ctl matches
>
> That should also solve your source tree and root mail problems.

btw., you can also do it without calling grep (note that find -regex matches against the whole path, so the pattern needs a leading ".*"):

$ find / -type f -executable -regex ".*/bin/pg_ctl" 2>/dev/null

At least on openSUSE. But I guess it should be pretty much the same on CentOS.

Cheers,
Paul
Re: Is it possible to write a generic UPSERT?
> On 12 Nov 2020, at 14:58, Mario Emmenlauer wrote:

(…)

> But the statement is slightly complex to type, and I find me and my
> colleagues often spend more time on this than I would hope. Our two
> main challenges are:
> (1) we have to look up the uniqueness constraints on the table, and
> (2) we have to duplicate the insert statement in the UPDATE section
>     again, because virtually all fields should get overwritten
>     (except for the conflicting ones). On long inserts this can be
>     quite annoying and error-prone.
>
> I can see how "ON CONFLICT" is very powerful. But that power seems
> often a burden for us. We would prefer something that is less manual
> effort for the specific use case. Basically, we would like:
>     INSERT if not exist, and
>     UPDATE _all_ non-conflicting fields in case of _any_ conflict
>
> In my (naive) thinking, such a construct would cover 99% of our
> use cases. Or did other people make very different experiences?

(…)

> Has anybody ever done something like this? Is there an SQL way to
> achieve this?

We generate the SQL @work based on the definitions in, IIRC, the information_schema. It has tables for both the column lists per table and the primary key definitions.

With that, an SQL statement that returns the required SQL statement is easy to generate, after which you can execute it either from a plpgsql EXECUTE statement in a function or in a do-block.

We do this in plpgsql, but that's mostly because this code is part of our ETL process and it has to perform some other logic on the same data anyway.

I could look up our code for you tomorrow, but that looks to be a busy day, so I can't promise.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
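[Editor's note] The generation idea above can also be sketched outside the database. Below is a minimal, illustrative Python sketch; the table and column names are made up, and the metadata is passed in directly where a real implementation would query information_schema.columns and information_schema.key_column_usage (and would need to quote identifiers properly):

```python
def generate_upsert(table, columns, key_columns):
    """Build a generic INSERT ... ON CONFLICT DO UPDATE for `table`.

    `columns` and `key_columns` would normally be read from the
    information_schema; here they are passed in directly to keep the
    sketch self-contained. Placeholders use psycopg2-style %(name)s.
    """
    non_keys = [c for c in columns if c not in key_columns]
    col_list = ", ".join(columns)
    placeholders = ", ".join("%({})s".format(c) for c in columns)
    # Every non-key column is overwritten from the proposed row (EXCLUDED).
    assignments = ", ".join("{0} = EXCLUDED.{0}".format(c) for c in non_keys)
    return (
        "INSERT INTO {t} ({cols}) VALUES ({vals}) "
        "ON CONFLICT ({keys}) DO UPDATE SET {sets}".format(
            t=table, cols=col_list, vals=placeholders,
            keys=", ".join(key_columns), sets=assignments,
        )
    )

sql = generate_upsert("items", ["id", "name", "price"], ["id"])
print(sql)
```

This covers the "update all non-conflicting fields on any conflict" case for a single unique constraint; multiple unique constraints would still need a decision about which one to target in the ON CONFLICT clause.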
Re: Is it possible to write a generic UPSERT?
On 12.11.20 17:55, Alban Hertroys wrote:
>> On 12 Nov 2020, at 14:58, Mario Emmenlauer wrote:
>
> (…)
>
>> But the statement is slightly complex to type, and I find me and my
>> colleagues often spend more time on this than I would hope. Our two
>> main challenges are:
>> (1) we have to look up the uniqueness constraints on the table, and
>> (2) we have to duplicate the insert statement in the UPDATE section
>>     again, because virtually all fields should get overwritten
>>     (except for the conflicting ones). On long inserts this can be
>>     quite annoying and error-prone.
>>
>> I can see how "ON CONFLICT" is very powerful. But that power seems
>> often a burden for us. We would prefer something that is less manual
>> effort for the specific use case. Basically, we would like:
>>     INSERT if not exist, and
>>     UPDATE _all_ non-conflicting fields in case of _any_ conflict
>>
>> In my (naive) thinking, such a construct would cover 99% of our
>> use cases. Or did other people make very different experiences?
>
> (…)
>
>> Has anybody ever done something like this? Is there an SQL way to
>> achieve this?
>
> We generate the SQL @work based on the definitions in, IIRC, the
> information_schema. It has tables for both the column lists per table and the
> primary key definitions.
>
> With that, an SQL statement that returns the required SQL statement is easy
> to generate, after which you can execute it either from a plpgsql execute
> statement in a function or in a do-block.

This is actually a very, very interesting idea! I did not consider that we could completely generate the statements based on the actual table information from the information_schema. I need to give this a bit more thought, but I very much like the idea. Thanks for pushing me in a new direction!

All the best,

Mario Emmenlauer

--
BioDataAnalysis GmbH, Mario Emmenlauer    Tel. Buero: +49-89-74677203
Balanstr. 43                  mailto: memmenlauer * biodataanalysis.de
D-81669 München                      http://www.biodataanalysis.de/
Re: Is it possible to write a generic UPSERT?
On Thu, Nov 12, 2020 at 6:58 AM Mario Emmenlauer wrote:
> I can see how "ON CONFLICT" is very powerful. But that power seems
> often a burden for us. We would prefer something that is less manual
> effort for the specific use case. Basically, we would like:
>     INSERT if not exist, and
>     UPDATE _all_ non-conflicting fields in case of _any_ conflict

If you do not have a significant ratio of HOT (heap-only tuple) updates that you want to preserve, and you don't have sequences that are GENERATED ALWAYS (rather than BY DEFAULT), you could consider just doing a DELETE where the keys exist, then inserting all the rows. It should be trivial to figure out the primary key or other unique index to match on.

The MERGE command is implemented for this use case in some DBMSs, but not in Postgres (yet?).
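[Editor's note] The delete-then-insert variant can be generated the same way as the ON CONFLICT form. A minimal, illustrative sketch (table and column names are made up; since the OP has no concurrency concerns, a plain transaction wrapper suffices here, and the caveats above about HOT updates and GENERATED ALWAYS columns still apply):

```python
def generate_delete_insert(table, columns, key_columns):
    """Build the delete-then-insert variant of an upsert as one transaction.

    Rows are matched on `key_columns` and fully rewritten, which is why
    this approach loses HOT update chains and does not suit GENERATED
    ALWAYS identity columns, per the caveats above. Placeholders use
    psycopg2-style %(name)s.
    """
    where = " AND ".join("{0} = %({0})s".format(c) for c in key_columns)
    cols = ", ".join(columns)
    vals = ", ".join("%({})s".format(c) for c in columns)
    return (
        "BEGIN;\n"
        "DELETE FROM {t} WHERE {w};\n"
        "INSERT INTO {t} ({c}) VALUES ({v});\n"
        "COMMIT;"
    ).format(t=table, w=where, c=cols, v=vals)

stmt = generate_delete_insert("items", ["id", "name", "price"], ["id"])
print(stmt)
```

Unlike the ON CONFLICT form, this never has to name a specific unique constraint, which is part of its appeal for the "any conflict" case.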
Re: Encryption with customer provided key in a multi tenant Postgres JSONB DB
On Thu, Nov 12, 2020 at 11:57:27AM +0530, Saurav Sarkar wrote:
> Hi David,
>
> Thanks for the reply.
>
> I just wanted to check if there is any possibility or any activity ongoing
> which can enable database or fine granular level encryption in future.
>
> Probably then i can wait, otherwise i have to move towards Client Side
> encryption as you mentioned.

Postgres shares the same WAL files for all databases in a cluster, so the idea of having multiple keys for different users is very hard or impossible to implement. Client-side is much better for this use case.

--
Bruce Momjian  https://momjian.us
EnterpriseDB   https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
Re: Is it possible to write a generic UPSERT?
On 12.11.20 at 18:34, Michael Lewis wrote:
> MERGE command is implemented for this use case in some DBMS, but not
> Postgres (yet?).

MERGE is available in 2ndQPostgres, but that's not open source (it's available to 2ndQ customers on request).

Regards, Andreas

--
2ndQuadrant, an EDB company
www.2ndQuadrant.com / www.enterprisedb.com
Re: Encryption with customer provided key in a multi tenant Postgres JSONB DB
On Wed, Nov 11, 2020 at 10:49 PM Saurav Sarkar wrote:
> We have a multi tenant application where for each tenant we create
> separate tables. So for e.g. if i have 100 tenants then i have 100 tables.

Depending on how many tables each client gets and what you expect your tenant growth to be like, you will likely need to consider separate databases at some point for better stats and to ensure the system handles catalog table lookups and such.
Packages?
Hi,

Is there a chance we will get packages (like those in Oracle) in PostgreSQL?

They are kind of nice to have. The main advantage from my point of view is the ability to grant execute on a single package to a user, so they can then run all procedures in the package.
Re: Packages?
Hi

On Fri, Nov 13, 2020 at 8:29, Post Gresql wrote:
> Hi,
>
> Is there a chance we will get packages (like those in Oracle) in PostgreSQL?

The probability is very low - Postgres has an overlapping feature: schemas. A schema (in Postgres) can be used like an Oracle package. Orafce uses schemas instead of packages well.

> They are kind of nice to have. The main advantage from my point of view
> is the ability to grant execute on a single package to a user and then
> they can run all procedures in the package.

Without the USAGE right on a schema, nobody can use a function or procedure from it.

Regards

Pavel