postgres: WAL ends befor end of online backup

2020-11-12 Thread Moorthy RS
We are running postgres 9.6, with 10+TB size. Backups have been taken using
a homegrown tool "pgrsync", which uses S3 as the repository. Both the
backup files and WAL archives are stored on S3.

Problem: While trying to restore, the WAL archives restore randomly fails
for some backups with the following message in logs:

2020-11-12 06:33:32 UTC [10037]: [27988-1] user=,db=LOG:  redo done at
5493D/2EFFF568
2020-11-12 06:33:32 UTC [10037]: [27989-1] user=,db=LOG:  last
completed transaction was at log time 2020-11-06 12:31:27.796805+00
2020-11-12 06:33:34 UTC [10037]: [27990-1] user=,db=LOG:  restored log
file "00020005493D002E" from archive
2020-11-12 06:33:34 UTC [10037]: [27991-1] user=,db=FATAL:  WAL ends
before end of online backup
 2020-11-12 06:33:34
UTC [10037]: [27992-1] user=,db=HINT:  All WAL generated while online
backup was taken must be available at recovery.
2020-11-12 06:33:36 UTC [10033]: [3-1]
user=,db=LOG:  startup process (PID 10037) exited with exit code 1

2020-11-12 06:33:36 UTC [10033]: [4-1] user=,db=LOG:
terminating any other active server processes

2020-11-12 06:33:48 UTC [10033]: [5-1] user=,db=LOG:  database system
is shut down

In this case, the backup start location is 0002000544C6006B and
stop location is 000200054521008D, (based on pg_stop_backup()
output) but it stops in between at 005493D and terminates the restore. If I
redo the restore again, it stops exactly at the same point. Similar results
from couple of more backups, while other backups successfully restores.

It is possibly an indication of some specific WAL files got corrupted
during the backup/restore process. Is that the correct interpretation?

Questions:

   1. Are there any ways of identifying if the WAL file is corrupted?
   2. Is there a way to move ahead without losing data? (I am wary of using
   pg_resetxlog)


Is it possible to write a generic UPSERT?

2020-11-12 Thread Mario Emmenlauer


Dear all,

I hope this request is not too naiive. Please ignore if its not
matching this lists focus, or if it just shows my lack of SQL
knowledge. Any pointer to documentation would be appreciated!

My question is about the "ON CONFLICT UPDATE" clause. We often have
one dedicated condition that we want to achieve: Insert an item if
it does not exist, or update all given fields if it does exist.

Your excellent documentation makes it easy to implement this in SQL.

But the statement is slightly complex to type, and I find me and my
colleagues often spend more time on this than I would hope. Our two
main challenges are:
(1) we have to look up the uniqueness constraints on the table, and
(2) we have to duplicate the insert statement in the UPDATE section
again, because virtually all fields should get overwritten
(except for the conflicting ones). On long inserts this can be
quite annoying and error-prone.

I can see how "ON CONFLICT" is very powerful. But that power seems
often a burden for us. We would prefer something that is less manual
effort for the specific use case. Basically, we would like:
INSERT if not exist, and
UPDATE _all_ non-conflicting fields in case of _any_ conflict

In my (naiive) thinking, such a construct would cover 99% of our
use cases. Or did other people make very different experiences?


Now the question: I fail to generically implement this. I do not
necessarily mean that this has to be supported by PostgreSQL. Any
solution would work for us, be it in SQL, PostgreSQL or C/C++. For
example an acceptable solution would be a C/C++ wrapper method that
generates the statement, given the corresponding INSERT statement and
the types and values of all arguments. The wrapper would also have
access to the table definitions because we create all tables in code.

We currently do not have concurrency concerns so a more complex
solution using multiple statements would be fine.

Has anybody ever done something like this? Is there an SQL way to
achieve this? Or another programmatic way?

All the best,

Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer  Tel. Buero: +49-89-74677203
Balanstr. 43   mailto: memmenlauer * biodataanalysis.de
D-81669 München  http://www.biodataanalysis.de/




Re: Christopher Browne

2020-11-12 Thread Bruce Momjian
On Wed, Nov  4, 2020 at 06:29:18PM -0500, Steve Singer wrote:
> 
> It is with much sadness that I am letting the community know that Chris
> Browne passed away recently.
> 
> Chris had been a long time community member and was active on various
> Postgresql mailing lists.  He was a member of the funds committee, the PgCon
> program committee and served on the board of the PostgreSQL Community
> Association of Canada. Chris was a maintainer of the Slony replication
> system and worked on various PG related tools.
> 
> I worked with Chris for over 10 years and anytime someone had a problem he
> would jump at the chance to help and propose solutions. He always had time
> to listen to your problem and offer ideas or explain how something worked.

FYI, the funeral service is online today at 11am US/Eastern time (in one
hour):

https://youtu.be/hOgMwmFYJM4

The URL was listed on the web page of the obituary:


https://www.arbormemorial.ca/capital/obituaries/christopher-bruce-browne/57436/

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Discovering postgres binary directory location

2020-11-12 Thread Mark Johnson
On any given server there could be zero, one, or many PGHOME/bin
locations.  The OP wants to identify all of them.  The default location
used by package-based installers is different from the default location of
software built from source, and when building from source you can specify a
non-default location.  My server has PG 9.6, 12.1, and 13 from RPM and also
12.1 from source.

The solution might be as easy as finding all installations of a core
PostgreSQL executable like pg_ctl.  I would not search for psql since you
will find one or more links in system folders and so your monitoring tool
would need some logic to filter out the links.  The word postgres is both
an executable and a user directory name.  Hence I am using pg_ctl in my
example.  This example was written on CentOS 7.7.

# find / -name pg_ctl
/usr/pgsql-13/bin/pg_ctl
/usr/local/pgsql/bin/pg_ctl
/usr/pgsql-12/bin/pg_ctl
/usr/pgsql-9.6/bin/pg_ctl
/root/Downloads/postgresql-12.1/src/bin/pg_ctl
/root/Downloads/postgresql-12.1/src/bin/pg_ctl/pg_ctl
You have mail in /var/spool/mail/root

Notice above my server has three RPM installs and one source code install.
Also notice the results contain two false positives (in the Downloads
directory) and also a message about mail.  You'll have to figure out how to
separate the good and bad results.

To strip off the file name and only return the directory, you can used the
-printf option as shown below or if not supported on your system use a host
command like sed or awk.  This does not remove false positives or system
messages.

find / -name 'pg_ctl' -printf "%h\n"
/usr/pgsql-13/bin
/usr/local/pgsql/bin
/usr/pgsql-12/bin
/usr/pgsql-9.6/bin
/root/Downloads/postgresql-12.1/src/bin
/root/Downloads/postgresql-12.1/src/bin/pg_ctl
You have mail in /var/spool/mail/root

Careful when stripping out the false positives.  PostgreSQL installed from
source can be pretty much anywhere including a Downloads directory, /tmp,
and so on.  In my case, the Downloads directory has a src sub-directory,
which tells me it is just a staging area for source code.

-Mark

On Thu, Nov 12, 2020 at 1:22 AM Paul Förster 
wrote:

> Hi Raul, hi Adrian,
>
> > On 11. Nov, 2020, at 23:26, Adrian Klaver 
> wrote:
> >
> > On 11/11/20 2:22 PM, Raul Kaubi wrote:
> >> Hi
> >> CentOS 7
> >> Postgres 9 to 12
> >> I am looking ways to universally discover postgresql binary directory
> for monitoring purpose.
> >> For example postgres 12, it is: */usr/pgsql-12*
> >
> > pg_config --bindir
> > /usr/local/pgsql12/bin
>
> or by query:
>
> postgres=# select setting from pg_config where name = 'BINDIR';
>  setting
> -
>  /data/postgres/12.4/bin
> (1 row)
>
> Cheers,
> Paul
>
>


Re: Discovering postgres binary directory location

2020-11-12 Thread Paul Förster
Hi Mark,

> On 12. Nov, 2020, at 16:19, Mark Johnson  wrote:
> 
> # find / -name pg_ctl
> /usr/pgsql-13/bin/pg_ctl
> /usr/local/pgsql/bin/pg_ctl
> /usr/pgsql-12/bin/pg_ctl
> /usr/pgsql-9.6/bin/pg_ctl
> /root/Downloads/postgresql-12.1/src/bin/pg_ctl
> /root/Downloads/postgresql-12.1/src/bin/pg_ctl/pg_ctl
> You have mail in /var/spool/mail/root

how about searching for pg_ctl only inside a bin directory:

$ find / -type f -name "pg_ctl" -exec grep "/bin/" {} \; 2>/dev/null 
Binary file /data/postgres/12.4/bin/pg_ctl matches
Binary file /data/postgres/13.0/bin/pg_ctl matches

That should also solve your source tree and root mail problems.

Cheers,
Paul




Re: Discovering postgres binary directory location

2020-11-12 Thread Paul Förster
Hi Mark,

> On 12. Nov, 2020, at 16:37, Paul Förster  wrote:
> 
> how about searching for pg_ctl only inside a bin directory:
> 
> $ find / -type f -name "pg_ctl" -exec grep "/bin/" {} \; 2>/dev/null 
> Binary file /data/postgres/12.4/bin/pg_ctl matches
> Binary file /data/postgres/13.0/bin/pg_ctl matches
> 
> That should also solve your source tree and root mail problems.

btw., you can also do it without calling grep:

$ find / -type f -executable -regex "*/bin/pg_ctl" 2>/dev/null

At least on openSUSE. But I guess, it should be pretty much the same on CentOS.

Cheers,
Paul



Re: Is it possible to write a generic UPSERT?

2020-11-12 Thread Alban Hertroys


> On 12 Nov 2020, at 14:58, Mario Emmenlauer  wrote:

(…)

> But the statement is slightly complex to type, and I find me and my
> colleagues often spend more time on this than I would hope. Our two
> main challenges are:
> (1) we have to look up the uniqueness constraints on the table, and
> (2) we have to duplicate the insert statement in the UPDATE section
>again, because virtually all fields should get overwritten
>(except for the conflicting ones). On long inserts this can be
>quite annoying and error-prone.
> 
> I can see how "ON CONFLICT" is very powerful. But that power seems
> often a burden for us. We would prefer something that is less manual
> effort for the specific use case. Basically, we would like:
>INSERT if not exist, and
>UPDATE _all_ non-conflicting fields in case of _any_ conflict
> 
> In my (naiive) thinking, such a construct would cover 99% of our
> use cases. Or did other people make very different experiences?

(…)

> Has anybody ever done something like this? Is there an SQL way to
> achieve this? Or another programmatic way?

We generate the SQL @work based on the definitions in, IIRC, the 
information_schema. It has tables for both the column lists per table and the 
primary key definitions.

With that, an SQL statement that returns the required SQL statement is easy to 
generate, after which you can execute it either from a plpgsql execute 
statement in a function or in a do-block.

We do this in plpgsql, but that’s mostly because this code is part of our ETL 
process and it has to perform some other logic on the same data anyway. I could 
look up our code for you tomorrow, but that looks to be a busy day, so I can’t 
promise.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Is it possible to write a generic UPSERT?

2020-11-12 Thread Mario Emmenlauer
On 12.11.20 17:55, Alban Hertroys wrote:
>> On 12 Nov 2020, at 14:58, Mario Emmenlauer  wrote:
> 
> (…)
> 
>> But the statement is slightly complex to type, and I find me and my
>> colleagues often spend more time on this than I would hope. Our two
>> main challenges are:
>> (1) we have to look up the uniqueness constraints on the table, and
>> (2) we have to duplicate the insert statement in the UPDATE section
>>again, because virtually all fields should get overwritten
>>(except for the conflicting ones). On long inserts this can be
>>quite annoying and error-prone.
>>
>> I can see how "ON CONFLICT" is very powerful. But that power seems
>> often a burden for us. We would prefer something that is less manual
>> effort for the specific use case. Basically, we would like:
>>INSERT if not exist, and
>>UPDATE _all_ non-conflicting fields in case of _any_ conflict
>>
>> In my (naiive) thinking, such a construct would cover 99% of our
>> use cases. Or did other people make very different experiences?
> 
> (…)
> 
>> Has anybody ever done something like this? Is there an SQL way to
>> achieve this? Or another programmatic way?
> 
> We generate the SQL @work based on the definitions in, IIRC, the 
> information_schema. It has tables for both the column lists per table and the 
> primary key definitions.
> 
> With that, an SQL statement that returns the required SQL statement is easy 
> to generate, after which you can execute it either from a plpgsql execute 
> statement in a function or in a do-block.


This is actually a very very interesting idea! I did not consider
that we could completely generate the statements based on the actual
table information from the information_schema. I need to give this a
bit more thought but I very much like the idea, thanks for pushing me
in a new direction!

All the best,

Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer  Tel. Buero: +49-89-74677203
Balanstr. 43   mailto: memmenlauer * biodataanalysis.de
D-81669 München  http://www.biodataanalysis.de/




Re: Is it possible to write a generic UPSERT?

2020-11-12 Thread Michael Lewis
On Thu, Nov 12, 2020 at 6:58 AM Mario Emmenlauer 
wrote:

> I can see how "ON CONFLICT" is very powerful. But that power seems
> often a burden for us. We would prefer something that is less manual
> effort for the specific use case. Basically, we would like:
> INSERT if not exist, and
> UPDATE _all_ non-conflicting fields in case of _any_ conflict
>

If you do not have significant ratio of HOT (heap only tuple) updates that
you want to preserve and you don't have sequences that are GENERATED AS
ALWAYS (rather than BY DEFAULT), you could consider just doing a DELETE
where the keys exist, then insert all the rows. It should be trivial to
figure out the primary key or other unique index to match on.

MERGE command is implemented for this use case in some DBMS, but not
Postgres (yet?).


Re: Encryption with customer provided key in a multi tenant Postgres JSONB DB

2020-11-12 Thread Bruce Momjian
On Thu, Nov 12, 2020 at 11:57:27AM +0530, Saurav Sarkar wrote:
> Hi David,
> 
> Thanks for the reply.
> 
> I just wanted to check if there is any possibility or any activity ongoing
> which can enable database or fine granular level encryption in future.
> 
> Probably then i can wait otherwise i have to move towards Client Side
> encryption as you mentioned.

Postgres shared the same WAL files for all databases in a cluster, so
the idea of having multiple keys for different users is very hard or
impossible to implement.  Client-side is much better for this use-case.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Is it possible to write a generic UPSERT?

2020-11-12 Thread Andreas Kretschmer




Am 12.11.20 um 18:34 schrieb Michael Lewis:
MERGE command is implemented for this use case in some DBMS, but not 
Postgres (yet?).


MERGE is available in 2ndqPostgres, but that's not open source (it's 
available for 2ndQ-customers on request).



Regards, Andreas

--
2ndQuadrant, an EDB company
www.2ndQuadrant.com / www.enterprisedb.com





Re: Encryption with customer provided key in a multi tenant Postgres JSONB DB

2020-11-12 Thread Michael Lewis
On Wed, Nov 11, 2020 at 10:49 PM Saurav Sarkar 
wrote:

> We have a multi tenant application where for each tenant we create
> separate tables . So for e.g. if i have 100 tenants then i have 100 tables.
>

Depending how many tables each client gets and what you expect your tenant
growth may be like, you will likely need to consider separate databases at
some point for better stats and to ensure the system handles catalog table
lookups and such.


Packages?

2020-11-12 Thread Post Gresql

Hi,

Is there chance we will get packages (like those in Oracle) in PostgreSQL?

They are kind of nice to have. The main advantage from my point of view 
is the ability to grant execute to a single package to a user and then 
they can run all procedures in the package.






Re: Packages?

2020-11-12 Thread Pavel Stehule
Hi

pá 13. 11. 2020 v 8:29 odesílatel Post Gresql 
napsal:

> Hi,
>
> Is there chance we will get packages (like those in Oracle) in PostgreSQL?
>

This probability is very low - Postgres has redundant feature - schema.
Schema (in Postgres) can be used like Oracle's packages.  Orafce uses
schemas instead packages well.


> They are kind of nice to have. The main advantage from my point of view
> is the ability to grant execute to a single package to a user and then
> they can run all procedures in the package.
>

Without usage right nobody can use function or procedure from schema.

Regards

Pavel