Re: Unable to init and run postgresql-12 on centos 8.2.2004

2020-06-21 Thread Srinivasa T N
On Sun, Jun 21, 2020, 8:57 AM Vishal Agrawal 
wrote:

> Hello,
>
> I'm trying to install PostgreSQL 12 on CentOS 8.2 as follows -
>
> First I run a Docker container on the host -
> # docker run --privileged -dit --name pgdg centos:8.2.2004 /sbin/init
>
> Then inside the container, I run the commands given below -
>
> # yum -y install glibc-langpack-en glibc-locale-source
> # localedef -i en_US -f UTF-8 en_US.UTF-8
>
> #  dnf install
> https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
>
>
> #  dnf -qy module disable postgresql
>
> #  dnf install postgresql12-server
>
> #  /usr/pgsql-12/bin/postgresql-12-setup initdb
>
> #  systemctl enable postgresql-12
>
> #  systemctl start postgresql-12
>
> PostgreSQL starts at this point...
>
> However, when I try to stop it, it just hangs -
>
> # systemctl stop postgresql-12 (hangs)
>
> Can someone help me resolve this?
>
> Thanks,
> Vishal
>

Looking at logs might help.

Regards,
Seenu.

>


Definition of REPEATABLE READ

2020-06-21 Thread Peter J. Holzer
I've read http://jepsen.io/analyses/postgresql-12.3 which reports a
problem in PostgreSQL's SERIALIZABLE isolation level (which has since been
fixed) and also shows an example of a violation of what they consider to
be the correct definition of REPEATABLE READ.

Specifically (in Jepsen's notation, "r k [list]" reads the list at key k
and "a k v" appends v to it):

T1: r 190 [1 2] | r 188 [4 5] | a 188 8

T2: a 190 4 | a 190 5 

T3: r 190 [1 2 4 5] | r 188 [4 5]

This violates G2-item in Adya et al.[1]: There is a dependency cycle
(T1.1 rw-> T2.1, T2.2 wr-> T3.1, T3.2 rw-> T1.3) with at least one rw
anti-dependency.

It is, however, fine in PostgreSQL's implementation of REPEATABLE READ:

T1, T2 and T3 all run BEGIN ISOLATION LEVEL REPEATABLE READ and then
execute these steps in this order:

T1: SELECT * FROM t WHERE id = 190;
     id  | v
     190 | 1 2

T1: SELECT * FROM t WHERE id = 188;
     id  | v
     188 | 4 5

T2: UPDATE t SET v = v || ' 4' WHERE id = 190;
T2: UPDATE t SET v = v || ' 5' WHERE id = 190;
T2: COMMIT;

T3: SELECT * FROM t WHERE id = 190;
     id  | v
     190 | 1 2 4 5

T3: SELECT * FROM t WHERE id = 188;
     id  | v
     188 | 4 5

T3: COMMIT;

T1: UPDATE t SET v = v || ' 8' WHERE id = 188;
T1: COMMIT;
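
A minimal setup that reproduces this (the DDL isn't shown above, so this
is a reconstruction):

    CREATE TABLE t (id integer PRIMARY KEY, v text);
    INSERT INTO t VALUES (190, '1 2'), (188, '4 5');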

It also doesn't appear to violate the wording in SQL/92, which forbids

 1) P1 ("Dirty read"): SQL-transaction T1 modifies a row. SQL-
transaction T2 then reads that row before T1 performs a COMMIT.
If T1 then performs a ROLLBACK, T2 will have read a row that was
never committed and that may thus be considered to have never
existed.

 2) P2 ("Non-repeatable read"): SQL-transaction T1 reads a row. SQL-
transaction T2 then modifies or deletes that row and performs
a COMMIT. If T1 then attempts to reread the row, it may receive
the modified value or discover that the row has been deleted.

for REPEATABLE READ (but allows P3 (Phantom reads)).

As far as I understand it, Adya et al. follow Berenson et al.[2] in that
the ANSI committee didn't mean what they wrote. Since excluding P1, P2
and P3 isn't sufficient to achieve serializability and the standard is
explicit that SERIALIZABLE should indeed achieve that, they argue that
REPEATABLE READ was really meant to exclude all phenomena except P3. So
for Adya et al., only (G2 - G2-item) should be allowed.

Since those papers are now 25 and 20 years old, respectively, and there
have been several revisions of the SQL standard in the meantime, has the
SQL committee come around to that view (SQL/2003 seems unchanged, I
couldn't find a later revision online)? And if it has, did it follow
those definitions or come up with different ones (it seems to me that
G2-item is much stronger than warranted by the wording in the standard)?

hp


[1] http://pmg.csail.mit.edu/papers/icde00.pdf
[2] https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




The backup API and general purpose backup software

2020-06-21 Thread Peter J. Holzer
This is inspired by the thread with the subject "Something else about
Redo Logs disappearing", but since that thread is already quite long,
since I have lost track of what exactly "Peter"'s problem is, and since
his somewhat belligerent tone makes it unappealing to reread the whole
thread, I'll attempt a fresh start.

To make a full backup with the "new" (non-exclusive) API, software
must do the following:

1. open a connection to the database

2. invoke pg_start_backup('label', false, false) in the connection from
   step 1.

3. copy the contents of the data directory to the backup medium

4. invoke pg_stop_backup(false, true) in the connection from step 1.

5. copy the result from the previous step to the backup medium.

(It is assumed that any archived WALs are also archived in a manner that
they can be restored together with this backup. If this is not the case
adding them to the backup would be step 6.)
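
In SQL terms, steps 2 and 4 look like this (a sketch; both calls must run
on the connection from step 1, and the file copy in step 3 happens
outside the database):

    -- step 2: start the backup in non-exclusive mode (third argument false)
    SELECT pg_start_backup('label', false, false);

    -- step 3: copy $PGDATA to the backup medium (not SQL)

    -- step 4: stop the backup, waiting for the WAL to be archived;
    -- the returned labelfile and spcmapfile are what step 5 must save
    SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false, true);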

So far so good, and writing a program which implements this should not
be very difficult.

General purpose backup software often assumes that it can perform a
backup in three steps:

1. Invoke a pre-backup script.

2. Copy files to the backup medium in such a way that they can be
   identified as a group and be restored together and without mixing
   them with data from another backup session.

3. Invoke a post-backup script.

(I have to admit that it has been a long time since I've looked at any
backup system in any detail. Doubtlessly many have a more complicated
model, but I'm fairly confident that this is still the lowest common
denominator.)

Now we have two problems.

The first is that the pre-backup script has to exit before the proper
backup can begin, but it also has to open a connection which will stay
open during the backup because it will be needed in the post-backup
script again. This can be solved by starting a background process which
keeps the connection open and providing some communication channel
(perhaps a Unix pipe) for the post-backup script to communicate with
this background process. A bit awkward, but no big deal.

The second problem is that the post-backup script will only be called
after the backup is finished, so the information returned by
pg_stop_backup cannot be included in the backup, which makes the backup
useless. This is indeed serious, and I don't see a way around it in this
simple model.

But there is a workaround: Split the backup.

I am assuming that the backup software uses a unique ID to identify each
backup and passes that ID to the pre-backup and post-backup script.
This ID is used as the label in the call to pg_start_backup(). 
It is also included in the information returned by pg_stop_backup(). So
the post-backup script stores that information in a different place
(maybe including the ID in the filename(s) to avoid conflicts and for
redundancy) and then triggers a backup of that place (or alternatively
that can be backed up independently).

To restore a backup, you will then need three steps:

1) Restore the summary information from the second backup. Inspect the
   backup_label to find the ID of the backup of the data.

2) Restore that backup.

3) Put the backup label where it belongs, make sure the archived WALs
   are accessible, and start the database.

hp



-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




Re: Unable to init and run postgresql-12 on centos 8.2.2004

2020-06-21 Thread Vishal Agrawal
Seenu, I don't see any journalctl entry other than the message that the
postmaster has been killed (Jun 21 14:52:30 f9cbc3b08e20 systemd[1]:
postgresql-12.service: Killing process 1096 (postmaster) with signal
SIGKILL.)
Any further suggestions on where to look?

It does seem to work with systemd-239-18 (8.1.1911) but fails
with systemd-239-30 (in the 8.2.2004 repo).

On Sun, Jun 21, 2020 at 6:05 AM Srinivasa T N  wrote:

>
>
> On Sun, Jun 21, 2020, 8:57 AM Vishal Agrawal 
> wrote:
>
>> Hello,
>>
>> I'm trying to install PostgreSQL 12 on CentOS 8.2 as follows -
>>
>> First I run a Docker container on the host -
>> # docker run --privileged -dit --name pgdg centos:8.2.2004 /sbin/init
>>
>> Then inside the container, I run the commands given below -
>>
>> # yum -y install glibc-langpack-en glibc-locale-source
>> # localedef -i en_US -f UTF-8 en_US.UTF-8
>>
>> #  dnf install
>> https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
>>
>>
>> #  dnf -qy module disable postgresql
>>
>> #  dnf install postgresql12-server
>>
>> #  /usr/pgsql-12/bin/postgresql-12-setup initdb
>>
>> #  systemctl enable postgresql-12
>>
>> #  systemctl start postgresql-12
>>
>> PostgreSQL starts at this point...
>>
>> However, when I try to stop it, it just hangs -
>>
>> # systemctl stop postgresql-12 (hangs)
>>
>> Can someone help me resolve this?
>>
>> Thanks,
>> Vishal
>>
>
> Looking at logs might help.
>
> Regards,
> Seenu.
>
>>


Re: The backup API and general purpose backup software

2020-06-21 Thread Ron

Peter,

I don't understand that last step "5. copy the result from the previous step 
to the backup medium."  It seems to be a duplication of "3. copy the 
contents of the data directory to the backup medium".


On 6/21/20 8:28 AM, Peter J. Holzer wrote:

This is inspired by the thread with the subject "Something else about
Redo Logs disappearing", but since that thread is already quite long,
since I have lost track of what exactly "Peter"'s problem is, and since
his somewhat belligerent tone makes it unappealing to reread the whole
thread, I'll attempt a fresh start.

To make a full backup with the "new" (non-exclusive) API, software
must do the following:

1. open a connection to the database

2. invoke pg_start_backup('label', false, false) in the connection from
step 1.

3. copy the contents of the data directory to the backup medium

4. invoke pg_stop_backup(false, true) in the connection from step 1.

5. copy the result from the previous step to the backup medium.

(It is assumed that any archived WALs are also archived in a manner that
they can be restored together with this backup. If this is not the case
adding them to the backup would be step 6.)

So far so good, and writing a program which implements this should not
be very difficult.

General purpose backup software often assumes that it can perform a
backup in three steps:

1. Invoke a pre-backup script.

2. Copy files to the backup medium in such a way that they can be
identified as a group and be restored together and without mixing
them with data from another backup session.

3. Invoke a post-backup script.

(I have to admit that it has been a long time since I've looked at any
backup system in any detail. Doubtlessly many have a more complicated
model, but I'm fairly confident that this is still the lowest common
denominator.)

Now we have two problems.

The first is that the pre-backup script has to exit before the proper
backup can begin, but it also has to open a connection which will stay
open during the backup because it will be needed in the post-backup
script again. This can be solved by starting a background process which
keeps the connection open and providing some communication channel
(perhaps a Unix pipe) for the post-backup script to communicate with
this background process. A bit awkward, but no big deal.

The second problem is that the post-backup script will only be called
after the backup is finished, so the information returned by
pg_stop_backup cannot be included in the backup, which makes the backup
useless. This is indeed serious, and I don't see a way around it in this
simple model.

But there is a workaround: Split the backup.

I am assuming that the backup software uses a unique ID to identify each
backup and passes that ID to the pre-backup and post-backup script.
This ID is used as the label in the call to pg_start_backup().
It is also included in the information returned by pg_stop_backup(). So
the post-backup script stores that information in a different place
(maybe including the ID in the filename(s) to avoid conflicts and for
redundancy) and then triggers a backup of that place (or alternatively
that can be backed up independently).

To restore a backup, you will then need three steps:

1) Restore the summary information from the second backup. Inspect the
   backup_label to find the ID of the backup of the data.

2) Restore that backup.

3) Put the backup label where it belongs, make sure the archived WALs
   are accessible, and start the database.

 hp





--
Angular momentum makes the world go 'round.




Re: The backup API and general purpose backup software

2020-06-21 Thread Peter J. Holzer
On 2020-06-21 10:32:16 -0500, Ron wrote:
> On 6/21/20 8:28 AM, Peter J. Holzer wrote:
> > To make a full backup with the "new" (non-exclusive) API, software
> > must do the following:
> > 
> > 1. open a connection to the database
> > 
> > 2. invoke pg_start_backup('label', false, false) in the connection from
> > step 1.
> > 
> > 3. copy the contents of the data directory to the backup medium
> > 
> > 4. invoke pg_stop_backup(false, true) in the connection from step 1.
> > 
> > 5. copy the result from the previous step to the backup medium.
> 
> I don't understand that last step "5. copy the result from the previous step
> to the backup medium."  It seems to be a duplication of "3. copy the
> contents of the data directory to the backup medium".

Step three copies all the files from the data directory. However, there
is an additional piece of information you need to recover the
database, and that is only returned by the call to pg_stop_backup:

postgres=# select * from pg_stop_backup(false, true);
─[ RECORD 1 ]──────────────────────────────────────────────────
lsn        │ A9/2F000130
labelfile  │ START WAL LOCATION: A9/2F28 (file 000100A9002F)↵
           │ CHECKPOINT LOCATION: A9/2F60↵
           │ BACKUP METHOD: streamed↵
           │ BACKUP FROM: master↵
           │ START TIME: 2020-06-21 17:36:30 CEST↵
           │ LABEL: backup # 1234↵
           │ START TIMELINE: 1↵
           │
spcmapfile │ 17522 /home/postgres↵
           │

Without this information the backup is useless, so it must be backed
up, too. Obviously you can do that only after you got it.

hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




Re: Unable to init and run postgresql-12 on centos 8.2.2004

2020-06-21 Thread Srinivasa T N
On Sun, Jun 21, 2020, 8:23 PM Vishal Agrawal 
wrote:

> Seenu, I don't see any journalctl entry other than the message that the
> postmaster has been killed (Jun 21 14:52:30 f9cbc3b08e20 systemd[1]:
> postgresql-12.service: Killing process 1096 (postmaster) with signal
> SIGKILL.)
> Any further suggestions on where to look?
>
Looks like a problem with systemd, but there is no harm in checking the
Postgres log files.  I assume that you have already tried "journalctl -l".

>
> It does seem to work with systemd-239-18 (8.1.1911) but fails
> with systemd-239-30 (in the 8.2.2004 repo).
>

Any problem reverting to the older version of systemd?

Regards,
Seenu.

>
>>
> On Sun, Jun 21, 2020 at 6:05 AM Srinivasa T N  wrote:
>
>>
>>
>> On Sun, Jun 21, 2020, 8:57 AM Vishal Agrawal 
>> wrote:
>>
>>> Hello,
>>>
>>> I'm trying to install PostgreSQL 12 on CentOS 8.2 as follows -
>>>
>>> First I run a Docker container on the host -
>>> # docker run --privileged -dit --name pgdg centos:8.2.2004 /sbin/init
>>>
>>> Then inside the container, I run the commands given below -
>>>
>>> # yum -y install glibc-langpack-en glibc-locale-source
>>> # localedef -i en_US -f UTF-8 en_US.UTF-8
>>>
>>> #  dnf install
>>> https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
>>>
>>>
>>> #  dnf -qy module disable postgresql
>>>
>>> #  dnf install postgresql12-server
>>>
>>> #  /usr/pgsql-12/bin/postgresql-12-setup initdb
>>>
>>> #  systemctl enable postgresql-12
>>>
>>> #  systemctl start postgresql-12
>>>
>>> PostgreSQL starts at this point...
>>>
>>> However, when I try to stop it, it just hangs -
>>>
>>> # systemctl stop postgresql-12 (hangs)
>>>
>>> Can someone help me resolve this?
>>>
>>> Thanks,
>>> Vishal
>>>
>>
>> Looking at logs might help.
>>
>> Regards,
>> Seenu.
>>
>>>


Re: Unable to init and run postgresql-12 on centos 8.2.2004

2020-06-21 Thread Adrian Klaver

On 6/21/20 7:53 AM, Vishal Agrawal wrote:
Seenu, I don't see any journalctl entry other than the message that the
postmaster has been killed (Jun 21 14:52:30 f9cbc3b08e20 systemd[1]:
postgresql-12.service: Killing process 1096 (postmaster) with signal
SIGKILL.)

Any further suggestions on where to look?

It does seem to work with systemd-239-18 (8.1.1911) but fails
with systemd-239-30 (in the 8.2.2004 repo).


You might want to add a report to this issue:

https://bugs.centos.org/view.php?id=17461

0017461: systemd-239-30 package prevents some units from stopping

"The postgresql unit installed in the postgresql:9.6 package on centos 
8.2 fails to stop when using systemd-239-30 and running in an LXD 
container.  ..."





On Sun, Jun 21, 2020 at 6:05 AM Srinivasa T N wrote:




On Sun, Jun 21, 2020, 8:57 AM Vishal Agrawal wrote:

Hello,

I'm trying to install PostgreSQL 12 on CentOS 8.2 as follows -

First I run a Docker container on the host -
# docker run --privileged -dit --name pgdg centos:8.2.2004
/sbin/init

Then inside the container, I run the commands given below -

# yum -y install glibc-langpack-en glibc-locale-source
# localedef -i en_US -f UTF-8 en_US.UTF-8

# dnf install
https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm


# dnf -qy module disable postgresql

# dnf install postgresql12-server

# /usr/pgsql-12/bin/postgresql-12-setup initdb

# systemctl enable postgresql-12

# systemctl start postgresql-12

PostgreSQL starts at this point...

However, when I try to stop it, it just hangs -

# systemctl stop postgresql-12 (hangs)

Can someone help me resolve this?

Thanks,
Vishal


Looking at logs might help.

Regards,
Seenu.




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Netapp SnapCenter

2020-06-21 Thread Magnus Hagander
On Sat, Jun 20, 2020 at 9:04 AM Paul Förster 
wrote:

> Hi Stephen,
>
> > On 19. Jun, 2020, at 18:02, Stephen Frost  wrote:
> > When it comes to 'backup mode', it's actually the case that there can be
> > multiple backups running concurrently because there isn't actually a
> > single 'cluster wide backup mode', really.
>
> this is what I don't understand. Why would there be a reason to run
> multiple backups concurrently? I mean, using pg_start_backup() means I then
> have to make a backup (using whatever method) of the whole PGDATA and not just
> some random file ${PGDATA}/base/13297/2685. And since I have to backup the
> whole of PGDATA anyway, why would I want to do that multiple times, even
> more so, concurrently?
>
> I read the backup doc but I just can't grasp the idea of why multiple
> concurrent backup capabilities should yield any benefit at all.
>

One not uncommon case is for example being able to provision a new replica
while a backup is running. Since replicas are provisioned starting off a
base backup, being able to run that concurrently is very useful. Especially
if base backups take a long time to run.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: The backup API and general purpose backup software

2020-06-21 Thread Ron

On 6/21/20 10:45 AM, Peter J. Holzer wrote:

On 2020-06-21 10:32:16 -0500, Ron wrote:

On 6/21/20 8:28 AM, Peter J. Holzer wrote:

To make a full backup with the "new" (non-exclusive) API, software
must do the following:

1. open a connection to the database

2. invoke pg_start_backup('label', false, false) in the connection from
 step 1.

3. copy the contents of the data directory to the backup medium

4. invoke pg_stop_backup(false, true) in the connection from step 1.

5. copy the result from the previous step to the backup medium.

I don't understand that last step "5. copy the result from the previous step
to the backup medium."  It seems to be a duplication of "3. copy the
contents of the data directory to the backup medium".

Step three copies all the files from the data directory. However, there
is an additional piece of information you need to recover the
database, and that is only returned by the call to pg_stop_backup:

postgres=# select * from pg_stop_backup(false, true);
─[ RECORD 1 ]──────────────────────────────────────────────────
lsn        │ A9/2F000130
labelfile  │ START WAL LOCATION: A9/2F28 (file 000100A9002F)↵
           │ CHECKPOINT LOCATION: A9/2F60↵
           │ BACKUP METHOD: streamed↵
           │ BACKUP FROM: master↵
           │ START TIME: 2020-06-21 17:36:30 CEST↵
           │ LABEL: backup # 1234↵
           │ START TIMELINE: 1↵
           │
spcmapfile │ 17522 /home/postgres↵
           │

Without this information the backup is useless, so it must be backed
up, too. Obviously you can do that only after you got it.


Thanks.  A clearer instruction for step 4, I'd have said something like
"invoke pg_stop_backup(false, true) in the connection from step 1, piping
the output to a file on the backup medium."


--
Angular momentum makes the world go 'round.




Re: Hiding a GUC from SQL

2020-06-21 Thread raf
Laurenz Albe wrote:

> On Wed, 2020-06-17 at 13:23 -0700, Michel Pelletier wrote:
> > In my extension pgsodium I'm defining a custom variable at startup to store 
> > a key:
> > 
> > https://github.com/michelp/pgsodium/blob/master/src/pgsodium.c#L1107
> > 
> > I'm using the flags GUC_NO_SHOW_ALL | GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE 
> > | GUC_DISALLOW_IN_FILE,
> > and a custom "no show" show hook that obscures the value.  This idea was
> > inspired by the
> > pgcryptokey module from Bruce Momjian.
> > 
> > The value cannot be shown either with SHOW or current_setting() and it does 
> > not appear in pg_settings.
> >  From what I can tell, the value is inaccessible from SQL, but I think it's 
> > worth asking
> > the experts if there is some other demonstrable way, from SQL, that this 
> > value could be
> > leaked even to a superuser.  No SQL-level user should be able to see this
> > value; only a C function, like pgsodium_derive(), which derives other
> > keys, should be able to see it.
> > I realize that someone with external process access can get the key; my
> > goal is to prevent accessing it from SQL.
> > 
> > Any thoughts on weaknesses to this approach would be welcome.  Thanks!
> > 
> > -Michel
> 
> A superuser can access files and start programs on the server machine.
> 
> A dedicated superuser may for example attach to PostgreSQL with a debugger
> and read the value of the variable.
> 
> And if that doesn't work, there may be other things to try.
> 
> It is mostly useless to try to keep a superuser from doing anything that
> the "postgres" operating system user can do.
> 
> Yours,
> Laurenz Albe

But only mostly useless. :-) There are ways to limit the power of the
superuser. On Linux, for instance, "sysctl kernel.yama.ptrace_scope=3"
prevents tracing, debugging, and reading another process's memory, even
by the superuser, and the only way to turn it off is via a (hopefully
noticeable) reboot. And, if the keys aren't present on the server at
boot time, and aren't fetched from their remote source (or read from a
user) unless that yama setting is in place, then it will be very hard
for a superuser to obtain the keys. If a remote KMS is used as the source,
ideally you'd also want it to cryptographically verify that its client
hadn't been tampered with (somehow), or to not hand out the keys except
for planned reboots. The point is that it's not useless to make things
harder for a superuser.

You might not stop a legitimate sitewide superuser whose family is being
held hostage, but you can stop, or at least make things much more
difficult for, a superuser process on a single host that is the result
of a software vulnerability that wasn't nobbled by apparmor or selinux
or grsecurity.

cheers,
raf





Feature suggestion: auto-prefixing SELECT query column names with table/alias names

2020-06-21 Thread Guy Burgess

Hello,

I've seen some situations where it would be very handy if PostgreSQL 
could automatically prefix column names in a SELECT query with their 
table names (or aliases).  So for two tables, a & b, each with
columns "id" and "title":


    SELECT * FROM a, b;

instead of returning ambiguously-named columns:

    id, title, id, title

would return:

    a.id, a.title, b.id, b.title

Of course, this can be achieved by avoiding the (often frowned-upon) 
SELECT * syntax in the first place and using explicit column names, but 
that makes otherwise short queries much longer and harder to maintain.  
Also, this doesn't only apply to SELECT * queries: if tables a & b had
more columns, but I wanted just the "title" column of each, then to avoid
ambiguous column names or having to use column position indexes, I have
to give explicit aliases to those columns (which could be anything, but
in practice are often the table prefix as already written in the select:
SELECT a.title AS "a.title", b.title AS "b.title", etc.).


Could there be an (admittedly non-standard) option to automatically
prefix table/alias names to columns? From searching, it seems a number
of people would find this useful[1].  Perhaps something like:

    SELECT #* FROM a, b              -->  a.id, a.title, b.id, b.title

This would allow use per-table:

    SELECT a.id, #b.*                -->  id, b.id, b.title

or per-column, e.g.:

    SELECT a.id, #a.title, #b.title  -->  id, a.title, b.title

Or alternatively it could be specified in the FROM clause as to which
tables have their columns prefixed.


I have no idea if this is viable/sensible/anathema, especially given the
understandable reluctance to stray from the SQL standard, but thought
I'd ask, as it seems like something others would find useful too.


Thanks,

Guy

[1]

https://stackoverflow.com/questions/329931/sql-select-join-is-it-possible-to-prefix-all-columns-as-prefix

https://stackoverflow.com/questions/13153344/in-a-join-how-to-prefix-all-column-names-with-the-table-it-came-from

https://stackoverflow.com/questions/53947657/how-to-display-table-name-as-prefix-to-column-name-in-sql-result

https://stackoverflow.com/questions/33364392/sql-join-prefix-fields-with-table-name





Re: Feature suggestion: auto-prefixing SELECT query column names with table/alias names

2020-06-21 Thread Laurenz Albe
On Mon, 2020-06-22 at 14:16 +1200, Guy Burgess wrote:
> I've seen some situations where it would be very handy if PostgreSQL 
> could automatically prefix column names in a SELECT query with their 
> table names (or aliases).  So for two tables, a & b, each with
> columns "id" and "title":
> 
>  SELECT * FROM a, b;
> 
> instead of returning ambiguously-named columns:
> 
>  id, title, id, title
> 
> would return:
> 
>  a.id, a.title, b.id, b.title
> 
> Of course, this can be achieved by avoiding the (often frowned-upon) 
> SELECT * syntax in the first place and using explicit column names, but 
> that makes otherwise short queries much longer and harder to maintain.  
> Also, this doesn't only apply to SELECT * queries: if tables a & b had
> more columns, but I wanted just the "title" column of each, then to avoid
> ambiguous column names or having to use column position indexes, I have
> to give explicit aliases to those columns (which could be anything, but
> in practice are often the table prefix as already written in the select:
> SELECT a.title AS "a.title", b.title AS "b.title", etc.).
> 
> Could there be an (admittedly non-standard) option to automatically
> prefix table/alias names to columns? From searching, it seems a number
> of people would find this useful[1].  Perhaps something like:
>
>     SELECT #* FROM a, b              -->  a.id, a.title, b.id, b.title
>
> This would allow use per-table:
>
>     SELECT a.id, #b.*                -->  id, b.id, b.title
>
> or per-column, e.g.:
>
>     SELECT a.id, #a.title, #b.title  -->  id, a.title, b.title
>
> Or alternatively it could be specified in the FROM clause as to which
> tables have their columns prefixed.
> 
> I have no idea if this is viable/sensible/anathema, especially given the
> understandable reluctance to stray from the SQL standard, but thought
> I'd ask, as it seems like something others would find useful too.

I'd say that this idea is not so desirable, for one because the resulting
names would not follow the naming rules for SQL identifiers, so you'd
have to double-quote them whenever you reference them.
That could of course be avoided by using "a_id", "a_title" and so on.
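
For example, spelled out by hand for the two example tables:

    SELECT a.id    AS a_id,
           a.title AS a_title,
           b.id    AS b_id,
           b.title AS b_title
    FROM a, b;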

Then there is the case of "a JOIN b USING (somecol)".
Here, "somecol" will appear in the output only once.  How should it be
labeled?  If you say "not at all", then what do you want to happen for

  SELECT * FROM a JOIN b USING (id) JOIN c USING (x)

where all three tables have a column "id"?  You see, it is not so simple.
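
To illustrate with the two-table case:

    SELECT * FROM a JOIN b USING (id);

returns the merged column "id" exactly once, followed by the remaining
columns of "a" and "b", so there is no single table whose name could
serve as its prefix.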

But the real objection I have is that the problem can be easily avoided
by spelling out the SELECT list and using aliases.  Either you are talking
about ad-hoc queries, or you want to use "SELECT *" in your code.

- For ad-hoc queries I don't see the problem so much.  Here, the columns
  will be returned in the order you specify the tables in the query.
  So if you have "b JOIN a", the result columns will always be
  first all columns from "b", then all columns from "a".
  So you can easily figure out which column belongs to which table.

- For queries in your code, using "SELECT *" is a very bad idea.
  There is a real risk of your code breaking if the database changes.
  In code, you are probably not ready to deal with a changing number
  of columns.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Hiding a GUC from SQL

2020-06-21 Thread Laurenz Albe
On Mon, 2020-06-22 at 09:44 +1000, raf wrote:
> > A superuser can access files and start programs on the server machine.
> > A dedicated superuser may for example attach to PostgreSQL with a debugger
> > and read the value of the variable.
> > 
> > And if that doesn't work, there may be other things to try.
> > 
> > It is mostly useless to try to keep a superuser from doing anything that
> > the "postgres" operating system user can do.
> 
> But only mostly useless. :-) There are ways to limit the power of the
> superuser. On Linux, for instance, "sysctl kernel.yama.ptrace_scope=3"
> prevents tracing, debugging, and reading another process's memory, even
> by the superuser, and the only way to turn it off is via a (hopefully
> noticeable) reboot.

Interesting.  Will this block a user from debugging his own processes?
Perhaps you can plug that hole that way, but that was just the first thing
that popped into my head.  Don't underestimate the creativity of attackers.
I for one would not trust my ability to anticipate all possible attacks,
and I think that would be a bad security practice.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Definition of REPEATABLE READ

2020-06-21 Thread Thomas Munro
On Mon, Jun 22, 2020 at 12:25 AM Peter J. Holzer  wrote:
> I've read http://jepsen.io/analyses/postgresql-12.3 which reports a
> problem in PostgreSQL's SERIALIZABLE isolation level (which has since been
> fixed) and also shows an example of a violation of what they consider to
> be the correct definition of REPEATABLE READ.

In response to that report, we added a paragraph to the manual to
state explicitly that what we offer is snapshot isolation:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1575fbf1ac3c8464b2dade245deff537a3aa2498

https://www.postgresql.org/docs/devel/transaction-iso.html#XACT-REPEATABLE-READ

> Since those papers are now 25 and 20 years old, respectively, and there
> have been several revisions of the SQL standard in the meantime, has the
> SQL committee come around to that view (SQL/2003 seems unchanged, I
> couldn't find a later revision online)? And if it has, did it follow
> those definitions or come up with different ones (it seems to me that
> G2-item is much stronger than warranted by the wording in the standard)?

SQL:2016 is the latest, and I don't have it to hand right now, but
SQL:2011 still defined four degrees of isolation in terms of the three
phenomena (pages 133-135 of part II, "Foundations", at least in the
draft copy you can find on the net).

As for what else PostgreSQL should do about this historical confusion,
I suspect that a patch to pgsql-hackers to accept the name SNAPSHOT
ISOLATION as an alternative would be popular, and we could probably
even spin it as the "true" name of the level (while still accepting
REPEATABLE READ; there is no point in breaking that), and maybe add a
second table to the manual's Transaction Isolation section to cover
the later formalisations in a digested, user-friendly format (if that
is possible).




DISTINCT on jsonb fields and Indexes

2020-06-21 Thread Sankar P
I have a table with the schema:

CREATE TABLE fluent (id BIGSERIAL, record JSONB);

Then I created a couple of indexes:
1) CREATE INDEX idx_records ON fluent USING GIN (record);
2) CREATE INDEX idx_btree_records ON fluent USING BTREE ((record ->>
'destinationServiceName'));

Now, if I run a query to look up the distinct values of the field
`destinationServiceName`, via:

select distinct record ->> 'destinationServiceName' from fluent;

This query takes a lot of time and does a full table scan. The query plan is:

# explain analyze select distinct record ->> 'destinationServiceName' from fluent;
                                  QUERY PLAN
 Unique  (cost=1103803.97..1118803.97 rows=300 width=32) (actual time=77282.528..78549.877 rows=10 loops=1)
   ->  Sort  (cost=1103803.97..303.97 rows=300 width=32) (actual time=77282.525..78046.992 rows=300 loops=1)
         Sort Key: ((record ->> 'destinationServiceName'::text))
         Sort Method: external merge  Disk: 117456kB
         ->  Seq Scan on fluent  (cost=0.00..637500.00 rows=300 width=32) (actual time=14.440..69545.867 rows=300 loops=1)
 Planning Time: 0.187 ms
 Execution Time: 78574.221 ms

I see that none of the indexes are used. I want to do a few
aggregations, like "what are the distinct pairs of
`destinationServiceName` and `sourceServiceName`?", on these records.
Now, is such querying possible at all without doing full table scans?
I get this kind of aggregation support in Elasticsearch + Kibana
without full-table scans (I believe so, but I do not have data to back
this claim), and I am trying to see if this is possible with some
other extra index creation in Postgres.
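
One technique that could make use of the btree expression index for the
single-field case is a hand-rolled "loose index scan" (described on the
PostgreSQL wiki); a sketch, assuming the number of distinct values is
small:

    WITH RECURSIVE services AS (
        SELECT min(record ->> 'destinationServiceName') AS name
        FROM fluent
      UNION ALL
        SELECT (SELECT min(record ->> 'destinationServiceName')
                FROM fluent
                WHERE record ->> 'destinationServiceName' > s.name)
        FROM services s
        WHERE s.name IS NOT NULL
    )
    SELECT name FROM services WHERE name IS NOT NULL;

Each min() probe can descend idx_btree_records once per distinct value
instead of scanning the whole table.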

Any suggestions? Thanks.

-- 
Sankar P
http://psankar.blogspot.com




Re: Hiding a GUC from SQL

2020-06-21 Thread raf
Laurenz Albe wrote:

> On Mon, 2020-06-22 at 09:44 +1000, raf wrote:
> > > A superuser can access files and start programs on the server machine.
> > > A dedicated superuser may for example attach to PostgreSQL with a debugger
> > > and read the value of the variable.
> > > 
> > > And if that doesn't work, there may be other things to try.
> > > 
> > > It is mostly useless to try to keep a superuser from doing anything that
> > > the "postgres" operating system user can do.
> > 
> > But only mostly useless. :-) There are ways to limit the power of the
> > superuser. On Linux, for instance, "sysctl kernel.yama.ptrace_scope=3"
> > prevents tracing, debugging, and reading another process's memory, even
> > by the superuser, and the only way to turn it off is via a (hopefully
> > noticeable) reboot.
> 
> Interesting.  Will this block a user from debugging his own processes?

Yes.

> Perhaps you can plug that hole that way, but that was just the first thing
> that popped into my head.  Don't underestimate the creativity of attackers.
> I for one would not trust my ability to anticipate all possible attacks,
> and I think that would be a bad security practice.

Yes, but that's no reason not to perform as much risk
assessment and mitigation as you can afford/justify.
Not being able to prevent all attacks is no reason not
to prevent those that you can. :-) Nobody said anything
about underestimating anyone or trusting anyone.

> Yours,
> Laurenz Albe

cheers,
raf





Re: Netapp SnapCenter

2020-06-21 Thread Paul Förster
Hi Magnus,

> On 21. Jun, 2020, at 21:35, Magnus Hagander  wrote:
> One not uncommon case is for example being able to provision a new replica 
> while a backup is running. Since replicas are provisioned starting off a base 
> backup, being able to run that concurrently is very useful. Especially if 
> base backups take a long time to run.

hmm, a backup shouldn't take so long as to delay a base backup significantly. 
But that's just my opinion. I may be a little snapshot biased, though, where it 
only takes a few seconds. :-)

Cheers,
Paul



Re: Netapp SnapCenter

2020-06-21 Thread Stephen Frost
Greetings,

* Paul Förster (paul.foers...@gmail.com) wrote:
> > On 21. Jun, 2020, at 21:35, Magnus Hagander  wrote:
> > One not uncommon case is for example being able to provision a new replica 
> > while a backup is running. Since replicas are provisioned starting off a 
> > base backup, being able to run that concurrently is very useful. Especially 
> > if base backups take a long time to run.
> 
> hmm, a backup shouldn't take so long as to delay a base backup significantly. 
> But that's just my opinion. I may be a little snapshot biased, though, where 
> it only takes a few seconds. :-)

That's not the only case that I, at least, have heard of - folks aren't
really very happy when their backups fail when they could have just as
well completed, even if they're overlapping.  Sure, it's better if
backups are scheduled such that they don't overlap, but that can be hard
to guarantee.

The thing about this is, though, that the new API avoids *other* issues,
like what happens if the system crashes during a backup (which is an
entirely common thing that happens, considering how long many backups
take...) and it does so in a relatively reasonable way while also
allowing concurrent backups, which is perhaps a relatively modest
benefit but isn't the main point of the different API.

Thanks,

Stephen




Re: Netapp SnapCenter

2020-06-21 Thread Paul Förster
Hi Stephen,

> On 22. Jun, 2020, at 07:36, Stephen Frost  wrote:
> That's not the only case that I, at least, have heard of - folks aren't
> really very happy when their backups fail when they could have just as
> well completed, even if they're overlapping.  Sure, it's better if
> backups are scheduled such that they don't overlap, but that can be hard
> to guarantee.

I see.

> The thing about this is, though, that the new API avoids *other* issues,
> like what happens if the system crashes during a backup (which is an
> entirely common thing that happens, considering how long many backups
> take...) and it does so in a relatively reasonable way while also
> allowing concurrent backups, which is perhaps a relatively modest
> benefit but isn't the main point of the different API.

that makes me curious about another thing. The output of pg_stop_backup() is
to be stored. Otherwise the backup is useless. So far, so good. But what if
the server crashes in the middle of the backup and pg_stop_backup() hence is
never reached? In this case, it obviously does not create any output.

Ok, you usually start the server, the database does a crash recovery and opens. 
Then, some time later, you do the usual backup and all is well. This is like 
99.999% of all cases.

But what if you need to restore to the latest transaction while the database
was running in backup mode during which the crash occurred? How does that work
if no pg_stop_backup() output exists? Did I miss something here?

Cheers,
Paul