Locked out of schema public
This is FreeBSD 11.3, with postgres installed from ports as 10.10.
The port includes a daily periodic utility that runs pg_dump:

  : ${daily_pgsql_pgdump_args:="-U ${daily_pgsql_user} -p ${daily_pgsql_port} -bF c"}
  pg_dump ${daily_pgsql_pgdump_args} -f ${file} ${db}

Recently I did a restore of some database, as the postgres user, with:

  pg_restore -c -d -h

and now ordinary users are locked out of the database:

  PG::UndefinedTable: ERROR: relation "users" does not exist

  => \d users
  Did not find any relation named "users".
  => \d
  Did not find any relations.
  => \d public.users
  Table "public.users"
  [etc. etc., all is present]
  => show search_path;
     search_path
  -----------------
   "$user", public
  (1 row)
  => select current_schemas(false);
   current_schemas
  -----------------
   {}
  (1 row)

eh HOPPALA!!!

  => select * from public.users;
  ERROR: permission denied for schema public

How can this happen? I don't think I twiddled anything with schemas; in
fact I never used them in any way.

cheers,
PMc
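One quick way to check whether a particular role has lost its rights on a
schema is to ask the server directly; a minimal sketch, where "appuser"
and "mydb" are placeholder names, not from the report above:

  # check USAGE/CREATE on schema public for a given role, and show the raw ACL;
  # "appuser" and "mydb" are placeholders
  psql -X -d mydb -c "SELECT has_schema_privilege('appuser', 'public', 'USAGE')  AS has_usage,
                             has_schema_privilege('appuser', 'public', 'CREATE') AS has_create;"
  psql -X -d mydb -c "SELECT nspname, nspacl FROM pg_namespace WHERE nspname = 'public';"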
Re: Locked out of schema public (pg_dump lacks backup of the grant)
Long story short: pg_dump simply fails to back up the grant on schema
public. :(

Long story: After searching for half an hour for some comprehensive
listing of permissions (in vain), I tried pgadmin3 (which is indeed a
life-saver and still somehow works on 10.10 - and that's the reason I am
reluctant to upgrade postgres, as this can only get worse) - and then it
was a simple matter of comparing page by page:

  GRANT ALL ON SCHEMA public TO public;

That one is missing on the restored database.

So, if you do a "pg_restore -C -c -d postgres", then you get that grant
from the template database, and there is no problem. (But this is ugly,
as you need to find and terminate all the connections on the db.)

If you do only "pg_restore -c -d ", the sessions can stay open, but then
it will do

  DROP SCHEMA public;
  CREATE SCHEMA public;

and it will NOT restore the grant, because it is not in the backup.

I'd like to call this a bug.
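If one only wants to repair a database that has already been restored
(rather than re-restoring it with -C), the missing grant can simply be
re-issued by hand as a superuser; a minimal sketch, with "mydb" as a
placeholder database name:

  # re-create the default grant that was dropped together with the schema;
  # "mydb" is a placeholder database name
  psql -X -U postgres -d mydb -c "GRANT ALL ON SCHEMA public TO public;"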
Re: Locked out of schema public
Hi Adrian,

okay, let's check these out:

> What is ${daily_pgsql_user} equal to?

postgres. The owner of the installation.

> I am not seeing -U postgres.
> Are you sure there is not something else specifying the user e.g. env
> PGUSER?

I'm sure. The log shows the nightly backup connections as postgres:, and
all connections except postgres:postgres work only with kerberos - it
cannot do much bogus there.

> What user are you doing below as?

Ordinary application user. The postgres and superusers do get access to
the tables.

> What does \dn+ show?

  => \dn+
                            List of schemas
    Name  |  Owner   |  Access privileges   |      Description
  --------+----------+----------------------+------------------------
   public | postgres | postgres=UC/postgres+| standard public schema
          |          | pgsql=UC/postgres    |

And after restoring with "pg_restore -C -c -d postgres", when it works
correctly again, then it shows:

  => \dn+
                            List of schemas
    Name  |  Owner   |  Access privileges   |      Description
  --------+----------+----------------------+------------------------
   public | postgres | postgres=UC/postgres+| standard public schema
          |          | =UC/postgres        +|
          |          | pgsql=UC/postgres    |

So that was the command I was searching for. Thank You!

For now I hold on the bug...

cheers,
PMc
Re: Locked out of schema public (pg_dump lacks backup of the grant)
Hello Tom,

thank You very much.

> We improved that situation in v11, I believe.  What I see for this
> case these days is per commit 5955d9341:
> [...]

Ah, well. I don't fully understand that, but as the issue appears to be
known, that is fine with me.

This thing is just bad if one has never seriously worked with schemas
and has no immediate idea what could have hit - especially when that
happens on top of a stack of open windows with other issues. :(

cheers,
PMc
12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)
Hi folks,

with 12.1, after a couple of queries, at a random place, the client
library produces a failed query without giving a reason or error
message [1]. Then, when retrying, the client library switches off signal
handling and sits inactive in memory (needs kill -9).

The server log shows no error or other hint.

The behaviour happens rarely with trust access, and almost always when
using Kerberos5 (Heimdal as included in FreeBSD).

The 11.5 client library shows none of this behaviour and seems to work
fine, as 10.10 did.

Environment:
  OS      FreeBSD 11.3
  Applic. Ruby-on-Rails, ruby=2.5.7, gem 'pg'=1.2.2 (it makes no
          difference whether that one is compiled against the 12.1 or
          the 10.10 library)
  Server  12.1

[1] the message from ruby is PG::ConnectionBad: PQconsumeInput()

rgds,
PMc
Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)
On Thu, Jan 09, 2020 at 01:48:01PM -0500, Tom Lane wrote:
! Peter writes:
! > with 12.1, after a couple of queries, at a random place, the clientlib
! > does produce a failed query without giving reason or error-message [1].
! > Then when retrying, the clientlib switches off signal handling and
! > sits inactive in memory (needs kill -9).
!
! Seems like you'd better raise this with the author(s) of the "pg"
! Ruby gem.  Perhaps they read this mailing list, but more likely
! they have a specific bug reporting mechanism somewhere.

Tom,
I don't think this has anything to do with "pg". Just checked: I get
garbage and misbehaviour on the "psql" command line tool also:

  $ psql -h myhost flowmdev
  psql (12.1)
  GSSAPI-encrypted connection
  Type "help" for help.

  flowmdev=> select * from flows;
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  flowmdev=> select * from flows;
  server sent data ("D" message) without prior row description ("T" message)
  flowmdev=> select * from flows;
  message type 0x54 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
   id |   name   |...

To the contrary:

  $ PGGSSENCMODE="disable" psql -h myhost flowmdev
  psql (12.1)
  Type "help" for help.

  flowmdev=> select * from flows;
   id |   name   |...

rgds,
PMc
Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)
On Thu, Jan 09, 2020 at 10:47:00AM -0800, Adrian Klaver wrote:
!
! Might want to take at below:
!
! https://github.com/ged/ruby-pg/issues/311

Thanks a lot! That option

> gssencmode: "disable"

seems to solve the issue.

But I think the people there are concerned with a different issue: they
are bothered about fork(), while my flaw also appears when I do *NOT*
fork. Also the picture is slightly different; they get segfaults, I get
misbehaviour.

rgds,
PMc
Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)
On Thu, Jan 09, 2020 at 04:31:44PM -0500, Tom Lane wrote:
! Peter writes:
! > flowmdev=> select * from flows;
! > message type 0x44 arrived from server while idle
! > message type 0x44 arrived from server while idle
! > message type 0x44 arrived from server while idle
!
! Oh ... that does look pretty broken.  However, we've had no other similar
! reports, so there must be something unique to your configuration.  Busted
! GSSAPI library, or some ABI inconsistency, perhaps?  What platform are you
! on, and how did you build or obtain this Postgres code?

This is a FreeBSD 11.3-p3 r351611 built from source. Postgres is built
from
  https://svn0.eu.freebsd.org/ports/branches/2019Q4 (rel. 12r1)
or
  https://svn0.eu.freebsd.org/ports/branches/2020Q1 (rel. 12.1)
with "make package install".

I have a build environment for base&ports that forces recompiles on any
change and should make ABI inconsistencies quite hard to create. All
local patches are versioned and documented; there are none that I could
imagine influencing this. There are no patches on postgres. Also no
patches on the GSSAPI. There are a couple of patches on the Heimdal, to
fix broken commandline parsing, broken pidfile handling and broken
daemonization. None of them touches the core functionality (like key
handling).

But I just noticed something of interest (which I had taken for granted
when importing the database): the flaw does NOT appear when accessing
the database from the server's local system (with TCP and GSSAPI
encryption active). Only from a remote system.

But then, if I go on the local system and change the mtu:

  # ifconfig lo0 mtu 1500

and restart the server, then I get the exact same errors locally.

I don't get a clue of that, it doesn't make sense. With the default lo0
mtu of 16384 the packets go on the network with the full 8256 bytes you
send. With mtu 1500 they are split into 1448 byte pieces; but TCP is
supposed to handle this transparently. And what difference would the
encryption make with this?

> net.inet.tcp.sendspace: 32768
> net.inet.tcp.recvspace: 65536

These are also bigger. No, I don't understand that.

The only thing - these are all VIMAGE jails. VIMAGE was considered
'experimental' some time ago, and went productive in FreeBSD 12.0, and
11.3 is lower and later than 12.0 - whatever that concedes.

Another thing I found out: the slower the network, the worse the errors.
So might it be that nobody complained just because those people who
usually have GSSAPI also have very fast machines and networks nowadays?

When I go to packet-radio speed:

  # ipfw pipe 4 config bw 10kbit/s

then I can see the query returning empty at the first received bytes:

  flowmdev=# select * from flows;
  flowmdev=#

not even waiting the 8 seconds for the first block to arrive.

rgds,
PMc
12.2: Howto check memory-leak in worker?
Hi all,

I have something that looks a bit insane:

  # ps axl | grep 6145
  UID   PID  PPID CPU PRI NI     VSZ    RSS MWCHAN STAT TT       TIME COMMAND
  770  6145     1   0  20  0  241756    868 select SsJ   -    0:24.62 /usr/local/bin/postgres -D
  770  6147  6145   0  23  0  243804 109784 select IsJ   -    3:18.52 postgres: checkpointer(
  770  6148  6145   0  20  0  241756  21348 select SsJ   -    2:02.83 postgres: background writer
  770  6149  6145   0  20  0  241756   7240 select SsJ   -   16:36.80 postgres: walwriter(pos
  770  6150  6145   0  20  0   21980    876 select SsJ   -    0:13.92 postgres: archiver last w
  770  6151  6145   0  20  0   21980    980 select SsJ   -    0:58.45 postgres: stats collector
  770  6152  6145   0  20  0  241756   1268 select IsJ   -    0:02.07 postgres: logical replicati
  770 43315  6145   0  21  0  251844   7520 select IsJ   -    1:07.74 postgres: admin postgres 19
  770 43317  6145   0  25  0  251764   8684 select IsJ   -    1:28.89 postgres: admin bareos 192.
  770 43596  6145   0  20  0  245620   4476 select IsJ   -    0:00.12 postgres: admin bareos 192.
  770 43761  6145   0  20  0  245620   4476 select IsJ   -    0:00.15 postgres: admin bareos 192.
  770 90206  6145   0  52  0 1331256 219720 racct  DsJ   -  563:45.41 postgres: bareos bareos 192

The 90206 is continuously growing. It is the unspecific, all-purpose
worker for the www.bareos.com backup tool, so it is a bit difficult to
figure out what precisely it does - but it tends to run rather simple,
straightforward queries, so it is unlikely to have dozens of "internal
sort operations and hash tables".

What I can say is that at times this worker is completely idle in
ClientRead, but it does not shrink in memory. Is this normal behaviour?

Here is a more dynamic picture: it continues to add 2048 kB chunks (and
does not do noticeable paging):

  UID   PID  PPID CPU PRI NI     VSZ    RSS MWCHAN STAT TT       TIME COMMAND
  Mon May  4 13:33:09 CEST 2020
  770 90206  6145   0  91  0 1335352 226900 -      RsJ   -  569:09.19 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:33:39 CEST 2020
  770 90206  6145   0  93  0 1335352 227696 -      RsJ   -  569:28.48 postgres: bareos bareos idle (postgres)
  Mon May  4 13:34:09 CEST 2020
  770 90206  6145   0  92  0 1337400 228116 -      RsJ   -  569:47.46 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:34:39 CEST 2020
  770 90206  6145   0  92  0 1337400 228596 -      RsJ   -  570:06.56 postgres: bareos bareos UPDATE (postgres)
  Mon May  4 13:35:09 CEST 2020
  770 90206  6145   0  92  0 1337400 228944 -      RsJ   -  570:25.62 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:35:40 CEST 2020
  770 90206  6145   0  52  0 1337400 229288 racct  DsJ   -  570:44.33 postgres: bareos bareos UPDATE (postgres)
  Mon May  4 13:36:10 CEST 2020
  770 90206  6145   0  91  0 1337400 229952 -      RsJ   -  571:03.20 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:36:40 CEST 2020
  770 90206  6145   0  52  0 1337400 223772 racct  DsJ   -  571:21.50 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:37:10 CEST 2020
  770 90206  6145   0  91  0 1337400 224448 -      RsJ   -  571:40.63 postgres: bareos bareos idle (postgres)
  Mon May  4 13:37:40 CEST 2020
  770 90206  6145   0  91  0 1339448 225464 -      RsJ   -  571:58.36 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:38:10 CEST 2020
  770 90206  6145   0  52  0 1339448 215620 select SsJ   -  572:14.24 postgres: bareos bareos idle (postgres)
  Mon May  4 13:38:40 CEST 2020
  770 90206  6145   0  81  0 1339448 215320 -      RsJ   -  572:21.09 postgres: bareos bareos idle (postgres)
  Mon May  4 13:39:10 CEST 2020

OS is FreeBSD 11.3-RELEASE-p8 r360175M i386
PostgreSQL 12.2 on i386-portbld-freebsd11.3, compiled by gcc9 (FreeBSD Ports Collection) 9.3.0, 32-bit
autovacuum is Disabled.
The memory-specific config is:

> shared_buffers = 200MB
> temp_buffers = 40MB
> work_mem = 80MB
> maintenance_work_mem = 250MB
> dynamic_shared_memory_type = posix
> random_page_cost = 2.0
> effective_cache_size = 1GB

(others are left at default)

I remember vaguely that there are means to have a closer look into what
is using the memory, but do not recall the specifics. Some pointers or
ideas on how to proceed would be gladly appreciated (Dtrace should
work).

Processes will usually fail with OOM at this size, due to the machine
configuration - I'm waiting for that now (it is a very very old pentium3
machine ;) ).

cheerio,
PMc
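One low-level way to see which memory contexts a backend is accumulating
is to attach a debugger and dump the context statistics; the report goes
to the server's stderr (i.e. the log), not to the debugger session. A
sketch, assuming gdb is installed and the postgres binary still carries
symbols:

  # dump the per-context memory usage of the suspect backend (PID 90206 above);
  # the output appears in the server log / stderr, not in the gdb session
  gdb -batch -ex 'call MemoryContextStats(TopMemoryContext)' -p 90206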
Re: 12.2: Howto check memory-leak in worker?
On Mon, May 04, 2020 at 12:55:38PM -0700, Adrian Klaver wrote:
! > The 90206 is continuously growing. It is the unspecific, all-purpose
! > worker for the www.bareos.com backup tool, so it is a bit difficult to
! > figure what precisely it does - but it tends to be rather simple
! > straight-forward queries, so it is unlikely to have dozens of "internal sort
! > operations and hash tables".
! >
! Is there any relevant information in the bareos or Postgres logs?

No. Only the final exitus-in-tabula.

! > autovacuum is Disabled.
!
! Any particular reason for above?

Yes. Disk spindown doesn't work otherwise, and I see no reason to have a
backup zone that is used a few times a day spinning for 24 hours.

BTW, I would greatly appreciate it if we would reconsider the need for
the server to read the postmaster.pid file every few seconds (probably
needed for something, I don't know).
That makes it necessary to set atime=off to get a spindown, and I
usually prefer to have atime=on so I can see what my stuff is currently
doing.

cheerio,
PMc
Re: 12.2: Howto check memory-leak in worker?
On Tue, May 05, 2020 at 10:57:04AM +1200, Thomas Munro wrote:
! On Tue, May 5, 2020 at 10:13 AM Peter wrote:
! > BTW, I would greatly appreciate if we would reconsider the need for
! > the server to read the postmaster.pid file every few seconds (probably
! > needed for something, I don't know).
! > That makes it necessary to set atime=off to get a spindown, and I
! > usually prefer to have atime=on so I can see what my stuff is
! > currently doing.
!
! That'd be this:
!
!        /*
!         * Once a minute, verify that postmaster.pid hasn't been removed or
!         * overwritten.  If it has, we force a shutdown.  This avoids having
!         * postmasters and child processes hanging around after their database
!         * is gone, and maybe causing problems if a new database cluster is
!         * created in the same place.  It also provides some protection
!         * against a DBA foolishly removing postmaster.pid and manually
!         * starting a new postmaster.  Data corruption is likely to ensue from
!         * that anyway, but we can minimize the damage by aborting ASAP.
!         */
!        if (now - last_lockfile_recheck_time >= 1 * SECS_PER_MINUTE)
!        {
!            if (!RecheckDataDirLockFile())

Ah. Yes. Thanks for identifying this issue.

It is really hard to comment on this in a way that might not be
considered offensive, so let's put it this way: You have all seen the
Apollo-13 movie, so You know that a need to power down as much as
possible may appear en route, and in a situation where you have lots of
other issues, so what you need the least is things like this getting in
your way.

I usually take this a bit further and consider only two relevant quality
levels for software: interplanetary and interstellar - where
interplanetary is the kind of stuff on which we will trust our lives
during the manned Jupiter mission. PG might well qualify for this,
except for things like the quoted; so I think such things should be
guarded by appropriate compile-time switches.

Now concerning the memory leak: that one was introduced with the work
done on the GSSAPI encryption; it goes away when setting 'hostnogssenc'
in pg_hba. (And, whoever has editor rights: there is also a little fixup
needed for that switch in pg_hba.conf; it should probably be "eleven
formats".)

cheerio,
PMc
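For reference, the workaround mentioned above is a pg_hba.conf entry of
the "hostnogssenc" type, which only matches TCP connections that do not
use GSSAPI transport encryption (GSSAPI *authentication* still works on
such lines); with libpq's default gssencmode=prefer, a rejected
encrypted attempt is retried without encryption. A sketch, with the
address range and scope as placeholders:

  # TYPE          DATABASE  USER  ADDRESS         METHOD
  hostnogssenc    all       all   192.0.2.0/24    gss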
Re: 12.2: Howto check memory-leak in worker?
On Tue, May 05, 2020 at 11:59:27AM -0400, Tom Lane wrote:
! Well, the choice we face is preventing somebody's disk from spinning
! down, versus preventing somebody else from completely corrupting their
! database.  From where I sit that's not a difficult choice, nor one
! that I feel a need to let users second-guess.

Then maybe You see a scenario where that feature would actually prevent
db corruption, while I have not yet found a realistic one.

Instead, what gave me headaches is that ZFS might take a single
tablespace (= pool) offline with the cluster continuing to run - and I
am not sure whether the db is supposed to survive that (mine did, after
I had hit the power button in horror), nor is it easy to protect against
it.

Anyway, I can now switch that feature off with a local patch, which is
the second-best solution.

! Oooh ... it looks like some of the encryption code paths have neglected
! to call gss_release_buffer.  Will fix, thanks for the report!

Great! So I assume I don't need to send the bug report I had prepared in
the interim. Feel free to ask me for anything that might still be
needed.

cheerio,
PMc
12.2: Why do my Redo Logs disappear??
Hi all,

this is a 12.2 Release on FreeBSD 11.3.

I am doing RedoLog Archiving according to Docs Chapter 25.1. During the
last week I have lost 4 distinct Redo Logs; they are not in the backup.

Losing a RedoLog is very bad, because there is no redundancy; losing a
single one of them makes the timeline disappear, and it will only
reappear after another Base Backup. Very very bad.

So I did analyze the matter.

There are three ways to restart a Berkeley machine:

1. Cowboy it (aka pull-the-plug). This makes sure that everything is
   certainly dead immediately, and, given all hard- and software is well
   designed, nothing bad should happen.

2. Shut it down. This is the official means, and it takes very long,
   because each and every application is signalled and given time to
   bring up whatever objections it may have. In this case "pg_ctl stop"
   will be invoked with whatever options the sysop has configured, and
   postgres will copy out a full log into the archive before
   terminating.

3. Halt/Reboot it, like this:
   https://www.freebsd.org/cgi/man.cgi?query=reboot&sektion=8&manpath=FreeBSD+11.3-RELEASE
   This is considered safer than pull-the-plug, and still fast.
   Applications are killed without much waiting, but all disk buffers
   are flushed to permanent storage and the filesystems are closed.
   In this case, it seems, Postgres will delete the current log without
   archiving it. :(

What precisely happens (according to the OS sources) during reboot is
this: processes will be sent SIGTERM, followed after some 2-10 seconds
by SIGKILL.

--

Let's have a closer look:

We did a regular shutdown at 17:09, and then we did a reboot at 19:24.

Here is the content of the staging area (where the logs are copied to
and accumulated until it is worth running a backup job):

  # dir arch/
  total 240
  drwx--       2 postgres postgres         5 Jun  8 17:09 .
  drwxr-xr-x   6 postgres postgres         7 Jun  8 17:09 ..
  -rw---       1 postgres postgres  16777216 Jun  8 09:38 0001001700FC.ok
  -rw---       1 postgres postgres  16777216 Jun  8 10:48 0001001700FD.ok
  -rw---       1 postgres postgres  16777216 Jun  8 17:09 0001001700FE.ok

And here is the pg_wal directory:

  # dir data12/pg_wal/
  total 89256
  drwx--       3 postgres postgres        10 Jun  8 19:28 .
  drwx--      19 postgres postgres        23 Jun  8 19:28 ..
  -rw---       1 postgres postgres       335 Jun  7 07:36 0001001700EF.0060.backup
  -rw---       1 postgres postgres  16777216 Jun  8 19:38 00010018
  -rw---       1 postgres postgres  16777216 Jun  7 07:17 000100180001
  -rw---       1 postgres postgres  16777216 Jun  7 07:17 000100180002
  -rw---       1 postgres postgres  16777216 Jun  7 07:17 000100180003
  -rw---       1 postgres postgres  16777216 Jun  7 07:17 000100180004
  -rw---       1 postgres postgres  16777216 Jun  7 07:36 000100180005
  drwx--       2 postgres postgres         3 Jun  8 17:09 archive_status

  # dir data12/pg_wal/archive_status
  total 23
  drwx--       2 postgres postgres   3 Jun  8 17:09 .
  drwx--       3 postgres postgres  10 Jun  8 19:28 ..
  -rw---       1 postgres postgres   0 Jun  7 07:36 0001001700EF.0060.backup.done

Now where the hell is my "FF" log ???
Let's check syslog - this was the full shutdown at 17:09:

  Jun  8 17:09:38 admn pg-bck[73534]: [10-1] :[] LOG:  0: received fast shutdown request
  Jun  8 17:09:38 admn pg-bck[73534]: [10-2] :[] LOCATION:  pmdie, postmaster.c:2780
  Jun  8 17:09:38 admn pg-bck[73534]: [11-1] :[] LOG:  0: aborting any active transactions
  Jun  8 17:09:38 admn pg-bck[73534]: [11-2] :[] LOCATION:  pmdie, postmaster.c:2813
  Jun  8 17:09:38 admn pg-bck[73549]: [10-1] :[] DEBUG:  0: logical replication launcher shutting down
  Jun  8 17:09:38 admn pg-bck[73549]: [10-2] :[] LOCATION:  ProcessInterrupts, postgres.c:2981
  Jun  8 17:09:38 admn pg-bck[73534]: [12-1] :[] LOG:  0: background worker "logical replication launcher" (PID 73549) exited with exit code 1
  Jun  8 17:09:38 admn pg-bck[73534]: [12-2] :[] LOCATION:  LogChildExit, postmaster.c:3657
  Jun  8 17:09:38 admn pg-bck[73544]: [13-1] :[] LOG:  0: shutting down
  Jun  8 17:09:38 admn pg-bck[73544]: [13-2] :[] LOCATION:  ShutdownXLOG, xlog.c:8321
  Jun  8 17:09:45 admn pg-bck[82223]: RedoLog.bck invoked pg_wal/0001001700FE 0001001700FE
  Jun  8 17:09:45 admn pg-bck[82223]: RedoLog.bck pg_wal/0001001700FE 0001001700FE returns 0
  Jun  8 17:09:45 admn pg-bck[73547]: [8-1] :[] DEBUG:  0: archived write-ahead log file "0001001700FE"
  Jun  8 17:09:45 admn pg-bck[73547]: [8-2] :[] LOCATION:  pgarch_archiveXlog, pgarch.c:675
  Jun  8 17:09:55 admn pg-bck[73544]: [14-1] :[] LOG:  0: checkpoint starting: shutdown immediate
  Jun  8 17:09:55 admn pg-bck[735
Re: 12.2: Why do my Redo Logs disappear??
On Mon, Jun 08, 2020 at 05:40:20PM -0700, Peter Geoghegan wrote:
!
! I strongly suspect that you were hit by the bug fixed in commit
! 4e87c483. You should upgrade to Postgres 12.3 ASAP, to get that fix:
!
! "Avoid premature recycling of WAL segments during crash recovery
! (Jehan-Guillaume de Rorthais)
!
! WAL segments that become ready to be archived during crash recovery
! were potentially recycled without being archived."

Ahh, thank You so much; this is good news: if it is an already known
bug, I can close all efforts, remove the debug switches again, stuff the
sources back into their box and relax. ;)

I did check whether I would find something about lost archive logs, but
didn't - and then it is always possible that it's just a mistake of
mine, as I know I do make mistakes.

And now for the nitpicking part :)

On Mon, Jun 08, 2020 at 05:35:40PM -0700, Adrian Klaver wrote:
! > I am doing RedoLog Archiving according to Docs Chapter 25.1.
!
! There is no ReDo logging, there is WAL logging.

Yes I know - and I don't care. Technically they're no longer WAL when
they're going to be archived. Their purpose then becomes to redo the
transactions, and even the messages say so:

! > LOG: 0: redo done at 17/FF01C098

! What docs, because section 25.1 in the Postgres docs is:
!
! https://www.postgresql.org/docs/12/backup-dump.html
!
! 25.1. SQL Dump

Oops, a mistake of mine. It should be 25.3.1.

! https://www.postgresql.org/docs/12/server-shutdown.html
! "
! Important
!
! It is best not to use SIGKILL to shut down the server. Doing so will prevent
! the server from releasing shared memory and semaphores. Furthermore, SIGKILL
! kills the postgres process without letting it relay the signal to its
! subprocesses, so it might be necessary to kill the individual subprocesses
! by hand as well.

And which of these would be of any concern if the machine is rebooted
anyway?

I had to install new hardware, and currently I'm trying to identify a
memory exhaustion issue. This makes it necessary to reboot the full
system quite often, and I neither want to wait for orderly termination
of dozens of subsystems, nor do I want to need fsck at restart. This
makes SIGKILL/reboot the method of choice.

! What is RedoLog.bck?

The script which one has to write according to the docs' section which
would correctly be numbered 25.3.1.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
Actually, the affair had a good side: as usual I was checking my own
designs first and looking for flaws, and indeed I found one:

If you copy out the archive logs not directly to tape, but to some disk
area for further processing, then there is an issue with possible loss.
If you do it like the docs say, with a command like this:

  archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix

that "cp" is usually not synchronous. So there is the possibility that
this command terminates successfully and reports exit code zero back to
Postgres, and then Postgres will consider that log safely away - but the
target of the copy may not yet have been written to disk. If at that
point a power loss happens, the log may become missing/damaged/
incomplete, while the database may or may not consider it done when
restarting.

Therefore, mounting such a target filesystem in all-synchronous mode
might be a good idea. (UFS: "-o sync", ZFS: "set sync=always")

cheerio,
PMc
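One way to avoid depending on a synchronous mount is to make the
archive_command itself force the file to stable storage before it
reports success, for example by replacing the cp with a dd that fsyncs
its output (FreeBSD's and GNU dd both understand conv=fsync). A sketch
only, reusing the placeholder archive directory from the docs; note that
this still does not fsync the directory entry, which some filesystems
additionally require:

  # sketch: refuse to overwrite, copy, and fsync the file before returning success
  archive_command = 'test ! -f /mnt/server/archivedir/%f && dd if=%p of=/mnt/server/archivedir/%f conv=fsync'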
Re: Something else about Redo Logs disappearing
On Mon, Jun 08, 2020 at 09:21:47PM -0700, Adrian Klaver wrote:
!
! On 6/8/20 7:33 PM, Peter wrote:
! >
! > Actually, the affair had some good side: as usual I was checking
! > my own designs first and looking for flaws, and indeed I found one:
! > If you do copy out the archive logs not directly to tape, but to
! > some disk area for further processing, then there is an issue with
! > possible loss. If you do it like the docs say, with a command like
! > this:
! > archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p
! > /mnt/server/archivedir/%f'  # Unix
! > That "cp" is usually not synchronous. So there is the possibility
! > that this command terminates successfully, and reports exitcode zero
! > back to the Postgres, and then the Postgres will consider that log
! > being safely away.
!
! Which is why just following the above command in the docs is:
!
! "(This is an example, not a recommendation, and might not work on all
! platforms.) "

So, what You are basically saying is: my worries are justified and
correctly founded, and this is indeed a matter that needs to be taken
care of. Thank You.

! Generally for peace of mind folks use third party tools like:
!
! pg_backrest(https://pgbackrest.org/),
! pg_probackup(https://postgrespro.com/products/extensions/pg_probackup) or
! Barman(https://www.pgbarman.org/).

Hmja. We may on occasion have a look into these...

! I use pg_backrest, but it does not look promising for running on BSD:
! https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html

That looks mostly like the usual things which can be fixed.

Now, for the facts: I am already using a professional backup solution.
(It is actually a "dual-use/commercial" solution, of the kind which you
can either fetch from github and use without support, or buy with a
contract or whatever and get support.)

With this professional backup solution I have already identified 28
(spell: twenty-eight) bugs, and fixed/worked around these, until I got
it properly working.

This professional backup solution also offers support for postgres.
Sadly, it only covers postgres up to Rel. 9, and that piece of software
wasn't touched in the last 6 or 7 years. But the bigger issue there is
that the backup solution needs its own postgres database as its backend
- and it cannot back up the database it is using. Looks quite pointless
to me, then. So I just did it all with shell (and it wasn't many lines).

So now, as I have been through identifying and handling all the issues
in that one backup solution, and since it is supposed to handle *all*
backup demands (and not only postgres), I will certainly not start and
go through the same process again with one of these supposed solutions,
where 90% of the code tries to solve the same things redundantly again,
but then only for PG.

Actually, I am getting very tired of reading that something which can
easily be done within 20 lines of shell scripting would need special
"solutions" - solutions that need to be compiled, solutions that bring
along their own fashion of interpreter, solutions that have a lot of
their own dependencies and introduce mainly one thing: new bugs.

Does nobody know anymore how to do proper systems management scripting?
Using just the basic system tools which have proven to work for more
than 50 years now!?

! Not sure about pg_probackup.

Okay, I had a -very short- look into these. Just scanning the
introductory pages.

The only really interesting thing there is pg_probackup. These folks
seem to have found a way to do row-level incremental backups.

And pgbarman seems to have an impressive understanding of ITIL (in case
anybody bothers about that).

All these tools cover only PG, but do that in every possible regard.
This is fine as long as you do not run any computers, and the only
application you are using is Postgres.

But if you have other applications as well, or have computers, then you
will need a different backup solution, something that will cover your
site-wide backup demands in a consistent fashion (think something in the
style of ADSM, or what is nowadays called Spectrum Protect).

And then 90% of the things offered here become superfluous, because they
are already handled site-wide. And then you will have to consider
integration of both pieces - and that will most likely be more work and
more error-prone than just writing a few adapters in shell.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Tue, Jun 09, 2020 at 01:27:20AM -0400, Tom Lane wrote:
! Adrian Klaver writes:
! > On 6/8/20 7:33 PM, Peter wrote:
! >> That "cp" is usually not synchronous. So there is the possibility
! >> that this command terminates successfully, and reports exitcode zero
! >> back to the Postgres, and then the Postgres will consider that log
! >> being safely away.
!
! > Which is why just following the above command in the docs is:
! > "(This is an example, not a recommendation, and might not work on all
! > platforms.) "
!
! Yeah.  There have been discussions about changing that disclaimer to be
! more forceful, because in point of fact a plain "cp" is generally not safe
! enough.  You need to fsync the written file, and on many filesystems you
! also have to fsync the directory it's in.

It certainly does not need to be "more forceful" - because this is not
about behavioural education, like training dogs, horses, or monkeys, and
neither do we entertain a BDSM studio. What it needs instead is a
mention of the magic word "fsync". Because we already know that - we
just need a reminder at the proper place.

Systems integrators are professional people. They are not in need of
more beating (spell: forceful education), only of substantial technical
hints and information.

! > Generally for peace of mind folks use third party tools like:
!
! +1.  Rolling your own archive script is seldom advisable.

Well, using commercial solutions brings its own problems. E.g., the
people I happened to work for often had problems with torsion, which
happens when the solution gets longer than, say, twenty meters, and
these are walked at high speed. They didn't have a problem with
scripting - rather the opposite, they were happy with it and paid good
money for it.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Tue, Jun 09, 2020 at 12:34:38PM -0700, Adrian Klaver wrote:
! The backup solution is?

https://www.bareos.com/

! Fine rant below. Go forth and work your wonders.

I don't need to, anymore. I did that, for about 20 years - people I used
to work for as a consultant (major banks and insurance shops) would
usually run Informix or Oracle. Postgres is just my own private fancy.

On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
! * Peter (p...@citylink.dinoex.sub.org) wrote:
! > This professional backup solution also offers support for postgres.
! > Sadly, it only covers postgres up to Rel.9, and that piece of software
! > wasn't touched in the last 6 or 7 years.
!
! Then it certainly doesn't work with the changes in v12, and probably has
! other issues, as you allude to.

Just having a look at their webpage, something seems to have been
updated recently; they now state that they have a new postgres adapter:

  https://www.bareos.com/en/company_news/postgres-plugin-en1.html

Enjoy reading, and tell us what You think.

! > Actually, I am getting very tired of reading that something which can
! > easily be done within 20 lines of shell scripting, would need special
!
! This is just simply false- you can't do it properly in 20 lines of shell
! scripting.

Well, Your own docs show how to do it with a one-liner. So please don't
blame me for improving that to 20 lines.

! Sure, you can write something that has probably next to no
! error checking,

Before judging that, one should first specify what precisely the demand
is.

In a basic approach, the demand may be to get the logs out onto tape in
a failsafe automated fashion without any miss, to get the data tree out
periodically, and to have a guarantee that these files are untampered,
as on disk. And that can very well be done properly with an incremental
filesystem backup software plus some 20 lines of shell script.

Now, talking about doing an automated restore, or having some
menu-driven solution, or - the worst of all - having a solution that can
be operated by morons: that's an entirely different matter.

In my understanding, backup is done via pgdump. The archive logs are for
emergencies (data corruption, disaster) only. And emergencies would
usually be handled by some professional people who know what they have
to do.

You may consider different demands, and that is also fine, but it
doesn't need to concern me.

! uses the deprecated API that'll cause your systems to
! fail to start if you ever happen to have a reboot during a backup

It is highly unlikely that this never happened to me during 15 years.

So what does that mean? If I throw in a pg_start_backup('bogus'), and
then restart the cluster, it will not work anymore? Let's see...

Clean stop/start - no issue whatsoever.
  (LOG: online backup mode canceled)

kill -9 the whole flock - no issue whatsoever.
  (LOG: database system was interrupted)

I won't pull the plug now, but that has certainly happened lots of times
in the past, and also yielded no issue whatsoever - simply because there
*never* was *any* issue whatsoever with Postgres (until I got the idea
to install the relatively fresh R.12 - but that's understandable).

So maybe this problem exists only on Windows?

And yes, I read that whole horrible discussion, and I could tear my hair
out, really, concerning the "deprecated API". I suppose You mean the
mention in the docs that the "exclusive low-level backup" is somehow
deprecated. This is a very big bad.

Because: normally you can run the base backup as a strictly ordinary
file-level backup in "full" mode, just as any backup software can do it.
You simply execute the pg_start_backup() and pg_stop_backup() commands
in the before- and after- hooks - and any backup software will offer
these hooks.

But now, with the now recommended "non-exclusive low-level backup", the
task is different: now your before-hook needs to do two things at the
same time:
 1. keep a socket open in order to hold the connection to postgres
    (because postgres will terminate the backup when the socket is
    closed), and
 2. invoke exit(0) (because the actual backup will not start until the
    before-hook has properly delivered a successful exit code).
And that is not only difficult, it is impossible.

So, what has to be done instead: you need to write a separate network
daemon, with the only purpose of holding that connection to postgres
open. That network daemon needs to handle the communication to the
backup software on one side, and to postgres on the other side. And that
network daemon then needs the full-blown feature set of a fail-safe
network daemon (and that is a LOT!), plus it needs to handle all kinds
of possible failures (like network interruptions) in that triangle,
during the backup, and properly notify both sides of whatever might be
ongoing (and that
Re: Something else about Redo Logs disappearing
On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
! > And then 90% of the things offered here become superfluous, because
! > they are already handled site-wide. And then you will have to
! > consider integration of both pieces - and that will most likely be
! > more work and more error-prone than just writing a few adapters in
! > shell.
!
! pgbackrest's repo can be safely backed up using the simple file-based
! backup utilities that you're referring to here.  I suspect some of the
! other solution's backups also could be, but you'd probably want to make
! sure.

What repo?? I seem to have missed that at first glance.

Are You indeed suggesting that one should have their data within the
database, where it is worked with, and then use Your tool to copy it to
some "repo" disk playground whatever area, and then use their regular
backup system to COPY IT AGAIN into their backup/archiving system? Are
You kiddin'?

Are You indeed suggesting that people should buy the amount of disks
that they use for their database AGAIN, in order for Your software to
copy the stuff around? Is this becoming a madhouse, or are You going to
refund them that?

Let me tell You something: the people I used to work for sometimes had a
problem. They had some amount of data that was created during the day,
and they had the night to write that data away to backup. That would
usually mean four or eight of the big tapes, streaming in parallel,
fibers saturated, all night through. And the problem usually was that
they would need a longer night. At least the math had to be done
properly.

Maybe You never encountered these, but there are environments where
there is no spare room for nonsense. Maybe that's why these people
preferred to use Oracle.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Wed, Jun 10, 2020 at 01:10:36PM +0200, Magnus Hagander wrote:
! > Just having a look at their webpage, something seems to have been updated
! > recently, they now state that they have a new postgres adapter:
! >
! > https://www.bareos.com/en/company_news/postgres-plugin-en1.html
! > Enjoy reading, and tell us what You think.
! >
!
! This one unfortunately rings out of somebody who doesn't know how to back
! up postgres, at least in the past 10-15 years.
!
! They are using an API that has been deprecated for years - in what's
! announced as a brand new product. They are advocating local archiving,
! which basically guarantees dataloss in the event of a disaster.

Aye, thank You, that's exactly the impression I got. This is probably
still the old thing I was talking about, just made into a new product.

! That's from a 3 minute look, but that's definitely enough to suggest this
! is not something I'd consider using.

The matter is, that backup software (as a whole, not this postgres
component) offers lots of things exactly as I like them. It is a great
concept and a great implementation, but of bad coding quality and with a
bad maintenance policy. But then, one can get it for free, and I know of
no other with such features. So I went through the effort of fixing it
up, so that it now serves my needs well - and I use my own scripting for
the add-ons.

! > Well, Your own docs show how to do it with a one-liner. So please
! > don't blame me for improving that to 20 lines.
! >
!
! Yes, those docs are unfortunately "known bad" and should definitely be
! improved on. it does very clearly state that the example is just an
! example. But it doesn't clearly state *why* it shouldn't be used.

That's why I felt the ethical need to speak up and share my
considerations. Now it's up to those in charge, and not my issue
anymore. ;)

! > In my understanding, backup is done via pgdump. The archive logs are
! > for emergencies (data corruption, disaster), only. And emergencies
! > would usually be handled by some professional people who know what
! > they have to do.
! >
!
! I'd say it's the exact opposite. backups are done via pg_basebackup or
! manual basebackups. Archive logs are for point in time recovery. pg_dump
! can be used as a secondary "backup to the backups" option, but it is most
! interesting for things that are not backups (such as inspecting data, or
! provisioning partial test systems).
!
! Different for different scenarios of course, but that would be the base
! scenario. And pg_dump are definitely as far from good backups are you can
! get while still having something that can be called approximately backups.
! It might be enough for small databases, but even in those cases
! pg_basebackup (without archive logging) is easier...

It's easier to create - but to apply? That depends on how many DBs are
in the cluster and how diverse their use is. Also, at any major version
switch these backups become worthless; one cannot use them for the long
term. (I suppose this is also true for pg_basebackup.) I'm creating my
longterm (and offsite) backups simply as clones from the regular full
backup.

So what I came up with for now is: I run pg_dump over all the present
databases, plus globals, chunk that up (in a similar way to how chunked
HTTP works), feed it onto a pipe and back up that pipe. No need for
interim storage, so it can get as large as the backup software can take
it. And that should work for longterm - and I don't currently see a
better option. (A rough sketch of this follows below.)
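A rough sketch of such a dump-to-pipe job, assuming a placeholder fifo
path that the file-based backup software reads from; the framing
("chunking") that makes the stream splittable again on restore is left
out here:

  #!/bin/sh
  # sketch: stream the globals plus a plain-SQL dump of every database onto one
  # pipe that the backup software reads; /backup/pgdump.fifo is a placeholder
  # path, error handling is omitted
  PIPE=/backup/pgdump.fifo
  {
    pg_dumpall -U postgres --globals-only
    for db in $(psql -U postgres -AtqX -c \
          "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
      printf -- '-- ===== database: %s =====\n' "$db"
      pg_dump -U postgres --create "$db"
    done
  } > "$PIPE"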
(This one does not work in 20 lines of shell script, because I didn't
get a reliable chunker running in shell.)

! And yes, I read that whole horrible discussion, and I could tear my
! > hair out, really, concerning the "deprecated API". I suppose You mean
! > the mentioning in the docs that the "exclusive low-level backup" is
! > somehow deprecated.
! >
!
! Yes. There is no "somehow", it's deprecated.

Then let's not call it "somehow", as, more precisely, from my
understanding so far, that so-called "new API" is ill-conceived and
troublesome in more than one regard. I would, with my current knowledge,
recommend to avoid it, or better, to abandon it.

Or, in other words: it is similar to what Boeing tried to do, in forcing
things upon people via software, for safety reasons - and now see where
Boeing got with that.

! > But now, with the now recommended "non-exclusive low-level backup",
! > the task is different: now your before-hook needs to do two things
! > at the same time:
! > 1. keep a socket open in order to hold the connection to postgres
! > (because postgres will terminate the backup when the socket is
! > closed), and
! > 2. invoke exit(0) (because the actual backup will not start until
! > the before- hook has properly delivered a successful exit code.
! > And, that is not only difficult, it is impossible.
!
! It is not impossible. It is harder if you limit your available tools yes,
! but it also *works*.

In this description which I choose, I would think it is actually imposs
Re: Something else about Redo Logs disappearing
On Wed, Jun 10, 2020 at 08:32:22AM -0400, Stephen Frost wrote:
! > What repo?? I seem to have missed that at first glance.
!
! Yes, pgbackrest has a repo, like most other tools (though they call them
! different things... pg_basebackup has one though it's not really
! formal).
!
! > Are You indeed suggesting that one should have their data within
! > the database, where it is worked with, and then use Your tool
! > to copy it to some "repo" disk playground whatever area, and then
! > use their regular backup system to COPY IT AGAIN into their
! > backup/archiving system? Are You kiddin'?
!
! No, I'm not kidding and yes, that's what I'm suggesting.

Well, do You think I could seriously sell that to anybody?

! You need a
! consistent backup of your database that includes all the needed WAL to
! perform a restore.

Okay. So let's behave like professional people and figure out how that
can be achieved:

At first, we drop that WAL requirement, because with WAL archiving it is
already guaranteed that an unbroken chain of WAL is always present in
the backup (except when we have a bug like the one that led to this
discussion). So this is **not part of the scope**. I'll get to the other
item, the "consistent backup", further below.

! This is only one option though, there are others- you can also use
! pgbackrest to push your backups to s3 (or any s3-compatible data storage
! system, which includes some backup systems), and we'll be adding
! support

! I concur that this is becoming a madhouse, and is pushing past the limit
! for what I'm willing to deal with when trying to assist someone.

Well, then that might be a misconception. I'm traditionally a
consultant, and so I am used to *evaluating* solutions. I don't need
assistance for that, I only need precise technical information.

So let's get serious:

It is NOT technically feasible to amplify the storage in a way that the
entire backup data gets copied from the live database to some "repo"
place first, and then again from that repo place to regular file-based
backup/archiving storage. And it does not make a difference WHERE that
other place is, at Jeff's or wherever. It just does not belong into the
loop.

So, how could the alternative look?

I for my part consider doing this: with a proper transactional
filesystem we can do recursive filesystem snapshots. That means, given a
suitable filesystem layout, we can do a snapshot of the data tree, AND
the pg_wal filesystem, AND the respective tablespaces. And this is
atomic. So, what we then do in the before- hook is:

 * we call pg_start_backup(whatever, false, false).
 * we issue the filesystem snapshot, atomically.
 * we call pg_stop_backup(whatever).

And then we return with exit(0) (if all succeeded), and the file-based
backup software can start to collect the files from the filesystem
snapshot, and release it afterwards. (A sketch of this bracket is
appended at the end of this mail.)

This STILL needs threaded programming (as I said, there is no way to
avoid that with this "new API"), but in this case it is effectively
reduced to just grabbing the return code of some program that has been
started with "&".

So far, so good. There is still one thing to be done, namely the
requirement to collect the data reported by pg_stop_backup() and add it
to the backup, at a point in time when that backup is ALREADY CLOSED!
(That's another piece of gross bogus in this "new API".) But with my
intended method (and some ugliness) this could now also be solved.

But then, let's think another step forward: for what purpose do we
actually need to call pg_start_backup() and pg_stop_backup() at all?
I couldn't find exhaustive information about that, only some partial
facts. What we know for certain is: if we leave these calls away, and
just do a filesystem snapshot and make a backup from that, then we have
exactly the same thing as if we had a power failure at that time. So
this is restorable, and the server will certainly start.

The remaining question is: can we then give it our subsequently archived
redo logs and make it roll forward before going ready? I don't know that
yet, but maybe, if we just throw these WAL into pg_wal, the thing might
be stupid enough to swallow them. If that does not work, there might
still be ways to trick it into believing it's a standby server and make
it roll forward these WAL.

So, there are not only ways to avoid the useless storage amplification,
there might even be means to get rid of that whole misconceived "API".

Things that remain to be figured out:

1. What does pg_start_backup actually do, and why would that be
   necessary? I could not find exhaustive information, but this can
   probably be figured out from the source. Currently I know this much:
   - it writes a backup_label file. That is just a few lines of ASCII
     and should not be difficult to produce.
   - it does a checkpoint. This can probably be done with the respective
     command (if it is needed at all).
   - it temporarily switches to full_page_writes. This cannot be don
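Appendix: a minimal sketch of the snapshot bracket described above, done
in a single psql session so that the non-exclusive backup stays open
exactly as long as the snapshot takes. "tank/pgdata", the label and the
output path are placeholders; error handling and the returned
tablespace_map are omitted:

#!/bin/sh
# before-hook sketch: start a non-exclusive backup, take the recursive
# snapshot from inside the same session, stop the backup, and keep the
# output (start LSN plus backup_label content) for inclusion in the backup.
psql -AtqX -U postgres <<'EOF' > /var/db/pg-backup-label.out
SELECT pg_start_backup('snap', false, false);
\! zfs snapshot -r tank/pgdata@backup
SELECT labelfile FROM pg_stop_backup(false);
EOF
# the file-based backup then reads the snapshot and must also pick up
# /var/db/pg-backup-label.out as the backup_label of that backup.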
Re: Something else about Redo Logs disappearing
On Thu, Jun 11, 2020 at 10:53:15PM +0200, Laurenz Albe wrote:
! On Thu, 2020-06-11 at 22:35 +0200, Magnus Hagander wrote:
! > I believe somebody around that time also wrote a set of bash scripts that can be used in a pre/post-backup-job combination with the current APIs.
!
! https://github.com/cybertec-postgresql/safe-backup

Ah, thank You, very nice. I've never seen anybody coding bash - it is
strongly shunned in the Berkeley community.

Some questions:

1. There are explicit error messages in loc-82 and -92 of pgpre.sh.
   Where are these written to?

2. The result data from pg_stop_backup() are stored in the living
   database. But, according to the docs, they should be placed into the
   completed backup. Do I have a misunderstanding here?

3. The most common cause of a backup error might be a tape-side
   malfunction. As far as I see, the way to handle this currently is to
   provide a timeout for pgpre.sh (which is problematic, because we
   might just as well have reached end-of-tape and have to wait until
   Monday for the operator to change it). May I suggest adding a switch
   to pgpost.sh, in order to voluntarily fail out of the job?

4. If, by misconfiguration and/or operator error, the backup system
   happens to start a second backup in parallel to the first, do I
   correctly assume that both backups will be rendered inconsistent
   while this may not be visible to the operator, and that the earlier
   backup would be flagged as apparently successful while carrying the
   wrong (later) label?

BTW: what does, in general, happen if a backup_label file gets
accidentally swapped with one from a parallel, but slightly later,
backup? Do I correctly assume that such a mistake gets detected somehow,
as otherwise it would have just the same unwelcome effects (i.e. silent
data corruption) as no backup_label at all?

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Sat, Jun 13, 2020 at 01:53:28PM +0200, Laurenz Albe wrote:
! > I've never seen anybody coding bash - it is strongly shunned in the
! > Berkeley community.
!
! Strange, but then I don't move in these circles.

Never mind.

! > Some Questions:
! > 1. There are explicit error messages in loc-82 and -92 of pgpre.sh.
! > To where are these written?
!
! Standard error.  It is up to the caller of the script to route that
! somewhere useful.

Understood.

! > 2. The result data from pg_stop_backup() are stored into the living
! > database. But, according to the docs, they should be placed into
! > the completed backup. Do I have a misunderstanding here?
!
! Right, but these scripts don't know anything about that backup itself.
! They are designed to be called before and after the backup.
! In between, you back up the data directory however you think fit.
!
! It is the responsibility of the caller of the post-backup script
! to add the "backup_label" file to the backup.

I see.

! > 4. If, by misconfiguration and/or operator error, the backup system
! > happens to start a second backup. in parallel to the first,
! > then do I correctly assume, both backups will be rendered
! > inconsistent while this may not be visible to the operator; and
! > the earlier backup would be flagged as apparently successful while
! > carrying the wrong (later) label?
!
! If you are using my scripts and start a second backup while the first
! one is still running, the first backup will be interrupted.

This is not what I am asking. It appears correct to me that, on the
database side, the first backup will be interrupted. But on the tape
side, this might go unnoticed, and on completion it will successfully
receive the termination code from the *SECOND* backup - which means that
on tape we will have a seemingly successful backup, which
 1. is corrupted, and
 2. carries a wrong label.

! This is specific to my scripts, PostgreSQL's non-exclusive backup
! can perform more than one concurrent backup successfully.
! I tried to keep things simple.

I understand. But the operator may not know that and/or accidentally
start a second backup while one is still running. And this will then
result in ...

! If you have the wrong "backup_label", you end up with silent data corruption.

... this.

Indeed this is difficult to avoid, because the high risk of silent data
corruption is an elementary architectural feature of the so-called "new
API". Which is why this is not going to run on my ship.

But you will have to wait - the to-be-expected inrush of service sales
due to corrupted clusters will only happen after R.13 is active and
people are *forced* to cope with that "new API".

Thanks for the effort of answering my questions.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Thu, Jun 11, 2020 at 10:35:13PM +0200, Magnus Hagander wrote: ! > Okay. So lets behave like professional people and figure how that ! > can be achieved: ! > At first, we drop that WAL requirement, because with WAL archiving ! > it is already guaranteed that an unbroken chain of WAL is always ! > present in the backup (except when we have a bug like the one that ! > lead to this discussion). ! > So this is **not part of the scope**. ! > ! ! I would assume that anybody who deals with backups professionally wouldn't ! consider that out of scope, I strongly disagree. I might suppose You haven't thought this to the proper end. See: The WAL backup is, for a couple of reasons, different to most other backup objects: - it runs on demand, not per schedule. - it must be immediately repeated (or alerted), until success, otherwise the database might stop. - it might use a different storage (e.g. twofold/mirrored), for obvious reasons. - it gets deleted after backup, and that is a point-of-no-return. For the data tree backup, on the contrary, all these attributes are identical to those of any other incr/full filesystem backup. You can see that all the major attributes (scheduling, error-handling, signalling, ...) of a WAL backup are substantially different to that of any usual backup. This is a different *Class* of backup object, therefore it needs an appropriate infrastructure that can handle these attributes correctly. The WAL do *not belong* into the scope of the data tree backup, they are to be handled by different schemes in a different way. But, if You never have considered *continuous* archiving, and only intend to take a functional momentarily backup of a cluster, then You may well have never noticed these differences. I noticed them mainly because I did *BUILD* such an infrastructure (the 20 lines of shell script, you know). And yes, I was indeed talking about *professional* approaches. ! There is *absolutely* no need for threading to use the current APIs. You ! need to run one query, go do something else, and then run another ! query. Wrong. The point is, I dont want to "go do something else", I have to exit() and get back to the initiator at that place. This is also clearly visible in Laurenz' code: he utilizes two unchecked background tasks (processes, in this case) with loose coupling for the purpose, as it does not work otherwise. ! But then, lets think another step forward: for what purpose do we ! > actually need to call pg_start_backup() and pg_stop_backup() at all? ! > I couldn't find exhaustive information about that, only some partial ! > facts. ! > ! ! Since you don't trust the documentation, I suggest you take a look at ! https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=55cac186dc71fcc2f4628f9974b30850bb51eb5d;hb=92c58fd94801dd5c81ee20e26c5bb71ad64552a8#l10438 Thanks, this indeed gives a most entertaining read. Honestly, I've never seen such a source before - there is thrice as much storytime than code. :) The most interesting point in there appears to be this: > that the backup label and tablespace map files are not written to > disk. Instead, their would-be contents are returned in *labelfile > and *tblspcmapfile, This is in do_pg_start_backup() - so we actually HAVE this data already at the *START* time of the backup! 
Then why in hell do we wait until the END of the backup before we hand this data to the operator: at a time when the DVD with the backup is already finalized and cannot be changed anymore, so that we have to send this data to quaternary storage (spell: the line printer) and have girl Friday sort them out (and probably mix them up)?? Is this indeed Boeing-security: try to avoid a possible mistake by creating a much bigger risk? ! Things that remain to be figured out: ! > 1. What does pg_start_backup actually do and why would that be ! > necessary? I could not find exhaustive information, but this can ! > probably figured from the source. Currently I know so much: ! > - it writes a backup_label file. That is just a few lines of ! > ASCII and should not be difficult to produce. ! > ! ! It does that only in exclusive mode, and doing that is one of the big ! problems with exclusive mode. So don't do that. As I can read, there is no difference in the function requirements between exclusive and non-exclusive mode, in that regard: the backup-label file is NOT necessary in the running cluster data tree, BUT it should get into the RESTORED data tree before starting it. And I can't find a single one of those "big problems". What I do find is just people whining that their cluster doesn't start and they can't simply delete a file, even if told so. Like a soldier complaining that his gun doesn't shoot and he has no idea how to reload. I could devise a dozen ways to detect that situation automatically during rc.d and remove the file - if it were of any concern to me. ! > I now hop
Re: Something else about Redo Logs disappearing
On Mon, Jun 15, 2020 at 11:44:33AM +0200, Laurenz Albe wrote: ! On Sat, 2020-06-13 at 19:48 +0200, Peter wrote: ! > ! > 4. If, by misconfiguration and/or operator error, the backup system ! > ! > happens to start a second backup. in parallel to the first, ! > ! > then do I correctly assume, both backups will be rendered ! > ! > inconsistent while this may not be visible to the operator; and ! > ! > the earlier backup would be flagged as apparently successful while ! > ! > carrying the wrong (later) label? ! > ! ! > ! If you are using my scripts and start a second backup while the first ! > ! one is still running, the first backup will be interrupted. ! > ! > This is not what I am asking. It appears correct to me, that, on ! > the database, the first backup will be interrupted. But on the ! > tape side, this might go unnoticed, and on completion it will ! > successfully receive the termination code from the *SECOND* ! > backup - which means that on tape we will have a seemingly ! > successful backup, which ! > 1. is corrupted, and ! > 2. carries a wrong label. ! ! That will only happen if the backup that uses my scripts does the ! wrong thing. Yes. Occasionally software does the wrong thing, it's called "bugs". ! An example: ! ! - Backup #1 calls "pgpre.sh" ! - Backup #1 starts copying files ! - Backup #2 calls "pgpre.sh". ! This will cancel the first backup. ! - Backup #1 completes copying files. ! - Backup #1 calls "pgpost.sh". ! It will receive an error. ! So it has to invalidate the backup. ! - Backup #2 completes copying files. ! - Backup #2 calls "pgpost.sh". ! It gets a "backup_label" file and completes the backup. That's not true. Now let me see how to compile a bash... and here we go: ! An example: ! ! - Backup #1 calls "pgpre.sh" > $ ./pgpre.sh > backup starting location: 1/C858 > $ We now have: > 24129 10 SJ 0:00.00 /usr/local/bin/bash ./pgpre.sh > 24130 10 SJ 0:00.00 /usr/local/bin/bash ./pgpre.sh > 24131 10 SJ 0:00.01 psql -Atq > 24158 10 SCJ 0:00.00 sleep 5 And: > postgres=# \d > List of relations > Schema | Name | Type | Owner > ++---+-- > public | backup | table | postgres > (1 row) > > postgres=# select * from backup; > id | state | pid | backup_label | tablespace_map > +-+---+--+ > 1 | running | 24132 | | > (1 row) ! - Backup #1 starts copying files Let's suppose it does now. ! - Backup #2 calls "pgpre.sh". > $ ./pgpre.sh > backup starting location: 1/C924 > $ FATAL: terminating connection due to administrator command > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > connection to server was lost > Backup failed > ./pgpre.sh: line 93: ${PSQL[1]}: ambiguous redirect > > $ echo $? > 0 ! This will cancel the first backup. Yes, it seems it did: > 25279 10 SJ 0:00.00 /usr/local/bin/bash ./pgpre.sh > 25280 10 IWJ 0:00.00 /usr/local/bin/bash ./pgpre.sh > 25281 10 SJ 0:00.01 psql -Atq > 25402 10 SCJ 0:00.00 sleep 5 > postgres=# \d > List of relations > Schema | Name | Type | Owner > ++---+-- > public | backup | table | postgres > (1 row) > > postgres=# select * from backup; > id | state | pid | backup_label | tablespace_map > +-+---+--+ > 1 | running | 25282 | | > (1 row) ! - Backup #1 completes copying files. ! - Backup #1 calls "pgpost.sh". > $ ./pgpost.sh > START WAL LOCATION: 1/C924 (file 0001000100C9) > CHECKPOINT LOCATION: 1/C958 > BACKUP METHOD: streamed > BACKUP FROM: master > START TIME: 2020-06-15 14:09:41 CEST > LABEL: 2020-06-15 14:09:40 > START TIMELINE: 1 > > $ echo $? > 0 ! 
It will receive an error. ! So it has to invalidate the backup. Where is the error? What we now have is this: No processes anymore. > id | state | pid | backup_label > | tablespace_map > +--+---++ > 1 | complete | 25282 | START WAL LOCATION: 1/C924 (file > 0001000100C9)+| > | | | CHECKPOINT LOCATION: 1/C958 > +| > | | | BACKUP METHOD: streamed
Re: Something else about Redo Logs disappearing
On Mon, Jun 15, 2020 at 03:19:29PM +0200, Laurenz Albe wrote: ! On Mon, 2020-06-15 at 14:50 +0200, Peter wrote: ! > ! An example: ! > ! ! > ! - Backup #1 calls "pgpre.sh" ! > ! - Backup #1 starts copying files ! > ! - Backup #2 calls "pgpre.sh". ! > ! This will cancel the first backup. ! > ! - Backup #1 completes copying files. ! > ! - Backup #1 calls "pgpost.sh". ! > ! It will receive an error. ! > ! So it has to invalidate the backup. ! > ! - Backup #2 completes copying files. ! > ! - Backup #2 calls "pgpost.sh". ! > ! It gets a "backup_label" file and completes the backup. ! > ! > That's not true. ! ! Ah, yes, you are right. Thank You. ! Since "pgpre.sh" and "pgpost.sh" are independent, there ! is no way to tell which of them belongs to which other. Correct. ! So calling "pgpost.sh" indeed ends the most recently started ! backup and returns "backup_label" accordingly. ! ! That means: the caller of the scripts has to make sure ! not to start a second backup while the first one is running. Never run two backups in parallel with such an approach, exactly. And that is one of a couple of likely pitfalls I perceived when looking at that new API. We could fix that, but that will then get more complicated - and people will usually not do that. And that's why I consider that new API as rather dangerous. cheerio, PMc
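One possible guard, sketched here against the bookkeeping table the scripts were shown to maintain earlier in the thread (nothing official, just an illustration of "the caller has to make sure"): refuse to start when a backup is already marked as running.

-- run by the caller before invoking pgpre.sh; abort if any row is returned
SELECT pid FROM backup WHERE state = 'running';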
Re: Something else about Redo Logs disappearing
On Sun, Jun 14, 2020 at 03:05:15PM +0200, Magnus Hagander wrote: ! > You can see that all the major attributes (scheduling, error-handling, ! > signalling, ...) of a WAL backup are substantially different to that ! > of any usual backup. ! ! > This is a different *Class* of backup object, therefore it needs an ! > appropriate infrastructure that can handle these attributes correctly. ! > ! ! Yes, this is *exactly* why special-handling the WAL during the base backup ! makes a lot of sense. Certainly. Only I prefer to do the special-handling *outside of* the base backup. ! Is it required? No. ! Will it make your backups more reliable? Yes. *shrug* I have no benefit in increasing reliability from 250% to 330%, if that would be the case at all. ! But, if You never have considered *continuous* archiving, and only ! > intend to take a functional momentarily backup of a cluster, then You ! > may well have never noticed these differences. I noticed them mainly ! > because I did *BUILD* such an infrastructure (the 20 lines of shell ! > script, you know). ! > ! ! Yes, if you take a simplistic view of your backups, then yes. You appear to sound like an insurance salesman who desperately tries to sell a third health insurance policy to somebody who already has two of them, by trying to build on unfounded precariousness. ! ! There is *absolutely* no need for threading to use the current APIs. You ! > ! need to run one query, go do something else, and then run another ! > ! query. ! > ! > Wrong. The point is, I dont want to "go do something else", I have to ! > exit() and get back to the initiator at that place. ! > ! ! That is not a requirement of the current PostgreSQL APIs. We'll be done with that whole API in a few more lines now. (I'm getting tired of this.) ! (in fact, using ! threading would add a significant extra burden there, as libpq does not ! allow sharing of connections between threads) I never said one would need to thread the DB connections. ! That is a requirement, and indeed a pretty sharp limitation, of the *other* ! APIs you are working with, it sounds like. What "other"? ! The PostgreSQL APIs discussed to *not* require you to do an exit(). Nor do ! they require any form of threading. Ah, nice try! But, we're *NOT* shifting blame around. We do instead get things working. We do proper engineering. ! And the fact that you need to do an exit() would negate any threading ! anyway, so that seems to be a false argument regardless. You do know exactly what I'm talking about. ! This is also clearly visible in Laurenz' code: he utilizes two ! > unchecked background tasks (processes, in this case) with loose ! > coupling for the purpose, as it does not work otherwise. ! > ! ! Yes, because he is also trying to work around a severely limited API *on ! the other side*. There is no "other" side. There is only *one* side: to get things working. And for interaction, Jon Postel's law applies: Be conservative in what you provide, and liberal in what you require. This is how the Internet was built. The modern-day linux-youngsters tend to forget that we all stand on the shoulders of giants. ! The most interesting point in there appears to be this: ! > > that the backup label and tablespace map files are not written to ! > > disk. Instead, their would-be contents are returned in *labelfile ! > > and *tblspcmapfile, ! > ! > This is in do_pg_start_backup() - so we actually HAVE this data ! > already at the *START* time of the backup! ! ! ! 
> Then why in hell do we wait until the END of the backup before we ! > hand this data to the operator: at a time when the DVD with the ! > ! ! Because it cannot be safely written *into the data directory*. ! ! Now, it could be written *somewhere else*, that is true. And then you would ! add an extra step at restore time to rename it back. But then your restore ! would now also require a plugin. Yes, and as it is now, it requires girl Friday to fetch them from the line-printer and mix them up - which, as was already explained, can end up a *lot* worse. Or, equivalently and as here practically demonstrated, some consultant trainee writing some script which, when accidentally invoked twice, creates an inconsistent backup, and this being invisible to the operator. That's indeed dangerous enough for my taste. But let's grab that from the start: Yes, I didn't trust the docs. Because, as people here are so crazy about the old API being troublesome and dangerous and must be deprecated, and the whole thing being so imminent, then there should be some REASON for that. And from the docs I could not see any reason - so I supposed there must be something else in pg_start_backup(); something that is not explained in the docs, and that would explain the whole fuss. But, in fact, there is no such thing. First, the backup_label, which should not stay in the running cluster tree. So, what bad does happen when it stays there? Nothing at all. The cluster migh
Re: Something else about Redo Logs disappearing
On Mon, Jun 15, 2020 at 09:46:34PM +0200, Laurenz Albe wrote: ! On Mon, 2020-06-15 at 19:00 +0200, Peter wrote: ! > And that is one of a couple of likely pitfalls I perceived when ! > looking at that new API. ! ! That is a property of my scripts, *not* of the non-exclusive ! backup API... Then how would I have figured it before even knowing Your scripts? The API requires this kind of programming, and this kind of programming creates dangerous pitfalls. After You have trained some hundreds of system administrators, You will also see such things right at first glance. But then it's futile to discuss with religious people, as they are blind to reasoning: AL II. 58, "The slaves shall serve." cheerio, PMc
SELECT creates millions of temp files in a single directory
In modern versions of postgres a simple SELECT writes a couple of million individual temp files into a single directory under pgsql_tmp. I know of no filesystem that would take that lightly, and even ZFS gets some problems with such extremely large directories. What is the rationale behind this behaviour and how is it supposed to be handled? The specific case is a database that has been in use for almost 20 years. It used to run on a Pentium 2 with 768 MB memory; there it was slow, but it did work reliably. Now it runs on a 2660v3 with 32 GB memory, and tends to exhaust that memory. Database size has not increased, postgres memory configuration has not been changed, only postgres versions were gradually upgraded from 8 to 12. This is the memory configuration: shared_buffers = 40MB temp_buffers = 20MB work_mem = 50MB max_stack_depth = 40MB max_files_per_process = 200 But the actual memory consumption is 30 GB (per query!), i.e. all of the installed memory: UID PID PPID C PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND 770 53143 10252 16 20 0 9359944 7796128 zio->io_ DsJ- 3:11.29 postgres: bareos bareos fd00::118(53471) SELECT (postgres) 770 54334 10252 17 20 0 9279780 24388 zio->io_ DsJ- 2:58.19 postgres: parallel worker for PID 53143(postgres) 770 54335 10252 17 20 0 9279780 22168 zfs DLsJ - 2:51.30 postgres: parallel worker for PID 53143(postgres) This is the situation on the filesystem: $ data12/base # du -k 16387/ 9093312 16387/ $ data12/base/pgsql_tmp # du -k * 19979644 pgsql_tmp53143.0.sharedfileset $ data12/base/pgsql_tmp/pgsql_tmp53143.0.sharedfileset # ls -la | wc 1264755 11382788 96271672 More than a million files in a single directory; this is unacceptable. This is the query: SELECT DISTINCT Path.PathId, File.PathId, Path.Path FROM Path LEFT JOIN File USING (PathId) LEFT JOIN PathHierarchy ON (Path.Pathid = PathHierarchy.PPathId) WHERE File.PathId IS NULL AND PathHierarchy.PPathId IS NULL LIMIT 30 These are the stats: Path: 464229 live tuples, 42 MB, 49 MB index File: 43779725 live tuples, 1 dead tuple, 7160 MB, 2971 MB index PathHierarchy: 380879 live tuples, 13 MB, 17 MB index The pathhierarchy table is irrelevant to the behaviour and can be left out. Vacuum and Analyze have been run right before the query.
This is the structure: CREATE TABLE IF NOT EXISTS public.path ( pathid integer NOT NULL DEFAULT nextval('path_pathid_seq'::regclass), path text COLLATE pg_catalog."default" NOT NULL, CONSTRAINT path_pkey PRIMARY KEY (pathid) ) CREATE TABLE IF NOT EXISTS public.file ( fileid bigint NOT NULL DEFAULT nextval('file_fileid_seq'::regclass), fileindex integer NOT NULL DEFAULT 0, jobid integer NOT NULL, pathid integer NOT NULL, deltaseq smallint NOT NULL DEFAULT 0, markid integer NOT NULL DEFAULT 0, fhinfo numeric(20,0) NOT NULL DEFAULT 0, fhnode numeric(20,0) NOT NULL DEFAULT 0, lstat text COLLATE pg_catalog."default" NOT NULL, md5 text COLLATE pg_catalog."default" NOT NULL, name text COLLATE pg_catalog."default" NOT NULL, CONSTRAINT file_pkey PRIMARY KEY (fileid) ) CREATE INDEX IF NOT EXISTS file_jpfid_idx ON public.file USING btree (jobid ASC NULLS LAST, pathid ASC NULLS LAST, name COLLATE pg_catalog."default" ASC NULLS LAST) TABLESPACE pg_default; And this does not give the explanation: bareos=# explain SELECT DISTINCT Path.PathId, File.PathId, Path.Path FROM Path LEFT JOIN File USING (PathId) LEFT JOIN PathHierarchy ON (Path.Pathid = PathHierarchy.PPathId) WHERE File.PathId IS NULL AND PathHierarchy.PPathId IS NULL LIMIT 30; QUERY PLAN --- Limit (cost=1367204.20..1370204.20 rows=30 width=67) -> HashAggregate (cost=1367204.20..1371346.45 rows=414225 width=67) Group Key: path.pathid, file.pathid, path.path -> Gather (cost=1225693.97..1364097.51 rows=414225 width=67) Workers Planned: 2 -> Parallel Hash Anti Join (cost=1224693.97..1321675.01 rows=172594 width=67) Hash Cond: (path.pathid = file.pathid) -> Parallel Hash Anti Join (cost=6727.04..19953.72 rows=181359 width=63) Hash Cond: (path.pathid = pathhierarchy.ppathid) -> Parallel Seq Scan on path (cost=0.00..7361.29 rows=193429 width=63) -> Parallel Hash (cost=3926.46..3926.46 rows=224046 width=4) -> Parallel Seq Scan on pathhierarchy (cost=0.00..3926.46 rows=224046 width=4) -> Parallel Hash (cost=918690.59..918690.59 rows=18241547 width=4) ->
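For sizing the problem without digging through pgsql_tmp by hand, the spill can also be watched from SQL; a small sketch using the standard statistics views and the log_temp_files setting (the latter is superuser-only):

-- cumulative number and volume of temp files per database
SELECT datname, temp_files, temp_bytes
FROM   pg_stat_database
WHERE  datname = current_database();

-- log every temp file of 1 MB or more as it is created (value is in kilobytes)
SET log_temp_files = 1024;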
Re: SELECT creates millions of temp files in a single directory
On Sat, Apr 23, 2022 at 02:11:00PM -0700, Adrian Klaver wrote: ! On 4/23/22 12:50, Peter wrote: ! ! ! > People seem to have been brainwashed by Web-Services and OLTP, ! > and now think the working set must always fit in memory. But this ! > is only one possible usecase, it is not the exclusive only one. ! ! This is no-win situation as most of the complaints in recent years have been ! that Postgres was/is to conservative in its default settings and is not ! taking advantage of newer more powerful hardware. I know, and You got to the point; this is exactly what I am talking about: people take the abundance of resources for granted. In Rel. 8 postgres was a breathtaking beauty of engineering: the style of old, driven to perfection. Now You're gradually sacrificing this, for the speed junkies and to protect those who are not engineers from their mistakes. And no, I don't know how this could be solved: the more influential you get, the more driven you are by public demand; the less freedom you have to follow ideals. David Johnston thinks we must just not speak that out, we must instead behave like "the emperor's new clothes", and follow google's understanding of "positive values". Sorry, that doesn't work for me.
Re: SELECT creates millions of temp files in a single directory
On Sat, Apr 23, 2022 at 05:13:41PM -0400, Tom Lane wrote: ! "David G. Johnston" writes: ! > I'll add that given the nature of the problem that changing temp_file_limit ! > away from its default of unlimited may be useful. ! > https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-DISK ! ! Maybe ... that limits the total space involved, not the number of ! files it's separated into, so I'm not sure how much it can help. That's what I understood from the docs, too. What I also read in the docs is that it will just kill the query when it hits the limit, and this is not really what I want. And btw, I probably lost the relevant info about the running version in translation: Name : postgresql12-server Version: 12.10 Installed on : Mon Apr 4 04:13:18 2022 CEST Origin : databases/postgresql12-server Architecture : FreeBSD:13:amd64 ! It might be worth playing around to see how varying values of work_mem ! affect this behavior, too. That should change the planner's initial ! estimate of the number of hash batches needed, which likely plays into ! this somehow. Agreed. But then, looking at the generated filenames, in the style of "i44297of524288.p1.0" - this is an algorithm at work, so somebody must have done this, and obviously wasn't bothered by creating half a million files, after having created another half million already. So I thought I might just ask what the idea with this is. | > It would help if you can provide a self-contained demonstration | > that others can then verify and debug (or explain). | | ... and that. As this message stands, it's undocumented whining. | Please see | | https://wiki.postgresql.org/wiki/Slow_Query_Questions | | for the sort of information we need to debug performance issues. It is not a performance issue, it is a design question: You inflict pain on my beloved ZFS, and as a lover I react. ;) | (I recall that we have fixed some problems in the past that could | lead to unreasonable numbers of temp files in hash joins. Whether | there's more to do, or Peter is running a version that lacks those | fixes, is impossible to tell with the given info.) Yes, I accidentally deleted that info too when I deleted the more extensive rants from my original posting. See here, above.
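For the record, what Tom's suggestion amounts to as a per-session experiment - a sketch only; as noted above, the limit aborts the query rather than throttling it, and on this version temp_file_limit needs superuser (work_mem does not):

SET work_mem = '256MB';        -- a larger value should reduce the number of planned hash batches
SET temp_file_limit = '20GB';  -- hard cap on temp space for this session; the query errors out beyond it
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT Path.PathId, File.PathId, Path.Path
FROM Path LEFT JOIN File USING (PathId)
WHERE File.PathId IS NULL
LIMIT 30;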
Why is my table continuously written?
Hello, this is postgres version 12.11_1 on Freebsd 13.1 I have a table "mess", filename "6309215", that behaves strange. Data is currently only inserted/appended, there is no update and no (successful) delete done, autovacuum is NOT enabled. This is the definition: CREATE TABLE IF NOT EXISTS dnstap.mess ( id bigint NOT NULL DEFAULT nextval('dnstap.mess_id_seq'::regclass), mess text COLLATE pg_catalog."default", hash text COLLATE pg_catalog."default", CONSTRAINT mess_pkey PRIMARY KEY (id), CONSTRAINT mess_hash_key UNIQUE (hash) ) TABLESPACE l1only; And this is the table: -rw--- 1 770 770 1073741824 Aug 2 04:38 6309212 -rw--- 1 770 770 1073741824 Aug 2 04:38 6309212.1 -rw--- 1 770 770 1073741824 Aug 2 04:38 6309212.2 -rw--- 1 770 770 1073741824 Aug 2 04:39 6309212.3 -rw--- 1 770 770 1073741824 Aug 2 04:39 6309212.4 -rw--- 1 770 770 405872640 Aug 3 04:28 6309212.5 -rw--- 1 770 770 1441792 Aug 3 03:25 6309212_fsm -rw--- 1 770 770 180224 Aug 2 05:13 6309212_vm -rw--- 1 770 770 0 Aug 2 00:21 6309213 -rw--- 1 770 7708192 Aug 2 04:40 6309214 -rw--- 1 770 770 1073741824 Aug 3 04:29 6309215 -rw--- 1 770 770 1073741824 Aug 3 04:30 6309215.1 -rw--- 1 770 770 1073741824 Aug 3 04:13 6309215.10 -rw--- 1 770 770 995811328 Aug 3 04:16 6309215.11 -rw--- 1 770 770 1073741824 Aug 3 04:32 6309215.2 -rw--- 1 770 770 1073741824 Aug 3 04:34 6309215.3 -rw--- 1 770 770 1073741824 Aug 3 04:35 6309215.4 -rw--- 1 770 770 1073741824 Aug 3 04:37 6309215.5 -rw--- 1 770 770 1073741824 Aug 3 04:39 6309215.6 -rw--- 1 770 770 1073741824 Aug 3 04:41 6309215.7 -rw--- 1 770 770 1073741824 Aug 3 04:42 6309215.8 -rw--- 1 770 770 1073741824 Aug 3 04:12 6309215.9 -rw--- 1 770 770 3170304 Aug 3 04:38 6309215_fsm -rw--- 1 770 770 393216 Aug 3 04:38 6309215_vm -rw--- 1 770 770 1073741824 Aug 3 03:17 6309216 -rw--- 1 770 770 1073741824 Aug 2 04:33 6309216.1 -rw--- 1 770 770 241360896 Aug 3 03:18 6309216.2 -rw--- 1 770 770 606208 Aug 3 03:18 6309216_fsm -rw--- 1 770 770 73728 Aug 3 03:17 6309216_vm -rw--- 1 770 77043139072 Aug 3 03:18 6309217 There is another table, "6309212", that is also only appended data, and that behaves as expected: only the final segment is written, the other segments are a day old. But in "6309215" all the segments were recently written. ("6309216" is the TOAST for "6309215", and "6309217" seems to be an index on that.) This is what happens: 2022-08-03 03:09:03 CEST -- --- Check for hash value clash with old entries SELECT i1.id FROM dnstap.incoming AS i1, dnstap.mess AS mess WHERE mess.hash = md5(i1.mess) AND mess.mess <> i1.mess I don't think a SELECT like this has reasons to write the referenced table - but it does: -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215 -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215.1 -rw--- 1 770 770 1073741824 Aug 3 03:09:47 2022 6309215.10 -rw--- 1 770 770 965394432 Aug 3 03:10:17 2022 6309215.11 -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215.2 -rw--- 1 770 770 1073741824 Aug 3 03:09:20 2022 6309215.3 -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215.4 -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215.5 -rw--- 1 770 770 1073741824 Aug 3 03:10:18 2022 6309215.6 -rw--- 1 770 770 1073741824 Aug 3 03:10:21 2022 6309215.7 -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215.8 -rw--- 1 770 770 1073741824 Aug 3 03:09:21 2022 6309215.9 And this goes on in that way with all the queries. 
And when finally the vacuum analyze is run, it is the same, only that one writes the segments in sequence (only inserts were appended to the end, there are no updates or deletes): 2022-08-03 04:28:16 CEST VACUUM (ANALYZE) dnstap.mess; -rw--- 1 770 770 1073741824 Aug 3 04:29 6309215 -rw--- 1 770 770 1073741824 Aug 3 04:30 6309215.1 -rw--- 1 770 770 1073741824 Aug 3 04:13 6309215.10 -rw--- 1 770 770 995811328 Aug 3 04:16 6309215.11 -rw--- 1 770 770 1073741824 Aug 3 04:32 6309215.2 -rw--- 1 770 770 1073741824 Aug 3 04:33 6309215.3 -rw--- 1 770 770 1073741824 Aug 3 04:07 6309215.4 -rw--- 1 770 770 1073741824 Aug 3 04:08 6309215.5 -rw--- 1 770 770 1073741824 Aug 3 04:09 6309215.6 -rw--- 1 770 770 1073741824 Aug 3 04:10 6309215.7 -rw--- 1 770 770 1073741824 Aug 3 04:11 6309215.8 -rw--- 1 770 770 1073741824 Aug 3 04:12 6309215.9 -rw--- 1 770 770 3170304 Aug 3 03:18 6309215_fsm -rw--- 1 770 770 393216 Aug 3 04:28 6309215_vm What is worse is this: -rw--- 1 770 770 16777216 Aug 3 04:28 000100780067.ok -rw--- 1 770
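As an aside, the filenode-to-relation mapping relied on in the listings above can be resolved from SQL instead of guessing; a small sketch (the tablespace lookup is only needed because the table lives in "l1only"):

SELECT pg_relation_filepath('dnstap.mess');   -- relation -> on-disk path
SELECT pg_filenode_relation(
         (SELECT oid FROM pg_tablespace WHERE spcname = 'l1only'),
         6309215);                            -- filenode -> relation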
Why is my table continuously written? -> XID issue?
I had a look into what actually changed in the table. At the first write that I grabbed, four rows in that segment had such a change: 117ee000 77 00 00 00 00 df b8 82 8e a4 00 00 64 00 a0 00 |w...d...| 117ee000 77 00 00 00 f0 22 b4 f3 68 d3 00 00 64 00 a0 00 |w"..h...d...| 117ee0a0 f8 5c 00 00 f8 5c 00 00 09 00 00 00 06 00 f7 8b |.\...\..| 117ee0b0 13 00 04 00 93 00 18 0d d8 27 fe 01 00 00 00 00 |.'..| 117ee0b0 13 00 04 00 93 01 18 0d d8 27 fe 01 00 00 00 00 |.'..| 1aad8000 77 00 00 00 30 fc 8c a5 80 8a 00 00 58 00 a8 00 |w...0...X...| 1aad8000 77 00 00 00 e8 12 8a f3 6e 5b 00 00 58 00 a8 00 |w...n[..X...| 1aad80a0 f6 5d 00 00 00 00 00 00 |.]..| 1aad80b0 09 00 00 00 06 00 6c d5 10 00 04 00 03 08 18 0d |..l.| 1aad80b0 09 00 00 00 06 00 6c d5 10 00 04 00 03 09 18 0d |..l.| If I get that right, then it is the HEAP_XMIN_COMMITTED from t_infomask that got set for the rows. The next write is a bit different: a000 76 00 00 00 60 9d 84 d0 23 4c 04 00 70 00 50 01 |v...`...#L..p.P.| a000 78 00 00 00 80 81 e9 05 78 0d 04 00 70 00 50 01 |x...x...p.P.| a690 cf 51 00 00 00 00 00 00 |504c54...Q..| a690 cf 51 00 00 13 5e 00 00 |504c54...Q...^..| a6a0 00 00 00 00 06 00 05 00 12 00 04 00 03 2b 18 0d |.+..| a6a0 00 00 00 00 06 00 05 00 12 00 04 00 93 23 18 0d |.#..| 003ec000 76 00 00 00 30 98 85 d0 59 e9 04 00 60 00 48 01 |v...0...Y...`.H.| 003ec000 78 00 00 00 08 33 1b 03 bd 97 04 00 60 00 48 01 |x3..`.H.| 003eddf0 cf 51 00 00 00 00 00 00 00 00 00 00 06 00 f6 01 |.Q..| 003eddf0 cf 51 00 00 13 5e 00 00 00 00 00 00 06 00 f6 01 |.Q...^..| 003ede00 02 00 04 00 03 2b 18 0d f3 56 2a 00 00 00 00 00 |.+...V*.| 003ede00 02 00 04 00 93 23 18 0d f3 56 2a 00 00 00 00 00 |.#...V*.| It seems here is the t_xmax for the row updated, and also flags in t_infomask changed - and there are about 1000 rows (of some 3 mio. that are contained in this segment) where this is done. So if I get this right, it looks like some out-of-band housekeeping is done here. Now, while I would understand that this is done as occasion offers when a record is read anyway, and I do not think that I grok the XID housekeeping in fullness, I also do not think that I have any open transactions that would persist a server restart. So I was under the impression that a server restart plus a vacuum FULL, without any further update/delete operations, should resolve all the needs for such housekeeping for the time being, and quiesce the situation. Apparently it does not. So, then, what does?
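The same information that was read out of the hex dump can be taken from pageinspect, assuming that extension can be installed; a sketch for one block, with 0x0100 being HEAP_XMIN_COMMITTED and 0x0200 HEAP_XMAX_COMMITTED:

CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT lp, t_xmin, t_xmax,
       (t_infomask & 256) <> 0 AS xmin_committed,  -- HEAP_XMIN_COMMITTED (0x0100)
       (t_infomask & 512) <> 0 AS xmax_committed   -- HEAP_XMAX_COMMITTED (0x0200)
FROM   heap_page_items(get_raw_page('dnstap.mess', 0));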
Re: pkg: two postgresql clients
On Wed, Aug 10, 2022 at 09:25:37AM +0200, m...@ft-c.de wrote: Hi Franz, You will get much better targeted help with such questions at https://forums.freebsd.org (if it is FreeBSD you're running) or in German on https://www.bsdforen.de/ (for all tastes of Berkeley). Something is apparently wrong here. postgres-15 is AFAIK the development branch, and it should not even be possible to install two versions at the same time on the same node. Which repository server are You using? (I compile locally, so I don't know what might be offered from the public servers.) And what is Your OS version? cheerio, PMc ! Hello, ! ! 1. ! pkg upgrade show this result: ! ! New packages to be INSTALLED: ! postgresql13-client: 13.7 ! postgresql15-client: 15.b1_1 ! ! Installed packages to be UPGRADED: ! firefox: 103.0.1,2 -> 103.0.2,2 ! nspr: 4.34 -> 4.34.1 ! opencore-amr: 0.1.5 -> 0.1.6 ! py39-psutil: 5.9.1_1 -> 5.9.1_2 ! ! There are two different PG clients. Is this correct? ! ! 2. ! The command ! pkg install firefox ! New packages to be INSTALLED: ! postgresql15-client: 15.b1_1 ! Installed packages to be UPGRADED: ! firefox: 103.0.1,2 -> 103.0.2,2 ! ! PG is not a dependent package of firefox? ! ! I use version: ! postgres -V ! postgres (PostgreSQL) 14.1 ! ! Franz
Re: How to make PostgreSQL utilities honor home directories?
Well, like others mentioned before, it is not entirely clear what You are trying to achieve. But, in any case, if this is Your problem On Thu, Sep 01, 2022 at 06:01:02PM -0400, Jeffrey Walton wrote: ! Hi Everyone, ! ! We are having a heck of a time getting PostgreSQL utilities to honor ! home directories. For example, when I execute this script: ! ! sudo -H -u postgres PGPASSWORD=${password} \ ! psql -h "${hostname}" -U "${username}" -d "${database}" \ ! --command="..." ! ! It produces failures: ! ! could not change directory to "/home/jwalton/godojo": Permission denied ... this appears to me to be rather a sudo issue. Because certainly psql cannot execute /as user postgres/ in a directory where user postgres is not allowed to enter. So sudo should fix that, and in my sudo installation I find either a "-D directory" option for sudo (that should change the directory accordingly) or a "--login" option (that would run a full login shell for the user postgres, which, alongside going to the postgres homedir, does a lot of other things which may or may not be desirable in your installation).
Tools for moving normalized data around
Hi, imagine I have a database containing normalized data: a whole bunch of tables all related via foreign keys (i.e. the thing one should usually have ;) ). So there is a dependency graph: all records relate to others in some tree-like fashion (has-many, belongs-to, etc.) Now I want to grab some part of the data, on a certain condition (let's say all records belonging to user 'Bob', if there is a "user" table somewhere at the tree-bottom), and move it to another database with the very same layout - which is already populated with data and runs a different instance of the same application. Grabbing the intended records is just some dead-simple Selects. But then inserting them into the other database is not fun, because some primary keys will likely collide. And if we start to update the primary keys to new unique values, we must consequently update ALL the foreign keys throughout the entire tree of tables. How to do this with two tables connected via one foreign key is explained a dozen times on Stack Overflow. But what if the tree is 50 tables and 120 foreign key columns? It can be done. But probably not manually. So, as this seems a very usual use-case for normalized data, is there any tooling available? Anywhere? (I searched, I didn't find.) Also, it seems the provided commands are not very supportive. Try to do an Insert and *always* create a new PK from the sequence, and return the old and the new PK for the inserted row. It seems this does not work without either naming all the other columns explicitly in the insert (impossible, they are always different) or establishing some trigger functions... Different example, same general problem: Imagine I have such a tree of normalized tables, and I want to grab a part of it and roll only that part back in time, to some state it was in X weeks ago. (That's possible if we store records of changes in an extra column within the rows themselves. It might even be possible with some postgres-internal data, which has a knowledge of history - but there seems to be no API access to that.) But I want to have this working at the push of a button, independent of the table layouts, only with a basic graph of the dependency tree and with an unlimited number of involved tables, i.e.: maintainable. I have searched for solutions (or building blocks to solutions) for these and similar problems, and didn't find much. I don't understand that - SQL is old, normalization is old, (even postgres has already become quite old) and these are genuine issues. It can't be I'm the only one thinking about such things. So maybe I'm looking in the wrong place? Any ideas welcome. Cheers, PMc
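For the two-table core of the problem, the per-foreign-key pattern one would expect such a tool to generate looks roughly like this; every table, column and sequence name here is made up for illustration, and the child block repeats once per FK edge in the tree:

-- build an old->new key map for the rows to be moved
CREATE TEMP TABLE id_map AS
    SELECT id AS old_id,
           nextval('target_users_id_seq') AS new_id
    FROM   source_users
    WHERE  owner = 'Bob';

-- parent rows get the new keys
INSERT INTO target_users (id, name)
SELECT m.new_id, s.name
FROM   source_users s
JOIN   id_map m ON m.old_id = s.id;

-- child rows follow the map
INSERT INTO target_orders (id, user_id, amount)
SELECT nextval('target_orders_id_seq'), m.new_id, s.amount
FROM   source_orders s
JOIN   id_map m ON m.old_id = s.user_id;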
Queries running forever, because of wrong rowcount estimate
TL;DR Version: == For a table where all rows have been deleted, under certain conditions the planner does assume that it contains the values that were previously present, and will therefore assume that nothing needs to be added, while in fact everything needs to be added. - This is PostgreSQL 12.13 on amd64-portbld-freebsd13.1, compiled by FreeBSD clang version 13.0.0 (g...@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303), 64-bit and if somebody can tell me this has been fixed in a more recent version, then I am happy. --- More elaborate Version: === When you have a table populated, pg_class may look like this: relpages | reltuples --+--- 1 | 3 Now when you delete all rows, this will not change, and the planner will now make the -wrong- assumption that the tuples are still present, and planning may be bad. That is why ANALYZE is now needed, and after ANALYZE things look like this, and do work again: relpages | reltuples --+--- 1 | 0 But, if you happen to run a VACUUM ANALYZE on that table, then things will look like this: relpages | reltuples --+--- 0 | 0 And now, for reasons unknown to me, the planner will again make the assumption that all the previous values are still present in the table, and planning will be bad. And at that point you can run ANALYZE as often as you want, it does not help anymore. --- Full Version: = Let's start with the query: INSERT INTO opcode (opcode) SELECT DISTINCT incoming.opcode AS opcode FROM incoming AS incoming LEFT JOIN opcode AS opcode USING (opcode) WHERE opcode.opcode IS NULL Table "incoming" will bring millions of rows, but column "opcode" knows only three distinct values. Other values might appear over time, so this INSERT will add such new values to table "opcode". So far, no problem. But what happens if I delete all rows from "opcode" and start the process afresh? Then the INSERT *should* reinsert all the (three) values freshly into table "opcode", i.e. there will be *millions* of rows "WHERE opcode.opcode IS NULL", before the "DISTINCT". This is also not a problem. Not yet. Now we decide that not all the rows in table "incoming" are already processable, so we limit the whole operation to those that are: INSERT INTO opcode (opcode) SELECT DISTINCT incoming.opcode AS opcode FROM incoming AS incoming LEFT JOIN opcode AS opcode USING (opcode) WHERE opcode.opcode IS NULL AND EXISTS ( SELECT ready.id FROM tmp_ready AS ready WHERE ready.id = incoming.id ) And now this query will run forever and never return: "Unique (cost=434008.43..434008.44 rows=1 width=6)" " -> Sort (cost=434008.43..434008.44 rows=1 width=6)" "Sort Key: incoming.opcode" "-> Nested Loop Semi Join (cost=1058.15..434008.42 rows=1 width=6)" " Join Filter: (incoming.id = pairs.ori)" " -> Gather (cost=1058.15..151427.56 rows=1 width=14)" "Workers Planned: 3" "-> Hash Anti Join (cost=58.15..150427.46 rows=1 width=14)" " Hash Cond: (incoming.opcode = opcode.opcode)" " -> Parallel Seq Scan on incoming (cost=0.00..147589.32 rows=741332 width=14)" " -> Hash (cost=31.40..31.40 rows=2140 width=6)" "-> Seq Scan on opcode (cost=0.00..31.40 rows=2140 width=6)" " -> Seq Scan on tmp_ready ready (cost=0.00..253869.27 rows=2296927 width=8)" The LEFT JOIN is estimated with only one result row (we know there can actually be millions), and therefore this goes into a Nested Loop that will read tmp_ready forever. Putting an Index onto tmp_ready helps the matter - but that is not the root cause here.
The root cause is a wrong estimate: When deleting all rows from table "opcode", this apparently does not change the planner behaviour. The planner still thinks that all values from the incoming.opcode column are already in opcode.opcode. After trying with different experiments, I finally managed to somehow kick ANALYZE to produce working statistics, and now it looks like this: "HashAggregate (cost=554927.43..554927.44 rows=1 width=6)" " Group Key: incoming.opcode" " -> Hash Semi Join (cost=291554.87..549213.83 rows=2285442 width=6)" "Hash Cond: (incoming.id = ready.id)" "-> Hash Anti Join (cost=1.01..192070.20 rows=2286638 width=14)" " Hash Cond: (incoming.opcode = opcode.opcode)" " -> Seq S
[Testcase] Queries running forever, because of wrong rowcount estimate
CREATE DATABASE ttc WITH OWNER = admin ENCODING = 'UTF8' LC_COLLATE = 'de_DE.UTF-8' LC_CTYPE = 'de_DE.UTF-8' TABLESPACE = pg_default CONNECTION LIMIT = -1 IS_TEMPLATE = False; select version(); PostgreSQL 12.13 on amd64-portbld-freebsd13.1, compiled by FreeBSD clang version 13.0.0 (g...@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303), 64-bit CREATE TABLE public.incoming ( id bigint NOT NULL, data text COLLATE pg_catalog."default", CONSTRAINT incoming_pkey PRIMARY KEY (id) ) TABLESPACE pg_default; insert into incoming(id) select generate_series(1,100); update incoming set data = 'EINS' where data IS NULL; insert into incoming(id) select generate_series(101,200); update incoming set data = 'ZWEI' where data IS NULL; insert into incoming(id) select generate_series(201,300); update incoming set data = 'DREI' where data IS NULL; CREATE TABLE IF NOT EXISTS public.data ( data text COLLATE pg_catalog."default" NOT NULL, CONSTRAINT data_pkey PRIMARY KEY (data) ) TABLESPACE pg_default; insert into data (data) values ('EINS'), ('DREI'); analyze public.data; analyze public.incoming; explain select distinct data from incoming Left outer join public.data using (data) where data.data is null; HashAggregate (cost=67371.04..67371.07 rows=3 width=5) Group Key: incoming.data -> Hash Anti Join (cost=0.55..64871.04 rows=100 width=5) Hash Cond: (incoming.data = data.data) -> Seq Scan on incoming (cost=0.00..44745.50 rows=300 width=5) -> Hash (cost=0.52..0.52 rows=2 width=5) -> Seq Scan on data (cost=0.00..0.52 rows=2 width=5) delete from data; vacuum analyze data; explain select distinct data from incoming Left outer join public.data using (data) where data.data is null; Unique (cost=56056.08..56056.09 rows=1 width=5) -> Sort (cost=56056.08..56056.09 rows=1 width=5) Sort Key: incoming.data -> Hash Anti Join (cost=60.58..56056.07 rows=1 width=5) Hash Cond: (incoming.data = data.data) -> Seq Scan on incoming (cost=0.00..44745.50 rows=300 width=5) -> Hash (cost=29.70..29.70 rows=2470 width=5) -> Seq Scan on data (cost=0.00..29.70 rows=2470 width=5) "rows=1" in the "Hash Anti Join" line is WRONG. It should be 300. Or at least some thousands. On the next-higher level there will now a Nested Loop chosen. And that Nested Loop will do whatever costly things it needs to do - only not 1 time but three million times. I think I have a workaround patch also. 
--- Server Configuration Tuning: < #port = 5432 # (change requires restart) < #max_connections = 100# (change requires restart) --- > port = 5434 # (change requires restart) > max_connections = 60 # (change requires restart) < #shared_buffers = 32MB# min 128kB --- > shared_buffers = 40MB # min 128kB < #temp_buffers = 8MB # min 800kB --- > temp_buffers = 20MB # min 800kB < #work_mem = 4MB # min 64kB < #maintenance_work_mem = 64MB # min 1MB --- > work_mem = 50MB # min 64kB > maintenance_work_mem = 50MB # min 1MB < #max_stack_depth = 2MB# min 100kB --- > max_stack_depth = 40MB# min 100kB < #max_files_per_process = 1000 # min 25 --- > max_files_per_process = 200 # min 25 < #effective_io_concurrency = 1 # 1-1000; 0 disables prefetching --- > effective_io_concurrency = 5 # 1-1000; 0 disables prefetching < #max_parallel_workers_per_gather = 2 # taken from max_parallel_workers --- > max_parallel_workers_per_gather = 0 # taken from max_parallel_workers < #synchronous_commit = on # synchronization level; --- > synchronous_commit = off # synchronization level; < #wal_sync_method = fsync # the default is the first option --- > wal_sync_method = fsync # the default is the first option < #full_page_writes = on# recover from partial page writes --- > full_page_writes = off# recover from partial page > writes < #wal_init_zero = on # zero-fill new WAL files --- > wal_init_zero = off # zero-fill new WAL files < #wal_writer_delay = 200ms # 1-1 milliseconds --- > wal_writer_delay = 2000ms # 1-1 milliseconds < #checkpoint_timeout = 5min# range 30s-1d --- > checkpoint_timeout = 10min# range 30s-1d < #checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0 --- > checkpoint_completion_target = 0.0# checkpoint target du
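To watch the three pg_class states described above while stepping through the testcase, the counters can be read directly between the steps; a small check, nothing more:

SELECT relname, relpages, reltuples
FROM   pg_class
WHERE  relname IN ('incoming', 'data');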
Re: [Testcase] Queries running forever, because of wrong rowcount estimate
On Mon, Feb 13, 2023 at 12:38:12PM -0500, Tom Lane wrote: ! Peter writes: ! > "rows=1" in the "Hash Anti Join" line is WRONG. It should be ! > 300. Or at least some thousands. ! ! FWIW, this behaves better in v14 and up. In older versions there's ! an ambiguity about what relpages=reltuples=0 means; it's the state ! when ANALYZE has never yet examined the table contents, but it's ! also the state when we have found the table to be empty. Because ! of the former case, older planner versions don't like to believe ! that a table is empty even if reltuples says so. Thanks, Tom, I found that as change 3d351d916b2 - that should make the issue more unlikely and not so prone to happen during testing. OTOH, that code did apparently not do anything evil that would not be possible to happen under other circumstances. It's unlikely, because one would at some point change such queries to a more performant style, but I am currently trying to understand why it can happen at all... cheers, PMc
[Outcome] Queries running forever, because of wrong rowcount estimate
On Mon, Feb 13, 2023 at 12:38:12PM -0500, Tom Lane wrote: ! Peter writes: ! > "rows=1" in the "Hash Anti Join" line is WRONG. It should be ! > 300. Or at least some thousands. ! ! FWIW, this behaves better in v14 and up. In older versions there's ! an ambiguity about what relpages=reltuples=0 means; it's the state ! when ANALYZE has never yet examined the table contents, but it's ! also the state when we have found the table to be empty. Because ! of the former case, older planner versions don't like to believe ! that a table is empty even if reltuples says so. ! ! regards, tom lane Okay, I got some way through with it. Given, the destination table is empty, and VACUUMed. Then there is relpages = reltuples = 0. We do some "source LEFT OUTER JOIN destination ON (matchfield) WHERE matchfield IS NULL" The source is big, but carries only a few distinct matchfield values. The so-called "hack" in heapam_handler.c:heapam_estimate_rel_size() then makes the planner believe that there are actually 10 curpages. This is not wrong, because the same might happen if one does an INSERT amounting to 10 pages without a subsequent ANALYZE. We then get a calculated rowcount of whatever, let's assume 2330 rows. Then we go into eqjoinsel(): n_distinct for the left source table is known, it is (let's assume) 4. n_distinct for the right destination table is not known because it is not analyzed, but it is UNIQUE, so it becomes 2330. We get into eqjoinsel_inner(): MCV (right) for the destination is not known because it is not analyzed. selec = selec / nd2 => 0.000429 We get into eqjoinsel_semi(): if (nd1 <= nd2 || nd2 < 0) selec = 1.0 - nullfrac1; Voila, selec = 1.0 And so the ANTI JOIN will estimate to 0 (aka 1) result rows, instead of whatever rowcount the source brings along (usually big). - cut --- Okay, so this should be fixed in R.14. But, as mentioned above, the same can also happen in normal operations. Example: I have a logfile from my webserver and a table to collect the (unique) IP-addresses. I start with an empty table and feed in the log. First round, the destination addresses. Let's assume there are 2330 different addresses. Second round, the source addresses. There are only 4 different ones. I don't do an ANALYZE between the two steps. And voila, I get the same effect as above. (Been there, done it: duration: 30216273.583 ms) Agreed, this is an inefficient way of doing that. It would be better to do a UNION on the source data, at that point filter it accordingly, then DISTINCT, and only then the ANTI JOIN. But I'm a lazy guy and I may not care if a query that could run in 0.5 seconds takes 60 seconds instead. However, it's not so nice when it takes 10 hours. Anyway, I don't get a real clue on what this piece of code should actually achieve (from selfuncs.c:eqjoinsel_semi()#2773): if (nd1 <= nd2 || nd2 < 0) selec = 1.0 - nullfrac1; else selec = (nd2 / nd1) * (1.0 - nullfrac1); This seems to assume a 100% match whenever the left (source) table brings fewer distinct(!) values than the right (destination) table already carries. For now, I have just disabled this behaviour, in the rough&dirty way: --- src/backend/optimizer/path/costsize.c.orig 2022-11-07 22:47:13.0 +0100 +++ src/backend/optimizer/path/costsize.c 2023-02-13 00:04:54.156489000 +0100 @@ -4685,8 +4685,11 @@ /* pselec not used */ break; case JOIN_ANTI: + if(fkselec * jselec > 0.9975) { + jselec = 0.9975; + } nrows = outer_rows * (1.0 - fkselec * jselec); nrows *= pselec; break; default: /* other values not expected here */ cheerio, PMc
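The lazy-but-safe variant of that webserver-log example on pre-14 servers is simply an ANALYZE between the two passes, so that the anti join against the now-populated table is planned with sane statistics; table and column names below are made up:

-- first pass: destination addresses ("addresses" starts out empty)
INSERT INTO addresses (addr)
SELECT DISTINCT l.dst_addr
FROM   weblog l
LEFT JOIN addresses a ON a.addr = l.dst_addr
WHERE  a.addr IS NULL;

ANALYZE addresses;   -- refresh reltuples before the next anti join

-- second pass: source addresses
INSERT INTO addresses (addr)
SELECT DISTINCT l.src_addr
FROM   weblog l
LEFT JOIN addresses a ON a.addr = l.src_addr
WHERE  a.addr IS NULL;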
Autovacuum endless loop in heap_page_prune()?
Good morning, I just found Autovacuum run for 6 hours on a 8 GB table, VACUUM query doesnt cancel, cluster doesn't stop, autovacuum worker is not killable, truss shows no activity, after kill -6 this backtrace: * thread #1, name = 'postgres', stop reason = signal SIGABRT * frame #0: 0x00548063 postgres`HeapTupleSatisfiesVacuumHorizon + 531 frame #1: 0x0054aed9 postgres`heap_page_prune + 537 frame #2: 0x0054e38a postgres`heap_vacuum_rel + 3626 frame #3: 0x006af382 postgres`vacuum_rel + 626 frame #4: 0x006aeeeb postgres`vacuum + 1611 frame #5: 0x007b4664 postgres`do_autovacuum + 4292 frame #6: 0x007b2342 postgres`AutoVacWorkerMain + 866 frame #7: 0x007b1f97 postgres`StartAutoVacWorker + 39 frame #8: 0x007ba0df postgres`sigusr1_handler + 783 frame #9: 0x0008220da627 libthr.so.3`___lldb_unnamed_symbol683 + 215 frame #10: 0x0008220d9b1a libthr.so.3`___lldb_unnamed_symbol664 + 314 frame #11: 0x7913 frame #12: 0x007bba25 postgres`ServerLoop + 1541 frame #13: 0x007b9467 postgres`PostmasterMain + 3207 frame #14: 0x0071a566 postgres`main + 758 frame #15: 0x004f9995 postgres`_start + 261 After restart, no problems reported yet. Storyline: this is the file-list table of my backup/archive system, contains ~50 mio records. Recently I found a flaw in the backup system, so that some old records weren't removed. I wrote a script to do this, that script did run first at 04:15 and reported it had now removed a lot of old data. I looked into pgadmin4 and it reported 9 mio dead tuples. I wondered why autovacuum wouldn't run, and the stats showed that regular daily vacuum had last run at 03:15 and autovacuum at 03:18. I wondered why it wouldn't start again, and went to sleep. Having consumed 300 minutes cpu now at 11:00, it did start at around 05:00. No messages of hardware errors, good old Haswell Xeon/EP with all ECC. Could there be a compute error somewhere, and does autovacuum lock out ordinary kill signals during some time? We're at Rel. 15.6 on FreeBSD 13.3 *** Changed values: shared_buffers = 40MB temp_buffers = 20MB work_mem = 50MB maintenance_work_mem = 50MB max_stack_depth = 40MB dynamic_shared_memory_type = posix max_files_per_process = 200 effective_io_concurrency = 5 synchronous_commit = off wal_sync_method = fsync full_page_writes = off wal_compression = on wal_init_zero = off wal_writer_delay = 2000ms checkpoint_timeout = 180min checkpoint_completion_target = 0.0 max_wal_size = 2GB archive_mode = on archive_timeout = 86400 seq_page_cost = 0.5 random_page_cost = 0.7 effective_cache_size = 1GB default_statistics_target = 1000 autovacuum = on autovacuum_naptime = 5min autovacuum_vacuum_scale_factor = 0.01 autovacuum_analyze_scale_factor = 0.05
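For the next time this happens, what can be looked at from SQL before reaching for kill(1) - a sketch only; in the incident above the worker did not react to cancellation, but the progress view at least shows whether the scan position still advances:

-- which phase/block the (auto)vacuum worker is currently in; sample it repeatedly
SELECT pid, relid::regclass, phase, heap_blks_scanned, heap_blks_total
FROM   pg_stat_progress_vacuum;

-- polite cancellation first, termination only as a second step
SELECT pg_cancel_backend(pid)    FROM pg_stat_progress_vacuum;
SELECT pg_terminate_backend(pid) FROM pg_stat_progress_vacuum;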
Re: Autovacuum endless loop in heap_page_prune()?
On Mon, May 27, 2024 at 11:25:47AM +0200, Laurenz Albe wrote: ! On Sat, 2024-05-25 at 12:51 +0200, Peter wrote: ! > I just found Autovacuum run for 6 hours on a 8 GB table, VACUUM query ! > doesnt cancel, cluster doesn't stop, autovacuum worker is not ! > killable, truss shows no activity, after kill -6 this backtrace: ! > ! > * thread #1, name = 'postgres', stop reason = signal SIGABRT ! > * frame #0: 0x00548063 postgres`HeapTupleSatisfiesVacuumHorizon + 531 ! > frame #1: 0x0054aed9 postgres`heap_page_prune + 537 ! > frame #2: 0x0054e38a postgres`heap_vacuum_rel + 3626 ! > frame #3: 0x006af382 postgres`vacuum_rel + 626 ! > frame #4: 0x006aeeeb postgres`vacuum + 1611 ! > frame #5: 0x007b4664 postgres`do_autovacuum + 4292 ! > frame #6: 0x007b2342 postgres`AutoVacWorkerMain + 866 ! > frame #7: 0x007b1f97 postgres`StartAutoVacWorker + 39 ! > frame #8: 0x007ba0df postgres`sigusr1_handler + 783 ! > frame #9: 0x0008220da627 libthr.so.3`___lldb_unnamed_symbol683 + 215 ! > frame #10: 0x0008220d9b1a libthr.so.3`___lldb_unnamed_symbol664 + 314 ! > frame #11: 0x7913 ! > frame #12: 0x007bba25 postgres`ServerLoop + 1541 ! > frame #13: 0x007b9467 postgres`PostmasterMain + 3207 ! > frame #14: 0x0071a566 postgres`main + 758 ! > frame #15: 0x004f9995 postgres`_start + 261 ! > ! > After restart, no problems reported yet. ! > ! > Storyline: ! > this is the file-list table of my backup/archive system, contains ~50 ! > mio records. Recently I found a flaw in the backup system, so that some ! > old records weren't removed. I wrote a script to do this, that script ! > did run first at 04:15 and reported it had now removed a lot of old ! > data. I looked into pgadmin4 and it reported 9 mio dead tuples. ! ! This smells of index corruption. ! ! I have seen cases where a corrupted index sends VACUUM into an endless loop ! so that it does not react to query cancellation. Thanks for Your reply. So it can happen... ! Check the index with the "bt_index_check()" function from the "amcheck" ! extension. If that reports a problem, rebuild the index. 
- -- Table: public.file -- DROP TABLE IF EXISTS public.file; CREATE TABLE IF NOT EXISTS public.file ( fileid bigint NOT NULL DEFAULT nextval('file_fileid_seq'::regclass), fileindex integer NOT NULL DEFAULT 0, jobid integer NOT NULL, pathid integer NOT NULL, deltaseq smallint NOT NULL DEFAULT 0, markid integer NOT NULL DEFAULT 0, fhinfo numeric(20,0) NOT NULL DEFAULT 0, fhnode numeric(20,0) NOT NULL DEFAULT 0, lstat text COLLATE pg_catalog."default" NOT NULL, md5 text COLLATE pg_catalog."default" NOT NULL, name text COLLATE pg_catalog."default" NOT NULL, CONSTRAINT file_pkey PRIMARY KEY (fileid) ) TABLESPACE pg_default; ALTER TABLE IF EXISTS public.file OWNER to bareos; -- Index: file_jpfid_idx -- DROP INDEX IF EXISTS public.file_jpfid_idx; CREATE INDEX IF NOT EXISTS file_jpfid_idx ON public.file USING btree (jobid ASC NULLS LAST, pathid ASC NULLS LAST, name COLLATE pg_catalog."default" ASC NULLS LAST) TABLESPACE pg_default; -- Index: file_pjidpart_idx -- DROP INDEX IF EXISTS public.file_pjidpart_idx; CREATE INDEX IF NOT EXISTS file_pjidpart_idx ON public.file USING btree (pathid ASC NULLS LAST, jobid ASC NULLS LAST) TABLESPACE pg_default WHERE fileindex = 0 AND name = ''::text; - bareos=# SET client_min_messages = DEBUG1; bareos=# SELECT bt_index_check(index => 16753, heapallindexed => true); DEBUG: verifying consistency of tree structure for index "file_pjidpart_idx" DEBUG: verifying level 2 (true root level) DEBUG: verifying level 1 DEBUG: verifying level 0 (leaf level) DEBUG: verifying that tuples from index "file_pjidpart_idx" are present in "file" DEBUG: finished verifying presence of 490834 tuples from table "file" with bitset 33.61% set LOG: duration: 55912.563 ms statement: SELECT bt_index_check(index => 16753, heapallindexed => true); bt_index_check (1 row) bareos=# SELECT bt_index_check(index => 16752, heapallindexed => true); DEBUG: verifying consistency of tree structure for index "file_jpfid_idx" DEBUG: verifying level 3 (true root level) DEBUG: verifying level 2 DEBUG: verifying level 1 DEBUG: verifying level 0 (leaf level) DEBUG: verifying that tuples from index "file_jpfid_idx" are present in "file" DEBUG: finished verifying presence of 45922943
Re: Autovacuum endless loop in heap_page_prune()?
On Mon, May 27, 2024 at 01:51:56PM +0200, Laurenz Albe wrote: ! > ! Apart from hardware problems, one frequent cause is upgrading glibc ! > ! (if the index on a string column or expression). ! > ! > No, this is FreeBSD, we don't normally do such things... ;) ! ! You don't update the C library, or collations never change? I rarely update the C library. Kernel + libraries + OS programs are a single unit here, updated about once a year, and then by many people and with all the usual testing. I could look up how often some locale was modified, but honestly, I'm too lazy now. | (but of course SQL_ASCII is a mistake). Really? I re-read the chapter on Locale/CharSet Support, and I don't see the mistake. Only that one cannot convert it to anything else, which is fine in this use case (storing arbitrary filenames from any OS in any character set within the same column). regards, PMc
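For completeness, the encoding and collation a database actually runs with can be checked directly, which is the first thing to compare after any OS or locale update:

SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate, datctype
FROM   pg_database
WHERE  datname = current_database();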
Re: Failing GSSAPI TCP when connecting to server
Hello Folks, Thanks for Your inspiration; and I made some progress (found a way to avoid the issue). The issue is most likely not related to postgres. Ron Johnson said: >> A configuration problem on the machine(s) can be ruled out, > Famous last words. Trust me. :) > Is there a way to test pmc authentication via some other tool, like psql? Sure, that works. The problem is contained inside the running application program(s), everything else doesn't show it. > If *only *the application changed, then by definition it can't be a > database problem. *Something* in the application changed; you just haven't > found it. Obviousely, yes. But then, such a change might expose an undesired behaviour elsewhere. > Specifically, I'd read the Discourse 2.3.0 and 2.3.1 release notes. Correction: it is actually 3.2.0 and 3.3.1. I finally went the way of bisecting, and, it's not really a problem in Discourse either. It comes from a feature I had enabled in the course of migrating, a filesystem change monitor based on kqueue: https://man.freebsd.org/cgi/man.cgi?query=kqueue Removing that feature solves the issue for now. I have still no idea how that tool might lead to mishandled sockets elsewhere; it might somehow have to do with the async processing of the DB connect. That would need a thorough look into the code where this is done. Tom Lane wrote: >The TCP trace looks like the client side is timing out too quickly >in the unsuccessful case. It's not clear to me how the different >Discourse version would lead to the Kerberos library applying a >different timeout. It's not a timeout; a timeout would close the socket. It seems to rather forget the socket. >Still, it seems like most of the moving parts >here are outside of Postgres' control --- I don't think that libpq >itself has much involvement in the KDC communication. Kerberos is weird. It goes into libgssapi, but libgssapi doesn't do much on it's own, it just maps so-called "mech"s, which then point to the actual kerberos code - which in the case of FreeBSD is very ancient (but work should be underway to modernize it). It's one of the most creepy pieces of code I've looked into. > I concur with looking at the Discourse release notes and maybe asking > some questions in that community. They only support that app to run in a certain containerization on a specific brand of Linux. They don't like my questions and might just delete them. Anyway, I have a lead now to either avoid the problem or where to look more closely. And it has not directly to do with postgres, but rather with genuine socket mishandling and/or maybe some flaw in FreeBSD. cheers, PMc
Failing GSSAPI TCP when connecting to server
My application is trying to connect to the database server, and meanwhile tries to talk to the KDC server for a service ticket. Earlier these TCP connections ran like this, and were successful:

13:57:53.788797 IP6 clientIPv6.54143 > serverIPv6.88: Flags [S], seq 4189109662, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 402503231 ecr 0], length 0
13:57:53.789053 IP6 serverIPv6.88 > clientIPv6.54143: Flags [S.], seq 1604590824, ack 4189109663, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 1505702993 ecr 402503231], length 0
13:57:53.789122 IP6 clientIPv6.54143 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 402503231 ecr 1505702993], length 0
13:57:53.789223 IP6 clientIPv6.54143 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 402503231 ecr 1505702993], length 4
13:57:53.829575 IP6 serverIPv6.88 > clientIPv6.54143: Flags [.], ack 5, win 1035, options [nop,nop,TS val 1505703037 ecr 402503231], length 0
13:57:53.829696 IP6 clientIPv6.54143 > serverIPv6.88: Flags [P.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 402503271 ecr 1505703037], length 1161
13:57:53.832411 IP6 serverIPv6.88 > clientIPv6.54143: Flags [P.], seq 1:5, ack 1166, win 1035, options [nop,nop,TS val 1505703037 ecr 402503271], length 4
13:57:53.832440 IP6 serverIPv6.88 > clientIPv6.54143: Flags [FP.], seq 5:907, ack 1166, win 1035, options [nop,nop,TS val 1505703037 ecr 402503271], length 902
13:57:53.832461 IP6 clientIPv6.54143 > serverIPv6.88: Flags [.], ack 908, win 1021, options [nop,nop,TS val 402503276 ecr 1505703037], length 0
13:57:53.832620 IP6 clientIPv6.54143 > serverIPv6.88: Flags [F.], seq 1166, ack 908, win 1035, options [nop,nop,TS val 402503276 ecr 1505703037], length 0
13:57:53.832847 IP6 serverIPv6.88 > clientIPv6.54143: Flags [.], ack 1167, win 1035, options [nop,nop,TS val 1505703037 ecr 402503276], length 0

After an upgrade of the application they now look like this:

16:53:09.637708 IP6 clientIPv6.49124 > serverIPv6.88: Flags [S], seq 2860816940, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 2664510889 ecr 0], length 0
16:53:09.637990 IP6 serverIPv6.88 > clientIPv6.49124: Flags [S.], seq 2595272492, ack 2860816941, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 3896716896 ecr 2664510889], length 0
16:53:09.638153 IP6 clientIPv6.49124 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 0
16:53:09.638158 IP6 clientIPv6.49124 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 4
16:53:09.638162 IP6 clientIPv6.49124 > serverIPv6.88: Flags [FP.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 1161
16:53:09.638451 IP6 serverIPv6.88 > clientIPv6.49124: Flags [.], ack 1167, win 1017, options [nop,nop,TS val 3896716896 ecr 2664510889], length 0
16:53:09.638508 IP clientIPv4.58934 > serverIPv4.88: Flags [S], seq 3474302110, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 726603825 ecr 0], length 0
16:53:09.638696 IP serverIPv4.88 > clientIPv4.58934: Flags [S.], seq 2424341350, ack 3474302111, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2662337458 ecr 726603825], length 0
16:53:09.638729 IP clientIPv4.58934 > serverIPv4.88: Flags [.], ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 0
16:53:09.638733 IP clientIPv4.58934 > serverIPv4.88: Flags [P.], seq 1:5, ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 4
16:53:09.638751 IP clientIPv4.58934 > serverIPv4.88: Flags [FP.], seq 5:1166, ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 1161
16:53:09.638856 IP6 clientIPv6.46516 > serverIPv6.88: Flags [S], seq 3686126664, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 1901751531 ecr 0], length 0
16:53:09.639054 IP serverIPv4.88 > clientIPv4.58934: Flags [.], ack 1167, win 1009, options [nop,nop,TS val 2662337458 ecr 726603825], length 0
16:53:09.639071 IP6 serverIPv6.88 > clientIPv6.46516: Flags [S.], seq 3830196306, ack 3686126665, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 457425497 ecr 1901751531], length 0
16:53:09.639105 IP6 clientIPv6.46516 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 0
16:53:09.639130 IP6 clientIPv6.46516 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 4
16:53:09.639154 IP6 clientIPv6.46516 > serverIPv6.88: Flags [FP.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 1161

This is repeated LOTS of times, and what remains is hundreds of sockets in FIN_WAIT_2 that do not appear to disappear quickly. The error message from the postgres client is:

GSSAPI continuation error: Miscellaneous failure: unable to reach any KDC in realm

(In fact it does reach the K
Re: query_to_xml() returns invalid XML when query returns no rows
On 3/12/18 08:18, Thomas Kellerer wrote: > I am not sure if this qualifies as a bug: > > query_to_xml() returns an empty XML document when the query returns no rows, > e.g: > >select query_to_xml('select 42 where false', false, true, ''); > > The problem with this is, that if the resulting XML is then fed into e.g. the > xpath() function, that function fails because the "empty" document is an > invalid XML: That's because you have the tableforest argument set to true. If you want a proper XML document, then you should write select query_to_xml('select 42 where false', false, false, ''); -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
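For illustration, the difference can be seen by feeding both forms to xpath() (a sketch; the column alias and XPath expressions here are just examples, not taken from the original report):

    -- tableforest = true: with zero rows the result is effectively empty,
    -- which is not a well-formed XML document, so xpath() complains.
    SELECT xpath('/row/a/text()',
                 query_to_xml('select 42 as a where false', false, true, ''));

    -- tableforest = false: a <table> document with no <row> children comes
    -- back instead, which xpath() accepts (and returns an empty array for).
    SELECT xpath('/table/row/a/text()',
                 query_to_xml('select 42 as a where false', false, false, ''));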
Re: query_to_xml() returns invalid XML when query returns no rows
On 3/13/18 15:21, Thomas Kellerer wrote: > I still think it's incorrect to return an empty (=invalid) XML instead of a > NULL value though. This behavior is specified in the SQL standard. While an empty string is not a valid XML "document", it is valid as XML "content". -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Question on corruption (PostgreSQL 9.6.1)
On Thu, Mar 15, 2018 at 8:16 AM, Andy Halsall wrote: > Thanks for the advice. I re-indexed and reloaded a pg_dumpall into a spare > server - no errors. Will run pg_catcheck asap. You can also run amcheck. Get the version targeting earlier Postgres releases off Github (there are packages for most Linux systems). This can verify that the heap is consistent with indexes. -- Peter Geoghegan
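For illustration, a minimal single-index check might look like this (a sketch; "amcheck_next" is the extension name installed by the Github version's packages, and the index name is just a placeholder):

    CREATE EXTENSION IF NOT EXISTS amcheck_next;

    -- Verify the B-Tree structure of one index; with the second argument set
    -- to true, also verify that every heap tuple has a matching index tuple.
    SELECT bt_index_check('some_table_pkey'::regclass, true);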
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Mon, Mar 19, 2018 at 1:01 PM, Jeremy Finzel wrote: > SELECT heap_page_items(get_raw_page('pg_authid', 7)); Can you post this? SELECT * FROM page_header(get_raw_page('pg_authid', 7)); -- Peter Geoghegan
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Mon, Mar 19, 2018 at 1:55 PM, Jeremy Finzel wrote:
> @Peter :
>
> staging=# SELECT * FROM page_header(get_raw_page('pg_authid', 7));
>       lsn       | checksum | flags | lower | upper | special | pagesize | version | prune_xid
> ----------------+----------+-------+-------+-------+---------+----------+---------+-----------
>  262B4/10FDC478 |        0 |     1 |   304 |  2224 |    8192 |     8192 |       4 |         0
> (1 row)

Thanks. That looks normal. I wonder if the contents of that page look consistent with the rest of the table following manual inspection, though.

I recently saw system catalog corruption on a 9.5 instance where an entirely different relation's page ended up in pg_attribute and pg_depend. They were actually pristine index pages from an application index. I still have no idea why this happened.

This is very much a guess, but it can't hurt to check if the contents of the tuples themselves are actually sane by inspecting them with "SELECT * FROM pg_authid". heap_page_items() doesn't actually care about the shape of the tuples in the page, so this might have been missed.

-- Peter Geoghegan
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Wed, Mar 21, 2018 at 1:38 PM, Jeremy Finzel wrote:
> A server restart and upgrade to 9.5.12 (at the same time), as expected, made
> the issue go away. Still doesn't give us any answers as to what happened or
> if it would happen again!

Thanks for the feedback. You may still want to use amcheck to look for problems. The version on Github works with 9.5, and there are Redhat and Debian pgdg packages. See:

https://github.com/petergeoghegan/amcheck

The "heapallindexed" option will be of particular interest to you - that option verifies that the table has matching rows for a target index (in addition to testing the structure of a target B-Tree index itself). This is probably the best general test for corruption that is available. There is a fair chance that this will reveal new information.

-- Peter Geoghegan
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Thu, Mar 22, 2018 at 12:27 PM, Jeremy Finzel wrote:
> Thank you for the recommendation. I ran both amcheck functions on all 4
> indexes of those 2 tables with heapallindexed = true, but no issues were
> found.

Probably wouldn't hurt to run it against all indexes, if you can make time for that. You can generalize from the example query that calls the bt_index_check() function: set "heapallindexed=>i.indisprimary", and remove "n.nspname = 'pg_catalog'" as well as "LIMIT 10". This will test tables and indexes from all schemas, which might be interesting.

-- Peter Geoghegan
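For illustration, the generalized query might look roughly like the following (a sketch adapted from the amcheck documentation's example query, with the modifications described above):

    SELECT bt_index_check(index => c.oid, heapallindexed => i.indisprimary),
           c.relname,
           c.relpages
    FROM pg_index i
    JOIN pg_opclass op ON i.indclass[0] = op.oid
    JOIN pg_am am ON op.opcmethod = am.oid
    JOIN pg_class c ON i.indexrelid = c.oid
    JOIN pg_namespace n ON c.relnamespace = n.oid
    WHERE am.amname = 'btree'
      -- Don't check temp tables, which may be from another session:
      AND c.relpersistence != 't'
      -- Function may throw an error when this is omitted:
      AND c.relkind = 'i' AND i.indisready AND i.indisvalid
    ORDER BY c.relpages DESC;

Each index that comes back without raising an error has passed; corruption, if any, is reported as an error for the index being checked.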
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Thu, Mar 22, 2018 at 2:24 PM, Jeremy Finzel wrote: > I am running this on a san snapshot of our production system. I assume that > this will give me a valid check for file-system-level corruption. I am > going to kick it off and see if I find anything interesting. It might. Note that SAN snapshots might have corruption hazards, though. Postgres expects crash consistency across all filesystems, so you might run into trouble if you had a separate filesystem for WAL, for example. I know that LVM snapshots only provide a consistent view of a single logical volume, even though many LVM + Postgres setups will involve multiple logical volumes. This makes it possible for a small inconsistency across logical volumes to corrupt data. I don't know anything about your SAN snapshotting, but this is at least something to consider. -- Peter Geoghegan
Re: Troubleshooting a segfault and instance crash
On Thu, Mar 8, 2018 at 9:40 AM, Blair Boadway wrote: > Mar 7 14:46:35 pgprod2 kernel:postgres[29351]: segfault at 0 ip > 00302f32868a sp 7ffcf1547498 error 4 in > libc-2.12.so[302f20+18a000] > > Mar 7 14:46:35 pgprod2 POSTGRES[21262]: [5] user=,db=,app=client= LOG: > server process (PID 29351) was terminated by signal 11: Segmentation fault > It crashes the database, though it starts again on its own without any > apparent issues. This has happened 3 times in 2 months and each time the > segfault error and memory address is the same. We had a recent report of a segfault on a Redhat compatible system, that seemed like it might originate from within its glibc [1]. Although all the versions there didn't match what you have, it's worth considering as a possibility. Maybe you can't install debuginfo packages because you don't yet have the necessary debuginfo repos set up. Just a guess. That is sometimes a required extra step. [1] https://postgr.es/m/7369.1520528...@sss.pgh.pa.us -- Peter Geoghegan
Re: Autovacuum behavior with rapid insert/delete 9.6
On Thu, Mar 29, 2018 at 4:01 PM, Ted Filmore wrote: > What I am really asking to confirm is after describing the situation is it > reasonable to focus on (in the short term) tuning autovacuum to increase > performance or does this not make sense given the workload and I should look > elsewhere? I would look into this suspected 9.5 regression, if that's possible: https://postgr.es/m/CAH2-Wz=sfakvmv1x9jh19ej8am8tzn9f-yecips9hrrrqss...@mail.gmail.com -- Peter Geoghegan
Re: Problem with connection to host (wrong host)
On 3/31/18 11:44, Mike Martin wrote:
> Hi
> I am just setting up a postgres server, details
> Host 192.168.0.3
>
> pg_hba.conf
> # TYPE  DATABASE  USER  ADDRESS           METHOD
>
> # "local" is for Unix domain socket connections only
> local   all       all                     peer
> # IPv4 local connections:
> host    all       all   192.168.0.0/32    ident

This entry should probably be something like 192.168.0.0/24. Writing .0/32 doesn't make much sense.

> # IPv6 local connections:
> host    all       all   127.0.0.1/32      ident
>
> host    all       all   ::1/128           ident
>
> postgresql.conf
> listen_addresses = '*'
>
> however when I try to connect from my laptop (ip 192.168.0.2) I get
>
> psql -h 192.168.0.3 -U usevideo -W
> Password for user usevideo:
> psql: FATAL: no pg_hba.conf entry for host "192.168.0.2", user
> "usevideo", database "usevideo", SSL off
>
> So a bit confused, is psql ignoring the host parameter

.3 is the host you are connecting to, as seen from the client. .2 is the host your connection is coming from, as seen from the server.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
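For reference, the corrected IPv4 entry might look like this (a sketch; the ident method is simply kept from the original configuration):

    # IPv4 connections from the local 192.168.0.0/24 subnet:
    host    all    all    192.168.0.0/24    ident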
Re: LDAP Bind Password
On 4/3/18 16:12, Kumar, Virendra wrote: > Is anybody aware of how to encrypt bind password for ldap authentication > in pg_hba.conf. Anonymous bind is disabled in our organization so we > have to use bind ID and password but to keep them as plaintext in > pg_hba.conf defeat security purposes. We want to either encrypt it or > authenticate without binding. Any insights into this is appreciated. You can use the "simple bind" method that is described in the documentation. That one doesn't involve a second bind step. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
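For illustration, a simple-bind entry in pg_hba.conf might look like the following (a sketch based on the documented example; the server name and the DN prefix/suffix are placeholders):

    host    all    all    0.0.0.0/0    ldap    ldapserver=ldap.example.net ldapprefix="cn=" ldapsuffix=", dc=example, dc=net"

With simple bind the server binds directly as the DN built from the login name, using the password supplied by the connecting user, so no separate search account and therefore no stored bind password is needed in pg_hba.conf.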
Re: Rationale for aversion to the central database?
Tim, I'm sorry if I sound like a cheerleader, but boy did you nail this. I would basically say exactly the same thing, just not as well. On Sun, Apr 8, 2018 at 9:37 PM, Tim Cross wrote: > > > On 9 April 2018 at 07:39, Guyren Howe wrote: > >> I am a Rails developer at a medium-large size company. I’ve mostly worked >> at smaller companies. I’ve some exposure to other web development >> communities. >> >> When it comes to databases, I have universally encountered the attitude >> that one should treat the database as a dumb data bucket. There is a *very* >> strong aversion to putting much of any business logic in the database. I >> encounter substantial aversion to have multiple applications access one >> database, or even the reverse: all abstraction should be at the application >> layer. >> >> My best theory is that these communities developed at a time when Windows >> was more dominant, and just generally it was *significantly* easier to use >> MySQL than Postgres for many, particularly new, developers. And it is >> pretty reasonable to adopt an aversion to sophisticated use of the database >> in that case. >> >> This attitude has just continued to today, even as many of them have >> switched to Postgres. >> >> This is only a hypothesis. I am now officially researching the issue. I >> would be grateful for any wisdom from this community. >> >> >> Aside: it is rare to find a situation in life or anywhere where one >> widely adopted thing is worse in *every way* than another thing, but this >> certainly was and largely still continues to be the case when one compares >> MySQL and Postgres. So why do folks continue to use MySQL? I find this >> mystifying. >> > > It is interesting looking at many of the responses to this thread. I see a > lot at each extreme - either put lots of stuff inthe database or use the > database as just a 'dumb' store and put everything in the application code. > > I think the real solution is somewhere in the middle. I've lost count of > the number of applications where the application code is jumping through > all sorts of hoops to do basic data operations which would be far better > handled in the database and can easily be done using just ANSI SQL (so is > portable). It drives me crazy when people tell me the database is slow when > they are doing 'select * from table' and then filtering and sorting the > data in their application. Applications should take advantage of what the > database does well. Unfortunately, I see far too many developers who are > uncomfortable with SQL, don't know how to structure their queries > efficiently (lots of nested sub queries etc, cartesian joins etc). > > At the other extreme is those who tend to put almost everything in the > database - including business policy and business 'rules' which are > probably better categorised as current business strategy. First, I think it > is nearly always a mistake to try and enforce business policy with > technology. Policies change too often and should be dealt with via > administrative measures. Technology can certainly be used to raise alerts > regarding policy breeches, but should not be used to enforce policies. > Likewise, some business rules are more akin to strategies than being actual > static rules and can change with little notice, rhyme or reason. These > probably should not be 'hard coded' into the database. Other rules are more > stable and unlikely to ever change and are likely good candidates for being > encoded in the database as either functions or constraints. 
> > I do feel that often the big problem is with management who fail to > understand the time and effort needed to develop a good data model. > Developers are put under pressure to deliver functionality and as long as > it looks correct at the interface level, all is good. Little thought is > really put into long term maintenance or performance. From a developer > perspective, time put into becoming an expert in React, Angular, Node, > Python etc is probably going to earn them more bonus points than time spent > on developing skills in defining good data models or understanding of the > power/functionality of the underlying database engine. Of course, this does > tend to be short sighted as a good data model will tend to make it easier > to add/enhance an application and understanding your database system will > make changes and enhancements less daunting. > > For me, the sign of a good developer is one who is able to get the balance > right. They understand the strengths and weaknesses of ALL the components > involved and are able to select the technology mix which suits the problem > domain and are able to get the right balance between business > responsiveness to change and long term maintenance/viability. > Unfortunately, such developers are rare, so it will usually mean there are > a team of people with different skills and what will matter is how well > they are able to work together as a team and come up with an architectur
Re: List all columns referencing an FK
On 2/8/18 05:31, Andreas Joseph Krogh wrote: > Back in 2008 I asked this > question: > http://www.postgresql-archive.org/Finding-all-tables-that-have-foreign-keys-referencing-a-table-td2153236.html > I wonder, is this now possible using information_schema only, or are > there still pieces missing in the standard holding this back? I think you'll still have the same problems if the same constraint name appears more than once per schema. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
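For illustration, an information_schema-only query might look like the sketch below (schema and table names are placeholders). Note that it joins on constraint name and schema, which is exactly why duplicate constraint names within a schema can still produce wrong answers:

    SELECT kcu.table_schema,
           kcu.table_name,
           kcu.column_name,
           rc.constraint_name
    FROM information_schema.referential_constraints rc
    JOIN information_schema.key_column_usage kcu
      ON kcu.constraint_schema = rc.constraint_schema
     AND kcu.constraint_name   = rc.constraint_name
    JOIN information_schema.constraint_column_usage ccu
      ON ccu.constraint_schema = rc.unique_constraint_schema
     AND ccu.constraint_name   = rc.unique_constraint_name
    WHERE ccu.table_schema = 'public'               -- referenced table's schema
      AND ccu.table_name   = 'referenced_table';    -- referenced table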
Re: How to monitor logical replication initial sync?
On 3/15/18 09:19, bricklen wrote: > How does one monitor the status or progress of an initial sync under > logical replication? For example: > > * I create a publication in database db_pub > * I create a subscription in database db_sub > * In 15 minutes I want to check an see that the initial sync is N% > complete > > Is it possible to tell when the initial sync is complete, or better > yet, how complete it is? > > > This is a question I'm quite interested in as well (and one I do not > have an answer to). > > Does anyone with more familiarity with logical replication have any > suggestions on how to determine the status of the initial sync? Something like select * from pg_subscription_rel where srsubstate <> 'r' and srsubid = (select oid from pg_subscription where subname = 'mysub'); The key is checking the srsubstate column for 'r' (ready). -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
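For a rough progress figure, counting tables per state also works (a sketch; 'mysub' is a placeholder subscription name):

    SELECT srsubstate, count(*)
    FROM pg_subscription_rel
    WHERE srsubid = (SELECT oid FROM pg_subscription WHERE subname = 'mysub')
    GROUP BY srsubstate;

The states are 'i' (initialize), 'd' (data is being copied), 's' (synchronized) and 'r' (ready); the initial sync is finished once every table reports 'r'.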
Re: ERROR: found multixact from before relminmxid
On Mon, Apr 9, 2018 at 7:01 AM, Tomas Vondra wrote:
> The bigger question is whether this can actually detect the issue. If
> it's due to a storage issue, then perhaps yes. But if you only see
> multixact issues consistently and nothing else, it might easily be a
> PostgreSQL bug (in which case the checksum will be correct).

You can also run amcheck. Get the version targeting earlier Postgres releases off Github (there are packages for most Linux systems). This can verify that the heap is consistent with indexes.

-- Peter Geoghegan
Re: ERROR: found multixact from before relminmxid
On Mon, Apr 9, 2018 at 5:55 PM, Alexandre Arruda wrote: > I ran amcheck in all index of a table and I only get empty returns. Did you try doing so with the "heapallindexed" option? That's what's really interesting here. -- Peter Geoghegan
Re: ERROR: found multixact from before relminmxid
On Mon, Apr 9, 2018 at 6:56 PM, Alexandre Arruda wrote: > (... and all other indexes returns null too) > > I tried with bt_index_check too. Same results. That's interesting, because it tells me that you have a table that appears to not be corrupt, despite the CLUSTER error. Also, the error itself comes from sanity checking added to MultiXact freezing fairly recently, in commit 699bf7d0. You didn't say anything about regular VACUUM being broken. Do you find that it works without any apparent issue? I have a suspicion that this could be a subtle bug in CLUSTER/freezing. The only heap_freeze_tuple() caller is code used by CLUSTER, so it's not that hard to imagine a MultiXact freezing bug that is peculiar to CLUSTER. Though I haven't thought about it in much detail. -- Peter Geoghegan
Re: ERROR: found multixact from before relminmxid
On Mon, Apr 9, 2018 at 7:53 PM, Andres Freund wrote: > I've not followed this thread. Possible it's the overeager check for pg > upgraded tuples from before 9.3 that Alvaro fixed recently? I was aware of commit 477ad05e, which must be what you're referring to. I don't think that that's what this is, since this error occurs within heap_freeze_tuple() -- it's not the over-enforced HEAP_XMAX_IS_LOCKED_ONLY() error within heap_prepare_freeze_tuple(). And, because this database wasn't pg_upgraded. I should wait until tomorrow before doing any further analysis, though. -- Peter Geoghegan
Re: ERROR: found multixact from before relminmxid
On Tue, Apr 10, 2018 at 4:31 AM, Alexandre Arruda wrote: > Actualy, I first notice the problem in logs by autovacuum: > > 2018-04-10 08:22:15.385 -03 [55477] CONTEXT: automatic vacuum of > table "production.public.fn06t" > 2018-04-10 08:22:16.815 -03 [55477] ERROR: found multixact 68834765 > from before relminmxid 73262006 > > production=# vacuum analyze verbose fn06t; > INFO: vacuuming "public.fn06t" > ERROR: found multixact 76440919 from before relminmxid 122128619 Do you think that CLUSTER was run before regular VACUUM/autovacuum showed this error, though? Have you noticed any data loss? Things look okay when you do your dump + restore, right? The problem, as far as you know, is strictly that CLUSTER + VACUUM refuse to finish/raise these multixactid errors? -- Peter Geoghegan
Re: ERROR: found multixact from before relminmxid
On Tue, Apr 10, 2018 at 7:54 PM, Alexandre Arruda wrote: > pg_control version number:1002 Andres was also asking about his check_rel() function, from https://www.postgresql.org/message-id/20180319181723.ugaf7hfkluqyo...@alap3.anarazel.de. Can you check that out as well, please? You'll need to be able to install the pageinspect contrib module. -- Peter Geoghegan
Re: New website
On Wed, Apr 18, 2018 at 3:03 PM, Adrian Klaver wrote: > I would contact the Webmaster but Contact goes to a big image of an elephant > head. That is also where Downloads, Support and Donate lands. Might have > been a good idea to roll out a demo site for testing first. Will reserve > judgment on the site design until it is functioning. I really don't think it's practical to give everyone a veto on a website design. It took years to launch this website redesign. Honestly, I was beginning to think that it would never happen. Anyway, I did notice something myself, which is that the git logo links to https://git.postgresql.org/gitweb/, which has many non-very-important git repos. Instead, it should point to the main PostgreSQL repository's gitweb page, which is at https://git.postgresql.org/gitweb/?p=postgresql.git. -- Peter Geoghegan
Re: New website
On Wed, Apr 18, 2018 at 3:40 PM, Adrian Klaver wrote: > And to get to the mailing list archives(arguably one of the most important > links) you have to: > > Click on Community, click on Mailing Lists on the left sidebar, then scroll > to the bottom of page to find the search box. I look at the mailing list archives as much as anyone else does, and I don't think that it's very important for it to have prominent placement. Even still, the only difference here is the scrolling. It has fundamentally the same structure as before. > I am one of the No votes in the survey. This sounds pretty far from constructive to me, which automatically detracts from what you're saying. -- Peter Geoghegan
Re: Can we run pg_basebackup master is still working normally (selects, updates, deleted, etc)
On 5/4/18 06:41, Adrian Heath wrote: > The system is used 24/7. Can we run pg_basebackup while the master is > still receiving updates without worrying about physical file corruptions > in the generated archive file? Yes, as long as you follow the instructions about saving the WAL generated during the backup and replaying it on the newly created standby. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
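For illustration, one common way to do that is to let pg_basebackup stream the WAL itself (a sketch; host, user and target directory are placeholders):

    pg_basebackup -h primary.example.com -U replication_user \
        -D /var/lib/pgsql/standby -X stream -P

With -X stream the WAL generated during the backup is included in the backup, so the result is self-contained; alternatively, keep WAL archiving in place and replay the archived segments on the new standby as described in the documentation.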
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Fri, May 25, 2018 at 1:38 PM, Andres Freund wrote: >> BTW I think the definition of HeapTupleHeaderXminFrozen is seriously >> confusing, by failing to return true if the xmin is numerically >> FrozenXid (which it'll be if the database was pg_upgraded). I wonder >> about this one in HeapTupleSatisfiesMVCC: > > I suggest raising this on -hackers. I agree that it's unfortunate. I wonder if BootstrapTransactionId also needs to be considered here. -- Peter Geoghegan
Re: Syndicating PostgreSQL mailing list to Discourse
On Sun, May 27, 2018 at 6:13 AM, Stephen Frost wrote: > I'm curious what would be different here from what our archives provide. > We could certainly have a single "all lists" archive page but that seems > more likely to be just completely confusing than actually useful at all. Any replacement to our own archives will need to provide access to mail from over 20 years ago to be in any way usable. It's not uncommon to have to go back that far. Personally, I don't buy the idea that the need to use a mailing list rather than a web forum is a notable obstacle for new contributors. PGLister seems pretty slick to me. It has eliminated all of the frustrations that I had. Maybe we need to do a better job when it comes to communicating what the benefits of a mailing list are, though. There are real, practical reasons to prefer a mailing list; that preference isn't just due to ingrained habit. I'm pleased that there has been a drive to modernize some of the community's infrastructure in recent years, but only because those changes turned out to be unalloyed improvements (at least in my view). Besides, while mailing lists may seem antiquated to a lot of people, aren't web forums almost as antiquated? Sites like Stack Overflow are very clearly not designed to work as discussion forums. They do not allow subjective questions, and it's common for moderators to swiftly delete new threads. Stack Overflow is popular because it provides the fastest possible access to a crowdsourced answer, without requiring or even encouraging participation. -- Peter Geoghegan
Re: Code of Conduct plan
On Mon, Jun 4, 2018 at 11:41 AM, Jason Petersen wrote: > Ultimately, the important thing this CoC provides is some concrete language > to point at when a party is aggrieved and explicit avenues of redress > available when one refuses to address one’s own behavior. We’re adults here, > the strawmen of people being harangued out of the community because they > said a bad word are unlikely to materialize. > > +1 This seems like a good summary on the purpose of the CoC. It is of course possible that a member of the committee could act in bad faith for any number of reasons. You can say the same thing about any position of leadership or authority within the community, though. That hasn't really been much of a problem in my experience, and I see no reason for particular concern about it here. -- Peter Geoghegan
Re: Code of Conduct plan
On Tue, Jun 5, 2018 at 8:49 AM, Benjamin Scherrey wrote: > I thought the same thing as a member of the Django community. It adopted a > CoC that I vocally warned was dangerous and far more likely to be abused > than provide any benefit. I was shocked when the very first time it was ever > invoked it was by one of the founders of the project (whom I previously > personally respected) and it was absolutely used in the manner that I had > feared which was to shut someone up whose opinion he did not like rather > than any legitimate concern. Unfortunately this is not such an unusual > circumstance as one might hope in these projects or conferences. It is > impossible to separate the concept of political correctness from these CoCs > I find and they are much more dangerous things than they appear. We should > tread with extreme cautious about adopting such a thing. It's impossible for me to know what really happened in that situation, but it doesn't seem like the CoC was likely to have been much of a factor in any telling. If this individual was in a position of influence and decided to act maliciously, they would no doubt have found another way to do so in the absence of a CoC. On the other hand, it's easy to imagine a newer non-influential community member finding no recourse against abusive behavior because that isn't explicitly provided for; they might simply not know where to start, and become totally discouraged. Nobody is claiming that the CoC is perfect, or that it can anticipate every situation; it's just a framework for handling disputes about abusive and/or antisocial behavior. The core team have had exclusive responsibility for "Handling disciplinary issues" as part of their charter, at least until now. You can make exactly the same slippery slope argument against that. -- Peter Geoghegan
Re: Code of Conduct plan
On Tue, Jun 5, 2018 at 9:51 AM, James Keener wrote: > To be honest, this is a bigger problem. Why would someone not feel > comfortable contacting the core team? Why would they feel better contacting > the CoC board who is probably mostly core team or otherwise self-selected > community members who have a strong belief in the CoC (and I don't mean that > kindly)? The CoC states that the committee's members cannot come from the core team. -- Peter Geoghegan
Re: Code of Conduct plan
On Tue, Jun 5, 2018 at 10:08 AM, Tom Lane wrote: > So publishing a formal CoC at all is mainly meant to deal with weak > points 1 and 2, and then the details of the process are there to try > to fix point 3. > > Yeah, managing the committee is a lot of overhead that in an ideal > world we wouldn't need, but I think we have to accept it to have a > process people will have confidence in. It's worth pointing out that the community has grown considerably in the last ten years. I assume that adding a bit of process to deal with these kinds of disputes is related to that. We have a pretty good track record through totally informal standards for behavior. Setting a good example is absolutely essential. While that's still the most important thing, it doesn't seem particularly scalable on its own. -- Peter Geoghegan
Re: Code of Conduct plan
On Tue, Jun 5, 2018 at 12:20 PM, Benjamin Scherrey wrote: > I keep hearing this claim. I've followed up and tried to verify them. Sorry > but "trust me" doesn't cut it here any more than "trust me this will make > Postgres go faster" would on a code change. What's the context for this? > What evidence do we have that indicates this CoC would have likely resulted > in a different outcome? Without that then your claim does not even rise up > to the standard of theoretical. Frankly this claim does not seem very > plausible to me at all. Let's try to keep our standards here. Whose standards are these? By my count, the majority of e-mails you've ever sent to a PostgreSQL mailing list have been sent in the last 2 days, to this code of conduct thread. -- Peter Geoghegan
Re: Code of Conduct plan
On Tue, Jun 5, 2018 at 2:06 PM, Sven R. Kunze wrote: > 1) CoC might result in developers leaving projects > http://lists.llvm.org/pipermail/llvm-dev/2018-May/122922.html This guy left LLVM for several reasons. The pertinent reason for us was that he had to agree to a code of conduct in order to attend conferences, which he found to be unacceptable. He did not have to agree that the idea of a code of conduct was a good one, though. It would have been perfectly possible for him to be opposed in principle to the idea of a CoC, while also formally agreeing to it and attending those conferences. I gather that his objections were around questions of unintended consequences, the role of a certain authority to assess violations of the CoC, and so on (I surmise that he was not actually opposed to or constrained by any of the specific rules around content in technical presentations and so on). I for one accept that these may have been reasonable concerns, even though I don't really agree, since the LLVM CoC seems quite reasonable. Anybody that participates in an open source community soon learns that their opinion on almost any matter may not be the one that prevails. There are often differences of opinion on -hackers that seem to fundamentally be down to a difference in values. We still manage to make it work, somehow. > 2) CoC might result in not so equal peers and friends, might result in a > committee which feels above their peers, and might promote conceit and > denunciation. I think that having a code of conduct is better than not having one, and I think that the one that we came up with is appropriate and proportionate. We could speculate all day about specific unintended consequences that may or may not follow. That doesn't seem very constructive, though. Besides, the time for that has passed. > In related discussions, people recurringly ask not to establish a secondary > judicial system but to use the already existing ones. I don't follow. Practically any organized group has rules around conduct, with varying degrees of formality, means of enforcement, etc. Naturally, the rules across disparate groups vary widely for all kinds of reasons. Formalizing and being more transparent about how this works seems like the opposite of paternalism to me. -- Peter Geoghegan
Re: Replica string comparsion issue
On Wed, Jun 13, 2018 at 1:10 PM, Andrey Lizenko wrote: > I'm observing strange behaviour on comparing ::text field with string while > quering replica. The first thing to do here is to install amcheck on the replica, and verify that the indexes are consistent with the replica's own notion of how text sorts for the collation that is implicitly in use (the database default): https://www.postgresql.org/docs/current/static/amcheck.html You can modify the example query to check the indexes that you're interested in. I think that there is a very strong chance that the replica has incompatible collation rules, given that it uses a totally different OS. -- Peter Geoghegan
Re: Can I disable sslcert/sslkey in pg_service.conf?
On 6/14/18 15:06, George Woodring wrote: > If I put the blanks into pg_service.conf: > > [mydb] > dbname=mydb > host=host1 > sslmode=require > sslcert= > sslkey= > > It does not work. I can believe that. > [woodring@ibeam]$ PGSERVICE=mydb psql > psql: SSL error: tlsv1 alert unknown ca > > I tried the opposite of moving the .postgresql directory to a different > name and putting a hard coded certificate path in pg_service, but it > looks to have its own sets of challenges. I think that's probably the best way out, though. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: What to do when dynamic shared memory control segment is corrupt
On Mon, Jun 18, 2018 at 1:03 PM, Tom Lane wrote: > Hm, I supposed that Sherrylyn would've noticed any PANIC entries in > the log. The TRAP message from an assertion failure could've escaped > notice though, even assuming that her logging setup captured it. Unhandled C++ exceptions end up calling std::abort(). I've seen bugs in modules like PL/V8 that were caused by this. The symptom was a mysterious message in the logs about SIGABRT. Perhaps that's what happened here? What extensions are installed, if any? -- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Wed, Jun 27, 2018 at 11:35 AM, Luca Ferrari wrote: > If I then disconnect and reconnect I'm able to issue the select and > get back the results. But if I issue a reindex I got the same error > and the table "becames unreadable" for the whole session. > On 10.3 the table is never locked for the session, that is I can > create the index, I can query the table and get the results, but I > cannot reindex. However, even after a reindex, it does allow me to > select data from the table. > > So my question is: why this behavior in later PostgreSQL? It might have something to do with the changes to parallel CREATE INDEX. It changed how we tracked whether or not an index could be used because it was currently undergoing reindexing. This is supposed to make no difference at all, but there was one bug that could cause us to consider an index irrevocably unusable. Do you find that the issue goes away if you set max_parallel_maintenance_workers=0 on v11/master? -- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Thu, Jun 28, 2018 at 8:02 AM, Andres Freund wrote:
> Peter, looks like you might be involved specifically.

Seems that way.

> This however seems wrong. Clearly the relation's index list is out of
> date.
>
> I believe this happens because there's currently no relcache
> invalidation registered for the main relation, until *after* the index
> is built. Normally it'd be the CacheInvalidateRelcacheByTuple(tuple) in
> index_update_stats(), which is called at the bottom of index_build().
> But we never get there, because of the earlier error. That's bad, because
> any relcache entry built *after* the CommandCounterIncrement() in
> CommandCounterIncrement() will now be outdated.
>
> In the olden days we most of the time didn't build a relcache entry
> until after the index was built - but plan_create_index_workers() now
> does. I suspect there's other ways to trigger that earlier, too.

Note that there is a kludge within plan_create_index_workers() that has us treat the heap relation as an inheritance parent, just to get a RelOptInfo for the heap relation without running into similar trouble with the index in get_relation_info(). Perhaps there's an argument to be made for refactoring plan_create_index_workers() as a fix for this.

> Putting in a CacheInvalidateRelcache(heapRelation); before the CCI in
> index_create() indeed makes the "borked relcache" problem go away.
>
> I wonder why we don't just generally trigger invalidations to an
> index's "owning" relation in CacheInvalidateHeapTuple()?

I don't know, but that seems like a good question.

-- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Mon, Jul 9, 2018 at 11:32 AM, Andres Freund wrote: >> Note that there is a kludge within plan_create_index_workers() that >> has us treat the heap relation as an inheritance parent, just to get a >> RelOptInfo for the heap relation without running into similar trouble >> with the index in get_relation_info(). Perhaps there's an argument to >> be made for refactoring plan_create_index_workers() as a fix for this. > > Maybe I'm missing something, but what has this got to do with the issue > at hand? Nothing. It might be worthwhile to find a way to not do that as part of fixing this issue, though. Just a suggestion. > I assume we'll have to backpatch this issue, so I think it'd probably a > good idea to put a specific CacheInvalidateHeapTuple() in there > explicitly in the back branches, and do the larger fix in 12. ISTM > there's some risks that it'd cause issues. Will you tackle this? Okay. -- Peter Geoghegan
Re: sorting/comparing column values in non-alphanumeric sorting ways ?
On Wed, Jul 11, 2018 at 2:44 PM, David Gauthier wrote: > I want to load the 'highestver' column with the highest version of tcfg1-3. > > This won't work... > update tv set greatest = greatest(tcfg1,tcfg2,tcfg3) > ...because it thinks 1.0.9 is greater than 1.0.10 > > Is there a way to get this to work right ? If you're using v10 with ICU, then you can create a custom ICU collation for this, with "natural" sort order. Something like this should work: CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true'); See the docs -- "23.2.2.3.2. ICU collations". -- Peter Geoghegan
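Continuing the example, the comparison and the update might then look like this (a sketch; it assumes the tv table and tcfg columns from the question, plus the collation created above):

    -- With numeric ("natural") ordering, 1.0.10 now compares higher than 1.0.9:
    SELECT '1.0.10'::text COLLATE "numeric" > '1.0.9';   -- true

    UPDATE tv SET highestver = greatest(tcfg1 COLLATE "numeric", tcfg2, tcfg3);

The explicit COLLATE clause on one argument determines the collation used for the whole comparison, so greatest() picks the highest version under natural ordering.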
Re: How to create logical replication slot with NOEXPORT_SNAPSHOT in jdbc
On 17.07.18 03:41, Igor Polishchuk wrote: > We are trying to use logical decoding for detecting database changes. > However, when we create a replication slot, the data processing pauses > if there are still transactions running from before the slot creation. > If I understand correctly, the slot is waiting for creating a consistent > snapshot and is blocked by the long transactions. > In our application, we don't need it, as we only want to see if some > tables were modified. Is it possible to create a logical replication > slot with NOEXPORT_SNAPSHOT option using jdbc? That doesn't do what you want. You still need to wait for the snapshot to be created; there is no way around that. The NOEXPORT_SNAPSHOT option just means that the snapshot, once created, won't be exported for use by other sessions. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: User documentation vs Official Docs
On 17.07.18 02:13, Joshua D. Drake wrote: > On 07/16/2018 05:08 PM, Alvaro Herrera wrote: >> >> Sounds like wiki pages could solve need this pretty conveniently. If >> and when the content is mature enough and migrates to the tutorial main >> documentation pages, the wiki pages can be replaced with redirects to >> those. > > Anyone who writes a lot is going to rebel against using a wiki. They are > one of the worst to write in from a productivity perspective. I would > rather write in Docbook, at least then I can template everything and we > could have a standard xsl sheet etc... I don't really buy that. The wiki seems just fine for writing short to medium size how-to type articles. We already have good content of that sort in the wiki right now. It's not like there isn't going to be anyone who will rebel against any of the other tool chains that have been mentioned. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re:
On Thu, Jul 19, 2018 at 11:43 AM, Torsten Förtsch wrote:
> is there a way to find if a certain page in a data file is referenced by a
> btree index?
>
> I found a few completely empty pages in one of my tables. I am not sure if
> that's corruption or just bloat. Now I am thinking I could use an index, the
> PK for instance, and see if it references these pages.

It's probably not serious, but you may want to try amcheck's heapallindexed check. You'll have to use the non-contrib packages for that right now, though; those are available from the PGDG repos.

-- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Wed, Jul 25, 2018 at 4:03 PM, Andres Freund wrote: > Peter, given that your patch made this more likely, and that you're a > committer these days, I'm opening an open items entry, and assign it to > you. Does that sound ok? I intend to follow through on this soon. I have been distracted by project work. I accept responsibility for the open item, though. -- Peter Geoghegan
Re: Restore relhaspkey in PostgreSQL Version 11 Beta
On Tue, Jul 31, 2018 at 7:47 AM, Melvin Davidson wrote: > I was hoping that at least one other person would see my point of view, but > by the > harsh replies I've been getting, I feel more like a whistle blower that > insiders > think I also should be made to "go away". You were bellicose from almost the beginning of this thread. And, yes, that does detract from your argument. Just as it would in almost any other sphere or arena. > Well, you are right. This old Viet Vet shall now end this conversation and > his career. > I just need a way to do so quietly and painlessly. > The truth is absolute and cannot be changed. > Perception is not the truth. > Flerp! I cannot imagine what reaction you were expecting to this. In all sincerity, I suggest reflecting on your words. You don't seem to have realistic expectations about how the community works, or could ever work. -- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Mon, Jul 9, 2018 at 11:32 AM, Andres Freund wrote:
> I assume we'll have to backpatch this issue, so I think it'd probably a
> good idea to put a specific CacheInvalidateHeapTuple() in there
> explicitly in the back branches, and do the larger fix in 12. ISTM
> there's some risks that it'd cause issues.

What do you think of the attached? There is a new CacheInvalidateRelcache() call, rather than a new call to CacheInvalidateRelcacheByTuple(), but those two things are equivalent (I assume that you actually meant to say CacheInvalidateRelcacheByTuple(), not CacheInvalidateHeapTuple()).

Since nobody seems to be that excited about the CacheInvalidateHeapTuple() idea, I haven't pursued it.

-- Peter Geoghegan

0001-Add-table-relcache-invalidation-to-index-builds.patch
Description: Binary data
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Tue, Jul 31, 2018 at 7:02 PM, Andres Freund wrote: > Maybe expand a bit on this by saying that it's more likely "because > plan_create_index_workers() triggers a relcache entry to be (re-)built, > which previously did only happen in edge cases" or such? Okay. > Not a fan of this comment. It doesn't really explain that well why it's > needed here, but then goes on to a relatively general explanation of why > cache invalidation is necessary. Why not just go for something like > "register relcache invalidation on the indexes' heap relation, to > maintain consistency of its index list"? That seems much more generic to me! The comment is supposed to convey that the stuff within index_update_stats() isn't enough because of xact abort specifically. SI invalidation is very much part of the index_update_stats() contract already. > I wonder if it wouldn't be more appropriately placed closer to the > UpdateIndexRelation(), given that that's essentially what necessitates > the relcache flush? That makes sense. I'll do it that way. -- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Tue, Jul 31, 2018 at 9:00 PM, Andres Freund wrote: > I don't think that's particularly relevant. We should always register an > invalidation before the relevant CommandCounterIncrement(), because that > is what makes catalog changes visible, and therefore requires > registering invalidations for coherency. Fair enough. How about the attached revision? -- Peter Geoghegan v2-0001-Add-table-relcache-invalidation-to-index-builds.patch Description: Binary data
Re: jndi jdbc url with ssl authenticat in tomcat ... fails org.xml.sax.SAXParseException columnNumber: 79; The reference to entity "ssl" must end with the ';' delimiter
The "&" characters need to be replaced by "&amp;".

Peter

On Sat, 4 Aug 2018, 22:47 Dave Cramer, wrote:
> Pretty sure this is a tomcat error.
>
> The connection string looks fine
>
> Dave Cramer
> da...@postgresintl.com
> www.postgresintl.com
>
> On 30 July 2018 at 11:32, Didier Wiroth wrote:
>> Hello,
>> I'm trying to configure a postgres jndi resource in tomcat that needs to
>> connect via ssl and client certificate for authentication.
>> Unfortunately ... without success.
>>
>> The user is: esrde_aoo and authenticates with a certificate (no password!).
>>
>> Here is the tomcat 8.5.31 (running on jre 1.8.0_152-b16) resource configuration:
>> <Resource driverClassName="org.postgresql.Driver"
>>     name="jdbc/pgrde"
>>     url="jdbc:postgresql://test.localhost:5432/esrde?user=esrde_aoo&ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory&sslcert=d:/apps/ssl/rde/esrde_aoo.crt&sslkey=d:/apps/ssl/rde/esrde_aoo.key"
>>     username="esrde_aoo"
>> />
>>
>> I tried many many different combinations ... it always fails at the second parameter.
>> org.apache.tomcat.util.digester.Digester.fatalError Parse Fatal Error at
>> line 21 column 79: The reference to entity "ssl" must end with the ';' delimiter.
>> org.xml.sax.SAXParseException; systemId:
>> file:/D:/apps/web-data/tam/conf/db-context-ora12.xml; lineNumber: 21;
>> columnNumber: 79; The reference to entity "ssl" must end with the ';' delimiter.
>> at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
>> at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
>> at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
>> at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
>> at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
>> at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:913)
>> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1548)
>> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1315)
>> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2784)
>> at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
>> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
>> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
>> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
>> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
>> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
>> at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
>> at org.apache.tomcat.util.digester.Digester.parse(Digester.java:1521)
>> at org.apache.catalina.startup.ContextConfig.processContextConfig(ContextConfig.java:527)
>> at org.apache.catalina.startup.ContextConfig.contextConfig(ContextConfig.java:465)
>> at org.apache.catalina.startup.ContextConfig.init(ContextConfig.java:728)
>> at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:310)
>> at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:94)
>> at org.apache.catalina.util.LifecycleBase.setStateInternal(LifecycleBase.java:395)
>> at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:108)
>> at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
>> at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:754)
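For illustration, the corrected resource definition might look like the following (a sketch showing only the attributes quoted above; the fix is simply to escape every "&" in the JDBC URL as "&amp;" inside the XML file):

    <Resource driverClassName="org.postgresql.Driver"
              name="jdbc/pgrde"
              url="jdbc:postgresql://test.localhost:5432/esrde?user=esrde_aoo&amp;ssl=true&amp;sslfactory=org.postgresql.ssl.NonValidatingFactory&amp;sslcert=d:/apps/ssl/rde/esrde_aoo.crt&amp;sslkey=d:/apps/ssl/rde/esrde_aoo.key"
              username="esrde_aoo" />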
Re: scram-sha-256 authentication broken in FIPS mode
On 11/09/2018 05:02, Michael Paquier wrote:
> Hence, intrinsically, we are in contradiction with the upstream docs. I
> have worked on the problem with the patch, which works down to OpenSSL
> 0.9.8, and should fix your issue. This is based on what you sent
> previously, except that I was not able to apply what was sent, so I
> reworked the whole. Alessandro, does this fix your problems? I would
> like to apply that down to v10 where SCRAM has been introduced.

I recommend letting this bake in the master branch for a while. There are a lot of weirdly patched and alternative OpenSSL versions out there that defy any documentation.

Of course, we should also see if this actually fixes the reported problem.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Convert interval to hours
On Fri, Sep 14, 2018 at 11:51 AM David Gauthier wrote:
> Hi:
>
> In perl/DBI, I have code that's getting me an "age" which returns something
> like... "-17 days -08:29:35". How can I convert that to a number of hours
> (as a float I would presume) ?
>
> Thanks

I've done this as

select extract(epoch from '-17 days -08:29:35'::interval)/3600 as hours;
     hours
---------------
 -416.49305556
(1 row)

Others might have a better way, though.

Pete
Re: Code of Conduct plan
On Fri, Sep 14, 2018 at 7:19 AM, Joshua D. Drake wrote:
> Sure and that is unfortunate but isn't it up to the individual to deal with
> it through appropriate channels for whatever platform they are on? All of
> these platforms are:
>
> 1. Voluntary to use
> 2. Have their own Terms of Use and complaint departments
> 3. If it is abuse there are laws
>
> I agree that within Postgresql.org we must have a professional code of
> conduct but the idea that an arbitrary committee appointed by an unelected
> board can decide the fate of a community member based on actions outside of
> the community is a bit authoritarian don't you think?

The choice of the committee members is hardly arbitrary. Having committee members be appointed by core is more or less consistent with how the community has always dealt with disciplinary issues. The criteria used by core were discussed quite openly.

While the risk that the committee will wield their power in an "authoritarian" way seems very small, it cannot be ruled out entirely. In fact, it hasn't been ruled out by the draft CoC itself. No CoC can possibly provide for every conceivable situation. Somebody has to interpret the rules, and it has to be possible to impose sanctions when the CoC is violated -- otherwise, what's the point?

There are several checks and balances in place, and I for one have confidence in the process as outlined. It's imperfect, but quite a lot better than either the status quo, or a platitude about inclusivity.

-- Peter Geoghegan
Re: Code of Conduct plan
On Fri, Sep 14, 2018 at 10:31 AM, Dimitri Maziuk wrote: > So let me get this straight: you want to have a "sanctioned" way to deny > people access to postgresql community support channel? Yes. > "Because > somebody who may or may not be the same person, allegedly said something > somewhere that some other tweet disagreed with on faceplant"? > > Great plan if you do for-pay postgresql support for the living. You can make your own conclusions about my motivations, just as I'll make my own conclusions about yours. I'm not going to engage with you on either, though. -- Peter Geoghegan
Re: Code of Conduct plan
On Fri, Sep 14, 2018 at 11:06 AM, Dimitri Maziuk wrote: > Personally I would like that. Others might prefer an invitation to > unsubscribe or forever hold their peace, I could live with that too, but > I believe explicit opt-ins are preferable to opt-outs. I think that it's a legitimate position to be opposed to a CoC like this. I also think it's legitimate to feel so strongly about it, on philosophical or political grounds, that you are compelled to avoid participating while subject to the CoC. FWIW, the latter position seems rather extreme to me personally, but I still respect it. In all sincerity, if you're compelled to walk away from participating in mailing list discussions on a point of principle, then I wish you well. That is your right. -- Peter Geoghegan
Re: Convert interval to hours
On Fri, Sep 14, 2018 at 2:42 PM Steven Lembark wrote:
> On Fri, 14 Sep 2018 12:21:14 -0400
> David Gauthier wrote:
>
> > I'm using postgres v9.5.2 on RH6.
>
> PG can convert the times for you.
> For times (not timestamps) you are always better off dealing with
> either time or integer seconds. There are a variety of issues with
> rounding that affect repeatability and accuracy of results using
> floats or doubles. Given that 10 and three are both continuing
> fractions in binary (e.g., 1/10 binary is an infinite series)
> division by 3600 will only cause you annoyance at some point.
>
> If you are subtracting times then you will (usually) end up with
> an interval, which can be cast to seconds in the query and give
> you precise, accurate, repeatable results every time.
>
> e.g.,
>
> select
>   extract
>   (
>     epoch from ( time1 - time2 )::interval
>   )
>   as "seconds",
> ...
>
> is one approach.
>
> In nearly all cases you are better off selecting and converting
> the time in SQL rather than converting the start and end times
> from numeric (time) to string (DBI) and then back from char *
> to float/double or int/unsigned. The character conversion is
> expensive and numeric -> string -> numeric leaves you open to all
> sorts of rounding and conversion issues.
>
> Frankly, if you have to run the query more than once I'd suggest
> adding a view that does the select/convert for you (along with
> dealing with any NULL's that creep into things). PG makes it quite
> easy to add the view and quite inexpensive to apply it.

In the original e-mail, the OP said

> I have code that's getting me an "age" which returns something like...
> "-17 days -08:29:35".

I took that to mean he was beginning with a string, which I suggested to cast to an interval. If he's starting with a different type, then of course the fewer castings the better.

Also, it seems as though you two have had private communication, because I don't see an e-mail where he specified the DB type. Perhaps he also showed more of the source data there.

Pete
Re: pglogical extension. - 2 node master <-> master logical replication ?
On 2019-07-02 19:54, Day, David wrote: > On attempting the reverse subscription from host1 to host2 > > select pglogical.create_subscription('alabama_sub', > 'host=alabama port=5432 dbname=ace_db user=replicator', > '{connections}', false, false, '{}' ) > > could not connect to the postgresql server: FATAL: role "pgsql" does not > exist > DETAIL: dsn was: host=georgia port=5432 dbname=ace_db > > -- > Wrong dsn and role ? > The postgres installed superuser role is not pgsql. > I did not see this issue in the working subscription direction. The problem might be in the dsn that you gave to create_node(). Hard to tell without a fully reproducible script. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
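The error above suggests that the dsn registered for the node on the other host ("dsn was: host=georgia port=5432 dbname=ace_db") carries no user= part, so the connection presumably falls back to a default role (here apparently "pgsql"). A possible fix, sketched with the names from the question (the node name itself is a placeholder), is to re-register that node with an explicit user:

    SELECT pglogical.drop_node(node_name := 'georgia_node');
    SELECT pglogical.create_node(
        node_name := 'georgia_node',
        dsn       := 'host=georgia port=5432 dbname=ace_db user=replicator'
    );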