Locked out of schema public
This is FreeBSD 11.3, with postgres installed from ports as 10.10.
The port includes a daily periodic utility that runs pg_dump:

  : ${daily_pgsql_pgdump_args:="-U ${daily_pgsql_user} -p ${daily_pgsql_port} -bF c"}
  pg_dump ${daily_pgsql_pgdump_args} -f ${file} ${db}

Recently I did a restore of some database, as the postgres user, with:

  pg_restore -c -d -h

and now ordinary users are locked out of the database:

  PG::UndefinedTable: ERROR: relation "users" does not exist

  => \d users
  Did not find any relation named "users".
  => \d
  Did not find any relations.
  => \d public.users
  Table "public.users"
  [etc. etc., all is present]
  => show search_path;
     search_path
  -----------------
   "$user", public
  (1 row)
  => select current_schemas(false);
   current_schemas
  -----------------
   {}
  (1 row)

eh HOPPALA!!!

  => select * from public.users;
  ERROR: permission denied for schema public

How can this happen? I don't think I twiddled anything with schemas; in
fact I never used them in any way.

cheers,
PMc
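One quick way to check whether a particular role has lost its rights on a
schema is to ask the server directly; a minimal sketch, where "appuser"
and "mydb" are placeholder names, not from the report above:

  # check USAGE/CREATE on schema public for a given role, and show the raw ACL;
  # "appuser" and "mydb" are placeholders
  psql -X -d mydb -c "SELECT has_schema_privilege('appuser', 'public', 'USAGE')  AS has_usage,
                             has_schema_privilege('appuser', 'public', 'CREATE') AS has_create;"
  psql -X -d mydb -c "SELECT nspname, nspacl FROM pg_namespace WHERE nspname = 'public';"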
Re: Locked out of schema public (pg_dump lacks backup of the grant)
Long story short: pg_dump simply fails to back up the grant on schema
public. :(

Long story: After searching for half an hour for some comprehensive
listing of permissions (in vain), I tried pgadmin3 (which is indeed a
life-saver and still somehow works on 10.10 - and that's the reason I am
reluctant to upgrade postgres, as this can only get worse) - and then it
was a simple matter of comparing page by page:

  GRANT ALL ON SCHEMA public TO public;

That one is missing on the restored database.

So, if you do a "pg_restore -C -c -d postgres", then you get that grant
from the template database, and there is no problem. (But this is ugly,
as you need to find and terminate all the connections on the db.)

If you do only "pg_restore -c -d ", the sessions can stay open, but then
it will do

  DROP SCHEMA public;
  CREATE SCHEMA public;

and it will NOT restore the grant, because it is not in the backup.

I'd like to call this a bug.
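If one only wants to repair a database that has already been restored
(rather than re-restoring it with -C), the missing grant can simply be
re-issued by hand as a superuser; a minimal sketch, with "mydb" as a
placeholder database name:

  # re-create the default grant that was dropped together with the schema;
  # "mydb" is a placeholder database name
  psql -X -U postgres -d mydb -c "GRANT ALL ON SCHEMA public TO public;"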
Re: Locked out of schema public
Hi Adrian,

okay, let's check these out:

> What is ${daily_pgsql_user} equal to?

postgres. The owner of the installation.

> I am not seeing -U postgres.
> Are you sure there is not something else specifying the user e.g. env
> PGUSER?

I'm sure. The log shows the nightly backup connections as postgres:, and
all connections except postgres:postgres work only with kerberos - it
cannot do much bogus there.

> What user are you doing below as?

Ordinary application user. The postgres and superusers do get access to
the tables.

> What does \dn+ show?

  => \dn+
                            List of schemas
    Name  |  Owner   |  Access privileges   |      Description
  --------+----------+----------------------+------------------------
   public | postgres | postgres=UC/postgres+| standard public schema
          |          | pgsql=UC/postgres    |

And after restoring with "pg_restore -C -c -d postgres", when it works
correctly again, then it shows:

  => \dn+
                            List of schemas
    Name  |  Owner   |  Access privileges   |      Description
  --------+----------+----------------------+------------------------
   public | postgres | postgres=UC/postgres+| standard public schema
          |          | =UC/postgres        +|
          |          | pgsql=UC/postgres    |

So that was the command I was searching for. Thank You!

For now I hold on the bug...

cheers,
PMc
Re: Locked out of schema public (pg_dump lacks backup of the grant)
Hello Tom,

thank You very much.

> We improved that situation in v11, I believe.  What I see for this
> case these days is per commit 5955d9341:
> [...]

Ah, well. I don't fully understand that, but as the issue appears to be
known, that is fine with me.

This thing is just bad if one has never seriously worked with schemas
and has no immediate idea what could have hit - especially when that
happens on top of a stack of open windows with other issues. :(

cheers,
PMc
12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)
Hi folks,

with 12.1, after a couple of queries, at a random place, the client
library produces a failed query without giving a reason or error
message [1]. Then, when retrying, the client library switches off signal
handling and sits inactive in memory (needs kill -9).

The server log shows no error or other hint.

The behaviour happens rarely with trust access, and almost always when
using Kerberos5 (Heimdal as included in FreeBSD).

The 11.5 client library shows none of this behaviour and seems to work
fine, as 10.10 did.

Environment:
  OS      FreeBSD 11.3
  Applic. Ruby-on-Rails, ruby=2.5.7, gem 'pg'=1.2.2 (it makes no
          difference whether that one is compiled against the 12.1 or
          the 10.10 library)
  Server  12.1

[1] the message from ruby is PG::ConnectionBad: PQconsumeInput()

rgds,
PMc
Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)
On Thu, Jan 09, 2020 at 01:48:01PM -0500, Tom Lane wrote:
! Peter writes:
! > with 12.1, after a couple of queries, at a random place, the clientlib
! > does produce a failed query without giving reason or error-message [1].
! > Then when retrying, the clientlib switches off signal handling and
! > sits inactive in memory (needs kill -9).
!
! Seems like you'd better raise this with the author(s) of the "pg"
! Ruby gem.  Perhaps they read this mailing list, but more likely
! they have a specific bug reporting mechanism somewhere.

Tom,
I don't think this has anything to do with "pg". Just checked: I get
garbage and misbehaviour on the "psql" command line tool also:

  $ psql -h myhost flowmdev
  psql (12.1)
  GSSAPI-encrypted connection
  Type "help" for help.

  flowmdev=> select * from flows;
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  flowmdev=> select * from flows;
  server sent data ("D" message) without prior row description ("T" message)
  flowmdev=> select * from flows;
  message type 0x54 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
  message type 0x44 arrived from server while idle
   id |   name   |...

To the contrary:

  $ PGGSSENCMODE="disable" psql -h myhost flowmdev
  psql (12.1)
  Type "help" for help.

  flowmdev=> select * from flows;
   id |   name   |...

rgds,
PMc
Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)
On Thu, Jan 09, 2020 at 10:47:00AM -0800, Adrian Klaver wrote:
!
! Might want to take at below:
!
! https://github.com/ged/ruby-pg/issues/311

Thanks a lot! That option

> gssencmode: "disable"

seems to solve the issue.

But I think the people there are concerned with a different issue: they
are bothered about fork(), while my flaw also appears when I do *NOT*
fork. Also the picture is slightly different; they get segfaults, I get
misbehaviour.

rgds,
PMc
Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)
On Thu, Jan 09, 2020 at 04:31:44PM -0500, Tom Lane wrote:
! Peter writes:
! > flowmdev=> select * from flows;
! > message type 0x44 arrived from server while idle
! > message type 0x44 arrived from server while idle
! > message type 0x44 arrived from server while idle
!
! Oh ... that does look pretty broken.  However, we've had no other similar
! reports, so there must be something unique to your configuration.  Busted
! GSSAPI library, or some ABI inconsistency, perhaps?  What platform are you
! on, and how did you build or obtain this Postgres code?

This is a FreeBSD 11.3-p3 r351611 built from source. Postgres is built
from
  https://svn0.eu.freebsd.org/ports/branches/2019Q4 (rel. 12r1)
or
  https://svn0.eu.freebsd.org/ports/branches/2020Q1 (rel. 12.1)
with "make package install".

I have a build environment for base&ports that forces recompiles on any
change and should make ABI inconsistencies quite hard to create. All
local patches are versioned and documented; there are none that I could
imagine influencing this. There are no patches on postgres. Also no
patches on the GSSAPI. There are a couple of patches on the Heimdal, to
fix broken commandline parsing, broken pidfile handling and broken
daemonization. None of them touches the core functionality (like key
handling).

But I just noticed something of interest (which I had taken for granted
when importing the database): the flaw does NOT appear when accessing
the database from the server's local system (with TCP and GSSAPI
encryption active). Only from a remote system.

But then, if I go on the local system and change the mtu:

  # ifconfig lo0 mtu 1500

and restart the server, then I get the exact same errors locally.

I don't get a clue of that, it doesn't make sense. With the default lo0
mtu of 16384 the packets go on the network with the full 8256 bytes you
send. With mtu 1500 they are split into 1448 byte pieces; but TCP is
supposed to handle this transparently. And what difference would the
encryption make with this?

> net.inet.tcp.sendspace: 32768
> net.inet.tcp.recvspace: 65536

These are also bigger. No, I don't understand that.

The only thing - these are all VIMAGE jails. VIMAGE was considered
'experimental' some time ago, and went productive in FreeBSD 12.0, and
11.3 is lower and later than 12.0 - whatever that concedes.

Another thing I found out: the slower the network, the worse the errors.
So might it be that nobody complained just because those people who
usually have GSSAPI also have very fast machines and networks nowadays?

When I go to packet-radio speed:

  # ipfw pipe 4 config bw 10kbit/s

then I can see the query returning empty at the first received bytes:

  flowmdev=# select * from flows;
  flowmdev=#

not even waiting the 8 seconds for the first block to arrive.

rgds,
PMc
12.2: Howto check memory-leak in worker?
Hi all,

I have something that looks a bit insane:

  # ps axl | grep 6145
  UID   PID  PPID CPU PRI NI     VSZ    RSS MWCHAN STAT TT       TIME COMMAND
  770  6145     1   0  20  0  241756    868 select SsJ   -    0:24.62 /usr/local/bin/postgres -D
  770  6147  6145   0  23  0  243804 109784 select IsJ   -    3:18.52 postgres: checkpointer(
  770  6148  6145   0  20  0  241756  21348 select SsJ   -    2:02.83 postgres: background writer
  770  6149  6145   0  20  0  241756   7240 select SsJ   -   16:36.80 postgres: walwriter(pos
  770  6150  6145   0  20  0   21980    876 select SsJ   -    0:13.92 postgres: archiver last w
  770  6151  6145   0  20  0   21980    980 select SsJ   -    0:58.45 postgres: stats collector
  770  6152  6145   0  20  0  241756   1268 select IsJ   -    0:02.07 postgres: logical replicati
  770 43315  6145   0  21  0  251844   7520 select IsJ   -    1:07.74 postgres: admin postgres 19
  770 43317  6145   0  25  0  251764   8684 select IsJ   -    1:28.89 postgres: admin bareos 192.
  770 43596  6145   0  20  0  245620   4476 select IsJ   -    0:00.12 postgres: admin bareos 192.
  770 43761  6145   0  20  0  245620   4476 select IsJ   -    0:00.15 postgres: admin bareos 192.
  770 90206  6145   0  52  0 1331256 219720 racct  DsJ   -  563:45.41 postgres: bareos bareos 192

The 90206 is continuously growing. It is the unspecific, all-purpose
worker for the www.bareos.com backup tool, so it is a bit difficult to
figure out what precisely it does - but it tends to run rather simple,
straightforward queries, so it is unlikely to have dozens of "internal
sort operations and hash tables".

What I can say is that at times this worker is completely idle in
ClientRead, but it does not shrink in memory. Is this normal behaviour?

Here is a more dynamic picture: it continues to add 2048 kB chunks (and
does not do noticeable paging):

  UID   PID  PPID CPU PRI NI     VSZ    RSS MWCHAN STAT TT       TIME COMMAND
  Mon May  4 13:33:09 CEST 2020
  770 90206  6145   0  91  0 1335352 226900 -      RsJ   -  569:09.19 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:33:39 CEST 2020
  770 90206  6145   0  93  0 1335352 227696 -      RsJ   -  569:28.48 postgres: bareos bareos idle (postgres)
  Mon May  4 13:34:09 CEST 2020
  770 90206  6145   0  92  0 1337400 228116 -      RsJ   -  569:47.46 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:34:39 CEST 2020
  770 90206  6145   0  92  0 1337400 228596 -      RsJ   -  570:06.56 postgres: bareos bareos UPDATE (postgres)
  Mon May  4 13:35:09 CEST 2020
  770 90206  6145   0  92  0 1337400 228944 -      RsJ   -  570:25.62 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:35:40 CEST 2020
  770 90206  6145   0  52  0 1337400 229288 racct  DsJ   -  570:44.33 postgres: bareos bareos UPDATE (postgres)
  Mon May  4 13:36:10 CEST 2020
  770 90206  6145   0  91  0 1337400 229952 -      RsJ   -  571:03.20 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:36:40 CEST 2020
  770 90206  6145   0  52  0 1337400 223772 racct  DsJ   -  571:21.50 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:37:10 CEST 2020
  770 90206  6145   0  91  0 1337400 224448 -      RsJ   -  571:40.63 postgres: bareos bareos idle (postgres)
  Mon May  4 13:37:40 CEST 2020
  770 90206  6145   0  91  0 1339448 225464 -      RsJ   -  571:58.36 postgres: bareos bareos SELECT (postgres)
  Mon May  4 13:38:10 CEST 2020
  770 90206  6145   0  52  0 1339448 215620 select SsJ   -  572:14.24 postgres: bareos bareos idle (postgres)
  Mon May  4 13:38:40 CEST 2020
  770 90206  6145   0  81  0 1339448 215320 -      RsJ   -  572:21.09 postgres: bareos bareos idle (postgres)
  Mon May  4 13:39:10 CEST 2020

OS is FreeBSD 11.3-RELEASE-p8 r360175M i386
PostgreSQL 12.2 on i386-portbld-freebsd11.3, compiled by gcc9 (FreeBSD Ports Collection) 9.3.0, 32-bit
autovacuum is Disabled.
The memory-specific config is:

> shared_buffers = 200MB
> temp_buffers = 40MB
> work_mem = 80MB
> maintenance_work_mem = 250MB
> dynamic_shared_memory_type = posix
> random_page_cost = 2.0
> effective_cache_size = 1GB

(others are left at default)

I remember vaguely that there are means to have a closer look into what
is using the memory, but do not recall the specifics. Some pointers or
ideas on how to proceed would be gladly appreciated (Dtrace should
work).

Processes will usually fail with OOM at this size, due to the machine
configuration - I'm waiting for that now (it is a very very old pentium3
machine ;) ).

cheerio,
PMc
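One low-level way to see which memory contexts a backend is accumulating
is to attach a debugger and dump the context statistics; the report goes
to the server's stderr (i.e. the log), not to the debugger session. A
sketch, assuming gdb is installed and the postgres binary still carries
symbols:

  # dump the per-context memory usage of the suspect backend (PID 90206 above);
  # the output appears in the server log / stderr, not in the gdb session
  gdb -batch -ex 'call MemoryContextStats(TopMemoryContext)' -p 90206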
Re: 12.2: Howto check memory-leak in worker?
On Mon, May 04, 2020 at 12:55:38PM -0700, Adrian Klaver wrote:
! > The 90206 is continuously growing. It is the unspecific, all-purpose
! > worker for the www.bareos.com backup tool, so it is a bit difficult to
! > figure what precisely it does - but it tends to be rather simple
! > straight-forward queries, so it is unlikely to have dozens of "internal sort
! > operations and hash tables".
! >
! Is there any relevant information in the bareos or Postgres logs?

No. Only the final exitus-in-tabula.

! > autovacuum is Disabled.
!
! Any particular reason for above?

Yes. Disk spindown doesn't work otherwise, and I see no reason to have a
backup zone that is used a few times a day spinning for 24 hours.

BTW, I would greatly appreciate it if we would reconsider the need for
the server to read the postmaster.pid file every few seconds (probably
needed for something, I don't know).
That makes it necessary to set atime=off to get a spindown, and I
usually prefer to have atime=on so I can see what my stuff is currently
doing.

cheerio,
PMc
Re: 12.2: Howto check memory-leak in worker?
On Tue, May 05, 2020 at 10:57:04AM +1200, Thomas Munro wrote:
! On Tue, May 5, 2020 at 10:13 AM Peter wrote:
! > BTW, I would greatly appreciate if we would reconsider the need for
! > the server to read the postmaster.pid file every few seconds (probably
! > needed for something, I don't know).
! > That makes it necessary to set atime=off to get a spindown, and I
! > usually prefer to have atime=on so I can see what my stuff is
! > currently doing.
!
! That'd be this:
!
!        /*
!         * Once a minute, verify that postmaster.pid hasn't been removed or
!         * overwritten.  If it has, we force a shutdown.  This avoids having
!         * postmasters and child processes hanging around after their database
!         * is gone, and maybe causing problems if a new database cluster is
!         * created in the same place.  It also provides some protection
!         * against a DBA foolishly removing postmaster.pid and manually
!         * starting a new postmaster.  Data corruption is likely to ensue from
!         * that anyway, but we can minimize the damage by aborting ASAP.
!         */
!        if (now - last_lockfile_recheck_time >= 1 * SECS_PER_MINUTE)
!        {
!            if (!RecheckDataDirLockFile())

Ah. Yes. Thanks for identifying this issue.

It is really hard to comment on this in a way that might not be
considered offensive, so let's put it this way: You have all seen the
Apollo-13 movie, so You know that a need to power down as much as
possible may appear en route, and in a situation where you have lots of
other issues, so what you need the least is things like this getting in
your way.

I usually take this a bit further and consider only two relevant quality
levels for software: interplanetary and interstellar - where
interplanetary is the kind of stuff on which we will trust our lives
during the manned Jupiter mission. PG might well qualify for this,
except for things like the quoted; so I think such things should be
guarded by appropriate compile-time switches.

Now concerning the memory leak: that one was introduced with the work
done on the GSSAPI encryption; it goes away when setting 'hostnogssenc'
in pg_hba. (And, whoever has editor rights: there is also a little fixup
needed for that switch in pg_hba.conf; it should probably be "eleven
formats".)

cheerio,
PMc
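For reference, the workaround mentioned above is a pg_hba.conf entry of
the "hostnogssenc" type, which only matches TCP connections that do not
use GSSAPI transport encryption (GSSAPI *authentication* still works on
such lines); with libpq's default gssencmode=prefer, a rejected
encrypted attempt is retried without encryption. A sketch, with the
address range and scope as placeholders:

  # TYPE          DATABASE  USER  ADDRESS         METHOD
  hostnogssenc    all       all   192.0.2.0/24    gss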
Re: 12.2: Howto check memory-leak in worker?
On Tue, May 05, 2020 at 11:59:27AM -0400, Tom Lane wrote:
! Well, the choice we face is preventing somebody's disk from spinning
! down, versus preventing somebody else from completely corrupting their
! database.  From where I sit that's not a difficult choice, nor one
! that I feel a need to let users second-guess.

Then maybe You see a scenario where that feature would actually prevent
db corruption, while I have not yet found a realistic one.

Instead, what gave me headaches is that ZFS might take a single
tablespace (= pool) offline with the cluster continuing to run - and I
am not sure whether the db is supposed to survive that (mine did, after
I had hit the power button in horror), nor is it easy to protect against
it.

Anyway, I can now switch that feature off with a local patch, which is
the second-best solution.

! Oooh ... it looks like some of the encryption code paths have neglected
! to call gss_release_buffer.  Will fix, thanks for the report!

Great! So I assume I don't need to send the bug report I had prepared in
the interim. Feel free to ask me for anything that might still be
needed.

cheerio,
PMc
12.2: Why do my Redo Logs disappear??
Hi all,

this is a 12.2 Release on FreeBSD 11.3.

I am doing RedoLog Archiving according to Docs Chapter 25.1. During the
last week I have lost 4 distinct Redo Logs; they are not in the backup.

Losing a RedoLog is very bad, because there is no redundancy; losing a
single one of them makes the timeline disappear, and it will only
reappear after another Base Backup. Very very bad.

So I did analyze the matter.

There are three ways to restart a Berkeley machine:

1. Cowboy it (aka pull-the-plug). This makes sure that everything is
   certainly dead immediately, and, given all hard- and software is well
   designed, nothing bad should happen.

2. Shut it down. This is the official means, and it takes very long,
   because each and every application is signalled and given time to
   bring up whatever objections it may have. In this case "pg_ctl stop"
   will be invoked with whatever options the sysop has configured, and
   postgres will copy out a full log into the archive before
   terminating.

3. Halt/Reboot it, like this:
   https://www.freebsd.org/cgi/man.cgi?query=reboot&sektion=8&manpath=FreeBSD+11.3-RELEASE
   This is considered safer than pull-the-plug, and still fast.
   Applications are killed without much waiting, but all disk buffers
   are flushed to permanent storage and the filesystems are closed.
   In this case, it seems, Postgres will delete the current log without
   archiving it. :(

What precisely happens (according to the OS sources) during reboot is
this: processes will be sent SIGTERM, followed after some 2-10 seconds
by SIGKILL.

--

Let's have a closer look:

We did a regular shutdown at 17:09, and then we did a reboot at 19:24.

Here is the content of the staging area (where the logs are copied to
and accumulated until it is worth running a backup job):

  # dir arch/
  total 240
  drwx--       2 postgres postgres         5 Jun  8 17:09 .
  drwxr-xr-x   6 postgres postgres         7 Jun  8 17:09 ..
  -rw---       1 postgres postgres  16777216 Jun  8 09:38 0001001700FC.ok
  -rw---       1 postgres postgres  16777216 Jun  8 10:48 0001001700FD.ok
  -rw---       1 postgres postgres  16777216 Jun  8 17:09 0001001700FE.ok

And here is the pg_wal directory:

  # dir data12/pg_wal/
  total 89256
  drwx--       3 postgres postgres        10 Jun  8 19:28 .
  drwx--      19 postgres postgres        23 Jun  8 19:28 ..
  -rw---       1 postgres postgres       335 Jun  7 07:36 0001001700EF.0060.backup
  -rw---       1 postgres postgres  16777216 Jun  8 19:38 00010018
  -rw---       1 postgres postgres  16777216 Jun  7 07:17 000100180001
  -rw---       1 postgres postgres  16777216 Jun  7 07:17 000100180002
  -rw---       1 postgres postgres  16777216 Jun  7 07:17 000100180003
  -rw---       1 postgres postgres  16777216 Jun  7 07:17 000100180004
  -rw---       1 postgres postgres  16777216 Jun  7 07:36 000100180005
  drwx--       2 postgres postgres         3 Jun  8 17:09 archive_status

  # dir data12/pg_wal/archive_status
  total 23
  drwx--       2 postgres postgres   3 Jun  8 17:09 .
  drwx--       3 postgres postgres  10 Jun  8 19:28 ..
  -rw---       1 postgres postgres   0 Jun  7 07:36 0001001700EF.0060.backup.done

Now where the hell is my "FF" log ???
Let's check syslog - this was the full shutdown at 17:09:

  Jun  8 17:09:38 admn pg-bck[73534]: [10-1] :[] LOG:  0: received fast shutdown request
  Jun  8 17:09:38 admn pg-bck[73534]: [10-2] :[] LOCATION:  pmdie, postmaster.c:2780
  Jun  8 17:09:38 admn pg-bck[73534]: [11-1] :[] LOG:  0: aborting any active transactions
  Jun  8 17:09:38 admn pg-bck[73534]: [11-2] :[] LOCATION:  pmdie, postmaster.c:2813
  Jun  8 17:09:38 admn pg-bck[73549]: [10-1] :[] DEBUG:  0: logical replication launcher shutting down
  Jun  8 17:09:38 admn pg-bck[73549]: [10-2] :[] LOCATION:  ProcessInterrupts, postgres.c:2981
  Jun  8 17:09:38 admn pg-bck[73534]: [12-1] :[] LOG:  0: background worker "logical replication launcher" (PID 73549) exited with exit code 1
  Jun  8 17:09:38 admn pg-bck[73534]: [12-2] :[] LOCATION:  LogChildExit, postmaster.c:3657
  Jun  8 17:09:38 admn pg-bck[73544]: [13-1] :[] LOG:  0: shutting down
  Jun  8 17:09:38 admn pg-bck[73544]: [13-2] :[] LOCATION:  ShutdownXLOG, xlog.c:8321
  Jun  8 17:09:45 admn pg-bck[82223]: RedoLog.bck invoked pg_wal/0001001700FE 0001001700FE
  Jun  8 17:09:45 admn pg-bck[82223]: RedoLog.bck pg_wal/0001001700FE 0001001700FE returns 0
  Jun  8 17:09:45 admn pg-bck[73547]: [8-1] :[] DEBUG:  0: archived write-ahead log file "0001001700FE"
  Jun  8 17:09:45 admn pg-bck[73547]: [8-2] :[] LOCATION:  pgarch_archiveXlog, pgarch.c:675
  Jun  8 17:09:55 admn pg-bck[73544]: [14-1] :[] LOG:  0: checkpoint starting: shutdown immediate
  Jun  8 17:09:55 admn pg-bck[735
Re: 12.2: Why do my Redo Logs disappear??
On Mon, Jun 08, 2020 at 05:40:20PM -0700, Peter Geoghegan wrote:
!
! I strongly suspect that you were hit by the bug fixed in commit
! 4e87c483. You should upgrade to Postgres 12.3 ASAP, to get that fix:
!
! "Avoid premature recycling of WAL segments during crash recovery
! (Jehan-Guillaume de Rorthais)
!
! WAL segments that become ready to be archived during crash recovery
! were potentially recycled without being archived."

Ahh, thank You so much; this is good news: if it is an already known
bug, I can close all efforts, remove the debug switches again, stuff the
sources back into their box and relax. ;)

I did check whether I would find something about lost archive logs, but
didn't - and then it is always possible that it's just a mistake of
mine, as I know I do make mistakes.

And now for the nitpicking part :)

On Mon, Jun 08, 2020 at 05:35:40PM -0700, Adrian Klaver wrote:
! > I am doing RedoLog Archiving according to Docs Chapter 25.1.
!
! There is no ReDo logging, there is WAL logging.

Yes I know - and I don't care. Technically they're no longer WAL when
they're going to be archived. Their purpose then becomes to redo the
transactions, and even the messages say so:

! > LOG: 0: redo done at 17/FF01C098

! What docs, because section 25.1 in the Postgres docs is:
!
! https://www.postgresql.org/docs/12/backup-dump.html
!
! 25.1. SQL Dump

Oops, a mistake of mine. It should be 25.3.1.

! https://www.postgresql.org/docs/12/server-shutdown.html
! "
! Important
!
! It is best not to use SIGKILL to shut down the server. Doing so will prevent
! the server from releasing shared memory and semaphores. Furthermore, SIGKILL
! kills the postgres process without letting it relay the signal to its
! subprocesses, so it might be necessary to kill the individual subprocesses
! by hand as well.

And which of these would be of any concern if the machine is rebooted
anyway?

I had to install new hardware, and currently I'm trying to identify a
memory exhaustion issue. This makes it necessary to reboot the full
system quite often, and I neither want to wait for orderly termination
of dozens of subsystems, nor do I want to need fsck at restart. This
makes SIGKILL/reboot the method of choice.

! What is RedoLog.bck?

The script which one has to write according to the docs' section which
would correctly be numbered 25.3.1.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
Actually, the affair had a good side: as usual I was checking my own
designs first and looking for flaws, and indeed I found one:

If you copy out the archive logs not directly to tape, but to some disk
area for further processing, then there is an issue with possible loss.
If you do it like the docs say, with a command like this:

  archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix

that "cp" is usually not synchronous. So there is the possibility that
this command terminates successfully and reports exit code zero back to
Postgres, and then Postgres will consider that log safely away - but the
target of the copy may not yet have been written to disk. If at that
point a power loss happens, the log may become missing/damaged/
incomplete, while the database may or may not consider it done when
restarting.

Therefore, mounting such a target filesystem in all-synchronous mode
might be a good idea. (UFS: "-o sync", ZFS: "set sync=always")

cheerio,
PMc
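One way to avoid depending on a synchronous mount is to make the
archive_command itself force the file to stable storage before it
reports success, for example by replacing the cp with a dd that fsyncs
its output (FreeBSD's and GNU dd both understand conv=fsync). A sketch
only, reusing the placeholder archive directory from the docs; note that
this still does not fsync the directory entry, which some filesystems
additionally require:

  # sketch: refuse to overwrite, copy, and fsync the file before returning success
  archive_command = 'test ! -f /mnt/server/archivedir/%f && dd if=%p of=/mnt/server/archivedir/%f conv=fsync'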
Re: Something else about Redo Logs disappearing
On Mon, Jun 08, 2020 at 09:21:47PM -0700, Adrian Klaver wrote:
!
! On 6/8/20 7:33 PM, Peter wrote:
! >
! > Actually, the affair had some good side: as usual I was checking
! > my own designs first and looking for flaws, and indeed I found one:
! > If you do copy out the archive logs not directly to tape, but to
! > some disk area for further processing, then there is an issue with
! > possible loss. If you do it like the docs say, with a command like
! > this:
! > archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p
! > /mnt/server/archivedir/%f'  # Unix
! > That "cp" is usually not synchronous. So there is the possibility
! > that this command terminates successfully, and reports exitcode zero
! > back to the Postgres, and then the Postgres will consider that log
! > being safely away.
!
! Which is why just following the above command in the docs is:
!
! "(This is an example, not a recommendation, and might not work on all
! platforms.) "

So, what You are basically saying is: my worries are justified and
correctly founded, and this is indeed a matter that needs to be taken
care of. Thank You.

! Generally for peace of mind folks use third party tools like:
!
! pg_backrest(https://pgbackrest.org/),
! pg_probackup(https://postgrespro.com/products/extensions/pg_probackup) or
! Barman(https://www.pgbarman.org/).

Hmja. We may on occasion have a look into these...

! I use pg_backrest, but it does not look promising for running on BSD:
! https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html

That looks mostly like the usual things which can be fixed.

Now, for the facts: I am already using a professional backup solution.
(It is actually a "dual-use/commercial" solution, of the kind which you
can either fetch from github and use without support, or buy with a
contract or whatever and get support.)

With this professional backup solution I have already identified 28
(spell: twenty-eight) bugs, and fixed/worked around these, until I got
it properly working.

This professional backup solution also offers support for postgres.
Sadly, it only covers postgres up to Rel. 9, and that piece of software
wasn't touched in the last 6 or 7 years. But the bigger issue there is
that the backup solution needs its own postgres database as its backend
- and it cannot back up the database it is using. Looks quite pointless
to me, then. So I just did it all with shell (and it wasn't many lines).

So now, as I have been through identifying and handling all the issues
in that one backup solution, and since it is supposed to handle *all*
backup demands (and not only postgres), I will certainly not start and
go through the same process again with one of these supposed solutions,
where 90% of the code tries to solve the same things redundantly again,
but then only for PG.

Actually, I am getting very tired of reading that something which can
easily be done within 20 lines of shell scripting would need special
"solutions" - solutions that need to be compiled, solutions that bring
along their own fashion of interpreter, solutions that have a lot of
their own dependencies and introduce mainly one thing: new bugs.

Does nobody know anymore how to do proper systems management scripting?
Using just the basic system tools which have proven to work for more
than 50 years now!?

! Not sure about pg_probackup.

Okay, I had a -very short- look into these. Just scanning the
introductory pages.

The only really interesting thing there is pg_probackup. These folks
seem to have found a way to do row-level incremental backups.

And pgbarman seems to have an impressive understanding of ITIL (in case
anybody bothers about that).

All these tools cover only PG, but do that in every possible regard.
This is fine as long as you do not run any computers, and the only
application you are using is Postgres.

But if you have other applications as well, or have computers, then you
will need a different backup solution, something that will cover your
site-wide backup demands in a consistent fashion (think something in the
style of ADSM, or what is nowadays called Spectrum Protect).

And then 90% of the things offered here become superfluous, because they
are already handled site-wide. And then you will have to consider
integration of both pieces - and that will most likely be more work and
more error-prone than just writing a few adapters in shell.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Tue, Jun 09, 2020 at 01:27:20AM -0400, Tom Lane wrote:
! Adrian Klaver writes:
! > On 6/8/20 7:33 PM, Peter wrote:
! >> That "cp" is usually not synchronous. So there is the possibility
! >> that this command terminates successfully, and reports exitcode zero
! >> back to the Postgres, and then the Postgres will consider that log
! >> being safely away.
!
! > Which is why just following the above command in the docs is:
! > "(This is an example, not a recommendation, and might not work on all
! > platforms.) "
!
! Yeah.  There have been discussions about changing that disclaimer to be
! more forceful, because in point of fact a plain "cp" is generally not safe
! enough.  You need to fsync the written file, and on many filesystems you
! also have to fsync the directory it's in.

It certainly does not need to be "more forceful" - because this is not
about behavioural education, like training dogs, horses, or monkeys, and
neither do we entertain a BDSM studio. What it needs instead is a
mention of the magic word "fsync". Because we already know that - we
just need a reminder at the proper place.

Systems integrators are professional people. They are not in need of
more beating (spell: forceful education), only of substantial technical
hints and information.

! > Generally for peace of mind folks use third party tools like:
!
! +1.  Rolling your own archive script is seldom advisable.

Well, using commercial solutions brings its own problems. E.g., the
people I happened to work for often had problems with torsion, which
happens when the solution gets longer than, say, twenty meters, and
these are walked at high speed. They didn't have a problem with
scripting - rather the opposite, they were happy with it and paid good
money for it.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Tue, Jun 09, 2020 at 12:34:38PM -0700, Adrian Klaver wrote:
! The backup solution is?

https://www.bareos.com/

! Fine rant below. Go forth and work your wonders.

I don't need to, anymore. I did that, for about 20 years - people I used
to work for as a consultant (major banks and insurance shops) would
usually run Informix or Oracle. Postgres is just my own private fancy.

On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
! * Peter (p...@citylink.dinoex.sub.org) wrote:
! > This professional backup solution also offers support for postgres.
! > Sadly, it only covers postgres up to Rel.9, and that piece of software
! > wasn't touched in the last 6 or 7 years.
!
! Then it certainly doesn't work with the changes in v12, and probably has
! other issues, as you allude to.

Just having a look at their webpage, something seems to have been
updated recently; they now state that they have a new postgres adapter:

  https://www.bareos.com/en/company_news/postgres-plugin-en1.html

Enjoy reading, and tell us what You think.

! > Actually, I am getting very tired of reading that something which can
! > easily be done within 20 lines of shell scripting, would need special
!
! This is just simply false- you can't do it properly in 20 lines of shell
! scripting.

Well, Your own docs show how to do it with a one-liner. So please don't
blame me for improving that to 20 lines.

! Sure, you can write something that has probably next to no
! error checking,

Before judging that, one should first specify what precisely the demand
is.

In a basic approach, the demand may be to get the logs out onto tape in
a failsafe automated fashion without any miss, to get the data tree out
periodically, and to have a guarantee that these files are untampered,
as on disk. And that can very well be done properly with an incremental
filesystem backup software plus some 20 lines of shell script.

Now, talking about doing an automated restore, or having some
menu-driven solution, or - the worst of all - having a solution that can
be operated by morons: that's an entirely different matter.

In my understanding, backup is done via pgdump. The archive logs are for
emergencies (data corruption, disaster) only. And emergencies would
usually be handled by some professional people who know what they have
to do.

You may consider different demands, and that is also fine, but it
doesn't need to concern me.

! uses the deprecated API that'll cause your systems to
! fail to start if you ever happen to have a reboot during a backup

It is highly unlikely that this never happened to me during 15 years.

So what does that mean? If I throw in a pg_start_backup('bogus'), and
then restart the cluster, it will not work anymore? Let's see...

Clean stop/start - no issue whatsoever.
  (LOG: online backup mode canceled)

kill -9 the whole flock - no issue whatsoever.
  (LOG: database system was interrupted)

I won't pull the plug now, but that has certainly happened lots of times
in the past, and also yielded no issue whatsoever - simply because there
*never* was *any* issue whatsoever with Postgres (until I got the idea
to install the relatively fresh R.12 - but that's understandable).

So maybe this problem exists only on Windows?

And yes, I read that whole horrible discussion, and I could tear my hair
out, really, concerning the "deprecated API". I suppose You mean the
mention in the docs that the "exclusive low-level backup" is somehow
deprecated. This is a very big bad.

Because: normally you can run the base backup as a strictly ordinary
file-level backup in "full" mode, just as any backup software can do it.
You simply execute the pg_start_backup() and pg_stop_backup() commands
in the before- and after- hooks - and any backup software will offer
these hooks.

But now, with the now recommended "non-exclusive low-level backup", the
task is different: now your before-hook needs to do two things at the
same time:
 1. keep a socket open in order to hold the connection to postgres
    (because postgres will terminate the backup when the socket is
    closed), and
 2. invoke exit(0) (because the actual backup will not start until the
    before-hook has properly delivered a successful exit code).
And that is not only difficult, it is impossible.

So, what has to be done instead: you need to write a separate network
daemon, with the only purpose of holding that connection to postgres
open. That network daemon needs to handle the communication to the
backup software on one side, and to postgres on the other side. And that
network daemon then needs the full-blown feature set of a fail-safe
network daemon (and that is a LOT!), plus it needs to handle all kinds
of possible failures (like network interruptions) in that triangle,
during the backup, and properly notify both sides of whatever might be
ongoing (and that
Re: Something else about Redo Logs disappearing
On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
! > And then 90% of the things offered here become superfluous, because
! > they are already handled site-wide. And then you will have to
! > consider integration of both pieces - and that will most likely be
! > more work and more error-prone than just writing a few adapters in
! > shell.
!
! pgbackrest's repo can be safely backed up using the simple file-based
! backup utilities that you're referring to here.  I suspect some of the
! other solution's backups also could be, but you'd probably want to make
! sure.

What repo?? I seem to have missed that at first glance.

Are You indeed suggesting that one should have their data within the
database, where it is worked with, and then use Your tool to copy it to
some "repo" disk playground whatever area, and then use their regular
backup system to COPY IT AGAIN into their backup/archiving system? Are
You kiddin'?

Are You indeed suggesting that people should buy the amount of disks
that they use for their database AGAIN, in order for Your software to
copy the stuff around? Is this becoming a madhouse, or are You going to
refund them that?

Let me tell You something: the people I used to work for sometimes had a
problem. They had some amount of data that was created during the day,
and they had the night to write that data away to backup. That would
usually mean four or eight of the big tapes, streaming in parallel,
fibers saturated, all night through. And the problem usually was that
they would need a longer night. At least the math had to be done
properly.

Maybe You never encountered these, but there are environments where
there is no spare room for nonsense. Maybe that's why these people
preferred to use Oracle.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Wed, Jun 10, 2020 at 01:10:36PM +0200, Magnus Hagander wrote:
! > Just having a look at their webpage, something seems to have been updated
! > recently, they now state that they have a new postgres adapter:
! >
! > https://www.bareos.com/en/company_news/postgres-plugin-en1.html
! > Enjoy reading, and tell us what You think.
! >
!
! This one unfortunately rings out of somebody who doesn't know how to back
! up postgres, at least in the past 10-15 years.
!
! They are using an API that has been deprecated for years - in what's
! announced as a brand new product. They are advocating local archiving,
! which basically guarantees dataloss in the event of a disaster.

Aye, thank You, that's exactly the impression I got. This is probably
still the old thing I was talking about, just made into a new product.

! That's from a 3 minute look, but that's definitely enough to suggest this
! is not something I'd consider using.

The matter is, that backup software (as a whole, not this postgres
component) offers lots of things exactly as I like them. It is a great
concept and a great implementation, but of bad coding quality and with a
bad maintenance policy. But then, one can get it for free, and I know of
no other with such features. So I went through the effort of fixing it
up, so that it now serves my needs well - and I use my own scripting for
the add-ons.

! > Well, Your own docs show how to do it with a one-liner. So please
! > don't blame me for improving that to 20 lines.
! >
!
! Yes, those docs are unfortunately "known bad" and should definitely be
! improved on. it does very clearly state that the example is just an
! example. But it doesn't clearly state *why* it shouldn't be used.

That's why I felt the ethical need to speak up and share my
considerations. Now it's up to those in charge, and not my issue
anymore. ;)

! > In my understanding, backup is done via pgdump. The archive logs are
! > for emergencies (data corruption, disaster), only. And emergencies
! > would usually be handled by some professional people who know what
! > they have to do.
! >
!
! I'd say it's the exact opposite. backups are done via pg_basebackup or
! manual basebackups. Archive logs are for point in time recovery. pg_dump
! can be used as a secondary "backup to the backups" option, but it is most
! interesting for things that are not backups (such as inspecting data, or
! provisioning partial test systems).
!
! Different for different scenarios of course, but that would be the base
! scenario. And pg_dump are definitely as far from good backups are you can
! get while still having something that can be called approximately backups.
! It might be enough for small databases, but even in those cases
! pg_basebackup (without archive logging) is easier...

It's easier to create - but to apply? That depends on how many DBs are
in the cluster and how diverse their use is. Also, at any major version
switch these backups become worthless; one cannot use them for the long
term. (I suppose this is also true for pg_basebackup.) I'm creating my
longterm (and offsite) backups simply as clones from the regular full
backup.

So what I came up with for now is: I run pg_dump over all the present
databases, plus globals, chunk that up (in a similar way to how chunked
HTTP works), feed it onto a pipe and back up that pipe. No need for
interim storage, so it can get as large as the backup software can take
it. And that should work for longterm - and I don't currently see a
better option. (A rough sketch of this follows below.)
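A rough sketch of such a dump-to-pipe job, assuming a placeholder fifo
path that the file-based backup software reads from; the framing
("chunking") that makes the stream splittable again on restore is left
out here:

  #!/bin/sh
  # sketch: stream the globals plus a plain-SQL dump of every database onto one
  # pipe that the backup software reads; /backup/pgdump.fifo is a placeholder
  # path, error handling is omitted
  PIPE=/backup/pgdump.fifo
  {
    pg_dumpall -U postgres --globals-only
    for db in $(psql -U postgres -AtqX -c \
          "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
      printf -- '-- ===== database: %s =====\n' "$db"
      pg_dump -U postgres --create "$db"
    done
  } > "$PIPE"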
(This one does not work in 20 lines of shell script, because I didn't
get a reliable chunker running in shell.)

! And yes, I read that whole horrible discussion, and I could tear my
! > hair out, really, concerning the "deprecated API". I suppose You mean
! > the mentioning in the docs that the "exclusive low-level backup" is
! > somehow deprecated.
! >
!
! Yes. There is no "somehow", it's deprecated.

Then let's not call it "somehow", as, more precisely, from my
understanding so far, that so-called "new API" is ill-conceived and
troublesome in more than one regard. I would, with my current knowledge,
recommend to avoid it, or better, to abandon it.

Or, in other words: it is similar to what Boeing tried to do, in forcing
things upon people via software, for safety reasons - and now see where
Boeing got with that.

! > But now, with the now recommended "non-exclusive low-level backup",
! > the task is different: now your before-hook needs to do two things
! > at the same time:
! > 1. keep a socket open in order to hold the connection to postgres
! > (because postgres will terminate the backup when the socket is
! > closed), and
! > 2. invoke exit(0) (because the actual backup will not start until
! > the before- hook has properly delivered a successful exit code.
! > And, that is not only difficult, it is impossible.
!
! It is not impossible. It is harder if you limit your available tools yes,
! but it also *works*.

In this description which I choose, I would think it is actually imposs
Re: Something else about Redo Logs disappearing
On Wed, Jun 10, 2020 at 08:32:22AM -0400, Stephen Frost wrote:
! > What repo?? I seem to have missed that at first glance.
!
! Yes, pgbackrest has a repo, like most other tools (though they call them
! different things... pg_basebackup has one though it's not really
! formal).
!
! > Are You indeed suggesting that one should have their data within
! > the database, where it is worked with, and then use Your tool
! > to copy it to some "repo" disk playground whatever area, and then
! > use their regular backup system to COPY IT AGAIN into their
! > backup/archiving system? Are You kiddin'?
!
! No, I'm not kidding and yes, that's what I'm suggesting.

Well, do You think I could seriously sell that to anybody?

! You need a
! consistent backup of your database that includes all the needed WAL to
! perform a restore.

Okay. So let's behave like professional people and figure out how that
can be achieved:

At first, we drop that WAL requirement, because with WAL archiving it is
already guaranteed that an unbroken chain of WAL is always present in
the backup (except when we have a bug like the one that led to this
discussion). So this is **not part of the scope**. I'll get to the other
item, the "consistent backup", further below.

! This is only one option though, there are others- you can also use
! pgbackrest to push your backups to s3 (or any s3-compatible data storage
! system, which includes some backup systems), and we'll be adding
! support

! I concur that this is becoming a madhouse, and is pushing past the limit
! for what I'm willing to deal with when trying to assist someone.

Well, then that might be a misconception. I'm traditionally a
consultant, and so I am used to *evaluating* solutions. I don't need
assistance for that, I only need precise technical information.

So let's get serious:

It is NOT technically feasible to amplify the storage in a way that the
entire backup data gets copied from the live database to some "repo"
place first, and then again from that repo place to regular file-based
backup/archiving storage. And it does not make a difference WHERE that
other place is, at Jeff's or wherever. It just does not belong into the
loop.

So, how could the alternative look?

I for my part consider doing this: with a proper transactional
filesystem we can do recursive filesystem snapshots. That means, given a
suitable filesystem layout, we can do a snapshot of the data tree, AND
the pg_wal filesystem, AND the respective tablespaces. And this is
atomic. So, what we then do in the before- hook is:

 * we call pg_start_backup(whatever, false, false).
 * we issue the filesystem snapshot, atomically.
 * we call pg_stop_backup(whatever).

And then we return with exit(0) (if all succeeded), and the file-based
backup software can start to collect the files from the filesystem
snapshot, and release it afterwards. (A sketch of this bracket is
appended at the end of this mail.)

This STILL needs threaded programming (as I said, there is no way to
avoid that with this "new API"), but in this case it is effectively
reduced to just grabbing the return code of some program that has been
started with "&".

So far, so good. There is still one thing to be done, namely the
requirement to collect the data reported by pg_stop_backup() and add it
to the backup, at a point in time when that backup is ALREADY CLOSED!
(That's another piece of gross bogus in this "new API".) But with my
intended method (and some ugliness) this could now also be solved.

But then, let's think another step forward: for what purpose do we
actually need to call pg_start_backup() and pg_stop_backup() at all?
I couldn't find exhaustive information about that, only some partial
facts. What we know for certain is: if we leave these calls away, and
just do a filesystem snapshot and make a backup from that, then we have
exactly the same thing as if we had a power failure at that time. So
this is restorable, and the server will certainly start.

The remaining question is: can we then give it our subsequently archived
redo logs and make it roll forward before going ready? I don't know that
yet, but maybe, if we just throw these WAL into pg_wal, the thing might
be stupid enough to swallow them. If that does not work, there might
still be ways to trick it into believing it's a standby server and make
it roll forward these WAL.

So, there are not only ways to avoid the useless storage amplification,
there might even be means to get rid of that whole misconceived "API".

Things that remain to be figured out:

1. What does pg_start_backup actually do, and why would that be
   necessary? I could not find exhaustive information, but this can
   probably be figured out from the source. Currently I know this much:
   - it writes a backup_label file. That is just a few lines of ASCII
     and should not be difficult to produce.
   - it does a checkpoint. This can probably be done with the respective
     command (if it is needed at all).
   - it temporarily switches to full_page_writes. This cannot be don
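Appendix: a minimal sketch of the snapshot bracket described above, done
in a single psql session so that the non-exclusive backup stays open
exactly as long as the snapshot takes. "tank/pgdata", the label and the
output path are placeholders; error handling and the returned
tablespace_map are omitted:

#!/bin/sh
# before-hook sketch: start a non-exclusive backup, take the recursive
# snapshot from inside the same session, stop the backup, and keep the
# output (start LSN plus backup_label content) for inclusion in the backup.
psql -AtqX -U postgres <<'EOF' > /var/db/pg-backup-label.out
SELECT pg_start_backup('snap', false, false);
\! zfs snapshot -r tank/pgdata@backup
SELECT labelfile FROM pg_stop_backup(false);
EOF
# the file-based backup then reads the snapshot and must also pick up
# /var/db/pg-backup-label.out as the backup_label of that backup.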
Re: Something else about Redo Logs disappearing
On Thu, Jun 11, 2020 at 10:53:15PM +0200, Laurenz Albe wrote:
! On Thu, 2020-06-11 at 22:35 +0200, Magnus Hagander wrote:
! > I believe somebody around that time also wrote a set of bash scripts that can be used in a pre/post-backup-job combination with the current APIs.
!
! https://github.com/cybertec-postgresql/safe-backup

Ah, thank You, very nice. I've never seen anybody coding bash - it is
strongly shunned in the Berkeley community.

Some questions:

1. There are explicit error messages in loc-82 and -92 of pgpre.sh.
   Where are these written to?

2. The result data from pg_stop_backup() are stored in the living
   database. But, according to the docs, they should be placed into the
   completed backup. Do I have a misunderstanding here?

3. The most common cause of a backup error might be a tape-side
   malfunction. As far as I see, the way to handle this currently is to
   provide a timeout for pgpre.sh (which is problematic, because we
   might just as well have reached end-of-tape and have to wait until
   Monday for the operator to change it). May I suggest adding a switch
   to pgpost.sh, in order to voluntarily fail out of the job?

4. If, by misconfiguration and/or operator error, the backup system
   happens to start a second backup in parallel to the first, do I
   correctly assume that both backups will be rendered inconsistent
   while this may not be visible to the operator, and that the earlier
   backup would be flagged as apparently successful while carrying the
   wrong (later) label?

BTW: what does, in general, happen if a backup_label file gets
accidentally swapped with one from a parallel, but slightly later,
backup? Do I correctly assume that such a mistake gets detected somehow,
as otherwise it would have just the same unwelcome effects (i.e. silent
data corruption) as no backup_label at all?

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Sat, Jun 13, 2020 at 01:53:28PM +0200, Laurenz Albe wrote:
! > I've never seen anybody coding bash - it is strongly shunned in the
! > Berkeley community.
!
! Strange, but then I don't move in these circles.

Never mind.

! > Some Questions:
! > 1. There are explicit error messages in loc-82 and -92 of pgpre.sh.
! > To where are these written?
!
! Standard error.  It is up to the caller of the script to route that
! somewhere useful.

Understood.

! > 2. The result data from pg_stop_backup() are stored into the living
! > database. But, according to the docs, they should be placed into
! > the completed backup. Do I have a misunderstanding here?
!
! Right, but these scripts don't know anything about that backup itself.
! They are designed to be called before and after the backup.
! In between, you back up the data directory however you think fit.
!
! It is the responsibility of the caller of the post-backup script
! to add the "backup_label" file to the backup.

I see.

! > 4. If, by misconfiguration and/or operator error, the backup system
! > happens to start a second backup. in parallel to the first,
! > then do I correctly assume, both backups will be rendered
! > inconsistent while this may not be visible to the operator; and
! > the earlier backup would be flagged as apparently successful while
! > carrying the wrong (later) label?
!
! If you are using my scripts and start a second backup while the first
! one is still running, the first backup will be interrupted.

This is not what I am asking. It appears correct to me that, on the
database side, the first backup will be interrupted. But on the tape
side, this might go unnoticed, and on completion it will successfully
receive the termination code from the *SECOND* backup - which means that
on tape we will have a seemingly successful backup, which
 1. is corrupted, and
 2. carries a wrong label.

! This is specific to my scripts, PostgreSQL's non-exclusive backup
! can perform more than one concurrent backup successfully.
! I tried to keep things simple.

I understand. But the operator may not know that and/or accidentally
start a second backup while one is still running. And this will then
result in ...

! If you have the wrong "backup_label", you end up with silent data corruption.

... this.

Indeed this is difficult to avoid, because the high risk of silent data
corruption is an elementary architectural feature of the so-called "new
API". Which is why this is not going to run on my ship.

But you will have to wait - the to-be-expected inrush of service sales
due to corrupted clusters will only happen after R.13 is active and
people are *forced* to cope with that "new API".

Thanks for the effort of answering my questions.

cheerio,
PMc
Re: Something else about Redo Logs disappearing
On Thu, Jun 11, 2020 at 10:35:13PM +0200, Magnus Hagander wrote: ! > Okay. So lets behave like professional people and figure how that ! > can be achieved: ! > At first, we drop that WAL requirement, because with WAL archiving ! > it is already guaranteed that an unbroken chain of WAL is always ! > present in the backup (except when we have a bug like the one that ! > lead to this discussion). ! > So this is **not part of the scope**. ! > ! ! I would assume that anybody who deals with backups professionally wouldn't ! consider that out of scope, I strongly disagree. I might suppose You haven't thought this to the proper end. See: The WAL backup is, for a couple of reasons, different to most other backup objects: - it runs on demand, not per schedule. - it must be immediately repeated (or alerted), until success, otherwise the database might stop. - it might use a different storage (e.g. twofold/mirrored), for obvious reasons. - it gets deleted after backup, and that is a point-of-no-return. For the data tree backup, on the contrary, all these attributes are identical to those of any other incr/full filesystem backup. You can see that all the major attributes (scheduling, error-handling, signalling, ...) of a WAL backup are substantially different to that of any usual backup. This is a different *Class* of backup object, therefore it needs an appropriate infrastructure that can handle these attributes correctly. The WAL do *not belong* into the scope of the data tree backup, they are to be handled by different schemes in a different way. But, if You never have considered *continuous* archiving, and only intend to take a functional momentarily backup of a cluster, then You may well have never noticed these differences. I noticed them mainly because I did *BUILD* such an infrastructure (the 20 lines of shell script, you know). And yes, I was indeed talking about *professional* approaches. ! There is *absolutely* no need for threading to use the current APIs. You ! need to run one query, go do something else, and then run another ! query. Wrong. The point is, I dont want to "go do something else", I have to exit() and get back to the initiator at that place. This is also clearly visible in Laurenz' code: he utilizes two unchecked background tasks (processes, in this case) with loose coupling for the purpose, as it does not work otherwise. ! But then, lets think another step forward: for what purpose do we ! > actually need to call pg_start_backup() and pg_stop_backup() at all? ! > I couldn't find exhaustive information about that, only some partial ! > facts. ! > ! ! Since you don't trust the documentation, I suggest you take a look at ! https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=55cac186dc71fcc2f4628f9974b30850bb51eb5d;hb=92c58fd94801dd5c81ee20e26c5bb71ad64552a8#l10438 Thanks, this indeed gives a most entertaining read. Honestly, I've never seen such a source before - there is thrice as much storytime than code. :) The most interesting point in there appears to be this: > that the backup label and tablespace map files are not written to > disk. Instead, their would-be contents are returned in *labelfile > and *tblspcmapfile, This is in do_pg_start_backup() - so we actually HAVE this data already at the *START* time of the backup! 
Then why in hell do we wait until the END of the backup before we hand this data to the operator: at a time when the DVD with the backup is already finalized and cannot be changed anymore, so that we have to send this data to quaternary storage (spell: the line printer) and have girl Friday sort them out (and probably mix them up)?? Is this indeed Boeing-security: try to avoid a possible mistake by creating a much bigger risk? ! Things that remain to be figured out: ! > 1. What does pg_start_backup actually do and why would that be ! > necessary? I could not find exhaustive information, but this can ! > probably figured from the source. Currently I know so much: ! > - it writes a backup_label file. That is just a few lines of ! > ASCII and should not be difficult to produce. ! > ! ! It does that only in exclusive mode, and doing that is one of the big ! problems with exclusive mode. So don't do that. As I can read, there is no difference in the function requirements between exclusive and non-exclusive mode, in that regard: the backup-label file is NOT necessary in the running cluster data tree, BUT it should get into the RESTORED data tree before starting it. And I can't find a single one of those "big problems". What I do find is just people whining that their cluster doesn't start and they can't simply delete a file, even if told so. Like a soldier complaining that his gun doesn't shoot and he has no idea how to reload. I could devise a dozen ways to detect that situation automatically during rc.d and remove the file - if it were of any concern to me. ! > I now hop
Re: Something else about Redo Logs disappearing
On Mon, Jun 15, 2020 at 11:44:33AM +0200, Laurenz Albe wrote: ! On Sat, 2020-06-13 at 19:48 +0200, Peter wrote: ! > ! > 4. If, by misconfiguration and/or operator error, the backup system ! > ! > happens to start a second backup. in parallel to the first, ! > ! > then do I correctly assume, both backups will be rendered ! > ! > inconsistent while this may not be visible to the operator; and ! > ! > the earlier backup would be flagged as apparently successful while ! > ! > carrying the wrong (later) label? ! > ! ! > ! If you are using my scripts and start a second backup while the first ! > ! one is still running, the first backup will be interrupted. ! > ! > This is not what I am asking. It appears correct to me, that, on ! > the database, the first backup will be interrupted. But on the ! > tape side, this might go unnoticed, and on completion it will ! > successfully receive the termination code from the *SECOND* ! > backup - which means that on tape we will have a seemingly ! > successful backup, which ! > 1. is corrupted, and ! > 2. carries a wrong label. ! ! That will only happen if the backup that uses my scripts does the ! wrong thing. Yes. Occasionally software does the wrong thing, it's called "bugs". ! An example: ! ! - Backup #1 calls "pgpre.sh" ! - Backup #1 starts copying files ! - Backup #2 calls "pgpre.sh". ! This will cancel the first backup. ! - Backup #1 completes copying files. ! - Backup #1 calls "pgpost.sh". ! It will receive an error. ! So it has to invalidate the backup. ! - Backup #2 completes copying files. ! - Backup #2 calls "pgpost.sh". ! It gets a "backup_label" file and completes the backup. That's not true. Now let me see how to compile a bash... and here we go: ! An example: ! ! - Backup #1 calls "pgpre.sh" > $ ./pgpre.sh > backup starting location: 1/C858 > $ We now have: > 24129 10 SJ 0:00.00 /usr/local/bin/bash ./pgpre.sh > 24130 10 SJ 0:00.00 /usr/local/bin/bash ./pgpre.sh > 24131 10 SJ 0:00.01 psql -Atq > 24158 10 SCJ 0:00.00 sleep 5 And: > postgres=# \d > List of relations > Schema | Name | Type | Owner > ++---+-- > public | backup | table | postgres > (1 row) > > postgres=# select * from backup; > id | state | pid | backup_label | tablespace_map > +-+---+--+ > 1 | running | 24132 | | > (1 row) ! - Backup #1 starts copying files Let's suppose it does now. ! - Backup #2 calls "pgpre.sh". > $ ./pgpre.sh > backup starting location: 1/C924 > $ FATAL: terminating connection due to administrator command > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > connection to server was lost > Backup failed > ./pgpre.sh: line 93: ${PSQL[1]}: ambiguous redirect > > $ echo $? > 0 ! This will cancel the first backup. Yes, it seems it did: > 25279 10 SJ 0:00.00 /usr/local/bin/bash ./pgpre.sh > 25280 10 IWJ 0:00.00 /usr/local/bin/bash ./pgpre.sh > 25281 10 SJ 0:00.01 psql -Atq > 25402 10 SCJ 0:00.00 sleep 5 > postgres=# \d > List of relations > Schema | Name | Type | Owner > ++---+-- > public | backup | table | postgres > (1 row) > > postgres=# select * from backup; > id | state | pid | backup_label | tablespace_map > +-+---+--+ > 1 | running | 25282 | | > (1 row) ! - Backup #1 completes copying files. ! - Backup #1 calls "pgpost.sh". > $ ./pgpost.sh > START WAL LOCATION: 1/C924 (file 0001000100C9) > CHECKPOINT LOCATION: 1/C958 > BACKUP METHOD: streamed > BACKUP FROM: master > START TIME: 2020-06-15 14:09:41 CEST > LABEL: 2020-06-15 14:09:40 > START TIMELINE: 1 > > $ echo $? > 0 ! 
It will receive an error. ! So it has to invalidate the backup. Where is the error? What we now have is this: No processes anymore. > id | state | pid | backup_label > | tablespace_map > +--+---++ > 1 | complete | 25282 | START WAL LOCATION: 1/C924 (file > 0001000100C9)+| > | | | CHECKPOINT LOCATION: 1/C958 > +| > | | | BACKUP METHOD: streamed
Re: Something else about Redo Logs disappearing
On Mon, Jun 15, 2020 at 03:19:29PM +0200, Laurenz Albe wrote: ! On Mon, 2020-06-15 at 14:50 +0200, Peter wrote: ! > ! An example: ! > ! ! > ! - Backup #1 calls "pgpre.sh" ! > ! - Backup #1 starts copying files ! > ! - Backup #2 calls "pgpre.sh". ! > ! This will cancel the first backup. ! > ! - Backup #1 completes copying files. ! > ! - Backup #1 calls "pgpost.sh". ! > ! It will receive an error. ! > ! So it has to invalidate the backup. ! > ! - Backup #2 completes copying files. ! > ! - Backup #2 calls "pgpost.sh". ! > ! It gets a "backup_label" file and completes the backup. ! > ! > That's not true. ! ! Ah, yes, you are right. Thank You. ! Since "pgpre.sh" and "pgpost.sh" are independent, there ! is no way to tell which of them belongs to which other. Correct. ! So calling "pgpost.sh" indeed ends the most recently started ! backup and returns "backup_label" accordingly. ! ! That means: the caller of the scripts has to make sure ! not to start a second backup while the first one is running. Never run two backups in parallel with such an approach, exactly. And that is one of a couple of likely pitfalls I perceived when looking at that new API. We could fix that, but that will then get more complicated - and people will usually not do that. And that's why I consider that new API as rather dangerous. cheerio, PMc
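One possible guard, sketched here against the bookkeeping table the scripts were shown to maintain earlier in the thread (nothing official, just an illustration of "the caller has to make sure"): refuse to start when a backup is already marked as running.

-- run by the caller before invoking pgpre.sh; abort if any row is returned
SELECT pid FROM backup WHERE state = 'running';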
Re: Something else about Redo Logs disappearing
On Sun, Jun 14, 2020 at 03:05:15PM +0200, Magnus Hagander wrote: ! > You can see that all the major attributes (scheduling, error-handling, ! > signalling, ...) of a WAL backup are substantially different to that ! > of any usual backup. ! ! > This is a different *Class* of backup object, therefore it needs an ! > appropriate infrastructure that can handle these attributes correctly. ! > ! ! Yes, this is *exactly* why special-handling the WAL during the base backup ! makes a lot of sense. Certainly. Only I prefer to do the special-handling *outside of* the base backup. ! Is it required? No. ! Will it make your backups more reliable? Yes. *shrug* I have no benefit in increasing reliability from 250% to 330%, if that would be the case at all. ! But, if You never have considered *continuous* archiving, and only ! > intend to take a functional momentarily backup of a cluster, then You ! > may well have never noticed these differences. I noticed them mainly ! > because I did *BUILD* such an infrastructure (the 20 lines of shell ! > script, you know). ! > ! ! Yes, if you take a simplistic view of your backups, then yes. You appear to sound like an insurance salesman who desperately tries to sell a third health insurance policy to somebody who already has two of them, by trying to build on unfounded precariousness. ! ! There is *absolutely* no need for threading to use the current APIs. You ! > ! need to run one query, go do something else, and then run another ! > ! query. ! > ! > Wrong. The point is, I dont want to "go do something else", I have to ! > exit() and get back to the initiator at that place. ! > ! ! That is not a requirement of the current PostgreSQL APIs. We'll be done with that whole API in a few more lines now. (I'm getting tired of this.) ! (in fact, using ! threading would add a significant extra burden there, as libpq does not ! allow sharing of connections between threads) I never said one would need to thread the DB connections. ! That is a requirement, and indeed a pretty sharp limitation, of the *other* ! APIs you are working with, it sounds like. What "other"? ! The PostgreSQL APIs discussed to *not* require you to do an exit(). Nor do ! they require any form of threading. Ah, nice try! But, we're *NOT* shifting blame around. We do instead get things working. We do proper engineering. ! And the fact that you need to do an exit() would negate any threading ! anyway, so that seems to be a false argument regardless. You do know exactly what I'm talking about. ! This is also clearly visible in Laurenz' code: he utilizes two ! > unchecked background tasks (processes, in this case) with loose ! > coupling for the purpose, as it does not work otherwise. ! > ! ! Yes, because he is also trying to work around a severely limited API *on ! the other side*. There is no "other" side. There is only *one* side: to get things working. And for interaction, Jon Postel's law applies: Be conservative in what you provide, and liberal in what you require. This is how the Internet was built. The modern-day linux-youngsters tend to forget that we all stand on the shoulders of giants. ! The most interesting point in there appears to be this: ! > > that the backup label and tablespace map files are not written to ! > > disk. Instead, their would-be contents are returned in *labelfile ! > > and *tblspcmapfile, ! > ! > This is in do_pg_start_backup() - so we actually HAVE this data ! > already at the *START* time of the backup! ! ! ! 
> Then why in hell do we wait until the END of the backup before we ! > hand this data to the operator: at a time when the DVD with the ! > ! ! Because it cannot be safely written *into the data directory*. ! ! Now, it could be written *somewhere else*, that is true. And then you would ! add an extra step at restore time to rename it back. But then your restore ! would now also require a plugin. Yes, and as it is now, it requires girl Friday to fetch them from the line-printer and mix them up - which, as was already explained, can end up a *lot* worse. Or, equivalently and as here practically demonstrated, some consultant trainee writing some script which, when accidentally invoked twice, creates an inconsistent backup, and this being invisible to the operator. That's indeed dangerous enough for my taste. But let's grab that from the start: Yes, I didn't trust the docs. Because, as people here are so crazy about the old API being troublesome and dangerous and must be deprecated, and the whole thing being so imminent, then there should be some REASON for that. And from the docs I could not see any reason - so I supposed there must be something else in pg_start_backup(); something that is not explained in the docs, and that would explain the whole fuss. But, in fact, there is no such thing. First, the backup_label, which should not stay in the running cluster tree. So, what bad does happen when it stays there? Nothing at all. The cluster migh
Re: Something else about Redo Logs disappearing
On Mon, Jun 15, 2020 at 09:46:34PM +0200, Laurenz Albe wrote: ! On Mon, 2020-06-15 at 19:00 +0200, Peter wrote: ! > And that is one of a couple of likely pitfalls I perceived when ! > looking at that new API. ! ! That is a property of my scripts, *not* of the non-exclusive ! backup API... Then how would I have figured it before even knowing Your scripts? The API requires this kind of programming, and this kind of programming creates dangerous pitfalls. After You have trained some hundreds of system administrators, You will also see such things right at first glance. But then it's futile to discuss with religious people, as they are blind to reasoning: AL II. 58, "The slaves shall serve." cheerio, PMc
SELECT creates millions of temp files in a single directory
In modern versions of postgres a simple SELECT writes a couple of million individual temp files into a single directory under pgsql_tmp. I know of no filesystem that would take that lightly, and even ZFS gets some problems with such extremely large directories. What is the rationale behind this behaviour and how is it supposed to be handled? The specific case is a database that has been in use for almost 20 years. It used to run on a Pentium 2 with 768 MB memory; there it was slow, but it did work reliably. Now it runs on a 2660v3 with 32 GB memory, and tends to exhaust that memory. Database size has not increased, postgres memory configuration has not been changed, only postgres versions were gradually upgraded from 8 to 12. This is the memory configuration: shared_buffers = 40MB temp_buffers = 20MB work_mem = 50MB max_stack_depth = 40MB max_files_per_process = 200 But the actual memory consumption is 30 GB (per query!), i.e. all of the installed memory: UID PID PPID C PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND 770 53143 10252 16 20 0 9359944 7796128 zio->io_ DsJ- 3:11.29 postgres: bareos bareos fd00::118(53471) SELECT (postgres) 770 54334 10252 17 20 0 9279780 24388 zio->io_ DsJ- 2:58.19 postgres: parallel worker for PID 53143(postgres) 770 54335 10252 17 20 0 9279780 22168 zfs DLsJ - 2:51.30 postgres: parallel worker for PID 53143(postgres) This is the situation on the filesystem: $ data12/base # du -k 16387/ 9093312 16387/ $ data12/base/pgsql_tmp # du -k * 19979644 pgsql_tmp53143.0.sharedfileset $ data12/base/pgsql_tmp/pgsql_tmp53143.0.sharedfileset # ls -la | wc 1264755 11382788 96271672 More than a million files in a single directory; this is unacceptable. This is the query: SELECT DISTINCT Path.PathId, File.PathId, Path.Path FROM Path LEFT JOIN File USING (PathId) LEFT JOIN PathHierarchy ON (Path.Pathid = PathHierarchy.PPathId) WHERE File.PathId IS NULL AND PathHierarchy.PPathId IS NULL LIMIT 30 These are the stats: Path: 464229 live tuples, 42 MB, 49 MB index File: 43779725 live tuples, 1 dead tuple, 7160 MB, 2971 MB index PathHierarchy: 380879 live tuples, 13 MB, 17 MB index The pathhierarchy table is irrelevant to the behaviour and can be left out. Vacuum and Analyze have been run right before the query.
This is the structure: CREATE TABLE IF NOT EXISTS public.path ( pathid integer NOT NULL DEFAULT nextval('path_pathid_seq'::regclass), path text COLLATE pg_catalog."default" NOT NULL, CONSTRAINT path_pkey PRIMARY KEY (pathid) ) CREATE TABLE IF NOT EXISTS public.file ( fileid bigint NOT NULL DEFAULT nextval('file_fileid_seq'::regclass), fileindex integer NOT NULL DEFAULT 0, jobid integer NOT NULL, pathid integer NOT NULL, deltaseq smallint NOT NULL DEFAULT 0, markid integer NOT NULL DEFAULT 0, fhinfo numeric(20,0) NOT NULL DEFAULT 0, fhnode numeric(20,0) NOT NULL DEFAULT 0, lstat text COLLATE pg_catalog."default" NOT NULL, md5 text COLLATE pg_catalog."default" NOT NULL, name text COLLATE pg_catalog."default" NOT NULL, CONSTRAINT file_pkey PRIMARY KEY (fileid) ) CREATE INDEX IF NOT EXISTS file_jpfid_idx ON public.file USING btree (jobid ASC NULLS LAST, pathid ASC NULLS LAST, name COLLATE pg_catalog."default" ASC NULLS LAST) TABLESPACE pg_default; And this does not give the explanation: bareos=# explain SELECT DISTINCT Path.PathId, File.PathId, Path.Path FROM Path LEFT JOIN File USING (PathId) LEFT JOIN PathHierarchy ON (Path.Pathid = PathHierarchy.PPathId) WHERE File.PathId IS NULL AND PathHierarchy.PPathId IS NULL LIMIT 30; QUERY PLAN --- Limit (cost=1367204.20..1370204.20 rows=30 width=67) -> HashAggregate (cost=1367204.20..1371346.45 rows=414225 width=67) Group Key: path.pathid, file.pathid, path.path -> Gather (cost=1225693.97..1364097.51 rows=414225 width=67) Workers Planned: 2 -> Parallel Hash Anti Join (cost=1224693.97..1321675.01 rows=172594 width=67) Hash Cond: (path.pathid = file.pathid) -> Parallel Hash Anti Join (cost=6727.04..19953.72 rows=181359 width=63) Hash Cond: (path.pathid = pathhierarchy.ppathid) -> Parallel Seq Scan on path (cost=0.00..7361.29 rows=193429 width=63) -> Parallel Hash (cost=3926.46..3926.46 rows=224046 width=4) -> Parallel Seq Scan on pathhierarchy (cost=0.00..3926.46 rows=224046 width=4) -> Parallel Hash (cost=918690.59..918690.59 rows=18241547 width=4) ->
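For sizing the problem without digging through pgsql_tmp by hand, the spill can also be watched from SQL; a small sketch using the standard statistics views and the log_temp_files setting (the latter is superuser-only):

-- cumulative number and volume of temp files per database
SELECT datname, temp_files, temp_bytes
FROM   pg_stat_database
WHERE  datname = current_database();

-- log every temp file of 1 MB or more as it is created (value is in kilobytes)
SET log_temp_files = 1024;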
Re: SELECT creates millions of temp files in a single directory
On Sat, Apr 23, 2022 at 02:11:00PM -0700, Adrian Klaver wrote: ! On 4/23/22 12:50, Peter wrote: ! ! ! > People seem to have been brainwashed by Web-Services and OLTP, ! > and now think the working set must always fit in memory. But this ! > is only one possible usecase, it is not the exclusive only one. ! ! This is no-win situation as most of the complaints in recent years have been ! that Postgres was/is to conservative in its default settings and is not ! taking advantage of newer more powerful hardware. I know, and You got to the point; this is exactly what I am talking about: people take the abundance of resources for granted. In Rel. 8 postgres was a breathtaking beauty of engineering: the style of old, driven to perfection. Now You're gradually sacrificing this, for the speed junkies and to protect those who are not engineers from their mistakes. And no, I don't know how this could be solved: the more influential you get, the more driven you are by public demand; the less freedom you have to follow ideals. David Johnston thinks we must just not speak that out, we must instead behave like "the emperor's new clothes", and follow google's understanding of "positive values". Sorry, that doesn't work for me.
Re: SELECT creates millions of temp files in a single directory
On Sat, Apr 23, 2022 at 05:13:41PM -0400, Tom Lane wrote: ! "David G. Johnston" writes: ! > I'll add that given the nature of the problem that changing temp_file_limit ! > away from its default of unlimited may be useful. ! > https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-DISK ! ! Maybe ... that limits the total space involved, not the number of ! files it's separated into, so I'm not sure how much it can help. That's what I understood from the docs, too. What I also read in the docs is that it will just kill the query when it hits the limit, and this is not really what I want. And btw, I probably lost the relevant info about the running version in translation: Name : postgresql12-server Version: 12.10 Installed on : Mon Apr 4 04:13:18 2022 CEST Origin : databases/postgresql12-server Architecture : FreeBSD:13:amd64 ! It might be worth playing around to see how varying values of work_mem ! affect this behavior, too. That should change the planner's initial ! estimate of the number of hash batches needed, which likely plays into ! this somehow. Agreed. But then, looking at the generated filenames, in the style of "i44297of524288.p1.0" - this is an algorithm at work, so somebody must have done this, and obviously wasn't bothered by creating half a million files, after having created another half million already. So I thought I might just ask what the idea with this is. | > It would help if you can provide a self-contained demonstration | > that others can then verify and debug (or explain). | | ... and that. As this message stands, it's undocumented whining. | Please see | | https://wiki.postgresql.org/wiki/Slow_Query_Questions | | for the sort of information we need to debug performance issues. It is not a performance issue, it is a design question: You inflict pain on my beloved ZFS, and as a lover I react. ;) | (I recall that we have fixed some problems in the past that could | lead to unreasonable numbers of temp files in hash joins. Whether | there's more to do, or Peter is running a version that lacks those | fixes, is impossible to tell with the given info.) Yes, I accidentally deleted that info too when I deleted the more extensive rants from my original posting. See here, above.
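For the record, what Tom's suggestion amounts to as a per-session experiment - a sketch only; as noted above, the limit aborts the query rather than throttling it, and on this version temp_file_limit needs superuser (work_mem does not):

SET work_mem = '256MB';        -- a larger value should reduce the number of planned hash batches
SET temp_file_limit = '20GB';  -- hard cap on temp space for this session; the query errors out beyond it
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT Path.PathId, File.PathId, Path.Path
FROM Path LEFT JOIN File USING (PathId)
WHERE File.PathId IS NULL
LIMIT 30;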
Why is my table continuously written?
Hello, this is postgres version 12.11_1 on Freebsd 13.1 I have a table "mess", filename "6309215", that behaves strange. Data is currently only inserted/appended, there is no update and no (successful) delete done, autovacuum is NOT enabled. This is the definition: CREATE TABLE IF NOT EXISTS dnstap.mess ( id bigint NOT NULL DEFAULT nextval('dnstap.mess_id_seq'::regclass), mess text COLLATE pg_catalog."default", hash text COLLATE pg_catalog."default", CONSTRAINT mess_pkey PRIMARY KEY (id), CONSTRAINT mess_hash_key UNIQUE (hash) ) TABLESPACE l1only; And this is the table: -rw--- 1 770 770 1073741824 Aug 2 04:38 6309212 -rw--- 1 770 770 1073741824 Aug 2 04:38 6309212.1 -rw--- 1 770 770 1073741824 Aug 2 04:38 6309212.2 -rw--- 1 770 770 1073741824 Aug 2 04:39 6309212.3 -rw--- 1 770 770 1073741824 Aug 2 04:39 6309212.4 -rw--- 1 770 770 405872640 Aug 3 04:28 6309212.5 -rw--- 1 770 770 1441792 Aug 3 03:25 6309212_fsm -rw--- 1 770 770 180224 Aug 2 05:13 6309212_vm -rw--- 1 770 770 0 Aug 2 00:21 6309213 -rw--- 1 770 7708192 Aug 2 04:40 6309214 -rw--- 1 770 770 1073741824 Aug 3 04:29 6309215 -rw--- 1 770 770 1073741824 Aug 3 04:30 6309215.1 -rw--- 1 770 770 1073741824 Aug 3 04:13 6309215.10 -rw--- 1 770 770 995811328 Aug 3 04:16 6309215.11 -rw--- 1 770 770 1073741824 Aug 3 04:32 6309215.2 -rw--- 1 770 770 1073741824 Aug 3 04:34 6309215.3 -rw--- 1 770 770 1073741824 Aug 3 04:35 6309215.4 -rw--- 1 770 770 1073741824 Aug 3 04:37 6309215.5 -rw--- 1 770 770 1073741824 Aug 3 04:39 6309215.6 -rw--- 1 770 770 1073741824 Aug 3 04:41 6309215.7 -rw--- 1 770 770 1073741824 Aug 3 04:42 6309215.8 -rw--- 1 770 770 1073741824 Aug 3 04:12 6309215.9 -rw--- 1 770 770 3170304 Aug 3 04:38 6309215_fsm -rw--- 1 770 770 393216 Aug 3 04:38 6309215_vm -rw--- 1 770 770 1073741824 Aug 3 03:17 6309216 -rw--- 1 770 770 1073741824 Aug 2 04:33 6309216.1 -rw--- 1 770 770 241360896 Aug 3 03:18 6309216.2 -rw--- 1 770 770 606208 Aug 3 03:18 6309216_fsm -rw--- 1 770 770 73728 Aug 3 03:17 6309216_vm -rw--- 1 770 77043139072 Aug 3 03:18 6309217 There is another table, "6309212", that is also only appended data, and that behaves as expected: only the final segment is written, the other segments are a day old. But in "6309215" all the segments were recently written. ("6309216" is the TOAST for "6309215", and "6309217" seems to be an index on that.) This is what happens: 2022-08-03 03:09:03 CEST -- --- Check for hash value clash with old entries SELECT i1.id FROM dnstap.incoming AS i1, dnstap.mess AS mess WHERE mess.hash = md5(i1.mess) AND mess.mess <> i1.mess I don't think a SELECT like this has reasons to write the referenced table - but it does: -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215 -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215.1 -rw--- 1 770 770 1073741824 Aug 3 03:09:47 2022 6309215.10 -rw--- 1 770 770 965394432 Aug 3 03:10:17 2022 6309215.11 -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215.2 -rw--- 1 770 770 1073741824 Aug 3 03:09:20 2022 6309215.3 -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215.4 -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215.5 -rw--- 1 770 770 1073741824 Aug 3 03:10:18 2022 6309215.6 -rw--- 1 770 770 1073741824 Aug 3 03:10:21 2022 6309215.7 -rw--- 1 770 770 1073741824 Aug 3 03:09:25 2022 6309215.8 -rw--- 1 770 770 1073741824 Aug 3 03:09:21 2022 6309215.9 And this goes on in that way with all the queries. 
And when finally the vacuum analyze is run, it is the same, only that one writes the segments in sequence (only inserts were appended to the end, there are no updates or deletes): 2022-08-03 04:28:16 CEST VACUUM (ANALYZE) dnstap.mess; -rw--- 1 770 770 1073741824 Aug 3 04:29 6309215 -rw--- 1 770 770 1073741824 Aug 3 04:30 6309215.1 -rw--- 1 770 770 1073741824 Aug 3 04:13 6309215.10 -rw--- 1 770 770 995811328 Aug 3 04:16 6309215.11 -rw--- 1 770 770 1073741824 Aug 3 04:32 6309215.2 -rw--- 1 770 770 1073741824 Aug 3 04:33 6309215.3 -rw--- 1 770 770 1073741824 Aug 3 04:07 6309215.4 -rw--- 1 770 770 1073741824 Aug 3 04:08 6309215.5 -rw--- 1 770 770 1073741824 Aug 3 04:09 6309215.6 -rw--- 1 770 770 1073741824 Aug 3 04:10 6309215.7 -rw--- 1 770 770 1073741824 Aug 3 04:11 6309215.8 -rw--- 1 770 770 1073741824 Aug 3 04:12 6309215.9 -rw--- 1 770 770 3170304 Aug 3 03:18 6309215_fsm -rw--- 1 770 770 393216 Aug 3 04:28 6309215_vm What is worse is this: -rw--- 1 770 770 16777216 Aug 3 04:28 000100780067.ok -rw--- 1 770
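As an aside, the filenode-to-relation mapping relied on in the listings above can be resolved from SQL instead of guessing; a small sketch (the tablespace lookup is only needed because the table lives in "l1only"):

SELECT pg_relation_filepath('dnstap.mess');   -- relation -> on-disk path
SELECT pg_filenode_relation(
         (SELECT oid FROM pg_tablespace WHERE spcname = 'l1only'),
         6309215);                            -- filenode -> relation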
Why is my table continuously written? -> XID issue?
I had a look into what actually changed in the table. At the first write that I grabbed, four rows in that segment had such a change: 117ee000 77 00 00 00 00 df b8 82 8e a4 00 00 64 00 a0 00 |w...d...| 117ee000 77 00 00 00 f0 22 b4 f3 68 d3 00 00 64 00 a0 00 |w"..h...d...| 117ee0a0 f8 5c 00 00 f8 5c 00 00 09 00 00 00 06 00 f7 8b |.\...\..| 117ee0b0 13 00 04 00 93 00 18 0d d8 27 fe 01 00 00 00 00 |.'..| 117ee0b0 13 00 04 00 93 01 18 0d d8 27 fe 01 00 00 00 00 |.'..| 1aad8000 77 00 00 00 30 fc 8c a5 80 8a 00 00 58 00 a8 00 |w...0...X...| 1aad8000 77 00 00 00 e8 12 8a f3 6e 5b 00 00 58 00 a8 00 |w...n[..X...| 1aad80a0 f6 5d 00 00 00 00 00 00 |.]..| 1aad80b0 09 00 00 00 06 00 6c d5 10 00 04 00 03 08 18 0d |..l.| 1aad80b0 09 00 00 00 06 00 6c d5 10 00 04 00 03 09 18 0d |..l.| If I get that right, then it is the HEAP_XMIN_COMMITTED from t_infomask that got set for the rows. The next write is a bit different: a000 76 00 00 00 60 9d 84 d0 23 4c 04 00 70 00 50 01 |v...`...#L..p.P.| a000 78 00 00 00 80 81 e9 05 78 0d 04 00 70 00 50 01 |x...x...p.P.| a690 cf 51 00 00 00 00 00 00 |504c54...Q..| a690 cf 51 00 00 13 5e 00 00 |504c54...Q...^..| a6a0 00 00 00 00 06 00 05 00 12 00 04 00 03 2b 18 0d |.+..| a6a0 00 00 00 00 06 00 05 00 12 00 04 00 93 23 18 0d |.#..| 003ec000 76 00 00 00 30 98 85 d0 59 e9 04 00 60 00 48 01 |v...0...Y...`.H.| 003ec000 78 00 00 00 08 33 1b 03 bd 97 04 00 60 00 48 01 |x3..`.H.| 003eddf0 cf 51 00 00 00 00 00 00 00 00 00 00 06 00 f6 01 |.Q..| 003eddf0 cf 51 00 00 13 5e 00 00 00 00 00 00 06 00 f6 01 |.Q...^..| 003ede00 02 00 04 00 03 2b 18 0d f3 56 2a 00 00 00 00 00 |.+...V*.| 003ede00 02 00 04 00 93 23 18 0d f3 56 2a 00 00 00 00 00 |.#...V*.| It seems here is the t_xmax for the row updated, and also flags in t_infomask changed - and there are about 1000 rows (of some 3 mio. that are contained in this segment) where this is done. So if I get this right, it looks like some out-of-band housekeeping is done here. Now, while I would understand that this is done as occasion offers when a record is read anyway, and I do not think that I grok the XID housekeeping in fullness, I also do not think that I have any open transactions that would persist a server restart. So I was under the impression that a server restart plus a vacuum FULL, without any further update/delete operations, should resolve all the needs for such housekeeping for the time being, and quiesce the situation. Apparently it does not. So, then, what does?
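The same information that was read out of the hex dump can be taken from pageinspect, assuming that extension can be installed; a sketch for one block, with 0x0100 being HEAP_XMIN_COMMITTED and 0x0200 HEAP_XMAX_COMMITTED:

CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT lp, t_xmin, t_xmax,
       (t_infomask & 256) <> 0 AS xmin_committed,  -- HEAP_XMIN_COMMITTED (0x0100)
       (t_infomask & 512) <> 0 AS xmax_committed   -- HEAP_XMAX_COMMITTED (0x0200)
FROM   heap_page_items(get_raw_page('dnstap.mess', 0));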
Re: pkg: two postgresql clients
On Wed, Aug 10, 2022 at 09:25:37AM +0200, m...@ft-c.de wrote: Hi Franz, You will get much better targeted help with such questions at https://forums.freebsd.org (if it is FreeBSD you're running) or in German on https://www.bsdforen.de/ (for all tastes of Berkeley). Something is apparently wrong here. postgres-15 is AFAIK the development branch, and it should not even be possible to install two versions at the same time on the same node. Which repository server are You using? (I compile locally, so I don't know what might be offered from the public servers.) And what is Your OS version? cheerio, PMc ! Hello, ! ! 1. ! pkg upgrade show this result: ! ! New packages to be INSTALLED: ! postgresql13-client: 13.7 ! postgresql15-client: 15.b1_1 ! ! Installed packages to be UPGRADED: ! firefox: 103.0.1,2 -> 103.0.2,2 ! nspr: 4.34 -> 4.34.1 ! opencore-amr: 0.1.5 -> 0.1.6 ! py39-psutil: 5.9.1_1 -> 5.9.1_2 ! ! There are two different PG clients. Is this correct? ! ! 2. ! The command ! pkg install firefox ! New packages to be INSTALLED: ! postgresql15-client: 15.b1_1 ! Installed packages to be UPGRADED: ! firefox: 103.0.1,2 -> 103.0.2,2 ! ! PG is not a dependent package of firefox? ! ! I use version: ! postgres -V ! postgres (PostgreSQL) 14.1 ! ! Franz
Re: How to make PostgreSQL utilities honor home directories?
Well, like others mentioned before, it is not entirely clear what You are trying to achieve. But, in any case, if this is Your problem On Thu, Sep 01, 2022 at 06:01:02PM -0400, Jeffrey Walton wrote: ! Hi Everyone, ! ! We are having a heck of a time getting PostgreSQL utilities to honor ! home directories. For example, when I execute this script: ! ! sudo -H -u postgres PGPASSWORD=${password} \ ! psql -h "${hostname}" -U "${username}" -d "${database}" \ ! --command="..." ! ! It produces failures: ! ! could not change directory to "/home/jwalton/godojo": Permission denied ... this appears to me to be rather a sudo issue. Because certainly psql cannot execute /as user postgres/ in a directory where user postgres is not allowed to enter. So sudo should fix that, and in my sudo installation I find either a "-D directory" option for sudo (that should change the directory accordingly) or a "--login" option (that would run a full login shell for the user postgres, which, alongside going to the postgres homedir, does a lot of other things which may or may not be desirable in your installation).
Tools for moving normalized data around
Hi, imagine I have a database containing normalized data: a whole bunch of tables all related via foreign keys (i.e. the thing one should usually have ;) ). So there is a dependency graph: all records relate to others in some tree-like fashion (has-many, belongs-to, etc.) Now I want to grab some part of the data, on a certain condition (let's say all records belonging to user 'Bob', if there is a "user" table somewhere at the tree-bottom), and move it to another database with the very same layout - which is already populated with data and runs a different instance of the same application. Grabbing the intended records is just some dead-simple Selects. But then inserting them into the other database is not fun, because some primary keys will likely collide. And if we start to update the primary keys to new unique values, we must consequently update ALL the foreign keys throughout the entire tree of tables. How to do this with two tables connected via one foreign key is explained a dozen times on Stack Overflow. But what if the tree is 50 tables and 120 foreign key columns? It can be done. But probably not manually. So, as this seems a very usual use-case for normalized data, is there any tooling available? Anywhere? (I searched, I didn't find.) Also, it seems the provided commands are not very supportive. Try to do an Insert and *always* create a new PK from the sequence, and return the old and the new PK for the inserted row. It seems this does not work without either naming all the other columns explicitly in the insert (impossible, they are always different) or establishing some trigger functions... Different example, same general problem: Imagine I have such a tree of normalized tables, and I want to grab a part of it and roll only that part back in time, to some state it was in X weeks ago. (That's possible if we store records of changes in an extra column within the rows themselves. It might even be possible with some postgres-internal data, which has a knowledge of history - but there seems to be no API access to that.) But I want to have this working at the push of a button, independent of the table layouts, only with a basic graph of the dependency tree and with an unlimited number of involved tables, i.e.: maintainable. I have searched for solutions (or building blocks to solutions) for these and similar problems, and didn't find much. I don't understand that - SQL is old, normalization is old, (even postgres has already become quite old) and these are genuine issues. It can't be I'm the only one thinking about such things. So maybe I'm looking in the wrong place? Any ideas welcome. Cheers, PMc
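For the two-table core of the problem, the per-foreign-key pattern one would expect such a tool to generate looks roughly like this; every table, column and sequence name here is made up for illustration, and the child block repeats once per FK edge in the tree:

-- build an old->new key map for the rows to be moved
CREATE TEMP TABLE id_map AS
    SELECT id AS old_id,
           nextval('target_users_id_seq') AS new_id
    FROM   source_users
    WHERE  owner = 'Bob';

-- parent rows get the new keys
INSERT INTO target_users (id, name)
SELECT m.new_id, s.name
FROM   source_users s
JOIN   id_map m ON m.old_id = s.id;

-- child rows follow the map
INSERT INTO target_orders (id, user_id, amount)
SELECT nextval('target_orders_id_seq'), m.new_id, s.amount
FROM   source_orders s
JOIN   id_map m ON m.old_id = s.user_id;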
Queries running forever, because of wrong rowcount estimate
TL;DR Version: == For a table where all rows have been deleted, under certain conditions the planner does assume that it contains the values that were previously present, and will therefore assume that nothing needs to be added, while in fact everything needs to be added. - This is PostgreSQL 12.13 on amd64-portbld-freebsd13.1, compiled by FreeBSD clang version 13.0.0 (g...@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303), 64-bit and if somebody can tell me this has been fixed in a more recent version, then I am happy. --- More elaborate Version: === When you have a table populated, pg_class may look like this: relpages | reltuples --+--- 1 | 3 Now when you delete all rows, this will not change, and the planner will now make the -wrong- assumption that the tuples are still present, and planning may be bad. That is why ANALYZE is now needed, and after ANALYZE things look like this, and do work again: relpages | reltuples --+--- 1 | 0 But, if you happen to run a VACUUM ANALYZE on that table, then things will look like this: relpages | reltuples --+--- 0 | 0 And now, for reasons unknown to me, the planner will again make the assumption that all the previous values are still present in the table, and planning will be bad. And at that point you can run ANALYZE as often as you want, it does not help anymore. --- Full Version: = Let's start with the query: INSERT INTO opcode (opcode) SELECT DISTINCT incoming.opcode AS opcode FROM incoming AS incoming LEFT JOIN opcode AS opcode USING (opcode) WHERE opcode.opcode IS NULL Table "incoming" will bring millions of rows, but column "opcode" knows only three distinct values. Other values might appear over time, so this INSERT will add such new values to table "opcode". So far, no problem. But what happens if I delete all rows from "opcode" and start the process afresh? Then the INSERT *should* reinsert all the (three) values freshly into table "opcode", i.e. there will be *millions* of rows "WHERE opcode.opcode IS NULL", before the "DISTINCT". This is also not a problem. Not yet. Now we decide that not all the rows in table "incoming" are already processable, so we limit the whole operation to those that are: INSERT INTO opcode (opcode) SELECT DISTINCT incoming.opcode AS opcode FROM incoming AS incoming LEFT JOIN opcode AS opcode USING (opcode) WHERE opcode.opcode IS NULL AND EXISTS ( SELECT ready.id FROM tmp_ready AS ready WHERE ready.id = incoming.id ) And now this query will run forever and never return: "Unique (cost=434008.43..434008.44 rows=1 width=6)" " -> Sort (cost=434008.43..434008.44 rows=1 width=6)" "Sort Key: incoming.opcode" "-> Nested Loop Semi Join (cost=1058.15..434008.42 rows=1 width=6)" " Join Filter: (incoming.id = pairs.ori)" " -> Gather (cost=1058.15..151427.56 rows=1 width=14)" "Workers Planned: 3" "-> Hash Anti Join (cost=58.15..150427.46 rows=1 width=14)" " Hash Cond: (incoming.opcode = opcode.opcode)" " -> Parallel Seq Scan on incoming (cost=0.00..147589.32 rows=741332 width=14)" " -> Hash (cost=31.40..31.40 rows=2140 width=6)" "-> Seq Scan on opcode (cost=0.00..31.40 rows=2140 width=6)" " -> Seq Scan on tmp_ready ready (cost=0.00..253869.27 rows=2296927 width=8)" The LEFT JOIN is estimated with only one result row (we know there can actually be millions), and therefore this goes into a Nested Loop that will read tmp_ready forever. Putting an Index onto tmp_ready helps the matter - but that is not the root cause here.
The root cause is a wrong estimate: When deleting all rows from table "opcode", this apparently does not change the planner behaviour. The planner still thinks that all values from the incoming.opcode column are already in opcode.opcode. After trying with different experiments, I finally managed to somehow kick ANALYZE to produce working statistics, and now it looks like this: "HashAggregate (cost=554927.43..554927.44 rows=1 width=6)" " Group Key: incoming.opcode" " -> Hash Semi Join (cost=291554.87..549213.83 rows=2285442 width=6)" "Hash Cond: (incoming.id = ready.id)" "-> Hash Anti Join (cost=1.01..192070.20 rows=2286638 width=14)" " Hash Cond: (incoming.opcode = opcode.opcode)" " -> Seq S
[Testcase] Queries running forever, because of wrong rowcount estimate
CREATE DATABASE ttc WITH OWNER = admin ENCODING = 'UTF8' LC_COLLATE = 'de_DE.UTF-8' LC_CTYPE = 'de_DE.UTF-8' TABLESPACE = pg_default CONNECTION LIMIT = -1 IS_TEMPLATE = False; select version(); PostgreSQL 12.13 on amd64-portbld-freebsd13.1, compiled by FreeBSD clang version 13.0.0 (g...@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303), 64-bit CREATE TABLE public.incoming ( id bigint NOT NULL, data text COLLATE pg_catalog."default", CONSTRAINT incoming_pkey PRIMARY KEY (id) ) TABLESPACE pg_default; insert into incoming(id) select generate_series(1,100); update incoming set data = 'EINS' where data IS NULL; insert into incoming(id) select generate_series(101,200); update incoming set data = 'ZWEI' where data IS NULL; insert into incoming(id) select generate_series(201,300); update incoming set data = 'DREI' where data IS NULL; CREATE TABLE IF NOT EXISTS public.data ( data text COLLATE pg_catalog."default" NOT NULL, CONSTRAINT data_pkey PRIMARY KEY (data) ) TABLESPACE pg_default; insert into data (data) values ('EINS'), ('DREI'); analyze public.data; analyze public.incoming; explain select distinct data from incoming Left outer join public.data using (data) where data.data is null; HashAggregate (cost=67371.04..67371.07 rows=3 width=5) Group Key: incoming.data -> Hash Anti Join (cost=0.55..64871.04 rows=100 width=5) Hash Cond: (incoming.data = data.data) -> Seq Scan on incoming (cost=0.00..44745.50 rows=300 width=5) -> Hash (cost=0.52..0.52 rows=2 width=5) -> Seq Scan on data (cost=0.00..0.52 rows=2 width=5) delete from data; vacuum analyze data; explain select distinct data from incoming Left outer join public.data using (data) where data.data is null; Unique (cost=56056.08..56056.09 rows=1 width=5) -> Sort (cost=56056.08..56056.09 rows=1 width=5) Sort Key: incoming.data -> Hash Anti Join (cost=60.58..56056.07 rows=1 width=5) Hash Cond: (incoming.data = data.data) -> Seq Scan on incoming (cost=0.00..44745.50 rows=300 width=5) -> Hash (cost=29.70..29.70 rows=2470 width=5) -> Seq Scan on data (cost=0.00..29.70 rows=2470 width=5) "rows=1" in the "Hash Anti Join" line is WRONG. It should be 300. Or at least some thousands. On the next-higher level there will now a Nested Loop chosen. And that Nested Loop will do whatever costly things it needs to do - only not 1 time but three million times. I think I have a workaround patch also. 
--- Server Configuration Tuning: < #port = 5432 # (change requires restart) < #max_connections = 100# (change requires restart) --- > port = 5434 # (change requires restart) > max_connections = 60 # (change requires restart) < #shared_buffers = 32MB# min 128kB --- > shared_buffers = 40MB # min 128kB < #temp_buffers = 8MB # min 800kB --- > temp_buffers = 20MB # min 800kB < #work_mem = 4MB # min 64kB < #maintenance_work_mem = 64MB # min 1MB --- > work_mem = 50MB # min 64kB > maintenance_work_mem = 50MB # min 1MB < #max_stack_depth = 2MB# min 100kB --- > max_stack_depth = 40MB# min 100kB < #max_files_per_process = 1000 # min 25 --- > max_files_per_process = 200 # min 25 < #effective_io_concurrency = 1 # 1-1000; 0 disables prefetching --- > effective_io_concurrency = 5 # 1-1000; 0 disables prefetching < #max_parallel_workers_per_gather = 2 # taken from max_parallel_workers --- > max_parallel_workers_per_gather = 0 # taken from max_parallel_workers < #synchronous_commit = on # synchronization level; --- > synchronous_commit = off # synchronization level; < #wal_sync_method = fsync # the default is the first option --- > wal_sync_method = fsync # the default is the first option < #full_page_writes = on# recover from partial page writes --- > full_page_writes = off# recover from partial page > writes < #wal_init_zero = on # zero-fill new WAL files --- > wal_init_zero = off # zero-fill new WAL files < #wal_writer_delay = 200ms # 1-1 milliseconds --- > wal_writer_delay = 2000ms # 1-1 milliseconds < #checkpoint_timeout = 5min# range 30s-1d --- > checkpoint_timeout = 10min# range 30s-1d < #checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0 --- > checkpoint_completion_target = 0.0# checkpoint target du
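To watch the three pg_class states described above while stepping through the testcase, the counters can be read directly between the steps; a small check, nothing more:

SELECT relname, relpages, reltuples
FROM   pg_class
WHERE  relname IN ('incoming', 'data');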
Re: [Testcase] Queries running forever, because of wrong rowcount estimate
On Mon, Feb 13, 2023 at 12:38:12PM -0500, Tom Lane wrote: ! Peter writes: ! > "rows=1" in the "Hash Anti Join" line is WRONG. It should be ! > 300. Or at least some thousands. ! ! FWIW, this behaves better in v14 and up. In older versions there's ! an ambiguity about what relpages=reltuples=0 means; it's the state ! when ANALYZE has never yet examined the table contents, but it's ! also the state when we have found the table to be empty. Because ! of the former case, older planner versions don't like to believe ! that a table is empty even if reltuples says so. Thanks, Tom, I found that as change 3d351d916b2 - that should make the issue more unlikely and not so prone to happen during testing. OTOH, that code did apparently not do anything evil that would not be possible to happen under other circumstances. It's unlikely, because one would at some point change such queries to a more performant style, but I am currently trying to understand why it can happen at all... cheers, PMc
[Outcome] Queries running forever, because of wrong rowcount estimate
On Mon, Feb 13, 2023 at 12:38:12PM -0500, Tom Lane wrote: ! Peter writes: ! > "rows=1" in the "Hash Anti Join" line is WRONG. It should be ! > 300. Or at least some thousands. ! ! FWIW, this behaves better in v14 and up. In older versions there's ! an ambiguity about what relpages=reltuples=0 means; it's the state ! when ANALYZE has never yet examined the table contents, but it's ! also the state when we have found the table to be empty. Because ! of the former case, older planner versions don't like to believe ! that a table is empty even if reltuples says so. ! ! regards, tom lane Okay, I got some way through with it. Given, the destination table is empty, and VACUUMed. Then there is relpages = reltuples = 0. We do some "source LEFT OUTER JOIN destination ON (matchfield) WHERE matchfield IS NULL" The source is big, but carries only a few distinct matchfield values. The so-called "hack" in heapam_handler.c:heapam_estimate_rel_size() then makes the planner believe that there are actually 10 curpages. This is not wrong, because the same might happen if one does an INSERT amounting to 10 pages without a subsequent ANALYZE. We then get a calculated rowcount of whatever, let's assume 2330 rows. Then we go into eqjoinsel(): n_distinct for the left source table is known, it is (let's assume) 4. n_distinct for the right destination table is not known because it is not analyzed, but it is UNIQUE, so it becomes 2330. We get into eqjoinsel_inner(): MCV (right) for the destination is not known because it is not analyzed. selec = selec / nd2 => 0.000429 We get into eqjoinsel_semi(): if (nd1 <= nd2 || nd2 < 0) selec = 1.0 - nullfrac1; Voila, selec = 1.0 And so the ANTI JOIN will estimate to 0 (aka 1) result rows, instead of whatever rowcount the source brings along (usually big). - cut --- Okay, so this should be fixed in R.14. But, as mentioned above, the same can also happen in normal operations. Example: I have a logfile from my webserver and a table to collect the (unique) IP-addresses. I start with an empty table and feed in the log. First round, the destination addresses. Let's assume there are 2330 different addresses. Second round, the source addresses. There are only 4 different ones. I don't do an ANALYZE between the two steps. And voila, I get the same effect as above. (Been there, done it: duration: 30216273.583 ms) Agreed, this is an inefficient way of doing that. It would be better to do a UNION on the source data, at that point filter it accordingly, then DISTINCT, and only then the ANTI JOIN. But I'm a lazy guy and I may not care if a query that could run in 0.5 seconds takes 60 seconds instead. However, it's not so nice when it takes 10 hours. Anyway, I don't get a real clue on what this piece of code should actually achieve (from selfuncs.c:eqjoinsel_semi()#2773): if (nd1 <= nd2 || nd2 < 0) selec = 1.0 - nullfrac1; else selec = (nd2 / nd1) * (1.0 - nullfrac1); This seems to assume a 100% match whenever the left (source) table brings fewer distinct(!) values than the right (destination) table already carries. For now, I have just disabled this behaviour, in the rough&dirty way: --- src/backend/optimizer/path/costsize.c.orig 2022-11-07 22:47:13.0 +0100 +++ src/backend/optimizer/path/costsize.c 2023-02-13 00:04:54.156489000 +0100 @@ -4685,8 +4685,11 @@ /* pselec not used */ break; case JOIN_ANTI: + if(fkselec * jselec > 0.9975) { + jselec = 0.9975; + } nrows = outer_rows * (1.0 - fkselec * jselec); nrows *= pselec; break; default: /* other values not expected here */ cheerio, PMc
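The lazy-but-safe variant of that webserver-log example on pre-14 servers is simply an ANALYZE between the two passes, so that the anti join against the now-populated table is planned with sane statistics; table and column names below are made up:

-- first pass: destination addresses ("addresses" starts out empty)
INSERT INTO addresses (addr)
SELECT DISTINCT l.dst_addr
FROM   weblog l
LEFT JOIN addresses a ON a.addr = l.dst_addr
WHERE  a.addr IS NULL;

ANALYZE addresses;   -- refresh reltuples before the next anti join

-- second pass: source addresses
INSERT INTO addresses (addr)
SELECT DISTINCT l.src_addr
FROM   weblog l
LEFT JOIN addresses a ON a.addr = l.src_addr
WHERE  a.addr IS NULL;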
Autovacuum endless loop in heap_page_prune()?
Good morning, I just found Autovacuum run for 6 hours on a 8 GB table, VACUUM query doesnt cancel, cluster doesn't stop, autovacuum worker is not killable, truss shows no activity, after kill -6 this backtrace: * thread #1, name = 'postgres', stop reason = signal SIGABRT * frame #0: 0x00548063 postgres`HeapTupleSatisfiesVacuumHorizon + 531 frame #1: 0x0054aed9 postgres`heap_page_prune + 537 frame #2: 0x0054e38a postgres`heap_vacuum_rel + 3626 frame #3: 0x006af382 postgres`vacuum_rel + 626 frame #4: 0x006aeeeb postgres`vacuum + 1611 frame #5: 0x007b4664 postgres`do_autovacuum + 4292 frame #6: 0x007b2342 postgres`AutoVacWorkerMain + 866 frame #7: 0x007b1f97 postgres`StartAutoVacWorker + 39 frame #8: 0x007ba0df postgres`sigusr1_handler + 783 frame #9: 0x0008220da627 libthr.so.3`___lldb_unnamed_symbol683 + 215 frame #10: 0x0008220d9b1a libthr.so.3`___lldb_unnamed_symbol664 + 314 frame #11: 0x7913 frame #12: 0x007bba25 postgres`ServerLoop + 1541 frame #13: 0x007b9467 postgres`PostmasterMain + 3207 frame #14: 0x0071a566 postgres`main + 758 frame #15: 0x004f9995 postgres`_start + 261 After restart, no problems reported yet. Storyline: this is the file-list table of my backup/archive system, contains ~50 mio records. Recently I found a flaw in the backup system, so that some old records weren't removed. I wrote a script to do this, that script did run first at 04:15 and reported it had now removed a lot of old data. I looked into pgadmin4 and it reported 9 mio dead tuples. I wondered why autovacuum wouldn't run, and the stats showed that regular daily vacuum had last run at 03:15 and autovacuum at 03:18. I wondered why it wouldn't start again, and went to sleep. Having consumed 300 minutes cpu now at 11:00, it did start at around 05:00. No messages of hardware errors, good old Haswell Xeon/EP with all ECC. Could there be a compute error somewhere, and does autovacuum lock out ordinary kill signals during some time? We're at Rel. 15.6 on FreeBSD 13.3 *** Changed values: shared_buffers = 40MB temp_buffers = 20MB work_mem = 50MB maintenance_work_mem = 50MB max_stack_depth = 40MB dynamic_shared_memory_type = posix max_files_per_process = 200 effective_io_concurrency = 5 synchronous_commit = off wal_sync_method = fsync full_page_writes = off wal_compression = on wal_init_zero = off wal_writer_delay = 2000ms checkpoint_timeout = 180min checkpoint_completion_target = 0.0 max_wal_size = 2GB archive_mode = on archive_timeout = 86400 seq_page_cost = 0.5 random_page_cost = 0.7 effective_cache_size = 1GB default_statistics_target = 1000 autovacuum = on autovacuum_naptime = 5min autovacuum_vacuum_scale_factor = 0.01 autovacuum_analyze_scale_factor = 0.05
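For the next time this happens, what can be looked at from SQL before reaching for kill(1) - a sketch only; in the incident above the worker did not react to cancellation, but the progress view at least shows whether the scan position still advances:

-- which phase/block the (auto)vacuum worker is currently in; sample it repeatedly
SELECT pid, relid::regclass, phase, heap_blks_scanned, heap_blks_total
FROM   pg_stat_progress_vacuum;

-- polite cancellation first, termination only as a second step
SELECT pg_cancel_backend(pid)    FROM pg_stat_progress_vacuum;
SELECT pg_terminate_backend(pid) FROM pg_stat_progress_vacuum;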
Re: Autovacuum endless loop in heap_page_prune()?
On Mon, May 27, 2024 at 11:25:47AM +0200, Laurenz Albe wrote: ! On Sat, 2024-05-25 at 12:51 +0200, Peter wrote: ! > I just found Autovacuum run for 6 hours on a 8 GB table, VACUUM query ! > doesnt cancel, cluster doesn't stop, autovacuum worker is not ! > killable, truss shows no activity, after kill -6 this backtrace: ! > ! > * thread #1, name = 'postgres', stop reason = signal SIGABRT ! > * frame #0: 0x00548063 postgres`HeapTupleSatisfiesVacuumHorizon + 531 ! > frame #1: 0x0054aed9 postgres`heap_page_prune + 537 ! > frame #2: 0x0054e38a postgres`heap_vacuum_rel + 3626 ! > frame #3: 0x006af382 postgres`vacuum_rel + 626 ! > frame #4: 0x006aeeeb postgres`vacuum + 1611 ! > frame #5: 0x007b4664 postgres`do_autovacuum + 4292 ! > frame #6: 0x007b2342 postgres`AutoVacWorkerMain + 866 ! > frame #7: 0x007b1f97 postgres`StartAutoVacWorker + 39 ! > frame #8: 0x007ba0df postgres`sigusr1_handler + 783 ! > frame #9: 0x0008220da627 libthr.so.3`___lldb_unnamed_symbol683 + 215 ! > frame #10: 0x0008220d9b1a libthr.so.3`___lldb_unnamed_symbol664 + 314 ! > frame #11: 0x7913 ! > frame #12: 0x007bba25 postgres`ServerLoop + 1541 ! > frame #13: 0x007b9467 postgres`PostmasterMain + 3207 ! > frame #14: 0x0071a566 postgres`main + 758 ! > frame #15: 0x004f9995 postgres`_start + 261 ! > ! > After restart, no problems reported yet. ! > ! > Storyline: ! > this is the file-list table of my backup/archive system, contains ~50 ! > mio records. Recently I found a flaw in the backup system, so that some ! > old records weren't removed. I wrote a script to do this, that script ! > did run first at 04:15 and reported it had now removed a lot of old ! > data. I looked into pgadmin4 and it reported 9 mio dead tuples. ! ! This smells of index corruption. ! ! I have seen cases where a corrupted index sends VACUUM into an endless loop ! so that it does not react to query cancellation. Thanks for Your reply. So it can happen... ! Check the index with the "bt_index_check()" function from the "amcheck" ! extension. If that reports a problem, rebuild the index. 
- -- Table: public.file -- DROP TABLE IF EXISTS public.file; CREATE TABLE IF NOT EXISTS public.file ( fileid bigint NOT NULL DEFAULT nextval('file_fileid_seq'::regclass), fileindex integer NOT NULL DEFAULT 0, jobid integer NOT NULL, pathid integer NOT NULL, deltaseq smallint NOT NULL DEFAULT 0, markid integer NOT NULL DEFAULT 0, fhinfo numeric(20,0) NOT NULL DEFAULT 0, fhnode numeric(20,0) NOT NULL DEFAULT 0, lstat text COLLATE pg_catalog."default" NOT NULL, md5 text COLLATE pg_catalog."default" NOT NULL, name text COLLATE pg_catalog."default" NOT NULL, CONSTRAINT file_pkey PRIMARY KEY (fileid) ) TABLESPACE pg_default; ALTER TABLE IF EXISTS public.file OWNER to bareos; -- Index: file_jpfid_idx -- DROP INDEX IF EXISTS public.file_jpfid_idx; CREATE INDEX IF NOT EXISTS file_jpfid_idx ON public.file USING btree (jobid ASC NULLS LAST, pathid ASC NULLS LAST, name COLLATE pg_catalog."default" ASC NULLS LAST) TABLESPACE pg_default; -- Index: file_pjidpart_idx -- DROP INDEX IF EXISTS public.file_pjidpart_idx; CREATE INDEX IF NOT EXISTS file_pjidpart_idx ON public.file USING btree (pathid ASC NULLS LAST, jobid ASC NULLS LAST) TABLESPACE pg_default WHERE fileindex = 0 AND name = ''::text; - bareos=# SET client_min_messages = DEBUG1; bareos=# SELECT bt_index_check(index => 16753, heapallindexed => true); DEBUG: verifying consistency of tree structure for index "file_pjidpart_idx" DEBUG: verifying level 2 (true root level) DEBUG: verifying level 1 DEBUG: verifying level 0 (leaf level) DEBUG: verifying that tuples from index "file_pjidpart_idx" are present in "file" DEBUG: finished verifying presence of 490834 tuples from table "file" with bitset 33.61% set LOG: duration: 55912.563 ms statement: SELECT bt_index_check(index => 16753, heapallindexed => true); bt_index_check (1 row) bareos=# SELECT bt_index_check(index => 16752, heapallindexed => true); DEBUG: verifying consistency of tree structure for index "file_jpfid_idx" DEBUG: verifying level 3 (true root level) DEBUG: verifying level 2 DEBUG: verifying level 1 DEBUG: verifying level 0 (leaf level) DEBUG: verifying that tuples from index "file_jpfid_idx" are present in "file" DEBUG: finished verifying presence of 45922943
Re: Autovacuum endless loop in heap_page_prune()?
On Mon, May 27, 2024 at 01:51:56PM +0200, Laurenz Albe wrote: ! > ! Apart from hardware problems, one frequent cause is upgrading glibc ! > ! (if the index on a string column or expression). ! > ! > No, this is FreeBSD, we don't normally do such things... ;) ! ! You don't update the C library, or collations never change? I rarely update the C library. Kernel + libraries + OS programs are a single unit here, updated about once a year, and then by many people and with all the usual testing. I could look up how often some locale was modified, but honestly, I'm too lazy now. | (but of course SQL_ASCII is a mistake). Really? I re-read the chapter on Locale/CharSet Support, and I don't see the mistake. Only that one cannot convert it to anything else, which is fine in this use case (storing arbitrary filenames from any OS in any character set within the same column). regards, PMc
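For completeness, the encoding and collation a database actually runs with can be checked directly, which is the first thing to compare after any OS or locale update:

SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate, datctype
FROM   pg_database
WHERE  datname = current_database();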
Re: Failing GSSAPI TCP when connecting to server
Hello Folks, Thanks for Your inspiration; and I made some progress (found a way to avoid the issue). The issue is most likely not related to postgres. Ron Johnson said: >> A configuration problem on the machine(s) can be ruled out, > Famous last words. Trust me. :) > Is there a way to test pmc authentication via some other tool, like psql? Sure, that works. The problem is contained inside the running application program(s), everything else doesn't show it. > If *only *the application changed, then by definition it can't be a > database problem. *Something* in the application changed; you just haven't > found it. Obviousely, yes. But then, such a change might expose an undesired behaviour elsewhere. > Specifically, I'd read the Discourse 2.3.0 and 2.3.1 release notes. Correction: it is actually 3.2.0 and 3.3.1. I finally went the way of bisecting, and, it's not really a problem in Discourse either. It comes from a feature I had enabled in the course of migrating, a filesystem change monitor based on kqueue: https://man.freebsd.org/cgi/man.cgi?query=kqueue Removing that feature solves the issue for now. I have still no idea how that tool might lead to mishandled sockets elsewhere; it might somehow have to do with the async processing of the DB connect. That would need a thorough look into the code where this is done. Tom Lane wrote: >The TCP trace looks like the client side is timing out too quickly >in the unsuccessful case. It's not clear to me how the different >Discourse version would lead to the Kerberos library applying a >different timeout. It's not a timeout; a timeout would close the socket. It seems to rather forget the socket. >Still, it seems like most of the moving parts >here are outside of Postgres' control --- I don't think that libpq >itself has much involvement in the KDC communication. Kerberos is weird. It goes into libgssapi, but libgssapi doesn't do much on it's own, it just maps so-called "mech"s, which then point to the actual kerberos code - which in the case of FreeBSD is very ancient (but work should be underway to modernize it). It's one of the most creepy pieces of code I've looked into. > I concur with looking at the Discourse release notes and maybe asking > some questions in that community. They only support that app to run in a certain containerization on a specific brand of Linux. They don't like my questions and might just delete them. Anyway, I have a lead now to either avoid the problem or where to look more closely. And it has not directly to do with postgres, but rather with genuine socket mishandling and/or maybe some flaw in FreeBSD. cheers, PMc
Failing GSSAPI TCP when connecting to server
My application is trying to connect to the database server, and meanwhile tries to talk to the KDC server for a service ticket. Earlier these TCP connections ran like this, and were successful:

13:57:53.788797 IP6 clientIPv6.54143 > serverIPv6.88: Flags [S], seq 4189109662, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 402503231 ecr 0], length 0
13:57:53.789053 IP6 serverIPv6.88 > clientIPv6.54143: Flags [S.], seq 1604590824, ack 4189109663, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 1505702993 ecr 402503231], length 0
13:57:53.789122 IP6 clientIPv6.54143 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 402503231 ecr 1505702993], length 0
13:57:53.789223 IP6 clientIPv6.54143 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 402503231 ecr 1505702993], length 4
13:57:53.829575 IP6 serverIPv6.88 > clientIPv6.54143: Flags [.], ack 5, win 1035, options [nop,nop,TS val 1505703037 ecr 402503231], length 0
13:57:53.829696 IP6 clientIPv6.54143 > serverIPv6.88: Flags [P.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 402503271 ecr 1505703037], length 1161
13:57:53.832411 IP6 serverIPv6.88 > clientIPv6.54143: Flags [P.], seq 1:5, ack 1166, win 1035, options [nop,nop,TS val 1505703037 ecr 402503271], length 4
13:57:53.832440 IP6 serverIPv6.88 > clientIPv6.54143: Flags [FP.], seq 5:907, ack 1166, win 1035, options [nop,nop,TS val 1505703037 ecr 402503271], length 902
13:57:53.832461 IP6 clientIPv6.54143 > serverIPv6.88: Flags [.], ack 908, win 1021, options [nop,nop,TS val 402503276 ecr 1505703037], length 0
13:57:53.832620 IP6 clientIPv6.54143 > serverIPv6.88: Flags [F.], seq 1166, ack 908, win 1035, options [nop,nop,TS val 402503276 ecr 1505703037], length 0
13:57:53.832847 IP6 serverIPv6.88 > clientIPv6.54143: Flags [.], ack 1167, win 1035, options [nop,nop,TS val 1505703037 ecr 402503276], length 0

After an upgrade of the application they now look like this:

16:53:09.637708 IP6 clientIPv6.49124 > serverIPv6.88: Flags [S], seq 2860816940, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 2664510889 ecr 0], length 0
16:53:09.637990 IP6 serverIPv6.88 > clientIPv6.49124: Flags [S.], seq 2595272492, ack 2860816941, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 3896716896 ecr 2664510889], length 0
16:53:09.638153 IP6 clientIPv6.49124 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 0
16:53:09.638158 IP6 clientIPv6.49124 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 4
16:53:09.638162 IP6 clientIPv6.49124 > serverIPv6.88: Flags [FP.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 1161
16:53:09.638451 IP6 serverIPv6.88 > clientIPv6.49124: Flags [.], ack 1167, win 1017, options [nop,nop,TS val 3896716896 ecr 2664510889], length 0
16:53:09.638508 IP clientIPv4.58934 > serverIPv4.88: Flags [S], seq 3474302110, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 726603825 ecr 0], length 0
16:53:09.638696 IP serverIPv4.88 > clientIPv4.58934: Flags [S.], seq 2424341350, ack 3474302111, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2662337458 ecr 726603825], length 0
16:53:09.638729 IP clientIPv4.58934 > serverIPv4.88: Flags [.], ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 0
16:53:09.638733 IP clientIPv4.58934 > serverIPv4.88: Flags [P.], seq 1:5, ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 4
16:53:09.638751 IP clientIPv4.58934 > serverIPv4.88: Flags [FP.], seq 5:1166, ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 1161
16:53:09.638856 IP6 clientIPv6.46516 > serverIPv6.88: Flags [S], seq 3686126664, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 1901751531 ecr 0], length 0
16:53:09.639054 IP serverIPv4.88 > clientIPv4.58934: Flags [.], ack 1167, win 1009, options [nop,nop,TS val 2662337458 ecr 726603825], length 0
16:53:09.639071 IP6 serverIPv6.88 > clientIPv6.46516: Flags [S.], seq 3830196306, ack 3686126665, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 457425497 ecr 1901751531], length 0
16:53:09.639105 IP6 clientIPv6.46516 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 0
16:53:09.639130 IP6 clientIPv6.46516 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 4
16:53:09.639154 IP6 clientIPv6.46516 > serverIPv6.88: Flags [FP.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 1161

This is repeated LOTS of times, and what remains is hundreds of sockets in FIN_WAIT_2 that do not appear to disappear quickly. The error message from the postgres client is:

GSSAPI continuation error: Miscellaneous failure: unable to reach any KDC in realm

(In fact it does reach the K
Re: query_to_xml() returns invalid XML when query returns no rows
On 3/12/18 08:18, Thomas Kellerer wrote: > I am not sure if this qualifies as a bug: > > query_to_xml() returns an empty XML document when the query returns no rows, > e.g: > >select query_to_xml('select 42 where false', false, true, ''); > > The problem with this is, that if the resulting XML is then fed into e.g. the > xpath() function, that function fails because the "empty" document is an > invalid XML: That's because you have the tableforest argument set to true. If you want a proper XML document, then you should write select query_to_xml('select 42 where false', false, false, ''); -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
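For illustration, the difference can be seen by feeding both forms to xpath() (a sketch; the column alias and XPath expressions here are just examples, not taken from the original report):

    -- tableforest = true: with zero rows the result is effectively empty,
    -- which is not a well-formed XML document, so xpath() complains.
    SELECT xpath('/row/a/text()',
                 query_to_xml('select 42 as a where false', false, true, ''));

    -- tableforest = false: a <table> document with no <row> children comes
    -- back instead, which xpath() accepts (and returns an empty array for).
    SELECT xpath('/table/row/a/text()',
                 query_to_xml('select 42 as a where false', false, false, ''));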
Re: query_to_xml() returns invalid XML when query returns no rows
On 3/13/18 15:21, Thomas Kellerer wrote: > I still think it's incorrect to return an empty (=invalid) XML instead of a > NULL value though. This behavior is specified in the SQL standard. While an empty string is not a valid XML "document", it is valid as XML "content". -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Question on corruption (PostgreSQL 9.6.1)
On Thu, Mar 15, 2018 at 8:16 AM, Andy Halsall wrote: > Thanks for the advice. I re-indexed and reloaded a pg_dumpall into a spare > server - no errors. Will run pg_catcheck asap. You can also run amcheck. Get the version targeting earlier Postgres releases off Github (there are packages for most Linux systems). This can verify that the heap is consistent with indexes. -- Peter Geoghegan
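For illustration, a minimal single-index check might look like this (a sketch; "amcheck_next" is the extension name installed by the Github version's packages, and the index name is just a placeholder):

    CREATE EXTENSION IF NOT EXISTS amcheck_next;

    -- Verify the B-Tree structure of one index; with the second argument set
    -- to true, also verify that every heap tuple has a matching index tuple.
    SELECT bt_index_check('some_table_pkey'::regclass, true);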
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Mon, Mar 19, 2018 at 1:01 PM, Jeremy Finzel wrote: > SELECT heap_page_items(get_raw_page('pg_authid', 7)); Can you post this? SELECT * FROM page_header(get_raw_page('pg_authid', 7)); -- Peter Geoghegan
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Mon, Mar 19, 2018 at 1:55 PM, Jeremy Finzel wrote:
> @Peter :
>
> staging=# SELECT * FROM page_header(get_raw_page('pg_authid', 7));
>       lsn       | checksum | flags | lower | upper | special | pagesize | version | prune_xid
> ----------------+----------+-------+-------+-------+---------+----------+---------+-----------
>  262B4/10FDC478 |        0 |     1 |   304 |  2224 |    8192 |     8192 |       4 |         0
> (1 row)

Thanks. That looks normal. I wonder if the contents of that page look consistent with the rest of the table following manual inspection, though.

I recently saw system catalog corruption on a 9.5 instance where an entirely different relation's page ended up in pg_attribute and pg_depend. They were actually pristine index pages from an application index. I still have no idea why this happened.

This is very much a guess, but it can't hurt to check if the contents of the tuples themselves are actually sane by inspecting them with "SELECT * FROM pg_authid". heap_page_items() doesn't actually care about the shape of the tuples in the page, so this might have been missed.

-- Peter Geoghegan
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Wed, Mar 21, 2018 at 1:38 PM, Jeremy Finzel wrote:
> A server restart and upgrade to 9.5.12 (at the same time), as expected, made
> the issue go away. Still doesn't give us any answers as to what happened or
> if it would happen again!

Thanks for the feedback. You may still want to use amcheck to look for problems. The version on Github works with 9.5, and there are Redhat and Debian pgdg packages. See:

https://github.com/petergeoghegan/amcheck

The "heapallindexed" option will be of particular interest to you - that option verifies that the table has matching rows for a target index (in addition to testing the structure of a target B-Tree index itself). This is probably the best general test for corruption that is available. There is a fair chance that this will reveal new information.

-- Peter Geoghegan
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Thu, Mar 22, 2018 at 12:27 PM, Jeremy Finzel wrote:
> Thank you for the recommendation. I ran both amcheck functions on all 4
> indexes of those 2 tables with heapallindexed = true, but no issues were
> found.

Probably wouldn't hurt to run it against all indexes, if you can make time for that. You can generalize from the example query that calls the bt_index_check() function: set "heapallindexed=>i.indisprimary", and remove "n.nspname = 'pg_catalog'" as well as "LIMIT 10". This will test tables and indexes from all schemas, which might be interesting.

-- Peter Geoghegan
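For illustration, the generalized query might look roughly like the following (a sketch adapted from the amcheck documentation's example query, with the modifications described above):

    SELECT bt_index_check(index => c.oid, heapallindexed => i.indisprimary),
           c.relname,
           c.relpages
    FROM pg_index i
    JOIN pg_opclass op ON i.indclass[0] = op.oid
    JOIN pg_am am ON op.opcmethod = am.oid
    JOIN pg_class c ON i.indexrelid = c.oid
    JOIN pg_namespace n ON c.relnamespace = n.oid
    WHERE am.amname = 'btree'
      -- Don't check temp tables, which may be from another session:
      AND c.relpersistence != 't'
      -- Function may throw an error when this is omitted:
      AND c.relkind = 'i' AND i.indisready AND i.indisvalid
    ORDER BY c.relpages DESC;

Each index that comes back without raising an error has passed; corruption, if any, is reported as an error for the index being checked.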
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Thu, Mar 22, 2018 at 2:24 PM, Jeremy Finzel wrote: > I am running this on a san snapshot of our production system. I assume that > this will give me a valid check for file-system-level corruption. I am > going to kick it off and see if I find anything interesting. It might. Note that SAN snapshots might have corruption hazards, though. Postgres expects crash consistency across all filesystems, so you might run into trouble if you had a separate filesystem for WAL, for example. I know that LVM snapshots only provide a consistent view of a single logical volume, even though many LVM + Postgres setups will involve multiple logical volumes. This makes it possible for a small inconsistency across logical volumes to corrupt data. I don't know anything about your SAN snapshotting, but this is at least something to consider. -- Peter Geoghegan
Re: Troubleshooting a segfault and instance crash
On Thu, Mar 8, 2018 at 9:40 AM, Blair Boadway wrote: > Mar 7 14:46:35 pgprod2 kernel:postgres[29351]: segfault at 0 ip > 00302f32868a sp 7ffcf1547498 error 4 in > libc-2.12.so[302f20+18a000] > > Mar 7 14:46:35 pgprod2 POSTGRES[21262]: [5] user=,db=,app=client= LOG: > server process (PID 29351) was terminated by signal 11: Segmentation fault > It crashes the database, though it starts again on its own without any > apparent issues. This has happened 3 times in 2 months and each time the > segfault error and memory address is the same. We had a recent report of a segfault on a Redhat compatible system, that seemed like it might originate from within its glibc [1]. Although all the versions there didn't match what you have, it's worth considering as a possibility. Maybe you can't install debuginfo packages because you don't yet have the necessary debuginfo repos set up. Just a guess. That is sometimes a required extra step. [1] https://postgr.es/m/7369.1520528...@sss.pgh.pa.us -- Peter Geoghegan
Re: Autovacuum behavior with rapid insert/delete 9.6
On Thu, Mar 29, 2018 at 4:01 PM, Ted Filmore wrote: > What I am really asking to confirm is after describing the situation is it > reasonable to focus on (in the short term) tuning autovacuum to increase > performance or does this not make sense given the workload and I should look > elsewhere? I would look into this suspected 9.5 regression, if that's possible: https://postgr.es/m/CAH2-Wz=sfakvmv1x9jh19ej8am8tzn9f-yecips9hrrrqss...@mail.gmail.com -- Peter Geoghegan
Re: Problem with connection to host (wrong host)
On 3/31/18 11:44, Mike Martin wrote:
> Hi
> I am just setting up a postgres server, details
> Host 192.168.0.3
>
> pg_hba.conf
> # TYPE  DATABASE  USER  ADDRESS           METHOD
>
> # "local" is for Unix domain socket connections only
> local   all       all                     peer
> # IPv4 local connections:
> host    all       all   192.168.0.0/32    ident

This entry should probably be something like 192.168.0.0/24. Writing .0/32 doesn't make much sense.

> # IPv6 local connections:
> host    all       all   127.0.0.1/32      ident
>
> host    all       all   ::1/128           ident
>
> postgresql.conf
> listen_addresses = '*'
>
> however when I try to connect from my laptop (ip 192.168.0.2) I get
>
> psql -h 192.168.0.3 -U usevideo -W
> Password for user usevideo:
> psql: FATAL: no pg_hba.conf entry for host "192.168.0.2", user
> "usevideo", database "usevideo", SSL off
>
> So a bit confused, is psql ignoring the host parameter

.3 is the host you are connecting to, as seen from the client. .2 is the host your connection is coming from, as seen from the server.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
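For reference, the corrected IPv4 entry might look like this (a sketch; the ident method is simply kept from the original configuration):

    # IPv4 connections from the local 192.168.0.0/24 subnet:
    host    all    all    192.168.0.0/24    ident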
Re: LDAP Bind Password
On 4/3/18 16:12, Kumar, Virendra wrote: > Is anybody aware of how to encrypt bind password for ldap authentication > in pg_hba.conf. Anonymous bind is disabled in our organization so we > have to use bind ID and password but to keep them as plaintext in > pg_hba.conf defeat security purposes. We want to either encrypt it or > authenticate without binding. Any insights into this is appreciated. You can use the "simple bind" method that is described in the documentation. That one doesn't involve a second bind step. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
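For illustration, a simple-bind entry in pg_hba.conf might look like the following (a sketch based on the documented example; the server name and the DN prefix/suffix are placeholders):

    host    all    all    0.0.0.0/0    ldap    ldapserver=ldap.example.net ldapprefix="cn=" ldapsuffix=", dc=example, dc=net"

With simple bind the server binds directly as the DN built from the login name, using the password supplied by the connecting user, so no separate search account and therefore no stored bind password is needed in pg_hba.conf.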
Re: Rationale for aversion to the central database?
Tim, I'm sorry if I sound like a cheerleader, but boy did you nail this. I would basically say exactly the same thing, just not as well. On Sun, Apr 8, 2018 at 9:37 PM, Tim Cross wrote: > > > On 9 April 2018 at 07:39, Guyren Howe wrote: > >> I am a Rails developer at a medium-large size company. I’ve mostly worked >> at smaller companies. I’ve some exposure to other web development >> communities. >> >> When it comes to databases, I have universally encountered the attitude >> that one should treat the database as a dumb data bucket. There is a *very* >> strong aversion to putting much of any business logic in the database. I >> encounter substantial aversion to have multiple applications access one >> database, or even the reverse: all abstraction should be at the application >> layer. >> >> My best theory is that these communities developed at a time when Windows >> was more dominant, and just generally it was *significantly* easier to use >> MySQL than Postgres for many, particularly new, developers. And it is >> pretty reasonable to adopt an aversion to sophisticated use of the database >> in that case. >> >> This attitude has just continued to today, even as many of them have >> switched to Postgres. >> >> This is only a hypothesis. I am now officially researching the issue. I >> would be grateful for any wisdom from this community. >> >> >> Aside: it is rare to find a situation in life or anywhere where one >> widely adopted thing is worse in *every way* than another thing, but this >> certainly was and largely still continues to be the case when one compares >> MySQL and Postgres. So why do folks continue to use MySQL? I find this >> mystifying. >> > > It is interesting looking at many of the responses to this thread. I see a > lot at each extreme - either put lots of stuff inthe database or use the > database as just a 'dumb' store and put everything in the application code. > > I think the real solution is somewhere in the middle. I've lost count of > the number of applications where the application code is jumping through > all sorts of hoops to do basic data operations which would be far better > handled in the database and can easily be done using just ANSI SQL (so is > portable). It drives me crazy when people tell me the database is slow when > they are doing 'select * from table' and then filtering and sorting the > data in their application. Applications should take advantage of what the > database does well. Unfortunately, I see far too many developers who are > uncomfortable with SQL, don't know how to structure their queries > efficiently (lots of nested sub queries etc, cartesian joins etc). > > At the other extreme is those who tend to put almost everything in the > database - including business policy and business 'rules' which are > probably better categorised as current business strategy. First, I think it > is nearly always a mistake to try and enforce business policy with > technology. Policies change too often and should be dealt with via > administrative measures. Technology can certainly be used to raise alerts > regarding policy breeches, but should not be used to enforce policies. > Likewise, some business rules are more akin to strategies than being actual > static rules and can change with little notice, rhyme or reason. These > probably should not be 'hard coded' into the database. Other rules are more > stable and unlikely to ever change and are likely good candidates for being > encoded in the database as either functions or constraints. 
> > I do feel that often the big problem is with management who fail to > understand the time and effort needed to develop a good data model. > Developers are put under pressure to deliver functionality and as long as > it looks correct at the interface level, all is good. Little thought is > really put into long term maintenance or performance. From a developer > perspective, time put into becoming an expert in React, Angular, Node, > Python etc is probably going to earn them more bonus points than time spent > on developing skills in defining good data models or understanding of the > power/functionality of the underlying database engine. Of course, this does > tend to be short sighted as a good data model will tend to make it easier > to add/enhance an application and understanding your database system will > make changes and enhancements less daunting. > > For me, the sign of a good developer is one who is able to get the balance > right. They understand the strengths and weaknesses of ALL the components > involved and are able to select the technology mix which suits the problem > domain and are able to get the right balance between business > responsiveness to change and long term maintenance/viability. > Unfortunately, such developers are rare, so it will usually mean there are > a team of people with different skills and what will matter is how well > they are able to work together as a team and come up with an architectur
Re: List all columns referencing an FK
On 2/8/18 05:31, Andreas Joseph Krogh wrote: > Back in 2008 I asked this > question: > http://www.postgresql-archive.org/Finding-all-tables-that-have-foreign-keys-referencing-a-table-td2153236.html > I wonder, is this now possible using information_schema only, or are > there still pieces missing in the standard holding this back? I think you'll still have the same problems if the same constraint name appears more than once per schema. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
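For illustration, an information_schema-only query might look like the sketch below (schema and table names are placeholders). Note that it joins on constraint name and schema, which is exactly why duplicate constraint names within a schema can still produce wrong answers:

    SELECT kcu.table_schema,
           kcu.table_name,
           kcu.column_name,
           rc.constraint_name
    FROM information_schema.referential_constraints rc
    JOIN information_schema.key_column_usage kcu
      ON kcu.constraint_schema = rc.constraint_schema
     AND kcu.constraint_name   = rc.constraint_name
    JOIN information_schema.constraint_column_usage ccu
      ON ccu.constraint_schema = rc.unique_constraint_schema
     AND ccu.constraint_name   = rc.unique_constraint_name
    WHERE ccu.table_schema = 'public'               -- referenced table's schema
      AND ccu.table_name   = 'referenced_table';    -- referenced table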
Re: How to monitor logical replication initial sync?
On 3/15/18 09:19, bricklen wrote: > How does one monitor the status or progress of an initial sync under > logical replication? For example: > > * I create a publication in database db_pub > * I create a subscription in database db_sub > * In 15 minutes I want to check an see that the initial sync is N% > complete > > Is it possible to tell when the initial sync is complete, or better > yet, how complete it is? > > > This is a question I'm quite interested in as well (and one I do not > have an answer to). > > Does anyone with more familiarity with logical replication have any > suggestions on how to determine the status of the initial sync? Something like select * from pg_subscription_rel where srsubstate <> 'r' and srsubid = (select oid from pg_subscription where subname = 'mysub'); The key is checking the srsubstate column for 'r' (ready). -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
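For a rough progress figure, counting tables per state also works (a sketch; 'mysub' is a placeholder subscription name):

    SELECT srsubstate, count(*)
    FROM pg_subscription_rel
    WHERE srsubid = (SELECT oid FROM pg_subscription WHERE subname = 'mysub')
    GROUP BY srsubstate;

The states are 'i' (initialize), 'd' (data is being copied), 's' (synchronized) and 'r' (ready); the initial sync is finished once every table reports 'r'.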
Re: ERROR: found multixact from before relminmxid
On Mon, Apr 9, 2018 at 7:01 AM, Tomas Vondra wrote:
> The bigger question is whether this can actually detect the issue. If
> it's due to a storage issue, then perhaps yes. But if you only see
> multixact issues consistently and nothing else, it might easily be a
> PostgreSQL bug (in which case the checksum will be correct).

You can also run amcheck. Get the version targeting earlier Postgres releases off Github (there are packages for most Linux systems). This can verify that the heap is consistent with indexes.

-- Peter Geoghegan
Re: ERROR: found multixact from before relminmxid
On Mon, Apr 9, 2018 at 5:55 PM, Alexandre Arruda wrote: > I ran amcheck in all index of a table and I only get empty returns. Did you try doing so with the "heapallindexed" option? That's what's really interesting here. -- Peter Geoghegan
Re: ERROR: found multixact from before relminmxid
On Mon, Apr 9, 2018 at 6:56 PM, Alexandre Arruda wrote: > (... and all other indexes returns null too) > > I tried with bt_index_check too. Same results. That's interesting, because it tells me that you have a table that appears to not be corrupt, despite the CLUSTER error. Also, the error itself comes from sanity checking added to MultiXact freezing fairly recently, in commit 699bf7d0. You didn't say anything about regular VACUUM being broken. Do you find that it works without any apparent issue? I have a suspicion that this could be a subtle bug in CLUSTER/freezing. The only heap_freeze_tuple() caller is code used by CLUSTER, so it's not that hard to imagine a MultiXact freezing bug that is peculiar to CLUSTER. Though I haven't thought about it in much detail. -- Peter Geoghegan
Re: ERROR: found multixact from before relminmxid
On Mon, Apr 9, 2018 at 7:53 PM, Andres Freund wrote: > I've not followed this thread. Possible it's the overeager check for pg > upgraded tuples from before 9.3 that Alvaro fixed recently? I was aware of commit 477ad05e, which must be what you're referring to. I don't think that that's what this is, since this error occurs within heap_freeze_tuple() -- it's not the over-enforced HEAP_XMAX_IS_LOCKED_ONLY() error within heap_prepare_freeze_tuple(). And, because this database wasn't pg_upgraded. I should wait until tomorrow before doing any further analysis, though. -- Peter Geoghegan
Re: ERROR: found multixact from before relminmxid
On Tue, Apr 10, 2018 at 4:31 AM, Alexandre Arruda wrote: > Actualy, I first notice the problem in logs by autovacuum: > > 2018-04-10 08:22:15.385 -03 [55477] CONTEXT: automatic vacuum of > table "production.public.fn06t" > 2018-04-10 08:22:16.815 -03 [55477] ERROR: found multixact 68834765 > from before relminmxid 73262006 > > production=# vacuum analyze verbose fn06t; > INFO: vacuuming "public.fn06t" > ERROR: found multixact 76440919 from before relminmxid 122128619 Do you think that CLUSTER was run before regular VACUUM/autovacuum showed this error, though? Have you noticed any data loss? Things look okay when you do your dump + restore, right? The problem, as far as you know, is strictly that CLUSTER + VACUUM refuse to finish/raise these multixactid errors? -- Peter Geoghegan
Re: ERROR: found multixact from before relminmxid
On Tue, Apr 10, 2018 at 7:54 PM, Alexandre Arruda wrote: > pg_control version number:1002 Andres was also asking about his check_rel() function, from https://www.postgresql.org/message-id/20180319181723.ugaf7hfkluqyo...@alap3.anarazel.de. Can you check that out as well, please? You'll need to be able to install the pageinspect contrib module. -- Peter Geoghegan
Re: New website
On Wed, Apr 18, 2018 at 3:03 PM, Adrian Klaver wrote: > I would contact the Webmaster but Contact goes to a big image of an elephant > head. That is also where Downloads, Support and Donate lands. Might have > been a good idea to roll out a demo site for testing first. Will reserve > judgment on the site design until it is functioning. I really don't think it's practical to give everyone a veto on a website design. It took years to launch this website redesign. Honestly, I was beginning to think that it would never happen. Anyway, I did notice something myself, which is that the git logo links to https://git.postgresql.org/gitweb/, which has many non-very-important git repos. Instead, it should point to the main PostgreSQL repository's gitweb page, which is at https://git.postgresql.org/gitweb/?p=postgresql.git. -- Peter Geoghegan
Re: New website
On Wed, Apr 18, 2018 at 3:40 PM, Adrian Klaver wrote: > And to get to the mailing list archives(arguably one of the most important > links) you have to: > > Click on Community, click on Mailing Lists on the left sidebar, then scroll > to the bottom of page to find the search box. I look at the mailing list archives as much as anyone else does, and I don't think that it's very important for it to have prominent placement. Even still, the only difference here is the scrolling. It has fundamentally the same structure as before. > I am one of the No votes in the survey. This sounds pretty far from constructive to me, which automatically detracts from what you're saying. -- Peter Geoghegan
Re: Can we run pg_basebackup master is still working normally (selects, updates, deleted, etc)
On 5/4/18 06:41, Adrian Heath wrote: > The system is used 24/7. Can we run pg_basebackup while the master is > still receiving updates without worrying about physical file corruptions > in the generated archive file? Yes, as long as you follow the instructions about saving the WAL generated during the backup and replaying it on the newly created standby. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
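For illustration, one common way to do that is to let pg_basebackup stream the WAL itself (a sketch; host, user and target directory are placeholders):

    pg_basebackup -h primary.example.com -U replication_user \
        -D /var/lib/pgsql/standby -X stream -P

With -X stream the WAL generated during the backup is included in the backup, so the result is self-contained; alternatively, keep WAL archiving in place and replay the archived segments on the new standby as described in the documentation.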
Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
On Fri, May 25, 2018 at 1:38 PM, Andres Freund wrote: >> BTW I think the definition of HeapTupleHeaderXminFrozen is seriously >> confusing, by failing to return true if the xmin is numerically >> FrozenXid (which it'll be if the database was pg_upgraded). I wonder >> about this one in HeapTupleSatisfiesMVCC: > > I suggest raising this on -hackers. I agree that it's unfortunate. I wonder if BootstrapTransactionId also needs to be considered here. -- Peter Geoghegan
Re: Syndicating PostgreSQL mailing list to Discourse
On Sun, May 27, 2018 at 6:13 AM, Stephen Frost wrote: > I'm curious what would be different here from what our archives provide. > We could certainly have a single "all lists" archive page but that seems > more likely to be just completely confusing than actually useful at all. Any replacement to our own archives will need to provide access to mail from over 20 years ago to be in any way usable. It's not uncommon to have to go back that far. Personally, I don't buy the idea that the need to use a mailing list rather than a web forum is a notable obstacle for new contributors. PGLister seems pretty slick to me. It has eliminated all of the frustrations that I had. Maybe we need to do a better job when it comes to communicating what the benefits of a mailing list are, though. There are real, practical reasons to prefer a mailing list; that preference isn't just due to ingrained habit. I'm pleased that there has been a drive to modernize some of the community's infrastructure in recent years, but only because those changes turned out to be unalloyed improvements (at least in my view). Besides, while mailing lists may seem antiquated to a lot of people, aren't web forums almost as antiquated? Sites like Stack Overflow are very clearly not designed to work as discussion forums. They do not allow subjective questions, and it's common for moderators to swiftly delete new threads. Stack Overflow is popular because it provides the fastest possible access to a crowdsourced answer, without requiring or even encouraging participation. -- Peter Geoghegan
Re: Code of Conduct plan
On Mon, Jun 4, 2018 at 11:41 AM, Jason Petersen wrote: > Ultimately, the important thing this CoC provides is some concrete language > to point at when a party is aggrieved and explicit avenues of redress > available when one refuses to address one’s own behavior. We’re adults here, > the strawmen of people being harangued out of the community because they > said a bad word are unlikely to materialize. > > +1 This seems like a good summary on the purpose of the CoC. It is of course possible that a member of the committee could act in bad faith for any number of reasons. You can say the same thing about any position of leadership or authority within the community, though. That hasn't really been much of a problem in my experience, and I see no reason for particular concern about it here. -- Peter Geoghegan
Re: Code of Conduct plan
On Tue, Jun 5, 2018 at 8:49 AM, Benjamin Scherrey wrote: > I thought the same thing as a member of the Django community. It adopted a > CoC that I vocally warned was dangerous and far more likely to be abused > than provide any benefit. I was shocked when the very first time it was ever > invoked it was by one of the founders of the project (whom I previously > personally respected) and it was absolutely used in the manner that I had > feared which was to shut someone up whose opinion he did not like rather > than any legitimate concern. Unfortunately this is not such an unusual > circumstance as one might hope in these projects or conferences. It is > impossible to separate the concept of political correctness from these CoCs > I find and they are much more dangerous things than they appear. We should > tread with extreme cautious about adopting such a thing. It's impossible for me to know what really happened in that situation, but it doesn't seem like the CoC was likely to have been much of a factor in any telling. If this individual was in a position of influence and decided to act maliciously, they would no doubt have found another way to do so in the absence of a CoC. On the other hand, it's easy to imagine a newer non-influential community member finding no recourse against abusive behavior because that isn't explicitly provided for; they might simply not know where to start, and become totally discouraged. Nobody is claiming that the CoC is perfect, or that it can anticipate every situation; it's just a framework for handling disputes about abusive and/or antisocial behavior. The core team have had exclusive responsibility for "Handling disciplinary issues" as part of their charter, at least until now. You can make exactly the same slippery slope argument against that. -- Peter Geoghegan
Re: Code of Conduct plan
On Tue, Jun 5, 2018 at 9:51 AM, James Keener wrote: > To be honest, this is a bigger problem. Why would someone not feel > comfortable contacting the core team? Why would they feel better contacting > the CoC board who is probably mostly core team or otherwise self-selected > community members who have a strong belief in the CoC (and I don't mean that > kindly)? The CoC states that the committee's members cannot come from the core team. -- Peter Geoghegan
Re: Code of Conduct plan
On Tue, Jun 5, 2018 at 10:08 AM, Tom Lane wrote: > So publishing a formal CoC at all is mainly meant to deal with weak > points 1 and 2, and then the details of the process are there to try > to fix point 3. > > Yeah, managing the committee is a lot of overhead that in an ideal > world we wouldn't need, but I think we have to accept it to have a > process people will have confidence in. It's worth pointing out that the community has grown considerably in the last ten years. I assume that adding a bit of process to deal with these kinds of disputes is related to that. We have a pretty good track record through totally informal standards for behavior. Setting a good example is absolutely essential. While that's still the most important thing, it doesn't seem particularly scalable on its own. -- Peter Geoghegan
Re: Code of Conduct plan
On Tue, Jun 5, 2018 at 12:20 PM, Benjamin Scherrey wrote: > I keep hearing this claim. I've followed up and tried to verify them. Sorry > but "trust me" doesn't cut it here any more than "trust me this will make > Postgres go faster" would on a code change. What's the context for this? > What evidence do we have that indicates this CoC would have likely resulted > in a different outcome? Without that then your claim does not even rise up > to the standard of theoretical. Frankly this claim does not seem very > plausible to me at all. Let's try to keep our standards here. Whose standards are these? By my count, the majority of e-mails you've ever sent to a PostgreSQL mailing list have been sent in the last 2 days, to this code of conduct thread. -- Peter Geoghegan
Re: Code of Conduct plan
On Tue, Jun 5, 2018 at 2:06 PM, Sven R. Kunze wrote: > 1) CoC might result in developers leaving projects > http://lists.llvm.org/pipermail/llvm-dev/2018-May/122922.html This guy left LLVM for several reasons. The pertinent reason for us was that he had to agree to a code of conduct in order to attend conferences, which he found to be unacceptable. He did not have to agree that the idea of a code of conduct was a good one, though. It would have been perfectly possible for him to be opposed in principle to the idea of a CoC, while also formally agreeing to it and attending those conferences. I gather that his objections were around questions of unintended consequences, the role of a certain authority to assess violations of the CoC, and so on (I surmise that he was not actually opposed to or constrained by any of the specific rules around content in technical presentations and so on). I for one accept that these may have been reasonable concerns, even though I don't really agree, since the LLVM CoC seems quite reasonable. Anybody that participates in an open source community soon learns that their opinion on almost any matter may not be the one that prevails. There are often differences of opinion on -hackers that seem to fundamentally be down to a difference in values. We still manage to make it work, somehow. > 2) CoC might result in not so equal peers and friends, might result in a > committee which feels above their peers, and might promote conceit and > denunciation. I think that having a code of conduct is better than not having one, and I think that the one that we came up with is appropriate and proportionate. We could speculate all day about specific unintended consequences that may or may not follow. That doesn't seem very constructive, though. Besides, the time for that has passed. > In related discussions, people recurringly ask not to establish a secondary > judicial system but to use the already existing ones. I don't follow. Practically any organized group has rules around conduct, with varying degrees of formality, means of enforcement, etc. Naturally, the rules across disparate groups vary widely for all kinds of reasons. Formalizing and being more transparent about how this works seems like the opposite of paternalism to me. -- Peter Geoghegan
Re: Replica string comparsion issue
On Wed, Jun 13, 2018 at 1:10 PM, Andrey Lizenko wrote: > I'm observing strange behaviour on comparing ::text field with string while > quering replica. The first thing to do here is to install amcheck on the replica, and verify that the indexes are consistent with the replica's own notion of how text sorts for the collation that is implicitly in use (the database default): https://www.postgresql.org/docs/current/static/amcheck.html You can modify the example query to check the indexes that you're interested in. I think that there is a very strong chance that the replica has incompatible collation rules, given that it uses a totally different OS. -- Peter Geoghegan
Re: Can I disable sslcert/sslkey in pg_service.conf?
On 6/14/18 15:06, George Woodring wrote: > If I put the blanks into pg_service.conf: > > [mydb] > dbname=mydb > host=host1 > sslmode=require > sslcert= > sslkey= > > It does not work. I can believe that. > [woodring@ibeam]$ PGSERVICE=mydb psql > psql: SSL error: tlsv1 alert unknown ca > > I tried the opposite of moving the .postgresql directory to a different > name and putting a hard coded certificate path in pg_service, but it > looks to have its own sets of challenges. I think that's probably the best way out, though. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: What to do when dynamic shared memory control segment is corrupt
On Mon, Jun 18, 2018 at 1:03 PM, Tom Lane wrote: > Hm, I supposed that Sherrylyn would've noticed any PANIC entries in > the log. The TRAP message from an assertion failure could've escaped > notice though, even assuming that her logging setup captured it. Unhandled C++ exceptions end up calling std::abort(). I've seen bugs in modules like PL/V8 that were caused by this. The symptom was a mysterious message in the logs about SIGABRT. Perhaps that's what happened here? What extensions are installed, if any? -- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Wed, Jun 27, 2018 at 11:35 AM, Luca Ferrari wrote: > If I then disconnect and reconnect I'm able to issue the select and > get back the results. But if I issue a reindex I got the same error > and the table "becames unreadable" for the whole session. > On 10.3 the table is never locked for the session, that is I can > create the index, I can query the table and get the results, but I > cannot reindex. However, even after a reindex, it does allow me to > select data from the table. > > So my question is: why this behavior in later PostgreSQL? It might have something to do with the changes to parallel CREATE INDEX. It changed how we tracked whether or not an index could be used because it was currently undergoing reindexing. This is supposed to make no difference at all, but there was one bug that could cause us to consider an index irrevocably unusable. Do you find that the issue goes away if you set max_parallel_maintenance_workers=0 on v11/master? -- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Thu, Jun 28, 2018 at 8:02 AM, Andres Freund wrote:
> Peter, looks like you might be involved specifically.

Seems that way.

> This however seems wrong. Clearly the relation's index list is out of
> date.
>
> I believe this happens because there's currently no relcache
> invalidation registered for the main relation, until *after* the index
> is built. Normally it'd be the CacheInvalidateRelcacheByTuple(tuple) in
> index_update_stats(), which is called at the bottom of index_build().
> But we never get there, because of the earlier error. That's bad, because
> any relcache entry built *after* the CommandCounterIncrement() in
> CommandCounterIncrement() will now be outdated.
>
> In the olden days we most of the time didn't build a relcache entry
> until after the index was built - but plan_create_index_workers() now
> does. I suspect there's other ways to trigger that earlier, too.

Note that there is a kludge within plan_create_index_workers() that has us treat the heap relation as an inheritance parent, just to get a RelOptInfo for the heap relation without running into similar trouble with the index in get_relation_info(). Perhaps there's an argument to be made for refactoring plan_create_index_workers() as a fix for this.

> Putting in a CacheInvalidateRelcache(heapRelation); before the CCI in
> index_create() indeed makes the "borked relcache" problem go away.
>
> I wonder why we don't just generally trigger invalidations to an
> index's "owning" relation in CacheInvalidateHeapTuple()?

I don't know, but that seems like a good question.

-- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Mon, Jul 9, 2018 at 11:32 AM, Andres Freund wrote: >> Note that there is a kludge within plan_create_index_workers() that >> has us treat the heap relation as an inheritance parent, just to get a >> RelOptInfo for the heap relation without running into similar trouble >> with the index in get_relation_info(). Perhaps there's an argument to >> be made for refactoring plan_create_index_workers() as a fix for this. > > Maybe I'm missing something, but what has this got to do with the issue > at hand? Nothing. It might be worthwhile to find a way to not do that as part of fixing this issue, though. Just a suggestion. > I assume we'll have to backpatch this issue, so I think it'd probably a > good idea to put a specific CacheInvalidateHeapTuple() in there > explicitly in the back branches, and do the larger fix in 12. ISTM > there's some risks that it'd cause issues. Will you tackle this? Okay. -- Peter Geoghegan
Re: sorting/comparing column values in non-alphanumeric sorting ways ?
On Wed, Jul 11, 2018 at 2:44 PM, David Gauthier wrote: > I want to load the 'highestver' column with the highest version of tcfg1-3. > > This won't work... > update tv set greatest = greatest(tcfg1,tcfg2,tcfg3) > ...because it thinks 1.0.9 is greater than 1.0.10 > > Is there a way to get this to work right ? If you're using v10 with ICU, then you can create a custom ICU collation for this, with "natural" sort order. Something like this should work: CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true'); See the docs -- "23.2.2.3.2. ICU collations". -- Peter Geoghegan
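Continuing the example, the comparison and the update might then look like this (a sketch; it assumes the tv table and tcfg columns from the question, plus the collation created above):

    -- With numeric ("natural") ordering, 1.0.10 now compares higher than 1.0.9:
    SELECT '1.0.10'::text COLLATE "numeric" > '1.0.9';   -- true

    UPDATE tv SET highestver = greatest(tcfg1 COLLATE "numeric", tcfg2, tcfg3);

The explicit COLLATE clause on one argument determines the collation used for the whole comparison, so greatest() picks the highest version under natural ordering.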
Re: How to create logical replication slot with NOEXPORT_SNAPSHOT in jdbc
On 17.07.18 03:41, Igor Polishchuk wrote: > We are trying to use logical decoding for detecting database changes. > However, when we create a replication slot, the data processing pauses > if there are still transactions running from before the slot creation. > If I understand correctly, the slot is waiting for creating a consistent > snapshot and is blocked by the long transactions. > In our application, we don't need it, as we only want to see if some > tables were modified. Is it possible to create a logical replication > slot with NOEXPORT_SNAPSHOT option using jdbc? That doesn't do what you want. You still need to wait for the snapshot to be created; there is no way around that. The NOEXPORT_SNAPSHOT option just means that the snapshot, once created, won't be exported for use by other sessions. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: User documentation vs Official Docs
On 17.07.18 02:13, Joshua D. Drake wrote: > On 07/16/2018 05:08 PM, Alvaro Herrera wrote: >> >> Sounds like wiki pages could solve need this pretty conveniently. If >> and when the content is mature enough and migrates to the tutorial main >> documentation pages, the wiki pages can be replaced with redirects to >> those. > > Anyone who writes a lot is going to rebel against using a wiki. They are > one of the worst to write in from a productivity perspective. I would > rather write in Docbook, at least then I can template everything and we > could have a standard xsl sheet etc... I don't really buy that. The wiki seems just fine for writing short to medium size how-to type articles. We already have good content of that sort in the wiki right now. It's not like there isn't going to be anyone who will rebel against any of the other tool chains that have been mentioned. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re:
On Thu, Jul 19, 2018 at 11:43 AM, Torsten Förtsch wrote:
> is there a way to find if a certain page in a data file is referenced by a
> btree index?
>
> I found a few completely empty pages in one of my tables. I am not sure if
> that's corruption or just bloat. Now I am thinking I could use an index, the
> PK for instance, and see if it references these pages.

It's probably not serious, but you may want to try amcheck's heapallindexed check. You'll have to use the non-contrib packages for that right now, though; those are available from the PGDG repos.

-- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Wed, Jul 25, 2018 at 4:03 PM, Andres Freund wrote: > Peter, given that your patch made this more likely, and that you're a > committer these days, I'm opening an open items entry, and assign it to > you. Does that sound ok? I intend to follow through on this soon. I have been distracted by project work. I accept responsibility for the open item, though. -- Peter Geoghegan
Re: Restore relhaspkey in PostgreSQL Version 11 Beta
On Tue, Jul 31, 2018 at 7:47 AM, Melvin Davidson wrote: > I was hoping that at least one other person would see my point of view, but > by the > harsh replies I've been getting, I feel more like a whistle blower that > insiders > think I also should be made to "go away". You were bellicose from almost the beginning of this thread. And, yes, that does detract from your argument. Just as it would in almost any other sphere or arena. > Well, you are right. This old Viet Vet shall now end this conversation and > his career. > I just need a way to do so quietly and painlessly. > The truth is absolute and cannot be changed. > Perception is not the truth. > Flerp! I cannot imagine what reaction you were expecting to this. In all sincerity, I suggest reflecting on your words. You don't seem to have realistic expectations about how the community works, or could ever work. -- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Mon, Jul 9, 2018 at 11:32 AM, Andres Freund wrote:
> I assume we'll have to backpatch this issue, so I think it'd probably a
> good idea to put a specific CacheInvalidateHeapTuple() in there
> explicitly in the back branches, and do the larger fix in 12. ISTM
> there's some risks that it'd cause issues.

What do you think of the attached? There is a new CacheInvalidateRelcache() call, rather than a new call to CacheInvalidateRelcacheByTuple(), but those two things are equivalent (I assume that you actually meant to say CacheInvalidateRelcacheByTuple(), not CacheInvalidateHeapTuple()).

Since nobody seems to be that excited about the CacheInvalidateHeapTuple() idea, I haven't pursued it.

-- Peter Geoghegan

0001-Add-table-relcache-invalidation-to-index-builds.patch
Description: Binary data
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Tue, Jul 31, 2018 at 7:02 PM, Andres Freund wrote: > Maybe expand a bit on this by saying that it's more likely "because > plan_create_index_workers() triggers a relcache entry to be (re-)built, > which previously did only happen in edge cases" or such? Okay. > Not a fan of this comment. It doesn't really explain that well why it's > needed here, but then goes on to a relatively general explanation of why > cache invalidation is necessary. Why not just go for something like > "register relcache invalidation on the indexes' heap relation, to > maintain consistency of its index list"? That seems much more generic to me! The comment is supposed to convey that the stuff within index_update_stats() isn't enough because of xact abort specifically. SI invalidation is very much part of the index_update_stats() contract already. > I wonder if it wouldn't be more appropriately placed closer to the > UpdateIndexRelation(), given that that's essentially what necessitates > the relcache flush? That makes sense. I'll do it that way. -- Peter Geoghegan
Re: could not read block 0 in file : read only 0 of 8192 bytes when doing nasty on immutable index function
On Tue, Jul 31, 2018 at 9:00 PM, Andres Freund wrote: > I don't think that's particularly relevant. We should always register an > invalidation before the relevant CommandCounterIncrement(), because that > is what makes catalog changes visible, and therefore requires > registering invalidations for coherency. Fair enough. How about the attached revision? -- Peter Geoghegan v2-0001-Add-table-relcache-invalidation-to-index-builds.patch Description: Binary data
Re: jndi jdbc url with ssl authenticat in tomcat ... fails org.xml.sax.SAXParseException columnNumber: 79; The reference to entity "ssl" must end with the ';' delimiter
The "&" characters need to be replaced by "&amp;".

Peter

On Sat, 4 Aug 2018, 22:47 Dave Cramer, wrote:
> Pretty sure this is a tomcat error.
>
> The connection string looks fine
>
> Dave Cramer
> da...@postgresintl.com
> www.postgresintl.com
>
> On 30 July 2018 at 11:32, Didier Wiroth wrote:
>> Hello,
>> I'm trying to configure a postgres jndi resource in tomcat that needs to
>> connect via ssl and client certificate for authentication.
>> Unfortunately ... without success.
>>
>> The user is: esrde_aoo and authenticates with a certificate (no password!).
>>
>> Here is the tomcat 8.5.31 (running on jre 1.8.0_152-b16) resource configuration:
>> <Resource driverClassName="org.postgresql.Driver"
>>     name="jdbc/pgrde"
>>     url="jdbc:postgresql://test.localhost:5432/esrde?user=esrde_aoo&ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory&sslcert=d:/apps/ssl/rde/esrde_aoo.crt&sslkey=d:/apps/ssl/rde/esrde_aoo.key"
>>     username="esrde_aoo"
>> />
>>
>> I tried many many different combinations ... it always fails at the second parameter.
>> org.apache.tomcat.util.digester.Digester.fatalError Parse Fatal Error at
>> line 21 column 79: The reference to entity "ssl" must end with the ';' delimiter.
>> org.xml.sax.SAXParseException; systemId:
>> file:/D:/apps/web-data/tam/conf/db-context-ora12.xml; lineNumber: 21;
>> columnNumber: 79; The reference to entity "ssl" must end with the ';' delimiter.
>> at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
>> at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
>> at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
>> at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
>> at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
>> at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:913)
>> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1548)
>> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1315)
>> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2784)
>> at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
>> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
>> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
>> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
>> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
>> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
>> at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
>> at org.apache.tomcat.util.digester.Digester.parse(Digester.java:1521)
>> at org.apache.catalina.startup.ContextConfig.processContextConfig(ContextConfig.java:527)
>> at org.apache.catalina.startup.ContextConfig.contextConfig(ContextConfig.java:465)
>> at org.apache.catalina.startup.ContextConfig.init(ContextConfig.java:728)
>> at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:310)
>> at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:94)
>> at org.apache.catalina.util.LifecycleBase.setStateInternal(LifecycleBase.java:395)
>> at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:108)
>> at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:140)
>> at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:754)
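For illustration, the corrected resource definition might look like the following (a sketch showing only the attributes quoted above; the fix is simply to escape every "&" in the JDBC URL as "&amp;" inside the XML file):

    <Resource driverClassName="org.postgresql.Driver"
              name="jdbc/pgrde"
              url="jdbc:postgresql://test.localhost:5432/esrde?user=esrde_aoo&amp;ssl=true&amp;sslfactory=org.postgresql.ssl.NonValidatingFactory&amp;sslcert=d:/apps/ssl/rde/esrde_aoo.crt&amp;sslkey=d:/apps/ssl/rde/esrde_aoo.key"
              username="esrde_aoo" />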
Re: scram-sha-256 authentication broken in FIPS mode
On 11/09/2018 05:02, Michael Paquier wrote:
> Hence, intrinsically, we are in contradiction with the upstream docs. I
> have worked on the problem with the patch, which works down to OpenSSL
> 0.9.8, and should fix your issue. This is based on what you sent
> previously, except that I was not able to apply what was sent, so I
> reworked the whole. Alessandro, does this fix your problems? I would
> like to apply that down to v10 where SCRAM has been introduced.

I recommend letting this bake in the master branch for a while. There are a lot of weirdly patched and alternative OpenSSL versions out there that defy any documentation.

Of course, we should also see if this actually fixes the reported problem.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Convert interval to hours
On Fri, Sep 14, 2018 at 11:51 AM David Gauthier wrote:
> Hi:
>
> In perl/DBI, I have code that's getting me an "age" which returns something
> like... "-17 days -08:29:35". How can I convert that to a number of hours
> (as a float I would presume) ?
>
> Thanks

I've done this as

select extract(epoch from '-17 days -08:29:35'::interval)/3600 as hours;
     hours
---------------
 -416.49305556
(1 row)

Others might have a better way, though.

Pete
Re: Code of Conduct plan
On Fri, Sep 14, 2018 at 7:19 AM, Joshua D. Drake wrote:
> Sure and that is unfortunate but isn't it up to the individual to deal with
> it through appropriate channels for whatever platform they are on? All of
> these platforms are:
>
> 1. Voluntary to use
> 2. Have their own Terms of Use and complaint departments
> 3. If it is abuse there are laws
>
> I agree that within Postgresql.org we must have a professional code of
> conduct but the idea that an arbitrary committee appointed by an unelected
> board can decide the fate of a community member based on actions outside of
> the community is a bit authoritarian don't you think?

The choice of the committee members is hardly arbitrary. Having committee members be appointed by core is more or less consistent with how the community has always dealt with disciplinary issues. The criteria used by core were discussed quite openly.

While the risk that the committee will wield their power in an "authoritarian" way seems very small, it cannot be ruled out entirely. In fact, it hasn't been ruled out by the draft CoC itself. No CoC can possibly provide for every conceivable situation. Somebody has to interpret the rules, and it has to be possible to impose sanctions when the CoC is violated -- otherwise, what's the point?

There are several checks and balances in place, and I for one have confidence in the process as outlined. It's imperfect, but quite a lot better than either the status quo, or a platitude about inclusivity.

-- Peter Geoghegan
Re: Code of Conduct plan
On Fri, Sep 14, 2018 at 10:31 AM, Dimitri Maziuk wrote: > So let me get this straight: you want to have a "sanctioned" way to deny > people access to postgresql community support channel? Yes. > "Because > somebody who may or may not be the same person, allegedly said something > somewhere that some other tweet disagreed with on faceplant"? > > Great plan if you do for-pay postgresql support for the living. You can make your own conclusions about my motivations, just as I'll make my own conclusions about yours. I'm not going to engage with you on either, though. -- Peter Geoghegan
Re: Code of Conduct plan
On Fri, Sep 14, 2018 at 11:06 AM, Dimitri Maziuk wrote: > Personally I would like that. Others might prefer an invitation to > unsubscribe or forever hold their peace, I could live with that too, but > I believe explicit opt-ins are preferable to opt-outs. I think that it's a legitimate position to be opposed to a CoC like this. I also think it's legitimate to feel so strongly about it, on philosophical or political grounds, that you are compelled to avoid participating while subject to the CoC. FWIW, the latter position seems rather extreme to me personally, but I still respect it. In all sincerity, if you're compelled to walk away from participating in mailing list discussions on a point of principle, then I wish you well. That is your right. -- Peter Geoghegan
Re: Convert interval to hours
On Fri, Sep 14, 2018 at 2:42 PM Steven Lembark wrote:
> On Fri, 14 Sep 2018 12:21:14 -0400
> David Gauthier wrote:
>
> > I'm using postgres v9.5.2 on RH6.
>
> PG can convert the times for you.
> For times (not timestamps) you are always better off dealing with
> either time or integer seconds. There are a variety of issues with
> rounding that affect repeatability and accuracy of results using
> floats or doubles. Given that 10 and three are both continuing
> fractions in binary (e.g., 1/10 binary is an infinite series)
> division by 3600 will only cause you annoyance at some point.
>
> If you are subtracting times then you will (usually) end up with
> an interval, which can be cast to seconds in the query and give
> you precise, accurate, repeatable results every time.
>
> e.g.,
>
> select
>   extract
>   (
>     epoch from ( time1 - time2 )::interval
>   )
>   as "seconds",
> ...
>
> is one approach.
>
> In nearly all cases you are better off selecting and converting
> the time in SQL rather than converting the start and end times
> from numeric (time) to string (DBI) and then back from char *
> to float/double or int/unsigned. The character conversion is
> expensive and numeric -> string -> numeric leaves you open to all
> sorts of rounding and conversion issues.
>
> Frankly, if you have to run the query more than once I'd suggest
> adding a view that does the select/convert for you (along with
> dealing with any NULL's that creep into things). PG makes it quite
> easy to add the view and quite inexpensive to apply it.

In the original e-mail, the OP said

> I have code that's getting me an "age" which returns something like...
> "-17 days -08:29:35".

I took that to mean he was beginning with a string, which I suggested to cast to an interval. If he's starting with a different type, then of course the fewer castings the better.

Also, it seems as though you two have had private communication, because I don't see an e-mail where he specified the DB type. Perhaps he also showed more of the source data there.

Pete
Re: pglogical extension. - 2 node master <-> master logical replication ?
On 2019-07-02 19:54, Day, David wrote: > On attempting the reverse subscription from host1 to host2 > > select pglogical.create_subscription('alabama_sub', > 'host=alabama port=5432 dbname=ace_db user=replicator', > '{connections}', false, false, '{}' ) > > could not connect to the postgresql server: FATAL: role "pgsql" does not > exist > DETAIL: dsn was: host=georgia port=5432 dbname=ace_db > > -- > Wrong dsn and role ? > The postgres installed superuser role is not pgsql. > I did not see this issue in the working subscription direction. The problem might be in the dsn that you gave to create_node(). Hard to tell without a fully reproducible script. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
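The error above suggests that the dsn registered for the node on the other host ("dsn was: host=georgia port=5432 dbname=ace_db") carries no user= part, so the connection presumably falls back to a default role (here apparently "pgsql"). A possible fix, sketched with the names from the question (the node name itself is a placeholder), is to re-register that node with an explicit user:

    SELECT pglogical.drop_node(node_name := 'georgia_node');
    SELECT pglogical.create_node(
        node_name := 'georgia_node',
        dsn       := 'host=georgia port=5432 dbname=ace_db user=replicator'
    );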