Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hello.

I wrote a patch implementing COPY that ignores errors in rows, using block
subtransactions.

Syntax: COPY [table] FROM [file/stdin] WITH IGNORE_ERRORS;

Examples:

CREATE TABLE check_ign_err (n int, m int, k int);
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
1 1 1
2 2 2 2
3 3 3
\.
WARNING:  COPY check_ign_err, line 2: "2 2 2 2"
SELECT * FROM check_ign_err;
 n | m | k
---+---+---
 1 | 1 | 1
 3 | 3 | 3
(2 rows)

##

TRUNCATE check_ign_err;
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
1 1 1
2 2
3 3 3
\.
WARNING:  COPY check_ign_err, line 2: "2 2"
SELECT * FROM check_ign_err;
 n | m | k
---+---+---
 1 | 1 | 1
 3 | 3 | 3
(2 rows)

##

TRUNCATE check_ign_err;
COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
1 1 1
2 a 2
3 3 3
\.
WARNING:  COPY check_ign_err, line 2, column m: "a"
SELECT * FROM check_ign_err;
 n | m | k
---+---+---
 1 | 1 | 1
 3 | 3 | 3
(2 rows)

Regards, Damir

Fri, 10 Dec 2021 at 21:48, Pavel Stehule :

> 2014-12-26 11:41 GMT+01:00 Pavel Stehule :
>
>> 2014-12-25 22:23 GMT+01:00 Alex Shulgin :
>>
>>> Trent Shipley writes:
>>>
>>> > On Friday 2007-12-14 16:22, Tom Lane wrote:
>>> >> Neil Conway writes:
>>> >> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
>>> >> > to drop (and log) rows that contain malformed data. That is, rows with
>>> >> > too many or too few columns, rows that result in constraint violations,
>>> >> > and rows containing columns where the data type's input function raises
>>> >> > an error. The last case is the only thing that would be a bit tricky to
>>> >> > implement, I think: you could use PG_TRY() around the InputFunctionCall,
>>> >> > but I guess you'd need a subtransaction to ensure that you reset your
>>> >> > state correctly after catching an error.
>>> >>
>>> >> Yeah. It's the subtransaction per row that's daunting --- not only the
>>> >> cycles spent for that, but the ensuing limitation to 4G rows imported
>>> >> per COPY.
>>> >
>>> > You could extend the COPY FROM syntax with a COMMIT EVERY n clause. This
>>> > would help with the 4G subtransaction limit. The cost to the ETL process
>>> > is that a simple rollback would not be guaranteed to send the process
>>> > back to its initial state. There are easy ways to deal with the rollback
>>> > issue though.
>>> >
>>> > A {NO} RETRY {USING algorithm} clause might be useful. If the NO RETRY
>>> > option is selected then the COPY FROM can run without subtransactions
>>> > and in excess of the 4G per transaction limit. NO RETRY should be the
>>> > default since it preserves the legacy behavior of COPY FROM.
>>> >
>>> > You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not
>>> > give the option of sending exceptions to a table since they are
>>> > presumably malformed, otherwise they would not be exceptions. (Users
>>> > should re-process exception files if they want an "if good then table a
>>> > else exception to table b" ...)
>>> >
>>> > EXCEPTIONS TO and NO RETRY would be mutually exclusive.
>>> >
>>> >> If we could somehow only do a subtransaction per failure, things would
>>> >> be much better, but I don't see how.
>>>
>>> Hello,
>>>
>>> Attached is a proof of concept patch for this TODO item. There are no
>>> docs yet, I just wanted to know if the approach is sane.
>>>
>>> The added syntax is like the following:
>>>
>>> COPY [table] FROM [file/program/stdin] EXCEPTIONS TO [file or stdout]
>>>
>>> The way it's done is by abusing Copy Both mode and, from my limited
>>> testing, that seems to just work. The error trapping itself is done
>>> using PG_TRY/PG_CATCH and can only catch formatting or before-insert
>>> trigger errors; no attempt is made to recover from a failed unique
>>> constraint, etc.
>>>
>>> Example in action:
>>>
>>> postgres=# \d test_copy2
>>>     Table "public.test_copy2"
>>>  Column |  Type   | Modifiers
>>> --------+---------+-----------
>>>  id     | integer |
>>>  val    | integer |
>>>
>>> postgres=# copy test_copy2 from program 'seq 3' exceptions to stdout;
>>> 1
>>> NOTICE:  missing data for column "val"
>>> CONTEXT:  COPY test_copy2, line 1: "1"
>>> 2
>>> NOTICE:  missing data for column "val"
>>> CONTEXT:  COPY test_copy2, line 2: "2"
>>> 3
>>> NOTICE:  missing data for column "val"
>>> CONTEXT:  COPY test_copy2, line 3: "3"
>>> NOTICE:  total exceptions ignored: 3
>>>
>>> postgres=# \d test_copy1
>>>     Table "public.test_copy1"
>>>  Column |  Type   | Modifiers
>>> --------+---------+-----------
>>>  id     | integer | not null
>>>
>>> postgres=# set client_min_messages to warning;
>>> SET
>>> postgres=# copy test_copy1 from program 'ls /proc' exceptions to stdout;
>>> ...
>>> vmstat
>>> zoneinfo
>>> postgres=#
>>>
>>> Limited performance testing shows no significant difference between
>>> error-catching and plain code path. For example, timing
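A minimal sketch of the per-row error trapping discussed in this thread - PG_TRY()
around the row-input step inside a subtransaction, so that a failure rolls back only
that row - is shown below. It is illustrative only and is not either of the posted
patches; it assumes it runs inside COPY FROM's row loop, where cstate, econtext and
myslot are the usual copy state, expression context and tuple slot, and the
NextCopyFrom() call is the one found in recent releases.

	/*
	 * Illustrative sketch only.  Read one row inside its own subtransaction
	 * so that a bad row can be skipped instead of aborting the whole COPY.
	 */
	MemoryContext oldcontext = CurrentMemoryContext;
	ResourceOwner oldowner = CurrentResourceOwner;
	bool		ok;

	BeginInternalSubTransaction(NULL);
	MemoryContextSwitchTo(oldcontext);

	PG_TRY();
	{
		ok = NextCopyFrom(cstate, econtext,
						  myslot->tts_values, myslot->tts_isnull);
		ReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
	}
	PG_CATCH();
	{
		/* Report the failure, then roll back only this row's subtransaction. */
		ErrorData  *edata;

		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();
		FlushErrorState();
		ereport(WARNING,
				(errmsg("skipping row: %s", edata->message)));
		FreeErrorData(edata);

		RollbackAndReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
		ok = false;
	}
	PG_END_TRY();

The per-row cost of this pattern is exactly the overhead Tom Lane's 2007 message
warns about, which is what motivates the batching variants discussed later in the
thread.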
Re: Addition of --no-sync to pg_upgrade for test speedup
On Fri, Dec 17, 2021 at 09:47:05AM -0500, Bruce Momjian wrote:
> On Fri, Dec 17, 2021 at 10:21:04AM +0100, Peter Eisentraut wrote:
>> I think that is reasonable.

Thanks.  I have applied that, as that really helped here.

>> Maybe we could have some global option, like some environment variable,
>> that enables the "sync" mode in all tests, so it's easy to test that once
>> in a while.  Not really a requirement for your patch, but an idea in case
>> this is a concern.
>
> Yes, I think it would be good to see all the places we might want to
> pass the no-sync option.

The remaining places in src/bin/ that I can see are pg_resetwal, where we
would fsync() a WAL segment full of zeros, and pg_recvlogical's
OutputFsync(), which does not point to much data, I guess.  The first one
may be worth it, but that's just 16MB we are talking about, and
WriteEmptyXLOG() is not a code path currently taken by the tests.

We could introduce a new environment variable if one wishes to enforce
those flushes, say PG_TEST_SYNC, on top of patching any TAP test that has
a --no-sync to filter it out.
--
Michael
Refactoring of compression options in pg_basebackup
Hi all,
(Added Georgios in CC.)

When working on the support of LZ4 for pg_receivewal, walmethods.c gained an
extra parameter for the compression method.  It is used by the
DirectoryMethod, instead of the compression level, to decide which type of
compression is applied.  One thing that I left out during this previous work
is that the TarMethod also gained knowledge of this compression method, but
we still use the compression level to check if tars should be compressed or
not.

This is wrong in multiple respects.  First, it is not consistent with the
directory method, making walmethods.c harder to figure out.  Second, it is
not extensible if we want to introduce more compression methods in
pg_basebackup, like LZ4.  This reflects on the options used by pg_receivewal
and pg_basebackup, which are not consistent either.

The attached patch refactors the code of pg_basebackup and the TarMethod of
walmethods.c to use the compression method where it should, entirely
splitting out the logic related to the compression level.  This is one step
toward the introduction of LZ4 in pg_basebackup, but this refactoring is
worth doing on its own, hence a separate thread to deal with this problem
first.

The options of pg_basebackup are reworked to be consistent with
pg_receivewal, as follows:
- --compress now ranges from 1 to 9, instead of 0 to 9.
- --compression-method={none,gzip} is added; the default is none, same as
  HEAD.
- --gzip/-z has the same meaning as before, being just a synonym of
  --compression-method=gzip, with the default compression level of ZLIB
  assigned if there is no --compress.

One more thing that I have noticed while hacking on this is that we have no
regression tests for gzip with pg_basebackup, so I have added some that are
skipped when the code is not compiled with ZLIB.

Opinions?
-- Michael From 6457eae3577cd0074f58797c4fa420c296d68b39 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Sat, 18 Dec 2021 20:27:30 +0900 Subject: [PATCH] Refactor options of pg_basebackup for compression level and methods --- src/bin/pg_basebackup/pg_basebackup.c| 79 +++- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 47 +++- src/bin/pg_basebackup/walmethods.c | 47 +++- doc/src/sgml/ref/pg_basebackup.sgml | 22 +- 4 files changed, 155 insertions(+), 40 deletions(-) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 1739ac6382..01529aa036 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -113,6 +113,7 @@ static bool showprogress = false; static bool estimatesize = true; static int verbose = 0; static int compresslevel = 0; +static WalCompressionMethod compression_method = COMPRESSION_NONE; static IncludeWal includewal = STREAM_WAL; static bool fastcheckpoint = false; static bool writerecoveryconf = false; @@ -359,7 +360,9 @@ usage(void) printf(_(" -X, --wal-method=none|fetch|stream\n" " include required WAL files with specified method\n")); printf(_(" -z, --gzip compress tar output\n")); - printf(_(" -Z, --compress=0-9 compress tar output with given compression level\n")); + printf(_(" -Z, --compress=1-9 compress tar output with given compression level\n")); + printf(_(" --compression-method=METHOD\n" + " method to compress data\n")); printf(_("\nGeneral options:\n")); printf(_(" -c, --checkpoint=fast|spread\n" " set fast or spread checkpointing\n")); @@ -524,7 +527,7 @@ LogStreamerMain(logstreamer_param *param) stream.do_sync); else stream.walmethod = CreateWalTarMethod(param->xlog, - COMPRESSION_NONE, /* ignored */ + compression_method, compresslevel, stream.do_sync); @@ -1034,19 +1037,22 @@ CreateBackupStreamer(char *archive_name, char *spclocation, archive_file = NULL; } + if (compression_method == COMPRESSION_NONE) + streamer = bbstreamer_plain_writer_new(archive_filename, + archive_file); #ifdef HAVE_LIBZ - if (compresslevel != 0) + else if (compression_method == COMPRESSION_GZIP) { strlcat(archive_filename, ".gz", sizeof(archive_filename)); streamer = bbstreamer_gzip_writer_new(archive_filename, archive_file, compresslevel); } - else #endif - streamer = bbstreamer_plain_writer_new(archive_filename, - archive_file); - + else + { + Assert(false); /* not reachable */ + } /* * If we need to parse the archive for whatever reason, then we'll @@ -1765,6 +1771,7 @@ main(int argc, char **argv) {"no-manifest", no_argument, NULL, 5}, {"manifest-force-encode", no_argument, NULL, 6}, {"manifest-checksums", required_argument, NULL, 7}, + {"compression-method", required_argument, NULL, 8}, {NULL, 0, NULL, 0} }; int c; @@ -1872,14 +1879,10 @@ main
Re: Adding CI to our tree
On 12/17/21 14:34, Andres Freund wrote:
> Hi,
>
> On 2021-12-17 09:36:05 -0500, Tom Lane wrote:
>> Andrew Dunstan writes:
>>> Maybe I have missed it, but why are we using ccache here? That seems a
>>> bit pointless in an ephemeral instance.
>> I believe Munro's cfbot tooling is able to save and re-use ccache
>> across successive instantiations of a build instance.  I've not
>> looked at this code, but if it can do that there'd be point to it.
> Yes, the ccache cache is persisted across runs (see the *_cache and
> upload_cache instructions). It makes a quite substantial difference. One
> reason the windows runs are a lot slower than the others is just that
> visual studio isn't yet supported by ccache, and that there don't seem to
> be good alternative tools.
>
> The ccache maintainers merged more of the msvc support last weekend! So I
> have quite a bit of hope of being able to use ccache there as well.

Ok.  I have had to disable ccache for fairywren (msys2) because it caused
serious instability.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Column Filtering in Logical Replication
On 2021-Dec-18, Tomas Vondra wrote:

> On 12/18/21 02:34, Alvaro Herrera wrote:
>> On 2021-Dec-17, Tomas Vondra wrote:
>>>> If the server has a *separate* security mechanism to hide the
>>>> columns (per-column privs), it is that feature that will protect
>>>> the data, not the logical-replication-feature to filter out
>>>> columns.
>>>
>>> Right. Although I haven't thought about how logical decoding
>>> interacts with column privileges. I don't think logical decoding
>>> actually checks column privileges - I certainly don't recall any
>>> ACL checks in src/backend/replication ...
>>
>> Well, in practice if you're confronted with a replica that's
>> controlled by a malicious user that can tweak its behavior, then
>> replica-side privilege checking won't do anything useful.
>
> I don't follow. Surely the decoding happens on the primary node,
> right? Which is where the ACL checks would happen, using the role the
> replication connection is opened with.

I think you do follow.  Yes, the decoding happens on the primary node, and
the security checks should occur on the primary node, because to do
otherwise is folly(*).  Which means that column filtering, being a
replica-side feature, is *not* a security feature.  I was mistaken about
it, is all.  If you want security, you need to use column-level privileges,
as you say.

(*) The checks *must* occur on the primary side, because the primary does
not control the code that runs on the replica side.  The primary must
treat the replica as running potentially hostile code.  Trying to defend
against that is not practical.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: sequences vs. synchronous replication
On 12/18/21 07:00, Tomas Vondra wrote:
> On 12/18/21 05:52, Tom Lane wrote:
>> Tomas Vondra writes:
>>> The problem is exactly the same as in [1] - the aborted transaction
>>> generated WAL, but RecordTransactionAbort() ignores that and does not
>>> update LogwrtResult.Write, with the reasoning that aborted transactions
>>> do not matter. But sequences violate that, because we only write WAL
>>> once every 32 increments, so the following nextval() gets "committed"
>>> without waiting for the replica (because it did not produce WAL).
>>
>> Ugh. I'm not sure this is a clear data corruption bug, but it surely
>> walks and quacks like one.
>>
>>> My proposal is to fix this by tracking the LSN of the last WAL-logged
>>> sequence increment, and then checking that LSN in
>>> RecordTransactionCommit() before calling XLogFlush().
>>
>> (1) Does that work if the aborted increment was in a different session?
>> I think it is okay but I'm tired enough to not be sure.
>
> Good point - it doesn't :-( At least not by simply storing the LSN in a
> global variable or something like that. The second backend needs to know
> the LSN of the last WAL-logged sequence increment, but only the first
> backend knows that. So we'd need to share that between backends somehow.
>
> I doubt we want to track the LSN for every individual sequence (because
> for clusters with many dbs / sequences that may be a lot). Perhaps we
> could track just a fixed number of LSN values in shared memory (say,
> 1024), and update/read just the element determined by hash(oid). That is,
> the backend WAL-logging a sequence with a given oid would set the current
> LSN in array[hash(oid) % 1024], and a backend doing nextval() would simply
> remember the LSN in that slot. Yes, if there are conflicts that'll flush
> more than needed.

Here's a PoC demonstrating this idea. I'm not convinced it's the right way
to deal with this - it surely seems more like a duct tape fix than a clean
solution. But it does the trick.

I wonder if storing this in shmem is good enough - we lose the LSN info on
restart, but the checkpoint should trigger FPI which makes it OK.

A bigger question is whether sequences are the only thing affected by this.
If you look at RecordTransactionCommit() then we skip flush/wait in two
cases:

1) !wrote_xlog - if the xact did not produce WAL

2) !markXidCommitted - if the xact does not have a valid XID

Both apply to sequences, and the PoC patch tweaks them. But maybe there are
other places where we don't generate WAL and/or assign an XID in some
cases, to save time?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e7b0bc804d..c6a0c07846 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_enum.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
+#include "commands/sequence.h"
 #include "commands/tablecmds.h"
 #include "commands/trigger.h"
 #include "common/pg_prng.h"
@@ -1289,6 +1290,13 @@ RecordTransactionCommit(void)
 	bool		RelcacheInitFileInval = false;
 	bool		wrote_xlog;
 
+	/*
+	 * Force flush and synchronous replication even if no XID was assigned.
+	 * This is needed when depending on WAL produced by another transaction
+	 * (possibly in a different backend). Sequences need this, for example.
+	 */
+	bool		force_sync = false;
+
 	/*
 	 * Log pending invalidations for logical decoding of in-progress
 	 * transactions.  Normally for DDLs, we log this at each command end,
@@ -1299,6 +1307,24 @@ RecordTransactionCommit(void)
 	if (XLogLogicalInfoActive())
 		LogLogicalInvalidations();
 
+	/*
+	 * Check the LSN of increments for sequences we touched in this transaction.
+	 * If it's higher than XactLastRecEnd, we need to force flush/sync.
+	 */
+	{
+		/* Separate call, so that we can compare to XactLastRecEnd. */
+		XLogRecPtr	tmpptr = SequenceGetLastLSN();
+
+		/*
+		 * If higher than XactLastRecEnd, make sure we flush even without
+		 * a XID assigned.
+		 */
+		force_sync = (tmpptr > XactLastRecEnd);
+
+		/* Override the XactLastRecEnd value. */
+		XactLastRecEnd = Max(XactLastRecEnd, tmpptr);
+	}
+
 	/* Get data needed for commit record */
 	nrels = smgrGetPendingDeletes(true, &rels);
 	nchildren = xactGetCommittedChildren(&children);
@@ -1446,7 +1472,7 @@ RecordTransactionCommit(void)
 	 * if all to-be-deleted tables are temporary though, since they are lost
 	 * anyway if we crash.)
 	 */
-	if ((wrote_xlog && markXidCommitted &&
+	if ((wrote_xlog && (markXidCommitted || force_sync) &&
 		 synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
 		forceSyncCommit || nrels > 0)
 	{
@@ -1504,7 +1530,7 @@ RecordTransactionCommit(void)
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	if (wrote_xlog && markXidCommitted)
+	if (wrote_xlog && (markXidCommitted || force_sync))
 		SyncRepWaitForLSN(XactLastRecEnd
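To make the shared-array idea from the quoted message concrete, a minimal sketch
follows.  The names (SEQ_LSN_SLOTS, seq_last_lsn, seq_lsn_lock, seq_slot_set_lsn,
seq_slot_get_lsn) are invented for this example, locking is reduced to a single
spinlock, and hash(oid) is stood in for by a plain modulo; this is not the attached
PoC patch.

	/* Illustrative sketch only -- names invented, not the attached PoC. */
	#define SEQ_LSN_SLOTS	1024

	/* Both of these would live in shared memory in a real implementation. */
	static XLogRecPtr *seq_last_lsn;	/* SEQ_LSN_SLOTS entries */
	static slock_t *seq_lsn_lock;

	/* Called right after WAL-logging an increment of sequence "seqoid". */
	static void
	seq_slot_set_lsn(Oid seqoid, XLogRecPtr lsn)
	{
		int			slot = seqoid % SEQ_LSN_SLOTS;	/* stand-in for hash(oid) */

		SpinLockAcquire(seq_lsn_lock);
		if (seq_last_lsn[slot] < lsn)
			seq_last_lsn[slot] = lsn;
		SpinLockRelease(seq_lsn_lock);
	}

	/* Called from nextval(); the caller remembers the result for commit time. */
	static XLogRecPtr
	seq_slot_get_lsn(Oid seqoid)
	{
		int			slot = seqoid % SEQ_LSN_SLOTS;
		XLogRecPtr	lsn;

		SpinLockAcquire(seq_lsn_lock);
		lsn = seq_last_lsn[slot];
		SpinLockRelease(seq_lsn_lock);

		return lsn;
	}

RecordTransactionCommit() would then compare the remembered value against
XactLastRecEnd, much as the force_sync logic in the patch above does; hash
collisions can only cause an unnecessary flush, never a missed one.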
Is my home $HOME or is it getpwent()->pw_dir ?
Hi,

I sometimes do some testing as nobody, on a distro where
getpwent(nobody)->pw_dir is a directory that nobody can't write. So I end up
setting $HOME to a directory that, um, is writable.

When I start psql, strace shows $HOME being honored when looking for
.terminfo and .inputrc, and getpwent()->pw_dir being used to look for
.pgpass, .psqlrc, and .psql_history, which of course aren't there. I'm sure
the .terminfo and .inputrc lookups are being done by library code.

In my experience, it seems traditionally unixy to let $HOME take precedence.
Maybe things that are pointedly cross-platform are more likely to rely on
the getpwent lookup. I run into the same issue with Java, which is pointedly
cross-platform. But there, I can alias java to java -Duser.home="$HOME" and
all is well.

Would a patch be acceptable for psql to allow such an option on the command
line? I assume that would be more acceptable than just changing the default
behavior.

And if so, would it be preferable to add a whole new option for it,
(--home ?) or, analogously to the way java works, just to add a HOME
variable so it can be set on the command line with -v ? Or would a name like
HOME pose too much risk that somebody is using such a variable in psql
scripts for unrelated purposes? In a moment of hopefulness I tried \set and
looked to see if such a thing already exists, but I didn't see it.

I see that I can set a HISTFILE variable (or set PSQL_HISTORY in the
environment), and can set PSQLRC in the environment (but not as a variable),
and nothing can set the .pgpass location. One HOME variable could take care
of all three in one foop. (Or could it? Perhaps .pgpass is handled in libpq
at a layer unaware of psql variables? But maybe the variable could have a
modify event that alerts libpq.)

Regards,
-Chap
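For what it's worth, the precedence being asked for amounts to something like the
following helper: prefer $HOME when it is set and non-empty, and fall back to the
passwd entry otherwise. The function name is invented here and this is not psql's
current code, just a sketch of the proposed behavior.

	#include <pwd.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	/* Sketch only: resolve the home directory, letting $HOME take precedence. */
	static int
	get_effective_home(char *buf, size_t buflen)
	{
		const char *home = getenv("HOME");

		if (home == NULL || home[0] == '\0')
		{
			struct passwd *pw = getpwuid(getuid());

			if (pw == NULL)
				return 0;		/* no home directory found */
			home = pw->pw_dir;
		}

		snprintf(buf, buflen, "%s", home);
		return 1;
	}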
Re: Is my home $HOME or is it getpwent()->pw_dir ?
On 12/18/21 15:57, Chapman Flack wrote:
> I see that I can set a HISTFILE variable (or set PSQL_HISTORY in the
> environment), and can set PSQLRC in the environment (but not as a
> variable), and nothing can set the .pgpass location

well, not in the psql docs, but in the environment variable section
for libpq I do see a PGPASSFILE.

-C
Re: Is my home $HOME or is it getpwent()->pw_dir ?
On Sat, Dec 18, 2021 at 2:07 PM Chapman Flack wrote:
> On 12/18/21 15:57, Chapman Flack wrote:
>> I see that I can set a HISTFILE variable (or set PSQL_HISTORY in the
>> environment), and can set PSQLRC in the environment (but not as a
>> variable), and nothing can set the .pgpass location
>
> well, not in the psql docs, but in the environment variable section
> for libpq I do see a PGPASSFILE.

psql docs saith:

"This utility, like most other PostgreSQL utilities, also uses the
environment variables supported by libpq (see Section 34.15)."

David J.
Re: Is my home $HOME or is it getpwent()->pw_dir ?
On 12/18/21 16:16, David G. Johnston wrote:
> psql docs saith:
>
> "This utility, like most other PostgreSQL utilities, also uses the
> environment variables supported by libpq (see Section 34.15)."

I'm sure that's adequate as far as that goes. I just happened to miss it
when composing the longer email (and then I just thought "I bet there are
environment variables supported by libpq" and looked there).

Regards,
-Chap
Re: sequences vs. synchronous replication
Tomas Vondra writes:
> Here's a PoC demonstrating this idea. I'm not convinced it's the right
> way to deal with this - it surely seems more like a duct tape fix than a
> clean solution. But it does the trick.

I was imagining something a whole lot simpler, like "don't try to cache
unused sequence numbers when wal_level > minimal".  We've accepted worse
performance hits in that operating mode, and it'd fix a number of user
complaints we've seen about weird sequence behavior on standbys.

			regards, tom lane
Re: sequences vs. synchronous replication
On 12/18/21 22:27, Tom Lane wrote:
> Tomas Vondra writes:
>> Here's a PoC demonstrating this idea. I'm not convinced it's the right
>> way to deal with this - it surely seems more like a duct tape fix than a
>> clean solution. But it does the trick.
>
> I was imagining something a whole lot simpler, like "don't try to cache
> unused sequence numbers when wal_level > minimal".  We've accepted worse
> performance hits in that operating mode, and it'd fix a number of user
> complaints we've seen about weird sequence behavior on standbys.

What do you mean by "not caching unused sequence numbers"?  Reducing
SEQ_LOG_VALS to 1, i.e. WAL-logging every sequence increment?

That'd work, but I wonder how significant the impact will be.  I'd bet it
hurts the patch adding logical decoding of sequences quite a bit.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
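For readers following along, the knob being discussed lives in nextval_internal()
in src/backend/commands/sequence.c: a WAL record is only emitted once the
pre-logged allowance of values has been used up. A conceptual sketch of what Tom
suggests, not actual sequence.c code (SEQ_LOG_VALS is a private #define there,
currently 32, and the helper below is invented):

	/*
	 * Conceptual sketch only.  Shrinking the pre-logged allowance when WAL
	 * matters for replication means (nearly) every nextval() call emits its
	 * own WAL record before the value can be handed out.
	 */
	static int64
	sequence_prelog_amount(void)
	{
		if (wal_level > WAL_LEVEL_MINIMAL)
			return 1;			/* WAL-log each increment */

		return SEQ_LOG_VALS;	/* current behavior: pre-log 32 values */
	}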
Re: sequences vs. synchronous replication
Tomas Vondra writes:
> What do you mean by "not caching unused sequence numbers"?  Reducing
> SEQ_LOG_VALS to 1, i.e. WAL-logging every sequence increment?

Right.

> That'd work, but I wonder how significant the impact will be.

As I said, we've accepted worse in order to have stable replication
behavior.

			regards, tom lane
Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)
Remove bogus ACL check for AMs. Rebased on cf0cab868. Use ForkNumber rather than int. Update comments and commit message. Also move the Size column of \l and \dt >From a1d33780e4c61a7bdb64ee6ab5c248e1175a3efd Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 13 Jul 2021 21:25:48 -0500 Subject: [PATCH v4 1/4] Add pg_am_size(), pg_namespace_size() .. See also: 358a897fa, 528ac10c7 --- src/backend/utils/adt/dbsize.c | 130 src/include/catalog/pg_proc.dat | 19 + 2 files changed, 149 insertions(+) diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c index d5a7fb13f3..908432ba28 100644 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -13,18 +13,23 @@ #include +#include "access/genam.h" #include "access/htup_details.h" #include "access/relation.h" +#include "access/table.h" #include "catalog/catalog.h" #include "catalog/namespace.h" #include "catalog/pg_authid.h" #include "catalog/pg_tablespace.h" #include "commands/dbcommands.h" +#include "commands/defrem.h" #include "commands/tablespace.h" #include "miscadmin.h" #include "storage/fd.h" #include "utils/acl.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" +#include "utils/lsyscache.h" #include "utils/numeric.h" #include "utils/rel.h" #include "utils/relfilenodemap.h" @@ -832,6 +837,131 @@ pg_size_bytes(PG_FUNCTION_ARGS) PG_RETURN_INT64(result); } +/* + * Return the sum of size of relations for which the given attribute of + * pg_class matches the specified OID value. + */ +static int64 +calculate_size_attvalue(int attnum, Oid attval) +{ + int64 totalsize = 0; + ScanKeyData skey; + Relation pg_class; + SysScanDesc scan; + HeapTuple tuple; + + ScanKeyInit(&skey, attnum, + BTEqualStrategyNumber, F_OIDEQ, attval); + + pg_class = table_open(RelationRelationId, AccessShareLock); + scan = systable_beginscan(pg_class, InvalidOid, false, NULL, 1, &skey); + while (HeapTupleIsValid(tuple = systable_getnext(scan))) + { + Relation rel; + Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple); + + rel = try_relation_open(classtuple->oid, AccessShareLock); + if (!rel) + continue; + + for (ForkNumber forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++) + totalsize += calculate_relation_size(&rel->rd_node, rel->rd_backend, forkNum); + + relation_close(rel, AccessShareLock); + } + + systable_endscan(scan); + table_close(pg_class, AccessShareLock); + return totalsize; +} + +/* Compute the size of relations in a schema (namespace) */ +static int64 +calculate_namespace_size(Oid nspOid) +{ + /* + * User must be a member of pg_read_all_stats or have CREATE privilege for + * target namespace. 
+ */ + if (!is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS)) + { + AclResult aclresult; + aclresult = pg_namespace_aclcheck(nspOid, GetUserId(), ACL_CREATE); + if (aclresult != ACLCHECK_OK) + aclcheck_error(aclresult, OBJECT_SCHEMA, + get_namespace_name(nspOid)); + } + + return calculate_size_attvalue(Anum_pg_class_relnamespace, nspOid); +} + +Datum +pg_namespace_size_oid(PG_FUNCTION_ARGS) +{ + int64 size; + Oid nspOid = PG_GETARG_OID(0); + + size = calculate_namespace_size(nspOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_INT64(size); +} + +Datum +pg_namespace_size_name(PG_FUNCTION_ARGS) +{ + int64 size; + Name nspName = PG_GETARG_NAME(0); + Oid nspOid = get_namespace_oid(NameStr(*nspName), false); + + size = calculate_namespace_size(nspOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_INT64(size); +} + +/* Compute the size of relations using the given access method */ +static int64 +calculate_am_size(Oid amOid) +{ + /* XXX acl_check? */ + + return calculate_size_attvalue(Anum_pg_class_relam, amOid); +} + +Datum +pg_am_size_oid(PG_FUNCTION_ARGS) +{ + int64 size; + Oid amOid = PG_GETARG_OID(0); + + size = calculate_am_size(amOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_INT64(size); +} + +Datum +pg_am_size_name(PG_FUNCTION_ARGS) +{ + int64 size; + Name amName = PG_GETARG_NAME(0); + Oid amOid = get_am_oid(NameStr(*amName), false); + + size = calculate_am_size(amOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_INT64(size); +} + /* * Get the filenode of a relation * diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 4d992dc224..ff08b2ac01 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7230,6 +7230,25 @@ descr => 'total disk space usage for the specified tablespace', proname => 'pg_tablespace_size', provolatile => 'v', prorettype => 'int8', proargtypes => 'name', prosrc => 'pg_tablespace_size_name' }, + +{ oid => '9410', + descr => 'total disk space usage for the specified namespace', + proname => 'pg_namespace_size', provolatile => 'v', prorettype => 'int8', + proargtypes => 'oid', prosrc => 'pg_namespace_size_oid' }, +{ oid => '9411', + descr => 'total disk space usage for the spe
Getting rid of regression test input/ and output/ files
I did some desultory investigation of the idea proposed at [1] that we
should refactor the regression test scripts to try to reduce their
interdependencies.  I soon realized that one of the stumbling blocks to this
is that we've tended to concentrate data-loading COPY commands, as well as C
function creation commands, into a few files to reduce the notational cruft
of substituting path names and the like into the test scripts.  That is, we
don't want to have even more scripts that have to be translated from
input/foo.source and output/foo.source into runnable scripts and test
results.

This led me to wonder why we couldn't get rid of that entire mechanism in
favor of some less-painful way of getting that information into the scripts.
If we had the desired values in psql variables, we could do what we need
easily, for example instead of

	CREATE FUNCTION check_primary_key ()
	RETURNS trigger
	AS '@libdir@/refint@DLSUFFIX@'
	LANGUAGE C;

something like

	CREATE FUNCTION check_primary_key ()
	RETURNS trigger
	AS :'LIBDIR'
	   '/refint'
	   :'DLSUFFIX'
	LANGUAGE C;

(The extra line breaks are needed to convince SQL that the adjacent string
literals should be concatenated.  We couldn't have done this so easily
before psql had the :'variable' notation, but that came in in 9.0.)

I see two ways we could get the info from pg_regress into psql variables:

1. Add "-v VARIABLE=VALUE" switches to the psql invocations.  This requires
no new psql capability, but it does introduce the problem of getting correct
shell quoting of the values.  I think we'd need to either duplicate
appendShellString in pg_regress.c, or start linking both libpq and
libpgfeutils.a into pg_regress to be able to use appendShellString itself.
In the past we've not wanted to link libpq into pg_regress (though I admit
I've forgotten the argument for not doing so).

2. Export the values from pg_regress as environment variables, and then add
a way for the test scripts to read those variables.  I was a bit surprised
to realize that we didn't have any way to do that already --- psql has
\setenv, so why did we never invent \getenv?

On the whole I prefer #2, as it seems cleaner and it adds some actually
useful-to-end-users psql functionality.  Attached is a really incomplete,
quick-n-dirty POC showing that this can be made to work.  If there aren't
objections or better ideas, I'll see about fleshing this out.
regards, tom lane [1] https://www.postgresql.org/message-id/20211217182518.GA2529654%40rfd.leadboat.com diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index ccd7b48108..fb3bab9494 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -98,6 +98,8 @@ static backslashResult process_command_g_options(char *first_option, bool active_branch, const char *cmd); static backslashResult exec_command_gdesc(PsqlScanState scan_state, bool active_branch); +static backslashResult exec_command_getenv(PsqlScanState scan_state, bool active_branch, + const char *cmd); static backslashResult exec_command_gexec(PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_gset(PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_help(PsqlScanState scan_state, bool active_branch); @@ -348,6 +350,8 @@ exec_command(const char *cmd, status = exec_command_g(scan_state, active_branch, cmd); else if (strcmp(cmd, "gdesc") == 0) status = exec_command_gdesc(scan_state, active_branch); + else if (strcmp(cmd, "getenv") == 0) + status = exec_command_getenv(scan_state, active_branch, cmd); else if (strcmp(cmd, "gexec") == 0) status = exec_command_gexec(scan_state, active_branch); else if (strcmp(cmd, "gset") == 0) @@ -1481,6 +1485,43 @@ exec_command_gdesc(PsqlScanState scan_state, bool active_branch) return status; } +/* + * \getenv -- set variable from environment variable + */ +static backslashResult +exec_command_getenv(PsqlScanState scan_state, bool active_branch, + const char *cmd) +{ + bool success = true; + + if (active_branch) + { + char *myvar = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, false); + char *envvar = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, false); + + if (!myvar || !envvar) + { + pg_log_error("\\%s: missing required argument", cmd); + success = false; + } + else + { + char *envval = getenv(envvar); + + if (envval && !SetVariable(pset.vars, myvar, envval)) +success = false; + } + free(myvar); + free(envvar); + } + else + ignore_slash_options(scan_state); + + return success ? PSQL_CMD_SKIP_LINE : PSQL_CMD_ERROR; +} + /* * \gexec -- send query and execute each field of result */ diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source index 8acb516801..ad42d02a22 100644 --- a/src/test/regress/input/copy.source +++ b/src/test/regress/i
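For completeness, approach #1 described above would look roughly like the sketch
below in pg_regress, assuming pg_regress were linked with libpgfeutils so that
appendShellString() is available; add_psql_variable() is an invented helper name
and this is not part of the attached POC.

	#include "fe_utils/string_utils.h"	/* appendShellString() */
	#include "pqexpbuffer.h"

	/*
	 * Sketch: append a -v NAME=VALUE switch to the psql command line,
	 * letting appendShellString() take care of shell quoting of the value.
	 */
	static void
	add_psql_variable(PQExpBuffer cmd, const char *name, const char *value)
	{
		appendPQExpBuffer(cmd, " -v %s=", name);
		appendShellString(cmd, value);
	}

	/* e.g. add_psql_variable(cmd, "LIBDIR", libdir); */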
Re: should we document an example to set multiple libraries in shared_preload_libraries?
On Thu, Dec 9, 2021 at 7:42 AM Bharath Rupireddy wrote:
> How about ALTER SYSTEM SET shared_preload_libraries command itself
> checking for availability of the specified libraries (after fetching
> library names from the parsed string value) with stat() and then
> report an error if any of the libraries doesn't exist? Currently, it
> just accepts if the specified value passes the parsing.

That certainly would have helped me.  I guess it technically breaks the
theoretical use case of "first change the shared_preload_libraries setting,
then install those libraries, then restart Postgres," but I don't see why
anyone would do that in practice.

For what it's worth, I don't even feel strongly that this needs to be an
error - just that the current flow around this is error-prone due to several
sharp edges and could be improved.  I would be happy with an error, but a
warning or other mechanism could work, too.  I do think better documentation
is not enough: the differences between a working setting value and a broken
one are pretty subtle.

Thanks,
Maciek
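To make the suggested check concrete, a sketch follows.  It is illustrative only,
not a patch: the function name is invented, and the path handling is deliberately
simplified (real code would have to run each name through the same $libdir macro
expansion and dynamic_library_path search that library loading uses).

	#include "postgres.h"

	#include <sys/stat.h>

	#include "utils/varlena.h"		/* SplitIdentifierString() */

	/* Sketch: verify each listed library file exists under "libdir". */
	static void
	check_preload_libraries_exist(const char *newval, const char *libdir)
	{
		char	   *rawstring = pstrdup(newval);
		List	   *elemlist;
		ListCell   *lc;

		if (!SplitIdentifierString(rawstring, ',', &elemlist))
			ereport(ERROR,
					(errmsg("invalid list syntax in parameter \"%s\"",
							"shared_preload_libraries")));

		foreach(lc, elemlist)
		{
			char	   *libname = (char *) lfirst(lc);
			char		path[MAXPGPATH];
			struct stat st;

			snprintf(path, sizeof(path), "%s/%s%s", libdir, libname, DLSUFFIX);
			if (stat(path, &st) != 0)
				ereport(ERROR,
						(errmsg("library \"%s\" does not exist", path)));
		}

		list_free(elemlist);
		pfree(rawstring);
	}

Whether this raises an ERROR or merely a WARNING is exactly the policy question
discussed above; the mechanism is the same either way.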
Re: [EXTERNAL] Re: should we document an example to set multiple libraries in shared_preload_libraries?
On Fri, Dec 10, 2021 at 10:10 AM Godfrin, Philippe E wrote:
> I may have missed something in this stream, but is this a system
> controlled by Patroni?

In my case, no: it's a pretty vanilla PGDG install on Ubuntu 20.04.  Thanks
for the context, though.

Thanks,
Maciek
Re: sequences vs. synchronous replication
On Sat, Dec 18, 2021 at 7:24 AM Tomas Vondra wrote:
>
> while working on logical decoding of sequences, I ran into an issue with
> nextval() in a transaction that rolls back, described in [1]. But after
> thinking about it a bit more (and chatting with Petr Jelinek), I think
> this issue affects physical sync replication too.
>
> Imagine you have a primary <-> sync_replica cluster, and you do this:
>
>    CREATE SEQUENCE s;
>
>    -- shutdown the sync replica
>
>    BEGIN;
>    SELECT nextval('s') FROM generate_series(1,50);
>    ROLLBACK;
>
>    BEGIN;
>    SELECT nextval('s');
>    COMMIT;
>
> The natural expectation would be the COMMIT gets stuck, waiting for the
> sync replica (which is not running), right? But it does not.
>

How about if we always WAL log the first sequence change in a transaction?

--
With Regards,
Amit Kapila.
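A sketch of that suggestion, purely illustrative: the flag, its reset point, and
AtStart_Sequences() are invented names, while logit mirrors the local variable
used in nextval_internal().

	/* Sketch only: force a WAL record for the first sequence change per xact. */
	static bool seq_logged_in_xact = false;

	/* In nextval_internal(), when deciding whether to emit WAL: */
	if (!seq_logged_in_xact)
	{
		logit = true;			/* first sequence change in this transaction */
		seq_logged_in_xact = true;
	}

	/* Reset at transaction start, e.g. called from StartTransaction(): */
	static void
	AtStart_Sequences(void)
	{
		seq_logged_in_xact = false;
	}

Whether the resulting WAL alone is enough to make the commit wait for the
synchronous replica is the markXidCommitted question discussed earlier in the
thread.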
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hi

Sat, 18 Dec 2021 at 9:55, Damir Belyalov wrote:

> Hello.
>
> Wrote a patch implementing COPY with ignoring errors in rows using block
> subtransactions.

It is great that you are working on this patch. Unfortunately, I am afraid
this simple design is not optimal. Using a subtransaction for every row has
too big an overhead. I think it should use a subtransaction for blocks of
rows (1000 rows), and only when there is an exception should it replay the
inserts in a subtransaction per row. You should check the performance
overhead.

Regards

Pavel

> Syntax: COPY [table] FROM [file/stdin] WITH IGNORE_ERRORS;
>
> Examples:
> CREATE TABLE check_ign_err (n int, m int, k int);
> COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
> 1 1 1
> 2 2 2 2
> 3 3 3
> \.
> WARNING:  COPY check_ign_err, line 2: "2 2 2 2"
> SELECT * FROM check_ign_err;
>  n | m | k
> ---+---+---
>  1 | 1 | 1
>  3 | 3 | 3
> (2 rows)
>
> ##
>
> TRUNCATE check_ign_err;
> COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
> 1 1 1
> 2 2
> 3 3 3
> \.
> WARNING:  COPY check_ign_err, line 2: "2 2"
> SELECT * FROM check_ign_err;
>  n | m | k
> ---+---+---
>  1 | 1 | 1
>  3 | 3 | 3
> (2 rows)
>
> ##
>
> TRUNCATE check_ign_err;
> COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS;
> 1 1 1
> 2 a 2
> 3 3 3
> \.
> WARNING:  COPY check_ign_err, line 2, column m: "a"
> SELECT * FROM check_ign_err;
>  n | m | k
> ---+---+---
>  1 | 1 | 1
>  3 | 3 | 3
> (2 rows)
>
> Regards, Damir
>
> Fri, 10 Dec 2021 at 21:48, Pavel Stehule :
>
>> 2014-12-26 11:41 GMT+01:00 Pavel Stehule :
>>
>>> 2014-12-25 22:23 GMT+01:00 Alex Shulgin :
>>>
>>> Trent Shipley writes:
>>>
>>> > On Friday 2007-12-14 16:22, Tom Lane wrote:
>>> >> Neil Conway writes:
>>> >> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
>>> >> > to drop (and log) rows that contain malformed data. That is, rows with
>>> >> > too many or too few columns, rows that result in constraint violations,
>>> >> > and rows containing columns where the data type's input function raises
>>> >> > an error. The last case is the only thing that would be a bit tricky to
>>> >> > implement, I think: you could use PG_TRY() around the InputFunctionCall,
>>> >> > but I guess you'd need a subtransaction to ensure that you reset your
>>> >> > state correctly after catching an error.
>>> >>
>>> >> Yeah. It's the subtransaction per row that's daunting --- not only the
>>> >> cycles spent for that, but the ensuing limitation to 4G rows imported
>>> >> per COPY.
>>> >
>>> > You could extend the COPY FROM syntax with a COMMIT EVERY n clause. This
>>> > would help with the 4G subtransaction limit. The cost to the ETL process
>>> > is that a simple rollback would not be guaranteed to send the process
>>> > back to its initial state. There are easy ways to deal with the rollback
>>> > issue though.
>>> >
>>> > A {NO} RETRY {USING algorithm} clause might be useful. If the NO RETRY
>>> > option is selected then the COPY FROM can run without subtransactions
>>> > and in excess of the 4G per transaction limit. NO RETRY should be the
>>> > default since it preserves the legacy behavior of COPY FROM.
>>> >
>>> > You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not
>>> > give the option of sending exceptions to a table since they are
>>> > presumably malformed, otherwise they would not be exceptions. (Users
>>> > should re-process exception files if they want an "if good then table a
>>> > else exception to table b" ...)
>>> >
>>> > EXCEPTIONS TO and NO RETRY would be mutually exclusive.
>>> >
>>> >> If we could somehow only do a subtransaction per failure, things would
>>> >> be much better, but I don't see how.
>>>
>>> Hello,
>>>
>>> Attached is a proof of concept patch for this TODO item. There are no
>>> docs yet, I just wanted to know if the approach is sane.
>>>
>>> The added syntax is like the following:
>>>
>>> COPY [table] FROM [file/program/stdin] EXCEPTIONS TO [file or stdout]
>>>
>>> The way it's done is by abusing Copy Both mode and, from my limited
>>> testing, that seems to just work. The error trapping itself is done
>>> using PG_TRY/PG_CATCH and can only catch formatting or before-insert
>>> trigger errors; no attempt is made to recover from a failed unique
>>> constraint, etc.
>>>
>>> Example in action:
>>>
>>> postgres=# \d test_copy2
>>>     Table "public.test_copy2"
>>>  Column |  Type   | Modifiers
>>> --------+---------+-----------
>>>  id     | integer |
>>>  val    | integer |
>>>
>>> postgres=# copy test_copy2 from program 'seq 3' exceptions to stdout;
>>> 1
>>> NOTICE:  missing data for column "val"
>>> CONTEXT:  COPY test_copy2, line 1: "1"
>>> 2
>>> NOTICE:  missing data for column "val"
>>> CONTEXT:  COPY test_copy2, line 2: "2"
>>> 3
>>> NOTICE:  missing data for column "val"
>>> CONTEXT:  COPY test_copy2, line 3: "3"
>>> NOTICE:  total exceptions ignored: 3
>>>
>>> postgres=# \d test_copy1
>>>     Table "public.test_copy1"
>>>  Column |  Type   | Modifiers
>>> --------+---------+-----------
>>>  id     | integer | not null
>>>
>>> postgres=# set client_min_messages to warning;
>>> SET
>>> postgres=# copy test_copy1 from program 'ls /proc' exceptions to stdout;
>>> ...
>>> vmstat
>>> zoneinfo
>>> postgres=#
>>>
>>> Limited performance testing shows no significant difference between
>>> error-catching and plain code path. For example, timing
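A rough sketch of the batching scheme suggested above: insert a block of rows
inside one subtransaction and, only if that block fails, roll it back and replay
it row by row so that just the bad rows are skipped.  The helpers
copy_batch_read() and copy_row_insert() and the batch structure are invented
names; only the subtransaction and error-cleanup calls are real backend APIs, and
this is not the posted patch.

	/* Illustrative sketch only -- invented helpers, not the posted patch. */
	#define COPY_ERR_BATCH_SIZE 1000

	while (copy_batch_read(cstate, batch, COPY_ERR_BATCH_SIZE))
	{
		MemoryContext oldcxt = CurrentMemoryContext;

		BeginInternalSubTransaction(NULL);

		PG_TRY();
		{
			/* optimistic path: insert the whole batch in one subtransaction */
			for (int i = 0; i < batch->nrows; i++)
				copy_row_insert(cstate, &batch->rows[i]);
			ReleaseCurrentSubTransaction();
		}
		PG_CATCH();
		{
			/* batch failed: roll it back, then replay it row by row */
			RollbackAndReleaseCurrentSubTransaction();
			MemoryContextSwitchTo(oldcxt);
			FlushErrorState();

			for (int i = 0; i < batch->nrows; i++)
			{
				BeginInternalSubTransaction(NULL);
				PG_TRY();
				{
					copy_row_insert(cstate, &batch->rows[i]);
					ReleaseCurrentSubTransaction();
				}
				PG_CATCH();
				{
					RollbackAndReleaseCurrentSubTransaction();
					MemoryContextSwitchTo(oldcxt);
					FlushErrorState();
					/* row skipped; a WARNING would be emitted here */
				}
				PG_END_TRY();
			}
		}
		PG_END_TRY();

		MemoryContextSwitchTo(oldcxt);
	}

When errors are rare, this pays the subtransaction cost roughly once per 1000
rows instead of once per row, which is the performance point raised above.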
Re: Schema variables - new implementation for Postgres 15
On 19-12-2021 at 07:23, Pavel Stehule wrote:

> I am sending new versions of patches.
>
> I hope I solved all Tomas's objections.
>
> 1. The schema variables were renamed to session variables
> 2. I fixed issues related to creating, dropping variables under
>    subtransactions + regress tests
> 3. I fixed issues in pg_dump + regress tests

> [0001-schema-variables-20211219.patch]
> [0002-schema-variables-20211219.patch]

Hi Pavel,

I get an error during test 'session_variables'.

(on the upside, my own little testsuite runs without error)

thanks,

Erik Rijkers

diff -U3 /home/aardvark/pg_stuff/pg_sandbox/pgsql.schema_variables/src/test/regress/expected/session_variables.out /home/aardvark/pg_stuff/pg_sandbox/pgsql.schema_variables/src/test/regress/results/session_variables.out
--- /home/aardvark/pg_stuff/pg_sandbox/pgsql.schema_variables/src/test/regress/expected/session_variables.out	2021-12-19 07:48:40.422821574 +0100
+++ /home/aardvark/pg_stuff/pg_sandbox/pgsql.schema_variables/src/test/regress/results/session_variables.out	2021-12-19 08:00:29.286820957 +0100
@@ -32,11 +32,11 @@
 SET ROLE TO DEFAULT;
 -- check output
 \dV+ var1
-List of variables
- Schema | Name | Type | Is nullable | Is mutable | Default | Owner | Transactional end action | Access privileges| Description
---+--+-+-++-+---+--++-
- svartest | var1 | numeric | t | t | | pavel | | pavel=SW/pavel+|
- | | | || | | | var_test_role=SW/pavel |
+ List of variables
+ Schema | Name | Type | Is nullable | Is mutable | Default | Owner | Transactional end action | Access privileges | Description
+--+--+-+-++-+--+--+---+-
+ svartest | var1 | numeric | t | t | | aardvark | | aardvark=SW/aardvark +|
+ | | | || | | | var_test_role=SW/aardvark |
 (1 row)
 
 REVOKE ALL ON VARIABLE var1 FROM var_test_role;
@@ -745,10 +745,10 @@
 CREATE VARIABLE var1 AS int DEFAULT 100;
 COMMENT ON VARIABLE var1 IS 'some variable comment';
 \dV+ var1
- List of variables
- Schema | Name | Type | Is nullable | Is mutable | Default | Owner | Transactional end action | Access privileges | Description
---+--+-+-++-+---+--+---+---
- svartest | var1 | integer | t | t | 100 | pavel | | | some variable comment
+List of variables
+ Schema | Name | Type | Is nullable | Is mutable | Default | Owner | Transactional end action | Access privileges | Description
+--+--+-+-++-+--+--+---+---
+ svartest | var1 | integer | t | t | 100 | aardvark | | | some variable comment
 (1 row)
 
 DROP VARIABLE var1;
Re: Schema variables - new implementation for Postgres 15
Sun, 19 Dec 2021 at 8:09, Erik Rijkers wrote:

> On 19-12-2021 at 07:23, Pavel Stehule wrote:
>
>> I am sending new versions of patches.
>>
>> I hope I solved all Tomas's objections.
>>
>> 1. The schema variables were renamed to session variables
>> 2. I fixed issues related to creating, dropping variables under
>>    subtransactions + regress tests
>> 3. I fixed issues in pg_dump + regress tests
>
>> [0001-schema-variables-20211219.patch]
>> [0002-schema-variables-20211219.patch]
>
> Hi Pavel,
>
> I get an error during test 'session_variables'.
>
> (on the upside, my own little testsuite runs without error)
>
> thanks,

please, can you send me the regression diff?

Regards

Pavel

> Erik Rijkers
Re: Schema variables - new implementation for Postgres 15
Sun, 19 Dec 2021 at 8:13, Pavel Stehule wrote:

> Sun, 19 Dec 2021 at 8:09, Erik Rijkers wrote:
>
>> On 19-12-2021 at 07:23, Pavel Stehule wrote:
>>
>>> I am sending new versions of patches.
>>>
>>> I hope I solved all Tomas's objections.
>>>
>>> 1. The schema variables were renamed to session variables
>>> 2. I fixed issues related to creating, dropping variables under
>>>    subtransactions + regress tests
>>> 3. I fixed issues in pg_dump + regress tests
>>
>>> [0001-schema-variables-20211219.patch]
>>> [0002-schema-variables-20211219.patch]
>>
>> Hi Pavel,
>>
>> I get an error during test 'session_variables'.
>>
>> (on the upside, my own little testsuite runs without error)
>>
>> thanks,
>
> please, can you send me the regression diff?

I see the problem now: the test contains a username, and that is wrong.

 Schema | Name | Type | Is nullable | Is mutable | Default | Owner | Transactional end action | Access privileges | Description
---+--+-+-++-+---+--++-
- svartest | var1 | numeric | t | t | | pavel | | pavel=SW/pavel +|
- | | | | | | | | var_test_role=SW/pavel |
+--+--+-+-++-+--+--+---+-
+ svartest | var1 | numeric | t | t | | appveyor | | appveyor=SW/appveyor +|
+ | | | | | | | | var_test_role=SW/appveyor |
 (1 row)

 REVOKE ALL ON VARIABLE var1 FROM var_test_role;

I have to remove this test.

Regards

Pavel

> Pavel
>
>> Erik Rijkers
Re: Schema variables - new implementation for Postgres 15
On 19-12-2021 at 08:13, Pavel Stehule wrote:

> Sun, 19 Dec 2021 at 8:09, Erik Rijkers wrote:
>
>>> [0001-schema-variables-20211219.patch]
>>> [0002-schema-variables-20211219.patch]
>>
>> Hi Pavel,
>>
>> I get an error during test 'session_variables'.
>>
>> (on the upside, my own little testsuite runs without error)
>>
>> thanks,
>
> please, can you send me the regression diff?

I did attach it, but if you did not receive it, see also cfbot, especially
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.156992

Erik