Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Sun, Jan 10, 2021 at 11:44:14AM +0500, Andrey Borodin wrote: > > 10 янв. 2021 г., в 03:15, Noah Misch написал(а): > > > > No; it deletes the most recent ~1B and leaves the older segments. An > > exception is multixact, as described in the commit message and the patch's > > change to a comment in TruncateMultiXact(). > > Thanks for clarification. > One more thing: retention point at 3/4 of overall space (half of wraparound) > seems more or less random to me. Why not 5/8 or 9/16? No reason for that exact value. The purpose of that patch is to mitigate bugs that cause the server to write data into a region of the SLRU that we permit truncation to unlink. If the patch instead tested "diff > INT_MIN * .99", the new behavior would get little testing, because xidWarnLimit would start first. Also, the new behavior wouldn't mitigate bugs that trespass >~20M XIDs into unlink-eligible space. If the patch tested "diff > INT_MIN * .01", more sites would see disk consumption grow. I think reasonable multipliers range from 0.5 (in the patch today) to 0.9, but it's a judgment call. > Can you please send revised patches with fixes? Attached. Author: Noah Misch Commit: Noah Misch Prevent excess SimpleLruTruncate() deletion. Every core SLRU wraps around. With the exception of pg_notify, the wrap point can fall in the middle of a page. Account for this in the PagePrecedes callback specification and in SimpleLruTruncate()'s use of said callback. Update each callback implementation to fit the new specification. This changes SerialPagePrecedesLogically() from the style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes(). (Whereas pg_clog and pg_serial share a key space, pg_serial is nothing like pg_notify.) The bug fixed here has the same symptoms and user followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb. Back-patch to 9.5 (all supported versions). Reviewed by Andrey Borodin and Tom Lane. Discussion: https://postgr.es/m/20190202083822.gc32...@gust.leadboat.com diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 69a81f3..410d02a 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -694,6 +694,7 @@ CLOGShmemInit(void) SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER, SYNC_HANDLER_CLOG); + SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE); } /* @@ -912,13 +913,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid) /* - * Decide which of two CLOG page numbers is "older" for truncation purposes. + * Decide whether a CLOG page number is "older" for truncation purposes. * * We need to use comparison of TransactionIds here in order to do the right - * thing with wraparound XID arithmetic. However, if we are asked about - * page number zero, we don't want to hand InvalidTransactionId to - * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So, - * offset both xids by FirstNormalTransactionId to avoid that. + * thing with wraparound XID arithmetic. However, TransactionIdPrecedes() + * would get weird about permanent xact IDs. So, offset both such that xid1, + * xid2, and xid2 + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset + * is relevant to page 0 and to the page preceding page 0. + * + * The page containing oldestXact-2^31 is the important edge case. The + * portion of that page equaling or following oldestXact-2^31 is expendable, + * but the portion preceding oldestXact-2^31 is not. 
When oldestXact-2^31 is + * the first XID of a page and segment, the entire page and segment is + * expendable, and we could truncate the segment. Recognizing that case would + * require making oldestXact, not just the page containing oldestXact, + * available to this callback. The benefit would be rare and small, so we + * don't optimize that edge case. */ static bool CLOGPagePrecedes(int page1, int page2) @@ -927,11 +937,12 @@ CLOGPagePrecedes(int page1, int page2) TransactionId xid2; xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE; - xid1 += FirstNormalTransactionId; + xid1 += FirstNormalTransactionId + 1; xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE; - xid2 += FirstNormalTransactionId; + xid2 += FirstNormalTransactionId + 1; - return TransactionIdPrecedes(xid1, xid2); + return (TransactionIdPrecedes(xid1, xid2) && + TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1)); } diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index b786eef..9f42461 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -557,6 +557,7 @@ CommitTsShmemInit(void)
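To make the new PagePrecedes rule above easier to follow, here is a small standalone sketch (illustration only, not PostgreSQL source: the constant is a stand-in for CLOG_XACTS_PER_PAGE, the comparison assumes the usual signed-difference behaviour of TransactionIdPrecedes() for normal XIDs, and the FirstNormalTransactionId offsetting is omitted). The point is that "page1 precedes page2" must mean page1's XIDs precede every XID on page2, because the wrap point can fall in the middle of a page:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define XACTS_PER_PAGE 32768u           /* stand-in for CLOG_XACTS_PER_PAGE */

/* Wraparound-aware "a precedes b": the usual signed-difference test. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

static bool
page_strictly_precedes(int page1, int page2)
{
    TransactionId xid1 = (TransactionId) page1 * XACTS_PER_PAGE;
    TransactionId xid2 = (TransactionId) page2 * XACTS_PER_PAGE;

    /*
     * Checking only the first XID of page2 could report "precedes" even
     * though the last XIDs of page2 wrap around far enough to precede xid1,
     * which is exactly the case truncation must not treat as safe to unlink.
     */
    return xid_precedes(xid1, xid2) &&
           xid_precedes(xid1, xid2 + XACTS_PER_PAGE - 1);
}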
Re: Added schema level support for publication.
On Sat, Jan 9, 2021 at 8:14 PM Bharath Rupireddy wrote: > > On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy > wrote: > > I think this feature can be useful, in case a user has a lot of tables > > to publish inside a schema. Having said that, I wonder if this feature > > mandates users to create the same schema with same > > permissions/authorizations manually on the subscriber, because logical > > replication doesn't propagate any ddl's so are the schema or schema > > changes? Or is it that the list of tables from the publisher can go > > into a different schema on the subscriber? > > > > Since the schema can have other objects such as data types, functions, > > operators, I'm sure with your feature, non-table objects will be > > skipped. > > > > As Amit pointed out earlier, the behaviour when schema dropped, I > > think we should also consider when schema is altered, say altered to a > > different name, maybe we should change that in the publication too. > > > > In general, what happens if we have some temporary tables or foreign > > tables inside the schema, will they be allowed to send the data to > > subscribers? > > > > And, with this feature, since there can be many huge tables inside a > > schema, the initial table sync phase of the replication can take a > > while. > > > > Say a user has created a publication for a schema with hundreds of > > tables in it, at some point later, can he stop replicating a single or > > some tables from that schema? > > > > IMO, it's better to have the syntax - CREATE PUBLICATION > > production_publication FOR ALL TABLES IN SCHEMA production - just > > added IN between for all tables and schema. > > > > Say a user has a schema with 121 tables in it, and wants to replicate > > only 120 or 199 or even lesser tables out of it, so can we have some > > skip option to the new syntax, something like below? > > CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA > > production WITH skip = marketing, accounts, sales; --> meaning is, > > replicate all the tables in the schema production except marketing, > > accounts, sales tables. > > One more point - if the publication is created for a schema with no or > some initial tables, will all the future tables that may get added to > the schema will be replicated too? > I expect this should be the behavior. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Inconsistent "" use
In doc/src/sgml/func.sgml the description of the SHOW command uses "<acronym>SQL</acronym>", while the SET command description in the same section does not use "<acronym>". Shouldn't the description of SET use "<acronym>" for "SQL" as well? Patch attached.

Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 02a37658ad..ecb66f9c3f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24551,7 +24551,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 If is_local is true, the new value will only apply for the current transaction. If you want the new value to apply for the current session, use false
-instead. This function corresponds to the SQL
+instead. This function corresponds to the <acronym>SQL</acronym>
 command SET.
invalid data in file backup_label problem on windows
Hi hackers. When I use a non-exclusive low-level backup on Windows, "invalid data in file backup_label" is reported. The code in question is:

if (fscanf(lfp, "START WAL LOCATION: %X/%X (file %08X%16s)%c", &hi, &lo, &tli_from_walseg, startxlogfilename, &ch) != 5 || ch != '\n')
    ereport(FATAL,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));

When I wrote the backup_label on Windows, the line ends with CRLF, so ch becomes '\r'. I could not find it mentioned in the manual[1] that these two files (tablespace_map and backup_label) must not use the CRLF newline style. How about setting text mode when reading these two files, as in the attached patch? Any thoughts?

[1] https://www.postgresql.org/docs/13/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP

Best Regards, Shenhao Wang

0001-set-text-mode-when-reading-backup_label-and-tablesap.patch Description: 0001-set-text-mode-when-reading-backup_label-and-tablesap.patch
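A standalone sketch of the failure and of the text-mode idea (illustration only, not the attached patch; the "t" in the fopen mode is a Windows CRT extension that asks for text mode, so the runtime translates CRLF to LF before fscanf() sees the line terminator):

#include <stdio.h>

static int
read_start_wal_location(const char *path)
{
    unsigned int hi, lo, tli;
    char        fname[64];
    char        ch;
    FILE       *lfp = fopen(path, "rt");    /* text mode; in binary mode a CRLF
                                             * line leaves ch == '\r' below */

    if (lfp == NULL)
        return -1;

    if (fscanf(lfp, "START WAL LOCATION: %X/%X (file %08X%16s)%c",
               &hi, &lo, &tli, fname, &ch) != 5 || ch != '\n')
    {
        fclose(lfp);
        return -1;                          /* "invalid data in file ..." */
    }

    fclose(lfp);
    return 0;
}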
Re: support for MERGE
On 1/10/21 2:44 AM, Tomas Vondra wrote: > 5) WHEN AND > > I admit the "WHEN AND" conditions sounds a bit cryptic - it took me a > while to realize what this refers to. Is that a term established by SQL > Standard, or something we invented? The grammar gets it right but the error messages are nonsensical to me. I would like to see all user-facing instances of "WHEN AND" be replaced by the admittedly more verbose "WHEN [NOT] MATCHED AND". -- Vik Fearing
Re: proposal: schema variables
pá 8. 1. 2021 v 18:54 odesílatel Erik Rijkers napsal: > On 2021-01-08 07:20, Pavel Stehule wrote: > > Hi > > > > just rebase > > > > [schema-variables-20200108.patch] > > Hey Pavel, > > My gcc 8.3.0 compile says: > (on debian 10/Buster) > > utility.c: In function ‘CreateCommandTag’: > utility.c:2332:8: warning: this statement may fall through > [-Wimplicit-fallthrough=] > tag = CMDTAG_SELECT; > ^~~ > utility.c:2334:3: note: here > case T_LetStmt: > ^~~~ > yes, there was an error from the last rebase. Fixed > > compile, check, check-world, runs without further problem. > > I also changed a few typos/improvements in the documentation, see > attached. > > One thing I wasn't sure of: I have assumed that >ON TRANSACTIONAL END RESET > > should be >ON TRANSACTION END RESET > > and changed it accordingly, please double-check. > It looks well, I merged these changes to patch. Thank you very much for these corectures Updated patch attached Regards Pavel > > Erik Rijkers > schema-variables-20210110.patch.gz Description: application/gzip
Re: [HACKERS] Custom compression methods
On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby wrote: > > And fails pg_upgrade check, apparently losing track of the compression (?) > > > > CREATE TABLE public.cmdata2 ( > > -f1 text COMPRESSION lz4 > > +f1 text > > ); > > I did not get this? pg_upgrade check is passing for me. I realized that this was failing in your v16 patch sent Dec 25. It's passing on current patches because they do "DROP TABLE cmdata2", but that's only masking the error. I think this patch needs to be specifically concerned with pg_upgrade, so I suggest to not drop your tables and MVs, to allow the pg_upgrade test to check them. That exposes this issue: pg_dump: error: Error message from server: ERROR: cache lookup failed for access method 36447 pg_dump: error: The command was: COPY public.cmdata (f1) TO stdout; pg_dumpall: error: pg_dump failed on database "regression", exiting waiting for server to shut down done server stopped pg_dumpall of post-upgrade database cluster failed I found that's the AM's OID in the old clsuter: regression=# SELECT * FROM pg_am WHERE oid=36447; oid | amname | amhandler | amtype ---++-+ 36447 | pglz2 | pglzhandler | c But in the new cluster, the OID has changed. Since that's written into table data, I think you have to ensure that the compression OIDs are preserved on upgrade: 16755 | pglz2 | pglzhandler | c In my brief attempt to inspect it, I got this crash: $ tmp_install/usr/local/pgsql/bin/postgres -D src/bin/pg_upgrade/tmp_check/data & regression=# SELECT pg_column_compression(f1) FROM cmdata a; server closed the connection unexpectedly Thread 1 "postgres" received signal SIGSEGV, Segmentation fault. __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 120 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory. (gdb) bt #0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 #1 0x55c6049fde62 in cstring_to_text (s=0x0) at varlena.c:193 #2 pg_column_compression () at varlena.c:5335 (gdb) up #2 pg_column_compression () at varlena.c:5335 5335PG_RETURN_TEXT_P(cstring_to_text(get_am_name( (gdb) l 5333varvalue = (struct varlena *) DatumGetPointer(value); 5334 5335PG_RETURN_TEXT_P(cstring_to_text(get_am_name( 5336 toast_get_compression_oid(varvalue; I guess a missing AM here is a "shouldn't happen" case, but I'd prefer it to be caught with an elog() (maybe in get_am_name()) or at least an Assert. -- Justin
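The guard being suggested could look roughly like this (a sketch only, using the names from the backtrace above; whether the check belongs in get_am_name() itself or in pg_column_compression() is the open question):

    Oid         cmoid = toast_get_compression_oid(varvalue);
    char       *amname = get_am_name(cmoid);

    /* Fail with a clear message instead of passing NULL to cstring_to_text(). */
    if (amname == NULL)
        elog(ERROR, "cache lookup failed for access method %u", cmoid);

    PG_RETURN_TEXT_P(cstring_to_text(amname));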
Re: Added schema level support for publication.
Thanks for your comments Bharath, please find my opinion below. On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy wrote: > I think this feature can be useful, in case a user has a lot of tables > to publish inside a schema. Having said that, I wonder if this feature > mandates users to create the same schema with same > permissions/authorizations manually on the subscriber, because logical > replication doesn't propagate any ddl's so are the schema or schema > changes? Or is it that the list of tables from the publisher can go > into a different schema on the subscriber? > DDL's will not be propagated to the subscriber. Users have to create the schema & tables in the subscriber. No change in Permissions/authorizations handling, it will be the same as the existing behavior for relations. > Since the schema can have other objects such as data types, functions, > operators, I'm sure with your feature, non-table objects will be > skipped. > Yes, only table data will be sent to subscribers, non-table objects will be skipped. > As Amit pointed out earlier, the behaviour when schema dropped, I > think we should also consider when schema is altered, say altered to a > different name, maybe we should change that in the publication too. > I agree that when schema is altered the renamed schema should be reflected in the publication. > In general, what happens if we have some temporary tables or foreign > tables inside the schema, will they be allowed to send the data to > subscribers? > Temporary tables & foreign tables will not be added to the publications. > And, with this feature, since there can be many huge tables inside a > schema, the initial table sync phase of the replication can take a > while. > Yes this is required. > Say a user has created a publication for a schema with hundreds of > tables in it, at some point later, can he stop replicating a single or > some tables from that schema? > There is no provision for this currently. > IMO, it's better to have the syntax - CREATE PUBLICATION > production_publication FOR ALL TABLES IN SCHEMA production - just > added IN between for all tables and schema. > I'm ok with the proposed syntax, I would like others' opinion too before making the change. > Say a user has a schema with 121 tables in it, and wants to replicate > only 120 or 199 or even lesser tables out of it, so can we have some > skip option to the new syntax, something like below? > CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA > production WITH skip = marketing, accounts, sales; --> meaning is, > replicate all the tables in the schema production except marketing, > accounts, sales tables. > Yes this is a good use case, will include this change. Thanks for the comments, I will handle the comments and post a patch for this. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: Added schema level support for publication.
On Sat, Jan 9, 2021 at 8:14 PM Bharath Rupireddy wrote: > > One more point - if the publication is created for a schema with no or > some initial tables, will all the future tables that may get added to > the schema will be replicated too? > I agree on this, when a relation is added to the schema it should be added to the publication. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [PATCH] Generic type subscripting
Hi > I'm thinking of the update path as a kind of implicit schema. JSON is > intentionally not bound to any schema on creation, so I don't see a > failure to enforce another schema at runtime (and outside the WHERE > clause, at that) as an error exactly. > This concept is not consistent with other implemented behaviour. 1. The schema is dynamically enhanced - so although the path doesn't exists, it is created and data are changed postgres=# create table foo(a jsonb); CREATE TABLE postgres=# insert into foo values('{}'); INSERT 0 1 postgres=# update foo set a['a']['a'][10] = '0'; UPDATE 1 postgres=# select * from foo; ┌───┐ │ a │ ╞═══╡ │ {"a": {"a": [null, null, null, null, null, null, null, null, null, null, 0]}} │ └───┘ (1 row) So although the path [a,a,10] was not exists, it was created. 2. this update fails (and it is correct) postgres=# update foo set a['a']['a']['b'] = '0'; ERROR: path element at position 3 is not an integer: "b" although the path [a,a,b] doesn't exists, and it is not ignored. This implementation doesn't do only UPDATE (and then analogy with WHERE clause isn't fully adequate). It does MERGE. This is necessary, because without it, the behaviour will be pretty unfriendly - because there is not any external schema. I think so this is important - and it can be little bit messy. I am not sure if I use correct technical terms - we try to use LAX update in first step, and if it is not successful, then we try to do LAX insert. This is maybe correct from JSON semantic - but for developer it is unfriendly, because he hasn't possibility to detect if insert was or was not successful. In special JSON functions I can control behave and can specify LAX or STRICT how it is necessity. But in this interface (subscripting) this possibility is missing. I think so there should be final check (semantically) if value was updated, and if the value was changed. If not, then error should be raised. It should be very similar like RLS update. I know and I understand so there should be more than one possible implementations, but safe is only one - after successful update I would to see new value inside, and when it is not possible, then I expect exception. I think so it is more practical too. I can control filtering with WHERE clause. But I cannot to control MERGE process. Manual recheck after every update can be terrible slow. Regards Pavel > But I looked into the bulk case a little further, and "outside the > WHERE clause" cuts both ways. The server reports an update whether or > not the JSON could have been modified, which suggests triggers will > fire for no-op updates. That's more clearly a problem. > > insert into j (val) values > ('{"a": 100}'), > ('{"a": "200"}'), > ('{"b": "300"}'), > ('{"c": {"d": 400}}'), > ('{"a": {"z": 500}}'); > > INSERT 0 5 > update j set val['a']['z'] = '600' returning *; > val > > {"a": 100} > {"a": "200"} > {"a": {"z": 600}, "b": "300"} > {"a": {"z": 600}, "c": {"d": 400}} > {"a": {"z": 600}} > (5 rows) > > *UPDATE 5* >
Re: new heapcheck contrib module
On Fri, Jan 8, 2021 at 6:33 AM Mark Dilger wrote:
> The attached patches, v31, are mostly the same, but with "getopt_long.h"
> included from pg_amcheck.c per Thomas's review, and a .gitignore file added
> in contrib/pg_amcheck/

A couple more little things from Windows CI:

C:\projects\postgresql\src\include\fe_utils/option_utils.h(19): fatal error C1083: Cannot open include file: 'libpq-fe.h': No such file or directory [C:\projects\postgresql\pg_amcheck.vcxproj]

Does contrib/amcheck/Makefile need to say "SHLIB_PREREQS = submake-libpq" like other contrib modules that use libpq?

pg_backup_utils.obj : error LNK2001: unresolved external symbol exit_nicely [C:\projects\postgresql\pg_dump.vcxproj]

I think this is probably because additions to src/fe_utils/Makefile's OBJS list need to be manually replicated in src/tools/msvc/Mkvcbuild.pm's @pgfeutilsfiles list. (If I'm right about that, perhaps it needs a comment to remind us Unix hackers of that, or perhaps it should be automated...)
Re: Terminate the idle sessions
Thomas Munro writes: > On Thu, Jan 7, 2021 at 4:51 PM Tom Lane wrote: >> Thomas Munro writes: >>> One of the strange things about these errors is that they're >>> asynchronous/unsolicited, but they appear to the client to be the >>> response to their next request (if it doesn't eat ECONNRESET instead). >> Right, which is what makes class 57 (operator intervention) seem >> attractive to me. From the client's standpoint these look little >> different from ERRCODE_ADMIN_SHUTDOWN or ERRCODE_CRASH_SHUTDOWN, >> which are in that category. > Yeah, that's a good argument. Given the lack of commentary on this thread, I'm guessing that people aren't so excited about this topic that a change in the existing sqlstate assignment for ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT would fly. So I propose to change the new ERRCODE_IDLE_SESSION_TIMEOUT to be in class 57 and call it good. regards, tom lane
Re: zstd compression for pg_dump
On 1/4/21 3:53 AM, Justin Pryzby wrote:
> About 89% smaller.

Did a quick code review of the patch. I have not taken it for a spin yet and there are parts of the code I have not read yet.

## Is there any reason for this diff?

-cfp        *fp = pg_malloc(sizeof(cfp));
+cfp        *fp = pg_malloc0(sizeof(cfp));

## Since we now have multiple returns in cfopen() I am not sure that setting fp to NULL is still clearer than just returning NULL.

## I do not like that this pretends to support r+, w+ and a+ but does not actually do so, since it does not create an input stream in those cases.

else if (mode[0] == 'w' || mode[0] == 'a' || strchr(mode, '+') != NULL)
[...]
else if (strchr(mode, 'r'))

## Wouldn't cfread(), cfwrite(), cfgetc(), cfgets(), cfclose() and cfeof() be cleaner with switch statements similar to cfopen()?

## "/* Should be called "method" or "library" ? */"

Maybe, but personally I think algorithm is fine too.

## "Is a nondefault level set ?"

The PostgreSQL project does not use a space before the question mark (at least not in English).

## Why isn't level_set just a local variable in parse_compression()? It does not seem to be used elsewhere.

## Shouldn't we call the Compression variable in OpenArchive() nocompress to match the naming convention in other places? And in general I wonder if we should not write "nocompression = {COMPR_ALG_NONE}" rather than "nocompression = {0}".

## Why not use const on the pointers to Compression for functions like cfopen()? As far as I can see several of them could be const.

## Shouldn't "AH->compression.alg = Z_DEFAULT_COMPRESSION" in ReadHead() be "AH->compression.alg = COMPR_ALG_DEFAULT"? Additionally I am not convinced that returning COMPR_ALG_DEFAULT will even work, but I have not had the time to test that theory yet. And in general I am quite sceptical that we really need COMPR_ALG_DEFAULT.

## Some whitespace issues

Add spaces around the plus in "atoi(1+eq)" and "pg_log_error("unknown compression algorithm: %s", 1+eq)". Add spaces around the plus in parse_compression(), e.g. in "strlen(1+eq)".

## Shouldn't hasSuffix() take the current compression algorithm as a parameter? Or alternatively look up which compression algorithm to use from the suffix?

## Why support multiple ways to write zlib on the command line? I do not see any advantage of being able to write it as libz.

## I feel renaming SaveOutput() to GetOutput() would make it more clear what it does now that you have changed the return type.

## You have accidentally committed "-runstatedir" in configure. I have no idea why we do not have it (maybe it is something Debian specific) but even if we are going to add it, it should not be in this patch. Same with the parenthesis changes to LARGE_OFF_T.

## This is probably out of scope of your patch, but I am not a fan of the fallback logic in cfopen_read(). I feel ideally we should always know if there is a suffix or not, and not try to guess file names and do pointless syscalls.

## COMPR_ALG_DEFAULT looks like it would error out for archive and directory if someone has neither zlib nor zstandard. It feels like it should default to uncompressed if we have neither. Or at least give a better error message.

> Note, there's currently several "compression" patches in CF app. This patch
> seems to be independent of the others, but probably shouldn't be totally
> uncoordinated (like adding lz4 in one and zstd in another might be poor
> execution). A thought here is that maybe we want to use the same values for
> the enums in all patches.

Especially if we write the numeric value to pg dump files.

Andreas
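To make the switch-statement suggestion above concrete, a rough sketch of the shape cfread() could take; everything below except fread() and gzread() is an assumption about the patch (stand-in type, field and enum names), not code taken from it:

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

typedef enum { COMPR_ALG_NONE, COMPR_ALG_ZLIB, COMPR_ALG_ZSTD } CompressionAlgorithm;
typedef struct { CompressionAlgorithm alg; int level; } Compression;
typedef struct { Compression compression; FILE *plainfp; gzFile zlibfp; } cfp;

static int
cfread_sketch(void *ptr, int size, cfp *fp)
{
    switch (fp->compression.alg)
    {
        case COMPR_ALG_NONE:
            return fread(ptr, 1, size, fp->plainfp);
        case COMPR_ALG_ZLIB:
            return gzread(fp->zlibfp, ptr, size);
        case COMPR_ALG_ZSTD:
            /* zstd streaming read elided; only the dispatch shape matters here */
            return -1;
        default:
            fprintf(stderr, "invalid compression algorithm\n");
            exit(1);
    }
}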
Re: Inconsistent "" use
On Sun, Jan 10, 2021 at 08:22:42PM +0900, Tatsuo Ishii wrote:
> In doc/src/sgml/func.sgml the description of the SHOW command uses
> "<acronym>SQL</acronym>", while the SET command description in the same
> section does not use "<acronym>". Shouldn't the description of SET use
> "<acronym>" for "SQL" as well? Patch attached.

https://tdg.docbook.org/tdg/5.2/acronym.html suggests docbook formatters either ignore that or use it as a signal to substitute small caps. I don't consider small caps an improvement for "SQL", so I'd prefer to never use <acronym>SQL</acronym>. <acronym> also makes the markup longer (though one could mitigate that with an entity like &SQL). However, standardizing on either way is better than varying within the manual.
Re: [PATCH] Automatic HASH and LIST partition creation
On Wed, Oct 7, 2020 at 6:26 AM Anastasia Lubennikova wrote: > Do you think that it is a bug? For now, I removed this statement from > tests just to calm down the CI. I don't think we can use \d+ on a temporary table here, because the backend ID appears in the namespace, which is causing a failure on one of the CI OSes due to nondeterminism: CREATE TEMP TABLE temp_parted (a char) PARTITION BY LIST (a) CONFIGURATION (VALUES IN ('a') DEFAULT PARTITION temp_parted_default); \d+ temp_parted - Partitioned table "pg_temp_3.temp_parted" + Partitioned table "pg_temp_4.temp_parted"
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
On Fri, Jan 08, 2021 at 04:14:33PM -0300, Alvaro Herrera wrote: > > > I ended up with apparently broken constraint when running multiple loops > > > around > > > a concurrent detach / attach: > > > > > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR > > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 > > > CONCURRENTLY"; do :; done& > > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR > > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 > > > CONCURRENTLY"; do :; done& > > > > > > "p1_check" CHECK (true) > > > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2) > > > > Not good. > > Haven't had time to investigate this problem yet. I guess it's because you commited the txn and released lock in the middle of the command. -- Justin >From e18c11fd5bcc4f5cd981a3219383265b55974f34 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 10 Jan 2021 15:41:43 -0600 Subject: [PATCH] fix --- src/backend/commands/tablecmds.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index d7b9c63e5f..144c27c303 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -17131,6 +17131,7 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel, Oid partrelid, parentrelid; LOCKTAG tag; + LockRelId partlockrelid; char *parentrelname; char *partrelname; @@ -17162,6 +17163,10 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel, table_close(rel, NoLock); tab->rel = NULL; + partlockrelid.relId = parentrelid; + partlockrelid.dbId = MyDatabaseId; + LockRelationIdForSession(&partlockrelid, ShareUpdateExclusiveLock); + /* Make updated catalog entry visible */ PopActiveSnapshot(); CommitTransactionCommand(); @@ -17204,7 +17209,7 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel, errmsg("partition \"%s\" was removed concurrently", partrelname))); tab->rel = rel; - + UnlockRelationIdForSession(&partlockrelid, ShareUpdateExclusiveLock); } /* Do the final part of detaching */ @@ -17444,7 +17449,10 @@ DetachAddConstraintIfNeeded(List **wqueue, Relation partRel) TupleDesc td = RelationGetDescr(partRel); Constraint *n; - constraintExpr = make_ands_explicit(RelationGetPartitionQual(partRel)); + List *l = RelationGetPartitionQual(partRel); + Assert(partRel->rd_rel->relispartition); + Assert(l != NIL); + constraintExpr = make_ands_explicit(l); /* If an identical constraint exists, we don't need to create one */ if (td->constr && td->constr->num_check > 0) -- 2.17.0
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi, I started looking at this patch again, hoping to get it committed in this CF, but I think there's a regression in handling TOAST tables (compared to the v3 patch submitted by Pavan in February 2019). The test I'm running a very simple test (see test.sql): 1) start a transaction 2) create a table with a text column 3) copy freeze data into it 4) use pg_visibility to see how many blocks are all_visible both in the main table and it's TOAST table For v3 patch (applied on top of 278584b526 and s/hi_options/ti_options) I get this: pages NOT all_visible -- main 637 0 toast50001 3 There was some discussion about relcache invalidations causing a couple TOAST pages not be marked as all_visible, etc. However, for this patch on master I get this pages NOT all_visible -- main 637 0 toast50001 50001 So no pages in TOAST are marked as all_visible. I haven't investigated what's causing this, but IMO that needs fixing to make ths patch RFC. Attached is the test script I'm using, and a v5 of the patch - rebased on current master, with some minor tweaks to comments etc. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company >From 6ccaa4f526b38fbdd2f3a38ac3bd51e96fb140b6 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Sun, 10 Jan 2021 20:30:29 +0100 Subject: [PATCH] Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the visibility map. Until now it only marked individual tuples as frozen, but page-level flags were not updated. This is a fairly old patch, and multiple people worked on it. The first version was written by Jeff Janes, and then reworked by Pavan Deolasee and Anastasia Lubennikova. Author: Pavan Deolasee, Anastasia Lubennikova, Jeff Janes Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada, Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii Discussion: https://postgr.es/m/caboikdn-ptgv0mzntrk2q8otfuuajqaymgmkdu1dckftuxv...@mail.gmail.com Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com --- .../pg_visibility/expected/pg_visibility.out | 64 +++ contrib/pg_visibility/sql/pg_visibility.sql | 77 +++ src/backend/access/heap/heapam.c | 76 -- src/backend/access/heap/hio.c | 17 src/include/access/heapam_xlog.h | 3 + 5 files changed, 229 insertions(+), 8 deletions(-) diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out index ca4b6e186b..0017e3415c 100644 --- a/contrib/pg_visibility/expected/pg_visibility.out +++ b/contrib/pg_visibility/expected/pg_visibility.out @@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition'); (1 row) +-- test copy freeze +create table copyfreeze (a int, b char(1500)); +-- load all rows via COPY FREEZE and ensure that all pages are set all-visible +-- and all-frozen. +begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | t | t + 1 | t | t + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- load half the rows via regular COPY and rest via COPY FREEZE. The pages +-- which are touched by regular COPY must not be set all-visible/all-frozen. On +-- the other hand, pages allocated by COPY FREEZE should be marked +-- all-frozen/all-visible. 
+begin; +truncate copyfreeze; +copy copyfreeze from stdin; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | f | f + 1 | f | f + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- Try a mix of regular COPY and COPY FREEZE. +begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +copy copyfreeze from stdin; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | t | t + 1 | f | f + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + -- cleanup drop table test_partitioned; drop view test_view; @@ -188,3 +251,4 @@ drop server dummy_server; drop foreign data wrapper dummy; drop materialized view matview_visibility_test; drop table regular_table; +drop tabl
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
I wrote: > The problems that I see in this area are first that there's no > real standardization in libpq as to whether to append error messages > together or just flush preceding messages; and second that no effort > is made in multi-connection-attempt cases to separate the errors from > different attempts, much less identify which error goes with which > host or IP address. I think we really need to put some work into > that. I spent some time on this, and here is a patch set that tries to improve matters. 0001 changes the libpq coding rules to be that all error messages should be appended to conn->errorMessage, never overwritten (there are a couple of exceptions in fe-lobj.c) and we reset conn->errorMessage to empty only at the start of a connection request or new query. This is something that's been bugging me for a long time and I'm glad to get it cleaned up. Formerly it seemed that a dartboard had been used to decide whether to use "printfPQExpBuffer" or "appendPQExpBuffer"; now it's always the latter. We can also get rid of several hacks that were used to get around the mess and force appending behavior. 0002 then changes the reporting rules in fe-connect.c as I suggested, so that you might get errors like this: $ psql -h localhost,/tmp -p 12345 psql: error: could not connect to host "localhost" (::1), port 12345: Connection refused Is the server running on that host and accepting TCP/IP connections? could not connect to host "localhost" (127.0.0.1), port 12345: Connection refused Is the server running on that host and accepting TCP/IP connections? could not connect to socket "/tmp/.s.PGSQL.12345": No such file or directory Is the server running locally and accepting connections on that socket? and we have a pretty uniform rule that errors coming back from a connection attempt will be prefixed with the server address. Then 0003 is the part of your patch that I'm happy with. Given 0001+0002 we could certainly consider looping back and retrying for more cases, but I still want to tread lightly on that. I don't see a lot of value in retrying errors that seem to be on the client side, such as failure to set socket properties; and in general I'm hesitant to add untestable code paths here. I feel pretty good about 0001: it might be committable as-is. 0002 is probably subject to bikeshedding, plus it has a problem in the ECPG tests. 
Two of the error messages are now unstable because they expose chosen-at-random socket paths: diff -U3 /home/postgres/pgsql/src/interfaces/ecpg/test/expected/connect-test5.stderr /home/postgres/pgsql/src/interfaces/ecpg/test/results/connect-test5.stderr --- /home/postgres/pgsql/src/interfaces/ecpg/test/expected/connect-test5.stderr 2020-08-04 14:59:45.617380050 -0400 +++ /home/postgres/pgsql/src/interfaces/ecpg/test/results/connect-test5.stderr 2021-01-10 16:12:22.539433702 -0500 @@ -36,7 +36,7 @@ [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ECPGconnect: opening database on port for user regress_ecpg_user2 [NO_PID]: sqlca: code: 0, state: 0 -[NO_PID]: ECPGconnect: could not open database: FATAL: database "regress_ecpg_user2" does not exist +[NO_PID]: ECPGconnect: could not open database: could not connect to socket "/tmp/pg_regress-EbHubF/.s.PGSQL.58080": FATAL: database "regress_ecpg_user2" does not exist [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_finish: connection main closed @@ -73,7 +73,7 @@ [NO_PID]: sqlca: code: -220, state: 08003 [NO_PID]: ECPGconnect: opening database on port for user regress_ecpg_user2 [NO_PID]: sqlca: code: 0, state: 0 -[NO_PID]: ECPGconnect: could not open database: FATAL: database "regress_ecpg_user2" does not exist +[NO_PID]: ECPGconnect: could not open database: could not connect to socket "/tmp/pg_regress-EbHubF/.s.PGSQL.58080": FATAL: database "regress_ecpg_user2" does not exist [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_finish: connection main closed I don't have any non-hacky ideas what to do about that. The extra detail seems useful to end users, but we don't have any infrastructure that would allow filtering it out in the ECPG tests. regards, tom lane diff --git a/src/interfaces/libpq/fe-auth-scram.c b/src/interfaces/libpq/fe-auth-scram.c index b76f0befd0..8b60378379 100644 --- a/src/interfaces/libpq/fe-auth-scram.c +++ b/src/interfaces/libpq/fe-auth-scram.c @@ -208,13 +208,13 @@ pg_fe_scram_exchange(void *opaq, char *input, int inputlen, { if (inputlen == 0) { - printfPQExpBuffer(&conn->errorMessage, + appendPQExpBuffer(&conn->errorMessage, libpq_gettext("malformed SCRAM message (empty message)\n")); goto error; } if (inputlen != strlen(input)) { - printfPQExpBuffer(&conn->errorMessage, + appendPQExpBuffer(&conn->errorMessage, libpq_gettext("malformed SCRAM message (length mismatch)\n")); goto error; } @@ -258,14 +258,14 @@ pg_fe_scram_exchange(vo
Re: Inconsistent "" use
On Sun, Jan 10, 2021 at 01:11:07PM -0800, Noah Misch wrote:
> On Sun, Jan 10, 2021 at 08:22:42PM +0900, Tatsuo Ishii wrote:
> > In doc/src/sgml/func.sgml the description of the SHOW command uses
> > "<acronym>SQL</acronym>", while the SET command description in the same
> > section does not use "<acronym>". Shouldn't the description of SET use
> > "<acronym>" for "SQL" as well? Patch attached.
>
> https://tdg.docbook.org/tdg/5.2/acronym.html suggests docbook formatters
> either ignore that or use it as a signal to substitute small caps.
> I don't consider small caps an improvement for "SQL", so I'd prefer to never
> use <acronym>SQL</acronym>. <acronym> also makes the markup longer (though
> one could mitigate that with an entity like &SQL). However, standardizing on
> either way is better than varying within the manual.

I think smallcaps is almost always a win for acronyms.

-- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Inconsistent "" use
Bruce Momjian writes:
> On Sun, Jan 10, 2021 at 01:11:07PM -0800, Noah Misch wrote:
>> https://tdg.docbook.org/tdg/5.2/acronym.html suggests docbook formatters
>> either ignore that or use it as a signal to substitute small caps.
>> I don't consider small caps an improvement for "SQL", so I'd prefer to never
>> use <acronym>SQL</acronym>. <acronym> also makes the markup longer (though
>> one could mitigate that with an entity like &SQL). However, standardizing on
>> either way is better than varying within the manual.

> I think smallcaps is almost always a win for acronyms.

I'm with Noah: small caps are *not* an improvement, they're just distractingly fussy. I note that the authors of the stylesheets we use seem to agree, because AFAICS <acronym> is not rendered specially in either HTML or PDF output.

Given this docbook.org advice, I'd be inclined to just remove our use of <acronym> altogether. Although, since it isn't actually making any difference, it's not clear that it's worth doing anything. The largest effect of trying to standardize (in either direction) would be to create back-patching hazards for docs fixes.

regards, tom lane
RE: Disable WAL logging to speed up data loading
From: Masahiko Sawada > I think it's better to have index AM (and perhaps table AM) control it > instead of filtering in XLogInsert(). Because otherwise third-party > access methods that use LSN like gist indexes cannot write WAL records > at all when wal_level = none even if they want. Hm, third-party extensions use RM_GENERIC_ID, so it should probably be allowed in XLogInsert() as well instead of allowing control in some other way. It's not unnatural that WAL records for in-core AMs are filtered in XLogInsert(). Regards Takayuki Tsunakawa
RE: Parallel Inserts in CREATE TABLE AS
> Attaching v21 patch set, which has following changes: > 1) 0001 - changed fpes->ins_cmd_type == > PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type != > PARALLEL_INSERT_CMD_UNDEF > 2) 0002 - reworded the commit message. > 3) 0003 - added cmin, xmin test case to one of the parallel insert cases > to ensure leader and worker insert the tuples in the same xact and replaced > memory usage output in numbers like 25kB to NkB to make the tests stable. > 4) 0004 - updated one of the test output to be in NkB and made the assertion > in SetParallelInsertState to be not under an if condition. > > There's one open point [1] on selective skipping of error "cannot insert > tuples in a parallel worker" in heap_prepare_insert(), thoughts are > welcome. > > Please consider the v21 patch set for further review. Hi, I took a look into the new patch and have some comments. 1. + /* +* Do not consider tuple cost in case of we intend to perform parallel +* inserts by workers. We would have turned on the ignore flag in +* apply_scanjoin_target_to_paths before generating Gather path for the +* upper level SELECT part of the query. +*/ + if ((root->parse->parallelInsCmdTupleCostOpt & +PARALLEL_INSERT_SELECT_QUERY) && + (root->parse->parallelInsCmdTupleCostOpt & +PARALLEL_INSERT_CAN_IGN_TUP_COST)) Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here ? IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set. 2. +static void +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd, + void *ins_info) ... + info = (ParallelInsertCTASInfo *) ins_info; + intoclause_str = nodeToString(info->intoclause); + intoclause_len = strlen(intoclause_str) + 1; +static void +SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd, + void *ins_info) ... + info = (ParallelInsertCTASInfo *)ins_info; + intoclause_str = nodeToString(info->intoclause); + intoclause_len = strlen(intoclause_str) + 1; + intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len); I noticed the above code will call nodeToString and strlen twice which seems unnecessary. Do you think it's better to store the result of nodetostring and strlen first and pass them when used ? 3. + if (node->need_to_scan_locally || node->nworkers_launched == 0) + { + EState *estate = node->ps.state; + TupleTableSlot *outerTupleSlot; + + for(;;) + { + /* Install our DSA area while executing the plan. */ + estate->es_query_dsa = + node->pei ? node->pei->area : NULL; ... + node->ps.state->es_processed++; + } How about use the variables estate like 'estate-> es_processed++;' Instead of node->ps.state->es_processed++; Best regards, houzj
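A sketch of what point 2 above is asking for (illustrative only; the extra parameters imply changing the two functions' signatures, which the current patch does not do):

    /* Serialize the IntoClause once in the common caller ... */
    char       *intoclause_str = nodeToString(info->intoclause);
    int         intoclause_len = strlen(intoclause_str) + 1;

    /*
     * ... and pass both the string and its length down, so neither
     * ParallelInsCmdEstimate() nor SaveParallelInsCmdInfo() has to repeat
     * the nodeToString() + strlen() work.
     */
    ParallelInsCmdEstimate(pcxt, ins_cmd, intoclause_str, intoclause_len);
    SaveParallelInsCmdInfo(pcxt, ins_cmd, intoclause_str, intoclause_len);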
Re: pg_preadv() and pg_pwritev()
On Wed, Dec 23, 2020 at 12:06 AM Thomas Munro wrote: > On Mon, Dec 21, 2020 at 11:40 AM Andres Freund wrote: > > Can we come up with a better name than 'uio'? I find that a not exactly > > meaningful name. > > Ok, let's try port/pg_iovec.h. I pushed it with that name, and a couple more cosmetic changes. I'll keep an eye on the build farm.
Re: Inconsistent "" use
> I'm with Noah: small caps are *not* an improvement, they're just
> distractingly fussy. I note that the authors of the stylesheets
> we use seem to agree, because AFAICS <acronym> is not rendered
> specially in either HTML or PDF output.
>
> Given this docbook.org advice, I'd be inclined to just remove
> our use of <acronym> altogether. Although, since it isn't actually
> making any difference, it's not clear that it's worth doing anything.
> The largest effect of trying to standardize (in either direction)
> would be to create back-patching hazards for docs fixes.

Yeah, a simple grep showed that there are almost 1k lines using <acronym>. I agree that the pain caused by fixing all of them is much larger than the benefit of standardizing the usage of <acronym>.

-- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: pg_preadv() and pg_pwritev()
On Mon, Jan 11, 2021 at 3:34 PM Thomas Munro wrote: > On Wed, Dec 23, 2020 at 12:06 AM Thomas Munro wrote: > > On Mon, Dec 21, 2020 at 11:40 AM Andres Freund wrote: > > > Can we come up with a better name than 'uio'? I find that a not exactly > > > meaningful name. > > > > Ok, let's try port/pg_iovec.h. > > I pushed it with that name, and a couple more cosmetic changes. I'll > keep an eye on the build farm. Since only sifaka has managed to return a result so far (nice CPU), I had plenty of time to notice that macOS Big Sur has introduced preadv/pwritev. They were missing on Catalina[1]. [1] https://cirrus-ci.com/task/6002157537198080
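For readers wondering what the fallback costs on platforms that still lack these calls: a sketch of the general shape (plain POSIX, not the actual src/port implementation) is below; it simply issues one pwrite() per iovec instead of a single vectored system call:

#include <sys/uio.h>        /* struct iovec */
#include <unistd.h>         /* pwrite() */

static ssize_t
pwritev_fallback(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    ssize_t     total = 0;

    for (int i = 0; i < iovcnt; i++)
    {
        ssize_t     part = pwrite(fd, iov[i].iov_base, iov[i].iov_len, offset);

        if (part < 0)
            return total > 0 ? total : -1;  /* report progress, or the error */
        total += part;
        offset += part;
        if ((size_t) part < iov[i].iov_len)
            break;                          /* partial write: caller retries */
    }
    return total;
}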
O(n^2) system calls in RemoveOldXlogFiles()
Hi, I noticed that RemoveXlogFile() has this code: /* * Before deleting the file, see if it can be recycled as a future log * segment. Only recycle normal files, pg_standby for example can create * symbolic links pointing to a separate archive directory. */ if (wal_recycle && endlogSegNo <= recycleSegNo && lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) && InstallXLogFileSegment(&endlogSegNo, path, true, recycleSegNo, true)) { ereport(DEBUG2, (errmsg("recycled write-ahead log file \"%s\"", segname))); CheckpointStats.ckpt_segs_recycled++; /* Needn't recheck that slot on future iterations */ endlogSegNo++; } I didn't check the migration history of this code but it seems that endlogSegNo doesn't currently have the right scoping to achieve the goal of that last comment, so checkpoints finish up repeatedly search for the next free slot, starting at the low end each time, like so: stat("pg_wal/0001004F", {st_mode=S_IFREG|0600, st_size=16777216, ...}) = 0 ... stat("pg_wal/00010073", 0x7fff98b9e060) = -1 ENOENT (No such file or directory) stat("pg_wal/0001004F", {st_mode=S_IFREG|0600, st_size=16777216, ...}) = 0 ... stat("pg_wal/00010074", 0x7fff98b9e060) = -1 ENOENT (No such file or directory) ... and so on until we've recycled all our recyclable segments. Ouch.
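One possible shape of a fix, sketched with the names from the quoted code (illustration only, not a tested patch): make the caller own the recycle cursor and pass it down, so each file's free-slot search resumes where the previous one stopped instead of restarting at the low end:

static void
RemoveXlogFileSketch(const char *segname, XLogSegNo *endlogSegNo,
                     XLogSegNo recycleSegNo)
{
    char        path[MAXPGPATH];
    struct stat statbuf;

    snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);

    if (wal_recycle &&
        *endlogSegNo <= recycleSegNo &&
        lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
        InstallXLogFileSegment(endlogSegNo, path, true, recycleSegNo, true))
    {
        /* Recycled: the next call keeps searching from the slot after this one. */
        (*endlogSegNo)++;
    }
    else
    {
        /* ... otherwise unlink the old segment, as in the original code ... */
    }
}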
Re: Removing unneeded self joins
On 1/7/21 7:08 PM, Masahiko Sawada wrote: On Mon, Nov 30, 2020 at 2:51 PM Andrey V. Lepikhov wrote: Thanks, it is my fault. I tried to extend this patch with foreign key references and made a mistake. Currently I rollback this new option (see patch in attachment), but will be working for a while to simplify this patch. Are you working to simplify the patch? This patch has been "Waiting on Author" for 1 month and doesn't seem to pass cfbot tests. Please update the patch. Yes, I'm working to improve this feature. In attachment - fixed and rebased on ce6a71fa53. -- regards, Andrey Lepikhov Postgres Professional >From 3caeb297320af690be71b367329d86c49564e231 Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Mon, 11 Jan 2021 09:01:11 +0500 Subject: [PATCH] Remove self-joins. Remove inner joins of a relation to itself if can be proven that such join can be replaced with a scan. We can build the required proofs of uniqueness using the existing innerrel_is_unique machinery. We can remove a self-join when for each outer row, if: 1. At most one inner row matches the join clauses. 2. If the join target list contains any inner vars then the inner row is (physically) same row as the outer one. In this patch we use Rowley's [1] approach to identify a self-join: 1. Collect all mergejoinable join quals looks like a.x = b.x 2. Collect all another join quals. 3. Check innerrel_is_unique() for the qual list from (1). If it returns true, then outer row matches only the same row from the inner relation. Proved, that this join is self-join and can be replaced by a scan. Some regression tests changed due to self-join removal logic. [1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com --- src/backend/optimizer/plan/analyzejoins.c | 1186 + src/backend/optimizer/plan/planmain.c |5 + src/backend/optimizer/util/relnode.c | 26 +- src/backend/utils/misc/guc.c | 10 + src/include/optimizer/pathnode.h |4 + src/include/optimizer/planmain.h |2 + src/test/regress/expected/equivclass.out | 32 + src/test/regress/expected/join.out| 331 ++ src/test/regress/expected/sysviews.out|3 +- src/test/regress/sql/equivclass.sql | 16 + src/test/regress/sql/join.sql | 166 +++ 11 files changed, 1765 insertions(+), 16 deletions(-) diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 90460a69bd..d631e95f89 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -22,6 +22,7 @@ */ #include "postgres.h" +#include "catalog/pg_class.h" #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" #include "optimizer/joininfo.h" @@ -29,8 +30,12 @@ #include "optimizer/pathnode.h" #include "optimizer/paths.h" #include "optimizer/planmain.h" +#include "optimizer/restrictinfo.h" #include "optimizer/tlist.h" #include "utils/lsyscache.h" +#include "utils/memutils.h" + +bool enable_self_join_removal; /* local functions */ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo); @@ -47,6 +52,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root, RelOptInfo *innerrel, JoinType jointype, List *restrictlist); +static void change_rinfo(RestrictInfo* rinfo, Index from, Index to); /* @@ -1118,3 +1124,1183 @@ is_innerrel_unique_for(PlannerInfo *root, /* Let rel_is_distinct_for() do the hard work */ return rel_is_distinct_for(root, innerrel, clause_list); } + +typedef struct +{ + Index oldRelid; + Index newRelid; +} ChangeVarnoContext; + + +static bool +change_varno_walker(Node 
*node, ChangeVarnoContext *context) +{ + if (node == NULL) + return false; + + if (IsA(node, Var)) + { + Var* var = (Var*)node; + if (var->varno == context->oldRelid) + { + var->varno = context->newRelid; + var->varnosyn = context->newRelid; + var->location = -1; + } + else if (var->varno == context->newRelid) + var->location = -1; + + return false; + } + if (IsA(node, RestrictInfo)) + { + change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid); + return false; + } + return expression_tree_walker(node, change_varno_walker, context); +} + +/* + * For all Vars in the expression that have varno = oldRelid, set + * varno = newRelid. + */ +static void +change_varno(Expr *expr, Index oldRelid, Index newRelid) +{ + ChangeVarnoContext context; + + context.oldRelid = oldRelid; + context.newRelid = newRelid; + change_varno_walker((Node *) expr, &context); +} + +/* + * Substitute newId for oldId in relids. + */ +static void +change_relid(Relids *relids, Index oldId, Index newId) +{ + if (bms_is_member(oldId, *relids)) + *relids = bms_add_member(bms_del_member(bms_copy(*relids), oldId), newId); +} + +static void +change_rinfo(RestrictInfo* rinfo, Index from, Index to) +{ + bool is_req_equal
Re: Single transaction in the tablesync worker?
On Fri, Jan 8, 2021 at 8:20 AM Amit Kapila wrote: > > On Fri, Jan 8, 2021 at 7:14 AM Peter Smith wrote: > > > > FYI, I was able to reproduce this case in debugger. PSA logs showing > > details. > > > > Thanks for reproducing as I was worried about exactly this case. I > have one question related to logs: > > ## > ## ALTER SUBSCRIPTION to REFRESH the publication > > ## This blocks on some latch until the tablesync worker dies, then it > continues > ## > > Did you check which exact latch or lock blocks this? > I have checked this myself and the command is waiting on the drop of origin till the tablesync worker is finished because replorigin_drop() requires state->acquired_by to be 0 which will only be true once the tablesync worker exits. I think this is the reason you might have noticed that the command can't be finished until the tablesync worker died. So this can't be an interlock between ALTER SUBSCRIPTION .. REFRESH command and tablesync worker and to that end it seems you have below Fixme's in the patch: + * FIXME - Usually this cleanup would be OK, but will not + * always be OK because the logicalrep_worker_stop_at_commit + * only "flags" the worker to be stopped in the near future + * but meanwhile it may still be running. In this case there + * could be a race between the tablesync worker and this code + * to see who will succeed with the tablesync drop (and the + * loser will ERROR). + * + * FIXME - Also, checking the state is also not guaranteed + * correct because state might be TCOPYDONE when we checked + * but has since progressed to SYNDONE + */ + + if (state == SUBREL_STATE_TCOPYDONE) + { I feel this was okay for an earlier code but now we need to stop the tablesync workers before trying to drop the slot as we do in DropSubscription. Now, if we do that then that would fix the race conditions mentioned in Fixme but still, there are few more things I am worried about: (a) What if the launcher again starts the tablesync worker? One idea could be to acquire AccessExclusiveLock on SubscriptionRelationId as we do in DropSubscription which is not a very good idea but I can't think of any other good way. (b) the patch is just checking SUBREL_STATE_TCOPYDONE before dropping the replication slot but the slot could be created even before that (in SUBREL_STATE_DATASYNC state). One idea could be we can try to drop the slot and if we are not able to drop then we can simply continue assuming it didn't exist. One minor comment: 1. + SpinLockAcquire(&MyLogicalRepWorker->relmutex); MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE; MyLogicalRepWorker->relstate_lsn = current_lsn; - Spurious line removal. -- With Regards, Amit Kapila.
Re: logical replication worker accesses catalogs in error context callback
On Sat, Jan 9, 2021 at 2:57 PM Bharath Rupireddy wrote: > > On Thu, Jan 7, 2021 at 5:47 AM Masahiko Sawada wrote: > > > > On Wed, Jan 6, 2021 at 11:42 AM Masahiko Sawada > > wrote: > > > > > > On Wed, Jan 6, 2021 at 11:02 AM Andres Freund wrote: > > > > > > > > Hi, > > > > > > > > Due to a debug ereport I just noticed that worker.c's > > > > slot_store_error_callback is doing something quite dangerous: > > > > > > > > static void > > > > slot_store_error_callback(void *arg) > > > > { > > > > SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg; > > > > LogicalRepRelMapEntry *rel; > > > > char *remotetypname; > > > > Oid remotetypoid, > > > > localtypoid; > > > > > > > > /* Nothing to do if remote attribute number is not set */ > > > > if (errarg->remote_attnum < 0) > > > > return; > > > > > > > > rel = errarg->rel; > > > > remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum]; > > > > > > > > /* Fetch remote type name from the LogicalRepTypMap cache */ > > > > remotetypname = logicalrep_typmap_gettypname(remotetypoid); > > > > > > > > /* Fetch local type OID from the local sys cache */ > > > > localtypoid = get_atttype(rel->localreloid, > > > > errarg->local_attnum + 1); > > > > > > > > errcontext("processing remote data for replication target > > > > relation \"%s.%s\" column \"%s\", " > > > >"remote type %s, local type %s", > > > >rel->remoterel.nspname, > > > > rel->remoterel.relname, > > > > > > > > rel->remoterel.attnames[errarg->remote_attnum], > > > >remotetypname, > > > >format_type_be(localtypoid)); > > > > } > > > > > > > > > > > > that's not code that can run in an error context callback. It's > > > > theoretically possible (but unlikely) that > > > > logicalrep_typmap_gettypname() is safe to run in an error context > > > > callback. But get_atttype() definitely isn't. > > > > > > > > get_attype() may do catalog accesses. That definitely can't happen > > > > inside an error context callback - the entire transaction might be > > > > borked at this point! > > > > > > You're right. Perhaps calling to format_type_be() is also dangerous > > > since it does catalog access. We should have added the local type > > > names to SlotErrCallbackArg so we avoid catalog access in the error > > > context. > > > > > > I'll try to fix this. > > > > Attached the patch that fixes this issue. > > > > Since logicalrep_typmap_gettypname() could search the sys cache by > > calling to format_type_be(), I stored both local and remote type names > > to SlotErrCallbackArg so that we can just set the names in the error > > callback without sys cache lookup. > > > > Please review it. > > Patch looks good, except one minor comment - can we store > rel->remoterel.atttyps[remoteattnum] into a local variable and use it > in logicalrep_typmap_gettypname, just to save on indentation? Thank you for reviewing the patch! Agreed. Attached the updated patch. > > I quickly searched in places where error callbacks are being used, I > think we need a similar kind of fix for conversion_error_callback in > postgres_fdw.c, because get_attname and get_rel_name are being used > which do catalogue lookups. ISTM that all the other error callbacks > are good in the sense that they are not doing sys catalogue lookups. Indeed. If we need to disallow the catalog lookup during executing error callbacks we might want to have an assertion checking that in SearchCatCacheInternal(), in addition to Assert(IsTransactionState()). 
Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/ fix_slot_store_error_callback_v2.patch Description: Binary data
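The core of the approach in the attached patch, as a minimal sketch: resolve anything that needs a syscache or catalog lookup before the error context is active, and let the callback only format strings that were stashed in its argument (struct layout abbreviated):

    typedef struct SlotErrCallbackArg
    {
        LogicalRepRelMapEntry *rel;
        int         remote_attnum;
        char       *local_typename;     /* resolved before the callback can fire */
        char       *remote_typename;    /* ditto */
    } SlotErrCallbackArg;

    static void
    slot_store_error_callback(void *arg)
    {
        SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg;
        LogicalRepRelMapEntry *rel = errarg->rel;

        /* Nothing to do if remote attribute number is not set */
        if (errarg->remote_attnum < 0)
            return;

        /* No catalog or syscache access here: everything was precomputed. */
        errcontext("processing remote data for replication target relation \"%s.%s\" column \"%s\", "
                   "remote type %s, local type %s",
                   rel->remoterel.nspname, rel->remoterel.relname,
                   rel->remoterel.attnames[errarg->remote_attnum],
                   errarg->remote_typename, errarg->local_typename);
    }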
Re: [HACKERS] Custom compression methods
On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby wrote: > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby wrote: > > > And fails pg_upgrade check, apparently losing track of the compression (?) > > > > > > CREATE TABLE public.cmdata2 ( > > > -f1 text COMPRESSION lz4 > > > +f1 text > > > ); > > > > I did not get this? pg_upgrade check is passing for me. > > I realized that this was failing in your v16 patch sent Dec 25. > It's passing on current patches because they do "DROP TABLE cmdata2", but > that's only masking the error. > > I think this patch needs to be specifically concerned with pg_upgrade, so I > suggest to not drop your tables and MVs, to allow the pg_upgrade test to check > them. That exposes this issue: Thanks for the suggestion I will try this. > pg_dump: error: Error message from server: ERROR: cache lookup failed for > access method 36447 > pg_dump: error: The command was: COPY public.cmdata (f1) TO stdout; > pg_dumpall: error: pg_dump failed on database "regression", exiting > waiting for server to shut down done > server stopped > pg_dumpall of post-upgrade database cluster failed > > I found that's the AM's OID in the old clsuter: > regression=# SELECT * FROM pg_am WHERE oid=36447; > oid | amname | amhandler | amtype > ---++-+ > 36447 | pglz2 | pglzhandler | c > > But in the new cluster, the OID has changed. Since that's written into table > data, I think you have to ensure that the compression OIDs are preserved on > upgrade: > > 16755 | pglz2 | pglzhandler | c Yeah, basically we are storing am oid in the compressed data so Oid must be preserved. I will look into this and fix it. > In my brief attempt to inspect it, I got this crash: > > $ tmp_install/usr/local/pgsql/bin/postgres -D > src/bin/pg_upgrade/tmp_check/data & > regression=# SELECT pg_column_compression(f1) FROM cmdata a; > server closed the connection unexpectedly > > Thread 1 "postgres" received signal SIGSEGV, Segmentation fault. > __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 > 120 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory. > (gdb) bt > #0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 > #1 0x55c6049fde62 in cstring_to_text (s=0x0) at varlena.c:193 > #2 pg_column_compression () at varlena.c:5335 > > (gdb) up > #2 pg_column_compression () at varlena.c:5335 > 5335PG_RETURN_TEXT_P(cstring_to_text(get_am_name( > (gdb) l > 5333varvalue = (struct varlena *) DatumGetPointer(value); > 5334 > 5335PG_RETURN_TEXT_P(cstring_to_text(get_am_name( > 5336 > toast_get_compression_oid(varvalue; > > I guess a missing AM here is a "shouldn't happen" case, but I'd prefer it to > be > caught with an elog() (maybe in get_am_name()) or at least an Assert. Yeah, this makes sense. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
On Thu, Jan 7, 2021 at 9:32 PM Fujii Masao wrote: > > > > On 2021/01/07 17:28, shinya11.k...@nttdata.com wrote: > >> On Thu, Jan 7, 2021 at 1:30 PM Fujii Masao > >> wrote: > >>> > >>> On 2021/01/07 12:42, Masahiko Sawada wrote: > On Thu, Jan 7, 2021 at 10:59 AM Fujii Masao > wrote: > > > > On 2021/01/07 10:01, Masahiko Sawada wrote: > >> On Wed, Jan 6, 2021 at 3:37 PM > >> wrote: > >>> > +#define Query_for_list_of_cursors \ > +" SELECT name FROM pg_cursors"\ > > This query should be the following? > > " SELECT pg_catalog.quote_ident(name) "\ > " FROM pg_catalog.pg_cursors "\ > " WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'" > > +/* CLOSE */ > + else if (Matches("CLOSE")) > + COMPLETE_WITH_QUERY(Query_for_list_of_cursors > + " UNION ALL SELECT 'ALL'"); > > "UNION ALL" should be "UNION"? > >>> > >>> Thank you for your advice, and I corrected them. > >>> > > + COMPLETE_WITH_QUERY(Query_for_list_of_cursors > > + " UNION SELECT 'ABSOLUTE'" > > + " UNION SELECT 'BACKWARD'" > > + " UNION SELECT 'FORWARD'" > > + " UNION SELECT 'RELATIVE'" > > + " UNION SELECT 'ALL'" > > + " UNION SELECT 'NEXT'" > > + " UNION SELECT 'PRIOR'" > > + " UNION SELECT 'FIRST'" > > + " UNION SELECT 'LAST'" > > + " UNION SELECT 'FROM'" > > + " UNION SELECT 'IN'"); > > > > This change makes psql unexpectedly output "FROM" and "IN" just > > after "FETCH". > > I think "FROM" and "IN" can be placed just after "FETCH". According > to > the documentation, the direction can be empty. For instance, we can > do > like: > >>> > >>> Thank you! > >>> > I've attached the patch improving the tab completion for DECLARE as > an > example. What do you think? > > BTW according to the documentation, the options of DECLARE statement > (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive. > > DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ] > CURSOR [ { WITH | WITHOUT } HOLD ] FOR query > > But I realized that these options are actually order-insensitive. For > instance, we can declare a cursor like: > > =# declare abc scroll binary cursor for select * from pg_class; > DECLARE CURSOR > > The both parser code and documentation has been unchanged from 2003. > Is it a documentation bug? > >>> > >>> Thank you for your patch, and it is good. > >>> I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO > >>> SCROLL) are order-sensitive." > >>> I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in > >>> any order." , according to the documentation. > >> > >> Thanks, you're right. I missed that sentence. I still don't think the > >> syntax of DECLARE statement in the documentation doesn't match its > >> implementation but I agree that it's order-insensitive. > >> > >>> I made a new patch, but the amount of codes was large due to > >>> order-insensitive. > >> > >> Thank you for updating the patch! > >> > >> Yeah, I'm also afraid a bit that conditions will exponentially > >> increase when a new option is added to DECLARE statement in the > >> future. Looking at the parser code for DECLARE statement, we can put > >> the same options multiple times (that's also why I don't think it > >> matches). For instance, > >> > >> postgres(1:44758)=# begin; > >> BEGIN > >> postgres(1:44758)=# declare test binary binary binary cursor for > >> select * from pg_class; > >> DECLARE CURSOR > >> > >> So how about simplify the above code as follows? 
> >> > >> @@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, > >> int end) > >> else if (Matches("DECLARE", MatchAny)) > >> COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", > >> "CURSOR"); > >> + /* > >> + * Complete DECLARE with one of BINARY, INSENSITIVE, SCROLL, > >> + * NO SCROLL, and CURSOR. The tail doesn't match any keywords for > >> + * DECLARE, assume we want options. > >> + */ > >> + else if (HeadMatches("DECLARE", MatchAny, "*") && > >> + TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR"))) > >> + COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", > >> + "CURSOR"); > > > > This change seems to cause "DECLARE ... CURSOR FOR SELECT " to > > unexpectedly output BINARY, INSENSITIVE, etc. > > Indeed. Is the following not complete but much better? > >>> > >>> Yes, I think that's better. > >>> > > @@
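For readability, the cursor-name query suggested earlier in this thread would look like this as a tab-complete.c macro:

    #define Query_for_list_of_cursors \
    " SELECT pg_catalog.quote_ident(name) "\
    "   FROM pg_catalog.pg_cursors "\
    "  WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"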
Re: Added schema level support for publication.
On Sun, Jan 10, 2021 at 11:21 PM vignesh C wrote: > On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy > wrote: > > I think this feature can be useful, in case a user has a lot of tables > > to publish inside a schema. Having said that, I wonder if this feature > > mandates users to create the same schema with same > > permissions/authorizations manually on the subscriber, because logical > > replication doesn't propagate any ddl's so are the schema or schema > > changes? Or is it that the list of tables from the publisher can go > > into a different schema on the subscriber? > > > > DDL's will not be propagated to the subscriber. Users have to create > the schema & tables in the subscriber. No change in > Permissions/authorizations handling, it will be the same as the > existing behavior for relations. Looks like the existing behaviour already requires users to create the schema on the subscriber when publishing the tables from that schema. Otherwise, an error is thrown on the subscriber [1]. [1] on publisher: CREATE SCHEMA myschema; CREATE TABLE myschema.t1(a1 int, b1 int); INSERT INTO myschema.t1_myschema SELECT i, i+10 FROM generate_series(1,10) i; CREATE PUBLICATION testpub FOR TABLE myschema.t1; on subscriber: postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost dbname=postgres user=bharath port=5432' PUBLICATION testpub; ERROR: schema "myschema" does not exist CREATE SCHEMA myschema; CREATE TABLE myschema.t1(a1 int, b1 int); postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost dbname=postgres user=bharath port=5432' PUBLICATION testpub; NOTICE: created replication slot "testsub" on publisher CREATE SUBSCRIPTION > > Since the schema can have other objects such as data types, functions, > > operators, I'm sure with your feature, non-table objects will be > > skipped. > > > > Yes, only table data will be sent to subscribers, non-table objects > will be skipped. Looks like the existing CREATE PUBLICATION FOR ALL TABLES, which is for all the tables in the database, does this i.e. skips non-table objects and temporary tables, foreign tables and so on. So, your feature also can behave the same way, but within the scope of the given schema/s. > > As Amit pointed out earlier, the behaviour when schema dropped, I > > think we should also consider when schema is altered, say altered to a > > different name, maybe we should change that in the publication too. > > > > I agree that when schema is altered the renamed schema should be > reflected in the publication. I think, it's not only making sure that the publisher side has the new altered schema, but also the subscriber needs those alters. Having said that, since these alters come under DDL changes and in logical replication we don't publish the scheme changes to the subscriber, we may not need to anything extra for informing the schema alters to the subscriber from the publisher, the users might have to do the same schema alter on the subscriber and then a ALTER SUBSCRIPTION testsub REFRESH PUBLICATION; should work for them? If this understanding is correct, then we should document this. > > In general, what happens if we have some temporary tables or foreign > > tables inside the schema, will they be allowed to send the data to > > subscribers? > > > > Temporary tables & foreign tables will not be added to the publications. Yes the existing logical replication framework doesn't allow replication of temporary, unlogged, foreign tables and other non-table relations such as materialized views, indexes etc [1]. 
The CREATE PUBLICATION statement either fails in check_publication_add_relation or before that. CREATE PUBLICATION testpub FOR TABLE tab1, throwing the error if the single table tab1 is any of the above restricted tables, seems fine. But if there's a list of tables, as in CREATE PUBLICATION testpub FOR TABLE normal_tab1, temp_tab2, normal_tab3, foreign_tab4, unlogged_tab5, normal_tab6, normal_tab7 ..; the query fails on the first encounter of a restricted table, say at temp_tab2. Whereas CREATE PUBLICATION testpub FOR ALL TABLES; would skip the restricted tables and continue to add the accepted tables into the publication within the database. IMHO, if there's a list of tables specified with FOR TABLE, then instead of throwing an error for any restricted table, we can issue a warning and continue with the addition of the accepted tables into the publication. If done, this behaviour will be in sync with FOR ALL TABLES. Thoughts? If okay, I can work on a patch. Related to error messages: when a foreign table is specified in a CREATE PUBLICATION statement, "ERROR: "f1" is not a table" is thrown [1]; how about the error message "ERROR: foreign table "f1" cannot be replicated" instead? In general, it would be good if we could have error messages something like in [2] instead of the existing [1]. Thoughts? If okay, I can work on a patch. [1] t1 is a temporary table: postgres=# CREATE PUBLICATION testpub FOR TABLE
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
> 10 янв. 2021 г., в 14:43, Noah Misch написал(а): > >> Can you please send revised patches with fixes? > > Attached. > > I'm marking the patch as ready for committer. I can't tell whether we should backpatch the insurance patch or not: it potentially fixes unknown bugs, and potentially contains unknown bugs, so it is hard to reason about under such uncertainty. I've looked for any potential problem and so far I see none; chances are it makes the code less error-prone. The fix itself is certainly worth backpatching. Thanks! Best regards, Andrey Borodin.
Re: [HACKERS] Custom compression methods
On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar wrote: > > On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby wrote: > > > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby wrote: > > > > And fails pg_upgrade check, apparently losing track of the compression > > > > (?) > > > > > > > > CREATE TABLE public.cmdata2 ( > > > > -f1 text COMPRESSION lz4 > > > > +f1 text > > > > ); > > > > > > I did not get this? pg_upgrade check is passing for me. > > > > I realized that this was failing in your v16 patch sent Dec 25. > > It's passing on current patches because they do "DROP TABLE cmdata2", but > > that's only masking the error. I tested specifically pg_upgrade by removing all the DROP table and MV and it is passing. I don't see the reason why should it fail. I mean after the upgrade why COMPRESSION lz4 is missing? > > I think this patch needs to be specifically concerned with pg_upgrade, so I > > suggest to not drop your tables and MVs, to allow the pg_upgrade test to > > check > > them. That exposes this issue: > > Thanks for the suggestion I will try this. > > > pg_dump: error: Error message from server: ERROR: cache lookup failed for > > access method 36447 > > pg_dump: error: The command was: COPY public.cmdata (f1) TO stdout; > > pg_dumpall: error: pg_dump failed on database "regression", exiting > > waiting for server to shut down done > > server stopped > > pg_dumpall of post-upgrade database cluster failed > > > > I found that's the AM's OID in the old clsuter: > > regression=# SELECT * FROM pg_am WHERE oid=36447; > > oid | amname | amhandler | amtype > > ---++-+ > > 36447 | pglz2 | pglzhandler | c > > > > But in the new cluster, the OID has changed. Since that's written into > > table > > data, I think you have to ensure that the compression OIDs are preserved on > > upgrade: > > > > 16755 | pglz2 | pglzhandler | c > > Yeah, basically we are storing am oid in the compressed data so Oid > must be preserved. I will look into this and fix it. On further analysis, if we are dumping and restoring then we will compress the data back while inserting it so why would we need to old OID. I mean in the new cluster we are inserting data again so it will be compressed again and now it will store the new OID. Am I missing something here? > > In my brief attempt to inspect it, I got this crash: > > > > $ tmp_install/usr/local/pgsql/bin/postgres -D > > src/bin/pg_upgrade/tmp_check/data & > > regression=# SELECT pg_column_compression(f1) FROM cmdata a; > > server closed the connection unexpectedly I tried to test this after the upgrade but I can get the proper value. 
Laptop309pnin:bin dilipkumar$ ./pg_ctl -D /Users/dilipkumar/Documents/PG/custom_compression/src/bin/pg_upgrade/tmp_check/data.old/ start waiting for server to start2021-01-11 11:53:28.153 IST [43412] LOG: starting PostgreSQL 14devel on x86_64-apple-darwin19.6.0, compiled by Apple clang version 11.0.3 (clang-1103.0.32.62), 64-bit 2021-01-11 11:53:28.170 IST [43412] LOG: database system is ready to accept connections done server started Laptop309pnin:bin dilipkumar$ ./psql -d regression regression[43421]=# SELECT pg_column_compression(f1) FROM cmdata a; pg_column_compression --- lz4 lz4 pglz2 (3 rows) Manual test: (dump and load on the new cluster) --- postgres[43903]=# CREATE ACCESS METHOD pglz2 TYPE COMPRESSION HANDLER pglzhandler; CREATE ACCESS METHOD postgres[43903]=# select oid from pg_am where amname='pglz2'; oid --- 16384 (1 row) postgres[43903]=# CREATE TABLE cmdata_test(f1 text COMPRESSION pglz2); CREATE TABLE postgres[43903]=# INSERT INTO cmdata_test VALUES(repeat('1234567890',1000)); INSERT 0 1 postgres[43903]=# SELECT pg_column_compression(f1) FROM cmdata_test; pg_column_compression --- pglz2 (1 row) Laptop309pnin:bin dilipkumar$ ./pg_dump -d postgres > 1.sql —restore on new cluster— postgres[44030]=# select oid from pg_am where amname='pglz2'; oid --- 16385 (1 row) postgres[44030]=# SELECT pg_column_compression(f1) FROM cmdata_test; pg_column_compression --- pglz2 (1 row) You can see on the new cluster the OID of the pglz2 is changed but there is no issue. Is it possible for you to give me a self-contained test case to reproduce the issue or a theory that why it should fail? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Custom compression methods
On Mon, Jan 11, 2021 at 12:11:54PM +0530, Dilip Kumar wrote: > On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar wrote: > > On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby wrote: > > > > > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby > > > > wrote: > > > > > And fails pg_upgrade check, apparently losing track of the > > > > > compression (?) > > > > > > > > > > CREATE TABLE public.cmdata2 ( > > > > > -f1 text COMPRESSION lz4 > > > > > +f1 text > > > > > ); > > > > > > > > I did not get this? pg_upgrade check is passing for me. > > > > > > I realized that this was failing in your v16 patch sent Dec 25. > > > It's passing on current patches because they do "DROP TABLE cmdata2", but > > > that's only masking the error. > > I tested specifically pg_upgrade by removing all the DROP table and MV > and it is passing. I don't see the reason why should it fail. I mean > after the upgrade why COMPRESSION lz4 is missing? How did you test it ? I'm not completely clear how this is intended to work... has it been tested before ? According to the comments, in binary upgrade mode, there's an ALTER which is supposed to SET COMPRESSION, but that's evidently not happening. > > > I found that's the AM's OID in the old clsuter: > > > regression=# SELECT * FROM pg_am WHERE oid=36447; > > > oid | amname | amhandler | amtype > > > ---++-+ > > > 36447 | pglz2 | pglzhandler | c > > > > > > But in the new cluster, the OID has changed. Since that's written into > > > table > > > data, I think you have to ensure that the compression OIDs are preserved > > > on > > > upgrade: > > > > > > 16755 | pglz2 | pglzhandler | c > > > > Yeah, basically we are storing am oid in the compressed data so Oid > > must be preserved. I will look into this and fix it. > > On further analysis, if we are dumping and restoring then we will > compress the data back while inserting it so why would we need to old > OID. I mean in the new cluster we are inserting data again so it will > be compressed again and now it will store the new OID. Am I missing > something here? I'm referring to pg_upgrade which uses pg_dump, but does *not* re-insert data, but rather recreates catalogs only and then links to the old tables (either with copy, link, or clone). Test with make -C src/bin/pg_upgrade (which is included in make check-world). -- Justin
Re: Moving other hex functions to /common
On Wed, Jan 06, 2021 at 08:58:23AM -0500, Bruce Momjian wrote: > Fine. Do you want to add the overflow to the patch I posted, for all > encoding types? Yeah. It looks like it would be good to be consistent for the escape case as well, so that it is possible to add a dstlen argument to struct pg_encoding for the encoding and decoding routines. I would also prefer the option to remove the argument "data" from the encode and decode length routines for the hex routines that are part of src/common/, even if it forces the creation of two small wrappers in encode.c to call the routines of src/common/. Would you prefer that I send a patch myself? Please note that anything I'd send would use elog() and pg_log() directly instead of returning status codes for the src/common/ routines, and of course would not touch ECPG, as that's the approach you are favoring. -- Michael signature.asc Description: PGP signature
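As a sketch of that direction (field names and integer types are illustrative, not necessarily what a final patch would use), the callback table could carry the destination length like this:

    struct pg_encoding
    {
        uint64      (*encode_len) (const char *data, size_t dlen);
        uint64      (*decode_len) (const char *data, size_t dlen);
        uint64      (*encode) (const char *data, size_t dlen,
                               char *res, size_t reslen);
        uint64      (*decode) (const char *data, size_t dlen,
                               char *res, size_t reslen);
    };

with each encode/decode implementation erroring out (via elog(), per the note above) rather than writing past res + reslen.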
Re: psql \df choose functions by their arguments
Hi I tried this patch out last year but was overrolled by Other Stuff before I got around to providing any feedback, and was reminded of it just now when I was trying to execute "\df somefunction text int" or similar, which had me confused until I remembered it's not a feature yet, so it would certainly be very welcome to have this. 2020年11月3日(火) 23:27 Greg Sabino Mullane : > > Thanks for looking this over! > >> >> some Abbreviations of common types are not added to the type_abbreviations[] >> Such as: >> >> Int8=> bigint > > > I wasn't aiming to provide a canonical list, as I personally have never seen > anyone use int8 instead of bigint when (for example) creating a function, but > I'm not strongly opposed to expanding the list. I have vague memories of working with "int8" a bit (possibly related to an Informix migration), anyway it seems easy enough to add them for completeness as someone (possibly migrating from another database) might wonder why it's not working. Just a small code readability suggestion - in exec_command_d(), it seems neater to put the funcargs declaration in a block together with the code with which uses it (see attached diff). Regards Ian Barwick -- EnterpriseDB: https://www.enterprisedb.com diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index d449cea66c..e8c1fd64bc 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -781,8 +781,6 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd) case 'f': /* function subsystem */ switch (cmd[2]) { - char *funcargs; - case '\0': case '+': case 'S': @@ -791,10 +789,14 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd) case 'p': case 't': case 'w': + { + char *funcargs; + funcargs = psql_scan_slash_option(scan_state, OT_WHOLE_LINE, NULL, true); success = describeFunctions(&cmd[2], pattern, show_verbose, show_system, funcargs); free(funcargs); break; + } default: status = PSQL_CMD_UNKNOWN; break;
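Regarding the int8 example above, the additions being discussed would just be more entries in the patch's lookup table. A sketch, assuming a flat alias/canonical-name pair layout (the actual array in the patch may be organized differently):

    static const char *const type_abbreviations[] = {
        "bool", "boolean",
        "int2", "smallint",
        "int4", "integer",
        "int8", "bigint",
        "float4", "real",
        "float8", "double precision",
        /* ... */
        NULL
    };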
Re: logical replication worker accesses catalogs in error context callback
On Mon, Jan 11, 2021 at 10:56 AM Masahiko Sawada wrote: > Agreed. Attached the updated patch. Thanks for the updated patch. Looks like the comment crosses the 80 char limit, I have corrected it. And also changed the variable name from remotetypeid to remotetypid, so that logicalrep_typmap_gettypname will not cross the 80 char limit. And also added a commit message. Attaching v3 patch, please have a look. Both make check and make check-world passes. > > I quickly searched in places where error callbacks are being used, I > > think we need a similar kind of fix for conversion_error_callback in > > postgres_fdw.c, because get_attname and get_rel_name are being used > > which do catalogue lookups. ISTM that all the other error callbacks > > are good in the sense that they are not doing sys catalogue lookups. > > Indeed. If we need to disallow the catalog lookup during executing > error callbacks we might want to have an assertion checking that in > SearchCatCacheInternal(), in addition to Assert(IsTransactionState()). I tried to add(as attached in v1-0001-Avoid-Sys-Cache-Lookups-in-Error-Callbacks.patch) the Assert(!error_context_stack); in SearchCatCacheInternal, initdb itself fails [1]. That means, we are doing a bunch of catalogue access from error context callbacks. Given this, I'm not quite sure whether we can have such an assertion in SearchCatCacheInternal. Although unrelated to what we are discussing here - when I looked at SearchCatCacheInternal, I found that the function SearchCatCache has SearchCatCacheInternal in the function comment, I think we should correct it. Thoughts? If okay, I will post a separate patch for this minor comment fix. [1] running bootstrap script ... TRAP: FailedAssertion("error_context_stack", File: "catcache.c", Line: 1220, PID: 310728) /home/bharath/workspace/postgres/instnew/bin/postgres(ExceptionalCondition+0xd0)[0x56056984c8ba] /home/bharath/workspace/postgres/instnew/bin/postgres(+0x76eb1a)[0x560569826b1a] /home/bharath/workspace/postgres/instnew/bin/postgres(SearchCatCache1+0x3a)[0x5605698269af] /home/bharath/workspace/postgres/instnew/bin/postgres(SearchSysCache1+0xc1)[0x5605698448b2] /home/bharath/workspace/postgres/instnew/bin/postgres(get_typtype+0x1f)[0x56056982e389] /home/bharath/workspace/postgres/instnew/bin/postgres(CheckAttributeType+0x29)[0x5605692bafe9] /home/bharath/workspace/postgres/instnew/bin/postgres(CheckAttributeNamesTypes+0x2c9)[0x5605692bafac] /home/bharath/workspace/postgres/instnew/bin/postgres(heap_create_with_catalog+0x11f)[0x5605692bc436] /home/bharath/workspace/postgres/instnew/bin/postgres(boot_yyparse+0x7f0)[0x5605692a0d3f] /home/bharath/workspace/postgres/instnew/bin/postgres(+0x1ecb36)[0x5605692a4b36] /home/bharath/workspace/postgres/instnew/bin/postgres(AuxiliaryProcessMain+0x5e0)[0x5605692a4997] /home/bharath/workspace/postgres/instnew/bin/postgres(main+0x268)[0x5605694c6bce] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fa2777ce0b3] /home/bharath/workspace/postgres/instnew/bin/postgres(_start+0x2e)[0x5605691794de] With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com From a806415a49e3fc8c1e71a102aefb0c10c08deb05 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Mon, 11 Jan 2021 12:26:12 +0530 Subject: [PATCH v3] fix slot store error callback Avoid accessing system catalogues inside slot_store_error_callback error context callback, because the the entire transaction might have been broken at this point. 
Since logicalrep_typmap_gettypname() could search the sys cache by calling to format_type_be(), store both local and remote type names to SlotErrCallbackArg so that we can just set the names in the error callback without sys cache lookup. Author: Masahiko Sawada --- src/backend/replication/logical/worker.c | 50 +--- 1 file changed, 28 insertions(+), 22 deletions(-) diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 1b1d70ed68..95f28c9acf 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -132,8 +132,9 @@ static dlist_head lsn_mapping = DLIST_STATIC_INIT(lsn_mapping); typedef struct SlotErrCallbackArg { LogicalRepRelMapEntry *rel; - int local_attnum; int remote_attnum; + char *local_typename; + char *remote_typename; } SlotErrCallbackArg; /* @@ -430,30 +431,19 @@ static void slot_store_error_callback(void *arg) { SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg; - LogicalRepRelMapEntry *rel; - char *remotetypname; - Oid remotetypoid, -localtypoid; + LogicalRepRelMapEntry *rel = errarg->rel; /* Nothing to do if remote attribute number is not set */ if (errarg->remote_attnum < 0) return; - rel = errarg->rel; - remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum]; - - /* Fetch remote type name from the LogicalRepTypMap cache */ - remotetypname = logicalrep_typmap_gettypname(remotetypoid); - - /* Fetch local type
Re: Reduce the time required for a database recovery from archive.
Hi Stephen,

Based on our last discussion I redesigned the implementation of the WAL archive recovery speed-up. The main idea of the new implementation was partly borrowed from your proposal, to be more precise from the following one:

On 9 Nov 2020, at 23:31, Stephen Frost wrote: > The relatively simple approach I was thinking was that a couple of workers would be started and they'd have some prefetch amount that needs to be kept out ahead of the applying process, which they could potentially calculate themselves without needing to be pushed forward by the applying process.

In the new implementation, several workers are spawned on server startup for delivering WAL segments from the archive. The number of workers to spawn is specified by the GUC parameter wal_prefetch_workers; the maximum number of files to preload from the archive is determined by the GUC parameter wal_max_prefetch_amount. The applier of WAL records still handles WAL files one by one, but since several prefetching processes are loading files from the archive, there is a high probability that when the next WAL file is requested by the applier of WAL records, it has already been delivered from the archive.

Every time any of the running workers is going to preload the next WAL file, it checks whether the limit imposed by the parameter wal_max_prefetch_amount has been reached. If it has, the process suspends preloading until the WAL applier process handles some of the already preloaded WAL files and the total number of files that are loaded but not yet processed drops below this limit.

At the moment I didn't implement a mechanism for dynamic calculation of the number of workers required for loading the WAL files in time. We can consider the current (simplified) implementation as a base for further discussion and turn to this matter in the next iteration if needed.

Also I would like to ask your opinion about an issue I'm thinking about: parallel workers spawned for preloading WAL files from the archive use the original mechanism for delivering files from the archive - they run a command specified by the GUC parameter restore_command. One of the possible parameters accepted by restore_command is %r, which specifies the filename of the last restart point. If several workers preload WAL files simultaneously with another process applying the preloaded WAL files, I'm not sure what the correct way is to determine the last restart point value that the WAL-preloading processes should use, because this value can be updated at any time by the process that applies WAL.

Another issue that I would like to ask your opinion about regards choosing the correct value for the maximum size of the hash table stored in shared memory. Currently, wal_max_prefetch_amount is passed as the value for the maximum hash table size, which I'm not sure is the best decision.

Thanks in advance for feedback.

Regards, Dmitry

0001-Reduce-time-required-to-recover-database-from-archiv.patch Description: Binary data
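To illustrate the throttling rule described in the message above, here is a very rough sketch of the check a prefetch worker might perform before preloading the next segment. All names apart from wal_max_prefetch_amount (PrefetchShared, prefetched_unprocessed, the wait event) are illustrative stand-ins, not taken from the attached patch:

    for (;;)
    {
        bool        below_limit;

        SpinLockAcquire(&PrefetchShared->mutex);
        below_limit = (PrefetchShared->prefetched_unprocessed <
                       wal_max_prefetch_amount);
        SpinLockRelease(&PrefetchShared->mutex);

        if (below_limit)
            break;              /* OK to preload the next WAL segment */

        /* Wait until the startup process consumes some preloaded files. */
        (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                         -1L, WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
        ResetLatch(MyLatch);
    }

The worker would then run restore_command for the next segment and bump prefetched_unprocessed, while the startup process decrements it as it consumes each preloaded file and sets the workers' latches.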