Re: Cleaning up historical portability baggage
On 15.08.22 03:48, Thomas Munro wrote: >> I vaguely remember successfully trying it in the past. But I just tried it unsuccessfully in a VM and there's a bunch of other places saying it's not working... https://github.com/microsoft/WSL/issues/4240 > I think we'd better remove our claim that it works then. Patch attached. When I developed support for abstract unix sockets, I did test them on Windows. The lack of support on WSL appears to be an unrelated fact. See for example how [0] talks about them separately. [0]: https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/
Remove remaining mentions of UNSAFE_STAT_OK
The last use of UNSAFE_STAT_OK was removed in bed90759fcbcd72d4d06969eebab81e47326f9a2, but the build system(s) still mentions it. Is it safe to remove, or does it interact with system header files in some way that isn't obvious here?From 594268d54fd344348aa547bc0d3fa6393255a52b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 15 Aug 2022 10:39:07 +0200 Subject: [PATCH] Remove remaining mentions of UNSAFE_STAT_OK The last use was removed by bed90759fcbcd72d4d06969eebab81e47326f9a2. --- src/interfaces/libpq/Makefile | 2 +- src/tools/msvc/Mkvcbuild.pm | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile index 8abdb092c2..79c574eeb8 100644 --- a/src/interfaces/libpq/Makefile +++ b/src/interfaces/libpq/Makefile @@ -22,7 +22,7 @@ NAME= pq SO_MAJOR_VERSION= 5 SO_MINOR_VERSION= $(MAJORVERSION) -override CPPFLAGS := -DFRONTEND -DUNSAFE_STAT_OK -I$(srcdir) $(CPPFLAGS) -I$(top_builddir)/src/port -I$(top_srcdir)/src/port +override CPPFLAGS := -DFRONTEND -I$(srcdir) $(CPPFLAGS) -I$(top_builddir)/src/port -I$(top_srcdir)/src/port ifneq ($(PORTNAME), win32) override CFLAGS += $(PTHREAD_CFLAGS) endif diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm index 85d03372a5..ee963d85f3 100644 --- a/src/tools/msvc/Mkvcbuild.pm +++ b/src/tools/msvc/Mkvcbuild.pm @@ -266,7 +266,6 @@ sub mkvcbuild $libpq = $solution->AddProject('libpq', 'dll', 'interfaces', 'src/interfaces/libpq'); $libpq->AddDefine('FRONTEND'); - $libpq->AddDefine('UNSAFE_STAT_OK'); $libpq->AddIncludeDir('src/port'); $libpq->AddLibrary('secur32.lib'); $libpq->AddLibrary('ws2_32.lib'); -- 2.37.1
Re: Making Vars outer-join aware
On Tue, Aug 2, 2022 at 3:51 AM Tom Lane wrote: > Here's a rebase up to HEAD, mostly to placate the cfbot. > I accounted for d8e34fa7a (s/all_baserels/all_query_rels/ > in those places) and made one tiny bug-fix change. > Nothing substantive as yet. When we add required PlaceHolderVars to a join rel's targetlist, if the PHV can be computed in the nullable-side of the join, we would add the join's RT index to phnullingrels. This is correct as we know the PHV's value can be nulled at this join. But I'm wondering if it is necessary since we have already propagated any varnullingrels into the PHV when we apply pullup variable replacement in perform_pullup_replace_vars(). On the other hand, the phnullingrels may contain RT indexes of outer joins above this join level. It seems not good to add such a PHV to the joinrel's targetlist. Below is an example: # explain (verbose, costs off) select a.i, ss.jj from a left join (b left join (select c.i, coalesce(c.j, 1) as jj from c) ss on b.i = ss.i) on true; QUERY PLAN - Nested Loop Left Join Output: a.i, (COALESCE(c.j, 1)) -> Seq Scan on public.a Output: a.i, a.j -> Materialize Output: (COALESCE(c.j, 1)) -> Hash Left Join Output: (COALESCE(c.j, 1)) Hash Cond: (b.i = c.i) -> Seq Scan on public.b Output: b.i, b.j -> Hash Output: c.i, (COALESCE(c.j, 1)) -> Seq Scan on public.c Output: c.i, COALESCE(c.j, 1) (15 rows) In this query, for the joinrel {B, C}, the PHV in its targetlist has a phnullingrels that contains the join of {A} and {BC}, which is confusing because we have not reached that join level. I tried the changes below to illustrate the two issues. The assertion holds true during regression tests and the error pops up for the query above. --- a/src/backend/optimizer/util/placeholder.c +++ b/src/backend/optimizer/util/placeholder.c @@ -464,18 +464,20 @@ add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel, { if (sjinfo->jointype == JOIN_FULL && sjinfo->ojrelid != 0) { - /* PHV's value can be nulled at this join */ - phv->phnullingrels = bms_add_member(phv->phnullingrels, - sjinfo->ojrelid); + Assert(bms_is_member(sjinfo->ojrelid, phv->phnullingrels)); + + if (!bms_is_subset(phv->phnullingrels, joinrel->relids)) + elog(ERROR, "phnullingrels is not subset of joinrel's relids"); } } else if (bms_is_subset(phinfo->ph_eval_at, inner_rel->relids)) { if (sjinfo->jointype != JOIN_INNER && sjinfo->ojrelid != 0) { - /* PHV's value can be nulled at this join */ - phv->phnullingrels = bms_add_member(phv->phnullingrels, - sjinfo->ojrelid); + Assert(bms_is_member(sjinfo->ojrelid, phv->phnullingrels)); + + if (!bms_is_subset(phv->phnullingrels, joinrel->relids)) + elog(ERROR, "phnullingrels is not subset of joinrel's relids"); } } If the two issues are indeed something we need to fix, maybe we can change add_placeholders_to_joinrel() to search the PHVs from outer_rel/inner_rel's targetlist, and add the ojrelid to phnullingrels if needed, just like what we do in build_joinrel_tlist(). The PHVs there should have the correct phnullingrels (at least the PHVs in base rels' targetlists have correctly set phnullingrels to NULL). Thanks Richard
Re: Cleaning up historical portability baggage
On Mon, Aug 15, 2022 at 8:36 PM Peter Eisentraut wrote: > On 15.08.22 03:48, Thomas Munro wrote: > >> I vaguely remember successfully trying it in the past. But I just tried it > >> unsuccessfully in a VM and there's a bunch of other places saying it's not > >> working... > >> https://github.com/microsoft/WSL/issues/4240 > > I think we'd better remove our claim that it works then. Patch attached. > > When I developed support for abstract unix sockets, I did test them on > Windows. The lack of support on WSL appears to be an unrelated fact. > See for example how [0] talks about them separately. User amoldeshpande's complaint was posted to the WSL project's issue tracker but it's about native Windows/winsock code and s/he says so explicitly (though other people pile in with various other complaints including WSL interop). User sunilmut's comment says it's not working, and [0] is now just confusing everybody :-(
Fwd: Merging statistics from children instead of re-sampling everything
> > 3) stadistinct - This is quite problematic. We only have the per-child > estimates, and it's not clear if there's any overlap. For now I've just > summed it up, because that's safer / similar to what we do for gather > merge paths etc. Maybe we could improve this by estimating the overlap > somehow (e.g. from MCV lists / histograms). But honestly, I doubt the > estimates based on tiny sample of each child are any better. I suppose > we could introduce a column option, determining how to combine ndistinct > (similar to how we can override n_distinct itself). > > 4) MCV - It's trivial to build a new "parent" MCV list, although it may > be too large (in which case we cut it at statistics target, and copy the > remaining bits to the histogram) > I think there is one approach to solve the problem with calculating mcv and distinct statistics. To do this, you need to calculate the density of the sample distribution and store it, for example, in some slot. Then, when merging statistics, we will sum up the densities of all partitions as functions and get a new density. According to the new density, you can find out which values are most common and which are distinct. To calculate the partition densities, you can use the "Kernel Density Estimation" - https://www.statsmodels.org/dev/examples/notebooks/generated/kernel_density.html The approach may be very inaccurate and difficult to implement, but solves the problem. Regards, Damir Belyalov Postgres Professional
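As a toy illustration of the density-summing idea (plain C with invented names; not PostgreSQL or statsmodels code), each partition's sample can be treated as a Gaussian KDE, and the parent density taken as the row-count-weighted sum of the per-partition densities:

```c
#include <math.h>

/* One partition's sample, treated as a Gaussian kernel density estimate. */
typedef struct PartitionKDE
{
	double	   *sample;			/* sampled values from this partition */
	int			nsample;
	double		bandwidth;		/* smoothing parameter, e.g. Silverman's rule */
	double		reltuples;		/* estimated partition row count, used as weight */
} PartitionKDE;

/* density of a single partition's KDE at point x */
static double
kde_density(const PartitionKDE *p, double x)
{
	double		sum = 0.0;

	for (int i = 0; i < p->nsample; i++)
	{
		double		z = (x - p->sample[i]) / p->bandwidth;

		sum += exp(-0.5 * z * z) / (p->bandwidth * sqrt(2.0 * M_PI));
	}
	return sum / p->nsample;
}

/* merged parent density: row-count-weighted sum of the children's densities */
static double
merged_density(const PartitionKDE *parts, int nparts, double x)
{
	double		total_rows = 0.0;
	double		density = 0.0;

	for (int i = 0; i < nparts; i++)
		total_rows += parts[i].reltuples;

	for (int i = 0; i < nparts; i++)
		density += (parts[i].reltuples / total_rows) * kde_density(&parts[i], x);

	return density;
}
```

Candidate MCVs would then be local maxima of merged_density(), with frequencies taken from the mass around each peak; as noted above, how accurate that is when each child contributes only a tiny sample remains an open question.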
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Hi Amit, Amit Kapila , 6 Ağu 2022 Cmt, 16:01 tarihinde şunu yazdı: > I think there is some basic flaw in slot reuse design. Currently, we > copy the table by starting a repeatable read transaction (BEGIN READ > ONLY ISOLATION LEVEL REPEATABLE READ) and create a slot that > establishes a snapshot which is first used for copy and then LSN > returned by it is used in the catchup phase after the copy is done. > The patch won't establish such a snapshot before a table copy as it > won't create a slot each time. If this understanding is correct, I > think we need to use ExportSnapshot/ImportSnapshot functionality to > achieve it or do something else to avoid the problem mentioned. > I did not really think about the snapshot created by replication slot while making this change. Thanks for pointing it out. I've been thinking about how to fix this issue. There are some points I'm still not sure about. If the worker will not create a new replication slot, which snapshot should we actually export and then import? At the line where the worker was supposed to create replication slot but now will reuse an existing slot instead, calling pg_export_snapshot() can export the snapshot instead of CREATE_REPLICATION_SLOT. However, importing that snapshot into the current transaction may not make any difference since we exported that snapshot from the same transaction. I think this wouldn't make sense. How else an export/import snapshot logic can be placed in this change? LSN also should be set accurately. The current change does not handle LSN properly. I see that CREATE_REPLICATION_SLOT returns consistent_point which indicates the earliest location which streaming can start from. And this consistent_point is used as origin_startpos. If that's the case, would it make sense to use "confirmed_flush_lsn" of the replication slot in case the slot is being reused? Since confirmed_flush_lsn can be considered as the safest, earliest location which streaming can start from, I think it would work. And at this point, with the correct LSN, I'm wondering whether this export/import logic is really necessary if the worker does not create a replication slot. What do you think? For small tables, it could have a visible performance difference as it > involves database write operations to each time create and drop the > origin. But if we don't want to reuse then also you need to set its > origin_lsn appropriately. Currently (without this patch), after > creating the slot, we directly use the origin_lsn returned by > create_slot API whereas now it won't be the same case as the patch > doesn't create a slot every time. > Correct. For this issue, please consider the LSN logic explained above. Thanks, Melih
Re: ICU for global collation
Hello everyone in this thread! While reading and testing the patch that adds ICU for global collations [1] I noticed on master (1c5818b9c68e5c2ac8f19d372f24cce409de1a26) and REL_15_STABLE (63b64d8270691894a9a8f2d4e929e7780020edb8) that: 1) pg_upgrade from REL_14_STABLE (63b64d8270691894a9a8f2d4e929e7780020edb8) does not always work: For REL_14_STABLE: $ initdb -D data_old For REL_15_STABLE or master: $ initdb -D data_new --locale-provider icu --icu-locale ru-RU $ pg_upgrade -d .../data_old -D data_new -b ... -B ... ... Restoring database schemas in the new cluster template1 *failure* Consult the last few lines of "data_new/pg_upgrade_output.d/20220815T142454.223/log/pg_upgrade_dump_1.log" for the probable cause of the failure. Failure, exiting In data_new/pg_upgrade_output.d/20220815T142454.223/log/pg_upgrade_dump_1.log: pg_restore: error: could not execute query: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. Command was: CREATE DATABASE "template1" WITH TEMPLATE = template0 OID = 1 ENCODING = 'UTF8' LOCALE_PROVIDER = libc LOCALE = 'en_US.UTF-8'; In data_new/pg_upgrade_output.d/20220815T142454.223/log/pg_upgrade_server.log: TRAP: FailedAssertion("(dblocprovider == COLLPROVIDER_ICU && dbiculocale) || (dblocprovider != COLLPROVIDER_ICU && !dbiculocale)", File: "dbcommands.c", Line: 1292, PID: 69247) postgres: marina postgres [local] CREATE DATABASE(ExceptionalCondition+0xb9)[0xb4d8ec] postgres: marina postgres [local] CREATE DATABASE(createdb+0x1abc)[0x68ca99] postgres: marina postgres [local] CREATE DATABASE(standard_ProcessUtility+0x651)[0x9b1d82] postgres: marina postgres [local] CREATE DATABASE(ProcessUtility+0x122)[0x9b172a] postgres: marina postgres [local] CREATE DATABASE[0x9b01cf] postgres: marina postgres [local] CREATE DATABASE[0x9b0433] postgres: marina postgres [local] CREATE DATABASE(PortalRun+0x2fe)[0x9af95d] postgres: marina postgres [local] CREATE DATABASE[0x9a953b] postgres: marina postgres [local] CREATE DATABASE(PostgresMain+0x733)[0x9ada6b] postgres: marina postgres [local] CREATE DATABASE[0x8ec632] postgres: marina postgres [local] CREATE DATABASE[0x8ebfbb] postgres: marina postgres [local] CREATE DATABASE[0x8e8653] postgres: marina postgres [local] CREATE DATABASE(PostmasterMain+0x1226)[0x8e7f26] postgres: marina postgres [local] CREATE DATABASE[0x7bbccb] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7eff082f90b3] postgres: marina postgres [local] CREATE DATABASE(_start+0x2e)[0x49c29e] 2022-08-15 14:24:56.124 MSK [69231] LOG: server process (PID 69247) was terminated by signal 6: Aborted 2022-08-15 14:24:56.124 MSK [69231] DETAIL: Failed process was running: CREATE DATABASE "template1" WITH TEMPLATE = template0 OID = 1 ENCODING = 'UTF8' LOCALE_PROVIDER = libc LOCALE = 'en_US.UTF-8'; 1.1) It looks like there's a bug in the function get_db_infos (src/bin/pg_upgrade/info.c), where the version of the old cluster is always checked: if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500) snprintf(query + strlen(query), sizeof(query) - strlen(query), "'c' AS datlocprovider, NULL AS daticulocale, "); else snprintf(query + strlen(query), sizeof(query) - strlen(query), "datlocprovider, daticulocale, "); With the simple patch diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c index df374ce4b362b4c6c87fc1fd0e476e5d6d353d9e..53ea348e211d3ac38334292bc16cb814bc13bb87 100644 --- a/src/bin/pg_upgrade/info.c +++ b/src/bin/pg_upgrade/info.c @@ -319,7 +319,7 @@ 
get_db_infos(ClusterInfo *cluster) snprintf(query, sizeof(query), "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, "); - if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500) + if (GET_MAJOR_VERSION(cluster->major_version) < 1500) snprintf(query + strlen(query), sizeof(query) - strlen(query), "'c' AS datlocprovider, NULL AS daticulocale, "); else I got the expected error during the upgrade: locale providers for database "template1" do not match: old "libc", new "icu" Failure, exiting 1.2) It looks like the mentioned asserion in dbcommands.c conflicts with the following lines earlier: if (dbiculocale == NULL) dbiculocale = src_iculocale; The following patch works for me: diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index b31a30550b025d48ba3cc250dc4c15f41f9a80be..17a2942341e528c01182fb6d4878580f2706bec9 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -1048,6 +1048,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("ICU locale must be
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2022-07-19 21:40, Damir Belyalov wrote: Hi! Improved my patch by adding block subtransactions. The block size is determined by the REPLAY_BUFFER_SIZE parameter. I used the idea of a buffer for accumulating tuples in it. If we read REPLAY_BUFFER_SIZE rows without errors, the subtransaction will be committed. If we find an error, the subtransaction will rollback and the buffer will be replayed containing tuples. Thanks for working on this! I tested 0002-COPY-IGNORE_ERRORS.patch and faced an unexpected behavior. I loaded 1 rows which contained 1 wrong row. I expected I could see rows after COPY, but just saw 999 rows. Since when I changed MAX_BUFFERED_TUPLES from 1000 to other values, the number of loaded rows also changed, I imagine MAX_BUFFERED_TUPLES might be influencing this behavior. ```sh $ cat /tmp/test1.dat 1 aaa 2 aaa 3 aaa 4 aaa 5 aaa 6 aaa 7 aaa 8 aaa 9 aaa 10 aaa 11 aaa ... 9994aaa 9995aaa 9996aaa 9997aaa 9998aaa aaa xxx aaa ``` ```SQL =# CREATE TABLE test (id int, data text); =# COPY test FROM '/tmp/test1.dat' WITH (IGNORE_ERRORS); WARNING: COPY test, line 1, column i: "xxx" COPY =# SELECT COUNT(*) FROM test; count --- 999 (1 row) ``` BTW I may be overlooking it, but have you submitted this proposal to the next CommitFest? https://commitfest.postgresql.org/39/ -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
pg_receivewal and SIGTERM
There's a smallish backup tool called pg_backupcluster in Debian's postgresql-common which also ships a systemd service that runs pg_receivewal for wal archiving, and supplies a pg_getwal script for reading the files back on restore, including support for .partial files. So far the machinery was using plain files and relied on compressing the WALs from time to time, but now I wanted to compress the files directly from pg_receivewal --compress=5. Unfortunately this broke the regression tests that include a test for the .partial files where pg_receivewal.service is shut down before the segment is full. The problem was that systemd's default KillSignal is SIGTERM, while pg_receivewal flushes the output compression buffers on SIGINT only. The attached patch makes it do the same for SIGTERM as well. (Most places in PG that install a SIGINT handler also install a SIGTERM handler already.) Christoph >From 8e42bf5458ae00050327da07c09a77649f24e36d Mon Sep 17 00:00:00 2001 From: Christoph Berg Date: Mon, 15 Aug 2022 14:29:43 +0200 Subject: [PATCH] pg_receivewal: Exit cleanly on SIGTERM Compressed output is only flushed on clean exits. The reason to support SIGTERM here as well is that pg_receivewal might well be running as a daemon, and systemd's default KillSignal is SIGTERM. --- doc/src/sgml/ref/pg_receivewal.sgml | 8 +--- src/bin/pg_basebackup/pg_receivewal.c | 1 + 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml index 4fe9e1a874..5f83ba1893 100644 --- a/doc/src/sgml/ref/pg_receivewal.sgml +++ b/doc/src/sgml/ref/pg_receivewal.sgml @@ -118,8 +118,9 @@ PostgreSQL documentation In the absence of fatal errors, pg_receivewal - will run until terminated by the SIGINT signal - (ControlC). + will run until terminated by the SIGINT + (ControlC) + or SIGTERM signal. @@ -457,7 +458,8 @@ PostgreSQL documentation pg_receivewal will exit with status 0 when - terminated by the SIGINT signal. (That is the + terminated by the SIGINT or + SIGTERM signal. (That is the normal way to end it. Hence it is not an error.) For fatal errors or other signals, the exit status will be nonzero. diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c index f064cff4ab..4acd0654b9 100644 --- a/src/bin/pg_basebackup/pg_receivewal.c +++ b/src/bin/pg_basebackup/pg_receivewal.c @@ -933,6 +933,7 @@ main(int argc, char **argv) */ #ifndef WIN32 pqsignal(SIGINT, sigint_handler); + pqsignal(SIGTERM, sigint_handler); #endif /* -- 2.35.1
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
> > > Thank you for feedback. I improved my patch recently and tested it on different sizes of MAX_BUFFERED_TUPLES and REPLAY_BUFFER_SIZE. > I loaded 1 rows which contained 1 wrong row. > I expected I could see rows after COPY, but just saw 999 rows. Also I implemented your case and it worked correctly. > BTW I may be overlooking it, but have you submit this proposal to the next CommitFest? Good idea. Haven't done it yet. Regards, Damir Postgres Professional From fa6b99c129eb890b25f006bb7891a247c8a431a7 Mon Sep 17 00:00:00 2001 From: Damir Belyalov Date: Fri, 15 Oct 2021 11:55:18 +0300 Subject: [PATCH] COPY_IGNORE_ERRORS without GUC with function safeNextCopyFrom() with struct SafeCopyFromState with refactoring --- doc/src/sgml/ref/copy.sgml | 13 ++ src/backend/commands/copy.c | 8 ++ src/backend/commands/copyfrom.c | 162 ++- src/backend/parser/gram.y| 8 +- src/bin/psql/tab-complete.c | 3 +- src/include/commands/copy.h | 1 + src/include/commands/copyfrom_internal.h | 21 +++ src/include/parser/kwlist.h | 1 + src/test/regress/expected/copy2.out | 123 + src/test/regress/sql/copy2.sql | 110 +++ 10 files changed, 445 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 8aae711b3b..7d20b1649e 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -34,6 +34,7 @@ COPY { table_name [ ( format_name FREEZE [ boolean ] +IGNORE_ERRORS [ boolean ] DELIMITER 'delimiter_character' NULL 'null_string' HEADER [ boolean | MATCH ] @@ -233,6 +234,18 @@ COPY { table_name [ ( + +IGNORE_ERRORS + + + Drop rows that contain malformed data while copying. That is rows + containing syntax errors in data, rows with too many or too few columns, + rows that result in constraint violations, rows containing columns where + the data type's input function raises an error. + + + + DELIMITER diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 3ac731803b..fead1aba46 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -402,6 +402,7 @@ ProcessCopyOptions(ParseState *pstate, { bool format_specified = false; bool freeze_specified = false; + bool ignore_errors_specified = false; bool header_specified = false; ListCell *option; @@ -442,6 +443,13 @@ ProcessCopyOptions(ParseState *pstate, freeze_specified = true; opts_out->freeze = defGetBoolean(defel); } + else if (strcmp(defel->defname, "ignore_errors") == 0) + { + if (ignore_errors_specified) +errorConflictingDefElem(defel, pstate); + ignore_errors_specified = true; + opts_out->ignore_errors = defGetBoolean(defel); + } else if (strcmp(defel->defname, "delimiter") == 0) { if (opts_out->delim) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index a976008b3d..285c491ddd 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -106,6 +106,9 @@ static char *limit_printout_length(const char *str); static void ClosePipeFromProgram(CopyFromState cstate); +static bool safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + /* * error context callback for COPY FROM * @@ -521,6 +524,125 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri, miinfo->bufferedBytes += tuplen; } +/* + * Analog of NextCopyFrom() but ignore rows with errors while copying. 
+ */ +static bool +safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + SafeCopyFromState *safecstate = cstate->safecstate; + bool valid_row = true; + + safecstate->skip_row = false; + + PG_TRY(); + { + if (!safecstate->replay_is_active) + { + if (safecstate->begin_subtransaction) + { +BeginInternalSubTransaction(NULL); +CurrentResourceOwner = safecstate->oldowner; + +safecstate->begin_subtransaction = false; + } + + if (safecstate->saved_tuples < REPLAY_BUFFER_SIZE) + { +valid_row = NextCopyFrom(cstate, econtext, values, nulls); +if (valid_row) +{ + /* Fill replay_buffer in oldcontext*/ + MemoryContextSwitchTo(safecstate->oldcontext); + safecstate->replay_buffer[safecstate->saved_tuples++] = heap_form_tuple(RelationGetDescr(cstate->rel), values, nulls); + + safecstate->skip_row = true; +} +else if (!safecstate->processed_remaining_tuples) +{ + ReleaseCurrentSubTransaction(); + CurrentResourceOwner = safecstate->oldowner; + if (safecstate->replayed_tuples < safecstate->saved_tuples) + { + /* Prepare to replay remaining tuples if they exist */ + safecstate->replay_is_active = true; + safecstate->processed_remaining_tuples = true; +
Re: [PATCH] Optimize json_lex_string by batching character copying
I wrote > On Mon, Jul 11, 2022 at 11:07 PM Andres Freund wrote: > > > I wonder if we can add a somewhat more general function for scanning until > > some characters are found using SIMD? There's plenty other places that could > > be useful. > > In simple cases, we could possibly abstract the entire loop. With this > particular case, I imagine the most approachable way to write the loop would > be a bit more low-level: > > while (p < end - VECTOR_WIDTH && >!vector_has_byte(p, '\\') && >!vector_has_byte(p, '"') && >vector_min_byte(p, 0x20)) > p += VECTOR_WIDTH > > I wonder if we'd lose a bit of efficiency here by not accumulating set bits > from the three conditions, but it's worth trying. The attached implements the above, more or less, using new pg_lfind8() and pg_lfind8_le(), which in turn are based on helper functions that act on a single vector. The pg_lfind* functions have regression tests, but I haven't done the same for json yet. I went the extra step to use bit-twiddling for non-SSE builds using uint64 as a "vector", which still gives a pretty good boost (test below, min of 3): master: 356ms v5: 259ms v5 disable SSE: 288ms It still needs a bit of polishing and testing, but I think it's a good workout for abstracting SIMD out of the way. - test: DROP TABLE IF EXISTS long_json_as_text; CREATE TABLE long_json_as_text AS with long as ( select repeat(description, 11) from pg_description ) select (select json_agg(row_to_json(long))::text as t from long) from generate_series(1, 100); VACUUM FREEZE long_json_as_text; select 1 from long_json_as_text where t::json is null; -- from Andrew upthread -- John Naylor EDB: http://www.enterprisedb.com diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c index fefd1d24d9..1f9eb134e8 100644 --- a/src/common/jsonapi.c +++ b/src/common/jsonapi.c @@ -19,6 +19,7 @@ #include "common/jsonapi.h" #include "mb/pg_wchar.h" +#include "port/pg_lfind.h" #ifndef FRONTEND #include "miscadmin.h" @@ -844,7 +845,7 @@ json_lex_string(JsonLexContext *lex) } else { - char *p; + char *p = s; if (hi_surrogate != -1) return JSON_UNICODE_LOW_SURROGATE; @@ -853,7 +854,13 @@ json_lex_string(JsonLexContext *lex) * Skip to the first byte that requires special handling, so we * can batch calls to appendBinaryStringInfo. */ - for (p = s; p < end; p++) + while (p < end - SIZEOF_VECTOR && + !pg_lfind8('\\', (unsigned char*) p, SIZEOF_VECTOR) && + !pg_lfind8('"', (unsigned char*) p, SIZEOF_VECTOR) && + !pg_lfind8_le(0x1F, (unsigned char*) p, SIZEOF_VECTOR)) +p += SIZEOF_VECTOR; + + for (; p < end; p++) { if (*p == '\\' || *p == '"') break; diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h index fb125977b2..e090ea6ac3 100644 --- a/src/include/port/pg_lfind.h +++ b/src/include/port/pg_lfind.h @@ -1,7 +1,7 @@ /*- * * pg_lfind.h - * Optimized linear search routines. + * Optimized linear search routines using SIMD intrinsics where available * * Copyright (c) 2022, PostgreSQL Global Development Group * @@ -15,6 +15,76 @@ #include "port/simd.h" +/* + * pg_lfind8 + * + * Return true if there is an element in 'base' that equals 'key', otherwise + * return false. 
+ */ +static inline bool +pg_lfind8(uint8 key, uint8 *base, uint32 nelem) +{ + uint32 i; + /* round down to multiple of vector length */ + uint32 iterations = nelem & ~(SIZEOF_VECTOR - 1); + TYPEOF_VECTOR chunk; + + for (i = 0; i < iterations; i += SIZEOF_VECTOR) + { +#ifdef USE_SSE2 + chunk = _mm_loadu_si128((const __m128i *) &base[i]); +#else + memcpy(&chunk, &base[i], sizeof(chunk)); +#endif /* USE_SSE2 */ + if (vector_eq_byte(chunk, key)) + return true; + } + + /* Process the remaining elements one at a time. */ + for (; i < nelem; i++) + { + if (key == base[i]) + return true; + } + + return false; +} + +/* + * pg_lfind8 + * + * Return true if there is an element in 'base' that is less than or equal to 'key', otherwise + * return false. + */ +static inline bool +pg_lfind8_le(uint8 key, uint8 *base, uint32 nelem) +{ + uint32 i; + /* round down to multiple of vector length */ + uint32 iterations = nelem & ~(SIZEOF_VECTOR - 1); + TYPEOF_VECTOR chunk; + + for (i = 0; i < iterations; i += SIZEOF_VECTOR) + { +#ifdef USE_SSE2 + chunk = _mm_loadu_si128((const __m128i *) &base[i]); +#else + memcpy(&chunk, &base[i], sizeof(chunk)); +#endif /* USE_SSE2 */ + if (vector_le_byte(chunk, key)) + return true; + } + + /* Process the remaining elements one at a time. */ + for (; i < nelem; i++) + { + if (base[i] <= key) + return true; + } + + return false; +} + /* * pg_lfind32 * @@ -26,7 +96,6 @@ pg_lfind32(uint32 key, uint32 *base, uint32 nelem) { uint32 i = 0; - /* Use SIMD intrinsics where available. */ #ifdef USE_SSE2 /* diff --git a/src/inclu
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Aug 15, 2022 at 12:39 PM Masahiko Sawada wrote: > > On Fri, Jul 22, 2022 at 10:43 AM Masahiko Sawada > wrote: > > > > On Tue, Jul 19, 2022 at 1:30 PM John Naylor > > wrote: > > > > > > > > > > > > On Tue, Jul 19, 2022 at 9:11 AM Masahiko Sawada > > > wrote: > > > > > > > I’d like to keep the first version simple. We can improve it and add > > > > more optimizations later. Using radix tree for vacuum TID storage > > > > would still be a big win comparing to using a flat array, even without > > > > all these optimizations. In terms of single-value leaves method, I'm > > > > also concerned about an extra pointer traversal and extra memory > > > > allocation. It's most flexible but multi-value leaves method is also > > > > flexible enough for many use cases. Using the single-value method > > > > seems to be too much as the first step for me. > > > > > > > > Overall, using 64-bit keys and 64-bit values would be a reasonable > > > > choice for me as the first step . It can cover wider use cases > > > > including vacuum TID use cases. And possibly it can cover use cases by > > > > combining a hash table or using tree of tree, for example. > > > > > > These two aspects would also bring it closer to Andres' prototype, which > > > 1) makes review easier and 2) easier to preserve optimization work > > > already done, so +1 from me. > > > > Thanks. > > > > I've updated the patch. It now implements 64-bit keys, 64-bit values, > > and the multi-value leaves method. I've tried to remove duplicated > > codes but we might find a better way to do that. > > > > With the recent changes related to simd, I'm going to split the patch > into at least two parts: introduce other simd optimized functions used > by the radix tree and the radix tree implementation. Particularly we > need two functions for radix tree: a function like pg_lfind32 but for > 8 bits integers and return the index, and a function that returns the > index of the first element that is >= key. I recommend looking at https://www.postgresql.org/message-id/CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA%40mail.gmail.com since I did the work just now for searching bytes and returning a bool, both = and <=. Should be pretty close. Also, I believe if you left this for last as a possible refactoring, it might save some work. In any case, I'll take a look at the latest patch next month. -- John Naylor EDB: http://www.enterprisedb.com
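For the second of those, a sketch in the same style as the pg_lfind8()/pg_lfind8_le() helpers might look like the following; pg_lsearch8_ge is an invented name, the saturating-subtraction trick stands in for the unsigned byte comparison that SSE2 lacks, and SIZEOF_VECTOR/USE_SSE2 are assumed from that patch, plus pg_rightmost_one_pos32() from pg_bitutils.h:

```c
/*
 * Hypothetical sketch, not part of any posted patch: return the index of the
 * first element in 'base' that is >= 'key', or 'nelem' if there is none.
 */
static inline uint32
pg_lsearch8_ge(uint8 key, const uint8 *base, uint32 nelem)
{
	uint32		i = 0;

#ifdef USE_SSE2
	const __m128i spread = _mm_set1_epi8((char) key);
	uint32		iterations = nelem & ~(SIZEOF_VECTOR - 1);

	for (; i < iterations; i += SIZEOF_VECTOR)
	{
		__m128i		chunk = _mm_loadu_si128((const __m128i *) &base[i]);

		/* bytes where base[i + j] >= key saturate to zero under (key - byte) */
		__m128i		sat = _mm_subs_epu8(spread, chunk);
		uint32		mask = _mm_movemask_epi8(_mm_cmpeq_epi8(sat, _mm_setzero_si128()));

		if (mask != 0)
			return i + pg_rightmost_one_pos32(mask);
	}
#endif

	/* remainder, and the whole array on non-SSE2 builds */
	for (; i < nelem; i++)
	{
		if (base[i] >= key)
			return i;
	}

	return nelem;
}
```

On a sorted node chunk the return value then serves both as the match position (when base[result] == key) and as the insertion point otherwise.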
Re: Header checker scripts should fail on failure
On 2022-08-15 Mo 01:38, Thomas Munro wrote: > Hi, > > I thought commit 81b9f23c9c8 had my back, but nope, we still need to > make CI turn red if "headerscheck" and "cpluspluscheck" don't like our > patches (crake in the build farm should be a secondary defence...). > See attached. Yeah, the buildfarm module works around that by looking for non-empty output, but this is better, cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: shared-memory based stats collector - v70
On Thu, 11 Aug 2022 at 02:11, Drouvot, Bertrand wrote: > > As Andres was not -1 about that idea (as it should be low cost to add > and maintain) as long as somebody cares enough to write something: then > I'll give it a try and submit a patch for it. I agree it would be a useful feature. I think there may be things to talk about here though. 1) Are you planning to go through the local hash table and LocalSnapshot and obey the consistency mode? I was thinking a flag passed to build_snapshot to request global mode might be sufficient instead of a completely separate function. 2) When I did the function attached above I tried to avoid returning the whole set and make it possible to process them as they arrive. I actually was hoping to get to the point where I could start shipping out network data as they arrive and not even buffer up the response, but I think I need to be careful about hash table locking then. 3) The key difference here is that we're returning whatever stats are in the hash table rather than using the catalog to drive a list of id numbers to look up. I guess the API should make it clear this is what is being returned -- on that note I wonder if I've done something wrong because I noted a few records with InvalidOid where I didn't expect it. 4) I'm currently looping over the hash table returning the records all intermixed. Some users will probably want to do things like "return all Relation records for all databases" or "return all Index records for database id xxx". So some form of filtering may be best or perhaps a way to retrieve just the keys so they can then be looked up one by one (through the local cache?). 5) On that note I'm not clear how the local cache will interact with these cross-database lookups. That should probably be documented... -- greg
Re: Header checker scripts should fail on failure
Hi, On 2022-08-15 17:38:21 +1200, Thomas Munro wrote: > I thought commit 81b9f23c9c8 had my back, but nope, we still need to > make CI turn red if "headerscheck" and "cpluspluscheck" don't like our > patches (crake in the build farm should be a secondary defence...). > See attached. +1 Greetings, Andres Freund
Re: Tab completion for "ALTER TYPE typename SET" and rearranged "Alter TYPE typename RENAME"
On Mon, Aug 15, 2022 at 10:42 AM Michael Paquier wrote: > > On Sun, Aug 14, 2022 at 07:56:00PM +0530, vignesh C wrote: > > Modified the patch to list all the properties in case of "ALTER TYPE > > typename SET (". I have included the properties in alphabetical order > > as I notice that the ordering is in alphabetical order in few cases > > ex: "ALTER SUBSCRIPTION SET (". The attached v2 patch has the > > changes for the same. Thoughts? > > Seems fine here, so applied after tweaking a bit the comments, mostly > for consistency with the area. Thanks for pushing this patch. Regards, Vignesh
Re: Include the dependent extension information in describe command.
On Sun, Aug 14, 2022 at 10:24 PM vignesh C wrote: > > On Sun, Aug 14, 2022 at 11:07 AM Tom Lane wrote: > > > > vignesh C writes: > > > Currently we do not include the dependent extension information for > > > index and materialized view in the describe command. I felt it would > > > be useful to include this information as part of the describe command > > > like: > > > \d+ idx_depends > > > Index "public.idx_depends" > > > Column | Type | Key? | Definition | Storage | Stats target > > > +-+--++-+-- > > > a | integer | yes | a | plain | > > > btree, for table "public.tbl_idx_depends" > > > Depends: > > > "plpgsql" > > > > > Attached a patch for the same. Thoughts? > > > > This seems pretty much useless noise to me. Can you point to > > any previous requests for such a feature? If we did do it, > > why would we do it in such a narrow fashion (ie, only dependencies > > of two specific kinds of objects on one other specific kind of > > object)? Why did you do it in this direction rather than > > the other one, ie show dependencies when examining the extension? > > While implementing logical replication of "index which depends on > extension", I found that this information was not available in any of > the \d describe commands. I felt having this information in the \d > describe command will be useful in validating the "depends on > extension" easily. Now that you pointed out, I agree that it will be > better to show the dependencies from the extension instead of handling > it in multiple places. I will change it to handle it from extension > and post an updated version soon for this. I have updated the patch to display "Objects depending on extension" as describe extension footer. The changes for the same are available in the v2 version patch attached. Thoughts? 
Regards, Vignesh From 3bdb382f38b889b303f7a7036d9a0fc5dbeb2be7 Mon Sep 17 00:00:00 2001 From: Vigneshwaran C Date: Thu, 28 Jul 2022 22:19:00 +0530 Subject: [PATCH v2] Include the objects depending on extension in describe extension Include the objects depending on extension in describe extension --- .../expected/create_transform.out | 3 - src/bin/psql/describe.c | 94 --- .../expected/test_extensions.out | 6 -- src/test/regress/expected/indexing.out| 17 src/test/regress/expected/matview.out | 16 src/test/regress/sql/indexing.sql | 7 ++ src/test/regress/sql/matview.sql | 6 ++ 7 files changed, 128 insertions(+), 21 deletions(-) diff --git a/contrib/hstore_plperl/expected/create_transform.out b/contrib/hstore_plperl/expected/create_transform.out index dc72395376..d060d6ff65 100644 --- a/contrib/hstore_plperl/expected/create_transform.out +++ b/contrib/hstore_plperl/expected/create_transform.out @@ -49,7 +49,6 @@ CREATE EXTENSION hstore_plperl; function hstore_to_plperl(internal) function plperl_to_hstore(internal) transform for hstore language plperl -(3 rows) ALTER EXTENSION hstore_plperl DROP TRANSFORM FOR hstore LANGUAGE plperl; \dx+ hstore_plperl @@ -58,7 +57,6 @@ Objects in extension "hstore_plperl" - function hstore_to_plperl(internal) function plperl_to_hstore(internal) -(2 rows) ALTER EXTENSION hstore_plperl ADD TRANSFORM FOR hstore LANGUAGE plperl; \dx+ hstore_plperl @@ -68,7 +66,6 @@ ALTER EXTENSION hstore_plperl ADD TRANSFORM FOR hstore LANGUAGE plperl; function hstore_to_plperl(internal) function plperl_to_hstore(internal) transform for hstore language plperl -(3 rows) DROP EXTENSION hstore CASCADE; NOTICE: drop cascades to extension hstore_plperl diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 327a69487b..003a3361ec 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -20,6 +20,7 @@ #include "catalog/pg_cast_d.h" #include "catalog/pg_class_d.h" #include "catalog/pg_default_acl_d.h" +#include "catalog/pg_extension_d.h" #include "common.h" #include "common/logging.h" #include "describe.h" @@ -45,7 +46,12 @@ static bool describeOneTSConfig(const char *oid, const char *nspname, const char *cfgname, const char *pnspname, const char *prsname); static void printACLColumn(PQExpBuffer buf, const char *colname); -static bool listOneExtensionContents(const char *extname, const char *oid); +static bool listOneExtensionContents(const char *extname, const char *oid, + printTableContent *const content, + PQExpBufferData *title, + const printTableOpt *opt); +static bool addFooterToExtensionDesc(const char *extname, const char *oid, + printTableContent *const content); static bool validateSQLNamePattern(PQExpBuffer buf, const char *pattern, bool have_where, bool force_escape, const char *schemavar, const char *namevar, @@ -5994,6 +6000,8 @@ lis
Re: Allow logical replication to copy tables in binary format
Euler Taveira , 11 Ağu 2022 Per, 20:16 tarihinde şunu yazdı: > My main concern is to break a scenario that was previously working (14 -> > 15) but after a subscriber upgrade > it won't (14 -> 16). > Fair concern. Some cases that might break the logical replication with version upgrade would be: 1- Usage of different binary formats between publisher and subscriber. As stated here [1], binary format has been changed after v7.4. But I don't think this would be a concern, since we wouldn't even have logical replication with 7.4 and earlier versions. 2- Lack (or mismatch) of binary send/receive functions for custom data types would cause failures. This case can already cause failures with current logical replication, regardless of binary copy. Stated here [2]. 3- Copying in binary format would work with the same schemas. Currently, logical replication does not require the exact same schemas in publisher and subscriber. This is an additional restriction that comes with the COPY command. If a logical replication has been set up with different schemas and subscription is created with the binary option, then yes this would break things. This restriction can be clearly stated and wouldn't be unexpected though. I'm also okay with allowing binary copy only for v16 or later, if you think it would be safer and no one disagrees with that. What are your thoughts? I would say that you should test some scenarios: > 014_binary.pl and also custom data types, same column with different data > types, etc. > I added scenarios in two tests to test binary copy: 014_binary.pl: This was already testing subscriptions with binary option enabled. I added an extra step to insert initial data before creating the subscription. So that we can test initial table sync with binary copy. 002_types.pl: This file was already testing more complex data types. I added an extra subscriber node to create a subscription with binary option enabled. This way, it now tests binary copy with different custom types. Do you think these would be enough in terms of testing? Attached patch also includes some additions to the doc along with the tests. Thanks, Melih [1] https://www.postgresql.org/docs/devel/sql-copy.html [2] https://www.postgresql.org/docs/devel/sql-createsubscription.html v2-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
Re: build remaining Flex files standalone
Hi, Thanks for your work on this! On 2022-08-13 15:39:06 +0700, John Naylor wrote: > Here are the rest. Most of it was pretty straightforward, with the > main exception of jsonpath_scan.c, which is not quite finished. That > one passes tests but still has one compiler warning. I'm unsure how > much of what is there already is really necessary or was cargo-culted > from elsewhere without explanation. For starters, I'm not sure why the > grammar has a forward declaration of "union YYSTYPE". It's noteworthy > that it used to compile standalone, but with a bit more stuff, and > that was reverted in 550b9d26f80fa30. I can hack on it some more later > but I ran out of steam today. I'm not sure either... > Other questions thus far: > > - "BISONFLAGS += -d" is now in every make file with a .y file -- can > we just force that everywhere? Hm. Not sure it's worth it, extensions might use our BISON stuff... > - Include order seems to matter for the grammar's .h file. I didn't > test if that was the case every time, and after a few miscompiles just > always made it the last inclusion, but I'm wondering if we should keep > those inclusions outside %top{} and put it at the start of the next > %{} ? I think we have a few of those dependencies already, see e.g. /* * NB: include gram.h only AFTER including scanner.h, because scanner.h * is what #defines YYLTYPE. */ > From d723ba14acf56fd432e9e263db937fcc13fc0355 Mon Sep 17 00:00:00 2001 > From: John Naylor > Date: Thu, 11 Aug 2022 19:38:37 +0700 > Subject: [PATCH v201 1/9] Build guc-file.c standalone Might be worth doing some of the moving around here separately from the parser/scanner specific bits. > +/* functions shared between guc.c and guc-file.l */ > +extern int guc_name_compare(const char *namea, const char *nameb); > +extern ConfigVariable *ProcessConfigFileInternal(GucContext context, > + > bool applySettings, int elevel); > +extern void record_config_file_error(const char *errmsg, > + const > char *config_file, > + int > lineno, > + > ConfigVariable **head_p, > + > ConfigVariable **tail_p); > > /* > * The following functions are not in guc.c, but are declared here to avoid > -- > 2.36.1 > I think I prefer your suggestion of a guc_internal.h upthread. > From 7d4ecfcb3e91f3b45e94b9e64c7c40f1bbd22aa8 Mon Sep 17 00:00:00 2001 > From: John Naylor > Date: Fri, 12 Aug 2022 15:45:24 +0700 > Subject: [PATCH v201 2/9] Build booscanner.c standalone > -# bootscanner is compiled as part of bootparse > -bootparse.o: bootscanner.c > +# See notes in src/backend/parser/Makefile about the following two rules > +bootparse.h: bootparse.c > + touch $@ > + > +bootparse.c: BISONFLAGS += -d > + > +# Force these dependencies to be known even without dependency info built: > +bootparse.o bootscan.o: bootparse.h Wonder if we could / should wrap this is something common. It's somewhat annoying to repeat this stuff everywhere. > diff --git a/src/test/isolation/specscanner.l > b/src/test/isolation/specscanner.l > index aa6e89268e..2dc292c21d 100644 > --- a/src/test/isolation/specscanner.l > +++ b/src/test/isolation/specscanner.l > @@ -1,4 +1,4 @@ > -%{ > +%top{ > /*- > * > * specscanner.l > @@ -9,7 +9,14 @@ > * > *- > */ > +#include "postgres_fe.h" Miniscule nitpick: I think we typically leave an empty line between header and first include. 
> diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h > index dbe7d4f742..0b373048b5 100644 > --- a/contrib/cube/cubedata.h > +++ b/contrib/cube/cubedata.h > @@ -67,3 +67,7 @@ extern void cube_scanner_finish(void); > > /* in cubeparse.y */ > extern int cube_yyparse(NDBOX **result); > + > +/* All grammar constructs return strings */ > +#define YYSTYPE char * Why does this need to be defined in a semi-public header? If we do this in multiple files we'll end up with the danger of macro redefinition warnings. > +extern int scanbuflen; The code around scanbuflen seems pretty darn grotty. Allocating enough memory for the entire list by allocating the entire string size... I don't know anything about contrib/cube, but isn't that in effect O(inputlen^2) memory? Greetings, Andres Freund
Re: [PATCH] Optimize json_lex_string by batching character copying
Hi, I ran this test. DROP TABLE IF EXISTS long_json_as_text; CREATE TABLE long_json_as_text AS with long as ( select repeat(description, 11) from pg_description ) select (select json_agg(row_to_json(long))::text as t from long) from generate_series(1, 100); VACUUM FREEZE long_json_as_text; select 1 from long_json_as_text where t::json is null; head: Time: 161,741ms v5: Time: 270,298 ms ubuntu 64 bits gcc 9.4.0 Am I missing something? regards, Ranier Vilela
Re: [PATCH] Optimize json_lex_string by batching character copying
Em seg., 15 de ago. de 2022 às 15:34, Ranier Vilela escreveu: > Hi, > > I ran this test. > > DROP TABLE IF EXISTS long_json_as_text; > CREATE TABLE long_json_as_text AS > with long as ( > select repeat(description, 11) > from pg_description > ) > select (select json_agg(row_to_json(long))::text as t from long) from > generate_series(1, 100); > VACUUM FREEZE long_json_as_text; > > select 1 from long_json_as_text where t::json is null; > > head: > Time: 161,741ms > > v5: > Time: 270,298 ms > Sorry too fast, 270,298ms with native memchr. v5 Time: 208,689 ms regards, Ranier Vilela
Re: Cleaning up historical portability baggage
Hi, On 2022-08-15 13:48:22 +1200, Thomas Munro wrote: > On Sun, Aug 14, 2022 at 10:36 AM Andres Freund wrote: > > On 2022-08-14 10:03:19 +1200, Thomas Munro wrote: > > > I hadn't paid attention to our existing abstract Unix socket support > > > before and now I'm curious: do we have a confirmed sighting of that > > > working on Windows? > > > > I vaguely remember successfully trying it in the past. But I just tried it > > unsuccessfully in a VM and there's a bunch of other places saying it's not > > working... > > https://github.com/microsoft/WSL/issues/4240 > > I think we'd better remove our claim that it works then. Patch attached. > > We could also reject it, I guess, but it doesn't immediately seem > harmful so I'm on the fence. On the Windows version that Cirrus is > running, we happily start up with: > > 2022-08-13 20:44:35.174 GMT [4760][postmaster] LOG: listening on Unix > socket "@c:/cirrus/.s.PGSQL.61696" What I find odd is that you said your naive program rejected this... FWIW, in an up-to-date windows 10 VM the client side fails with: psql: error: connection to server on socket "@frak/.s.PGSQL.5432" failed: Invalid argument (0x2726/10022) Is the server running locally and accepting connections on that socket? That's with most security things disabled and developer mode turned on. Greetings, Andres Freund
Re: Cleaning up historical portability baggage
On Tue, Aug 16, 2022 at 7:25 AM Andres Freund wrote: > On 2022-08-15 13:48:22 +1200, Thomas Munro wrote: > > 2022-08-13 20:44:35.174 GMT [4760][postmaster] LOG: listening on Unix > > socket "@c:/cirrus/.s.PGSQL.61696" > > What I find odd is that you said your naive program rejected this... No, I said it wasn't behaving sanely. It allowed me to create two sockets and bind them both to "\000C:\\xx", but I expected the second to fail with EADDRINUSE/10048[1]. I was messing around with things like that because my original aim was to check if the names are silently truncated through EADDRINUSE errors, an approach that worked for regular Unix sockets. [1] https://cirrus-ci.com/task/4643322672185344?logs=main#L16
Re: Expose Parallelism counters planned/execute in pg_stat_statements
Hi: We have rewritten the patch and added the necessary columns to have the > number of times a parallel query plan was not executed using parallelism. > > This version includes comments on the source code and documentation. Regards v3-0001-Add-parallel-counters-to-pg_stat_statements.patch Description: Binary data
Re: Cleaning up historical portability baggage
On Tue, Aug 16, 2022 at 7:51 AM Thomas Munro wrote: > [1] https://cirrus-ci.com/task/4643322672185344?logs=main#L16 Derp, I noticed that that particular horrendous quick and dirty test code was invalidated by a closesocket() call, but in another version I commented that out and it didn't help. Of course it's possible that I'm still doing something wrong in the test, I didn't spend long on this once I saw the bigger picture...
identifying the backend that owns a temporary schema
Hi hackers, As Greg Stark noted elsewhere [0], it is presently very difficult to identify the PID of the session using a temporary schema, which is particularly unfortunate when a temporary table is putting a cluster in danger of transaction ID wraparound. I noted [1] that the following query can be used to identify the PID for a given backend ID: SELECT bid, pg_stat_get_backend_pid(bid) AS pid FROM pg_stat_get_backend_idset() bid; But on closer inspection, this is just plain wrong. The backend IDs returned by pg_stat_get_backend_idset() might initially bear some resemblance to the backend IDs stored in PGPROC, so my suggested query might work some of the time, but the pg_stat_get_backend_* backend IDs typically diverge from the PGPROC backend IDs as sessions connect and disconnect. I think it would be nice to have a reliable way to discover the PID for a given temporary schema via SQL. The other thread [2] introduces a helpful log message that indicates the PID for temporary tables that are in danger of causing transaction ID wraparound, and I intend for this proposal to be complementary to that work. At first, I thought about adding a new function for retrieving the PGPROC backend IDs, but I am worried that having two sets of backend IDs would be even more confusing than having one set that can't reliably be used for temporary schemas. Instead, I tried adjusting the pg_stat_get_backend_*() suite of functions to use the PGPROC backend IDs. This ended up being simpler than anticipated. I added a backend_id field to the LocalPgBackendStatus struct (which is populated within pgstat_read_current_status()), and I changed pgstat_fetch_stat_beentry() to bsearch() for the entry with the given PGPROC backend ID. This does result in a small behavior change. Currently, pg_stat_get_backend_idset() simply returns a range of numbers (1 to the number of active backends). With the attached patch, this function will still return a set of numbers, but there might be gaps between the IDs, and the maximum backend ID will usually be greater than the number of active backends. I suppose this might break some existing uses, but I'm not sure how much we should worry about that. IMO uniting the backend IDs is a net improvement. Thoughts? [0] https://postgr.es/m/CAM-w4HPCOuJDs4fdkgNdA8FFMeYMULPCAxjPpsOgvCO24KOAVg%40mail.gmail.com [1] https://postgr.es/m/DDF0D1BC-261D-45C2-961C-5CBDBB41EE71%40amazon.com [2] https://commitfest.postgresql.org/39/3358/ -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 2bbdc6d99c2d84bd8016bb6df370e34100b5f0bc Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 12 Aug 2022 23:07:37 -0700 Subject: [PATCH v1 1/1] Adjust pg_stat_get_backend_*() to use backends' PGPROC backend IDs. Presently, these functions use the index of the backend's entry in localBackendStatusTable as the backend ID. While this might bear some resemblance to the backend ID of the backend's PGPROC struct, it can quickly diverge as sessions connect and disconnect. This change modifies the pg_stat_get_backend_*() suite of functions to use the PGPROC backend IDs instead. While we could instead introduce a new function for retrieving PGPROC backend IDs, presenting two sets of backend IDs to users seems likely to cause even more confusion than what already exists. This is primarily useful for discovering the session that owns a resource named with its PGPROC backend ID. 
For example, temporary schema names include the PGPROC backend ID, and it might be necessary to identify the session that owns a temporary table that is putting the cluster in danger of transaction ID wraparound. Author: Nathan Bossart --- doc/src/sgml/monitoring.sgml| 8 ++--- src/backend/utils/activity/backend_status.c | 40 +++-- src/backend/utils/adt/pgstatfuncs.c | 7 ++-- src/include/utils/backend_status.h | 8 + 4 files changed, 54 insertions(+), 9 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 1d9509a2f6..ecd0410229 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5488,10 +5488,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i information. In such cases, an older set of per-backend statistics access functions can be used; these are shown in . - These access functions use a backend ID number, which ranges from one - to the number of currently active backends. + These access functions use the process's backend ID number. The function pg_stat_get_backend_idset provides a - convenient way to generate one row for each active backend for + convenient way to list all the active backends' ID numbers for invoking these functions. For example, to show the PIDs and current queries of all backends: @@ -5526,8 +5525,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, setof integer -
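The gist of the C side, paraphrased here as a sketch (cmp_lbestatus is an invented name; see the attached patch for the real code), is that pgstat_read_current_status() records the PGPROC backend ID in each LocalPgBackendStatus, and pgstat_fetch_stat_beentry() switches from indexing the array to a binary search on that field:

```c
/* comparator for bsearch() over localBackendStatusTable (illustrative only) */
static int
cmp_lbestatus(const void *a, const void *b)
{
	const LocalPgBackendStatus *lbe_a = (const LocalPgBackendStatus *) a;
	const LocalPgBackendStatus *lbe_b = (const LocalPgBackendStatus *) b;

	return lbe_a->backend_id - lbe_b->backend_id;
}

PgBackendStatus *
pgstat_fetch_stat_beentry(int beid)
{
	LocalPgBackendStatus key;
	LocalPgBackendStatus *ret;

	pgstat_read_current_status();

	/* look up the entry whose new backend_id field matches the PGPROC ID */
	key.backend_id = beid;
	ret = bsearch(&key, localBackendStatusTable, localNumBackends,
				  sizeof(LocalPgBackendStatus), cmp_lbestatus);

	return ret ? &ret->backendStatus : NULL;
}
```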
Re: [PATCH] Optimize json_lex_string by batching character copying
On Mon, Aug 15, 2022 at 08:33:21PM +0700, John Naylor wrote: > The attached implements the above, more or less, using new pg_lfind8() > and pg_lfind8_le(), which in turn are based on helper functions that > act on a single vector. The pg_lfind* functions have regression tests, > but I haven't done the same for json yet. I went the extra step to use > bit-twiddling for non-SSE builds using uint64 as a "vector", which > still gives a pretty good boost (test below, min of 3): Looks pretty reasonable to me. > +#ifdef USE_SSE2 > + chunk = _mm_loadu_si128((const __m128i *) &base[i]); > +#else > + memcpy(&chunk, &base[i], sizeof(chunk)); > +#endif /* USE_SSE2 */ > +#ifdef USE_SSE2 > + chunk = _mm_loadu_si128((const __m128i *) &base[i]); > +#else > + memcpy(&chunk, &base[i], sizeof(chunk)); > +#endif /* USE_SSE2 */ Perhaps there should be a macro or inline function for loading a vector so that these USE_SSE2 checks can be abstracted away, too. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
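Something like the following could work for that (an illustration only, reusing the TYPEOF_VECTOR and USE_SSE2 names from the patch; vector_load is an invented name):

```c
/*
 * Possible shape for such a helper: hide the SSE2 vs. uint64 difference
 * behind one inline loader, so the search loops can call it unconditionally.
 */
static inline TYPEOF_VECTOR
vector_load(const uint8 *base)
{
#ifdef USE_SSE2
	return _mm_loadu_si128((const __m128i *) base);
#else
	TYPEOF_VECTOR	chunk;

	memcpy(&chunk, base, sizeof(chunk));
	return chunk;
#endif
}
```

The loops in pg_lfind8() and pg_lfind8_le() would then just do `chunk = vector_load(&base[i]);` with no #ifdef at the call sites.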
Re: SQL/JSON features for v15
Hi, Continuation from the thread at https://postgr.es/m/20220811171740.m5b4h7x63g4lzgrk%40awork3.anarazel.de On 2022-08-11 10:17:40 -0700, Andres Freund wrote: > On 2022-08-11 13:08:27 -0400, Jonathan S. Katz wrote: > > With RMT hat on, Andres do you have any thoughts on this? > > I think I need to prototype how it'd look like to give a more detailed > answer. I have a bunch of meetings over the next few hours, but after that I > can give it a shot. I started hacking on this Friday. I think there's some relatively easy improvements that make the code substantially more understandable, at least for me, without even addressing the structural stuff. One thing I could use help understanding is the logic behind ExecEvalJsonNeedsSubTransaction() - there's no useful comments, so it's hard to follow. bool ExecEvalJsonNeedsSubTransaction(JsonExpr *jsexpr, struct JsonCoercionsState *coercions) { /* want error to be raised, so clearly no subtrans needed */ if (jsexpr->on_error->btype == JSON_BEHAVIOR_ERROR) return false; if (jsexpr->op == JSON_EXISTS_OP && !jsexpr->result_coercion) return false; if (!coercions) return true; return false; } I guess the !coercions bit is just about the planner, where we want to be pessimistic about when subtransactions are used, for the purpose of parallelism? Because that's the only place that passes in NULL. What really baffles me is that last 'return false' - it seems to indicate that there's no paths during query execution where ExecEvalJsonNeedsSubTransaction() returns true. And indeed, tests pass with an Assert(!needSubtrans) added to ExecEvalJson() (and then unsurprisingly also after removing the ExecEvalJsonExprSubtrans() indirection). What's going on here? We, somewhat confusingly, still rely on subtransactions, heavily so. Responsible for that is this hunk of code: boolthrowErrors = jexpr->on_error->btype == JSON_BEHAVIOR_ERROR; [...] cxt.error = throwErrors ? NULL : &error; cxt.coercionInSubtrans = !needSubtrans && !throwErrors; Assert(!needSubtrans || cxt.error); So basically we start a subtransaction inside ExecEvalJsonExpr(), to coerce the result type, whenever !needSubtrans (which is always!), unless ERROR ON ERROR is used. Which then also explains the theory behind the EXISTS_OP check in ExecEvalJsonNeedsSubTransaction(). In that case ExecEvalJsonExpr() returns early, before doing a return value coercion, thus not starting a subtransaction. I don't think it's sane from a performance view to start a subtransaction for every coercion, particularly because most coercion paths will never trigger an error, leaving things like out-of-memory or interrupts aside. And those are re-thrown by ExecEvalJsonExprSubtrans(). A quick and dirty benchmark shows ERROR ON ERROR nearly 2xing speed. I'm worried about the system impact of using subtransactions this heavily, it's not exactly the best performing system - the only reason it's kind of ok here is that it's going to be very rare to allocate a subxid, I think. Next question: /* * We should catch exceptions of category ERRCODE_DATA_EXCEPTION and * execute the corresponding ON ERROR behavior then. 
*/ oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; Assert(error); BeginInternalSubTransaction(NULL); /* Want to execute expressions inside function's memory context */ MemoryContextSwitchTo(oldcontext); PG_TRY(); { res = func(op, econtext, res, resnull, p, error); /* Commit the inner transaction, return to outer xact context */ ReleaseCurrentSubTransaction(); MemoryContextSwitchTo(oldcontext); CurrentResourceOwner = oldowner; } PG_CATCH(); { ErrorData *edata; int ecategory; /* Save error info in oldcontext */ MemoryContextSwitchTo(oldcontext); edata = CopyErrorData(); FlushErrorState(); /* Abort the inner transaction */ RollbackAndReleaseCurrentSubTransaction(); MemoryContextSwitchTo(oldcontext); CurrentResourceOwner = oldowner; Two points: 1) I suspect it's not safe to switch to oldcontext before calling func(). On error we'll have leaked memory into oldcontext and we'll just continue on. It might not be very consequential here, because the calling context presumably isn't very long lived, but that's probably not something we should rely on. Also, are we sure that the context will be in a clean state when it's used within an erroring subtransaction? I think the right thing here would be to stay in the subtransaction context
Wrong comment in statscmds.c/CreateStatistics?
I happened to notice the following code in src/backend/commands/statscmds.c, CreateStatistics: == /* * Parse the statistics kinds. * * First check that if this is the case with a single expression, there * are no statistics kinds specified (we don't allow that for the simple * CREATE STATISTICS form). */ if ((list_length(stmt->exprs) == 1) && (list_length(stxexprs) == 1)) { /* statistics kinds not specified */ if (list_length(stmt->stat_types) > 0) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("when building statistics on a single expression, statistics kinds may not be specified"))); } == AFAICT that one-line comment (/* statistics kinds not specified */) is wrong because at that point we don't yet know if kinds are specified or not. SUGGESTION-1 Change the comment to /* Check there are no statistics kinds specified */ SUGGESTION-2 Simply remove that one-line comment because the larger comment seems to be saying the same thing anyhow. Thoughts? -- Kind Regards, Peter Smith. Fujitsu Australia
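For reference, with SUGGESTION-2 applied the block would read roughly as follows (unchanged apart from dropping the one-line comment):

	if ((list_length(stmt->exprs) == 1) && (list_length(stxexprs) == 1))
	{
		if (list_length(stmt->stat_types) > 0)
			ereport(ERROR,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("when building statistics on a single expression, statistics kinds may not be specified")));
	}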
Re: Cleaning up historical portability baggage
On Fri, Aug 12, 2022 at 7:42 PM Thomas Munro wrote: > On Fri, Aug 12, 2022 at 5:14 AM Andres Freund wrote: > > I don't really know what to do about the warnings around remove_temp() and > > trapsig(). I think we actually may be overreading the restrictions. To me > > the > > documented restrictions read more like a high-level-ish explanation of > > what's > > safe in a signal handler and what not. And it seems to not have caused a > > problem on windows on several thousand CI cycles, including plenty failures. So the question there is whether we can run this stuff safely in Windows signal handler context, considering the rather vaguely defined conditions[1]: unlink(sockself); unlink(socklock); rmdir(temp_sockdir); You'd think that basic stuff like DeleteFile() that just enters the kernel would be async-signal-safe, like on Unix; the danger surely comes from stepping on the user context's toes with state mutations, locks etc. But let's suppose we want to play by a timid interpretation of that page's "do not issue low-level or STDIO.H I/O routines". It also says that SIGINT is special and runs the handler in a new thread (in a big warning box because that has other hazards that would break other kinds of code). Well, we *know* it's safe to unlink files in another thread... so... how cheesy would it be if we just did raise(SIGINT) in the real handlers? [1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/signal?view=msvc-170
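A rough sketch of that "cheesy" idea, with hypothetical names standing in for pg_regress's existing cleanup handler and socket-path variables (an illustration of the shape only, not a tested patch):

#include <signal.h>
#include <unistd.h>

/* hypothetical stand-ins for pg_regress's temporary socket paths */
static char sockself[1024];
static char socklock[1024];
static char temp_sockdir[1024];

/*
 * The real cleanup work.  Under this scheme it only ever runs from the
 * SIGINT handler, which Windows invokes in a separate thread, so the file
 * operations are assumed to be no more dangerous than any other
 * multi-threaded file I/O.
 */
static void
signal_remove_temp(int signum)
{
	unlink(sockself);
	unlink(socklock);
	rmdir(temp_sockdir);

	/* re-raise with the default disposition so the process still dies */
	signal(signum, SIG_DFL);
	raise(signum);
}

/* Installed for the other signals; it just forwards to SIGINT. */
static void
forward_to_sigint(int signum)
{
	raise(SIGINT);
}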
Re: SQL/JSON features for v15
Hi, On 2022-08-15 15:38:53 -0700, Andres Freund wrote: > Next question: > > /* >* We should catch exceptions of category ERRCODE_DATA_EXCEPTION and >* execute the corresponding ON ERROR behavior then. >*/ > oldcontext = CurrentMemoryContext; > oldowner = CurrentResourceOwner; > > Assert(error); > > BeginInternalSubTransaction(NULL); > /* Want to execute expressions inside function's memory context */ > MemoryContextSwitchTo(oldcontext); > > PG_TRY(); > { > res = func(op, econtext, res, resnull, p, error); > > /* Commit the inner transaction, return to outer xact context */ > ReleaseCurrentSubTransaction(); > MemoryContextSwitchTo(oldcontext); > CurrentResourceOwner = oldowner; > } > PG_CATCH(); > { > ErrorData *edata; > int ecategory; > > /* Save error info in oldcontext */ > MemoryContextSwitchTo(oldcontext); > edata = CopyErrorData(); > FlushErrorState(); > > /* Abort the inner transaction */ > RollbackAndReleaseCurrentSubTransaction(); > MemoryContextSwitchTo(oldcontext); > CurrentResourceOwner = oldowner; > > > Two points: > > 1) I suspect it's not safe to switch to oldcontext before calling func(). > > On error we'll have leaked memory into oldcontext and we'll just continue > on. It might not be very consequential here, because the calling context > presumably isn't very long lived, but that's probably not something we should > rely on. > > Also, are we sure that the context will be in a clean state when it's used > within an erroring subtransaction? > > > I think the right thing here would be to stay in the subtransaction context > and then copy the datum out to the surrounding context in the success case. > > > 2) If there was an out-of-memory error, it'll have been in oldcontext. So > switching back to it before calling CopyErrorData() doesn't seem good - we'll > just hit OOM issues again. > > > I realize that both of these issues are present in plenty other code (see > e.g. plperl_spi_exec()). So I'm curious why they are ok? Certainly seems to be missing a FreeErrorData() for the happy path? It'd be nicer if we didn't copy the error. In the case we rethrow we don't need it, because we can just PG_RE_THROW(). And in the other path we just want to get the error code. It just risks additional errors to CopyErrorData(). But it's not entirely obvious that geterrcode() is intended for this: * This is only intended for use in error callback subroutines, since there * is no other place outside elog.c where the concept is meaningful. */ a PG_CATCH() block isn't really an error callback subroutine. But it should be fine. Greetings, Andres Freund
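A hedged sketch of that shape, reusing the variables from the code quoted above (not a worked-out patch): drop CopyErrorData() entirely, classify the caught error with geterrcode(), and re-throw anything that is not a data exception.

	PG_TRY();
	{
		res = func(op, econtext, res, resnull, p, error);

		/* Commit the inner transaction, return to outer xact context */
		ReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
	}
	PG_CATCH();
	{
		/* Errors outside ERRCODE_DATA_EXCEPTION are not ours to swallow */
		if (ERRCODE_TO_CATEGORY(geterrcode()) != ERRCODE_DATA_EXCEPTION)
			PG_RE_THROW();

		/* Abort the inner transaction and forget the error */
		MemoryContextSwitchTo(oldcontext);
		FlushErrorState();
		RollbackAndReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;

		*error = true;
	}
	PG_END_TRY();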
Re: Cleaning up historical portability baggage
Hi, On 2022-08-16 13:02:55 +1200, Thomas Munro wrote: > On Fri, Aug 12, 2022 at 7:42 PM Thomas Munro wrote: > > On Fri, Aug 12, 2022 at 5:14 AM Andres Freund wrote: > > > I don't really know what to do about the warnings around remove_temp() and > > > trapsig(). I think we actually may be overreading the restrictions. To me > > > the > > > documented restrictions read more like a high-level-ish explanation of > > > what's > > > safe in a signal handler and what not. And it seems to not have caused a > > > problem on windows on several thousand CI cycles, including plenty > > > failures. > > So the question there is whether we can run this stuff safely in > Windows signal handler context, considering the rather vaguely defined > conditions[1]: > >unlink(sockself); >unlink(socklock); >rmdir(temp_sockdir); > > You'd think that basic stuff like DeleteFile() that just enters the > kernel would be async-signal-safe, like on Unix; the danger surely > comes from stepping on the user context's toes with state mutations, > locks etc. Yea. I guess it could be different because things like file descriptors are just a userland concept. > But let's suppose we want to play by a timid interpretation of that page's > "do not issue low-level or STDIO.H I/O routines". It also says that SIGINT > is special and runs the handler in a new thread (in a big warning box > because that has other hazards that would break other kinds of code). Well, > we *know* it's safe to unlink files in another thread... so... how cheesy > would it be if we just did raise(SIGINT) in the real handlers? Not quite sure I understand. You're proposing to raise(SIGINT) for all other handlers, so that signal_remove_temp() gets called in another thread, because we assume that'd be safe because doing file IO in other threads is safe? That assumes the signal handler invocation infrastructure isn't the problem... Looks like we could register a "native" ctrl-c handler: https://docs.microsoft.com/en-us/windows/console/setconsolectrlhandler they're documented to run in a different thread, but without any of the file-io warnings. https://docs.microsoft.com/en-us/windows/console/handlerroutine Greetings, Andres Freund
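For illustration, a minimal sketch of what using the native handler could look like (the cleanup body is a placeholder, and where exactly registration would happen in pg_regress is an assumption):

#include <windows.h>

/*
 * Per the HandlerRoutine documentation this runs in a thread of its own,
 * so plain file operations here should be ordinary multi-threaded file I/O.
 */
static BOOL WINAPI
console_ctrl_handler(DWORD dwCtrlType)
{
	if (dwCtrlType == CTRL_C_EVENT || dwCtrlType == CTRL_BREAK_EVENT)
	{
		/* ... remove the temporary socket files and directory here ... */
	}
	return FALSE;				/* let the default handler terminate us */
}

static void
install_console_ctrl_handler(void)
{
	SetConsoleCtrlHandler(console_ctrl_handler, TRUE);
}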
Propose a new function - list_is_empty
During a recent code review I was going to suggest that some new code would be more readable if the following: if (list_length(alist) == 0) ... was replaced with: if (list_is_empty(alist)) ... but then I found that actually no such function exists. ~~~ Searching the PG source found many cases using all kinds of inconsistent ways to test for empty Lists: e.g.1 if (list_length(alist) > 0) e.g.2 if (list_length(alist) == 0) e.g.3 if (list_length(alist) != 0) e.g.4 if (list_length(alist) >= 1) e.g.5 if (list_length(alist) < 1) Of course, all of them work OK as-is, but by using list_is_empty all those can be made consistent and often also more readable as to the code intent. Patch 0001 adds a new function 'list_is_empty'. Patch 0002 makes use of it. Thoughts? -- Kind Regards, Peter Smith. Fujitsu Australia v1-0002-list_is_empty-use-the-new-function.patch Description: Binary data v1-0001-list_is_empty-new-function.patch Description: Binary data
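For context, a minimal sketch of what the proposed helper amounts to, given that NIL is the canonical empty list (assuming it would live alongside the other inline helpers in pg_list.h):

/* Is the given List empty? */
static inline bool
list_is_empty(const List *l)
{
	return l == NIL;
}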
Re: Propose a new function - list_is_empty
Peter Smith writes: > During a recent code review I was going to suggest that some new code > would be more readable if the following: > if (list_length(alist) == 0) ... > was replaced with: > if (list_is_empty(alist)) ... > but then I found that actually no such function exists. That's because the *correct* way to write it is either "alist == NIL" or just "!alist". I don't think we need yet another way to spell that, and I'm entirely not on board with replacing either of those idioms. But if you want to get rid of overcomplicated uses of list_length() in favor of one of those spellings, have at it. regards, tom lane
Re: Propose a new function - list_is_empty
On Tue, Aug 16, 2022 at 11:27 AM Tom Lane wrote: > > Peter Smith writes: > > During a recent code review I was going to suggest that some new code > > would be more readable if the following: > > if (list_length(alist) == 0) ... > > > was replaced with: > > if (list_is_empty(alist)) ... > > > but then I found that actually no such function exists. > > That's because the *correct* way to write it is either "alist == NIL" > or just "!alist". I don't think we need yet another way to spell > that, and I'm entirely not on board with replacing either of those > idioms. But if you want to get rid of overcomplicated uses of > list_length() in favor of one of those spellings, have at it. > Thanks for your advice. Yes, I saw that NIL is the definition of an empty list - that's how I implemented list_is_empty. OK, I'll ditch the function idea and just look at de-complicating those existing empty List checks. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: fix typos
On Fri, Aug 12, 2022 at 8:55 PM Tom Lane wrote: > > John Naylor writes: > > This is really a straw-man proposal, since I'm not volunteering to do > > the work, or suggest anybody else should do the same. That being the > > case, it seems we should just go ahead with Justin's patch for > > consistency. Possibly we could also change the messages to say "ID"? > > I'd be content if we change the user-facing messages (and documentation > if any) to say "ID" not "OID". The documentation has both, so it makes sense to standardize on "ID". The messages all had oid/OID, which I changed in the attached. I think I got all the places. I'm thinking it's not wrong enough to confuse people, but consistency is good, so backpatch to v15 and no further. Does anyone want to make a case otherwise? -- John Naylor EDB: http://www.enterprisedb.com diff --git a/doc/src/sgml/replication-origins.sgml b/doc/src/sgml/replication-origins.sgml index 7e02c4605b..bb0fb624d2 100644 --- a/doc/src/sgml/replication-origins.sgml +++ b/doc/src/sgml/replication-origins.sgml @@ -27,12 +27,12 @@ - Replication origins have just two properties, a name and an OID. The name, + Replication origins have just two properties, a name and an ID. The name, which is what should be used to refer to the origin across systems, is free-form text. It should be used in a way that makes conflicts between replication origins created by different replication solutions unlikely; e.g., by prefixing the replication solution's name to it. - The OID is used only to avoid having to store the long version + The ID is used only to avoid having to store the long version in situations where space efficiency is important. It should never be shared across systems. diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c index c72ad6b93d..9eb97fac58 100644 --- a/src/backend/replication/logical/origin.c +++ b/src/backend/replication/logical/origin.c @@ -328,7 +328,7 @@ replorigin_create(const char *roname) if (tuple == NULL) ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("could not find free replication origin OID"))); + errmsg("could not find free replication origin ID"))); heap_freetuple(tuple); return roident; @@ -364,7 +364,7 @@ restart: if (nowait) ereport(ERROR, (errcode(ERRCODE_OBJECT_IN_USE), - errmsg("could not drop replication origin with OID %d, in use by PID %d", + errmsg("could not drop replication origin with ID %u, in use by PID %d", state->roident, state->acquired_by))); @@ -408,7 +408,7 @@ restart: */ tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident)); if (!HeapTupleIsValid(tuple)) - elog(ERROR, "cache lookup failed for replication origin with oid %u", + elog(ERROR, "cache lookup failed for replication origin with ID %u", roident); CatalogTupleDelete(rel, &tuple->t_self); @@ -485,7 +485,7 @@ replorigin_by_oid(RepOriginId roident, bool missing_ok, char **roname) if (!missing_ok) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg("replication origin with OID %u does not exist", + errmsg("replication origin with ID %u does not exist", roident))); return false; @@ -937,7 +937,7 @@ replorigin_advance(RepOriginId node, { ereport(ERROR, (errcode(ERRCODE_OBJECT_IN_USE), - errmsg("replication origin with OID %d is already active for PID %d", + errmsg("replication origin with ID %u is already active for PID %d", replication_state->roident, replication_state->acquired_by))); } @@ -948,7 +948,7 @@ replorigin_advance(RepOriginId node, if (replication_state == NULL && free_state 
== NULL) ereport(ERROR, (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED), - errmsg("could not find free replication state slot for replication origin with OID %u", + errmsg("could not find free replication state slot for replication origin with ID %u", node), errhint("Increase max_replication_slots and try again."))); @@ -1126,7 +1126,7 @@ replorigin_session_setup(RepOriginId node) { ereport(ERROR, (errcode(ERRCODE_OBJECT_IN_USE), - errmsg("replication origin with OID %d is already active for PID %d", + errmsg("replication origin with ID %u is already active for PID %d", curstate->roident, curstate->acquired_by))); } @@ -1138,7 +1138,7 @@ replorigin_session_setup(RepOriginId node) if (session_replication_state == NULL && free_slot == -1) ereport(ERROR, (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED), - errmsg("could not find free replication state slot for replication origin with OID %u", + errmsg("could not find free replication state slot for replication origin with ID %u", node), errhint("Increase max_replication_slots and try again."))); else if (session_replication_state == NULL)
Re: SQL/JSON features for v15
Hi, On 2022-08-16 04:02:17 +0300, Nikita Glukhov wrote: > Hi, > > > On 16.08.2022 01:38, Andres Freund wrote: > > Continuation from the thread at > > https://postgr.es/m/20220811171740.m5b4h7x63g4lzgrk%40awork3.anarazel.de > > > > > > I started hacking on this Friday. I think there's some relatively easy > > improvements that make the code substantially more understandable, at least > > for me, without even addressing the structural stuff. > > I also started hacking Friday, hacked all weekend, and now have a new > version of the patch. Cool. > > I don't understand the design of what needs to have error handling, > > and what not. > > > I don't think subtransactions per-se are a fundamental problem. > > I think the error handling implementation is ridiculously complicated, > > and while I started to hack on improving it, I stopped when I really > > couldn't understand what errors it actually needs to handle when and > > why. > > Here is the diagram that may help to understand error handling in > SQL/JSON functions (I hope it will be displayed correctly): I think that is helpful. > JSON path --- > expression\ > ->+---+SQL/JSON+--+ Result > PASSING args --->| JSON path |--> item or --->| Output |-> SQL > ->| executor | JSONB .->| Coercion | value >/ +---+ datum | +--+ > JSON + - - - -+ | || | > Context ->: FORMAT : v v| v > item : JSON : error? empty? | error? >+ - - - -+ | || | >|| +--+ | / >v| | ON EMPTY |--> SQL --' / > error? | +--+ value / >|| |/ > \ | v / > \ \ error? / > \ \| / >\__ \ | _/ > \ \ | / >v v v v +--+ > +--+| Output | Result > | ON ERROR |--->| Coercion |--> SQL > +--++--+ value >| | >V V >EXCEPTION EXCEPTION > > > The first dashed box "FORMAT JSON" used for parsing JSON is absent in > our implementation, because we support only jsonb type which is > pre-parsed. This could be used in queries like that: > JSON_VALUE('invalid json', '$' DEFAULT 'error' ON ERROR) => 'error' > On Aug 3, 2022 at 12:00 AM Andres Freund wrote: > > But we don't need to wrap arbitrary evaluation in a subtransaction - > > afaics the coercion calls a single function, not an arbitrary > > expression? > > SQL standard says that scalar SQL/JSON items are converted to SQL type > through CAST(corresponding_SQL_type_for_item AS returning_type). > Our JSON_VALUE implementation supports arbitrary output types that can > have specific CASTs from numeric, bool, datetime, which we can't > emulate with simple I/O coercion. But supporting of arbitrary types > may be dangerous, because SQL standard denotes only a limited set of > types: > > The contained in the explicit or implicit >shall be a that identifies > a character string data type, numeric data type, boolean data type, > or datetime data type. > > I/O coercion will not even work in the following simple case: > JSON_VALUE('1.23', '$' RETURNING int) > It is expected to return 1::int, like ordinary cast 1.23::numeric::int. Whether it's just IO coercions or also coercions through function calls doesn't matter terribly, as long as both can be wrapped as a single interpretation step. You can have a EEOP_JSON_COERCE_IO, EEOP_JSON_COERCE_FUNC that respectively call input/output function and the transformation routine within a subtransaction. On error they can jump to some on_error execution step. The difficulty is likely just dealing with the intermediary nodes like RelabelType. > Exceptions may come not only from coercions. Expressions in DEFAULT ON > EMPTY can also throw exceptions, which also must be handled. Are there other cases? 
> Only the one item coercion is used in execution of JSON_VALUE(). > Multiple coercions could be executed, if we supported quite useful SRF > JSON_QUERY() using "RETURNING SETOF type" (I had this idea for a long > time, but I didn't dare to implement it). > > I don't understand what "memory" you mean. I'm not entirely sure what I meant at that time either. Understanding this code involves a lot of guessing since there's practically no explanatory comments. > If we will not emit all possible expressions statically, we will need to
Add find_in_log() and advance_wal() perl functions to core test framework (?)
Hi, It seems like find_in_log() and advance_wal() are good candidates for the core test framework, as their functionality is now needed in at least 2 places each. find_in_log() is defined and used in two places: 019_replslot_limit.pl and 033_replay_tsp_drops.pl. The functionality of advancing WAL is implemented in 019_replslot_limit.pl as advance_wal(), and in 001_stream_repl.pl with the same logic but no function there, and an in-progress feature [1] needs advance_wal() as-is for its tests. Do these functions qualify to be added to the core test framework in Cluster.pm? Or do we need more usages of these functions before we generalize them and add them to the core test framework? If added, a bit of duplicate code can be removed and the functions become more usable across the entire test suite in the future. Thoughts? [1] https://www.postgresql.org/message-id/calj2acuyz1z6qpdugn5gguckfd-ko44j4hkcomtp6fzv9xe...@mail.gmail.com -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: Wrong comment in statscmds.c/CreateStatistics?
Yeah, the comments are kind of confusing, see some comments inline. On Tue, Aug 16, 2022 at 8:47 AM Peter Smith wrote: > > I happened to notice the following code in > src/backend/commands/statscmds.c, CreateStatistics: > > == > /* > * Parse the statistics kinds. > * > * First check that if this is the case with a single expression, there > * are no statistics kinds specified (we don't allow that for the simple maybe change to *there should be no* is better? > * CREATE STATISTICS form). > */ > if ((list_length(stmt->exprs) == 1) && (list_length(stxexprs) == 1)) > { > /* statistics kinds not specified */ remove this line or change to *statistics kinds should not be specified*, I prefer just removing it. > if (list_length(stmt->stat_types) > 0) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("when building statistics on a single expression, statistics > kinds may not be specified"))); change *may* to *should*? > } > == > > > AFAICT that one-line comment (/* statistics kinds not specified */) is > wrong because at that point we don't yet know if kinds are specified > or not. > > SUGGESTION-1 > Change the comment to /* Check there are no statistics kinds specified */ > > SUGGESTION-2 > Simply remove that one-line comment because the larger comment seems > to be saying the same thing anyhow. > > Thoughts? > > -- > Kind Regards, > Peter Smith. > Fujitsu Australia > > -- Regards Junwang Zhao
Re: SELECT documentation
On Sat, Aug 13, 2022 at 10:21:26PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > Hi, I agree we should show the more modern JOIN sytax. However, this is > > just an example, so one example should be sufficient. I went with the > > first one in the attached patch. > > You should not remove the CROSS JOIN mention at l. 604, first because > the references to it just below would become odd, and second because > then it's not explained anywhere on the page. Perhaps you could > put back a definition of CROSS JOIN just below the entry for NATURAL, > but you'll still have to do something with the references at l. 614, > 628, 632. Good point. I restrutured the docs to move CROSS JOIN to a separate section like NATURAL and adjusted the text, patch attached. > Also, doesn't "[ AS join_using_alias ]" apply to NATURAL and CROSS > joins? You've left that out of the syntax summary. Uh, I only see it for USING in gram.y: /* JOIN qualification clauses * Possibilities are: * USING ( column list ) [ AS alias ] *allows only unqualified column names, *which must match between tables. * ON expr allows more general qualifications. * * We return USING as a two-element List (the first item being a sub-List * of the common column names, and the second either an Alias item or NULL). * An ON-expr will not be a List, so it can be told apart that way. */ join_qual: USING '(' name_list ')' opt_alias_clause_for_join_using { $$ = (Node *) list_make2($3, $5); } | ON a_expr { $$ = $2; } ; ... /* * The alias clause after JOIN ... USING only accepts the AS ColId spelling, * per SQL standard. (The grammar could parse the other variants, but they * don't seem to be useful, and it might lead to parser problems in the * future.) */ opt_alias_clause_for_join_using: AS ColId { $$ = makeNode(Alias); $$->aliasname = $2; /* the column name list will be inserted later */ } | /*EMPTY*/ { $$ = NULL; } ; which is only used in: | table_ref join_type JOIN table_ref join_qual | table_ref JOIN table_ref join_qual I have updated my private build: https://momjian.us/tmp/pgsql/sql-select.html -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml new file mode 100644 index 410c80e..1f9538f *** a/doc/src/sgml/ref/select.sgml --- b/doc/src/sgml/ref/select.sgml *** SELECT [ ALL | DISTINCT [ ON ( function_name ( [ argument [, ...] ] ) AS ( column_definition [, ...] ) [ LATERAL ] ROWS FROM( function_name ( [ argument [, ...] ] ) [ AS ( column_definition [, ...] ) ] [, ...] ) [ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ] ! from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) [ AS join_using_alias ] ] and grouping_element can be one of: --- 59,67 [ LATERAL ] function_name ( [ argument [, ...] ] ) AS ( column_definition [, ...] ) [ LATERAL ] ROWS FROM( function_name ( [ argument [, ...] ] ) [ AS ( column_definition [, ...] ) ] [, ...] ) [ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ] ! from_item join_type from_item { ON join_condition | USING ( join_column [, ...] ) [ AS join_using_alias ] } ! from_item NATURAL join_type from_item ! from_item CROSS JOIN from_item and grouping_element can be one of: *** TABLE [ ONLY ] join_condition, or USING (join_column [, ...]). ! See below for the meaning. For CROSS JOIN, ! none of these clauses can appear. 
--- 602,616 FULL [ OUTER ] JOIN For the INNER and OUTER join types, a join condition must be specified, namely exactly one of ! ON join_condition, USING (join_column [, ...]), ! or NATURAL. See below for the meaning. *** TABLE [ ONLY ] + CROSS JOIN + + + CROSS JOIN is equivalent to INNER JOIN ON + (TRUE), that is, no rows are removed by qualification. + They produce a simple Cartesian product, the same result as you get from
Re: pg_upgrade test writes to source directory
Hi, On 2022-08-11 11:26:39 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2022-06-01 10:55:28 -0400, Tom Lane wrote: > >> [...] I'm definitely not happy with the proposed changes to > >> 010_tab_completion.pl. My recollection is that those tests > >> were intentionally written to test tab completion involving a > >> directory name, but this change just loses that aspect entirely. > > > How about creating a dedicated directory for the created files, to maintain > > that? My goal of being able to redirect the test output elsewhere can be > > achieved with just a hunk like this: > > Sure, there's no need for these files to be in the exact same place that > the output is collected. I just want to keep their same relationship > to the test's CWD. > > > Of course it'd need a comment adjustment etc. It's a bit ugly to use a > > otherwise empty tmp_check/ directory just to reduce the diff size, but it's > > also not too bad. > > Given that it's no longer going to be the same tmp_check dir used > elsewhere, maybe we could s/tmp_check/tab_comp_dir/g or something > like that? That'd add some clarity I think. Done in the attached patch (0001). A bunch of changes (e.g. f4ce6c4d3a3) made since I'd first written that TESTOUTDIR patch means that we don't need two different variables anymore. So patch 0002 just moves the addition of /tmp_check from Utils.pm to the places in which TESTDIR is defined. That still "forces" tmp_check/ to exist when going through pg_regress, but that's less annoying because pg_regress at least keeps regression.{diffs,out}/log files/directory outside of tmp_check/. I've also attached a 0003 that splits the log location from the data location. That could be used to make the log file location symmetrical between pg_regress (log/) and tap tests (tmp_check/log). But it'd break the buildfarm's tap test log file collection, so I don't think that's something we really can do soon-ish? Greetings, Andres Freund >From eebc08f6628b15e0e8cc1a84dec9bbbd9931696b Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 11 Aug 2022 08:57:58 -0700 Subject: [PATCH 1/3] Create test files for 010_tab_completion.pl in dedicated directory This is mainly motivated by the meson patchset, which wants to put the log / data for tests in a different place than the autoconf build. 010_tab_completion.pl is the only test hardcoding the tmp_check/ directory, thereby mandating that directory name. It's also a bit cleaner independently, because it doesn't intermingle the files with more important things like the log/ directory. --- src/bin/psql/t/010_tab_completion.pl | 33 ++-- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl index 2eea515e871..cb36e8e4811 100644 --- a/src/bin/psql/t/010_tab_completion.pl +++ b/src/bin/psql/t/010_tab_completion.pl @@ -68,7 +68,7 @@ delete $ENV{LS_COLORS}; # In a VPATH build, we'll be started in the source directory, but we want # to run in the build directory so that we can use relative paths to -# access the tmp_check subdirectory; otherwise the output from filename +# access the tab_comp_dir subdirectory; otherwise the output from filename # completion tests is too variable. if ($ENV{TESTDIR}) { @@ -76,17 +76,18 @@ if ($ENV{TESTDIR}) } # Create some junk files for filename completion testing. 
+mkdir "tab_comp_dir"; my $FH; -open $FH, ">", "tmp_check/somefile" - or die("could not create file \"tmp_check/somefile\": $!"); +open $FH, ">", "tab_comp_dir/somefile" + or die("could not create file \"tab_comp_dir/somefile\": $!"); print $FH "some stuff\n"; close $FH; -open $FH, ">", "tmp_check/afile123" - or die("could not create file \"tmp_check/afile123\": $!"); +open $FH, ">", "tab_comp_dir/afile123" + or die("could not create file \"tab_comp_dir/afile123\": $!"); print $FH "more stuff\n"; close $FH; -open $FH, ">", "tmp_check/afile456" - or die("could not create file \"tmp_check/afile456\": $!"); +open $FH, ">", "tab_comp_dir/afile456" + or die("could not create file \"tab_comp_dir/afile456\": $!"); print $FH "other stuff\n"; close $FH; @@ -272,16 +273,16 @@ clear_query(); # check filename completion check_completion( - "\\lo_import tmp_check/some\t", - qr|tmp_check/somefile |, + "\\lo_import tab_comp_dir/some\t", + qr|tab_comp_dir/somefile |, "filename completion with one possibility"); clear_query(); # note: readline might print a bell before the completion check_completion( - "\\lo_import tmp_check/af\t", - qr|tmp_check/af\a?ile|, + "\\lo_import tab_comp_dir/af\t", + qr|tab_comp_dir/af\a?ile|, "filename completion with multiple possibilities"); # broken versions of libedit require clear_line not clear_query here @@ -291,15 +292,15 @@ clear_line(); # note: broken versions of libedit want to backslash the closing quote; # not much we can do about that check_completion( - "COPY foo FROM tmp_check/some\t", - qr|'tmp_
Logical WAL sender unresponsive during decoding commit
Hi hackers! Some time ago I've seen a hanging logical replication that was trying to send transaction commit after doing table pg_repack. I understand that those things do not mix well. Yet walsender was ignoring pg_terminate_backend() and I think this worth fixing. Can we add CHECK_FOR_INTERRUPTS(); somewhere in this backtrace? Full session is attaches as file. #0 pfree (pointer=0x561850bbee40) at ./build/../src/backend/utils/mmgr/mcxt.c:1032 #1 0x5617712530d6 in ReorderBufferReturnTupleBuf (tuple=, rb=) at ./build/../src/backend/replication/logical/reorderbuffer.c:469 #2 ReorderBufferReturnChange (rb=, change=0x561772456048) at ./build/../src/backend/replication/logical/reorderbuffer.c:398 #3 0x561771253da1 in ReorderBufferRestoreChanges (rb=rb@entry=0x561771c14e10, txn=0x561771c0b078, file=file@entry=0x561771c15168, segno=segno@entry=0x561771c15178) at ./build/../src/backend/replication/logical/reorderbuffer.c:2570 #4 0x5617712553ba in ReorderBufferIterTXNNext (state=0x561771c15130, rb=0x561771c14e10) at ./build/../src/backend/replication/logical/reorderbuffer.c:1146 #5 ReorderBufferCommit (rb=0x561771c14e10, xid=xid@entry=2976347782, commit_lsn=79160378448744, end_lsn=, commit_time=commit_time@entry=686095734290578, origin_id=origin_id@entry=0, origin_lsn=0) at ./build/../src/backend/replication/logical/reorderbuffer.c:1523 #6 0x56177124a30a in DecodeCommit (xid=2976347782, parsed=0x7ffc3cb4c240, buf=0x7ffc3cb4c400, ctx=0x561771b10850) at ./build/../src/backend/replication/logical/decode.c:640 #7 DecodeXactOp (ctx=0x561771b10850, buf=buf@entry=0x7ffc3cb4c400) at ./build/../src/backend/replication/logical/decode.c:248 #8 0x56177124a6a9 in LogicalDecodingProcessRecord (ctx=0x561771b10850, record=0x561771b10ae8) at ./build/../src/backend/replication/logical/decode.c:117 #9 0x56177125d1e5 in XLogSendLogical () at ./build/../src/backend/replication/walsender.c:2893 #10 0x56177125f5f2 in WalSndLoop (send_data=send_data@entry=0x56177125d180 ) at ./build/../src/backend/replication/walsender.c:2242 #11 0x561771260125 in StartLogicalReplication (cmd=) at ./build/../src/backend/replication/walsender.c:1179 #12 exec_replication_command (cmd_string=cmd_string@entry=0x561771abe590 "START_REPLICATION SLOT dttsjtaa66crdhbm015h LOGICAL 0/0 ( \"include-timestamp\" '1', \"include-types\" '1', \"include-xids\" '1', \"write-in-chunks\" '1', \"add-tables\" '/* sanitized */.claim_audit,public.__consu"...) at ./build/../src/backend/replication/walsender.c:1612 #13 0x5617712b2334 in PostgresMain (argc=, argv=argv@entry=0x561771b2a438, dbname=, username=) at ./build/../src/backend/tcop/postgres.c:4267 #14 0x56177123857c in BackendRun (port=0x561771b0d7a0, port=0x561771b0d7a0) at ./build/../src/backend/postmaster/postmaster.c:4484 #15 BackendStartup (port=0x561771b0d7a0) at ./build/../src/backend/postmaster/postmaster.c:4167 #16 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1725 #17 0x56177123954b in PostmasterMain (argc=9, argv=0x561771ab70e0) at ./build/../src/backend/postmaster/postmaster.c:1398 #18 0x561770fae8b6 in main (argc=9, argv=0x561771ab70e0) at ./build/../src/backend/main/main.c:228 What do you think? Thank you! Best regards, Andrey Borodin. 
check_for_interrupts.diff Description: Binary data Time: 0.731 ms sas-/* sanitized *//postgres M # select * from pg_replication_slots ; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid |xmin| catalog_xmin | restart_lsn | confirmed_flush_lsn +--+---++--+---++++--+---+- /* sanitized */ | [null] | physical | [null] | [null] | f | t |1719461 | 2980736032 | [null] | 4818/908C9B98 | [null] vla_/* sanitized */_db_yandex_net | [null] | physical | [null] | [null] | f | t |1719460 | 2980736032 | [null] | 4818/908DE000 | [null] dttsjtaa66crdhbm015h | wal2json | logical | 16441 | /* sanitized */ | f | t |3697390 | [null] | 2976347004 | 47F6/3FFE5DA0 | 47FE/F588E800 (3 rows) Time: 0.454 ms sas-/* sanitized *//postgres M # select pg_terminate_backend(3697390); pg_terminate_backend -- t (1 row) Time: 0.189 ms sas-/* sanitized *//postgres M # select * from pg_replication_slots ; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid |xmin| catalog_xmin | restart_lsn | confirmed_flush_lsn +--+---++--+---++++--+---+- man_/* sanitized */_db_yandex_net | [null] | physical | [null] |
Re: Cleaning up historical portability baggage
On Tue, Aug 16, 2022 at 1:16 PM Andres Freund wrote: > > But let's suppose we want to play by a timid interpretation of that page's > > "do not issue low-level or STDIO.H I/O routines". It also says that SIGINT > > is special and runs the handler in a new thread (in a big warning box > > because that has other hazards that would break other kinds of code). Well, > > we *know* it's safe to unlink files in another thread... so... how cheesy > > would it be if we just did raise(SIGINT) in the real handlers? > > Not quite sure I understand. You're proposing to raise(SIGINT) for all other > handlers, so that signal_remove_temp() gets called in another thread, because > we assume that'd be safe because doing file IO in other threads is safe? That > assumes the signal handler invocation infrastructure isn't the problem... That's what I was thinking about, yeah. But after some more reading, now I'm wondering if we'd even need to do that, or what I'm missing. The 6 listed signals in the manual are SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV and SIGTERM (the 6 required by C). We want to run signal_remove_temp() on SIGHUP (doesn't exist, we made it up), SIGINT, SIGPIPE (doesn't exist, we made it up), and SIGTERM (exists for C spec compliance but will never be raised by the system according to the manual, and we don't raise it ourselves IIUC). So the only case we actually have to consider is SIGINT, and SIGINT handlers run in a thread, so if we assume it is therefore exempt from those very-hard-to-comply-with rules, aren't we good already? What am I missing? > Looks like we could register a "native" ctrl-c handler: > https://docs.microsoft.com/en-us/windows/console/setconsolectrlhandler > they're documented to run in a different thread, but without any of the > file-io warnings. > https://docs.microsoft.com/en-us/windows/console/handlerroutine Sounds better in general, considering the extreme constraints of the signal system, but it'd be nice to see if the current system is truly unsafe before writing more alien code. Someone who wants to handle more than one SIGINT would certainly need to consider that, because there doesn't seem to be a race-free way to reinstall the signal handler when you receive it[1]. Races aside, for any signal except SIGINT (assuming the above-mentioned exemption), you're probably also not even allowed to try because raise() might be a system call and they're banned. Fortunately we don't care, we wanted SIG_DFL next anyway. [1] https://wiki.sei.cmu.edu/confluence/display/c/SIG01-C.+Understand+implementation-specific+details+regarding+signal+handler+persistence
Re: Support logical replication of global object commands
On Fri, Aug 12, 2022 at 5:41 PM Amit Kapila wrote: > > On Tue, Aug 9, 2022 at 1:31 AM Zheng Li wrote: > > > > Hello, > > > > Logical replication of DDL commands support is being worked on in [1]. > > However, global object commands are quite different from other > > non-global object DDL commands and need to be handled differently. For > > example, global object commands include ROLE statements, DATABASE > > statements, TABLESPACE statements and a subset of GRANT/REVOKE > > statements if the object being modified is a global object. These > > commands are different from other DDL commands in that: > > > > 1. Global object commands can be executed in any database. > > 2. Global objects are not schema qualified. > > 3. Global object commands are not captured by event triggers. > > > > I’ve put together a prototype to support logical replication of global > > object commands in the attached patch. This patch builds on the DDL > > replication patch from ZJ in [2] and must be applied on top of it. > > Here is a list of global object commands that the patch replicate, you > > can find more details in function LogGlobalObjectCommand: > > > > /* ROLE statements */ > > CreateRoleStmt > > AlterRoleStmt > > AlterRoleSetStmt > > DropRoleStmt > > ReassignOwnedStmt > > GrantRoleStmt > > > > /* Database statements */ > > CreatedbStmt > > AlterDatabaseStmt > > AlterDatabaseRefreshCollStmt > > AlterDatabaseSetStmt > > DropdbStmt > > > > /* TableSpace statements */ > > CreateTableSpaceStmt > > DropTableSpaceStmt > > AlterTableSpaceOptionsStmt > > > > /* GrantStmt and RevokeStmt if objtype is a global object determined > > by EventTriggerSupportsObjectType() */ > > GrantStmt > > RevokeStmt > > > > The idea with this patch is to support global objects commands > > replication by WAL logging the command using the same function for DDL > > logging - LogLogicalDDLMessage towards the end of > > standard_ProcessUtility. Because global objects are not schema > > qualified, we can skip the deparser invocation and directly log the > > original command string for replay on the subscriber. > > > > A key problem to address is that global objects can become > > inconsistent between the publisher and the subscriber if a command > > modifying the global object gets executed in a database (on the source > > side) that doesn't replicate the global object commands. I think we > > can work on the following two aspects in order to avoid such > > inconsistency: > > > > 1. Introduce a publication option for global object commands > > replication and document that logical replication of global object > > commands is preferred to be enabled on all databases. Otherwise > > inconsistency can happen if a command modifies the global object in a > > database that doesn't replicate global object commands. > > > > For example, we could introduce the following publication option > > publish_global_object_command : > > CREATE PUBLICATION mypub > > FOR ALL TABLES > > WITH (publish = 'insert, delete, update', publish_global_object_command = > > true); > > > > Tying global objects with FOR ALL TABLES seems odd to me. One possible > idea could be to introduce publications FOR ALL OBJECTS. However, I am > not completely sure whether tying global objects with > database-specific publications is what users would expect but OTOH I > don't have any better ideas here. > Can we think of relying to send WAL of such DDLs just based on whether there is a corresponding publication (ex. publication of ALL OBJECTS)? 
I mean avoid database-specific filtering in decoding for such DDL commands but not sure how much better it is than the current proposal? The other idea that occurred to me is to have separate event triggers for global objects that we can store in the shared catalog but again it is not clear how to specify the corresponding function as functions are database specific. Thoughts? -- With Regards, Amit Kapila.
Re: Logical WAL sender unresponsive during decoding commit
On Tue, Aug 16, 2022 at 9:28 AM Andrey Borodin wrote: > > Hi hackers! > > Some time ago I've seen a hanging logical replication that was trying to send transaction commit after doing table pg_repack. > I understand that those things do not mix well. Yet walsender was ignoring > pg_terminate_backend() and I think this worth fixing. > Can we add CHECK_FOR_INTERRUPTS(); somewhere in this backtrace? > I think if we want to do this in this code path then it may be better to add it in ReorderBufferProcessTXN where we are looping to process each change. -- With Regards, Amit Kapila.
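A rough sketch of where such a check could go, simplified from the change-processing loop in ReorderBufferProcessTXN (not the exact upstream code):

	while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL)
	{
		/*
		 * Give pg_terminate_backend() and other pending interrupts a chance
		 * to be serviced even when a single transaction produces a huge
		 * number of changes.
		 */
		CHECK_FOR_INTERRUPTS();

		/* ... existing per-change processing ... */
	}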
Re: pg_upgrade test writes to source directory
Hi, On 2022-08-15 20:20:51 -0700, Andres Freund wrote: > On 2022-08-11 11:26:39 -0400, Tom Lane wrote: > > Andres Freund writes: > > > On 2022-06-01 10:55:28 -0400, Tom Lane wrote: > > >> [...] I'm definitely not happy with the proposed changes to > > >> 010_tab_completion.pl. My recollection is that those tests > > >> were intentionally written to test tab completion involving a > > >> directory name, but this change just loses that aspect entirely. > > > > > How about creating a dedicated directory for the created files, to > > > maintain > > > that? My goal of being able to redirect the test output elsewhere can be > > > achieved with just a hunk like this: > > > > Sure, there's no need for these files to be in the exact same place that > > the output is collected. I just want to keep their same relationship > > to the test's CWD. > > > > > Of course it'd need a comment adjustment etc. It's a bit ugly to use a > > > otherwise empty tmp_check/ directory just to reduce the diff size, but > > > it's > > > also not too bad. > > > > Given that it's no longer going to be the same tmp_check dir used > > elsewhere, maybe we could s/tmp_check/tab_comp_dir/g or something > > like that? That'd add some clarity I think. > > Done in the attached patch (0001). > > A bunch of changes (e.g. f4ce6c4d3a3) made since I'd first written that > TESTOUTDIR patch means that we don't need two different variables anymore. So > patch 0002 just moves the addition of /tmp_check from Utils.pm to the places > in which TESTDIR is defined. > > That still "forces" tmp_check/ to exist when going through pg_regress, but > that's less annoying because pg_regress at least keeps > regression.{diffs,out}/log files/directory outside of tmp_check/. > > I've also attached a 0003 that splits the log location from the data > location. That could be used to make the log file location symmetrical between > pg_regress (log/) and tap tests (tmp_check/log). But it'd break the > buildfarm's tap test log file collection, so I don't think that's something we > really can do soon-ish? Oops, 0003 had some typos in it that I added last minute... Corrected patches attached. Greetings, Andres Freund >From eebc08f6628b15e0e8cc1a84dec9bbbd9931696b Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 11 Aug 2022 08:57:58 -0700 Subject: [PATCH v2 1/3] Create test files for 010_tab_completion.pl in dedicated directory This is mainly motivated by the meson patchset, which wants to put the log / data for tests in a different place than the autoconf build. 010_tab_completion.pl is the only test hardcoding the tmp_check/ directory, thereby mandating that directory name. It's also a bit cleaner independently, because it doesn't intermingle the files with more important things like the log/ directory. --- src/bin/psql/t/010_tab_completion.pl | 33 ++-- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl index 2eea515e871..cb36e8e4811 100644 --- a/src/bin/psql/t/010_tab_completion.pl +++ b/src/bin/psql/t/010_tab_completion.pl @@ -68,7 +68,7 @@ delete $ENV{LS_COLORS}; # In a VPATH build, we'll be started in the source directory, but we want # to run in the build directory so that we can use relative paths to -# access the tmp_check subdirectory; otherwise the output from filename +# access the tab_comp_dir subdirectory; otherwise the output from filename # completion tests is too variable. 
if ($ENV{TESTDIR}) { @@ -76,17 +76,18 @@ if ($ENV{TESTDIR}) } # Create some junk files for filename completion testing. +mkdir "tab_comp_dir"; my $FH; -open $FH, ">", "tmp_check/somefile" - or die("could not create file \"tmp_check/somefile\": $!"); +open $FH, ">", "tab_comp_dir/somefile" + or die("could not create file \"tab_comp_dir/somefile\": $!"); print $FH "some stuff\n"; close $FH; -open $FH, ">", "tmp_check/afile123" - or die("could not create file \"tmp_check/afile123\": $!"); +open $FH, ">", "tab_comp_dir/afile123" + or die("could not create file \"tab_comp_dir/afile123\": $!"); print $FH "more stuff\n"; close $FH; -open $FH, ">", "tmp_check/afile456" - or die("could not create file \"tmp_check/afile456\": $!"); +open $FH, ">", "tab_comp_dir/afile456" + or die("could not create file \"tab_comp_dir/afile456\": $!"); print $FH "other stuff\n"; close $FH; @@ -272,16 +273,16 @@ clear_query(); # check filename completion check_completion( - "\\lo_import tmp_check/some\t", - qr|tmp_check/somefile |, + "\\lo_import tab_comp_dir/some\t", + qr|tab_comp_dir/somefile |, "filename completion with one possibility"); clear_query(); # note: readline might print a bell before the completion check_completion( - "\\lo_import tmp_check/af\t", - qr|tmp_check/af\a?ile|, + "\\lo_import tab_comp_dir/af\t", + qr|tab_comp_dir/af\a?ile|, "filename completion with multiple possibilities"); # broken versions of libedit require cl
Re: Logical WAL sender unresponsive during decoding commit
On Tue, Aug 16, 2022 at 2:08 PM Amit Kapila wrote: > > On Tue, Aug 16, 2022 at 9:28 AM Andrey Borodin wrote: > > > > Hi hackers! > > > > Some time ago I've seen a hanging logical replication that was trying to > > send transaction commit after doing table pg_repack. > > I understand that those things do not mix well. Yet walsender was ignoring > > pg_terminate_backend() and I think this worth fixing. > > Can we add CHECK_FOR_INTERRUPTS(); somewhere in this backtrace? > > > > I think if we want to do this in this code path then it may be it is > better to add it in ReorderBufferProcessTXN where we are looping to > process each change. +1 The same issue is recently reported[1] on -bugs and I proposed the patch that adds CHECK_FOR_INTERRUPTS() to the loop in ReorderBufferProcessTXN(). I think it should be backpatched. Regards, [1] https://www.postgresql.org/message-id/CAD21AoD%2BaNfLje%2B9JOqWbTiq1GL4BOp9_f7FxLADm8rS8cDhCQ%40mail.gmail.com -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Propose a new function - list_is_empty
On Tue, Aug 16, 2022 at 11:27 AM Tom Lane wrote: > > Peter Smith writes: > > During a recent code review I was going to suggest that some new code > > would be more readable if the following: > > if (list_length(alist) == 0) ... > > > was replaced with: > > if (list_is_empty(alist)) ... > > > but then I found that actually no such function exists. > > That's because the *correct* way to write it is either "alist == NIL" > or just "!alist". I don't think we need yet another way to spell > that, and I'm entirely not on board with replacing either of those > idioms. But if you want to get rid of overcomplicated uses of > list_length() in favor of one of those spellings, have at it. Done, and tested OK with make check-world. PSA. -- Kind Regards, Peter Smith. Fujitsu Australia. v2-0001-use-NIL-test-for-empty-List-checks.patch Description: Binary data
[PG15 Doc] remove "tty" connect string from manual
Hello, hackers. As of PostgreSQL 14, "tty" in the libpq connection string has already been removed with the commit below. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=14d9b37607ad30c3848ea0f2955a78436eff1268 But https://www.postgresql.org/docs/15/libpq-connect.html#LIBPQ-CONNSTRING still says "Ignored (formerly, this specified where to send server debug output)". The attached patch removes the "tty" item. Regards, Noriyoshi Shinoda libpq-connect-v1.diff Description: libpq-connect-v1.diff
Re: Logical WAL sender unresponsive during decoding commit
On Tue, Aug 16, 2022 at 10:56 AM Masahiko Sawada wrote: > > On Tue, Aug 16, 2022 at 2:08 PM Amit Kapila wrote: > > > > On Tue, Aug 16, 2022 at 9:28 AM Andrey Borodin wrote: > > > > > > Hi hackers! > > > > > > Some time ago I've seen a hanging logical replication that was trying to > > > send transaction commit after doing table pg_repack. > > > I understand that those things do not mix well. Yet walsender was > > > ignoring pg_terminate_backend() and I think this worth fixing. > > > Can we add CHECK_FOR_INTERRUPTS(); somewhere in this backtrace? > > > > > > > I think if we want to do this in this code path then it may be it is > > better to add it in ReorderBufferProcessTXN where we are looping to > > process each change. > > +1 > > The same issue is recently reported[1] on -bugs and I proposed the > patch that adds CHECK_FOR_INTERRUPTS() to the loop in > ReorderBufferProcessTXN(). I think it should be backpatched. > I agree that it is better to backpatch this as well. Would you like to verify if your patch works for all branches or if it need some tweaks? -- With Regards, Amit Kapila.
Re: Logical WAL sender unresponsive during decoding commit
On Tue, Aug 16, 2022 at 2:31 PM Amit Kapila wrote: > > On Tue, Aug 16, 2022 at 10:56 AM Masahiko Sawada > wrote: > > > > On Tue, Aug 16, 2022 at 2:08 PM Amit Kapila wrote: > > > > > > On Tue, Aug 16, 2022 at 9:28 AM Andrey Borodin > > > wrote: > > > > > > > > Hi hackers! > > > > > > > > Some time ago I've seen a hanging logical replication that was trying > > > > to send transaction commit after doing table pg_repack. > > > > I understand that those things do not mix well. Yet walsender was > > > > ignoring pg_terminate_backend() and I think this worth fixing. > > > > Can we add CHECK_FOR_INTERRUPTS(); somewhere in this backtrace? > > > > > > > > > > I think if we want to do this in this code path then it may be it is > > > better to add it in ReorderBufferProcessTXN where we are looping to > > > process each change. > > > > +1 > > > > The same issue is recently reported[1] on -bugs and I proposed the > > patch that adds CHECK_FOR_INTERRUPTS() to the loop in > > ReorderBufferProcessTXN(). I think it should be backpatched. > > > > I agree that it is better to backpatch this as well. Would you like to > verify if your patch works for all branches or if it need some tweaks? > Yes, I've confirmed v10 and master but will do that for other branches and send patches for all supported branches. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Support logical replication of global object commands
> Can we think of relying to send WAL of such DDLs just based on whether > there is a corresponding publication (ex. publication of ALL OBJECTS)? > I mean avoid database-specific filtering in decoding for such DDL > commands but not sure how much better it is than the current proposal? I think a publication of ALL OBJECTS sounds intuitive. Does it mean we'll publish all DDL commands, all commit and abort operations in every database if there is such publication of ALL OBJECTS? Best, Zheng
Re: Allow file inclusion in pg_hba and pg_ident files
Hi, On Sat, Jul 30, 2022 at 04:09:36PM +0800, Julien Rouhaud wrote: > > - 0001: the rule_number / mapping_number addition in the views in a separate > commit > - 0002: the main file inclusion patch. Only a few minor bugfix since > previous version discovered thanks to the tests (a bit more about it after), > and documentation tweaks based on previous discussions > - 0003: the pg_hba_matches() POC, no changes Attached v9, simple rebase after multiple conflicts and part of the patchset applied. >From 5430fd3b829b9c7600b61095752d4f7f85d1150d Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Mon, 30 May 2022 10:59:51 +0800 Subject: [PATCH v9 1/3] Add rule_number / mapping_number to the pg_hba/pg_ident views. Author: Julien Rouhaud Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya%40jrouhaud --- doc/src/sgml/system-views.sgml | 22 + src/backend/utils/adt/hbafuncs.c| 50 ++--- src/include/catalog/pg_proc.dat | 11 --- src/test/regress/expected/rules.out | 10 +++--- 4 files changed, 72 insertions(+), 21 deletions(-) diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml index 44aa70a031..1d619427c1 100644 --- a/doc/src/sgml/system-views.sgml +++ b/doc/src/sgml/system-views.sgml @@ -991,6 +991,18 @@ + + + rule_number int4 + + + Rule number of this rule among all rules if the rule is valid, otherwise + null. This indicates the order in which each rule will be considered + until the first matching one, if any, is used to perform authentication + with the client. + + + line_number int4 @@ -1131,6 +1143,16 @@ + + + mapping_number int4 + + + Mapping number, in priority order, of this mapping if the mapping is + valid, otherwise null + + + line_number int4 diff --git a/src/backend/utils/adt/hbafuncs.c b/src/backend/utils/adt/hbafuncs.c index 9e5794071c..c9be4bff1f 100644 --- a/src/backend/utils/adt/hbafuncs.c +++ b/src/backend/utils/adt/hbafuncs.c @@ -26,10 +26,12 @@ static ArrayType *get_hba_options(HbaLine *hba); static void fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc, - int lineno, HbaLine *hba, const char *err_msg); + int rule_number, int lineno, HbaLine *hba, + const char *err_msg); static void fill_hba_view(Tuplestorestate *tuple_store, TupleDesc tupdesc); static void fill_ident_line(Tuplestorestate *tuple_store, TupleDesc tupdesc, - int lineno, IdentLine *ident, const char *err_msg); + int mapping_number, int lineno, IdentLine *ident, + const char *err_msg); static void fill_ident_view(Tuplestorestate *tuple_store, TupleDesc tupdesc); @@ -157,7 +159,7 @@ get_hba_options(HbaLine *hba) } /* Number of columns in pg_hba_file_rules view */ -#define NUM_PG_HBA_FILE_RULES_ATTS 9 +#define NUM_PG_HBA_FILE_RULES_ATTS 10 /* * fill_hba_line @@ -165,6 +167,7 @@ get_hba_options(HbaLine *hba) * * tuple_store: where to store data * tupdesc: tuple descriptor for the view + * rule_number: unique rule identifier among all valid rules * lineno: pg_hba.conf line number (must always be valid) * hba: parsed line data (can be NULL, in which case err_msg should be set) * err_msg: error message (NULL if none) @@ -174,7 +177,8 @@ get_hba_options(HbaLine *hba) */ static void fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc, - int lineno, HbaLine *hba, const char *err_msg) + int rule_number, int lineno, HbaLine *hba, + const char *err_msg) { Datum values[NUM_PG_HBA_FILE_RULES_ATTS]; boolnulls[NUM_PG_HBA_FILE_RULES_ATTS]; @@ -193,6 +197,11 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc, memset(nulls, 0, sizeof(nulls)); index = 0; + /* 
rule_number */ + if (err_msg) + nulls[index++] = true; + else + values[index++] = Int32GetDatum(rule_number); /* line_number */ values[index++] = Int32GetDatum(lineno); @@ -336,7 +345,7 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc, else { /* no parsing result, so set relevant fields to nulls */ - memset(&nulls[1], true, (NUM_PG_HBA_FILE_RULES_ATTS - 2) * sizeof(bool)); + memset(&nulls[2], true, (NUM_PG_HBA_FILE_RULES_ATTS - 3) * sizeof(bool)); } /* error */ @@ -359,6 +368,7 @@ fill_hba_view(Tuplestorestate *tuple_store, TupleDesc tu
Re: Expose Parallelism counters planned/execute in pg_stat_statements
Hi, On Fri, Jul 29, 2022 at 08:36:44AM -0500, Daymel Bonne Solís wrote: > > We have rewritten the patch and added the necessary columns to have the > number of times a parallel query plan was not executed using parallelism. > > We are investigating how to add more information related to the workers > created > by the Gather/GatherMerge nodes, but it is not a trivial task. As far as I can see the scope of the counters is now different. You said you wanted to be able to identify when a parallel query plan cannot be executed with parallelism, but what the fields are now showing is simply whether no workers were launched at all. It could be because of the dbeaver behavior you mentioned (the !es_use_parallel_mode case), but also if the executor did try to launch parallel workers and didn't get any. I don't think that's an improvement. With this patch if you see the "paral_planned_not_exec" counter going up, you still don't know if this is because of the !es_use_parallel_mode or if you simply have too many parallel queries running at the same time, or both, and therefore can't do much with that information. Both situations are different and in my opinion require different (and specialized) counters to properly handle them. Also, I don't think that paral_planned_exec and paral_planned_not_exec are good column (and variable) names. Maybe something like "parallel_exec_count" and "forced_non_parallel_exec_count" (assuming it's based on a parallel plan and !es_use_parallel_mode).
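As a sketch of how such a split could be made (the counter fields are hypothetical, not existing pg_stat_statements columns), the classification could look at both whether the plan was parallel and whether parallelism was actually allowed at run time:

	/* roughly, at executor-end time */
	if (queryDesc->plannedstmt->parallelModeNeeded)
	{
		if (queryDesc->estate->es_use_parallel_mode)
			counters->parallel_exec_count++;
		else
			counters->forced_non_parallel_exec_count++;
	}

That way a bump in forced_non_parallel_exec_count specifically tracks the !es_use_parallel_mode case, independently of how many workers were actually launched.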