Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 10/5/22 9:24 AM, Michael Paquier wrote: On Tue, Sep 20, 2022 at 01:33:09PM +0200, Drouvot, Bertrand wrote: Anyway, I have looked at the patch. + List *roles_re; + List *databases_re; + regex_t *hostname_re; I am surprised by the approach of using separate lists for the regular expressions and the raw names. Wouldn't it be better to store everything in a single list but assign an entry type? In this case it would be either regex or plain string. This would minimize the footprint of the changes (no extra arguments *_re in the routines checking for a match on the roles, databases or hosts). And it seems to me that this would make unnecessary the use of re_num here and there. Please find attached v5 addressing this. I started with a union but it turns out that we still need the plain string when a regex is used. This is not needed for the authentication per se but for fill_hba_line(). So I ended up creating a new struct without a union in v5. The hostname is different, of course, requiring only an extra field for its type, or something like that. I'm using the same new struct as described above for the hostname. Perhaps the documentation would gain in clarity if there were more examples, like a set of comma-separated examples (mix of regex and raw strings for example, for all the field types that gain support for regexes)? Right, I added more examples in v5. -$node->append_conf('postgresql.conf', "log_connections = on\n"); +$node->append_conf( +'postgresql.conf', qq{ +listen_addresses = '127.0.0.1' +log_connections = on +}); Hmm. I think that we may need to reconsider the location of the tests for the regexes with the host name, as the "safe" regression tests should not switch listen_addresses. One location where we already do that is src/test/ssl/, so these could be moved there. Good point, I moved the hostname-related tests to src/test/ssl. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index c6f1b70fd3..406628ef35 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -235,8 +235,9 @@ hostnogssenc database userPostgreSQL database. - Multiple database names can be supplied by separating them with - commas. A separate file containing database names can be specified by + Multiple database names and/or regular expressions preceded by / + can be supplied by separating them with commas. + A separate file containing database names can be specified by preceding the file name with @. @@ -249,7 +250,8 @@ hostnogssenc database userall specifies that it matches all users. Otherwise, this is either the name of a specific - database user, or a group name preceded by +. + database user, a regular expression preceded by / + or a group name preceded by +. (Recall that there is no real distinction between users and groups in PostgreSQL; a + mark really means match any of the roles that are directly or indirectly members @@ -258,7 +260,8 @@ hostnogssenc database user/ + can be supplied by separating them with commas. A separate file containing user names can be specified by preceding the file name with @. @@ -270,8 +273,9 @@ hostnogssenc database user Specifies the client machine address(es) that this record - matches.
This field can contain either a host name, a regular expression + preceded by / representing host names, an IP address range, + or one of the special key words mentioned below. @@ -739,6 +743,24 @@ host all all ::1/128 trust # TYPE DATABASE USER ADDRESS METHOD host all all localhost trust +# The same using a regular expression for host name, which allows connection for +# host name ending with "test". +# +# TYPE DATABASE USER ADDRESS METHOD +host all all /^.*test$ trust + +# The same using regular expression for DATABASE, which allows connection to the +# db1 and testdb databases and any database with a name ending with "test". +# +# TYPE DATABASE USER ADDRESS METHOD +local db1,/^.*test$,testdb all /^.*test$ trust + +# The same using regular expression for USER, which allows connection to the +# user1 and testuser users and any user with a name ending with "test". +# +# TYPE DATABASE USER ADDRESS METHOD
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 10/6/22 2:53 AM, Michael Paquier wrote: On Wed, Oct 05, 2022 at 03:32:20PM +0200, Drouvot, Bertrand wrote: On 10/5/22 9:24 AM, Michael Paquier wrote: - test_role() -> test_conn() to be able to pass down a database name. - reset_pg_hba() to control the host, db and user parts. The host part does not really apply after moving the hosts checks to a more secure location, so I guess that this had better be extended just for the user and database, keeping host=local all the time. I am planning to apply 0001 attached independently, 0001 looks good to me. Thanks. I have applied this refactoring, leaving the host part out of the equation as we should rely only on local connections for this part of the test. Thanks! Thanks! I'll look at it and the comments you just made up-thread. Cool, thanks. One thing that matters a lot IMO (that I forgot to mention previously) is to preserve the order of the items parsed from the configuration files. Fully agree, all the versions that have been submitted in this thread preserve the ordering. Also, I am wondering whether we'd better have some regression tests where a regex includes a comma and a role name itself has a comma, actually, just to stress more the parsing of individual items in the HBA file. Good idea, it has been added in v5 just shared up-thread. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Adding Support for Copy callback functionality on COPY TO api
On Sun, Oct 9, 2022 at 2:44 AM Nathan Bossart wrote: > > Sorry for the noise. There was an extra #include in v4 that I've removed > in v5. IIUC, the COPY TO callback helps move a table's data out of the Postgres server. Just wondering, how is it different from existing solutions like COPY TO ... PROGRAM/FILE, logical replication, pg_dump etc. that can move a table's data out? I understand that the COPY FROM callback was needed for logical replication 7c4f52409. Mentioning a concrete use-case helps here. I'm not quite sure if we need a separate module to just tell how to use this new callback. I strongly feel that it's not necessary. It unnecessarily creates extra code (actual code is 25 LOC with v1 patch but 150 LOC with v5 patch) and can cause a maintenance burden. These callback APIs are simple enough to understand for those who know BeginCopyTo() or BeginCopyFrom() and especially for those who know how to write extensions. These are not APIs that an end-user uses. The best would be to document both COPY FROM and COPY TO callbacks, perhaps with pseudocode specifying just the essence [1], and their possible usages somewhere here https://www.postgresql.org/docs/devel/sql-copy.html. The order of the NOTICE messages below isn't guaranteed and it can change depending on platforms. Previously, we've had to suppress such messages in the test output 6adc5376d. +SELECT test_copy_to_callback('public.test'::pg_catalog.regclass); +NOTICE: COPY TO callback called with data "123" and length 5 +NOTICE: COPY TO callback called with data "123456" and length 8 +NOTICE: COPY TO callback called with data "123456789" and length 11 + test_copy_to_callback [1] +Relation rel = table_open(PG_GETARG_OID(0), AccessShareLock); +CopyToState cstate; + +cstate = BeginCopyTo(NULL, rel, NULL, RelationGetRelid(rel), NULL, NULL, + to_cb, NIL, NIL); +(void) DoCopyTo(cstate); +EndCopyTo(cstate); + +table_close(rel, AccessShareLock); -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
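For reference, the callback side of the pseudocode above is tiny. Here is a minimal sketch of a TO-side callback, assuming the (void *data, int len) signature used by this patch set; the function name and the NOTICE wording simply mirror the test output quoted above (which also suggests the buffer is NUL-terminated in the test module), and a real consumer would hand the bytes to an external sink instead:

#include "postgres.h"

/*
 * Hypothetical COPY TO data callback: called once per formatted row
 * with a pointer to the row text and its length.
 */
static void
to_cb(void *data, int len)
{
	ereport(NOTICE,
			(errmsg("COPY TO callback called with data \"%s\" and length %d",
					(char *) data, len)));
}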
Re: ps command does not show walsender's connected db
On Mon, Oct 10, 2022 at 12:20 PM bt22nakamorit wrote: > > 2022-10-10 12:27 Bharath Rupireddy wrote: > > if (port->database_name != NULL && port->database_name[0] != '\0') > > appendStringInfo(&ps_data, "%s ", port->database_name); > > > > The above works better. > > Thank you for the suggestion. > I have edited the patch to reflect your idea. Thanks. LGTM. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: archive modules
On 05.10.22 20:57, Nathan Bossart wrote: On Wed, Oct 05, 2022 at 07:55:58PM +0200, Peter Eisentraut wrote: Leaving archive_command empty is an intentional configuration choice. What we are talking about here is, arguably, a misconfiguration, so it should result in an error. Okay. What do you think about something like the attached? That looks like the right solution to me. Let's put that into PG 16, and maybe we can consider backpatching it.
Re: Remove unnecessary commas for goto labels
On Sun, Oct 9, 2022 at 10:08 AM Julien Rouhaud wrote: > > Hi, > > On Sun, Oct 09, 2022 at 07:42:58AM +0800, Japin Li wrote: > > > > Hi hackers, > > > > I find there are some unnecessary commas for goto labels, > > attached a patch to remove them. > > You mean semi-colon? +1, and a quick regex later I don't see any other > occurrence. Interestingly, I did find that in C, some statement is needed after a label, even an empty one, otherwise it won't compile. That's not the case here, though, so pushed. -- John Naylor EDB: http://www.enterprisedb.com
Re: Remove unnecessary commas for goto labels
On Mon, 10 Oct 2022 at 16:18, John Naylor wrote: > Interestingly, I did find that in C, some statement is needed after a > label, even an empty one, otherwise it won't compile. Yeah, this is required by ISO C [1]. > That's not the case here, though, so pushed. Thank you! [1] https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Labels -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
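For readers following along, the rule is easy to demonstrate with a minimal C function; without the empty statement after the label, pre-C23 compilers reject it:

void
cleanup_example(void)
{
	goto done;
done:
	;	/* ISO C requires a statement after a label; an empty statement suffices */
}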
clean up pid_t printing and get rid of pgpid_t
On 04.10.22 10:15, Peter Eisentraut wrote: I wanted to propose the attached patch to get rid of the custom pgpid_t typedef in pg_ctl. Since we liberally use pid_t elsewhere, this seemed plausible. However, this patch fails the CompilerWarnings job on Cirrus, because apparently under mingw, pid_t is "volatile long long int", so all the printf placeholders mismatch. However, we print pid_t as %d in a lot of other places, so I'm confused why this fails here. I figured out that in most places we actually store PIDs in int, and in the cases where we use pid_t, casts before printing are indeed used and necessary. So nevermind that. In any case, I took this opportunity to standardize the printing of PIDs as %d. There were a few stragglers. And then the original patch to get rid of pgpid_t in pg_ctl, now updated with the correct casts for printing. I confirmed that this now passes the CompilerWarnings job. From 72b0620f3824ef91bf9f64b4814e5874f8152322 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 10 Oct 2022 10:04:45 +0200 Subject: [PATCH v2 1/3] doc: Correct type of bgw_notify_pid This has apparently been wrong since the beginning (090d0f2050647958865cb495dff74af7257d2bb4). --- doc/src/sgml/bgworker.sgml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml index 3d4e4afcf919..7ba5da27e50a 100644 --- a/doc/src/sgml/bgworker.sgml +++ b/doc/src/sgml/bgworker.sgml @@ -63,7 +63,7 @@ Background Worker Processes char bgw_function_name[BGW_MAXLEN]; Datum bgw_main_arg; char bgw_extra[BGW_EXTRALEN]; -int bgw_notify_pid; +pid_t bgw_notify_pid; } BackgroundWorker; base-commit: 06dbd619bfbfe03fefa7223838690d4012f874ad -- 2.37.3 From c405c4a942c861a720662035983dca237fb92527 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 10 Oct 2022 10:07:14 +0200 Subject: [PATCH v2 2/3] Standardize format for printing PIDs Most code prints PIDs as %d, but some code tried to print them as long or unsigned long. While this is in theory allowed, the fact that PIDs fit into int is deeply baked into all PostgreSQL code, so these random deviations don't accomplish anything except confusion. Note that we still need casts from pid_t to int, because on 64-bit MinGW, pid_t is long long int. (But per above, actually supporting that range in PostgreSQL code would be major surgery and probably not useful.)
--- contrib/pg_prewarm/autoprewarm.c | 20 ++-- src/backend/postmaster/bgworker.c| 4 ++-- src/backend/storage/ipc/procsignal.c | 4 ++-- 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c index c8d673a20e36..1843b1862e57 100644 --- a/contrib/pg_prewarm/autoprewarm.c +++ b/contrib/pg_prewarm/autoprewarm.c @@ -193,8 +193,8 @@ autoprewarm_main(Datum main_arg) { LWLockRelease(&apw_state->lock); ereport(LOG, - (errmsg("autoprewarm worker is already running under PID %lu", - (unsigned long) apw_state->bgworker_pid))); + (errmsg("autoprewarm worker is already running under PID %d", + (int) apw_state->bgworker_pid))); return; } apw_state->bgworker_pid = MyProcPid; @@ -303,8 +303,8 @@ apw_load_buffers(void) { LWLockRelease(&apw_state->lock); ereport(LOG, - (errmsg("skipping prewarm because block dump file is being written by PID %lu", - (unsigned long) apw_state->pid_using_dumpfile))); + (errmsg("skipping prewarm because block dump file is being written by PID %d", + (int) apw_state->pid_using_dumpfile))); return; } LWLockRelease(&apw_state->lock); @@ -599,12 +599,12 @@ apw_dump_now(bool is_bgworker, bool dump_unlogged) { if (!is_bgworker) ereport(ERROR, - (errmsg("could not perform block dump because dump file is being used by PID %lu", - (unsigned long) apw_state->pid_using_dumpfile))); + (errmsg("could not perform block dump because dump file is being used by PID %d", + (int) apw_state->pid_using_dumpfile))); ereport(LOG, - (errmsg("skipping block dump because it is already being performed by PID %lu", - (unsigned long) apw_state->pid_using_dumpfile))); + (
Re: problems with making relfilenodes 56-bits
On Wed, Oct 5, 2022 at 5:19 AM Andres Freund wrote: > > I lightly dusted off my old varint implementation from [1] and converted the > RelFileLocator and BlockNumber from fixed width integers to varint ones. This > isn't meant as a serious patch, but an experiment to see if this is a path > worth pursuing. > > A run of installcheck in a cluster with autovacuum=off, full_page_writes=off > (for increased reproducibility) shows a decent saving: > > master: 241106544 - 230 MB > varint: 227858640 - 217 MB > > The average record size goes from 102.7 to 95.7 bytes excluding the remaining > FPIs, 118.1 to 111.0 including FPIs. > I have also executed my original test after applying these patches on top of the 56 bit relfilenode patch. So earlier we saw the WAL size increased by 11% (66199.09375 kB to 73906.984375 kB) and after this patch the WAL generated is now 58179.2265625 kB. That means in this particular example this patch is reducing the WAL size by 12% even with the 56 bit relfilenode patch. [1] https://www.postgresql.org/message-id/CAFiTN-uut%2B04AdwvBY_oK_jLvMkwXUpDJj5mXg--nek%2BucApPQ%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
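For readers who have not seen the technique: a variable-width integer stores small values in fewer bytes by spending one bit per byte as a continuation marker. The sketch below is a generic LEB128-style encoder given purely for illustration; it is not Andres's actual pg_varint implementation from [1], which lays out the length information differently:

#include <stdint.h>

static int
varint_encode_uint64(uint64_t value, uint8_t *buf)
{
	int		len = 0;

	do
	{
		uint8_t byte = value & 0x7F;	/* low 7 payload bits */

		value >>= 7;
		if (value != 0)
			byte |= 0x80;		/* continuation bit: more bytes follow */
		buf[len++] = byte;
	} while (value != 0);

	return len;					/* 1..10 bytes for a 64-bit value */
}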
Re: create subscription - improved warning message
On Mon, Oct 10, 2022 at 4:04 PM Amit Kapila wrote: > > On Mon, Oct 10, 2022 at 10:10 AM Tom Lane wrote: > > > > Amit Kapila writes: ... > The docs say [1]: "When creating a subscription, the remote host is > not reachable or in an unclear state. In that case, the subscription > can be created using the connect = false option. The remote host will > then not be contacted at all. This is what pg_dump uses. The remote > replication slot will then have to be created manually before the > subscription can be activated." > > I think the below gives accurate information: > WARNING: subscription was created, but is not connected > HINT: You should create the slot manually, enable the subscription, > and run %s to initiate replication. > +1 PSA patch v2 worded like that. -- Kind Regards, Peter Smith. Fujitsu Australia v2-0001-create-subscription-warning-message.patch Description: Binary data
src/test/perl/PostgreSQL/Test/*.pm not installed
I noticed that the new Perl test modules are not installed, so if you try to use PostgreSQL/Test/Cluster.pm in an external test from pgxs, it fails with the modules not being found. I see no reason for this other than having overlooked it in b235d41d9646, so I propose the attached (for all branches, naturally.) -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "La gente vulgar sólo piensa en pasar el tiempo; el que tiene talento, en aprovecharlo" diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile index 3d3a95b52f..79f790772f 100644 --- a/src/test/perl/Makefile +++ b/src/test/perl/Makefile @@ -17,6 +17,7 @@ ifeq ($(enable_tap_tests),yes) installdirs: $(MKDIR_P) '$(DESTDIR)$(pgxsdir)/$(subdir)' + $(MKDIR_P) '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test' install: all installdirs $(INSTALL_DATA) $(srcdir)/TestLib.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/TestLib.pm' @@ -24,6 +25,8 @@ install: all installdirs $(INSTALL_DATA) $(srcdir)/RecursiveCopy.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/RecursiveCopy.pm' $(INSTALL_DATA) $(srcdir)/PostgresNode.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgresNode.pm' $(INSTALL_DATA) $(srcdir)/PostgresVersion.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgresVersion.pm' + $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Cluster.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm' + $(INSTALL_DATA) $(srcdir)/PostgreSQL/Test/Utils.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Utils.pm' uninstall: rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/TestLib.pm' @@ -31,5 +34,7 @@ uninstall: rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/RecursiveCopy.pm' rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgresNode.pm' rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgresVersion.pm' + rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Cluster.pm' + rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgreSQL/Test/Utils.pm' endif
Re: Suppressing useless wakeups in walreceiver
On Wed, Oct 5, 2022 at 11:38 PM Nathan Bossart wrote: > > I moved as much as I could to 0001 in v4. Some comments on v4-0002: 1. You might want to use PG_INT64_MAX instead of INT64_MAX for portability? 2. With the below change, the time walreceiver spends in XLogWalRcvSendReply() is also included for XLogWalRcvSendHSFeedback right? I think it's a problem given that XLogWalRcvSendReply() can take a while. Earlier this wasn't the case, with each function calculating 'now' separately. Any reason for changing this behaviour? I know that GetCurrentTimestamp() isn't cheap, but here it might result in a change in the behaviour. -XLogWalRcvSendReply(false, false); -XLogWalRcvSendHSFeedback(false); +TimestampTz now = GetCurrentTimestamp(); + +XLogWalRcvSendReply(state, now, false, false); +XLogWalRcvSendHSFeedback(state, now, false); 3. I understand that the TimestampTz type is treated as microseconds. Would you mind having a comment in the places below to say why we're multiplying by 1000 or 1000000 here? Also, do we need 1000L or 1000000L, or a cast to TimestampTz? +state->wakeup[reason] = now + (wal_receiver_timeout / 2) * 1000; +state->wakeup[reason] = now + wal_receiver_timeout * 1000; +state->wakeup[reason] = now + wal_receiver_status_interval * 1000000; 4. How about simplifying WalRcvComputeNextWakeup() to something like below? static void WalRcvComputeNextWakeup(WalRcvInfo *state, WalRcvWakeupReason reason, TimestampTz now) { TimestampTz ts = INT64_MAX; switch (reason) { case WALRCV_WAKEUP_TERMINATE: if (wal_receiver_timeout > 0) ts = now + (TimestampTz) (wal_receiver_timeout * 1000L); break; case WALRCV_WAKEUP_PING: if (wal_receiver_timeout > 0) ts = now + (TimestampTz) ((wal_receiver_timeout / 2) * 1000L); break; case WALRCV_WAKEUP_HSFEEDBACK: if (hot_standby_feedback && wal_receiver_status_interval > 0) ts = now + (TimestampTz) (wal_receiver_status_interval * 1000000L); break; case WALRCV_WAKEUP_REPLY: if (wal_receiver_status_interval > 0) ts = now + (TimestampTz) (wal_receiver_status_interval * 1000000); break; default: /* An error is better here. */ } state->wakeup[reason] = ts; } 5. Can we move the code snippets below into respective static functions for better readability and code reuse? This: +/* Find the soonest wakeup time, to limit our nap. */ +nextWakeup = INT64_MAX; +for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i) +nextWakeup = Min(state.wakeup[i], nextWakeup); +nap = Max(0, (nextWakeup - now + 999) / 1000); And this: +now = GetCurrentTimestamp(); +for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i) +WalRcvComputeNextWakeup(&state, i, now); -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re[2]: Possible solution for masking chosen columns when using pg_dump
Hi, I applied most of the suggestions: used separate files for most of the added code, fixed typos/mistakes, got rid of that pesky TODO that was already implemented, just not deleted. Added tests (and functionality) for cases when you need to mask columns in tables with the same name in different schemas. If a schema is not specified, then columns in all tables with the specified name are masked (Example - pg_dump -t 't0' --mask-columns id --mask-function mask_int will mask all ids in all tables with names 't0' in all existing schemas). Wrote comments for all 'magic numbers'. About that >- Also it can be hard to use a lot of different functions for different >fields, maybe it would be better to set up functions in a file. I agree with that, but I know about at least 2 other patches (both are WIP, but still) that are interacting with reading command-line options from a file. And if everyone writes their own version of reading command-line options from a file, it will quickly get confusing. A solution to that problem is another patch that will put all options from a file (one file for any possible options, from existing to future ones) into **argv in main, so that pg_dump can process them as if they came from the command line. >Friday, 7 October 2022, 8:01 +07:00 from Виктория Шепард : > >Hi, >I took a look, here are several suggestions for improvement: > >- Masking is not a main functionality of pg_dump and it is better to write >most of the connected things in a separate file like parallel.c or >dumputils.c. This will help slow down the growth of an already huge pg_dump >file. > >- Also it can be hard to use a lot of different functions for different >fields, maybe it would be better to set up functions in a file. > >- How will it work for the same field and tables in the different schemas? Can >we set up the exact schema for the field? > >- misspelling in a word >>/* >>* Add all columns and funcions to list of MaskColumnInfo structures, >>*/ > >- Why did you use 256 here? >> char* table = (char*) pg_malloc(256 * sizeof(char)); >Also for malloc you need to allocate one symbol more because you have to store >the '\0' symbol. > >- Instead of addFuncToDatabase you can run your query using something already >defined from fe_utils/query_utils.c. And it will be better to set up a >connection only once and create all functions. Establishing a connection is a >resource-intensive procedure. There are a lot of magic numbers, better to >leave some comments explaining why there are 64 or 512. > >- It seems that you are not using temp_string >> char *temp_string = (char*)malloc(256 * sizeof(char)); > >- Grammar issues >>/* >>* mask_column_info_list contains info about every to-be-masked column: >>* its name, a name its table (if nothing is specified - mask all columns with >>this name), >>* name of masking function and name of schema containing this function >>(public if not specified) >>*/ >the name of its table > >Mon, 3 Oct 2022 at 20:45, Julien Rouhaud < rjuju...@gmail.com >: >>Hi, >> >>On Mon, Oct 03, 2022 at 06:30:17PM +0300, Олег Целебровский wrote: >>> >>> Hello, here's my take on masking data when using pg_dump >>> >>> The main idea is using PostgreSQL functions to replace data during a SELECT. >>> When table data is dumped SELECT a,b,c,d ... from ... query is generated, >>> the columns that are marked for masking are replaced with the result of >>> functions on those columns >>> Example: columns name, count are to be masked, so the query will look as >>> such: SELECT id, mask_text(name), mask_int(count), date from ...
>>> >>> So about the interface: I added 2 more command-line options: >>> >>> --mask-columns, which specifies what columns from what tables will be >>> masked >>> usage example: >>> --mask-columns " t1.name , t2.description" - both columns will >>> be masked with the same corresponding function >>> or --mask-columns name - ALL columns with name "name" from all >>> dumped tables will be masked with the corresponding function >>> >>> --mask-function, which specifies what functions will mask data >>> usage example: >>> --mask-function mask_int - corresponding columns will be masked >>> with function named "mask_int" from default schema (public) >>> or --mask-function my_schema.mask_varchar - same as above but >>> with the specified schema where the function is stored >>> or --mask-function somedir/filename - the function is "defined" >>> here - more on the structure below >> >>FTR I wrote an extension POC [1] last weekend that does that but on the >>backend >>side. The main advantage is that it works with any existing version of >>pg_dump (or any client relying on COPY or even plain interactive SQL >>statements), and that the DBA can force a dedicated role to only get a masked >>dump, even if they forgot to ask for it. >> >>I only had a quick look at your patch but it seems that you left some todo i
Re: make_ctags: use -I option to ignore pg_node_attr macro
Hello On 2022-Oct-07, Yugo NAGATA wrote: > I found that tag files generated by src/tools/make_ctags > don't include some keywords that are field names of node > structs, for example norm_selec in RestrictInfo. Such fields > are defined with the pg_node_attr macro that was introduced > recently, like: > > Selectivity norm_selec pg_node_attr(equal_ignore); > > In this case, pg_node_attr is mistakenly interpreted to be > the name of the field. So, I propose to use the -I option of ctags > to ignore the macro name. Attached is a patch to do it. I've wondered if there's anybody that uses this script. I suppose if you're reporting this problem then it has at least one user, and is therefore worth fixing. If we do patch it, how about doing some more invasive surgery and bringing it into the 21st century? I think the `find` line is a bit lame: > # this is outputting the tags into the file 'tags', and appending > find `pwd`/ -type f -name '*.[chyl]' -print | > - xargs ctags -a -f tags "$FLAGS" > + xargs ctags -a -f tags "$FLAGS" -I "$IGNORE_LIST" especially because it includes everything in tmp_install, which pollutes the tag list. In my own tags script I just call "ctags -R", and I feed cscope with these find lines (find $SRCDIR \( -name tmp_install -prune -o -name tmp_check -prune \) -o \( -name "*.[chly]" -o -iname "*makefile*" -o -name "*.mk" -o -name "*.in" -o -name "*.sh" -o -name "*.sgml" -o -name "*.sql" -o -name "*.p[lm]" \) -type f -print ; \ find $BUILDDIR \( -name tmp_install -prune \) -o \( -name \*.h -a -type f \) -print ) which seems to give decent results. (Nowadays I wonder if it'd be better to exclude the "*_d.h" files from the builddir.) (I wonder why I don't have a prune for .git ...) -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ ¡Ay, ay, ay! Con lo mucho que yo lo quería (bis) se fue de mi vera ... se fue para siempre, pa toíta ... pa toíta la vida ¡Ay Camarón! ¡Ay Camarón!(Paco de Lucía)
Re: [BUG] Logical replica crash if there was an error in a function.
On 2022-Sep-24, Tom Lane wrote: > "Anton A. Melnikov" writes: > > [ v4-0001-Fix-logical-replica-assert-on-func-error.patch ] > > I took a quick look at this. I think you're solving the > problem in the wrong place. The real issue is why are > we not setting up ActivePortal correctly when running > user-defined code in a logrep worker? There is other code > that expects that to be set, eg EnsurePortalSnapshotExists. Right ... mostly, the logical replication *does* attempt to set up a transaction and active snapshot when replaying actions (c.f. begin_replication_step()). Is this firing at an inappropriate time, perhaps? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: problems with making relfilenodes 56-bits
On Mon, Oct 10, 2022 at 5:16 AM Dilip Kumar wrote: > I have also executed my original test after applying these patches on > top of the 56 bit relfilenode patch. So earlier we saw the WAL size > increased by 11% (66199.09375 kB to 73906.984375 kB) and after this > patch now the WAL generated is 58179.2265625. That means in this > particular example this patch is reducing the WAL size by 12% even > with the 56 bit relfilenode patch. That's a very promising result, but the question in my mind is how much work would be required to bring this patch to a committable state? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records
On Wed, 5 Oct 2022 at 16:30, Tom Lane wrote: > > Kyotaro Horiguchi writes: > > At Tue, 4 Oct 2022 17:15:31 -0700, Nathan Bossart > > wrote in > >> On Tue, Oct 04, 2022 at 07:53:11PM -0400, Tom Lane wrote: > >>> After further thought, maybe it'd be better to do it as attached, > >>> with one long-lived hash table for all the locks. > > > The first one is a straightforward outcome of the current implementation, but I > > like the new one. I agree that it is natural and that the expected > > overhead per (typical) transaction is lower than both the first one > > and doing the same operation on a list. I don't think that space > > inefficiency of that extent matters, since it is the startup > > process. > > To get some hard numbers about this, I made a quick hack to collect > getrusage() numbers for the startup process (patch attached for > documentation purposes). I then ran the recovery/t/027_stream_regress.pl > test a few times and collected the stats (also attached). This seems > like a reasonably decent baseline test, since the core regression tests > certainly take lots of AccessExclusiveLocks what with all the DDL > involved, though they shouldn't ever take large numbers at once. Also > they don't run long enough for any lock list bloat to occur, so these > numbers don't reflect a case where the patches would provide benefit. > > If you look hard, there's maybe about a 1% user-CPU penalty for patch 2, > although that's well below the run-to-run variation so it's hard to be > sure that it's real. The same comments apply to the max resident size > stats. So I'm comforted that there's not a significant penalty here. > > I'll go ahead with patch 2 if there's no objection. Happy to see this change. > One other point to discuss: should we consider back-patching? I've > got mixed feelings about that myself. I don't think that cases where > this helps significantly are at all mainstream, so I'm kind of leaning > to "patch HEAD only". It looks fine to eventually backpatch, since StandbyReleaseLockTree() was optimized to only be called when the transaction had actually done some AccessExclusiveLocks. So the performance loss is minor and isolated to the users of such locks, so I see no problems with it. -- Simon Riggs http://www.EnterpriseDB.com/
Re: create subscription - improved warning message
On Mon, Oct 10, 2022 at 12:41 AM Tom Lane wrote: > I'm beginning to question the entire premise here. That is, > rather than tweaking this message until it's within hailing > distance of sanity, why do we allow the no-connect case at all? That sounds pretty nuts to me, because of the pg_dump use case if nothing else. I don't think it's reasonable to say "oh, if you execute this DDL on your system, it will instantaneously and automatically begin to create outbound network connections, and there's no way to turn that off." It ought to be possible to set up a configuration first and then only later turn it on. And it definitely ought to be possible, if things aren't working out, to turn it back off, too. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Query Jumbling for CALL and SET utility statements
Hi, On 10/7/22 6:13 AM, Michael Paquier wrote: On Thu, Oct 06, 2022 at 11:51:52PM -0400, Tom Lane wrote: I've been thinking since the beginning of this thread that there was no coherent, defensible rationale being offered for jumbling some utility statements and not others. Yeah. The potential performance impact of all the TransactionStmts worries me a bit, though. I wonder if the answer is to jumble them all. We avoided that up to now because it would imply a ton of manual effort and future code maintenance ... but now that the backend/nodes/ infrastructure is largely auto-generated, could we auto-generate the jumbling code? I think that's a good idea. Probably. One part that may be tricky though is the location of the constants we'd like to make generic, but perhaps this could be handled by using a dedicated variable type that just maps to int? It looks to me like we'd also need to teach the Perl parser which fields per statement struct need to be jumbled (or more probably which ones to exclude from the jumbling, for example funccall in CallStmt). Not sure yet how to teach the Perl parser, but I'll build this list first to help decide on the right approach, unless you already have some thoughts about it? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Query Jumbling for CALL and SET utility statements
Hi, On Mon, Oct 10, 2022 at 03:04:57PM +0200, Drouvot, Bertrand wrote: > > On 10/7/22 6:13 AM, Michael Paquier wrote: > > > > Probably. One part that may be tricky though is the location of the > > constants we'd like to make generic, but perhaps this could be handled > > by using a dedicated variable type that just maps to int? > > It looks to me like we'd also need to teach the Perl parser which fields per > statement struct need to be jumbled (or more probably which ones to exclude > from the jumbling, for example funccall in CallStmt). Not sure yet how to > teach the Perl parser, but I'll build this list first to help decide on the > right approach, unless you already have some thoughts about it? Unless I'm missing something, both can be handled using the pg_node_attr() annotations that already exist?
doc: add entry for pg_get_partkeydef()
Hi Small patch for $subject, as the other pg_get_XXXdef() functions are documented and I was looking for this one but couldn't remember what it was called. Regards Ian Barwick commit 1ab855e72e077abad247cc40619b6fc7f7758983 Author: Ian Barwick Date: Mon Oct 10 22:32:31 2022 +0900 doc: add entry for pg_get_partkeydef() Other pg_get_XXXdef() functions are documented, so it seems reasonable to include this as well. diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index b2bdbc7d1c..d59b8ab77e 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -23718,6 +23718,23 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); + + + + pg_get_partkeydef + +pg_get_partkeydef ( table oid ) +text + + +Reconstructs the definition of a partition key, in the form it would +need to appear in the PARTITION BY clause in +CREATE TABLE. +(This is a decompiled reconstruction, not the original text +of the command.) + + +
Re: Add common function ReplicationOriginName.
On Tue, Sep 27, 2022 at 5:04 PM Aleksander Alekseev wrote: > > Hi Peter, > > > PSA patch v3 to combine the different replication origin name > > formatting in a single function ReplicationOriginNameForLogicalRep as > > suggested. > > LGTM except for minor issues with the formatting; fixed. > LGTM as well. I'll push this tomorrow unless there are any more comments. -- With Regards, Amit Kapila.
PostgreSQL 15 GA - Oct 13, 2022
Hi, The PostgreSQL 15 GA will be Oct 13, 2022. Thanks, Jonathan OpenPGP_signature Description: OpenPGP digital signature
Re: Non-decimal integer literals
On 16.02.22 11:11, Peter Eisentraut wrote: The remaining patches are material for PG16 at this point, and I will set the commit fest item to returned with feedback in the meantime. Time to continue with this. Attached is a rebased and cleaned up patch for non-decimal integer literals. (I don't include the underscores-in-numeric literals patch. I'm keeping that for later.) Two open issues from my notes: Technically, numeric_in() should be made aware of this, but that seems relatively complicated and maybe not necessary for the first iteration. Taking another look around ecpg to see how this interacts with C-syntax integer literals. I'm not aware of any particular issues, but it's understandably tricky. Other than that, this seems pretty complete as a start. From d0bc72fa4c339ba2ea0bb8d1e5a3923d76ee8105 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 10 Oct 2022 16:03:15 +0200 Subject: [PATCH v9] Non-decimal integer literals Add support for hexadecimal, octal, and binary integer literals: 0x42F 0o273 0b100101 per SQL:202x draft. This adds support in the lexer as well as in the integer type input functions. Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com --- doc/src/sgml/syntax.sgml | 26 src/backend/catalog/information_schema.sql | 6 +- src/backend/catalog/sql_features.txt | 1 + src/backend/parser/scan.l | 99 +++ src/backend/utils/adt/numutils.c | 140 + src/fe_utils/psqlscan.l| 78 +--- src/interfaces/ecpg/preproc/pgc.l | 108 +--- src/test/regress/expected/int2.out | 19 +++ src/test/regress/expected/int4.out | 19 +++ src/test/regress/expected/int8.out | 19 +++ src/test/regress/expected/numerology.out | 59 - src/test/regress/sql/int2.sql | 7 ++ src/test/regress/sql/int4.sql | 7 ++ src/test/regress/sql/int8.sql | 7 ++ src/test/regress/sql/numerology.sql| 21 +++- 15 files changed, 523 insertions(+), 93 deletions(-) diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 93ad71737f51..bba78c22f1a9 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -694,6 +694,32 @@ Numeric Constants + + Additionally, non-decimal integer constants can be used in these forms: + +0xhexdigits +0ooctdigits +0bbindigits + + hexdigits is one or more hexadecimal digits + (0-9, A-F), octdigits is one or more octal + digits (0-7), bindigits is one or more binary + digits (0 or 1). Hexadecimal digits and the radix prefixes can be in + upper or lower case. Note that only integers can have non-decimal forms, + not numbers with fractional parts. 
+ + + + These are some examples of this: +0b100101 +0B10011001 +0o273 +0O755 +0x42f +0XFFFF + + + integer bigint diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index 18725a02d1fb..95c27a625e7e 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer WHEN 1700 /*numeric*/ THEN CASE WHEN $2 = -1 THEN null - ELSE (($2 - 4) >> 16) & 65535 + ELSE (($2 - 4) >> 16) & 0xFFFF END WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/ WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/ @@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer WHEN $1 IN (1700) THEN CASE WHEN $2 = -1 THEN null - ELSE ($2 - 4) & 65535 + ELSE ($2 - 4) & 0xFFFF END ELSE null END; @@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */ THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END WHEN $1 IN (1186) /* interval */ - THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END + THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END ELSE null END; diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt index da7c9c772e09..e897e28ed148 100644 --- a/src/backend/catalog/sql_features.txt +++ b/src/backend/catalog/sql_features.txt @@ -527,6 +527,7 @@ T652 SQL-dynamic statements in SQL routines NO T653 SQL-schema statements in external routines YES T654 SQL-dynamic statements in external routines NO T655 Cyclically dependent r
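To illustrate the input-function side in miniature: the sketch below shows only the radix-prefix dispatch, under the assumption that the real patch folds this into the intN input routines with proper overflow and error reporting (which strtoll alone does not give you); it is not the patch's actual numutils.c code:

#include <stdint.h>
#include <stdlib.h>

static int64_t
parse_prefixed_int(const char *s)
{
	/* toy dispatch on the 0x/0o/0b radix prefixes added by the patch */
	if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
		return strtoll(s + 2, NULL, 16);
	if (s[0] == '0' && (s[1] == 'o' || s[1] == 'O'))
		return strtoll(s + 2, NULL, 8);
	if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B'))
		return strtoll(s + 2, NULL, 2);
	return strtoll(s, NULL, 10);
}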
RE: Fix some newly modified tab-complete changes
On Mon, Oct 10, 2022 2:12 PM shiy.f...@fujitsu.com wrote: > > On Tue, Oct 4, 2022 4:17 PM Peter Smith wrote: > > > > But, while testing I noticed another different quirk > > > > It seems that neither the GRANT nor the REVOKE auto-complete > > recognises the optional PRIVILEGES keyword > > > > e.g. GRANT ALL --> ON (but not PRIVILEGES) > > e.g. GRANT ALL PRIV --> ??? > > > > e.g. REVOKE ALL --> ON (but not PRIVILEGES).. > > e.g. REVOKE ALL PRIV --> ??? > > > > I tried to add tab-completion for it. Please see the attached updated patch. > Sorry for attaching the wrong patch. Here is the right one. Regards, Shi yu v4-0001-Fix-tab-completion-for-GRANT-REVOKE.patch Description: v4-0001-Fix-tab-completion-for-GRANT-REVOKE.patch
Re: kerberos/001_auth test fails on arm CPU darwin
Hi, Thanks for the review! On 10/1/22 14:12, Peter Eisentraut wrote: This patch could use some more in-code comments. For example, this +# get prefix for kerberos executables and try to find them at this path +sub test_krb5_paths is not helpful. What does it "get", where does it put it, how does it "try", and what does it do if it fails? What are the inputs and outputs of this function? + # remove '\n' since 'krb5-config --prefix' returns path ends with '\n' + $krb5_path =~ s/\n//g; use chomp I updated patch regarding these comments. I have a question about my logic: + elsif ($^O eq 'linux') + { + test_krb5_paths('/usr/'); + } } Before that, test could use krb5kdc, kadmin and kdb5_util from '/usr/sbin/'; krb5_config and kinit from $PATH. However, now it will try to use all of them from $PATH or from '/usr/sbin/' and '/usr/bin/'. Does that cause a problem? Ci run after fix is applied: https://cirrus-ci.com/build/5359971746447360 Regards, Nazir Bilal Yavuz From 5aa95409db8146a076255e00be399a0d9ce1efe8 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Mon, 26 Sep 2022 10:56:51 +0300 Subject: [PATCH 1/2] ci: Add arm CPU for darwin . ci-os-only: darwin. --- .cirrus.yml | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/.cirrus.yml b/.cirrus.yml index 9f2282471a..c23be7363c 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -293,7 +293,6 @@ task: name: macOS - Monterey - Meson env: -CPUS: 12 # always get that much for cirrusci macOS instances BUILD_JOBS: $CPUS # Test performance regresses noticably when using all cores. 8 seems to # work OK. See @@ -313,15 +312,26 @@ task: only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(macos|darwin|osx).*' - osx_instance: -image: monterey-base + matrix: +- name: macOS - Monterey - Meson - ARM CPU + macos_instance: +image: ghcr.io/cirruslabs/macos-monterey-base:latest + env: +BREW_PATH: /opt/homebrew +CPUS: 4 + +- name: macOS - Monterey - Meson - Intel CPU + osx_instance: +image: monterey-base + env: +BREW_PATH: /usr/local +CPUS: 12 # always get that much for cirrusci macOS instances sysinfo_script: | id uname -a ulimit -a -H && ulimit -a -S export - setup_core_files_script: - mkdir ${HOME}/cores - sudo sysctl kern.corefile="${HOME}/cores/core.%P" @@ -360,7 +370,7 @@ task: ccache_cache: folder: $CCACHE_DIR configure_script: | -brewpath="/usr/local" +brewpath=${BREW_PATH} PKG_CONFIG_PATH="${brewpath}/lib/pkgconfig:${PKG_CONFIG_PATH}" for pkg in icu4c krb5 openldap openssl zstd ; do -- 2.37.3 From 5ad2357ff7f3ddcb24754847f542952143ccafa7 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Fri, 23 Sep 2022 14:30:06 +0300 Subject: [PATCH 2/2] fix: darwin: ARM CPU darwin krb5 path fix --- src/test/kerberos/t/001_auth.pl | 103 1 file changed, 79 insertions(+), 24 deletions(-) diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl index a2bc8a5351..9587698df3 100644 --- a/src/test/kerberos/t/001_auth.pl +++ b/src/test/kerberos/t/001_auth.pl @@ -30,39 +30,94 @@ elsif ($ENV{PG_TEST_EXTRA} !~ /\bkerberos\b/) plan skip_all => 'Potentially unsafe test GSSAPI/Kerberos not enabled in PG_TEST_EXTRA'; } -my ($krb5_bin_dir, $krb5_sbin_dir); - -if ($^O eq 'darwin') -{ - $krb5_bin_dir = '/usr/local/opt/krb5/bin'; - $krb5_sbin_dir = '/usr/local/opt/krb5/sbin'; -} -elsif ($^O eq 'freebsd') -{ - $krb5_bin_dir = '/usr/local/bin'; - $krb5_sbin_dir = '/usr/local/sbin'; -} -elsif ($^O eq 'linux') -{ - $krb5_sbin_dir = '/usr/sbin'; -} - my $krb5_config = 'krb5-config'; my $kinit= 
'kinit'; my $kdb5_util= 'kdb5_util'; my $kadmin_local = 'kadmin.local'; my $krb5kdc = 'krb5kdc'; -if ($krb5_bin_dir && -d $krb5_bin_dir) +# control variable for checking if all executables are found +my $is_krb5_found = 0; + +# Given $krb5_path_prefix (possible paths of kerberos executables), +# first check value of $is_krb5_found: +# if it is truthy value this means executables are already found so +# don't try to locate further, else add '/bin/' and '/sbin/' +# to the $krb5_path_prefix and try to locate all required kerberos executables +# ($krb5_config, $kinit, $kdb5_util, $kadmin_local, $krb5kdc). +# If all executables are found, set $is_krb5_found to 1 and use executables +# from this path; even if one of the executables is not found, skip this path. +# If all paths are searched and executables are still not found, +# test will try to use kerberos executables from $PATH. +sub test_krb5_paths { - $krb5_config = $krb5_bin_dir . '/' . $krb5_config; - $kinit = $krb5_bin_dir . '/' . $kinit; + my ($krb5_path_prefix) = (@_); + + # if executables are already found, return + if ($is_krb5_found) + {
Turn TransactionIdRetreat/Advance into inline functions
Hi! This patch is inspired by [0] and many others. I've noticed recent activity to convert macros into inline functions. We should make TransactionIdRetreat/Advance functions instead of macros, shouldn't we? I also thought about NormalTransactionIdPrecedes and NormalTransactionIdFollows, but maybe they should be addressed separately: the comment says that "this is a macro for speed". Any thoughts? [0]: https://www.postgresql.org/message-id/flat/5b558da8-99fb-0a99-83dd-f72f05388517%40enterprisedb.com -- Best regards, Maxim Orlov. v1-0001-Convert-macros-to-static-inline-functions-transam.patch Description: Binary data
Re: src/test/perl/PostgreSQL/Test/*.pm not installed
Alvaro Herrera writes: > I noticed that the new Perl test modules are not installed, so if you > try to use PostgreSQL/Test/Cluster.pm in an external test from pgxs, it > fails with the modules not being found. > I see no reason for this other than having overseen it in b235d41d9646, > so I propose the attached (for all branches, naturally.) +1, but I suppose you need some adjustment in the meson.build files now too. (Also, please wait for the v15 release freeze to lift.) regards, tom lane
Re: create subscription - improved warning message
Peter Smith writes: > On Mon, Oct 10, 2022 at 4:04 PM Amit Kapila wrote: >> I think the below gives accurate information: >> WARNING: subscription was created, but is not connected >> HINT: You should create the slot manually, enable the subscription, >> and run %s to initiate replication. > +1 It feels a little strange to me that we describe two steps rather vaguely and then give exact SQL for the third step. How about, say, HINT: To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription. Another idea is HINT: To initiate replication, create the replication slot on the publisher, then run ALTER SUBSCRIPTION ... ENABLE and ALTER SUBSCRIPTION ... REFRESH PUBLICATION. regards, tom lane
Re: Turn TransactionIdRetreat/Advance into inline functions
Maxim Orlov writes: > I've notice recent activity to convert macros into inline functions. We > should make TransactionIdRetreat/Advance functions > Instead of a macro, should we? -1. Having to touch all the call sites like this outweighs any claimed advantage: it makes them uglier and it will greatly complicate any back-patching we might have to do in those areas. regards, tom lane
Re: Turn TransactionIdRetreat/Advance into inline functions
> -1. Having to touch all the call sites like this outweighs > any claimed advantage: it makes them uglier and it will greatly > complicate any back-patching we might have to do in those areas. > > regards, tom lane > OK, got it. But what if we change the semantics of these calls to xid = TransactionIdAdvance(xid)? -- Best regards, Maxim Orlov.
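Concretely, with the value-returning semantics proposed above, the transam.h macro would become something like this sketch (the same skip-the-reserved-XIDs logic as the existing macro, just returning the result instead of updating in place):

static inline TransactionId
TransactionIdAdvance(TransactionId xid)
{
	xid++;
	/* skip over the special XIDs below FirstNormalTransactionId */
	if (xid < FirstNormalTransactionId)
		xid = FirstNormalTransactionId;
	return xid;
}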
Re: Turn TransactionIdRetreat/Advance into inline functions
Maxim Orlov writes: >> -1. Having to touch all the call sites like this outweighs >> any claimed advantage: it makes them uglier and it will greatly >> complicate any back-patching we might have to do in those areas. > Ok, got it. But what if we change the semantics of these calls to > xid = TransactionIdAdvance(xid) ? Uh ... you'd still have to touch all the call sites. regards, tom lane
Re: shadow variables - pg15 edition
David Rowley writes: > On Fri, 7 Oct 2022 at 13:24, David Rowley wrote: >> Since I just committed the patch to fix the final warnings, I think we >> should go ahead and commit the patch you wrote to add >> -Wshadow=compatible-local to the standard build flags. I don't mind >> doing this. > Pushed. The buildfarm's showing a few instances of this warning, which seem to indicate that not all versions of the Perl headers are clean: fairywren | 2022-10-10 09:03:50 | C:/Perl64/lib/CORE/cop.h:612:13: warning: declaration of 'av' shadows a previous local [-Wshadow=compatible-local] fairywren | 2022-10-10 09:03:50 | C:/Perl64/lib/CORE/cop.h:612:13: warning: declaration of 'av' shadows a previous local [-Wshadow=compatible-local] fairywren | 2022-10-10 09:03:50 | C:/Perl64/lib/CORE/cop.h:612:13: warning: declaration of 'av' shadows a previous local [-Wshadow=compatible-local] fairywren | 2022-10-10 09:03:50 | C:/Perl64/lib/CORE/cop.h:612:13: warning: declaration of 'av' shadows a previous local [-Wshadow=compatible-local] fairywren | 2022-10-10 09:03:50 | C:/Perl64/lib/CORE/cop.h:612:13: warning: declaration of 'av' shadows a previous local [-Wshadow=compatible-local] fairywren | 2022-10-10 09:03:50 | C:/Perl64/lib/CORE/cop.h:612:13: warning: declaration of 'av' shadows a previous local [-Wshadow=compatible-local] snakefly | 2022-10-10 08:21:05 | Util.c:457:14: warning: declaration of 'cv' shadows a parameter [-Wshadow=compatible-local] Before you ask: fairywren: perl 5.24.3 snakefly: perl 5.16.3 which are a little old, but not *that* old. Scraping the configure logs also shows that only half of the buildfarm (exactly 50 out of 100 reporting animals) knows -Wshadow=compatible-local, which suggests that we might see more of these if they all did. On the other hand, animals with newer compilers probably also have newer Perl installations, so assuming that the Perl crew have kept this clean recently, maybe not. Not sure if this is problematic enough to justify removing the switch. A plausible alternative is to have a few animals with known-clean Perl installations add the switch manually (and use -Werror), so that we find out about violations without having warnings in the face of developers who can't fix them. I'm willing to wait to see if anyone complains of such warnings, though. regards, tom lane
Re: shadow variables - pg15 edition
Hi, On 2022-10-10 12:06:22 -0400, Tom Lane wrote: > Scraping the configure logs also shows that only half of the buildfarm > (exactly 50 out of 100 reporting animals) knows -Wshadow=compatible-local, > which suggests that we might see more of these if they all did. I think it's not just newness - only gcc has compatible-local, even very new clang doesn't. This was fixed ~6 years ago in perl: commit f2b9631d5d19d2b71c1776e1193173d13f3620bf Author: David Mitchell Date: 2016-05-23 14:43:56 +0100 CX_POP_SAVEARRAY(): use more distinctive var name Under -Wshadow, CX_POP_SAVEARRAY's local var 'av' can generate this warning: warning: declaration shadows a local variable [-Wshadow] So rename it to cx_pop_savearay_av to reduce the risk of a clash. (See http://nntp.perl.org/group/perl.perl5.porters/236444) > Not sure if this is problematic enough to justify removing the switch. > A plausible alternative is to have a few animals with known-clean Perl > installations add the switch manually (and use -Werror), so that we find > out about violations without having warnings in the face of developers > who can't fix them. I'm willing to wait to see if anyone complains of > such warnings, though. Given the age of affected perl instances I suspect there'll not be a lot of developers affected, and the number of warnings is reasonably small too. It'd likely hurt more developers to not see the warnings locally, given that such shadowing often causes bugs. Greetings, Andres Freund
Re: src/test/perl/PostgreSQL/Test/*.pm not installed
On 2022-Oct-10, Tom Lane wrote: > +1, but I suppose you need some adjustment in the meson.build files > now too. Oh, right, I forgot ... > (Also, please wait for the v15 release freeze to lift.) ... and now that I look, it turns out that 15 and master need no changes: both the Makefile and the meson files are correct already. Only 14 and back have this problem. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: shadow variables - pg15 edition
On 2022-Oct-10, Andres Freund wrote: > Given the age of affected perl instances I suspect there'll not be a lot of > developers affected, and the number of warnings is reasonably small too. It'd > likely hurt more developers to not see the warnings locally, given that such > shadowing often causes bugs. Maybe we can install a filter-out in src/pl/plperl's Makefile for the time being. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Por suerte hoy explotó el califont porque si no me habría muerto de aburrido" (Papelucho)
Re: shadow variables - pg15 edition
Hi, On 2022-10-10 18:33:11 +0200, Alvaro Herrera wrote: > On 2022-Oct-10, Andres Freund wrote: > > > Given the age of affected perl instances I suspect there'll not be a lot of > > developers affected, and the number of warnings is reasonably small too. > > It'd > > likely hurt more developers to not see the warnings locally, given that such > > shadowing often causes bugs. > > Maybe we can install a filter-out in src/pl/plperl's Makefile for the > time being. We could, but is it really a useful thing for something fixed 6 years ago? Greetings, Andres Freund
Re: Adding Support for Copy callback functionality on COPY TO api
On Mon, Oct 10, 2022 at 12:41:40PM +0530, Bharath Rupireddy wrote: > IIUC, COPY TO callback helps move a table's data out of postgres > server. Just wondering, how is it different from existing solutions > like COPY TO ... PROGRAM/FILE, logical replication, pg_dump etc. that > can move a table's data out? I understandb that the COPY FROM callback > was needed for logical replication 7c4f52409. Mentioning a concrete > use-case helps here. This new callback allows the use of COPY TO's machinery in extensions. A couple of generic use-cases are listed upthread [0], and one concrete use-case is the aws_s3 extension [1]. > I'm not quite sure if we need a separate module to just tell how to > use this new callback. I strongly feel that it's not necessary. It > unnecessarily creates extra code (actual code is 25 LOC with v1 patch > but 150 LOC with v5 patch) and can cause maintenance burden. These > callback APIs are simple enough to understand for those who know > BeginCopyTo() or BeginCopyFrom() and especially for those who know how > to write extensions. These are not APIs that an end-user uses. The > best would be to document both COPY FROM and COPY TO callbacks, > perhaps with a pseudo code specifying just the essence [1], and their > possible usages somewhere here > https://www.postgresql.org/docs/devel/sql-copy.html. > > The order of below NOTICE messages isn't guaranteed and it can change > depending on platforms. Previously, we've had to suppress such > messages in the test output 6adc5376d. I really doubt that this small test case is going to cause anything approaching undue maintenance burden. I think it's important to ensure this functionality continues to work as expected long into the future. [0] https://postgr.es/m/253C21D1-FCEB-41D9-A2AF-E6517015B7D7%40amazon.com [1] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/postgresql-s3-export.html#aws_s3.export_query_to_s3 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Add index scan progress to pg_stat_progress_vacuum
> One idea would be to add a flag, say report_parallel_vacuum_progress, > to IndexVacuumInfo struct and expect index AM to check and update the > parallel index vacuum progress, say every 1GB blocks processed. The > flag is true only when the leader process is vacuuming an index. Sorry for the long delay on this. I have taken the approach as suggested by Sawada-san and Robert and attached is v12. 1. The patch introduces a new counter in the shared memory already used by the parallel leader and workers to keep track of the number of indexes completed. This way there is no reason to loop through the index status every time we want to get the status of indexes completed. 2. A new function in vacuumparallel.c will be used to update the progress of indexes completed by reading from the counter created in point #1. 3. The function is called during the vacuum_delay_point as a matter of convenience, since it's called in all major vacuum loops. The function will only do anything if the caller sets a boolean to report progress. Doing so will also ensure progress is being reported in case the parallel workers completed before the leader. 4. Rather than adding any complexity to WaitForParallelWorkersToFinish and introducing a new callback, vacuumparallel.c will wait until the number of vacuum workers is 0 and then proceed to call WaitForParallelWorkersToFinish as it does now. 5. Went back to the idea of adding a new view called pg_stat_progress_vacuum_index which is accomplished by adding a new type called VACUUM_PARALLEL in progress.h Thanks, Sami Imseih Amazon Web Services (AWS) Attachment: v12-0001--Show-progress-for-index-vacuums.patch
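A minimal sketch of the shared-counter pattern in points 1-3 (the struct, field, and progress-parameter names here are illustrative assumptions, not the patch's actual symbols):

/* Sketch only: names are assumptions for illustration. */
typedef struct PVShared
{
    /* point 1: counter in the existing parallel-vacuum shared memory */
    pg_atomic_uint32 nindexes_completed;
} PVShared;

/* leader or worker: bump the counter when one index vacuum finishes */
static void
parallel_vacuum_index_done(PVShared *shared)
{
    pg_atomic_add_fetch_u32(&shared->nindexes_completed, 1);
}

/* points 2 and 3: called via vacuum_delay_point() when the caller has
 * opted in to progress reporting */
static void
parallel_vacuum_update_progress(PVShared *shared)
{
    pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_COMPLETED,
                                 pg_atomic_read_u32(&shared->nindexes_completed));
}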
Re: shadow variables - pg15 edition
On 2022-10-10 09:37:38 -0700, Andres Freund wrote: > On 2022-10-10 18:33:11 +0200, Alvaro Herrera wrote: > > On 2022-Oct-10, Andres Freund wrote: > > > > > Given the age of affected perl instances I suspect there'll not be a lot > > > of > > > developers affected, and the number of warnings is reasonably small too. > > > It'd > > > likely hurt more developers to not see the warnings locally, given that > > > such > > > shadowing often causes bugs. > > > > Maybe we can install a filter-out in src/pl/plperl's Makefile for the > > time being. > > We could, but is it really a useful thing for something fixed 6 years ago? As an out, a hypothetical dev could add -Wno-shadow=compatible-local to their CFLAGS.
Re: shadow variables - pg15 edition
On 2022-Oct-10, Andres Freund wrote: > On 2022-10-10 09:37:38 -0700, Andres Freund wrote: > > On 2022-10-10 18:33:11 +0200, Alvaro Herrera wrote: > > > On 2022-Oct-10, Andres Freund wrote: > > > > > > > Given the age of affected perl instances I suspect there'll not be a > > > > lot of > > > > developers affected, and the number of warnings is reasonably small > > > > too. It'd > > > > likely hurt more developers to not see the warnings locally, given that > > > > such > > > > shadowing often causes bugs. > > > > > > Maybe we can install a filter-out in src/pl/plperl's Makefile for the > > > time being. > > > > We could, but is it really a useful thing for something fixed 6 years ago? Well, for people purposefully installing using older installs of Perl (not me, admittedly), it does seem useful, because you get the benefit of checking shadow vars for the rest of the tree and still get no warnings if everything is clean. > As an out, a hypothetical dev could add -Wno-shadow=compatible-local to their > CFLAGS. But that disables it for the tree as a whole, which is not better. We can remove the filter-out when we decide to move the Perl version requirement up, say 4 years from now. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "A man never knows what he is capable of until he tries" (C. Dickens)
Re: Error-safe user functions
> > > The idea is simple -- introduce a new "error-safe" calling mode of user > functions by passing a special node through FunctionCallInfo.context, in > which the function should write error info and return instead of throwing > it. Also such functions should manually free resources before > returning an error. This gives the ability to avoid PG_TRY/PG_CATCH and > subtransactions. > > I tried something similar when trying to implement TRY_CAST ( https://learn.microsoft.com/en-us/sql/t-sql/functions/try-cast-transact-sql?view=sql-server-ver16) late last year. I also considered having a default datum rather than just returning NULL. I had not considered a new node type. I had considered having every function have a "safe" version, which would be a big duplication of logic requiring a lot of regression tests and possibly fuzzing tests. Instead, I extended every core input function to have an extra boolean parameter to indicate if failures were allowed, and then an extra Datum parameter for the default value. The input function wouldn't need to check the value of the new parameters until it was already in a situation where it found invalid data, but the extra overhead still remained, and it meant that basically every third party type extension would need to be changed. Then I considered whether the cast failure should be completely silent, or if the previous error message should instead be emitted as a LOG/INFO/WARN, and if we'd want that to be configurable, so then the boolean parameter became an integer enum: * regular fail (0) * use default silently (1) * use default emit LOG/NOTICE/WARNING (2,3,4) At the time, all of this seemed like too big of a change for a function that isn't even in the SQL Standard, but maybe SQL/JSON changes that. If so, it would allow for a can-cast-to test that users would find very useful. Something like: SELECT CASE WHEN 'abc' CAN BE integer THEN 'Integer' ELSE 'Nope' END There's obviously no standard syntax to support that, but the data cleansing possibilities would be great.
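For illustration, the failure-mode parameter described above might have looked something like this (a sketch of the abandoned design, not committed code; the function name is hypothetical):

/* Sketch of the abandoned design described above -- not committed code. */
typedef enum CastFailureMode
{
    CAST_FAIL_ERROR = 0,        /* regular failure: throw the error */
    CAST_FAIL_DEFAULT_SILENT,   /* substitute the default silently */
    CAST_FAIL_DEFAULT_LOG,      /* substitute the default, emit LOG */
    CAST_FAIL_DEFAULT_NOTICE,   /* ... emit NOTICE */
    CAST_FAIL_DEFAULT_WARNING   /* ... emit WARNING */
} CastFailureMode;

/* hypothetical extended input function: only consults the extra
 * parameters once it has already found invalid input */
extern Datum int4in_extended(char *str, CastFailureMode mode,
                             Datum default_datum);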
Re: src/test/perl/PostgreSQL/Test/*.pm not installed
Alvaro Herrera writes: > Only 14 and back have this problem. Ah, cool. There's no freeze on those branches ... regards, tom lane
Re: shadow variables - pg15 edition
Alvaro Herrera writes: > On 2022-Oct-10, Andres Freund wrote: >> We could, but is it really a useful thing for something fixed 6 years ago? > Well, for people purposefully installing using older installs of Perl > (not me, admittedly), it does seem useful, because you get the benefit > of checking shadow vars for the rest of the tree and still get no > warnings if everything is clean. Meh --- people purposefully using old Perls are likely using old compilers too. Let's wait and see if any devs actually complain. regards, tom lane
Re: Suppressing useless wakeups in walreceiver
On Mon, Oct 10, 2022 at 03:22:15PM +0530, Bharath Rupireddy wrote: > Some comments on v4-0002: Thanks for taking a look. > 1. You might want to use PG_INT64_MAX instead of INT64_MAX for portability? Yes, I used PG_INT64_MAX in v5. > 2. With the below change, the time walreceiver spends in > XLogWalRcvSendReply() is also included for XLogWalRcvSendHSFeedback > right? I think it's a problem given that XLogWalRcvSendReply() can > take a while. Earlier, this wasn't the case, each function calculating > 'now' separately. Any reason for changing this behaviour? I know that > GetCurrentTimestamp(); isn't cheaper all the time, but here it might > result in a change in the behaviour. Yes, if XLogWalRcvSendReply() takes a long time, we might defer sending the hot_standby_feedback message until a later time. The only reason I changed this was to avoid extra calls to GetCurrentTimestamp(), which might be expensive on some platforms. Outside of the code snippet you pointed out, I think WalReceiverMain() has a similar problem. That being said, I'm generally skeptical that this sort of thing is detrimental given the current behavior (i.e., wake up every 100ms), the usual values of these GUCs (i.e., tens of seconds), and the fact that any tasks that are inappropriately skipped will typically be retried in the next iteration of the loop. > 3. I understand that TimestampTz type is treated as microseconds. > Would you mind having a comment in below places to say why we're > multiplying with 1000 or 100 here? Also, do we need 1000L or > 100L or type cast to > TimestampTz? I used INT64CONST() in v5, as that seemed the most portable, but I stopped short of adding comments to explain all the conversions. IMO the conversions are relatively obvious, and the units of the GUCs can be easily seen in guc_tables.c. > 4. How about simplifying WalRcvComputeNextWakeup() something like below? Other than saving a couple lines of code, IMO this doesn't meaningfully simplify the function or improve readability. > 5. Can we move below code snippets to respective static functions for > better readability and code reuse? > This: > +/* Find the soonest wakeup time, to limit our nap. */ > +nextWakeup = INT64_MAX; > +for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i) > +nextWakeup = Min(state.wakeup[i], nextWakeup); > +nap = Max(0, (nextWakeup - now + 999) / 1000); > > And this: > > +now = GetCurrentTimestamp(); > +for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i) > +WalRcvComputeNextWakeup(&state, i, now); This did cross my mind, but I opted to avoid creating new functions because 1) they aren't likely to be reused very much, and 2) I actually think it might hurt readability by forcing developers to traipse around the file to figure out what these functions are actually doing. It's not like it would save many lines of code, either. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 9d274eecef8e66e60b34d14c24459e70e782cf86 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Oct 2022 10:31:35 -0700 Subject: [PATCH v5 1/2] Move WAL receivers' non-shared state to a new struct. This is preparatory work for a follow-up change that will revamp the wakeup mechanism for periodic tasks that WAL receivers must perform. 
--- src/backend/replication/walreceiver.c | 90 ++- 1 file changed, 48 insertions(+), 42 deletions(-) diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 6cbb67c92a..89985c54cf 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -116,6 +116,14 @@ static struct XLogRecPtr Flush; /* last byte + 1 flushed in the standby */ } LogstreamResult; +/* + * A struct to keep track of non-shared state. + */ +typedef struct WalRcvInfo +{ + TimeLineID startpointTLI; +} WalRcvInfo; + static StringInfoData reply_message; static StringInfoData incoming_message; @@ -123,12 +131,12 @@ static StringInfoData incoming_message; static void WalRcvFetchTimeLineHistoryFiles(TimeLineID first, TimeLineID last); static void WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI); static void WalRcvDie(int code, Datum arg); -static void XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len, - TimeLineID tli); -static void XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr, - TimeLineID tli); -static void XLogWalRcvFlush(bool dying, TimeLineID tli); -static void XLogWalRcvClose(XLogRecPtr recptr, TimeLineID tli); +static void XLogWalRcvProcessMsg(WalRcvInfo *state, unsigned char type, + char *buf, Size len); +static void XLogWalRcvWrite(WalRcvInfo *state, char *buf, Size nbytes, + XLogRecPtr recptr); +static void XLogWalRcvFlush(WalRcvInfo *state, bool dying); +static void XLogWalRcvClose(WalRc
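For readers following the units discussion above: TimestampTz values are in microseconds, wal_receiver_timeout is in milliseconds, and wal_receiver_status_interval is in seconds, so the wakeup computations are conversions of roughly this shape (a sketch; the enum values and function name follow the shape of the patch but are assumptions here):

/* Sketch: compute the next wakeup time for one periodic task. */
static void
WalRcvComputeNextWakeup(WalRcvInfo *state, int reason, TimestampTz now)
{
    switch (reason)
    {
        case WALRCV_WAKEUP_TERMINATE:
            /* wal_receiver_timeout is in ms; 'now' is in microseconds */
            state->wakeup[reason] = now + wal_receiver_timeout * INT64CONST(1000);
            break;
        case WALRCV_WAKEUP_PING:
            /* ping the sender once half of wal_receiver_timeout has elapsed */
            state->wakeup[reason] = now + wal_receiver_timeout / 2 * INT64CONST(1000);
            break;
        case WALRCV_WAKEUP_REPLY:
            /* wal_receiver_status_interval is in seconds */
            state->wakeup[reason] = now + wal_receiver_status_interval * INT64CONST(1000000);
            break;
    }
}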
Re: [patch] \g with multiple result sets and \watch with copy queries
> > This is a bit more complicated than the usual tests, but not > that much. > Any opinions on this? +1 I think that because it is more complicated than typical psql usage, we may want to comment on the intention of the tests and some of the less-than-common psql elements (\set concatenation, resetting \o, etc). If you see value in that I can amend the patch. Are there any options on COPY (header, formats) that we think we should test as well?
Re: Suppressing useless wakeups in walreceiver
On Mon, Oct 10, 2022 at 10:51:14AM -0700, Nathan Bossart wrote: >> +/* Find the soonest wakeup time, to limit our nap. */ >> +nextWakeup = INT64_MAX; >> +for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i) >> +nextWakeup = Min(state.wakeup[i], nextWakeup); >> +nap = Max(0, (nextWakeup - now + 999) / 1000); Hm. We should probably be more cautious here since nextWakeup is an int64 and nap is an int. My guess is that we should just set nap to -1 if nextWakeup > INT_MAX. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
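Concretely, the guard might look like this (one possible sketch):

/* Sketch: nextWakeup is in microseconds and is an int64; the latch
 * timeout argument is an int number of milliseconds, so clamp. */
int64       nextWakeup = PG_INT64_MAX;
int64       napMs;

/* Find the soonest wakeup time, to limit our nap. */
for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
    nextWakeup = Min(state.wakeup[i], nextWakeup);

/* convert microseconds to milliseconds, rounding up, then clamp */
napMs = (nextWakeup - now + 999) / 1000;
napMs = Max(napMs, 0);
napMs = Min(napMs, (int64) INT_MAX);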
Re: [patch] \g with multiple result sets and \watch with copy queries
Bonjour Daniel, Good catch! Thanks for the quick fix! As usual, what is not tested does not work :-( Attached a tap test to check for the expected behavior with multiple command \g. -- Fabien.
diff --git a/src/bin/psql/t/001_basic.pl b/src/bin/psql/t/001_basic.pl index f447845717..c81feadd4e 100644 --- a/src/bin/psql/t/001_basic.pl +++ b/src/bin/psql/t/001_basic.pl @@ -325,4 +325,22 @@ is($row_count, '10', 'client-side error commits transaction, no ON_ERROR_STOP and multiple -c switches' ); +# test \g +psql_like($node, "SELECT 'un' \\g g_file_1.out", qr//, "one command \\g"); +my $c1 = slurp_file("g_file_1.out"); +like($c1, qr/un/); +unlink "g_file_1.out" or die $!; + +psql_like($node, "SELECT 'deux' \\; SELECT 'trois' \\g g_file_2.out", qr//, "two commands \\g"); +my $c2 = slurp_file("g_file_2.out"); +like($c2, qr/deux.*trois/s); +unlink "g_file_2.out" or die $!; + +psql_like($node, "\\set SHOW_ALL_RESULTS 0\nSELECT 'quatre' \\; SELECT 'cinq' \\g g_file_3.out", qr//, + "two commands \\g with only last result"); +my $c3 = slurp_file("g_file_3.out"); +like($c3, qr/cinq/); +unlike($c3, qr/quatre/); +unlink "g_file_3.out" or die $!; + done_testing();
Re: subtransaction performance
On Fri, Oct 7, 2022 at 03:23:27PM -0700, Zhihong Yu wrote: > Hi, > I stumbled over: > > https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/ > > I wonder if SAVEPOINT / subtransaction performance has been boosted since the > blog was written. No, I have not seen any changes in this area since then. Seems there are two problems --- the 64-entry subtransaction cache per session and the 64k limit on the replica. In both cases, it seems sizing is not optimal, but sizing is never optimal. I guess we can look at allowing manual size adjustment, automatic size adjustment, or a different approach that is more graceful for larger savepoint workloads. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: list of acknowledgments for PG15
On Mon, Oct 10, 2022 at 02:41:06AM -0400, Tom Lane wrote: > I don't wish to object to adding Nakamori-san here, but I feel like we > need a policy that doesn't require last-minute updates to release notes. > > As far as I've understood, the idea is to credit people based on the > time frame in which their patches were committed, not on the branch(es) > that the patches were committed to. Otherwise we'd have to retroactively > add people to back-branch acknowledgements, and we have not been doing > that. So a patch that goes in during the v16 development cycle means > that the author should get acknowledged in the v16 release notes, > even if it got back-patched to older branches. What remains is to > define when is the cutoff point between "acknowledge in v15" versus > "acknowledge in v16". I don't have a strong opinion about that, > but I'd like it to be more than 24 hours before the 15.0 wrap. > Could we make the cutoff be, say, beta1? Is the issue that we are really only crediting people whose commits/work appears in major releases, and not in minor ones? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: list of acknowledgments for PG15
Bruce Momjian writes: > On Mon, Oct 10, 2022 at 02:41:06AM -0400, Tom Lane wrote: >> As far as I've understood, the idea is to credit people based on the >> time frame in which their patches were committed, not on the branch(es) >> that the patches were committed to. > Is the issue that we are really only crediting people whose commits/work > appears in major releases, and not in minor ones? What Peter has said about this is that he lists everyone whose name has appeared in commit messages over thus-and-such a time frame. So it doesn't matter which branch is involved, just when the contribution was made. That process is fine with me; I'm just seeking a bit more clarity as to what the time frames are. regards, tom lane
Re: list of acknowledgments for PG15
On Mon, Oct 10, 2022 at 02:44:22PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Mon, Oct 10, 2022 at 02:41:06AM -0400, Tom Lane wrote: > >> As far as I've understood, the idea is to credit people based on the > >> time frame in which their patches were committed, not on the branch(es) > >> that the patches were committed to. > > > Is the issue that we are really only crediting people whose commits/work > > appears in major releases, and not in minor ones? > > What Peter has said about this is that he lists everyone whose name > has appeared in commit messages over thus-and-such a time frame. > So it doesn't matter which branch is involved, just when the contribution > was made. That process is fine with me; I'm just seeking a bit more > clarity as to what the time frames are. Oh, that's an interesting approach but it might mean that, for example, PG 16 patch authors appear in the PG 15 major release notes. It seems that the stable major release branch date should be the cut-off, so no one is missed. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
I've gone ahead and implemented option 1 (commented below). On Thu, Oct 6, 2022 at 6:23 PM Melanie Plageman wrote: > > v31 failed in CI, so > I've attached v32 which has a few issues fixed: > - addressed some compiler warnings I hadn't noticed locally > - autovac launcher and worker do indeed use bulkread strategy if they > end up starting before critical indexes have loaded and end up doing a > sequential scan of some catalog tables, so I have changed the > restrictions on BackendTypes allowed to track IO Operations in > IOCONTEXT_BULKREAD > - changed the name of the column "fsynced" to "files_synced" to make it > more clear what unit it is in (and that the unit differs from that of > the "unit" column) > > In an off-list discussion with Andres, he mentioned that he thought > buffers reused by a BufferAccessStrategy should be split from buffers > "acquired" and that "acquired" should be renamed "clocksweeps". > > I have started doing this, but for BufferAccessStrategy IO there are a > few choices about how we want to count the clocksweeps: > > Currently the following situations are counted under the following > IOContexts and IOOps: > > IOCONTEXT_[VACUUM,BULKREAD,BULKWRITE], IOOP_ACQUIRE > - reuse a buffer from the ring > > IOCONTEXT_SHARED, IOOP_ACQUIRE > - add a buffer to the strategy ring initially > - add a new shared buffer to the ring when all the existing buffers in > the ring are pinned > > And in the new paradigm, I think these are two good options: > > 1) > IOCONTEXT_[VACUUM,BULKREAD,BULKWRITE], IOOP_CLOCKSWEEP > - add a buffer to the strategy ring initially > - add a new shared buffer to the ring when all the existing buffers in > the ring are pinned > > IOCONTEXT_[VACUUM,BULKREAD,BULKWRITE], IOOP_REUSE > - reuse a buffer from the ring > I've implemented this option in attached v33. > 2) > IOCONTEXT_[VACUUM,BULKREAD,BULKWRITE], IOOP_CLOCKSWEEP > - add a buffer to the strategy ring initially > > IOCONTEXT_[VACUUM,BULKREAD,BULKWRITE], IOOP_REUSE > - reuse a buffer from the ring > > IOCONTEXT SHARED, IOOP_CLOCKSWEEP > - add a new shared buffer to the ring when all the existing buffers in > the ring are pinned - Melanie From 6a83f0028a69a56243fa5b036299185766b80629 Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Thu, 6 Oct 2022 12:23:38 -0400 Subject: [PATCH v33 2/4] Aggregate IO operation stats per BackendType Stats on IOOps for all IOContexts for a backend are tracked locally. Add functionality for backends to flush these stats to shared memory and accumulate them with those from all other backends, exited and live. Also add reset and snapshot functions used by cumulative stats system for management of these statistics. The aggregated stats in shared memory could be extended in the future with per-backend stats -- useful for per connection IO statistics and monitoring. Some BackendTypes will not flush their pending statistics at regular intervals and explicitly call pgstat_flush_io_ops() during the course of normal operations to flush their backend-local IO operation statistics to shared memory in a timely manner. Because not all BackendType, IOOp, IOContext combinations are valid, the validity of the stats is checked before flushing pending stats and before reading in the existing stats file to shared memory. 
Author: Melanie Plageman Reviewed-by: Andres Freund Reviewed-by: Justin Pryzby Reviewed-by: Kyotaro Horiguchi Reviewed-by: Maciek Sakrejda Reviewed-by: Lukas Fittl Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de --- doc/src/sgml/monitoring.sgml | 2 + src/backend/utils/activity/pgstat.c | 35 src/backend/utils/activity/pgstat_bgwriter.c | 7 +- .../utils/activity/pgstat_checkpointer.c | 7 +- src/backend/utils/activity/pgstat_io_ops.c| 161 ++ src/backend/utils/activity/pgstat_relation.c | 15 +- src/backend/utils/activity/pgstat_shmem.c | 4 + src/backend/utils/activity/pgstat_wal.c | 4 +- src/backend/utils/adt/pgstatfuncs.c | 4 +- src/include/miscadmin.h | 2 + src/include/pgstat.h | 84 + src/include/utils/pgstat_internal.h | 36 src/tools/pgindent/typedefs.list | 3 + 13 files changed, 358 insertions(+), 6 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 342b20ebeb..14dfd650f8 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5360,6 +5360,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i the pg_stat_bgwriter view, archiver to reset all the counters shown in the pg_stat_archiver view, +io to reset all the counters shown in the +pg_stat_io view, wal to reset all the counters shown in the pg_stat_wal view or recovery_prefetch to re
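To spell out where option 1's counts would come from, the accounting sites in the buffer access strategy code would look roughly like this (a sketch; the helper and symbol names here are assumptions, not the patch's actual code):

/* Sketch of option 1; names are assumptions. */
if (strategy_buf != InvalidBuffer && BufferIsReusable(strategy_buf))
{
    /* reuse a buffer already in the strategy ring */
    pgstat_count_io_op(IOOP_REUSE, io_context);
}
else
{
    /*
     * Empty ring slot, or all ring buffers pinned: take a shared buffer
     * via the clock sweep.  Under option 1 this is still attributed to
     * the strategy IOContext (IOCONTEXT_VACUUM etc.), not IOCONTEXT_SHARED.
     */
    pgstat_count_io_op(IOOP_CLOCKSWEEP, io_context);
}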
Re: Simplifying our Trap/Assert infrastructure
Nathan Bossart writes: > On Sun, Oct 09, 2022 at 05:08:39PM -0400, Tom Lane wrote: >> Something I thought about but forgot to mention in the initial email: >> is it worth sprinkling these macros with "unlikely()"? > I don't see why not. I experimented with that, and found something that surprised me: there's a noticeable code-bloat effect. With the patch as given, $ size src/backend/postgres text data bss dec hex filename 9001199 86280 204496 9291975 8dc8c7 src/backend/postgres but with unlikely(), $ size src/backend/postgres text data bss dec hex filename 9035423 86280 204496 9326199 8e4e77 src/backend/postgres I don't quite understand why that's happening, but it seems to show that this requires some investigation of its own. So for now I just pushed the patch as-is. regards, tom lane
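For context, the variant being measured amounts to roughly this (a sketch of the assertion macro with unlikely() added; the real macro has a few more wrinkles):

#define Assert(condition) \
    do { \
        if (unlikely(!(condition))) \
            ExceptionalCondition(#condition, "FailedAssertion", \
                                 __FILE__, __LINE__); \
    } while (0)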
Re: Reducing the chunk header sizes on all memory context types
I wrote: > What I am mainly wondering about at this point is whether Asserts > are good enough or we should use actual test-and-elog checks for > these things. Hearing no comments on that, I decided that a good policy would be to use Asserts in the paths dealing with small chunks but test-and-elog in the paths dealing with large chunks. We already had test-and-elog sanity checks in the latter paths, at least in aset.c, which the new checks can reasonably be combined with. It's unlikely that the large-chunk case is at all performance-critical, too. But adding runtime checks in the small-chunk paths would probably risk losing some performance in production builds, and I'm not currently prepared to argue that the problem is big enough to justify that. Hence v2 attached, which cleans things up a tad in aset.c and then extends similar policy to generation.c and slab.c. Of note is that slab.c was doing things like this: SlabContext *slab = castNode(SlabContext, context); Assert(slab); which has about the same effect as what I'm proposing with AllocSetIsValid, but (a) it's randomly different from the other allocators, and (b) it's probably a hair less efficient, because I doubt the compiler can optimize away castNode's special handling of NULL. So I made these bits follow the style of aset.c. regards, tom lane diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index ec423375ae..db402e3a41 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -132,6 +132,10 @@ typedef struct AllocFreeListLink #define GetFreeListLink(chkptr) \ (AllocFreeListLink *) ((char *) (chkptr) + ALLOC_CHUNKHDRSZ) +/* Validate a freelist index retrieved from a chunk header */ +#define FreeListIdxIsValid(fidx) \ + ((fidx) >= 0 && (fidx) < ALLOCSET_NUM_FREELISTS) + /* Determine the size of the chunk based on the freelist index */ #define GetChunkSizeFromFreeListIdx(fidx) \ ((((Size) 1) << ALLOC_MINBITS) << (fidx)) @@ -202,7 +206,15 @@ typedef struct AllocBlockData * AllocSetIsValid * True iff set is valid allocation set. */ -#define AllocSetIsValid(set) PointerIsValid(set) +#define AllocSetIsValid(set) \ + (PointerIsValid(set) && IsA(set, AllocSetContext)) + +/* + * AllocBlockIsValid + * True iff block is valid block of allocation set. + */ +#define AllocBlockIsValid(block) \ + (PointerIsValid(block) && AllocSetIsValid((block)->aset)) /* * We always store external chunks on a dedicated block.
This makes fetching @@ -530,8 +542,7 @@ AllocSetReset(MemoryContext context) { AllocSet set = (AllocSet) context; AllocBlock block; - Size keepersize PG_USED_FOR_ASSERTS_ONLY - = set->keeper->endptr - ((char *) set); + Size keepersize PG_USED_FOR_ASSERTS_ONLY; AssertArg(AllocSetIsValid(set)); @@ -540,6 +551,9 @@ AllocSetReset(MemoryContext context) AllocSetCheck(context); #endif + /* Remember keeper block size for Assert below */ + keepersize = set->keeper->endptr - ((char *) set); + /* Clear chunk freelists */ MemSetAligned(set->freelist, 0, sizeof(set->freelist)); @@ -598,8 +612,7 @@ AllocSetDelete(MemoryContext context) { AllocSet set = (AllocSet) context; AllocBlock block = set->blocks; - Size keepersize PG_USED_FOR_ASSERTS_ONLY - = set->keeper->endptr - ((char *) set); + Size keepersize PG_USED_FOR_ASSERTS_ONLY; AssertArg(AllocSetIsValid(set)); @@ -608,6 +621,9 @@ AllocSetDelete(MemoryContext context) AllocSetCheck(context); #endif + /* Remember keeper block size for Assert below */ + keepersize = set->keeper->endptr - ((char *) set); + /* * If the context is a candidate for a freelist, put it into that freelist * instead of destroying it. @@ -994,9 +1010,16 @@ AllocSetFree(void *pointer) if (MemoryChunkIsExternal(chunk)) { - + /* Release single-chunk block. */ AllocBlock block = ExternalChunkGetBlock(chunk); + /* + * Try to verify that we have a sane block pointer: the block header + * should reference an aset and the freeptr should match the endptr. + */ + if (!AllocBlockIsValid(block) || block->freeptr != block->endptr) + elog(ERROR, "could not find block containing chunk %p", chunk); + set = block->aset; #ifdef MEMORY_CONTEXT_CHECKING @@ -1011,14 +1034,6 @@ AllocSetFree(void *pointer) } #endif - - /* - * Try to verify that we have a sane block pointer, the freeptr should - * match the endptr. - */ - if (block->freeptr != block->endptr) - elog(ERROR, "could not find block containing chunk %p", chunk); - /* OK, remove block from aset's list and free it */ if (block->prev) block->prev->next = block->next; @@ -1036,12 +1051,23 @@ AllocSetFree(void *pointer) } else { - int fidx = MemoryChunkGetValue(chunk); AllocBlock block = MemoryChunkGetBlock(chunk); - AllocFreeListLink *link = GetFreeListLink(chunk); + int fidx; + AllocFreeListLink *link; + /* + * In this path, for speed reasons we just Assert that t
Re: Reducing the chunk header sizes on all memory context types
On Tue, 11 Oct 2022 at 08:35, Tom Lane wrote: > Hearing no comments on that, I decided that a good policy would be > to use Asserts in the paths dealing with small chunks but test-and-elog > in the paths dealing with large chunks. This seems like a good policy. I think it's good to get at least the Asserts in there. If we have any troubles in the future then we can revisit this and reconsider if we need to elog them instead. > Hence v2 attached, which cleans things up a tad in aset.c and then > extends similar policy to generation.c and slab.c. Looking at your changes to SlabFree(), I don't really think that change is well aligned to the newly proposed policy. My understanding of the rationale behind this policy is that large chunks get malloced and will be slower anyway, so the elog(ERROR) is less overhead for those. In SlabFree(), we're most likely not doing any free()s, so I don't quite understand why you've added the elog rather than the Assert for this case. The slab allocator *should* be very fast. I don't have any issue with any of the other changes. David
Re: Reducing the chunk header sizes on all memory context types
David Rowley writes: > Looking at your changes to SlabFree(), I don't really think that > change is well aligned to the newly proposed policy. My understanding > of the rationale behind this policy is that large chunks get malloced > and will be slower anyway, so the elog(ERROR) is less overhead for > those. In SlabFree(), we're most likely not doing any free()s, so I > don't quite understand why you've added the elog rather than the > Assert for this case. The slab allocator *should* be very fast. Yeah, slab.c hasn't any distinction between large and small chunks, so we have to just pick one policy or the other. I'd hoped to get away with the more robust runtime test on the basis that slab allocation is not used so much that this'd result in any noticeable performance change. SlabRealloc, at least, is not used *at all* per the code coverage tests, and if we're there at all we should be highly suspicious that something is wrong. However, I could be wrong about SlabFree, and if you're going to hold my feet to the fire then I'll change it rather than try to produce some performance evidence. regards, tom lane
Re: problems with making relfilenodes 56-bits
Hi, On 2022-10-10 08:10:22 -0400, Robert Haas wrote: > On Mon, Oct 10, 2022 at 5:16 AM Dilip Kumar wrote: > > I have also executed my original test after applying these patches on > > top of the 56 bit relfilenode patch. So earlier we saw the WAL size > > increased by 11% (66199.09375 kB to 73906.984375 kB) and after this > > patch now the WAL generated is 58179.2265625. That means in this > > particular example this patch is reducing the WAL size by 12% even > > with the 56 bit relfilenode patch. > > That's a very promising result, but the question in my mind is how > much work would be required to bring this patch to a committable > state? The biggest part clearly is to review the variable width integer patch. It's not a large amount of code, but probably more complicated than average. One complication there is that currently the patch assumes: * Note that this function, for efficiency, reads 8 bytes, even if the * variable integer is less than 8 bytes long. The buffer has to be * allocated sufficiently large to account for that fact. The maximum * amount of memory read is 9 bytes. We could make a less efficient version without that assumption, but I think it might be easier to just guarantee it in the xlog*.c case. Using it in xloginsert.c is pretty darn simple, code-wise. xlogreader is a bit harder, although not for intrinsic reasons - the logic underlying COPY_HEADER_FIELD seems unnecessarily complicated to me. The minimal solution would likely be to just wrap the varint reads in another weird macro. Leaving the code issues themselves aside, one important thing would be to evaluate what the performance impacts of the varint encoding/decoding are as part of a "full" server. I suspect it'll vanish in the noise, but we'd need to validate that. Greetings, Andres Freund
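For readers unfamiliar with the idea, here is a minimal variable-width integer sketch in the LEB128 style. Note this is only illustrative: the patch's actual encoding differs, since it is designed so the decoder can do the fixed-width 8-byte read described above.

/* A minimal varint sketch (LEB128-style); not the patch's encoding. */
static int
varint_encode_uint64(uint64 value, uint8 *buf)
{
    int     len = 0;

    do
    {
        uint8   byte = value & 0x7F;

        value >>= 7;
        if (value != 0)
            byte |= 0x80;       /* continuation bit */
        buf[len++] = byte;
    } while (value != 0);

    return len;                 /* 1..10 bytes for a uint64 */
}

static uint64
varint_decode_uint64(const uint8 *buf, int *len)
{
    uint64  value = 0;
    int     shift = 0;
    int     i = 0;

    for (;;)
    {
        uint8   byte = buf[i++];

        value |= (uint64) (byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            break;
        shift += 7;
    }
    *len = i;
    return value;
}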
Re: meson PGXS compatibility
Hi, On 2022-10-05 13:07:10 -0700, Andres Freund wrote: > 0004: solaris: Check for -Wl,-E directly instead of checking for gnu LD > > This allows us to get rid of the nontrivial detection of with_gnu_ld, > simplifying meson PGXS compatibility. It's also nice to delete libtool.m4. > > I don't like the SOLARIS_EXPORT_DYNAMIC variable I invented. If somebody has > a better idea... A cleaner approach could be to add a LDFLAGS_BE and emit the -Wl,-E into it from both configure and meson, solely based on whether -Wl,-E is supported. I haven't verified cygwin, but on our other platforms that seems to do the right thing. Greetings, Andres Freund
Re: [PATCH] Fix build with LLVM 15 or above
On Tue, Oct 4, 2022 at 10:45 AM Zhihong Yu wrote: > On Mon, Oct 3, 2022 at 2:41 PM Andres Freund wrote: >> On 2022-10-03 12:16:12 -0700, Andres Freund wrote: >> > I haven't yet run through the whole regression test with an assert enabled >> > llvm because an assert-enabled llvm is *SLOW*, but it got through the first >> > few parallel groups ok. Using an optimized llvm 15, all tests pass with >> > PGOPTIONS=-cjit_above_cost=0. +/* + * When targetting an llvm version with opaque pointers enabled by + * default, turn them off for the context we build our code in. Don't need + * to do so for other contexts (e.g. llvm_ts_context) - once the IR is + * generated, it carries the necessary information. + */ +#if LLVM_VERSION_MAJOR > 14 +LLVMContextSetOpaquePointers(LLVMGetGlobalContext(), false); +#endif Ahh, right, thanks! >> That did pass. But to be able to use clang >= 15 one more piece is >> needed. Updated patch attached. + bitcode_cflags += ['-Xclang', '-no-opaque-pointers'] Oh, right. That makes sense. > I think `targetting` should be spelled as targeting Yeah. OK, I'll wait for the dust to settle on our 15 release and then back-patch this. Then I'll keep working on the opaque pointer support for master, which LLVM 16 will need (I expect we'll eventually want to back-patch that too, but first things first...).
Re: Reducing the chunk header sizes on all memory context types
On Tue, 11 Oct 2022 at 10:07, Tom Lane wrote: > Yeah, slab.c hasn't any distinction between large and small chunks, > so we have to just pick one policy or the other. I'd hoped to get > away with the more robust runtime test on the basis that slab allocation > is not used so much that this'd result in any noticeable performance > change. SlabRealloc, at least, is not used *at all* per the code > coverage tests, and if we're there at all we should be highly suspicious > that something is wrong. However, I could be wrong about SlabFree, > and if you're going to hold my feet to the fire then I'll change it > rather than try to produce some performance evidence. The main reason I brought it up was that only yesterday I was looking into fixing the slowness of the Slab allocator. It's currently quite far behind the performance of both generation.c and aset.c and it would be very nice to bring it up to at least be on-par with those. Ideally there would be some performance advantages of the fixed-sized chunks. I'd just rather not have any additional things go in to make that goal harder to reach. The proposed patches in [1] do aim to make additional usages of the slab allocator, and I have a feeling that we'll want to fix the performance of slab.c before those. Perhaps the Asserts are a better option if we're to get the proposed radix tree implementation. David [1] https://postgr.es/m/cad21aod3w76wers_lq7_ua6+gtaooerpji+yz8ac6aui4jw...@mail.gmail.com
Re: Reducing the chunk header sizes on all memory context types
David Rowley writes: > The main reason I brought it up was that only yesterday I was looking > into fixing the slowness of the Slab allocator. It's currently quite > far behind the performance of both generation.c and aset.c and it > would be very nice to bring it up to at least be on-par with those. Really!? That's pretty sad, because surely it should be handling a simpler case. Anyway, I'm about to push this with an Assert in SlabFree and run-time test in SlabRealloc. That should be enough to assuage my safety concerns, and then we can think about better performance. regards, tom lane
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Thanks for working on this! Like Lukas, I'm excited to see more visibility into important parts of the system like this. On Mon, Oct 10, 2022 at 11:49 AM Melanie Plageman wrote: > > I've gone ahead and implemented option 1 (commented below). No strong opinion on 1 versus 2, though at least partly that's because I don't understand the implications (I do understand the difference, just not when it might be important in terms of stats). Can we think of a situation where combining stats about initial additions with pinned additions hides some behavior that might be good to understand and hard to pinpoint otherwise? I took a look at the latest docs (as someone mostly familiar with internals at only a pretty high level, so probably somewhat close to the target audience) and have some feedback. + + + backend_type text + + + Type of backend (e.g. background worker, autovacuum worker). + + Not critical, but is there a list of backend types we could cross-reference elsewhere in the docs? From the io_context column description: + The autovacuum daemon, explicit VACUUM, explicit + ANALYZE, many bulk reads, and many bulk writes use a + fixed amount of memory, acquiring the equivalent number of shared + buffers and reusing them circularly to avoid occupying an undue portion + of the main shared buffer pool. + I don't understand how this is relevant to the io_context column. Could you expand on that, or am I just missing something obvious? + + + extended bigint + + + Extends of relations done by this backend_type in + order to write data in this io_context. + + I understand what this is, but not why this is something I might want to know about. And from your earlier e-mail: On Thu, Oct 6, 2022 at 10:42 AM Melanie Plageman wrote: > > Because we want to add non-block-oriented IO in the future (like > temporary file IO) to this view and want to use the same "read", > "written", "extended" columns, I would prefer not to prefix the columns > with "blks_". I have added a column "unit" which would contain the unit > in which read, written, and extended are in. Unfortunately, fsyncs are > not per block, so "unit" doesn't really work for this. I documented > this. > > The most correct thing to do to accommodate block-oriented and > non-block-oriented IO would be to specify all the values in bytes. > However, I would like this view to be usable visually (as opposed to > just in scripts and by tools). The only current value of unit is > "block_size" which could potentially be combined with the value of the > GUC to get bytes. > > I've hard-coded the string "block_size" into the view generation > function pg_stat_get_io(), so, if this idea makes sense, perhaps I > should do something better there. That seems broadly reasonable, but pg_settings also has a 'unit' field, and in that view, unit is '8kB' on my system--i.e., it (presumably) reflects the block size. Is that something we should try to be consistent with (not sure if that's a good idea, but thought it was worth asking)? > On Fri, Sep 30, 2022 at 7:18 PM Lukas Fittl wrote: > > - Overall it would be helpful if we had a dedicated documentation page on > > I/O statistics that's linked from the pg_stat_io view description, and > > explains how the I/O statistics tie into the various concepts of shared > > buffers / buffer access strategies / etc (and what is not tracked today) > > I haven't done this yet. How specific were you thinking -- like > interpretations of all the combinations and what to do with what you > see? Like you should run pg_prewarm if you see X?
Specific checkpointer > or bgwriter GUCs to change? Or just links to other docs pages on > recommended tunings? > > Were you imagining the other IO statistics views (like > pg_statio_all_tables and pg_stat_database) also being included in this > page? Like would it be a comprehensive guide to IO statistics and what > their significance/purposes are? I can't speak for Lukas here, but I encouraged him to suggest more thorough documentation in general, so I can speak to my concerns: in general, these stats should be usable for someone who does not know much about Postgres internals. It's pretty low-level information, sure, so I think you need some understanding of how the system broadly works to make sense of it. But ideally you should be able to find what you need to understand the concepts involved within the docs. I think your updated docs are much clearer (with the caveats of my specific comments above). It would still probably be helpful to have a dedicated page on I/O stats (and yeah, something with a broad scope, along the lines of a comprehensive guide), but I think that can wait until a future patch. Thanks, Maciek
autovacuum_freeze_max_age reloption seems broken
The autovacuum_freeze_max_age reloption exists so that the DBA can optionally have antiwraparound autovacuums run against a table that requires more frequent antiwraparound autovacuums. This has problems because there are actually two types of VACUUM right now (aggressive and non-aggressive), which, strictly speaking, is an independent condition of antiwraparound-ness. There is a tacit assumption within autovacuum.c that all antiwraparound autovacuums are also aggressive, I think. But that just isn't true, which leads to clearly broken behavior when the autovacuum_freeze_max_age reloption is in use. Note that the VacuumParams state that gets passed down to vacuum_set_xid_limits() does not include anything about the "reloption version" of autovacuum_freeze_max_age. So quite naturally vacuum_set_xid_limits() can only work off of the autovacuum_freeze_max_age GUC, even when the reloption happens to have been used over in autovacuum.c. In practice this means that we can easily see autovacuum spin uselessly when the reloption is in use -- it'll launch antiwraparound autovacuums that never advance relfrozenxid and so never address the relfrozenxid age issue from the point of view of autovacuum.c. There is no reason to think that the user will also (say) set the autovacuum_freeze_table_age reloption separately (not to be confused with the vacuum_freeze_table_age GUC!). We'll usually just work off the GUC (I mean why wouldn't we?). I don't see why vacuumlazy.c doesn't just force aggressive mode whenever it sees an antiwraparound autovacuum, no matter what. Recall the problem scenario that led to bugfix commit dd9ac7d5 -- that also could have been avoided by making sure that every antiwraparound autovacuum was aggressive (actually the original problem was that we'd suppress non-aggressive antiwraparound autovacuums as redundant). I only noticed this problem because I am in the process of writing a patch series that demotes vacuum_freeze_table_age to a mere compatibility option (and even gets rid of the whole concept of aggressive VACUUM). The whole way that vacuum_freeze_table_age and autovacuum_freeze_max_age are supposed to work together seems very confusing to me. I'm not surprised that this was overlooked for so long. -- Peter Geoghegan
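The vacuumlazy.c change implied here is essentially a one-liner (a sketch; exact placement inside heap_vacuum_rel() is approximate):

/* Sketch: after "aggressive" has been computed from
 * vacuum_freeze_table_age and friends, force it for any
 * antiwraparound autovacuum. */
if (params->is_wraparound)
    aggressive = true;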
Re: Adding Support for Copy callback functionality on COPY TO api
On Mon, Oct 10, 2022 at 09:38:59AM -0700, Nathan Bossart wrote: > This new callback allows the use of COPY TO's machinery in extensions. A > couple of generic use-cases are listed upthread [0], and one concrete > use-case is the aws_s3 extension [1]. FWIW, I understand that the proposal is to have easier control of how, what and where the data is processed. COPY TO PROGRAM provides that with exactly the same kind of interface (data input, its length) once you have a program able to process the data piped out the same way. However, it is in the shape of an external process that receives the data through a pipe, hence it provides a much wider attack surface, which is something that all cloud providers care about. The thing is that this allows extension developers to avoid running arbitrary commands on the backend as the OS user running the Postgres instance, while still being able to process the data the way they want (auditing, analytics, whatever) within the strict context of the process running the extension's code. I'd say that this is a very cheap change to allow people to have more fun with the backend engine (similar to the recent changes with archive libraries for archive_command, but much less complex): src/backend/commands/copy.c | 2 +- src/backend/commands/copyto.c | 18 +++--- 2 files changed, 16 insertions(+), 4 deletions(-) (Not to mention that we've had our share of CVEs regarding COPY PROGRAM even if it is superuser-only). > I really doubt that this small test case is going to cause anything > approaching undue maintenance burden. I think it's important to ensure > this functionality continues to work as expected long into the future. I like these toy modules, they provide test coverage while acting as a template for new developers. I am wondering whether it should have something for the copy from callback, actually, as it is named "test_copy_callbacks" but I see no need to extend the module more than necessary in the context of this thread (logical decoding uses it, anyway). -- Michael
Re: Adding Support for Copy callback functionality on COPY TO api
On Tue, Oct 11, 2022 at 09:01:41AM +0900, Michael Paquier wrote: > I like these toy modules, they provide test coverage while acting as a > template for new developers. I am wondering whether it should have > something for the copy from callback, actually, as it is named > "test_copy_callbacks" but I see no need to extend the module more than > necessary in the context of this thread (logical decoding uses it, > anyway). Yeah, I named it that way because I figured we might want a test for the COPY FROM callback someday. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: shadow variables - pg15 edition
On Tue, 11 Oct 2022 at 06:02, Tom Lane wrote: > > Alvaro Herrera writes: > > On 2022-Oct-10, Andres Freund wrote: > >> We could, but is it really a useful thing for something fixed 6 years ago? > > > Well, for people purposefully installing using older installs of Perl > > (not me, admittedly), it does seem useful, because you get the benefit > > of checking shadow vars for the rest of the tree and still get no > > warnings if everything is clean. > > Meh --- people purposefully using old Perls are likely using old > compilers too. Let's wait and see if any devs actually complain. I can't really add much here, apart from thinking it would be a shame if some 6-year-old third-party code were to hold us back on this. I'm also keen to wait for complaints and, only if we really have to, stop using the shadow flag in just the places where that proves necessary. Aside from this issue, if anything I'd be keen to go a little further with this and upgrade to -Wshadow=local. The reason is that I noticed that the const qualifier is not classed as "compatible" with the equivalently named and typed variable without the const qualifier. ISTM that there's close to as much opportunity to mix up two variables with the same name that are const and non-const as there is with two variables having the same const qualifier. However, I'll be waiting for the dust to settle on the current flags before thinking any more about that. David
Re: Perform streaming logical transactions by background workers and parallel apply
On Fri, Oct 7, 2022 at 2:00 PM Amit Kapila wrote: > > On Fri, Oct 7, 2022 at 8:47 AM Masahiko Sawada wrote: > > > > On Thu, Oct 6, 2022 at 9:04 PM houzj.f...@fujitsu.com > > wrote: > > > > > > I think the root reason for this kind of deadlock problems is the table > > > structure difference between publisher and subscriber(similar to the > > > unique > > > difference reported earlier[1]). So, I think we'd better disallow this > > > case. For > > > example to avoid the reported problem, we could only support parallel > > > apply if > > > pubviaroot is false on publisher and replicated tables' types(relkind) > > > are the > > > same between publisher and subscriber. > > > > > > Although it might restrict some use cases, but I think it only restrict > > > the > > > cases when the partitioned table's structure is different between > > > publisher and > > > subscriber. User can still use parallel apply for cases when the table > > > structure is the same between publisher and subscriber which seems > > > acceptable > > > to me. And we can also document that the feature is expected to be used > > > for the > > > case when tables' structure are the same. Thoughts ? > > > > I'm concerned that it could be a big restriction for users. Having > > different partitioned table's structures on the publisher and the > > subscriber is quite common use cases. > > > > From the feature perspective, the root cause seems to be the fact that > > the apply worker does both receiving and applying changes. Since it > > cannot receive the subsequent messages while waiting for a lock on a > > table, the parallel apply worker also cannot move forward. If we have > > a dedicated receiver process, it can off-load the messages to the > > worker while another process waiting for a lock. So I think that > > separating receiver and apply worker could be a building block for > > parallel-apply. > > > > I think the disadvantage that comes to mind is the overhead of passing > messages between receiver and applier processes even for non-parallel > cases. Now, I don't think it is advisable to have separate handling > for non-parallel cases. The other thing is that we need to someway > deal with feedback messages which helps to move synchronous replicas > and update subscriber's progress which in turn helps to keep the > restart point updated. These messages also act as heartbeat messages > between walsender and walapply process. > > To deal with this, one idea is that we can have two connections to > walsender process, one with walreceiver and the other with walapply > process which according to me could lead to a big increase in resource > consumption and it will bring another set of complexities in the > system. Now, in this, I think we have two possibilities, (a) The first > one is that we pass all messages to the leader apply worker and then > it decides whether to execute serially or pass it to the parallel > apply worker. However, that can again deadlock in the truncate > scenario we discussed because the main apply worker won't be able to > receive new messages once it is blocked at the truncate command. (b) > The second one is walreceiver process itself takes care of passing > streaming transactions to parallel apply workers but if we do that > then walreceiver needs to wait at the transaction end to maintain > commit order which means it can also lead to deadlock in case the > truncate happens in a streaming xact. I imagined (b) but I had missed the point of preserving the commit order. 
Separating the receiver and apply worker cannot resolve this problem. > > The other alternative is that we allow walreceiver process to wait for > apply process to finish transaction and send the feedback but that > seems to be again an overhead if we have to do it even for small > transactions, especially it can delay sync replication cases. Even, if > we don't consider overhead, it can still lead to a deadlock because > walreceiver won't be able to move in the scenario we are discussing. > > About your point that having different partition structures for > publisher and subscriber, I don't know how common it will be once we > have DDL replication. Also, the default value of > publish_via_partition_root is false which doesn't seem to indicate > that this is a quite common case. So how can we consider these concurrency issues that could happen only when streaming = 'parallel'? Can we restrict some use cases to avoid the problem, or can we have a safeguard against these conflicts? We could find a new problematic scenario in the future, and if that happens, logical replication gets stuck in a way that cannot be resolved by the apply workers themselves. Regards, -- Masahiko Sawada PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: autovacuum_freeze_max_age reloption seems broken
On Mon, Oct 10, 2022 at 4:46 PM Peter Geoghegan wrote: > There is no reason to think that the user will also (say) set the > autovacuum_freeze_table_age reloption separately (not to be confused > with the vacuum_freeze_table_age GUC!). We'll usually just work off > the GUC (I mean why wouldn't we?). I don't see why vacuumlazy.c > doesn't just force aggressive mode whenever it sees an antiwraparound > autovacuum, no matter what. Actually, even forcing every antiwraparound autovacuum to use aggressive mode isn't enough to stop autovacuum.c from spinning. It might be a good start, but it still leaves the freeze_min_age issue. The only way that autovacuum.c is going to be satisfied and back off with launching antiwraparound autovacuums is if relfrozenxid is advanced, and advanced by a significant amount. But what if the autovacuum_freeze_max_age reloption happens to have been set to something that's significantly less than the value of the vacuum_freeze_min_age GUC (or the autovacuum_freeze_min_age reloption, even)? Most of the time we can rely on vacuum_set_xid_limits() making sure that the FreezeLimit cutoff (cutoff that determines which XID we'll freeze) isn't unreasonably old relative to other cutoffs. But that won't work if we're forcing an aggressive VACUUM in vacuumlazy.c. I suppose that this separate freeze_min_age issue could be fixed by teaching autovacuum.c's table_recheck_autovac() function to set freeze_min_age to something less than the current value of reloptions like autovacuum_freeze_min_age and autovacuum_freeze_table_age for the same table (when either of the table-level reloptions happened to be set). In other words, autovacuum.c could be taught to make sure that these reloption-based cutoffs have sane values relative to each other by applying roughly the same approach taken in vacuum_set_xid_limits() for the GUCs. -- Peter Geoghegan
Re: Suppressing useless wakeups in walreceiver
On Mon, Oct 10, 2022 at 11:10:14AM -0700, Nathan Bossart wrote: > On Mon, Oct 10, 2022 at 10:51:14AM -0700, Nathan Bossart wrote: >>> +/* Find the soonest wakeup time, to limit our nap. */ >>> +nextWakeup = INT64_MAX; >>> +for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i) >>> +nextWakeup = Min(state.wakeup[i], nextWakeup); >>> +nap = Max(0, (nextWakeup - now + 999) / 1000); > > Hm. We should probably be more cautious here since nextWakeup is an int64 > and nap is an int. My guess is that we should just set nap to -1 if > nextWakeup > INT_MAX. Here's an attempt at fixing that. I ended up just changing "nap" to an int64 and then ensuring it's no larger than INT_MAX in the call to WaitLatchOrSocket(). IIUC we can't use -1 here because WL_TIMEOUT is set. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From c844cf35d483bc7ab847fdc430057295d0f949c9 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Oct 2022 10:31:35 -0700 Subject: [PATCH v6 1/2] Move WAL receivers' non-shared state to a new struct. This is preparatory work for a follow-up change that will revamp the wakeup mechanism for periodic tasks that WAL receivers must perform. --- src/backend/replication/walreceiver.c | 90 ++- 1 file changed, 48 insertions(+), 42 deletions(-) diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 6cbb67c92a..89985c54cf 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -116,6 +116,14 @@ static struct XLogRecPtr Flush; /* last byte + 1 flushed in the standby */ } LogstreamResult; +/* + * A struct to keep track of non-shared state. + */ +typedef struct WalRcvInfo +{ + TimeLineID startpointTLI; +} WalRcvInfo; + static StringInfoData reply_message; static StringInfoData incoming_message; @@ -123,12 +131,12 @@ static StringInfoData incoming_message; static void WalRcvFetchTimeLineHistoryFiles(TimeLineID first, TimeLineID last); static void WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI); static void WalRcvDie(int code, Datum arg); -static void XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len, - TimeLineID tli); -static void XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr, - TimeLineID tli); -static void XLogWalRcvFlush(bool dying, TimeLineID tli); -static void XLogWalRcvClose(XLogRecPtr recptr, TimeLineID tli); +static void XLogWalRcvProcessMsg(WalRcvInfo *state, unsigned char type, + char *buf, Size len); +static void XLogWalRcvWrite(WalRcvInfo *state, char *buf, Size nbytes, + XLogRecPtr recptr); +static void XLogWalRcvFlush(WalRcvInfo *state, bool dying); +static void XLogWalRcvClose(WalRcvInfo *state, XLogRecPtr recptr); static void XLogWalRcvSendReply(bool force, bool requestReply); static void XLogWalRcvSendHSFeedback(bool immed); static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime); @@ -175,7 +183,6 @@ WalReceiverMain(void) char slotname[NAMEDATALEN]; bool is_temp_slot; XLogRecPtr startpoint; - TimeLineID startpointTLI; TimeLineID primaryTLI; bool first_stream; WalRcvData *walrcv = WalRcv; @@ -185,6 +192,7 @@ WalReceiverMain(void) char *err; char *sender_host = NULL; int sender_port = 0; + WalRcvInfo state = {0}; /* * WalRcv should be set up already (if we are a backend, we inherit this @@ -238,7 +246,7 @@ WalReceiverMain(void) strlcpy(slotname, (char *) walrcv->slotname, NAMEDATALEN); is_temp_slot = walrcv->is_temp_slot; startpoint = walrcv->receiveStart; - startpointTLI = walrcv->receiveStartTLI; + state.startpointTLI = 
walrcv->receiveStartTLI; /* * At most one of is_temp_slot and slotname can be set; otherwise, @@ -258,7 +266,7 @@ WalReceiverMain(void) pg_atomic_write_u64(&WalRcv->writtenUpto, 0); /* Arrange to clean up at walreceiver exit */ - on_shmem_exit(WalRcvDie, PointerGetDatum(&startpointTLI)); + on_shmem_exit(WalRcvDie, PointerGetDatum(&state)); /* Properly accept or ignore signals the postmaster might send us */ pqsignal(SIGHUP, SignalHandlerForConfigReload); /* set flag to read config @@ -345,11 +353,11 @@ WalReceiverMain(void) * Confirm that the current timeline of the primary is the same or * ahead of ours. */ - if (primaryTLI < startpointTLI) + if (primaryTLI < state.startpointTLI) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("highest timeline %u of the primary is behind recovery timeline %u", - primaryTLI, startpointTLI))); + primaryTLI, state.startpointTLI))); /* * Get any missing history files. We do this always, even when we're @@ -361,7 +369,7 @@ WalReceiverMain(void) * but let's avoid the confusion of timeline id collisions where we * can. */ - WalRcvFetchTimeLineHistoryFiles(startpointTLI, primaryTLI);
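For readers skimming the thread, the nap clamping described at the top of this message might look roughly like the following. This is a minimal sketch, not the patch itself: the event flags and variable declarations are assumptions based on the surrounding walreceiver code.

    int64       nextWakeup = INT64_MAX;
    int64       nap;
    int         rc;

    /* Find the soonest wakeup time, to limit our nap. */
    for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
        nextWakeup = Min(state.wakeup[i], nextWakeup);

    /* microseconds to milliseconds, rounded up, never negative */
    nap = Max(0, (nextWakeup - now + 999) / 1000);

    /* "nap" is an int64 now, so clamp it to INT_MAX for the latch wait */
    rc = WaitLatchOrSocket(MyLatch,
                           WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
                           WL_TIMEOUT | WL_LATCH_SET,
                           wait_fd,
                           Min(nap, INT_MAX),
                           WAIT_EVENT_WAL_RECEIVER_MAIN);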
Re: Suppressing useless wakeups in walreceiver
On Mon, Oct 10, 2022 at 11:21 PM Nathan Bossart wrote:
> Thanks for the updated patches.
>
> > 2. With the below change, the time walreceiver spends in
> > XLogWalRcvSendReply() is also included for XLogWalRcvSendHSFeedback
> > right? I think it's a problem given that XLogWalRcvSendReply() can
> > take a while. Earlier, this wasn't the case, each function calculating
> > 'now' separately. Any reason for changing this behaviour? I know that
> > GetCurrentTimestamp() isn't cheap all the time, but here it might
> > result in a change in the behaviour.
>
> Yes, if XLogWalRcvSendReply() takes a long time, we might defer sending the
> hot_standby_feedback message until a later time. The only reason I changed
> this was to avoid extra calls to GetCurrentTimestamp(), which might be
> expensive on some platforms.

Agree. However...

> Outside of the code snippet you pointed out,
> I think WalReceiverMain() has a similar problem. That being said, I'm
> generally skeptical that this sort of thing is detrimental given the
> current behavior (i.e., wake up every 100ms), the usual values of these
> GUCs (i.e., tens of seconds), and the fact that any tasks that are
> inappropriately skipped will typically be retried in the next iteration of
> the loop.

AFAICS, the aim of the patch isn't optimizing away GetCurrentTimestamp() calls, and yet the patch does change quite a few of them, which may change the behaviour. I think the GetCurrentTimestamp() optimizations can go into an 0003 patch in this thread itself, or we can discuss them in a separate thread to seek more thoughts.

In many instances in the patch, 'now' may not actually represent the true current time, as it also includes the time taken by other operations. I think this will cause problems.

+XLogWalRcvSendReply(state, now, false, false);
+XLogWalRcvSendHSFeedback(state, now, false);

+now = GetCurrentTimestamp();
+for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+WalRcvComputeNextWakeup(&state, i, now);
+XLogWalRcvSendHSFeedback(&state, now, true);

+now = GetCurrentTimestamp();
+for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+WalRcvComputeNextWakeup(&state, i, now);
+XLogWalRcvSendHSFeedback(&state, now, true);

> > 4. How about simplifying WalRcvComputeNextWakeup() with something like below?
>
> Other than saving a couple of lines of code, IMO this doesn't meaningfully
> simplify the function or improve readability.

IMO the duplicated lines of code in WalRcvComputeNextWakeup() aren't necessary, and the function isn't that complex without them.

> > 5. Can we move the below code snippets to respective static functions for
> > better readability and code reuse?
> > This:
> > +/* Find the soonest wakeup time, to limit our nap. */
> > +nextWakeup = INT64_MAX;
> > +for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
> > +nextWakeup = Min(state.wakeup[i], nextWakeup);
> > +nap = Max(0, (nextWakeup - now + 999) / 1000);
> >
> > And this:
> >
> > +now = GetCurrentTimestamp();
> > +for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
> > +WalRcvComputeNextWakeup(&state, i, now);
>
> This did cross my mind, but I opted to avoid creating new functions because
> 1) they aren't likely to be reused very much, and 2) I actually think it
> might hurt readability by forcing developers to traipse around the file to
> figure out what these functions are actually doing. It's not like it would
> save many lines of code, either.

The second snippet is already used twice. IMO having static functions (perhaps inline) is much better here; see the sketch below.
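For illustration, the two snippets could be factored out along these lines. This is only a sketch of the suggestion; the helper names are hypothetical and not from any posted patch:

    /* Hypothetical helper: find the soonest wakeup and derive the nap time. */
    static inline int64
    WalRcvGetNap(WalRcvInfo *state, TimestampTz now)
    {
        int64       nextWakeup = INT64_MAX;

        for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
            nextWakeup = Min(state->wakeup[i], nextWakeup);

        /* microseconds to milliseconds, rounded up, never negative */
        return Max(0, (nextWakeup - now + 999) / 1000);
    }

    /* Hypothetical helper: recompute every wakeup time from a fresh timestamp. */
    static inline void
    WalRcvComputeAllWakeups(WalRcvInfo *state, TimestampTz now)
    {
        for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
            WalRcvComputeNextWakeup(state, i, now);
    }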
-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Adding Support for Copy callback functionality on COPY TO api
On Wed, Sep 30, 2020 at 01:48:12PM +0500, Andrey V. Lepikhov wrote:
> Your code is almost exactly the same as that proposed in [1] as part of the
> 'Fast COPY FROM' command. But it seems there are differences.
>
> [1]
> https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

I have been looking at what you have here while reviewing the contents of this thread, and it seems to me that you should basically be able to achieve the row-level control that your patch is doing with the callback for the per-row processing posted here. The main difference, though, is that you want to have more control at the beginning and the end of the COPY TO processing, which explains the split of DoCopyTo(). To be honest, I am a bit surprised to see this much footprint in the backend code once there are two FDW callbacks to control the beginning and the end of the COPY TO; it sacrifices a lot of the existing symmetry between the COPY TO and COPY FROM code paths, where there is currently strict control of the pre-row and post-row processing, like the per-row memory context.

-- Michael

signature.asc Description: PGP signature
Re: Suppressing useless wakeups in walreceiver
On Tue, Oct 11, 2022 at 07:01:26AM +0530, Bharath Rupireddy wrote:
> On Mon, Oct 10, 2022 at 11:21 PM Nathan Bossart
>> Outside of the code snippet you pointed out,
>> I think WalReceiverMain() has a similar problem. That being said, I'm
>> generally skeptical that this sort of thing is detrimental given the
>> current behavior (i.e., wake up every 100ms), the usual values of these
>> GUCs (i.e., tens of seconds), and the fact that any tasks that are
>> inappropriately skipped will typically be retried in the next iteration of
>> the loop.
>
> AFAICS, the aim of the patch isn't optimizing away
> GetCurrentTimestamp() calls, and yet the patch does change quite a
> few of them, which may change the behaviour. I think the
> GetCurrentTimestamp() optimizations can go into an 0003 patch in this
> thread itself, or we can discuss them in a separate thread to seek more
> thoughts.
>
> In many instances in the patch, 'now' may not actually represent
> the true current time, as it also includes the time taken by other
> operations. I think this will cause problems.

What problems do you think this will cause?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Adding Support for Copy callback functionality on COPY TO api
On Mon, Oct 10, 2022 at 05:06:39PM -0700, Nathan Bossart wrote: > Yeah, I named it that way because I figured we might want a test for the > COPY FROM callback someday. Okay. So, I have reviewed the whole thing, added a description of all the fields of BeginCopyTo() in its top comment, tweaked a few things and added in the module an extra NOTICE with the number of processed rows. The result seemed fine, so applied. -- Michael signature.asc Description: PGP signature
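For reference, the committed interface lets a caller receive each row of COPY-formatted output through a callback. Below is a minimal sketch of such a callback; the buffer name and its setup are assumptions for illustration, not part of the commit:

    #include "lib/stringinfo.h"

    /*
     * Hypothetical caller-owned buffer; it must be initialized with
     * initStringInfo() before the COPY TO is started.
     */
    static StringInfoData output_buf;

    /* Data destination callback: receives one row of COPY output at a time. */
    static void
    copy_dest_cb(void *data, int len)
    {
        appendBinaryStringInfo(&output_buf, (const char *) data, len);
    }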
Re: Query Jumbling for CALL and SET utility statements
On Mon, Oct 10, 2022 at 09:16:47PM +0800, Julien Rouhaud wrote: > Unless I'm missing something both can be handled using pg_node_attr() > annotations that already exists? Indeed, that should work for the locators. -- Michael signature.asc Description: PGP signature
Re: Adding Support for Copy callback functionality on COPY TO api
On Tue, Oct 11, 2022 at 11:52:03AM +0900, Michael Paquier wrote: > Okay. So, I have reviewed the whole thing, added a description of all > the fields of BeginCopyTo() in its top comment, tweaked a few things > and added in the module an extra NOTICE with the number of processed > rows. The result seemed fine, so applied. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote: > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart > wrote: >> I wonder if it would be better to simply remove this extra polling of >> pg_wal as a prerequisite to your patch. The existing commentary leads me >> to think there might not be a strong reason for this behavior, so it could >> be a nice way to simplify your patch. > > I don't think it's a good idea to remove that completely. As said > above, it might help someone, we never know. It would be great to hear whether anyone is using this functionality. If no one is aware of existing usage and there is no interest in keeping it around, I don't think it would be unreasonable to remove it in v16. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Reducing the chunk header sizes on all memory context types
On Tue, Oct 11, 2022 at 5:31 AM David Rowley wrote: > > The proposed patches in [1] do aim to make additional usages of the > slab allocator, and I have a feeling that we'll want to fix the > performance of slab.c before those. Perhaps the Asserts are a better > option if we're to get the proposed radix tree implementation. Going by [1], that use case is not actually a natural fit for slab because of memory fragmentation. The motivation to use slab there was that the allocation sizes are just over a power of two, leading to a lot of wasted space for aset. FWIW, I have proposed in that thread a scheme to squeeze things into power-of-two sizes without wasting quite as much space. That's not a done deal, of course, but it could work today without adding memory management code. [1] https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de -- John Naylor EDB: http://www.enterprisedb.com
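To make the fragmentation argument concrete, here is a toy sketch, not code from either patch, of how a power-of-two allocator rounds requests; pg_nextpower2_32() is the existing helper from src/include/port/pg_bitutils.h:

    #include "port/pg_bitutils.h"

    /*
     * Toy illustration: a request just over a power of two is rounded up
     * to the next power, wasting almost half the chunk, e.g. 68 -> 128
     * wastes 60 bytes. This is the overhead the squeezing scheme targets.
     */
    static uint32
    powtwo_waste(uint32 request)
    {
        uint32      chunk = pg_nextpower2_32(request);

        return chunk - request;
    }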
Re: Non-decimal integer literals
Hi Peter,

/* process digits */
+ if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+ {
+     ptr += 2;
+     while (*ptr && isxdigit((unsigned char) *ptr))
+     {
+         int8 digit = hexlookup[(unsigned char) *ptr];
+
+         if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+             unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+             goto out_of_range;
+
+         ptr++;
+     }
+ }
+ else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+ {
+     ptr += 2;
+
+     while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+     {
+         int8 digit = (*ptr++ - '0');
+
+         if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+             unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+             goto out_of_range;
+     }
+ }
+ else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+ {
+     ptr += 2;
+
+     while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+     {
+         int8 digit = (*ptr++ - '0');
+
+         if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+             unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+             goto out_of_range;
+     }
+ }
+ else
+ {
      while (*ptr && isdigit((unsigned char) *ptr))
      {
          int8 digit = (*ptr++ - '0');
@@ -128,6 +181,7 @@ pg_strtoint16(const char *s)
              unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
              goto out_of_range;
      }
+ }

What do you think about moving this code into a static inline function? Something like:

static inline char *
process_digits(char *ptr, int32 *result)
{
    ...
}

On Mon, Oct 10, 2022 at 10:17 PM Peter Eisentraut wrote:
>
> On 16.02.22 11:11, Peter Eisentraut wrote:
> > The remaining patches are material for PG16 at this point, and I will
> > set the commit fest item to returned with feedback in the meantime.
>
> Time to continue with this.
>
> Attached is a rebased and cleaned up patch for non-decimal integer
> literals. (I don't include the underscores-in-numeric literals patch.
> I'm keeping that for later.)
>
> Two open issues from my notes:
>
> Technically, numeric_in() should be made aware of this, but that seems
> relatively complicated and maybe not necessary for the first iteration.
>
> Taking another look around ecpg to see how this interacts with C-syntax
> integer literals. I'm not aware of any particular issues, but it's
> understandably tricky.
>
> Other than that, this seems pretty complete as a start.

--
Regards
Junwang Zhao
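A minimal sketch of what one such helper could look like, for the hexadecimal case only; the name and exact signature are illustrative, while hexlookup and the overflow helpers are the ones already used in the quoted hunk:

    /*
     * Sketch: consume hex digits at *ptr, accumulating into *tmp as a
     * negative value, the way pg_strtoint16() does. Returns false on
     * overflow, leaving *ptr at the offending digit.
     */
    static inline bool
    process_hex_digits(const char **ptr, int16 *tmp)
    {
        while (**ptr && isxdigit((unsigned char) **ptr))
        {
            int8        digit = hexlookup[(unsigned char) **ptr];

            if (unlikely(pg_mul_s16_overflow(*tmp, 16, tmp)) ||
                unlikely(pg_sub_s16_overflow(*tmp, digit, tmp)))
                return false;
            (*ptr)++;
        }
        return true;
    }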
Re: subtransaction performance
On Mon, Oct 10, 2022 at 02:20:37PM -0400, Bruce Momjian wrote: > On Fri, Oct 7, 2022 at 03:23:27PM -0700, Zhihong Yu wrote: >> I wonder if SAVEPOINT / subtransaction performance has been boosted since the >> blog was written. > > No, I have not seen any changes in this area since then. Seems there > are two problems --- the 64 cache per session and the 64k on the > replica. In both cases, it seems sizing is not optimal, but sizing is > never optimal. I guess we can look at allowing manual size adjustment, > automatic size adjustment, or a different approach that is more graceful > for larger savepoint workloads. I believe the following commitfest entries might be relevant to this discussion: https://commitfest.postgresql.org/39/2627/ https://commitfest.postgresql.org/39/3514/ https://commitfest.postgresql.org/39/3806/ -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: create subscription - improved warning message
On Mon, Oct 10, 2022 at 8:14 PM Tom Lane wrote: > > Peter Smith writes: > > On Mon, Oct 10, 2022 at 4:04 PM Amit Kapila wrote: > >> I think the below gives accurate information: > >> WARNING: subscription was created, but is not connected > >> HINT: You should create the slot manually, enable the subscription, > >> and run %s to initiate replication. > > > +1 > > It feels a little strange to me that we describe two steps rather vaguely > and then give exact SQL for the third step. How about, say, > > HINT: To initiate replication, you must manually create the replication > slot, enable the subscription, and refresh the subscription. > LGTM. BTW, do we want to slightly adjust the documentation for the connect option on the CREATE SUBSCRIPTION page [1]? It says: "Since no connection is made when this option is false, no tables are subscribed, and so after you enable the subscription nothing will be replicated. You will need to then run ALTER SUBSCRIPTION ... REFRESH PUBLICATION for tables to be subscribed." It doesn't say anything about manually creating the slot, and the wording could probably be made similar. [1] - https://www.postgresql.org/docs/devel/sql-createsubscription.html -- With Regards, Amit Kapila.
RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Dear Önder,

Thanks for updating the patch! I checked it, and it is almost good. The following are just cosmetic comments.

===
01. relation.c - GetCheapestReplicaIdentityFullPath

```
* The reason that the planner would not pick partial indexes and indexes
* with only expressions based on the way currently baserestrictinfos are
* formed (e.g., col_1 = $1 ... AND col_N = $2).
```

Is "col_N = $2" a typo? I think it should be "col_N = $N" or "attr1 = $1 ... AND attrN = $N".

===
02. 032_subscribe_use_index.pl

If a table has a primary key on the subscriber, it will be used even if enable_indexscan is false (legacy behavior). Should we test that?

~~~
03. 032_subscribe_use_index.pl - SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX

This test seems non-trivial, so could you write down the motivation?

~~~
04. 032_subscribe_use_index.pl - SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX

```
# wait until the index is created
$node_subscriber->poll_query_until(
	'postgres',
	q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full_0 updates one row via index";
```

CREATE INDEX is synchronous, right? If so, we don't have to wait here. ...And the comment in the die message may be wrong. (There are more cases like this.)

~~~
05. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS

```
# Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
#
# Basic test where the subscriber uses index
# and touches 50 rows with UPDATE
```

"touches 50 rows with UPDATE" -> "updates 50 rows", per the other tests.

~~~
06. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE

This test seems non-trivial as well, so could you write down the motivation? (Same as the re-calculate test above.)

~~~
07. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE

```
# show that index_b is not used
$node_subscriber->poll_query_until(
	'postgres',
	q{select idx_scan=0 from pg_stat_all_indexes where indexrelname = 'index_b';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-2";
```

I think we don't have to wait here; is() should be used instead. poll_query_until() should be used only when idx_scan > 0 is checked. (There are more cases like this.)

~~~
08. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX ON PARTITIONED TABLES

```
# make sure that the subscriber has the correct data
$node_subscriber->poll_query_until(
	'postgres',
	q{select sum(user_id+value_1+value_2)=550070 AND count(DISTINCT(user_id,value_1, value_2))=981 from users_table_part;}
) or die "ensure subscriber has the correct data at the end of the test";
```

I think we can replace this with wait_for_catchup() and is()... Moreover, we don't have to wait here, because the line above already waits until the index is used on the subscriber. (There are more cases like this.)

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
PostgreSQL Logical decoding
Hello,

We are looking for an example of how to consume the WAL changes produced by logical decoding (streaming or SQL interface) in another PostgreSQL server.

Basically, we are trying to create a replica/standby PostgreSQL server for a primary PostgreSQL server. Between logical replication and logical decoding, we chose logical decoding due to the limitations of logical replication (materialized views, external views/tables, and sequences are not replicated). However, we have not found a good example with instructions on how to set up a consumer PostgreSQL server.

Thanks
Ankit
Re: Suppressing useless wakeups in walreceiver
On Tue, Oct 11, 2022 at 8:20 AM Nathan Bossart wrote:
> > AFAICS, the aim of the patch isn't optimizing away
> > GetCurrentTimestamp() calls, and yet the patch does change quite a
> > few of them, which may change the behaviour. I think the
> > GetCurrentTimestamp() optimizations can go into an 0003 patch in this
> > thread itself, or we can discuss them in a separate thread to seek more
> > thoughts.
> >
> > In many instances in the patch, 'now' may not actually represent
> > the true current time, as it also includes the time taken by other
> > operations. I think this will cause problems.
>
> What problems do you think this will cause?

now = t1;
XLogWalRcvSendReply();      /* say it ran for a minute or so, for whatever reason */
XLogWalRcvSendHSFeedback(); /* with the patch, walreceiver sends hot standby
                             * feedback more often without properly honouring
                             * wal_receiver_status_interval, because the 'now' this
                             * function sees isn't actually the current time; it is
                             * t1 + XLogWalRcvSendReply()'s run time. */

Well, is this really a problem? I'm not sure about that. Let's hear from others.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
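To make the concern concrete, one possible mitigation, sketched here only for discussion (whether the extra GetCurrentTimestamp() calls are acceptable is exactly the open question), is to re-read the clock between the potentially slow steps:

    /*
     * Sketch: refresh 'now' after a potentially slow call so the next
     * periodic task sees an honest timestamp. Argument shapes follow the
     * patch and are otherwise illustrative.
     */
    now = GetCurrentTimestamp();
    XLogWalRcvSendReply(&state, now, false, false);

    now = GetCurrentTimestamp();    /* SendReply may have taken a while */
    XLogWalRcvSendHSFeedback(&state, now, false);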
Re: use has_privs_of_role() for pg_hba.conf
On Sun, Oct 09, 2022 at 02:13:48PM -0700, Nathan Bossart wrote: > Here you go. Thanks, applied. It took me a few minutes to note that regress_regression_* is required in the object names because we need to use the same name for the parent role and the database, with "regress_" being required for the role and "regression" being required for the database. I have added an extra section where pg_hba.conf is set to match only the parent role, while on it. perltidy has reshaped things in an interesting way, because the generated log_[un]like is long, it seems. -- Michael signature.asc Description: PGP signature
Re: PostgreSQL Logical decoding
Hi Ankit,

On Tue, Oct 11, 2022 at 9:32 AM Ankit Oza wrote:
>
> Hello,
>
> We are looking for an example of how to consume the WAL changes produced
> by logical decoding (streaming or SQL interface) in another PostgreSQL server.

Built-in logical replication is a good example to start with: https://www.postgresql.org/docs/current/logical-replication.html

> Basically, we are trying to create a replica/standby PostgreSQL server for a
> primary PostgreSQL server. Between logical replication and logical decoding,
> we chose logical decoding due to the limitations of logical replication
> (materialized views, external views/tables, and sequences are not replicated).
> However, we have not found a good example with instructions on how to set up
> a consumer PostgreSQL server.

Logical decoding is the process of converting WAL into logical changes; logical replication deals with transferring these changes to another server and applying them there. They work in tandem; one cannot be used without the other. So I am confused about your requirements.

-- 
Best Wishes,
Ashutosh Bapat
Re: Remove an unnecessary LSN calculation while validating WAL page header
At Mon, 10 Oct 2022 08:53:55 +0530, Bharath Rupireddy wrote in
> It looks like we have an unnecessary XLogSegNoOffsetToRecPtr() in
> XLogReaderValidatePageHeader(). We pass the start LSN of the WAL page
> and check if it matches the LSN that was stored in the WAL page
> header (xlp_pageaddr). We find segno, offset and LSN again using
> XLogSegNoOffsetToRecPtr(). This happens to be the same as the passed-in
> LSN 'recptr'.

Yeah, that's obviously useless. It looks like a thinko from pg93, when recptr started being passed directly from the caller instead of being calculated from static variables for the file, segment, and in-segment offset.

> Here's a tiny patch removing the unnecessary XLogSegNoOffsetToRecPtr()
> and using the passed-in 'recptr'.

Looks good to me.

# Mysteriously, I couldn't find any code changing readId in the pg92 tree..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
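For context, the check after the cleanup reads roughly as below; this is a sketch, with the surrounding error reporting elided:

    /*
     * Sketch of the relevant part of XLogReaderValidatePageHeader(): the
     * caller-supplied recptr is the page start LSN, so it can be compared
     * against the stored page address directly, with no recomputation.
     */
    if (hdr->xlp_pageaddr != recptr)
    {
        /* report "unexpected pageaddr" using hdr->xlp_pageaddr and recptr */
        return false;
    }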
Re: src/test/perl/PostgreSQL/Test/*.pm not installed
On Mon, Oct 10, 2022 at 11:34:15AM +0200, Alvaro Herrera wrote: > I noticed that the new Perl test modules are not installed, so if you > try to use PostgreSQL/Test/Cluster.pm in an external test from pgxs, it > fails with the modules not being found. > > I see no reason for this other than having overseen it in b235d41d9646, > so I propose the attached (for all branches, naturally.) +1, good catch. The patch looks fine. -- Michael signature.asc Description: PGP signature
Re: Fast COPY FROM based on batch insert
On 10/7/22 11:18, Etsuro Fujita wrote:
On Tue, Sep 27, 2022 at 6:03 PM Etsuro Fujita wrote:
I will review the patch a bit more, but I feel that it is in good shape.
One thing I noticed is this bit added to CopyMultiInsertBufferFlush() to run triggers on the foreign table.

+ /* Run AFTER ROW INSERT triggers */
+ if (resultRelInfo->ri_TrigDesc != NULL &&
+     (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+      resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+ {
+     Oid relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+     for (i = 0; i < inserted; i++)
+     {
+         TupleTableSlot *slot = rslots[i];
+
+         /*
+          * AFTER ROW Triggers might reference the tableoid column,
+          * so (re-)initialize tts_tableOid before evaluating them.
+          */
+         slot->tts_tableOid = relid;
+
+         ExecARInsertTriggers(estate, resultRelInfo,
+                              slot, NIL,
+                              cstate->transition_capture);
+     }
+ }

Since foreign tables cannot have transition tables, we have trig_insert_new_table=false. So I simplified the if test and added an assertion ensuring trig_insert_new_table=false. Attached is a new version of the patch. I tweaked some comments a bit as well. I think the patch is committable, so I plan on committing it next week if there are no objections.

I reviewed the patch one more time. Only one question: bistate and ri_FdwRoutine are tightly coupled. Maybe add an assertion on (ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future.

-- 
Regards
Andrey Lepikhov
Postgres Professional
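For illustration, the suggested assertion might look like this; where exactly it would sit in the COPY code is an assumption:

    /*
     * Hypothetical assertion: a multi-insert buffer targets either a
     * foreign table (ri_FdwRoutine set, no bistate) or a plain table
     * (bistate set, no ri_FdwRoutine), never both and never neither.
     */
    Assert((resultRelInfo->ri_FdwRoutine == NULL) !=
           (buffer->bistate == NULL));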
Re: PostgreSQL Logical decoding
On Tue, Oct 11, 2022 at 9:32 AM Ankit Oza wrote:
>
> Hello,
>
> We are looking for an example of how to consume the WAL changes produced
> by logical decoding (streaming or SQL interface) in another PostgreSQL server.
>
> Basically, we are trying to create a replica/standby PostgreSQL server for a
> primary PostgreSQL server. Between logical replication and logical decoding,
> we chose logical decoding due to the limitations of logical replication
> (materialized views, external views/tables, and sequences are not replicated).
> However, we have not found a good example with instructions on how to set up
> a consumer PostgreSQL server.
>

I think from a code perspective, you can look at contrib/test_decoding and src/backend/replication/pgoutput to see how to consume changes and send them to the replica. You can refer to the docs [1] for the SQL functions to consume changes.

[1] - https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-REPLICATION

-- 
With Regards,
Amit Kapila.
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
On Mon, Oct 10, 2022 at 09:00:06AM +0200, Drouvot, Bertrand wrote:
> foreach(cell, tokens)
> {
> [...]
> + tokreg = lfirst(cell);
> + if (!token_is_regexp(tokreg))
>   {
> -     if (strcmp(dbname, role) == 0)
> +     if (am_walsender && !am_db_walsender)
> +     {
> +         /*
> +          * physical replication walsender connections can only match
> +          * replication keyword
> +          */
> +         if (token_is_keyword(tokreg->authtoken, "replication"))
> +             return true;
> +     }
> +     else if (token_is_keyword(tokreg->authtoken, "all"))
>           return true;

When checking the list of databases in check_db(), physical WAL senders (aka am_walsender && !am_db_walsender) would be able to accept regexps, but these should only accept "replication" and never a regexp, no? The second check on "replication", placed in the branch for token_is_regexp() in your patch, would be correctly placed, though.

This is kind of special in the HBA logic, coming back to 9.0 where physical replication and this special role property were introduced. WAL senders gained an actual database property later on in 9.4 with logical decoding, keeping "replication" for compatibility (connection strings can use replication=database to connect as a non-physical WAL sender and connect to a specific database).

> +typedef struct AuthToken
> +{
> +     char       *string;
> +     bool        quoted;
> +} AuthToken;
> +
> +/*
> + * Distinguish the case a token has to be treated as a regular
> + * expression or not.
> + */
> +typedef struct AuthTokenOrRegex
> +{
> +     bool        is_regex;
> +
> +     /*
> +      * Not a union as we still need the token string for fill_hba_line().
> +      */
> +     AuthToken  *authtoken;
> +     regex_t    *regex;
> +} AuthTokenOrRegex;

Hmm. With is_regex there to check whether a regex_t exists, both structures may not be necessary. I have not put my hands on that directly, but my guess is that I would shape things to have only AuthToken (with the regex_t checked first for a match when it is set), carrying:
- the string
- quoted
- regex_t

A list member should never have (regex_t != NULL && quoted), right? Hostnames would never be quoted, either.

> +# test with a comma in the regular expression
> +reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
> +test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
> + 0);

So, we check here that the role includes "5," in its name. This is getting fun to parse ;)

> elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/)
> {
> - plan skip_all => 'Potentially unsafe test SSL not enabled in
> PG_TEST_EXTRA';
> + plan skip_all =>
> + 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
> }

Unrelated noise from perltidy.

-- Michael

signature.asc Description: PGP signature
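A sketch of that single-struct shape, purely illustrative since the exact field set is whatever the next patch revision settles on:

    /* Sketch: one token type covering raw strings and regexes alike. */
    typedef struct AuthToken
    {
        char       *string;     /* raw token string, always kept */
        bool        quoted;     /* was the token double-quoted? */
        regex_t    *regex;      /* compiled regex, or NULL for a plain token */
    } AuthToken;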
Re: create subscription - improved warning message
On Tue, Oct 11, 2022 at 2:46 PM Amit Kapila wrote: > > On Mon, Oct 10, 2022 at 8:14 PM Tom Lane wrote: > > > > Peter Smith writes: > > > On Mon, Oct 10, 2022 at 4:04 PM Amit Kapila > > > wrote: > > >> I think the below gives accurate information: > > >> WARNING: subscription was created, but is not connected > > >> HINT: You should create the slot manually, enable the subscription, > > >> and run %s to initiate replication. > > > > > +1 > > > > It feels a little strange to me that we describe two steps rather vaguely > > and then give exact SQL for the third step. How about, say, > > > > HINT: To initiate replication, you must manually create the replication > > slot, enable the subscription, and refresh the subscription. > > > > LGTM. PSA patch v3-0001 where the message/hint is worded as suggested above. > BTW, do we want to slightly adjust the documentation for the > connect option on the CREATE SUBSCRIPTION page [1]? It says: "Since no > connection is made when this option is false, no tables are > subscribed, and so after you enable the subscription nothing will be > replicated. You will need to then run ALTER SUBSCRIPTION ... REFRESH > PUBLICATION for tables to be subscribed." > > It doesn't say anything about manually creating the slot, and the > wording could probably be made similar. > PSA patch v3-0002 which changes CREATE SUBSCRIPTION pgdocs to use the same wording as in the HINT message. -- Kind Regards, Peter Smith. Fujitsu Australia v3-0002-create-subscription-pgdocs.patch Description: Binary data v3-0001-create-subscription-warning-message.patch Description: Binary data
Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records
> On Wed, 5 Oct 2022 at 16:30, Tom Lane wrote: > > One other point to discuss: should we consider back-patching? I've > > got mixed feelings about that myself. I don't think that cases where > > this helps significantly are at all mainstream, so I'm kind of leaning > > to "patch HEAD only". At Mon, 10 Oct 2022 13:24:34 +0100, Simon Riggs wrote in > It looks fine to eventually backpatch, since StandbyReleaseLockTree() > was optimized to only be called when the transaction had actually done > some AccessExclusiveLocks. > > So the performance loss is minor and isolated to the users of such > locks, so I see no problems with it. At Wed, 5 Oct 2022 12:00:55 -0700, Nathan Bossart wrote in > +1. It can always be back-patched in the future if there are additional > reports. The third +1 from me. regards. -- Kyotaro Horiguchi NTT Open Source Software Center