Re: tweak to a few index tests to hits ambuildempty() routine.
After analyzing this, I found out why we don't reach that Assert but still have coverage shown - firstly, it is reached via another test, vacuum; secondly, it depends on the gcc optimization flag. We reach that Assert only when building with -O0. If we build with -O2 or -Og, that function is not reached (due to different results of the heap_prune_satisfies_vacuum() check inside heap_page_prune()). But as the make checks are mostly performed with -O2/-Og (including the buildfarm testing), it looks like after 4fb5c794e5 we have lost the coverage provided by 4c51a2d1e4. Amul Sul wrote 2022-09-14 14:28: On Wed, Sep 14, 2022 at 12:16 PM wrote: I still wonder, if the assert doesn't catch it, why is that place marked as covered here? https://coverage.postgresql.org/src/backend/access/gin/ginvacuum.c.gcov.html Probably other tests cover that. Regards, Amul
Multi-insert related comment in CopyFrom()
Hi, While working on the “Fast COPY FROM based on batch insert” patch, I noticed this: else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL && resultRelInfo->ri_TrigDesc->trig_insert_new_table) { /* * For partitioned tables we can't support multi-inserts when there * are any statement level insert triggers. It might be possible to * allow partitioned tables with such triggers in the future, but for * now, CopyMultiInsertInfoFlush expects that any before row insert * and statement level insert triggers are on the same relation. */ insertMethod = CIM_SINGLE; } I think there is a thinko in the comment; “before” should be after. Patch attached. Best regards, Etsuro Fujita Fix-thinko-in-comment.patch Description: Binary data
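For reference, this is how the comment reads with the suggested one-word fix applied (only the quoted word changes):

```c
/*
 * For partitioned tables we can't support multi-inserts when there
 * are any statement level insert triggers.  It might be possible to
 * allow partitioned tables with such triggers in the future, but for
 * now, CopyMultiInsertInfoFlush expects that any after row insert
 * and statement level insert triggers are on the same relation.
 */
```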
Re: RFC: Logging plan of the running query
Ok, I get it. Since GetLockMethodLocalHash() is only used for assertions, this is only defined when USE_ASSERT_CHECKING is enabled. This patch uses GetLockMethodLocalHash() not only for the assertion purpose, so I removed "#ifdef USE_ASSERT_CHECKING" for this function. I believe it does not lead to skipping errors. Agree. Since these processes are needed for nested queries, not only for AbortSubTransaction [1], I added comments on it. I also noticed it. However, I also saw that the function declarations above are applied to the EXPLAIN command and are used to print details of the executed query. We have a similar task of printing the plan of an interrupted process when a request is made for a specific pid. In short, I think this task is different, and I added this comment to separate these parts. I feel this comment is unnecessary since the explanation of HandleLogQueryPlanInterrupt() is written in explain.c and no functions in explain.h have comments in it. Yes, I was worried about it. I understood it, thanks for explaining. AFAIU both of them are processed since SendProcSignal sets pss_signalFlags for each signal. ``` SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId) { volatile ProcSignalSlot *slot; ...(snip)... 278 if (slot->pss_pid == pid) 279 { 280 /* Atomically set the proper flag */ 281 slot->pss_signalFlags[reason] = true; 282 /* Send signal */ 283 return kill(pid, SIGUSR1); ``` The comments on ProcSignalReason also say 'We can cope with concurrent signals for different reasons'. ```C /* * Reasons for signaling a Postgres child process (a backend or an auxiliary * process, like checkpointer). We can cope with concurrent signals for different * reasons. However, if the same reason is signaled multiple times in quick * succession, the process is likely to observe only one notification of it. * This is okay for the present uses. ... typedef enum { PROCSIG_CATCHUP_INTERRUPT, /* sinval catchup interrupt */ PROCSIG_NOTIFY_INTERRUPT, /* listen/notify interrupt */ PROCSIG_PARALLEL_MESSAGE, /* message from cooperating parallel backend */ PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */ PROCSIG_BARRIER, /* global barrier interrupt */ PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */ PROCSIG_LOG_QUERY_PLAN, /* ask backend to log plan of the current query */ ... } ProcSignalReason; ``` [1] https://www.postgresql.org/message-id/8b53b32f-26cc-0531-4ac0-27310e0bef4b%40oss.nttdata.com
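For readers following along: a handler for PROCSIG_LOG_QUERY_PLAN would normally do nothing more than set flags, mirroring the existing HandleLogMemoryContextInterrupt(). The sketch below shows that common pattern; the pending-flag name is illustrative and may not match the patch exactly.

```c
/* Minimal sketch of a PROCSIG_LOG_QUERY_PLAN handler, mirroring
 * HandleLogMemoryContextInterrupt(); LogQueryPlanPending is an
 * illustrative name, not necessarily the one used by the patch. */
volatile sig_atomic_t LogQueryPlanPending = false;

void
HandleLogQueryPlanInterrupt(void)
{
	InterruptPending = true;
	LogQueryPlanPending = true;
	/* latch will be set by procsignal_sigusr1_handler */
}
```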
Re: Perform streaming logical transactions by background workers and parallel apply
Here are some review comments for patch v30-0001. == 1. Commit message In addition, the patch extends the logical replication STREAM_ABORT message so that abort_time and abort_lsn can also be sent which can be used to update the replication origin in parallel apply worker when the streaming transaction is aborted. Because this message extension is needed to support parallel streaming, meaning that parallel streaming is not supported for publications on servers < PG16. "meaning that parallel streaming is not supported" -> "parallel streaming is not supported" == 2. doc/src/sgml/logical-replication.sgml @@ -1611,8 +1622,12 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER to the subscriber, plus some reserve for table synchronization. max_logical_replication_workers must be set to at least the number of subscriptions, again plus some reserve for the table - synchronization. Additionally the max_worker_processes - may need to be adjusted to accommodate for replication workers, at least + synchronization. In addition, if the subscription parameter + streaming is set to parallel, please + increase max_logical_replication_workers according to + the desired number of parallel apply workers. Additionally the + max_worker_processes may need to be adjusted to + accommodate for replication workers, at least (max_logical_replication_workers + 1). Note that some extensions and parallel queries also take worker slots from max_worker_processes. IMO it looks a bit strange to have "In addition" followed by "Additionally". Also, "to accommodate for replication workers"? seems like a typo (but it is not caused by your patch) BEFORE In addition, if the subscription parameter streaming is set to parallel, please increase max_logical_replication_workers according to the desired number of parallel apply workers. AFTER (???) If the subscription parameter streaming is set to parallel, max_logical_replication_workers should be increased according to the desired number of parallel apply workers. == 3. .../replication/logical/applyparallelworker.c - parallel_apply_can_start +/* + * Returns true, if it is allowed to start a parallel apply worker, false, + * otherwise. + */ +static bool +parallel_apply_can_start(TransactionId xid) Seems a slightly complicated comment for a simple boolean function. SUGGESTION Returns true/false if it is OK to start a parallel apply worker. == 4. .../replication/logical/applyparallelworker.c - parallel_apply_free_worker + winfo->in_use = false; + + /* Are there enough workers in the pool? */ + if (napplyworkers > (max_parallel_apply_workers_per_subscription / 2)) + { I felt the comment/logic about "enough" needs a bit more description. At least it should say to refer to the more detailed explanation atop worker.c == 5. .../replication/logical/applyparallelworker.c - parallel_apply_setup_dsm + /* + * Estimate how much shared memory we need. + * + * Because the TOC machinery may choose to insert padding of oddly-sized + * requests, we must estimate each chunk separately. + * + * We need one key to register the location of the header, and we need two + * other keys to track of the locations of the message queue and the error + * message queue. + */ "track of" -> "keep track of" ? == 6. src/backend/replication/logical/launcher.c - logicalrep_worker_detach logicalrep_worker_detach(void) { + /* Stop the parallel apply workers. */ + if (!am_parallel_apply_worker() && !am_tablesync_worker()) + { + List*workers; + ListCell *lc; The condition is not very obvious. 
This is why I previously suggested adding another macro/function like 'isLeaderApplyWorker'. In the absence of that, then I think the comment needs to be more descriptive. SUGGESTION If this is the leader apply worker then stop the parallel apply workers. == 7. src/backend/replication/logical/proto.c - logicalrep_read_stream_abort void logicalrep_write_stream_abort(StringInfo out, TransactionId xid, - TransactionId subxid) + TransactionId subxid, XLogRecPtr abort_lsn, + TimestampTz abort_time, bool abort_info) { pq_sendbyte(out, LOGICAL_REP_MSG_STREAM_ABORT); @@ -1175,19 +1179,40 @@ logicalrep_write_stream_abort(StringInfo out, TransactionId xid, /* transaction ID */ pq_sendint32(out, xid); pq_sendint32(out, subxid); + + if (abort_info) + { + pq_sendint64(out, abort_lsn); + pq_sendint64(out, abort_time); + } The new param name 'abort_info' seems misleading. Maybe a name like 'write_abort_info' is better? ~~~ 8. src/backend/replication/logical/proto.c - logicalrep_read_stream_abort +logicalrep_read_stream_abort(StringInfo in, + LogicalRepStreamAbortData *abort_data, + bool read_abort_lsn) { - Assert(xid && subxid); + Assert(abort_data); + + abort_data->xid = pq_getmsgint(in, 4); + abort_data->subxid = pq_getmsgint(in, 4); - *xid = pq_getmsgint(in, 4); - *subxid = pq_getmsgint(in, 4); + i
Re: SLRUs in the main buffer pool, redux
On Sat, Sep 17, 2022 at 12:41 PM Bagga, Rishu wrote: > While I was working on adding the page headers to SLRU pages on your patch, I > came across this code where it seems like "MultiXactIdToMemberPage" is > mistakenly being used instead of MultiXactIdToOffsetPage in the TrimMultiXact > function. Thanks Rishu. Right. Will fix soon in the next version, along with my long overdue replies to Heikki and Konstantin.
Re: Add common function ReplicationOriginName.
On Wed, Sep 21, 2022 at 3:23 PM Amit Kapila wrote: > ... > Can't we use the existing function ReplicationOriginNameForTablesync() > by passing relid as InvalidOid for this purpose? We need a check > inside to decide which name to construct, otherwise, it should be > fine. If we agree with this, then we can change the name of the > function to something like ReplicationOriginNameForLogicalRep or > ReplicationOriginNameForLogicalRepWorkers. > This suggestion attaches special meaning to the relid param. Won't it seem a bit strange for the non-tablesync callers (who probably have a perfectly valid 'relid') to have to pass an InvalidOid relid just so they can format the correct origin name? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Tighten pg_get_object_address argument checking
On Tue, Sep 20, 2022 at 11:14 PM Peter Eisentraut wrote: > > For publication schemas (OBJECT_PUBLICATION_NAMESPACE) and user > mappings (OBJECT_USER_MAPPING), pg_get_object_address() checked the > array length of the second argument, but not of the first argument. > If the first argument was too long, it would just silently ignore > everything but the first argument. Fix that by checking the length of > the first argument as well. > LGTM. -- With Regards, Amit Kapila.
Re: Add common function ReplicationOriginName.
On Wed, Sep 21, 2022 at 3:09 PM Peter Smith wrote: > > On Wed, Sep 21, 2022 at 3:23 PM Amit Kapila wrote: > > > ... > > > Can't we use the existing function ReplicationOriginNameForTablesync() > > by passing relid as InvalidOid for this purpose? We need a check > > inside to decide which name to construct, otherwise, it should be > > fine. If we agree with this, then we can change the name of the > > function to something like ReplicationOriginNameForLogicalRep or > > ReplicationOriginNameForLogicalRepWorkers. > > > > This suggestion attaches special meaning to the relid param. > > Won't it seem a bit strange for the non-tablesync callers (who > > probably have a perfectly valid 'relid') to have to pass an InvalidOid > > relid just so they can format the correct origin name? > For non-tablesync workers, relid should always be InvalidOid. See how we launch apply workers in ApplyLauncherMain(). Do you see any case for non-tablesync workers where relid is not InvalidOid? -- With Regards, Amit Kapila.
Re: making relfilenodes 56 bits
On Tue, Sep 20, 2022 at 10:44 PM Robert Haas wrote: Thanks for the review; please see my responses inline for some of the comments, the rest are all accepted. > On Fri, Sep 9, 2022 at 6:02 AM Dilip Kumar wrote: > > [ new patch ] > > +typedef pg_int64 RelFileNumber; > > This seems really random to me. First, why isn't this an unsigned > type? OID is unsigned and I don't see a reason to change to a signed > type. But even if we were going to change to a signed type, why > pg_int64? That is declared like this: > > /* Define a signed 64-bit integer type for use in client API declarations. */ > typedef PG_INT64_TYPE pg_int64; > > Surely this is not a client API declaration > > Note that if we change this a lot of references to INT64_FORMAT will > need to become UINT64_FORMAT. > > I think we should use int64 at the SQL level, because we don't have an > unsigned 64-bit SQL type, and a signed 64-bit type can hold 56 bits. > So it would still be Int64GetDatum((int64) rd_rel->relfilenode) or > similar. But internally I think using unsigned is cleaner. Yeah, you are right, we can make it uint64. With respect to this, we cannot directly use uint64 because that is declared in c.h, and that cannot be used in postgres_ext.h IIUC. So what are the other options? Maybe we can typedef RelFileNumber similar to what c.h does for uint64, i.e. #ifdef HAVE_LONG_INT_64 typedef unsigned long int uint64; #elif defined(HAVE_LONG_LONG_INT_64) typedef unsigned long long int uint64; #endif And maybe the same for UINT64CONST? I don't like duplicating this logic, but is there any better alternative for doing this? Can we move the existing definitions from the c.h file to some common file (common for client and server)? > > +if (relnumber >= ShmemVariableCache->loggedRelFileNumber) > > It probably doesn't make any difference, but to me it seems better to > test flushedRelFileNumber rather than logRelFileNumber here. What do > you think? Actually, based on this condition we are logging more, so it makes more sense to check w.r.t. loggedRelFileNumber; but OTOH, technically, without flushing the log we are not supposed to use the relfilenumber, so it makes more sense to test flushedRelFileNumber. But since both are the same I am fine with flushedRelFileNumber. > /* > * We set up the lockRelId in case anything tries to lock the dummy > - * relation. Note that this is fairly bogus since relNumber may be > - * different from the relation's OID. It shouldn't really matter though. > - * In recovery, we are running by ourselves and can't have any lock > - * conflicts. While syncing, we already hold AccessExclusiveLock. > + * relation. Note we are setting relId to just FirstNormalObjectId which > + * is completely bogus. It shouldn't really matter though. In recovery, > + * we are running by ourselves and can't have any lock conflicts. While > + * syncing, we already hold AccessExclusiveLock. > */ > rel->rd_lockInfo.lockRelId.dbId = rlocator.dbOid; > -rel->rd_lockInfo.lockRelId.relId = rlocator.relNumber; > +rel->rd_lockInfo.lockRelId.relId = FirstNormalObjectId; > > Boy, this makes me uncomfortable. The existing logic is pretty bogus, > and we're replacing it with some other bogus thing. Do we know whether > anything actually does try to use this for locking? > > One notable difference between the existing logic and your change is > that, with the existing logic, we use a bogus value that will differ > from one relation to the next, whereas with this change, it will > always be the same value. 
Perhaps rel->rd_lockInfo.lockRelId.relId = > (Oid) rlocator.relNumber would be a more natural adaptation? > > +#define CHECK_RELFILENUMBER_RANGE(relfilenumber)\ > +do {\ > +if ((relfilenumber) < 0 || (relfilenumber) > MAX_RELFILENUMBER) \ > +ereport(ERROR, \ > +errcode(ERRCODE_INVALID_PARAMETER_VALUE), \ > +errmsg("relfilenumber %lld is out of range",\ > +(long long) (relfilenumber))); \ > +} while (0) > > Here, you take the approach of casting the relfilenumber to long long > and then using %lld. But elsewhere, you use > INT64_FORMAT/UINT64_FORMAT. If we're going to use this technique, we > ought to use it everywhere. Based on the discussion [1], it seems we cannot use INT64_FORMAT/UINT64_FORMAT with ereport. But in all other places I am using INT64_FORMAT/UINT64_FORMAT. Does this make sense? [1] https://www.postgresql.org/message-id/20220730113922.qd7qmenwcmzyacje%40alvherre.pgsql > typedef struct > { > -Oid reltablespace; > -RelFileNumber relfilenumber; > -} RelfilenumberMapKey; > - > -typedef struct > -{ > -RelfilenumberMapKey key;/* lookup key - must be first */ > +RelFileNumber relfilenumber;/* lookup key - must be first */ > Oid relid;
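For context, here is a minimal sketch of the typedef option raised earlier in this message (mirroring c.h's 64-bit integer selection in a header usable by both client and server, e.g. postgres_ext.h); whether this duplication is acceptable is exactly the open question:

```c
/*
 * Sketch only: mirror c.h's 64-bit integer selection so RelFileNumber can be
 * an unsigned 64-bit type without postgres_ext.h depending on c.h.  This
 * duplicates c.h's logic, which is the drawback discussed above.
 */
#ifdef HAVE_LONG_INT_64
typedef unsigned long int RelFileNumber;
#elif defined(HAVE_LONG_LONG_INT_64)
typedef unsigned long long int RelFileNumber;
#endif
```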
Re: Add common function ReplicationOriginName.
On Wed, Sep 21, 2022 at 8:08 PM Amit Kapila wrote: > > On Wed, Sep 21, 2022 at 3:09 PM Peter Smith wrote: > > > > On Wed, Sep 21, 2022 at 3:23 PM Amit Kapila wrote: > > > > > ... > > > > > Can't we use the existing function ReplicationOriginNameForTablesync() > > > by passing relid as InvalidOid for this purpose? We need a check > > > inside to decide which name to construct, otherwise, it should be > > > fine. If we agree with this, then we can change the name of the > > > function to something like ReplicationOriginNameForLogicalRep or > > > ReplicationOriginNameForLogicalRepWorkers. > > > > > > > This suggestion attaches special meaning to the relid param. > > > > Won't it seem a bit strange for the non-tablesync callers (who > > probably have a perfectly valid 'relid') to have to pass an InvalidOid > > relid just so they can format the correct origin name? > > > > For non-tablesync workers, relid should always be InvalidOid. See how > we launch apply workers in ApplyLauncherMain(). Do you see any case > for non-tablesync workers where relid is not InvalidOid? > Hmm, my mistake. I was thinking more of all the calls coming from subscriptioncmds.c, but now that I look at it maybe none of those has any relid either. OK, I can unify the 2 functions as you suggested. I will post another patch in a few days. -- Kind Regards, Peter Smith, Fujitsu Australia.
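For readers of the thread, a minimal sketch of the unified helper being discussed (the name and exact signature are illustrative): tablesync callers pass their relid, all other callers pass InvalidOid and get the plain per-subscription origin name.

```c
/*
 * Sketch of the unified origin-name helper discussed above.  The function
 * name and signature are illustrative, not necessarily the final form.
 */
void
ReplicationOriginNameForLogicalRep(Oid suboid, Oid relid,
								   char *originname, Size szoriginname)
{
	if (OidIsValid(relid))
	{
		/* Replication origin name used by tablesync workers. */
		snprintf(originname, szoriginname, "pg_%u_%u", suboid, relid);
	}
	else
	{
		/* Replication origin name used for the subscription as a whole. */
		snprintf(originname, szoriginname, "pg_%u", suboid);
	}
}
```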
Re: [PATCH]Feature improvement for MERGE tab completion
On 2022-Sep-21, Fujii Masao wrote: > How about adding something like PartialMatches() that checks whether > the keywords are included in the input string or not? If so, we can restrict > some tab-completion rules to operating only on MERGE, as follows. I attached > the WIP patch (0002 patch) that introduces PartialMatches(). > Is this approach over-complicated? Thought? I think it's fine to backpatch your 0001 to 15 and put 0002 in master. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I can't go to a restaurant and order food because I keep looking at the fonts on the menu. Five minutes later I realize that it's also talking about food" (Donald Knuth)
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On Wed, Sep 21, 2022 at 12:15 PM Fujii Masao wrote: > > This looks much complicated to me. > > Instead of making allocate_backup_state() or reset_backup_state() > store the label name in BackupState before do_pg_backup_start(), > how about making do_pg_backup_start() do that after checking its length? > Seems this can simplify the code very much. > > If so, ISTM that we can replace allocate_backup_state() and > reset_backup_state() with just palloc0() and MemSet(), respectively. > Also we can remove fill_backup_label_name(). Yes, that makes things simpler. I will post the v10 patch-set soon. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Perform streaming logical transactions by background workers and parallel apply
On Wed, Sep 21, 2022 at 2:55 PM Peter Smith wrote: > > == > > 3. .../replication/logical/applyparallelworker.c - parallel_apply_can_start > > +/* > + * Returns true, if it is allowed to start a parallel apply worker, false, > + * otherwise. > + */ > +static bool > +parallel_apply_can_start(TransactionId xid) > > Seems a slightly complicated comment for a simple boolean function. > > SUGGESTION > Returns true/false if it is OK to start a parallel apply worker. > I think this is the style followed at some other places as well. So, we can leave it. > > 6. src/backend/replication/logical/launcher.c - logicalrep_worker_detach > > logicalrep_worker_detach(void) > { > + /* Stop the parallel apply workers. */ > + if (!am_parallel_apply_worker() && !am_tablesync_worker()) > + { > + List*workers; > + ListCell *lc; > > The condition is not very obvious. This is why I previously suggested > adding another macro/function like 'isLeaderApplyWorker'. > How about having a function am_leader_apply_worker() { ... return !OidIsValid(MyLogicalRepWorker->relid) && (MyLogicalRepWorker->apply_leader_pid == InvalidPid) ...}? > > 13. src/include/replication/worker_internal.h > > + /* > + * Indicates whether the worker is available to be used for parallel apply > + * transaction? > + */ > + bool in_use; > > This comment seems backward for this member's name. > > SUGGESTION (something like...) > Indicates whether this ParallelApplyWorkerInfo is currently being used > by a parallel apply worker processing a transaction. (If this flag is > false then it means the ParallelApplyWorkerInfo is available for > re-use by another parallel apply worker.) > I am not sure if this is an improvement over the current. The current comment appears reasonable to me as it is easy to follow. -- With Regards, Amit Kapila.
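A minimal sketch of that suggested helper, assuming a leader apply worker is identified by having no tablesync relid and no parallel-apply leader pid (field names follow the patch under review and may differ in later versions):

```c
/*
 * Sketch only: the leader apply worker is neither a tablesync worker
 * (no relid) nor a parallel apply worker (no apply leader pid).
 */
static inline bool
am_leader_apply_worker(void)
{
	return (!OidIsValid(MyLogicalRepWorker->relid) &&
			MyLogicalRepWorker->apply_leader_pid == InvalidPid);
}
```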
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On Wed, Sep 21, 2022 at 12:27 PM Michael Paquier wrote: > > >>> I've also taken help of the error callback mechanism to clean up the > >>> allocated memory in case of a failure. For do_pg_abort_backup() cases, > >>> I think we don't need to clean the memory because that gets called on > >>> proc exit (before_shmem_exit()). > >> > >> Memory could still bloat while the process running the SQL functions > >> is running depending on the error code path, anyway. > > > > I didn't get your point. Can you please elaborate it? I think adding > > error callbacks at the right places would free up the memory for us. > > Please note that we already are causing memory leaks on HEAD today. > > I mean that HEAD makes no effort in freeing this memory in > TopMemoryContext on session ERROR. Correct. We can also solve it as part of this commit. Please let me know your thoughts on the 0002 patch. > I have a few comments on 0001. > > +#include > Double quotes wanted here. Ah, my bad. Corrected now. > deallocate_backup_variables() is the only part of xlogbackup.c that > includes references of the tablespace map and backup_label > StringInfos. I would be tempted to fully decouple that from > xlogbackup.c/h for the time being. There's no problem with it IMO; after all, they are backup related variables. And that function reduces a bit of duplicate code. > - tblspc_map_file = makeStringInfo(); > Not sure that there is a need for a rename here. We're referring to tablespace_map and backup_label internally all around; just to be in sync, I wanted to rename it while we're refactoring this code. > +void > +build_backup_content(BackupState *state, bool ishistoryfile, StringInfo str) > +{ > It would be more natural to have build_backup_content() do by itself > the initialization of the StringInfo for the contents of backup_label > and return it as a result of the routine? This is short-lived in > xlogfuncs.c when the backup ends. See the explanation below. > @@ -248,18 +250,25 @@ perform_base_backup(basebackup_options *opt, bbsink > *sink) > [...] > + /* Construct backup_label contents. */ > + build_backup_content(backup_state, false, backup_label); > > Actually, for base backups, perhaps it would be more intuitive to > build and free the StringInfo of the backup_label when we send it for > base.tar rather than initializing it at the beginning and freeing it > at the end? sendFileWithContent() is in a for-loop and we are good if we call build_backup_content() before do_pg_backup_start() just once. Also, allocating backup_label in the for loop makes error handling trickier; how do we free it up when sendFileWithContent() errors out? Of course, we can allocate backup_label once even in the for loop with a bool first_time sort of variable and store StringInfo *ptr_backup_label; in the error callback info, but that would make things unnecessarily complex; instead, we're fine allocating and creating the backup_label content at the beginning and freeing it up at the end. > - * pg_backup_start: set up for taking an on-line backup dump > + * pg_backup_start: start taking an on-line backup. > * > - * Essentially what this does is to create a backup label file in $PGDATA, > - * where it will be archived as part of the backup dump. The label file > - * contains the user-supplied label string (typically this would be used > - * to tell where the backup dump will be stored) and the starting time and > - * starting WAL location for the dump. > + * This function starts the backup and creates tablespace_map contents. 
> > The last part of the comment is still correct while the former is not, > so this loses some information. Added that part before pg_backup_stop() now where it makes sense with the refactoring. I'm attaching the v10 patch-set with the above review comments addressed. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From cb93211220d73cfb4ae832ff4e04fd46e706ddfe Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Wed, 21 Sep 2022 10:52:06 + Subject: [PATCH v10] Refactor backup related code Refactor backup related code, advantages of doing so are following: 1) backup state is more structured now - all in a single structure, callers can create backup_label contents whenever required, either during the pg_backup_start or the pg_backup_stop or in between. 2) no parsing of backup_label file lines now, no error checking for invalid parsing. 3) backup_label and history file contents have most of the things in common, they can now be created within a single function. 4) makes backup related code extensible and readable. This introduces new source files xlogbackup.c/.h for backup related code and adds the new code in there. The xlog.c file has already grown to ~9000 LOC (as of this time). Eventually, we would want to move all the backup related code from xlog.c, xlogfuncs.c, elsewhere to here. Author: Bharath Rupireddy Reviewed-by: Michael Paquier Reviewed-b
Re: Hash index build performance tweak from sorting
On Wed, 21 Sept 2022 at 02:32, David Rowley wrote: > > On Tue, 2 Aug 2022 at 03:37, Simon Riggs wrote: > > Using the above test case, I'm getting a further 4-7% improvement on > > already committed code with the attached patch, which follows your > > proposal. > > > > The patch passes info via a state object, useful to avoid API churn in > > later patches. > > Hi Simon, > > I took this patch for a spin and saw a 2.5% performance increase using > the random INT test that Tom posted. The index took an average of > 7227.47 milliseconds on master and 7045.05 with the patch applied. Hi David, Thanks for the tests and review. I'm just jumping on a plane, so may not respond in detail until next Mon. -- Simon Riggs http://www.EnterpriseDB.com/
[PATCH] polish the error message of creating proc
When an error occurs while creating a proc, it should point out the specific proc kind instead of just printing "function". diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c index a9fe45e347..58af4b48ce 100644 --- a/src/backend/catalog/pg_proc.c +++ b/src/backend/catalog/pg_proc.c @@ -373,7 +373,11 @@ ProcedureCreate(const char *procedureName, if (!replace) ereport(ERROR, (errcode(ERRCODE_DUPLICATE_FUNCTION), -errmsg("function \"%s\" already exists with same argument types", +errmsg("%s \"%s\" already exists with same argument types", + (oldproc->prokind == PROKIND_AGGREGATE ? "aggregate function" : + oldproc->prokind == PROKIND_PROCEDURE ? "procedure" : + oldproc->prokind == PROKIND_WINDOW ? "window function" : +"function"), procedureName))); if (!pg_proc_ownercheck(oldproc->oid, proowner)) aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_FUNCTION, -- Regards Junwang Zhao 0001-polish-the-error-message-of-creating-proc.patch Description: Binary data
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Fujii-san, Thanks for checking! > These failed to be applied to the master branch cleanly. Could you update > them? PSA rebased patches. I reviewed them myself and they contain some changes, e.g., moving GUC-related code to option.c. > + this option relies on kernel events exposed by Linux, macOS, > > s/this/This Fixed. > > + GUC_check_errdetail("pgfdw_health_check_interval must be set > to 0 on this platform"); > > The actual parameter name "postgres_fdw.health_check_interval" > should be used for the message instead of internal variable name. Fixed. > This registered signal handler does lots of things. But that's not acceptable > and they should be performed outside signal handler. No? I modified it like v09 and earlier versions, which have a mechanism for registering CheckingRemoteServersCallback. It had been removed because we want to keep the core simpler, but IIUC it is needed if the signal handler just sets some flags. For simplicity, the core side does not consider the current status of the transaction or the running query. Best Regards, Hayato Kuroda FUJITSU LIMITED v15-0001-Add-an-infrastracture-for-checking-remote-server.patch Description: v15-0001-Add-an-infrastracture-for-checking-remote-server.patch v15-0002-postgres_fdw-Implement-health-check-feature.patch Description: v15-0002-postgres_fdw-Implement-health-check-feature.patch v15-0003-add-doc.patch Description: v15-0003-add-doc.patch v15-0004-add-test.patch Description: v15-0004-add-test.patch
Re: tweak to a few index tests to hits ambuildempty() routine.
On 2022-Sep-21, a.kozhemya...@postgrespro.ru wrote: > After analyzing this, I found out why we don't reach that Assert but we have > coverage shown - firstly, it reached via another test, vacuum; secondly, it > depends on the gcc optimization flag. We reach that Assert only when using > -O0. > If we build with -O2 or -Og that function is not reached (due to different > results of the heap_prune_satisfies_vacuum() check inside > heap_page_prune()). > But as the make checks mostly (including the buildfarm testing) performed > with -O2/-Og, it looks like that after 4fb5c794e5 we have lost the coverage > provided by the 4c51a2d1e4. Hmm, so if we merely revert the change to gin.sql then we still won't get the coverage back? I was thinking that a simple change would be to revert the change from temp to unlogged for that table, and create another unlogged table; but maybe that's not enough. Do we need a better test for GIN vacuuming that works regardless of the optimization level? -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Investigación es lo que hago cuando no sé lo que estoy haciendo" (Wernher von Braun)
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Thank you for reviewing. In the previous patch there was an error when processing constraints. The patch was fixed, but the code grew and became more complicated (0005-COPY_IGNORE_ERRORS). I also simplified the logic of safeNextCopyFrom(). You asked why we need subtransactions, so the answer is in this patch. When processing a row that does not satisfy constraints or INSTEAD OF triggers, it is necessary to roll back the subtransaction and return the table to its original state. Because of the complexity, I had to abandon the constraint and trigger processing and handle only errors that occur when reading the file. Attaching the simplified patch (0006-COPY_IGNORE_ERRORS). I checked these patches against all COPY regression tests and they worked correctly. > BTW in v4 patch, data are loaded into the buffer one by one, and when > the buffer fills up, the data in the buffer are 'replayed' also one by > one, right? > Wouldn't this have more overhead than a normal COPY? > The data is loaded into the buffer one by one, but in the "replay mode" ignore_errors works as standard COPY. Tuples are added to the slot and, depending on the type of the slot (CIM_SINGLE or CIM_MULTI), tuples are replayed in the corresponding way. For the 0006 patch you can imagine that we divide the for(;;) loop into 2 parts. The first part is adding tuples to the buffer and the second part is inserting tuples into the table. These parts don't intersect with each other and are completed sequentially. The main idea of replay_buffer is that it is needed for the CIM_SINGLE case. You can try the CIM_SINGLE case and see that tuples read before an error occurred are not added to the table. The logic of the 0005 patch is similar, but with some differences. As a test, I COPYed slightly larger data with and without ignore_errors > option. > There might be other reasons, but I found a performance difference. I tried to reduce the performance difference by cleaning up replay_buffer, resetting the new context for replay_buffer - replay_cxt. ``` Before: Without ignore_errors: COPY 1000 Time: 15538,579 ms (00:15,539) With ignore_errors: COPY 1000 Time: 21289,121 ms (00:21,289) After: Without ignore_errors: COPY 1000 Time: 15318,922 ms (00:15,319) With ignore_errors: COPY 1000 Time: 19868,175 ms (00:19,868) ``` - Put in the documentation that the warnings will not be output for more > than 101 cases. > Yeah, I've pointed that out in the doc. > I applied v4 patch and when canceled the COPY, there was a case I found > myself left in a transaction. > Should this situation be prevented from occurring? > > ``` > =# copy test from '/tmp/1000.data' with (ignore_errors ); > > ^CCancel request sent > ERROR: canceling statement due to user request > > =# truncate test; > ERROR: current transaction is aborted, commands ignored until end of > transaction block > ``` > I tried to reproduce your error and could not. The result was the same as with standard COPY FROM.
These are rows + containing syntax errors in data, rows with too many or too few columns, + rows that result in constraint violations, rows containing columns where + the data type's input function raises an error. + Outputs warnings about rows with incorrect data (the number of warnings + is not more than 100) and the total number of errors. + + + + DELIMITER diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 49924e476a..f41b25f26a 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -406,6 +406,7 @@ ProcessCopyOptions(ParseState *pstate, bool is_from, List *options) { + bool ignore_errors_specified = false; bool format_specified = false; bool freeze_specified = false; bool header_specified = false; @@ -448,6 +449,13 @@ ProcessCopyOptions(ParseState *pstate, freeze_specified = true; opts_out->freeze = defGetBoolean(defel); } + else if (strcmp(defel->defname, "ignore_errors") == 0) + { + if (ignore_errors_specified) +errorConflictingDefElem(defel, pstate); + ignore_errors_specified = true; + opts_out->ignore_errors = defGetBoolean(defel); + } else if (strcmp(defel->defname, "delimiter") == 0) { if (opts_out->delim) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index e8bb168aea..6474d47d29 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -106,6 +106,12 @@ static char *limit_print
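To make the approach easier to follow for readers of this thread: the essence of safeNextCopyFrom() as described above is reading each row inside a subtransaction so that input errors can be caught and the row skipped. The sketch below is only an illustration of that pattern (error cleanup modeled on plpgsql's exception blocks), not the attached patch's code, and a real version would distinguish a skipped row from end of input.

```c
/*
 * Illustrative sketch only: read the next input row inside a
 * subtransaction, swallowing any error raised by the input functions.
 * Returns false both at end of input and when a row was skipped; the
 * actual patch distinguishes these cases.
 */
static bool
safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext,
				 Datum *values, bool *nulls)
{
	MemoryContext oldcontext = CurrentMemoryContext;
	bool		ok;

	BeginInternalSubTransaction(NULL);
	MemoryContextSwitchTo(oldcontext);

	PG_TRY();
	{
		ok = NextCopyFrom(cstate, econtext, values, nulls);
		ReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
	}
	PG_CATCH();
	{
		/* Discard the error state and roll the subtransaction back. */
		MemoryContextSwitchTo(oldcontext);
		FlushErrorState();
		RollbackAndReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		ok = false;
	}
	PG_END_TRY();

	return ok;
}
```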
Re: [PATCH] polish the error message of creating proc
Hi, On Wed, Sep 21, 2022 at 07:45:01PM +0800, Junwang Zhao wrote: > when a error occurs when creating proc, it should point out the > specific proc kind instead of just printing "function". You should have kept the original thread in copy (1), or at least mention it. > diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c > index a9fe45e347..58af4b48ce 100644 > --- a/src/backend/catalog/pg_proc.c > +++ b/src/backend/catalog/pg_proc.c > @@ -373,7 +373,11 @@ ProcedureCreate(const char *procedureName, > if (!replace) > ereport(ERROR, > (errcode(ERRCODE_DUPLICATE_FUNCTION), > -errmsg("function \"%s\" > already exists with same argument types", > +errmsg("%s \"%s\" already > exists with same argument types", > + > (oldproc->prokind == PROKIND_AGGREGATE ? "aggregate function" : > + > oldproc->prokind == PROKIND_PROCEDURE ? "procedure" : > + > oldproc->prokind == PROKIND_WINDOW ? "window function" : > +"function"), > procedureName))); > if (!pg_proc_ownercheck(oldproc->oid, proowner)) > aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_FUNCTION, You can't put part of the message in parameter, as the resulting string isn't translatable. You should either use "routine" as a generic term or provide 3 different full messages. [1] https://www.postgresql.org/message-id/29ea5666.6ce8.1835f4b4992.coremail.qtds_...@126.com
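To illustrate the translation-friendly alternative, here is a minimal sketch of the "separate full messages" approach (showing only the procedure case next to the generic one; the aggregate and window cases would look the same). It is not the exact wording a committed fix would use.

```c
/* Sketch of the translation-friendly approach: one complete message per kind. */
if (oldproc->prokind == PROKIND_PROCEDURE)
	ereport(ERROR,
			(errcode(ERRCODE_DUPLICATE_FUNCTION),
			 errmsg("procedure \"%s\" already exists with same argument types",
					procedureName)));
else
	ereport(ERROR,
			(errcode(ERRCODE_DUPLICATE_FUNCTION),
			 errmsg("function \"%s\" already exists with same argument types",
					procedureName)));
```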
Re: Support pg_attribute_aligned and noreturn in MSVC
On Tue, Sep 20, 2022 at 9:18 PM Michael Paquier wrote: > > On Tue, Sep 20, 2022 at 08:01:20AM -0400, James Coleman wrote: > > I don't have access to a Windows machine for testing, but re-reading > > the documentation it looks like the issue is that our noreturn macro > > is used after the definition while the MSVC equivalent is used before. > > A CI setup would do the job for example, see src/tools/ci/README that > explains how to set up things. That's a good reminder; I've been meaning to set that up but haven't taken the time yet. > > I've removed that for now (and commented about it); it's not as > > valuable anyway since it's mostly an indicator for code analysis > > (human and machine). > > Except for the fact that the patch missed to define > pg_attribute_noreturn() in the MSVC branch, this looks fine to me. I > have been looking at what you meant with packing, and I can see the > business with PACK(), something actually doable with gcc. > > That's a first step, at least, and I see no reason not to do it, so > applied. Thanks! James Coleman
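For reference, a minimal sketch of how the aligned attribute maps onto MSVC; the placement caveat is why noreturn was dropped from the patch, and the surrounding #if chain is simplified here compared to the real one in c.h.

```c
/* Simplified sketch of the compiler dispatch; the real chain in c.h also
 * covers other compilers. */
#if defined(__GNUC__)
#define pg_attribute_aligned(a) __attribute__((aligned(a)))
#elif defined(_MSC_VER)
/*
 * MSVC spells it __declspec(align()).  __declspec(noreturn) exists too, but
 * it must precede the declaration, while pg_attribute_noreturn() is
 * currently written after the parameter list -- hence it is left out here.
 */
#define pg_attribute_aligned(a) __declspec(align(a))
#endif
```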
Re: Summary function for pg_buffercache
Hi Andres, > All you need to do is to read BufferDesc->state into a local variable and > then make decisions based on that You are right, thanks. Here is the corrected patch. -- Best regards, Aleksander Alekseev v9-0001-Added-pg_buffercache_summary-function.patch Description: Binary data
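A minimal sketch of that approach (counter names are illustrative, not necessarily what the patch uses): read the atomic state word once per buffer and derive everything from the local copy, without taking the buffer header lock.

```c
int			buffers_used = 0;
int			buffers_unused = 0;
int			buffers_dirty = 0;
int64		usagecount_total = 0;

for (int i = 0; i < NBuffers; i++)
{
	BufferDesc *bufHdr = GetBufferDescriptor(i);
	/* Read the state word once; no buffer header lock is taken. */
	uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);

	if (buf_state & BM_VALID)
		buffers_used++;
	else
		buffers_unused++;

	if (buf_state & BM_DIRTY)
		buffers_dirty++;

	usagecount_total += BUF_STATE_GET_USAGECOUNT(buf_state);
}
```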
Re: Mingw task for Cirrus CI
Hi hackers, Justin Pryzby , 5 Eyl 2022 Pzt, 14:50 tarihinde şunu yazdı: > But cfbot should run the Mingw task for this patch's own commitfest > entry. But right now (because cfbot doesn't include the original commit > message/s), it doesn't get run :( > I've been thinking about how to make the mingw task run only for this patch on cfbot and not for others. TBH, I couldn't come up with a nice way to achieve this. Does anyone have any suggestions on this? Andres Freund , 6 Eyl 2022 Sal, 02:52 tarihinde şunu yazdı: > Hi, > > On 2022-09-05 06:50:55 -0500, Justin Pryzby wrote: > > I saw that but hadn't tracked it down yet. Do you know if the tar > > failures were from a TAP test added since you first posted the mingw > > patch, or ?? > > I think it's that msys now includes tar by default, but not sure. Seems like msys2 comes with GNU tar now. Previously it was bsd tar under "/c/Windows/System32/tar.exe " which now we force it to use. Thanks, Melih
Re: On login trigger: take three
On 20.09.2022 17:10, Daniel Gustafsson wrote: On 20 Sep 2022, at 15:43, Sergey Shinderuk wrote: There is a race around setting and clearing of dathasloginevt. Thanks for the report! The whole dathasloginevt mechanism is IMO too clunky and unelegant to go ahead with, I wouldn't be surprised if there are other bugs lurking there. Indeed. CREATE DATABASE doesn't copy dathasloginevt from the template database. ALTER EVENT TRIGGER ... ENABLE doesn't set dathasloginevt. In both cases triggers are enabled according to \dy output, but don't fire. The admin may not notice it immediately. -- Sergey Shinderuk https://postgrespro.com/
Re: confirmed_flush_lsn shows LSN of the data that has not yet been received by the logical subscriber.
On Mon, Sep 19, 2022 at 8:09 PM Ashutosh Sharma wrote: > > On Mon, Sep 19, 2022 at 5:24 PM Ashutosh Bapat > wrote: > > > > On Mon, Sep 19, 2022 at 1:43 PM Ashutosh Sharma > > wrote: > > > > > > On Fri, Sep 9, 2022 at 5:36 PM Ashutosh Bapat > > > wrote: > > > > > > > > On Thu, Sep 8, 2022 at 8:32 PM Ashutosh Sharma > > > > wrote: > > > > > > > > > > On Thu, Sep 8, 2022 at 6:23 PM Ashutosh Bapat > > > > > wrote: > > > > > > > > > > > > On Thu, Sep 8, 2022 at 4:14 PM Ashutosh Sharma > > > > > > wrote: > > > > Can you please point to the documentation. > > > > > > > > > > AFAIU there is just one documentation. Here is the link for it: > > > > > > https://www.postgresql.org/docs/current/view-pg-replication-slots.html > > > > Thanks. Description of confirmed_flush_lsn is "The address (LSN) up to > > which the logical slot's consumer has confirmed receiving data. Data > > older than this is not available anymore. NULL for physical slots." > > The second sentence is misleading. AFAIU, it really should be "Data > > corresponding to the transactions committed before this LSN is not > > available anymore". WAL before restart_lsn is likely to be removed but > > WAL with LSN higher than restart_lsn is preserved. This correction > > makes more sense because of the third sentence. > > > > Thanks for the clarification. Attached is the patch with the changes. > Please have a look. > Looks good to me. Do you want to track this through commitfest app? -- Best Wishes, Ashutosh Bapat
Re: Tighten pg_get_object_address argument checking
On 21.09.22 12:01, Amit Kapila wrote: On Tue, Sep 20, 2022 at 11:14 PM Peter Eisentraut wrote: For publication schemas (OBJECT_PUBLICATION_NAMESPACE) and user mappings (OBJECT_USER_MAPPING), pg_get_object_address() checked the array length of the second argument, but not of the first argument. If the first argument was too long, it would just silently ignore everything but the first argument. Fix that by checking the length of the first argument as well. LGTM. Committed, thanks for checking.
Re: Mingw task for Cirrus CI
Hi, On 2022-09-21 16:18:43 +0300, Melih Mutlu wrote: > I've been thinking about how to make the mingw task run only for this patch > on cfbot and not for others. TBH, I couldn't come up with a nice way to > achieve this. > > Does anyone have any suggestions on this? Add a commented-out trigger-type: manual and note in the commit message that the committer needs to adjust that piece. Greetings, Andres Freund
Is it time to drop fix-old-flex-code.pl?
In view of Author: John Naylor Branch: master [8b878bffa] 2022-09-09 12:55:23 +0700 Bump minimum version of Flex to 2.5.35 I wonder if we should go a little further and get rid of src/tools/fix-old-flex-code.pl (just in HEAD, to be clear). That does nothing when flex is 2.5.36 or newer, and even with older versions it's just suppressing an "unused variable" warning. I think there's a reasonable case for not caring about that, or at least valuing it lower than saving one build step. According to a recent survey, these are the active buildfarm members still using 2.5.35: frogfish | 2022-08-21 17:59:26 | configure: using flex 2.5.35 hoverfly | 2022-09-02 16:02:01 | configure: using flex 2.5.35 lapwing | 2022-09-02 16:40:12 | configure: using flex 2.5.35 skate | 2022-09-02 07:27:10 | configure: using flex 2.5.35 snapper | 2022-09-02 13:38:22 | configure: using flex 2.5.35 lapwing uses -Werror, so it will be unhappy, but I don't think it's unreasonable to say that you should be using fairly modern tools if you want to use -Werror. Or we could leave well enough alone. But the current cycle of getting rid of ancient portability hacks seems like a good climate for this to happen. Thoughts? regards, tom lane
Removing unused parameter in SnapBuildGetOrBuildSnapshot
Hi hackers, Sharing a small patch to remove an unused parameter in SnapBuildGetOrBuildSnapshot function in snapbuild.c With commit 6c2003f8a1bbc7c192a2e83ec51581c018aa162f, SnapBuildBuildSnapshot no longer needs transaction id. This also makes the xid parameter in SnapBuildGetOrBuildSnapshot useless. I couldn't see a reason to keep it and decided to remove it. Regards, Melih 0001-Remove-unused-xid-parameter.patch Description: Binary data
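For clarity, the change boils down to this prototype adjustment (a sketch of the before/after declaration in snapbuild.h):

```c
/* Before: the xid argument is accepted but no longer used anywhere. */
extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder, TransactionId xid);

/* After: */
extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder);
```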
Re: [PATCH] polish the error message of creating proc
On Wed, Sep 21, 2022 at 8:17 PM Julien Rouhaud wrote: > > Hi, > > On Wed, Sep 21, 2022 at 07:45:01PM +0800, Junwang Zhao wrote: > > when a error occurs when creating proc, it should point out the > > specific proc kind instead of just printing "function". > > You should have kept the original thread in copy (1), or at least mention it. Yeah, thanks for pointing that out, will do that next time. > > > diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c > > index a9fe45e347..58af4b48ce 100644 > > --- a/src/backend/catalog/pg_proc.c > > +++ b/src/backend/catalog/pg_proc.c > > @@ -373,7 +373,11 @@ ProcedureCreate(const char *procedureName, > > if (!replace) > > ereport(ERROR, > > > > (errcode(ERRCODE_DUPLICATE_FUNCTION), > > -errmsg("function \"%s\" > > already exists with same argument types", > > +errmsg("%s \"%s\" already > > exists with same argument types", > > + > > (oldproc->prokind == PROKIND_AGGREGATE ? "aggregate function" : > > + > > oldproc->prokind == PROKIND_PROCEDURE ? "procedure" : > > + > > oldproc->prokind == PROKIND_WINDOW ? "window function" : > > +"function"), > > procedureName))); > > if (!pg_proc_ownercheck(oldproc->oid, proowner)) > > aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_FUNCTION, > > You can't put part of the message in parameter, as the resulting string isn't > translatable. You should either use "routine" as a generic term or provide 3 > different full messages. I noticed that there are some translations under the backend/po directory, can we just change msgid "function \"%s\" already exists with same argument types" to msgid "%s \"%s\" already exists with same argument types" ? > > [1] > https://www.postgresql.org/message-id/29ea5666.6ce8.1835f4b4992.coremail.qtds_...@126.com -- Regards Junwang Zhao
Re: Removing unused parameter in SnapBuildGetOrBuildSnapshot
On Sep 21, 2022, 22:22 +0800, Melih Mutlu , wrote: > Hi hackers, > > Sharing a small patch to remove an unused parameter in > SnapBuildGetOrBuildSnapshot function in snapbuild.c > > With commit 6c2003f8a1bbc7c192a2e83ec51581c018aa162f, SnapBuildBuildSnapshot > no longer needs transaction id. This also makes the xid parameter in > SnapBuildGetOrBuildSnapshot useless. > I couldn't see a reason to keep it and decided to remove it. > > Regards, > Melih +1, Good Catch. Regards, Zhang Mingli
Re: [PATCH] polish the error message of creating proc
On Wed, Sep 21, 2022 at 10:35:47PM +0800, Junwang Zhao wrote: > On Wed, Sep 21, 2022 at 8:17 PM Julien Rouhaud wrote: > > > > You can't put part of the message in parameter, as the resulting string > > isn't > > translatable. You should either use "routine" as a generic term or provide > > 3 > > different full messages. > > I noticed that there are some translations under the backend/po directory, > can we just change > msgid "function \"%s\" already exists with same argument types" > to > msgid "%s \"%s\" already exists with same argument types" ? The problem is that the parameters are passed as-is, so the final emitted translated string will be a mix of both languages, which isn't acceptable.
Re: why can't a table be part of the same publication as its schema
On 2022-Sep-20, Robert Haas wrote: > > I don't think we should change this behavior that's already in logical > > replication. While I understand the reasons why "GRANT ... ALL TABLES IN > > SCHEMA" has a different behavior (i.e. it's not applied to future > > objects) and do not advocate to change it, I have personally been > > affected where I thought a permission would be applied to all future > > objects, only to discover otherwise. I believe it's more intuitive to > > think that "ALL" applies to "everything, always." > > Nah, there's room for multiple behaviors here. It's reasonable to want > to add all the tables currently in the schema to a publication (or > grant permissions on them) and it's reasonable to want to include all > current and future tables in the schema in a publication (or grant > permissions on them) too. The reason I don't like the ALL TABLES IN > SCHEMA syntax is that it sounds like the former, but actually is the > latter. Based on your link to the email from Tom, I understand now the > reason why it's like that, but it's still counterintuitive to me. I already proposed elsewhere that we remove the ALL keyword from there, which I think serves to reduce confusion (in particular it's no longer parallel to the GRANT one). As in the attached. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "No renuncies a nada. No te aferres a nada." >From a72ceafe65c02998761cf21bec2820bf49b00a8f Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 21 Sep 2022 16:18:16 +0200 Subject: [PATCH] remove ALL from ALL TABLES IN SCHEMA --- doc/src/sgml/logical-replication.sgml | 4 +- doc/src/sgml/ref/alter_publication.sgml | 16 +-- doc/src/sgml/ref/create_publication.sgml | 14 +- doc/src/sgml/ref/create_subscription.sgml | 2 +- doc/src/sgml/system-views.sgml| 2 +- src/backend/catalog/pg_publication.c | 4 +- src/backend/commands/publicationcmds.c| 6 +- src/backend/parser/gram.y | 18 +-- src/backend/replication/pgoutput/pgoutput.c | 3 +- src/bin/pg_dump/pg_dump.c | 2 +- src/bin/pg_dump/t/002_pg_dump.pl | 12 +- src/bin/psql/tab-complete.c | 14 +- src/test/regress/expected/alter_table.out | 2 +- src/test/regress/expected/object_address.out | 2 +- src/test/regress/expected/publication.out | 120 +- src/test/regress/sql/alter_table.sql | 2 +- src/test/regress/sql/object_address.sql | 2 +- src/test/regress/sql/publication.sql | 104 +++ .../t/025_rep_changes_for_schema.pl | 6 +- src/test/subscription/t/028_row_filter.pl | 12 +- src/test/subscription/t/031_column_list.pl| 4 +- 21 files changed, 175 insertions(+), 176 deletions(-) diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index 48fd8e33dc..7fe3a1043e 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -700,7 +700,7 @@ test_sub=# SELECT * FROM t3; one of the publications was created using - FOR ALL TABLES IN SCHEMA and the table belongs to + FOR TABLES IN SCHEMA and the table belongs to the referred schema. This clause does not allow row filters. @@ -1530,7 +1530,7 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER Moreover, if untrusted users can create tables, use only publications that list tables explicitly. That is to say, create a subscription FOR ALL TABLES or - FOR ALL TABLES IN SCHEMA only when superusers trust + FOR TABLES IN SCHEMA only when superusers trust every user permitted to create a non-temp table on the publisher or the subscriber. 
diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml index d8ed89ee91..fc2d6d4885 100644 --- a/doc/src/sgml/ref/alter_publication.sgml +++ b/doc/src/sgml/ref/alter_publication.sgml @@ -31,7 +31,7 @@ ALTER PUBLICATION name RENAME TO where publication_object is one of: TABLE [ ONLY ] table_name [ * ] [ ( column_name [, ... ] ) ] [ WHERE ( expression ) ] [, ... ] -ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ] +TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ] @@ -71,19 +71,19 @@ ALTER PUBLICATION name RENAME TO You must own the publication to use ALTER PUBLICATION. Adding a table to a publication additionally requires owning that table. - The ADD ALL TABLES IN SCHEMA and - SET ALL TABLES IN SCHEMA to a publication requires the + The ADD TABLES IN SCHEMA and + SET TABLES IN SCHEMA to a publication requires the invoking user to be a superuser. To alter the owner, you must also be a direct or indirect member of the new owning role. The new owner must have CREATE privilege on the database. Also, th
Re: ICU for global collation
On 21.09.22 08:50, Marina Polyakova wrote: On 2022-09-20 12:59, Peter Eisentraut wrote: On 17.09.22 10:33, Marina Polyakova wrote: 3. The locale provider is ICU, but it has not yet been set from the template database: $ initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start && createdb --icu-locale ru-RU --template template0 mydb ... createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU Please see attached patch for a fix. Does that work for you? Yes, it works. The following test checks this fix: Committed with that test, thanks. I think that covers all the ICU issues you reported for PG15 for now?
Re: [PATCH] polish the error message of creating proc
Junwang Zhao writes: > I noticed that there are some translations under the backend/po directory, > can we just change > msgid "function \"%s\" already exists with same argument types" > to > msgid "%s \"%s\" already exists with same argument types" ? No. This doesn't satisfy our message translation guidelines [1]. The fact that there are other messages that aren't up to project standard isn't a license to create more. More generally: there are probably dozens, if not hundreds, of messages in the backend that say "function" but nowadays might also be talking about a procedure. I'm not sure there's value in improving just one of them. I am pretty sure that we made an explicit decision some time back that it is okay to say "function" when the object could also be an aggregate or window function. So you could at least cut this back to just handling "procedure" and "function". Or you could change it to "routine" as Julien suggests, but I think a lot of people will not think that's an improvement. regards, tom lane [1] https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES
Re: why can't a table be part of the same publication as its schema
On 9/21/22 10:24 AM, Alvaro Herrera wrote: On 2022-Sep-20, Robert Haas wrote: I don't think we should change this behavior that's already in logical replication. While I understand the reasons why "GRANT ... ALL TABLES IN SCHEMA" has a different behavior (i.e. it's not applied to future objects) and do not advocate to change it, I have personally been affected where I thought a permission would be applied to all future objects, only to discover otherwise. I believe it's more intuitive to think that "ALL" applies to "everything, always." Nah, there's room for multiple behaviors here. It's reasonable to want to add all the tables currently in the schema to a publication (or grant permissions on them) and it's reasonable to want to include all current and future tables in the schema in a publication (or grant permissions on them) too. The reason I don't like the ALL TABLES IN SCHEMA syntax is that it sounds like the former, but actually is the latter. Based on your link to the email from Tom, I understand now the reason why it's like that, but it's still counterintuitive to me. I already proposed elsewhere that we remove the ALL keyword from there, which I think serves to reduce confusion (in particular it's no longer parallel to the GRANT one). As in the attached. [personal, not RMT hat] I'd be OK with this. It would still allow for "FOR SEQUENCES IN SCHEMA" etc. Jonathan OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH] polish the error message of creating proc
On Wed, Sep 21, 2022 at 10:53 PM Tom Lane wrote: > > Junwang Zhao writes: > > I noticed that there are some translations under the backend/po directory, > > can we just change > > msgid "function \"%s\" already exists with same argument types" > > to > > msgid "%s \"%s\" already exists with same argument types" ? > > No. This doesn't satisfy our message translation guidelines [1]. > The fact that there are other messages that aren't up to project > standard isn't a license to create more. > > More generally: there are probably dozens, if not hundreds, of > messages in the backend that say "function" but nowadays might > also be talking about a procedure. I'm not sure there's value > in improving just one of them. > > I am pretty sure that we made an explicit decision some time back > that it is okay to say "function" when the object could also be > an aggregate or window function. So you could at least cut this > back to just handling "procedure" and "function". Or you could > change it to "routine" as Julien suggests, but I think a lot of > people will not think that's an improvement. Yeah, make sense, will leave it as is. > > regards, tom lane > > [1] https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES -- Regards Junwang Zhao
Re: confirmed_flush_lsn shows LSN of the data that has not yet been received by the logical subscriber.
On Wed, Sep 21, 2022 at 7:21 PM Ashutosh Bapat wrote: > > On Mon, Sep 19, 2022 at 8:09 PM Ashutosh Sharma wrote: > > > > On Mon, Sep 19, 2022 at 5:24 PM Ashutosh Bapat > > wrote: > > > > > > On Mon, Sep 19, 2022 at 1:43 PM Ashutosh Sharma > > > wrote: > > > > > > > > On Fri, Sep 9, 2022 at 5:36 PM Ashutosh Bapat > > > > wrote: > > > > > > > > > > On Thu, Sep 8, 2022 at 8:32 PM Ashutosh Sharma > > > > > wrote: > > > > > > > > > > > > On Thu, Sep 8, 2022 at 6:23 PM Ashutosh Bapat > > > > > > wrote: > > > > > > > > > > > > > > On Thu, Sep 8, 2022 at 4:14 PM Ashutosh Sharma > > > > > > > wrote: > > > > > Can you please point to the documentation. > > > > > > > > > > > > > AFAIU there is just one documentation. Here is the link for it: > > > > > > > > https://www.postgresql.org/docs/current/view-pg-replication-slots.html > > > > > > Thanks. Description of confirmed_flush_lsn is "The address (LSN) up to > > > which the logical slot's consumer has confirmed receiving data. Data > > > older than this is not available anymore. NULL for physical slots." > > > The second sentence is misleading. AFAIU, it really should be "Data > > > corresponding to the transactions committed before this LSN is not > > > available anymore". WAL before restart_lsn is likely to be removed but > > > WAL with LSN higher than restart_lsn is preserved. This correction > > > makes more sense because of the third sentence. > > > > > > > Thanks for the clarification. Attached is the patch with the changes. > > Please have a look. > > > Looks good to me. Do you want to track this through commitfest app? > Yeah, I've added an entry for it in the commitfest app and marked it as ready for committer. Thanks for the suggestion. -- With Regards, Ashutosh Sharma.
Re: [PoC] Let libpq reject unexpected authentication requests
On Fri, Sep 16, 2022 at 1:29 PM Jacob Champion wrote: > I'm happy to implement proofs of concept for that, or any other ideas, > given the importance of getting this "right enough" the first time. > Just let me know. v8 rebases over the postgres_fdw HINT changes; there are no functional differences. --Jacob From 2343bc37ebdbb6847d6a537f094a5de660274cc3 Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Fri, 24 Jun 2022 15:40:42 -0700 Subject: [PATCH v8 2/2] Add sslcertmode option for client certificates The sslcertmode option controls whether the server is allowed and/or required to request a certificate from the client. There are three modes: - "allow" is the default and follows the current behavior -- a configured sslcert is sent if the server requests one (which, with the current implementation, will happen whenever TLS is negotiated). - "disable" causes the client to refuse to send a client certificate even if an sslcert is configured. - "require" causes the client to fail if a client certificate is never sent and the server opens a connection anyway. This doesn't add any additional security, since there is no guarantee that the server is validating the certificate correctly, but it may help troubleshoot more complicated TLS setups. sslcertmode=require needs the OpenSSL implementation to support SSL_CTX_set_cert_cb(). Notably, LibreSSL does not. --- configure| 11 ++--- configure.ac | 4 +- doc/src/sgml/libpq.sgml | 54 +++ src/include/pg_config.h.in | 3 ++ src/interfaces/libpq/fe-auth.c | 21 + src/interfaces/libpq/fe-connect.c| 55 src/interfaces/libpq/fe-secure-openssl.c | 40 - src/interfaces/libpq/libpq-int.h | 3 ++ src/test/ssl/t/001_ssltests.pl | 43 ++ src/tools/msvc/Solution.pm | 8 10 files changed, 234 insertions(+), 8 deletions(-) diff --git a/configure b/configure index 4efed743a1..86cbf5d68a 100755 --- a/configure +++ b/configure @@ -12904,13 +12904,14 @@ else fi fi - # Function introduced in OpenSSL 1.0.2. - for ac_func in X509_get_signature_nid + # Functions introduced in OpenSSL 1.0.2. LibreSSL doesn't have all of these. + for ac_func in X509_get_signature_nid SSL_CTX_set_cert_cb do : - ac_fn_c_check_func "$LINENO" "X509_get_signature_nid" "ac_cv_func_X509_get_signature_nid" -if test "x$ac_cv_func_X509_get_signature_nid" = xyes; then : + as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh` +ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var" +if eval test \"x\$"$as_ac_var"\" = x"yes"; then : cat >>confdefs.h <<_ACEOF -#define HAVE_X509_GET_SIGNATURE_NID 1 +#define `$as_echo "HAVE_$ac_func" | $as_tr_cpp` 1 _ACEOF fi diff --git a/configure.ac b/configure.ac index 967f7e7209..936c4f4d07 100644 --- a/configure.ac +++ b/configure.ac @@ -1350,8 +1350,8 @@ if test "$with_ssl" = openssl ; then AC_SEARCH_LIBS(CRYPTO_new_ex_data, [eay32 crypto], [], [AC_MSG_ERROR([library 'eay32' or 'crypto' is required for OpenSSL])]) AC_SEARCH_LIBS(SSL_new, [ssleay32 ssl], [], [AC_MSG_ERROR([library 'ssleay32' or 'ssl' is required for OpenSSL])]) fi - # Function introduced in OpenSSL 1.0.2. - AC_CHECK_FUNCS([X509_get_signature_nid]) + # Functions introduced in OpenSSL 1.0.2. LibreSSL doesn't have all of these. + AC_CHECK_FUNCS([X509_get_signature_nid SSL_CTX_set_cert_cb]) # Functions introduced in OpenSSL 1.1.0. 
We used to check for # OPENSSL_VERSION_NUMBER, but that didn't work with 1.1.0, because LibreSSL # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 1d6c02058e..821078ea6a 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -1810,6 +1810,50 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname + + sslcertmode + + +This option determines whether a client certificate may be sent to the +server, and whether the server is required to request one. There are +three modes: + + + + disable + + +a client certificate is never sent, even if one is provided via + + + + + + + allow (default) + + +a certificate may be sent, if the server requests one and it has +been provided via sslcert + + + + + + require + + +the server must request a certificate. The +connection will fail if the server authenticates the client despite +not requesting or receiving one. + + + + + + + + sslrootcert @@ -7968,6 +8012,16 @
Re: Silencing the remaining clang 15 warnings
HEAD and v15 now compile cleanly for me with clang 15.0.0, but I find that there's still work to do in the back branches: * There are new(?) -Wunused-but-set-variable warnings in every older branch, which we evidently cleaned up or rewrote at one point or another. I think this is definitely worth fixing in the in-support branches. I'm a bit less sure if it's worth the trouble in the out-of-support branches. * The 9.2 and 9.3 branches spew boatloads of 'undeclared library function' warnings about strlcpy() and related functions. This is evidently because 16fbac39f was only back-patched as far as 9.4. There are enough of these to be pretty annoying if you're trying to build those branches with clang, so I think this is clearly justified for back-patching into the older out-of-support branches, assuming that the patch will work there. (I see that 9.2 and 9.3 were still on the prior version of autoconf, so it might not be an easy change.) I also observe that 9.2 and 9.3 produce float.c:1278:29: warning: implicit conversion from 'int' to 'float' changes value from 2147483647 to 2147483648 [-Wimplicit-const-int-float-conversion] This is because cbdb8b4c0 was only back-patched as far as 9.4. However, I think that that would *not* be fit material for back-patching into out-of-support branches, since our policy for them is "no behavioral changes". regards, tom lane
Re: Is it time to drop fix-old-flex-code.pl?
On Wed, Sep 21, 2022 at 10:21:25AM -0400, Tom Lane wrote: > > lapwing uses -Werror, so it will be unhappy, but I don't think it's > unreasonable to say that you should be using fairly modern tools > if you want to use -Werror. I think that the -Werror helped to find problems multiple times (although less often than the ones due to the 32-bit arch). If it's worth keeping I could see if I can get a 2.5.36 flex to compile, if we drop fix-old-flex-code.pl.
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Sep 20, 2022 at 4:19 PM Jacob Champion wrote: > > 2. Add support to pass on the OAuth bearer token. In this > > obtaining the bearer token is left to 3rd party application or user. > > > > ./psql -U -d 'dbname=postgres > > oauth_client_id= oauth_bearer_token= > > This hurts, but I think people are definitely going to ask for it, given > the frightening practice of copy-pasting these (incredibly sensitive > secret) tokens all over the place... After some further thought -- in this case, you already have an opaque Bearer token (and therefore you already know, out of band, which provider needs to be used), you're willing to copy-paste it from whatever service you got it from, and you have an extension plugged into Postgres on the backend that verifies this Bearer blob using some procedure that Postgres knows nothing about. Why do you need the OAUTHBEARER mechanism logic at that point? Isn't that identical to a custom password scheme? It seems like that could be handled completely by Samay's pluggable auth proposal. --Jacob
Re: Query Jumbling for CALL and SET utility statements
On 2022/09/19 15:29, Drouvot, Bertrand wrote: Please find attached v6 taking care of the remarks mentioned above. Thanks for updating the patch! +SET pg_stat_statements.track_utility = TRUE; + +-- PL/pgSQL procedure and pg_stat_statements.track = all +-- we drop and recreate the procedures to avoid any caching funnies +SET pg_stat_statements.track_utility = FALSE; Could you tell me why track_utility is enabled just before it's disabled? Could you tell me what actually happens if we don't drop and recreate the procedures? I'd like to know what "any caching funnies" are. +SELECT pg_stat_statements_reset(); +CALL MINUS_TWO(3); +CALL MINUS_TWO(7); +CALL SUM_TWO(3, 8); +CALL SUM_TWO(7, 5); + +SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; This test set for the procedures is executed with the following four conditions, respectively. Do we really need all of these tests? track = top, track_utility = true track = top, track_utility = false track = all, track_utility = true track = all, track_utility = false +begin; +prepare transaction 'tx1'; +insert into test_tx values (1); +commit prepared 'tx1'; The test set of 2PC commands is also executed with track_utility = on and off, respectively. But why do we need to run that test when track_utility = off? - if (query->utilityStmt) + if (query->utilityStmt && !jstate) { if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt)) "pgss_track_utility" should be "pgss_track_utility || FORCE_TRACK_UTILITY(parsetree)" theoretically? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Query JITing with LLVM ORC
Hi all, I am working on a project with LLVM ORC that led us to PostgreSQL as a target application. We were surprised to learn that PGSQL already uses LLVM ORC to JIT certain queries. I would love to know what motivated this feature and what it is currently being used for, as it is not enabled by default. Thanks. -- João Paulo L. de Carvalho Ph.D Computer Science | IC-UNICAMP | Campinas , SP - Brazil Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada joao.carva...@ic.unicamp.br joao.carva...@ualberta.ca
Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions
On Tue, Sep 20, 2022 at 5:13 PM Bharath Rupireddy wrote: > > On Mon, Sep 19, 2022 at 8:19 PM Ashutosh Sharma wrote: > > > > Hi All, > > > > Currently, we have pg_current_wal_insert_lsn and pg_walfile_name sql > > functions which gives us information about the next wal insert > > location and the WAL file that the next wal insert location belongs > > to. Can we have a binary version of these sql functions? > > +1 for the idea in general. > > As said, pg_waldump seems to be the right candidate. I think we want > the lsn of the last WAL record and its info and the WAL file name > given an input data directory or just the pg_wal directory or any > directory where WAL files are located. For instance, one can use this > on an archive location containing archived WAL files or on a node > where pg_receivewal is receiving WAL files. Am I missing any other > use-cases? > Yeah, we can either add this functionality to pg_waldump or maybe add a new binary itself that would return this information. -- With Regards, Ashutosh Sharma.
Re: why can't a table be part of the same publication as its schema
On Wed, Sep 21, 2022 at 8:24 PM Jonathan S. Katz wrote: > > On 9/21/22 10:24 AM, Alvaro Herrera wrote: > > On 2022-Sep-20, Robert Haas wrote: > > > >>> I don't think we should change this behavior that's already in logical > >>> replication. While I understand the reasons why "GRANT ... ALL TABLES IN > >>> SCHEMA" has a different behavior (i.e. it's not applied to future > >>> objects) and do not advocate to change it, I have personally been > >>> affected where I thought a permission would be applied to all future > >>> objects, only to discover otherwise. I believe it's more intuitive to > >>> think that "ALL" applies to "everything, always." > >> > >> Nah, there's room for multiple behaviors here. It's reasonable to want > >> to add all the tables currently in the schema to a publication (or > >> grant permissions on them) and it's reasonable to want to include all > >> current and future tables in the schema in a publication (or grant > >> permissions on them) too. The reason I don't like the ALL TABLES IN > >> SCHEMA syntax is that it sounds like the former, but actually is the > >> latter. Based on your link to the email from Tom, I understand now the > >> reason why it's like that, but it's still counterintuitive to me. > > > > I already proposed elsewhere that we remove the ALL keyword from there, > > which I think serves to reduce confusion (in particular it's no longer > > parallel to the GRANT one). As in the attached. > Thanks for working on this. > [personal, not RMT hat] > > I'd be OK with this. It would still allow for "FOR SEQUENCES IN SCHEMA" etc. > I also think this is reasonable. It can later be extended to have an option to exclude/include future tables with a publication option. Also, if we want to keep it compatible with FOR ALL TABLES syntax, we can later add ALL as an optional keyword in the syntax. -- With Regards, Amit Kapila.
Re: [RFC] building postgres with meson - v13
Andres Freund writes: > I think we should: > - convert windows to build with ninja - it builds faster, runs all tests, > parallelizes tests. That means that msbuild based builds don't have coverage > via CI / cfbot, but we don't currently have the resources to test both. Check. The sooner we can get rid of the custom MSVC scripts, the better, because now we'll be on the hook to maintain *three* build systems. > - add a linux build using meson, we currently can afford building both with > autoconf and meson for linux Right. > I'm less clear on whether we should convert macos / freebsd to meson at this > point? We certainly could debug/polish the meson stuff just on linux and windows for now, but is there a reason to wait on the others? regards, tom lane
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Wed, Sep 21, 2022 at 01:17:21PM +0700, John Naylor wrote: > In trying to wrap the SIMD code behind layers of abstraction, the latest > patch (and Nathan's cleanup) threw it away in almost all cases. To explain, > we need to talk about how vectorized code deals with the "tail" that is too > small for the register: > > 1. Use a one-by-one algorithm, like we do for the pg_lfind* variants. > 2. Read some junk into the register and mask off false positives from the > result. > > There are advantages to both depending on the situation. > > Patch v5 and earlier used #2. Patch v6 used #1, so if a node16 has 15 > elements or less, it will iterate over them one-by-one exactly like a > node4. Only when full with 16 will the vector path be taken. When another > entry is added, the elements are copied to the next bigger node, so there's > a *small* window where it's fast. > > In short, this code needs to be lower level so that we still have full > control while being portable. I will work on this, and also the related > code for node dispatch. Is it possible to use approach #2 here, too? AFAICT space is allocated for all of the chunks, so there wouldn't be any danger in searching all them and discarding any results >= node->count. Granted, we're depending on the number of chunks always being a multiple of elements-per-vector in order to avoid the tail path, but that seems like a reasonably safe assumption that can be covered with comments. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
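To make the trade-off concrete, here is a minimal sketch of what approach #2 could look like for a node16-style lookup, assuming SSE2 intrinsics and a fixed 16-slot chunk array; the function and parameter names are mine, not the patch's. All 16 slots are compared at once and any matches at positions >= count are masked off, so junk in unused slots cannot produce false positives.

```c
#include <emmintrin.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of "approach #2": compare every slot, then discard hits that fall
 * beyond the number of valid entries.
 */
static inline bool
node16_contains(const uint8_t chunks[16], uint8_t count, uint8_t key)
{
	__m128i		keys = _mm_set1_epi8((char) key);
	__m128i		data = _mm_loadu_si128((const __m128i *) chunks);
	uint32_t	cmp = (uint32_t) _mm_movemask_epi8(_mm_cmpeq_epi8(keys, data));
	uint32_t	valid = (count >= 16) ? 0xFFFF : ((1u << count) - 1);

	return (cmp & valid) != 0;
}
```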
Re: Proposal to use JSON for Postgres Parser format
On Tue, 20 Sept 2022 at 17:29, Alvaro Herrera wrote: > > On 2022-Sep-20, Matthias van de Meent wrote: > > > Allow me to add: compressability > > > > In the thread surrounding [0] there were complaints about the size of > > catalogs, and specifically the template database. Significant parts of > > that (688kB of 8080kB a fresh PG14 database) are in pg_rewrite, which > > consists mostly of serialized Nodes. If we're going to replace our > > current NodeToText infrastructure, we'd better know we can effectively > > compress this data. > > True. Currently, the largest ev_action values compress pretty well. I > think if we wanted this to be more succint, we would have to invent some > binary format -- perhaps something like Protocol Buffers: it'd be stored > in the binary format in catalogs, but for output it would be converted > into something easy to read (we already do this for > pg_statistic_ext_data for example). We'd probably lose compressibility, > but that'd be okay because the binary format would already remove most > of the redundancy by nature. > > Do we want to go there? I don't think that a binary format would be much better for debugging/fixing than an optimization of the current textual format when combined with compression. As I mentioned in that thread, there is a lot of improvement possible with the existing format, and I think any debugging of serialized nodes would greatly benefit from using a textual format. Then again, I also agree that this argument doesn't hold it's weight when storage and output formats are going to be different. I trust that any new tooling introduced as a result of this thread will be better than what we have right now. As for best format: I don't know. The current format is usable, and a better format would not store any data for default values. JSON can do that, but I could think of many formats that could do the same (Smile, BSON, xml, etc.). I do not think that protobuf is the best choice for storage, though, because it has its own rules on what it considers a default value and what it does or does not serialize: zero is considered the only default for numbers, as is the empty string for text, etc. I think it is allright for general use, but with e.g. `location: -1` in just about every parse node we'd probably want to select our own values to ignore during (de)serialization of fields. Kind regards, Matthias van de Meent
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-21 13:56:37 -0400, Tom Lane wrote: > Andres Freund writes: > > I think we should: > > > - convert windows to build with ninja - it builds faster, runs all tests, > > parallelizes tests. That means that msbuild based builds don't have > > coverage > > via CI / cfbot, but we don't currently have the resources to test both. > > Check. The sooner we can get rid of the custom MSVC scripts, the better, > because now we'll be on the hook to maintain *three* build systems. Agreed. I think the only "major" missing thing is the windows resource file generation stuff, which is mostly done in one of the "later" commits. Also need to test a few more of the optional dependencies (ICU, gettext, ...) on windows (I did test zlib, lz4, zstd). And of course get a bit of wider exposure than "just me and CI". > > I'm less clear on whether we should convert macos / freebsd to meson at this > > point? > > We certainly could debug/polish the meson stuff just on linux and windows > for now, but is there a reason to wait on the others? No - freebsd and macos have worked in CI for a long time. I was wondering whether we want more coverage for autoconf in CI, but thinking about it further, it's more important to have the meson coverage. Greetings, Andres Freund
Re: [RFC] building postgres with meson - v13
On Wed, Sep 21, 2022 at 09:46:30AM -0700, Andres Freund wrote: > I think we should: > > - convert windows to build with ninja - it builds faster, runs all tests, > parallelizes tests. That means that msbuild based builds don't have coverage > via CI / cfbot, but we don't currently have the resources to test both. +1 If multiple Windows (or other) tasks are going to exist, I think they should have separate "ci-os-only" conditions, like windows-msvc, windows-ninja, ... It should be possible to run only one.
Re: Query JITing with LLVM ORC
On Thu, Sep 22, 2022 at 4:17 AM João Paulo Labegalini de Carvalho wrote: > I am working on a project with LLVM ORC that led us to PostgreSQL as a target > application. We were surprised by learning that PGSQL already uses LLVM ORC > to JIT certain queries. It JITs expressions but not whole queries. Query execution at the tuple-flow level is still done using a C call stack the same shape as the query plan, but it *could* be transformed to a different control flow that could be run more efficiently and perhaps JITed. CCing Andres who developed all this and had some ideas about that... > I would love to know what motivated this feature and for what it is being > currently used for, https://www.postgresql.org/docs/current/jit-reason.html > as it is not enabled by default. It's enabled by default in v12 and higher (if you built with --with-llvm, as packagers do), but not always used: https://www.postgresql.org/docs/current/jit-decision.html
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
On Tue, Sep 20, 2022 at 03:12:00PM -0700, Peter Geoghegan wrote: > On Mon, Sep 12, 2022 at 2:01 PM Peter Geoghegan wrote: >> I'd like to talk about one such technique on this thread. The attached >> WIP patch reduces the size of xl_heap_freeze_page records by applying >> a simple deduplication process. > > Attached is v2, which I'm just posting to keep CFTester happy. No real > changes here. This idea seems promising. I see that you called this patch a work-in-progress, so I'm curious what else you are planning to do with it. As I'm reading this thread and the patch, I'm finding myself wondering if it's worth exploring using wal_compression for these records instead. I think you've essentially created an efficient compression mechanism for this one type of record, but I'm assuming that lz4/zstd would also yield some rather substantial improvements for this kind of data. Presumably a generic WAL record compression mechanism could be reused for other large records, too. That could be much easier than devising a deduplication strategy for every record type. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
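For illustration only, a generic record-payload compression step along those lines might look roughly like the sketch below; it uses the public LZ4 API directly, and the function and parameter names are hypothetical, not anything from PostgreSQL or the patch.

```c
#include <lz4.h>
#include <stdbool.h>

/*
 * Illustrative sketch: compress a large record payload with LZ4 and keep
 * the result only if it actually saves space.
 */
static bool
compress_record_payload(const char *payload, int payload_len,
						char *dst, int dst_capacity, int *compressed_len)
{
	/* LZ4_compress_default() returns 0 if the output does not fit in dst */
	*compressed_len = LZ4_compress_default(payload, dst, payload_len,
										   dst_capacity);
	if (*compressed_len <= 0)
		return false;

	/* only worthwhile if the compressed copy is actually smaller */
	return *compressed_len < payload_len;
}
```

A real implementation would presumably size the destination buffer with LZ4_compressBound() and fall back to storing the payload uncompressed whenever compression does not shrink it.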
Re: pg_auth_members.grantor is bunk
Robert Haas writes: > On Thu, Aug 18, 2022 at 1:26 PM Robert Haas wrote: >> CI is happier with this version, so I've committed 0001. If no major >> problems emerge, I'll proceed with 0002 as well. > Done. Shouldn't the CF entry [1] be closed as committed? regards, tom lane [1] https://commitfest.postgresql.org/39/3745/
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
On Wed, Sep 21, 2022 at 1:14 PM Nathan Bossart wrote: > This idea seems promising. I see that you called this patch a > work-in-progress, so I'm curious what else you are planning to do with it. I really just meant that the patch wasn't completely finished at that point. I hadn't yet convinced myself that I mostly had it right. I'm more confident now. > As I'm reading this thread and the patch, I'm finding myself wondering if > it's worth exploring using wal_compression for these records instead. The term deduplication works better than compression here because we're not actually decompressing anything in the REDO routine. Rather, the REDO routine processes each freeze plan by processing all affected tuples in order. To me this seems like the natural way to structure things -- the WAL records are much smaller, but in a way that's kind of incidental. The approach taken by the patch just seems like the natural approach, given the specifics of how freezing works at a high level. > I think you've essentially created an efficient compression mechanism for > this one type of record, but I'm assuming that lz4/zstd would also yield > some rather substantial improvements for this kind of data. I don't think of it that way. I've used the term "deduplication" to advertise the patch, but that's mostly just a description of what we're doing in the patch relative to what we do on HEAD today. There is nothing truly clever in the patch. We see a huge amount of redundancy among tuples from the same page in practically all cases, for reasons that have everything to do with what freezing is, and how it works at a high level. The thought process that led to my writing this patch was more high level than appearances suggest. (I often write patches that combine high level and low level insights in some way or other, actually.) Theoretically there might not be very much redundancy within each xl_heap_freeze_page record, with the right workload, but in practice a decrease of 4x or more is all but guaranteed once you have more than a few tuples to freeze on each page. If there are other WAL records that are as space inefficient as xl_heap_freeze_page is, then I'd be surprised -- it is *unusually* space inefficient (like I said, I suspect that this may have something to do with the fact that it was originally designed under time pressure). So I don't expect that this patch tells us much about what we should do for any other WAL record. I certainly *hope* that it doesn't, at least. > Presumably a > generic WAL record compression mechanism could be reused for other large > records, too. That could be much easier than devising a deduplication > strategy for every record type. It's quite possible that that's a good idea, but that should probably work as an additive thing. That's something that I think of as a "clever technique", whereas I'm focussed on just not being naive in how we represent this one specific WAL record type. -- Peter Geoghegan
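To illustrate the idea in code, here is a purely hypothetical sketch of a deduplicated layout (the names are invented, not the patch's): each distinct freeze plan is stored once, followed by the page offset numbers of all tuples it applies to.

```c
#include <stdint.h>

/*
 * Illustrative sketch only.  In PostgreSQL, TransactionId is a uint32 and
 * OffsetNumber is a uint16.
 */
typedef struct FreezePlanGroup
{
	uint32_t	xmax;			/* replacement xmax, if any */
	uint16_t	t_infomask;		/* infomask bits to set when freezing */
	uint16_t	t_infomask2;
	uint8_t		frz_flags;
	uint16_t	ntuples;		/* number of page offsets using this plan */
	/* the record then carries ntuples 2-byte page offset numbers per plan */
} FreezePlanGroup;
```

REDO can then walk each group once and apply the same plan to every listed offset, rather than decoding an identical per-tuple plan over and over.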
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
On Wed, Sep 21, 2022 at 2:11 PM Peter Geoghegan wrote: > > Presumably a > > generic WAL record compression mechanism could be reused for other large > > records, too. That could be much easier than devising a deduplication > > strategy for every record type. > > It's quite possible that that's a good idea, but that should probably > work as an additive thing. That's something that I think of as a > "clever technique", whereas I'm focussed on just not being naive in > how we represent this one specific WAL record type. BTW, if you wanted to pursue something like this, that would work with many different types of WAL record, ISTM that a "medium level" (not low level) approach might be the best place to start. In particular, the way that page offset numbers are represented in many WAL records is quite space inefficient. A domain-specific approach built with some understanding of how page offset numbers tend to look in practice seems promising. The representation of page offset numbers in PRUNE and VACUUM heapam WAL records (and in index WAL records) always just stores an array of 2 byte OffsetNumber elements. It probably wouldn't be all that difficult to come up with a simple scheme for compressing an array of OffsetNumbers in WAL records. It certainly doesn't seem like it would be all that difficult to get it down to 1 byte per offset number in most cases (even greater improvements seem doable). That could also be used for the xl_heap_freeze_page record type -- though only after this patch is committed. The patch makes the WAL record use a simple array of page offset numbers, just like in PRUNE/VACUUM records. That's another reason why the approach implemented by the patch seems like "the natural approach" to me. It's much closer to how heapam PRUNE records work (we have a variable number of arrays of page offset numbers in both cases). -- Peter Geoghegan
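As a concrete but entirely hypothetical illustration of such a domain-specific scheme, a sorted array of page offset numbers could be delta-encoded so that the common case costs one byte per offset; the helper below is a sketch under that assumption, not a proposal for any specific WAL format.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch: store the gap from the previous offset in one byte, using 0 as an
 * escape byte followed by the full 2-byte value for gaps that don't fit.
 */
static size_t
encode_offsets(const uint16_t *offsets, size_t n, uint8_t *out)
{
	size_t		len = 0;
	uint16_t	prev = 0;

	for (size_t i = 0; i < n; i++)
	{
		uint16_t	delta = (uint16_t) (offsets[i] - prev);

		if (delta >= 1 && delta <= 255)
			out[len++] = (uint8_t) delta;
		else
		{
			out[len++] = 0;		/* escape: full 2-byte value follows */
			out[len++] = (uint8_t) (offsets[i] & 0xFF);
			out[len++] = (uint8_t) (offsets[i] >> 8);
		}
		prev = offsets[i];
	}
	return len;					/* bytes written to out */
}
```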
Re: Query JITing with LLVM ORC
Hi Thomas, It JITs expressions but not whole queries. Thanks for the clarification. > Query execution at the > tuple-flow level is still done using a C call stack the same shape as > the query plan, but it *could* be transformed to a different control > flow that could be run more efficiently and perhaps JITed. I see, so there is room for extending the use of Orc JIT in PGSQL. > CCing > Andres who developed all this and had some ideas about that... > Thanks for CCing Andres, it will be great to hear from him. > > I would love to know what motivated this feature and for what it is > being currently used for, > > https://www.postgresql.org/docs/current/jit-reason.html In that link I found the README under src/backend/jit, which was very helpful. > as it is not enabled by default. > > It's enabled by default in v12 and higher (if you built with > --with-llvm, as packagers do), but not always used: > > https://www.postgresql.org/docs/current/jit-decision.html > Good to know. I compiled from the REL_14_5 tag and did a simple experiment to contrast building with and w/o passing --with-llvm. I ran the TPC-C benchmark with 1 warehouse, 10 terminals, 20min of ramp-up, and 120 of measurement time. The number of transactions per minute was about the same with & w/o JITing. Is this expected? Should I use a different benchmark to observe a performance difference? Regards, -- João Paulo L. de Carvalho Ph.D Computer Science | IC-UNICAMP | Campinas , SP - Brazil Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada joao.carva...@ic.unicamp.br joao.carva...@ualberta.ca
Re: [EXTERNAL] Re: [PoC] Federated Authn/z with OAUTHBEARER
On Wed, Sep 21, 2022 at 3:10 PM Andrey Chudnovskiy wrote: > We can support both passing the token from an upstream client and libpq > implementing OAUTH2 protocol to obtain one. Right, I agree that we could potentially do both. > Libpq passing the token directly from an upstream client is useful in other > scenarios: > 1. Enterprise clients, built with .Net / Java and using provider-specific > authentication libraries, like MSAL for AAD. Those can also support more > advanced provider-specific token acquisition flows. > 2. Resource-tight (like IoT) clients. Those can be compiled without optional > libpq flag not including the iddawc or other dependency. What I don't understand is how the OAUTHBEARER mechanism helps you in this case. You're short-circuiting the negotiation where the server tells the client what provider to use and what scopes to request, and instead you're saying "here's a secret string, just take it and validate it with magic." I realize the ability to pass an opaque token may be useful, but from the server's perspective, I don't see what differentiates it from the password auth method plus a custom authenticator plugin. Why pay for the additional complexity of OAUTHBEARER if you're not going to use it? --Jacob
Re: Query JITing with LLVM ORC
=?UTF-8?Q?Jo=C3=A3o_Paulo_Labegalini_de_Carvalho?= writes: > Good to know. I compiled from the REL_14_5 tag and did a simple experiment > to contrast building with and w/o passing --with-llvm. > I ran the TPC-C benchmark with 1 warehouse, 10 terminals, 20min of ramp-up, > and 120 of measurement time. > The number of transactions per minute was about the same with & w/o JITing. > Is this expected? Should I use a different benchmark to observe a > performance difference? TPC-C is mostly short queries, so we aren't likely to choose to use JIT (and if we did, it'd likely be slower). You need a long query that will execute the same expressions over and over for it to make sense to compile them. Did you check whether any JIT was happening there? There are a bunch of issues in this area concerning whether our cost models are good enough to accurately predict whether JIT is a good idea. But single-row fetches and updates are basically never going to use it, nor should they. regards, tom lane
Re: [PoC] Let libpq reject unexpected authentication requests
On 21.09.22 17:33, Jacob Champion wrote: On Fri, Sep 16, 2022 at 1:29 PM Jacob Champion wrote: I'm happy to implement proofs of concept for that, or any other ideas, given the importance of getting this "right enough" the first time. Just let me know. v8 rebases over the postgres_fdw HINT changes; there are no functional differences. So let's look at the two TODO comments you have: * TODO: how should !auth_required interact with an incomplete * SCRAM exchange? What specific combination of events are you thinking of here? /* * If implicit GSS auth has already been performed via GSS * encryption, we don't need to have performed an * AUTH_REQ_GSS exchange. * * TODO: check this assumption. What mutual auth guarantees * are made in this case? */ I don't understand the details involved here, but I would be surprised if this assumption is true. For example, does GSS encryption deal with user names and a user name map? I don't see how these can be equivalent. In any case, it seems to me that it would be safer to *not* make this assumption at first and then have someone more knowledgeable make the argument that it would be safe.
Re: Query JITing with LLVM ORC
On Thu, Sep 22, 2022 at 10:35 AM Tom Lane wrote: > =?UTF-8?Q?Jo=C3=A3o_Paulo_Labegalini_de_Carvalho?= > writes: > > Good to know. I compiled from the REL_14_5 tag and did a simple experiment > > to contrast building with and w/o passing --with-llvm. > > I ran the TPC-C benchmark with 1 warehouse, 10 terminals, 20min of ramp-up, > > and 120 of measurement time. > > The number of transactions per minute was about the same with & w/o JITing. > > Is this expected? Should I use a different benchmark to observe a > > performance difference? > > TPC-C is mostly short queries, so we aren't likely to choose to use JIT > (and if we did, it'd likely be slower). You need a long query that will > execute the same expressions over and over for it to make sense to > compile them. Did you check whether any JIT was happening there? See also the proposal thread which has some earlier numbers from TPC-H. https://www.postgresql.org/message-id/flat/20170901064131.tazjxwus3k2w3ybh%40alap3.anarazel.de
Re: Query JITing with LLVM ORC
On Thu, Sep 22, 2022 at 10:04 AM João Paulo Labegalini de Carvalho wrote: >building with and w/o passing --with-llvm BTW you can also just turn it off with runtime settings, no need to rebuild.
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-21 09:46:30 -0700, Andres Freund wrote: > After that I am planning to split the "ci" commit so that it converts a few of > the CI tasks to use meson, without adding all the other platforms I added for > development. I think that's important to get in soon, given that it'll > probably take a bit until the buildfarm grows meson coverage and because it > provides cfbot coverage which seems important for now as well. > > I think we should: > > - convert windows to build with ninja - it builds faster, runs all tests, > parallelizes tests. That means that msbuild based builds don't have coverage > via CI / cfbot, but we don't currently have the resources to test both. I was working on that and hit an issue that took me a while to resolve: Once I tested only the "main" meson commit plus CI the windows task was running out of memory. There was an outage of the CI provider at the same time, so I first blamed it on that. But it turns out to be "legitimately" high memory usage related to debug symbols - the only reason CI didn't show that before was that it's incidentally fixed as an indirect consequence of using precompiled headers, in a later commit. Argh. It can also be fixed by the option required to use ccache at some point, so I'll do that for now. Greetings, Andres Freund
Re: [PoC] Let libpq reject unexpected authentication requests
On Wed, Sep 21, 2022 at 3:36 PM Peter Eisentraut wrote: > So let's look at the two TODO comments you have: > > * TODO: how should !auth_required interact with an incomplete > * SCRAM exchange? > > What specific combination of events are you thinking of here? Let's say the client is using `require_auth=!password`. If the server starts a SCRAM exchange but doesn't finish it, the connection will still succeed with the current implementation (because it satisfies the "none" case). This is also true for a client setting of `require_auth=scram-sha-256,none`. I think this is potentially dangerous, but it mirrors the current behavior of libpq and I'm not sure that we should change it as part of this patch. > /* > * If implicit GSS auth has already been performed via GSS > * encryption, we don't need to have performed an > * AUTH_REQ_GSS exchange. > * > * TODO: check this assumption. What mutual auth guarantees > * are made in this case? > */ > > I don't understand the details involved here, but I would be surprised > if this assumption is true. For example, does GSS encryption deal with > user names and a user name map? To my understanding, yes. There are explicit tests for it. > In any case, it seems to me that it would be safer to *not* > make this assumption at first and then have someone more knowledgeable > make the argument that it would be safe. I think I'm okay with that, regardless. Here's one of the wrinkles: right now, both of the following connstrings work: require_auth=gss gssencmode=require require_auth=gss gssencmode=prefer If we don't treat gssencmode as providing GSS auth, then the first case will always fail, because there will be no GSS authentication packet over an encrypted connection. Likewise, the second case will almost always fail, unless the server doesn't support gssencmode at all (so why are you using prefer?). If you're okay with those limitations, I will rip out the code. The reason I'm not too worried about it is, I don't think it makes much sense to be strict about your authentication requirements while at the same time leaving the choice of transport encryption up to the server. Thanks, --Jacob
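For reference, a minimal libpq client exercising the first of those connection strings might look like the sketch below; it assumes the patch's proposed require_auth option and a made-up host name, and is only meant to show how an application would run into the behavior described above.

```c
#include <libpq-fe.h>
#include <stdio.h>

/*
 * Sketch only: with "require_auth=gss gssencmode=require" and no
 * implicit-GSS carve-out, this attempt would be rejected because no
 * AUTH_REQ_GSS packet ever arrives on a GSS-encrypted channel.
 */
int
main(void)
{
	PGconn	   *conn = PQconnectdb("host=gss.example.net dbname=postgres "
								   "require_auth=gss gssencmode=require");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

	PQfinish(conn);
	return 0;
}
```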
pg_basebackup --create-slot-if-not-exists?
Currently, pg_basebackup has a --create-slot option to create a slot if it does not already exist, or --slot to use an existing slot. Which means it needs to know whether a slot with the given name already exists before invoking the command. If a pg_basebackup --create-slot <> command fails for some reason after creating the slot, re-triggering the same command fails with an ERROR that the slot already exists. Either one then needs to delete the slot and retrigger, or add a check before retriggering the command to see if the slot exists and, based on that, alter the command to avoid passing the --create-slot option. This poses inconvenience while automating on top of pg_basebackup, as checking for slot presence before invoking pg_basebackup unnecessarily involves issuing separate SQL commands. It would be really helpful for such scenarios if, similar to CREATE TABLE, pg_basebackup could have an IF NOT EXISTS kind of semantic. (The limitation most likely comes from the CREATE REPLICATION SLOT protocol itself.) Thoughts? -- *Ashwin Agrawal (VMware)*
Re: pg_basebackup's --gzip switch misbehaves
On Tue, Sep 13, 2022 at 04:13:20PM +0900, Michael Paquier wrote: > diff --git a/src/common/compression.c b/src/common/compression.c > index da3c291c0f..ac26287d54 100644 > --- a/src/common/compression.c > +++ b/src/common/compression.c > @@ -249,36 +299,49 @@ expect_integer_value(char *keyword, char *value, > pg_compress_specification *resu > char * > validate_compress_specification(pg_compress_specification *spec) > { > + int min_level = 1; > + int max_level = 1; > + int default_level = 0; > + > /* If it didn't even parse OK, it's definitely no good. */ > if (spec->parse_error != NULL) > return spec->parse_error; > > /* > - * If a compression level was specified, check that the algorithm > expects > - * a compression level and that the level is within the legal range for > - * the algorithm. > + * Check that the algorithm expects a compression level and it is > + * is within the legal range for the algorithm. >*/ > - if ((spec->options & PG_COMPRESSION_OPTION_LEVEL) != 0) > + switch (spec->algorithm) > { > - int min_level = 1; > - int max_level; > - > - if (spec->algorithm == PG_COMPRESSION_GZIP) > + case PG_COMPRESSION_GZIP: > max_level = 9; > - else if (spec->algorithm == PG_COMPRESSION_LZ4) > +#ifdef HAVE_LIBZ > + default_level = Z_DEFAULT_COMPRESSION; > +#endif > + break; > + case PG_COMPRESSION_LZ4: > max_level = 12; > - else if (spec->algorithm == PG_COMPRESSION_ZSTD) > + default_level = 0; /* fast mode */ > + break; > + case PG_COMPRESSION_ZSTD: > max_level = 22; I should've suggested to add: > min_level = -7; which has been supported since zstd 1.3.4 (and postgres requires 1.4.0). I think at some point (maybe before releasing 1.3.4) the range was increased to very large(small), negative levels. It's possible to query the library about the lowest supported compression level, but then there's a complication regarding the client-side library version vs the server-side version. So it seems better to just use -7. -- Justin
Re: pg_basebackup --create-slot-if-not-exists?
On Wednesday, September 21, 2022, Ashwin Agrawal wrote: > Currently, pg_basebackup has > --create-slot option to create slot if not already exists or > --slot to use existing slot > > Which means it needs knowledge on if the slot with the given name already > exists or not before invoking the command. If pg_basebackup --create-slot > <> command fails for some reason after creating the slot. Re-triggering the > same command fails with ERROR slot already exists. Either then need to > delete the slot and retrigger Or need to add a check before retriggering > the command to check if the slot exists and based on the same alter the > command to avoid passing --create-slot option. This poses > inconvenience while automating on top of pg_basebackup. As checking for > slot presence before invoking pg_basebackup unnecessarily involves issuing > separate SQL commands. Would be really helpful for such scenarios if > similar to CREATE TABLE, pg_basebackup can have IF NOT EXISTS kind of > semantic. (Seems the limitation most likely is coming from CREATE > REPLICATION SLOT protocol itself), Thoughts? > What’s the use case for automating pg_basebackup with a named replication slot created by the pg_basebackup command? Why can you not leverage a temporary replication slot (i.e., omit --slot)? ISTM the create option is basically obsolete now. David J.
Re: pg_basebackup --create-slot-if-not-exists?
On Wed, Sep 21, 2022 at 05:34:20PM -0700, David G. Johnston wrote: > What’s the use case for automating pg_basebackup with a named replication > slot created by the pg_basebackup command? Why can you not leverage a > temporary replication slot (i.e., omit —slot). ISTM the create option is > basically obsolete now. +1. Perhaps the ask would ease some monitoring around pg_replication_slots with a fixed slot name? One thing that could come into play is the possibility to enforce the use of a temporary slot with a name given by -S as pg_basebackup makes it permanent in this case, still the temporary slot name being pg_basebackup_${PID} makes the slot searchable with a pattern. -- Michael signature.asc Description: PGP signature
Re: make additional use of optimized linear search routines
On Wed, Sep 21, 2022 at 02:28:08PM +0800, Richard Guo wrote: > I wonder if there are other code paths we can replace with the linear > search routines. I tried to search for them but no luck. I have been looking at a couple of simple patterns across the tree but no luck here either. Well, if someone spots something, it could always be done later. For now I have applied the bits discussed on this thread. -- Michael signature.asc Description: PGP signature
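For anyone else hunting for such spots, the pattern in question looks roughly like the hedged sketch below: an open-coded linear search over a uint32 array that could be swapped for pg_lfind32(). It assumes PostgreSQL's headers, so it only compiles inside the source tree, and the function names are made up for illustration.

```c
#include "postgres.h"
#include "port/pg_lfind.h"

/* before: plain loop */
static bool
uint32_in_array_open_coded(uint32 key, const uint32 *vals, uint32 nvals)
{
	for (uint32 i = 0; i < nvals; i++)
	{
		if (vals[i] == key)
			return true;
	}
	return false;
}

/* after: same result, vectorized where the platform supports it */
static bool
uint32_in_array_lfind(uint32 key, uint32 *vals, uint32 nvals)
{
	return pg_lfind32(key, vals, nvals);
}
```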
Re: pg_basebackup --create-slot-if-not-exists?
Michael Paquier writes: > On Wed, Sep 21, 2022 at 05:34:20PM -0700, David G. Johnston wrote: >> What’s the use case for automating pg_basebackup with a named replication >> slot created by the pg_basebackup command? Why can you not leverage a >> temporary replication slot (i.e., omit —slot). ISTM the create option is >> basically obsolete now. > +1. ISTM there'd also be some security concerns, ie what if there's a pre-existing slot (created by a hostile user, perhaps) that has properties different from what you expect? I realize that slot creation is a pretty high-privilege operation, but it's not superuser-only. In any case I agree with the point that --create-slot seems rather obsolete. If you are trying to resume in a previous replication stream (which seems like the point of persistent slots) then the slot had better already exist. If you are satisfied with just starting replication from the current instant, then a temp slot seems like what you want. regards, tom lane
Re: pg_basebackup's --gzip switch misbehaves
On Wed, Sep 21, 2022 at 07:31:48PM -0500, Justin Pryzby wrote: > I think at some point (maybe before releasing 1.3.4) the range was > increased to very large(small), negative levels. It's possible to query > the library about the lowest supported compression level, but then > there's a complication regarding the client-side library version vs the > server-side version. So it seems better to just use -7. Indeed. Contrary to the default level, there are no variables for the minimum and maximum levels. As you are pointing out, a lookup at zstd_compress.c shows that we have ZSTD_minCLevel() and ZSTD_maxCLevel() that assign the bounds. Both are available since 1.4.0. We still need a backend-side check as the level passed with a BASE_BACKUP command would be only validated there. It seems to me that this is going to be less of a headache in the long-term if we just use those routines at runtime, as zstd wants to keep some freedom with the min and max bounds for the compression level, at least that's the flexibility that this gives the library. So I would tweak things as the attached. -- Michael diff --git a/src/common/compression.c b/src/common/compression.c index e40ce98ef3..0c6bb9177b 100644 --- a/src/common/compression.c +++ b/src/common/compression.c @@ -324,8 +324,9 @@ validate_compress_specification(pg_compress_specification *spec) default_level = 0; /* fast mode */ break; case PG_COMPRESSION_ZSTD: - max_level = 22; #ifdef USE_ZSTD + max_level = ZSTD_maxCLevel(); + min_level = ZSTD_minCLevel(); default_level = ZSTD_CLEVEL_DEFAULT; #endif break; diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index f63c912e97..5cff3d3c07 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2757,8 +2757,10 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" -1), for lz4 an integer between 1 and 12 (default 0 for fast compression mode), and for zstd an integer between - 1 and 22 (default - ZSTD_CLEVEL_DEFAULT or 3). + ZSTD_minCLevel() (usually -131072) + and ZSTD_maxCLevel() (usually 22) + (default ZSTD_CLEVEL_DEFAULT or + 3). signature.asc Description: PGP signature
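For what it's worth, the runtime check boils down to something like the standalone sketch below (illustrative only, not the patch itself); ZSTD_minCLevel() is available from zstd 1.4.0 onward, which matches PostgreSQL's minimum requirement.

```c
#include <zstd.h>
#include <stdbool.h>

/* Validate a requested level against the library's own bounds. */
static bool
zstd_level_is_valid(int level)
{
	return level >= ZSTD_minCLevel() && level <= ZSTD_maxCLevel();
}

/* If no level was given, fall back to the library's default. */
static int
zstd_effective_level(bool level_given, int requested)
{
	return level_given ? requested : ZSTD_CLEVEL_DEFAULT;
}
```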
Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory
On Wed, Sep 21, 2022 at 8:03 AM Bharath Rupireddy wrote: > > On Fri, Aug 19, 2022 at 5:27 PM Bharath Rupireddy > wrote: > > > > Please review the attached v6 patch. > > I'm attaching the v7 patch rebased on to the latest HEAD. v7 patch was failing on Windows [1]. This is because of the full path name being sent to dir_existsfile() instead of just sending the temp file name. I fixed the issue. Thanks Michael Paquier for an offlist chat. PSA v8 patch. [1] t/010_pg_basebackup.pl ... 134/? # Failed test 'pg_basebackup reports checksum mismatch stderr /(?^s:^WARNING.*checksum verification failed)/' # at t/010_pg_basebackup.pl line 769. # 'unrecognized win32 error code: 123WARNING: checksum verification failed in file "./base/5/16399", block 0: calculated 4C09 but expected B3F6 # pg_basebackup: error: checksum error occurred -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 4bb81868810f55675a2abb97531d70ae8e0e3405 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Thu, 22 Sep 2022 01:07:18 + Subject: [PATCH v8] Make WAL file initialization by pg_receivewal atomic While initializing WAL files with zeros (i.e. padding) the pg_receivewal can fail for any reason, for instance, no space left on device or host server crash etc., which may leave partially padded WAL files due to which the pg_receivewal won't come up after the hostserver restart. The error it emits is "error: write-ahead log file "foo.partial" has x bytes, should be 0 or y" (where y is size of WAL file typically and x < y). To fix this, one needs to manually intervene and delete the partially padded WAL file which requires an internal understanding of what the issue is and often a time-consuming process in production environments. The above problem can also exist for pg_basebackup as it uses the same walmethods.c function for initialization. To address the above problem, make WAL file initialization atomic i.e. first create a temporary file, pad it and then rename it to the target file. This is similar to what postgres does to make WAL file initialization atomic. Authors: Bharath Rupireddy, SATYANARAYANA NARLAPURAM Reviewed-by: Cary Huang, mahendrakar s Reviewed-by: Kyotaro Horiguchi, Michael Paquier Discussion: https://www.postgresql.org/message-id/CAHg%2BQDcVUss6ocOmbLbV5f4YeGLhOCt%2B1x2yLNfG2H_eDwU8tw%40mail.gmail.com --- src/bin/pg_basebackup/pg_receivewal.c | 22 ++ src/bin/pg_basebackup/walmethods.c| 98 --- 2 files changed, 111 insertions(+), 9 deletions(-) diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c index f98ec557db..1cbc992e67 100644 --- a/src/bin/pg_basebackup/pg_receivewal.c +++ b/src/bin/pg_basebackup/pg_receivewal.c @@ -237,6 +237,28 @@ is_xlogfilename(const char *filename, bool *ispartial, return true; } + /* + * File looks like a temporary partial uncompressed WAL file that was + * leftover during pre-padding phase from a previous crash, delete it now + * as we don't need it. 
+ */ + if (fname_len == XLOG_FNAME_LEN + strlen(".partial.tmp") && + strcmp(filename + XLOG_FNAME_LEN, ".partial.tmp") == 0) + { + /* 12 is length of string ".partial.tmp" */ + char path[MAXPGPATH + XLOG_FNAME_LEN + 12]; + + snprintf(path, sizeof(path), "%s/%s", basedir, filename); + + if (unlink(path) < 0) + pg_log_error("could not remove file \"%s\"", path); + + if (verbose) + pg_log_info("removed file \"%s\"", path); + + return false; + } + /* File does not look like something we know */ return false; } diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c index bc2e83d02b..ed7b3185b0 100644 --- a/src/bin/pg_basebackup/walmethods.c +++ b/src/bin/pg_basebackup/walmethods.c @@ -118,7 +118,11 @@ dir_open_for_write(WalWriteMethod *wwmethod, const char *pathname, const char *temp_suffix, size_t pad_to_size) { DirectoryMethodData *dir_data = (DirectoryMethodData *) wwmethod; - char tmppath[MAXPGPATH]; + char targetpath[MAXPGPATH]; + char tmpfilename[MAXPGPATH]; + char tmpsuffixpath[MAXPGPATH]; + bool shouldcreatetempfile = false; + int flags; char *filename; int fd; DirectoryMethodFile *f; @@ -134,8 +138,9 @@ dir_open_for_write(WalWriteMethod *wwmethod, const char *pathname, clear_error(wwmethod); filename = dir_get_file_name(wwmethod, pathname, temp_suffix); - snprintf(tmppath, sizeof(tmppath), "%s/%s", + snprintf(targetpath, sizeof(targetpath), "%s/%s", dir_data->basedir, filename); + snprintf(tmpfilename, MAXPGPATH, "%s.tmp", filename); pg_free(filename); /* @@ -143,12 +148,57 @@ dir_open_for_write(WalWriteMethod *wwmethod, const char *pathname, * the file descriptor is important for dir_sync() method as gzflush() * does not do any system calls to fsync() to make changes permanent on * disk. + * + * Create a temporary file for padding and no compression cases to
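Stepping back from the patch itself, the underlying technique is the usual POSIX create-temporary, pad, fsync, rename sequence; the sketch below is a generic, hedged illustration of that pattern (names and error handling are mine, not the patch's), showing why a crash mid-padding can only leave a removable .tmp file behind rather than a short file under the final name.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Pad a new file with zeros under a temporary name, then rename it into
 * place.  Real code would also fsync the containing directory after the
 * rename to make it durable.
 */
static int
create_padded_file_atomically(const char *path, size_t pad_to_size)
{
	char		tmppath[1024];
	char		zeros[8192] = {0};
	size_t		written = 0;
	int			fd;

	snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);

	fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	while (written < pad_to_size)
	{
		size_t		chunk = pad_to_size - written;

		if (chunk > sizeof(zeros))
			chunk = sizeof(zeros);
		if (write(fd, zeros, chunk) != (ssize_t) chunk)
			goto fail;
		written += chunk;
	}

	if (fsync(fd) != 0)
		goto fail;
	if (close(fd) != 0 || rename(tmppath, path) != 0)
	{
		unlink(tmppath);
		return -1;
	}
	return 0;

fail:
	close(fd);
	unlink(tmppath);
	return -1;
}
```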
Re: why can't a table be part of the same publication as its schema
On Wed, Sep 21, 2022 at 1:15 AM Jonathan S. Katz wrote: > > (RMT hat on, unless otherwise noted) > > On 9/20/22 9:42 AM, Robert Haas wrote: > > On Mon, Sep 19, 2022 at 11:03 PM Jonathan S. Katz > > wrote: > >> For #1 (allowing calls that have schema/table overlap...), there appears > >> to be both a patch that allows this (reversing[8]), and a suggestion for > >> dealing with a corner-case that is reasonable, i.e. disallowing adding > >> schemas to a publication when specifying column-lists. Do we think we > >> can have consensus on this prior to the RC1 freeze? > > > > I am not sure whether we can or should rush a fix in that fast, but I > > agree with this direction. > > The RMT met today to discuss this. > > We did agree that the above is an open item that should be resolved > before this release. While it is an accepted pattern for us to "ERROR" > on unsupported behavior and then later introduce said behavior, we do > agree with Peter's original post in this thread and would like it resolved. > As there seems to be an agreement with this direction, I think it is better to commit the patch in this release (before RC1) to avoid users seeing any behavior change in a later release. If the proposed behavior for one of the cases (disallowing adding schemas to a publication when specifying column-lists) turns out to be too restrictive for users, we can make it less restrictive in a future release. I am planning to commit the current patch [1] tomorrow unless RMT or anyone else thinks otherwise. [1] - https://www.postgresql.org/message-id/OS0PR01MB57162E862758402F978725CD944B9%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions
On Wed, Sep 21, 2022 at 9:53 PM Ashutosh Sharma wrote: > > Yeah, we can either add this functionality to pg_waldump or maybe add > a new binary itself that would return this information. IMV, a separate tool isn't the way, since pg_waldump already reads WAL files and decodes WAL records, what's proposed here can be an additional functionality of pg_waldump. It will be great if an initial patch is posted here. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
RE: Allow logical replication to copy tables in binary format
Hi Few more minor comments. On Tuesday, August 16, 2022 2:04 AM Melih Mutlu wrote: > > > My main concern is to break a scenario that was previously working (14 > -> 15) but after a subscriber upgrade > it won't (14 -> 16). > > Fair concern. Some cases that might break the logical replication with version > upgrade would be: ... > 3- Copying in binary format would work with the same schemas. Currently, > logical replication does not require the exact same schemas in publisher and > subscriber. > This is an additional restriction that comes with the COPY command. > > If a logical replication has been set up with different schemas and > subscription > is created with the binary option, then yes this would break things. > This restriction can be clearly stated and wouldn't be unexpected though. > > I'm also okay with allowing binary copy only for v16 or later, if you think > it would > be safer and no one disagrees with that. > What are your thoughts? I agree with the direction to support binary copy for v16 and later. IIUC, the binary format replication with different data types fails even during apply phase on HEAD. I thought that means, the upgrade concern only applies to a scenario that the user executes only initial table synchronizations between the publisher and subscriber and doesn't replicate any data at apply phase after that. I would say this isn't a valid scenario and your proposal makes sense. Best Regards, Takamichi Osumi
RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Tues, Sep 20, 2022 at 18:30 PM Önder Kalacı wrote: > Thanks for the reviews, attached v12. Thanks for your patch. Here is a question and a comment: 1. In the function GetCheapestReplicaIdentityFullPath. + if (rel->pathlist == NIL) + { + /* +* A sequential scan could have been dominated by by an index scan +* during make_one_rel(). We should always have a sequential scan +* before set_cheapest(). +*/ + Path *seqScanPath = create_seqscan_path(root, rel, NULL, 0); + + add_path(rel, seqScanPath); + } This is a question I'm not sure about: Do we need this part to add sequential scan? I think in our case, the sequential scan seems to have been added by the function make_one_rel (see function set_plain_rel_pathlist). If I am missing something, please let me know. BTW, there is a typo in above comment: `by by`. 2. In the file execReplication.c. +#ifdef USE_ASSERT_CHECKING +#include "catalog/index.h" +#endif #include "commands/trigger.h" #include "executor/executor.h" #include "executor/nodeModifyTable.h" #include "nodes/nodeFuncs.h" #include "parser/parse_relation.h" #include "parser/parsetree.h" +#ifdef USE_ASSERT_CHECKING +#include "replication/logicalrelation.h" +#endif I think it's fine to only add `logicalrelation.h` here, because `index.h` has been added by `logicalrelation.h`. Regards, Wang wei
Re: pg_basebackup's --gzip switch misbehaves
On Thu, Sep 22, 2022 at 10:25:11AM +0900, Michael Paquier wrote: > On Wed, Sep 21, 2022 at 07:31:48PM -0500, Justin Pryzby wrote: > > I think at some point (maybe before releasing 1.3.4) the range was > > increased to very large(small), negative levels. It's possible to query > > the library about the lowest supported compression level, but then > > there's a complication regarding the client-side library version vs the > > server-side version. So it seems better to just use -7. > > Indeed. Contrary to the default level, there are no variables for the > minimum and maximum levels. As you are pointing out, a lookup at > zstd_compress.c shows that we have ZSTD_minCLevel() and > ZSTD_maxCLevel() that assign the bounds. Both are available since > 1.4.0. We still need a backend-side check as the level passed with a > BASE_BACKUP command would be only validated there. It seems to me > that this is going to be less of a headache in the long-term if we > just use those routines at runtime, as zstd wants to keep some freedom > with the min and max bounds for the compression level, at least that's > the flexibility that this gives the library. So I would tweak things > as the attached. Okay. Will that complicate tests at all? It looks like it's not an issue for the tests currently proposed in the CF APP. https://commitfest.postgresql.org/39/3835/ However the patch ends up, +0.75 to backpatch it to v15 rather than calling it a new feature in v16. -- Justin
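For readers following along, ZSTD_minCLevel() and ZSTD_maxCLevel() are plain zstd library calls, so checking a level against the library's runtime bounds is straightforward. A minimal sketch of the idea, assuming zstd >= 1.4.0 (this illustrates the approach, it is not the attached patch):

```C
/*
 * Minimal sketch of checking a zstd compression level against the
 * library's runtime bounds instead of hard-coded limits.  Assumes
 * zstd >= 1.4.0, where ZSTD_minCLevel() is part of the stable API.
 */
#include <stdbool.h>
#include <stdio.h>
#include <zstd.h>

static bool
zstd_level_in_range(int level)
{
	return level >= ZSTD_minCLevel() && level <= ZSTD_maxCLevel();
}

int
main(void)
{
	printf("valid zstd level range: %d..%d\n",
		   ZSTD_minCLevel(), ZSTD_maxCLevel());
	printf("level -7 accepted: %s\n", zstd_level_in_range(-7) ? "yes" : "no");
	printf("level 23 accepted: %s\n", zstd_level_in_range(23) ? "yes" : "no");
	return 0;
}
```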
Re: pg_basebackup's --gzip switch misbehaves
Justin Pryzby writes: > However the patch ends up, +0.75 to backpatch it to v15 rather than > calling it a new feature in v16. I don't have any opinion on the concrete merits of this change, but I want to note that 15rc1 wraps on Monday, and we don't like people pushing noncritical changes shortly before a wrap. There is not a lot of time for fooling around here. regards, tom lane
Re: Query JITing with LLVM ORC
Tom & Thomas: Thank you so much, those are very useful comments. I noticed that I didn't make my intentions very clear. My team's goal is to evaluate whether there are any gains in JITing PostgreSQL itself, or at least parts of it, rather than the expressions or parts of a query. The rationale for using PostgreSQL is that DBs are long-running applications, so the cost of JITing can be amortized. We have a prototype LLVM IR pass that outlines functions in a program to JIT and an ORC-based runtime to re-compile functions. Our goal is to see improvements due to target/sub-target specialization. The reason I was looking at benchmarks is to have a workload to profile PostgreSQL and find its bottlenecks. The hot functions would then be outlined for JITing. On Wed., Sep. 21, 2022, 4:54 p.m. Thomas Munro wrote: > On Thu, Sep 22, 2022 at 10:04 AM João Paulo Labegalini de Carvalho > wrote: > > building with and w/o passing --with-llvm > > BTW you can also just turn it off with runtime settings, no need to > rebuild.
Re: Fix snapshot name for SET TRANSACTION documentation
On 2022/09/21 14:40, Fujii Masao wrote: On 2022/09/21 12:01, Japin Li wrote: Hi hackers, In 6c2003f8a1bbc7c192a2e83ec51581c018aa162f, we changed the snapshot name when exporting a snapshot; however, there is one place where we missed updating the snapshot name in the documentation. Attached is a patch to fix it. Thanks for the patch! Looks good to me. Pushed. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: make additional use of optimized linear search routines
On Thu, Sep 22, 2022 at 09:52:57AM +0900, Michael Paquier wrote: > On Wed, Sep 21, 2022 at 02:28:08PM +0800, Richard Guo wrote: >> I wonder if there are other code paths we can replace with the linear >> search routines. I tried to search for them but no luck. > > I have been looking at a couple of simple patterns across the tree but > no luck here either. Well, if someone spots something, it could > always be done later. For now I have applied the bits discussed on > this thread. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
On Wed, Sep 21, 2022 at 02:41:28PM -0700, Peter Geoghegan wrote: > On Wed, Sep 21, 2022 at 2:11 PM Peter Geoghegan wrote: >> > Presumably a >> > generic WAL record compression mechanism could be reused for other large >> > records, too. That could be much easier than devising a deduplication >> > strategy for every record type. >> >> It's quite possible that that's a good idea, but that should probably >> work as an additive thing. That's something that I think of as a >> "clever technique", whereas I'm focussed on just not being naive in >> how we represent this one specific WAL record type. > > BTW, if you wanted to pursue something like this, that would work with > many different types of WAL record, ISTM that a "medium level" (not > low level) approach might be the best place to start. In particular, > the way that page offset numbers are represented in many WAL records > is quite space inefficient. A domain-specific approach built with > some understanding of how page offset numbers tend to look in practice > seems promising. I wouldn't mind giving this a try. > The representation of page offset numbers in PRUNE and VACUUM heapam > WAL records (and in index WAL records) always just stores an array of > 2 byte OffsetNumber elements. It probably wouldn't be all that > difficult to come up with a simple scheme for compressing an array of > OffsetNumbers in WAL records. It certainly doesn't seem like it would > be all that difficult to get it down to 1 byte per offset number in > most cases (even greater improvements seem doable). > > That could also be used for the xl_heap_freeze_page record type -- > though only after this patch is committed. The patch makes the WAL > record use a simple array of page offset numbers, just like in > PRUNE/VACUUM records. That's another reason why the approach > implemented by the patch seems like "the natural approach" to me. It's > much closer to how heapam PRUNE records work (we have a variable > number of arrays of page offset numbers in both cases). Yeah, it seems likely that we could pack offsets in single bytes in many cases. A more sophisticated approach could even choose how many bits to use per offset based on the maximum in the array. Furthermore, we might be able to make use of SIMD instructions to mitigate any performance penalty. I'm tempted to start by just using single-byte offsets when possible since that should be relatively simple while still yielding a decent improvement for many workloads. What do you think? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
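To make the single-byte idea concrete, here is a small sketch of the encoding being floated (my illustration with hypothetical names, not code from the thread): if every offset fits in a byte, emit one byte per offset and record that choice in a flag; otherwise fall back to the existing two-byte array.

```C
/*
 * Sketch of packing an array of 2-byte offset numbers into 1 byte each
 * when possible.  Illustration of the idea only; names are hypothetical
 * and the fallback simply mirrors the current raw 2-byte array.
 */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t OffsetNumber;

/*
 * Returns the number of bytes written into 'dest'; the caller would
 * record which encoding was chosen (e.g. via a flag bit in the record).
 */
static size_t
pack_offsets(const OffsetNumber *offsets, int noffsets,
			 uint8_t *dest, bool *packed)
{
	*packed = true;
	for (int i = 0; i < noffsets; i++)
	{
		if (offsets[i] > UINT8_MAX)
		{
			*packed = false;
			break;
		}
	}

	if (*packed)
	{
		for (int i = 0; i < noffsets; i++)
			dest[i] = (uint8_t) offsets[i];
		return (size_t) noffsets;
	}

	/* fall back to the existing 2-byte representation */
	memcpy(dest, offsets, noffsets * sizeof(OffsetNumber));
	return noffsets * sizeof(OffsetNumber);
}
```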
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
On Wed, Sep 21, 2022 at 02:11:36PM -0700, Peter Geoghegan wrote: > On Wed, Sep 21, 2022 at 1:14 PM Nathan Bossart > wrote: >> Presumably a >> generic WAL record compression mechanism could be reused for other large >> records, too. That could be much easier than devising a deduplication >> strategy for every record type. > > It's quite possible that that's a good idea, but that should probably > work as an additive thing. That's something that I think of as a > "clever technique", whereas I'm focussed on just not being naive in > how we represent this one specific WAL record type. Got it. I think that's a fair point. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pg_basebackup's --gzip switch misbehaves
On Wed, Sep 21, 2022 at 11:43:56PM -0400, Tom Lane wrote: > I don't have any opinion on the concrete merits of this change, > but I want to note that 15rc1 wraps on Monday, and we don't like > people pushing noncritical changes shortly before a wrap. There > is not a lot of time for fooling around here. If I were to do it in the next couple of hours, we'd still have quite a couple of days of coverage, which is plenty as far as I understand? That said, it is not a critical change. Just to give some numbers, for a freshly initdb'd instance base.tar.zst is at:
- 3.6MB at level 0.
- 3.8MB at level 1.
- 3.6MB at level 2.
- 4.3MB at level -1.
- 4.6MB at level -2.
- 6.1MB at level -7.
I am not sure if there would be a huge demand for this much control over the current [1,22], but the library wants to control the bounds dynamically and has the APIs to allow that. -- Michael signature.asc Description: PGP signature
Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory
On Thu, Sep 22, 2022 at 07:16:41AM +0530, Bharath Rupireddy wrote: > t/010_pg_basebackup.pl ... 134/? > # Failed test 'pg_basebackup reports checksum mismatch stderr > /(?^s:^WARNING.*checksum verification failed)/' > # at t/010_pg_basebackup.pl line 769. > # 'unrecognized win32 error code: 123WARNING: > checksum verification failed in file "./base/5/16399", block 0: > calculated 4C09 but expected B3F6 > # pg_basebackup: error: checksum error occurred Shouldn't we extend the mapping table in win32error.c so as the information provided is more useful when seeing this error, then? There could be other code path able to trigger this failure, or other hackers working on separate features that could benefit from this extra information. -- Michael signature.asc Description: PGP signature
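For reference, win32 error code 123 is ERROR_INVALID_NAME, and win32error.c maps win32 error codes to errno values through a lookup table, falling back to a generic errno plus the "unrecognized win32 error code" message seen in the test output above. The sketch below shows the general shape of such a mapping with an entry for code 123; mapping it to ENOENT is an assumption for illustration, not a settled choice, and the table contents here are abbreviated.

```C
/*
 * Self-contained sketch of mapping a win32 error code to an errno value,
 * in the style of a doserrors[]-like lookup table.  Error code 123 is
 * ERROR_INVALID_NAME; mapping it to ENOENT is an assumption, and EINVAL
 * as the fallback is illustrative only.
 */
#include <errno.h>
#include <stdio.h>

struct win32_errno_map
{
	unsigned long winerr;
	int			doserr;
};

static const struct win32_errno_map doserrors[] = {
	{2, ENOENT},				/* ERROR_FILE_NOT_FOUND */
	{3, ENOENT},				/* ERROR_PATH_NOT_FOUND */
	{5, EACCES},				/* ERROR_ACCESS_DENIED */
	{123, ENOENT},				/* ERROR_INVALID_NAME -- proposed addition */
};

static int
map_win32_error(unsigned long winerr)
{
	for (size_t i = 0; i < sizeof(doserrors) / sizeof(doserrors[0]); i++)
		if (doserrors[i].winerr == winerr)
			return doserrors[i].doserr;
	return EINVAL;				/* fallback for unrecognized codes */
}

int
main(void)
{
	printf("win32 error 123 -> errno %d\n", map_win32_error(123));
	return 0;
}
```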
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Sep 22, 2022 at 1:01 AM Nathan Bossart wrote: > > On Wed, Sep 21, 2022 at 01:17:21PM +0700, John Naylor wrote: > > > In short, this code needs to be lower level so that we still have full > > control while being portable. I will work on this, and also the related > > code for node dispatch. > > Is it possible to use approach #2 here, too? AFAICT space is allocated for > all of the chunks, so there wouldn't be any danger in searching all them > and discarding any results >= node->count. Sure, the caller could pass the maximum node capacity, and then check if the returned index is within the range of the node count. > Granted, we're depending on the > number of chunks always being a multiple of elements-per-vector in order to > avoid the tail path, but that seems like a reasonably safe assumption that > can be covered with comments. Actually, we don't need to depend on that at all. When I said "junk" above, that can be any bytes, as long as we're not reading off the end of allocated memory. We'll never do that here, since the child pointers/values follow. In that case, the caller can hard-code the size (it would even happen to work now to multiply rt_node_kind by 16, to be sneaky). One thing I want to try soon is storing fewer than 16/32 etc entries, so that the whole node fits comfortably inside a power-of-two allocation. That would allow us to use aset without wasting space for the smaller nodes, which would be faster and possibly would solve the fragmentation problem Andres referred to in https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de While on the subject, I wonder how important it is to keep the chunks in the small nodes in sorted order. That adds branches and memmove calls, and is the whole reason for the recent "pg_lfind_ge" function. -- John Naylor EDB: http://www.enterprisedb.com
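As a concrete illustration of searching the node's full fixed-size chunk array (junk tail included) and discarding out-of-range hits afterwards, here is a scalar sketch with hypothetical names; the real node layout and the vectorized version differ, but the bound check at the end is the point:

```C
/*
 * Sketch of searching a node's full chunk array (capacity, not count)
 * and letting the caller reject hits beyond the populated prefix.
 * Names and layout are hypothetical, not the patch's.
 */
#include <stdint.h>

#define NODE16_CAPACITY 16

typedef struct
{
	uint8_t		count;						/* populated entries */
	uint8_t		chunks[NODE16_CAPACITY];	/* slots past count hold junk */
} node16;

/* Return the index of 'chunk', or -1 if not present. */
static int
node16_find_chunk(const node16 *node, uint8_t chunk)
{
	int			idx = NODE16_CAPACITY;	/* "not found" sentinel */

	/*
	 * Scan all capacity slots unconditionally; this shape vectorizes
	 * easily and never reads past the allocation.  The "i < idx" test
	 * keeps the first (lowest) matching index.
	 */
	for (int i = 0; i < NODE16_CAPACITY; i++)
	{
		if (node->chunks[i] == chunk && i < idx)
			idx = i;
	}

	/* Discard matches that landed in the junk tail. */
	return (idx < node->count) ? idx : -1;
}
```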
Re: pg_basebackup's --gzip switch misbehaves
Michael Paquier writes: > On Wed, Sep 21, 2022 at 11:43:56PM -0400, Tom Lane wrote: >> I don't have any opinion on the concrete merits of this change, >> but I want to note that 15rc1 wraps on Monday, and we don't like >> people pushing noncritical changes shortly before a wrap. There >> is not a lot of time for fooling around here. > If I were to do it in the next couple of hours, we'd still have quite > a couple of days of coverage, which is plenty as far as I understand? Sure. I'd say we have 48 hours to choose whether to put this in v15. But not more than that. regards, tom lane
Re: [EXTERNAL] Re: [PoC] Federated Authn/z with OAUTHBEARER
First, my message from my corp email wasn't displayed in the thread; that is what Jacob replied to, so let me post it here for context:

> We can support both passing the token from an upstream client and libpq
> implementing the OAUTH2 protocol to obtain one.
>
> Libpq implementing OAUTHBEARER is needed for community/3rd-party tools to
> have a user-friendly authentication experience:
>
> 1. For community client tools, like pg_admin, psql etc.
> Example experience: pg_admin would be able to open a popup dialog to
> authenticate customers and keep refresh tokens to avoid asking the user
> frequently.
> 2. For 3rd-party connectors supporting generic OAUTH with any provider.
> Useful for data-viz clients, like Tableau or ETL tools. Those can support both
> user and client OAUTH flows.
>
> Libpq passing the token directly from an upstream client is useful in other
> scenarios:
> 1. Enterprise clients, built with .Net / Java and using provider-specific
> authentication libraries, like MSAL for AAD. Those can also support more
> advanced provider-specific token acquisition flows.
> 2. Resource-tight (like IoT) clients. Those can be compiled without the
> optional libpq flag, not including the iddawc or other dependency.

- On this:

> What I don't understand is how the OAUTHBEARER mechanism helps you in
> this case. You're short-circuiting the negotiation where the server
> tells the client what provider to use and what scopes to request, and
> instead you're saying "here's a secret string, just take it and
> validate it with magic."
>
> I realize the ability to pass an opaque token may be useful, but from
> the server's perspective, I don't see what differentiates it from the
> password auth method plus a custom authenticator plugin. Why pay for
> the additional complexity of OAUTHBEARER if you're not going to use
> it?

Yes, passing a token as a new auth method won't make much sense in isolation. However:
1. Since OAUTHBEARER is supported in the ecosystem, passing a token as a way to authenticate with OAUTHBEARER is more consistent (IMO) than passing it as a password.
2. Validation on the backend side doesn't depend on whether the token is obtained by libpq or transparently passed by the upstream client.
3. A single OAUTH auth method on the server side for both scenarios would allow both enterprise clients with their own token acquisition and community clients using libpq flows to connect as the same PG users/roles.

On Wed, Sep 21, 2022 at 8:36 PM Jacob Champion wrote:
> On Wed, Sep 21, 2022 at 3:10 PM Andrey Chudnovskiy wrote:
> > We can support both passing the token from an upstream client and libpq
> > implementing the OAUTH2 protocol to obtain one.
>
> Right, I agree that we could potentially do both.
>
> > Libpq passing the token directly from an upstream client is useful in other
> > scenarios:
> > 1. Enterprise clients, built with .Net / Java and using provider-specific
> > authentication libraries, like MSAL for AAD. Those can also support more
> > advanced provider-specific token acquisition flows.
> > 2. Resource-tight (like IoT) clients. Those can be compiled without the
> > optional libpq flag, not including the iddawc or other dependency.
>
> What I don't understand is how the OAUTHBEARER mechanism helps you in
> this case. You're short-circuiting the negotiation where the server
> tells the client what provider to use and what scopes to request, and
> instead you're saying "here's a secret string, just take it and
> validate it with magic."
> I realize the ability to pass an opaque token may be useful, but from
> the server's perspective, I don't see what differentiates it from the
> password auth method plus a custom authenticator plugin. Why pay for
> the additional complexity of OAUTHBEARER if you're not going to use
> it?
>
> --Jacob
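For readers less familiar with the mechanism under discussion: in SASL OAUTHBEARER (RFC 7628) the client ends up sending the bearer token inside a small GS2/key-value framed message, so "passing a token directly" essentially means libpq skips the token-acquisition flow and fills that frame with a caller-supplied string. A minimal sketch of building the client initial response (illustrative only, not libpq code):

```C
/*
 * Sketch of an RFC 7628 OAUTHBEARER client initial response:
 *     "n,," 0x01 "auth=Bearer <token>" 0x01 0x01
 * Illustrative only; real code needs proper validation and error checks.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *
build_oauthbearer_initial_response(const char *token, size_t *out_len)
{
	/* split string literals so "\x01" isn't glued onto the next letter */
	const char *prefix = "n,,\x01" "auth=Bearer ";
	size_t		len = strlen(prefix) + strlen(token) + 2;	/* + two 0x01 */
	char	   *buf = malloc(len + 1);

	if (buf == NULL)
		return NULL;
	snprintf(buf, len + 1, "%s%s\x01\x01", prefix, token);
	*out_len = len;
	return buf;
}

int
main(void)
{
	size_t		len;
	char	   *msg = build_oauthbearer_initial_response("opaque.jwt.here", &len);

	if (msg != NULL)
	{
		printf("client initial response is %zu bytes\n", len);
		free(msg);
	}
	return 0;
}
```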
Re: Background writer and checkpointer in crash recovery
On Thu, Sep 15, 2022 at 10:19:01PM -0500, Justin Pryzby wrote: > I don't know that this warrants an Open Item, but I think some fix > ought to be applied to v15, whether that happens this week or next > month. With RC1 getting close, I have looked at that again and applied a patch that resets the ps display after recovery is done, bringing us back to the same state as PG <= 14 in the startup process. -- Michael signature.asc Description: PGP signature
Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory
On Thu, Sep 22, 2022 at 10:07 AM Michael Paquier wrote: > > On Thu, Sep 22, 2022 at 07:16:41AM +0530, Bharath Rupireddy wrote: > > t/010_pg_basebackup.pl ... 134/? > > # Failed test 'pg_basebackup reports checksum mismatch stderr > > /(?^s:^WARNING.*checksum verification failed)/' > > # at t/010_pg_basebackup.pl line 769. > > # 'unrecognized win32 error code: 123WARNING: > > checksum verification failed in file "./base/5/16399", block 0: > > calculated 4C09 but expected B3F6 > > # pg_basebackup: error: checksum error occurred > > Shouldn't we extend the mapping table in win32error.c so as the > information provided is more useful when seeing this error, then? > There could be other code path able to trigger this failure, or other > hackers working on separate features that could benefit from this > extra information. Thanks. I will start a separate thread to discuss that. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-21 09:46:30 -0700, Andres Freund wrote: > I'm planning to commit this today, unless somebody wants to argue against > that. And done! Changes:
- fixed a few typos (thanks Thomas)
- less duplication in the CI tasks
- removed an incomplete implementation of the target for abbrevs.txt - do we even want to have that?
- plenty of hand-wringing on my part
I also rebased my meson git tree, which still has plenty of additional test platforms (netbsd, openbsd, debian sid, fedora rawhide, centos 8, centos 7, opensuse tumbleweed), but without the autoconf versions of those targets. I also added a commit that translates most of the CompilerWarnings task to meson. Still need to add a headerscheck / cpluspluscheck target. Greetings, Andres Freund
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Sep 22, 2022 at 1:46 PM John Naylor wrote: > > > On Thu, Sep 22, 2022 at 1:01 AM Nathan Bossart > wrote: > > > > On Wed, Sep 21, 2022 at 01:17:21PM +0700, John Naylor wrote: > > > > > In short, this code needs to be lower level so that we still have full > > > control while being portable. I will work on this, and also the related > > > code for node dispatch. > > > > Is it possible to use approach #2 here, too? AFAICT space is allocated for > > all of the chunks, so there wouldn't be any danger in searching all them > > and discarding any results >= node->count. > > Sure, the caller could pass the maximum node capacity, and then check if the > returned index is within the range of the node count. > > > Granted, we're depending on the > > number of chunks always being a multiple of elements-per-vector in order to > > avoid the tail path, but that seems like a reasonably safe assumption that > > can be covered with comments. > > Actually, we don't need to depend on that at all. When I said "junk" above, > that can be any bytes, as long as we're not reading off the end of allocated > memory. We'll never do that here, since the child pointers/values follow. In > that case, the caller can hard-code the size (it would even happen to work > now to multiply rt_node_kind by 16, to be sneaky). One thing I want to try > soon is storing fewer than 16/32 etc entries, so that the whole node fits > comfortably inside a power-of-two allocation. That would allow us to use aset > without wasting space for the smaller nodes, which would be faster and > possibly would solve the fragmentation problem Andres referred to in > > https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de > > While on the subject, I wonder how important it is to keep the chunks in the > small nodes in sorted order. That adds branches and memmove calls, and is the > whole reason for the recent "pg_lfind_ge" function. Good point. While keeping the chunks in the small nodes in sorted order is useful for visiting all keys in sorted order, additional branches and memmove calls could be slow. Regards, -- Masahiko Sawada PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
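To illustrate the trade-off, here is a sketch (hypothetical layout, not the patch) of what keeping a node's chunks sorted costs on insertion compared with a plain append; the unsorted variant drops the search and memmove, but pushes the ordering work onto whoever needs sorted iteration:

```C
/*
 * Sketch of the trade-off being discussed: inserting a chunk while
 * keeping the array sorted needs a search plus a memmove, whereas an
 * unsorted node can simply append.  Callers are assumed to have checked
 * that count < NODE16_CAPACITY before inserting.
 */
#include <stdint.h>
#include <string.h>

#define NODE16_CAPACITY 16

typedef struct
{
	uint8_t		count;
	uint8_t		chunks[NODE16_CAPACITY];
} node16;

/* Sorted insert: find the slot, shift the tail, then store. */
static void
node16_insert_sorted(node16 *node, uint8_t chunk)
{
	int			insertpos = 0;

	while (insertpos < node->count && node->chunks[insertpos] < chunk)
		insertpos++;

	memmove(&node->chunks[insertpos + 1], &node->chunks[insertpos],
			node->count - insertpos);
	node->chunks[insertpos] = chunk;
	node->count++;
}

/* Unsorted insert: just append; lookups must scan, but no memmove. */
static void
node16_insert_unsorted(node16 *node, uint8_t chunk)
{
	node->chunks[node->count++] = chunk;
}
```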