Re: A doubt about a newly added errdetail
On 2022-Sep-27, Kyotaro Horiguchi wrote: > By the way, this is not an issue caused by the proposed patch, I see > the following message in the patch. > > - errdetail("Column list cannot be used > for a partitioned table when %s is false.", > + errdetail("Column list cannot be > specified for a partitioned table when %s is false.", > > "publish_via_partition_root"))); > > I think that the purpose of such separation of variable names is to > unify multiple messages differing only by the names (to keep > translation labor (relatively:p) low). In that sense, that separation > here is useless since I see no chance of having the same message with > another variable in future. Well, it also reduces chances for typos and such, so while it's not strictly necessary to do it this way, I tend to prefer it on new messages. However, as you say it's not very interesting when there's no possibility of duplication, so changing existing messages to this style when we have no other reason to change the message, is not a useful use of time. In this case we're changing the message in another way too, so I think it's okay. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La primera ley de las demostraciones en vivo es: no trate de usar el sistema. Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
Re: [RFC] building postgres with meson - v13
On Tue, Sep 27, 2022 at 2:06 AM Andres Freund wrote: > > On 2022-09-26 15:18:29 +0700, John Naylor wrote: > > Either way it doesn't exist on this machine. I was able to get a working > > build with > > > > /usr/local/Homebrew/Library/Homebrew/os/mac/pkgconfig > > Hm - what did you need this path for - I don't think that should be needed. I just cargo-culted the pattern from Arm (before I figured out it was Arm) and used the "find" command to look for the directories by name. I tried again without specifying any of the three directory flags, and I can run the tests getting: Ok: 233 Expected Fail: 0 Fail: 0 Unexpected Pass:0 Skipped:2 Timeout:0 ...which is fine for me since I don't do much development on MacOS nowadays. > > 1) /opt/homebrew/ seems to be an "Apple silicon" path? > > Yea, it's /usr/local on x86-64, based on what was required to make macos CI > work. I updated the wiki page, half-blindly - it'd be nice if you could > confirm that that works? Not sure if you intended for me to try the full script in your last response or just what's in the wiki page, but for the latter (on commit bed0927aeb0c6), it fails at [1656/2199] Linking target src/bin/psql/psql FAILED: src/bin/psql/psql clang -o src/bin/psql/psql src/bin/psql/psql.p/meson-generated_.._psqlscanslash.c.o src/bin/psql/psql.p/meson-generated_.._sql_help.c.o src/bin/psql/psql.p/command.c.o src/bin/psql/psql.p/common.c.o src/bin/psql/psql.p/copy.c.o src/bin/psql/psql.p/crosstabview.c.o src/bin/psql/psql.p/describe.c.o src/bin/psql/psql.p/help.c.o src/bin/psql/psql.p/input.c.o src/bin/psql/psql.p/large_obj.c.o src/bin/psql/psql.p/mainloop.c.o src/bin/psql/psql.p/prompt.c.o src/bin/psql/psql.p/startup.c.o src/bin/psql/psql.p/stringutils.c.o src/bin/psql/psql.p/tab-complete.c.o src/bin/psql/psql.p/variables.c.o -L/usr/local/opt/readline/lib -L/usr/local/opt/gettext/lib -L/usr/local/opt/zlib/lib -L/usr/local/opt/openssl/lib -I/usr/local/opt/readline/include -I/usr/local/opt/gettext/include -I/usr/local/opt/zlib/include -I/usr/local/opt/openssl/include -Wl,-dead_strip_dylibs -Wl,-headerpad_max_install_names -Wl,-undefined,error -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk -Wl,-rpath,@loader_path/../../interfaces/libpq -Wl,-rpath,/usr/local/lib -Wl,-rpath,/usr/local/Cellar/zstd/1.5.2/lib src/fe_utils/libpgfeutils.a src/common/libpgcommon.a src/common/libpgcommon_ryu.a src/common/libpgcommon_config_info.a src/port/libpgport.a src/port/libpgport_crc.a src/interfaces/libpq/libpq.5.dylib -lm /usr/local/lib/libintl.dylib -ledit -lz /usr/local/Cellar/zstd/1.5.2/lib/libzstd.dylib -lz -lz -lz Undefined symbols for architecture x86_64: "_rl_completion_suppress_quote", referenced from: _psql_completion in tab-complete.c.o _quote_file_name in tab-complete.c.o _complete_from_files in tab-complete.c.o "_rl_filename_dequoting_function", referenced from: _initialize_readline in tab-complete.c.o "_rl_filename_quote_characters", referenced from: _initialize_readline in tab-complete.c.o "_rl_filename_quoting_function", referenced from: _initialize_readline in tab-complete.c.o ld: symbol(s) not found for architecture x86_64 clang: error: linker command failed with exit code 1 (use -v to see invocation) -- John Naylor EDB: http://www.enterprisedb.com
Re: pg_rewind WAL segments deletion pitfall
At Thu, 1 Sep 2022 13:33:09 +0200, Polina Bungina wrote in > Here is the new version of the patch that includes the changes you > suggested. It is smaller now but I doubt if it is as easy to understand as > it used to be. pg_rewind works in two steps. First it constructs a file map which decides the action for each file, then second, it performs file operations according to the file map. So, if we are going to do something on some files, that action should be recorded in the file map, I think. Regarding the patch, pg_rewind starts reading segments from the divergence point back to the nearest checkpoint, then moves forward during rewinding. So, the fact that SimpleXLogPageRead has read a segment suggests that the segment is required during the next startup. So I don't think we need to move around the keepWalSeg flag. All files that are wanted while rewinding should be preserved unconditionally. It's annoying that the file path for the file map and open(2) have different top directories. But sharing the same path string between the two seems rather ugly.. I feel uncomfortable directly touching the internals of file_entry_t outside filemap.c. I'd like to hide the internals in filemap.c, but pg_rewind already does that.. + /* +* Some entries (WAL segments) already have an action assigned +* (see SimpleXLogPageRead()). +*/ + if (entry->action == FILE_ACTION_NONE) + continue; entry->action = decide_file_action(entry); It might be more reasonable to call decide_file_action() when the action is UNDECIDED. > The need of manipulations with the target’s pg_wal/archive_status directory > is a question to discuss… > > At first glance it seems to be useless for .ready files: checkpointer > process will anyway recreate them if archiving is enabled on the rewound > old primary and we will finally have them in the archive. As for the .done > files, it seems reasonable to follow the pg_basebackup logic and keep .done > files together with the corresponding segments (those between the last > common checkpoint and the point of divergence) to protect them from being > archived once again. > > But on the other hand it seems to be not that straightforward: imaging we > have WAL segment X on the target along with X.done file and we decide to > preserve them both (or we download it from archive and force .done file > creation), while archive_mode was set to ‘always’ and the source (promoted > replica) also still has WAL segment X and X.ready file. After pg_rewind we > will end up with both X.ready and X.done, which seems to be not a good > situation (but most likely not critical either). Thanks for the thought. Yes, it's not so straightforward. And, as you mentioned, the worst result of not doing that is that some already-archived segments are archived at the next run, which is generally harmless. So I think we're ok to ignore that in this patch and then create another patch if we still want to do that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
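[Editor's note: to make the suggestion above concrete -- consulting decide_file_action() only while an entry's action is still undecided -- here is a minimal, illustrative sketch. It follows the file_entry_t / file_action_t naming used by pg_rewind's filemap.h, but the loop is a sketch rather than the actual patch code.]

    /*
     * Illustrative sketch only: entries whose action was pre-assigned
     * elsewhere (e.g. WAL segments marked for preservation from
     * SimpleXLogPageRead()) are left alone, and decide_file_action() is
     * consulted only while the entry is still FILE_ACTION_UNDECIDED.
     */
    for (i = 0; i < filemap->nentries; i++)
    {
        file_entry_t *entry = filemap->entries[i];

        if (entry->action == FILE_ACTION_UNDECIDED)
            entry->action = decide_file_action(entry);
    }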
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
On Thu, 22 Sept 2022 at 18:13, Andres Freund wrote: > Due to the merge of the meson based build this patch needs to be > adjusted: https://cirrus-ci.com/build/6350479973154816 > > Looks like you need to add amcheck--1.3--1.4.sql to the list of files to be > installed and t/004_verify_nbtree_unique.pl to the tests. > > Greetings, > > Andres Freund > Thanks! Fixed. -- Best regards, Maxim Orlov. v17-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-u.patch Description: Binary data
Re: Add last_vacuum_index_scans in pg_stat_all_tables
On 2022-Sep-16, Fujii Masao wrote: > Could you tell me why the number of index scans should be tracked for > each table? Instead, isn't it enough to have one global counter, to > check whether the current setting of maintenance_work_mem is sufficient > or not? That is, I'm thinking to have something like pg_stat_vacuum view > that reports, for example, the number of vacuum runs, the total > number of index scans, the maximum number of index scans by one > vacuum run, the number of cancellation of vacuum because of > lock conflicts, etc. If so, when these global counters are high or > increasing, we can think that it may worth tuning maintenance_work_mem. I think that there are going to be cases where some tables in a database definitely require multiple index scans no matter what; but you definitely want to know how many occurred for other, not-so-highly-trafficked tables. So I *think* a single counter across the whole database might not be sufficient. The way I imagine using this (and I haven't operated databases in quite a while so this may be all wet) is that I would have a report of which tables have the highest numbers of index scans, then study the detailed vacuum reports for those tables as a way to change autovacuum_work_mem. On the other hand, we have an absolute high cap of 1 GB for autovacuum's work_mem, and many systems are already using that as the configured value. Maybe trying to fine-tune it is a waste of time. If a 1TB table says that it had 4 index scans, what are you going to do about it? It's a lost cause. It sounds like we need more code changes so that more memory can be used, and also changes so that that memory is used more efficiently. We had a patch for this; I don't know if that was committed already. -- Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)
Re: Allow foreign keys to reference a superset of unique columns
On Tue, 27 Sept 2022 at 06:08, James Coleman wrote: > > On Mon, Sep 26, 2022 at 9:59 AM Wolfgang Walther > wrote: > > So no need for me to distract this thread from $subject anymore. I think > > the idea of allowing to create unique constraints on a superset of the > > columns of an already existing unique index is a good one, so let's > > discuss this further. > > Sounds good to me! I don't see any immediate problems with allowing UNIQUE constraints to be supported using a unique index which contains only a subset of columns that are mentioned in the constraint. There would be a few things to think about. e.g INSERT ON CONFLICT might need some attention as a unique constraint can be specified for use as the arbiter. Perhaps the patch could be broken down as follows: 0001: * Extend ALTER TABLE ADD CONSTRAINT UNIQUE syntax to allow a column list when specifying USING INDEX. * Add checks to ensure the index in USING INDEX contains only columns mentioned in the column list. * Do any required work for INSERT ON CONFLICT. I've not looked at the code but maybe some adjustments are required for where it gets the list of columns. * Address any other places that assume the supporting index contains all columns of the unique constraint. 0002: * Adjust transformFkeyCheckAttrs() to have it look at UNIQUE constraints as well as unique indexes * Ensure information_schema.referential_constraints view still works correctly. I think that would address all of Tom's concerns he mentioned in [1]. I wasn't quite sure I understood the NOT NULL concern there since going by RI_FKey_pk_upd_check_required(), we don't enforce FKs when the referenced table has a NULL in the FK's columns. David [1] https://www.postgresql.org/message-id/3057718.1658949...@sss.pgh.pa.us
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
This commit introduced BackupState struct. The comment of do_pg_backup_start says that: > * It fills in backup_state with the information required for the backup, And the parameters are: > do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, > BackupState *state, StringInfo tblspcmapfile) So backup_state is different from both the type BackupState and the parameter state. I find it annoying. Don't we either rename the parameter or fix the comment? The parameter "state" sounds a bit too generic. So I prefer to rename the parameter to backup_state, as the attached. What do you think about this? regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 1dd6df0fe1..715d5868eb 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8244,10 +8244,10 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) * function. It creates the necessary starting checkpoint and constructs the * backup state and tablespace map. * - * Input parameters are "state" (the backup state), "fast" (if true, we do - * the checkpoint in immediate mode to make it faster), and "tablespaces" - * (if non-NULL, indicates a list of tablespaceinfo structs describing the - * cluster's tablespaces.). + * Input parameters are "backup_state", "fast" (if true, we do the checkpoint + * in immediate mode to make it faster), and "tablespaces" (if non-NULL, + * indicates a list of tablespaceinfo structs describing the cluster's + * tablespaces.). * * The tablespace map contents are appended to passed-in parameter * tablespace_map and the caller is responsible for including it in the backup @@ -8269,11 +8269,11 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) */ void do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, - BackupState *state, StringInfo tblspcmapfile) + BackupState *backup_state, StringInfo tblspcmapfile) { boolbackup_started_in_recovery = false; - Assert(state != NULL); + Assert(backup_state != NULL); backup_started_in_recovery = RecoveryInProgress(); /* @@ -8292,7 +8292,7 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, errmsg("backup label too long (max %d bytes)", MAXPGPATH))); - memcpy(state->name, backupidstr, strlen(backupidstr)); + memcpy(backup_state->name, backupidstr, strlen(backupidstr)); /* * Mark backup active in shared memory. We must do full-page WAL writes @@ -8385,9 +8385,9 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, * pointer. 
*/ LWLockAcquire(ControlFileLock, LW_SHARED); - state->checkpointloc = ControlFile->checkPoint; - state->startpoint = ControlFile->checkPointCopy.redo; - state->starttli = ControlFile->checkPointCopy.ThisTimeLineID; + backup_state->checkpointloc = ControlFile->checkPoint; + backup_state->startpoint = ControlFile->checkPointCopy.redo; + backup_state->starttli = ControlFile->checkPointCopy.ThisTimeLineID; checkpointfpw = ControlFile->checkPointCopy.fullPageWrites; LWLockRelease(ControlFileLock); @@ -8404,7 +8404,7 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, recptr = XLogCtl->lastFpwDisableRecPtr; SpinLockRelease(&XLogCtl->info_lck); - if (!checkpointfpw || state->startpoint <= recptr) + if (!checkpointfpw || backup_state->startpoint <= recptr) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("WAL generated with full_page_writes=off was replayed " @@ -8436,9 +8436,9 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, * either because only few buffers have been dirtied yet. */ WALInsertLockAcquireExclusive(); - if (XLogCtl->Insert.lastBackupStart < state->startpoint) + if (XLogCtl->Insert.lastBackupStart < backup_state->startpoint) { - XLogCtl->Insert.lastBackupStart = state->startpoint; + XLogCtl->Insert.lastBackupStart = backup_state->star
Re: Avoid memory leaks during base backups
At Tue, 27 Sep 2022 11:33:56 +0530, Bharath Rupireddy wrote in > On Mon, Sep 26, 2022 at 7:34 PM Tom Lane wrote: > > This ... seems like inventing your own shape of wheel. The > > normal mechanism for preventing this type of leak is to put the > > allocations in a memory context that can be reset or deallocated > > in mainline code at the end of the operation. > > Yes, that's the typical way and the patch attached does it for > perform_base_backup(). What happens if we allocate some memory in the > new memory context and error-out before reaching the end of operation? > How do we deallocate such memory? Whoever directly or indirectly catches the exception can do that. For example, SendBaseBackup() seems to catch exceptions from perform_base_backup(). bbsinc_cleanup() already resides there. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
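[Editor's note: for readers unfamiliar with the pattern Tom and Kyotaro describe, a hedged sketch of the usual backend idiom follows. The context name and the placement of the cleanup inside a PG_CATCH block are illustrative only; as noted above, in this particular case the existing error-cleanup callback in basebackup.c would be the natural place to delete the context.]

    MemoryContext backup_context;   /* hypothetical name */
    MemoryContext oldcontext;

    backup_context = AllocSetContextCreate(CurrentMemoryContext,
                                           "base backup work context",
                                           ALLOCSET_DEFAULT_SIZES);
    oldcontext = MemoryContextSwitchTo(backup_context);

    PG_TRY();
    {
        /* ... perform the operation; all palloc()s land in backup_context ... */
    }
    PG_CATCH();
    {
        /* error path: free everything allocated above, then re-throw */
        MemoryContextSwitchTo(oldcontext);
        MemoryContextDelete(backup_context);
        PG_RE_THROW();
    }
    PG_END_TRY();

    /* mainline path: free everything in one shot */
    MemoryContextSwitchTo(oldcontext);
    MemoryContextDelete(backup_context);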
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On Tue, Sep 27, 2022 at 1:54 PM Kyotaro Horiguchi wrote: > > This commit introduced BackupState struct. The comment of > do_pg_backup_start says that: > > > * It fills in backup_state with the information required for the backup, > > And the parameters are: > > > do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces, > > BackupState *state, StringInfo > > tblspcmapfile) > > So backup_state is different from both the type BackupState and the > parameter state. I find it annoying. Don't we either rename the > parameter or fix the comment? > > The parameter "state" sounds a bit too generic. So I prefer to rename > the parameter to backup_state, as the attached. > > What do you think about this? -1 from me. We have the function context and the structure name there to represent that the parameter name 'state' is actually 'backup state'. I don't think we gain anything here. Wherever a BackupState variable is used outside of a function, the name backup_state is used, e.g. the static variable in xlogfuncs.c. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Data is copied twice when specifying both child and parent table in publication
Here are my review comments for the HEAD_v11-0001 patch: == 1. General - Another related bug? In [1] Hou-san wrote: For another case you mentioned (via_root used when publishing child) CREATE PUBLICATION pub1 for TABLE parent; CREATE PUBLICATION pub2 for TABLE child with (publish_via_partition_root); CREATE SUBSCRIPTION sub connect xxx PUBLICATION pub1,pub2; The expected behavior is only the child table is published, all the changes should be replicated using the child table's identity. We should do table sync only for child tables and is same as the current behavior on HEAD. So, I think there is no bug in this case. ~ That behaviour seems different to my understanding because the pgdocs says when the via_root param is true the 'child' table would be using the 'parent' identity: [2] publish_via_partition_root determines whether changes in a partitioned table (or on its partitions) contained in the publication will be published using the identity and schema of the partitioned table rather than that of the individual partitions that are actually changed. ~ So is this another bug (slightly different from the current one being patched), or is it just some different special behaviour? If it's another bug then you need to know that ASAP because I think you may want to fix both of them at the same time which might impact how this 2x data copy patch should be implemented. Or perhaps just the pgdocs need more notes about special cases/combinations like this? == 2. General - documentation? For this current patch, IIUC it was decided that it is a bug because the data gets duplicated, and then some sensible rule was decided that this patch should use to address it (e.g. publishing a child combined with publishing a parent via_root will just ignore the child's publication...). So my question is - is this (new/fixed) behaviour something that a user will be able to figure out themselves from the existing documentation, or does this patch now need its own special notes in the documentation? == 3. src/backend/catalog/pg_publication.c - pg_get_publication_tables + foreach(lc, pub_elem_tables) + { + Oid *result = (Oid *) malloc(sizeof(Oid) * 2); + + result[0] = lfirst_oid(lc); + result[1] = pub_elem->oid; + table_infos = lappend(table_infos, result); + } 3a. It looks like each element in the table_infos list is a malloced obj of 2x Oids (Oid of table, Oid of pub). IMO better to call this element 'table_info' instead of the meaningless 'result' ~ 3b. Actually, I think it would be better if this function defines a little 2-element structure {Oid relid, Oid pubid} to use instead of this array (which requires knowledge that [0] means relid and [1] means pubid). ~~~ 4. + foreach(lc, table_infos) + { + Oid *table_info_tmp = (Oid *) lfirst(lc); + + if (!list_member_oid(tables, table_info_tmp[0])) + table_infos = foreach_delete_current(table_infos, lc); } I think the '_tmp' suffix is not helpful here - IMO having another relid variable would make this more self-explanatory. Or better yet adopt the suggestion o f #3b and have a little struct with self-explanatory member names. = 5. src/backend/commands/subscriptioncmds.c - fetch_table_list + if (server_version >= 16) + appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n" Since there is an else statement block, I think this would be more readable if there was a statement block here too. YMMV SUGGESTION if (server_version >= 16) { appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n" ... } ~~~ 6. 
+ /* + * Get the list of tables from publisher, the partition table whose + * ancestor is also in this list will be ignored, otherwise the initial + * data in the partition table would be replicated twice. + */ 6a. "from publisher, the partition" -> "from the publisher. The partition" ~ 6b. This looks like a common comment that also applied to the "if" part, so it seems more appropriate to move it to where I indicated below. Perhaps the whole comment needs a bit of massaging after you move it... + /* + * Get namespace, relname and column list (if supported) of the tables + * belonging to the specified publications. + * + * HERE < + * + * From version 16, the parameter of the function pg_get_publication_tables + * can be an array of publications. The partition table whose ancestor is + * also published in this publication array will be filtered out in this + * function. + */ -- [1] https://www.postgresql.org/message-id/OS0PR01MB5716A30DDEECC59132E1084F94799%40OS0PR01MB5716.jpnprd01.prod.outlook.com [2] https://www.postgresql.org/docs/devel/sql-createpublication.html Kind Regards, Peter Smith. Fujitsu Australia
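[Editor's note: as a concrete illustration of review comment #3b above, the malloc'd two-Oid array could become a small named struct; the type and field names below are only suggestions, not code from the patch.]

    typedef struct published_rel
    {
        Oid     relid;      /* OID of the published table */
        Oid     pubid;      /* OID of the publication it belongs to */
    } published_rel;

Elements of table_infos would then read as info->relid / info->pubid instead of the positional result[0] / result[1].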
Re: Fast COPY FROM based on batch insert
On Tue, Aug 23, 2022 at 2:58 PM Andrey Lepikhov wrote: > On 22/8/2022 11:44, Etsuro Fujita wrote: > > I think the latter is more consistent with the existing error context > > information when in CopyMultiInsertBufferFlush(). Actually, I thought > > this too, and I think this would be useful when the COPY FROM command > > is executed on a foreign table. My concern, however, is the case when > > the command is executed on a partitioned table containing foreign > > partitions; in that case the input data would not always be sorted in > > the partition order, so the range for an error-occurring foreign > > partition might contain many lines with rows from other partitions, > > which I think makes the range information less useful. Maybe I'm too > > worried about that, though. > I got your point. Indeed, perharps such info doesn't really needed to be > included into the core, at least for now. Ok. Sorry for the late response. Best regards, Etsuro Fujita
Re: Differentiate MERGE queries with different structures
Pushed, thanks. -- Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/ It does it in a really, really complicated way; why does it need to be complicated? Because it's MakeMaker.
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
At Tue, 27 Sep 2022 14:03:24 +0530, Bharath Rupireddy wrote in > On Tue, Sep 27, 2022 at 1:54 PM Kyotaro Horiguchi > wrote: > > What do you think about this? > > -1 from me. We have the function context and the structure name there > to represent that the parameter name 'state' is actually 'backup > state'. I don't think we gain anything here. Whenever the BackupState > is used away from any function, the variable name backup_state is > used, static variable in xlogfuncs.c There's no shadowing caused by the change. If we mind the same variable names between files, we could rename backup_state in xlogfuncs.c to process_backup_state or session_backup_state. If this is still unacceptable, I propose to change the comment. (I found that the previous patch forgets about do_pg_backup_stop()) - * It fills in backup_state with the information required for the backup, + * It fills in the parameter "state" with the information required for the backup, (This is following the notation just above) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [PATCH] Log details for client certificate failures
Hi, On Tue, Sep 13, 2022 at 11:11 PM Peter Eisentraut wrote: > > On 09.09.22 00:32, Jacob Champion wrote: > > On Thu, Jul 28, 2022 at 9:19 AM Jacob Champion > > wrote: > >> On Thu, Jul 21, 2022 at 4:29 PM Jacob Champion > >> wrote: > >>> v4 attempts to fix this by letting the check hooks pass > >>> MCXT_ALLOC_NO_OOM to pg_clean_ascii(). (It's ignored in the frontend, > >>> which just mallocs.) > >> > >> Ping -- should I add an open item somewhere so this isn't lost? > > > > Trying again. Peter, is this approach acceptable? Should I try something > > else? > > This looks fine to me. Committed. While looking at the recent changes for check_cluster_name() I found this thread. Regarding the following change made by the commit 45b1a67a0fc, there is possibly small memory leak: static bool check_cluster_name(char **newval, void **extra, GucSource source) { + char *clean; + /* Only allow clean ASCII chars in the cluster name */ - pg_clean_ascii(*newval); + clean = pg_clean_ascii(*newval, MCXT_ALLOC_NO_OOM); + if (!clean) + return false; + + clean = guc_strdup(WARNING, clean); + if (!clean) + return false; + *newval = clean; return true; } pg_clean_ascii() does palloc_extended() to allocate memory in Postmaster context for the new characters and the clean is then replaced with the new memory allocated by guc_strdup(). No-one references the memory allocated by pg_clean_ascii() and it lasts for postmaster lifetime. Valgrind memcheck also shows: 1 bytes in 1 blocks are definitely lost in loss record 4 of 70 at 0xCD2A16: palloc_extended (mcxt.c:1239) by 0xD09437: pg_clean_ascii (string.c:99) by 0x7A5CF3: check_cluster_name (variable.c:1061) by 0xCAF7CD: call_string_check_hook (guc.c:6365) by 0xCAA724: InitializeOneGUCOption (guc.c:1439) by 0xCAA0ED: InitializeGUCOptions (guc.c:1268) by 0x99B245: PostmasterMain (postmaster.c:691) by 0x858896: main (main.c:197) I think we can fix it by the attached patch but I'd like to discuss whether it's worth fixing it. Regards, -- Masahiko Sawada PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c index c795cb7a29..e555fb3150 100644 --- a/src/backend/commands/variable.c +++ b/src/backend/commands/variable.c @@ -1025,17 +1025,22 @@ bool check_application_name(char **newval, void **extra, GucSource source) { char *clean; + char *ret; /* Only allow clean ASCII chars in the application name */ clean = pg_clean_ascii(*newval, MCXT_ALLOC_NO_OOM); if (!clean) return false; - clean = guc_strdup(WARNING, clean); - if (!clean) + ret = guc_strdup(WARNING, clean); + if (!ret) + { + pfree(clean); return false; + } - *newval = clean; + pfree(clean); + *newval = ret; return true; } @@ -1056,17 +1061,22 @@ bool check_cluster_name(char **newval, void **extra, GucSource source) { char *clean; + char *ret; /* Only allow clean ASCII chars in the cluster name */ clean = pg_clean_ascii(*newval, MCXT_ALLOC_NO_OOM); if (!clean) return false; - clean = guc_strdup(WARNING, clean); - if (!clean) + ret = guc_strdup(WARNING, clean); + if (!ret) + { + pfree(clean); return false; + } - *newval = clean; + pfree(clean); + *newval = ret; return true; }
Re: Fast COPY FROM based on batch insert
On Wed, Aug 10, 2022 at 5:30 PM Etsuro Fujita wrote: > On Wed, Aug 10, 2022 at 1:06 AM Zhihong Yu wrote: > > + /* If any rows were inserted, run AFTER ROW INSERT triggers. */ > > ... > > + for (i = 0; i < inserted; i++) > > + { > > + TupleTableSlot *slot = rslots[i]; > > ... > > + slot->tts_tableOid = > > + RelationGetRelid(resultRelInfo->ri_RelationDesc); > > > > It seems the return value of > > `RelationGetRelid(resultRelInfo->ri_RelationDesc)` can be stored in a > > variable outside the for loop. > > Inside the for loop, assign this variable to slot->tts_tableOid. > > Actually, I did this to match the code in ExecBatchInsert(), but that > seems like a good idea, so I’ll update the patch as such in the next > version. Done. I also adjusted the code in CopyMultiInsertBufferFlush() a bit further. No functional changes. I put back in the original position an assertion ensuring the FDW supports batching. Sorry for the back and forth. Attached is an updated version of the patch. Other changes are: * The previous patch modified postgres_fdw.sql so that the existing test cases for COPY FROM were tested in batch-insert mode. But I think we should keep them as-is to test the default behavior, so I added test cases for this feature by copying-and-pasting some of the existing test cases. Also, the previous patch added this: +create table foo (a int) partition by list (a); +create table foo1 (like foo); +create foreign table ffoo1 partition of foo for values in (1) + server loopback options (table_name 'foo1'); +create table foo2 (like foo); +create foreign table ffoo2 partition of foo for values in (2) + server loopback options (table_name 'foo2'); +create function print_new_row() returns trigger language plpgsql as $$ + begin raise notice '%', new; return new; end; $$; +create trigger ffoo1_br_trig before insert on ffoo1 + for each row execute function print_new_row(); + +copy foo from stdin; +1 +2 +\. Rather than doing so, I think it would be better to use a partitioned table defined in the above section “test tuple routing for foreign-table partitions”, to save cycles. So I changed this as such. * I modified comments a bit further and updated docs. That is it. I will review the patch a bit more, but I feel that it is in good shape. Best regards, Etsuro Fujita v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-3.patch Description: Binary data
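[Editor's note: for reference, the shape of the change discussed at the top of this message -- hoisting the relation OID lookup out of the per-returned-slot loop -- is roughly the following sketch. Variable names follow the quoted snippet; the surrounding code is omitted and the trigger call is only indicated by a comment.]

    Oid     relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);

    /* If any rows were inserted, run AFTER ROW INSERT triggers. */
    for (i = 0; i < inserted; i++)
    {
        TupleTableSlot *slot = rslots[i];

        slot->tts_tableOid = relid;
        /* ... fire the AFTER ROW INSERT trigger for this slot ... */
    }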
RE: A doubt about a newly added errdetail
On Tuesday, September 27, 2022 3:21 PM Alvaro Herrera wrote: > > On 2022-Sep-27, Kyotaro Horiguchi wrote: > > > By the way, this is not an issue caused by the proposed patch, I see > > the following message in the patch. > > > > -errdetail("Column list cannot be used > for a partitioned table when %s is false.", > > +errdetail("Column list cannot be > specified for a partitioned > > +table when %s is false.", > > > "publish_via_partition_root"))); > > > > I think that the purpose of such separation of variable names is to > > unify multiple messages differing only by the names (to keep > > translation labor (relatively:p) low). In that sense, that separation > > here is useless since I see no chance of having the same message with > > another variable in future. > > Well, it also reduces chances for typos and such, so while it's not strictly > necessary to do it this way, I tend to prefer it on new messages. However, as > you say it's not very interesting when there's no possibility of duplication, > so > changing existing messages to this style when we have no other reason to > change the message, is not a useful use of time. In this case we're changing > the message in another way too, so I think it's okay. Thanks for reviewing! Just in case I misunderstand, it seems you mean the message style[1] is OK, right ? [1] errdetail("Column list cannot be specified for a partitioned table when %s is false.", "publish_via_partition_root"))); Best regards, Hou zj
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
On Tue, Sep 27, 2022 at 6:43 PM Bharath Rupireddy wrote: > On Tue, Sep 27, 2022 at 3:08 AM Thomas Munro wrote: > > > > I don't think so, that's an extra kernel call. I think I'll just have > > to revert part of my recent change that removed the pg_ prefix from > > those function names in our code, and restore the comment that warns > > you about the portability hazard (I thought it went away with HP-UX > > 10, where we were literally calling lseek() before every write()). > > The majority of users of these functions don't intermix them with > > calls to read()/write(), so they don't care about the file position, > > so I think it's just something we'll have to continue to be mindful of > > in the places that do. > > Yes, all of the existing pwrite() callers don't care about the file > position, but the new callers such as the actual idea and patch > proposed here in this thread cares. > > Is this the commit cf112c122060568aa06efe4e6e6fb9b2dd4f1090 part of > which [1] you're planning to revert? Yeah, just the renaming parts of that. The lseek()-based emulation is definitely not coming back. Something like the attached. From fc72be32668f0c37b899fd922d6986f39d8a6a2a Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 27 Sep 2022 21:12:03 +1300 Subject: [PATCH] Restore pg_pread and friends. Commit cf112c12 was a little too hasty in getting rid of the pg_ prefixes where we use pread(), pwrite() and vectored variants. We dropped support for ancient Unixes that needed to use lseek() to implement replacements for those, but it turned out that Windows also changes the current position even when you pass in an offset to ReadFile() and WriteFile(), if the file handle is synchronous. Switching to asynchronous file handles would fix that, but have other complications. For now let's just put back the pg_ prefix and add some comments to highlight the non-standard side-effect. 
Reported-by: Bharath Rupireddy Discussion: https://postgr.es/m/20220923202439.GA1156054%40nathanxps13 --- .../pg_stat_statements/pg_stat_statements.c | 4 +- src/backend/access/heap/rewriteheap.c | 2 +- src/backend/access/transam/slru.c | 4 +- src/backend/access/transam/xlog.c | 4 +- src/backend/access/transam/xlogreader.c | 2 +- src/backend/access/transam/xlogrecovery.c | 2 +- src/backend/backup/basebackup.c | 2 +- src/backend/replication/walreceiver.c | 2 +- src/backend/storage/file/fd.c | 8 +-- src/backend/utils/init/miscinit.c | 2 +- src/bin/pg_test_fsync/pg_test_fsync.c | 50 +-- src/include/access/xlogreader.h | 4 +- src/include/port.h| 9 src/include/port/pg_iovec.h | 13 - src/include/port/win32_port.h | 4 +- src/port/preadv.c | 4 +- src/port/pwritev.c| 4 +- src/port/win32pread.c | 2 +- src/port/win32pwrite.c| 2 +- 19 files changed, 71 insertions(+), 53 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index ba868f0de9..73439c0199 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -2103,9 +2103,9 @@ qtext_store(const char *query, int query_len, if (fd < 0) goto error; - if (pwrite(fd, query, query_len, off) != query_len) + if (pg_pwrite(fd, query, query_len, off) != query_len) goto error; - if (pwrite(fd, "\0", 1, off + query_len) != 1) + if (pg_pwrite(fd, "\0", 1, off + query_len) != 1) goto error; CloseTransientFile(fd); diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c index 2f08fbe8d3..b01b39b008 100644 --- a/src/backend/access/heap/rewriteheap.c +++ b/src/backend/access/heap/rewriteheap.c @@ -1150,7 +1150,7 @@ heap_xlog_logical_rewrite(XLogReaderState *r) /* write out tail end of mapping file (again) */ errno = 0; pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE); - if (pwrite(fd, data, len, xlrec->offset) != len) + if (pg_pwrite(fd, data, len, xlrec->offset) != len) { /* if write didn't set errno, assume problem is no disk space */ if (errno == 0) diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c index c9a7b97949..b65cb49d7f 100644 --- a/src/backend/access/transam/slru.c +++ b/src/backend/access/transam/slru.c @@ -718,7 +718,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno) errno = 0; pgstat_report_wait_start(WAIT_EVENT_SLRU_READ); - if (pread(fd, shared->page_buffer[slotno], BLCKSZ, offset) != BLCKSZ) + if (pg_pread(fd, shared->page_buffer[slotno]
Re: Insertion Sort Improvements
Getting back to improvements for small sort runs, it might make sense to consider using in-register based sorting via sorting networks for very small runs. This talk is specific to database sorting and illustrates how such an approach can be vectorized: https://youtu.be/HeFwVNHsDzc?list=PLSE8ODhjZXjasmrEd2_Yi1deeE360zv5O&t=1090 It looks like some of the commercial DBMSs do this very successfully. They use 4 512bit registers (AVX-512) in this example, which could technically store 4 * 4 sort-elements (given that the sorting key is 64 bit and the tuple pointer is 64 bit). I wonder whether this could also be done with just SSE (instead of AVX), which the project now readily supports thanks to your recent efforts?
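[Editor's note: to make the sorting-network idea concrete, below is a scalar sketch of the optimal 5-comparator network for four keys. The point is that the comparator sequence is fixed and data-independent, which is exactly what lets the same pattern be expressed with SIMD min/max instructions over whole registers; the SIMD mapping itself is not shown, and the function names are made up for the example.]

    #include <stdint.h>

    /* Compare-exchange: afterwards *a holds the smaller key, *b the larger. */
    static inline void
    cmp_swap(uint64_t *a, uint64_t *b)
    {
        uint64_t lo = (*a < *b) ? *a : *b;
        uint64_t hi = (*a < *b) ? *b : *a;

        *a = lo;
        *b = hi;
    }

    /* Optimal sorting network for 4 keys: 5 comparators in 3 dependent stages. */
    static void
    sort4_network(uint64_t v[4])
    {
        cmp_swap(&v[0], &v[1]);
        cmp_swap(&v[2], &v[3]);
        cmp_swap(&v[0], &v[2]);
        cmp_swap(&v[1], &v[3]);
        cmp_swap(&v[1], &v[2]);
    }

For the 64-bit key plus 64-bit tuple-pointer layout mentioned above, each comparator would move the pointer alongside its key rather than sorting bare keys.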
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On Tue, Sep 27, 2022 at 2:20 PM Kyotaro Horiguchi wrote: > > > -1 from me. We have the function context and the structure name there > > to represent that the parameter name 'state' is actually 'backup > > state'. I don't think we gain anything here. Whenever the BackupState > > is used away from any function, the variable name backup_state is > > used, static variable in xlogfuncs.c > > There's no shadowing caused by the change. If we mind the same > variable names between files, we could rename backup_state in > xlogfuncs.c to process_backup_state or session_backup_state. -1. > If this is still unacceptable, I propose to change the comment. (I > found that the previous patch forgets about do_pg_backup_stop()) > > - * It fills in backup_state with the information required for the backup, > + * It fills in the parameter "state" with the information required for the > backup, +1. There's another place that uses backup_state in the comments. I modified that as well. Please see the attached patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From b66c67712f3499259427c3ea32d102ac802941ac Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 27 Sep 2022 09:36:32 + Subject: [PATCH v1] Adjust BackupState comments to not use backup_state --- src/backend/access/transam/xlog.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 1dd6df0fe1..675f851daa 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8256,9 +8256,10 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) * symlinks while extracting files from tar. However for consistency and * platform-independence, we do it the same way everywhere. * - * It fills in backup_state with the information required for the backup, - * such as the minimum WAL location that must be present to restore from - * this backup (starttli) and the corresponding timeline ID (starttli). + * It fills in the parameter "state" with the information required for the + * backup, such as the minimum WAL location that must be present to restore + * from this backup (starttli) and the corresponding timeline ID (starttli), + * etc. * * Every successfully started backup must be stopped by calling * do_pg_backup_stop() or do_pg_abort_backup(). There can be many @@ -8574,8 +8575,9 @@ get_backup_status(void) * file (if required), resets sessionBackupState and so on. It can optionally * wait for WAL segments to be archived. * - * backup_state is filled with the information necessary to restore from this - * backup with its stop LSN (stoppoint), its timeline ID (stoptli), etc. + * It fills in the parameter "state" with the information necessary to restore + * from this backup with its stop LSN (stoppoint), its timeline ID (stoptli), + * etc. * * It is the responsibility of the caller of this function to verify the * permissions of the calling user! -- 2.34.1
Extend win32 error codes to errno mapping in win32error.c
Hi, I recently faced an issue on windows where one of the tests was failing with 'unrecognized win32 error code: 123', see [1]. I figured out that it was due to a wrong file name being sent to open() system call (this error is of my own making and I fixed it for that thread). The failure occurs in dosmaperr() in win32error.c due to an unmapped errno for win32 error code. The error code 123 i.e. ERROR_INVALID_NAME says "The file name, directory name, or volume label syntax is incorrect." [2], the closest errno mapping would be ENOENT. I quickly looked around for the other win32 error codes [2] that don't have mapping. I filtered out some common error codes such as ERROR_OUTOFMEMORY, ERROR_HANDLE_DISK_FULL, ERROR_INSUFFICIENT_BUFFER, ERROR_NOACCESS. There may be many more, but these seemed common IMO. Having a right errno mapping always helps recognize the errors correctly. I'm attaching a patch that maps the above win32 error codes to errno in win32error.c. I also think that we can add a note in win32error.c by mentioning the link [2] to revisit the mapping whenever "unrecognized win32 error code:XXX" error occurs. Thoughts? Thanks Michael Paquier for off list chat. [1] https://www.postgresql.org/message-id/CALj2ACWKvjOO-JzYpMBpk-o_o9CeKGEqMcS=yxf-pc6m+jo...@mail.gmail.com [2] https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/18d8fbe8-a967-4f1c-ae50-99ca8e491d2d -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 5e7ea6dc32e60d5a4b0868596eb63516c6ad21d7 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 27 Sep 2022 08:43:02 + Subject: [PATCH v1] Extend win32 error codes to errno mapping in win32error.c --- src/port/win32error.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/src/port/win32error.c b/src/port/win32error.c index fca867ba3d..44ede0cfaa 100644 --- a/src/port/win32error.c +++ b/src/port/win32error.c @@ -164,6 +164,21 @@ static const struct }, { ERROR_DELETE_PENDING, ENOENT + }, + { + ERROR_INVALID_NAME, ENOENT + }, + { + ERROR_OUTOFMEMORY, ENOMEM + }, + { + ERROR_HANDLE_DISK_FULL, ENOSPC + }, + { + ERROR_INSUFFICIENT_BUFFER, EINVAL + }, + { + ERROR_NOACCESS, EACCES } }; -- 2.34.1
Re: A doubt about a newly added errdetail
On 2022-Sep-27, houzj.f...@fujitsu.com wrote: > Thanks for reviewing! > > Just in case I misunderstand, it seems you mean the message style[1] is OK, > right ? > > [1] > errdetail("Column list cannot be specified for a partitioned table when %s is > false.", >"publish_via_partition_root"))); Yeah, since you're changing another word in that line, it's ok to move the parameter line off-string. (If you were only changing the parameter to %s and there was no message duplication, I would reject the patch as useless.) I'm going over that patch now, I have a few other changes as attached, intend to commit soon. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Ni aún el genio muy grande llegaría muy lejos si tuviera que sacarlo todo de su propio interior" (Goethe) diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index ba1afc21ac..35a93ad200 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -125,7 +125,8 @@ parse_publication_options(ParseState *pstate, if (!SplitIdentifierString(publish, ',', &publish_list)) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("invalid list syntax for \"publish\" option"))); + errmsg("invalid list syntax in parameter \"%s\"", +"publish"))); /* Process the option list. */ foreach(lc, publish_list) @@ -143,7 +144,8 @@ parse_publication_options(ParseState *pstate, else ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("unrecognized \"publish\" value: \"%s\"", publish_opt))); + errmsg("unrecognized value for publication option \"%s\": \"%s\"", + "publish", publish_opt))); } } else if (strcmp(defel->defname, "publish_via_partition_root") == 0) @@ -444,14 +446,18 @@ contain_mutable_or_user_functions_checker(Oid func_id, void *context) } /* - * Check if the node contains any unallowed object. See + * Check if the node contains any unallowed object. Subroutine for * check_simple_rowfilter_expr_walker. * - * Returns the error detail message in errdetail_msg for unallowed expressions. + * If a disallowed object is found, *errdetail_msg is set to a (possibly + * translated) message to use as errdetail. If none, *errdetail_msg is not + * modified. */ static void expr_allowed_in_node(Node *node, ParseState *pstate, char **errdetail_msg) { + Assert(*errdetail_msg == NULL); + if (IsA(node, List)) { /* @@ -590,7 +596,7 @@ check_simple_rowfilter_expr_walker(Node *node, ParseState *pstate) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("invalid publication WHERE expression"), - errdetail("%s", errdetail_msg), + errdetail_internal("%s", errdetail_msg), parser_errposition(pstate, exprLocation(node; return expression_tree_walker(node, check_simple_rowfilter_expr_walker, @@ -714,7 +720,7 @@ CheckPubRelationColumnList(List *tables, char *pubname, bool publish_schema, /* * If the publication doesn't publish changes via the root partitioned * table, the partition's column list will be used. So disallow using - * the column list on partitioned table in this case. + * a column list on the partitioned table in this case. 
*/ if (!pubviaroot && pri->relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) @@ -723,7 +729,7 @@ CheckPubRelationColumnList(List *tables, char *pubname, bool publish_schema, errmsg("cannot use column list for relation \"%s.%s\" in publication \"%s\"", get_namespace_name(RelationGetNamespace(pri->relation)), RelationGetRelationName(pri->relation), pubname), - errdetail("Column list cannot be specified for a partitioned table when %s is false.", + errdetail("Column lists cannot be specified for partitioned tables when %s is false.", "publish_via_partition_root"))); } } @@ -1302,7 +1308,7 @@ AlterPublicationSchemas(AlterPublicationStmt *stmt, errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("cannot add schema to publication \"%s\"", stmt->pubname), - errdetail("Schema cannot be added if any table that specifies column list is already part of the publication.")); + errdetail("Schemas cannot be added if any tables that specify a column list are already part of the publication.")); ReleaseSysCache(coltuple); }
Fix some newly modified tab-complete changes
Hi hackers, I saw a problem when using tab-complete for "GRANT": "TABLES IN SCHEMA" should be "ALL TABLES IN SCHEMA" in the following case. postgres=# grant all on ALL FUNCTIONS IN SCHEMA DATABASE FUNCTION PARAMETER SCHEMA TABLESPACE ALL PROCEDURES IN SCHEMA DOMAIN information_schema. PROCEDURE SEQUENCE tbl ALL ROUTINES IN SCHEMA FOREIGN DATA WRAPPER LANGUAGE public. TABLE TYPE ALL SEQUENCES IN SCHEMA FOREIGN SERVER LARGE OBJECT ROUTINE TABLES IN SCHEMA I found that it is related to the recent commit 790bf615dd, and maybe it's better to fix it. I also noticed that some comments should be modified according to this new syntax. Attached is a patch to fix them. Regards, Shi yu 0001-Fix-some-newly-modified-tab-complete-changes-and-com.patch Description: 0001-Fix-some-newly-modified-tab-complete-changes-and-com.patch
Re: kerberos/001_auth test fails on arm CPU darwin
On 27.09.22 03:37, Andres Freund wrote: Maybe we should rely on PATH, rather than hardcoding OS dependent locations? Or at least fall back to seach binaries in PATH? Seems pretty odd to hardcode all these locations without a way to influence it from outside the test. Homebrew intentionally does not install the krb5 and openldap packages into the path, because they conflict with macOS-provided software. However, those macOS-provided variants don't provide all the pieces we need for the tests. Also, on Linux you need /usr/sbin, which is often not in the path. So I think there is no good way around hardcoding a lot of these paths.
Re: Add more docs for pg_surgery?
Regards, Zhang Mingli On Sep 27, 2022, 00:47 +0800, Ashutosh Sharma wrote: > > And further it looks like you are doing an > experiment on undamaged relation which is not recommended as > documented. Yeah. > If the relation would have been damaged, you probably may > not be able to access it. > That makes some sense. > -- > With Regards, > Ashutosh Sharma.
Re: Add common function ReplicationOriginName.
Hi Peter, > PSA patch v3 to combine the different replication origin name > formatting in a single function ReplicationOriginNameForLogicalRep as > suggested. LGTM except for minor issues with the formatting; fixed. > I expect you can find more like just this if you look harder than I did. Thanks. You are right, there are more places that pass int as the second argument of *nprintf(). I used a command: $ grep -r nprintf ./ | perl -lne 'print if($_ !~ /nprintf\([^\,]+,\s*(sizeof|([0-9A-Z_ \-]+\,))/ )' > found.txt ... and then re-checked the results manually. This excludes patterns like *nprintf(..., sizeof(...)) and *nprintf(..., MACRO) and leaves only something like *nprintf(..., variable). The cases where we subtract an integer from a Size, etc were ignored. I don't have a strong opinion on whether we should be really worried by this. But in case we do, here is the patch. The order of 0001 and 0002 doesn't matter. As I understand, ecpg uses size_t rather than Size, so for this library I used size_t. Not 100% sure if the changes I made to src/backend/utils/adt/jsonb.c add much value. I leave this to the committer to decide. -- Best regards, Aleksander Alekseev v4-0002-Add-common-function-ReplicationOriginNameForLogic.patch Description: Binary data v4-0001-Pass-Size-size_t-as-a-2nd-argument-of-snprintf.patch Description: Binary data
Re: has_privs_of_role vs. is_member_of_role, redux
On Tue, Sep 27, 2022 at 2:05 AM Wolfgang Walther wrote: > I'm just saying WITH SET FALSE should take away more of the things you > can do (all the ownership things) to a point where it's safe to GRANT .. > WITH INHERIT TRUE, SET FALSE and still be useful for pre-defined or > privilege-container roles. I don't see that as viable, either. It's too murky what you'd have to take away to make it safe, and it sounds like stuff that naturally falls under INHERIT rather than SET. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
On Tue, Sep 27, 2022 at 3:01 PM Thomas Munro wrote: > > Something like the attached. Isn't it also better to add notes in win32pread.c and win32pwrite.c about the callers doing lseek(SEEK_SET) if they wish to and Windows implementations changing file position? We can also add a TODO item about replacing pg_ versions with pread and friends someday when Windows fixes the issue? Having it in the commit and include/port.h is good, but the comments in win32pread.c and win32pwrite.c make life easier IMO. Otherwise, the patch LGTM. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
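[Editor's note: to spell out the hazard such notes would warn about, here is a small, portable C illustration -- not PostgreSQL code, and the function name is invented. Code that mixes positioned and non-positioned I/O should not assume the file offset is unchanged after a positioned write, because the Windows pg_pwrite() emulation moves it even though POSIX pwrite() does not.]

    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Defensive pattern: after a positioned write, re-establish the offset
     * explicitly before using plain write(), instead of relying on the
     * position being left untouched.
     */
    static int
    write_header_then_append(int fd, const void *hdr, size_t hdrlen,
                             const void *data, size_t datalen)
    {
        if (pwrite(fd, hdr, hdrlen, 0) != (ssize_t) hdrlen)
            return -1;

        /* Do not trust the current position: seek to where we want to be. */
        if (lseek(fd, (off_t) hdrlen, SEEK_SET) < 0)
            return -1;

        if (write(fd, data, datalen) != (ssize_t) datalen)
            return -1;

        return 0;
    }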
RE: Perform streaming logical transactions by background workers and parallel apply
On Monday, September 26, 2022 6:58 PM Amit Kapila wrote: > > On Mon, Sep 26, 2022 at 8:41 AM wangw.f...@fujitsu.com > wrote: > > > > On Thur, Sep 22, 2022 at 18:12 PM Amit Kapila > wrote: > > > > > 3. > > > ApplyWorkerMain() > > > { > > > ... > > > ... > > > + > > > + if (server_version >= 16 && > > > + MySubscription->stream == SUBSTREAM_PARALLEL) > > > + options.proto.logical.streaming = pstrdup("parallel"); > > > > > > After deciding here whether the parallel streaming mode is enabled > > > or not, we recheck the same thing in apply_handle_stream_abort() and > > > parallel_apply_can_start(). In parallel_apply_can_start(), we do it > > > via two different checks. How about storing this information say in > > > structure MyLogicalRepWorker in ApplyWorkerMain() and then use it at > > > other places? > > > > Improved as suggested. > > Added a new flag "in_parallel_apply" to structure MyLogicalRepWorker. > > > > Can we name the variable in_parallel_apply as parallel_apply and set it in > logicalrep_worker_launch() instead of in ParallelApplyWorkerMain()? Changed. > Few other comments: > == > 1. > + if (is_subworker && > + nparallelapplyworkers >= max_parallel_apply_workers_per_subscription) > + { > + LWLockRelease(LogicalRepWorkerLock); > + > + ereport(DEBUG1, > + (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED), > + errmsg("out of parallel apply workers"), errhint("You might need to > + increase > max_parallel_apply_workers_per_subscription."))); > > I think it is better to keep the level of this as LOG. Similar messages at > other > places use WARNING or LOG. Here, I prefer LOG because the system can still > proceed without blocking anything. Changed. > 2. > +/* Reset replication origin tracking. */ void > +parallel_apply_replorigin_reset(void) > +{ > + bool started_tx = false; > + > + /* This function might be called inside or outside of transaction. */ > + if (!IsTransactionState()) { StartTransactionCommand(); started_tx = > + true; } > > Why do we need a transaction in this function? I think we don't need it and removed this in the new version patch. > 3. Few suggestions to improve in the patch: > diff --git a/src/backend/replication/logical/worker.c > b/src/backend/replication/logical/worker.c > index 1623c9e2fa..d9c519dfab 100644 > --- a/src/backend/replication/logical/worker.c > +++ b/src/backend/replication/logical/worker.c > @@ -1264,6 +1264,10 @@ apply_handle_stream_prepare(StringInfo s) > case TRANS_LEADER_SEND_TO_PARALLEL: > Assert(winfo); > > + /* > + * The origin can be active only in one process. See > + * apply_handle_stream_commit. > + */ > parallel_apply_replorigin_reset(); > > /* Send STREAM PREPARE message to the parallel apply worker. */ @@ > -1623,12 +1627,7 @@ apply_handle_stream_abort(StringInfo s) > (errcode(ERRCODE_PROTOCOL_VIOLATION), > errmsg_internal("STREAM ABORT message without STREAM STOP"))); > > - /* > - * Check whether the publisher sends abort_lsn and abort_time. > - * > - * Note that the parallel apply worker is only started when the publisher > - * sends abort_lsn and abort_time. > - */ > + /* We receive abort information only when we can apply in parallel. */ > if (MyLogicalRepWorker->in_parallel_apply) > read_abort_info = true; > > @@ -1656,7 +1655,13 @@ apply_handle_stream_abort(StringInfo s) > Assert(winfo); > > if (subxid == xid) > + { > + /* > + * The origin can be active only in one process. See > + * apply_handle_stream_commit. > + */ > parallel_apply_replorigin_reset(); > + } > > /* Send STREAM ABORT message to the parallel apply worker. 
*/ > parallel_apply_send_data(winfo, s->len, s->data); @@ -1858,6 +1863,12 @@ > apply_handle_stream_commit(StringInfo s) > case TRANS_LEADER_SEND_TO_PARALLEL: > Assert(winfo); > > + /* > + * We need to reset the replication origin before sending the commit > + * message and set it up again after confirming that parallel worker > + * has processed the message. This is required because origin can be > + * active only in one process at-a-time. > + */ > parallel_apply_replorigin_reset(); > > /* Send STREAM COMMIT message to the parallel apply worker. */ diff --git > a/src/include/replication/worker_internal.h > b/src/include/replication/worker_internal.h > index 4cbfb43492..2bd9664f86 100644 > --- a/src/include/replication/worker_internal.h > +++ b/src/include/replication/worker_internal.h > @@ -70,11 +70,7 @@ typedef struct LogicalRepWorker > */ > pid_t apply_leader_pid; > > - /* > - * Indicates whether to use parallel apply workers. > - * > - * Determined based on streaming parameter and publisher version. > - */ > + /* Indicates whether apply can be performed parallelly. */ > bool in_parallel_apply; > Merged, thanks. Best regards, Hou zj
RE: Perform streaming logical transactions by background workers and parallel apply
On Tuesday, September 27, 2022 2:32 PM Kuroda, Hayato/黒田 隼人 > > Dear Wang, > > Followings are comments for your patchset. Thanks for the comments. > > 0001 > > > 01. launcher.c - logicalrep_worker_stop_internal() > > ``` > + > + Assert(LWLockHeldByMe(LogicalRepWorkerLock)); > + > ``` Changed. > I think it should be Assert(LWLockHeldByMeInMode(LogicalRepWorkerLock, > LW_SHARED)) because the lock is released one and acquired again as > LW_SHARED. > If newer function has been acquired lock as LW_EXCLUSIVE and call > logicalrep_worker_stop_internal(), > its lock may become weaker after calling it. > > 02. launcher.c - apply_handle_stream_start() > > ``` > + /* > +* Notify handle methods we're processing a remote > in-progress > +* transaction. > +*/ > + in_streamed_transaction = true; > > - MyLogicalRepWorker->stream_fileset = palloc(sizeof(FileSet)); > - FileSetInit(MyLogicalRepWorker->stream_fileset); > + /* > +* Start a transaction on stream start, this > transaction > will be > +* committed on the stream stop unless it is a > tablesync worker in > +* which case it will be committed after processing > all > the > +* messages. We need the transaction for handling the > buffile, > +* used for serializing the streaming data and subxact > info. > +*/ > + begin_replication_step(); > ``` > > Previously in_streamed_transaction was set after the begin_replication_step(), > but the ordering is modified. Maybe we don't have to modify it if there is no > particular reason. > > 03. launcher.c - apply_handle_stream_stop() > > ``` > + /* Commit the per-stream transaction */ > + CommitTransactionCommand(); > + > + /* Reset per-stream context */ > + MemoryContextReset(LogicalStreamingContext); > + > + pgstat_report_activity(STATE_IDLE, NULL); > + > + in_streamed_transaction = false; > ``` > > Previously in_streamed_transaction was set after the MemoryContextReset(), > but the ordering is modified. > Maybe we don't have to modify it if there is no particular reason. I adjusted the position of this due to some other improvements this time. > > 04. applyparallelworker.c - LogicalParallelApplyLoop() > > ``` > + shmq_res = shm_mq_receive(mqh, &len, &data, false); > ... > + if (ConfigReloadPending) > + { > + ConfigReloadPending = false; > + ProcessConfigFile(PGC_SIGHUP); > + } > ``` > > > Here the parallel apply worker waits to receive messages and after dispatching > it ProcessConfigFile() is called. > It means that .conf will be not read until the parallel apply worker receives > new > messages and apply them. > > It may be problematic when users change log_min_message to debugXXX for > debugging but the streamed transaction rarely come. > They expected that detailed description appears on the log from next > streaming chunk, but it does not. > > This does not occur in leader worker when it waits messages from publisher, > because it uses libpqrcv_receive(), which works asynchronously. > > I 'm not sure whether it should be documented that the evaluation of GUCs may > be delayed, how do you think? I changed the shm_mq_receive to asynchronous mode which is also consistent with what we did for Gather node when reading data from parallel query workers. > > === > 0004 > > 05. logical-replication.sgml > > ``` > ... > In that case, it may be necessary to change the streaming mode to on or off > and > cause the same conflicts again so the finish LSN of the failed transaction > will be > written to the server log. > ... > ``` > > Above sentence is added by 0001, but it is not modified by 0004. 
> Such transactions will be retried as streaming=on mode, so some descriptions > related with it should be added. Added. Best regards, Hou zj
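A minimal sketch of the kind of non-blocking receive loop described above (illustrative only, not the posted patch; the mqh handle and the apply_dispatch_message() helper are assumed names):

for (;;)
{
    void       *data;
    Size        len;
    shm_mq_result res;

    CHECK_FOR_INTERRUPTS();

    /* nowait = true: don't sleep inside shm_mq_receive() */
    res = shm_mq_receive(mqh, &len, &data, true);

    if (res == SHM_MQ_SUCCESS)
    {
        apply_dispatch_message(data, len);      /* hypothetical helper */
        continue;
    }
    else if (res == SHM_MQ_DETACHED)
        break;                                  /* leader went away */

    /* SHM_MQ_WOULD_BLOCK: handle pending GUC reloads before sleeping */
    if (ConfigReloadPending)
    {
        ConfigReloadPending = false;
        ProcessConfigFile(PGC_SIGHUP);
    }

    (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                     -1L, WAIT_EVENT_MQ_RECEIVE);
    ResetLatch(MyLatch);
}

With this shape, a SIGHUP is picked up the next time the queue goes empty, instead of only after the next streamed chunk arrives.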
Re: Adding a clang-format file
Hi Peter, > I really don't see any real problem with making something available, > without changing any official project guidelines. It's commonplace to > provide a clang-format file these days. Personally I don't have anything against the idea. TimescaleDB uses clang-format to mimic pgindent and it works quite well. One problem worth mentioning though is that the clang-format file is dependent on the particular version of clang-format. TSDB requires version 7 or 8 and the reason why the project can't easily switch to a newer version is that the format has changed. So perhaps a directory would be more appropriate than a single file. -- Best regards, Aleksander Alekseev
Re: A doubt about a newly added errdetail
While reading this code, I noticed that function expr_allowed_in_node() has a very strange API: it doesn't have any return convention at all other than "if we didn't modify errdetail_str then all is good". I was tempted to add an "Assert(*errdetail_msg == NULL)" at the start of it, just to make sure that it is not called if a message is already set. I think it would be much saner to inline the few lines of that function in its sole caller, as in the attached. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "E pur si muove" (Galileo Galilei) diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 8895e1be4e..f9792df05f 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -447,36 +447,6 @@ contain_mutable_or_user_functions_checker(Oid func_id, void *context) func_id >= FirstNormalObjectId); } -/* - * Check if the node contains any disallowed object. Subroutine for - * check_simple_rowfilter_expr_walker. - * - * If a disallowed object is found, *errdetail_msg is set to a (possibly - * translated) message to use as errdetail. If none, *errdetail_msg is not - * modified. - */ -static void -expr_allowed_in_node(Node *node, ParseState *pstate, char **errdetail_msg) -{ - if (IsA(node, List)) - { - /* - * OK, we don't need to perform other expr checks for List nodes - * because those are undefined for List. - */ - return; - } - - if (exprType(node) >= FirstNormalObjectId) - *errdetail_msg = _("User-defined types are not allowed."); - else if (check_functions_in_node(node, contain_mutable_or_user_functions_checker, - (void *) pstate)) - *errdetail_msg = _("User-defined or built-in mutable functions are not allowed."); - else if (exprCollation(node) >= FirstNormalObjectId || - exprInputCollation(node) >= FirstNormalObjectId) - *errdetail_msg = _("User-defined collations are not allowed."); -} - /* * The row filter walker checks if the row filter expression is a "simple * expression". @@ -586,12 +556,32 @@ check_simple_rowfilter_expr_walker(Node *node, ParseState *pstate) } /* - * For all the supported nodes, check the types, functions, and collations - * used in the nodes. + * For all the supported nodes, if we haven't already found a problem, + * check the types, functions, and collations used in it. */ if (!errdetail_msg) - expr_allowed_in_node(node, pstate, &errdetail_msg); + { + if (IsA(node, List)) + { + /* + * OK, we don't need to perform other expr checks for List nodes + * because those are undefined for List. + */ + } + else if (exprType(node) >= FirstNormalObjectId) + errdetail_msg = _("User-defined types are not allowed."); + else if (check_functions_in_node(node, contain_mutable_or_user_functions_checker, + (void *) pstate)) + errdetail_msg = _("User-defined or built-in mutable functions are not allowed."); + else if (exprCollation(node) >= FirstNormalObjectId || + exprInputCollation(node) >= FirstNormalObjectId) + errdetail_msg = _("User-defined collations are not allowed."); + } + /* + * Finally, if we found a problem in this node, throw error now. Otherwise + * keep going. + */ if (errdetail_msg) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
Re: [RFC] building postgres with meson - v13
On 26.09.22 18:35, Andres Freund wrote: 9f5be26c1215 meson: Add docs for building with meson I do like the overall layout of this. The "Supported Platforms" section should be moved back to near the end of the chapter. I don't see a reason to move it forward, at least none that is related to the meson issue. The changes to the "Getting the Source" section are also not appropriate for this patch. We don't really support building from a tarball with meson yet (you'd need to configure, maintainer-clean, configure meson), so it does make some sense... Okay, interesting point. I suggest that we write it as if that were fixed, and for the time being insert a note (or similar) explaining the above restriction. Otherwise we'll have to rewrite it again later.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
On Tue, Aug 9, 2022 at 2:16 PM Bharath Rupireddy wrote: > > I've explained the problem with the current HA setup with synchronous > replication upthread at [1]. Let me reiterate it here once again. > > When a query is cancelled (a simple stroke of CTRL+C or > pg_cancel_backend() call) while the txn is waiting for ack in > SyncRepWaitForLSN(); for the client, the txn is actually committed > (locally-committed-but-not-yet-replicated to all of sync standbys) > like any other txn, a warning is emitted into server logs but it is of > no use for the client (think of client as applications). There can be > many such txns waiting for ack in SyncRepWaitForLSN() and query cancel > can be issued on all of those sessions. The problem is that the > subsequent reads will then be able to read all of those > locally-committed-but-not-yet-replicated to all of sync standbys txns > data - this is what I call an inconsistency (can we call this a > read-after-write inconsistency?) because of lack of proper query > cancel handling. And if the sync standbys are down or unable to come > up for some reason, until then, the primary will be serving clients > with the inconsistent data. BTW, I found a report of this problem here > [2]. > > The solution proposed for the above problem is to just 'not honor > query cancels at all while waiting for ack in SyncRepWaitForLSN()'. > > When a proc die is pending, then also, there can be > locally-committed-but-not-yet-replicated to all of sync standbys txns. > Typically, there are two choices for the clients 1) reuse the primary > instance after restart 2) failover to one of sync standbys. For case > (1), there might be read-after-write inconsistency as explained above. > For case (2), those txns might get lost completely if the failover > target sync standby or the new primary didn't receive them and the > other sync standbys that have received them are now ahead and need a > special treatment (run pg_rewind) for them to be able to connect to > new primary. > > The solution proposed for case (1) of the above problem is to 'process > the ProcDiePending immediately and upon restart the first backend can > wait until the sync standbys are caught up to ensure no inconsistent > reads'. > The solution proposed for case (2) of the above problem is to 'either > run pg_rewind for the sync standbys that are ahead or use the idea > proposed at [3]'. > > I hope the above explanation helps. > > [1] > https://www.postgresql.org/message-id/flat/CALj2ACUrOB59QaE6%3DjF2cFAyv1MR7fzD8tr4YM5%2BOwEYG1SNzA%40mail.gmail.com > [2] > https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous-replication > [3] > https://www.postgresql.org/message-id/CALj2ACX-xO-ZenQt1MWazj0Z3ziSXBMr24N_X2c0dYysPQghrw%40mail.gmail.com I'm attaching the v2 patch rebased on the latest HEAD. Please note that there are still some open points, I'm yet to find time to think more about them. Meanwhile, I'm posting the v2 patch for making cfbot happy. Any further thoughts on the overall design of the patch are most welcome. Thanks. 
-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From be6734c83d72333cc4ec1b2be1f4d54b875b74c8 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 27 Sep 2022 12:55:34 + Subject: [PATCH v2] Wait specified amount of time before cancelling sync replication In PostgreSQL high availability setup with synchronous replication, typically all the transactions first locally get committed, then streamed to the synchronous standbys and the backends that generated the transaction will wait for acknowledgement from synchronous standbys. While waiting for acknowledgement, it may happen that the query or the transaction gets canceled or the backend that's waiting for acknowledgement is asked to exit. In either of these cases, the wait for acknowledgement gets canceled and leaves transaction in an inconsistent state as it got committed locally but not on the standbys. When set the GUC synchronous_replication_naptime_before_cancel introduced in this patch, it will let the backends wait for the acknowledgement before canceling the wait for acknowledgement so that the transaction can be available in synchronous standbys as well. --- doc/src/sgml/config.sgml | 30 +++ src/backend/replication/syncrep.c | 50 +++ src/backend/utils/misc/guc_tables.c | 12 + src/backend/utils/misc/postgresql.conf.sample | 2 + src/include/replication/syncrep.h | 3 ++ 5 files changed, 97 insertions(+) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index d8848bc774..baeef49012 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4524,6 +4524,36 @@ ANY num_sync ( + synchronous_replication_naptime_before_cancel (integer) + + synchronous_replication_naptime_before_cancel configuration pa
Re: Differentiate MERGE queries with different structures
On 2022-09-27 17:48, Alvaro Herrera wrote: > Pushed, thanks. The code and the tests went fine on my environment. Thank you Alvaro for your help, and thank you Julien for your review! Best, Tatsuhiro Nakamori
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Tue, 26 Jul 2022 at 21:35, Simon Riggs wrote: > On Fri, 15 Jul 2022 at 12:29, Aleksander Alekseev > wrote: > > Personally I didn't expect that > > merging patches in this thread would take that long. They are in > > "Ready for Committer" state for a long time now and there are no known > > issues with them other than unit tests for SLRU [1] should be merged > > first. > > These patches look ready to me, including the SLRU tests. > > Even though they do very little, these patches touch many aspects of > the code, so it would make sense to apply these as the last step in > the CF. > > To encourage committers to take that next step, let's have a > democratic vote on moving this forwards: > +1 from me. > This set of patches no longer applies cleanly to the master branch. There are lots of hunks as well as failures. Please rebase the patches. There are failures for multiple files including the one given below: patching file src/backend/replication/logical/worker.c Hunk #1 succeeded at 1089 (offset 1 line). Hunk #2 succeeded at 1481 (offset 1 line). Hunk #3 succeeded at 3322 (offset 2 lines). Hunk #4 succeeded at 3493 (offset 2 lines). Hunk #5 FAILED at 4009. 1 out of 5 hunks FAILED -- saving rejects to file src/backend/replication/logical/worker.c.rej
Re: DROP OWNED BY is broken on master branch.
On Tue, Sep 27, 2022 at 2:53 AM Rushabh Lathia wrote: > Yes, I was also thinking to avoid the duplicate logic but couldn't find > a way. I did the quick testing with the patch, and the reported test is working > fine. But "make check" is failing with a few failures. Oh, woops. There was a dumb mistake in that version -- it was testing sdepForm->dbid == SHARED_DEPENDENCY_OWNER, which is nonsense, instead of sdepForm->dbid == MyDatabaseId. Here's a fixed version. -- Robert Haas EDB: http://www.enterprisedb.com v2-0001-Fix-bug-in-DROP-OWNED-BY.patch Description: Binary data
Re: kerberos/001_auth test fails on arm CPU darwin
Hi, Thanks for the reviews! On 9/27/2022 5:21 AM, Tom Lane wrote: Andres Freund writes: Maybe we should rely on PATH, rather than hardcoding OS dependent locations? My suggestion to consult krb5-config first was meant to allow PATH to influence the results. However, if that doesn't work, it's important IMO to have a sane list of hardwired places to look in. I updated my patch regarding these reviews. The current logic is it will try to find all executables in that order(if it finds all executables, it won't try remaining steps): 1 - 'krb5-config --prefix' 2 - hardcoded paths(I added arm and MacPorts paths for darwin) 3 - from PATH Also, I tried to do some refactoring for adding another paths to search in the future and being sure about all executables are found. Ci run after fix is applied: https://cirrus-ci.com/build/5758254918664192 Regards, Nazir Bilal Yavuz From 916de33762aa72e9d9e4665ba23a1e610c585570 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Mon, 26 Sep 2022 10:56:51 +0300 Subject: [PATCH 1/2] ci: Add arm CPU for darwin . ci-os-only: darwin. --- .cirrus.yml | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/.cirrus.yml b/.cirrus.yml index 7b5cb02102..cecd978515 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -259,7 +259,6 @@ task: name: macOS - Monterey - Meson env: -CPUS: 12 # always get that much for cirrusci macOS instances BUILD_JOBS: $CPUS TEST_JOBS: $CPUS # already fast enough to not be worth tuning @@ -275,15 +274,26 @@ task: only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(macos|darwin|osx).*' - osx_instance: -image: monterey-base + matrix: +- name: macOS - Monterey - Meson - ARM CPU + macos_instance: +image: ghcr.io/cirruslabs/macos-monterey-base:latest + env: +BREW_PATH: /opt/homebrew +CPUS: 4 + +- name: macOS - Monterey - Meson - Intel CPU + osx_instance: +image: monterey-base + env: +BREW_PATH: /usr/local +CPUS: 12 # always get that much for cirrusci macOS instances sysinfo_script: | id uname -a ulimit -a -H && ulimit -a -S export - setup_core_files_script: - mkdir ${HOME}/cores - sudo sysctl kern.corefile="${HOME}/cores/core.%P" @@ -322,7 +332,7 @@ task: ccache_cache: folder: $CCACHE_DIR configure_script: | -brewpath="/usr/local" +brewpath=${BREW_PATH} PKG_CONFIG_PATH="${brewpath}/lib/pkgconfig:${PKG_CONFIG_PATH}" for pkg in icu4c krb5 openldap openssl zstd ; do -- 2.25.1 From cc3f5236a6c18d9d16c8c6faa1e2044d0610f490 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Fri, 23 Sep 2022 14:30:06 +0300 Subject: [PATCH 2/2] fix: darwin: ARM CPU darwin krb5 path fix --- src/test/kerberos/t/001_auth.pl | 89 - 1 file changed, 65 insertions(+), 24 deletions(-) diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl index 47169a1d1e..c1211b80dd 100644 --- a/src/test/kerberos/t/001_auth.pl +++ b/src/test/kerberos/t/001_auth.pl @@ -30,39 +30,80 @@ elsif ($ENV{PG_TEST_EXTRA} !~ /\bkerberos\b/) plan skip_all => 'Potentially unsafe test GSSAPI/Kerberos not enabled in PG_TEST_EXTRA'; } -my ($krb5_bin_dir, $krb5_sbin_dir); - -if ($^O eq 'darwin') -{ - $krb5_bin_dir = '/usr/local/opt/krb5/bin'; - $krb5_sbin_dir = '/usr/local/opt/krb5/sbin'; -} -elsif ($^O eq 'freebsd') -{ - $krb5_bin_dir = '/usr/local/bin'; - $krb5_sbin_dir = '/usr/local/sbin'; -} -elsif ($^O eq 'linux') -{ - $krb5_sbin_dir = '/usr/sbin'; -} - my $krb5_config = 'krb5-config'; my $kinit= 'kinit'; my $kdb5_util= 'kdb5_util'; my $kadmin_local = 'kadmin.local'; my $krb5kdc = 'krb5kdc'; -if ($krb5_bin_dir && -d 
$krb5_bin_dir) +# get prefix for kerberos executables and try to find them at this path +sub test_krb5_paths { - $krb5_config = $krb5_bin_dir . '/' . $krb5_config; - $kinit = $krb5_bin_dir . '/' . $kinit; + my ($is_krb5_found, $krb5_path) = (@_); + + # if it is alredy found return + if ($is_krb5_found) + { + return 1; + } + + my ($kdb5_util_path, $kadmin_local_path, $krb5kdc_path, $krb5_config_path, $kinit_path, @krb5_file_paths, $bin, $sbin); + + # remove '\n' since 'krb5-config --prefix' returns path ends with '\n' + $krb5_path =~ s/\n//g; + # remove trailing slashes + $krb5_path =~ s{(.+)/\z}{$1}; + $bin = '/bin/'; + $sbin = '/sbin/'; + $kdb5_util_path= $krb5_path . $sbin . $kdb5_util; + $kadmin_local_path = $krb5_path . $sbin . $kadmin_local; + $krb5kdc_path = $krb5_path . $sbin . $krb5kdc; + $krb5_config_path = $krb5_path . $bin . $krb5_config; + $kinit_path= $krb5_path . $bin . $kinit; + @krb5_file_paths = ($kdb5_util_path, $ka
Re: GUC tables - use designated initializers
Peter Smith writes: > Enums index a number of the GUC tables. This all relies on the > elements being carefully arranged to be in the same order as those > enums. There are comments to say what enum index belongs to each table > element. > But why not use designated initializers to enforce what the comments > are hoping for? Interesting proposal, but it's not clear to me that this solution makes the code more bulletproof rather than less so. Yeah, you removed the order dependency, but the other concern here is that the array gets updated at all when adding a new enum entry. This method seems like it'd help disguise such an oversight. In particular, the adjacent StaticAssertDecls about the array lengths are testing something different than they used to, and I fear they lost some of their power. Can we improve those checks so they'd catch a missing entry again? > Furthermore, with this change, now the GUC table elements are able to > be rearranged into any different order - eg alphabetical - if that > would be useful (my patch does not do this). If anything, that's an anti-feature IMV. I quite dislike code where the same set of items are dealt with in randomly different orders in different places. regards, tom lane
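As a small illustration of the trade-off being discussed (simplified names, not the actual guc_tables.c contents): designated initializers remove the order dependency, but the existing length assertion only catches a forgotten entry when the *last* enum member has no initializer; a gap in the middle just leaves a zero-filled element.

typedef enum MyGroup
{
    GROUP_FILE_LOCATIONS,
    GROUP_CONN_AUTH,
    GROUP_RESOURCES,
    NUM_MY_GROUPS                       /* must be last */
} MyGroup;

static const char *const my_group_names[] =
{
    [GROUP_FILE_LOCATIONS] = "File Locations",
    [GROUP_CONN_AUTH] = "Connections and Authentication",
    [GROUP_RESOURCES] = "Resource Usage",
};

/*
 * This still trips if the final enum member is missing (the array comes out
 * shorter than NUM_MY_GROUPS), but a hole in the middle only produces a NULL
 * element, so an additional per-element check would be needed to regain the
 * old protection.
 */
StaticAssertDecl(lengthof(my_group_names) == NUM_MY_GROUPS,
                 "my_group_names must cover all MyGroup values");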
Re: Temporary file access API
* So I think that code simplification and easy adoption of in-memory data changes (such as encryption or compression) are two rather distinct goals. I admit that I'm running out of ideas for how to develop a framework that'd be useful for both. I’m also wondering about code simplification vs a more general encryption/compression framework. I started with the current proposal, and I’m also looking at David Steele’s approach to encryption/compression in pgbackrest. I’m beginning to think we should do a high-level design which includes encryption/compression, and then implement it as a “simplification” without actually doing the transformations.
Re: Backends stalled in 'startup' state
On Thu, Sep 15, 2022 at 4:42 PM Ashwin Agrawal wrote: > > We recently saw many backends (close to max_connection limit) get stalled > in 'startup' in one of the production environments for Greenplum (fork of > PostgreSQL). Tracing the reason, it was found all the tuples created by > bootstrap (xmin=1) in pg_attribute were at super high block numbers (for > example beyond 30,000). Tracing the reason for the backend startup stall > exactly matched Tom's reasoning in [1]. Stalls became much longer in > presence of sub-transaction overflow or presence of long running > transactions as tuple visibility took longer. The thread ruled out the > possibility of system catalog rows to be present in higher block numbers > instead of in front for pg_attribute. > > This thread provides simple reproduction on the latest version of > PostgreSQL and RCA for how bootstrap catalog entries can move to higher > blocks and as a result cause stalls for backend starts. Simple fix to avoid > the issue provided at the end. > > The cause is syncscan triggering during VACUUM FULL. VACUUM FULL rewrites > the table by performing the seqscan as well. And > heapam_relation_copy_for_cluster() conveys feel free to use syncscan. Hence > logic to not start from block 0 instead some other block already in cache > is possible and opens the possibility to move the bootstrap tuples to > anywhere else in the table. > > -- > Repro > -- > -- create database to play > drop database if exists test; > create database test; > \c test > > -- function just to create many tables to increase pg_attribute size > -- (ideally many column table might do the job more easily) > CREATE OR REPLACE FUNCTION public.f(id integer) > RETURNS void > LANGUAGE plpgsql > STRICT > AS $function$ > declare > sql text; > i int; > begin > for i in id..id+ loop > sql='create table if not exists tbl'||i||' (id int)'; > execute sql; > end loop; > end; > $function$; > > select f(1); > select f(2); > select f(3); > select f(4); > > -- validate pg_attribute size is greater than 1/4 of shared_buffers > -- for syncscan to triggger > show shared_buffers; > select pg_size_pretty(pg_relation_size('pg_attribute')); > select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1 > limit 5; > > -- perform seq scan of pg_attribute to page past bootstrapped tuples > copy (select * from pg_attribute limit 2000) to '/tmp/p'; > > -- this will internally use syncscan starting with block after bootstrap > tuples > -- and hence move bootstrap tuples last to higher block numbers > vacuum full pg_attribute; > > -- > Sample run > -- > show shared_buffers; > shared_buffers > > 128MB > (1 row) > > select pg_size_pretty(pg_relation_size('pg_attribute')); > pg_size_pretty > > 40 MB > (1 row) > > select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1 > limit 5; > ctid | xmin | attrelid | attname > ---+--+--+-- > (0,1) |1 | 1255 | oid > (0,2) |1 | 1255 | proname > (0,3) |1 | 1255 | pronamespace > (0,4) |1 | 1255 | proowner > (0,5) |1 | 1255 | prolang > (5 rows) > > copy (select * from pg_attribute limit 2000) to '/tmp/p'; > COPY 2000 > vacuum full pg_attribute; > VACUUM > select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1 > limit 5; >ctid| xmin | attrelid | attname > ---+--+--+-- > (5115,14) |1 | 1255 | oid > (5115,15) |1 | 1255 | proname > (5115,16) |1 | 1255 | pronamespace > (5115,17) |1 | 1255 | proowner > (5115,18) |1 | 1255 | prolang > (5 rows) > > > Note: > -- used logic causing the problem to fix it as well on the system :-) > -- scan till 
block where bootstrap tuples are located > select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1 > limit 5; > -- now due to syncscan triggering it will pick the blocks with bootstrap > tuples first and help to bring them back to front > vacuum full pg_attribute; > > -- > Patch to avoid the problem: > -- > diff --git a/src/backend/access/heap/heapam_handler.c > b/src/backend/access/heap/heapam_handler.c > index a3414a76e8..4c031914a3 100644 > --- a/src/backend/access/heap/heapam_handler.c > +++ b/src/backend/access/heap/heapam_handler.c > @@ -757,7 +757,17 @@ heapam_relation_copy_for_cluster(Relation OldHeap, > Relation NewHeap, > pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE, > > PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP); > > - tableScan = table_beginscan(OldHea
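The hunk above is cut off in this archive; the shape of the proposed fix, as described, is simply to stop requesting a synchronized scan when the rewrite copies the old heap, so the copy always starts at block 0. A hedged sketch of that idea using table_beginscan_strat() (not the exact posted patch):

/* in heapam_relation_copy_for_cluster(), instead of plain table_beginscan() */
tableScan = table_beginscan_strat(OldHeap,
                                  SnapshotAny,
                                  0,                /* nkeys */
                                  (ScanKey) NULL,
                                  true,             /* allow_strat */
                                  false);           /* allow_sync = false */

With allow_sync disabled, a VACUUM FULL that happens to start mid-table can no longer shuffle the bootstrap catalog tuples to high block numbers.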
Re: making relfilenodes 56 bits
On Tue, Sep 27, 2022 at 2:33 AM Dilip Kumar wrote: > Looks fine to me. OK, committed. I also committed the 0002 patch with some wordsmithing, and I removed a < 0 test on an unsigned value because my compiler complained about it. 0001 turned out to make headerscheck sad, so I just pushed a fix for that, too. I'm not too sure about 0003. I think that if we need an is_shared flag, we might as well just pass the tablespace OID. The is_shared flag seems to just make things a bit complicated for the callers for no real benefit. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pgsql: Fix perltidy breaking perlcritic
[resending to -hackers instead of -committers] Andrew Dunstan writes: > On Fri, Sep 9, 2022 at 10:44 PM John Naylor > wrote: > >> On Fri, Sep 9, 2022 at 3:32 AM Andrew Dunstan wrote: >> >> > A better way do do this IMNSHO is to put the eval in a block on its own >> along with the no critic marker on its own line, like this: >> > >> > { >> >## no critic (ProhibitStringyEval) >> >eval ... >> > } >> > >> > perlcritic respects block boundaries for its directives. >> >> I tried that in the attached -- it looks a bit nicer but requires more >> explanation. I don't have strong feelings either way. >> >> > Maybe even better would be just this, which I bet perltidy would not monkey > with, and would require no explanation: > > eval "\$hash_ref = $_"; ## no critic (ProhibitStringyEval) I didn't see this until it got committed, since I'm not subscribed to -committers, but I think it would be even better to rely on the fact that eval returns the value of the last expression in the string, which also gets rid of the ugly quoting and escaping, per the attached. - ilmari >From 8ef12d134e0a21c289796207d87244ba5f5ec92c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= Date: Mon, 12 Sep 2022 10:43:16 +0100 Subject: [PATCH] Use return value of eval instead of assigning inside string --- src/backend/catalog/Catalog.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm index 919a828ca7..41bbabdfee 100644 --- a/src/backend/catalog/Catalog.pm +++ b/src/backend/catalog/Catalog.pm @@ -315,7 +315,7 @@ sub ParseData # We're treating the input line as a piece of Perl, so we # need to use string eval here. Tell perlcritic we know what # we're doing. - eval "\$hash_ref = $_"; ## no critic (ProhibitStringyEval) + $hash_ref = eval $_;## no critic (ProhibitStringyEval) if (!ref $hash_ref) { die "$input_file: error parsing line $.:\n$_\n"; @@ -361,7 +361,7 @@ sub ParseData # the whole file at once. local $/; my $full_file = <$ifd>; - eval "\$data = $full_file"## no critic (ProhibitStringyEval) + $data = eval $full_file## no critic (ProhibitStringyEval) or die "error parsing $input_file\n"; foreach my $hash_ref (@{$data}) { -- 2.34.1
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
v30 attached rebased and pgstat_io_ops.c builds with meson now also, I tested with pgstat_report_stat() only flushing when forced and tests still pass From 153b06200b36891c9e07df526a86dbd913e36e3e Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Thu, 11 Aug 2022 18:28:50 -0400 Subject: [PATCH v30 3/3] Add system view tracking IO ops per backend type Add pg_stat_io, a system view which tracks the number of IOOps (allocs, extends, fsyncs, reads, and writes) done through each IOContext (shared buffers, local buffers, strategy buffers) by each type of backend (e.g. client backend, checkpointer). Some BackendTypes do not accumulate IO operations statistics and will not be included in the view. Some IOContexts are not used by some BackendTypes and will not be in the view. For example, checkpointer does not use a BufferAccessStrategy (currently), so there will be no row for the "strategy" IOContext for checkpointer. Some IOOps are invalid in combination with certain IOContexts. Those cells will be NULL in the view to distinguish between 0 observed IOOps of that type and an invalid combination. For example, local buffers are not fsync'd so cells for all BackendTypes for IOCONTEXT_STRATEGY and IOOP_FSYNC will be NULL. Some BackendTypes never perform certain IOOps. Those cells will also be NULL in the view. For example, bgwriter should not perform reads. View stats are populated with statistics incremented when a backend performs an IO Operation and maintained by the cumulative statistics subsystem. Each row of the view shows stats for a particular BackendType and IOContext combination (e.g. shared buffer accesses by checkpointer) and each column in the view is the total number of IO Operations done (e.g. writes). So a cell in the view would be, for example, the number of shared buffers written by checkpointer since the last stats reset. Note that some of the cells in the view are redundant with fields in pg_stat_bgwriter (e.g. buffers_backend), however these have been kept in pg_stat_bgwriter for backwards compatibility. Deriving the redundant pg_stat_bgwriter stats from the IO operations stats structures was also problematic due to the separate reset targets for 'bgwriter' and 'io'. Suggested by Andres Freund Author: Melanie Plageman Reviewed-by: Andres Freund Reviewed-by: Justin Pryzby Reviewed-by: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de --- doc/src/sgml/monitoring.sgml | 115 ++- src/backend/catalog/system_views.sql | 12 ++ src/backend/utils/adt/pgstatfuncs.c | 117 src/include/catalog/pg_proc.dat | 9 ++ src/test/regress/expected/rules.out | 9 ++ src/test/regress/expected/stats.out | 202 +++ src/test/regress/sql/stats.sql | 104 ++ 7 files changed, 567 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 9440b41770..c7ca078bc8 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -448,6 +448,15 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_iopg_stat_io + A row for each IO Context for each backend type showing + statistics about backend IO operations. See + + pg_stat_io for details. + + + pg_stat_walpg_stat_wal One row only, showing statistics about WAL activity. 
See @@ -3600,7 +3609,111 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i stats_reset timestamp with time zone - Time at which these statistics were last reset + Time at which these statistics were last reset. + + + + + + + + + + pg_stat_io + + + pg_stat_io + + + + The pg_stat_io view has a row for each backend type + and IO Context containing global data for the cluster on IO Operations done + by that backend type in that IO Context. + + + + pg_stat_io View + + + + + Column Type + + + Description + + + + + + + backend_type text + + + Type of backend (e.g. background worker, autovacuum worker). + + + + + + io_context text + + + IO Context used (e.g. shared buffers, local buffers). + + + + + + alloc bigint + + + Number of buffers allocated. + + + + + + read bigint + + + Number of blocks read. + + + + + + write bigint + + + Number of blocks written. + + + + + + extend bigint + + + Number of blocks extended. + + + + + + fsync bigint + + + Number of blocks fsynced. + +
longfin and tamandua aren't too happy but I'm not sure why
Hi, longfin and tamandua recently began failing like this, quite possibly as a result of 05d4cbf9b6ba708858984b01ca0fc56d59d4ec7c: +++ regress check in contrib/test_decoding +++ test ddl ... FAILED (test process exited with exit code 2) 3276 ms (all other tests in this suite also fail, probably because the server crashed here) The server logs look like this: 2022-09-27 13:51:08.652 EDT [37090:4] LOG: server process (PID 37105) was terminated by signal 4: Illegal instruction: 4 2022-09-27 13:51:08.652 EDT [37090:5] DETAIL: Failed process was running: SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); Both animals are running with -fsanitize=alignment and it's not difficult to believe that the commit mentioned above could have introduced an alignment problem where we didn't have one before, but without a stack backtrace I don't know how to track it down. I tried running those tests locally with -fsanitize=alignment and they passed. Any ideas on how to track this down? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
On Wed, Sep 28, 2022 at 1:03 AM Bharath Rupireddy wrote:> > On Tue, Sep 27, 2022 at 3:01 PM Thomas Munro wrote: > > Something like the attached. > > Isn't it also better to add notes in win32pread.c and win32pwrite.c > about the callers doing lseek(SEEK_SET) if they wish to and Windows > implementations changing file position? We can also add a TODO item > about replacing pg_ versions with pread and friends someday when > Windows fixes the issue? Having it in the commit and include/port.h is > good, but the comments in win32pread.c and win32pwrite.c make life > easier IMO. Otherwise, the patch LGTM. Thanks, will do. FWIW I doubt the OS itself will change released behaviour like that, but I did complain about the straight-up misleading documentation and they agreed and fixed it[1]. After some looking around, the only way I could find to avoid this behaviour is to switch to async handles, which do behave as documented in this respect, but then you can't use plain read/write at all unless you write replacement wrappers for those too, and AFAICT that adds more system calls (start write, wait for write to finish) and complexity/weirdness without any real payoff so it seems like the wrong direction, at least without more research that I'm not pursuing currently. (FWIW in AIO porting experiments a while back we used async handles to get IOs running concurrently with CPU work and possibly other IOs so it was actually worth doing, but that was only for smgr and the main wal writing code where there's no intermixed plain read/write calls as you have here, so the problem doesn't even come up.) [1] https://github.com/MicrosoftDocs/sdk-api/pull/1309
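For the caller-side note being added, a small sketch of the pattern in question (a fragment with error handling and setup elided; the buffer and offsets are made up): after a positioned pg_pwrite()/pg_pread(), POSIX leaves the file position untouched, but the Windows emulation may leave it after the transferred bytes, so code that goes on to use plain write()/read() should reposition explicitly.

char    buf[8192] = {0};

/* positioned write; on Windows the emulation may move the file position */
if (pg_pwrite(fd, buf, sizeof(buf), 0) != sizeof(buf))
    return;                         /* error handling elided */

/* re-establish a known position before falling back to plain write() */
if (lseek(fd, 0, SEEK_SET) < 0)
    return;

if (write(fd, buf, sizeof(buf)) != sizeof(buf))
    return;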
Re: pgsql: Increase width of RelFileNumbers from 32 bits to 56 bits.
On Tue, Sep 27, 2022 at 2:51 PM Justin Pryzby wrote: > This seems to be breaking cfbot: > https://cirrus-ci.com/github/postgresql-cfbot/postgresql > > For example: > https://cirrus-ci.com/task/6720256776339456 OK, so it looks like the pg_buffercache test is failing there. But it doesn't fail for me, and I don't see a regression.diffs file in the output that would enable me to see what is failing. If it's there, can you tell me how to find it? > Some other minor issues: Will push fixes. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pgsql: Increase width of RelFileNumbers from 32 bits to 56 bits.
On Tue, Sep 27, 2022 at 03:12:56PM -0400, Robert Haas wrote: > On Tue, Sep 27, 2022 at 2:51 PM Justin Pryzby wrote: > > This seems to be breaking cfbot: > > https://cirrus-ci.com/github/postgresql-cfbot/postgresql > > > > For example: > > https://cirrus-ci.com/task/6720256776339456 > > OK, so it looks like the pg_buffercache test is failing there. But it > doesn't fail for me, and I don't see a regression.diffs file in the > output that would enable me to see what is failing. If it's there, can > you tell me how to find it? It's here in the artifacts. https://api.cirrus-ci.com/v1/artifact/task/5647133427630080/testrun/build/testrun/pg_buffercache/regress/regression.diffs Actually, this worked under autoconf but failed under meson. I think you just need to make in contrib/pg_buffercache/meson.build the corresponding change that's already in ./Makefile.
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
On 26.09.22 19:34, Tom Lane wrote: I think we can do this while still having reasonable type-safety by adding AssertVariableIsOfTypeMacro() checks to the macros. An advantage of that solution is that we verify that the code will be safe for a 32-bit build even in 64-bit builds. (Of course, it's just checking the variable's type not its lifespan, but this is still a step forward.) 0001 attached is what you committed, 0002 is a proposed delta to fix the Fast macros. Thanks, I committed it like that. (I had looked into AssertVariableIsOfTypeMacro() for an earlier variant of this patch, before I had the idea with the inline functions. It's in general a bit too strict, such as with short vs int, and signed vs unsigned, but it should work ok for this limited set of uses.)
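For readers following along, the two shapes under discussion look roughly like this (condensed from postgres.h, not the verbatim committed text): the plain conversions become inline functions, while the *GetDatumFast() variants stay macros because they take the address of their argument on 32-bit builds, and now carry the type assertion.

static inline Datum
Int32GetDatum(int32 X)
{
    return (Datum) X;
}

static inline int32
DatumGetInt32(Datum X)
{
    return (int32) X;
}

#ifdef USE_FLOAT8_BYVAL
#define Int64GetDatumFast(X) \
    (AssertVariableIsOfTypeMacro(X, int64), Int64GetDatum(X))
#else
#define Int64GetDatumFast(X) \
    (AssertVariableIsOfTypeMacro(X, int64), PointerGetDatum(&(X)))
#endif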
Re: [PATCH] Log details for client certificate failures
On Tue, Sep 27, 2022 at 1:51 AM Masahiko Sawada wrote: > I think we can fix it by the attached patch but I'd like to discuss > whether it's worth fixing it. Whoops. So every time it's changed, we leak a little postmaster memory? Your patch looks good to me and I see no reason not to fix it. Thanks, --Jacob
Re: longfin and tamandua aren't too happy but I'm not sure why
On Tue, Sep 27, 2022 at 02:55:18PM -0400, Robert Haas wrote: > Both animals are running with -fsanitize=alignment and it's not > difficult to believe that the commit mentioned above could have > introduced an alignment problem where we didn't have one before, but > without a stack backtrace I don't know how to track it down. I tried > running those tests locally with -fsanitize=alignment and they passed. There's one here: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2022-09-27%2018%3A43%3A06 /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/access/rmgrdesc/xactdesc.c:102:30: runtime error: member access within misaligned address 0x04125074 for type 'xl_xact_invals' (aka 'struct xl_xact_invals'), which requires 8 byte alignment #0 0x5b6702 in ParseCommitRecord /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/access/rmgrdesc/xactdesc.c:102:30 #1 0xb5264d in xact_decode /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/decode.c:201:5 #2 0xb521ac in LogicalDecodingProcessRecord /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/decode.c:119:3 #3 0xb5e868 in pg_logical_slot_get_changes_guts /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/logicalfuncs.c:271:5 #4 0xb5e25f in pg_logical_slot_get_changes /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/logicalfuncs.c:338:9 #5 0x896bba in ExecMakeTableFunctionResult /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execSRF.c:234:13 #6 0x8c7660 in FunctionNext /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/nodeFunctionscan.c:95:5 #7 0x899048 in ExecScanFetch /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execScan.c:133:9 #8 0x89896b in ExecScan /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execScan.c:199:10 #9 0x8c6892 in ExecFunctionScan /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/nodeFunctionscan.c:270:9 #10 0x892f42 in ExecProcNodeFirst /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execProcnode.c:464:9 #11 0x8802dd in ExecProcNode /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/include/executor/executor.h:259:9 #12 0x8802dd in ExecutePlan /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execMain.c:1636:10 #13 0x8802dd in standard_ExecutorRun /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execMain.c:363:3 #14 0x87ffbb in ExecutorRun /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execMain.c:307:3 #15 0xc36c07 in PortalRunSelect /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/tcop/pquery.c:924:4 #16 0xc364ca in PortalRun /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/tcop/pquery.c:768:18 #17 0xc34138 in exec_simple_query /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:1238:10 #18 0xc30953 in PostgresMain /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c #19 0xb27e3f in BackendRun /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4482:2 #20 0xb2738d in BackendStartup /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4210:3 #21 0xb2738d in ServerLoop 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1804:7 #22 0xb24312 in PostmasterMain /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1476:11 #23 0x953694 in main /mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:197:3 #24 0x7f834e39a209 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #25 0x7f834e39a2bb in __libc_start_main csu/../csu/libc-start.c:389:3 #26 0x4a40a0 in _start (/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/tmp_install/mnt/resource/bf/build/kestrel/HEAD/inst/bin/postgres+0x4a40a0) Note that cfbot is warning for a different reason now: https://cirrus-ci.com/task/5794615155490816
Re: Convert *GetDatum() and DatumGet*() macros to inline functions
Peter Eisentraut writes: > On 26.09.22 19:34, Tom Lane wrote: >> I think we can do this while still having reasonable type-safety >> by adding AssertVariableIsOfTypeMacro() checks to the macros. > (I had looked into AssertVariableIsOfTypeMacro() for an earlier variant > of this patch, before I had the idea with the inline functions. It's in > general a bit too strict, such as with short vs int, and signed vs > unsigned, but it should work ok for this limited set of uses.) Yeah. I had sort of expected to need a UInt64GetDatumFast variant that would accept uint64, but there doesn't appear to be anyplace that wants that today. We should be willing to add it if anyone complains, though. regards, tom lane
Re: longfin and tamandua aren't too happy but I'm not sure why
Justin Pryzby writes: > On Tue, Sep 27, 2022 at 02:55:18PM -0400, Robert Haas wrote: >> Both animals are running with -fsanitize=alignment and it's not >> difficult to believe that the commit mentioned above could have >> introduced an alignment problem where we didn't have one before, but >> without a stack backtrace I don't know how to track it down. I tried >> running those tests locally with -fsanitize=alignment and they passed. > There's one here: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2022-09-27%2018%3A43%3A06 On longfin's host, the test_decoding run produces two core files. One has a backtrace like this: * frame #0: 0x00010a36af8c postgres`ParseCommitRecord(info='\x80', xlrec=0x7fa0678a8090, parsed=0x7ff7b5c50e78) at xactdesc.c:102:30 frame #1: 0x00010a765f9e postgres`xact_decode(ctx=0x7fa0680d9118, buf=0x7ff7b5c51000) at decode.c:201:5 [opt] frame #2: 0x00010a765d17 postgres`LogicalDecodingProcessRecord(ctx=0x7fa0680d9118, record=) at decode.c:119:3 [opt] frame #3: 0x00010a76d890 postgres`pg_logical_slot_get_changes_guts(fcinfo=, confirm=true, binary=false) at logicalfuncs.c:271:5 [opt] frame #4: 0x00010a76d320 postgres`pg_logical_slot_get_changes(fcinfo=) at logicalfuncs.c:338:9 [opt] frame #5: 0x00010a5a521d postgres`ExecMakeTableFunctionResult(setexpr=, econtext=0x7fa068098f50, argContext=, expectedDesc=0x7fa06701ba38, randomAccess=) at execSRF.c:234:13 [opt] frame #6: 0x00010a5c405b postgres`FunctionNext(node=0x7fa068098d40) at nodeFunctionscan.c:95:5 [opt] frame #7: 0x00010a5a61b9 postgres`ExecScan(node=0x7fa068098d40, accessMtd=(postgres`FunctionNext at nodeFunctionscan.c:61), recheckMtd=(postgres`FunctionRecheck at nodeFunctionscan.c:251)) at execScan.c:199:10 [opt] frame #8: 0x00010a596ee0 postgres`standard_ExecutorRun [inlined] ExecProcNode(node=0x7fa068098d40) at executor.h:259:9 [opt] frame #9: 0x00010a596eb8 postgres`standard_ExecutorRun [inlined] ExecutePlan(estate=, planstate=0x7fa068098d40, use_parallel_mode=, operation=CMD_SELECT, sendTuples=, numberTuples=0, direction=1745456112, dest=0x7fa067023848, execute_once=) at execMain.c:1636:10 [opt] frame #10: 0x00010a596e2a postgres`standard_ExecutorRun(queryDesc=, direction=1745456112, count=0, execute_once=) at execMain.c:363:3 [opt] and the other * frame #0: 0x00010a36af8c postgres`ParseCommitRecord(info='\x80', xlrec=0x7fa06783a090, parsed=0x7ff7b5c50040) at xactdesc.c:102:30 frame #1: 0x00010a3cd24d postgres`xact_redo(record=0x7fa0670096c8) at xact.c:6161:3 frame #2: 0x00010a41770d postgres`ApplyWalRecord(xlogreader=0x7fa0670096c8, record=0x7fa06783a060, replayTLI=0x7ff7b5c507f0) at xlogrecovery.c:1897:2 frame #3: 0x00010a4154be postgres`PerformWalRecovery at xlogrecovery.c:1728:4 frame #4: 0x00010a3e0dc7 postgres`StartupXLOG at xlog.c:5473:3 frame #5: 0x00010a7498a0 postgres`StartupProcessMain at startup.c:267:2 [opt] frame #6: 0x00010a73e2cb postgres`AuxiliaryProcessMain(auxtype=StartupProcess) at auxprocess.c:141:4 [opt] frame #7: 0x00010a745b97 postgres`StartChildProcess(type=StartupProcess) at postmaster.c:5408:3 [opt] frame #8: 0x00010a7487e2 postgres`PostmasterStateMachine at postmaster.c:4006:16 [opt] frame #9: 0x00010a745804 postgres`reaper(postgres_signal_arg=) at postmaster.c:3256:2 [opt] frame #10: 0x7ff815b16dfd libsystem_platform.dylib`_sigtramp + 29 frame #11: 0x7ff815accd5b libsystem_kernel.dylib`__select + 11 frame #12: 0x00010a74689c postgres`ServerLoop at postmaster.c:1768:13 [opt] frame #13: 0x00010a743fbb postgres`PostmasterMain(argc=, argv=0x606480a0) at 
postmaster.c:1476:11 [opt] frame #14: 0x00010a61c775 postgres`main(argc=8, argv=) at main.c:197:3 [opt] Looks like it might be the same bug, but perhaps not. I recompiled access/transam and access/rmgrdesc at -O0 to get the accurate line numbers shown for those files. Let me know if you need any more info; I can add -O0 in more places, or poke around in the cores. regards, tom lane
Re: longfin and tamandua aren't too happy but I'm not sure why
I wrote: > * frame #0: 0x00010a36af8c postgres`ParseCommitRecord(info='\x80', > xlrec=0x7fa0678a8090, parsed=0x7ff7b5c50e78) at xactdesc.c:102:30 Okay, so the problem is this: by widening RelFileNumber to 64 bits, you have increased the alignment requirement of struct RelFileLocator, and thereby also SharedInvalidationMessage, to 8 bytes where it had been 4. longfin's alignment check is therefore expecting that xl_xact_twophase will likewise be 8-byte-aligned, but it isn't: (lldb) p data (char *) $0 = 0x7fa06783a0a4 "\U0001" I'm not sure whether the code that generates commit WAL records is breaking a contract it should maintain, or xactdesc.c needs to be taught to not assume that this data is adequately aligned. There is a second problem that I am going to hold your feet to the fire about: (lldb) p sizeof(SharedInvalidationMessage) (unsigned long) $1 = 24 We have sweated a good deal for a long time to keep that struct to 16 bytes. I do not think 50% bloat is acceptable. regards, tom lane
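To make the mechanism concrete, a standalone illustration with simplified stand-in structs (not the real RelFileLocator definition) shows how widening one member to 64 bits raises both the struct's alignment requirement and its padded size:

#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>

typedef struct OldLocator { uint32_t spcOid; uint32_t dbOid; uint32_t relNumber; } OldLocator;
typedef struct NewLocator { uint32_t spcOid; uint32_t dbOid; uint64_t relNumber; } NewLocator;

int
main(void)
{
    /* prints "old: align 4, size 12" and "new: align 8, size 16" on typical 64-bit ABIs */
    printf("old: align %zu, size %zu\n", alignof(OldLocator), sizeof(OldLocator));
    printf("new: align %zu, size %zu\n", alignof(NewLocator), sizeof(NewLocator));
    return 0;
}

Record data that is only 4-byte aligned, like the xl_xact_twophase payload above, then trips -fsanitize=alignment as soon as a struct with the new 8-byte requirement is accessed through it.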
Re: longfin and tamandua aren't too happy but I'm not sure why
On Tue, Sep 27, 2022 at 4:50 PM Tom Lane wrote: > I wrote: > > * frame #0: 0x00010a36af8c postgres`ParseCommitRecord(info='\x80', > > xlrec=0x7fa0678a8090, parsed=0x7ff7b5c50e78) at xactdesc.c:102:30 > > Okay, so the problem is this: by widening RelFileNumber to 64 bits, > you have increased the alignment requirement of struct RelFileLocator, > and thereby also SharedInvalidationMessage, to 8 bytes where it had > been 4. longfin's alignment check is therefore expecting that > xl_xact_twophase will likewise be 8-byte-aligned, but it isn't: Yeah, I reached the same conclusion. > There is a second problem that I am going to hold your feet to the > fire about: > > (lldb) p sizeof(SharedInvalidationMessage) > (unsigned long) $1 = 24 > > We have sweated a good deal for a long time to keep that struct > to 16 bytes. I do not think 50% bloat is acceptable. I noticed that problem, too. The attached patch, which perhaps you can try out, fixes the alignment issue and also reduces the size of SharedInvalidationMessage from 24 bytes back to 20 bytes. I do not really see a way to do better than that. We use 1 type byte, 3 bytes for the backend ID, 4 bytes for the database OID, and 4 bytes for the tablespace OID. Previously, we then used 4 bytes for the relfilenode, but now we need 7, and there's no place from which we can plausibly steal those bits, at least not as far as I can see. If you have ideas, I'm all ears. Also, I don't really know what problem you think it's going to cause if that structure gets bigger. If we increased the size from 16 bytes even all the way to 32 or 64 bytes, what negative impact do you think that would have? It would use a little bit more shared memory, but on modern systems I doubt it would be enough to get excited about. The bigger impact would probably be that it would make commit records a bit bigger since those carry invalidations as payload. That is not great, but I think it only affects commit records for transactions that do DDL, so I'm struggling to see that as a big performance problem. -- Robert Haas EDB: http://www.enterprisedb.com v1-0001-tweak-SharedInvalSmgrMsg.patch Description: Binary data
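Since the attached patch is only referenced here, a purely hypothetical sketch of how a 20-byte, 4-byte-aligned layout could work (field names invented; the actual patch may differ): split the 56-bit RelFileNumber across two 32-bit fields so that nothing in the message requires 8-byte alignment.

typedef struct
{
    int8        id;                /* message type, must be first */
    int8        backend_hi;        /* high bits of backend ID, if temprel */
    uint16      backend_lo;        /* low bits of backend ID, if temprel */
    Oid         dbOid;             /* 4 bytes */
    Oid         spcOid;            /* 4 bytes */
    uint32      relnumber_lo;      /* low 32 bits of the RelFileNumber */
    uint32      relnumber_hi;      /* high 24 bits (top byte unused) */
} SharedInvalSmgrMsgSketch;        /* 20 bytes, 4-byte alignment */

/* reassembling the 56-bit value would then look something like this */
#define SKETCH_RELNUMBER(msg) \
    ((((uint64) (msg)->relnumber_hi) << 32) | (msg)->relnumber_lo)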
Re: longfin and tamandua aren't too happy but I'm not sure why
... also, lapwing's not too happy [1]. The alter_table test expects this to yield zero rows, but it doesn't: SELECT m.* FROM filenode_mapping m LEFT JOIN pg_class c ON c.oid = m.oid WHERE c.oid IS NOT NULL OR m.mapped_oid IS NOT NULL; I've reproduced that symptom in a 32-bit FreeBSD VM building with clang, so I suspect that it'll occur on any 32-bit build. mamba is a couple hours away from offering a confirmatory data point, though. (BTW, is that test case sane at all? I'm bemused by the symmetrical NOT NULL tests on a fundamentally not-symmetrical left join; what are those supposed to accomplish? Also, the fact that it doesn't deign to show any fields from "c" is sure making it hard to tell what's wrong.) regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2022-09-27%2018%3A40%3A18
Re: Allow foreign keys to reference a superset of unique columns
> For one example of where the semantics get fuzzy, it's not > very clear how the extra-baggage columns ought to participate in > CASCADE updates. Currently, if we have >CREATE TABLE foo (a integer PRIMARY KEY, b integer); > then an update that changes only foo.b doesn't need to update > referencing tables, and I think we even have optimizations that > assume that if no unique-key columns are touched then RI checks > need not be made. But if you did >CREATE TABLE bar (x integer, y integer, > FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE); > then perhaps you expect bar.y to be updated ... or maybe you don't? I'd expect bar.y to be updated. In my mind, the FOREIGN KEY constraint should behave the same, regardless of whether the underlying unique index on the referenced side is an equivalent set to, or a strict subset of, the referenced columns. > Another example is that I think the idea is only well-defined when > the subset column(s) are a primary key, or at least all marked NOT NULL. > Otherwise they're not as unique as you're claiming. But then the FK > constraint really has to be dependent on a PK constraint not just an > index definition, since indexes in themselves don't enforce not-nullness. > That gets back to Peter's complaint that referring to an index isn't > good enough. I think that uniqueness should be guaranteed enough even if the subset columns are nullable: CREATE TABLE foo (a integer UNIQUE, b integer); CREATE TABLE bar ( x integer, y integer, FOREIGN KEY (x, y) REFERENCES foo(a, b) ); The unique index underlying foo.a guarantees that (foo.a, foo.b) is unique if foo.a isn't NULL. That is, there can be multiple rows (NULL, 1) in foo. However, such a row can't be the target of the foreign key constraint anyway. So, I'm fairly certain that, where it matters, a unique index on a nullable subset of the referenced columns guarantees a distinct referenced row. > It's also unclear to me how this ought to interact with the > information_schema views concerning foreign keys. We generally > feel that we don't want to present any non-SQL-compatible data > in information_schema, for fear that it will confuse applications > that expect to see SQL-spec behavior there. So do we leave such > FKs out of the views altogether, or show only the columns involving > the associated unique constraint? Neither answer seems pleasant. 
Here's the information_schema output for this example: CREATE TABLE foo (a integer, b integer); CREATE UNIQUE INDEX ON foo (a, b); CREATE TABLE bar ( x integer, y integer, FOREIGN KEY (x, y) REFERENCES foo(a, b) ); # SELECT * FROM information_schema.referential_constraints WHERE constraint_name = 'bar_x_y_fkey'; -[ RECORD 1 ]-+-- constraint_catalog| kaitingc constraint_schema | public constraint_name | bar_x_y_fkey unique_constraint_catalog | unique_constraint_schema | unique_constraint_name| match_option | NONE update_rule | NO ACTION delete_rule | NO ACTION # SELECT * FROM information_schema.key_column_usage WHERE constraint_name = 'bar_x_y_fkey'; -[ RECORD 173 ]---+-- constraint_catalog| kaitingc constraint_schema | public constraint_name | bar_x_y_fkey table_catalog | kaitingc table_schema | public table_name| bar column_name | x ordinal_position | 1 position_in_unique_constraint | 1 -[ RECORD 174 ]---+-- constraint_catalog| kaitingc constraint_schema | public constraint_name | bar_x_y_fkey table_catalog | kaitingc table_schema | public table_name| bar column_name | y ordinal_position | 2 position_in_unique_constraint | 2 It appears that currently in PostgreSQL, the unique_constraint_catalog, schema, and name are NULL in referential_constraints when a unique index (without an associated unique constraint) underlies the referenced columns. The behaviour I'm proposing would have the same behavior vis-a-vis referential_constraints. As for key_column_usage, I propose that position_in_unique_constraint be NULL if the referenced column isn't indexed.
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Mon, Sep 26, 2022 at 6:39 PM Andrey Chudnovsky wrote: > For the providing token directly, that would be primarily used for > scenarios where the same party controls both the server and the client > side wrapper. > I.e. The client knows how to get a token for a particular principal > and doesn't need any additional information other than human readable > messages. > Please clarify the scenarios where you see this falling apart. The most concrete example I can see is with the OAUTHBEARER error response. If you want to eventually handle differing scopes per role, or different error statuses (which the proof-of-concept currently hardcodes as `invalid_token`), then the client can't assume it knows what the server is going to say there. I think that's true even if you control both sides and are hardcoding the provider. How should we communicate those pieces to a custom client when it's passing a token directly? The easiest way I can see is for the custom client to speak the OAUTHBEARER protocol directly (e.g. SASL plugin). If you had to parse the libpq error message, I don't think that'd be particularly maintainable. > I can provide an example in the cloud world. We (Azure) as well as > other providers offer ways to obtain OAUTH tokens for > Service-to-Service communication at IAAS / PAAS level. > on Azure "Managed Identity" feature integrated in Compute VM allows a > client to make a local http call to get a token. VM itself manages the > certificate livecycle, as well as implements the corresponding OAUTH > flow. > This capability is used by both our 1st party PAAS offerings, as well > as 3rd party services deploying on VMs or managed K8S clusters. > Here, the client doesn't need libpq assistance in obtaining the token. Cool. To me that's the strongest argument yet for directly providing tokens to libpq. > My optimistic plan here would be to implement several core OAUTH flows > in libpq core which would be generic enough to support major > enterprise OAUTH providers: > 1. Client Credentials flow (Client_id + Client_secret) for backend > applications. > 2. Authorization Code Flow with PKCE and/or Device code flow for GUI > applications. As long as it's clear to DBAs when to use which flow (because existing documentation for that is hit-and-miss), I think it's reasonable to eventually support multiple flows. Personally my preference would be to start with one or two core flows, and expand outward once we're sure that we do those perfectly. Otherwise the explosion of knobs and buttons might be overwhelming, both to users and devs. Related to the question of flows is the client implementation library. I've mentioned that I don't think iddawc is production-ready. As far as I'm aware, there is only one certified OpenID relying party written in C, and that's... an Apache server plugin. That leaves us either choosing an untested library, scouring the web for a "tested" library (and hoping we're right in our assessment), or implementing our own (which is going to tamp down enthusiasm for supporting many flows, though that has its own set of benefits). If you know of any reliable implementations with a C API, please let me know. > (2.) above would require a protocol between libpq and upstream clients > to exchange several messages. > Your patch includes a way for libpq to deliver to the client a message > about the next authentication steps, so planned to build on top of > that. Specifically it delivers that message to an end user. 
If you want a generic machine client to be able to use that, then we'll need to talk about how. > A little about scenarios, we look at. > What we're trying to achieve here is an easy integration path for > multiple players in the ecosystem: > - Managed PaaS Postgres providers (both us and multi-cloud solutions) > - SaaS providers deploying postgres on IaaS/PaaS providers' clouds > - Tools - pg_admin, psql and other ones. > - BI, ETL, Federation and other scenarios where postgres is used as > the data source. > > If we can offer a provider agnostic solution for Backend <=> libpq <=> > Upstreal client path, we can have all players above build support for > OAUTH credentials, managed by the cloud provider of their choice. Well... I don't quite understand why we'd go to the trouble of providing a provider-agnostic communication solution only to have everyone write their own provider-specific client support. Unless you're saying Microsoft would provide an officially blessed plugin for the *server* side only, and Google would provide one of their own, and so on. The server side authorization is the only place where I think it makes sense to specialize by default. libpq should remain agnostic, with the understanding that we'll need to make hard decisions when a major provider decides not to follow a spec. > For us, that would mean: > - Better administrator experience with pg_admin / psql handling of the > AAD (Azure Active Directory) authentication flows. > - Path for integration s
Re: longfin and tamandua aren't too happy but I'm not sure why
Robert Haas writes: > On Tue, Sep 27, 2022 at 4:50 PM Tom Lane wrote: >> There is a second problem that I am going to hold your feet to the >> fire about: >> (lldb) p sizeof(SharedInvalidationMessage) >> (unsigned long) $1 = 24 > Also, I don't really know what problem you think it's going to cause > if that structure gets bigger. If we increased the size from 16 bytes > even all the way to 32 or 64 bytes, what negative impact do you think > that would have? Maybe it wouldn't have any great impact. I don't know, but I don't think it's incumbent on me to measure that. You or the patch author should have had a handle on that question *before* committing. regards, tom lane
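For readers following the size concern: one way to catch this kind of growth at compile time, rather than after a buildfarm cycle, is a static assertion over the struct size. This is only a sketch of the idea — no such assertion exists in sinval.h today, and the 16-byte figure is simply the historical size being defended above:

#include "postgres.h"
#include "storage/sinval.h"

/* Sketch only: fail the build if the sinval message union grows. */
StaticAssertDecl(sizeof(SharedInvalidationMessage) <= 16,
                 "SharedInvalidationMessage must not exceed 16 bytes");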
Re: Allow foreign keys to reference a superset of unique columns
> As I was reading through the email chain I had this thought: could you > get the same benefit (or 90% of it anyway) by instead allowing the > creation of a uniqueness constraint that contains more columns than > the index backing it? So long as the index backing it still guaranteed > the uniqueness on a subset of columns that would seem to be safe. > After writing down that idea I noticed Wolfgang Walther had commented > similarly, but it appears that that idea got lost (or at least not > responded to). Is it necessary to have the unique constraint at all? This currently works in PostgreSQL: CREATE TABLE foo (a integer, b integer); CREATE UNIQUE INDEX ON foo (a, b); CREATE TABLE bar ( x integer, y integer, FOREIGN KEY (x, y) REFERENCES foo(a, b) ); Where no unique constraint exists on foo (a, b). Forcing the creation of a unique constraint in this case seems more confusing to me, as a user, than allowing it without the definition of the unique constraint, given the existing behavior. > I'd be happy to sign up to review an updated patch if you're > interested in continuing this effort. If so, could you register the > patch in the CF app (if not there already)? The patch should already be registered! Though it's still in a state that needs a lot of work.
Re: Allow foreign keys to reference a superset of unique columns
> Could we create this additional unique constraint implicitly, when using
> FOREIGN KEY ... REFERENCES on a superset of unique columns? This would
> make it easier to use, but still give proper information_schema output.

Currently, a foreign key declared where the referenced columns have only a unique index and not a unique constraint already populates the constraint related columns of information_schema.referential_constraints with NULL. It doesn't seem like this change would require a major deviation from the existing behavior in information_schema:

CREATE TABLE foo (a integer, b integer);
CREATE UNIQUE INDEX ON foo (a, b);
CREATE TABLE bar (
  x integer,
  y integer,
  FOREIGN KEY (x, y) REFERENCES foo(a, b)
);

# SELECT * FROM information_schema.referential_constraints
    WHERE constraint_name = 'bar_x_y_fkey';
-[ RECORD 1 ]-------------+-------------
constraint_catalog        | kaitingc
constraint_schema         | public
constraint_name           | bar_x_y_fkey
unique_constraint_catalog |
unique_constraint_schema  |
unique_constraint_name    |
match_option              | NONE
update_rule               | NO ACTION
delete_rule               | NO ACTION

The only change would be to information_schema.key_column_usage:

# SELECT * FROM information_schema.key_column_usage
    WHERE constraint_name = 'bar_x_y_fkey';
-[ RECORD 173 ]---------------+-------------
constraint_catalog            | kaitingc
constraint_schema             | public
constraint_name               | bar_x_y_fkey
table_catalog                 | kaitingc
table_schema                  | public
table_name                    | bar
column_name                   | x
ordinal_position              | 1
position_in_unique_constraint | 1
-[ RECORD 174 ]---------------+-------------
constraint_catalog            | kaitingc
constraint_schema             | public
constraint_name               | bar_x_y_fkey
table_catalog                 | kaitingc
table_schema                  | public
table_name                    | bar
column_name                   | y
ordinal_position              | 2
position_in_unique_constraint | 2

Where position_in_unique_constraint would have to be NULL for the referenced columns that don't appear in the unique index. That column is already nullable:

    For a foreign-key constraint, ordinal position of the referenced column
    within its unique constraint (count starts at 1); otherwise null

So it seems like this would be a minor documentation change at most. Also, should that documentation be updated to mention that it's actually the "ordinal position of the referenced column within its unique index" (since it's a little confusing that in referential_constraints, unique_constraint_name is NULL)?
Re: Allow foreign keys to reference a superset of unique columns
Kaiting Chen writes: >> Another example is that I think the idea is only well-defined when >> the subset column(s) are a primary key, or at least all marked NOT NULL. >> Otherwise they're not as unique as you're claiming. But then the FK >> constraint really has to be dependent on a PK constraint not just an >> index definition, since indexes in themselves don't enforce not-nullness. >> That gets back to Peter's complaint that referring to an index isn't >> good enough. > The unique index underlying foo.a guarantees that (foo.a, foo.b) is unique > if foo.a isn't NULL. That is, there can be multiple rows (NULL, 1) in foo. > However, such a row can't be the target of the foreign key constraint > anyway. You're ignoring the possibility of a MATCH PARTIAL FK constraint. Admittedly, we don't implement those today, and there hasn't been a lot of interest in doing so. But they're in the SQL spec so we should fix that someday. I also wonder how this all interacts with the UNIQUE NULLS NOT DISTINCT feature that we just got done implementing for v15. I don't know if the spec says that an FK depending on such a constraint should treat nulls as ordinary unique values --- but it sure seems like that'd be a plausible user expectation. The bottom line is there's zero chance you'll ever convince me that this is a good idea. I think the semantics are at best questionable, I think it will break important optimizations, and I think the chances of finding ourselves in conflict with some future SQL spec extension are too high. (Even if you can make the case that this isn't violating the spec *today*, which I rather doubt so far as the information_schema is concerned. The fact that we've got legacy behaviors that are outside the spec there isn't a great argument for adding more.) Now, if you can persuade the SQL committee that this behavior should be standardized, then two of those concerns would go away (since I don't think you'll get squishy semantics past them). But I think submitting a patch now is way premature and mostly going to waste people's time. regards, tom lane
Re: SYSTEM_USER reserved word implementation
On 9/26/22 06:29, Drouvot, Bertrand wrote: > Please find attached V4 taking care of Jacob's previous comments. > + /* > + * InitializeSystemUser should already be called once we are sure that > + * authn_id is not NULL (means auth_method is actually valid). > + * But keep the test here also for safety. > + */ > + if (authn_id) Since there are only internal clients to the API, I'd argue this makes more sense as an Assert(authn_id != NULL), but I don't think it's a dealbreaker. > As far the assertion failure mentioned by Michael when moving the > SVFOP_SYSTEM_USER from NAMEOID to TEXTOID: V4 is assuming that it is > safe to force the collation to C_COLLATION_OID for SQLValueFunction > having a TEXT type, but I would be happy to also hear your thoughts > about it. Unfortunately I don't have much to add here; I don't know enough about the underlying problems. Thanks, --Jacob
Re: Allow foreign keys to reference a superset of unique columns
>>> Another example is that I think the idea is only well-defined when
>>> the subset column(s) are a primary key, or at least all marked NOT NULL.
>>> Otherwise they're not as unique as you're claiming. But then the FK
>>> constraint really has to be dependent on a PK constraint not just an
>>> index definition, since indexes in themselves don't enforce not-nullness.
>>> That gets back to Peter's complaint that referring to an index isn't
>>> good enough.

>> The unique index underlying foo.a guarantees that (foo.a, foo.b) is unique
>> if foo.a isn't NULL. That is, there can be multiple rows (NULL, 1) in foo.
>> However, such a row can't be the target of the foreign key constraint
>> anyway.

> You're ignoring the possibility of a MATCH PARTIAL FK constraint.
> Admittedly, we don't implement those today, and there hasn't been
> a lot of interest in doing so. But they're in the SQL spec so we
> should fix that someday.

> I also wonder how this all interacts with the UNIQUE NULLS NOT
> DISTINCT feature that we just got done implementing for v15.
> I don't know if the spec says that an FK depending on such a
> constraint should treat nulls as ordinary unique values --- but
> it sure seems like that'd be a plausible user expectation.

I don't think that the UNIQUE NULLS DISTINCT/NOT DISTINCT patch will have any impact on this proposal. Currently (and admittedly I haven't thought at all about MATCH PARTIAL), a NULL in a referencing row precludes a reference at all:

* If the foreign key constraint is declared MATCH SIMPLE, then no referenced row exists for the referencing row.

* If the foreign key constraint is declared MATCH FULL, then the referencing row must not have a NULL in any of its referencing columns.

UNIQUE NULLS NOT DISTINCT is the current behavior, and this proposal shouldn't have a problem with the current behavior. In the case of UNIQUE NULLS DISTINCT, then NULLs behave, from a uniqueness perspective, as a singleton value and thus shouldn't cause any additional semantic difficulties in regards to this proposal.

I don't have access to a copy of the SQL specification and it doesn't look like anyone implements MATCH PARTIAL. Based on what I can gather from the internet, it appears that MATCH PARTIAL allows at most one referencing column to be NULL, and guarantees that at least one row in the referenced table matches the remaining columns; implicitly, multiple matches are allowed. If these are the semantics of MATCH PARTIAL, then it seems to me that the uniqueness of the referenced rows isn't very important.

What other semantics and edge cases regarding this proposal should I consider?
Re: Allow foreign keys to reference a superset of unique columns
> So the broader point I'm trying to make is that, as I understand it,
> indexes backing foreign key constraints is an implementation detail.
> The SQL standard details the behavior of foreign key constraints
> regardless of implementation details like a backing index. That means
> that the behavior of two column foreign key constraints is defined in
> a single way whether or not there's a backing index at all or whether
> such a backing index, if present, contains one or two columns.

> I understand that for the use case you're describing this isn't the
> absolute most efficient way to implement the desired data semantics.
> But it would be incredibly confusing (and, I think, a violation of the
> SQL standard) to have one foreign key constraint work in a different
> way from another such constraint when both are indistinguishable at
> the constraint level (the backing index isn't an attribute of the
> constraint; it's merely an implementation detail).

It appears to me that the unique index backing a foreign key constraint *isn't* an implementation detail in PostgreSQL; rather, the *unique constraint* is the implementation detail. The reason I say this is that it's possible to create a foreign key constraint where the uniqueness of the referenced columns is guaranteed by only a unique index and where no such unique constraint exists. Specifically, this line in the documentation:

    The referenced columns must be the columns of a non-deferrable unique or
    primary key constraint in the referenced table.

isn't true. In practice, the referenced columns must be the columns of a valid, nondeferrable, nonfunctional, nonpartial, unique index. Whether or not a unique constraint exists is immaterial to whether or not postgres will let you define such a foreign key constraint.
Re: GUC values - recommended way to declare the C variables?
On Tue, Sep 27, 2022 at 11:07 AM Peter Smith wrote: > ... > I will try to post a patch in the new few days to address (per your > suggestions) some of the variables that I am more familiar with. > PSA a small patch to tidy a few of the GUC C variables - adding comments and removing unnecessary declaration assignments. make check-world passed OK. -- Kind Regards, Peter Smith. Fujitsu Australia. v1-0001-Tidied-some-GUC-C-variable-declarations.patch Description: Binary data
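For anyone skimming the thread, the tidy-up being described looks roughly like the following. This is an illustrative sketch, not an excerpt from the attached patch; wal_keep_size_mb merely stands in for any GUC-backed C variable:

/*
 * Illustrative only: a GUC-backed C variable declared without a redundant
 * initializer, with a comment pointing at where the value really comes from.
 * (Previously such declarations often read "int wal_keep_size_mb = 0;",
 * duplicating the boot value recorded in guc_tables.c.)
 */
int			wal_keep_size_mb;	/* determined by the GUC mechanism */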
Re: Adding a clang-format file
On Tue, Sep 27, 2022 at 5:35 AM Aleksander Alekseev wrote: > Personally I don't have anything against the idea. TimescaleDB uses > clang-format to mimic pgindent and it works quite well. One problem > worth mentioning though is that the clang-format file is dependent on > the particular version of clang-format. I was hoping that something generic could work here. Something that we could provide that didn't claim to be authoritative, that has a reasonable degree of compatibility that allows most people to use the file without much fuss. Kind of like our .editorconfig file. That might not be a realistic goal, though, since the clang-format settings are all quite complicated. -- Peter Geoghegan
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On Tue, Sep 27, 2022 at 03:11:54PM +0530, Bharath Rupireddy wrote:
> On Tue, Sep 27, 2022 at 2:20 PM Kyotaro Horiguchi
> wrote:
>> If this is still unacceptable, I propose to change the comment. (I
>> found that the previous patch forgets about do_pg_backup_stop())
>>
>> - * It fills in backup_state with the information required for the backup,
>> + * It fills in the parameter "state" with the information required for the
>> backup,
>
> +1. There's another place that uses backup_state in the comments. I
> modified that as well. Please see the attached patch.

Thanks, fixed the comments. I have left the variable names as they are now in the code, as both are backup-related code paths so it is IMO clear that the state is linked to a backup.
--
Michael

signature.asc
Description: PGP signature
Re: [PATCH] Log details for client certificate failures
On Wed, Sep 28, 2022 at 4:44 AM Jacob Champion wrote: > > On Tue, Sep 27, 2022 at 1:51 AM Masahiko Sawada wrote: > > I think we can fix it by the attached patch but I'd like to discuss > > whether it's worth fixing it. > > Whoops. So every time it's changed, we leak a little postmaster memory? No. Since cluster_name is PGC_POSTMATER, we leak a little postmaster memory only once when starting up. application_name is PGC_USERSET but since we normally allocate memory in PortalMemoryContext we eventually can free it. Since check_cluster_name and check_application_name are similar, I changed both for consistency. Regards, -- Masahiko Sawada PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [small patch] Change datatype of ParallelMessagePending from "volatile bool" to "volatile sig_atomic_t"
On Tue, Sep 27, 2022 at 01:38:26AM +, kuroda.hay...@fujitsu.com wrote: > Maybe you mentioned about sigint_interrupt_enabled, > and it also seems to be modified in the signal handler. Yeah, at least as of the cancel callback psql_cancel_callback() that handle_sigint() would call on SIGINT as this is set by psql. So it does not seem right to use a boolean rather than a sig_atomic_t in this case, as well. -- Michael signature.asc Description: PGP signature
Re: GUC tables - use designated initializers
On Wed, Sep 28, 2022 at 2:21 AM Tom Lane wrote: > > Peter Smith writes: > > Enums index a number of the GUC tables. This all relies on the > > elements being carefully arranged to be in the same order as those > > enums. There are comments to say what enum index belongs to each table > > element. > > But why not use designated initializers to enforce what the comments > > are hoping for? > > Interesting proposal, but it's not clear to me that this solution makes > the code more bulletproof rather than less so. Yeah, you removed the > order dependency, but the other concern here is that the array gets > updated at all when adding a new enum entry. This method seems like > it'd help disguise such an oversight. In particular, the adjacent > StaticAssertDecls about the array lengths are testing something different > than they used to, and I fear they lost some of their power. Thanks for the feedback! The current code StaticAssertDecl asserts that the array length is the same as the number of enums by using hardwired knowledge of what enum is the "last" one (e.g. DEVELOPER_OPTIONS in the example below). StaticAssertDecl(lengthof(config_group_names) == (DEVELOPER_OPTIONS + 2), "array length mismatch"); Hmmm. I think maybe I found the example to justify your fear. It's a bit subtle and AFAIK the HEAD code would not suffer this problem --- imagine if the developer adds the new enum before the "last" one (e.g. ADD_NEW_BEFORE_LAST comes before DEVELOPER_OPTIONS) and at the same time they *forgot* to update the table elements, then that designated index [DEVELOPER_OPTIONS] will still ensure the table becomes the correct increased length (so the StaticAssertDecl will be ok) except now there will be an undetected "hole" in the table at the forgotten [ADD_NEW_BEFORE_LAST] index. > Can we > improve those checks so they'd catch a missing entry again? Thinking... -- Kind Regards, Peter Smith. Fujitsu Australia
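To see the failure mode concretely, here is a reduced sketch (not the real guc_tables.c contents; as quoted above, the real array also has a trailing NULL entry and asserts against DEVELOPER_OPTIONS + 2, but the effect is the same):

#include "postgres.h"

typedef enum
{
	UNGROUPED,
	FILE_LOCATIONS,
	ADD_NEW_BEFORE_LAST,		/* hypothetical newly added group */
	DEVELOPER_OPTIONS
} example_config_group;

static const char *const example_group_names[] = {
	[UNGROUPED] = "Ungrouped",
	[FILE_LOCATIONS] = "File Locations",
	/* [ADD_NEW_BEFORE_LAST] forgotten -- its slot is silently NULL */
	[DEVELOPER_OPTIONS] = "Developer Options",
};

/* Length is still DEVELOPER_OPTIONS + 1, so this passes despite the hole. */
StaticAssertDecl(lengthof(example_group_names) == (DEVELOPER_OPTIONS + 1),
				 "array length mismatch");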
meson vs mingw, plpython, readline and other fun
Hi, Melih mentioned on IM that while he could build postgres with meson on windows w/ mingw, the tests didn't run. Issues: - The bit computing PATH to the temporary installation for running tests only dealt with backward slashes in paths on windows, because that's what python.org python uses by default. But a msys ucrt python returns forward slashes. Trivial fix. I didn't encounter this because I'd used a meson from git, which thus didn't have msys's patch to return a different prefix. This made pg_regress/isolationtester tests other than the main regression tests pass. - I'd only passed in a fake HOST_TUPLE when building pg_regress, oops. I don't think it makes sense to come up with a config.guess compatible name - they're quite random. And we can't rely on shell to work when targetting msvc. The attached patch does: # Need make up something roughly like x86_64-pc-mingw64. resultmap matches on # patterns like ".*-.*-mingw.*". We probably can do better, but for now just # replace 'gcc' with 'mingw' on windows. host_tuple_cc = cc.get_id() if host_system == 'windows' and host_tuple_cc == 'gcc' host_tuple_cc = 'mingw' endif host_tuple = '@0@-@1@-@2@'.format(host_cpu, host_system, host_tuple_cc) which I don't perfectly like (e.g. clang also kind of works on windows), but it seems ok enough for now. I suspect we'd need a bunch of other changes to make clang on windows work. This made the main pg_regress tests pass. - I had not added the logic to not use existing getopt on mingw, causing tap tests to fail. Fixing that didn't immediately work because of duplicate symbols - because I hadn't copied over -Wl,--allow-multiple-definition. "Contrary" to the comment in src/template/win32 it doesn't appear to be needed for libpq and pgport - likely because for the meson build an export file is generated (needed for msvc anyway, I didn't think to disable it for mingw). But since we need to be able to override getopt(), we obviously need --allow-multiple-definition anyway. - This lead me to try to also add -Wl,--disable-auto-import. However, that caused two problems. 1) plpython tests started to fail, due to not finding Pg_magic_func in plpython3.dll. This confounded me for quite a while. It worked for every other extension .dll? A lot of looking around lead me to define #define PGDLLEXPORT __declspec (dllexport) for mingw as well. For mingw we otherwise end up with #define PGDLLEXPORT __attribute__((visibility("default"))) which works. As far as I can tell __attribute__((visibility("default"))) works as long as as there's no declspec(dllexport) symbol in the same dll. If there is, it stops working. Ugh. I don't see a reason not to define PGDLLEXPORT as __declspec(dllexport) for mingw as well? I suspect this is an issue for autoconf mingw build as well, but that fails to even configure - there's no coverage on the BF for it I think. This made plpython's test pass (again). 2) psql failed to build due to readline. I hadn't implemented disabling it automatically. Somewhat luckily - turns out it actually works (as long as --disable-auto-import isn't used), including autocomplete! The issue afaict is that while readline has an import library, functions aren't "annotated" with __declspec(dllimport), thus without --enable-auto-import the references are assumed to be local, and thus linking fails. It's possible we could "fix" this by declaring the relevant symbols ourselves or such. But for now just adding --enable-auto-import to the flags used to link to readline seems saner? 
I think it'd be very cool to finally have a working readline on windows. Unfortunately IO::Pty isn't installable on windows, it'd have been interesting to see how well that readline works. - Before I updated mingw, interactive psql didn't show a prompt, making me think something was broken. That turned out to be the same in an autoconf build. When inside the msys terminal (mintty) isatty returned 0, because of some detail of how it emulates ttys. After updating mingw that problem is gone. I included this partially so I have something to find in email next time I search for mintty and isatty :) With these things fixed, postgres built and ran tests successfully! With nearly all "relevant" dependencies: icu, libxml, libslt, lz4, nls, plperl, plpython, pltcl, readline, ssl, zlib, zstd Missing are gss and uuid. Both apparently work on windows, but they're not in the msys repository, and I don't feel like compiling them myself. Only 5 tests skipped: - recovery/017_shm - probably not applicable - recovery/022_crash_temp_files - I don't think it's ok that this test skips, but that's for another thread - psql/010_tab_completion - IO::Pty can't be installed - psql/010_cancel - IO::Pty can't be installed - ldap/001_auth - doesn't know how to find slap
Re: Assign TupleTableSlot->tts_tableOid duplicated in tale AM.
On Tue, Sep 20, 2022 at 11:51 PM Wenchao Zhang wrote: > We can get the two assigned values are same by reading codes. Maybe we should > remove one? > > What's more, we find that maybe we assign slot->tts_tableOid in outer > interface like table_scan_getnextslot may be better and more friendly when we > import other pluggable storage formats. I suppose that you're right; it really should happen in exactly one place, based on some overarching theory about how tts_tableOid works with the table AM interface. I just don't know what that theory is. -- Peter Geoghegan
Re: Strip -mmacosx-version-min options from plperl build
Hi, On 2022-08-30 09:35:51 -0400, Andrew Dunstan wrote: > On 2022-08-26 Fr 16:25, Andres Freund wrote: > > On 2022-08-26 16:00:31 -0400, Tom Lane wrote: > >> Andrew Dunstan writes: > >>> On 2022-08-26 Fr 12:11, Tom Lane wrote: > And if that doesn't help, try -Wl,--export-all-symbols > >>> worked > > Except that it's only happening for plperl, I'd wonder if it's possibly > > related to our magic symbols being prefixed with _. I noticed that the > > underscore prefix e.g. changes the behaviour of gcc's "collect2" on AIX, > > which > > is responsible for exporting symbols etc. > > > > > >> Hmph. Hard to see how that isn't a linker bug. > > Agreed, given that this is only happening with plperl, and not with any of > > the > > other extensions... > > > > > >> As a stopgap to get the farm green again, I propose adding something like > >> > >> ifeq ($(PORTNAME), cygwin) > >> SHLIB_LINK += -Wl,--export-all-symbols > >> endif > >> > >> to plperl's makefile. > > :( > > > > It doesn't make me very happy either, but nobody seems to have a better > idea. The plpython issue I was investigating in https://postgr.es/m/20220928022724.erzuk5v4ai4b53do%40awork3.anarazel.de feels eerily similar to the issue here. I wonder if it's the same problem - __attribute__((visibility("default"))) works to export - unless another symbol uses __declspec (dllexport). In the referenced thread that was PyInit_plpy(), here it could be some perl generated one. Does this issue resolved if you add #define PGDLLEXPORT __declspec (dllexport) to cygwin.h? Without the -Wl,--export-all-symbols of course. Greetings, Andres Freund
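To spell out the idea being floated (a sketch only, not a committed definition): have PGDLLEXPORT use __declspec(dllexport) on the Windows toolchains, since mixing dllexport with default-visibility attributes appears to be what stops the latter symbols from being exported:

/* Sketch of the proposed definition; the non-Windows branch assumes a
 * GCC/Clang-style compiler and ignores the configure-time visibility check. */
#if defined(_MSC_VER) || defined(__MINGW32__) || defined(__MINGW64__) || defined(__CYGWIN__)
#define PGDLLEXPORT __declspec(dllexport)
#else
#define PGDLLEXPORT __attribute__((visibility("default")))
#endif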
Re: SYSTEM_USER reserved word implementation
On Tue, Sep 27, 2022 at 03:38:49PM -0700, Jacob Champion wrote: > On 9/26/22 06:29, Drouvot, Bertrand wrote: > Since there are only internal clients to the API, I'd argue this makes > more sense as an Assert(authn_id != NULL), but I don't think it's a > dealbreaker. Using an assert() looks like a good idea from here. If this is called with a NULL authn, this could reflect a problem in the authentication logic. >> As far the assertion failure mentioned by Michael when moving the >> SVFOP_SYSTEM_USER from NAMEOID to TEXTOID: V4 is assuming that it is >> safe to force the collation to C_COLLATION_OID for SQLValueFunction >> having a TEXT type, but I would be happy to also hear your thoughts >> about it. > > Unfortunately I don't have much to add here; I don't know enough about > the underlying problems. I have been looking at that, and after putting my hands on that this comes down to the facility introduced in 40c24bf. So, I think that we'd better use COERCE_SQL_SYNTAX so as there is no need to worry about the shortcuts this patch is trying to use with the collation setup. And there are a few tests for get_func_sql_syntax() in create_view.sql. Note that this makes the patch slightly shorter, and simpler. The docs still mentioned "name", and not "text". This brings in a second point. 40c24bf has refrained from removing SQLValueFunction, but based the experience on this thread I see a pretty good argument in doing the jump once and for all. This deserves a separate discussion, though. I'll do that and create a new thread. -- Michael diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 8b72f8a215..e89703a3bf 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -1508,6 +1508,9 @@ { oid => '746', descr => 'session user name', proname => 'session_user', provolatile => 's', prorettype => 'name', proargtypes => '', prosrc => 'session_user' }, +{ oid => '9977', descr => 'system user name', + proname => 'system_user', provolatile => 's', prorettype => 'text', + proargtypes => '', prosrc => 'system_user' }, { oid => '744', proname => 'array_eq', prorettype => 'bool', diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index ee48e392ed..e7ebea4ff4 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -357,6 +357,9 @@ extern void InitializeSessionUserIdStandalone(void); extern void SetSessionAuthorization(Oid userid, bool is_superuser); extern Oid GetCurrentRoleId(void); extern void SetCurrentRoleId(Oid roleid, bool is_superuser); +extern void InitializeSystemUser(const char *authn_id, + const char *auth_method); +extern const char *GetSystemUser(void); /* in utils/misc/superuser.c */ extern bool superuser(void); /* current user is superuser */ diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index 9a7cc0c6bd..ccc927851c 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -409,6 +409,7 @@ PG_KEYWORD("support", SUPPORT, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("symmetric", SYMMETRIC, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("system_user", SYSTEM_USER, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("table", TABLE, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL) diff --git a/src/backend/access/transam/parallel.c 
b/src/backend/access/transam/parallel.c index 8cba888223..ee0985c7ed 100644 --- a/src/backend/access/transam/parallel.c +++ b/src/backend/access/transam/parallel.c @@ -1496,6 +1496,14 @@ ParallelWorkerMain(Datum main_arg) false); RestoreClientConnectionInfo(clientconninfospace); + /* + * Initialize SystemUser now that MyClientConnectionInfo is restored. + * Also ensure that auth_method is actually valid, aka authn_id is not NULL. + */ + if (MyClientConnectionInfo.authn_id) + InitializeSystemUser(MyClientConnectionInfo.authn_id, + hba_authname(MyClientConnectionInfo.auth_method)); + /* Attach to the leader's serializable transaction, if SERIALIZABLE. */ AttachSerializableXact(fps->serializable_xact_handle); diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 0d8d292850..94d5142a4a 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -743,7 +743,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P START STATEMENT STATISTICS STDIN STDOUT STORAGE STORED STRICT_P STRIP_P - SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P + SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P SYSTEM_USER TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMP
Re: A doubt about a newly added errdetail
On Tue, Sep 27, 2022 at 6:12 PM Alvaro Herrera wrote: > > While reading this code, I noticed that function expr_allowed_in_node() > has a very strange API: it doesn't have any return convention at all > other than "if we didn't modify errdetail_str then all is good". I was > tempted to add an "Assert(*errdetail_msg == NULL)" at the start of it, > just to make sure that it is not called if a message is already set. > > I think it would be much saner to inline the few lines of that function > in its sole caller, as in the attached. > LGTM. -- With Regards, Amit Kapila.
Re: Insertion Sort Improvements
On Tue, Sep 27, 2022 at 4:39 PM Benjamin Coutu wrote: > > Getting back to improvements for small sort runs, it might make sense to consider using in-register based sorting via sorting networks for very small runs. > It looks like some of the commercial DBMSs do this very successfully. "Looks like"? > They use 4 512bit registers (AVX-512) in this example, which could technically store 4 * 4 sort-elements (given that the sorting key is 64 bit and the tuple pointer is 64 bit). I wonder whether this could also be done with just SSE (instead of AVX), which the project now readily supports thanks to your recent efforts? SortTuples are currently 24 bytes, and supported vector registers are 16 bytes, so not sure how you think that would work. More broadly, the more invasive a change is, the more risk is involved, and the more effort to test and evaluate. If you're serious about trying to improve insertion sort performance, the simple idea we discussed at the start of the thread is a much more modest step that has a good chance of justifying the time put into it. That is not to say it's easy, however, because testing is a non-trivial amount of work. -- John Naylor EDB: http://www.enterprisedb.com
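For context on the 24-byte figure: SortTuple (private to tuplesort.c at the time of writing) is laid out roughly as below, so even one element is larger than a 16-byte SSE register, let alone four of them. This is an approximation for illustration; see the real definition for the authoritative layout:

typedef struct
{
	void	   *tuple;			/* the tuple itself */
	Datum		datum1;			/* value of first key column */
	bool		isnull1;		/* is first key column NULL? */
	int			srctape;		/* source tape number */
} SortTuple;					/* 8 + 8 + 1 + padding + 4 = 24 bytes on 64-bit */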
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Fri, Sep 23, 2022 at 12:11 AM John Naylor wrote: > > > On Thu, Sep 22, 2022 at 11:46 AM John Naylor > wrote: > > One thing I want to try soon is storing fewer than 16/32 etc entries, so > > that the whole node fits comfortably inside a power-of-two allocation. That > > would allow us to use aset without wasting space for the smaller nodes, > > which would be faster and possibly would solve the fragmentation problem > > Andres referred to in > > > https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de > > While calculating node sizes that fit within a power-of-two size, I noticed > the current base node is a bit wasteful, taking up 8 bytes. The node kind > only has a small number of values, so it doesn't really make sense to use an > enum here in the struct (in fact, Andres' prototype used a uint8 for > node_kind). We could use a bitfield for the count and kind: > > uint16 -- kind and count bitfield > uint8 shift; > uint8 chunk; > > That's only 4 bytes. Plus, if the kind is ever encoded in a pointer tag, the > bitfield can just go back to being count only. Good point, agreed. > > Here are the v6 node kinds: > > node4: 8 + 4 +(4)+ 4*8 = 48 bytes > node16: 8 + 16 + 16*8 = 152 > node32: 8 + 32 + 32*8 = 296 > node128: 8 + 256 + 128/8 + 128*8 = 1304 > node256: 8 + 256/8 + 256*8 = 2088 > > And here are the possible ways we could optimize nodes for space using aset > allocation. Parentheses are padding bytes. Even if my math has mistakes, the > numbers shouldn't be too far off: > > node3: 4 + 3 +(1)+ 3*8 = 32 bytes > node6: 4 + 6 +(6)+ 6*8 = 64 > node13: 4 + 13 +(7)+ 13*8 = 128 > node28: 4 + 28 + 28*8 = 256 > node31: 4 + 256 + 32/8 + 31*8 = 512 (XXX not good) > node94: 4 + 256 + 96/8 + 94*8 = 1024 > node220: 4 + 256 + 224/8 + 220*8 = 2048 > node256: = 4096 > > The main disadvantage is that node256 would balloon in size. Yeah, node31 and node256 are bloated. We probably could use slab for node256 independently. It's worth trying a benchmark to see how it affects the performance and the tree size. BTW We need to consider not only aset/slab but also DSA since we allocate dead tuple TIDs on DSM in parallel vacuum cases. FYI DSA uses the following size classes: static const uint16 dsa_size_classes[] = { sizeof(dsa_area_span), 0, /* special size classes */ 8, 16, 24, 32, 40, 48, 56, 64, /* 8 classes separated by 8 bytes */ 80, 96, 112, 128, /* 4 classes separated by 16 bytes */ 160, 192, 224, 256, /* 4 classes separated by 32 bytes */ 320, 384, 448, 512, /* 4 classes separated by 64 bytes */ 640, 768, 896, 1024,/* 4 classes separated by 128 bytes */ 1280, 1560, 1816, 2048, /* 4 classes separated by ~256 bytes */ 2616, 3120, 3640, 4096, /* 4 classes separated by ~512 bytes */ 5456, 6552, 7280, 8192 /* 4 classes separated by ~1024 bytes */ }; node256 will be classed as 2616, which is still not good. Anyway, I'll implement DSA support for radix tree. Regards, -- Masahiko Sawada PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
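To picture the 4-byte header being discussed, here is a sketch; the field widths are guesses for illustration, not the actual patch:

#include "postgres.h"

typedef struct RTNodeBaseSketch
{
	uint16		kind:3,			/* which of the node kinds this is */
				count:13;		/* number of children currently stored */
	uint8		shift;			/* key bits consumed above this node */
	uint8		chunk;			/* key byte under which the parent stores us */
} RTNodeBaseSketch;				/* 4 bytes total */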
Re: making relfilenodes 56 bits
Hi Dilip, I am very happy to see these commits. Here's some belated review for the tombstone-removal patch. > v7-0004-Don-t-delay-removing-Tombstone-file-until-next.patch More things you can remove: * sync_unlinkfiletag in struct SyncOps * the macro UNLINKS_PER_ABSORB * global variable pendingUnlinks This comment after the question mark is obsolete: * XXX should we CHECK_FOR_INTERRUPTS in this loop? Escaping with an * error in the case of SYNC_UNLINK_REQUEST would leave the * no-longer-used file still present on disk, which would be bad, so * I'm inclined to assume that the checkpointer will always empty the * queue soon. (I think if the answer to the question is now yes, then we should replace the stupid sleep with a condition variable sleep, but there's another thread about that somewhere). In a couple of places in dbcommands.c you could now make this change: /* -* Force a checkpoint before starting the copy. This will force all dirty -* buffers, including those of unlogged tables, out to disk, to ensure -* source database is up-to-date on disk for the copy. -* FlushDatabaseBuffers() would suffice for that, but we also want to -* process any pending unlink requests. Otherwise, if a checkpoint -* happened while we're copying files, a file might be deleted just when -* we're about to copy it, causing the lstat() call in copydir() to fail -* with ENOENT. +* Force all dirty buffers, including those of unlogged tables, out to +* disk, to ensure source database is up-to-date on disk for the copy. */ - RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | - CHECKPOINT_WAIT | CHECKPOINT_FLUSH_ALL); + FlushDatabaseBuffers(src_dboid); More obsolete comments you could change: * If we were copying database at block levels then drop pages for the * destination database that are in the shared buffer cache. And tell --> * checkpointer to forget any pending fsync and unlink requests for files --> * Tell checkpointer to forget any pending fsync and unlink requests for * files in the database; else the fsyncs will fail at next checkpoint, or * worse, it will delete file In tablespace.c I think you could now make this change: if (!destroy_tablespace_directories(tablespaceoid, false)) { - /* -* Not all files deleted? However, there can be lingering empty files -* in the directories, left behind by for example DROP TABLE, that -* have been scheduled for deletion at next checkpoint (see comments -* in mdunlink() for details). We could just delete them immediately, -* but we can't tell them apart from important data files that we -* mustn't delete. So instead, we force a checkpoint which will clean -* out any lingering files, and try again. -*/ - RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT); - +#ifdef WIN32 /* * On Windows, an unlinked file persists in the directory listing * until no process retains an open handle for the file. The DDL @@ -523,6 +513,7 @@ DropTableSpace(DropTableSpaceStmt *stmt) /* And now try again. */ if (!destroy_tablespace_directories(tablespaceoid, false)) +#endif { /* Still not empty, the files must be important then */ ereport(ERROR,
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
At Wed, 28 Sep 2022 10:09:39 +0900, Michael Paquier wrote in > On Tue, Sep 27, 2022 at 03:11:54PM +0530, Bharath Rupireddy wrote: > > On Tue, Sep 27, 2022 at 2:20 PM Kyotaro Horiguchi > > wrote: > >> If this is still unacceptable, I propose to change the comment. (I > >> found that the previous patch forgets about do_pg_backup_stop()) > >> > >> - * It fills in backup_state with the information required for the backup, > >> + * It fills in the parameter "state" with the information required for > >> the backup, > > > > +1. There's another place that uses backup_state in the comments. I > > modified that as well. Please see the attached patch. > > Thanks, fixed the comments. I have let the variable names as they are > now in the code, as both are backup-related code paths so it is IMO > clear that the state is linked to a backup. Thanks! I'm fine with that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: longfin and tamandua aren't too happy but I'm not sure why
On Wed, Sep 28, 2022 at 2:59 AM Tom Lane wrote:
>
> ... also, lapwing's not too happy [1]. The alter_table test
> expects this to yield zero rows, but it doesn't:

By looking at the regression diff shown below, it seems that we are able to get the relfilenode from the Oid using pg_relation_filenode(oid), but the reverse mapping pg_filenode_relation(reltablespace, relfilenode) returned NULL. I am not sure, but by looking at the code it is somehow related to alignment padding while computing the hash key size on the 32-bit machine in the function InitializeRelfilenumberMap(). I am still looking into this and will provide updates on this.

+  oid  | mapped_oid | reltablespace | relfilenode |  relname
+-------+------------+---------------+-------------+------------
+ 16385 |            |             0 |          10 | char_tbl
+ 16388 |            |             0 |          11 | float8_tbl
+ 16391 |            |             0 |          12 | int2_tbl
+ 16394 |            |             0 |          13 | int4_tbl

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
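As a hedged illustration of how alignment padding can break a dynahash lookup (the struct and field names below are simplified stand-ins, not the actual relfilenumber map definitions): if the key struct contains padding bytes and is populated field by field on the stack, hashing sizeof(key) bytes can include uninitialized padding, so logically equal keys may land in different buckets. Zeroing the key first sidesteps that:

#include "postgres.h"
#include <string.h>

/* illustrative key layout: 4-byte Oid, possible padding, 8-byte file number */
typedef struct ExampleMapKey
{
	Oid			reltablespace;
	uint64		relfilenumber;
} ExampleMapKey;

static void
example_lookup(Oid reltablespace, uint64 relfilenumber)
{
	ExampleMapKey key;

	/* zero everything so padding bytes hash and compare consistently */
	memset(&key, 0, sizeof(key));
	key.reltablespace = reltablespace;
	key.relfilenumber = relfilenumber;

	/* ... hash_search(ExampleMapHash, &key, HASH_FIND, NULL) ... */
}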
Re: Fix some newly modified tab-complete changes
On Tue, Sep 27, 2022 at 8:28 PM shiy.f...@fujitsu.com wrote: > > Hi hackers, > > I saw a problem when using tab-complete for "GRANT", "TABLES IN SCHEMA" should > be "ALL TABLES IN SCHEMA" in the following case. > > postgres=# grant all on > ALL FUNCTIONS IN SCHEMA DATABASE FUNCTION > PARAMETER SCHEMATABLESPACE > ALL PROCEDURES IN SCHEMA DOMAINinformation_schema. > PROCEDURE SEQUENCE tbl > ALL ROUTINES IN SCHEMAFOREIGN DATA WRAPPER LANGUAGE > public. TABLE TYPE > ALL SEQUENCES IN SCHEMA FOREIGN SERVERLARGE OBJECT > ROUTINE TABLES IN SCHEMA > > I found that it is related to the recent commit 790bf615dd, and maybe it's > better to fix it. I also noticed that some comments should be modified > according > to this new syntax. Attach a patch to fix them. > Thanks for the patch! Below are my review comments. The patch looks good to me but I did find some other tab-completion anomalies. IIUC these are unrelated to your work, but since I found them while testing your patch I am reporting them here. Perhaps you want to fix them in the same patch, or just raise them again separately? == 1. tab complete for CREATE PUBLICATION I don’t think this is any new bug, but I found that it is possible to do this... test_pub=# create publication p for ALL TABLES IN SCHEMA information_schema pg_catalog pg_toastpublic or, even this... test_pub=# create publication p for XXX TABLES IN SCHEMA information_schema pg_catalog pg_toastpublic == 2. tab complete for GRANT test_pub=# grant ALLEXECUTE pg_execute_server_program pg_read_server_files postgres TRIGGER ALTER SYSTEM GRANT pg_monitor pg_signal_backend REFERENCES TRUNCATE CONNECTINSERT pg_read_all_data pg_stat_scan_tablesSELECT UPDATE CREATE pg_checkpoint pg_read_all_settings pg_write_all_data SET USAGE DELETE pg_database_owner pg_read_all_stats pg_write_server_files TEMPORARY 2a. grant "GRANT" ?? ~ 2b. grant "TEMPORARY" but not "TEMP" ?? -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Avoid memory leaks during base backups
On Tue, Sep 27, 2022 at 05:32:26PM +0900, Kyotaro Horiguchi wrote: > At Tue, 27 Sep 2022 11:33:56 +0530, Bharath Rupireddy > wrote in > > On Mon, Sep 26, 2022 at 7:34 PM Tom Lane wrote: > > > This ... seems like inventing your own shape of wheel. The > > > normal mechanism for preventing this type of leak is to put the > > > allocations in a memory context that can be reset or deallocated > > > in mainline code at the end of the operation. > > > > Yes, that's the typical way and the patch attached does it for > > perform_base_backup(). What happens if we allocate some memory in the > > new memory context and error-out before reaching the end of operation? > > How do we deallocate such memory? > > Whoever directly or indirectly catches the exception can do that. For > example, SendBaseBackup() seems to catch execptions from > perform_base_backup(). bbsinc_cleanup() is already resides there. Even with that, what's the benefit in using an extra memory context in basebackup.c? backup_label and tablespace_map are mentioned upthread, but we have a tight control of these, and they should be allocated in the memory context created for replication commands (grep for "Replication command context") anyway. Using a dedicated memory context for the SQL backup functions under TopMemoryContext could be interesting, on the other hand.. -- Michael signature.asc Description: PGP signature
Re: Avoid memory leaks during base backups
On Wed, Sep 28, 2022 at 9:46 AM Michael Paquier wrote: > > On Tue, Sep 27, 2022 at 05:32:26PM +0900, Kyotaro Horiguchi wrote: > > At Tue, 27 Sep 2022 11:33:56 +0530, Bharath Rupireddy > > wrote in > > > On Mon, Sep 26, 2022 at 7:34 PM Tom Lane wrote: > > > > This ... seems like inventing your own shape of wheel. The > > > > normal mechanism for preventing this type of leak is to put the > > > > allocations in a memory context that can be reset or deallocated > > > > in mainline code at the end of the operation. > > > > > > Yes, that's the typical way and the patch attached does it for > > > perform_base_backup(). What happens if we allocate some memory in the > > > new memory context and error-out before reaching the end of operation? > > > How do we deallocate such memory? > > > > Whoever directly or indirectly catches the exception can do that. For > > example, SendBaseBackup() seems to catch execptions from > > perform_base_backup(). bbsinc_cleanup() is already resides there. > > Even with that, what's the benefit in using an extra memory context in > basebackup.c? backup_label and tablespace_map are mentioned upthread, > but we have a tight control of these, and they should be allocated in > the memory context created for replication commands (grep for > "Replication command context") anyway. Using a dedicated memory > context for the SQL backup functions under TopMemoryContext could be > interesting, on the other hand.. I had the same opinion. Here's what I think - for backup functions, we can have the new memory context child of TopMemoryContext and for perform_base_backup(), we can have the memory context child of CurrentMemoryContext. With PG_TRY()-PG_FINALLY()-PG_END_TRY(), we can delete those memory contexts upon ERRORs. This approach works for us since backup-related code doesn't have any FATALs. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
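As a rough sketch of that shape (the function and callee names are placeholders, not the proposed patch): create the context, do the work inside PG_TRY(), and let PG_FINALLY() drop the context on both the success and the ERROR paths. A FATAL exit doesn't need the cleanup at all, since the process goes away.

#include "postgres.h"
#include "utils/memutils.h"

extern void do_backup_work(void);	/* placeholder for the real logic */

static void
run_backup_in_own_context(void)
{
	MemoryContext backup_cxt;
	MemoryContext oldcxt;

	backup_cxt = AllocSetContextCreate(CurrentMemoryContext,
									   "backup state context",
									   ALLOCSET_DEFAULT_SIZES);
	oldcxt = MemoryContextSwitchTo(backup_cxt);

	PG_TRY();
	{
		do_backup_work();		/* allocations land in backup_cxt */
	}
	PG_FINALLY();
	{
		MemoryContextSwitchTo(oldcxt);
		MemoryContextDelete(backup_cxt);
	}
	PG_END_TRY();
}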
Re: Extend win32 error codes to errno mapping in win32error.c
On Tue, Sep 27, 2022 at 03:23:04PM +0530, Bharath Rupireddy wrote: > The failure occurs in dosmaperr() in win32error.c due to an unmapped > errno for win32 error code. The error code 123 i.e. ERROR_INVALID_NAME > says "The file name, directory name, or volume label syntax is > incorrect." [2], the closest errno mapping would be ENOENT. I quickly > looked around for the other win32 error codes [2] that don't have > mapping. > I filtered out some common error codes such as > ERROR_OUTOFMEMORY, ERROR_HANDLE_DISK_FULL, ERROR_INSUFFICIENT_BUFFER, > ERROR_NOACCESS. There may be many more, but these seemed common IMO. > > Having a right errno mapping always helps recognize the errors correctly. One important thing, in my opinion, when it comes to updating this table, is that it could be better to report the original error number if errno can be somewhat confusing for the mapping. It is also less interesting to increase the size of the table for errors that cannot be reached, or related to system calls we don't use. ERROR_INVALID_NAME => ENOENT Yeah, this mapping looks fine. ERROR_HANDLE_DISK_FULL => ENOSPC This one maps to various Setup*Error(), as well as GetDiskFreeSpaceEx(). The former is not interesting, but I can buy the case of the former for extension code (I've played with that in the past on WIN32, actually). ERROR_OUTOFMEMORY => ENOMEM ERROR_NOACCESS => EACCES ERROR_INSUFFICIENT_BUFFER => EINVAL Hmm. I have looked at our WIN32 system calls and the upstream docs, but these do not seem to be reachable in our code. -- Michael signature.asc Description: PGP signature
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
On Wed, Sep 28, 2022 at 12:32 AM Thomas Munro wrote: > > On Wed, Sep 28, 2022 at 1:03 AM Bharath Rupireddy > wrote:> > > On Tue, Sep 27, 2022 at 3:01 PM Thomas Munro wrote: > > > Something like the attached. > > > > Isn't it also better to add notes in win32pread.c and win32pwrite.c > > about the callers doing lseek(SEEK_SET) if they wish to and Windows > > implementations changing file position? We can also add a TODO item > > about replacing pg_ versions with pread and friends someday when > > Windows fixes the issue? Having it in the commit and include/port.h is > > good, but the comments in win32pread.c and win32pwrite.c make life > > easier IMO. Otherwise, the patch LGTM. > > Thanks, will do. I'm looking forward to getting it in. > FWIW I doubt the OS itself will change released > behaviour like that, but I did complain about the straight-up > misleading documentation and they agreed and fixed it[1]. > > [1] https://github.com/MicrosoftDocs/sdk-api/pull/1309 Great! Is there any plan to request for change in WriteFile() to not alter file position? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
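For anyone following along, the caveat being documented looks like this from the caller's side. A hedged sketch — the helper name and error texts are made up, and the include paths are from memory — showing that after the Windows pwritev emulation the file position shouldn't be relied on:

#include "postgres.h"
#include "common/file_utils.h"	/* pg_pwritev_with_retry(); path assumed */
#include "port/pg_iovec.h"
#include <unistd.h>

static void
write_header_then_continue(int fd, const struct iovec *iov, int iovcnt,
						   off_t header_len)
{
	if (pg_pwritev_with_retry(fd, iov, iovcnt, 0) != header_len)
		elog(ERROR, "could not write file header: %m");

	/*
	 * On Windows the pwrite/pwritev emulation may have moved the file
	 * position, so reposition explicitly before any plain write()/read().
	 */
	if (lseek(fd, header_len, SEEK_SET) < 0)
		elog(ERROR, "could not seek: %m");
}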
RE: [small patch] Change datatype of ParallelMessagePending from "volatile bool" to "volatile sig_atomic_t"
Dear Michael,

> Yeah, at least as of the cancel callback psql_cancel_callback() that
> handle_sigint() would call on SIGINT as this is set by psql. So it
> does not seem right to use a boolean rather than a sig_atomic_t in
> this case, as well.

PSA fix patch. Note that PromptInterruptContext.enabled was also changed, because its value is assigned from sigint_interrupt_enabled.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

0001-Mark-sigint_interrupt_enabled-as-sig_atomic_t.patch
Description: 0001-Mark-sigint_interrupt_enabled-as-sig_atomic_t.patch
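The gist of the change, sketched (the variable is the real one, the surrounding handler is only illustrative): flags shared with a signal handler should be declared volatile sig_atomic_t rather than volatile bool, since sig_atomic_t is the type the C standard guarantees can be read and written safely from a handler.

#include <signal.h>

/* before: volatile bool sigint_interrupt_enabled = false; */
volatile sig_atomic_t sigint_interrupt_enabled = false;

static void
handle_sigint(int signum)
{
	(void) signum;

	if (sigint_interrupt_enabled)
	{
		/* ... longjmp back to the mainline prompt loop ... */
	}
}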
Re: Avoid memory leaks during base backups
Bharath Rupireddy writes: > I had the same opinion. Here's what I think - for backup functions, we > can have the new memory context child of TopMemoryContext and for > perform_base_backup(), we can have the memory context child of > CurrentMemoryContext. With PG_TRY()-PG_FINALLY()-PG_END_TRY(), we can > delete those memory contexts upon ERRORs. This approach works for us > since backup-related code doesn't have any FATALs. Not following your last point here? A process exiting on FATAL does not especially need to clean up its memory allocations first. Which is good, because "backup-related code doesn't have any FATALs" seems like an assertion with a very short half-life. regards, tom lane
Re: Avoid memory leaks during base backups
On Wed, Sep 28, 2022 at 10:19 AM Tom Lane wrote: > > Bharath Rupireddy writes: > > I had the same opinion. Here's what I think - for backup functions, we > > can have the new memory context child of TopMemoryContext and for > > perform_base_backup(), we can have the memory context child of > > CurrentMemoryContext. With PG_TRY()-PG_FINALLY()-PG_END_TRY(), we can > > delete those memory contexts upon ERRORs. This approach works for us > > since backup-related code doesn't have any FATALs. > > Not following your last point here? A process exiting on FATAL > does not especially need to clean up its memory allocations first. > Which is good, because "backup-related code doesn't have any FATALs" > seems like an assertion with a very short half-life. You're right. My bad. For FATALs, we don't need to clean the memory as the process itself exits. * Note: an ereport(FATAL) will not be caught by this construct; control will * exit straight through proc_exit(). /* * Perform error recovery action as specified by elevel. */ if (elevel == FATAL) { /* * Do normal process-exit cleanup, then return exit code 1 to indicate * FATAL termination. The postmaster may or may not consider this * worthy of panic, depending on which subprocess returns it. */ proc_exit(1); } -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: longfin and tamandua aren't too happy but I'm not sure why
wrasse is also failing with a bus error, but I cannot get the stack trace. So it seems it is hitting some alignment issues during startup [1]. Is it possible to get the backtrace or lineno? [1] 2022-09-28 03:19:26.228 CEST [180:4] LOG: redo starts at 0/30FE9D8 2022-09-28 03:19:27.674 CEST [177:3] LOG: startup process (PID 180) was terminated by signal 10: Bus Error 2022-09-28 03:19:27.674 CEST [177:4] LOG: terminating any other active server processes 2022-09-28 03:19:27.677 CEST [177:5] LOG: shutting down due to startup process failure 2022-09-28 03:19:27.681 CEST [177:6] LOG: database system is shut down -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: longfin and tamandua aren't too happy but I'm not sure why
Dilip Kumar writes: > wrasse is also failing with a bus error, Yeah. At this point I think it's time to call for this patch to get reverted. It should get tested *off line* on some non-Intel, non-64-bit, alignment-picky architectures before the rest of us have to deal with it any more. There may be a larger conversation to be had here about how much our CI infrastructure should be detecting. There seems to be a depressingly large gap between what that found and what the buildfarm is finding --- not only in portability issues, but in things like cpluspluscheck failures, which I had supposed CI would find. regards, tom lane
Re: Extend win32 error codes to errno mapping in win32error.c
On Wed, Sep 28, 2022 at 10:10 AM Michael Paquier wrote: > > One important thing, in my opinion, when it comes to updating this > table, is that it could be better to report the original error number > if errno can be somewhat confusing for the mapping. Returning errno = e instead of EINVAL in _dosmaperr() may have an impact on the callers that do special handling for errno EINVAL. I don't think it's a good idea. > ERROR_INVALID_NAME => ENOENT > Yeah, this mapping looks fine. Hm. > ERROR_HANDLE_DISK_FULL => ENOSPC > This one maps to various Setup*Error(), as well as > GetDiskFreeSpaceEx(). The former is not interesting, but I can buy > the case of the former for extension code (I've played with that in > the past on WIN32, actually). > > ERROR_OUTOFMEMORY => ENOMEM > ERROR_NOACCESS => EACCES > ERROR_INSUFFICIENT_BUFFER => EINVAL > Hmm. I have looked at our WIN32 system calls and the upstream docs, > but these do not seem to be reachable in our code. IMO, we can add a mapping for just ERROR_INVALID_NAME, which is an obvious error code and easy to hit, and leave the others alone. There are many win32 error codes that may get hit in our code base, and mapping every one of them isn't feasible. Please see the v2 patch. I've also added a CF entry - https://commitfest.postgresql.org/40/3914/ - so that the patch gets tested. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v2-0001-Extend-win32-error-codes-to-errno-mapping-in-win3.patch Description: Binary data
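[Editorial illustration] For orientation, here is a hedged sketch of how a Windows-error-to-errno table can be structured. The struct, table, and function names are illustrative and are not the exact layout of win32error.c.

```c
#include <windows.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative only: one plausible shape for a win32-error -> errno map. */
struct win32_errno_map
{
	DWORD		winerr;			/* value from GetLastError() */
	int			doserr;			/* errno to report */
};

static const struct win32_errno_map win32_errno_table[] =
{
	{ERROR_INVALID_NAME, ENOENT},	/* the mapping discussed above */
	/* ... further entries ... */
};

static void
map_win32_error(DWORD e)
{
	for (size_t i = 0;
		 i < sizeof(win32_errno_table) / sizeof(win32_errno_table[0]);
		 i++)
	{
		if (win32_errno_table[i].winerr == e)
		{
			errno = win32_errno_table[i].doserr;
			return;
		}
	}
	errno = EINVAL;				/* fallback when there is no specific mapping */
}
```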
Re: longfin and tamandua aren't too happy but I'm not sure why
On Wed, Sep 28, 2022 at 6:15 PM Tom Lane wrote: > There may be a larger conversation to be had here about how > much our CI infrastructure should be detecting. There seems > to be a depressingly large gap between what that found and > what the buildfarm is finding --- not only in portability > issues, but in things like cpluspluscheck failures, which > I had supposed CI would find. +1, Andres had some sanitizer flags in the works (stopped by a weird problem to be resolved first), and 32 bit CI would clearly be good. It also seems that ARM is now available to us via CI (either Amazon's or a Mac), which IIRC is more SIGBUS-y about alignment than x86? FTR CI reported that cpluspluscheck failure and more[1], so perhaps we just need to get clearer agreement on the status of CI, ie a policy that CI had better be passing before you get to the next stage. It's still pretty new... [1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3711
Re: Fix some newly modified tab-complete changes
At Wed, 28 Sep 2022 14:14:01 +1000, Peter Smith wrote in > On Tue, Sep 27, 2022 at 8:28 PM shiy.f...@fujitsu.com > wrote: > > > > Hi hackers, > > > > I saw a problem when using tab-complete for "GRANT", "TABLES IN SCHEMA" > > should > > be "ALL TABLES IN SCHEMA" in the following case. > > > > postgres=# grant all on > > ALL FUNCTIONS IN SCHEMA DATABASE FUNCTION > > PARAMETER SCHEMA TABLESPACE > > ALL PROCEDURES IN SCHEMA DOMAIN information_schema. > > PROCEDURE SEQUENCE tbl > > ALL ROUTINES IN SCHEMA FOREIGN DATA WRAPPER LANGUAGE > > public. TABLE TYPE > > ALL SEQUENCES IN SCHEMA FOREIGN SERVER LARGE OBJECT > > ROUTINE TABLES IN SCHEMA > > > > I found that it is related to the recent commit 790bf615dd, and maybe it's > > better to fix it. I also noticed that some comments should be modified > > according > > to this new syntax. Attach a patch to fix them. > > > > Thanks for the patch! Below are my review comments. > > The patch looks good to me but I did find some other tab-completion > anomalies. IIUC these are unrelated to your work, but since I found > them while testing your patch I am reporting them here. Looks fine to me, too. > Perhaps you want to fix them in the same patch, or just raise them > again separately? > > == > > 1. tab complete for CREATE PUBLICATION > > I don't think this is any new bug, but I found that it is possible to do > this... > > test_pub=# create publication p for ALL TABLES IN SCHEMA > information_schema pg_catalog pg_toast public > > or, even this... > > test_pub=# create publication p for XXX TABLES IN SCHEMA > information_schema pg_catalog pg_toast public Completion is responding to "IN SCHEMA" in these cases. However, I cannot reach this state through completion alone, because it doesn't suggest "IN SCHEMA" after "TABLES" nor "ALL TABLES". I don't see a reason to change that behavior unless the fix doesn't add any additional complexity. > == > > 2. tab complete for GRANT > > test_pub=# grant > ALL EXECUTE > pg_execute_server_program pg_read_server_files postgres > TRIGGER > ALTER SYSTEM GRANT pg_monitor > pg_signal_backend REFERENCES > TRUNCATE > CONNECT INSERT pg_read_all_data > pg_stat_scan_tables SELECT UPDATE > CREATE pg_checkpoint > pg_read_all_settings pg_write_all_data SET > USAGE > DELETE pg_database_owner > pg_read_all_stats pg_write_server_files TEMPORARY > > 2a. > grant "GRANT" ?? Yeah, for a moment I thought that might be a kind of admin option, but there is no such privilege. REVOKE gets the same suggestion. > 2b. > grant "TEMPORARY" but not "TEMP" ?? TEMP is an alternative spelling, so that's fine. I found the following suggestion. CREATE PUBLICATION p FOR TABLES -> ["IN SCHEMA", "WITH ("] I believe "WITH (" doesn't belong there. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [PATCH] Add peer authentication TAP test
On Fri, Aug 26, 2022 at 10:43:43AM +0200, Drouvot, Bertrand wrote: > During the work in [1] we created a new TAP test to test the SYSTEM_USER > behavior with peer authentication. > > It turns out that there is currently no TAP test for the peer > authentication, so we think (thanks Michael for the suggestion [2]) that > it's better to split the work in [1] between "pure" SYSTEM_USER related work > and the "pure" peer authentication TAP test work. > > That's the reason of this new thread, please find attached a patch to add a > new TAP test for the peer authentication. +# Get the session_user to define the user name map test. +my $session_user = + $node->safe_psql('postgres', 'select session_user'); [...] +# Define a user name map. +$node->append_conf('pg_ident.conf', qq{mypeermap $session_user testmap$session_user}); + +# Set pg_hba.conf with the peer authentication and the user name map. +reset_pg_hba($node, 'peer map=mypeermap'); A map consists of a "MAPNAME SYSTEM_USER PG_USER". Why does this test use a Postgres role (from session_user) as the system user for the peer map? -- Michael signature.asc Description: PGP signature
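[Editorial illustration] For reference, a minimal sketch of the configuration shape being discussed; the map name and user names below are placeholders, not the test's actual values.

```
# pg_hba.conf: use peer authentication with a user name map
local   all   all   peer map=mypeermap

# pg_ident.conf: MAPNAME   SYSTEM-USERNAME   PG-USERNAME
# The second column is the operating-system user name obtained from the
# peer mechanism; the third is the PostgreSQL role it may connect as.
mypeermap   some_os_user   some_pg_role
```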
Re: Extend win32 error codes to errno mapping in win32error.c
On Wed, Sep 28, 2022 at 11:14:53AM +0530, Bharath Rupireddy wrote: > IMO, we can add mapping for just ERROR_INVALID_NAME which is an > obvious error code and easy to hit, leaving others. There are actually > many win32 error codes that may get hit in our code base and actually > mapping everything isn't possible. Yes. I am fine to do just that as you have managed to hit it during development. The others may have more opinions to offer. -- Michael signature.asc Description: PGP signature
RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Dear Önder,

Thank you for updating the patch! Your documentation seems OK, and I could not find any other places that need additions. The following are my comments.

01 relation.c - general

Many files are newly included. I was not sure, but some of the planner-related code may be able to move to src/backend/optimizer/plan. What do you and others think?

02 relation.c - FindLogicalRepUsableIndex

```
+/*
+ * Returns an index oid if we can use an index for the apply side. If not,
+ * returns InvalidOid.
+ */
+static Oid
+FindLogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel)
```

I grepped the files, but I cannot find the word "apply side". How about "subscriber" instead?

03 relation.c - FindLogicalRepUsableIndex

```
+ /* Simple case, we already have an identity or pkey */
+ idxoid = GetRelationIdentityOrPK(localrel);
+ if (OidIsValid(idxoid))
+ return idxoid;
+
+ /*
+ * If index scans are disabled, use a sequential scan.
+ *
+ * Note that we still allowed index scans above when there is a primary
+ * key or unique index replica identity, but that is the legacy behaviour
+ * (even when enable_indexscan is false), so we hesitate to move this
+ * enable_indexscan check to be done earlier in this function.
+ */
+ if (!enable_indexscan)
+ return InvalidOid;
```

a. I think "identity or pkey" should be "replica identity key or primary key", or "RI or PK".
b. The latter part should be placed around GetRelationIdentityOrPK().

04 relation.c - FindUsableIndexForReplicaIdentityFull

```
+ MemoryContext usableIndexContext;
...
+ usableIndexContext = AllocSetContextCreate(CurrentMemoryContext,
+ "usableIndexContext",
+ ALLOCSET_DEFAULT_SIZES);
```

I grepped other sources and found that a name like "tmpcxt" is used for a temporary MemoryContext.

05 relation.c - SuitablePathsForRepIdentFull

```
+ indexRelation = index_open(idxoid, AccessShareLock);
+ indexInfo = BuildIndexInfo(indexRelation);
+ is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ is_partial = (indexInfo->ii_Predicate != NIL);
+ is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+ index_close(indexRelation, NoLock);
```

Why is the index closed with NoLock? AccessShareLock is acquired, so shouldn't the same lock be released?

06 relation.c - GetCheapestReplicaIdentityFullPath

IIUC a query like "SELECT tbl WHERE attr1 = $1 AND attr2 = $2 ... AND attrN = $N" is emulated, right? You can state that explicitly in a comment.

07 relation.c - GetCheapestReplicaIdentityFullPath

```
+ Path *path = (Path *) lfirst(lc);
+ Oid idxoid = GetIndexOidFromPath(path);
+
+ if (!OidIsValid(idxoid))
+ {
+ /* Unrelated Path, skip */
+ suitableIndexList = lappend(suitableIndexList, path);
+ }
```

This part was not clear to me. IIUC, in this function we want to collect the "good" scan paths and then choose the cheapest one. GetIndexOidFromPath() seems to return InvalidOid when the input path is not an index scan, so why is such a path appended to the suitable list?

===

08 worker.c - usable_indexoid_internal

I don't think this is an "internal" function; such a name should be used for pairs like "apply_handle_commit" - "apply_handle_commit_internal" or "apply_handle_insert" - "apply_handle_insert_internal". How about "get_usable_index" or something?

09 worker.c - usable_indexoid_internal

```
+ Oid targetrelid = targetResultRelInfo->ri_RelationDesc->rd_rel->oid;
+ Oid localrelid = relinfo->ri_RelationDesc->rd_id;
+
+ if (targetrelid != localrelid)
```

I think these lines are quite confusing. IIUC targetrelid corresponds to the "parent", and localrelid corresponds to the "child", right?
How about changing the names to "partitionedoid" and "leadoid", or something similar?

===

10 032_subscribe_use_index.pl

```
# create tables pub and sub
$node_publisher->safe_psql('postgres',
 "CREATE TABLE test_replica_id_full (x int)");
$node_publisher->safe_psql('postgres',
 "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
$node_subscriber->safe_psql('postgres',
 "CREATE TABLE test_replica_id_full (x int)");
$node_subscriber->safe_psql('postgres',
 "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");
```

In many places the same table is defined, altered to "REPLICA IDENTITY FULL", and an index is created. Could you combine these steps into a function?

11 032_subscribe_use_index.pl

```
# wait until the index is used on the subscri
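[Editorial illustration] On comment 05, here is a hedged backend-style sketch of the locking convention in question; the function name is made up and this is not the patch's code.

```c
/*
 * Illustrative only: inspect an index and release the lock that was
 * taken.  Assumes the usual backend headers (access/genam.h,
 * catalog/index.h).
 */
static void
inspect_index(Oid idxoid)
{
	Relation	indexRelation;
	IndexInfo  *indexInfo;

	indexRelation = index_open(idxoid, AccessShareLock);
	indexInfo = BuildIndexInfo(indexRelation);

	/* ... examine indexInfo->ii_Am, indexInfo->ii_Predicate, ... */

	/*
	 * Release the same lock that was acquired.  Calling
	 * index_close(indexRelation, NoLock) instead would keep the
	 * AccessShareLock until end of transaction, which is also legitimate
	 * but should be a deliberate choice.
	 */
	index_close(indexRelation, AccessShareLock);
}
```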
Re: A doubt about a newly added errdetail
At Tue, 27 Sep 2022 12:19:35 +0200, Alvaro Herrera wrote in > Yeah, since you're changing another word in that line, it's ok to move > the parameter line off-string. (If you were only changing the parameter > to %s and there was no message duplication, I would reject the patch as > useless.) I'm fine with that. By the way, related to that area, I found the following error messages. >errmsg("publication \"%s\" is defined as FOR ALL TABLES", > NameStr(pubform->pubname)), >errdetail("Schemas cannot be added to or dropped from FOR ALL TABLES > publications."))); It looks to me like the errmsg and errdetail are reversed. Isn't the following order the common one? >errmsg("schemas cannot be added to or dropped from publication > \"%s\".", > NameStr(pubform->pubname)), >errdetail("The publication is defined as FOR ALL TABLES."))); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
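[Editorial illustration] A hedged sketch of the ordering being suggested, following the usual convention that errmsg states what failed (primary message, lower case, no trailing period) while errdetail carries the background as a complete sentence. The message texts here are illustrative, not a proposed patch.

```c
ereport(ERROR,
		(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
		 errmsg("schemas cannot be added to or dropped from publication \"%s\"",
				NameStr(pubform->pubname)),
		 errdetail("The publication is defined as FOR ALL TABLES.")));
```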
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
On Wed, Sep 28, 2022 at 5:43 PM Bharath Rupireddy wrote: > On Wed, Sep 28, 2022 at 12:32 AM Thomas Munro wrote: > > FWIW I doubt the OS itself will change released > > behaviour like that, but I did complain about the straight-up > > misleading documentation and they agreed and fixed it[1]. > > > > [1] https://github.com/MicrosoftDocs/sdk-api/pull/1309 > > Great! Is there any plan to request for change in WriteFile() to not > alter file position? Not from me. I stick to open source problems. Reporting bugs in documentation is legitimate self defence, though.
Re: Avoid memory leaks during base backups
At Wed, 28 Sep 2022 13:16:36 +0900, Michael Paquier wrote in > On Tue, Sep 27, 2022 at 05:32:26PM +0900, Kyotaro Horiguchi wrote: > > At Tue, 27 Sep 2022 11:33:56 +0530, Bharath Rupireddy > > wrote in > > > On Mon, Sep 26, 2022 at 7:34 PM Tom Lane wrote: > > > > This ... seems like inventing your own shape of wheel. The > > > > normal mechanism for preventing this type of leak is to put the > > > > allocations in a memory context that can be reset or deallocated > > > > in mainline code at the end of the operation. > > > > > > Yes, that's the typical way and the patch attached does it for > > > perform_base_backup(). What happens if we allocate some memory in the > > > new memory context and error-out before reaching the end of operation? > > > How do we deallocate such memory? > > > > Whoever directly or indirectly catches the exception can do that. For > > example, SendBaseBackup() seems to catch exceptions from > > perform_base_backup(). bbsinc_cleanup() already resides there. > > Even with that, what's the benefit in using an extra memory context in > basebackup.c? backup_label and tablespace_map are mentioned upthread, > but we have a tight control of these, and they should be allocated in > the memory context created for replication commands (grep for > "Replication command context") anyway. Using a dedicated memory > context for the SQL backup functions under TopMemoryContext could be > interesting, on the other hand.. If I understand you correctly, my point was the use of error callbacks. I meant that we can release those dangling memory blocks in SendBaseBackup() simply by pfree()ing them and then NULLing the pointers, if the pointers were file-scoped statics. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
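[Editorial illustration] A minimal sketch of the alternative described above: keep the pointer as a file-scope static so whoever catches the error can free whatever is still allocated. The names are placeholders, not the basebackup.c code.

```c
/* file-scope static so error-handling code can reach it */
static char *backup_label_contents = NULL;

/*
 * Called from the code that catches the error (or from a cleanup
 * callback): release the dangling allocation and reset the pointer so
 * a later call does not double-free it.
 */
static void
cleanup_after_backup_error(void)
{
	if (backup_label_contents != NULL)
	{
		pfree(backup_label_contents);
		backup_label_contents = NULL;
	}
}
```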
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Wed, Sep 28, 2022 at 10:49 AM Masahiko Sawada wrote: > BTW We need to consider not only aset/slab but also DSA since we > allocate dead tuple TIDs on DSM in parallel vacuum cases. FYI DSA uses > the following size classes: > > static const uint16 dsa_size_classes[] = { > [...] Thanks for that info -- I wasn't familiar with the details of DSA. For the non-parallel case, I plan to at least benchmark using aset because I gather it's the most heavily optimized. I'm thinking that will allow other problem areas to be more prominent. I'll also want to compare total context size compared to slab to see if possibly less fragmentation makes up for other wastage.

Along those lines, one thing I've been thinking about is the number of size classes. There is a tradeoff between memory efficiency and number of branches when searching/inserting. My current thinking is there is too much coupling between size class and data type. Each size class currently uses a different data type and a different algorithm to search and set it, which in turn requires another branch. We've found that a larger number of size classes leads to poor branch prediction [1] and (I imagine) code density.

I'm thinking we can use "flexible array members" for the values/pointers, and keep the rest of the control data in the struct the same. That way, we never have more than 4 actual "kinds" to code and branch on. As a bonus, when migrating a node to a larger size class of the same kind, we can simply repalloc() to the next size. To show what I mean, consider this new table:

node2:   5 + 6 +(5)+   2*8 =   32 bytes
node6:   5 + 6 +(5)+   6*8 =   64
node12:  5 + 27 +     12*8 =  128
node27:  5 + 27 +     27*8 =  248(->256)
node91:  5 + 256 + 28 +(7)+  91*8 = 1024
node219: 5 + 256 + 28 +(7)+ 219*8 = 2048
node256: 5 + 32 +(3)+ 256*8 = 2088(->4096)

Seven size classes are grouped into the four kinds. The common base at the front is here 5 bytes because there is a new uint8 field for "capacity", which we can ignore for node256 since we assume we can always insert/update that node. The control data is the same in each pair, and so the offset to the pointer/value array is the same. Thus, migration would look something like:

case FOO_KIND:
	if (unlikely(count == capacity))
	{
		if (capacity == XYZ) /* for smaller size class of the pair */
		{
			;
			capacity = next-higher-capacity;
			goto do_insert;
		}
		else
			;
	}
	else
	{
do_insert:
		<...>;
		break;
	}
	/* FALLTHROUGH */
...

One disadvantage is that this wastes some space by reserving the full set of control data in the smaller size class of the pair, but it's usually small compared to array size. Somewhat unrelated, we could still implement Andres' idea [1] to dispense with the isset array in inner nodes of the indirect array type (now node128), since we can just test if the pointer is null. [1] https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de -- John Naylor EDB: http://www.enterprisedb.com
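[Editorial illustration] A hedged sketch of the "two size classes per node kind" idea above: the control data stays identical within a kind, the child array is a flexible array member, and growing within the kind is a single repalloc(). The struct, field sizes, and function names are illustrative and do not match the exact layout in the table.

```c
/* Illustrative node with a flexible array member (backend typedefs assumed). */
typedef struct NodeLinear
{
	uint8		kind;			/* which of the four node kinds this is */
	uint8		count;			/* slots currently in use */
	uint8		capacity;		/* slots allocated: e.g. 2 or 6 in this kind */
	uint8		chunks[6];		/* search keys; sized for the larger class */
	void	   *children[FLEXIBLE_ARRAY_MEMBER];	/* 2 or 6 entries */
} NodeLinear;

/*
 * Grow within the same kind: because the control data has the same layout
 * in both size classes, a repalloc() to the larger allocation is enough.
 * repalloc() may move the node, so the caller must update the parent's
 * pointer with the returned value.
 */
static NodeLinear *
grow_node_to_six(NodeLinear *node)
{
	node = repalloc(node, offsetof(NodeLinear, children) + 6 * sizeof(void *));
	node->capacity = 6;
	return node;
}
```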