Re: [Proposal] Add foreign-server health checks infrastructure
On 2023-01-23 14:40, Hayato Kuroda (Fujitsu) wrote: Dear Ted, Thanks for reviewing! PSA new version. For v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch, `pqConnCheck_internal` only has one caller, which is quite short. Can pqConnCheck_internal and PQConnCheck be merged into one function? I divided the function for extensibility. Currently it works only on the Linux platform, but that limitation should be removed in the future, and the internal function will then become longer. Therefore I want to keep this style. +int +PQCanConnCheck(void) It seems the return value should be of bool type. I changed the return value to true/false. But IIUC libpq functions cannot be declared with the "bool" datatype. E.g. PQconnectionNeedsPassword() returns true/false, but the function is defined as int. Best Regards, Hayato Kuroda FUJITSU LIMITED Thank you for updating the patch! +/* Check whether the postgres server is still alive or not */ +extern int PQConnCheck(PGconn *conn); +extern int PQCanConnCheck(void); Aren't these PQconnCheck and PQcanConnCheck? I think the first letter following 'PQ' should be lower case. regards. -- Katsuragi Yuta Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Operation log for major operations
Hi! Maybe another discussion thread can be created for the consolidation of the XX_file_exists functions. Usually XX_file_exists functions are simple: they contain a single call to stat() or open() and some specific error handling after that call. A unified function would likely be too complex to cover all the uses of the XX_file_exists functions. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Re: Time delayed LR (WAS Re: logical replication restrictions)
At Wed, 25 Jan 2023 12:30:19 +0530, Amit Kapila wrote in > On Wed, Jan 25, 2023 at 11:57 AM Kyotaro Horiguchi > wrote: > > > > At Tue, 24 Jan 2023 12:19:04 +, "Takamichi Osumi (Fujitsu)" > > wrote in > > > Attached the patch v20 that has incorporated all comments so far. > > > ... > > > > > > + in which case no additional wait is necessary. If the system > > clocks > > + on publisher and subscriber are not synchronized, this may lead > > to > > + apply changes earlier than expected, but this is not a major > > issue > > + because this parameter is typically much larger than the time > > + deviations between servers. Note that if this parameter is set > > to a > > > > This doesn't seem to fit our documentation. It is not our business > > whether a certain amount of deviation is critical or not. How about > > something like the following? > > > > But we have a similar description for 'recovery_min_apply_delay' [1]. > See "...If the system clocks on primary and standby are not > synchronized, this may lead to recovery applying records earlier than > expected; but that is not a major issue because useful settings of > this parameter are much larger than typical time deviations between > servers." Mmmm. I thought that we might be able to gather the description (including other common descriptions, if any), but I didn't find an appropriate place.. Okay. I agree with the current description. Thanks for the kind explanation. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: document the need to analyze partitioned tables
On Wed, 25 Jan 2023 at 19:46, Laurenz Albe wrote: > Did you see Justin's wording suggestion in > https://postgr.es/m/20230118174919.GA9837%40telsasoft.com ? > He didn't attach it as a patch, so you may have missed it. > I was pretty happy with that. I didn't pay too much attention as I tend to apply patches to obtain the full context of the change. Manually trying to apply a patch from an email is not something I like to do. > I think your first sentence is a bit clumsy and might be streamlined to > > Partitioned tables do not directly store tuples and consequently do not > require autovacuum to perform any VACUUM operations. That seems better than what I had. > Also, I am a little bit unhappy about > > 1. Your paragraph states that partitioned tables need no autovacuum, > but doesn't state unmistakably that they will never be treated > by autovacuum. hmm. I assume the reader realises from the text that the lack of any tuples means VACUUM is not required. The remaining part of what autovacuum does not do is explained when the text goes on to say that ANALYZE operations are also not performed on partitioned tables. I'm not sure what is left that's mistakable there. > 2. You make a distinction between table partitions and "normal tables", > but really there is no distinction. We may have different mental models here. This relates to the part that I wasn't keen on in your patch, i.e: +The partitions of a partitioned table are normal tables and get processed +by autovacuum While I agree that the majority of partitions are likely to be relkind='r', which you might ordinarily consider a "normal table", you just might change your mind when you try to INSERT or UPDATE records that would violate the partition constraint. Some partitions might themselves be partitioned tables and others might be foreign tables. That does not really matter much when it comes to what autovacuum does or does not do, but I'm not really keen to imply in our documents that partitions are "normal tables". David
Re: old_snapshot_threshold bottleneck on replica
On Tue, 24 Jan 2023 at 18:46, Robert Haas wrote: > > (1) that mutex also protects something else and the existing comment > is wrong, or > > (2) the mutex should have been removed but the patch neglected to do so, or > > (3) the mutex is still needed for some reason, in which case either > (3a) the patch isn't actually safe or (3b) the patch needs comments to > explain what the new synchronization model is. > > Yes, you're absolutely right. And my first intention was to remove this mutex completely. But in TransactionIdLimitedForOldSnapshots these variables are used together, so I'm not sure whether it is completely safe to remove the mutex. Actually, removing the mutex and switching to atomics was my first choice. I've run all the tests and no problems were found, but at that time I chose to be more conservative. Anyway, here is the new variant. -- Best regards, Maxim Orlov. v2-0001-Use-atomic-old_snapshot_threshold.patch Description: Binary data
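To make the mutex-versus-atomics question concrete, here is an illustrative sketch only (the structure and function names are invented, and this is not the attached patch): the general shape of replacing a spinlock-protected field with a pg_atomic_uint64 using the existing port/atomics.h primitives.

```c
#include "postgres.h"
#include "port/atomics.h"
#include "utils/timestamp.h"

/* Invented struct: not the real OldSnapshotControlData layout. */
typedef struct DemoSharedState
{
	/* was: slock_t mutex protecting a plain TimestampTz field */
	pg_atomic_uint64 threshold_timestamp;
} DemoSharedState;

static void
demo_init(DemoSharedState *state)
{
	pg_atomic_init_u64(&state->threshold_timestamp, 0);
}

static void
demo_set_threshold(DemoSharedState *state, TimestampTz ts)
{
	pg_atomic_write_u64(&state->threshold_timestamp, (uint64) ts);
}

static TimestampTz
demo_get_threshold(DemoSharedState *state)
{
	return (TimestampTz) pg_atomic_read_u64(&state->threshold_timestamp);
}
```

Note that this only shows the mechanics; whether the fields that TransactionIdLimitedForOldSnapshots() reads together can safely be read without a lock is exactly the kind of analysis asked for later in the thread.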
Re: plpython vs _POSIX_C_SOURCE
Hi, On 2023-01-24 23:37:44 -0500, Tom Lane wrote: > Andres Freund writes: > > Patches attached. > > +1 for 0001. Cool, will push tomorrow. > I'm still nervous about 0002. However, maybe the cases that we had trouble > with are legacy issues that nobody cares about anymore in 2023. We can > always look for another answer if we get complaints, I guess. Yea, it's a patch that should be easily revertable, if it comes to that. I'll add a note to the commit message about potentially needing to do that if there's not easily addressed fallout. Greetings, Andres Freund
Re: to_hex() for negative inputs
Hi Dean, > I don't see how a couple of extra arguments will expand to hundreds. Maybe I was exaggerating, but the point is that adding extra flags for every possible scenario is a disadvantageous approach in general. There is no need to increase the code base, the amount of test cases and the amount of documentation every time someone has an idea "in rare cases I also may want to do A or B, let's add a flag for this". > Which is broken for INT_MIN: > > select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x)) > from ( values (-2147483648) ) as s(x); > > ERROR: integer out of range I'm afraid the behavior of something like to_hex(X, with_sign => true) is going to be exactly the same. There is no safe and consistent way to calculate abs(INT_MIN). -- Best regards, Aleksander Alekseev
Re: heapgettup() with NoMovementScanDirection unused in core?
On Wed, 25 Jan 2023 at 13:55, Melanie Plageman wrote: > David Rowley and I were discussing how to test the > NoMovementScanDirection case for heapgettup() and heapgettup_pagemode() > in [1] (since there is not currently coverage). We are actually > wondering if it is dead code (in core). Yeah, so I see nothing in core that can cause heapgettup() or heapgettup_pagemode() to be called with NoMovementScanDirection. I imagine one possible way to hit it might be in an extension where someone's written their own ExecutorRun_hook that does not have the same NoMovementScanDirection check that standard_ExecutorRun() has. So far my thoughts are that we should just rip it out and see if anyone complains. If they complain loudly enough, then it's easy enough to put it back without any compatibility issues. However, if it's for the ExecutorRun_hook reason, then they'll likely be better to add the same NoMovementScanDirection as we have in standard_ExecutorRun(). I'm just not keen on refactoring the code without the means to test that the new code actually works. Does anyone know of any reason why we shouldn't ditch the nomovement code in heapgettup/heapgettup_pagemode? Maybe we'd also want to Assert that the direction is either forwards or backwards in table_scan_getnextslot and table_scan_getnextslot_tidrange. (I see heap_getnextslot_tidrange() does not have any handling for NoMovementScanDirection, so if this is not dead code, that probably needs to be fixed) David
Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Hi, On 2023-01-24 20:59:04 -0800, Andres Freund wrote: > I find this to be awkward code. The booleans are kinda pointless, and the > tableagevac case is hard to follow because trigger is set elsewhere. > > I can give reformulating it a go. Need to make some food first. Here's a draft of what I am thinking of. Not perfect yet, but I think it looks better. The pg_stat_activity output looks like this right now: autovacuum due to table XID age: VACUUM public.pgbench_accounts (to prevent wraparound) Why don't we drop the "(to prevent wraparound)" now? And I still think removing the "table " bit would be an improvement. Greetings, Andres Freund diff --git i/src/include/commands/vacuum.h w/src/include/commands/vacuum.h index 4450003b4a8..13f70a1f654 100644 --- i/src/include/commands/vacuum.h +++ w/src/include/commands/vacuum.h @@ -237,7 +237,8 @@ typedef struct VacuumParams * use default */ int multixact_freeze_table_age; /* multixact age at which to scan * whole table */ - bool is_wraparound; /* force a for-wraparound vacuum */ + bool is_wraparound; /* antiwraparound autovacuum? */ + AutoVacType trigger; /* autovacuum trigger condition, if any */ int log_min_duration; /* minimum execution threshold in ms at * which autovacuum is logged, -1 to use * default */ @@ -328,6 +329,7 @@ extern void vacuum(List *relations, VacuumParams *params, extern void vac_open_indexes(Relation relation, LOCKMODE lockmode, int *nindexes, Relation **Irel); extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode); +extern const char *vac_autovacuum_trigger_msg(AutoVacType trigger); extern double vac_estimate_reltuples(Relation relation, BlockNumber total_pages, BlockNumber scanned_pages, diff --git i/src/backend/access/heap/vacuumlazy.c w/src/backend/access/heap/vacuumlazy.c index 8f14cf85f38..8a64dce6180 100644 --- i/src/backend/access/heap/vacuumlazy.c +++ w/src/backend/access/heap/vacuumlazy.c @@ -639,6 +639,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, * implies aggressive. Produce distinct output for the corner * case all the same, just in case. 
*/ +Assert(params->trigger == AUTOVACUUM_TABLE_XID_AGE || + params->trigger == AUTOVACUUM_TABLE_MXID_AGE); if (vacrel->aggressive) msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n"); else @@ -656,6 +658,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->relnamespace, vacrel->relname, vacrel->num_index_scans); + if (!verbose) +appendStringInfo(&buf, _("triggered by: %s\n"), + vac_autovacuum_trigger_msg(params->trigger)); appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"), vacrel->removed_pages, new_rel_pages, diff --git i/src/backend/commands/vacuum.c w/src/backend/commands/vacuum.c index 7b1a4b127eb..18278acb557 100644 --- i/src/backend/commands/vacuum.c +++ w/src/backend/commands/vacuum.c @@ -273,8 +273,9 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel) params.multixact_freeze_table_age = -1; } - /* user-invoked vacuum is never "for wraparound" */ + /* user-invoked vacuum never uses these autovacuum-only flags */ params.is_wraparound = false; + params.trigger = AUTOVACUUM_NONE; /* user-invoked vacuum uses VACOPT_VERBOSE instead of log_min_duration */ params.log_min_duration = -1; @@ -1874,7 +1875,11 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params, bool skip_privs) LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); MyProc->statusFlags |= PROC_IN_VACUUM; if (params->is_wraparound) + { + Assert(params->trigger == AUTOVACUUM_TABLE_XID_AGE || + params->trigger == AUTOVACUUM_TABLE_MXID_AGE); MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND; + } ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; LWLockRelease(ProcArrayLock); } @@ -2176,6 +2181,30 @@ vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode) pfree(Irel); } +/* + * Return translatable string describing autovacuum trigger threshold type + */ +const char * +vac_autovacuum_trigger_msg(AutoVacType trigger) +{ + switch (trigger) + { + case AUTOVACUUM_NONE: + return _("none"); + case AUTOVACUUM_TABLE_XID_AGE: + return _("table XID age"); + case AUTOVACUUM_TABLE_MXID_AGE: + return _("table MXID age"); + case AUTOVACUUM_DEAD_TUPLES: + return _("dead tuples"); + case AUTOVACUUM_INSERTED_TUPLES: + return _("inserted tuples"); + } + + Assert(false);/* cannot be reached */ + return NULL; +} + /* * vacuum_delay_point --- check for interrupts and cost-based delay. * diff --git i/src/backend/postmaster/autovacuum.c w/src/backend/postmaster/autovacuum.c index f5ea381c53e..18d150c975b 100644 --- i/src/backend/postmaster/autovacuum.c +++ w/src/backend/postmaster/autovacuum.c
Re: Progress report of CREATE INDEX for nested partitioned tables
On Wed, 18 Jan 2023 at 15:25, Justin Pryzby wrote: > > TBH, I think the best approach is what I did in: > 0001-report-top-parent-progress-for-CREATE-INDEX.txt > > That's a minimal patch, ideal for backpatching. > > ..which defines/clarifies that the progress reporting is only for > *direct* children. That avoids the need to change any data structures, > and it's what was probably intended by the original patch, which doesn't > seem to have considered intermediate partitioned tables. > > I think it'd be fine to re-define that in some future release, to allow > showing indirect children (probably only "leaves", and not intermediate > partitioned tables). Or "total_bytes" or other global progress. > Hmm. My expectation as a user is that partitions_total includes both direct and indirect (leaf) child partitions, that it is set just once at the start of the process, and that partitions_done increases from zero to partitions_total as the index-build proceeds. I think that should be achievable with a minimally invasive patch that doesn't change any data structures. I agree with all the review comments Tomas posted. In particular, this shouldn't need any changes to IndexStmt. I think the best approach would be to just add a new function to backend_progress.c that offsets a specified progress parameter by a specified amount, so that you can just increment partitions_done by one or more, at the appropriate points. Regards, Dean
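To illustrate the suggestion, a rough sketch of such a helper follows (the function name is hypothetical; it would sit next to pgstat_progress_update_param() in backend_progress.c and follows the same locking pattern):

```c
/*
 * Sketch only: increment a progress counter in place rather than
 * overwriting it, so that CREATE INDEX can bump partitions_done by one
 * (or more) as each leaf partition's index is finished.
 */
void
pgstat_progress_incr_param(int index, int64 incr)
{
	volatile PgBackendStatus *beentry = MyBEEntry;

	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);

	if (!beentry || !pgstat_track_activities)
		return;

	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
	beentry->st_progress_param[index] += incr;
	PGSTAT_END_WRITE_ACTIVITY(beentry);
}
```

The index build could then call this with PROGRESS_CREATEIDX_PARTITIONS_DONE at the appropriate points instead of recomputing and overwriting the counter.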
Re: to_hex() for negative inputs
On Wed, 25 Jan 2023 at 09:02, Aleksander Alekseev wrote: > > > I don't see how a couple of extra arguments will expand to hundreds. > > Maybe I was exaggerating, but the point is that adding extra flags for > every possible scenario is a disadvantageous approach in general. > There is no need to increase the code base, the amount of test cases > and the amount of documentation every time someone has an idea "in > rare cases I also may want to do A or B, let's add a flag for this". > OK, but the point was that we've just added support for accepting hex inputs in a particular format, so I think it would be useful if to_hex() could produce outputs compatible with that. > > Which is broken for INT_MIN: > > > > select case when sign(x) > 0 then '' else '-' end || to_hex(abs(x)) > > from ( values (-2147483648) ) as s(x); > > > > ERROR: integer out of range > > I'm afraid the behavior of something like to_hex(X, with_sign => true) > is going to be exactly the same. There is no safe and consistent way > to calculate abs(INT_MIN). > Of course there is. This is easy to code in C using unsigned ints, without resorting to abs() (yes, I'm aware that abs() is undefined for INT_MIN). Regards, Dean
Re: to_hex() for negative inputs
Hi Dean, > Of course there is. This is easy to code in C using unsigned ints, > without resorting to abs() (yes, I'm aware that abs() is undefined for > INT_MIN). So in your opinion what is the expected result of to_hex(INT_MIN, with_sign => true)? -- Best regards, Aleksander Alekseev
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Katsuragi-san, Thank you for reading the patch! PSA new version. > Thank you for updating the patch! > > +/* Check whether the postgres server is still alive or not */ > +extern int PQConnCheck(PGconn *conn); > +extern int PQCanConnCheck(void); > > Aren't these PQconnCheck and PQcanConnCheck? I think the first letter > following 'PQ' should be lower case. I cannot find any rules about it, but seems right. Fixed. Best Regards, Hayato Kuroda FUJITSU LIMITED v27-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch Description: v27-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch v27-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch Description: v27-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch v27-0003-add-test.patch Description: v27-0003-add-test.patch
Re: to_hex() for negative inputs
On Wed, 25 Jan 2023 at 10:57, Aleksander Alekseev wrote: > > > Of course there is. This is easy to code in C using unsigned ints, > > without resorting to abs() (yes, I'm aware that abs() is undefined for > > INT_MIN). > > So in your opinion what is the expected result of to_hex(INT_MIN, > with_sign => true)? > "-80000000" or "-0x80000000", depending on whether the prefix is requested. The latter is legal input for an integer, which is the point (to allow round-tripping):

SELECT int '-0x80000000';
    int4
-------------
 -2147483648
(1 row)

SELECT pg_typeof(-0x80000000);
 pg_typeof
-----------
 integer
(1 row)

Regards, Dean
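For illustration, a minimal standalone sketch of the unsigned-arithmetic approach (the function name is invented and this is not the backend's to_hex() code; it just shows that the INT_MIN magnitude is representable once you negate in unsigned arithmetic):

```c
#include <stdio.h>
#include <limits.h>

/* Format a signed 32-bit value as hex with a leading '-' for negatives,
 * without calling abs(), which is undefined for INT_MIN. */
static void
print_signed_hex(int value)
{
	unsigned int magnitude;

	if (value < 0)
	{
		/* negation is well defined in unsigned arithmetic, even for INT_MIN */
		magnitude = -(unsigned int) value;
		printf("-0x%x\n", magnitude);
	}
	else
		printf("0x%x\n", (unsigned int) value);
}

int
main(void)
{
	print_signed_hex(INT_MIN);	/* prints -0x80000000 */
	print_signed_hex(-255);		/* prints -0xff */
	print_signed_hex(4096);		/* prints 0x1000 */
	return 0;
}
```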
Re: Logical replication timeout problem
On Tue, Jan 24, 2023 at 8:15 AM wangw.f...@fujitsu.com wrote: > > Attach the new patch. > I think the patch missed handling the case of non-transactional messages, which was previously handled. I have tried to address that in the attached. Is there a reason it shouldn't be handled? Apart from that, I changed a few comments. If my understanding is correct, then we need to change the name of the callback update_progress_txn as well, because now it needs to handle both transactional and non-transactional changes. How about update_progress_write? We accordingly need to change the comments for the callback. Additionally, I think we should have a test case to show we don't time out because of not processing non-transactional messages. See pgoutput_message for cases where it doesn't process the message. -- With Regards, Amit Kapila. v7-0001-Fix-the-logical-replication-timeout-during-proces.patch Description: Binary data
Re: Improve GetConfigOptionValues function
> After looking at this, it seemed to me that the factorization > wasn't quite right after all: specifically, the new function > could be used in several more places if it confines itself to > being a privilege check and doesn't consider GUC_NO_SHOW_ALL. > So more like the attached. > > You could argue that the factorization is illusory since each > of these additional call sites has an error message that knows > exactly what the conditions are to succeed. But if we want to > go that direction then I'd be inclined to forget about the > permissions-check function altogether and just test the > conditions in-line everywhere. I am ok with the above changes. I thought of modifying the ConfigOptionIsVisible function to take an extra argument, say validate_superuser_only. If this argument is true then it only considers GUC_SUPERUSER_ONLY check and return based on that. Otherwise it considers both GUC_SUPERUSER_ONLY and GUC_NO_SHOW_ALL and returns based on that. I understand that this just complicates the function and has other disadvantages. Instead of testing the conditions in-line, I prefer the use of function as done in v4 patch as it reduces the code size. Thanks & Regards, Nitin Jadhav On Mon, Jan 23, 2023 at 9:51 PM Tom Lane wrote: > > Bharath Rupireddy writes: > > LGTM. I've marked it RfC. > > After looking at this, it seemed to me that the factorization > wasn't quite right after all: specifically, the new function > could be used in several more places if it confines itself to > being a privilege check and doesn't consider GUC_NO_SHOW_ALL. > So more like the attached. > > You could argue that the factorization is illusory since each > of these additional call sites has an error message that knows > exactly what the conditions are to succeed. But if we want to > go that direction then I'd be inclined to forget about the > permissions-check function altogether and just test the > conditions in-line everywhere. > > Also, I intentionally dropped the GUC_NO_SHOW_ALL check in > get_explain_guc_options, because it seems redundant given > the preceding GUC_EXPLAIN check. It's unlikely we'd ever have > a variable that's marked both GUC_EXPLAIN and GUC_NO_SHOW_ALL ... > but if we did, shouldn't the former take precedence here anyway? > > regards, tom lane >
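For clarity, the alternative described above would look roughly like this (a sketch only, not the code in Tom's patch; validate_superuser_only is the suggested extra argument):

```c
/* Sketch of the alternative: one visibility function whose caller chooses
 * whether GUC_NO_SHOW_ALL is also taken into account. */
static bool
ConfigOptionIsVisible(struct config_generic *conf, bool validate_superuser_only)
{
	/* always enforce the privilege check for GUC_SUPERUSER_ONLY variables */
	if ((conf->flags & GUC_SUPERUSER_ONLY) &&
		!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_SETTINGS))
		return false;

	/* optionally also hide GUC_NO_SHOW_ALL variables */
	if (!validate_superuser_only && (conf->flags & GUC_NO_SHOW_ALL))
		return false;

	return true;
}
```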
Re: Improve GetConfigOptionValues function
>>> Also, I intentionally dropped the GUC_NO_SHOW_ALL check in >>> get_explain_guc_options, because it seems redundant given >>> the preceding GUC_EXPLAIN check. It's unlikely we'd ever have >>> a variable that's marked both GUC_EXPLAIN and GUC_NO_SHOW_ALL ... >>> but if we did, shouldn't the former take precedence here anyway? > >> You're right, but there's nothing that prevents users writing GUCs >> with GUC_EXPLAIN and GUC_NO_SHOW_ALL. > > "Users"? You do realize those flags are only settable by C code, > right? Moreover, you haven't explained why it would be good that > you can't get at the behavior that a GUC is both shown in EXPLAIN > and not shown in SHOW ALL. If you want "not shown by either", > that's already accessible by setting only the GUC_NO_SHOW_ALL > flag. So I'd almost argue this is a bug fix, though I concede > it's a bit hard to imagine why somebody would want that choice. > Still, if we have two independent flags they should produce four > behaviors, not just three. I agree that the developer can use both GUC_NO_SHOW_ALL and GUC_EXPLAIN knowingly or unknowingly for a single GUC. If used by mistake then according to the existing code (without patch), GUC_NO_SHOW_ALL takes higher precedence whether it is marked first or last in the code. I am more convinced with this behaviour as I feel it is safer than exposing the information which the developer might not have intended. Thanks & Regards, Nitin Jadhav On Tue, Jan 24, 2023 at 8:43 PM Tom Lane wrote: > > Bharath Rupireddy writes: > > On Mon, Jan 23, 2023 at 9:51 PM Tom Lane wrote: > >> Also, I intentionally dropped the GUC_NO_SHOW_ALL check in > >> get_explain_guc_options, because it seems redundant given > >> the preceding GUC_EXPLAIN check. It's unlikely we'd ever have > >> a variable that's marked both GUC_EXPLAIN and GUC_NO_SHOW_ALL ... > >> but if we did, shouldn't the former take precedence here anyway? > > > You're right, but there's nothing that prevents users writing GUCs > > with GUC_EXPLAIN and GUC_NO_SHOW_ALL. > > "Users"? You do realize those flags are only settable by C code, > right? Moreover, you haven't explained why it would be good that > you can't get at the behavior that a GUC is both shown in EXPLAIN > and not shown in SHOW ALL. If you want "not shown by either", > that's already accessible by setting only the GUC_NO_SHOW_ALL > flag. So I'd almost argue this is a bug fix, though I concede > it's a bit hard to imagine why somebody would want that choice. > Still, if we have two independent flags they should produce four > behaviors, not just three. > > regards, tom lane
Re: drop postmaster symlink
Hi, On Wed, 2023-01-25 at 08:54 +0100, Peter Eisentraut wrote: > > Apart from your concerns, it appears there is consensus for making > this change. The RPM packaging scripts can obviously be fixed > easily for this. Do you have an objection to making this change? I'm inclined to create the symlink in the RPMs to make users' lives (and my life) easier. So, no objection from here. Regards, -- Devrim Gündüz Open Source Solution Architect, Red Hat Certified Engineer Twitter: @DevrimGunduz , @DevrimGunduzTR
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
On Mon, Jan 23, 2023 at 6:30 PM Melih Mutlu wrote: > > Hi, > > Thanks for your review. > Attached updated versions of the patches. >

Hello, I am still in the process of reviewing the patch. Before that, I tried to run the test below:

--publisher
create table tab1(id int, name varchar);
create table tab2(id int primary key, name varchar);
create table tab3(id int primary key, name varchar);
Insert into tab1 values(10, 'a');
Insert into tab1 values(20, 'b');
Insert into tab1 values(30, 'c');
Insert into tab2 values(10, 'a');
Insert into tab2 values(20, 'b');
Insert into tab2 values(30, 'c');
Insert into tab3 values(10, 'a');
Insert into tab3 values(20, 'b');
Insert into tab3 values(30, 'c');
create publication mypub for table tab1, tab2, tab3;

--subscriber
create table tab1(id int, name varchar);
create table tab2(id int primary key, name varchar);
create table tab3(id int primary key, name varchar);
create subscription mysub connection 'dbname=postgres host=localhost user=shveta port=5432' publication mypub;

--I see initial data copied, but the new catalog columns srrelslotname and srreloriginname are not updated:

postgres=# select sublastusedid from pg_subscription;
 sublastusedid
---------------
             2

postgres=# select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn  | srrelslotname | srreloriginname
---------+---------+------------+-----------+---------------+-----------------
   16409 |   16384 | r          | 0/15219E0 |               |
   16409 |   16389 | r          | 0/15219E0 |               |
   16409 |   16396 | r          | 0/15219E0 |               |

When are these supposed to be updated? I thought the created slot name would be recorded here. Am I missing something? Also, the v8 patch does not apply on HEAD, giving merge conflicts.

thanks
Shveta
Re: Fix to enum hashing for dump and restore
Those are excellent points. We will investigate adjusting pg_dump behavior, as this is primarily a dump+restore issue. Thank you! -Andrew J Repp (VMware) On Tue, Jan 24, 2023, at 9:56 PM, Tom Lane wrote: > Andrew writes: > > I have discovered a bug in one usage of enums. If a table with hash > > partitions uses an enum as a partitioning key, it can no longer be > > backed up and restored correctly. This is because enums are represented > > simply as oids, and the hash function for enums hashes that oid to > > determine partition distribution. Given the way oids are assigned, any > > dump+restore of a database with such a table may fail with the error > > "ERROR: new row for relation "TABLENAME" violates partition constraint". > > Ugh, that was not well thought out :-(. I suppose this isn't a problem > for pg_upgrade, which should preserve the enum value OIDs, but an > ordinary dump/restore will indeed hit this. > > > I have written a patch to fix this bug (attached), by instead having the > > hashenum functions look up the enumsortorder ID of the value being > > hashed. These are deterministic across databases, and so allow for > > stable dump and restore. > > Unfortunately, I'm not sure those are as deterministic as all that. > They are floats, so there's a question of roundoff error, not to > mention cross-platform variations in what a float looks like. (At the > absolute minimum, I think we'd have to change your patch to force > consistent byte ordering of the floats.) Actually though, roundoff > error wouldn't be a problem for the normal exact-integer values of > enumsortorder. Where it could come into play is with the fractional > values used after you insert a value into the existing sort order. > And then the whole idea fails, because a dump/restore won't duplicate > those fractional values. > > Another problem with this approach is that we can't get from here to there > without a guaranteed dump/reload failure, since it's quite unlikely that > the partition assignment will be the same when based on enumsortorder > as it was when based on OIDs. Worse, it also breaks the pg_upgrade case. > > I wonder if it'd work to make pg_dump force --load-via-partition-root > mode when a hashed partition key includes an enum. > > regards, tom lane >
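For readers following along, a minimal sketch of the kind of schema that hits this (type and table names are invented; the failure mode is the one described above):

```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

CREATE TABLE t (m mood) PARTITION BY HASH (m);
CREATE TABLE t_p0 PARTITION OF t FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE t_p1 PARTITION OF t FOR VALUES WITH (MODULUS 2, REMAINDER 1);

INSERT INTO t VALUES ('happy');

-- After a pg_dump/restore into a fresh cluster, the enum labels receive new
-- OIDs, hashenum() can route 'happy' to the other partition, and restoring
-- the row into its original partition may then fail with:
--   ERROR:  new row for relation "t_p0" violates partition constraint
```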
CREATE ROLE bug?
This works in PG 15:

    CREATE ROLE service CREATEROLE;
    CREATE ROLE service1 WITH LOGIN IN ROLE service;
    SET SESSION AUTHORIZATION service;
    CREATE ROLE service2 WITH LOGIN IN ROLE service;

but generates an error in git master:

    CREATE ROLE service CREATEROLE;
    CREATE ROLE service1 WITH LOGIN IN ROLE service;
    SET SESSION AUTHORIZATION service;
    CREATE ROLE service2 WITH LOGIN IN ROLE service;
    --> ERROR: must have admin option on role "service"

If I make 'service' a superuser, it works:

    CREATE ROLE service SUPERUSER;
    CREATE ROLE service1 WITH LOGIN IN ROLE service;
    SET SESSION AUTHORIZATION service;
    CREATE ROLE service2 WITH LOGIN IN ROLE service;

It is probably related to this discussion and change:

    https://www.postgresql.org/message-id/flat/CA+TgmobGds7oefDjZUY+k_J7p1sS=pTq3sZ060qdb=okei1...@mail.gmail.com

I am not sure if the behavior is wrong, the error message is wrong, or it is working as expected.

-- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: plpython vs _POSIX_C_SOURCE
On Tue, Jan 24, 2023 at 11:37 PM Tom Lane wrote: > Andres Freund writes: > > Patches attached. > > +1 for 0001. I'm still nervous about 0002. However, maybe the > cases that we had trouble with are legacy issues that nobody cares > about anymore in 2023. We can always look for another answer if > we get complaints, I guess. It feels like things are changing so fast these days that whatever was happening 12 years ago is not likely to be relevant. Compilers change enough to cause warnings and even errors in just a few years. A decade is long enough for an entire platform to become irrelevant. Plus, the cost of experimentation here seems very low. Sure, something might break, but if it does, we can just change it back, or change it again. That's not really a big deal. The thing that would be a big deal, maybe, is if we released and only found out afterward that this caused some subtle and horrible problem for which we had no back-patchable fix, but that seems pretty unlikely. -- Robert Haas EDB: http://www.enterprisedb.com
Syncrep and improving latency due to WAL throttling
Hi, attached is proposal idea by Tomas (in CC) for protecting and prioritizing OLTP latency on syncrep over other heavy WAL hitting sessions. This is the result of internal testing and research related to the syncrep behavior with Tomas, Alvaro and me. The main objective of this work-in-progress/crude patch idea (with GUC default disabled) is to prevent build-up of lag against the synchronous standby. It allows DBA to maintain stable latency for many backends when in parallel there is some WAL-heavy activity happening (full table rewrite, VACUUM, MV build, archiving, etc.). In other words it allows slow down of any backend activity. Any feedback on such a feature is welcome, including better GUC name proposals ;) and conditions in which such feature should be disabled even if it would be enabled globally (right now only anti- wraparound VACUUM comes to mind, it's not in the patch). Demo; Given: - two DBs in syncrep configuration and artificial RTT 10ms latency between them (introduced via tc qdisc netem) - insertBIG.sql = "insert into bandwidthhog select repeat('c', 1000) from generate_series(1, 50);" (50MB of WAL data) - pgbench (8c) and 1x INSERT session There are clearly visible drops of pgbench (OLTP) latency when the WAL socket is saturated: with 16devel/master and synchronous_commit_flush_wal_after=0 (disabled, default/baseline): postgres@host1:~$ pgbench -n -R 50 -c 8 -T 15 -P 1 pgbench (16devel) progress: 1.0 s, 59.0 tps, lat 18.840 ms stddev 11.251, 0 failed, lag 0.059 ms progress: 2.0 s, 48.0 tps, lat 14.332 ms stddev 4.272, 0 failed, lag 0.063 ms progress: 3.0 s, 56.0 tps, lat 15.383 ms stddev 6.270, 0 failed, lag 0.061 ms progress: 4.0 s, 51.0 tps, lat 15.104 ms stddev 5.850, 0 failed, lag 0.061 ms progress: 5.0 s, 47.0 tps, lat 15.184 ms stddev 5.472, 0 failed, lag 0.063 ms progress: 6.0 s, 23.0 tps, lat 88.495 ms stddev 141.845, 0 failed, lag 0.064 ms progress: 7.0 s, 1.0 tps, lat 999.053 ms stddev 0.000, 0 failed, lag 0.077 ms progress: 8.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed, lag 0.000 ms progress: 9.0 s, 1.0 tps, lat 2748.142 ms stddev NaN, 0 failed, lag 0.072 ms progress: 10.0 s, 68.1 tps, lat 3368.267 ms stddev 282.842, 0 failed, lag 2911.857 ms progress: 11.0 s, 97.0 tps, lat 2560.750 ms stddev 216.844, 0 failed, lag 2478.261 ms progress: 12.0 s, 96.0 tps, lat 1463.754 ms stddev 376.276, 0 failed, lag 1383.873 ms progress: 13.0 s, 94.0 tps, lat 616.243 ms stddev 230.673, 0 failed, lag 527.241 ms progress: 14.0 s, 59.0 tps, lat 48.265 ms stddev 72.533, 0 failed, lag 15.181 ms progress: 15.0 s, 39.0 tps, lat 14.237 ms stddev 6.073, 0 failed, lag 0.063 ms transaction type: [..] latency average = 931.383 ms latency stddev = 1188.530 ms rate limit schedule lag: avg 840.170 (max 3605.569) ms session2 output: postgres=# \i insertBIG.sql Timing is on. INSERT 0 50 Time: 4119.485 ms (00:04.119) This new GUC makes it possible for the OLTP traffic to be less affected (latency-wise) when the heavy bulk traffic hits. 
With synchronous_commit_flush_wal_after=1024 (kB) it's way better, but latency rises up to 45ms: postgres@host1:~$ pgbench -n -R 50 -c 8 -T 15 -P 1 pgbench (16devel) progress: 1.0 s, 52.0 tps, lat 17.300 ms stddev 10.178, 0 failed, lag 0.061 ms progress: 2.0 s, 51.0 tps, lat 19.490 ms stddev 12.626, 0 failed, lag 0.061 ms progress: 3.0 s, 48.0 tps, lat 14.839 ms stddev 5.429, 0 failed, lag 0.061 ms progress: 4.0 s, 53.0 tps, lat 24.635 ms stddev 13.449, 0 failed, lag 0.062 ms progress: 5.0 s, 48.0 tps, lat 17.999 ms stddev 9.291, 0 failed, lag 0.062 ms progress: 6.0 s, 57.0 tps, lat 21.513 ms stddev 17.011, 0 failed, lag 0.058 ms progress: 7.0 s, 50.0 tps, lat 28.071 ms stddev 21.622, 0 failed, lag 0.061 ms progress: 8.0 s, 45.0 tps, lat 27.244 ms stddev 11.975, 0 failed, lag 0.064 ms progress: 9.0 s, 57.0 tps, lat 35.988 ms stddev 25.752, 0 failed, lag 0.057 ms progress: 10.0 s, 56.0 tps, lat 45.478 ms stddev 39.831, 0 failed, lag 0.534 ms progress: 11.0 s, 62.0 tps, lat 45.146 ms stddev 32.881, 0 failed, lag 0.058 ms progress: 12.0 s, 51.0 tps, lat 24.250 ms stddev 12.405, 0 failed, lag 0.063 ms progress: 13.0 s, 57.0 tps, lat 18.554 ms stddev 8.677, 0 failed, lag 0.060 ms progress: 14.0 s, 44.0 tps, lat 15.923 ms stddev 6.958, 0 failed, lag 0.065 ms progress: 15.0 s, 54.0 tps, lat 19.773 ms stddev 10.024, 0 failed, lag 0.063 ms transaction type: [..] latency average = 25.575 ms latency stddev = 21.540 ms session2 output: postgres=# set synchronous_commit_flush_wal_after = 1024; SET postgres=# \i insertBIG.sql INSERT 0 50 Time: 8889.318 ms (00:08.889) With 16devel/master and synchronous_commit_flush_wal_after=256 (kB) all is smooth: postgres@host1:~$ pgbench -n -R 50 -c 8 -T 15 -P 1 pgbench (16devel) progress: 1.0 s, 49.0 tps, lat 14.345 ms stddev 4.700, 0 failed, lag 0.062 ms progress: 2.0 s, 45.0 tps, lat 14.812 ms stddev 5.816, 0 failed, lag 0.064 ms progress: 3.0 s, 49.0 tps, lat 13.145 ms stddev 4.320, 0 failed
Re: CREATE ROLE bug?
On Wed, Jan 25, 2023 at 8:29 AM Bruce Momjian wrote: > This works in PG 15: > > CREATE ROLE service CREATEROLE; > CREATE ROLE service1 WITH LOGIN IN ROLE service; > SET SESSION AUTHORIZATION service; > CREATE ROLE service2 WITH LOGIN IN ROLE service; > > but generates an error in git master: > > CREATE ROLE service CREATEROLE; > CREATE ROLE service1 WITH LOGIN IN ROLE service; > SET SESSION AUTHORIZATION service; > CREATE ROLE service2 WITH LOGIN IN ROLE service; > --> ERROR: must have admin option on role "service" > > If I make 'service' a superuser, it works: > > CREATE ROLE service SUPERUSER; > CREATE ROLE service1 WITH LOGIN IN ROLE service; > SET SESSION AUTHORIZATION service; > CREATE ROLE service2 WITH LOGIN IN ROLE service; > > It is probably related to this discussion and change: > > > https://www.postgresql.org/message-id/flat/CA+TgmobGds7oefDjZUY+k_J7p1sS=pTq3sZ060qdb=okei1...@mail.gmail.com > > I am not sure if the behavior is wrong, the error message is wrong, or > it is working as expected. It is indeed related to that discussion and change. In existing released branches, a CREATEROLE user can make any role a member of any other role even if they have no rights at all with respect to that role. This means that a CREATEROLE user can create a new user in the pg_execute_server_programs group even though they have no access to it. That allows any CREATEROLE user to take over the OS account, and thus also superuser. In master, the rules have been tightened up. CREATEROLE no longer exempts you from the usual permission checks about adding a user to a group. This means that a CREATEROLE user now needs the same permissions to add a user to a group as any other user would need, i.e. ADMIN OPTION on the group. In your example, the "service" user has CREATEROLE and is therefore entitled to create new roles. However, "service" can only add those new roles to groups for which "service" possesses ADMIN OPTION. And "service" does not have ADMIN OPTION on itself, because no role ever possesses ADMIN OPTION on itself. I wrote a blog about this yesterday, which may or may not be of help: http://rhaas.blogspot.com/2023/01/surviving-without-superuser-coming-to.html I think that the new behavior will surprise some people, as it has surprised you, and it will take some time to get used to. However, I also think that the changes are absolutely essential. We've been shipping major releases for years and just pretending that it's OK that having CREATEROLE lets you take over the OS account and the superuser account. That's a security hole -- and not a subtle one. I could have easily exploited it as a teenager. My goals in doing this project were to (1) fix the security holes, (2) otherwise change as little about the behavior of CREATEROLE as possible, and (3) make CREATEROLE do something useful. We could have accomplished the first goal by just removing CREATEROLE or making it not do anything at all, but meeting the second and third goals at the same time require letting CREATEROLE continue to work but putting just enough restrictions on its power to keep it from being used as a privilege-escalation attack. I hope that what's been committed accomplishes that goal. -- Robert Haas EDB: http://www.enterprisedb.com
Re: old_snapshot_threshold bottleneck on replica
On Wed, Jan 25, 2023 at 3:52 AM Maxim Orlov wrote: > But in TransactionIdLimitedForOldSnapshots these variable is using > conjointly. So, I'm not > sure, is it completely safe to remove mutex. Well, that's something we - and ideally you, as the patch author - need to analyze and figure out. We can't just take a shot and hope for the best. > Actually, removing mutex and switch to atomics > was my first choice. I've run all the tests and no problems were found Unfortunately, that kind of testing is not very likely to find a subtle synchronization problem. That's why a theoretical analysis is so important. -- Robert Haas EDB: http://www.enterprisedb.com
More pgindent tweaks
After I committed 1249371632 I thought that I should really go ahead and do what I suggested and allow multiple exclude pattern files for pgindent. One obvious case is to exclude an in tree meson build directory. I also sometimes have other in tree objects I'd like to be able exclude. The attached adds this ability. It also unifies the logic for finding the regular exclude pattern file and the typedefs file. I took the opportunity to remove a badly thought out and dangerous feature whereby the first non-option argument, if it's not a .c or .h file, is taken as the typedefs file. That's particularly dangerous in a situation where git is producing a list of files that have changed and passing it on the command line to pgindent. I also removed a number of extraneous blank lines. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent index 1f95a1a34e..2ddfe07982 100755 --- a/src/tools/pgindent/pgindent +++ b/src/tools/pgindent/pgindent @@ -22,7 +22,7 @@ my $indent_opts = my $devnull = File::Spec->devnull; my ($typedefs_file, $typedef_str, $code_base, - $excludes, $indent, $build, + @excludes, $indent, $build, $show_diff, $silent_diff, $help); $help = 0; @@ -32,7 +32,7 @@ my %options = ( "typedefs=s" => \$typedefs_file, "list-of-typedefs=s" => \$typedef_str, "code-base=s"=> \$code_base, - "excludes=s" => \$excludes, + "excludes=s" => \@excludes, "indent=s" => \$indent, "build" => \$build, "show-diff" => \$show_diff, @@ -46,10 +46,8 @@ usage("Cannot have both --silent-diff and --show-diff") run_build($code_base) if ($build); -# command line option wins, then first non-option arg, -# then environment (which is how --build sets it) , +# command line option wins, then environment (which is how --build sets it) , # then locations. based on current dir, then default location -$typedefs_file ||= shift if @ARGV && $ARGV[0] !~ /\.[ch]$/; $typedefs_file ||= $ENV{PGTYPEDEFS}; # build mode sets PGINDENT @@ -58,14 +56,15 @@ $indent ||= $ENV{PGINDENT} || $ENV{INDENT} || "pg_bsd_indent"; # no non-option arguments given. so do everything in the current directory $code_base ||= '.' unless @ARGV; +my $sourcedir = locate_sourcedir(); + # if it's the base of a postgres tree, we will exclude the files # postgres wants excluded -$excludes ||= "$code_base/src/tools/pgindent/exclude_file_patterns" - if $code_base && -f "$code_base/src/tools/pgindent/exclude_file_patterns"; - -# also look under the current directory for the exclude patterns file -$excludes ||= "src/tools/pgindent/exclude_file_patterns" - if -f "src/tools/pgindent/exclude_file_patterns"; +if ($sourcedir) +{ + my $exclude_candidate = "$sourcedir/exclude_file_patterns"; + push (@excludes, $exclude_candidate) if -f $exclude_candidate; +} # The typedef list that's mechanically extracted by the buildfarm may omit # some names we want to treat like typedefs, e.g. "bool" (which is a macro @@ -85,7 +84,6 @@ my %excluded = map { +"$_\n" => 1 } qw( my @files; my $filtered_typedefs_fh; - sub check_indent { system("$indent -? 
< $devnull > $devnull 2>&1"); @@ -114,26 +112,34 @@ sub check_indent return; } +sub locate_sourcedir +{ + # try fairly hard to locate the sourcedir + my $where = $code_base || '.'; + my $sub = "$where/src/tools/pgindent"; + return $sub if -d $sub; + # try to find it from an ancestor directory + $sub = "../src/tools/pgindent"; + foreach (1..4) + { + return $sub if -d $sub; + $sub = "../$sub"; + } + return; # undef if nothing found +} sub load_typedefs { - # try fairly hard to find the typedefs file if it's not set - foreach my $try ('.', 'src/tools/pgindent', '/usr/local/etc') + foreach my $try ('.', $sourcedir, '/usr/local/etc') { - $typedefs_file ||= "$try/typedefs.list" + last if $typedefs_file; + next unless defined $try; + $typedefs_file = "$try/typedefs.list" if (-f "$try/typedefs.list"); } - # try to find typedefs by moving up directory levels - my $tdtry = ".."; - foreach (1 .. 5) - { - $typedefs_file ||= "$tdtry/src/tools/pgindent/typedefs.list" - if (-f "$tdtry/src/tools/pgindent/typedefs.list"); - $tdtry = "$tdtry/.."; - } die "cannot locate typedefs file \"$typedefs_file\"\n" unless $typedefs_file && -f $typedefs_file; @@ -166,13 +172,13 @@ sub load_typedefs return $filter_typedefs_fh; } - sub process_exclude { - if ($excludes && @files) + foreach my $excl (@excludes) { - open(my $eh, '<', $excludes) - || die "cannot open exclude file \"$excludes\"\n"; + last unless @files; + open(my $eh, '<', $excl) + || die "cannot open exclude file \"$excl\"\n"; while (my $line = <$eh>) { chomp $line; @@ -185,7 +191,6 @@ sub process_exclude return; } - sub read_source { my $source_filename = shift; @@ -200,7 +205,6 @@ sub read_source return $source; } -
Re: Re: Support plpgsql multi-range in conditional control
Hello, this usage scenario comes from Oracle's PL/SQL language (I have been doing PL/SQL function development for some time). I think this patch is very practical and will expand our for loop scenarios. In short, I look forward to your reply. Happy Chinese New Year! songjinzhou(2903807...@qq.com) Maybe you didn't understand my reply. Without some significant real use case, I am strongly against the proposed feature and merging your patch to upstream. I don't see any reason to enhance the language with this feature. Regards Pavel
Re: Re: Support plpgsql multi-range in conditional control
Hi On Wed, 25 Jan 2023 at 15:18, songjinzhou <2903807...@qq.com> wrote: > Hello, this usage scenario is from Oracle's PL/SQL language (I have been > doing the function development of PL/SQL language for some time). I think > this patch is very practical and will expand our for loop scenario. In > short, I look forward to your > I don't see any real usage. PL/SQL doesn't support the proposed syntax. Regards Pavel > reply. > > Happy Chinese New Year! > > -- > songjinzhou(2903807...@qq.com) > > > Maybe you didn't understand my reply. Without some significant real use > case, I am strongly against the proposed feature and merging your patch to > upstream. I don't see any reason to enhance the language with this feature. > > Regards > > Pavel > > >
RE: Time delayed LR (WAS Re: logical replication restrictions)
On Wednesday, January 25, 2023 3:27 PM Kyotaro Horiguchi wrote: > At Tue, 24 Jan 2023 12:19:04 +, "Takamichi Osumi (Fujitsu)" > wrote in > > Attached the patch v20 that has incorporated all comments so far. > > Thanks! I looked through the documentation part. Thank you for your review! > + > + subminapplydelay int8 > + > + > + Total time spent delaying the application of changes, in milliseconds. > + > > I was confused because it reads as this column shows the summarized actual > waiting time caused by min_apply_delay. IIUC actually it shows the > min_apply_delay setting for the subscription. Thus shouldn't it be something > like this? > > "The minimum amount of time to delay applying changes, in milliseconds" > And it might be better to mention the corresponding subscription parameter. This description looks much better to me than the past description. Fixed. OTOH, other parameters don't mention their corresponding subscription parameters. So, I didn't add the mention. > + error. If wal_receiver_status_interval is set to > + zero, the apply worker doesn't send any feedback messages during the > + min_apply_delay period. > > I took a bit longer to understand what this sentence means. I'd like to > suggest something like the following. > > "Since no status-update messages are sent while delaying, note that > wal_receiver_status_interval is the only source of keepalive messages during > that period." The current patch's description is precise and I prefer that. I would say "the only source" would be confusing to readers. However, I slightly adjusted the description. Could you please check? > + > + A logical replication subscription can delay the application of changes by > + specifying the min_apply_delay subscription > parameter. > + See for details. > + > > I'm not sure "logical replication subscription" is a common term. > Doesn't just "subscription" mean the same, especially in that context? > (Note that 31.2 starts with "A subscription is the downstream.."). I think you are right. Fixed. > + Any delay occurs only on WAL records for transaction begins after > all > + initial table synchronization has finished. The delay is > + calculated > > There are no "transaction begin" WAL records. Maybe it is "logical replication > transaction begin message". The timestamp is of "commit time". (I took > "transaction begins" as a noun, but that might be > wrong..) Yeah, we can improve this. But, we need to include not only "commit" but also "prepare" as nuance in this part. In short, I think we should change this to mention (1) the delay happens after all initial table synchronization (2) how delay is applied for non-streaming and streaming transactions in general. By the way, WAL timestamp is a term used for recovery_min_apply_delay. So, I'd like to keep it to make the description more aligned with it, until there is a better description. Updated the doc. I adjusted the commit message according to this fix. > > +may reduce the actual wait time. It is also possible that the > overhead > +already exceeds the requested min_apply_delay > value, > +in which case no additional wait is necessary. If the system > + clocks > > I'm not sure it is right to say "necessary" here. IMHO it might be better to say "in > which case no delay is applied". Agreed. Fixed.
If the system clocks > + on publisher and subscriber are not synchronized, this may lead to > + apply changes earlier than expected, but this is not a major issue > + because this parameter is typically much larger than the time > + deviations between servers. Note that if this parameter is > + set to a > > This doesn't seem to fit our documentation. It is not our business whether a > certain amount deviation is critical or not. How about somethig like the > following? > > "Note that the delay is measured between the timestamp assigned by > publisher and the system clock on subscriber. You need to manage the > system clocks to be in sync so that the delay works properly." As discussed, this is aligned with recovery_min_apply_delay. So, I keep it. > +Delaying the replication can mean there is a much longer time > +between making a change on the publisher, and that change > being > +committed on the subscriber. This can impact the performance > of > +synchronous replication. See linkend="guc-synchronous-commit"/> > +parameter. > > Do we need the "can" in "Delaying the replication can mean"? If we want to > say, it might be "Delaying the replication means there can be a much > longer..."? The "can" indicates the possibility as the nuance, while adopting "means" in this case indicates "time delayed LR causes the long time wait always". I'm okay with either expressio
RE: Perform streaming logical transactions by background workers and parallel apply
On Wednesday, January 25, 2023 7:30 AM Peter Smith wrote: > > Here are my review comments for patch v87-0002. Thanks for your comments. > == > doc/src/sgml/config.sgml > > 1. > > -Allows streaming or serializing changes immediately in > logical decoding. > The allowed values of > logical_replication_mode are > -buffered and immediate. When > set > -to immediate, stream each change if > +buffered and immediate. > The default > +is buffered. > + > > I didn't think it was necessary to say “of logical_replication_mode”. > IMO that much is already obvious because this is the first sentence of the > description for logical_replication_mode. > Changed. > ~~~ > > 2. > + > +On the publisher side, it allows streaming or serializing changes > +immediately in logical decoding. When set to > +immediate, stream each change if > streaming option (see optional parameters set by > CREATE > SUBSCRIPTION) > is enabled, otherwise, serialize each change. When set to > -buffered, which is the default, decoding will > stream > -or serialize changes when > logical_decoding_work_mem > -is reached. > +buffered, decoding will stream or serialize > changes > +when logical_decoding_work_mem is > reached. > > > 2a. > "it allows" --> "logical_replication_mode allows" > > 2b. > "decoding" --> "the decoding" Changed. > ~~~ > > 3. > + > +On the subscriber side, if streaming option is set > +to parallel, this parameter also allows the leader > +apply worker to send changes to the shared memory queue or to > serialize > +changes. When set to buffered, the leader sends > +changes to parallel apply workers via shared memory queue. When > set to > +immediate, the leader serializes all changes to > +files and notifies the parallel apply workers to read and apply them > at > +the end of the transaction. > + > > "this parameter also allows" --> "logical_replication_mode also allows" Changed. > ~~~ > > 4. > > This parameter is intended to be used to test logical decoding and > replication of large transactions for which otherwise we need to > generate the changes till > logical_decoding_work_mem > -is reached. > +is reached. Moreover, this can also be used to test the transmission > of > +changes between the leader and parallel apply workers. > > > "Moreover, this can also" --> "It can also" > > I am wondering would this sentence be better put at the top of the GUC > description. So then the first paragraph becomes like this: > > > SUGGESTION (I've also added another sentence "The effect of...") > > The allowed values are buffered and immediate. The default is buffered. This > parameter is intended to be used to test logical decoding and replication of > large > transactions for which otherwise we need to generate the changes till > logical_decoding_work_mem is reached. It can also be used to test the > transmission of changes between the leader and parallel apply workers. The > effect of logical_replication_mode is different for the publisher and > subscriber: > > On the publisher side... > > On the subscriber side... I think your suggestion makes sense, so changed as suggested. > == > .../replication/logical/applyparallelworker.c > > 5. > + /* > + * In immeidate mode, directly return false so that we can switch to > + * PARTIAL_SERIALIZE mode and serialize remaining changes to files. > + */ > + if (logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE) return > + false; > > Typo "immediate" > > Also, I felt "directly" is not needed. "return false" and "directly return > false" is the > same. 
> > SUGGESTION > Using ‘immediate’ mode returns false to cause a switch to PARTIAL_SERIALIZE > mode so that the remaining changes will be serialized. Changed. > == > src/backend/utils/misc/guc_tables.c > > 6. > { > {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS, > - gettext_noop("Allows streaming or serializing each change in logical > decoding."), > - NULL, > + gettext_noop("Controls the behavior of logical replication publisher > and subscriber"), > + gettext_noop("If set to immediate, on the publisher side, it " > + "allows streaming or serializing each change in " > + "logical decoding. On the subscriber side, in " > + "parallel streaming mode, it allows the leader apply " > + "worker to serialize changes to files and notifies " > + "the parallel apply workers to read and apply them at " > + "the end of the transaction."), > GUC_NOT_IN_SAMPLE > }, > > 6a. short description > > User PoV behaviour should be the same. Instead, maybe say "controls the > internal behavior" or something like that? Changed to "internal behavior xxx" > ~ > > 6b. long description > >
RE: Time delayed LR (WAS Re: logical replication restrictions)
On Wednesday, January 25, 2023 3:55 PM Amit Kapila wrote: > On Wed, Jan 25, 2023 at 11:23 AM Takamichi Osumi (Fujitsu) > wrote: > > > > > > Thank you for checking the patch ! > > On Wednesday, January 25, 2023 10:17 AM Kyotaro Horiguchi > wrote: > > > In short, I'd like to propose renaming the parameter > > > in_delayed_apply of send_feedback to "has_unprocessed_change". > > > > > > At Tue, 24 Jan 2023 12:27:58 +0530, Amit Kapila > > > wrote in > > > > > send_feedback(): > > > > > +* If the subscriber side apply is delayed (because of > > > time-delayed > > > > > +* replication) then do not tell the publisher that the > > > > > + received > > > latest > > > > > +* LSN is already applied and flushed, otherwise, it leads to > the > > > > > +* publisher side making a wrong assumption of logical > > > replication > > > > > +* progress. Instead, we just send a feedback message to > > > > > + avoid a > > > publisher > > > > > +* timeout during the delay. > > > > > */ > > > > > - if (!have_pending_txes) > > > > > + if (!have_pending_txes && !in_delayed_apply) > > > > > flushpos = writepos = recvpos; > > > > > > > > > > Honestly I don't like this wart. The reason for this is the > > > > > function assumes recvpos = applypos but we actually call it > > > > > while holding unapplied changes, that is, applypos < recvpos. > > > > > > > > > > Couldn't we maintain an additional static variable "last_applied" > > > > > along with last_received? > > > > > > > > > > > > > It won't be easy to maintain the meaning of last_applied because > > > > there are cases where we don't apply the change directly. For > > > > example, in case of streaming xacts, we will just keep writing it > > > > to the file, now, say, due to some reason, we have to send the > > > > feedback, then it will not allow you to update the latest write > > > > locations. This would then become different then what we are doing > without the patch. > > > > Another point to think about is that we also need to keep the > > > > variable updated for keep-alive ('k') messages even though we > > > > don't apply anything in that case. Still, other cases to consider > > > > are where we have mix of streaming and non-streaming transactions. > > > > > > Yeah. Even though I named it as "last_applied", its objective is to > > > have get_flush_position returning the correct have_pending_txes > > > without a hint from callers, that is, "let g_f_position know if > > > store_flush_position has been called with the last received data". > > > > > > Anyway I tried that but didn't find a clean and simple way. However, > > > while on it, I realized what the code made me confused. > > > > > > +static void send_feedback(XLogRecPtr recvpos, bool force, bool > > > requestReply, > > > + bool > > > + in_delayed_apply); > > > > > > The name "in_delayed_apply" doesn't donsn't give me an idea of what > > > the function should do for it. If it is named > > > "has_unprocessed_change", I think it makes sense that send_feedback > > > should think there may be an outstanding transaction that is not known to > the function. > > > > > > > > > So, my conclusion here is I'd like to propose changing the parameter > > > name to "has_unapplied_change". > > Renamed the variable name to "has_unprocessed_change". > > Also, removed the first argument of the send_feedback() which isn't > necessary now. > > > > Why did you remove the first argument of the send_feedback() when that is not > added by this patch? 
If you really think that is an improvement, feel free to > propose that as a separate patch. > Personally, I don't see a value in it. Oh, sorry for that. I have made the change back. Kindly have a look at the v22 shared in [1]. [1] - https://www.postgresql.org/message-id/TYCPR01MB837305BD31FA317256BC7B1FEDCE9%40TYCPR01MB8373.jpnprd01.prod.outlook.com Best Regards, Takamichi Osumi
Re: CREATE ROLE bug?
On Wed, Jan 25, 2023 at 08:47:14AM -0500, Robert Haas wrote: > > I am not sure if the behavior is wrong, the error message is wrong, or > > it is working as expected. > > It is indeed related to that discussion and change. In existing > released branches, a CREATEROLE user can make any role a member of any > other role even if they have no rights at all with respect to that > role. This means that a CREATEROLE user can create a new user in the > pg_execute_server_programs group even though they have no access to > it. That allows any CREATEROLE user to take over the OS account, and > thus also superuser. In master, the rules have been tightened up. > CREATEROLE no longer exempts you from the usual permission checks > about adding a user to a group. This means that a CREATEROLE user now > needs the same permissions to add a user to a group as any other user > would need, i.e. ADMIN OPTION on the group. > > In your example, the "service" user has CREATEROLE and is therefore > entitled to create new roles. However, "service" can only add those > new roles to groups for which "service" possesses ADMIN OPTION. And > "service" does not have ADMIN OPTION on itself, because no role ever > possesses ADMIN OPTION on itself. So, how would someone with CREATEROLE permission add people to their own role, without superuser permission? Are we adding any security by preventing this? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: CREATE ROLE bug?
On Wed, Jan 25, 2023 at 7:35 AM Bruce Momjian wrote: > > So, how would someone with CREATEROLE permission add people to their own > role, without superuser permission? Are we adding any security by > preventing this? > > As an encouraged design choice you wouldn't. You'd create a new group and add both yourself and the new role to it - then grant it the desired permissions. A CREATEROLE role should probably be a user (LOGIN) role and user roles should not have members. David J.
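To make David's suggestion concrete, the pattern might look roughly like this when run as the CREATEROLE role "service" from Bruce's example. This is only a sketch: it assumes the new rules under which the creating role ends up with ADMIN OPTION on the roles it creates, and the group, user, and schema names are invented.

```sql
-- As "service" (has CREATEROLE, is not a superuser):
CREATE ROLE app_team NOLOGIN;               -- the group; "service" can administer it
CREATE ROLE alice LOGIN PASSWORD 'secret';  -- a user created by "service"

GRANT app_team TO alice;    -- allowed because "service" administers app_team
GRANT app_team TO service;  -- "service" joins the same group

-- Privileges are then granted to the group rather than to individual users:
GRANT USAGE ON SCHEMA app TO app_team;
GRANT SELECT ON ALL TABLES IN SCHEMA app TO app_team;
```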
Re: More pgindent tweaks
On Wed, Jan 25, 2023 at 08:59:44AM -0500, Andrew Dunstan wrote: > After I committed 1249371632 I thought that I should really go ahead and > do what I suggested and allow multiple exclude pattern files for > pgindent. One obvious case is to exclude an in tree meson build > directory. I also sometimes have other in tree objects I'd like to be > able exclude. > > The attached adds this ability. It also unifies the logic for finding > the regular exclude pattern file and the typedefs file. > > I took the opportunity to remove a badly thought out and dangerous > feature whereby the first non-option argument, if it's not a .c or .h > file, is taken as the typedefs file. That's particularly dangerous in a > situation where git is producing a list of files that have changed and > passing it on the command line to pgindent. I also removed a number of > extraneous blank lines. Can we make the pgindent options more visible, perhaps by adding them to pgindent.man or at least saying type pgindent --help? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: heapgettup() with NoMovementScanDirection unused in core?
David Rowley writes:
> Does anyone know of any reason why we shouldn't ditch the nomovement
> code in heapgettup/heapgettup_pagemode?

AFAICS, the remaining actual use-case for NoMovementScanDirection is that defined by ExecutorRun:

 *	If direction is NoMovementScanDirection then nothing is done
 *	except to start up/shut down the destination.  Otherwise,
 *	we retrieve up to 'count' tuples in the specified direction.
 *
 *	Note: count = 0 is interpreted as no portal limit, i.e., run to
 *	completion.

We must have the NoMovementScanDirection option because count = 0 does not mean "do nothing", and I noted at least two call sites that require it. The heapgettup definition is thus not only unreachable, but confusingly inconsistent with this meaning.

I wonder if we couldn't also get rid of this confusingly-inconsistent alternative usage in the planner:

 * 'indexscandir' is one of:
 *	ForwardScanDirection: forward scan of an ordered index
 *	BackwardScanDirection: backward scan of an ordered index
 *	NoMovementScanDirection: scan of an unordered index, or don't care
 * (The executor doesn't care whether it gets ForwardScanDirection or
 * NoMovementScanDirection for an indexscan, but the planner wants to
 * distinguish ordered from unordered indexes for building pathkeys.)

While that comment's claim is plausible, I think it's been wrong for years. AFAICS indxpath.c makes pathkeys before it ever does this:

	index_is_ordered ? ForwardScanDirection : NoMovementScanDirection,

and nothing depends on that later either. So I think we could simplify this to something like "indexscandir is either ForwardScanDirection or BackwardScanDirection. (Unordered index types need not support BackwardScanDirection.)"

			regards, tom lane
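For readers skimming the thread, the executor convention quoted above looks like this at a call site (illustrative only; the real callers live in places such as pquery.c):

```c
/* count = 0 means "no portal limit", i.e. run the plan to completion. */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);

/*
 * NoMovementScanDirection: only start up / shut down the destination;
 * no tuples are fetched regardless of count.
 */
ExecutorRun(queryDesc, NoMovementScanDirection, 0, true);
```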
Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
On Tue, Jan 24, 2023 at 3:33 PM Peter Geoghegan wrote: > Sure, it's possible that such a cancellable aggressive autovacuum was > indeed cancelled, and that that factor made the crucial difference. > But I find it far easier to believe that there simply was no such > aggressive autovacuum in the first place (not this time), since it > could have only happened when autovacuum thinks that there are > sufficiently many dead tuples to justify launching an autovacuum in > the first place. Which, as we now all accept, is based on highly > dubious sampling by ANALYZE. So I think it's much more likely to be > that factor (dead tuple accounting is bad), as well as the absurd > false dichotomy between aggressive and non-aggressive -- plus the > issue at hand, the auto-cancellation behavior. In my opinion, this is too speculative to justify making changes to the behavior. I'm not here to defend the way we do dead tuple accounting. I think it's a hot mess. But whether or not it played any role in this catastrophe is hard to say. The problems with dead tuple accounting are, as I understand it, all about statistical independence. That is, we make assumptions that what's true of the sample is likely to be true of the whole table when in reality it may not be true at all. Perhaps it's even unlikely to be true. But the kinds of problems you get from assuming statistical independence tend to hit users very unevenly. We make similar assumptions about selectivity estimation: unless there are extended statistics, we take P(a=1 and b=1) = P(a=1)*P(b = 1), which can be vastly and dramatically wrong. People can and do get extremely bad query plans as a result of that assumption. However, other people run PostgreSQL for years and years and never really have big problems with it. I'd say that it's completely fair to describe this as a big problem, but we can't therefore conclude that some particular user has this problem, not even if we know that they have a slow query and we know that it's due to a bad plan. And similarly here, I don't see a particular reason to think that your theory about what happened is more likely than mine. I freely admit that yours could be right, just as you admitted that mine could be right. But I think we really just don't know. It feels unlikely to me that there was ONE cancellable aggressive autovacuum and that it got cancelled. I think that it's probably either ZERO or LOTS, depending on whether the dead tuple threshold was ever reached. If it wasn't, then it must have been zero. But if it was, and the first autovacuum worker to visit that table got cancelled, then the next one would try again. And autovacuum_naptime=1m, so if we're not out of autovacuum workers, we're going to retry that table every minute. If we do run out of autovacuum workers, which is pretty likely, we'll still launch new workers in that database as often as we can given when other workers exit. If the system is very tight on autovacuum capacity, probably because the cost limit is too low, then you could have a situation where only one try gets made before we hit autovacuum_freeze_max_age. Otherwise, though, a single failed try would probably lead to trying a whole lot more times after that, and you only hit autovacuum_freeze_max_age if all those attempts fail. At the risk of repeating myself, here's what bugs me. If we suppose that your intuition is right and no aggressive autovacuum happened before autovacuum_freeze_max_age was reached, then what you are proposing will make things better. 
But if we suppose that my intuition is right and many aggressive autovacuums happened before autovacuum_freeze_max_age was reached, then what you are proposing will make things worse, because if we've been auto-cancelling repeatedly we're probably going to keep doing so until we shut that behavior off, and we want a vacuum to succeed sooner rather than later. So it doesn't feel obvious to me that we should change anything. Even if we knew what had happened for certain in this particular case, I don't know how we can possibly know what is typical in similar cases. My personal support experience has been that cases where autovacuum runs a lot but doesn't solve the problem for some reason are a lot more common than cases where it doesn't run when it should have done. That probably accounts for my intuition about what is likely to have happened here. But as my colleagues are often at pains to point out to me, my experiences aren't representative of what happens to PostgreSQL users generally for all kinds of reasons, and therefore sometimes my intuition is wrong. But since I have nobody else's experiences to use in lieu of my own, I don't know what else I can use to judge anything. -- Robert Haas EDB: http://www.enterprisedb.com
Re: CREATE ROLE bug?
On Wed, Jan 25, 2023 at 07:38:51AM -0700, David G. Johnston wrote: > On Wed, Jan 25, 2023 at 7:35 AM Bruce Momjian wrote: > > > So, how would someone with CREATEROLE permission add people to their own > role, without superuser permission? Are we adding any security by > preventing this? > > > > As an encouraged design choice you wouldn't. You'd create a new group and add > both yourself and the new role to it - then grant it the desired permissions. > > A CREATEROLE role should probably be a user (LOGIN) role and user roles should > not have members. Makes sense. I was actually using that pattern, but in running some test scripts that didn't revert back to the superuser, I saw the errors and was confused. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
[PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent
Hi hackers, Currently we allow self-conflicting inserts for ON CONFLICT DO NOTHING: ``` CREATE TABLE t (a INT UNIQUE, b INT); INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT DO NOTHING; -- succeeds, inserting the first row and ignoring the second ``` ... but not for ON CONFLICT .. DO UPDATE: ``` INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO UPDATE SET b = 0; ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values. ``` Tom pointed out in 2016 that this is actually a bug [1] and I agree. The proposed patch fixes this. [1]: https://www.postgresql.org/message-id/22438.1477265185%40sss.pgh.pa.us -- Best regards, Aleksander Alekseev v1-0001-Make-ON-CONFLICT-DO-NOTHING-and-ON-CONFLICT-DO-UP.patch Description: Binary data
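Independently of the patch, the usual workaround today is to de-duplicate the proposed rows before ON CONFLICT ever sees them, for example:

```sql
-- Keep only one row per key so DO UPDATE never affects the same row twice:
INSERT INTO t
SELECT DISTINCT ON (a) a, b
FROM (VALUES (1,1), (1,2)) AS v(a, b)
ORDER BY a, b DESC                      -- choose which duplicate wins
ON CONFLICT (a) DO UPDATE SET b = EXCLUDED.b;
```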
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Hi,

I've updated this patch for the current master. Also, I have some additional explanations.

On Wed, 2023-01-18 at 17:29 +0100, Tomas Vondra wrote:
> 1) I'm not sure why the patch is adding tests of permissions on the
> pg_stat_statements_reset function?

I've fixed that.

> 2) If we want the second timestamp, shouldn't it also cover resets of
> the mean values, not just min/max?

I think that mean values are not targets for auxiliary resets because any sampling solution can easily calculate the mean values between samples without a reset.

> 3) I don't understand why the patch is adding "IS NOT NULL AS t" to
> various places in the regression tests.

This change is necessary in the current version because the pg_stat_statements_reset() function will return the timestamp of a reset, needed for sampling solutions to detect resets performed by someone else.

Regards
--
Andrei Zubkov

From 8214db3676e686993bcf73963f78c96baeb04c4e Mon Sep 17 00:00:00 2001
From: Andrei Zubkov
Date: Wed, 25 Jan 2023 18:13:14 +0300
Subject: [PATCH] pg_stat_statements: Track statement entry timestamp

This patch adds stats_since and minmax_stats_since columns to the pg_stat_statements view and the pg_stat_statements() function. The new min/max reset mode for the pg_stat_statements_reset() function is controlled by the parameter minmax_only.

The stats_since column is populated with the current timestamp when a new statement is added to the pg_stat_statements hashtable. It provides clean information about the statistics collection time interval for each statement. Besides that, it can be used by sampling solutions to detect situations when a statement was evicted and stored again between samples.

Such a sampling solution could derive any pg_stat_statements statistic values for an interval between two samples, with the exception of all min/max statistics. To address this issue this patch adds the ability to reset min/max statistics independently of the statement reset using the new minmax_only parameter of the pg_stat_statements_reset(userid oid, dbid oid, queryid bigint, minmax_only boolean) function. The timestamp of such a reset is stored in the minmax_stats_since field for each statement. pg_stat_statements_reset() now returns the timestamp of a reset as a result.
Discussion: https://www.postgresql.org/message-id/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru
---
 contrib/pg_stat_statements/Makefile           |   2 +-
 .../expected/oldextversions.out               |  70
 .../expected/pg_stat_statements.out           | 361 +-
 .../pg_stat_statements--1.10--1.11.sql        |  81
 .../pg_stat_statements/pg_stat_statements.c   | 139 +--
 .../pg_stat_statements.control                |   2 +-
 .../pg_stat_statements/sql/oldextversions.sql |   7 +
 .../sql/pg_stat_statements.sql                | 149 +++-
 doc/src/sgml/pgstatstatements.sgml            |  66 +++-
 9 files changed, 721 insertions(+), 156 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.10--1.11.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index edc40c8bbfb..0afb9060fa1 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,7 @@ OBJS = \
 	pg_stat_statements.o
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql \
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.10--1.11.sql \
 	pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
 	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out
index efb2049ecff..1e1cc11a8e2 100644
--- a/contrib/pg_stat_statements/expected/oldextversions.out
+++ b/contrib/pg_stat_statements/expected/oldextversions.out
@@ -250,4 +250,74 @@ SELECT count(*) > 0 AS has_data FROM pg_stat_statements;
 t
(1 row)
+-- New functions and views for pg_stat_statements in 1.11
+AlTER EXTENSION pg_stat_statements UPDATE TO '1.11';
+\d pg_stat_statements
+ View "public.pg_stat_statements"
+ Column | Type | Collation | Nullable | Default
++--+---+--+-
+ userid | oid | | |
+ dbid | oid | | |
+ toplevel | boolean | | |
+ queryid| bigint | | |
+ query | text | | |
+ plans | bigint | | |
+ total_plan_time| double precision | | |
+ min_plan_time |
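As a usage sketch of the interface described in the commit message (this is not the final committed API; the defaults and the named-parameter call are assumed from the signature quoted above):

```sql
-- Sampling pass: reset only the min/max statistics and remember when we did it,
-- using the timestamp that pg_stat_statements_reset() now returns.
SELECT pg_stat_statements_reset(0, 0, 0, minmax_only => true) AS sampled_at;

-- Next pass: min/max now cover only the interval since that reset, while
-- stats_since and minmax_stats_since reveal evictions or resets done by others.
SELECT queryid, calls, min_exec_time, max_exec_time,
       stats_since, minmax_stats_since
FROM pg_stat_statements;
```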
Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist
Hello Jim/Jacob, > > I do not think it is worth it to change the current behavior of > PostgreSQL > > in that sense. > > Well, I am not suggesting to change the current behavior of PostgreSQL in > that matter. Quite the contrary, I find this feature very convenient, > specially when you need to deal with many different clusters. What I am > proposing is rather the possibility to disable it on demand :) I mean, > in case I do not want libpq to try to authenticate using the certificates > in `~/.postgresql`. > > > PostgreSQL looks for the cert and key under `~/.postgresql` as a > facility. > > These files do not exist by default, so if PostgreSQL finds something in > > there it assumes you want to use it. > > Yes. I'm just trying to find an elegant way to disable this assumption > on demand. Right, I do understand your proposal. I was just thinking out loud and wondering about the broad audience of such a mode in the sslmode argument. Something else that came to my mind is that sslmode itself seems more like an argument covering the client expectations regarding the connection to the server, I mean, if it expects channel encryption and/or validation of the server identity. I wonder if we are willing to add some functionality around the expectations regarding the client certificate, if it wouldn't make more sense to be controlled through something like the clientcert option of pg_hba? If so, the downside of that is the fact that the client would still send the certificate even if it would not be used at all by the server. Again, just thinking out loud about what your goal is and possible ways of accomplishing that:) > > I do realize that this patch is a big ask, since probably nobody except > > me "needs it" :D > > I'd imagine other people have run into it too; it's just a matter of > how palatable the workarounds were to them. :) I imagine more people might have already hit a similar situation too. While the workaround can seem a bit weird, in my very humble opinion the user/client is somehow still the one to blame in this case as it is providing the "wrong" file in a path that is checked by libpq. With that in mind I would be inclined to say it is an acceptable workaround. > > I think the sslcertmode=disable option that I introduced in [1] solves this issue too; > > Well, I see there is indeed a significant overlap between our patches - > but yours has a much more comprehensive approach! If I got it right, > the new slcertmode=disable would indeed cancel the existing certs in > '~/.postgresql/ in case they exist. Right? > > +if (conn->sslcertmode[0] == 'd') /* disable */ > +{ > +/* don't send a client cert even if we have one */ > +have_cert = false; > +} > +else if (fnbuf[0] == '\0') > > My idea was rather to use the existing sslmode with a new option > "no-clientcert" that does actually the same: > > /* sslmode no-clientcert */ > if (conn->sslmode[0] == 'n') > { > fnbuf[0] = '\0'; > } > > ... > > if (fnbuf[0] == '\0') > { > /* no home directory, proceed without a client cert */ > have_cert = false; > } > > I wish I had found your patchset some months ago. Now I hate myself > for the duplication of efforts :D Although both patches achieve a similar goal regarding not sending the client certificate there is still a slight but in my opinion important difference between them: sslmode=disable will also disable channel encryption. It may or may not be acceptable depending on how the connection is between your client and the server. Kind regards, Israel.
Re: New strategies for freezing, advancing relfrozenxid early
On Tue, 24 Jan 2023 at 23:50, Peter Geoghegan wrote: > > On Mon, Jan 16, 2023 at 5:55 PM Peter Geoghegan wrote: > > 0001 (the freezing strategies patch) is now committable IMV. Or at > > least will be once I polish the docs a bit more. I plan on committing > > 0001 some time next week, barring any objections. > > I plan on committing 0001 (the freezing strategies commit) tomorrow > morning, US Pacific time. > > Attached is v17. There are no significant differences compared to v17. > I decided to post a new version now, ahead of commit, to show how I've > cleaned up the docs in 0001 -- docs describing the new GUC, freeze > strategies, and so on. LGTM, +1 on 0001 Some more comments on 0002: > +lazy_scan_strategy(LVRelState *vacrel, bool force_scan_all) > scanned_pages_lazy & scanned_pages_eager We have not yet scanned the pages, so I suggest plan/scan_pages_eager and *_lazy as variable names instead, to minimize confusion about the naming. I'll await the next iteration of 0002 in which you've completed more TODOs before I'll dig deeper into that patch. Kind regards, Matthias van de Meent
Re: Improve GetConfigOptionValues function
Nitin Jadhav writes: > I agree that the developer can use both GUC_NO_SHOW_ALL and > GUC_EXPLAIN knowingly or unknowingly for a single GUC. If used by > mistake then according to the existing code (without patch), > GUC_NO_SHOW_ALL takes higher precedence whether it is marked first or > last in the code. I am more convinced with this behaviour as I feel it > is safer than exposing the information which the developer might not > have intended. Both of you are arguing as though GUC_NO_SHOW_ALL is a security property. It is not, or at least it's so trivially bypassable that it's useless to consider it one. All it is is a de-clutter mechanism. regards, tom lane
Re: More pgindent tweaks
On 2023-01-25 We 09:41, Bruce Momjian wrote: > On Wed, Jan 25, 2023 at 08:59:44AM -0500, Andrew Dunstan wrote: >> After I committed 1249371632 I thought that I should really go ahead and >> do what I suggested and allow multiple exclude pattern files for >> pgindent. One obvious case is to exclude an in tree meson build >> directory. I also sometimes have other in tree objects I'd like to be >> able exclude. >> >> The attached adds this ability. It also unifies the logic for finding >> the regular exclude pattern file and the typedefs file. >> >> I took the opportunity to remove a badly thought out and dangerous >> feature whereby the first non-option argument, if it's not a .c or .h >> file, is taken as the typedefs file. That's particularly dangerous in a >> situation where git is producing a list of files that have changed and >> passing it on the command line to pgindent. I also removed a number of >> extraneous blank lines. > Can we make the pgindent options more visible, perhaps by adding them to > pgindent.man or at least saying type pgindent --help? Sure, will do. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Re: Support plpgsql multi-range in conditional control
>Hi
>
>st 25. 1. 2023 v 16:39 odesílatel songjinzhou <2903807...@qq.com> napsal:
>Hello, my personal understanding is that you can use multiple iterative controls (as a merge) in a for loop, otherwise we can only separate these iterative controls, but in fact, they may do the same thing.
>
>1. please, don't use top posting in this mailing list https://en.wikipedia.org/wiki/Posting_styl
>
>2. I understand the functionality, but I don't think there is a real necessity to support this functionality. Not in this static form, and just for integer type.
>
>Postgres has a nice generic type "multirange". I can imagine some iterator over the value of multirange, but I cannot imagine the necessity of a richer iterator over just integer range. So the question is, what is the real possible use case of this proposed functionality?

1. I'm very sorry that my carelessness made this harder to read. Thank you for the reminder.

2. As for the use of this feature, my understanding is fairly simple: there are often several FOR loops that do the same thing. Merging their iteration controls reduces the SQL redundancy, and the result is also easier to read and follow logically.

Previously, such SQL could only be written by repeating the FOR statement:

begin
    for i in 10..20 loop
        raise notice '%', i; -- Things to do
    end loop;

    for i in 100 .. 200 by 10 loop
        raise notice '%', i; -- Things to do
    end loop;
end;

But now we can simplify it as follows:

begin
    for i in 10..20, 100 .. 200 by 10 loop
        raise notice '%', i; -- Things to do
    end loop;
end;

Although only integer iteration control can be used here, this is just a horizontal extension of the previous logic. Thank you very much for your reply. I am very grateful!

---
songjinzhou(2903807...@qq.com)
Re: wake up logical workers after ALTER SUBSCRIPTION
Kyotaro Horiguchi writes: > At Tue, 24 Jan 2023 10:42:17 -0800, Nathan Bossart > wrote in >> Here is a first attempt at a patch. I scanned through all the existing >> uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything >> else that needed adjusting. > There seems to be two cases for DSA_HANDLE_INVALID in dsa_get_handle > and dsa_attach_in_place, one of which is Assert(), though. Right. I fixed some other infelicities and pushed it. regards, tom lane
Re: Re: Support plpgsql multi-range in conditional control
st 25. 1. 2023 v 17:22 odesílatel songjinzhou <2903807...@qq.com> napsal: > > >Hi > > >st 25. 1. 2023 v 16:39 odesílatel songjinzhou <2903807...@qq.com> > napsal: Hello, my personal understanding is that you can use multiple > iterative controls (as a merge) in a fo loop, otherwise we can only > separate these iterative controls, but in fact, they may do the same thing. > > >1. please, don't use top posting in this mailing list > https://en.wikipedia.org/wiki/Posting_styl > > >2. I understand the functionality, but I don't think there is a real > necessity to support this functionality. Not in this static form, and just > for integer type. > > >Postgres has a nice generic type "multirange". I can imagine some > iterator over the value of multirange, but I cannot imagine the necessity > of a richer iterator over just integer range. So the question is, what is > the real possible use case of this proposed functionality? > > 1. I'm very sorry that my personal negligence has caused obstacles to your > reading. Thank you for your reminding. > 2. With regard to the use of this function, my understanding is relatively > simple: there are many for loops that may do the same things. We can reduce > our sql redundancy by merging iterative control; It is also more convenient > to understand and read logically. > > As follows, we can only repeat the for statement before we use such SQL: > > begin > for i in 10..20 loop > raise notice '%', i; -- Things to do > end loop; > > for i in 100 .. 200 by 10 loop > raise notice '%', i; -- Things to do > end loop; > end; > > But now we can simplify it as follows: > > begin > for i in 10..20, 100 .. 200 by 10 loop > raise notice '%', i; -- Things to do > end loop; > end; > > Although we can only use integer iterative control here, this is just a > horizontal expansion of the previous logic. Thank you very much for your > reply. I am very grateful! > > Unfortunately, this is not a real use case - this is not an example from the real world. Regards Pavel > --- > > songjinzhou(2903807...@qq.com) > >
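For what it's worth, the multirange idea mentioned upthread can already drive a similar loop without any new FOR syntax. This is only a sketch, and it ignores the per-range BY 10 step from the proposal's example:

```sql
DO $$
DECLARE
    i int;
BEGIN
    FOR i IN
        SELECT g
        FROM unnest('{[10,21),[100,201)}'::int4multirange) AS r,
             LATERAL generate_series(lower(r), upper(r) - 1) AS g
    LOOP
        RAISE NOTICE '%', i;  -- things to do
    END LOOP;
END;
$$;
```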
Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist
On Wed, Jan 25, 2023 at 7:47 AM Israel Barth Rubio wrote: > I imagine more people might have already hit a similar situation too. While > the > workaround can seem a bit weird, in my very humble opinion the user/client is > somehow still the one to blame in this case as it is providing the "wrong" > file in > a path that is checked by libpq. With that in mind I would be inclined to say > it is > an acceptable workaround. I'm not sure how helpful it is to assign "blame" here. I think the requested improvement is reasonable -- it should be possible to override the default for a particular connection, without having to pick a junk value that you hope doesn't match up with an actual file on the disk. > Although both patches achieve a similar goal regarding not sending the > client certificate there is still a slight but in my opinion important > difference > between them: sslmode=disable will also disable channel encryption. It > may or may not be acceptable depending on how the connection is between > your client and the server. sslmode=disable isn't used in either of our proposals, though. Unless I'm missing what you mean? --Jacob
Re: CREATE ROLE bug?
On Wed, Jan 25, 2023 at 9:35 AM Bruce Momjian wrote: > So, how would someone with CREATEROLE permission add people to their own > role, without superuser permission? Are we adding any security by > preventing this? They can't, because a role can't ever have ADMIN OPTION on itself, and you need ADMIN OPTION on a role to confer membership in that role. The security argument here is complicated, but I think it basically boils down to wanting to distinguish between accessing the permissions of a role and administering the role, or in other words, being a member of a role is supposed to be different than having ADMIN OPTION on it. Probably for that reason, we've never allowed making a role a member of itself. In any release, this fails: rhaas=# grant bruce to bruce with admin option; ERROR: role "bruce" is a member of role "bruce" If that worked, then role "bruce" would be able to grant membership in role "bruce" to anyone -- but all those people wouldn't just get membership, they'd get ADMIN OPTION, too, because *bruce* has ADMIN OPTION on himself and anyone to whom he grants access to his role will therefore get admin option too. Someone might argue that this is OK, but our precedent is otherwise. It used to be the case that the users implicitly enjoyed ADMIN OPTION on their own roles and thus could do the sort of thing that you were proposing. That led to CVE-2014-0060 and commit fea164a72a7bfd50d77ba5fb418d357f8f2bb7d0. That CVE is, as I understand it, all about maintaining the distinction between membership and ADMIN OPTION. In other words, we've made an intentional decision to not let ordinary users do the sort of thing you tried to do here. So the only reason your example ever worked is because the "service" role had CREATEROLE, and thus, in earlier releases, got to bypass all the permissions checks. But it turns out that letting CREATEROLE bypass all the permissions checks is *also* a big security problem, so that is now restricted as well. I don't want to take the position that we couldn't find some way to allow ordinary users to do this. I think that the desire to maintain the distinction between membership and ADMIN OPTION makes sense as a general rule, but if somebody wants to abolish it in a particular case by making strange grants, would that really be that bad? I'm not totally convinced that it would be. It probably depends somewhat on how much you want to try to keep people from accidentally giving away more privileges than they intended, and also on whether you think that this is a useful thing for someone to be able to do in the first place. However, it's the way we've designed the system and we've even requested CVEs when we accidentally did something inconsistent with that general principle. Similarly, I don't want to take the position that the restrictions I put on CREATEROLE are the *only* way that we could have plugged the security holes that it has had for years now. I think they are pretty sensible and pretty consistent with the overall system design, but somebody else might have been able to come up with some other set of restrictions that allowed this case to work. I think David is right to raise the question of how useful it would be to allow this case. In general, I think that if role A creates role B, it is more sensible to grant B's permissions to A than to grant A's permissions to B. 
The former is useful because it permits the more powerful user to act on behalf of the less powerful user, just as the superuser is able to administer the whole system by being able to act on behalf of any other user. But the latter makes you wonder why you are even bothering to have two users, because you end up with A and B having exactly identical privileges. That's a bit of a strange thing to want, but if you do happen to want it, you can get it with this new system: again, as David says, you should just create one role to use as a group, and then grant membership in that group to multiple roles that are used as users. But it does seem pretty important to keep talking about these things, because there's definitely no guarantee whatsoever that all of the commits I've made to master in this area are without problems. If we find important cases that can't be supported given the new restrictions on CREATEROLE, or even important cases that never worked but we wish they did, well then we should think about what to change. I'm not too concerned about this particular case not working, but the next one might be different. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add LZ4 compression in pg_dump
On 1/25/23 16:37, gkokola...@pm.me wrote: > > > > > > --- Original Message --- > On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby > wrote: > > >> >> >> On Tue, Jan 24, 2023 at 03:56:20PM +, gkokola...@pm.me wrote: >> >>> On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby >>> pry...@telsasoft.com wrote: >>> On Mon, Jan 23, 2023 at 05:31:55PM +, gkokola...@pm.me wrote: > Please find attached v23 which reintroduces the split. > > 0001 is reworked to have a reduced footprint than before. Also in an > attempt > to facilitate the readability, 0002 splits the API's and the uncompressed > implementation in separate files. Thanks for updating the patch. Could you address the review comments I sent here ? https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com >>> >>> Please find v24 attached. >> >> >> Thanks for updating the patch. >> >> In 001, RestoreArchive() does: >> >>> -#ifndef HAVE_LIBZ >>> - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP && >>> - AH->PrintTocDataPtr != NULL) >>> + supports_compression = false; >>> + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE || >>> + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) >>> + supports_compression = true; >>> + >>> + if (AH->PrintTocDataPtr != NULL) >>> { >>> for (te = AH->toc->next; te != AH->toc; te = te->next) >>> { >>> if (te->hadDumper && (te->reqs & REQ_DATA) != 0) >>> - pg_fatal("cannot restore from compressed archive (compression not >>> supported in this installation)"); >>> + { >>> +#ifndef HAVE_LIBZ >>> + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) >>> + supports_compression = false; >>> +#endif >>> + if (supports_compression == false) >>> + pg_fatal("cannot restore from compressed archive (compression not >>> supported in this installation)"); >>> + } >>> } >>> } >>> -#endif >> >> >> This first checks if the algorithm is implemented, and then checks if >> the algorithm is supported by the current build - that confused me for a >> bit. It seems unnecessary to check for unimplemented algorithms before >> looping. That also requires referencing both GZIP and LZ4 in two >> places. > > I am not certain that it is unnecessary, at least not in the way that is > described. The idea is that new compression methods can be added, without > changing the archive's version number. It is very possible that it is > requested to restore an archive compressed with a method not implemented > in the current binary. The first check takes care of that and sets > supports_compression only for the supported versions. It is possible to > enter the loop with supports_compression already set to false, for example > because the archive was compressed with ZSTD, triggering the fatal error. > > Of course, one can throw the error before entering the loop, yet I think > that it does not help the readability of the code. IMHO it is easier to > follow if the error is thrown once during that check. > Actually, I don't understand why 0001 moves the check into the loop. I mean, why not check HAVE_LIBZ before the loop? 
>> >> I think it could be written to avoid the need to change for added >> compression algorithms: >> >> + if (te->hadDumper && (te->reqs & REQ_DATA) != 0) >> >> + { >> + /* Check if the compression algorithm is supported */ >> + pg_compress_specification spec; >> + parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec); >> >> + if (spec->parse_error != NULL) >> >> + pg_fatal(spec->parse_error); >> >> + } > > I am not certain how that would work in the example with ZSTD above. > If I am not wrong, parse_compress_specification() will not throw an error > if the codebase supports ZSTD, yet this specific pg_dump binary will not > support it because ZSTD is not implemented. parse_compress_specification() > is not aware of that and should not be aware of it, should it? > Not sure. What happens in a similar situation now? That is, when trying to deal with an archive gzip-compressed in a build without libz? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: CREATE ROLE bug?
On Wed, Jan 25, 2023 at 12:21:14PM -0500, Robert Haas wrote: > But it does seem pretty important to keep talking about these things, > because there's definitely no guarantee whatsoever that all of the > commits I've made to master in this area are without problems. If we > find important cases that can't be supported given the new > restrictions on CREATEROLE, or even important cases that never worked > but we wish they did, well then we should think about what to change. > I'm not too concerned about this particular case not working, but the > next one might be different. Agreed, thanks for the details. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: pgsql: Rename contrib module basic_archive to basic_wal_module
On Wed, Jan 25, 2023 at 12:37 AM Michael Paquier wrote: > Rename contrib module basic_archive to basic_wal_module FWIW, I find this new name much less clear than the old one. If we want to provide a basic_archive module and a basic_recovery module, that seems fine. Why merge them? -- Robert Haas EDB: http://www.enterprisedb.com
Re: wake up logical workers after ALTER SUBSCRIPTION
On Wed, Jan 25, 2023 at 04:12:00PM +0900, Kyotaro Horiguchi wrote: > At Tue, 24 Jan 2023 10:42:17 -0800, Nathan Bossart > wrote in >> Here is a first attempt at a patch. I scanned through all the existing >> uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything >> else that needed adjusting. > > There seems to be two cases for DSA_HANDLE_INVALID in dsa_get_handle > and dsa_attach_in_place, one of which is Assert(), though. Ah, sorry, I'm not sure how I missed this. Thanks for looking. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: wake up logical workers after ALTER SUBSCRIPTION
On Wed, Jan 25, 2023 at 11:49:27AM -0500, Tom Lane wrote: > Right. I fixed some other infelicities and pushed it. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Add LZ4 compression in pg_dump
On Wed, Jan 25, 2023 at 03:37:12PM +, gkokola...@pm.me wrote: > Of course, one can throw the error before entering the loop, yet I think > that it does not help the readability of the code. IMHO it is easier to > follow if the error is thrown once during that check. > If anything, I can suggest to throw an error much earlier, i.e. in ReadHead(), > and remove altogether this check. On the other hand, I like the belts > and suspenders approach because there are no more checks after this point. While looking at this, I realized that commit 5e73a6048 introduced a regression: @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH) - if (AH->compression != 0) - pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) + pg_fatal("archive is compressed, but this installation does not support compression"); Before, it was possible to restore non-data chunks of a dump file, even if the current build didn't support its compression. But that's now impossible - and it makes the code we're discussing in RestoreArchive() unreachable. I don't think we can currently test for that, since it requires creating a dump using a build --with compression and then trying to restore using a build --without compression. The coverage report disagrees with me, though... https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#3901 > > I think it could be written to avoid the need to change for added > > compression algorithms: ... > > I am not certain how that would work in the example with ZSTD above. > If I am not wrong, parse_compress_specification() will not throw an error > if the codebase supports ZSTD, yet this specific pg_dump binary will not > support it because ZSTD is not implemented. parse_compress_specification() > is not aware of that and should not be aware of it, should it? You're right. I think the 001 patch should try to remove hardcoded references to LIBZ/GZIP, such that the later patches don't need to update those same places for LZ4. For example in ReadHead() and RestoreArchive(), and maybe other places dealing with file extensions. Maybe that could be done by adding a function specific to pg_dump indicating whether or not an algorithm is implemented and supported. -- Justin
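A sketch of the helper Justin suggests at the end might look like this. The function name and placement are invented here, and the real patch may structure it differently:

```c
#include "common/compression.h"     /* pg_compress_algorithm */

/*
 * Does this pg_dump/pg_restore build support the given compression
 * algorithm?  Centralizing the check avoids scattering #ifdef chains.
 */
static bool
supports_compression(pg_compress_algorithm algorithm)
{
	switch (algorithm)
	{
		case PG_COMPRESSION_NONE:
			return true;
		case PG_COMPRESSION_GZIP:
#ifdef HAVE_LIBZ
			return true;
#else
			return false;
#endif
		case PG_COMPRESSION_LZ4:
#ifdef USE_LZ4
			return true;
#else
			return false;
#endif
		default:
			return false;		/* not implemented by this build */
	}
}
```

With something like that, RestoreArchive() and ReadHead() could reduce to a single check of AH->compression_spec.algorithm without naming individual algorithms or file extensions in each place.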
Re: pgsql: Rename contrib module basic_archive to basic_wal_module
On Wed, Jan 25, 2023 at 12:49:45PM -0500, Robert Haas wrote: > On Wed, Jan 25, 2023 at 12:37 AM Michael Paquier wrote: >> Rename contrib module basic_archive to basic_wal_module > > FWIW, I find this new name much less clear than the old one. > > If we want to provide a basic_archive module and a basic_recovery > module, that seems fine. Why merge them? I'll admit I've been stewing on whether "WAL Modules" is the right name. My first instinct was to simply call it "Archive and Recovery Modules," which is longer but (IMHO) clearer. I wanted to merge basic_archive and basic_recovery because there's a decent chunk of duplicated code. Perhaps that is okay, but I would rather just have one test module. AFAICT the biggest reason to split it is because we can't determine a good name. Maybe we could leave the name as "basic_archive" since it deals with creating and recovering archive files. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist
Hello Jacob,

> I'm not sure how helpful it is to assign "blame" here. I think the
> requested improvement is reasonable -- it should be possible to
> override the default for a particular connection, without having to
> pick a junk value that you hope doesn't match up with an actual file
> on the disk.

Right, I agree we can look for improvements. "blame" was likely not the best word to express myself in that message.

> sslmode=disable isn't used in either of our proposals, though. Unless
> I'm missing what you mean?

Sorry about the noise, I misread the code snippet shared earlier (sslmode x sslcertmode). I just took a closer read at the previously mentioned patch about sslcertmode and it seems a bit more elegant way of achieving something similar to what has been proposed here.

Best regards,
Israel.
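For reference, the client-side difference under discussion would look roughly like this; sslcertmode is the option from Jacob's patchset and is not in any released libpq, so treat this purely as a sketch:

```c
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/*
	 * Keep TLS, but opt out of sending ~/.postgresql/postgresql.crt for
	 * this one connection (hypothetical option from the proposed patch).
	 */
	PGconn *conn = PQconnectdb("host=db.example.com dbname=app user=alice "
							   "sslmode=require sslcertmode=disable");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "%s", PQerrorMessage(conn));
	PQfinish(conn);
	return 0;
}
```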
Re: heapgettup() with NoMovementScanDirection unused in core?
Hi,

On 2023-01-25 10:02:28 -0500, Tom Lane wrote:
> David Rowley writes:
> > Does anyone know of any reason why we shouldn't ditch the nomovement
> > code in heapgettup/heapgettup_pagemode?

+1. Because I dug it up yesterday: there used to be callers of heap* with NoMovement, but they were unused themselves:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=adbfab119b308a7e0e6b1305de9be222cfd5c85b

> *	If direction is NoMovementScanDirection then nothing is done
> *	except to start up/shut down the destination.  Otherwise,
> *	we retrieve up to 'count' tuples in the specified direction.
> *
> *	Note: count = 0 is interpreted as no portal limit, i.e., run to
> *	completion.
>
> We must have the NoMovementScanDirection option because count = 0
> does not mean "do nothing", and I noted at least two call sites
> that require it.

I wonder if we'd be better off removing NoMovementScanDirection, and using count == (uint64)-1 for what NoMovementScanDirection is currently used for as an ExecutorRun parameter. Seems less confusing to me - right now we have two parameters with non-obvious meanings and interactions.

> I wonder if we couldn't also get rid of this confusingly-inconsistent
> alternative usage in the planner:
>
> * 'indexscandir' is one of:
> *	ForwardScanDirection: forward scan of an ordered index
> *	BackwardScanDirection: backward scan of an ordered index
> *	NoMovementScanDirection: scan of an unordered index, or don't care
> * (The executor doesn't care whether it gets ForwardScanDirection or
> * NoMovementScanDirection for an indexscan, but the planner wants to
> * distinguish ordered from unordered indexes for building pathkeys.)

+1. Certainly seems confusing to me.

Greetings,

Andres Freund
Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent
Hi,

On 2023-01-25 18:45:12 +0300, Aleksander Alekseev wrote:
> Currently we allow self-conflicting inserts for ON CONFLICT DO NOTHING:
>
> ```
> CREATE TABLE t (a INT UNIQUE, b INT);
> INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT DO NOTHING;
> -- succeeds, inserting the first row and ignoring the second
> ```
>
> ... but not for ON CONFLICT .. DO UPDATE:
>
> ```
> INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO UPDATE SET b = 0;
> ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
> HINT: Ensure that no rows proposed for insertion within the same
> command have duplicate constrained values.
> ```
>
> Tom pointed out in 2016 that this is actually a bug [1] and I agree.

I don't think I agree with this being a bug.

We can't sensibly implement updating a row twice within a statement - hence erroring out for ON CONFLICT DO UPDATE affecting a row twice. But what's the justification for erroring out in the DO NOTHING case? ISTM that it's useful to be able to handle such duplicates, and I don't immediately see what semantic confusion or implementation difficulty we avoid by erroring out.

It seems somewhat likely that a behavioural change will cause trouble for some of the uses of DO NOTHING out there.

Greetings,

Andres Freund
pg_upgrade from PG-14.5 to PG-15.1 failing due to non-existing function
Hi hackers,

I attempted to perform an upgrade from PG-14.5 to PG-15.1 with pg_upgrade and unfortunately it errors out because of a function that does not exist anymore in PG-15.1. The function is ‘pg_catalog.close_lb’ and it exists in 14.5 but not in 15.1. In our scenario we changed the permissions of this function in PG-14.5 (via an automated tool) and then pg_upgrade tries to change the permissions in PG-15.1 as well.

Steps to reproduce:
1. Run initdb for 14.5
2. Run initdb for 15.1
3. Run the psql client on 14.5:
   postgres=# REVOKE ALL ON FUNCTION close_lb(line, box) FROM $USER;
4. Run pg_upgrade from 14.5 to 15.1

This will error out because pg_upgrade will attempt to REVOKE the permissions on close_lb on 15.1. Is there a way to specify which functions/objects to exclude in pg_upgrade?

Thanks in advance!
Dimos
(ServiceNow)
Re: Transparent column encryption
On 07.01.23 01:34, Justin Pryzby wrote:
> "ON (CASE WHEN a.attrealtypid <> 0 THEN a.attrealtypid ELSE a.atttypid END = t.oid)\n"
>
> This breaks interoperability with older servers:
> ERROR: column a.attrealtypid does not exist
> Same in describe.c
>
> Find attached some typos and bad indentation. I'm sending this off now as I've already sat on it for 2 weeks since starting to look at the patch.

Thanks, I have integrated all that into the v15 patch I just posted.
Re: Transparent column encryption
On 12.01.23 17:32, Peter Eisentraut wrote:
>> Can we do anything about the attack vector wherein a malicious DBA simply copies the encrypted datum from one row to another?
>
> We discussed this earlier [0]. This patch is not that feature. We could get there eventually, but it would appear to be an immense amount of additional work. We have to start somewhere.

I've been thinking, this could be done as a "version 2" of the currently proposed feature, within the same framework. We'd extend the RowDescription and ParameterDescription messages to provide primary key information, some flags, then the client would have enough to know what to do. As you wrote in your follow-up message, a challenge would be to handle statements that do not touch all the columns. We'd need to work through this and consider all the details.
Re: Transparent column encryption
On 19.01.23 21:48, Jacob Champion wrote:
> I like the existing "caveats" documentation, and I've attached a sample patch with some more caveats documented, based on some of the upthread conversation:
>
> - text format makes fixed-length columns leak length information too
> - you only get partial protection against the Evil DBA
> - RSA-OAEP public key safety
>
> (Feel free to use/remix/discard as desired.)

I have added those in the v15 patch I just posted.

> When writing the paragraph on RSA-OAEP I was reminded that we didn't really dig into the asymmetric/symmetric discussion. Assuming that most first-time users will pick the builtin CMK encryption method, do we still want to have an asymmetric scheme implemented first instead of a symmetric keywrap? I'm still concerned about that public key, since it can't really be made public.

I had started coding that, but one problem was that the openssl CLI doesn't really provide any means to work with those kinds of keys. The "openssl enc" command always wants to mix in a password. Without that, there is no way to write a test case, and more crucially no way for users to set up these kinds of keys. Unless we write our own tooling for this, which, you know, the patch just passed 400k in size.

> For the padding caveat:
>
> + There is no concern if all values are of the same length (e.g., credit
> + card numbers).
>
> I nodded along to this statement last year, and then this year I learned that CCNs aren't fixed-length. So with a 16-byte block, you're probably going to be able to figure out who has an American Express card.

Heh. I have removed that parenthetical remark.

> The column encryption algorithm is set per-column -- but isn't it tightly coupled to the CEK, since the key length has to match? From a layperson perspective, using the same key to encrypt the same plaintext under two different algorithms (if they happen to have the same key length) seems like it might be cryptographically risky. Is there a reason I should be encouraged to do that?

Not really. I was also initially confused by this setup, but that's how other similar systems are set up, so I thought it would be confusing to do it differently.

> With the loss of \gencr it looks like we also lost a potential way to force encryption from within psql. Any plans to add that for v1?

\gencr didn't do that either. We could do it. The libpq API supports it. We just need to come up with some syntax for psql.

> While testing, I forgot how the new option worked and connected with `column_encryption=on` -- and then I accidentally sent unencrypted data to the server, since `on` means "not enabled". :( The server errors out after the damage is done, of course, but would it be okay to strictly validate that option's values?

fixed in v15

> Are there plans to document client-side implementation requirements, to ensure cross-client compatibility? Things like the "PG\x00\x01" associated data are buried at the moment (or else I've missed them in the docs). If you're holding off until the feature is more finalized, that's fine too.

This is documented in the protocol chapter, which I thought was the right place. Did you want more documentation, or in a different place?

> Speaking of cross-client compatibility, I'm still disconcerted by the ability to write the value "hello world" into an encrypted integer column. Should clients be required to validate the text format, using the attrealtypid?

Well, we can ask them to, but we can't really require them, in a cryptographic sense. I'm not sure what more we can do.
> It occurred to me when looking at the "unspecified" CMK scheme that the CEK doesn't really have to be an encryption key at all. In that case it can function more like a (possibly signed?) cookie for lookup, or even be ignored altogether if you don't want to use a wrapping scheme (similar to JWE's "direct" mode, maybe?). So now you have three ways to look up or determine a column encryption key (CMK realm, CMK name, CEK cookie)... is that a concept worth exploring in v1 and/or the documentation?

I don't completely follow this.
Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent
Hi Andres,

> I don't think I agree with this being a bug.

Perhaps it's not a bug, especially considering the fact that the documentation describes this behavior, but in any case the fact that:

```
INSERT INTO t VALUES (1,1) ON CONFLICT (a) DO UPDATE SET b = 0;
INSERT INTO t VALUES (1,2) ON CONFLICT (a) DO UPDATE SET b = 0;
```

and:

```
INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO NOTHING;
```

... both work, and:

```
INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO UPDATE SET b = 0;
```

... doesn't is rather confusing. There is no reason why the last query shouldn't work except for a slight complication of the code, which seems to be a reasonable tradeoff, to me at least.

> But what's the justification for erroring out in the DO NOTHING case?
>
> [...]
>
> It seems somewhat likely that a behavioural change will cause trouble for some
> of the uses of DO NOTHING out there.

Just to make sure we are on the same page: the patch doesn't break the current DO NOTHING behavior but rather makes DO UPDATE work the same way DO NOTHING does.

--
Best regards,
Aleksander Alekseev
Re: Syncrep and improving latency due to WAL throttling
Hi, On 2023-01-25 14:32:51 +0100, Jakub Wartak wrote: > In other words it allows slow down of any backend activity. Any feedback on > such a feature is welcome, including better GUC name proposals ;) and > conditions in which such feature should be disabled even if it would be > enabled globally (right now only anti- wraparound VACUUM comes to mind, it's > not in the patch). Such a feature could be useful - but I don't think the current place of throttling has any hope of working reliably: > @@ -1021,6 +1025,21 @@ XLogInsertRecord(XLogRecData *rdata, > pgWalUsage.wal_bytes += rechdr->xl_tot_len; > pgWalUsage.wal_records++; > pgWalUsage.wal_fpi += num_fpi; > + > + backendWalInserted += rechdr->xl_tot_len; > + > + if ((synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_APPLY || > synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_WRITE) && > + synchronous_commit_flush_wal_after > 0 && > + backendWalInserted > synchronous_commit_flush_wal_after > * 1024L) > + { > + elog(DEBUG3, "throttling WAL down on this session > (backendWalInserted=%d)", backendWalInserted); > + XLogFlush(EndPos); > + /* XXX: refactor SyncRepWaitForLSN() to have different > waitevent than default WAIT_EVENT_SYNC_REP */ > + /* maybe new WAIT_EVENT_SYNC_REP_BIG or something like > that */ > + SyncRepWaitForLSN(EndPos, false); > + elog(DEBUG3, "throttling WAL down on this session - > end"); > + backendWalInserted = 0; > + } > } You're blocking in the middle of an XLOG insertion. We will commonly hold important buffer lwlocks, it'll often be in a critical section (no cancelling / terminating the session!). This'd entail loads of undetectable deadlocks (i.e. hard hangs). And even leaving that aside, doing an unnecessary XLogFlush() with important locks held will seriously increase contention. My best idea for how to implement this in a somewhat safe way would be for XLogInsertRecord() to set a flag indicating that we should throttle, and set InterruptPending = true. Then the next CHECK_FOR_INTERRUPTS that's allowed to proceed (i.e. we'll not be in a critical / interrupts off section) can actually perform the delay. That should fix the hard deadlock danger and remove most of the increase in lock contention. I don't think doing an XLogFlush() of a record that we just wrote is a good idea - that'll sometimes significantly increase the number of fdatasyncs/sec we're doing. To make matters worse, this will often cause partially filled WAL pages to be flushed out - rewriting a WAL page multiple times can significantly increase overhead on the storage level. At the very least this'd have to flush only up to the last fully filled page. Just counting the number of bytes inserted by a backend will make the overhead even worse, as the flush will be triggered even for OLTP sessions doing tiny transactions, even though they don't contribute to the problem you're trying to address. How about counting how many bytes of WAL a backend has inserted since the last time that backend did an XLogFlush()? A bulk writer won't do a lot of XLogFlush()es, so the time/bytes since the last XLogFlush() will increase quickly, triggering a flush at the next opportunity. But short OLTP transactions will do XLogFlush()es at least at every commit. I also suspect the overhead will be more manageable if you were to force a flush only up to a point further back than the last fully filled page. 
We don't want to end up flushing WAL for every page, but if you just have a backend-local accounting mechanism, I think that's inevitably what you'd end up with when you have a large number of sessions. But if you'd limit the flushing to be done to synchronous_commit_flush_wal_after / 2 boundaries, and only ever to a prior boundary, the amount of unnecessary WAL flushes would be proportional to synchronous_commit_flush_wal_after. Greetings, Andres Freund
Re: pgsql: Rename contrib module basic_archive to basic_wal_module
On Wed, Jan 25, 2023 at 1:17 PM Nathan Bossart wrote: > On Wed, Jan 25, 2023 at 12:49:45PM -0500, Robert Haas wrote: > > On Wed, Jan 25, 2023 at 12:37 AM Michael Paquier > > wrote: > >> Rename contrib module basic_archive to basic_wal_module > > > > FWIW, I find this new name much less clear than the old one. > > > > If we want to provide a basic_archive module and a basic_recovery > > module, that seems fine. Why merge them? > > I'll admit I've been stewing on whether "WAL Modules" is the right name. > My first instinct was to simply call it "Archive and Recovery Modules," > which is longer but (IMHO) clearer. > > I wanted to merge basic_archive and basic_recovery because there's a decent > chunk of duplicated code. Perhaps that is okay, but I would rather just > have one test module. AFAICT the biggest reason to split it is because we > can't determine a good name. Maybe we could leave the name as > "basic_archive" since it deals with creating and recovering archive files. Yeah, maybe. I'm not sure what the best thing to do is, but if I see a module called basic_archive or basic_restore, I know what it's about, whereas basic_wal_module seems a lot less specific. It sounds like it could be generating or streaming it just as easily as it could be archiving it. It would be nice to have a name that is less prone to that kind of unclarity. -- Robert Haas EDB: http://www.enterprisedb.com
Re: What object types should be in schemas?
On 12.01.23 18:41, Alvaro Herrera wrote: I think one important criterion to think about is how does encryption work when you have per-customer (or per-whatever) schemas. Is the concept of a column encryption [objtype] a thing that you would like to set up per customer? In that case, you will probably want that object to live in that customer's schema. Otherwise, you'll force the DBA to come up with a naming scheme that includes the customer name in the column encryption object. Makes sense. In my latest patch I have moved these key objects into schemas.
Re: pg_upgrade from PG-14.5 to PG-15.1 failing due to non-existing function
## Dimos Stamatakis (dimos.stamata...@servicenow.com): > In our scenario we changed the permissions of this function in PG14.5 > (via an automated tool) and then pg_upgrade tries to change the > permissions in PG15.1 as well. Given that this function wasn't even documented and did nothing but throw an error "function close_lb not implemented" - couldn't you revert that permissions change for the upgrade? (if it comes to the worst, a superuser could UPDATE pg_catalog.pg_proc and set proacl to NULL for that function, but that's not how you manage ACLs in production, it's for emergency fixing only). Regards, Christoph -- Spare Space
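(A sketch of the emergency-only catalog tweak described above, assuming the affected function is close_lb as in this report; run as superuser, and only if reverting the original permissions change is not an option:)

```sql
-- Emergency fix only: clear the ACL on the undocumented function so that
-- pg_dump/pg_upgrade no longer tries to carry the grant over.
UPDATE pg_catalog.pg_proc SET proacl = NULL WHERE proname = 'close_lb';
```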
Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent
On Wed, Jan 25, 2023 at 11:01 AM Aleksander Alekseev wrote: > Just to make sure we are on the same page. The patch doesn't break the > current DO NOTHING behavior but rather makes DO UPDATE work the same > way DO NOTHING does. It also makes DO UPDATE not work the same way as either UPDATE itself (which will silently skip a second or subsequent update of the same row by the same UPDATE statement in RC mode), or MERGE (which has similar cardinality violations). DO NOTHING doesn't lock any conflicting row, and so won't have to dirty pages that have matching rows. It was always understood to be more susceptible to certain issues (when in READ COMMITTED mode) as a result. There are some halfway reasonable arguments against this sort of behavior, but I believe that we made the right trade-off. -- Peter Geoghegan
Re: Proposal: Support custom authentication methods using hooks
Greetings, Want to resurface the OAUTH support topic in the context of the concerns raised here. > How about- if we just added OAUTH support directly into libpq and the > backend, would that work with Azure's OIDC provider? If not, why not? > If it does, then what's the justification for trying to allow custom > backend-only authentication methods? We've explored this option, and prepared a patch which has bare-bone OAUTHBEARER mechanism implementation on both server- and libpq- sides. Based on existing SASL exchange support used for SCRAM. Most of the work has been done by @Jacob Champion and was referenced in this thread before. We validated the approach would work with Azure AD and other OIDC providers with token validation logic delegated to provider-specific extension. The thread link is here: https://www.postgresql.org/message-id/flat/CABkiuWo4fJQ7dhqgYLtJh41kpCkT6iXOO8Eym3Rdh5tx2RJCJw%40mail.gmail.com#f94c36969a68a07c087fa9af0f5401e1 With the client- and server- side support, the proposed patch would allow: - Identity providers to publish PG extensions to validate their specific token format. - Client ecosystem to build generic OAUTH flows using OIDC provider metadata defined in RFCs - Cloud providers to support a combination of Identity providers they work with. Looking forward to bring the discussion to that thread, and decide if we can proceed with the OAUTH support. Thanks, Andrey. On Wed, Jan 25, 2023 at 10:55 AM Stephen Frost wrote: > > Greetings, > > * Andres Freund (and...@anarazel.de) wrote: > > On 2022-08-03 17:21:58 -0400, Stephen Frost wrote: > > > * Andres Freund (and...@anarazel.de) wrote: > > > > On 2022-08-03 16:28:08 -0400, Stephen Frost wrote: > > > > > Again, server-side only is not interesting and not a direction that > > > > > makes sense to go in because it doesn't provide any way to have > > > > > trust established in both directions, which is what all modern > > > > > authentication methods do (certificates, kerberos, scram) and is what > > > > > we > > > > > should expect from anything new in this space. > > > > > > > > As explained repeatedly before, that's plainly not true. The patch > > > > allows > > > > e.g. to use the exact scram flow we already have, with the code we have > > > > for > > > > that (e.g. using a different source of secret). In fact the example > > > > extension > > > > does so (starting in v3, from 2022-03-15): > > > > > > Sure, thanks to the bespoke code in libpq for supporting SCRAM. I don't > > > think it makes sense to move the server-side code for that into an > > > extension as it just makes work for a bunch of people. Having a way to > > > have the authenticator token for scram exist elsewhere doesn't strike me > > > as unreasonable but that's not the same thing and is certainly more > > > constrained in terms of what it's doing. > > > > The question is: Why is providing that ability not a good step, even if > > somewhat constrainted? One argument would be that a later "protocol level > > extensibility" effort would require significant changes to the API - but I > > don't think that'd be the case. And even if, this isn't a large > > extensibility > > surface, so API evolution wouldn't be a problem. > > I'm quite confused by this as I feel like I've said multiple times that > having a way to have the SCRAM authenticator token come from somewhere > else seems reasonable. If that's not what you're referring to above by > 'that ability' then it'd really help if you could clarify what you mean > there. 
> > > > Further, I outlined exactly how extensability in this area could be > > > achieved: use some good third party library that provides multiple SASL > > > methods then just add support for that library to *PG*, on both the > > > libpq and the server sides. > > > > > What I don't think makes sense is adding extensibility to the server > > > side, especially if it's through a 3rd party library, but then not > > > adding support for that to libpq. I don't follow what the rationale is > > > for explicitly excluding libpq from this discussion. > > > > The rationale is trivial: Breaking down projects into manageable, separately > > useful, steps is a very sensible way of doing development. Even if we get > > protocol extensibility, we're still going to need an extension API like it's > > provided by the patchset. After protocol extensibility the authentication > > extensions would then have more functions it could call to control the auth > > flow with the client. > > > > > To be clear- I'm not explicitly saying that we can only add > > > extensibility with SCRAM, I'm just saying that whatever we're doing here > > > to add other actual authentication methods we should be ensuring is done > > > on both sides with a way for each side to authenticate the other side, > > > as all of the modern authentication methods we have already do. > > > > But *why* do these need to be tied together? > > I'm also confused by this. Surely you'd need a
Re: Implement missing join selectivity estimation for range types
Hi Tomas, > I finally had time to properly read the paper today - the general > approach mostly matches how I imagined the estimation would work for > inequalities, but it's definitely nice to see the algorithm properly > formalized and analyzed. Awesome, thanks for this interest! > What seems a bit strange to me is that the patch only deals with range > types, leaving the scalar cases unchanged. I understand why (not having > a MCV simplifies it a lot), but I'd bet joins on range types are way > less common than inequality joins on scalar types. I don't even remember > seeing inequality join on a range column, TBH. > > That doesn't mean the patch is wrong, of course. But I'd expect users to > be surprised we handle range types better than "old" scalar types (which > range types build on, in some sense). > > Did you have any plans to work on improving estimates for the scalar > case too? Or did you do the patch needed for the paper, and have no > plans to continue working on this? I fully agree. Scalars are way more important. I join you in the call for Diogo and Zhicheng to continue this implementation, especially after the interest you show towards this patch. The current patch was a course project (taught by me and Maxime), which was specific to range types. But indeed the solution generalizes well to scalars. Hopefully after the current exam session (Feb), there will be time to continue the implementation. Nevertheless, it makes sense to do two separate patches: this one, and the scalars. The code of the two patches is located in different files, and the estimation algorithms have slight differences. > I'm also wondering about not having MCV for ranges. I was a bit > surprised we don't build MCV in compute_range_stats(), and perhaps we > should start building those - if there are common ranges, this might > significantly improve some of the estimates (just like for scalar > columns). Which would mean the estimates for range types are just as > complex as for scalars. Of course, we don't do that now. Good question. Our intuition is that MCV will not be useful for ranges. Maxime has done an experiment and confirmed this intuition. Here is his experiment and explanation: Create a table with 126000 int4ranges. All ranges have their middle between 0 and 1000 and a length between 90 and 110. The ranges are created in the following way: - 10 different ranges, each duplicated 1000 times - 20 different ranges, each duplicated 500 times - 40 different ranges, each duplicated 100 times - 200 different ranges, each duplicated 10 times - 10 different ranges, not duplicated Two such tables (t1 and t2) were created in the same way but with different data. Then he ran the following query: EXPLAIN ANALYZE SELECT count(*) FROM t, t2 WHERE t && t2 The results (using our patch) were the following: Plan rows: 2991415662 Actual rows: 2981335423 So the estimation accuracy is still fairly good for such data with a lot of repeated values, even without having MCV statistics. The only error that can happen in our algorithms comes from the last bin in which we assume uniform distribution of values. Duplicate values (end bound, not ranges) might make this assumption wrong, which would create inaccurate estimations. However, this is still only incorrect for the last bin and all the others are correct. MCVs are mainly useful for equality, which is not an operation we cover, and probably not an important predicate for ranges. What do you think? Best regards, Mahmoud
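(To make the experiment above easier to picture, here is one way the first batch of the described test data could be generated; the table and column names are assumptions, since the actual script was not posted:)

```sql
CREATE TABLE t1 (r int4range);

-- First batch: 10 distinct ranges, each duplicated 1000 times, with
-- midpoints in [0, 1000] and lengths of roughly 90..110.  The remaining
-- batches (20 x 500, 40 x 100, 200 x 10, 10 x 1) would be generated the
-- same way with different counts.
INSERT INTO t1
WITH distinct_ranges AS MATERIALIZED (
    SELECT (random() * 1000)::int AS mid,
           (90 + random() * 20)::int AS len
    FROM generate_series(1, 10)
)
SELECT int4range(mid - len / 2, mid + len / 2)
FROM distinct_ranges, generate_series(1, 1000);
```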
Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent
Hi Peter, > It also makes DO UPDATE not work the same way as either UPDATE itself > (which will silently skip a second or subsequent update of the same > row by the same UPDATE statement in RC mode), or MERGE (which has > similar cardinality violations). That's true. On the flip side, UPDATE and MERGE are different statements and arguably shouldn't behave the same way INSERT does. To clarify, I'm merely proposing the change and playing the role of Devil's advocate here. I'm not arguing that the patch should be necessarily accepted. In the end of the day it's up to the community to decide. Personally I think it would make the users a bit happier. The actual reason why I made the patch is that a colleague of mine, Sven Klemm, encountered this limitation recently and was puzzled by it and so was I. The only workaround the user currently has is to execute several INSERTs one by one which is expensive when you have a lot of INSERTs. -- Best regards, Aleksander Alekseev
Re: Add LZ4 compression in pg_dump
--- Original Message --- On Wednesday, January 25th, 2023 at 6:28 PM, Tomas Vondra wrote: > > > > On 1/25/23 16:37, gkokola...@pm.me wrote: > > > --- Original Message --- > > On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby > > pry...@telsasoft.com wrote: > > > > > On Tue, Jan 24, 2023 at 03:56:20PM +, gkokola...@pm.me wrote: > > > > > > > On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby > > > > pry...@telsasoft.com wrote: > > > > > > > > > On Mon, Jan 23, 2023 at 05:31:55PM +, gkokola...@pm.me wrote: > > > > > > > > > > > Please find attached v23 which reintroduces the split. > > > > > > > > > > > > 0001 is reworked to have a reduced footprint than before. Also in > > > > > > an attempt > > > > > > to facilitate the readability, 0002 splits the API's and the > > > > > > uncompressed > > > > > > implementation in separate files. > > > > > > > > > > Thanks for updating the patch. Could you address the review comments I > > > > > sent here ? > > > > > https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com > > > > > > > > Please find v24 attached. > > > > > > Thanks for updating the patch. > > > > > > In 001, RestoreArchive() does: > > > > > > > -#ifndef HAVE_LIBZ > > > > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP && > > > > - AH->PrintTocDataPtr != NULL) > > > > + supports_compression = false; > > > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE || > > > > + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > > > + supports_compression = true; > > > > + > > > > + if (AH->PrintTocDataPtr != NULL) > > > > { > > > > for (te = AH->toc->next; te != AH->toc; te = te->next) > > > > { > > > > if (te->hadDumper && (te->reqs & REQ_DATA) != 0) > > > > - pg_fatal("cannot restore from compressed archive (compression not > > > > supported in this installation)"); > > > > + { > > > > +#ifndef HAVE_LIBZ > > > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > > > + supports_compression = false; > > > > +#endif > > > > + if (supports_compression == false) > > > > + pg_fatal("cannot restore from compressed archive (compression not > > > > supported in this installation)"); > > > > + } > > > > } > > > > } > > > > -#endif > > > > > > This first checks if the algorithm is implemented, and then checks if > > > the algorithm is supported by the current build - that confused me for a > > > bit. It seems unnecessary to check for unimplemented algorithms before > > > looping. That also requires referencing both GZIP and LZ4 in two > > > places. > > > > I am not certain that it is unnecessary, at least not in the way that is > > described. The idea is that new compression methods can be added, without > > changing the archive's version number. It is very possible that it is > > requested to restore an archive compressed with a method not implemented > > in the current binary. The first check takes care of that and sets > > supports_compression only for the supported versions. It is possible to > > enter the loop with supports_compression already set to false, for example > > because the archive was compressed with ZSTD, triggering the fatal error. > > > > Of course, one can throw the error before entering the loop, yet I think > > that it does not help the readability of the code. IMHO it is easier to > > follow if the error is thrown once during that check. > > > Actually, I don't understand why 0001 moves the check into the loop. I > mean, why not check HAVE_LIBZ before the loop? 
The intention is to be able to restore archives that don't contain data. In that case compression becomes irrelevant as only the data in an archive is compressed. > > > > I think it could be written to avoid the need to change for added > > > compression algorithms: > > > > > > + if (te->hadDumper && (te->reqs & REQ_DATA) != 0) > > > > > > + { > > > + /* Check if the compression algorithm is supported */ > > > + pg_compress_specification spec; > > > + parse_compress_specification(AH->compression_spec.algorithm, NULL, > > > &spec); > > > > > > + if (spec->parse_error != NULL) > > > > > > + pg_fatal(spec->parse_error); > > > > > > + } > > > > I am not certain how that would work in the example with ZSTD above. > > If I am not wrong, parse_compress_specification() will not throw an error > > if the codebase supports ZSTD, yet this specific pg_dump binary will not > > support it because ZSTD is not implemented. parse_compress_specification() > > is not aware of that and should not be aware of it, should it? > > > Not sure. What happens in a similar situation now? That is, when trying > to deal with an archive gzip-compressed in a build without libz? In case that there are no data chunks, the archive will be restored. Cheers, //Georgios > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Re: to_hex() for negative inputs
Hi Dean, > > So in your opinion what is the expected result of to_hex(INT_MIN, > > with_sign => true)? > > > > "-80000000" or "-0x80000000", depending on whether the prefix is > requested. Whether this is the right result is very debatable. 0x80000000 is the binary representation of -2147483648:

```
(gdb) ptype cur_timeout
type = int
(gdb) p cur_timeout = 0x80000000
$1 = -2147483648
(gdb) p/x cur_timeout
$2 = 0x80000000
```

So what you propose to return is -(-2147483648). For some users this may be a wanted result, for some it may not be. Personally I would prefer to get an ERROR in this case. And this is exactly how you end up with even more flags. I believe it would be better to let the user write the exact query depending on what he/she wants. -- Best regards, Aleksander Alekseev
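(For illustration, the behaviour under discussion as it stands today: negative inputs are rendered as their two's-complement bit pattern rather than as a signed value:)

```sql
SELECT to_hex(-1);                    -- ffffffff
SELECT to_hex((-2147483648)::int);    -- 80000000  (INT_MIN as a 32-bit int)
```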
Re: Add LZ4 compression in pg_dump
--- Original Message --- On Wednesday, January 25th, 2023 at 7:00 PM, Justin Pryzby wrote: > > > On Wed, Jan 25, 2023 at 03:37:12PM +, gkokola...@pm.me wrote: > > While looking at this, I realized that commit 5e73a6048 introduced a > regression: > > @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH) > > - if (AH->compression != 0) > > - pg_log_warning("archive is compressed, but this installation does not > support compression -- no data will be available"); > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > + pg_fatal("archive is compressed, but this installation does not support > compression"); > > Before, it was possible to restore non-data chunks of a dump file, even > if the current build didn't support its compression. But that's now > impossible - and it makes the code we're discussing in RestoreArchive() > unreachable. Nice catch! Cheers, //Georgios > -- > Justin
Re: Improve WALRead() to suck data directly from WAL buffers when possible
Hi, On 2023-01-14 12:34:03 -0800, Andres Freund wrote: > On 2023-01-14 00:48:52 -0800, Jeff Davis wrote: > > On Mon, 2022-12-26 at 14:20 +0530, Bharath Rupireddy wrote: > > > Please review the attached v2 patch further. > > > > I'm still unclear on the performance goals of this patch. I see that it > > will reduce syscalls, which sounds good, but to what end? > > > > Does it allow a greater number of walsenders? Lower replication > > latency? Less IO bandwidth? All of the above? > > One benefit would be that it'd make it more realistic to use direct IO for WAL > - for which I have seen significant performance benefits. But when we > afterwards have to re-read it from disk to replicate, it's less clearly a win. Satya's email just now reminded me of another important reason: Eventually we should add the ability to stream out WAL *before* it has locally been written out and flushed. Obviously the relevant positions would have to be noted in the relevant message in the streaming protocol, and we couldn't generally allow standbys to apply that data yet. That'd allow us to significantly reduce the overhead of synchronous replication, because instead of commonly needing to send out all the pending WAL at commit, we'd just need to send out the updated flush position. The reason this would lower the overhead is that: a) The reduced amount of data to be transferred reduces latency - it's easy to accumulate a few TCP packets worth of data even in a single small OLTP transaction b) The remote side can start to write out data earlier. Of course this would require additional infrastructure on the receiver side. E.g. some persistent state indicating up to where WAL is allowed to be applied, to avoid the standby getting ahead of the primary, in case the primary crash-restarts (or has more severe issues). With a bit of work we could perform WAL replay on standby without waiting for the fdatasync of the received WAL - that only needs to happen when a) we need to confirm a flush position to the primary, b) when we need to write back pages from the buffer pool (and some other things). Greetings, Andres Freund
Re: GUCs to control abbreviated sort keys
On Tue, 2023-01-24 at 21:42 -0500, Robert Haas wrote: > I find it a bit premature to include this comment in the very first > email what if other people don't like the idea? The trust_strxfrm GUC was pulled from the larger collation refactoring patch, which has been out for a while. The sort_abbreviated_keys GUC is new, and I posted these both in a new thread because they started to look independently useful. If someone doesn't like the idea, they are free to comment, like in every other case (though this patch doesn't seem very controversial to me?). I suppose the wording was off-putting, so I'll choose different words next time. > I would like to hear about the cases where abbreviated keys resulted > in a regression. I want to be clear that this is not a general criticism of the abbreviated keys optimization, nor a comprehensive analysis of its performance. I am highlighting this case because the existence of a single non- contrived case or regression suggests that we may want to explore further and tweak heuristics. That's quite natural when the heuristics are based on a complex dependency like a collation provider. The sort_abbreviated_keys GUC makes that kind of exploration and tweaking a lot easier. Built with meson on linux, gcc 11.3.0, opt -O3. Times are the middle of three runs, taken from the sort operator's "first returned tuple" time in EXPLAIN ANALYZE. Total runtime (as reported in EXPLAIN ANALYZE) is pretty much the same story, but I think there was slightly more noise in that number. $ perl text_generator.pl 1000 10 > /tmp/strings.txt CREATE TABLE s (t TEXT); COPY s FROM '/tmp/strings.txt'; VACUUM FREEZE s; CHECKPOINT; SET work_mem='10GB'; SET max_parallel_workers = 0; SET max_parallel_workers_per_gather = 0; SET sort_abbreviated_keys = false; EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en-US-x-icu"; -- 20875ms SET sort_abbreviated_keys = true; EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en-US-x-icu"; -- 22931ms Regression for abbreviated keys optimization in this case: 9.8% > I'd also like to know whether there's a realistic possibility that > making this a run-time test could itself result in a regression. The sort_abbreviated_keys branch is happening after tuplesort_begin_common (which creates memory contexts, etc.) and before preparing the sort keys (which involves catalog lookups). The trust_strxfrm branch is happening in the type-specific sort support function, which needs to be looked up in the catalog before being called (using V1 calling convention). It doesn't look likely that a single branch in that path will have a perf impact. Do you have a more specific concern? -- Jeff Davis PostgreSQL Contributor Team - AWS text_generator.pl Description: Perl program
Set arbitrary GUC options during initdb
The attached patch responds to the discussion at [1] about how we ought to offer a way to set any server GUC from the initdb command line. Currently, if for some reason the server won't start with default parameters, the only way to get through initdb is to change the installed version of postgresql.conf.sample. And even that is just a kluge, because the initial probes to choose max_connections and shared_buffers will all fail, causing initdb to choose rock-bottom-minimum values of those settings. You can fix that up after the fact if you notice it, but you might not. So this invents an initdb switch "-c NAME=VALUE" just like the one that the server itself has long had. The specified settings are applied on the command line of the initial probe calls (which happen before we've made any config files), and then they are added to postgresql.auto.conf, which causes them to take effect for the bootstrap backend runs as well as subsequent postmaster starts. I also invented "--set NAME=VALUE", mainly because just about every other initdb switch has a long form. The server itself doesn't have that spelling, so I'm not wedded to that part. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/flat/17757-dbdfc1f1c954a6db%40postgresql.org diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index 5b2bdac101..904cee7858 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -433,6 +433,23 @@ PostgreSQL documentation Other, less commonly used, options are also available: + + -c name=value + --set name=value + + +Forcibly set the server parameter name +to value during initdb, +and also set up that assignment in the +generated postgresql.auto.conf file, as though +it had been issued via ALTER SYSTEM SET. +This option can be used more than once to set several parameters. +It is primarily useful when the environment is such that the server +will not start at all using the default parameters. + + + + -d --debug diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 7a58c33ace..8daad6a1f6 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -82,6 +82,13 @@ /* Ideally this would be in a .h file, but it hardly seems worth the trouble */ extern const char *select_default_timezone(const char *share_path); +/* simple list of strings */ +typedef struct _stringlist +{ + char *str; + struct _stringlist *next; +} _stringlist; + static const char *const auth_methods_host[] = { "trust", "reject", "scram-sha-256", "md5", "password", "ident", "radius", #ifdef ENABLE_GSS @@ -142,6 +149,8 @@ static char *pwfilename = NULL; static char *superuser_password = NULL; static const char *authmethodhost = NULL; static const char *authmethodlocal = NULL; +static _stringlist *extra_guc_names = NULL; +static _stringlist *extra_guc_values = NULL; static bool debug = false; static bool noclean = false; static bool noinstructions = false; @@ -253,6 +262,7 @@ static void check_input(char *path); static void write_version_file(const char *extrapath); static void set_null_conf(void); static void test_config_settings(void); +static bool test_specific_config_settings(int test_conns, int test_buffs); static void setup_config(void); static void bootstrap_template1(void); static void setup_auth(FILE *cmdfd); @@ -359,6 +369,27 @@ escape_quotes_bki(const char *src) return result; } +/* + * Add an item at the end of a stringlist. 
+ */ +static void +add_stringlist_item(_stringlist **listhead, const char *str) +{ + _stringlist *newentry = pg_malloc(sizeof(_stringlist)); + _stringlist *oldentry; + + newentry->str = pg_strdup(str); + newentry->next = NULL; + if (*listhead == NULL) + *listhead = newentry; + else + { + for (oldentry = *listhead; oldentry->next; oldentry = oldentry->next) + /* skip */ ; + oldentry->next = newentry; + } +} + /* * make a copy of the array of lines, with token replaced by replacement * the first time it occurs on each line. @@ -873,11 +904,9 @@ test_config_settings(void) 400, 300, 200, 100, 50 }; - char cmd[MAXPGPATH]; const int connslen = sizeof(trial_conns) / sizeof(int); const int bufslen = sizeof(trial_bufs) / sizeof(int); int i, -status, test_conns, test_buffs, ok_buffers = 0; @@ -903,19 +932,7 @@ test_config_settings(void) test_conns = trial_conns[i]; test_buffs = MIN_BUFS_FOR_CONNS(test_conns); - snprintf(cmd, sizeof(cmd), - "\"%s\" --check %s %s " - "-c max_connections=%d " - "-c shared_buffers=%d " - "-c dynamic_shared_memory_type=%s " - "< \"%s\" > \"%s\" 2>&1", - backend_exec, boot_options, extra_options, - test_conns, test_buffs, - dynamic_shared_memory_type, - DEVNULL, DEVNULL); - fflush(NULL); - status = system(cmd); - if (status == 0) + if
Re: plpython vs _POSIX_C_SOURCE
Hi, Pushed the patches. So far no fallout, and hoverfly recovered. I just checked a few of the more odd animals (Illumos, Solaris, old OpenBSD, AIX) that already ran without finding new warnings. There's a few more animals to run before I'll fully relax though. On 2023-01-25 08:31:23 -0500, Robert Haas wrote: > Plus, the cost of experimentation here seems very low. Sure, something > might break, but if it does, we can just change it back, or change it > again. That's not really a big deal. The thing that would be a big > deal, maybe, is if we released and only found out afterward that this > caused some subtle and horrible problem for which we had no > back-patchable fix, but that seems pretty unlikely. Agreed. Greetings, Andres Freund
Re: heapgettup() with NoMovementScanDirection unused in core?
Andres Freund writes: > On 2023-01-25 10:02:28 -0500, Tom Lane wrote: >> We must have the NoMovementScanDirection option because count = 0 >> does not mean "do nothing", and I noted at least two call sites >> that require it. > I wonder if we'd be better off removing NoMovementScanDirection, and using > count == (uint64)-1 for what NoMovementScanDirection is currently used for as > an ExecutorRun parameter. Seems less confusing to me - right now we have two > parameters with non-obvious meanings and interactions. I'm down on that because it seems just about certain to break extensions that call the executor, and it isn't adding enough clarity (IMHO) to justify that. regards, tom lane
Re: GUCs to control abbreviated sort keys
On Tue, 2023-01-24 at 19:43 -0600, Justin Pryzby wrote: > I think "an optimization, if applicable" is either too terse, or > somehow > wrong. Maybe: > > > Enables or disables the use of abbreviated keys, a sort > > optimization... Done. > > + optimization could return wrong results. Set to > > + true if certain that > > strxfrm() > > + can be trusted. > > "if you are certain"; or "if it is ..." Done. Thank you, rebased patch attached. -- Jeff Davis PostgreSQL Contributor Team - AWS From 39ed011cc51ba3a4af5e3b559a7b8de25fb895a5 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Sat, 21 Jan 2023 12:44:07 -0800 Subject: [PATCH v2] Introduce GUCs to control abbreviated keys sort optimization. The setting sort_abbreviated_keys turns the optimization on or off overall. The optimization relies on collation providers, which are complex dependencies, and the performance of the optimization may rely on many factors. Introducing a GUC allows easier diagnosis when this optimization results in worse perforamnce. The setting trust_strxfrm replaces the define TRUST_STRXFRM, allowing users to experiment with the abbreviated keys optimization when using the libc provider. Previously, the optimization only applied to collations using the ICU provider unless specially compiled. By default, allowed only for superusers (because an incorrect setting could lead to wrong results), but can be granted to others. --- doc/src/sgml/config.sgml | 40 ++ src/backend/utils/adt/varlena.c| 10 +++--- src/backend/utils/misc/guc_tables.c| 24 + src/backend/utils/sort/tuplesortvariants.c | 17 ++--- 4 files changed, 82 insertions(+), 9 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index f985afc009..8f55b89f35 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -11252,6 +11252,46 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir' + + sort_abbreviated_keys (boolean) + + sort_abbreviated_keys configuration parameter + + + + +Enables or disables the use of abbreviated sort keys, a sort optimization, +if applicable. The default is true. Disabling may +be useful to diagnose problems or measure performance. + + + + + + trust_strxfrm (boolean) + + trust_strxfrm configuration parameter + + + + +Abbreviated keys, a sort optimization, depends on correct behavior of +the operating system function strxfrm() when +using a collation with the libc provider. On some +platforms strxfrm() does not return results +consistent with strcoll(), which means the +optimization could return wrong results. Set to +true if it is certain that +strxfrm() can be trusted. + + +The default value is false. This setting has no +effect if is set to +false. + + + + trace_locks (boolean) diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index fd81c47474..c270022483 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -45,6 +45,9 @@ /* GUC variable */ int bytea_output = BYTEA_OUTPUT_HEX; +/* GUC to enable use of strxfrm() for abbreviated keys */ +bool trust_strxfrm = false; + typedef struct varlena unknown; typedef struct varlena VarString; @@ -2115,7 +2118,7 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid) * other libc other than Cygwin has so far been shown to have a problem, * we take the conservative course of action for right now and disable * this categorically. (Users who are certain this isn't a problem on - * their system can define TRUST_STRXFRM.) + * their system can set the trust_strxfrm GUC to true.) 
* * Even apart from the risk of broken locales, it's possible that there * are platforms where the use of abbreviated keys should be disabled at @@ -2128,10 +2131,9 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid) * categorically, we may still want or need to disable it for particular * platforms. */ -#ifndef TRUST_STRXFRM - if (!collate_c && !(locale && locale->provider == COLLPROVIDER_ICU)) + if (!trust_strxfrm && !collate_c && + !(locale && locale->provider == COLLPROVIDER_ICU)) abbreviate = false; -#endif /* * If we're using abbreviated keys, or if we're using a locale-aware diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 4ac808ed22..fd4a02fbf5 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -102,6 +102,8 @@ extern bool trace_syncscan; #ifdef DEBUG_BOUNDED_SORT extern bool optimize_bounded_sort; #endif +extern bool sort_abbreviated_keys; +extern bool trust_strxfrm;
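(For context, a rough sketch of how the two GUCs introduced by this patch would be used in a session; the role name is made up:)

```sql
SET sort_abbreviated_keys = off;  -- disable the abbreviated-keys sort optimization
SET trust_strxfrm = on;           -- superuser-only by default, per the patch ...
GRANT SET ON PARAMETER trust_strxfrm TO dba_role;  -- ... but can be granted to others
```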
Re: pgsql: Rename contrib module basic_archive to basic_wal_module
On Wed, Jan 25, 2023 at 02:05:39PM -0500, Robert Haas wrote: > On Wed, Jan 25, 2023 at 1:17 PM Nathan Bossart > wrote: >> I wanted to merge basic_archive and basic_recovery because there's a decent >> chunk of duplicated code. Perhaps that is okay, but I would rather just >> have one test module. AFAICT the biggest reason to split it is because we >> can't determine a good name. Maybe we could leave the name as >> "basic_archive" since it deals with creating and recovering archive files. > > Yeah, maybe. I'm not sure what the best thing to do is, but if I see a > module called basic_archive or basic_restore, I know what it's about, > whereas basic_wal_module seems a lot less specific. It sounds like it > could be generating or streaming it just as easily as it could be > archiving it. It would be nice to have a name that is less prone to > that kind of unclarity. Good point. It seems like the most straightforward approach is just to have separate modules. Unless Michael objects, I'll go that route. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: to_hex() for negative inputs
On 24.01.23 14:10, Dean Rasheed wrote: I also think it might be useful for it to gain a couple of boolean options: 1). An option to output a signed value (defaulting to false, to preserve the current two's complement output). I find the existing behavior so strange, I would rather give up and invent a whole new function family with correct behavior, which could then also support octal and binary, and perhaps any base. This could be something generally named, like "convert" or "format". 2). An option to output the base prefix "0x" (which comes after the sign, making it inconvenient for the user to add themselves). This could also be handled with a "format"-like function.
Re: pgsql: Rename contrib module basic_archive to basic_wal_module
Nathan Bossart writes: > I wanted to merge basic_archive and basic_recovery because there's a decent > chunk of duplicated code. Would said code likely be duplicated into non-test uses of this feature? If so, maybe you ought to factor it out into a common location. I agree with Robert's point that basic_wal_module is a pretty content-free name. regards, tom lane
Re: Improve WALRead() to suck data directly from WAL buffers when possible
On Sat, Jan 14, 2023 at 12:34 PM Andres Freund wrote: > Hi, > > On 2023-01-14 00:48:52 -0800, Jeff Davis wrote: > > On Mon, 2022-12-26 at 14:20 +0530, Bharath Rupireddy wrote: > > > Please review the attached v2 patch further. > > > > I'm still unclear on the performance goals of this patch. I see that it > > will reduce syscalls, which sounds good, but to what end? > > > > Does it allow a greater number of walsenders? Lower replication > > latency? Less IO bandwidth? All of the above? > > One benefit would be that it'd make it more realistic to use direct IO for > WAL > - for which I have seen significant performance benefits. But when we > afterwards have to re-read it from disk to replicate, it's less clearly a > win. > +1. Archive modules rely on reading the wal files for PITR. Direct IO for WAL requires reading these files from disk anyways for archival. However, Archiving using utilities like pg_receivewal can take advantage of this patch together with direct IO for WAL. Thanks, Satya
Re: pgsql: Rename contrib module basic_archive to basic_wal_module
Hi, On 2023-01-25 14:05:39 -0500, Robert Haas wrote: > > I wanted to merge basic_archive and basic_recovery because there's a decent > > chunk of duplicated code. Perhaps that is okay, but I would rather just > > have one test module. AFAICT the biggest reason to split it is because we > > can't determine a good name. Maybe we could leave the name as > > "basic_archive" since it deals with creating and recovering archive files. > > Yeah, maybe. I'm not sure what the best thing to do is, but if I see a > module called basic_archive or basic_restore, I know what it's about, > whereas basic_wal_module seems a lot less specific. It sounds like it > could be generating or streaming it just as easily as it could be > archiving it. It would be nice to have a name that is less prone to > that kind of unclarity. I think it'd be just fine to keep the name as basic_archive and use it for both archiving and restoring. Restoring from an archive still deals with archiving. I agree that basic_wal_module isn't a good name. Greetings, Andres Freund
Re: Set arbitrary GUC options during initdb
Hi, On 2023-01-25 16:25:19 -0500, Tom Lane wrote: > The attached patch responds to the discussion at [1] about how > we ought to offer a way to set any server GUC from the initdb > command line. Are you thinking of backpatching this, to offer the people affected by the issue in [1] a way out? > So this invents an initdb switch "-c NAME=VALUE" just like the > one that the server itself has long had. I still am +1 on the idea. I've actually wanted this for development purposes a couple times... > The specified settings are applied on the command line of the initial probe > calls (which happen before we've made any config files), and then they are > added to postgresql.auto.conf, which causes them to take effect for the > bootstrap backend runs as well as subsequent postmaster starts. I think this means that if you set e.g. max_connections as an initdb parameter, the probes won't do much. Probably fine? Perhaps worth memorializing the priority of the -c options in a test? E.g. setting shared_buffers = 20MB or so and then testing that that's the value when starting the server? > I also invented "--set NAME=VALUE", mainly because just about > every other initdb switch has a long form. The server itself > doesn't have that spelling, so I'm not wedded to that part. Fine with me, but also fine to leave out. Greetings, Andres Freund
Re: Re: Support plpgsql multi-range in conditional control
On Wed, 25 Jan 2023 at 12:02, Pavel Stehule wrote: > > > st 25. 1. 2023 v 17:22 odesílatel songjinzhou <2903807...@qq.com> napsal: > >> >> As follows, we can only repeat the for statement before we use such SQL: >> >> begin >> for i in 10..20 loop >> raise notice '%', i; -- Things to do >> end loop; >> >> for i in 100 .. 200 by 10 loop >> raise notice '%', i; -- Things to do >> end loop; >> end; >> >> But now we can simplify it as follows: >> >> begin >> for i in 10..20, 100 .. 200 by 10 loop >> raise notice '%', i; -- Things to do >> end loop; >> end; >> >> Although we can only use integer iterative control here, this is just a >> horizontal expansion of the previous logic. Thank you very much for your >> reply. I am very grateful! >> >> > Unfortunately, this is not a real use case - this is not an example from > the real world. > And anyway, this is already supported using generate_series() and UNION: odyssey=> do $$ declare i int; begin for i in select generate_series (10, 20) union all select generate_series (100, 200, 10) loop raise notice 'i=%', i; end loop; end;$$; NOTICE: i=10 NOTICE: i=11 NOTICE: i=12 NOTICE: i=13 NOTICE: i=14 NOTICE: i=15 NOTICE: i=16 NOTICE: i=17 NOTICE: i=18 NOTICE: i=19 NOTICE: i=20 NOTICE: i=100 NOTICE: i=110 NOTICE: i=120 NOTICE: i=130 NOTICE: i=140 NOTICE: i=150 NOTICE: i=160 NOTICE: i=170 NOTICE: i=180 NOTICE: i=190 NOTICE: i=200 DO odyssey=> The existing x..y notation is just syntactic sugar for a presumably common case (although I’m dubious how often one really loops through a range of numbers — surely in a database looping through a query result is overwhelmingly dominant?); I don’t think you’ll find much support around here for adding more syntax possibilities to the loop construct.
Re: Set arbitrary GUC options during initdb
Andres Freund writes: > On 2023-01-25 16:25:19 -0500, Tom Lane wrote: >> The attached patch responds to the discussion at [1] about how >> we ought to offer a way to set any server GUC from the initdb >> command line. > Are you thinking of backpatching this, to offer the people affected by the > issue in [1] a way out? We could ... it's a new feature for sure, but it seems quite unlikely to break things for anybody not using it. >> The specified settings are applied on the command line of the initial probe >> calls (which happen before we've made any config files), and then they are >> added to postgresql.auto.conf, which causes them to take effect for the >> bootstrap backend runs as well as subsequent postmaster starts. > I think this means that if you set e.g. max_connections as an initdb > parameter, the probes won't do much. Probably fine? Right, the probed value will be overridden. > Perhaps worth memorializing the priority of the -c options in a test? > E.g. setting shared_buffers = 20MB or so and then testing that that's the > value when starting the server? Given that it's written into postgresql.auto.conf, I imagine that we have test coverage of that point already. There is a more subtle issue, which is that -c max_connections or -c shared_buffers should override the probe values *during the probe steps*. My first thought about implementation had been to create postgresql.auto.conf right off the bat, but that would fail to have this property because server command line overrides config file. I can't think of any very portable way to check that though. regards, tom lane
Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent
Hi, On 2023-01-25 22:00:50 +0300, Aleksander Alekseev wrote: > Perhaps that's not a bug especially considering the fact that the > documentation describes this behavior, but in any case the fact that: > > ``` > INSERT INTO t VALUES (1,1) ON CONFLICT (a) DO UPDATE SET b = 0; > INSERT INTO t VALUES (1,2) ON CONFLICT (a) DO UPDATE SET b = 0; > ``` > > and: > > ``` > INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO NOTHING; > `` > > .. both work, and: > > ``` > INSERT INTO t VALUES (1,1), (1,2) ON CONFLICT (a) DO UPDATE SET b = 0; > ``` > > ... doesn't is rather confusing. There is no reason why the latest > query shouldn't work except for a slight complication of the code. > Which seems to be a reasonable tradeoff, for me at least. I don't agree that this is just about a "slight complication" of the code. I think semantically the proposed new behaviour is pretty bogus. It *certainly* can't be right to just continue with the update in heap_update, as you've done. You'd have to skip the update, not execute it. What am I missing here? I think this'd completely break triggers, for example, because they won't be able to get the prior row version, since it won't actually be a row ever visible (due to cmin=cmax). I suspect it might break unique constraints as well, because we'd end up with an invisible row in part of the ctid chain. > > But what's the justification for erroring out in the DO NOTHING case? > > > > [...] > > > > It seems somewhat likely that a behavioural change will cause trouble for > > some > > of the uses of DO NOTHING out there. > > Just to make sure we are on the same page. The patch doesn't break the > current DO NOTHING behavior but rather makes DO UPDATE work the same > way DO NOTHING does. I see that now - I somehow thought you were recommending to error out in both cases, rather than the other way round. Greetings, Andres Freund
Re: pgsql: Rename contrib module basic_archive to basic_wal_module
On Wed, Jan 25, 2023 at 04:50:22PM -0500, Tom Lane wrote: > Nathan Bossart writes: >> I wanted to merge basic_archive and basic_recovery because there's a decent >> chunk of duplicated code. > > Would said code likely be duplicated into non-test uses of this feature? > If so, maybe you ought to factor it out into a common location. I agree > with Robert's point that basic_wal_module is a pretty content-free name. I doubt it. The duplicated parts are things like _PG_init(), the check hook for the GUC, and all the rest of the usual boilerplate stuff for extensions (e.g., Makefile, meson.build). This module is small enough that this probably makes up the majority of the code. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pgsql: Rename contrib module basic_archive to basic_wal_module
On Wed, Jan 25, 2023 at 01:58:01PM -0800, Andres Freund wrote: > I think it'd be just fine to keep the name as basic_archive and use it for > both archiving and restoring. Restoring from an archive still deals with > archiving. This is my preference. If Michael and Robert are okay with it, I think this is what we should do. Else, I'll create separate basic_archive and basic_restore modules. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pgsql: Rename contrib module basic_archive to basic_wal_module
On Wed, Jan 25, 2023 at 02:41:18PM -0800, Nathan Bossart wrote: > This is my preference. If Michael and Robert are okay with it, I think > this is what we should do. Else, I'll create separate basic_archive and > basic_restore modules. Grouping both things into the same module has the advantage to ease the configuration, at least in the example, where the archive directory GUC can be used for both paths. Regarding the renaming, I pushed for that. There's little love for it as far as I can see, so will revert accordingly. Keeping basic_archive as name for the module while it includes recovery is fine by me. -- Michael signature.asc Description: PGP signature
Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
On 1/24/23 12:04, Robert Haas wrote: > I find the concept of "ambient authentication" problematic. I don't > know exactly what you mean by it. I hope you'll tell me, Sure: Ambient authority [1] means that something is granted access based on some aspect of its existence that it can't remove (or even necessarily enumerate). Up above, when you said "I cannot choose not to be myself," that's a clear marker that ambient authority is involved. Examples of ambient authn/z factors might include an originating IP address, the user ID of a connected peer process, the use of a loopback interface, a GPS location, and so on. So 'peer' and 'ident' are ambient authentication methods. And, because I think it's useful, I'll extend the definition to include privileges that _could_ be dropped by a proxy, but in practice are included because there's no easy way not to. Examples for libpq include the automatic use of the client certificate in ~/.postgresql, or any Kerberos credentials available in the local user cache. (Or even a PGPASSWORD set up and forgotten by a DBA.) Ambient authority is closely related to the confused deputy problem [2], and the proxy discussed here is a classic confused deputy. The proxy doesn't know that a core piece of its identity has been used to authenticate the request it's forwarding. It can't choose its IP address, or its user ID. I'm most familiar with this in the context of HTTP, cookie-/IP-based authn, and cross-site request forgeries. Whenever someone runs a local web server with no authentication and says "it's okay! we only respond to requests from the local host!" they're probably about to be broken open by the first person to successfully reflect a request through the victim's (very local) web browser. Ways to mitigate or solve this problem (that I know of) include 1) Forwarding the original ambient context along with the request, so the server can check it too. HTTP has the Origin header, so a browser can say, "This request is not coming from my end user; it's coming from a page controlled by example.org. You can't necessarily treat attached cookies like they're authoritative." The PROXY protocol lets a proxy forward several ambient factors, including the originating IP address (or even the use of a UNIX socket) and information about the original TLS context. 2) Explicitly combining the request with the proof of authority needed to make it, as in capability-based security [3]. Some web frameworks push secret "CSRF tokens" into URLs for this purpose, to tangle the authorization into the request itself [4]. I'd argue that the "password requirement" implemented by postgres_fdw and discussed upthread was an attempt at doing this, to try to ensure that the authentication comes from the user explicitly and not from the proxy. It's just not very strong. (require_auth would strengthen it quite a bit; a major feature of that patchset is to explicitly name the in-band authentication factors that a server is allowed to pull out of a client. It's still not strong enough to make a true capability, for one because it's client-side only. But as long as servers don't perform actions on behalf of users upon connection, that's pretty good in practice.) 3) Dropping as many implicitly-held privileges as possible before making a request. This doesn't solve the problem but may considerably reduce the practical attack surface. 
For example, if browsers didn't attach their user's cookies to cross-origin requests, cross-site request forgeries would probably be considerably less dangerous (and, in the years since I left the space, it looks like browsers have finally stopped doing this by default). Upthread, Andres suggested disabling the default inclusion of client certs and GSS creds, and I would extend that to include really *anything* pulled in from the environment. Make the DBA explicitly allow those things. > but I think > that I won't like it even after I know, because as I said before, it's > difficult to know why anyone else makes a decision, and asking an > untrusted third-party why they're deciding something is sketchy at > best. I think that's a red herring. Setting aside that you can, in fact, prove that the server has authenticated you (e.g. require_auth=scram-sha-256 in my proposed patchset), I don't think "untrusted servers, that we don't control, doing something stupid" is a very useful thing to focus on. We're trying to secure the case where a server *is* authenticating us, using known useful factors, but those factors have been co-opted by an attacker via a proxy. > I think that the problems we have in this area can be solved by > either (a) restricting the open proxy to be less open or (b) > encouraging people to authenticate users in some way that won't admit > connections from an open proxy. (a) is an excellent mitigation, and we should do it. (b) starts getting shaky because I think peer auth is actually a very reasonable choice for many people. So I hope we can also start solv
[BUG] pg_stat_statements and extended query protocol
Doing some work with extended query protocol, I encountered the same issue that was discussed in [1]. It appears that when a client is using extended query protocol and sends an Execute message for a portal with max_rows, and the portal is executed multiple times, pg_stat_statements does not correctly track rows and calls. The attached JDBC script, Test.java, reproduces the issue with setFetchSize of 100 and autocommit mode set to OFF. We can see that although pg_class has 414 rows, only 1 call and 14 rows are tracked; the first 4 * 100 fetches did not get accounted for.

postgres=# select calls, rows, query from pg_stat_statements
postgres-# where queryid = '-1905758228217333571';
 calls | rows |          query
-------+------+-------------------------
     1 |   14 | select * from pg_class
(1 row)

The execution workflow goes something like this:

ExecutorStart
ExecutorRun – which will be called multiple times to fetch from the portal until the caller closes the portal or the portal runs out of rows
ExecutorFinish
ExecutorEnd – portal is closed & pg_stat_statements stores the final rows processed

Where this breaks for pg_stat_statements is during ExecutorRun: es_processed is reset to 0 on every iteration, so by the time the portal is closed, es_processed will only show the total from the last Execute message. This appears to be an issue only for portals fetched through extended query protocol and not for explicit cursors that go through simple query protocol (i.e. FETCH ). One potential fix I see is to introduce 2 new counters in the ExecutionState which will track the total rows processed and the number of calls. These counters can then be used by pg_stat_statements. Attached is an experimental patch which shows the correct number of rows and number of calls.

postgres=# select calls, rows, query from pg_stat_statements
postgres-# where queryid = '-1905758228217333571';
 calls | rows |          query
-------+------+-------------------------
     5 |  414 | select * from pg_class
(1 row)

[1] https://www.postgresql.org/message-id/flat/c90890e7-9c89-c34f-d3c5-d5c763a34bd8%40dunslane.net Thanks – Sami Imseih Amazon Web Services (AWS) 0001-correct-pg_stat_statements-tracking-of-portals.patch Description: 0001-correct-pg_stat_statements-tracking-of-portals.patch Test.java Description: Test.java
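(For contrast, a sketch of the explicit-cursor flow mentioned above, which goes through simple query protocol and is described as not being affected; the cursor name and fetch size are arbitrary:)

```sql
BEGIN;
DECLARE c CURSOR FOR SELECT * FROM pg_class;
FETCH 100 FROM c;   -- repeat until the cursor is exhausted
FETCH 100 FROM c;
CLOSE c;
COMMIT;
```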
Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound (stop telling users to "vacuum that database in single-user mode")
On Mon, Jan 16, 2023 at 03:50:57PM +0300, Aleksander Alekseev wrote:
> Hi hackers,
>
> > The proposed patchset changes the documentation and the error messages
> > accordingly, making them less misleading. 0001 corrects the
> > documentation but doesn't touch the code. 0002 and 0003 correct the
> > messages shown when approaching xidWrapLimit and xidWarnLimit
> > accordingly.
>
> A colleague of mine, Oleksii Kliukin, pointed out that the
> recommendation about running VACUUM in a single-user mode is also
> outdated, as it was previously reported in [1]. I didn't believe it at
> first and decided to double-check:

and again at:
https://www.postgresql.org/message-id/flat/CA%2BTgmoYPfofQmRtUan%3DA3aWE9wFsJaOFr%2BW_ys2pPkNPr-2FZw%40mail.gmail.com#e7dd25fdcd171c5775f3f9e3f86b2082

> Unfortunately the [1] discussion went nowhere.

likewise...

> So I figured it would be appropriate to add corresponding changes to
> the proposed patchset since it's relevant and is registered in the CF
> app already. PFA patchset v2 which now also includes 0004.
>
> [1]:
> https://www.postgresql.org/message-id/flat/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com

I suggest resending this with a title like the 2021 thread [1] (I was unable to find this just now when I looked):

| doc: stop telling users to "vacuum that database in single-user mode"

And copy the participants of the previous two iterations of this thread.

-- 
Justin
Re: suppressing useless wakeups in logical/worker.c
On Tue, Jan 24, 2023 at 06:45:08PM -0500, Tom Lane wrote:
> I took a look through this, and have a number of mostly-cosmetic
> issues:

Thanks for the detailed review.

> * It seems wrong that next_sync_start isn't handled as one of the
> wakeup[NUM_LRW_WAKEUPS] entries. I see that it needs to be accessed from
> another module; but you could handle that without exposing either enum
> LogRepWorkerWakeupReason or the array, by providing getter/setter
> functions for process_syncing_tables_for_apply() to call.
>
> * This code is far too intimately familiar with the fact that TimestampTz
> is an int64 count of microseconds. (I'm picky about that because I
> remember that they were once something else, so I wonder if someday
> they will be different again.) You could get rid of the PG_INT64_MAX
> usages by replacing those with the timestamp infinity macro DT_NOEND;
> and I'd even be on board with adding a less-opaque alternate name for
> that to datatype/timestamp.h. The various magic-constant multipliers
> could perhaps be made less magic by using TimestampTzPlusMilliseconds().
>
> * I think it might be better to construct the enum like this:
>
> +typedef enum LogRepWorkerWakeupReason
> +{
> +	LRW_WAKEUP_TERMINATE,
> +	LRW_WAKEUP_PING,
> +	LRW_WAKEUP_STATUS
> +#define NUM_LRW_WAKEUPS (LRW_WAKEUP_STATUS + 1)
> +} LogRepWorkerWakeupReason;
>
> so that you don't have to have a default: case in switches on the
> enum value. I'm more worried about somebody adding an enum value
> and missing updating a switch statement elsewhere than I am about
> somebody adding an enum value and neglecting to update the
> immediately-adjacent macro.

I did all of this in v3.

> * The updates of "now" in LogicalRepApplyLoop seem rather
> randomly placed, and I'm not entirely convinced that we'll
> always be using a reasonably up-to-date value. Can't we
> just update it right before each usage?

This came up for walreceiver.c, too. The concern is that
GetCurrentTimestamp() might be rather expensive on systems without
something like the vDSO. I don't know how common that is. If we can rule
that out, then I agree that we should just update it right before each
use.

> * This special handling of next_sync_start seems mighty ugly:
>
> +		/* Also consider special wakeup time for starting sync workers. */
> +		if (next_sync_start < now)
> +		{
> +			/*
> +			 * Instead of spinning while we wait for the sync worker to
> +			 * start, wait a bit before retrying (unless there's an earlier
> +			 * wakeup time).
> +			 */
> +			nextWakeup = Min(now + INT64CONST(10), nextWakeup);
> +		}
> +		else
> +			nextWakeup = Min(next_sync_start, nextWakeup);
>
> Do we really need the slop? If so, is there a reason why it
> shouldn't apply to all possible sources of nextWakeup? (It's
> going to be hard to fold next_sync_start into the wakeup[]
> array unless you can make this not a special case.)

I'm not positive it is absolutely necessary. AFAICT the function that
updates this particular wakeup time is conditionally called, so it at
least seems theoretically possible that we could end up spinning in a
tight loop until we attempt to start a new tablesync worker. But perhaps
this is unlikely enough that we needn't worry about it.

I noticed that this wakeup time wasn't being updated when the number of
active tablesync workers is >= max_sync_workers_per_subscription. In v3,
I tried to handle this by setting the wakeup time to a second later for
this case.
I think you could ordinarily depend on the tablesync worker's notify_pid
to wake up the apply worker, but that wouldn't work if the apply worker
has restarted. Ultimately, this particular wakeup time seems to be a
special case, and I probably need to think about it some more. If you
have ideas, I'm all ears.

> * It'd probably be worth enlarging the comment for
> LogRepWorkerComputeNextWakeup to explain why its API is like that,
> perhaps "We ask the caller to pass in the value of "now" because
> this frequently avoids multiple calls of GetCurrentTimestamp().
> It had better be a reasonably-up-to-date value, though."

I did this in v3.

I noticed that many of your comments also applied to the similar patch
that was recently applied to walreceiver.c, so I created another patch to
fix that up.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

>From 3b464bf0ccb22e36ab627a5e19981eaf3734d4dd Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 24 Jan 2023 20:52:21 -0800
Subject: [PATCH v3 1/2] code review for 05a7be9

---
 src/backend/replication/walreceiver.c | 31 ++-
 src/include/utils/timestamp.h         |  1 +
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 3876c0188d..0563bad0f6
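As an aside, to make the getter/setter idea quoted above a bit more concrete, here is a hypothetical, self-contained sketch. Every name in it (the LRW_WAKEUP_SYNC_START member, the accessor functions, and the TimestampTz stand-in typedef) is invented for illustration and does not come from the v3 patch.

#include <stdint.h>

typedef int64_t TimestampTz;	/* stand-in for the real typedef */

typedef enum LogRepWorkerWakeupReason
{
	LRW_WAKEUP_TERMINATE,
	LRW_WAKEUP_PING,
	LRW_WAKEUP_STATUS,
	LRW_WAKEUP_SYNC_START		/* folds next_sync_start into the array */
#define NUM_LRW_WAKEUPS (LRW_WAKEUP_SYNC_START + 1)
} LogRepWorkerWakeupReason;

/* private to worker.c; neither the enum nor the array is exported */
static TimestampTz wakeup[NUM_LRW_WAKEUPS];

/* setter that process_syncing_tables_for_apply() could call instead of
 * touching next_sync_start directly */
void
LogRepWorkerSetSyncStartWakeup(TimestampTz wakeup_time)
{
	wakeup[LRW_WAKEUP_SYNC_START] = wakeup_time;
}

/* getter for the apply loop's next-wakeup computation */
TimestampTz
LogRepWorkerGetSyncStartWakeup(void)
{
	return wakeup[LRW_WAKEUP_SYNC_START];
}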
Re: drop postmaster symlink
Hello, Somehow I missed the email changing the status of this back to "needs review". Buried in https://www.postgresql.org/message-id/20230107165942.748ccf4e%40slate.karlpinc.com is the one change I see that should be made. > In doc/src/sgml/ref/allfiles.sgml at line 222 there is an ENTITY > defined which references the deleted postmaster.sgml file. This line needs to be removed and the 0002-Don-t-install-postmaster-symlink-anymore.patch updated. (Unless there's some magic going on with the various allfiles.sgml files of which I am not aware.) If this is fixed I see no other problems. Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
improving user.c error messages
moving this discussion to a new thread...

On Thu, Jan 19, 2023 at 10:20:33AM -0500, Robert Haas wrote:
> On Wed, Jan 18, 2023 at 6:17 PM Nathan Bossart wrote:
>> However, as the attribute
>> system becomes more sophisticated, I think we ought to improve the error
>> messages in user.c. IMHO messages like "permission denied" could be
>> greatly improved with some added context.
>>
>> This probably shouldn't block your patch, but I think it's worth doing in
>> v16 since there are other changes in this area. I'm happy to help.
>
> That would be great. I agree that it's good to try to improve the
> error messages. It hasn't been entirely clear to me how to do that.
> For instance, I don't think we want to say something like:
>
> ERROR: must have CREATEROLE privilege and ADMIN OPTION on the target
> role, or in lieu of both of those to be superuser, to set the
> CONNECTION LIMIT for another role
> ERROR: must have CREATEROLE privilege and ADMIN OPTION on the target
> role, plus also CREATEDB, or in lieu of all that to be superuser, to
> remove the CREATEDB property from another role
>
> Such messages are long and we'd end up with a lot of variants. It's
> possible that the messages could be multi-tier. For instance, if we
> determine that you're trying to manage users and you don't have
> permission to manage ANY user, we could say:
>
> ERROR: permission to manage roles denied
> DETAIL: You must have the CREATEROLE privilege or be a superuser to
> manage roles.
>
> If you could potentially manage some user, but not the one you're
> trying to manage, we could say:
>
> ERROR: permission to manage role "%s" denied
> DETAIL: You need ADMIN OPTION on the target role to manage it.
>
> If you have permission to manage the target role but not in the
> requested manner, we could then say something like:
>
> ERROR: permission to manage CREATEDB for role "%s" denied
> DETAIL: You need CREATEDB to manage it.
>
> This is just one idea, and maybe not the best one. I'm just trying to
> say that I think this is basically an organizational problem. We need
> a plan for how we're going to report errors that is not too
> complicated to implement with reasonable effort, and that will produce
> messages that users will understand. I'd be delighted if you wanted to
> provide either ideas or patches...

Here is an early draft of some modest improvements to the user.c error
messages. I basically just tried to standardize the style of and add
context to the existing error messages. I used errhint() for this extra
context, but errdetail() would work, too.

This isn't perfect. You might still have to go through a couple rounds of
errors before your role has all the privileges it needs for a command, but
this seems to improve matters a little. I think there is still a lot of
room for improvement, but I wanted to at least get the discussion started
before I went too far.
-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 3a92e930c0..1982831e2a 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -316,23 +316,28 @@ CreateRole(ParseState *pstate, CreateRoleStmt *stmt)
 		if (!has_createrole_privilege(currentUserId))
 			ereport(ERROR,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-					 errmsg("permission denied to create role")));
+					 errmsg("permission denied to create role"),
+					 errhint("You must have CREATEROLE privilege to create roles.")));
 		if (issuper)
 			ereport(ERROR,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-					 errmsg("must be superuser to create superusers")));
+					 errmsg("permission denied to create role"),
+					 errhint("You must have SUPERUSER privilege to create roles with SUPERUSER.")));
 		if (createdb && !have_createdb_privilege())
 			ereport(ERROR,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-					 errmsg("must have createdb permission to create createdb users")));
+					 errmsg("permission denied to create role"),
+					 errhint("You must have CREATEDB privilege to create roles with CREATEDB.")));
 		if (isreplication && !has_rolreplication(currentUserId))
 			ereport(ERROR,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-					 errmsg("must have replication permission to create replication users")));
+					 errmsg("permission denied to create role"),
+					 errhint("You must have REPLICATION privilege to create roles with REPLICATION.")));
 		if (bypassrls && !has_bypassrls_privilege(currentUserId))
 			ereport(ERROR,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-					 errmsg("must have bypassrls to create bypassrls users")));
+					 errmsg("permission denied to create role"),
+					 errhint("You must have BYPASSRLS privilege to create roles with BYPASSRLS.")));
 	}
 
 	/*
@@ -747,7 +752,8 @@ AlterRole(ParseState *pstate, AlterRoleStmt *stmt)
 		if (!superuser() && (authform->rolsuper || dissuper))
 			ereport(ERROR,
 					(errcode(ERRCODE_INSUFFICIENT_PRIV
Re: suppressing useless wakeups in logical/worker.c
On Thu, Jan 26, 2023 at 12:50 PM Nathan Bossart wrote: > I did this in v3. I noticed that many of your comments also applied to the > similar patch that was recently applied to walreceiver.c, so I created > another patch to fix that up. Can we also use TimestampDifferenceMilliseconds()? It knows about rounding up for WaitLatch().
Re: suppressing useless wakeups in logical/worker.c
On Thu, Jan 26, 2023 at 01:23:41PM +1300, Thomas Munro wrote:
> Can we also use TimestampDifferenceMilliseconds()? It knows about
> rounding up for WaitLatch().

I think we might risk overflowing "long" when all the wakeup times are
DT_NOEND:

 * This is typically used to calculate a wait timeout for WaitLatch()
 * or a related function. The choice of "long" as the result type
 * is to harmonize with that. It is caller's responsibility that the
 * input timestamps not be so far apart as to risk overflow of "long"
 * (which'd happen at about 25 days on machines with 32-bit "long").

Maybe we can adjust that function or create a new one to deal with this.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
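For instance, a rough sketch of what such a helper might look like. The helper name, the one-minute cap, and the convention of returning -1 to mean "wait without a timeout" are all assumptions made here for illustration; the snippet only compiles inside the PostgreSQL source tree.

#include "postgres.h"
#include "utils/timestamp.h"

static long
ComputeWaitTimeoutMs(TimestampTz now, TimestampTz nextWakeup)
{
	TimestampTz cap;

	/* nothing scheduled: let the caller wait without a timeout */
	if (nextWakeup == DT_NOEND)
		return -1;

	/* never ask WaitLatch() for more than 60 seconds at a time,
	 * which keeps the difference far away from "long" overflow */
	cap = TimestampTzPlusMilliseconds(now, 60 * 1000);
	if (nextWakeup > cap)
		nextWakeup = cap;

	/* TimestampDifferenceMilliseconds() rounds up for WaitLatch()
	 * and never returns a negative value */
	return TimestampDifferenceMilliseconds(now, nextWakeup);
}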