Re: Skipping logical replication transactions on subscriber side
On Fri, Nov 26, 2021 at 6:00 AM Masahiko Sawada wrote:
>
> Indeed. Attached an updated patch.

Thanks! I have made a number of changes in the attached patch, which include: (a) the patch was trying to register multiple array entries for the same subscription, which doesn't seem to be required (see changes in pgstat_vacuum_stat); (b) multiple changes in the test, like reducing wal_retrieve_retry_interval to 2s, which has cut the test time in half, removing the check related to resetting of stats (as there is no guarantee that the message will be received by the collector, and we were not sending it again), and changing the test case file name to 026_stats so that we can add more subscription-related stats to this test file; (c) added/edited multiple comments; (d) updated PGSTAT_FILE_FORMAT_ID.

Do let me know what you think of the attached.

--
With Regards,
Amit Kapila.

v27-0001-Add-a-view-to-show-the-stats-of-subscription-wor.patch
Description: Binary data
Re: Skipping logical replication transactions on subscriber side
On Fri, Nov 26, 2021 at 7:45 AM tanghy.f...@fujitsu.com wrote:
>
> On Friday, November 26, 2021 9:30 AM Masahiko Sawada wrote:
> >
> > Indeed. Attached an updated patch. Thanks!
>
> Thanks for your patch. A small comment:
>
> +        OID of the relation that the worker is synchronizing; null for the
> +        main apply worker
>
> Should we modify it to "OID of the relation that the worker was synchronizing ..."?

I don't think this change is required; see the description of the similar column in pg_stat_subscription.

--
With Regards,
Amit Kapila.
Re: Windows: Wrong error message at connection termination
Am 22.11.21 um 00:04 schrieb Tom Lane:
> Do we know that that actually happens in an arm's-length connection (ie two separate machines)? I wonder if the data loss is strictly an artifact of a localhost connection. There'd be a lot more pressure on them to make cross-machine TCP work per spec, one would think. But in any case, if we can avoid sending RST in this situation, it seems mostly moot for our usage.

Sorry it took some days to get a setup to check this! The result is as expected:

1. Windows client to Linux server works without dropping the error message
2. Linux client to Windows server works without dropping the error message
3. Windows client to remote Windows server drops the error message, depending on the timing of the event loop

In 1. the Linux server doesn't end the connection with an RST packet, so the Windows client enqueues the error message properly and doesn't drop it. In 2. the Linux client doesn't care about the RST packet from the Windows server and properly enqueues and raises the error message. In 3. the combination of the bad RST behavior of client and server leads to data loss. It depends on the network timing: a delay of 0.5 ms in the event loop was enough in a localhost setup as well as in some LAN setups. On the contrary, over a slower WLAN connection a delay of less than 15 ms did not lose data, but higher delays still did.

The idea of running a second process, passing the socket handle to it, having it observe the parent process and close the socket when it exits, could work, but I guess it's overly complicated and creates more issues than it solves. Probably the same if the master process handles the socket closing. So I still think it's best to close the socket as proposed in the patch.

--
Regards,
Lars Kanis
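For illustration, a minimal sketch of the graceful-close sequence being aimed at on Windows (assuming a Winsock socket; this is not the patch's actual code): shut down the send side so a FIN is queued instead of an abortive RST, drain whatever the peer still sends, then close.

	#include <winsock2.h>

	/*
	 * Minimal sketch: close a socket without provoking an RST, assuming
	 * the peer eventually closes its side too.
	 */
	static void
	close_socket_gracefully(SOCKET sock)
	{
		char		buf[1024];

		/* Stop sending; this queues a FIN rather than an abortive RST. */
		shutdown(sock, SD_SEND);

		/* Drain pending data from the peer until it closes or errors. */
		while (recv(sock, buf, sizeof(buf), 0) > 0)
			;

		closesocket(sock);
	}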
Re: pg_upgrade and publication/subscription problem
On Fri, Nov 26, 2021 at 5:47 PM Marcos Pegoraro wrote:
>>
>> AFAIU the main problem in your case is that you didn't block the write traffic on the publisher side. Let me try to understand the situation. After the upgrade is finished, there are some new tables with data on the publisher, and did old tables have any additional data?
>
> Correct.
>>
>> Are the contents in pg_replication_origin intact after the upgrade?
>
> Yes
>>
>> So, in short, I think what we need to solve is to get the data from new tables and newly performed writes on old tables. I could think of the following two approaches:
>>
>> Approach-1:
>> 1. Drop subscription and Truncate all tables corresponding to subscription.
>> 2. Create a new subscription for the publication.
>
> If I drop the subscription it will drop the WAL on the publisher side and I lose all changed data between the start of the pg_upgrade process and now.

I think you can disable the subscription, or disassociate the slot from the subscription before dropping it.

> My problem is not related to new tables; they will be copied fine because no records exist on the subscriber. But old tables had records modified since the pg_upgrade process; that is my problem, only that.

Yeah, I understand that point. Here, the problem is that both old and new tables belong to the same publication, and you can't refresh only some tables from the publication.

> My question remains the same: why was pg_subscription_rel not copied from the previous version?
>
> If pg_upgrade would copy pg_replication_origin (it did) and these pg_subscription_rel (it didn't) records from version 13 to 14, when I enable the subscription it would start copying data from that point on, correct?

I think we don't want to make assumptions about the remote end being the same after the upgrade, and we let users reactivate the subscriptions in a suitable way. See [1] (start reading from "..When dumping logical replication subscriptions.."). In your case, if you hadn't allowed new tables into the publication, then a simple ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (copy_data = false) would have served the purpose.

BTW, just for the record, this problem has nothing to do with any changes in PG 14; the same behavior exists in previous versions as well.

[1] - https://www.postgresql.org/docs/devel/app-pgdump.html

--
With Regards,
Amit Kapila.
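For reference, one way the disassociate-then-drop sequence mentioned above could look (subscription, publication, and connection names are hypothetical):

	-- Stop the apply worker, then detach the remote slot so that
	-- DROP SUBSCRIPTION does not remove the slot (and hence does not
	-- discard the retained WAL) on the publisher.
	ALTER SUBSCRIPTION mysub DISABLE;
	ALTER SUBSCRIPTION mysub SET (slot_name = NONE);
	DROP SUBSCRIPTION mysub;

	-- Recreate against the still-existing slot, skipping the initial copy.
	CREATE SUBSCRIPTION mysub
	    CONNECTION 'host=publisher dbname=postgres'
	    PUBLICATION mypub
	    WITH (slot_name = 'mysub', create_slot = false, copy_data = false);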
Re: pg_upgrade and publication/subscription problem
>
> I think we don't want to make assumptions about the remote end being the same after the upgrade, and we let users reactivate the subscriptions in a suitable way. See [1] (start reading from "..When dumping logical replication subscriptions..") In your case, if you hadn't allowed new tables into the publication, then a simple ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (copy_data = false) would have served the purpose.

I understand that this is not related to version 14; pg_upgrade would do the same steps on previous versions too. Additionally, it would be interesting to document that pg_upgrade does not upgrade completely if the server is a subscriber of logical replication, so it'll have pre and post steps to do if the server has this kind of replication.
Re: Windows build warnings
On 11/26/21 15:14, Daniel Gustafsson wrote:
>> On 26 Nov 2021, at 20:33, Tom Lane wrote:
>>
>> I think our policy is to suppress unused-variable warnings if they appear on current mainstream compilers; and it feels a little churlish to deem MSVC non-mainstream. So I stick with my previous suggestion, which basically was to disable C4101 until such time as somebody can make PG_USED_FOR_ASSERTS_ONLY work correctly on MSVC. In the worst case, that might lead a Windows-based developer to submit a patch that draws warnings elsewhere ... but the cfbot, other developers, or the buildfarm will find such problems soon enough.
> I agree with that, and can go make that happen.

[trust I have attributions right]

ISTM the worst case is that there will be undetected unused variables in Windows-only code. I guess that would mostly be detected by Msys systems running gcc. Anyway, I don't think it's worth arguing a lot about.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Enforce work_mem per worker
Hello!

Since I used a lot of my time chasing short-lived processes eating away big chunks of memory in recent weeks, I am wondering about a decent way to go about this. The problem I am facing essentially relates to the fact that work_mem settings, while they are enforced per hash and sort node, aren't enforced globally.

One common case that causes this problem more frequently than a few years ago is the partitionwise_join. If a lot of partitions are hash joined, we get a lot of hash nodes, each one potentially consuming work_mem. While avoiding OOM seems a big deal to me, my search didn't turn up previous hackers discussions about this. There is a good chance I am missing something here, so I'd appreciate any pointers.

The most reasonable solution seems to me to have a data structure per worker that 1. tracks the amount of memory used by certain nodes and 2. offers a callback to let a node spill its contents (almost) completely to disk. I am thinking about hash and sort nodes for now, since they affect memory usage a lot. This would allow a node to spill other nodes' contents to disk to avoid exceeding work_mem; a rough sketch of such a structure follows below.

I'd love to hear your thoughts and suggestions!

Regards
Arne
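To make the idea concrete, a rough illustration of the kind of per-worker bookkeeping described above (all names hypothetical; this is a design sketch, not an implementation):

	#include "postgres.h"

	#include "nodes/execnodes.h"
	#include "nodes/pg_list.h"

	/*
	 * Hypothetical per-worker accounting: each memory-hungry node
	 * registers itself with its current size and a spill callback, so
	 * that a node about to exceed the global budget can force another
	 * registered node to dump its contents to disk first.
	 */
	typedef struct SpillableNode
	{
		PlanState  *node;			/* the hash or sort node */
		Size		mem_used;		/* bytes currently held by this node */
		void		(*spill) (PlanState *node); /* flush contents to disk */
	} SpillableNode;

	typedef struct WorkerMemBudget
	{
		Size		mem_allowed;	/* global budget for this worker */
		Size		mem_used;		/* sum of mem_used over all nodes */
		List	   *nodes;			/* registered SpillableNode entries */
	} WorkerMemBudget;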
Re: Enforce work_mem per worker
On Sat, Nov 27, 2021 at 04:33:07PM +0000, Arne Roland wrote:
> Hello!
>
> Since I used a lot of my time chasing short-lived processes eating away big chunks of memory in recent weeks, I am wondering about a decent way to go about this. The problem I am facing essentially relates to the fact that work_mem settings, while they are enforced per hash and sort node, aren't enforced globally.
>
> One common case that causes this problem more frequently than a few years ago is the partitionwise_join. If a lot of partitions are hash joined, we get a lot of hash nodes, each one potentially consuming work_mem.
>
> While avoiding OOM seems a big deal to me, my search didn't turn up previous hackers discussions about this. There is a good chance I am missing something here, so I'd appreciate any pointers.

Here's some pointers ;)

https://www.postgresql.org/message-id/flat/20190708164401.GA22387%40telsasoft.com
https://www.postgresql.org/message-id/flat/20191216215314.qvrtgxaz4m755geq%40development#75e9930ac2cd353a8036dc71e8f5e6f7
https://www.postgresql.org/message-id/flat/CAH2-WzmNwV%3DLfDRXPsmCqgmm91mp%3D2b4FvXNF%3DcCvMrb8YFLfQ%40mail.gmail.com

I don't recall reading all of this last one before, and it's got interesting historic value, so I'm reading it myself now...

--
Justin
Re: Non-superuser subscription owners
> On Nov 24, 2021, at 4:30 PM, Jeff Davis wrote:
>
> We need to do permission checking for WITH CHECK OPTION and RLS. The patch right now allows the subscription to write data that an RLS policy forbids.

Thanks for reviewing and for this observation! I can verify that RLS is not being honored on the subscriber side. I agree this is a problem for subscriptions owned by non-superusers.

The implementation of the table sync worker uses COPY FROM, which makes this problem hard to fix, because COPY FROM does not support row level security. We could do some work to honor the RLS policies during the apply workers' INSERT statements, but then some data would circumvent RLS during table sync and other data would honor RLS during worker apply, which would make the implementation not only wrong but inconsistently so.

I think a more sensible approach for v15 is to raise an ERROR if a non-superuser owned subscription is trying to replicate into a table which has RLS enabled. We might try to be more clever and check whether the RLS policies could possibly reject the operation (by comparing the TO and FOR clauses of the policies against the role and operation type), but that seems like a partial re-implementation of RLS. It would be simpler and more likely correct if we just unconditionally reject replicating into tables which have RLS enabled. What do you think?

> A couple other points:
>
> * We shouldn't refer to the behavior of previous versions in the docs unless there's a compelling reason

Fair enough.

> * Do we need to be smarter about partitioned tables, where an insert can turn into an update?

Do you mean an INSERT statement with an ON CONFLICT DO UPDATE clause that is running against a partitioned table? If so, I don't think we need to handle that on the subscriber side under the current logical replication design. I would expect the plain INSERT or UPDATE that ultimately executes on the publisher to be what gets replicated to the subscriber, and not the original INSERT .. ON CONFLICT DO UPDATE statement.

> * Should we refactor to borrow logic from ExecInsert so that it's less likely that we miss something in the future?

Hooking into the executor at a higher level, possibly ExecInsert or ExecModifyTable, would do a lot more than what logical replication currently does. If we also always used INSERT/UPDATE/DELETE statements and never COPY FROM statements, we might solve several problems at once, including honoring RLS policies and honoring rules defined for the target table on the subscriber side. Doing this would clearly be a major design change, and possibly one we do not want. Can we consider this out of scope?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
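To make the proposed ERROR concrete, a minimal sketch of the kind of guard that could run where the apply/sync workers open the target relation (placement, function name, and message are hypothetical; check_enable_rls() and superuser_arg() are existing backend helpers):

	#include "postgres.h"

	#include "miscadmin.h"
	#include "utils/rel.h"
	#include "utils/rls.h"

	/*
	 * Hypothetical guard: refuse to replicate into an RLS-protected table
	 * unless the subscription owner is a superuser (who bypasses RLS anyway).
	 */
	static void
	check_relation_rls(Relation rel, Oid subowner)
	{
		if (!superuser_arg(subowner) &&
			check_enable_rls(RelationGetRelid(rel), InvalidOid, false) == RLS_ENABLED)
			ereport(ERROR,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("user \"%s\" cannot replicate into relation \"%s\" with row-level security enabled",
							GetUserNameFromId(subowner, true),
							RelationGetRelationName(rel))));
	}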
Re: rand48 replacement
Hello Tom,

Thanks for the feedback.

>> +/* use Donald Knuth's LCG constants for default state */
>
> How did Knuth get into this? This algorithm is certainly not his, so why are those constants at all relevant?

They are not more nor less relevant than any other "random" constant. The state needs a default initialization. The point of using DK's is that it somehow cannot be some specially crafted value which would have some special property only known to the purveyor of the constant and could be used by them to break the algorithm.

https://en.wikipedia.org/wiki/Dual_EC_DRBG

> * I could do without the stream-of-consciousness notes in pg_prng.c. I think what's appropriate is to say "We use thus-and-such a generator which is documented here", maybe with a line or two about its properties.

The stuff was really written essentially as a "why this" for the first patch, and to prevent questions about "why not this other generator" later, because it could never stop.

> * Function names like these convey practically nothing to readers:
>
> +extern int64 pg_prng_i64(pg_prng_state *state);
> +extern uint32 pg_prng_u32(pg_prng_state *state);
> +extern int32 pg_prng_i32(pg_prng_state *state);
> +extern double pg_prng_f64(pg_prng_state *state);
> +extern bool pg_prng_bool(pg_prng_state *state);

The intention is obviously "postgres pseudo-random number generator for <type>". ISTM that it conveys (1) that it is postgres-specific stuff, (2) that it is a PRNG (which I find *MUCH* more informative than the misleading statement that something is random when it is not, and it is shorter), and (3) the type it returns, because C requires functions to have distinct names. What would you suggest?

> and these functions' header comments add a grand total of zero bits of information.

Yes, probably. I do not like not to comment at all on a function.

> What someone generally wants to know first about a PRNG is (a) is it uniform and (b) what is the range of outputs, neither of which are specified anywhere.

ISTM (b) is suggested by the type, and for (a) I'm not sure about a PRNG which would not at least claim to be uniform. Non-uniform PRNGs are usually built on top of a uniform one. What do you suggest as alternate names?

> +#define FIRST_BIT_MASK UINT64CONST(0x8000000000000000)
> +#define RIGHT_HALF_MASK UINT64CONST(0x00000000FFFFFFFF)
> +#define DMANTISSA_MASK UINT64CONST(0x000FFFFFFFFFFFFF)
>
> I'm not sure that these named constants are any more readable than writing the underlying constant, maybe less so --- in particular I think something based on (1<<52)-1 would be more appropriate for the float mantissa operations. We don't need RIGHT_HALF_MASK at all, the casts to uint32 or int32 will accomplish that just fine.

Yep. I did it for uniformity.

> BTW, why are we bothering with FIRST_BIT_MASK in the first place, rather than returning "v & 1" for pg_prng_bool?

Because some PRNGs are very bad in the low bits; not the xoroshiro stuff, though.

> Is xoroshiro128ss less random in the low-order bits than the higher? If so, that would be a pretty important thing to document. If it's not, we shouldn't make the code look like it is.

Dunno. Why should we prefer low bits?

> + * select in a range with bitmask rejection.
> What is "bitmask rejection"? Is it actually important to callers?

No, but it is important to understand how it does it. That is the name of the technique which is implemented, which helps if you want to understand what is going on by googling it. This point could be moved inside the function.
> I think this should be documented more like "Produce a random integer uniformly selected from the range [rmin, rmax)."

Sure.

--
Fabien.
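For readers who don't want to google it: "bitmask rejection" draws a raw value, masks it down to the smallest power-of-two range covering the target range, and redraws (rejects) values that still fall outside. A minimal sketch, assuming some raw 64-bit generator call (xoroshiro128ss() here is a hypothetical name, not the patch's code):

	#include <stdint.h>

	extern uint64_t xoroshiro128ss(void *state);	/* hypothetical raw generator */

	/* Return a uniform value in [0, range), assuming range > 0. */
	static uint64_t
	prng_uint64_range(void *state, uint64_t range)
	{
		uint64_t	mask = range - 1;
		uint64_t	v;

		/* smear the high bit down: smallest all-ones mask >= range - 1 */
		mask |= mask >> 1;
		mask |= mask >> 2;
		mask |= mask >> 4;
		mask |= mask >> 8;
		mask |= mask >> 16;
		mask |= mask >> 32;

		/*
		 * Masking keeps each draw uniform; rejecting out-of-range values
		 * keeps the result exact.  Expected number of draws is below 2.
		 */
		do
		{
			v = xoroshiro128ss(state) & mask;
		} while (v >= range);

		return v;
	}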
Re: SSL Tests for sslinfo extension
Daniel Gustafsson writes:
> On 17 Jun 2021, at 09:29, Michael Paquier wrote:
>> Hmm. Adding internal dependencies between the tests should be avoided if we can. What would it take to move those TAP tests to contrib/sslinfo instead? This is while keeping in mind that there was a patch aimed at refactoring the SSL test suite for NSS.

> It would be quite invasive as we currently don't provide the SSLServer test harness outside of src/test/ssl, and given how tailored it is today I'm not sure doing so without a rewrite would be a good idea.

I think testing sslinfo in src/test/ssl is fine, while putting its test inside contrib/ would be dangerous, because then the test would be run by default. src/test/ssl is not run by default because the server it starts is potentially accessible by other local users, and AFAICS the same has to be true for an sslinfo test.

So I don't have any problem with this structurally, but I do have a few nitpicks:

* I think the error message added in 0001 should complain about missing password "encryption" not "encoding", no?

* 0002 hasn't been updated for the great PostgresNode renaming.

* 0002 needs to extend src/test/ssl/README to mention that "make installcheck" requires having installed contrib/sslinfo, analogous to similar comments in (eg) src/test/recovery/README.

* 0002 writes a temporary file in the source tree. This is bad; for one thing I bet it fails under VPATH, but in any case there is no reason to risk it. Put it in the tmp_check directory instead (cf temp kdc files in src/test/kerberos/t/001_auth.pl). That's safer and you needn't worry about cleaning it up.

* Hmm ... now I notice that you borrowed the key-file-copying logic from the 001 and 002 tests, but it's just as bad practice there. We should fix them too.

* I ran a code-coverage check and it shows that this doesn't test ssl_issuer_field() or any of the following functions in sslinfo.c. I think at least ssl_extension_info() is complicated enough to deserve a test.

			regards, tom lane
Re: rand48 replacement
Fabien COELHO writes:
>> How did Knuth get into this? This algorithm is certainly not his, so why are those constants at all relevant?

> They are not more nor less relevant than any other "random" constant. The state needs a default initialization. The point of using DK's is that it somehow cannot be some specially crafted value which would have some special property only known to the purveyor of the constant and could be used by them to break the algorithm.

Well, none of that is in the comment, which is probably just as well, because it reads like baseless paranoia. *Any* initialization vector should be as good as any other; if it's not, that's an algorithm fault. (OK, I'll give it a pass for zeroes being bad, but otherwise not.)

>> * Function names like these convey practically nothing to readers:
>>
>> +extern int64 pg_prng_i64(pg_prng_state *state);
>> +extern uint32 pg_prng_u32(pg_prng_state *state);
>> +extern int32 pg_prng_i32(pg_prng_state *state);
>> +extern double pg_prng_f64(pg_prng_state *state);
>> +extern bool pg_prng_bool(pg_prng_state *state);

> The intention is obviously "postgres pseudo-random number generator for <type>". ISTM that it conveys (1) that it is postgres-specific stuff, (2) that it is a PRNG (which I find *MUCH* more informative than the misleading statement that something is random when it is not, and it is shorter), and (3) the type it returns, because C requires functions to have distinct names. What would you suggest?

We have names for these types, and those abbreviations are (mostly) not them. Name-wise I'd be all right with pg_prng_int64 and so on, but I still expect that these functions' header comments should be explicit about uniformity and about the precise output range. As an example, it's far from obvious whether the minimum value of pg_prng_int32 should be zero or INT_MIN. (Actually, I suspect you ought to provide both of those cases.) And the output range of pg_prng_float8 is not merely unobvious, but not very easy to deduce from examining the code either; not that users should have to.

>> BTW, why are we bothering with FIRST_BIT_MASK in the first place, rather than returning "v & 1" for pg_prng_bool?

> Because some PRNGs are very bad in the low bits; not the xoroshiro stuff, though.

Good, but then you shouldn't write associated code as if that's still a problem, because you'll cause other people to think it's still a problem and write equally contorted code elsewhere. "v & 1" is a transparent way of producing a bool, while this code isn't.

			regards, tom lane
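To make the naming/documentation suggestion concrete, a hypothetical header along those lines might read (illustrative only, not a committed API):

	typedef struct pg_prng_state pg_prng_state;		/* opaque generator state */

	/* uniform int64 in [PG_INT64_MIN, PG_INT64_MAX] */
	extern int64 pg_prng_int64(pg_prng_state *state);
	/* uniform int64 in [0, PG_INT64_MAX] */
	extern int64 pg_prng_int64p(pg_prng_state *state);
	/* uniform int32 in [PG_INT32_MIN, PG_INT32_MAX] */
	extern int32 pg_prng_int32(pg_prng_state *state);
	/* uniform int32 in [0, PG_INT32_MAX] */
	extern int32 pg_prng_int32p(pg_prng_state *state);
	/* uniform float8 in [0.0, 1.0) */
	extern double pg_prng_float8(pg_prng_state *state);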
Re: Correct handling of blank/commented lines in PSQL interactive-mode history
I wrote:
> We could perhaps finesse that point by deciding that comment lines that are handled this way will never be sent to the server --- but I'm sure people will complain about that, too. I've definitely heard people complain because "--" comments are stripped from what's sent (so I'd look favorably on a patch to undo that).

In hopes of moving this thread forward, I experimented with doing that bit, basically by simplifying the {whitespace} rule in psqlscan.l to be just "ECHO;". That caused massive regression test failures, of which this'll do as a sample:

 --
 -- This should fail
 --
 copy (select * from test1) (t,id) to stdout;
 ERROR:  syntax error at or near "("
-LINE 1: copy (select * from test1) (t,id) to stdout;
+LINE 4: copy (select * from test1) (t,id) to stdout;
                                    ^
 --
 -- Test JOIN

Of course, the problem is that since we're now including the three "--" lines in what's sent to the server, it thinks the "copy" is on line 4. I do not think we want such a behavior change: people don't tend to think that such comments are part of the query.

I then experimented with combining the psqlscan.l change with mainloop.c changes akin to what Greg had proposed, so as to discard leading comments at the level of mainloop.c rather than inside the lexer. I didn't have much luck getting to a behavior that I thought could be acceptable, although maybe with more sweat it'd be possible.

One thing I noticed along the line is that because the history mechanism records raw input lines while psqlscan.l discards dash-dash comments, it's already the case that history entries don't match up too well with what's sent to the server. So I'm not sure that my upthread complaint about that holds water, and I'm less against Greg's original patch than I was.

Trying to gather together the various issues mentioned on this thread, I see:

* Initial input lines that are blank (whitespace, maybe including a comment) are merged into the next command's history entry; but since said lines don't give rise to any text sent to the server, there's not really any reason why they couldn't be treated as text to be emitted to the history file immediately. This is what Greg originally set out to change. After my experiments mentioned above, I'm quite doubtful that his patch is correct in detail (I'm afraid that it probably emits stuff too soon in some cases), but it could likely be fixed if we could just get agreement that a change of that sort is OK.

* It's not great that dash-dash comments aren't included in what we send to the server. However, changing that is a lot trickier than it looks. I think we want to continue suppressing comments that precede the query proper. Including comments that are within the query text (ahead of the trailing semi) is not so hard, but comments following the semicolon look too much like comments-ahead-of-the-next-query. Perhaps that issue should be left for another day ... although it does feel like it interacts with the first point.

* Misbehavior of M-# was also mentioned. Does anyone object to the draft patch I posted for that?

			regards, tom lane
Re: Deduplicate code updating ControleFile's DBState.
On Fri, Nov 26, 2021 at 2:48 PM Amul Sul wrote:
> > >  	ControlFile.state = DB_SHUTDOWNED;
> > > -	ControlFile.time = (pg_time_t) time(NULL);
> >
> > This one had better not be removed, either, as we require pg_resetwal to guess a set of control file values. Removing the one in RewriteControlFile() is fine, on the contrary.
>
> My bad, sorry for the sloppy change, corrected it in the attached version.

Thanks for the patch. By moving the time update into update_controlfile, the patch ensures that we have the correct last-updated time. Earlier we were missing (in some places) updating the time before calling UpdateControlFile.

Isn't it better if we update ControlFile->time at the end of update_controlfile, after the file write/sync?

Why do we even need UpdateControlFile, which just calls another function? It may be there for usability and readability, but can't the backend code just call update_controlfile(DataDir, ControlFile, true); directly so that a function call cost can be avoided? Otherwise, why can't we make UpdateControlFile an inline function? I'm not sure if any of the compilers will ever optimize by inlining it without the "inline" keyword.

Regards,
Bharath Rupireddy.
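For reference, a minimal sketch of the inline-wrapper variant, assuming the call shown above stays as-is:

	/* hypothetical inline wrapper; lets the compiler elide the extra call */
	static inline void
	UpdateControlFile(void)
	{
		update_controlfile(DataDir, ControlFile, true);
	}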
Re: Unnecessary delay in streaming replication due to replay lag
On Tue, Nov 23, 2021 at 1:39 AM Soumyadeep Chakraborty wrote:
>
> Hi Bharath,
>
> Yes, that thread has been discussed here. Asim had x-posted the patch to [1]. This thread was more recent when Ashwin and I picked up the patch in Aug 2021, so we continued here. The patch has been significantly updated by us, addressing Michael's long outstanding feedback.

Thanks for the patch. I reviewed it a bit; here are some comments:

1) A memory leak: add FreeDir(dir); before returning (a sketch of the fix follows at the end of this list).

+ ereport(LOG,
+ (errmsg("Could not start streaming WAL eagerly"),
+ errdetail("There are timeline changes in the locally available WAL files."),
+ errhint("WAL streaming will begin once all local WAL and archives are exhausted.")));
+ return;
+ }

2) Is there a guarantee that, while we traverse the pg_wal directory to find startsegno and endsegno, no new WAL files arrive from the primary or archive location, and no old WAL files get removed/recycled by the standby? Especially when wal_receiver_start_condition=consistency?

+ startsegno = (startsegno == -1) ? logSegNo : Min(startsegno, logSegNo);
+ endsegno = (endsegno == -1) ? logSegNo : Max(endsegno, logSegNo);
+ }

3) I think the errmsg text format isn't correct. Note that errmsg text starts with lowercase and doesn't end with "." whereas errdetail or errhint starts with uppercase and ends with ".". Please check other messages for reference. The following should be changed:

+ errmsg("Requesting stream from beginning of: %s",
+ errmsg("Invalid WAL segment found while calculating stream start: %s. Skipping.",
+ (errmsg("Could not start streaming WAL eagerly"),

4) I think you also need to have WAL file names in double quotes, something like below:

errmsg("could not close file \"%s\": %m", xlogfname)));

5) That is, it should be ".. stream start: \"%s\", skipping .." here:

+ errmsg("Invalid WAL segment found while calculating stream start: %s. Skipping.",

6) I think the patch can make the startup process significantly slow, especially when there are lots of WAL files in the standby's pg_wal directory. This is because of the overhead StartWALReceiverEagerlyIfPossible adds, i.e. doing two while loops to figure out the start position of the streaming in advance. The startup process might end up looping over the directory rather than doing the important thing of crash recovery or standby recovery.

7) What happens if this new GUC is enabled in case of a synchronous standby? What happens if this new GUC is enabled in case of crash recovery? What happens if this new GUC is enabled in case a restore command is set, i.e. the standby performing archive recovery?

8) How about the bgwriter/checkpointer, which gets started even before the startup process (or a new bg worker? of course, that's going to be overkill), finding out the new start pos for the startup process? Then we could get rid of the startup behaviour of the patch. This avoids an extra burden on the startup process. Many times, users will complain about why recovery is taking more time now, after setting wal_receiver_start_condition=startup.

9) I think we can just have 'consistency' and 'exhaust' behaviours and let the bgwriter or checkpointer find out the start position for the startup process, so that the startup process, whenever it reaches a consistent point, sees whether the other process has calculated the start pos for it or not; if yes, it starts the walreceiver, otherwise it goes on with its usual recovery. I'm not sure if this will be a good idea.

10) Can we have a better GUC name than wal_receiver_start_condition? Something like wal_receiver_start_at or wal_receiver_start or some other?

Regards,
Bharath Rupireddy.
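For item 1, the fix would presumably look like this (a sketch; the errmsg is also lowercased per item 3, and the surrounding code is abridged):

	ereport(LOG,
			(errmsg("could not start streaming WAL eagerly"),
			 errdetail("There are timeline changes in the locally available WAL files."),
			 errhint("WAL streaming will begin once all local WAL and archives are exhausted.")));
	FreeDir(dir);				/* don't leak the directory handle */
	return;
}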
Re: pg_waldump stucks with options --follow or -f and --stats or -z
On Fri, Nov 26, 2021 at 03:47:30PM +0530, Bharath Rupireddy wrote:
> On Fri, Nov 26, 2021 at 11:50 AM Michael Paquier wrote:
>> +        When --follow is used with --stats and
>> +        the pg_waldump is terminated or interrupted
>> +        with signal SIGINT or SIGTERM or SIGQUIT,
>> +        the summary statistics computed as of the termination will be displayed.
>>
>> This description is not completely correct, as the set of stats would show up only by using --stats, in non-quiet mode. Rather than describing this behavior at the end of the docs, I think that it would be better to add a new paragraph in the description of --stats.
>
> Done.

v4 is just moving this paragraph to its correct subsection. My wording may have been confusing here, sorry about that. What I meant is that there is no need to mention --follow at all, as an interruption done before reaching the end LSN or the end of WAL would still print out the stats accumulated up to the interrupted point.

--
Michael

signature.asc
Description: PGP signature
Re: Deduplicate code updating ControleFile's DBState.
On Sun, Nov 28, 2021 at 07:53:13AM +0530, Bharath Rupireddy wrote:
> Isn't it better if we update ControlFile->time at the end of update_controlfile, after the file write/sync?

I don't quite understand your point here. We want to update the control file's timestamp when it is written, before calculating its CRC.

> Why do we even need UpdateControlFile, which just calls another function? It may be there for usability and readability, but can't the backend code just call update_controlfile(DataDir, ControlFile, true); directly so that a function call cost can be avoided? Otherwise, why can't we make UpdateControlFile an inline function? I'm not sure if any of the compilers will ever optimize by inlining it without the "inline" keyword.

I would leave it as-is, as UpdateControlFile() is a public API old enough to vote (a70e74b0). Anyway, that's a useful wrapper for the backend.

--
Michael

signature.asc
Description: PGP signature
Re: SSL Tests for sslinfo extension
On Sat, Nov 27, 2021 at 02:27:19PM -0500, Tom Lane wrote:
> I think testing sslinfo in src/test/ssl is fine, while putting its test inside contrib/ would be dangerous, because then the test would be run by default. src/test/ssl is not run by default because the server it starts is potentially accessible by other local users, and AFAICS the same has to be true for an sslinfo test.

Ah, indeed, good point. I completely forgot that we'd better control this stuff with PG_TEST_EXTRA.

--
Michael

signature.asc
Description: PGP signature
Re: pg_waldump stucks with options --follow or -f and --stats or -z
On Sun, Nov 28, 2021 at 9:50 AM Michael Paquier wrote:
> v4 is just moving this paragraph to its correct subsection. My wording may have been confusing here, sorry about that. What I meant is that there is no need to mention --follow at all, as an interruption done before reaching the end LSN or the end of WAL would still print out the stats accumulated up to the interrupted point.

Thanks. Here's the v5.

Regards,
Bharath Rupireddy.

v5-0001-pg_waldump-emit-stats-when-interrupted-with-SIGIN.patch
Description: Binary data
Re: enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes
On Mon, Nov 15, 2021 at 10:27 PM Bharath Rupireddy wrote:
>
> On Mon, Nov 15, 2021 at 10:04 PM vignesh C wrote:
> >
> > On Mon, Nov 15, 2021 at 7:47 AM Bharath Rupireddy wrote:
> > >
> > > On Fri, Nov 5, 2021 at 11:12 AM Bharath Rupireddy wrote:
> > > > PSA v2 patch and review it.
> > >
> > > I've modified the docs part a bit, please consider v3 for review.
> >
> > Thanks for the updated patch. Few comments:
> > 1) Should we change "CHECK_FOR_INTERRUPTS()" to
> > "CHECK_FOR_INTERRUPTS() or process specific interrupt handlers"
>
> Done.
>
> > 2) Should we mention the Postmaster process also along with the logger and
> > statistics collector process
> > +        backend or the
> > +        WAL sender or the
> > +        auxiliary process
> > +        with the specified process ID. All of the
> > +        auxiliary processes
> > +        are supported except the <link linkend="glossary-logger">logger</link>
> > +        and the <link linkend="glossary-stats-collector">statistics collector</link>
> > +        as they are not connected to shared memory the function can not make requests.
> > +        The backtrace will be logged at LOG message level.
> > +        They will appear in the server log based on the log configuration set
> > +        (See <xref linkend="runtime-config-logging"/> for more information),
> > +        but will not be sent to the client regardless of
>
> Done.
>
> Attaching v4 patch, please review it further.

One small comment:
1) There should be a space in between "LOG" and "message level":

+        it can) for memory contexts. These memory contexts will be logged at
+        LOGmessage level. They will appear in the server log
+        based on the log configuration set (See <xref linkend="runtime-config-logging"/>
+        for more information), but will not be sent to the client regardless of

The rest of the patch looks good to me.

Regards,
Vignesh
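For context, with the patch applied the existing function could presumably be pointed at an auxiliary process too; a hypothetical example:

	-- Log the memory contexts of the checkpointer (an auxiliary process);
	-- the output goes to the server log at LOG level, not to the client.
	SELECT pg_log_backend_memory_contexts(pid)
	FROM pg_stat_activity
	WHERE backend_type = 'checkpointer';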
Re: Synchronizing slots from primary to standby
On Sun, Oct 31, 2021 at 3:38 PM Peter Eisentraut wrote:
>
> I want to reactivate $subject. I took Petr Jelinek's patch from [0], rebased it, added a bit of testing. It basically works, but as mentioned in [0], there are various issues to work out.
>
> The idea is that the standby runs a background worker to periodically fetch replication slot information from the primary. On failover, a logical subscriber would then ideally find up-to-date replication slots on the new publisher and can just continue normally.
>
> The previous thread didn't have a lot of discussion, but I have gathered from off-line conversations that there is a wider agreement on this approach. So the next steps would be to make it more robust and configurable and documented. As I said, I added a small test case to show that it works at all, but I think a lot more tests should be added. I have also found that this breaks some seemingly unrelated tests in the recovery test suite. I have disabled these here. I'm not sure if the patch actually breaks anything or if these are just differences in timing or implementation dependencies. This patch adds a LIST_SLOTS replication command, but I think this could be replaced with just a SELECT FROM pg_replication_slots query now. (This patch is originally older than when you could run SELECT queries over the replication protocol.)
>
> So, again, this isn't anywhere near ready, but there is already a lot here to gather feedback about how it works, how it should work, how to configure it, and how it fits into an overall replication and HA architecture.
>
> [0]: https://www.postgresql.org/message-id/flat/3095349b-44d4-bf11-1b33-7eefb585d578%402ndquadrant.com

Thanks for working on this patch. This feature will be useful as it avoids manual intervention during failover. Here are some thoughts:

1) Instead of a new LIST_SLOTS command, can't we use READ_REPLICATION_SLOT (slight modifications need to be made to make it support logical replication slots and to get more information from the subscriber)? A sketch of the pg_replication_slots alternative follows below.

2) How frequently is the new bg worker going to sync the slot info? How can it ensure that the latest information exists, say, when the subscriber is down/crashed before it picks up the latest slot information?

3) Instead of the subscriber pulling the slot info, why can't the publisher (via the walsender or a new bg worker maybe?) push the latest slot info? I'm not sure we want to add more functionality to the walsender; if yes, isn't it going to be much simpler?

4) IIUC, the proposal works only for logical replication slots, but do you also see the need for supporting some kind of synchronization of physical replication slots as well? IMO, we need a better and consistent way for both types of replication slots. If the walsender can somehow push the slot info from the primary (for physical replication slots)/publisher (for logical replication slots) to the standby/subscribers, this will be a more consistent and simplistic design. However, I'm not sure if this design is doable at all.

Regards,
Bharath Rupireddy.
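For reference (per the point above about replacing LIST_SLOTS), the needed information is already exposed through a plain query; a sketch, with the column list abridged:

	-- Fetch the state of all logical slots from the primary, e.g. over a
	-- connection that accepts SQL, instead of a bespoke replication command.
	SELECT slot_name, plugin, database, restart_lsn, confirmed_flush_lsn
	FROM pg_replication_slots
	WHERE slot_type = 'logical';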
Re: enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes
On Sun, Nov 28, 2021 at 12:22 PM vignesh C wrote:
> > Attaching v4 patch, please review it further.
>
> One small comment:
> 1) There should be a space in between "LOG" and "message level":
>
> +        it can) for memory contexts. These memory contexts will be logged at
> +        LOGmessage level. They will appear in the server log
> +        based on the log configuration set (See <xref linkend="runtime-config-logging"/>
> +        for more information), but will not be sent to the client regardless of

Done.

> The rest of the patch looks good to me.

Thanks for the review. Here's the v5 patch.

Regards,
Bharath Rupireddy.

v5-0001-enhance-pg_log_backend_memory_contexts-to-log-mem.patch
Description: Binary data
Re: row filtering for logical replication
On Fri, Nov 26, 2021 at 1:16 PM houzj.f...@fujitsu.com wrote:
> ...
> Based on this direction, I tried to write a top up POC patch(0005) which I'd like to share.
>
> The top up patch mainly did the following things.
>
> * Move the row filter columns invalidation to CheckCmdReplicaIdentity, so that the invalidation is executed only when an actual UPDATE or DELETE is executed on the published relation. It's consistent with the existing check about replica identity.
>
> * Cache the results of the validation for row filter columns in the relcache to reduce the cost of the validation. It's safe because every operation that changes the row filter and replica identity will invalidate the relcache.
>
> Also attach the v42 patch set to keep cfbot happy.

Hi Hou-san. Thanks for providing your "top-up" 0005 patch!

I suppose the goal will be to later merge this top-up with the current 0002 validation patch, but in the meantime here are my review comments for 0005.

==

1) src/include/catalog/pg_publication.h - PublicationInfo

+typedef struct PublicationInfo
+{
+	PublicationActions pubactions;
+
+	/*
+	 * True if pubactions don't include UPDATE and DELETE or
+	 * all the columns in the row filter expression are part
+	 * of replica identity.
+	 */
+	bool		rfcol_valid_for_replid;
+} PublicationInfo;
+

IMO "PublicationInfo" sounded too much like it is about the Publication only, but IIUC it is really *per* Relation publication info, right? So I thought perhaps it should be called more like struct "RelationPubInfo".

==

2) src/include/catalog/pg_publication.h - PublicationInfo

The member "rfcol_valid_for_replid" also seems a little bit mis-named, because in some scenarios (not UPDATE/DELETE) it can be true even if there are no replica identity columns. So I thought perhaps it should be called more like just "rfcols_valid".

Another thing - IIUC this is a kind of "unified" boolean that covers *all* filters for this Relation (across multiple publications). If that is right, then the comment for this member should say something about this.

==

3) src/include/catalog/pg_publication.h - PublicationInfo

This new typedef should be added to src/tools/pgindent/typedefs.list

==

4) src/backend/catalog/pg_publication.c - check_rowfilter_replident

+/*
+ * Check if all the columns used in the row-filter WHERE clause are part of
+ * REPLICA IDENTITY
+ */
+bool
+check_rowfilter_replident(Node *node, Bitmapset *bms_replident)
+{

IIUC here false means "valid" and true means "invalid", which is counter-intuitive to me. So at least the true/false meaning ought to be clarified in the function comment, and/or perhaps also rename the function so that the return meaning is more obvious.

==

5) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

+	pubinfo = RelationGetPublicationInfo(rel);
+

IIUC this pubinfo is palloc'd *every* time by RelationGetPublicationInfo, isn't it? If that is the case, shouldn't CheckCmdReplicaIdentity be doing a pfree(pubinfo)?

==

6) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

+	pubinfo = RelationGetPublicationInfo(rel);
+
+	/*
+	 * if not all columns in the publication row filter are part of the REPLICA
+	 * IDENTITY, then it's unsafe to execute it for UPDATE and DELETE.
+	 */
+	if (!pubinfo->rfcol_valid_for_replid)
+	{
+		if (cmd == CMD_UPDATE)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+					 errmsg("cannot update table \"%s\"",
+							RelationGetRelationName(rel)),
+					 errdetail("Not all row filter columns are not part of the REPLICA IDENTITY")));
+		else if (cmd == CMD_DELETE)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+					 errmsg("cannot delete from table \"%s\"",
+							RelationGetRelationName(rel)),
+					 errdetail("Not all row filter columns are not part of the REPLICA IDENTITY")));

The comment seemed worded in a confusingly negative way.

Before:
+	 * if not all columns in the publication row filter are part of the REPLICA
+	 * IDENTITY, then it's unsafe to execute it for UPDATE and DELETE.

My Suggestion:
It is only safe to execute UPDATE/DELETE when all columns of the publication row filters are part of the REPLICA IDENTITY.

~~

Also, is "publication row filter" really the correct terminology? AFAIK it is more like *all* filters for this Relation across multiple publications, but I have not got a good idea how to word that in a comment. Anyway, I have a feeling this whole idea might be impacted by other discussions in this RF thread.

==

7) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

The error messages have double-negative wording? I think Greg already commented on this same point.

+					 errdetail("Not all row filter columns are not part of the REPLICA IDENTITY")));

==

8) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

But which are the bad filter columns? Previously the Row Filter column validation gave errors for the invalid filter column, but in this top-up patch there is no indicati