Re: Unify drop-by-OID functions
On 2020-05-01 23:31, Ranier Vilela wrote:
> May I suggest improvements?
>
> 1. In case the object is cached, delay table_open until the last moment, so the
> row is locked for as short a time as possible, and close the table as quickly
> as possible.
> 2. In case the object is cached and the tuple is invalid, do not open the table.
> 3. Otherwise, is it possible to call systable_endscan after table_close?

What do you mean by "object is cached"?

In any case, this is a refactoring patch, so significant changes to the internal logic would not really be in scope.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SLRU statistics
On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote: On 2020/05/02 9:08, Tomas Vondra wrote: On Fri, May 01, 2020 at 11:49:51AM +0900, Fujii Masao wrote: On 2020/05/01 3:19, Tomas Vondra wrote: On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote: On 2020/04/02 9:41, Tomas Vondra wrote: Hi, I've pushed this after some minor cleanup and improvements. +static char *slru_names[] = {"async", "clog", "commit_timestamp", + "multixact_offset", "multixact_member", + "oldserxid", "pg_xact", "subtrans", + "other" /* has to be last */}; When I tried pg_stat_slru, I found that it returns a row for "pg_xact". But since there is no "pg_xact" slru ("clog" slru exists instead), "pg_xact" should be removed? Patch attached. Yeah, I think I got confused and accidentally added both "clog" and "pg_xact". I'll get "pg_xact" removed. Thanks! OK, pushed. Thanks! Thanks a lot! But, like the patch that I attached in the previous email does, "pg_xact" should be removed from the description of pg_stat_reset_slru() in monitoring.sgml. Whooops. My bad, will fix. Another thing I found is; pgstat_send_slru() should be called also by other processes than backend? For example, since clog data is flushed basically by checkpointer, checkpointer seems to need to send slru stats. Otherwise, pg_stat_slru.flushes would not be updated. Hmmm, that's a good point. If I understand the issue correctly, the checkpointer accumulates the stats but never really sends them because it never calls pgstat_report_stat/pgstat_send_slru. That's only called from PostgresMain, but not from CheckpointerMain. Yes. I think we could simply add pgstat_send_slru() right after the existing call in CheckpointerMain, right? Checkpointer sends off activity statistics to the stats collector in two places, by calling pgstat_send_bgwriter(). What about calling pgstat_send_slru() just after pgstat_send_bgwriter()? Yep, that's what I proposed. In previous email, I mentioned checkpointer just as an example. So probably we need to investigate what process should send slru stats, other than checkpointer. I guess that at least autovacuum worker, logical replication walsender and parallel query worker (maybe this has been already covered by parallel query some mechanisms. Sorry I've not checked that) would need to send its slru stats. Probably. Do you have any other process type in mind? Atsushi-san reported another issue in pg_stat_slru. You're planning to work on that? https://postgr.es/m/cacz0uyfe16pjzxqyatn53mspym7dgmpyl3djljjpw69gmcc...@mail.gmail.com Yes, I'll investigate. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
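[Editorial note: for concreteness, a minimal sketch of the change being discussed, assuming the call sites keep their current shape. CheckpointerMain() calls pgstat_send_bgwriter() in two places; the same addition would apply to both.]

    /* Send off activity statistics to the stats collector. */
    pgstat_send_bgwriter();

    /*
     * Sketch: also flush the SLRU statistics this process has accumulated,
     * so that e.g. clog flushes performed by the checkpointer show up in
     * pg_stat_slru.flushes.
     */
    pgstat_send_slru();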
Re: Rotten parts of src/backend/replication/README
On Sat, May 2, 2020 at 8:16 AM Michael Paquier wrote:
>
> Hi all,
>
> The first part of src/backend/replication/README lists all the APIs
> usable for a WAL receiver, but these have aged and lost track of most
> changes that happened over the years.  Four functions are listed in
> the README, with incorrect signatures, and many others are missing:
> - walrcv_connect()
> - walrcv_receive()
> - walrcv_send()
> - walrcv_disconnect()
>
> I think that we should clean that up.

+1.

> And as it seems to me that nobody really remembers to update this README,
> I would suggest updating the first section of the README to refer to
> walreceiver.h for details about each function, then moving the existing
> API descriptions from the README to walreceiver.h (while fixing the
> incomplete descriptions, of course).  This way, if a new function is added
> or an existing function is changed, it is going to be hard to miss an
> update of the function descriptions.
>

I think the README can give a general description of the module and a broad outline of how the different APIs combine to achieve the required functionality, and then refer to the .h/.c files for the API descriptions. The detailed description of each API should live where that API is defined; the header file can keep a generic description. The kind of detailed description I am referring to is this text from the README:

"Retrieve any message available without blocking through the connection. If a message was successfully read, returns its length. If the connection is closed, returns -1. Otherwise returns 0 to indicate that no data is available, and sets *wait_fd to a socket descriptor which can be waited on before trying again. On success, a pointer to the message payload is stored in *buffer. The returned buffer is valid until the next call to walrcv_* functions, and the caller should not attempt to free it."

I think having such a description near the actual definition helps in keeping it updated whenever we change the function.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
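[Editorial note: as an illustration of the contract that description encodes, this is roughly the receive loop it implies. A hedged sketch only; wrconn, process_message() and wait_on_socket() are stand-ins, not the actual walreceiver.c code.]

    for (;;)
    {
        char       *buf;
        pgsocket    wait_fd = PGINVALID_SOCKET;
        int         len = walrcv_receive(wrconn, &buf, &wait_fd);

        if (len > 0)
            process_message(buf, len);  /* buf stays valid until the next walrcv_* call */
        else if (len == 0)
            wait_on_socket(wait_fd);    /* no data available yet; wait, then retry */
        else
            break;                      /* len == -1: the connection was closed */
    }

Keeping that text next to the walrcv_receive() declaration makes it much harder to change the return convention without updating the description.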
Re: Back-patch is necessary? Re: Don't try fetching future segment of a TLI.
On Thu, Apr 30, 2020 at 7:46 PM Fujii Masao wrote:
>
> On 2020/04/08 1:49, Fujii Masao wrote:
> >
> > On 2020/04/07 20:21, David Steele wrote:
> >>
> >> On 4/7/20 3:48 AM, Kyotaro Horiguchi wrote:
> >>> At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao wrote in
> >> This doesn't seem a bug, so I'm thinking to merge this to next *major*
> >> version release, i.e., v13.
> >
> > Not a bug, perhaps, but I think we do consider back-patching
> > performance problems.  The rise in S3 usage has just exposed how poorly
> > this code performed in high-latency environments.
> >
> > I understood the situation and am fine to back-patch that.  But I'm not sure
> > if it's fair to do that.  Maybe we need to hear more opinions about this?
> > OTOH, feature freeze for v13 is today, so what about committing the patch
> > in v13 at first, and then doing the back-patch after hearing opinions and
> > receiving many +1?
>
> >>> +1 for commit only v13 today, then back-patch if people wants and/or
> >>> accepts.
>
> Please let me revisit this.  Currently Grigory Smolkin, David Steele,
> Michael Paquier and Pavel Suderevsky agree to the back-patch and
> there has been no objection to that.  So should we do the back-patch?
> Or does anyone object to that?
>
> I don't think that this is a feature bug, because archive recovery works
> fine from a functional perspective without this commit.  OTOH, I understand
> that, without the commit, there are complaints that archive recovery may be
> unnecessarily slow when archival storage is located remotely, e.g., on
> Amazon S3, and it takes a long time to fetch a non-existent archived WAL
> file.  So I'm OK with the back-patch unless there is a strong objection to it.
>

I don't see any obvious problem with the changed code, but we normally don't
back-patch performance improvements.  That said, the code change here appears
to be straightforward, so it might be fine to back-patch it.  Have we seen
similar reports earlier as well?  AFAIK, this functionality has been around
for a long time, and if people were hitting this regularly we would have seen
such reports multiple times.  In other words, if the chances of hitting this
are low, we could also choose not to back-patch it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WAL usage calculation patch
On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud wrote: > > On Thu, Apr 30, 2020 at 9:18 AM Julien Rouhaud wrote: > > > > On Thu, Apr 30, 2020 at 5:05 AM Amit Kapila wrote: > > > > > > Julien, are you planning to write a cleanup patch for this open item? > > > > Sorry Amit, I've been quite busy at work for the last couple of days. > > I'll take care of that this morning for sure! > > Here's the patch. > Thanks for the patch. I will look into it early next week. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_reset_slru(name) doesn't seem to work as documented
On Fri, May 01, 2020 at 07:10:23PM +0900, Atsushi Torikoshi wrote:
> Hi,
>
> When I tried to reset a counter in pg_stat_slru using pg_stat_reset_slru(name),
> not only the specified counter but all the counters were reset.
>
> postgres=# SELECT * FROM pg_stat_slru ;
>        name       | blks_zeroed | blks_hit | blks_read | blks_written | blks_exists | flushes | truncates |          stats_reset
> ------------------+-------------+----------+-----------+--------------+-------------+---------+-----------+-------------------------------
>  async            |           3 |        0 |         0 |            3 |           0 |       0 |         0 | 2020-05-01 17:36:26.073433+09
>  clog             |           0 |       56 |         0 |            0 |           0 |       0 |         0 | 2020-05-01 17:36:26.073433+09
>  commit_timestamp |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2020-05-01 17:36:26.073433+09
> (snip)
>
> postgres=# SELECT pg_stat_reset_slru('clog');
> postgres=# SELECT * FROM pg_stat_slru ;
>        name       | blks_zeroed | blks_hit | blks_read | blks_written | blks_exists | flushes | truncates |          stats_reset
> ------------------+-------------+----------+-----------+--------------+-------------+---------+-----------+-------------------------------
>  async            |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2000-01-01 09:00:00+09
>  clog             |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2020-05-01 17:37:02.525006+09
>  commit_timestamp |           0 |        0 |         0 |            0 |           0 |       0 |         0 | 2000-01-01 09:00:00+09
> (snip)
>
> Attached a patch.

Yeah, that memset is a left-over from some earlier version of the patch.
Thanks for the report and patch, I'll push this shortly.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
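[Editorial note: for reference, the corrected logic has roughly this shape. A hedged sketch only; the actual function and variable names in pgstat.c may differ slightly. The bug was an extra memset over the whole slruStats array before this loop, which wiped every entry regardless of the requested index.]

    static void
    pgstat_recv_resetslrucounter(PgStat_MsgResetslrucounter *msg, int len)
    {
        int         i;
        TimestampTz ts = GetCurrentTimestamp();

        for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
        {
            /* reset only the requested entry, or all entries if index is -1 */
            if (msg->m_index == -1 || msg->m_index == i)
            {
                memset(&slruStats[i], 0, sizeof(slruStats[i]));
                slruStats[i].stat_reset_timestamp = ts;
            }
        }
    }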
Re: Unify drop-by-OID functions
On Sat, May 2, 2020 at 05:01, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> On 2020-05-01 23:31, Ranier Vilela wrote:
> > May I suggest improvements?
> >
> > 1. In case the object is cached, delay table_open until the last moment, so
> > the row is locked for as short a time as possible, and close the table as
> > quickly as possible.
> > 2. In case the object is cached and the tuple is invalid, do not open the table.
> > 3. Otherwise, is it possible to call systable_endscan after table_close?
>
> What do you mean by "object is cached"?

Well, that's what I deduced from the cacheId variable name.

regards,
Ranier Vilela
Re: pg_stat_reset_slru(name) doesn't seem to work as documented
On Sat, May 02, 2020 at 02:27:33PM +0200, Tomas Vondra wrote:
> On Fri, May 01, 2020 at 07:10:23PM +0900, Atsushi Torikoshi wrote:
> > When I tried to reset a counter in pg_stat_slru using
> > pg_stat_reset_slru(name), not only the specified counter but all the
> > counters were reset.
> > (snip)
> > Attached a patch.
>
> Yeah, that memset is a left-over from some earlier version of the patch.
> Thanks for the report and patch, I'll push this shortly.

Pushed. Thanks for the report.

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: WAL prefetch (another approach)
> On Sat, Apr 25, 2020 at 09:19:35PM +0200, Dmitry Dolgov wrote:
> > On Tue, Apr 21, 2020 at 05:26:52PM +1200, Thomas Munro wrote:
> >
> > One report I heard recently said that if you get rid of I/O stalls,
> > pread() becomes cheap enough that the much higher frequency lseek()
> > calls I've complained about elsewhere[1] become the main thing
> > recovery is doing, at least on some systems, but I haven't pieced
> > together the conditions required yet.  I'd be interested to know if
> > you see that.
>
> At the moment I've performed a couple of tests for replication in the case
> where almost everything is in memory (mostly by mistake: I was expecting
> that a postgres replica within a badly memory-limited cgroup would cause
> more IO, but it looks like the kernel does not evict pages anyway).  Not
> sure if that's what you mean by getting rid of IO stalls, but in these
> tests profiling shows lseek & pread in a similar number of samples.
>
> If I understand correctly, eventually one can measure the prefetching
> influence by looking at redo function execution times (assuming that the
> data they operate on is already prefetched, they should be faster).  I
> still have to clarify the exact reason, but even in the situation
> described above (in memory) there is some visible difference, e.g.

I've finally performed a couple of tests involving more IO.  A not-that-big
dataset of 1.5 GB for the replica, with memory allowing ~1/6 of it to fit,
default prefetching parameters, and an update workload with uniform
distribution.  Rather a small setup, but it causes stable reading into the
page cache on the replica and makes the influence of the patch visible
(more measurement samples tend to fall at lower latencies):

# with patch
Function = b'heap_redo' [206]
     nsecs            : count
      1024 -> 2047    : 0
      2048 -> 4095    : 32833
      4096 -> 8191    : 59476
      8192 -> 16383   : 18617
     16384 -> 32767   : 3992
     32768 -> 65535   : 425
     65536 -> 131071  : 5
    131072 -> 262143  : 326
    262144 -> 524287  : 6

# without patch
Function = b'heap_redo' [130]
     nsecs            : count
      1024 -> 2047    : 0
      2048 -> 4095    : 20062
      4096 -> 8191    : 70662
      8192 -> 16383   : 12895
     16384 -> 32767   : 9123
     32768 -> 65535   : 560
     65536 -> 131071  : 1
    131072 -> 262143  : 460
    262144 -> 524287  : 3

Not that there were any doubts, but at the same time it was surprising to me
how well Linux readahead works in this situation.  The results above were
taken with readahead disabled for the filesystem and the device; without
that there was almost no difference, since a lot of IO was avoided by
readahead (which was in fact the majority of all reads):

# with patch
flags = Read
     usecs            : count
        16 -> 31      : 0
        32 -> 63      : 1
        64 -> 127     : 5

flags = ReadAhead-Read
     usecs            : count
        32 -> 63      : 0
        64 -> 127     : 131
       128 -> 255     : 12
       256 -> 511     : 6

# without patch
flags = Read
     usecs            : count
        16 -> 31      : 0
        32 -> 63      : 0
        64 -> 127     : 4

flags = ReadAhead-Read
     usecs            : count
        32 -> 63      : 0
Re: SLRU statistics
On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:
On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote: ... Another thing I found is; pgstat_send_slru() should be called also by other processes than backend? For example, since clog data is flushed basically by checkpointer, checkpointer seems to need to send slru stats. Otherwise, pg_stat_slru.flushes would not be updated. Hmmm, that's a good point. If I understand the issue correctly, the checkpointer accumulates the stats but never really sends them because it never calls pgstat_report_stat/pgstat_send_slru. That's only called from PostgresMain, but not from CheckpointerMain. Yes. I think we could simply add pgstat_send_slru() right after the existing call in CheckpointerMain, right? Checkpointer sends off activity statistics to the stats collector in two places, by calling pgstat_send_bgwriter(). What about calling pgstat_send_slru() just after pgstat_send_bgwriter()? Yep, that's what I proposed. In previous email, I mentioned checkpointer just as an example. So probably we need to investigate what process should send slru stats, other than checkpointer. I guess that at least autovacuum worker, logical replication walsender and parallel query worker (maybe this has been already covered by parallel query some mechanisms. Sorry I've not checked that) would need to send its slru stats. Probably. Do you have any other process type in mind?

I've looked at places calling pgstat_send_* functions, and I found these places:

src/backend/postmaster/bgwriter.c - AFAIK it merely writes out dirty shared buffers, so likely irrelevant.

src/backend/postmaster/checkpointer.c - This is what we're already discussing here.

src/backend/postmaster/pgarch.c - Seems irrelevant.

I'm a bit puzzled why we're not sending any stats from walsender, which I suppose could do various stuff during logical decoding.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protocol problem with GSSAPI encryption?
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Andrew Gierth (and...@tao11.riddles.org.uk) wrote:
> > > "Peter" == Peter Eisentraut writes:
> > >> It seems to me that this is a bug in ProcessStartupPacket, which
> > >> should accept both GSS or SSL negotiation requests on a connection
> > >> (in either order). Maybe secure_done should be two flags rather than
> > >> one?
> >
> > Peter> I have also seen reports of that. I think your analysis is
> > Peter> correct.
> >
> > I figure something along these lines for the fix. Anyone in a position
> > to test this?
>
> At least at first blush, I tend to agree with your analysis and patch.
>
> I'll see about getting this actually set up and tested in the next week
> or so (and maybe there's some way to also manage to have a regression
> test for it..).

After testing this and playing around with it a bit, I've gone ahead and pushed it.

Thanks!

Stephen
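[Editorial note: the "two flags" idea amounts to something like the following inside ProcessStartupPacket(). A hedged sketch of the approach only, not the committed code; the ssl_done/gss_done parameters and the recursion are illustrative.]

    /* Track each negotiation separately instead of a single secure_done flag. */
    if (proto == NEGOTIATE_SSL_CODE && !ssl_done)
    {
        /* ... perform SSL negotiation ... */
        /* a GSSAPI request or the regular startup packet may still follow */
        return ProcessStartupPacket(port, true, gss_done);
    }
    else if (proto == NEGOTIATE_GSS_CODE && !gss_done)
    {
        /* ... perform GSSAPI negotiation ... */
        /* an SSL request or the regular startup packet may still follow */
        return ProcessStartupPacket(port, ssl_done, true);
    }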
Re: Problems with GSS encryption and SSL in libpq in 12~
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > A quick make check with Postgres 11 and 12 for src/test/ssl/ shows a > lot of difference in run time, using the same set of options with SSL > and the same compilation flags (OpenSSL 1.1.1f, with debugging and > assertions enabled among other things FWIW), with 12 taking up to five > minutes to complete and 11 finishing as a matter of seconds for me. > > I have spent a couple of hours on that, to find out that libpq tries > to initialize a GSS context where the client remains stuck: > #9 0x7fcd839bf72c in krb5_expand_hostname () from > /usr/lib/x86_64-linux-gnu/libkrb5.so.3 > #10 0x7fcd839bf9e0 in krb5_sname_to_principal () from > /usr/lib/x86_64-linux-gnu/libkrb5.so.3 > #11 0x7fcd83ad55b4 in ?? () from > /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 > #12 0x7fcd83ac0a98 in ?? () from > /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 > #13 0x7fcd83ac200f in gss_init_sec_context () from > /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 > #14 0x7fcd8423b24d in pqsecure_open_gss (conn=0x5582fa8cad90) at > fe-secure-gssapi.c:626 > #15 0x7fcd8421cd2b in PQconnectPoll (conn=0x5582fa8cad90) at > fe-connect.c:3165 > #16 0x7fcd8421b311 in connectDBComplete (conn=0x5582fa8cad90) at > fe-connect.c:2182 > #17 0x7fcd84218c1f in PQconnectdbParams (keywords=0x5582fa8cacf0, > values=0x5582fa8cad40, expand_dbname=1) at fe-connect.c:647 > #18 0x5582f8a81c87 in main (argc=8, argv=0x7ffe5ddb9df8) at > startup.c:266 > > However this makes little sense, why would libpq do that in the > context of an OpenSSL connection? Well, makeEmptyPGconn() does that, > which means that libpq would try by default to use GSS just if libpq > is *built* with GSS: > #ifdef ENABLE_GSS > conn->try_gss = true; > #endif Sure, but if you look at what is done with it: /* * If GSSAPI encryption is enabled, then call * pg_GSS_have_cred_cache() which will return true if we can * acquire credentials (and give us a handle to use in * conn->gcred), and then send a packet to the server asking * for GSSAPI Encryption (and skip past SSL negotiation and * regular startup below). */ if (conn->try_gss && !conn->gctx) conn->try_gss = pg_GSS_have_cred_cache(&conn->gcred); In other words, it's trying because a call to gss_acquire_cred() (called from pg_GSS_have_cred_cache()) returned without error, indicating that GSS should be possible to attempt. If you have GSS compiled in, and you've got a credential cache such that gss_acquire_cred() returns true, it seems entirely reasonable that you'd like to connect using GSS encryption. > It is possible to enforce this flag to false by using > gssencmode=disable, but that's not really user-friendly in my opinion > because nobody is going to remember that for connection strings with > SSL settings so a lot of application are taking a performance hit at > connection because of that in my opinion. I think that's also a bad > idea from the start to assume that we have to try GSS by default, as > any new code path opening a secured connection may fail into the trap > of attempting to use GSS if this flag is not reset. Shouldn't we try > to set this flag to false by default, and set it to true only if > necessary depending on gssencmode? A quick hack switching this flag > to false in makeEmptyPGconn() gives back the past performance to > src/test/ssl/, FWIW. We don't just always try to do GSS, that certainly wouldn't make sense- we only try when gss_acquire_cred() comes back without an error. 
As this is part of the initial connection, it's also not possible to decide to do it by "depending on gssencmode", as we haven't talked to the server at all at this point and need to decide if we're going to do GSS encryption or not with the initial packet. Note that this is more-or-less identical to what we do with SSL, and, as you saw, we default to 'prefer' with GSSENCMODE, but you can set it to 'disable' on the client side if you don't want to try GSS, even when you have a client compiled with GSS and you have a credential cache. > Looking around, it seems to me that there is a second issue as of > PQconnectPoll(), where we don't reset the state machine correctly for > try_gss within reset_connection_state_machine, and instead HEAD does > it in connectDBStart(). Not following exactly what you're referring to here, but I see you've sent a follow-up email about this and will respond to that independently. > Also, I have noted a hack as of pqsecure_open_gss() which does that: >/* > * We're done - hooray! Kind of gross, but we need to disable SSL > * here so that we don't accidentally tunnel one over the other. > */ > #ifdef USE_SSL >conn->allow_ssl_try = false; > #endif > And that looks like a rather bad idea to me.. Tunneling SSL over GSS encryption is definitely a bad idea, which is why we prevent that from happening. I'm not sure what the issue here is- are you suggesting that we should s
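[Editorial note: as a purely illustrative aside on the gssencmode=disable workaround discussed above (host and database names are made up), a client that wants SSL but no GSSAPI encryption attempt can say so explicitly in the connection string. Whether applications should have to set this is, of course, exactly what is being debated here.]

    /* sketch: keep SSL, skip the GSSAPI encryption negotiation entirely */
    PGconn *conn = PQconnectdb("host=db.example.com dbname=app "
                               "sslmode=require gssencmode=disable");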
Re: Problems with GSS encryption and SSL in libpq in 12~
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Mon, Apr 06, 2020 at 04:25:57PM +0900, Michael Paquier wrote: > > It is possible to enforce this flag to false by using > > gssencmode=disable, but that's not really user-friendly in my opinion > > because nobody is going to remember that for connection strings with > > SSL settings so a lot of application are taking a performance hit at > > connection because of that in my opinion. I think that's also a bad > > idea from the start to assume that we have to try GSS by default, as > > any new code path opening a secured connection may fail into the trap > > of attempting to use GSS if this flag is not reset. Shouldn't we try > > to set this flag to false by default, and set it to true only if > > necessary depending on gssencmode? A quick hack switching this flag > > to false in makeEmptyPGconn() gives back the past performance to > > src/test/ssl/, FWIW. > > > > Looking around, it seems to me that there is a second issue as of > > PQconnectPoll(), where we don't reset the state machine correctly for > > try_gss within reset_connection_state_machine, and instead HEAD does > > it in connectDBStart(). > > So, a lot of things come down to PQconnectPoll() here. Once the > connection state reached is CONNECTION_MADE, we first try a GSS > connection if try_gss is true, and a SSL connection attempt follows > just after. This makes me wonder about the following things: > - gssencmode is prefer by default, the same as sslmode. Shouldn't we > issue an error if any of them is not disabled to avoid any conflicts > in the client, making the choice of gssencmode=prefer by default a bad > choice? It seems to me that there could be an argument to make > gssencmode disabled by default, and issue an error if somebody > attempts a connection without at least gssencode or sslmode set as > disabled. I don't see why it would make sense to throw an error and require that one of them be disabled. I certainly don't agree that we should disable GSS encryption by default, or that there's any reason to throw an error if both GSS and SSL are set to 'prefer' (as our current default is). > - The current code tries a GSS connection first, and then it follows > with SSL, which is annoying because gssencmode=prefer by default means > that any user would pay the cost of attempting a GSS connection for > nothing (with or even without SSL). Shouldn't we do the opposite > here, by trying SSL first, and then GSS? A GSS connection is only attempted, as mentioned, if your GSS library claims that there's a possibility that credentials could be acquired for the connection. > For now, I am attaching a WIP patch which seems like a good angle of > attack for libpq, meaning that if sslmode and gssencmode are both set, > then we would attempt a SSL connection before attempting GSS, so as > any user of SSL does not pay a performance hit compared to past > versions (I know that src/test/kerberos/ fails with that because > sslmode=prefer is triggered first in PQconnectPoll(), but that's just > to show the idea I had in mind). > > Any thoughts? I don't agree with the assumption that, in the face of having GSS up and running, which actually validates the client and the server, that we should prefer SSL, where our default configuration for SSL does *not* validate properly *either* the client or the server. 
What it sounds like to me is that it'd be helpful for you to review why your
environment has a GSS credential cache (or, at least, gss_acquire_cred()
returns without any error) but GSS isn't properly working.

If you're using OSX, it's possible the issue here is actually the broken and
ridiculously ancient kerberos code that OSX ships with (and which, in another
thread, there's a discussion about detecting and failing to build if it's
detected as it's just outright broken).  If it's not that then it'd be great
to understand more about the specific environment and what things like
'klist' return.

Thanks,

Stephen
Re: Fixes for two separate bugs in nbtree VACUUM's page deletion
On Thu, Apr 30, 2020 at 12:20 AM Masahiko Sawada wrote: > For the part of treating that case as an index corruption I will need > some time to review because of lacking knowledge of btree indexes. So > I'll review it later. I pushed the refactoring patch today. Thanks for the review. The final test for corruption that I added to btvacuumscan() is less aggressive than what you saw in the patch I posted. We only report corruption when backtracking/recursing if the page is "new", not a leaf page, or is half-dead. We don't treat a fully deleted page as corruption, because there is a case where the same call to btvacuumscan() may have called _bt_pagedel() already, which may have deleted the block that we backtrack/recurse to. The "sibling links cannot point to a deleted page without concurrent deletion, and we know that can't happen because we are VACUUM" stuff doesn't really apply -- we remember which block we will recurse to *before* we actually call _bt_pagedel(). -- Peter Geoghegan
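[Editorial note: to make the description above concrete, the backtracking check has roughly this shape. A hedged sketch only; backtracking, blkno, opaque and the message text are illustrative, and the committed btvacuumscan() code differs in detail.]

    /*
     * When we reach this block by backtracking, only a live leaf page or a
     * fully deleted page is expected; a new, half-dead, or non-leaf page
     * indicates corruption (or a bug).
     */
    if (backtracking &&
        (PageIsNew(page) || P_ISHALFDEAD(opaque) ||
         (!P_ISDELETED(opaque) && !P_ISLEAF(opaque))))
        ereport(LOG,
                (errcode(ERRCODE_INDEX_CORRUPTED),
                 errmsg("found inconsistent block %u while backtracking in index \"%s\"",
                        blkno, RelationGetRelationName(rel))));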
Re: Failed test 'pg_recvlogical acknowledged changes, nothing pending on slot'
On Mon, Jan 29, 2018 at 10:09:49AM +1300, Thomas Munro wrote: > https://travis-ci.org/postgresql-cfbot/postgresql/builds/334334417 > > # Failed test 'pg_recvlogical acknowledged changes, nothing pending on slot' > # at t/006_logical_decoding.pl line 91. > # got: 'BEGIN > # table public.decoding_test: INSERT: x[integer]:1 y[text]:'1' > # table public.decoding_test: INSERT: x[integer]:2 y[text]:'2' > # table public.decoding_test: INSERT: x[integer]:3 y[text]:'3' > # table public.decoding_test: INSERT: x[integer]:4 y[text]:'4' > # COMMIT' > # expected: '' > # Looks like you failed 1 test of 16. The problem is this StreamLogicalLog() code: if (PQresultStatus(res) == PGRES_COPY_OUT) { /* * We're doing a client-initiated clean exit and have sent CopyDone to * the server. We've already sent replay confirmation and fsync'd so * we can just clean up the connection now. */ goto error; } Once pg_recvlogical receives the XLogData containing the sought-after end position, that code makes pg_recvlogical exit without draining the remainder of the backend messages. If pg_recvlogical exits quickly enough, the backend send()s libpq messages after pg_recvlogical disconnects, which can cause internal_flush() to fail with EPIPE ("LOG: could not send data to client: Broken pipe"). If that precedes LogicalConfirmReceivedLocation(), the test fails as you experienced. Such failure happened once on the buildfarm[1]; post-LogicalConfirmReceivedLocation() EPIPE also happens[2]. The first attached patch causes this failure to happen almost every run. The fix (second attachment) is to call PQgetCopyData() until no records remain, then issue PQresultStatus() again[3]. This closes an additional defect, described in the log message. I looked at the other instances of "Broken pipe" in a couple of check-world runs. Clients might prevent those with cleaner shutdown on error, but it's cosmetic. They appeared in cases where the client or the server had already recognized some other failure, whereas $SUBJECT taints a successful run. This led me to notice other pg_recvlogical bugs, which I left unchanged: 1) An "unexpected termination of replication stream" error doesn't preclude exit(0). 2) sigint_handler() doesn't trigger a PQputCopyEnd(). The connection status remains PGRES_COPY_BOTH, prompting this weird message: $ pg_recvlogical --create-slot --start -S foo -d postgres -f- && echo success ^Cpg_recvlogical: error: unexpected termination of replication stream: success Other aspects of signal handling surprised me, but they may not be bugs. The documentation says that --start continues "until terminated by a signal". We don't trap SIGTERM, just SIGINT (initiate clean exit) and SIGHUP (reopen output file). pg_recvlogical copied SIGINT behavior from pg_receivewal, and the pg_receivewal documentation is specific about signals. Both programs react to SIGINT with exit(0), whether or not they reached --endpos. sigint_handler() doesn't trigger a flushAndSendFeedback(), so the slot's next pg_recvlogical will repeat messages that followed the last fsync/feedback. 3) sendFeedback() wants s/last_fsync_lsn !=/last_fsync_lsn ==/. This just changes the volume of feedback messages. 
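[Editorial note: for illustration, the drain-then-check pattern the fix applies looks roughly like this. A hedged and heavily simplified sketch; the real StreamLogicalLog() interleaves this with output handling and feedback.]

    /*
     * After sending CopyDone, keep consuming messages until the server
     * reports the end of COPY, then fetch the final command status.
     */
    for (;;)
    {
        char   *copybuf = NULL;
        int     len = PQgetCopyData(conn, &copybuf, 0);

        if (len > 0)
        {
            /* a late XLogData or keepalive message; discard it in this sketch */
            PQfreemem(copybuf);
            continue;
        }
        if (len == -1)
            break;              /* COPY stream finished cleanly */
        /* len == -2: error while reading from the connection */
        pg_log_error("could not read COPY data: %s", PQerrorMessage(conn));
        return;
    }

    PGresult *res = PQgetResult(conn);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        pg_log_error("unexpected termination of replication stream: %s",
                     PQresultErrorMessage(res));
    PQclear(res);

Draining like this means the server has finished sending before the client disconnects, which is what provoked the server-side EPIPE in the first place.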
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=frogfish&dt=2020-03-07%2018%3A49%3A34 [2] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=florican&dt=2020-04-27%2017%3A25%3A08&stg=recovery-check [3] https://www.postgresql.org/docs/devel/libpq-copy.html writes "After PQgetCopyData returns -1, call PQgetResult to obtain the final result status of the COPY command. One can wait for this result to be available in the usual way. Then return to normal operation." Author: Noah Misch Commit: Noah Misch (not for commit) Reproduce failure seen here: https://postgr.es/m/CAEepm=1MzM2Z_xNe4foGwZ1a+MO_2S9oYDq3M5D11=jdu_+...@mail.gmail.com https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=frogfish&dt=2020-03-07%2018%3A49%3A34 diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c index 7717bb2..2268009 100644 --- a/src/backend/libpq/pqcomm.c +++ b/src/backend/libpq/pqcomm.c @@ -1448,7 +1448,8 @@ internal_flush(void) last_reported_send_errno = errno; ereport(COMMERROR, (errcode_for_socket_access(), -errmsg("could not send data to client: %m"))); +errmsg("could not send data to client: %m"), +errbacktrace())); } /* diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index ae4a9cb..d660038 100644 ---
Re: pg_stat_reset_slru(name) doesn't seem to work as documented
On Sat, May 2, 2020 at 11:05 PM Tomas Vondra wrote:
> Pushed. Thanks for the report.

Thanks!

-- 
Atsushi Torikoshi
Re: design for parallel backup
Hi, On 2020-05-01 16:32:15 -0400, Robert Haas wrote: > On Thu, Apr 30, 2020 at 6:06 PM Robert Haas wrote: > > On Thu, Apr 30, 2020 at 3:52 PM Andres Freund wrote: > > > Why 8kb? That's smaller than what we currently do in pg_basebackup, > > > afaictl, and you're actually going to be bottlenecked by syscall > > > overhead at that point (unless you disable / don't have the whole intel > > > security mitigation stuff). > > > > I just picked something. Could easily try other things. > > I tried changing the write size to 64kB, keeping the rest the same. > Here are the results: > > filesystem media 1@64GB 2@32GB 4@16GB 8@8GB 16@4GB > xfs mag 65 53 64 74 79 > ext4 mag 96 68 75 303 437 > xfs ssd 75 43 29 33 38 > ext4 ssd 96 68 63 214 254 > spread spread n/a n/a 43 38 40 > > And here again are the previous results with an 8kB write size: > > xfs mag 97 53 60 67 71 > ext4 mag 94 68 66 335 549 > xfs ssd 97 55 33 27 25 > ext4 ssd 116 70 66 227 450 > spread spread n/a n/a 48 42 44 > > Generally, those numbers look better than the previous numbers, but > parallelism still looks fairly appealing on the SSD storage - less so > on magnetic disks, at least in this test. I spent a fair bit of time analyzing this, and my conclusion is that you might largely be seeing numa effects. Yay. I don't have an as large numa machine at hand, but here's what I'm seeing on my local machine, during a run of writing out 400GiB (this is a run with noise on the machine, the benchmarks below are without that). The machine has 192GiB of ram, evenly distributed to two sockets / numa domains. At start I see numastat -m|grep -E 'MemFree|MemUsed|Dirty|Writeback|Active\(file\)|Inactive\(file\)'" MemFree 91908.2092209.85 184118.05 MemUsed 3463.05 4553.33 8016.38 Active(file) 105.46 328.52 433.98 Inactive(file) 68.29 190.14 258.43 Dirty 0.860.901.76 Writeback 0.000.000.00 WritebackTmp0.000.000.00 For a while there's pretty decent IO throughput (all 10s samples): Devicer/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme1n1 0.00 0.00 0.00 0.000.00 0.00 1955.67 2299.32 0.00 0.00 42.48 1203.940.00 0.00 0.00 0.00 0.00 0.000.000.00 82.10 89.33 Then it starts to be slower on a sustained basis: Devicer/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme1n1 0.00 0.00 0.00 0.000.00 0.00 1593.33 1987.85 0.00 0.00 42.90 1277.550.00 0.00 0.00 0.00 0.00 0.000.000.00 67.55 76.53 And then performance tanks completely: Devicer/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme1n1 0.00 0.00 0.00 0.000.00 0.00 646.33 781.85 0.00 0.00 132.68 1238.700.00 0.00 0.00 0.00 0.00 0.000.000.00 85.43 58.63 That amount of degradation confused me for a while, especially because I couldn't reproduce it the more controlled I made the setups. In particular I stopped seeing the same magnitude of issues after pinnning processes to one numa socket (both running and memory). 
After a few seconds: Devicer/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme1n1 0.00 0.00 0.00 0.000.00 0.00 1882.00 2320.07 0.00 0.00 42.50 1262.350.00 0.00 0.00 0.00 0.00 0.000.000.00 79.05 88.07 MemFree 35356.5080986.46 116342.96 MemUsed 60014.7515776.7275791.47 Active(file) 179.44 163.28 342.72 Inactive(file) 58293.1813385.1571678.33 Dirty 18407.50 882.0019289.50 Writeback 235.78 335.43 571.21 WritebackTmp0.000.000.00 A bit later io starts to get slower: Devicer/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme1n1 0.00 0.00 0.00 0.000.00 0.00 1556.30 1898.70 0.00 0.00 40.
Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)
On Sun, Apr 12, 2020 at 01:53:40PM +0200, Fabien COELHO wrote: > About v15, seen as one patch. Thanks for looking. > - I'm wondering whether could pg_stat_file call pg_ls_dir_files without > too much effort? ISTM that the output structure nearly the same. I do > not like much having one function specialized for files and one for > directories. I refactored but not like that. As I mentioned in the commit message, I don't see a good way to make a function operate on a file when the function's primary data structure is a DIR*. Do you ? I don't think it should call stat() and then conditionally branch off to pg_stat_file(). There are two functions because they wrap two separate syscalls, which see as good, transparent goal. If we want a function that does what "ls -al" does, that would also be a good example to follow, except that we already didn't follow it. /bin/ls first stat()s the path, and then either outputs its metadata (if it's a file or if -d was specified) or lists a dir. It's essentially a wrapper around *two* system calls (stat and readdir/getdents). Maybe we could invent a new pg_ls() which does that, and then refactor existing code. Or, maybe it would be a SQL function which calls stat() and then conditionally calls pg_ls_dir if isdir=True (or type='d'). That would be easy if we merge the commit which outputs all stat fields. I'm still hoping for confirmation from a committer if this approach is worth pursuing: https://www.postgresql.org/message-id/20200310183037.GA29065%40telsasoft.com https://www.postgresql.org/message-id/20200313131232.GO29065%40telsasoft.com |Rather than making "recurse into tmpdir" the end goal: | | - add a function to show metadata of an arbitrary dir; | - add isdir arguments to pg_ls_* functions (including pg_ls_tmpdir but not |pg_ls_dir). | - maybe add pg_ls_dir_recurse, which satisfies the original need; | - retire pg_ls_dir (does this work with tuplestore?) | - profit -- Justin >From 6175cecd312296bbb9099834d91ffaa50e059f6c Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 16 Mar 2020 14:12:55 -0500 Subject: [PATCH v16 01/10] Document historic behavior of links to directories.. Backpatch to 9.5: pg_stat_file --- doc/src/sgml/func.sgml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 96ea57eedd..9b885102da 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25486,7 +25486,8 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8'); size, last accessed time stamp, last modified time stamp, last file status change time stamp (Unix platforms only), file creation time stamp (Windows only), and a boolean -indicating if it is a directory. Typical usages include: +indicating if it is a directory (or a symbolic link to a directory). +Typical usages include: SELECT * FROM pg_stat_file('filename'); SELECT (pg_stat_file('filename')).modification; -- 2.17.0 >From 52fd41bfacaba66f26f1e75b34e258b3a0f12372 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 30 Mar 2020 18:59:16 -0500 Subject: [PATCH v16 02/10] pg_stat_file and pg_ls_dir_* to use lstat().. pg_ls_dir_* will now skip (no longer show) symbolic links, same as other non-regular file types, as we advertize we do since 8b6d94cf6. That seems to be the intented behavior, since irregular file types are 1) less portable; and, 2) we don't currently show a file's type except for "bool is_dir". 
pg_stat_file will now 1) show metadata of links themselves, rather than their target; and, 2) specifically, show links to directories with "is_dir=false"; and, 3) not error on broken symlinks. --- doc/src/sgml/func.sgml | 2 +- src/backend/utils/adt/genfile.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 9b885102da..96b08d0500 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25486,7 +25486,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8'); size, last accessed time stamp, last modified time stamp, last file status change time stamp (Unix platforms only), file creation time stamp (Windows only), and a boolean -indicating if it is a directory (or a symbolic link to a directory). +indicating if it is a directory. Typical usages include: SELECT * FROM pg_stat_file('filename'); diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c index ceaa6180da..219ac160f8 100644 --- a/src/backend/utils/adt/genfile.c +++ b/src/backend/utils/adt/genfile.c @@ -370,7 +370,7 @@ pg_stat_file(PG_FUNCTION_ARGS) filename = convert_and_check_filename(filename_t); - if (stat(filename, &fst) < 0) + if (lstat(filename, &fst) < 0) { if (missing_ok && errno == ENOENT) PG_RETURN_NULL(); @@ -596,7 +596,7 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok) /* G
Re: full-text search with GiST indexes seems to be getting slower since 9.1
Tomas Vondra writes: > over the past couple of weeks I've been running various benchmarks with > the intent to demonstrate how the performance evolved since ~8.3. In > most tests we're doing pretty good, but I've noticed that full-text > search using GiST indexes is an annoying exception - it's getting slower > and slower since ~9.1. Hmmm... given that TS_execute is near the top, I wonder if this has anything to do with the several times we've mostly-rewritten that to make it less incorrect for phrase search cases. But that came in in 9.6 IIRC, and the fixes have only been back-patched that far. Don't know what might've changed in 9.3. regards, tom lane
Re: implicit declaration of datumIsEqual in parse_coerce.c
On Tue, Apr 07, 2020 at 10:39:30PM -0400, Tom Lane wrote: > > On Tue, Apr 07, 2020 at 05:16:58PM -0400, Tom Lane wrote: > >> That's inside "#ifdef RANDOMIZE_ALLOCATED_MEMORY", which probably > >> explains why most of us aren't seeing it. My guess is somebody > >> removed an #include without realizing that this chunk of code > >> needed it. > Curiously, there are no buildfarm warnings about this, even though we have > at least one member running with RANDOMIZE_ALLOCATED_MEMORY. Wonder why? The RANDOMIZE_ALLOCATED_MEMORY buildfarm members use xlc, which disables this warning by default. (Given flag -qinfo=pro, it would warn.)
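[Editorial note: for anyone chasing the warning itself, the fix implied upthread is simply to restore the missing include in parse_coerce.c, assuming, as the thread suggests, that datumIsEqual() is the symbol the RANDOMIZE_ALLOCATED_MEMORY block needs.]

    #include "utils/datum.h"    /* for datumIsEqual(), used by the
                                 * RANDOMIZE_ALLOCATED_MEMORY block */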