Re: [PATCH] add relation and block-level filtering to pg_waldump
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hi

I am glad to find this patch here because it helps with my current development work, which involves a lot of debugging with WAL records, and this patch definitely makes that much easier than using grep externally. I have tried all of the new options added by the patch, and every combination seems to produce correct results.

The only comment I would have is on the documentation, where I would replace:

"Display only records touching the given block" with "Display only records associated with the given block"

"Display only records touching the given relation" with "Display only records associated with the given relation"

just to make it sound more formal. :)

best

Cary Huang
--
HighGo Software Canada
www.highgo.ca
Re: pg_rewind copies
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Hello

The patch seems to do as described, and the regression and tap tests are passing.

+ /*
+  * A local source is not expected to change while we're rewinding, so check
+  * that we size of the file matches our earlier expectation.
+  */

Is this a typo?

thanks

Cary
Re: Fix typo about WalSndPrepareWrite
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hi, thanks for the patch, however I don't think it is a typo. The comment describes a technique that is used throughout many functions in walsender.c, which is to fill in the send timestamp as late as possible so it is more accurate. This is done by first reserving a spot in the write stream, writing the actual data, and then finally writing the current timestamp into the reserved spot. This technique is used in WalSndWriteData() and also in XLogSendPhysical()... so it doesn't really matter which function name is mentioned in the comment.

thank you!

---
Cary Huang
HighGo Software (Canada)
Re: [PATCH] Add --create-only option to pg_dump/pg_dumpall
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Hi

I have tried the patch, and the new option is able to limit the contents of the pg_dump output to only the CREATE DATABASE related commands. I also agree that the option name is a little misleading to the user, so instead of "create-only" I would suggest something like "createdb-only", because this option only applies to CREATE DATABASE related commands, not CREATE TABLE or other objects. In the help menu, you can then elaborate that this option "dumps only the commands related to creating the database, such as ALTER, GRANT, etc."

Cary Huang - HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca
Re: Terminate the idle sessions
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

I applied this patch to the PG13 branch, and generally this feature works as described. The new "idle_session_timeout" parameter that controls the idle session disconnection is not in the default postgresql.conf, and I think it should be included there with the default value 0, which means disabled.

There is currently no enforced minimum value for "idle_session_timeout" (except for the value 0, which disables the feature), so a user can put in any value larger than 0, and it could be very small, such as 500 or even 50 milliseconds. This would make any psql connection disconnect shortly after it has connected, which may not be ideal. Many systems I have worked with have a 30-minute inactivity timeout by default, and I think it would be better and safer to enforce a reasonable minimum timeout value.
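For illustration, the entry in the sample postgresql.conf could look something like the lines below; the 30-minute value and the comment wording are only examples of mine, not something the patch currently ships:

    #idle_session_timeout = 0        # in milliseconds; 0 disables the timeout
    #idle_session_timeout = 30min    # e.g. disconnect sessions that stay idle for 30 minutes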
Re: Any objections to implementing LogicalDecodeMessageCB for pgoutput?
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hi

I have tried the patch and it functions as described. The attached tap test case is comprehensive and passes. However, the patch does not apply cleanly on the current master; I had to check out a much earlier commit to be able to apply it correctly. The patch will need to be rebased onto the current master.

Thanks

Cary Huang - HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca
Re: [PATCH] Proof of concept for GUC improvements
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Hi

I quite like the feature this patch provides; it makes special int values more meaningful to users. I like that you added a configuration option to output either the original int value or the special value; this adds some flexibility in case returning the special value breaks some old applications.

I scanned through the GUC list and found that the following parameters can potentially be categorized in the "special_disabled0" group, just for your reference:

max_prepared_transactions
vacuum_cost_delay
effective_io_concurrency
max_parallel_workers
max_parallel_maintenance_workers
max_parallel_workers_per_gather
max_logical_replication_workers
max_sync_workers_per_subscription
jit_above_cost
jit_inline_above_cost
jit_optimize_above_cost
log_rotation_age
log_rotation_size
log_transaction_sample_rate

Cary Huang - HighGo Software Canada
www.highgo.ca
Re: add checkpoint stats of snapshot and mapping files of pg_logical dir
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Hi

The patch applies and tests fine. I don't think there is any harm in having a little extra statistical information about the checkpoint process. In fact, it could be useful in identifying a bottleneck during the checkpoint process, as the stats show exactly the time taken to do the file IO in the pg_logical dir.

best

Cary Huang
Re: Setting min/max TLS protocol in clientside libpq
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

Hello

I have applied the patch and did some basic testing with various combinations of min and max TLS protocol versions. Overall the functionality works, and the chosen TLS protocol aligns with the min/max TLS protocol settings on the PG server side.

I agree with Arthur that it makes sense to check the validity of "conn->sslmaxprotocolversion" first before checking whether it is larger than "conn->sslminprotocolversion".

A small suggestion here. I see that the PG server defaults the TLS min version to TLSv1.2 and the max version to none, so by default the server accepts TLSv1.2 and above. I think it also makes sense for the client side to have the same defaults as the server. In the patch, if the client does not supply "sslminprotocolversion", it falls into the "else" statement and sets the TLS min version to "INT_MIN", which is a huge negative number and of course openssl won't set it. I think this else statement can be enhanced a little to set "sslminprotocolversion" to TLSv1.2 by default to match the server, and to provide a warning message that the TLS minimum has defaulted to TLSv1.2.

Cary
HighGo Software Canada
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Hello Sawada and all

I would like to elaborate more on Sehrope and Sawada's discussion about passing a NULL IV to the "pg_cipher_encrypt/decrypt" functions during the kmgr_wrap_key and kmgr_unwrap_key routines in kmgr_utils.c. Openssl implements key wrap according to RFC3394, as Sawada mentioned, and passing NULL makes openssl use the default IV, which equals A6A6A6A6A6A6A6A6. I have confirmed this on my side; a key wrapped with a "NULL" IV can be unwrapped successfully with IV=A6A6A6A6A6A6A6A6, and unwrap will fail if the IV is set to anything other than NULL or A6A6A6A6A6A6A6A6.

I would like to provide some comments on the encryption and decryption routines provided by cipher_openssl.c, which cipher.c and kmgr_utils.c are using. I see that "ossl_cipher_encrypt" calls only "EVP_EncryptInit_ex" and "EVP_EncryptUpdate" to complete the encryption, and the same applies to the decryption routines. Based on my past experience with openssl and the usages documented online, it is highly recommended to use the "init-update-final" cycle to complete the encryption, and I see that the "final" part (EVP_EncryptFinal) is missing. This call properly handles the last block of data, especially when padding is taken into account. The functions still work now because the input is an encryption key whose size is a multiple of the cipher block size, so no padding is used. I think it would be safer to use the proper "init-update-final" cycle for encryption/decryption.

According to the openssl EVP documentation, "EVP_EncryptUpdate" can be called multiple times at different offsets into the input data to be encrypted. I see that "pg_cipher_encrypt" calls "EVP_EncryptUpdate" only once, which makes me assume that the application should invoke "pg_cipher_encrypt" multiple times until the entire data block is encrypted? I am asking because if we were to use "EVP_EncryptFinal" to complete the encryption cycle, then it would be better to let "pg_cipher_encrypt" figure out how many times "EVP_EncryptUpdate" should be called and finalize with "EVP_EncryptFinal" on the last block.

Lastly, I think we are missing a cleanup routine that calls "EVP_CIPHER_CTX_free()" to free up the EVP_CIPHER_CTX when encryption is done.

Thank you

Cary Huang
HighGo Software Canada
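P.S. To illustrate the recommendation, here is a minimal, generic sketch of the init-update-final cycle. This is not the patch's code; the function name, the choice of EVP_aes_256_cbc() (which assumes a 32-byte key and 16-byte IV) and the error handling are placeholders for illustration only.

    #include <openssl/evp.h>

    /* Encrypt inlen bytes from in into out; returns 1 on success, 0 on failure. */
    static int
    example_encrypt(const unsigned char *key, const unsigned char *iv,
                    const unsigned char *in, int inlen,
                    unsigned char *out, int *outlen)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int len = 0, total = 0;

        if (ctx == NULL)
            return 0;

        /* init */
        if (EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv) != 1)
            goto fail;

        /* update: may be called repeatedly for additional chunks of input */
        if (EVP_EncryptUpdate(ctx, out, &len, in, inlen) != 1)
            goto fail;
        total = len;

        /* final: flushes and pads the last block */
        if (EVP_EncryptFinal_ex(ctx, out + total, &len) != 1)
            goto fail;
        total += len;

        *outlen = total;
        EVP_CIPHER_CTX_free(ctx);   /* the cleanup step mentioned above */
        return 1;

    fail:
        EVP_CIPHER_CTX_free(ctx);
        return 0;
    }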
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On Mon, Jan 6, 2020 at 4:43 AM Masahiko Sawada <masahiko.saw...@2ndquadrant.com> wrote:
> On Sat, 4 Jan 2020 at 15:11, cary huang wrote:
> >>
> >> Hello Sawada and all
> >>
> >> I would like to elaborate more on Sehrope and Sawada's discussion on passing NULL IV in "pg_cipher_encrypt/decrypt" functions during kmgr_wrap_key and kmgr_unwrap_key routines in kmgr_utils.c. Openssl implements key wrap according to RFC3394 as Sawada mentioned and passing NULL will make openssl to use default IV, which equals to A6A6A6A6A6A6A6A6. I have confirmed this on my side; A key wrapped with "NULL" IV can be unwrapped successfully with IV=A6A6A6A6A6A6A6A6, and unwrap will fail if IV is set to anything else other than NULL or A6A6A6A6A6A6A6A6.
> >>
>
> Sehrope also suggested me not to use the fixed IV in order to avoid
> getting the same result from the same value. I'm researching it now.
> Also, currently it's using key wrap algorithm[1] but it accepts only
> multiple of 8 bytes as input. Since it's not good for some cases it's
> better to use key wrap with padding algorithm[2] instead, which seems
> available in OpenSSL 1.1.0 or later.

Since the current kmgr only supports AES128 and AES256, the master key generated by kmgr during bootstrap will always be a multiple of 8 bytes. I believe AES keys in general must always be a multiple of 8 bytes. I have not done enough research as to which encryption algorithms would involve keys that are not a multiple of 8 bytes, so I think the current key wrap algorithm is fine.

With the key wrap algorithm defined in RFC3394, the IV is used only as an "initial value" and it has to be static; either we randomize one or we use the default A6A6A6... by passing NULL. This is different from the CTR block cipher mode, which has been selected to encrypt WAL and buffers. In CTR mode, each block requires a different and unique IV as input in order to be secure, and we have agreed to use segment IDs as IVs. For this reason, I think the current key wrap implementation is fine. The least we can do is to generate an IV during bootstrap and store it in the control file, and this generated IV would then be used for all key wrapping / unwrapping purposes instead of the default one.

Best regards

Cary Huang
HighGo Software Canada
Re: [DOC] Add detail regarding resource consumption wrt max_connections
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

I think it is good to warn the user about the increased memory allocation of certain parameters so that they do not abuse them by setting huge values without knowing the consequences. It is true that max_connections can increase the size of the proc array and other resources allocated in shared memory, which also means less memory available for regular data operations.

I am sure this is not the only parameter that affects memory allocation. "max_prepared_xacts" can also affect the shared memory allocation, so the same warning message applies there as well. Maybe there are other parameters with similar effects. Instead of stating that a higher max_connections results in higher allocation, it may be better to tell the user that if the value needs to be set much higher, they should consider increasing the "shared_buffers" setting as well.

thank you

---
Cary Huang
Highgo Software Canada
www.highgo.ca
Re: [Proposal] Allow pg_dump to include all child tables with the root table
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hi

The patch applies fine on the current master branch and works as described. However, I would suggest changing the new option name from "--with-childs" to "--with-partitions" for several reasons. "childs" is grammatically incorrect, and in the PG community the term "partitioned table" is normally used to denote a parent table, while the term "partition" is used to denote a child table under the parent table. We should use these terms to stay consistent with the community. Also, I would rephrase the documentation as:

Used in conjunction with the -t/--table or -T/--exclude-table options to include or exclude partitions of the specified tables, if any.

thank you

Cary Huang
HighGo Software Canada
www.highgo.ca
Re: pgbench - adding pl/pgsql versions of tests
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hi

Thank you for the patch. It can be applied to the current master branch and compiles fine. The feature works as described; I am able to run the plpgsql-tpcb-like and plpgsql-simple-update scripts you have added. But I am not sure about the purpose of --no-function, which prevents the creation of the pl/pgsql functions that the new plpgsql test scripts need. I initialized the pgbench database with --no-function, and the plpgsql-tpcb-like and plpgsql-simple-update scripts then fail to run.

thanks

Cary Huang
===
Highgo Software Canada
www.highgo.ca
Re: sslinfo extension - add notbefore and notafter timestamps
Hi Jacob

> Hi Cary, did you have any thoughts on the timestamptz notes from my last mail?
>
> > It might also be nice to rename
> > ASN1_TIME_to_timestamp().
> >
> > Squinting further at the server backend implementation, should that
> > also be using TimestampTz throughout, instead of Timestamp? It all
> > goes through float8_timestamptz at the end, so I guess it shouldn't
> > have a material impact, but it's a bit confusing.

Sorry, I kind of missed this review comment from your last email. Thanks for bringing it up again. I think it is right to change the backend references of "timestamp" to "timestampTz" for consistency reasons, and I have gone ahead and made the changes. I have also reviewed the wording in the documentation and removed "UTC" from the descriptions. Since the sslinfo extension and pg_stat_ssl both return timestampTz in whatever timezone PostgreSQL is running in, they do not always return UTC timestamps.

Attached is the v10 patch with the above changes. Thanks again for the review.

Best regards

Cary Huang - HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca

v10-0001-Add-notBefore-and-notAfter-to-SSL-cert-info-displ.patch
Description: Binary data
Re: sslinfo extension - add notbefore and notafter timestamps
Thank you for your review again.

> ...but I think Timestamp[Tz]s are stored as microseconds, so we're off
> by a factor of a million. This still works because later we cast to
> double and pass it back through float8_timestamptz, which converts it:

In my test, if I made ASN1_TIME_to_timestamptz return microseconds, the float8_timestamptz() function raised a "timestamp out of range" error, as this function treats the input as seconds.

> But anyone who ends up inspecting the value of
> st_sslstatus->ssl_not_before directly is going to find an incorrect
> timestamp. I think it'd be clearer to store microseconds to begin
> with, and then just use TimestampTzGetDatum rather than the
> DirectFunctionCall1 we have now.

I have also tried TimestampTzGetDatum with ASN1_TIME_to_timestamptz outputting microseconds. No error is raised, but pg_stat_ssl displays incorrect results. The timestamps come out far too large because the extra factor of one million is carried into the timestamp computation as well.

The comments for TimestampTz say:

* Timestamps, as well as the h/m/s fields of intervals, are stored as
* int64 values with units of microseconds. (Once upon a time they were
* double values with units of seconds.)

but it seems to me that many of the timestamp-related functions still treat timestamp or timestampTz as "double values with units of seconds".

Best regards

Cary Huang - HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca
Re: Add last_commit_lsn to pg_stat_database
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hello

I think it is convenient to know the last commit LSN of a database for troubleshooting and monitoring purposes, similar to the "pd_lsn" field in a page header that records the last LSN that modified the page. I am not sure it can help determine the WAL location to resume / catch up from in logical replication, since, as I understand it, the "confirmed_flush_lsn" and "restart_lsn" in a logical replication slot are already there to figure out the resume / catch-up point.

Anyhow, I had a look at the patch, and in general it looks good to me. I also ran it against the CI bot, which turned out fine. Just one small comment though: the patch supports recording the last commit LSN from two-phase commits as well, but the tests do not seem to cover two-phase commit. In my opinion, they should test whether the last commit LSN advances when a prepared transaction is committed, in addition to a regular transaction.

thank you

- Cary Huang
www.highgo.ca
Re: Encoding protection for pgcrypto
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

Hello

I had a look at your patch, which applies fine to PostgreSQL master. I noticed that the new regression tests you have added to test utf-8 encoding fail on my setup (make check) with the following diffs:

---
@@ -13,6 +13,7 @@
 =ivrD
 -END PGP MESSAGE-
 '), '0123456789abcdefghij'), 'sha1');
+ERROR: invalid byte sequence for encoding "UTF8": 0x91
 -- Database encoding protection. Ciphertext source:
 -- printf '\xe0\xe0\xbff' | gpg --batch --passphrase mykey --textmode --armor --symmetric
 select pgp_sym_decrypt(dearmor('
@@ -23,5 +24,5 @@
 =QKy4
 -END PGP MESSAGE-
 '), 'mykey', 'debug=1');
+ERROR: invalid byte sequence for encoding "UTF8": 0xe0 0xe0 0xbf
 \quit
-\endif
---

I am not sure, but it seems that you intentionally provide a text input that produces a non-utf-8-compliant decrypted output, which triggers the error from within the "pg_verifymbstr()" call you added in pgp-pgsql.c? Are these errors expected in your new test case? If so, then the tests should pass instead, because catching an invalid encoding in the decrypted output is the intended behavior.

Generally, I am ok with the extra encoding check after text decryption, but I do not think it is a good idea to just error out and abort the transaction when an invalid encoding character is detected. Text decryption routines may be used quite frequently, and users generally do not expect them to abort the transaction. It may be ok to just give a warning about the invalid character encoding.

thanks

Cary Huang
Highgo Software - Canada
www.highgo.ca
[Patch] add multiple client certificate selection feature
Hello

I would like to share a patch that adds a feature to libpq to automatically select the best client certificate to send to the server (if it requests one). This feature is inspired by this email discussion from years ago: https://www.postgresql.org/message-id/200905081539.n48Fdl2Y003286%40no.baka.org. It is useful if a libpq client needs to communicate with multiple TLS-enabled PostgreSQL servers with different TLS certificate setups. Instead of letting the application figure out the right certificate for the right server, the patch allows the libpq library itself to pick the most suitable client certificate to send to the server.

Currently, we rely on the "sslcert" and "sslkey" parameters on the client side to select a client certificate + private key to send to the server. The patch adds 2 new options, "sslcertdir" and "sslkeydir", to specify directories where all possible certificate and private key files are stored. The new options cannot be used together with "sslcert" and "sslkey".

The selection of the most suitable certificate is based on the trusted CA names sent by the server in the "Certificate Request" handshake message, obtained by the client with a call to the "SSL_get0_peer_CA_list()" function. This list of trusted CA names tells the client the list of "issuers" that this server can trust. Inside "sslcertdir", if a client certificate candidate's issuer name equals one of the trusted CA names, then that is the certificate to use. Once a candidate certificate is identified, the patch then looks for a matching private key in "sslkeydir". These actions are performed in the certificate callback function (cert_cb), which gets called when the server requests a client certificate during the TLS handshake.

This patch requires OpenSSL version 1.1.1 or later to work. The feature is disabled with older OpenSSL versions.

Attached is a POC patch containing the described feature.

Limitations: One limitation of this feature is that it does not quite support the case where multiple private key files inside "sslkeydir" are encrypted with different passwords. When the client wants to find a matching private key in "sslkeydir", it will always use the same password supplied by the client (via the "sslpassword" option) to decrypt the private key it tries to access. Also, no tap tests have been added to the patch to test this feature yet. So, to test this feature, we need to prepare the environment manually:

1. generate 2 root CA certificates (ca1 and ca2), which sign 2 sets of client and server certificates.
2. configure the server to use a server certificate signed by either ca1 or ca2.
3. put all client certificates and private keys (signed by both ca1 and ca2) into a directory (we will point "sslcertdir" and "sslkeydir" to this directory).
4. based on the root CA certificate configured on the server side, the client will pick the certificate that the server can trust from the specified "sslcertdir" and "sslkeydir" directories.

Please let me know what you think. Any comments / feedback are greatly appreciated.

Best regards

Cary Huang
Highgo Software (Canada)
www.highgo.ca

v1-0001-multiple_client_certificate_selection_support.patch
Description: Binary data
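P.S. For anyone curious how the selection hooks into OpenSSL, below is a minimal sketch of a certificate callback. This is not the patch's actual code: the CandidateCert structure, the way candidates are loaded, and the error handling are simplified placeholders. The callback would be registered with SSL_CTX_set_cert_cb() and picks the first candidate whose issuer matches one of the server's trusted CA names.

    #include <openssl/ssl.h>
    #include <openssl/x509.h>

    typedef struct CandidateCert
    {
        X509     *cert;     /* loaded from sslcertdir (placeholder) */
        EVP_PKEY *key;      /* matching key from sslkeydir (placeholder) */
    } CandidateCert;

    /* Return 1 to continue the handshake, 0 to abort it. */
    static int
    select_client_cert_cb(SSL *ssl, void *arg)
    {
        CandidateCert *candidates = (CandidateCert *) arg;  /* NULL-terminated array */
        const STACK_OF(X509_NAME) *ca_names = SSL_get0_peer_CA_list(ssl);
        int i, j;

        if (ca_names == NULL)
            return 1;           /* server sent no CA names, nothing to match against */

        for (i = 0; candidates[i].cert != NULL; i++)
        {
            X509_NAME *issuer = X509_get_issuer_name(candidates[i].cert);

            for (j = 0; j < sk_X509_NAME_num(ca_names); j++)
            {
                if (X509_NAME_cmp(issuer, sk_X509_NAME_value(ca_names, j)) == 0)
                {
                    /* issuer is trusted by the server: use this cert/key pair */
                    if (SSL_use_certificate(ssl, candidates[i].cert) == 1 &&
                        SSL_use_PrivateKey(ssl, candidates[i].key) == 1)
                        return 1;
                    return 0;
                }
            }
        }
        return 0;               /* no candidate matched */
    }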
Re: [Patch] add multiple client certificate selection feature
Hello

I would like to share a version 2 patch for the multiple client certificate selection feature, with several enhancements over v1. I removed the extra parameters "sslcertdir" and "sslkeydir". Instead, I reuse the existing sslcert, sslkey and sslpassword parameters, but allow multiple entries to be supplied, separated by commas. This way, we are able to use a different sslpassword to decrypt different sslkey files based on the selected certificate, which was not possible in v1.

When a client is doing a TLS handshake with a server that requires a client certificate, the client obtains the list of trusted CA names from the server and tries to match it against the list of certificates provided via the sslcert option. A client certificate is chosen if its issuer matches one of the server's trusted CA names. Once a certificate is chosen, the corresponding private key and sslpassword (if required) are used to establish a secure TLS connection.

The feature is useful when a libpq client needs to communicate with multiple TLS-enabled PostgreSQL server instances with different TLS certificate setups. Instead of letting the application figure out which certificate to send to which server, we can configure all possible certificate candidates in libpq and have it choose the best one to use.

Hello Daniel

Sorry to bother you. I am just wondering what your opinion is about this feature? Should it be added to the commitfest for review? Testing this feature involves certificates issued by different root CAs to exercise its ability to pick the right certificate, so the existing ssl tap test's certificate generation script needs an update to test it. I have not done so yet, because I would like to discuss it with you first. Any comments and recommendations are welcome.

Thank you!

Best regards

Cary Huang

v2-0001-multiple_client_certificate_selection_support.patch
Description: Binary data
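P.S. For illustration only (the host name, file names and passwords below are made up), a v2-style connection with two candidate certificates could be written as a single conninfo string like this:

    host=pgserver dbname=postgres sslmode=verify-full sslcert=ca1_client.crt,ca2_client.crt sslkey=ca1_client.key,ca2_client.key sslpassword=secret1,secret2

libpq would then pick the ca1 or ca2 certificate (and, presumably, the key and password at the same position in the lists) depending on which CA the server says it trusts.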
Re: sslinfo extension - add notbefore and notafter timestamps
Hello

Thank you for the review and your patch. I have tested with the minimum supported OpenSSL version 1.0.2 and incorporated your changes into the v9 patch, as attached.

> In my -08 timezone, the date doesn't match what's recorded either
> (it's my "tomorrow"). I think those probably just need to be converted
> to UTC explicitly? I've attached a sample diff on top of v8 that
> passes tests on my machine.

Yes, I noticed this in the SSL test too. I am also in GMT-8, so for me the tests would fail as well due to the time zone difference from GMT. It should be okay to specifically set the outputs of pg_stat_ssl, ssl_client_get_notbefore, and ssl_client_get_notafter to be in the GMT time zone. The not-before and not-after timestamps in a client certificate are generally expressed in GMT.

Thank you!

Cary Huang - HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca

v9-0001-Add-notBefore-and-notAfter-to-SSL-cert-info-displ.patch
Description: Binary data
typo in paths.h
Hello

I noticed that the comment for the declaration of create_tidscan_paths() in src/include/optimizer/paths.h has a typo. The function is implemented in tidpath.c, not tidpath.h as stated, which does not exist. I have made a small patch to correct it.

Thank you

Cary Huang - HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
http://www.highgo.ca

correct_source_filename.patch
Description: Binary data
Re: Patch: Global Unique Index
On 2022-11-29 6:16 p.m., Tom Lane wrote:
> Assuming that you are inserting into index X, and you've checked index Y to find that it has no conflicts, what prevents another backend from inserting a conflict into index Y just after you look? AIUI the idea is to prevent that by continuing to hold an exclusive lock on the whole index Y until you've completed the insertion. Perhaps there's a better way to do that, but it's not what was described.

During inserts, the global unique index patch does not acquire an exclusive lock on the whole index Y while checking it for uniqueness; it acquires a low-level AccessShareLock on Y and releases it after checking. So while it is checking, another backend can still insert a duplicate into index Y. If that happens, a "transaction level lock" will be triggered.

For example, say backend A inserts into index X and checks index Y to find no conflict, and backend B inserts a conflict into index Y right after. In this case, backend B still has to check index X for conflicts, and it will fetch the duplicate tuple that has been inserted by A, but it cannot declare a duplicate error yet, because the transaction in which backend A inserted that conflicting tuple is still in progress. At this moment, backend B has to wait for backend A to commit / abort before it can continue. This is how the "transaction level lock" prevents concurrent insert conflicts.

There is a chance of deadlock if the conflicting insertions done by A and B happen at roughly the same time, where both backends trigger a "transaction level lock" and wait for each other to commit/abort. In that case, PG's deadlock detection code will error out one of the backends. This should be okay, because it means one of the backends tried to insert a conflict, and the purpose of a global unique index is precisely to error out backends trying to insert duplicates. In the end the effects are the same; it's just that the error says "deadlock detected" instead of "duplicate detected".

If backend B did not insert a conflicting tuple, no transaction lock wait is triggered, and therefore no deadlock will happen.

Regards

Cary Huang
---
HighGo Software Canada
Re: Patch: Global Unique Index
On 2022-11-30 2:30 p.m., Greg Stark wrote:
> On Tue, 29 Nov 2022 at 21:16, Tom Lane wrote:
>> I actually think that that problem should be soluble with a slightly different approach. The thing that feels insoluble is that you can't do this without acquiring sufficient locks to prevent addition of new partitions while the insertion is in progress. That will be expensive in itself, and it will turn ATTACH PARTITION into a performance disaster.
>
> I think there's a lot of room to manoeuvre here. This is a new feature that doesn't need to be 100% complete or satisfy any existing standard. There are lots of options for compromises that leave room for future improvements.
>
> 1) We could just say sure ATTACH is slow if you're attaching an non-empty partition
> 2) We could invent a concept like convalidated and let people attach a partition without validating the uniqueness and then validate it later concurrently
> 3) We could say ATTACH doesn't work now and come up with a better strategy in the future
>
> Also, don't I vaguely recall something in exclusion constraints about having some kind of in-memory "intent" list where you declared that you're about to insert a value, you validate it doesn't violate the constraint and then you're free to insert it because anyone else will see your intent in memory? There might be a need for some kind of global object that only holds inserted keys long enough that other sessions are guaranteed to see the key in the correct index. And that could maybe even be in memory rather than on disk.
>
> This isn't a simple project but I don't think it's impossible as long as we keep an open mind about the requirements.

In the current global unique index implementation, ATTACH can be slow if there are concurrent inserts happening. ATTACH tries to acquire a ShareLock on all existing partitions and the partition-to-be before it scans and sorts them for the uniqueness check, and it releases those locks only after all partitions have been checked. If there are concurrent inserts, ATTACH has to wait for all inserts to complete. Likewise, if ATTACH is in progress, inserts have to wait as well. This is one issue.

If we were to make ATTACH acquire a lower-level lock (AccessShareLock), scan a partition, and then release it, there would be nothing stopping a concurrent insert from inserting a conflict right after the check of that partition finishes. This is another issue; no transaction-level lock is triggered here, unlike in the concurrent inserts case.

Another email thread, "create index concurrently on partitioned index", discusses some approaches that may be used to solve the attach issue here, basically allowing ATTACH PARTITION CONCURRENTLY...

regards

Cary Huang
-
HighGo Software Canada
Re: pg_recvlogical prints bogus error when interrupted
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Hello

The patch applies and tests fine. I like the way both the ready_to_exit and time_to_abort variables are used to control the exit sequence. I think the (void) casts in front of PQputCopyEnd() and PQflush() can be removed for consistency, as omitting them does not produce warnings and such casts are not used anywhere else.

thank you

Cary
Re: [PATCH] Allow Postgres to pick an unused port to listen
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, failed
Spec compliant: not tested
Documentation: not tested

Hello

This is one of those features that benefits a handful of people in specific test cases. It may not benefit the majority of users, but it is certainly not useless either. As long as it can be disabled, and enough tests have been run to ensure it has no significant impact on working components while disabled, it should be fine in my opinion.

Regarding where the selected port shall be saved (postmaster.pid, read by pg_ctl, or saved in a dedicated file): I see that postmaster.pid already contains a port number on line 4, so writing a port number there is nothing new; the port number is already there, and we can simply replace it with the one selected by the system.

I applied and tested the patch and found that the system can indeed start when port is set to 0, but line 4 of postmaster.pid does not store the port number selected by the system; rather, it stores 0, the same as configured. So I am actually not able to find out which port my PG is running on, at least not in a straightforward way.

thank you

==
Cary Huang
HighGo Software
www.highgo.ca
Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory
Hello

pg_receivewal creates this .partial WAL file during WAL streaming and already treats it as a temporary file. It fills the .partial file with zeroes up to 16777216 bytes by default before streaming real WAL data into it. If your .partial file is only 8396800 bytes, this could mean that pg_receivewal was terminated abruptly while it was appending zeroes, or that your system ran out of disk space. Do you have any error message? If this is the case, the uninitialized .partial file should still be all zeroes, so it should be ok to delete it and have pg_receivewal recreate a new .partial file.

Also, in your patch, you are using the pad_to_size argument of the dir_open_for_write function to determine if it needs to create a temp file, but I see that this function is always given pad_to_size = 16777216, and never 0. Am I missing something?

Cary Huang
===
HighGo Software Canada
Re: Add support for AT LOCAL
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hello

I think this feature can be a useful addition for dealing with time zones. I have applied and tried out the patch; the feature works as described and seems promising. The reported compilation failure was probably from CirrusCI building on different platforms. I have run the latest patch in my own Cirrus CI environment and everything checked out fine.

Thank you

Cary Huang
--
Highgo Software Canada
www.highgo.ca
Include sequence relation support in logical replication
Hi

From the PG logical replication documentation, I see that there is a listed limitation that sequence relations are not replicated logically. After some examination, I see that retrieving the next value from a sequence using the nextval() call emits a WAL update only once every 32 calls to nextval(). In fact, when it emits a WAL update, it writes a future value 32 increments ahead of the current one and maintains an internal cache for delivering sequence numbers. It is done this way to minimize writes to the WAL, at the risk of losing some values during a crash. So if we were to replicate the sequence, the subscriber would receive a future value (32 calls to nextval() ahead), which obviously does not reflect the current status. Sequence changes caused by other sequence-related SQL functions like setval() or ALTER SEQUENCE xxx always emit a WAL update, so replicating changes caused by these should not be a problem.

I have shared a patch that allows sequence relations to be supported in logical replication via a decoding plugin (test_decoding for example); it does not support sequence relations in logical replication between a PG publisher and a PG subscriber via the pgoutput plugin, as that will require much more work. For the replication to make sense, the patch actually disables the WAL update at every 32 nextval() calls, so every call to nextval() emits a WAL update for proper replication. This is done by setting SEQ_LOG_VALS to 0 in sequence.c.

I think the question is: should we minimize the WAL update frequency (every 32 calls) for getting the next value in a sequence, at the cost of losing values during a crash, or should we be able to replicate a sequence relation properly, at the cost of more WAL updates?

Cary Huang - HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
http://www.highgo.ca

sequence_replication.patch
Description: Binary data
Re: Include sequence relation support in logical replication
Hi Andres

Thanks for your reply and your patch review. Please see my comments below.

> On 2020-03-24 16:19:21 -0700, Cary Huang wrote:
> > I have shared a patch that allows sequence relation to be supported in
> > logical replication via the decoding plugin ( test_decoding for
> > example ); it does not support sequence relation in logical
> > replication between a PG publisher and a PG subscriber via pgoutput
> > plugin as it will require much more work.
>
> Could you expand on "much more work"? Once decoding support is there,
> that shouldn't be that much?

By much more work, I meant that more source files will need to be changed to have sequence replication supported between a PG subscriber and publisher using the pgoutput plugin — about 10 more source files. The idea is similar though.

> > Sequence changes caused by other sequence-related SQL functions like
> > setval() or ALTER SEQUENCE xxx, will always emit a WAL update, so
> > replicating changes caused by these should not be a problem.
>
> I think this really would need to handle at the very least setval to
> make sense.

Yes, sure.

> > For the replication to make sense, the patch actually disables the WAL
> > update at every 32 nextval() calls, so every call to nextval() will
> > emit a WAL update for proper replication. This is done by setting
> > SEQ_LOG_VALS to 0 in sequence.c
>
> Why is that needed? ISTM updating in increments of 32 is fine for
> replication purposes? It's good imo, because sending out more granular
> increments would increase the size of the WAL stream?

Yes, updating the WAL only every 32 increments is good and has huge performance benefits, according to Michael, but when it is replicated logically to subscribers, the sequence value they receive does not make much sense to them. For example, if I have a sequence called "seq" with current value = 100 and increment = 5, the nextval('seq') call will return 105 to the client but will write 260 to the WAL record ( 100 + (5*32) ), because that is the value after 32 increments. Internally it also maintains a "log_cnt" counter that tracks how many nextval() calls have been invoked since the last WAL write, so it can derive backwards to find the proper next value to return to the client. The subscriber of this sequence, however, will receive a change of 260 instead of 105, which does not represent the current sequence status, and the subscriber is not able to derive backwards because it does not know the increment size in its schema.

Setting SEQ_LOG_VALS to 0 in sequence.c basically disables this 32-increment behavior and makes the WAL update happen at every nextval() call, so the subscriber of this sequence receives the same value (105) as the client, at the cost of more WAL writes.

I would like to ask if you have some suggestions or ideas that would let the subscriber receive the current value without disabling the 32-increment behavior?
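To spell out the arithmetic above in a self-contained form (this is a simplification of mine, not the actual code in sequence.c):

    #include <stdio.h>
    #include <stdint.h>

    #define SEQ_LOG_VALS 32                 /* WAL is written once per 32 fetches */

    int main(void)
    {
        int64_t current = 100;              /* assumed current sequence value */
        int64_t incby   = 5;                /* assumed increment */

        int64_t returned = current + incby;                   /* 105: handed to the client */
        int64_t logged   = current + incby * SEQ_LOG_VALS;    /* 260: written to the WAL   */

        printf("client sees %lld, WAL (and thus the subscriber) sees %lld\n",
               (long long) returned, (long long) logged);
        return 0;
    }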
>> diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
>> index 93c948856e..7a7e572d6c 100644
>> --- a/contrib/test_decoding/test_decoding.c
>> +++ b/contrib/test_decoding/test_decoding.c
>> @@ -466,6 +466,15 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>>                                  &change->data.tp.oldtuple->tuple,
>>                                  true);
>>             break;
>> +       case REORDER_BUFFER_CHANGE_SEQUENCE:
>> +           appendStringInfoString(ctx->out, " SEQUENCE:");
>> +           if (change->data.sequence.newtuple == NULL)
>> +               appendStringInfoString(ctx->out, " (no-tuple-data)");
>> +           else
>> +               tuple_to_stringinfo(ctx->out, tupdesc,
>> +                                   &change->data.sequence.newtuple->tuple,
>> +                                   false);
>> +           break;
>>         default:
>>             Assert(false);
>>     }
>
> You should also add tests - the main purpose of contrib/test_decoding is
> to facilitate writing those...

thanks, I will add

>> +   ReorderBufferXidSetCatalogChanges(ctx->reorder, XLogRecGetXid(buf->record), buf->origptr);
>
> Huh, why are you doing this? That's going to increase overhead of logical
> decoding by many times?

This is needed to allow snapshot to be built inside DecodeCommit() function. Regular changes caused by
Re: Internal key management system
Hi

I had a look at the kms_v9 patch and have some comments.

--> pg_upgrade.c

Keys are copied correctly, but as pg_upgrade progresses further, it will try to start the new cluster from the "issue_warnings_and_set_wal_level()" function, which is called after the key copy. The new cluster will fail to start due to the mismatch between cluster_passphrase_command and the newly copied keys, which causes pg_upgrade to always finish with a failure. We could move "copy_master_encryption_key()" to be called after "issue_warnings_and_set_wal_level()", and this makes pg_upgrade finish successfully, but the user will still have to manually correct the "cluster_passphrase_command" parameter on the new cluster in order for it to start up correctly. Should pg_upgrade also take care of copying "cluster_passphrase_command" from the old to the new cluster after it has copied the encryption keys, so users don't have to do this step? If the expectation is for users to manually correct "cluster_passphrase_command" after a successful pg_upgrade and key copy, then there should be a message reminding users to do so.

--> kmgr.c

+   /*
+    * If there is only temporary directory, it means that the previous
+    * rotation failed after wrapping the all internal keys by the new
+    * passphrase. Therefore we use the new cluster passphrase.
+    */
+   if (stat(KMGR_DIR, &st) != 0)
+   {
+       ereport(DEBUG1,
+               (errmsg("both directories %s and %s exist, use the newly wrapped keys",
+                       KMGR_DIR, KMGR_TMP_DIR)));

I think the error message should say "only the temporary directory exists" instead of "both directories exist".

thanks!

Cary Huang - HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
http://www.highgo.ca

On Wed, 25 Mar 2020 01:51:08 -0700 Masahiko Sawada wrote:

On Tue, 24 Mar 2020 at 23:15, Bruce Momjian <br...@momjian.us> wrote:
>
> On Tue, Mar 24, 2020 at 02:29:57PM +0900, Masahiko Sawada wrote:
> > That seems to work fine.
> >
> > So we will have pg_cryptokeys within PGDATA and each key is stored
> > into separate file named the key id such as "sql", "tde-wal" and
> > "tde-block". I'll update the patch and post.
>
> Yes, that makes sense to me.
>

I've attached the updated patch. With the patch, we have three internal keys: SQL key, TDE-block key and TDE-wal key. Only SQL key can be used so far to wrap and unwrap user secret via pg_wrap and pg_unwrap SQL functions. Each keys is saved to the single file located at pg_cryptokeys. After initdb with enabling key manager, the pg_cryptokeys directory has the following files:

$ ll data/pg_cryptokeys
total 12K
-rw--- 1 masahiko staff 132 Mar 25 15:45
-rw--- 1 masahiko staff 132 Mar 25 15:45 0001
-rw--- 1 masahiko staff 132 Mar 25 15:45 0002

I used the integer id rather than string id to make the code simple. When cluster passphrase rotation, we update all keys atomically using temporary directory as follows:

1. Derive the new passphrase
2. Wrap all internal keys with the new passphrase
3. Save all internal keys to the temp directory
4. Remove the original directory, pg_cryptokeys
5. Rename the temp directory to pg_cryptokeys

In case of failure during rotation, pg_cyrptokeys and pg_cyrptokeys_tmp can be left in an incomplete state. We recover it by checking if the temporary directory exists and the wrapped keys in the temporary directory are valid.

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Let people set host(no)ssl settings from initdb
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hi

I applied the patch "v3-0001-Enable-setting-pg_hba.conf-permissions-from-initd.patch" and did some verification with it. The intended feature works overall, and I think it is quite useful to support default auth methods for the ssl and gss host types. I have also found some minor things in the patch that I would like to share:

> +"# CAUTION: Configuring the system for \"trust\" authentication\n" \
> +"# allows any user who can reach the databse on the route specified\n" \
> +"# to connect as any PostgreSQL user, including the database\n" \
> +"# superuser. If you do not trust all the users who could\n" \
> +"# reach the database on the route specified, use a more restrictive\n" \
> +"# authentication method.\n"

Found a typo: it should be 'database' instead of 'databse'.

> * a sort of poor man's grep -v
> */
> -#ifndef HAVE_UNIX_SOCKETS
> static char **
> filter_lines_with_token(char **lines, const char *token)
> {
> @@ -461,7 +466,6 @@ filter_lines_with_token(char **lines, const char *token)
>
> return result;
> }
> -#endif

I see that you have removed "#ifndef HAVE_UNIX_SOCKETS" around the filter_lines_with_token() function definition so that it is always available; it is used to remove the @tokens@ in case the user does not specify a default auth method for the new hostssl and hostgss options. I think you should also remove the "#ifndef HAVE_UNIX_SOCKETS" around its declaration, so that both the function definition and declaration make sense:

#ifndef HAVE_UNIX_SOCKETS
static char **filter_lines_with_token(char **lines, const char *token);
#endif

Cary Huang - HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca
Re: warn if GUC set to an invalid shared library
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Hello

I tested the patches on the master branch on Ubuntu 18.04 and the regression tests turn out fine. I did a manual test following the query examples in this email thread, and I do see the warnings when I attempted these queries:

postgres=# SET local_preload_libraries=xyz.so;
2022-01-28 15:11:00.592 PST [13622] WARNING: could not access file "xyz.so"
WARNING: could not access file "xyz.so"
SET
postgres=# ALTER SYSTEM SET shared_preload_libraries=abc.so;
2022-01-28 15:11:07.729 PST [13622] WARNING: could not access file "$libdir/plugins/abc.so"
WARNING: could not access file "$libdir/plugins/abc.so"
ALTER SYSTEM

This is fine, as this is what these patches are aiming to provide. However, when I try to restart the server, it fails to start because abc.so and xyz.so do not exist. Setting the parameters "local_preload_libraries" and "shared_preload_libraries" to something else in postgresql.conf does not seem to take effect either. It still complains that the shared_preload_libraries library abc.so does not exist, even though I have already set shared_preload_libraries to something else in postgresql.conf. This seems a little strange to me.

thank you

Cary
Patch: Global Unique Index
Patch: Global Unique Index

"Global unique index" in our definition is a unique index on a partitioned table that can ensure cross-partition uniqueness using a non-partition key. This work is inspired by the email thread "Proposal: Global Index" started back in 2019 (link below). My colleague David and I took a different approach to implement a feature that ensures a uniqueness constraint spanning multiple partitions. We achieved this mainly with application logic, without heavy modification to the current Postgres partitioned table/index structure. In other words, a global unique index and a regular partitioned index are essentially the same in terms of their storage structure, except that one can do a cross-partition uniqueness check and the other cannot.

https://www.postgresql.org/message-id/CALtqXTcurqy1PKXzP9XO%3DofLLA5wBSo77BnUnYVEZpmcA3V0ag%40mail.gmail.com

- Patch -
The attached patches were generated based on commit `85d8b30724c0fd117a683cc72706f71b28463a05` on the master branch.

- Benefits of global unique index -
1. Ensures uniqueness spanning all partitions using a non-partition-key column
2. Allows the user to create a unique index on a non-partition-key column without the need to include the partition key (current Postgres enforces this)
3. Query performance increase when using a single unique non-partition-key column

- Supported Features -
1. Global unique index is supported only on the btree index type
2. Global unique index is useful only when created on a partitioned table
3. Cross-partition uniqueness check with CREATE UNIQUE INDEX in serial and parallel mode
4. Cross-partition uniqueness check with ATTACH in serial and parallel mode
5. Cross-partition uniqueness check on INSERT and UPDATE

- Not-supported Features -
1. Global uniqueness check with sub-partition tables is not yet supported, as we do not have an immediate use case and it may involve major changes to the current implementation

- Global unique index syntax -
A global unique index can be created with the "GLOBAL" and "UNIQUE" clauses in a "CREATE INDEX" statement run against a partitioned table. For example,

CREATE UNIQUE INDEX global_index ON idxpart(bid) GLOBAL;

- New Relkind: RELKIND_GLOBAL_INDEX -
When a global unique index is created on a partitioned table, its relkind is RELKIND_PARTITIONED_INDEX (I). This is the same as creating a regular index. Then Postgres recursively creates an index on each child partition, except that now the relkind is set to RELKIND_GLOBAL_INDEX (g) instead of RELKIND_INDEX (i). This new relkind, along with the uniqueness flag, is needed for the cross-partition uniqueness check later.

- Create a global unique index -
To create a regular unique index on a partitioned table, Postgres has to perform a heap scan and sorting on every child partition. The uniqueness check happens during the sorting phase and raises an error if multiple tuples with the same index key are sorted together. To achieve the global uniqueness check, we make Postgres perform the sorting after all of the child partitions have been scanned, instead of in the "sort per partition" fashion. In other words, the sorting happens only once at the very end, and it sorts the tuples coming from all the partitions, so it can ensure global uniqueness.

In the parallel index build case, the idea is the same, except that the tuples are put into a shared file set (temp files) on disk instead of in memory, so that other workers can share the sort results. At the end of the very last partition build, we make Postgres take over all the temp files and perform a final merge sort to ensure global uniqueness.

Example:

> CREATE TABLE gidx_part(a int, b int, c text) PARTITION BY RANGE (a);
> CREATE TABLE gidx_part1 PARTITION OF gidx_part FOR VALUES FROM (0) to (10);
> CREATE TABLE gidx_part2 PARTITION OF gidx_part FOR VALUES FROM (10) to (20);
> INSERT INTO gidx_part values(5, 5, 'test');
> INSERT INTO gidx_part values(15, 5, 'test');
> CREATE UNIQUE INDEX global_unique_idx ON gidx_part(b) GLOBAL;
ERROR: could not create unique index "gidx_part1_b_idx"
DETAIL: Key (b)=(5) is duplicated.

- INSERT and UPDATE -
For every new tuple inserted or updated, Postgres attempts to fetch the same tuple from the current partition to determine if a duplicate already exists. In the global unique index case, we make Postgres attempt to fetch the same tuple from the other partitions as well as the current partition. If a duplicate is found, global uniqueness is violated and an error is raised.

Example:

> CREATE TABLE gidx_part (a int, b int, c text) PARTITION BY RANGE (a);
> CREATE TABLE gidx_part1 partition of gidx_part FOR VALUES FROM (0) TO (10);
> CREATE TABLE gidx_part2 partition of gidx_part FOR VALUES FROM (10) TO (20);
> CREATE UNIQUE INDEX global_unique_idx ON gidx_part USING BTREE(b) GLOBAL;
> INSERT INTO gidx_part values(5, 5, 'test');
> INSERT INTO gidx_part values(15, 5, 'test');
Re: Patch: Global Unique Index
6 partition of test for values from (500) to (600);
-> create unique index myindex on test(b) global;
-> insert into test values(generate_series(0,599), generate_series(0,599), 'test');   /* record timing */
-> delete from test;   /* record timing */
-> drop index myindex;
-> insert into test values(generate_series(0,599), generate_series(0,599), 'test');
-> create unique index myindex on test(b) global;   /* record timing */
-> create table test7 (a int, b int, c text);
-> insert into test7 values(generate_series(600, 699), generate_series(600, 699), 'test');
-> alter table test attach partition test7 for values from (600) TO (700);   /* record timing */
-> alter table test detach partition test7;   /* record timing */

As you can see, the insert operation suffers the most performance drawback. In fact, it takes roughly 6 times as much time to complete the insertion, which matches the number of partitions in the test. The ATTACH operation also takes roughly 6 times as much time to complete, because it has to perform a uniqueness check on all 6 existing partitions to determine global uniqueness. DETACH in both cases takes the same time to complete. Creating the global unique index takes 35% longer.

We also ran some tests of random SELECT and UPDATE using a non-partition key with pgbench to compare the performance among 3 conditions: no index, regular unique index (with the partition key involved), and global unique index:

Test 1: scale=100, 10 partitions, 1 million tuples/partition
SELECT:
-> No partitioned index: tps = 3.827886
-> regular unique index: tps = 14.713099
-> global unique index: tps = 23791.314238
UPDATE mixed with SELECT:
-> No partitioned index: tps = 1.926013
-> regular unique index: tps = 7.087059
-> global unique index: tps = 2253.098335

Test 2: scale=1,000, 100 partitions, 1 million tuples/partition
SELECT:
-> No partitioned index: tps = 0.110029
-> regular unique index: tps = 0.268199
-> global unique index: tps = 2334.811682
UPDATE mixed with SELECT:
-> No partitioned index: tps = 0.115329
-> regular unique index: tps = 0.197052
-> global unique index: tps = 541.488621

Test 3: scale=10,000, 1,000 partitions, 1 million tuples/partition
SELECT:
-> No partitioned index: tps = 0.011047
-> regular unique index: tps = 0.036812
-> global unique index: tps = 147.189456
UPDATE mixed with SELECT:
-> No partitioned index: tps = 0.008209
-> regular unique index: tps = 0.054367
-> global unique index: tps = 57.740432

Thank you very much, and we hope this information helps clarify some concerns about this approach.

David and Cary
HighGo Software Canada
http://www.highgo.ca

On Mon, 21 Nov 2022 05:33:30 -0700 Simon Riggs wrote ---

> On Thu, 17 Nov 2022 at 22:01, Cary Huang <cary.hu...@highgo.ca> wrote:
> >
> > Patch: Global Unique Index
>
> Let me start by expressing severe doubt on the usefulness of such a
> feature, but also salute your efforts to contribute.
>
> > In other words, a global unique index and a regular partitioned index are
> > essentially the same in terms of their storage structure except that one
> > can do cross-partition uniqueness check, the other cannot.
>
> This is the only workable architecture, since it allows DETACH to be
> feasible, which is essential.
>
> You don't seem to mention that this would require a uniqueness check
> on each partition. Is that correct? This would result in O(N) cost of
> uniqueness checks, severely limiting load speed. I notice you don't
> offer any benchmarks on load speed or the overhead associated with
> this, which is not good if you want to be taken seriously, but at
> least it is recoverable.
>
> (It might be necessary to specify some partitions as READ ONLY, to
> allow us to record their min/max values for the indexed cols, allowing
> us to do this more quickly.)
>
> > - Supported Features -
> > 1. Global unique index is supported only on btree index type
>
> Why? Surely any index type that supports uniqueness is g
Re: Patch: Global Unique Index
On Thu, 24 Nov 2022 08:00:59 -0700 Thomas Kellerer wrote --- > Pavel Stehule schrieb am 24.11.2022 um 07:03: > > There are many Oracle users that find global indexes useful despite > > their disadvantages. > > > > I have seen this mostly when the goal was to get the benefits of > > partition pruning at runtime which turned the full table scan (=Seq > > Scan) > > on huge tables to partition scans on much smaller partitions. > > Partition wise joins were also helpful for query performance. > > The substantially slower drop partition performance was accepted in > > thos cases > > > > > > I think it would be nice to have the option in Postgres as well. > > > > I do agree however, that the global index should not be created > > automatically. > > > > Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better > > > > > > Is it necessary to use special marks like GLOBAL if this index will > > be partitioned, and uniqueness will be ensured by repeated > > evaluations? > > > > Or you think so there should be really forced one relation based > > index? > > > > I can imagine a unique index on partitions without a special mark, > > that will be partitioned, and a second variant classic index created > > over a partitioned table, that will be marked as GLOBAL. > > > My personal opinion is, that a global index should never be created > automatically. > > The user should consciously decide on using a feature > that might have a serious impact on performance in some areas.

Agreed. If a unique index is created on non-partition-key columns without including the special mark (the partition key columns), it may well be a mistake by the user (at least I make this mistake all the time). Current PG will give you an error telling you to include the partition keys, which is good. If we were to automatically turn that into a global unique index, the user may end up using the feature without knowing it and experience some performance impact (to account for the extra uniqueness checks in all partitions).
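For reference, this is the existing behaviour being referred to (a minimal sketch; the table name is made up and the exact error wording varies between server versions):

CREATE TABLE t (a int, b int) PARTITION BY RANGE (a);
CREATE UNIQUE INDEX ON t (b);      -- rejected today: the unique index must include partition key column "a"
CREATE UNIQUE INDEX ON t (a, b);   -- accepted, but uniqueness of b alone is not enforced across partitions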
Re: [PATCH] Simple code cleanup in tuplesort.c.
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: not tested Spec compliant: not tested Documentation:not tested Removing "state->status = TSS_SORTEDINMEM" should be fine as it is already set in sort_bounded_heap(state) a few lines before. Cary Huang HighGo Software Canada www.highgo.ca
Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist
> I think the sslcertmode=disable option that I introduced in [1] solves > this issue too; would it work for your case? That whole patchset is > meant to tackle the general case of the problem you've described. > > (Eventually I'd like to teach the server not to ask for a client > certificate if it's not going to use it.)

There is an option in pg_hba.conf on the server side called "clientcert" that can be specified besides the auth method and that controls whether certain client connections are required to send a client certificate for additional verification. The value of "clientcert" can be "verify-ca" or "verify-full". For example:

hostssl all all 127.0.0.1/32 md5 clientcert=verify-full

If clientcert is not requested by the server but the client still sends the certificate, the server will still verify it. This is the case in this discussion. I agree that it is a more elegant approach to add "sslcertmode=disable" on the client side to prevent sending the default certificate. But, if the server does request a client certificate and the client uses "sslcertmode=disable" to connect and does not give a certificate, it would also result in an authentication failure. In this case, we would actually want to ignore "sslcertmode=disable" and send the default certificates if found.

It would perhaps be better to change the parameter to "defaultclientcert=on-demand" on the client side, which will:
1. not send the existing default certificate if the server does not request a certificate
2. send the existing default certificate if the server does request a certificate while the client does not use the "sslcert" parameter to specify another, non-default certificate

I put "default" in the parameter name to indicate that it only applies to the default certificate. If the user specifies a non-default certificate using the "sslcert" parameter, "defaultclientcert" should not be used and the client should give an error if both exist.

Cary Huang HighGo Software Canada www.highgo.ca
Re: Avoid memory leaks during base backups
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation:not tested Hello I applied your v5 patch on the current master and ran valgrind on it while doing a basebackup with a simulated error. No memory leak related to backup was observed. The regression tests also pass. thank you Cary Huang HighGo Software Canada
Re: allow specifying action when standby encounters incompatible parameter settings
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, passed Hello The patch applied and tested fine. I think for this kind of exception, it really is up to the administrator how he/she should proceed to resolve it, depending on his/her business application. Leaving this configurable by the user is generally a nice and modest change. I also like that you leave the parameter as an enum entry, so it is possible to extend it with other actions, such as automatically adjusting to match the new value. - Cary Huang HighGo Software Canada
Re: Allowing REINDEX to have an optional name
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, passed

Hello

The patch applies and tests fine, and I think this patch has good intentions to prevent the default behavior of REINDEX DATABASE from causing a deadlock. However, I am not in favor of simply omitting the database name after the DATABASE clause, for consistency reasons. Almost all other commands involving a DATABASE clause require the database name to follow it, for example, ALTER DATABASE [dbname]. Being able to omit the database name for REINDEX DATABASE seems inconsistent to me. The documentation states that REINDEX DATABASE only works on the current database, yet it still requires the user to provide a database name and requires that it match the current database. Not a very useful option, is it? But it is still required from the user to stay consistent with other DATABASE clauses.

Maybe the best way is to keep the command as is (with the database name still required) and simply not let it reindex the system catalogs, to prevent the deadlock. At the end, give the user a notification that the system catalogs have not been reindexed and tell them to use REINDEX SYSTEM instead.

Cary Huang - HighGo Software Canada www.highgo.ca
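To illustrate the suggestion, the behaviour could look roughly like this (a sketch only; the database name and the notice wording are hypothetical, not what the patch currently prints):

REINDEX DATABASE mydb;
NOTICE:  system catalogs were not reindexed
HINT:  Use REINDEX SYSTEM to reindex system catalogs.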
Re: Switching XLog source from archive to streaming when primary available
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation:not tested

Hello

I tested this patch in a setup where the standby is in the middle of replicating and REDOing the primary's WAL files during a very large data insertion. During this time, I kept killing the walreceiver process to cause a stream failure and force the standby to read from the archive. The system will restore from the archive for "wal_retrieve_retry_interval" seconds before it attempts to stream again. Without this patch, once streaming is interrupted, the standby keeps reading from the archive until it reaches the same consistent state as the primary, and then it switches back to streaming again.

So it seems that the patch does the job as described and does bring some benefit during a very large REDO job, where it will try to re-stream after restoring some WALs from the archive to speed up this "catch up" process. But if the recovery job is not a large one, PG already switches back to streaming once it hits a consistent state.

thank you

Cary Huang HighGo Software Canada
Re: Add last failed connection error message to pg_stat_wal_receiver
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, passed Hello The patch can be applied to PG master branch without problem and it passed regression and tap tests. I manually tested this feature too and the last conn error is correctly shown in the pg_stat_get_wal_receiver output, which does exactly as described. I think this feature is nice to have to troubleshoot replication issues on the standby side. thank you Cary Huang Highgo Software Canada
Re: Automatic notification of top transaction IDs
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, passed Hello This patch applies fine to master branch and the regression tests are passing. Regarding the parallel worker case, the AssignTransactionId() function is already handling that and it will error out if IsParallelWorker() is true. In a normal case, this function is only called by the main backend, and the parallel workers will synchronize the transaction ID when they are spawned and they will not call this function anyway. thank you Cary Huang HighGo Software Canada www.highgo.ca
Re: CREATE SEQUENCE with RESTART option
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, passed

Hi

I have applied and run your patch, which works fine in my environment. Regarding your comments in the patch:

/*
 * Restarting a sequence while defining it doesn't make any sense
 * and it may override the START value. Allowing both START and
 * RESTART option for CREATE SEQUENCE may cause confusion to user.
 * Hence, we throw error for CREATE SEQUENCE if RESTART option is
 * specified. However, it can be used with ALTER SEQUENCE.
 */

I would remove the first sentence, because it seems like a personal opinion to me. I am sure someone, somewhere may think it makes total sense :). I would rephrase like this:

/*
 * Allowing both START and RESTART option for CREATE SEQUENCE
 * could override the START value and cause confusion to user. Hence,
 * we throw an error for CREATE SEQUENCE if RESTART option is
 * specified; it can only be used with ALTER SEQUENCE.
 */

just a thought. thanks!

- Cary Huang HighGo Software Canada www.highgo.ca
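For illustration, the behaviour the comment describes is roughly this (a sketch; the sequence name is made up and the exact error message is up to the patch):

CREATE SEQUENCE s START 10 RESTART 20;   -- rejected: RESTART is not allowed in CREATE SEQUENCE
CREATE SEQUENCE s START 10;
ALTER SEQUENCE s RESTART WITH 20;        -- still allowed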
Re: postgres_fdw: Handle boolean comparison predicates
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:not tested

Hello

I tried to apply the patch to the master branch and got a couple of errors, so I think the patch needs a rebase. I also agree with Ashutosh that the "IS NOT TRUE" case can be simplified to just "IS FALSE"; it's simpler to understand. Based on this, I think we should restructure the switch-case statement in deparseBooleanTest, because some of the cases in there evaluate to the same result but are handled differently. For example, "IS TRUE" and "IS NOT FALSE" both evaluate to true, so they can be handled in the same way, something like:

switch (node->booltesttype)
{
    case IS_TRUE:
    case IS_NOT_FALSE:
        appendStringInfoChar(buf, '(');
        deparseExpr(node->arg, context);
        appendStringInfoString(buf, ")");
        break;
    case IS_FALSE:
    case IS_NOT_TRUE:
        appendStringInfoChar(buf, '(');
        deparseExpr(node->arg, context);
        appendStringInfoString(buf, " IS FALSE)");
        break;
    case IS_UNKNOWN:
        appendStringInfoChar(buf, '(');
        deparseExpr(node->arg, context);
        appendStringInfoString(buf, " IS NULL)");
        break;
    case IS_NOT_UNKNOWN:
        appendStringInfoChar(buf, '(');
        deparseExpr(node->arg, context);
        appendStringInfoString(buf, " IS NOT NULL)");
        break;
}

just a thought

thanks!

--- Cary Huang HighGo Software Canada www.highgo.ca
Re: sslinfo extension - add notbefore and notafter timestamps
> Thanks for the new version! It doesn't fail the ssl tests, but the Kerberos > test now fails. You can see the test reports from the CFBot here:

Yes, the kerberos tests failed due to the addition of the notbefore and notafter values. The values array within the "pg_stat_get_activity" function related to "pg_stat_gssapi" was not set correctly. It is now fixed.

> This runs on submitted patches, you can also run the same CI checks in your > own > Github clone using the supplied CI files in the postgres repo.

Thank you for pointing this out. I followed the CI instructions as suggested and am able to run the same CI checks to reproduce the test failures.

> There are also some trivial whitespace issues shown with "git diff --check", > these can of course easily be addressed by a committer in a final-version > patch > but when sending a new version you might as well fix those.

Yes, the whitespace issues should be addressed in the attached patches.

> X509_getm_notBefore() and X509_getm_notAfter() are only available in OpenSSL > 1.1.1 and onwards, but postgres support 1.0.2 (as of today with 8e278b6576). > X509_get_notAfter() is available in 1.0.2 but deprecated in 1.1.1 and turned > into an alias for X509_getm_notAfter() (same with _notBefore of course), and > since we set 1.0.2 as the API compatibility we should be able to use that > without warnings instead.

Thank you so much for catching this openssl function compatibility issue. I have changed the function calls to:
- X509_get_notBefore()
- X509_get_notAfter()
which are compatible with OpenSSL v1.0.2 and also v1.1.1, where they will get translated to X509_getm_notBefore() and X509_getm_notAfter() respectively.

> These functions should IMO return timestamp data types to save the user from > having to convert them. Same with the additions to pg_stat_get_activity.

Yes, agreed; the attached patches have the output changed to the timestamp datatype instead of text.

> You should add tests for the new functions in src/test/ssl/t/003_sslinfo.pl.

Yes, agreed. I added 2 additional tests in src/test/ssl/t/003_sslinfo.pl to compare the notbefore and notafter outputs from the sslinfo extension and the pg_stat_ssl output; the two should test equal. I also added related documentation about the new not before and not after timestamps in pg_stat_ssl.

thank you

Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca

v4-0001-sslinfo-add-notbefore-and-notafter-timestamps.patch Description: Binary data v4-0002-pg-stat-ssl-add-notbefore-and-notafter-timestamps.patch Description: Binary data
Re: sslinfo extension - add notbefore and notafter timestamps
Hello

> The way we typically ship extensions in contrib/ is to not create a new base > version .sql file for smaller changes like adding a few functions. For this > patch we should keep --1.2.sql and instead supply a 1.2--1.3.sql with the new > functions.

Thank you for pointing this out. It makes sense to me.

> +errmsg("failed to convert tm to timestamp"))); > > I think this error is too obscure for the user to act on, what we use > elsewhere > is "timestamp out of range" and I think thats more helpful. I do wonder if > there is ever a legitimate case when this can fail while still having a > authenticated client connection?

My bad here, you are right. "timestamp out of range" is a much better error message. However, with an authenticated client connection, there should not be a legitimate case where the not before and not after timestamps can fall out of range. The "not before" and "not after" timestamps in an X509 certificate are normally represented by ASN1, which has a maximum timestamp of December 31, . The timestamp data structure in PostgreSQL, on the other hand, can support years up to June 3, 5874898. Assuming the X509 certificate is generated correctly and no data corruption happens (which is unlikely), the conversion from ASN1 to timestamp shall not result in an out-of-range error. Perhaps calling "tm2timestamp(&pgtm_time, 0, NULL, &ts)" without checking the return code would be just fine. I see that some other usages of tm2timestamp() in other code areas also skip checking the return code.

> I have addressed the issues above in a new v5 patchset which includes a new > patch for setting stable validity on the test certificates (the notBefore > time > was arbitrarily chosen to match the date of opening up the tree for v17 - we > just need a date in the past). Your two patches are rolled into a single one > with a commit message added to get started on that part as well.

Thank you so much for addressing the ssl tests to make the "not before" and "not after" timestamps static in the test certificate and also adjusting 003_sslinfo.pl to expect the new static timestamps in the v5 patches. I am able to apply both and all tests are passing. I did not know this test certificate could be changed by `cd src/test/ssl && make -f sslfiles.mk`, but now I know, thanks to you :p.

Best regards

Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca
Re: sslinfo extension - add notbefore and notafter timestamps
Hello

> > Perhaps calling "tm2timestamp(&pgtm_time, 0, NULL, &ts)" without checking > > the return code would be just fine. I see some other usages of > > tm2timstamp() in other code areas also skip checking the return code. > > I think we want to know about any failures, btu we can probably make it into > an > elog() instead, as it should never fail.

Yes, sure. I have corrected the error message to elog(ERROR, "timestamp out of range") on a rare tm2timestamp() failure. Please see the attached patch based on your v5. "v6-0001-Set-fixed-dates-for-test-certificates-validity.patch" is exactly the same as "v5-0001-Set-fixed-dates-for-test-certificates-validity.patch"; I just bumped the version to be consistent.

thank you very much

Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca

v6-0001-Set-fixed-dates-for-test-certificates-validity.patch Description: Binary data v6-0002-Add-notBefore-and-notAfter-to-SSL-cert-info-displ.patch Description: Binary data
Re: Extension Enhancement: Buffer Invalidation in pg_buffercache
Hello

I had a look at the patch and tested it on the CI bot; it compiles and tests fine with both autoconf and meson. I noticed that the v2 patch contains the v1 patch file as well. Not sure if that was intended or put there by accident.

> I don't think "invalidating" is the right terminology. Note that we already > have InvalidateBuffer() - but it's something we can't allow users to do, as > it > throws away dirty buffer contents (it's used for things like dropping a > table). > > How about using "discarding" for this functionality?

I think "invalidating" is the right terminology here; it is exactly what the feature is doing. It tries to invalidate a buffer ID by calling the InvalidateBuffer() routine inside the buffer manager, and calls FlushBuffer() before invalidating if the buffer is marked dirty. The problem here is that InvalidateBuffer() could be dangerous, because it allows a user to invalidate a buffer that may contain data from other tables not owned by the current user. I think it all comes down to the purpose of this feature.

Based on the description in this email thread, I feel like this feature should be categorized as a developer-only feature, to be used by PG developers to experiment and observe some development work by invalidating one or more specific buffers. If this is the case, it may be helpful to add a GUC under "DEVELOPER_OPTIONS", which allows or disallows TryInvalidateBuffer() to run, or makes it return an error if the user does not have this developer option enabled.

If the purpose of this feature is for general users, then it would make sense to have something like pg_unwarm (exactly the opposite of pg_prewarm) that takes a table name (instead of a buffer ID) and drops all buffers associated with that table. There would be permission checks as well, so a user cannot pg_unwarm a table owned by someone else. The user in this case won't be able to invalidate a particular buffer, but he/she should not have to as a regular user anyway.

thanks!

Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca
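As a usage illustration only (the SQL-level function name pg_buffercache_invalidate() below is hypothetical and just stands in for whatever the patch ends up exposing; pg_buffercache and pg_relation_filenode() are existing pieces), a developer-oriented call could look like:

SELECT pg_buffercache_invalidate(bufferid)          -- hypothetical wrapper around TryInvalidateBuffer()
  FROM pg_buffercache
 WHERE relfilenode = pg_relation_filenode('my_table');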
Re: [ psql - review request ] review request for \d+ tablename, \d+ indexname indenting
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: not tested Spec compliant: not tested Documentation:not tested

Hello

Thank you for the patch and the effort to enhance \d+'s output on partitioned tables that contain sub-partitions. However, the patch does not apply, and I notice that this patch is generated as a diff between 2 files, describe.c and describe_change.c. You should use git diff to generate a patch rather than maintaining 2 files yourself. Also, I noticed that you include a "create_object.sql" file to illustrate the feature, which is not necessary. Instead, you should add regression test cases to the existing regression test suite under "src/test/regress", so these will get run as tests to illustrate the feature. This patch changes the output of \d+ and could potentially break other test cases, so you should fix those in the patch in addition to providing the feature.

Now, regarding the feature, I see that you intend to print the sub-partitions' partitions in the output, which is okay in my opinion. However, a sub-partition can also contain another sub-partition, which contains another sub-partition and so on, so it is possible for sub-partitions to nest very, very deep. Your example assumes only 1 level of sub-partitions. Are you going to print all of them out in \d+? If so, it would definitely clutter the output so much that it starts to become annoying. Are you planning to set a limit on how many levels of sub-partitions to print, or just let it print as many as it needs?

thank you

Cary Huang --- Highgo Software Canada www.highgo.ca
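For example (a sketch of the situation described above, with made-up table names), sub-partitions can nest to arbitrary depth:

CREATE TABLE tbl (a int, b int, c int) PARTITION BY RANGE (a);
CREATE TABLE tbl_1 PARTITION OF tbl FOR VALUES FROM (0) TO (100) PARTITION BY RANGE (b);
CREATE TABLE tbl_1_1 PARTITION OF tbl_1 FOR VALUES FROM (0) TO (100) PARTITION BY RANGE (c);
CREATE TABLE tbl_1_1_1 PARTITION OF tbl_1_1 FOR VALUES FROM (0) TO (100);
-- \d+ tbl would then have three levels of sub-partitions to report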
Re: [Patch] add multiple client certificate selection feature
Hello

I would like to share an updated patch that adds a feature to libpq to automatically select the best client certificate to send to the server (if it requests one). This feature is inspired by this email discussion from years ago: https://www.postgresql.org/message-id/200905081539.n48Fdl2Y003286%40no.baka.org, which makes it easier for a single client to communicate over TLS with multiple TLS-enabled PostgreSQL servers with different certificate setups.

Instead of specifying just one sslcert, sslkey, or sslpassword, this patch allows multiple to be specified, and libpq is able to pick the matching one to send to the PostgreSQL server based on the trusted CA names sent during the TLS handshake.

If anyone finds it useful and would like to give it a try, I wrote a blog on how to test and verify this feature here: https://www.highgo.ca/2024/03/28/procedure-to-multiple-client-certificate-feature/

thank you

Best regards

Cary Huang

v3-0001-multiple_client_certificate_selection_support.patch Description: Binary data
Support tid range scan in parallel?
Hello

When using ctid as a restriction clause with lower and upper bounds, PostgreSQL's planner will use a TID range scan plan to handle such a query. This works and is generally fine. However, if the ctid range covers a huge amount of data, the planner will not use a parallel worker to perform the ctid range scan, because that is not supported. It could, however, still choose a parallel sequential scan to complete the scan if it costs less.

In one of our migration scenarios, we rely on tid range scans to migrate a huge table from one database to another once the lower and upper ctid bounds are determined. With support for parallel ctid range scans, this process could be done much more quickly.

The attached patch is my approach to adding parallel ctid range scan to PostgreSQL's planner and executor. In my tests, I do see an increase in performance using a parallel tid range scan over the single-worker tid range scan, and it is also faster than a parallel sequential scan covering similar ranges. Of course, the table needs to be large enough to reflect the performance increase. Below is the timing to complete a select query covering all the records in a simple 2-column table with 40 million records:

- tid range scan takes 10216ms
- tid range scan with 2 workers takes 7109ms
- sequential scan with 2 workers takes 8499ms

Having support for parallel ctid range scan is definitely helpful in our migration case, and I am sure it could be useful in other cases as well. I am sharing the patch here, and if someone could provide quick feedback or a review, that would be greatly appreciated.

Thank you!

Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

v1-0001-add-parallel-tid-rangescan.patch Description: Binary data
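For illustration, this is the kind of query the patch targets (a sketch; the table name and ctid bounds are made up):

SET max_parallel_workers_per_gather = 2;
EXPLAIN SELECT a FROM big_table
 WHERE ctid >= '(0,0)' AND ctid < '(100000,0)';
-- today: a single-worker Tid Range Scan, or possibly a Parallel Seq Scan
-- with the patch: a Gather over a Parallel Tid Range Scan becomes possible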
Re: Logging which interface was connected to in log_line_prefix
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, passed

Hi

I did a quick test on this patch and it seems to work as expected. Originally I thought the patch would add the name of the "local interface", such as "eth0", "eth1", "lo", etc., as the %L log prefix format. Instead, it formats the local interface IP address, but I think that is fine too. I have tested this new addition with various types of IPs, including IPv4, IPv4 and IPv6 local loopback addresses, a global IPv6 address, and a link-local IPv6 address with an interface specifier, and it seems to format these IPs correctly.

There is a comment in the patch that states:

/* We do not need clean_ipv6_addr here: just report verbatim */

I am not quite sure what it means, but I am guessing it means that the patch does not need to format the IPv6 addresses in any specific way, for example, removing leading zeros or compressing consecutive zeros to make an IPv6 address shorter. It may not be necessary to indicate this in a comment because, in my test, if any of my interface's IPv6 addresses have consecutive zeroes like this: 2000::::::200:cafe/64, my network driver (Ubuntu 18.04) will format it as 2000::200:cafe, and the patch of course will read it as 2000::200:cafe, which is ... correct and clean.

thank you

Cary Huang www.highgo.ca
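For anyone wanting to try it, the setup is roughly this (a sketch; %L is the escape added by the patch, as described above):

ALTER SYSTEM SET log_line_prefix = '%m [%p] %L ';
SELECT pg_reload_conf();
-- subsequent log lines then include the local interface IP address the client connected to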
Re: Support tid range scan in parallel?
Range Scan on test (cost=0.01..511262.43 rows=28603575 width=4) (actual time=0.332..4906.262 rows=3272 loops=3) TID Cond: ((ctid >= '(1,0)'::tid) AND (ctid <= '(540540,100)'::tid)) Planning Time: 0.213 ms Execution Time: 12265.873 ms (7 rows) [parallel seq scan with 2 workers] set enable_tidscan = 'off'; EXPLAIN ANALYZE SELECT a FROM test WHERE ctid >= '(1,0)' AND ctid <= '(540540,100)'; QUERY PLAN --- Gather (cost=0.00..969595.42 rows=68648581 width=4) (actual time=4.489..9713.299 rows=9815 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on test (cost=0.00..969595.42 rows=28603575 width=4) (actual time=0.995..5541.178 rows=3272 loops=3) Filter: ((ctid >= '(1,0)'::tid) AND (ctid <= '(540540,100)'::tid)) Rows Removed by Filter: 62 Planning Time: 0.129 ms Execution Time: 12675.681 ms (8 rows) Best regards Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca
Re: Support tid range scan in parallel?
> This isn't a complete review. It's just that this seems enough to keep > you busy for a while. I can look a bit harder when the patch is > working correctly. I think you should have enough feedback to allow > that now. Thanks for the test, review and feedback. They are greatly appreciated! I will polish the patch some more following your feedback and share new results / patch when I have them. Thanks again! Cary
Re: Support tid range scan in parallel?
lt; '(216216,40)'; QUERY PLAN Aggregate (cost=716221.63..716221.64 rows=1 width=8) (actual time=12931.695..12931.696 rows=1 loops=1) Buffers: shared read=216217 I/O Timings: shared read=599.331 -> Tid Range Scan on test (cost=0.01..616221.31 rows=4130 width=4) (actual time=0.079..6800.482 rows=3999 loops=1) TID Cond: ((ctid >= '(0,0)'::tid) AND (ctid < '(216216,40)'::tid)) Buffers: shared read=216217 I/O Timings: shared read=599.331 Planning: Buffers: shared hit=1 read=2 I/O Timings: shared read=0.124 Planning Time: 0.917 ms Execution Time: 12932.348 ms (12 rows) ===parallel tid rangescan with aggregate=== set max_parallel_workers_per_gather=2; EXPLAIN (ANALYZE, BUFFERS) select count(a) from test where ctid >= '(0,0)' and ctid < '(216216,40)'; QUERY PLAN - Finalize Aggregate (cost=298425.70..298425.71 rows=1 width=8) (actual time=4842.512..4847.863 rows=1 loops=1) Buffers: shared read=216217 I/O Timings: shared read=1155.321 -> Gather (cost=298425.68..298425.69 rows=2 width=8) (actual time=4842.020..4847.851 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared read=216217 I/O Timings: shared read=1155.321 -> Partial Aggregate (cost=298425.68..298425.69 rows=1 width=8) (actual time=4824.730..4824.731 rows=1 loops=3) Buffers: shared read=216217 I/O Timings: shared read=1155.321 -> Parallel Tid Range Scan on test (cost=0.01..256758.88 rows=1721 width=4) (actual time=0.098..2614.108 rows=1333 loops=3) TID Cond: ((ctid >= '(0,0)'::tid) AND (ctid < '(216216,40)'::tid)) Buffers: shared read=216217 I/O Timings: shared read=1155.321 Planning: Buffers: shared read=3 I/O Timings: shared read=3.323 Planning Time: 4.124 ms Execution Time: 4847.992 ms (20 rows) Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca v2-0001-add-parallel-tid-rangescan.patch Description: Binary data
Re: Support tid range scan in parallel?
63..0.565 rows=1 loops=4) Buffers: shared hit=11 -> Parallel Tid Range Scan on test (cost=0.01..11.08 rows=656 width=0) (actual time=0.018..0.338 rows=509 loops=4) TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid)) Buffers: shared hit=11 thank you! Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca v3-0001-add-parallel-tid-rangescan.patch Description: Binary data
Re: Support tid range scan in parallel?
> I've not looked at the patch, but please add it to the July CF. I'll > try and look in more detail then.

Thanks David, I have added this patch to the July commitfest under the server feature category.

I understand that the regression tests for parallel ctid range scan are a bit lacking now. There are only a few EXPLAIN cases to ensure parallel workers are used when tid ranges are specified, added as part of the select_parallel.sql test. I am not sure if it would be more appropriate to have them as part of the tidrangescan.sql test instead, so basically re-running the same test cases in tidrangescan.sql but in parallel?

thank you

Cary
Re: Support tid range scan in parallel?
> You should add a test that creates a table with a very low fillfactor, > low enough so only 1 tuple can fit on each page and insert a few dozen > tuples. The test would do SELECT COUNT(*) to ensure you find the > correct subset of tuples. You'd maybe want MIN(ctid) and MAX(ctid) in > there too for extra coverage to ensure that the correct tuples are > being counted. Just make sure and EXPLAIN (COSTS OFF) the query first > in the test to ensure that it's always testing the plan you're > expecting to test.

Thank you for the test suggestion. I moved the regression tests from select_parallel to tidrangescan instead. I followed the existing test table creation in tidrangescan with the lowest fillfactor of 10, and I am able to get a consistent 5 tuples per page instead of 1. It should be okay as long as it is consistently 5 tuples per page, so the tuple count results from the parallel tests will be in multiples of 5. The attached v4 patch includes the improved regression tests.

Thank you very much!

Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca

v4-0001-add-parallel-tid-rangescan.patch Description: Binary data
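Roughly the shape of the added coverage, for anyone following along (a sketch with made-up names, not the exact test from the patch; the cost GUCs are set low to force a parallel plan on a tiny table):

CREATE TABLE tidrange_par_test (a int, b text) WITH (fillfactor = 10);
INSERT INTO tidrange_par_test SELECT i, repeat('x', 100) FROM generate_series(1, 200) i;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;
SET max_parallel_workers_per_gather = 4;
EXPLAIN (COSTS OFF)
SELECT count(*), min(ctid), max(ctid) FROM tidrange_par_test WHERE ctid < '(10,0)';
SELECT count(*), min(ctid), max(ctid) FROM tidrange_par_test WHERE ctid < '(10,0)';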
Re: Serverside SNI support in libpq
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation:not tested

This is an interesting feature on the PostgreSQL server side where it can swap the certificate settings based on the incoming hostname in the SNI field of the client hello message. I think this patch resonates with a patch I shared a while ago ( https://commitfest.postgresql.org/48/4924/ ) that adds multiple certificate support on the libpq client side, while this patch adds multiple certificate support on the server side. My patch allows the user to supply multiple certs, keys, and sslpasswords in comma-separated format, and the libpq client will pick the one that matches the CA issuer names sent by the server. In relation to your patch, this CA issuer name would match the CA certificate configured in pg_hosts.conf.

I had a look at the patch and here are my comments:

+
+PostgreSQL can be configured for
+SNI using the pg_hosts.conf
+configuration file. PostgreSQL inspects the TLS
+hostname extension in the SSL connection handshake, and selects the right
+SSL certificate, key and CA certificate to use for the connection.
+

pg_hosts should also have sslpassword_command, just like in postgresql.conf, in case the sslkey for a particular host is encrypted with a different password.

+   /*
+    * Install SNI TLS extension callback in case the server is configured to
+    * validate hostnames.
+    */
+   if (ssl_snimode != SSL_SNIMODE_OFF)
+       SSL_CTX_set_tlsext_servername_callback(context, sni_servername_cb);

If the libpq client does not provide SNI, this callback will not be called, so there is no chance to check for a hostname match from pg_hosts, swap the TLS context, or possibly reject the connection even in strict mode. The TLS handshake in such a case shall proceed and the server will use the certificate specified in postgresql.conf (if these are loaded) to complete the handshake with the client.

There is a comment in the patch that reads:

> - strict: only pg_hosts.conf is loaded and the TLS extension hostname > MUST be passed and MUST have a match in the configuration, else the > connection is refused.

I am not sure if it implies that if ssl_snimode is strict, then the normal ssl_cert, ssl_key and ca_cert…etc settings in postgresql.conf are ignored?

thank you

Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca
sslinfo extension - add notbefore and notafter timestamps
Hello

I noticed that the sslinfo extension does not have functions to return the current client certificate's notbefore and notafter timestamps, which are also quite important attributes in an X509 certificate. The attached patch adds 2 functions to get the notbefore and notafter timestamps from the currently connected client certificate.

thank you!

Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

v1-0001-sslinfo-add-notbefore-and-notafter-timestamps.patch Description: Binary data
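Intended usage is along these lines (a sketch; the two new function names are assumed to follow the existing ssl_client_* naming in sslinfo and may differ in the patch, while ssl_is_used() is an existing sslinfo function):

SELECT ssl_is_used(),
       ssl_client_get_notbefore(),
       ssl_client_get_notafter();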
Re: Mark a transaction uncommittable
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation:not tested

Hello

It is one of those features that a handful of people would find useful in specific use cases. I think it is a nice-to-have feature that safeguards your production database against unwanted commits when troubleshooting production problems. Your patch applies fine on master, and I was able to run a couple of tests on it; it seems to do as described.

I noticed that the patch has a per-session variable "default_transaction_committable" that could make all transactions committable or uncommittable in the session even without specifying "begin transaction not committable;". I am wondering if we should have a configurable default at all, as I think it should always default to true and be unchangeable. If a user wants an uncommittable transaction, he/she will need to explicitly specify that during "begin". Having another option to change the default behavior for all transactions may be a little unsafe; it is possible someone could purposely change this default to false on a production session that needs transactions to absolutely commit, causing damage there.

thank you

Cary Huang -- Highgo Software Canada www.highgo.ca
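For reference, the kind of usage I tested looks roughly like this (a sketch; the syntax follows the "begin transaction not committable" form mentioned above, and the table is made up):

BEGIN TRANSACTION NOT COMMITTABLE;
UPDATE accounts SET balance = 0 WHERE id = 42;   -- poke at the problem safely
COMMIT;   -- under the patch this is expected to behave like a rollback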
Re: sslinfo extension - add notbefore and notafter timestamps
> Off the cuff that doesn't seem like a bad idea, but I wonder if we should add > them to pg_stat_ssl (or both) instead if we deem them valuable?

I think the same information should be available in pg_stat_ssl as well. pg_stat_ssl can show the client certificate information for all connecting clients, and having it show the not before and not after timestamps of every client certificate is important in my opinion. The attached patch "v2-0002-pg-stat-ssl-add-notbefore-and-notafter-timestamps.patch" adds this support.

> Re the patch, it would be nice to move the logic in ssl_client_get_notafter > and > the _notbefore counterpart to a static function since they are copies of > eachother.

Yes, agreed. I have made the adjustment, attached as "v2-0001-sslinfo-add-notbefore-and-notafter-timestamps.patch".

Would this feature be suitable to be added to the commitfest? What do you think?

thank you

Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca

v2-0001-sslinfo-add-notbefore-and-notafter-timestamps.patch Description: Binary data v2-0002-pg-stat-ssl-add-notbefore-and-notafter-timestamps.patch Description: Binary data
Re: sslinfo extension - add notbefore and notafter timestamps
> Yes, please add it to the July commitfest and feel free to set me as > Reviewer, > I intend to take a look at it. Thank you Daniel, I have added this patch to July commitfest under security category and added you as reviewer. best regards Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca
Re: sslinfo extension - add notbefore and notafter timestamps
> This needs to adjust the tests in src/test/ssl which now fails due to SELECT > * > returning a row which doesn't match what the test was coded for. Thank you so much for pointing out. I have adjusted the extra ssl test to account for the extra columns returned. It should not fail now. > The new patchset isn't updating contrib/sslinfo/meson with the 1.3 update so > it > fails to build with Meson. Thanks again for pointing out, I have adjusted the meson build file to include the 1.3 update Please see attached patches for the fixes. Thank you so much! Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca v3-0001-sslinfo-add-notbefore-and-notafter-timestamps.patch Description: Binary data v3-0002-pg-stat-ssl-add-notbefore-and-notafter-timestamps.patch Description: Binary data
Re: [Patch] remove duplicated smgrclose
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation:not tested

Hi

The patch looks good to me as well. Calling smgrclose() right after calling smgrdounlinkall() does seem unnecessary, as it is already done inside smgrdounlinkall(), as you mentioned. I checked the commit logs and it seems like the code has been like this for over 10 years. One difference is that smgrdounlinkall() does not reset smgr_cached_nblocks and smgr_targblock, but that does not seem to matter as it is about to remove the physical files.

While leaving them like this does no harm, because smgrclose() simply does nothing if the relation has already been closed, it does look weird that the code tries to close the relation after smgrdounlinkall(): the physical files have just been deleted when smgrdounlinkall() completes, and we try to close something that has been deleted?!

--- Cary Huang Highgo software Canada www.highgo.ca
Re: Internal key management system
Hi all

Having read through the discussion, I have some comments and suggestions that I would like to share.

I think it is still quite early to even talk about an external key management system, even if it is running on the same host as PG. That is most likely achieved as an extension that provides communication to an external key server, and it would be a separate project/discussion. I think the focus now is to finalize the internal KMS design, and we can discuss how to migrate internally managed keys to an external system when the time is right.

A key management system is generally built to manage the life cycle of cryptographic keys, so our KMS in my opinion needs to be built with the key life cycle in mind, such as:

* Key generation
* key protection
* key storage
* key rotation
* key rewrap
* key disable/enable
* key destroy

KMS should not perform the above life cycle management by itself, automatically or hardcoded; instead it should expose some interfaces for the end user or even a backend process like TDE to trigger the above. The only key KMS should manage by itself is the KEK, which is derived from the cluster passphrase value. This is fine in my opinion. This KEK should exist only within KMS to perform key protection (by wrapping) and key storage (saved as a file). The other life cycle stages should just be left as interfaces, waiting for somebody to request KMS to perform them. Somebody could be an end user or a backend process like TDE.

Using TDE as an example: when TDE initializes, it calls KMS's key_generation interface to get however many keys it needs. KMS should not return the keys in clear text or hex; it can return something like a key ID.

Using a regular user as an example: each user can also call KMS's key_generation interface to create a cryptographic key for their own purposes. KMS should also return just a key ID, and this key should be bound to the user. We can limit each user to having only one managed key, and a regular user can only manage his/her own key with KMS (rotate, destroy, disable, etc.); he/she cannot manage others' keys. A super user (or key admin), however, can do all kinds of management on all keys (generated by TDE or by other users). He or she can do key rotation, key rewrap, disable or destroy. Here we will need to think about how to prevent this user from misusing the key management functions.

The super user should also be able to view the status of all the keys managed, information such as:

* date of generation
* key ID
* owner
* status
* key length
* suggested date of rotation, etc.
* expiry date??

To actually perform encryption with keys managed by the internal KMS, I suggest adding a set of wrapper functions within KMS that take the key ID as an input argument. So, for example, when TDE wants to encrypt some data, it calls KMS's wrapper encryption function with the key ID supplied; KMS looks up the key by key ID, verifies the caller's permission, and translates these parameters and feeds them to pgcrypto, for example. This may be a little slower due to the lookup, but we can have a variation of the function where KMS looks up the key with the supplied key ID, converts it to an encryption context and returns it back to TDE. Then TDE can use this context to call another wrapper function for encryption without the lookup every time. If an end user wants to encrypt something, they will also call KMS's wrapper function and supply the key ID in the same way.

I know that there is a discussion on moving the cryptographic functions to an extension. 
In an already running PG, that is fine, but when TDE and XLOG bootstrap during initdb, they require cryptographic functions to encrypt the initial WAL file, and during initdb I don't think there is access to any cryptographic function provided by an extension. We may need to include limited cryptographic functions within KMS and TDE so that it is enough to finish initdb with the initial WAL encrypted.

This is just my thought on how this KMS and TDE should look.

Best

Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

On Tue, 16 Jun 2020 23:52:03 -0700 Masahiko Sawada wrote On Sun, 14 Jun 2020 at 19:39, Fabien COELHO <mailto:coe...@cri.ensmp.fr> wrote: > > > Hello Masahiko-san, > > >>> * The external place needs to manage more encryption keys than the > >>> current patch does. > >> > >> Why? If the external place is just a separate process on the same host, > >> probably it would manage the very same amount as what your patch. > > > > In the current patch, the external place needs to manage only one key > > whereas postgres needs to manages multiple DEKs. But with your idea, > > the external place needs to manage both KEK and DEKs. > > Hmmm. I do not see a good use case for a "managem
Re: pg_dump --where option
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: tested, failed Documentation:tested, failed

Hi

I had a look at the patch and it applies cleanly to the postgres master branch. I did a quick test of the new "where clause" functionality, and for the most part it does the job as described; I'm sure some people will find this feature useful for their database dump needs.

However, I tried the feature with a case where I have a subquery in the where clause, and it seems to fail to dump the data. I ran pg_dump like this:

$ pg_dump -d cary --where="test1:a3 = ( select max(aa1) from test2 )" > testdump2
pg_dump: error: processing of table "public.test1" failed

Both test1 and test2 exist in the database, and the same subquery works under psql.

I also notice that the regression tests for pg_dump are failing due to the patch. I think it is worth looking into the failure messages, and also adding some test cases for the new "where" clause to ensure that it covers as many use cases as possible.

thank you

Best regards

Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca
Re: Internal key management system
ions keys and store the new results in pg_cryptokeys directory. To rotate the cluster passphrase, user firstly needs to update cluster_passphrase_command in the postgresql.conf and then execute pg_rotate_cluster_passphrase() SQL function to initiate the rotation. Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca On Mon, 30 Mar 2020 21:30:19 -0700 Masahiko Sawada <mailto:masahiko.saw...@2ndquadrant.com> wrote On Tue, 31 Mar 2020 at 09:36, Cary Huang <mailto:cary.hu...@highgo.ca> wrote: > > Hi > I had a look on kms_v9 patch and have some comments > > --> pg_upgrade.c > keys are copied correctly, but as pg_upgrade progresses further, it will try > to start the new_cluster from "issue_warnings_and_set_wal_level()" function, > which is called after key copy. The new cluster will fail to start due to the > mismatch between cluster_passphrase_command and the newly copied keys. This > causes pg_upgrade to always finish with failure. We could move > "copy_master_encryption_key()" to be called after > "issue_warnings_and_set_wal_level()" and this will make pg_upgrade to finish > with success, but user will still have to manually correct the > "cluster_passphrase_command" param on the new cluster in order for it to > start up correctly. Should pg_upgrade also take care of copying > "cluster_passphrase_command" param from old to new cluster after it has > copied the encryption keys so users don't have to do this step? If the > expectation is for users to manually correct "cluster_passphrase_command" > param after successful pg_upgrade and key copy, then there should be a > message to remind the users to do so. I think both the old cluster and the new cluster must be initialized with the same passphrase at initdb. Specifying the different passphrase command to the new cluster at initdb and changing it after pg_upgrade doesn't make sense. Also I don't think we need to copy cluster_passphrase_command same as other GUC parameters. I've changed the patch so that pg_upgrade copies the crypto keys only if both new and old cluster enable the key management. User must specify the same passphrase command to both old and new cluster, which is not cumbersome, I think. I also added the description about this to the doc. > > -->Kmgr.c > + /* > + * If there is only temporary directory, it means that the previous > + * rotation failed after wrapping the all internal keys by the new > + * passphrase. Therefore we use the new cluster passphrase. > + */ > + if (stat(KMGR_DIR, &st) != 0) > + { > + ereport(DEBUG1, > + (errmsg("both directories %s and %s exist, use the newly wrapped keys", > + KMGR_DIR, KMGR_TMP_DIR))); > > I think the error message should say "there is only temporary directory > exist" instead of "both directories exist" You're right. Fixed. I've attached the new version patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Include sequence relation support in logical replication
Hi Craig, Andres Thank you guys so much for your reviews and comments. Really helpful. Yes you guys are right, Sequence does not guarantee free of gaps and replicating sequence is useful for failover cases, then there will be no problem for a subscriber to get a future value 32 increments after. I will do more analysis on my end based on your comments and refine the patch with better test cases. Much appreciated of your help. Best regards Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca On Wed, 15 Apr 2020 22:18:28 -0700 Craig Ringer wrote On Thu, 16 Apr 2020 at 07:44, Andres Freund <mailto:and...@anarazel.de> wrote: > I would like to ask if you have some suggestions or ideas that can make > subscriber receives the current value without the need to > > disabling the 32 increment behavior? It simply shouldn't expect that to be the case. What do you need it for? As far as I can tell replicating sequence values is useful to allow failover, by ensuring all sequences will return sensible values going forward. That does not require to now precise values. Totally agree. Code that relies on getting specific sequence values is broken code. Alas, very common, but still broken. Cary, by way of background a large part of why this wasn't supported by logical decoding back in 9.4 is that until the pg_catalog.pg_sequence relation was introduced in PostgreSQL 10, the sequence relfilenode intermixed a bunch of transactional and non-transactional state in a very messy way. This made it very hard to achieve sensible behaviour for logical decoding. As it is, make sure your regression tests carefully cover the following cases, as TAP tests in src/test/recovery, probably a new module for logical decoding of sequences: 1. * Begin txn * Create sequence * Call nextval() on sequence over generate_series() and discard results * Rollback * Issue a dummy insert+commit to some other table to force logical decoding to send something * Ensure subscription catches up successfully This checks that we cope with advances for a sequence that doesn't get created. 2. * Begin 1st txn * Create a sequence * Use the sequence to populate a temp table with enough rows to ensure sequence updates are written * Begin a 2nd txn * Issue a dummy insert+commit to some other table to force logical decoding to send something * Commit the 2nd txn * Commit the 1st txn * Wait for subscription catchup * Check that the sequence value on the subscriber reflects the value after sequence advance, not the value at creation time This makes sure that sequence advances are handled sensibly when they arrive for a sequence that does not yet exist in the catalogs. You'll need to run psql in an IPC::Run background session for that. We should really have a helper for this. I'll see if I'm allowed to post the one I use for my own TAP tests to the list.
Re: Internal key management system
Hi all I am sharing here a document patch based on top of kms_v10 that was shared awhile back. This document patch aims to cover more design details of the current KMS design and to help people understand KMS better. Please let me know if you have any more comments. thank you Best regards Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca On Tue, 07 Apr 2020 20:56:12 -0700 Ahsan Hadi <mailto:ahsan.h...@gmail.com> wrote Hi Bruce/Joe, In the last meeting we discussed the need for improving the documentation for KMS so it is easier to understand the interface. Cary from highgo had a go at doing that, please see the previous email on this thread from Cary and let us know if it looks good...? -- Ahsan On Wed, Apr 8, 2020 at 3:46 AM Cary Huang <mailto:cary.hu...@highgo.ca> wrote: -- Highgo Software (Canada/China/Pakistan) URL : http://www.highgo.ca/ ADDR: 10318 WHALLEY BLVD, Surrey, BC EMAIL: mailto: Hello Thanks a lot for the patch, I think in terms of functionality, the patch provides very straightforward functionalities regarding key management. In terms of documentation, I think the patch is still lacking some pieces of information that kind of prevent people from fully understanding how KMS works and how it can be used and why, (at least that is the impression I got from the zoom meeting recordings :p). I spent some time today revisiting the key-management documentation in the patch and rephrase and restructure it based on my current understanding of latest KMS design. I mentioned all 3 application level keys that we have agreed and emphasize on explaining the SQL level encryption key because that is the key that can be used right now. Block and WAL levels keys we can add here more information once they are actually used in the TDE development. Please see below the KMS documentation that I have revised and I hope it will be more clear and easier for people to understand KMS. Feel free to make adjustments. Please note that we use the term "wrap" and "unwrap" a lot in our past discussions. Originally we used the terms within a context involving Key encryption keys (KEK). For example, "KMS wraps a master key with KEK". Later, we used the same term in a context involving encrypting user secret /password. For example, "KMS wraps a user secret with SQL key". In my opinion, both make sense but it may be confusing to people having the same term used differently. So in my revision below, the terms "wrap" and "unwrap" refer to encrypting or decrypting user secret / password as they are used in "pg_wrap() and pg_unwrap()". I use the terms "encapsulate" and "restore" when KEK is used to encrypt or decrypt a key. Chapter 32: Encryption Key Management -- PostgreSQL supports internal Encryption Key Management System, which is designed to manage the life cycles of cryptographic keys within the PostgreSQL system. This includes dealing with their generation, storage, usage and rotation. Encryption Key Management is enabled when PostgreSQL is build with --with-openssl and cluster passphrase command is specified during initdb. The cluster passphrase provided by --cluster-passphrase-command option during initdb and the one generated by cluster_passphrase_command in the postgresql.conf must match, otherwise, the database cluster will not start up. 
32.1 Key Generations and Derivations -- When cluster_passphrase_command option is specified to the initdb, the process will derive the cluster passphrase into a Key Encryption Key (KEK) and a HMAC Key using key derivation protocol before the actual generation of application level cryptographic level keys. -Key Encryption Key (KEK) KEK is primarily used to encapsulate or restore a given application level cryptographic key -HMAC Key HMAC key is used to compute the HASH of a given application level cryptographic key for integrity check purposes These 2 keys are not stored physically within the PostgreSQL cluster as they are designed to be derived from the correctly configured cluster passphrase. Encryption Key Management System currently manages 3 application level cryptographic keys that have different purposes and usages within the PostgreSQL system and these are generated using pg_strong_random() after KEK and HMAC key derivation during initdb process. The 3 keys are: -SQL Level Key SQL Level Key is used to wrap and unwrap a user secret / passphrase via pg_wrap() and pg_unwrap() SQL functions. These 2 functions are designed to be used in conjunction with the cryptographic functions provided by pgcrypto extension to perform column level encryption/decryption without having to supply a clear
Re: Include sequence relation support in logical replication
Hi Craig

I have added more regression test cases to the sequence replication patch with emphasis on transactions and rollback, per your suggestions. I find that when a transaction is aborted with a rollback, the decoder plugin will not receive the change, but the sequence value will in fact advance if nextval() or setval() were called. I have also made sequence replication an optional parameter in test_decoding, so other test_decoding regression test cases will not need an update due to the new sequence replication function. The sequence update in this patch still emits a WAL update every 32 increments, and each update records a value 32 increments ahead, as it did originally, so obtaining a specific sequence value is no longer expected. Could you elaborate more on your second case, which requires running psql in an IPC::Run background session, and let me know whether the regression tests need more coverage for certain cases?

thank you!

Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

On Thu, 16 Apr 2020 09:45:06 -0700 Cary Huang <mailto:cary.hu...@highgo.ca> wrote

Hi Craig, Andres Thank you both so much for your reviews and comments. Really helpful. Yes, you are both right: sequences do not guarantee freedom from gaps, and replicating sequences is useful for failover cases, so there is no problem with a subscriber receiving a future value 32 increments ahead. I will do more analysis on my end based on your comments and refine the patch with better test cases. Your help is much appreciated. Best regards Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

On Wed, 15 Apr 2020 22:18:28 -0700 Craig Ringer <mailto:cr...@2ndquadrant.com> wrote

On Thu, 16 Apr 2020 at 07:44, Andres Freund <mailto:and...@anarazel.de> wrote: > I would like to ask if you have some suggestions or ideas that can make > the subscriber receive the current value without the need to > > disable the 32 increment behavior? It simply shouldn't expect that to be the case. What do you need it for? As far as I can tell replicating sequence values is useful to allow failover, by ensuring all sequences will return sensible values going forward. That does not require knowing precise values. Totally agree. Code that relies on getting specific sequence values is broken code. Alas, very common, but still broken. Cary, by way of background, a large part of why this wasn't supported by logical decoding back in 9.4 is that until the pg_catalog.pg_sequence relation was introduced in PostgreSQL 10, the sequence relfilenode intermixed a bunch of transactional and non-transactional state in a very messy way. This made it very hard to achieve sensible behaviour for logical decoding. As it is, make sure your regression tests carefully cover the following cases, as TAP tests in src/test/recovery, probably a new module for logical decoding of sequences: 1. * Begin txn * Create sequence * Call nextval() on sequence over generate_series() and discard results * Rollback * Issue a dummy insert+commit to some other table to force logical decoding to send something * Ensure subscription catches up successfully This checks that we cope with advances for a sequence that doesn't get created. 2. 
* Begin 1st txn * Create a sequence * Use the sequence to populate a temp table with enough rows to ensure sequence updates are written * Begin a 2nd txn * Issue a dummy insert+commit to some other table to force logical decoding to send something * Commit the 2nd txn * Commit the 1st txn * Wait for subscription catchup * Check that the sequence value on the subscriber reflects the value after sequence advance, not the value at creation time This makes sure that sequence advances are handled sensibly when they arrive for a sequence that does not yet exist in the catalogs. You'll need to run psql in an IPC::Run background session for that. We should really have a helper for this. I'll see if I'm allowed to post the one I use for my own TAP tests to the list. sequence_replication_v3.patch Description: Binary data
Re: Internal key management system
Hi

I took a step back today and started to think about the purpose of the internal KMS, what it is supposed to do, and how it compares to an external KMS. Both are intended to manage the life cycle of encryption keys, including their generation, protection, storage and rotation. An external KMS, on the other hand, is a more centralized but expensive way to manage encryption key life cycles, and many deployments actually start with an internal KMS and later migrate to an external one. Anyhow, the design and implementation of the internal KMS should revolve around these stages of the key lifecycle:

1. Key Generation - Yes, the internal KMS module generates keys using a pseudo-random function, but only the TDE and SQL level keys. Users cannot request new key generation.
2. Key Protection - Yes, the internal KMS wraps all keys with the KEK and an HMAC hash derived from the cluster passphrase.
3. Key Storage - Yes, the wrapped keys are stored in the cluster.
4. Key Rotation - Yes, the internal KMS has a SQL method to swap out the cluster passphrase, which rotates the KEK and HMAC key.

I am saying this because I want to make sure we can all agree on the scope of the internal KMS. Without a clear scope, this KMS development will seem to go on forever.

In this patch, the internal KMS exposes pg_encrypt() and pg_decrypt() (previously pg_wrap() and pg_unwrap()) to the user to turn a clear text password into some sort of key material based on the SQL level key generated at initdb. This is used so the user does not have to provide a clear text password to pgp_sym_encrypt() provided by the pgcrypto extension. The intention is good, I understand, but I don't think it is within the scope of KMS, and it is definitely not within the scope of TDE either. Even if the password can be passed into pgp_sym_encrypt() securely by using the pg_decrypt() function, pgp_sym_encrypt() will still have to take this password and derive an encryption key from it using algorithms that the internal KMS does not currently manage. This kind of defeats the purpose of the internal KMS. So simply using pg_encrypt() and pg_decrypt() is not really a solution to pgcrypto's limitation. This should be another topic/project aimed at improving pgcrypto by integrating it with the internal KMS, similar to TDE, which also has to integrate with the internal KMS later. So for the internal KMS, the only cryptographic functions needed for now are kmgr_wrap_key() and kmgr_unwrap_key(), to encapsulate and restore the encryption keys and satisfy the "key protection" life cycle stage. I don't think pg_encrypt() and pg_decrypt() should be part of the internal KMS.

Anyways, I have also reviewed the patch and have a few comments below:

(1) The cipher algorithm identifier, in my opinion, should contain the algorithm name, key length and block cipher mode, which is similar to openssl's definition. Instead of defining a cipher as PG_CIPHER_AES_CBC and having the key length as a separate parameter, I would define them as

#define PG_CIPHER_AES128_CBC 0
#define PG_CIPHER_AES256_CBC 1
#define PG_CIPHER_AES128_CTR 2
#define PG_CIPHER_AES256_CTR 3

I know PG_CIPHER_AES128_CTR and PG_CIPHER_AES256_CTR are not being used now, as these are for TDE in the future, but we might as well list them here because this KMS is made to work specifically for TDE, as I understand it.

---
/*
 * Supported symmetric encryption algorithm. These identifiers are passed
 * to pg_cipher_ctx_create() function, and then actual encryption
 * implementations need to initialize their context of the given encryption
 * algorithm.
 */
#define PG_CIPHER_AES_CBC 0
#define PG_MAX_CIPHER_ID 1
---

(2) If the cipher algorithms are defined as in (1), then there is no need to pass the key length as an argument to the ossl_cipher_ctx_create() function, because it already knows the key length based on the cipher definition. The fewer arguments the better.

---
PgCipherCtx *
pg_cipher_ctx_create(int cipher, uint8 *key, int klen)
{
	PgCipherCtx *ctx = NULL;

	if (cipher >= PG_MAX_CIPHER_ID)
		return NULL;

#ifdef USE_OPENSSL
	ctx = (PgCipherCtx *) palloc0(sizeof(PgCipherCtx));

	ctx->encctx = ossl_cipher_ctx_create(cipher, key, klen, true);
	ctx->decctx = ossl_cipher_ctx_create(cipher, key, klen, false);
#endif

	return ctx;
}
---

Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

On Sun, 31 May 2020 23:58:31 -0700 Masahiko Sawada wrote On
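A minimal sketch of how comments (1) and (2) above could fit together, using hypothetical names rather than the patch's actual code: by encoding the algorithm, key length and block cipher mode into a single identifier, the key length no longer needs to be passed as a separate argument.

```
/* Hypothetical cipher identifiers combining algorithm, key length and mode */
#define PG_CIPHER_AES128_CBC 0
#define PG_CIPHER_AES256_CBC 1
#define PG_CIPHER_AES128_CTR 2
#define PG_CIPHER_AES256_CTR 3
#define PG_MAX_CIPHER_ID     4

/* The key length can be derived from the cipher id instead of being passed in */
static int
pg_cipher_key_length(int cipher)
{
    switch (cipher)
    {
        case PG_CIPHER_AES128_CBC:
        case PG_CIPHER_AES128_CTR:
            return 16;
        case PG_CIPHER_AES256_CBC:
        case PG_CIPHER_AES256_CTR:
            return 32;
        default:
            return -1;          /* unknown cipher id */
    }
}
```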
Re: Internal key management system
Since the user does not need to know the master secret key used to cipher the data, I don't think we should expose the pg_kmgr_unwrap("") SQL function to the user at all. The wrapped key "" is stored in control data, and it is possible for a malicious user to obtain it and steal the key by running SELECT pg_kmgr_unwrap(""). Even if the user is legitimate, it may not be very straightforward for that user to obtain the wrapped key "" to use in the unwrap function.

The pg_kmgr_(un)wrap functions are under discussion because the encrypt and decrypt functions require the master secret key as an input argument. I would suggest using the cluster passphrase as input instead of the master key, so the user does not have to obtain the master key using pg_kmgr_unwrap("") in order to use the encrypt and decrypt functions. The passphrase is in fact not stored anywhere in the system, and we have to be careful that this passphrase is not shown in any activity log. So instead of:

--
INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('x')));
SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('x')) FROM tbl;

it would become:

--
INSERT INTO tbl VALUES (pg_encrypt('user data', 'cluster_pass_phrase'));
SELECT pg_decrypt(secret_column, 'cluster_pass_phrase') FROM tbl;

pg_decrypt will then have to:
1. derive the cluster passphrase into the KEK and HMAC key
2. verify the passphrase by comparing the MAC
3. unwrap the key - Sehrope suggests a good approach to make the wrap/unwrap functions more secure by adding MAC verification and a random IV instead of the default. I think it is good.
4. decrypt the data
5. return

Using the passphrase instead of the master key for the encrypt and decrypt functions will also make front end tool integration simpler, as the front end tool does not need to know the master key either, so it does not need to derive the KEK or unwrap the key, etc. Not sure if you guys agree?

Thanks!

Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

On Thu, 06 Feb 2020 12:30:02 -0800 Robert Haas wrote

On Mon, Feb 3, 2020 at 10:18 PM Masahiko Sawada <mailto:masahiko.saw...@2ndquadrant.com> wrote: > > I'm lost. If pg_{en,de}crypt and pg_kmgr_unwrap are functions, what > > prevent users to: > > > >SELECT pg_kmgr_unwrap(''); > > > > so as to recover the key, somehow in contradiction to "allows user to > > encrypt and decrypt data without knowing the actual key". > > I might be missing your point but the above '' is the wrapped key > wrapped by the master key stored in PostgreSQL server. So user doesn't > need to know the raw secret key to encrypt/decrypt the data. Even if a > malicious user gets '' they cannot know the actual secret key > without the master key. pg_kmgr_wrap and pg_kmgr_unwrap are functions > and it's possible for user to know the raw secret key by using > pg_kmgr_unwrap(). The master key stored in PostgreSQL server never be > revealed. I think I have the same confusion as Fabien. Isn't it bad if somebody just runs pg_kmgr_unwrap() and records the return value? Now they've stolen your encryption key, it seems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
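As an illustration of step 2 above (verifying the passphrase by comparing the MAC), here is a minimal C sketch using OpenSSL. The function and parameter names are assumptions, not the patch's actual code, and the real implementation may compute the MAC over different material.

```
#include <string.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <openssl/crypto.h>

/*
 * Illustrative sketch: recompute the HMAC of the stored wrapped key using
 * the HMAC key derived from the supplied passphrase, then compare it in
 * constant time against the HMAC stored alongside the wrapped key.
 */
static int
passphrase_matches(const unsigned char *hmackey, size_t hmackeylen,
                   const unsigned char *wrappedkey, size_t wrappedlen,
                   const unsigned char *stored_hmac, size_t stored_hmaclen)
{
    unsigned char mac[EVP_MAX_MD_SIZE];
    unsigned int  maclen = 0;

    if (HMAC(EVP_sha256(), hmackey, (int) hmackeylen,
             wrappedkey, wrappedlen, mac, &maclen) == NULL)
        return 0;

    return maclen == stored_hmaclen &&
           CRYPTO_memcmp(mac, stored_hmac, maclen) == 0;
}
```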
Re: Internal key management system
Hi I have tried the attached kms_v3 patch and have some comments: 1) In the comments, I think you meant hmac + iv + encrypted data instead of iv + hmac + encrypted data? ---> in kmgr_wrap_key( ): + /* + * Assemble the wrapped key. The order of the wrapped key is iv, hmac and + * encrypted data. + */ 2) I see that create_keywrap_ctx function in kmgr_utils.c and regular cipher context init will both call ossl_aes256_encrypt_init to initialise context for encryption and key wrapping. In ossl_aes256_encrypt_init, the cipher method always initialises to aes-256-cbc, which is ok for keywrap because under CBC block cipher mode, we only had to supply one unique IV as initial value. But for actual WAL and buffer encryption that will come in later, I think the discussion is to use CTR block cipher mode, which requires one unique IV for each block, and the sequence id from WAL and buffer can be used to derive unique IV for each block for better security? I think it would be better to allow caller to decide which EVP_CIPHER to initialize? EVP_aes_256_cbc(), EVP_aes_256_ctr() or others? +ossl_aes256_encrypt_init(pg_cipher_ctx *ctx, uint8 *key) +{ + if (!EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, NULL, NULL)) + return false; + if (!EVP_CIPHER_CTX_set_key_length(ctx, PG_AES256_KEY_LEN)) + return false; + if (!EVP_EncryptInit_ex(ctx, NULL, NULL, key, NULL)) + return false; + + /* + * Always enable padding. We don't need to check the return + * value as EVP_CIPHER_CTX_set_padding always returns 1. + */ + EVP_CIPHER_CTX_set_padding(ctx, 1); + + return true; +} 3) Following up point 2), I think we should enhance the enum to include not only the Encryption algorithm and key size, but also the block cipher mode as well because having all 3 pieces of information can describe exactly how KMS is performing the encryption and decryption. So when we call "ossl_aes256_encrypt_init", we can include the new enum as input parameter and it will initialise the EVP_CIPHER_CTX with either EVP_aes_256_cbc() or EVP_aes_256_ctr() for different purposes (key wrapping, or WAL encryption..etc). ---> kmgr.h +/* Value of key_management_cipher */ +enum +{ + KMGR_CIPHER_OFF = 0, + KMGR_CIPHER_AES256 +}; + so it would become +enum +{ + KMGR_CIPHER_OFF = 0, + KMGR_CIPHER_AES256_CBC = 1, + KMGR_CIPHER_AES256_CTR = 2 +}; If you agree with this change, several other places will need to be changed as well, such as "kmgr_cipher_string", "kmgr_cipher_value" and the initdb code 4) the pg_wrap_key and pg_unwrap_key SQL functions defined in kmgr.c I tried these new SQL functions and found that the pg_unwrap_key will produce the original key with 4 bytes less. This is because the result length is not set long enough to accommodate the 4 byte VARHDRSZ header used by the multi-type variable. the len variable in SET_VARSIZE(res, len) should include also the variable header VARHDRSZ. Now it is 4 byte short so it will produce incomplete output. ---> pg_unwrap_key function in kmgr.c + if (!kmgr_unwrap_key(UnwrapCtx, (uint8 *) VARDATA_ANY(data), datalen, + (uint8 *) VARDATA(res), &len)) + ereport(ERROR, + (errmsg("could not unwrap the given secret"))); + + /* + * The size of unwrapped key can be smaller than the size estimated + * before unwrapping since the padding is removed during unwrapping. + */ + SET_VARSIZE(res, len); + PG_RETURN_BYTEA_P(res); I am only testing their functionalities with random key as input data. It is currently not possible for a user to obtain the wrapped key from the server in order to use these wrap/unwrap functions. 
I personally don't think it is a good idea to expose these functions to the user.

thank you

Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

On Fri, 14 Feb 2020 08:00:45 -0800 Robert Haas wrote

On Thu, Feb 6, 2020 at 9:19 PM Masahiko Sawada <mailto:masahiko.saw...@2ndquadrant.com> wrote: > This feature protects data from disk thefts but cannot protect data > from attackers who are able to access PostgreSQL server. In this > design application side still is responsible for managing the wrapped > secret in order to protect it from attackers. This is the same as when > we use pgcrypto now. The difference is that data is safe even if > attackers steal the wrapped secret and the disk. The data cannot be > decrypted either without the passphrase which can be stored to other > key management systems or without accessing postgres server
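Returning to point 4 of the review above, here is a minimal sketch of the varlena sizing issue, assuming kmgr_unwrap_key() reports only the length of the decrypted payload (estimated_len stands in for however the result size is estimated): SET_VARSIZE() expects the total datum size, so the 4-byte header has to be added back.

```
/*
 * Illustrative fragment, not the actual kmgr.c code: 'len' is the number
 * of payload bytes written by the unwrap step, so the varlena size must be
 * the payload length plus the 4-byte header (VARHDRSZ).
 */
bytea   *res = (bytea *) palloc(VARHDRSZ + estimated_len);
int      len;

if (!kmgr_unwrap_key(UnwrapCtx, (uint8 *) VARDATA_ANY(data), datalen,
                     (uint8 *) VARDATA(res), &len))
    ereport(ERROR, (errmsg("could not unwrap the given secret")));

SET_VARSIZE(res, VARHDRSZ + len);   /* not just 'len' */
PG_RETURN_BYTEA_P(res);
```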
Re: Internal key management system
Hi I would like to share with you a front end patch based on kms_v4.patch that you have shared, called "kms_v4_fe.patch". The patch integrates front end tool pg_waldump with the KMSv4 and obtain encryption and decryption cipher contexts from the KMS backend. These cipher contexts can then be used in subsequent encryption and decryption operations provided by cipher.h when TDE is enabled. I added two common functions in your kmgr_utils that other front end tools affected by TDE can also use to obtain the cipher contexts. Do let me know if this is how you would envision KMS APIs to be used by a front end. cheers Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca On Mon, 24 Feb 2020 17:55:09 -0800 Masahiko Sawada wrote On Thu, 20 Feb 2020 at 16:16, Masahiko Sawada <mailto:masahiko.saw...@2ndquadrant.com> wrote: > > On Wed, 19 Feb 2020 at 09:29, Cary Huang <mailto:cary.hu...@highgo.ca> wrote: > > > > Hi > > > > I have tried the attached kms_v3 patch and have some comments: > > > > 1) In the comments, I think you meant hmac + iv + encrypted data instead of > > iv + hmac + encrypted data? > > > > ---> in kmgr_wrap_key( ): > > + /* > > +* Assemble the wrapped key. The order of the wrapped key is iv, > > hmac and > > +* encrypted data. > > +*/ > > Right, will fix. > > > > > > > 2) I see that create_keywrap_ctx function in kmgr_utils.c and regular > > cipher context init will both call ossl_aes256_encrypt_init to initialise > > context for encryption and key wrapping. In ossl_aes256_encrypt_init, the > > cipher method always initialises to aes-256-cbc, which is ok for keywrap > > because under CBC block cipher mode, we only had to supply one unique IV as > > initial value. But for actual WAL and buffer encryption that will come in > > later, I think the discussion is to use CTR block cipher mode, which > > requires one unique IV for each block, and the sequence id from WAL and > > buffer can be used to derive unique IV for each block for better security? > > I think it would be better to allow caller to decide which EVP_CIPHER to > > initialize? EVP_aes_256_cbc(), EVP_aes_256_ctr() or others? > > > > +ossl_aes256_encrypt_init(pg_cipher_ctx *ctx, uint8 *key) > > +{ > > + if (!EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, NULL, NULL)) > > + return false; > > + if (!EVP_CIPHER_CTX_set_key_length(ctx, PG_AES256_KEY_LEN)) > > + return false; > > + if (!EVP_EncryptInit_ex(ctx, NULL, NULL, key, NULL)) > > + return false; > > + > > + /* > > +* Always enable padding. We don't need to check the return > > +* value as EVP_CIPHER_CTX_set_padding always returns 1. > > +*/ > > + EVP_CIPHER_CTX_set_padding(ctx, 1); > > + > > + return true; > > +} > > It seems good. We can expand it to make caller decide the block cipher > mode of operation and key length. I removed such code from the > previous patch to make it simple since currently we support only > AES-256 CBC. > > > > > 3) Following up point 2), I think we should enhance the enum to include not > > only the Encryption algorithm and key size, but also the block cipher mode > > as well because having all 3 pieces of information can describe exactly how > > KMS is performing the encryption and decryption. So when we call > > "ossl_aes256_encrypt_init", we can include the new enum as input parameter > > and it will initialise the EVP_CIPHER_CTX with either EVP_aes_256_cbc() or > > EVP_aes_256_ctr() for different purposes (key wrapping, or WAL > > encryption..etc). 
> > > > ---> kmgr.h > > +/* Value of key_management_cipher */ > > +enum > > +{ > > + KMGR_CIPHER_OFF = 0, > > + KMGR_CIPHER_AES256 > > +}; > > + > > > > so it would become > > +enum > > +{ > > + KMGR_CIPHER_OFF = 0, > > + KMGR_CIPHER_AES256_CBC = 1, > > + KMGR_CIPHER_AES256_CTR = 2 > > +}; > > > > If you agree with this change, several other places will need to be changed > > as well, such as "kmgr_cipher_string", "kmgr_cipher_value" and the initdb > > code > > KMGR_CIPHER_XXX is relevant with cipher mode used by KMS and KMS would > still use AES256 CBC even if we had TDE which would use AES256 CTR. > > After more t
[PATCH] Documentation bug related to client authentication using TLS certificate
Hi

I found a documentation bug about client authentication using a TLS certificate. When clientcert authentication is enabled in pg_hba.conf, libpq does not verify that the common name in the certificate matches the database username, as described in the documentation, before allowing the client connection. Instead, when sslmode is set to “verify-full”, libpq will verify that the server host name matches the common name in the client certificate. When sslmode is set to “verify-ca”, libpq will verify that the client is trustworthy by checking the certificate trust chain up to the root certificate, and it does not verify that the server hostname and the certificate common name match in this case. The attached patch corrects the clientcert authentication description in the documentation.

cheers

Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

client_cert_auth.patch Description: Binary data
Re: Internal key management system
rephrase based on (2) and uses pg_(un)wrap_secret instead. = Encryption key management System provides several functions described in Table 9.97, to wrap and unwrap user secrets with the Master Encryption Key, which is uniquely and securely stored inside the database cluster. These functions allow user to encrypt and decrypt user data without having to provide user encryption secret in plain text. One possible use case is to use encryption key management together with pgcrypto. User wraps the user encryption secret with pg_wrap_secret() and passes the wrapped encryption secret to the pgcrypto encryption functions. The wrapped secret can be stored in the application server or somewhere secured and should be obtained promptly for cryptographic operations with pgcrypto. [same examples follow after...] ===== Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca On Tue, 25 Feb 2020 12:50:18 -0800 Cary Huang <mailto:cary.hu...@highgo.ca> wrote Hi I would like to share with you a front end patch based on kms_v4.patch that you have shared, called "kms_v4_fe.patch". The patch integrates front end tool pg_waldump with the KMSv4 and obtain encryption and decryption cipher contexts from the KMS backend. These cipher contexts can then be used in subsequent encryption and decryption operations provided by cipher.h when TDE is enabled. I added two common functions in your kmgr_utils that other front end tools affected by TDE can also use to obtain the cipher contexts. Do let me know if this is how you would envision KMS APIs to be used by a front end. cheers Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca On Mon, 24 Feb 2020 17:55:09 -0800 Masahiko Sawada <mailto:masahiko.saw...@2ndquadrant.com> wrote On Thu, 20 Feb 2020 at 16:16, Masahiko Sawada <mailto:masahiko.saw...@2ndquadrant.com> wrote: > > On Wed, 19 Feb 2020 at 09:29, Cary Huang <mailto:cary.hu...@highgo.ca> wrote: > > > > Hi > > > > I have tried the attached kms_v3 patch and have some comments: > > > > 1) In the comments, I think you meant hmac + iv + encrypted data instead of > > iv + hmac + encrypted data? > > > > ---> in kmgr_wrap_key( ): > > + /* > > +* Assemble the wrapped key. The order of the wrapped key is iv, > > hmac and > > +* encrypted data. > > +*/ > > Right, will fix. > > > > > > > 2) I see that create_keywrap_ctx function in kmgr_utils.c and regular > > cipher context init will both call ossl_aes256_encrypt_init to initialise > > context for encryption and key wrapping. In ossl_aes256_encrypt_init, the > > cipher method always initialises to aes-256-cbc, which is ok for keywrap > > because under CBC block cipher mode, we only had to supply one unique IV as > > initial value. But for actual WAL and buffer encryption that will come in > > later, I think the discussion is to use CTR block cipher mode, which > > requires one unique IV for each block, and the sequence id from WAL and > > buffer can be used to derive unique IV for each block for better security? > > I think it would be better to allow caller to decide which EVP_CIPHER to > > initialize? EVP_aes_256_cbc(), EVP_aes_256_ctr() or others? 
> > > > +ossl_aes256_encrypt_init(pg_cipher_ctx *ctx, uint8 *key) > > +{ > > + if (!EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, NULL, NULL)) > > + return false; > > + if (!EVP_CIPHER_CTX_set_key_length(ctx, PG_AES256_KEY_LEN)) > > + return false; > > + if (!EVP_EncryptInit_ex(ctx, NULL, NULL, key, NULL)) > > + return false; > > + > > + /* > > +* Always enable padding. We don't need to check the return > > +* value as EVP_CIPHER_CTX_set_padding always returns 1. > > +*/ > > + EVP_CIPHER_CTX_set_padding(ctx, 1); > > + > > + return true; > > +} > > It seems good. We can expand it to make caller decide the block cipher > mode of operation and key length. I removed such code from the > previous patch to make it simple since currently we support only > AES-256 CBC. > > > > > 3) Following up point 2), I think we should enhance the enum to include not > > only the Encryption algorithm and key size, but also the block cipher mode > > as well because having all 3 pieces of information can describe exactly how > > KMS is performing the encryption and decryption. So when
Re: [PATCH] Documentation bug related to client authentication using TLS certificate
Hi Chris

Thank you for your feedback. You are right: libpq verifies that the server is trustworthy by checking the server certificate, and it checks that the hostname matches the server's common name when sslmode is verify-full. This is already explained in another documentation page, https://www.postgresql.org/docs/current/libpq-ssl.html. After further investigation, I found that the original documentation (https://www.postgresql.org/docs/current/auth-cert.html) is actually right. The server does indeed check that the client certificate CN matches the database user name when the authentication method is set to "cert". Please disregard this patch.

thanks!

Cary

On Mon, 02 Mar 2020 19:23:37 -0800 Chris Bandy wrote

Hi, Cary. On 3/2/20 1:06 PM, Cary Huang wrote: > Hi > > I found a document bug about client authentication using TLS > certificate. When clientcert authentication is enabled in pg_hba.conf, > libpq does not verify that the *common name* in certificate > matches *database username* like it is described in the documentation > before allowing client connection. > > Instead, when sslmode is set to “verify-full”, libpq will verify if the > *server host name* matches the *common name* in client certificate. This sounds incorrect. My understanding is that the *server* host name is always matched with the *server* common name. When > sslmode is set to “verify-ca”, libpq will verify that the client is > trustworthy by checking the certificate trust chain up to the root > certificate and it does not verify *server hostname* and > certificate *common name* match in this case. Similarly, libpq will verify the *server* is trustworthy by checking the *server* certificate up to the root. It does not verify that the host name matches the common name in the *server* certificate. In all cases, libpq is responsible for verifying the *server* is who it claims to be. -- Chris
Re: Improve pg_dump dumping publication tables
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:not tested

Hi

I applied the patch to the PG master branch successfully and the patch seems to do as described. If there are a lot of tables created in Postgres and only a few of them have publications, the getPublicationTables() function will loop through all the tables and check each one to see if it is the desired relation type and if it contains one or more publications. So potentially a lot of looping and checking may happen. In John's patch, all of this looping and checking is replaced with one query that returns all the tables that have publications. One problem, though, is that if there are a million tables created in the server, like you say, and all of them have publications, the query can return a million rows, which will require a lot of memory on the client side, and the pg_dump client may need to handle this extreme case with proper pagination, which adds complexity to the function's logic. The existing logic does not have this problem, because it simply queries each table's publications one at a time and does so a million times.

thank you

Cary Huang HighGo Software
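To illustrate the pagination idea mentioned above (this is not how pg_dump is actually structured; the query text and cursor name are placeholders), a libpq client could fetch the result in batches through a cursor instead of holding every row in memory at once:

```
#include <stdio.h>
#include <libpq-fe.h>

/* Fetch tables that have publications in batches of 1000 rows. */
static void
fetch_publication_tables(PGconn *conn)
{
    PGresult *res;

    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn,
                   "DECLARE pubrel_cur CURSOR FOR "
                   "SELECT prrelid FROM pg_publication_rel"));

    for (;;)
    {
        int ntuples;

        res = PQexec(conn, "FETCH 1000 FROM pubrel_cur");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }

        ntuples = PQntuples(res);
        if (ntuples == 0)
        {
            PQclear(res);
            break;              /* no more rows */
        }

        for (int i = 0; i < ntuples; i++)
            printf("relation oid: %s\n", PQgetvalue(res, i, 0));

        PQclear(res);
    }

    PQclear(PQexec(conn, "CLOSE pubrel_cur"));
    PQclear(PQexec(conn, "COMMIT"));
}
```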
minor problem in boolean cast
Hi

I noticed that when casting the string 'of' to a boolean value, it is still cast to 'f'. I think 'of' should give an error, because 'off' is the expected candidate. This may not be intended, so I made a simple patch to address it.

```
postgres=# select cast('of' as boolean);
 bool
------
 f
(1 row)
```

Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca

0001-boolean-type-cast-fix.patch Description: Binary data
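For context, the acceptance of 'of' comes from prefix matching on boolean literals. Below is a simplified, illustrative sketch of that logic (not the actual src/backend/utils/adt/bool.c code): any prefix of 'off' longer than one character is accepted today, which is the behaviour the patch proposes to tighten.

```
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/*
 * Simplified sketch of boolean literal parsing: returns true and sets
 * *result if 'value' (of length 'len') is recognized.  Because prefixes
 * are accepted, "of" matches "off" and parses as false today.
 */
static bool
parse_bool_sketch(const char *value, size_t len, bool *result)
{
    switch (value[0])
    {
        case 't':
        case 'T':
            if (strncasecmp(value, "true", len) == 0)
            {
                *result = true;
                return true;
            }
            break;
        case 'f':
        case 'F':
            if (strncasecmp(value, "false", len) == 0)
            {
                *result = false;
                return true;
            }
            break;
        case 'o':
        case 'O':
            /* "on" and "off" share the prefix "o", so require len > 1 */
            if (strncasecmp(value, "on", len) == 0 && len > 1)
            {
                *result = true;
                return true;
            }
            /* "of" passes this test; the patch would require the full word */
            if (strncasecmp(value, "off", len) == 0 && len > 1)
            {
                *result = false;
                return true;
            }
            break;
        default:
            break;
    }
    return false;               /* not a valid boolean literal */
}
```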
Re: sslinfo extension - add notbefore and notafter timestamps
> > The recent bump in minimum required versions of OpenSSL and LibreSSL made me
> > remember to revisit this patch which was previously reverted due to library
> > incompatibility (with *both* OpenSSL and LibreSSL on different APIs).
> >
> > The attached removes the timestamp conversion workaround which is no longer
> > needed.
>
> The patch was marked as ready for committer and is currently failing
> in the CI. I've moved it to next CF waiting on author. Could you
> provide a rebase?

Since the minimum OpenSSL version is now 1.1.1, the v13 patch would fail the CI because it uses the old APIs to obtain the notBefore and notAfter timestamps:

- X509_get_notBefore
- X509_get_notAfter

which were deprecated in OpenSSL 1.1.0... Instead, it should use:

- X509_get0_notBefore
- X509_get0_notAfter

which are available in version 1.1.1 and beyond. These APIs now return a "const ASN1_TIME*" instead of "ASN1_TIME*". The changes below should fix the CI failure when applied to the v13 patch:

--- a/contrib/sslinfo/sslinfo.c
+++ b/contrib/sslinfo/sslinfo.c
-static Datum ASN1_TIME_to_timestamptz(ASN1_TIME *time);
+static Datum ASN1_TIME_to_timestamptz(const ASN1_TIME *time);
-ASN1_TIME_to_timestamptz(ASN1_TIME *ASN1_cert_ts)
+ASN1_TIME_to_timestamptz(const ASN1_TIME *ASN1_cert_ts)
- return ASN1_TIME_to_timestamptz(X509_get_notBefore(cert));
+ return ASN1_TIME_to_timestamptz(X509_get0_notBefore(cert));
- return ASN1_TIME_to_timestamptz(X509_get_notAfter(cert));
+ return ASN1_TIME_to_timestamptz(X509_get0_notAfter(cert));

--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
-static TimestampTz ASN1_TIME_to_timestamptz(ASN1_TIME *time);
+static TimestampTz ASN1_TIME_to_timestamptz(const ASN1_TIME *time);
-ASN1_TIME_to_timestamptz(ASN1_TIME *ASN1_cert_ts)
+ASN1_TIME_to_timestamptz(const ASN1_TIME *ASN1_cert_ts)
-*ptr = ASN1_TIME_to_timestamptz(X509_get_notBefore(port->peer));
+*ptr = ASN1_TIME_to_timestamptz(X509_get0_notBefore(port->peer));
- *ptr = ASN1_TIME_to_timestamptz(X509_get_notAfter(port->peer));
+ *ptr = ASN1_TIME_to_timestamptz(X509_get0_notAfter(port->peer));

Can you provide a rebase with the above changes?

Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca
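As a side note, OpenSSL 1.1.1 also provides ASN1_TIME_to_tm(), so a minimal conversion from the const ASN1_TIME* returned by the new accessors could look like the sketch below. This is illustrative only: the actual patch converts to TimestampTz rather than time_t, and timegm() is assumed to be available on the platform.

```
#include <time.h>
#include <openssl/x509.h>
#include <openssl/asn1.h>

/* Illustrative: fetch a certificate's notBefore as a time_t. */
static int
cert_not_before(X509 *cert, time_t *out)
{
    const ASN1_TIME *ts = X509_get0_notBefore(cert);
    struct tm   tm;

    if (ts == NULL || !ASN1_TIME_to_tm(ts, &tm))
        return 0;

    *out = timegm(&tm);         /* ASN.1 times are UTC */
    return 1;
}
```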
Re: sslinfo extension - add notbefore and notafter timestamps
The attached v14 is the rebased patch that updates the OpenSSL API calls to be compatible with version 1.1.1 (and beyond), which is the new minimum OpenSSL version supported in PostgreSQL.

Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca

v14-0001-Add-notBefore-and-notAfter-to-SSL-cert-info-disp.patch Description: Binary data
Re: encode/decode support for base64url
> Oh well - you're probably right.
> I guess I was blinded by my convenience.
> Adding a 'base64url' option there is more appropriate.

I agree too. It is neater to add "base64url" as a new option for the encode() and decode() SQL functions in encode.c. In addition, you may also want to add C versions of the base64url encode and decode functions to "src/common/base64.c" as new API calls, so that frontend and backend applications and extensions can also have access to these base64url conversions.

Cary Huang - HighGo Software Inc. (Canada) cary.hu...@highgo.ca www.highgo.ca
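A minimal sketch of what such a helper could do, assuming a hypothetical function name and an already standard-base64-encoded buffer (for example produced by the existing encoder in src/common/base64.c): base64url simply swaps '+' for '-', '/' for '_', and drops the '=' padding.

```
#include <stddef.h>

/*
 * Hypothetical helper: convert a standard base64 string (in place) into its
 * base64url form.  Returns the new length, which can shrink because the
 * trailing '=' padding characters are removed.
 */
static size_t
b64_to_b64url_inplace(char *buf, size_t len)
{
    size_t      out = 0;

    for (size_t i = 0; i < len; i++)
    {
        char        c = buf[i];

        if (c == '+')
            c = '-';
        else if (c == '/')
            c = '_';
        else if (c == '=')
            break;              /* padding only appears at the end */
        buf[out++] = c;
    }
    return out;
}
```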
Re: Support tid range scan in parallel?
Hi Junwang Thank you so much for the review! > + /* > + * if parallel mode is used, store startblock and numblocks in parallel > + * scan descriptor as well. > + */ > + if (scan->rs_base.rs_parallel != NULL) > + { > + ParallelBlockTableScanDesc bpscan = NULL; > + > + bpscan = (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel; > + bpscan->phs_startblock = scan->rs_startblock; > + bpscan->phs_numblock = scan->rs_numblocks; > + } > > It would be more intuitive and clearer to directly use startBlk and numBlks > to set these values. Since scan->rs_startblock and scan->rs_numblocks > are already set using these variables, using the same approach for bpscan > would make the code easier to understand. > > Another nitty-gritty is that you might want to use a capital `If` in the > comments to maintain the same style. Agreed, made the adjustment in the attached patch. > + if (nallocated >= pbscan->phs_nblocks || (pbscan->phs_numblock != 0 && > + nallocated >= pbscan->phs_numblock)) > > I'd suggest explictly setting phs_numblock to InvalidBlockNumber in > table_block_parallelscan_initialize, and compare with InvalidBlockNumber > here. Also agreed, phs_numblock should be initialized in table_block_parallelscan_initialize just like all other parameters in parallel scan context. You are right, it is much neater to use InvalidBlockNumber rather than 0 to indicate if an upper bound has been specified in the TID range scan. I have addressed your comment in the attached v6 patch. Thank you again for the review. Best regards Cary Huang v6-0001-add-parallel-tid-rangescan.patch Description: Binary data
Re: sslinfo extension - add notbefore and notafter timestamps
On Mon, 10 Mar 2025 12:29:28 -0700 Cary Huang wrote ---
> The attached v14 is the rebased patch that updates the OpenSSL API calls to be compatible with
> version 1.1.1 (and beyond), which is the new minimum OpenSSL version supported in PostgreSQL.
>
>
> Cary Huang
> -
> HighGo Software Inc. (Canada)
> cary.hu...@highgo.ca
> www.highgo.ca

Attached is the rebased (v15) patch.

Regards

Cary Huang

v15-0001-Add-notBefore-and-notAfter-to-SSL-cert-info-disp.patch Description: Binary data
Re: Support tid range scan in parallel?
Hello

Sorry David and all who have reviewed the patch, it's been a while since the patch was last worked on :(. Thank you all for the reviews and comments! Attached is the rebased patch that adds support for parallel TID range scans. This feature is particularly useful for scanning large tables in sizable segments, using a TID range in the WHERE clause to define each segment. By enabling parallelism, this approach can improve performance compared to both non-parallel TID range scans and traditional sequential scans.

Regards

Cary Huang

v5-0001-add-parallel-tid-rangescan.patch Description: Binary data
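As a usage illustration of the segmented-scan pattern described above (the table name and segment size are arbitrary; the queries run unchanged with or without this patch, which only affects whether each one can use parallel workers), a libpq client might walk a large table in TID-range segments like this:

```
#include <stdio.h>
#include <libpq-fe.h>

/* Scan 'tbl' in segments of 4096 heap blocks using ctid range predicates. */
static void
scan_in_tid_segments(PGconn *conn, unsigned int total_blocks)
{
    const unsigned int seg = 4096;

    for (unsigned int start = 0; start < total_blocks; start += seg)
    {
        char        sql[256];
        PGresult   *res;

        snprintf(sql, sizeof(sql),
                 "SELECT count(*) FROM tbl "
                 "WHERE ctid >= '(%u,1)' AND ctid < '(%u,1)'",
                 start, start + seg);

        res = PQexec(conn, sql);
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("blocks %u..%u: %s rows\n",
                   start, start + seg - 1, PQgetvalue(res, 0, 0));
        PQclear(res);
    }
}
```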
Re: Support tid range scan in parallel?
Hi Steven

thanks for the review!

> I have two comments:
> 1. Does table_beginscan_parallel_tidrange() need an assert of relid,
> like what table_beginscan_parallel() did?
> Assert(RelationGetRelid(relation) == pscan->phs_relid);

In the v6 rebased patch, the assert has become:

Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));

rather than:

Assert(RelationGetRelid(relation) == pscan->phs_relid);

table_beginscan_parallel_tidrange() already has the proper assert line, similar to what table_beginscan_parallel() has.

> 2. The new field phs_numblock in ParallelBlockTableScanDescData
> structure has almost the same name as another field phs_nblocks. Would
> you consider changing it to another name, for example,
> phs_maxnumblocktoscan?

I actually had a similar thought: the names phs_nblocks and phs_numblock are very similar, but the fields are quite different. I still left the name as phs_numblock because I want to keep it consistent (kind of) with the 'numBlks' used in heap_set_tidrange() in heapam.c. The comments beside their declarations should be enough to describe the difference without causing confusion.

Best regards

Cary
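For reference, a sketch of how the two fields could be documented where they are declared (illustrative comments and field ordering, not the patch's exact wording), which is what the reply above relies on to avoid confusion:

```
/* Illustrative excerpt of ParallelBlockTableScanDescData, not verbatim */
typedef struct ParallelBlockTableScanDescData
{
    /* ... other parallel scan fields ... */
    BlockNumber phs_nblocks;    /* # blocks in relation at start of scan */
    BlockNumber phs_startblock; /* starting block number of the scan */
    BlockNumber phs_numblock;   /* max # blocks to scan for a TID range
                                 * scan, or InvalidBlockNumber for no limit */
} ParallelBlockTableScanDescData;
```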
Re: Support tid range scan in parallel?
Hi David Thank you so much for the review! I have addressed the comments in the attached v7 patch. > 1. In cost_tidrangescan() you're dividing the total costs by the > number of workers yet the comment is claiming that's CPU cost. I think > this needs to follow the lead of cost_seqscan() and separate out the > CPU and IO cost then add the IO cost at the end, after the divide by > the number of workers. I have separated the costs into disk and CPU costs similar to the style in cost_seqscan(). > 2. In execParallel.c, could you move the case for T_TidRangeScanState > below T_ForeignScanState? What you have right now is now quite > following the unwritten standard set out by the other nodes, i.e > non-parallel aware nodes are last. A good spot seems to be putting it > at the end of the scan types... Custom scan seems slightly misplaced, > but probably can ignore that one and put it after T_ForeignScanState Yes, it's been done. > 3. The following comment should mention what behaviour occurs when the > field is set to InvalidBlockNumber: Also addressed > 4. I think the following would be clearer if written using an else if: > > if (nallocated >= pbscan->phs_nblocks || (pbscan->phs_numblock != > InvalidBlockNumber && nallocated >= pbscan->phs_numblock)) > page = InvalidBlockNumber; /* all blocks have been allocated */ > else > page = (nallocated + pbscan->phs_startblock) % pbscan->phs_nblocks; > > e.g: > > if (nallocated >= pbscan->phs_nblocks) > page = InvalidBlockNumber; /* all blocks have been allocated */ > else if (pbscan->phs_numblock != InvalidBlockNumber && > nallocated >= pbscan->phs_numblock) > page = InvalidBlockNumber; /* upper scan limit reached */ > else > page = (nallocated + pbscan->phs_startblock) % pbscan->phs_nblocks; > > That way the comment after the assignment is accurate. Agreed, and also addressed. > 5. For the tests, is there any reason not to reuse the tidrangescan table? To test TID range scan in parallel, I have a new table created with very low fill factor such that there will be more pages created with small and fixed amount of tuples in each. Then, the test would do SELECT COUNT(*) to ensure correct amount of tuples are being counted by the parallel workers during parallel TID range scan. With this new table, it is easy to ensure the count is correct since we know how many tuples exist in each page and how many pages are to be scanned based on the WHERE predicates. Reusing the tidrangescan table would be hard to tell if the count is correct in my case. thank you! Cary. v7-0001-add-parallel-tid-rangescan.patch Description: Binary data
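To illustrate point 1 above (separating CPU and disk cost and only dividing the CPU portion among the workers), here is a rough fragment modeled on cost_seqscan(); the variable names are indicative only and this is not the actual cost_tidrangescan() code:

```
/* Rough sketch of splitting CPU and disk cost, modeled on cost_seqscan() */
disk_run_cost = spc_seq_page_cost * pages_fetched;
cpu_run_cost = cpu_per_tuple * ntuples;

if (path->parallel_workers > 0)
{
    double      parallel_divisor = get_parallel_divisor(path);

    /* The CPU cost is shared among the workers ... */
    cpu_run_cost /= parallel_divisor;

    /* ... and each worker returns only a fraction of the rows. */
    path->rows = clamp_row_est(path->rows / parallel_divisor);
}

/* Disk cost is not divided: the same pages are read regardless. */
path->startup_cost = startup_cost;
path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
```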