Re: Protect syscache from bloating with negative cache entries
Hello,

my_gripe> But, still fluctulates by around 5%..
my_gripe>
my_gripe> If this level of the degradation is still not acceptable, that
my_gripe> means that nothing can be inserted in the code path and the new
my_gripe> code path should be isolated from existing code by using indirect
my_gripe> call.

Finally, after some struggle, I think I have managed to measure the impact on
performance precisely and reliably. Starting from "make distclean" for every
build, then removing everything in $TARGET before installation, makes things
stable enough. (I don't think that should be necessary, but I didn't
investigate the cause..)

I measured the time per call by directly calling SearchSysCache3() many
times. It showed that the patch causes around 0.1 microsecond of degradation
per call. (The function overall took about 6.9 microseconds on average.)

Next, I counted how many times SearchSysCache is called while planning, as an
example, a query on a partitioned table having 3000 columns and 1000
partitions:

  explain analyze select sum(c) from test.p;

The planner made 6020608 syscache calls while planning, and the overall
planning time was 8641 ms. (Execution time was 48 ms.) 6020608 times 0.1 us
is 602 ms of degradation, so roughly -7% degradation in planning time by this
estimate.

The degradation was caused by really only two successive instructions,
"ADD / conditional MOVE (CMOVE)". That fact leads to the conclusion that the
existing code path, as is, doesn't have room for any additional code.

So I looked for room for at least one branch, and found it (on gcc
7.3.1/CentOS7/x64). Interestingly, de-inlining SearchCatCacheInternal gave me
a performance gain of about 3%. Further inlining of
CatalogCacheComputeHashValue() gave another gain of about 3%. I could add a
branch in SearchCatCacheInternal within that gain. I also tried indirect
calls, but the degradation overwhelmed the gain, so I chose branching rather
than indirect calls. I didn't investigate why that happens. (A sketch of the
two approaches follows at the end of this mail.)

The following is the result. The binaries are built with the same
configuration using -O2.

binary means
  master      : master HEAD.
  patched_off : patched, but pruning disabled (catalog_cache_prune_min_age = -1).
  patched_on  : patched with pruning enabled.
                ("300s" for bench 1, "1s" for bench 2, "0" for bench 3)

bench:
  1: corresponds to catcachebench(1); fetching STATRELATTINH 3000 * 1000
     times, generating new cache entries. (Massive cache creation.)
     Pruning doesn't happen while running this.
  2: catcachebench(2); 6 rounds of cache access on 1000 STATRELATTINH
     entries. (Frequent cache reference.) Pruning doesn't happen while
     running this.
  3: catcachebench(3); fetching 1000 (tbls) * 3000 (cols) STATRELATTINH
     entries. The catcache clock advances at an interval of 100 (tbls) *
     3000 (cols) accesses and pruning happens. While running
     catcachebench(3) once, pruning happens 28 times; most of the time
     202202 entries are removed, and the total number of entries was
     limited to 524289. (The systable has 3000 * 1001 = 3003000 tuples.)

iter: number of iterations. The time in ms and stddev are calculated over the
iterations.

 binary      | bench | iter |  time ms | stddev
-------------+-------+------+----------+--------
 master      |     1 |   10 |  8150.30 |  12.96
 master      |     2 |   10 |  4002.88 |  16.18
 master      |     3 |   10 |  9065.06 |  11.46
-------------+-------+------+----------+--------
 patched_off |     1 |   10 |  8090.95 |   9.95
 patched_off |     2 |   10 |  3984.67 |  12.33
 patched_off |     3 |   10 |  9050.46 |   4.64
-------------+-------+------+----------+--------
 patched_on  |     1 |   10 |  8158.95 |   6.29
 patched_on  |     2 |   10 |  4023.72 |  10.41
 patched_on  |     3 |   10 | 16532.66 |  18.39

patched_off is slightly faster than master. patched_on is generally a bit
slower.
Even though patched_on for bench 3 seems to take too long, the extra time
comes from the increased catalog table access traded for the memory saving.
(That is, it is expected behavior.) I ran it several times and most of them
showed the same tendency.

As a side effect, once the branch is added, the shared syscache from a
neighbouring thread will also be able to be inserted without impact on the
existing code path.

===
The benchmark script is used as follows:

- Create many (3000, as an example) tables in the "test" schema. I created a
  partitioned table with 3000 children.
- The tables have many columns, 1000 for me.
- Run the following commands:

  =# select catcachebench(0);  -- warm up systables
  =# set catalog_cache_prune_min_age = any;  -- as required
  =# select catcachebench(n);  -- 1 <= n <= 3, the number of "bench" above

The above result is taken with the following query:

  =# select 'patched_on', '3', count(a), avg(a)::numeric(10,2),
            stddev(a)::numeric(10,2)
       from (select catcachebench(3) from generate_series(1, 10)) as a(a);

The attached patches are: 0001-Adjust-inlining-of-some-f
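For illustration, here is a minimal, self-contained sketch of the two
dispatch shapes compared above -- a branch on a flag versus an indirect call
through a hook pointer. Every name and type below is invented for this
sketch; it is not the code in the attached patches.

#include <stddef.h>

typedef struct CacheEntry
{
	unsigned int lastaccess;	/* hypothetical per-entry clock stamp */
} CacheEntry;

static int	prune_min_age = -1;	/* -1 means pruning disabled */
static unsigned int cache_clock = 0;

/* (a) branch on a flag: roughly the extra ADD plus conditional move */
static inline void
touch_entry_branch(CacheEntry *ce)
{
	if (prune_min_age >= 0)
		ce->lastaccess = cache_clock;
}

/* (b) indirect call through a hook pointer: an extra call per lookup */
typedef void (*touch_hook_type) (CacheEntry *ce);
static touch_hook_type touch_hook = NULL;

static inline void
touch_entry_indirect(CacheEntry *ce)
{
	if (touch_hook != NULL)
		touch_hook(ce);
}

Shape (a) is what the lookup path could absorb after the inlining
adjustments; shape (b) is the variant whose call overhead overwhelmed the
gain in my runs.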
Re: Superfluous libpq-be.h include in GSSAPI code
> On 29 Jun 2019, at 04:23, Michael Paquier wrote: > > On Fri, Jun 28, 2019 at 08:47:33PM +0200, Julien Rouhaud wrote: >> On Fri, Jun 28, 2019 at 4:37 PM Daniel Gustafsson wrote: >>> backend/libpq/be-secure-gssapi.c is including both libpq-be.h and libpq.h, >>> which makes libpq-be.h superfluous as it gets included via libpq.h. The >>> attached patch removes the inclusion of libpq-be.h to make >>> be-secure-gssapi.c >>> behave like other files which need both headers. >> >> LGTM. > > Thanks, committed. I looked at the area in case but did not notice > anything else strange. Thanks! > (We have in hba.h a kludge with hbaPort to avoid including libpq-be.h, > I got to wonder if we could do something about that..) I looked at that one too at the time, but didn’t come up with anything less kludgy. cheers ./daniel
Re: POC: Cleaning up orphaned files using undo logs
On Mon, Jul 1, 2019 at 7:53 PM Thomas Munro wrote:
> 3. Recognise UNDO_SHARED record set boundaries differently.  Whereas
> undolog.c recognises transaction boundaries automatically for the
> other categories (UNDO_PERMANENT, UNDO_UNLOGGED, UNDO_TEMP), for
> UNDO_SHARED the ...

... set of records inserted in between calls to BeginUndoRecordInsert() and
FinishUndoRecordInsert() is eventually discarded as a unit, and the
rm_undo_status() callback for the calling AM decides when that is allowed. In
contrast, for the other categories there may be records from any number of
undo-aware AMs that are entirely unaware of each other, and they must all be
discarded together if the transaction commits and becomes all-visible, so
undolog.c automatically manages the boundaries to make that work when
inserting.

--
Thomas Munro
https://enterprisedb.com
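To make that contract a bit more concrete, here is a small conceptual sketch.
All of the types, names and signatures below are invented for illustration;
the actual undo patch set defines its own, different interfaces (the real
per-AM hook is the rm_undo_status() callback mentioned above).

#include <stdbool.h>

typedef enum
{
	UNDO_SET_KEEP,				/* record set still needed */
	UNDO_SET_DISCARDABLE		/* whole set may be dropped */
} UndoSetStatus;

/* per-AM callback: only the inserting AM knows when its set is obsolete */
typedef UndoSetStatus (*undo_status_callback) (const void *record_set);

typedef struct SharedUndoRecordSet
{
	undo_status_callback status_cb; /* consulted before discarding */
	/* records appended between "begin" and "finish" would live here */
} SharedUndoRecordSet;

/*
 * A discard worker would treat everything inserted between the begin/finish
 * calls as one unit, dropping it only when the callback allows that.
 */
static bool
shared_set_discardable(const SharedUndoRecordSet *set)
{
	return set->status_cb(set) == UNDO_SET_DISCARDABLE;
}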
Re: Add parallelism and glibc dependent only options to reindexdb
On Sun, Jun 30, 2019 at 11:45:47AM +0200, Julien Rouhaud wrote:
> With the glibc 2.28 coming, all users will have to reindex almost
> every indexes after a glibc upgrade to guarantee the lack of
> corruption. Unfortunately, reindexdb is not ideal for that as it's
> processing everything using a single connexion and isn't able to
> discard indexes that doesn't depend on a glibc collation.

We have seen that with a database of up to 100GB we finish by cutting the
reindex time from 30 minutes to a couple of minutes with a schema we work on.
Julien, what were the actual numbers?

> PFA a patchset to add parallelism to reindexdb (reusing the
> infrastructure in vacuumdb with some additions) and an option to
> discard indexes that doesn't depend on glibc (without any specific
> collation filtering or glibc version detection), with updated
> regression tests. Note that this should be applied on top of the
> existing reindexdb cleanup & refactoring patch
> (https://commitfest.postgresql.org/23/2115/).

Please note that patch 0003 does not seem to apply correctly on HEAD as of
c74d49d4. Here is also a small description of each patch:

- 0001 refactors the connection slot facility from vacuumdb.c into a new,
  separate file called parallel.c in src/bin/scripts/. This is not really
  fancy as some code only moves around.

- 0002 adds an extra option for simple lists to be able to use pointers, with
  an interface to append elements in it (see the sketch at the end of this
  message).

- 0003 begins to be the actual fancy thing with the addition of a --jobs
  option into reindexdb. The main issue here which should be discussed is
  that when it comes to reindex of tables, you basically are not going to
  have any conflicts between the objects manipulated. However if you wish to
  do a reindex on a set of indexes then things get more tricky, as it is
  necessary to list items per-table so that multiple connections do not
  conflict with each other if attempting to work on multiple indexes of the
  same table. What this patch does is to select the set of indexes which need
  to be worked on (see the addition of cell in ParallelSlot), and then does a
  kind of pre-planning of each item into the connection slots so that each
  connection knows from the beginning which items it needs to process. This
  is quite different from vacuumdb, where a new item is distributed only to a
  free connection from a unique list. I'd personally prefer if we keep the
  facility in parallel.c so that it is only execution-dependent and we have
  no pre-planning. This would require keeping within reindexdb.c an array of
  lists, with one list corresponding to one connection instead, which feels
  more natural.

- 0004 is the part where the concurrent additions really matter, as this
  consists in applying an extra filter to the indexes selected so that only
  the glibc-sensitive indexes are chosen for the processing.

--
Michael

signature.asc
Description: PGP signature
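As a reference point for the 0002 item above, a pointer-carrying simple list
could look roughly like the sketch below, in the style of the existing
SimpleStringList. The names here are assumptions for illustration and may not
match what the patch actually adds.

#include <stdlib.h>

typedef struct SimplePtrListCell
{
	struct SimplePtrListCell *next;
	void	   *ptr;			/* caller-owned payload */
} SimplePtrListCell;

typedef struct SimplePtrList
{
	SimplePtrListCell *head;
	SimplePtrListCell *tail;
} SimplePtrList;

static void
simple_ptr_list_append(SimplePtrList *list, void *ptr)
{
	SimplePtrListCell *cell = malloc(sizeof(SimplePtrListCell));

	if (cell == NULL)
		abort();				/* real frontend code would use pg_malloc() */
	cell->next = NULL;
	cell->ptr = ptr;

	if (list->tail)
		list->tail->next = cell;
	else
		list->head = cell;
	list->tail = cell;
}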
Re: [PATCH] Add support for ON UPDATE/DELETE actions on ALTER CONSTRAINT
On Tue, Mar 19, 2019 at 1:04 PM Matheus de Oliveira wrote: > Sorry for the long delay. I've rebased the patch to current master (at > f2004f19ed now), attached as postgresql-alter-constraint.v4.patch. All tests > passed cleanly. Hi Matheus, As the commitfest is starting, could you please send a rebased patch? Thanks, -- Thomas Munro https://enterprisedb.com
Re: Re: [PATCH][PROPOSAL] Add enum releation option type
On Mon, Mar 25, 2019 at 9:39 PM David Steele wrote: > It's not clear to me that all of Michael's and Álvaro's issues have been > addressed, so I've marked this Waiting on Author. Hi Nikolay, To help reviewers for the Commitfest that is starting, could you please rebase this patch? -- Thomas Munro https://enterprisedb.com
Re: Option to dump foreign data in pg_dump
> On 28 Jun 2019, at 17:30, Tom Lane wrote: > > Yeah, I think the feature as-proposed is a shotgun that's much more likely > > to cause problems than solve them. Almost certainly, what people would > > really need is the ability to dump individual foreign tables' data not > > everything. (I also note that the main reason for "dump everything", > > namely to get a guaranteed-consistent snapshot, isn't really valid for > > foreign tables anyhow.) I think this is sort of key here, the consistency guarantees are wildly different. A note about this should perhaps be added to the docs for the option discussed here? > On 28 Jun 2019, at 19:55, Luis Carril wrote: > What about providing a list of FDW servers instead of an all or nothing > option? In that way the user really has to do a conscious decision to dump > the content of the foreign tables for a specific server, this would allow > distinction if multiple FDW are being used in the same DB. I think this is a good option, the normal exclusion rules can then still apply in case not everything from a specific server is of interest. cheers ./daniel
Re: [HACKERS] advanced partition matching algorithm for partition-wise join
On Sat, May 18, 2019 at 12:20 AM Robert Haas wrote: > On Tue, May 14, 2019 at 12:24 AM Amit Langote > wrote: > > He did mention that cases where the nullable side is provably empty can be > > handled by simply returning the path of the non-nullable side with > > suitable projection path added on top to emit NULLs for the columns of the > > nullable-side. If we teach populate_joinrel_with_paths() and underlings > > about that, then we can allow partitionwise join even in the case where > > the nullable side has some partitions missing. > > Yes, I think that would be a good approach to pursue. Hi Ashutosh, Amul, Fujita-san, Could we please have a fresh rebase for the new Commitfest? Thanks, -- Thomas Munro https://enterprisedb.com
Re: [HACKERS] Cached plans and statement generalization
On Wed, Apr 10, 2019 at 12:52 AM Konstantin Knizhnik wrote: > New version of the patching disabling autoprepare for rules and handling > planner error. Hi Konstantin, This doesn't apply. Could we please have a fresh rebase for the new Commitfest? Thanks, -- Thomas Munro https://enterprisedb.com
Re: Attempt to consolidate reading of XLOG page
On Tue, May 21, 2019 at 9:12 PM Antonin Houska wrote: > Robert Haas wrote: > > It seems to me that it's better to unwind the stack i.e. have the > > function return the error information to the caller and let the caller > > do as it likes. > > Thanks for a hint. The next version tries to do that. Hi Antonin, Could you please send a fresh rebase for the new Commitfest? Thanks, -- Thomas Munro https://enterprisedb.com
Re: MSVC Build support with visual studio 2019
On Thu, 27 Jun 2019 at 17:28, Michael Paquier wrote: > On Wed, Jun 26, 2019 at 10:29:05PM +1000, Haribabu Kommi wrote: > > Thanks for the review. Yes, that patch applies till 9.5, it is my mistake > > in naming the patch. > > I have been able to finally set up an environment with VS 2019 (as > usual this stuff needs time, anyway..), and I can confirm that the > patch is able to compile properly. > Thanks for the review. > - Visual Studio 2017 (including Express > editions), > - as well as standalone Windows SDK releases 6.0 to 8.1. > + Visual Studio 2019 (including Express > editions), > + as well as standalone Windows SDK releases 8.1a to 10. > I would like to understand why this range of requirements is updated. > Is there any reason to do so. If we change these docs, what does it > mean in terms of versions of Visual Studio supported? > We stopped the support of building with all the visual studio versions less than 2013. I updated the SDK versions accordingly. > -or a VS2015Solution or a VS2017Solution, all in Solution.pm, depending on > -the user's build environment) and adding objects implementing the > corresponding > -Project interface (VC2013Project or VC2015Project or VC2017Project from > -MSBuildProject.pm) to it. > +or a VS2015Solution or a VS2017Solution or a VS2019Solution, all in > Solution.pm, > +depending on the user's build environment) and adding objects implementing > +the corresponding Project interface (VC2013Project or VC2015Project or > VC2017Project > +or VC2019Project from MSBuildProject.pm) to it. > This formulation is weird the more we accumulate new objects, let's > put that in a proper list of elements separated with commas except > for the two last ones which should use "or". > > s/greather/greater/. > > The patch still has typos, and the format is not satisfying yet, so I > have done a set of fixes as per the attached. > The change in the patch is good. > > - elsif ($major < 6) > + elsif ($major < 12) > { > croak > - "Unable to determine Visual Studio version: > Visual Studio versions before 6.0 aren't supported."; > + "Unable to determine Visual Studio version: > Visual Studio versions before 12.0 aren't supported."; > Well, this is a separate bug fix, still I don't mind fixing that in > the same patch as we meddle with those code paths now. Good catch. > > - croak $visualStudioVersion; > + carp $visualStudioVersion; > Same here. Just wouldn't it be better to print the version found in > the same message? > Yes, that is a good change, I thought of doing the same, but I don't know how to do it. The similar change is required for the CreateProject also. > + # The major visual studio that is supported has nmake version >= > 14.20 and < 15. > if ($major > 14) > Comment line is too long. It seems to me that the condition here > should be ($major >= 14 && $minor >= 30). That's not completely > correct either as we have a version higher than 14.20 for VS 2019 but > that's better than just using 14.29 or a fake number I guess. > The change is good, but the comment is wrong. + # The major visual studio that is supported has nmake + # version >= 14.30, so stick with it as the latest version The major visual studio version that is supported has nmake version <=14.30 Except for the above two changes, overall the patch is in good shape. Regards, Haribabu Kommi
Re: Built-in connection pooler
On Thu, Mar 21, 2019 at 4:33 AM Konstantin Knizhnik wrote: > New version of the patch (rebased + bug fixes) is attached to this mail. Hi again Konstantin, Interesting work. No longer applies -- please rebase. -- Thomas Munro https://enterprisedb.com
Re: Libpq support to connect to standby server as priority
On Mon, 3 Jun 2019 at 16:32, Haribabu Kommi wrote: > > > On Thu, Apr 11, 2019 at 9:13 AM Haribabu Kommi > wrote: > >> I fixed all the comments that you have raised above and attached the >> updated >> patches. >> > > Rebased patches are attached. > Rebased patches are attached. Regards, Haribabu Kommi 0001-Restructure-the-code-to-remove-duplicate-code.patch Description: Binary data 0002-New-TargetSessionAttrsType-enum.patch Description: Binary data 0003-Make-transaction_read_only-as-GUC_REPORT-variable.patch Description: Binary data 0004-New-prefer-read-target_session_attrs-type.patch Description: Binary data 0005-New-read-only-target_session_attrs-type.patch Description: Binary data 0006-Primary-prefer-standby-and-standby-options.patch Description: Binary data 0007-New-function-to-rejecting-the-checked-write-connecti.patch Description: Binary data 0008-Server-recovery-mode-handling.patch Description: Binary data
Re: [HACKERS] Custom compression methods
On Sat, Mar 16, 2019 at 12:52 AM Ildus Kurbangaliev wrote: > in my opinion this patch is usually skipped not because it is not > needed, but because of its size. It is not hard to maintain it until > commiters will have time for it or I will get actual response that > nobody is going to commit it. Hi Ildus, To maximise the chances of more review in the new Commitfest that is about to begin, could you please send a fresh rebase? This doesn't apply anymore. Thanks, -- Thomas Munro https://enterprisedb.com
Re: Add parallelism and glibc dependent only options to reindexdb
On Mon, Jul 1, 2019 at 10:55 AM Michael Paquier wrote:
>
> On Sun, Jun 30, 2019 at 11:45:47AM +0200, Julien Rouhaud wrote:
> > With the glibc 2.28 coming, all users will have to reindex almost
> > every indexes after a glibc upgrade to guarantee the lack of
> > corruption. Unfortunately, reindexdb is not ideal for that as it's
> > processing everything using a single connexion and isn't able to
> > discard indexes that doesn't depend on a glibc collation.
>
> We have seen that with a database of up to 100GB we finish by cutting
> the reindex time from 30 minutes to a couple of minutes with a schema
> we work on.  Julien, what were the actual numbers?

I did my benchmarking using a quite ideal database, having a large number of
tables and various sets of indexes, for a 75 GB total size. This was done on
my laptop, which has 6 multithreaded cores (and crappy IO), also keeping the
default max_parallel_maintenance_worker = 2.

A naive reindexdb took approximately 33 minutes. Filtering the list of
indexes took that down to slightly less than 15 min, but of course each
database will have a different ratio there.

Then, keeping the --glibc-dependent and using different levels of
parallelism:

  -j1: ~ 14:50
  -j3: ~  7:30
  -j6: ~  5:23
  -j8: ~  4:45

That's pretty much the kind of results I was expecting given the hardware I
used.

> > PFA a patchset to add parallelism to reindexdb (reusing the
> > infrastructure in vacuumdb with some additions) and an option to
> > discard indexes that doesn't depend on glibc (without any specific
> > collation filtering or glibc version detection), with updated
> > regression tests. Note that this should be applied on top of the
> > existing reindexdb cleanup & refactoring patch
> > (https://commitfest.postgresql.org/23/2115/).
>
> Please note that patch 0003 does not seem to apply correctly on HEAD
> as of c74d49d4.

Yes, this is because this patchset has to be applied on top of the reindexdb
refactoring patch mentioned. It's sad that we don't have a good way to deal
with that kind of dependency, as it's also breaking Thomas' cfbot :(

> - 0003 begins to be the actual fancy thing with the addition of a
> --jobs option into reindexdb.  The main issue here which should be
> discussed is that when it comes to reindex of tables, you basically
> are not going to have any conflicts between the objects manipulated.
> However if you wish to do a reindex on a set of indexes then things
> get more tricky as it is necessary to list items per-table so as
> multiple connections do not conflict with each other if attempting to
> work on multiple indexes of the same table.  What this patch does is
> to select the set of indexes which need to be worked on (see the
> addition of cell in ParallelSlot), and then does a kind of
> pre-planning of each item into the connection slots so as each
> connection knows from the beginning which items it needs to process.
> This is quite different from vacuumdb where a new item is distributed
> only on a free connection from a unique list.  I'd personally prefer
> if we keep the facility in parallel.c so as it is only
> execution-dependent and that we have no pre-planning.  This would
> require keeping within reindexdb.c an array of lists, with one list
> corresponding to one connection instead which feels more natural.

My fear here is that this approach would add some extra complexity,
especially requiring to deal with free connection handling both in
GetIdleSlot() and the main reindexdb loop.
Also, the pre-planning allows us to start processing the biggest tables first, which optimises the overall runtime.
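For what it's worth, that kind of pre-planning can be sketched as a simple
greedy scheduler: sort the items by estimated cost (biggest first) and hand
each one to the currently least-loaded connection slot. This is only an
illustration of the idea under an assumed cost model (e.g. relation size); it
is not the reindexdb patch itself, and the names are made up.

#include <stdlib.h>

typedef struct WorkItem
{
	const char *name;			/* table, or group of indexes of one table */
	double		cost;			/* e.g. relation size as a cost estimate */
} WorkItem;

typedef struct Slot
{
	double		total_cost;		/* sum of costs of the items assigned */
	/* a per-slot list of assigned items would live here */
} Slot;

static int
cmp_cost_desc(const void *a, const void *b)
{
	double		ca = ((const WorkItem *) a)->cost;
	double		cb = ((const WorkItem *) b)->cost;

	return (ca < cb) - (ca > cb);
}

static void
preplan(WorkItem *items, int nitems, Slot *slots, int nslots)
{
	int			i,
				j;

	qsort(items, nitems, sizeof(WorkItem), cmp_cost_desc);

	for (i = 0; i < nitems; i++)
	{
		int			best = 0;

		for (j = 1; j < nslots; j++)
			if (slots[j].total_cost < slots[best].total_cost)
				best = j;

		slots[best].total_cost += items[i].cost;
		/* append items[i] to slots[best]'s work list here */
	}
}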
Re: FETCH FIRST clause WITH TIES option
On Mon, Apr 8, 2019 at 8:26 PM Surafel Temesgen wrote: > agree Hi Surafel, A new Commitfest is starting. Can you please post a fresh rebase of this patch? Thanks, -- Thomas Munro https://enterprisedb.com
Re: [PATCH] kNN for btree
On Tue, Mar 26, 2019 at 4:30 AM Nikita Glukhov wrote: > >> Fixed two bugs in patches 3 and 10 (see below). > >> Patch 3 was extracted from the main patch 5 (patch 4 in v9). > > > > This patch no longer applies so marking Waiting on Author. > > > Attached 11th version of the patches rebased onto current master. Hi Nikita, Since a new Commitfest is here and this doesn't apply, could we please have a fresh rebase? Thanks, -- Thomas Munro https://enterprisedb.com
Re: make installcheck-world in a clean environment
On Mon, Mar 25, 2019 at 8:30 PM Alexander Lakhin wrote: > 25.03.2019 10:25, David Steele wrote: > > If it doesn't attract more review or a committer during this CF, I > > think we should mark it as rejected. > Please, delay rejecting it till the end of the commitfest, I'll try to > find some enthusiasm amongst my colleagues. Hi Alexander, A new CF is here and this is in "Needs Review". Would you like to provide a rebased patch, or should it really be withdrawn? Thanks, -- Thomas Munro https://enterprisedb.com
Re: [HACKERS] Custom compression methods
Hi, Thomas!

On Mon, Jul 1, 2019 at 1:22 PM Thomas Munro wrote:
>
> On Sat, Mar 16, 2019 at 12:52 AM Ildus Kurbangaliev wrote:
> > in my opinion this patch is usually skipped not because it is not
> > needed, but because of its size. It is not hard to maintain it until
> > commiters will have time for it or I will get actual response that
> > nobody is going to commit it.
>
> To maximise the chances of more review in the new Commitfest that is
> about to begin, could you please send a fresh rebase?  This doesn't
> apply anymore.

As I understand it, we first need to make a high-level decision on whether we
want this at all [1]. I was going to bring this topic up at the last PGCon,
but I didn't manage to attend. Is it worth bothering Ildus with continuous
rebasing while we don't have that high-level decision yet?

Links
1. https://www.postgresql.org/message-id/20190216054526.zss2cufdxfeudr4i%40alap3.anarazel.de

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Optimization of some jsonb functions
> >> On 2/22/19 2:05 AM, Nikita Glukhov wrote: > >>> Attached set of patches with some jsonb optimizations that were made > >>> during > >>> comparison of performance of ordinal jsonb operators and jsonpath > >>> operators. Hi Nikita, This doesn't apply -- to attract reviewers, could we please have a rebase? Thanks, -- Thomas Munro https://enterprisedb.com
RE: [PATCH] Speedup truncates of relation forks
On Wednesday, June 26, 2019 6:10 PM (GMT+9), Adrien Nayrat wrote:
> As far as I remember, you should see "relation" wait events (type lock) on
> standby server. This is due to startup process acquiring AccessExclusiveLock
> for the truncation and other backend waiting to acquire a lock to read the
> table.

Hi Adrien, thank you for taking time to reply.

I understand that RelationTruncate() can block read-only queries on standby
during redo. However, it's difficult for me to reproduce the test case where
I need to catch that wait for relation lock, because one has to execute
SELECT within the few milliseconds of redoing the truncation of one table.
Instead, I just measured the whole recovery time, smgr_redo(), to show the
recovery improvement compared to head. Please refer below.

[Recovery Test]
I used the same stored functions and configurations in the previous email &
created the "test" db.

$ createdb test
$ psql -d test

1. [Primary] Create 10,000 relations.
   test=# SELECT create_tables(1);

2. [P] Insert one row in each table.
   test=# SELECT insert_tables(1);

3. [P] Delete row of each table.
   test=# SELECT delfrom_tables(1);

4. [Standby] WAL application is stopped at Standby server.
   test=# SELECT pg_wal_replay_pause();

5. [P] VACUUM is executed at Primary side, and measure its execution time.
   test=# \timing on
   test=# VACUUM;

   Alternatively, you may use:
   $ time psql -d test -c 'VACUUM;'

   (Note: WAL has not replayed on standby because it's been paused.)

6. [P] Wait until VACUUM has finished execution. Then, stop primary server.
   test=# pg_ctl stop -w

7. [S] Resume WAL replay, then promote standby (failover).
   I used a shell script to execute recovery & promote standby server
   because it's kinda difficult to measure recovery time. Please refer to
   the script below.
   - "SELECT pg_wal_replay_resume();" is executed and the WAL application
     is resumed.
   - "pg_ctl promote" to promote standby.
   - The time difference of "select pg_is_in_recovery();" from "t" to "f"
     is measured.

shell script:

PGDT=/path_to_storage_directory/

if [ "$1" = "resume" ]; then
  psql -c "SELECT pg_wal_replay_resume();" test
  date +%Y/%m/%d_%H:%M:%S.%3N
  pg_ctl promote -D ${PGDT}
  set +x
  date +%Y/%m/%d_%H:%M:%S.%3N
  while [ 1 ]
  do
    RS=`psql -Atc "select pg_is_in_recovery();" test`
    if [ ${RS} = "f" ]; then
      break
    fi
  done
  date +%Y/%m/%d_%H:%M:%S.%3N
  set -x
  exit 0
fi

[Test Results]
shared_buffers = 24GB

1. HEAD
(wal replay resumed)
2019/07/01_08:48:50.326
server promoted
2019/07/01_08:49:50.482
2019/07/01_09:02:41.051

Recovery Time:
  13 min 50.725 s  -> Time difference from WAL replay to complete recovery
  12 min 50.569 s  -> Time difference of "select pg_is_in_recovery();"
                      from "t" to "f"

2. PATCH
(wal replay resumed)
2019/07/01_07:34:26.766
server promoted
2019/07/01_07:34:57.790
2019/07/01_07:34:57.809

Recovery Time:
  31.043 s  -> Time difference from WAL replay to complete recovery
  00.019 s  -> Time difference of "select pg_is_in_recovery();"
               from "t" to "f"

[Conclusion]
The recovery time significantly improved compared to head, from 13 minutes
to about 30 seconds.

Any thoughts? I'd really appreciate your comments/feedback about the patch
and/or test.

Regards,
Kirk Jamison
Re: Choosing values for multivariate MCV lists
On Sat, 29 Jun 2019 at 14:01, Tomas Vondra wrote:
>
> >However, it looks like the problem is with mcv_list_items()'s use
> >of %f to convert to text, which is pretty ugly.
>
> >>>There's one issue with the signature, though - currently the function
> >>>returns null flags as bool array, but values are returned as simple
> >>>text value (formatted in array-like way, but still just a text).
> >>>
> >>IMO fixing this to return a text array is worth doing, even though it
> >>means a catversion bump.
> >>
> Attached is a cleaned-up version of that patch. The main difference is
> that instead of using construct_md_array() this uses ArrayBuildState to
> construct the arrays, which is much easier. The docs don't need any
> update because those were already using text[] for the parameter, the
> code was inconsistent with it.
>

Cool, this looks a lot neater and fixes both of the issues discussed:
floating point values are no longer converted to text via %f, and the values
are returned as proper text arrays.

One minor nitpick -- it doesn't seem to be necessary to build the arrays
*outfuncs and *fmgrinfo. You may as well just fetch the individual column
output function information at the point where it's used (in the
"if (!item->isnull[i])" block) rather than building those arrays.

> This does require catversion bump, but as annoying as it is, I think
> it's worth it (and there's also the thread discussing the serialization
> issues). Barring objections, I'll get it committed later next week, once
> I get back from PostgresLondon.
>
> As I mentioned before, if we don't want any additional catversion bumps,
> it's possible to pass the arrays through output functions - that would
> allow us keeping the text output (but correct, unlike what we have now).
>

I think this is a clear bug fix, so I'd vote for fixing it properly now, with
a catversion bump.

Regards,
Dean
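For readers following along, here is a minimal sketch of the ArrayBuildState
pattern plus the "fetch the output function where it's used" suggestion. It
is only an illustration of the approach being discussed; the function below
and its arguments are made up and differ from the actual patch.

#include "postgres.h"

#include "catalog/pg_type.h"
#include "fmgr.h"
#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"

static Datum
values_to_text_array(Datum *values, bool *isnull, Oid *types, int nvalues)
{
	ArrayBuildState *astate;
	int			i;

	astate = initArrayResult(TEXTOID, CurrentMemoryContext, false);

	for (i = 0; i < nvalues; i++)
	{
		Datum		txt = (Datum) 0;

		if (!isnull[i])
		{
			Oid			outfunc;
			bool		isvarlena;

			/* look up the output function info where it is needed */
			getTypeOutputInfo(types[i], &outfunc, &isvarlena);
			txt = CStringGetTextDatum(OidOutputFunctionCall(outfunc,
															values[i]));
		}

		astate = accumArrayResult(astate, txt, isnull[i],
								  TEXTOID, CurrentMemoryContext);
	}

	return makeArrayResult(astate, CurrentMemoryContext);
}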
Re: How to estimate the shared memory size required for parallel scan?
On Fri, May 3, 2019 at 6:06 AM Thomas Munro wrote: > Added to commitfest. Rebased. -- Thomas Munro https://enterprisedb.com 0001-Make-file_fdw-parallel-aware-v3.patch Description: Binary data
Re: Psql patch to show access methods info
On Sun, Mar 31, 2019 at 2:13 PM wrote: > Thanks for review. Hi Sergey, A new Commitfest is here and this doesn't apply -- could you please post a rebase? Thanks, -- Thomas Munro https://enterprisedb.com
logical replication slot and publication alter
Hi All,

There is an interesting issue: I created one replication slot, specifying the
pgoutput plugin and one publication, "foobar". The publication "foobar" was
set to the tables "foo1, foo2". The slot was left unread while the tables
foo1 and foo2 were changed. Then I altered the publication "foobar" to remove
table "foo2" and started logical replication. The changes from foo2 are still
sent from the slot!

In another try, even after I drop the publication "foobar", the slot still
finds the original publication definition and sends the changes without
problem.

I checked the source code, and I think it's due to the snapshot: when
pgoutput loads the publication, it uses the catalog tuples from the snapshot
instead of the current version, so even if the publication gets altered or
dropped, the original version is still there in the snapshot.

Is this expected, or is it a bug? In any case, it is unexpected that altering
a publication does not affect the replication stream.

Regards,
Jinhua Luo
Re: [HACKERS] advanced partition matching algorithm for partition-wise join
On Mon, Jul 1, 2019 at 6:50 PM Thomas Munro wrote: > On Sat, May 18, 2019 at 12:20 AM Robert Haas wrote: > > On Tue, May 14, 2019 at 12:24 AM Amit Langote > > wrote: > > > He did mention that cases where the nullable side is provably empty can be > > > handled by simply returning the path of the non-nullable side with > > > suitable projection path added on top to emit NULLs for the columns of the > > > nullable-side. If we teach populate_joinrel_with_paths() and underlings > > > about that, then we can allow partitionwise join even in the case where > > > the nullable side has some partitions missing. > > > > Yes, I think that would be a good approach to pursue. > > Hi Ashutosh, Amul, Fujita-san, > > Could we please have a fresh rebase for the new Commitfest? Will do unless Ashutosh, Amul, or anyone wants to. Thanks! Best regards, Etsuro Fujita
Re: shared-memory based stats collector
On Fri, May 17, 2019 at 6:48 PM Kyotaro HORIGUCHI wrote: > Fixed. Version 20. Hello Horiguchi-san, A new Commitfest is here. This doesn't apply (maybe just because of the new improved pgindent). Could we please have a fresh rebase? Thanks, -- Thomas Munro https://enterprisedb.com
Re: Converting NOT IN to anti-joins during planning
On Mon, 27 May 2019 at 20:43, Antonin Houska wrote:
> I've spent some time looking into this.

Thank you for having a look at this.

> One problem I see is that SubLink can be in the JOIN/ON clause and thus it's
> not necessarily at the top of the join tree. Consider this example:
>
> CREATE TABLE a(i int);
> CREATE TABLE b(j int);
> CREATE TABLE c(k int NOT NULL);
> CREATE TABLE d(l int);
>
> SELECT *
> FROM
> a
> JOIN b ON b.j NOT IN
> ( SELECT
> c.k
> FROM
> c)
> JOIN d ON b.j = d.l;

hmm yeah. Since the proofs that are being used in
expressions_are_not_nullable assume the join has already taken place, we'll
either need to not use the join conditions as proofs in that case, or just
disable the optimisation instead.

I think it's fine to just disable the optimisation since it seems rather
unlikely that someone would write a join condition like that. Plus it seems
quite a bit more complex to validate that the optimisation would even be
correct if NULLs were not possible.

I've attached a patch which restricts the pullups to FromExpr quals. Anything
below a JoinExpr disables the optimisation now.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

not_in_anti_join_v2.patch
Description: Binary data
Re: Tid scan improvements
On Tue, Mar 26, 2019 at 7:25 PM David Steele wrote: > On 3/26/19 8:11 AM, Edmund Horner wrote: > > Should I set update CF app to a) set the target version to 13, and/or > > move it to next commitfest? > > If you plan to continue working on it in this CF then you can just > change the target to PG13. If you plan to take a break and pick up the > work later then go ahead and push it to the next CF. Hi Edmund, The new CF is here. I'm going through poking threads for submissions that don't apply, but it sounds like this needs more than a rebase? Perhaps this belongs in the next CF? -- Thomas Munro https://enterprisedb.com
Re: Planning counters in pg_stat_statements (using pgss_store)
On Tue, Apr 2, 2019 at 9:22 AM legrand legrand wrote: > > >> case avg_tps pct_diff > >> 089 278 -- > >> 188 745 0,6% > >> 288 282 1,1% > >> 386 660 2,9% > >> > >> This means that even in this extrem test case, the worst degradation is > >> less > >> than 3% > >> (this overhead can be removed using pg_stat_statements.track_planning guc) > > > Is the difference between 2 and 3 the extraneous pgss_store call to > > always store the query text if planner hook doesn't have access to the > > query text? > > Yes it is, > but I agree it seems a big gap (1,8%) compared to the difference between 1 > and 2 (0,5%). > Maybe this is just mesure "noise" ... Rebased patches attached. From c7a407143d1283a6b86c3244efccf0da7990e330 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Wed, 27 Mar 2019 11:24:35 +0100 Subject: [PATCH 2/2] Add planning counters to pg_stat_statements --- contrib/pg_stat_statements/Makefile | 3 +- .../expected/pg_stat_statements.out | 64 .../pg_stat_statements--1.7--1.8.sql | 52 .../pg_stat_statements/pg_stat_statements.c | 293 +- .../pg_stat_statements.control| 2 +- .../sql/pg_stat_statements.sql| 16 + doc/src/sgml/pgstatstatements.sgml| 49 ++- 7 files changed, 401 insertions(+), 78 deletions(-) create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 051ce46f0c..db33d9ffe1 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -4,7 +4,8 @@ MODULE_big = pg_stat_statements OBJS = pg_stat_statements.o $(WIN32RES) EXTENSION = pg_stat_statements -DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \ +DATA = pg_stat_statements--1.4.sql \ +pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \ pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \ pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \ pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql \ diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index c759763be2..8517b124e4 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -593,4 +593,68 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; -- DROP ROLE regress_stats_user1; DROP ROLE regress_stats_user2; +-- +-- [re]plan counting +-- +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-- + +(1 row) + +CREATE TABLE test (); +PREPARE prep1 AS SELECT COUNT(*) FROM test;; +EXECUTE prep1; + count +--- + 0 +(1 row) + +EXECUTE prep1; + count +--- + 0 +(1 row) + +EXECUTE prep1; + count +--- + 0 +(1 row) + +ALTER TABLE test ADD COLUMN x int; +EXECUTE prep1; + count +--- + 0 +(1 row) + +SELECT 42; + ?column? +-- + 42 +(1 row) + +SELECT 42; + ?column? +-- + 42 +(1 row) + +SELECT 42; + ?column? 
+-- + 42 +(1 row) + +SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| plans | calls | rows ++---+---+-- + ALTER TABLE test ADD COLUMN x int | 0 | 1 |0 + CREATE TABLE test () | 0 | 1 |0 + PREPARE prep1 AS SELECT COUNT(*) FROM test | 2 | 4 |4 + SELECT $1 | 3 | 3 |3 + SELECT pg_stat_statements_reset() | 0 | 1 |1 +(5 rows) + DROP EXTENSION pg_stat_statements; diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql new file mode 100644 index 00..de75643928 --- /dev/null +++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql @@ -0,0 +1,52 @@ +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ + +-- complain if script is sourced in psql, rather than via ALTER EXTENSION +\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit + +/* First we have to remove them from the extension */ +ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements; +ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean); + +/* Then we can drop them */ +DROP VIEW pg_stat_statements; +DROP FUNCTION pg_stat_statements(boolean); + +/* Now redefine */ +CREATE FUNCTION pg_stat_statements(IN showtext boolean, +OUT userid oid, +OUT dbid oid, +OUT queryid bigint, +OUT query text, +OUT plans int8, +OUT total_plan_time float8, +OUT calls int8, +OUT total_exec_time float8, +
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
On Wed, Apr 17, 2019 at 10:23 PM Masahiko Sawada wrote: > Sorry for the very late. Attached updated version patches. Hello Sawada-san, Can we please have a fresh rebase? Thanks, -- Thomas Munro https://enterprisedb.com
Re: allow online change primary_conninfo
Hello Updated version attached. Merge conflict was about tests count in 001_stream_rep.pl. Nothing else was changed. My approach can be still incorrect, any redesign ideas are welcome. Thanks in advance! regards, Sergeidiff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 84341a30e5..054be17e08 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -3916,9 +3916,14 @@ ANY num_sync ( @@ -3933,9 +3938,14 @@ ANY num_sync ( ). - This parameter can only be set at server start. + This parameter can only be set in the postgresql.conf + file or on the server command line. This setting has no effect if primary_conninfo is not - set. + set or the server is not in standby mode. + + + WAL receiver will be restarted after primary_slot_name + was changed. diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 3e2c4e3e5b..964989432c 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -12125,6 +12125,42 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, return false;/* not reached */ } +void +ProcessStartupSigHup(void) +{ + char *conninfo = pstrdup(PrimaryConnInfo); + char *slotname = pstrdup(PrimarySlotName); + bool conninfoChanged; + bool slotnameChanged; + + ProcessConfigFile(PGC_SIGHUP); + + /* + * We need restart XLOG_FROM_STREAM source if replication settings was + * changed + */ + conninfoChanged = (strcmp(conninfo, PrimaryConnInfo) != 0); + slotnameChanged = (strcmp(slotname, PrimarySlotName) != 0); + + if ((conninfoChanged || slotnameChanged) && + currentSource == XLOG_FROM_STREAM + && WalRcvRunning()) + { + if (conninfoChanged && slotnameChanged) + ereport(LOG, + (errmsg("The WAL receiver is going to be restarted due to change of primary_conninfo and primary_slot_name"))); + else + ereport(LOG, + (errmsg("The WAL receiver is going to be restarted due to change of %s", + conninfoChanged ? "primary_conninfo" : "primary_slot_name"))); + + pendingRestartSource = true; + } + + pfree(conninfo); + pfree(slotname); +} + /* * Determine what log level should be used to report a corrupt WAL record * in the current WAL page, previously read by XLogPageRead(). 
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c index 5048a2c2aa..9bf5c792fe 100644 --- a/src/backend/postmaster/startup.c +++ b/src/backend/postmaster/startup.c @@ -147,7 +147,7 @@ HandleStartupProcInterrupts(void) if (got_SIGHUP) { got_SIGHUP = false; - ProcessConfigFile(PGC_SIGHUP); + ProcessStartupSigHup(); } /* diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 92c4fee8f8..e54d8e7172 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -3571,7 +3571,7 @@ static struct config_string ConfigureNamesString[] = }, { - {"primary_conninfo", PGC_POSTMASTER, REPLICATION_STANDBY, + {"primary_conninfo", PGC_SIGHUP, REPLICATION_STANDBY, gettext_noop("Sets the connection string to be used to connect to the sending server."), NULL, GUC_SUPERUSER_ONLY @@ -3582,7 +3582,7 @@ static struct config_string ConfigureNamesString[] = }, { - {"primary_slot_name", PGC_POSTMASTER, REPLICATION_STANDBY, + {"primary_slot_name", PGC_SIGHUP, REPLICATION_STANDBY, gettext_noop("Sets the name of the replication slot to use on the sending server."), NULL }, diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index d519252aad..9e49020b19 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -320,6 +320,7 @@ extern void SetWalWriterSleeping(bool sleeping); extern void XLogRequestWalReceiverReply(void); +extern void ProcessStartupSigHup(void); extern void assign_max_wal_size(int newval, void *extra); extern void assign_checkpoint_completion_target(double newval, void *extra); diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl index 3c743d7d7c..ae80f4df3a 100644 --- a/src/test/recovery/t/001_stream_rep.pl +++ b/src/test/recovery/t/001_stream_rep.pl @@ -3,7 +3,7 @@ use strict; use warnings; use PostgresNode; use TestLib; -use Test::More tests => 32; +use Test::More tests => 33; # Initialize master node my $node_master = get_new_node('master'); @@ -208,7 +208,9 @@ $node_standby_2->append_conf('postgresql.conf', "primary_slot_name = $slotname_2"); $node_standby_2->append_conf('postgresql.conf', "wal_receiver_status_interval = 1"); -$node_standby_2->restart; +# should be able change primary_slot_name without restart +# will wait effect in get_slot_xmins above +$node_standby_2->reload; # Fetch xmin columns from slot's pg_replication_slots row, after waiting for # given boolean condition to be true to ensure we've reached a quiescent state @@ -344,3 +346,21 @@ is($catalog_xmin, ''
Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)
On Fri, Apr 19, 2019 at 3:41 PM Paul Guo wrote: > I updated the patches as attached following previous discussions. Hi Paul, Could we please have a fresh rebase now that the CF is here? Thanks, -- Thomas Munro https://enterprisedb.com
Re: [PROPOSAL] Temporal query processing with range types
On Wed, Apr 3, 2019 at 2:12 AM Ibrar Ahmed wrote: > I start looking at the patch, there is a couple of problems with the patch. > The first one is the OID conflict, which I fixed on my machine. The second > problem is assertion failure. I think you have not compiled the PostgreSQL > code with the assertion. Hi Peter, Looks like there was some good feedback for this WIP project last time around. It's currently in "Needs Review" status in the July Commitfest. To encourage more review and see some automated compile and test results, could we please have a fresh rebase? The earlier patches no longer apply. Thanks, -- Thomas Munro https://enterprisedb.com
Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead
On Mon, Apr 1, 2019 at 4:36 AM Nikolay Shaplov wrote: > В письме от понедельник, 18 марта 2019 г. 3:03:04 MSK пользователь Iwata, Aya > написал: > > This patch does not apply. > Oh... Sorry... here goes new version Hi Nikolay, Could we please have a new rebase? Thanks, -- Thomas Munro https://enterprisedb.com
Re: extension patch of CREATE OR REPLACE TRIGGER
On Tue, Mar 5, 2019 at 10:19 PM David Steele wrote: > On 2/28/19 10:43 AM, Osumi, Takamichi wrote: > > One past thread about introducing CREATE OR REPLACE TRIGGER into the syntax > > > > had stopped without complete discussion in terms of LOCK level. > > > > The past thread is this. I'd like to inherit this one. > > Since this patch landed at the last moment in the last commitfest for > PG12 I have marked it as targeting PG13. Hello Osumi-san, The July Commitfest is now beginning. To give the patch the best chance of attracting reviewers, could you please post a rebased version? The last version doesn't apply. Thanks, -- Thomas Munro https://enterprisedb.com
Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
On Thu, Mar 28, 2019 at 6:49 AM Alexey Kondratov wrote: > Updated version of patch is attached. Hi Alexey, This no longer applies. Since the Commitfest is starting now, could you please rebase it? Thanks, -- Thomas Munro https://enterprisedb.com
Re: Change ereport level for QueuePartitionConstraintValidation
Hello This change is discussed as open item for pg12. Seems we have nor objections nor agreement. I attached updated version due merge conflict. > Per discussion started here: > https://www.postgresql.org/message-id/CA%2BTgmoZWSLUjVcc9KBSVvbn%3DU5QRgW1O-MgUX0y5CnLZOA2qyQ%40mail.gmail.com regards, Sergeidiff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index fd67d2a841..bc837b499e 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -15357,11 +15357,11 @@ QueuePartitionConstraintValidation(List **wqueue, Relation scanrel, if (PartConstraintImpliedByRelConstraint(scanrel, partConstraint)) { if (!validate_default) - ereport(INFO, + ereport(DEBUG1, (errmsg("partition constraint for table \"%s\" is implied by existing constraints", RelationGetRelationName(scanrel; else - ereport(INFO, + ereport(DEBUG1, (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints", RelationGetRelationName(scanrel; return; diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c index f3c9236ad5..3344120190 100644 --- a/src/backend/partitioning/partbounds.c +++ b/src/backend/partitioning/partbounds.c @@ -1252,7 +1252,7 @@ check_default_partition_contents(Relation parent, Relation default_rel, */ if (PartConstraintImpliedByRelConstraint(default_rel, def_part_constraints)) { - ereport(INFO, + ereport(DEBUG1, (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints", RelationGetRelationName(default_rel; return; @@ -1303,7 +1303,7 @@ check_default_partition_contents(Relation parent, Relation default_rel, if (PartConstraintImpliedByRelConstraint(part_rel, def_part_constraints)) { -ereport(INFO, +ereport(DEBUG1, (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints", RelationGetRelationName(part_rel; diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index e687150511..4c28029ee0 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -3633,11 +3633,9 @@ ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4); ALTER TABLE list_parted2 DETACH PARTITION part_3_4; ALTER TABLE part_3_4 ALTER a SET NOT NULL; ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4); -INFO: partition constraint for table "part_3_4" is implied by existing constraints -- check if default partition scan skipped ALTER TABLE list_parted2_def ADD CONSTRAINT check_a CHECK (a IN (5, 6)); CREATE TABLE part_55_66 PARTITION OF list_parted2 FOR VALUES IN (55, 66); -INFO: updated partition constraint for default partition "list_parted2_def" is implied by existing constraints -- check validation when attaching range partitions CREATE TABLE range_parted ( a int, @@ -3662,7 +3660,6 @@ CREATE TABLE part2 ( b int NOT NULL CHECK (b >= 10 AND b < 18) ); ALTER TABLE range_parted ATTACH PARTITION part2 FOR VALUES FROM (1, 10) TO (1, 20); -INFO: partition constraint for table "part2" is implied by existing constraints -- Create default partition CREATE TABLE partr_def1 PARTITION OF range_parted DEFAULT; -- Only one default partition is allowed, hence, following should give error @@ -3690,13 +3687,11 @@ ERROR: partition constraint is violated by some row DELETE FROM part_5_a WHERE a NOT IN (3); ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IS NOT NULL AND a = 5); ALTER TABLE list_parted2 ATTACH PARTITION 
part_5 FOR VALUES IN (5); -INFO: partition constraint for table "part_5" is implied by existing constraints ALTER TABLE list_parted2 DETACH PARTITION part_5; ALTER TABLE part_5 DROP CONSTRAINT check_a; -- scan should again be skipped, even though NOT NULL is now a column property ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IN (5)), ALTER a SET NOT NULL; ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5); -INFO: partition constraint for table "part_5" is implied by existing constraints -- Check the case where attnos of the partitioning columns in the table being -- attached differs from the parent. It should not affect the constraint- -- checking logic that allows to skip the scan. @@ -3707,7 +3702,6 @@ CREATE TABLE part_6 ( ); ALTER TABLE part_6 DROP c; ALTER TABLE list_parted2 ATTACH PARTITION part_6 FOR VALUES IN (6); -INFO: partition constraint for table "part_6" is implied by existing constraints -- Similar to above, but the table being attached is a partitioned table -- whose partition has still different attnos for the root partitioning -- columns. @@ -3725,10 +3719,7 @@ CREATE TABLE part_7_a_null ( ); ALTER TABLE part_7_a_null DROP c, DROP d, DROP e; ALTER TABLE part_7 ATTACH PARTITION part_7_a_null FOR VALUE
Re: Commitfest 2019-07, the first of five* for PostgreSQL 13
Hello hackers, It's now July everywhere on Earth, so I marked CF 2019-07 as in-progress, and 2019-09 as open for bumping patches into. I pinged most of the "Needs Review" threads that don't apply and will do a few more tomorrow, and then I'll try to chase patches that fail on CI, and then see what I can do to highlight some entries that really need review/discussion. I'll do end-of-week status reports. -- Thomas Munro https://enterprisedb.com
Re: Custom table AMs need to include heapam.h because of BulkInsertState
On Sun, 30 Jun 2019 at 17:54, David Rowley wrote: > Any further thoughts on this Michael? > > Or Andres? Do you have a preference to which of the approaches > (mentioned upthread) I use for the fix? > > If I don't hear anything I'll probably just push the first fix. The > inefficiency does not affect heap, so likely the people with the most > interest in improving that will be authors of other table AMs that > actually do something during table_finish_bulk_insert() for > partitions. We could revisit this in PG13 if someone comes up with a > need to improve things here. I've pushed the original patch plus a small change to only call table_finish_bulk_insert() for the target of the copy when we're using bulk inserts. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
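For context on why the extra calls only matter for non-heap AMs, here is a
rough sketch of a custom table AM that does real work in its
finish_bulk_insert callback. The callback name and signature are from the
PG12 table AM interface as I understand it; the flush helper and everything
else here are hypothetical.

#include "postgres.h"

#include "access/tableam.h"

static void
my_finish_bulk_insert(Relation rel, int options)
{
	/*
	 * A columnar or append-optimized AM might flush buffered rows, write
	 * block directory entries, or fsync here -- potentially expensive work
	 * that should only happen for relations that actually received rows.
	 */
	/* my_am_flush_pending_rows(rel);    hypothetical helper */
}

static const TableAmRoutine my_am_methods = {
	.type = T_TableAmRoutine,
	.finish_bulk_insert = my_finish_bulk_insert,
	/* ... the other required callbacks are omitted in this sketch ... */
};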
Re: Optimize partial TOAST decompression
Hi! > Andrey Borodin 于2019年6月29日周六 下午9:48写道: > Hi! > Please, do not use top-posting, i.e. reply style where you quote whole > message under your response. It makes reading of archives terse. > > > 24 июня 2019 г., в 7:53, Binguo Bao написал(а): > > > >> This is not correct: L bytes of compressed data do not always can be > decoded into at least L bytes of data. At worst we have one control byte > per 8 bytes of literal bytes. This means at most we need (L*9 + 8) / 8 > bytes with current pglz format. > > > > Good catch! I've corrected the related code in the patch. > > ... > > <0001-Optimize-partial-TOAST-decompression-2.patch> > > I've took a look into the code. > I think we should extract function for computation of max_compressed_size > and put it somewhere along with pglz code. Just in case something will > change something about pglz so that they would not forget about compression > algorithm assumption. > > Also I suggest just using 64 bit computation to avoid overflows. And I > think it worth to check if max_compressed_size is whole data and use min of > (max_compressed_size, uncompressed_data_size). > > Also you declared needsize and max_compressed_size too far from use. But > this will be solved by function extraction anyway. > > Thanks! > > Best regards, Andrey Borodin. Thanks for the suggestion. I've extracted function for computation for max_compressed_size and put the function into pg_lzcompress.c. Best regards, Binguo Bao. From 79a1b4c292a0629df9d7ba3dc04e879aadca7a61 Mon Sep 17 00:00:00 2001 From: BBG Date: Sun, 2 Jun 2019 19:18:46 +0800 Subject: [PATCH] Optimize partial TOAST decompression --- src/backend/access/heap/tuptoaster.c | 24 +--- src/common/pg_lzcompress.c | 22 ++ src/include/common/pg_lzcompress.h | 1 + 3 files changed, 40 insertions(+), 7 deletions(-) diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c index 55d6e91..684f1b2 100644 --- a/src/backend/access/heap/tuptoaster.c +++ b/src/backend/access/heap/tuptoaster.c @@ -266,6 +266,7 @@ heap_tuple_untoast_attr_slice(struct varlena *attr, if (VARATT_IS_EXTERNAL_ONDISK(attr)) { struct varatt_external toast_pointer; + int32 max_size; VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr); @@ -273,8 +274,13 @@ heap_tuple_untoast_attr_slice(struct varlena *attr, if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)) return toast_fetch_datum_slice(attr, sliceoffset, slicelength); - /* fetch it back (compressed marker will get set automatically) */ - preslice = toast_fetch_datum(attr); + max_size = pglz_maximum_compressed_size(sliceoffset + slicelength, +toast_pointer.va_rawsize); + /* + * Be sure to get enough compressed slice + * and compressed marker will get set automatically + */ + preslice = toast_fetch_datum_slice(attr, 0, max_size); } else if (VARATT_IS_EXTERNAL_INDIRECT(attr)) { @@ -2031,7 +2037,8 @@ toast_fetch_datum(struct varlena *attr) * Reconstruct a segment of a Datum from the chunks saved * in the toast relation * - * Note that this function only supports non-compressed external datums. + * Note that this function supports non-compressed external datums + * and compressed external datum slices at the start of the object. 
* -- */ static struct varlena * @@ -2072,10 +2079,9 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length) VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr); /* - * It's nonsense to fetch slices of a compressed datum -- this isn't lo_* - * we can't return a compressed datum which is meaningful to toast later + * It's meaningful to fetch slices at the start of a compressed datum. */ - Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)); + Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) || 0 == sliceoffset); attrsize = toast_pointer.va_extsize; totalchunks = ((attrsize - 1) / TOAST_MAX_CHUNK_SIZE) + 1; @@ -2091,7 +2097,11 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length) result = (struct varlena *) palloc(length + VARHDRSZ); - SET_VARSIZE(result, length + VARHDRSZ); + if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)) { + SET_VARSIZE_COMPRESSED(result, length + VARHDRSZ); + } else { + SET_VARSIZE(result, length + VARHDRSZ); + } if (length == 0) return result; /* Can save a lot of work at this point! */ diff --git a/src/common/pg_lzcompress.c b/src/common/pg_lzcompress.c index 988b398..2b5f112 100644 --- a/src/common/pg_lzcompress.c +++ b/src/common/pg_lzcompress.c @@ -771,3 +771,25 @@ pglz_decompress(const char *source, int32 slen, char *dest, */ return (char *) dp - dest; } + + + +/* -- + * pglz_max_compressed_size - + * + * Calculate the maximum size of the compressed slice corresponding to the + * raw slice. Return the maximum size, or raw size if maximum size is greater + * than raw size. + * -- + */ +int32 +pglz_ma
Re: Add parallelism and glibc dependent only options to reindexdb
Now that we have REINDEX CONCURRENTLY, I think reindexdb is going to gain more popularity. Please don't reuse a file name as generic as "parallel.c" -- it's annoying when navigating source. Maybe conn_parallel.c multiconn.c connscripts.c admconnection.c ...? If your server crashes or is stopped midway during the reindex, you would have to start again from scratch, and it's tedious (if it's possible at all) to determine which indexes were missed. I think it would be useful to have a two-phase mode: in the initial phase reindexdb computes the list of indexes to be reindexed and saves them into a work table somewhere. In the second phase, it reads indexes from that table and processes them, marking them as done in the work table. If the second phase crashes or is stopped, it can be restarted and consults the work table. I would keep the work table, as it provides a bit of an audit trail. It may be important to be able to run even if unable to create such a work table (because of the numerous users that DROP DATABASE postgres). Maybe we'd have two flags in the work table for each index: "reindex requested", "reindex done". The "glibc filter" thing (which I take to mean "indexes that depend on collations") would apply to the first phase: it just skips adding other indexes to the work table. I suppose ICU collations are not affected, so the filter would be for glibc collations only? The --glibc-dependent switch seems too ad-hoc. Maybe "--exclude-rule=glibc"? That way we can add other rules later. (Not "--exclude=foo" because we'll want to add the possibility to ignore specific indexes by name.) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
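(To make the two-phase idea concrete, a minimal sketch of such a work table — all names invented, purely illustrative:)

-- Phase 1 fills this; phase 2 consults it and can be restarted after a crash.
CREATE TABLE reindex_worklist (
    index_name        text PRIMARY KEY,
    reindex_requested boolean NOT NULL DEFAULT true,
    reindex_done      boolean NOT NULL DEFAULT false
);

-- Phase 2, per index: pick the next pending entry ...
SELECT index_name FROM reindex_worklist
 WHERE reindex_requested AND NOT reindex_done;
-- ... run REINDEX INDEX on it, then mark it off:
UPDATE reindex_worklist SET reindex_done = true WHERE index_name = 'the_index';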
Re: Add parallelism and glibc dependent only options to reindexdb
Michael Paquier writes: > - 0003 begins to be the actual fancy thing with the addition of a > --jobs option into reindexdb. The main issue here which should be > discussed is that when it comes to reindex of tables, you basically > are not going to have any conflicts between the objects manipulated. > However if you wish to do a reindex on a set of indexes then things > get more tricky as it is necessary to list items per-table so as > multiple connections do not conflict with each other if attempting to > work on multiple indexes of the same table. What this patch does is > to select the set of indexes which need to be worked on (see the > addition of cell in ParallelSlot), and then does a kind of > pre-planning of each item into the connection slots so as each > connection knows from the beginning which items it needs to process. > This is quite different from vacuumdb where a new item is distributed > only on a free connection from a unique list. I'd personally prefer > if we keep the facility in parallel.c so as it is only > execution-dependent and that we have no pre-planning. This would > require keeping within reindexdb.c an array of lists, with one list > corresponding to one connection instead which feels more natural. Couldn't we make this enormously simpler and less bug-prone by just dictating that --jobs applies only to reindex-table operations? > - 0004 is the part where the concurrent additions really matter as > this consists in applying an extra filter to the indexes selected so > as only the glibc-sensitive indexes are chosen for the processing. I think you'd be better off to define and document this as "reindex only collation-sensitive indexes", without any particular reference to a reason why somebody might want to do that. regards, tom lane
Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
Hi Thomas, On 01.07.2019 15:02, Thomas Munro wrote: Hi Alexey, This no longer applies. Since the Commitfest is starting now, could you please rebase it? Thank you for a reminder. Rebased version of the patch is attached. I've also modified my logging code in order to obey new unified logging system for command-line programs commited by Peter (cc8d415117). Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company >From f5f359274322020c2338b5b494f6327eaa61c0e1 Mon Sep 17 00:00:00 2001 From: Alexey Kondratov Date: Tue, 19 Feb 2019 19:14:53 +0300 Subject: [PATCH v7] pg_rewind: options to use restore_command from command line or cluster config Previously, when pg_rewind could not find required WAL files in the target data directory the rewind process would fail. One had to manually figure out which of required WAL files have already moved to the archival storage and copy them back. This patch adds possibility to specify restore_command via command line option or use one specified inside postgresql.conf. Specified restore_command will be used for automatic retrieval of missing WAL files from archival storage. --- doc/src/sgml/ref/pg_rewind.sgml | 30 - src/bin/pg_rewind/parsexlog.c | 164 +- src/bin/pg_rewind/pg_rewind.c | 92 ++- src/bin/pg_rewind/pg_rewind.h | 6 +- src/bin/pg_rewind/t/001_basic.pl | 4 +- src/bin/pg_rewind/t/002_databases.pl | 4 +- src/bin/pg_rewind/t/003_extrafiles.pl | 4 +- src/bin/pg_rewind/t/RewindTest.pm | 84 - 8 files changed, 371 insertions(+), 17 deletions(-) diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml index 4d91eeb0ff..746c07e4df 100644 --- a/doc/src/sgml/ref/pg_rewind.sgml +++ b/doc/src/sgml/ref/pg_rewind.sgml @@ -67,8 +67,10 @@ PostgreSQL documentation ancestor. In the typical failover scenario where the target cluster was shut down soon after the divergence, this is not a problem, but if the target cluster ran for a long time after the divergence, the old WAL - files might no longer be present. In that case, they can be manually - copied from the WAL archive to the pg_wal directory, or + files might no longer be present. In that case, they can be automatically + copied by pg_rewind from the WAL archive to the + pg_wal directory if either -r or + -R option is specified, or fetched on startup by configuring or . The use of pg_rewind is not limited to failover, e.g. a standby @@ -202,6 +204,30 @@ PostgreSQL documentation + + -r + --use-postgresql-conf + + +Use restore_command in the postgresql.conf to +retrieve missing in the target pg_wal directory +WAL files from the WAL archive. + + + + + + -R restore_command + --restore-command=restore_command + + +Specifies the restore_command to use for retrieval of the missing +in the target pg_wal directory WAL files from +the WAL archive. 
+ + + + --debug diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c index 287af60c4e..d1de08320c 100644 --- a/src/bin/pg_rewind/parsexlog.c +++ b/src/bin/pg_rewind/parsexlog.c @@ -12,6 +12,7 @@ #include "postgres_fe.h" #include +#include #include "pg_rewind.h" #include "filemap.h" @@ -44,6 +45,7 @@ static char xlogfpath[MAXPGPATH]; typedef struct XLogPageReadPrivate { const char *datadir; + const char *restoreCommand; int tliIndex; } XLogPageReadPrivate; @@ -52,6 +54,9 @@ static int SimpleXLogPageRead(XLogReaderState *xlogreader, int reqLen, XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *pageTLI); +static int RestoreArchivedWAL(const char *path, const char *xlogfname, + off_t expectedSize, const char *restoreCommand); + /* * Read WAL from the datadir/pg_wal, starting from 'startpoint' on timeline * index 'tliIndex' in target timeline history, until 'endpoint'. Make note of @@ -59,7 +64,7 @@ static int SimpleXLogPageRead(XLogReaderState *xlogreader, */ void extractPageMap(const char *datadir, XLogRecPtr startpoint, int tliIndex, - XLogRecPtr endpoint) + XLogRecPtr endpoint, const char *restore_command) { XLogRecord *record; XLogReaderState *xlogreader; @@ -68,6 +73,7 @@ extractPageMap(const char *datadir, XLogRecPtr startpoint, int tliIndex, private.datadir = datadir; private.tliIndex = tliIndex; + private.restoreCommand = restore_command; xlogreader = XLogReaderAllocate(WalSegSz, &SimpleXLogPageRead, &private); if (xlogreader == NULL) @@ -155,7 +161,7 @@ readOneRecord(const char *datadir, XLogRecPtr ptr, int tliIndex) void findLastCheckpoint(const char *datadir, XLogRecPtr forkptr, int tliIndex, XLogRecPtr *lastchkptrec, TimeLineID *lastchkpttli, - XLo
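(The patch is truncated above. As a purely illustrative example of the two proposed options — paths and the connection string below are made up:)

# Fetch missing WAL from the archive with an explicit restore_command:
pg_rewind --target-pgdata=/var/lib/postgresql/data \
          --source-server='host=new-primary dbname=postgres' \
          --restore-command='cp /mnt/wal_archive/%f "%p"'

# Or reuse the restore_command already set in the target's postgresql.conf:
pg_rewind --target-pgdata=/var/lib/postgresql/data \
          --source-server='host=new-primary dbname=postgres' \
          --use-postgresql-conf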
Re: [HACKERS] Custom compression methods
On 2019-Jul-01, Alexander Korotkov wrote: > As I get we're currently need to make high-level decision of whether > we need this [1]. I was going to bring this topic up at last PGCon, > but I didn't manage to attend. Does it worth bothering Ildus with > continuous rebasing assuming we don't have this high-level decision > yet? I agree that having to constantly rebase a patch that doesn't get acted upon is a bit pointless. I see a bit of a process problem here: if the patch doesn't apply, it gets punted out of commitfest and reviewers don't look at it. This means the discussion goes unseen and no decisions are made. My immediate suggestion is to rebase even if other changes are needed. Longer-term I think it'd be useful to have patches marked as needing "high-level decisions" that may lag behind current master; maybe we have them provide a git commit-ID on top of which the patch applies cleanly. I recently found git-imerge which can make rebasing of large patch series easier, by letting you deal with smaller conflicts one step at a time rather than one giant conflict; it may prove useful. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Built-in connection pooler
On 01.07.2019 12:57, Thomas Munro wrote: On Thu, Mar 21, 2019 at 4:33 AM Konstantin Knizhnik wrote: New version of the patch (rebased + bug fixes) is attached to this mail. Hi again Konstantin, Interesting work. No longer applies -- please rebase. Rebased version of the patch is attached. Also all this version of built-ni proxy can be found in conn_proxy branch of https://github.com/postgrespro/postgresql.builtin_pool.git diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 84341a30e5..9398e561e8 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -719,6 +719,123 @@ include_dir 'conf.d' + + max_sessions (integer) + + max_sessions configuration parameter + + + + + The maximum number of client sessions that can be handled by + one connection proxy when session pooling is switched on. + This parameter does not add any memory or CPU overhead, so + specifying a large max_sessions value + does not affect performance. + If the max_sessions limit is reached new connection are not accepted. + + + The default value is 1000. This parameter can only be set at server start. + + + + + + session_pool_size (integer) + + session_pool_size configuration parameter + + + + + Enables session pooling and defines the maximum number of + backends that can be used by client sessions for each database/user combination. + Launched non-tainted backends are never terminated even if there are no active sessions. + Backend is considered as tainted if client updates GUCs, creates temporary table or prepared statements. + Tainted backend can server only one client. + + + The default value is 10, so up to 10 backends will server each database, + + + + + + proxy_port (integer) + + proxy_port configuration parameter + + + + + Sets the TCP port for the connection pooler. + Clients connected to main "port" will be assigned dedicated backends, + while client connected to proxy port will be connected to backends through proxy which + performs transaction level scheduling. + + + The default value is 6543. + + + + + + connection_proxies (integer) + + connection_proxies configuration parameter + + + + + Sets number of connection proxies. + Postmaster spawns separate worker process for each proxy. Postmaster scatters connections between proxies using one of scheduling policies (round-robin, random, load-balancing)." + "Each proxy launches its own subset of backends. So maximal number of non-tainted backends is " + "session_pool_size*connection_proxies*databases*roles. + + + The default value is 0, so session pooling is disabled. + + + + + + session_schedule (enum) + + session_schedule configuration parameter + + + + + Specifies scheduling policy for assigning session to proxies in case of + connection pooling. Default policy is round-robin. + + + With round-robin policy postmaster cyclicly scatter sessions between proxies. + + + With random policy postmaster randomly choose proxy for new session. + + + With load-balancing policy postmaster choose proxy with lowest load average. + Load average of proxy is estimated by number of clients connection assigned to this proxy with extra weight for SSL connections. + + + + + + restart_pooler_on_reload (string) + + restart_pooler_on_reload configuration parameter + + + + + Restart session pool workers once pg_reload_conf() is called. + The default value is false. 
+ + + + unix_socket_directories (string) diff --git a/doc/src/sgml/connpool.sgml b/doc/src/sgml/connpool.sgml new file mode 100644 index 00..07f4202f75 --- /dev/null +++ b/doc/src/sgml/connpool.sgml @@ -0,0 +1,174 @@ + + + + Connection pooling + + + built-in connection pool proxy + + + +PostgreSQL spawns a separate process (backend) for each client. +For large number of clients such model can cause consumption of large number of system +resources and lead to significant performance degradation, especially at computers with large +number of CPU cores. The reason is high contention between backends for postgres resources. +Also size of many Postgres internal data structures are proportional to the number of +active backends as well as complexity of algorithms for
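(The patch is truncated above. To illustrate the GUCs it documents, a hypothetical postgresql.conf fragment enabling the pooler might read — the values are examples only:)

# Built-in connection pooler (proposed patch), example settings:
connection_proxies = 2           # 0 (the default) leaves session pooling disabled
session_pool_size  = 10          # backends per database/user combination
max_sessions       = 1000        # client sessions handled by one proxy
proxy_port         = 6543        # clients connecting here are pooled
session_schedule   = 'load-balancing'   # or 'round-robin' / 'random'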
Fix two issues after moving to unified logging system for command-line utils
Hi hackers, I have found two minor issues with unified logging system for command-line programs (commited by Peter cc8d415117), while was rebasing my pg_rewind patch: 1) forgotten new-line symbol in pg_fatal call inside pg_rewind, which will cause the following Assert in common/logging.c to fire Assert(fmt[strlen(fmt) - 1] != '\n'); It seems not to be a problem for a production Postgres installation without asserts, but should be removed for sanity. 2) swapped progname <-> full_path in initdb.c setup_bin_paths's call [1], while logging message remained the same. So the output will be rather misleading, since in the pg_ctl and pg_dumpall the previous order is used. Attached is a small patch that fixes these issues. [1] https://github.com/postgres/postgres/commit/cc8d41511721d25d557fc02a46c053c0a602fed0#diff-c4414062a0071ec15df504d39a6df705R2500 Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company >From 2ea4a17ecc8f9bd57bb676f684fb729279339534 Mon Sep 17 00:00:00 2001 From: Alexey Kondratov Date: Mon, 1 Jul 2019 18:11:25 +0300 Subject: [PATCH v1] Fix usage of unified logging pg_log_* in pg_rewind and initdb --- src/bin/initdb/initdb.c | 2 +- src/bin/pg_rewind/pg_rewind.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 2ef179165b..70273be783 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -2497,7 +2497,7 @@ setup_bin_paths(const char *argv0) pg_log_error("The program \"postgres\" is needed by %s but was not found in the\n" "same directory as \"%s\".\n" "Check your installation.", - full_path, progname); + progname, full_path); else pg_log_error("The program \"postgres\" was found by \"%s\"\n" "but was not the same version as %s.\n" diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c index 6e77201be6..d378053de4 100644 --- a/src/bin/pg_rewind/pg_rewind.c +++ b/src/bin/pg_rewind/pg_rewind.c @@ -555,7 +555,7 @@ getTimelineHistory(ControlFileData *controlFile, int *nentries) else if (controlFile == &ControlFile_target) histfile = slurpFile(datadir_target, path, NULL); else - pg_fatal("invalid control file\n"); + pg_fatal("invalid control file"); history = rewind_parseTimeLineHistory(histfile, tli, nentries); pg_free(histfile); base-commit: 95bbe5d82e428db342fa3ec60b95f1b9873741e5 -- 2.17.1
Re: [HACKERS] Custom compression methods
On Mon, Jul 1, 2019 at 5:51 PM Alvaro Herrera wrote: > On 2019-Jul-01, Alexander Korotkov wrote: > > > As I get we're currently need to make high-level decision of whether > > we need this [1]. I was going to bring this topic up at last PGCon, > > but I didn't manage to attend. Does it worth bothering Ildus with > > continuous rebasing assuming we don't have this high-level decision > > yet? > > I agree that having to constantly rebase a patch that doesn't get acted > upon is a bit pointless. I see a bit of a process problem here: if the > patch doesn't apply, it gets punted out of commitfest and reviewers > don't look at it. This means the discussion goes unseen and no > decisions are made. My immediate suggestion is to rebase even if other > changes are needed. OK, let's do this assuming Ildus didn't give up yet :) > Longer-term I think it'd be useful to have patches > marked as needing "high-level decisions" that may lag behind current > master; maybe we have them provide a git commit-ID on top of which the > patch applies cleanly. +1, Sounds like good approach for me. > I recently found git-imerge which can make rebasing of large patch > series easier, by letting you deal with smaller conflicts one step at a > time rather than one giant conflict; it may prove useful. Thank you for pointing, will try. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Cached plans and statement generalization
On 01.07.2019 12:51, Thomas Munro wrote: On Wed, Apr 10, 2019 at 12:52 AM Konstantin Knizhnik wrote: New version of the patching disabling autoprepare for rules and handling planner error. Hi Konstantin, This doesn't apply. Could we please have a fresh rebase for the new Commitfest? Thanks, Attached please find rebased version of the patch. Also this version can be found in autoprepare branch of this repository https://github.com/postgrespro/postgresql.builtin_pool.git on github. diff --git a/doc/src/sgml/autoprepare.sgml b/doc/src/sgml/autoprepare.sgml new file mode 100644 index 000..b3309bd --- /dev/null +++ b/doc/src/sgml/autoprepare.sgml @@ -0,0 +1,62 @@ + + + + Autoprepared statements + + + autoprepared statements + + + +PostgreSQL makes it possible prepare +frequently used statements to eliminate cost of their compilation +and optimization on each execution of the query. On simple queries +(like ones in pgbench -S) using prepared statements +increase performance more than two times. + + + +Unfortunately not all database applications are using prepared statements +and, moreover, it is not always possible. For example, in case of using +pgbouncer or any other session pooler, +there is no session state (transactions of one client may be executed at different +backends) and so prepared statements can not be used. + + + +Autoprepare mode allows to overcome this limitation. +In this mode Postgres tries to generalize executed statements +and build parameterized plan for them. Speed of execution of +autoprepared statements is almost the same as of explicitly +prepared statements. + + + +By default autoprepare mode is switched off. To enable it, assign non-zero +value to GUC variable autoprepare_tthreshold. +This variable specified minimal number of times the statement should be +executed before it is autoprepared. Please notice that, despite to the +value of this parameter, Postgres makes a decision about using +generalized plan vs. customized execution plans based on the results +of comparison of average time of five customized plans with +time of generalized plan. + + + +If number of different statements issued by application is large enough, +then autopreparing all of them can cause memory overflow +(especially if there are many active clients, because prepared statements cache +is local to the backend). To prevent growth of backend's memory because of +autoprepared cache, it is possible to limit number of autoprepared statements +by setting autoprepare_limit GUC variable. LRU strategy will be used +to keep in memory most frequently used queries. + + + +It is possible to inspect autoprepared queries in the backend using +pg_autoprepared_statements view. It shows original text of the +query, types of the extracted parameters (replacing literals) and +query execution counter. + + + diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index f2b9d40..cb703f2 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8314,6 +8314,11 @@ SCRAM-SHA-256$:&l + pg_autoprepared_statements + autoprepared statements + + + pg_prepared_xacts prepared transactions @@ -9630,6 +9635,68 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx + + + pg_autoprepared_statements + + + pg_autoprepared_statements + + + + The pg_autoprepared_statements view displays + all the autoprepared statements that are available in the current + session. See for more information about autoprepared + statements. 
+ + + + pg_autoprepared_statements Columns + + + + + Name + Type + Description + + + + + statement + text + +The query string submitted by the client from which this prepared statement +was created. Please notice that literals in this statement are not +replaced with prepared statement placeholders. + + + + parameter_types + regtype[] + + The expected parameter types for the autoprepared statement in the + form of an array of regtype. The OID corresponding + to an element of this array can be obtained by casting the + regtype value to oid. + + + + exec_count + int8 + +Number of times this statement was executed. + + + + + + + + The pg_autoprepared_statements view is read only. + + + pg_prepared_xacts diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 84341a3..fcbb68b 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5327,9 +5327,57 @@ SELECT * FROM parent WHERE key = 2400; + + + autoprepare_threshold (integer/type>) + +
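(The patch is truncated above. An illustrative session using the documented GUCs and view — the threshold and limit values are arbitrary:)

-- Autoprepare statements once they have been executed at least 5 times,
-- keeping at most 100 autoprepared statements per backend:
SET autoprepare_threshold = 5;
SET autoprepare_limit = 100;

-- Inspect what has been autoprepared in this backend:
SELECT statement, parameter_types, exec_count
  FROM pg_autoprepared_statements;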
Re: Avoid full GIN index scan when possible
On 29/06/2019 00:23, Julien Rouhaud wrote: > On Fri, Jun 28, 2019 at 10:16 PM Tom Lane wrote: >> >> Tomas Vondra writes: >>> On Fri, Jun 28, 2019 at 03:03:19PM -0400, Tom Lane wrote: I not only don't want that function in indxpath.c, I don't even want it to be known/called from there. If we need the ability for the index AM to editorialize on the list of indexable quals (which I'm not very convinced of yet), let's make an AM interface function to do it. >> >>> Wouldn't it be better to have a function that inspects a single qual and >>> says whether it's "optimizable" or not? That could be part of the AM >>> implementation, and we'd call it and it'd be us messing with the list. >> >> Uh ... we already determined that the qual is indexable (ie is a member >> of the index's opclass), or allowed the index AM to derive an indexable >> clause from it, so I'm not sure what you envision would happen >> additionally there. If I understand what Julien is concerned about >> --- and I may not --- it's that the set of indexable clauses *as a whole* >> may have or lack properties of interest. So I'm thinking the answer >> involves some callback that can do something to the whole list, not >> qual-at-a-time. We've already got facilities for the latter case. > > Yes, the root issue here is that with gin it's entirely possible that > "WHERE sometable.col op value1" is way more efficient than "WHERE > sometable.col op value AND sometable.col op value2", where both qual > are determined indexable by the opclass. The only way to avoid that > is indeed to inspect the whole list, as done in this poor POC. > > This is a problem actually hit in production, and as far as I know > there's no easy way from the application POV to prevent unexpected > slowdown. Maybe Marc will have more details about the actual problem > and how expensive such a case was compared to the normal ones. Sorry for the delay... Yes, quite easily, here is what we had (it's just a bit simplified, we have other criterions but I think it shows the problem): rh2=> explain analyze select * from account_employee where typeahead like '%albert%'; QUERY PLAN Bitmap Heap Scan on account_employee (cost=53.69..136.27 rows=734 width=666) (actual time=15.562..35.044 rows=8957 loops=1) Recheck Cond: (typeahead ~~ '%albert%'::text) Rows Removed by Index Recheck: 46 Heap Blocks: exact=8919 -> Bitmap Index Scan on account_employee_site_typeahead_gin_idx (cost=0.00..53.51 rows=734 width=0) (actual time=14.135..14.135 rows=9011 loops=1) Index Cond: (typeahead ~~ '%albert%'::text) Planning time: 0.224 ms Execution time: 35.389 ms (8 rows) rh2=> explain analyze select * from account_employee where typeahead like '%albert%' and typeahead like '%lo%'; QUERY PLAN Bitmap Heap Scan on account_employee (cost=28358.38..28366.09 rows=67 width=666) (actual time=18210.109..18227.134 rows=1172 loops=1) Recheck Cond: ((typeahead ~~ '%albert%'::text) AND (typeahead ~~ '%lo%'::text)) Rows Removed by Index Recheck: 7831 Heap Blocks: exact=8919 -> Bitmap Index Scan on account_employee_site_typeahead_gin_idx (cost=0.00..28358.37 rows=67 width=0) (actual time=18204.756..18204.756 rows=9011 loops=1) Index Cond: ((typeahead ~~ '%albert%'::text) AND (typeahead ~~ '%lo%'::text)) Planning time: 0.288 ms Execution time: 18230.182 ms (8 rows) We noticed this because the application timed out for users searching someone whose name was 2 characters ( it happens :) ). 
We reject such filters when it's the only criterion, as we know it's going to be slow, but ignoring it as a supplementary filter would be a bit weird. Of course there is the possibility of filtering in two stages with a CTE, but that's not as great as having PostgreSQL do it itself. By the way, while preparing this, I noticed that during this kind of index scan the interrupt signal seems to be masked for a very long time: Control-C takes a very long time to cancel the query. But it's an entirely different problem :) Regards
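(For illustration, the two-stage CTE workaround mentioned above could be written roughly as follows, using the table from the plans shown earlier; on v12+ one would spell the CTE AS MATERIALIZED to keep it an optimization fence.)

-- Stage 1: let the index handle only the selective pattern;
-- Stage 2: apply the short pattern as a plain filter on the small result.
WITH candidates AS (
    SELECT * FROM account_employee WHERE typeahead LIKE '%albert%'
)
SELECT * FROM candidates WHERE typeahead LIKE '%lo%';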
Re: Cleanup/remove/update references to OID column
I'm resending this patch, which still seems to be needed. Also, should this be removed ? Or at leat remove the parenthesized text, since non-system tables no longer have OIDs: "(use to avoid output on system tables)" https://www.postgresql.org/docs/devel/runtime-config-developer.html trace_lock_oidmin (integer) And maybe this (?) trace_lock_table (integer) On Wed, May 08, 2019 at 02:05:57PM -0500, Justin Pryzby wrote: > I found what appears to be a dangling reference to old "hidden" OID behavior. > > Justin > From 1c6712c0ade949648dbc415dfd7ea80312360ef7 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby > Date: Wed, 8 May 2019 13:57:12 -0500 > Subject: [PATCH v1] Cleanup/remove/update references to OID column... > > ..in wake of 578b229718e8f. > > See also > 93507e67c9ca54026019ebec3026de35d30370f9 > 1464755fc490a9911214817fe83077a3689250ab > f6b39171f3d65155b9390c2c69bc5b3469f923a8 > > Author: Justin Pryzby > --- > doc/src/sgml/catalogs.sgml | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml > index d544e60..0f9c6f2 100644 > --- a/doc/src/sgml/catalogs.sgml > +++ b/doc/src/sgml/catalogs.sgml > @@ -610,7 +610,7 @@ >oid >oid > > - Row identifier (hidden attribute; must be explicitly > selected) > + Row identifier > > > > -- > 2.7.4 > -- Justin Pryzby System Administrator Telsasoft +1-952-707-8581
Re: Cleanup/remove/update references to OID column
Justin Pryzby writes: > I'm resending this patch, which still seems to be needed. Yeah, clearly one copy of that text got missed out. Pushed that. > Also, should this be removed ? Or at leat remove the parenthesized text, > since > non-system tables no longer have OIDs: "(use to avoid output on system > tables)" No, I think that's still fine as-is. Tables still have OIDs, they just don't *contain* magic OID columns. > And maybe this (?) > trace_lock_table (integer) Hm, the description of that isn't English, at least: gettext_noop("Sets the OID of the table with unconditionally lock tracing."), I'm not entirely sure about the point of tracing locks on just one table, which seems to be what this is for. regards, tom lane
Re: Add parallelism and glibc dependent only options to reindexdb
On Mon, Jul 1, 2019 at 3:51 PM Alvaro Herrera wrote: > > Please don't reuse a file name as generic as "parallel.c" -- it's > annoying when navigating source. Maybe conn_parallel.c multiconn.c > connscripts.c admconnection.c ...? I could use scripts_parallel.[ch] as I've already used it in the #define part? > If your server crashes or is stopped midway during the reindex, you > would have to start again from scratch, and it's tedious (if it's > possible at all) to determine which indexes were missed. I think it > would be useful to have a two-phase mode: in the initial phase reindexdb > computes the list of indexes to be reindexed and saves them into a work > table somewhere. In the second phase, it reads indexes from that table > and processes them, marking them as done in the work table. If the > second phase crashes or is stopped, it can be restarted and consults the > work table. I would keep the work table, as it provides a bit of an > audit trail. It may be important to be able to run even if unable to > create such a work table (because of the numerous users that > DROP DATABASE postgres). Or we could create a table locally in each database, that would fix this problem and probably make the code simpler? It also raises some additional concerns about data expiration. I guess that someone could launch the tool by mistake, kill reindexdb, and run it again 2 months later while a lot of new objects have been added for instance. > The "glibc filter" thing (which I take to mean "indexes that depend on > collations") would apply to the first phase: it just skips adding other > indexes to the work table. I suppose ICU collations are not affected, > so the filter would be for glibc collations only? Indeed, ICU shouldn't need such a filter. xxx_pattern_ops based indexes are also excluded. > The --glibc-dependent > switch seems too ad-hoc. Maybe "--exclude-rule=glibc"? That way we can > add other rules later. (Not "--exclude=foo" because we'll want to add > the possibility to ignore specific indexes by name.) That's a good point, I like the --exclude-rule switch.
Re: Add parallelism and glibc dependent only options to reindexdb
On Mon, Jul 1, 2019 at 4:10 PM Tom Lane wrote: > > Michael Paquier writes: > > - 0003 begins to be the actual fancy thing with the addition of a > > --jobs option into reindexdb. The main issue here which should be > > discussed is that when it comes to reindex of tables, you basically > > are not going to have any conflicts between the objects manipulated. > > However if you wish to do a reindex on a set of indexes then things > > get more tricky as it is necessary to list items per-table so as > > multiple connections do not conflict with each other if attempting to > > work on multiple indexes of the same table. What this patch does is > > to select the set of indexes which need to be worked on (see the > > addition of cell in ParallelSlot), and then does a kind of > > pre-planning of each item into the connection slots so as each > > connection knows from the beginning which items it needs to process. > > This is quite different from vacuumdb where a new item is distributed > > only on a free connection from a unique list. I'd personally prefer > > if we keep the facility in parallel.c so as it is only > > execution-dependent and that we have no pre-planning. This would > > require keeping within reindexdb.c an array of lists, with one list > > corresponding to one connection instead which feels more natural. > > Couldn't we make this enormously simpler and less bug-prone by just > dictating that --jobs applies only to reindex-table operations? That would also mean that we'll have to fallback on doing reindex at table-level, even if we only want to reindex indexes that depends on glibc. I'm afraid that this will often add a huge penalty. > > - 0004 is the part where the concurrent additions really matter as > > this consists in applying an extra filter to the indexes selected so > > as only the glibc-sensitive indexes are chosen for the processing. > > I think you'd be better off to define and document this as "reindex > only collation-sensitive indexes", without any particular reference > to a reason why somebody might want to do that. We should still document that indexes based on ICU would be exluded? I also realize that I totally forgot to update reindexdb.sgml. Sorry about that, I'll fix with the next versions.
Re: Error message inconsistency
Do we have an actual patch here? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)
On 2019-Apr-19, Paul Guo wrote: > The below patch runs single mode Postgres if needed to make sure the target > is cleanly shutdown. A new option is added (off by default). > v2-0001-Ensure-target-clean-shutdown-at-beginning-of-pg_r.patch Why do we need an option for this? Is there a reason not to do this unconditionally? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: POC: converting Lists into arrays
Hi, On 5/25/19 11:48 AM, Tom Lane wrote: The cfbot noticed a set-but-not-used variable that my compiler hadn't whined about. Here's a v5 to pacify it. No other changes. This needs a rebase. After that check-world passes w/ and w/o -DDEBUG_LIST_MEMORY_USAGE. There is some unneeded MemoryContext stuff in async.c's pg_listening_channels() which should be cleaned up. Thanks for working on this, as the API is more explicit now about what is going on. Best regards, Jesper
Re: FETCH FIRST clause WITH TIES option
Hi Thomas On Mon, Jul 1, 2019 at 1:31 PM Thomas Munro wrote: > On Mon, Apr 8, 2019 at 8:26 PM Surafel Temesgen > wrote: > > agree > > Hi Surafel, > > A new Commitfest is starting. Can you please post a fresh rebase of this > patch? > > Thank you for informing. attach is a rebased patch against current master regards Surafel diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 06d611b64c..e83d309c5b 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { count | ALL } ] [ OFFSET start [ ROW | ROWS ] ] -[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ] +[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ] [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ] where from_item can be one of: @@ -1430,7 +1430,7 @@ OFFSET start which PostgreSQL also supports. It is: OFFSET start { ROW | ROWS } -FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY +FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } In this syntax, the start or count value is required by @@ -1440,7 +1440,10 @@ FETCH { FIRST | NEXT } [ count ] { ambiguity. If count is omitted in a FETCH clause, it defaults to 1. -ROW +ROW . +WITH TIES option is used to return two or more rows +that tie for the last place in the result set according to ORDER BY +clause (ORDER BY clause must be specified in this case). and ROWS as well as FIRST and NEXT are noise words that don't influence the effects of these clauses. diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c index baa669abe8..60660e710f 100644 --- a/src/backend/executor/nodeLimit.c +++ b/src/backend/executor/nodeLimit.c @@ -41,6 +41,7 @@ static TupleTableSlot * /* return: a tuple or NULL */ ExecLimit(PlanState *pstate) { LimitState *node = castNode(LimitState, pstate); + ExprContext *econtext = node->ps.ps_ExprContext; ScanDirection direction; TupleTableSlot *slot; PlanState *outerPlan; @@ -126,12 +127,16 @@ ExecLimit(PlanState *pstate) { /* * Forwards scan, so check for stepping off end of window. If - * we are at the end of the window, return NULL without - * advancing the subplan or the position variable; but change - * the state machine state to record having done so. + * we are at the end of the window, the behavior depends whether + * ONLY or WITH TIES was specified. In case of ONLY, we return + * NULL without advancing the subplan or the position variable; + * but change the state machine state to record having done so. + * In the WITH TIES mode, we need to advance the subplan until + * we find the first row with different ORDER BY pathkeys. */ if (!node->noCount && - node->position - node->offset >= node->count) + node->position - node->offset >= node->count && + node->limitOption == EXACT_NUMBER) { node->lstate = LIMIT_WINDOWEND; @@ -144,18 +149,69 @@ ExecLimit(PlanState *pstate) return NULL; } +else if (!node->noCount && + node->position - node->offset >= node->count && + node->limitOption == WITH_TIES) +{ + /* + * Get next tuple from subplan, if any. + */ + slot = ExecProcNode(outerPlan); + if (TupIsNull(slot)) + { + node->lstate = LIMIT_SUBPLANEOF; + return NULL; + } + /* + * Test if the new tuple and the last tuple match. + * If so we return the tuple. 
+ */ + econtext->ecxt_innertuple = slot; + econtext->ecxt_outertuple = node->last_slot; + if (ExecQualAndReset(node->eqfunction, econtext)) + { + node->subSlot = slot; + node->position++; + } + else + { + node->lstate = LIMIT_WINDOWEND; + + /* + * If we know we won't need to back up, we can release + * resources at this point. + */ + if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD)) + (void) ExecShutdownNode(outerPlan); + + return NULL; + } -/* - * Get next tuple from subplan, if any. - */ -slot = ExecProcNode(outerPlan); -if (TupIsNull(slot)) +} +else { - node->lstate = LIMIT_SUBPLANEOF; - return NULL; + /* + * Get next tuple from subplan, if any. + */ + slot = ExecProcNode(outerPlan); + if (TupIsNull(slot)) + { + node->lstate = LIMIT_SUBPLANEOF; + return NULL; + } + + /* + * Tuple at limit is needed for comparation in subsequent execution + * to detect ties. + */ + if (node->limitOption == WITH_TIES && + node->position - node->offset == node->count - 1) + { + ExecCopySlot(node->last_slot, slot); + } + node->subSlo
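(The patch is truncated above. As a usage illustration of the syntax it documents, with a hypothetical table t(score int):)

-- Return the top 3 rows by score, plus any further rows tied with the 3rd:
SELECT * FROM t
ORDER BY score DESC
FETCH FIRST 3 ROWS WITH TIES;

-- An ORDER BY is required, since "ties" are defined by its sort keys.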
Re: [PATCH] kNN for btree
On 01.07.2019 13:41, Thomas Munro wrote: On Tue, Mar 26, 2019 at 4:30 AM Nikita Glukhov wrote: Fixed two bugs in patches 3 and 10 (see below). Patch 3 was extracted from the main patch 5 (patch 4 in v9). This patch no longer applies so marking Waiting on Author. Attached 11th version of the patches rebased onto current master. Hi Nikita, Since a new Commitfest is here and this doesn't apply, could we please have a fresh rebase? Attached 12th version of the patches rebased onto the current master. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-Fix-get_index_column_opclass-v12.patch.gz Description: application/gzip 0002-Introduce-ammatchorderby-function-v12.patch.gz Description: application/gzip 0003-Enable-ORDER-BY-operator-scans-on-ordered-indexes-v12.patch.gz Description: application/gzip 0004-Extract-structure-BTScanState-v12.patch.gz Description: application/gzip 0005-Add-kNN-support-to-btree-v12.patch.gz Description: application/gzip 0006-Add-btree-distance-operators-v12.patch.gz Description: application/gzip 0007-Remove-distance-operators-from-btree_gist-v12.patch.gz Description: application/gzip 0008-Add-regression-tests-for-kNN-btree-v12.patch.gz Description: application/gzip 0009-Allow-ammatchorderby-to-return-pathkey-sublists-v12.patch.gz Description: application/gzip 0010-Add-support-of-array-ops-to-btree-kNN-v12.patch.gz Description: application/gzip
Change atoi to strtol in same place
Hello, we use atoi for user argument processing in same place which return zero for both invalid input and input value zero. In most case its ok because we error out with appropriate error message for input zero but in same place where we accept zero as valued input it case a problem by preceding for invalid input as input value zero. The attached patch change those place to strtol which can handle invalid input regards Surafel diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c index 23f706b21d..2bcacbfbb5 100644 --- a/contrib/pg_standby/pg_standby.c +++ b/contrib/pg_standby/pg_standby.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -638,6 +639,14 @@ sigquit_handler(int sig) int main(int argc, char **argv) { + char *keepfilesendptr; + char *maxretriesendptr; + char *sleeptimeendptr; + char *maxwaittimeendptr; + long numkeepfiles; + long nummaxretries; + long numsleeptime; + long nummaxwaittime; int c; progname = get_progname(argv[0]); @@ -688,12 +697,17 @@ main(int argc, char **argv) debug = true; break; case 'k': /* keepfiles */ -keepfiles = atoi(optarg); -if (keepfiles < 0) +errno = 0; +numkeepfiles = strtol(optarg, &keepfilesendptr, 10); + +if (keepfilesendptr == optarg || *keepfilesendptr != '\0' || + numkeepfiles < 0 || numkeepfiles > INT_MAX || + errno == ERANGE) { - fprintf(stderr, "%s: -k keepfiles must be >= 0\n", progname); + fprintf(stderr, "%s: -k keepfiles must be in range %d..%d\n", progname, 0, INT_MAX); exit(2); } +keepfiles = (int) numkeepfiles; break; case 'l': /* Use link */ @@ -707,31 +721,46 @@ main(int argc, char **argv) #endif break; case 'r': /* Retries */ -maxretries = atoi(optarg); -if (maxretries < 0) +errno = 0; +nummaxretries = strtol(optarg, &maxretriesendptr, 10); + +if (maxretriesendptr == optarg || *maxretriesendptr != '\0' || + nummaxretries < 0 || nummaxretries > INT_MAX || + errno == ERANGE) { - fprintf(stderr, "%s: -r maxretries must be >= 0\n", progname); + fprintf(stderr, "%s: -r maxretries must be in range %d..%d\n", progname, 0, INT_MAX); exit(2); } +maxretries = (int) nummaxretries; break; case 's': /* Sleep time */ -sleeptime = atoi(optarg); -if (sleeptime <= 0 || sleeptime > 60) +errno = 0; +numsleeptime = strtol(optarg, &sleeptimeendptr, 10); + +if (sleeptimeendptr == optarg || *sleeptimeendptr != '\0' || + numsleeptime <= 0 || numsleeptime > 60 || + errno == ERANGE) { - fprintf(stderr, "%s: -s sleeptime incorrectly set\n", progname); + fprintf(stderr, "%s: -s sleeptime must be in range %d..%d\n", progname, 1, 59); exit(2); } +sleeptime = (int) numsleeptime; break; case 't': /* Trigger file */ triggerPath = pg_strdup(optarg); break; case 'w': /* Max wait time */ -maxwaittime = atoi(optarg); -if (maxwaittime < 0) +errno = 0; +nummaxwaittime = strtol(optarg, &maxwaittimeendptr, 10); + +if (maxwaittimeendptr == optarg || *maxwaittimeendptr != '\0' || + nummaxwaittime < 0 || nummaxwaittime > INT_MAX || + errno == ERANGE) { - fprintf(stderr, "%s: -w maxwaittime incorrectly set\n", progname); + fprintf(stderr, "%s: -w maxwaittime must be in range %d..%d\n", progname, 0, INT_MAX); exit(2); } +maxwaittime = (int) nummaxwaittime; break; default: fprintf(stderr, "Try \"%s --help\" for more information.\n", progname); diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 63f554307c..16d44c8617 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -2193,6 +2193,10 @@ main(int argc, char **argv) 
{"no-verify-checksums", no_argument, NULL, 3}, {NULL, 0, NULL, 0} }; + char *compressEndptr; + char *timeoutEndptr; + long compressNumber; + long timeoutNumber; int c; int option_index; @@ -2305,12 +2309,18 @@ main(int argc, char **argv) #endif break; case 'Z': -compresslevel = atoi(optarg); -if (compresslevel < 0 || compresslevel > 9) +errno = 0; +compressNumber = strtol(optarg, &compressEndptr, 10); + +if (compressEndptr == optarg || *compressEndptr != '\0' || + compressNumber < 0 || compressNumber > 9 || + errno == ERANGE) { - pg_log_error("invalid compression level \"%s\"\n", optarg); + pg_log_error("compression level must be in range %d..%d \"%s\"\n", + 0, 9, optarg); exit(1); } +compresslevel = (int) compressNumber; break; case 'c': if (pg_strcasecmp(optarg, "fast") == 0) @@ -2343,12 +2353,18 @@ main(int argc, char **argv) dbgetpassword = 1; break; case 's': -standby_message_timeout = atoi(optarg) * 1000; -if (standby_message_timeout < 0) +errno
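(The patch is truncated above. The validation pattern it applies at each option can be condensed into a helper roughly like the following — the helper itself is an illustration, not part of the patch:)

#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse a decimal integer option, rejecting garbage, trailing text and overflow. */
static bool
parse_int_option(const char *str, long min, long max, long *result)
{
    char   *endptr;
    long    val;

    errno = 0;
    val = strtol(str, &endptr, 10);

    if (endptr == str || *endptr != '\0' ||   /* no digits, or trailing junk */
        errno == ERANGE ||                    /* out of range for long */
        val < min || val > max)               /* out of the option's range */
        return false;

    *result = val;
    return true;
}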
Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead
It seems strange to have relation_reloptions which abstracts away the need to know which function to call for each relkind, and separately have a bunch of places that call the specific relkind. Why not just call the specific function directly? It doesn't seem that we're gaining any abstraction ... maybe it'd be better to just remove relation_reloptions entirely. Or maybe we need to make it do a better job so that other places don't have to call the specific functions at all. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH][PROPOSAL] Add enum releation option type
It strikes me that the way to avoid sentence construction is to have each enum reloption declare a string that it uses to list the values it accepts. So for example we would have +#define GIST_OPTION_BUFFERING_ENUM_DEF { \ + { "on", GIST_OPTION_BUFFERING_ON }, \ + { "off",GIST_OPTION_BUFFERING_OFF },\ + { "auto", GIST_OPTION_BUFFERING_AUTO }, \ + { (const char *) NULL, 0 } \ +} + + GistBufferingValidMsg = gettext_noop("Valid values are \"on\", \"off\", and \"auto\"."); I think that's the most contentious point on this patch at this point (though I may be misremembering). -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
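(For context, buffering is the existing GiST storage parameter, so the user-visible side of such an enum reloption is simply — the table name below is hypothetical:)

-- hypothetical table: points(p point)
CREATE INDEX ON points USING gist (p) WITH (buffering = auto);
-- a typo such as (buffering = autoo) would be rejected, with the
-- "Valid values are ..." text above as the error detail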
Re: POC: converting Lists into arrays
Jesper Pedersen writes: > This needs a rebase. After that check-world passes w/ and w/o > -DDEBUG_LIST_MEMORY_USAGE. Yup, here's a rebase against HEAD (and I also find that check-world shows no problems). This is pretty much of a pain to maintain, since it changes the API for lnext() which is, um, a bit invasive. I'd like to make a decision pretty quickly on whether we're going to do this, and either commit this patch or abandon it. > There is some unneeded MemoryContext stuff in async.c's > pg_listening_channels() which should be cleaned up. Yeah, there's a fair amount of follow-on cleanup that could be undertaken afterwards, but I've wanted to keep the patch's footprint as small as possible for the moment. Assuming we pull the trigger, I'd then go look at removing the planner's duplicative lists+arrays for RTEs and such as the first cleanup step. But thanks for the pointer to async.c, I'll check that too. regards, tom lane reimplement-List-as-array-6.patch.gz Description: reimplement-List-as-array-6.patch.gz
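(The rebased patch itself is attached as a gzip file; purely as an illustration of the lnext() change mentioned above, with an invented example function:)

#include "postgres.h"
#include "nodes/pg_list.h"

/*
 * Illustration only (not code from the attached patch): how the lnext()
 * API change shows up at call sites once ListCells are array elements.
 */
static int
sum_int_list(List *lst)
{
    int         total = 0;
    ListCell   *lc;

    /* Previously an open-coded loop advanced with lnext(lc):            */
    /*   for (lc = list_head(lst); lc != NULL; lc = lnext(lc))           */

    /* Now lnext() also needs the List, to know where the array ends: */
    for (lc = list_head(lst); lc != NULL; lc = lnext(lst, lc))
        total += lfirst_int(lc);

    /* foreach(lc, lst) callers are largely unaffected at the call site. */
    return total;
}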
Re: Zedstore - compressed in-core columnar storage
On Sun, Jun 30, 2019 at 7:59 PM Tsunakawa, Takayuki < tsunakawa.ta...@jp.fujitsu.com> wrote: > From: Ashwin Agrawal [mailto:aagra...@pivotal.io] > > The objective is to gather feedback on design and approach to the same. > > The implementation has core basic pieces working but not close to > complete. > > Thank you for proposing a very interesting topic. Are you thinking of > including this in PostgreSQL 13 if possible? > > > > * All Indexes supported > ... > > work. Btree indexes can be created. Btree and bitmap index scans work. > > Does Zedstore allow to create indexes of existing types on the table > (btree, GIN, BRIN, etc.) and perform index scans (point query, range query, > etc.)? > Yes, all indexes types work for zedstore and allow point or range queries. > > * Hybrid row-column store, where some columns are stored together, and > > others separately. Provide flexibility of granularity on how to > > divide the columns. Columns accessed together can be stored > > together. > ... > > This way of laying out the data also easily allows for hybrid row-column > > store, where some columns are stored together, and others have a > dedicated > > B-tree. Need to have user facing syntax to allow specifying how to group > > the columns. > ... > > Zedstore Table can be > > created using command: > > > > CREATE TABLE (column listing) USING zedstore; > > Are you aiming to enable Zedstore to be used for HTAP, i.e. the same table > can be accessed simultaneously for both OLTP and analytics with the minimal > performance impact on OLTP? (I got that impression from the word "hybrid".) > Well "hybrid" is more to convey compressed row and column store can be supported with same design. It really wasn't referring to HTAP. In general the goal we are moving towards is column store to be extremely efficient at analytics but still should be able to support all the OLTP operations (with minimal performance or storage size impact) Like when making trade-offs between different design choices and if both can't be meet, preference if towards analytics. If yes, is the assumption that only a limited number of columns are to be > stored in columnar format (for efficient scanning), and many other columns > are to be stored in row format for efficient tuple access? > Yes, like if its known that certain columns are always accessed together better to store them together and avoid the tuple formation cost. Though its still to be seen if compression plays role and storing each individual column and compressing can still be winner compared to compressing different columns as blob. Like saving on IO cost offsets out the tuple formation cost or not. Are those row-formatted columns stored in the same file as the > column-formatted columns, or in a separate file? > Currently, we are focused to just get pure column store working and hence not coded anything for hybrid layout yet. But at least right now the thought is would be in same file. Regarding the column grouping, can I imagine HBase and Cassandra? > How could the current CREATE TABLE syntax support column grouping? (I > guess CREATE TABLE needs a syntax for columnar store, and Zedstore need to > be incorporated in core, not as an extension...) > When column grouping comes up yes will need to modify CREATE TABLE syntax, we are still to reach that point in development. > > A column store uses the same structure but we have *multiple* B-trees, > one > > for each column, all indexed by TID. The B-trees for all columns are > stored > > in the same physical file. 
> > Did you think that it's not a good idea to have a different file for each > group of columns? Is that because we can't expect physical adjacency of > data blocks on disk even if we separate a column in a separate file? > > I thought a separate file for each group of columns would be easier and > less error-prone to implement and debug. Adding and dropping the column > group would also be very easy and fast. > Currently, each group is a single column (until we have column families), and having a file for each column definitely seems like a bad idea, as it just explodes the number of files. A separate file may have an advantage from a pre-fetching point of view, but yes, we can't expect physical adjacency of data blocks, plus the access pattern will anyway involve reading multiple files (if each column is stored in a separate file). I doubt storing each group separately makes it any easier to implement or debug; I feel it's actually the reverse. Storing everything in a single file, but in separate blocks, keeps the logic contained inside the AM layer, and we don't have to write special code, for example for DROP TABLE to delete the files for all the groups, or for moving the table to a different tablespace, and all such complications. Adding and dropping a column group can, regardless, be made easy and fast: blocks for that group are simply added, or marked for reuse, within the same file. Thank you for the questions.
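(For illustration of the DDL quoted earlier in the thread — the table here is invented:)

-- One B-tree per column, all kept in the table's single physical file:
CREATE TABLE readings (ts timestamptz, device int, value float8) USING zedstore;

-- Secondary B-tree indexes can be created as usual:
CREATE INDEX ON readings (device);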
Memory-Bounded Hash Aggregation
This is for design review. I have a patch (WIP) for Approach 1, and if this discussion starts to converge on that approach I will polish and post it. Let's start at the beginning: why do we have two strategies -- hash and sort -- for aggregating data? The two are more similar than they first appear. A partitioned hash strategy writes randomly among the partitions, and later reads the partitions sequentially; a sort will write sorted runs sequentially, but then read the among the runs randomly during the merge phase. A hash is a convenient small representation of the data that is cheaper to operate on; sort uses abbreviated keys for the same reason. Hash offers: * Data is aggregated on-the-fly, effectively "compressing" the amount of data that needs to go to disk. This is particularly important when the data contains skewed groups (see below). * Can output some groups after the first pass of the input data even if other groups spilled. * Some data types only support hashing; not sorting. Sort+Group offers: * Only one group is accumulating at once, so if the transition state grows (like with ARRAY_AGG), it minimizes the memory needed. * The input may already happen to be sorted. * Some data types only support sorting; not hashing. Currently, Hash Aggregation is only chosen if the optimizer believes that all the groups (and their transition states) fit in memory. Unfortunately, if the optimizer is wrong (often the case if the input is not a base table), the hash table will keep growing beyond work_mem, potentially bringing the entire system to OOM. This patch fixes that problem by extending the Hash Aggregation strategy to spill to disk when needed. Previous discussions: https://www.postgresql.org/message-id/1407706010.6623.16.camel@jeff-desktop https://www.postgresql.org/message-id/1419326161.24895.13.camel%40jeff-desktop https://www.postgresql.org/message-id/87be3bd5-6b13-d76e-5618-6db0a4db584d%40iki.fi A lot was discussed, which I will try to summarize and address here. Digression: Skewed Groups: Imagine the input tuples have the following grouping keys: 0, 1, 0, 2, 0, 3, 0, 4, ..., 0, N-1, 0, N Group 0 is a skew group because it consists of 50% of all tuples in the table, whereas every other group has a single tuple. If the algorithm is able to keep group 0 in memory the whole time until finalized, that means that it doesn't have to spill any group-0 tuples. In this example, that would amount to a 50% savings, and is a major advantage of Hash Aggregation versus Sort+Group. High-level approaches: 1. When the in-memory hash table fills, keep existing entries in the hash table, and spill the raw tuples for all new groups in a partitioned fashion. When all input tuples are read, finalize groups in memory and emit. Now that the in-memory hash table is cleared (and memory context reset), process a spill file the same as the original input, but this time with a fraction of the group cardinality. 2. When the in-memory hash table fills, partition the hash space, and evict the groups from all partitions except one by writing out their partial aggregate states to disk. Any input tuples belonging to an evicted partition get spilled to disk. When the input is read entirely, finalize the groups remaining in memory and emit. Now that the in-memory hash table is cleared, process the next partition by loading its partial states into the hash table, and then processing its spilled tuples. 3. Use some kind of hybrid[1][2] of hashing and sorting. 
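(Before the evaluation below, a pseudocode sketch of Approach 1's control flow; every type and helper name here is an invented placeholder rather than an executor API, and details such as memory accounting, partition counts, and the pull-style executor interface are left out.)

/* Pseudocode in C syntax; not compilable executor code. */
static void
hashagg_spill_sketch(TupleSource *input)
{
    HashTable  *ht = hashtable_create(work_mem);
    Spill      *spill = NULL;        /* partitioned spill files, created lazily */
    Tuple      *tup;

    while ((tup = read_next(input)) != NULL)
    {
        Group      *grp = hashtable_lookup(ht, tup);

        if (grp != NULL)
            advance_transition(grp, tup);       /* existing groups always stay in memory */
        else if (!hashtable_is_full(ht))
            advance_transition(hashtable_insert(ht, tup), tup);
        else
            spill_raw_tuple(&spill, hash_value(tup), tup);  /* new group, no room: spill the tuple */
    }

    finalize_and_emit(ht);           /* groups built in memory can be emitted now */
    hashtable_destroy(ht);           /* memory context reset */

    /* Each spill partition is then processed like the original input,
     * but with only a fraction of the group cardinality. */
    for (int i = 0; i < spill_num_partitions(spill); i++)
        hashagg_spill_sketch(spill_partition_as_source(spill, i));
}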
Evaluation of approaches: Approach 1 is a nice incremental improvement on today's code. The final patch may be around 1KLOC. It's a single kind of on-disk data (spilled tuples), and a single algorithm (hashing). It also handles skewed groups well because the skewed groups are likely to be encountered before the hash table fills up the first time, and therefore will stay in memory. Approach 2 is nice because it resembles the approach of Hash Join, and it can determine whether a tuple should be spilled without a hash lookup. Unfortunately, those upsides are fairly mild, and it has significant downsides: * It doesn't handle skew values well because it's likely to evict them. * If we leave part of the hash table in memory, it's difficult to ensure that we will be able to actually use the space freed by eviction, because the freed memory may be fragmented. That could force us to evict the entire in-memory hash table as soon as we partition, reducing a lot of the benefit of hashing. * It requires eviction for the algorithm to work. That may be necessary for handling cases like ARRAY_AGG (see below) anyway, but this approach constrains the specifics of eviction. Approach 3 is interesting because it unifies the two approaches and can get some of the benefits of both. It's only a single path, so it avoids planner mistakes. I really like this idea and it's possible we will end up with approach 3. However: * It requires that all data types
Re: Zedstore - compressed in-core columnar storage
On Thu, May 30, 2019 at 8:07 AM DEV_OPS wrote:
>
> it's really cool and very good progress,
>
> I'm interesting if SIDM/JIT will be supported
>

That's something outside of Zedstore work directly, at least for now. The intent is to work with the current executor code, or enhance it only wherever needed. If the current executor code supports something, that will work for Zedstore as well; any other enhancements to the executor will be a separate undertaking.
Re: POC: converting Lists into arrays
Hi, On 7/1/19 2:44 PM, Tom Lane wrote: Yup, here's a rebase against HEAD (and I also find that check-world shows no problems). Thanks - no further comments. This is pretty much of a pain to maintain, since it changes the API for lnext() which is, um, a bit invasive. I'd like to make a decision pretty quickly on whether we're going to do this, and either commit this patch or abandon it. IMHO it is an improvement over the existing API. There is some unneeded MemoryContext stuff in async.c's pg_listening_channels() which should be cleaned up. Yeah, there's a fair amount of follow-on cleanup that could be undertaken afterwards, but I've wanted to keep the patch's footprint as small as possible for the moment. Assuming we pull the trigger, I'd then go look at removing the planner's duplicative lists+arrays for RTEs and such as the first cleanup step. But thanks for the pointer to async.c, I'll check that too. Yeah, I only called out the async.c change, as that function isn't likely to change in any of the follow up patches. Best regards, Jesper
Re: Comment typo in tableam.h
On Fri, Jun 28, 2019 at 1:47 PM Amit Kapila wrote: > On Mon, Jun 24, 2019 at 11:26 PM Ashwin Agrawal > wrote: > > > > On Mon, Jun 3, 2019 at 5:24 PM Ashwin Agrawal > wrote: > >> > >> There were few more minor typos I had collected for table am, passing > them along here. > >> > >> Some of the required callback functions are missing Assert checking > (minor thing), adding them in separate patch. > > > > > > Curious to know if need to register such small typo fixing and assertion > adding patchs to commit-fest as well ? > > > > Normally, such things are handled out of CF, but to avoid forgetting, > we can register it. However, this particular item suits more to 'Open > Items'[1]. You can remove the objectionable part of the comment, > other things in your patch look good to me. If nobody else picks it > up, I will take care of it. > Thank you, I thought Committer would remove the objectionable part of comment change and commit the patch if seems fine. I don't mind changing, just feel adds extra back and forth cycle. Please find attached v2 of patch 1 without objectionable comment change. v1 of patch 2 attaching here just for convenience, no modifications made to it. From 5c75807a56101a07685ed1f435eabcc43cd22fb7 Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Fri, 24 May 2019 16:30:38 -0700 Subject: [PATCH v2 1/2] Fix typos in few tableam comments. --- src/include/access/tableam.h | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h index fd07773b74f..1e45908c78a 100644 --- a/src/include/access/tableam.h +++ b/src/include/access/tableam.h @@ -434,8 +434,8 @@ typedef struct TableAmRoutine * * Note that only the subset of the relcache filled by * RelationBuildLocalRelation() can be relied upon and that the relation's - * catalog entries either will either not yet exist (new relation), or - * will still reference the old relfilenode. + * catalog entries will either not yet exist (new relation), or will still + * reference the old relfilenode. * * As output *freezeXid, *minmulti must be set to the values appropriate * for pg_class.{relfrozenxid, relminmxid}. For AMs that don't need those @@ -591,7 +591,7 @@ typedef struct TableAmRoutine * See table_relation_estimate_size(). * * While block oriented, it shouldn't be too hard for an AM that doesn't - * doesn't internally use blocks to convert into a usable representation. + * internally use blocks to convert into a usable representation. * * This differs from the relation_size callback by returning size * estimates (both relation size and tuple count) for planning purposes, @@ -967,7 +967,7 @@ table_index_fetch_end(struct IndexFetchTableData *scan) * * *all_dead, if all_dead is not NULL, will be set to true by * table_index_fetch_tuple() iff it is guaranteed that no backend needs to see - * that tuple. Index AMs can use that do avoid returning that tid in future + * that tuple. Index AMs can use that to avoid returning that tid in future * searches. * * The difference between this function and table_fetch_row_version is that @@ -1014,8 +1014,8 @@ extern bool table_index_fetch_tuple_check(Relation rel, * true, false otherwise. * * See table_index_fetch_tuple's comment about what the difference between - * these functions is. This function is the correct to use outside of - * index entry->table tuple lookups. + * these functions is. This function is correct to use outside of index + * entry->table tuple lookups. 
*/ static inline bool table_tuple_fetch_row_version(Relation rel, -- 2.19.1 From 4ed947590f2f61182356a7fa4bc429c679e7619f Mon Sep 17 00:00:00 2001 From: Ashwin Agrawal Date: Mon, 3 Jun 2019 17:07:05 -0700 Subject: [PATCH v2 2/2] Add assertions for required table am callbacks. --- src/backend/access/table/tableamapi.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c index b2f58768107..98b8ea559d8 100644 --- a/src/backend/access/table/tableamapi.c +++ b/src/backend/access/table/tableamapi.c @@ -51,6 +51,7 @@ GetTableAmRoutine(Oid amhandler) Assert(routine->scan_begin != NULL); Assert(routine->scan_end != NULL); Assert(routine->scan_rescan != NULL); + Assert(routine->scan_getnextslot != NULL); Assert(routine->parallelscan_estimate != NULL); Assert(routine->parallelscan_initialize != NULL); @@ -62,8 +63,12 @@ GetTableAmRoutine(Oid amhandler) Assert(routine->index_fetch_tuple != NULL); Assert(routine->tuple_fetch_row_version != NULL); + Assert(routine->tuple_tid_valid != NULL); + Assert(routine->tuple_get_latest_tid != NULL); Assert(routine->tuple_satisfies_snapshot != NULL); + Assert(routine->compute_xid_horizon_for_tuples != NULL); + Assert(routine->tuple_insert != NULL); /* @@ -89,6 +94,7 @@ GetTableAmRoutine(Oid amhandler) Assert(routine->index_validate_scan != NULL); Ass
Re: Add parallelism and glibc dependent only options to reindexdb
Julien Rouhaud wrote:
> > I think you'd be better off to define and document this as "reindex
> > only collation-sensitive indexes", without any particular reference
> > to a reason why somebody might want to do that.
>
> We should still document that indexes based on ICU would be exluded?

But why exclude them? As a data point, in the last 5 years, the en_US collation in ICU had 7 different versions (across 11 major versions of ICU):

  ICU    Unicode   en_US

  54.1   7.0       137.56
  55.1   7.0       153.56
  56.1   8.0       153.64
  57.1   8.0       153.64
  58.2   9.0       153.72
  59.1   9.0       153.72
  60.2   10.0      153.80
  61.1   10.0      153.80
  62.1   11.0      153.88
  63.2   11.0      153.88
  64.2   12.1      153.97

The rightmost column corresponds to pg_collation.collversion in Postgres. Each time there's a new Unicode version, it seems all collation versions are bumped. And there's a new Unicode version pretty much every year these days. Based on this, most ICU upgrades in practice would require reindexing affected indexes.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Re: Add parallelism and glibc dependent only options to reindexdb
On 2019-Jul-01, Daniel Verite wrote:
> But why exclude them?
> As a data point, in the last 5 years, the en_US collation in ICU
> had 7 different versions (across 11 major versions of ICU):

So we need a switch --include-rule=icu-collations? (I mentioned "--exclude-rule=glibc" elsewhere in the thread, but I think it should be --include-rule=glibc-collations instead, no?)
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add parallelism and glibc dependent only options to reindexdb
On Mon, Jul 1, 2019 at 10:13 PM Daniel Verite wrote:
>
> > > I think you'd be better off to define and document this as "reindex
> > > only collation-sensitive indexes", without any particular reference
> > > to a reason why somebody might want to do that.
> >
> > We should still document that indexes based on ICU would be exluded?
>
> But why exclude them?
> As a data point, in the last 5 years, the en_US collation in ICU
> had 7 different versions (across 11 major versions of ICU):
>
>   ICU    Unicode   en_US
>
>   54.1   7.0       137.56
>   55.1   7.0       153.56
>   56.1   8.0       153.64
>   57.1   8.0       153.64
>   58.2   9.0       153.72
>   59.1   9.0       153.72
>   60.2   10.0      153.80
>   61.1   10.0      153.80
>   62.1   11.0      153.88
>   63.2   11.0      153.88
>   64.2   12.1      153.97
>
> The rightmost column corresponds to pg_collation.collversion
> in Postgres.
> Each time there's a new Unicode version, it seems
> all collation versions are bumped. And there's a new Unicode
> version pretty much every year these days.
> Based on this, most ICU upgrades in practice would require reindexing
> affected indexes.

I have a vague recollection that ICU was providing some backward compatibility so that even if you upgrade your lib you can still get the sort order that was active when you built your indexes, though maybe for a limited number of versions. Even if that's just me being delusional, I'd still prefer Alvaro's approach to have distinct switches for each collation system.
Re: Refactoring base64 encoding and decoding into a safer interface
> On 23 Jun 2019, at 15:25, Michael Paquier wrote:
> Attached is a refactoring patch for those interfaces, which introduces
> a set of overflow checks so as we cannot repeat errors of the past.

I’ve done a review of this submission. The patch applies cleanly, and passes make check, ssl/scram tests etc. There is enough documentation. I very much agree that functions operating on a buffer like this should have the size of the buffer in order to safeguard against overflow, so +1 on the general concept.

> Any thoughts?

A few small comments:

In src/common/scram-common.c there are a few instances like this. Shouldn’t we also free the result buffer in these cases?

+#ifdef FRONTEND
+       return NULL;
+#else

In the below passage, we leave the input buffer with a non-complete encoded string. Should we memset the buffer to zero to avoid the risk that code which fails to check the return value believes it has an encoded string?

+       /*
+        * Leave if there is an overflow in the area allocated for
+        * the encoded string.
+        */
+       if ((p - dst + 4) > dstlen)
+               return -1;

cheers ./daniel
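For reference, the calling pattern the new interface enforces ends up looking like this (a sketch only, based on the hunks in the patch; src/srclen stand for whatever is being encoded, and the error handling is reduced to the minimum):

    /*
     * Size the destination with pg_b64_enc_len(), pass that size down to
     * pg_b64_encode(), and treat a negative return value as "the result
     * did not fit in the destination buffer".
     */
    int         encoded_len;
    char       *encoded;

    encoded_len = pg_b64_enc_len(srclen);
    encoded = palloc(encoded_len + 1);      /* + 1 for the trailing zero byte */

    encoded_len = pg_b64_encode(src, srclen, encoded, encoded_len);
    if (encoded_len < 0)
        elog(ERROR, "could not encode data");   /* destination would overflow */
    encoded[encoded_len] = '\0';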
Re: Add parallelism and glibc dependent only options to reindexdb
On Mon, Jul 1, 2019 at 1:34 PM Julien Rouhaud wrote: > I have a vague recollection that ICU was providing some backward > compatibility so that even if you upgrade your lib you can still get > the sort order that was active when you built your indexes, though > maybe for a limited number of versions. That isn't built in. Another database system that uses ICU handles this by linking to multiple versions of ICU, each with its own UCA version and associated collations. I don't think that we want to go there, so it makes sense to make an upgrade that crosses ICU or glibc versions as painless as possible. Note that ICU does at least provide a standard way to use multiple versions at once; the symbol names have the ICU version baked in. You're actually calling the functions using the versioned symbol names without realizing it, because there is macro trickery involved. -- Peter Geoghegan
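For anyone curious about the macro trickery: roughly speaking, ICU's unicode/urename.h rewrites every public entry point to a version-suffixed symbol along the lines below. This is a simplified rendition from memory, and the exact macro names and mechanics vary between ICU releases:

    /*
     * Simplified sketch of what urename.h does (not the literal header):
     * every public function name becomes a macro that expands to a symbol
     * carrying the ICU major version you compiled against.
     */
    #define U_ICU_VERSION_SUFFIX _64
    #define U_DEF2_ICU_ENTRY_POINT_RENAME(x, suffix) x##suffix
    #define U_DEF_ICU_ENTRY_POINT_RENAME(x, suffix) U_DEF2_ICU_ENTRY_POINT_RENAME(x, suffix)
    #define U_ICU_ENTRY_POINT_RENAME(x) U_DEF_ICU_ENTRY_POINT_RENAME(x, U_ICU_VERSION_SUFFIX)

    #define ucol_open U_ICU_ENTRY_POINT_RENAME(ucol_open)

    /* so, inside some function, this call... */
    UErrorCode  status = U_ZERO_ERROR;
    UCollator  *coll = ucol_open("en-US", &status);
    /* ...actually resolves to the versioned symbol ucol_open_64 */

That is what makes it possible, in principle, to link two ICU major versions into the same binary and call them through their versioned names.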
Re: Add parallelism and glibc dependent only options to reindexdb
On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud wrote: > Even if that's just me being delusional, I'd still prefer Alvaro's > approach to have distinct switches for each collation system. Hi Julien, Makes sense. But why use the name "glibc" in the code and user interface? The name of the collation provider in PostgreSQL is "libc" (for example in the CREATE COLLATION command), and the problem applies no matter who makes your libc. -- Thomas Munro https://enterprisedb.com
Re: Add parallelism and glibc dependent only options to reindexdb
On 2019-Jul-02, Thomas Munro wrote:
> On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud wrote:
> > Even if that's just me being delusional, I'd still prefer Alvaro's
> > approach to have distinct switches for each collation system.
>
> Hi Julien,
>
> Makes sense. But why use the name "glibc" in the code and user
> interface? The name of the collation provider in PostgreSQL is "libc"
> (for example in the CREATE COLLATION command), and the problem applies
> no matter who makes your libc.

Makes sense. "If your libc is glibc and you go across an upgrade over version X, please use --include-rule=libc-collation"
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: range_agg
On Fri, 2019-05-03 at 15:56 -0700, Paul Jungwirth wrote:
> Hello,
>
> I wrote an extension to add a range_agg function with similar behavior
> to existing *_agg functions, and I'm wondering if folks would like to
> have it in core? Here is the repo:
> https://github.com/pjungwir/range_agg

This seems like a very useful extension, thank you. For getting into core though, it should be a more complete set of related operations. The patch is implicitly introducing the concept of a "multirange" (in this case, an array of ranges), but it's not making the concept whole.

What else should return a multirange? This patch implements the union-agg of ranges, but we also might want an intersection-agg of ranges (that is, the set of ranges that are subranges of every input). Given that there are other options here, the name "range_agg" is too generic to mean union-agg specifically.

What can we do with a multirange? A lot of range operators still make sense, like "contains" or "overlaps"; but "adjacent" doesn't quite work. What about new operations like inverting a multirange to get the gaps?

Do we want to continue with the array-of-ranges implementation of a multirange, or do we want a first-class multirange concept that might eliminate the boilerplate around defining all of these operations? If we have a more complete set of operators here, the flags for handling overlapping ranges and gaps will be unnecessary.

Regards, Jeff Davis
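To make the union-agg semantics concrete: collapsing a set of ranges into a multirange is the usual sorted-interval merge. A generic standalone C sketch follows; it is only an illustration of the semantics (half-open [lo, hi) ranges over doubles assumed), not the extension's code:

    #include <stdlib.h>

    typedef struct Range { double lo, hi; } Range;  /* half-open: [lo, hi) */

    static int
    range_cmp(const void *a, const void *b)
    {
        double  d = ((const Range *) a)->lo - ((const Range *) b)->lo;

        return (d > 0) - (d < 0);
    }

    /* Collapse the inputs into a minimal set of disjoint ranges (the union). */
    static int
    range_union(Range *in, int n, Range *out)
    {
        int     nout = 0;

        qsort(in, n, sizeof(Range), range_cmp);
        for (int i = 0; i < n; i++)
        {
            if (nout > 0 && in[i].lo <= out[nout - 1].hi)
            {
                /* overlaps or is adjacent to the previous output range: extend it */
                if (in[i].hi > out[nout - 1].hi)
                    out[nout - 1].hi = in[i].hi;
            }
            else
                out[nout++] = in[i];    /* a gap: start a new output range */
        }
        return nout;
    }

An intersection-agg, or the "invert to get the gaps" operation, would be similar small variations over the same sorted, disjoint representation.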
Re: Are there still non-ELF BSD systems?
On 2019-06-12 16:06, Tom Lane wrote: > Peter Eisentraut writes: > I checked around a bit ... all of the *BSD systems in the buildfarm > report ELF_SYS='true', so it's safe to say that the code you want to > remove is untested. > > Excavation in the various BSDens' change logs suggests that the last > one to fully drop a.out was OpenBSD, which didn't do so until 5.5 > (released 1 May 2015). That's more recent than I'd have hoped for, > though it looks like the holdout architectures were ones we don't > support anyway (e.g., m68k, vax). > > If we're considering this change for v13, it's hard to believe > there'd be any objections in practice. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: POC: converting Lists into arrays
I spent some time experimenting with the idea mentioned upthread of adding a macro to support deletion of a foreach loop's current element (adjusting the loop's state behind the scenes). This turns out to work really well: it reduces the complexity of fixing existing loops around element deletions quite a bit. Whereas in existing code you have to not use foreach() at all, and you have to track both the next list element and the previous undeleted element, now you can use foreach() and you don't have to mess with extra variables at all. A good example appears in the trgm_regexp.c changes below. Typically we've coded such loops with a handmade expansion of foreach, like

    prev = NULL;
    cell = list_head(state->enterKeys);
    while (cell)
    {
        TrgmStateKey *existingKey = (TrgmStateKey *) lfirst(cell);

        next = lnext(cell);
        if (need to delete)
            state->enterKeys = list_delete_cell(state->enterKeys,
                                                cell, prev);
        else
            prev = cell;
        cell = next;
    }

My previous patch would have had you replace this with a loop using an integer list-position index. You can still do that if you like, but it's less change to convert the loop to a foreach(), drop the prev/next variables, and replace the list_delete_cell call with foreach_delete_current:

    foreach(cell, state->enterKeys)
    {
        TrgmStateKey *existingKey = (TrgmStateKey *) lfirst(cell);

        if (need to delete)
            state->enterKeys = foreach_delete_current(state->enterKeys,
                                                      cell);
    }

So I think this is a win, and attached is v7.

regards, tom lane

reimplement-List-as-array-7.patch.gz Description: reimplement-List-as-array-7.patch.gz
Re: Increasing default value for effective_io_concurrency?
Hi,

On 2019-06-29 22:15:19 +0200, Tomas Vondra wrote:
> I think we should consider changing the effective_io_concurrency default
> value, i.e. the guc that determines how many pages we try to prefetch in
> a couple of places (the most important being Bitmap Heap Scan).

Maybe we need to improve the way it's used / implemented instead - it seems just too hard to determine the correct setting as currently implemented.

> In some cases it helps a bit, but a bit higher value (4 or 8) performs
> significantly better. Consider for example this "sequential" data set
> from the 6xSSD RAID system (x-axis shows e_i_c values, pct means what
> fraction of pages matches the query):

I assume that the y axis is the time of the query? How much data is this compared to memory available for the kernel to do caching?

>    pct        0        1        4       16       64      128
>    ----------------------------------------------------------
>      1    25990    18624     3269     2219     2189     2171
>      5    88116    60242    14002     8663     8560     8726
>     10   120556    99364    29856    17117    16590    17383
>     25   101080   184327    79212    47884    46846    46855
>     50   130709   309857   163614   103001    94267    94809
>     75   126516   435653   248281   156586   139500   140087
>
> compared to the e_i_c=0 case, it looks like this:
>
>    pct        1        4       16       64      128
>
>      1      72%      13%       9%       8%       8%
>      5      68%      16%      10%      10%      10%
>     10      82%      25%      14%      14%      14%
>     25     182%      78%      47%      46%      46%
>     50     237%     125%      79%      72%      73%
>     75     344%     196%     124%     110%     111%
>
> So for 1% of the table the e_i_c=1 is faster by about ~30%, but with
> e_i_c=4 (or more) it's ~10x faster. This is a fairly common pattern, not
> just on this storage system.
>
> The e_i_c=1 can perform pretty poorly, especially when the query matches
> large fraction of the table - for example in this example it's 2-3x
> slower compared to no prefetching, and higher e_i_c values limit the
> damage quite a bit.

I'm surprised the slowdown for small e_i_c values is that big - it's not obvious to me why that is. Which os / os version / filesystem / io scheduler / io scheduler settings were used?

Greetings, Andres Freund
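For readers who have not looked at the implementation: effective_io_concurrency ultimately controls how many POSIX_FADV_WILLNEED hints the bitmap heap scan keeps in flight ahead of the block it is currently reading. A stripped-down, self-contained illustration of that mechanism (not the actual executor code; the file descriptor and block-number list are stand-ins):

    #include <fcntl.h>
    #include <unistd.h>

    #define BLCKSZ 8192

    /*
     * Toy model of the mechanism being tuned: while walking a list of block
     * numbers, keep "distance" asynchronous readahead hints in flight ahead
     * of the block being read synchronously.  effective_io_concurrency feeds
     * into that distance in the real code.
     */
    static void
    scan_blocks(int fd, const long *blocks, int nblocks, int distance)
    {
        char        buf[BLCKSZ];

        /* prime the prefetch window */
        for (int j = 0; j < distance && j < nblocks; j++)
            (void) posix_fadvise(fd, blocks[j] * (off_t) BLCKSZ, BLCKSZ,
                                 POSIX_FADV_WILLNEED);

        for (int i = 0; i < nblocks; i++)
        {
            /* keep the window full: hint the block "distance" ahead of us */
            if (distance > 0 && i + distance < nblocks)
                (void) posix_fadvise(fd, blocks[i + distance] * (off_t) BLCKSZ,
                                     BLCKSZ, POSIX_FADV_WILLNEED);

            /* synchronous read of the current block */
            (void) pread(fd, buf, BLCKSZ, blocks[i] * (off_t) BLCKSZ);
        }
    }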
Re: Usage of epoch in txid_current
On Sat, Jun 22, 2019 at 11:01 AM Alexander Korotkov wrote: > 3. Change SLRU on-disk format to handle 64-bit xids and multixacts. > In particular make SLRU page numbers 64-bit too. Add SLRU upgrade > procedure to pg_upgrade. +1 for doing this for the xid-indexed SLRUs so the segment file names are never recycled. Having yet another level of wraparound code to feed and water and debug is not nice: https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com -- Thomas Munro https://enterprisedb.com
Re: Usage of epoch in txid_current
On Sun, Jun 30, 2019 at 9:07 AM Robert Haas wrote:
> On Mon, Jun 24, 2019 at 1:43 PM Alvaro Herrera
> wrote:
> > I think enlarging multixacts to 64 bits is a terrible idea. I would
> > rather get rid of multixacts completely; zheap is proposing not to use
> > multixacts at all, for example.
>
> But zedstore, at least as of the Saturday after PGCon, is proposing to
> keep using them, after first widening them to 64 bits:
>
> https://www.postgresql.org/message-id/CA+TgmoYeTeQSmALox0PmSm5Gh03oe=unjhml+k+btofy_u2...@mail.gmail.com
>
> I think we all have a visceral reaction against MultiXacts at this
> point; they've just caused us too much pain, and the SLRU
> infrastructure is a bunch of crap.[1] However, the case where N
> sessions all do "SELECT * FROM sometab FOR SHARE" is pretty wretched
> without multixacts. You'll just have to keep reducing the tuple
> density per page to fit all the locks, whereas the current heap is
> totally fine and neither the heap nor the multixact space bloats all
> that terribly much.
>
> I currently think it's likely that zheap will use something
> multixact-like rather than actually using multixacts per se. I am
> having trouble seeing how we can have some sort of fixed-bit-width ID
> number that represents as set of XIDs for purposes of lock-tracking
> without creating some nasty degenerate cases that don't exist
> today.[2]

The new thing described over here is intended to support something a bit like multixacts: https://www.postgresql.org/message-id/CA%2BhUKGKni7EEU4FT71vZCCwPeaGb2PQOeKOFjQJavKnD577UMQ%40mail.gmail.com

For example, here is some throw-away demo code that puts arbitrary data into an undo log, in this case a list of xids given with SELECT test_multixact(ARRAY[1234, 2345, 3456]), and provides a callback to say when the data can be discarded, in this case when all of those xids are no longer running. You can see the record with SELECT * FROM undoinspect('shared'), until it gets eaten by a background worker. The reason it has to be an in-core demo is just because we don't have a way to do extensions that have an RMGR ID and callback functions yet.

Although this demo throws the return value away, the function PrepareUndoInsert() returns a 64 bit UndoRecPtr which could be stored in any page and can be used to retrieve the records (via the buffer pool) including the binary payload which can be whatever struct you like (though there is a size limit so you might need more than one record for a huge list of multi-lockers). When you try to load the record, you might be told that it's gone, which means that the lockers are gone, which means that your callback must have said that was OK.

This probably makes most sense for a system that is already planning to use UndoRecPtr for other things, like MVCC, since it probably already has per page or per tuple space to store UndoRecPtr, and updates and explicit locks are so closely related.

https://github.com/EnterpriseDB/zheap/commit/c92fdfd1f1178cbb44557a7c630b366d1539c332

-- Thomas Munro https://enterprisedb.com
Re: Hash join explain is broken
Hi, On 2019-06-18 00:00:28 -0700, Andres Freund wrote: > On 2019-06-13 16:23:34 -0700, Andres Freund wrote: > > On June 13, 2019 3:38:47 PM PDT, Tom Lane wrote: > > >Andres Freund writes: > > >> I am too tired to look further into this. I suspect the only reason > > >we > > >> didn't previously run into trouble with the executor stashing > > >hashkeys > > >> manually at a different tree level with: > > >> ((HashState *) innerPlanState(hjstate))->hashkeys > > >> is that hashkeys itself isn't printed... > > > > > >TBH, I think 5f32b29c is just wrong and should be reverted for now. > > >If there's a need to handle those expressions differently, it will > > >require some cooperation from the planner not merely a two-line hack > > >in executor startup. That commit didn't include any test case or > > >other demonstration that it was solving a live problem, so I think > > >we can leave it for v13 to address the issue. > > > > I'm pretty sure you'd get an assertion failure if you reverted it > > (that's why it was added). So it's a bit more complicated than that. > > Unfortunately I'll not get back to work until Monday, but I'll spend > > time on this then. > > Indeed, there are assertion failures when initializing the expression > with HashJoinState as parent - that's because when computing the > hashvalue for nodeHash input, we expect the slot from the node below to > be of the type that HashState returns (as that's what INNER_VAR for an > expression at the HashJoin level refers to), rather than the type of the > input to HashState. We could work around that by marking the slots from > underlying nodes as being of an unknown type, but that'd slow down > execution. > > I briefly played with the dirty hack of set_deparse_planstate() > setting dpns->inner_planstate = ps for IsA(ps, HashState), but that > seems just too ugly. > > I think the most straight-forward fix might just be to just properly > split the expression at plan time. Adding workarounds for things as > dirty as building an expression for a subsidiary node in the parent, and > then modifying the subsidiary node from the parent, doesn't seem like a > better way forward. > > The attached *prototype* does so. > > If we go that way, we probably need to: > - Add a test for the failure case at hand > - check a few of the comments around inner/outer in nodeHash.c > - consider moving the setrefs.c code into its own function? > - probably clean up the naming scheme in createplan.c > > I think there's a few more things we could do, although it's not clear > that that needs to happen in v12: > - Consider not extracting hj_OuterHashKeys, hj_HashOperators, > hj_Collations out of HashJoin->hashclauses, and instead just directly > handing them individually in the planner. create_mergejoin_plan() > already partially does that. Tom, any comments? Otherwise I'll go ahead, and commit after a round or two of polishing. - Andres
Re: Hash join explain is broken
Andres Freund writes: > Tom, any comments? Otherwise I'll go ahead, and commit after a round or > two of polishing. Sorry for not getting to this sooner --- I'll try to look tomorrow. regards, tom lane
Re: Rearranging ALTER TABLE to avoid multi-operations bugs
movead li writes: > I applied the 'alter-table-with-recursive-process-utility-calls-wip.patch' > on the master(e788e849addd56007a0e75f3b5514f294a0f3bca). And > when I test the cases, I find it works well on 'alter table t1 add column > f2 int not null, alter column f2 add generated always as identity' case, > but it doesn't work on #14827, #15180, #15670, #15710. This review seems not very on-point, because I made no claim to have fixed any of those bugs. The issue at the moment is how to structure the code to allow ALTER TABLE to call other utility statements --- or, if we aren't going to do that as Robert seems not to want to, what exactly we're going to do instead. The patch at hand does fix some ALTER TABLE ... IDENTITY bugs, because fixing those doesn't require any conditional execution of utility statements. But we'll need infrastructure for such conditional execution to fix the original bugs. I don't see much point in working on that part until we have some agreement about how to handle what this patch is already doing. regards, tom lane
Re: Code comment change
On Sun, Jun 23, 2019 at 3:36 AM Thomas Munro wrote: > Pushed. Thanks! I wonder what the comment is supposed to mean. I think that it's addressing the situation prior to commit 70508ba7aed in 2003, which was the point when the "fast" root concept was introduced. Prior to that commit, there was only what we would now call a true root, and _bt_getroot() had to loop to make sure that it reliably found it without deadlocking, while dealing with concurrent splits. This was necessary because the old design also involved maintaining a pointer to each page's parent in each page, which sounds like a seriously bad approach to me. I think that the whole sentence about "the standard class of race conditions" should go. There is no more dance. Nothing in _bt_getroot() is surprising to me. The other comments explain things comprehensively. -- Peter Geoghegan
Re: Code comment change
Peter Geoghegan writes: > On Sun, Jun 23, 2019 at 3:36 AM Thomas Munro wrote: >> Pushed. Thanks! > I wonder what the comment is supposed to mean. > I think that it's addressing the situation prior to commit 70508ba7aed > in 2003, which was the point when the "fast" root concept was > introduced. Yeah. I did some research into the provenance of that comment when Thomas pushed the change. It's *old*. The whole para exists verbatim in Postgres v4r2, src/backend/access/nbtree/nbtpage.c dated 1993-12-10 (in my copy of that tarball). The only change since then has been to change the whitespace for 4-space tabs. Even more interesting, the same para also exists verbatim in v4r2's src/backend/access/nobtree/nobtpage.c, which is dated 1991-10-29 in the same tarball. (If you're wondering, "nobtree" seems to stand for "no-overwrite btree"; so I suppose it went the way of all flesh when Stonebraker lost interest in write-once mass storage.) So presumably this comment dates back to some common ancestor of the mainline btree code and the no-overwrite code, which must have been even older than the 1991 date. This is only marginally relevant to what we should do about it today, but I think it's reasonable to conclude that the current locking considerations are nearly unrelated to what they were when the comment was written. > I think that the whole sentence about "the standard class of race > conditions" should go. There is no more dance. Nothing in > _bt_getroot() is surprising to me. The other comments explain things > comprehensively. +1 regards, tom lane
Re: cleanup & refactoring on reindexdb.c
On Sat, Jun 29, 2019 at 11:24:49AM +0900, Michael Paquier wrote: > Thanks for the rebase (and the reminder..). I'll look at that once > v13 opens for business. And applied. -- Michael signature.asc Description: PGP signature
Re: Add parallelism and glibc dependent only options to reindexdb
On Mon, Jul 01, 2019 at 06:28:13PM +0200, Julien Rouhaud wrote:
> On Mon, Jul 1, 2019 at 4:10 PM Tom Lane wrote:
>> Couldn't we make this enormously simpler and less bug-prone by just
>> dictating that --jobs applies only to reindex-table operations?

I had the same argument about the first patch sets actually, but... :)

> That would also mean that we'll have to fallback on doing reindex at
> table-level, even if we only want to reindex indexes that depends on
> glibc. I'm afraid that this will often add a huge penalty.

Yes, I would expect that most of the time glibc-sensitive indexes are also mixed with other ones which we don't care about here. One advantage of Tom's argument though is that it makes it possible to introduce --jobs in minimal steps:
1) Refactor the code for connection slots, without the cell addition.
2) Introduce --jobs without INDEX support.
In short, the conflict business between indexes is something which could be tackled afterwards in a separate patch. Parallel reindexing at table level has value in itself, particularly with CONCURRENTLY coming into the picture.
--
Michael
signature.asc Description: PGP signature
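For what it's worth, the connection-slot refactoring of step 1) boils down to a loop like the following. This is only a rough sketch: the libpq calls are real, but get_idle_slot() is a made-up name and connection setup and error handling are omitted:

    #include <sys/select.h>
    #include "libpq-fe.h"

    /*
     * Return a connection that is ready to accept a new command, draining
     * any results from its previous command; sleep in select() when all
     * slots are busy.
     */
    static PGconn *
    get_idle_slot(PGconn **slots, int nslots)
    {
        for (;;)
        {
            fd_set      input_mask;
            int         maxfd = -1;

            FD_ZERO(&input_mask);
            for (int i = 0; i < nslots; i++)
            {
                PGresult   *res;

                (void) PQconsumeInput(slots[i]);
                if (PQisBusy(slots[i]))
                {
                    int         sock = PQsocket(slots[i]);

                    FD_SET(sock, &input_mask);
                    if (sock > maxfd)
                        maxfd = sock;
                    continue;
                }

                /* idle slot: drain leftover results and hand it back */
                while ((res = PQgetResult(slots[i])) != NULL)
                    PQclear(res);
                return slots[i];
            }

            /* everyone is busy: wait for some socket to become readable */
            (void) select(maxfd + 1, &input_mask, NULL, NULL, NULL);
        }
    }

The dispatch side then just calls PQsendQuery(get_idle_slot(slots, nslots), sql) for each REINDEX statement it wants to issue.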
Re: Add parallelism and glibc dependent only options to reindexdb
On Mon, Jul 01, 2019 at 06:14:20PM +0200, Julien Rouhaud wrote:
> On Mon, Jul 1, 2019 at 3:51 PM Alvaro Herrera
> wrote:
> >
> > Please don't reuse a file name as generic as "parallel.c" -- it's
> > annoying when navigating source. Maybe conn_parallel.c multiconn.c
> > connscripts.c admconnection.c ...?
>
> I could use scripts_parallel.[ch] as I've already used it in the
> #define part?

multiconn.c sounds rather good, but I have a poor ear for any kind of naming..

>> If your server crashes or is stopped midway during the reindex, you
>> would have to start again from scratch, and it's tedious (if it's
>> possible at all) to determine which indexes were missed. I think it
>> would be useful to have a two-phase mode: in the initial phase reindexdb
>> computes the list of indexes to be reindexed and saves them into a work
>> table somewhere. In the second phase, it reads indexes from that table
>> and processes them, marking them as done in the work table. If the
>> second phase crashes or is stopped, it can be restarted and consults the
>> work table. I would keep the work table, as it provides a bit of an
>> audit trail. It may be important to be able to run even if unable to
>> create such a work table (because of the numerous users that
>> DROP DATABASE postgres).
>
> Or we could create a table locally in each database, that would fix
> this problem and probably make the code simpler?
>
> It also raises some additional concerns about data expiration. I
> guess that someone could launch the tool by mistake, kill reindexdb,
> and run it again 2 months later while a lot of new objects have been
> added for instance.

These look like fancy additions, still that's not the core of the problem, no? If you begin to play in this area you would need more control options, basically a "continue" mode to be able to restart a previously failed attempt, a "reinit" mode able to restart the operation completely from scratch, and perhaps even a "reset" mode which cleans up any data already present. Not really a complexity, but this has to be maintained at database level.

>> The --glibc-dependent
>> switch seems too ad-hoc. Maybe "--exclude-rule=glibc"? That way we can
>> add other rules later. (Not "--exclude=foo" because we'll want to add
>> the possibility to ignore specific indexes by name.)
>
> That's a good point, I like the --exclude-rule switch.

Sounds kind of nice.
--
Michael
signature.asc Description: PGP signature
Re: Code comment change
On Mon, Jul 1, 2019 at 7:28 PM Tom Lane wrote: > Even more interesting, the same para also exists verbatim in > v4r2's src/backend/access/nobtree/nobtpage.c, which is dated 1991-10-29 > in the same tarball. (If you're wondering, "nobtree" seems to stand > for "no-overwrite btree"; so I suppose it went the way of all flesh > when Stonebraker lost interest in write-once mass storage.) So presumably > this comment dates back to some common ancestor of the mainline btree code > and the no-overwrite code, which must have been even older than the 1991 > date. "no-overwrite btree" is described here, if you're interested: https://pdfs.semanticscholar.org/a0de/438d5efd96e8af51bc7595aa1c30d0497a57.pdf This is a link to the B-Tree focused paper "An Index Implementation Supporting Fast Recovery for the POSTGRES Storage System". I found that the paper provided me with some interesting historic context. I am pretty sure that the authors were involved in early work on the Postgres B-Tree code. It references Lanin and Shasha, even though the nbtree code that is influenced by L&S first appears in the same 2003 commit of yours that I mentioned. > > I think that the whole sentence about "the standard class of race > > conditions" should go. There is no more dance. Nothing in > > _bt_getroot() is surprising to me. The other comments explain things > > comprehensively. > > +1 I'll take care of it soon. -- Peter Geoghegan
Re: Add missing operator <->(box, point)
[ warning, drive-by comment ahead ]

Fabien COELHO writes:
> I notice that other distance tests do not test for commutativity. Are they
> also not implemented, or just not tested? If not implemented, I'd suggest
> to add them in the same batch.

Yeah ... just looking at operators named <->, I see

regression=# select oid, oid::regoperator, oprcom, oprcode from pg_operator where oprname = '<->';

  oid  |         oid          | oprcom |          oprcode
-------+----------------------+--------+---------------------------
   517 | <->(point,point)     |    517 | point_distance
   613 | <->(point,line)      |      0 | dist_pl
   614 | <->(point,lseg)      |      0 | dist_ps
   615 | <->(point,box)       |      0 | dist_pb
   616 | <->(lseg,line)       |      0 | dist_sl
   617 | <->(lseg,box)        |      0 | dist_sb
   618 | <->(point,path)      |      0 | dist_ppath
   706 | <->(box,box)         |    706 | box_distance
   707 | <->(path,path)       |    707 | path_distance
   708 | <->(line,line)       |    708 | line_distance
   709 | <->(lseg,lseg)       |    709 | lseg_distance
   712 | <->(polygon,polygon) |    712 | poly_distance
  1520 | <->(circle,circle)   |   1520 | circle_distance
  1522 | <->(point,circle)    |   3291 | dist_pc
  3291 | <->(circle,point)    |   1522 | dist_cpoint
  3276 | <->(point,polygon)   |   3289 | dist_ppoly
  3289 | <->(polygon,point)   |   3276 | dist_polyp
  1523 | <->(circle,polygon)  |      0 | dist_cpoly
  1524 | <->(line,box)        |      0 | dist_lb
  5005 | <->(tsquery,tsquery) |      0 | pg_catalog.tsquery_phrase
(20 rows)

It's not clear to me why to be particularly more excited about <->(box, point) than about the other missing cases here.

regards, tom lane
Re: MSVC Build support with visual studio 2019
On Mon, Jul 01, 2019 at 07:56:29PM +1000, Haribabu Kommi wrote: > We stopped the support of building with all the visual studio versions less > than 2013. I updated the SDK versions accordingly. I have spent some time looking around, and wikipedia-sensei has proved to be helpful to grasp the release references: https://en.wikipedia.org/wiki/Microsoft_Windows_SDK So the suggestions from the patch are fine. This one was actually forgotten: src/tools/msvc/README:from www.microsoft.com (v6.0 or greater). > The similar change is required for the CreateProject also. I have changed both messages so as the version of VS attempted to be used is reported in the error message directly. > + # The major visual studio that is supported has nmake > + # version >= 14.30, so stick with it as the latest version > > The major visual studio version that is supported has nmake version > <=14.30 Damn. Thanks for pointing out that. > Except for the above two changes, overall the patch is in good shape. OK, committed to HEAD for now after perltidy'ing the patch. Let's see what the buildfarm has to say about it first. Once we are sure that the thing is stable, I'll try to backpatch it. This works on my own dev machines with VS 2015 and 2019, but who knows what hides in the shadows... -- Michael signature.asc Description: PGP signature
Re: cleanup & refactoring on reindexdb.c
On Tue, Jul 2, 2019 at 4:44 AM Michael Paquier wrote: > > On Sat, Jun 29, 2019 at 11:24:49AM +0900, Michael Paquier wrote: > > Thanks for the rebase (and the reminder..). I'll look at that once > > v13 opens for business. > > And applied. Thanks!
Re: Refactoring base64 encoding and decoding into a safer interface
On Mon, Jul 01, 2019 at 11:11:43PM +0200, Daniel Gustafsson wrote: > I very much agree that functions operating on a buffer like this should have > the size of the buffer in order to safeguard against overflow, so +1 on the > general concept. Thanks for the review! > A few small comments: > > In src/common/scram-common.c there are a few instances like this. Shouldn’t > we > also free the result buffer in these cases? > > +#ifdef FRONTEND > + return NULL; > +#else Fixed. > In the below passage, we leave the input buffer with a non-complete > encoded string. Should we memset the buffer to zero to avoid the > risk that code which fails to check the return value believes it has > an encoded string? Hmm. Good point. I have not thought of that, and your suggestion makes sense. Another question is if we'd want to actually use explicit_bzero() here, but that could be a discussion on this other thread, except if the patch discussed there is merged first: https://www.postgresql.org/message-id/42d26bde-5d5b-c90d-87ae-6cab875f7...@2ndquadrant.com Attached is an updated patch. -- Michael diff --git a/src/backend/libpq/auth-scram.c b/src/backend/libpq/auth-scram.c index 6b60abe1dd..3a31afc7b7 100644 --- a/src/backend/libpq/auth-scram.c +++ b/src/backend/libpq/auth-scram.c @@ -510,9 +510,11 @@ scram_verify_plain_password(const char *username, const char *password, return false; } - salt = palloc(pg_b64_dec_len(strlen(encoded_salt))); - saltlen = pg_b64_decode(encoded_salt, strlen(encoded_salt), salt); - if (saltlen == -1) + saltlen = pg_b64_dec_len(strlen(encoded_salt)); + salt = palloc(saltlen); + saltlen = pg_b64_decode(encoded_salt, strlen(encoded_salt), salt, + saltlen); + if (saltlen < 0) { ereport(LOG, (errmsg("invalid SCRAM verifier for user \"%s\"", username))); @@ -596,9 +598,10 @@ parse_scram_verifier(const char *verifier, int *iterations, char **salt, * Verify that the salt is in Base64-encoded format, by decoding it, * although we return the encoded version to the caller. */ - decoded_salt_buf = palloc(pg_b64_dec_len(strlen(salt_str))); + decoded_len = pg_b64_dec_len(strlen(salt_str)); + decoded_salt_buf = palloc(decoded_len); decoded_len = pg_b64_decode(salt_str, strlen(salt_str), -decoded_salt_buf); +decoded_salt_buf, decoded_len); if (decoded_len < 0) goto invalid_verifier; *salt = pstrdup(salt_str); @@ -606,16 +609,18 @@ parse_scram_verifier(const char *verifier, int *iterations, char **salt, /* * Decode StoredKey and ServerKey. 
*/ - decoded_stored_buf = palloc(pg_b64_dec_len(strlen(storedkey_str))); + decoded_len = pg_b64_dec_len(strlen(storedkey_str)); + decoded_stored_buf = palloc(decoded_len); decoded_len = pg_b64_decode(storedkey_str, strlen(storedkey_str), -decoded_stored_buf); +decoded_stored_buf, decoded_len); if (decoded_len != SCRAM_KEY_LEN) goto invalid_verifier; memcpy(stored_key, decoded_stored_buf, SCRAM_KEY_LEN); - decoded_server_buf = palloc(pg_b64_dec_len(strlen(serverkey_str))); + decoded_len = pg_b64_dec_len(strlen(serverkey_str)); + decoded_server_buf = palloc(decoded_len); decoded_len = pg_b64_decode(serverkey_str, strlen(serverkey_str), -decoded_server_buf); +decoded_server_buf, decoded_len); if (decoded_len != SCRAM_KEY_LEN) goto invalid_verifier; memcpy(server_key, decoded_server_buf, SCRAM_KEY_LEN); @@ -649,8 +654,18 @@ mock_scram_verifier(const char *username, int *iterations, char **salt, /* Generate deterministic salt */ raw_salt = scram_mock_salt(username); - encoded_salt = (char *) palloc(pg_b64_enc_len(SCRAM_DEFAULT_SALT_LEN) + 1); - encoded_len = pg_b64_encode(raw_salt, SCRAM_DEFAULT_SALT_LEN, encoded_salt); + encoded_len = pg_b64_enc_len(SCRAM_DEFAULT_SALT_LEN); + /* don't forget the zero-terminator */ + encoded_salt = (char *) palloc(encoded_len + 1); + encoded_len = pg_b64_encode(raw_salt, SCRAM_DEFAULT_SALT_LEN, encoded_salt, +encoded_len); + + /* + * Note that we cannot reveal any information to an attacker here so the + * error message needs to remain generic. + */ + if (encoded_len < 0) + elog(ERROR, "could not encode salt"); encoded_salt[encoded_len] = '\0'; *salt = encoded_salt; @@ -1144,8 +1159,15 @@ build_server_first_message(scram_state *state) (errcode(ERRCODE_INTERNAL_ERROR), errmsg("could not generate random nonce"))); - state->server_nonce = palloc(pg_b64_enc_len(SCRAM_RAW_NONCE_LEN) + 1); - encoded_len = pg_b64_encode(raw_nonce, SCRAM_RAW_NONCE_LEN, state->server_nonce); + encoded_len = pg_b64_enc_len(SCRAM_RAW_NONCE_LEN); + /* don't forget the zero-terminator */ + state->server_nonce = palloc(encoded_len + 1); + encoded_len = pg_b64_encode(raw_nonce, SCRAM_RAW_NONCE_LEN, +state->server_nonce, encoded_len); + if (encoded_len < 0) + ereport(ERROR, +(errcode(ERRCODE_INTERNAL_ERROR), + errmsg("could not encode random nonce"))); state->server_nonce[encoded_len] =
Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)
Rebased, aligned with recent changes in pg_rewind/pg_basebackup and then retested. Thanks.

On Mon, Jul 1, 2019 at 7:35 PM Thomas Munro wrote:
> On Fri, Apr 19, 2019 at 3:41 PM Paul Guo wrote:
> > I updated the patches as attached following previous discussions.
>
> Hi Paul,
>
> Could we please have a fresh rebase now that the CF is here?
>
> Thanks,
>
> --
> Thomas Munro
> https://enterprisedb.com

v3-0001-Extact-common-functions-from-pg_basebackup-into-s.patch Description: Binary data
v3-0002-Add-option-to-write-recovery-configuration-inform.patch Description: Binary data
v3-0003-Ensure-target-clean-shutdown-at-the-beginning-of-.patch Description: Binary data
Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)
On Tue, Jul 2, 2019 at 12:35 AM Alvaro Herrera wrote:
> On 2019-Apr-19, Paul Guo wrote:
>
> > The below patch runs single mode Postgres if needed to make sure the target
> > is cleanly shutdown. A new option is added (off by default).
> > v2-0001-Ensure-target-clean-shutdown-at-beginning-of-pg_r.patch
>
> Why do we need an option for this? Is there a reason not to do this
> unconditionally?

There is concern about this (see previous emails in this thread). On Greenplum (an MPP DB based on Postgres), we unconditionally do this. I'm not sure how Postgres users usually handle an unclean shutdown, but providing an option seems safer, to avoid breaking any existing scripts or services. If many people think this option is unnecessary, I'm fine with removing the option and keeping the code logic.
Re: C testing for Postgres
On Fri, Jun 28, 2019 at 09:42:54AM -0400, Adam Berlin wrote:
> If we were to use this tool, would the community want to vendor the
> framework in the Postgres repository, or keep it in a separate repository
> that produces a versioned shared library?

Well, my take is that having a base infrastructure for a fault injection framework is something that would prove to be helpful, and I am not against having something in core. While working on various issues, I have many times found myself doing crazy stat() calls on an on-disk file to force an elog(ERROR) or elog(FATAL), and in my experience fault points are very *hard* to place correctly because they should not be single-purpose things. Now, we don't want to end up with an infinity of fault points in the tree, but being able to force a failure at a point added for a patch, using a SQL command, can make the integration of tests in a patch easier for reviewers, for example isolation tests with elog(ERROR) (like what has been discussed for b4721f3).
--
Michael
signature.asc Description: PGP signature
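To be a bit more concrete, the base primitive needs to be hardly more than the following. This is an entirely hypothetical sketch, names included; a real version would likely want an elevel argument, per-backend filtering, and proper GUC registration:

    /*
     * Hypothetical minimal fault-injection point: a string GUC names the
     * currently-armed fault point, and code paths of interest check for it.
     * A test would arm it with something like
     *     SET fault_inject_point = 'checkpoint-before-sync';
     */
    char       *fault_inject_point = NULL;  /* string GUC, NULL/empty = disabled */

    #ifdef USE_FAULT_INJECTION
    #define FAULT_INJECT(name) \
        do { \
            if (fault_inject_point != NULL && \
                strcmp(fault_inject_point, (name)) == 0) \
                elog(ERROR, "fault injected at \"%s\"", (name)); \
        } while (0)
    #else
    #define FAULT_INJECT(name) ((void) 0)
    #endif

    /* ... and in a code path a test wants to break: */
        FAULT_INJECT("checkpoint-before-sync");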