Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)
Hello Justin,

Patch series applies cleanly. The last status compiles and passes "make check". A few more comments:

* v8.[123] ok.

* v8.4

Avoid using the type name as a field name? "enum dir_action dir_action;" -> "enum dir_action action", or maybe rename the "dir_action" enum to "dir_action_t".

About pg_ls_dir: "if (!fctx->dirdesc)" I do not think that is true even if AllocateDir failed; the list exists anyway. ISTM it should be linitial, which is NULL in that case.

Given the overlap between pg_ls_dir and pg_ls_dir_files, ISTM that the former should call the slightly extended latter with appropriate flags, as sketched below.

About populate_paths: the function is a little bit strange to me; ISTM it would deserve more comments. I'm not sure the name reflects what it does. For instance, ISTM that it does one thing, but the name is plural. Maybe "move_to_next_path" or "update_current_path" or something? It returns an int which can only be 0 or 1, which smells like a bool. What this int/bool means is not told in the function head comment. I guess it is whether the path was updated; when it returns false, the list length is down to one.

Shouldn't AllocateDir be tested for a bad result? Maybe it is a dir but you do not have perms to open it? Or give a comment about why it cannot happen?

later, good, at least the function is called, even if it is only for an error case. Maybe some non-empty coverage tests could be added with a "count(*) > 0" on not is_dir, or maybe "count(*) = 0" on is_dir, for instance?

  (SELECT oid FROM pg_tablespace b WHERE b.spcname='regress_tblspace' UNION SELECT 0 ORDER BY 1 DESC LIMIT 1)b

The 'b' glued to the ')' looks pretty strange. I'd suggest ") AS b". Reusing the same alias twice could be avoided for clarity, maybe.

* v8.[56]

I'd say that a behavior change which adds a column and possibly a few rows is ok, especially as the tmpdir contains subdirs now.

--
Fabien.
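[To make the suggested delegation concrete, here is a minimal sketch; the extended pg_ls_dir_files signature and the LS_DIR_MISSING_OK flag name are hypothetical illustrations, not taken from the patch:]

    Datum
    pg_ls_dir(PG_FUNCTION_ARGS)
    {
        char   *dir = text_to_cstring(PG_GETARG_TEXT_PP(0));

        /* LS_DIR_MISSING_OK is a made-up flag for the extended helper */
        return pg_ls_dir_files(fcinfo, dir, LS_DIR_MISSING_OK);
    }

The point being that all the directory-walking logic would then live in one place, and the thin wrappers would differ only in the flags they pass.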
Re: Psql patch to show access methods info
On Fri, Mar 6, 2020 at 11:46 AM Alexander Korotkov wrote:
> On Fri, Mar 6, 2020 at 7:10 AM vignesh C wrote:
> > I feel your explanation sounds fair to me.
>
> Thanks.
>
> I've also revised tab-completion code. I'm going to push this if no
> objections.

So, pushed!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Allow to_date() and to_timestamp() to accept localized names
On Sun, Mar 8, 2020 at 3:48 AM Tom Lane wrote:
> James Coleman writes:
> > On Sat, Mar 7, 2020 at 9:31 PM Tom Lane wrote:
> >> Looks like you may not have Turkish locale installed? Try
> >> locale -a | grep tr_TR
>
> > Hmm, when I grep the locales I see `tr_TR.utf8` in the output. I assume the
> > utf8 version is acceptable? Or is there a non-utf8 variant?
>
> Hmm ... I'm far from an expert on the packaging of locale data, but
> the simplest explanation I can think of is that the tr_TR locale exists
> to some extent on your machine but the LC_TIME component of that is
> missing.

AFAICS, the locale 'tr_TR' uses the encoding ISO-8859-9 (LATIN5), which is not the same as 'tr_TR.utf8'.

> BTW, what platform are you using anyway?

I have just checked this on a Debian Stretch box.

Regards,

Juan José Santamaría Flecha
RE: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()
> Attached is v5, which inlines in a targeted fashion, pretty much in the same
> way as the earliest version. This is the same as v4 in every other way.
> Perhaps you can test this.

Thank you for the new patch. With the new one I am indeed able to reproduce a performance increase. It is very difficult to reliably reproduce it when running on a large number of cores though, due to the NUMA architecture.

For tests with a small number of cores, I pin all of them to the same node. With that, I see a significant performance increase for v5 compared to master. However, if I pin pgbench to a different node than the node that Postgres is pinned to, this leads to a 20% performance degradation compared to having all of them on the same node, as well as the stddev increasing by a factor of 2 (regardless of patch). With that, it becomes very difficult to see any kind of performance increase due to the patch.

For a large number of pgbench workers, I cannot specifically pin the pgbench worker on the same node as the Postgres backend connection it's handling. Leaving it to the OS gives very unreliable results - when I run the 30 workers / 30 connections test, I sometimes see periods of up to 30 minutes where it's doing it 'correctly', but it could also randomly run at the -20% performance for a long time.

So far my best bet at explaining this is the NUMA performance hit. I'd like to be able to specifically schedule some Postgres backends to run on one node, while other Postgres backends run on a different node, but this isn't straightforward. For now, I see no issues with the patch though. However, in real life situations there may be other, more important, optimizations for people that use big multi-node machines.

Thoughts?

-Floris
Re: Allow to_date() and to_timestamp() to accept localized names
On Sat, Mar 7, 2020 at 9:48 PM Tom Lane wrote:
> James Coleman writes:
> > On Sat, Mar 7, 2020 at 9:31 PM Tom Lane wrote:
> >> Looks like you may not have Turkish locale installed? Try
> >> locale -a | grep tr_TR
>
> > Hmm, when I grep the locales I see `tr_TR.utf8` in the output. I assume the
> > utf8 version is acceptable? Or is there a non-utf8 variant?
>
> Hmm ... I'm far from an expert on the packaging of locale data, but
> the simplest explanation I can think of is that the tr_TR locale exists
> to some extent on your machine but the LC_TIME component of that is
> missing.
>
> Do you get different results from "date" depending on the locale?
> I get
>
> $ LANG=C date
> Sat Mar  7 21:44:24 EST 2020
> $ LANG=tr_TR.utf8 date
> Cts Mar  7 21:44:26 EST 2020

$ LANG=C date
Sun Mar  8 09:27:50 EDT 2020
$ LANG=tr_TR.utf8 date
Paz Mar  8 09:27:54 EDT 2020
$ LANG=tr_TR date
Sun Mar  8 09:27:57 EDT 2020

I'm curious if you get something different for that last one (no utf8 qualifier).

> on my Fedora 30 box.
>
> Another possibility perhaps is that you have partial locale settings
> in your environment that are bollixing the test. Try
>
> $ env | grep ^LANG

This gives me:
LANG=en_US.UTF-8
LANGUAGE=en_US

> $ env | grep ^LC_

Nothing for this.

> If there's more than one relevant environment setting, and they
> don't all agree, I'm not sure what would happen with our
> regression tests.

There's LANG and LANGUAGE, but they look like they effectively agree to me?

> BTW, what platform are you using anyway?

ElementaryOS 5.1 Hera, which is built on top of Ubuntu 18.04.3 LTS (and Linux 4.15.0-72-generic).

This suggests I have the standard Ubuntu Turkish language packages installed:

$ dpkg -l | grep language-pack-tr
ii  language-pack-tr       1:18.04+20180712   all   translation updates for language Turkish
ii  language-pack-tr-base  1:18.04+20180712   all   translations for language Turkish

James
Re: Index Skip Scan
> On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
>
> I've been looking over v32 of the patch and have a few comments
> regarding the planner changes.

Thanks for the commentaries!

> I think the changes in create_distinct_paths() need more work. The
> way I think this should work is that create_distinct_paths() gets to
> know exactly nothing about what path types support the elimination of
> duplicate values. The Path should carry the UniqueKeys so that can be
> determined. In create_distinct_paths() you should just be able to make
> use of those paths, which should already have been created when
> creating index paths for the rel due to PlannerInfo's query_uniquekeys
> having been set.

Just for me to clarify. The idea is to "move" information about what path types support skipping into UniqueKeys (derived from PlannerInfo's query_uniquekeys), but other checks (e.g. whether the index am supports that) are still performed in create_distinct_paths?

> Also, should create_grouping_paths() be getting the same code?
> Jesper's UniqueKey patch seems to set query_uniquekeys when there's a
> GROUP BY with no aggregates. So it looks like he has intended that
> something like:
>
> SELECT x FROM t GROUP BY x;
>
> should work the same way as
>
> SELECT DISTINCT x FROM t;
>
> but the 0002 patch does not make this work. Has that just been overlooked?

I believe it wasn't overlooked in the 0002 patch, but rather added just in case in 0001. I guess there are no theoretical problems in implementing it, but since we wanted to keep the scope of the patch under control and concentrate on the existing functionality, it probably makes sense to plan it as one of the next steps?

> There's also some weird looking assumptions that an EquivalenceMember
> can only be a Var in create_distinct_paths(). I think you're only
> saved from crashing there because a ProjectionPath will be created
> atop of the IndexPath to evaluate expressions, in which case you're
> not seeing the IndexPath. This results in the optimisation not
> working in cases like:
>
> postgres=# create table t (a int); create index on t ((a+1)); explain
> select distinct a+1 from t;
> CREATE TABLE
> CREATE INDEX
>                         QUERY PLAN
> ---------------------------------------------------------
>  HashAggregate  (cost=48.25..50.75 rows=200 width=4)
>    Group Key: (a + 1)
>    ->  Seq Scan on t  (cost=0.00..41.88 rows=2550 width=4)

Yes, I need to fix it.

> Using unique paths as I mentioned above should see that fixed.

I'm a bit confused about this statement, how exactly should unique paths fix this?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Fri, Jun 28, 2019 at 11:49:53AM -0700, Peter Geoghegan wrote:
> On Tue, Mar 19, 2019 at 12:38 PM legrand legrand wrote:
> > Would it make sense to add it in auto explain ?
> > I don't know for explain itself, but maybe ...
>
> I think that it should appear in EXPLAIN. pg_stat_statements already
> cannot have a query hash of zero, so it might be okay to display it
> only when its value is non-zero.

I had forgotten about this. After looking at it, I can see a few issues.

For now, post_parse_analyze_hook isn't called for the underlying statement, so we don't have the queryid. And we can't compute the queryid for the underlying query in the initial post_parse_analyze_hook call, as we don't want the executor to have a queryid set in that case, to avoid accumulating counters for both the explain and the query. We could add an extra call in ExplainQuery, but this will be ignored by pg_stat_statements unless you set pg_stat_statements.track to all.

Also, pgss_post_parse_analyze will try to record an entry with the normalized query text if none exists yet and if any constants were removed. The problem is that, as I already mentioned in [1], the underlying query doesn't have query_location or query_len valued, so the recorded query text will at least contain the explain part of the input query.

[1] https://www.postgresql.org/message-id/CAOBaU_Y-y%2BVOhTZgDOuDk6-9V72-ZXdWccXo_kx0P4DDBEEh9A%40mail.gmail.com
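[For context, the hook being discussed has this shape in this era; a minimal sketch, where my_compute_queryid and the prev_* variable are placeholders rather than pg_stat_statements code:]

    static void
    my_post_parse_analyze(ParseState *pstate, Query *query)
    {
        if (prev_post_parse_analyze_hook)
            prev_post_parse_analyze_hook(pstate, query);

        /*
         * For EXPLAIN, the Query seen here is the utility wrapper around
         * the ExplainStmt; the inner statement never passes through this
         * hook on its own, which is why no queryid is available for it
         * later.
         */
        if (query->utilityStmt == NULL)
            query->queryId = my_compute_queryid(pstate->p_sourcetext, query);
    }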
Re: Allow to_date() and to_timestamp() to accept localized names
On Sun, Mar 8, 2020 at 7:17 AM Juan José Santamaría Flecha <juanjo.santama...@gmail.com> wrote:
> On Sun, Mar 8, 2020 at 3:48 AM Tom Lane wrote:
>> James Coleman writes:
>> > On Sat, Mar 7, 2020 at 9:31 PM Tom Lane wrote:
>> >> Looks like you may not have Turkish locale installed? Try
>> >> locale -a | grep tr_TR
>>
>> > Hmm, when I grep the locales I see `tr_TR.utf8` in the output. I assume the
>> > utf8 version is acceptable? Or is there a non-utf8 variant?
>>
>> Hmm ... I'm far from an expert on the packaging of locale data, but
>> the simplest explanation I can think of is that the tr_TR locale exists
>> to some extent on your machine but the LC_TIME component of that is
>> missing.
>
> AFAICS, the locale 'tr_TR' uses the encoding ISO-8859-9 (LATIN5), which is
> not the same as 'tr_TR.utf8'.

The test name implies it's about utf8, though, which makes me wonder if the test should be testing utf8 instead?

That being said, a bit more googling based on your note about the proper ISO encoding turned up this page: https://unix.stackexchange.com/a/446762

And I confirmed that the locale you mentioned is available:

$ grep "tr_TR" /usr/share/i18n/SUPPORTED
tr_TR.UTF-8 UTF-8
tr_TR ISO-8859-9

So I tried:

echo tr_TR.ISO-8859-9 >> /var/lib/locales/supported.d/local # In a root session
sudo dpkg-reconfigure locales

That didn't seem to fix it, though; `locale -a` still only lists tr_TR.utf8, so I'm still at a loss, and also unclear why a test named utf8 is actually relying on an ISO encoding.

James
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Saturday, March 7, 2020, Dilip Kumar wrote:
> On Sat, Mar 7, 2020 at 9:59 AM Dilip Kumar wrote:
> > On Sat, Mar 7, 2020 at 12:30 AM Andres Freund wrote:
> > > Hi,
> > >
> > > On 2020-01-08 18:06:52 +0530, Dilip Kumar wrote:
> > > > On Wed, 8 Jan 2020 at 5:28 PM, Heikki Linnakangas wrote:
> > > > > On 25/11/2019 05:52, Dilip Kumar wrote:
> > > > > > In logical decoding, while sending the changes to the output plugin we
> > > > > > need to arrange them in the LSN order. But, if there is only one
> > > > > > transaction, which is a very common case, then we can avoid building the
> > > > > > binary heap. A small patch is attached for the same.
> > > > >
> > > > > Does this make any measurable performance difference? Building a
> > > > > one-element binary heap seems pretty cheap.
> > > >
> > > > I haven't really measured the performance for this. I will try to do that
> > > > next week. Thanks for looking into this.
> > >
> > > Did you do that?
> >
> > I tried once in my local machine but could not produce consistent
> > results. I will try this once again in the performance machine and
> > report back.
>
> I have tried to decode changes for 100,000 small transactions and
> measured the time with head vs patch. I did not observe any
> significant gain.
>
> Head
> ----
> 519ms
> 500ms
> 487ms
> 501ms
>
> Patch
> -----
> 501ms
> 492ms
> 486ms
> 489ms
>
> IMHO, if we conclude that because there is no performance gain we
> don't want to add one extra path in the code, then we might want to
> remove that TODO from the code so that we don't spend time
> optimizing this in the future.

Would you be able to share your test setup? It seems like it'd be helpful to get a larger sample size to better determine if it's measurable or not. Visually those 4 runs look to me like it's possible, but objectively I'm not sure we can yet conclude one way or the other.

James
Re: Allow to_date() and to_timestamp() to accept localized names
On Sun, Mar 8, 2020 at 10:42 AM James Coleman wrote:
> On Sun, Mar 8, 2020 at 7:17 AM Juan José Santamaría Flecha <juanjo.santama...@gmail.com> wrote:
>> On Sun, Mar 8, 2020 at 3:48 AM Tom Lane wrote:
>>> James Coleman writes:
>>> > On Sat, Mar 7, 2020 at 9:31 PM Tom Lane wrote:
>>> >> Looks like you may not have Turkish locale installed? Try
>>> >> locale -a | grep tr_TR
>>>
>>> > Hmm, when I grep the locales I see `tr_TR.utf8` in the output. I assume the
>>> > utf8 version is acceptable? Or is there a non-utf8 variant?
>>>
>>> Hmm ... I'm far from an expert on the packaging of locale data, but
>>> the simplest explanation I can think of is that the tr_TR locale exists
>>> to some extent on your machine but the LC_TIME component of that is
>>> missing.
>>
>> AFAICS, the locale 'tr_TR' uses the encoding ISO-8859-9 (LATIN5), which is
>> not the same as 'tr_TR.utf8'.
>
> The test name implies it's about utf8, though, which makes me wonder if
> the test should be testing utf8 instead?
>
> That being said, a bit more googling based on your note about the proper
> ISO encoding turned up this page: https://unix.stackexchange.com/a/446762
>
> And I confirmed that the locale you mentioned is available:
> $ grep "tr_TR" /usr/share/i18n/SUPPORTED
> tr_TR.UTF-8 UTF-8
> tr_TR ISO-8859-9
>
> So I tried:
> echo tr_TR.ISO-8859-9 >> /var/lib/locales/supported.d/local # In a root session
> sudo dpkg-reconfigure locales
>
> That didn't seem to fix it, though; `locale -a` still only lists
> tr_TR.utf8, so I'm still at a loss, and also unclear why a test named utf8
> is actually relying on an ISO encoding.

Another update:

Since sudo dpkg-reconfigure locales opens up an ncurses gui on my machine, I tried selecting the tr_TR.ISO-8859-9 option there and removed the /var/lib/locales/supported.d/local file. Now I get:

$ locale -a | grep tr_TR
tr_TR
tr_TR.iso88599
tr_TR.utf8

And now `make check` passes.

I'm still interested in understanding why we're using the ISO locale instead of the utf8 one in a utf8-labeled test though.

Thanks,
James
Re: Allow to_date() and to_timestamp() to accept localized names
James Coleman writes:
> Since sudo dpkg-reconfigure locales opens up an ncurses gui on my machine,
> I tried selecting the tr_TR.ISO-8859-9 option there and removed the
> /var/lib/locales/supported.d/local file. Now I get:
> $ locale -a | grep tr_TR
> tr_TR
> tr_TR.iso88599
> tr_TR.utf8
> And now `make check` passes.

Hm.

> I'm still interested in understanding why we're using the ISO locale
> instead of the utf8 one in a utf8-labeled test though.

We are not. My understanding of the rules about this is that the active LC_CTYPE setting determines the encoding that libc uses, period. The encoding suffix on the locale name only makes a difference when LC_CTYPE is being specified (or derived from LANG or LC_ALL), not any other LC_XXX setting --- although for consistency they'll let you include it in any LC_XXX value.

We could of course choose to spell LC_TIME as 'tr_TR.utf8' in this test, but that would open up the question of whether that causes problems on platforms where the canonical spelling of the encoding suffix is different (eg "UTF-8"). I'm disinclined to take that risk without positive proof that it helps on some platform.

My guess about what was actually happening on your machine is that the underlying locale data only exists in iso-8859-9 form, and that glibc normally translates that to utf8 on-the-fly when it's demanded ... but if the data isn't installed at all, nothing happens. On my Red Hat platforms, this installation choice seems pretty all-or-nothing, but it sounds like Debian lets you chop it up more finely (and maybe slice off your finger while at it :-().

regards, tom lane
Re: range_agg
Isaac Morland writes:
>> On Sat, Mar 7, 2020 at 10:20 PM Tom Lane wrote:
>>> Actually ... have you given any thought to just deciding that ranges and
>>> multiranges are the same type? That is, any range can now potentially
>>> contain multiple segments?

> Definitely agreed that range and multirange (or whatever it's called)
> should be different. In the work I do I have a number of uses for ranges,
> but not (yet) for multiranges. I want to be able to declare a column as
> range and be sure that it is just a single range, and then call lower() and
> upper() on it and be sure to get just one value in each case; and if I
> accidentally try to take the union of ranges where the union isn't another
> range, I want to get an error rather than calculate some weird (in my
> context) multirange.

I do not find that argument convincing at all. Surely you could put that constraint on your column using "CHECK (numranges(VALUE) <= 1)" or some such notation.

Also, you're attacking a straw man with respect to lower() and upper(); I did not suggest changing them to return arrays, but rather interpreting them as returning the lowest or highest endpoint, which I think would be transparent in most cases. (There would obviously need to be some other functions that could dissect a multirange more completely.)

The real problem with the proposal as it stands, I think, is exactly that range union has failure conditions and you have to use some other operator if you want to get a successful result always. That's an enormously ugly kluge, and if we'd done it right the first time nobody would have objected.

Bottom line is that I don't think that we should add a pile of new moving parts to the type system just because people are afraid of change; arguably, that's *more* change (and more risk of bugs), not less. Unifying the types would, for example, get rid of the pesky question of what promoting a range to multirange should look like exactly, because it'd be a no-op.

regards, tom lane
Re: range_agg
On Fri, Dec 20, 2019 at 10:43 AM Alvaro Herrera wrote:
> I took the liberty of rebasing this series on top of recent branch
> master.

In the tests there is:

+select '{[a,a],[b,b]}'::textmultirange;
+ textmultirange
+----------------
+ {[a,a],[b,b]}
+(1 row)
+
+-- without canonicalization, we can't join these:
+select '{[a,a], [b,b]}'::textmultirange;
+ textmultirange
+----------------
+ {[a,a],[b,b]}
+(1 row)
+

Aside from the comment they are identical, so I'm confused as to why both tests exist - though I suspect it has to do with the fact that the expected result would be {[a,b]} since text is discrete.

Also, the current patch set seems a bit undecided on whether it wants to be truly a multi-range or a range that can report non-contiguous components. Specifically:

+select '{[a,d), [b,f]}'::textmultirange;
+ textmultirange
+----------------
+ {[a,f]}
+(1 row)

There is an argument that a multi-range should output {[a,d),[b,f]}. IMO it's arguable that a multi-range container should not try to reduce the number of contained ranges at all. If that is indeed a desire, which seems like it is, that feature alone goes a long way to support wanting to just merge the desired functionality into the existing range type, where the final output has the minimum number of contiguous ranges possible, rather than having a separate multirange type.

David J.
pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
While working on a patch, I noticed this pre-existing behavior, which seems to be new since v11, maybe due to changes to SRF.

|postgres=# SELECT pg_ls_dir('.') LIMIT 1;
|WARNING:  1 temporary files and directories not closed at end-of-transaction
|  pg_ls_dir
|-------------
| pg_dynshmem

|postgres=# SELECT pg_ls_waldir() LIMIT 1;
|WARNING:  1 temporary files and directories not closed at end-of-transaction
|-[ RECORD 1 ]+-------------------------------------------------------------
|pg_ls_waldir | (00000001000031920000007B,16777216,"2020-03-08 03:50:34-07")

Note, that doesn't happen with "SELECT * FROM". I'm not sure what the solution is to that, but my patch was going to make it worse rather than better for pg_ls_tmpdir.

--
Justin
Re: proposal: schema variables
On Sat, Mar 7, 2020 at 10:15 PM Pavel Stehule wrote:
> Hi
>
> I fixed some of the ugly parts of this patch - now LET x = DEFAULT and
> LET x = NULL are processed in a more usual way.
> The LET statement no longer switches between STMT_UTILITY and
> STMT_PLAN_UTILITY like before. It is only a STMT_PLAN_UTILITY statement.

I did some cleaning - I mainly replaced CMD_PLAN_UTILITY by CMD_LET, because there is no other similar statement in the queue.

Regards

Pavel

> Regards
>
> Pavel

schema-variables-20200308.patch.gz
Description: application/gzip
Re: Allow to_date() and to_timestamp() to accept localized names
I wrote:
> James Coleman writes:
>> I'm still interested in understanding why we're using the ISO locale
>> instead of the utf8 one in a utf8-labeled test though.

> We are not. My understanding of the rules about this is that the
> active LC_CTYPE setting determines the encoding that libc uses,
> period. The encoding suffix on the locale name only makes a
> difference when LC_CTYPE is being specified (or derived from LANG or
> LC_ALL), not any other LC_XXX setting --- although for consistency
> they'll let you include it in any LC_XXX value.

Oh wait --- I'm wrong about that. Looking at the code in pg_locale.c, what actually happens is that we get data in the codeset implied by the LC_TIME setting and then translate it to the database encoding (cf commit 7ad1cd31b). So if bare "tr_TR" is taken as implying iso-8859-9, which seems likely (it appears to work that way here, anyway), then this test is exercising the codeset translation path.

We could change the test to say 'tr_TR.utf8', but that would give us less test coverage.

regards, tom lane
Re: Remove utils/acl.h from catalog/objectaddress.h
Peter Eisentraut writes:
> I noticed that catalog/objectaddress.h includes utils/acl.h for no
> apparent reason. It turns out this used to be needed but not anymore.
> So removed it and cleaned up the fallout. Patch attached.

Seems reasonable. One thing I noticed is that if you are including nodes/parsenodes.h explicitly in objectaddress.h, there seems little point in the #include "nodes/pg_list.h" right beside it.

Sometime we really ought to make an effort to make our header inclusions less of a mass of spaghetti. But this patch needn't take on that load.

regards, tom lane
Re: Use compiler intrinsics for bit ops in hash
On Mon, Mar 02, 2020 at 12:45:21PM -0800, Jesse Zhang wrote:
> Hi David,
>
> On Wed, Feb 26, 2020 at 9:56 PM David Fetter wrote:
> > On Wed, Feb 26, 2020 at 09:12:24AM +0100, David Fetter wrote:
> > > On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote:
> > > > On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote:
> > > > > On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote:
> > > > > > > The changes in hash AM and SIMPLEHASH do look like a net positive
> > > > > > > improvement. My biggest cringe might be in pg_bitutils:
> > > > > > >
> > > > > > > 1. Is ceil_log2_64 dead code?
> > > > > >
> > > > > > Let's call it nascent code. I suspect there are places it could go, if
> > > > > > I look for them. Also, it seemed silly to have one without the other.
> > > > >
> > > > > While not absolutely required, I'd like us to find at least one
> > > > > place and start using it. (Clang also nags at me when we have
> > > > > unused functions).
> > > >
> > > > Done in the expanded patches attached.
>
> I see that you've found use of it in dynahash, thanks!
>
> The math in the new (from v4 to v6) patch is wrong: it yields
> ceil_log2(1) = 1 or next_power_of_2(1) = 2. I can see that you lifted
> the restriction of "num greater than one" for ceil_log2() in this patch
> set, but it's now _more_ problematic to base those functions on
> pg_leftmost_one_pos().
>
> I'm not comfortable with your changes to pg_leftmost_one_pos() to remove
> the restriction on word being non-zero. Specifically
> pg_leftmost_one_pos() is made to return 0 on 0 input. While none of its
> current callers (in HEAD) is harmed, this introduces muddy semantics:
>
> 1. pg_leftmost_one_pos is semantically undefined on 0 input: scanning
> for a set bit in a zero word won't find it anywhere.
>
> 2. we can _try_ generalizing it to accommodate ceil_log2 by
> extrapolating based on the invariant that BSR + LZCNT = 31 (or 63). In
> that case, the extrapolation yields -1 for pg_leftmost_one_pos(0).
>
> I'm not convinced that others on the list will be comfortable with the
> generalization suggested in 2 above.
>
> I've quickly put together a PoC patch on top of yours, which
> re-implements ceil_log2 using LZCNT coupled with a CPUID check.
> Thoughts?

Per discussion on IRC with Andrew (RhodiumToad) Gierth:

The runtime detection means there's always an indirect call overhead and no way to inline. This is counter to what using compiler intrinsics is supposed to do. It's better to rely on the compiler, because:

(a) The compiler often knows whether the value can or can't be 0 and can therefore skip a conditional jump.

(b) If you're targeting a recent microarchitecture, the compiler can just use the right instruction.

(c) Even if the conditional branch is left in, it's not a big overhead.

Best,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
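[To make the trade-off concrete, the branch-based alternative being argued for looks roughly like this; a sketch assuming GCC/Clang builtins, not the patch's exact code:]

    static inline int
    ceil_log2_64(uint64 num)
    {
        /*
         * __builtin_clzll() is undefined for 0, so keep the guard; at call
         * sites where the compiler can prove num > 1, the branch folds
         * away entirely, which is exactly the inlining win that runtime
         * CPUID dispatch forfeits.  This also gives ceil_log2(1) = 0,
         * matching the semantics Jesse argues for above.
         */
        if (num <= 1)
            return 0;
        return 64 - __builtin_clzll(num - 1);
    }

For example, ceil_log2_64(5) evaluates to 64 - clz(4) = 3, and a constant argument lets the whole expression fold at compile time.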
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
Justin Pryzby writes:
> While working on a patch, I noticed this pre-existing behavior, which seems to
> be new since v11, maybe due to changes to SRF.

> |postgres=# SELECT pg_ls_dir('.') LIMIT 1;
> |WARNING:  1 temporary files and directories not closed at end-of-transaction

Hmm, actually it looks to me like pg_ls_dir has been broken forever. The reason the warning didn't show up before v11 is that CleanupTempFiles didn't bleat about leaked "allocated" directories before that (cf 9cb7db3f0).

I guess we ought to change that function to use returns-a-tuplestore protocol instead of thinking it can hold a directory open across calls. It's not hard to think of use-cases where the existing behavior would cause issues worse than a nanny-ish WARNING, especially on platforms with tight "ulimit -n" limits.

regards, tom lane
Re: More tests to stress directly checksum_impl.h
Michael Paquier writes:
> Thanks for the computations with big-endian! I would have just gone
> down to the 8kB page for the expected results by seeing three other
> tests blowing up, but no objection to what you have here either. I
> have checked the computations with little-endian from your patch and
> these are correct.

After thinking more, I concluded that the extra expected files would just be a waste of tarball space, at least till such time as we make a push to fix all the regression tests to be blocksize-independent. Pushed it with just the 8K files.

regards, tom lane
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:
> Justin Pryzby writes:
> > While working on a patch, I noticed this pre-existing behavior, which seems to
> > be new since v11, maybe due to changes to SRF.
>
> > |postgres=# SELECT pg_ls_dir('.') LIMIT 1;
> > |WARNING:  1 temporary files and directories not closed at end-of-transaction
>
> Hmm, actually it looks to me like pg_ls_dir has been broken forever.
> The reason the warning didn't show up before v11 is that CleanupTempFiles
> didn't bleat about leaked "allocated" directories before that
> (cf 9cb7db3f0).
>
> I guess we ought to change that function to use returns-a-tuplestore
> protocol instead of thinking it can hold a directory open across calls.
> It's not hard to think of use-cases where the existing behavior would
> cause issues worse than a nanny-ish WARNING, especially on platforms
> with tight "ulimit -n" limits.

Thanks for the analysis.

Do you mean it should enumerate all files during the initial SRF call, or use something other than the SRF_* macros?

--
Justin
Re: Additional improvements to extended statistics
On Fri, 6 Mar 2020 at 12:58, Tomas Vondra wrote:
> Here is a rebased version of this patch series. I've polished the first
> two parts a bit - estimation of OR clauses and (Var op Var) clauses.

Hi,

I've been looking over the first patch (OR list support). It mostly looks reasonable to me, except there's a problem with the way statext_mcv_clauselist_selectivity() combines multiple stat_sel values into the final result -- in the OR case, it needs to start with sel = 0, and then apply the OR formula to factor in each new estimate. I.e., this isn't right for an OR list:

    /* Factor the estimate from this MCV to the oveall estimate. */
    sel *= stat_sel;

(Oh and there's a typo in that comment: s/oveall/overall/).

For example, with the regression test data, this isn't estimated well:

    SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0;

Similarly, if no extended stats can be applied it needs to return 0 not 1, for example this query on the test data:

    SELECT * FROM mcv_lists WHERE a = 1 OR a = 2 OR d IS NOT NULL;

It might also be worth adding a couple more regression test cases like these.

Regards,
Dean
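[For clarity, the fix being suggested amounts to something like this when folding each per-statistics-object estimate into the running total; a sketch, not the committed code:]

    /*
     * AND lists start from sel = 1.0 and multiply the estimates together;
     * OR lists must start from sel = 0.0 (so "no applicable stats" yields
     * 0, not 1) and combine with the inclusion-exclusion formula.
     */
    if (is_or)
        sel = sel + stat_sel - sel * stat_sel;  /* P(A or B) = s1 + s2 - s1*s2 */
    else
        sel *= stat_sel;

The OR formula assumes the clause groups are independent, the same assumption clauselist_selectivity makes for the AND case.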
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
Justin Pryzby writes:
> On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:
>> I guess we ought to change that function to use returns-a-tuplestore
>> protocol instead of thinking it can hold a directory open across calls.
>> It's not hard to think of use-cases where the existing behavior would
>> cause issues worse than a nanny-ish WARNING, especially on platforms
>> with tight "ulimit -n" limits.

> Thanks for the analysis.
> Do you mean it should enumerate all files during the initial SRF call, or use
> something other than the SRF_* macros?

It has to enumerate all the files during the first call. I suppose it could do that and then still hand back the results one-at-a-time, but there seems little point compared to filling a tuplestore immediately. So probably the SRF_ macros are useless here.

Another possible solution is to register an exprstate-shutdown hook to ensure the resource is cleaned up, but I'm not very happy with that because it does nothing to prevent the hazard of overrunning the available resources if you have several of these active at once.

I've just finished scanning the source code and concluding that all of these functions are similarly broken:

    pg_ls_dir
    pg_ls_dir_files
    pg_tablespace_databases
    pg_logdir_ls_internal
    pg_timezone_names
    pgrowlocks

The first five risk leaking an open directory, the last risks leaking an active tablescan and open relation. I don't see anything in the documentation (either funcapi.h or xfunc.sgml) warning that the function might not get run to completion, either ...

regards, tom lane
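[For illustration, the tuplestore protocol described above looks roughly like this; a hedged sketch with placeholder names, with the usual allowedModes/expectedDesc error checks and the real column sets omitted:]

    Datum
    my_ls_dir(PG_FUNCTION_ARGS)    /* placeholder name */
    {
        ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
        MemoryContext oldcontext;
        Tuplestorestate *tupstore;
        TupleDesc   tupdesc;
        DIR        *dirdesc;
        struct dirent *de;
        char       *dir = text_to_cstring(PG_GETARG_TEXT_PP(0));

        oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
        tupdesc = CreateTemplateTupleDesc(1);
        TupleDescInitEntry(tupdesc, (AttrNumber) 1, "name", TEXTOID, -1, 0);
        tupstore = tuplestore_begin_heap(true, false, work_mem);
        rsinfo->returnMode = SFRM_Materialize;
        rsinfo->setResult = tupstore;
        rsinfo->setDesc = tupdesc;
        MemoryContextSwitchTo(oldcontext);

        /* Enumerate everything now; nothing stays open across calls. */
        dirdesc = AllocateDir(dir);
        while ((de = ReadDir(dirdesc, dir)) != NULL)
        {
            Datum   values[1];
            bool    nulls[1] = {false};

            values[0] = CStringGetTextDatum(de->d_name);
            tuplestore_putvalues(tupstore, tupdesc, values, nulls);
        }
        FreeDir(dirdesc);       /* closed before we ever return */

        return (Datum) 0;       /* actual result was stashed in rsinfo */
    }

Since the executor drains the tuplestore, a LIMIT 1 on top no longer leaves a directory handle open at end of transaction.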
Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
I wrote:
> I've just finished scanning the source code and concluding that all
> of these functions are similarly broken:
>     pg_ls_dir
>     pg_ls_dir_files
>     pg_tablespace_databases
>     pg_logdir_ls_internal
>     pg_timezone_names
>     pgrowlocks

BTW, another thing I noticed while looking around is that some of the functions using SRF_RETURN_DONE() think they should clean up memory beforehand. This is a waste of code/cycles, as long as the memory was properly allocated in funcctx->multi_call_memory_ctx, because funcapi.c takes care of deleting that context.

We should probably document that *any* manual cleanup before SRF_RETURN_DONE() is an antipattern. If you have to have cleanup, it needs to be done via RegisterExprContextCallback instead.

regards, tom lane
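[For completeness, the RegisterExprContextCallback route would look something like this; a hedged sketch where dir_cleanup_callback is a made-up name, and note the earlier caveat that it still doesn't bound how many directories can be open at once:]

    static void
    dir_cleanup_callback(Datum arg)
    {
        DIR *dirdesc = (DIR *) DatumGetPointer(arg);

        if (dirdesc)
            FreeDir(dirdesc);
    }

    /* ... in the SRF_IS_FIRSTCALL() branch, right after AllocateDir(): */
    RegisterExprContextCallback(((ReturnSetInfo *) fcinfo->resultinfo)->econtext,
                                dir_cleanup_callback,
                                PointerGetDatum(dirdesc));

    /*
     * On normal completion, UnregisterExprContextCallback() must be
     * called before FreeDir() so shutdown doesn't free the directory
     * a second time.
     */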
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Mon, Mar 9, 2020 at 10:00 AM Marina Polyakova wrote:
> On 2018-11-16 22:59, Alvaro Herrera wrote:
> > On 2018-Sep-05, Marina Polyakova wrote:
> >> v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
> >> - a patch for the RandomState structure (this is used to reset a client's
> >> random seed during the repeating of transactions after
> >> serialization/deadlock failures).
> >
> > Pushed this one with minor stylistic changes (the most notable of which
> > is the move of initRandomState to where the rest of the random generator
> > infrastructure is, instead of in a totally random place). Thanks,
>
> Thank you very much! I'm going to send a new patch set until the end of
> this week (I'm sorry I was very busy in the release of Postgres Pro
> 11...).

Is anyone interested in rebasing this, and summarising what needs to be done to get it in? It's arguably a bug, or at least quite unfortunate, that pgbench doesn't work with SERIALIZABLE, and I heard that a couple of forks already ship Marina's patch set.
Re: pg_rewind docs correction
On Tue, Sep 17, 2019 at 9:41 PM Michael Paquier wrote:
> On Tue, Sep 17, 2019 at 08:38:18AM -0400, James Coleman wrote:
> > I don't agree that that's a valid equivalency. I myself spent a lot of
> > time trying to understand how this could possibly be true a while
> > back, and even looked at source code to be certain. I've asked other
> > people and found the same confusion.
> >
> > As I read it the 2nd sentence doesn't actually tell you the
> > differences; it makes a quick attempt at summarizing *how* the first
> > sentence is true, but if the first sentence isn't accurate, then it's
> > hard to read the 2nd one as helping.
>
> Well, then it comes back to the part where I am used to the existing
> docs :)
>
> > If you'd prefer something less detailed at that point in
> > the docs, then something along the lines of "results in a data
> > directory state which can then be safely replayed from the source" or
> > some such.
>
> Actually this is a good suggestion, and could replace the first
> sentence of this paragraph.
>
> > The docs shouldn't be correct just for someone who already understands
> > the intricacies. And the end user shouldn't have to read the "how it
> > works" (which incidentally is kinda hidden at the bottom underneath
> > the CLI args -- perhaps we could move that?) to extrapolate things in
> > the primary documentation.
>
> Perhaps. This doc page is not that long either.

I'd set this aside for quite a while, but I was looking at it again this afternoon, and I've come to see your concern about the opening paragraphs remaining relatively simple. To that end I believe I've come up with a patch that's a good compromise: retaining that simplicity and being more clear and accurate at the same time.

In the first paragraph I've updated it to refer to both "successful rewind and subsequent WAL replay" and the result I describe as being equivalent to the result of a base backup, since that's more technically correct anyway (the current text could be read as implying a full out copy of the data directory, but that's not really true just as it isn't with pg_basebackup).

I've added the information about how the backup label control file is written, and updated the How It Works steps to refer to that separately from restart. Additionally the How It Works is updated to include WAL segments and new relation files in the list of files copied wholesale, since that was previously stated but somewhat contradicted there.

I realized I didn't previously add this to the CF; since it's not a new patch I've added it to the current CF, but if this is incorrect please let me know.

Thanks,
James

From 592ba15c35bb16e55b0bb0a7e7bdbb6dd4e08a0b Mon Sep 17 00:00:00 2001
From: James Coleman
Date: Sun, 8 Mar 2020 16:39:45 -0400
Subject: [PATCH v3] Improve pg_rewind explanation and warnings

The pg_rewind docs currently assert that the state of the target's
data directory after rewind is equivalent to the source's data
directory. But that isn't quite true both because the base state is
further back in time and because the target's data directory will
include the current state on the source of any copied blocks.
Additionally the state isn't equal to a copy of the source data
directory; it's equivalent to a base backup of the source.

The How It Works section now:
- Includes details about how the backup label file is created.
- Is updated to include WAL segments and new relation files in the
  list of files copied wholesale from the source.

Finally, document clearly the state of the cluster after the operation
and also the operation sequencing dangers caused by copying
configuration files from the source.
---
 doc/src/sgml/ref/pg_rewind.sgml | 87 ++++++++++++++++++++++-----------
 1 file changed, 54 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 42d29edd4e..bc6f0009cc 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -48,14 +48,16 @@ PostgreSQL documentation
-   The result is equivalent to replacing the target data directory with the
-   source one. Only changed blocks from relation files are copied;
-   all other files are copied in full, including configuration files. The
-   advantage of pg_rewind over taking a new base backup, or
-   tools like rsync, is that pg_rewind does
-   not require reading through unchanged blocks in the cluster. This makes
-   it a lot faster when the database is large and only a small
-   fraction of blocks differ between the clusters.
+   After a successful rewind and subsequent WAL replay, the target data
+   directory is equivalent to a base backup of the source data directory. While
+   only changed blocks from existing relation files are copied; all other files
+   are copied in full, including new relation files, configuration files, and WAL
+   segments. The advantage of pg_rewind over taking a
+   new base backup, or tools like rsync, is t
Re: pg_rewind docs correction
On Sun, Mar 8, 2020 at 5:13 PM James Coleman wrote:
> I realized I didn't previously add this to the CF; since it's not a new
> patch I've added it to the current CF, but if this is incorrect please let
> me know.

Hmm, looks like I can't add it to the current one. I added it to the next one. I think it could probably go now, since the patch is really 6 months old, but either way is fine -- it's just a docs patch.

James
Re: Index Skip Scan
On Mon, 9 Mar 2020 at 03:21, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > I've been looking over v32 of the patch and have a few comments
> > regarding the planner changes.
>
> Thanks for the commentaries!
>
> > I think the changes in create_distinct_paths() need more work. The
> > way I think this should work is that create_distinct_paths() gets to
> > know exactly nothing about what path types support the elimination of
> > duplicate values. The Path should carry the UniqueKeys so that can be
> > determined. In create_distinct_paths() you should just be able to make
> > use of those paths, which should already have been created when
> > creating index paths for the rel due to PlannerInfo's query_uniquekeys
> > having been set.
>
> Just for me to clarify. The idea is to "move" information about what
> path types support skipping into UniqueKeys (derived from PlannerInfo's
> query_uniquekeys), but other checks (e.g. if index am supports that)
> still perform in create_distinct_paths?

create_distinct_paths() shouldn't know any details specific to the pathtype that it's using or considering using. All the details should just be in the Path: e.g. uniquekeys, pathkeys, costs etc. There should be no IsA(path, ...). Please have a look over the details in my reply to Tomas. I hope that reply has enough information in it, but please reply there if I've missed something.

> > On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
> > There's also some weird looking assumptions that an EquivalenceMember
> > can only be a Var in create_distinct_paths(). I think you're only
> > saved from crashing there because a ProjectionPath will be created
> > atop of the IndexPath to evaluate expressions, in which case you're
> > not seeing the IndexPath. This results in the optimisation not
> > working in cases like:
> >
> > postgres=# create table t (a int); create index on t ((a+1)); explain
> > select distinct a+1 from t;
> > CREATE TABLE
> > CREATE INDEX
> >                         QUERY PLAN
> > ---------------------------------------------------------
> >  HashAggregate  (cost=48.25..50.75 rows=200 width=4)
> >    Group Key: (a + 1)
> >    ->  Seq Scan on t  (cost=0.00..41.88 rows=2550 width=4)
>
> Yes, I need to fix it.
>
> > Using unique paths as I mentioned above should see that fixed.
>
> I'm a bit confused about this statement, how exactly should unique paths
> fix this?

The path's uniquekeys would mention that it's unique on (a+1). You'd compare the uniquekeys of the path to the DISTINCT clause and see that the uniquekeys are a subset of the DISTINCT clause, therefore the DISTINCT is a no-op. If that uniquekey path is cheaper than the cheapest_total_path plus the cost of unique-ifying it, then you should pick the unique path; otherwise use the cheapest_total_path and uniquify that. I think the UniqueKeys may need to be changed from using EquivalenceClasses to use Exprs instead.
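[Roughly, the decision being described might look like this; a speculative sketch where Path->uniquekeys, uniquekeys_contained_in(), and distinct_exprs stand in for the patch's proposed additions, none of which exist in today's planner:]

    foreach(lc, input_rel->pathlist)
    {
        Path *path = (Path *) lfirst(lc);

        /*
         * If the path is already provably unique on a subset of the
         * DISTINCT expressions, DISTINCT is a no-op for it and the path
         * can be used directly, competing on cost with an explicit
         * unique-ification of cheapest_total_path.
         */
        if (path->uniquekeys != NIL &&
            uniquekeys_contained_in(path->uniquekeys, distinct_exprs))
            add_path(distinct_rel, path);
    }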
Re: Allow to_date() and to_timestamp() to accept localized names
On Sun, Mar 8, 2020 at 2:19 PM Tom Lane wrote:
> I wrote:
> > James Coleman writes:
> >> I'm still interested in understanding why we're using the ISO locale
> >> instead of the utf8 one in a utf8-labeled test though.
>
> > We are not. My understanding of the rules about this is that the
> > active LC_CTYPE setting determines the encoding that libc uses,
> > period. The encoding suffix on the locale name only makes a
> > difference when LC_CTYPE is being specified (or derived from LANG or
> > LC_ALL), not any other LC_XXX setting --- although for consistency
> > they'll let you include it in any LC_XXX value.
>
> Oh wait --- I'm wrong about that. Looking at the code in pg_locale.c,
> what actually happens is that we get data in the codeset implied by
> the LC_TIME setting and then translate it to the database encoding
> (cf commit 7ad1cd31b). So if bare "tr_TR" is taken as implying
> iso-8859-9, which seems likely (it appears to work that way here,
> anyway) then this test is exercising the codeset translation path.
>
> We could change the test to say 'tr_TR.utf8' but that would give us
> less test coverage.

So just to confirm I understand: that implies that the issue is solely that only the utf8 tr_TR set is installed by default on this machine, and the iso-8859-9 set is a hard requirement (that is, the test is explicitly testing a codepath that generates utf8 results from a non-utf8 source)?

If so, I'm going to try a bare Ubuntu install on a VM and see what locales are installed by default for Turkish.

If in fact Ubuntu doesn't install this locale by default, then is this a caveat we should add to developer docs somewhere? It seems odd to me that I'd be the only one encountering it, but OTOH I would have thought this a fairly vanilla install too...

James
Re: Improve search for missing parent downlinks in amcheck
Hi, Peter!

On Tue, Mar 3, 2020 at 3:04 AM Peter Geoghegan wrote:
> Apologies for the delayed response. I was a little tired from the
> deduplication project.

No problem. Apologies for the delayed revision as well.

> I taught pageinspect to display a "htid" field for pivot tuples
> recently, making it easier to visualize this example.

Great!

> I think that you should say more about how "lowkey" is used here:
>
> > /*
> > -  * Record if page that is about to become target is the right half of
> > -  * an incomplete page split. This can go stale immediately in
> > -  * !readonly case.
> > +  * Copy current target low key as the high key of right sibling.
> > +  * Allocate memory in upper level context, so it would be cleared
> > +  * after reset of target context.
> > +  *
> > +  * We only need low key for parent check.
> >    */
> > - state->rightsplit = P_INCOMPLETE_SPLIT(opaque);
> > + if (state->readonly && !P_RIGHTMOST(opaque))
> > + {
>
> Say something about concurrent page splits, since they're the only
> case where we actually use lowkey. Maybe say something like: "We
> probably won't end up doing anything with lowkey, but it's simpler for
> readonly verification to always have it available".

I've revised this comment. Hopefully it's better now.

> * I think that these comments could still be clearer:
>
> > + /*
> > +  * We're going to try match child high key to "negative
> > +  * infinity key". This normally happens when the last child
> > +  * we visited for target's left sibling was an incomplete
> > +  * split. So, we must be still on the child of target's left
> > +  * sibling. Thus, we should match to target's left sibling
> > +  * high key. Thankfully we saved it, it's called a "low key".
> > +  */
>
> Maybe start with "We cannot try to match child's high key to a
> negative infinity key in target, since there is nothing to compare.
> However...". Perhaps use terms like "cousin page" and "subtree", which
> can be useful. Alternatively, mention this case in the diagram example
> at the top of bt_child_highkey_check(). It's tough to write comments
> like this, but I think it's worth it.

I've updated this comment using the terms "cousin page" and "subtree". I didn't refer to the diagram example, because it doesn't contain the appropriate case. And I wouldn't like this diagram to contain such a case, because that would probably make the diagram too complex. I've also invented the term "uncle page". BTW, should it be "aunt page"? I don't know.

> Note that a high key is also a pivot tuple, so I wouldn't mention high
> keys here:
>
> > +/*
> > + * Check if two tuples are binary identical except the block number. So,
> > + * this function is capable to compare high keys with pivot keys.
> > + */
> > +static bool
> > +bt_pivot_tuple_identical(IndexTuple itup1, IndexTuple itup2)
> > +{

Sure, this comment is revised.

> v7 looks pretty close to being commitable, though I'll probably want
> to update some comments that you haven't touched when you commit this.
> I should probably wait until you've committed the patch to go do that.
> I'm thinking of things like old comments in bt_downlink_check().
>
> I will test the patch properly one more time when you produce a new
> revision. I haven't really tested it since the last time.

The attached patch also has a revised commit message. I'll wait for your response before committing.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Allow to_date() and to_timestamp() to accept localized names
James Coleman writes:
> So just to confirm I understand, that implies that the issue is solely that
> only the utf8 tr_TR set is installed by default on this machine, and the
> iso-8859-9 set is a hard requirement (that is, the test is explicitly
> testing a codepath that generates utf8 results from a non-utf8 source)?

It's not "explicitly" testing that; the fact that "tr_TR" is treated as "tr_TR.iso88599" is surely an implementation artifact. (On macOS, some experimentation shows that "tr_TR" is treated as "tr_TR.UTF-8".) But yeah, I think it's intentional that we want the codeset translation path to be exercised here on at least some platforms.

> If in fact Ubuntu doesn't install this locale by default, then is this a
> caveat we should add to developer docs somewhere? It seems odd to me that
> I'd be the only one encountering it, but OTOH I would have thought this a
> fairly vanilla install too...

Not sure. The lack of prior complaints points to this not being a common situation. It does seem weird that they'd set things up so that "tr_TR.utf8" exists but not "tr_TR"; even if that's not an outright packaging mistake, it seems like a POLA violation from here.

regards, tom lane
Re: Allow to_date() and to_timestamp() to accept localized names
On Sun, Mar 8, 2020 at 6:03 PM Tom Lane wrote:
> James Coleman writes:
> > So just to confirm I understand, that implies that the issue is solely that
> > only the utf8 tr_TR set is installed by default on this machine, and the
> > iso-8859-9 set is a hard requirement (that is, the test is explicitly
> > testing a codepath that generates utf8 results from a non-utf8 source)?
>
> It's not "explicitly" testing that; the fact that "tr_TR" is treated
> as "tr_TR.iso88599" is surely an implementation artifact. (On macOS,
> some experimentation shows that "tr_TR" is treated as "tr_TR.UTF-8".)
> But yeah, I think it's intentional that we want the codeset translation
> path to be exercised here on at least some platforms.

I idly wonder if there could/should be some tests for the implicit case, some explicitly testing the codeset translation (if possible), and some testing the explicit utf8 case... but I don't know enough about this area to push for anything.

> > If in fact Ubuntu doesn't install this locale by default, then is this a
> > caveat we should add to developer docs somewhere? It seems odd to me that
> > I'd be the only one encountering it, but OTOH I would have thought this a
> > fairly vanilla install too...
>
> Not sure. The lack of prior complaints points to this not being a
> common situation. It does seem weird that they'd set things up so
> that "tr_TR.utf8" exists but not "tr_TR"; even if that's not an
> outright packaging mistake, it seems like a POLA violation from here.

All right. I downloaded an Ubuntu 18.04.3 server VM from OSBoxes, and it had very few locales installed by default... so that wasn't all that helpful. I think at this point I'll just leave this as a mystery, much as I hate that.

James
Re: Add absolute value to dict_int
Jeff Janes writes:
> I've seen a few requests on how to make FTS search on the absolute value of
> integers. This question is usually driven by the fact that the text search
> parser interprets a separating hyphen ("partnumber-987") as a minus sign.
> There is currently no good answer for this that doesn't involve C coding.
> I think this feature has a natural and trivial home in the dict_int
> extension, so attached is a patch that does that.

Seems reasonable, so pushed with minor cleanup (I noticed it was off-by-one for the maxlen check, which was harmless unless you had rejectlong enabled as well). I debated whether I liked the "absval" parameter name, which seemed a bit too abbreviative; but it's more or less in line with the existing parameter names, so I left it alone.

> There are no changes to the extension creation scripts, so there is no need
> to bump the version. And I think the convention is that we don't bump the
> version just to signify a change which implements a new feature when that
> doesn't change the creation scripts.

Right, there's no need to update the creation script.

I noticed one odd thing which is not the fault of this patch, but seems to need cleaning up:

regression=# ALTER TEXT SEARCH DICTIONARY intdict (absval = true);
ALTER TEXT SEARCH DICTIONARY
regression=# select ts_lexize('intdict', '-123456');
 ts_lexize
-----------
 {123456}
(1 row)

regression=# ALTER TEXT SEARCH DICTIONARY intdict (absval = 1);
ALTER TEXT SEARCH DICTIONARY
regression=# select ts_lexize('intdict', '-123456');
ERROR:  absval requires a Boolean value

Why did ALTER accept that, if it wasn't valid? It's not like there's no error checking at all:

regression=# ALTER TEXT SEARCH DICTIONARY intdict (absval = foo);
ERROR:  absval requires a Boolean value

Upon digging into that, the reason is that defGetBoolean accepts a T_Integer Value with value 1, but it doesn't accept a T_String with value "1". And apparently we're satisfied to smash dictionary parameters to strings before storing them.

There are at least three things we could do here:

1. Insist that defGetBoolean and its siblings should accept the string equivalent of any non-string value they accept. This would change the behavior of a whole lot of utility commands, not only the text search commands, and I'm not exactly convinced it's a good idea. Seems like it's losing error detection capability.

2. Improve the catalog representation of text search parameters so that the original Value node can be faithfully reconstructed. I'd be for this, except it seems like a lot of work for a rather minor benefit.

3. Rearrange text search parameter validation so we smash parameters to strings *before* we validate them, ensuring we won't take any settings that will be rejected later.

I'm leaning to #3 as being the most practical fix. Thoughts?

regards, tom lane
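[A sketch of what option #3 might look like for one parameter; illustrative only, not a patch:]

    /*
     * Rebuild the DefElem from the string form that will actually be
     * stored in the catalog, then run the usual validation against that
     * exact form, so ALTER rejects anything the dictionary init function
     * would reject later.
     */
    DefElem    *stored = makeDefElem(defel->defname,
                                     (Node *) makeString(defGetString(defel)),
                                     defel->location);

    (void) defGetBoolean(stored);   /* errors out on e.g. the T_Integer
                                     * 1 that became the string "1" */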
Re: Use compiler intrinsics for bit ops in hash
Hi John,

Oops, this email has been sitting in my outbox for 3 days...

On Wed, Mar 4, 2020 at 1:46 AM John Naylor wrote:
> On Tue, Mar 3, 2020 at 4:46 AM Jesse Zhang wrote:
> > I've quickly put together a PoC patch on top of yours, which
> > re-implements ceil_log2 using LZCNT coupled with a CPUID check.
> > Thoughts?
>
> This patch seems to be making an assumption that an indirect function
> call is faster than taking a branch (in inlined code) that the CPU
> will almost always predict correctly. It would be nice to have some
> numbers to compare. (against pg_count_leading_zeros_* using the "slow"
> versions but statically inlined).

Ah, how could I forget that... I ran a quick benchmark on my laptop, and indeed, even though the GCC-generated code takes a hit on zero input (Clang generates slightly different code that gives indistinguishable runtime for zero and non-zero inputs), the inlined code (the function input in my benchmark is never a constant literal, so the branch does get exercised at runtime) is still more than twice as fast as the function call.

------------------------------------------------------
Benchmark        Time       CPU        Iterations
------------------------------------------------------
BM_pfunc/0     1.57 ns    1.56 ns      447127265
BM_pfunc/1     1.56 ns    1.56 ns      449618696
BM_pfunc/8     1.57 ns    1.57 ns      443013856
BM_pfunc/64    1.57 ns    1.57 ns      448784369
BM_slow/0     0.602 ns   0.600 ns     1000000000
BM_slow/1     0.391 ns   0.390 ns     1000000000
BM_slow/8     0.392 ns   0.391 ns     1000000000
BM_slow/64    0.391 ns   0.390 ns     1000000000
BM_fast/0      1.47 ns    1.46 ns      477513921
BM_fast/1      1.47 ns    1.46 ns      473992040
BM_fast/8      1.46 ns    1.46 ns      474895755
BM_fast/64     1.47 ns    1.46 ns      477215268

For your amusement, I've attached the meat of the benchmark. To build the code you can grab the repository at https://github.com/d/glowing-chainsaw/tree/pfunc

> Stylistically, "8 * sizeof(num)" is a bit overly formal, since the
> hard-coded number we want is in the name of the function.

Oh yeah, overly generic code is indicative of the remnants of my C++ brain, will fix.

Cheers,
Jesse

/* Runtime dispatch: the function pointer starts out pointing at a
 * chooser, which resolves to the LZCNT or fallback version on first call. */

#include "lzcnt.h"

static bool go = lzcnt_available();

__attribute__((target("lzcnt")))
inline int
lzcnt_fast(uint64_t word)
{
    return __builtin_clzll(word);
}

static int
lzcnt_choose(uint64_t word)
{
    if (lzcnt_available())
        lzcnt_pfunc = lzcnt_fast;
    else
        lzcnt_pfunc = lzcnt_slow;
    return lzcnt_pfunc(word);
}

int (*lzcnt_pfunc)(uint64_t word) = lzcnt_choose;

#ifndef LZCNT_H
#define LZCNT_H

#include <cpuid.h>
#include <stdint.h>

static inline bool
lzcnt_available()
{
    uint32_t exx[4] = {0, 0, 0, 0};

    /* ABM (LZCNT) is advertised in the extended CPUID leaf */
    __get_cpuid(0x80000001, exx + 0, exx + 1, exx + 2, exx + 3);
    return (exx[2] & bit_ABM) != 0;
}

extern int (*lzcnt_pfunc)(uint64_t word);

__attribute__((target("no-lzcnt")))
inline int
lzcnt_slow(uint64_t word)
{
    if (word == 0)
        return 8 * sizeof(word);
    return __builtin_clzll(word);
}

int lzcnt_fast(uint64_t word);

#endif

#include "benchmark/benchmark.h"
#include "lzcnt.h"

static void BM_pfunc(benchmark::State& state)
{
    auto word = state.range(0);
    for (auto _ : state)
    {
        lzcnt_pfunc(word);
    }
}

static void BM_slow(benchmark::State& state)
{
    auto word = state.range(0);
    for (auto _ : state)
    {
        benchmark::DoNotOptimize(lzcnt_slow(word));
    }
}

static void BM_fast(benchmark::State& state)
{
    auto word = state.range(0);
    for (auto _ : state)
    {
        lzcnt_fast(word);
    }
}

BENCHMARK(BM_pfunc)->Arg(0)->Range(1, 64);
BENCHMARK(BM_slow)->Arg(0)->Range(1, 64);
BENCHMARK(BM_fast)->Arg(0)->Range(1, 64);

BENCHMARK_MAIN();
Re: Additional improvements to extended statistics
On Sun, Mar 08, 2020 at 07:17:10PM +0000, Dean Rasheed wrote:
> On Fri, 6 Mar 2020 at 12:58, Tomas Vondra wrote:
> > Here is a rebased version of this patch series. I've polished the first
> > two parts a bit - estimation of OR clauses and (Var op Var) clauses.
>
> Hi, I've been looking over the first patch (OR list support). It mostly
> looks reasonable to me, except there's a problem with the way
> statext_mcv_clauselist_selectivity() combines multiple stat_sel values
> into the final result -- in the OR case, it needs to start with sel = 0,
> and then apply the OR formula to factor in each new estimate. I.e., this
> isn't right for an OR list:
>
>     /* Factor the estimate from this MCV to the oveall estimate. */
>     sel *= stat_sel;
>
> (Oh and there's a typo in that comment: s/oveall/overall/).
>
> For example, with the regression test data, this isn't estimated well:
>
>     SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0;
>
> Similarly, if no extended stats can be applied it needs to return 0 not 1,
> for example this query on the test data:
>
>     SELECT * FROM mcv_lists WHERE a = 1 OR a = 2 OR d IS NOT NULL;

Ah, right. Thanks for noticing this. Attached is an updated patch series with parts 0002 and 0003 adding tests demonstrating the issue and then fixing it (both shall be merged into 0001).

> It might also be worth adding a couple more regression test cases like
> these.

Agreed, 0002 adds a couple of relevant tests. Incidentally, I've been working on improving test coverage for extended stats over the past few days (it has ~80% of lines covered, which is neither bad nor great). I haven't submitted that to hackers yet, because it's mostly mechanical and it would interfere with the two existing threads about extended stats ...

Speaking of which, would you take a look at [1]? I think supporting SAOP is fine, but I wonder if you agree with my conclusion that we can't really support inclusion @> as explained in [2].

[1] https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy@pierred-pdoc
[2] https://www.postgresql.org/message-id/20200202184134.swoqkqlqorqolrqv%40development

regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
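To make the arithmetic concrete, here is a minimal standalone sketch of the combination rule Dean describes: start from sel = 0 and fold each estimate in with the independent-OR formula s = s1 + s2 - s1*s2, instead of multiplying (which is the AND formula). The names below are illustrative, not the patch's actual code:

#include <stdio.h>

typedef double Selectivity;

/* fold per-statistics estimates into an OR selectivity,
 * starting from 0.0: s = s + s_i - s * s_i */
static Selectivity
combine_or(const Selectivity *sels, int n)
{
	Selectivity s = 0.0;

	for (int i = 0; i < n; i++)
		s = s + sels[i] - s * sels[i];
	return s;
}

int main(void)
{
	/* e.g. a = 0 OR b = 0 OR c = 0 OR d = 0, each clause ~0.25 */
	Selectivity sels[] = {0.25, 0.25, 0.25, 0.25};

	printf("OR estimate: %f\n", combine_or(sels, 4));	/* ~0.684 */
	return 0;
}

Note that multiplying the same four estimates would give ~0.004, a drastic underestimate for an OR list, and that an empty list correctly yields 0 rather than 1 with this starting value.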
Re: Additional improvements to extended statistics
On Mon, Mar 09, 2020 at 01:01:57AM +0100, Tomas Vondra wrote:
> On Sun, Mar 08, 2020 at 07:17:10PM +0000, Dean Rasheed wrote:
> > On Fri, 6 Mar 2020 at 12:58, Tomas Vondra wrote:
> > > Here is a rebased version of this patch series. I've polished the first
> > > two parts a bit - estimation of OR clauses and (Var op Var) clauses.
> >
> > Hi, I've been looking over the first patch (OR list support). It mostly
> > looks reasonable to me, except there's a problem with the way
> > statext_mcv_clauselist_selectivity() combines multiple stat_sel values
> > into the final result -- in the OR case, it needs to start with sel = 0,
> > and then apply the OR formula to factor in each new estimate. I.e., this
> > isn't right for an OR list:
> >
> >     /* Factor the estimate from this MCV to the oveall estimate. */
> >     sel *= stat_sel;
> >
> > (Oh and there's a typo in that comment: s/oveall/overall/).
> >
> > For example, with the regression test data, this isn't estimated well:
> >
> >     SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0;
> >
> > Similarly, if no extended stats can be applied it needs to return 0 not 1,
> > for example this query on the test data:
> >
> >     SELECT * FROM mcv_lists WHERE a = 1 OR a = 2 OR d IS NOT NULL;
>
> Ah, right. Thanks for noticing this. Attached is an updated patch series
> with parts 0002 and 0003 adding tests demonstrating the issue and then
> fixing it (both shall be merged into 0001).

One day I won't forget to actually attach the files ...

regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 4a0807b7e8ec511d72361598c390c158dc76e1a0 Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Sun, 8 Mar 2020 23:26:50 +0100
Subject: [PATCH 1/4] Support using extended stats for parts of OR clauses

---
 src/backend/optimizer/path/clausesel.c        | 109 +++---
 src/backend/statistics/extended_stats.c       |  45 +++-
 src/backend/statistics/mcv.c                  |   5 +-
 .../statistics/extended_stats_internal.h      |   3 +-
 src/include/statistics/statistics.h           |   3 +-
 src/test/regress/expected/stats_ext.out       |   3 +-
 src/test/regress/sql/stats_ext.sql            |   1 -
 7 files changed, 138 insertions(+), 31 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index a3ebe10592..8c1a404ce2 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -92,7 +92,7 @@ clauselist_selectivity(PlannerInfo *root,
 		 */
 		s1 *= statext_clauselist_selectivity(root, clauses, varRelid,
 											 jointype, sjinfo, rel,
-											 &estimatedclauses);
+											 &estimatedclauses, false);
 	}
 
 	/*
@@ -104,6 +104,89 @@ clauselist_selectivity(PlannerInfo *root,
 							   estimatedclauses);
 }
 
+/*
+ * clauselist_selectivity_or -
+ *		Compute the selectivity of an implicitly-ORed list of boolean
+ *		expression clauses.  The list can be empty, in which case 0.0
+ *		must be returned.  List elements may be either RestrictInfos
+ *		or bare expression clauses --- the former is preferred since
+ *		it allows caching of results.
+ *
+ * See clause_selectivity() for the meaning of the additional parameters.
+ *
+ * The basic approach is to apply extended statistics first, on as many
+ * clauses as possible, in order to capture cross-column dependencies etc.
+ * The remaining clauses are then estimated using regular statistics tracked
+ * for individual columns.  This is done by simply passing the clauses to
+ * clauselist_selectivity and then combining the selectivities using the
+ * regular formula (s1+s2 - s1*s2).
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+						  List *clauses,
+						  int varRelid,
+						  JoinType jointype,
+						  SpecialJoinInfo *sjinfo)
+{
+	ListCell   *lc;
+	Selectivity s1 = 0.0;
+	RelOptInfo *rel;
+	Bitmapset  *estimatedclauses = NULL;
+	int			listidx;
+
+	/*
+	 * Determine if these clauses reference a single relation.  If so, and if
+	 * it has extended statistics, try to apply those.
+	 */
+	rel = find_single_rel_for_clauses(root, clauses);
+	if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+	{
+		/*
+		 * Estimate as many clauses as possible using extended statistics.
+		 *
+		 * 'estimatedc
Nicer error when connecting to standby with hot_standby=off
I recently noticed while setting up a test environment that attempting to connect to a standby running without hot_standby=on results in a fairly generic error (I believe "the database system is starting up"). I don't have my test setup running right now, so I can't confirm with a repro case at the moment, but with a little bit of spelunking I noticed that error text only shows up in src/backend/postmaster/postmaster.c when port->canAcceptConnections has the value CAC_STARTUP.

Ideally the error message would include something along the lines of "The server is running as a standby but cannot accept connections with hot_standby=off".

I wanted to get some initial feedback on the idea before writing a patch: does that seem like a reasonable change? Is it actually plausible to distinguish between this state and "still recovering" (i.e., when starting up a hot standby but initial recovery hasn't completed, so it legitimately can't accept connections yet)? If so, should the message still mention the possibility that hot_standby is off, just in case?

The enum value CAC_STARTUP is defined in src/include/libpq/libpq-be.h, which makes me wonder whether changing this value would result in a wire protocol change/something the client wants to know about. If so, I assume it's not reasonable to change the value, but would it still be reasonable to change the error text?

Thanks,
James Coleman
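To make the proposal concrete, here is a small standalone model of the decision being discussed. Everything in it — the function, the flags, the message wording — is hypothetical illustration, not postmaster.c code; the real fix would also have to answer James's question about whether the two states can be told apart:

#include <stdbool.h>
#include <stdio.h>

/* hypothetical modeling of the proposal: pick a more specific message
 * when the server is a standby that will never accept connections
 * because hot_standby is off */
typedef enum { CAC_OK, CAC_STARTUP } cac_state;

static const char *
startup_error_message(cac_state cac, bool in_recovery, bool hot_standby)
{
	if (cac != CAC_STARTUP)
		return NULL;
	if (in_recovery && !hot_standby)
		return "the database system is up as a standby, but is not "
			   "accepting connections (hot_standby is off)";
	return "the database system is starting up";
}

int main(void)
{
	/* standby with hot_standby=off: the new, specific message */
	printf("%s\n", startup_error_message(CAC_STARTUP, true, false));
	/* ordinary crash recovery / startup: the existing message */
	printf("%s\n", startup_error_message(CAC_STARTUP, false, true));
	return 0;
}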
Re: Use compiler intrinsics for bit ops in hash
Hi David,

On Sun, Mar 8, 2020 at 11:34 AM David Fetter wrote:
>
> On Mon, Mar 02, 2020 at 12:45:21PM -0800, Jesse Zhang wrote:
> > Hi David,
>
> Per discussion on IRC with Andrew (RhodiumToad) Gierth:
>
> The runtime detection means there's always an indirect call overhead
> and no way to inline. This is counter to what using compiler
> intrinsics is supposed to do.
>
> It's better to rely on the compiler, because:
> (a) The compiler often knows whether the value can or can't be 0 and
> can therefore skip a conditional jump.

Yes, the compiler knows to eliminate the branch if the inlined function is called with a literal argument, or if it infers an invariant from the context (like nesting inside a conditional block, or a preceding conditional "noreturn" path).

> (b) If you're targeting a recent microarchitecture, the compiler can
> just use the right instruction.

I might be more conservative than you are on (b). The thought of building a binary that cannot run "somewhere" because of what the compiler targets by default still mortifies me.

> (c) Even if the conditional branch is left in, it's not a big overhead.

I 100% agree with (c), see the benchmarking results upthread.

Cheers, Jesse
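A small self-contained illustration of point (a), assuming gcc/clang builtins: when the guarded fallback is inlined and the argument is a compile-time constant, the zero check (and usually the whole computation) folds away, so the "slow" version costs nothing at such call sites. This is a generic sketch, not the patch's code:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* guarded count-leading-zeros: the branch protects against the
 * undefined behavior of __builtin_clzll(0) */
static inline int
clz64_safe(uint64_t word)
{
	if (word == 0)
		return 64;
	return __builtin_clzll(word);
}

static inline int
ceil_log2_64(uint64_t num)
{
	if (num <= 1)
		return 0;
	return 64 - clz64_safe(num - 1);
}

int main(void)
{
	/* constant argument: gcc/clang evaluate this entirely at compile
	 * time at -O2, so the "extra" branch costs nothing here */
	printf("%d\n", ceil_log2_64(1024));	/* prints 10 */

	/* runtime argument: the branch stays, but is well-predicted */
	uint64_t n;
	if (scanf("%" SCNu64, &n) == 1)
		printf("%d\n", ceil_log2_64(n));
	return 0;
}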
Re: More tests to stress directly checksum_impl.h
On Sun, Mar 08, 2020 at 03:12:11PM -0400, Tom Lane wrote:
> After thinking more I concluded that the extra expected files would
> just be a waste of tarball space, at least till such time as we make
> a push to fix all the regression tests to be blocksize-independent.

Makes sense.

> Pushed it with just the 8K files.

Thanks!
--
Michael
Re: PG_COLOR not mentioned in docs of vacuumlo, oid2name and pgbench
On Sat, Mar 07, 2020 at 10:09:23AM +0900, Michael Paquier wrote:
> Thanks to both of you for the reviews. Please note that I will
> mention the business with pg_ctl and logging in a new thread and
> remove the diff of pg_ctl.c from the previous patch, and that the doc
> changes could be backpatched down to 12 for the relevant parts. The
> documentation for PG_COLORS is still missing, but that's not new and I
> think that we had better handle that case separately by creating a new
> section in the docs. For now, let's wait a couple of days and see if
> others have more thoughts to share about the doc patch of this thread.

Hearing nothing, done. The part about pgbench with PGHOST, PGUSER and PGPORT could go further down, but it has been like that for years so I did not bother.
--
Michael
Re: Exposure related to GUC value of ssl_passphrase_command
On 2020/02/14 10:31, Moon, Insung wrote:
> Dear Hackers.
>
> Thank you for your response.
> I registered this entry in the commitfest of 2020-03.
> # I registered it in the security part, but if that is wrong, I sincerely
> apologize for this.
>
> And I'd like to review the show privilege for ssl_* parameters later and
> discuss it in a separate thread.

So, you are planning to start a new discussion about this?

Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Re: Exposure related to GUC value of ssl_passphrase_command
On 2020/03/06 16:20, keisuke kuroda wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature:      tested, passed
> Spec compliant:          tested, passed
> Documentation:           not tested
>
> I tested the patch on the master branch (addd034) and it works fine.
> I think that a test case checking that a non-superuser can't see this
> parameter is unnecessary, since there is a similar test for the
> pg_read_all_settings role.
>
> The new status of this patch is: Ready for Committer

Pushed! Thanks!

Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Re: Asynchronous Append on postgres_fdw nodes.
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature:      tested, passed
Spec compliant:          not tested
Documentation:           not tested

I have tested the feature and it shows great performance for queries that return a small result set compared with the amount of data scanned at the base relations.
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Sun, Mar 8, 2020 at 9:24 PM James Coleman wrote:
>
> On Saturday, March 7, 2020, Dilip Kumar wrote:
>>
>> On Sat, Mar 7, 2020 at 9:59 AM Dilip Kumar wrote:
>> >
>> > On Sat, Mar 7, 2020 at 12:30 AM Andres Freund wrote:
>> > >
>> > > Hi,
>> > >
>> > > On 2020-01-08 18:06:52 +0530, Dilip Kumar wrote:
>> > > > On Wed, 8 Jan 2020 at 5:28 PM, Heikki Linnakangas wrote:
>> > > >
>> > > > > On 25/11/2019 05:52, Dilip Kumar wrote:
>> > > > > > In logical decoding, while sending the changes to the output plugin we
>> > > > > > need to arrange them in the LSN order. But, if there is only one
>> > > > > > transaction, which is a very common case, then we can avoid building
>> > > > > > the binary heap. A small patch is attached for the same.
>> > > > >
>> > > > > Does this make any measurable performance difference? Building a
>> > > > > one-element binary heap seems pretty cheap.
>> > > >
>> > > > I haven't really measured the performance for this. I will try to do that
>> > > > next week. Thanks for looking into this.
>> > >
>> > > Did you do that?
>> >
>> > I tried once on my local machine but could not produce consistent
>> > results. I will try this once again on the performance machine and
>> > report back.
>>
>> I have tried to decode changes for 100,000 small transactions and
>> measured the time with head vs. patch. I did not observe any
>> significant gain.
>>
>> Head
>> ----
>> 519ms
>> 500ms
>> 487ms
>> 501ms
>>
>> Patch
>> -----
>> 501ms
>> 492ms
>> 486ms
>> 489ms
>>
>> IMHO, if we conclude that because there is no performance gain we
>> don't want to add one extra path in the code, then we might want to
>> remove that TODO from the code so that we don't spend time optimizing
>> this in the future.
>
> Would you be able to share your test setup? It seems like it'd be helpful
> to get a larger sample size to better determine if it's measurable or not.
> Visually those 4 runs look to me like it's possible, but objectively I'm
> not sure we can yet conclude one way or the other.

Yeah, my test is very simple:

CREATE TABLE t1 (a int, b int);
SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');

-- run 100,000 small transactions with pgbench
./pgbench -f test.sql -c 1 -j 1 -t 10 -P 1 postgres

-- measure time to decode the changes
time ./psql -d postgres -c "select count(*) from pg_logical_slot_get_changes('regression_slot', NULL, NULL);"

test.sql is just one insert query like below:

insert into t1 values(1,1);

I guess this should be the best case to test this patch because we are decoding a lot of small transactions, but it seems the time taken for creating the binary heap is quite small.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
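For readers skimming the thread, the fast path under discussion can be modeled in a few lines. This toy sketch only illustrates the idea — the real code lives in the reorderbuffer and uses an actual binary heap; none of the names below are PostgreSQL's:

#include <stdio.h>

/* toy model: each "transaction" is an array of LSNs in ascending order */
typedef struct { const long *lsns; int n, pos; } txn_t;

/* return the next change in global LSN order; the single-transaction
 * case skips any heap maintenance entirely */
static const long *
next_change(txn_t *txns, int ntxns)
{
	if (ntxns == 1)				/* the proposed fast path */
	{
		txn_t *t = &txns[0];
		return t->pos < t->n ? &t->lsns[t->pos++] : NULL;
	}

	/* general case: pick the minimum head across transactions (a
	 * binary heap in the real code; a linear scan here for brevity) */
	int best = -1;
	for (int i = 0; i < ntxns; i++)
		if (txns[i].pos < txns[i].n &&
			(best < 0 ||
			 txns[i].lsns[txns[i].pos] < txns[best].lsns[txns[best].pos]))
			best = i;
	return best < 0 ? NULL : &txns[best].lsns[txns[best].pos++];
}

int main(void)
{
	long a[] = {1, 4, 9};
	txn_t one[] = {{a, 3, 0}};
	const long *p;

	while ((p = next_change(one, 1)) != NULL)
		printf("%ld ", *p);
	printf("\n");
	return 0;
}

Dilip's numbers above suggest that setting up a one-element heap is cheap enough that the branch saved by this special case does not show up at this scale.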
Re: Some problems of recovery conflict wait events
On 2020/03/08 13:52, Masahiko Sawada wrote: On Thu, 5 Mar 2020 at 20:16, Fujii Masao wrote: On 2020/03/05 16:58, Masahiko Sawada wrote: On Wed, 4 Mar 2020 at 15:21, Fujii Masao wrote: On 2020/03/04 14:31, Masahiko Sawada wrote: On Wed, 4 Mar 2020 at 13:48, Fujii Masao wrote: On 2020/03/04 13:27, Michael Paquier wrote: On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote: Yeah, so 0001 patch sets existing wait events to recovery conflict resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION) to the recovery conflict on a snapshot. 0003 patch improves these wait events by adding the new type of wait event such as WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch is the fix for existing versions and 0003 patch is an improvement for only PG13. Did you mean even 0001 patch doesn't fit for back-patching? Yes, it looks like a improvement rather than bug fix. Okay, understand. I got my eyes on this patch set. The full patch set is in my opinion just a set of improvements, and not bug fixes, so I would refrain from back-backpatching. I think that the issue (i.e., "waiting" is reported twice needlessly in PS display) that 0002 patch tries to fix is a bug. So it should be fixed even in the back branches. So we need only two patches: one fixes process title issue and another improve wait event. I've attached updated patches. Thanks for updating the patches! I started reading 0001 patch. Thank you for reviewing this patch. - /* -* Report via ps if we have been waiting for more than 500 msec -* (should that be configurable?) -*/ - if (update_process_title && new_status == NULL && - TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(), - 500)) The patch changes ResolveRecoveryConflictWithSnapshot() and ResolveRecoveryConflictWithTablespace() so that they always add "waiting" into the PS display, whether wait is really necessary or not. But isn't it better to display "waiting" in PS basically when wait is necessary, like originally ResolveRecoveryConflictWithVirtualXIDs() does as the above? You're right. Will fix it. ResolveRecoveryConflictWithDatabase(Oid dbid) { + char*new_status = NULL; + + /* Report via ps we are waiting */ + new_status = set_process_title_waiting(); In ResolveRecoveryConflictWithDatabase(), there seems no need to display "waiting" in PS because no wait occurs when recovery conflict with database happens. Isn't the startup process waiting for other backend to terminate? Yeah, you're right. I agree that "waiting" should be reported in this case. Currently ResolveRecoveryConflictWithLock() and ResolveRecoveryConflictWithBufferPin() don't call ResolveRecoveryConflictWithVirtualXIDs and don't report "waiting" in PS display. You changed them so that they report "waiting". I agree to have this change. But this change is an improvement rather than a bug fix, i.e., we should apply this change only in v13? Did you mean ResolveRecoveryConflictWithDatabase and ResolveRecoveryConflictWithBufferPin? Yes! Sorry for my typo. In the current code as far as I researched there are two cases where we don't add "waiting" and one case where we doubly add "waiting". ResolveRecoveryConflictWithDatabase and ResolveRecoveryConflictWithBufferPin don't update the ps title. Although the path where GetCurrentTimestamp() >= ltime is false in ResolveRecoveryConflictWithLock also doesn't update the ps title, it's already updated in WaitOnLock. 
On the other hand, the path where GetCurrentTimestamp() >= ltime is true in ResolveRecoveryConflictWithLock updates the ps title but it's wrong as I reported. I've split the patch into two patches: 0001 patch fixes the issue about doubly updating ps title, 0002 patch makes the recovery conflict resolution on database and buffer pin update the ps title. Thanks for splitting the patches. I think that 0001 patch can be back-patched. - /* -* Report via ps if we have been waiting for more than 500 msec -* (should that be configurable?) -*/ - if (update_process_title && new_status == NULL && - TimestampDifferenceExceeds(waitStart, GetCurrentTimestamp(), - 500)) Originally, "waiting" is reported in PS if we've been waiting for more than 500 msec, as the above does. But you got rid of those codes in the patch. Did you confirm that it's safe to do that? If not, isn't it better to apply the attached patch? The attached patch makes ResolveRecoveryConflictWithVirtualXIDs() report "waiting" as it d
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Sat, Mar 7, 2020 at 9:19 PM Dilip Kumar wrote:
> On Sat, Mar 7, 2020 at 8:53 PM Tom Lane wrote:
> >
> > Dilip Kumar writes:
> > > I think instead of the flag we need to keep the counter because we can
> > > acquire the same relation extension lock multiple times.
> >
> > Uh ... what? How would that not be broken usage on its face?
>
> Basically, if we can ensure that while holding the relation extension
> lock we will not wait for any other lock, then we can ignore it in the
> deadlock detection path so that we don't detect a false deadlock due
> to the group locking mechanism. So if we are already holding the
> relation extension lock and trying to acquire the same lock in the same
> mode, it can never wait, so this is safe.
>
> > I continue to think that we'd be better off getting all of this
> > out of the heavyweight lock manager. There is no reason why we
> > should need deadlock detection, or multiple holds of the same
> > lock, or pretty much anything that LWLocks don't give you.
>
> Right, we never need deadlock detection for this lock. But, I think
> there are quite a few cases where we have multiple holds at the same
> time. E.g., during RelationAddExtraBlocks, while holding the relation
> extension lock we try to update the block in the FSM, and the FSM might
> need to add an extra FSM block, which will again try to acquire the
> same lock.
>
> But, I think the main reason for not converting it to LWLocks is
> that Andres has a concern about inventing a new lock mechanism, as
> discussed upthread [1].

Right, that is one point; another is that if we go via the route of converting it to LWLocks, then we also need to think of some solution for the page locks that are used in ginInsertCleanup. However, if we go with the approach being pursued [1], then the page locks will be handled in a similar way to relation extension locks.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BNjo%2BpnqSNi2ScKf0BcVBWWf37BrW-pykVSG0B0C5Qig%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: Crash by targetted recovery
At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao wrote in
> > (It seems to be reverting to the first patch from when I started this...)
> > The second place covers wider cases so I reverted the first place.
>
> Thanks for updating the patch that way.
> Not sure which patch you're mentioning, though.

That meant 0003.

> Regarding 0003 patch, I added a bit more detail comments into
> the patch so that we can understand the code more easily.
> Updated version of 0003 patch attached. Barring any objection,
> at first, I plan to commit this patch.

Looks good to me. Thanks for writing the detailed comments.

> >> + lastSourceFailed = false; /* We haven't failed on the new source */
> >>
> >> Is this really necessary? Since ReadRecord() always resets
> >> lastSourceFailed to false, it seems not necessary.
> > It's just to make sure. Actually lastSourceFailed is always false
> > when we get there. But when the source is switched, lastSourceFailed
> > should be changed to false as a matter of design. I'd like to do that
> > unless it does harm.
>
> OK.

Thanks.

> >> - else if (currentSource == 0)
> >> + else if (currentSource == 0 ||
> >>
> >> Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
> >> There are some places where 0 is used as the value of currentSource.
> >> IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.
> > Yeah, I've thought that many times but have neglected it since it is
> > not critical, and trivial as a separate patch. I'd take the chance to
> > do that now. Another minor glitch is "int oldSource = currentSource;";
> > it is not debugger-friendly, so I changed it to XLogSource. It is added
> > as a new patch file before the main patch.
>
> There seem to be more places where XLogSource and
> XLOG_FROM_XXX are not used yet. For example, the initial values
> of readSource and XLogReceiptSource, the type of argument
> "source" in XLogFileReadAnyTLI() and XLogFileRead(), etc.
> These also should be updated?

Right. I checked through the file and AFAICS that's all. The attached v5-0001-Tidy...patch is the fix on top of the v4-0003 on the current master.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From 358f37899a02a8ae6f803fe32b97c2cbe302786f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 6 Mar 2020 10:11:59 +0900
Subject: [PATCH v5] Tidy up XLogSource usage.

We used integer 0 instead of XLOG_FROM_ANY and defined a variable to
store the type with int. Tidy them up for readability and
debugger-friendliness.
---
 src/backend/access/transam/xlog.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 21c0acb740..9ebfbf31c5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -795,7 +795,7 @@ static int readFile = -1;
 static XLogSegNo readSegNo = 0;
 static uint32 readOff = 0;
 static uint32 readLen = 0;
-static XLogSource readSource = 0;	/* XLOG_FROM_* code */
+static XLogSource readSource = XLOG_FROM_ANY;
 
 /*
  * Keeps track of which source we're currently reading from. This is
@@ -804,7 +804,7 @@ static XLogSource readSource = 0; /* XLOG_FROM_* code */
  * attempt to read from currentSource failed, and we should try another source
  * next.
  */
-static XLogSource currentSource = 0;	/* XLOG_FROM_* code */
+static XLogSource currentSource = XLOG_FROM_ANY;
 static bool lastSourceFailed = false;
 
 typedef struct XLogPageReadPrivate
@@ -823,7 +823,7 @@ typedef struct XLogPageReadPrivate
  * XLogReceiptSource tracks where we last successfully read some WAL.)
  */
 static TimestampTz XLogReceiptTime = 0;
-static XLogSource XLogReceiptSource = 0;	/* XLOG_FROM_* code */
+static XLogSource XLogReceiptSource = XLOG_FROM_ANY;
 
 /* State information for XLOG reading */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
@@ -886,8 +886,8 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 						bool find_free, XLogSegNo max_segno,
 						bool use_lock);
 static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
-						 int source, bool notfoundOk);
-static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
+						 XLogSource source, bool notfoundOk);
+static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source);
 static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 						 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
@@ -3633,7 +3633,7 @@ XLogFileOpen(XLogSegNo segno)
  */
 static int
 XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
-			 int source, bool notfoundOk)
+			 XLogSource source, bool notfoundOk)
 {
 	char		xlogfname[MAXFNAMELEN];
 	char		activitymsg[MAXFNAMELEN + 16];
@@ -3715,7 +3715,7 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
  * This version searches fo
Re: Some problems of recovery conflict wait events
On Mon, 9 Mar 2020 at 13:24, Fujii Masao wrote: > > > > On 2020/03/08 13:52, Masahiko Sawada wrote: > > On Thu, 5 Mar 2020 at 20:16, Fujii Masao > > wrote: > >> > >> > >> > >> On 2020/03/05 16:58, Masahiko Sawada wrote: > >>> On Wed, 4 Mar 2020 at 15:21, Fujii Masao > >>> wrote: > > > > On 2020/03/04 14:31, Masahiko Sawada wrote: > > On Wed, 4 Mar 2020 at 13:48, Fujii Masao > > wrote: > >> > >> > >> > >> On 2020/03/04 13:27, Michael Paquier wrote: > >>> On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote: > Yeah, so 0001 patch sets existing wait events to recovery conflict > resolution. For instance, it sets (PG_WAIT_LOCK | > LOCKTAG_TRANSACTION) > to the recovery conflict on a snapshot. 0003 patch improves these > wait > events by adding the new type of wait event such as > WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) > patch > is the fix for existing versions and 0003 patch is an improvement for > only PG13. Did you mean even 0001 patch doesn't fit for > back-patching? > >> > >> Yes, it looks like a improvement rather than bug fix. > >> > > > > Okay, understand. > > > >>> I got my eyes on this patch set. The full patch set is in my opinion > >>> just a set of improvements, and not bug fixes, so I would refrain from > >>> back-backpatching. > >> > >> I think that the issue (i.e., "waiting" is reported twice needlessly > >> in PS display) that 0002 patch tries to fix is a bug. So it should be > >> fixed even in the back branches. > > > > So we need only two patches: one fixes process title issue and another > > improve wait event. I've attached updated patches. > > Thanks for updating the patches! I started reading 0001 patch. > >>> > >>> Thank you for reviewing this patch. > >>> > > - /* > -* Report via ps if we have been waiting for > more than 500 msec > -* (should that be configurable?) > -*/ > - if (update_process_title && new_status == NULL && > - TimestampDifferenceExceeds(waitStart, > GetCurrentTimestamp(), > - > 500)) > > The patch changes ResolveRecoveryConflictWithSnapshot() and > ResolveRecoveryConflictWithTablespace() so that they always add > "waiting" into the PS display, whether wait is really necessary or not. > But isn't it better to display "waiting" in PS basically when wait is > necessary, like originally ResolveRecoveryConflictWithVirtualXIDs() > does as the above? > >>> > >>> You're right. Will fix it. > >>> > > ResolveRecoveryConflictWithDatabase(Oid dbid) > { > + char*new_status = NULL; > + > + /* Report via ps we are waiting */ > + new_status = set_process_title_waiting(); > > In ResolveRecoveryConflictWithDatabase(), there seems no need to > display "waiting" in PS because no wait occurs when recovery conflict > with database happens. > >>> > >>> Isn't the startup process waiting for other backend to terminate? > >> > >> Yeah, you're right. I agree that "waiting" should be reported in this case. > >> > >> Currently ResolveRecoveryConflictWithLock() and > >> ResolveRecoveryConflictWithBufferPin() don't call > >> ResolveRecoveryConflictWithVirtualXIDs and don't report "waiting" > >> in PS display. You changed them so that they report "waiting". I agree > >> to have this change. But this change is an improvement rather than > >> a bug fix, i.e., we should apply this change only in v13? > >> > > > > Did you mean ResolveRecoveryConflictWithDatabase and > > ResolveRecoveryConflictWithBufferPin? > > Yes! Sorry for my typo. 
> > > In the current code as far as I > > researched there are two cases where we don't add "waiting" and one > > case where we doubly add "waiting". > > > > ResolveRecoveryConflictWithDatabase and > > ResolveRecoveryConflictWithBufferPin don't update the ps title. > > Although the path where GetCurrentTimestamp() >= ltime is false in > > ResolveRecoveryConflictWithLock also doesn't update the ps title, it's > > already updated in WaitOnLock. On the other hand, the path where > > GetCurrentTimestamp() >= ltime is true in > > ResolveRecoveryConflictWithLock updates the ps title but it's wrong as > > I reported. > > > > I've split the patch into two patches: 0001 patch fixes the issue > > about doubly updating ps title, 0002 patch makes the recovery conflict > > resolution on database and buffer pin update the ps title. > > Thanks for splitting the patches. I think that 0001 patch can be back-patched. >
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander wrote in
> On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao wrote:
> > I believe that the time required to estimate the backup size is not so
> > large in most cases, so in the above idea, most users don't need to
> > specify any extra option for the estimation. This is good from a UI
> > perspective.
> >
> > OTOH, users who are worried about the estimation time can use the
> > --no-estimate-backup-size option and skip the time-consuming estimation.
>
> Personally, I think this is the best idea. It brings a "reasonable
> default", since most people are not going to have this problem, and
> yet a good way to get out from the issue for those that potentially
> have it. Especially since we are now already showing the state that
> "walsender is estimating the size", it should be easy enough for people
> to determine if they need to use this flag or not.
>
> In nitpicking mode, I'd just call the flag --no-estimate-size -- it's
> pretty clear things are about backups when you call pg_basebackup, and
> it keeps the option a bit more reasonable in length.

I agree to the negative option and the shortened name. What if both --no-estimate-size and -P are specified? Rejecting them as conflicting options, or letting -P supersede? I would choose the former because we don't know which of them has priority.

$ pg_basebackup --no-estimate-size -P
pg_basebackup: -P requires size estimate.
$

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
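A sketch of the kind of option check being suggested here, modeled as a standalone program — the variable names and message wording are assumptions for illustration, not the actual pg_basebackup code:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* hypothetical flags, as pg_basebackup's option parsing might set them */
static bool showprogress = false;	/* -P / --progress */
static bool estimatesize = true;	/* cleared by --no-estimate-size */

static void
check_options(void)
{
	/* progress reporting needs the size estimate as its denominator,
	 * so reject the combination rather than silently picking a winner */
	if (showprogress && !estimatesize)
	{
		fprintf(stderr, "pg_basebackup: --progress and --no-estimate-size "
						"are incompatible\n");
		exit(1);
	}
}

int main(void)
{
	showprogress = true;
	estimatesize = false;
	check_options();			/* exits with the error above */
	return 0;
}

Rejecting the combination outright, as Horiguchi-san suggests, avoids having to document which option "wins".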
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada < masahiko.saw...@2ndquadrant.com> wrote: > On Mon, 24 Feb 2020 at 19:08, Amit Kapila wrote: > > > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund > wrote: > > > > > > Hi, > > > > > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote: > > > > I think till we know the real need for changing group locking, going > > > > in the direction of what Tom suggested to use an array of LWLocks [1] > > > > to address the problems in hand is a good idea. > > > > > > -many > > > > > > I think that building yet another locking subsystem is the entirely > > > wrong idea - especially when there's imo no convincing architectural > > > reasons to do so. > > > > > > > Hmm, AFAIU, it will be done by having an array of LWLocks which we do > > at other places as well (like BufferIO locks). I am not sure if we > > can call it as new locking subsystem, but if we decide to continue > > using lock.c and change group locking then I think we can do that as > > well, see my comments below regarding that. > > > > > > > > > It is not very clear to me that are we thinking to give up on Tom's > > > > idea [1] and change group locking even though it is not clear or at > > > > least nobody has proposed an idea/patch which requires that? Or are > > > > we thinking that we can do what Tom suggested for relation extension > > > > lock and also plan to change group locking for future parallel > > > > operations that might require it? > > > > > > What I'm advocating is that extension locks should continue to go > > > through lock.c. And yes, that requires some changes to group locking, > > > but I still don't see why they'd be complicated. > > > > > > > Fair position, as per initial analysis, I think if we do below three > > things, it should work out without changing to a new way of locking > > for relation extension or page type locks. > > a. As per the discussion above, ensure in code we will never try to > > acquire another heavy-weight lock after acquiring relation extension > > or page type locks (probably by having Asserts in code or maybe some > > other way). > > The current patch > (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch) > doesn't check that acquiring a heavy-weight lock after page type lock, > is that right? No, it should do that. > There is the path doing that: ginInsertCleanup() holds > a page lock and insert the pending list items, which might hold a > relation extension lock. > Right, I could also see that, but do you see any problem with that? I agree that Assert should cover this case, but I don't see any fundamental problem with that. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
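To illustrate item (a) from the list Amit quotes above, here is a toy model of the invariant such an Assert would enforce: once a backend holds a relation extension or page lock, it may re-acquire locks of those kinds (which can never block the holder), but it must never wait on any other heavy-weight lock. All names are invented for the example; in PostgreSQL the check would live in the lock manager:

#include <assert.h>
#include <stdio.h>

typedef enum { LOCK_RELATION, LOCK_EXTEND, LOCK_PAGE } locktype_t;

/* per-backend count of held extension/page locks, as the lock
 * manager might track it */
static int held_extend_or_page = 0;

static void
lock_acquire(locktype_t t)
{
	/*
	 * Invariant: while holding a relation extension or page lock, we
	 * never wait for any other heavy-weight lock; only further
	 * extension/page lock acquisitions are allowed.
	 */
	assert(held_extend_or_page == 0 || t == LOCK_EXTEND || t == LOCK_PAGE);
	if (t == LOCK_EXTEND || t == LOCK_PAGE)
		held_extend_or_page++;
}

static void
lock_release(locktype_t t)
{
	if (t == LOCK_EXTEND || t == LOCK_PAGE)
		held_extend_or_page--;
}

int main(void)
{
	lock_acquire(LOCK_EXTEND);	/* extend the relation */
	lock_acquire(LOCK_EXTEND);	/* e.g. recursing to extend the FSM: ok */
	lock_release(LOCK_EXTEND);
	lock_release(LOCK_EXTEND);
	lock_acquire(LOCK_RELATION);	/* fine: nothing risky held */
	printf("invariant held\n");
	return 0;
}

The ginInsertCleanup() path mentioned above (page lock held, then a relation extension lock taken) passes this model's check; a heavy-weight lock of any other kind taken in that state would trip the Assert.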
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On Sun, Mar 8, 2020 at 10:13 PM Kyotaro Horiguchi wrote: > > At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander > wrote in > > On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao > > wrote: > > > I believe that the time required to estimate the backup size is not so > > > large > > > in most cases, so in the above idea, most users don't need to specify more > > > option for the estimation. This is good for UI perspective. > > > > > > OTOH, users who are worried about the estimation time can use > > > --no-estimate-backup-size option and skip the time-consuming estimation. > > > > Personally, I think this is the best idea. it brings a "reasonable > > default", since most people are not going to have this problem, and > > yet a good way to get out from the issue for those that potentially > > have it. Especially since we are now already showing the state that > > "walsender is estimating the size", it should be easy enugh for people > > to determine if they need to use this flag or not. > > > > In nitpicking mode, I'd just call the flag --no-estimate-size -- it's > > pretty clear things are about backups when you call pg_basebackup, and > > it keeps the option a bit more reasonable in length. > > I agree to the negative option and the shortened name. What if both > --no-estimate-size and -P are specifed? Rejecting as conflicting > options or -P supercedes? I would choose the former because we don't > know which of them has priority. I would definitely prefer rejecting an invalid combination of options. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Exposure related to GUC value of ssl_passphrase_command
Dear Kuroda-san, Fujii-san

Thank you for the review and commit!

# Oops, sorry. This mail thread had been flagged as spam in Gmail.

I'll start a new discussion once I've worked out which cases could leak the GUC parameters related to ssl_*. Please wait a bit.

Best regards.
Moon.

On Mon, Mar 9, 2020 at 11:43 AM Fujii Masao wrote:
>
> On 2020/02/14 10:31, Moon, Insung wrote:
> > Dear Hackers.
> >
> > Thank you for your response.
> > I registered this entry in the commitfest of 2020-03.
> > # I registered it in the security part, but if that is wrong, I sincerely
> > apologize for this.
> >
> > And I'd like to review the show privilege for ssl_* parameters later and
> > discuss it in a separate thread.
>
> So, you are planning to start a new discussion about this?
>
> Regards,
>
> --
> Fujii Masao
> NTT DATA CORPORATION
> Advanced Platform Technology Group
> Research and Development Headquarters
Re: logical copy_replication_slot issues
On Fri, 6 Mar 2020 at 20:02, Arseny Sher wrote:
>
> I wrote:
>
> > It looks good to me now.
>
> After lying for some time in my head it reminded me that
> CreateInitDecodingContext not only pegs the LSN, but also xmin, so
> attached makes a minor comment correction.
>
> While taking a look at the nearby code it seemed weird to me that
> GetOldestSafeDecodingTransactionId checks PGXACT->xid, not xmin. Don't
> want to investigate this at the moment though, and not for this thread.
>
> Also not for this thread, but I've noticed
> pg_copy_logical_replication_slot doesn't allow changing the plugin name,
> which is an omission in my view. It would be useful and trivial to do.

Thank you for updating the patch. The patch looks basically good to me, but I have a few questions:

 /*
- * Create logical decoding context, to build the initial snapshot.
+ * Create logical decoding context to find start point or, if we don't
+ * need it, to 1) bump slot's restart_lsn and xmin 2) check plugin sanity.
  */

Do we need the numbering here even though the comment doesn't refer to those numbers again?

 ctx = CreateInitDecodingContext(plugin, NIL,
-                                false, /* do not build snapshot */
+                                false, /* do not build data snapshot */
                                 restart_lsn,
                                 logical_read_local_xlog_page, NULL, NULL,
                                 NULL);

I'm not sure this change makes the comment better. Could you elaborate on the motivation for this change?

Regards,
--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: reindex concurrently and two toast indexes
On Fri, Mar 06, 2020 at 01:36:48PM +0100, Julien Rouhaud wrote:
> Ah I see, thanks for the clarification. I guess there's room for improvement
> in the comments about that, since the ERRCODE_FEATURE_NOT_SUPPORTED usage is
> quite misleading there.
>
> v4 attached, which doesn't prevent a REINDEX INDEX CONCURRENTLY on any
> invalid non-TOAST index anymore.

Thanks. The position of the error check in reindex_relation() is correct, but as it opens a relation for the cache lookup, let's invent a new routine in lsyscache.c to grab pg_index.indisvalid. It is possible to make use of this routine with all the other checks:
- WARNING for REINDEX TABLE (non-concurrent)
- ERROR for REINDEX INDEX (non-concurrent)
- ERROR for REINDEX INDEX CONCURRENTLY
(There is already a WARNING for REINDEX TABLE CONCURRENTLY.)

I did not find the addition of an error check in ReindexIndex() consistent with the existing practice of checking the state of the relation reindexed in reindex_index() (for the non-concurrent case) and ReindexRelationConcurrently() (for the concurrent case). Okay, this leads to the introduction of two new ERROR messages related to invalid toast indexes for the concurrent and the non-concurrent cases when using REINDEX INDEX instead of one, but having two messages leads to something much more consistent with the rest, and all checks remain centralized in the same routines.

For the index-level operation, issuing a WARNING is not consistent with the existing practice of using an ERROR, which is more adapted as the operation is done on a single index at a time.

For the check in reindex_relation, it is more consistent to check the namespace of the index instead of the parent relation IMO (the previous patch used "rel", which refers to the parent table). This has in practice no consequence though.

It would have been nice to test this stuff. However, this requires an invalid toast index, and we cannot create one except by canceling a concurrent reindex, leading us back to the upthread discussion about isolation tests, timeouts and fault injection :/

Any opinions?
--
Michael

From 2e808a2971fcaf160f6a1e6c9c80366ff053bccf Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Mon, 9 Mar 2020 14:43:33 +0900
Subject: [PATCH v2] Forbid reindex of invalid indexes on TOAST tables

Such indexes can only be duplicated leftovers of a previously failed
REINDEX CONCURRENTLY command, and a valid equivalent is guaranteed to
exist already. As we only allow the drop of invalid indexes on TOAST
tables, reindexing these would lead to useless duplicated indexes that
can't be dropped anymore.

Thanks to Justin Pryzby for reminding that this problem was reported
long ago, but it has never been addressed.

Reported-by: Sergei Kornilov, Justin Pryzby
Author: Julien Rouhaud
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/36712441546604286%40sas1-890ba5c2334a.qloud-c.yandex.net
Discussion: https://postgr.es/m/20200216190835.ga21...@telsasoft.com
Backpatch-through: 12
---
 src/include/utils/lsyscache.h       |  1 +
 src/backend/catalog/index.c         | 28
 src/backend/commands/indexcmds.c    | 12
 src/backend/utils/cache/lsyscache.c | 23 +++
 4 files changed, 64 insertions(+)

diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f132d39458..131d10eab0 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -181,6 +181,7 @@ extern char *get_namespace_name_or_temp(Oid nspid);
 extern Oid get_range_subtype(Oid rangeOid);
 extern Oid get_range_collation(Oid rangeOid);
 extern Oid get_index_column_opclass(Oid index_oid, int attno);
+extern bool get_index_isvalid(Oid index_oid);
 
 #define type_is_array(typid) (get_element_type(typid) != InvalidOid)
 /* type_is_array_domain accepts both plain arrays and domains over arrays */
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 7223679033..ec2d7dc9cb 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -46,6 +46,7 @@
 #include "catalog/pg_depend.h"
 #include "catalog/pg_description.h"
 #include "catalog/pg_inherits.h"
+#include "catalog/pg_namespace_d.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_tablespace.h"
@@ -3474,6 +3475,17 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot reindex temporary tables of other sessions")));
 
+	/*
+	 * Don't allow reindex on an invalid toast index.  This is a leftover from
+	 * a failed REINDEX CONCURRENTLY, and if rebuilt it would not be possible
+	 * to drop it anymore.
+	 */
+	if (RelationGetNamespace(iRel) == PG_TOAST_NAMESPACE &&
+		!get_index_isvalid(indexId))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot reindex invalid toast index")));
+
 	/*
 	 * Also check for active uses of the index in the curre
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Mon, 9 Mar 2020 at 14:16, Amit Kapila wrote: > > On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada > wrote: >> >> On Mon, 24 Feb 2020 at 19:08, Amit Kapila wrote: >> > >> > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund wrote: >> > > >> > > Hi, >> > > >> > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote: >> > > > I think till we know the real need for changing group locking, going >> > > > in the direction of what Tom suggested to use an array of LWLocks [1] >> > > > to address the problems in hand is a good idea. >> > > >> > > -many >> > > >> > > I think that building yet another locking subsystem is the entirely >> > > wrong idea - especially when there's imo no convincing architectural >> > > reasons to do so. >> > > >> > >> > Hmm, AFAIU, it will be done by having an array of LWLocks which we do >> > at other places as well (like BufferIO locks). I am not sure if we >> > can call it as new locking subsystem, but if we decide to continue >> > using lock.c and change group locking then I think we can do that as >> > well, see my comments below regarding that. >> > >> > > >> > > > It is not very clear to me that are we thinking to give up on Tom's >> > > > idea [1] and change group locking even though it is not clear or at >> > > > least nobody has proposed an idea/patch which requires that? Or are >> > > > we thinking that we can do what Tom suggested for relation extension >> > > > lock and also plan to change group locking for future parallel >> > > > operations that might require it? >> > > >> > > What I'm advocating is that extension locks should continue to go >> > > through lock.c. And yes, that requires some changes to group locking, >> > > but I still don't see why they'd be complicated. >> > > >> > >> > Fair position, as per initial analysis, I think if we do below three >> > things, it should work out without changing to a new way of locking >> > for relation extension or page type locks. >> > a. As per the discussion above, ensure in code we will never try to >> > acquire another heavy-weight lock after acquiring relation extension >> > or page type locks (probably by having Asserts in code or maybe some >> > other way). >> >> The current patch >> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch) >> doesn't check that acquiring a heavy-weight lock after page type lock, >> is that right? > > > No, it should do that. > >> >> There is the path doing that: ginInsertCleanup() holds >> a page lock and insert the pending list items, which might hold a >> relation extension lock. > > > Right, I could also see that, but do you see any problem with that? I agree > that Assert should cover this case, but I don't see any fundamental problem > with that. I think that could be a problem if we change the group locking so that it doesn't consider page lock type. Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Crash by targetted recovery
On 2020/03/09 13:49, Kyotaro Horiguchi wrote:
> At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao wrote in
> >> (It seems to be reverting to the first patch from when I started this...)
> >> The second place covers wider cases so I reverted the first place.
> >
> > Thanks for updating the patch that way.
> > Not sure which patch you're mentioning, though.
>
> That meant 0003.
>
> > Regarding 0003 patch, I added a bit more detail comments into
> > the patch so that we can understand the code more easily.
> > Updated version of 0003 patch attached. Barring any objection,
> > at first, I plan to commit this patch.
>
> Looks good to me. Thanks for writing the detailed comments.

Thanks for the review! Pushed. I will review the other two patches later.

> > There seem to be more places where XLogSource and
> > XLOG_FROM_XXX are not used yet. For example, the initial values
> > of readSource and XLogReceiptSource, the type of argument
> > "source" in XLogFileReadAnyTLI() and XLogFileRead(), etc.
> > These also should be updated?
>
> Right. I checked through the file and AFAICS that's all. The attached
> v5-0001-Tidy...patch is the fix on top of the v4-0003 on the current
> master.

Thanks for the patch!

Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Mon, Mar 9, 2020 at 11:38 AM Masahiko Sawada < masahiko.saw...@2ndquadrant.com> wrote:
> On Mon, 9 Mar 2020 at 14:16, Amit Kapila wrote:
> >
> > On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada <
> > masahiko.saw...@2ndquadrant.com> wrote:
> >> >
> >> > Fair position, as per initial analysis, I think if we do below three
> >> > things, it should work out without changing to a new way of locking
> >> > for relation extension or page type locks.
> >> > a. As per the discussion above, ensure in code we will never try to
> >> > acquire another heavy-weight lock after acquiring relation extension
> >> > or page type locks (probably by having Asserts in code or maybe some
> >> > other way).
> >>
> >> The current patch
> >> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
> >> doesn't check that acquiring a heavy-weight lock after page type lock,
> >> is that right?
> >
> > No, it should do that.
> >
> >> There is the path doing that: ginInsertCleanup() holds
> >> a page lock and insert the pending list items, which might hold a
> >> relation extension lock.
> >
> > Right, I could also see that, but do you see any problem with that? I
> > agree that Assert should cover this case, but I don't see any
> > fundamental problem with that.
>
> I think that could be a problem if we change the group locking so that
> it doesn't consider page lock type.

I might be missing something, but won't that be a problem only if there is a case where we acquire a page lock after acquiring a relation extension lock? Can you please explain the scenario you have in mind which can create a problem?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com