Re: [HACKERS] REVIEW proposal: a validator for configuration files
On 09/10/2011 11:39 AM, Alexey Klyukin wrote:

Hi Andy,

On Sep 7, 2011, at 6:40 AM, Andy Colson wrote:

Hi Alexey, I was taking a quick look at this patch, and have a question for ya. ... Where did the other warnings go? It's right, though: line 570 is bad. It also seems to have killed the server. I have not gotten through the history of messages regarding this patch, but is it supposed to kill the server if there is a syntax error in the config file?

Thank you for looking at this patch. v4 was more a "what if" concept that took a lot of time for somebody to look at. There were a lot of problems with it, but I think I've nailed down most of them. Attached is v5. It should fix both problems you've experienced with v4.

As with the current code, the startup process gets interrupted on any error detected in the configuration file. Unlike the current code, the patch tries to report all of the errors before bailing out. The behavior during configuration reload has changed significantly: instead of ignoring all changes after the first error, the code reports the problematic value and continues. It skips applying the new values completely only on syntax errors and invalid configuration option names. In no case should it bring the server down during reload.

One problem I'm not sure how to address is the fact that we require 2 calls of set_config_option for each option: one to verify the new value and another to actually apply it. Normally, this function returns true for a valid value and false if it is unable to set the new value for whatever reason (either the value is invalid, or the value cannot be changed in the context of the caller). However, calling it once per value in 'apply' mode during reload produces false for every unchanged value that can only be changed during startup (e.g. shared_buffers or max_connections). If we ignore its return value, we lose the ability to detect whether invalid values were encountered during the reload and to report this fact at the end of the function. I think the function should be changed, as described in my previous email (http://archives.postgresql.org/message-id/97A66029-9D3E-43AF-95AA-46FE1B447447(at)commandprompt(dot)com), and I'd like to hear other opinions on that. Meanwhile, due to the 2 calls to set_config_option, it currently reports all invalid values twice. If others are opposed to changing set_config_option, I'll fix this by removing the first, verification call and the final 'errors were detected' warning, to avoid 'false positives' from it (i.e. the WARNING you saw with the previous version for the valid .conf).

I'd appreciate your further comments and suggestions. Thank you.

--
Alexey Klyukin
http://www.commandprompt.com
The PostgreSQL Company – Command Prompt, Inc.

After a quick two minute test, this patch seems to work well. It does just what I think it should. I'll add it to the commitfest page for ya.

-Andy

Alexey, I've included my review procedure outline and output below. Your patch behaves as you described in the email containing 'v5', and judging by Andy's earlier success, I think this is ready to go. I will mark this 'Ready for Committer'.
Alexander

---

TESTING METHODOLOGY
===

* Start postgres with stock, valid postgresql.conf
* Record output
* Stop postgres
* Add invalid configuration option to postgresql.conf
* Start postgres
* Record output
* Stop postgres
* Add configuration option with bad syntax to postgresql.conf
* Start postgres
* Record output

SETUP
-

./configure --prefix=~/builds/pgsql/orig; make install
./configure --prefix=~/builds/pgsql/patched/guc-param-validator; make install
cd ~/builds/pgsql/orig; bin/initdb -D data; bin/postgres -D data
cd ~/builds/pgsql/patched/guc-param-validator; bin/initdb -D data; bin/postgres -D data

OUTPUT
--

orig:

LOG:  database system was shut down at 2011-09-28 20:26:17 PDT
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

# alter postgresql.conf, add bogus param
# unknown_config_param = on
$ kill -9 `pgrep postgres | head -n1`; bin/postgres -D data
FATAL:  unrecognized configuration parameter "unknown_config_param"

# alter postgresql.conf, add param with bad syntax
# unknown_config_param @@= on
$ kill -9 `pgrep postgres | head -n1`; bin/postgres -D data
FATAL:  syntax error in file "/home/av/builds/pgsql/orig/data/postgresql.conf" line 156, near token "@"

patched:

LOG:  database system was shut down at 2011-09-28 20:12:39 PDT
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

# alter postgresql.conf, add bogus param
# unknown_config_param = on
$ kill -9 `pgrep postgres | head -n1`; bin/postgres -D data
LOG:  unrecognized configuration parameter "unknown_config_param"
LOG
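Returning to the set_config_option() question raised above: one way to resolve it would be a three-valued return that disentangles "invalid value" from "valid but unchangeable in this context". Below is a minimal sketch of that idea; GucSetResult and set_config_option3 are hypothetical names for illustration and do not exist in guc.c:

/*
 * Hypothetical three-valued result for set_config_option(),
 * illustrating the proposal discussed above.
 */
typedef enum GucSetResult
{
	GUC_SET_ERROR,			/* the value is invalid */
	GUC_SET_APPLIED,		/* the value was accepted and applied */
	GUC_SET_UNCHANGEABLE	/* the value is valid but cannot be changed
							 * in this context, e.g. a startup-only
							 * option such as shared_buffers during a
							 * reload */
} GucSetResult;

/* Assumed setter with the richer result; not a real function. */
extern GucSetResult set_config_option3(const char *name, const char *value);

/*
 * The reload path could then make a single "apply" call per option and
 * still count only genuine errors, avoiding both the double reporting
 * and the false positives mentioned above.
 */
static int
apply_config_values(const char **names, const char **values, int n)
{
	int			errors = 0;

	for (int i = 0; i < n; i++)
	{
		if (set_config_option3(names[i], values[i]) == GUC_SET_ERROR)
			errors++;		/* valid-but-unchangeable doesn't count */
	}
	return errors;
}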
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Sat, Feb 27, 2016 at 2:44 AM, Andres Freund wrote:
> On 2016-02-02 13:12:50 +0300, Alexander Korotkov wrote:
> > On Tue, Feb 2, 2016 at 12:43 AM, Andres Freund wrote:
> > > On 2016-02-01 13:06:57 +0300, Alexander Korotkov wrote:
> > > > On Mon, Feb 1, 2016 at 11:34 AM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> > > > >> Client  Base    Patch
> > > > >> 1       19744   19382
> > > > >> 8       125923  126395
> > > > >> 3231393151
> > > > >> 64      387339  496830
> > > > >> 128     306412  350610
> > > > >>
> > > > >> Shared Buffer = 512MB
> > > > >> max_connections = 150
> > > > >> Scale Factor = 300
> > > > >>
> > > > >> ./pgbench -j$ -c$ -T300 -M prepared -S postgres
> > > > >>
> > > > >> Client  Base    Patch
> > > > >> 1       17169   16454
> > > > >> 8       108547  105559
> > > > >> 32      241619  262818
> > > > >> 64      206868  233606
> > > > >> 128     137084  217013
> > >
> > > So, there's a small regression on low client counts. That's worth
> > > addressing.
> >
> > Interesting. I'll try to reproduce it.
>
> Any progress here?

I didn't reproduce the regression. I had access to a multicore machine, but saw neither the regression at low client counts nor the improvements at high client counts.

In the attached patch, the spinlock delay was exposed in s_lock.h and used in LockBufHdr(). Dilip, could you try this version of the patch? Could you also run perf or another profiler in the case of regression? It would be nice to compare profiles with and without the patch. We could probably find the cause of the regression.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pinunpin-cas-3.patch
Description: Binary data
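For readers who don't open the attachment: the LockBufHdr() change mentioned above uses the spin-delay machinery exposed from s_lock.h roughly as sketched below. This is a simplified illustration of the approach, not the patch verbatim; BM_LOCKED is the header-lock bit in the buffer's atomic state word.

/*
 * Sketch: acquire the buffer-header lock bit, with exponential back-off
 * provided by the SpinDelayStatus machinery from s_lock.h.
 */
static uint32
LockBufHdrSketch(BufferDesc *desc)
{
	SpinDelayStatus delayStatus;
	uint32		old_buf_state;

	init_local_spin_delay(&delayStatus);

	for (;;)
	{
		/* set the BM_LOCKED flag */
		old_buf_state = pg_atomic_fetch_or_u32(&desc->state, BM_LOCKED);
		/* if it wasn't set before, we are the lock holder */
		if (!(old_buf_state & BM_LOCKED))
			break;
		perform_spin_delay(&delayStatus);
	}
	finish_spin_delay(&delayStatus);
	return old_buf_state | BM_LOCKED;
}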
Re: [HACKERS] The plan for FDW-based sharding
On Tue, Mar 1, 2016 at 7:03 PM, Robert Haas wrote:
> On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian wrote:
> > On Tue, Mar 1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
> >> > Two reasons:
> >> > 1. There is no ideal implementation of DTM which will fit all possible needs
> >> > and be efficient for all clusters.
> >>
> >> Hmm, what is the reasoning behind that statement? I mean, it is
> >> certainly true that there are some places where we have decided that
> >> one-size-fits-all is not the right approach. Indexing, for example.
> >
> > Uh, is that even true of indexing? While the plug-in nature of indexing
> > allows for easier development and testing, does anyone create plug-in
> > indexing that isn't shipped by us? I thought WAL support was something
> > that prevented external indexing solutions from working.
>
> True. There is an API, though, and having pluggable WAL support seems
> desirable too. At the same time, I don't think we know of anyone
> maintaining a non-core index AM ... and there are probably good
> reasons for that.

That's because we never offered a legitimate mechanism for pluggable AMs.

> We end up revising the index AM API pretty
> regularly every time somebody wants to do something new, so it's not
> really a stable API that extensions can just tap into.

I can't buy this argument. One may say this about any API. Thinking that way leads to rejecting any extensibility, and that would directly contradict the original Postgres concept.

During the last 5 years we added 2 new AMs: SP-GiST and BRIN, and BRIN is very different from any AM we had before. Yet I wouldn't say the AM API changed dramatically during that time. There were some changes, but adopting them would be normal work for extension maintainers, just as it is for other API changes.

There is a simple example where we suffer from the lack of extensible AMs: fast full-text search. We can't provide it with the current GIN, because GIN lacks positional information. And we can't push these advances into core, because the current implementation's design isn't perfect. The ideal design would be to push all the required functionality into btree, then make GIN a wrapper over btree, then add the required functionality on top. But that is a roadmap for 5-10 years, and for those 5-10 years users will suffer from having third-party solutions for fast FTS instead of an in-core one. Our design questions are actually not something users care about; they are not reliability questions. Having pluggable AMs would be a real chance in this situation: users could use an extension right now, and then, when after many years we finally implement the right design, they could migrate to the in-core solution. 5-10 years of fast FTS do matter.

> I suspect that
> a transaction manager API would end up similarly situated.

I disagree with you about the AM API, but I agree that a TM API would be in a similar situation to the AM API.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] The plan for FDW-based sharding
On Tue, Mar 1, 2016 at 10:11 PM, Bruce Momjian wrote:
> On Tue, Mar 1, 2016 at 02:02:44PM -0500, Bruce wrote:
> > On Tue, Mar 1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> > > Note that I am not saying that other discussed approaches are any
> > > better, I am saying that we should know approximately what we
> > > actually want and not just beat FDWs with a hammer and hope sharding
> > > will eventually emerge and call that the plan.
> >
> > I will say it again --- FDWs are the only sharding method I can think of
> > that has a chance of being accepted into Postgres core. It is a plan,
> > and if it fails, it fails. If it succeeds, that's good. What more do
> > you want me to say? I know of no other way to answer the questions you
> > asked above.
>
> I guess all I can say is that if FDWs existed when Postgres XC/XL were
> being developed, that they likely would have been used or at least
> considered. I think we are basically making that attempt now.

If FDWs had existed when Postgres XC/XL were being developed, then I believe their developers would have tried to build a full-featured prototype of FDW-based sharding. If that prototype had succeeded, we could have made a full roadmap. For now, we don't have a full roadmap; we have only some pieces. This is why people have doubts. When you're speaking about advances that are natural to FDWs, there is no problem: nobody is against FDW advances. However, other things remain unclear. You could try to build a full-featured prototype to convince people. Although it would take some resources, it would save more resources in the end, because it would save us from errors.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] The plan for FDW-based sharding
On Wed, Mar 2, 2016 at 9:53 PM, Josh Berkus wrote:
> On 02/24/2016 01:22 AM, Konstantin Knizhnik wrote:
>
>> Sorry, but based on this plan it is possible to make a conclusion that
>> there are only two possible cluster solutions for Postgres:
>> XC/XL and FDW-based. From my point of view there are many more
>> possible alternatives.
>
> Definitely.
>
> Currently we have five approaches to sharding inside postgres in the
> field, in chronological order:
>
> 1. Greenplum's executor-based approach with motion nodes
>
> 2. Skype's function-based approach (PL/proxy)
>
> 3. XC/XL's approach, which I believe is also query executor-based
>
> 4. CitusDB's pg_shard which is based on query hooks
>
> 5. FDW-based (currently theoretical)
>
> One of the things which causes bad reactions and arguments, Bruce, is that
> a lot of your posts and presentations detailing plans for the FDW approach
> carry the subtext that all four of the other approaches are dead ends and
> not worth considering. Given that the other approaches, whatever their
> limitations, have working code in the field and the FDW approach does not,
> that's more than a little offensive.
>
> If we want to move forwards on serious work on FDW-based sharding, the
> folks working on it should stop treating it as a "fait accompli" that this
> is the Chosen Way for the PostgreSQL project. Otherwise, you'll spend all
> of your time arguing that point instead of working on features that matter.
>
> Bruce made a long comparison with built-in replication, but there's a big
> difference here. We decided that WAL-based replication was the way to go
> for built-in as a community decision here on -hackers and at various
> conferences. Both the plan and the implementation for replication
> transcended company backing, involving even active competitors, and
> involved discussions with maintainers of the older replication projects.
>
> In contrast, this FDW plan *still* feels very much like a small group made
> up of employees of only two companies came up with it in private and
> decided that it should be the plan for the whole project. I know that
> Bruce and others have good reasons for starting the FDW project, but there
> hasn't been much of an attempt to obtain community consensus around it. If
> Bruce and others want contributors to work on FDWs instead of other
> sharding approaches, then they need to win over those people as to why they
> should do that. It's how this community works.
>
> Alternately, you can just work on the individual FDW features, which
> *everyone* thinks are a good idea, and when most of them are done,
> FDW-based scaleout will be such an obvious solution that nobody will argue
> with it.

+1

Thank you, Josh. I think this is an excellent summary of the conversation about FDW-based sharding.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] improving GROUP BY estimation
Hi, Tomas!

I've assigned myself to review this patch.

I've checked version estimate-num-groups-v2.txt by Mark Dilger. It applies to head cleanly and passes the corrected regression tests.

About correlated/uncorrelated cases: I think the optimizer now mostly assumes all distributions to be independent. I think we should follow this assumption in this case as well, until we have a fundamentally better option (like your multivariate statistics).

@@ -3438,9 +3438,9 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
 		reldistinct = clamp;

 		/*
-		 * Multiply by restriction selectivity.
+		 * Estimate number of distinct values expected in given number of rows.
 		 */
-		reldistinct *= rel->rows / rel->tuples;
+		reldistinct *= (1 - powl((reldistinct - 1) / reldistinct, rel->rows));

 		/*
 		 * Update estimate of total distinct groups.

I think we need way more explanation in the comments here (possibly with external links). For now, it looks like a formula that appears from nowhere.

Also, note that rel->tuples is gone from the formula's parameters, so we actually don't care how many tuples are in the table. This is because the formula assumes that the same tuple can be selected multiple times. For low numbers this can lead to some errors. Consider selecting 2 from 3 distinct tuples: if you can't select the same tuple twice, then you always select 2 distinct tuples, but according to the formula you will select 5/3 on average. I think this error is negligible for small numbers, especially because getting rid of it would require far more computation. But it's worth mentioning in the comments.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
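To make the new formula concrete, here is a small standalone sketch (my illustration, not part of the patch) that evaluates the estimator from the diff above and reproduces the 5/3 example:

#include <math.h>
#include <stdio.h>

/*
 * Expected number of distinct values seen when drawing "rows" tuples,
 * with replacement, from "d" equally likely distinct values:
 *
 *     count(rows, d) = d * (1 - ((d - 1) / d) ^ rows)
 *
 * This mirrors the expression added to estimate_num_groups() above.
 */
static double
expected_distinct(double d, double rows)
{
	return d * (1.0 - pow((d - 1.0) / d, rows));
}

int
main(void)
{
	/* selecting 2 from 3 distinct values: prints ~1.667 (5/3), not 2 */
	printf("%f\n", expected_distinct(3.0, 2.0));

	/* selecting 10M rows from 10k distinct values: nearly all 10k */
	printf("%f\n", expected_distinct(10000.0, 10000000.0));
	return 0;
}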
Re: [HACKERS] improving GROUP BY estimation
On Thu, Mar 3, 2016 at 7:35 PM, Tomas Vondra wrote:
> On 03/03/2016 12:53 PM, Alexander Korotkov wrote:
>
>> I've assigned myself to review this patch.
>>
>> I've checked version estimate-num-groups-v2.txt by Mark Dilger.
>> It applies to head cleanly, passes corrected regression tests.
>>
>> About correlated/uncorrelated cases. I think now the optimizer mostly assumes
>> all distributions to be independent. I think we should follow this
>> assumption in this case also until we have a fundamentally better option
>> (like your multivariate statistics).
>>
>> @@ -3438,9 +3438,9 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs,
>> double input_rows,
>> 		reldistinct = clamp;
>>
>> 		/*
>> -		 * Multiply by restriction selectivity.
>> +		 * Estimate number of distinct values expected in given number of rows.
>> 		 */
>> -		reldistinct *= rel->rows / rel->tuples;
>> +		reldistinct *= (1 - powl((reldistinct - 1) / reldistinct, rel->rows));
>>
>> 		/*
>> 		 * Update estimate of total distinct groups.
>>
>> I think we need way more explanation in comments here (possibly with
>> external links). For now, it looks like a formula which appears from
>> nowhere.
>
> I thought the details (particularly the link to stackexchange, discussing
> the formula) would be enough, but let me elaborate.
>
> The current formula
>
>     reldistinct *= rel->rows / rel->tuples;
>
> simply multiplies the ndistinct estimate with selectivity. So when you
> select 1% of the table, the estimate says we'll see 1% of ndistinct values.
> But clearly that's naive, because for example when you have 10k distinct
> values and you select 10M rows, you should expect nearly all of them in the
> sample.
>
> And that's what the formula does - it gives you the expected number of
> distinct values, when selecting 'k' values from 'd' distinct values with
> replacement:
>
>     count(k, d) = d * (1 - ((d-1)/d) ^ k)
>
> It's assuming the distinct values are equally probable (uniform
> distribution) and that the probabilities do not change (that's the "with
> replacement").
>
> I guess we could relax those assumptions and for example use the MCV
> statistics to further improve the estimate, and also relax the 'with
> replacement' but that'd make the formula way more complicated.
>
> [1] http://math.stackexchange.com/questions/72223/finding-expected-number-of-distinct-values-selected-from-a-set-of-integers

Your explanation in the first mail was good enough; now it's even better. But what I actually meant is that we need to include some brief explanation in the source code (either comments or a README). It would be good if someone who wants to understand this could do so without searching the mailing list archives or git history.

>> Also, note that rel->tuples is gone from the formula's parameters. So, we
>> actually don't care how many tuples are in the table. This is because
>> this formula assumes that the same tuple could be selected multiple times. For
>> low numbers this can lead to some errors. Consider selecting 2 from 3
>> distinct tuples. If you can't select the same tuple twice then you always
>> select 2 distinct tuples. But according to the formula you will select 5/3
>> on average. I think this error in small numbers is negligible, especially
>> because getting rid of this error would require way more computation. But
>> it's worth mentioning in comments though.
>
> Well, yeah. That's the consequence of the 'with replacement' assumption.
>
> I guess we could handle this somehow, though.
> For example we can compute the minimum possible number of distinct values
> like this:
>
>     average_rows_per_value = (tuples / ndistinct);
>     min_estimate = ceil(rows / average_rows_per_value);
>
> and use that as a minimum for the estimate. In the example you've given
> this would say
>
>     average_rows_per_value = (3 / 3) = 1;
>     min_estimate = ceil(2 / 1) = 2;
>
> So it serves as a correction for the 'with replacement' assumption. With
> only 2 distinct values we'd get this:
>
>     average_rows_per_value = (3 / 2) = 1.5;
>     min_estimate = ceil(2 / 1.5) = 2;
>
> Of course, it's just a heuristic, so it may fail in some other cases.

I'm not sure we actually need it. Even in the worst case the error doesn't seem to be very big. But feel free to add this heuristic; it looks OK to me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
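As a sketch (my illustration, not code from the patch), Tomas's minimum-bound correction could be combined with the new estimator like this:

#include <math.h>

/*
 * Illustrative combination of the "with replacement" estimator with the
 * minimum-bound correction discussed above.  Not from the patch; the
 * function and variable names are mine.
 */
static double
estimate_distinct_clamped(double ndistinct, double tuples, double rows)
{
	/* expected distinct values, sampling with replacement */
	double		est = ndistinct *
		(1.0 - pow((ndistinct - 1.0) / ndistinct, rows));

	/* minimum possible number of distinct values in "rows" tuples */
	double		avg_rows_per_value = tuples / ndistinct;
	double		min_estimate = ceil(rows / avg_rows_per_value);

	return (est < min_estimate) ? min_estimate : est;
}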
Re: [HACKERS] improving GROUP BY estimation
On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra wrote:
> So yes, each estimator works great for exactly the opposite cases. But
> notice that typically, the result of the new formula is much higher than
> the old one, sometimes by two orders of magnitude (and it shouldn't be
> difficult to construct examples of much higher differences).
>
> The table also includes the 'average' estimator you propose, but it's
> rather obvious that the result is always much closer to the new value,
> simply because
>
>     (small number) + (huge number)
>     ------------------------------
>                   2
>
> is always much closer to the huge number. We're usually quite happy when
> the estimates are within the same order of magnitude, so whether it's K or
> K/2 makes pretty much no difference.

I believe that Mark means the geometric average, i.e. sqrt((small number) * (huge number)).

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
On Fri, Mar 4, 2016 at 7:05 AM, Amit Kapila wrote:
> > I think the wait event types should be documented - and the wait
> > events too, perhaps.
>
> As discussed upthread, I have added documentation for all the possible
> wait events and an example. Some of the LWLocks like AsyncQueueLock and
> AsyncCtlLock are used for quite similar purposes, so I have kept their
> explanations the same.

Do you think it's worth grouping the rows in the "wait_event Description" table by wait event type? It would be nice if the user knew a priori which wait events have which type. It would also improve the readability of this large table.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
On Fri, Mar 4, 2016 at 4:20 PM, Amit Kapila wrote:
> On Fri, Mar 4, 2016 at 4:01 PM, Alexander Korotkov wrote:
>> On Fri, Mar 4, 2016 at 7:05 AM, Amit Kapila wrote:
>>> > I think the wait event types should be documented - and the wait
>>> > events too, perhaps.
>>>
>>> As discussed upthread, I have added documentation for all the possible
>>> wait events and an example. Some of the LWLocks like AsyncQueueLock and
>>> AsyncCtlLock are used for quite similar purposes, so I have kept their
>>> explanations the same.
>>
>> Do you think it's worth grouping the rows in the "wait_event Description"
>> table by wait event type?
>
> They are already grouped (implicitly); do you mean to say that we should
> add the wait event type name as well in that table?

Yes.

> If yes, then the only slight worry is that there will be a lot of
> repetition in the wait_event_type column; otherwise it is okay.

There is a morerows attribute of the entry tag in DocBook SGML; it behaves like rowspan in HTML.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
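For instance, a grouped table body could look roughly like the fragment below. This is an illustrative DocBook sketch, not taken from the actual documentation patch; morerows="2" makes the type cell span three rows, so the wait_event_type value appears only once per group:

<tbody>
 <row>
  <entry morerows="2"><literal>LWLockNamed</literal></entry>
  <entry><literal>ShmemIndexLock</literal></entry>
 </row>
 <row>
  <entry><literal>OidGenLock</literal></entry>
 </row>
 <row>
  <entry><literal>XidGenLock</literal></entry>
 </row>
</tbody>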
Re: [HACKERS] pg_resetxlog reference page reorganization
Hi, Peter!

I've assigned myself as reviewer of this patch.

On Tue, Mar 1, 2016 at 2:53 AM, Peter Eisentraut wrote:
> The pg_resetxlog reference page has grown over the years into an
> unnavigable jungle, so here is a patch that reorganizes it to be more in
> the style of the other ref pages, with a normal options list.

The patch applies cleanly on head, and the documentation compiles with no problem. The pg_resetxlog page definitely looks much better than it did before. I don't see any problems or issues with this patch, so I mark it "Ready for Committer".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] syslog configurable line splitting behavior
On Sat, Feb 27, 2016 at 6:49 AM, Peter Eisentraut wrote:
> Writing log messages to syslog caters to ancient syslog implementations
> in two ways:
>
> - sequence numbers
> - line splitting
>
> While these are arguably reasonable defaults, I would like a way to turn
> them off, because they get in the way of doing more interesting things
> with syslog (e.g., logging somewhere that is not just a text file).
>
> So I propose the two attached patches that introduce new Boolean
> configuration parameters syslog_sequence_numbers and syslog_split_lines
> that can toggle these behaviors.

Would it be useful to make PG_SYSLOG_LIMIT itself configurable (-1 to disable) instead of introducing a boolean?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
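For reference, with Peter's proposal the knobs would presumably look like this in postgresql.conf. The GUC names are taken from his description; the comments and the assumption that both behaviors stay on by default are mine:

# Hypothetical usage of the two proposed GUCs, turning both
# syslog compatibility behaviors off:
syslog_sequence_numbers = off   # drop the [1-], [2-], ... prefixes
syslog_split_lines = off        # don't chunk messages at PG_SYSLOG_LIMIT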
Re: [HACKERS] WIP: Upper planner pathification
Hi, Tom!

I have a question about Sort paths. AFAICS this question wasn't mentioned in the upthread discussion.

We're producing Sort plans in two ways: from explicit Sort paths, and from other paths which implicitly assume sorting (like the MergeJoin path). Would it be better to produce Sort plans only from explicit Sort paths? A MergeJoin path would then add explicit child Sort paths. That would be a more unified way.

I'm asking about this from the point of view of my "Partial Sort" patch. The absence of implicit sorts would help to make that patch simpler and cleaner.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: Upper planner pathification
On Wed, Mar 9, 2016 at 5:47 PM, Tom Lane wrote:
> Alexander Korotkov writes:
> > I have a question about Sort paths. AFAICS this question wasn't mentioned in
> > the upthread discussion.
> > We're producing Sort plans in two ways: from explicit Sort paths, and from
> > other paths which implicitly assume sorting (like the MergeJoin path).
> > Would it be better to produce Sort plans only from explicit Sort paths? Thus,
> > a MergeJoin path would add explicit child Sort paths. That would be a more
> > unified way.
>
> Meh. I think the only real effect that would have is to make it slower to
> build MergeJoin paths (and in a typical join problem, we build a lot of
> those). The entire point of the Path/Plan distinction is to postpone
> until createplan.c any work that's not necessary to arrive at a cost
> estimate. So the way MergeJoinPath works now seems fine to me.
>
> > I ask about this from the point of view of my "Partial Sort" patch. The absence
> > of implicit sorts would help to make this patch more simple and clean.
>
> There are other implicit sorts besides those. Admittedly, the efficiency
> argument for not making the sort explicit in UniquePath or MergeAppendPath
> is a lot weaker than it is for MergeJoin, just because those are less
> common path types. But getting rid of the behavior entirely would be
> a lot of work, and I'm not convinced it'd be an improvement.

Thank you for the explanation. I'll try to rebase "Partial Sort" leaving this situation as is.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
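For context, the behavior Tom describes lives in createplan.c: a MergeJoinPath's implicit ordering is turned into explicit Sort nodes only at Path-to-Plan time, roughly as sketched below. This is a simplified illustration, not the actual code; the real logic handles costs, both join sides, and more, and the make_sort_from_pathkeys() signature here is approximate.

/*
 * Sketch: attach an explicit Sort only when the merge join's input
 * isn't already ordered on the required pathkeys.
 */
static Plan *
sort_merge_child_sketch(Plan *subplan, List *sortkeys)
{
	if (sortkeys != NIL)
		return (Plan *) make_sort_from_pathkeys(subplan, sortkeys);

	return subplan;				/* already suitably ordered */
}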
[HACKERS] utils/misc: Simplify search of end of argv in save_ps_display_args()
Hello,

The attached patch provides a trivial simplification of the search for the end of the argv[] area, by replacing the for() loop with a calculation based on argc.

diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 892a810..d8f2105 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -134,11 +134,8 @@ save_ps_display_args(int argc, char **argv)
 	/*
 	 * check for contiguous argv strings
 	 */
-	for (i = 0; i < argc; i++)
-	{
-		if (i == 0 || end_of_area + 1 == argv[i])
-			end_of_area = argv[i] + strlen(argv[i]);
-	}
+	if (argc)
+		end_of_area = argv[argc - 1] + strlen(argv[argc - 1]);
 
 	if (end_of_area == NULL)	/* probably can't happen? */
 	{
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Mon, Mar 7, 2016 at 6:19 PM, Robert Haas wrote:
> On Sat, Mar 5, 2016 at 7:22 AM, Dilip Kumar wrote:
> > On Wed, Mar 2, 2016 at 11:05 AM, Dilip Kumar wrote:
> >> And this latest result (no regression) is on X86 but on my local machine.
> >>
> >> I did not see exactly what this new version of the patch does differently,
> >> so I will test this version on other machines also and see the results.
> >
> > I tested this on PPC again, this time in various orders (sometimes patch
> > first and sometimes base first).
> > I tested with the latest patch pinunpin-cas-2.patch on Power8.
> >
> > Shared Buffer = 8GB
> > ./pgbench -j$ -c$ -T300 -M prepared -S postgres
> >
> > BASE
> > ----
> > Clients    run1     run2     run3
> > 1          21200    18754    20537
> > 2          40331    39520    38746
> >
> > Patch
> > -----
> > Clients    run1     run2     run3
> > 1          20225    19806    19778
> > 2          39830    41898    36620
> >
> > I think here we cannot see any regression. (If I take the median then it
> > may look low with the patch, so I'm posting all 3 readings.)
>
> If the median looks low, how is that not a regression?

I don't think we can rely on the median that much if we have only 3 runs. For 3 runs we can only apply the Kornfeld method, which claims that the confidence interval should be between the lowest and highest values. Since the confidence intervals for the master and patched versions overlap, we can't conclude that the expected TPS numbers differ. Dilip, could you do more runs? 10, for example. With such statistics we would be able to conclude something.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Background Processes and reporting
On Sat, Mar 12, 2016 at 12:22 AM, Andres Freund wrote:
> On 2016-03-11 23:53:15 +0300, Vladimir Borodin wrote:
> > It was many times stated in threads about waits monitoring [0, 1, 2]
> > and supported by different people, but ultimately waits information
> > was stored in PgBackendStatus.
>
> Only that it isn't. It's stored in PGPROC.

Yes, and this is good. Robert's original concept of a single byte in PgBackendStatus looks incompatible with any further development of wait monitoring.

> This criticism is true of
> the progress reporting patch, but a quick scan of the thread doesn't
> show authors of the wait events patch participating there.
>
> > Can't we think one more time about implementation provided by Ildus
> > and Alexander here [3]?
>
> I don't think so. Afaics the proposed patch tried to do too many things
> at once, and its authors didn't listen well to criticism.

The original patch really did too many things at once, but it was good as a prototype. We should move forward by splitting it into many smaller parts, but we didn't, because of disagreement about the design.

The idea of individually measuring the time of every wait event met criticism because it might have high overhead [1]. This is really so, at least on Windows [2]. But accessing only current values wouldn't be very useful; we need to gather some statistics anyway. Gathering them by sampling would be both more expensive and less accurate on the majority of systems. This is why I proposed hooks, to make platform-dependent extensions possible. Robert rejects hooks because he is "not a big fan of hooks as a way of resolving disagreements about the design" [3]. But these are actually not design issues; they are platform issues...

Another question is wait parameters. We want to expose wait events with some parameters. Robert rejects that because it *might* add additional overhead [3]. When I proposed to fit something useful into the hard-won 4 bytes, Robert claims that it is "too clever" [4].

So, the situation looks like a dead end. I have no idea how to convince Robert about any kind of advanced wait monitoring functionality in PostgreSQL. I'm thinking about implementing a sampling extension on top of the current infrastructure just to make the community see that it sucks. Andres, it would be very nice if you have any idea how to move this situation forward.

Another aspect is that EnterpriseDB offers wait monitoring in a proprietary fork [5]:

> SQL Session/System wait tuning diagnostics
> The Dynamic Runtime Instrumentation Tools Architecture (DRITA) allows a
> DBA to query catalog views to determine the wait events that affect the
> performance of individual sessions or the system as a whole. DRITA records
> the *number of times each event occurs as well as the time spent waiting*;
> you can use this information to diagnose performance problems.

So they declare that they measure the time of individual wait events: exactly the thing which Robert resists so much. I suspect some kind of hidden motivation here. Probably the EDB guys just don't want to lose an advantage of their proprietary product over open source PostgreSQL.

1. http://www.postgresql.org/message-id/ca+tgmobdc4fbfpa49j6ok-ikazyvhw71vssotpvmf4tdyzr...@mail.gmail.com
2. http://www.postgresql.org/message-id/capphfdvflevfetzgjprnrexfliyrksgkdu98xo5xx0hw5gu...@mail.gmail.com
3. http://www.postgresql.org/message-id/ca+tgmoz-8zpoum9bgtbup1u4duqhc-9epedlzyk0dg4pkmd...@mail.gmail.com
4. http://www.postgresql.org/message-id/ca+tgmobyf2vchghvoks7yrxa2wde+wmmrmxyughvx_w-0wx...@mail.gmail.com
5. http://www.enterprisedb.com/products-services-training/products/postgres-plus-advanced-server?quicktabs_advanceservertab=3#quicktabs-advanceservertab

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
[HACKERS] [PATCH] Use MemoryContextAlloc() in the MemoryContextAllocZero() and MemoryContextAllocZeroAligned()
Hello all,

The attached patch simplifies MemoryContextAllocZero() and MemoryContextAllocZeroAligned(). These functions do almost the same thing that MemoryContextAlloc() does; additionally, they fill the allocated memory with zeros via MemSetAligned() and MemSetLoop(). Let's call MemoryContextAlloc() in these functions, instead of setting isReset to false, calling the alloc() callback of the context, etc., to prevent code duplication.

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 2bfd364..2845089 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -788,26 +788,7 @@ MemoryContextAllocZero(MemoryContext context, Size size)
 {
 	void	   *ret;
 
-	AssertArg(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
-
-	if (!AllocSizeIsValid(size))
-		elog(ERROR, "invalid memory alloc request size %zu", size);
-
-	context->isReset = false;
-
-	ret = (*context->methods->alloc) (context, size);
-	if (ret == NULL)
-	{
-		MemoryContextStats(TopMemoryContext);
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed on request of size %zu.", size)));
-	}
-
-	VALGRIND_MEMPOOL_ALLOC(context, ret, size);
-
+	ret = MemoryContextAlloc(context, size);
 	MemSetAligned(ret, 0, size);
 
 	return ret;
@@ -825,26 +806,7 @@ MemoryContextAllocZeroAligned(MemoryContext context, Size size)
 {
 	void	   *ret;
 
-	AssertArg(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
-
-	if (!AllocSizeIsValid(size))
-		elog(ERROR, "invalid memory alloc request size %zu", size);
-
-	context->isReset = false;
-
-	ret = (*context->methods->alloc) (context, size);
-	if (ret == NULL)
-	{
-		MemoryContextStats(TopMemoryContext);
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory"),
-				 errdetail("Failed on request of size %zu.", size)));
-	}
-
-	VALGRIND_MEMPOOL_ALLOC(context, ret, size);
-
+	ret = MemoryContextAlloc(context, size);
 	MemSetLoop(ret, 0, size);
 
 	return ret;
Re: [HACKERS] Background Processes and reporting
On Sat, Mar 12, 2016 at 2:45 AM, Andres Freund wrote:
> On 2016-03-12 02:24:33 +0300, Alexander Korotkov wrote:
> > Idea of individual time measurement of every wait event met criticism
> > because it might have high overhead [1].
>
> Right. And that's actually one of the points which I meant with "didn't
> listen to criticism". There've been a lot of examples, on and off list,
> where taking timings triggers significant slowdowns. Yes, in some
> bare-metal environments, with a coherent tsc, the overhead can be
> low. But that doesn't make it ok to have a high overhead on a lot of
> other systems.
>
> Just claiming that that's not a problem will only lead to your position
> not being taken seriously.
>
> > This is really so at least for Windows [2].
>
> Measuring timing overhead for a simplistic workload on a single system
> doesn't mean that. Try doing such a test on a vmware esx virtualized
> windows machine, on a multi-socket server; in a lot of instances you'll
> see two-three orders of magnitude longer average times; with peaks going
> into 4-5 orders of magnitude. And, as sad as it is, realistically most
> postgres instances will run in virtualized environments.
>
> > But accessing only current values wouldn't be very useful. We
> > anyway need to gather some statistics. Gathering it by sampling would be
> > both more expensive and less accurate for the majority of systems. This is why
> > I proposed hooks to make platform dependent extensions possible. Robert
> > rejects hooks because he is "not a big fan of hooks as a way of resolving
> > disagreements about the design" [3].
>
> I think I agree with Robert here. Providing hooks into very low level
> places tends to lead to problems in my experience; tight control over
> what happens is often important - I certainly don't want any external
> code to run while we're waiting for an lwlock.

So, I get the following:

1) Detailed wait monitoring might cause high overhead on some systems.
2) We want wait monitoring to be always on, and we don't want options that enable additional wait monitoring features.
3) We don't want a wait event hook to be exposed.

Can I conclude that we reject detailed wait monitoring by design? If that is so, and not only Robert thinks so, then let's just admit it, add it to the FAQ, and so on.

> > Besides that is actually not design issues but platform issues...
>
> I don't see how that's the case.
>
> > Another question is wait parameters. We want to expose wait events with
> > some parameters. Robert rejects that because it *might* add additional
> > overhead [3]. When I proposed to fit something useful into the hard-won
> > 4 bytes, Robert claims that it is "too clever" [4].
>
> I think stopping to treat this as "Robert/EDB vs. pgpro" would be a good
> first step to make progress here.
>
> It seems entirely possible to extend the current API in an incremental
> fashion, either allowing to disable the individual pieces, or providing
> sufficient measurements that it's not needed.
>
> > So, the situation looks like a dead end. I have no idea how to convince Robert
> > about any kind of advanced wait monitoring functionality in PostgreSQL.
> > I'm thinking about implementing a sampling extension over the current
> > infrastructure just to make the community see that it sucks. Andres, it would
> > be very nice if you have any idea how to move this situation forward.
>
> I've had my share of conflicts with Robert. But if I were in his shoes,
> targeted by this kind of rhetoric, I'd be very tempted to just ignore
> any further arguments from the origin. So I think the way forward is
> for everyone to cool off, and to see how we can incrementally make
> progress from here on.
>
> > Another aspect is that EnterpriseDB offers wait monitoring in a proprietary
> > fork [5].
>
> So?

So, we'll end up with every company providing a fork with detailed wait monitoring, while community PostgreSQL refrains from providing such functionality.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] PoC: Partial sort
Hi!

Tom committed the upper planner pathification patch. The partial sort patch, rebased to master, is attached. It was quite a huge rebase in the planner part of the patch, but I think the patch has now become better and much more logical. Probably something was missed after the rebase; I'm going to examine this patch more carefully next week.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

partial-sort-basic-7.patch
Description: Binary data
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Fri, Mar 11, 2016 at 7:08 AM, Dilip Kumar wrote:
> On Thu, Mar 10, 2016 at 8:26 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
>> I don't think we can rely on the median that much if we have only 3 runs.
>> For 3 runs we can only apply the Kornfeld method, which claims that the
>> confidence interval should be between the lowest and highest values.
>> Since the confidence intervals for the master and patched versions
>> overlap, we can't conclude that the expected TPS numbers are different.
>> Dilip, could you do more runs? 10, for example. Using such statistics we
>> would be able to conclude something.
>
> Here are the readings for 10 runs.
>
> Median Result
> -------------
>
> Client   Base    Patch
> 1        19873   19739
> 2        38658   38276
> 4        68812   62075
>
> Full results of 10 runs...
>
> Base
> ----
> Runs   1 Client   2 Client   4 Client
> -------------------------------------
> 1      19442      34866      49023
> 2      19605      35139      51695
> 3      19726      37104      53899
> 4      19835      38439      55708
> 5      19866      38638      67473
> 6      19880      38679      70152
> 7      19973      38720      70920
> 8      20048      38737      71342
> 9      20057      38815      71403
> 10     20344      41423      77953
> -------------------------------------
>
> Patch
> -----
> Runs   1 Client   2 Client   4 Client
> -------------------------------------
> 1      18881      30161      54928
> 2      19415      32031      59741
> 3      19564      35022      61060
> 4      19627      36712      61839
> 5      19670      37659      62011
> 6      19808      38894      62139
> 7      19857      39081      62983
> 8      19910      39923      75358
> 9      20169      39937      77481
> 10     20181      40003      78462
> -------------------------------------

I've drawn graphs for these measurements. The variation doesn't look random here: TPS goes higher from measurement to measurement. I bet you did the measurements sequentially. I think we should do more measurements, until TPS stops growing and beyond, to accumulate average statistics. Could you please do the same tests, but with 50 runs?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Background Processes and reporting
On Tue, Mar 15, 2016 at 12:57 AM, Robert Haas wrote:
> On Mon, Mar 14, 2016 at 4:42 PM, Andres Freund wrote:
> > On 2016-03-14 16:16:43 -0400, Robert Haas wrote:
> >> > I have already shown [0, 1] the overhead of measuring timings in linux on
> >> > a representative workload. AFAIK, these tests were the only ones that showed
> >> > any numbers. All other statements about terrible performance have been and
> >> > remain unconfirmed.
> >>
> >> Of course, those numbers are substantial regressions which would
> >> likely make it impractical to turn this on on a heavily-loaded
> >> production system.
> >
> > A lot of people operating production systems are fine with trading a <=
> > 10% impact for more insight into the system; especially if that
> > configuration can be changed without a restart. I know a lot of systems
> > that use pg_stat_statements, track_io_timing = on, etc; just to get
> > that. In fact there's people running perf more or less continuously in
> > production environments; just to get more insight.
> >
> > I think it's important to get as much information out there without
> > performance overhead, so it can be enabled by default. But I don't think
> > it makes sense to not allow features in that cannot be enabled by
> > default, *if* we tried to make them cheap enough beforehand.
>
> Hmm, OK. I would have expected you to be on the other side of this
> question, so maybe I'm all wet. One point I am concerned about is
> that, right now, we have only a handful of types of wait events. I'm
> very interested in seeing us add more, like I/O and client wait. So
> any overhead we pay here is likely to eventually be paid in a lot of
> places - thus it had better be extremely small.

OK. Let's start to produce light, not heat.

As I understand it, we have two features which we suspect introduce overhead:
1) Recording parameters of wait events, which requires some kind of synchronization protocol.
2) Recording the time of wait events, because time measurements might be expensive on some platforms.

Simultaneously, there are machines and workloads where neither of these features produces measurable overhead. And we're not talking about toy databases. Vladimir is a DBA at Yandex, which is in the top 20 (by traffic) internet companies in the world. They run both of these features in a production high-load database without noticing any overhead from them.

It would be great progress if we decided that we could add both of these features controlled by GUCs (off by default). If we decide so, then let's start working on this. First, we should construct a list of machines and workloads for testing. No such list would be comprehensive, but let's find something that would be good enough for testing GUC-controlled, off-by-default features. Then we can turn our conversation from theoretical thoughts to particular benchmarks, which would be objective and convincing to everybody.

Otherwise, let's just add these features to the list of unwanted functionality and close this question.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Declarative partitioning
Hi, Amit!

I tried to apply your patch. It still applies, but has some duplicate OIDs. After fixing the duplicate OIDs, I noticed the following warnings during compilation with clang-700.1.81:

scan.c:10308:23: warning: unused variable 'yyg' [-Wunused-variable]
        struct yyguts_t * yyg = (struct yyguts_t*)yyscanner; /* This var may be unused depending upon options. */
                          ^
tablecmds.c:12922:6: warning: variable 'is_subpart' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
        if (parent != NULL)
            ^~
tablecmds.c:12931:12: note: uninitialized use occurs here
        partKey = is_subpart ? list_nth(rootParent->rd_partkeys, parent_level) :
                  ^~
tablecmds.c:12922:2: note: remove the 'if' if its condition is always true
        if (parent != NULL)
        ^~~
tablecmds.c:12912:19: note: initialize the variable 'is_subpart' to silence this warning
        bool    is_subpart;
                ^
                 = '\0'
tablecmds.c:13375:37: warning: variable 'operoid' is uninitialized when used here [-Wuninitialized]
        comp_left_expr = make_opclause(operoid, BOOLOID, false,
                                       ^~~
tablecmds.c:13326:17: note: initialize the variable 'operoid' to silence this warning
        Oid     operoid;
                ^
                 = 0

Regression tests passed cleanly for me.

I also examined the code a bit. As I understand it, for DML queries declarative partitioning works like inheritance; it just provides an alternative way of collecting the append_rel_list.

We're working on the other side of the partitioning problem. Without solving the syntax problem, we're solving performance problems in the pg_pathman extension: https://github.com/postgrespro/pg_pathman. We already have interesting results, which you can see in blog posts [1], [2], [3]. We already have a fast algorithm for partition selection in the optimizer [1] and effective optimization of filter conditions [3]. And we're planning to implement the following features:

- Execute-time partition selection (useful for nested loops and prepared statements);
- Optimization of ordered output from partitioned tables (use Append instead of MergeAppend when possible);
- Optimization of hash join when both tables are partitioned by the join key.

The 9.6 feature freeze is coming, and we're planning our development resources for 9.7. Besides providing an extension, we would like these features to eventually arrive in core. In order to achieve this, we need to port our functionality over your declarative syntax. At first, we would need some way of caching partition metadata suitable for fast partition selection. For range partitioning it could be a sorted array or an RB-tree of partition bounds. Once we have this infrastructure, we can start porting pieces of pg_pathman functionality to declarative partitioning.

So, our draft plan of patches would be the following:

- Implement a partition metadata cache suitable for fast partition selection.
- Fast partition selection using the metadata cache.
- Optimization of filter conditions passed to partitions.
- Execute-time partition selection (useful for nested loops and prepared statements).
- Optimization of ordered output from partitioned tables (use Append instead of MergeAppend when possible).
- Optimization of hash join when both tables are partitioned by the join key.

I'd like to validate that this development plan doesn't overlap with your plans. If our plans do not overlap, then let's accept this plan of work for 9.7.

Links.

[1] http://www.postgrespro.com/blog/pgsql/pg_pathman
[2] http://akorotkov.github.io/blog/2016/03/04/pg_pathman-beta-release/
[3] http://akorotkov.github.io/blog/2016/03/14/pg_pathman-condition-processing/

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
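A minimal sketch of the kind of fast range-partition selection described above, assuming bounds are cached in a sorted array; the types and names here are illustrative only, taken neither from pg_pathman nor from Amit's patch:

#include <stdint.h>

/*
 * Select a range partition by binary search over a sorted array of
 * upper bounds.  bounds[i] is the exclusive upper bound of partition i,
 * sorted ascending.  Returns the index of the partition containing
 * "key", or -1 if the key is beyond the last bound.
 */
static int
select_range_partition(const int64_t *bounds, int n, int64_t key)
{
	int			lo = 0;
	int			hi = n;			/* search within [lo, hi) */

	while (lo < hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (key < bounds[mid])
			hi = mid;			/* key fits in partition mid or below */
		else
			lo = mid + 1;
	}
	/* lo is now the first partition whose upper bound exceeds the key */
	return (lo < n) ? lo : -1;
}

The point of the metadata cache is to allow O(log N) selection like this, instead of examining every child as constraint-exclusion-based planning does.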
[HACKERS] initdb: introduce PG_CMD_PRINTF intestead of PG_CMD_PRINTF{1,2,3}
Hello all,

src/bin/initdb/initdb.c provides three macros to write data to cmdfd. All of these macros do the same thing, but with different numbers of arguments for fprintf(). The attached patch introduces a PG_CMD_PRINTF macro, which takes a set of variadic arguments via __VA_ARGS__, to replace the PG_CMD_PRINTF{1,2,3} macros. Any objections?

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ed3ba7b..cd0ea72 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -307,21 +307,9 @@ do { \
 		output_failed = true, output_errno = errno; \
 } while (0)
 
-#define PG_CMD_PRINTF1(fmt, arg1) \
+#define PG_CMD_PRINTF(fmt, ...) \
 do { \
-	if (fprintf(cmdfd, fmt, arg1) < 0 || fflush(cmdfd) < 0) \
-		output_failed = true, output_errno = errno; \
-} while (0)
-
-#define PG_CMD_PRINTF2(fmt, arg1, arg2) \
-do { \
-	if (fprintf(cmdfd, fmt, arg1, arg2) < 0 || fflush(cmdfd) < 0) \
-		output_failed = true, output_errno = errno; \
-} while (0)
-
-#define PG_CMD_PRINTF3(fmt, arg1, arg2, arg3) \
-do { \
-	if (fprintf(cmdfd, fmt, arg1, arg2, arg3) < 0 || fflush(cmdfd) < 0) \
+	if (fprintf(cmdfd, fmt, __VA_ARGS__) < 0 || fflush(cmdfd) < 0) \
 		output_failed = true, output_errno = errno; \
 } while (0)
 
@@ -1604,8 +1592,8 @@ get_set_pwd(FILE *cmdfd)
 	}
 
-	PG_CMD_PRINTF2("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n",
-				   username, escape_quotes(pwd1));
+	PG_CMD_PRINTF("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n",
+				  username, escape_quotes(pwd1));
 
 	free(pwd1);
 }
@@ -1724,8 +1712,8 @@ setup_description(FILE *cmdfd)
 				"	objsubid int4, "
 				"	description text) WITHOUT OIDS;\n\n");
 
-	PG_CMD_PRINTF1("COPY tmp_pg_description FROM E'%s';\n\n",
-				   escape_quotes(desc_file));
+	PG_CMD_PRINTF("COPY tmp_pg_description FROM E'%s';\n\n",
+				  escape_quotes(desc_file));
 
 	PG_CMD_PUTS("INSERT INTO pg_description "
 				" SELECT t.objoid, c.oid, t.objsubid, t.description "
@@ -1737,8 +1725,8 @@ setup_description(FILE *cmdfd)
 				"	classname name, "
 				"	description text) WITHOUT OIDS;\n\n");
 
-	PG_CMD_PRINTF1("COPY tmp_pg_shdescription FROM E'%s';\n\n",
-				   escape_quotes(shdesc_file));
+	PG_CMD_PRINTF("COPY tmp_pg_shdescription FROM E'%s';\n\n",
+				  escape_quotes(shdesc_file));
 
 	PG_CMD_PUTS("INSERT INTO pg_shdescription "
 				" SELECT t.objoid, c.oid, t.description "
@@ -1881,8 +1869,8 @@ setup_collation(FILE *cmdfd)
 		quoted_locale = escape_quotes(localebuf);
 
-		PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-					   quoted_locale, quoted_locale, enc);
+		PG_CMD_PRINTF("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
+					  quoted_locale, quoted_locale, enc);
 
 		/*
 		 * Generate aliases such as "en_US" in addition to "en_US.utf8" for
@@ -1893,15 +1881,15 @@ setup_collation(FILE *cmdfd)
 		{
 			char	   *quoted_alias = escape_quotes(alias);
 
-			PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-						   quoted_alias, quoted_locale, enc);
+			PG_CMD_PRINTF("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
+						  quoted_alias, quoted_locale, enc);
 			free(quoted_alias);
 		}
 		free(quoted_locale);
 	}
 
 	/* Add an SQL-standard name */
-	PG_CMD_PRINTF1("INSERT INTO tmp_pg_collation VALUES ('ucs_basic', 'C', %d);\n\n", PG_UTF8);
+	PG_CMD_PRINTF("INSERT INTO tmp_pg_collation VALUES ('ucs_basic', 'C', %d);\n\n", PG_UTF8);
 
 	/*
 	 * When copying collations to the final location, eliminate aliases that
@@ -1914,6 +1902,7 @@ setup_collation(FILE *cmdfd)
 	 * Also, eliminate any aliases that conflict with pg_collation's
 	 * hard-wired entries for "C" etc.
 	 */
+
 	PG_CMD_PUTS("INSERT INTO pg_collation (collname, collnamespace, collowner, collencoding, collcollate, collctype) "
 				" SELECT DISTINCT ON (collname, encoding)"
 				" collname, "
@@ -2058,16 +2047,16 @@ setup_schema(FILE *cmdfd)
 	free(lines);
 
-	PG_CMD_PRINTF1("UPDATE information_schema.sql_implementation_info "
-				   "  SET character_value = '%s' "
-				   "  WHERE implementation_info_name = 'DBMS VERSION';\n\n",
-				   infoversion);
+	PG_CMD_PRINTF("UPDATE information_schema.sql_implementation_info "
+				  "  SET character_value = '%s' "
+				  "  WHERE implementation_info_name = 'DBMS VERSION';\n\n",
+				  infoversion);
 
-	PG_CMD_PRINTF1("COPY information_schema.sql_features "
-				   "  (feature_id, feature_name, sub_feature_id, "
-				   "  sub_feature_name, is_supported, comments) "
-				   " FROM E'%s';\n\n",
-				   escape_quotes(features_file));
+	PG_CMD_PRINTF("COPY information_schema.sql_features "
+				  "  (feature_id, feature_name, sub_feature_id, "
+				  "  sub_feature_name, is_supported, comments) "
+				  " FROM E'%s';\n\n",
+				  escape_quotes(features_file));
 }
Re: [HACKERS] Declarative partitioning
On Wed, Mar 16, 2016 at 5:29 PM, Amit Langote wrote:
> > The 9.6 feature freeze is coming, and we're planning our development resources
> > for 9.7. Besides providing an extension, we would like these features to
> > eventually arrive in core. In order to achieve this we need to port our
> > functionality over your declarative syntax. At first, we would need some
> > way of caching partition metadata suitable for fast partition selection.
> > For range partitioning it could be a sorted array or an RB-tree of partition bounds.
> > Once we have this infrastructure, we can start porting pieces of pg_pathman
> > functionality to declarative partitioning.
>
> I had to think about the internal metadata representation (and its
> caching) when developing the tuple routing solution. I am hopeful
> that it is suitable for other executor mechanisms we will build for
> partitioned tables.

Yes, it appears that I missed it. You already have a sorted array for range partitioning and a binary search implementation. This is good. I'm a bit worried about list partitioning, because in that case you scan the array sequentially. We could optimize that case for better handling of many list partitions. It is probably not the most common case, though.

> > So, our draft plan of patches would be the following:
> >
> > Implement a partition metadata cache suitable for fast partition selection.
> > Fast partition selection using the metadata cache.
> > Optimization of filter conditions passed to partitions.
> > Execute-time partition selection (useful for nested loops and prepared
> > statements);
> > Optimization of ordered output from partitioned tables (use Append
> > instead of MergeAppend when possible);
> > Optimization of hash join when both tables are partitioned by the join key.
> >
> > I'd like to validate that this development plan doesn't overlap with your
> > plans. If our plans are not overlapping then let's accept this plan of
> > work for 9.7.
>
> It looks OK to me. Thanks for sharing it.

Great! Let's work together.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Sat, Mar 19, 2016 at 3:22 PM, Dilip Kumar wrote:
> On Mon, Mar 14, 2016 at 3:09 AM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
>> I've drawn graphs for these measurements. The variation doesn't look
>> random here. TPS is going higher from measurement to measurement. I bet
>> you did the measurements sequentially.
>> I think we should do more measurements until TPS stops growing and beyond
>> to accumulate average statistics. Could you please do the same tests
>> but with 50 runs.
>
> I have taken readings at different client counts (1, 2 and 4).
>
> 1. Avg: taken as the average after discarding the lowest 5 and highest 5 readings.
> 2. With 4 clients I have only 30 readings.
>
> Summary:
>
> I think if we check avg or median, at least at 1 or 2 clients head always
> looks the winner, but the same is not the case with 4 clients. And with 4
> clients I can see much more fluctuation in the readings.
>
>          Head (1 Client)  Patch (1 Client)  Head (2 Client)  Patch (2 Client)  Head (4 Client)  Patch (4 Client)
> Avg:     19628            19578             37180            36536             70044            70731
> Median:  19663            19581             37967            37484             73003            75376
>
> Below are all the readings (arranged in sorted order):
>
> Runs  Head (1)  Patch (1)  Head (2)  Patch (2)  Head (4)  Patch (4)
> 1     18191     18153      29454     26128      49931     47210
> 2     18365     18768      31218     26956      53358     47287
> 3     19053     18769      31808     29286      53575     55458
> 4     19128     18915      31860     30558      54282     55794
> 5     19160     18948      32411     30945      56629     56253
> 6     19177     19055      32453     31316      57595     58147
> 7     19351     19232      32571     31703      59490     58543
> 8     19353     19239      33294     32029      63887     58990
> 9     19361     19255      33936     32217      64718     60233
> 10    19390     19297      34361     32767      65737     68210
> 11    19452     19368      34563     34487      65881     71409
> 12    19478     19387      36110     34907      67151     72108
> 13    19488     19388      36221     34936      70974     73977
> 14    19490     19395      36461     35068      72212     74891
> 15    19512     19406      36712     35298      73003     75376
> 16    19538     19450      37104     35492      74842     75468
> 17    19547     19487      37246     36648      75400     75515
> 18    19592     19521      37567     37263      75573     75626
> 19    19627     19536      37707     37430      75798     75745
> 20    19661     19556      37958     37461      75834     75868
> 21    19666     19607      37976     37507      76240     76161
> 22    19701     19624      38060     37557      76426     76162
> 23    19708     19643      38244     37717      76770     76333
> 24    19748     19684      38272     38285      77011     77009
> 25    19751     19694      38467     38294      77114     77168
> 26    19776     19695      38524     38469      77630     77318
> 27    19781     19709      38625     38642      77865     77550
> 28    19786     19765      38756     38643      77912     77904
> 29    19796     19823      38939     38649      78242     78736
> 30    19826     19847      39139     38659
> 31    19837     19899      39208     38713
> 32    19849     19909      39211     38837
> 33    19854     19932      39230     38876
> 34    19867     19949      39249     39088
> 35    19891     19990      39259     39148
> 36    20038     20085      39286     39453
> 37    20083     20128      39435     39563
> 38    20143     20166      39448     39959
> 39    20191     20198      39475     40495
> 40    20437     20455      40375     40664

So, I think there really is some regression here at low client counts. I have the following hypothesis about that. In some cases we use more atomic operations than before. For instance, in BufferAlloc we have the following block of code. The previous code dealt with this without atomic operations, relying on the spinlock. So, we have 2 extra atomic operations here, and a similar situation in other places.
pg_atomic_fetch_and_u32(&buf->state, ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT | BUF_USAGECOUNT_MASK)); if (relpersistence == RELPERSISTENCE_PERMANENT) pg_atomic_fetch_or_u32(&buf->state, BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE); else pg_atomic_fetch_or_u32(&buf->state, BM_TAG_VALID | BUF_USAGECOUNT_ONE); Actually, we behave like old code and do such modifications without increasing number of atomic operations. We can just calculate new value of state (including unset of BM_LOCKED flag) and write it to the buffer. In this case we don't require more atomic operations. However, it becomes not safe to use atomic decrement in UnpinBuffer(). We can use there loop of CAS which wait buffer header to be unlocked like PinBuffer() does. I'm not sure if this affects multicore scalability, but I think it worth trying. So, this idea is implemented in attached patch. Please, try it for both regression on lower number of clients and scalability on large number of clients. Other changes in this patch: 1) PinBuffer() and UnpinBuffer() use exponential backoff in spindealy like LockBufHdr() does. 2) Previous patch contained revert of ac1d7945f866b1928c2554c0f80fd52d7f92. This was not intentional. No, it doesn't contains this revert. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company pinunpin-cas-4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
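To make the proposal above concrete, here is a minimal sketch (an illustration, not the patch itself) of folding the fetch-and/fetch-or pair and the header unlock into a single plain atomic write, assuming the patch's convention that LockBufHdr() returns the state word with BM_LOCKED set:

    uint32      state = LockBufHdr(buf);

    state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED |
               BM_IO_ERROR | BM_PERMANENT | BUF_USAGECOUNT_MASK);
    state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;
    if (relpersistence == RELPERSISTENCE_PERMANENT)
        state |= BM_PERMANENT;

    /* clearing BM_LOCKED in the same write also releases the header lock */
    pg_atomic_write_u32(&buf->state, state & ~BM_LOCKED);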
Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'
On Mon, Mar 21, 2016 at 9:34 AM, Abhijit Menon-Sen wrote:
> At 2016-03-19 17:46:25 -0300, alvhe...@2ndquadrant.com wrote:
>>
>> I don't think the first patch is acceptable standalone -- we need both
>> things together.
>
> OK.
>
>> But in reality, pg_depend handling is mixed up with other changes all
>> over the place.
>
> Yes, I noticed that. :-/
>
>> Anyway I think this should be something along the lines of
>> ALTER FUNCTION foo() DEPENDS ON EXTENSION bar;
>
> OK. That's reasonable.
>
>> ALTER FUNCTION foo() OWNED BY EXTENSION bar;
>
> If the function is really OWNED BY EXTENSION, then the right way to
> declare it would seem to be ALTER EXTENSION … ADD FUNCTION. I prefer
> DEPENDS ON EXTENSION for this reason, there's no ambiguity about what
> we're declaring.

I'm not sure why we want to create the new dependency type with an ALTER FUNCTION command rather than ALTER EXTENSION. Since we already add 'e' dependencies with the ALTER EXTENSION command, why should it be different for 'x' dependencies? The argument could be that the 'x' dependency type would be used for objects other than extensions. But that is a much more general problem, and it's unclear that we would end up with this behaviour and this dependency type. So, I would prefer this patch to extend the ALTER EXTENSION command, since it's aimed at this particular problem. I even think we could extend the existing grammar rule

    | ALTER EXTENSION name add_drop FUNCTION function_with_argtypes

> *** AlterExtensionContentsStmt:
> *** 3982,3987 ****
> --- 3987,3993 ----
>             n->objtype = OBJECT_FUNCTION;
>             n->objname = $6->funcname;
>             n->objargs = $6->funcargs;
> +           n->deptype = 'e';
>             $$ = (Node *)n;
>         }

instead of adding another

> +     | ALTER EXTENSION name add_drop DEPENDENT FUNCTION function_with_argtypes
> +         {
> +             AlterExtensionContentsStmt *n = makeNode(AlterExtensionContentsStmt);
> +             n->extname = $3;
> +             n->action = $4;
> +             n->objtype = OBJECT_FUNCTION;
> +             n->objname = $7->funcname;
> +             n->objargs = $7->funcargs;
> +             n->deptype = 'x';
>               $$ = (Node *)n;
>           }

by introducing a separate rule, extension_dependency_type. In the same way we could add a dependency type parameter to each ALTER EXTENSION grammar rule. Therefore, the existing functionality would be extended in a natural way with small changes to the code.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] improving GROUP BY estimation
Hi, Dean!

On Fri, Mar 18, 2016 at 1:20 PM, Dean Rasheed wrote:
> Probably a better URL to give is
> http://www.adellera.it/investigations/distinct_balls/ which has a link
> to the PDF version of the paper and also some supporting material.
>
> However, while that paper is in general very clear, I don't think it
> gives a very clear explanation of that particular formula, and it
> doesn't explain what it represents. It merely says that if "i" can be
> ignored "for some reason (e.g. i << Nr)", then that formula is an
> approximation to the exact "without replacement" formula, which is the
> subject of that paper.
>
> But actually, that formula is the exact formula for the expected
> number of distinct values when selecting *with replacement* from a
> collection where each of the distinct values occurs an equal number of
> times. So I think we should say that.
>
> Perhaps something along the lines of:
>
> /*
>  * Update the estimate based on the restriction selectivity.
>  *
>  * This uses the formula for the expected number of distinct values
>  * when selecting with replacement from a collection where each of
>  * the distinct values occurs an equal number of times (a uniform
>  * distribution of values).  This is a very close approximation to
>  * the more correct method of selecting without replacement when
>  * the number of distinct values is quite large --- for example,
>  * see http://www.adellera.it/investigations/distinct_balls/.
>  * Additionally, it also turns out to work very well even when the
>  * number of distinct values is small.
>  */

+1
Thank you for your work on this patch. The formula you propose and its explanation look great!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
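For readers without the paper at hand, the expectation the comment describes follows from a short linearity-of-expectation argument. With $n$ draws taken with replacement from a collection in which each of $d$ distinct values occurs equally often, a particular value is missed on a single draw with probability $1 - 1/d$, hence on all $n$ independent draws with probability $(1 - 1/d)^n$, giving

    E[D] = d \left( 1 - \left( 1 - \frac{1}{d} \right)^{n} \right)

(the patch's actual expression is phrased in terms of the relation's total and selected row counts rather than $d$ and $n$ directly, so treat this as the general shape of the formula rather than the exact code).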
Re: [HACKERS] [PATCH] Phrase search ported to 9.6
Hi!

I see that the patch changes existing regression tests in tsearch2.out.

*** a/contrib/tsearch2/expected/tsearch2.out
--- b/contrib/tsearch2/expected/tsearch2.out
*** SELECT '(!1|2)&3'::tsquery;
*** 278,292 ****
  (1 row)

  SELECT '1|(2|(4|(5|6)))'::tsquery;
!                  tsquery
! -----------------------------------------
!  '1' | ( '2' | ( '4' | ( '5' | '6' ) ) )
  (1 row)

  SELECT '1|2|4|5|6'::tsquery;
!                  tsquery
! -----------------------------------------
!  ( ( ( '1' | '2' ) | '4' ) | '5' ) | '6'
  (1 row)

  SELECT '1&(2&(4&(5&6)))'::tsquery;
--- 278,292 ----
  (1 row)

  SELECT '1|(2|(4|(5|6)))'::tsquery;
!            tsquery
! -----------------------------
!  '1' | '2' | '4' | '5' | '6'
  (1 row)

  SELECT '1|2|4|5|6'::tsquery;
!            tsquery
! -----------------------------
!  '1' | '2' | '4' | '5' | '6'
  (1 row)

This change looks like an improvement: without the braces, tsquery readability is much better.

*** select rewrite('moscow & hotel', 'select
*** 461,469 ****
  (1 row)

  select rewrite('bar & new & qq & foo & york', 'select keyword, sample from test_tsquery'::text );
!                                rewrite
! ---------------------------------------------------------------------------------
!  'citi' & 'foo' & ( 'bar' | 'qq' ) & ( 'nyc' | ( 'big' & 'appl' | 'new' & 'york' ) )
  (1 row)
--- 461,469 ----
  (1 row)

  select rewrite('bar & new & qq & foo & york', 'select keyword, sample from test_tsquery'::text );
!                                rewrite
! ---------------------------------------------------------------------------------
!  ( 'nyc' | 'big' & 'appl' | 'new' & 'york' ) & 'citi' & 'foo' & ( 'bar' | 'qq' )
  (1 row)

  select rewrite( ARRAY['moscow', keyword, sample] ) from test_tsquery;

However, reorderings such as this one are less clear and need motivation.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'
On Mon, Mar 21, 2016 at 2:19 PM, Abhijit Menon-Sen wrote:
> At 2016-03-21 13:04:33 +0300, a.korot...@postgrespro.ru wrote:
>>
>> I'm not sure why we want to create the new dependency type with an
>> ALTER FUNCTION command rather than ALTER EXTENSION.
>
> It's a matter of semantics. It means something very different than what
> an 'e' dependency means. The extension doesn't own the function (and so
> pg_dump shouldn't ignore it), but the function depends on the extension
> (and so dropping the extension should drop it).
>
>> The argument could be that the 'x' dependency type would be used for
>> objects other than extensions.
>
> I suppose this is possible, but yes, I agree with you that it's not
> clear how or why this would be useful.
>
>> So, I would prefer this patch to extend the ALTER EXTENSION command,
>> since it's aimed at this particular problem.
>
> OK, so that's what the patch does, and it's certainly the simplest
> approach for reasons discussed earlier (though perhaps not as clear
> semantically as the ALTER FUNCTION approach). But:
>
>> I even think we could extend the existing grammar rule
>>
>>     | ALTER EXTENSION name add_drop FUNCTION function_with_argtypes
>>
>> *** AlterExtensionContentsStmt:
>> *** 3982,3987 ****
>> --- 3987,3993 ----
>>             n->objtype = OBJECT_FUNCTION;
>>             n->objname = $6->funcname;
>>             n->objargs = $6->funcargs;
>> +           n->deptype = 'e';
>>             $$ = (Node *)n;
>>         }
>
> How exactly do you propose to do this, i.e., what would the final
> command to declare an 'x' dependency look like?

I proposed something like this:

    extension_dependency_type:
          DEPENDENT     { $$ = 'x'; }
        | /*EMPTY*/     { $$ = 'e'; }
        ;

    ...

    | ALTER EXTENSION name add_drop extension_dependency_type FUNCTION function_with_argtypes
        {
            AlterExtensionContentsStmt *n = makeNode(AlterExtensionContentsStmt);
            n->extname = $3;
            n->action = $4;
            n->objtype = OBJECT_FUNCTION;
            n->objname = $7->funcname;
            n->objargs = $7->funcargs;
            n->deptype = $5;
            $$ = (Node *)n;
        }

I didn't try it. Probably it causes grammar conflicts; in that case I don't insist on it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: Access method extendability
On Tue, Mar 22, 2016 at 1:26 AM, Alvaro Herrera wrote:
> So. Before this version of the patch was posted on Nov 4th 2015, both
> Tom and Heikki had said essentially "CREATE ACCESS METHOD is worthless,
> let's pursue this stuff without those commands".
> http://www.postgresql.org/message-id/54730dfd.2060...@vmware.com (Nov 2014)

Yes, that's it. In this email Heikki asserts that INSERTs into pg_am done by extensions would survive pg_upgrade, because pg_dump will dump the CREATE EXTENSION command, which executes the extension script. In my response I pointed out that this is not correct: the way pg_dump works in binary upgrade mode, we would miss records in pg_am. I haven't received any answer to that.
http://www.postgresql.org/message-id/capphfdsbkntlchypngr0ffu9wlhh__nukdp13qlqukgtnv_...@mail.gmail.com
And, as Petr Jelinek mentioned, there is also the problem of dependencies in DROP EXTENSION.

> http://www.postgresql.org/message-id/26822.1414516...@sss.pgh.pa.us (Oct 2014)
>
> And then Alexander posted this version, without any discussion that
> evidenced that those old objections were overridden. What happened
> here? Did somebody discuss this in unarchived meetings? If so, it
> would be good to know about that in this thread.

After that, Tom Lane committed a very large patch reworking the AM API. One of the aims of this patch was support for the CREATE ACCESS METHOD command.
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=65c5fcd353a859da9e61bfb2b92a99f12937de3b
I initially positioned this patch as refactoring to support the CREATE ACCESS METHOD command.
http://www.postgresql.org/message-id/capphfdtlisxmxk2b4tw+4+no_1-t0rao5foyszho6+sn2om...@mail.gmail.com
During the discussion Tom mentioned the future CREATE ACCESS METHOD command.
http://www.postgresql.org/message-id/16164.1439222...@sss.pgh.pa.us
I don't think Tom would have put so much effort in this direction if he remained opposed to this command.

> I noticed this state of affairs because I started reading the complete
> thread in order to verify whether a consensus had been reached regarding
> the move of IndexQualInfo and GenericCosts to selfuncs.h. But I only
> see that Jim Nasby mentioned it and Alexander said "yes they move";
> nothing else. The reason for raising this is that, according to
> Alexander, new index AMs will want to use those structs; but current
> index AMs only include index_selfuncs.h, and none of them includes
> selfuncs.h, so it seems completely wrong to be importing all the planner
> knowledge embodied in selfuncs.h into extension index AMs.
>
> One observation is that the bloom AM patch (0003 of this series) uses
> GenericCosts but not IndexQualInfo. But btcostestimate() and
> gincostestimate() both use IndexQualInfo, so other AMs might well want
> to use it too.
>
> We could move GenericCosts and the prototype for genericcostestimate()
> to index_selfuncs.h; that much is pretty reasonable. But I'm not sure
> what to do about IndexQualInfo. We could just add forward struct
> declarations for RestrictInfo and Node to index_selfuncs.h. But then
> the extension code is going to have to import the includes where those
> are defined anyway. Certainly it seems that deconstruct_indexquals()
> (which returns a list of IndexQualInfo) is going to be needed by
> extension index AMs, at least ...

Thank you for highlighting these issues.

> I've made a few edits to the patch already, but I need to do some more
> testing.

Did you already fix the issues above, or do you need me to fix them?
> Particularly I would like to understand the relcache issues to > figure out whether the current one is right. Good point. I'll also try to find relcache issues. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Tue, Mar 22, 2016 at 7:57 AM, Dilip Kumar wrote:
> On Tue, Mar 22, 2016 at 12:31 PM, Dilip Kumar wrote:
>> !     pg_atomic_write_u32(&bufHdr->state, state);
>>       } while (!StartBufferIO(bufHdr, true));
>>
>> Better write some comment about clearing BM_LOCKED from the state
>> directly, so that we need not call UnlockBufHdr explicitly; otherwise
>> it's confusing.
>
> Few more comments..
>
> *** 828,837 ****
>     */
>    do
>    {
> !     LockBufHdr(bufHdr);
> !     Assert(bufHdr->flags & BM_VALID);
> !     bufHdr->flags &= ~BM_VALID;
> !     UnlockBufHdr(bufHdr);
>    } while (!StartBufferIO(bufHdr, true));
>    }
>    }
> --- 826,834 ----
>     */
>    do
>    {
> !     uint32 state = LockBufHdr(bufHdr);
> !     state &= ~(BM_VALID | BM_LOCKED);
> !     pg_atomic_write_u32(&bufHdr->state, state);
>    } while (!StartBufferIO(bufHdr, true));
>
> 1. Previously there was an Assert; any reason why we removed it?
>    Assert(bufHdr->flags & BM_VALID);

It was missed. In the attached patch I've put it back.

> 2. And also if we don't need the assert, then can't we directly clear the
> BM_VALID flag from the state variable (if not locked) like pin/unpin
> buffer does, without taking the buffer header lock?

In this version of the patch it could also be done as a loop of CAS operations. But I'm not inclined to do so, because it would significantly complicate the code. It's not yet clear that traffic in this place is high enough to justify such optimizations.
Since v4, the patch implements a slightly different approach. Could you please test it? We need to check whether this approach is worth putting more effort into, or throw it away otherwise.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pinunpin-cas-5.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
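For reference, the CAS loop being weighed against the header lock would look roughly like the following sketch (an illustration under the patch's BM_LOCKED-in-state-word convention, not code from the patch); it omits the exponential spin delay a real implementation would add, which is part of why it complicates the code compared to the three-line LockBufHdr version:

    uint32      state = pg_atomic_read_u32(&bufHdr->state);

    for (;;)
    {
        if (state & BM_LOCKED)
        {
            /* someone holds the header lock; re-read and retry */
            state = pg_atomic_read_u32(&bufHdr->state);
            continue;
        }
        if (pg_atomic_compare_exchange_u32(&bufHdr->state, &state,
                                           state & ~BM_VALID))
            break;              /* on failure, 'state' is refreshed by the CAS */
    }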
Re: [HACKERS] pg_dump dump catalog ACLs
Hi, Stephen!

I've signed up to review this patch. At first the patch didn't apply cleanly: it had a conflict at the end of system_views.sql. But that was very easy to fix.

On Mon, Mar 14, 2016 at 7:00 PM, Stephen Frost wrote:
> I've not included it in this patch, but my thinking here would be to add
> a check to the GRANT functions which checks 'if (creating_extension)'
> and records/updates the entries in pg_init_privs for those objects.
> This is similar to how we handle dependencies for extensions currently.
> That way, extensions don't have to do anything and if the user changes
> the permissions on an extension's object, the permissions for that
> object would then be dumped out.
>
> This still requires more testing, documentation, and I'll see about
> making it work completely for extensions also, but wanted to provide an
> update and solicit for review/comments.
>
> The patches have been broken up in what I hope is a logical way for
> those who wish to review/comment:
>
> #1 - Add the pg_init_privs catalog
> #2 - Change pg_dump to use a bitmap instead of boolean for 'dump'
> #3 - Split 'dump' into 'dump' and 'dump_contains'
> #4 - Make pg_dump include changed ACLs in dump of 'pg_catalog'

+ "CASE WHEN pip.initprivs IS NOT DISTINCT FROM n.nspacl "
+ "THEN false ELSE true END as dumpacl "

Why not just write "pip.initprivs IS DISTINCT FROM n.nspacl AS dumpacl"? I think there are a few places where the CASE could be eliminated.

+ appendPQExpBuffer(query, "SELECT tableoid, oid, proname AS aggname, "
+                   "pronamespace AS aggnamespace, "
+                   "pronargs, proargtypes, "
+                   "(%s proowner) AS rolname, "
+                   "proacl AS aggacl "
+                   "FROM pg_proc p "
+                   "WHERE proisagg AND ("
+                   "pronamespace != "
+                   "(SELECT oid FROM pg_namespace "
+                   "WHERE nspname = 'pg_catalog') OR "
+                   "EXISTS (SELECT * FROM pg_init_privs pip "
+                   "WHERE p.oid = pip.objoid AND pip.classoid = "
+                   "(SELECT oid FROM pg_class "
+                   "WHERE relname = 'pg_proc') "
+                   "AND p.proacl <> pip.initprivs)",
+                   username_subquery);

Why do we compare ACLs using <> here but IS DISTINCT FROM for other objects? Is that intentional? Assuming most proacl values are NULL, the ACL wouldn't be dumped after you change it, since "NULL <> value" yields NULL rather than true.

In general, these checks using subqueries look complicated to me, hard to validate, and in need of a lot of testing. Could we implement a function like "bool need_acl_dump(oid objoid, oid classoid, int objsubid, proacl acl)" and call it?

> #5 - Remove 'if (!superuser()) ereport()' calls and REVOKE EXECUTE

Other things in the patches look good to me, except that they need tests and documentation. I'm marking this waiting on author.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] improving GROUP BY estimation
Hi, Tomas!

On Mon, Mar 21, 2016 at 2:39 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> On Fri, Mar 18, 2016 at 1:20 PM, Dean Rasheed wrote:
>> Probably a better URL to give is
>> http://www.adellera.it/investigations/distinct_balls/ which has a link
>> to the PDF version of the paper and also some supporting material.
>>
>> However, while that paper is in general very clear, I don't think it
>> gives a very clear explanation of that particular formula, and it
>> doesn't explain what it represents. It merely says that if "i" can be
>> ignored "for some reason (e.g. i << Nr)", then that formula is an
>> approximation to the exact "without replacement" formula, which is the
>> subject of that paper.
>>
>> But actually, that formula is the exact formula for the expected
>> number of distinct values when selecting *with replacement* from a
>> collection where each of the distinct values occurs an equal number of
>> times. So I think we should say that.
>>
>> Perhaps something along the lines of:
>>
>> /*
>>  * Update the estimate based on the restriction selectivity.
>>  *
>>  * This uses the formula for the expected number of distinct values
>>  * when selecting with replacement from a collection where each of
>>  * the distinct values occurs an equal number of times (a uniform
>>  * distribution of values).  This is a very close approximation to
>>  * the more correct method of selecting without replacement when
>>  * the number of distinct values is quite large --- for example,
>>  * see http://www.adellera.it/investigations/distinct_balls/.
>>  * Additionally, it also turns out to work very well even when the
>>  * number of distinct values is small.
>>  */
>
> +1
> Thank you for your work on this patch. The formula you propose and its
> explanation look great!

I think you should send a revision of the patch including the comments proposed by Dean Rasheed. I'm switching the patch status to waiting on author in the commitfest.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: Access method extendability
On Tue, Mar 22, 2016 at 11:53 PM, Alvaro Herrera wrote:
> Alexander Korotkov wrote:
>> On Tue, Mar 22, 2016 at 1:26 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
>>
>>> I noticed this state of affairs because I started reading the complete
>>> thread in order to verify whether a consensus had been reached regarding
>>> the move of IndexQualInfo and GenericCosts to selfuncs.h. But I only
>>> see that Jim Nasby mentioned it and Alexander said "yes they move";
>>> nothing else. The reason for raising this is that, according to
>>> Alexander, new index AMs will want to use those structs; but current
>>> index AMs only include index_selfuncs.h, and none of them includes
>>> selfuncs.h, so it seems completely wrong to be importing all the planner
>>> knowledge embodied in selfuncs.h into extension index AMs.
>>>
>>> One observation is that the bloom AM patch (0003 of this series) uses
>>> GenericCosts but not IndexQualInfo. But btcostestimate() and
>>> gincostestimate() both use IndexQualInfo, so other AMs might well want
>>> to use it too.
>>>
>>> We could move GenericCosts and the prototype for genericcostestimate()
>>> to index_selfuncs.h; that much is pretty reasonable. But I'm not sure
>>> what to do about IndexQualInfo. We could just add forward struct
>>> declarations for RestrictInfo and Node to index_selfuncs.h. But then
>>> the extension code is going to have to import the includes where those
>>> are defined anyway. Certainly it seems that deconstruct_indexquals()
>>> (which returns a list of IndexQualInfo) is going to be needed by
>>> extension index AMs, at least ...
>>
>> Thank you for highlighting these issues.
>
> After thinking some more about this, I think it's all right to move both
> IndexQualInfo and GenericCosts to selfuncs.h. The separation that we
> want is this: the selectivity functions themselves need access to a lot
> of planner knowledge, and so selfuncs.h knows a lot about planner. But
> the code that _calls_ the selectivity function doesn't; and
> index_selfuncs.h is about the latter. Properly architected extension
> index AMs are going to have their selectivity functions in a separate .c
> file which knows a lot about planner, and which includes selfuncs.h (and
> maybe index_selfuncs.h if it wants to piggyback on the existing
> selectivity functions, but that's unlikely); only the prototype for the
> selfunc is going to be needed elsewhere, and the rest of the index code
> is not going to know anything about planner stuff so it will not need to
> include selfuncs.h nor index_selfuncs.h.

Right, the purposes of the headers are:
* selfuncs.h – for those who are going to estimate selectivity;
* index_selfuncs.h – for those who are going to call the selectivity estimation functions of built-in AMs.

From this point of view we should keep both IndexQualInfo and GenericCosts in selfuncs.h.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
Hi!

Since the patch exposing current wait event information in PGPROC was committed, it has become possible to collect wait event statistics using sampling. Although I'm not a fan of this approach, it is still useful and definitely better than nothing. In PostgresPro, we actually already had it. Now it's too late to include something new in 9.6; this is why I've reworked it and published it on github as an extension for 9.6:

https://github.com/postgrespro/pg_wait_sampling/

Hopefully, it can be considered as a contrib module for 9.7.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
On Thu, Mar 24, 2016 at 5:17 PM, Robert Haas wrote:
> On Fri, Mar 18, 2016 at 1:19 PM, Anastasia Lubennikova wrote:
>> Please, find the new version of the patch attached. Now it has WAL
>> functionality.
>>
>> A detailed description of the feature can be found in the README draft:
>> https://goo.gl/50O8Q0
>>
>> This patch is pretty complicated, so I ask everyone who is interested in
>> this feature to help with reviewing and testing it. I will be grateful
>> for any feedback. But please, don't complain about code style, it is
>> still work in progress.
>>
>> Next things I'm going to do:
>> 1. More debugging and testing. I'm going to attach in the next message a
>>    couple of sql scripts for testing.
>> 2. Fix NULLs processing
>> 3. Add a flag into pg_index that allows to enable/disable compression for
>>    each particular index.
>> 4. Recheck locking considerations. I tried to write the code as little
>>    invasive as possible, but we need to make sure that the algorithm is
>>    still correct.
>> 5. Change BTMaxItemSize
>> 6. Bring back microvacuum functionality.
>
> I really like this idea, and the performance results seem impressive,
> but I think we should push this out to 9.7. A btree patch that didn't
> have WAL support until two and a half weeks into the final CommitFest
> just doesn't seem to me like a good candidate. First, as a general
> matter, if a patch isn't code-complete at the start of a CommitFest,
> it's reasonable to say that it should be reviewed but not necessarily
> committed in that CommitFest. This patch has had some review, but I'm
> not sure how deep that review is, and I think it's had no code review
> at all of the WAL logging changes, which were submitted only a week
> ago, well after the CF deadline. Second, the btree AM is a
> particularly poor place to introduce possibly destabilizing changes.
> Everybody depends on it, all the time, for everything. And despite
> new tools like amcheck, it's not a particularly easy thing to debug.

It's all true. But:
1) It's a great feature many users dream about.
2) The patch is not very big.
3) The patch doesn't introduce significant infrastructural changes. It just changes some well-isolated places.

Let's give it a chance. I've signed up as an additional reviewer and I'll do my best at spotting all possible issues in this patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: Access method extendability
On Thu, Mar 24, 2016 at 6:19 PM, Alvaro Herrera wrote: > .. Oh crap. I just noticed I forgot to update a comment in pg_dump's > getAccessMethods. And we're missing psql tab-complete support for the > new commands. Attached patches fix both these issues. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company fix-pg_dump-comment.patch Description: Binary data fix-am-tab-complete.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Tue, Mar 22, 2016 at 1:08 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> On Tue, Mar 22, 2016 at 7:57 AM, Dilip Kumar wrote:
>> On Tue, Mar 22, 2016 at 12:31 PM, Dilip Kumar wrote:
>>> !     pg_atomic_write_u32(&bufHdr->state, state);
>>>       } while (!StartBufferIO(bufHdr, true));
>>>
>>> Better write some comment about clearing BM_LOCKED from the state
>>> directly, so that we need not call UnlockBufHdr explicitly; otherwise
>>> it's confusing.
>>
>> Few more comments..
>>
>> *** 828,837 ****
>>     */
>>    do
>>    {
>> !     LockBufHdr(bufHdr);
>> !     Assert(bufHdr->flags & BM_VALID);
>> !     bufHdr->flags &= ~BM_VALID;
>> !     UnlockBufHdr(bufHdr);
>>    } while (!StartBufferIO(bufHdr, true));
>>    }
>>    }
>> --- 826,834 ----
>>     */
>>    do
>>    {
>> !     uint32 state = LockBufHdr(bufHdr);
>> !     state &= ~(BM_VALID | BM_LOCKED);
>> !     pg_atomic_write_u32(&bufHdr->state, state);
>>    } while (!StartBufferIO(bufHdr, true));
>>
>> 1. Previously there was an Assert; any reason why we removed it?
>>    Assert(bufHdr->flags & BM_VALID);
>
> It was missed. In the attached patch I've put it back.
>
>> 2. And also if we don't need the assert, then can't we directly clear
>> the BM_VALID flag from the state variable (if not locked) like pin/unpin
>> buffer does, without taking the buffer header lock?
>
> In this version of the patch it could also be done as a loop of CAS
> operations. But I'm not inclined to do so, because it would significantly
> complicate the code. It's not yet clear that traffic in this place is
> high enough to justify such optimizations.
> Since v4, the patch implements a slightly different approach. Could you
> please test it? We need to check whether this approach is worth putting
> more effort into, or throw it away otherwise.

Could anybody run benchmarks? Feature freeze is soon, but it would be *very nice* to fit this into the 9.6 release cycle, because it greatly improves scalability on large machines. Without this patch, PostgreSQL 9.6 will be significantly behind competitors like MySQL 5.7.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] [PATCH] Phrase search ported to 9.6
On Fri, Mar 25, 2016 at 6:42 PM, David Steele wrote:
> On 3/16/16 12:38 PM, Dmitry Ivanov wrote:
>> I've made an attempt to fix some of the issues you've listed, although
>> there's still much work to be done. I'll add some comments later.
>
> Do you know when you'll have a chance to respond to reviews and provide a
> new patch?
>
> Time is short and it's not encouraging that you say there is "still much
> work to be done". Perhaps it would be best to mark this "returned with
> feedback"?

Dmitry will provide a new version of the patch today.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
Hi, Dilip!

On Fri, Mar 25, 2016 at 8:32 PM, Dilip Kumar wrote:
> On Fri, Mar 25, 2016 at 8:09 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
>> Could anybody run benchmarks? Feature freeze is soon, but it would be
>> *very nice* to fit this into the 9.6 release cycle, because it greatly
>> improves scalability on large machines. Without this patch, PostgreSQL
>> 9.6 will be significantly behind competitors like MySQL 5.7.
>
> I have run the performance tests and here are the results. With the
> latest patch I did not see any regression at lower client counts (median
> of 3 readings).
>
> Scale factor 1000, shared_buffers 8GB, read-only:
>
> Client     Base     Patch
>      1    12957     13068
>      2    24931     25816
>      4    46311     48767
>     32   300921    310062
>     64   387623    493843
>    128   249635    583513
>
> Scale factor 300, shared_buffers 8GB, read-only:
>
> Client     Base     Patch
>      1    14537     14586  --> this number looks a little low; generally
>                                I get ~18000 (will recheck)
>      2    34703     33929  --> may be run-to-run variance (once I get
>                                time, will recheck)
>      4    67744     69069
>     32   312575    336012
>     64   213312    539056
>    128   190139    380122
>
> Summary:
>
> Actually, with 64 clients we have seen ~470,000 TPS with head also, by
> reverting commit 6150a1b0.
> Refer to this thread:
> http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
>
> I haven't tested this patch with commit 6150a1b0 reverted, so I'm not
> sure whether this patch can give even better performance.
>
> It also points to the case Andres has mentioned in this thread:
> http://www.postgresql.org/message-id/20160226191158.3vidtk3ktcmhi...@alap3.anarazel.de

Thank you very much for testing!
I also got access to a 4 x 18 Intel server with 144 threads. I'm going to post results of tests on this server next Monday.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> Thank you very much for testing!
> I also got access to a 4 x 18 Intel server with 144 threads. I'm going to
> post results of tests on this server next Monday.

I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j 100 -M prepared -T 300. See the results in the table and chart.

clients   master       v3       v5
      1    11671    12507    12679
      2    24650    26005    25010
      4    49631    48863    49811
      8    96790    96441    99946
     10   121275   119928   124100
     20   243066   243365   246432
     30   359616   342241   357310
     40   431375   415310   441619
     50   489991   489896   500590
     60   538057   636473   554069
     70   588659   714426   738535
     80   405008   923039   902632
     90   295443  1181247  1155918
    100   258695  1323125  1325019
    110   238842  1393767  1410274
    120   226018  1432504  1474982
    130   215102  1465459  1503241
    140   206415  1470454  1505380
    150   197850  1475479  1519908
    160   190935  1420915  1484868
    170   185835  1438965  1453128
    180   182519  1416252  1453098

My conclusions are the following:
1) We don't observe any regression in v5 in comparison to master.
2) v5 in most cases slightly outperforms v3.

I'm going to do some code cleanup of v5 on Monday.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Sun, Mar 27, 2016 at 3:10 PM, Andres Freund wrote:
> On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
>> On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
>>> Thank you very much for testing!
>>> I also got access to a 4 x 18 Intel server with 144 threads. I'm going
>>> to post results of tests on this server next Monday.
>>
>> I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j
>> 100 -M prepared -T 300. See the results in the table and chart.
>>
>> clients   master       v3       v5
>>       1    11671    12507    12679
>>       2    24650    26005    25010
>>       4    49631    48863    49811
>>       8    96790    96441    99946
>>      10   121275   119928   124100
>>      20   243066   243365   246432
>>      30   359616   342241   357310
>>      40   431375   415310   441619
>>      50   489991   489896   500590
>>      60   538057   636473   554069
>>      70   588659   714426   738535
>>      80   405008   923039   902632
>>      90   295443  1181247  1155918
>>     100   258695  1323125  1325019
>>     110   238842  1393767  1410274
>>     120   226018  1432504  1474982
>>     130   215102  1465459  1503241
>>     140   206415  1470454  1505380
>>     150   197850  1475479  1519908
>>     160   190935  1420915  1484868
>>     170   185835  1438965  1453128
>>     180   182519  1416252  1453098
>>
>> My conclusions are the following:
>> 1) We don't observe any regression in v5 in comparison to master.
>> 2) v5 in most cases slightly outperforms v3.
>
> What commit did you base these tests on? I guess something recent, after
> 98a64d0bd?

Yes, more recent than 98a64d0bd. It was based on 676265eb7b.

>> I'm going to do some code cleanup of v5 on Monday
>
> Ok, I'll try to do a review and possibly commit after that.

Sounds good.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] [PATCH] Phrase search ported to 9.6
Hi! On Sat, Mar 26, 2016 at 12:02 AM, David Steele wrote: > On 3/25/16 3:54 PM, Artur Zakirov wrote: > > On 25.03.2016 21:42, Dmitry Ivanov wrote: > >> Sorry for the delay, I desperately needed some time to finish a bunch of > >> dangling tasks. > >> > >> I've added some new comments and clarified the ones that were obscure. > >> Moreover, I felt an urge to recheck most parts of the code since > >> apparently > >> nobody (besides myself) has gone so far yet. > >> > >> On 25.03.16 18:42 MSK, David Steele wrote: > >>> Time is short and it's not encouraging that you say there is "still > much > >> work to be done". > >> > >> Indeed, I was inaccurate. I am more than interested in the positive > >> outcome. > >> > > > > I tested the patch and take a look on it. All regression tests passed. > > The code looks good and the patch introduce a great functionality. > > > > I think the patch can be marked as "Ready for Commiter". But I do not > > feel the force to do that myself. > > > > Also I agree with you about tsvector_setweight(). There is not a problem > > with it because this weights are immutable and so there is not benefits > > from new function. > > Ideally Alexander can also look at it. If not, then you should mark it > "ready for committer" since I doubt any more reviewers will sign on this > late in the CF. > I've checked the last version of the patch. Patch applies cleanly on master, builds without warnings, regression tests pass. The issue I reported about tsquery textual representation is fixed. I'm going to mark this "Ready for Committer". -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
[HACKERS] Autovacuum to prevent wraparound tries to consume xid
Hackers,

one of our customers hit a near-xid-wraparound situation. The xid counter reached the xidStopLimit value, so no transactions could be executed in normal mode. But what I noticed is strange behaviour of the anti-wraparound autovacuum. It vacuums tables, updates pg_class and pg_database, but then fails with the "database is not accepting commands to avoid wraparound data loss in database" message. We ended up in a situation where, according to pg_database, the maximum age of a database was less than 200 million transactions, but transactions still couldn't be executed, because ShmemVariableCache wasn't updated (checked by gdb).

I've reproduced this situation on my laptop as follows:
1) Connect gdb and do "set ShmemVariableCache->nextXid = ShmemVariableCache->xidStopLimit"
2) Stop postgres
3) Make some fake clog: "dd bs=1m if=/dev/zero of=/usr/local/pgsql/data/pg_clog/07FF count=1024"
4) Start postgres

Then I observed the same situation as in the customer's database. The anti-wraparound autovacuum regularly produced the following messages in the log:

ERROR: database is not accepting commands to avoid wraparound data loss in database "template1"
HINT: Stop the postmaster and vacuum that database in single-user mode. You might also need to commit or roll back old prepared transactions.

Finally, all databases were frozen

# SELECT datname, age(datfrozenxid) FROM pg_database;
  datname  │ age
───────────┼──────
 template1 │    0
 template0 │    0
 postgres  │ 5000
(3 rows)

but no transactions could be executed (ShmemVariableCache wasn't updated).

After some debugging, I found that vac_truncate_clog consumes an xid just to produce a warning. I wrote a simple patch which replaces GetCurrentTransactionId() with ShmemVariableCache->nextXid. That completely fixes this situation for me: ShmemVariableCache is now successfully updated.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

fix_vac_truncate_clog_xid_consume.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
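The gist of the described one-line fix, in diff form (a sketch only — the surrounding line is paraphrased; consult the attached patch for the real change):

    --- a/src/backend/commands/vacuum.c
    +++ b/src/backend/commands/vacuum.c
    @@ vac_truncate_clog(...)
    -    TransactionId myXID = GetCurrentTransactionId();
    +    /*
    +     * Don't consume an XID just to compare against the wrap limits;
    +     * read the next XID instead (ReadNewTransactionId() would do the
    +     * same read while holding XidGenLock).
    +     */
    +    TransactionId myXID = ShmemVariableCache->nextXid;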
Re: [HACKERS] WIP: Access method extendability
Hi, Petr!

On Thu, Mar 17, 2016 at 10:56 AM, Petr Jelinek wrote:
> On 16/03/16 15:31, Teodor Sigaev wrote:
>>> Good catch, thanks! Tests were added.
>>
>> I don't see any objection, is consensus reached? I'm close to committing
>> that...
>
> I did only a cursory review on the bloom contrib module and don't really
> have complaints there, but I know you can review that one. I'd like the
> English of the generic_xlog.c description improved but I won't get to it
> before the weekend.

What are your plans regarding the English of generic_xlog.c?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Sun, Mar 27, 2016 at 4:31 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> On Sun, Mar 27, 2016 at 3:10 PM, Andres Freund wrote:
>> On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
>>> On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
>>>> Thank you very much for testing!
>>>> I also got access to a 4 x 18 Intel server with 144 threads. I'm going
>>>> to post results of tests on this server next Monday.
>>>
>>> I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j
>>> 100 -M prepared -T 300. See the results in the table and chart.
>>>
>>> clients   master       v3       v5
>>>       1    11671    12507    12679
>>>       2    24650    26005    25010
>>>       4    49631    48863    49811
>>>       8    96790    96441    99946
>>>      10   121275   119928   124100
>>>      20   243066   243365   246432
>>>      30   359616   342241   357310
>>>      40   431375   415310   441619
>>>      50   489991   489896   500590
>>>      60   538057   636473   554069
>>>      70   588659   714426   738535
>>>      80   405008   923039   902632
>>>      90   295443  1181247  1155918
>>>     100   258695  1323125  1325019
>>>     110   238842  1393767  1410274
>>>     120   226018  1432504  1474982
>>>     130   215102  1465459  1503241
>>>     140   206415  1470454  1505380
>>>     150   197850  1475479  1519908
>>>     160   190935  1420915  1484868
>>>     170   185835  1438965  1453128
>>>     180   182519  1416252  1453098
>>>
>>> My conclusions are the following:
>>> 1) We don't observe any regression in v5 in comparison to master.
>>> 2) v5 in most cases slightly outperforms v3.
>>
>> What commit did you base these tests on? I guess something recent, after
>> 98a64d0bd?
>
> Yes, more recent than 98a64d0bd. It was based on 676265eb7b.
>
>>> I'm going to do some code cleanup of v5 on Monday
>>
>> Ok, I'll try to do a review and possibly commit after that.
>
> Sounds good.

Here is the next revision of the patch. It contains mostly cosmetic changes; comments are adjusted to reflect the behaviour changes. I also changed the atomic AND/OR for local buffers to a read/write pair, which I suppose would be cheaper. However, I don't insist on that, and it could be reverted.

The patch is ready for your review. I'm especially interested in what you think about the way I abstracted the exponential backoff of the spinlock.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pinunpin-cas-6.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
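For context, a sketch of one way such a spin-backoff abstraction can look (names and constants here are assumptions for illustration, not necessarily what the patch uses):

    #define SPINS_PER_DELAY_SKETCH  100         /* assumed */
    #define MIN_DELAY_USEC          1000        /* assumed */
    #define MAX_DELAY_USEC          1000000     /* assumed */

    typedef struct SpinDelayStatus
    {
        int     spins;          /* spins since the last sleep */
        int     cur_delay;      /* current sleep time, in microseconds */
    } SpinDelayStatus;

    static void
    perform_spin_delay(SpinDelayStatus *status)
    {
        if (++(status->spins) < SPINS_PER_DELAY_SKETCH)
            return;             /* keep busy-waiting for a while */

        if (status->cur_delay == 0)
            status->cur_delay = MIN_DELAY_USEC;
        pg_usleep(status->cur_delay);

        /* exponential backoff, capped */
        status->cur_delay = Min(status->cur_delay * 2, MAX_DELAY_USEC);
        status->spins = 0;
    }

    /* usage, e.g. acquiring the buffer header lock: */
    uint32
    LockBufHdr(BufferDesc *desc)
    {
        SpinDelayStatus delay = {0, 0};
        uint32      old_state;

        for (;;)
        {
            old_state = pg_atomic_fetch_or_u32(&desc->state, BM_LOCKED);
            if (!(old_state & BM_LOCKED))
                break;          /* we now hold the header lock */
            perform_spin_delay(&delay);
        }
        return old_state | BM_LOCKED;
    }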
Re: [HACKERS] WIP: Access method extendability
On Tue, Mar 29, 2016 at 6:20 PM, David Steele wrote:
> On 3/28/16 1:26 PM, Alvaro Herrera wrote:
>> Alexander Korotkov wrote:
>>> On Thu, Mar 24, 2016 at 6:19 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
>>>
>>>> .. Oh crap. I just noticed I forgot to update a comment in pg_dump's
>>>> getAccessMethods. And we're missing psql tab-complete support for the
>>>> new commands.
>>>
>>> Attached patches fix both these issues.
>>
>> I see Teodor just pushed these two fixes. Thanks!
>
> Does that mean this patch should be closed or is there more remaining to
> commit?

There are still two pending patches:
* Generic WAL records
* Bloom filter contrib
Teodor is going to commit them, but he is waiting for Petr Jelinek to review the English of the first one.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] PoC: Partial sort
On Tue, Mar 29, 2016 at 4:56 PM, David Steele wrote:
> On 3/23/16 8:39 PM, Peter Geoghegan wrote:
>> This looks like an old change you missed:
>>
>> - * compare_path_fractional_costs
>> + * compare_fractional_path_costs
>>
>> All in all, this looks significantly better. Thanks for your work on
>> this. Sorry for the delay in my response, and that my review was
>> relatively rushed, but I'm rather busy at the moment with fighting
>> fires.
>
> It looks like a new patch is required before this can be marked "ready
> for committer". Will you have that ready soon?

Yes, that's it. I'm working on it now. I'm going to post it by tomorrow.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: Access method extendability
On Tue, Mar 29, 2016 at 8:29 PM, Teodor Sigaev wrote:
>> I incorporated your changes and did some additional refinements on top
>> of them still.
>>
>> Attached is a delta against v12, that should cause fewer issues when
>> merging for Teodor.
>
> But the last version is the 13th?
>
> BTW, it would be cool to add docs in the VII. Internals chapter; the
> description should be similar to the index interface description, i.e.
> how to use the generic WAL interface.

We already have a generic WAL interface description in the comments to generic_xlog.c. I'm not a fan of duplicating things. What about moving the interface description from the comments to the docs completely?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] SQL access to access method details
On Mon, Mar 28, 2016 at 11:02 PM, Jim Nasby wrote:
> While working on a tool to capture catalog/stats info and looking at
> cross version compatibility, I noticed that the pg_am changes removed
> SQL access to a bunch of AM info. [1] indicates that's part of the
> purpose of the patch; are we sure no tools are using this info?

The idea is that this kind of information is an internal API and shouldn't be exposed at the SQL level. I'm not quite sure there is no tool using this info. Do you know of such tools?

> Unlike previous catalog compatibility breaks, this one completely removes
> that information, so if someone was using it they're now completely hosed.
>
> [1] http://www.postgresql.org/message-id/55fec1ab.8010...@2ndquadrant.com

Do you have any example of this information being used? If so, we can think about providing some way of exposing it. If not, the default assumption is that this information shouldn't be exposed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] WIP: Access method extendability
On Tue, Mar 29, 2016 at 8:34 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> On Tue, Mar 29, 2016 at 8:29 PM, Teodor Sigaev wrote:
>>> I incorporated your changes and did some additional refinements on top
>>> of them still.
>>>
>>> Attached is a delta against v12, that should cause fewer issues when
>>> merging for Teodor.
>>
>> But the last version is the 13th?
>>
>> BTW, it would be cool to add docs in the VII. Internals chapter; the
>> description should be similar to the index interface description, i.e.
>> how to use the generic WAL interface.
>
> We already have a generic WAL interface description in the comments to
> generic_xlog.c. I'm not a fan of duplicating things. What about moving
> the interface description from the comments to the docs completely?

I heard no objections. Here is a revision of the patch where the generic WAL interface description has been moved to the documentation. This description contains improvements by Petr Jelinek, Alvaro Herrera and Markus Nullmeier (who sent them to me privately).

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0002-generic-xlog.14.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
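As described in those comments, usage of the generic WAL interface boils down to roughly the following sketch (a hypothetical page update in an extension; function names as I understand the current revision — consult generic_xlog.h and the documentation patch for the authoritative signatures):

    #include "access/generic_xlog.h"

    /* the caller must hold an exclusive lock on 'buffer' */
    void
    example_update(Relation index, Buffer buffer)
    {
        GenericXLogState *state;
        Page        page;

        state = GenericXLogStart(index);
        page = GenericXLogRegisterBuffer(state, buffer, 0);

        /* ... modify 'page' here -- the image returned by the register
         * call, not the buffer's page directly ... */

        GenericXLogFinish(state);   /* WAL-logs the delta and marks the
                                     * buffer dirty; GenericXLogAbort()
                                     * would discard the changes instead */
    }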
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Wed, Mar 30, 2016 at 10:16 AM, Andres Freund wrote: > On 2016-03-30 07:13:16 +0530, Dilip Kumar wrote: > > On Tue, Mar 29, 2016 at 10:43 PM, Andres Freund > wrote: > > > > > My gut feeling is that we should do both 1) and 2). > > > > > > Dilip, could you test performance of reducing ppc's spinlock to 1 byte? > > > Cross-compiling suggest that doing so "just works". I.e. replace the > > > #if defined(__ppc__) typedef from an int to a char. > > > > > > > I set that, but after that it hangs, even Initdb hangs.. > > Yea, as Tom pointed out that's not going to work. I'll try to write a > patch for approach 1). > Great! Do you need any improvements for pinunpin-cas-7.patch from me? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] PoC: Partial sort
> instead. That would call ExecMaterializesOutput() for non-Sort cases.

I've added an ExecPlanMaterializesOutput() function.

>>> Optimizer
>>> ---------
>>>
>>> * cost_sort() needs way way more comments. Doesn't even mention
>>> indexes. Not worth commenting further on until I know what it's
>>> *supposed* to do, though.
>>
>> I've added some comments.
>
> Looking at cost_sort() now, it's a bit clearer. I think that you
> should make sure that everything is costed as a quicksort, though, if
> you accept that we should try and make every small sort done by the
> partial sort a quicksort. Which I think is a good idea. The common
> case is that groups are small, but the qsort() insertion sort will be
> very very fast for that case.

I'm not sure that partial sort should estimate the sorting of one group differently from how a simple sort does. I see the following reasons:
1) I'd like partial sort to be able to use other sorting methods, to provide graceful degradation in the case of planner mistakes, as I pointed out above.
2) The planner shouldn't choose partial sort plans in some cases. That should happen because of the higher cost of those plans, including the high cost of the partial sort itself. Estimating the other sort methods looks like a good way of reporting such high costs.

> This looks like an old change you missed:
>
> - * compare_path_fractional_costs
> + * compare_fractional_path_costs

I think this is rather a typo fix, because currently the comment doesn't match the function name.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

partial-sort-basic-8.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Partial sort
Hi, Tomas!

On Sat, Jan 23, 2016 at 3:07 PM, Tomas Vondra wrote:
> On 10/20/2015 01:17 PM, Alexander Korotkov wrote:
>> On Fri, Oct 16, 2015 at 7:11 PM, Alexander Korotkov <aekorot...@gmail.com> wrote:
>>> On Sun, Jun 7, 2015 at 11:01 PM, Peter Geoghegan <p...@heroku.com> wrote:
>>>> On Sun, Jun 7, 2015 at 8:10 AM, Andreas Karlsson <andr...@proxel.se> wrote:
>>>>> Are you planning to work on this patch for 9.6?
>>>>
>>>> FWIW I hope so. It's a nice patch.
>>>
>>> I'm trying to whisk off the dust. A rebased version of the patch is
>>> attached. This patch isn't passing regression tests because of plan
>>> changes. I'm not yet sure about those changes: why do they happen, and
>>> are they really regressions?
>>> Since I'm not very familiar with the planning of INSERT ON CONFLICT and
>>> RLS, any help is appreciated.
>>
>> The planner regression is fixed in the attached version of the patch.
>> It appears that get_cheapest_fractional_path_for_pathkeys() behaved
>> wrongly when no ordering is required.
>
> Alexander, are you working on this patch? I'd like to look at the patch,
> but the last available version (v4) no longer applies - there's plenty
> of bitrot. Do you plan to send an updated / rebased version?

I'm sorry that I didn't find time for this yet. I'm certainly planning to get back to this in the near future. The attached version is just rebased, without any optimization.

> The main thing I'm particularly interested in is how much this is coupled
> with the Sort node, and whether it's possible to feed partially sorted
> tuples into other nodes.
>
> I'm particularly thinking about Hash Aggregate, because the partial sort
> allows to keep only the "current group" in a hash table, making it much
> more memory efficient / faster. What do you think?

This seems to me a very reasonable optimization. And it would be nice to implement some generalized way of processing presorted groups. For instance, we could have a special node, say "Group Scan", which has 2 children: the source, and a node which processes every group. For partial sort the second node would be a Sort node, but it could be a Hash Aggregate node as well.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

partial-sort-basic-5.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Partial sort
Hi!

On Sat, Jan 23, 2016 at 10:07 PM, Peter Geoghegan wrote:
> On Sat, Jan 23, 2016 at 4:07 AM, Tomas Vondra wrote:
>> The main thing I'm particularly interested in is how much this is coupled
>> with the Sort node, and whether it's possible to feed partially sorted
>> tuples into other nodes.
>
> That's cool, but I'm particularly interested in seeing Alexander get
> back to this because it's an important project on its own. We should
> really have this.

Thank you for your review and interest in this patch! I'm sorry for the huge delay. I'm going to get back to this soon.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Thu, Jan 21, 2016 at 12:37 AM, Alvaro Herrera wrote:
> So far as I can tell, there are three patches in flight here:
>
> * replication slot IO lwlocks
> * ability of extensions to request tranches dynamically
> * PGPROC
>
> The first one hasn't been reviewed at all, but the other two have seen a
> bit of discussion and evolution. Is anyone doing any more reviewing?

I'd like to add another one: a fixed tranche id for each SLRU. And I'm going to review the others.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

slru_tranche_ids_v1.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Mon, Jan 4, 2016 at 5:58 PM, Robert Haas wrote: > On Mon, Jan 4, 2016 at 1:20 AM, Amit Kapila > wrote: > > If we do that way, then user of API needs to maintain the interlock > > guarantee that the requested number of locks is same as allocations > > done from that tranche and also if it is not allocating all the > > LWLocks in one function, it needs to save the base address of the > > array and then allocate from it by maintaining some counter. > > I agree that looking up for tranche names multiple times is not good, > > if there are many number of lwlocks, but that is done at startup > > time and not at operation-level. Also if we look currently at > > the extensions in contrib, then just one of them allocactes one > > LWLock in this fashion, now there could be extnsions apart from > > extensions in contrib, but not sure if they require many number of > > LWLocks, so I feel it is better to give an API which is more > > convinient for user to use. > > Well, I agree with you that we should provide the most convenient API > possible, but I don't agree that yours is more convenient than the one > I proposed. I think it is less convenient. In most cases, if the > user asks for a large number of LWLocks, they aren't going to be each > for a different purpose - they will all be for the same purpose, like > one per buffer or one per hash partition. The case where the user > wants to allocate 8 lwlocks from an extension, each for a different > purpose, and spread those allocations across a bunch of different > functions probably does not exist. *Maybe* there is somebody > allocating lwlocks from an extension for unrelated purposes, but I'd > be willing to bet that, if so, all of those allocations happen one > right after the other in a single function, because anything else > would be completely nuts. > I'm not sure if we should return LWLockPadded*. +LWLockPadded * > +GetLWLockAddinTranche(const char *tranche_name) It would be nice to isolate extension from knowledge about padding. Could we one day change it from LWLockPadded * to LWLockMinimallyPadded * or any? +LWLock ** > +GetLWLockAddinTranche(const char *tranche_name) Could we use this signature? > >> > Also I have retained NUM_USER_DEFINED_LWLOCKS in > >> > MainLWLockArray so that if any extensions want to assign a LWLock > >> > after startup, it can be used from this pool. The tranche for such > >> > locks > >> > will be reported as main. > >> > >> I don't like this. I think we should get rid of > >> NUM_USER_DEFINED_LWLOCKS altogether. It's just an accident waiting to > >> happen, and I don't think there are enough people using LWLocks from > >> extensions that we'll annoy very many people by breaking backward > >> compatibility here. If we are going to care about backward > >> compatibility, then I think the old-style lwlocks should go in their > >> own tranche, not main. But personally, I think it'd be better to just > >> force a hard break and make people switch to using the new APIs. > > > > One point to think before removing it altogether, is that the new API's > > provide a way to allocate LWLocks at the startup-time and if any user > > wants to allocate a new LWLock after start-up, it will not be possible > > with the proposed API's. Do you think for such cases, we should ask > > user to use it the way we have exposed or do you have something else > > in mind? > > Allocating a new lock after startup mostly doesn't work, because there > will be at most 3 available and sometimes less. 
And even if it does > work, it often isn't very useful because you probably need some other > shared memory space as well - otherwise, what is the lock protecting? > And that space might not be available either. > > I'd be interested in knowing whether there are cases where useful > extensions can be loaded apart from shared_preload_libraries because > of NUM_USER_DEFINED_LWLOCKS and our tendency to allocate a little bit > of extra shared memory, but my suspicion is that it rarely works out > and is too flaky to be useful to anybody. +1 for dropping the old API. I don't think it's useful to have LWLocks without having shared memory. There is another thing I'd like extensions to be able to do. It would be nice if extensions could use dynamic shared memory instead of static. Then extensions could use shared memory without being in shared_preload_libraries. But if an extension registers a DSM segment, then there is no way to tell other backends to use it for that extension. Also, the DSM segment would be deallocated once all backends have detached from it. This is out of scope for this patch, though. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
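For illustration, a minimal sketch of how an extension might use the tranche API under discussion. The function names follow the patch in this thread; RequestAddinLWLockTranche() and GetLWLockAddinTranche() are assumed signatures from the proposal, not committed PostgreSQL API:

/* A sketch, assuming the proposed API: request a named tranche of locks
 * at _PG_init() time, then fetch the array base once and index into it. */
#include "postgres.h"
#include "miscadmin.h"
#include "storage/lwlock.h"

#define MY_NUM_PARTITIONS 16

static LWLockPadded *my_locks = NULL;

void
_PG_init(void)
{
	if (!process_shared_preload_libraries_in_progress)
		return;
	/* reserve 16 LWLocks in a tranche named "my_ext" */
	RequestAddinLWLockTranche("my_ext", MY_NUM_PARTITIONS);
}

static void
my_shmem_startup(void)
{
	/* look up the tranche once and remember the array base */
	my_locks = GetLWLockAddinTranche("my_ext");
}

static void
lock_partition(int partno)
{
	LWLockAcquire(&my_locks[partno].lock, LW_EXCLUSIVE);
}

This matches Robert's point above: all the locks serve one purpose, so a single lookup of the array base suffices, with no extra layer of indirection per lock.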
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Tue, Jan 5, 2016 at 4:04 PM, Amit Kapila wrote: > On Mon, Jan 4, 2016 at 8:28 PM, Robert Haas wrote: > > > > On Mon, Jan 4, 2016 at 1:20 AM, Amit Kapila > wrote: > > > On Mon, Jan 4, 2016 at 4:49 AM, Robert Haas > wrote: > > >> On Thu, Dec 31, 2015 at 3:36 AM, Amit Kapila > > > >> wrote: > > >> > LWLock *LWLockAssignFromTranche(const char *tranche_name) will > > >> > assign a lock for the specified tranche. This also ensures that no > > >> > more than the requested number of LWLocks can be assigned from a > > >> > specified tranche. > > >> > > >> However, this seems like an awkward API, because if there are many > > >> LWLocks you're going to need to look up the tranche name many times, > > >> and then you're going to need to make an array of pointers to them > > >> somehow, introducing a thoroughly unnecessary layer of indirection. > > >> Instead, I suggest just having a function LWLockPadded > > >> *GetLWLockAddinTranche(const char *tranche_name) that returns a > > >> pointer to the base of the array. > > > > > > If we do it that way, then the user of the API needs to maintain the interlock > > > guarantee that the requested number of locks is the same as the allocations > > > done from that tranche, and also, if it is not allocating all the > > > LWLocks in one function, it needs to save the base address of the > > > array and then allocate from it by maintaining some counter. > > > I agree that looking up tranche names multiple times is not good > > > if there are many lwlocks, but that is done at startup > > > time and not at operation-level. Also, if we look currently at > > > the extensions in contrib, then just one of them allocates one > > > LWLock in this fashion; now there could be extensions apart from > > > extensions in contrib, but I am not sure if they require many > > > LWLocks, so I feel it is better to give an API which is more > > > convenient for the user to use. > > > > Well, I agree with you that we should provide the most convenient API > > possible, but I don't agree that yours is more convenient than the one > > I proposed. I think it is less convenient. In most cases, if the > > user asks for a large number of LWLocks, they aren't going to be each > > for a different purpose - they will all be for the same purpose, like > > one per buffer or one per hash partition. The case where the user > > wants to allocate 8 lwlocks from an extension, each for a different > > purpose, and spread those allocations across a bunch of different > > functions probably does not exist. > > Probably, but the point is to make the user of the API do what he or she wants > to accomplish without much knowledge of the underlying stuff. However, > I think it is not too much detail for the user to know, so I have changed the > API as per your suggestion. > > > > > *Maybe* there is somebody > > allocating lwlocks from an extension for unrelated purposes, but I'd > > be willing to bet that, if so, all of those allocations happen one > > right after the other in a single function, because anything else > > would be completely nuts. > > > > >> > Also I have retained NUM_USER_DEFINED_LWLOCKS in > > >> > MainLWLockArray so that if any extensions want to assign a LWLock > > >> > after startup, it can be used from this pool. The tranche for such > > >> > locks > > >> > will be reported as main.
> > >> > > > > I'd be interested in knowing whether there are cases where useful > > extensions can be loaded apart from shared_preload_libraries because > > of NUM_USER_DEFINED_LWLOCKS and our tendency to allocate a little bit > > of extra shared memory, but my suspicion is that it rarely works out > > and is too flaky to be useful to anybody. > > > > I am not aware of such cases; however, the reason I had kept it was > backward-compatibility, but now I have removed it in the attached patch. > > Apart from that, I have updated the docs to reflect the changes related > to the new APIs. > > Few things to note - > Some change is needed in LWLockCounter[1] if this goes in after the 2 > other patches (separate tranche for PGProcs and separate tranche > for ReplicationSlots). Also, LWLockAssign() can be removed after > those patches. Also, a couple of minor comments from me. I think this

+ StrNCpy(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name,
+         tranche_name, strlen(tranche_name) + 1);

should be

+ StrNCpy(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name,
+         tranche_name,
+         sizeof(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name));

And as far as I know, English "it's" should be "its" in the sentence below.

+ from _PG_init. Tranche represents an array of LWLocks and
+ can be accessed by it's name. First parameter tranche_name

-- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
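To spell out why the sizeof() form is the right one (the buffer name below is illustrative): StrNCpy() bounds the copy by its third argument, so deriving that limit from the source string defeats the bounds check entirely, while deriving it from the destination cannot overflow:

char tranche_name[NAMEDATALEN];

/* WRONG: the limit comes from the source, so an over-long name
 * writes past the end of tranche_name */
StrNCpy(tranche_name, src, strlen(src) + 1);

/* RIGHT: the limit comes from the destination; the result is
 * always NUL-terminated and never overruns the buffer */
StrNCpy(tranche_name, src, sizeof(tranche_name));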
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Tue, Dec 29, 2015 at 11:56 AM, Amit Kapila wrote: > On Wed, Dec 16, 2015 at 12:26 AM, Robert Haas > wrote: > > > > > > In terms of this project overall, NumLWLocks() now knows about only > > four categories of stuff: fixed lwlocks, backend locks (proc.c), > > replication slot locks, and locks needed by extensions. I think it'd > > probably be fine to move the backend locks into PGPROC directly, and > > the replication slot locks into ReplicationSlot. > > > > Ildus has written a patch to move backend locks into PGPROC which > I am reviewing and will do performance tests once he responds to > my review comments, and I have written a patch to move replication > slot locks into ReplicationSlot which is attached with this mail. > This patch looks good to me. It compiles without warnings and passes regression tests. I also did some small testing of replication slots to check that it works correctly. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Sequence Access Method WIP
On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera wrote: > Petr Jelinek wrote: > > On 29 January 2016 at 14:48, Tom Lane wrote: > > > Alvaro Herrera writes: > > > > Uh, what? Surely we would provide a bespoke command for each possible > > > sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to > check > > > that the provided function has the right signature, and then it would > put > > > the correct amkind into the pg_am entry automatically. > > I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something > like that. > Ok! Let's nail down the syntax and I can integrate it into my createam patch. I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER handler;", but I don't insist. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Proposal: Generic WAL logical messages
Hi, Petr! On Sat, Jan 23, 2016 at 1:22 AM, Petr Jelinek wrote: > here is an updated version of this patch, calling the messages logical > (decoding) messages consistently everywhere and removing any connection to > standby messages. Moving this to its own module gave me a place to write > a brief explanation about this, so the code documentation has hopefully > improved as well. > > The functionality itself didn't change. I'd like to mention that there is my upcoming patch which is named generic WAL records. *http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com <http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com>* But it has to be a distinct feature from your generic WAL logical messages. Theoretically, we could have generic messages with arbitrary content, both having a custom WAL replay function and being decoded by an output plugin. But a custom WAL replay function would let an extension bug break recovery, archiving, and physical replication. And that doesn't seem to be acceptable. This is why we have to develop these as separate features. Should we think more about naming? Do two kinds of generic records confuse people? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
Hi, Dilip! On Tue, Jan 19, 2016 at 6:00 PM, Dilip Kumar wrote: > > On Tue, Jan 19, 2016 at 5:44 PM, Michael Paquier < michael.paqu...@gmail.com> wrote: > >> > Test3: >> > pgbench -i -s 100 postgres >> > pgbench -c$ -j$ -Mprepared -S postgres >> >
>> > Client   Base     Patched
>> > 1        20555    19404
>> > 32       375919   332670
>> > 64       509067   440680
>> > 128      431346   415121
>> > 256      380926   379176
>> It seems like you did a copy-paste of the results with s=100 and >> s=300. Both are showing the exact same numbers. > > Oops, my mistake, re-pasting the correct results for s=100 > > pgbench -i -s 100 postgres > pgbench -c$ -j$ -Mprepared -S postgres >
> Client   Base     Patched
> 1        20548    20791
> 32       372633   355356
> 64       532052   552148
> 128      412755   478826
> 256      346701   372057
Could you please re-run these tests a few times? Just to be sure it's a reproducible regression with s=300 and not a statistical error. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Optimizer questions
On Fri, Jan 8, 2016 at 11:58 AM, Konstantin Knizhnik < k.knizh...@postgrespro.ru> wrote: > Attached please find an improved version of the optimizer patch for the LIMIT > clause. > Now I try to apply this optimization only for non-trivial columns > requiring evaluation. > Maybe it would be better to take into account the cost of evaluating these > columns, but right now I just detect non-variable columns. > We may think about a more general feature: LIMIT pushdown. In Konstantin's patch the planner pushes LIMIT below tlist calculation. But there are other cases when calculating LIMIT first would be beneficial. For instance, we can do LIMIT before JOINs. That is possible only for a LEFT JOIN whose inner side is not used in filter and ordering clauses. See the example below.

# create table t1 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000
# create table t2 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000
# explain analyze select * from t1 left join t2 on t1.i = t2.i order by t1.v limit 10;
                                                            QUERY PLAN
 Limit  (cost=87421.64..87421.67 rows=10 width=24) (actual time=1486.276..1486.278 rows=10 loops=1)
   ->  Sort  (cost=87421.64..89921.64 rows=1000000 width=24) (actual time=1486.275..1486.275 rows=10 loops=1)
         Sort Key: t1.v
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Hash Left Join  (cost=27906.00..65812.00 rows=1000000 width=24) (actual time=226.180..1366.238 rows=1000000 loops=1)
               Hash Cond: (t1.i = t2.i)
               ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.010..77.041 rows=1000000 loops=1)
               ->  Hash  (cost=15406.00..15406.00 rows=1000000 width=12) (actual time=226.066..226.066 rows=1000000 loops=1)
                     Buckets: 131072  Batches: 1  Memory Usage: 46875kB
                     ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.007..89.002 rows=1000000 loops=1)
 Planning time: 0.123 ms
 Execution time: 1492.118 ms
(12 rows)

# explain analyze select * from (select * from t1 order by t1.v limit 10) t1 left join t2 on t1.i = t2.i;
                                                            QUERY PLAN
 Hash Right Join  (cost=37015.89..56171.99 rows=10 width=24) (actual time=198.478..301.278 rows=10 loops=1)
   Hash Cond: (t2.i = t1.i)
   ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.005..74.303 rows=1000000 loops=1)
   ->  Hash  (cost=37015.77..37015.77 rows=10 width=12) (actual time=153.584..153.584 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Limit  (cost=37015.64..37015.67 rows=10 width=12) (actual time=153.579..153.580 rows=10 loops=1)
               ->  Sort  (cost=37015.64..39515.64 rows=1000000 width=12) (actual time=153.578..153.578 rows=10 loops=1)
                     Sort Key: t1.v
                     Sort Method: top-N heapsort  Memory: 25kB
                     ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.012..78.828 rows=1000000 loops=1)
 Planning time: 0.132 ms
 Execution time: 301.308 ms
(12 rows)

In this example LIMIT pushdown makes the query 5 times faster. It would be very nice if the optimizer did this automatically. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Sat, Jan 30, 2016 at 10:58 AM, Amit Kapila wrote: > On Fri, Jan 29, 2016 at 6:47 PM, Robert Haas > wrote: > > > > On Fri, Jan 29, 2016 at 6:59 AM, Alexander Korotkov > > wrote: > > > On Thu, Jan 21, 2016 at 12:37 AM, Alvaro Herrera < > alvhe...@2ndquadrant.com> > > > wrote: > > >> So far as I can tell, there are three patches in flight here: > > >> > > >> * replication slot IO lwlocks > > >> * ability of extensions to request tranches dynamically > > >> * PGPROC > > >> > > >> The first one hasn't been reviewed at all, but the other two have > seen a > > >> bit of discussion and evolution. Is anyone doing any more reviewing? > > > > > > I'd like to add another one: fixed tranche id for each SLRU. > > > > +1 for this. The patch looks good and I will commit it if nobody > objects. > > > > +1. Patch looks good to me as well, but I have one related question: > Is there a reason why we should not assign ReplicationOrigins a > fixed tranche id and then we might want to even get away with > LWLockRegisterTranche()? > +1. I think we should do this. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Mon, Feb 1, 2016 at 7:05 AM, Dilip Kumar wrote: > > On Sun, Jan 31, 2016 at 11:44 AM, Dilip Kumar > wrote: > >> By looking at the results with scale factors 1000 and 100 I don't see any >> reason why it would regress with scale factor 300. >> >> So I will run the test again with scale factor 300, and this time I am >> planning to run 2 cases: >> 1. when data fits in shared buffers >> 2. when data doesn't fit in shared buffers. >> > > I have run the test again with scale factor 300 and found no regression; in fact > there is improvement with the patch like we saw with scale factor 1000. > > Shared Buffer = 8GB > max_connections = 150 > Scale Factor = 300 > > ./pgbench -j$ -c$ -T300 -M prepared -S postgres >
> Client   Base     Patch
> 1        19744    19382
> 8        125923   126395
> 3231393151
> 64       387339   496830
> 128      306412   350610
> > Shared Buffer = 512MB > max_connections = 150 > Scale Factor = 300 > > ./pgbench -j$ -c$ -T300 -M prepared -S postgres >
> Client   Base     Patch
> 1        17169    16454
> 8        108547   105559
> 32       241619   262818
> 64       206868   233606
> 128      137084   217013
Great, thanks! -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Proposal: Generic WAL logical messages
On Sat, Jan 30, 2016 at 11:58 AM, Simon Riggs wrote: > On 29 January 2016 at 21:11, Alexander Korotkov > wrote: > >> Hi, Petr! >> >> On Sat, Jan 23, 2016 at 1:22 AM, Petr Jelinek >> wrote: >> >>> here is an updated version of this patch, calling the messages logical >>> (decoding) messages consistently everywhere and removing any connection to >>> standby messages. Moving this to its own module gave me a place to write >>> a brief explanation about this, so the code documentation has hopefully >>> improved as well. >>> >>> The functionality itself didn't change. >> >> >> I'd like to mention that there is my upcoming patch which is named >> generic WAL records. >> *http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com >> <http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com>* >> But it has to be a distinct feature from your generic WAL logical messages. >> Theoretically, we could have generic messages with arbitrary content, >> both having a custom WAL replay function and being decoded by an output plugin. >> But a custom WAL replay function would let an extension bug break recovery, >> archiving, and physical replication. And that doesn't seem to be acceptable. >> This is why we have to develop these as separate features. >> >> Should we think more about naming? Do two kinds of generic records >> confuse people? >> > > Logical messages > > Generic WAL records > > Seems like I can tell them apart. Worth checking, but I think we're OK. > I was worried because the topic name is "Generic WAL logical messages". But if we name them just "Logical messages" then it's OK with me. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Mon, Feb 1, 2016 at 10:26 AM, Amit Kapila wrote: > On Sat, Jan 30, 2016 at 2:15 PM, Alexander Korotkov < > a.korot...@postgrespro.ru> wrote: > > > > On Sat, Jan 30, 2016 at 10:58 AM, Amit Kapila > wrote: > >> > >> On Fri, Jan 29, 2016 at 6:47 PM, Robert Haas > wrote: > >> > > >> > On Fri, Jan 29, 2016 at 6:59 AM, Alexander Korotkov > >> > wrote: > >> > > On Thu, Jan 21, 2016 at 12:37 AM, Alvaro Herrera < alvhe...@2ndquadrant.com> > >> > > wrote: > >> > >> So far as I can tell, there are three patches in flight here: > >> > >> > >> > >> * replication slot IO lwlocks > >> > >> * ability of extensions to request tranches dynamically > >> > >> * PGPROC > >> > >> > >> > >> The first one hasn't been reviewed at all, but the other two have seen a > >> > >> bit of discussion and evolution. Is anyone doing any more > reviewing? > >> > > > >> > > I'd like to add another one: fixed tranche id for each SLRU. > >> > > >> > +1 for this. The patch looks good and I will commit it if nobody > objects. > >> > > >> > >> +1. Patch looks good to me as well, but I have one related question: > >> Is there a reason why we should not assign ReplicationOrigins a > >> fixed tranche id and then we might want to even get away with > >> LWLockRegisterTranche()? > > > > > > +1. I think we should do this. > > > > Okay, the attached patch assigns a fixed tranche id for ReplicationOrigins. > I don't think we can remove LWLockRegisterTranche(), as that will be > required for assigning tranche ids for extensions, so I didn't change that > part of the code. > OK. This one looks good to me too. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Mon, Feb 1, 2016 at 11:34 AM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > On Mon, Feb 1, 2016 at 7:05 AM, Dilip Kumar wrote: >> >> On Sun, Jan 31, 2016 at 11:44 AM, Dilip Kumar >> wrote: >>> By looking at the results with scale factors 1000 and 100 I don't see any >>> reason why it would regress with scale factor 300. >>> >>> So I will run the test again with scale factor 300, and this time I am >>> planning to run 2 cases: >>> 1. when data fits in shared buffers >>> 2. when data doesn't fit in shared buffers. >>> >> >> I have run the test again with scale factor 300 and found no regression; in fact >> there is improvement with the patch like we saw with scale factor 1000. >> >> Shared Buffer = 8GB >> max_connections = 150 >> Scale Factor = 300 >> >> ./pgbench -j$ -c$ -T300 -M prepared -S postgres >>
>> Client   Base     Patch
>> 1        19744    19382
>> 8        125923   126395
>> 3231393151
>> 64       387339   496830
>> 128      306412   350610
>> >> Shared Buffer = 512MB >> max_connections = 150 >> Scale Factor = 300 >> >> ./pgbench -j$ -c$ -T300 -M prepared -S postgres >>
>> Client   Base     Patch
>> 1        17169    16454
>> 8        108547   105559
>> 32       241619   262818
>> 64       206868   233606
>> 128      137084   217013
>> > > Great, thanks! > The attached patch is rebased and has better comments. Also, there is one comment which survives from the original version by Andres. /* Add exponential backoff? Should seldomly be contended tho. */ Andres, did you mean we should double the delay with each unsuccessful attempt to lock? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company > pinunpin-cas-2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
On Sun, Jan 31, 2016 at 6:55 AM, Amit Kapila wrote: > On Thu, Jan 28, 2016 at 9:16 AM, Amit Kapila > wrote: > > > > On Thu, Jan 28, 2016 at 2:12 AM, Robert Haas > wrote: > > > > > > On Tue, Jan 26, 2016 at 3:10 AM, and...@anarazel.de < and...@anarazel.de> wrote: > > > > I do think there's a considerable benefit in improving the > > > > instrumentation here, but this strikes me as making life more complex for > > > > more users than it makes it easier. At the very least this should be > > > > split into two fields (type & what we're actually waiting on). I also > > > > strongly suspect we shouldn't use in band signaling ("process not > > > > waiting"), but rather make the field NULL if we're not waiting on > > > > anything. > > > > > > +1 for splitting it into two fields. > > > > > > > I will take care of this. > > > > As discussed, I have added a new field wait_event_type along with > wait_event in pg_stat_activity. Changed the code to return NULL if the > backend is not waiting. Updated the docs as well. I wonder if we can use the 4-byte wait_event_info more efficiently. The LWLock number within the tranche would also be useful information to expose. Using the lwlock number, a user can determine whether there is high concurrency on a single lwlock in the tranche or whether it is spread over multiple lwlocks. I think it would be enough to have 6 bits for the event class id and 10 bits for the event id. So, there would be a maximum of 64 event classes and a maximum of 1024 events per class. These limits seem fair enough to me. And then we save 16 bits for the lock number. That's certainly not enough for some tranches. For instance, the number of buffers could easily be more than 2^16. However, we could expose at least the lower 16 bits. It would be at least something. Using this information a user can at least make a conclusion like "It MIGHT be high concurrency on a single buffer content lock. Otherwise it is a coincidence that a lot of different buffers have the same 16 lower bits." Any thoughts? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
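To make the proposed layout concrete, a sketch of the packing (the helper names and macros here are illustrative, not from the patch): 6 bits of event class, 10 bits of event id, and the 16 low-order bits of the lock number in one uint32:

#include "postgres.h"

#define WAIT_CLASS_BITS   6		/* up to 64 event classes */
#define WAIT_EVENT_BITS   10	/* up to 1024 events per class */
#define WAIT_LOCKNO_BITS  16	/* low-order bits of lock number */

static inline uint32
pack_wait_event(uint32 classId, uint32 eventId, uint32 lockNo)
{
	return (classId << (WAIT_EVENT_BITS + WAIT_LOCKNO_BITS)) |
		   ((eventId & ((1 << WAIT_EVENT_BITS) - 1)) << WAIT_LOCKNO_BITS) |
		   (lockNo & ((1 << WAIT_LOCKNO_BITS) - 1));
}

static inline uint32
wait_event_class(uint32 info)
{
	return info >> (WAIT_EVENT_BITS + WAIT_LOCKNO_BITS);
}

static inline uint32
wait_event_id(uint32 info)
{
	return (info >> WAIT_LOCKNO_BITS) & ((1 << WAIT_EVENT_BITS) - 1);
}

static inline uint32
wait_event_lockno(uint32 info)
{
	return info & ((1 << WAIT_LOCKNO_BITS) - 1);
}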
[HACKERS] Fix KNN GiST ordering type
Hi! KNN GiST determines which type it should return from the return type of the ordering operator. But it appears that the type of sk_func is detected after it has been replaced with the distance function. That is wrong: the distance function should always return float8. I think it is just a typo. It should be backpatched to 9.5. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company fix-knn-gist-ordering-type.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Tue, Feb 2, 2016 at 12:43 AM, Andres Freund wrote: > On 2016-02-01 13:06:57 +0300, Alexander Korotkov wrote: > > On Mon, Feb 1, 2016 at 11:34 AM, Alexander Korotkov < > > a.korot...@postgrespro.ru> wrote:
> > >> Client   Base     Patch
> > >> 1        19744    19382
> > >> 8        125923   126395
> > >> 3231393151
> > >> 64       387339   496830
> > >> 128      306412   350610
> > >>
> > >> Shared Buffer = 512MB
> > >> max_connections = 150
> > >> Scale Factor = 300
> > >>
> > >> ./pgbench -j$ -c$ -T300 -M prepared -S postgres
> > >>
> > >> Client   Base     Patch
> > >> 1        17169    16454
> > >> 8        108547   105559
> > >> 32       241619   262818
> > >> 64       206868   233606
> > >> 128      137084   217013
> So, there's a small regression on low client counts. That's worth > addressing. Interesting. I'll try to reproduce it. > > The attached patch is rebased and has better comments. > > Also, there is one comment which survives from the original version by > Andres. > > > > /* Add exponential backoff? Should seldomly be contended tho. */ > > > > > > Andres, did you mean we should double the delay with each unsuccessful > > attempt to lock? > > Spinning on a lock as fast as possible leads to rapid cacheline bouncing > without anybody making progress. See s_lock() in s_lock.c. > I didn't realize that s_lock() behaves that way. Thank you. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
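For reference, a minimal sketch of what exponential backoff looks like here, assuming an atomic flag stands in for the real buffer-header lock (the constants and the acquire function are illustrative; the actual s_lock() logic in s_lock.c is more elaborate):

#include "postgres.h"
#include "miscadmin.h"
#include "port/atomics.h"

#define BACKOFF_MIN_DELAY_US 1000		/* hypothetical tuning constants */
#define BACKOFF_MAX_DELAY_US 1000000

/* Spin on an atomic flag, doubling the sleep after every failed
 * attempt so that waiters back off instead of bouncing the cache
 * line at full speed. */
static void
acquire_with_backoff(volatile pg_atomic_flag *lock)
{
	int			delay_us = BACKOFF_MIN_DELAY_US;

	while (!pg_atomic_test_set_flag(lock))
	{
		pg_usleep(delay_us);
		if (delay_us < BACKOFF_MAX_DELAY_US)
			delay_us *= 2;		/* exponential backoff */
	}
}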
Re: [HACKERS] Fix KNN GiST ordering type
On Mon, Feb 1, 2016 at 7:31 PM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > KNN GiST determines which type it should return from the return type of the ordering > operator. > But it appears that the type of sk_func is detected after it has been replaced with > the distance function. That is wrong: the distance function should always return > float8. > I think it is just a typo. > I found that this was introduced by this commit. http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=821b821a2421beaa58225ff000833df69fb962c5;hp=284bef297733e553c73f1c858e0ce1532f754d18 However, the commit message doesn't say anything special about this change. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Tue, Feb 2, 2016 at 2:54 PM, Robert Haas wrote: > On Mon, Feb 1, 2016 at 3:47 AM, Alexander Korotkov > wrote: > > OK. This one looks good to me too. > > All right, I pushed both this and the other one as a single commit, > since they were so closely related and the second was only one line. > Great, thanks! It seems that only the extension tranches patch is still pending in this thread. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Sequence Access Method WIP
On Sat, Jan 30, 2016 at 3:37 PM, Petr Jelinek wrote: > On 29 January 2016 at 23:59, Robert Haas wrote: > > On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane wrote: > >> Alexander Korotkov writes: > >>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera < alvhe...@2ndquadrant.com> > >>> wrote: > >>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something > >>>> like that. > >> > >>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER > >>> handler;", but I don't insist. > >> > >> I think that Alvaro's idea is less likely to risk future grammar > >> conflicts. We've had enough headaches from CREATE [ UNIQUE ] INDEX > >> [ CONCURRENTLY ] to make me really wary of variables in the statement-name > >> part of the syntax. > > > > Strong +1. If we put the type of access method immediately after > > CREATE, I'm almost positive we'll regret it for exactly that reason. > > > > Just as a note, CREATE SEQUENCE ACCESS METHOD already causes a grammar > conflict now; that's why my proposal was different, I didn't want to > add more keywords. I think Alvaro's proposal is fine as well. > > The other point is that we are creating an ACCESS METHOD object, so that's > what should come after CREATE. > > In any case this is slightly premature IMHO, as the DDL is somewhat useless > until we have a sequence access methods implementation we can agree on, > or the generic WAL logging so that custom indexes can be crash safe. I've changed the syntax of CREATE ACCESS METHOD in the thread about index access method extendability. Now it is "CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;". The new column amtype of pg_am stores the access method type. http://www.postgresql.org/message-id/capphfdu9gln7kuicwegsp50caamwx8q-jwzbpehc92rvfhz...@mail.gmail.com It could be easily extended to "CREATE ACCESS METHOD name TYPE SEQUENCE HANDLER handler;". -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
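For reference, the handler named in that syntax is just a function returning the access method's routine struct. A minimal sketch, assuming the IndexAmRoutine interface from the index AM extendability thread referenced above (fields abridged; contrib/bloom is the complete model):

#include "postgres.h"
#include "fmgr.h"
#include "access/amapi.h"

PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(my_index_handler);

/* Sketch of a HANDLER function for CREATE ACCESS METHOD ... HANDLER:
 * build and return the access method's routine struct. */
Datum
my_index_handler(PG_FUNCTION_ARGS)
{
	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

	amroutine->amstrategies = 1;
	amroutine->amcanorder = false;
	/* ... fill in the remaining am* fields and callbacks ... */

	PG_RETURN_POINTER(amroutine);
}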
Re: [HACKERS] Figures in docs
17.02.2016 09:17, Tatsuo Ishii wrote: >> Hi. >> >> In the DocBook 4.2 SGML DTD, the figure tag is supported already; >> it was implemented for multiple output formats. > Ok, there are no technical problems with figures then. The MySQL docs have > some nice figures. I am jealous. The "figure" tag is just a placeholder in DocBook (http://www.docbook.org/tdg/en/html/figure.html). The question is what to place inside this tag: "graphic" (not inline), "mediaobject/imageobject" (an alternative object; supports inline contents and SVG), or something else. So you surely can insert a picture from an external file (.png, .jpg), but if it should contain a title or labels that need to be translated, that's not the best solution. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Figures in docs
Hello, In fact we use "make postgres.xml" to get a single XML file, which we translate. We convert it to .po using xml2po (with some modifications), and then convert it back to XML (and then to SGML files with our custom script). So we have not changed the SGML files to XML, and if that is more appropriate than having a single large XML file, then we will propose a procedure to perform such a conversion (which will result in a corresponding mega patch). Best regards, Alexander Lakhin 17.02.2016 08:41, Tatsuo Ishii wrote: On Wed, Feb 17, 2016 at 4:17 AM, Tatsuo Ishii wrote: It seems there are no figures/diagrams in our docs. I vaguely recall that we used to have a few diagrams in our docs. If so, was there any technical reason to remove them? I don't know the reason, but it's a shame we are still in SGML! We already do our translations (as others do) in XML using custom scripting. XML gives us better integration with available tools and the ability to insert graphics. Last time we asked in -docs about moving to XML, and Alexander demonstrated acceptable speed of the XML build, but there was no reply from Peter, who is (?) responsible for our documentation infrastructure. Probably, we should just create a patch and submit it to the commitfest. You can check this thread http://www.postgresql.org/message-id/1428009501118.85...@postgrespro.ru Well, there are some PostgreSQL doc translation projects running, including the translation for Japanese, which I am working on. If we are going to change the manual format and/or the build system, I need to confirm it works for Japanese as well. In theory, because the Japanese translation project uses UTF-8, there should be no problem as far as the whole build system works for UTF-8. But I want to confirm first... BTW, are you going to propose a mega patch which changes all the sgml files to xml files? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about google summer of code 2016
On Wed, Feb 17, 2016 at 10:40 AM, Amit Langote < langote_amit...@lab.ntt.co.jp> wrote: > On 2016/02/17 16:27, Shubham Barai wrote: > > Hello everyone, > > > > I am currently pursuing my bachelor of engineering in computer science > > at Maharashtra Institute of Technology, Pune, India. I am very excited about contributing > > to postgres through the google summer of code program. > > > > Is postgres applying for gsoc 2016 as a mentoring organization? > > I think it does. Track this page for updates: > http://www.postgresql.org/developer/summerofcode/ > > You can contact one of the people listed on that page for the latest. > > I didn't find one for 2016, but here is the PostgreSQL wiki page for last > year's GSoC: https://wiki.postgresql.org/wiki/GSoC_2015#Project_Ideas I've created a wiki page for GSoC 2016. It contains unimplemented ideas from the 2015 page. GSoC is now accepting proposals from organizations. Typically, we have a call for mentors on the hackers mailing list in this period. Thom, do we apply this year? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] The plan for FDW-based sharding
> (Postgres XC has always used the Postgres license.) > > If we are not willing to add code for the missing Postgres XC/XL > features, Postgres XC/XL will probably remain a separate fork of > Postgres. I don't think anyone knows the answer to this question, and I > don't know how to find the answer except to keep going with our current > FDW sharding approach. > I have nothing against particular FDW advances. However, it's unclear to me that FDW should be the only sharding approach. It's unproven that FDW can do the work that Postgres XC/XL does. With FDW we can have some low-hanging fruit. That's good. But it's unclear whether we can have high-hanging fruit (like data redistribution) with the FDW approach. And even if we can, it's unclear that it would be easier than with other approaches. Let's just not call this the community-chosen plan for implementing sharding. Until we have the full picture, we can't select one way and reject the others. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Relation cache invalidation on replica
On Fri, Feb 26, 2016 at 3:41 PM, Васильев Дмитрий wrote: > A session opened on the replica doesn't see indexes concurrently created on the > master at this time. > As I understand it, on the standby the index is visible when you run SQL queries on catalog tables (that is what \d+ does), but the planner doesn't pick it. Assuming that in a new session the planner picks this index, it seems like a bug to me. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Pluggable storage
On Thu, Aug 18, 2016 at 10:58 AM, Simon Riggs wrote: > On 16 August 2016 at 19:46, Andres Freund wrote: > > On 2016-08-15 12:02:18 -0400, Robert Haas wrote: > >> Thanks for taking a stab at this. I'd like to throw out a few concerns. > >> > >> One, I'm worried that adding an additional layer of pointer-jumping is > >> going to slow things down and make Andres' work to speed up the > >> executor more difficult. I don't know that there is a problem there, > >> and if there is a problem I don't know what to do about it, but I > >> think it's something we need to consider. > > > > I'm quite concerned about that as well. > > This objection would apply to all other proposals as well, FDW etc.. > > Do you see some way to add flexibility yet without adding a branch > point in the code? > It's impossible without a branch point in the code. The question is where this branch should be located. In particular, we can put this branch point into the planner by defining distinct executor nodes for each pluggable storage. In this case, each storage would have its own optimized executor nodes. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
[HACKERS] Should we cacheline align PGXACT?
Hackers, originally this idea was proposed by Andres Freund while experimenting with lockfree Pin/UnpinBuffer [1]. The patch is attached, as well as results of pgbench -S on a 72-core machine. As before, it shows a huge benefit in this case. For sure, we should validate that it doesn't cause performance regressions in other cases. At least we should test read-write workloads and smaller machines. Any other ideas? 1. https://www.postgresql.org/message-id/20160411214029.ce3fw6zxim5k6...@alap3.anarazel.de -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company pgxact-align-1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should we cacheline align PGXACT?
On Fri, Aug 19, 2016 at 4:46 PM, Tom Lane wrote: > Alexander Korotkov writes: > > originally this idea was proposed by Andres Freund while experimenting with > > lockfree Pin/UnpinBuffer [1]. > > The patch is attached, as well as results of pgbench -S on a 72-core > > machine. As before, it shows a huge benefit in this case. > > That's one mighty ugly patch. Can't you do it without needing to > introduce the additional layer of struct nesting? > That's worrying me too. We could use an anonymous struct, but that seems to be prohibited in C89, which we stick to. Another idea which comes to my mind is to manually calculate the size of the padding and insert it directly into the PGXACT struct. But that seems rather ugly too. However, it would be an ugly definition, not ugly usage... Do you have better ideas? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
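For context, the idiom already used for buffer descriptors sidesteps the manual padding arithmetic: wrap the struct in a union with a char array sized to the cache line, and hide the extra nesting behind an accessor. A sketch of that shape for PGXACT (type name, size, and accessor here are illustrative, not the actual patch):

#include "postgres.h"
#include "storage/proc.h"

/* assumed cache line size; real code would derive this */
#define PGXACT_PADDED_SIZE 64

typedef union PGXACTPadded
{
	PGXACT		pgxact;
	char		pad[PGXACT_PADDED_SIZE];
} PGXACTPadded;

static PGXACTPadded *allPgXactPadded;

/* accessor macro keeps call sites free of the extra nesting */
#define GetPGXact(i) (&allPgXactPadded[(i)].pgxact)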
Re: [HACKERS] Should we cacheline align PGXACT?
On Fri, Aug 19, 2016 at 3:19 PM, Amit Kapila wrote: > On Fri, Aug 19, 2016 at 11:42 AM, Alexander Korotkov > wrote: > > Hackers, > > > > originally this idea was proposed by Andres Freund while experimenting with > > lockfree Pin/UnpinBuffer [1]. > > The patch is attached, as well as results of pgbench -S on a 72-core machine. > > As before, it shows a huge benefit in this case. > > For sure, we should validate that it doesn't cause performance regressions in > > other cases. At least we should test read-write workloads and smaller machines. > > Any other ideas? > > > > maybe test on a Power machine as well. > Good idea. I don't have any such machine at hand now. Do you have one? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Tracking wait event for latches
Hi, Michael! On Thu, Aug 4, 2016 at 8:26 AM, Michael Paquier wrote: > On Tue, Aug 2, 2016 at 10:31 PM, Michael Paquier > wrote: > > Attached is an updated patch. > > Updated version for 2 minor issues: > 1) s/stram/stream/ > 2) Docs used an incorrect number
I took a look at your patch. A couple of notes from me.

> const char *
> GetEventIdentifier(uint16 eventId)
> {
>     const char *res;
>     switch (eventId)
>     {
>         case EVENT_ARCHIVER_MAIN:
>             res = "ArchiverMain";
>             break;
>         ... long long list of events ...
>         case EVENT_WAL_SENDER_WRITE_DATA:
>             res = "WalSenderWriteData";
>             break;
>         default:
>             res = "???";
>     }
>     return res;
> }

Would it be better to use an array here?

> typedef enum EventIdentifier
> {

EventIdentifier seems like too general a name to me, doesn't it? Could we name it WaitEventIdentifier? Or WaitEventId for short? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
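Concretely, the table-driven alternative might look like this (a sketch; the array must be kept in the same order as the enum, and the entries shown are abridged):

/* One string per EventIdentifier member, in declaration order. */
static const char *const EventIdentifierNames[] = {
	"ArchiverMain",
	/* ... one entry per event ... */
	"WalSenderWriteData"
};

const char *
GetEventIdentifier(uint16 eventId)
{
	/* out-of-range ids map to the same fallback the switch used */
	if (eventId >= lengthof(EventIdentifierNames))
		return "???";
	return EventIdentifierNames[eventId];
}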
Re: [HACKERS] Tracking wait event for latches
On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > I took a look at your patch. A couple of notes from me.
>
>> const char *
>> GetEventIdentifier(uint16 eventId)
>> {
>>     const char *res;
>>     switch (eventId)
>>     {
>>         case EVENT_ARCHIVER_MAIN:
>>             res = "ArchiverMain";
>>             break;
>>         ... long long list of events ...
>>         case EVENT_WAL_SENDER_WRITE_DATA:
>>             res = "WalSenderWriteData";
>>             break;
>>         default:
>>             res = "???";
>>     }
>>     return res;
>> }
>
> Would it be better to use an array here?
>
>> typedef enum EventIdentifier
>> {
>
> EventIdentifier seems like too general a name to me, doesn't it? Could we name it
> WaitEventIdentifier? Or WaitEventId for short?
I'm also not sure about the handling of the secure_read() and secure_write() functions. In the current patch we're tracking the latch event wait inside them. But we set up the latch only for blocking sockets, and we can do it multiple times in a loop. Would it be better to make separate wait events NETWORK_READ and NETWORK_WRITE and set them up for the whole time spent in secure_read() and secure_write()? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Block level parallel vacuum WIP
On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane wrote: > Claudio Freire writes: > > Not only that, but from your description (I haven't read the patch, > > sorry), you'd be scanning the whole index multiple times (one per > > worker). > > What about pointing each worker at a separate index? Obviously the > degree of concurrency during index cleanup is then limited by the > number of indexes, but that doesn't seem like a fatal problem. > +1 We may eventually need some effective way of parallelizing vacuum of a single index. But pointing each worker at a separate index seems fair enough for the majority of cases. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Proposal for CSN based snapshots
On Wed, Aug 24, 2016 at 11:54 AM, Heikki Linnakangas wrote: > On 08/23/2016 06:18 PM, Heikki Linnakangas wrote: > >> On 08/22/2016 08:38 PM, Andres Freund wrote: >> >>> On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote: >>> >>>> I >>>> remember seeing ProcArrayLock contention very visible earlier, but I >>>> can't >>>> hit that now. I suspect you'd still see contention on bigger hardware, >>>> though, my laptop has only 4 cores. I'll have to find a real server for >>>> the >>>> next round of testing. >>>> >>> >>> Yea, I think that's true. I can just about see ProcArrayLock contention >>> on my more powerful laptop, to see it really bad you need bigger >>> hardware / higher concurrency. >>> >> >> As soon as I sent my previous post, Vladimir Borodin kindly offered >> access to a 32-core server for performance testing. Thanks Vladimir! >> >> I installed Greg Smith's pgbench-tools kit on that server, and ran some >> tests. I'm seeing some benefit on the "pgbench -N" workload, but only after >> modifying the test script to use "-M prepared", and using Unix domain >> sockets instead of TCP to connect. Apparently those things add enough >> overhead to mask out the little difference. >> >> Attached is a graph with the results. Full results are available at >> https://hlinnaka.iki.fi/temp/csn-4-results/. In short, the patch >> improved throughput, measured in TPS, with >= 32 or so clients. The >> biggest difference was with 44 clients, which saw about 5% improvement. >> >> So, not phenomenal, but it's something. I suspect that with more cores, >> the difference would become more clear. >> >> Like on cue, Alexander Korotkov just offered access to a 72-core >> system :-). Thanks! I'll run the same tests on that. >> > > And here are the results on the 72 core machine (thanks again, > Alexander!). The test setup was the same as on the 32-core machine, except > that I ran it with more clients since the system has more CPU cores. In > summary, in the best case, the patch increases throughput by about 10%. > That peak is with 64 clients. Interestingly, as the number of clients > increases further, the gain evaporates, and the CSN version actually > performs worse than unpatched master. I don't know why that is. One theory > is that by eliminating one bottleneck, we're now hitting another bottleneck > which doesn't degrade as gracefully when there's contention. > Did you try to identify this second bottleneck with perf or something? It would be nice to also run pgbench -S. Also, it would be nice to check something like 10% writes, 90% reads (which I believe is quite a typical workload in real life). -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] PoC: Partial sort
On Fri, Apr 8, 2016 at 10:09 PM, Peter Geoghegan wrote: > On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov > wrote: > > Hmm... I don't completely agree with that. In typical usage partial sort > > should definitely use quicksort. However, falling back to other sort methods is > > very useful. The decision to use partial sort is made by the planner. But > > the planner makes mistakes. For example, our HashAggregate is purely in-memory. > > In the case of a planner mistake it causes OOM. I have met such situations in > > production more than once. This is why I'd like partial sort to have graceful > > degradation for such cases. > > I think that this should be moved to the next CF, unless a committer > wants to pick it up today. > The patch was rebased to the current master. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company partial-sort-basic-9.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Incremental sort
On Tue, Oct 3, 2017 at 2:52 PM, Robert Haas wrote: > On Mon, Oct 2, 2017 at 12:37 PM, Alexander Korotkov > wrote: > > I've applied the patch on top of c12d570f and rerun the same benchmarks. > > A CSV file with the results is attached. There are no dramatic changes. There > > is still a minority of performance regression cases, while the majority of cases > > show improvement. > > Yes, I think these results look pretty good. But are these times in > seconds? You might need to do some testing with bigger sorts. Good point. I'll rerun the benchmarks with a larger dataset size. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] 64-bit queryId?
On Tue, Oct 3, 2017 at 5:55 PM, Robert Haas wrote: > On Mon, Oct 2, 2017 at 8:07 PM, Alexander Korotkov > wrote: > > +1, > > I see 3 options there: > > 1) Drop the high-order bit, as you proposed. > > 2) Allow negative queryIds. > > 3) Implement an unsigned 64-bit type. > > > > #1 causes a minor loss of precision, which looks rather insignificant in the given > > context. > > #2 might be rather unexpected for users who previously had non-negative > > queryIds. Changing queryId from 32-bit to 64-bit itself might require some > > adaptation from monitoring software. But queryIds are user-visible, and > > negative queryIds would look rather illogical. > > #3 would be taking on a hard, long-term problem for insufficient reason. > > Thus, #1 looks like the most harmless solution. > > I think we should just allow negative queryIds. I mean, the hash > functions have behaved that way for a long time: > > rhaas=# select hashtext(''); > hashtext > - > -1477818771 > (1 row) > > It seems silly to me to throw away a perfectly good bit from the hash > value just because of some risk of minor user confusion. I do like > Michael's suggestion of outputting hexa-like text, but changing the > types of the user-visible output columns seems like a job for another > patch. Similarly, if we were to adopt Andres's suggestion of a new > type or using numeric or Alexander's suggestion of implementing a > 64-bit unsigned type, I think it should be a separate patch from this > one. > > I would note that such a patch will actually create real > incompatibility -- extension upgrades might fail if there are > dependent views, for example -- whereas merely having query IDs start > to sometimes be negative only creates an incompatibility for people > who assumed that the int64 type wouldn't actually use its full range > of allowable values. I don't deny that someone may have done that, of > course, but I wouldn't guess that it's a common thing... maybe I'm > wrong. > BTW, you didn't comment on Tom's suggestion about dropping the high-order bit, which trades minor user confusion for a minor loss of precision. TBH, for me it's not so important whether we allow negative queryIds or drop the high-order bit. It would anyway be very good to have 64- (or 63-)bit queryIds committed. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
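For reference, option #1 is essentially a one-line mask (the helper function here is illustrative, not from any patch):

#include "postgres.h"

/* Clear the sign bit so the 64-bit hash always fits non-negatively
 * in a signed int64 queryId; one bit of hash precision is lost. */
static uint64
mask_query_id(uint64 hash)
{
	return hash & UINT64CONST(0x7FFFFFFFFFFFFFFF);
}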
[HACKERS] Add TOAST to system tables with ACL?
Hi! This topic was already discussed (at least once) in 2011. See [1] for details. I'd like to raise it again. Currently, we have tables in the system catalog with ACL, but without TOAST. Thus, it might happen that we can't fit a new ACL item, because the row becomes too large for in-line storage. You can easily reproduce this situation in psql.

create table t (col int);
\copy (select 'create user u' || i || ';' from generate_series(1,10000) i) to 'script.sql'
\i script.sql
\copy (select 'grant all on t to u' || i || ';' from generate_series(1,10000) i) to 'script.sql'
\i script.sql

Eventually GRANT statements start to raise an error.

psql:script.sql:2447: ERROR: row is too big: size 8168, maximum size 8160

I understand that we shouldn't encourage users to behave like this. We should rather advise them to avoid adding too many ACL items to a single object by using roles. That would be way easier to manage too. However, the current PostgreSQL behavior is rather unexpected and undocumented. Thus, I would say it's a bug. This bug would be nice to fix even if long ACL lists worked slower than normal. In the discussion of the post [1], Tom comments that he isn't excited about out-of-line ACL entries unless it's proven that performance doesn't completely tank in this case. I've done some basic experiments in this field on my laptop. The attached draft patch adds TOAST to all system catalog tables with ACL. I've run pgbench with the custom script "SELECT * FROM t;" where t is an empty table with a long ACL entry. I've compared results with 1000 ACL items (in-line storage) and 10000 ACL items (out-of-line storage). Also, I've noticed performance degradation of the GRANT statements themselves: 1000 GRANT statements are executed in 1.5 seconds, while 10000 GRANT statements are executed in 42 seconds. On average a single GRANT statement becomes 2.8 times slower. That's significant degradation, but it doesn't seem fatal to me. Results of pgbench are presented in the following table.

Number of ACL items          | -M simple | -M prepared
-----------------------------+-----------+------------
1000 (in-line storage)       |      6623 |       7242
10000 (out-of-line storage)  |     14498 |      17827

So, it's 2.1-2.4 times degradation in this case. That's very significant degradation, but I wouldn't say that "performance completely tanks". Any thoughts? Should we consider TOASTing ACL entries, or should we give up on this? Links: 1. https://www.commandprompt.com/blog/grant_schema_usage_to_2500_users_no_can_do/ -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company acl-toast-1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add TOAST to system tables with ACL?
On Tue, Oct 3, 2017 at 9:19 PM, Tom Lane wrote: > Alexander Korotkov writes: > > This topic was already discussed (at least once) in 2011. See [1] for > > details. I'd like to raise it again. > > I'm a bit worried about adding a toast table to pg_class, and more so > about pg_database, because both of those have to be accessed in situations > where it's not clear that we could successfully fetch from a toast table, > because too little of the catalog access infrastructure is alive. > > pg_class is probably all right as long as only the ACL field could ever > get toasted, since it's unlikely that any low-level accesses would be > paying attention to that field anyway. > > For pg_database, you'd have to make sure that the startup-time check of > database CONNECT privilege still works if the ACL's been pushed out of > line. > Thank you for pointing that out. I'll check this. > > Also, I've noticed performance degradation of the GRANT statements themselves: > > 1000 GRANT statements are executed in 1.5 seconds, while 10000 GRANT > > statements are executed in 42 seconds. On average a single GRANT statement > > becomes 2.8 times slower. That's significant degradation, but it doesn't > > seem fatal to me. > > Seems all right, since we could just say "we don't really recommend that > usage pattern". > Yes, sure. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel
On Wed, Sep 13, 2017 at 10:57 AM, Ashutosh Sharma wrote: > On Wed, Sep 13, 2017 at 1:15 PM, Alexey Chernyshov > wrote: > > On Sat, 9 Sep 2017 13:53:35 +0530 > > Ashutosh Sharma wrote: > > > >> > >> Finally, I got some time to look into this patch and surprisingly I > >> didn't find any function returning information at the page level; instead > >> all the SQL functions return information at the index level. > >> Therefore, I too feel that it should either be integrated with > >> pgstattuple, which could be a better match as Tomas also mentioned, or maybe > >> added as a new contrib module itself. I think adding a new contrib > >> module looks like the better option. The reason being, it doesn't simply > >> include functions for showing index-level statistics (e.g. > >> index size, number of rows, values, etc.) like pgstattuple does, but > >> also displays information contained in a page, e.g. the object > >> stored in the page, its tid, etc. So, more or less, I would say > >> that it's a mixture of the pageinspect and pgstattuple modules, and > >> therefore I feel adding it as a new extension would be the better > >> choice. Thoughts? > > > > Thank you for your interest, I will add a new contrib module named, > > say, indexstat. I think we can add statistics on other indexes in the > > future. > > > > I think we should wait for an expert's opinion and then take a call. I am > not an expert. I just gave my opinion, as I have worked in this area > earlier when working on the hash index. Thanks. I'm not sure that we should port these functions from gevel directly. We could try to re-implement similar functionality which fits the pageinspect approach better. In particular, we can implement some low-level functions which show detailed information at the page level. And on top of them we can implement analogues of the gevel functions in SQL or PL/pgSQL. Is it feasible? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Add TOAST to system tables with ACL?
On Tue, Oct 3, 2017 at 9:19 PM, Tom Lane wrote: > For pg_database, you'd have to make sure that the startup-time check of > database CONNECT privilege still works if the ACL's been pushed out of > line. > I've tried this case. At first, make a database temp with no connect privilege from public, and 10000 users.

create database temp;
revoke connect on database temp from public;
\copy (select 'create user u' || i || ';' from generate_series(1,10000) i) to 'script.sql'
\i script.sql

I've checked that user u10000 can't login to database temp.

$ psql temp -U u10000
psql: FATAL: permission denied for database "temp"
DETAIL: User does not have CONNECT privilege.

Then I grant connect privilege to all those 10000 users.

\copy (select 'grant connect on database temp to u' || i || ';' from generate_series(1,10000) i) to 'script.sql'
\i script.sql

Then user u10000 can login successfully.

$ psql temp -U u10000
psql (11devel)
Type "help" for help.
u10000@temp=#

Thus, in this simple case the database CONNECT privilege works with an out-of-line ACL for me. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] PoC: full merge join on comparison clause
As discussed earlier, I changed the way we work with mergeopfamilies. I use the "is_equality" flag to indicate whether the clause is an equality one, and fill mergeopfamilies for both equality and inequality operators. The updated patch is attached (rebased to 20b6552242).
-- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 925b4cf553..73e6a4ca74 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -172,31 +172,32 @@ typedef enum
  * to the opfamily and collation, with nulls at the indicated end of the range.
  * This allows us to obtain the needed comparison function from the opfamily.
  */
-static MergeJoinClause
+static void
 MJExamineQuals(List *mergeclauses,
 			   Oid *mergefamilies,
 			   Oid *mergecollations,
 			   int *mergestrategies,
 			   bool *mergenullsfirst,
-			   PlanState *parent)
+			   MergeJoinState *parent)
 {
-	MergeJoinClause clauses;
 	int			nClauses = list_length(mergeclauses);
 	int			iClause;
 	ListCell   *cl;
 
-	clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_UseEqual = (bool *) palloc0(nClauses * sizeof(bool));
+	parent->mj_UseLesser = (bool *) palloc0(nClauses * sizeof(bool));
 
 	iClause = 0;
 	foreach(cl, mergeclauses)
 	{
 		OpExpr	   *qual = (OpExpr *) lfirst(cl);
-		MergeJoinClause clause = &clauses[iClause];
+		MergeJoinClause clause = &parent->mj_Clauses[iClause];
 		Oid			opfamily = mergefamilies[iClause];
 		Oid			collation = mergecollations[iClause];
-		StrategyNumber opstrategy = mergestrategies[iClause];
+		StrategyNumber sort_op_strategy = mergestrategies[iClause];
 		bool		nulls_first = mergenullsfirst[iClause];
-		int			op_strategy;
+		int			join_op_strategy;
 		Oid			op_lefttype;
 		Oid			op_righttype;
 		Oid			sortfunc;
@@ -207,28 +208,55 @@ MJExamineQuals(List *mergeclauses,
 		/*
 		 * Prepare the input expressions for execution.
 		 */
-		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent);
-		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent);
+		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), (PlanState *) parent);
+		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), (PlanState *) parent);
 
 		/* Set up sort support data */
 		clause->ssup.ssup_cxt = CurrentMemoryContext;
 		clause->ssup.ssup_collation = collation;
-		if (opstrategy == BTLessStrategyNumber)
+		if (sort_op_strategy == BTLessStrategyNumber)
 			clause->ssup.ssup_reverse = false;
-		else if (opstrategy == BTGreaterStrategyNumber)
+		else if (sort_op_strategy == BTGreaterStrategyNumber)
 			clause->ssup.ssup_reverse = true;
 		else					/* planner screwed up */
-			elog(ERROR, "unsupported mergejoin strategy %d", opstrategy);
+			elog(ERROR, "unsupported mergejoin strategy %d", sort_op_strategy);
 		clause->ssup.ssup_nulls_first = nulls_first;
 
 		/* Extract the operator's declared left/right datatypes */
 		get_op_opfamily_properties(qual->opno, opfamily, false,
-								   &op_strategy,
+								   &join_op_strategy,
 								   &op_lefttype,
 								   &op_righttype);
-		if (op_strategy != BTEqualStrategyNumber)	/* should not happen */
-			elog(ERROR, "cannot merge using non-equality operator %u",
-				 qual->opno);
+
+		/*
+		 * Determine whether we accept lesser and/or equal tuples of the inner
+		 * relation.
+		 */
+		switch (join_op_strategy)
+		{
+			case BTEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				break;
+
+			case BTLessEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				/* fall through */
+
+			case BTLessStrategyNumber:
+				parent->mj_UseLesser[iClause] = true;
+				break;
+
+			case BTGreaterEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				/* fall through */
+
+			case BTGreaterStrategyNumber:
+				parent->mj_UseLesser[iClause] = true;
+				break;
+
+			default:
+				elog(ERROR, "unsupported join strategy %d", join_op_strategy);
+		}
 
 		/*
 		 * sortsupport routine must know if abbreviation optimization is
@@ -265,8 +293,6 @@ MJExamineQuals(List *mergeclauses,
 
 		iClause++;
 	}
-
-	return clauses;
 }
 
 /*
@@ -378,6 +404,14 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
 	return result;
 }
 
+/* Tuple comparison result */
+typedef enum
+{
+	MJCR_NextInner = 1,
+	MJCR_NextOuter = -1,
+	MJCR_Join = 0
+} MJCompareResult;
+
 /*
  * MJCompare
  *
@@ -388,10 +422,10 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
  * MJEvalOuterValues and MJEvalInnerValues must already have been called
  * for the current outer and inner tuples, respectively.
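As an aside for readers following the patch, here is a minimal sketch of how the two per-clause flags could drive MJCompare's verdict. This is an illustration of the idea only, not code from the patch; it assumes cmp compares the current outer and inner values in sort order, with cmp > 0 meaning the inner value is the lesser one:

	static MJCompareResult
	MJInterpretResult(MergeJoinState *mjstate, int iClause, int cmp)
	{
		if (cmp == 0)
			return mjstate->mj_UseEqual[iClause] ? MJCR_Join : MJCR_NextInner;

		/* inner tuple is lesser and this clause accepts lesser tuples */
		if (cmp > 0 && mjstate->mj_UseLesser[iClause])
			return MJCR_Join;

		/* no match: advance whichever input lags behind in sort order */
		return (cmp > 0) ? MJCR_NextInner : MJCR_NextOuter;
	}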
Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages
Analysis: the estimated value of lossy_pages is way higher than the actual value, and the reason is that the total_pages calculated by the Mackert and Lohman formula is not correct. I think the problem might be that total_pages includes cache effects and rescans. For bitmap entries we should use something like relation pages * selectivity instead.

Meanwhile, I ran TPC-H briefly with the v3 patch: scale factor 25, 2 workers, SSD storage. It shows significant improvement on 4MB work_mem and no change on 2GB. Here are the results (execution time in seconds, average of 5 runs):

         work_mem = 4MB    work_mem = 2GB
Query    master   patch    master   patch
    4       179     174       168     167
    5       190     168       155     156
    6       280      87       227     229
    8       197     114       179     172
   10       269     227       190     192
   14       110     108       106     105

-- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
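To make the suggested alternative concrete, here is a hedged sketch of estimating the heap pages a bitmap scan touches directly from relation size and selectivity, in place of the Mackert-Lohman index_pages_fetched() (the function name and clamping below are illustrative, not an actual patch):

	static double
	bitmap_pages_fetched(double selectivity, BlockNumber rel_pages)
	{
		double		pages = rel_pages * selectivity;

		/* clamp to the physical bounds of the relation */
		if (pages > rel_pages)
			pages = rel_pages;
		if (pages < 1.0)
			pages = 1.0;
		return pages;
	}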
Re: [HACKERS] Predicate Locks for writes?
On Sat, Oct 7, 2017 at 2:26 PM, Simon Riggs wrote:
> SERIALIZABLE looks for chains of rw cases.
>
> When we perform UPDATEs and DELETEs we search for rows and then modify
> them. The current implementation views that as a read followed by a
> write because we issue PredicateLockTuple() during the index fetch.
>
> Is it correct that a statement that only changes data will add
> predicate locks for the rows that it modifies?

I'm not closely familiar with how exactly this SI code works. But it seems that once we update a row, the SIReadLock on that tuple is released, because we're already holding a stronger lock. I did a simple experiment.

smagen@postgres=# begin;
BEGIN
Time: 0,456 ms
smagen@postgres=*# select * from test where id = 1;
 id │ val
────┼─────
  1 │ xyz
(1 row)

smagen@postgres=*# select locktype, database, relation, page, tuple, mode, granted from pg_locks where pid = pg_backend_pid();
  locktype  │ database │ relation │ page │ tuple │      mode       │ granted
────────────┼──────────┼──────────┼──────┼───────┼─────────────────┼─────────
 relation   │    12558 │    11577 │ NULL │  NULL │ AccessShareLock │ t
 relation   │    12558 │    65545 │ NULL │  NULL │ AccessShareLock │ t
 relation   │    12558 │    65538 │ NULL │  NULL │ AccessShareLock │ t
 virtualxid │     NULL │     NULL │ NULL │  NULL │ ExclusiveLock   │ t
 page       │    12558 │    65545 │    1 │  NULL │ SIReadLock      │ t
 tuple      │    12558 │    65538 │    0 │     3 │ SIReadLock      │ t
(6 rows)

smagen@postgres=*# update test set val = 'xyz' where id = 1;
smagen@postgres=*# select locktype, database, relation, page, tuple, mode, granted from pg_locks where pid = pg_backend_pid();
   locktype    │ database │ relation │ page │ tuple │       mode       │ granted
───────────────┼──────────┼──────────┼──────┼───────┼──────────────────┼─────────
 relation      │    12558 │    11577 │ NULL │  NULL │ AccessShareLock  │ t
 relation      │    12558 │    65545 │ NULL │  NULL │ AccessShareLock  │ t
 relation      │    12558 │    65545 │ NULL │  NULL │ RowExclusiveLock │ t
 relation      │    12558 │    65538 │ NULL │  NULL │ AccessShareLock  │ t
 relation      │    12558 │    65538 │ NULL │  NULL │ RowExclusiveLock │ t
 virtualxid    │     NULL │     NULL │ NULL │  NULL │ ExclusiveLock    │ t
 transactionid │     NULL │     NULL │ NULL │  NULL │ ExclusiveLock    │ t
 page          │    12558 │    65545 │    1 │  NULL │ SIReadLock       │ t
(8 rows)

Once we update the same tuple that we read before, the SIReadLock on that tuple disappears.

> PredicateLockTuple() specifically avoids adding an SIRead lock if the
> tuple already has a write lock on it, so surely it must also be
> correct to skip the SIRead lock if we are just about to update the
> row?
>
> I am suggesting that we ignore PredicateLockTuple() for cases where we
> are about to update or delete the found tuple.

If my thoughts above are correct, then it would be a reasonable optimization. We would skip the cycle of taking the SIReadLock on the tuple and then releasing it, without any change of behavior.

> ISTM that a Before Row Trigger would need to add an SIRead lock since
> that is clearly a read.

Yes, because depending on the result of the Before Row Trigger, the update might not really happen. That SIReadLock on the tuple is necessary. However, ISTM that we could take the SIReadLock on the tuple after the Before Row Trigger, and only in the case when the trigger has cancelled the update.
-- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
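A sketch of the proposed skip, built on top of the PredicateLockTuple() call that fetches make today (illustrative only; the will_modify flag is not an existing parameter):

	static void
	MaybePredicateLockTuple(Relation relation, HeapTuple tuple,
							Snapshot snapshot, bool will_modify)
	{
		/*
		 * If the caller is about to UPDATE/DELETE this very tuple, the
		 * write lock it is going to take subsumes the SIRead lock, so
		 * acquiring and then releasing the SIRead lock is wasted work.
		 */
		if (will_modify)
			return;

		PredicateLockTuple(relation, tuple, snapshot);
	}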
Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index
On Thu, Oct 5, 2017 at 9:48 PM, Shubham Barai wrote:
> On 3 October 2017 at 00:32, Alexander Korotkov wrote:
>> On Mon, Oct 2, 2017 at 9:11 PM, Andrew Borodin wrote:
>>> On Mon, Oct 2, 2017 at 8:00 PM, Alexander Korotkov wrote:
>>> > What happens if exactly this "continue" fires?
>>> >
>>> >> if (GistFollowRight(stack->page))
>>> >> {
>>> >>     if (!xlocked)
>>> >>     {
>>> >>         LockBuffer(stack->buffer, GIST_UNLOCK);
>>> >>         LockBuffer(stack->buffer, GIST_EXCLUSIVE);
>>> >>         xlocked = true;
>>> >>         /* someone might've completed the split when we unlocked */
>>> >>         if (!GistFollowRight(stack->page))
>>> >>             continue;
>>> >
>>> > In this case we might get xlocked == true without calling
>>> > CheckForSerializableConflictIn().
>>> Indeed! I've overlooked it. I remember this issue; we were
>>> considering not fixing splits in Serializable isolation, but
>>> dropped the idea.
>>
>> Yeah, the current insert algorithm assumes that a split must be fixed before we
>> can correctly traverse the tree downwards.
>>
>>> CheckForSerializableConflictIn() must be after every exclusive lock.
>>
>> I'm not sure that fixing a split is a case that necessarily requires a call to
>> CheckForSerializableConflictIn(). This lock on the leaf page is not taken
>> to modify the page. It's just taken to ensure that nobody else
>> is fixing this split at the same time. After fixing the split, it might
>> turn out that the insert would go to another page.
>>
>>> > I think it would be rather safe and easy for understanding to move
>>> > CheckForSerializableConflictIn() directly before gistinserttuple().
>>> The difference is that after the lock we have the conditions to change the page,
>>> and before gistinserttuple() we have the actual intent to change the page.
>>>
>>> From the point of view of future development the first version is better (in case some
>>> new calls appear later), but it has 3 calls while your
>>> proposal has 2.
>>> It seems to me that CheckForSerializableConflictIn() before
>>> gistinserttuple() is better for now.
>>
>> Agree.
>
> I have updated the location of CheckForSerializableConflictIn() and
> changed the status of the patch to "needs review".

Now, it seems to me that the predicate locks and conflict checks are placed correctly. However, it would be good to add a couple of comments to gistdoinsert() stating why we call CheckForSerializableConflictIn() in these particular places.

I also took a look at the isolation tests. You made two separate test specs: one to verify that serialization failures do fire, and another to check there are no false positives. I wonder if we could merge these two specs into one, using a wider variety of statements with different keys for both inserts and selects.
-- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
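For concreteness, the agreed placement boils down to something like this inside gistdoinsert() (a simplified sketch, not the literal patch hunk):

	/*
	 * Do the conflict check immediately before the actual page
	 * modification; an exclusive lock taken merely to fix an
	 * incomplete split must not trigger it.
	 */
	CheckForSerializableConflictIn(r, NULL, stack->buffer);
	gistinserttuple(&state, stack, giststate, itup, InvalidOffsetNumber);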
Re: [HACKERS] Pluggable storage
On Wed, Sep 27, 2017 at 7:51 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> I took a look at this patch. I have the following notes for now.
>
>> tuple_insert_hook tuple_insert; /* heap_insert */
>> tuple_update_hook tuple_update; /* heap_update */
>> tuple_delete_hook tuple_delete; /* heap_delete */
>
> I don't think this set of functions provides a good enough level of
> abstraction for a storage AM. These functions encapsulate only the low-level work
> of inserting/updating/deleting tuples in the heap itself. However, it is still assumed
> that indexes are managed outside of the storage AM. I don't think this is
> right, given that one of the most demanded uses of the storage API would be a
> different MVCC implementation (for instance, UNDO-log-based as discussed
> upthread). A different MVCC implementation is likely going to manage indexes
> in a different way. For example, a storage AM utilizing UNDO would implement
> in-place update even when indexed columns are modified. Therefore this
> piece of code in ExecUpdate() wouldn't be relevant anymore.
>
>> /*
>>  * insert index entries for tuple
>>  *
>>  * Note: heap_update returns the tid (location) of the new tuple in
>>  * the t_self field.
>>  *
>>  * If it's a HOT update, we mustn't insert new index entries.
>>  */
>> if ((resultRelInfo->ri_NumIndices > 0) &&
>>     !storage_tuple_is_heaponly(resultRelationDesc, tuple))
>>     recheckIndexes = ExecInsertIndexTuples(slot, &(slot->tts_tid),
>>                                            estate, false, NULL, NIL);
>
> I'm firmly convinced that this logic should be encapsulated into the storage
> AM, together with inserting new index tuples on storage insert. Also, HOT
> should be completely encapsulated into heapam. It's quite evident to me
> that a storage utilizing UNDO wouldn't have anything like our current HOT.
> Therefore, I think there shouldn't be a hot_search_buffer() API function;
> tuple_fetch() may cover hot_search_buffer(). That might require some
> signature change of tuple_fetch() (probably, extra arguments).

For me, the crucial point is that pluggable storages should be able to have different MVCC implementations, and correspondingly have full control over their interactions with indexes. Thus, it would be good if we could get consensus on that point. I'd like other discussion participants to comment on whether they agree or disagree, and why. Any comments?
-- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
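To illustrate the level of abstraction being argued for, here is a hypothetical shape of such an API (all names are invented for illustration, not an actual PostgreSQL interface): the storage AM owns both the tuple change and any index maintenance it requires, so an UNDO-based AM can update in place and touch no indexes at all:

	typedef struct StorageAmRoutine
	{
		/*
		 * Update the tuple identified by otid and perform whatever index
		 * maintenance this AM requires; the executor no longer calls
		 * ExecInsertIndexTuples() itself.
		 */
		void		(*tuple_update) (Relation relation,
									 ItemPointer otid,
									 TupleTableSlot *newslot,
									 CommandId cid,
									 EState *estate);
	} StorageAmRoutine;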