Re: Should the docs have a warning about pg_stat_reset()?
On 27/03/2019 22:28, Peter Eisentraut wrote:
> On 2019-03-26 16:28, Euler Taveira wrote:
>> I don't remember why we didn't consider a table without stats to be
>> ANALYZEd. Isn't it the case to fix autovacuum? Analyze when
>> autovacuum_count + vacuum_count = 0?
>
> When the autovacuum system was introduced, we didn't have those columns.
> But now it seems to make sense that a table with autoanalyze_count +
> analyze_count = 0 should be a candidate for autovacuum even if the write
> statistics are zero. Obviously, this would have the effect that a
> pg_stat_reset() causes an immediate autovacuum for all tables, so maybe
> it's not quite that simple.

Not just pg_stat_reset() but also on promotion.

--
Vik Fearing                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
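As a minimal sketch (not from the thread) of the candidate check being discussed, the standard pg_stat_user_tables columns already make it easy to spot tables that have never been analyzed, either manually or by autovacuum:

    -- tables with no recorded manual or automatic ANALYZE
    SELECT schemaname, relname
    FROM pg_stat_user_tables
    WHERE analyze_count + autoanalyze_count = 0;

After a pg_stat_reset() (or a promotion, as noted above) every table matches this condition, which is why treating such tables as immediate autovacuum candidates is not quite that simple.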
Re: PostgreSQL pollutes the file system
Please don't top post. It makes it unnecessarily difficult to follow the
discussion. See https://wiki.postgresql.org/wiki/Mailing_Lists

On Fri, Apr 12, 2019 at 08:56:35PM +0200, Fred .Flintstone wrote:
> So there is no regression potential.

I fail to understand how you came to this conclusion. Andreas pointed
out that Debian already uses pg_createcluster, so there clearly is
potential for conflict and a regression.

> When and who can send the patch to rename the programs to carry the
> pg_ prefixes, and create symlinks from the old names?

Well, presumably that would be you, sometime in the future?

TBH I don't quite understand what we are trying to achieve in this
thread. It started with the presumption that PostgreSQL "pollutes" the
filesystem with scripts/binaries - which may or may not be true, but for
the sake of the argument let's assume that it is. How does keeping the
original stuff and adding symlinks improve the situation?

> On Fri, Apr 12, 2019 at 5:19 PM Andreas Karlsson wrote:
>> On 4/12/19 5:14 PM, Chris Travers wrote:
>>> 1. naming things is surprisingly hard. How sure are we that we can do
>>> this right? Can we come up with a correct name for initdb? Maybe
>>> pg_createcluster?
>>
>> The Debian packagers already use pg_createcluster for their script
>> which wraps initdb, and while pg_initdb is a bit misleading (it creates
>> a cluster rather than a database) it is not that bad.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PostgreSQL pollutes the file system
On Sat, Apr 13, 2019 at 3:36 PM Tomas Vondra wrote:
> On Fri, Apr 12, 2019 at 08:56:35PM +0200, Fred .Flintstone wrote:
>> So there is no regression potential.
>
> I fail to understand how you came to this conclusion? Andreas pointed
> out Debian already uses pg_createcluster, so there clearly is potential
> for conflict and a regression.

But there is no "createcluster" in PostgreSQL, so that is not a problem.
I don't know if there are any other third-party programs that carry the
pg_ prefix, though.

>> When and who can send the patch to rename the programs to carry the
>> pg_ prefixes, and create symlinks from the old names?
>
> Well, presumably that would be you, sometime in the future?

It would be better if someone with more experience than me did it.

> TBH I don't quite understand what we are trying to achieve in this
> thread. It started with the presumption that PostgreSQL "pollutes" the
> filesystem with scripts/binaries - which may or may not be true, but for
> the sake of the argument let's assume that it is. How does keeping the
> original stuff and adding symlinks improve the situation?

It would ease discoverability and make the user space more coherent and
predictable, which would make it easier to use. It would also allow
moving the symlinks into an optional package, or removing them in the
future.
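To make the proposal concrete, here is a rough sketch (hypothetical names and paths, not an actual patch from the thread) of what the rename-plus-compatibility-symlink scheme could look like from a packaging point of view:

    # hypothetical: ship the client tools under pg_-prefixed names and keep
    # the historical names as symlinks in an optional compatibility package
    BINDIR=/usr/local/pgsql/bin
    for tool in createdb dropdb createuser dropuser; do
        mv "$BINDIR/$tool" "$BINDIR/pg_$tool"
        ln -s "pg_$tool" "$BINDIR/$tool"
    done

The symlinks could then be packaged separately and dropped later without touching the pg_-prefixed binaries themselves.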
Re: Useless code in RelationCacheInitializePhase3
Andres Freund writes:
> On 2019-04-12 14:17:11 -0400, Tom Lane wrote:
>> While looking at the pending patch to clean up management of
>> rd_partcheck, I noticed that RelationCacheInitializePhase3 has code that
>> purports to reload rd_partkey and rd_partdesc, but none for rd_partcheck.
>> However, that reload code is dead code, as is easily confirmed by
>> checking the code coverage report, because we have no partitioned system
>> catalogs.
>>
>> Moreover, if somebody tried to add such a catalog, I'd bet a good deal
>> of money that this code would not work. It seems highly unlikely that
>> we could run RelationBuildPartitionKey or RelationBuildPartitionDesc
>> successfully when we haven't even finished bootstrapping the relcache.
>
> But it sure would be nice if we made it work at some point.

Whether it would be nice or not is irrelevant to my point: this code
doesn't work, and it's unlikely that it would ever be part of a working
solution. I don't think there's any way that it'd be sane to attempt
catalog accesses during RelationCacheInitializePhase3. If we want any of
these features for system catalogs, I think the route to a real fix would
be to make them load-on-demand data so that they can be fetched later on.
Or, possibly, the easiest way is to include these data structures in the
dumped cache file. But what's here is a dead end. I'd even call it an
attractive nuisance, because it encourages people to add yet more
nonfunctional code, rather than pointing them in the direction of doing
something useful.

>> I am less sure about whether the table-access-method stanza is as silly
>> as the rest, but I do see that it's unreached in current testing.
>> So I wonder whether there is any thought that we'd realistically support
>> catalogs with nondefault AMs, and if there is, does anyone think that
>> this code would work?
>
> Right now it definitely won't work,

Sure, I wasn't expecting that. The question is the same as above: is it
plausible that this code would appear in this form in a complete working
implementation? If not, I think we should rip it out rather than leave
the impression that we think it does something useful.

> I think it probably would work for catalog tables, as it's coded right
> now. There are no catalog lookups in RelationInitTableAccessMethod() for
> tables that return true for IsCatalogTable(). In fact, I think we should
> apply something like:

Makes sense, and I'd also add some comments pointing out that there had
better not be any catalog lookups when this is called for a system
catalog.

			regards, tom lane
Re: Useless code in RelationCacheInitializePhase3
I wrote:
> Whether it would be nice or not is irrelevant to my point: this code
> doesn't work, and it's unlikely that it would ever be part of a working
> solution. I don't think there's any way that it'd be sane to attempt
> catalog accesses during RelationCacheInitializePhase3.

BTW, to clarify that: obviously, this loop *does* access pg_class, and
pg_class's indexes too. The issue here is that if any of these other
stanzas ever really executed, we would be doing accesses to a bunch of
other catalogs as well, meaning that their relcache entries would have to
already exist in a state valid enough to permit access. That would mean
that they'd have to be treated as bootstrap catalogs so that we could
create hardwired entries with formrdesc. That's not a direction I want
to go in. Bootstrap catalogs are a huge pain to maintain; we don't want
any more than the absolute minimum of them.

			regards, tom lane
Re: hyrax vs. RelationBuildPartitionDesc
Thanks for the updated patch.

On Sat, Apr 13, 2019 at 4:47 AM Tom Lane wrote:
> Robert Haas writes:
>> As a parenthetical note, I observe that relcache.c seems to know
>> almost nothing about rd_partcheck. rd_partkey and rd_partdesc both
>> have handling in RelationClearRelation(), but rd_partcheck does not,
>> and I suspect that's wrong. So the problems are probably not confined
>> to the relcache-drop-time problem that you observed.
>
> I concluded that that's not parenthetical but pretty relevant...

Hmm, do you mean we should grow the same "keep_" logic for rd_partcheck
as we have for rd_partkey and rd_partdesc? I don't see it in the updated
patch though.

> Attached is a revised version of Amit's patch at [1] that makes these
> data structures be treated more similarly. I also added some Asserts
> and comment improvements to address the complaints I made upthread about
> under-documentation of all this logic.

Thanks for all the improvements.

> I also cleaned up the problem the code had with failing to distinguish
> "partcheck list is NIL" from "partcheck list hasn't been computed yet".
> For a relation with no such constraints, generate_partition_qual would do
> the full pushups every time.

Actually, callers must have checked that the table is a partition
(relispartition). It wouldn't be a bad idea to add an Assert or elog in
generate_partition_qual.

> I'm not sure if the case actually occurs
> commonly enough that that's a performance problem, but failure to account
> for it made my added assertions fall over :-( and I thought fixing it
> was better than weakening the assertions.

Hmm, I wonder why the Asserts failed given what I said above.

> I haven't made back-patch versions yet. I'd expect they could be
> substantially the same, except the two new fields have to go at the
> end of struct RelationData to avoid ABI breaks.

To save you the pain of finding the right files to patch in
back-branches, I made those (attached).

Thanks,
Amit

pg10-fix-rd_partcheck-management-2.patch   Description: Binary data
pg11-fix-rd_partcheck-management-2.patch   Description: Binary data
Re: hyrax vs. RelationBuildPartitionDesc
Amit Langote writes:
> On Sat, Apr 13, 2019 at 4:47 AM Tom Lane wrote:
>> I concluded that that's not parenthetical but pretty relevant...
>
> Hmm, do you mean we should grow the same "keep_" logic for
> rd_partcheck as we have for rd_partkey and rd_partdesc? I don't see
> it in the updated patch though.

No, the "keep_" stuff is only necessary when we're trying to preserve
data structures in-place, which is only important if non-relcache code
might be using pointers to it. Since rd_partcheck is never directly
accessed by external code, only copied, there can't be any live pointers
to it to worry about. Besides which, since it's load on demand rather
than something that RelationBuildDesc forces to be valid immediately, any
comparison would be guaranteed to fail: the field in the new reldesc will
always be empty at this point. Perhaps there's an argument that it
should be load-immediately not load-on-demand, but that would be an
optimization not a bug fix, and I'm skeptical that it'd be an improvement
anyway. Probably this is something to revisit whenever somebody gets
around to addressing the whole copy-vs-dont-copy-vs-use-a-refcount
business that we were handwaving about upthread.

>> I also cleaned up the problem the code had with failing to distinguish
>> "partcheck list is NIL" from "partcheck list hasn't been computed yet".
>> For a relation with no such constraints, generate_partition_qual would do
>> the full pushups every time.
>
> Actually, callers must have checked that the table is a partition
> (relispartition).

That does not mean that it's guaranteed to have any partcheck
constraints; there are counterexamples in the regression tests. It looks
like the main case is a LIST-partitioned table that has only a default
partition.

			regards, tom lane
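For illustration, a minimal example (mine, not taken from the regression tests) of the case described above, i.e. a relation that has relispartition set but an empty partition constraint:

    CREATE TABLE parent (a int) PARTITION BY LIST (a);
    -- with no sibling partitions, the default partition excludes nothing,
    -- so there is no partcheck constraint for it
    CREATE TABLE parent_default PARTITION OF parent DEFAULT;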
Re: PostgreSQL pollutes the file system
Andreas Karlsson wrote:
> The Debian packagers already use pg_createcluster for their script which
> wraps initdb, and while pg_initdb is a bit misleading (it creates a
> cluster rather than a database) it is not that bad.

But that renaming wouldn't achieve anything in relation to the stated
goal, since initdb is not in the $PATH on Debian/Ubuntu systems. It's
part of the version-specific binaries located in
/usr/lib/postgresql/$VERSION/bin, which are not meant to be called
directly, but by the pg_*cluster* scripts that you mention, or
pg_wrapper.

Moreover, aside from package-specific issues, initdb can already be
invoked through "pg_ctl initdb" since 2010 and version 9.0, as mentioned
in: https://www.postgresql.org/docs/9.0/app-initdb.html

This evolution was the result of discussions pretty much like the present
thread. 9 years later, who bothers to use or recommend the new form?
AFAICS nobody cares.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
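As a quick illustration of the existing alternative spelling mentioned above (the data directory path is just an example):

    # these two invocations are equivalent since PostgreSQL 9.0
    initdb -D /var/lib/postgresql/data
    pg_ctl initdb -D /var/lib/postgresql/data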
Re: hyrax vs. RelationBuildPartitionDesc
Amit Langote writes:
> To save you the pain of finding the right files to patch in
> back-branches, I made those (attached).

BTW, as far as that goes: I'm of the opinion that the partitioning
logic's factorization in this area is pretty awful, and v12 has made it
worse not better. It's important IMO that there be a clear distinction
between code that belongs to/can manipulate the relcache, and outside
code that ought at most to examine it (and maybe not even that, depending
on where we come down on this copy-vs-refcount business). Maybe allowing
utils/cache/partcache.c to be effectively halfway inside the relcache
module is tolerable, but I don't think it's great design. Allowing files
over in partitioning/ to also have functions inside that boundary is
really not good, especially when you can't even say that all of
partdesc.c is part of relcache.

I'm seriously inclined to put RelationBuildPartitionDesc back where it
was in v11.

			regards, tom lane
Re: Checksum errors in pg_stat_database
I started looking at this the other night, but I see Magnus beat me in
committing it...

On Fri, Apr 12, 2019 at 8:18 AM Magnus Hagander wrote:
> On Sun, Apr 7, 2019 at 6:28 PM Julien Rouhaud wrote:
>> Thanks for looking at it!
>> On Sun, Apr 7, 2019 at 4:36 PM Magnus Hagander wrote:
>>>
>>> I'm not sure I like the idea of using "" as the database name. It's
>>> not very likely that somebody would be using that as a name for their
>>> database, but it's not impossible. But it also just looks strange.
>>> Wouldn't NULL be a more appropriate choice?
>>>
>>> Likewise, shouldn't we return NULL as the number of backends for the
>>> shared counters, rather than 0?
>>
>> I wanted to make things more POLA-compliant, but maybe it was a bad
>> idea. I changed it for NULL here and for numbackends.

ISTM the argument here is "go with zero since you have zero connections"
vs "go with null since you can't actually connect, so it doesn't make
sense". (There is a third argument about making it -1 since you can't
connect, but that breaks sum(numbackends) so it's easily dismissed.)
I think I would have gone for 0 personally, but what ended up surprising
me was that a bunch of other stuff like xact_commit shows zero when
AFAICT the above reasoning would apply the same to those columns (unless
there is a way to commit a transaction in the global objects that I
don't know about).

>>> Micro-nit:
>>> + Time at which the last data page checksum failures was detected in
>>> s/failures/failure/
>>
>> Oops.
>>
>> v5 attached.

What originally got me looking at this was the idea of returning -1 (or
maybe null) for checksum failures for cases when checksums are not
enabled. This seems a little more complicated to set up, but seems like
it might ward off people thinking they are safe due to no checksum error
reports when they actually aren't.

Robert Treat
https://xzilla.net
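A small illustration (mine; the label for the shared-objects row is invented) of the NULL-vs-sentinel argument above:

    -- NULL rows simply drop out of the aggregate, so the total stays honest;
    -- a -1 sentinel would silently skew it
    SELECT sum(numbackends) FROM pg_stat_database;

    -- per the change described above, datname is NULL for the shared counters
    SELECT coalesce(datname, '<shared objects>') AS database,
           numbackends, checksum_failures
    FROM pg_stat_database;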
Re: Berserk Autovacuum (let's save next Mandrill)
On Thu, Apr 11, 2019 at 11:25:29AM +0200, Chris Travers wrote:
> On Wed, Apr 10, 2019 at 5:21 PM Andres Freund wrote:
>> On April 10, 2019 8:13:06 AM PDT, Alvaro Herrera wrote:
>>> On 2019-Mar-31, Darafei "Komяpa" Praliaskouski wrote:
>>>> Alternative point of "if your database is super large and actively
>>>> written, you may want to set autovacuum_freeze_max_age to even
>>>> smaller values so that autovacuum load is more evenly spread over
>>>> time" may be needed.
>>>
>>> I don't think it's helpful to force emergency vacuuming more
>>> frequently; quite the contrary, it's likely to cause even more issues.
>>> We should tweak autovacuum to perform freezing more preemptively
>>> instead.
>>
>> I still think the fundamental issue with making vacuum less painful is
>> that all the indexes have to be read entirely. Even if there's not much
>> work (say millions of rows frozen, hundreds removed). Without that
>> issue we could vacuum much more frequently. And do it properly in
>> insert-only workloads.
>
> So I see a couple of issues here and wondering what the best approach
> is. The first is to just skip lazy_cleanup_index if no rows were
> removed. Is this the approach you have in mind? Or is that insufficient?

I don't think that's what Andres had in mind, as he explicitly mentioned
removed rows. So just skipping lazy_cleanup_index when there were no
deleted tuples would not help in that case. What I think we could do is
simply leave the tuple pointers in the table (and indexes) when there are
only very few of them, and only do the expensive table/index cleanup once
there's enough of them.

> The second approach would be to replace the whole idea of this patch
> with a lazy freeze worker which would basically periodically do a
> vacuum freeze on relations matching certain criteria. This could have a
> lower max workers than autovacuum and therefore less of a threat in
> terms of total IO usage. Thoughts?

Not sure. I find it rather difficult to manage more and more different
types of cleanup workers.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
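For reference, a concrete example (table name invented) of the per-table tuning from the quoted suggestion at the top of this message, which spreads anti-wraparound freezing over time for a large, actively written table:

    -- trigger freezing vacuums earlier than the defaults, so each run has
    -- less work to do at once
    ALTER TABLE events
        SET (autovacuum_freeze_max_age = 100000000,
             autovacuum_freeze_table_age = 50000000);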
Re: Commit message / hash in commitfest page.
On Thu, Apr 11, 2019 at 02:55:10PM +0500, Ibrar Ahmed wrote:
> On Thu, Apr 11, 2019 at 2:44 PM Erikjan Rijkers wrote:
>> On 2019-04-11 11:36, Ibrar Ahmed wrote:
>>> Hi,
>>>
>>> Is it possible to have the commit message or at least the git hash in
>>> commitfest? It would make it very easy to track a commit against a
>>> commitfest item.
>>
>> Commitfest items always point to discussion threads. These threads
>> often end with a message that says that the patch is pushed. IMHO, that
>> message would be the place to include the commit hash. It would also be
>> easily findable via the commitfest application.
>
> +1

I think it might be useful to actually have that directly in the CF app,
not just in the thread. There would need to be a way to enter multiple
hashes, because patches often have multiple pieces.

But maybe that'd be too much unnecessary burden. I don't remember when I
last needed this information. And I'd probably try searching in git log
first anyway.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Commit message / hash in commitfest page.
Tomas Vondra writes:
> On Thu, Apr 11, 2019 at 02:55:10PM +0500, Ibrar Ahmed wrote:
>> On Thu, Apr 11, 2019 at 2:44 PM Erikjan Rijkers wrote:
>>>> Is it possible to have the commit message or at least the git hash in
>>>> commitfest? It would make it very easy to track a commit against a
>>>> commitfest item.
>>>
>>> Commitfest items always point to discussion threads. These threads
>>> often end with a message that says that the patch is pushed. IMHO,
>>> that message would be the place to include the commit hash. It would
>>> also be easily findable via the commitfest application.
>
> I think it might be useful to actually have that directly in the CF app,
> not just in the thread. There would need to be a way to enter multiple
> hashes, because patches often have multiple pieces.
>
> But maybe that'd be too much unnecessary burden. I don't remember when I
> last needed this information. And I'd probably try searching in git log
> first anyway.

Yeah, I can't see committers bothering to do this. Including the
discussion thread link in the commit message is already pretty
significant hassle, and something not everybody remembers/bothers with.

But ... maybe it could be automated? A bot looking at the commit log
could probably suck out the thread links and try to match them up to CF
entries. Likely you could get about 90% right even without that, just by
matching the committer's name and the time of commit vs time of CF entry
closure.

			regards, tom lane
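As a rough sketch (mine, not an existing tool) of the first half of such a bot, the Discussion: trailers that most commit messages carry can be pulled out of the log and paired with their commit hashes:

    # print "<commit hash> <thread URL>" pairs for commits on master
    git log --format='%H%n%B' origin/master |
        awk '/^Discussion:/ { print commit, $2 }
             /^[0-9a-f]+$/ && length($0) == 40 { commit = $0 }'

Matching those URLs against the thread links stored for each CF entry (or, as a fallback, matching committer name and commit time against CF entry closure time) would cover the rest.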
psql - add SHOW_ALL_RESULTS option
Hello devs,

The attached patch implements a new SHOW_ALL_RESULTS option for psql,
which shows all results of a combined query (\;) instead of only the
last one. This solves a frustration that intermediate results were
hidden from view for no good reason that I could think of.

For that, call PQsendQuery instead of (mostly not documented) PQexec,
and rework how results are processed afterwards. Timing is moved to
ProcessQueryResults to keep the last result handling out of the measured
time. I think it would not be a big deal to include it, but this is the
previous behavior.

In passing, refactor a little and add comments. Make function names
about results plural or singular consistently with the fact that they
process one or several results. Change "PrintQueryResult" to
"HandleQueryResult" because it was not always printing something. Also
add a HandleCopyResult function, which makes the patch a little bigger
by moving things around but clarifies the code. Code in "common.c" is
actually a little shorter than the previous version. From my point of
view the code is clearer than before because there is only one loop over
results, not an implicit one within PQexec and another one afterwards to
handle copy.

Add a few tests for the new feature.

IMHO this new setting should be on by default: few people know about \;
so it would not change anything for most, and I do not see why those who
use it would not be interested by the results of all the queries they
asked for.

--
Fabien.

[Attached patch, inline diff garbled and truncated in this archive: it
documents the new SHOW_ALL_RESULTS variable in
doc/src/sgml/ref/psql-ref.sgml and reworks src/bin/psql/common.c,
replacing ProcessResult with HandleCopyResult and the single
result-handling loop described above.]
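To show the intended user-visible behavior (my own example, assuming the variable name from the patch), with the option enabled both result sets of a combined query are printed:

    \set SHOW_ALL_RESULTS on
    SELECT 1 AS one \; SELECT 2 AS two;

With the current default behavior, only the result of "SELECT 2 AS two" would be shown.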
Re: Should the docs have a warning about pg_stat_reset()?
On Wed, Apr 10, 2019 at 08:09:34PM -0300, Euler Taveira wrote:
> Em qua, 10 de abr de 2019 às 16:33, Alvaro Herrera escreveu:
>> On 2019-Apr-10, Bruce Momjian wrote:
>>> On Thu, Apr 11, 2019 at 04:14:11AM +1200, David Rowley wrote:
>>>> I still think we should start with a warning about pg_stat_reset().
>>>> People are surprised by this, and these are just the ones who notice:
>>>>
>>>> https://www.postgresql.org/message-id/CAB_myF4sZpxNXdb-x=welpqbdou6ue8fhtm0fverpm-1j7p...@mail.gmail.com
>>>>
>>>> I imagine there are many others just suffering from bloat due to
>>>> auto-vacuum not knowing how many dead tuples there are in the tables.
>>>
>>> OK, let me step back. Why are people resetting the statistics
>>> regularly? Based on that purpose, does it make sense to clear the
>>> stats that affect autovacuum?
>>
>> I agree that we should research that angle. IMO resetting stats should
>> be seriously frowned upon. And if they do need to reset some counters
>> for some valid reason, offer a mechanism that leaves the
>> autovac-guiding counters alone.
>
> Then you have to change the way pg_stat_reset() works (it currently
> removes the hash tables). Even pg_stat_reset_single_table_counters()
> could cause trouble, although in a smaller proportion. Resetting
> statistics leaves the autovacuum state machine in an invalid state.
> Since resetting statistics is a rare situation (at least I don't know
> monitoring tools or practices that regularly execute those functions),
> would it be worth adding complexity to the pg_stat_reset* functions?
> autovacuum could handle those rare cases just fine.

Yeah, resetting most of the stats but keeping a couple of old values
around is going to do more harm than good. Even resetting stats for a
single object is annoying when you have to analyze the data; making it
even more granular by keeping some fields is just complicating it
further ...

The main reason why people do this is that we only provide cumulative
counters, so if you need to monitor how they changed in a given time
period (last hour, day, ...) you need to compute the delta somehow. And
just resetting the stats is the easiest way to achieve that.

+1 to have a warning about this, and maybe we should point people to
tools regularly snapshotting the statistics and computing the deltas for
them. There's a couple of specialized ones, but even widely available
monitoring tools will do that.

If only we had a way to regularly snapshot the data from within the
database, and then compute the deltas on that. If only we could insert
data from one table into another and then do some analytics on it, with
like small windows moving over the data or something ...

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
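For illustration, a minimal sketch (mine; the snapshot table, scheduling, and table name are assumptions) of the snapshot-and-delta approach alluded to above, which avoids resetting anything:

    -- structure for periodic snapshots of the cumulative counters
    CREATE TABLE stats_snapshot AS
        SELECT now() AS ts, relid, n_tup_ins, n_tup_upd, n_tup_del
        FROM pg_stat_user_tables
        WHERE false;

    -- run this from cron / a scheduler at the desired interval
    INSERT INTO stats_snapshot
        SELECT now(), relid, n_tup_ins, n_tup_upd, n_tup_del
        FROM pg_stat_user_tables;

    -- per-interval deltas for one table, using a window over the snapshots
    SELECT ts,
           n_tup_ins - lag(n_tup_ins) OVER (ORDER BY ts) AS inserts_in_interval
    FROM stats_snapshot
    WHERE relid = 'my_table'::regclass
    ORDER BY ts;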
Re: Commit message / hash in commitfest page.
On 04/13/19 15:56, Tomas Vondra wrote:
> I think it might be useful to actually have that directly in the CF app,
> not just in the thread. There would need to be a way to enter multiple
> hashes, because patches often have multiple pieces.

The CF app already recognizes (some) attachments in the email thread and
makes them directly clickable from the CF entry page. Could it do that
with commit hashes, if found in the body of an email thread?

Gitweb does that pretty successfully with commits mentioned in commit
messages, and GitHub does it automagically for text in issues and so on.

Maybe it could even recognize phrases like "commit 01deadbeef closes cf
entry" and change the cf entry state, though that'd be gravy.

Regards,
-Chap
Re: Commit message / hash in commitfest page.
On Sat, Apr 13, 2019 at 04:27:56PM -0400, Tom Lane wrote:
> Tomas Vondra writes:
>> On Thu, Apr 11, 2019 at 02:55:10PM +0500, Ibrar Ahmed wrote:
>>> On Thu, Apr 11, 2019 at 2:44 PM Erikjan Rijkers wrote:
>>>>> Is it possible to have the commit message or at least the git hash
>>>>> in commitfest? It would make it very easy to track a commit against
>>>>> a commitfest item.
>>>>
>>>> Commitfest items always point to discussion threads. These threads
>>>> often end with a message that says that the patch is pushed. IMHO,
>>>> that message would be the place to include the commit hash. It would
>>>> also be easily findable via the commitfest application.
>>
>> I think it might be useful to actually have that directly in the CF
>> app, not just in the thread. There would need to be a way to enter
>> multiple hashes, because patches often have multiple pieces.
>>
>> But maybe that'd be too much unnecessary burden. I don't remember when
>> I last needed this information. And I'd probably try searching in git
>> log first anyway.
>
> Yeah, I can't see committers bothering to do this. Including the
> discussion thread link in the commit message is already pretty
> significant hassle, and something not everybody remembers/bothers with.
>
> But ... maybe it could be automated? A bot looking at the commit log
> could probably suck out the thread links and try to match them up
> to CF entries. Likely you could get about 90% right even without that,
> just by matching the committer's name and the time of commit vs time
> of CF entry closure.

I've been getting a lot of lift out of the git_fdw (well, out of caching
it, as performance isn't great yet) for constructing the PostgreSQL
Weekly News section on things already committed. About 3.5% of commits
(as of last week) on master are within a minute of each other, so
grabbing a window two minutes wide would work even if we didn't have the
committer's name in hand; it's unlikely to produce more than one result.

Best,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Zedstore - compressed in-core columnar storage
On Thu, Apr 11, 2019 at 6:06 AM Rafia Sabih wrote:
> Reading about it reminds me of this work -- TAG column storage
> (http://www09.sigmod.org/sigmod/record/issues/0703/03.article-graefe.pdf).
> Isn't this storage system inspired from there, with TID as the TAG?
>
> It is not referenced here so made me wonder.

I don't think they're particularly similar, because that paper describes
an architecture based on using purely logical row identifiers, which is
not what a TID is. TID is a hybrid physical/logical identifier, sometimes
called a "physiological" identifier, which will have significant
overhead. Ashwin said that ZedStore TIDs are logical identifiers, but I
don't see how that's compatible with a hybrid row/column design (unless
you map heap TID to logical row identifier using a separate B-Tree).

The big idea with Graefe's TAG design is that there is practically no
storage overhead for these logical identifiers, because each entry's
identifier is calculated by adding its slot number to the page's tag/low
key. The ZedStore design, in contrast, explicitly stores TID for every
entry. ZedStore seems more flexible for that reason, but at the same
time the per-datum overhead seems very high to me. Maybe prefix
compression could help here, which a low key and high key can do rather
well.

--
Peter Geoghegan
Execute INSERT during a parallel operation
Hi,

I'm trying to use the SPI to save the executed plans in ExecutorEnd.
When the plan involves multiple workers, the insert operations trigger
an error: cannot execute INSERT during a parallel operation.

I wonder if there's a different hook I can use when there's a Gather
node, or other ways to get around it?

Thank you,
Donald Dong
doc: datatype-table and xfunc-c-type-table
Hi, Both tables in $subject (in datatype.sgml and xfunc.sgml, respectively) contain similar information (though the xfunc one mentions C structs and header files, and the datatype one does not, but has a description column) and seem similarly out-of-date with respect to the currently supported types ... though not identically out-of-date; they have different numbers of rows, and different types that are missing. How crazy an idea would it be to have include/catalog/pg_type.dat augmented with description, ctypename, and cheader fields, and let both tables be generated, with their respective columns? Regards, -Chap
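To make the idea concrete, a hypothetical sketch (mine; the extra fields do not exist today) of what an augmented entry in include/catalog/pg_type.dat might look like:

    # existing fields plus the proposed documentation-oriented ones
    { oid => '23', array_type_oid => '1007',
      descr => '-2 billion to 2 billion integer, 4-byte storage',
      typname => 'int4', typlen => '4', typbyval => 't', typcategory => 'N',
      ctypename => 'int32', cheader => 'postgres.h' },

Both SGML tables could then be generated from this one source, each picking the columns it needs.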