Re: WIP: System Versioned Temporal Table
> > > The spec does not allow schema changes at all on a system versioned > table, except to change the system versioning itself. > > That would greatly simplify things!
Re: Window Function "Run Conditions"
On Tue, Mar 15, 2022 at 5:24 PM Greg Stark wrote: > This looks like an awesome addition. > > I have one technical questions... > > Is it possible to actually transform the row_number case into a LIMIT > clause or make the planner support for this case equivalent to it (in > which case we can replace the LIMIT clause planning to transform into > a window function)? > > The reason I ask is because the Limit plan node is actually quite a > bit more optimized than the general window function plan node. It > calculates cost estimates based on the limit and can support Top-N > sort nodes. > > But the bigger question is whether this patch is ready for a committer > to look at? Were you able to resolve Andy Fan's bug report? Did you > resolve the two questions in the original email? > +1 to all this It seems like this effort would aid in implementing what some other databases implement via the QUALIFY clause, which is to window functions what HAVING is to aggregate functions. example: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#qualify_clause
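For concreteness, this is the shape of query the run-condition work targets, followed by the QUALIFY spelling of the same thing (table and column names here are made up; QUALIFY is not PostgreSQL syntax):

    SELECT *
      FROM (SELECT t.*, row_number() OVER (ORDER BY created_at) AS rn
              FROM t) AS s
     WHERE rn <= 10;

    -- BigQuery-style equivalent:
    -- SELECT * FROM t QUALIFY row_number() OVER (ORDER BY created_at) <= 10;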
Re: Huge memory consumption on partitioned table with FKs
> > I think this can be solved easily in the patch, by having > ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if > they are equal then use the parent's constaint_id, otherwise use the > child constraint. That way, the cache entry is reused in the common > case where they are identical. > Somewhat of a detour, but in reviewing the patch for Statement-Level RI checks, Andres and I observed that SPI made for a large portion of the RI overhead. Given that we're already looking at these checks, I was wondering if this might be the time to consider implementing these checks by directly scanning the constraint index.
Re: Huge memory consumption on partitioned table with FKs
On Mon, Nov 30, 2020 at 9:48 PM Tom Lane wrote: > Corey Huinker writes: > > Given that we're already looking at these checks, I was wondering if this > > might be the time to consider implementing these checks by directly > > scanning the constraint index. > > Yeah, maybe. Certainly ri_triggers is putting a huge amount of effort > into working around the SPI/parser/planner layer, to not a lot of gain. > > However, it's not clear to me that that line of thought will work well > for the statement-level-trigger approach. In that case you might be > dealing with enough tuples to make a different plan advisable. > > regards, tom lane > Bypassing SPI would probably mean that we stay with row level triggers, and the cached query plan would go away, perhaps replaced by an already-looked-up-this-tuple hash sorta like what the cached nested loops effort is doing. I've been meaning to give this a try when I got some spare time. This may inspire me to try again.
Re: simplifying foreign key/RI checks
> > > It seems to me 1 (RI_PLAN_CHECK_LOOKUPPK) is still alive. (Yeah, I > > know that doesn't mean the usefulness of the macro but the mechanism > > the macro suggests, but it is confusing.) On the other hand, > > RI_PLAN_CHECK_LOOKUPPK_FROM_PK and RI_PLAN_LAST_ON_PK seem to be no > > longer used. (Couldn't we remove them?) > > Yeah, better to just remove those _PK macros and say this module no > longer runs any queries on the PK table. > > How about the attached? > > Sorry for the delay. I see that the changes were made as described. Passes make check and make check-world yet again. I'm marking this Ready For Committer unless someone objects.
A Case For Inlining Immediate Referential Integrity Checks
A Case For Inlining Immediate Referential Integrity Checks
--
The following is an overview of how Postgres currently implements referential integrity, some problems with that architecture, attempted solutions for those problems, and a suggestion of another possible solution.

Notes On Notation and Referential Integrity In General
--
All referential integrity is ultimately of this form:

R(X) => T(Y)

where one referencing table R has a set of columns X that references a set of columns Y, which comprise a unique constraint on a target table T. Note that the Y-set of columns is usually the primary key of T, but does not have to be.

Referential integrity checks fall into two basic categories, Insert and Delete, which can be checked Immediately following the statement, or can be Deferred to the end of the transaction.

The Insert check is fairly straightforward. Any insert to R, or update of R that modifies [1] any column in X, is checked to see if all of the X columns are NOT NULL, and if so, a lookup is done on T to find a row with matching Y values. If none is found, then an error is raised.

The Update check is more complicated, as it covers any UPDATE operation that modifies [1] any column in Y where all of the values of Y are NOT NULL, as well as any DELETE operation where all of the columns of Y are NOT NULL. For any Update check, the table R is scanned for any X tuples matching the Y values of the affected row, and for any matches found, an action is taken. That action can be to fail the operation (NO ACTION, RESTRICT), update the X values to fixed values (SET NULL, SET DEFAULT), or to delete those rows in R (CASCADE).

Current Implementation
--
Currently, these operations are handled via per-row triggers. In the general case, one trigger is placed on R for INSERT operations, one trigger is placed on T for DELETE operations, and an additional trigger is placed on T for UPDATE operations that affect any column of Y.

These Insert trigger functions invoke the C function RI_FKey_check() [2]. The trigger is fired unconditionally, and the trigger itself determines if there is a referential integrity check to be made or not. Ultimately this trigger invokes an SPI query of the form SELECT 1 FROM T WHERE (Y = the new X values) FOR KEY SHARE. This query is generally quite straightforward to the planner, as it becomes either a scan of a single unique index, or a partition search followed by a scan of a single unique index. The operation succeeds if a row is found, and fails if it does not.

The Update trigger functions are implemented with a set of C functions RI_[noaction|restrict|cascade|setnull|setdefault]_[upd|del]() [3]. These functions each generate a variation of SPI query in one of the following forms (using the notation above):

cascade: DELETE FROM R WHERE (X = the old Y values)
restrict/noaction: SELECT 1 FROM R WHERE (X = the old Y values) FOR KEY SHARE
setnull: UPDATE R SET x1 = NULL, ... WHERE (X = the old Y values)
setdefault: UPDATE R SET x1 = DEFAULT, ... WHERE (X = the old Y values)

These triggers are either executed at statement time (Immediate) or are queued for execution as a part of the transaction commit (Deferred).

Problems With The Current Implementation
--
The main problems with this architecture come down to visibility and performance. The foremost problem with this implementation is that these extra queries are not visible to the end user in any way.
It is possible to infer which functions executed by looking at the constraint definitions and comparing pg_stat_user_tables or pg_stat_user_indexes before and after the operation, but in general the time spent in these functions accrues to the DML statement (Immediate) or COMMIT statement (Deferred) without any insight into what took place. This is especially vexing in situations where an operation as simple as "DELETE FROM highly_referenced_table WHERE id = 1" hits the primary key index, but takes several seconds to run.

The performance of Insert operations is generally not too bad, in that the query boils down to an Index Scan for a single row. The problem, however, is that this query must be executed for every row inserted. The query itself is only planned once, and that query plan is cached for later re-use. That removes some of the query overhead, but also incurs a growing cache of plans which can create memory pressure if the number of foreign keys is large, and indeed this has become a problem for at least one customer [4].

Some profiling of the RI check indicated that about half of the time of the insert was spent in SPI functions that could be bypassed if the C function called index_beginscan and index_rescan directly [5]. And these indications bore out when Amit Langote wrote a patch [6] which finds the designated index from the constraint (with some drilling through partitions if need be) and then invokes the scan functions. This method showed about a halving of the time.
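To make the above concrete, here is a minimal sketch in the R/T notation, with the queries the row-level triggers effectively issue shown as comments (these are approximations, not the exact SQL that ri_triggers.c generates):

    CREATE TABLE t (y integer PRIMARY KEY);
    CREATE TABLE r (x integer REFERENCES t (y));

    -- Insert check, run once per row inserted into r:
    --   SELECT 1 FROM t WHERE y = $1 FOR KEY SHARE;
    -- Update/Delete check on t with NO ACTION / RESTRICT, run once per affected row:
    --   SELECT 1 FROM r WHERE x = $1 FOR KEY SHARE;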
Re: Release SPI plans for referential integrity with DISCARD ALL
> > In addition to that, a following case would be solved with this approach: > When many processes are referencing many tables defined foreign key > constraints thoroughly, a huge amount of memory will be consumed > regardless of whether referenced tables are partitioned or not. > > Attached the patch. Any thoughts? > Amit Langote has done some great work at eliminating SPI from INSERT/UPDATE triggers entirely, thus reducing the number of cached plans considerably. I think he was hoping to have a patch formalized this week, if time allowed. It doesn't have DELETE triggers in it, so this patch might still have good value for deletes on a commonly used enumeration table. However, our efforts might be better focused on eliminating SPI from delete triggers as well, an admittedly harder task.
Re: simplifying foreign key/RI checks
> > > In file included from > /home/japin/Codes/postgresql/Debug/../src/include/postgres.h:47:0, > from > /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:24: > /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c: > In function ‘ri_PrimaryKeyExists’: > /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:134:5: > warning: this statement may fall through [-Wimplicit-fallthrough=] > do { \ > ^ > /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:156:2: > note: in expansion of macro ‘ereport_domain’ > ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__) > ^~ > /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:229:2: > note: in expansion of macro ‘ereport’ > ereport(elevel, errmsg_internal(__VA_ARGS__)) > ^~~ > /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:417:5: > note: in expansion of macro ‘elog’ > elog(ERROR, "unexpected table_tuple_lock status: %u", res); > ^~~~ > /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:419:4: > note: here > default: > ^~~ > > -- > Regrads, > Japin Li. > ChengDu WenWu Information Technology Co.,Ltd. > I also get this warning. Adding a "break;" at line 418 resolves the warning.
Re: simplifying foreign key/RI checks
On Mon, Jan 18, 2021 at 9:45 PM Amit Langote wrote: > On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu wrote: > > > > Hi, > > I was looking at this statement: > > > > insert into f select generate_series(1, 200, 2); > > > > Since certain generated values (the second half) are not in table p, > wouldn't insertion for those values fail ? > > I tried a scaled down version (1000th) of your example: > > > > yugabyte=# insert into f select generate_series(1, 2000, 2); > > ERROR: insert or update on table "f" violates foreign key constraint > "f_a_fkey" > > DETAIL: Key (a)=(1001) is not present in table "p". > > Sorry, a wrong copy-paste by me. Try this: > > create table p (a numeric primary key); > insert into p select generate_series(1, 200); > create table f (a bigint references p); > > -- Unpatched > insert into f select generate_series(1, 200, 2); > INSERT 0 100 > Time: 6527.652 ms (00:06.528) > > update f set a = a + 1; > UPDATE 100 > Time: 8108.310 ms (00:08.108) > > -- Patched: > insert into f select generate_series(1, 200, 2); > INSERT 0 100 > Time: 3312.193 ms (00:03.312) > > update f set a = a + 1; > UPDATE 100 > Time: 4292.807 ms (00:04.293) > > > For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch : > > > > +* Collect partition key values from the unique key. > > > > At the end of the nested loop, should there be an assertion that > partkey->partnatts partition key values have been found ? > > This can be done by using a counter (initialized to 0) which is > incremented when a match is found by the inner loop. > > I've updated the patch to add the Assert. Thanks for taking a look. > > -- > Amit Langote > EDB: http://www.enterprisedb.com v2 patch applies and passes make check and make check-world. Perhaps, given the missing break at line 418 without any tests failing, we could add another regression test if we're into 100% code path coverage. As it is, I think the compiler warning was a sufficient alert. The code is easy to read, and the comments touch on the major points of what complexities arise from partitioned tables. A somewhat pedantic complaint I have brought up off-list is that this patch continues the pattern of the variable and function names making the assumption that the foreign key is referencing the primary key of the referenced table. Foreign key constraints need only reference a unique index, it doesn't have to be the primary key. Granted, that unique index is behaving exactly as a primary key would, so conceptually it is very similar, but keeping with the existing naming (pk_rel, pk_type, etc) can lead a developer to think that it would be just as correct to find the referenced relation and get the primary key index from there, which would not always be correct. This patch correctly grabs the index from the constraint itself, so no problem there. I like that this patch changes the absolute minimum of the code in order to get a very significant performance benefit. It does so in a way that should reduce resource pressure found in other places [1]. This will in turn reduce the performance penalty of "doing the right thing" in terms of defining enforced foreign keys. It seems to get a clearer performance boost than was achieved with previous efforts at statement level triggers. This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far less common, and whose solution will likely be very different. [1] https://www.postgresql.org/message-id/cakkq508z6r5e3jdqhfpwszsajlpho3oyyoamfesaupto5vg...@mail.gmail.com
Re: Release SPI plans for referential integrity with DISCARD ALL
On Wed, Jan 13, 2021 at 1:03 PM Corey Huinker wrote: > In addition to that, a following case would be solved with this approach: >> When many processes are referencing many tables defined foreign key >> constraints thoroughly, a huge amount of memory will be consumed >> regardless of whether referenced tables are partitioned or not. >> >> Attached the patch. Any thoughts? >> > > Amit Langote has done some great work at eliminating SPI from > INSERT/UPDATE triggers entirely, thus reducing the number of cached plans > considerably. > > I think he was hoping to have a patch formalized this week, if time > allowed. > > It doesn't have DELETE triggers in it, so this patch might still have good > value for deletes on a commonly used enumeration table. > > However, our efforts might be better focused on eliminating SPI from > delete triggers as well, an admittedly harder task. > Amit's patch is now available in this thread [1]. I'm curious if it has any effect on your memory pressure issue. [1] https://www.postgresql.org/message-id/ca+hiwqgkfjfydeq5vhph6eqpkjsbfpddy+j-kxyfepqedts...@mail.gmail.com
Re: simplifying foreign key/RI checks
> > I decided not to deviate from pk_ terminology so that the new code > doesn't look too different from the other code in the file. Although, > I guess we can at least call the main function > ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've > changed that. > I agree with leaving the existing terminology where it is for this patch. Changing the function name is probably enough to alert the reader that the things that are called pks may not be precisely that.
Re: simplifying foreign key/RI checks
> > > > I decided not to deviate from pk_ terminology so that the new code > doesn't look too different from the other code in the file. Although, > I guess we can at least call the main function > ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've > changed that. >

I think that's a nice compromise; it makes the reader aware of the concept.

> > I've attached the updated patch. >

Missing "break" added. Check.
Comment updated. Check.
Function renamed. Check.
Attribute mapping matching test (and assertion) added. Check.

Patch applies to an as-of-today master, passes make check and make check-world. No additional regression tests required, as no new functionality is introduced. No docs required, as there is nothing user-facing.

Questions:
1. There's a palloc for mapped_partkey_attnums, which is never freed; is the prevailing memory context short-lived enough that we don't care?
2. Same question for the AttrMap map; should there be a free_attrmap()?
Re: simplifying foreign key/RI checks
On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu wrote: > Hi, > > + for (i = 0; i < riinfo->nkeys; i++) > + { > + Oid eq_opr = eq_oprs[i]; > + Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]); > + RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid); > + > + if (pk_nulls[i] != 'n' && > OidIsValid(entry->cast_func_finfo.fn_oid)) > > It seems the pk_nulls[i] != 'n' check can be lifted ahead of the > assignment to the three local variables. That way, ri_HashCompareOp > wouldn't be called when pk_nulls[i] == 'n'. > > + case TM_Updated: > + if (IsolationUsesXactSnapshot()) > ... > + case TM_Deleted: > + if (IsolationUsesXactSnapshot()) > > It seems the handling for TM_Updated and TM_Deleted is the same. The cases > for these two values can be put next to each other (saving one block of > code). > > Cheers > I'll pause on reviewing v4 until you've addressed the suggestions above.
Re: simplifying foreign key/RI checks
On Sun, Jan 24, 2021 at 6:51 AM Amit Langote wrote: > On Sun, Jan 24, 2021 at 11:26 AM Corey Huinker > wrote: > > On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu wrote: > >> > >> Hi, > > Thanks for the review. > > >> + for (i = 0; i < riinfo->nkeys; i++) > >> + { > >> + Oid eq_opr = eq_oprs[i]; > >> + Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]); > >> + RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, > typeid); > >> + > >> + if (pk_nulls[i] != 'n' && > OidIsValid(entry->cast_func_finfo.fn_oid)) > >> > >> It seems the pk_nulls[i] != 'n' check can be lifted ahead of the > assignment to the three local variables. That way, ri_HashCompareOp > wouldn't be called when pk_nulls[i] == 'n'. > > Good idea, so done. Although, there can't be nulls right now. > > >> + case TM_Updated: > >> + if (IsolationUsesXactSnapshot()) > >> ... > >> + case TM_Deleted: > >> + if (IsolationUsesXactSnapshot()) > >> > >> It seems the handling for TM_Updated and TM_Deleted is the same. The > cases for these two values can be put next to each other (saving one block > of code). > > Ah, yes. The TM_Updated case used to be handled a bit differently in > earlier unposted versions of the patch, though at some point I > concluded that the special handling was unnecessary, but didn't > realize what you just pointed out. Fixed. > > > I'll pause on reviewing v4 until you've addressed the suggestions above. > > Here's v5. > v5 patches apply to master. Suggested If/then optimization is implemented. Suggested case merging is implemented. Passes make check and make check-world yet again. Just to confirm, we *don't* free the RI_CompareHashEntry because it points to an entry in a hash table which is TopMemoryContext aka lifetime of the session, correct? Anybody else want to look this patch over before I mark it Ready For Committer?
Re: parse_slash_copy doesn't support psql variables substitution
On Wed, Feb 10, 2021 at 8:33 AM Pavel Stehule wrote: > Hi > > Is there some reason why \copy statement (parse_slash_copy parser) doesn't > support psql variables? > > Regards > > Pavel >

I remember wondering about that when I was working on the \if stuff. I dug into it a bit, but the problem was out of scope for my goals. The additional options recently added to \g reduced my need for \copy, and it seemed like there was some effort to have input pipes as well, which would eliminate the need for \copy altogether.
Re: Add id's to various elements in protocol.sgml
On Sun, Dec 5, 2021 at 11:15 AM Daniel Gustafsson wrote: > > On 5 Dec 2021, at 16:51, Brar Piening wrote: > > > The attached patch adds id's to various elements in protocol.sgml to > > make them more accesssible via the public html documentation interface. > > Off the cuff without having checked the compiled results yet, it seems > like a good idea. > > — > Daniel Gustafsson > I wanted to do something similar for every function specification in the docs. This may inspire me to take another shot at that.
Re: Getting rid of regression test input/ and output/ files
> > > 0001 adds the \getenv command to psql; now with documentation > and a simple regression test. > +1. Wish I had added this years ago when I had a need for it. > > 0002 tweaks pg_regress to export the needed values as environment > variables, and modifies the test scripts to use those variables. > (For ease of review, this patch modifies the scripts in-place, > and then 0003 will move them.) A few comments on this: > > * I didn't see any value in exporting @testtablespace@ as a separate > variable; we might as well let the test script know how to construct > that path name. > > * I concluded that the right way to handle the concatenation issue > is *not* to rely on SQL literal concatenation, but to use psql's > \set command to concatenate parts of a string. In particular this > +1 to that, much better than the multi-line thing. I have a nitpick about the \getenv FOO FOO lines. It's a new function to everyone, and to anyone who hasn't seen the documentation it won't be immediately obvious which one is the ENV var and which one is the local var. Lowercasing the local var would be a way to reinforce which is which to the reader. It would also be consistent with var naming in the rest of the script. > > 0004 finally removes the no-longer-needed infrastructure in > +1 Deleted code is debugged code.
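A sketch of how that lowercase-local-variable convention would read in a converted test script, assuming the \getenv from the 0001 patch (variable names here are illustrative, not necessarily what the final scripts use):

    \getenv abs_srcdir ABS_SRCDIR
    \set filename :abs_srcdir '/data/copy_test.data'
    COPY copytest FROM :'filename';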
Re: Getting rid of regression test input/ and output/ files
On Sun, Dec 19, 2021 at 5:48 PM Tom Lane wrote: > Corey Huinker writes: > > I have a nitpick about the \getenv FOO FOO lines. > > It's a new function to everyone, and to anyone who hasn't seen the > > documentation it won't be immediately obvious which one is the ENV var > and > > which one is the local var. Lowercasing the local var would be a way to > > reinforce which is which to the reader. It would also be consistent with > > var naming in the rest of the script. > > Reasonable idea. Another thing I was wondering about was whether > to attach PG_ prefixes to the environment variable names, since > those are in a more-or-less global namespace. If we do that, > then a different method for distinguishing the psql variables > is to not prefix them. +1 to that as well. Which brings up a tangential question, is there value in having something that brings in one or more env vars as psql vars directly. I'm thinking something like: \importenv pattern [prefix] (alternate names: \getenv_multi \getenv_pattern, \getenvs, etc) which could be used like \importenv PG* env_ which would import PGFOO and PGBAR as env_PGFOO and env_PGBAR, awkward names but leaving no doubt about where a previously unreferenced variable came from. I don't *think* we need it for this specific case, but since the subject of env vars has come up I thought I'd throw it out there.
Re: Getting rid of regression test input/ and output/ files
On Sun, Dec 19, 2021 at 7:00 PM Tom Lane wrote: > Corey Huinker writes: > > Which brings up a tangential question, is there value in having something > > that brings in one or more env vars as psql vars directly. I'm thinking > > something like: > > > \importenv pattern [prefix] > > Meh ... considering we've gone this long without any getenv capability > in psql at all, that seems pretty premature. > > regards, tom lane > Fair enough. Patches didn't apply with `git apply` but did fine with `patch -p1`, from there it passes make check-world.
Re: simplifying foreign key/RI checks
> > > > I wasn't able to make much inroads into how we might be able to get > rid of the DETACH-related partition descriptor hacks, the item (3), > though I made some progress on items (1) and (2). > > For (1), the attached 0001 patch adds a new isolation suite > fk-snapshot.spec to exercise snapshot behaviors in the cases where we > no longer go through SPI. It helped find some problems with the > snapshot handling in the earlier versions of the patch, mainly with > partitioned PK tables. It also contains a test along the lines of the > example you showed upthread, which shows that the partition descriptor > hack requiring ActiveSnapshot to be set results in wrong results. > Patch includes the buggy output for that test case and marked as such > in a comment above the test. > > In updated 0002, I fixed things such that the snapshot-setting > required by the partition descriptor hack is independent of > snapshot-setting of the RI query such that it no longer causes the PK > index scan to return rows that the RI query mustn't see. That fixes > the visibility bug illustrated in your example, and as mentioned, also > exercised in the new test suite. > > I also moved find_leaf_pk_rel() into execPartition.c with a new name > and a new set of parameters. > > -- > Amit Langote > EDB: http://www.enterprisedb.com Sorry for the delay. This patch no longer applies, it has some conflict with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET
> > Out of curiosity, could you please tell me the concrete situations > where you wanted to delete one of two identical records? > In my case, there is a table with known duplicates, and we would like to delete all but the one with the lowest ctid, and then add a unique index to the table which then allows us to use INSERT ON CONFLICT in a meaningful way. The other need for a DELETE...LIMIT or UPDATE...LIMIT is when you're worried about flooding a replica, so you parcel out the DML into chunks that don't cause unacceptable lag on the replica. Both of these can be accomplished via DELETE FROM foo WHERE ctid IN ( SELECT ... FROM foo ... LIMIT 1000), but until recently such a construct would result in a full table scan, and you'd take the same hit with each subsequent DML. I *believe* that the ctid range scan now can limit those scans, especially if you can order the limited set by ctid, but those techniques are not widely known at this time.
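As an illustration, the chunked form of that workaround might look like this (table and predicate names are hypothetical):

    -- delete in batches of 1000, repeated until it reports DELETE 0,
    -- to avoid flooding the replica with one giant transaction
    DELETE FROM foo
     WHERE ctid IN (SELECT ctid FROM foo WHERE needs_removal LIMIT 1000);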
Re: simplifying foreign key/RI checks
> > > > Good catch, thanks. Patch updated. > > > Applies clean. Passes check-world.
Re: Foreign key joins revisited
> > > > Perhaps this would be more SQL idiomatic: > > FROM permission p >LEFT JOIN ON KEY role IN p AS r >LEFT JOIN team_role AS tr ON KEY role TO r >LEFT JOIN ON KEY team IN tr AS t >LEFT JOIN user_role AS ur ON KEY role TO r >LEFT JOIN ON KEY user IN ur AS u > > My second guess would be: FROM permission p LEFT JOIN role AS r ON [FOREIGN] KEY [(p.col1 [, p.col2 ...])] where the key spec is only required when there are multiple foreign keys in permission pointing to role. But my first guess would be that the standards group won't get around to it.
Re: Foreign key joins revisited
> > > First, there is only one FK in permission pointing to role, and we write a > query leaving out the key columns. > Then, another different FK in permission pointing to role is later added, > and our old query is suddenly in trouble. > > We already have that problem with cases where two tables have a common x column: SELECT x FROM a, b so this would be on-brand for the standards body. And worst case scenario you're just back to the situation you have now.
Re: Suggestion: optionally return default value instead of error on failed cast
> > currently a failed cast throws an error. It would be useful to have a > way to get a default value instead. > I've recently encountered situations where this would have been helpful. Recently I came across some client code: CREATE OR REPLACE FUNCTION is_valid_json(str text) RETURNS boolean LANGUAGE PLPGSQL AS $$ DECLARE j json; BEGIN j := str::json; return true; EXCEPTION WHEN OTHERS THEN return false; END $$; This is a double-bummer. First, the function discards the value so we have to recompute it, and secondly, the exception block prevents the query from being parallelized. > > T-SQL has try_cast [1] > I'd be more in favor of this if we learn that there's no work (current or proposed) in the SQL standard. > Oracle has CAST(... AS .. DEFAULT ... ON CONVERSION ERROR) [2] > If the SQL group has suggested anything, I'd bet it looks a lot like this. > > The DEFAULT ... ON CONVERSION ERROR syntax seems like it could be > implemented in PostgreSQL. Even if only DEFAULT NULL was supported (at > first) that would already help. > > The short syntax could be extended for the DEFAULT NULL case, too: > > SELECT '...'::type -- throws error > SELECT '...':::type -- returns NULL > I think I'm against adding a ::: operator, because too many people are going to type (or omit) the third : by accident, and that would be a really subtle bug. The CAST/TRY_CAST syntax is wordy but it makes it very clear that you expected janky input and have specified a contingency plan. The TypeCast node seems like it wouldn't need too much modification to allow for this. The big lift, from what I can tell, is either creating versions of every $foo_in() function to return NULL instead of raising an error, and then effectively wrapping that inside a coalesce() to process the default. Alternatively, we could add an extra boolean parameter ("nullOnFailure"? "suppressErrors"?) to the existing $foo_in() functions, a boolean to return null instead of raising an error, and the default would be handled in coerce_to_target_type(). Either of those would create a fair amount of work for extensions that add types, but I think the value would be worth it. I do remember when I proposed the "void"/"black hole"/"meh" datatype (all values map to NULL) I ran into a fairly fundamental rule that types must map any not-null input to a not-null output, and this could potentially violate that, but I'm not sure. Does anyone know if the SQL standard has anything to say on this subject?
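To make the quoted proposal concrete, here is roughly how the Oracle-style syntax might read if adopted in PostgreSQL. This is purely hypothetical; only the first statement works (and errors) today:

    SELECT 'banana'::integer;
    -- ERROR:  invalid input syntax for type integer: "banana"

    -- hypothetical, modeled on the Oracle syntax quoted above:
    SELECT CAST('banana' AS integer DEFAULT NULL ON CONVERSION ERROR);  -- would return NULL
    SELECT CAST('banana' AS integer DEFAULT 0 ON CONVERSION ERROR);     -- would return 0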
Re: SQL:2011 application time
On Wed, Jan 5, 2022 at 11:07 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > On 21.11.21 02:51, Paul A Jungwirth wrote: > > Here are updated patches. They are rebased and clean up some of my > > TODOs. > > This patch set looks very interesting. It's also very big, so it's > difficult to see how to get a handle on it. I did a pass through it > to see if there were any obvious architectural or coding style > problems. I also looked at some of your TODO comments to see if I had > something to contribute there. > > I'm confused about how to query tables based on application time > periods. Online, I see examples using AS OF, but in the SQL standard > I only see this used for system time, which we are not doing here. > What is your understanding of that? > Paul has previously supplied me with this document https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf and that formed the basis of a lot of my questions a few months earlier. There was similar work being done for system periods, which are a bit simpler but require a side (history) table to be created. I was picking people's brains about some aspects of system versioning to see if I could help bringing that into this already very large patchset, but haven't yet felt like I had done enough research to post it. It is my hope that we can at least get the syntax for both application and system versioning committed, even if it's just stubbed in with not-yet-supported errors.
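For readers following along, the SQL:2011 constructs under discussion look roughly like this (standard syntax as described in that paper, not PostgreSQL syntax; table and column names are made up):

    -- application-time period declared over two ordinary columns
    CREATE TABLE coverage (
        id         integer,
        plan       text,
        valid_from date,
        valid_to   date,
        PERIOD FOR valid_period (valid_from, valid_to),
        PRIMARY KEY (id, valid_period WITHOUT OVERLAPS)
    );

    -- application-time DML can be scoped to part of a period:
    UPDATE coverage
       FOR PORTION OF valid_period FROM DATE '2020-01-01' TO DATE '2020-07-01'
       SET plan = 'gold'
     WHERE id = 1;

    -- AS OF belongs to system versioning, not application time:
    SELECT * FROM some_system_versioned_table
      FOR SYSTEM_TIME AS OF TIMESTAMP '2021-01-01 00:00:00';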
Re: Suggestion: optionally return default value instead of error on failed cast
On Thu, Jan 6, 2022 at 12:18 PM Andrew Dunstan wrote: > > On 1/4/22 22:17, Corey Huinker wrote: > > > > currently a failed cast throws an error. It would be useful to have a > > way to get a default value instead. > > > > > > I've recently encountered situations where this would have been > > helpful. Recently I came across some client code: > > > > CREATE OR REPLACE FUNCTION is_valid_json(str text) RETURNS boolean > > LANGUAGE PLPGSQL > > AS $$ > > DECLARE > > j json; > > BEGIN > > j := str::json; > > return true; > > EXCEPTION WHEN OTHERS THEN return false; > > END > > $$; > > > > > > This is a double-bummer. First, the function discards the value so we > > have to recompute it, and secondly, the exception block prevents the > > query from being parallelized. > > > This particular case is catered for in the SQL/JSON patches which > several people are currently reviewing: > > That's great to know, but it would still be parsing the json twice, once to learn that it is legit json, and once to get the casted value. Also, I had a similar issue with type numeric, so having generic "x is a type_y" support would essentially do everything that a try_catch()-ish construct would need to do, and be more generic.
Re: SQL:2011 application time
> > > But > the standard says that dropping system versioning should automatically > drop all historical records (2 under Part 2: Foundation, 11.30 system versioning clause>). That actually makes sense though: when you > do DML we automatically update the start/end columns, but we don't > save copies of the previous data (and incidentally the end column will > always be the max value.) This is what I was referring to when I mentioned a side-table. deleting history would be an O(1) operation. Any other misunderstandings are all mine.
Re: Add 64-bit XIDs into PostgreSQL 15
> > I'd be > curious to know where we found the bits for that -- the tuple header > isn't exactly replete with extra bit space. > +1 - and can we somehow shoehorn in a version # into the new format so we never have to look for spare bits again.
Re: test runner (was Re: SQL-standard function body)
> > > This is nice. Are there any parallelism capabilities? > > Yes. It defaults to number-of-cores processes, but obviously can also be > specified explicitly. One very nice part about it is that it'd work > largely the same on windows (which has practically unusable testing > right now). It probably doesn't yet, because I just tried to get it > build and run tests at all, but it shouldn't be a lot of additional > work. > The pidgin developers speak very highly of meson, for the same reasons already mentioned in this thread.
Re: use pg_get_functiondef() in pg_dump
> > I'm sure there's a lot of folks who'd like to see more of the logic we > have in pg_dump for building objects from the catalog available to more > tools through libpgcommon- psql being one of the absolute first > use-cases for exactly that (there's certainly no shortage of people > who've asked how they can get a CREATE TABLE statement for a table by > using psql...). >

I count myself among those folks (see https://www.postgresql.org/message-id/CADkLM%3DfxfsrHASKk_bY_A4uomJ1Te5MfGgD_rwwQfV8wP68ewg%40mail.gmail.com for discussion of doing DESCRIBE and SHOW CREATE-ish functions either on server side or client side).

I'm all for having this as "just" a set of pg_get_*def functions, because they allow for the results to be used in queries. Granted, the shape of the result set may not be stable, but that's the sort of thing we can warn about the same way we have warnings for changes to pg_stat_activity. At that point any DESCRIBE/SHOW CREATE server-side functions essentially become just shells around the pg_get_*def(), with no particular requirement to make those new commands work inside a SELECT.

Would it be totally out of left field to have the functions have an optional "version" parameter, defaulted to null, that would be used to give backwards compatible results if and when we do make a breaking change?
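A trivial example of why the function-shaped interface composes nicely (the schema name here is just for illustration):

    SELECT p.oid::regprocedure AS func,
           pg_get_functiondef(p.oid) AS ddl
      FROM pg_proc p
      JOIN pg_namespace n ON n.oid = p.pronamespace
     WHERE n.nspname = 'public'
       AND p.prokind = 'f';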
Re: automatically generating node support functions
> > build support and made the Perl code more portable, so that the cfbot > doesn't have to be sad. > Was this also the reason for doing the output with print statements rather than using one of the templating libraries? I'm mostly just curious, and certainly don't want it to get in the way of working code.
Re: Extending range type operators to cope with elements
> > > >- @> contains range/element > >- <@ element/range is contained by > I'm not a heavy user or range types, so I can't really judge how useful > that is in practice, but it seems like a fairly natural extension of the > existing operators. I mean, if I understand it correctly, the proposed > behavior is equal to treating the element as a "collapsed range". > I used to give a talk on ranges and partitioning, prior to postgresql getting native partitioning (see: https://wiki.postgresql.org/images/1/1b/Ranges%2C_Partitioning_and_Limitations.pdf ) In that talk, I mention the need for exactly these operators, specifically for an extension called range_partitioning which had some logic for "If I were to insert a row with this value, what partition would it end up in?" which allowed for a subsequent COPY operation directly to that partition. That logic essentially binary-searched a series of ranges, so it needed an "elem <@ range" as well as << and >>. Yes, constructing a collapsed range was the work-around I used in the absence of real functions. That extension has been replaced by real table partitioning and the planner itself now does similar logic for partition pruning. So yes, I've had a need for those operators in the past. What I don't know is whether adding these functions will be worth the catalog clutter.
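A small illustration of the gap being discussed; the last line is the proposed behavior, not something that works today:

    SELECT 150 <@ int4range(100, 200);                      -- element/range containment, exists today
    SELECT int4range(50, 50, '[]') << int4range(100, 200);  -- the collapsed-range workaround
    -- proposed: SELECT 50 << int4range(100, 200);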
Add A Glossary
Attached is a v1 patch to add a Glossary to the appendix of our current documentation.

I believe that our documentation needs a glossary for a few reasons:
1. It's hard to ask for help if you don't know the proper terminology of the problem you're having.
2. Readers who are new to databases may not understand a few of the terms that are used casually both in the documentation and in forums. This helps to make our documentation a bit more useful as a teaching tool.
3. Readers whose primary language is not English may struggle to find the correct search terms, and this glossary may help them grasp that a given term has a usage in databases that is different from common English usage.
3b. If we are not able to find the resources to translate all of the documentation into a given language, translating the glossary page would be a good first step.
4. The glossary would be web-searchable, and draw viewers to the official documentation.
5. Adding link anchors to each term would make them cite-able, useful in forum conversations.

A few notes about this patch:
1. It's obviously incomplete. There are more terms, a lot more, to add.
2. The individual definitions supplied are off-the-cuff, and should be thoroughly reviewed.
3. The definitions as a whole should be reviewed by an actual tech writer (one was initially involved but had to step back due to prior commitments), and the definitions should be normalized in terms of voice, tone, audience, etc.
4. My understanding of DocBook is not strong. The glossary vs glosslist tag issue is a bit confusing to me, and I'm not sure if the glossary tag is even appropriate for our needs.
5. I've made no effort at making each term an anchor, nor have I done any CSS styling at all.
6. I'm not quite sure how to handle terms that have different definitions in different contexts. Should that be two glossdefs following one glossterm, or two separate def/term pairs?

Please review and share your thoughts.

From 343d5c18bf23f98341b510595e3e042e002242cb Mon Sep 17 00:00:00 2001
From: Corey Huinker
Date: Sun, 13 Oct 2019 17:57:36 +
Subject: [PATCH] add glossary page with sample terms and definitions
---
 doc/src/sgml/filelist.sgml  |   1 +
 doc/src/sgml/glossary.sgml  | 618 +
 doc/src/sgml/stylesheet.css |   2 +
 3 files changed, 621 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

[The SGML markup of the attached patch was stripped in archiving; the surviving text of the new glossary.sgml follows.]

Glossary

This is a list of terms and their meanings in the context of PostgreSQL and databases in general.

Aggregate: To combine a collection of data values into a single value, whose value may not be of the same type as the original values. Aggregate functions combine multiple rows that share a common set of values into one row, which means that the only data visible is the values in common, and the aggregates of the non-common data.

Analytic: A function whose computed value can reference values found in nearby rows of the same result set.

Atomic: In reference to the value of an Attribute or Datum: cannot be broken up into smaller components. In reference to an operation: an event that cannot be completed in part; it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic.

Attribute: A typed data element found within a Tuple or Relation or Table.

Cast: A conversion of a Datum from its current data type to another data type.

Check Constraint: A type of constraint defined on a relation which restricts the values allowed in one or more Attributes. The check constraint can make reference to any Attribute in the Relation, but cannot reference other rows of the same relation or other relations.

Column: An Attribute found in a Table or View.

Commit: The act of finalizing a Transaction within the database.

Concurrency: The concept that multiple independent operations can be happening within the database at the same time.

Constraint: A method of restricting the values of data allowed within a Table.

Dat
Add Change Badges to documentation
Attached is a patch to implement change badges in our documentation.

What's a change badge? It's my term for a visual cue in the documentation used to indicate that the nearby section of documentation is new in this version or otherwise changed from the previous version. One example of change badges being used is in the DocBook documentation reference: https://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes

DocBook used graphical badges, which seemed to be a bad idea. Instead, I went with a decorated text span like one finds in gmail labels or Reddit "flair".

The badges are implemented using the "revision" attribute available on all DocBook tags. All one needs to do to indicate a change is to change one tag, and add a revision attribute. For example, adding revision="new in 13" to an element will add a small green text box with the text "new in 13" immediately preceding the rendered elements. I have attached a screenshot (badges_in_acronyms.png) of an example of this from my browser viewing changes to the acronyms.html file. This obviously lacks the polish of viewing the page on a full website, but it does give you an idea of the flexibility of the change badge, and where badge placement is (and is not) a good idea.

What are the benefits of using this? I think the benefits are as follows:
1. It shows a casual user what pieces are new on that page (new functions, new keywords, new command options, etc).
2. It also works in the negative: a user can quickly skim a page, and lacking any badges, feel confident that everything there works in the way that it did in version N-1.
3. It also acts as a subtle cue for the user to click on the previous version to see what it used to look like, confident that there *will* be a difference on the previous version.

How would we implement this?
1. All new documentation pages would get a "NEW" badge in their title.
2. New function definitions, new command options, etc would get a "NEW" badge as visually close to the change as is practical.
3. Changes to existing functions, options, etc. would get a badge of "UPDATED".
4. At major release time, we could do one of two things:
4a. We could keep the NEW/UPDATED badges in the fixed release version, and then completely remove them from the master, because for version N+1, they won't be new anymore. This can be accomplished with an XSL transform looking for any tag with the "revision" attribute.
4b. We could code in the version number at release time, and leave it in place. So in version 14 you could find both "v13" and "v14" badges, and in version 15 you could find badges for 15, 14, and 13. At some point (say v17), we start retiring the v13 badges, and in v18 we'd retire the v14 badges, and so on, to keep the clutter to a minimum.

Back to the patch: I implemented this only for html output, and the colors I chose are very off-brand for postgres, so that will have to change. There's probably some spacing/padding issues I haven't thought of.

Please try it out, make some modifications to existing document pages to see how badges would work in those contexts.
From ded965fc90b223a834ac52d55512587b7a6ea139 Mon Sep 17 00:00:00 2001
From: Corey Huinker
Date: Fri, 18 Oct 2019 06:15:10 -0400
Subject: [PATCH] add document change badges
---
 doc/src/sgml/acronyms.sgml              |  6 +++---
 doc/src/sgml/stylesheet-html-common.xsl | 10 ++
 doc/src/sgml/stylesheet.css             | 10 ++
 3 files changed, 23 insertions(+), 3 deletions(-)

[The SGML and XSL hunks of the attached patch were stripped in archiving; the surviving CSS hunk is shown below.]

diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css
index 1a66c789d5..d0cae2f59f 100644
--- a/doc/src/sgml/stylesheet.css
+++ b/doc/src/sgml/stylesheet.css
@@ -109,3 +109,13 @@ acronym { font-style: inherit; }
     width: 75%;
   }
 }
+
+/* version badge styling */
+span.revision-badge {
+    visibility: visible;
+    color: white;
+    background-color: #00933C;
+    border: 1px solid #00;
+    border-radius: 2px;
+    padding: 1px;
+}
--
2.14.1
Re: \describe*
> > >> - Tab completion for \descibe-verbose. >> I know that \d+ tab completion is also not there, but I think we must >> have tab completion for \descibe-verbose. >> >> postgres=# \describe- >> \describe-extension >> \describe-replication-publication \describe-user-mapping >> \describe-foreign-data-wrapper >> \describe-replication-subscription\describe-view >> \describe-foreign-server \describe-role >> \describe-window-function >> \describe-foreign-table \describe-rule >> ... >> > I just confirmed that there isn't tab completion for the existing S/+ options, so it's hard to justify them for the equivalent verbose suffixes. > (1 row) >> Invalid command \describe. Try \? for help. >> >> >> I think this status is causing the problem. >> >> >> >> + /* >> standard listing of interesting things */ >> + success = >> listTables("tvmsE", NULL, show_verbose, show_system); >> + } >> + status = PSQL_CMD_UNKNOWN; >> >> I'll look into this, thanks! > - Confusion about \desc and \desC >> There is confusion while running the \desc command. I know the problem, >> but the user may confuse by this. >> postgres=# \desC >>List of foreign servers >> Name | Owner | Foreign-data wrapper >> --+---+-- >> (0 rows) >> >> postgres=# \desc >> Invalid command \desc. Try \? for help. >> >> - Auto-completion of commands. >> There is some more confusion in the completion of commands. >> >> This command shows List of aggregates. >> postgres=# \describe-aggregate-function >> List of aggregate functions >> Schema | Name | Result data type | Argument data types | Description >> +--+--+-+- >> (0 rows) >> >> >> >> This command shows a list of relation "\d" >> postgres=# \describe-aggregatE-function >> List of relations >> Schema | Name | Type | Owner >> +--+---+- >> public | foo | table | vagrant >> (1 row) >> >> This command also shows a list of relations "\d". >> postgres=# \describe-aggr >> List of relations >> Schema | Name | Type | Owner >> +--+---+- >> public | foo | table | vagrant >> (1 row) >> >> This command shows error messages. >> postgres=# \descr >> Invalid command \descr. Try \? for help. >> >> I will look into it. > >> I have done a brief code review except for the documentation code. I >> don't like this code >> >> if (cmd_match(cmd,"describe-aggregate-function")) >> >> success = describeAggregates(pattern, show_verbose, show_system); >> else if (cmd_match(cmd, >> "describe-access-method")) >> success = describeAccessMethods(pattern, >> show_verbose); >> else if (cmd_match(cmd, >> "describe-tablespace")) >> success = describeTablespaces(pattern, >> show_verbose); >> else if (cmd_match(cmd, >> "describe-conversion")) >> success = listConversions(pattern, >> show_verbose, show_system); >> else if (cmd_match(cmd, "describe-cast")) >> success = listCasts(pattern, show_verbose >> >> >> This can be achieved with the list/array/hash table, so I have changed >> that code in the attached patch just for a sample if you want I can do that >> for whole code. >> > There's some problems with a hash table. The function signatures vary quite a lot, and some require additional psql_scan_slash_options to be called. The hash option, if implemented, probably should be expanded to all slash commands, at which point maybe it belongs in psqlscanslash.l... >
Re: Re: \describe*
> > > I agree with Andres and Robert. This patch should be pushed to PG13. > > I'll do that on March 8 unless there is a compelling argument not to. > > No objection. I'll continue to work on it, though.
Re: \describe*
On Mon, Mar 4, 2019 at 1:45 PM Corey Huinker wrote: > >>> - Tab completion for \descibe-verbose. >>> I know that \d+ tab completion is also not there, but I think we must >>> have tab completion for \descibe-verbose. >>> >>> postgres=# \describe- >>> \describe-extension >>> \describe-replication-publication \describe-user-mapping >>> \describe-foreign-data-wrapper >>> \describe-replication-subscription\describe-view >>> \describe-foreign-server \describe-role >>> \describe-window-function >>> \describe-foreign-table \describe-rule >>> ... >>> >> > I just confirmed that there isn't tab completion for the existing S/+ > options, so it's hard to justify them for the equivalent verbose suffixes. > We can add completions for describe[-thing-]-verbose, but the auto-completions start to run into combinatoric complexity, and the original short-codes don't do that completion, probably for the same reason. + success = >>> listTables("tvmsE", NULL, show_verbose, show_system); >>> + } >>> + status = >>> PSQL_CMD_UNKNOWN; >>> >>> > I'll look into this, thanks! > This was fixed, good find. > - Confusion about \desc and \desC >>> There is confusion while running the \desc command. I know the problem, >>> but the user may confuse by this. >>> postgres=# \desC >>>List of foreign servers >>> Name | Owner | Foreign-data wrapper >>> --+---+-- >>> (0 rows) >>> >>> postgres=# \desc >>> Invalid command \desc. Try \? for help. >>> >> I've changed the code to first strip out 0-1 instances of "-verbose" and "-system" and the remaining string must be an exact match of a describe command or it's an error. This same system could be applied to the short commands to strip out 'S' and '+' and it might clean up the original code a bit. This command shows a list of relation "\d" >>> postgres=# \describe-aggregatE-function >>> List of relations >>> Schema | Name | Type | Owner >>> +--+---+- >>> public | foo | table | vagrant >>> (1 row) >>> >> Same issue, same fix. >>> I have done a brief code review except for the documentation code. I >>> don't like this code >>> >>> if (cmd_match(cmd,"describe-aggregate-function")) >>> >>> success = describeAggregates(pattern, show_verbose, show_system); >>> else if (cmd_match(cmd, >>> "describe-access-method")) >>> success = >>> describeAccessMethods(pattern, show_verbose); >>> else if (cmd_match(cmd, >>> "describe-tablespace")) >>> success = describeTablespaces(pattern, >>> show_verbose); >>> else if (cmd_match(cmd, >>> "describe-conversion")) >>> success = listConversions(pattern, >>> show_verbose, show_system); >>> else if (cmd_match(cmd, "describe-cast")) >>> success = listCasts(pattern, >>> show_verbose >>> >>> >>> This can be achieved with the list/array/hash table, so I have changed >>> that code in the attached patch just for a sample if you want I can do that >>> for whole code. >>> >> > There's some problems with a hash table. The function signatures vary > quite a lot, and some require additional psql_scan_slash_options to be > called. The hash option, if implemented, probably should be expanded to all > slash commands, at which point maybe it belongs in psqlscanslash.l... > As I suspected, there's a lot of variance in the function signatures of the various listSomething()/describeSomething() commands, and listDbRoleSettings requires a second pattern to be scanned, and as far as I know PsqlScanState isn't known inside describe.h, so building and using a hash table would be a lot of work for uncertain gain. 
The original code just plows through strings in alphabetical order, breaking things up by comparing leading characters, so I largely did the same at the
GIN indexes on an = ANY(array) clause
(moving this over from pgsql-performance)

A client had an issue with a query that had a where clause something like this:

WHERE 123456 = ANY(integer_array_column)

I was surprised that this didn't use the pre-existing GIN index on integer_array_column, whereas recoding as

WHERE ARRAY[123456] <@ integer_array_column

did cause the GIN index to be used. Is this a known/expected behavior? If so, is there any logical reason why we couldn't have the planner pick up on that?

Flo Rance (toura...@gmail.com) was nice enough to show that yes, this is expected behavior. Which leaves the questions:
- is the transformation I made algebraically correct in the general case?
- if so, could we have the planner do that automatically in the presence of a matching GIN index?

This seems like it might tie in with the enforcement of foreign keys within an array thread (which I can't presently find...).
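A minimal reproduction, assuming a table t with an integer[] column, the GIN index shown, and enough rows that an index scan is attractive:

    CREATE TABLE t (id serial PRIMARY KEY, integer_array_column integer[]);
    CREATE INDEX t_arr_gin ON t USING gin (integer_array_column);

    EXPLAIN SELECT * FROM t WHERE 123456 = ANY (integer_array_column);    -- sequential scan
    EXPLAIN SELECT * FROM t WHERE ARRAY[123456] <@ integer_array_column;  -- bitmap scan on t_arr_gin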
Re: Syntax diagrams in user documentation
On Thu, Mar 28, 2019 at 6:49 PM Peter Geoghegan wrote: > On Thu, Mar 28, 2019 at 3:46 PM Jeremy Schneider > wrote: > > We're just gearing up for the Google Season of Docs and I think this > > would be a great task for a doc writer to help with. Any reason to > > expect serious objections to syntax diagram graphics in the docs? > > It might be hard to come to a consensus, because it's one of those > things that everybody can be expected to have an opinion on. It > probably won't be hard to get something committed that's clearly more > informative than what we have right now, though. > > There is a question about how we maintain consistency between the > syntax diagrams in psql if we go this way, though. Not sure what to do > about that. >

This discussion is highly relevant to an upcoming talk I have called "In Aid Of RTFM", and the work I hope would follow from it.

While I personally like these bubble charts because they remind me of my misspent youth at IBM, they have some drawbacks:
1. They look like something out of an IBM manual
2. Images conceal information from visually impaired people
3. They aren't copy-paste-able text
4. They aren't easily comparable
5. They bake in the language of the comments

The merits of #1 can be argued forever, and a more modern bubble chart theme may well be achievable.

#2 is problematic, because things like ADA compliance and the EU Accessibility Requirements frown upon conveying text inside images. The way around this might be to have the alt-text of the image be the original syntax as we have it now.

#3 is important when attempting to relay the relevant excerpt of a very large documentation page via email or slack. Yes, I could right click and copy the URL of the image (in this case https://www.sqlite.org/images/syntax/insert-stmt.gif and others), but that's more work than copy-paste. We could add an HTML anchor to each image (my talk discusses our current lack of reference anchors) and that would mitigate it somewhat. Making the original text available via mouse-over or a "copy text" link might work too.

#3b As long as I live, I will never properly memorize the syntax for RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. I will google this and copy-paste it. I suspect I'm not alone. If it's available only in an image, then I can't copy-paste, and I *will* mistype some part of that at least twice.

#4 isn't such an immediate issue, but one of my points in the talk is that right now there is no way to easily distinguish text on a page that is new in the most recent version of pgsql (i.e. a red-line markup). We could of course flag that an image changed from version X-1 to X, but it would be tougher to convey which parts of the image changed.

#5 is not such a big issue because most of what is in the diagram is pure syntax, but comments will leak in, and those snippets of English will be buried very deep in bubble-markup.
Re: DWIM mode for psql
On Sun, Mar 31, 2019 at 5:04 PM Andres Freund wrote: > On 2019-04-01 09:52:34 +1300, Thomas Munro wrote: > > +/* > > + * This program is free software: you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License as published by > > + * the Free Software Foundation, either version 3 of the License, or > > + * (at your option) any later version. > > Indentation bug. You really need to work a bit more careful. > The patch applies cleanly, and passes "make check", but it generated an executable called "mongodb". Should I have run "make maintainer-clean" first?
Re: \describe*
> It seems this topic is ongoing so I've moved it to the September CF,
> but it's in "Waiting on Author" because we don't have a concrete patch
> that applies (or agreement on what it should do?) right now.

All recent work has been investigating the need(s) we're trying to address. This is as good of a time as any to share my findings (with much collaboration with Dave Fetter) so far.

1. Adding helper commands to psql aids only psql, and a great number of users do not, or can not, use psql. So adding something on the server side would have broader usage and appeal. Furthermore, some access tools (especially browser-based ones) are not good about returning non-tabular results, so helper commands that return result sets would have the broadest usage.

2. Our own interest in server-side commands is all over the map. Some just want the convenience of having them server side, or familiarity with $OTHER_DB. Others want to eliminate the need for some code in pg_dump, JDBC, or elsewhere.

3. There isn't much consensus in the other databases, though all of them do *something*:

SQLServer --- SQLServer has sp_help ( https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-help-transact-sql?view=sql-server-2017 ) which contextually returns one of two different result sets (name, owner, object type) or (column name, type, storage, length, precision, scale, nullable, default, rule, collation).

DB2 -- Has a describe command (source: https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0002019.html) which can be used to describe query output (data type, data type length, column name, column name length). It also has an option to DESCRIBE TABLE foo, which returns a set of (col_name, schema_of_datatype, data_type, data_type_length, data_type_scale, Nulls t/f). It also has DESCRIBE INDEXES FOR TABLE foo, which returns a set of (schema of index, name of index, unique flag, number of columns, index type). It also has DESCRIBE DATA PARTITIONS FOR TABLE which, as you might guess, shows partitions. All of these options have a SHOW DETAIL modifier which adds more columns.

MySQL -- (https://dev.mysql.com/doc/refman/8.0/en/show-columns.html) MySQL has SHOW COLUMNS, which returns a set of (name, type similar to format_type(), null flag, PK or index indicator, default value, notes about auto-increment/autogeneration/implicit triggers), and can be extended to show privileges and comments with the EXTENDED and FULL options. MySQL has a DESCRIBE command, but it is a synonym of EXPLAIN. MySQL also has a raft of commands like SHOW CREATE USER, SHOW CREATE VIEW, SHOW CREATE TRIGGER, SHOW CREATE TABLE, etc. (ex: https://dev.mysql.com/doc/refman/8.0/en/show-create-user.html). These commands all return a result set of exactly one column, each row representing one SQL statement, essentially doing a single-object schema-only pg_dump.

Oracle - https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12019.htm SQL*Plus has a describe command that works on tables and views and composite types (tabular set of: name, null, type), procedures (tabular set of: arg name, type, in/out), and packages (a series of sets, one per type and procedure). SQLcl has the INFO statement, which is roughly analogous to psql's \d in that it is a mix of tabular and non-tabular information. Oracle itself has dbms_metadata.get_ddl(), which seems analogous to MySQL's SHOW CREATE commands.
Snowflake -- Snowflake has DESCRIBE TABLE https://docs.snowflake.net/manuals/sql-reference/sql/desc-table.html and DESCRIBE VIEW https://docs.snowflake.net/manuals/sql-reference/sql/desc-view.html which return a set of (name, type, column type, null flag, default, primary key, unique key, check, expression, comment). It also has an option for describing "stage" tables, which are s3 buckets with a file format associated; the closest postgresql analog would be a file_fdw foreign table, and there is a separate result set format for that. Snowflake has no concept of indexes (it understands that there are things called unique keys, and it remembers that you said you wanted one, but does nothing to enforce it), so no command for that. These result sets are not composable in a query; however, they are stored in the RESULT_SCAN cache, which means that you can run a describe and then immediately fetch the results of that command as if it was a table. Snowflake also has a get_ddl() function https://docs.snowflake.net/manuals/sql-reference/functions/get_ddl.html which returns a one-column result set of statements to re-create the given object.

From all this, I have so far concluded:

1. There is real demand to be able to easily see the basic structure of tables, views, and indexes in a way that strikes a balance between detail and clutter.

2. There is some acknowledgement that this data would be useful if it was further filtered through SQL, though only one vendor h
Re: Referential Integrity Checks with Statement-level Triggers
> > The people who expressed opinions on nuking triggers from orbit (it's
> > the only way to be sure) have yet to offer up any guidance on how to
> > proceed from here, and I suspect it's because they're all very busy getting
> > things ready for v12. I definitely have an interest in working on this for
> > 13, but I don't feel good about striking out on my own without their input.
>
> Very interesting thread, but the current patch has been through two
> CFs without comments or new patches, so I'm going to mark it "Returned
> with feedback". I hope all this discussion will trigger more research
> in this space.

I've noticed that the zedstore efforts ran into the same problem that refactoring triggers has: we cannot determine which columns in a table will be affected by a trigger, so we have to assume that all of them will be. This causes a lot of unnecessary overhead with triggers.

If we had a compilation step for triggers (which ultimately means a compilation step for procedures) which kept a dependency tree of which tables/columns were touched, then we would have that insight. It's true that one dynamic statement or SELECT * would force us right back to keep-everything, but if procedures which did not do such things had performance benefits, that would be an incentive to code them more fastidiously.
CREATE ROUTINE MAPPING
A few months ago, I was researching ways of formalizing calling functions on one postgres instance from another. RPC, basically. In doing so, I stumbled across an obscure part of the SQL Standard called ROUTINE MAPPING, which is exactly what I'm looking for.

The syntax specified is, roughly:

CREATE ROUTINE MAPPING local_routine_name
FOR remote_routine_spec
SERVER my_server [ OPTIONS( ... ) ]

Which isn't too different from CREATE USER MAPPING.

The idea here is that if I had a local query:

SELECT t.x, remote_func1(), remote_func2(t.y)
FROM remote_table t
WHERE t.active = true;

that would become this query on the remote side:

SELECT t.x, local_func1(), local_func2(t.y)
FROM local_table t
WHERE t.active = true;

That was probably the main intention of this feature, but I see a different possibility there. Consider the cases:

SELECT remote_func(1,'a');

and

SELECT * FROM remote_srf(10, true);

Now we could have written remote_func() and remote_srf() in plpythonu, and they could access whatever remote data we wanted to see, but that exposes our local server to the untrusted pl/python module as well as python process overhead. We could create a specialized foreign data wrapper that requires a WHERE clause to include all the required parameters as predicates, essentially making every function a table, but that's awkward and unclear to an end user.

Having the ability to import functions from other servers allows us to write foreign servers that expose functions to the local database, and those foreign servers handle the bloat and risks associated with accessing that remote data. Moreover, it would allow hosted environments (AWS, etc.) that restrict the extensions that can be added to the database to still connect to those foreign data sources.

I'm hoping to submit a patch for this someday, but it touches on several areas of the codebase where I have no familiarity, so I've put this forth to spark interest in the feature, to see if any similar work is underway, or if anyone can offer guidance. Thanks in advance.
Re: CREATE ROUTINE MAPPING
> > PostgreSQL allows function overloading, which means that there can be > multiple functions with same name differing in argument types. So, the > syntax has to include the input parameters or their types at least. > "local_routine_name" and "remote_routine_spec" were my own paraphrasings of what the spec implies. I'm nearly certain that the local routine name, which the spec says is just an identifier, cannot have a parameter spec on it, which leaves only one other place to define it, remote_routine_spec, which wasn't defined at all. I _suppose_ parameter definitions could be pushed into options, but that'd be ugly.
Re: CREATE ROUTINE MAPPING
> It goes on from there, but I think there's a reasonable interpretation
> of this which allows us to use the same syntax as CREATE
> (FUNCTION|PROCEDURE), apart from the body, e.g.:
>
> CREATE ROUTINE MAPPING local_routine_name
>     FOR (FUNCTION | PROCEDURE) remote_routine_name ( [ [ argmode ] [ argname ] argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
>     [ RETURNS rettype
>     | RETURNS TABLE ( column_name column_type [, ...] ) ]
>     SERVER foreign_server_name
>     [ (option [, ...]) ]
>
> Does that seem like too broad an interpretation?

That's really interesting. I didn't think to look in the definition of CREATE FUNCTION to see if a SERVER option popped in there, but it seems like a more accessible way to introduce the notion of remote functions, because I talked to a few developers about this before posting to the list, and only one had ever heard of ROUTINE MAPPING and had no clear recollection of it. An option on CREATE FUNCTION is going to get noticed (and used!) a lot sooner. Having said that, I think syntactically we have to implement CREATE ROUTINE MAPPING, even if it is just translated to a CREATE FUNCTION call.

In either case, I suspected that pg_proc would need a nullable srvid column pointing to pg_foreign_server, and possibly a new row in pg_language for 'external'. I had entertained having a pg_routine_mappings table like pg_user_mappings, and we still could, if the proc's language of 'external' clued the planner to look for the mapping. I can see arguments for either approach.

Before anyone asks, I looked for, and did not find, any suggestion of IMPORT FOREIGN ROUTINE a la IMPORT FOREIGN SCHEMA, so as of yet there wouldn't be any way to grab all the functions that a foreign server is offering up.
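To make the quoted grammar concrete, here is a purely hypothetical example of what a mapping declared that way might look like (the server, function, and option names are invented for illustration; none of this syntax exists today):

CREATE ROUTINE MAPPING local_order_total
    FOR FUNCTION order_total(order_id integer) RETURNS numeric
    SERVER my_remote_server
    (schema 'public');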
Re: CREATE ROUTINE MAPPING
> > > CREATE ROUTINE MAPPING local_routine_name
> > >     FOR (FUNCTION | PROCEDURE) remote_routine_name ( [ [ argmode ] [ argname ] argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
> > >     [ RETURNS rettype
> > >     | RETURNS TABLE ( column_name column_type [, ...] ) ]
> > >     SERVER foreign_server_name
> > >     [ (option [, ...]) ]
> > >
> > > Does that seem like too broad an interpretation?
>
> > I had entertained having a pg_routine_mappings table like
> > pg_user_mappings, and we still could, if the proc's language of
> > 'external' clued the planner to look for the mapping. I can see
> > arguments for either approach.
>
> It would be good to have them in the catalog somehow if we make CREATE
> ROUTINE MAPPING a DDL. If I've read the standard correctly, there are
> parts of information_schema which come into play for those routine
> mappings.
>
> > Before anyone asks, I looked for, and did not find, any suggestion of
> > IMPORT FOREIGN ROUTINE a la IMPORT FOREIGN SCHEMA, so as of yet there
> > wouldn't be any way to grab all the functions that a foreign server is
> > offering up.
>
> How about making an option to IMPORT FOREIGN SCHEMA do it?

Ok, so the steps seem to be:

1. settle on syntax
2. determine data dictionary structures
3. parse and create those structures
4. "handle" external functions locally
5. provide structures passed to FDW handlers so that they can handle external functions
6. implement those handlers in postgres_fdw

#1 is largely prescribed for us, though I'm curious as to what the CRM statements I've made up in examples above would look like as CREATE FUNCTION ... SERVER ...

#2 deserves a lot of debate, but probably mostly hinges on the new "language" and how to associate a pg_proc entry with a pg_foreign_server.

#3 I'm guessing this is a lot of borrowing code from CREATE ROUTINE MAPPING but is otherwise pretty straightforward.

#4 An external function obviously cannot be executed locally; doing so means that the planner failed to push it down, so this is probably stub-error functions.

#5 These functions would essentially be passed in the same way as foreign columns, with the "name" as "f(a,b,4)", and the burden of forming the remote query is on the FDW. Which gets tricky. What should happen in simple situations is obvious:

SELECT t.x, remote_func1(), remote_func2(t.y)
FROM remote_table t
WHERE t.active = true;

that would become this query on the remote side:

SELECT t.x, local_func1(), local_func2(t.y)
FROM local_table t
WHERE t.active = true;

And it's still simple when local functions consume remote input:

SELECT local_func1(remote_func1(r.x))
FROM remote_table r
WHERE r.active = true;

But other situations seem un-handle-able to me:

SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;

In those cases, at least initially, I think the FDW handler is right to raise an error, because the function inputs are unknowable at query time, and the inputs cannot also be pushed down to the remote server. That might not be common, but I can see situations like this:

SELECT r.*
FROM remote_srf( ( SELECT remote_code_value
                   FROM local_table_of_remote_codes
                   WHERE local_code_value = 'xyz' ) ) r;

and we would want things like that to work. Currently, in similar table situations, the FDW has no choice but to fetch the entire table and filter locally. That's good for tables, whose contents are knowable, but the set of possible function inputs is unreasonably large.
The current workaround in table-land is to run the inner query locally, and present the result as a constant to a follow-up query, so maybe that's what we have to do here, at least initially.

#6 is where the FDW either does the translation or rejects the notion that functions can be pushed down, either outright or based on the usage of the function in the query.

I'm doing this thinking on the mailing list in the hopes that it evokes suggestions, warnings, suggested code samples, and of course, help.
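Going back to the split-query workaround mentioned above, a sketch of how that might look from psql (remote_srf, local_table_of_remote_codes, and the 'xyz' code are the hypothetical objects from the earlier example):

SELECT remote_code_value AS code
FROM local_table_of_remote_codes
WHERE local_code_value = 'xyz' \gset

SELECT r.*
FROM remote_srf(:'code') r;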
Re: CREATE ROUTINE MAPPING
> > > > > > But other situations seem un-handle-able to me: > > > > SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true; > > Do we have any way, or any plan to make a way, to push the set (SELECT > x FROM local_table WHERE active = true) to the remote side for > execution there? Obviously, there are foreign DBs that couldn't > support this, but I'm guessing they wouldn't have much by way of UDFs > either. > No. The remote query has to be generated at planning time, so it can't make predicates out of anything that can't be resolved into constants by the planner itself. The complexities of doing so would be excessive, far better to let the application developer split the queries up because they know better which parts have to resolve first.
\describe*
Some of the discussions about making psql more user friendly (more tab completions, help, exit, etc.) got me thinking about other ways that psql could be more friendly, and the one that comes to mind is our terse but cryptic \d* commands. I think it would be helpful and instructive to have corresponding long-form describe commands.

Take describing schemas. Is \dn intuitive? Not really. In hindsight, you may think "yeah, a schema is a namespace", but you never guessed 'n' on the first try, or the second.

Looking over exec_command_d() a bit, I think it's a bit of a stretch to have each command handle a long form like this:

\describe table my_table
or
\describe table verbose my_table

because then each \d-variant has to account for objects named "table" and "verbose", and that's a path to unhappiness. But if we dash-separated them, then all of the strcmps would be in the 'e' subsection, and each one would just have to know its long-to-short translation, and call exec_command_d with the corresponding short command:

describe => d
describe-verbose => d+
describe-aggregates-verbose => da+
describe-roles => du

We could even presume the verbose flag in all cases (after all, the user was being verbose...), which would also cut down on tab-completion results, and we could check for interactive mode and display a message like

\describe-schemas (short: \dn+)

so that the person has the opportunity to learn the corresponding short command. In addition to aiding tab-completion discovery of the commands (i.e. typing "\desc" and then hitting tab), it would also make scripts a little more self-documenting.

Thoughts?
Re: \describe*
> It would be about as hard to memorize \describe-schemas as it is to
> memorize \dn:
> You'd have to remember that it is "-" and not "_", that it is "describe", not "desc",
> and that it is "schemas", not "schema".

You wouldn't memorize them. You'd discover them with tab completion. Type "\d" and you'll see

\d    \dA   \dc   \dd   \ddp  \des  \deu  \df   \dFd  \dFt  \di   \dL   \dn   \dO   \drds \dS   \dT   \dv   \dy
\da   \db   \dC   \dD   \dE   \det  \dew  \dF   \dFp  \dg   \dl   \dm   \do   \dp   \ds   \dt   \du   \dx

which is more heat than light. Yes, those are all the possibilities, but I, Joe Newguy, want to list schemas, and \ds and \dS look like the good guesses, neither of which is the right answer.

If, with this feature, I typed \desc, I might see:

\describe  \describe-functions  \describe-schemas  \describe-tables ...

So my voyage of discovery would have been completed by typing "\desc-sc", and if we add a note to interactive mode, I'd be shown the hint that \dn is the shortcut for that just above the list of schemas.
Re: [PATCH v1] Add \echo_stderr to psql
> > >\warn ... >\warning ... > These two seem about the best to me, drawing from the perl warn command. I suppose we could go the bash &2 route here, but I don't want to.
Re: range_agg
> One question is how to aggregate ranges that would leave gaps and/or
> overlaps. So in my extension there is a one-param version that forbids
> gaps & overlaps, but I let you permit them by passing extra parameters,
> so the signature is:

Perhaps a third way would be to allow and preserve the gaps.

A while back I wrote an extension called disjoint_date_range for storing sets of dates where it was assumed that most dates would be contiguous. The basic idea was that the core datatype was an array of ranges of dates, and with every modification you'd unnest them all to their discrete elements and use a window function to identify "runs" of dates and recompose them into a canonical set. It was an efficient way of representing "Every day last year except for June 2nd and August 4th, when we closed business for special events."

For arrays of ranges the principle is the same, but it'd get a bit more tricky: you'd have to order by low bound, use window functions to detect adjacency/overlap to identify your runs, and then generate the canonical minimum set of ranges in your array.
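As a sketch of the "runs of dates" recomposition I'm describing (assuming the stored ranges have already been unnested into individual dates; the dates here are just illustrative):

WITH dates(d) AS (
    SELECT unnest(ARRAY['2023-01-01', '2023-01-02', '2023-01-04']::date[])
),
runs AS (
    SELECT d,
           d - (row_number() OVER (ORDER BY d))::integer AS run_id
    FROM dates
)
SELECT daterange(min(d), max(d), '[]') AS canonical_range
FROM runs
GROUP BY run_id
ORDER BY 1;

Adjacent dates land in the same run_id, so this yields [2023-01-01,2023-01-03) and [2023-01-04,2023-01-05), preserving the gap.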
Re: Table as argument in postgres function
> You can pass table name as text or table object id as regclass type.
> inside procedure you should to use dynamic sql - execute statement.
> Generally you cannot to use a variable as table or column name ever.
>
> Dynamic SQL is other mechanism - attention on SQL injection.

On this note, Snowflake has the ability to parameterize object names (see: https://docs.snowflake.net/manuals/sql-reference/identifier-literal.html ), so you can do things like

SELECT col_a, col_b FROM identifier('a_table_name')

or, as a bind variable,

SELECT col_a, col_b FROM identifier($1)

which is their way of avoiding SQL injection attacks in *some* circumstances. Their implementation of it is a bit uneven, but it has proven useful for my work.

I can see where this obviously would prevent the planning of a prepared statement when a table name is a parameter, but the request comes up often enough, and the benefits to avoiding SQL injection attacks are significant enough, that maybe we should try to enable it for one-off queries. I don't necessarily think we need an identifier(string) function; a 'schema.table'::regclass would be more our style.

Is there anything preventing us from having the planner resolve object names from strings?
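For comparison, a sketch of the dynamic-SQL approach the quoted advice refers to, which is what we tell people to do today (the function and table names are illustrative):

CREATE OR REPLACE FUNCTION row_count(p_table regclass)
RETURNS bigint
LANGUAGE plpgsql
AS $$
DECLARE
    result bigint;
BEGIN
    -- %s is safe here because regclass renders as a properly quoted
    -- (and schema-qualified when needed) identifier
    EXECUTE format('SELECT count(*) FROM %s', p_table) INTO result;
    RETURN result;
END;
$$;

SELECT row_count('a_table_name');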
Re: Table as argument in postgres function
> > >> Is there anything preventing us from having the planner resolve object >> names from strings? >> > > The basic problem is fact so when you use PREPARE, EXECUTE protocol, you > has not parameters in planning time. > I agree that it defeats PREPARE as it is currently implemented with PQprepare(), and it would never be meaningful to have a query plan that hasn't finalized which objects are involved. But could it be made to work with PQexecParams(), where the parameter values are already provided? Could we make a version of PQprepare() that takes an extra array of paramValues for object names that must be supplied at prepare-time?
Re: PostgreSQL 12 Beta 1 press release draft
For CTEs, is forcing inlining the example we want to give, rather than the example of forcing materialization given? According to the docs, virtual generated columns aren't yet supported. I'm pretty sure the docs are right. Do we still want to mention it? Otherwise it looks good to me. On Tue, May 21, 2019 at 11:39 PM Jonathan S. Katz wrote: > Hi, > > Attached is a draft of the PG12 Beta 1 press release that is going out > this Thursday. The primary goals of this release announcement are to > introduce new features, enhancements, and changes that are available in > PG12, as well as encourage our users to test and provide feedback to > help ensure the stability of the release. > > Speaking of feedback, please provide me with your feedback on the > technical correctness of this announcement so I can incorporate changes > prior to the release. > > Thanks! > > Jonathan >
Re: Why we allow CHECK constraint contradiction?
On Wed, Oct 10, 2018 at 1:44 AM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Tuesday, October 9, 2018, Imai, Yoshikazu < > imai.yoshik...@jp.fujitsu.com> wrote: >> >> Are there any rows which can satisfy the ct's CHECK constraint? If not, >> why we >> allow creating table when check constraint itself is contradicted? >> > > I'd bet on it being a combination of complexity and insufficient expected > benefit. Time is better spent elsewhere. Mathmatically proving a > contradiction in software is harder than reasoning about it mentally. > I've actually used that as a feature, in postgresql and other databases, where assertions were unavailable, or procedural code was unavailable or against policy. Consider the following: CREATE TABLE wanted_values ( x integer ); INSERT INTO wanted_values VALUES (1), (2), (3); CREATE TABLE found_values ( x integer ); INSERT INTO found_values VALUES (1), (3); CREATE TABLE missing_values ( x integer, CONSTRAINT contradiction CHECK (false) ); INSERT INTO missing_values SELECT x FROM wanted_values EXCEPT SELECT x FROM found_values; gives the error ERROR: new row for relation "missing_values" violates check constraint "contradiction" DETAIL: Failing row contains (2). Which can be handy when you need to fail a transaction because of bad data and don't have branching logic available.
Re: COPY FROM WHEN condition
On Thu, Oct 11, 2018 at 5:04 AM Surafel Temesgen wrote: > > > On Thu, Oct 11, 2018 at 12:00 PM Christoph Moench-Tegeder < > c...@burggraben.net> wrote: > >> You can: >> COPY ( query ) TO 'filename'; >> > it is for COPY FROM > > regards > Surafel > It didn't get far, but you may want to take a look at a rejected patch for copy_srf() (set returning function) https://www.postgresql.org/message-id/CADkLM%3DdoeiWQX4AGtDNG4PsWfSXz3ai7kY%3DPZm3sUhsUeev9Bg%40mail.gmail.com https://commitfest.postgresql.org/12/869/ Having a set returning function gives you the full expressiveness of SQL, at the cost of an extra materialization step.
Re: CopyFrom() has become way too complicated
> > I think the code needs to be split up so that CopyFrom() in the loop > body calls CopyFromOneTuple(), which then also splits out the tuple > routing into its own CopyFromOneTupleRoute() function (that's 200 LOC on > its own...). I suspect it'd also be good to refactor the > partition-change code out into its own function. > +1 I had a hard time with this when doing my copy_srf() misadventure.
Re: date_trunc() in a specific time zone
> > >> A use case that I see quite a lot of is needing to do reports and other > >> calculations on data per day/hour/etc but in the user's time zone. The > >> way to do that is fairly trivial, but it's not obvious what it does so > >> reading queries becomes just a little bit more difficult. > > +1 A client encountered this exact problem last week, and I was surprised that the parameter didn't already exist.
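For reference, the "fairly trivial, but not obvious" expression being discussed is something like this (the column, table, and zone names are illustrative):

SELECT date_trunc('day', created_at AT TIME ZONE 'America/New_York')
           AT TIME ZONE 'America/New_York' AS local_day,
       count(*)
FROM events
GROUP BY 1
ORDER BY 1;

The inner AT TIME ZONE converts the timestamptz to local wall-clock time so date_trunc() cuts at local midnight, and the outer one converts the result back to a timestamptz.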
Re: COPY FROM WHEN condition
> > Are you thinking something like having a COPY command that provides
> > results in such a way that they could be referenced in a FROM clause
> > (perhaps a COPY that defines a cursor…)?
>
> That would also be nice, but what I was thinking of was that some
> highly restricted subset of cases of SQL in general could lend
> themselves to levels of optimization that would be impractical in
> other contexts.

If COPY (or a syntactical equivalent) can return a result set, then the whole of SQL is available to filter and aggregate the results, and we don't have to invent new syntax or endure confusion when COPY-WHEN syntax behaves subtly differently from a similar FROM-WHERE.

Also, what would we be saving computationally? The whole file (or program output) has to be consumed no matter what, and the columns have to be parsed no matter what. At least some of the columns have to be converted to their assigned datatypes enough to know whether or not to filter the row, but we might be able to push that logic inside a copy. I'm thinking of something like this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) AS x( a integer, b numeric, c text, d date, e json)
WHERE x.d >= '2018-11-01'
GROUP BY x.a

In this case, there is the *opportunity* to see the following optimizations:

- columns c and e are never referenced, and need never be turned into a datum (though we might do so just to confirm that they conform to the data type)
- if column d is converted first, we can filter on it and avoid converting columns a,b
- whatever optimizations we can infer from knowing that the two surviving columns will go directly into an aggregate

If we go this route, we can train the planner to notice other optimizations and add those mechanisms at that time, and then existing code gets faster. If we go the COPY-WHEN route, then we have to make up new syntax for every possible future optimization.
Re: COPY FROM WHEN condition
> > > > SELECT x.a, sum(x.b) > > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b > numeric, c text, d date, e json) ) > > Apologies for bike-shedding, but wouldn't the following be a better > fit with the current COPY? > > COPY t(a integer, b numeric, c text, d date, e json) FROM > '/path/to/foo.txt' WITH (FORMAT CSV, INLINE) > +1 Very much a better fit. > >
Re: partitioned tables referenced by FKs
On Fri, Nov 2, 2018 at 7:42 PM Alvaro Herrera wrote:
> Here's a patch to allow partitioned tables to be referenced by foreign
> keys. Current state is WIP, but everything should work; see below for
> the expected exception.
>
> The design is very simple: have one pg_constraint row for each partition
> on each side, each row pointing to the topmost table on the other side;
> triggers appear on each leaf partition (and naturally they don't appear
> on any intermediate partitioned table).

This is an important and much-needed feature!

Based on my extremely naive reading of this code, I have two perhaps equally naive questions:

1. It seems that we will continue to do per-row RI checks for inserts and updates. However, there already exists a bulk check in RI_Initial_Check(). Could we modify this bulk check to do RI checks on a per-statement basis rather than a per-row basis?

2. If #1 is possible, is the overhead of transition tables too great for the single-row case?
Re: partitioned tables referenced by FKs
> > > > 1. it seems that we will continue to to per-row RI checks for inserts and > > updates. However, there already exists a bulk check in > RI_Initial_Check(). > > Could we modify this bulk check to do RI checks on a per-statement basis > > rather than a per-row basis? > > One of the goals when implementing trigger transition tables was to > supplant the current per-row implementation of RI triggers with > per-statement. I haven't done that, but AFAIK it remains possible :-) > > Changing that is definitely not a goal of this patch. > Then I may try to tackle it myself in a separate thread. Without an implementation, I can't say, but if I had to guess, I would > assume so. Or maybe there are clever optimizations for that particular > case. > But in this case there is no actual defined trigger, it's internal code making an SPI call...is there an indicator that tells us whether this change was multi-row or not?
Re: [HACKERS] generated columns
> > > 3. Radical alternative: Collapse everything into one new column. We > > could combine atthasdef and attgenerated and even attidentity into a new > > column. (Only one of the three can be the case.) This would give > > client code a clean break, which may or may not be good. The > > implementation would be uglier than #1 but probably cleaner than #2. We > > could also get 4 bytes back per pg_attribute row. > > > > I'm happy with the current choice #1, but it's worth thinking about. > > #3 looks very appealing in my opinion as those columns have no overlap, > so it would take five possible values: > Could the removed columns live on...as generated-always columns?
Re: Desirability of client-side expressions in psql?
> >> psql> \if :i >= 5
> >
> > I think we're ok with that so long as none of the operators or values has a
> > \ in it.
> >
> > What barriers do you see to re-using the pgbench grammar?
>
> The pgbench expression grammar mimics SQL expression grammar,
> on integers, floats, booleans & NULL.
>
> I'm unsure about some special cases in psql (`shell command`,
> 'text' "identifier"). They can be forbidden on a new commande (\let),
> but what happens on "\if ..." which I am afraid allows them is unclear.
>
> --
> Fabien.

(raising this thread from hibernation now that I have the bandwidth)

It seems like the big barriers to just using pgbench syntax are:

- the ability to indicate that the next thing to follow will be a pgbench expression
- a way to coax pgbench truth-y values into psql truthy values (t/f, y/n, 1/0)

For that, I see a few ways forward:

1. A suffix on \if, \elif, an -exp suffix (or even just -x) to existing commands to indicate that a pgbench expression would follow. This would look something like

\ifx \elifx \setx
\if$ \elif$ \set$

2. A command-line-esque switch or other sigil to indicate that what follows is a pgbench expression with psql vars to interpolate. Example:

\set foo -x 1 + 4
\set foo \expr 1 + 4
\if -x :limit > 10
\if \expr :limit > 10

3. A global toggle to indicate which mode should be used by \if, \elif, and \set. Example:

\pset expressions [on | off]

4. A combination of #2 and #3 with a corresponding switch/sigil to indicate "do not evaluate pgbench-style". This is particularly appealing to me because it would allow code snippets from pgbench to be used without modification, while still allowing the user to mix in old/new style in an existing script.

5. A special variant of `command` where variables are interpolated before being sent to the OS, and allow that on \if, \elif:

\set foo ``expr :y + :z``
\set foo $( expr :y + :z )
\if ``expr :limit > 10``
\if $( expr :limit > 10 )

This also has some appeal because it allows for a great amount of flexibility, but obviously constrains us with OS dependencies. The user might have a hard time sending commands with ')' in them if we go the $( ) route.

6. Option #5, but we add an additional executable (suggested name: pgexpr) to the client libs, which encapsulates the pgbench expression library as a way around OS-dependent code.

7. I believe someone suggested introducing the :{! pgbench-command } or :{{ pgbench-command }} var-mode:

\set foo :{! :y + :z }
\set foo :{{ :y + :z }}
\if :{! :limit > 10 }
\if :{{ :limit > 10 }}

This has some appeal as well, though I prefer the {{...}} syntax because "!" looks like negation, and {{ resembles the [[ x + y ]] syntax in bash.

One nice thing is that most of these options are not mutually exclusive. Thoughts?
Re: csv format for psql
> Or we could kill both issues by hard-wiring the separator as ','.

+1

I've never encountered a situation where a customer wanted a custom delimiter AND quoted strings. So either they wanted pure CSV or a custom TSV.

Could we have another output type called "separated" that uses the existing --fieldsep / --recordsep? Word will get out that csv is faster, but we'd still have the flexibility if somebody really wanted it.
Re: csv format for psql
On Sun, Nov 25, 2018 at 11:23 PM Tom Lane wrote: > Corey Huinker writes: > > Could we have another output type called "separated" that uses the > existing > > --fieldsep / --recordsep? > > Uh, what's the difference from the existing unaligned format? > No footer and I guess we'd want to escape instances of fieldsep and recordsep in the data, so I guess if we had an option to escape instances of fieldsep/recordsep found in the data, unaligned would work fine.
Re: Statistics Import and Export
On Mon, Jan 22, 2024 at 1:09 AM Peter Smith wrote: > 2024-01 Commitfest. > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > there were CFbot test failures last time it was run [2]. Please have a > look and post an updated version if necessary. > > == > [1] https://commitfest.postgresql.org/46/4538/ > [2] > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4538 > > Kind Regards, > Peter Smith. > Attached is v4 of the statistics export/import patch. This version has been refactored to match the design feedback received previously. The system views are gone. These were mostly there to serve as a baseline for what an export query would look like. That role is temporarily reassigned to pg_export_stats.c, but hopefully they will be integrated into pg_dump in the next version. The regression test also contains the version of each query suitable for the current server version. The export format is far closer to the raw format of pg_statistic and pg_statistic_ext_data, respectively. This format involves exporting oid values for types, collations, operators, and attributes - values which are specific to the server they were created on. To make sense of those values, a subset of the columns of pg_type, pg_attribute, pg_collation, and pg_operator are exported as well, which allows pg_import_rel_stats() and pg_import_ext_stats() to reconstitute the data structure as it existed on the old server, and adapt it to the modern structure and local schema objects. pg_import_rel_stats matches up local columns with the exported stats by column name, not attnum. This allows for stats to be imported when columns have been dropped, added, or reordered. pg_import_ext_stats can also handle column reordering, though it currently would get confused by changes in expressions that maintain the same result data type. I'm not yet brave enough to handle importing nodetrees, nor do I think it's wise to try. I think we'd be better off validating that the destination extended stats object is identical in structure, and to fail the import of that one object if it isn't perfect. Export formats go back to v10.
Re: Document efficient self-joins / UPDATE LIMIT techniques.
> I have changed the status of commitfest entry to "Returned with
> Feedback" as Laurenz's comments have not yet been resolved. Please
> handle the comments and update the commitfest entry accordingly.

Here's another attempt, applying Laurenz's feedback:

I removed all changes to the SELECT documentation. That might seem strange given that the heavy lifting happens in the SELECT, but I'm working from the assumption that people's greatest need for a ctid self-join will be because they are trying to find the LIMIT keyword on UPDATE/DELETE and coming up empty.

Because the join syntax is subtly different between UPDATE and DELETE, I've kept code examples in both, but the detailed explanation is in UPDATE under the anchor "update-limit" and the DELETE example links to it.

From 298c812838491408e6910f7535067ea147abe5fc Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Sat, 3 Feb 2024 14:38:50 -0500 Subject: [PATCH v2] Documentation: Show alternatives to LIMIT on UPDATE and DELETE Show examples of how to simulate UPDATE or DELETE with a LIMIT clause. These examples also serve to show the existence and utility of ctid self-joins. --- doc/src/sgml/ref/delete.sgml | 18 + doc/src/sgml/ref/update.sgml | 38 +++- 2 files changed, 55 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/delete.sgml b/doc/src/sgml/ref/delete.sgml index 1b81b4e7d7..21aae30e91 100644 --- a/doc/src/sgml/ref/delete.sgml +++ b/doc/src/sgml/ref/delete.sgml @@ -234,6 +234,24 @@ DELETE FROM films In some cases the join style is easier to write or faster to execute than the sub-select style. + + While there is no LIMIT clause for + DELETE, it is possible to get a similar effect + using the method for UPDATE operations described + in greater detail here. + +WITH delete_batch AS ( + SELECT l.ctid + FROM user_logs AS l + WHERE l.status = 'archived' + ORDER BY l.creation_date + LIMIT 1 + FOR UPDATE +) +DELETE FROM user_logs AS ul +USING delete_batch AS del +WHERE ul.ctid = del.ctid; + diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml index 2ab24b0523..49e0dc29de 100644 --- a/doc/src/sgml/ref/update.sgml +++ b/doc/src/sgml/ref/update.sgml @@ -434,7 +434,6 @@ UPDATE wines SET stock = stock + 24 WHERE winename = 'Chateau Lafite 2003'; COMMIT; - Change the kind column of the table films in the row on which the cursor @@ -442,6 +441,43 @@ COMMIT; UPDATE films SET kind = 'Dramatic' WHERE CURRENT OF c_films; + + Updates affecting many rows can have negative effects on system performance, + such as table bloat, increased replica lag, increased lock contention, + and possible failure of the operation due to a deadlock. In such situations + it can make sense to perform the operation in smaller batches. Performing a + VACUUM operation on the table in between batches can help + reduce table bloat. The + SQL standard does + not define a LIMIT clause for UPDATE + operations, but it is possible to get a similar effect through the use of a + Common Table Expression and an + efficient self-join via the system column + ctid: + +WITH exceeded_max_retries AS ( + SELECT w.ctid + FROM work_item AS w + WHERE w.status = 'active' + AND w.num_retries > 10 + ORDER BY w.retry_timestamp + FOR UPDATE + LIMIT 5000 +) +UPDATE work_item +SET status = 'failed' +FROM exceeded_max_retries AS emr +WHERE work_item.ctid = emr.ctid +If lock contention is a concern, then SKIP LOCKED can +be added to the CTE. However, one final +UPDATE without SKIP LOCKED or +LIMIT will be needed to ensure that no matching rows +were overlooked.
The use of an ORDER BY clause allows +the command to prioritize which rows will be locked and updated. This can +also reduce contention with other update operations if they use the same +ordering. + -- 2.43.0
Re: Add SHELL_EXIT_CODE to psql
> > > > The patch set seem to be in a good shape and pretty stable for quite a > while. > Could you add one more minor improvement, a new line after variables > declaration? > As you wish. Attached. > > After this changes, I think, we make this patch RfC, shall we? > > Fingers crossed. From bb55e0fd1cd2de6fa25b7fc738dd6482d0c008c4 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Wed, 11 Jan 2023 17:27:23 -0500 Subject: [PATCH v9 1/3] Add PG_OS_TARGET environment variable to enable OS-specific changes to regression tests. --- src/test/regress/pg_regress.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 6cd5998b9d..bf0ada4560 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -594,6 +594,17 @@ initialize_environment(void) #endif } + /* + * Regression tests that invoke OS commands must occasionally use + * OS-specific syntax (example: NUL vs /dev/null). This env var allows + * those tests to adjust commands accordingly. + */ +#if defined(WIN32) + setenv("PG_OS_TARGET", "WIN32", 1); +#else + setenv("PG_OS_TARGET", "OTHER", 1); +#endif + /* * Set translation-related settings to English; otherwise psql will * produce translated messages and produce diffs. (XXX If we ever support -- 2.39.2 From fe954f883f4ee65cbfe5dc2c20f012465d031ada Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Thu, 12 Jan 2023 23:04:31 -0500 Subject: [PATCH v9 2/3] Add wait_result_to_exit_code(). --- src/common/wait_error.c | 15 +++ src/include/port.h | 1 + 2 files changed, 16 insertions(+) diff --git a/src/common/wait_error.c b/src/common/wait_error.c index 4a3c3c61af..1eb7ff3fa2 100644 --- a/src/common/wait_error.c +++ b/src/common/wait_error.c @@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found) return true; return false; } + +/* + * Return the POSIX exit code (0 to 255) that corresponds to the argument. + * The argument is a return code returned by wait(2) or waitpid(2), which + * also applies to pclose(3) and system(3). + */ +int +wait_result_to_exit_code(int exit_status) +{ + if (WIFEXITED(exit_status)) + return WEXITSTATUS(exit_status); + if (WIFSIGNALED(exit_status)) + return WTERMSIG(exit_status); + return 0; +} diff --git a/src/include/port.h b/src/include/port.h index e66193bed9..1b119a3c56 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src); extern char *wait_result_to_str(int exitstatus); extern bool wait_result_is_signal(int exit_status, int signum); extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found); +extern int wait_result_to_exit_code(int exit_status); /* * Interfaces that we assume all Unix system have. We retain individual macros -- 2.39.2 From 0f91719c26b6201f4aaa436bd13141ba325e2f85 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Wed, 22 Feb 2023 14:55:34 -0500 Subject: [PATCH v9 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE which report the success or failure of the most recent psql OS-command executed via the \! command or `backticks`. These variables are roughly analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL commands. 
--- doc/src/sgml/ref/psql-ref.sgml | 21 ++ src/bin/psql/command.c | 15 + src/bin/psql/help.c| 4 src/bin/psql/psqlscanslash.l | 34 +- src/test/regress/expected/psql.out | 29 + src/test/regress/sql/psql.sql | 25 ++ 6 files changed, 123 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index dc6528dc11..3d8a7353b3 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -4264,6 +4264,27 @@ bar + + SHELL_ERROR + + + true if the last shell command failed, false if + it succeeded. This applies to shell commands invoked via either \! + or `. See also SHELL_EXIT_CODE. + + + + + + SHELL_EXIT_CODE + + + The exit code return by the last shell command, invoked via either \! or `. + 0 means no error. See also SHELL_ERROR. + + + + SHOW_ALL_RESULTS diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 955397ee9d..40ba2810ea 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -5032,6 +5032,21 @@ do_shell(const char *command) else result = system(command); + if (result == 0) + { + SetVariable(pset.vars, "SHELL_EXIT_CODE", "0"); + SetVariable(pset.vars, "SHELL_ERROR", "false"); + } + else + { + int exit_code = wait_result_to_exit_code(result); + char buf[32]; + + snpri
Re: Add SHELL_EXIT_CODE to psql
On Wed, Feb 22, 2023 at 4:18 PM Thomas Munro wrote: > On Tue, Jan 31, 2023 at 9:23 AM Corey Huinker > wrote: > >> Unfortunately, there is a fail in FreeBSD > https://cirrus-ci.com/task/6466749487382528 > >> > >> Maybe, this patch is need to be rebased? > >> > > > > That failure is in postgres_fdw, which this code doesn't touch. > > > > I'm not able to get to > /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/regression.out > - It's not in either of the browseable zip files and the 2nd zip file isn't > downloadable, so it's hard to see what's going on. > > Yeah, that'd be our current top flapping CI test[1]. These new > "*-running" tests (like the old "make installcheck") are only enabled > in the FreeBSD CI task currently, so that's why the failures only show > up there. I see[2] half a dozen failures of the "fdw_retry_check" > thing in the past ~24 hours. > > [1] > https://www.postgresql.org/message-id/flat/20221209001511.n3vqodxobqgscuil%40awork3.anarazel.de > [2] http://cfbot.cputube.org/highlights/regress.html Thanks for the explanation. I was baffled.
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
On Wed, Feb 22, 2023 at 5:47 PM Andres Freund wrote: > Hi, > > On 2023-02-22 16:34:44 -0500, Tom Lane wrote: > > I wrote: > > > Andres Freund writes: > > >> Maybe it's worth sticking a StaticAssert() for the struct size > > >> somewhere. > > > > > Indeed. I thought we had one already. > > > > >> I'm a bit wary about that being too noisy, there are some machines > with > > >> odd alignment requirements. Perhaps worth restricting the assertion to > > >> x86-64 + armv8 or such? > > > > > I'd put it in first and only reconsider if it shows unfixable problems. > > > > Now that we've got the sizeof(ExprEvalStep) under control, shouldn't > > we do the attached? > > Indeed. Pushed. > > Let's hope there's no rarely used architecture with odd alignment rules. > > Greetings, > > Andres Freund > > > I have a question about this that may affect some of my future work. My not-ready-for-16 work on CAST( ... ON DEFAULT ... ) involved making FuncExpr/IoCoerceExpr/ArrayCoerceExpr have a safe_mode flag, and that necessitates adding a reserror boolean to ExprEvalStep for subsequent steps to test if the error happened. Will that change be throwing some architectures over the 64 byte count?
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
On Thu, Feb 23, 2023 at 2:39 PM Andres Freund wrote: > Hi, > > On 2023-02-23 13:56:56 -0500, Tom Lane wrote: > > Corey Huinker writes: > > > My not-ready-for-16 work on CAST( ... ON DEFAULT ... ) involved making > > > FuncExpr/IoCoerceExpr/ArrayCoerceExpr have a safe_mode flag, and that > > > necessitates adding a reserror boolean to ExprEvalStep for subsequent > steps > > > to test if the error happened. > > > > Why do you want it in ExprEvalStep ... couldn't it be in ExprState? > > I can't see why you'd need more than one at a time during evaluation. > > I don't know exactly what CAST( ... ON DEFAULT ... ) is aiming for - I > guess > it wants to assign a different value when the cast fails? Is the default > expression a constant, or does it need to be runtime evaluated? If a > const, > then the cast steps just could assign the new value. If runtime evaluation > is > needed I'd expect the various coerce steps to jump to the value > implementing > the default expression in case of a failure. > The default expression is itself a cast expression. So CAST (expr1 AS some_type DEFAULT expr2 ON ERROR) would basically be a safe-mode cast of expr1 to some_type, and only upon failure would the non-safe cast of expr2 to some_type be executed. Granted, the most common use case would be for expr2 to be a constant or something that folds into a constant, but the proposed spec allows for it. My implementation involved adding a setting to CoalesceExpr that tested for error flags rather than null flags, hence putting it in ExprEvalStep and ExprState (perhaps mistakenly). Copying and adapting EEOP_JUMP_IF_NOT_NULL lead me to this: EEO_CASE(EEOP_JUMP_IF_NOT_ERROR) { /* Transfer control if current result is non-error */ if (!*op->reserror) { *op->reserror = false; EEO_JUMP(op->d.jump.jumpdone); } /* reset error flag */ *op->reserror = false; EEO_NEXT(); }
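For context, the user-facing behavior described above would look something like this (proposed syntax only, not committed; table and column names are illustrative):

-- the DEFAULT expression (-1) is only cast, non-safely, if the safe cast of raw_value fails
SELECT CAST(t.raw_value AS integer DEFAULT -1 ON ERROR) AS parsed_value
FROM staging_table AS t;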
Re: Disable vacuuming to provide data history
On Thu, Feb 23, 2023 at 6:04 AM wrote:
> Hey,
>
> It depnends on scenario, but there is many use cases that hack data
> change from somebody with admin privileges could be disaster.
> That is the place where data history could come with help. Some basic
> solution would be trigger which writes previous version of record
> to some other table. Trigger however can be disabled or removed (crazy
> solution would be to provide pernament
> triggers and tables which can only be pernamently inserted).
> Then we have also possibility to modify tablespace directly on disk.
>
> But Postgres has ability to not override records when two concurrent
> transaction modify data to provide MVCC.
>
> So what about pernamently not vacuumable tables. Adding some xid log
> tables with hash of record on hash on previous hash.
> I think that would be serious additional advantage for best open source
> relational databes.
>
> Best regards,
> Marek Mosiewicz

What you are describing sounds like the "system versioning" flavor of "temporal" tables. It's a part of the SQL Standard, but PostgreSQL has yet to implement it in core.

Basically, every row has a start_timestamp and end_timestamp field. Updating a row sets the end_timestamp of the old version and inserts a new one with a start_timestamp matching the end_timestamp of the previous row. Once a record has a non-null [1] end_timestamp, it is not possible to update that row via SQL. Regular SQL statements effectively have an "AND end_timestamp IS NULL" filter on them, so the old rows are not visible without specifically invoking temporal features to get point-in-time queries. At the implementation level, this probably means a table with 2 partitions, one for live rows all having null end_timestamps, and one for archived rows which is effectively append-only.

This strategy is common practice for chain-of-custody and auditing purposes, either as a feature of the RDBMS or home-rolled. I have also seen it used for developing forecasting models (ex: "what would this model have told us to do if we had run it a year ago?").

A few years ago, I personally thought about implementing a hash-chain feature, but my research at the time concluded that:

* Few customers were interested in going beyond what was required for regulatory compliance
* Once compliant, any divergence from established procedures, even if it was an unambiguous improvement, only invited re-examination of it and adjacent procedures, and they would avoid that
* They could get the same validation by comparing against a secured backup and out-of-band audit "logs" (most would call them "reports")
* They were of the opinion that if a bad actor got admin access, it was "game over" anyway

The world may have changed since then, but even if there is now interest, I wonder if that isn't better implemented at the OS level rather than the RDBMS level.

[1] some implementations don't use null, they use an end_timestamp set to a date implausibly far in the future ( 3999-12-31 for example ), but the concept remains that once the column is set to a real timestamp, the row isn't visible to update statements.
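For illustration, a minimal home-rolled sketch of the start/end timestamp pattern described above (this is not the SQL standard syntax, which PostgreSQL does not implement; the table, values, and dates are invented):

CREATE TABLE account_versions (
    account_id      integer     NOT NULL,
    balance         numeric     NOT NULL,
    start_timestamp timestamptz NOT NULL DEFAULT now(),
    end_timestamp   timestamptz  -- NULL means the row is the current version
);

-- an "update" closes out the current version and inserts the new one
BEGIN;
UPDATE account_versions
   SET end_timestamp = now()
 WHERE account_id = 42
   AND end_timestamp IS NULL;
INSERT INTO account_versions (account_id, balance)
VALUES (42, 100.00);
COMMIT;

-- regular queries see only live rows
SELECT * FROM account_versions WHERE end_timestamp IS NULL;

-- point-in-time ("as of") query
SELECT *
FROM account_versions
WHERE start_timestamp <= '2024-01-01'
  AND (end_timestamp > '2024-01-01' OR end_timestamp IS NULL);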
Re: verbose mode for pg_input_error_message?
On Thu, Feb 23, 2023 at 4:47 PM Nathan Bossart wrote: > On Thu, Feb 23, 2023 at 11:30:38AM -0800, Nathan Bossart wrote: > > Will post a new version shortly. > > As promised... > > -- > Nathan Bossart > Amazon Web Services: https://aws.amazon.com Looks good to me, passes make check-world. Thanks for slogging through this.
Re: Add SHELL_EXIT_CODE to psql
On Thu, Mar 2, 2023 at 1:35 PM Tom Lane wrote: > Corey Huinker writes: > > [ v9-0003-Add-psql-variables-SHELL_ERROR-and-SHELL_EXIT_COD.patch ] > > I took a brief look through this. I'm on board with the general > concept, but I think you've spent too much time on an ultimately > futile attempt to get a committable test case, and not enough time > I didn't want to give the impression that I was avoiding/dismissing the test case, and at some point I got curious how far we could push pg_regress. > on making the behavior correct/usable. In particular, it bothers > me that there doesn't appear to be any way to distinguish whether > a command failed on nonzero exit code versus a signal. Also, > That's an intriguing distinction, and one I hadn't considered, mostly because I assumed that any command with a set of exit codes rich enough to warrant inspection would have a separate exit code for i-was-terminated. It would certainly be possible to have two independent booleans, SHELL_ERROR and SHELL_SIGNAL. > I see that you're willing to report whatever ferror returns > (something quite unspecified in relevant standards, beyond > zero/nonzero) as an equally-non-distinguishable integer. > I had imagined users of this feature falling into two camps: 1. Users writing scripts for their specific environment, with the ability to know what exit codes are possible and different desired courses of action based on those codes. 2. Users in no specific environment, content with SHELL_ERROR boolean, who are probably just going to test for failure, and if so either \set a default or \echo a message and \quit. > > I'm tempted to suggest that we ought to be returning a string, > along the lines of "OK" or "exit code N" or "signal N" or > "I/O error". This though brings up the question of exactly > what you expect scripts to use the variable for. Maybe such > a definition would be unhelpful, but if so why? Maybe we > should content ourselves with adding the SHELL_ERROR boolean, and > leave the details to whatever we print in the error message? > Having a curated list of responses like "OK" and "exit code N" is also intriguing, but could be hard to unpack given psql's limited string manipulation capabilities. I think the SHELL_ERROR boolean solves most cases, but it was suggested in https://www.postgresql.org/message-id/20221102115801.gg16...@telsasoft.com that users might want more detail than that, even if that detail is unspecified and highly system dependent. > > As for the test case, I'm unimpressed with it because it's so > weak as to be nearly useless; plus it seems like a potential > security issue (what if "nosuchcommand" exists?). It's fine > to have it during development, I guess, but I'm inclined to > leave it out of the eventual commit. > > I have no objection to leaving it out. I think it proves that writing os-specific pg_regress tests is hard, and little else.
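For the record, the simple use case I imagine for that second camp looks something like this (the command, file name, and message are illustrative, using the SHELL_ERROR variable from this patch):

\! test -r /etc/my_app/extra_settings.sql
\if :SHELL_ERROR
    \echo 'extra settings file is missing or unreadable, aborting'
    \quit
\endif
\i /etc/my_app/extra_settings.sql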
Re: Add SHELL_EXIT_CODE to psql
On Fri, Mar 10, 2023 at 3:49 PM Tom Lane wrote: > I'm okay with adopting bash's rule, but then it should actually match > bash --- signal N is reported as 128+N, not -N. > 128+N is implemented. Nonzero pclose errors of any kind will now overwrite an existing exit_code. I didn't write a formal test for the signals, but instead tested it manually: [310444:16:54:32 EDT] corey=# \! sleep 1000 -- in another window, i found the pid of the sleep command and did a kill -9 on it [310444:16:55:15 EDT] corey=# \echo :SHELL_EXIT_CODE 137 137 = 128+9, so that checks out. The OS-specific regression test has been modified. On Windows systems it attempts to execute "CON", and on other systems it attempts to execute "/". These are marginally better than "nosuchcommand" in that they should always exist on their host OS and could never be a legit executable. I have no opinion whether the test should live on past the development phase, but if it does not then the 0001 patch is not needed. From a5dddedcc7bf654b4b65c14ff7b89b9d83bb5c27 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Fri, 17 Mar 2023 06:43:24 -0400 Subject: [PATCH v9 1/3] Add PG_OS_TARGET environment variable to enable OS-specific changes to regression tests. --- src/test/regress/pg_regress.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 7b23cc80dc..333aaab139 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -594,6 +594,17 @@ initialize_environment(void) #endif } + /* + * Regression tests that invoke OS commands must occasionally use + * OS-specific syntax (example: NUL vs /dev/null). This env var allows + * those tests to adjust commands accordingly. + */ +#if defined(WIN32) + setenv("PG_OS_TARGET", "WIN32", 1); +#else + setenv("PG_OS_TARGET", "OTHER", 1); +#endif + /* * Set translation-related settings to English; otherwise psql will * produce translated messages and produce diffs. (XXX If we ever support -- 2.39.2 From fe578661eef7dbe33aec8f4eaa6dca80a1304c9b Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Fri, 17 Mar 2023 06:44:43 -0400 Subject: [PATCH v9 2/3] Add wait_result_to_exit_code(). --- src/common/wait_error.c | 15 +++ src/include/port.h | 1 + 2 files changed, 16 insertions(+) diff --git a/src/common/wait_error.c b/src/common/wait_error.c index 4a3c3c61af..8e1c0e3fc8 100644 --- a/src/common/wait_error.c +++ b/src/common/wait_error.c @@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found) return true; return false; } + +/* + * Return the POSIX exit code (0 to 255) that corresponds to the argument. + * The argument is a return code returned by wait(2) or waitpid(2), which + * also applies to pclose(3) and system(3). + */ +int +wait_result_to_exit_code(int exit_status) +{ + if (WIFEXITED(exit_status)) + return WEXITSTATUS(exit_status); + if (WIFSIGNALED(exit_status)) + return 128 + WTERMSIG(exit_status); + return 0; +} diff --git a/src/include/port.h b/src/include/port.h index e66193bed9..1b119a3c56 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src); extern char *wait_result_to_str(int exitstatus); extern bool wait_result_is_signal(int exit_status, int signum); extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found); +extern int wait_result_to_exit_code(int exit_status); /* * Interfaces that we assume all Unix system have. 
We retain individual macros -- 2.39.2 From 251863ddcbc4eecf4fa15e332724acbd3c652478 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Fri, 17 Mar 2023 16:46:57 -0400 Subject: [PATCH v9 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE which report the success or failure of the most recent psql OS-command executed via the \! command or `backticks`. These variables are roughly analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL commands. --- doc/src/sgml/ref/psql-ref.sgml | 21 ++ src/bin/psql/command.c | 15 + src/bin/psql/help.c| 4 src/bin/psql/psqlscanslash.l | 35 +- src/test/regress/expected/psql.out | 34 + src/test/regress/sql/psql.sql | 30 + 6 files changed, 134 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7b8ae9fac3..e6e307fd18 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -4267,6 +4267,27 @@ bar + + SHELL_ERROR + + + true if the last shell command failed, false if + it succeeded. This applies to shell commands invoked via either \! + or `. See also SHELL_EXIT_CODE. + + + + + +
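As a sketch of how a script could tell a signal death apart from an ordinary failure under the 128+N convention above (the command name is a placeholder, and the \gset round trip is just one way to do the comparison):

\! some_long_running_command
SELECT :SHELL_EXIT_CODE > 128 AS died_on_signal \gset
\if :died_on_signal
    \echo command was terminated by a signal
\endif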
Re: Add SHELL_EXIT_CODE to psql
On Mon, Mar 20, 2023 at 1:01 PM Tom Lane wrote: > Corey Huinker writes: > > 128+N is implemented. > > I think this mostly looks OK, but: > > * I still say there is no basis whatever for believing that the result > of ferror() is an exit code, errno code, or anything else with > significance beyond zero-or-not. Feeding it to wait_result_to_exit_code > as you've done here is not going to do anything but mislead people in > a platform-dependent way. Probably should set exit_code to -1 if > ferror reports trouble. > Agreed. Sorry, I read your comment wrong the last time or I would have already made it so. > > * Why do you have wait_result_to_exit_code defaulting to return 0 > if it doesn't recognize the code as either WIFEXITED or WIFSIGNALED? > That seems pretty misleading; again -1 would be a better idea. > That makes sense as well. Given that WIFSIGNALED is currently defined as the negation of WIFEXITED, whatever default result we have here is basically a this-should-never-happen. That might be something we want to catch with an assert, but I'm fine with a -1 for now. From 1ff2b5ba4b82efca0edf60a9fb1cdf8307471eee Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Fri, 17 Mar 2023 06:43:24 -0400 Subject: [PATCH v11 1/3] Add PG_OS_TARGET environment variable to enable OS-specific changes to regression tests. --- src/test/regress/pg_regress.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 7b23cc80dc..333aaab139 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -594,6 +594,17 @@ initialize_environment(void) #endif } + /* + * Regression tests that invoke OS commands must occasionally use + * OS-specific syntax (example: NUL vs /dev/null). This env var allows + * those tests to adjust commands accordingly. + */ +#if defined(WIN32) + setenv("PG_OS_TARGET", "WIN32", 1); +#else + setenv("PG_OS_TARGET", "OTHER", 1); +#endif + /* * Set translation-related settings to English; otherwise psql will * produce translated messages and produce diffs. (XXX If we ever support -- 2.39.2 From 3d9c43cfec7f60785c1f1fa1aaa3e1b48224ef98 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Mon, 20 Mar 2023 16:05:10 -0400 Subject: [PATCH v11 2/3] Add wait_result_to_exit_code(). --- src/common/wait_error.c | 15 +++ src/include/port.h | 1 + 2 files changed, 16 insertions(+) diff --git a/src/common/wait_error.c b/src/common/wait_error.c index 4a3c3c61af..80b94cae51 100644 --- a/src/common/wait_error.c +++ b/src/common/wait_error.c @@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found) return true; return false; } + +/* + * Return the POSIX exit code (0 to 255) that corresponds to the argument. + * The argument is a return code returned by wait(2) or waitpid(2), which + * also applies to pclose(3) and system(3). 
+ */ +int +wait_result_to_exit_code(int exit_status) +{ + if (WIFEXITED(exit_status)) + return WEXITSTATUS(exit_status); + if (WIFSIGNALED(exit_status)) + return 128 + WTERMSIG(exit_status); + return -1; +} diff --git a/src/include/port.h b/src/include/port.h index e66193bed9..1b119a3c56 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src); extern char *wait_result_to_str(int exitstatus); extern bool wait_result_is_signal(int exit_status, int signum); extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found); +extern int wait_result_to_exit_code(int exit_status); /* * Interfaces that we assume all Unix system have. We retain individual macros -- 2.39.2 From bf65f36fd57977acaac57972425d1717a205cb72 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Mon, 20 Mar 2023 16:05:23 -0400 Subject: [PATCH v11 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE which report the success or failure of the most recent psql OS-command executed via the \! command or `backticks`. These variables are roughly analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL commands. --- doc/src/sgml/ref/psql-ref.sgml | 21 ++ src/bin/psql/command.c | 15 + src/bin/psql/help.c| 4 src/bin/psql/psqlscanslash.l | 33 + src/test/regress/expected/psql.out | 34 ++ src/test/regress/sql/psql.sql | 30 ++ 6 files changed, 133 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7b8ae9fac3..e6e307fd18 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -4267,6 +4267,27 @@ bar + + SHELL_ERROR + +
Re: Add SHELL_EXIT_CODE to psql
> > > As you had it, any nonzero result would prevent backtick substitution. > I'm not really sure how much thought went into the existing behavior, > but I am pretty sure that it's not part of this patch's charter to > change that. > > regards, tom lane > The changes all make sense, thanks!
Re: Add n_tup_newpage_upd to pg_stat table views
> > > * No more dedicated struct to carry around the type of an update. > > We just use two boolean arguments to the pgstats function instead. The > struct didn't seem to be adding much, and it was distracting to track > the information this way within heap_update(). > That's probably a good move, especially if we start tallying updates that use TOAST.
Re: doc: add missing "id" attributes to extension packaging page
> TBH I'm a bit afraid that people will immediately start complaining
> about the failing docs builds after this got applied since it forces
> them to add ids to all varlistenries in a variablelist if they add one,
> which can be perceived as quite a burden (also committers and reviewers
> will have to get used to always watch out for failing docs builds
> because of this).

As a person who had to re-rebase a patch because I discovered that id tags had been added to one of the files I was working on, I can confidently say "don't worry". It wasn't that big of a deal; I wasn't even following this thread at the time, and I immediately figured out what had happened and what was expected of me. So it isn't that much of an inconvenience.

If there is a negative consequence to this change, it would be that it might incentivize patch writers to omit documentation completely at the early stages. But I don't think that's a problem, because people generally see a lack of documentation as a clue that maybe the patch isn't ready to be reviewed, and this change would only reinforce that litmus test.

I had suggested we do something like this a few years back (the ids, that is; the idea that we could check for compliance was beyond my imagination at the time), and I'm glad to see both finally happening. While I can foresee people overlooking the docs build, such oversights won't go long before being caught, and the fix is simple.

Now if we can just get a threaded version of xsltproc to make the builds faster...

P.S. I'm "Team Paperclip" when it comes to the link hint, but let's get the feature in first and worry about the right character later.
Re: Make ON_ERROR_STOP stop on shell script failure
On Fri, Mar 24, 2023 at 11:07 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote:

> On 20.03.23 19:31, Greg Stark wrote:
> > So I took a look at this patch. The conflict is with 2fe3bdbd691
> > committed by Peter Eisentraut which added error checks for pipes.
> > Afaics the behaviour is now for pipe commands returning non-zero to
> > cause an error *always* regardless of the setting of ON_ERROR_STOP.

Commit b0d8f2d983cb25d1035fae1cd7de214dd67809b4 adds SHELL_ERROR, which is set to 'true' whenever a \! or backtick command has a nonzero exit code. So it'll need some rebasing to remove the duplicated work.

So it's now possible to do this:

\set result `some command`
\if :SHELL_ERROR
    -- maybe test SHELL_EXIT_CODE to see what kind of error
    \echo some command failed
    -- nah, just quit
    \q
\endif

> > I'm not entirely sure that's sensible actually. If you write to a pipe
> > that ends in grep and it happens to produce no matching rows you may
> > actually be quite surprised when that causes your script to fail...

I agree that that would be quite surprising, and this feature would be a non-starter for such users. But if we extended the SHELL_ERROR and SHELL_EXIT_CODE patch to handle output pipes (which maybe we should have done in the first place), the handling would look like this:

SELECT ... \g grep Frobozz
\if :SHELL_ERROR
    SELECT :SHELL_EXIT_CODE = 1 AS grep_found_nothing \gset
    \if :grep_found_nothing
        ...not-a-real-error handling...
    \else
        ...regular error handling...
    \endif
\endif

...and that would be the solution for people who wanted to do something more nuanced than ON_ERROR_STOP.
Re: Make ON_ERROR_STOP stop on shell script failure
On Fri, Mar 24, 2023 at 2:16 PM Corey Huinker wrote: > > > On Fri, Mar 24, 2023 at 11:07 AM Peter Eisentraut < > peter.eisentr...@enterprisedb.com> wrote: > >> On 20.03.23 19:31, Greg Stark wrote: >> > So I took a look at this patch. The conflict is with 2fe3bdbd691 >> > committed by Peter Eisentraut which added error checks for pipes. >> > Afaics the behaviour is now for pipe commands returning non-zero to >> > cause an error*always* regardless of the setting of ON_ERROR_STOP. >> > > Commit b0d8f2d983cb25d1035fae1cd7de214dd67809b4 adds SHELL_ERROR as a set > to 'true' whenever a \! or backtick command has a nonzero exit code. So > it'll need some rebasing to remove the duplicated work. > > So it's now possible to do this: > > \set result = `some command` > \if :SHELL_ERROR >-- maybe test SHELL_EXIT_CODE to see what kind of error >\echo some command failed >-- nah, just quit >\q > \endif > > >> > I'm not entirely sure that's sensible actually. If you write to a pipe >> > that ends in grep and it happens to produce no matching rows you may >> > actually be quite surprised when that causes your script to fail... >> > > I agree that that would be quite surprising, and this feature would be a > non-starter for them. But if we extended the SHELL_ERROR and > SHELL_EXIT_CODE patch to handle output pipes (which maybe we should have > done in the first place), the handling would look like this: > > SELECT ... \g grep Frobozz > \if :SHELL_ERROR > SELECT :SHELL_EXIT_CODE = 1 AS grep_found_nothing \gset > \if :grep_found_nothing > ...not-a-real-error handling... > \else > ...regular error handling... > \endif > \endif > > ...and that would be the solution for people who wanted to do something > more nuanced than ON_ERROR_STOP. > > Dangit. Replied to Peter's email thinking he had gone off Greg's culling of the recipients. Re-culled.
Re: Add SHELL_EXIT_CODE to psql
On Tue, Mar 21, 2023 at 3:33 PM Corey Huinker wrote: > >> As you had it, any nonzero result would prevent backtick substitution. >> I'm not really sure how much thought went into the existing behavior, >> but I am pretty sure that it's not part of this patch's charter to >> change that. >> >> regards, tom lane >> > > The changes all make sense, thanks! > This is a follow up patch to apply the committed pattern to the various piped output commands. I spotted this oversight in the https://www.postgresql.org/message-id/CADkLM=dMG6AAWfeKvGnKOzz1O7ZNctFR1BzAA3K7-+XQxff=4...@mail.gmail.com thread and, whether or not that feature gets in, we should probably apply it to output pipes as well. From 9e29989372b7f42fcb7eac6dd15769ebe643abb5 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Fri, 24 Mar 2023 17:20:27 -0400 Subject: [PATCH v1] Have pipes set SHELL_ERROR and SHELL_EXIT_CODE. --- src/bin/psql/common.c | 20 +--- src/bin/psql/copy.c | 4 2 files changed, 21 insertions(+), 3 deletions(-) diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index f907f5d4e8..3be24250ad 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -103,7 +103,13 @@ setQFout(const char *fname) if (pset.queryFout && pset.queryFout != stdout && pset.queryFout != stderr) { if (pset.queryFoutPipe) - pclose(pset.queryFout); + { + char buf[32]; + int exit_code = pclose(pset.queryFout); + snprintf(buf, sizeof(buf), "%d", wait_result_to_exit_code(exit_code)); + SetVariable(pset.vars, "SHELL_EXIT_CODE", buf); + SetVariable(pset.vars, "SHELL_ERROR", (exit_code == 0) ? "false" : "true"); + } else fclose(pset.queryFout); } @@ -1652,7 +1658,11 @@ ExecQueryAndProcessResults(const char *query, { if (gfile_is_pipe) { - pclose(gfile_fout); + char buf[32]; + int exit_code = pclose(gfile_fout); + snprintf(buf, sizeof(buf), "%d", wait_result_to_exit_code(exit_code)); + SetVariable(pset.vars, "SHELL_EXIT_CODE", buf); + SetVariable(pset.vars, "SHELL_ERROR", (exit_code == 0) ? "false" : "true"); restore_sigpipe_trap(); } else @@ -1870,7 +1880,11 @@ ExecQueryUsingCursor(const char *query, double *elapsed_msec) /* close \g argument file/pipe */ if (is_pipe) { - pclose(fout); + char buf[32]; + int exit_code = pclose(fout); + snprintf(buf, sizeof(buf), "%d", wait_result_to_exit_code(exit_code)); + SetVariable(pset.vars, "SHELL_EXIT_CODE", buf); + SetVariable(pset.vars, "SHELL_ERROR", (exit_code == 0) ? "false" : "true"); restore_sigpipe_trap(); } else diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c index 110d2d7269..00e01095fe 100644 --- a/src/bin/psql/copy.c +++ b/src/bin/psql/copy.c @@ -375,6 +375,7 @@ do_copy(const char *args) { if (options->program) { + char buf[32]; int pclose_rc = pclose(copystream); if (pclose_rc != 0) @@ -391,6 +392,9 @@ do_copy(const char *args) } success = false; } + snprintf(buf, sizeof(buf), "%d", wait_result_to_exit_code(pclose_rc)); + SetVariable(pset.vars, "SHELL_EXIT_CODE", buf); + SetVariable(pset.vars, "SHELL_ERROR", (pclose_rc == 0) ? "false" : "true"); restore_sigpipe_trap(); } else -- 2.39.2
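Since the same pclose-handling block now appears at three call sites, a possible consolidation (just a sketch; the helper name SetShellResultVariables is invented here and is not part of the posted patch) would be a small function in src/bin/psql/common.c:

/*
 * Record the result of a shell command (a wait(2)-style status as returned
 * by pclose() or system()) in the SHELL_ERROR and SHELL_EXIT_CODE variables.
 */
static void
SetShellResultVariables(int wait_result)
{
	char		buf[32];

	SetVariable(pset.vars, "SHELL_ERROR",
				(wait_result == 0) ? "false" : "true");
	snprintf(buf, sizeof(buf), "%d", wait_result_to_exit_code(wait_result));
	SetVariable(pset.vars, "SHELL_EXIT_CODE", buf);
}

Each call site above would then reduce to SetShellResultVariables(pclose(fout)); if do_copy() in copy.c wants it as well, the helper would have to be exported rather than static.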
Re: Statistics Import and Export
> > ANALYZE takes out one lock StatisticRelationId per relation, not per > attribute like we do now. If we didn't release the lock after every > attribute, and we only called the function outside of a larger transaction > (as we plan to do with pg_restore) then that is the closest we're going to > get to being consistent with ANALYZE. > v9 attached. This adds pg_dump support. It works in tests against existing databases such as dvdrental, though I was surprised at how few indexes have attribute stats there. Statistics are preserved by default, but this can be disabled with the option --no-statistics. This follows the prevailing option pattern in pg_dump, etc. There are currently several failing TAP tests around pg_dump/pg_restore/pg_upgrade. I'm looking at those, but in the mean time I'm seeking feedback on the progress so far. From bf291e323fc01215264d41b75d579c11bd22d2ec Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Mon, 11 Mar 2024 14:18:39 -0400 Subject: [PATCH v9 1/2] Create pg_set_relation_stats, pg_set_attribute_stats. These functions will be used by pg_dump/restore and pg_upgrade to convey relation and attribute statistics from the source database to the target. This would be done instead of vacuumdb --analyze-in-stages. Both functions take an oid to identify the target relation that will receive the statistics. There is nothing requiring that relation to be the same one as the one exported, though the statistics would have to make sense in the context of the new relation. Typecasts for stavaluesN parameters may fail if the destination column is not of the same type as the source column. The parameters of pg_set_attribute_stats identify the attribute by name rather than by attnum. This is intentional because the column order may be different in situations other than binary upgrades. For example, dropped columns do not carry over in a restore. The statistics imported by pg_set_attribute_stats are imported transactionally like any other operation. However, pg_set_relation_stats does it's update in-place, which is to say non-transactionally. This is in line with what ANALYZE does to avoid table bloat in pg_class. These functions also allows for tweaking of table statistics in-place, allowing the user to inflate rowcounts, skew histograms, etc, to see what those changes will evoke from the query planner. 
--- src/include/catalog/pg_proc.dat | 15 + src/include/statistics/statistics.h | 2 + src/backend/statistics/Makefile | 3 +- src/backend/statistics/meson.build| 1 + src/backend/statistics/statistics.c | 410 ++ .../regress/expected/stats_export_import.out | 211 + src/test/regress/parallel_schedule| 2 +- src/test/regress/sql/stats_export_import.sql | 198 + doc/src/sgml/func.sgml| 95 9 files changed, 935 insertions(+), 2 deletions(-) create mode 100644 src/backend/statistics/statistics.c create mode 100644 src/test/regress/expected/stats_export_import.out create mode 100644 src/test/regress/sql/stats_export_import.sql diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 700f7daf7b..a726451a6f 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12170,4 +12170,19 @@ proargtypes => 'int2', prosrc => 'gist_stratnum_identity' }, +# Statistics Import +{ oid => '8048', + descr => 'set statistics on relation', + proname => 'pg_set_relation_stats', provolatile => 'v', proisstrict => 't', + proparallel => 'u', prorettype => 'bool', + proargtypes => 'oid float4 int4 int4', + proargnames => '{relation,reltuples,relpages,relallvisible}', + prosrc => 'pg_set_relation_stats' }, +{ oid => '8049', + descr => 'set statistics on attribute', + proname => 'pg_set_attribute_stats', provolatile => 'v', proisstrict => 'f', + proparallel => 'u', prorettype => 'bool', + proargtypes => 'oid name bool float4 int4 float4 int2 int2 int2 int2 int2 _float4 _float4 _float4 _float4 _float4 text text text text text', + proargnames => '{relation,attname,stainherit,stanullfrac,stawidth,stadistinct,stakind1,stakind2,stakind3,stakind4,stakind5,stanumbers1,stanumbers2,stanumbers3,stanumbers4,stanumbers5,stavalues1,stavalues2,stavalues3,stavalues4,stavalues5}', + prosrc => 'pg_set_attribute_stats' }, ] diff --git a/src/include/statistics/statistics.h b/src/include/statistics/statistics.h index 7f2bf18716..1dddf96576 100644 --- a/src/include/statistics/statistics.h +++ b/src/include/statistics/statistics.h @@ -127,4 +127,6 @@ extern StatisticExtInfo *choose_
Re: Statistics Import and Export
> > > * pg_set_relation_stats() needs to do an ACL check so you can't set the > stats on someone else's table. I suggest honoring the new MAINTAIN > privilege as well. > Added. > > * If possible, reading from pg_stats (instead of pg_statistic) would be > ideal because pg_stats already does the right checks at read time, so a > non-superuser can export stats, too. > Done. That was sorta how it was originally, so returning to that wasn't too hard. > > * If reading from pg_stats, should you change the signature of > pg_set_relation_stats() to have argument names matching the columns of > pg_stats (e.g. most_common_vals instead of stakind/stavalues)? > Done. > > In other words, make this a slightly higher level: conceptually > exporting/importing pg_stats rather than pg_statistic. This may also > make the SQL export queries simpler. > Eh, about the same. > Also, I'm wondering about error handling. Is some kind of error thrown > by pg_set_relation_stats() going to abort an entire restore? That might > be easy to prevent with pg_restore, because it can just omit the stats, > but harder if it's in a SQL file. > Aside from the oid being invalid, there's not a whole lot that can go wrong in set_relation_stats(). The error checking I did closely mirrors that in analyze.c. Aside from the changes you suggested, as well as the error reporting change you suggested for pg_dump, I also filtered out attempts to dump stats on views. A few TAP tests are still failing and I haven't been able to diagnose why, though the failures in parallel dump seem to be that it tries to import stats on indexes that haven't been created yet, which is odd because I sent the dependency. All those changes are available in the patches attached. From ba411ce31c25193cf05bc4206dcb8f2bf32af0c7 Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Mon, 11 Mar 2024 14:18:39 -0400 Subject: [PATCH v10 1/2] Create pg_set_relation_stats, pg_set_attribute_stats. These functions will be used by pg_dump/restore and pg_upgrade to convey relation and attribute statistics from the source database to the target. This would be done instead of vacuumdb --analyze-in-stages. Both functions take an oid to identify the target relation that will receive the statistics. There is nothing requiring that relation to be the same one as the one exported, though the statistics would have to make sense in the context of the new relation. The parameters of pg_set_attribute_stats intentionaly mirror the columns in the view pg_stats, with the ANYARRAY types casted to TEXT. Those values will be cast to arrays of the basetype of the attribute, and that operation may fail if the attribute type has changed. The statistics imported by pg_set_attribute_stats are imported transactionally like any other operation. However, pg_set_relation_stats does it's update in-place, which is to say non-transactionally. This is in line with what ANALYZE does to avoid table bloat in pg_class. These functions also allows for tweaking of table statistics in-place, allowing the user to inflate rowcounts, skew histograms, etc, to see what those changes will evoke from the query planner. 
--- src/include/catalog/pg_proc.dat | 15 + src/include/statistics/statistics.h | 2 + src/backend/statistics/Makefile | 3 +- src/backend/statistics/meson.build| 1 + src/backend/statistics/statistics.c | 521 ++ .../regress/expected/stats_export_import.out | 211 +++ src/test/regress/parallel_schedule| 2 +- src/test/regress/sql/stats_export_import.sql | 188 +++ doc/src/sgml/func.sgml| 83 +++ 9 files changed, 1024 insertions(+), 2 deletions(-) create mode 100644 src/backend/statistics/statistics.c create mode 100644 src/test/regress/expected/stats_export_import.out create mode 100644 src/test/regress/sql/stats_export_import.sql diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 700f7daf7b..3070d72d7a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12170,4 +12170,19 @@ proargtypes => 'int2', prosrc => 'gist_stratnum_identity' }, +# Statistics Import +{ oid => '8048', + descr => 'set statistics on relation', + proname => 'pg_set_relation_stats', provolatile => 'v', proisstrict => 't', + proparallel => 'u', prorettype => 'bool', + proargtypes => 'oid float4 int4 int4', + proargnames => '{relation,reltuples,relpages,relallvisible}', + prosrc => 'pg_set_relation_stats' }, +{ oid => '8049', + descr => 'set statistics on attribute', + proname => 'pg_set_attribute_stats
Re: Statistics Import and Export
> From testrun/pg_dump/002_pg_dump/log/regress_log_002_pg_dump, search
> for the "not ok" and then look at what it tried to do right before
> that. I see:
>
> pg_dump: error: prepared statement failed: ERROR: syntax error at or
> near "%"
> LINE 1: ..._histogram => %L::real[]) coalesce($2, format('%I.%I',
> a.nsp...

Thanks. Unfamiliar turf for me.

> > All those changes are available in the patches attached.
>
> How about if you provided "get" versions of the functions that return a
> set of rows that match what the "set" versions expect? That would make
> 0001 essentially a complete feature itself.

That's tricky. At the base level, those functions would just be an encapsulation of "SELECT * FROM pg_stats WHERE schemaname = $1 AND tablename = $2", which isn't all that much of a savings. Perhaps we can make the documentation more explicit about the source and nature of the parameters going into the pg_set_ functions.

Per conversation, it would be trivial to add helper functions that replace the parameters after the initial oid with a pg_class or pg_stats rowtype, dissect the values needed, and call the more complex function:

pg_set_relation_stats( oid, pg_class )
pg_set_attribute_stats( oid, pg_stats )

> I think it would also make the changes in pg_dump simpler, and the
> tests in 0001 a lot simpler.

I agree. The tests are currently showing that a fidelity copy can be made from one table to another, but to do so we have to conceal the actual stats values, because those are 1. not deterministic/known and 2. subject to change from version to version. I can add some calls that set arbitrary values, as was done for pg_set_relation_stats().
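If the rowtype-based wrappers floated above existed, an export query could hand pg_stats rows straight to them. A purely hypothetical sketch (neither the wrapper signature nor the table names below exist in the posted patches):

SELECT pg_set_attribute_stats('public.test_target'::regclass, s)
FROM pg_stats AS s
WHERE s.schemaname = 'public'
  AND s.tablename = 'test_source';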
Re: Statistics Import and Export
v11 attached. - TAP tests passing (the big glitch was that indexes that are used in constraints should have their stats dependent on the constraint, not the index, thanks Jeff) - The new range-specific statistics types are now supported. I'm not happy with the typid machinations I do to get them to work, but it is working so far. These are stored out-of-stakind-order (7 before 6), which is odd because all other types seem store stakinds in ascending order. It shouldn't matter, it was just odd. - regression tests now make simpler calls with arbitrary stats to demonstrate the function usage more cleanly - pg_set_*_stats function now have all of their parameters in the same order as the table/view they pull from From 5c63ed5748eb3817d193b64329b57dc590e0196e Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Mon, 11 Mar 2024 14:18:39 -0400 Subject: [PATCH v11 1/2] Create pg_set_relation_stats, pg_set_attribute_stats. These functions will be used by pg_dump/restore and pg_upgrade to convey relation and attribute statistics from the source database to the target. This would be done instead of vacuumdb --analyze-in-stages. Both functions take an oid to identify the target relation that will receive the statistics. There is nothing requiring that relation to be the same one as the one exported, though the statistics would have to make sense in the context of the new relation. The parameters of pg_set_attribute_stats intentionaly mirror the columns in the view pg_stats, with the ANYARRAY types casted to TEXT. Those values will be cast to arrays of the basetype of the attribute, and that operation may fail if the attribute type has changed. The statistics imported by pg_set_attribute_stats are imported transactionally like any other operation. However, pg_set_relation_stats does it's update in-place, which is to say non-transactionally. This is in line with what ANALYZE does to avoid table bloat in pg_class. These functions also allows for tweaking of table statistics in-place, allowing the user to inflate rowcounts, skew histograms, etc, to see what those changes will evoke from the query planner. 
--- src/include/catalog/pg_proc.dat | 15 + src/include/statistics/statistics.h | 2 + src/backend/statistics/Makefile | 3 +- src/backend/statistics/meson.build| 1 + src/backend/statistics/statistics.c | 603 ++ .../regress/expected/stats_export_import.out | 283 src/test/regress/parallel_schedule| 2 +- src/test/regress/sql/stats_export_import.sql | 246 +++ doc/src/sgml/func.sgml| 91 +++ 9 files changed, 1244 insertions(+), 2 deletions(-) create mode 100644 src/backend/statistics/statistics.c create mode 100644 src/test/regress/expected/stats_export_import.out create mode 100644 src/test/regress/sql/stats_export_import.sql diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 177d81a891..f31412d4a6 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12177,4 +12177,19 @@ proargtypes => 'int2', prosrc => 'gist_stratnum_identity' }, +# Statistics Import +{ oid => '8048', + descr => 'set statistics on relation', + proname => 'pg_set_relation_stats', provolatile => 'v', proisstrict => 't', + proparallel => 'u', prorettype => 'bool', + proargtypes => 'oid int4 float4 int4', + proargnames => '{relation,relpages,reltuples,relallvisible}', + prosrc => 'pg_set_relation_stats' }, +{ oid => '8049', + descr => 'set statistics on attribute', + proname => 'pg_set_attribute_stats', provolatile => 'v', proisstrict => 'f', + proparallel => 'u', prorettype => 'bool', + proargtypes => 'oid name bool float4 int4 float4 text _float4 text float4 text _float4 _float4 text float4 text', + proargnames => '{relation,attname,inherited,null_frac,avg_width,n_distinct,most_common_vals,most_common_freqs,histogram_bounds,correlation,most_common_elems,most_common_elem_freqs,elem_count_histogram,range_length_histogram,range_empty_frac,range_bounds_histogram}', + prosrc => 'pg_set_attribute_stats' }, ] diff --git a/src/include/statistics/statistics.h b/src/include/statistics/statistics.h index 7f2bf18716..1dddf96576 100644 --- a/src/include/statistics/statistics.h +++ b/src/include/statistics/statistics.h @@ -127,4 +127,6 @@ extern StatisticExtInfo *choose_best_statistics(List *stats, char requiredkind, int nclauses); extern HeapTuple statext_expressions_load(Oid stxoid, bool inh, int idx); +extern Datum pg_set_relation_stats(PG_FUNCTION_ARGS); +extern Datum pg_set_attribute_stats(PG_FUNCTION_ARGS
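To make the call pattern concrete, here is a hedged example against the v11 signatures above; the table public.test and its integer column id are invented, and the statistic values are arbitrary, in the same spirit as the simplified regression tests:

SELECT pg_catalog.pg_set_relation_stats(
           'public.test'::regclass,
           17,        -- relpages
           400.0,     -- reltuples
           16);       -- relallvisible

SELECT pg_catalog.pg_set_attribute_stats(
           'public.test'::regclass,
           'id',                       -- attname
           false,                      -- inherited
           0.0,                        -- null_frac
           4,                          -- avg_width
           10.0,                       -- n_distinct
           '{1,2,3}',                  -- most_common_vals
           '{0.4,0.3,0.2}'::real[],    -- most_common_freqs
           '{4,5,6,7,8,9,10}',         -- histogram_bounds
           0.5,                        -- correlation
           NULL, NULL, NULL,           -- most_common_elems, most_common_elem_freqs, elem_count_histogram
           NULL, NULL, NULL);          -- range_length_histogram, range_empty_frac, range_bounds_histogram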
Re: Statistics Import and Export
On Thu, Mar 21, 2024 at 2:29 AM Jeff Davis wrote: > On Tue, 2024-03-19 at 05:16 -0400, Corey Huinker wrote: > > v11 attached. > > Thank you. > > Comments on 0001: > > This test: > >+SELECT >+format('SELECT pg_catalog.pg_set_attribute_stats( ' >... > > seems misplaced. It's generating SQL that can be used to restore or > copy the stats -- that seems like the job of pg_dump, and shouldn't be > tested within the plain SQL regression tests. > Fair enough. > > And can the other tests use pg_stats rather than pg_statistic? > They can, but part of what I wanted to show was that the values that aren't directly passed in as parameters (staopN, stacollN) get set to the correct values, and those values aren't guaranteed to match across databases, hence testing them in the regression test rather than in a TAP test. I'd still like to be able to test that. > > The function signature for pg_set_attribute_stats could be more > friendly -- how about there are a few required parameters, and then it > only sets the stats that are provided and the other ones are either > left to the existing value or get some reasonable default? > That would be problematic. 1. We'd have to compare the stats provided against the stats that are already there, make that list in-memory, and then re-order what remains 2. There would be no way to un-set statistics of a given stakind, unless we added an "actually set it null" boolean for each parameter that can be null. 3. I tried that with the JSON formats, it made the code even messier than it already was. Make sure all error paths ReleaseSysCache(). > +1 > > Why are you calling checkCanModifyRelation() twice? > Once for the relation itself, and once for pg_statistic. > I'm confused about when the function should return false and when it > should throw an error. I'm inclined to think the return type should be > void and all failures should be reported as ERROR. > I go back and forth on that. I can see making it void and returning an error for everything that we currently return false for, but if we do that, then a statement with one pg_set_relation_stats, and N pg_set_attribute_stats (which we lump together in one command for the locking benefits and atomic transaction) would fail entirely if one of the set_attributes named a column that we had dropped. It's up for debate whether that's the right behavior or not. replaces[] is initialized to {true}, which means only the first element > is initialized to true. Try following the pattern in AlterDatabase (or > similar) which reads the catalog tuple first, then updates a few fields > selectively, setting the corresponding element of replaces[] along the > way. > +1. > > The test also sets the most_common_freqs in an ascending order, which > is weird. > I pulled most of the hardcoded values from pg_stats itself. The sample set is trivially small, and the values inserted were in-order-ish. So maybe that's why. > Relatedly, I got worried recently about the idea of plain users > updating statistics. In theory, that should be fine, and the planner > should be robust to whatever pg_statistic contains; but in practice > there's some risk of mischief there until everyone understands that the > contents of pg_stats should not be trusted. Fortunately I didn't find > any planner crashes or even errors after a brief test. > Maybe we could have the functions restricted to a role or roles: 1. pg_write_all_stats (can modify stats on ANY table) 2. 
pg_write_own_stats (can modify stats on tables owned by user)

I'm iffy on the need for the first one; I list it first purely to show how I derived the name for the second.

> One thing we can do is some extra validation for consistency, like
> checking that the arrays are properly sorted, check for negative
> numbers in the wrong place, or fractions larger than 1.0, etc.

+1. All suggestions of validation checks welcome.
Re: Statistics Import and Export
> > How about just some defaults then? Many of them have a reasonable > default, like NULL or an empty array. Some are parallel arrays and > either both should be specified or neither (e.g. > most_common_vals+most_common_freqs), but you can check for that. > +1 Default NULL has been implemented for all parameters after n_distinct. > > > > Why are you calling checkCanModifyRelation() twice? > > > > Once for the relation itself, and once for pg_statistic. > > Nobody has the privileges to modify pg_statistic except superuser, > right? I thought the point of a privilege check is that users could > modify statistics for their own tables, or the tables they maintain. > In which case wouldn't the checkCanModify on pg_statistic would be a proxy for is_superuser/has_special_role_we_havent_created_yet. > > > > > I can see making it void and returning an error for everything that > > we currently return false for, but if we do that, then a statement > > with one pg_set_relation_stats, and N pg_set_attribute_stats (which > > we lump together in one command for the locking benefits and atomic > > transaction) would fail entirely if one of the set_attributes named a > > column that we had dropped. It's up for debate whether that's the > > right behavior or not. > > I'd probably make the dropped column a WARNING with a message like > "skipping dropped column whatever". Regardless, have some kind of > explanatory comment. > That's certainly do-able. > > > > > I pulled most of the hardcoded values from pg_stats itself. The > > sample set is trivially small, and the values inserted were in-order- > > ish. So maybe that's why. > > In my simple test, most_common_freqs is descending: > >CREATE TABLE a(i int); >INSERT INTO a VALUES(1); >INSERT INTO a VALUES(2); >INSERT INTO a VALUES(2); >INSERT INTO a VALUES(3); >INSERT INTO a VALUES(3); >INSERT INTO a VALUES(3); >INSERT INTO a VALUES(4); >INSERT INTO a VALUES(4); >INSERT INTO a VALUES(4); >INSERT INTO a VALUES(4); >ANALYZE a; >SELECT most_common_vals, most_common_freqs > FROM pg_stats WHERE tablename='a'; > most_common_vals | most_common_freqs >--+--- > {4,3,2} | {0.4,0.3,0.2} >(1 row) > > Can you show an example where it's not? > Not off hand, no. > > > > > Maybe we could have the functions restricted to a role or roles: > > > > 1. pg_write_all_stats (can modify stats on ANY table) > > 2. pg_write_own_stats (can modify stats on tables owned by user) > > If we go that route, we are giving up on the ability for users to > restore stats on their own tables. Let's just be careful about > validating data to mitigate this risk. > A great many test cases coming in the next patch.
Re: Statistics Import and Export
> But ideally we'd just make it safe to dump and reload stats on your own
> tables, and then not worry about it.

That is my strong preference, yes.

> > Not off hand, no.
>
> To me it seems like inconsistent data to have most_common_freqs in
> anything but descending order, and we should prevent it.

Sorry, I misunderstood; I thought we were talking about the values, not the frequencies. Yes, the frequencies should be monotonically non-increasing (i.e. they can go down or stay flat from N to N+1). I'll add a test case for that.
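A sketch of what such a check might look like inside pg_set_attribute_stats(), once the frequencies have been deconstructed into a float4 array; the function and variable names here are placeholders, not the patch's actual code:

/*
 * Consistency check: most_common_freqs must be monotonically
 * non-increasing, mirroring what ANALYZE itself produces.
 */
static void
check_mcv_freq_order(const float4 *freqs, int nfreqs)
{
	for (int i = 1; i < nfreqs; i++)
	{
		if (freqs[i] > freqs[i - 1])
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("most_common_freqs must be in non-increasing order")));
	}
}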
Re: Statistics Import and Export
> > > > +\gexec > > Why do we need to construct the command and execute? Can we instead > execute the function directly? That would also avoid ECHO magic. > We don't strictly need it, but I've found the set-difference operation to be incredibly useful in diagnosing problems. Additionally, the values are subject to change due to changes in test data, no guarantee that the output of ANALYZE is deterministic, etc. But most of all, because the test cares about the correct copying of values, not the values themselves. > > + > +Database Object Statistics Import Functions > + > + > + > + > +Function > + > + > +Description > + > + > + > > COMMENT: The functions throw many validation errors. Do we want to list > the acceptable/unacceptable input values in the documentation corresponding > to those? I don't expect one line per argument validation. Something like > "these, these and these arguments can not be NULL" or "both arguments in > each of the pairs x and y, a and b, and c and d should be non-NULL or NULL > respectively". > Yes. It should. > Statistics are about data. Whenever pg_dump dumps some filtered data, the > statistics collected for the whole table are uselss. We should avoide > dumping > statistics in such a case. E.g. when only schema is dumped what good is > statistics? Similarly the statistics on a partitioned table may not be > useful > if some its partitions are not dumped. Said that dumping statistics on > foreign > table makes sense since they do not contain data but the statistics still > makes sense. > Good points, but I'm not immediately sure how to enforce those rules. > > >> >> Key areas where I'm seeking feedback: >> >> - What level of errors in a restore will a user tolerate, and what should >> be done to the error messages to indicate that the data itself is fine, but >> a manual operation to update stats on that particular table is now >> warranted? >> - To what degree could pg_restore/pg_upgrade take that recovery action >> automatically? >> - Should the individual attribute/class set function calls be grouped by >> relation, so that they all succeed/fail together, or should they be called >> separately, each able to succeed or fail on their own? >> - Any other concerns about how to best use these new functions. >> >> >> > Whether or not I pass --no-statistics, there is no difference in the dump > output. Am I missing something? > $ pg_dump -d postgres > /tmp/dump_no_arguments.out > $ pg_dump -d postgres --no-statistics > /tmp/dump_no_statistics.out > $ diff /tmp/dump_no_arguments.out /tmp/dump_no_statistics.out > $ > > IIUC, pg_dump includes statistics by default. That means all our pg_dump > related tests will have statistics output by default. That's good since the > functionality will always be tested. 1. We need additional tests to ensure > that the statistics is installed after restore. 2. Some of those tests > compare dumps before and after restore. In case the statistics is changed > because of auto-analyze happening post-restore, these tests will fail. > +1 > I believe, in order to import statistics through IMPORT FOREIGN SCHEMA, > postgresImportForeignSchema() will need to add SELECT commands invoking > pg_set_relation_stats() on each imported table and pg_set_attribute_stats() > on each of its attribute. Am I right? Do we want to make that happen in the > first cut of the feature? How do you expect these functions to be used to > update statistics of foreign tables? > I don't think there's time to get it into this release. 
I think we'd want to extend this functionality to both IMPORT FOREIGN SCHEMA and ANALYZE for foreign tables, in both cases with a server/table option to do regular remote sampling. Either way, they'd issue a remote query very similar to what pg_dump does (hence putting it in fe_utils), with some filters on which columns/tables it believes it can trust. The remote table might itself be a view (in which case the query would turn up nothing), or column data types may change across the wire, and in those cases we'd have to fall back to sampling.
Re: Statistics Import and Export
> > 1) The docs say this: > > >The purpose of this function is to apply statistics values in an >upgrade situation that are "good enough" for system operation until >they are replaced by the next ANALYZE, usually via >autovacuum This function is used by >pg_upgrade and pg_restore to >convey the statistics from the old system version into the new one. > > > I find this a bit confusing, considering the pg_dump/pg_restore changes > are only in 0002, not in this patch. > True, I'll split the docs. > > 2) Also, I'm not sure about this: > > relation, the parameters in this are all > derived from pg_stats, and the values > given are most often extracted from there. > > How do we know where do the values come from "most often"? I mean, where > else would it come from? > The next most likely sources would be 1. stats from another similar table and 2. the imagination of a user testing hypothetical query plans. > > 3) The function pg_set_attribute_stats() is very long - 1000 lines > or so, that's way too many for me to think about. I agree the flow is > pretty simple, but I still wonder if there's a way to maybe split it > into some smaller "meaningful" steps. > I wrestle with that myself. I think there's some pieces that can be factored out. > 4) It took me *ages* to realize the enums at the beginning of some of > the functions are actually indexes of arguments in PG_FUNCTION_ARGS. > That'd surely deserve a comment explaining this. > My apologies, it definitely deserves a comment. > > 5) The comment for param_names in pg_set_attribute_stats says this: > > /* names of columns that cannot be null */ > const char *param_names[] = { ... } > > but isn't that actually incorrect? I think that applies only to a couple > initial arguments, but then other fields (MCV, mcelem stats, ...) can be > NULL, right? > Yes, that is vestigial, I'll remove it. > > 6) There's a couple minor whitespace fixes or comments etc. > > > 0002 > > > 1) I don't understand why we have exportExtStatsSupported(). Seems > pointless - nothing calls it, even if it did we don't know how to export > the stats. > It's not strictly necessary. > > 2) I think this condition in dumpTableSchema() is actually incorrect: > > IMO that's wrong, the statistics should be delayed to the post-data > section. Which probably means there needs to be a separate dumpable > object for statistics on table/index, with a dependency on the object. > Good points. > > 3) I don't like the "STATS IMPORT" description. For extended statistics > we dump the definition as "STATISTICS" so why to shorten it to "STATS" > here? And "IMPORT" seems more like the process of loading data, not the > data itself. So I suggest "STATISTICS DATA". > +1
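For what it's worth, the argument-position enum pattern mentioned in point 4 looks roughly like this; a stripped-down sketch in which the enum and variable names are illustrative, not the actual patch code:

/* Positions of the SQL-level arguments in PG_FUNCTION_ARGS. */
enum relation_stats_argnum
{
	RELATION_ARG = 0,
	RELPAGES_ARG,
	RELTUPLES_ARG,
	RELALLVISIBLE_ARG,
	NUM_RELATION_STATS_ARGS
};

Datum
pg_set_relation_stats(PG_FUNCTION_ARGS)
{
	Oid			reloid = PG_GETARG_OID(RELATION_ARG);
	int32		relpages = PG_GETARG_INT32(RELPAGES_ARG);
	float4		reltuples = PG_GETARG_FLOAT4(RELTUPLES_ARG);
	int32		relallvisible = PG_GETARG_INT32(RELALLVISIBLE_ARG);

	/* ... in-place update of pg_class, as described upthread ... */
	PG_RETURN_BOOL(true);
}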
Re: Statistics Import and Export
> > > There's still a failure in the pg_upgrade TAP test. One seems to be > ordering, so perhaps we need to ORDER BY the attribute number. Others > seem to be missing relstats and I'm not sure why yet. I suggest doing > some manual pg_upgrade tests and comparing the before/after dumps to > see if you can reproduce a smaller version of the problem. > That's fixed in my current working version, as is a tsvector-specific issue. Working on the TAP issue.
Re: Statistics Import and Export
On Fri, Mar 29, 2024 at 7:34 PM Jeff Davis wrote:

> On Fri, 2024-03-29 at 18:02 -0400, Stephen Frost wrote:
> > I’d certainly think “with stats” would be the preferred default of
> > our users.
>
> I'm concerned there could still be paths that lead to an error. For
> pg_restore, or when loading a SQL file, a single error isn't fatal
> (unless -e is specified), but it still could be somewhat scary to see
> errors during a reload.

To that end, I'm going to be modifying the "Optimizer statistics are not transferred by pg_upgrade..." message when stats _were_ transferred, with additional instructions that the user should treat any stats-ish error messages encountered as a reason to manually analyze that table. We should probably say something about extended stats as well.
Re: Statistics Import and Export
> > (I've not read the patch yet, but I assume the switch works like > other pg_dump filters in that you can apply it on the restore > side?) > Correct. It follows the existing --no-something pattern.
Re: Statistics Import and Export
> > I'm getting late into this discussion and I apologize if I've missed this > being discussed before. But. > > Please don't. > > That will make it *really* hard for any form of automation or drivers of > this. The information needs to go somewhere where such tools can easily > consume it, and an informational message during runtime (which is also > likely to be translated in many environments) is the exact opposite of that. > That makes a lot of sense. I'm not sure what form it would take (file, pseudo-table, something else?). Open to suggestions.
Re: Statistics Import and Export
> I didn't have any specific proposal in mind, was just trying to think
> outside the box.

What if we added a separate section, SECTION_STATISTICS, which is run following post-data?
Re: Statistics Import and Export
> That will make it *really* hard for any form of automation or drivers of
> this. The information needs to go somewhere where such tools can easily
> consume it, and an informational message during runtime (which is also
> likely to be translated in many environments) is the exact opposite of that.

Having given this some thought, I'd be inclined to create a view, pg_stats_missing, with the same security barrier as pg_stats, but looking for tables that lack stats on at least one column, or lack stats on an extended statistics object. Table structure would be:

    schemaname  name
    tablename   name
    attnames    text[]
    ext_stats   text[]

The informational message, if it changes at all, could reference this new view as the place to learn about how well the stats import went.

vacuumdb might get a --missing-only option in addition to --analyze-in-stages. For that matter, we could add --analyze-missing options to pg_restore and pg_upgrade to do the mopping up themselves.
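A rough sketch of how the pg_stats_missing view described above might be defined; this is only to show the shape, and it ignores extended statistics and the finer points of the pg_stats privilege checks:

CREATE VIEW pg_stats_missing WITH (security_barrier) AS
SELECT n.nspname AS schemaname,
       c.relname AS tablename,
       array_agg(a.attname ORDER BY a.attnum) AS attnames
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = c.oid
                   AND a.attnum > 0
                   AND NOT a.attisdropped
LEFT JOIN pg_statistic s ON s.starelid = c.oid
                        AND s.staattnum = a.attnum
WHERE c.relkind IN ('r', 'm')
  AND s.starelid IS NULL
GROUP BY n.nspname, c.relname;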
Re: Statistics Import and Export
On Sun, Mar 31, 2024 at 2:41 PM Tom Lane wrote: > Corey Huinker writes: > > Having given this some thought, I'd be inclined to create a view, > > pg_stats_missing, with the same security barrier as pg_stats, but looking > > for tables that lack stats on at least one column, or lack stats on an > > extended statistics object. > > The week before feature freeze is no time to be designing something > like that, unless you've abandoned all hope of getting this into v17. > It was a response to the suggestion that there be some way for tools/automation to read the status of stats. I would view it as a separate patch, as such a view would be useful now for knowing which tables to ANALYZE, regardless of whether this patch goes in or not. > There's a bigger issue though: AFAICS this patch set does nothing > about dumping extended statistics. I surely don't want to hold up > the patch insisting that that has to happen before we can commit the > functionality proposed here. But we cannot rip out pg_upgrade's > support for post-upgrade ANALYZE processing before we do something > about extended statistics, and that means it's premature to be > designing any changes to how that works. So I'd set that whole > topic on the back burner. > So Extended Stats _were_ supported by earlier versions where the medium of communication was JSON. However, there were several problems with adapting that to the current model where we match params to stat types: * Several of the column types do not have functional input functions, so we must construct the data structure internally and pass them to statext_store(). * The output functions for some of those column types have lists of attnums, with negative values representing positional expressions in the stat definition. This information is not translatable to another system without also passing along the attnum/attname mapping of the source system. At least three people told me "nobody uses extended stats" and to just drop that from the initial version. Unhappy with this assessment, I inquired as to whether my employer (AWS) had some internal databases that used extended stats so that I could get good test data, and came up with nothing, nor did anyone know of customers who used the feature. So when the fourth person told me that nobody uses extended stats, and not to let a rarely-used feature get in the way of a feature that would benefit nearly 100% of users, I dropped it. > It's possible that we could drop the analyze-in-stages recommendation, > figuring that this functionality will get people to the > able-to-limp-along level immediately and that all that is needed is a > single mop-up ANALYZE pass. But I think we should leave that till we > have a bit more than zero field experience with this feature. It may be that we leave the recommendation exactly as it is. Perhaps we enhance the error messages in pg_set_*_stats() to indicate what command would remediate the issue.