Re: WIP: System Versioned Temporal Table
> > > The spec does not allow schema changes at all on a system versioned > table, except to change the system versioning itself. > > That would greatly simplify things!
Re: Window Function "Run Conditions"
On Tue, Mar 15, 2022 at 5:24 PM Greg Stark wrote: > This looks like an awesome addition. > > I have one technical questions... > > Is it possible to actually transform the row_number case into a LIMIT > clause or make the planner support for this case equivalent to it (in > which case we can replace the LIMIT clause planning to transform into > a window function)? > > The reason I ask is because the Limit plan node is actually quite a > bit more optimized than the general window function plan node. It > calculates cost estimates based on the limit and can support Top-N > sort nodes. > > But the bigger question is whether this patch is ready for a committer > to look at? Were you able to resolve Andy Fan's bug report? Did you > resolve the two questions in the original email? > +1 to all this It seems like this effort would aid in implementing what some other databases implement via the QUALIFY clause, which is to window functions what HAVING is to aggregate functions. example: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#qualify_clause
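For concreteness, this is the shape of query the run-condition work targets, followed by the QUALIFY spelling of the same thing (table and column names here are made up; QUALIFY is not PostgreSQL syntax):

    SELECT *
      FROM (SELECT t.*, row_number() OVER (ORDER BY created_at) AS rn
              FROM t) AS s
     WHERE rn <= 10;

    -- BigQuery-style equivalent:
    -- SELECT * FROM t QUALIFY row_number() OVER (ORDER BY created_at) <= 10;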
Re: Huge memory consumption on partitioned table with FKs
> > I think this can be solved easily in the patch, by having > ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if > they are equal then use the parent's constaint_id, otherwise use the > child constraint. That way, the cache entry is reused in the common > case where they are identical. > Somewhat of a detour, but in reviewing the patch for Statement-Level RI checks, Andres and I observed that SPI made for a large portion of the RI overhead. Given that we're already looking at these checks, I was wondering if this might be the time to consider implementing these checks by directly scanning the constraint index.
Re: Huge memory consumption on partitioned table with FKs
On Mon, Nov 30, 2020 at 9:48 PM Tom Lane wrote: > Corey Huinker writes: > > Given that we're already looking at these checks, I was wondering if this > > might be the time to consider implementing these checks by directly > > scanning the constraint index. > > Yeah, maybe. Certainly ri_triggers is putting a huge amount of effort > into working around the SPI/parser/planner layer, to not a lot of gain. > > However, it's not clear to me that that line of thought will work well > for the statement-level-trigger approach. In that case you might be > dealing with enough tuples to make a different plan advisable. > > regards, tom lane > Bypassing SPI would probably mean that we stay with row level triggers, and the cached query plan would go away, perhaps replaced by an already-looked-up-this-tuple hash sorta like what the cached nested loops effort is doing. I've been meaning to give this a try when I got some spare time. This may inspire me to try again.
Re: simplifying foreign key/RI checks
> > > It seems to me 1 (RI_PLAN_CHECK_LOOKUPPK) is still alive. (Yeah, I > > know that doesn't mean the usefulness of the macro but the mechanism > > the macro suggests, but it is confusing.) On the other hand, > > RI_PLAN_CHECK_LOOKUPPK_FROM_PK and RI_PLAN_LAST_ON_PK seem to be no > > longer used. (Couldn't we remove them?) > > Yeah, better to just remove those _PK macros and say this module no > longer runs any queries on the PK table. > > How about the attached? > > Sorry for the delay. I see that the changes were made as described. Passes make check and make check-world yet again. I'm marking this Ready For Committer unless someone objects.
A Case For Inlining Immediate Referential Integrity Checks
A Case For Inlining Immediate Referential Integrity Checks
--
The following is an overview of how Postgres currently implements referential integrity, some problems with that architecture, attempted solutions for those problems, and a suggestion of another possible solution.

Notes On Notation and Referential Integrity In General
--
All referential integrity is ultimately of this form:

R(X) => T(Y)

where one referencing table R has a set of columns X that references a set of columns Y, which comprise a unique constraint on a target table T. Note that the Y-set of columns is usually the primary key of T, but does not have to be.

Referential integrity checks fall into two basic categories, Insert and Delete, which can be checked Immediately following the statement, or can be Deferred to the end of the transaction.

The Insert check is fairly straightforward. Any insert to R, or update of R that modifies [1] any column in X, is checked to see if all of the X columns are NOT NULL, and if so, a lookup is done on T to find a row with matching Y values. If none is found, then an error is raised.

The Update check is more complicated, as it covers any UPDATE operation that modifies [1] any column in Y where all of the values of Y are NOT NULL, as well as any DELETE operation where all of the columns of Y are NOT NULL. For any Update check, the table R is scanned for any X tuples matching the Y values of the affected row, and for any matches found, an action is taken. That action can be to fail the operation (NO ACTION, RESTRICT), update the X values to fixed values (SET NULL, SET DEFAULT), or to delete those rows in R (CASCADE).

Current Implementation
--
Currently, these operations are handled via per-row triggers. In the general case, one trigger is placed on R for INSERT operations, one trigger is placed on T for DELETE operations, and an additional trigger is placed on T for UPDATE operations that affect any column of Y.

These Insert trigger functions invoke the C function RI_FKey_check() [2]. The trigger is fired unconditionally, and the trigger itself determines if there is a referential integrity check to be made or not. Ultimately this trigger invokes an SPI query of the form SELECT 1 FROM T WHERE (Y = the new X values) FOR KEY SHARE. This query is generally quite straightforward to the planner, as it becomes either a scan of a single unique index, or a partition search followed by a scan of a single unique index. The operation succeeds if a row is found, and fails if it does not.

The Update trigger functions are implemented with a set of C functions RI_[noaction|restrict|cascade|setnull|setdefault]_[upd|del]() [3]. These functions each generate a variation of SPI query in one of the following forms (using the notation above):

cascade: DELETE FROM R WHERE (X = the old Y values)
restrict/noaction: SELECT 1 FROM R WHERE (X = the old Y values) FOR KEY SHARE
setnull: UPDATE R SET x1 = NULL, ... WHERE (X = the old Y values)
setdefault: UPDATE R SET x1 = DEFAULT, ... WHERE (X = the old Y values)

These triggers are either executed at statement time (Immediate) or are queued for execution as a part of the transaction commit (Deferred).

Problems With The Current Implementation
--
The main problems with this architecture come down to visibility and performance. The foremost problem with this implementation is that these extra queries are not visible to the end user in any way.
It is possible to infer which functions executed by looking at the constraint definitions and comparing pg_stat_user_tables or pg_stat_user_indexes before and after the operation, but in general the time spent in these functions accrues to the DML statement (Immediate) or COMMIT statement (Deferred) without any insight into what took place. This is especially vexing in situations where an operation as simple as "DELETE FROM highly_referenced_table WHERE id = 1" hits the primary key index, but takes several seconds to run.

The performance of Insert operations is generally not too bad, in that the query boils down to an Index Scan for a single row. The problem, however, is that this query must be executed for every row inserted. The query itself is only planned once, and that query plan is cached for later re-use. That removes some of the query overhead, but also incurs a growing cache of plans which can create memory pressure if the number of foreign keys is large, and indeed this has become a problem for at least one customer [4].

Some profiling of the RI check indicated that about half of the time of the insert was spent in SPI functions that could be bypassed if the C function called index_beginscan and index_rescan directly [5]. And these indications bore out when Amit Langote wrote a patch [6] which finds the designated index from the constraint (with some drilling through partitions if need be) and then invokes the scan functions. This method showed about a halving of the time.
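To make the above concrete, here is a minimal sketch in the R/T notation, with the queries the row-level triggers effectively issue shown as comments (these are approximations, not the exact SQL that ri_triggers.c generates):

    CREATE TABLE t (y integer PRIMARY KEY);
    CREATE TABLE r (x integer REFERENCES t (y));

    -- Insert check, run once per row inserted into r:
    --   SELECT 1 FROM t WHERE y = $1 FOR KEY SHARE;
    -- Update/Delete check on t with NO ACTION / RESTRICT, run once per affected row:
    --   SELECT 1 FROM r WHERE x = $1 FOR KEY SHARE;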
Re: Release SPI plans for referential integrity with DISCARD ALL
> > In addition to that, a following case would be solved with this approach: > When many processes are referencing many tables defined foreign key > constraints thoroughly, a huge amount of memory will be consumed > regardless of whether referenced tables are partitioned or not. > > Attached the patch. Any thoughts? > Amit Langote has done some great work at eliminating SPI from INSERT/UPDATE triggers entirely, thus reducing the number of cached plans considerably. I think he was hoping to have a patch formalized this week, if time allowed. It doesn't have DELETE triggers in it, so this patch might still have good value for deletes on a commonly used enumeration table. However, our efforts might be better focused on eliminating SPI from delete triggers as well, an admittedly harder task.
Re: simplifying foreign key/RI checks
> > > In file included from > /home/japin/Codes/postgresql/Debug/../src/include/postgres.h:47:0, > from > /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:24: > /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c: > In function ‘ri_PrimaryKeyExists’: > /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:134:5: > warning: this statement may fall through [-Wimplicit-fallthrough=] > do { \ > ^ > /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:156:2: > note: in expansion of macro ‘ereport_domain’ > ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__) > ^~ > /home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:229:2: > note: in expansion of macro ‘ereport’ > ereport(elevel, errmsg_internal(__VA_ARGS__)) > ^~~ > /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:417:5: > note: in expansion of macro ‘elog’ > elog(ERROR, "unexpected table_tuple_lock status: %u", res); > ^~~~ > /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:419:4: > note: here > default: > ^~~ > > -- > Regrads, > Japin Li. > ChengDu WenWu Information Technology Co.,Ltd. > I also get this warning. Adding a "break;" at line 418 resolves the warning.
Re: simplifying foreign key/RI checks
On Mon, Jan 18, 2021 at 9:45 PM Amit Langote wrote: > On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu wrote: > > > > Hi, > > I was looking at this statement: > > > > insert into f select generate_series(1, 200, 2); > > > > Since certain generated values (the second half) are not in table p, > wouldn't insertion for those values fail ? > > I tried a scaled down version (1000th) of your example: > > > > yugabyte=# insert into f select generate_series(1, 2000, 2); > > ERROR: insert or update on table "f" violates foreign key constraint > "f_a_fkey" > > DETAIL: Key (a)=(1001) is not present in table "p". > > Sorry, a wrong copy-paste by me. Try this: > > create table p (a numeric primary key); > insert into p select generate_series(1, 200); > create table f (a bigint references p); > > -- Unpatched > insert into f select generate_series(1, 200, 2); > INSERT 0 100 > Time: 6527.652 ms (00:06.528) > > update f set a = a + 1; > UPDATE 100 > Time: 8108.310 ms (00:08.108) > > -- Patched: > insert into f select generate_series(1, 200, 2); > INSERT 0 100 > Time: 3312.193 ms (00:03.312) > > update f set a = a + 1; > UPDATE 100 > Time: 4292.807 ms (00:04.293) > > > For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch : > > > > +* Collect partition key values from the unique key. > > > > At the end of the nested loop, should there be an assertion that > partkey->partnatts partition key values have been found ? > > This can be done by using a counter (initialized to 0) which is > incremented when a match is found by the inner loop. > > I've updated the patch to add the Assert. Thanks for taking a look. > > -- > Amit Langote > EDB: http://www.enterprisedb.com v2 patch applies and passes make check and make check-world. Perhaps, given the missing break at line 418 without any tests failing, we could add another regression test if we're into 100% code path coverage. As it is, I think the compiler warning was a sufficient alert. The code is easy to read, and the comments touch on the major points of what complexities arise from partitioned tables. A somewhat pedantic complaint I have brought up off-list is that this patch continues the pattern of the variable and function names making the assumption that the foreign key is referencing the primary key of the referenced table. Foreign key constraints need only reference a unique index, it doesn't have to be the primary key. Granted, that unique index is behaving exactly as a primary key would, so conceptually it is very similar, but keeping with the existing naming (pk_rel, pk_type, etc) can lead a developer to think that it would be just as correct to find the referenced relation and get the primary key index from there, which would not always be correct. This patch correctly grabs the index from the constraint itself, so no problem there. I like that this patch changes the absolute minimum of the code in order to get a very significant performance benefit. It does so in a way that should reduce resource pressure found in other places [1]. This will in turn reduce the performance penalty of "doing the right thing" in terms of defining enforced foreign keys. It seems to get a clearer performance boost than was achieved with previous efforts at statement level triggers. This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far less common, and whose solution will likely be very different. [1] https://www.postgresql.org/message-id/cakkq508z6r5e3jdqhfpwszsajlpho3oyyoamfesaupto5vg...@mail.gmail.com
Re: Release SPI plans for referential integrity with DISCARD ALL
On Wed, Jan 13, 2021 at 1:03 PM Corey Huinker wrote: > In addition to that, a following case would be solved with this approach: >> When many processes are referencing many tables defined foreign key >> constraints thoroughly, a huge amount of memory will be consumed >> regardless of whether referenced tables are partitioned or not. >> >> Attached the patch. Any thoughts? >> > > Amit Langote has done some great work at eliminating SPI from > INSERT/UPDATE triggers entirely, thus reducing the number of cached plans > considerably. > > I think he was hoping to have a patch formalized this week, if time > allowed. > > It doesn't have DELETE triggers in it, so this patch might still have good > value for deletes on a commonly used enumeration table. > > However, our efforts might be better focused on eliminating SPI from > delete triggers as well, an admittedly harder task. > Amit's patch is now available in this thread [1]. I'm curious if it has any effect on your memory pressure issue. [1] https://www.postgresql.org/message-id/ca+hiwqgkfjfydeq5vhph6eqpkjsbfpddy+j-kxyfepqedts...@mail.gmail.com
Re: simplifying foreign key/RI checks
> > I decided not to deviate from pk_ terminology so that the new code > doesn't look too different from the other code in the file. Although, > I guess we can at least call the main function > ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've > changed that. > I agree with leaving the existing terminology where it is for this patch. Changing the function name is probably enough to alert the reader that the things that are called pks may not be precisely that.
Re: simplifying foreign key/RI checks
> > > > I decided not to deviate from pk_ terminology so that the new code > doesn't look too different from the other code in the file. Although, > I guess we can at least call the main function > ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've > changed that. >

I think that's a nice compromise; it makes the reader aware of the concept.

> > I've attached the updated patch. >

Missing "break" added. Check.
Comment updated. Check.
Function renamed. Check.
Attribute mapping matching test (and assertion) added. Check.

Patch applies to an as-of-today master, passes make check and make check-world. No additional regression tests required, as no new functionality is introduced. No docs required, as there is nothing user-facing.

Questions:
1. There's a palloc for mapped_partkey_attnums, which is never freed; is the prevailing memory context short-lived enough that we don't care?
2. Same question for the AttrMap map; should there be a free_attrmap()?
Re: simplifying foreign key/RI checks
On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu wrote: > Hi, > > + for (i = 0; i < riinfo->nkeys; i++) > + { > + Oid eq_opr = eq_oprs[i]; > + Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]); > + RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid); > + > + if (pk_nulls[i] != 'n' && > OidIsValid(entry->cast_func_finfo.fn_oid)) > > It seems the pk_nulls[i] != 'n' check can be lifted ahead of the > assignment to the three local variables. That way, ri_HashCompareOp > wouldn't be called when pk_nulls[i] == 'n'. > > + case TM_Updated: > + if (IsolationUsesXactSnapshot()) > ... > + case TM_Deleted: > + if (IsolationUsesXactSnapshot()) > > It seems the handling for TM_Updated and TM_Deleted is the same. The cases > for these two values can be put next to each other (saving one block of > code). > > Cheers > I'll pause on reviewing v4 until you've addressed the suggestions above.
Re: simplifying foreign key/RI checks
On Sun, Jan 24, 2021 at 6:51 AM Amit Langote wrote: > On Sun, Jan 24, 2021 at 11:26 AM Corey Huinker > wrote: > > On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu wrote: > >> > >> Hi, > > Thanks for the review. > > >> + for (i = 0; i < riinfo->nkeys; i++) > >> + { > >> + Oid eq_opr = eq_oprs[i]; > >> + Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]); > >> + RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, > typeid); > >> + > >> + if (pk_nulls[i] != 'n' && > OidIsValid(entry->cast_func_finfo.fn_oid)) > >> > >> It seems the pk_nulls[i] != 'n' check can be lifted ahead of the > assignment to the three local variables. That way, ri_HashCompareOp > wouldn't be called when pk_nulls[i] == 'n'. > > Good idea, so done. Although, there can't be nulls right now. > > >> + case TM_Updated: > >> + if (IsolationUsesXactSnapshot()) > >> ... > >> + case TM_Deleted: > >> + if (IsolationUsesXactSnapshot()) > >> > >> It seems the handling for TM_Updated and TM_Deleted is the same. The > cases for these two values can be put next to each other (saving one block > of code). > > Ah, yes. The TM_Updated case used to be handled a bit differently in > earlier unposted versions of the patch, though at some point I > concluded that the special handling was unnecessary, but didn't > realize what you just pointed out. Fixed. > > > I'll pause on reviewing v4 until you've addressed the suggestions above. > > Here's v5. > v5 patches apply to master. Suggested If/then optimization is implemented. Suggested case merging is implemented. Passes make check and make check-world yet again. Just to confirm, we *don't* free the RI_CompareHashEntry because it points to an entry in a hash table which is TopMemoryContext aka lifetime of the session, correct? Anybody else want to look this patch over before I mark it Ready For Committer?
Re: parse_slash_copy doesn't support psql variables substitution
On Wed, Feb 10, 2021 at 8:33 AM Pavel Stehule wrote: > Hi > > Is there some reason why \copy statement (parse_slash_copy parser) doesn't > support psql variables? > > Regards > > Pavel >

I remember wondering about that when I was working on the \if stuff. I dug into it a bit, but the problem was out of scope for my goals. The additional options recently added to \g reduced my need for \copy, and it seemed like there was some effort to have input pipes as well, which would eliminate the need for \copy altogether.
Re: Add id's to various elements in protocol.sgml
On Sun, Dec 5, 2021 at 11:15 AM Daniel Gustafsson wrote: > > On 5 Dec 2021, at 16:51, Brar Piening wrote: > > > The attached patch adds id's to various elements in protocol.sgml to > > make them more accesssible via the public html documentation interface. > > Off the cuff without having checked the compiled results yet, it seems > like a good idea. > > — > Daniel Gustafsson > I wanted to do something similar for every function specification in the docs. This may inspire me to take another shot at that.
Re: Getting rid of regression test input/ and output/ files
> > > 0001 adds the \getenv command to psql; now with documentation > and a simple regression test. > +1. Wish I had added this years ago when I had a need for it. > > 0002 tweaks pg_regress to export the needed values as environment > variables, and modifies the test scripts to use those variables. > (For ease of review, this patch modifies the scripts in-place, > and then 0003 will move them.) A few comments on this: > > * I didn't see any value in exporting @testtablespace@ as a separate > variable; we might as well let the test script know how to construct > that path name. > > * I concluded that the right way to handle the concatenation issue > is *not* to rely on SQL literal concatenation, but to use psql's > \set command to concatenate parts of a string. In particular this > +1 to that, much better than the multi-line thing. I have a nitpick about the \getenv FOO FOO lines. It's a new function to everyone, and to anyone who hasn't seen the documentation it won't be immediately obvious which one is the ENV var and which one is the local var. Lowercasing the local var would be a way to reinforce which is which to the reader. It would also be consistent with var naming in the rest of the script. > > 0004 finally removes the no-longer-needed infrastructure in > +1 Deleted code is debugged code.
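A sketch of how that lowercase-local-variable convention would read in a converted test script, assuming the \getenv from the 0001 patch (variable names here are illustrative, not necessarily what the final scripts use):

    \getenv abs_srcdir ABS_SRCDIR
    \set filename :abs_srcdir '/data/copy_test.data'
    COPY copytest FROM :'filename';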
Re: Getting rid of regression test input/ and output/ files
On Sun, Dec 19, 2021 at 5:48 PM Tom Lane wrote: > Corey Huinker writes: > > I have a nitpick about the \getenv FOO FOO lines. > > It's a new function to everyone, and to anyone who hasn't seen the > > documentation it won't be immediately obvious which one is the ENV var > and > > which one is the local var. Lowercasing the local var would be a way to > > reinforce which is which to the reader. It would also be consistent with > > var naming in the rest of the script. > > Reasonable idea. Another thing I was wondering about was whether > to attach PG_ prefixes to the environment variable names, since > those are in a more-or-less global namespace. If we do that, > then a different method for distinguishing the psql variables > is to not prefix them. +1 to that as well. Which brings up a tangential question, is there value in having something that brings in one or more env vars as psql vars directly. I'm thinking something like: \importenv pattern [prefix] (alternate names: \getenv_multi \getenv_pattern, \getenvs, etc) which could be used like \importenv PG* env_ which would import PGFOO and PGBAR as env_PGFOO and env_PGBAR, awkward names but leaving no doubt about where a previously unreferenced variable came from. I don't *think* we need it for this specific case, but since the subject of env vars has come up I thought I'd throw it out there.
Re: Getting rid of regression test input/ and output/ files
On Sun, Dec 19, 2021 at 7:00 PM Tom Lane wrote: > Corey Huinker writes: > > Which brings up a tangential question, is there value in having something > > that brings in one or more env vars as psql vars directly. I'm thinking > > something like: > > > \importenv pattern [prefix] > > Meh ... considering we've gone this long without any getenv capability > in psql at all, that seems pretty premature. > > regards, tom lane > Fair enough. Patches didn't apply with `git apply` but did fine with `patch -p1`, from there it passes make check-world.
Re: simplifying foreign key/RI checks
> > > > I wasn't able to make much inroads into how we might be able to get > rid of the DETACH-related partition descriptor hacks, the item (3), > though I made some progress on items (1) and (2). > > For (1), the attached 0001 patch adds a new isolation suite > fk-snapshot.spec to exercise snapshot behaviors in the cases where we > no longer go through SPI. It helped find some problems with the > snapshot handling in the earlier versions of the patch, mainly with > partitioned PK tables. It also contains a test along the lines of the > example you showed upthread, which shows that the partition descriptor > hack requiring ActiveSnapshot to be set results in wrong results. > Patch includes the buggy output for that test case and marked as such > in a comment above the test. > > In updated 0002, I fixed things such that the snapshot-setting > required by the partition descriptor hack is independent of > snapshot-setting of the RI query such that it no longer causes the PK > index scan to return rows that the RI query mustn't see. That fixes > the visibility bug illustrated in your example, and as mentioned, also > exercised in the new test suite. > > I also moved find_leaf_pk_rel() into execPartition.c with a new name > and a new set of parameters. > > -- > Amit Langote > EDB: http://www.enterprisedb.com Sorry for the delay. This patch no longer applies, it has some conflict with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET
> > Out of curiosity, could you please tell me the concrete situations > where you wanted to delete one of two identical records? > In my case, there is a table with known duplicates, and we would like to delete all but the one with the lowest ctid, and then add a unique index to the table which then allows us to use INSERT ON CONFLICT in a meaningful way. The other need for a DELETE...LIMIT or UPDATE...LIMIT is when you're worried about flooding a replica, so you parcel out the DML into chunks that don't cause unacceptable lag on the replica. Both of these can be accomplished via DELETE FROM foo WHERE ctid IN ( SELECT ... FROM foo ... LIMIT 1000), but until recently such a construct would result in a full table scan, and you'd take the same hit with each subsequent DML. I *believe* that the ctid range scan now can limit those scans, especially if you can order the limited set by ctid, but those techniques are not widely known at this time.
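As an illustration, the chunked form of that workaround might look like this (table and predicate names are hypothetical):

    -- delete in batches of 1000, repeated until it reports DELETE 0,
    -- to avoid flooding the replica with one giant transaction
    DELETE FROM foo
     WHERE ctid IN (SELECT ctid FROM foo WHERE needs_removal LIMIT 1000);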
Re: simplifying foreign key/RI checks
> > > > Good catch, thanks. Patch updated. > > > Applies clean. Passes check-world.
Re: Foreign key joins revisited
> > > > Perhaps this would be more SQL idiomatic: > > FROM permission p >LEFT JOIN ON KEY role IN p AS r >LEFT JOIN team_role AS tr ON KEY role TO r >LEFT JOIN ON KEY team IN tr AS t >LEFT JOIN user_role AS ur ON KEY role TO r >LEFT JOIN ON KEY user IN ur AS u > > My second guess would be: FROM permission p LEFT JOIN role AS r ON [FOREIGN] KEY [(p.col1 [, p.col2 ...])] where the key spec is only required when there are multiple foreign keys in permission pointing to role. But my first guess would be that the standards group won't get around to it.
Re: Foreign key joins revisited
> > > First, there is only one FK in permission pointing to role, and we write a > query leaving out the key columns. > Then, another different FK in permission pointing to role is later added, > and our old query is suddenly in trouble. > > We already have that problem with cases where two tables have a common x column: SELECT x FROM a, b so this would be on-brand for the standards body. And worst case scenario you're just back to the situation you have now.
Re: Suggestion: optionally return default value instead of error on failed cast
> > currently a failed cast throws an error. It would be useful to have a > way to get a default value instead. > I've recently encountered situations where this would have been helpful. Recently I came across some client code: CREATE OR REPLACE FUNCTION is_valid_json(str text) RETURNS boolean LANGUAGE PLPGSQL AS $$ DECLARE j json; BEGIN j := str::json; return true; EXCEPTION WHEN OTHERS THEN return false; END $$; This is a double-bummer. First, the function discards the value so we have to recompute it, and secondly, the exception block prevents the query from being parallelized. > > T-SQL has try_cast [1] > I'd be more in favor of this if we learn that there's no work (current or proposed) in the SQL standard. > Oracle has CAST(... AS .. DEFAULT ... ON CONVERSION ERROR) [2] > If the SQL group has suggested anything, I'd bet it looks a lot like this. > > The DEFAULT ... ON CONVERSION ERROR syntax seems like it could be > implemented in PostgreSQL. Even if only DEFAULT NULL was supported (at > first) that would already help. > > The short syntax could be extended for the DEFAULT NULL case, too: > > SELECT '...'::type -- throws error > SELECT '...':::type -- returns NULL > I think I'm against adding a ::: operator, because too many people are going to type (or omit) the third : by accident, and that would be a really subtle bug. The CAST/TRY_CAST syntax is wordy but it makes it very clear that you expected janky input and have specified a contingency plan. The TypeCast node seems like it wouldn't need too much modification to allow for this. The big lift, from what I can tell, is either creating versions of every $foo_in() function to return NULL instead of raising an error, and then effectively wrapping that inside a coalesce() to process the default. Alternatively, we could add an extra boolean parameter ("nullOnFailure"? "suppressErrors"?) to the existing $foo_in() functions, a boolean to return null instead of raising an error, and the default would be handled in coerce_to_target_type(). Either of those would create a fair amount of work for extensions that add types, but I think the value would be worth it. I do remember when I proposed the "void"/"black hole"/"meh" datatype (all values map to NULL) I ran into a fairly fundamental rule that types must map any not-null input to a not-null output, and this could potentially violate that, but I'm not sure. Does anyone know if the SQL standard has anything to say on this subject?
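To make the quoted proposal concrete, here is roughly how the Oracle-style syntax might read if adopted in PostgreSQL. This is purely hypothetical; only the first statement works (and errors) today:

    SELECT 'banana'::integer;
    -- ERROR:  invalid input syntax for type integer: "banana"

    -- hypothetical, modeled on the Oracle syntax quoted above:
    SELECT CAST('banana' AS integer DEFAULT NULL ON CONVERSION ERROR);  -- would return NULL
    SELECT CAST('banana' AS integer DEFAULT 0 ON CONVERSION ERROR);     -- would return 0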
Re: SQL:2011 application time
On Wed, Jan 5, 2022 at 11:07 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > On 21.11.21 02:51, Paul A Jungwirth wrote: > > Here are updated patches. They are rebased and clean up some of my > > TODOs. > > This patch set looks very interesting. It's also very big, so it's > difficult to see how to get a handle on it. I did a pass through it > to see if there were any obvious architectural or coding style > problems. I also looked at some of your TODO comments to see if I had > something to contribute there. > > I'm confused about how to query tables based on application time > periods. Online, I see examples using AS OF, but in the SQL standard > I only see this used for system time, which we are not doing here. > What is your understanding of that? > Paul has previously supplied me with this document https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf and that formed the basis of a lot of my questions a few months earlier. There was similar work being done for system periods, which are a bit simpler but require a side (history) table to be created. I was picking people's brains about some aspects of system versioning to see if I could help bringing that into this already very large patchset, but haven't yet felt like I had done enough research to post it. It is my hope that we can at least get the syntax for both application and system versioning committed, even if it's just stubbed in with not-yet-supported errors.
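For readers following along, the SQL:2011 constructs under discussion look roughly like this (standard syntax as described in that paper, not PostgreSQL syntax; table and column names are made up):

    -- application-time period declared over two ordinary columns
    CREATE TABLE coverage (
        id         integer,
        plan       text,
        valid_from date,
        valid_to   date,
        PERIOD FOR valid_period (valid_from, valid_to),
        PRIMARY KEY (id, valid_period WITHOUT OVERLAPS)
    );

    -- application-time DML can be scoped to part of a period:
    UPDATE coverage
       FOR PORTION OF valid_period FROM DATE '2020-01-01' TO DATE '2020-07-01'
       SET plan = 'gold'
     WHERE id = 1;

    -- AS OF belongs to system versioning, not application time:
    SELECT * FROM some_system_versioned_table
      FOR SYSTEM_TIME AS OF TIMESTAMP '2021-01-01 00:00:00';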
Re: Suggestion: optionally return default value instead of error on failed cast
On Thu, Jan 6, 2022 at 12:18 PM Andrew Dunstan wrote: > > On 1/4/22 22:17, Corey Huinker wrote: > > > > currently a failed cast throws an error. It would be useful to have a > > way to get a default value instead. > > > > > > I've recently encountered situations where this would have been > > helpful. Recently I came across some client code: > > > > CREATE OR REPLACE FUNCTION is_valid_json(str text) RETURNS boolean > > LANGUAGE PLPGSQL > > AS $$ > > DECLARE > > j json; > > BEGIN > > j := str::json; > > return true; > > EXCEPTION WHEN OTHERS THEN return false; > > END > > $$; > > > > > > This is a double-bummer. First, the function discards the value so we > > have to recompute it, and secondly, the exception block prevents the > > query from being parallelized. > > > This particular case is catered for in the SQL/JSON patches which > several people are currently reviewing: > > That's great to know, but it would still be parsing the json twice, once to learn that it is legit json, and once to get the casted value. Also, I had a similar issue with type numeric, so having generic "x is a type_y" support would essentially do everything that a try_catch()-ish construct would need to do, and be more generic.
Re: SQL:2011 application time
> > > But > the standard says that dropping system versioning should automatically > drop all historical records (2 under Part 2: Foundation, 11.30 system versioning clause>). That actually makes sense though: when you > do DML we automatically update the start/end columns, but we don't > save copies of the previous data (and incidentally the end column will > always be the max value.) This is what I was referring to when I mentioned a side-table. deleting history would be an O(1) operation. Any other misunderstandings are all mine.
Re: Add 64-bit XIDs into PostgreSQL 15
> > I'd be > curious to know where we found the bits for that -- the tuple header > isn't exactly replete with extra bit space. > +1 - and can we somehow shoehorn in a version # into the new format so we never have to look for spare bits again.
Re: test runner (was Re: SQL-standard function body)
> > > This is nice. Are there any parallelism capabilities? > > Yes. It defaults to number-of-cores processes, but obviously can also be > specified explicitly. One very nice part about it is that it'd work > largely the same on windows (which has practically unusable testing > right now). It probably doesn't yet, because I just tried to get it > build and run tests at all, but it shouldn't be a lot of additional > work. > The pidgin developers speak very highly of meson, for the same reasons already mentioned in this thread.
Re: use pg_get_functiondef() in pg_dump
> > I'm sure there's a lot of folks who'd like to see more of the logic we > have in pg_dump for building objects from the catalog available to more > tools through libpgcommon- psql being one of the absolute first > use-cases for exactly that (there's certainly no shortage of people > who've asked how they can get a CREATE TABLE statement for a table by > using psql...). >

I count myself among those folks (see https://www.postgresql.org/message-id/CADkLM%3DfxfsrHASKk_bY_A4uomJ1Te5MfGgD_rwwQfV8wP68ewg%40mail.gmail.com for discussion of doing DESCRIBE and SHOW CREATE-ish functions either on server side or client side).

I'm all for having this as "just" a set of pg_get_*def functions, because they allow for the results to be used in queries. Granted, the shape of the result set may not be stable, but that's the sort of thing we can warn about the same way we have warnings for changes to pg_stat_activity. At that point any DESCRIBE/SHOW CREATE server-side functions essentially become just shells around the pg_get_*def(), with no particular requirement to make those new commands work inside a SELECT.

Would it be totally out of left field to have the functions have an optional "version" parameter, defaulted to null, that would be used to give backwards compatible results if and when we do make a breaking change?
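A trivial example of why the function-shaped interface composes nicely (the schema name here is just for illustration):

    SELECT p.oid::regprocedure AS func,
           pg_get_functiondef(p.oid) AS ddl
      FROM pg_proc p
      JOIN pg_namespace n ON n.oid = p.pronamespace
     WHERE n.nspname = 'public'
       AND p.prokind = 'f';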
Re: automatically generating node support functions
> > build support and made the Perl code more portable, so that the cfbot > doesn't have to be sad. > Was this also the reason for doing the output with print statements rather than using one of the templating libraries? I'm mostly just curious, and certainly don't want it to get in the way of working code.
Re: Extending range type operators to cope with elements
> > > >- @> contains range/element > >- <@ element/range is contained by > I'm not a heavy user or range types, so I can't really judge how useful > that is in practice, but it seems like a fairly natural extension of the > existing operators. I mean, if I understand it correctly, the proposed > behavior is equal to treating the element as a "collapsed range". > I used to give a talk on ranges and partitioning, prior to postgresql getting native partitioning (see: https://wiki.postgresql.org/images/1/1b/Ranges%2C_Partitioning_and_Limitations.pdf ) In that talk, I mention the need for exactly these operators, specifically for an extension called range_partitioning which had some logic for "If I were to insert a row with this value, what partition would it end up in?" which allowed for a subsequent COPY operation directly to that partition. That logic essentially binary-searched a series of ranges, so it needed an "elem <@ range" as well as << and >>. Yes, constructing a collapsed range was the work-around I used in the absence of real functions. That extension has been replaced by real table partitioning and the planner itself now does similar logic for partition pruning. So yes, I've had a need for those operators in the past. What I don't know is whether adding these functions will be worth the catalog clutter.
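A small illustration of the gap being discussed; the last line is the proposed behavior, not something that works today:

    SELECT 150 <@ int4range(100, 200);                      -- element/range containment, exists today
    SELECT int4range(50, 50, '[]') << int4range(100, 200);  -- the collapsed-range workaround
    -- proposed: SELECT 50 << int4range(100, 200);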
Add A Glossary
Attached is a v1 patch to add a Glossary to the appendix of our current documentation.

I believe that our documentation needs a glossary for a few reasons:
1. It's hard to ask for help if you don't know the proper terminology of the problem you're having.
2. Readers who are new to databases may not understand a few of the terms that are used casually both in the documentation and in forums. This helps to make our documentation a bit more useful as a teaching tool.
3. Readers whose primary language is not English may struggle to find the correct search terms, and this glossary may help them grasp that a given term has a usage in databases that is different from common English usage.
3b. If we are not able to find the resources to translate all of the documentation into a given language, translating the glossary page would be a good first step.
4. The glossary would be web-searchable, and draw viewers to the official documentation.
5. Adding link anchors to each term would make them cite-able, useful in forum conversations.

A few notes about this patch:
1. It's obviously incomplete. There are more terms, a lot more, to add.
2. The individual definitions supplied are off-the-cuff, and should be thoroughly reviewed.
3. The definitions as a whole should be reviewed by an actual tech writer (one was initially involved but had to step back due to prior commitments), and the definitions should be normalized in terms of voice, tone, audience, etc.
4. My understanding of DocBook is not strong. The glossary vs glosslist tag issue is a bit confusing to me, and I'm not sure if the glossary tag is even appropriate for our needs.
5. I've made no effort at making each term an anchor, nor have I done any CSS styling at all.
6. I'm not quite sure how to handle terms that have different definitions in different contexts. Should that be two glossdefs following one glossterm, or two separate def/term pairs?

Please review and share your thoughts.

From 343d5c18bf23f98341b510595e3e042e002242cb Mon Sep 17 00:00:00 2001
From: Corey Huinker
Date: Sun, 13 Oct 2019 17:57:36 +
Subject: [PATCH] add glossary page with sample terms and definitions
---
 doc/src/sgml/filelist.sgml  |   1 +
 doc/src/sgml/glossary.sgml  | 618 +
 doc/src/sgml/stylesheet.css |   2 +
 3 files changed, 621 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

[The SGML markup of the attached patch was stripped in archiving; the surviving text of the new glossary.sgml follows.]

Glossary

This is a list of terms and their meanings in the context of PostgreSQL and databases in general.

Aggregate: To combine a collection of data values into a single value, whose value may not be of the same type as the original values. Aggregate functions combine multiple rows that share a common set of values into one row, which means that the only data visible is the values in common, and the aggregates of the non-common data.

Analytic: A function whose computed value can reference values found in nearby rows of the same result set.

Atomic: In reference to the value of an Attribute or Datum: cannot be broken up into smaller components. In reference to an operation: an event that cannot be completed in part; it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic.

Attribute: A typed data element found within a Tuple or Relation or Table.

Cast: A conversion of a Datum from its current data type to another data type.

Check Constraint: A type of constraint defined on a relation which restricts the values allowed in one or more Attributes. The check constraint can make reference to any Attribute in the Relation, but cannot reference other rows of the same relation or other relations.

Column: An Attribute found in a Table or View.

Commit: The act of finalizing a Transaction within the database.

Concurrency: The concept that multiple independent operations can be happening within the database at the same time.

Constraint: A method of restricting the values of data allowed within a Table.

Dat
Add Change Badges to documentation
Attached is a patch to implement change badges in our documentation.

What's a change badge? It's my term for a visual cue in the documentation used to indicate that the nearby section of documentation is new in this version or otherwise changed from the previous version. One example of change badges being used is in the DocBook documentation reference: https://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes

DocBook used graphical badges, which seemed to be a bad idea. Instead, I went with a decorated text span like one finds in gmail labels or Reddit "flair".

The badges are implemented using the "revision" attribute available on all DocBook tags. All one needs to do to indicate a change is to change one tag, and add a revision attribute. For example, adding revision="new in 13" to an element will add a small green text box with the text "new in 13" immediately preceding the rendered elements. I have attached a screenshot (badges_in_acronyms.png) of an example of this from my browser viewing changes to the acronyms.html file. This obviously lacks the polish of viewing the page on a full website, but it does give you an idea of the flexibility of the change badge, and where badge placement is (and is not) a good idea.

What are the benefits of using this? I think the benefits are as follows:
1. It shows a casual user what pieces are new on that page (new functions, new keywords, new command options, etc).
2. It also works in the negative: a user can quickly skim a page, and lacking any badges, feel confident that everything there works in the way that it did in version N-1.
3. It also acts as a subtle cue for the user to click on the previous version to see what it used to look like, confident that there *will* be a difference on the previous version.

How would we implement this?
1. All new documentation pages would get a "NEW" badge in their title.
2. New function definitions, new command options, etc would get a "NEW" badge as visually close to the change as is practical.
3. Changes to existing functions, options, etc. would get a badge of "UPDATED".
4. At major release time, we could do one of two things:
4a. We could keep the NEW/UPDATED badges in the fixed release version, and then completely remove them from the master, because for version N+1, they won't be new anymore. This can be accomplished with an XSL transform looking for any tag with the "revision" attribute.
4b. We could code in the version number at release time, and leave it in place. So in version 14 you could find both "v13" and "v14" badges, and in version 15 you could find badges for 15, 14, and 13. At some point (say v17), we start retiring the v13 badges, and in v18 we'd retire the v14 badges, and so on, to keep the clutter to a minimum.

Back to the patch: I implemented this only for html output, and the colors I chose are very off-brand for postgres, so that will have to change. There's probably some spacing/padding issues I haven't thought of.

Please try it out, make some modifications to existing document pages to see how badges would work in those contexts.
From ded965fc90b223a834ac52d55512587b7a6ea139 Mon Sep 17 00:00:00 2001
From: Corey Huinker
Date: Fri, 18 Oct 2019 06:15:10 -0400
Subject: [PATCH] add document change badges
---
 doc/src/sgml/acronyms.sgml              |  6 +++---
 doc/src/sgml/stylesheet-html-common.xsl | 10 ++
 doc/src/sgml/stylesheet.css             | 10 ++
 3 files changed, 23 insertions(+), 3 deletions(-)

[The SGML and XSL hunks of the attached patch were stripped in archiving; the surviving CSS hunk is shown below.]

diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css
index 1a66c789d5..d0cae2f59f 100644
--- a/doc/src/sgml/stylesheet.css
+++ b/doc/src/sgml/stylesheet.css
@@ -109,3 +109,13 @@ acronym { font-style: inherit; }
     width: 75%;
   }
 }
+
+/* version badge styling */
+span.revision-badge {
+    visibility: visible;
+    color: white;
+    background-color: #00933C;
+    border: 1px solid #00;
+    border-radius: 2px;
+    padding: 1px;
+}
--
2.14.1
Re: \describe*
> > >> - Tab completion for \descibe-verbose. >> I know that \d+ tab completion is also not there, but I think we must >> have tab completion for \descibe-verbose. >> >> postgres=# \describe- >> \describe-extension >> \describe-replication-publication \describe-user-mapping >> \describe-foreign-data-wrapper >> \describe-replication-subscription\describe-view >> \describe-foreign-server \describe-role >> \describe-window-function >> \describe-foreign-table \describe-rule >> ... >> > I just confirmed that there isn't tab completion for the existing S/+ options, so it's hard to justify them for the equivalent verbose suffixes. > (1 row) >> Invalid command \describe. Try \? for help. >> >> >> I think this status is causing the problem. >> >> >> >> + /* >> standard listing of interesting things */ >> + success = >> listTables("tvmsE", NULL, show_verbose, show_system); >> + } >> + status = PSQL_CMD_UNKNOWN; >> >> I'll look into this, thanks! > - Confusion about \desc and \desC >> There is confusion while running the \desc command. I know the problem, >> but the user may confuse by this. >> postgres=# \desC >>List of foreign servers >> Name | Owner | Foreign-data wrapper >> --+---+-- >> (0 rows) >> >> postgres=# \desc >> Invalid command \desc. Try \? for help. >> >> - Auto-completion of commands. >> There is some more confusion in the completion of commands. >> >> This command shows List of aggregates. >> postgres=# \describe-aggregate-function >> List of aggregate functions >> Schema | Name | Result data type | Argument data types | Description >> +--+--+-+- >> (0 rows) >> >> >> >> This command shows a list of relation "\d" >> postgres=# \describe-aggregatE-function >> List of relations >> Schema | Name | Type | Owner >> +--+---+- >> public | foo | table | vagrant >> (1 row) >> >> This command also shows a list of relations "\d". >> postgres=# \describe-aggr >> List of relations >> Schema | Name | Type | Owner >> +--+---+- >> public | foo | table | vagrant >> (1 row) >> >> This command shows error messages. >> postgres=# \descr >> Invalid command \descr. Try \? for help. >> >> I will look into it. > >> I have done a brief code review except for the documentation code. I >> don't like this code >> >> if (cmd_match(cmd,"describe-aggregate-function")) >> >> success = describeAggregates(pattern, show_verbose, show_system); >> else if (cmd_match(cmd, >> "describe-access-method")) >> success = describeAccessMethods(pattern, >> show_verbose); >> else if (cmd_match(cmd, >> "describe-tablespace")) >> success = describeTablespaces(pattern, >> show_verbose); >> else if (cmd_match(cmd, >> "describe-conversion")) >> success = listConversions(pattern, >> show_verbose, show_system); >> else if (cmd_match(cmd, "describe-cast")) >> success = listCasts(pattern, show_verbose >> >> >> This can be achieved with the list/array/hash table, so I have changed >> that code in the attached patch just for a sample if you want I can do that >> for whole code. >> > There's some problems with a hash table. The function signatures vary quite a lot, and some require additional psql_scan_slash_options to be called. The hash option, if implemented, probably should be expanded to all slash commands, at which point maybe it belongs in psqlscanslash.l... >
Re: Re: \describe*
> > > I agree with Andres and Robert. This patch should be pushed to PG13. > > I'll do that on March 8 unless there is a compelling argument not to. > > No objection. I'll continue to work on it, though.
Re: \describe*
On Mon, Mar 4, 2019 at 1:45 PM Corey Huinker wrote: > >>> - Tab completion for \descibe-verbose. >>> I know that \d+ tab completion is also not there, but I think we must >>> have tab completion for \descibe-verbose. >>> >>> postgres=# \describe- >>> \describe-extension >>> \describe-replication-publication \describe-user-mapping >>> \describe-foreign-data-wrapper >>> \describe-replication-subscription\describe-view >>> \describe-foreign-server \describe-role >>> \describe-window-function >>> \describe-foreign-table \describe-rule >>> ... >>> >> > I just confirmed that there isn't tab completion for the existing S/+ > options, so it's hard to justify them for the equivalent verbose suffixes. > We can add completions for describe[-thing-]-verbose, but the auto-completions start to run into combinatoric complexity, and the original short-codes don't do that completion, probably for the same reason. + success = >>> listTables("tvmsE", NULL, show_verbose, show_system); >>> + } >>> + status = >>> PSQL_CMD_UNKNOWN; >>> >>> > I'll look into this, thanks! > This was fixed, good find. > - Confusion about \desc and \desC >>> There is confusion while running the \desc command. I know the problem, >>> but the user may confuse by this. >>> postgres=# \desC >>>List of foreign servers >>> Name | Owner | Foreign-data wrapper >>> --+---+-- >>> (0 rows) >>> >>> postgres=# \desc >>> Invalid command \desc. Try \? for help. >>> >> I've changed the code to first strip out 0-1 instances of "-verbose" and "-system" and the remaining string must be an exact match of a describe command or it's an error. This same system could be applied to the short commands to strip out 'S' and '+' and it might clean up the original code a bit. This command shows a list of relation "\d" >>> postgres=# \describe-aggregatE-function >>> List of relations >>> Schema | Name | Type | Owner >>> +--+---+- >>> public | foo | table | vagrant >>> (1 row) >>> >> Same issue, same fix. >>> I have done a brief code review except for the documentation code. I >>> don't like this code >>> >>> if (cmd_match(cmd,"describe-aggregate-function")) >>> >>> success = describeAggregates(pattern, show_verbose, show_system); >>> else if (cmd_match(cmd, >>> "describe-access-method")) >>> success = >>> describeAccessMethods(pattern, show_verbose); >>> else if (cmd_match(cmd, >>> "describe-tablespace")) >>> success = describeTablespaces(pattern, >>> show_verbose); >>> else if (cmd_match(cmd, >>> "describe-conversion")) >>> success = listConversions(pattern, >>> show_verbose, show_system); >>> else if (cmd_match(cmd, "describe-cast")) >>> success = listCasts(pattern, >>> show_verbose >>> >>> >>> This can be achieved with the list/array/hash table, so I have changed >>> that code in the attached patch just for a sample if you want I can do that >>> for whole code. >>> >> > There's some problems with a hash table. The function signatures vary > quite a lot, and some require additional psql_scan_slash_options to be > called. The hash option, if implemented, probably should be expanded to all > slash commands, at which point maybe it belongs in psqlscanslash.l... > As I suspected, there's a lot of variance in the function signatures of the various listSomething()/describeSomething() commands, and listDbRoleSettings requires a second pattern to be scanned, and as far as I know PsqlScanState isn't known inside describe.h, so building and using a hash table would be a lot of work for uncertain gain. 
The original code just plows through strings in alphabetical order, breaking things up by comparing leading characters, so I largely did the same at the
GIN indexes on an = ANY(array) clause
(moving this over from pgsql-performance)

A client had an issue with a query that had a where clause something like this:

WHERE 123456 = ANY(integer_array_column)

I was surprised that this didn't use the pre-existing GIN index on integer_array_column, whereas recoding as

WHERE ARRAY[123456] <@ integer_array_column

did cause the GIN index to be used. Is this a known/expected behavior? If so, is there any logical reason why we couldn't have the planner pick up on that?

Flo Rance (toura...@gmail.com) was nice enough to show that yes, this is expected behavior. Which leaves the questions:
- is the transformation I made algebraically correct in the general case?
- if so, could we have the planner do that automatically in the presence of a matching GIN index?

This seems like it might tie in with the enforcement of foreign keys within an array thread (which I can't presently find...).
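A minimal reproduction, assuming a table t with an integer[] column, the GIN index shown, and enough rows that an index scan is attractive:

    CREATE TABLE t (id serial PRIMARY KEY, integer_array_column integer[]);
    CREATE INDEX t_arr_gin ON t USING gin (integer_array_column);

    EXPLAIN SELECT * FROM t WHERE 123456 = ANY (integer_array_column);    -- sequential scan
    EXPLAIN SELECT * FROM t WHERE ARRAY[123456] <@ integer_array_column;  -- bitmap scan on t_arr_gin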
Re: Syntax diagrams in user documentation
On Thu, Mar 28, 2019 at 6:49 PM Peter Geoghegan wrote: > On Thu, Mar 28, 2019 at 3:46 PM Jeremy Schneider > wrote: > > We're just gearing up for the Google Season of Docs and I think this > > would be a great task for a doc writer to help with. Any reason to > > expect serious objections to syntax diagram graphics in the docs? > > It might be hard to come to a consensus, because it's one of those > things that everybody can be expected to have an opinion on. It > probably won't be hard to get something committed that's clearly more > informative than what we have right now, though. > > There is a question about how we maintain consistency between the > syntax diagrams in psql if we go this way, though. Not sure what to do > about that. >

This discussion is highly relevant to an upcoming talk I have called "In Aid Of RTFM", and the work I hope would follow from it.

While I personally like these bubble charts because they remind me of my misspent youth at IBM, they have some drawbacks:
1. They look like something out of an IBM manual
2. Images conceal information from visually impaired people
3. They aren't copy-paste-able text
4. They aren't easily comparable
5. They bake in the language of the comments

The merits of #1 can be argued forever, and a more modern bubble chart theme may well be achievable.

#2 is problematic, because things like ADA compliance and the EU Accessibility Requirements frown upon conveying text inside images. The way around this might be to have the alt-text of the image be the original syntax as we have it now.

#3 is important when attempting to relay the relevant excerpt of a very large documentation page via email or slack. Yes, I could right click and copy the URL of the image (in this case https://www.sqlite.org/images/syntax/insert-stmt.gif and others), but that's more work than copy-paste. We could add an HTML anchor to each image (my talk discusses our current lack of reference anchors) and that would mitigate it somewhat. Making the original text available via mouse-over or a "copy text" link might work too.

#3b As long as I live, I will never properly memorize the syntax for RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. I will google this and copy-paste it. I suspect I'm not alone. If it's available only in an image, then I can't copy-paste, and I *will* mistype some part of that at least twice.

#4 isn't such an immediate issue, but one of my points in the talk is that right now there is no way to easily distinguish text on a page that is new in the most recent version of pgsql (i.e. a red-line markup). We could of course flag that an image changed from version X-1 to X, but it would be tougher to convey which parts of the image changed.

#5 is not such a big issue because most of what is in the diagram is pure syntax, but comments will leak in, and those snippets of English will be buried very deep in bubble-markup.
Re: DWIM mode for psql
On Sun, Mar 31, 2019 at 5:04 PM Andres Freund wrote: > On 2019-04-01 09:52:34 +1300, Thomas Munro wrote: > > +/* > > + * This program is free software: you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License as published by > > + * the Free Software Foundation, either version 3 of the License, or > > + * (at your option) any later version. > > Indentation bug. You really need to work a bit more careful. > The patch applies cleanly, and passes "make check", but it generated an executable called "mongodb". Should I have run "make maintainer-clean" first?
Re: \describe*
> It seems this topic is ongoing so I've moved it to the September CF,
> but it's in "Waiting on Author" because we don't have a concrete patch
> that applies (or agreement on what it should do?) right now.

All recent work has been investigating the need(s) we're trying to address. This is as good of a time as any to share my findings (with much collaboration with Dave Fetter) so far.

1. Adding helper commands to psql aids only psql, and a great number of users do not, or can not, use psql. So adding something on the server side would have broader usage and appeal. Furthermore, some access tools (especially browser-based ones) are not good about returning non-tabular results, so helper commands that return result sets would have the broadest usage.

2. Our own interest in server-side commands is all over the map. Some just want the convenience of having them server side, or familiarity with $OTHER_DB. Others want to eliminate the need for some code in pg_dump, JDBC, or elsewhere.

3. There isn't much consensus in the other databases, though all of them do *something*:

SQLServer --- SQLServer has sp_help ( https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-help-transact-sql?view=sql-server-2017 ) which contextually returns one of two different result sets (name, owner, object type) or (column name, type, storage, length, precision, scale, nullable, default, rule, collation).

DB2 -- Has a describe command (source: https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0002019.html) which can be used to describe query output (data type, data type length, column name, column name length). It also has an option to DESCRIBE TABLE foo, which returns a set of (col_name, schema_of_datatype, data_type, data_type_length, data_type_scale, Nulls t/f). It also has DESCRIBE INDEXES FOR TABLE foo, which returns a set of (schema of index, name of index, unique flag, number of columns, index type). It also has DESCRIBE DATA PARTITIONS FOR TABLE which, as you might guess, shows partitions. All of these options have a SHOW DETAIL modifier which adds more columns.

MySQL -- (https://dev.mysql.com/doc/refman/8.0/en/show-columns.html) MySQL has SHOW COLUMNS, which returns a set of (name, type similar to format_type(), null flag, PK or index indicator, default value, notes about auto-increment/autogeneration/implicit triggers), and can be extended to show privileges and comments with the EXTENDED and FULL options. MySQL has a DESCRIBE command, but it is a synonym of EXPLAIN. MySQL also has a raft of commands like SHOW CREATE USER, SHOW CREATE VIEW, SHOW CREATE TRIGGER, SHOW CREATE TABLE, etc. (ex: https://dev.mysql.com/doc/refman/8.0/en/show-create-user.html). These commands all return a result set of exactly one column, each row representing one SQL statement, essentially doing a single-object schema-only pg_dump.

Oracle - https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12019.htm SQL*Plus has a describe command that works on tables and views and composite types (tabular set of: name, null, type), procedures (tabular set of: arg name, type, in/out), and packages (a series of sets, one per type and procedure). SQLcl has the INFO statement, which is roughly analogous to psql's \d in that it is a mix of tabular and non-tabular information. Oracle itself has dbms_metadata.get_ddl(), which seems analogous to MySQL's SHOW CREATE commands.
Snowflake -- Snowflake has DESCRIBE TABLE https://docs.snowflake.net/manuals/sql-reference/sql/desc-table.html and DESCRIBE VIEW https://docs.snowflake.net/manuals/sql-reference/sql/desc-view.html which return a set of (name, type, column type, null flag, default, primary key, unique key, check, expression, comment). It also has an option for describing "stage" tables, which are s3 buckets with a file format associated; the closest postgresql analog would be a file_fdw foreign table, and there is a separate result set format for that. Snowflake has no concept of indexes (it understands that there are things called unique keys, and it remembers that you said you wanted one, but does nothing to enforce it), so no command for that. These result sets are not composable in a query; however, they are stored in the RESULT_SCAN cache, which means that you can run a describe and then immediately fetch the results of that command as if it was a table. Snowflake also has a get_ddl() function https://docs.snowflake.net/manuals/sql-reference/functions/get_ddl.html which returns a one-column result set of statements to re-create the given object.

From all this, I have so far concluded:

1. There is real demand to be able to easily see the basic structure of tables, views, and indexes in a way that strikes a balance between detail and clutter.

2. There is some acknowledgement that this data would be useful if it was further filtered through SQL, though only one vendor h
Re: Referential Integrity Checks with Statement-level Triggers
> > The people who expressed opinions on nuking triggers from orbit (it's
> > the only way to be sure) have yet to offer up any guidance on how to
> > proceed from here, and I suspect it's because they're all very busy getting
> > things ready for v12. I definitely have an interest in working on this for
> > 13, but I don't feel good about striking out on my own without their input.
>
> Very interesting thread, but the current patch has been through two
> CFs without comments or new patches, so I'm going to mark it "Returned
> with feedback". I hope all this discussion will trigger more research
> in this space.

I've noticed that the zedstore efforts ran into the same problem that refactoring triggers has: we cannot determine which columns in a table will be affected by a trigger, so we have to assume that all of them will be. This causes a lot of unnecessary overhead with triggers.

If we had a compilation step for triggers (which ultimately means a compilation step for procedures) which kept a dependency tree of which tables/columns were touched, then we would have that insight. It's true that one dynamic statement or SELECT * would force us right back to keep-everything, but if procedures which did not do such things had performance benefits, that would be an incentive to code them more fastidiously.
CREATE ROUTINE MAPPING
A few months ago, I was researching ways of formalizing calling functions on one postgres instance from another. RPC, basically. In doing so, I stumbled across an obscure part of the SQL Standard called ROUTINE MAPPING, which is exactly what I'm looking for.

The syntax specified is, roughly:

CREATE ROUTINE MAPPING local_routine_name
FOR remote_routine_spec
SERVER my_server [ OPTIONS( ... ) ]

Which isn't too different from CREATE USER MAPPING.

The idea here is that if I had a local query:

SELECT t.x, remote_func1(), remote_func2(t.y)
FROM remote_table t
WHERE t.active = true;

that would become this query on the remote side:

SELECT t.x, local_func1(), local_func2(t.y)
FROM local_table t
WHERE t.active = true;

That was probably the main intention of this feature, but I see a different possibility there. Consider the cases:

SELECT remote_func(1,'a');

and

SELECT * FROM remote_srf(10, true);

Now we could have written remote_func() and remote_srf() in plpythonu, and they could access whatever remote data we wanted to see, but that exposes our local server to the untrusted pl/python module as well as python process overhead. We could create a specialized foreign data wrapper that requires a WHERE clause to include all the required parameters as predicates, essentially making every function a table, but that's awkward and unclear to an end user.

Having the ability to import functions from other servers allows us to write foreign servers that expose functions to the local database, and those foreign servers handle the bloat and risks associated with accessing that remote data. Moreover, it would allow hosted environments (AWS, etc.) that restrict the extensions that can be added to the database to still connect to those foreign data sources.

I'm hoping to submit a patch for this someday, but it touches on several areas of the codebase where I have no familiarity, so I've put this forth to spark interest in the feature, to see if any similar work is underway, or if anyone can offer guidance. Thanks in advance.
Re: CREATE ROUTINE MAPPING
> > PostgreSQL allows function overloading, which means that there can be > multiple functions with same name differing in argument types. So, the > syntax has to include the input parameters or their types at least. > "local_routine_name" and "remote_routine_spec" were my own paraphrasings of what the spec implies. I'm nearly certain that the local routine name, which the spec says is just an identifier, cannot have a parameter spec on it, which leaves only one other place to define it, remote_routine_spec, which wasn't defined at all. I _suppose_ parameter definitions could be pushed into options, but that'd be ugly.
Re: CREATE ROUTINE MAPPING
> It goes on from there, but I think there's a reasonable interpretation
> of this which allows us to use the same syntax as CREATE
> (FUNCTION|PROCEDURE), apart from the body, e.g.:
>
> CREATE ROUTINE MAPPING local_routine_name
>     FOR (FUNCTION | PROCEDURE) remote_routine_name ( [ [ argmode ] [ argname ] argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
>     [ RETURNS rettype
>     | RETURNS TABLE ( column_name column_type [, ...] ) ]
>     SERVER foreign_server_name
>     [ (option [, ...]) ]
>
> Does that seem like too broad an interpretation?

That's really interesting. I didn't think to look in the definition of CREATE FUNCTION to see if a SERVER option popped in there, but it seems like a more accessible way to introduce the notion of remote functions, because I talked to a few developers about this before posting to the list, and only one had ever heard of ROUTINE MAPPING and had no clear recollection of it. An option on CREATE FUNCTION is going to get noticed (and used!) a lot sooner. Having said that, I think syntactically we have to implement CREATE ROUTINE MAPPING, even if it is just translated to a CREATE FUNCTION call.

In either case, I suspected that pg_proc would need a nullable srvid column pointing to pg_foreign_server, and possibly a new row in pg_language for 'external'. I had entertained having a pg_routine_mappings table like pg_user_mappings, and we still could, if the proc's language of 'external' clued the planner to look for the mapping. I can see arguments for either approach.

Before anyone asks, I looked for, and did not find, any suggestion of IMPORT FOREIGN ROUTINE a la IMPORT FOREIGN SCHEMA, so as of yet there wouldn't be any way to grab all the functions that a foreign server is offering up.
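To make the quoted grammar concrete, here is a purely hypothetical example of what a mapping declared that way might look like (the server, function, and option names are invented for illustration; none of this syntax exists today):

CREATE ROUTINE MAPPING local_order_total
    FOR FUNCTION order_total(order_id integer) RETURNS numeric
    SERVER my_remote_server
    (schema 'public');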
Re: CREATE ROUTINE MAPPING
> > > CREATE ROUTINE MAPPING local_routine_name
> > >     FOR (FUNCTION | PROCEDURE) remote_routine_name ( [ [ argmode ] [ argname ] argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
> > >     [ RETURNS rettype
> > >     | RETURNS TABLE ( column_name column_type [, ...] ) ]
> > >     SERVER foreign_server_name
> > >     [ (option [, ...]) ]
> > >
> > > Does that seem like too broad an interpretation?
>
> > I had entertained having a pg_routine_mappings table like
> > pg_user_mappings, and we still could, if the proc's language of
> > 'external' clued the planner to look for the mapping. I can see
> > arguments for either approach.
>
> It would be good to have them in the catalog somehow if we make CREATE
> ROUTINE MAPPING a DDL. If I've read the standard correctly, there are
> parts of information_schema which come into play for those routine
> mappings.
>
> > Before anyone asks, I looked for, and did not find, any suggestion of
> > IMPORT FOREIGN ROUTINE a la IMPORT FOREIGN SCHEMA, so as of yet there
> > wouldn't be any way to grab all the functions that a foreign server is
> > offering up.
>
> How about making an option to IMPORT FOREIGN SCHEMA do it?

Ok, so the steps seem to be:

1. settle on syntax
2. determine data dictionary structures
3. parse and create those structures
4. "handle" external functions locally
5. provide structures passed to FDW handlers so that they can handle external functions
6. implement those handlers in postgres_fdw

#1 is largely prescribed for us, though I'm curious as to what the CRM statements I've made up in examples above would look like as CREATE FUNCTION ... SERVER ...

#2 deserves a lot of debate, but probably mostly hinges on the new "language" and how to associate a pg_proc entry with a pg_foreign_server.

#3 I'm guessing this is a lot of borrowing code from CREATE ROUTINE MAPPING but is otherwise pretty straightforward.

#4 An external function obviously cannot be executed locally; doing so means that the planner failed to push it down, so this is probably stub-error functions.

#5 These functions would essentially be passed in the same way as foreign columns, with the "name" as "f(a,b,4)", and the burden of forming the remote query is on the FDW. Which gets tricky. What should happen in simple situations is obvious:

SELECT t.x, remote_func1(), remote_func2(t.y)
FROM remote_table t
WHERE t.active = true;

that would become this query on the remote side:

SELECT t.x, local_func1(), local_func2(t.y)
FROM local_table t
WHERE t.active = true;

And it's still simple when local functions consume remote input:

SELECT local_func1(remote_func1(r.x))
FROM remote_table r
WHERE r.active = true;

But other situations seem un-handle-able to me:

SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;

In those cases, at least initially, I think the FDW handler is right to raise an error, because the function inputs are unknowable at query time, and the inputs cannot also be pushed down to the remote server. That might not be common, but I can see situations like this:

SELECT r.*
FROM remote_srf( ( SELECT remote_code_value
                   FROM local_table_of_remote_codes
                   WHERE local_code_value = 'xyz' ) ) r;

and we would want things like that to work. Currently, in similar table situations, the FDW has no choice but to fetch the entire table and filter locally. That's good for tables, whose contents are knowable, but the set of possible function inputs is unreasonably large.
The current workaround in table-land is to run the inner query locally, and present the result as a constant to a follow-up query, so maybe that's what we have to do here, at least initially.

#6 is where the FDW either does the translation or rejects the notion that functions can be pushed down, either outright or based on the usage of the function in the query.

I'm doing this thinking on the mailing list in the hopes that it evokes suggestions, warnings, suggested code samples, and of course, help.
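Going back to the split-query workaround mentioned above, a sketch of how that might look from psql (remote_srf, local_table_of_remote_codes, and the 'xyz' code are the hypothetical objects from the earlier example):

SELECT remote_code_value AS code
FROM local_table_of_remote_codes
WHERE local_code_value = 'xyz' \gset

SELECT r.*
FROM remote_srf(:'code') r;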
Re: CREATE ROUTINE MAPPING
> > > > > > But other situations seem un-handle-able to me: > > > > SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true; > > Do we have any way, or any plan to make a way, to push the set (SELECT > x FROM local_table WHERE active = true) to the remote side for > execution there? Obviously, there are foreign DBs that couldn't > support this, but I'm guessing they wouldn't have much by way of UDFs > either. > No. The remote query has to be generated at planning time, so it can't make predicates out of anything that can't be resolved into constants by the planner itself. The complexities of doing so would be excessive, far better to let the application developer split the queries up because they know better which parts have to resolve first.
\describe*
Some of the discussions about making psql more user friendly (more tab completions, help, exit, etc.) got me thinking about other ways that psql could be more friendly, and the one that comes to mind is our terse but cryptic \d* commands. I think it would be helpful and instructive to have corresponding long-form describe commands.

Take describing schemas. Is \dn intuitive? Not really. In hindsight, you may think "yeah, a schema is a namespace", but you never guessed 'n' on the first try, or the second.

Looking over exec_command_d() a bit, I think it's a bit of a stretch to have each command handle a long form like this:

\describe table my_table
or
\describe table verbose my_table

because then each \d-variant has to account for objects named "table" and "verbose", and that's a path to unhappiness. But if we dash-separated them, then all of the strcmps would be in the 'e' subsection, and each one would just have to know its long-to-short translation, and call exec_command_d with the corresponding short command:

describe => d
describe-verbose => d+
describe-aggregates-verbose => da+
describe-roles => du

We could even presume the verbose flag in all cases (after all, the user was being verbose...), which would also cut down on tab-completion results, and we could check for interactive mode and display a message like

\describe-schemas (short: \dn+)

so that the person has the opportunity to learn the corresponding short command. In addition to aiding tab-completion discovery of the commands (i.e. typing "\desc" and then hitting tab), it would also make scripts a little more self-documenting.

Thoughts?
Re: \describe*
> It would be about as hard to memorize \describe-schemas as it is to
> memorize \dn:
> You'd have to remember that it is "-" and not "_", that it is "describe", not "desc",
> and that it is "schemas", not "schema".

You wouldn't memorize them. You'd discover them with tab completion. Type "\d" and you'll see

\d    \dA   \dc   \dd   \ddp  \des  \deu  \df   \dFd  \dFt  \di   \dL   \dn   \dO   \drds \dS   \dT   \dv   \dy
\da   \db   \dC   \dD   \dE   \det  \dew  \dF   \dFp  \dg   \dl   \dm   \do   \dp   \ds   \dt   \du   \dx

which is more heat than light. Yes, those are all the possibilities, but I, Joe Newguy, want to list schemas, and \ds and \dS look like the good guesses, neither of which is the right answer.

If, with this feature, I typed \desc, I might see:

\describe  \describe-functions  \describe-schemas  \describe-tables ...

So my voyage of discovery would have been completed by typing "\desc-sc", and if we add a note to interactive mode, I'd be shown the hint that \dn is the shortcut for that just above the list of schemas.
Re: [PATCH v1] Add \echo_stderr to psql
> > >\warn ... >\warning ... > These two seem about the best to me, drawing from the perl warn command. I suppose we could go the bash &2 route here, but I don't want to.
Re: range_agg
> One question is how to aggregate ranges that would leave gaps and/or
> overlaps. So in my extension there is a one-param version that forbids
> gaps & overlaps, but I let you permit them by passing extra parameters,
> so the signature is:

Perhaps a third way would be to allow and preserve the gaps.

A while back I wrote an extension called disjoint_date_range for storing sets of dates where it was assumed that most dates would be contiguous. The basic idea was that the core datatype was an array of ranges of dates, and with every modification you'd unnest them all to their discrete elements and use a window function to identify "runs" of dates and recompose them into a canonical set. It was an efficient way of representing "Every day last year except for June 2nd and August 4th, when we closed business for special events."

For arrays of ranges the principle is the same, but it'd get a bit more tricky: you'd have to order by low bound, use window functions to detect adjacency/overlap to identify your runs, and then generate the canonical minimum set of ranges in your array.
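As a sketch of the "runs of dates" recomposition I'm describing (assuming the stored ranges have already been unnested into individual dates; the dates here are just illustrative):

WITH dates(d) AS (
    SELECT unnest(ARRAY['2023-01-01', '2023-01-02', '2023-01-04']::date[])
),
runs AS (
    SELECT d,
           d - (row_number() OVER (ORDER BY d))::integer AS run_id
    FROM dates
)
SELECT daterange(min(d), max(d), '[]') AS canonical_range
FROM runs
GROUP BY run_id
ORDER BY 1;

Adjacent dates land in the same run_id, so this yields [2023-01-01,2023-01-03) and [2023-01-04,2023-01-05), preserving the gap.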
Re: Table as argument in postgres function
> You can pass table name as text or table object id as regclass type.
> inside procedure you should to use dynamic sql - execute statement.
> Generally you cannot to use a variable as table or column name ever.
>
> Dynamic SQL is other mechanism - attention on SQL injection.

On this note, Snowflake has the ability to parameterize object names (see: https://docs.snowflake.net/manuals/sql-reference/identifier-literal.html ), so you can do things like

SELECT col_a, col_b FROM identifier('a_table_name')

or, as a bind variable,

SELECT col_a, col_b FROM identifier($1)

which is their way of avoiding SQL injection attacks in *some* circumstances. Their implementation of it is a bit uneven, but it has proven useful for my work.

I can see where this obviously would prevent the planning of a prepared statement when a table name is a parameter, but the request comes up often enough, and the benefits to avoiding SQL injection attacks are significant enough, that maybe we should try to enable it for one-off queries. I don't necessarily think we need an identifier(string) function; a 'schema.table'::regclass would be more our style.

Is there anything preventing us from having the planner resolve object names from strings?
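For comparison, a sketch of the dynamic-SQL approach the quoted advice refers to, which is what we tell people to do today (the function and table names are illustrative):

CREATE OR REPLACE FUNCTION row_count(p_table regclass)
RETURNS bigint
LANGUAGE plpgsql
AS $$
DECLARE
    result bigint;
BEGIN
    -- %s is safe here because regclass renders as a properly quoted
    -- (and schema-qualified when needed) identifier
    EXECUTE format('SELECT count(*) FROM %s', p_table) INTO result;
    RETURN result;
END;
$$;

SELECT row_count('a_table_name');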
Re: Table as argument in postgres function
> > >> Is there anything preventing us from having the planner resolve object >> names from strings? >> > > The basic problem is fact so when you use PREPARE, EXECUTE protocol, you > has not parameters in planning time. > I agree that it defeats PREPARE as it is currently implemented with PQprepare(), and it would never be meaningful to have a query plan that hasn't finalized which objects are involved. But could it be made to work with PQexecParams(), where the parameter values are already provided? Could we make a version of PQprepare() that takes an extra array of paramValues for object names that must be supplied at prepare-time?
Re: PostgreSQL 12 Beta 1 press release draft
For CTEs, is forcing inlining the example we want to give, rather than the example of forcing materialization given? According to the docs, virtual generated columns aren't yet supported. I'm pretty sure the docs are right. Do we still want to mention it? Otherwise it looks good to me. On Tue, May 21, 2019 at 11:39 PM Jonathan S. Katz wrote: > Hi, > > Attached is a draft of the PG12 Beta 1 press release that is going out > this Thursday. The primary goals of this release announcement are to > introduce new features, enhancements, and changes that are available in > PG12, as well as encourage our users to test and provide feedback to > help ensure the stability of the release. > > Speaking of feedback, please provide me with your feedback on the > technical correctness of this announcement so I can incorporate changes > prior to the release. > > Thanks! > > Jonathan >
Re: Why we allow CHECK constraint contradiction?
On Wed, Oct 10, 2018 at 1:44 AM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Tuesday, October 9, 2018, Imai, Yoshikazu < > imai.yoshik...@jp.fujitsu.com> wrote: >> >> Are there any rows which can satisfy the ct's CHECK constraint? If not, >> why we >> allow creating table when check constraint itself is contradicted? >> > > I'd bet on it being a combination of complexity and insufficient expected > benefit. Time is better spent elsewhere. Mathmatically proving a > contradiction in software is harder than reasoning about it mentally. > I've actually used that as a feature, in postgresql and other databases, where assertions were unavailable, or procedural code was unavailable or against policy. Consider the following: CREATE TABLE wanted_values ( x integer ); INSERT INTO wanted_values VALUES (1), (2), (3); CREATE TABLE found_values ( x integer ); INSERT INTO found_values VALUES (1), (3); CREATE TABLE missing_values ( x integer, CONSTRAINT contradiction CHECK (false) ); INSERT INTO missing_values SELECT x FROM wanted_values EXCEPT SELECT x FROM found_values; gives the error ERROR: new row for relation "missing_values" violates check constraint "contradiction" DETAIL: Failing row contains (2). Which can be handy when you need to fail a transaction because of bad data and don't have branching logic available.
Re: COPY FROM WHEN condition
On Thu, Oct 11, 2018 at 5:04 AM Surafel Temesgen wrote: > > > On Thu, Oct 11, 2018 at 12:00 PM Christoph Moench-Tegeder < > c...@burggraben.net> wrote: > >> You can: >> COPY ( query ) TO 'filename'; >> > it is for COPY FROM > > regards > Surafel > It didn't get far, but you may want to take a look at a rejected patch for copy_srf() (set returning function) https://www.postgresql.org/message-id/CADkLM%3DdoeiWQX4AGtDNG4PsWfSXz3ai7kY%3DPZm3sUhsUeev9Bg%40mail.gmail.com https://commitfest.postgresql.org/12/869/ Having a set returning function gives you the full expressiveness of SQL, at the cost of an extra materialization step.
Re: CopyFrom() has become way too complicated
> > I think the code needs to be split up so that CopyFrom() in the loop > body calls CopyFromOneTuple(), which then also splits out the tuple > routing into its own CopyFromOneTupleRoute() function (that's 200 LOC on > its own...). I suspect it'd also be good to refactor the > partition-change code out into its own function. > +1 I had a hard time with this when doing my copy_srf() misadventure.
Re: date_trunc() in a specific time zone
> > >> A use case that I see quite a lot of is needing to do reports and other > >> calculations on data per day/hour/etc but in the user's time zone. The > >> way to do that is fairly trivial, but it's not obvious what it does so > >> reading queries becomes just a little bit more difficult. > > +1 A client encountered this exact problem last week, and I was surprised that the parameter didn't already exist.
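For reference, the "fairly trivial, but not obvious" expression being discussed is something like this (the column, table, and zone names are illustrative):

SELECT date_trunc('day', created_at AT TIME ZONE 'America/New_York')
           AT TIME ZONE 'America/New_York' AS local_day,
       count(*)
FROM events
GROUP BY 1
ORDER BY 1;

The inner AT TIME ZONE converts the timestamptz to local wall-clock time so date_trunc() cuts at local midnight, and the outer one converts the result back to a timestamptz.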
Re: COPY FROM WHEN condition
> > Are you thinking something like having a COPY command that provides
> > results in such a way that they could be referenced in a FROM clause
> > (perhaps a COPY that defines a cursor…)?
>
> That would also be nice, but what I was thinking of was that some
> highly restricted subset of cases of SQL in general could lend
> themselves to levels of optimization that would be impractical in
> other contexts.

If COPY (or a syntactical equivalent) can return a result set, then the whole of SQL is available to filter and aggregate the results, and we don't have to invent new syntax or endure confusion when COPY-WHEN syntax behaves subtly differently from a similar FROM-WHERE.

Also, what would we be saving computationally? The whole file (or program output) has to be consumed no matter what, and the columns have to be parsed no matter what. At least some of the columns have to be converted to their assigned datatypes enough to know whether or not to filter the row, but we might be able to push that logic inside a copy. I'm thinking of something like this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) AS x( a integer, b numeric, c text, d date, e json)
WHERE x.d >= '2018-11-01'
GROUP BY x.a

In this case, there is the *opportunity* to see the following optimizations:

- columns c and e are never referenced, and need never be turned into a datum (though we might do so just to confirm that they conform to the data type)
- if column d is converted first, we can filter on it and avoid converting columns a,b
- whatever optimizations we can infer from knowing that the two surviving columns will go directly into an aggregate

If we go this route, we can train the planner to notice other optimizations and add those mechanisms at that time, and then existing code gets faster. If we go the COPY-WHEN route, then we have to make up new syntax for every possible future optimization.
Re: COPY FROM WHEN condition
> > > > SELECT x.a, sum(x.b) > > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b > numeric, c text, d date, e json) ) > > Apologies for bike-shedding, but wouldn't the following be a better > fit with the current COPY? > > COPY t(a integer, b numeric, c text, d date, e json) FROM > '/path/to/foo.txt' WITH (FORMAT CSV, INLINE) > +1 Very much a better fit. > >
Re: partitioned tables referenced by FKs
On Fri, Nov 2, 2018 at 7:42 PM Alvaro Herrera wrote:
> Here's a patch to allow partitioned tables to be referenced by foreign
> keys. Current state is WIP, but everything should work; see below for
> the expected exception.
>
> The design is very simple: have one pg_constraint row for each partition
> on each side, each row pointing to the topmost table on the other side;
> triggers appear on each leaf partition (and naturally they don't appear
> on any intermediate partitioned table).

This is an important and much-needed feature!

Based on my extremely naive reading of this code, I have two perhaps equally naive questions:

1. It seems that we will continue to do per-row RI checks for inserts and updates. However, there already exists a bulk check in RI_Initial_Check(). Could we modify this bulk check to do RI checks on a per-statement basis rather than a per-row basis?

2. If #1 is possible, is the overhead of transition tables too great for the single-row case?
Re: partitioned tables referenced by FKs
> > > > 1. it seems that we will continue to to per-row RI checks for inserts and > > updates. However, there already exists a bulk check in > RI_Initial_Check(). > > Could we modify this bulk check to do RI checks on a per-statement basis > > rather than a per-row basis? > > One of the goals when implementing trigger transition tables was to > supplant the current per-row implementation of RI triggers with > per-statement. I haven't done that, but AFAIK it remains possible :-) > > Changing that is definitely not a goal of this patch. > Then I may try to tackle it myself in a separate thread. Without an implementation, I can't say, but if I had to guess, I would > assume so. Or maybe there are clever optimizations for that particular > case. > But in this case there is no actual defined trigger, it's internal code making an SPI call...is there an indicator that tells us whether this change was multi-row or not?
Re: [HACKERS] generated columns
> > > 3. Radical alternative: Collapse everything into one new column. We > > could combine atthasdef and attgenerated and even attidentity into a new > > column. (Only one of the three can be the case.) This would give > > client code a clean break, which may or may not be good. The > > implementation would be uglier than #1 but probably cleaner than #2. We > > could also get 4 bytes back per pg_attribute row. > > > > I'm happy with the current choice #1, but it's worth thinking about. > > #3 looks very appealing in my opinion as those columns have no overlap, > so it would take five possible values: > Could the removed columns live on...as generated-always columns?
Re: Desirability of client-side expressions in psql?
> >> psql> \if :i >= 5
> >
> > I think we're ok with that so long as none of the operators or values has a
> > \ in it.
> >
> > What barriers do you see to re-using the pgbench grammar?
>
> The pgbench expression grammar mimics SQL expression grammar,
> on integers, floats, booleans & NULL.
>
> I'm unsure about some special cases in psql (`shell command`,
> 'text' "identifier"). They can be forbidden on a new commande (\let),
> but what happens on "\if ..." which I am afraid allows them is unclear.
>
> --
> Fabien.

(raising this thread from hibernation now that I have the bandwidth)

It seems like the big barriers to just using pgbench syntax are:

- the ability to indicate that the next thing to follow will be a pgbench expression
- a way to coax pgbench truth-y values into psql truthy values (t/f, y/n, 1/0)

For that, I see a few ways forward:

1. A suffix on \if, \elif, an -exp suffix (or even just -x) to existing commands to indicate that a pgbench expression would follow. This would look something like

\ifx \elifx \setx
\if$ \elif$ \set$

2. A command-line-esque switch or other sigil to indicate that what follows is a pgbench expression with psql vars to interpolate. Example:

\set foo -x 1 + 4
\set foo \expr 1 + 4
\if -x :limit > 10
\if \expr :limit > 10

3. A global toggle to indicate which mode should be used by \if, \elif, and \set. Example:

\pset expressions [on | off]

4. A combination of #2 and #3 with a corresponding switch/sigil to indicate "do not evaluate pgbench-style". This is particularly appealing to me because it would allow code snippets from pgbench to be used without modification, while still allowing the user to mix in old/new style in an existing script.

5. A special variant of `command` where variables are interpolated before being sent to the OS, and allow that on \if, \elif:

\set foo ``expr :y + :z``
\set foo $( expr :y + :z )
\if ``expr :limit > 10``
\if $( expr :limit > 10 )

This also has some appeal because it allows for a great amount of flexibility, but obviously constrains us with OS dependencies. The user might have a hard time sending commands with ')' in them if we go the $( ) route.

6. Option #5, but we add an additional executable (suggested name: pgexpr) to the client libs, which encapsulates the pgbench expression library as a way around OS-dependent code.

7. I believe someone suggested introducing the :{! pgbench-command } or :{{ pgbench-command }} var-mode:

\set foo :{! :y + :z }
\set foo :{{ :y + :z }}
\if :{! :limit > 10 }
\if :{{ :limit > 10 }}

This has some appeal as well, though I prefer the {{...}} syntax because "!" looks like negation, and {{ resembles the [[ x + y ]] syntax in bash.

One nice thing is that most of these options are not mutually exclusive. Thoughts?
Re: csv format for psql
> Or we could kill both issues by hard-wiring the separator as ','.

+1

I've never encountered a situation where a customer wanted a custom delimiter AND quoted strings. So either they wanted pure CSV or a custom TSV.

Could we have another output type called "separated" that uses the existing --fieldsep / --recordsep? Word will get out that csv is faster, but we'd still have the flexibility if somebody really wanted it.
Re: csv format for psql
On Sun, Nov 25, 2018 at 11:23 PM Tom Lane wrote: > Corey Huinker writes: > > Could we have another output type called "separated" that uses the > existing > > --fieldsep / --recordsep? > > Uh, what's the difference from the existing unaligned format? > No footer and I guess we'd want to escape instances of fieldsep and recordsep in the data, so I guess if we had an option to escape instances of fieldsep/recordsep found in the data, unaligned would work fine.
Re: Statistics Import and Export
On Mon, Jan 22, 2024 at 1:09 AM Peter Smith wrote: > 2024-01 Commitfest. > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > there were CFbot test failures last time it was run [2]. Please have a > look and post an updated version if necessary. > > == > [1] https://commitfest.postgresql.org/46/4538/ > [2] > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4538 > > Kind Regards, > Peter Smith. > Attached is v4 of the statistics export/import patch. This version has been refactored to match the design feedback received previously. The system views are gone. These were mostly there to serve as a baseline for what an export query would look like. That role is temporarily reassigned to pg_export_stats.c, but hopefully they will be integrated into pg_dump in the next version. The regression test also contains the version of each query suitable for the current server version. The export format is far closer to the raw format of pg_statistic and pg_statistic_ext_data, respectively. This format involves exporting oid values for types, collations, operators, and attributes - values which are specific to the server they were created on. To make sense of those values, a subset of the columns of pg_type, pg_attribute, pg_collation, and pg_operator are exported as well, which allows pg_import_rel_stats() and pg_import_ext_stats() to reconstitute the data structure as it existed on the old server, and adapt it to the modern structure and local schema objects. pg_import_rel_stats matches up local columns with the exported stats by column name, not attnum. This allows for stats to be imported when columns have been dropped, added, or reordered. pg_import_ext_stats can also handle column reordering, though it currently would get confused by changes in expressions that maintain the same result data type. I'm not yet brave enough to handle importing nodetrees, nor do I think it's wise to try. I think we'd be better off validating that the destination extended stats object is identical in structure, and to fail the import of that one object if it isn't perfect. Export formats go back to v10.
Re: Document efficient self-joins / UPDATE LIMIT techniques.
> I have changed the status of commitfest entry to "Returned with
> Feedback" as Laurenz's comments have not yet been resolved. Please
> handle the comments and update the commitfest entry accordingly.

Here's another attempt, applying Laurenz's feedback:

I removed all changes to the SELECT documentation. That might seem strange given that the heavy lifting happens in the SELECT, but I'm working from the assumption that people's greatest need for a ctid self-join will be because they are trying to find the LIMIT keyword on UPDATE/DELETE and coming up empty.

Because the join syntax is subtly different between UPDATE and DELETE, I've kept code examples in both, but the detailed explanation is in UPDATE under the anchor "update-limit" and the DELETE example links to it.

From 298c812838491408e6910f7535067ea147abe5fc Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Sat, 3 Feb 2024 14:38:50 -0500 Subject: [PATCH v2] Documentation: Show alternatives to LIMIT on UPDATE and DELETE Show examples of how to simulate UPDATE or DELETE with a LIMIT clause. These examples also serve to show the existence and utility of ctid self-joins. --- doc/src/sgml/ref/delete.sgml | 18 + doc/src/sgml/ref/update.sgml | 38 +++- 2 files changed, 55 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/delete.sgml b/doc/src/sgml/ref/delete.sgml index 1b81b4e7d7..21aae30e91 100644 --- a/doc/src/sgml/ref/delete.sgml +++ b/doc/src/sgml/ref/delete.sgml @@ -234,6 +234,24 @@ DELETE FROM films In some cases the join style is easier to write or faster to execute than the sub-select style. + + While there is no LIMIT clause for + DELETE, it is possible to get a similar effect + using the method for UPDATE operations described + in greater detail here. + +WITH delete_batch AS ( + SELECT l.ctid + FROM user_logs AS l + WHERE l.status = 'archived' + ORDER BY l.creation_date + LIMIT 1 + FOR UPDATE +) +DELETE FROM user_logs AS ul +USING delete_batch AS del +WHERE ul.ctid = del.ctid; + diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml index 2ab24b0523..49e0dc29de 100644 --- a/doc/src/sgml/ref/update.sgml +++ b/doc/src/sgml/ref/update.sgml @@ -434,7 +434,6 @@ UPDATE wines SET stock = stock + 24 WHERE winename = 'Chateau Lafite 2003'; COMMIT; - Change the kind column of the table films in the row on which the cursor @@ -442,6 +441,43 @@ COMMIT; UPDATE films SET kind = 'Dramatic' WHERE CURRENT OF c_films; + + Updates affecting many rows can have negative effects on system performance, + such as table bloat, increased replica lag, increased lock contention, + and possible failure of the operation due to a deadlock. In such situations + it can make sense to perform the operation in smaller batches. Performing a + VACUUM operation on the table in between batches can help + reduce table bloat. The + SQL standard does + not define a LIMIT clause for UPDATE + operations, but it is possible to get a similar effect through the use of a + Common Table Expression and an + efficient self-join via the system column + ctid: + +WITH exceeded_max_retries AS ( + SELECT w.ctid + FROM work_item AS w + WHERE w.status = 'active' + AND w.num_retries > 10 + ORDER BY w.retry_timestamp + FOR UPDATE + LIMIT 5000 +) +UPDATE work_item +SET status = 'failed' +FROM exceeded_max_retries AS emr +WHERE work_item.ctid = emr.ctid +If lock contention is a concern, then SKIP LOCKED can +be added to the CTE. However, one final +UPDATE without SKIP LOCKED or +LIMIT will be needed to ensure that no matching rows +were overlooked.
The use of an ORDER BY clause allows +the command to prioritize which rows will be locked and updated. This can +also reduce contention with other update operations if they use the same +ordering. + -- 2.43.0
Re: Add SHELL_EXIT_CODE to psql
> > > > The patch set seem to be in a good shape and pretty stable for quite a > while. > Could you add one more minor improvement, a new line after variables > declaration? > As you wish. Attached. > > After this changes, I think, we make this patch RfC, shall we? > > Fingers crossed. From bb55e0fd1cd2de6fa25b7fc738dd6482d0c008c4 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Wed, 11 Jan 2023 17:27:23 -0500 Subject: [PATCH v9 1/3] Add PG_OS_TARGET environment variable to enable OS-specific changes to regression tests. --- src/test/regress/pg_regress.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 6cd5998b9d..bf0ada4560 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -594,6 +594,17 @@ initialize_environment(void) #endif } + /* + * Regression tests that invoke OS commands must occasionally use + * OS-specific syntax (example: NUL vs /dev/null). This env var allows + * those tests to adjust commands accordingly. + */ +#if defined(WIN32) + setenv("PG_OS_TARGET", "WIN32", 1); +#else + setenv("PG_OS_TARGET", "OTHER", 1); +#endif + /* * Set translation-related settings to English; otherwise psql will * produce translated messages and produce diffs. (XXX If we ever support -- 2.39.2 From fe954f883f4ee65cbfe5dc2c20f012465d031ada Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Thu, 12 Jan 2023 23:04:31 -0500 Subject: [PATCH v9 2/3] Add wait_result_to_exit_code(). --- src/common/wait_error.c | 15 +++ src/include/port.h | 1 + 2 files changed, 16 insertions(+) diff --git a/src/common/wait_error.c b/src/common/wait_error.c index 4a3c3c61af..1eb7ff3fa2 100644 --- a/src/common/wait_error.c +++ b/src/common/wait_error.c @@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found) return true; return false; } + +/* + * Return the POSIX exit code (0 to 255) that corresponds to the argument. + * The argument is a return code returned by wait(2) or waitpid(2), which + * also applies to pclose(3) and system(3). + */ +int +wait_result_to_exit_code(int exit_status) +{ + if (WIFEXITED(exit_status)) + return WEXITSTATUS(exit_status); + if (WIFSIGNALED(exit_status)) + return WTERMSIG(exit_status); + return 0; +} diff --git a/src/include/port.h b/src/include/port.h index e66193bed9..1b119a3c56 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src); extern char *wait_result_to_str(int exitstatus); extern bool wait_result_is_signal(int exit_status, int signum); extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found); +extern int wait_result_to_exit_code(int exit_status); /* * Interfaces that we assume all Unix system have. We retain individual macros -- 2.39.2 From 0f91719c26b6201f4aaa436bd13141ba325e2f85 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Wed, 22 Feb 2023 14:55:34 -0500 Subject: [PATCH v9 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE which report the success or failure of the most recent psql OS-command executed via the \! command or `backticks`. These variables are roughly analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL commands. 
--- doc/src/sgml/ref/psql-ref.sgml | 21 ++ src/bin/psql/command.c | 15 + src/bin/psql/help.c| 4 src/bin/psql/psqlscanslash.l | 34 +- src/test/regress/expected/psql.out | 29 + src/test/regress/sql/psql.sql | 25 ++ 6 files changed, 123 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index dc6528dc11..3d8a7353b3 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -4264,6 +4264,27 @@ bar + + SHELL_ERROR + + + true if the last shell command failed, false if + it succeeded. This applies to shell commands invoked via either \! + or `. See also SHELL_EXIT_CODE. + + + + + + SHELL_EXIT_CODE + + + The exit code return by the last shell command, invoked via either \! or `. + 0 means no error. See also SHELL_ERROR. + + + + SHOW_ALL_RESULTS diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 955397ee9d..40ba2810ea 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -5032,6 +5032,21 @@ do_shell(const char *command) else result = system(command); + if (result == 0) + { + SetVariable(pset.vars, "SHELL_EXIT_CODE", "0"); + SetVariable(pset.vars, "SHELL_ERROR", "false"); + } + else + { + int exit_code = wait_result_to_exit_code(result); + char buf[32]; + + snpri
Re: Add SHELL_EXIT_CODE to psql
On Wed, Feb 22, 2023 at 4:18 PM Thomas Munro wrote: > On Tue, Jan 31, 2023 at 9:23 AM Corey Huinker > wrote: > >> Unfortunately, there is a fail in FreeBSD > https://cirrus-ci.com/task/6466749487382528 > >> > >> Maybe, this patch is need to be rebased? > >> > > > > That failure is in postgres_fdw, which this code doesn't touch. > > > > I'm not able to get to > /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/regression.out > - It's not in either of the browseable zip files and the 2nd zip file isn't > downloadable, so it's hard to see what's going on. > > Yeah, that'd be our current top flapping CI test[1]. These new > "*-running" tests (like the old "make installcheck") are only enabled > in the FreeBSD CI task currently, so that's why the failures only show > up there. I see[2] half a dozen failures of the "fdw_retry_check" > thing in the past ~24 hours. > > [1] > https://www.postgresql.org/message-id/flat/20221209001511.n3vqodxobqgscuil%40awork3.anarazel.de > [2] http://cfbot.cputube.org/highlights/regress.html Thanks for the explanation. I was baffled.
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
On Wed, Feb 22, 2023 at 5:47 PM Andres Freund wrote: > Hi, > > On 2023-02-22 16:34:44 -0500, Tom Lane wrote: > > I wrote: > > > Andres Freund writes: > > >> Maybe it's worth sticking a StaticAssert() for the struct size > > >> somewhere. > > > > > Indeed. I thought we had one already. > > > > >> I'm a bit wary about that being too noisy, there are some machines > with > > >> odd alignment requirements. Perhaps worth restricting the assertion to > > >> x86-64 + armv8 or such? > > > > > I'd put it in first and only reconsider if it shows unfixable problems. > > > > Now that we've got the sizeof(ExprEvalStep) under control, shouldn't > > we do the attached? > > Indeed. Pushed. > > Let's hope there's no rarely used architecture with odd alignment rules. > > Greetings, > > Andres Freund > > > I have a question about this that may affect some of my future work. My not-ready-for-16 work on CAST( ... ON DEFAULT ... ) involved making FuncExpr/IoCoerceExpr/ArrayCoerceExpr have a safe_mode flag, and that necessitates adding a reserror boolean to ExprEvalStep for subsequent steps to test if the error happened. Will that change be throwing some architectures over the 64 byte count?
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
On Thu, Feb 23, 2023 at 2:39 PM Andres Freund wrote: > Hi, > > On 2023-02-23 13:56:56 -0500, Tom Lane wrote: > > Corey Huinker writes: > > > My not-ready-for-16 work on CAST( ... ON DEFAULT ... ) involved making > > > FuncExpr/IoCoerceExpr/ArrayCoerceExpr have a safe_mode flag, and that > > > necessitates adding a reserror boolean to ExprEvalStep for subsequent > steps > > > to test if the error happened. > > > > Why do you want it in ExprEvalStep ... couldn't it be in ExprState? > > I can't see why you'd need more than one at a time during evaluation. > > I don't know exactly what CAST( ... ON DEFAULT ... ) is aiming for - I > guess > it wants to assign a different value when the cast fails? Is the default > expression a constant, or does it need to be runtime evaluated? If a > const, > then the cast steps just could assign the new value. If runtime evaluation > is > needed I'd expect the various coerce steps to jump to the value > implementing > the default expression in case of a failure. > The default expression is itself a cast expression. So CAST (expr1 AS some_type DEFAULT expr2 ON ERROR) would basically be a safe-mode cast of expr1 to some_type, and only upon failure would the non-safe cast of expr2 to some_type be executed. Granted, the most common use case would be for expr2 to be a constant or something that folds into a constant, but the proposed spec allows for it. My implementation involved adding a setting to CoalesceExpr that tested for error flags rather than null flags, hence putting it in ExprEvalStep and ExprState (perhaps mistakenly). Copying and adapting EEOP_JUMP_IF_NOT_NULL lead me to this: EEO_CASE(EEOP_JUMP_IF_NOT_ERROR) { /* Transfer control if current result is non-error */ if (!*op->reserror) { *op->reserror = false; EEO_JUMP(op->d.jump.jumpdone); } /* reset error flag */ *op->reserror = false; EEO_NEXT(); }
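For context, the user-facing behavior described above would look something like this (proposed syntax only, not committed; table and column names are illustrative):

-- the DEFAULT expression (-1) is only cast, non-safely, if the safe cast of raw_value fails
SELECT CAST(t.raw_value AS integer DEFAULT -1 ON ERROR) AS parsed_value
FROM staging_table AS t;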
Re: Disable vacuuming to provide data history
On Thu, Feb 23, 2023 at 6:04 AM wrote:
> Hey,
>
> It depnends on scenario, but there is many use cases that hack data
> change from somebody with admin privileges could be disaster.
> That is the place where data history could come with help. Some basic
> solution would be trigger which writes previous version of record
> to some other table. Trigger however can be disabled or removed (crazy
> solution would be to provide pernament
> triggers and tables which can only be pernamently inserted).
> Then we have also possibility to modify tablespace directly on disk.
>
> But Postgres has ability to not override records when two concurrent
> transaction modify data to provide MVCC.
>
> So what about pernamently not vacuumable tables. Adding some xid log
> tables with hash of record on hash on previous hash.
> I think that would be serious additional advantage for best open source
> relational databes.
>
> Best regards,
> Marek Mosiewicz

What you are describing sounds like the "system versioning" flavor of "temporal" tables. It's a part of the SQL Standard, but PostgreSQL has yet to implement it in core.

Basically, every row has a start_timestamp and end_timestamp field. Updating a row sets the end_timestamp of the old version and inserts a new one with a start_timestamp matching the end_timestamp of the previous row. Once a record has a non-null [1] end_timestamp, it is not possible to update that row via SQL. Regular SQL statements effectively have an "AND end_timestamp IS NULL" filter on them, so the old rows are not visible without specifically invoking temporal features to get point-in-time queries. At the implementation level, this probably means a table with 2 partitions, one for live rows all having null end_timestamps, and one for archived rows which is effectively append-only.

This strategy is common practice for chain-of-custody and auditing purposes, either as a feature of the RDBMS or home-rolled. I have also seen it used for developing forecasting models (ex: "what would this model have told us to do if we had run it a year ago?").

A few years ago, I personally thought about implementing a hash-chain feature, but my research at the time concluded that:

* Few customers were interested in going beyond what was required for regulatory compliance
* Once compliant, any divergence from established procedures, even if it was an unambiguous improvement, only invited re-examination of it and adjacent procedures, and they would avoid that
* They could get the same validation by comparing against a secured backup and out-of-band audit "logs" (most would call them "reports")
* They were of the opinion that if a bad actor got admin access, it was "game over" anyway

The world may have changed since then, but even if there is now interest, I wonder if that isn't better implemented at the OS level rather than the RDBMS level.

[1] some implementations don't use null, they use an end_timestamp set to a date implausibly far in the future ( 3999-12-31 for example ), but the concept remains that once the column is set to a real timestamp, the row isn't visible to update statements.
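For illustration, a minimal home-rolled sketch of the start/end timestamp pattern described above (this is not the SQL standard syntax, which PostgreSQL does not implement; the table, values, and dates are invented):

CREATE TABLE account_versions (
    account_id      integer     NOT NULL,
    balance         numeric     NOT NULL,
    start_timestamp timestamptz NOT NULL DEFAULT now(),
    end_timestamp   timestamptz  -- NULL means the row is the current version
);

-- an "update" closes out the current version and inserts the new one
BEGIN;
UPDATE account_versions
   SET end_timestamp = now()
 WHERE account_id = 42
   AND end_timestamp IS NULL;
INSERT INTO account_versions (account_id, balance)
VALUES (42, 100.00);
COMMIT;

-- regular queries see only live rows
SELECT * FROM account_versions WHERE end_timestamp IS NULL;

-- point-in-time ("as of") query
SELECT *
FROM account_versions
WHERE start_timestamp <= '2024-01-01'
  AND (end_timestamp > '2024-01-01' OR end_timestamp IS NULL);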
Re: verbose mode for pg_input_error_message?
On Thu, Feb 23, 2023 at 4:47 PM Nathan Bossart wrote: > On Thu, Feb 23, 2023 at 11:30:38AM -0800, Nathan Bossart wrote: > > Will post a new version shortly. > > As promised... > > -- > Nathan Bossart > Amazon Web Services: https://aws.amazon.com Looks good to me, passes make check-world. Thanks for slogging through this.
Re: Add SHELL_EXIT_CODE to psql
On Thu, Mar 2, 2023 at 1:35 PM Tom Lane wrote: > Corey Huinker writes: > > [ v9-0003-Add-psql-variables-SHELL_ERROR-and-SHELL_EXIT_COD.patch ] > > I took a brief look through this. I'm on board with the general > concept, but I think you've spent too much time on an ultimately > futile attempt to get a committable test case, and not enough time > I didn't want to give the impression that I was avoiding/dismissing the test case, and at some point I got curious how far we could push pg_regress. > on making the behavior correct/usable. In particular, it bothers > me that there doesn't appear to be any way to distinguish whether > a command failed on nonzero exit code versus a signal. Also, > That's an intriguing distinction, and one I hadn't considered, mostly because I assumed that any command with a set of exit codes rich enough to warrant inspection would have a separate exit code for i-was-terminated. It would certainly be possible to have two independent booleans, SHELL_ERROR and SHELL_SIGNAL. > I see that you're willing to report whatever ferror returns > (something quite unspecified in relevant standards, beyond > zero/nonzero) as an equally-non-distinguishable integer. > I had imagined users of this feature falling into two camps: 1. Users writing scripts for their specific environment, with the ability to know what exit codes are possible and different desired courses of action based on those codes. 2. Users in no specific environment, content with SHELL_ERROR boolean, who are probably just going to test for failure, and if so either \set a default or \echo a message and \quit. > > I'm tempted to suggest that we ought to be returning a string, > along the lines of "OK" or "exit code N" or "signal N" or > "I/O error". This though brings up the question of exactly > what you expect scripts to use the variable for. Maybe such > a definition would be unhelpful, but if so why? Maybe we > should content ourselves with adding the SHELL_ERROR boolean, and > leave the details to whatever we print in the error message? > Having a curated list of responses like "OK" and "exit code N" is also intriguing, but could be hard to unpack given psql's limited string manipulation capabilities. I think the SHELL_ERROR boolean solves most cases, but it was suggested in https://www.postgresql.org/message-id/20221102115801.gg16...@telsasoft.com that users might want more detail than that, even if that detail is unspecified and highly system dependent. > > As for the test case, I'm unimpressed with it because it's so > weak as to be nearly useless; plus it seems like a potential > security issue (what if "nosuchcommand" exists?). It's fine > to have it during development, I guess, but I'm inclined to > leave it out of the eventual commit. > > I have no objection to leaving it out. I think it proves that writing os-specific pg_regress tests is hard, and little else.
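For the record, the simple use case I imagine for that second camp looks something like this (the command, file name, and message are illustrative, using the SHELL_ERROR variable from this patch):

\! test -r /etc/my_app/extra_settings.sql
\if :SHELL_ERROR
    \echo 'extra settings file is missing or unreadable, aborting'
    \quit
\endif
\i /etc/my_app/extra_settings.sql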
Re: Add SHELL_EXIT_CODE to psql
On Fri, Mar 10, 2023 at 3:49 PM Tom Lane wrote: > I'm okay with adopting bash's rule, but then it should actually match > bash --- signal N is reported as 128+N, not -N. > 128+N is implemented. Nonzero pclose errors of any kind will now overwrite an existing exit_code. I didn't write a formal test for the signals, but instead tested it manually: [310444:16:54:32 EDT] corey=# \! sleep 1000 -- in another window, i found the pid of the sleep command and did a kill -9 on it [310444:16:55:15 EDT] corey=# \echo :SHELL_EXIT_CODE 137 137 = 128+9, so that checks out. The OS-specific regression test has been modified. On Windows systems it attempts to execute "CON", and on other systems it attempts to execute "/". These are marginally better than "nosuchcommand" in that they should always exist on their host OS and could never be a legit executable. I have no opinion whether the test should live on past the development phase, but if it does not then the 0001 patch is not needed. From a5dddedcc7bf654b4b65c14ff7b89b9d83bb5c27 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Fri, 17 Mar 2023 06:43:24 -0400 Subject: [PATCH v9 1/3] Add PG_OS_TARGET environment variable to enable OS-specific changes to regression tests. --- src/test/regress/pg_regress.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 7b23cc80dc..333aaab139 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -594,6 +594,17 @@ initialize_environment(void) #endif } + /* + * Regression tests that invoke OS commands must occasionally use + * OS-specific syntax (example: NUL vs /dev/null). This env var allows + * those tests to adjust commands accordingly. + */ +#if defined(WIN32) + setenv("PG_OS_TARGET", "WIN32", 1); +#else + setenv("PG_OS_TARGET", "OTHER", 1); +#endif + /* * Set translation-related settings to English; otherwise psql will * produce translated messages and produce diffs. (XXX If we ever support -- 2.39.2 From fe578661eef7dbe33aec8f4eaa6dca80a1304c9b Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Fri, 17 Mar 2023 06:44:43 -0400 Subject: [PATCH v9 2/3] Add wait_result_to_exit_code(). --- src/common/wait_error.c | 15 +++ src/include/port.h | 1 + 2 files changed, 16 insertions(+) diff --git a/src/common/wait_error.c b/src/common/wait_error.c index 4a3c3c61af..8e1c0e3fc8 100644 --- a/src/common/wait_error.c +++ b/src/common/wait_error.c @@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found) return true; return false; } + +/* + * Return the POSIX exit code (0 to 255) that corresponds to the argument. + * The argument is a return code returned by wait(2) or waitpid(2), which + * also applies to pclose(3) and system(3). + */ +int +wait_result_to_exit_code(int exit_status) +{ + if (WIFEXITED(exit_status)) + return WEXITSTATUS(exit_status); + if (WIFSIGNALED(exit_status)) + return 128 + WTERMSIG(exit_status); + return 0; +} diff --git a/src/include/port.h b/src/include/port.h index e66193bed9..1b119a3c56 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src); extern char *wait_result_to_str(int exitstatus); extern bool wait_result_is_signal(int exit_status, int signum); extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found); +extern int wait_result_to_exit_code(int exit_status); /* * Interfaces that we assume all Unix system have. 
We retain individual macros -- 2.39.2 From 251863ddcbc4eecf4fa15e332724acbd3c652478 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Fri, 17 Mar 2023 16:46:57 -0400 Subject: [PATCH v9 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE which report the success or failure of the most recent psql OS-command executed via the \! command or `backticks`. These variables are roughly analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL commands. --- doc/src/sgml/ref/psql-ref.sgml | 21 ++ src/bin/psql/command.c | 15 + src/bin/psql/help.c| 4 src/bin/psql/psqlscanslash.l | 35 +- src/test/regress/expected/psql.out | 34 + src/test/regress/sql/psql.sql | 30 + 6 files changed, 134 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7b8ae9fac3..e6e307fd18 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -4267,6 +4267,27 @@ bar + + SHELL_ERROR + + + true if the last shell command failed, false if + it succeeded. This applies to shell commands invoked via either \! + or `. See also SHELL_EXIT_CODE. + + + + + +
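As a sketch of how a script could tell a signal death apart from an ordinary failure under the 128+N convention above (the command name is a placeholder, and the \gset round trip is just one way to do the comparison):

\! some_long_running_command
SELECT :SHELL_EXIT_CODE > 128 AS died_on_signal \gset
\if :died_on_signal
    \echo command was terminated by a signal
\endif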
Re: Add SHELL_EXIT_CODE to psql
On Mon, Mar 20, 2023 at 1:01 PM Tom Lane wrote: > Corey Huinker writes: > > 128+N is implemented. > > I think this mostly looks OK, but: > > * I still say there is no basis whatever for believing that the result > of ferror() is an exit code, errno code, or anything else with > significance beyond zero-or-not. Feeding it to wait_result_to_exit_code > as you've done here is not going to do anything but mislead people in > a platform-dependent way. Probably should set exit_code to -1 if > ferror reports trouble. > Agreed. Sorry, I read your comment wrong the last time or I would have already made it so. > > * Why do you have wait_result_to_exit_code defaulting to return 0 > if it doesn't recognize the code as either WIFEXITED or WIFSIGNALED? > That seems pretty misleading; again -1 would be a better idea. > That makes sense as well. Given that WIFSIGNALED is currently defined as the negation of WIFEXITED, whatever default result we have here is basically a this-should-never-happen. That might be something we want to catch with an assert, but I'm fine with a -1 for now. From 1ff2b5ba4b82efca0edf60a9fb1cdf8307471eee Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Fri, 17 Mar 2023 06:43:24 -0400 Subject: [PATCH v11 1/3] Add PG_OS_TARGET environment variable to enable OS-specific changes to regression tests. --- src/test/regress/pg_regress.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 7b23cc80dc..333aaab139 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -594,6 +594,17 @@ initialize_environment(void) #endif } + /* + * Regression tests that invoke OS commands must occasionally use + * OS-specific syntax (example: NUL vs /dev/null). This env var allows + * those tests to adjust commands accordingly. + */ +#if defined(WIN32) + setenv("PG_OS_TARGET", "WIN32", 1); +#else + setenv("PG_OS_TARGET", "OTHER", 1); +#endif + /* * Set translation-related settings to English; otherwise psql will * produce translated messages and produce diffs. (XXX If we ever support -- 2.39.2 From 3d9c43cfec7f60785c1f1fa1aaa3e1b48224ef98 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Mon, 20 Mar 2023 16:05:10 -0400 Subject: [PATCH v11 2/3] Add wait_result_to_exit_code(). --- src/common/wait_error.c | 15 +++ src/include/port.h | 1 + 2 files changed, 16 insertions(+) diff --git a/src/common/wait_error.c b/src/common/wait_error.c index 4a3c3c61af..80b94cae51 100644 --- a/src/common/wait_error.c +++ b/src/common/wait_error.c @@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found) return true; return false; } + +/* + * Return the POSIX exit code (0 to 255) that corresponds to the argument. + * The argument is a return code returned by wait(2) or waitpid(2), which + * also applies to pclose(3) and system(3). 
+ */ +int +wait_result_to_exit_code(int exit_status) +{ + if (WIFEXITED(exit_status)) + return WEXITSTATUS(exit_status); + if (WIFSIGNALED(exit_status)) + return 128 + WTERMSIG(exit_status); + return -1; +} diff --git a/src/include/port.h b/src/include/port.h index e66193bed9..1b119a3c56 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src); extern char *wait_result_to_str(int exitstatus); extern bool wait_result_is_signal(int exit_status, int signum); extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found); +extern int wait_result_to_exit_code(int exit_status); /* * Interfaces that we assume all Unix system have. We retain individual macros -- 2.39.2 From bf65f36fd57977acaac57972425d1717a205cb72 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Mon, 20 Mar 2023 16:05:23 -0400 Subject: [PATCH v11 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE which report the success or failure of the most recent psql OS-command executed via the \! command or `backticks`. These variables are roughly analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL commands. --- doc/src/sgml/ref/psql-ref.sgml | 21 ++ src/bin/psql/command.c | 15 + src/bin/psql/help.c| 4 src/bin/psql/psqlscanslash.l | 33 + src/test/regress/expected/psql.out | 34 ++ src/test/regress/sql/psql.sql | 30 ++ 6 files changed, 133 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7b8ae9fac3..e6e307fd18 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -4267,6 +4267,27 @@ bar + + SHELL_ERROR + +
Re: Add SHELL_EXIT_CODE to psql
> > > As you had it, any nonzero result would prevent backtick substitution. > I'm not really sure how much thought went into the existing behavior, > but I am pretty sure that it's not part of this patch's charter to > change that. > > regards, tom lane > The changes all make sense, thanks!
Re: Add n_tup_newpage_upd to pg_stat table views
> > > * No more dedicated struct to carry around the type of an update. > > We just use two boolean arguments to the pgstats function instead. The > struct didn't seem to be adding much, and it was distracting to track > the information this way within heap_update(). > That's probably a good move, especially if we start tallying updates that use TOAST.
Re: doc: add missing "id" attributes to extension packaging page
> TBH I'm a bit afraid that people will immediately start complaining
> about the failing docs builds after this got applied since it forces
> them to add ids to all varlistenries in a variablelist if they add one,
> which can be perceived as quite a burden (also committers and reviewers
> will have to get used to always watch out for failing docs builds
> because of this).

As a person who had to re-rebase a patch because I discovered that id tags had been added to one of the files I was working on, I can confidently say "don't worry". It wasn't that big of a deal; I wasn't even following this thread at the time, and I immediately figured out what had happened and what was expected of me. So it isn't that much of an inconvenience.

If there is a negative consequence to this change, it would be that it might incentivize patch writers to omit documentation completely at the early stages. But I don't think that's a problem, because people generally see a lack of documentation as a clue that maybe the patch isn't ready to be reviewed, and this change would only reinforce that litmus test.

I had suggested we do something like this a few years back (the ids, that is; the idea that we could check for compliance was beyond my imagination at the time), and I'm glad to see both finally happening. While I can foresee people overlooking the docs build, such oversights won't go long before being caught, and the fix is simple.

Now if we can just get a threaded version of xsltproc to make the builds faster...

P.S. I'm "Team Paperclip" when it comes to the link hint, but let's get the feature in first and worry about the right character later.
Re: Make ON_ERROR_STOP stop on shell script failure
On Fri, Mar 24, 2023 at 11:07 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote:

> On 20.03.23 19:31, Greg Stark wrote:
> > So I took a look at this patch. The conflict is with 2fe3bdbd691
> > committed by Peter Eisentraut which added error checks for pipes.
> > Afaics the behaviour is now for pipe commands returning non-zero to
> > cause an error *always* regardless of the setting of ON_ERROR_STOP.

Commit b0d8f2d983cb25d1035fae1cd7de214dd67809b4 adds SHELL_ERROR, which is set to 'true' whenever a \! or backtick command has a nonzero exit code. So it'll need some rebasing to remove the duplicated work.

So it's now possible to do this:

\set result `some command`
\if :SHELL_ERROR
    -- maybe test SHELL_EXIT_CODE to see what kind of error
    \echo some command failed
    -- nah, just quit
    \q
\endif

> > I'm not entirely sure that's sensible actually. If you write to a pipe
> > that ends in grep and it happens to produce no matching rows you may
> > actually be quite surprised when that causes your script to fail...

I agree that that would be quite surprising, and this feature would be a non-starter for such users. But if we extended the SHELL_ERROR and SHELL_EXIT_CODE patch to handle output pipes (which maybe we should have done in the first place), the handling would look like this:

SELECT ... \g grep Frobozz
\if :SHELL_ERROR
    SELECT :SHELL_EXIT_CODE = 1 AS grep_found_nothing \gset
    \if :grep_found_nothing
        ...not-a-real-error handling...
    \else
        ...regular error handling...
    \endif
\endif

...and that would be the solution for people who wanted to do something more nuanced than ON_ERROR_STOP.
Re: Make ON_ERROR_STOP stop on shell script failure
On Fri, Mar 24, 2023 at 2:16 PM Corey Huinker wrote: > > > On Fri, Mar 24, 2023 at 11:07 AM Peter Eisentraut < > peter.eisentr...@enterprisedb.com> wrote: > >> On 20.03.23 19:31, Greg Stark wrote: >> > So I took a look at this patch. The conflict is with 2fe3bdbd691 >> > committed by Peter Eisentraut which added error checks for pipes. >> > Afaics the behaviour is now for pipe commands returning non-zero to >> > cause an error*always* regardless of the setting of ON_ERROR_STOP. >> > > Commit b0d8f2d983cb25d1035fae1cd7de214dd67809b4 adds SHELL_ERROR as a set > to 'true' whenever a \! or backtick command has a nonzero exit code. So > it'll need some rebasing to remove the duplicated work. > > So it's now possible to do this: > > \set result = `some command` > \if :SHELL_ERROR >-- maybe test SHELL_EXIT_CODE to see what kind of error >\echo some command failed >-- nah, just quit >\q > \endif > > >> > I'm not entirely sure that's sensible actually. If you write to a pipe >> > that ends in grep and it happens to produce no matching rows you may >> > actually be quite surprised when that causes your script to fail... >> > > I agree that that would be quite surprising, and this feature would be a > non-starter for them. But if we extended the SHELL_ERROR and > SHELL_EXIT_CODE patch to handle output pipes (which maybe we should have > done in the first place), the handling would look like this: > > SELECT ... \g grep Frobozz > \if :SHELL_ERROR > SELECT :SHELL_EXIT_CODE = 1 AS grep_found_nothing \gset > \if :grep_found_nothing > ...not-a-real-error handling... > \else > ...regular error handling... > \endif > \endif > > ...and that would be the solution for people who wanted to do something > more nuanced than ON_ERROR_STOP. > > Dangit. Replied to Peter's email thinking he had gone off Greg's culling of the recipients. Re-culled.
Re: Add SHELL_EXIT_CODE to psql
On Tue, Mar 21, 2023 at 3:33 PM Corey Huinker wrote: > >> As you had it, any nonzero result would prevent backtick substitution. >> I'm not really sure how much thought went into the existing behavior, >> but I am pretty sure that it's not part of this patch's charter to >> change that. >> >> regards, tom lane >> > > The changes all make sense, thanks! > This is a follow up patch to apply the committed pattern to the various piped output commands. I spotted this oversight in the https://www.postgresql.org/message-id/CADkLM=dMG6AAWfeKvGnKOzz1O7ZNctFR1BzAA3K7-+XQxff=4...@mail.gmail.com thread and, whether or not that feature gets in, we should probably apply it to output pipes as well. From 9e29989372b7f42fcb7eac6dd15769ebe643abb5 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Fri, 24 Mar 2023 17:20:27 -0400 Subject: [PATCH v1] Have pipes set SHELL_ERROR and SHELL_EXIT_CODE. --- src/bin/psql/common.c | 20 +--- src/bin/psql/copy.c | 4 2 files changed, 21 insertions(+), 3 deletions(-) diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index f907f5d4e8..3be24250ad 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -103,7 +103,13 @@ setQFout(const char *fname) if (pset.queryFout && pset.queryFout != stdout && pset.queryFout != stderr) { if (pset.queryFoutPipe) - pclose(pset.queryFout); + { + char buf[32]; + int exit_code = pclose(pset.queryFout); + snprintf(buf, sizeof(buf), "%d", wait_result_to_exit_code(exit_code)); + SetVariable(pset.vars, "SHELL_EXIT_CODE", buf); + SetVariable(pset.vars, "SHELL_ERROR", (exit_code == 0) ? "false" : "true"); + } else fclose(pset.queryFout); } @@ -1652,7 +1658,11 @@ ExecQueryAndProcessResults(const char *query, { if (gfile_is_pipe) { - pclose(gfile_fout); + char buf[32]; + int exit_code = pclose(gfile_fout); + snprintf(buf, sizeof(buf), "%d", wait_result_to_exit_code(exit_code)); + SetVariable(pset.vars, "SHELL_EXIT_CODE", buf); + SetVariable(pset.vars, "SHELL_ERROR", (exit_code == 0) ? "false" : "true"); restore_sigpipe_trap(); } else @@ -1870,7 +1880,11 @@ ExecQueryUsingCursor(const char *query, double *elapsed_msec) /* close \g argument file/pipe */ if (is_pipe) { - pclose(fout); + char buf[32]; + int exit_code = pclose(fout); + snprintf(buf, sizeof(buf), "%d", wait_result_to_exit_code(exit_code)); + SetVariable(pset.vars, "SHELL_EXIT_CODE", buf); + SetVariable(pset.vars, "SHELL_ERROR", (exit_code == 0) ? "false" : "true"); restore_sigpipe_trap(); } else diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c index 110d2d7269..00e01095fe 100644 --- a/src/bin/psql/copy.c +++ b/src/bin/psql/copy.c @@ -375,6 +375,7 @@ do_copy(const char *args) { if (options->program) { + char buf[32]; int pclose_rc = pclose(copystream); if (pclose_rc != 0) @@ -391,6 +392,9 @@ do_copy(const char *args) } success = false; } + snprintf(buf, sizeof(buf), "%d", wait_result_to_exit_code(pclose_rc)); + SetVariable(pset.vars, "SHELL_EXIT_CODE", buf); + SetVariable(pset.vars, "SHELL_ERROR", (pclose_rc == 0) ? "false" : "true"); restore_sigpipe_trap(); } else -- 2.39.2
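Since the same pclose-handling block now appears at three call sites, a possible consolidation (just a sketch; the helper name SetShellResultVariables is invented here and is not part of the posted patch) would be a small function in src/bin/psql/common.c:

/*
 * Record the result of a shell command (a wait(2)-style status as returned
 * by pclose() or system()) in the SHELL_ERROR and SHELL_EXIT_CODE variables.
 */
static void
SetShellResultVariables(int wait_result)
{
	char		buf[32];

	SetVariable(pset.vars, "SHELL_ERROR",
				(wait_result == 0) ? "false" : "true");
	snprintf(buf, sizeof(buf), "%d", wait_result_to_exit_code(wait_result));
	SetVariable(pset.vars, "SHELL_EXIT_CODE", buf);
}

Each call site above would then reduce to SetShellResultVariables(pclose(fout)); if do_copy() in copy.c wants it as well, the helper would have to be exported rather than static.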
Re: Statistics Import and Export
> > ANALYZE takes out one lock StatisticRelationId per relation, not per > attribute like we do now. If we didn't release the lock after every > attribute, and we only called the function outside of a larger transaction > (as we plan to do with pg_restore) then that is the closest we're going to > get to being consistent with ANALYZE. > v9 attached. This adds pg_dump support. It works in tests against existing databases such as dvdrental, though I was surprised at how few indexes have attribute stats there. Statistics are preserved by default, but this can be disabled with the option --no-statistics. This follows the prevailing option pattern in pg_dump, etc. There are currently several failing TAP tests around pg_dump/pg_restore/pg_upgrade. I'm looking at those, but in the mean time I'm seeking feedback on the progress so far. From bf291e323fc01215264d41b75d579c11bd22d2ec Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Mon, 11 Mar 2024 14:18:39 -0400 Subject: [PATCH v9 1/2] Create pg_set_relation_stats, pg_set_attribute_stats. These functions will be used by pg_dump/restore and pg_upgrade to convey relation and attribute statistics from the source database to the target. This would be done instead of vacuumdb --analyze-in-stages. Both functions take an oid to identify the target relation that will receive the statistics. There is nothing requiring that relation to be the same one as the one exported, though the statistics would have to make sense in the context of the new relation. Typecasts for stavaluesN parameters may fail if the destination column is not of the same type as the source column. The parameters of pg_set_attribute_stats identify the attribute by name rather than by attnum. This is intentional because the column order may be different in situations other than binary upgrades. For example, dropped columns do not carry over in a restore. The statistics imported by pg_set_attribute_stats are imported transactionally like any other operation. However, pg_set_relation_stats does it's update in-place, which is to say non-transactionally. This is in line with what ANALYZE does to avoid table bloat in pg_class. These functions also allows for tweaking of table statistics in-place, allowing the user to inflate rowcounts, skew histograms, etc, to see what those changes will evoke from the query planner. 
--- src/include/catalog/pg_proc.dat | 15 + src/include/statistics/statistics.h | 2 + src/backend/statistics/Makefile | 3 +- src/backend/statistics/meson.build| 1 + src/backend/statistics/statistics.c | 410 ++ .../regress/expected/stats_export_import.out | 211 + src/test/regress/parallel_schedule| 2 +- src/test/regress/sql/stats_export_import.sql | 198 + doc/src/sgml/func.sgml| 95 9 files changed, 935 insertions(+), 2 deletions(-) create mode 100644 src/backend/statistics/statistics.c create mode 100644 src/test/regress/expected/stats_export_import.out create mode 100644 src/test/regress/sql/stats_export_import.sql diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 700f7daf7b..a726451a6f 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12170,4 +12170,19 @@ proargtypes => 'int2', prosrc => 'gist_stratnum_identity' }, +# Statistics Import +{ oid => '8048', + descr => 'set statistics on relation', + proname => 'pg_set_relation_stats', provolatile => 'v', proisstrict => 't', + proparallel => 'u', prorettype => 'bool', + proargtypes => 'oid float4 int4 int4', + proargnames => '{relation,reltuples,relpages,relallvisible}', + prosrc => 'pg_set_relation_stats' }, +{ oid => '8049', + descr => 'set statistics on attribute', + proname => 'pg_set_attribute_stats', provolatile => 'v', proisstrict => 'f', + proparallel => 'u', prorettype => 'bool', + proargtypes => 'oid name bool float4 int4 float4 int2 int2 int2 int2 int2 _float4 _float4 _float4 _float4 _float4 text text text text text', + proargnames => '{relation,attname,stainherit,stanullfrac,stawidth,stadistinct,stakind1,stakind2,stakind3,stakind4,stakind5,stanumbers1,stanumbers2,stanumbers3,stanumbers4,stanumbers5,stavalues1,stavalues2,stavalues3,stavalues4,stavalues5}', + prosrc => 'pg_set_attribute_stats' }, ] diff --git a/src/include/statistics/statistics.h b/src/include/statistics/statistics.h index 7f2bf18716..1dddf96576 100644 --- a/src/include/statistics/statistics.h +++ b/src/include/statistics/statistics.h @@ -127,4 +127,6 @@ extern StatisticExtInfo *choose_
Re: Statistics Import and Export
> > > * pg_set_relation_stats() needs to do an ACL check so you can't set the > stats on someone else's table. I suggest honoring the new MAINTAIN > privilege as well. > Added. > > * If possible, reading from pg_stats (instead of pg_statistic) would be > ideal because pg_stats already does the right checks at read time, so a > non-superuser can export stats, too. > Done. That was sorta how it was originally, so returning to that wasn't too hard. > > * If reading from pg_stats, should you change the signature of > pg_set_relation_stats() to have argument names matching the columns of > pg_stats (e.g. most_common_vals instead of stakind/stavalues)? > Done. > > In other words, make this a slightly higher level: conceptually > exporting/importing pg_stats rather than pg_statistic. This may also > make the SQL export queries simpler. > Eh, about the same. > Also, I'm wondering about error handling. Is some kind of error thrown > by pg_set_relation_stats() going to abort an entire restore? That might > be easy to prevent with pg_restore, because it can just omit the stats, > but harder if it's in a SQL file. > Aside from the oid being invalid, there's not a whole lot that can go wrong in set_relation_stats(). The error checking I did closely mirrors that in analyze.c. Aside from the changes you suggested, as well as the error reporting change you suggested for pg_dump, I also filtered out attempts to dump stats on views. A few TAP tests are still failing and I haven't been able to diagnose why, though the failures in parallel dump seem to be that it tries to import stats on indexes that haven't been created yet, which is odd because I sent the dependency. All those changes are available in the patches attached. From ba411ce31c25193cf05bc4206dcb8f2bf32af0c7 Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Mon, 11 Mar 2024 14:18:39 -0400 Subject: [PATCH v10 1/2] Create pg_set_relation_stats, pg_set_attribute_stats. These functions will be used by pg_dump/restore and pg_upgrade to convey relation and attribute statistics from the source database to the target. This would be done instead of vacuumdb --analyze-in-stages. Both functions take an oid to identify the target relation that will receive the statistics. There is nothing requiring that relation to be the same one as the one exported, though the statistics would have to make sense in the context of the new relation. The parameters of pg_set_attribute_stats intentionaly mirror the columns in the view pg_stats, with the ANYARRAY types casted to TEXT. Those values will be cast to arrays of the basetype of the attribute, and that operation may fail if the attribute type has changed. The statistics imported by pg_set_attribute_stats are imported transactionally like any other operation. However, pg_set_relation_stats does it's update in-place, which is to say non-transactionally. This is in line with what ANALYZE does to avoid table bloat in pg_class. These functions also allows for tweaking of table statistics in-place, allowing the user to inflate rowcounts, skew histograms, etc, to see what those changes will evoke from the query planner. 
--- src/include/catalog/pg_proc.dat | 15 + src/include/statistics/statistics.h | 2 + src/backend/statistics/Makefile | 3 +- src/backend/statistics/meson.build| 1 + src/backend/statistics/statistics.c | 521 ++ .../regress/expected/stats_export_import.out | 211 +++ src/test/regress/parallel_schedule| 2 +- src/test/regress/sql/stats_export_import.sql | 188 +++ doc/src/sgml/func.sgml| 83 +++ 9 files changed, 1024 insertions(+), 2 deletions(-) create mode 100644 src/backend/statistics/statistics.c create mode 100644 src/test/regress/expected/stats_export_import.out create mode 100644 src/test/regress/sql/stats_export_import.sql diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 700f7daf7b..3070d72d7a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12170,4 +12170,19 @@ proargtypes => 'int2', prosrc => 'gist_stratnum_identity' }, +# Statistics Import +{ oid => '8048', + descr => 'set statistics on relation', + proname => 'pg_set_relation_stats', provolatile => 'v', proisstrict => 't', + proparallel => 'u', prorettype => 'bool', + proargtypes => 'oid float4 int4 int4', + proargnames => '{relation,reltuples,relpages,relallvisible}', + prosrc => 'pg_set_relation_stats' }, +{ oid => '8049', + descr => 'set statistics on attribute', + proname => 'pg_set_attribute_stats
Re: Statistics Import and Export
> From testrun/pg_dump/002_pg_dump/log/regress_log_002_pg_dump, search
> for the "not ok" and then look at what it tried to do right before
> that. I see:
>
> pg_dump: error: prepared statement failed: ERROR: syntax error at or
> near "%"
> LINE 1: ..._histogram => %L::real[]) coalesce($2, format('%I.%I',
> a.nsp...

Thanks. Unfamiliar turf for me.

> > All those changes are available in the patches attached.
>
> How about if you provided "get" versions of the functions that return a
> set of rows that match what the "set" versions expect? That would make
> 0001 essentially a complete feature itself.

That's tricky. At the base level, those functions would just be an encapsulation of "SELECT * FROM pg_stats WHERE schemaname = $1 AND tablename = $2", which isn't all that much of a savings. Perhaps we can make the documentation more explicit about the source and nature of the parameters going into the pg_set_ functions.

Per conversation, it would be trivial to add helper functions that replace the parameters after the initial oid with a pg_class or pg_stats rowtype, dissect the values needed, and call the more complex function:

pg_set_relation_stats( oid, pg_class )
pg_set_attribute_stats( oid, pg_stats )

> I think it would also make the changes in pg_dump simpler, and the
> tests in 0001 a lot simpler.

I agree. The tests are currently showing that a fidelity copy can be made from one table to another, but to do so we have to conceal the actual stats values, because those are 1. not deterministic/known and 2. subject to change from version to version. I can add some calls that set arbitrary values, as was done for pg_set_relation_stats().
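If the rowtype-based wrappers floated above existed, an export query could hand pg_stats rows straight to them. A purely hypothetical sketch (neither the wrapper signature nor the table names below exist in the posted patches):

SELECT pg_set_attribute_stats('public.test_target'::regclass, s)
FROM pg_stats AS s
WHERE s.schemaname = 'public'
  AND s.tablename = 'test_source';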
Re: Statistics Import and Export
v11 attached. - TAP tests passing (the big glitch was that indexes that are used in constraints should have their stats dependent on the constraint, not the index, thanks Jeff) - The new range-specific statistics types are now supported. I'm not happy with the typid machinations I do to get them to work, but it is working so far. These are stored out-of-stakind-order (7 before 6), which is odd because all other types seem store stakinds in ascending order. It shouldn't matter, it was just odd. - regression tests now make simpler calls with arbitrary stats to demonstrate the function usage more cleanly - pg_set_*_stats function now have all of their parameters in the same order as the table/view they pull from From 5c63ed5748eb3817d193b64329b57dc590e0196e Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Mon, 11 Mar 2024 14:18:39 -0400 Subject: [PATCH v11 1/2] Create pg_set_relation_stats, pg_set_attribute_stats. These functions will be used by pg_dump/restore and pg_upgrade to convey relation and attribute statistics from the source database to the target. This would be done instead of vacuumdb --analyze-in-stages. Both functions take an oid to identify the target relation that will receive the statistics. There is nothing requiring that relation to be the same one as the one exported, though the statistics would have to make sense in the context of the new relation. The parameters of pg_set_attribute_stats intentionaly mirror the columns in the view pg_stats, with the ANYARRAY types casted to TEXT. Those values will be cast to arrays of the basetype of the attribute, and that operation may fail if the attribute type has changed. The statistics imported by pg_set_attribute_stats are imported transactionally like any other operation. However, pg_set_relation_stats does it's update in-place, which is to say non-transactionally. This is in line with what ANALYZE does to avoid table bloat in pg_class. These functions also allows for tweaking of table statistics in-place, allowing the user to inflate rowcounts, skew histograms, etc, to see what those changes will evoke from the query planner. 
--- src/include/catalog/pg_proc.dat | 15 + src/include/statistics/statistics.h | 2 + src/backend/statistics/Makefile | 3 +- src/backend/statistics/meson.build| 1 + src/backend/statistics/statistics.c | 603 ++ .../regress/expected/stats_export_import.out | 283 src/test/regress/parallel_schedule| 2 +- src/test/regress/sql/stats_export_import.sql | 246 +++ doc/src/sgml/func.sgml| 91 +++ 9 files changed, 1244 insertions(+), 2 deletions(-) create mode 100644 src/backend/statistics/statistics.c create mode 100644 src/test/regress/expected/stats_export_import.out create mode 100644 src/test/regress/sql/stats_export_import.sql diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 177d81a891..f31412d4a6 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12177,4 +12177,19 @@ proargtypes => 'int2', prosrc => 'gist_stratnum_identity' }, +# Statistics Import +{ oid => '8048', + descr => 'set statistics on relation', + proname => 'pg_set_relation_stats', provolatile => 'v', proisstrict => 't', + proparallel => 'u', prorettype => 'bool', + proargtypes => 'oid int4 float4 int4', + proargnames => '{relation,relpages,reltuples,relallvisible}', + prosrc => 'pg_set_relation_stats' }, +{ oid => '8049', + descr => 'set statistics on attribute', + proname => 'pg_set_attribute_stats', provolatile => 'v', proisstrict => 'f', + proparallel => 'u', prorettype => 'bool', + proargtypes => 'oid name bool float4 int4 float4 text _float4 text float4 text _float4 _float4 text float4 text', + proargnames => '{relation,attname,inherited,null_frac,avg_width,n_distinct,most_common_vals,most_common_freqs,histogram_bounds,correlation,most_common_elems,most_common_elem_freqs,elem_count_histogram,range_length_histogram,range_empty_frac,range_bounds_histogram}', + prosrc => 'pg_set_attribute_stats' }, ] diff --git a/src/include/statistics/statistics.h b/src/include/statistics/statistics.h index 7f2bf18716..1dddf96576 100644 --- a/src/include/statistics/statistics.h +++ b/src/include/statistics/statistics.h @@ -127,4 +127,6 @@ extern StatisticExtInfo *choose_best_statistics(List *stats, char requiredkind, int nclauses); extern HeapTuple statext_expressions_load(Oid stxoid, bool inh, int idx); +extern Datum pg_set_relation_stats(PG_FUNCTION_ARGS); +extern Datum pg_set_attribute_stats(PG_FUNCTION_ARGS
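To make the call pattern concrete, here is a hedged example against the v11 signatures above; the table public.test and its integer column id are invented, and the statistic values are arbitrary, in the same spirit as the simplified regression tests:

SELECT pg_catalog.pg_set_relation_stats(
           'public.test'::regclass,
           17,        -- relpages
           400.0,     -- reltuples
           16);       -- relallvisible

SELECT pg_catalog.pg_set_attribute_stats(
           'public.test'::regclass,
           'id',                       -- attname
           false,                      -- inherited
           0.0,                        -- null_frac
           4,                          -- avg_width
           10.0,                       -- n_distinct
           '{1,2,3}',                  -- most_common_vals
           '{0.4,0.3,0.2}'::real[],    -- most_common_freqs
           '{4,5,6,7,8,9,10}',         -- histogram_bounds
           0.5,                        -- correlation
           NULL, NULL, NULL,           -- most_common_elems, most_common_elem_freqs, elem_count_histogram
           NULL, NULL, NULL);          -- range_length_histogram, range_empty_frac, range_bounds_histogram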
Re: Statistics Import and Export
On Thu, Mar 21, 2024 at 2:29 AM Jeff Davis wrote: > On Tue, 2024-03-19 at 05:16 -0400, Corey Huinker wrote: > > v11 attached. > > Thank you. > > Comments on 0001: > > This test: > >+SELECT >+format('SELECT pg_catalog.pg_set_attribute_stats( ' >... > > seems misplaced. It's generating SQL that can be used to restore or > copy the stats -- that seems like the job of pg_dump, and shouldn't be > tested within the plain SQL regression tests. > Fair enough. > > And can the other tests use pg_stats rather than pg_statistic? > They can, but part of what I wanted to show was that the values that aren't directly passed in as parameters (staopN, stacollN) get set to the correct values, and those values aren't guaranteed to match across databases, hence testing them in the regression test rather than in a TAP test. I'd still like to be able to test that. > > The function signature for pg_set_attribute_stats could be more > friendly -- how about there are a few required parameters, and then it > only sets the stats that are provided and the other ones are either > left to the existing value or get some reasonable default? > That would be problematic. 1. We'd have to compare the stats provided against the stats that are already there, make that list in-memory, and then re-order what remains 2. There would be no way to un-set statistics of a given stakind, unless we added an "actually set it null" boolean for each parameter that can be null. 3. I tried that with the JSON formats, it made the code even messier than it already was. Make sure all error paths ReleaseSysCache(). > +1 > > Why are you calling checkCanModifyRelation() twice? > Once for the relation itself, and once for pg_statistic. > I'm confused about when the function should return false and when it > should throw an error. I'm inclined to think the return type should be > void and all failures should be reported as ERROR. > I go back and forth on that. I can see making it void and returning an error for everything that we currently return false for, but if we do that, then a statement with one pg_set_relation_stats, and N pg_set_attribute_stats (which we lump together in one command for the locking benefits and atomic transaction) would fail entirely if one of the set_attributes named a column that we had dropped. It's up for debate whether that's the right behavior or not. replaces[] is initialized to {true}, which means only the first element > is initialized to true. Try following the pattern in AlterDatabase (or > similar) which reads the catalog tuple first, then updates a few fields > selectively, setting the corresponding element of replaces[] along the > way. > +1. > > The test also sets the most_common_freqs in an ascending order, which > is weird. > I pulled most of the hardcoded values from pg_stats itself. The sample set is trivially small, and the values inserted were in-order-ish. So maybe that's why. > Relatedly, I got worried recently about the idea of plain users > updating statistics. In theory, that should be fine, and the planner > should be robust to whatever pg_statistic contains; but in practice > there's some risk of mischief there until everyone understands that the > contents of pg_stats should not be trusted. Fortunately I didn't find > any planner crashes or even errors after a brief test. > Maybe we could have the functions restricted to a role or roles: 1. pg_write_all_stats (can modify stats on ANY table) 2. 
pg_write_own_stats (can modify stats on tables owned by user)

I'm iffy on the need for the first one; I list it first purely to show how I derived the name for the second.

> One thing we can do is some extra validation for consistency, like
> checking that the arrays are properly sorted, check for negative
> numbers in the wrong place, or fractions larger than 1.0, etc.

+1. All suggestions of validation checks welcome.
Re: Statistics Import and Export
> > How about just some defaults then? Many of them have a reasonable > default, like NULL or an empty array. Some are parallel arrays and > either both should be specified or neither (e.g. > most_common_vals+most_common_freqs), but you can check for that. > +1 Default NULL has been implemented for all parameters after n_distinct. > > > > Why are you calling checkCanModifyRelation() twice? > > > > Once for the relation itself, and once for pg_statistic. > > Nobody has the privileges to modify pg_statistic except superuser, > right? I thought the point of a privilege check is that users could > modify statistics for their own tables, or the tables they maintain. > In which case wouldn't the checkCanModify on pg_statistic would be a proxy for is_superuser/has_special_role_we_havent_created_yet. > > > > > I can see making it void and returning an error for everything that > > we currently return false for, but if we do that, then a statement > > with one pg_set_relation_stats, and N pg_set_attribute_stats (which > > we lump together in one command for the locking benefits and atomic > > transaction) would fail entirely if one of the set_attributes named a > > column that we had dropped. It's up for debate whether that's the > > right behavior or not. > > I'd probably make the dropped column a WARNING with a message like > "skipping dropped column whatever". Regardless, have some kind of > explanatory comment. > That's certainly do-able. > > > > > I pulled most of the hardcoded values from pg_stats itself. The > > sample set is trivially small, and the values inserted were in-order- > > ish. So maybe that's why. > > In my simple test, most_common_freqs is descending: > >CREATE TABLE a(i int); >INSERT INTO a VALUES(1); >INSERT INTO a VALUES(2); >INSERT INTO a VALUES(2); >INSERT INTO a VALUES(3); >INSERT INTO a VALUES(3); >INSERT INTO a VALUES(3); >INSERT INTO a VALUES(4); >INSERT INTO a VALUES(4); >INSERT INTO a VALUES(4); >INSERT INTO a VALUES(4); >ANALYZE a; >SELECT most_common_vals, most_common_freqs > FROM pg_stats WHERE tablename='a'; > most_common_vals | most_common_freqs >--+--- > {4,3,2} | {0.4,0.3,0.2} >(1 row) > > Can you show an example where it's not? > Not off hand, no. > > > > > Maybe we could have the functions restricted to a role or roles: > > > > 1. pg_write_all_stats (can modify stats on ANY table) > > 2. pg_write_own_stats (can modify stats on tables owned by user) > > If we go that route, we are giving up on the ability for users to > restore stats on their own tables. Let's just be careful about > validating data to mitigate this risk. > A great many test cases coming in the next patch.
Re: Statistics Import and Export
> But ideally we'd just make it safe to dump and reload stats on your own
> tables, and then not worry about it.

That is my strong preference, yes.

> > Not off hand, no.
>
> To me it seems like inconsistent data to have most_common_freqs in
> anything but descending order, and we should prevent it.

Sorry, I misunderstood; I thought we were talking about the values, not the frequencies. Yes, the frequencies should be monotonically non-increasing (i.e. they can go down or stay flat from N to N+1). I'll add a test case for that.
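A sketch of what such a check might look like inside pg_set_attribute_stats(), once the frequencies have been deconstructed into a float4 array; the function and variable names here are placeholders, not the patch's actual code:

/*
 * Consistency check: most_common_freqs must be monotonically
 * non-increasing, mirroring what ANALYZE itself produces.
 */
static void
check_mcv_freq_order(const float4 *freqs, int nfreqs)
{
	for (int i = 1; i < nfreqs; i++)
	{
		if (freqs[i] > freqs[i - 1])
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
					 errmsg("most_common_freqs must be in non-increasing order")));
	}
}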
Re: Statistics Import and Export
> > > > +\gexec > > Why do we need to construct the command and execute? Can we instead > execute the function directly? That would also avoid ECHO magic. > We don't strictly need it, but I've found the set-difference operation to be incredibly useful in diagnosing problems. Additionally, the values are subject to change due to changes in test data, no guarantee that the output of ANALYZE is deterministic, etc. But most of all, because the test cares about the correct copying of values, not the values themselves. > > + > +Database Object Statistics Import Functions > + > + > + > + > +Function > + > + > +Description > + > + > + > > COMMENT: The functions throw many validation errors. Do we want to list > the acceptable/unacceptable input values in the documentation corresponding > to those? I don't expect one line per argument validation. Something like > "these, these and these arguments can not be NULL" or "both arguments in > each of the pairs x and y, a and b, and c and d should be non-NULL or NULL > respectively". > Yes. It should. > Statistics are about data. Whenever pg_dump dumps some filtered data, the > statistics collected for the whole table are uselss. We should avoide > dumping > statistics in such a case. E.g. when only schema is dumped what good is > statistics? Similarly the statistics on a partitioned table may not be > useful > if some its partitions are not dumped. Said that dumping statistics on > foreign > table makes sense since they do not contain data but the statistics still > makes sense. > Good points, but I'm not immediately sure how to enforce those rules. > > >> >> Key areas where I'm seeking feedback: >> >> - What level of errors in a restore will a user tolerate, and what should >> be done to the error messages to indicate that the data itself is fine, but >> a manual operation to update stats on that particular table is now >> warranted? >> - To what degree could pg_restore/pg_upgrade take that recovery action >> automatically? >> - Should the individual attribute/class set function calls be grouped by >> relation, so that they all succeed/fail together, or should they be called >> separately, each able to succeed or fail on their own? >> - Any other concerns about how to best use these new functions. >> >> >> > Whether or not I pass --no-statistics, there is no difference in the dump > output. Am I missing something? > $ pg_dump -d postgres > /tmp/dump_no_arguments.out > $ pg_dump -d postgres --no-statistics > /tmp/dump_no_statistics.out > $ diff /tmp/dump_no_arguments.out /tmp/dump_no_statistics.out > $ > > IIUC, pg_dump includes statistics by default. That means all our pg_dump > related tests will have statistics output by default. That's good since the > functionality will always be tested. 1. We need additional tests to ensure > that the statistics is installed after restore. 2. Some of those tests > compare dumps before and after restore. In case the statistics is changed > because of auto-analyze happening post-restore, these tests will fail. > +1 > I believe, in order to import statistics through IMPORT FOREIGN SCHEMA, > postgresImportForeignSchema() will need to add SELECT commands invoking > pg_set_relation_stats() on each imported table and pg_set_attribute_stats() > on each of its attribute. Am I right? Do we want to make that happen in the > first cut of the feature? How do you expect these functions to be used to > update statistics of foreign tables? > I don't think there's time to get it into this release. 
I think we'd want to extend this functionality to both IMPORT FOREIGN SCHEMA and ANALYZE for foreign tables, in both cases with a server/table option to do regular remote sampling. Either way, they'd issue a remote query very similar to what pg_dump does (hence putting it in fe_utils), with some filters on which columns/tables it believes it can trust. The remote table might itself be a view (in which case the query would turn up nothing), or column data types may change across the wire, and in those cases we'd have to fall back to sampling.
Re: Statistics Import and Export
> > 1) The docs say this: > > >The purpose of this function is to apply statistics values in an >upgrade situation that are "good enough" for system operation until >they are replaced by the next ANALYZE, usually via >autovacuum This function is used by >pg_upgrade and pg_restore to >convey the statistics from the old system version into the new one. > > > I find this a bit confusing, considering the pg_dump/pg_restore changes > are only in 0002, not in this patch. > True, I'll split the docs. > > 2) Also, I'm not sure about this: > > relation, the parameters in this are all > derived from pg_stats, and the values > given are most often extracted from there. > > How do we know where do the values come from "most often"? I mean, where > else would it come from? > The next most likely sources would be 1. stats from another similar table and 2. the imagination of a user testing hypothetical query plans. > > 3) The function pg_set_attribute_stats() is very long - 1000 lines > or so, that's way too many for me to think about. I agree the flow is > pretty simple, but I still wonder if there's a way to maybe split it > into some smaller "meaningful" steps. > I wrestle with that myself. I think there's some pieces that can be factored out. > 4) It took me *ages* to realize the enums at the beginning of some of > the functions are actually indexes of arguments in PG_FUNCTION_ARGS. > That'd surely deserve a comment explaining this. > My apologies, it definitely deserves a comment. > > 5) The comment for param_names in pg_set_attribute_stats says this: > > /* names of columns that cannot be null */ > const char *param_names[] = { ... } > > but isn't that actually incorrect? I think that applies only to a couple > initial arguments, but then other fields (MCV, mcelem stats, ...) can be > NULL, right? > Yes, that is vestigial, I'll remove it. > > 6) There's a couple minor whitespace fixes or comments etc. > > > 0002 > > > 1) I don't understand why we have exportExtStatsSupported(). Seems > pointless - nothing calls it, even if it did we don't know how to export > the stats. > It's not strictly necessary. > > 2) I think this condition in dumpTableSchema() is actually incorrect: > > IMO that's wrong, the statistics should be delayed to the post-data > section. Which probably means there needs to be a separate dumpable > object for statistics on table/index, with a dependency on the object. > Good points. > > 3) I don't like the "STATS IMPORT" description. For extended statistics > we dump the definition as "STATISTICS" so why to shorten it to "STATS" > here? And "IMPORT" seems more like the process of loading data, not the > data itself. So I suggest "STATISTICS DATA". > +1
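For what it's worth, the argument-position enum pattern mentioned in point 4 looks roughly like this; a stripped-down sketch in which the enum and variable names are illustrative, not the actual patch code:

/* Positions of the SQL-level arguments in PG_FUNCTION_ARGS. */
enum relation_stats_argnum
{
	RELATION_ARG = 0,
	RELPAGES_ARG,
	RELTUPLES_ARG,
	RELALLVISIBLE_ARG,
	NUM_RELATION_STATS_ARGS
};

Datum
pg_set_relation_stats(PG_FUNCTION_ARGS)
{
	Oid			reloid = PG_GETARG_OID(RELATION_ARG);
	int32		relpages = PG_GETARG_INT32(RELPAGES_ARG);
	float4		reltuples = PG_GETARG_FLOAT4(RELTUPLES_ARG);
	int32		relallvisible = PG_GETARG_INT32(RELALLVISIBLE_ARG);

	/* ... in-place update of pg_class, as described upthread ... */
	PG_RETURN_BOOL(true);
}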
Re: Statistics Import and Export
> > > There's still a failure in the pg_upgrade TAP test. One seems to be > ordering, so perhaps we need to ORDER BY the attribute number. Others > seem to be missing relstats and I'm not sure why yet. I suggest doing > some manual pg_upgrade tests and comparing the before/after dumps to > see if you can reproduce a smaller version of the problem. > That's fixed in my current working version, as is a tsvector-specific issue. Working on the TAP issue.
Re: Statistics Import and Export
On Fri, Mar 29, 2024 at 7:34 PM Jeff Davis wrote:

> On Fri, 2024-03-29 at 18:02 -0400, Stephen Frost wrote:
> > I’d certainly think “with stats” would be the preferred default of
> > our users.
>
> I'm concerned there could still be paths that lead to an error. For
> pg_restore, or when loading a SQL file, a single error isn't fatal
> (unless -e is specified), but it still could be somewhat scary to see
> errors during a reload.

To that end, I'm going to be modifying the "Optimizer statistics are not transferred by pg_upgrade..." message when stats _were_ transferred, with additional instructions that the user should treat any stats-ish error messages encountered as a reason to manually analyze that table. We should probably say something about extended stats as well.
Re: Statistics Import and Export
> > (I've not read the patch yet, but I assume the switch works like > other pg_dump filters in that you can apply it on the restore > side?) > Correct. It follows the existing --no-something pattern.
Re: Statistics Import and Export
> > I'm getting late into this discussion and I apologize if I've missed this > being discussed before. But. > > Please don't. > > That will make it *really* hard for any form of automation or drivers of > this. The information needs to go somewhere where such tools can easily > consume it, and an informational message during runtime (which is also > likely to be translated in many environments) is the exact opposite of that. > That makes a lot of sense. I'm not sure what form it would take (file, pseudo-table, something else?). Open to suggestions.
Re: Statistics Import and Export
> I didn't have any specific proposal in mind, was just trying to think
> outside the box.

What if we added a separate section, SECTION_STATISTICS, which is run following post-data?
Re: Statistics Import and Export
> That will make it *really* hard for any form of automation or drivers of
> this. The information needs to go somewhere where such tools can easily
> consume it, and an informational message during runtime (which is also
> likely to be translated in many environments) is the exact opposite of that.

Having given this some thought, I'd be inclined to create a view, pg_stats_missing, with the same security barrier as pg_stats, but looking for tables that lack stats on at least one column, or lack stats on an extended statistics object. Table structure would be:

    schemaname  name
    tablename   name
    attnames    text[]
    ext_stats   text[]

The informational message, if it changes at all, could reference this new view as the place to learn about how well the stats import went.

vacuumdb might get a --missing-only option in addition to --analyze-in-stages. For that matter, we could add --analyze-missing options to pg_restore and pg_upgrade to do the mopping up themselves.
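A rough sketch of how the pg_stats_missing view described above might be defined; this is only to show the shape, and it ignores extended statistics and the finer points of the pg_stats privilege checks:

CREATE VIEW pg_stats_missing WITH (security_barrier) AS
SELECT n.nspname AS schemaname,
       c.relname AS tablename,
       array_agg(a.attname ORDER BY a.attnum) AS attnames
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = c.oid
                   AND a.attnum > 0
                   AND NOT a.attisdropped
LEFT JOIN pg_statistic s ON s.starelid = c.oid
                        AND s.staattnum = a.attnum
WHERE c.relkind IN ('r', 'm')
  AND s.starelid IS NULL
GROUP BY n.nspname, c.relname;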
Re: Statistics Import and Export
On Sun, Mar 31, 2024 at 2:41 PM Tom Lane wrote: > Corey Huinker writes: > > Having given this some thought, I'd be inclined to create a view, > > pg_stats_missing, with the same security barrier as pg_stats, but looking > > for tables that lack stats on at least one column, or lack stats on an > > extended statistics object. > > The week before feature freeze is no time to be designing something > like that, unless you've abandoned all hope of getting this into v17. > It was a response to the suggestion that there be some way for tools/automation to read the status of stats. I would view it as a separate patch, as such a view would be useful now for knowing which tables to ANALYZE, regardless of whether this patch goes in or not. > There's a bigger issue though: AFAICS this patch set does nothing > about dumping extended statistics. I surely don't want to hold up > the patch insisting that that has to happen before we can commit the > functionality proposed here. But we cannot rip out pg_upgrade's > support for post-upgrade ANALYZE processing before we do something > about extended statistics, and that means it's premature to be > designing any changes to how that works. So I'd set that whole > topic on the back burner. > So Extended Stats _were_ supported by earlier versions where the medium of communication was JSON. However, there were several problems with adapting that to the current model where we match params to stat types: * Several of the column types do not have functional input functions, so we must construct the data structure internally and pass them to statext_store(). * The output functions for some of those column types have lists of attnums, with negative values representing positional expressions in the stat definition. This information is not translatable to another system without also passing along the attnum/attname mapping of the source system. At least three people told me "nobody uses extended stats" and to just drop that from the initial version. Unhappy with this assessment, I inquired as to whether my employer (AWS) had some internal databases that used extended stats so that I could get good test data, and came up with nothing, nor did anyone know of customers who used the feature. So when the fourth person told me that nobody uses extended stats, and not to let a rarely-used feature get in the way of a feature that would benefit nearly 100% of users, I dropped it. > It's possible that we could drop the analyze-in-stages recommendation, > figuring that this functionality will get people to the > able-to-limp-along level immediately and that all that is needed is a > single mop-up ANALYZE pass. But I think we should leave that till we > have a bit more than zero field experience with this feature. It may be that we leave the recommendation exactly as it is. Perhaps we enhance the error messages in pg_set_*_stats() to indicate what command would remediate the issue.