Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2019-01-02 Thread Mark
Hi Tom,

Thanks for your reply.

Am I correct in my understanding that any row that has been modified (i.e.
UPDATE) is in state HEAPTUPLE_INSERT_IN_PROGRESS so it will not be included
in the sample?

I'm going to rework the application so there is less time between the
DELETE and the COMMIT so I will only see the problem if ANALYZE runs during
this smaller time window. Looks like your patch will help if this happens.

Then again, it also seems no one has had a problem with its current
behaviour (for 11 years!).

Thanks,

Mark

On Wed, 2 Jan 2019 at 16:11 Tom Lane  wrote:

> Mark  writes:
> > I have a long running job which deletes all of the common_student table
> and
> > then repopulates it. It takes long time to load all the other data and
> > commit the transaction. I didn't think the delete inside the transaction
> > would have any effect until it is commited or rolled back.
> > I will have to rewrite the application so it updates the existing rows
> > rather than deleting all and then inserting.
>
> Hmm ... I'm not sure if that will actually make things better.  The root
> of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:
>
>  * We count delete-in-progress rows as still live, using
>  * the same reasoning given above; but we don't bother to
>  * include them in the sample.
>
> The "reasoning given above" is a reference to what happens for
> INSERT_IN_PROGRESS tuples:
>
>  * Insert-in-progress rows are not counted.  We assume
>  * that when the inserting transaction commits or aborts,
>  * it will send a stats message to increment the proper
>  * count.  This works right only if that transaction ends
>  * after we finish analyzing the table; if things happen
>  * in the other order, its stats update will be
>  * overwritten by ours.  However, the error will be large
>  * only if the other transaction runs long enough to
>  * insert many tuples, so assuming it will finish after us
>  * is the safer option.
>
> Now the problem with this, from your perspective, is that *neither*
> INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
> ANALYZE's data sample.  So a long-running update transaction will
> cause all the rows it changes to be excluded from the sample until
> commit.  This seems like a bad thing, and it definitely means that
> adjusting your app as you're contemplating won't help.
>
> In order to handle the bulk-update scenario sanely, it seems like
> we ought to allow one of INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples
> to be included.  And it looks like, for consistency with the row-counting
> logic, the one that's included needs to be DELETE_IN_PROGRESS.  This
> is a slightly annoying conclusion, because it means we're accumulating
> stats that we know are likely to soon be obsolete, but I think sampling
> INSERT_IN_PROGRESS tuples instead would lead to very strange results
> in some cases.
>
> In short, I think we want to do this to the DELETE_IN_PROGRESS logic:
>
> if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
> deadrows += 1;
> else
> +   {
> +   sample_it = true;
> liverows += 1;
> +   }
>
> with suitable adjustment of the adjacent comment.
>
> Thoughts?
>
> regards, tom lane
>


Re: Time to drop plpython2?

2022-02-21 Thread Mark Wong
On Sat, Feb 19, 2022 at 08:22:29AM -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-02-19 02:00:28 +0000, Mark Wong wrote:
> > On Fri, Feb 18, 2022 at 02:41:04PM -0800, Andres Freund wrote:
> > > There's snapper ("pgbf [ a t ] twiska.com"), and there's Mark Wong's large
> > > menagerie. Mark said yesterday that he's working on updating.
> > 
> > I've made one pass.  Hopefully I didn't make any mistakes. :)
> 
> Unfortunately it looks like it wasn't quite enough. All, or nearly all, your
> animals that ran since still seem to be failing in the same spot...
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gadwall&dt=2022-02-19%2011%3A22%3A48
> 
> checking Python.h usability... no
> checking Python.h presence... no
> checking for Python.h... no
> configure: error: header file <Python.h> is required for Python
> 
> 
> For that machine (and the other debian based ones) the relevant package likely
> is python3-dev.
> 
> For the Red Hat and Suse ones, it's likely python3-devel.
> 
> 
> I've wondered before if it's worth maintaining a list of packages for
> dependencies for at least the more popular distros. It's annoying to have to
> figure it out every time one needs to test something.
> 
> 
> FWIW, here's the recipe I just used to verify the packages necessary for
> Python.h to be found:
> 
> $ podman run --rm -it opensuse/leap
> # zypper install -y python3
> # ls -l $(python3 -c "import sysconfig; print(sysconfig.get_config_var('INCLUDEPY'))")/Python.h
> 
> # zypper install -y python3-devel
> # ls -l $(python3 -c "import sysconfig; print(sysconfig.get_config_var('INCLUDEPY'))")/Python.h
> -rw-r--r-- 1 root root 3221 Jan  4 14:04 /usr/include/python3.6m/Python.h
> 
> (Wow, zypper repos are expensive to refresh. And I thought dnf was slow doing
> so, compared to apt.)

Oops, made another pass for python3 dev libraries.

I can't seem to find archived ppc repos for OpenSUSE Leap 43.2.  I'm
debating whether to disable python or upgrade/rebrand that animal for a
newer SUSE release.  I've stopped my cron jobs on this animal for the
time being.

Regards,
Mark




Re: Time to drop plpython2?

2022-02-22 Thread Mark Wong
On Mon, Feb 21, 2022 at 03:28:35PM -0500, Tom Lane wrote:
> Mark Wong  writes:
> > On Sat, Feb 19, 2022 at 08:22:29AM -0800, Andres Freund wrote:
> >> Unfortunately it looks like it wasn't quite enough. All, or nearly all, your
> >> animals that ran since still seem to be failing in the same spot...
> 
> > Oops, made another pass for python3 dev libraries.
> 
> You might need to do one more thing, which is manually blow away the cache
> files under $BUILDFARM/accache.  For example, on demoiselle everything
> looks fine in HEAD, but the back branches are failing like this:
> 
> checking for python... (cached) /usr/bin/python
> ./configure: line 10334: /usr/bin/python: No such file or directory
> configure: using python 
> ./configure: line 10342: test: : integer expression expected
> checking for Python sysconfig module... no
> configure: error: sysconfig module not found
> 
> Very recent versions of the buildfarm script will discard accache
> automatically after a configure or make failure, but I think the
> REL_11 you're running here doesn't have that defense.  It'll only
> flush accache after a change in the configure script in git.

Take 3. :)

I've upgraded everyone to the v14 buildfarm scripts and made sure the
--test passed on HEAD on each one.  So I hopefully didn't miss any
(other than the one EOL OpenSUSE version that I will plan on upgrading.)

Regards,
Mark




Re: Time to drop plpython2?

2022-02-23 Thread Mark Wong
On Tue, Feb 22, 2022 at 08:50:07PM -0500, Tom Lane wrote:
> Mark Wong  writes:
> > Take 3. :)
> 
> > I've upgraded everyone to the v14 buildfarm scripts and made sure the
> > --test passed on HEAD on each one.  So I hopefully didn't miss any
> > (other than the one EOL OpenSUSE version that I will plan on upgrading.)
> 
> Thanks!
> 
> However ... it seems like most of your animals haven't run at all
> in the last couple of days.  Did you maybe forget to re-enable
> their cron jobs after messing with them, or something like that?

Uh oh, more user error.  I tested run_build but run_branches wasn't
happy.  I'll start working through that...

Regards,
Mark




Re: Time to drop plpython2?

2022-02-24 Thread Mark Wong
On Wed, Feb 23, 2022 at 07:59:01AM -0800, Mark Wong wrote:
> On Tue, Feb 22, 2022 at 08:50:07PM -0500, Tom Lane wrote:
> > Mark Wong  writes:
> > > Take 3. :)
> > 
> > > I've upgraded everyone to the v14 buildfarm scripts and made sure the
> > > --test passed on HEAD on each one.  So I hopefully didn't miss any
> > > (other than the one EOL OpenSUSE version that I will plan on upgrading.)
> > 
> > Thanks!
> > 
> > However ... it seems like most of your animals haven't run at all
> > in the last couple of days.  Did you maybe forget to re-enable
> > their cron jobs after messing with them, or something like that?
> 
> Uh oh, more user error.  I tested run_build but run_branches wasn't
> happy.  I'll start working through that...

I think I have most of them operational again.  I see some animals are
still failing on some branches, but still better overall?

I discovered that clang for gadwall and pintail got purged when I
removed python2, so I disabled those two animals.

Regards,
Mark




real/float example for testlibpq3

2022-02-24 Thread Mark Wong
Hi everyone,

Would adding additional examples to testlibpq3.c on handling more data
types be helpful?  I've attached a patch showing how to handle a REAL in
the current example.

Regards,
Mark
diff --git a/src/test/examples/testlibpq3.c b/src/test/examples/testlibpq3.c
index 4f7b791388..0c1739fcbb 100644
--- a/src/test/examples/testlibpq3.c
+++ b/src/test/examples/testlibpq3.c
@@ -11,19 +11,21 @@
  * CREATE SCHEMA testlibpq3;
  * SET search_path = testlibpq3;
  * SET standard_conforming_strings = ON;
- * CREATE TABLE test1 (i int4, t text, b bytea);
- * INSERT INTO test1 values (1, 'joe''s place', '\000\001\002\003\004');
- * INSERT INTO test1 values (2, 'ho there', '\004\003\002\001\000');
+ * CREATE TABLE test1 (i int4, r real, t text, b bytea);
+ * INSERT INTO test1 values (1, 2.3, 'joe''s place', '\000\001\002\003\004');
+ * INSERT INTO test1 values (2, 4.5, 'ho there', '\004\003\002\001\000');
  *
  * The expected output is:
  *
  * tuple 0: got
  * i = (4 bytes) 1
+ * r = (4 bytes) 2.3
  * t = (11 bytes) 'joe's place'
  * b = (5 bytes) \000\001\002\003\004
  *
 * tuple 1: got
  * i = (4 bytes) 2
+ * r = (4 bytes) 4.5
  * t = (8 bytes) 'ho there'
  * b = (5 bytes) \004\003\002\001\000
  */
@@ -62,32 +64,41 @@ show_binary_results(PGresult *res)
int i,
j;
int i_fnum,
+   r_fnum,
t_fnum,
b_fnum;
 
/* Use PQfnumber to avoid assumptions about field order in result */
i_fnum = PQfnumber(res, "i");
+   r_fnum = PQfnumber(res, "r");
t_fnum = PQfnumber(res, "t");
b_fnum = PQfnumber(res, "b");
 
for (i = 0; i < PQntuples(res); i++)
{
char   *iptr;
+   char   *rptr;
char   *tptr;
char   *bptr;
int blen;
int ival;
+   union {
+   int ival;
+   float   fval;
+   }   rval;
 
/* Get the field values (we ignore possibility they are null!) */
iptr = PQgetvalue(res, i, i_fnum);
+   rptr = PQgetvalue(res, i, r_fnum);
tptr = PQgetvalue(res, i, t_fnum);
bptr = PQgetvalue(res, i, b_fnum);
 
/*
-* The binary representation of INT4 is in network byte order, which
-* we'd better coerce to the local byte order.
+* The binary representation of INT4 and REAL are in network byte
+* order, which we'd better coerce to the local byte order.
 */
ival = ntohl(*((uint32_t *) iptr));
+   rval.ival = ntohl(*((uint32_t *) rptr));
 
/*
 * The binary representation of TEXT is, well, text, and since libpq
@@ -102,6 +113,8 @@ show_binary_results(PGresult *res)
printf("tuple %d: got\n", i);
printf(" i = (%d bytes) %d\n",
   PQgetlength(res, i, i_fnum), ival);
+   printf(" r = (%d bytes) %f\n",
+  PQgetlength(res, i, r_fnum), rval.fval);
printf(" t = (%d bytes) '%s'\n",
   PQgetlength(res, i, t_fnum), tptr);
printf(" b = (%d bytes) ", blen);
diff --git a/src/test/examples/testlibpq3.sql b/src/test/examples/testlibpq3.sql
index 35a95ca347..e3a70cec27 100644
--- a/src/test/examples/testlibpq3.sql
+++ b/src/test/examples/testlibpq3.sql
@@ -1,6 +1,6 @@
 CREATE SCHEMA testlibpq3;
 SET search_path = testlibpq3;
 SET standard_conforming_strings = ON;
-CREATE TABLE test1 (i int4, t text, b bytea);
-INSERT INTO test1 values (1, 'joe''s place', '\000\001\002\003\004');
-INSERT INTO test1 values (2, 'ho there', '\004\003\002\001\000');
+CREATE TABLE test1 (i int4, r real, t text, b bytea);
+INSERT INTO test1 values (1, 2.3, 'joe''s place', '\000\001\002\003\004');
+INSERT INTO test1 values (2, 4.5, 'ho there', '\004\003\002\001\000');


Re: trigger example for plsample

2022-03-02 Thread Mark Wong
On Fri, Feb 25, 2022 at 06:39:39PM +, Chapman Flack wrote:
> This patch is straightforward, does what it says, and passes the tests.
> 
> Regarding the duplication of code between plsample_func_handler and
> plsample_trigger_handler, perhaps that's for the best for now, as 3554 in
> the same commitfest also touches plsample, so merge conflicts may be
> minimized by not doing more invasive refactoring.
> 
> That would leave low-hanging fruit for a later patch that could refactor
> plsample to reduce the duplication (maybe adding a validator at the same
> time? That would also duplicate some of the checks in the existing handlers.)
> 
> I am not sure that structuring the trigger handler with separate compile and
> execute steps is worth the effort for a simple example like plsample. The main
> plsample_func_handler is not so structured.
> 
> It's likely that many real PLs will have some notion of compilation separate from
> execution. But those will also have logic to do the compilation only once, and
> somewhere to cache the result of that for reuse across calls, and those kinds of
> details might make plsample's basic skeleton more complex than needed.
> 
> I know that in just looking at expected/plsample.out, I was a little distracted by
> seeing multiple "compile" messages for the same trigger function in the same
> session and wondering why that was.
> 
> So maybe it would be simpler and less distracting to assume that the PL targeted
> by plsample is one that just has a simple interpreter that works from the source text.

I've attached v2, which reduces the output:

* Removed the notices for the text body and the "compile" message.
* Replaced the "compile" notice with a comment as a placeholder for
  where compiling code or checking a cache might go.
* Reduced the number of rows inserted into the table, thus reducing
  the number of notice messages about which code path is taken.


I think that reduces the repetitiveness of the output...

Regards,
Mark
diff --git a/src/test/modules/plsample/expected/plsample.out b/src/test/modules/plsample/expected/plsample.out
index a0c318b6df..a7912c7897 100644
--- a/src/test/modules/plsample/expected/plsample.out
+++ b/src/test/modules/plsample/expected/plsample.out
@@ -6,9 +6,6 @@ AS $$
   Example of source with text result.
 $$ LANGUAGE plsample;
 SELECT plsample_result_text(1.23, 'abc', '{4, 5, 6}');
-NOTICE:  source text of function "plsample_result_text": 
-  Example of source with text result.
-
 NOTICE:  argument: 0; name: a1; value: 1.23
 NOTICE:  argument: 1; name: a2; value: abc
 NOTICE:  argument: 2; name: a3; value: {4,5,6}
@@ -25,12 +22,57 @@ AS $$
   Example of source with void result.
 $$ LANGUAGE plsample;
 SELECT plsample_result_void('{foo, bar, hoge}');
-NOTICE:  source text of function "plsample_result_void": 
-  Example of source with void result.
-
 NOTICE:  argument: 0; name: a1; value: {foo,bar,hoge}
  plsample_result_void 
 --
  
 (1 row)
 
+CREATE FUNCTION my_trigger_func() RETURNS trigger AS $$
+if TD_event == "INSERT"
+return TD_NEW
+elseif TD_event == "UPDATE"
+return TD_NEW
+else
+return "OK"
+end
+$$ language plsample;
+CREATE TABLE my_table (num integer, description text);
+CREATE TRIGGER my_trigger_func BEFORE INSERT OR UPDATE ON my_table
+   FOR EACH ROW EXECUTE FUNCTION my_trigger_func();
+CREATE TRIGGER my_trigger_func2 AFTER INSERT OR UPDATE ON my_table
+   FOR EACH ROW EXECUTE FUNCTION my_trigger_func(8);
+INSERT INTO my_table (num, description)
+VALUES (1, 'first');
+NOTICE:  trigger name: my_trigger_func
+NOTICE:  trigger relation: my_table
+NOTICE:  trigger relation schema: public
+NOTICE:  triggered by INSERT
+NOTICE:  triggered BEFORE
+NOTICE:  triggered per row
+NOTICE:  trigger name: my_trigger_func2
+NOTICE:  trigger relation: my_table
+NOTICE:  trigger relation schema: public
+NOTICE:  triggered by INSERT
+NOTICE:  triggered AFTER
+NOTICE:  triggered per row
+NOTICE:  trigger arg[0]: 8
+UPDATE my_table
+SET description = 'first, modified once'
+WHERE num = 1;
+NOTICE:  trigger name: my_trigger_func
+NOTICE:  trigger relation: my_table
+NOTICE:  trigger relation schema: public
+NOTICE:  triggered by UPDATE
+NOTICE:  triggered BEFORE
+NOTICE:  triggered per row
+NOTICE:  trigger name: my_trigger_func2
+NOTICE:  trigger relation: my_table
+NOTICE:  trigger relation schema: public
+NOTICE:  triggered by UPDATE
+NOTICE:  triggered AFTER
+NOTICE:  triggered per row
+NOTICE:  trigger arg[0]: 8
+DROP TRIGGER my_trigger_func ON my_table;
+DROP TRIGGER my_trigger_func2 ON my_table;
+DROP FUNCTION my_trigger_func;
diff --git a/src/test/modules/plsample/plsample.c b/src/test/modules/plsample/plsample.c
index 6fc33c72

Re: Granting SET and ALTER SYSTEM privileges for GUCs

2022-03-06 Thread Mark Dilger



> On Mar 6, 2022, at 2:13 PM, Tom Lane  wrote:
> 
> 1. If we need to change these two contrib modules, doesn't that imply
> a lot of changes forced on external modules as well?  What are the
> security implications if somebody doesn't make such a change?
> 
> 2. It looks to me like if someone installs the updated postgres_fdw.so,
> but doesn't run ALTER EXTENSION UPDATE, they are going to be left with a
> rather broken extension.  This seems not good either, especially if it's
> multiplied by a boatload of third-party extensions requiring updates.
> 
> So I think some more thought is needed to see if we can't avoid
> the need to touch extension modules in this patch.  Or at least
> avoid the need for synchronized C-level and SQL-level updates,
> because that is going to create a lot of pain for end users.
> 
> I'm also fairly distressed by the number of changes in guc.c,
> mainly because I fear that that means that pending patches that
> add GUC variables will be subtly incorrect/insecure if they're not
> updated to account for this.  Frankly, I also reject the apparent
> position that we need to support forbidding users from touching
> many of these GUCs.  Or, if that's our position, why are there
> per-GUC changes at all, rather than just redefining what the
> context values mean?  (That is, why not redefine USERSET and
> SUSET as simply indicating the default ACL to be applied if there's
> no entry in the catalog.)

To my knowledge, there is no mechanism to revoke an implicit privilege.  You 
can revoke a privilege explicitly listed in an aclitem[], but only if the 
privilege is being tracked that way.

Userset variables are implicitly settable by any user.  There was a request, 
off-list as I recall, to make it possible to revoke the privilege to set 
variables such as "work_mem".  To make that possible, but not change the 
default behavior vis-a-vis prior releases, I upgraded most userset variables to 
suset with a corresponding grant to public on the variable.  Sites which wish 
to have a more restrictive policy on such variables can revoke that privilege 
from public and instead issue more restrictive grants.  There were a few 
variables where such treatment didn't seem sensible, such as ones to do with 
client connections, and I left them alone.  I didn't insist on a defense for 
why any particular setting needed to be revocable in order to apply this 
treatment.  My assumption was that sites should be allowed to determine their 
own security policies per setting unless there is a technical difficulty for a 
given setting that would make it overly burdensome to implement.

It seemed more complete to do the same thing for contrib modules, so I did.

If a third-party module doesn't do the equivalent operation, they would simply 
continue with their userset variables working as before.  I don't see a 
specific security concern with that, though I'd be happy to consider arguments 
to the contrary.

I believe there are also some issues with this patch relating to 
pg_dump/pg_restore and to pg_upgrade, which EDB is working to address.  We 
intend to repost something soon.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Granting SET and ALTER SYSTEM privileges for GUCs

2022-03-06 Thread Mark Dilger



> On Mar 6, 2022, at 2:57 PM, Tom Lane  wrote:
> 
> I don't think this is materially different from what we do with
> permissions on (say) functions.  If you want to revoke the public
> SET privilege on some USERSET variable, you instantiate the default
> and then revoke.  You end up with an empty ACL stored in pg_setting_acl,
> and voila.

I assume you mean the implementation of REVOKE does this, not that the user 
needs to do both a grant and a revoke.

> It'd likely be necessary to refuse to record a grant/revoke on
> an unknown GUC, since if we don't know the GUC then we can't know
> what the relevant default ACL ought to be.  But I bet your existing
> patch has some dubious behavior in that case too.

The existing patch allows grants on unknown GUCs, because it can't know what 
GUC an upgrade script will introduce, and the grant statement may need to 
execute before the GUC exists.  That opens a window for granting privileges on 
non-existent GUCs.  That sounds bad, but I don't know of any actual harm that 
it does, beyond just being ugly.

With your proposal, it sounds like we could avoid that ugliness, so I'm 
inclined to at least try doing it as you propose.

Thanks for the suggestion!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-07 Thread Mark Dilger



> On Mar 7, 2022, at 10:28 AM, Tom Lane  wrote:
> 
> Does anything interesting break if you do just take it out?

 SET SESSION AUTHORIZATION regress_priv_group2;
 GRANT regress_priv_group2 TO regress_priv_user5; -- ok: a role can self-admin
-NOTICE:  role "regress_priv_user5" is already a member of role "regress_priv_group2"
+ERROR:  must have admin option on role "regress_priv_group2"

This test failure is just a manifestation of the intended change, but assuming 
we make no other changes, the error message would clearly need to be updated, 
because it suggests the role should have admin_option on itself, a situation 
which is not currently supported.

Perhaps we should support that, though, by adding a reflexive aclitem[] to 
pg_authid (meaning it tracks which privileges a role has on itself) with 
tracking of who granted it, so that revocation can be handled properly.  The 
aclitem could start out null, meaning the role has by default the traditional 
limited self-admin which the code comments discuss:

/*
 * A role can admin itself when it matches the session user and we're
 * outside any security-restricted operation, SECURITY DEFINER or
 * similar context.  SQL-standard roles cannot self-admin.  However,
 * SQL-standard users are distinct from roles, and they are not
 * grantable like roles: PostgreSQL's role-user duality extends the
 * standard.  Checking for a session user match has the effect of
 * letting a role self-admin only when it's conspicuously behaving
 * like a user.  Note that allowing self-admin under a mere SET ROLE
 * would make WITH ADMIN OPTION largely irrelevant; any member could
 * SET ROLE to issue the otherwise-forbidden command.
 *
 * Withholding self-admin in a security-restricted operation prevents
 * object owners from harnessing the session user identity during
 * administrative maintenance.  Suppose Alice owns a database, has
 * issued "GRANT alice TO bob", and runs a daily ANALYZE.  Bob creates
 * an alice-owned SECURITY DEFINER function that issues "REVOKE alice
 * FROM carol".  If he creates an expression index calling that
 * function, Alice will attempt the REVOKE during each ANALYZE.
 * Checking InSecurityRestrictedOperation() thwarts that attack.
 *
 * Withholding self-admin in SECURITY DEFINER functions makes their
 * behavior independent of the calling user.  There's no security or
 * SQL-standard-conformance need for that restriction, though.
 *
 * A role cannot have actual WITH ADMIN OPTION on itself, because that
 * would imply a membership loop.  Therefore, we're done either way.
 */

For non-null aclitem[], we could support REVOKE ADMIN OPTION FOR joe FROM joe, 
and for explicit re-grants, we could track who granted it, such that further 
revocations could properly refuse if the revoker doesn't have sufficient 
privileges vis-a-vis the role that granted it in the first place.

I have not yet tried to implement this, and might quickly hit problems with the 
idea, but will take a stab at a proof-of-concept patch unless you suggest a 
better approach.

Thoughts?


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-07 Thread Mark Dilger



> On Mar 7, 2022, at 12:01 PM, Robert Haas  wrote:
> 
> It's been pointed out upthread that this would have undesirable
> security implications, because the admin option would be inherited,
> and the implicit permission isn't.

Right, but with a reflexive self-admin-option, we could document that it works 
in a non-inherited way.  We'd just be saying the current hard-coded behavior is 
an option which can be revoked rather than something you're stuck with.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-07 Thread Mark Dilger



> On Mar 7, 2022, at 12:03 PM, Mark Dilger  wrote:
> 
> Right, but with a reflexive self-admin-option, we could document that it 
> works in a non-inherited way.  We'd just be saying the current hard-coded 
> behavior is an option which can be revoked rather than something you're stuck 
> with.

We could also say that the default is to not have admin option on yourself, 
with that being something grantable, but that is a larger change from the 
historical behavior and might have more consequences for dump/restore, etc.

My concern about just nuking self-admin is that there may be sites which use 
self-admin and we'd be leaving them without a simple work-around after upgrade, 
because they couldn't restore the behavior by executing a grant.  They'd have 
to more fundamentally restructure their role relationships to not depend on 
self-admin, something which might be harder for them to do.  Perhaps nobody is 
using self-admin, or very few people are using it, and I'm being overly 
concerned.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-07 Thread Mark Dilger


> On Mar 7, 2022, at 12:16 PM, Tom Lane  wrote:
> 
> What would be
> lost if we drop it?

I looked into this a bit.  Removing that bit of code, the only regression test 
changes for "check-world" are the expected ones, with nothing else breaking.  
Running installcheck+pg_upgrade to the patched version of HEAD from each of 
versions 11, 12, 13 and 14 doesn't turn up anything untoward.  The change I 
used (for reference) is attached:



v1-0001-WIP-Removing-role-self-administration.patch.WIP
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: role self-revocation

2022-03-10 Thread Mark Dilger



> On Mar 10, 2022, at 7:56 AM, David G. Johnston  wrote:
> 
> 
> I want that because I want mini-superusers, where alice can administer
> the users that alice creates just as if she were a superuser,
> including having their permissions implicitly and dropping them when
> she wants them gone, but where alice cannot break out to the operating
> system as a true superuser could do.
> 
> CREATEROLE (once the pg_* with admin rules are fixed) + Ownership and rules 
> restricting interfering with another role's objects (unless superuser) seems 
> to handle this.

What if one of alice's subordinates also owns roles?  Can alice interfere with 
*that* role's objects?  I don't see that a simple rule restricting roles from 
interfering with another role's objects is quite enough.  That raises the 
question of whether role ownership is transitive, and whether we need a concept 
similar to inherit/noinherit for ownership.

There is also the problem that CREATEROLE currently allows a set of privileges 
to be granted to created roles, and that set of privileges is hard-coded.  
You've suggested changing the hard-coded rules to remove pg_* roles from the 
list of grantable privileges, but that's still an inflexible set of hardcoded 
privileges.  Wouldn't it make more sense for the grantor to need GRANT OPTION 
on any privilege they give to roles they create?

> the bot can stand up
> accounts, can grant them membership in a defined set of groups, but
> cannot exercise the privileges of those accounts (or hack superuser
> either).
> 
> The bot should be provided a security definer procedure that encapsulates all 
> of this rather than us trying to hack the permission system.  This isn't a 
> user permission concern, it is an unauthorized privilege escalation concern.  
> Anyone with the bot's credentials can trivially overcome the third 
> restriction by creating a role with the desired membership and then logging 
> in as that role - and there is nothing the system can do to prevent that 
> while also allowing the other two permissions.

Doesn't this assume password authentication?  If the server uses ldap 
authentication, for example, wouldn't the bot need valid ldap credentials for 
at least one user for this attack to work?  And if CREATEROLE has been made 
more configurable, wouldn't the bot only be able to grant that ldap user the 
limited set of privileges that the bot's database user has been granted ADMIN 
OPTION for?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-10 Thread Mark Dilger



> On Mar 10, 2022, at 2:01 PM, Robert Haas  wrote:
> 
> It sounds like you prefer a behavior where CREATEROLE gives power over
> all non-superusers, but that seems pretty limiting to me. Why can't
> someone want to create a user with power over some users but not
> others?

I agree with Robert on this.

Over at [1], I introduced a patch series to (a) change CREATEROLE and (b) 
introduce role ownership.  Part (a) wasn't that controversial.  The patch 
series failed to make it for postgres 15 on account of (b).  The patch didn't 
go quite far enough, but with it applied, this is an example of a mini-superuser 
"lord" operating within database "fiefdom":

fiefdom=# -- mini-superuser who can create roles and write all data
fiefdom=# CREATE ROLE lord
fiefdom-# WITH CREATEROLE
fiefdom-# IN ROLE pg_write_all_data;
CREATE ROLE
fiefdom=# 
fiefdom=# -- group which "lord" belongs to
fiefdom=# CREATE GROUP squire
fiefdom-# ROLE lord;
CREATE ROLE
fiefdom=# 
fiefdom=# -- group which "lord" has no connection to
fiefdom=# CREATE GROUP paladin;
CREATE ROLE
fiefdom=# 
fiefdom=# SET SESSION AUTHORIZATION lord;
SET
fiefdom=> 
fiefdom=> -- fail, merely a member of "squire"
fiefdom=> CREATE ROLE peon IN ROLE squire;
ERROR:  must have admin option on role "squire"
fiefdom=> 
fiefdom=> -- fail, no privilege to grant CREATEDB 
fiefdom=> CREATE ROLE peon CREATEDB;
ERROR:  must have createdb privilege to create createdb users
fiefdom=> 
fiefdom=> RESET SESSION AUTHORIZATION;
RESET
fiefdom=# 
fiefdom=# -- grant admin over "squire" to "lord"
fiefdom=# GRANT squire
fiefdom-# TO lord
fiefdom-# WITH ADMIN OPTION;
GRANT ROLE
fiefdom=# 
fiefdom=# SET SESSION AUTHORIZATION lord;
SET
fiefdom=> 
fiefdom=> -- ok, have both "CREATEROLE" and admin option for "squire"
fiefdom=> CREATE ROLE peon IN ROLE squire;
CREATE ROLE
fiefdom=> 
fiefdom=> -- fail, no privilege to grant CREATEDB
fiefdom=> CREATE ROLE peasant CREATEDB IN ROLE squire;
ERROR:  must have createdb privilege to create createdb users
fiefdom=> 
fiefdom=> RESET SESSION AUTHORIZATION;
RESET
fiefdom=# 
fiefdom=# -- Give lord the missing privilege
fiefdom=# GRANT CREATEDB TO lord;
ERROR:  role "createdb" does not exist
fiefdom=# 
fiefdom=# RESET SESSION AUTHORIZATION;
RESET
fiefdom=# 
fiefdom=# -- ok, have "CREATEROLE", "CREATEDB", and admin option for "squire"
fiefdom=# CREATE ROLE peasant CREATEDB IN ROLE squire;
CREATE ROLE

The problem with this is that "lord" needs CREATEDB to grant CREATEDB, but 
really it should need something like grant option on "CREATEDB".  But that's 
hard to do with the existing system, given the way these privilege bits are 
represented.  If we added a few more built-in pg_* roles, such as pg_create_db, 
it would just work.  CREATEROLE itself could be reimagined as pg_create_role, 
and then users could be granted into this role with or without admin option, 
meaning they could/couldn't further give it away.  I think that would be a 
necessary component to Joshua's "bot" use-case, since the bot must itself have 
the privilege to create roles, but shouldn't necessarily be trusted with the 
privilege to create additional roles who have it.

[1] 
https://www.postgresql.org/message-id/53c7df4c-8463-4647-9dfd-779b5e186...@amazon.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-11 Thread Mark Dilger



> On Mar 11, 2022, at 7:58 AM, Robert Haas  wrote:
> 
> This kind of thing is always a judgement call. If we were talking
> about breaking 'SELECT * from table', I'm sure it would be hard to
> convince anybody to agree to do that at all, let alone with no
> deprecation period. Fortunately, CREATEROLE is less used, so breaking
> it will inconvenience fewer people.

This issue of how much backwards compatibility breakage we're willing to 
tolerate is just as important as questions about how we would want roles to 
work in a green-field development project.  The sense I got a year ago, on this 
list, was that changing CREATEROLE was acceptable, but changing other parts of 
the system, such as how ADMIN OPTION works, would go too far.

Role ownership did not yet exist, and that was a big motivation in introducing 
that concept, because you couldn't credibly say it broke other existing 
features.  It introduces the new notion that when a superuser creates a role, 
the superuser owns it, which is identical to how things implicitly work today; 
and when a CREATEROLE non-superuser creates a role, that role owns the new 
role, which is different from how it works today, arguably breaking 
CREATEROLE's prior behavior.  *But it doesn't break anything else*.

If we're going to change how ADMIN OPTION works, or how role membership works, 
or how inherit/noinherit works, let's first be clear that we are willing to 
accept whatever backwards incompatibility that entails.  This is not a 
green-field development project.  The constant spinning around with regard to 
how much compatibility we need to preserve is giving me vertigo.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-11 Thread Mark Dilger



> On Mar 11, 2022, at 8:48 AM, Stephen Frost  wrote:
> 
> I agree that it would have an impact on backwards compatibility to
> change how WITH ADMIN works- but it would also get us more in line with
> what the SQL standard says for how WITH ADMIN is supposed to work and
> that seems worth the change to me.

I'm fine with giving up some backwards compatibility to get some SQL standard 
compatibility, as long as we're clear that is what we're doing.  What you say 
about the SQL spec isn't great, though, because too much power is vested in 
"ADMIN".  I see "ADMIN" as at least three separate privileges together.  Maybe 
it would be spec compliant to implement "ADMIN" as a synonym for a set of 
separate privileges? 

> On Mar 11, 2022, at 8:41 AM, Stephen Frost  wrote:
> 
> That we aren't discussing the issues with the current GRANT ... WITH
> ADMIN OPTION and how we deviate from what the spec calls for when it
> comes to DROP ROLE, which seems to be the largest thing that's
> 'solved' with this ownership concept, is concerning to me.

Sure, let's discuss that a bit more.  Here is my best interpretation of your 
post about the spec, when applied to postgres with an eye towards not doing any 
more damage than necessary:

> On Mar 10, 2022, at 11:58 AM, Stephen Frost  wrote:
> 
> let's look at what the spec says:
> 
> CREATE ROLE
>  - Who is allowed to run CREATE ROLE is implementation-defined

This should be anyone with membership in pg_create_role.  

>  - After creation, this is effictively run:
>GRANT new_role TO creator_role WITH ADMIN, GRANTOR "_SYSTEM"

This should internally be implemented as three separate privileges, one which 
means you can grant the role, another which means you can drop the role, and a 
third that means you're a member of the role.  That way, they can be 
independently granted and revoked.  We could make "WITH ADMIN" a short-hand for 
"WITH G, D, M" where G, D, and M are whatever we name the independent 
privileges Grant, Drop, and Member-of.
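
As a sketch of how the decomposed syntax might read (hypothetical; none of these option names exist in postgres today):

```sql
-- Hypothetical syntax: ADMIN decomposed into Grant (G), Drop (D),
-- and Member-of (M) privileges on a role.

GRANT squire TO lord WITH GRANT OPTION;   -- G: lord may grant squire to others
GRANT squire TO lord WITH DROP OPTION;    -- D: lord may drop role squire
GRANT squire TO lord;                     -- M: lord is a member of squire

-- "WITH ADMIN OPTION" remains as shorthand for all three:
GRANT squire TO lord WITH ADMIN OPTION;   -- equivalent to G + D + M

-- The "bot" case: G and D without M, so the bot can manage the role
-- without being able to exercise the role's own privileges.
GRANT squire TO bot WITH GRANT OPTION, DROP OPTION;
```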

Splitting G and D helps with backwards compatibility, because it gives people 
who want the traditional postgres "admin" a way to get there, by granting 
"G+M".  Splitting M from G and D makes it simpler to implement the "bot" idea, 
since the bot shouldn't have M.  But it does raise a question about always 
granting G+D+M to the creator, since the bot is the creator and we don't want 
the bot to have M.  This isn't a problem I've invented from thin air, mind you, 
as G+D+M is just the definition of ADMIN per the SQL spec, if I've understood 
you correctly.  So we need to think a bit more about the pg_create_role 
built-in role and whether that needs to be further refined to distinguish those 
who can get membership in roles they create vs. those who cannot.  This line of 
reasoning takes me in the direction of what I think you were calling #5 
upthread, but you'd have to elaborate on that, and how it interacts with the 
spec, for us to have a useful conversation about it.

> DROP ROLE
>  - Any user who has been GRANT'd a role with ADMIN option is able to
>DROP that role.

Change this to "Any role who has D on the role".  That's spec compliant, 
because anyone granted ADMIN necessarily has D.

> GRANT ROLE
>  - No cycles allowed
>  - A role must have ADMIN rights on the role to be able to GRANT it to
>another role.

Change this to "Any role who has G on the role".  That's spec compliant, 
because anyone granted ADMIN necessarily has G.

We should also fix the CREATE ROLE command to require the grantor have G on a 
role in order to give it to the new role as part of the command.  Changing the 
CREATEROLE, CREATEDB, REPLICATION, and BYPASSRLS attributes into 
pg_create_role, pg_create_db, pg_replication, and pg_bypassrls, the creator 
could only give them to the created role if the creator has G on the roles.  If 
we do this, we could keep the historical privilege bits and their syntax 
support for backward compatibility, or we could rip them out, but the decision 
between those two options is independent of the rest of the design.
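
A sketch of what that would mean for CREATE ROLE (again hypothetical syntax, with pg_replication standing in for the REPLICATION attribute bit):

```sql
-- Hypothetical: the creator must hold G on an attribute role in
-- order to confer it at CREATE ROLE time.

SET SESSION AUTHORIZATION lord;

-- fails unless "lord" holds G on pg_replication
CREATE ROLE standby_mgr IN ROLE pg_replication;

-- would succeed after a superuser first runs:
--   GRANT pg_replication TO lord WITH ADMIN OPTION;
```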

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-11 Thread Mark Dilger



> On Mar 11, 2022, at 2:46 PM, Stephen Frost  wrote:
> 
> I do think that’s reasonable … and believe I suggested it about 3 messages 
> ago in this thread. ;)  (step #3 I think it was?  Or maybe 4).

Yes, and you mentioned it to me off-list.

I'm soliciting a more concrete specification for what you are proposing.  To 
me, that means understanding how the SQL spec behavior that you champion 
translates into specific changes.  You specified some of this in steps #1 
through #5, but I'd like a clearer indication of how many of those (#1 alone, 
both #1 and #2, or what?) constitute a competing idea to the idea of role 
ownership, and greater detail about how each of those steps translate into 
specific behavior changes in postgres.  Your initial five-step email seems to 
be claiming that #1 by itself is competitive, but to me it seems at least #1 
and #2 would be required.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-11 Thread Mark Dilger



> On Mar 11, 2022, at 4:56 PM, Stephen Frost  wrote:
> 
> First … I outlined a fair bit of further description in the message you just 
> responded to but neglected to include in your response, which strikes me as 
> odd that you’re now asking for further explanation.



>  When it comes to completing the idea of role ownership- I didn’t come up 
> with that idea nor champion it

Sorry, not "completing", but "competing".  It seems we're discussing different 
ways to fix how roles and CREATEROLE work, and we have several ideas competing 
against each other.  (This differs from *people* competing against each other, 
as I don't necessarily like the patch I wrote better than I like your idea.)

> and therefore I’m not really sure how many of the steps are required to fully 
> support that concept..?

There are problems that the ownership concept solves.  I strongly suspect that 
your proposal could also solve those same problems, and I'm just trying to 
identify the specific portions of your proposal necessary to do so.

>  For my part, I would think that those steps necessary to satisfy the spec 
> would get us pretty darn close to what true folks advocating for role 
> ownership are asking for

I have little idea what "true folks" means in this context.  As for "advocating 
for role ownership", I'm not in that group.  Whether role ownership or 
something else, I just want some solution to a set of problems, mostly to do 
with needing superuser to do role management tasks.

> , but that doesn’t include the superuser-only alter role attributes piece.  
> Is that included in role ownership?  I wouldn’t think so, but some might 
> argue otherwise, and I don’t know that it is actually useful to divert into a 
> discussion about what is or isn’t in that.

Introducing the idea of role ownership doesn't fix that.  But a patch which 
introduces role ownership is useless unless CREATEROLE is also fixed.  There 
isn't any point having non-superusers create and own roles if, to do so, they 
need a privilege which can break into superuser.  But that argument is no 
different with a patch along the lines of what you are proposing.  CREATEROLE 
needs fixing either way.

> If we agree that the role attribute bits are independent

Yes.

> then I think I agree that we need 1 and 2 to get the capabilities that the 
> folks asking for role ownership want

Yes.

> as 2 is where we make sure that one admin of a role can’t revoke another 
> admin’s rights over that role.

Exactly, so #2 is part of the competing proposal.  (I get the sense you might 
not see these as competing proposals, but I find that framing useful for 
deciding which approach to pursue.)

>  Perhaps 2 isn’t strictly necessary in a carefully managed environment where 
> no one else is given admin rights over the mini-superuser roles, but I’d 
> rather not have folks depending on that.

I think it is necessary, and for the reason you say.

>  I’d still push back though and ask those advocating for this if it meets 
> what they’re asking for.  I got the impression that it did but maybe I 
> misunderstood.
> 
> In terms of exactly how things would work with these changes… I thought I 
> explained it pretty clearly, so it’s kind of hard to answer that further 
> without a specific question to answer.  Did you have something specific in 
> mind?  Perhaps I could answer a specific question and provide more clarity 
> that way.

Your emails contained a lot of "we could do this or that depending on what 
people want, and maybe this other thing, but that isn't really necessary, and 
..." which left me unclear on the proposal.  I don't mean to disparage your 
communication style; it's just that when trying to distill technical details, 
high level conversation can be hard to grok.

I have the sense that you aren't going to submit a patch, so I wanted this 
thread to contain enough detail for somebody else to do so.  Thanks.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-14 Thread Mark Dilger



> On Mar 14, 2022, at 7:38 AM, Stephen Frost  wrote:
> 
> So ... do you feel like that's now the case?  Or were you looking for
> more?

I don't have any more questions at the moment.  Thanks!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg14 psql broke \d datname.nspname.relname

2022-03-15 Thread Mark Dilger



> On Mar 15, 2022, at 12:27 PM, Robert Haas  wrote:
> 
> - Justin Pryzby, who originally discovered the problem, prefers the
> same behavior that I prefer long-term, but thinks Tom's behavior is
> better than doing nothing.
> - Mark Dilger, Isaac Moreland, Garick Hamlin, Alvaro Herrera, and
> Julien Rouhaud have commented on the thread but have not endorsed
> either of these dueling proposals.

I vote in favor of committing the patch, though I'd also say it's not super 
important to me.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Granting SET and ALTER SYSTEM privileges for GUCs

2022-03-16 Thread Mark Dilger



> On Mar 16, 2022, at 11:47 AM, Tom Lane  wrote:
> 
> Stepping back a bit ... do we really want to institutionalize the
> term "setting" for GUC variables?  I realize that the view pg_settings
> exists, but the documentation generally prefers the term "configuration
> parameters".  Where config.sgml uses "setting" as a noun, it's usually
> talking about a specific concrete value for a parameter, and you can
> argue that the view's name comports with that meaning.  But you can't
> GRANT a parameter's current value.
> 
> I don't have a better name to offer offhand --- I surely am not proposing
> that we change the syntax to be "GRANT ... ON CONFIGURATION PARAMETER x",
> because that's way too wordy.  But now is the time to bikeshed if we're
> gonna bikeshed, or else admit that we're not interested in precise
> vocabulary.

Informally, we often use "GUC" on this list, but that isn't used formally, 
leaving "configuration parameter" and "setting" as the two obvious choices.  I 
preferred "configuration parameter" originally and was argued out of it.  My 
take on "setting" was also that it more naturally refers to the choice of 
setting, not the thing being set, such that "work_mem = 8192" means the 
configuration parameter "work_mem" has the setting "8192".  I'm happy to change 
the patch along these lines, but I vaguely recall it being Robert who liked the 
"setting" language.  Robert?  (Sorry if I misremember that...)

> I'm also fairly allergic to the way that this patch has decided to assign
> multi-word names to privilege types (ie SET VALUE, ALTER SYSTEM).  There
> is no existing precedent for that, and I think it's going to break
> client-side code that we don't need to break.  It's not coincidental that
> this forces weird changes in rules about whitespace in the has_privilege
> functions, for example; and if you think that isn't going to cause
> problems I think you are wrong.  Perhaps we could just use "SET" and
> "ALTER", or "SET" and "SYSTEM"?

I chose those particular multi-word names to avoid needing to promote any 
keywords.  That's been a while ago, and I don't recall exactly where the 
shift-reduce or reduce-reduce errors were coming from.  I'll play with it to 
see what I can find.

> I also agree with the upthread criticism that this is abusing the
> ObjectPostAlterHook API unreasonably much.  In particular, it's
> trying to use pg_setting_acl OIDs as identities for GUCs, which
> they are not, first because they're unstable and second because
> most GUCs won't even have an entry there.  I therefore judge the
> hook calls added to ExecSetVariableStmt and AlterSystemSetConfigFile
> to be 100% useless, in fact probably counterproductive because they
> introduce a boatload of worries about whether the right things happen
> if the hook errors out or does something guc.c isn't expecting.

I think Joshua was planning to use these hooks for security purposes.  The 
hooks are supposed to check whether the Oid is valid, and if not, still be able 
to make choices based on the other information.  Joshua, any comment on this?

> I suggest that what might be saner is to consider that the "objects"
> that the hook calls are concerned with are the pg_setting_acl entries,
> not the underlying GUCs, and thus that the hooks need be invoked only
> when creating, destroying or altering those entries.

That's certainly a thing we could do, but I got the impression that Joshua 
wanted to hook into SET, RESET, and ALTER SYSTEM SET, not merely into GRANT and 
REVOKE.

>  If we do have
> a need for a hook editorializing on GUC value settings, that would
> have to be an independent API --- but I have heard no calls for
> the ability to have such a hook, and I don't think that this patch
> is the place to introduce one.

Well, the call for it was in this thread, but I'm ok with yanking out that part 
of the patch if it seems half-baked.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Granting SET and ALTER SYSTEM privileges for GUCs

2022-03-17 Thread Mark Dilger



> On Mar 17, 2022, at 9:04 AM, Robert Haas  wrote:
> 
> not just in terms of telling other people how they have to write
> their patches.

...

>  the burden of maintaining relatively little-used features
> should fall entirely on people who don't care about them.

Joshua helped test the DAC portion of this patch, and answered a number of my 
questions on the topic, including in-person back in December.  I take your 
point, Robert, on the general principle, but the archives should reflect that 
Joshua did contribute to this specific patch.

Joshua, should we drop the DAC portion for postgres 15 and revisit the issue 
for postgres 16?  I think it's getting late in the development cycle to attempt 
what Tom described upthread, and I'd hate to see the rest of this patch punted.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Granting SET and ALTER SYSTEM privileges for GUCs

2022-03-17 Thread Mark Dilger



> On Mar 17, 2022, at 9:29 AM, Joshua Brindle  
> wrote:
> 
> I hope this patch can be rolled
> into the contribution from Mark.

Working on it.  Thanks for the patch!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







New Object Access Type hooks

2022-03-17 Thread Mark Dilger
Hackers,

Over in [1], Joshua proposed a new set of Object Access Type hooks based on 
strings rather than Oids.

His patch was written to be applied atop my patch for granting privileges on 
gucs.

On review of his patch, I became uncomfortable with the complete lack of 
regression test coverage.  To be fair, he did paste a bit of testing logic to 
the thread, but it appears to be based on pgaudit, and it is unclear how to 
include such a test in the core project, where pgaudit is not assumed to be 
installed.

First, I refactored his patch to work against HEAD and not depend on my GUCs 
patch.  Find that as v1-0001.  The refactoring exposed a bit of a problem.  To 
call the new hook for SET and ALTER SYSTEM commands, I need to pass in the Oid 
of a catalog table.  But since my GUC patch isn't applied yet, there isn't any 
such table (pg_setting_acl or whatnot) to pass.  So I'm passing InvalidOid, but 
I don't know if that is right.  In any event, if we want a new API like this, 
we should think a bit harder about whether it can be used to check operations 
where no table Oid is applicable.

Second, I added a new test directory, src/test/modules/test_oat_hooks, which 
includes a new loadable module with hook implementations and a regression test 
for testing the object access hooks.  The main point of the test is to log 
which hooks get called in which order, and which hooks do or do not get called 
when other hooks allow or deny access.  That information shows up in the 
expected output as NOTICE messages.

This second patch has gotten a little long, and I'd like another pair of eyes 
on this before spending a second day on the effort.  Please note that this is a 
quick WIP patch in response to the patch Joshua posted earlier today.  Sorry 
for sometimes missing function comments, etc.  The goal, if this design seems 
acceptable, is to polish this, hopefully with Joshua's assistance, and get it 
committed *before* my GUCs patch, so that my patch can be rebased to use it.  
Otherwise, if this is rejected, I can continue on the GUC patch without this.

(FYI, I got a test failure from src/test/recovery/t/013_crash_restart.pl when 
testing v1-0001.  I'm not sure yet what that is about.)



v1-0001-Add-String-object-access-hooks.patch
Description: Binary data


v1-0002-Add-regression-tests-of-Object-Access-Type-hooks.patch
Description: Binary data


[1] 
https://www.postgresql.org/message-id/flat/664799.1647456444%40sss.pgh.pa.us#c9721c2da88d59684ac7ac5fc36f09c1

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: New Object Access Type hooks

2022-03-18 Thread Mark Dilger



> On Mar 18, 2022, at 7:16 AM, Joshua Brindle  
> wrote:
> 
> This is great, thank you for doing this. I didn't even realize the OAT
> hooks had no regression tests.
> 
> It looks good to me, I reviewed both and tested the module. I wonder
> if the slight abuse of subid is warranted with brand new hooks going
> in but not enough to object, I just hope this doesn't rise to the too
> large to merge this late level.

The majority of the patch is regression testing code, stuff which doesn't get 
installed.  It's even marked as NO_INSTALLCHECK, so it won't run even as part 
of "make installcheck".  That seems safe enough to me.

Not including tests of OAT seems worse, as it leaves us open to breaking the 
behavior without realizing we've done so.  A refactoring of the core code might 
cause hooks to be called in a different order, something which isn't 
necessarily wrong, but should not be done unknowingly.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: New Object Access Type hooks

2022-03-21 Thread Mark Dilger


> On Mar 18, 2022, at 3:04 PM, Andrew Dunstan  wrote:
> 
> I haven't looked at it in detail, but regarding the test code I'm not
> sure why there's a .control file, since this isn't a loadable extension,
> not why there's a test_oat_hooks.h file.

The .control file exists because the test defines a loadable module which 
defines the hooks.  The test_oat_hooks.h file was extraneous, and has been 
removed in v2.



v2-0001-Add-String-object-access-hooks.patch
Description: Binary data


v2-0002-Add-regression-tests-of-Object-Access-Type-hooks.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: New Object Access Type hooks

2022-03-21 Thread Mark Dilger



> On Mar 21, 2022, at 8:41 AM, Andrew Dunstan  wrote:
> 
> My first inclination is to say it's probably ok. The immediately obvious
> alternative would be to create yet another set of functions that don't
> have classId parameters. That doesn't seem attractive.
> 
> Modulo that issue I think patch 1 is basically ok, but we should fix the
> comments in objectaccess.c.  Rather than "It is [the] entrypoint ..." we
> should have something like "Oid variant entrypoint ..." and "Name
> variant entrypoint ...", and also fix the function names in the comments.

Joshua,

Do you care to create a new version of this, perhaps based on the v2-0001 patch 
I just posted?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: New Object Access Type hooks

2022-03-21 Thread Mark Dilger


> On Mar 21, 2022, at 1:30 PM, Andrew Dunstan  wrote:
> 
> To the best of my knowledge .control files are only used by extensions,
> not by other modules. They are only referenced in
> src/backend/commands/extension.c in the backend code. For example,
> auto_explain which is a loadable module but not en extension does not
> have one, and I bet if you remove it you'll find this will work just fine.

Fixed, also with adjustments to Joshua's function comments.



v3-0001-Add-String-object-access-hooks.patch
Description: Binary data


v3-0002-Add-regression-tests-of-Object-Access-Type-hooks.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg14 psql broke \d datname.nspname.relname

2022-03-21 Thread Mark Dilger


> On Mar 21, 2022, at 6:12 PM, Andres Freund  wrote:
> 
> Needs another one: http://cfbot.cputube.org/patch_37_3367.log
> 
> Marked as waiting-on-author.



v6-0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger



> On Mar 21, 2022, at 10:03 PM, Thomas Munro  wrote:
> 
> On Fri, Mar 18, 2022 at 4:22 PM Mark Dilger
>  wrote:
>> (FYI, I got a test failure from src/test/recovery/t/013_crash_restart.pl 
>> when testing v1-0001.  I'm not sure yet what that is about.)
> 
> Doesn't look like 0001 has anything to do with that...   Are you on a
> Mac?

Yes, macOS Catalina, currently 10.15.7.
  
>  Did it look like this recent failure from CI?
> 
> https://cirrus-ci.com/task/4686108033286144
> https://api.cirrus-ci.com/v1/artifact/task/4686108033286144/log/src/test/recovery/tmp_check/log/regress_log_013_crash_restart
> https://api.cirrus-ci.com/v1/artifact/task/4686108033286144/log/src/test/recovery/tmp_check/log/013_crash_restart_primary.log

I no longer have the logs for comparison.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger



> On Mar 22, 2022, at 8:14 AM, Julien Rouhaud  wrote:
> 
> Hi,
> 
> On Tue, Mar 22, 2022 at 10:41:05AM -0400, Andrew Dunstan wrote:
>> 
>> Pushed with slight adjustments - the LOAD was unnecessary as was the
>> setting of client_min_messages - the latter would have made buildfarm
>> animals unhappy.
> 
> For the record this just failed on my buildfarm animal:
> https://brekka.postgresql.org/cgi-bin/show_stage_log.pl?nm=lapwing&dt=2022-03-22%2014%3A40%3A10&stg=misc-check.

culicidae is complaining:

==~_~===-=-===~_~== 
pgsql.build/src/test/modules/test_oat_hooks/log/postmaster.log 
==~_~===-=-===~_~==
2022-03-22 14:53:27.175 UTC [2166986][postmaster][:0] LOG:  starting PostgreSQL 
15devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 11.2.0-18) 11.2.0, 
64-bit
2022-03-22 14:53:27.175 UTC [2166986][postmaster][:0] LOG:  listening on Unix 
socket "/tmp/pg_regress-RiE7x8/.s.PGSQL.6280"
2022-03-22 14:53:27.198 UTC [2167008][not initialized][:0] FATAL:  
test_oat_hooks must be loaded via shared_preload_libraries
2022-03-22 14:53:27.202 UTC [2167006][not initialized][:0] FATAL:  
test_oat_hooks must be loaded via shared_preload_libraries
2022-03-22 14:53:27.203 UTC [2167009][not initialized][:0] FATAL:  
test_oat_hooks must be loaded via shared_preload_libraries
2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  checkpointer 
process (PID 2167006) exited with exit code 1
2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  terminating any 
other active server processes
2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  shutting down 
because restart_after_crash is off
2022-03-22 14:53:27.206 UTC [2166986][postmaster][:0] LOG:  database system is 
shut down
==~_~===-=-===~_~== pgsql.build/src/test/modules/test_rls_hooks/log/initdb.log 
==~_~===-=-===~_~==


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger



> On Mar 22, 2022, at 9:09 AM, Andrew Dunstan  wrote:
> 
> If I can't com up with a very quick fix I'll revert it.

The problem is coming from the REGRESS_exec_check_perms, which was included in 
the patch to demonstrate when the other hooks fired relative to the 
ExecutorCheckPerms_hook, but since it is causing problems, I can submit a patch 
with that removed.  Give me a couple minutes


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger


> On Mar 22, 2022, at 9:11 AM, Mark Dilger  wrote:
> 
> Give me a couple minutes



v1-0001-Fix-build-farm-failures-in-test_oat_hooks.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger


> On Mar 22, 2022, at 9:33 AM, Tom Lane  wrote:
> 
> Andrew Dunstan  writes:
>> On 3/22/22 11:26, Mark Dilger wrote:
>>> culicidae is complaining:
>>> 2022-03-22 14:53:27.198 UTC [2167008][not initialized][:0] FATAL:  
>>> test_oat_hooks must be loaded via shared_preload_libraries
> 
>> That seems quite weird. I'm not sure how it's getting loaded at all if
>> not via shared_preload_libraries
> 
> After checking culicidae's config, I've duplicated this failure
> by building with EXEC_BACKEND defined.  So I'd opine that there
> is something broken about the method test_oat_hooks uses to
> decide if it was loaded via shared_preload_libraries or not.
> (Note that the failures appear to be coming out of auxiliary
> processes such as the checkpointer.)
> 
> As a quick-n-dirty fix to avoid reverting the entire test module,
> perhaps just delete this error check for now.

Ok, done as you suggest:



v2-0001-Fix-buildfarm-test-failures-in-test_oat_hooks.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger



> On Mar 22, 2022, at 9:58 AM, Tom Lane  wrote:
> 
>> Ok, done as you suggest:
> 
> I only suggested removing the error check in _PG_init, not
> changing the way the test works.

I should have been more explicit and said, "done as y'all suggest".  The "To" 
line of that email was to you and Andrew.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: New Object Access Type hooks

2022-03-22 Thread Mark Dilger



> On Mar 22, 2022, at 3:20 PM, Andres Freund  wrote:
> 
> Seems like it might actually be good to test that object access hooks work
> well in a parallel worker. How about going the other way and explicitly 
> setting
> force_parallel_mode = disabled for parts of the test and to enabled for
> others?

Wouldn't we get differing numbers of NOTICE messages depending on how many 
parallel workers there are?  Or would you propose setting the number of workers 
to a small, fixed value?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







room for improvement in amcheck btree checking?

2020-12-01 Thread Mark Dilger
Peter,

While working on the pg_amcheck code for [1], I discovered an unexpected 
deficiency in the way btree indexes are checked.  So far as I can tell from the 
docs [2], the deficiency does not violate any promises that amcheck is making, 
but I found it rather surprising all the same.  To reproduce:

1) Create a (possibly empty) table and btree index over the table.
2) Flush buffers and backup a copy of the heap relation file.
3) Load (more) data into the table.
4) Flushing buffers as needed, revert the heap relation file to the backup 
previously taken.
5) Run bt_index_check and bt_index_parent_check with and without heapallindexed 
and/or rootdescend.  Note that the index passes all checks.
6) Run a SQL query that uses a sequential scan on the table and observe no 
errors.
7) Run a SQL query that uses an index scan on the table and see that it errors 
with something like:

   ERROR:  could not read block 0 in file "base/13097/16391": read only 0 of 
8192 bytes

I found it surprising that even when precisely zero of the tids in the index 
exist in the table, the index checks all come back clean.  The heapallindexed 
check is technically running as advertised, checking that all of the zero 
tuples in the heap are present in the index.  That is a pretty useless check 
under this condition, though.  Is an "indexallheaped" option (by some less 
crazy name) needed?

Users might also run into this problem when a heap relation file gets 
erroneously shortened by some number of blocks but not fully truncated, or 
perhaps with torn page writes.

Have you already considered and rejected an "indexallheaped" type check?



Background
---

I have up until recently been focused on corruption caused by twiddling the 
bits within heap and index relation pages, but real-world user error, file 
system error, and perhaps race conditions in the core postgresql code seem at 
least as likely to result in missing or incorrect versions of blocks of 
relation files rather than individual bytes within those blocks being wrong.  
Per our discussions in [3], not all corruptions that can be created under 
laboratory conditions are equally likely to occur in the wild, and it may be 
reasonable to only harden the amcheck code against corruptions that are more 
likely to happen in actual practice.

To make it easier for tap tests to cover common corruption type scenarios, I 
have been extending PostgresNode.pm with functions to perform these kinds of 
file system corruptions.  I expect to post that work in another thread soon.  I 
am not embedding any knowledge of the internal structure of heap, index, or 
toast relations in PostgresNode, only creating functions to archive versions of 
files and perform full or partial reversions of them later.

The ultimate goal of this work is to have sufficient regression tests to 
demonstrate that pg_amcheck can be run with default options against a system 
corrupted in these common ways without crashing, and with reasonable likelihood 
of detecting these common corruptions.  Users might understand that hard to 
detect corruption will go unnoticed, but it would be harder to explain to them 
why, immediately after getting a clean bill of health on their system, a query 
bombed out with the sort of error shown above.


[1] https://commitfest.postgresql.org/31/2670/
[2] https://www.postgresql.org/docs/13/amcheck.html
[3] 
https://www.postgresql.org/message-id/flat/CAH2-WznaU6HcahLV4Hg-DnhEmW8DuSdYfn3vfWXoj3Me9jq%3DsQ%40mail.gmail.com#0691475da5e9163d21b13fc415095801

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: room for improvement in amcheck btree checking?

2020-12-01 Thread Mark Dilger



> On Dec 1, 2020, at 4:38 PM, Peter Geoghegan  wrote:
> 
> On Tue, Dec 1, 2020 at 12:39 PM Mark Dilger
>  wrote:
>> I found it surprising that even when precisely zero of the tids in the index 
>> exist in the table the index checks all come back clean.  The heapallindexed 
>> check is technically running as advertised, checking that all of the zero 
>> tuples in the heap are present in the index.  That is a pretty useless check 
>> under this condition, though.  Is a "indexallheaped" option (by some less 
>> crazy name) needed?
>> 
>> Users might also run into this problem when a heap relation file gets 
>> erroneously shortened by some number of blocks but not fully truncated, or 
>> perhaps with torn page writes.
> 
> It would probably be possible to detect this exact condition (though
> not other variants of the more general problem) relatively easily.
> Perhaps by doing something with RelationOpenSmgr() with the heap
> relation, while applying a little knowledge of what must be true about
> the heap relation given what is known about the index relation. (I'm
> not 100% sure that that's possible, but offhand it seems like it
> probably is.)
> 
> See also: commit 6754fe65a4c, which tightened things up in this area
> for the index relation itself.

Yes, some of the test code I've been playing with already hit the error 
messages introduced in that commit.

>> Have you already considered and rejected a "indexallheaped" type check?
> 
> Actually, I pointed out that we should do something along these lines
> very recently:
> 
> https://postgr.es/m/CAH2-Wz=dy--fg5ij0kpcqums0w5g+xqed3t-7he+uqak_hm...@mail.gmail.com

Right.

> I'd be happy to see you pursue it.

I'm not sure how much longer I'll be pursuing corruption checkers before 
switching to another task.  Certainly, I'm interested in pursuing this if time 
permits.

>> Background
>> ---
>> 
>> I have up until recently been focused on corruption caused by twiddling the 
>> bits within heap and index relation pages, but real-world user error, file 
>> system error, and perhaps race conditions in the core postgresql code seem 
>> at least as likely to result in missing or incorrect versions of blocks of 
>> relation files rather than individual bytes within those blocks being wrong. 
>>  Per our discussions in [3], not all corruptions that can be created under 
>> laboratory conditions are equally likely to occur in the wild, and it may be 
>> reasonable to only harden the amcheck code against corruptions that are more 
>> likely to happen in actual practice.
> 
> Right. I also like to harden amcheck in ways that happen to be easy,
> especially when it seems like it might kind of work as a broad
> defense that doesn't consider any particular scenario. For example,
> hardening to make sure that an index tuple's lp_len matches
> IndexTupleSize() for the tuple proper (this also kind of works as
> storage format documentation). I am concerned about both costs and
> benefits.
> 
> FWIW, this is a perfect example of the kind of hardening that makes
> sense to me. Clearly this kind of failure is a reasonably plausible
> scenario (probably even a known real world scenario with catalog
> corruption), and clearly we could do something about it that's pretty
> simple and obviously low risk. It easily meets the standard that I
> might apply here.

Ok, it seems we're in general agreement about that.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Test harness for regex code (to allow importing Tcl's test suite)

2021-01-03 Thread Mark Dilger



> On Jan 3, 2021, at 7:49 PM, Tom Lane  wrote:
> 
> Over the holiday break I've been fooling with some regex performance
> improvements.  I don't have anything ready to show yet in that line,
> but I was feeling the need for more-thorough test coverage, so I set
> to work on something that's been in the back of my mind for a long
> time: we need to absorb the test cases that Henry Spencer wrote way-
> back-when for that code, which up to now existed only as a script
> in the Tcl test suite.  That state of affairs seemed OK to me in
> the beginning, when we thought of Tcl as the upstream for that code,
> and figured they'd vet any nontrivial changes.  They've pretty
> thoroughly dropped the ball on that though, and indeed I think that
> they now believe *we're* the upstream.  So we need to have a test
> suite that reflects that status, at least to the extent of running
> all the test cases that exist for that code.
> 
> Accordingly, here's a new src/test/modules package that creates a
> function modeled on regexp_matches(), but using a set of flags that
> matches Spencer's design for the Tcl test suite, allowing parts
> of src/backend/regex/ to be tested that can't be reached with our
> existing SQL-exposed functions.  The test scripts in the module
> reproduce all the tests in Tcl's "tests/reg.test" script as of
> Tcl 8.6.10, plus a few others that I felt advisable, such as tests
> for the lookbehind constraints we added a few years ago.  (Note:
> Tcl also has regexp.test and regexpComp.test, but those seem to be
> oriented towards testing their language-specific wrappers not the
> regex engine itself.)
> 
> According to my testing, this increases our test code coverage for
> src/backend/regex/ from 71.1% to 86.7%, which is not too shabby,
> especially seeing that a lot of the remainder is not-deterministically-
> reachable code for malloc failure handling.
> 
> Thoughts?  Is anyone interested in reviewing this?  Since it's only
> test code, I'd be okay with pushing it without review, but I'd be
> happy if someone else wants to look at it.

I've quickly read this over and generally like it.  Thanks for working on this!

Have you thought about whether, instead of living in test/modules, it might be 
nice to expose test_regex from SQL with a slightly different interface that 
doesn't throw on regex compilation errors?  Maybe something in contrib?  It 
might be useful to some users for validating regular expressions.  I'm just 
asking... I don't have any problem with how you have it here.
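The non-throwing interface floated above can be sketched quickly; here Python's `re` module stands in for the backend regex engine, and `validate_regex` is a hypothetical name for illustration only:

```python
# Sketch of a regex validator that reports compilation errors instead of
# raising, the behaviour suggested for a contrib-style test_regex wrapper.

import re

def validate_regex(pattern: str):
    """Return (ok, error_message) rather than throwing on a bad pattern."""
    try:
        re.compile(pattern)
        return (True, None)
    except re.error as exc:
        return (False, str(exc))

print(validate_regex(r"a(b|c)*d"))   # -> (True, None)
ok, err = validate_regex(r"a(b")     # unbalanced paren: compilation fails
print(ok)                            # -> False
```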

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: macOS SIP, next try

2021-01-04 Thread Mark Dilger



> On Dec 2, 2020, at 6:28 AM, Peter Eisentraut 
>  wrote:
> 
> Previous discussions on macOS SIP:
> 
> https://www.postgresql.org/message-id/flat/561E73AB.8060800%40gmx.net
> https://www.postgresql.org/message-id/flat/CA%2BTgmoYGi5oR8Rvb2-SY1_WEjd76H5gCqSukxFQ66qR7MewWAQ%40mail.gmail.com
> https://www.postgresql.org/message-id/flat/6a4d6124-41f0-756b-0811-c5c5def7ef4b%402ndquadrant.com

See also 
https://www.postgresql.org/message-id/flat/18012hGLG6HJ9pQDkHAMYuwQKg%40sparkpost.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: new heapcheck contrib module

2021-01-07 Thread Mark Dilger
escend and bt_index_check 
wouldn't make sense, as there is no rootdescend option on that function.  So 
users would need multiple flags to turn on various options, with some flag 
combinations drawing an error about the flags not being mutually compatible.  
That's doable, but people may not like that interface.

> * heapallindexed is kind of expensive, but valuable. But the extra
> check is probably less likely to help on the second or subsequent
> index on a table.

There is a switch for enabling this.  It is off by default.

> It might be worth considering an option that only uses it with only
> one index: Preferably the primary key index, failing that some unique
> index, and failing that some other index.

It might make sense for somebody to submit this for a later release.  I don't 
have any plans to work on this during this release cycle.
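For whoever picks this up later, the selection preference Peter describes (primary key first, then any unique index, then any index) is simple to state in code. This is a sketch of the idea only; the `Index` type and function name are invented for illustration:

```python
# Pick a single index per table for the expensive heapallindexed check,
# preferring the primary key, then a unique index, then any index.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Index:
    name: str
    is_primary: bool = False
    is_unique: bool = False

def pick_heapallindexed_target(indexes: List[Index]) -> Optional[Index]:
    for pred in (lambda i: i.is_primary,   # best: primary key
                 lambda i: i.is_unique,    # next: some unique index
                 lambda i: True):          # fallback: any index
        for idx in indexes:
            if pred(idx):
                return idx
    return None

indexes = [Index("t_foo_idx"), Index("t_key_uniq", is_unique=True)]
print(pick_heapallindexed_target(indexes).name)  # -> t_key_uniq
```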

>> I'm not very convinced by the decision to
>> override the user's decision about heapallindexed either.
> 
> I strongly agree.

I have removed the override.

> 
>> Maybe I lack
>> imagination, but that seems pretty arbitrary. Suppose there's a giant
>> index which is missing entries for 5 million heap tuples and also
>> there's 1 entry in the table which has an xmin that is less than the
>> pg_class.relfrozenxid value by 1. You are proposing that because I have
>> the latter problem I don't want you to check for the former one. But
>> I, John Q. Smartuser, do not want you to second-guess what I told you
>> on the command line that I wanted. :-)
> 
> Even if your user is just average, they still have one major advantage
> over the architects of pg_amcheck: actual knowledge of the problem in
> front of them.

There is a switch for skipping index checks on corrupt tables.  By default, the 
indexes will be checked.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-02-13 Thread Mark Rofail
Hey Joel,

> + oprresult = DatumGetBool(FunctionCallInvoke(locfcinfo));
> -+ if (oprresult)
> ++ if (!locfcinfo->isnull && oprresult)
> + return true;
> + }
>
> Is checking !locfcinfo->isnull due to something new in v2,
> or is it just a correction for a problem also in v1?
>
The “!locfcinfo->isnull” is to protect against segmentation faults. I found
it was added to the original function which I adapted the “element” version
from.

/Mark


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-02-13 Thread Mark Rofail
Hey Joel,

> test opr_sanity   ... FAILED
>
> AND binary_coercible(p2.opcintype, p1.amoplefttype));
>   amopfamily | amopstrategy | amopopr
> +--+-
> -(0 rows)
> +   2745 |5 |6105
> +(1 row)
>
> -- Operators that are primary members of opclasses must be immutable (else
> -- it suggests that the index ordering isn't fixed).  Operators that are
>
This is due to using anycompatiblearray for the left operand in @>>.
To solve this problem we need to use @>>(anyarray,anyelement) or introduce
a new opclass for gin indices.
These are the two approaches that come to mind to solve this. Which one is
the right way or is there another solution I am not aware of?
That’s why I’m asking this on the mailing list, to get the community’s
input.

>
/Mark


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-02-17 Thread Mark Rofail
Hey Joel,

I will now proceed with the review of fk_arrays_elem_v2 as well.
>
Great work!!

/Mark

>


Re: new heapcheck contrib module

2021-02-23 Thread Mark Dilger



> On Feb 17, 2021, at 12:56 PM, Robert Haas  wrote:
> 
> On Wed, Feb 17, 2021 at 1:46 PM Mark Dilger
>  wrote:
>> It will reconnect and retry a command one time on error.  That should cover 
>> the case that the connection to the database was merely lost.  If the second 
>> attempt also fails, no further retry of the same command is attempted, 
>> though commands for remaining relation targets will still be attempted, both 
>> for the database that had the error and for other remaining databases in the 
>> list.
>> 
>> Assuming something is wrong with "db2", the command `pg_amcheck db1 db2 db3` 
>> could result in two failures per relation in db2 before finally moving on to 
>> db3.  That seems pretty awful considering how many relations that could be, 
>> but failing to soldier on in the face of errors seems a strange design for a 
>> corruption checking tool.
> 
> That doesn't seem right at all. I think a PQsendQuery() failure is so
> remote that it's probably justification for giving up on the entire
> operation. If it's caused by a problem with some object, it probably
> means that accessing that object caused the whole database to go down,
> and retrying the object will take the database down again. Retrying
> the object is betting that the user interrupted connectivity between
> pg_amcheck and the database but the interruption is only momentary and
> the user actually wants to complete the operation. That seems unlikely
> to me. I think it's far more probable that the database crashed or got
> shut down and continuing is futile.
> 
> My proposal is: if we get an ERROR trying to *run* a query, give up on
> that object but still try the other ones after reconnecting. If we get
> a FATAL or PANIC trying to *run* a query, give up on the entire
> operation. If even sending a query fails, also give up.

This is changed in v40 as you propose to exit on FATAL and PANIC level errors 
and on error to send a query.  On lesser errors (which includes all corruption 
reports about btrees and some heap corruption related errors), the slot's 
connection is still useable, I think.  Are there cases where the error is lower 
than FATAL and yet the connection needs to be reestablished?  It does not seem 
so from the testing I have done, but perhaps I'm not thinking of the right sort 
of non-fatal error?
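To restate the policy in v40 as a decision table (a sketch of the assumed behaviour, not pg_amcheck source): FATAL/PANIC, or a failure to even send the query, aborts the whole run; a lesser ERROR skips the current object but keeps using the slot's connection for the remaining targets.

```python
# Error-severity handling policy for a pg_amcheck-style tool (illustrative).

def handle_failure(severity: str, send_failed: bool = False) -> str:
    if send_failed or severity in ("FATAL", "PANIC"):
        return "abort-run"       # connection (or server) presumed gone
    if severity == "ERROR":
        return "skip-object"     # connection assumed still usable
    return "continue"            # NOTICE/WARNING: report and keep going

print(handle_failure("ERROR"))                    # -> skip-object
print(handle_failure("FATAL"))                    # -> abort-run
print(handle_failure("ERROR", send_failed=True))  # -> abort-run
```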

(I'll wait to post v40 until after hearing your thoughts on this.)

>> In v39, exit(1) is used for all errors which are intended to stop the 
>> program.  It is important to recognize that finding corruption is not an 
>> error in this sense.  A query to verify_heapam() can fail if the relation's 
>> checksums are bad, and that happens beyond verify_heapam()'s control when 
>> the page is not allowed into the buffers.  There can be errors if the file 
>> backing a relation is missing.  There may be other corruption error cases 
>> that I have not yet thought about.  The connections' errors get reported to 
>> the user, but pg_amcheck does not exit as a consequence of them.  As 
>> discussed above, failing to send the query to the server is not viewed as a 
>> reason to exit, either.  It would be hard to quantify all the failure modes, 
>> but presumably the catalogs for a database could be messed up enough to 
>> cause such failures, and I'm not sure that pg_amcheck should just abort.
> 
> I agree that exit(1) should happen after any error intended to stop
> the program. But I think it should also happen at the end of the run
> if we hit any problems for which we did not stop, so that exit(0)
> means your database is healthy.

In v40, exit(1) means the program encountered fatal errors leading it to stop, 
and exit(2) means that a non-fatal error and/or corruption reports occurred 
somewhere during the processing.  Otherwise, exit(0) means your database was 
successfully checked and is healthy.
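That exit-status contract, restated as code (a sketch of the rule as described, not taken from pg_amcheck itself):

```python
# exit(1): a fatal error stopped the run.
# exit(2): non-fatal errors and/or corruption reports occurred.
# exit(0): the database was successfully checked and is healthy.

def exit_status(fatal_error: bool, nonfatal_errors: int, corruptions: int) -> int:
    if fatal_error:
        return 1
    if nonfatal_errors or corruptions:
        return 2
    return 0

print(exit_status(False, 0, 0))  # -> 0  (clean bill of health)
print(exit_status(False, 0, 3))  # -> 2  (corruption found, run completed)
print(exit_status(True, 0, 0))   # -> 1  (run aborted)
```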


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Interest in GSoC 2021 Projects

2021-02-26 Thread Mark Wong
Hi,

On Fri, Feb 26, 2021 at 03:56:17PM +0800, Yixin Shi wrote:
> Hi,
> 
> 
> 
> I am Yixin Shi, a junior student majoring in Computer Science at University
> of Michigan. I notice that the project ideas for GSoC 2021 have been posted
> on your webpage and I am quite interested in two of them. I really wish to
> take part in the project *Make plsample a more complete procedural language
> handler example (2021)* considering both my interest and ability. The
> potential mentor for this Project is *Mark Wong*. I am also interested in
> the project *Improve PostgreSQL Regression Test Coverage (2021)* and the
> potential mentor is *Andreas Scherbaum* and *Stephen Frost*.
> 
> 
> 
> I would like to learn more about these two projects but failed to contact
> the mentors. How can I contact them? Also, I really hope to join the
> project. Are there any suggestions on application?

The idea for plsample is to continue providing working code and
documentation examples so that plsample can be used as a template for
creating additional language handlers.

Depending on the individual's comfort level, some effort may need to be
put into studying the current handlers for ideas on how to implement the
lacking functionality in plsample.

Does that help?

Regards,
Mark




Re: proposal: psql –help reflecting service or URI usage

2021-02-28 Thread Mark Dilger



> On Feb 28, 2021, at 1:57 AM, Paul Förster  wrote:
> 
> Hi,
> 
> I'd like to propose a patch to psql --help output:
> 
> Currently it is:
> 
> Usage:
>  psql [OPTION]... [DBNAME [USERNAME]]
> 
> ...
> 
> Connection options:
>  -h, --host=HOSTNAME  database server host or socket directory (default: 
> "local socket")
>  -p, --port=PORT  database server port (default: "5432")
>  -U, --username=USERNAME  database user name (default: "paul")
>  -w, --no-passwordnever prompt for password
>  -W, --password   force password prompt (should happen automatically)
> 
> I'd like to change it to the following to reflect the psql ability to process 
> a service name or a PostgreSQL URI:
> 
> Usage:
>  psql [OPTION]... [DBNAME [USERNAME]|service|uri]
> 
> ...
> 
> Connection options:
>  -h, --host=HOSTNAME  database server host or socket directory (default: 
> "local socket")
>  -p, --port=PORT  database server port (default: "5432")
>  -U, --username=USERNAME  database user name (default: "paul")
>  -w, --no-passwordnever prompt for password
>  -W, --password   force password prompt (should happen automatically)
>  service=name service name as definited in pg_service.conf

"definited" is a typo.

Should this say "as defined in pg_service.conf"?  That's the default, but the 
user might have $PGSERVICEFILE set to something else.  Perhaps you could borrow 
the wording of other options and use "(default: as defined in 
pg_service.conf)", or something like that, but of course being careful to still 
fit in the line length limit.

>  uri  connection URI (postgresql://...)
> 
> ...
> 
> Attached is a patch for src/bin/psql/help.c for this. The file 
> doc/src/sgml/ref/psql-ref.sgml does not seem to need any changes for this.
> 
> Any thoughts on this?

Other client applications follow the same pattern as psql, so if this change 
were adopted, it should apply to all of them.

Your proposal seems like something that would have been posted to the list 
before, possibly multiple times.  Any chance you could dig up past 
conversations on this subject and post links here for context?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: proposal: psql –help reflecting service or URI usage

2021-03-01 Thread Mark Dilger



> On Feb 28, 2021, at 10:10 AM, Paul Förster  wrote:
> 
> Hi Mark,
> 
>> On 28. Feb, 2021, at 17:54, Mark Dilger  wrote:
>> 
>> "definited" is a typo.
> 
> yes, definitely a typo, sorry. Thanks for pointing this out.
> 
>> Should this say "as defined in pg_service.conf"?  That's the default, but 
>> the user might have $PGSERVICEFILE set to something else.  Perhaps you could 
>> borrow the wording of other options and use "(default: as defined in 
>> pg_service.conf)", or something like that, but of course being careful to 
>> still fit in the line length limit.
> 
> I agree to all, thanks. What is the line length limit?

The output from --help should fit in a terminal window with only 80 characters 
width.  For example, in src/bin/scripts/createuser.c the line about 
--interactive is wrapped:

> printf(_("  -S, --no-superuser        role will not be superuser (default)\n"));
> printf(_("  -V, --version             output version information, then exit\n"));
> printf(_("  --interactive             prompt for missing role name and attributes rather\n"
>          "                            than using defaults\n"));
> printf(_("  --replication             role can initiate replication\n"));

You can find counter-examples where applications do not follow this rule.  
pg_isready is one of them.
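The 80-column convention is easy to check mechanically. A hypothetical helper, to be fed the captured output of `psql --help` or any other client's `--help`:

```python
# Flag --help output lines that exceed an 80-column terminal width.

def overlong_help_lines(help_text: str, limit: int = 80):
    """Return (line_number, length) pairs for lines longer than limit."""
    return [(n, len(line))
            for n, line in enumerate(help_text.splitlines(), start=1)
            if len(line) > limit]

sample = "Usage:\n  psql [OPTION]... [DBNAME [USERNAME]]\n" + "x" * 81
print(overlong_help_lines(sample))  # -> [(3, 81)]
```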


>> Other client applications follow the same pattern as psql, so if this change 
>> were adopted, it should apply to all of them.
> 
> well, psql is central and IMHO the best place to start. I'd have to try out 
> all of them then. What I do know, though, is that pg_isready does not 
> understand a URI (why is that?), which is very unfortunate. So, I'd have to 
> try them all out and supply patches for them all?

There is a pattern in the client applications that the --help output is less 
verbose than the docs (see, for example, 
https://www.postgresql.org/docs/13/reference-client.html).  Your proposed patch 
makes psql's --help output a bit more verbose about this issue while leaving 
the other applications less so, without any obvious reason for the difference.

> Still, supporting a feature and not documenting it in its help is IMHO not a 
> good idea.

Ok.

>> Your proposal seems like something that would have been posted to the list 
>> before, possibly multiple times.  Any chance you could dig up past 
>> conversations on this subject and post links here for context?
> 
> I don't know any past discussions here. I only subscribed today to 
> psql-hackers. It might have been mentioned on psql-general, though. But I'm 
> not sure. This idea popped into my mind just yesterday when I was playing 
> around with psql and URIs and noticed that psql –help doesn't show them.

I mentioned looking at the mailing list archives, but neglected to give you a 
link:  https://www.postgresql.org/list/

Over the years, many proposals get made and debated, with some accepted and 
some rejected.  The rejected proposals often have some merit, and get suggested 
again, only to get rejected for the same reasons as the previous times they 
were suggested.  So searching the archives before posting a patch can sometimes 
be enlightening.  The difficulty in my experience is knowing what words and 
phrases to search for.  It can be a bit time consuming trying to find a prior 
discussion on a topic.

I don't know of any specific discussion which rejected your patch idea.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Add --tablespace option to reindexdb

2021-03-01 Thread Mark Dilger



> On Feb 25, 2021, at 10:49 PM, Michael Paquier  wrote:
> 
> Hi all,

Hi Michael,

> Since c5b2860, it is possible to specify a tablespace for a REINDEX,
> but the equivalent option has not been added to reindexdb.  Attached
> is a patch to take care of that.
> 
> This includes documentation and tests.

The documentation of the TABLESPACE option for REINDEX says:

+  Specifies that indexes will be rebuilt on a new tablespace.

but in your patch for reindexdb, it is less clear:

+Specifies the tablespace to reindex on. (This name is processed as
+a double-quoted identifier.)

I think the language "to reindex on" could lead users to think that indexes 
already existent in the given tablespace will be reindexed.  In other words, 
the option might be misconstrued as a way to specify all the indexes you want 
reindexed.  Whatever you do here, beware that you are using similar language in 
the --help, so consider changing that, too.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Add --tablespace option to reindexdb

2021-03-01 Thread Mark Dilger



> On Feb 25, 2021, at 10:49 PM, Michael Paquier  wrote:
> 
> While on it, I have added tests for toast tables and indexes with a
> tablespace move during a REINDEX. 


In t/090_reindexdb.pl, you add:

+# Create a tablespace for testing.
+my $ts = $node->basedir . '/regress_reindex_tbspace';
+mkdir $ts or die "cannot create directory $ts";
+# this takes care of WIN-specific path issues
+my $ets = TestLib::perl2host($ts);
+my $tbspace = 'reindex_tbspace';
+$node->safe_psql('postgres', "CREATE TABLESPACE $tbspace LOCATION '$ets';");

I recognize that you are borrowing some of that from 
src/bin/pgbench/t/001_pgbench_with_server.pl, but I don't find anything 
intuitive about the name "ets" and would rather not see that repeated.  There 
is nothing in TestLib::perl2host to explain this name choice, nor in pgbench's 
test, nor yours.

You also change a test:

 $node->issues_sql_like(
-   [ 'reindexdb', '-v', '-t', 'test1', 'postgres' ],
-   qr/statement: REINDEX \(VERBOSE\) TABLE public\.test1;/,
-   'reindex with verbose output');
+   [ 'reindexdb', '--concurrently', '-v', '-t', 'test1', 'postgres' ],
+   qr/statement: REINDEX \(VERBOSE\) TABLE CONCURRENTLY public\.test1;/,
+   'reindex concurrently with verbose output');

but I don't see what tests of the --concurrently option have to do with this 
patch.  I'm not saying there is anything wrong with this change, but it seems 
out of place.  Am I missing something?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Add --tablespace option to reindexdb

2021-03-01 Thread Mark Dilger



> On Feb 25, 2021, at 10:49 PM, Michael Paquier  wrote:
> 
> Anyway, I would rather group the whole set of
> tests together, and using the --tablespace option introduced here
> within a TAP test does the job.

Your check verifies that reindexing a system table on a new tablespace fails, 
but does not verify what the failure was.  I wonder if you might want to make 
it more robust, something like:

diff --git a/src/bin/scripts/t/090_reindexdb.pl b/src/bin/scripts/t/090_reindexdb.pl
index 6946268209..8453acc817 100644
--- a/src/bin/scripts/t/090_reindexdb.pl
+++ b/src/bin/scripts/t/090_reindexdb.pl
@@ -3,7 +3,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 54;
+use Test::More tests => 58;
 
 program_help_ok('reindexdb');
 program_version_ok('reindexdb');
@@ -108,23 +108,35 @@ $node->issues_sql_like(
 # names, and CONCURRENTLY cannot be used in transaction blocks, preventing
 # the use of TRY/CATCH blocks in a custom function to filter error
 # messages.
-$node->command_fails(
+$node->command_checks_all(
 	[ 'reindexdb', '-t', $toast_table, '--tablespace', $tbspace, 'postgres' ],
+	1,
+	[ ],
+	[ qr/cannot move system relation/ ],
 	'reindex toast table with tablespace');
-$node->command_fails(
+$node->command_checks_all(
 	[
 		'reindexdb',    '--concurrently', '-t', $toast_table,
 		'--tablespace', $tbspace, 'postgres'
 	],
+	1,
+	[ ],
+	[ qr/cannot move system relation/ ],
 	'reindex toast table concurrently with tablespace');
-$node->command_fails(
+$node->command_checks_all(
 	[ 'reindexdb', '-i', $toast_index, '--tablespace', $tbspace, 'postgres' ],
+	1,
+	[ ],
+	[ qr/cannot move system relation/ ],
 	'reindex toast index with tablespace');
-$node->command_fails(
+$node->command_checks_all(
 	[
 		'reindexdb',    '--concurrently', '-i', $toast_index,
 		'--tablespace', $tbspace, 'postgres'
 	],
+	1,
+	[ ],
+	[ qr/cannot move system relation/ ],
 	'reindex toast index concurrently with tablespace');
 
 # connection strings


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Cross-reference comments on signal handling logic

2021-03-01 Thread Mark Dilger



> On Jan 17, 2021, at 11:51 PM, Craig Ringer  
> wrote:
> 
> 

In src/backend/postmaster/interrupt.c:

+ * These handlers are NOT used by normal user backends as they do not support

vs.

+ * Most backends use this handler.

These two comments seem to contradict.  If interrupt.c contains handlers that 
normal user backends do not use, then how can it be that most backends use one 
of the handlers in the file?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2021-03-01 Thread Mark Dilger



> On Feb 9, 2021, at 10:43 AM, Pavel Borisov  wrote:
> 
> Tue, 9 Feb 2021 at 01:46, Mark Dilger :
> 
> 
> > On Feb 8, 2021, at 2:46 AM, Pavel Borisov  wrote:
> > 
> > 0002 - is a temporary hack for testing. It will allow inserting duplicates 
> > in a table even if an index with the exact name "idx" has a unique 
> > constraint (generally it is prohibited to insert). Then a new amcheck will 
> > tell us about these duplicates. It's pity but testing can not be done 
> > automatically, as it needs a core recompile. For testing I'd recommend a 
> > protocol similar to the following:
> > 
> > - Apply patch 0002
> > - Set autovaccum = off in postgresql.conf
> 
> Thanks Pavel and Anastasia for working on this!
> 
> Updating pg_catalog directly is ugly, but the following seems a simpler way 
> to set up a regression test than having to recompile.  What do you think?
> 
> Very nice idea, thanks!
> I've made a regression test based on it. PFA v.2 of a patch. Now it doesn't 
> need anything external for testing.

If bt_index_check() and bt_index_parent_check() are to have this functionality, 
shouldn't there be an option controlling it much as the option (heapallindexed 
boolean) controls checking whether all entries in the heap are indexed in the 
btree?  It seems inconsistent to have an option to avoid checking the heap for 
that, but not for this.  Alternately, this might even be better coded as its 
own function, named something like bt_unique_index_check() perhaps.  I hope 
Peter might advise?

The regression test you provided is not portable.  I am getting lots of errors 
due to differing output of the form "page lsn=0/4DAD7E0".  You might turn this 
into a TAP test and use a regular expression to check the output.
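A check along these lines is easy to keep portable by matching the shape of the 
LSN rather than its value.  Sketched here in Python for brevity (the real test 
would use the project's Perl TAP framework; the sample line is made up):

```python
import re

# Page LSNs print as two hex numbers separated by a slash, e.g. 0/4DAD7E0,
# and the values differ from run to run; match the shape, not the value.
lsn_pattern = re.compile(r"page lsn=[0-9A-F]+/[0-9A-F]+")

line = 'heap tuple (1,1) page lsn=0/4DAD7E0'
assert lsn_pattern.search(line) is not None
```

The same idea carries over directly to a regular expression in a TAP test's 
output check.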

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Use extended statistics to estimate (Var op Var) clauses

2021-03-01 Thread Mark Dilger



> On Nov 12, 2020, at 5:14 PM, Tomas Vondra  
> wrote:
> 
> <0001-Support-estimation-of-clauses-of-the-form-V-20201113.patch>

Your patch no longer applies.  Can we get a new version please?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2021-03-01 Thread Mark Dilger



> On Mar 1, 2021, at 12:05 PM, Pavel Borisov  wrote:
> 
> The regression test you provided is not portable.  I am getting lots of 
> errors due to differing output of the form "page lsn=0/4DAD7E0".  You might 
> turn this into a TAP test and use a regular expression to check the output.
> May I ask you to ensure you used v3 of the patch to check? I've made the 
> tests portable in v3; probably you've checked an older version.

Yes, my review was of v2.  Updating to v3, I see that the test passes on my 
laptop.  It still looks brittle to have all the tid values in the test output, 
but it does pass.

> Thanks for your attention to the patch

Thanks for the patch!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2021-03-01 Thread Mark Dilger



> On Mar 1, 2021, at 12:23 PM, Pavel Borisov  wrote:
> 
> If bt_index_check() and bt_index_parent_check() are to have this 
> functionality, shouldn't there be an option controlling it much as the option 
> (heapallindexed boolean) controls checking whether all entries in the heap 
> are indexed in the btree?  It seems inconsistent to have an option to avoid 
> checking the heap for that, but not for this.  Alternately, this might even 
> be better coded as its own function, named something like 
> bt_unique_index_check() perhaps.  I hope Peter might advise?
>  
> As for heap checking, my reasoning was that we cannot check whether a unique 
> constraint is violated by the index without checking heap tuple visibility. 
> I.e. we can have many identical index entries without uniqueness being 
> violated, if only one of them corresponds to a visible heap tuple. So the heap 
> checking included in my patch is _necessary_ for unique constraint checking; it 
> should not have an option to be disabled, otherwise the only answer we can get 
> is that the unique constraint MAY be violated or may not be, which is quite 
> useless. If you meant something different, please elaborate. 

I completely agree that checking uniqueness requires looking at the heap, but I 
don't agree that every caller of bt_index_check on an index wants that 
particular check to be performed.  There are multiple ways in which an index 
might be corrupt, and Peter wrote the code to only check some of them by 
default, with options to expand the checks to other things.  This is why 
heapallindexed is optional.  If you don't want to pay the price of checking all 
entries in the heap against the btree, you don't have to.

I'm not against running uniqueness checks on unique indexes.  It seems fairly 
normal that a user would want that.  Perhaps the option should default to 
'true' if unspecified?  But having no option at all seems to run contrary to 
how the other functionality is structured.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: new heapcheck contrib module

2021-03-01 Thread Mark Dilger



> On Mar 1, 2021, at 1:14 PM, Robert Haas  wrote:
> 
> On Wed, Feb 24, 2021 at 1:55 PM Mark Dilger
>  wrote:
>> [ new patches ]
> 
> Regarding 0001:
> 
> There seem to be whitespace-only changes to the comment for select_loop().
> 
> I wonder if the ParallelSlotsSetupOneDB()/ParallelSlotsSetupMinimal()
> changes could be simpler. First idea: Suppose you had
> ParallelSlotsSetup(numslots) that just creates the slot array with 0
> connections, and then ParallelSlotsAdopt(slots, conn, cparams) if you
> want to make it own an existing connection. That seems like it might
> be cleaner. Second idea: Why not get rid of ParallelSlotsSetupOneDB()
> altogether, and just let ParallelSlotsGetIdle() connect the other
> slots as required? Preconnecting all slots before we do anything is
> good because ... of what?

Mostly because, if --jobs is set too high, you get an error before launching 
any work.  I don't know that it's really a big deal if vacuumdb or reindexdb 
have a bunch of tasks kicked off prior to exit(1) due to not being able to open 
connections for all the slots, but it is a behavioral change.

> I also wonder if things might be simplified by introducing a wrapper
> object, e.g. ParallelSlotArray. Suppose a ParallelSlotArray stores the
> number of slots (num_slots), the array of actual PGconn objects, and
> the ConnParams to be used for new connections, and the initcmd to be
> used for new connections. Maybe also the progname. This seems like it
> would avoid a bunch of repeated parameter passing: you could just
> create the ParallelSlotArray with the right contents, and then pass it
> around everywhere, instead of having to keep passing the same stuff
> in. If you want to switch to connecting to a different DB, you tweak
> the ConnParams - maybe using an accessor function - and the system
> figures the rest out.

The existing version of parallel slots (before any of my changes) could already 
have been written that way, but the author chose not to.  I thought about 
making the sort of change you suggest, and decided against, mostly on the 
principle of stare decisis.  But the idea is very appealing, and since you're 
on board, I think I'll go make that change.

> I wonder if it's really useful to generalize this to a point of caring
> about all the ConnParams fields, too. Like, if you only provide
> ParallelSlotUpdateDB(slotarray, dbname), then that's the only field
> that can change so you don't need to care about the others. And maybe
> you also don't really need to keep the ConnParams fields in every
> slot, either. Like, couldn't you just say something like: if
> (strcmp(PQdb(conn) , slotarray->cparams->dbname) != 0) { wrong DB,
> can't reuse without a reconnect }? I know sometimes a dbname is really
> a whole connection string, but perhaps we could try to fix that by
> using PQconninfoParse() in the right place, so that what ends up in
> the cparams is just the db name, not a whole connection string.

I went a little out of my way to avoid that, as I didn't want the next 
application that uses parallel slots to have to refactor it again, if for 
example they want to process in parallel databases listening on different 
ports, or to process commands issued under different roles.

> This is just based on a relatively short amount of time spent studying
> the patch, so I might well be off-base here. What do you think?

I like the ParallelSlotArray idea, and will go do that now.  I'm happy to defer 
to your judgement on the other stuff, too, but will wait to hear back from you.
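To make the shape being discussed concrete, here is a rough Python sketch of 
the ParallelSlotArray idea combined with lazy connections (illustrative only; 
the real code is C, and names like get_idle_slot and connect_fn are made up 
here):

```python
class Slot:
    """One worker slot; conn stays None until the slot is actually used."""
    def __init__(self):
        self.conn = None      # a PGconn in the real code
        self.busy = False

class ParallelSlotArray:
    """Bundles the slots with the shared connection parameters, so callers
    pass one object around instead of repeating (conns, cparams, initcmd)."""
    def __init__(self, numslots, connect_fn, dbname):
        self.slots = [Slot() for _ in range(numslots)]
        self.connect_fn = connect_fn   # called lazily, only when a slot is used
        self.dbname = dbname

    def get_idle_slot(self):
        for slot in self.slots:
            if not slot.busy:
                if slot.conn is None:
                    # Lazy connection: slots beyond the actual workload
                    # never open a connection at all.
                    slot.conn = self.connect_fn(self.dbname)
                slot.busy = True
                return slot
        return None  # caller would wait for a slot to free up

# Toy usage: connections are opened only for the slots actually handed out.
opened = []
array = ParallelSlotArray(4, lambda db: opened.append(db) or db, "postgres")
array.get_idle_slot()
array.get_idle_slot()
assert len(opened) == 2   # two slots used, two connections; not four
```

The trade-off mentioned above is visible here: with lazy connections, an 
over-large --jobs value is only discovered once enough work piles up to need 
the extra slots, rather than before any work begins.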

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-01 Thread Mark Dilger



> On Mar 1, 2021, at 9:38 AM, Joel Jacobson  wrote:
> 
> Forgive me for just sending a patch without much discussion on the list,
> but it was so easy to implement that I thought an implementation could
> help the discussion of whether this is something we want or not.

I like the idea so I did a bit of testing.  I think the following should not 
error, but does:

+SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
+ERROR:  range lower bound must be less than or equal to range upper bound


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Add --tablespace option to reindexdb

2021-03-02 Thread Mark Dilger



> On Mar 1, 2021, at 10:04 PM, Michael Paquier  wrote:
> 
> On Mon, Mar 01, 2021 at 10:12:54AM -0800, Mark Dilger wrote:
>> Your check verifies that reindexing a system table on a new
>> tablespace fails, but does not verify what the failure was.  I
>> wonder if you might want to make it more robust, something like:
> 
> I was not sure if that was worth adding or not, but no objections to
> do so.  So updated the patch to do that.
> 
> On Mon, Mar 01, 2021 at 09:47:57AM -0800, Mark Dilger wrote:
>> I recognize that you are borrowing some of that from
>> src/bin/pgbench/t/001_pgbench_with_server.pl, but I don't find
>> anything intuitive about the name "ets" and would rather not see
>> that repeated.  There is nothing in TestLib::perl2host to explain
>> this name choice, nor in pgbench's test, nor yours.
> 
> Okay, I have made the variable names more explicit.
> 
>> but I don't see what tests of the --concurrently option have to do
>> with this patch.  I'm not saying there is anything wrong with this
>> change, but it seems out of place.  Am I missing something?
> 
> While hacking on this feature I have just bumped into this separate
> issue, where the same test just gets repeated twice.  I have just
> applied an independent patch to take care of this problem separately,
> and backpatched it down to 12 where this got introduced.
> 
> On Mon, Mar 01, 2021 at 09:26:03AM -0800, Mark Dilger wrote:
>> I think the language "to reindex on" could lead users to think that
>> indexes already existent in the given tablespace will be reindexed.
>> In other words, the option might be misconstrued as a way to specify
>> all the indexes you want reindexed.  Whatever you do here, beware
>> that you are using similar language in the --help, so consider
>> changing that, too.
> 
> OK.  I have switched the docs to "Specifies the tablespace where
> indexes are rebuilt" and --help to "tablespace where indexes are
> rebuilt".
> 
> I have also removed the assertions based on the version number of the
> backend, based on Daniel's input sent upthread.
> 
> What do you think?

Looks good.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-02 Thread Mark Dilger



> On Mar 2, 2021, at 5:34 AM, Isaac Morland  wrote:
> 
> Returning to the RE result issue, I wonder how much it actually matters where 
> any empty matches are. Certainly the actual contents of the match don’t 
> matter; you don’t need to be able to index into the string to extract the 
> substring. The only scenario I can see where it could matter is if the RE is 
> using lookahead or look back to find occurrences before or after something 
> else. If we stipulate that the result array will be in order, then you still 
> don’t have the exact location of empty matches but you do at least have where 
> they are relative to non-empty matches.

I agree the contents of the match don't matter, because they are always empty.  
But the position matters.  You could intend to split a string in multiple 
places using lookaheads and lookbehinds to determine the split points.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Support empty ranges with bounds information

2021-03-02 Thread Mark Dilger



> On Mar 2, 2021, at 5:20 AM, Joel Jacobson  wrote:
> 
> it's currently not possible to create an empty range with bounds information.
> 
> This patch tries to improve the situation by keeping the bounds information,
> and allow accessing it via lower() and upper().
> 
> No other semantics have been changed.
> All tests pass without any changes.

I recall this issue of empty ranges not keeping any bounds information being 
discussed back when range types were developed, and the design choice was 
intentional.  Searching the archives for that discussion, I don't find 
anything, probably because I'm not searching for the right keywords.  Anybody 
have a link to that discussion?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Support empty ranges with bounds information

2021-03-02 Thread Mark Dilger



> On Mar 2, 2021, at 8:51 AM, Pantelis Theodosiou  wrote:
> 
> 
> 
> On Tue, Mar 2, 2021 at 3:28 PM Mark Dilger  
> wrote:
> 
> 
> > On Mar 2, 2021, at 5:20 AM, Joel Jacobson  wrote:
> > 
> > it's currently not possible to create an empty range with bounds 
> > information.
> > 
> > This patch tries to improve the situation by keeping the bounds information,
> > and allow accessing it via lower() and upper().
> > 
> > No other semantics have been changed.
> > All tests pass without any changes.
> 
> I recall this issue of empty ranges not keeping any bounds information being 
> discussed back when range types were developed, and the design choice was 
> intentional.  Searching the archives for that discussion, I don't find 
> anything, probably because I'm not searching for the right keywords.  Anybody 
> have a link to that discussion?
> 
> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> 
>  Mark, perhaps you were referring to this discussion?
> https://www.postgresql.org/message-id/4d5534d002250003a...@gw.wicourts.gov

Yes, I believe so.  Thank you for the link.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Support empty ranges with bounds information

2021-03-02 Thread Mark Dilger



> On Mar 2, 2021, at 9:52 AM, Joel Jacobson  wrote:
> 
> On Tue, Mar 2, 2021, at 15:42, Tom Lane wrote:
>> "Joel Jacobson"  writes:
>> > As discussed in the separate thread "[PATCH] regexp_positions ( string 
>> > text, pattern text, flags text ) → setof int4range[]" [1]
>> > it's currently not possible to create an empty range with bounds 
>> > information.
>> 
>> > This patch tries to improve the situation by keeping the bounds 
>> > information,
>> > and allow accessing it via lower() and upper().
>> 
>> I think this is an actively bad idea.  We had a clean set-theoretic
>> definition of ranges as sets of points, and with this we would not.
>> We should not be whacking around the fundamental semantics of a
>> whole class of data types on the basis that it'd be cute to make
>> regexp_position return its result as int4range rather than int[].
> 
> I think there are *lots* of other use-cases where the current semantics of 
> range types are very problematic.

I'm inclined to think that this conversation is ten years too late.  Range 
semantics are already relied upon in our code, but also in the code of others.  
It might be very hard to debug code that was correct when written but broken by 
this proposed change.  The problem is not just with lower() and upper(), but 
with equality testing (mentioned upthread), since code may rely on two 
different "positions" (your word) both being equal, and both sorting the same.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Support empty ranges with bounds information

2021-03-02 Thread Mark Dilger



> On Mar 2, 2021, at 10:08 AM, Joel Jacobson  wrote:
> 
> On Tue, Mar 2, 2021, at 19:01, Mark Dilger wrote:
>> The problem is not just with lower() and upper(), but with equality testing 
>> (mentioned upthread), since code may rely on two different "positions" (your 
>> word) both being equal, and both sorting the same.
> 
> That's why the patch doesn't change equality.

How does that work if I SELECT DISTINCT ON (nr) ... and then take upper(nr)?  
Is it just random which values I get?
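A rough Python analogue of the hazard (not PostgreSQL's code, just the general 
compares-equal-with-hidden-state principle):

```python
# Two "ranges" that compare equal but secretly remember different bounds,
# mimicking the proposed empty-ranges-with-bounds behavior.
class EmptyRange:
    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper
    def __eq__(self, other):
        return True          # all empty ranges compare equal
    def __hash__(self):
        return 0             # keep the eq/hash contract in this toy

rows = [EmptyRange(1, 1), EmptyRange(5, 5), EmptyRange(9, 9)]

# "DISTINCT" keeps one arbitrary representative per group of equal values.
distinct = list({r: None for r in rows})
assert len(distinct) == 1      # the three rows collapse to one
# This toy happens to keep the first row, but SQL's DISTINCT ON with no
# ORDER BY makes no such promise: the surviving bounds are arbitrary.
assert distinct[0].lower == 1
```

Which bounds you then observe via lower()/upper() depends on which 
representative happened to survive, not on anything semantic.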

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Support empty ranges with bounds information

2021-03-02 Thread Mark Dilger



> On Mar 2, 2021, at 10:16 AM, Chapman Flack  wrote:
> 
> On 03/02/21 13:01, Mark Dilger wrote:
>> The problem is not just with lower() and upper(), but with equality testing
>> (mentioned upthread), since code may rely on two different "positions"
>> (your word) both being equal, and both sorting the same.
> 
> Could those concerns be addressed perhaps, not by adding an entirely new
> just-like-a-range-but-remembers-position-when-zero-width type (which would
> feel wartlike to me), but by tweaking ranges to /secretly/ remember the
> position when zero width?
> 
> Secretly, in the sense that upper(), lower(), and the default sort
> operator would keep their established behavior, but new functions like
> upper_or_pos(), lower_or_pos() would return the non-NULL value even for
> an empty range, and another sort operator could be provided for use
> when one wants the ordering to reflect it?

I vaguely recall that ten years ago I wanted zero-width range types to not 
collapse into an empty range.  I can't recall if I ever expressed that opinion 
-- I just remember thinking it would be nice, and for reasons similar to what 
Joel is arguing here.  But I can't see having 
compares-equal-but-not-truly-equal ranges as a good idea.  I think Tom is right 
about that.

I also think the regexp work that inspired this thread could return something 
other than a range, so the motivation for creating a frankenstein range 
implementation doesn't really exist.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Support empty ranges with bounds information

2021-03-02 Thread Mark Dilger



> On Mar 2, 2021, at 11:34 AM, Joel Jacobson  wrote:
> 
> Yes. It's random: since equality isn't changed, the sort operation cannot 
> tell the difference, nor could a user who isn't aware of upper() / lower() 
> reveal the differences.

This sounds unworkable even just in light of the original motivation for this 
whole thread.  If I use your proposed regexp_positions(string text, pattern 
text, flags text) function to parse a large number of "positions" from a 
document, store all those positions in a table, and do a join of those 
positions against something else, it's not going to work.  Positions will 
randomly vanish from the results of that join, which is going to be really 
surprising.  I'm sure there are other examples of Tom's general point about 
compares-equal-but-not-equal datatypes.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Support empty ranges with bounds information

2021-03-02 Thread Mark Dilger



> On Mar 2, 2021, at 11:42 AM, Mark Dilger  wrote:
> 
> 
> 
>> On Mar 2, 2021, at 11:34 AM, Joel Jacobson  wrote:
>> 
>> Yes. It's random: since equality isn't changed, the sort operation cannot 
>> tell the difference, nor could a user who isn't aware of upper() / lower() 
>> reveal the differences.
> 
> This sounds unworkable even just in light of the original motivation for this 
> whole thread.  If I use your proposed regexp_positions(string text, pattern 
> text, flags text) function to parse a large number of "positions" from a 
> document, store all those positions in a table, and do a join of those 
> positions against something else, it's not going to work.  Positions will 
> randomly vanish from the results of that join, which is going to be really 
> surprising.  I'm sure there are other examples of Tom's general point about 
> compares-equal-but-not-equal datatypes.

I didn't phrase that clearly enough.  I'm thinking about whether you include 
the bounds information in the hash function.  The current implementation of 
hash_range(PG_FUNCTION_ARGS) is going to hash the lower and upper bounds, since 
you didn't change it to do otherwise, so "equal" values won't always hash the 
same.  I haven't tested this out, but it seems you could get a different set of 
rows depending on whether the planner selects a hash join.
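The contract at stake can be shown in a few lines of Python (an illustration 
of the general rule, not of hash_range() itself):

```python
class BadEmptyRange:
    """Equality says all empty ranges are the same, but the hash leaks the
    hidden bounds.  This breaks the rule 'x == y implies hash(x) == hash(y)'
    that hash joins and hash aggregates rely on."""
    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper
    def __eq__(self, other):
        return isinstance(other, BadEmptyRange)   # all empty ranges "equal"
    def __hash__(self):
        return self.lower                         # ...but hash the bounds

a, b = BadEmptyRange(1, 1), BadEmptyRange(5, 5)
assert a == b                  # equal according to the comparison operator
s = {a, b}
assert len(s) == 2             # yet hashing fails to merge them
```

A hash-based plan built on such a type can miss matches that a sort-based plan 
finds, which is exactly the kind of plan-dependent result set described above.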

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Support empty ranges with bounds information

2021-03-02 Thread Mark Dilger



> On Mar 2, 2021, at 12:04 PM, Joel Jacobson  wrote:
> 
> On Tue, Mar 2, 2021, at 20:57, Mark Dilger wrote:
>> I didn't phrase that clearly enough.  I'm thinking about whether you include 
>> the bounds information in the hash function.  The current implementation of 
>> hash_range(PG_FUNCTION_ARGS) is going to hash the lower and upper bounds, 
>> since you didn't change it to do otherwise, so "equal" values won't always 
>> hash the same.  I haven't tested this out, but it seems you could get a 
>> different set of rows depending on whether the planner selects a hash join.
> 
> I think this issue is solved by the 
> empty-ranges-with-bounds-information-v2.patch I just sent,
> since with it, there are no semantic changes at all, lower() and upper() 
> works like before.

There are semantic differences, because hash_range() doesn't call lower() and 
upper(), it uses RANGE_HAS_LBOUND and RANGE_HAS_UBOUND, the behavior of which 
you have changed.  I created a regression test and expected results and checked 
after applying your patch, and your patch breaks the hash function behavior.  
Notice that before your patch, all three ranges hashed to the same value, but 
not so after:


@@ -1,18 +1,18 @@
 select hash_range('[a,a)'::textrange);
  hash_range
 ------------
-  484847245
+ -590102690
 (1 row)
 
 select hash_range('[b,b)'::textrange);
  hash_range
 ------------
-  484847245
+  281562732
 (1 row)
 
 select hash_range('[c,c)'::textrange);
- hash_range 
-------------
-  484847245
+ hash_range  
+-------------
+ -1887445565
 (1 row)


You might change how hash_range() works to get all "equal" values to hash the 
same value, but that just gets back to the problem that non-equal things appear 
to be equal.  I guess that's your two-warty-feet preference, but not everyone 
is going to be in agreement on that.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Support empty ranges with bounds information

2021-03-02 Thread Mark Dilger



> On Mar 2, 2021, at 12:51 PM, Joel Jacobson  wrote:
> 
> Yikes. Here be dragons. I think I want my wart free foot back please.
> 
> Many thanks for explaining. I think I’ll abandon this patch. I guess 
> implementing an entirely new range type could be an acceptable solution, but 
> that’s too big of a project for me to manage on my own. If any more 
> experienced hackers are interested in such a project, I would love to help if 
> I can.

Part of what was strange about arguing against your patch is that I kind of 
wanted the feature to work that way back when it originally got written.  (Not 
to say that it *should* have worked that way, just that part of me wanted it.)


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-03-03 Thread Mark Rofail
Hello Justin,

> It doesn't just rebase: it also removes the test which was failing on
> windows CI:

My apologies, I didn’t include it in the changelog since this is not a code
change; I just wanted to see if any other test would fail on the windows CI.

> I think the SELECT, when it works, is actually doing a seq scan and not
> using the index.  On my PC, the index scan is used until an
> autovacuum/analyze run, after which it uses seqscan.  I'm not sure how the
> other CIs all managed to run autovacuum between creating a table and
> running a query on it, though.

This is genius! That explains it. I have been racking my brain for two
weeks now and you figured it out.

> I guess you should first run the query with "explain (costs off)" to show
> what plan it's using, and add things like "SET enable_seqscan=off" as
> needed to guarantee that everyone will use the same plan, regardless of
> minor cost differences and vacuum timing.

I think that will solve the test discrepancy.

Honestly Justin, hats off!

/Mark



Re: pg_amcheck contrib application

2021-03-04 Thread Mark Dilger



> On Mar 4, 2021, at 2:04 PM, Peter Geoghegan  wrote:
> 
> On Thu, Mar 4, 2021 at 7:29 AM Robert Haas  wrote:
>> I think this whole approach is pretty suspect because the number of
>> blocks in the relation can increase (by relation extension) or
>> decrease (by VACUUM or TRUNCATE) between the time when we query for
>> the list of target relations and the time we get around to executing
>> any queries against them. I think it's OK to use the number of
>> relation pages for progress reporting because progress reporting is
>> only approximate anyway, but I wouldn't print them out in the progress
>> messages, and I wouldn't try to fix up the startblock and endblock
>> arguments on the basis of how long you think that relation is going to
>> be.
> 
> I don't think that the struct AmcheckOptions block fields (e.g.,
> startblock) should be of type 'long' -- that doesn't work well on
> Windows, where 'long' is only 32-bit. To be fair we already do the
> same thing elsewhere, but there is no reason to repeat those mistakes.
> (I'm rather suspicious of 'long' in general.)
> 
> I think that you could use BlockNumber + strtoul() without breaking Windows.

Fair enough.

>> There are a LOT of things that can go wrong when we go try to run
>> verify_heapam on a table. The table might have been dropped; in fact,
>> on a busy production system, such cases are likely to occur routinely
>> if DDL is common, which for many users it is. The system catalog
>> entries might be screwed up, so that the relation can't be opened.
>> There might be an unreadable page in the relation, either because the
>> OS reports an I/O error or something like that, or because checksum
>> verification fails. There are various other possibilities. We
>> shouldn't view such errors as low-level things that occur only in
>> fringe cases; this is a corruption-checking tool, and we should expect
>> that running it against messed-up databases will be common. We
>> shouldn't try to interpret the errors we get or make any big decisions
>> about them, but we should have a clear way of reporting them so that
>> the user can decide what to do.
> 
> I agree.
> 
> Your database is not supposed to be corrupt. Once your database has
> become corrupt, all bets are off -- something happened that was
> supposed to be impossible -- which seems like a good reason to be
> modest about what we think we know.
> 
> The user should always see the unvarnished truth. pg_amcheck should
> not presume to suppress errors from lower level code, except perhaps
> in well-scoped special cases.

I think Robert mistook why I was doing that.  I was thinking about a different 
usage pattern.  If somebody thinks a subset of relations has been badly 
corrupted, but doesn't know which relations those might be, they might try to 
find them with pg_amcheck, wanting to check just the first few blocks per 
relation in order to sample the relations.  So,

  pg_amcheck --startblock=0 --endblock=9 --no-dependent-indexes

or something like that.  I don't think it's very fun to have it error out for 
each relation that doesn't have at least ten blocks, nor is it fun to have 
those relations skipped by erroring out before checking any blocks, as they 
might be the corrupt relations you are looking for.  But using --startblock and 
--endblock for this is not a natural fit, as evidenced by how I was trying to 
"fix things up" for the user, so I'll punt on this usage until some future 
version, when I might add a sampling option.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: proposal: psql –help reflecting service or URI usage

2021-03-08 Thread Mark Dilger



> On Mar 6, 2021, at 5:55 AM, Paul Förster  wrote:
> 
> Hi Mark,
> 
> sorry for the delay.
> 
>> On 01. Mar, 2021, at 17:02, Mark Dilger  wrote:
>> 
>> The output from --help should fit in a terminal window with only 80 
>> characters width.  For example, in src/bin/scripts/createuser.c the line 
>> about --interactive is wrapped:
> 
> I see.
> 
>> You can find counter-examples where applications do not follow this rule.  
>> pg_isready is one of them.
> 
> yes, I noticed that.
> 
>> There is a pattern in the client applications that the --help output is less 
>> verbose than the docs (see, for example, 
>> https://www.postgresql.org/docs/13/reference-client.html).  Your proposed 
>> patch makes psql's --help output a bit more verbose about this issue while 
>> leaving the other applications less so, without any obvious reason for the 
>> difference.
> 
> I could do the other tools too, that wouldn't be a problem. But I'm not good 
> at writing docs. And I found that the man pages would have to be adapted too, 
> which would be a big one for me as I'm no man guru.

Fortunately, the man pages and html docs are generated from the same sources.  
Those sources are written in sgml, and the tools to build the docs must be 
installed.  From the top directory, execute `make docs` and if it complains 
about missing tools you will need to install them.  (The build target is 
'docs', but the directory containing the docs is named 'doc'.)

>> Over the years, many proposals get made and debated, with some accepted and 
>> some rejected.  The rejected proposals often have some merit, and get 
>> suggested again, only to get rejected for the same reasons as the previous 
>> times they were suggested.  So searching the archives before posting a patch 
>> can sometimes be enlightening.  The difficulty in my experience is knowing 
>> what words and phrases to search for.  It can be a bit time consuming trying 
>> to find a prior discussion on a topic.
> 
> I've searched the archives for quite some time and found tons of hits for the 
> search term "help". But that's useless because all people ask for "help". :-) 
> Besides that, I found nothing which even remotely discussed the topic.
> 
> But I generally see your points so I drop the proposal. It was only an idea 
> after all :-) Thanks for your input.

Oh, I'm quite sorry to hear that.  The process of getting a patch accepted, 
especially the first time you submit one, can be discouraging.  But the 
community greatly benefits from new contributors joining the effort, so I'd 
much rather you not withdraw the idea.

If you need help with certain portions of the submission, such as editing the 
docs, I can help with that.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Mark Dilger



> On Mar 5, 2021, at 11:46 AM, Joel Jacobson  wrote:
> 
> 
> /Joel
> <0003-regexp-positions.patch>

I did a bit more testing:

+SELECT regexp_positions('foobarbequebaz', 'b', 'g');
+ regexp_positions 
+--
+ {"[3,5)"}
+ {"[6,8)"}
+ {"[11,13)"}
+(3 rows)
+

I understand that these ranges are intended to be read as one-character matches 
starting at positions 3, 6, and 11, but they look like they match either two or 
three characters, depending on how you read them, and users will likely be 
confused by that.

+SELECT regexp_positions('foobarbequebaz', '(?=beque)', 'g');
+ regexp_positions 
+--
+ {"[6,7)"}
+(1 row)
+

This is a zero length match.  As above, it might be confusing that a zero 
length match reads this way.

+SELECT regexp_positions('foobarbequebaz', '(?<=z)', 'g');
+ regexp_positions 
+--
+ {"[14,15)"}
+(1 row)
+

Same here, except this time position 15 is referenced, which is beyond the end 
of the string.

I think a zero length match at the end of this string should read as 
{"[14,14)"}, and you have been forced to add one to avoid that collapsing down 
to "empty", but I'd rather you found a different datatype than abuse the 
definition of int4range.
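For comparison, Python's re module exposes exactly these positions as 
zero-based, half-open [start, end) spans, including the zero-width match at 
offset 14 (shown only to illustrate the convention, not as a proposal for the 
patch):

```python
import re

s = 'foobarbequebaz'

# One-character matches: span (3, 4) means "starts at offset 3, ends before
# offset 4" -- exactly one character, with no off-by-one padding.
spans = [m.span() for m in re.finditer('b', s)]
assert spans == [(3, 4), (6, 7), (11, 12)]

# A zero-width lookbehind match at the very end of the string: span (14, 14),
# i.e. lower == upper -- the position an int4range would collapse to 'empty'
# and lose.
end_spans = [m.span() for m in re.finditer('(?<=z)', s)]
assert end_spans == [(14, 14)]
```

Under that convention zero-width matches keep their position naturally, which 
is the property int4range cannot represent without distortion.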

It seems that you may have reached a similar conclusion down-thread?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: proposal: psql --help reflecting service or URI usage

2021-03-08 Thread Mark Dilger



> On Mar 8, 2021, at 8:40 AM, Paul Förster  wrote:
> 
> Hi Mark,
> 
>> On 08. Mar, 2021, at 16:39, Mark Dilger  wrote:
>> 
>> Fortunately, the man pages and html docs are generated from the same 
>> sources.  Those sources are written in sgml, and the tools to build the docs 
>> must be installed.  From the top directory, execute `make docs` and if it 
>> complains about missing tools you will need to install them.  (The build 
>> target is 'docs', but the directory containing the docs is named 'doc'.)
> 
> so the help files I'd change would be doc/src/sgml/ref/psql-ref.sgml, 
> doc/src/sgml/ref/pg_isready.sgml, etc.?

Yes

>> Oh, I'm quite sorry to hear that.  The process of getting a patch accepted, 
>> especially the first time you submit one, can be discouraging.  But the 
>> community greatly benefits from new contributors joining the effort, so I'd 
>> much rather you not withdraw the idea.
> 
> I'd like to, and also I'd like to do all the bin/* tools (including wrapping 
> the long line in pg_isready ;-)), as you suggested, but I don't know the 
> process. In my first admittedly naive attempt, I just downloaded the source 
> from https://www.postgresql.org/ftp/source/v13.2, unpacked it and made my 
> changes there. Then I did a diff to the original and posted it here. I don't 
> even know if this is the correct workflow. I saw gitgub being mentioned a 
> couple of times but I don't have an account, nor do I even know how it works.
> 
> I was pretty surprised to see the lines in PWN:
> 
> "Paul Förster sent in a patch to mention database URIs in psql's --help 
> output."
> "Paul Förster sent in another revision of a patch to mention URIs and 
> services in psql --help's output."
> 
> Is there a FAQ somewhere that describes how properly create patches, submit 
> them and possibly get them released? Something like a step-by-step?
> 
> Is github a must-have here?

No, github is not a must-have.  I don't use a github account for my 
submissions.  The project uses git for source code control, but that's not the 
same thing as requiring "github".  The project switched from cvs to git a few 
years back.

If you can install git, using rpm or yum or whatever, then from the command 
line you can use

  git clone https://git.postgresql.org/git/postgresql.git

and it will create a working directory for you.  You can make modifications and 
commit them.  When you are finished, you can run

  git format-patch -v 1 master

and it will create a patch set containing all your changes relative to the 
public sources you cloned, and the patch will include your commit messages, 
which helps reviewers not only know what you changed, but why you made the 
changes, in your own words.

See https://wiki.postgresql.org/wiki/Development_information

>> If you need help with certain portions of the submission, such as editing 
>> the docs, I can help with that.
> 
> as you see above, I'm curious to learn, though doing it to all the tools will 
> take some time for me.
> 
> Sorry, I'm a noob, not so much to C, but to the workflows here. Hence my 
> questions may seem a little obvious to all the pros.

That's not a problem.  If this gets too verbose for the list, we can take this 
off-list and I can still walk you through it.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Mark Dilger



> On Mar 8, 2021, at 9:05 AM, Joel Jacobson  wrote:
> 
> If a N+1 dimension array could easily be unnested to a N dimension array,
> I would prefer Tom's idea of a 2-D regexp_positions(), since it simple and 
> not controversial.

How about proposing some array functions to go along with the regexp_positions, 
and then do it that way?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Mark Dilger



> On Mar 8, 2021, at 9:20 AM, Joel Jacobson  wrote:
> 
> On Mon, Mar 8, 2021, at 18:11, Mark Dilger wrote:
>> > On Mar 8, 2021, at 9:05 AM, Joel Jacobson  wrote:
>> > 
>> > If a N+1 dimension array could easily be unnested to a N dimension array,
>> > I would prefer Tom's idea of a 2-D regexp_positions(), since it simple and 
>> > not controversial.
>> 
>> How about proposing some array functions to go along with the 
>> regexp_positions, and then do it that way?
> 
> Sounds like a nice solution. That would be a huge win when dealing with 
> multidimensional arrays in general.
> 
> Do we have strong support on the list for such a function? If so, I can make 
> an attempt implementing it, unless some more experienced hacker wants to do 
> it.

That's a hard question to answer in advance.  Typically, you need to propose a 
solution, and then get feedback.  You wouldn't need to post a patch, but 
perhaps some examples of how you would expect it to work, like

+SELECT slice('{{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}}'::integer[][], '[2,4)'::int4range);
+   slice   
+-----------
+ {{2,3}}
+ {{6,7}}
+ {{10,11}}
+ {{14,15}}
+(4 rows)
+
+SELECT slice('{{{1,2,3},{4,5,6},{7,8,9}},{{10,11,12},{13,14,15},{16,17,18}}}'::integer[][][], '[2,4)'::int4range);
+   slice   
+-----------
+ {{{4,5,6},{7,8,9}}}
+ {{{13,14,15},{16,17,18}}}
+(2 rows)
+

and then people can tell you why they hate that choice of interface.
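To make the proposal concrete for reviewers, here is a rough sketch (Python; the semantics are just my reading of the mocked-up output above, not a settled interface) of what such a slice function might compute:

```python
def slice_outer(arr, lo, hi):
    # For each outer element, keep inner positions [lo, hi)
    # (1-based, half-open, mirroring the int4range argument),
    # yielding one output row per outer element.
    return [[row[lo - 1:hi - 1]] for row in arr]

two_d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
assert slice_outer(two_d, 2, 4) == [[[2, 3]], [[6, 7]], [[10, 11]], [[14, 15]]]
```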

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-08 Thread Mark Dilger



> On Mar 8, 2021, at 8:26 AM, Robert Haas  wrote:
> 
> On Thu, Mar 4, 2021 at 5:39 PM Mark Dilger  
> wrote:
>> I think Robert mistook why I was doing that.  I was thinking about a 
>> different usage pattern.  If somebody thinks a subset of relations have been 
>> badly corrupted, but doesn't know which relations those might be, they might 
>> try to find them with pg_amcheck, but wanting to just check the first few 
>> blocks per relation in order to sample the relations.  So,
>> 
>>  pg_amcheck --startblock=0 --endblock=9 --no-dependent-indexes
>> 
>> or something like that.  I don't think it's very fun to have it error out 
>> for each relation that doesn't have at least ten blocks, nor is it fun to 
>> have those relations skipped by error'ing out before checking any blocks, as 
>> they might be the corrupt relations you are looking for.  But using 
>> --startblock and --endblock for this is not a natural fit, as evidenced by 
>> how I was trying to "fix things up" for the user, so I'll punt on this usage 
>> until some future version, when I might add a sampling option.
> 
> I admit I hadn't thought of that use case. I guess somebody could want
> to do that, but it doesn't seem all that useful. Checking the first
> up-to-ten blocks of every relation is not a very representative
> sample, and it's not clear to me that sampling is a good idea even if
> it were representative. What good is it to know that 10% of my
> database is probably not corrupted?


`cd $PGDATA; tar xfz my_csv_data.tgz` ctrl-C ctrl-C ctrl-C
`rm -rf $PGDATA` ctrl-C ctrl-C ctrl-C
`/my/stupid/backup/and/restore/script.sh` ctrl-C ctrl-C ctrl-C

# oh wow, i wonder if any relations got overwritten with csv file data, or had 
their relation files unlinked, or ...?

`pg_amcheck --jobs=8 --startblock=0 --endblock=10`

# ah, darn, it's spewing lots of irrelevant errors because some relations are 
too short

`pg_amcheck --jobs=8 --startblock=0 --endblock=0`

# ah, darn, it's still spewing lots of irrelevant errors because I have lots of 
indexes with zero blocks of data

`pg_amcheck --jobs=8`

# ah, darn, it's taking forever, because it's processing huge tables in their 
entirety

I agree this can be left to later, and the --startblock and --endblock options 
are the wrong way to do it.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Lowering the ever-growing heap->pd_lower

2021-03-09 Thread Mark Dilger



> On Mar 9, 2021, at 7:13 AM, Matthias van de Meent 
>  wrote:
> 
> Hi,
> 
> The heap AMs' pages only grow their pd_linp array, and never shrink
> when trailing entries are marked unused. This means that up to 14% of
> free space (=291 unused line pointers) on a page could be unusable for
> data storage, which I think is a shame. With a patch in the works that
> allows the line pointer array to grow up to one third of the size of
> the page [0], it would be quite catastrophic for the available data
> space on old-and-often-used pages if this could not ever be reused for
> data.
> 
> The shrinking of the line pointer array is already common practice in
> indexes (in which all LP_UNUSED items are removed), but this specific
> implementation cannot be used for heap pages due to ItemId
> invalidation. One available implementation, however, is that we
> truncate the end of this array, as mentioned in [1]. There was a
> warning at the top of PageRepairFragmentation about not removing
> unused line pointers, but I believe that was about not removing
> _intermediate_ unused line pointers (which would imply moving in-use
> line pointers); as far as I know there is nothing that relies on only
> growing page->pd_lower, and nothing keeping us from shrinking it
> whilst holding a pin on the page.
> 
> Please find attached a fairly trivial patch which detects the last
> unused entry on a page, and truncates the pd_linp array to that entry,
> effectively freeing 4 bytes per line pointer truncated away (up to
> 1164 bytes for pages with MaxHeapTuplesPerPage unused lp_unused
> lines).
> 
> One unexpected benefit from this patch is that the PD_HAS_FREE_LINES
> hint bit optimization can now be false more often, increasing the
> chances of not having to check the whole array to find an empty spot.
> 
> Note: This does _not_ move valid ItemIds, it only removes invalid
> (unused) ItemIds from the end of the space reserved for ItemIds on a
> page, keeping valid linepointers intact.
> 
> 
> Enjoy,
> 
> Matthias van de Meent
> 
> [0] 
> https://www.postgresql.org/message-id/flat/cad21aod0ske11fmw4jd4renawbmcw1wasvnwpjvw3tvqpoq...@mail.gmail.com
> [1] 
> https://www.postgresql.org/message-id/CAEze2Wjf42g8Ho%3DYsC_OvyNE_ziM0ZkXg6wd9u5KVc2nTbbYXw%40mail.gmail.com
> 

For a prior discussion on this topic:

https://www.postgresql.org/message-id/2e78013d0709130606l56539755wb9dbe17225ffe90a%40mail.gmail.com
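For readers following along, the truncation rule itself is simple; a toy sketch (Python, illustrative only; the real change lowers pd_lower on the page rather than operating on a list):

```python
LP_UNUSED = 0  # stand-in for an unused ItemId

def truncated_count(item_ids):
    # Number of entries to keep: everything up to and including the
    # last used entry.  Trailing unused slots can be reclaimed by
    # lowering pd_lower; intermediate unused slots must stay so that
    # no valid ItemId ever moves.
    n = len(item_ids)
    while n > 0 and item_ids[n - 1] == LP_UNUSED:
        n -= 1
    return n

assert truncated_count([1, LP_UNUSED, 2, LP_UNUSED, LP_UNUSED]) == 3
assert truncated_count([LP_UNUSED] * 4) == 0
```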

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Lowering the ever-growing heap->pd_lower

2021-03-09 Thread Mark Dilger



> On Mar 9, 2021, at 1:35 PM, Tom Lane  wrote:
> 
> So, to accept a patch that shortens the line pointer array, what we need
> to do is verify that every such code path checks for an out-of-range
> offset before trying to fetch the target line pointer.  I believed
> back in 2007 that there were, or once had been, code paths that omitted
> such a range check, assuming that they could trust the TID they had
> gotten from $wherever to point at an extant line pointer array entry.
> Maybe things are all good now, but I think you should run around and
> examine every place that checks for tuple deadness to see if the offset
> it used is known to be within the current page bounds.

Much as Pavan asked [1], I'm curious how we wouldn't already be in trouble if 
such code exists?  In such a scenario, what stops a dead line pointer from 
being reused (rather than garbage collected by this patch) prior to such 
hypothetical code using an outdated TID?

I'm not expressing a view here, just asking questions.

[1] 
https://www.postgresql.org/message-id/2e78013d0709130832t31244e79k9488a3e4eb00d64c%40mail.gmail.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-11 Thread Mark Dilger



> On Mar 11, 2021, at 12:12 AM, Peter Eisentraut 
>  wrote:
> 
> (I'm still not a fan of adding more client-side tools whose sole task is to 
> execute server-side functionality in a slightly filtered way, but it seems 
> people are really interested in this, so ...)
> 
> I want to register, if we are going to add this, it ought to be in src/bin/.  
> If we think it's a useful tool, it should be there with all the other useful 
> tools.

I considered putting it in src/bin/scripts where reindexdb and vacuumdb also 
live.  It seems most similar to those two tools.

> I realize there is a dependency on a module in contrib, and it's probably now 
> not the time to re-debate reorganizing contrib.  But if we ever get to that, 
> this program should be the prime example why the current organization is 
> problematic, and we should be prepared to make the necessary moves then.

Before settling on contrib/pg_amcheck as the location, I checked whether any 
tools under src/bin had dependencies on a contrib module, and couldn't find any 
current examples.  (There seems to have been one in the past, though I forget 
which that was at the moment.)

I have no argument with changing the location of this tool before it gets 
committed, but I wonder if we should do that now, or wait until some future 
time when contrib gets reorganized?  I can't quite tell which you prefer from 
your comments above.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-11 Thread Mark Dilger



> On Mar 11, 2021, at 3:36 AM, Andrey Borodin  wrote:
> 
> 
> 
>> 11 марта 2021 г., в 13:12, Peter Eisentraut 
>>  написал(а):
>> 
>> client-side tools whose sole task is to execute server-side functionality in 
>> a slightly filtered way
> 
> By the way, can we teach pg_amcheck to verify database without creating local 
> PGDATA and using bare minimum of file system quota?

pg_amcheck does not need a local data directory to check a remote database 
server, though it does need to connect to that server.  The local file system 
quota should not be a problem, as pg_amcheck does not download and save any 
data to disk.  I am uncertain if this answers your question.  If you are 
imagining pg_amcheck running on the same server as the database cluster, then 
of course running pg_amcheck puts a burden on the server to read all the 
relation files necessary, much as running queries over the same relations would 
do.

> We can implement a way for a pg_amcheck to ask for some specific file, which 
> will be downloaded by backup tool and streamed to pg_amcheck.
> E.g. pg_amcheck could have a restore_file_command = 'backup-tool 
> bring-my-file %backup_id %file_name' and probably 
> list_files_command='backup-tool list-files %backup_id'. And pg_amcheck could 
> then fetch bare minimum of what is needed.
> 
> I see that this is somewhat orthogonal idea, but from my POV interesting one.

pg_amcheck is not designed to detect corruption directly, but rather to open 
one or more connections to the database and execute sql queries which employ 
the contrib/amcheck sql functions.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-11 Thread Mark Dilger



> On Mar 11, 2021, at 9:10 AM, Andrey Borodin  wrote:
> 
> 
> 
>> 11 марта 2021 г., в 20:30, Mark Dilger  
>> написал(а):
>> 
>> 
>> pg_amcheck does not need a local data directory to check a remote database 
>> server, though it does need to connect to that server.
> No, I mean it it would be great if we did not need to materialise whole DB 
> anywhere. Let's say I have a backup of 10Tb cluster in S3. And don't have 
> that clusters hardware anymore. I want to spawn tiny VM with few GiBs of RAM 
> and storage no larger than biggest index within DB + WAL from start to end. 
> And stream-check all backup, mark it safe and sleep well. It would be perfect 
> if we could do backup verification at cost of corruption monitoring (and not 
> vice versa, which is trivial).

Thanks for clarifying.  I agree that would be useful.  I don't see any way to 
make that part of this project, but maybe after the v14 cycle you'll look over 
the code and propose a way forward for that?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-12 Thread Mark Dilger


> On Mar 12, 2021, at 1:43 PM, Robert Haas  wrote:
> 
> On Fri, Mar 12, 2021 at 2:31 PM Robert Haas  wrote:
>> I'll commit something shortly to address these.
> 
> There are some interesting failures in the test cases on the
> buildfarm. One of the tests ($offnum == 13) corrupts the TOAST pointer
> with a garbage value, expecting to get the message "final toast chunk
> number 0 differs from expected value 6". But on florican and maybe
> other systems we instead get "final toast chunk number 0 differs from
> expected value 5". That's because the value of TOAST_MAX_CHUNK_SIZE
> depends on MAXIMUM_ALIGNOF. I think that on 4-byte alignment systems
> it works out to 2000 and on 8-byte alignment systems it works out to
> 1996, and the value being stored is 1 bytes, hence the problem.
> The place where the calculation goes different seems to be in
> MaximumBytesPerTuple(), where it uses MAXALIGN_DOWN() on a value that,
> according to my calculations, will be 2038 on all platforms, but the
> output of MAXALIGN_DOWN() will be 2032 or 2036 depending on the
> platform. I think the solution to this is just to change the message
> to match \d+ chunks instead of exactly 6. We should do that right away
> to avoid having the buildfarm barf.
> 
> But, I also notice a couple of other things I think could be improved here:
> 
> 1. amcheck is really reporting the complete absence of any TOAST rows
> here due to a corrupted va_valueid. It could pick a better phrasing of
> that message than "final toast chunk number 0 differs from expected
> value XXX". I mean, there is no chunk 0. There are no chunks at all.
> 
> 2. Using S as the perl unpack code for the varlena header is
> not ideal, because it's really 2 1-byte fields followed by 4 4-byte
> fields. So I think you should be using CCllLL, for two unsigned bytes
> and then two signed 4-byte quantities and then two unsigned 4-byte
> quantities. I think if you did that you'd be overwriting the
> va_valueid with the *same* garbage value on every platform, which
> would be better than different ones. Perhaps when we improve the
> message as suggested in (1) this will become a live issue, since we
> might choose to say something like "no TOAST entries for value %u".
> 
> -- 
> Robert Haas
> EDB: http://www.enterprisedb.com


This does nothing to change the verbiage from contrib/amcheck, but it should 
address the problems discussed here in pg_amcheck's regression tests.
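For anyone reproducing this outside Perl, the pack codes under discussion translate to Python's struct module roughly as follows (the field layout is taken from the discussion above and is an assumption, not authoritative):

```python
import struct

# Assumed toast-pointer layout: two unsigned bytes (va_header,
# va_vartag), two signed 4-byte ints (va_rawsize, va_extsize), then
# two unsigned 4-byte ints (va_valueid, va_toastrelid).  Perl's
# "CCllLL" is roughly struct format "<BBiiII" (no padding).
FMT = '<BBiiII'
raw = struct.pack(FMT, 0x01, 0x12, 0x2714, 0x2710, 0xDEAD, 0xBEEF)
assert struct.calcsize(FMT) == 18
assert struct.unpack(FMT, raw) == (0x01, 0x12, 0x2714, 0x2710, 0xDEAD, 0xBEEF)
```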



v1-0001-Fixing-portability-issues-in-pg_amcheck-regressio.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-12 Thread Mark Dilger


> On Mar 12, 2021, at 2:55 PM, Robert Haas  wrote:
> 
> On Fri, Mar 12, 2021 at 5:24 PM Mark Dilger
>  wrote:
>> This does nothing to change the verbiage from contrib/amcheck, but it should 
>> address the problems discussed here in pg_amcheck's regression tests.
> 
> Committed.

Thanks.

There are two more, attached here.  The first deals with error message text 
which differs between farm animals, and the second deals with an apparent 
problem with IPC::Run shell expanding an asterisk on some platforms but not 
others.  That second one, if true, seems like a problem with scope beyond the 
pg_amcheck project, as TestLib::command_checks_all uses IPC::Run, and it would 
be desirable to have consistent behavior across platforms.




v2-0001-Fixing-pg_amcheck-regression-test-portability-iss.patch
Description: Binary data


v2-0002-Working-around-apparent-difficulty-in-IPC-Run.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-12 Thread Mark Dilger



> On Mar 12, 2021, at 11:24 AM, Robert Haas  wrote:
> 
> On Fri, Mar 12, 2021 at 2:05 PM  wrote:
>> I think there is a formatting glitch in lines like:
>> 
>> 2/4 relations (50%) 187977/187978 pages (99%), (testdb   
>>   )
>> 
>> I suppose that last part should show up trimmed as '(testdb)', right?
> 
> Actually I think this is intentional. The idea is that as the line is
> rewritten we don't want the close-paren to move around. It's the same
> thing pg_basebackup does with its progress reporting.
> 
> Now that is not to say that some other behavior might not be better,
> just that Mark was copying something that already exists, probably
> because he knows that I'm finnicky about consistency

I think Erik's test case only checked one database, which might be why it 
looked odd to him.  But consider:

  pg_amcheck -d foo -d bar -d myreallylongdatabasename -d myshortername -d baz 
--progress

The tool will respect your database ordering, and check foo, then bar, etc.  If 
you have --jobs greater than one, it will start checking some relations in bar 
before finishing all relations in foo, but with a fudge factor, pg_amcheck can 
report that the processing has moved on to database bar, etc.

You wouldn't want the parens to jump around when the long database names get 
processed.  So it keeps the parens in the same location, space pads shorter 
database names, and truncates overlong database names.
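The behavior is plain fixed-width padding and truncation; a quick sketch (Python, illustrative; the field width here is arbitrary, not what pg_amcheck actually uses):

```python
def progress_suffix(dbname, width=12):
    # Pad short names and truncate long ones to a fixed field width,
    # so the closing paren never moves as the line is rewritten.
    return '(%-*.*s)' % (width, width, dbname)

assert progress_suffix('testdb') == '(' + 'testdb'.ljust(12) + ')'
assert progress_suffix('myreallylongdatabasename') == '(myreallylong)'
```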


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-12 Thread Mark Dilger


> On Mar 12, 2021, at 3:24 PM, Mark Dilger  wrote:
> 
> and the second deals with an apparent problem with IPC::Run shell expanding 
> an asterisk on some platforms but not others.  That second one, if true, 
> seems like a problem with scope beyond the pg_amcheck project, as 
> TestLib::command_checks_all uses IPC::Run, and it would be desirable to have 
> consistent behavior across platforms.

The problem with IPC::Run appears to be real, though I might just need to wait 
longer for the farm animals to prove me wrong about that.  But there is a 
similar symptom caused by an unrelated problem, one entirely my fault and 
spotted by Robert.  Here is a patch:



v3-0001-Avoid-use-of-non-portable-option-ordering-in-comm.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-12 Thread Mark Dilger


> On Mar 12, 2021, at 5:16 PM, Robert Haas  wrote:
> 
> Gah, tests are so annoying. :-)

There is another problem of non-portable option ordering in the tests.



v4-0001-pg_amcheck-Keep-trying-to-fix-the-tests.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-12 Thread Mark Dilger



> On Mar 12, 2021, at 9:08 PM, Tom Lane  wrote:
> 
> I wrote:
>> Don't almost all of the following tests have the same issue?
> 
> Ah, nevermind, I was looking at an older version of 003_check.pl.
> I concur that 24189277f missed only one here.
> 
> Pushed your fix.
> 
>   regards, tom lane

Thanks!  Was just responding to your other email, but now I don't have to send 
it.

Sorry for painting so many farm animals red this evening.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-12 Thread Mark Dilger



> On Mar 12, 2021, at 10:16 PM, Noah Misch  wrote:
> 
> On Sat, Mar 13, 2021 at 01:07:15AM -0500, Tom Lane wrote:
>> I wrote:
>>>> ... btw, prairiedog (which has a rather old Perl) has a
>>>> different complaint:
>>>> Invalid type 'q' in unpack at t/004_verify_heapam.pl line 104.
>> 
>>> Hmm ... "man perlfunc" on that system quoth
>>>   q   A signed quad (64-bit) value.
>>>   Q   An unsigned quad value.
>>> (Quads are available only if your system supports 
>>> 64-bit
>>>  integer values _and_ if Perl has been compiled to 
>>> support those.
>>>  Causes a fatal error otherwise.)
>>> It does not seem unreasonable for us to rely on Perl having that
>>> in 2021, so I'll see about upgrading this perl installation.
>> 
>> Hm, wait a minute: hoverfly is showing the same failure, even though
>> it claims to be running a 64-bit Perl.  Now I'm confused.
> 
> On that machine:
> 
> [nm@power8-aix 7:0 2021-03-13T06:09:08 ~ 0]$ /usr/bin/perl64 -e 'unpack "q", 
> ""'
> [nm@power8-aix 7:0 2021-03-13T06:09:10 ~ 0]$ /usr/bin/perl -e 'unpack "q", ""'
> Invalid type 'q' in unpack at -e line 1.
> 
> hoverfly does configure with PERL=perl64.  /usr/bin/prove is from the 32-bit
> Perl, so I suspect the TAP suites get 32-bit Perl that way.  (There's no
> "prove64".)  This test should unpack the field as two 32-bit values, not a
> 64-bit value, to avoid requiring more from the Perl installation.

I will post a modified test in a bit that avoids using Q/q.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-12 Thread Mark Dilger



> On Mar 12, 2021, at 10:22 PM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>> On Mar 12, 2021, at 10:16 PM, Noah Misch  wrote:
>>> hoverfly does configure with PERL=perl64.  /usr/bin/prove is from the 32-bit
>>> Perl, so I suspect the TAP suites get 32-bit Perl that way.  (There's no
>>> "prove64".)
> 
> Oh, that's annoying.
> 
>>> This test should unpack the field as two 32-bit values, not a
>>> 64-bit value, to avoid requiring more from the Perl installation.
> 
>> I will post a modified test in a bit that avoids using Q/q.
> 
> Coping with both endiannesses might be painful.

Not too bad if the bigint value is zero, as both the low and high 32 bits will 
be zero, regardless of endianness.  The question is whether that gives up too 
much in terms of what the test is trying to do.  I'm not sure that it does, but 
if you'd rather solve this by upgrading perl, that's ok by me. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-12 Thread Mark Dilger


> On Mar 12, 2021, at 10:28 PM, Mark Dilger  
> wrote:
> 
> 
> 
>> On Mar 12, 2021, at 10:22 PM, Tom Lane  wrote:
>> 
>> Mark Dilger  writes:
>>> On Mar 12, 2021, at 10:16 PM, Noah Misch  wrote:
>>>> hoverfly does configure with PERL=perl64.  /usr/bin/prove is from the 
>>>> 32-bit
>>>> Perl, so I suspect the TAP suites get 32-bit Perl that way.  (There's no
>>>> "prove64".)
>> 
>> Oh, that's annoying.
>> 
>>>> This test should unpack the field as two 32-bit values, not a
>>>> 64-bit value, to avoid requiring more from the Perl installation.
>> 
>>> I will post a modified test in a bit that avoids using Q/q.
>> 
>> Coping with both endiannesses might be painful.
> 
> Not too bad if the bigint value is zero, as both the low and high 32bits will 
> be zero, regardless of endianness.  The question is whether that gives up too 
> much in terms of what the test is trying to do.  I'm not sure that it does, 
> but if you'd rather solve this by upgrading perl, that's ok by me. 


I'm not advocating that this be the solution, but if we don't fix up the perl 
end of it, then this test change might be used instead.



v5-0001-pg_amcheck-continuing-to-fix-portability-problems.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-12 Thread Mark Dilger



> On Mar 12, 2021, at 10:36 PM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>> On Mar 12, 2021, at 10:22 PM, Tom Lane  wrote:
>>> Coping with both endiannesses might be painful.
> 
>> Not too bad if the bigint value is zero, as both the low and high 32bits 
>> will be zero, regardless of endianness.  The question is whether that gives 
>> up too much in terms of what the test is trying to do.  I'm not sure that it 
>> does, but if you'd rather solve this by upgrading perl, that's ok by me. 
> 
> I don't mind updating the perl installations on prairiedog and gaur,
> but Noah might have some difficulty with his AIX flotilla, as I believe
> he's not sysadmin there.
> 
> You might think about using some symmetric-but-not-zero value,
> 0x01010101 or the like.

I thought about that, but I'm not sure that it proves much more than just using 
zero.  The test doesn't really do much of interest with this value, and it 
doesn't seem worth complicating the test.  The idea originally was that perl's 
"q" pack code would make reading/writing a number such as 12345678 easy, but 
since it's not easy, this is easy.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-03-13 Thread Mark Rofail
>
>
> This still fails for CI (windows) and me (linux)
>
> Can you provide more information about your Linux platform? So I may test
> the errors for myself?
>
Seems the new tests fail on every CI. That's good, honestly, that we found
this problem.

The `arrays` regression test extensively tests the new operators. So I think
the problem arises when the operators are combined with the GIN index.

The problem is that the tests don’t fail on my linux build. Any idea how to
replicate the failed tests so I can debug?

/Mark


Re: pg_amcheck contrib application

2021-03-13 Thread Mark Dilger


> On Mar 13, 2021, at 6:50 AM, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> I think it would be good to use a non-zero value here. We're doing a
>> lot of poking into raw bytes here, and if something goes wrong, a zero
>> value is more likely to look like something normal than whatever else.
>> I suggest picking a value where all 8 bytes are the same, but not
>> zero, and ideally chosen so that they don't look much like any of the
>> surrounding bytes.
> 
> Actually, it seems like we can let pack/unpack deal with byte-swapping
> within 32-bit words; what we lose by not using 'q' format is just the
> ability to correctly swap the two 32-bit words.  Hence, any value in
> which upper and lower halves are the same should work, say
> 0x1234567812345678.
> 
>   regards, tom lane

The heap tuple in question looks as follows, with  in the spot we're 
debating:

Tuple Layout (values in hex):

t_xmin: 223
t_xmax: 0
t_field3: 0
bi_hi: 0
bi_lo: 0
ip_posid: 1
t_infomask2: 3
t_infomask: b06
t_hoff: 18
t_bits: 0
a_1: 
a_2: 
b_header: 11  # little-endian, will be 88 on big-endian
b_body1: 61
b_body2: 62
b_body3: 63
b_body4: 64
b_body5: 65
b_body6: 66
b_body7: 67
c_va_header: 1
c_va_vartag: 12
c_va_rawsize: 2714
c_va_extsize: 2710

valueid and toastrelid are not shown, as they won't be stable.  Relying on 
t_xmin to be stable makes the test brittle, but fortunately that is separated 
from a_1 and a_2 far enough that we should not need to worry about it.

We want to use values that don't look like any of the others.  The complete set 
of nibbles in the values above is [012345678B], leaving the set [9ACDEF] 
unused.  The attached patch uses the value DEADF9F9 as it seems a little easier 
to remember than other permutations of those nibbles.
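The reasoning about symmetric values is easy to sanity-check (Python sketch; struct stands in for Perl's pack/unpack):

```python
import struct

word = 0xDEADF9F9
# A 64-bit value whose 32-bit halves are identical round-trips through
# 32-bit pack codes on either byte order: byte swapping within each
# word is handled by the format, and with equal halves the order of
# the two words doesn't matter.  No 64-bit 'q'/'Q' code is needed.
raw = struct.pack('=II', word, word)
first, second = struct.unpack('=II', raw)
assert first == second == word
assert struct.unpack('=Q', raw)[0] == (word << 32) | word
```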



v6-0001-pg_amcheck-continuing-to-fix-portability-problems.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





amcheck hardening

2021-03-13 Thread Mark Dilger
Peter,

I'd like your advice on the following observations, if you'd be so kind:


Using the pg_amcheck command committed yesterday (thanks, Robert! thanks Tom!), 
I have begun investigating segfaults that sometimes occur when running the 
amcheck routines on intentionally corrupted databases.  We've discussed this 
before, and there are limits to how much hardening is possible, particularly if 
it comes at the expense of backend performance under normal conditions.  There 
are also serious problems with corruption schemes that differ from what is 
likely to go wrong in the wild.

These segfaults present a real usage problem for pg_amcheck.  We made the 
decision [3] to not continue checking if we get a FATAL or PANIC error.  
Getting a segfault in just one database while checking just one index can abort 
a pg_amcheck run that spans multiple databases, tables and indexes, and 
therefore is not easy to blow off.  Perhaps the decision in [3] was wrong, but 
if not, some hardening of amcheck might make this problem less common.

The testing strategy I'm using is to corrupt heap and btree pages in schema 
"public" of the "regression" database created by `make installcheck`, by 
overwriting random bytes in randomly selected locations on pages after the page 
header.  Then I run `pg_amcheck regression` and see if anything segfaults.  
Doing this repeatedly, with random bytes and locations within files not the 
same from one run to the next, I can find the locations of segfaults that are 
particularly common.

Admittedly, this is not what is likely to be wrong in real-world installations. 
 I had a patch as part of the pg_amcheck series that I ultimately abandoned for 
v14 given the schedule.  It reverts tables and indexes (or portions thereof) to 
prior states.  I'll probably move on to testing with that in a bit.
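The corruption step described above can be sketched roughly as follows. This is illustrative only: the function name and parameters are invented for the sketch, the page size is PostgreSQL's default BLCKSZ, and "header" here means just the fixed PageHeaderData bytes (line pointers follow it and are fair game):

```python
import os
import random

PAGE_SIZE = 8192          # PostgreSQL's default BLCKSZ
PAGE_HEADER_SIZE = 24     # fixed page header; line pointers follow it

def corrupt_relation_file(path, n_bytes=10, seed=None):
    """Overwrite n_bytes randomly chosen bytes in randomly chosen
    pages of a relation file, never touching a page header."""
    rng = random.Random(seed)
    n_pages = os.path.getsize(path) // PAGE_SIZE
    with open(path, 'r+b') as f:
        for _ in range(n_bytes):
            page = rng.randrange(n_pages)
            # pick an offset strictly after the page header
            offset = page * PAGE_SIZE + rng.randrange(PAGE_HEADER_SIZE,
                                                      PAGE_SIZE)
            f.seek(offset)
            f.write(bytes([rng.randrange(256)]))
```

Run against copies of the relation files in the cluster's data directory (with the server stopped, or after evicting shared buffers), then run `pg_amcheck` and watch for crashes.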


The first common problem [1] happens at verify_nbtree.c:1422 when a corruption 
report is being generated.  The generation does not seem entirely safe, and the 
problematic bit can be avoided, though I suspect you could do better than the 
brute-force solution I'm using locally:

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index c4ca633918..fa8b7d5163 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -1418,9 +1418,13 @@ bt_target_page_check(BtreeCheckState *state)
 									  OffsetNumberNext(offset));
 		itup = (IndexTuple) PageGetItem(state->target, itemid);
 		tid = BTreeTupleGetPointsToTID(itup);
+#if 0
 		nhtid = psprintf("(%u,%u)",
 						 ItemPointerGetBlockNumberNoCheck(tid),
 						 ItemPointerGetOffsetNumberNoCheck(tid));
+#else
+		nhtid = "(UNRESOLVED,UNRESOLVED)";
+#endif
 
 		ereport(ERROR,
 				(errcode(ERRCODE_INDEX_CORRUPTED),

The ItemPointerGetBlockNumberNoCheck(tid) call seems to be unsafe here.  I get 
much the same crash at verify_nbtree.c:1136, but it's so similar that I'm not 
bothering to include a crash report for it.


The second common problem [2] happens at verify_nbtree.c:2762 where it uses 
_bt_compare, which ends up calling pg_detoast_datum_packed on a garbage value.  
I'm not sure we can fix that at the level of _bt_compare, since that would have 
performance consequences on backends under normal conditions, but perhaps we 
could have a function that sanity checks datums, and call that from 
invariant_l_offset() prior to _bt_compare?  I have observed a variant of this 
crash where the text data is not toasted but crashes anyway:

0   libsystem_platform.dylib    0x7fff738ec929 _platform_memmove$VARIANT$Haswell + 41
1   postgres                    0x00010bf1af34 varstr_cmp + 532 (varlena.c:1678)
2   postgres                    0x00010bf1b6c9 text_cmp + 617 (varlena.c:1770)
3   postgres                    0x00010bf1bfe5 bttextcmp + 69 (varlena.c:1990)
4   postgres                    0x00010bf68c87 FunctionCall2Coll + 167 (fmgr.c:1163)
5   postgres                    0x00010b8a7cb5 _bt_compare + 1445 (nbtsearch.c:721)
6   amcheck.so                  0x00011525eaeb invariant_l_offset + 123 (verify_nbtree.c:2758)
7   amcheck.so                  0x00011525cd92 bt_target_page_check + 4754 (verify_nbtree.c:1398)



[1]


postgres_2021-03-13-084305_laptop280-ma-us.crash
Description: Binary data


[2]



postgres_2021-03-13-094450_laptop280-ma-us.crash
Description: Binary data


[3] 
https://www.postgresql.org/message-id/CA%2BTgmob2c0eM8%2B5kzkXaqdc9XbBCkHmtihSOSk-jCzzT4BFhFQ%40mail.gmail.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-13 Thread Mark Dilger



> On Mar 13, 2021, at 10:46 AM, Noah Misch  wrote:
> 
> On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote:
>>> On Mar 12, 2021, at 3:24 PM, Mark Dilger  
>>> wrote:
>>> 
>>> and the second deals with an apparent problem with IPC::Run shell expanding 
>>> an asterisk on some platforms but not others.  That second one, if true, 
>>> seems like a problem with scope beyond the pg_amcheck project, as 
>>> TestLib::command_checks_all uses IPC::Run, and it would be desirable to 
>>> have consistent behavior across platforms.
> 
>>> One of pg_amcheck's regression tests was passing an asterisk through
>>> TestLib's command_checks_all() command, which gets through to
>>> pg_amcheck without difficulty on most platforms, but appears to get
>>> shell expanded on Windows (jacana) and AIX (hoverfly).
> 
> For posterity, I can't reproduce this on hoverfly.  The suite fails the same
> way at f371a4c and f371a4c^.  More-recently (commit 58f5749), the suite passes
> even after reverting f371a4c.  Self-contained IPC::Run usage also does not
> corroborate the theory:
> 
> [nm@power8-aix 8:0 2021-03-13T18:32:23 clean 0]$ perl -MIPC::Run -e 
> 'IPC::Run::run "printf", "%s\n", "*"'
> *
> [nm@power8-aix 8:0 2021-03-13T18:32:29 clean 0]$ perl -MIPC::Run -e 
> 'IPC::Run::run "sh", "-c", "printf %s\\n *"'
> COPYRIGHT
> GNUmakefile.in
> HISTORY
> Makefile
> README
> README.git
> aclocal.m4
> config
> configure
> configure.ac
> contrib
> doc
> src
> 
>> there is a similar symptom caused by an unrelated problem
> 
>> Subject: [PATCH v3] Avoid use of non-portable option ordering in
>> command_checks_all().
> 
> On a glibc system, "env POSIXLY_CORRECT=1 make check ..." tests this.

Thanks for investigating!

The reason I suspected that the '*' was being expanded on its way through 
IPC::Run had to do with the error that pg_amcheck gave.  It complained that too 
many arguments were being passed to it, and that the first such argument was 
"pg_amcheck.c".  There's no reason pg_amcheck should know its source file name, 
nor that the regression test should know it, which suggested that the asterisk 
was being shell expanded within the src/bin/pg_amcheck/ directory and the file 
listing was being passed into pg_amcheck as arguments.

That theory may have been wrong, but it was the only theory I had at the time.  
I don't have access to any of the machines where that happened, so it is hard 
for me to investigate further.
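The argv-versus-shell distinction at issue here can be sketched in Python, mirroring Noah's Perl demonstration (the file name below is just for illustration):

```python
import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "example.txt"), "w").close()

    # argv-style invocation: no shell is involved, so "*" reaches
    # printf literally, just like IPC::Run's list form.
    direct = subprocess.run(["printf", "%s\n", "*"], cwd=d,
                            capture_output=True, text=True)
    assert direct.stdout == "*\n"

    # Routing through a shell: the shell expands "*" against the
    # current directory before printf ever sees it.
    via_shell = subprocess.run("printf '%s\\n' *", shell=True, cwd=d,
                               capture_output=True, text=True)
    assert via_shell.stdout == "example.txt\n"
```

If the list form ever behaves like the shell form, something in between is re-parsing the command line, which is what I suspected on jacana and hoverfly.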

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: amcheck hardening

2021-03-13 Thread Mark Dilger



> On Mar 13, 2021, at 11:06 AM, Peter Geoghegan  wrote:
> 
> On Sat, Mar 13, 2021 at 10:35 AM Mark Dilger
>  wrote:
>> The testing strategy I'm using is to corrupt heap and btree pages in schema 
>> "public" of the "regression" database created by `make installcheck`, by 
>> overwriting random bytes in randomly selected locations on pages after the 
>> page header.  Then I run `pg_amcheck regression` and see if anything 
>> segfaults.  Doing this repeatedly, with random bytes and locations within 
>> files not the same from one run to the next, I can find the locations of 
>> segfaults that are particularly common.
> 
> That seems like a reasonable starting point to me.
> 
>> The first common problem [1] happens at verify_nbtree.c:1422 when a 
>> corruption report is being generated.  The generation does not seem entirely 
>> safe, and the problematic bit can be avoided
> 
>> The temPointerGetBlockNumberNoCheck(tid) seems to be unsafe here.
> 
> I think that the best way to further harden verify_nbtree.c at this
> point is to do the most basic validation of IndexTuples in our own new
> variant of a core bufpage.h macro: PageGetItemCareful(). In other
> words, we should invent the IndexTuple equivalent of the existing
> PageGetItemIdCareful() function (which already does the same thing for
> item pointers), and then replace all current PageGetItem() calls with
> PageGetItemCareful() calls -- we ban raw PageGetItem() calls from
> verify_nbtree.c forever.
> 
> Here is some of the stuff that could go in PageGetItemCareful(), just
> off the top of my head:
> 
> * The existing "if (tupsize != ItemIdGetLength(itemid))" check at the
> top of bt_target_page_check() -- make sure the length from the
> caller's line pointer agrees with IndexTupleSize().
> 
> * Basic validation against the index's tuple descriptor -- in
> particular, that varlena headers are basically sane, and that the
> apparent range of datums is safely within the space on the page for
> the tuple.
> 
> * Similarly, BTreeTupleGetHeapTID() should not be able to return a
> pointer that doesn't actually point somewhere inside the space that
> the target page has for the IndexTuple.
> 
> * No external TOAST pointers, since this is an index AM, and so
> doesn't allow that.
> 
> In general this kind of very basic validation should be pushed down to
> the lowest level code, so that it detects the problem as early as
> possible, before slightly higher level code has the opportunity to
> run. Higher level code is always going to be at risk of making
> assumptions about the data not being corrupt, because there is so much
> more of it, and also because it tends to roughly look like idiomatic
> AM code.

Excellent write-up!  Thanks!

I'll work on this and get a patch set going if time allows.
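For intuition, the length-consistency part of this is easy to model in miniature. The sketch below is illustrative only (a toy item layout, not PostgreSQL's real ItemId/IndexTuple structures), but it shows the shape of the check a PageGetItemCareful() would centralize:

```python
import struct

PAGE_SIZE = 8192

class CorruptionError(Exception):
    pass

def get_item_careful(page, lp_off, lp_len):
    """Validate a line pointer before dereferencing it, in the spirit
    of the proposed PageGetItemCareful().  Toy layout: the first two
    bytes of an item hold its own size."""
    # the item must lie entirely within the page
    if not (0 < lp_off and lp_off + lp_len <= PAGE_SIZE):
        raise CorruptionError("line pointer points outside the page")
    item = page[lp_off:lp_off + lp_len]
    # the size recorded in the item header must agree with the line
    # pointer, mirroring the "tupsize != ItemIdGetLength(itemid)" check
    (stored_size,) = struct.unpack_from("<H", item, 0)
    if stored_size != lp_len:
        raise CorruptionError("item header size disagrees with line pointer")
    return item
```

The point, as above, is that every caller goes through the careful accessor, so corruption is reported as such instead of being dereferenced by higher-level code.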

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: REINDEX backend filtering

2021-03-14 Thread Mark Dilger



> On Mar 14, 2021, at 12:10 AM, Julien Rouhaud  wrote:
> 
> v6 attached, rebase only due to conflict with recent commit.

Hi Julien,

I'm coming to this patch quite late, perhaps too late to change design 
decisions in time for version 14.


+   if (outdated && PQserverVersion(conn) < 14)
+   {
+       PQfinish(conn);
+       pg_log_error("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+                    "outdated", "14");
+       exit(1);
+   }

If detection of outdated indexes were performed entirely in the frontend 
(reindexdb) rather than in the backend (REINDEX command), would reindexdb be 
able to connect to older servers?  Looking quickly at the catalogs, it appears 
that pg_index, pg_depend, pg_collation, and a call to the SQL function 
pg_collation_actual_version() compared against pg_depend.refobjversion would be 
enough to construct a list of indexes in need of reindexing.  Am I missing 
something here?

I understand that wouldn't help somebody wanting to reindex from psql.  Is that 
the whole reason you went a different direction with this feature?



+   printf(_("  --outdated   only process indexes having outdated depencies\n"));

typo.



+   bool outdated;  /* depends on at least on deprected collation? */

typo.



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2021-03-15 Thread Mark Dilger



> On Mar 2, 2021, at 6:08 AM, Pavel Borisov  wrote:
> 
> I completely agree that checking uniqueness requires looking at the heap, but 
> I don't agree that every caller of bt_index_check on an index wants that 
> particular check to be performed.  There are multiple ways in which an index 
> might be corrupt, and Peter wrote the code to only check some of them by 
> default, with options to expand the checks to other things.  This is why 
> heapallindexed is optional.  If you don't want to pay the price of checking 
> all entries in the heap against the btree, you don't have to.
> 
> I've got the idea and revised the patch accordingly. Thanks!
> Pfa v4 of a patch. I've added an optional argument to allow uniqueness checks 
> for the unique indexes.
> Also, I added a test variant to make them work on 32-bit systems. 
> Unfortunately, converting the regression test to TAP would be a pain for me. 
> Hope it can be used now as a 2-variant regression test for 32 and 64 bit 
> systems.
> 
> Thank you for your consideration!
> 
> -- 
> Best regards,
> Pavel Borisov
> 
> Postgres Professional: http://postgrespro.com
> 

Looking over v4, here are my review comments...

I created the file contrib/amcheck/amcheck--1.2--1.3.sql during the v14 
development cycle, so that is not released yet.  If your patch goes out in v14, 
does it need to create an amcheck--1.3--1.4.sql, or could you fit your changes 
into the 1.2--1.3.sql file?  (Does the project have a convention governing 
this?)  This is purely a question.  I'm not advising you to change anything 
here.

You need to update doc/src/sgml/amcheck.sgml to reflect the changes you made to 
the bt_index_check and bt_index_parent_check functions.

You need to update the recently committed src/bin/pg_amcheck project to include 
--checkunique as an option.  This client application already has flags for 
heapallindexed and rootdescend.  I can help with that if it isn't obvious what 
needs to be done.  Note that pg_amcheck/t contains TAP tests that exercise the 
options, so you'll need to extend code coverage to include this new option.


> On Mar 2, 2021, at 7:10 PM, Peter Geoghegan  wrote:
> 
> I don't think that it's acceptable for your new check to raise a
> WARNING instead of an ERROR.

You already responded to Peter, and I can see that after raising WARNINGs about 
an index, the code raises an ERROR.  That is different from behavior that 
pg_amcheck currently expects from contrib/amcheck functions.  It will be 
interesting to see if that makes integration harder.


> On Mar 2, 2021, at 6:54 PM, Peter Geoghegan  wrote:
> 
>> The regression test you provided is not portable.  I am getting lots of 
>> errors due to differing output of the form "page lsn=0/4DAD7E0".  You might 
>> turn this into a TAP test and use a regular expression to check the output.
> 
> I would test this using a custom opclass that does simple fault
> injection. For example, an opclass that indexes integers, but can be
> configured to dynamically make 0 values equal or unequal to each
> other. That's more representative of real-world problems.
> 
> You "break the warranty" by updating pg_index, even compared to
> updating other system catalogs. In particular, you break the
> "indcheckxmin wait -- wait for xmin to be old before using index"
> stuff in get_relation_info(). So it seems worse than updating
> pg_attribute, for example (which is something that the tests do
> already).

Take a look at src/bin/pg_amcheck/t/005_opclass_damage.pl for an example of 
changing an opclass to test btree index breakage.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






