from:"Mark"

> On Mar 7, 2022, at 10:28 AM, Tom Lane  wrote:
> 
> Does anything interesting break if you do just take it out?

 SET SESSION AUTHORIZATION regress_priv_group2;
 GRANT regress_priv_group2 TO regress_priv_user5; -- ok: a role can self-admin
-NOTICE:  role "regress_priv_user5" is already a member of role 
"regress_priv_group2"
+ERROR:  must have admin option on role "regress_priv_group2"

This test failure is just a manifestation of the intended change, but assuming 
we make no other changes, the error message would clearly need to be updated, 
because it suggests the role should have admin_option on itself, a situation 
which is not currently supported.

Perhaps we should support that, though, by adding a reflexive aclitem[] to 
pg_authid (meaning it tracks which privileges a role has on itself) with 
tracking of who granted it, so that revocation can be handled properly.  The 
aclitem could start out null, meaning the role has by default the traditional 
limited self-admin which the code comments discuss:

/*
 * A role can admin itself when it matches the session user and we're
 * outside any security-restricted operation, SECURITY DEFINER or
 * similar context.  SQL-standard roles cannot self-admin.  However,
 * SQL-standard users are distinct from roles, and they are not
 * grantable like roles: PostgreSQL's role-user duality extends the
 * standard.  Checking for a session user match has the effect of
 * letting a role self-admin only when it's conspicuously behaving
 * like a user.  Note that allowing self-admin under a mere SET ROLE
 * would make WITH ADMIN OPTION largely irrelevant; any member could
 * SET ROLE to issue the otherwise-forbidden command.
 *
 * Withholding self-admin in a security-restricted operation prevents
 * object owners from harnessing the session user identity during
 * administrative maintenance.  Suppose Alice owns a database, has
 * issued "GRANT alice TO bob", and runs a daily ANALYZE.  Bob creates
 * an alice-owned SECURITY DEFINER function that issues "REVOKE alice
 * FROM carol".  If he creates an expression index calling that
 * function, Alice will attempt the REVOKE during each ANALYZE.
 * Checking InSecurityRestrictedOperation() thwarts that attack.
 *
 * Withholding self-admin in SECURITY DEFINER functions makes their
 * behavior independent of the calling user.  There's no security or
 * SQL-standard-conformance need for that restriction, though.
 *
 * A role cannot have actual WITH ADMIN OPTION on itself, because that
 * would imply a membership loop.  Therefore, we're done either way.
 */

For non-null aclitem[], we could support REVOKE ADMIN OPTION FOR joe FROM joe, 
and for explicit re-grants, we could track who granted it, such that further 
revocations could properly refuse if the revoker doesn't have sufficient 
privileges vis-a-vis the role that granted it in the first place.

I have not yet tried to implement this, and might quickly hit problems with the 
idea, but will take a stab at a proof-of-concept patch unless you suggest a 
better approach.

Thoughts?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: role self-revocation

> On Mar 7, 2022, at 12:01 PM, Robert Haas  wrote:
> 
> It's been pointed out upthread that this would have undesirable
> security implications, because the admin option would be inherited,
> and the implicit permission isn't.

Right, but with a reflexive self-admin-option, we could document that it works 
in a non-inherited way.  We'd just be saying the current hard-coded behavior is 
an option which can be revoked rather than something you're stuck with.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: role self-revocation

> On Mar 7, 2022, at 12:03 PM, Mark Dilger  wrote:
> 
> Right, but with a reflexive self-admin-option, we could document that it 
> works in a non-inherited way.  We'd just be saying the current hard-coded 
> behavior is an option which can be revoked rather than something you're stuck 
> with.

We could also say that the default is to not have admin option on yourself, 
with that being something grantable, but that is a larger change from the 
historical behavior and might have more consequences for dump/restore, etc.

My concern about just nuking self-admin is that there may be sites which use 
self-admin and we'd be leaving them without a simple work-around after upgrade, 
because they couldn't restore the behavior by executing a grant.  They'd have 
to more fundamentally restructure their role relationships to not depend on 
self-admin, something which might be harder for them to do.  Perhaps nobody is 
using self-admin, or very few people are using it, and I'm being overly 
concerned.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: role self-revocation

> On Mar 7, 2022, at 12:16 PM, Tom Lane  wrote:
> 
> What would be
> lost if we drop it?

I looked into this a bit.  Removing that bit of code, the only regression test 
changes for "check-world" are the expected ones, with nothing else breaking.  
Running installcheck+pg_upgrade to the patched version of HEAD from each of 
versions 11, 12, 13 and 14 doesn't turn up anything untoward.  The change I 
used (for reference) is attached:

v1-0001-WIP-Removing-role-self-administration.patch.WIP
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: role self-revocation

2022-03-10 Thread Mark Dilger

> On Mar 10, 2022, at 7:56 AM, David G. Johnston  
> wrote:
> 
> 
> I want that because I want mini-superusers, where alice can administer
> the users that alice creates just as if she were a superuser,
> including having their permissions implicitly and dropping them when
> she wants them gone, but where alice cannot break out to the operating
> system as a true superuser could do.
> 
> CREATEROLE (once the pg_* with admin rules are fixed) + Ownership and rules 
> restricting interfering with another role's objects (unless superuser) seems 
> to handle this.

What if one of alice's subordinates also owns roles?  Can alice interfere with 
*that* role's objects?  I don't see that a simple rule restricting roles from 
interfering with another role's objects is quite enough.  That raises the 
question of whether role ownership is transitive, and whether we need a concept 
similar to inherit/noinherit for ownership.

There is also the problem that CREATEROLE currently allows a set of privileges 
to be granted to created roles, and that set of privileges is hard-coded.  
You've suggested changing the hard-coded rules to remove pg_* roles from the 
list of grantable privileges, but that's still an inflexible set of hardcoded 
privileges.  Wouldn't it make more sense for the grantor to need GRANT OPTION 
on any privilege they give to roles they create?

> the bot can stand up
> accounts, can grant them membership in a defined set of groups, but
> cannot exercise the privileges of those accounts (or hack superuser
> either).
> 
> The bot should be provided a security definer procedure that encapsulates all 
> of this rather than us trying to hack the permission system.  This isn't a 
> user permission concern, it is an unauthorized privilege escalation concern.  
> Anyone with the bot's credentials can trivially overcome the third 
> restriction by creating a role with the desired membership and then logging 
> in as that role - and there is nothing the system can do to prevent that 
> while also allowing the other two permissions.

Doesn't this assume password authentication?  If the server uses ldap 
authentication, for example, wouldn't the bot need valid ldap credentials for 
at least one user for this attack to work?  And if CREATEROLE has been made 
more configurable, wouldn't the bot only be able to grant that ldap user the 
limited set of privileges that the bot's database user has been granted ADMIN 
OPTION for?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: role self-revocation

2022-03-10 Thread Mark Dilger

> On Mar 10, 2022, at 2:01 PM, Robert Haas  wrote:
> 
> It sounds like you prefer a behavior where CREATEROLE gives power over
> all non-superusers, but that seems pretty limiting to me. Why can't
> someone want to create a user with power over some users but not
> others?

I agree with Robert on this.

Over at [1], I introduced a patch series to (a) change CREATEROLE and (b) 
introduce role ownership.  Part (a) wasn't that controversial.  The patch 
series failed to make it for postgres 15 on account of (b).  The patch didn't 
go quite far enough, but with it applied, this is an example of a min-superuser 
"lord" operating within database "fiefdom":

fiefdom=# -- mini-superuser who can create roles and write all data
fiefdom=# CREATE ROLE lord
fiefdom-# WITH CREATEROLE
fiefdom-# IN ROLE pg_write_all_data;
CREATE ROLE
fiefdom=# 
fiefdom=# -- group which "lord" belongs to
fiefdom=# CREATE GROUP squire
fiefdom-# ROLE lord;
CREATE ROLE
fiefdom=# 
fiefdom=# -- group which "lord" has no connection to
fiefdom=# CREATE GROUP paladin;
CREATE ROLE
fiefdom=# 
fiefdom=# SET SESSION AUTHORIZATION lord;
SET
fiefdom=> 
fiefdom=> -- fail, merely a member of "squire"
fiefdom=> CREATE ROLE peon IN ROLE squire;
ERROR:  must have admin option on role "squire"
fiefdom=> 
fiefdom=> -- fail, no privilege to grant CREATEDB 
fiefdom=> CREATE ROLE peon CREATEDB;
ERROR:  must have createdb privilege to create createdb users
fiefdom=> 
fiefdom=> RESET SESSION AUTHORIZATION;
RESET
fiefdom=# 
fiefdom=# -- grant admin over "squire" to "lord"
fiefdom=# GRANT squire
fiefdom-# TO lord
fiefdom-# WITH ADMIN OPTION;
GRANT ROLE
fiefdom=# 
fiefdom=# SET SESSION AUTHORIZATION lord;
SET
fiefdom=> 
fiefdom=> -- ok, have both "CREATEROLE" and admin option for "squire"
fiefdom=> CREATE ROLE peon IN ROLE squire;
CREATE ROLE
fiefdom=> 
fiefdom=> -- fail, no privilege to grant CREATEDB
fiefdom=> CREATE ROLE peasant CREATEDB IN ROLE squire;
ERROR:  must have createdb privilege to create createdb users
fiefdom=> 
fiefdom=> RESET SESSION AUTHORIZATION;
RESET
fiefdom=# 
fiefdom=# -- Give lord the missing privilege
fiefdom=# GRANT CREATEDB TO lord;
ERROR:  role "createdb" does not exist
fiefdom=# 
fiefdom=# RESET SESSION AUTHORIZATION;
RESET
fiefdom=# 
fiefdom=# -- ok, have "CREATEROLE", "CREATEDB", and admin option for "squire"
fiefdom=# CREATE ROLE peasant CREATEDB IN ROLE squire;
CREATE ROLE

The problem with this is that "lord" needs CREATEDB to grant CREATEDB, but 
really it should need something like grant option on "CREATEDB".  But that's 
hard to do with the existing system, given the way these privilege bits are 
represented.  If we added a few more built-in pg_* roles, such as pg_create_db, 
it would just work.  CREATEROLE itself could be reimagined as pg_create_role, 
and then users could be granted into this role with or without admin option, 
meaning they could/couldn't further give it away.  I think that would be a 
necessary component to Joshua's "bot" use-case, since the bot must itself have 
the privilege to create roles, but shouldn't necessarily be trusted with the 
privilege to create additional roles who have it.

[1] 
https://www.postgresql.org/message-id/53c7df4c-8463-4647-9dfd-779b5e186...@amazon.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: role self-revocation

> On Mar 11, 2022, at 7:58 AM, Robert Haas  wrote:
> 
> This kind of thing is always a judgement call. If we were talking
> about breaking 'SELECT * from table', I'm sure it would be hard to
> convince anybody to agree to do that at all, let alone with no
> deprecation period. Fortunately, CREATEROLE is less used, so breaking
> it will inconvenience fewer people.

This issue of how much backwards compatibility breakage we're willing to 
tolerate is just as important as questions about how we would want roles to 
work in a green-field development project.  The sense I got a year ago, on this 
list, was that changing CREATEROLE was acceptable, but changing other parts of 
the system, such as how ADMIN OPTION works, would go too far.

Role ownership did not yet exist, and that was a big motivation in introducing 
that concept, because you couldn't credibly say it broke other existing 
features.  It introduces the new notion that when a superuser creates a role, 
the superuser owns it, which is identical to how things implicitly work today; 
and when a CREATEROLE non-superuser creates a role, that role owns the new 
role, which is different from how it works today, arguably breaking 
CREATEROLE's prior behavior.  *But it doesn't break anything else*.

If we're going to change how ADMIN OPTION works, or how role membership works, 
or how inherit/noinherit works, let's first be clear that we are willing to 
accept whatever backwards incompatibility that entails.  This is not a 
green-field development project.  The constant spinning around with regard to 
how much compatibility we need to preserve is giving me vertigo.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: role self-revocation

> On Mar 11, 2022, at 8:48 AM, Stephen Frost  wrote:
> 
> I agree that it would have an impact on backwards compatibility to
> change how WITH ADMIN works- but it would also get us more in line with
> what the SQL standard says for how WITH ADMIN is supposed to work and
> that seems worth the change to me.

I'm fine with giving up some backwards compatibility to get some SQL standard 
compatibility, as long as we're clear that is what we're doing.  What you say 
about the SQL spec isn't great, though, because too much power is vested in 
"ADMIN".  I see "ADMIN" as at least three separate privileges together.  Maybe 
it would be spec compliant to implement "ADMIN" as a synonym for a set of 
separate privileges? 

> On Mar 11, 2022, at 8:41 AM, Stephen Frost  wrote:
> 
> That we aren't discussing the issues with the current GRANT ... WITH
> ADMIN OPTION and how we deviate from what the spec calls for when it
> comes to DROP ROLE, which seems to be the largest thing that's
> 'solved' with this ownership concept, is concerning to me.

Sure, let's discuss that a bit more.  Here is my best interpretation of your 
post about the spec, when applied to postgres with an eye towards not doing any 
more damage than necessary:

> On Mar 10, 2022, at 11:58 AM, Stephen Frost  wrote:
> 
> let's look at what the spec says:
> 
> CREATE ROLE
>  - Who is allowed to run CREATE ROLE is implementation-defined

This should be anyone with membership in pg_create_role.  

>  - After creation, this is effictively run:
>GRANT new_role TO creator_role WITH ADMIN, GRANTOR "_SYSTEM"

This should internally be implemented as three separate privileges, one which 
means you can grant the role, another which means you can drop the role, and a 
third that means you're a member of the role.  That way, they can be 
independently granted and revoked.  We could make "WITH ADMIN" a short-hand for 
"WITH G, D, M" where G, D, and M are whatever we name the independent 
privileges Grant, Drop, and Member-of.

Splitting G and D helps with backwards compatibility, because it gives people 
who want the traditional postgres "admin" a way to get there, by granting 
"G+M".  Splitting M from G and D makes it simpler to implement the "bot" idea, 
since the bot shouldn't have M.  But it does raise a question about always 
granting G+D+M to the creator, since the bot is the creator and we don't want 
the bot to have M.  This isn't a problem I've invented from thin air, mind you, 
as G+D+M is just the definition of ADMIN per the SQL spec, if I've understood 
you correctly.  So we need to think a bit more about the pg_create_role 
built-in role and whether that needs to be further refined to distinguish those 
who can get membership in roles they create vs. those who cannot.  This line of 
reasoning takes me in the direction of what I think you were calling #5 
upthread, but you'd have to elaborate on that, and how it interacts with the 
spec, for us to have a useful conversation about it.

> DROP ROLE
>  - Any user who has been GRANT'd a role with ADMIN option is able to
>DROP that role.

Change this to "Any role who has D on the role".  That's spec compliant, 
because anyone granted ADMIN necessarily has D.

> GRANT ROLE
>  - No cycles allowed
>  - A role must have ADMIN rights on the role to be able to GRANT it to
>another role.

Change this to "Any role who has G on the role".  That's spec compliant, 
because anyone grant ADMIN necessarily has G.

We should also fix the CREATE ROLE command to require the grantor have G on a 
role in order to give it to the new role as part of the command.  Changing the 
CREATEROLE, CREATEDB, REPLICATION, and BYPASSRLS attributes into 
pg_create_role, pg_create_db, pg_replication, and pg_bypassrls, the creator 
could only give them to the created role if the creator has G on the roles.  If 
we do this, we could keep the historical privilege bits and their syntax 
support for backward compatibility, or we could rip them out, but the decision 
between those two options is independent of the rest of the design.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: role self-revocation

> On Mar 11, 2022, at 2:46 PM, Stephen Frost  wrote:
> 
> I do think that’s reasonable … and believe I suggested it about 3 messages 
> ago in this thread. ;)  (step #3 I think it was?  Or maybe 4).

Yes, and you mentioned it to me off-list.

I'm soliciting a more concrete specification for what you are proposing.  To 
me, that means understanding how the SQL spec behavior that you champion 
translates into specific changes.  You specified some of this in steps #1 
through #5, but I'd like a clearer indication of how many of those (#1 alone, 
both #1 and #2, or what?) constitute a competing idea to the idea of role 
ownership, and greater detail about how each of those steps translate into 
specific behavior changes in postgres.  Your initial five-step email seems to 
be claiming that #1 by itself is competitive, but to me it seems at least #1 
and #2 would be required.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: role self-revocation

> On Mar 11, 2022, at 4:56 PM, Stephen Frost  wrote:
> 
> First … I outlined a fair bit of further description in the message you just 
> responded to but neglected to include in your response, which strikes me as 
> odd that you’re now asking for further explanation.

>  When it comes to completing the idea of role ownership- I didn’t come up 
> with that idea nor champion it

Sorry, not "completing", but "competing".  It seems we're discussing different 
ways to fix how roles and CREATEROLE work, and we have several ideas competing 
against each other.  (This differs from *people* competing against each other, 
as I don't necessarily like the patch I wrote better than I like your idea.)

> and therefore I’m not really sure how many of the steps are required to fully 
> support that concept..?

There are problems that the ownership concepts solve.  I strongly suspect that 
your proposal could also solve those same problems, and just trying to identify 
the specific portions of your proposal necessary to do so.

>  For my part, I would think that those steps necessary to satisfy the spec 
> would get us pretty darn close to what true folks advocating for role 
> ownership are asking for

I have little idea what "true folks" means in this context.  As for "advocating 
for role ownership", I'm not in that group.  Whether role ownership or 
something else, I just want some solution to a set of problems, mostly to do 
with needing superuser to do role management tasks.

> , but that doesn’t include the superuser-only alter role attributes piece.  
> Is that included in role ownership?  I wouldn’t think so, but some might 
> argue otherwise, and I don’t know that it is actually useful to divert into a 
> discussion about what is or isn’t in that.

Introducing the idea of role ownership doesn't fix that.  But a patch which 
introduces role ownership is useless unless CREATEROLE is also fixed.  There 
isn't any point having non-superusers create and own roles if, to do so, they 
need a privilege which can break into superuser.  But that argument is no 
different with a patch along the lines of what you are proposing.  CREATEROLE 
needs fixing either way.

> If we agree that the role attribute bits are independent

Yes.

> then I think I agree that we need 1 and 2 to get the capabilities that the 
> folks asking for role ownership want

Yes.

> as 2 is where we make sure that one admin of a role can’t revoke another 
> admin’s rights over that role.

Exactly, so #2 is part of the competing proposal.  (I get the sense you might 
not see these as competing proposals, but I find that framing useful for 
deciding which approach to pursue.)

>  Perhaps 2 isn’t strictly necessary in a carefully managed environment where 
> no one else is given admin rights over the mini-superuser roles, but I’d 
> rather not have folks depending on that.

I think it is necessary, and for the reason you say.

>  I’d still push back though and ask those advocating for this if it meets 
> what they’re asking for.  I got the impression that it did but maybe I 
> misunderstood.
> 
> In terms of exactly how things would work with these changes… I thought I 
> explained it pretty clearly, so it’s kind of hard to answer that further 
> without a specific question to answer.  Did you have something specific in 
> mind?  Perhaps I could answer a specific question and provide more clarity 
> that way.

Your emails contained a lot of "we could do this or that depending on what 
people want, and maybe this other thing, but that isn't really necessary, and 
" which left me unclear on the proposal.  I don't mean to disparage your 
communication style; it's just that when trying to distill technical details, 
high level conversation can be hard to grok.

I have the sense that you aren't going to submit a patch, so I wanted this 
thread to contain enough detail for somebody else to do so.  Thanks.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: role self-revocation

2022-03-14 Thread Mark Dilger




> On Mar 14, 2022, at 7:38 AM, Stephen Frost  wrote:
> 
> So ... do you feel like that's now the case?  Or were you looking for
> more?

I don't have any more questions at the moment.  Thanks!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg14 psql broke \d datname.nspname.relname

2022-03-15 Thread Mark Dilger




> On Mar 15, 2022, at 12:27 PM, Robert Haas  wrote:
> 
> - Justin Pryzby, who originally discovered the problem, prefers the
> same behavior that I prefer long-term, but thinks Tom's behavior is
> better than doing nothing.
> - Mark Dilger, Isaac Moreland, Garick Hamlin, Alvaro Herrera, and
> Julien Rouhaud have commented on the thread but have not endorsed
> either of these dueling proposals.

I vote in favor of committing the patch, though I'd also say it's not super 
important to me.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Granting SET and ALTER SYSTE privileges for GUCs

2022-03-16 Thread Mark Dilger

> On Mar 16, 2022, at 11:47 AM, Tom Lane  wrote:
> 
> Stepping back a bit ... do we really want to institutionalize the
> term "setting" for GUC variables?  I realize that the view pg_settings
> exists, but the documentation generally prefers the term "configuration
> parameters".  Where config.sgml uses "setting" as a noun, it's usually
> talking about a specific concrete value for a parameter, and you can
> argue that the view's name comports with that meaning.  But you can't
> GRANT a parameter's current value.
> 
> I don't have a better name to offer offhand --- I surely am not proposing
> that we change the syntax to be "GRANT ... ON CONFIGURATION PARAMETER x",
> because that's way too wordy.  But now is the time to bikeshed if we're
> gonna bikeshed, or else admit that we're not interested in precise
> vocabulary.

Informally, we often use "GUC" on this list, but that isn't used formally, 
leaving "configuration parameter" and "setting" as the two obvious choices.  I 
preferred "configuration parameter" originally and was argued out of it.  My 
take on "setting" was also that it more naturally refers to the choice of 
setting, not the thing being set, such that "work_mem = 8192" means the 
configuration parameter "work_mem" has the setting "8192".  I'm happy to change 
the patch along these lines, but I vaguely recall it being Robert who liked the 
"setting" language.  Robert?  (Sorry if I misremember that...)

> I'm also fairly allergic to the way that this patch has decided to assign
> multi-word names to privilege types (ie SET VALUE, ALTER SYSTEM).  There
> is no existing precedent for that, and I think it's going to break
> client-side code that we don't need to break.  It's not coincidental that
> this forces weird changes in rules about whitespace in the has_privilege
> functions, for example; and if you think that isn't going to cause
> problems I think you are wrong.  Perhaps we could just use "SET" and
> "ALTER", or "SET" and "SYSTEM"?

I chose those particular multi-word names to avoid needing to promote any 
keywords.  That's been a while ago, and I don't recall exactly where the 
shift-reduce or reduce-reduce errors were coming from.  I'll play with it to 
see what I can find.

> I also agree with the upthread criticism that this is abusing the
> ObjectPostAlterHook API unreasonably much.  In particular, it's
> trying to use pg_setting_acl OIDs as identities for GUCs, which
> they are not, first because they're unstable and second because
> most GUCs won't even have an entry there.  I therefore judge the
> hook calls added to ExecSetVariableStmt and AlterSystemSetConfigFile
> to be 100% useless, in fact probably counterproductive because they
> introduce a boatload of worries about whether the right things happen
> if the hook errors out or does something guc.c isn't expecting.

I think Joshua was planning to use these hooks for security purposes.  The 
hooks are supposed to check whether the Oid is valid, and if not, still be able 
to make choices based on the other information.  Joshua, any comment on this?

> I suggest that what might be saner is to consider that the "objects"
> that the hook calls are concerned with are the pg_setting_acl entries,
> not the underlying GUCs, and thus that the hooks need be invoked only
> when creating, destroying or altering those entries.

That's certainly a thing we could do, but I got the impression that Joshua 
wanted to hook into SET, RESET, and ALTER SYSTEM SET, not merely into GRANT and 
REVOKE.

>  If we do have
> a need for a hook editorializing on GUC value settings, that would
> have to be an independent API --- but I have heard no calls for
> the ability to have such a hook, and I don't think that this patch
> is the place to introduce one.

Well, the call for it was in this thread, but I'm ok with yanking out that part 
of the patch if it seems half-baked.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Granting SET and ALTER SYSTE privileges for GUCs

2022-03-17 Thread Mark Dilger

> On Mar 17, 2022, at 9:04 AM, Robert Haas  wrote:
> 
> not just in terms of telling other people how they have to write
> their patches.

...

>  the burden of maintaining relatively little-used features
> should fall entirely on people who don't care about them.

Joshua helped test the DAC portion of this patch, and answered a number of my 
questions on the topic, including in-person back in December.  I take your 
point, Robert, on the general principle, but the archives should reflect that 
Joshua did contribute to this specific patch.

Joshua, should we drop the DAC portion for postgres 15 and revisit the issue 
for postgres 16?  I think it's getting late in the development cycle to attempt 
what Tom described upthread, and I'd hate to see the rest of this patch punted.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Granting SET and ALTER SYSTE privileges for GUCs

2022-03-17 Thread Mark Dilger




> On Mar 17, 2022, at 9:29 AM, Joshua Brindle  
> wrote:
> 
> I hope this patch can be rolled
> into the contribution from Mark.

Working on it Thanks for the patch!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

New Object Access Type hooks

2022-03-17 Thread Mark Dilger

Hackers,

Over in [1], Joshua proposed a new set of Object Access Type hooks based on
strings rather than Oids.

His patch was written to be applied atop my patch for granting privileges on
gucs.

On review of his patch, I became uncomfortable with the complete lack of
regression test coverage. To be fair, he did paste a bit of testing logic to
the thread, but it appears to be based on pgaudit, and it is unclear how to
include such a test in the core project, where pgaudit is not assumed to be
installed.

First, I refactored his patch to work against HEAD and not depend on my GUCs
patch. Find that as v1-0001. The refactoring exposed a bit of a problem. To
call the new hook for SET and ALTER SYSTEM commands, I need to pass in the Oid
of a catalog table. But since my GUC patch isn't applied yet, there isn't any
such table (pg_setting_acl or whatnot) to pass. So I'm passing InvalidOid, but
I don't know if that is right. In any event, if we want a new API like this,
we should think a bit harder about whether it can be used to check operations
where no table Oid is applicable.

Second, I added a new test directory, src/test/modules/test_oat_hooks, which
includes a new loadable module with hook implementations and a regression test
for testing the object access hooks. The main point of the test is to log
which hooks get called in which order, and which hooks do or do not get called
when other hooks allow or deny access. That information shows up in the
expected output as NOTICE messages.

This second patch has gotten a little long, and I'd like another pair of eyes
on this before spending a second day on the effort. Please note that this is a
quick WIP patch in response to the patch Joshua posted earlier today. Sorry
for sometimes missing function comments, etc. The goal, if this design seems
acceptable, is to polish this, hopefully with Joshua's assistance, and get it
committed *before* my GUCs patch, so that my patch can be rebased to use it.
Otherwise, if this is rejected, I can continue on the GUC patch without this.

(FYI, I got a test failure from src/test/recovery/t/013_crash_restart.pl when
testing v1-0001. I'm not sure yet what that is about.)

v1-0001-Add-String-object-access-hooks.patch
Description: Binary data

v1-0002-Add-regression-tests-of-Object-Access-Type-hooks.patch
Description: Binary data

[1]
https://www.postgresql.org/message-id/flat/664799.1647456444%40sss.pgh.pa.us#c9721c2da88d59684ac7ac5fc36f09c1

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: New Object Access Type hooks

2022-03-18 Thread Mark Dilger

> On Mar 18, 2022, at 7:16 AM, Joshua Brindle  
> wrote:
> 
> This is great, thank you for doing this. I didn't even realize the OAT
> hooks had no regression tests.
> 
> It looks good to me, I reviewed both and tested the module. I wonder
> if the slight abuse of subid is warranted with brand new hooks going
> in but not enough to object, I just hope this doesn't rise to the too
> large to merge this late level.

The majority of the patch is regression testing code, stuff which doesn't get 
installed.  It's even marked as NO_INSTALLCHECK, so it won't get installed even 
as part of "make installcheck".  That seems safe enough to me.

Not including tests of OAT seems worse, as it leaves us open to breaking the 
behavior without realizing we've done so.  A refactoring of the core code might 
cause hooks to be called in a different order, something which isn't 
necessarily wrong, but should not be done unknowingly.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: New Object Access Type hooks



> On Mar 18, 2022, at 3:04 PM, Andrew Dunstan  wrote:
> 
> I haven't looked at it in detail, but regarding the test code I'm not
> sure why there's a .control file, since this isn't a loadable extension,
> not why there's a test_oat_hooks.h file.

The .control file exists because the test defines a loadable module which 
defines the hooks.  The test_oat_hooks.h file was extraneous, and has been 
removed in v2.



v2-0001-Add-String-object-access-hooks.patch
Description: Binary data


v2-0002-Add-regression-tests-of-Object-Access-Type-hooks.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: New Object Access Type hooks




> On Mar 21, 2022, at 8:41 AM, Andrew Dunstan  wrote:
> 
> My first inclination is to say it's probably ok. The immediately obvious
> alternative would be to create yet another set of functions that don't
> have classId parameters. That doesn't seem attractive.
> 
> Modulo that issue I think patch 1 is basically ok, but we should fix the
> comments in objectaccess.c.  Rather than "It is [the] entrypoint ..." we
> should have something like "Oid variant entrypoint ..." and "Name
> variant entrypoint ...", and also fix the function names in the comments.

Joshua,

Do you care to create a new version of this, perhaps based on the v2-0001 patch 
I just posted?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: New Object Access Type hooks



> On Mar 21, 2022, at 1:30 PM, Andrew Dunstan  wrote:
> 
> To the best of my knowledge .control files are only used by extensions,
> not by other modules. They are only referenced in
> src/backend/commands/extension.c in the backend code. For example,
> auto_explain which is a loadable module but not en extension does not
> have one, and I bet if you remove it you'll find this will work just fine.

Fixed, also with adjustments to Joshua's function comments.



v3-0001-Add-String-object-access-hooks.patch
Description: Binary data


v3-0002-Add-regression-tests-of-Object-Access-Type-hooks.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg14 psql broke \d datname.nspname.relname



> On Mar 21, 2022, at 6:12 PM, Andres Freund  wrote:
> 
> Needs another one: http://cfbot.cputube.org/patch_37_3367.log
> 
> Marked as waiting-on-author.



v6-0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: New Object Access Type hooks




> On Mar 21, 2022, at 10:03 PM, Thomas Munro  wrote:
> 
> On Fri, Mar 18, 2022 at 4:22 PM Mark Dilger
>  wrote:
>> (FYI, I got a test failure from src/test/recovery/t/013_crash_restart.pl 
>> when testing v1-0001.  I'm not sure yet what that is about.)
> 
> Doesn't look like 0001 has anything to do with that...   Are you on a
> Mac?

Yes, macOS Catalina, currently 10.15.7.
  
>  Did it look like this recent failure from CI?
> 
> https://cirrus-ci.com/task/4686108033286144
> https://api.cirrus-ci.com/v1/artifact/task/4686108033286144/log/src/test/recovery/tmp_check/log/regress_log_013_crash_restart
> https://api.cirrus-ci.com/v1/artifact/task/4686108033286144/log/src/test/recovery/tmp_check/log/013_crash_restart_primary.log

I no longer have the logs for comparison.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: New Object Access Type hooks




> On Mar 22, 2022, at 8:14 AM, Julien Rouhaud  wrote:
> 
> Hi,
> 
> On Tue, Mar 22, 2022 at 10:41:05AM -0400, Andrew Dunstan wrote:
>> 
>> Pushed with slight adjustments - the LOAD was unnecessary as was the
>> setting of client_min_messages - the latter would have made buildfarm
>> animals unhappy.
> 
> For the record this just failed on my buildfarm animal:
> https://brekka.postgresql.org/cgi-bin/show_stage_log.pl?nm=lapwing&dt=2022-03-22%2014%3A40%3A10&stg=misc-check.

culicidae is complaining:

==~_~===-=-===~_~== 
pgsql.build/src/test/modules/test_oat_hooks/log/postmaster.log 
==~_~===-=-===~_~==
2022-03-22 14:53:27.175 UTC [2166986][postmaster][:0] LOG:  starting PostgreSQL 
15devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 11.2.0-18) 11.2.0, 
64-bit
2022-03-22 14:53:27.175 UTC [2166986][postmaster][:0] LOG:  listening on Unix 
socket "/tmp/pg_regress-RiE7x8/.s.PGSQL.6280"
2022-03-22 14:53:27.198 UTC [2167008][not initialized][:0] FATAL:  
test_oat_hooks must be loaded via shared_preload_libraries
2022-03-22 14:53:27.202 UTC [2167006][not initialized][:0] FATAL:  
test_oat_hooks must be loaded via shared_preload_libraries
2022-03-22 14:53:27.203 UTC [2167009][not initialized][:0] FATAL:  
test_oat_hooks must be loaded via shared_preload_libraries
2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  checkpointer 
process (PID 2167006) exited with exit code 1
2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  terminating any 
other active server processes
2022-03-22 14:53:27.204 UTC [2166986][postmaster][:0] LOG:  shutting down 
because restart_after_crash is off
2022-03-22 14:53:27.206 UTC [2166986][postmaster][:0] LOG:  database system is 
shut down
==~_~===-=-===~_~== pgsql.build/src/test/modules/test_rls_hooks/log/initdb.log 
==~_~===-=-===~_~==


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: New Object Access Type hooks

> On Mar 22, 2022, at 9:09 AM, Andrew Dunstan  wrote:
> 
> If I can't com up with a very quick fix I'll revert it.

The problem is coming from the REGRESS_exec_check_perms, which was included in 
the patch to demonstrate when the other hooks fired relative to the 
ExecutorCheckPerms_hook, but since it is causing problems, I can submit a patch 
with that removed.  Give me a couple minutes

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: New Object Access Type hooks



> On Mar 22, 2022, at 9:11 AM, Mark Dilger  wrote:
> 
> Give me a couple minutes



v1-0001-Fix-build-farm-failures-in-test_oat_hooks.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: New Object Access Type hooks



> On Mar 22, 2022, at 9:33 AM, Tom Lane  wrote:
> 
> Andrew Dunstan  writes:
>> On 3/22/22 11:26, Mark Dilger wrote:
>>> culicidae is complaining:
>>> 2022-03-22 14:53:27.198 UTC [2167008][not initialized][:0] FATAL:  
>>> test_oat_hooks must be loaded via shared_preload_libraries
> 
>> That seems quite weird. I'm not sure how it's getting loaded at all if
>> not via shared_preload_libraries
> 
> After checking culicidae's config, I've duplicated this failure
> by building with EXEC_BACKEND defined.  So I'd opine that there
> is something broken about the method test_oat_hooks uses to
> decide if it was loaded via shared_preload_libraries or not.
> (Note that the failures appear to be coming out of auxiliary
> processes such as the checkpointer.)
> 
> As a quick-n-dirty fix to avoid reverting the entire test module,
> perhaps just delete this error check for now.

Ok, done as you suggest:



v2-0001-Fix-buildfarm-test-failures-in-test_oat_hooks.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: New Object Access Type hooks

> On Mar 22, 2022, at 9:58 AM, Tom Lane  wrote:
> 
>> Ok, done as you suggest:
> 
> I only suggested removing the error check in _PG_init, not
> changing the way the test works.

I should have been more explicit and said, "done as y'all suggest".  The "To" 
line of that email was to you and Andrew.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: New Object Access Type hooks

> On Mar 22, 2022, at 3:20 PM, Andres Freund  wrote:
> 
> Seems like it might actually be good to test that object access hooks work
> well in a parallel worker. How about going the other way and explicitly 
> setting
> force_parallel_mode = disabled for parts of the test and to enabled for
> others?

Wouldn't we get differing numbers of NOTICE messages depending on how many 
parallel workers there are?  Or would you propose setting the number of workers 
to a small, fixed value?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

room for improvement in amcheck btree checking?

2020-12-01 Thread Mark Dilger

Peter,

While working on the pg_amcheck code for [1], I discovered an unexpected
deficiency in the way btree indexes are checked. So far as I can tell from the
docs [2], the deficiency does not violate any promises that amcheck is making,
but I found it rather surprising all the same. To reproduce:

1) Create a (possibly empty) table and btree index over the table.
2) Flush buffers and backup a copy of the heap relation file.
3) Load (more) data into the table.
4) Flushing buffers as needed, revert the heap relation file to the backup
previously taken.
5) Run bt_index_check and bt_index_parent_check with and without heapallindexed
and/or rootdescend. Note that the index passes all checks.
6) Run a SQL query that uses a sequential scan on the table and observe no
errors.
7) Run a SQL query that uses an index scan on the table and see that it errors
with something like:

ERROR: could not read block 0 in file "base/13097/16391": read only 0 of
8192 bytes

I found it surprising that even when precisely zero of the tids in the index
exist in the table the index checks all come back clean. The heapallindexed
check is technically running as advertised, checking that all of the zero
tuples in the heap are present in the index. That is a pretty useless check
under this condition, though. Is a "indexallheaped" option (by some less crazy
name) needed?

Users might also run into this problem when a heap relation file gets
erroneously shortened by some number of blocks but not fully truncated, or
perhaps with torn page writes.

Have you already considered and rejected a "indexallheaped" type check?

Background
---

I have up until recently been focused on corruption caused by twiddling the
bits within heap and index relation pages, but real-world user error, file
system error, and perhaps race conditions in the core postgresql code seem at
least as likely to result in missing or incorrect versions of blocks of
relation files rather than individual bytes within those blocks being wrong.
Per our discussions in [3], not all corruptions that can be created under
laboratory conditions are equally likely to occur in the wild, and it may be
reasonable to only harden the amcheck code against corruptions that are more
likely to happen in actual practice.

To make it easier for tap tests to cover common corruption type scenarios, I
have been extending PostgresNode.pm with functions to perform these kinds of
file system corruptions. I expect to post that work in another thread soon. I
am not embedding any knowledge of the internal structure of heap, index, or
toast relations in PostgresNode, only creating functions to archive versions of
files and perform full or partial reversions of them later.

The ultimate goal of this work is to have sufficient regression tests to
demonstrate that pg_amcheck can be run with default options against a system
corrupted in these common ways without crashing, and with reasonable likelihood
of detecting these common corruptions. Users might understand that hard to
detect corruption will go unnoticed, but it would be harder to explain to them
why, immediately after getting a clean bill of health on their system, a query
bombed out with the sort of error shown above.

[1] https://commitfest.postgresql.org/31/2670/
[2] https://www.postgresql.org/docs/13/amcheck.html
[3]
https://www.postgresql.org/message-id/flat/CAH2-WznaU6HcahLV4Hg-DnhEmW8DuSdYfn3vfWXoj3Me9jq%3DsQ%40mail.gmail.com#0691475da5e9163d21b13fc415095801

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: room for improvement in amcheck btree checking?

2020-12-01 Thread Mark Dilger




> On Dec 1, 2020, at 4:38 PM, Peter Geoghegan  wrote:
> 
> On Tue, Dec 1, 2020 at 12:39 PM Mark Dilger
>  wrote:
>> I found it surprising that even when precisely zero of the tids in the index 
>> exist in the table the index checks all come back clean.  The heapallindexed 
>> check is technically running as advertised, checking that all of the zero 
>> tuples in the heap are present in the index.  That is a pretty useless check 
>> under this condition, though.  Is a "indexallheaped" option (by some less 
>> crazy name) needed?
>> 
>> Users might also run into this problem when a heap relation file gets 
>> erroneously shortened by some number of blocks but not fully truncated, or 
>> perhaps with torn page writes.
> 
> It would probably be possible to detect this exact condition (though
> not other variants of the more general problem) relatively easily.
> Perhaps by doing something with RelationOpenSmgr() with the heap
> relation, while applying a little knowledge of what must be true about
> the heap relation given what is known about the index relation. (I'm
> not 100% sure that that's possible, but offhand it seems like it
> probably is.)
> 
> See also: commit 6754fe65a4c, which tightened things up in this area
> for the index relation itself.

Yes, some of the test code I've been playing with already hit the error 
messages introduced in that commit.

>> Have you already considered and rejected a "indexallheaped" type check?
> 
> Actually, I pointed out that we should do something along these lines
> very recently:
> 
> https://postgr.es/m/CAH2-Wz=dy--fg5ij0kpcqums0w5g+xqed3t-7he+uqak_hm...@mail.gmail.com

Right.

> I'd be happy to see you pursue it.

I'm not sure how much longer I'll be pursuing corruption checkers before 
switching to another task.  Certainly, I'm interested in pursuing this if time 
permits.

>> Background
>> ---
>> 
>> I have up until recently been focused on corruption caused by twiddling the 
>> bits within heap and index relation pages, but real-world user error, file 
>> system error, and perhaps race conditions in the core postgresql code seem 
>> at least as likely to result in missing or incorrect versions of blocks of 
>> relation files rather than individual bytes within those blocks being wrong. 
>>  Per our discussions in [3], not all corruptions that can be created under 
>> laboratory conditions are equally likely to occur in the wild, and it may be 
>> reasonable to only harden the amcheck code against corruptions that are more 
>> likely to happen in actual practice.
> 
> Right. I also like to harden amcheck in ways that happen to be easy,
> especially when it seems like it might kind of work as a broad
> defense that doesn't consider any particular scenario. For example,
> hardening to make sure the an index tuple's lp_len matches
> IndexTupleSize() for the tuple proper (this also kind of works as
> storage format documentation). I am concerned about both costs and
> benefits.
> 
> FWIW, this is a perfect example of the kind of hardening that makes
> sense to me. Clearly this kind of failure is a reasonably plausible
> scenario (probably even a known real world scenario with catalog
> corruption), and clearly we could do something about it that's pretty
> simple and obviously low risk. It easily meets the standard that I
> might apply here.

Ok, it seems we're in general agreement about that.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Test harness for regex code (to allow importing Tcl's test suite)

2021-01-03 Thread Mark Dilger




> On Jan 3, 2021, at 7:49 PM, Tom Lane  wrote:
> 
> Over the holiday break I've been fooling with some regex performance
> improvements.  I don't have anything ready to show yet in that line,
> but I was feeling the need for more-thorough test coverage, so I set
> to work on something that's been in the back of my mind for a long
> time: we need to absorb the test cases that Henry Spencer wrote way-
> back-when for that code, which up to now existed only as a script
> in the Tcl test suite.  That state of affairs seemed OK to me in
> the beginning, when we thought of Tcl as the upstream for that code,
> and figured they'd vet any nontrivial changes.  They've pretty
> thoroughly dropped the ball on that though, and indeed I think that
> they now believe *we're* the upstream.  So we need to have a test
> suite that reflects that status, at least to the extent of running
> all the test cases that exist for that code.
> 
> Accordingly, here's a new src/test/modules package that creates a
> function modeled on regexp_matches(), but using a set of flags that
> matches Spencer's design for the Tcl test suite, allowing parts
> of src/backend/regex/ to be tested that can't be reached with our
> existing SQL-exposed functions.  The test scripts in the module
> reproduce all the tests in Tcl's "tests/reg.test" script as of
> Tcl 8.6.10, plus a few others that I felt advisable, such as tests
> for the lookbehind constraints we added a few years ago.  (Note:
> Tcl also has regexp.test and regexpComp.test, but those seem to be
> oriented towards testing their language-specific wrappers not the
> regex engine itself.)
> 
> According to my testing, this increases our test code coverage for
> src/backend/regex/ from 71.1% to 86.7%, which is not too shabby,
> especially seeing that a lot of the remainder is not-deterministically-
> reachable code for malloc failure handling.
> 
> Thoughts?  Is anyone interested in reviewing this?  Since it's only
> test code, I'd be okay with pushing it without review, but I'd be
> happy if someone else wants to look at it.

I've quickly read this over and generally like it.  Thanks for working on this!

Have you thought about whether If it weren't in test/modules, it might be nice 
to expose test_regex from SQL with a slightly different interface that doesn't 
throw on regex compilation error?  Maybe something in contrib?  It might be 
useful to some users to validate regular expressions.  I'm just asking... I 
don't have any problem with how you have it here.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: macOS SIP, next try

2021-01-04 Thread Mark Dilger




> On Dec 2, 2020, at 6:28 AM, Peter Eisentraut 
>  wrote:
> 
> Previous discussions on macOS SIP:
> 
> https://www.postgresql.org/message-id/flat/561E73AB.8060800%40gmx.net
> https://www.postgresql.org/message-id/flat/CA%2BTgmoYGi5oR8Rvb2-SY1_WEjd76H5gCqSukxFQ66qR7MewWAQ%40mail.gmail.com
> https://www.postgresql.org/message-id/flat/6a4d6124-41f0-756b-0811-c5c5def7ef4b%402ndquadrant.com

See also 
https://www.postgresql.org/message-id/flat/18012hGLG6HJ9pQDkHAMYuwQKg%40sparkpost.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: new heapcheck contrib module

2021-01-07 Thread Mark Dilger

escend and bt_index_check 
wouldn't make sense, as there is no rootdescend option on that function.  So 
users would need multiple flags to turn on various options, with some flag 
combinations drawing an error about the flags not being mutually compatible.  
That's doable, but people may not like that interface.

> * heapallindexed is kind of expensive, but valuable. But the extra
> check is probably less likely to help on the second or subsequent
> index on a table.

There is a switch for enabling this.  It is off by default.

> It might be worth considering an option that only uses it with only
> one index: Preferably the primary key index, failing that some unique
> index, and failing that some other index.

It might make sense for somebody to submit this for a later release.  I don't 
have any plans to work on this during this release cycle.

>> I'm not very convinced by the decision to
>> override the user's decision about heapallindexed either.
> 
> I strongly agree.

I have removed the override.

> 
>> Maybe I lack
>> imagination, but that seems pretty arbitrary. Suppose there's a giant
>> index which is missing entries for 5 million heap tuples and also
>> there's 1 entry in the table which has an xmin that is less than the
>> pg_clas.relfrozenxid value by 1. You are proposing that because I have
>> the latter problem I don't want you to check for the former one. But
>> I, John Q. Smartuser, do not want you to second-guess what I told you
>> on the command line that I wanted. :-)
> 
> Even if your user is just average, they still have one major advantage
> over the architects of pg_amcheck: actual knowledge of the problem in
> front of them.

There is a switch for skipping index checks on corrupt tables.  By default, the 
indexes will be checked.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-02-13 Thread Mark Rofail

Hey Joel,

+ oprresult = DatumGetBool(FunctionCallInvoke(locfcinfo));
> -+ if (oprresult)
> ++ if (!locfcinfo->isnull && oprresult)
> + return true;
> + }
>
> Is checking !locfcinfo->isnull due to something new in v2,
> or what is just a correction for a problem also in v1?
>
The “!locfcinfo->isnull” is to protect against segmentation faults. I found
it was added to the original function which I adapted the “element” version
from.

/Mark

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-02-13 Thread Mark Rofail

Hey Joel,

test opr_sanity   ... FAILED
>
> AND binary_coercible(p2.opcintype, p1.amoplefttype));
>   amopfamily | amopstrategy | amopopr
> +--+-
> -(0 rows)
> +   2745 |5 |6105
> +(1 row)
>
> -- Operators that are primary members of opclasses must be immutable (else
> -- it suggests that the index ordering isn't fixed).  Operators that are
>
This is due using anycompatiblearray for the left operand in @>>.
To solve this problem we need to use @>>(anyarray,anyelement) or introduce
a new opclass for gin indices.
These are the two approaches that come to mind to solve this. Which one is
the right way or is there another solution I am not aware of?
That’s why I’m asking this on the mailing list, to get the community’s
input.

>
/Mark

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-02-17 Thread Mark Rofail

Hey Joel,

I will now proceed with the review of fk_arrays_elem_v2 as well.
>
Great work!!

/Mark

>

Re: new heapcheck contrib module

2021-02-23 Thread Mark Dilger




> On Feb 17, 2021, at 12:56 PM, Robert Haas  wrote:
> 
> On Wed, Feb 17, 2021 at 1:46 PM Mark Dilger
>  wrote:
>> It will reconnect and retry a command one time on error.  That should cover 
>> the case that the connection to the database was merely lost.  If the second 
>> attempt also fails, no further retry of the same command is attempted, 
>> though commands for remaining relation targets will still be attempted, both 
>> for the database that had the error and for other remaining databases in the 
>> list.
>> 
>> Assuming something is wrong with "db2", the command `pg_amcheck db1 db2 db3` 
>> could result in two failures per relation in db2 before finally moving on to 
>> db3.  That seems pretty awful considering how many relations that could be, 
>> but failing to soldier on in the face of errors seems a strange design for a 
>> corruption checking tool.
> 
> That doesn't seem right at all. I think a PQsendQuery() failure is so
> remote that it's probably justification for giving up on the entire
> operation. If it's caused by a problem with some object, it probably
> means that accessing that object caused the whole database to go down,
> and retrying the object will take the database down again. Retrying
> the object is betting that the user interrupted connectivity between
> pg_amcheck and the database but the interruption is only momentary and
> the user actually wants to complete the operation. That seems unlikely
> to me. I think it's far more probably that the database crashed or got
> shut down and continuing is futile.
> 
> My proposal is: if we get an ERROR trying to *run* a query, give up on
> that object but still try the other ones after reconnecting. If we get
> a FATAL or PANIC trying to *run* a query, give up on the entire
> operation. If even sending a query fails, also give up.

This is changed in v40 as you propose to exit on FATAL and PANIC level errors 
and on error to send a query.  On lesser errors (which includes all corruption 
reports about btrees and some heap corruption related errors), the slot's 
connection is still useable, I think.  Are there cases where the error is lower 
than FATAL and yet the connection needs to be reestablished?  It does not seem 
so from the testing I have done, but perhaps I'm not thinking of the right sort 
of non-fatal error?

(I'll wait to post v40 until after hearing your thoughts on this.)

>> In v39, exit(1) is used for all errors which are intended to stop the 
>> program.  It is important to recognize that finding corruption is not an 
>> error in this sense.  A query to verify_heapam() can fail if the relation's 
>> checksums are bad, and that happens beyond verify_heapam()'s control when 
>> the page is not allowed into the buffers.  There can be errors if the file 
>> backing a relation is missing.  There may be other corruption error cases 
>> that I have not yet thought about.  The connections' errors get reported to 
>> the user, but pg_amcheck does not exit as a consequence of them.  As 
>> discussed above, failing to send the query to the server is not viewed as a 
>> reason to exit, either.  It would be hard to quantify all the failure modes, 
>> but presumably the catalogs for a database could be messed up enough to 
>> cause such failures, and I'm not sure that pg_amcheck should just abort.
> 
> I agree that exit(1) should happen after any error intended to stop
> the program. But I think it should also happen at the end of the run
> if we hit any problems for which we did not stop, so that exit(0)
> means your database is healthy.

In v40, exit(1) means the program encountered fatal errors leading it to stop, 
and exit(2) means that a non-fatal error and/or corruption reports occurred 
somewhere during the processing.  Otherwise, exit(0) means your database was 
successfully checked and is healthy.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Interest in GSoC 2021 Projects

2021-02-26 Thread Mark Wong

Hi,

On Fri, Feb 26, 2021 at 03:56:17PM +0800, Yixin Shi wrote:
> Hi,
> 
> 
> 
> I am Yixin Shi, a junior student majoring in Computer Science at University
> of Michigan. I notice that the project ideas for GSoC 2021 have been posted
> on your webpage and I am quite interested in two of them. I really wish to
> take part in the project *Make plsample a more complete procedural language
> handler example (2021)* considering both my interest and ability. The
> potential mentor for this Project is *Mark Wong*. I am also interested in
> the project *Improve PostgreSQL Regression Test Coverage (2021)* and the
> potential mentor is *Andreas Scherbaum* and *Stephen Frost*.
> 
> 
> 
> I would like to learn more about these two projects but failed to contact
> the mentors. How can I contact them? Also, I really hope to join the
> project. Are there any suggestions on application?

The idea for plsample is to continue providing working code and
documentation examples so that plsample can be used as a template for
creating additional language handlers.

Depending on the individual's comfort level, some effort may need to be
put into studying the current handlers for ideas on how to implement the
lacking functionality in plsample.

Does that help?

Regards,
Mark

Re: proposal: psql –help reflecting service or URI usage

2021-02-28 Thread Mark Dilger

> On Feb 28, 2021, at 1:57 AM, Paul Förster  wrote:
> 
> Hi,
> 
> I'd like to propose a patch to psql --help output:
> 
> Currently it is:
> 
> Usage:
>  psql [OPTION]... [DBNAME [USERNAME]]
> 
> ...
> 
> Connection options:
>  -h, --host=HOSTNAME  database server host or socket directory (default: 
> "local socket")
>  -p, --port=PORT  database server port (default: "5432")
>  -U, --username=USERNAME  database user name (default: "paul")
>  -w, --no-passwordnever prompt for password
>  -W, --password   force password prompt (should happen automatically)
> 
> I'd like to change it to the following to reflect the psql ability to process 
> a service name or a PostgreSQL URI:
> 
> Usage:
>  psql [OPTION]... [DBNAME [USERNAME]|service|uri]
> 
> ...
> 
> Connection options:
>  -h, --host=HOSTNAME  database server host or socket directory (default: 
> "local socket")
>  -p, --port=PORT  database server port (default: "5432")
>  -U, --username=USERNAME  database user name (default: "paul")
>  -w, --no-passwordnever prompt for password
>  -W, --password   force password prompt (should happen automatically)
>  service=name service name as definited in pg_service.conf

"definited" is a typo.

Should this say "as defined in pg_service.conf"?  That's the default, but the 
user might have $PGSERVICEFILE set to something else.  Perhaps you could borrow 
the wording of other options and use "(default: as defined in 
pg_service.conf)", or something like that, but of course being careful to still 
fit in the line length limit.

>  uri  connection URI (postgresql://...)
> 
> ...
> 
> Attached is a patch for src/bin/psql/help.c for this. The file 
> doc/src/sgml/ref/psql-ref.sgml does not seem to need any changes for this.
> 
> Any thoughts on this?

Other client applications follow the same pattern as psql, so if this change 
were adopted, it should apply to all of them.

Your proposal seems like something that would have been posted to the list 
before, possibly multiple times.  Any chance you could dig up past 
conversations on this subject and post links here for context?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: proposal: psql –help reflecting service or URI usage




> On Feb 28, 2021, at 10:10 AM, Paul Förster  wrote:
> 
> Hi Mark,
> 
>> On 28. Feb, 2021, at 17:54, Mark Dilger  wrote:
>> 
>> "definited" is a typo.
> 
> yes, definitely a typo, sorry. Thanks for pointing this out.
> 
>> Should this say "as defined in pg_service.conf"?  That's the default, but 
>> the user might have $PGSERVICEFILE set to something else.  Perhaps you could 
>> borrow the wording of other options and use "(default: as defined in 
>> pg_service.conf)", or something like that, but of course being careful to 
>> still fit in the line length limit.
> 
> I agree to all, thanks. What is the line length limit?

The output from --help should fit in a terminal window with only 80 characters 
width.  For example, in src/bin/scripts/createuser.c the line about 
--interactive is wrapped:

> printf(_("  -S, --no-superuserrole will not be superuser 
> (default)\n"));
> printf(_("  -V, --version output version information, then 
> exit\n"));
> printf(_("  --interactive prompt for missing role name and 
> attributes rather\n"
>  "than using defaults\n"));
> printf(_("  --replication role can initiate replication\n"));

You can find counter-examples where applications do not follow this rule.  
pg_isready is one of them.


>> Other client applications follow the same pattern as psql, so if this change 
>> were adopted, it should apply to all of them.
> 
> well, psql is central and IMHO the best place to start. I'd have to try out 
> all of them then. What I do know, though, is that pg_isready does not 
> understand a URI (why is that?), which is very unfortunate. So, I'd have to 
> try them all out and supply patches for them all?

There is a pattern in the client applications that the --help output is less 
verbose than the docs (see, for example, 
https://www.postgresql.org/docs/13/reference-client.html).  Your proposed patch 
makes psql's --help output a bit more verbose about this issue while leaving 
the other applications less so, without any obvious reason for the difference.

> Still, supporting a feature and not documenting it in its help is IMHO not a 
> good idea.

Ok.

>> Your proposal seems like something that would have been posted to the list 
>> before, possibly multiple times.  Any chance you could dig up past 
>> conversations on this subject and post links here for context?
> 
> I don't know any past discussions here. I only subscribed today to 
> psql-hackers. It might have been mentioned on psql-general, though. But I'm 
> not sure. This idea popped into my mind just yesterday when I was playing 
> around with psql and URIs and noticed that psql –help doesn't show them.

I mentioned looking at the mailing list archives, but neglected to give you a 
link:  https://www.postgresql.org/list/

Over the years, many proposals get made and debated, with some accepted and 
some rejected.  The rejected proposals often have some merit, and get suggested 
again, only to get rejected for the same reasons as the previous times they 
were suggested.  So searching the archives before posting a patch can sometimes 
be enlightening.  The difficulty in my experience is knowing what words and 
phrases to search for.  It can be a bit time consuming trying to find a prior 
discussion on a topic.

I don't know of any specific discussion which rejected your patch idea.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add --tablespace option to reindexdb

> On Feb 25, 2021, at 10:49 PM, Michael Paquier  wrote:
> 
> Hi all,

Hi Michael,

> Since c5b2860, it is possible to specify a tablespace for a REINDEX,
> but the equivalent option has not been added to reindexdb.  Attached
> is a patch to take care of that.
> 
> This includes documentation and tests.

The documentation of the TABLESPACE option for REINDEX says:

+  Specifies that indexes will be rebuilt on a new tablespace.

but in your patch for reindexdb, it is less clear:

+Specifies the tablespace to reindex on. (This name is processed as
+a double-quoted identifier.)

I think the language "to reindex on" could lead users to think that indexes 
already existent in the given tablespace will be reindexed.  In other words, 
the option might be misconstrued as a way to specify all the indexes you want 
reindexed.  Whatever you do here, beware that you are using similar language in 
the --help, so consider changing that, too.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add --tablespace option to reindexdb

> On Feb 25, 2021, at 10:49 PM, Michael Paquier  wrote:
> 
> While on it, I have added tests for toast tables and indexes with a
> tablespace move during a REINDEX. 

In t/090_reindexdb.pl, you add:

+# Create a tablespace for testing.
+my $ts = $node->basedir . '/regress_reindex_tbspace';
+mkdir $ts or die "cannot create directory $ts";
+# this takes care of WIN-specific path issues
+my $ets = TestLib::perl2host($ts);
+my $tbspace = 'reindex_tbspace';
+$node->safe_psql('postgres', "CREATE TABLESPACE $tbspace LOCATION '$ets';");

I recognize that you are borrowing some of that from 
src/bin/pgbench/t/001_pgbench_with_server.pl, but I don't find anything 
intuitive about the name "ets" and would rather not see that repeated.  There 
is nothing in TestLib::perl2host to explain this name choice, nor in pgbench's 
test, nor yours.

You also change a test:

 $node->issues_sql_like(
-   [ 'reindexdb', '-v', '-t', 'test1', 'postgres' ],
-   qr/statement: REINDEX \(VERBOSE\) TABLE public\.test1;/,
-   'reindex with verbose output');
+   [ 'reindexdb', '--concurrently', '-v', '-t', 'test1', 'postgres' ],
+   qr/statement: REINDEX \(VERBOSE\) TABLE CONCURRENTLY public\.test1;/,
+   'reindex concurrently with verbose output');

but I don't see what tests of the --concurrently option have to do with this 
patch.  I'm not saying there is anything wrong with this change, but it seems 
out of place.  Am I missing something?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add --tablespace option to reindexdb




> On Feb 25, 2021, at 10:49 PM, Michael Paquier  wrote:
> 
> Anyway, I would rather group the whole set of
> tests together, and using the --tablespace option introduced here
> within a TAP test does the job.

Your check verifies that reindexing a system table on a new tablespace fails, 
but does not verify what the failure was.  I wonder if you might want to make 
it more robust, something like:

diff --git a/src/bin/scripts/t/090_reindexdb.pl 
b/src/bin/scripts/t/090_reindexdb.pl
index 6946268209..8453acc817 100644
--- a/src/bin/scripts/t/090_reindexdb.pl
+++ b/src/bin/scripts/t/090_reindexdb.pl
@@ -3,7 +3,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 54;
+use Test::More tests => 58;
 
 program_help_ok('reindexdb');
 program_version_ok('reindexdb');
@@ -108,23 +108,35 @@ $node->issues_sql_like(
 # names, and CONCURRENTLY cannot be used in transaction blocks, preventing
 # the use of TRY/CATCH blocks in a custom function to filter error
 # messages.
-$node->command_fails(
+$node->command_checks_all(
[ 'reindexdb', '-t', $toast_table, '--tablespace', $tbspace, 'postgres' 
],
+   1,
+   [ ],
+   [ qr/cannot move system relation/ ],
'reindex toast table with tablespace');
-$node->command_fails(
+$node->command_checks_all(
[
'reindexdb','--concurrently', '-t', $toast_table,
'--tablespace', $tbspace, 'postgres'
],
+   1,
+   [ ],
+   [ qr/cannot move system relation/ ],
'reindex toast table concurrently with tablespace');
-$node->command_fails(
+$node->command_checks_all(
[ 'reindexdb', '-i', $toast_index, '--tablespace', $tbspace, 'postgres' 
],
+   1,
+   [ ],
+   [ qr/cannot move system relation/ ],
'reindex toast index with tablespace');
-$node->command_fails(
+$node->command_checks_all(
[
'reindexdb','--concurrently', '-i', $toast_index,
'--tablespace', $tbspace, 'postgres'
],
+   1,
+   [ ],
+   [ qr/cannot move system relation/ ],
'reindex toast index concurrently with tablespace');
 
 # connection strings


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Cross-reference comments on signal handling logic

> On Jan 17, 2021, at 11:51 PM, Craig Ringer  
> wrote:
> 
> 

In src/backend/postmaster/interrupt.c:

+ * These handlers are NOT used by normal user backends as they do not support

vs.

+ * Most backends use this handler.

These two comments seem to contradict.  If interrupt.c contains handlers that 
normal user backends to not use, then how can it be that most backends use one 
of the handlers in the file?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

> On Feb 9, 2021, at 10:43 AM, Pavel Borisov  wrote:
> 
> вт, 9 февр. 2021 г. в 01:46, Mark Dilger :
> 
> 
> > On Feb 8, 2021, at 2:46 AM, Pavel Borisov  wrote:
> > 
> > 0002 - is a temporary hack for testing. It will allow inserting duplicates 
> > in a table even if an index with the exact name "idx" has a unique 
> > constraint (generally it is prohibited to insert). Then a new amcheck will 
> > tell us about these duplicates. It's pity but testing can not be done 
> > automatically, as it needs a core recompile. For testing I'd recommend a 
> > protocol similar to the following:
> > 
> > - Apply patch 0002
> > - Set autovaccum = off in postgresql.conf
> 
> Thanks Pavel and Anastasia for working on this!
> 
> Updating pg_catalog directly is ugly, but the following seems a simpler way 
> to set up a regression test than having to recompile.  What do you think?
> 
> Very nice idea, thanks!
> I've made a regression test based on it. PFA v.2 of a patch. Now it doesn't 
> need anything external for testing.

If bt_index_check() and bt_index_parent_check() are to have this functionality, 
shouldn't there be an option controlling it much as the option (heapallindexed 
boolean) controls checking whether all entries in the heap are indexed in the 
btree?  It seems inconsistent to have an option to avoid checking the heap for 
that, but not for this.  Alternately, this might even be better coded as its 
own function, named something like bt_unique_index_check() perhaps.  I hope 
Peter might advise?

The regression test you provided is not portable.  I am getting lots of errors 
due to differing output of the form "page lsn=0/4DAD7E0".  You might turn this 
into a TAP test and use a regular expression to check the output.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Use extended statistics to estimate (Var op Var) clauses




> On Nov 12, 2020, at 5:14 PM, Tomas Vondra  
> wrote:
> 
> <0001-Support-estimation-of-clauses-of-the-form-V-20201113.patch>

Your patch no longer applies.  Can we get a new version please?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.




> On Mar 1, 2021, at 12:05 PM, Pavel Borisov  wrote:
> 
> The regression test you provided is not portable.  I am getting lots of 
> errors due to differing output of the form "page lsn=0/4DAD7E0".  You might 
> turn this into a TAP test and use a regular expression to check the output.
> May I ask you to ensure you used v3 of a patch to check? I've made tests 
> portable in v3, probably, you've checked not the last version.

Yes, my review was of v2.  Updating to v3, I see that the test passes on my 
laptop.  It still looks brittle to have all the tid values in the test output, 
but it does pass.

> Thanks for your attention to the patch

Thanks for the patch!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

> On Mar 1, 2021, at 12:23 PM, Pavel Borisov  wrote:
> 
> If bt_index_check() and bt_index_parent_check() are to have this 
> functionality, shouldn't there be an option controlling it much as the option 
> (heapallindexed boolean) controls checking whether all entries in the heap 
> are indexed in the btree?  It seems inconsistent to have an option to avoid 
> checking the heap for that, but not for this.  Alternately, this might even 
> be better coded as its own function, named something like 
> bt_unique_index_check() perhaps.  I hope Peter might advise?
>  
> As for heap checking, my reasoning was that we can not check whether a unique 
> constraint violated by the index, without checking heap tuple visibility. 
> I.e. we can have many identical index entries without uniqueness violated if 
> only one of them corresponds to a visible heap tuple. So heap checking 
> included in my patch is _necessary_ for unique constraint checking, it should 
> not have an option to be disabled, otherwise, the only answer we can get is 
> that unique constraint MAY be violated and may not be, which is quite 
> useless. If you meant something different, please elaborate. 

I completely agree that checking uniqueness requires looking at the heap, but I 
don't agree that every caller of bt_index_check on an index wants that 
particular check to be performed.  There are multiple ways in which an index 
might be corrupt, and Peter wrote the code to only check some of them by 
default, with options to expand the checks to other things.  This is why 
heapallindexed is optional.  If you don't want to pay the price of checking all 
entries in the heap against the btree, you don't have to.

I'm not against running uniqueness checks on unique indexes.  It seems fairly 
normal that a user would want that.  Perhaps the option should default to 
'true' if unspecified?  But having no option at all seems to run contrary to 
how the other functionality is structured.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: new heapcheck contrib module




> On Mar 1, 2021, at 1:14 PM, Robert Haas  wrote:
> 
> On Wed, Feb 24, 2021 at 1:55 PM Mark Dilger
>  wrote:
>> [ new patches ]
> 
> Regarding 0001:
> 
> There seem to be whitespace-only changes to the comment for select_loop().
> 
> I wonder if the ParallelSlotsSetupOneDB()/ParallelSlotsSetupMinimal()
> changes could be simpler. First idea: Suppose you had
> ParallelSlotsSetup(numslots) that just creates the slot array with 0
> connections, and then ParallelSlotsAdopt(slots, conn, cparams) if you
> want to make it own an existing connection. That seems like it might
> be cleaner. Second idea: Why not get rid of ParallelSlotsSetupOneDB()
> altogether, and just let ParallelSlotsGetIdle() connect the other
> slots as required? Preconnecting all slots before we do anything is
> good because ... of what?

Mostly because, if --jobs is set too high, you get an error before launching 
any work.  I don't know that it's really a big deal if vacuumdb or reindexdb 
have a bunch of tasks kicked off prior to exit(1) due to not being able to open 
connections for all the slots, but it is a behavioral change.

> I also wonder if things might be simplified by introducing a wrapper
> object, e.g. ParallelSlotArray. Suppose a ParallelSlotArray stores the
> number of slots (num_slots), the array of actual PGconn objects, and
> the ConnParams to be used for new connections, and the initcmd to be
> used for new connections. Maybe also the progname. This seems like it
> would avoid a bunch of repeated parameter passing: you could just
> create the ParallelSlotArray with the right contents, and then pass it
> around everywhere, instead of having to keep passing the same stuff
> in. If you want to switch to connecting to a different DB, you tweak
> the ConnParams - maybe using an accessor function - and the system
> figures the rest out.

The existing version of parallel slots (before any of my changes) could already 
have been written that way, but the author chose not to.  I thought about 
making the sort of change you suggest, and decided against, mostly on the 
principle of stare decisis.  But the idea is very appealing, and since you're 
on board, I think I'll go make that change.

> I wonder if it's really useful to generalize this to a point of caring
> about all the ConnParams fields, too. Like, if you only provide
> ParallelSlotUpdateDB(slotarray, dbname), then that's the only field
> that can change so you don't need to care about the others. And maybe
> you also don't really need to keep the ConnParams fields in every
> slot, either. Like, couldn't you just say something like: if
> (strcmp(PQdb(conn) , slotarray->cparams->dbname) != 0) { wrong DB,
> can't reuse without a reconnect }? I know sometimes a dbname is really
> a whole connection string, but perhaps we could try to fix that by
> using PQconninfoParse() in the right place, so that what ends up in
> the cparams is just the db name, not a whole connection string.

I went a little out of my way to avoid that, as I didn't want the next 
application that uses parallel slots to have to refactor it again, if for 
example they want to process in parallel databases listening on different 
ports, or to process commands issued under different roles.

> This is just based on a relatively short amount of time spent studying
> the patch, so I might well be off-base here. What do you think?

I like the ParallelSlotArray idea, and will go do that now.  I'm happy to defer 
to your judgement on the other stuff, too, but will wait to hear back from you.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]




> On Mar 1, 2021, at 9:38 AM, Joel Jacobson  wrote:
> 
> Forgive me for just sending a patch without much discussion on the list,
> but it was so easy to implement, so I thought an implementation can
> help the discussion on if this is something we want or not.

I like the idea so I did a bit of testing.  I think the following should not 
error, but does:

+SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
+ERROR:  range lower bound must be less than or equal to range upper bound


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add --tablespace option to reindexdb




> On Mar 1, 2021, at 10:04 PM, Michael Paquier  wrote:
> 
> On Mon, Mar 01, 2021 at 10:12:54AM -0800, Mark Dilger wrote:
>> Your check verifies that reindexing a system table on a new
>> tablespace fails, but does not verify what the failure was.  I
>> wonder if you might want to make it more robust, something like:
> 
> I was not sure if that was worth adding or not, but no objections to
> do so.  So updated the patch to do that.
> 
> On Mon, Mar 01, 2021 at 09:47:57AM -0800, Mark Dilger wrote:
>> I recognize that you are borrowing some of that from
>> src/bin/pgbench/t/001_pgbench_with_server.pl, but I don't find
>> anything intuitive about the name "ets" and would rather not see
>> that repeated.  There is nothing in TestLib::perl2host to explain
>> this name choice, nor in pgbench's test, nor yours.
> 
> Okay, I have made the variable names more explicit.
> 
>> but I don't see what tests of the --concurrently option have to do
>> with this patch.  I'm not saying there is anything wrong with this
>> change, but it seems out of place.  Am I missing something?
> 
> While hacking on this feature I have just bumped into this separate
> issue, where the same test just gets repeated twice.  I have just
> applied an independent patch to take care of this problem separately,
> and backpatched it down to 12 where this got introduced.
> 
> On Mon, Mar 01, 2021 at 09:26:03AM -0800, Mark Dilger wrote:
>> I think the language "to reindex on" could lead users to think that
>> indexes already existent in the given tablespace will be reindexed.
>> In other words, the option might be misconstrued as a way to specify
>> all the indexes you want reindexed.  Whatever you do here, beware
>> that you are using similar language in the --help, so consider
>> changing that, too.
> 
> OK.  I have switched the docs to "Specifies the tablespace where
> indexes are rebuilt" and --help to "tablespace where indexes are
> rebuilt".
> 
> I have also removed the assertions based on the version number of the
> backend, based on Daniel's input sent upthread.
> 
> What do you think?

Looks good.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]




> On Mar 2, 2021, at 5:34 AM, Isaac Morland  wrote:
> 
> Returning to the RE result issue, I wonder how much it actually matters where 
> any empty matches are. Certainly the actual contents of the match don’t 
> matter; you don’t need to be able to index into the string to extract the 
> substring. The only scenario I can see where it could matter is if the RE is 
> using lookahead or look back to find occurrences before or after something 
> else. If we stipulate that the result array will be in order, then you still 
> don’t have the exact location of empty matches but you do at least have where 
> they are relative to non-empty matches.

I agree the contents of the match don't matter, because they are always empty.  
But the position matters.  You could intend to split a string in multiple 
places using lookaheads and lookbehinds to determine the split points.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Support empty ranges with bounds information

> On Mar 2, 2021, at 5:20 AM, Joel Jacobson  wrote:
> 
> it's currently not possible to create an empty range with bounds information.
> 
> This patch tries to improve the situation by keeping the bounds information,
> and allow accessing it via lower() and upper().
> 
> No other semantics have been changed.
> All tests passes without any changes.

I recall this issue of empty ranges not keeping any bounds information being 
discussed back when range types were developed, and the design choice was 
intentional.  Searching the archives for that discussion, I don't find 
anything, probably because I'm not searching for the right keywords.  Anybody 
have a link to that discussion?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Support empty ranges with bounds information




> On Mar 2, 2021, at 8:51 AM, Pantelis Theodosiou  wrote:
> 
> 
> 
> On Tue, Mar 2, 2021 at 3:28 PM Mark Dilger  
> wrote:
> 
> 
> > On Mar 2, 2021, at 5:20 AM, Joel Jacobson  wrote:
> > 
> > it's currently not possible to create an empty range with bounds 
> > information.
> > 
> > This patch tries to improve the situation by keeping the bounds information,
> > and allow accessing it via lower() and upper().
> > 
> > No other semantics have been changed.
> > All tests passes without any changes.
> 
> I recall this issue of empty ranges not keeping any bounds information being 
> discussed back when range types were developed, and the design choice was 
> intentional.  Searching the archives for that discussion, I don't find 
> anything, probably because I'm not searching for the right keywords.  Anybody 
> have a link to that discussion?
> 
> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> 
>  Marc, perhaps you were referring to this discussion?
> https://www.postgresql.org/message-id/4d5534d002250003a...@gw.wicourts.gov

Yes, I believe so.  Thank you for the link.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Support empty ranges with bounds information

> On Mar 2, 2021, at 9:52 AM, Joel Jacobson  wrote:
> 
> On Tue, Mar 2, 2021, at 15:42, Tom Lane wrote:
>> "Joel Jacobson"  writes:
>> > As discussed in the separate thread "[PATCH] regexp_positions ( string 
>> > text, pattern text, flags text ) → setof int4range[]" [1]
>> > it's currently not possible to create an empty range with bounds 
>> > information.
>> 
>> > This patch tries to improve the situation by keeping the bounds 
>> > information,
>> > and allow accessing it via lower() and upper().
>> 
>> I think this is an actively bad idea.  We had a clean set-theoretic
>> definition of ranges as sets of points, and with this we would not.
>> We should not be whacking around the fundamental semantics of a
>> whole class of data types on the basis that it'd be cute to make
>> regexp_position return its result as int4range rather than int[].
> 
> I think there are *lots* of other use-cases where the current semantics of 
> range types are very problematic.

I'm inclined to think that this conversation is ten years too late.  Range 
semantics are already relied upon in our code, but also in the code of others.  
It might be very hard to debug code that was correct when written but broken by 
this proposed change.  The problem is not just with lower() and upper(), but 
with equality testing (mentioned upthread), since code may rely on two 
different "positions" (your word) both being equal, and both sorting the same.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Support empty ranges with bounds information




> On Mar 2, 2021, at 10:08 AM, Joel Jacobson  wrote:
> 
> On Tue, Mar 2, 2021, at 19:01, Mark Dilger wrote:
>> The problem is not just with lower() and upper(), but with equality testing 
>> (mentioned upthread), since code may rely on two different "positions" (your 
>> word) both being equal, and both sorting the same.
> 
> That's why the patch doesn't change equality.

How does that work if I SELECT DISTINCT ON (nr) ... and then take upper(nr).  
It's just random which values I get?  

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Support empty ranges with bounds information




> On Mar 2, 2021, at 10:16 AM, Chapman Flack  wrote:
> 
> On 03/02/21 13:01, Mark Dilger wrote:
>> The problem is not just with lower() and upper(), but with equality testing
>> (mentioned upthread), since code may rely on two different "positions"
>> (your word) both being equal, and both sorting the same.
> 
> Could those concerns be addressed perhaps, not by adding an entirely new
> just-like-a-range-but-remembers-position-when-zero-width type (which would
> feel wartlike to me), but by tweaking ranges to /secretly/ remember the
> position when zero width?
> 
> Secretly, in the sense that upper(), lower(), and the default sort
> operator would keep their established behavior, but new functions like
> upper_or_pos(), lower_or_pos() would return the non-NULL value even for
> an empty range, and another sort operator could be provided for use
> when one wants the ordering to reflect it?

I vaguely recall that ten years ago I wanted zero-width range types to not 
collapse into an empty range.  I can't recall if I ever expressed that opinion 
-- I just remember thinking it would be nice, and for reasons similar to what 
Joel is arguing here.  But I can't see having 
compares-equal-but-not-truly-equal ranges as a good idea.  I think Tom is right 
about that.

I also think the regexp work that inspired this thread could return something 
other than a range, so the motivation for creating a frankenstein range 
implementation doesn't really exist.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Support empty ranges with bounds information

> On Mar 2, 2021, at 11:34 AM, Joel Jacobson  wrote:
> 
> Yes. It's random, since equality isn't changed, the sort operation cannot 
> tell the difference, and nor could a user who isn't aware of upper() / 
> lower() could reveal differences.

This sounds unworkable even just in light of the original motivation for this 
whole thread.  If I use your proposed regexp_positions(string text, pattern 
text, flags text) function to parse a large number of "positions" from a 
document, store all those positions in a table, and do a join of those 
positions against something else, it's not going to work.  Positions will 
randomly vanish from the results of that join, which is going to be really 
surprising.  I'm sure there are other examples of Tom's general point about 
compares-equal-but-not-equal datatypes.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Support empty ranges with bounds information

> On Mar 2, 2021, at 11:42 AM, Mark Dilger  wrote:
> 
> 
> 
>> On Mar 2, 2021, at 11:34 AM, Joel Jacobson  wrote:
>> 
>> Yes. It's random, since equality isn't changed, the sort operation cannot 
>> tell the difference, and nor could a user who isn't aware of upper() / 
>> lower() could reveal differences.
> 
> This sounds unworkable even just in light of the original motivation for this 
> whole thread.  If I use your proposed regexp_positions(string text, pattern 
> text, flags text) function to parse a large number of "positions" from a 
> document, store all those positions in a table, and do a join of those 
> positions against something else, it's not going to work.  Positions will 
> randomly vanish from the results of that join, which is going to be really 
> surprising.  I'm sure there are other examples of Tom's general point about 
> compares-equal-but-not-equal datatypes.

I didn't phrase that clearly enough.  I'm thinking about whether you include 
the bounds information in the hash function.  The current implementation of 
hash_range(PG_FUNCTION_ARGS) is going to hash the lower and upper bounds, since 
you didn't change it to do otherwise, so "equal" values won't always hash the 
same.  I haven't tested this out, but it seems you could get a different set of 
rows depending on whether the planner selects a hash join.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Support empty ranges with bounds information

> On Mar 2, 2021, at 12:04 PM, Joel Jacobson  wrote:
> 
> On Tue, Mar 2, 2021, at 20:57, Mark Dilger wrote:
>> I didn't phrase that clearly enough.  I'm thinking about whether you include 
>> the bounds information in the hash function.  The current implementation of 
>> hash_range(PG_FUNCTION_ARGS) is going to hash the lower and upper bounds, 
>> since you didn't change it to do otherwise, so "equal" values won't always 
>> hash the same.  I haven't tested this out, but it seems you could get a 
>> different set of rows depending on whether the planner selects a hash join.
> 
> I think this issue is solved by the 
> empty-ranges-with-bounds-information-v2.patch I just sent,
> since with it, there are no semantic changes at all, lower() and upper() 
> works like before.

There are semantic differences, because hash_range() doesn't call lower() and 
upper(), it uses RANGE_HAS_LBOUND and RANGE_HAS_UBOUND, the behavior of which 
you have changed.  I created a regression test and expected results and checked 
after applying your patch, and your patch breaks the hash function behavior.  
Notice that before your patch, all three ranges hashed to the same value, but 
not so after:

@@ -1,18 +1,18 @@
 select hash_range('[a,a)'::textrange);
  hash_range

-  484847245
+ -590102690
 (1 row)

 select hash_range('[b,b)'::textrange);
  hash_range

-  484847245
+  281562732
 (1 row)

 select hash_range('[c,c)'::textrange);
- hash_range 
-
-  484847245
+ hash_range  
+-
+ -1887445565
 (1 row)

You might change how hash_range() works to get all "equal" values to hash the 
same value, but that just gets back to the problem that non-equal things appear 
to be equal.  I guess that's your two-warty-feet preference, but not everyone 
is going to be in agreement on that.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] Support empty ranges with bounds information

> On Mar 2, 2021, at 12:51 PM, Joel Jacobson  wrote:
> 
> Yikes. Here be dragons. I think I want my wart free foot back please.
> 
> Many thanks for explaining. I think I’ll abandon this patch. I guess 
> implementing an entirely new range type could be an acceptable solution, but 
> that’s too big of a project for me to manage on my own. If any more 
> experienced hackers are interested in such a project, I would love to help if 
> I can.

Part of what was strange about arguing against your patch is that I kind of 
wanted the feature to work that way back when it originally got written.  (Not 
to say that it *should* have worked that way, just that part of me wanted it.)

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-03-03 Thread Mark Rofail

Hello Justin,

It doesn't just rebase: it also removes the test which was failing on
> windows
> CI:
>
My apologies, I didn’t include it in the changelog since this is not a code
change, just wanted to see if any other test would fail on the windows CI

I think the SELECT, when it works, is actually doing a seq scan and not
> using
> the index.  On my PC, the index scan is used until an autovacuum/analyze
> run,
> after which it uses seqscan.  I'm not sure how the other CIs all managed
> to run
> autovacuum between creating a table and running a query on it, though.

This is genius! That explains it. I have been racking my brain for two
weeks now and you figured it out.

I guess you should first run the query with "explain (costs off)" to show
> what
> plan it's using, and add things like "SET enable_seqscan=off" as needed to
> guarantee that everyone will use the same plan, regardless of minor cost
> differences and vacuum timing.

I think that will solve the test discrepancy.

Honestly Justin, hats off!

/Mark

>

Re: pg_amcheck contrib application

2021-03-04 Thread Mark Dilger




> On Mar 4, 2021, at 2:04 PM, Peter Geoghegan  wrote:
> 
> On Thu, Mar 4, 2021 at 7:29 AM Robert Haas  wrote:
>> I think this whole approach is pretty suspect because the number of
>> blocks in the relation can increase (by relation extension) or
>> decrease (by VACUUM or TRUNCATE) between the time when we query for
>> the list of target relations and the time we get around to executing
>> any queries against them. I think it's OK to use the number of
>> relation pages for progress reporting because progress reporting is
>> only approximate anyway, but I wouldn't print them out in the progress
>> messages, and I wouldn't try to fix up the startblock and endblock
>> arguments on the basis of how long you think that relation is going to
>> be.
> 
> I don't think that the struct AmcheckOptions block fields (e.g.,
> startblock) should be of type 'long' -- that doesn't work well on
> Windows, where 'long' is only 32-bit. To be fair we already do the
> same thing elsewhere, but there is no reason to repeat those mistakes.
> (I'm rather suspicious of 'long' in general.)
> 
> I think that you could use BlockNumber + strtoul() without breaking Windows.

Fair enough.

>> There are a LOT of things that can go wrong when we go try to run
>> verify_heapam on a table. The table might have been dropped; in fact,
>> on a busy production system, such cases are likely to occur routinely
>> if DDL is common, which for many users it is. The system catalog
>> entries might be screwed up, so that the relation can't be opened.
>> There might be an unreadable page in the relation, either because the
>> OS reports an I/O error or something like that, or because checksum
>> verification fails. There are various other possibilities. We
>> shouldn't view such errors as low-level things that occur only in
>> fringe cases; this is a corruption-checking tool, and we should expect
>> that running it against messed-up databases will be common. We
>> shouldn't try to interpret the errors we get or make any big decisions
>> about them, but we should have a clear way of reporting them so that
>> the user can decide what to do.
> 
> I agree.
> 
> Your database is not supposed to be corrupt. Once your database has
> become corrupt, all bets are off -- something happened that was
> supposed to be impossible -- which seems like a good reason to be
> modest about what we think we know.
> 
> The user should always see the unvarnished truth. pg_amcheck should
> not presume to suppress errors from lower level code, except perhaps
> in well-scoped special cases.

I think Robert mistook why I was doing that.  I was thinking about a different 
usage pattern.  If somebody thinks a subset of relations have been badly 
corrupted, but doesn't know which relations those might be, they might try to 
find them with pg_amcheck, but wanting to just check the first few blocks per 
relation in order to sample the relations.  So,

  pg_amcheck --startblock=0 --endblock=9 --no-dependent-indexes

or something like that.  I don't think it's very fun to have it error out for 
each relation that doesn't have at least ten blocks, nor is it fun to have 
those relations skipped by error'ing out before checking any blocks, as they 
might be the corrupt relations you are looking for.  But using --startblock and 
--endblock for this is not a natural fit, as evidenced by how I was trying to 
"fix things up" for the user, so I'll punt on this usage until some future 
version, when I might add a sampling option.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: proposal: psql –help reflecting service or URI usage




> On Mar 6, 2021, at 5:55 AM, Paul Förster  wrote:
> 
> Hi Mark,
> 
> sorry for the delay.
> 
>> On 01. Mar, 2021, at 17:02, Mark Dilger  wrote:
>> 
>> The output from --help should fit in a terminal window with only 80 
>> characters width.  For example, in src/bin/scripts/createuser.c the line 
>> about --interactive is wrapped:
> 
> I see.
> 
>> You can find counter-examples where applications do not follow this rule.  
>> pg_isready is one of them.
> 
> yes, I noticed that.
> 
>> There is a pattern in the client applications that the --help output is less 
>> verbose than the docs (see, for example, 
>> https://www.postgresql.org/docs/13/reference-client.html).  Your proposed 
>> patch makes psql's --help output a bit more verbose about this issue while 
>> leaving the other applications less so, without any obvious reason for the 
>> difference.
> 
> I could do the other tools too, that wouldn't be a problem. But I'm not good 
> at writing docs. And I found that the man pages would have to be adapted too, 
> which would be a big one for me as I'm no man guru.

Fortunately, the man pages and html docs are generated from the same sources.  
Those sources are written in sgml, and the tools to build the docs must be 
installed.  From the top directory, execute `make docs` and if it complains 
about missing tools you will need to install them.  (The build target is 
'docs', but the directory containing the docs is named 'doc'.)

>> Over the years, many proposals get made and debated, with some accepted and 
>> some rejected.  The rejected proposals often have some merit, and get 
>> suggested again, only to get rejected for the same reasons as the previous 
>> times they were suggested.  So searching the archives before posting a patch 
>> can sometimes be enlightening.  The difficulty in my experience is knowing 
>> what words and phrases to search for.  It can be a bit time consuming trying 
>> to find a prior discussion on a topic.
> 
> I've searched the archives for quite some time and found tons of hits for the 
> search term "help". But that's useless because all people ask for "help". :-) 
> Beside that, I found nothing which even remotely discussed the topic.
> 
> But I generally see your points so I drop the proposal. It was only an idea 
> after all :-) Thanks for your input.

Oh, I'm quite sorry to hear that.  The process of getting a patch accepted, 
especially the first time you submit one, can be discouraging.  But the 
community greatly benefits from new contributors joining the effort, so I'd 
much rather you not withdraw the idea.

If you need help with certain portions of the submission, such as editing the 
docs, I can help with that.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

> On Mar 5, 2021, at 11:46 AM, Joel Jacobson  wrote:
> 
> 
> /Joel
> <0003-regexp-positions.patch>

I did a bit more testing:

+SELECT regexp_positions('foobarbequebaz', 'b', 'g');
+ regexp_positions 
+--
+ {"[3,5)"}
+ {"[6,8)"}
+ {"[11,13)"}
+(3 rows)
+

I understand that these ranges are intended to be read as one character long 
matches starting at positions 3, 6, and 11, but they look like they match 
either two or three characters, depending on how you read them, and users will 
likely be confused by that.

+SELECT regexp_positions('foobarbequebaz', '(?=beque)', 'g');
+ regexp_positions 
+--
+ {"[6,7)"}
+(1 row)
+

This is a zero length match.  As above, it might be confusing that a zero 
length match reads this way.

+SELECT regexp_positions('foobarbequebaz', '(?<=z)', 'g');
+ regexp_positions 
+--
+ {"[14,15)"}
+(1 row)
+

Same here, except this time position 15 is referenced, which is beyond the end 
of the string.

I think a zero length match at the end of this string should read as 
{"[14,14)"}, and you have been forced to add one to avoid that collapsing down 
to "empty", but I'd rather you found a different datatype rather than abuse the 
definition of int4range.

It seems that you may have reached a similar conclusion down-thread?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: proposal: psql –help reflecting service or URI usage




> On Mar 8, 2021, at 8:40 AM, Paul Förster  wrote:
> 
> Hi Mark,
> 
>> On 08. Mar, 2021, at 16:39, Mark Dilger  wrote:
>> 
>> Fortunately, the man pages and html docs are generated from the same 
>> sources.  Those sources are written in sgml, and the tools to build the docs 
>> must be installed.  From the top directory, execute `make docs` and if it 
>> complains about missing tools you will need to install them.  (The build 
>> target is 'docs', but the directory containing the docs is named 'doc'.)
> 
> so the help files I'd change would be doc/src/sgml/ref/psql-ref.sgml, 
> doc/src/sgml/ref/pg_isready.sgml, etc.?

Yes

>> Oh, I'm quite sorry to hear that.  The process of getting a patch accepted, 
>> especially the first time you submit one, can be discouraging.  But the 
>> community greatly benefits from new contributors joining the effort, so I'd 
>> much rather you not withdraw the idea.
> 
> I'd like to, and also I'd like to do all the bin/* tools (including wrapping 
> the long line in pg_isready ;-)), as you suggested, but I don't know the 
> process. In my first admittedly naive attempt, I just downloaded the source 
> from https://www.postgresql.org/ftp/source/v13.2, unpacked it and made my 
> changes there. Then I did a diff to the original and posted it here. I don't 
> even know if this is the correct workflow. I saw gitgub being mentioned a 
> couple of times but I don't have an account, nor do I even know how it works.
> 
> I was pretty surprised to see the lines in PWN:
> 
> "Paul Förster sent in a patch to mention database URIs in psql's --help 
> output."
> "Paul Förster sent in another revision of a patch to mention URIs and 
> services in psql --help's output."
> 
> Is there a FAQ somewhere that describes how properly create patches, submit 
> them and possibly get them released? Something like a step-by-step?
> 
> Is github a must-have here?

No, github is not a must-have.  I don't use a github account for my 
submissions.  The project uses git for source code control, but that's not the 
same thing as requiring "github".  The project switched from cvs to git a few 
years back.

If you can install git, using rpm or yum or whatever, then from the command 
line you can use

  git clone https://git.postgresql.org/git/postgresql.git

and it will create a working directory for you.  You can make modifications and 
commit them.  When you are finished, you can run

  git format-patch -v 1 master

and it will create a patch set containing all your changes relative to the 
public sources you cloned, and the patch will include your commit messages, 
which helps reviewers not only know what you changed, but why you made the 
changes, in your own words.

See https://wiki.postgresql.org/wiki/Development_information

>> If you need help with certain portions of the submission, such as editing 
>> the docs, I can help with that.
> 
> as you see above, I'm curious to learn, though doing it to all the tools will 
> take some time for me.
> 
> Sorry, I'm a noob, not so much to C, but to the workflows here. Hence my 
> questions may seem a little obvious to all the pros.

That's not a problem.  If this gets too verbose for the list, we can take this 
off-list and I can still walk you through it.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]




> On Mar 8, 2021, at 9:05 AM, Joel Jacobson  wrote:
> 
> If a N+1 dimension array could easily be unnested to a N dimension array,
> I would prefer Tom's idea of a 2-D regexp_positions(), since it simple and 
> not controversial.

How about proposing some array functions to go along with the regexp_positions, 
and then do it that way?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]




> On Mar 8, 2021, at 9:20 AM, Joel Jacobson  wrote:
> 
> On Mon, Mar 8, 2021, at 18:11, Mark Dilger wrote:
>> > On Mar 8, 2021, at 9:05 AM, Joel Jacobson  wrote:
>> > 
>> > If a N+1 dimension array could easily be unnested to a N dimension array,
>> > I would prefer Tom's idea of a 2-D regexp_positions(), since it simple and 
>> > not controversial.
>> 
>> How about proposing some array functions to go along with the 
>> regexp_positions, and then do it that way?
> 
> Sounds like a nice solution. That would be a huge win when dealing with 
> multidimensional arrays in general.
> 
> Do we have strong support on the list for such a function? If so, I can make 
> an attempt implementing it, unless some more experienced hacker wants to do 
> it.

That's a hard question to answer in advance.  Typically, you need to propose a 
solution, and then get feedback.  You wouldn't need to post a patch, but 
perhaps some examples of how you would expect it to work, like

+SELECT slice('{{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}}'::integer[][], 
'[2,4)'::int4range);
+   slice   
+---
+ {{2,3}}
+ {{6,7}}
+ {{10,11}}
+ {{14,15}}
+(4 rows)
+
+SELECT 
slice('{{{1,2,3},{4,5,6},{7,8,9}},{{10,11,12},{13,14,15},{16,17,18}}}'::integer[][][],
 '[2,4)'::int4range);
+   slice   
+---
+ {{{4,5,6},{7,8,9}}}
+ {{{13,14,15},{16,17,18}}}
+(2 rows)
+

and then people can tell you why they hate that choice of interface.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application




> On Mar 8, 2021, at 8:26 AM, Robert Haas  wrote:
> 
> On Thu, Mar 4, 2021 at 5:39 PM Mark Dilger  
> wrote:
>> I think Robert mistook why I was doing that.  I was thinking about a 
>> different usage pattern.  If somebody thinks a subset of relations have been 
>> badly corrupted, but doesn't know which relations those might be, they might 
>> try to find them with pg_amcheck, but wanting to just check the first few 
>> blocks per relation in order to sample the relations.  So,
>> 
>>  pg_amcheck --startblock=0 --endblock=9 --no-dependent-indexes
>> 
>> or something like that.  I don't think it's very fun to have it error out 
>> for each relation that doesn't have at least ten blocks, nor is it fun to 
>> have those relations skipped by error'ing out before checking any blocks, as 
>> they might be the corrupt relations you are looking for.  But using 
>> --startblock and --endblock for this is not a natural fit, as evidenced by 
>> how I was trying to "fix things up" for the user, so I'll punt on this usage 
>> until some future version, when I might add a sampling option.
> 
> I admit I hadn't thought of that use case. I guess somebody could want
> to do that, but it doesn't seem all that useful. Checking the first
> up-to-ten blocks of every relation is not a very representative
> sample, and it's not clear to me that sampling is a good idea even if
> it were representative. What good is it to know that 10% of my
> database is probably not corrupted?


`cd $PGDATA; tar xfz my_csv_data.tgz` ctrl-C ctrl-C ctrl-C
`rm -rf $PGDATA` ctrl-C ctrl-C ctrl-C
`/my/stupid/backup/and/restore/script.sh` ctrl-C ctrl-C ctrl-C

# oh wow, i wonder if any relations got overwritten with csv file data, or had 
their relation files unlinked, or ...?

`pg_amcheck --jobs=8 --startblock=0 --endblock=10`

# ah, darn, it's spewing lots of irrelevant errors because some relations are 
too short

`pg_amcheck --jobs=8 --startblock=0 --endblock=0`

# ah, darn, it's still spewing lots of irrelevant errors because I have lots of 
indexes with zero blocks of data

`pg_amcheck --jobs=8`

# ah, darn, it's taking forever, because it's processing huge tables in their 
entirety

I agree this can be left to later, and the --startblock and --endblock options 
are the wrong way to do it.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Lowering the ever-growing heap->pd_lower

2021-03-09 Thread Mark Dilger




> On Mar 9, 2021, at 7:13 AM, Matthias van de Meent 
>  wrote:
> 
> Hi,
> 
> The heap AMs' pages only grow their pd_linp array, and never shrink
> when trailing entries are marked unused. This means that up to 14% of
> free space (=291 unused line pointers) on a page could be unusable for
> data storage, which I think is a shame. With a patch in the works that
> allows the line pointer array to grow up to one third of the size of
> the page [0], it would be quite catastrophic for the available data
> space on old-and-often-used pages if this could not ever be reused for
> data.
> 
> The shrinking of the line pointer array is already common practice in
> indexes (in which all LP_UNUSED items are removed), but this specific
> implementation cannot be used for heap pages due to ItemId
> invalidation. One available implementation, however, is that we
> truncate the end of this array, as mentioned in [1]. There was a
> warning at the top of PageRepairFragmentation about not removing
> unused line pointers, but I believe that was about not removing
> _intermediate_ unused line pointers (which would imply moving in-use
> line pointers); as far as I know there is nothing that relies on only
> growing page->pd_lower, and nothing keeping us from shrinking it
> whilst holding a pin on the page.
> 
> Please find attached a fairly trivial patch for which detects the last
> unused entry on a page, and truncates the pd_linp array to that entry,
> effectively freeing 4 bytes per line pointer truncated away (up to
> 1164 bytes for pages with MaxHeapTuplesPerPage unused lp_unused
> lines).
> 
> One unexpected benefit from this patch is that the PD_HAS_FREE_LINES
> hint bit optimization can now be false more often, increasing the
> chances of not having to check the whole array to find an empty spot.
> 
> Note: This does _not_ move valid ItemIds, it only removes invalid
> (unused) ItemIds from the end of the space reserved for ItemIds on a
> page, keeping valid linepointers intact.
> 
> 
> Enjoy,
> 
> Matthias van de Meent
> 
> [0] 
> https://www.postgresql.org/message-id/flat/cad21aod0ske11fmw4jd4renawbmcw1wasvnwpjvw3tvqpoq...@mail.gmail.com
> [1] 
> https://www.postgresql.org/message-id/CAEze2Wjf42g8Ho%3DYsC_OvyNE_ziM0ZkXg6wd9u5KVc2nTbbYXw%40mail.gmail.com
> 

For a prior discussion on this topic:

https://www.postgresql.org/message-id/2e78013d0709130606l56539755wb9dbe17225ffe90a%40mail.gmail.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Lowering the ever-growing heap->pd_lower

2021-03-09 Thread Mark Dilger




> On Mar 9, 2021, at 1:35 PM, Tom Lane  wrote:
> 
> So, to accept a patch that shortens the line pointer array, what we need
> to do is verify that every such code path checks for an out-of-range
> offset before trying to fetch the target line pointer.  I believed
> back in 2007 that there were, or once had been, code paths that omitted
> such a range check, assuming that they could trust the TID they had
> gotten from $wherever to point at an extant line pointer array entry.
> Maybe things are all good now, but I think you should run around and
> examine every place that checks for tuple deadness to see if the offset
> it used is known to be within the current page bounds.

Much as Pavan asked [1], I'm curious how we wouldn't already be in trouble if 
such code exists?  In such a scenario, what stops a dead line pointer from 
being reused (rather than garbage collected by this patch) prior to such 
hypothetical code using an outdated TID?

I'm not expressing a view here, just asking questions.

[1] 
https://www.postgresql.org/message-id/2e78013d0709130832t31244e79k9488a3e4eb00d64c%40mail.gmail.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application

2021-03-11 Thread Mark Dilger

> On Mar 11, 2021, at 12:12 AM, Peter Eisentraut 
>  wrote:
> 
> (I'm still not a fan of adding more client-side tools whose sole task is to 
> execute server-side functionality in a slightly filtered way, but it seems 
> people are really interested in this, so ...)
> 
> I want to register, if we are going to add this, it ought to be in src/bin/.  
> If we think it's a useful tool, it should be there with all the other useful 
> tools.

I considered putting it in src/bin/scripts where reindexdb and vacuumdb also 
live.  It seems most similar to those two tools.

> I realize there is a dependency on a module in contrib, and it's probably now 
> not the time to re-debate reorganizing contrib.  But if we ever get to that, 
> this program should be the prime example why the current organization is 
> problematic, and we should be prepared to make the necessary moves then.

Before settling on contrib/pg_amcheck as the location, I checked whether any 
tools under src/bin had dependencies on a contrib module, and couldn't find any 
current examples.  (There seems to have been one in the past, though I forget 
which that was at the moment.)

I have no argument with changing the location of this tool before it gets 
committed, but I wonder if we should do that now, or wait until some future 
time when contrib gets reorganized?  I can't quite tell which you prefer from 
your comments above.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application

2021-03-11 Thread Mark Dilger

> On Mar 11, 2021, at 3:36 AM, Andrey Borodin  wrote:
> 
> 
> 
>> 11 марта 2021 г., в 13:12, Peter Eisentraut 
>>  написал(а):
>> 
>> client-side tools whose sole task is to execute server-side functionality in 
>> a slightly filtered way
> 
> By the way, can we teach pg_amcheck to verify database without creating local 
> PGDATA and using bare minimum of file system quota?

pg_amcheck does not need a local data directory to check a remote database 
server, though it does need to connect to that server.  The local file system 
quota should not be a problem, as pg_amcheck does not download and save any 
data to disk.  I am uncertain if this answers your question.  If you are 
imagining pg_amcheck running on the same server as the database cluster, then 
of course running pg_amcheck puts a burden on the server to read all the 
relation files necessary, much as running queries over the same relations would 
do.

> We can implement a way for a pg_amcheck to ask for some specific file, which 
> will be downloaded by backup tool and streamed to pg_amcheck.
> E.g. pg_amcheck could have a restore_file_command = 'backup-tool 
> bring-my-file %backup_id %file_name' and probably 
> list_files_command='backup-tool list-files %backup_id'. And pg_amcheck could 
> then fetch bare minimum of what is needed.
> 
> I see that this is somewhat orthogonal idea, but from my POV interesting one.

pg_amcheck is not designed to detect corruption directly, but rather to open 
one or more connections to the database and execute sql queries which employ 
the contrib/amcheck sql functions.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application

2021-03-11 Thread Mark Dilger




> On Mar 11, 2021, at 9:10 AM, Andrey Borodin  wrote:
> 
> 
> 
>> 11 марта 2021 г., в 20:30, Mark Dilger  
>> написал(а):
>> 
>> 
>> pg_amcheck does not need a local data directory to check a remote database 
>> server, though it does need to connect to that server.
> No, I mean it it would be great if we did not need to materialise whole DB 
> anywhere. Let's say I have a backup of 10Tb cluster in S3. And don't have 
> that clusters hardware anymore. I want to spawn tiny VM with few GiBs of RAM 
> and storage no larger than biggest index within DB + WAL from start to end. 
> And stream-check all backup, mark it safe and sleep well. It would be perfect 
> if we could do backup verification at cost of corruption monitoring (and not 
> vice versa, which is trivial).

Thanks for clarifying.  I agree that would be useful.  I don't see any way to 
make that part of this project, but maybe after the v14 cycle you'll look over 
the code a propose a way forward for that?'

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application



> On Mar 12, 2021, at 1:43 PM, Robert Haas  wrote:
> 
> On Fri, Mar 12, 2021 at 2:31 PM Robert Haas  wrote:
>> I'll commit something shortly to address these.
> 
> There are some interesting failures in the test cases on the
> buildfarm. One of the tests ($offnum == 13) corrupts the TOAST pointer
> with a garbage value, expecting to get the message "final toast chunk
> number 0 differs from expected value 6". But on florican and maybe
> other systems we instead get "final toast chunk number 0 differs from
> expected value 5". That's because the value of TOAST_MAX_CHUNK_SIZE
> depends on MAXIMUM_ALIGNOF. I think that on 4-byte alignment systems
> it works out to 2000 and on 8-byte alignment systems it works out to
> 1996, and the value being stored is 1 bytes, hence the problem.
> The place where the calculation goes different seems to be in
> MaximumBytesPerTuple(), where it uses MAXALIGN_DOWN() on a value that,
> according to my calculations, will be 2038 on all platforms, but the
> output of MAXALIGN_DOWN() will be 2032 or 2036 depending on the
> platform. I think the solution to this is just to change the message
> to match \d+ chunks instead of exactly 6. We should do that right away
> to avoid having the buildfarm barf.
> 
> But, I also notice a couple of other things I think could be improved here:
> 
> 1. amcheck is really reporting the complete absence of any TOAST rows
> here due to a corrupted va_valueid. It could pick a better phrasing of
> that message than "final toast chunk number 0 differs from expected
> value XXX". I mean, there is no chunk 0. There are no chunks at all.
> 
> 2. Using S as the perl unpack code for the varlena header is
> not ideal, because it's really 2 1-byte fields followed by 4 4-byte
> fields. So I think you should be using CCllLL, for two unsigned bytes
> and then two signed 4-byte quantities and then two unsigned 4-byte
> quantities. I think if you did that you'd be overwriting the
> va_valueid with the *same* garbage value on every platform, which
> would be better than different ones. Perhaps when we improve the
> message as suggested in (1) this will become a live issue, since we
> might choose to say something like "no TOAST entries for value %u".
> 
> -- 
> Robert Haas
> EDB: http://www.enterprisedb.com


This does nothing to change the verbiage from contrib/amcheck, but it should 
address the problems discussed here in pg_amcheck's regression tests.



v1-0001-Fixing-portability-issues-in-pg_amcheck-regressio.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application

> On Mar 12, 2021, at 2:55 PM, Robert Haas  wrote:
> 
> On Fri, Mar 12, 2021 at 5:24 PM Mark Dilger
>  wrote:
>> This does nothing to change the verbiage from contrib/amcheck, but it should 
>> address the problems discussed here in pg_amcheck's regression tests.
> 
> Committed.

Thanks.

There are two more, attached here.  The first deals with error message text 
which differs between farm animals, and the second deals with an apparent 
problem with IPC::Run shell expanding an asterisk on some platforms but not 
others.  That second one, if true, seems like a problem with scope beyond the 
pg_amcheck project, as TestLib::command_checks_all uses IPC::Run, and it would 
be desirable to have consistent behavior across platforms.

v2-0001-Fixing-pg_amcheck-regression-test-portability-iss.patch
Description: Binary data

v2-0002-Working-around-apparent-difficulty-in-IPC-Run.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application

> On Mar 12, 2021, at 11:24 AM, Robert Haas  wrote:
> 
> On Fri, Mar 12, 2021 at 2:05 PM  wrote:
>> I think there is a formatting glitch in lines like:
>> 
>> 2/4 relations (50%) 187977/187978 pages (99%), (testdb   
>>   )
>> 
>> I suppose that last part should show up trimmed as '(testdb)', right?
> 
> Actually I think this is intentional. The idea is that as the line is
> rewritten we don't want the close-paren to move around. It's the same
> thing pg_basebackup does with its progress reporting.
> 
> Now that is not to say that some other behavior might not be better,
> just that Mark was copying something that already exists, probably
> because he knows that I'm finnicky about consistency

I think Erik's test case only checked one database, which might be why it 
looked odd to him.  But consider:

  pg_amcheck -d foo -d bar -d myreallylongdatabasename -d myshortername -d baz 
--progress

The tool will respect your database ordering, and check foo, then bar, etc.  If 
you have --jobs greater than one, it will start checking some relations in bar 
before finishing all relations in foo, but with a fudge factor, pg_amcheck can 
report that the processing has moved on to database bar, etc.

You wouldn't want the parens to jump around when the long database names get 
processed.  So it keeps the parens in the same location, space pads shorter 
database names, and truncates overlong database names.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application

> On Mar 12, 2021, at 3:24 PM, Mark Dilger  wrote:
> 
> and the second deals with an apparent problem with IPC::Run shell expanding 
> an asterisk on some platforms but not others.  That second one, if true, 
> seems like a problem with scope beyond the pg_amcheck project, as 
> TestLib::command_checks_all uses IPC::Run, and it would be desirable to have 
> consistent behavior across platforms.

The problem with IPC::Run appears to be real, though I might just need to wait 
longer for the farm animals to prove me wrong about that.  But there is a 
similar symptom caused by an unrelated problem, one entirely my fault and 
spotted by Robert.  Here is a patch:

v3-0001-Avoid-use-of-non-portable-option-ordering-in-comm.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application



> On Mar 12, 2021, at 5:16 PM, Robert Haas  wrote:
> 
> Gah, tests are so annoying. :-)

There is another problem of non-portable option ordering in the tests.



v4-0001-pg_amcheck-Keep-trying-to-fix-the-tests.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application




> On Mar 12, 2021, at 9:08 PM, Tom Lane  wrote:
> 
> I wrote:
>> Don't almost all of the following tests have the same issue?
> 
> Ah, nevermind, I was looking at an older version of 003_check.pl.
> I concur that 24189277f missed only one here.
> 
> Pushed your fix.
> 
>   regards, tom lane

Thanks!  Was just responding to your other email, but now I don't have to send 
it.

Sorry for painting so many farm animals red this evening.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application




> On Mar 12, 2021, at 10:16 PM, Noah Misch  wrote:
> 
> On Sat, Mar 13, 2021 at 01:07:15AM -0500, Tom Lane wrote:
>> I wrote:
>>>> ... btw, prairiedog (which has a rather old Perl) has a
>>>> different complaint:
>>>> Invalid type 'q' in unpack at t/004_verify_heapam.pl line 104.
>> 
>>> Hmm ... "man perlfunc" on that system quoth
>>>   q   A signed quad (64-bit) value.
>>>   Q   An unsigned quad value.
>>> (Quads are available only if your system supports 
>>> 64-bit
>>>  integer values _and_ if Perl has been compiled to 
>>> support those.
>>>  Causes a fatal error otherwise.)
>>> It does not seem unreasonable for us to rely on Perl having that
>>> in 2021, so I'll see about upgrading this perl installation.
>> 
>> Hm, wait a minute: hoverfly is showing the same failure, even though
>> it claims to be running a 64-bit Perl.  Now I'm confused.
> 
> On that machine:
> 
> [nm@power8-aix 7:0 2021-03-13T06:09:08 ~ 0]$ /usr/bin/perl64 -e 'unpack "q", 
> ""'
> [nm@power8-aix 7:0 2021-03-13T06:09:10 ~ 0]$ /usr/bin/perl -e 'unpack "q", ""'
> Invalid type 'q' in unpack at -e line 1.
> 
> hoverfly does configure with PERL=perl64.  /usr/bin/prove is from the 32-bit
> Perl, so I suspect the TAP suites get 32-bit Perl that way.  (There's no
> "prove64".)  This test should unpack the field as two 32-bit values, not a
> 64-bit value, to avoid requiring more from the Perl installation.

I will post a modified test in a bit that avoids using Q/q.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application

> On Mar 12, 2021, at 10:22 PM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>> On Mar 12, 2021, at 10:16 PM, Noah Misch  wrote:
>>> hoverfly does configure with PERL=perl64.  /usr/bin/prove is from the 32-bit
>>> Perl, so I suspect the TAP suites get 32-bit Perl that way.  (There's no
>>> "prove64".)
> 
> Oh, that's annoying.
> 
>>> This test should unpack the field as two 32-bit values, not a
>>> 64-bit value, to avoid requiring more from the Perl installation.
> 
>> I will post a modified test in a bit that avoids using Q/q.
> 
> Coping with both endiannesses might be painful.

Not too bad if the bigint value is zero, as both the low and high 32bits will 
be zero, regardless of endianness.  The question is whether that gives up too 
much in terms of what the test is trying to do.  I'm not sure that it does, but 
if you'd rather solve this by upgrading perl, that's ok by me. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application



> On Mar 12, 2021, at 10:28 PM, Mark Dilger  
> wrote:
> 
> 
> 
>> On Mar 12, 2021, at 10:22 PM, Tom Lane  wrote:
>> 
>> Mark Dilger  writes:
>>> On Mar 12, 2021, at 10:16 PM, Noah Misch  wrote:
>>>> hoverfly does configure with PERL=perl64.  /usr/bin/prove is from the 
>>>> 32-bit
>>>> Perl, so I suspect the TAP suites get 32-bit Perl that way.  (There's no
>>>> "prove64".)
>> 
>> Oh, that's annoying.
>> 
>>>> This test should unpack the field as two 32-bit values, not a
>>>> 64-bit value, to avoid requiring more from the Perl installation.
>> 
>>> I will post a modified test in a bit that avoids using Q/q.
>> 
>> Coping with both endiannesses might be painful.
> 
> Not too bad if the bigint value is zero, as both the low and high 32bits will 
> be zero, regardless of endianness.  The question is whether that gives up too 
> much in terms of what the test is trying to do.  I'm not sure that it does, 
> but if you'd rather solve this by upgrading perl, that's ok by me. 


I'm not advocating that this be the solution, but if we don't fix up the perl 
end of it, then this test change might be used instead.



v5-0001-pg_amcheck-continuing-to-fix-portability-problems.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application




> On Mar 12, 2021, at 10:36 PM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>> On Mar 12, 2021, at 10:22 PM, Tom Lane  wrote:
>>> Coping with both endiannesses might be painful.
> 
>> Not too bad if the bigint value is zero, as both the low and high 32bits 
>> will be zero, regardless of endianness.  The question is whether that gives 
>> up too much in terms of what the test is trying to do.  I'm not sure that it 
>> does, but if you'd rather solve this by upgrading perl, that's ok by me. 
> 
> I don't mind updating the perl installations on prairiedog and gaur,
> but Noah might have some difficulty with his AIX flotilla, as I believe
> he's not sysadmin there.
> 
> You might think about using some symmetric-but-not-zero value,
> 0x01010101 or the like.

I thought about that, but I'm not sure that it proves much more than just using 
zero.  The test doesn't really do much of interest with this value, and it 
doesn't seem worth complicating the test.  The idea originally was that perl's 
"q" pack code would make reading/writing a number such as 12345678 easy, but 
since it's not easy, this is easy.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-03-13 Thread Mark Rofail

>
>
> This still fails for CI (windows) and me (linux)
>
> Can you provide more information about your Linux platform? So I may test
> the errors for myself?
>
Seems the new tests fails every CI. That’s good honestly, that we found
this problem.

The `arrays` regression test extensively test the new operators. So I think
the problem will be in when the operators are combined with the GIN index.

The problem is that the tests don’t fail on my linux build. Any idea how to
replicate the failed tests so I can debug?

/Mark

Re: pg_amcheck contrib application

> On Mar 13, 2021, at 6:50 AM, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> I think it would be good to use a non-zero value here. We're doing a
>> lot of poking into raw bytes here, and if something goes wrong, a zero
>> value is more likely to look like something normal than whatever else.
>> I suggest picking a value where all 8 bytes are the same, but not
>> zero, and ideally chosen so that they don't look much like any of the
>> surrounding bytes.
> 
> Actually, it seems like we can let pack/unpack deal with byte-swapping
> within 32-bit words; what we lose by not using 'q' format is just the
> ability to correctly swap the two 32-bit words.  Hence, any value in
> which upper and lower halves are the same should work, say
> 0x1234567812345678.
> 
>   regards, tom lane

The heap tuple in question looks as follows, with  in the spot we're 
debating:

Tuple Layout (values in hex):

t_xmin: 223
t_xmax: 0
t_field3: 0
bi_hi: 0
bi_lo: 0
ip_posid: 1
t_infomask2: 3
t_infomask: b06
t_hoff: 18
t_bits: 0
a_1: 
a_2: 
b_header: 11# little-endian, will be 88 on big endian
b_body1: 61
b_body2: 62
b_body3: 63
b_body4: 64
b_body5: 65
b_body6: 66
b_body7: 67
c_va_header: 1
c_va_vartag: 12
c_va_rawsize: 2714
c_va_extsize: 2710

valueid and toastrelid are not shown, as they won't be stable.  Relying on 
t_xmin to be stable makes the test brittle, but fortunately that is separated 
from a_1 and a_2 far enough that we should not need to worry about it.

We want to use values that don't look like any of the others.  The complete set 
of nibbles in the values above is [012345678B], leaving the set [9ACDEF] 
unused.  The attached patch uses the value DEADF9F9 as it seems a little easier 
to remember than other permutations of those nibbles.

v6-0001-pg_amcheck-continuing-to-fix-portability-problems.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

amcheck hardening

Peter,

I'd like your advice on the following observations, if you'd be so kind:


Using the pg_amcheck command committed yesterday (thanks, Robert! thanks Tom!), 
I have begun investigating segfaults that sometimes occur when running the 
amcheck routines on intentionally corrupted databases.  We've discussed this 
before, and there are limits to how much hardening is possible, particularly if 
it comes at the expense of backend performance under normal conditions.  There 
are also serious problems with corruption schemes that differ from what is 
likely to go wrong in the wild.

These segfaults present a real usage problem for pg_amcheck.  We made the 
decision [3] to not continue checking if we get a FATAL or PANIC error.  
Getting a segfault in just one database while checking just one index can abort 
a pg_amcheck run that spans multiple databases, tables and indexes, and 
therefore is not easy to blow off.  Perhaps the decision in [3] was wrong, but 
if not, some hardening of amcheck might make this problem less common.

The testing strategy I'm using is to corrupt heap and btree pages in schema 
"public" of the "regression" database created by `make installcheck`, by 
overwriting random bytes in randomly selected locations on pages after the page 
header.  Then I run `pg_amcheck regression` and see if anything segfaults.  
Doing this repeatedly, with random bytes and locations within files not the 
same from one run to the next, I can find the locations of segfaults that are 
particularly common.

Admittedly, this is not what is likely to be wrong in real-world installations. 
 I had a patch as part of the pg_amcheck series that I ultimately abandoned for 
v14 given the schedule.  It reverts tables and indexes (or portions thereof) to 
prior states.  I'll probably move on to testing with that in a bit.


The first common problem [1] happens at verify_nbtree.c:1422 when a corruption 
report is being generated.  The generation does not seem entirely safe, and the 
problematic bit can be avoided, though I suspect you could do better than the 
brute-force solution I'm using locally:

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index c4ca633918..fa8b7d5163 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -1418,9 +1418,13 @@ bt_target_page_check(BtreeCheckState *state)

  OffsetNumberNext(offset));
itup = (IndexTuple) PageGetItem(state->target, itemid);
tid = BTreeTupleGetPointsToTID(itup);
+#if 0
nhtid = psprintf("(%u,%u)",
 
ItemPointerGetBlockNumberNoCheck(tid),
 
ItemPointerGetOffsetNumberNoCheck(tid));
+#else
+   nhtid = "(UNRESOLVED,UNRESOLVED)";
+#endif
 
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),

The temPointerGetBlockNumberNoCheck(tid) seems to be unsafe here.  I get much 
the same crash at verify_nbtree.c:1136, but it's so similar I'm not bother to 
include a crash report for it.


The second common problem [2] happens at verify_nbtree.c:2762 where it uses 
_bt_compare, which ends up calling pg_detoast_datum_packed on a garbage value.  
I'm not sure we can fix that at the level of _bt_compare, since that would have 
performance consequences on backends under normal conditions, but perhaps we 
could have a function that sanity checks datums, and call that from 
invariant_l_offset() prior to _bt_compare?  I have observed a variant of this 
crash where the text data is not toasted but crashes anyway:

0   libsystem_platform.dylib0x7fff738ec929 
_platform_memmove$VARIANT$Haswell + 41
1   postgres0x00010bf1af34 varstr_cmp + 532 
(varlena.c:1678)
2   postgres0x00010bf1b6c9 text_cmp + 617 
(varlena.c:1770)
3   postgres0x00010bf1bfe5 bttextcmp + 69 
(varlena.c:1990)
4   postgres0x00010bf68c87 FunctionCall2Coll + 167 
(fmgr.c:1163)
5   postgres0x00010b8a7cb5 _bt_compare + 1445 
(nbtsearch.c:721)
6   amcheck.so  0x00011525eaeb invariant_l_offset + 123 
(verify_nbtree.c:2758)
7   amcheck.so  0x00011525cd92 bt_target_page_check + 
4754 (verify_nbtree.c:1398)



[1]

 

postgres_2021-03-13-084305_laptop280-ma-us.crash
Description: Binary data


[2]



postgres_2021-03-13-094450_laptop280-ma-us.crash
Description: Binary data


[3] 
https://www.postgresql.org/message-id/CA%2BTgmob2c0eM8%2B5kzkXaqdc9XbBCkHmtihSOSk-jCzzT4BFhFQ%40mail.gmail.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application




> On Mar 13, 2021, at 10:46 AM, Noah Misch  wrote:
> 
> On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote:
>>> On Mar 12, 2021, at 3:24 PM, Mark Dilger  
>>> wrote:
>>> 
>>> and the second deals with an apparent problem with IPC::Run shell expanding 
>>> an asterisk on some platforms but not others.  That second one, if true, 
>>> seems like a problem with scope beyond the pg_amcheck project, as 
>>> TestLib::command_checks_all uses IPC::Run, and it would be desirable to 
>>> have consistent behavior across platforms.
> 
>>> One of pg_amcheck's regression tests was passing an asterisk through
>>> TestLib's command_checks_all() command, which gets through to
>>> pg_amcheck without difficulty on most platforms, but appears to get
>>> shell expanded on Windows (jacana) and AIX (hoverfly).
> 
> For posterity, I can't reproduce this on hoverfly.  The suite fails the same
> way at f371a4c and f371a4c^.  More-recently (commit 58f5749), the suite passes
> even after reverting f371a4c.  Self-contained IPC::Run usage also does not
> corroborate the theory:
> 
> [nm@power8-aix 8:0 2021-03-13T18:32:23 clean 0]$ perl -MIPC::Run -e 
> 'IPC::Run::run "printf", "%s\n", "*"'
> *
> [nm@power8-aix 8:0 2021-03-13T18:32:29 clean 0]$ perl -MIPC::Run -e 
> 'IPC::Run::run "sh", "-c", "printf %sn *"'
> COPYRIGHT
> GNUmakefile.in
> HISTORY
> Makefile
> README
> README.git
> aclocal.m4
> config
> configure
> configure.ac
> contrib
> doc
> src
> 
>> there is a similar symptom caused by an unrelated problem
> 
>> Subject: [PATCH v3] Avoid use of non-portable option ordering in
>> command_checks_all().
> 
> On a glibc system, "env POSIXLY_CORRECT=1 make check ..." tests this.

Thanks for investigating!

The reason I suspected that passing the '*' through IPC::Run had to do with the 
error that pg_amcheck gave.  It complained that too many arguments where being 
passed to it, and that the first such argument was "pg_amcheck.c".  There's no 
reason pg_amcheck should know it's source file name, nor that the regression 
test should know that, which suggested that the asterisk was being shell 
expanded within the src/bin/pg_amcheck/ directory and the file listing was 
being passed into pg_amcheck as arguments.

That theory may have been wrong, but it was the only theory I had at the time.  
I don't have access to any of the machines where that happened, so it is hard 
for me to investigate further.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: amcheck hardening