Re: [PATCH] Explicit null dereferenced (src/backend/access/heap/heaptoast.c)

2020-08-28 Thread gkokolatos






‐‐‐ Original Message ‐‐‐
On Friday, 28 August 2020 03:22, Ranier Vilela  wrote:

> Hi,
>
> Per Coverity.
>
> When "Prepare for toasting", it is necessary to turn off the flag 
> TOAST_NEEDS_DELETE_OLD,
> if there is no need to delete external values from the old tuple, otherwise,
> there are dereference NULL at toast_helper.c (on toast_tuple_cleanup 
> function).
>

Excuse my ignorance, isn't this a false positive?

Regardless, right after "Prepare for toasting", a call to toast_tuple_init is
made, which explicitly and unconditionally sets ttc_flags to zero, so the flag
bit set in the patch will be erased anyway. This patch may make Coverity happy,
but it does not really change the behaviour of the code.

Furthermore, in the same function, toast_tuple_init, the
TOAST_NEEDS_DELETE_OLD flag is set only after the old value has actually been
inspected and found to be non-null, stored on disk, and different from the new
value. To my understanding, this is correct.
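
For reference, the logic in question looks roughly like this (a condensed
paraphrase of toast_tuple_init, not the exact source):

/* toast_tuple_init() starts by wiping the flags... */
ttc->ttc_flags = 0;

for (i = 0; i < numAttrs; i++)
{
    /*
     * ...and re-sets TOAST_NEEDS_DELETE_OLD only after verifying that the
     * old value is a non-null, on-disk external datum that differs from
     * the new value.
     */
    if (att->attlen == -1 && !old_isnull &&
        VARATT_IS_EXTERNAL_ONDISK(old_value) &&
        (new_isnull || !VARATT_IS_EXTERNAL_ONDISK(new_value) ||
         memcmp((char *) old_value, (char *) new_value,
                VARSIZE_EXTERNAL(old_value)) != 0))
        ttc->ttc_flags |= (TOAST_NEEDS_CHANGE | TOAST_NEEDS_DELETE_OLD);
}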

Can you please explain to me what I am missing?

//Georgios

> regards,
> Ranier Vilela






Re: Display individual query in pg_stat_activity

2020-08-28 Thread Masahiro Ikeda

On 2020-08-19 14:48, Drouvot, Bertrand wrote:

Hi,
On 8/18/20 9:35 AM, Pavel Stehule wrote:


Hi

On Tue 18 Aug 2020 at 8:54, Masahiro Ikeda wrote:


Hi,


I've attached a patch to display the individual query in the
pg_stat_activity query field when multiple SQL statements are
currently displayed.

Motivation:

When multiple statements are displayed then we don’t know which
one is currently running.

I'm not sure I'd want that to happen, as it could make it much
harder to track the activity back to a query in the application
layer or server logs.

Perhaps a separate field could be added for the current statement,
or a value to indicate what the current statement number in the
query is?


As a user, I think this feature is useful.

It would be nice if pg_stat_activity also showed the currently running
query in a user-defined function (PL/pgSQL).

I understand that this patch is not for user-defined functions.
Please let me know if it's better to start another thread.


Yeah I think it would be nice to have.

I also think it would be better to create a dedicated thread
(especially looking at Pavel's comment below)


Thank you. I will.


In general, PL/pgSQL functions have multiple queries,
and users want to know the progress of query execution, don't they?


I am afraid of the significant performance impact of this feature.
In this case you have to copy all nested queries to the stats
collector process. A very common usage of PL is as glue for very fast
queries. Sure, it is used as glue for very slow queries too.
I am thinking about two features:


OK, thanks for much advice and show alternative solutions.


1. An extra interface for auto_explain that allows you to get a stack
of statements assigned to some pid (probably this information
should be stored inside shared memory and collected before any query
execution). Sometimes a slow function is slow due to repeated
execution of relatively fast queries. In this case, the deepest
nesting level is not too interesting. You need to see a stack of
calls, and you search for the first slow level in the stack.


Thanks. I didn't know about the auto_explain module.
I agree that it should copy the stack of statements only when requested.


2. It could be nice to have a status column in pg_stat_activity, and a
status GUC for sending custom information from deep levels to the
user. Now users use application_name, but some special variable
could be better for this purpose.  This status value could be
refreshed periodically and could substitute some tags. So a developer
could set:

BEGIN
-- before slow long query
SET status TO 'slow query calculation xxy %d';
...

It is an alternative to RAISE NOTICE, but with a different format,
one meant for reading from pg_stat_activity

For long (slow) queries you usually need to see the sum of all times
across all levels of the call stack to get valuable information.


In comparison to 1, the user must implement logging statements in
their queries, but can control what they want to know.

I'm not sure which solution is best.


P.S. pg_stat_activity is maybe too wide a table already, and it is
probably not good to extend it too much


Thanks. I hadn't thought of it from this point of view.

After I make some PoC patches, I will create a dedicated thread.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-08-28 Thread Masahiko Sawada
On Fri, 28 Aug 2020 at 05:14, Robert Haas  wrote:
>
> On Wed, Aug 26, 2020 at 10:26 PM Ashutosh Sharma wrote:
> > Okay, point noted.
>
> I spent some time today working on this patch. I'm fairly happy with
> it now and intend to commit it if nobody sees a big problem with that.
> Per discussion, I do not intend to back-patch at this time. The two
> most significant changes I made to your version are:

Thank you for updating the patch.

>
> 1. I changed things around to avoid using any form of ereport() in a
> critical section. I'm not actually sure whether it is project policy
> to avoid ereport(NOTICE, ...) or similar in a critical section, but it
> seems prudent: if anything fails in a critical section, we will PANIC,
> so doing fewer things there is better.
>
> 2. I changed the code so that it does not try to follow redirected
> line pointers; instead, it skips them with an appropriate message, as
> we were already doing for dead and unused line pointers. I think the
> way you had it coded might've been my suggestion originally, but the
> more I looked into it the less I liked it. One problem is that it
> didn't match the docs. A second is that following a corrupted line
> pointer might index off the end of the line pointer array, and while
> that probably shouldn't happen, we are talking about corruption
> recovery here. Then I realized that, as you coded it, if the line
> pointer was redirected to a line pointer that is in turn dead (or
> unused, if there's corruption) the user would get a NOTICE complaining
> about a TID they hadn't specified, which seems like it would be very
> confusing. I thought about trying to fix all that stuff, but it just
> didn't seem worth it, because I can't think of a good reason to pass
> this function the TID of a redirected line pointer in the first place.
> If you're doing surgery, you should probably specify the exact thing
> upon which you want to operate, not some other thing that points to
> it.
>
> Here is a list of other changes I made:
>
> * Added a .gitignore file.
> * Renamed the regression test file from pg_surgery to heap_surgery to
> match the name of the single C source file we currently have.
> * Capitalized TID in a few places.
> * Ran pgindent.
> * Adjusted various comments.
> * Removed the check for an empty TID array. I don't see any reason why
> this should be an error case and I don't see much precedent for having
> such a check.
> * Fixed the code to work properly with that change and added a test case.
> * Added a check that the array is not multi-dimensional.
> * Put the AM type check before the relkind check, following existing 
> precedent.
> * Adjusted the check to use the AM OID rather than the handler OID,
> following existing precedent. Fixed the message wording accordingly.
> * Changed the documentation wording to say less about specific
> recovery procedures and focus more on the general idea that this is
> dangerous.
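
(Regarding point 1 above: the pattern is roughly to keep the critical
section minimal and to report only once it has ended; a sketch with
hypothetical variables and message text, not the committed code:)

START_CRIT_SECTION();
/* ... modify the page and mark the buffer dirty; any failure here PANICs */
MarkBufferDirty(buf);
END_CRIT_SECTION();

/*
 * Outside the critical section an error no longer escalates to PANIC, so
 * this is where user-facing messages belong.
 */
ereport(NOTICE,
        (errmsg("skipping tid (%u, %u): redirected line pointer",
                blkno, offnum)));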

You've removed from the doc the description about executing VACUUM with
the DISABLE_PAGE_SKIPPING option on the target relation after using the
pg_surgery functions, but I guess it’s better to recommend that in the
doc for safety. Could you please tell me the reason for removing it?

Regards,

-- 
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_index.indisreplident and invalid indexes

2020-08-28 Thread Dmitry Dolgov
> On Thu, Aug 27, 2020 at 11:57:21AM +0900, Michael Paquier wrote:
>
> I think that this problem is similar to indisclustered, and that we
> had better set indisreplident to false when clearing indisvalid for an
> index concurrently dropped.  This would prevent problems with ALTER
> TABLE of course, but also the relcache.
>
> Any objections to the attached?  I am not sure that this is worth a
> backpatch as that's unlikely going to be a problem in the field, so
> I'd like to fix this issue only on HEAD.  This exists since 9.4 and
> the introduction of replica identities.

Thanks for the patch. It sounds right, so no objections from me. But I
wonder if something similar has to be done also for the
index_concurrently_swap function?

/*
 * Mark the new index as valid, and the old index as invalid similarly 
to
 * what index_set_state_flags() does.
 */
newIndexForm->indisvalid = true;
oldIndexForm->indisvalid = false;
oldIndexForm->indisclustered = false;
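
(For reference, the proposed fix presumably mirrors the indisclustered
handling at the place where indisvalid is cleared, along these lines; a
sketch of the idea, not the actual patch:)

case INDEX_DROP_CLEAR_VALID:
    indexForm->indisvalid = false;       /* existing behavior */
    indexForm->indisclustered = false;   /* the earlier, similar fix */
    indexForm->indisreplident = false;   /* the change proposed here */
    break;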




Re: pg_index.indisreplident and invalid indexes

2020-08-28 Thread Michael Paquier
On Fri, Aug 28, 2020 at 10:15:37AM +0200, Dmitry Dolgov wrote:
> Thanks for the patch. It sounds right, so no objections from me. But I
> wonder if something similar has to be done also for
> index_concurrently_swap function?

As of index.c, this already happens:
/* Preserve indisreplident in the new index */
newIndexForm->indisreplident = oldIndexForm->indisreplident;
oldIndexForm->indisreplident = false;

In short, the new concurrent index is created first with
indisreplident = false; when swapping the old and new indexes, the new
index inherits the setting of the old one, and the old one, planned for
drop, ends up with indisreplident = false.
--
Michael




Re: Deprecating postfix and factorial operators in PostgreSQL 13

2020-08-28 Thread John Naylor
Hi Mark,

-{ oid => '111',
+{ oid => '111', descr => 'factorial',

I see that opr_sanity fails without something here. We explicitly
don't have descriptions of functions that implement deprecated
operators (see setup_description() in initdb.c), but in all other
cases, there are also supported operators present. Technically, it's
not the same entry as the sql-callable function (1376), so it might be
better to try to match the other operator functions and say
"implementation of deprecated ! and !! operators".

For typeconv.sgml, it looks like in v14 we'll just have a different
operator entirely for the example, so ideally we would backpatch that
change for v13. What you have is good enough in a pinch, though.

--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: factorial function/phase out postfix operators?

2020-08-28 Thread John Naylor
On Thu, Aug 27, 2020 at 5:04 PM Tom Lane  wrote:
> I'm a bit inclined to kill them both off and standardize on factorial()
> (not numeric_fac).

+1

-- 
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Display individual query in pg_stat_activity

2020-08-28 Thread Pavel Stehule
On Fri 28 Aug 2020 at 10:06, Masahiro Ikeda wrote:

> On 2020-08-19 14:48, Drouvot, Bertrand wrote:
> > Hi,
> > On 8/18/20 9:35 AM, Pavel Stehule wrote:
> >
> >> Hi
> >>
> >> On Tue 18 Aug 2020 at 8:54, Masahiro Ikeda wrote:
> >>
> >>> Hi,
> >>>
>  I've attached a patch to display the individual query in the
>  pg_stat_activity query field when multiple SQL statements are
>  currently displayed.
> 
>  Motivation:
> 
>  When multiple statements are displayed then we don’t know which
>  one is currently running.
> 
>  I'm not sure I'd want that to happen, as it could make it much
>  harder to track the activity back to a query in the application
>  layer or server logs.
> 
>  Perhaps a separate field could be added for the current statement,
>  or a value to indicate what the current statement number in the
>  query is?
> >>>
> >>> As a user, I think this feature is useful.
> >>>
> >>> It would be nice if pg_stat_activity also showed the currently running
> >>> query in a user-defined function (PL/pgSQL).
> >>>
> >>> I understand that this patch is not for user-defined functions.
> >>> Please let me know if it's better to start another thread.
> >
> > Yeah I think it would be nice to have.
> >
> > I also think it would be better to create a dedicated thread
> > (especially looking at Pavel's comment below)
>
> Thank you. I will.
>
> >>> In general, PL/pgSQL functions have multiple queries,
> >>> and users want to know the progress of query execution, don't they?
> >>
> >> I am afraid of the significant performance impact of this feature.
> >> In this case you have to copy all nested queries to the stats
> >> collector process. A very common usage of PL is as glue for very fast
> >> queries. Sure, it is used as glue for very slow queries too.
> >> I am thinking about two features:
>
> OK, thanks for much advice and show alternative solutions.
>
> >> 1. An extra interface for auto_explain that allows you to get a stack
> >> of statements assigned to some pid (probably this information
> >> should be stored inside shared memory and collected before any query
> >> execution). Sometimes a slow function is slow due to repeated
> >> execution of relatively fast queries. In this case, the deepest
> >> nesting level is not too interesting. You need to see a stack of
> >> calls, and you search for the first slow level in the stack.
>
> Thanks. I didn't know about the auto_explain module.
> I agree that it should copy the stack of statements only when requested.
>
> >> 2. It could be nice to have a status column in pg_stat_activity, and a
> >> status GUC for sending custom information from deep levels to the
> >> user. Now users use application_name, but some special variable
> >> could be better for this purpose.  This status value could be
> >> refreshed periodically and could substitute some tags. So a developer
> >> could set:
> >>
> >> BEGIN
> >> -- before slow long query
> >> SET status TO 'slow query calculation xxy %d';
> >> ...
> >>
> >> It is an alternative to RAISE NOTICE, but with a different format,
> >> one meant for reading from pg_stat_activity
> >>
> >> For long (slow) queries you usually need to see the sum of all times
> >> across all levels of the call stack to get valuable information.
>
> In comparison to 1, the user must implement logging statements in
> their queries, but can control what they want to know.
>
> I'm not sure which solution is best.
>

There is no best solution. @1 doesn't need manual work, but it is not too
useful when queries are similar (in the first n chars) and are long; in
that case custom messages are much more practical.

I don't think we can implement only one design. In this case we can
support more tools with a similar purpose but different behaviors in
corner cases.


> >> P.S. pg_stat_activity is maybe too wide a table already, and it is
> >> probably not good to extend it too much
>
> Thanks. I hadn't thought of it from this point of view.
>
> After I make some PoC patches, I will create a dedicated thread.
>
> Regards,
> --
> Masahiro Ikeda
> NTT DATA CORPORATION
>


Re: factorial function/phase out postfix operators?

2020-08-28 Thread John Naylor
On Wed, Aug 26, 2020 at 6:57 PM Mark Dilger wrote:
> I don't have any problem with the changes you made in your patch, but 
> building on your changes I also found that the following cleanup causes no 
> apparent problems:
>
> -%nonassoc  UNBOUNDED   /* ideally should have same 
> precedence as IDENT */
> -%nonassoc  IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE 
> ROLLUP
> +%nonassoc  UNBOUNDED IDENT
> +%nonassoc  PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP

Thinking about this some more, I don't think we need to do any
precedence refactoring in order to apply the functional change of
these patches. We could leave that for follow-on patches once we
figure out the best way forward, which could take some time.

-- 
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-08-28 Thread Dilip Kumar
On Thu, Aug 27, 2020 at 11:16 AM Amit Kapila  wrote:
>
> On Wed, Aug 26, 2020 at 11:22 PM Jeff Janes  wrote:
> >
> >
> > On Tue, Aug 25, 2020 at 8:58 AM Amit Kapila  wrote:
> >
> >>
> >>  I am planning
> >> to push the first patch (v53-0001-Extend-the-BufFile-interface) in
> >> this series tomorrow unless you have any comments on the same.
> >
> >
> >
> > I'm getting compiler warnings now, src/backend/storage/file/sharedfileset.c 
> > line 288 needs to be:
> >
> > boolfound PG_USED_FOR_ASSERTS_ONLY = false;
> >
>
> Thanks for the report. Tom Lane has already fixed this [1].
>
> [1] - 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e942af7b8261cd8070d0eeaf518dbc1a664859fd

As discussed, I have added another test case covering the out-of-order
subtransaction rollback scenario.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


tap_test_for_out_of_order_subxact_abort.patch
Description: Binary data


Re: Transactions involving multiple postgres foreign servers, take 2

2020-08-28 Thread Masahiro Ikeda

On 2020-07-17 15:55, Masahiko Sawada wrote:
On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda wrote:
On 2020-07-16 13:16, Masahiko Sawada wrote:
On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda wrote:


I've attached the latest version patches. I've incorporated the review
comments I got so far and improved the locking strategy.


I want to ask a question about streaming replication with 2PC.
Are you going to support 2PC with streaming replication?

I tried streaming replication using the v23 patches.
I confirmed that 2PC works with streaming replication,
where there are primary/standby coordinators.

But, in my understanding, the WAL for "PREPARE" and
"COMMIT/ABORT PREPARED" can't be replicated to the standby server in sync.
If this is right, unresolved transactions can occur.

For example,

1. PREPARE is done
2. crash primary before the WAL related to PREPARE is
replicated to the standby server
3. promote standby server // but can't execute "ABORT PREPARED"

In the above case, the remote server has an unresolved transaction.
Can we solve this problem by supporting in-sync replication?

But I think some users use async replication for performance.
Do we need to document the limitation, or provide another solution?



IIUC with synchronous replication, we can guarantee that WAL records
are written on both the primary and replicas when the client gets an
acknowledgment of commit. We don't replicate each WAL record
generated during a transaction one by one in sync. In the case you
described, the client will get an error due to the server crash.
Therefore I think the user cannot expect WAL records generated so far
to have been replicated. The same issue could also happen when the user
executes PREPARE TRANSACTION and the server crashes.


Thanks! I hadn't noticed that the behavior when a user executes PREPARE
TRANSACTION is the same.

IIUC with 2PC, there is a difference between (1) PREPARE TRANSACTION
and (2) 2PC: whether the client can know, when the server crashed,
its global transaction id.

If (1) PREPARE TRANSACTION fails, it's ok for the client to execute the
same command again, because if the remote server has already prepared
the transaction, the command will be ignored.

But if (2) 2PC fails with a coordinator crash, the client can't know
what operations should be done.

If the old coordinator had already executed PREPARE, there are some
transactions which should be ABORT PREPARED.
But if the PREPARE WAL is not sent to the standby, the new coordinator
can't execute ABORT PREPARED.
And the client can't know, either, which remote servers have PREPARED
transactions that should be aborted.

Even if the client could know that, only the old coordinator knows the
global transaction id.
Only the database administrator can analyze the old coordinator's log
and then execute the appropriate commands manually, right?


I think that's right. In the case of a coordinator crash, the user
can look for orphaned foreign prepared transactions by checking the
'identifier' column of pg_foreign_xacts on the new standby server and
the prepared transactions on the remote servers.


I think there is a case where we can't see orphaned foreign prepared
transactions in the pg_foreign_xacts view on the new standby server,
which confuses users and database administrators.

If the primary coordinator crashes after preparing a foreign transaction,
but before sending the XLOG_FDWXACT_INSERT record to the standby server,
the standby server can't restore the transaction status, and the
pg_foreign_xacts view doesn't show the prepared foreign transaction.

Sending XLOG_FDWXACT_INSERT records asynchronously leads to this problem:
if the primary replicates XLOG_FDWXACT_INSERT to the standby
asynchronously, some prepared transactions may remain unresolved forever.

Since resolving this inconsistency manually is a hard operation, I think
we need to support synchronous XLOG_FDWXACT_INSERT replication.

I understand that this has a large performance impact, but users can
control consistency/durability vs. performance with the
synchronous_commit parameter.

What do you think?


Thank you for letting me know. I've attached the latest version patch 
set.


Thanks for updating.
But the latest patches fail to apply to the master branch.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




RE: Please help for error ( file is required for XML support )

2020-08-28 Thread Sachin Khanna
Package libxml2-devel-2.9.1-6.el7.4.ppc64le is already installed.

Thanks and Regards,
SACHIN KHANNA
212 BASIS DBA TEAM OFFSHORE
Office : 204058624
Cell : 9049522511


From: Ranier Vilela 
Sent: Thursday, August 27, 2020 6:46 PM
To: PostgreSQL Hackers 
Cc: Sachin Khanna 
Subject: re: Please help for error ( file  is required for XML 
support )


[**EXTERNAL EMAIL**]
>checking for libxml/parser.h... no
>configure: error: header file  is required for XML support

sudo yum install libxml2-devel

You need to install the development package before ./configure.

regards,
Ranier Vilela



RE: extension patch of CREATE OR REPLACE TRIGGER

2020-08-28 Thread osumi.takami...@fujitsu.com
Hi, Peter


You sent two review posts for my patch.
Thank you so much. I'll write all my replies in this one.

> 
> 
> COMMENT pg_constraint.c (wrong comment?)
> 
> - * The new constraint's OID is returned.
> + * The new constraint's OID is returned. (This will be the same as
> + * "conOid" if that is specified as nonzero.)
> 
> Shouldn't that comment say:
> (This will be the same as "existing_constraint_oid" if that is other than
> InvalidOid)
Thanks. I corrected this part.

> 
> 
> 
> COMMENT pg_constraint.c (declarations)
> 
> @@ -91,6 +93,11 @@ CreateConstraintEntry(const char *constraintName,
>   NameData cname;
>   int i;
>   ObjectAddress conobject;
> + SysScanDesc   conscan;
> + ScanKeyData   skey[2];
> + HeapTuple tuple;
> + bool  replaces[Natts_pg_constraint];
> + Form_pg_constraint constrForm;
> 
> Maybe it is more convenient/readable to declare these in the scope where they
> are actually used.
You're right. Fixed.


> 
> 
> COMMENT pg_constraint.c (oid checking)
> 
> - conOid = GetNewOidWithIndex(conDesc, ConstraintOidIndexId,
> - Anum_pg_constraint_oid);
> - values[Anum_pg_constraint_oid - 1] = ObjectIdGetDatum(conOid);
> + if (!existing_constraint_oid)
> + {
> + conOid = GetNewOidWithIndex(conDesc, ConstraintOidIndexId,
> + Anum_pg_constraint_oid);
> 
> Maybe better to use if (!OidIsValid(existing_constraint_oid)) here.
Got it. I replaced that part by OidIsValid ().


> 
> 
> COMMENT tablecmds.c (unrelated change)
> 
> -   false); /* is_internal */
> +   false); /* is_internal */
> 
> Some whitespace which has nothing to do with the patch was changed.
Yeah. Fixed.

> 
> 
> COMMENT trigger.c (declarations / whitespace)
> 
> + bool is_update = false;
> + HeapTuple newtup;
> + TupleDesc tupDesc;
> + bool replaces[Natts_pg_trigger];
> + Oid existing_constraint_oid = InvalidOid; bool trigger_exists = false;
> + bool trigger_deferrable = false;
> 
> 1. One of those variables is misaligned with tabbing.
Fixed.

> 
> 2. Maybe it is more convenient/readable to declare some of these in the scope
> where they are actually used.
> e.g. newtup, tupDesc, replaces.
I cannot do this because those variables are used
at the top level in this function. Anyway, thanks for the comment.

> 
> 
> COMMENT trigger.c (error messages)
> 
> + /*
> + * If this trigger has pending event, throw an error.
> + */
> + if (stmt->replace && !isInternal && trigger_deferrable &&
> + AfterTriggerPendingOnRel(RelationGetRelid(rel)))
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_IN_USE),
> + errmsg("cannot replace \"%s\" on \"%s\" because it has pending
> trigger events",
> + stmt->trigname, RelationGetRelationName(rel))));
> + /*
> + * without OR REPLACE clause, can't override the trigger with the same name.
> + */
> + if (!stmt->replace && !isInternal)
> + ereport(ERROR,
> + (errcode(ERRCODE_DUPLICATE_OBJECT),
> + errmsg("trigger \"%s\" for relation \"%s\" already exists",
> + stmt->trigname, RelationGetRelationName(rel))));
> + /*
> + * CREATE OR REPLACE CONSTRAINT TRIGGER command can't replace
> non-constraint trigger.
> + */
> + if (stmt->replace && stmt->isconstraint &&
> !OidIsValid(existing_constraint_oid))
> + ereport(ERROR,
> + (errcode(ERRCODE_DUPLICATE_OBJECT),
> + errmsg("Trigger \"%s\" for relation \"%s\" cannot be replaced with
> constraint trigger",
> + stmt->trigname, RelationGetRelationName(rel))));
> + /*
> + * CREATE OR REPLACE TRIGGER command can't replace constraint trigger.
> + */
> + if (stmt->replace && !stmt->isconstraint &&
> OidIsValid(existing_constraint_oid))
> + ereport(ERROR,
> + (errcode(ERRCODE_DUPLICATE_OBJECT),
> + errmsg("Constraint trigger \"%s\" for relation \"%s\" cannot be
> replaced with non-constraint trigger",
> + stmt->trigname, RelationGetRelationName(rel))));
> 
> 
> 1. The order of these new errors is confusing. Maybe do the "already exists"
> check first so that all the REPLACE errors can be grouped together.
OK. I sorted out the order and conditions for this part.

> 2. There is inconsistent message text capitalising the 1st word.
> Should all be lower [1]
> [1] - https://www.postgresql.org/docs/current/error-style-guide.html
Fixed.


> 3. That "already exists" error might benefit from a message hint. e.g.
> ---
> ereport(ERROR, (errcode(ERRCODE_DUPLICATE_OBJECT),
> errmsg("trigger \"%s\" for relation \"%s\" already exists", ...), errhint("use
> CREATE OR REPLACE to replace it")));
> ---
> 
> 4. Those last two errors seem like a complicated way just to say the trigger
> types are not compatible. Maybe these messages can be simplified, and some
> message hints added. e.g.
> ---
> ereport(ERROR, (errcode(ERRCODE_DUPLICATE_OBJECT),
> errmsg("trigger \"%s\" for relation \"%s\" is a regular trigger", ...),
> errhint("use CREATE OR REPLACE TRIGGER to replace a regular
> trigger")));
> 
> 
> ereport(ERROR, (errcode(ERRCODE_DUPLICATE_OBJECT),
> errmsg("trigger \"%s\" for relation \"%s\" is a constraint trigger", ...),
> errhint("use

Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-08-28 Thread Ashutosh Sharma
> > The tidcmp function can be removed, and ItemPointerCompare used
directly by qsort as:
> >
> > -   qsort((void*) tids, ntids, sizeof(ItemPointerData),
tidcmp);
> > +   qsort((void*) tids, ntids, sizeof(ItemPointerData),
> > + (int (*) (const void *, const void *))
ItemPointerCompare);
> >
>
> Will have a look into this.
>

We can certainly do it this way, but I would still prefer having a comparator
function (tidcmp) here: it makes the code look a bit cleaner, and it keeps us
consistent with the way the comparator function argument is passed to qsort
at several other places in postgres, which increases code readability and
simplicity. For example, there is a comparator function for gin that does the
same thing tidcmp is doing here. See below:

static int
qsortCompareItemPointers(const void *a, const void *b)
{
int res = ginCompareItemPointers((ItemPointer) a, (ItemPointer)
b);

/* Assert that there are no equal item pointers being sorted */
Assert(res != 0);
return res;
}

In this case as well, it could have been done the way you are suggesting,
but writing a small comparator function with the prototype that qsort
accepts looked like a better option. Considering this, I am just leaving
it as-is. Please let me know if you feel otherwise.
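
For reference, such a wrapper is tiny; a sketch along the lines of the
tidcmp being discussed:

/* qsort comparator: wraps ItemPointerCompare with the expected prototype */
static int
tidcmp(const void *a, const void *b)
{
    ItemPointer iptr1 = ((const ItemPointer) a);
    ItemPointer iptr2 = ((const ItemPointer) b);

    return ItemPointerCompare(iptr1, iptr2);
}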

> > The header comment for function find_tids_one_page should state the
requirement that the tids array must be sorted.
> >
>
> Okay, will add a comment for this.
>

Added a comment for this in the attached patch.

Please have a look at the attached patch and let me know if you have any
other concerns. Thank you.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
From ee49cfbfc153bbf6212d90ac81a2f9d13fa0eef7 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Fri, 28 Aug 2020 15:00:41 +0530
Subject: [PATCH] Add contrib/pg_surgery to perform surgery on a damaged
 relation.

This contrib module provides a couple of functions named heap_force_kill
and heap_force_freeze that can be used to perform surgery on a damaged
heap table.

Author: Ashutosh Sharma and Robert Haas.
Reviewed by: Sawada Masahiko, Andrey Borodin, Beena Emerson, Mark Dilger and Robert Haas
Tested by: Rajkumar Raghuwanshi
Discussion: https://www.postgresql.org/message-id/CA%2BTgmoZW1fsU-QUNCRUQMGUygBDPVeOTLCqRdQZch%3DEYZnctSA%40mail.gmail.com
---
 contrib/Makefile |   1 +
 contrib/pg_surgery/.gitignore|   4 +
 contrib/pg_surgery/Makefile  |  23 ++
 contrib/pg_surgery/expected/heap_surgery.out | 180 +++
 contrib/pg_surgery/heap_surgery.c| 428 +++
 contrib/pg_surgery/pg_surgery--1.0.sql   |  18 ++
 contrib/pg_surgery/pg_surgery.control|   5 +
 contrib/pg_surgery/sql/heap_surgery.sql  |  91 ++
 doc/src/sgml/contrib.sgml|   1 +
 doc/src/sgml/filelist.sgml   |   1 +
 doc/src/sgml/pgsurgery.sgml  | 107 +++
 src/tools/pgindent/typedefs.list |   1 +
 12 files changed, 860 insertions(+)
 create mode 100644 contrib/pg_surgery/.gitignore
 create mode 100644 contrib/pg_surgery/Makefile
 create mode 100644 contrib/pg_surgery/expected/heap_surgery.out
 create mode 100644 contrib/pg_surgery/heap_surgery.c
 create mode 100644 contrib/pg_surgery/pg_surgery--1.0.sql
 create mode 100644 contrib/pg_surgery/pg_surgery.control
 create mode 100644 contrib/pg_surgery/sql/heap_surgery.sql
 create mode 100644 doc/src/sgml/pgsurgery.sgml

diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d41..c8d2a16 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -34,6 +34,7 @@ SUBDIRS = \
 		pg_prewarm	\
 		pg_standby	\
 		pg_stat_statements \
+		pg_surgery	\
 		pg_trgm		\
 		pgcrypto	\
 		pgrowlocks	\
diff --git a/contrib/pg_surgery/.gitignore b/contrib/pg_surgery/.gitignore
new file mode 100644
index 000..5dcb3ff
--- /dev/null
+++ b/contrib/pg_surgery/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/pg_surgery/Makefile b/contrib/pg_surgery/Makefile
new file mode 100644
index 000..a66776c
--- /dev/null
+++ b/contrib/pg_surgery/Makefile
@@ -0,0 +1,23 @@
+# contrib/pg_surgery/Makefile
+
+MODULE_big = pg_surgery
+OBJS = \
+	$(WIN32RES) \
+	heap_surgery.o
+
+EXTENSION = pg_surgery
+DATA = pg_surgery--1.0.sql
+PGFILEDESC = "pg_surgery - perform surgery on a damaged relation"
+
+REGRESS = heap_surgery
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_surgery
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_surgery/expected/heap_surgery.out b/contrib/pg_surgery/expected/heap_surgery.out
new file mode 100644
index 000..9451c57
--- /dev/null
+++ b/cont

Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Magnus Hagander
On Fri, Aug 28, 2020 at 2:30 AM Stephen Frost  wrote:

> Greetings,
>
> There's no shortage of requests and responses regarding how to have a
> 'read all of the data' role in PG, with various hacks involving "GRANT
> ALL" and "ALTER DEFAULT PRIVILEGES" to "solve" this, neither of which
> really works long term ("GRANT ALL" is one-time, and "ALTER DEFAULT"
> only helps for the roles that exist today).
>
> Now that we have the default role system, we can provide a proper
> solution to this oft-requested capability.
>
> This patch adds a default role to meet specifically that use-case, in
> the long-term, by explicitly allowing SELECT rights on all relations,
> and USAGE rights on all schemas, for roles who are members of the new
> 'pg_read_all_data' role.
>
> No effort is made to prevent a user who has this role from writing data;
> that's up to the admin, but this will allow someone to use pg_dump or
> pg_dumpall in a much more reliable manner to make sure that the entire
> database is able to be exported for the purpose of backups, upgrades, or
> other common use-cases, without having to have that same user be a PG
> superuser.
>
> This role is given the Bypass RLS right, though to use it effectively, a
> user would need to pass '--role=pg_read_all_data' to pg_dump/pg_dumpall,
> since role attributes are not checked as part of role membership.
>
> Thoughts?
>

Without having actually looked at the code, definite +1 for this feature.
It's much requested...

But, should we also have a pg_write_all_data to go along with it?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-08-28 Thread Neha Sharma
Hi,

I have done a code coverage analysis on the latest patches (v53), and below
is the report.
The files where the coverage changed are those whose three sets of numbers
differ.

OS: Ubuntu 18.04
Patch applied on commit : 77c7267c37f7fa8e5e48abda4798afdbecb2b95a

(Coverage is %Line / %Function, for three builds: without the logical
decoding patches ("base"), with the v53 (2,3,4,5) patches, and without the
v53-0003 patch.)

File Name                                                     base            v53 (2,3,4,5)   w/o v53-0003
src/backend/access/transam/xact.c                             86.2 / 92.9     86.2 / 92.9     86.2 / 92.9
src/backend/access/transam/xloginsert.c                       90.2 / 94.1     90.2 / 94.1     90.2 / 94.1
src/backend/access/transam/xlogreader.c                       73.3 / 93.3     73.8 / 93.3     73.8 / 93.3
src/backend/replication/logical/decode.c                      93.4 / 100      93.4 / 100      93.4 / 100
src/backend/access/rmgrdesc/xactdesc.c                        54.4 / 63.6     54.4 / 63.6     54.4 / 63.6
src/backend/replication/logical/reorderbuffer.c               93.4 / 96.7     93.4 / 96.7     93.4 / 96.7
src/backend/utils/cache/inval.c                               98.1 / 100      98.1 / 100      98.1 / 100
contrib/test_decoding/test_decoding.c                         86.8 / 95.2     86.8 / 95.2     86.8 / 95.2
src/backend/replication/logical/logical.c                     90.9 / 93.5     90.9 / 93.5     91.8 / 93.5
src/backend/access/heap/heapam.c                              86.1 / 94.5     86.1 / 94.5     86.1 / 94.5
src/backend/access/index/genam.c                              90.7 / 91.7     91.2 / 91.7     91.2 / 91.7
src/backend/access/table/tableam.c                            90.6 / 100      90.6 / 100      90.6 / 100
src/backend/utils/time/snapmgr.c                              81.1 / 98.1     80.2 / 98.1     81.1 / 98.1
src/include/access/tableam.h                                  92.5 / 100      92.5 / 100      92.5 / 100
src/backend/access/heap/heapam_visibility.c                   77.8 / 100      77.8 / 100      77.8 / 100
src/backend/replication/walsender.c                           90.5 / 97.8     90.5 / 97.8     90.9 / 100
src/backend/catalog/pg_subscription.c                         96 / 100        96 / 100        96 / 100
src/backend/commands/subscriptioncmds.c                       93.2 / 90       92.7 / 90       92.7 / 90
src/backend/postmaster/pgstat.c                               64.2 / 85.1     63.9 / 85.1     64.6 / 86.1
src/backend/replication/libpqwalreceiver/libpqwalreceiver.c   82.4 / 95       82.5 / 95       83.6 / 95
src/backend/replication/logical/proto.c                       93.5 / 91.3     93.7 / 93.3     93.7 / 93.3
src/backend/replication/logical/worker.c                      91.6 / 96       91.5 / 97.4     91.9 / 97.4
src/backend/replication/pgoutput/pgoutput.c                   81.9 / 100      85.5 / 100      86.2 / 100
src/backend/replication/slotfuncs.c                           93 / 93.8       93 / 93.8       93 / 93.8
src/include/pgstat.h                                          100 / -         100 / -         100 / -
src/backend/replication/logical/logicalfuncs.c                87.1 / 90       87.1 / 90       87.1 / 90
src/backend/storage/file/buffile.c                            68.3 / 85       69.6 / 85       69.6 / 85
src/backend/storage/file/fd.c                                 81.1 / 93       81.1 / 93       81.1 / 93
src/backend/storage/file/sharedfileset.c                      77.7 / 90.9     93.2 / 100      93.2 / 100
src/backend/utils/sort/logtape.c                              94.4 / 100      94.4 / 100      94.4 / 100
src/backend/utils/sort/sharedtuplestore.c                     90.1 / 90.9     90.1 / 90.9     90.1 / 90.9

Thanks.
--
Regards,
Neha Sharma


On Thu, Aug 27, 2020 at 11:16 AM Amit Kapila wrote:

> On Wed, Aug 26, 2020 at 11:22 PM Jeff Janes  wrote:
> >
> >
> > On Tue, Aug 25, 2020 at 8:58 AM Amit Kapila wrote:
> >
> >>
> >>  I am planning
> >> to push the first patch (v53-0001-Extend-the-BufFile-interface) in
> >> this series tomorrow unless you have any comments on the same.
> >
> >
> >
> > I'm getting compiler warnings now,
> src/backend/storage/file/sharedfileset.c line 288 needs to be:
> >
> > boolfound PG_USED_FOR_ASSERTS_ONLY = false;
> >
>
> Thanks for the report. Tom Lane has already fixed this [1].
>
> [1] -
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e942af7b8261cd8070d0eeaf518dbc1a664859fd
>
> --
> With Regards,
> Amit Kapila.
>
>
>


Re: SyncRepLock acquired exclusively in default configuration

2020-08-28 Thread Asim Praveen


> On 28-Aug-2020, at 7:03 AM, Fujii Masao  wrote:
> 
> On 2020/08/27 15:59, Asim Praveen wrote:
>> 
>> +1.  May I suggest the following addition to the above comment (feel free to
>> rephrase / reject)?
>> "If sync_standbys_defined was being set from false to true and we observe it 
>> as
>> false, it ok to skip the wait.  Replication was async and it is in the 
>> process
>> of being changed to sync, due to user request.  Subsequent commits will 
>> observe
>> the change and start waiting.”
> 
> Thanks for the suggestion! I'm not sure if it's worth adding this because
> it seems like an obvious thing. But maybe you imply that we need to comment
> why the lock is not necessary when sync_standbys_defined is false. Right?
> If so, what about updating the comments as follows?
> 
> +  * Since this routine gets called every commit time, it's important to
> +  * exit quickly if sync replication is not requested. So we check
> +  * WalSndCtl->sync_standbys_defined flag without the lock and exit
> +  * immediately if it's false. If it's true, we need to check it again 
> later
> +  * while holding the lock, to check the flag and operate the sync rep
> +  * queue atomically. This is necessary to avoid the race condition
> +  * described in SyncRepUpdateSyncStandbysDefined(). On the other
> +  * hand, if it's false, the lock is not necessary because we don't touch
> +  * the queue.
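
(The pattern being described is roughly a lock-free fast exit followed by a
re-check under the lock; a simplified sketch, not the actual code:)

/* Fast exit, reading the flag without the lock. */
if (!WalSndCtl->sync_standbys_defined || lsn <= WalSndCtl->lsn[mode])
    return;

LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);

/*
 * Re-check under the lock, so that checking the flag and operating on the
 * sync rep queue happen atomically.
 */
if (!WalSndCtl->sync_standbys_defined || lsn <= WalSndCtl->lsn[mode])
{
    LWLockRelease(SyncRepLock);
    return;
}

/* ...join the sync rep wait queue and wait... */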

Thank you for updating the comment.  This looks better.

Asim

Re: SyncRepLock acquired exclusively in default configuration

2020-08-28 Thread Masahiko Sawada
On Fri, 28 Aug 2020 at 10:33, Fujii Masao  wrote:
>
>
>
> On 2020/08/27 15:59, Asim Praveen wrote:
> >
> >> On 26-Aug-2020, at 11:10 PM, Fujii Masao  
> >> wrote:
> >>
> >> I added the following comments based on the suggestion by Sawada-san 
> >> upthread. Thought?
> >>
> >> + * Since this routine gets called every commit time, it's important to
> >> + * exit quickly if sync replication is not requested. So we check
> >> + * WalSndCtl->sync_standbys_define without the lock and exit
> >> + * immediately if it's false. If it's true, we check it again later
> >> + * while holding the lock, to avoid the race condition described
> >> + * in SyncRepUpdateSyncStandbysDefined().
> >>
> >
> > +1.  May I suggest the following addition to the above comment (feel free to
> > rephrase / reject)?
> >
> > "If sync_standbys_defined was being set from false to true and we observe 
> > it as
> > false, it ok to skip the wait.  Replication was async and it is in the 
> > process
> > of being changed to sync, due to user request.  Subsequent commits will 
> > observe
> > the change and start waiting.”
>
> Thanks for the suggestion! I'm not sure if it's worth adding this because
> it seems like an obvious thing. But maybe you imply that we need to comment
> why the lock is not necessary when sync_standbys_defined is false. Right?
> If so, what about updating the comments as follows?
>
> +* Since this routine gets called every commit time, it's important to
> +* exit quickly if sync replication is not requested. So we check
> +* WalSndCtl->sync_standbys_defined flag without the lock and exit
> +* immediately if it's false. If it's true, we need to check it again 
> later
> +* while holding the lock, to check the flag and operate the sync rep
> +* queue atomically. This is necessary to avoid the race condition
> +* described in SyncRepUpdateSyncStandbysDefined(). On the other
> +* hand, if it's false, the lock is not necessary because we don't 
> touch
> +* the queue.
>
> >
> >>
> >> Attached is the updated version of the patch. I didn't change how to
> >> fix the issue. But I changed the check for fast exit so that it's called
> >> before setting the "mode", to avoid a few cycle.
> >>
> >
> >
> > Looks good to me.  There is a typo in the comment:
> >
> >  s/sync_standbys_define/sync_standbys_defined/
>
> Fixed. Thanks!
>

Both v2 and v3 look good to me too. IMO I'm okay with or without the
last sentence.


Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Stephen Frost
Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> Without having actually looked at the code, definite +1 for this feature.
> It's much requested...

Thanks.

> But, should we also have a pg_write_all_data to go along with it?

Perhaps, but could certainly be a different patch, and it'd need to be
better defined, it seems to me...  read_all is pretty straightforward
(the general goal being "make pg_dumpall/pg_dump work"); what would
write mean?  INSERT?  DELETE?  TRUNCATE?  ALTER TABLE?  System catalogs?

Doesn't seem like you could just declare it to be 'allow pg_restore'
either, as that might include creating untrusted functions, et al.

Thanks,

Stephen




Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Stephen Frost
Greetings,

* Georgios Kokolatos (gkokola...@protonmail.com) wrote:
> The patch seems to be implementing a useful and requested feature.
> The patch applies cleanly and passes the basic regress tests. Also the 
> commitfest bot is happy.
> 
> A first pass at the code has not revealed any worthwhile comments.
> Please allow me a second and more thorough pass. The commitfest has
> hardly started after all.

Great, thanks!

> Also allow me a series of genuine questions: 
> 
> What would the behaviour be with REVOKE?
> In a sequence similar to:
> GRANT ALL ON ...

GRANT ALL would be independently GRANT'ing rights to some role and
therefore unrelated.

> REVOKE pg_read_all_data FROM ...

This would simply REVOKE that role from the user.  Privileges
independently GRANT'd directly to the user wouldn't be affected.  Nor
would other role membership.

> What privileges would the user be left with? Would it be possible to end up 
> in the same privilege only with a GRANT command?

I'm not sure what's being asked here.

> Does the above scenario even make sense?

I definitely believe it makes sense for a given role/user to be a member
of pg_read_all_data and to be a member of other roles, or to have other
privileges GRANT'd directly to them.

Thanks,

Stephen




Re: Handing off SLRU fsyncs to the checkpointer

2020-08-28 Thread Jakub Wartak
Hi Thomas, hackers,

>> > To move these writes out of recovery's way, we should probably just
>> > run the bgwriter process during crash recovery.  I'm going to look
>> > into that.
>>
>> Sounds awesome.
>
>I wrote a quick and dirty experimental patch to try that.  I can't see
>any benefit from it on pgbench with default shared buffers, but maybe
>it would do better with your append test due to locality, especially
>if you can figure out how to tune bgwriter to pace itself optimally.
>https://github.com/macdice/postgres/tree/bgwriter-in-crash-recovery

OK, so I've quickly tested those two PoC patches together, under the
conditions below:
- a similar append-only workload via pgbench (to eliminate other already
known WAL bottlenecks, e.g. sorting),
- 4.3GB of WAL to be applied (mostly Btree/INSERT_LEAF)
- on same system as last time (ext4 on NVMe, 1s8c16, 4.14 kernel) 
- 14master already with SLRU fsync to checkpointer/pg_qgsort patches applied

TEST bgwriterPOC1:
- in severe dirty memory conditions (artificially simulated via small s_b here) 
--> so for workloads with very high FlushBuffer activity in StartupXLOG
- with fsync=off/fpw=off by default and on NVMe (e.g. scenario: I want to
perform some PITR as fast as I can to see how the production data looked in
the past, before some user deleted some data)

baseline s_b@128MB: 140.404, 0.123 (2nd number small as there is only a small
region to checkpoint)

    22.49%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
            ---copy_user_enhanced_fast_string
               |--14.72%--copyin
               |          __pwrite_nocancel
               |          FileWrite
               |          mdwrite
               |          FlushBuffer
               |          ReadBuffer_common
               |           --14.52%--btree_xlog_insert
                --7.77%--copyout
                          __pread_nocancel
                           --7.57%--FileRead
                                     mdread
                                     ReadBuffer_common
     6.13%  postgres  [kernel.kallsyms]   [k] do_syscall_64
               |--1.64%--__pwrite_nocancel
                --1.23%--__pread_nocancel
     3.68%  postgres  postgres            [.] hash_search_with_hash_value
            ---hash_search_with_hash_value
               |--1.02%--smgropen

After applying:
patch -p1 < ../0001-Run-checkpointer-and-bgworker-in-crash-recovery.patch
patch -p1 < ../0002-Optionally-don-t-wait-for-end-of-recovery-checkpoint.patch

0001+0002 s_b@128MB: similar result to above
0001+0002 s_b@128MB: 108.871, 0.114 , bgwriter_delay = 
10ms/bgwriter_lru_maxpages = 1000
0001+0002 s_b@128MB: 85.392, 0.103 , bgwriter_delay = 
10ms/bgwriter_lru_maxpages = 5 #~390MB max?

    18.40%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
            ---copy_user_enhanced_fast_string
               |--17.79%--copyout
               |          __pread_nocancel
               |          |--16.56%--FileRead
               |          |          mdread
               |          |          ReadBuffer_common
                --0.61%--copyin // WOW
                          __pwrite_nocancel
                          FileWrite
                          mdwrite
                          FlushBuffer
                          ReadBuffer_common
     9.20%  postgres  postgres            [.] hash_search_with_hash_value
            ---hash_search_with_hash_value
               |--4.70%--smgropen

of course there is another WOW moment during recovery ("61.9%")

USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 120935  0.9  0.0 866052  3824 ?        Ss   09:47   0:00 postgres: 
checkpointer
postgres 120936 61.9  0.0 865796  3824 ?        Rs   09:47   0:22 postgres: 
background writer
postgres 120937 97.4  0.0 865940  5228 ?        Rs   09:47   0:36 postgres: 
startup recovering 00010089

speedup of 1.647x when dirty memory is in the way. When it's not:

baseline  s_b@24000MB: 39.199, 1.448 (2x patches off)
0001+0002 s_b@24000MB: 39.383, 1.442 , bgwriter_delay = 
10ms/bgwriter_lru_maxpages = 5 #~390MB/s max, yay

there's no regression. I have only one comment about those 2 WIP patches:
bgwriter_lru_maxpages should maybe be called standby_bgwriter_lru_maxpages in
this scenario, or, even better, there shouldn't be a maximum at all during the
closed-DB recovery scenario.

TEST bgwriterPOC2a showcases the 2nd patch, which opens the DB for read-write
users before the final checkpoint finishes after redo recovery. The DBA may
make that decision via the parameter end_of_recovery_checkpoint_wait=off.
- on slow storage (xvda, fsync=on) and even with high memory:

s_b@24000MB: 39.043, 15.639 -- even with WAL recovery being 100% CPU
bound (mostly on hash_search_with_hash_value() for
Buffers/__memmove_ssse3_back), it took an additional 15s to perform the
checkpoint before the DB was open for users (it had to write 269462 buffers
=~ 2GB =~ 140

Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Magnus Hagander
On Fri, Aug 28, 2020 at 2:38 PM Stephen Frost  wrote:

> Greetings,
>
> * Magnus Hagander (mag...@hagander.net) wrote:
> > Without having actually looked at the code, definite +1 for this feature.
> > It's much requested...
>
> Thanks.
>
> > But, should we also have a pg_write_all_data to go along with it?
>
> Perhaps, but could certainly be a different patch, and it'd need to be
> better defined, it seems to me...  read_all is pretty straight-forward
> (the general goal being "make pg_dumpall/pg_dump work"), what would
> write mean?  INSERT?  DELETE?  TRUNCATE?  ALTER TABLE?  System catalogs?
>

Well, it's pg_write_all_*data*, so it certainly wouldn't be alter table or
system catalogs.

I'd say insert/update/delete yes.

TRUNCATE is always an outlier. Given it's generally classified as DDL, I
wouldn't include it.


> Doesn't seem like you could just declare it to be 'allow pg_restore'
> either, as that might include creating untrusted functions, et al.
>

No, definitely not. That wouldn't be the use case at all :)

(and fwiw to me the main use case for read_all_data also isn't pg_dump,
because most people using pg_dump are already db owner or higher in my
experience. But it is nice that it helps with that too)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Isaac Morland
On Fri, 28 Aug 2020 at 08:43, Stephen Frost  wrote:


> This would simply REVOKE that role from the user.  Privileges
> independently GRANT'd directly to the user wouldn't be affected.  Nor
> would other role membership.
>
> > What privileges would the user be left with? Would it be possible to end
> up in the same privilege only with a GRANT command?
>

What about:

REVOKE SELECT ON [table] FROM pg_read_all_data;

I guess what I’m really asking is whether pg_read_all_data is automatically
granted SELECT on all newly-created relations, or if the permission
checking system always returns TRUE when asked if pg_read_all_data can
select from a relation? I’m guessing it’s the latter, so that it would be
ineffective to revoke the select privilege, as I think this is more useful;
but I’d like to be sure, and the documentation should be explicit on this point.


Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Stephen Frost
Greetings,

* Isaac Morland (isaac.morl...@gmail.com) wrote:
> On Fri, 28 Aug 2020 at 08:43, Stephen Frost  wrote:
> > This would simply REVOKE that role from the user.  Privileges
> > independently GRANT'd directly to the user wouldn't be affected.  Nor
> > would other role membership.
> >
> > > What privileges would the user be left with? Would it be possible to end
> > up in the same privilege only with a GRANT command?
> 
> What about:
> 
> REVOKE SELECT ON [table] FROM pg_read_all_data;

Wouldn't have any effect, and I think that's correct.

> I guess what I’m really asking is whether pg_read_all_data is automatically
> granted SELECT on all newly-created relations, or if the permission
> checking system always returns TRUE when asked if pg_read_all_data can
> select from a relation? I’m guessing it’s the latter, so that it would be
> ineffective to revoke the select privilege, as I think this is more useful;
> but I’d like to be sure, and the documentation should be explicit on this
> point.

Yes, it's the latter.  I'm not really sure about the documentation
change you're contemplating. Have a specific suggestion?

Thanks,

Stephen




Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Magnus Hagander
On Fri, Aug 28, 2020 at 2:43 PM Stephen Frost  wrote:

> Greetings,
>
> * Georgios Kokolatos (gkokola...@protonmail.com) wrote:
> > The patch seems to be implementing a useful and requested feature.
> > The patch applies cleanly and passes the basic regress tests. Also the
> commitfest bot is happy.
> >
> > A first pass at the code has not revealed any worthwhile comments.
> > Please allow me a second and more thorough pass. The commitfest has
> > hardly started after all.
>
> Great, thanks!
>
> > Also allow me a series of genuine questions:
> >
> > What would the behaviour be with REVOKE?
> > In a sequence similar to:
> > GRANT ALL ON ...
>
> GRANT ALL would be independently GRANT'ing rights to some role and
> therefore unrelated.
>
> > REVOKE pg_read_all_data FROM ...
>
> This would simply REVOKE that role from the user.  Privileges
> independently GRANT'd directly to the user wouldn't be affected.  Nor
> would other role membership.
>
> > What privileges would the user be left with? Would it be possible to end
> up in the same privilege only with a GRANT command?
>
> I'm not sure what's being asked here.
>

I think the core thing to remember here is that SQL permissions are always
additive, that's what confuses some cases.

That is, REVOKE something FROM role only removes this particular additive
permission. It doesn't make sure the role doesn't have the same permission
*through some other means*.

Sometimes it would be really useful to be able to do e.g. "DENY DELETE ON
importanttable FROM sfrost", which would then override any DELETE
permissions he'd be getting from anywhere else; that is, to be able to say
"everybody except x". But that's not something that's in SQL permissions
AFAIK :)


> Does the above scenario even make sense?
>
> I definitely believe it makes sense for a given role/user to be a member
> of pg_read_all_data and to be a member of other roles, or to have other
> privileges GRANT'd directly to them.
>

Yeah, for example another role might have a combination of read and write
permissions, and those would then remain for the user if pg_read_all_data
is removed.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: New default role- 'pg_read_all_data'

2020-08-28 Thread gkokolatos






‐‐‐ Original Message ‐‐‐
On Friday, 28 August 2020 15:43, Stephen Frost  wrote:

> Greetings,
>
> -   Georgios Kokolatos (gkokola...@protonmail.com) wrote:
>
> > The patch seems to be implementing a useful and requested feature.
> > The patch applies cleanly and passes the basic regress tests. Also the 
> > commitfest bot is happy.
> > A first pass at the code has not revealed any worthwhile comments.
> > Please allow me a second and more thorough pass. The commitfest has
> > hardly started after all.
>
> Great, thanks!
>
> > Also allow me a series of genuine questions:
> > What would the behaviour be with REVOKE?
> > In a sequence similar to:
> > GRANT ALL ON ...
>
> GRANT ALL would be independently GRANT'ing rights to some role and
> therefore unrelated.
>
> > REVOKE pg_read_all_data FROM ...
>
> This would simply REVOKE that role from the user. Privileges
> independently GRANT'd directly to the user wouldn't be affected. Nor
> would other role membership.
>
> > What privileges would the user be left with? Would it be possible to end up 
> > in the same privilege only with a GRANT command?
>
> I'm not sure what's being asked here.

You are correct. My phrasing is not clear. Please be patient and allow me to 
try again.

I was playing around with the code, trying a bit of the opposite of what
you have submitted in the test file.

You have, (snipped):

GRANT pg_read_all_data TO regress_priv_user6;

SET SESSION AUTHORIZATION regress_priv_user6;
SELECT * FROM atest1; -- ok
INSERT INTO atest2 VALUES ('foo', true); -- fail


I was expecting:
REVOKE pg_read_all_data FROM regress_priv_user6;

SET SESSION AUTHORIZATION regress_priv_user6;
SELECT * FROM atest1; -- fail
INSERT INTO atest2 VALUES ('foo', true); -- ok


My expectation was not met: in my manual test (unless I made a mistake,
which is entirely possible), the SELECT above did not fail. The INSERT did
succeed, though.

The first question: Was my expectation wrong?
The second question: Is there a privilege that can be granted to 
regress_priv_user6 that will not permit the select operation but will permit 
the insert operation? If not, should there be one?

I hope I am clearer now.

Thank you again for your patience.

>
> > Does the above scenario even make sense?
>
> I definitely believe it makes sense for a given role/user to be a member
> of pg_read_all_data and to be a member of other roles, or to have other
> privileges GRANT'd directly to them.
>
> Thanks,
>
> Stephen






Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Stephen Frost
Greetings,

* gkokola...@pm.me (gkokola...@pm.me) wrote:
> On Friday, 28 August 2020 15:43, Stephen Frost  wrote:
> > > What privileges would the user be left with? Would it be possible to end 
> > > up in the same privilege only with a GRANT command?
> >
> > I'm not sure what's being asked here.
> 
> You are correct. My phrasing is not clear. Please be patient and allow me to 
> try again.
> 
> I was playing around with the code and I was trying a bit the opposite of 
> what you have submitted in the test file.
> 
> You have, (snipped):
> 
> GRANT pg_read_all_data TO regress_priv_user6;
> 
> SET SESSION AUTHORIZATION regress_priv_user6;
> SELECT * FROM atest1; -- ok
> INSERT INTO atest2 VALUES ('foo', true); -- fail
> 
> 
> I was expecting:
> REVOKE pg_read_all_data FROM regress_priv_user6;

Are you sure this REVOKE was successful..?

> SET SESSION AUTHORIZATION regress_priv_user6;
> SELECT * FROM atest1; -- fail
> INSERT INTO atest2 VALUES ('foo', true); -- ok

=# create role r1;
CREATE ROLE
=*# grant pg_read_all_data to r1;
GRANT ROLE
=*# create table t1 (c1 int);
CREATE TABLE
=*# set role r1;
=*> select * from t1;
 c1 

(0 rows)
=*> reset role;
RESET
=*# revoke pg_read_all_data from r1;
REVOKE ROLE
=*# set role r1;
SET
=*> select * from t1;
ERROR:  permission denied for table t1

Seems to be working as expected here.

> My expectation was not met since in my manual test (unless I made a mistake 
> which is entirely possible), the SELECT above did not fail. The insert did 
> succeed though.

That the INSERT worked seems pretty odd- could you post the exact
changes you've made to the regression tests, or the exact script where
you aren't seeing what you expect?  I've not been able to reproduce the
GRANT allowing a user to INSERT into a table.

> The first question: Was my expectation wrong?

If there aren't any other privileges involved, then REVOKE'ing the role
from a user should prevent that user from being able to SELECT from the
table.

> The second question: Is there a privilege that can be granted to 
> regress_priv_user6 that will not permit the select operation but will permit 
> the insert operation? If not, should there be one?

GRANT INSERT ON atest1 TO regress_priv_user6; would allow just
INSERT'ing.

Magnus also brought up the idea of a 'write_all_data' role, but that's
pretty independent of this, imv.  Not against adding it, if we can agree
as to what it means, exactly, but we should probably discuss over in
that sub-thread.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Isaac Morland
On Fri, 28 Aug 2020 at 08:54, Stephen Frost  wrote:

>
> Yes, it's the latter.  I'm not really sure about the documentation
> change you're contemplating- have a specific suggestion?
>

Sorry, I was discussing this as if it was an abstract idea, not a concrete
patch. I've just taken a look at the patch and I think the documentation in
doc/src/sgml/user-manag.sgml is OK.


Re: [PATCH] Explicit null dereferenced (src/backend/access/heap/heaptoast.c)

2020-08-28 Thread Ranier Vilela
On Fri, Aug 28, 2020 at 04:45,  wrote:

>
>
>
>
>
> ‐‐‐ Original Message ‐‐‐
> On Friday, 28 August 2020 03:22, Ranier Vilela 
> wrote:
>
> > Hi,
> >
> > Per Coverity.
> >
> > When "Prepare for toasting", it is necessary to turn off the flag
> TOAST_NEEDS_DELETE_OLD,
> > if there is no need to delete external values from the old tuple,
> otherwise,
> > there are dereference NULL at toast_helper.c (on toast_tuple_cleanup
> function).
> >
>
> Excuse my ignorance, isn't this a false positive?
>
Yes, you're right.

Coverity gets confused by the &: it does not follow the call made through &ttc.

if (oldtup == NULL)
{
    /* 3. assign_zero: Assigning: ttc.ttc_oldvalues = NULL. */
    ttc.ttc_oldvalues = NULL;
    ttc.ttc_oldisnull = NULL;
    /* 4. Falling through to end of if statement. */
}
else
{
    ttc.ttc_oldvalues = toast_oldvalues;
    ttc.ttc_oldisnull = toast_oldisnull;
}
ttc.ttc_attr = toast_attr;
toast_tuple_init(&ttc);    /* Coverity ignores the call completely here. */

toast_tuple_init makes the warning moot, because it resets ttc->flags.


> Regardless right after prepare for toasting, a call to toast_tuple_init is
> made which will explicitly and unconditionally set ttc_flags to zero so the
> flag bit set in the patch will be erased anyways. This patch may make
> coverity happy but does not really change anything in the behaviour of the
> code.
>
That's right, the patch doesn't change anything.


>
> Furthermore, in the same function, toast_tuple_init, the flag is set to
> TOAST_NEEDS_DELETE_OLD after the old value is actually inspected and found
> to not be null, be stored on disk and to be different than the new value.
> To my understanding, this seems to be correct.
>
Very correct.

Thanks for taking a look here.

Could you take a look at the attached patch; would it be an improvement?
It seems to me that toast_tuple_init can be improved:
ttc->ttc_oldvalues is constant across the loop, so could the check on it be
hoisted out of the loop?

regards,
Ranier Vilela


unloop_toast_tuple_init.patch
Description: Binary data


Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Stephen Frost
Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Fri, Aug 28, 2020 at 2:38 PM Stephen Frost  wrote:
> > * Magnus Hagander (mag...@hagander.net) wrote:
> > > Without having actually looked at the code, definite +1 for this feature.
> > > It's much requested...
> >
> > Thanks.
> >
> > > But, should we also have a pg_write_all_data to go along with it?
> >
> > Perhaps, but could certainly be a different patch, and it'd need to be
> > better defined, it seems to me...  read_all is pretty straight-forward
> > (the general goal being "make pg_dumpall/pg_dump work"), what would
> > write mean?  INSERT?  DELETE?  TRUNCATE?  ALTER TABLE?  System catalogs?
> 
> Well, it's pg_write_all_*data*, so it certainly wouldn't be alter table or
> system catalogs.
> 
> I'd say insert/update/delete yes.
> 
TRUNCATE is always an outlier. Given it's generally classified as DDL, I
> wouldn't include it.

Alright, that seems like it'd be pretty easy.  We already have a check
in pg_class_aclmask to disallow modification of system catalogs w/o
being a superuser, so we should be alright to add a similar check for
insert/update/delete just below where I added the SELECT check.
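
A rough sketch of what that might look like in pg_class_aclmask() (names
are illustrative; DEFAULT_ROLE_WRITE_ALL_DATA is a made-up constant that
just follows the existing default-role naming):

    /* sketch: mirror the SELECT check for the write privileges */
    if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE)) &&
        has_privs_of_role(roleid, DEFAULT_ROLE_WRITE_ALL_DATA) &&
        !IsCatalogRelationOid(table_oid))
        result |= (mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE));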

> > Doesn't seem like you could just declare it to be 'allow pg_restore'
> > either, as that might include creating untrusted functions, et al.
> 
> No definitely not. That wouldn't be the usecase at all :)

Good. :)

> (and fwiw to me the main use case for read_all_data also isn't pg_dump,
> because most people using pg_dump are already db owner or higher in my
> experience. But it is nice that it helps with that too)

Glad to have confirmation that there's other use-cases this helps with.

I'll post an updated patch with that added in a day or two.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Support for OUT parameters in procedures

2020-08-28 Thread Robert Haas
On Fri, Aug 28, 2020 at 2:04 AM Peter Eisentraut
 wrote:
> The handling of results of SQL statements executed at the top level
> (a.k.a. direct SQL) is implementation-specific and varies widely in
> practice.  More interesting in practice, in terms of functionality and
> also compatibility, are nested calls in PL/pgSQL as well as integration
> in JDBC.

I agree that driver integration, and in particular JDBC integration,
is important and needs some thought. I don't think it horribly
matters, with a feature like this, what shows up when people type
stuff into psql. Whatever it is, people will get used to it. But when
they interact through a driver, it's different. It is no good
inventing things, either in PostgreSQL or in the JDBC driver for
PostgreSQL, that make PostgreSQL behave differently from every other
database they use. I don't know exactly how we get to a good outcome
here, but I think it's worth some careful consideration.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

2020-08-28 Thread Ranier Vilela
On Fri, Aug 28, 2020 at 00:11, Tom Lane  wrote:

>
> In other words, the C standard made a damfool decision and now we need
> to deal with the consequences of that as perpetrated by other fools.
> Still, it's all hypothetical so far --- does anyone have examples of
> actual rather than theoretical issues?
>
I still think the value of this alert would be to avoid the call.
Why should memcmp have to deal with a NULL value?
clog.c:299 is an outlier; there are hundreds of calls in the report.
It must be very difficult to correct, but if TransactionIdSetPageStatus were
not called in these cases, memcmp would not have to deal with the NULL
pointer.
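
For illustration, the kind of caller-side guard being suggested (a generic
sketch, not actual PostgreSQL code):

#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Compare two buffers without ever handing memcmp a NULL pointer;
 * memcmp(NULL, p, 0) is undefined behavior despite the zero length. */
static bool
buffers_equal(const void *a, const void *b, size_t len)
{
    if (len == 0)
        return true;            /* nothing to compare; skip the call */
    assert(a != NULL && b != NULL);
    return memcmp(a, b, len) == 0;
}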

regards,
Ranier Vilela


Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Gilles Darold

On 28/08/2020 at 15:26, Stephen Frost wrote:

Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:

On Fri, Aug 28, 2020 at 2:38 PM Stephen Frost  wrote:

* Magnus Hagander (mag...@hagander.net) wrote:

Without having actually looked at the code, definite +1 for this feature.
It's much requested...

Thanks.


But, should we also have a pg_write_all_data to go along with it?

Perhaps, but could certainly be a different patch, and it'd need to be
better defined, it seems to me...  read_all is pretty straight-forward
(the general goal being "make pg_dumpall/pg_dump work"), what would
write mean?  INSERT?  DELETE?  TRUNCATE?  ALTER TABLE?  System catalogs?

Well, it's pg_write_all_*data*, so it certainly wouldn't be alter table or
system catalogs.

I'd say insert/update/delete yes.

TRUNCATE is always an outlier. Given it's generally classified as DDL, I
wouldn't include it.

Alright, that seems like it'd be pretty easy.  We already have a check
in pg_class_aclmask to disallow modification of system catalogs w/o
being a superuser, so we should be alright to add a similar check for
insert/update/delete just below where I added the SELECT check.


Doesn't seem like you could just declare it to be 'allow pg_restore'
either, as that might include creating untrusted functions, et al.

No definitely not. That wouldn't be the usecase at all :)

Good. :)


(and fwiw to me the main use case for read_all_data also isn't pg_dump,
because most people using pg_dump are already db owner or higher in my
experience. But it is nice that it helps with that too)

Glad to have confirmation that there's other use-cases this helps with.

I'll post an updated patch with that added in a day or two.

Thanks,

Stephen


Hi,


Looking at this thread I was thinking about FDWs. Will the
pg_read_all_data role allow executing pg_dump with the
--include-foreign-data option? If so, what about privileges on the
foreign-data wrapper and the foreign server? If this is the behavior we
want, I guess the modification should also be applied to
pg_foreign_data_wrapper_aclmask() and pg_foreign_server_aclmask().



Best regards,

--
Gilles Darold





Re: new heapcheck contrib module

2020-08-28 Thread Robert Haas
On Fri, Aug 28, 2020 at 1:07 AM Andrey M. Borodin  wrote:
> I was experimenting a bit with our internal heapcheck and found out that it's 
> not helping with truncated CLOG anyhow.
> Will your module be able to gather tid's of similar corruptions?
>
> server/db M # select * from heap_check('pg_toast.pg_toast_4848601');
> ERROR:  58P01: could not access status of transaction 636558742
> DETAIL:  Could not open file "pg_xact/025F": No such file or directory.
> LOCATION:  SlruReportIOError, slru.c:913
> Time: 3439.915 ms (00:03.440)

This kind of thing gets really tricky. PostgreSQL uses errors in tons
of places to report problems, and if you want to accumulate a list of
errors and report them all rather than just letting the first one
cancel the operation, you need special handling for each individual
error you want to bypass. A tool like this naturally wants to use as
much PostgreSQL infrastructure as possible, to avoid duplicating a ton
of code and creating a bloated monstrosity, but all that code can
throw errors. I think the code in its current form is trying to be
resilient against problems on the table pages that it is actually
checking, but it can't necessarily handle gracefully corruption in
other parts of the system. For instance:

- CLOG could be truncated, as in your example
- the disk files could have had their permissions changed so that they
can't be accessed
- the PageIsVerified() check might fail when pages are read
- the TOAST table's metadata in pg_class/pg_attribute/etc. could be corrupted
- ...or the files for those system catalogs could've had their
permissions changed
- or they could contain invalid pages
- ...or their indexes could be messed up

I think there are probably a bunch more, and I don't think it's
practical to allow this tool to continue after arbitrary stuff goes
wrong. It'll be too much code and impossible to maintain. In the case
you mention, I think we should view that as a problem with clog rather
than a problem with the table, and thus out of scope.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Please help for error ( file is required for XML support )

2020-08-28 Thread Ranier Vilela
On Fri, Aug 28, 2020 at 04:37, Sachin Khanna <sachin_kha...@infosys.com> wrote:

>
>
> Now I can see that the libraries are identified in config.log. I also
> found some errors in the configuration output as well, highlighted below.
>
> Please find the attached config.log file for reference.
>
> Please help and let me know if there is any issue in the configuration
> script that I need to change?
>
>
>
> I am using PostgreSQL version postgresql-12.3.tar.gz
>
>
>
> configure:8291: checking for XML2_CONFIG
>
> configure:8293: result: /usr/lib64/libxml2.so
>
> configure:8416: checking for grep that handles long lines and -e
>
> configure:8474: result: /bin/grep
>
> configure:8479: checking for egrep
>
> configure:8541: result: /bin/grep -E
>
> configure:8588: checking for ld used by GCC
>
> configure:8651: result: /bin/ld
>
> configure:8658: checking if the linker (/bin/ld) is GNU ld
>
> GNU ld version 2.27-44.base.el7
>
>
>
> at-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2
> -D_GNU_SOURCE  conftest.c >&5
>
> conftest.c: In function 'main':
>
> conftest.c:76:20: error: expected expression before ')' token
>
> if (sizeof ((_Bool)))
>
> ^
>
> configure:12910: $? = 1
>
> configure: failed program was:
>
> | /* confdefs.h */
>
What version of gcc is being used in this environment?

regards,
Ranier Vilela


Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-08-28 Thread Robert Haas
On Fri, Aug 28, 2020 at 5:55 AM Ashutosh Sharma  wrote:
> We can certainly do this way, but I would still prefer having a comparator 
> function (tidcmp) here for the reasons that it makes the code look a bit 
> cleaner, it also makes us more consistent with the way the comparator 
> function argument is being passed to qsort at several other places in 
> postgres which kinda of increases the code readability and simplicity. For 
> e.g. there is a comparator function for gin that does the same thing as 
> tidcmp is doing here.

Me too. Casting one kind of function pointer to another kind of
function pointer assumes that the compiler is using the same
argument-passing conventions in both cases, which seems slightly
risky. It also means that if the signature for the function were to
diverge further from the signature that we need in the future, the
compiler might not warn us about it. Perhaps there is some case where
the performance gains would be sufficient to justify those risks,
but this is certainly not that case.
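
For reference, the dedicated comparator under discussion is just a thin
wrapper around the existing ItemPointerCompare helper (sketch):

static int
tidcmp(const void *a, const void *b)
{
    ItemPointer lhs = (ItemPointer) a;
    ItemPointer rhs = (ItemPointer) b;

    return ItemPointerCompare(lhs, rhs);
}

/* used as: qsort(tids, ntids, sizeof(ItemPointerData), tidcmp); */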

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Please help for error ( file is required for XML support )

2020-08-28 Thread Ranier Vilela
On Fri, Aug 28, 2020 at 11:05, Ranier Vilela wrote:

> On Fri, Aug 28, 2020 at 04:37, Sachin Khanna <sachin_kha...@infosys.com> wrote:
>
>>
>>
>> Now I can see that the libraries are identified in config.log. I also
>> found some errors in the configuration output as well, highlighted below.
>>
>> Please find the attached config.log file for reference.
>>
>> Please help and let me know if there is any issue in the configuration
>> script that I need to change?
>>
>>
>>
>> I am using PostgreSQL version postgresql-12.3.tar.gz
>>
>>
>>
>> configure:8291: checking for XML2_CONFIG
>>
>> configure:8293: result: /usr/lib64/libxml2.so
>>
>> configure:8416: checking for grep that handles long lines and -e
>>
>> configure:8474: result: /bin/grep
>>
>> configure:8479: checking for egrep
>>
>> configure:8541: result: /bin/grep -E
>>
>> configure:8588: checking for ld used by GCC
>>
>> configure:8651: result: /bin/ld
>>
>> configure:8658: checking if the linker (/bin/ld) is GNU ld
>>
>> GNU ld version 2.27-44.base.el7
>>
>>
>>
>> at-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2
>> -D_GNU_SOURCE  conftest.c >&5
>>
>> conftest.c: In function 'main':
>>
>> conftest.c:76:20: error: expected expression before ')' token
>>
>> if (sizeof ((_Bool)))
>>
>> ^
>>
>> configure:12910: $? = 1
>>
>> configure: failed program was:
>>
>> | /* confdefs.h */
>>
> What version of gcc is being used in this environment?
>
It seems gcc does not compile C99 without -std=c99:
configure:4421: checking for gcc option to accept ISO C99
conftest.c:130:3: note: use option -std=c99 or -std=gnu99 to compile your
code

Try adding -std=c99:
./configure -std=c99

regards,
Ranier Vilela


Re: recovering from "found xmin ... from before relfrozenxid ..."

2020-08-28 Thread Robert Haas
On Fri, Aug 28, 2020 at 4:07 AM Masahiko Sawada
 wrote:
> You've removed the description about executing VACUUM with
> DISABLE_PAGE_SKIPPING option on the target relation after using
> pg_surgery functions from the doc but I guess it’s better to recommend
> that in the doc for safety. Could you please tell me the reason for
> removing that?

Well, I think that was added because there wasn't code to clear the
visibility map bits, either at the page level or in the map, but we
added code for that, so now I don't really see why it's necessary or
even desirable.

Here are a few example scenarios:

1. My table is not corrupt. For no particular reason, I force-freeze
or force-kill a tuple which is neither dead nor all-visible.
Concurrent queries might return wrong answers, but the table is not
corrupt. It does not require VACUUM and would not benefit from it.
Actually, it doesn't need anything at all.

2. My table is not corrupt. For no particular reason, I force-freeze a
tuple which is dead. I believe it's possible that the index entries
for that tuple might be gone already, but VACUUM won't fix that.
REINDEX or a table rewrite would, though. It's also possible, if the
dead tuple was added by an aborted transaction which added columns to
the table, that the tuple might have been created using a tuple
descriptor that differs from the table's current tuple descriptor. If
so, I think scanning the table could produce a crash. VACUUM won't fix
this, either. I would need to delete or force-kill the offending
tuple.

3. I have one or more tuples in my table that are intact except that
they have garbage values for xmin, resulting in VACUUM failure or
possibly even SELECT failure if the CLOG entries are also missing. I
force-kill or force-freeze them. If by chance the affected tuples were
also omitted from one or more indexes, a REINDEX or table rewrite is
needed to fix them, but a VACUUM will not help. On the other hand, if
those tuples are present in the indexes, there's no remaining problem
and VACUUM is not needed for the purpose of restoring the integrity of
the table. If the problem has been ongoing for a while, VACUUM might
be needed to advance relfrozenxid, but that doesn't require
DISABLE_PAGE_SKIPPING.

4. I have some pages in my table that have incorrect visibility map
bits. In this case, I need VACUUM (DISABLE_PAGE_SKIPPING). However, I
don't need the functions we're talking about here at all unless I also
have tuples with corrupted visibility information. If I do happen to
have both tuples with corrupted visibility information and also pages
with incorrect visibility map bits, then I suppose I need both these
tools and also VACUUM (DISABLE_PAGE_SKIPPING). Probably, I'll want to
do the VACUUM second. But, if I happened to do the VACUUM first and
then use these functions afterward, the worst thing that could happen
is that I might end up with a some dead tuples that could've gotten
removed faster if I'd switched the order. And that's not a disaster.
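
For concreteness, that ordering would be something like this, using the
function names from the patch and an invented table (sketch only):

SELECT heap_force_kill('mytable'::regclass, ARRAY['(0,1)']::tid[]);
VACUUM (DISABLE_PAGE_SKIPPING) mytable;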

Basically, I can see no real reason to recommend VACUUM
(DISABLE_PAGE_SKIPPING) here. There are problems that can be fixed
with that command, and there are problems that can be fixed by this
method, but they are mostly independent of each other. We should not
recommend that people run VACUUM "just in case." That kind of fuzzy
thinking seems relatively prevalent already, and it leads to people
spending a lot of time running slow maintenance commands that do
nothing to help them, and which occasionally make things worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Stephen Frost
Greetings,

* Gilles Darold (gil...@darold.net) wrote:
> On 28/08/2020 at 15:26, Stephen Frost wrote:
> >* Magnus Hagander (mag...@hagander.net) wrote:
> >>On Fri, Aug 28, 2020 at 2:38 PM Stephen Frost  wrote:
> >>>* Magnus Hagander (mag...@hagander.net) wrote:
> Without having actually looked at the code, definite +1 for this feature.
> It's much requested...
> >>>Thanks.
> >>>
> But, should we also have a pg_write_all_data to go along with it?
> >>>Perhaps, but could certainly be a different patch, and it'd need to be
> >>>better defined, it seems to me...  read_all is pretty straight-forward
> >>>(the general goal being "make pg_dumpall/pg_dump work"), what would
> >>>write mean?  INSERT?  DELETE?  TRUNCATE?  ALTER TABLE?  System catalogs?
> >>Well, it's pg_write_all_*data*, so it certainly wouldn't be alter table or
> >>system catalogs.
> >>
> >>I'd say insert/update/delete yes.
> >>
> >>TRUNCATE is always an outlier. Given it's generally classified as DDL, I
> >>wouldn't include it.
> >Alright, that seems like it'd be pretty easy.  We already have a check
> >in pg_class_aclmask to disallow modification of system catalogs w/o
> >being a superuser, so we should be alright to add a similar check for
> >insert/update/delete just below where I added the SELECT check.
> >
> >>>Doesn't seem like you could just declare it to be 'allow pg_restore'
> >>>either, as that might include creating untrusted functions, et al.
> >>No definitely not. That wouldn't be the usecase at all :)
> >Good. :)
> >
> >>(and fwiw to me the main use case for read_all_data also isn't pg_dump,
> >>because most people using pg_dump are already db owner or higher in my
> >>experience. But it is nice that it helps with that too)
> >Glad to have confirmation that there's other use-cases this helps with.
> >
> >I'll post an updated patch with that added in a day or two.
> 
> Looking at this thread I was thinking about FDWs. Will the
> pg_read_all_data role allow executing pg_dump with the
> --include-foreign-data option? If so, what about privileges on the
> foreign-data wrapper and the foreign server? If this is the behavior we
> want, I guess the modification should also be applied to
> pg_foreign_data_wrapper_aclmask() and pg_foreign_server_aclmask().

Using an FDW will often also require having a user mapping and there's
no way for that to be accomplished through only GRANT'ing a default
role, so I don't think we should mix this specific role with the FDW
permissions system.
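
For context, a user mapping is a separate per-role object, e.g. (sketch
with invented names):

CREATE USER MAPPING FOR some_role SERVER some_server
    OPTIONS (user 'remote_user', password 'secret');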

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-28 Thread David Rowley
On Wed, 26 Aug 2020 at 03:52, Andres Freund  wrote:
>
> On 2020-08-25 20:48:37 +1200, David Rowley wrote:
> > However, given the correct planner choice, there will never be
> > a gross slowdown due to having the extra node.
>
> There'll be a significant reduction in increase in performance.

So I did a very rough-cut change to the patch to have the caching be
part of Nested Loop.  It can be applied on top of the other 3 v7
patches.

For the performance, the test I did results in the performance
actually being reduced from having the Result Cache as a separate
node.  The reason for this is mostly because Nested Loop projects.
Each time I fetch a MinimalTuple from the cache, the patch will deform
it in order to store it in the virtual inner tuple slot for the nested
loop. Having the Result Cache as a separate node can skip this step as
its result tuple slot is a TTSOpsMinimalTuple, so we can just store
the cached MinimalTuple right into the slot without any
deforming/copying.
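
In code terms, the difference is roughly this (sketch):

/* separate Result Cache node: its slot is TTSOpsMinimalTuple, so the
 * cached tuple can be stored without deforming */
ExecStoreMinimalTuple(cachedtuple, node->ss.ps.ps_ResultTupleSlot, false);

/* cache inside Nested Loop: copying into the virtual inner slot deforms */
ExecCopySlot(innerslot, cacheslot);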

Here's an example of a query that's now slower:

select count(*) from hundredk hk inner join lookup100 l on hk.one = l.a;

In this case, the original patch does not have to deform the
MinimalTuple from the cache as the count(*) does not require any Vars
from it.   With the rough patch that's attached, the MinimalTuple is
deformed during the transformation in ExecCopySlot(). The
slowdown exists no matter which column of the hundredk table I join to
(schema in [1]).

Performance comparison is as follows:

v7 (Result Cache as a separate node)
postgres=# explain (analyze, timing off) select count(*) from hundredk
hk inner join lookup l on hk.one = l.a;
 Execution Time: 652.582 ms

v7 + attached rough patch
postgres=# explain (analyze, timing off) select count(*) from hundredk
hk inner join lookup l on hk.one = l.a;
 Execution Time: 843.566 ms

I've not yet thought of any way to get rid of the needless
MinimalTuple deform.  I suppose the cache could just have already
deformed tuples, but that requires more memory and would result in
worse cache hit ratios for workloads where the cache gets filled.

I'm open to ideas to make the comparison fairer.

(Renamed the patch file to .txt to stop the CFbot getting upset with me)

David

[1] 
https://www.postgresql.org/message-id/caaphdvrpcqyqdwergywx8j+2dlungxu+fosbq1uscxrunyx...@mail.gmail.com
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 6d4b9eb3b9..42c6df549f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -108,8 +108,7 @@ static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_incremental_sort_info(IncrementalSortState *incrsortstate,
 									   ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
-static void show_resultcache_info(ResultCacheState *rcstate, List *ancestors,
-								  ExplainState *es);
+static void show_resultcache_info(NestLoopState *nlstate, List *ancestors,
+								  ExplainState *es);
 static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1494,10 +1493,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				 * For historical reasons, the join type is interpolated
 				 * into the node type name...
 				 */
-				if (((Join *) plan)->jointype != JOIN_INNER)
+				if (((Join *)plan)->jointype != JOIN_INNER)
 					appendStringInfo(es->str, " %s Join", jointype);
 				else if (!IsA(plan, NestLoop))
 					appendStringInfoString(es->str, " Join");
+				else if (castNode(NestLoop, plan)->paramcache)
+					appendStringInfoString(es->str, " Cached");
+
 			}
 			else
 				ExplainPropertyText("Join Type", jointype, es);
@@ -1883,6 +1885,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			}
 			break;
 		case T_NestLoop:
+			show_resultcache_info((NestLoopState *) planstate, ancestors, es);
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
 			if (((NestLoop *) plan)->join.joinqual)
@@ -1963,10 +1966,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
cas

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-28 Thread David Rowley
On Sat, 29 Aug 2020 at 02:54, David Rowley  wrote:
> I'm open to ideas to make the comparison fairer.

While on that, it's not just queries that don't require the cached
tuple to be deformed that are slower.  Here are a couple of examples
that do require both patches to deform the cached tuple, and the
results still show that v7 is faster:

Query1:

v7 + attached patch
postgres=# explain (analyze, timing off) select count(l.a) from
hundredk hk inner join lookup100 l on hk.one = l.a;
   QUERY PLAN

 Aggregate  (cost=378570.41..378570.42 rows=1 width=8) (actual rows=1 loops=1)
   ->  Nested Loop Cached  (cost=0.43..353601.00 rows=9987763 width=4)
(actual rows=1000 loops=1)
 Cache Key: $0
 Hits: 9  Misses: 1  Evictions: 0  Overflows: 0
 ->  Seq Scan on hundredk hk  (cost=0.00..1637.00 rows=10
width=4) (actual rows=10 loops=1)
 ->  Index Only Scan using lookup100_a_idx on lookup100 l
(cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
   Index Cond: (a = hk.one)
   Heap Fetches: 0
 Planning Time: 0.050 ms
 Execution Time: 928.698 ms
(10 rows)

v7 only:
postgres=# explain (analyze, timing off) select count(l.a) from
hundredk hk inner join lookup100 l on hk.one = l.a;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=152861.19..152861.20 rows=1 width=8) (actual rows=1 loops=1)
   ->  Nested Loop  (cost=0.45..127891.79 rows=9987763 width=4)
(actual rows=1000 loops=1)
 ->  Seq Scan on hundredk hk  (cost=0.00..1637.00 rows=10
width=4) (actual rows=10 loops=1)
 ->  Result Cache  (cost=0.45..2.53 rows=100 width=4) (actual
rows=100 loops=10)
   Cache Key: hk.one
   Hits: 9  Misses: 1  Evictions: 0  Overflows: 0
   ->  Index Only Scan using lookup100_a_idx on lookup100
l  (cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
 Index Cond: (a = hk.one)
 Heap Fetches: 0
 Planning Time: 0.604 ms
 Execution Time: 897.958 ms
(11 rows)


Query2:

v7 + attached patch
postgres=# explain (analyze, timing off) select * from hundredk hk
inner join lookup100 l on hk.one = l.a;
QUERY PLAN
--
 Nested Loop Cached  (cost=0.43..353601.00 rows=9987763 width=28)
(actual rows=1000 loops=1)
   Cache Key: $0
   Hits: 9  Misses: 1  Evictions: 0  Overflows: 0
   ->  Seq Scan on hundredk hk  (cost=0.00..1637.00 rows=10
width=24) (actual rows=10 loops=1)
   ->  Index Only Scan using lookup100_a_idx on lookup100 l
(cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
 Index Cond: (a = hk.one)
 Heap Fetches: 0
 Planning Time: 0.621 ms
 Execution Time: 883.610 ms
(9 rows)

v7 only:
postgres=# explain (analyze, timing off) select * from hundredk hk
inner join lookup100 l on hk.one = l.a;
  QUERY PLAN

 Nested Loop  (cost=0.45..127891.79 rows=9987763 width=28) (actual
rows=1000 loops=1)
   ->  Seq Scan on hundredk hk  (cost=0.00..1637.00 rows=10
width=24) (actual rows=10 loops=1)
   ->  Result Cache  (cost=0.45..2.53 rows=100 width=4) (actual
rows=100 loops=10)
 Cache Key: hk.one
 Hits: 9  Misses: 1  Evictions: 0  Overflows: 0
 ->  Index Only Scan using lookup100_a_idx on lookup100 l
(cost=0.43..2.52 rows=100 width=4) (actual rows=100 loops=1)
   Index Cond: (a = hk.one)
   Heap Fetches: 0
 Planning Time: 0.088 ms
 Execution Time: 870.601 ms
(10 rows)

David




Re: factorial function/phase out postfix operators?

2020-08-28 Thread Robert Haas
On Fri, Aug 28, 2020 at 4:44 AM John Naylor  wrote:
> On Thu, Aug 27, 2020 at 5:04 PM Tom Lane  wrote:
> > I'm a bit inclined to kill them both off and standardize on factorial()
> > (not numeric_fac).
>
> +1

Here's a modified version of John's patch that also describes ! and !!
as deprecated. It looked too wordy to me to recommend what should be
used instead, so I have not done that.
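
For reference, the forms in question:

SELECT 5 !;            -- postfix operator, deprecated
SELECT !! 5;           -- prefix operator, deprecated
SELECT factorial(5);   -- the function form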

Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


postfix-and-factorial-deprecate.patch
Description: Binary data


Re: factorial function/phase out postfix operators?

2020-08-28 Thread Robert Haas
On Fri, Aug 28, 2020 at 11:00 AM Robert Haas  wrote:
> On Fri, Aug 28, 2020 at 4:44 AM John Naylor  
> wrote:
> > On Thu, Aug 27, 2020 at 5:04 PM Tom Lane  wrote:
> > > I'm a bit inclined to kill them both off and standardize on factorial()
> > > (not numeric_fac).
> >
> > +1
>
> Here's a modified version of John's patch that also describes ! and !!
> as deprecated. It looked too wordy to me to recommend what should be
> used instead, so I have not done that.
>
> Comments?

Never mind, I see there's a new thread for this. Sorry for the noise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: new heapcheck contrib module

2020-08-28 Thread Mark Dilger



> On Aug 27, 2020, at 10:07 PM, Andrey M. Borodin  wrote:
> 
> 
> 
>> On Aug 25, 2020, at 19:36, Mark Dilger  wrote:
> 
> Hi Mark!
> 
> Thanks for working on this important feature.
> 
> I was experimenting a bit with our internal heapcheck and found out that it's 
> not helping with truncated CLOG anyhow.
> Will your module be able to gather tid's of similar corruptions?
> 
> server/db M # select * from heap_check('pg_toast.pg_toast_4848601');
> ERROR:  58P01: could not access status of transaction 636558742
> DETAIL:  Could not open file "pg_xact/025F": No such file or directory.
> LOCATION:  SlruReportIOError, slru.c:913
> Time: 3439.915 ms (00:03.440)

The design principle for verify_heapam.c is: if the rest of the system is not 
corrupt, corruption in the table being checked should not cause a crash during 
the table check. This is a very limited principle.  Even corruption in the 
associated toast table or toast index could cause a crash.  That is why 
checking against the toast table is optional, and false by default.

Perhaps a more extensive effort could be made later.  I think it is out of 
scope for this release cycle.  It is a very interesting area for further 
research, though.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Deprecating postfix and factorial operators in PostgreSQL 13

2020-08-28 Thread Robert Haas
On Thu, Aug 27, 2020 at 1:07 PM Mark Dilger
 wrote:
> Yes, that is better.  Attached.

So, in this version, there are six copies of the deprecation notice
John wrote, rather than just one. Maybe we need more than one, but I
doubt we need six. I don't think the CREATE OPERATOR documentation
needs to mention this both when first introducing the concept and then
again when defining right_type; the former seems sufficient. I don't
think xoper.sgml needs these changes either; they interrupt the flow
of the narrative and I don't think this is where anyone would look for
a deprecation notice. I would also argue for dropping the mentions in
the docs for ALTER OPERATOR FAMILY and CREATE OPERATOR CLASS, although
those seem less clear-cut. Really, CREATE OPERATOR where John had it
originally seems like the right place.

That being said, the changes to typeconv.sgml and drop_operator.sgml
seem like improvements, so I'd keep those.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Gilles Darold

On 28/08/2020 at 16:52, Stephen Frost wrote:

Greetings,

* Gilles Darold (gil...@darold.net) wrote:

On 28/08/2020 at 15:26, Stephen Frost wrote:

* Magnus Hagander (mag...@hagander.net) wrote:

On Fri, Aug 28, 2020 at 2:38 PM Stephen Frost  wrote:

* Magnus Hagander (mag...@hagander.net) wrote:

Without having actually looked at the code, definite +1 for this feature.
It's much requested...

Thanks.


But, should we also have a pg_write_all_data to go along with it?

Perhaps, but could certainly be a different patch, and it'd need to be
better defined, it seems to me...  read_all is pretty straight-forward
(the general goal being "make pg_dumpall/pg_dump work"), what would
write mean?  INSERT?  DELETE?  TRUNCATE?  ALTER TABLE?  System catalogs?

Well, it's pg_write_all_*data*, so it certainly wouldn't be alter table or
system catalogs.

I'd say insert/update/delete yes.

TRUNCATE is always an outlier. Given it's generally classified as DDL, I
wouldn't include it.

Alright, that seems like it'd be pretty easy.  We already have a check
in pg_class_aclmask to disallow modification of system catalogs w/o
being a superuser, so we should be alright to add a similar check for
insert/update/delete just below where I added the SELECT check.


Doesn't seem like you could just declare it to be 'allow pg_restore'
either, as that might include creating untrusted functions, et al.

No definitely not. That wouldn't be the usecase at all :)

Good. :)


(and fwiw to me the main use case for read_all_data also isn't pg_dump,
because most people using pg_dump are already db owner or higher in my
experience. But it is nice that it helps with that too)

Glad to have confirmation that there's other use-cases this helps with.

I'll post an updated patch with that added in a day or two.

Looking at this thread I was thinking about FDWs. Will the
pg_read_all_data role allow executing pg_dump with the
--include-foreign-data option? If so, what about privileges on the
foreign-data wrapper and the foreign server? If this is the behavior we
want, I guess the modification should also be applied to
pg_foreign_data_wrapper_aclmask() and pg_foreign_server_aclmask().

Using an FDW will often also require having a user mapping and there's
no way for that to be accomplished through only GRANT'ing a default
role, so I don't think we should mix this specific role with the FDW
permissions system.



I'm fine with that; perhaps it should be mentioned in the documentation 
that foreign tables are not covered by this role.


--
Gilles Darold





Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-08-28 Thread Robert Haas
On Wed, Aug 19, 2020 at 6:58 PM Alvaro Herrera  wrote:
> On 2020-Aug-19, David Rowley wrote:
> > Andres' suggestion:
> > regression=# explain (analyze, costs off, timing off, summary off)
> > select count(*) from tenk1 t1 inner join tenk1 t2 on
> > t1.twenty=t2.unique1;
> >   QUERY PLAN
> > ---
> >  Aggregate (actual rows=1 loops=1)
> >->  Nested Loop (actual rows=1 loops=1)
> >   Cache Key: t1.twenty  Hits: 9980  Misses: 20  Evictions: 0 
> > Overflows: 0
> > ->  Seq Scan on tenk1 t1 (actual rows=1 loops=1)
> > ->  Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 
> > loops=20)
> >   Index Cond: (unique1 = t1.twenty)
> > (6 rows)
>
> I think it doesn't look terrible in the SubPlan case -- it kinda makes
> sense there -- but for nested loop it appears really strange.

I disagree. I don't know why anyone should find this confusing, except
that we're not used to seeing it. It seems to make a lot of sense that
if you are executing the same plan tree with different parameters, you
might want to cache results to avoid recomputation. So why wouldn't
nodes that do this include a cache?

This is not necessarily a vote for Andres's proposal. I don't know
whether it's technically better to include the caching in the Nested
Loop node or to make it a separate node, and I think we should do the
one that's better. Getting pushed into an inferior design because we
think the EXPLAIN output will be clearer does not make sense to me.

I think David's points elsewhere on the thread about ProjectSet and
Materialize nodes are interesting. It was never very clear to me why
ProjectSet was handled separately in every node, adding quite a bit of
complexity, and why Materialize was a separate node. Likewise, why are
Hash Join and Hash two separate nodes instead of just one? Why do we
not treat projection as a separate node type even when we're not
projecting a set? In general, I've never really understood why we
choose to include some functionality in other nodes and keep other
things separate. Is there even an organizing principle, or is it just
historical baggage?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




More aggressive vacuuming of temporary tables

2020-08-28 Thread Tom Lane
It strikes me that when we are vacuuming a temporary table (which
necessarily will be one of our own session), we don't really need
to care what the global xmin horizon is.  If we're not inside a
user transaction block, then there are no tuples in the table that
could be in-doubt anymore.  Neither are there any snapshots in our
session that could see any dead tuples.  Nor do we give a fig what
other sessions might think of those tuples.  So we could just set
the xmin cutoff as aggressively as possible, which is to say
equal to the nextXid counter.  While vacuuming a temp table is
perhaps not something people do very often, I think when they do
do it they would like us to clean out all the dead tuples not just
some.
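
In sketch form, the heart of the change (assuming the isTopLevel plumbing
described below; exact function names may differ):

    /* In vacuum_set_xid_limits(): a session-local temp table outside a
     * transaction block has no tuples visible to anyone else, so the
     * horizon can simply be the next XID. */
    if (rel != NULL && RELATION_IS_LOCAL(rel) &&
        !IsInTransactionBlock(isTopLevel))
        *oldestXmin = ReadNewTransactionId();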

Hence I propose 0001 attached.  80% of it is just API additions to allow
passing down the isTopLevel flag so that we can do the right thing in
the CLUSTER case; the important change is in vacuum_set_xid_limits.
(For ease of review, I didn't reindent the existing code in
vacuum_set_xid_limits, but that would be something to do before commit.)

The reason I got interested in this is that yesterday while fooling
with bug #16595, I was annoyed about our poor code test coverage in
access/gin/.  The 0002 patch attached brings coverage for ginvacuum.c
up from 59% to 93%; but as things stand, a long delay has to be inserted
between the DELETE and VACUUM steps, else the VACUUM won't remove the
dead tuples because of concurrent transactions, and we get no coverage
improvement.  Since the table in question is a temp table, that seems
pretty silly.

Thoughts?  Am I missing something important here?

regards, tom lane

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a0da444af0..beafb60ac2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -470,6 +470,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 		  params->freeze_table_age,
 		  params->multixact_freeze_min_age,
 		  params->multixact_freeze_table_age,
+		  true, /* we must be a top-level command */
 		  &OldestXmin, &FreezeLimit, &xidFullScanLimit,
 		  &MultiXactCutoff, &mxactFullScanLimit);
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 04d12a7ece..0d647e912c 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,10 +67,13 @@ typedef struct
 } RelToCluster;
 
 
-static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose);
+static void rebuild_relation(Relation OldHeap, Oid indexOid,
+			 bool isTopLevel, bool verbose);
 static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
-			bool verbose, bool *pSwapToastByContent,
-			TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+			bool isTopLevel, bool verbose,
+			bool *pSwapToastByContent,
+			TransactionId *pFreezeXid,
+			MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 
 
@@ -170,7 +173,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		table_close(rel, NoLock);
 
 		/* Do the job. */
-		cluster_rel(tableOid, indexOid, stmt->options);
+		cluster_rel(tableOid, indexOid, stmt->options, isTopLevel);
 	}
 	else
 	{
@@ -219,7 +222,8 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 			PushActiveSnapshot(GetTransactionSnapshot());
 			/* Do the job. */
 			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-		stmt->options | CLUOPT_RECHECK);
+		stmt->options | CLUOPT_RECHECK,
+		isTopLevel);
 			PopActiveSnapshot();
 			CommitTransactionCommand();
 		}
@@ -250,7 +254,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
  * and error messages should refer to the operation as VACUUM not CLUSTER.
  */
 void
-cluster_rel(Oid tableOid, Oid indexOid, int options)
+cluster_rel(Oid tableOid, Oid indexOid, int options, bool isTopLevel)
 {
 	Relation	OldHeap;
 	bool		verbose = ((options & CLUOPT_VERBOSE) != 0);
@@ -400,7 +404,7 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 	TransferPredicateLocksToHeapRelation(OldHeap);
 
 	/* rebuild_relation does all the dirty work */
-	rebuild_relation(OldHeap, indexOid, verbose);
+	rebuild_relation(OldHeap, indexOid, isTopLevel, verbose);
 
 	/* NB: rebuild_relation does table_close() on OldHeap */
 
@@ -545,11 +549,12 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
  *
  * OldHeap: table to rebuild --- must be opened and exclusive-locked!
  * indexOid: index to cluster by, or InvalidOid to rewrite in physical order.
+ * isTopLevel: should be passed down from ProcessUtility.
  *
  * NB: this routine closes OldHeap at the right time; caller should not.
  */
 static void
-rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
+rebuild_relation(Relation OldHeap, Oid indexOid, bool isTopLevel, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
 	Oid			tableSpace = OldHeap->rd_rel->reltablespace;
@

Re: Deprecating postfix and factorial operators in PostgreSQL 13

2020-08-28 Thread Tom Lane
Robert Haas  writes:
> So, in this version, there are six copies of the deprecation notice
> John wrote, rather than just one. Maybe we need more than one, but I
> doubt we need six. I don't think the CREATE OPERATOR documentation
> needs to mention this both when first introducing the concept and then
> again when defining right_type; the former seems sufficient. I don't
> think xoper.sgml needs these changes either; they interrupt the flow
> of the narrative and I don't think this is where anyone would look for
> a deprecation notice. I would also argue for dropping the mentions in
> the docs for ALTER OPERATOR FAMILY and CREATE OPERATOR CLASS, although
> those seem less clear-cut. Really, CREATE OPERATOR where John had it
> originally seems like the right place.

Yeah, I agree that there are way too many copies here.  CREATE OPERATOR
seems sufficient.  It also seems like we should just rewrite the typeconv
and drop_operator examples to use some other operator.  We'll have
to do that eventually anyway, so why not now, instead of visiting those
places twice?

regards, tom lane




Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Georgios Kokolatos
Thank you for the patch.

My high level review comment:
The patch seems to be implementing a useful and requested feature.
The patch applies cleanly and passes the basic regress tests. Also the 
commitfest bot is happy.

A first pass at the code has not revealed any worthwhile comments.
Please allow me a second and more thorough pass. The commitfest has hardly
started after all.

Also allow me a series of genuine questions: 

What would the behaviour be with REVOKE?
In a sequence similar to:
GRANT ALL ON ...
REVOKE pg_read_all_data FROM ...
What privileges would the user be left with? Would it be possible to end up in 
the same privilege only with a GRANT command?
Does the above scenario even make sense?

Regards,

Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING

2020-08-28 Thread Robert Haas
On Tue, Jul 21, 2020 at 9:21 AM Dilip Kumar  wrote:
> In the previous version, the feature was enabled for cluster/vacuum
> full command as well.   in the attached patch I have enabled it only
> if we are running vacuum command.  It will not be enabled during a
> table rewrite.  If we think that it should be enabled for the 'vacuum
> full' then we might need to pass a flag from the cluster_rel, all the
> way down to the heap_freeze_tuple.

I think this is a somewhat clunky way of accomplishing this. The
caller passes down a flag to heap_prepare_freeze_tuple() which decides
whether or not an error is forced, and then that function and
FreezeMultiXactId use vacuum_damage_elevel() to combine the results of
that flag with the value of the vacuum_tolerate_damage GUC. But that
means that a decision that could be made in one place is instead made
in many places. If we just had heap_freeze_tuple() and
FreezeMultiXactId() take an argument int vacuum_damage_elevel, then
heap_freeze_tuple() could pass ERROR and lazy_scan_heap() could
arrange to pass WARNING or ERROR based on the value of
vacuum_tolerate_damage. I think that would likely end up cleaner. What
do you think?

I also suggest renaming is_corrupted_xid to found_corruption. With the
current name, it's not very clear which XID we're saying is corrupted;
in fact, the problem might be a MultiXactId rather than an XID, and
the real issue might be with the table's pg_class entry or something.

The new arguments to heap_prepare_freeze_tuple() need to be documented
in its header comment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING

2020-08-28 Thread Robert Haas
On Mon, Jul 20, 2020 at 4:30 PM Andres Freund  wrote:
> If we really were to do something like this the option would need to be
> called vacuum_allow_making_corruption_worse or such. Its need to be
> *exceedingly* clear that it will likely lead to making everything much
> worse.

I don't really understand this objection. How does letting VACUUM
continue after problems have been detected make anything worse? I
agree that if it does, it shouldn't touch relfrozenxid or relminmxid,
but the patch has been adjusted to work that way. Assuming you don't
touch relfrozenxid or relminmxid, what harm befalls if you continue
freezing undamaged tuples and continue removing dead tuples after
finding a bad tuple? You may have already done an arbitrary amount of
that before encountering the damage, and doing it afterward is no
different. Doing the index vacuuming step is different, but I don't
see how that would exacerbate corruption either.

The point is that when you make VACUUM fail, you not only don't
advance relfrozenxid/relminmxid, but also don't remove dead tuples. In
the long run, either thing will kill you, but it is not difficult to
have a situation where failing to remove dead tuples kills you a lot
faster. The table can just bloat until performance tanks, and then the
application goes down, even if you still had 100+ million XIDs before
you needed a wraparound vacuum.

Honestly, I wonder why continuing (but without advancing relfrozenxid
or relminmxid) shouldn't be the default behavior. I mean, if it
actually corrupts your data, then it clearly shouldn't be, and
probably shouldn't even be an optional behavior, but I still don't see
why it would do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Deprecating postfix and factorial operators in PostgreSQL 13

2020-08-28 Thread Robert Haas
On Fri, Aug 28, 2020 at 11:56 AM Tom Lane  wrote:
> Yeah, I agree that there are way too many copies here.  CREATE OPERATOR
> seems sufficient.  It also seems like we should just rewrite the typeconv
> and drop_operator examples to use some other operator.  We'll have
> to do that eventually anyway, so why not now, instead of visiting those
> places twice?

Hmm, that's an idea. I think it would be reasonable to rewrite the
typeconv.sgml one, but the one in drop_operator.sgml seems like it
could just be dropped. Its only purpose seems to be to demonstrate how
to drop a right-unary operator vs. a left-unary operator, but I
venture to guess that anyone smart enough to make any sort of
effective use of DROP OPERATOR could probably draw the necessary
inferences anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

2020-08-28 Thread Peter Geoghegan
On Thu, Aug 27, 2020 at 11:04 PM Noah Misch  wrote:
> On Thu, Aug 27, 2020 at 11:11:47PM -0400, Tom Lane wrote:
> > It's surely not hard to visualize cases where necessary code could
> > be optimized away if the compiler thinks it's entitled to assume
> > such things.
>
> Good point.

I wonder if we should start using -fno-delete-null-pointer-checks:

https://lkml.org/lkml/2018/4/4/601

This may not be strictly relevant to the discussion, but I was
reminded of it just now and thought I'd mention it.
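
The classic pattern the flag guards against, as a generic sketch (not
PostgreSQL code):

static int
read_guarded(int *p)
{
    int x = *p;         /* the dereference lets the compiler assume p != NULL... */

    if (p == NULL)      /* ...so it may legally delete this check */
        return 0;
    return x;
}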

-- 
Peter Geoghegan




Re: New default role- 'pg_read_all_data'

2020-08-28 Thread Stephen Frost
Greetings,

* Gilles Darold (gil...@darold.net) wrote:
> On 28/08/2020 at 16:52, Stephen Frost wrote:
> >Using an FDW will often also require having a user mapping and there's
> >no way for that to be accomplished through only GRANT'ing a default
> >role, so I don't think we should mix this specific role with the FDW
> >permissions system.
> 
> I'm fine with that, perhaps it should be mentioned in the documentation that
> foreign tables are not covered by this role.

We could say it doesn't GRANT CONNECT rights on databases, or EXECUTE on
functions too, but that doesn't seem like a terribly good approach for
the documentation to take- instead we document specifically what IS
included, which seems sufficiently clear to me.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING

2020-08-28 Thread Andres Freund
Hi,

On 2020-08-28 12:37:17 -0400, Robert Haas wrote:
> On Mon, Jul 20, 2020 at 4:30 PM Andres Freund  wrote:
> > If we really were to do something like this the option would need to be
> > called vacuum_allow_making_corruption_worse or such. Its need to be
> > *exceedingly* clear that it will likely lead to making everything much
> > worse.
> 
> I don't really understand this objection. How does letting VACUUM
> continue after problems have been detected make anything worse?

It can break HOT chains, plain ctid chains, etc., for example. And if
the earlier / follower tuples are removed, that can't be detected
anymore at a later time.


> The point is that when you make VACUUM fail, you not only don't
> advance relfrozenxid/relminmxid, but also don't remove dead tuples. In
> the long run, either thing will kill you, but it is not difficult to
> have a situation where failing to remove dead tuples kills you a lot
> faster. The table can just bloat until performance tanks, and then the
> application goes down, even if you still had 100+ million XIDs before
> you needed a wraparound vacuum.
> 
> Honestly, I wonder why continuing (but without advancing relfrozenxid
> or relminmxid) shouldn't be the default behavior. I mean, if it
> actually corrupts your data, then it clearly shouldn't be, and
> probably shouldn't even be an optional behavior, but I still don't see
> why it would do that.

I think it's an EXTREMELY bad idea to enable anything like this by
default. It'll make bugs entirely undiagnosable, because we'll remove a
lot of the evidence of what the problem is. And we've had many long
standing bugs in this area, several only found because we actually
started to bleat about them. And quite evidently, we have more bugs to
fix in the area.

Greetings,

Andres Freund




Re: poc - possibility to write window function in PL languages

2020-08-28 Thread Pavel Stehule
On Fri, Aug 28, 2020 at 8:14, Pavel Stehule wrote:

>
>
> On Wed, Aug 26, 2020 at 17:06, Pavel Stehule wrote:
>
>> Hi
>>
>> I simplified access to the results of winfuncargs functions with a proxy
>> type, "typedvalue". This type can hold any Datum value, and allows fast
>> casts to basic built-in types, or it can use (slower) generic cast
>> functions. It is used in cooperation with a plpgsql assign statement that
>> can choose the correct cast implicitly. When the winfuncarg function
>> returns a value of the same type that is expected by the variable on the
>> left side of the assign statement, then (for basic types) the value is
>> just copied without casts. With this proxy type it is not necessary to
>> have a special statement for assigning the value returned from
>> winfuncargs functions, so the source code of a window function in plpgsql
>> looks intuitive to me.
>>
>> Example - implementation of "lag" function in plpgsql
>>
>> create or replace function pl_lag(numeric)
>> returns numeric as $$
>> declare v numeric;
>> begin
>>   v := get_input_value_in_partition(windowobject, 1, -1, 'seek_current',
>> false);
>>   return v;
>> end;
>> $$ language plpgsql window;
>>
>> I think this code is usable, and I assign this patch to commitfest.
>>
>> Regards
>>
>> Pavel
>>
>
> fix regress tests and some doc
>

Update: refactored the implementation of the typedvalue type.
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 815912666d..5b4d5bbac4 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -4568,6 +4568,99 @@ CREATE EVENT TRIGGER snitch ON ddl_command_start EXECUTE FUNCTION snitch();
 
   
 
+ 
+  Window Functions
+
+  
+   window
+   in PL/pgSQL
+  
+
+  
+   PL/pgSQL can be used to define window
+   functions. A window function is created with the CREATE FUNCTION
+   command with the clause WINDOW. A specific feature of
+   these functions is access to two special stores: one with the
+   sorted values of the window function arguments, and one with a
+   single stored value of any type for the currently processed
+   partition (of the window function).
+  
+
+  
+   Both stores are accessed through the special internal variable
+   WINDOWOBJECT. This variable is declared implicitly
+   and is available only in window functions.
+
+
+CREATE OR REPLACE FUNCTION plpgsql_rownum() RETURNS int8
+LANGUAGE plpgsql WINDOW
+AS $$
+DECLARE pos int8;
+BEGIN
+pos := get_current_position(WINDOWOBJECT);
+pos := pos + 1;
+PERFORM set_mark_position(WINDOWOBJECT, pos);
+RETURN pos;
+END;
+$$;
+
+SELECT plpgsql_rownum() OVER (), * FROM tab;
+
+  
+
+  
+   The arguments of a window function cannot be accessed directly; special
+   functions must be used instead. With these functions we can choose the
+   scope of buffered arguments and a desired position relative to the first,
+   current, or last row. An implementation of lag can look
+   like this (window functions in plpgsql can use polymorphic types too):
+
+
+CREATE OR REPLACE FUNCTION plpgsql_lag(anyelement) RETURNS anyelement
+LANGUAGE plpgsql WINDOW
+AS $$
+BEGIN
+RETURN
+  get_input_value_in_partition(WINDOWOBJECT,
+   1, -1,
+   'seek_current',
+   false);
+END;
+$$;
+
+SELECT v, plpgsql_lag(v) FROM generate_series(1, 10) g(v);
+
+
+  
+
+  
+   The second store that a window function can use holds one value
+   assigned to the partition. The content of this store can be read with the
+   function get_partition_context_value and modified with the
+   function set_partition_context_value. The next function
+   replaces missing values with the previous non-NULL value:
+
+
+CREATE OR REPLACE FUNCTION plpgsql_replace_missing(numeric) RETURNS numeric
+LANGUAGE plpgsql WINDOW
+AS $$
+DECLARE
+v numeric;
+BEGIN
+v := get_input_value_for_row(WINDOWOBJECT, 1);
+IF v IS NULL THEN
+v := get_partition_context_value(WINDOWOBJECT, NULL::numeric);
+ELSE
+PERFORM set_partition_context_value(WINDOWOBJECT, v);
+END IF;
+RETURN v;
+END;
+$$;
+
+
+  
+
+  
+
   
PL/pgSQL under the Hood
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a2d61302f9..f926f2e386 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1421,6 +1421,18 @@ LANGUAGE internal
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'unicode_is_normalized';
 
+CREATE OR REPLACE FUNCTION
+  get_partition_context_value(windowobjectproxy, anyelement, int4 DEFAULT NULL)
+  RETURNS anyelement
+LANGUAGE internal
+AS 'windowobject_get_partition_context_value';
+
+CREATE OR REPLACE FUNCTION
+  set_partition_context_value(windowobjectproxy, anyelement, int4 DEFAULT NULL)
+  RETURNS void
+LANGUAGE internal
+AS 'windowobject_set_partition_context_value';
+
 --
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
dif

Re: MultiXact\SLRU buffers configuration

2020-08-28 Thread Anastasia Lubennikova

On 08.07.2020 10:03, Andrey M. Borodin wrote:



On 2 July 2020, at 17:02, Daniel Gustafsson wrote:


On 15 May 2020, at 02:03, Kyotaro Horiguchi  wrote:
Generally in such cases, condition variables would work.  In the
attached PoC, the reader side gets no penalty in the "likely" code
path.  The writer side always calls ConditionVariableBroadcast but the
waiter list is empty in almost all cases.  But I couldn't cause the
situation where the sleep 1000u is reached.
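
For illustration, a minimal sketch of that pattern using PostgreSQL's
ConditionVariable API (the shared struct, flag, and wait event below are
assumptions made for the sketch, not the PoC's actual code):

#include "postgres.h"
#include "pgstat.h"
#include "storage/condition_variable.h"

/* hypothetical shared state guarded by the condition variable */
typedef struct SlruFlushShared
{
	ConditionVariable cv;
	volatile bool flush_in_progress;
} SlruFlushShared;

extern SlruFlushShared *flushShared;

/* Reader: the likely path tests the flag and never touches the CV. */
static void
WaitForSlruFlush(void)
{
	if (!flushShared->flush_in_progress)
		return;					/* fast path: no penalty */

	ConditionVariablePrepareToSleep(&flushShared->cv);
	while (flushShared->flush_in_progress)
		ConditionVariableSleep(&flushShared->cv, PG_WAIT_EXTENSION);
	ConditionVariableCancelSleep();
}

/* Writer: broadcasting costs little when the waiter list is empty. */
static void
FinishSlruFlush(void)
{
	flushShared->flush_in_progress = false;
	ConditionVariableBroadcast(&flushShared->cv);
}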

The submitted patch no longer applies, can you please submit an updated
version?  I'm marking the patch Waiting on Author in the meantime.

Thanks, Daniel! PFA V2

Best regards, Andrey Borodin.


1) The first patch is sensible and harmless, so I think it is ready for 
committer. I haven't tested the performance impact, though.


2) I like the initial proposal to make various SLRU buffers
configurable; however, I doubt whether it really solves the problem or
just moves it to another place.


The previous patch you sent was based on some version that contained
changes for other SLRU buffer numbers: 'multixact_offsets_slru_buffers'
and 'multixact_members_slru_buffers'. Have you just forgotten to attach
them? The patch message "[PATCH v2 2/4]" hints that you had 4 patches.
Meanwhile, I attach the rebased patch to calm down the CFbot. The 
changes are trivial.


2.1) I think that both min and max values for this parameter are too 
extreme. Have you tested them?


+   &multixact_local_cache_entries,
+   256, 2, INT_MAX / 2,

2.2) MAX_CACHE_ENTRIES is not used anymore, so it can be deleted.

3) No changes for third patch. I just renamed it for consistency.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 264b475fbf6ee519dc3b4eb2cca25145d2aaa569 Mon Sep 17 00:00:00 2001
From: Andrey Borodin 
Date: Sat, 9 May 2020 21:19:18 +0500
Subject: [PATCH v2 1/4] Use shared lock in GetMultiXactIdMembers for offsets
 and members

Previously, reading a multixact required exclusive control locks on the
offsets and members SLRUs. This could lead to contention.
In this commit we take advantage of SimpleLruReadPage_ReadOnly,
similar to CLOG usage.
---
 src/backend/access/transam/clog.c  |  2 +-
 src/backend/access/transam/commit_ts.c |  2 +-
 src/backend/access/transam/multixact.c | 12 ++--
 src/backend/access/transam/slru.c  |  8 +---
 src/backend/access/transam/subtrans.c  |  2 +-
 src/backend/commands/async.c   |  2 +-
 src/backend/storage/lmgr/predicate.c   |  2 +-
 src/include/access/slru.h  |  2 +-
 8 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f3da40ae01..3614d0c774 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -643,7 +643,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 
 	/* lock is acquired by SimpleLruReadPage_ReadOnly */
 
-	slotno = SimpleLruReadPage_ReadOnly(XactCtl, pageno, xid);
+	slotno = SimpleLruReadPage_ReadOnly(XactCtl, pageno, xid, false);
 	byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
 
 	status = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 9cdb136435..5fbbf446e3 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -342,7 +342,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	}
 
 	/* lock is acquired by SimpleLruReadPage_ReadOnly */
-	slotno = SimpleLruReadPage_ReadOnly(CommitTsCtl, pageno, xid);
+	slotno = SimpleLruReadPage_ReadOnly(CommitTsCtl, pageno, xid, false);
 	memcpy(&entry,
 		   CommitTsCtl->shared->page_buffer[slotno] +
 		   SizeOfCommitTimestampEntry * entryno,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 79ec21c75a..241d9dd78d 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1321,12 +1321,12 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	 * time on every multixact creation.
 	 */
 retry:
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
-	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
+	/* lock is acquired by SimpleLruReadPage_ReadOnly */
+	slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi, false);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
 	offptr += entryno;
 	offset = *offptr;
@@ -1358,7 +1358,7 @@ retry:
 		entryno = MultiXactIdToOffsetEntry(tmpMXact);
 
 		if (pageno != prev_pageno)
-			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+			slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, tmpMXact, true);
 
 		offptr = (MultiXactOffset *) Mu

Re: new heapcheck contrib module

2020-08-28 Thread Andrey M. Borodin



> On 28 Aug 2020, at 18:58, Robert Haas wrote:
> In the case
> you mention, I think we should view that as a problem with clog rather
> than a problem with the table, and thus out of scope.

I don't think so. ISTM It's the same problem of xmax just hidden behind
detoasing. Our regular heap_check was checking xmin\xmax invariants for
tables, but failed to recognise the problem in toast (while toast was
accessible until CLOG truncation).

Re: new heapcheck contrib module

2020-08-28 Thread Mark Dilger



> On Aug 28, 2020, at 11:10 AM, Andrey M. Borodin  wrote:
> 
> 
> 
>> On 28 Aug 2020, at 18:58, Robert Haas wrote:
>> In the case
>> you mention, I think we should view that as a problem with clog rather
>> than a problem with the table, and thus out of scope.
> 
> I don't think so. ISTM It's the same problem of xmax just hidden behind detoasing.
> Our regular heap_check was checking xmin\xmax invariants for tables, but 
> failed to recognise the problem in toast (while toast was accessible until 
> CLOG truncation).
> 
> Best regards, Andrey Borodin.

If you lock the relations involved, check the toast table first, the toast 
index second, and the main table third, do you still get the problem?  Look at 
how pg_amcheck handles this and let me know if you still see a problem.  There 
is the ever present problem that external forces, like a rogue process deleting 
backend files, will strike at precisely the wrong moment, but barring that kind 
of concurrent corruption, I think the toast table being checked prior to the 
main table being checked solves some of the issues you are worried about.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

2020-08-28 Thread Ranier Vilela
On Fri, Aug 28, 2020 at 3:04 AM Noah Misch wrote:

> On Thu, Aug 27, 2020 at 11:11:47PM -0400, Tom Lane wrote:
> > Noah Misch  writes:
> > > On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote:
> > >> Looks legit, and at least per commit 13bba02271dc we do fix such
> things,
> > >> even if it's useless in practice.
> > >> Given that no buildfarm member has ever complained, this exercise
> seems
> > >> pretty pointless.
> >
> > > Later decision to stop changing such code:
> > >
> https://postgr.es/m/flat/e1a26ece-7057-a234-d87e-4ce1cdc9e...@2ndquadrant.com
>
> > I think that the actual risk here has to do not with
> > what memcmp() might do, but with what gcc might do in code surrounding
> > the call, once it's armed with the assumption that any pointer we pass
> > to memcmp() could not be null.  See
> >
> >
> https://www.postgresql.org/message-id/flat/BLU437-SMTP48A5B2099E7134AC6BE7C7F2A10%40phx.gbl
> >
> > It's surely not hard to visualize cases where necessary code could
> > be optimized away if the compiler thinks it's entitled to assume
> > such things.
>
> Good point.  We could pick from a few levels of concern:
>
> - No concern: reject changes serving only to remove this class of
> deviation.
>   This is today's policy.
> - Medium concern: accept fixes, but the buildfarm continues not to break in
>   the face of new deviations.  This will make some code uglier, but we'll
> be
>   ready against some compiler growing the optimization you describe.
> - High concern: I remove -fno-sanitize=nonnull-attribute from buildfarm
> member
>   thorntail.  In addition to the drawback of the previous level, this will
>   create urgent work for committers introducing new deviations (or
> introducing
>   test coverage that unearths old deviations).  This is our current
> response
>   to Valgrind complaints, for example.
>
Maybe in this specific case the policy could be ignored; this change
does not hurt.

--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -293,7 +293,7 @@ TransactionIdSetPageStatus(TransactionId xid, int
nsubxids,
  * sub-XIDs and all of the XIDs for which we're adjusting clog should be
  * on the same page.  Check those conditions, too.
  */
- if (all_xact_same_page && xid == MyProc->xid &&
+ if (all_xact_same_page && subxids && xid == MyProc->xid &&
  nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT &&
  nsubxids == MyProc->subxidStatus.count &&
  memcmp(subxids, MyProc->subxids.xids,

regards,
Ranier Vilela


Re: [Patch] ALTER SYSTEM READ ONLY

2020-08-28 Thread Robert Haas
On Wed, Aug 19, 2020 at 6:28 AM Amul Sul  wrote:
> Attached is a rebased on top of the latest master head (# 3e98c0bafb2).

Does anyone, especially anyone named Andres Freund, have comments on
0001? That work is somewhat independent of the rest of this patch set
from a theoretical point of view, and it seems like if nobody sees a
problem with the line of attack there, it would make sense to go ahead
and commit that part. Considering that this global barrier stuff is
new and that I'm not sure how well we really understand the problems
yet, there's a possibility that we might end up revising these details
again. I understand that most people, including me, are somewhat
reluctant to see experimental code get committed, in this case that
ship has basically sailed already, since neither of the patches that
we thought would use the barrier mechanism end up making it into v13.
I don't think it's really making things any worse to try to improve
the mechanism.

0002 isn't separately committable, but I don't see anything wrong with it.

Regarding 0003:

I don't understand why ProcessBarrierWALProhibit() can safely assert
that the WALPROHIBIT_STATE_READ_ONLY is set.

+errhint("Cannot continue a
transaction if it has performed writes while system is read only.")));

This sentence is bad because it makes it sound like the current
transaction successfully performed a write after the system had
already become read-only. I think something like errdetail("Sessions
with open write transactions must be terminated.") would be better.

I think SetWALProhibitState() could be in walprohibit.c rather than
xlog.c. Also, this function appears to have obvious race conditions.
It fetches the current state, then thinks things over while holding no
lock, and then unconditionally updates the current state. What happens
if somebody else has changed the state in the meantime? I had sort of
imagined that we'd use something like pg_atomic_uint32 for this and
manipulate it using compare-and-swap operations. Using some kind of
lock is probably fine, too, but you have to hold it long enough that
the variable can't change under you while you're still deciding
whether it's OK to modify it, or else recheck after reacquiring the
lock that the value doesn't differ from what you expect.
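
A minimal sketch of the compare-and-swap variant (the shared variable
and the state values are assumptions, not the patch's actual
definitions):

#include "postgres.h"
#include "port/atomics.h"

extern pg_atomic_uint32 *SharedWALProhibitState;	/* hypothetical */

/*
 * The transition succeeds only if nobody changed the state since the
 * caller inspected it; on failure the caller rechecks and retries.
 */
static bool
TryTransitionWALProhibitState(uint32 expected, uint32 newval)
{
	uint32		cur = expected;

	return pg_atomic_compare_exchange_u32(SharedWALProhibitState,
										  &cur, newval);
}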

I think the choice to use info_lck to synchronize
SharedWALProhibitState is very strange -- what is the justification
for that? I thought the idea might be that we frequently need to check
SharedWALProhibitState at times when we'd be holding info_lck anyway,
but it looks to me like you always do separate acquisitions of
info_lck just for this, in which case I don't see why we should use it
here instead of a separate lock. For that matter, why does this need
to be part of XLogCtlData rather than a separate shared memory area
that is private to walprohibit.c?

-   else
+   /*
+* Can't perform checkpoint or xlog rotation without writing WAL.
+*/
+   else if (XLogInsertAllowed())

Not project style.

+   case WAIT_EVENT_SYSTEM_WALPROHIBIT_STATE_CHANGE:

Can we drop the word SYSTEM here to make this shorter, or would that
break some convention?

+/*
+ * NB: The return string should be the same as the _ShowOption() for boolean
+ * type.
+ */
+ static const char *
+ show_system_is_read_only(void)
+{

I'm not sure the comment is appropriate here, but I'm very sure the
extra spaces before "static" and "show" are not per style.

+   /*  We'll be done once in-progress flag bit is cleared */

Another whitespace mistake.

+   elog(DEBUG1, "WALProhibitRequest: Waiting for checkpointer");
+   elog(DEBUG1, "Done WALProhibitRequest");

I think these should be removed.

Can WALProhibitRequest() and performWALProhibitStateChange() be moved
to walprohibit.c, just to bring more of the code for this feature
together in one place? Maybe we could also rename them to
RequestWALProhibitChange() and CompleteWALProhibitChange()?

-* think it should leave the child state in place.
+* think it should leave the child state in place.  Note that the upper
+* transaction will be a force to ready-only irrespective of
its previous
+* status if the server state is WAL prohibited.
 */
-   XactReadOnly = s->prevXactReadOnly;
+   XactReadOnly = s->prevXactReadOnly || !XLogInsertAllowed();

Both instances of this pattern seem sketchy to me. You don't expect
that reverting the state to a previous state will instead change to a
different state that doesn't match up with what you had before. What
is the bad thing that would happen if we did not make this change?

-* Else, must check to see if we're still in recovery.
+* Else, must check to see if we're still in recovery

Spurious change.

+   /* Request checkpoint */
+   RequestCheckpoint(CHECKPOINT_IMMEDIATE);
+   ereport(LOG, (errmsg("sy

Re: new heapcheck contrib module

2020-08-28 Thread Robert Haas
On Fri, Aug 28, 2020 at 2:10 PM Andrey M. Borodin  wrote:
> I don't think so. ISTM It's the same problem of xmax just hidden behind detoasing.
> Our regular heap_check was checking xmin\xmax invariants for tables, but 
> failed to recognise the problem in toast (while toast was accessible until 
> CLOG truncation).

The code can (and should, and I think does) refrain from looking up
XIDs that are out of the range thought to be valid -- but how do you
propose that it avoid looking up XIDs that ought to have clog data
associated with them despite being >= relfrozenxid and < nextxid?
TransactionIdDidCommit() does not have a suppress-errors flag, adding
one would be quite invasive, yet we cannot safely perform a
significant number of checks without knowing whether the inserting
transaction committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Handing off SLRU fsyncs to the checkpointer

2020-08-28 Thread Thomas Munro
On Sat, Aug 29, 2020 at 12:43 AM Jakub Wartak  wrote:
> USER       PID %CPU %MEM    VSZ   RSS TTY STAT START   TIME COMMAND
> postgres 120935  0.9  0.0 866052  3824 ?   Ss   09:47   0:00 postgres: checkpointer
> postgres 120936 61.9  0.0 865796  3824 ?   Rs   09:47   0:22 postgres: background writer
> postgres 120937 97.4  0.0 865940  5228 ?   Rs   09:47   0:36 postgres: startup recovering 00010089
>
> speedup of 1.647x

Thanks for testing!  That's better than I expected.  I guess it wasn't
quite so good with default bgwriter settings.

> I have only one comment about those 2 WIP patches: bgwriter_lru_maxpages
> should maybe be called standby_bgwriter_lru_maxpages in this scenario, or,
> even more preferred, there shouldn't be a maximum at all during the
> closed-DB recovery scenario.

I wish bgwriter could auto-tune itself better, so we wouldn't need to
contemplate adding more settings.

As for the second patch ("Optionally, don't wait for end-of-recovery
checkpoint."), that also looked quite useful in your test scenario:

> end_of_recovery_checkpoint_wait = off, before DB is open 15s faster

> I suppose a checkpoint for large shared_buffers (hundreds of GB) might take 
> a lot of time and this 0002 patch bypasses that. I would find it quite useful 
> in some scenarios (e.g. testing backups, PITR recoveries, opening DB from 
> storage snapshots / storage replication, maybe with DWH-after-crash too).

I suppose a third option that you might want is no checkpoint at all
(meaning leave it to regular checkpoint scheduling), like fast
promotion.  One thing missing from the patch is that we probably need
to log an end-of-recovery *record*, like fast promotion does.  I'm a
little fuzzy on the timeline stuff.  I wonder if any recovery experts
would like to weigh in on theoretical problems that might be lurking
here...




Re: Allow ERROR from heap_prepare_freeze_tuple to be downgraded to WARNING

2020-08-28 Thread Robert Haas
On Fri, Aug 28, 2020 at 1:29 PM Andres Freund  wrote:
> It can break HOT chains, plain ctid chains etc, for example. Which, if
> earlier / follower tuples are removed can't be detected anymore at a
> later time.

I think I need a more specific example here to understand the problem.
If the xmax of one tuple matches the xmin of the next, then either
both values precede relfrozenxid or both follow it. In the former
case, neither tuple should be frozen and the chain should not get
broken; in the latter case, everything's normal anyway. If the xmax
and xmin don't match, then the chain was already broken. Perhaps we
are removing important evidence, though it seems like that might've
happened anyway prior to reaching the damaged page, but we're not
making whatever corruption may exist any worse. At least, not as far
as I can see.

> And we've had many long
> standing bugs in this area, several only found because we actually
> started to bleat about them. And quite evidently, we have more bugs to
> fix in the area.

I agree with all of this, but I do not think that it establishes that
we should abandon the entire VACUUM. "Bleating" about something
usually means logging it, and I think you understand that I am not now
nor have I ever complained about the logging we are doing here. I also
think you understand why I don't like the current behavior, and that
EDB has actual customers who have actually been damaged by it. All the
same, I don't expect to win an argument about changing the default,
but I hope to win one about at least providing an option. And if we're
not even going to do that much, then I hope to come out of this
discussion with a clear understanding of exactly why that's a bad
idea. I don't think "we need the data for forensics" is a sufficient
justification for "if you end up with one corrupted XID in a
billion-row table, your entire table will bloat out the wazoo, and
there is no option to get any other behavior."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Handing off SLRU fsyncs to the checkpointer

2020-08-28 Thread Thomas Munro
On Sat, Aug 29, 2020 at 12:43 AM Jakub Wartak  wrote:
> ... %CPU ... COMMAND
> ... 97.4 ... postgres: startup recovering 00010089

So, what else is pushing this thing off CPU, anyway?  For one thing, I
guess it might be stalling while reading the WAL itself, because (1)
we only read it 8KB at a time, relying on kernel read-ahead, which
typically defaults to 128KB I/Os unless you cranked it up, but for
example we know that's not enough to saturate a sequential scan on
NVME system, so maybe it hurts here too (2) we keep having to switch
segment files every 16MB.  Increasing WAL segment size and kernel
readahead size presumably help with that, if indeed it is a problem,
but we could also experiment with a big POSIX_FADV_WILLNEED hint for a
future segment every time we cross a boundary, and also maybe increase
the size of our reads.
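
For example, a hint like this could be issued at each segment boundary
(a sketch only; the helper and its call site are assumptions, not
existing code):

#include <fcntl.h>

/*
 * Ask the kernel to start reading the next WAL segment file in the
 * background; with offset = 0 and len = 0, POSIX_FADV_WILLNEED
 * applies to the whole file.
 */
static void
PrefetchWalSegment(int fd)
{
	(void) posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
}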




Re: Improve planner cost estimations for alternative subplans

2020-08-28 Thread Andy Fan
On Wed, Aug 26, 2020 at 4:21 PM Andy Fan  wrote:

>
>
> On Mon, Aug 17, 2020 at 10:12 PM Andy Fan 
> wrote:
>
>>
>>
>> On Mon, Jun 22, 2020 at 9:39 AM Tom Lane  wrote:
>>
>>> I wrote:
>>> > Nope.  The entire reason why we have that kluge is that we don't know
>>> > until much later how many times we expect to execute the subplan.
>>> > AlternativeSubPlan allows the decision which subplan form to use to be
>>> > postponed till runtime; but when we're doing things like estimating the
>>> > cost and selectivity of a where-clause, we don't know that.
>>>
>>> > Maybe there's some way to recast things to avoid that problem,
>>> > but I have little clue what it'd look like.
>>>
>>> Actually ... maybe it's not that bad.  Clearly there would be a
>>> circularity issue for selectivity estimation, but all the alternatives
>>> should have the same selectivity.  Cost estimation is a different story:
>>> by the time we need to do cost estimation for a subexpression, we do in
>>> many cases have an idea how often the subexpression will be executed.
>>>
>>>
>> I read your idea of "ripping out all executor support for
>> AlternativeSubPlan
>>  and instead having the planner replace an AlternativeSubPlan with
>>  the desired specific SubPlan somewhere late in planning, possibly
>> setrefs.c."
>> in [1].  I was thinking that we could do such a replacement sooner, for
>> example once we know the num_calls for the subplans; unknown if it
>> is possible though.  If we can, then we can handle the issue here as well.
>>
>> The attached is a very PoC version,  I'm not sure if it is the right
>> direction
>> to go.
>>
>
> The idea behind it is: if we have a RelOptInfo which has some
> AlternativeSubPlan, and these subplans have some correlated vars which
> can be expressed as deps_relids, then we can convert the
> AlternativeSubPlan to a SubPlan once
> bms_is_subset(subplan->deps_relids, rel->relids) holds.
>

The way of figuring out subplan->deps_relids was wrong in my patch; I
will fix it later. But the general idea is the same.
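
For clarity, a sketch of the test behind that rule (deps_relids is an
assumed field collecting the relids of the subplans' correlated vars;
this is not the actual patch code):

#include "postgres.h"
#include "nodes/bitmapset.h"

/*
 * Once a rel's relids cover every relid the alternative subplans
 * depend on, the expected number of calls is known and a specific
 * SubPlan can be chosen in place of the AlternativeSubPlan.
 */
static bool
alternative_subplan_is_fixable(Bitmapset *deps_relids, Bitmapset *rel_relids)
{
	return bms_is_subset(deps_relids, rel_relids);
}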


> My patch is able to fix the issue reported
> here and it only converts the AlternativeSubPlan in rel->reltarget for
> demo purpose.
>
> --
> Best Regards
> Andy Fan
>


-- 
Best Regards
Andy Fan


Re: Disk-based hash aggregate's cost model

2020-08-28 Thread Jeff Davis
On Thu, 2020-08-27 at 17:28 -0700, Peter Geoghegan wrote:
> We have a Postgres 13 open item for Disk-based hash aggregate, which
> is the only non-trivial open item. There is a general concern about
> how often we get disk-based hash aggregation when work_mem is
> particularly low, and recursion seems unavoidable. This is generally
> thought to be a problem in the costing.

We discussed two approaches to tweaking the cost model:

1. Penalize HashAgg disk costs by a constant amount. It seems to be
chosen a little too often, and we can reduce the number of plan
changes.

2. Treat recursive HashAgg spilling skeptically, and heavily penalize
recursive spilling.

The problem with approach #2 is that we have a default hash mem of 4MB,
and real systems have a lot more than that. In this scenario, recursive
spilling can beat Sort by a lot.

For instance:

Data:
  create table text10m(t text collate "C.UTF-8", i int, n numeric);
  insert into text10m
select s.g::text, s.g, s.g::numeric
from (
  select (random()*10)::int as g
  from generate_series(1,1000)) s;
  vacuum (freeze,analyze) text10m;

Query: explain analyze select distinct t from text10m;

HashAgg: 10.5s
Sort+Distinct: 46s

I'm inclined toward option #1 for simplicity unless you feel strongly
about option #2. Specifically, I was thinking of a 1.5X penalty for
HashAgg disk costs.
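
Concretely, option #1 would amount to something like this (names are
illustrative only; this is not the actual costsize.c code):

#define HASHAGG_DISK_COST_MULTIPLIER 1.5

/*
 * Scale only the I/O component of the HashAgg estimate, leaving the
 * CPU costs alone, so in-memory hash aggregation is unaffected.
 */
static double
hashagg_spill_cost(double pages_written, double pages_read,
				   double seq_page_cost)
{
	return HASHAGG_DISK_COST_MULTIPLIER *
		(pages_written + pages_read) * seq_page_cost;
}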

Regards,
Jeff Davis






Re: More aggressive vacuuming of temporary tables

2020-08-28 Thread Michael Paquier
On Fri, Aug 28, 2020 at 11:46:49AM -0400, Tom Lane wrote:
> Hence I propose 0001 attached.  80% of it is just API additions to allow
> passing down the isTopLevel flag so that we can do the right thing in
> the CLUSTER case; the important change is in vacuum_set_xid_limits.
> (For ease of review, I didn't reindent the existing code in
> vacuum_set_xid_limits, but that would be something to do before commit.)

I got to wonder lately if we should not have a static state like what
we do for MyXactFlags to track down if a utility query is a top-level
one or not as most of the time we just want to check the flag as
passed down by standard_ProcessUtility().  I have faced this problem
as well lately when pushing down a PreventInTransactionBlock() for
some stuff with REINDEX, for example.  Not sure how reliable this would
be, though.  Passing isTopLevel down the road is a no-brainer, and
perhaps having a static value would create more problems than it
solves in practice.
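
The static-state idea might look roughly like this (entirely
hypothetical names; saving and restoring the flag around nested
invocations is exactly the detail that could cause trouble):

static bool MyUtilityIsTopLevel = false;	/* hypothetical */

static void
process_utility_wrapper(bool isTopLevel)
{
	bool		save = MyUtilityIsTopLevel;

	MyUtilityIsTopLevel = isTopLevel;
	/* ... dispatch the utility statement here ... */
	MyUtilityIsTopLevel = save;
}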
--
Michael




Re: Ideas about a better API for postgres_fdw remote estimates

2020-08-28 Thread Andrey Lepikhov



On 7/14/20 6:32 AM, Bruce Momjian wrote:

On Mon, Jul  6, 2020 at 11:28:28AM -0400, Stephen Frost wrote:

Yeah, thinking about it as a function that inspects partial planner
results, it might be useful for other purposes besides postgres_fdw.
As I said before, I don't think this necessarily has to be bundled as
part of postgres_fdw.  That still doesn't make it part of EXPLAIN.


Providing it as a function rather than through EXPLAIN does make a bit
more sense if we're going to skip things like the lookups you mention
above.  I'm still inclined to have it be a part of core rather than
having it as postgres_fdw though.  I'm not completely against it being
part of postgres_fdw... but I would think that would really be
appropriate if it's actually using something in postgres_fdw, but if
everything that it's doing is part of core and nothing related
specifically to the postgres FDW, then having it as part of core makes
more sense to me.  Also, having it as part of core would make it more
appropriate for other tools to look at and adding that kind of
inspection capability for partial planner results could be very
interesting for tools like pgAdmin and such.


I agree the statistics extraction should probably be part of core.
There is the goal of FDWs returning data, and returning the data
quickly.  I think we can require all-new FDW servers to get improved
performance.  I am not even clear if we have a full understanding of the
performance characteristics of FDWs yet.  I know Tomas did some research
on its DML behavior, but other than that, I haven't seen much.

On a related note, I have wished to be able to see all the costs
associated with plans not chosen, and I think others would like that as
well.  Getting multiple costs for a query goes in that direction.



During the implementation of sharding-related improvements I noticed
that if we use a lot of foreign partitions, we get bad plans because
vacuum doesn't update the statistics of foreign tables. That is done by
the ANALYZE command, but it is a very expensive operation for a foreign
table. The problem with statistics is demonstrated by the TAP test in
the first attached patch.


I implemented some FDW + pg core machinery to reduce the weight of the
problem. The ANALYZE command on a foreign table executes a query on the
foreign server that extracts the statistics tuple, serializes it into a
json-formatted string, and returns it to the caller. The caller
deserializes this string, generates statistics for the foreign table,
and updates it. The second patch is a proof of concept.


This patch speeds up the ANALYZE command and keeps the statistics of a
foreign table relevant after an autovacuum operation. Its effectiveness
depends on the relevance of the statistics on the remote server, but it
is still an improvement.


--
regards,
Andrey Lepikhov
Postgres Professional
>From 329954981959ee3fc97e52266c89a436d02ddf5e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 4 Aug 2020 09:29:37 +0500
Subject: [PATCH 1/2] TAP-Test on bad statistics.

Add dummy postgres_fdw_stat() routine. It will return stat tuples
for each input attribute.
---
 contrib/postgres_fdw/Makefile  |  4 ++--
 contrib/postgres_fdw/expected/foreign_stat.out | 18 ++
 .../postgres_fdw/postgres_fdw--1.0--1.1.sql| 11 +++
 contrib/postgres_fdw/postgres_fdw.c| 17 +
 contrib/postgres_fdw/postgres_fdw.control  |  2 +-
 contrib/postgres_fdw/sql/foreign_stat.sql  | 10 ++
 6 files changed, 59 insertions(+), 3 deletions(-)
 create mode 100644 contrib/postgres_fdw/expected/foreign_stat.out
 create mode 100644 contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql
 create mode 100644 contrib/postgres_fdw/sql/foreign_stat.sql

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..15a6f6c353 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -14,9 +14,9 @@ PG_CPPFLAGS = -I$(libpq_srcdir)
 SHLIB_LINK_INTERNAL = $(libpq)
 
 EXTENSION = postgres_fdw
-DATA = postgres_fdw--1.0.sql
+DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql
 
-REGRESS = postgres_fdw
+REGRESS = postgres_fdw foreign_stat
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/postgres_fdw/expected/foreign_stat.out b/contrib/postgres_fdw/expected/foreign_stat.out
new file mode 100644
index 00..28a470bccc
--- /dev/null
+++ b/contrib/postgres_fdw/expected/foreign_stat.out
@@ -0,0 +1,18 @@
+CREATE TABLE ltable (a int, b real);
+CREATE FOREIGN TABLE ftable (a int) server loopback options (table_name 'ltable');
+INSERT INTO ltable (a, b) (SELECT *, 1.01 FROM generate_series(1, 1E4));
+VACUUM;
+-- Check statistic interface routine
+SELECT * FROM postgres_fdw_stat('public', 'test', 'a');
+ postgres_fdw_stat 
+---
+ 
+(1 row)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE)
+SELECT * FROM ftable;
+ QUERY PLAN  
+-