Return value of PathNameOpenFile()

2022-09-06 Thread Antonin Houska
I've noticed that some callers of PathNameOpenFile()
(e.g. bbsink_server_begin_archive()) consider the call failed even if the
function returned zero, while other ones do check whether the file descriptor
is strictly negative. Since the file descriptor is actually returned by the
open() system call, I assume that zero is a valid result, isn't it?
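
As a minimal standalone illustration (plain C, not PostgreSQL code) of why zero
can be a perfectly valid result of open():

```
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int		fd;

	close(0);					/* free descriptor 0 (stdin) */
	fd = open("/tmp/demo.txt", O_CREAT | O_WRONLY, 0600);

	/* open() returns the lowest free descriptor, so this typically prints 0 */
	printf("open() returned %d\n", fd);

	if (fd < 0)					/* only negative values signal failure */
		perror("open");
	return 0;
}
```

So a check of the form "fd <= 0" treats that valid descriptor as an error,
while "fd < 0" does not.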

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




RE: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-09-06 Thread kuroda.hay...@fujitsu.com
Dear hackers,

> I agree both that the DEBUG2 messages are still useful and that we should not
> change the condition for output. So I prefer the idea suggested by Amit.

PSA newer patch, which contains the fix and test.

> > I have not verified but I think we need to backpatch this till 14
> > because prior to that in DecodeCommit, we use to set the top-level txn
> > as having catalog changes based on the if there are invalidation
> > messages in the commit record. So, in the current scenario shared by
> > Osumi-San, before SnapBuildCommitTxn(), the top-level txn will be
> > marked as having catalog changes.
> 
> I and Osumi-san are now investigating that, so please wait further reports and
> patches.

We investigated older versions, and in some of them *another
stack trace* was found.


On PG10-13, indeed, the failure did not occur.
In these versions a transaction is regarded as having catalog changes
when its commit record carries the XACT_XINFO_HAS_INVALS flag,
which is set whenever the transaction has invalidation messages.

When a subtransaction changes system catalogs and the user commits,
all invalidation messages allocated in the subtransaction are transferred to
the top transaction.
Therefore both transactions end up marked as containing catalog changes.
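
In other words, the pre-14 DecodeCommit() contains, roughly, a check of the
following shape (a paraphrased sketch, not the exact source), so the commit
record's invalidations are enough to flag the top-level transaction:

```
/* rough shape of the pre-PG14 logic described above (paraphrased) */
if (parsed->xinfo & XACT_XINFO_HAS_INVALS)
	ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
```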


On PG14 and 15, however, another stack trace has been found.
While executing the same workload, we got the following at the same SQL statement:

```
(gdb) backtrace
#0  0x7fa78c6dc387 in raise () from /lib64/libc.so.6
#1  0x7fa78c6dda78 in abort () from /lib64/libc.so.6
#2  0x00b16680 in ExceptionalCondition (conditionName=0xcd3ab0 
"txn->ninvalidations == 0", errorType=0xcd3284 "FailedAssertion", 
fileName=0xcd32d0 "reorderbuffer.c", lineNumber=2936) at assert.c:69
#3  0x008e9e70 in ReorderBufferForget (rb=0x12b5b10, xid=735, 
lsn=24125384) at reorderbuffer.c:2936
#4  0x008d9493 in DecodeCommit (ctx=0x12a2d20, buf=0x7ffe08b236b0, 
parsed=0x7ffe08b23510, xid=734, two_phase=false) at decode.c:733
#5  0x008d8962 in DecodeXactOp (ctx=0x12a2d20, buf=0x7ffe08b236b0) at 
decode.c:279
#6  0x008d85e2 in LogicalDecodingProcessRecord (ctx=0x12a2d20, 
record=0x12a30e0) at decode.c:142
#7  0x008dfef2 in pg_logical_slot_get_changes_guts (fcinfo=0x129acb0, 
confirm=true, binary=false) at logicalfuncs.c:296
#8  0x008e002f in pg_logical_slot_get_changes (fcinfo=0x129acb0) at 
logicalfuncs.c:365
...
(gdb) frame 4
#4  0x008d9493 in DecodeCommit (ctx=0x14cfd20, buf=0x7ffc638b0ca0, 
parsed=0x7ffc638b0b00, xid=734, two_phase=false) at decode.c:733
733 ReorderBufferForget(ctx->reorder, 
parsed->subxacts[i], buf->origptr);
(gdb) list
728  */
729 if (DecodeTXNNeedSkip(ctx, buf, parsed->dbId, origin_id))
730 {
731 for (i = 0; i < parsed->nsubxacts; i++)
732 {
733 ReorderBufferForget(ctx->reorder, 
parsed->subxacts[i], buf->origptr);
734 }
735 ReorderBufferForget(ctx->reorder, xid, buf->origptr);
736 
737 return;
(gdb) frame 3
#3  0x008e9e70 in ReorderBufferForget (rb=0x14e2b10, xid=735, 
lsn=24125152) at reorderbuffer.c:2936
2936            Assert(txn->ninvalidations == 0);
(gdb) list
2931             */
2932            if (txn->base_snapshot != NULL && txn->ninvalidations > 0)
2933                    ReorderBufferImmediateInvalidation(rb, txn->ninvalidations,
2934                                                       txn->invalidations);
2935            else
2936                    Assert(txn->ninvalidations == 0);
2937
2938            /* remove potential on-disk data, and deallocate */
2939            ReorderBufferCleanupTXN(rb, txn);
2940    }
(gdb) print *txn
$1 = {txn_flags = 3, xid = 735, toplevel_xid = 734, gid = 0x0, first_lsn = 
24113488, final_lsn = 24125152, end_lsn = 0, toptxn = 0x14ecb98, 
  restart_decoding_lsn = 24113304, origin_id = 0, origin_lsn = 0, commit_time = 
0, base_snapshot = 0x0, base_snapshot_lsn = 0, 
  base_snapshot_node = {prev = 0x14ecc00, next = 0x14e2b28}, snapshot_now = 
0x0, command_id = 4294967295, nentries = 5, nentries_mem = 5, 
  changes = {head = {prev = 0x14eecf8, next = 0x14eeb18}}, tuplecids = {head = 
{prev = 0x14ecb10, next = 0x14ecb10}}, ntuplecids = 0, 
  tuplecid_hash = 0x0, toast_hash = 0x0, subtxns = {head = {prev = 0x14ecb38, 
next = 0x14ecb38}}, nsubtxns = 0, ninvalidations = 3, 
  invalidations = 0x14e2d28, node = {prev = 0x14ecc68, next = 0x14ecc68}, size 
= 452, total_size = 452, concurrent_abort = false, 
  output_plugin_private = 0x0}
```

In these versions DecodeCommit() itself was OK. However, we met another failure
because the ReorderBufferTXN of the subtransaction had invalidation messages
but did not have a base_snapshot.

I thought that this failure was occurred the only the base_snapsh

Re: Remove dead macro exec_subplan_get_plan

2022-09-06 Thread Zhang Mingli
Hi all,

Regards,
Zhang Mingli
On Sep 6, 2022, 10:22 +0800, Richard Guo , wrote:
>
> On Tue, Sep 6, 2022 at 1:18 AM Tom Lane  wrote:
> > Zhang Mingli  writes:
> > > Macro exec_subplan_get_plan is not used anymore.
> > > Attach a patch to remove it.
> >
> > Hm, I wonder why it's not used anymore.  Maybe we no longer need
> > that list at all?  If we do, should use of the macro be
> > re-introduced in the accessors?

The PlannedStmt->subplans list is still used at several places.
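
For reference, the macro being discussed is (roughly) just a thin wrapper
around that list; paraphrased, it looks like:

```
/* fetch the Plan for a SubPlan node from PlannedStmt->subplans (paraphrased) */
#define exec_subplan_get_plan(plannedstmt, subplan) \
	((Plan *) list_nth((plannedstmt)->subplans, (subplan)->plan_id - 1))
```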

> Seems nowadays no one fetches the Plan from PlannedStmt->subplans with a
> certain plan_id any more. Previously back in eab6b8b2 where this macro
> was introduced, it was used in explain_outNode and ExecInitSubPlan.
>
> I find a similar macro, planner_subplan_get_plan, who fetches the Plan
> from glob->subplans. We can use it in the codes where needed. For
> example, in the new function SS_make_multiexprs_unique.
>
>      /* Found one, get the associated subplan */
> -    plan = (Plan *) list_nth(root->glob->subplans, splan->plan_id - 1);
> +    plan = planner_subplan_get_plan(root, splan);
>
> Thanks
> Richard

Yeah, I searched the history and found that
exec_subplan_get_plan was once used in ExecInitSubPlan() to create the planstate.

```
Plan   *plan = exec_subplan_get_plan(estate->es_plannedstmt, subplan);
...
node->planstate = ExecInitNode(plan, sp_estate, eflags);
```

And now in ExecInitSubPlan(), planstate comes from es_subplanstates.

```
/* Link the SubPlanState to already-initialized subplan */
sstate->planstate = (PlanState *) list_nth(estate->es_subplanstates, 
subplan->plan_id - 1);
```

And estate->es_subplanstates is built by looping over the subplans list
in a few functions.

```
foreach(l, plannedstmt->subplans)
{
 ...
 estate->es_subplanstates = lappend(estate->es_subplanstates, subplanstate);
}
```




Re: Return value of PathNameOpenFile()

2022-09-06 Thread Daniel Gustafsson
> On 6 Sep 2022, at 09:26, Antonin Houska  wrote:
> 
> I've noticed that some callers of PathNameOpenFile()
> (e.g. bbsink_server_begin_archive()) consider the call failed even if the
> function returned zero, while other ones do check whether the file descriptor
> is strictly negative. Since the file descriptor is actually returned by the
> open() system call, I assume that zero is a valid result, isn't it?

Agreed, zero should be valid as it's a non-negative integer.  However, callers
in fd.c are themselves checking for (fd <= 0) in some cases, and some have done
so since the very early days of the codebase, so I wonder if there historically
used to be a platform which considered 0 an invalid fd?

--
Daniel Gustafsson   https://vmware.com/





RE: pg_publication_tables show dropped columns

2022-09-06 Thread houzj.f...@fujitsu.com
On Tuesday, September 6, 2022 11:13 AM Tom Lane  wrote:
> 
> Jaime Casanova  writes:
> > Just trying the new column/row filter on v15, I found this issue that
> > could be replicated very easily.
> 
> Bleah.  Post-beta4 catversion bump, here we come.

Oh, sorry for missing that.

> > This could be solved by adding a "NOT attisdropped", simple patch
> > attached.
> 
> That view seems quite inefficient as written --- I wonder if we can't do 
> better by
> nuking the join-to-unnest business and putting the restriction in a WHERE
> clause on the pg_attribute scan.
> The query plan that you get for it right now is certainly awful.

I agree, and I have tried to improve the query as suggested.

Here is the new version of the patch.
I think the query plan and cost look better after applying the patch.

Best regards,
Hou zj



v2-0001-Ignore-dropped-columns-in-pg_publication_tables.patch
Description:  v2-0001-Ignore-dropped-columns-in-pg_publication_tables.patch

 QUERY PLAN AFTER APPLYING THE PATCH

 Hash Join  (cost=24.41..92627.20 rows=5000 width=256)
   Hash Cond: (c.relnamespace = n.oid)
   ->  Nested Loop  (cost=23.32..137.61 rows=5000 width=200)
 ->  Seq Scan on pg_publication p  (cost=0.00..1.05 rows=5 width=64) 
 ->  Hash Join  (cost=23.32..35.97 rows=1000 width=136)
   Hash Cond: (gpt.relid = c.oid)
   ->  Function Scan on pg_get_publication_tables gpt  
(cost=0.01..10.01 rows=1000 width=68) 
   ->  Hash  (cost=18.14..18.14 rows=414 width=72) 
 Buckets: 1024  Batches: 1  Memory Usage: 51kB
 ->  Seq Scan on pg_class c  (cost=0.00..18.14 rows=414 
width=72)
   ->  Hash  (cost=1.04..1.04 rows=4 width=68) 
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on pg_namespace n  (cost=0.00..1.04 rows=4 width=68) 
(actual time=0.006..0.008 rows=4 loops=1)
   SubPlan 1
 ->  Aggregate  (cost=18.48..18.49 rows=1 width=32) (actual 
time=0.027..0.027 rows=1 loops=3)
   ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute 
a  (cost=0.28..18.47 rows=1 width=66) (actual time=0.020..0.022 rows=1 loops=3)
 Index Cond: ((attrelid = gpt.relid) AND (attnum > 0))
 Filter: ((NOT attisdropped) AND ((attnum = ANY 
((gpt.attrs)::smallint[])) OR (gpt.attrs IS NULL)))
 Rows Removed by Filter: 2




 QUERY PLAN BEFORE APPLYING THE PATCH

 Hash Join  (cost=24.41..166633.24 rows=5000 width=256) (actual 
time=0.593..0.735 rows=3 loops=1)
   Hash Cond: (c.relnamespace = n.oid)
   ->  Nested Loop  (cost=23.32..137.61 rows=5000 width=202) (actual 
time=0.492..0.550 rows=3 loops=1)
 ->  Seq Scan on pg_publication p  (cost=0.00..1.05 rows=5 width=64) 
(actual time=0.005..0.007 rows=5 loops=1)
 ->  Hash Join  (cost=23.32..35.97 rows=1000 width=138) (actual 
time=0.107..0.107 rows=1 loops=5)
   Hash Cond: (gpt.relid = c.oid)
   ->  Function Scan on pg_get_publication_tables gpt  
(cost=0.01..10.01 rows=1000 width=68) (actual time=0.025..0.025 rows=1 loops=5)
   ->  Hash  (cost=18.14..18.14 rows=414 width=74) (actual 
time=0.392..0.392 rows=414 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 52kB
 ->  Seq Scan on pg_class c  (cost=0.00..18.14 rows=414 
width=74) (actual time=0.006..0.243 rows=414 loops=1)
   ->  Hash  (cost=1.04..1.04 rows=4 width=68) (actual time=0.016..0.017 rows=4 
loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on pg_namespace n  (cost=0.00..1.04 rows=4 width=68) 
(actual time=0.009..0.011 rows=4 loops=1)
   SubPlan 2
 ->  Aggregate  (cost=33.28..33.29 rows=1 width=32) (actual 
time=0.049..0.050 rows=1 loops=3)
   InitPlan 1 (returns $2)
 ->  Aggregate  (cost=12.51..12.52 rows=1 width=32) (actual 
time=0.012..0.013 rows=1 loops=1)
   ->  Function Scan on generate_series g  (cost=0.01..10.01 
rows=1000 width=4) (actual time=0.004..0.005 rows=3 loops=1)
   ->  Sort  (cost=20.75..20.76 rows=1 width=66) (actual 
time=0.043..0.043 rows=2 loops=3)
 Sort Key: a.attnum
 Sort Method: quicksort  Memory: 25kB
 ->  Hash Join  (cost=0.54..20.69 rows=1 width=66) (actual 
time=0.033..0.035 rows=2 loops=3)
   Hash Cond: (a.attnum = k.k)
   ->  Index Scan using pg_attribute_relid_attnum_index on 
pg_attribute a  (c

Re: [RFC] building postgres with meson - v12

2022-09-06 Thread John Naylor
On Mon, Sep 5, 2022 at 4:11 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-09-04 13:12:52 +0700, John Naylor wrote:
> > On Fri, Sep 2, 2022 at 11:35 PM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2022-09-02 14:17:26 +0700, John Naylor wrote:
> > > > On Thu, Sep 1, 2022 at 1:12 AM Andres Freund  wrote:
> > > > > [v12]
> > > >
> > > > +# Build a small utility static lib for the parser. This makes it 
> > > > easier to not
> > > > +# depend on gram.h already having been generated for most of the other 
> > > > code
> > > > +# (which depends on generated headers having been generated). The 
> > > > generation
> > > > +# of the parser is slow...
> > > >
> > > > It's not obvious whether this is intended to be a Meson-only
> > > > optimization or a workaround for something awkward to specify.
> > >
> > > It is an optimization. The parser generation is by far the slowest part 
> > > of a
> > > build. If other files can only be compiled once gram.h is generated, 
> > > there's a
> > > long initial period where little can happen. So instead of having all .c 
> > > files
> > > have a dependency on gram.h having been generated, the above makes only
> > > scan.c, gram.c compilation depend on gram.h.  It only matters for the 
> > > first
> > > compilation, because such dependencies are added as order-only 
> > > dependencies,
> > > supplanted by more precise compiler generated dependencies after.
> >
> > Okay, I think the comment could include some of this info for clarity.
>
> Working on that.
>
>
> > > It's still pretty annoying that so much of the build is initially idle,
> > > waiting for genbki.pl to finish.
> > >
> > > Part of that is due to some ugly dependencies of src/common on backend 
> > > headers
> > > that IMO probably shouldn't exist (e.g. src/common/relpath.c includes
> > > catalog/pg_tablespace_d.h).
> >
> > Technically, *_d.h headers are not backend, that's why it's safe to
> > include them anywhere. relpath.c in its current form has to know the
> > tablespace OIDs, which I guess is what you think is ugly. (I agree
> > it's not great)
>
> Yea, I'm not saying it's unsafe in a produces-wrong-results way, just that it
> seems architecturally dubious / circular.
>
>
> > > Looks like it'd not be hard to get at least the
> > > _shlib version of src/common and libpq build without waiting for that. 
> > > But for
> > > all the backend code I don't really see a way, so it'd be nice to make 
> > > genbki
> > > faster at some point.
> >
> > The attached gets me a ~15% reduction in clock time by having
> > Catalog.pm parse the .dat files in one sweep, when we don't care about
> > formatting, i.e. most of the time:
>
> Cool. Seems worthwhile.

Okay, here's a cleaned up version with more idiomatic style and a new
copy of the perlcritic exception.

Note that the indentation hasn't changed. My thought there: perltidy
will be run again next year, at which time it will be part of a listed
whitespace-only commit. Any objections, since that could confuse
someone before then?

-- 
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index e91a8e10a8..a71f5b05a8 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -287,6 +287,8 @@ sub ParseData
 	my $catname = $1;
 	my $data= [];
 
+	if ($preserve_formatting)
+	{
 	# Scan the input file.
 	while (<$ifd>)
 	{
@@ -346,11 +348,25 @@ sub ParseData
 		{
 			push @$data, $hash_ref if !$hash_ref->{autogenerated};
 		}
-		elsif ($preserve_formatting)
+		else
 		{
 			push @$data, $_;
 		}
 	}
+	}
+	else
+	{
+		# When we only care about the contents, it's faster to read and eval
+		# the whole file at once.
+		local $/;
+		my $full_file = <$ifd>;
+		eval '$data = ' . $full_file;## no critic (ProhibitStringyEval)
+		foreach my $hash_ref (@{$data})
+		{
+			AddDefaultValues($hash_ref, $schema, $catname);
+		}
+	}
+
 	close $ifd;
 
 	# If this is pg_type, auto-generate array types too.


Out-of-date comments about RecentGlobalXmin?

2022-09-06 Thread Japin Li


In commit dc7420c2c9, the RecentGlobalXmin variable is removed, however,
there are some places that reference it.

$ grep 'RecentGlobalXmin' -rn src/
src/backend/replication/logical/launcher.c:101:  * the secondary effect that it 
sets RecentGlobalXmin.  (This is critical
src/backend/utils/init/postinit.c:790:   * interested in the secondary effect 
that it sets RecentGlobalXmin. (This
src/backend/postmaster/autovacuum.c:1898:* the secondary effect that it 
sets RecentGlobalXmin.  (This is critical

It's out-of-date, doesn't it?  I'm not sure s/RecentGlobalXmin/RecentXmin/g
is right.  Any thoughts?

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: build remaining Flex files standalone

2022-09-06 Thread John Naylor
On Mon, Sep 5, 2022 at 1:18 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-09-04 12:16:10 +0700, John Naylor wrote:
> > Pushed 01 and 02 separately, then squashed and pushed the rest.
>
> Thanks a lot! It does look a good bit cleaner to me now.
>
> I think, as a followup improvement, we should move gramparse.h to
> src/backend/parser, and stop installing gram.h, gramparse.h. gramparse.h
> already had this note:
>
>  * NOTE: this file is only meant to be included in the core parsing files,
>  * i.e., parser.c, gram.y, and scan.l.
>  * Definitions that are needed outside the core parser should be in parser.h.
>
> What do you think?

+1 for the concept, but haven't looked at the details.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Out-of-date comments about RecentGlobalXmin?

2022-09-06 Thread Zhang Mingli
Hi,

Regards,
Zhang Mingli
On Sep 6, 2022, 16:03 +0800, Japin Li , wrote:
>
> In commit dc7420c2c9, the RecentGlobalXmin variable is removed, however,
> there are some places that reference it.
>
> $ grep 'RecentGlobalXmin' -rn src/
> src/backend/replication/logical/launcher.c:101: * the secondary effect that 
> it sets RecentGlobalXmin. (This is critical
> src/backend/utils/init/postinit.c:790: * interested in the secondary effect 
> that it sets RecentGlobalXmin. (This
> src/backend/postmaster/autovacuum.c:1898: * the secondary effect that it sets 
> RecentGlobalXmin. (This is critical
>
Yeah, RecentGlobalXmin is removed.
> It's out-of-date, doesn't it? I'm not sure s/RecentGlobalXmin/RecentXmin/g
> is right. Any thoughts?
I’m afraid not, RecentGlobalXmin is split to several GlobalVis* variables.
Need to check one by one.


Re: Out-of-date comments about RecentGlobalXmin?

2022-09-06 Thread Daniel Gustafsson
> On 6 Sep 2022, at 10:10, Zhang Mingli  wrote:
> On Sep 6, 2022, 16:03 +0800, Japin Li , wrote:

> It's out-of-date, doesn't it? I'm not sure s/RecentGlobalXmin/RecentXmin/g
> is right. Any thoughts?
> I’m afraid not, RecentGlobalXmin is split to several GlobalVis* variables.
> Need to check one by one.

It's a set of functions actually and not variables IIRC.

It's worth looking at the entire comment and not just the grep output though,
as these three places share the exact same comment.  Note the second paragraph:

/*
 * Start a new transaction here before first access to db, and get a
 * snapshot.  We don't have a use for the snapshot itself, but we're
 * interested in the secondary effect that it sets RecentGlobalXmin. (This
 * is critical for anything that reads heap pages, because HOT may decide
 * to prune them even if the process doesn't attempt to modify any
 * tuples.)
 *
 * FIXME: This comment is inaccurate / the code buggy. A snapshot that is
 * not pushed/active does not reliably prevent HOT pruning (->xmin could
 * e.g. be cleared when cache invalidations are processed).
 */

This was added in dc7420c2c92 which removed RecentGlobalXmin, addressing that
FIXME would of course be very welcome.

--
Daniel Gustafsson   https://vmware.com/





Re: Out-of-date comments about RecentGlobalXmin?

2022-09-06 Thread Japin Li


On Tue, 06 Sep 2022 at 16:17, Daniel Gustafsson  wrote:
>> On 6 Sep 2022, at 10:10, Zhang Mingli  wrote:
>> On Sep 6, 2022, 16:03 +0800, Japin Li , wrote:
>
>> It's out-of-date, doesn't it? I'm not sure s/RecentGlobalXmin/RecentXmin/g
>> is right. Any thoughts?
>> I’m afraid not, RecentGlobalXmin is split to several GlobalVis* variables.
>> Need to check one by one.
>
> It's a set of functions actually and not variables IIRC.
>
> It's worth looking at the entire comment and not just the grep output though,
> as these three places share the exact same comment.  Note the second 
> paragraph:
>
> /*
>  * Start a new transaction here before first access to db, and get a
>  * snapshot.  We don't have a use for the snapshot itself, but we're
>  * interested in the secondary effect that it sets RecentGlobalXmin. (This
>  * is critical for anything that reads heap pages, because HOT may decide
>  * to prune them even if the process doesn't attempt to modify any
>  * tuples.)
>  *
>  * FIXME: This comment is inaccurate / the code buggy. A snapshot that is
>  * not pushed/active does not reliably prevent HOT pruning (->xmin could
>  * e.g. be cleared when cache invalidations are processed).
>  */
>
> This was added in dc7420c2c92 which removed RecentGlobalXmin, addressing that
> FIXME would of course be very welcome.

My bad!  Thanks for pointing that out.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: [PATCH] Tab completion for SET COMPRESSION

2022-09-06 Thread Aleksander Alekseev
Hi hackers,

> Right.  That looks fine to me, so applied.

Thanks, Michael.

> In addition, why not take this opportunity to create a tab completion for
> "ALTER TABLE  OF " and "ALTER TABLE  NOT OF"?

Thanks for reviewing, Shinya. Let's fix this too. The patch is attached.

-- 
Best regards,
Aleksander Alekseev


v1-0001-Add-psql-tab-compression-for-OF-NOT-OF-with-ALTER.patch
Description: Binary data


Re: Smoothing the subtrans performance catastrophe

2022-09-06 Thread Simon Riggs
On Thu, 4 Aug 2022 at 13:11, Simon Riggs  wrote:

> I will post my patch, when complete, in a different thread.

To avoid confusion, I will withdraw this patch from the CF, in favour
of my other patch on a similar topic,
Minimizing calls to SubTransSetParent()
https://commitfest.postgresql.org/39/3806/.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: making relfilenodes 56 bits

2022-09-06 Thread Dilip Kumar
On Sun, Sep 4, 2022 at 9:27 AM Dilip Kumar  wrote:
>
> On Sat, Sep 3, 2022 at 5:11 PM Amit Kapila  wrote:

> > Isn't this happening because we are passing "--clean
> > --create"/"--create" options to pg_restore in create_new_objects()? If
> > so, then I think one idea to decouple would be to not use those
> > options. Perform drop/create separately via commands (for create, we
> > need to generate the command as we are generating while generating the
> > dump in custom format), then rewrite the conflicting tables, and
> > finally restore the dump.
>
> Hmm, you are right.  So I think something like this is possible to do,
> I will explore this more. Thanks for the idea.

I have explored this area more and also tried to come up with a
working prototype. While working on this I realized that we would
have to execute almost all of the code that is generated as part
of dumpDatabase() and dumpACL(), which is basically:

1. UPDATE pg_catalog.pg_database SET datistemplate = false
2. DROP DATABASE
3. CREATE DATABASE with all the database properties like ENCODING,
LOCALE_PROVIDER, LOCALE, LC_COLLATE, LC_CTYPE, ICU_LOCALE,
COLLATION_VERSION, TABLESPACE
4. COMMENT ON DATABASE
5. Logic inside dumpACL()

I feel that duplicating logic like this is really error-prone, but I do not
see any clear way to reuse the code, as dumpDatabase() depends heavily
on the Archive handle and on generating the dump file.

So currently I have implemented most of this logic except for a few
pieces, e.g. dumpACL(), comments on the database, etc.  Before we go too
far in this direction I wanted to get the opinions of others.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Column Filtering in Logical Replication

2022-09-06 Thread Amit Kapila
On Tue, Sep 6, 2022 at 5:08 AM Peter Smith  wrote:
>
> You are right - that REFRESH PUBLICATION was not necessary for this
> example. The patch is modified to use your suggested text.
>
> PSA v8
>

LGTM. I'll push this once the tag appears for v15.

-- 
With Regards,
Amit Kapila.




RE: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-09-06 Thread kuroda.hay...@fujitsu.com
> I was not sure what's the proper way to fix it.
> The solution I've thought at first was transporting all invalidations from 
> sub to top
> like ReorderBufferTransferSnapToParent(),
> but I do not know its side effect. Moreover, how do we deal with
> ReorderBufferChange?
> Should we transfer them too? If so, how about the ordering of changes?
> An alternative solution was to just remove the assertion, but would that be OK?

PSA the PoC patch for discussion. In this patch only invalidation messages are
transported; changes held by the subtxn are ignored. This passes the reported
workload.
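
To give a rough idea of what "transported" means here, a minimal sketch
(illustrative only, not the attached PoC patch; the ninvalidations and
invalidations fields are the ones visible in the gdb dump upthread, while the
helper itself and its memory handling are made up):

```
/*
 * Hypothetical helper: append the subtransaction's invalidation messages to
 * the top-level transaction's array, so that the skip path in DecodeCommit()
 * finds them on the top-level ReorderBufferTXN instead of hitting the
 * assertion in ReorderBufferForget().
 */
static void
TransferInvalidations(ReorderBufferTXN *subtxn, ReorderBufferTXN *toptxn)
{
	Size		nmsgs = toptxn->ninvalidations + subtxn->ninvalidations;

	if (subtxn->ninvalidations == 0)
		return;

	if (toptxn->invalidations == NULL)
		toptxn->invalidations = (SharedInvalidationMessage *)
			palloc(nmsgs * sizeof(SharedInvalidationMessage));
	else
		toptxn->invalidations = (SharedInvalidationMessage *)
			repalloc(toptxn->invalidations,
					 nmsgs * sizeof(SharedInvalidationMessage));

	memcpy(toptxn->invalidations + toptxn->ninvalidations,
		   subtxn->invalidations,
		   subtxn->ninvalidations * sizeof(SharedInvalidationMessage));
	toptxn->ninvalidations = nmsgs;
}
```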

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



REL14-0001-PoC-Fix-assertion-failure-during-logical-decoding.patch
Description:  REL14-0001-PoC-Fix-assertion-failure-during-logical-decoding.patch


Re: Asynchronous execution support for Custom Scan

2022-09-06 Thread Etsuro Fujita
On Mon, Sep 5, 2022 at 10:32 PM Kazutaka Onishi  wrote:
> I'm sorry for my error on your name...

No problem.

> >  IIUC, it uses the proposed
> > APIs, but actually executes ctidscans *synchronously*, so it does not
> > improve performance.  Right?

> Exactly.
> The actual CustomScan that supports asynchronous execution will
> start processing in CustomScanAsyncRequest,
> configure to detect completion via file descriptor in
> CustomScanAsyncConfigureWait,
> and receive the result in CustomScanAsyncNotify.

Ok, thanks!

Best regards,
Etsuro Fujita




Re: Handle infinite recursion in logical replication setup

2022-09-06 Thread Amit Kapila
On Tue, Sep 6, 2022 at 9:31 AM wangw.f...@fujitsu.com
 wrote:
>
> On Tues, 6 Sept 2022 at 11:14, vignesh C  wrote:
> > Thanks for the comments, the attached patch has the changes for the same.
>
> Thanks for updating the patch.
>
> Here is one comment for 0001 patch.
> 1. The query in function check_publications_origin.
> +   appendStringInfoString(&cmd,
> +  "SELECT DISTINCT P.pubname 
> AS pubname\n"
> +  "FROM pg_publication P,\n"
> +  " LATERAL 
> pg_get_publication_tables(P.pubname) GPT\n"
> +  " LEFT JOIN 
> pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> +  " pg_class C JOIN 
> pg_namespace N ON (N.oid = C.relnamespace)\n"
> +  "WHERE C.oid = GPT.relid 
> AND PS.srrelid IS NOT NULL AND P.pubname IN (");
>
> Since I found that we only use "PS.srrelid" in the WHERE statement by
> specifying "PS.srrelid IS NOT NULL", could we just use "[INNER] JOIN" to join
> the table pg_subscription_rel?
>

I also think we can use INNER JOIN here but maybe there is a reason
why that is not used in the first place. If we change it in the code
then also let's change the same in the docs section as well.

Few minor points:
===
1.
This scenario is detected and a WARNING is logged to the
+   user, but the warning is only an indication of a potential problem; it is
+   the user responsibility to make the necessary checks to ensure the copied
+   data origins are really as wanted or not.
+  

/user/user's

2. How about a commit message like:
Raise a warning if there is a possibility of data from multiple origins.

This commit raises a warning message for a combination of options
('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
operations if the publication tables were also replicated from other
publishers.

During replication, we can skip the data from other origins as we have that
information in WAL but that is not possible during initial sync so we raise
a warning if there is such a possibility.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Clarify the comments about varlena header encoding

2022-09-06 Thread Aleksander Alekseev
Hi John,

Thanks for the feedback.

> Maybe "00xx 4-byte length word (aligned)," is more clear about
> what is aligned. Also, adding all those xxx's obscures the point that
> we only need to examine one byte to figure out what to do next.

IMO "00xx 4-byte length word" is still confusing. One can misread
this as a 00-xx-xx-xx hex value, where the first byte (not two bits)
is 00h.

> The patch has:
>
> + * In the third case the va_tag field (see varattrib_1b_e) is used to discern
> + * the specific type and length of the pointer datum. On disk the "xxx" bits
> + * currently always store sizeof(varatt_external) + 2.
>
> ...so not sure where 17 came from.

Right, AFTER applying the patch it's clear that it's actually 18
bytes. Currently the comment says "1 byte followed by a TOAST pointer
(16 bytes)" which is wrong.

> - * 1000 1-byte length word, unaligned, TOAST pointer
> + * 1000 , TOAST pointer (struct varatt_external)
>
> This implies that the header is two bytes, which is not accurate. That
> next byte is a type tag:
> [...]
> ...and does not always represent the on-disk length:

Well, the comments don't say what is the header and what is the type
tag. They merely describe the bit layouts. The patch doesn't seem to
make things worse in this respect. Do you think we should address this
too? I suspect that describing the difference between the header and
the type tag here will create even more confusion.

> And I don't think the new comments referring to "third case", "first
> two cases", etc make it easier to follow.

Maybe you are right. I'm open to suggestions.
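
For illustration, the practical point that only the first byte needs to be
examined can be put in code. The VARATT_* macros below are the existing ones
from postgres.h; the classification function itself is just a made-up example:

```
/*
 * Made-up helper: classify a varlena datum by looking only at its first
 * byte, using the existing postgres.h macros.
 */
static const char *
classify_varlena(const void *datum)
{
	if (VARATT_IS_1B_E(datum))
		return "1-byte header: external TOAST pointer, va_tag byte follows";
	if (VARATT_IS_1B(datum))
		return "1-byte header: short inline datum, unaligned";
	if (VARATT_IS_4B_C(datum))
		return "4-byte header: compressed inline datum, aligned";
	return "4-byte header: uncompressed inline datum, aligned";
}
```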

-- 
Best regards,
Aleksander Alekseev




Re: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-09-06 Thread Amit Kapila
On Tue, Sep 6, 2022 at 1:17 PM kuroda.hay...@fujitsu.com
 wrote:
>
> Dear hackers,
>
> > I agree both that the DEBUG2 messages are still useful and that we should not
> > change the condition for output. So I prefer the idea suggested by Amit.
>
> PSA newer patch, which contains the fix and test.
>
> > > I have not verified but I think we need to backpatch this till 14
> > > because prior to that in DecodeCommit, we use to set the top-level txn
> > > as having catalog changes based on the if there are invalidation
> > > messages in the commit record. So, in the current scenario shared by
> > > Osumi-San, before SnapBuildCommitTxn(), the top-level txn will be
> > > marked as having catalog changes.
> >
> > I and Osumi-san are now investigating that, so please wait further reports 
> > and
> > patches.
>
> We investigated older versions, and in some of them *another
> stack trace* was found.
>
>
> On PG10-13, indeed, the failure did not occur.
> In these versions a transaction is regarded as having catalog changes
> when its commit record carries the XACT_XINFO_HAS_INVALS flag,
> which is set whenever the transaction has invalidation messages.
>
> When a subtransaction changes system catalogs and the user commits,
> all invalidation messages allocated in the subtransaction are transferred to
> the top transaction.
> Therefore both transactions end up marked as containing catalog changes.
>
>
> On PG14 and 15, however, another stack trace has been found.
> While executing the same workload, we got the following at the same SQL
> statement:
>

Did you get this new assertion failure after you applied the patch for
the first failure? Because otherwise, how can you reach it with the
same test case?

About patch:
else if (sub_needs_timetravel)
  {
- /* track toplevel txn as well, subxact alone isn't meaningful */
+ elog(DEBUG2, "forced transaction %u to do timetravel due to one of
its subtransaction",
+ xid);
+ needs_timetravel = true;
  SnapBuildAddCommittedTxn(builder, xid);

Why did you remove the above comment? I think it still makes sense to retain it.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Tab completion for SET COMPRESSION

2022-09-06 Thread Shinya Kato

On 2022-09-06 17:28, Aleksander Alekseev wrote:

>> In addition, why not take this opportunity to create a tab completion for
>> "ALTER TABLE  OF " and "ALTER TABLE  NOT OF"?
>
> Thanks for reviewing, Shinya. Let's fix this too. The patch is attached.

Thanks for the new patch!
A minor modification has been made so that the composite type is also
completed after "ALTER TABLE  OF".

Thoughts?

--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 62a39779b9..053c0ea75c 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2240,7 +2240,8 @@ psql_completion(const char *text, int start, int end)
 	  "ENABLE", "INHERIT", "NO", "RENAME", "RESET",
 	  "OWNER TO", "SET", "VALIDATE CONSTRAINT",
 	  "REPLICA IDENTITY", "ATTACH PARTITION",
-	  "DETACH PARTITION", "FORCE ROW LEVEL SECURITY");
+	  "DETACH PARTITION", "FORCE ROW LEVEL SECURITY",
+	  "OF", "NOT OF");
 	/* ALTER TABLE xxx ADD */
 	else if (Matches("ALTER", "TABLE", MatchAny, "ADD"))
 	{
@@ -2469,6 +2470,10 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches("ALTER", "TABLE", MatchAny, "DETACH", "PARTITION", MatchAny))
 		COMPLETE_WITH("CONCURRENTLY", "FINALIZE");
 
+	/* ALTER TABLE  OF */
+	else if (TailMatches("ALTER", "TABLE", MatchAny, "OF"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_composite_datatypes);
+
 	/* ALTER TABLESPACE  with RENAME TO, OWNER TO, SET, RESET */
 	else if (Matches("ALTER", "TABLESPACE", MatchAny))
 		COMPLETE_WITH("RENAME TO", "OWNER TO", "SET", "RESET");


Re: HOT chain validation in verify_heapam()

2022-09-06 Thread Himanshu Upadhyaya
Hi Robert,

Thanks for sharing the feedback.

On Sat, Aug 27, 2022 at 1:47 AM Robert Haas  wrote:

> +htup = (HeapTupleHeader) PageGetItem(ctx.page, rditem);
> +if (!(HeapTupleHeaderIsHeapOnly(htup) &&
> htup->t_infomask & HEAP_UPDATED))
> +report_corruption(&ctx,
> +  psprintf("Redirected Tuple at
> line pointer offset %u is not HEAP_ONLY_TUPLE or HEAP_UPDATED tuple",
> +   (unsigned) rdoffnum));
>
> This isn't safe because the line pointer referenced by rditem may not
> have been sanity-checked yet. Refer to the comment just below where it
> says "Sanity-check the line pointer's offset and length values".
>
Handled by creating a new function check_lp() and calling it before
accessing the redirected tuple.
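
Roughly, the kind of check such a helper performs looks like this (a sketch
based on the existing "Sanity-check the line pointer's offset and length
values" logic, not the actual patch):

```
/*
 * Sketch of a check_lp()-style helper: verify that a line pointer's offset
 * and length stay within the page and can hold at least a tuple header,
 * before the tuple it points to is dereferenced.
 */
static bool
check_lp(HeapCheckContext *ctx, ItemId lp)
{
	unsigned	lp_off = ItemIdGetOffset(lp);
	unsigned	lp_len = ItemIdGetLength(lp);

	if (lp_len < MAXALIGN(SizeofHeapTupleHeader) ||
		lp_off < SizeOfPageHeaderData ||
		lp_off + lp_len > BLCKSZ)
	{
		report_corruption(ctx,
						  psprintf("line pointer offset %u and length %u are out of bounds",
								   lp_off, lp_len));
		return false;
	}
	return true;
}
```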

>
>
> +/*
> + * Add line pointer offset to predecessor array.
> + * 1) If xmax is matching with xmin of next
> Tuple(reaching via its t_ctid).
> + * 2) If next tuple is in the same page.
> + * Raise corruption if:
> + * We have two tuples having same predecessor.
> + *
> + * We add offset to predecessor irrespective of
> transaction(t_xmin) status. We will
> + * do validation related to transaction status(and also
> all other validations)
> + * when we loop over predecessor array.
> + */
>
> The formatting of this comment will, I think, be mangled if pgindent
> is run against the file. You can use - markers to prevent that, I
> believe, or (better) write this as a paragraph without relying on the
> lines ending up uneven in length.
>
>
Done, reformatted using pg_indent.

+if (predecessor[nextTupOffnum] != 0)
> +{
> +report_corruption(&ctx,
> +psprintf("Tuple at offset %u is
> reachable from two or more updated tuple",
> +(unsigned) nextTupOffnum));
> +continue;
> +}
>
> You need to do this test after xmin/xmax matching. Otherwise you might
> get false positives. Also, the message should be more specific and
> match the style of the existing messages. ctx.offnum is already going
> to be reported in another column, but we can report both nextTupOffnum
> and predecessor[nextTupOffnum] here e.g. "updated version at offset %u
> is also the updated version of tuple at offset %u".
>
>
agree, done.

+currTupXmax = HeapTupleHeaderGetUpdateXid(ctx.tuphdr);
> +lp   = PageGetItemId(ctx.page, nextTupOffnum);
> +
> +htup = (HeapTupleHeader) PageGetItem(ctx.page, lp);
>
> This has the same problem I mentioned in my first comment above,
> namely, we haven't necessarily sanity-checked the length and offset
> values for nextTupOffnum yet. Saying that another way, if the contents
> of lp are corrupt and point off the page, we want that to be reported
> as corruption (which the current code will already do) and we want
> this check to be skipped so that we don't crash or access random
> memory addresses. You need to think about how to rearrange the code so
> that we only perform checks that are known to be safe.
>
>
Moved the sanity-check logic to a new function check_lp(), which is called
before accessing the next tuple while populating the predecessor array.


> Please take the time to format your code according to the PostgeSQL
> standard practice. If you don't know what that looks like, use
> pgindent.
>
> +{
> +HeapTupleHeader pred_htup, curr_htup;
> +TransactionId   pred_xmin, curr_xmin, curr_xmax;
> +ItemId  pred_lp, curr_lp;
>
> Same here.
>

Done.

I think it would actually be a good idea to set ctx.offnum to the
> predecessor's offset number, and use a separate variable for the
> current offset number. The reason why I think this is that I believe
> it will make it easier to phrase the messages appropriately. For
> example, if ctx.offnum is the predecessor tuple, then we can issue
> complaints like this:
>
> tuple with uncommitted xmin %u was updated to produce a tuple at
> offset %u with differing xmin %u
> unfrozen tuple was updated to produce a tuple at offset %u which is not
> frozen
> tuple with uncommitted xmin %u has cmin %u, but was updated to produce
> a tuple with cmin %u
> non-heap-only update produced a heap-only tuple at offset %u
> heap-only update produced a non-heap only tuple at offset %u
>
>
Agree, Done.


> +if (!TransactionIdIsValid(curr_xmax) &&
> +HeapTupleHeaderIsHotUpdated(curr_htup))
> +{
> +report_corruption(&ctx,
> +psprintf("Current tuple at offset %u is
> HOT but is last tuple in the HOT chain.",
> +(unsigned) ctx.offnum));
> +}
>
> This check has nothing t

May be BUG. Periodic burst growth of the checkpoint_req counter on replica.

2022-09-06 Thread Anton A. Melnikov

Hello!

I found periodic bursts of growth in the checkpoint_req counter on a replica,
by 20-30 units, after a large insert (~350 MB) on the master.
Reproduction on master and replica with default conf:
1) Execute the command "insert into test values (generate_series(1,1E7));".
This grows the table by about 350 MB over roughly 15 seconds (on my PC).
2) The WAL records start arriving on the replica, and when their number exceeds
a certain limit, a request is sent to the checkpointer process to create a
restartpoint on the replica and checkpoint_req is incremented. With default
settings, this limit is 42 segments.
3) Restartpoint creation fails, because a new restartpoint can only be created
if the replica has received a new WAL record about a checkpoint since the
previous restartpoint, and there were no such records.
4) When the next WAL segment is received by the replica, the next request to
create a restartpoint is generated, and so on.
5) Finally, a WAL record about the checkpoint arrives on the replica, the
restartpoint is created, and the growth of checkpoint_req stops.
The described process can be observed in the log with additional debugging. See 
insert_1E7_once.log attached. This
log is for v13 but master has the same behavior.

Can we treat such behavior as a bug?
If so, it seems possible to check whether creating a restartpoint is obviously
impossible before sending the request, and not send it at all in that case.
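
To illustrate the idea, the request from the startup process could be gated
roughly as follows (a sketch only, not the attached patch;
HaveReplayedNewCheckpointRecord() is a made-up placeholder, while
XLogCheckpointNeeded() and RequestCheckpoint() are the existing calls):

```
/*
 * Sketch: skip the restartpoint request when it is known to be futile,
 * i.e. no checkpoint record has been replayed since the last restartpoint,
 * so CreateRestartPoint() would refuse to do anything anyway.
 */
if (XLogCheckpointNeeded(readSegNo) &&
	HaveReplayedNewCheckpointRecord())
	RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
```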

The attached patch tries to fix it.

With best regards.
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

2022-09-04 21:09:45.160 MSK [2424110] LOG:  Start CFS version 0.54 supported compression algorithms pglz,zlib encryption disabled GC enabled
2022-09-04 21:09:45.168 MSK [2424110] LOG:  starting PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
2022-09-04 21:09:45.168 MSK [2424110] LOG:  listening on IPv4 address "0.0.0.0", port 54131
2022-09-04 21:09:45.168 MSK [2424110] LOG:  listening on IPv6 address "::", port 54131
2022-09-04 21:09:45.177 MSK [2424110] LOG:  listening on Unix socket "/tmp/.s.PGSQL.54131"
2022-09-04 21:09:45.187 MSK [2424150] LOG:  database system was interrupted; last known up at 2022-09-04 21:09:44 MSK
2022-09-04 21:09:45.282 MSK [2424150] LOG:  entering standby mode
2022-09-04 21:09:45.292 MSK [2424150] LOG:  redo starts at 0/228
2022-09-04 21:09:45.296 MSK [2424150] LOG:  consistent recovery state reached at 0/2000198
2022-09-04 21:09:45.296 MSK [2424110] LOG:  database system is ready to accept read only connections
2022-09-04 21:09:45.307 MSK [2424233] LOG:  started streaming WAL from primary at 0/300 on timeline 1
2022-09-04 21:09:45.341 MSK [2424259] LOG:  Start 1 background garbage collection workers for CFS
2022-09-04 21:10:26.653 MSK [2424150] LOG:  !!!Request a restartpoint if we've replayed too much xlog!!! 
	readSegNo: 43, old_segno: 2, CheckPointSegments: 42
	
2022-09-04 21:10:26.653 MSK [2424224] LOG:  Checkpoint requested by wal. Checkpoints already started: 0, done: 0, failed: 0. 
	Now: 41. Last_chkpt_time: 0. Elapsed secs: 41. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:10:26.653 MSK [2424224] LOG:  !!!This is restartpoint!!! 
	
2022-09-04 21:10:26.669 MSK [2424224] LOG:  !!!Restartpoint NOT performed!!!
	
2022-09-04 21:10:28.926 MSK [2424150] LOG:  !!!Request a restartpoint if we've replayed too much xlog!!! 
	readSegNo: 44, old_segno: 2, CheckPointSegments: 42
	
2022-09-04 21:10:28.926 MSK [2424224] LOG:  Checkpoint requested by wal. Checkpoints already started: 1, done: 1, failed: 0. 
	Now: 43. Last_chkpt_time: -244. Elapsed secs: 287. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:10:28.926 MSK [2424224] LOG:  !!!This is restartpoint!!! 
	
2022-09-04 21:10:29.176 MSK [2424224] LOG:  !!!Restartpoint NOT performed!!!
	
2022-09-04 21:10:30.014 MSK [2424150] LOG:  !!!Request a restartpoint if we've replayed too much xlog!!! 
	readSegNo: 45, old_segno: 2, CheckPointSegments: 42
	
2022-09-04 21:10:30.014 MSK [2424224] LOG:  Checkpoint requested by wal. Checkpoints already started: 2, done: 2, failed: 0. 
	Now: 45. Last_chkpt_time: -242. Elapsed secs: 287. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:10:30.014 MSK [2424224] LOG:  !!!This is restartpoint!!! 
	
2022-09-04 21:10:30.058 MSK [2424224] LOG:  !!!Restartpoint NOT performed!!!
	
2022-09-04 21:10:45.072 MSK [2424224] LOG:  Checkpoint requested by time! Now: 60. Last_chkpt_time: -240. Elapsed secs: 300. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:10:45.072 MSK [2424224] LOG:  !!!This is restartpoint!!! 
	
2022-09-04 21:10:45.077 MSK [2424224] LOG:  !!!Checkpoint NOT performed!!! 
	
2022-09-04 21:11:00.092 MSK [2424224] LOG:  Checkpoint requested by time! Now: 75. Last_chkpt_time: -225. Elapsed secs: 300. CheckPointTimeout: 300. CheckPointWarning: 30

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-06 Thread Amit Kapila
On Sat, Aug 20, 2022 at 4:32 PM Önder Kalacı  wrote:
>
> I'm a little late to catch up with your comments, but here are my replies:
>
>> > My answer for the above assumes that your question is regarding what 
>> > happens if you ANALYZE on a partitioned table. If your question is 
>> > something different, please let me know.
>> >
>>
>> I was talking about inheritance cases, something like:
>> create table tbl1 (a int);
>> create table tbl1_part1 (b int) inherits (tbl1);
>> create table tbl1_part2 (c int) inherits (tbl1);
>>
>> What we do in such cases is documented as: "if the table being
>> analyzed has inheritance children, ANALYZE gathers two sets of
>> statistics: one on the rows of the parent table only, and a second
>> including rows of both the parent table and all of its children. This
>> second set of statistics is needed when planning queries that process
>> the inheritance tree as a whole. The child tables themselves are not
>> individually analyzed in this case."
>
>
> Oh, I haven't considered inherited tables. That seems right, the statistics 
> of the children are not updated when the parent is analyzed.
>
>>
>> Now, the point I was worried about was what if the changes in child
>> tables (*_part1, *_part2) are much more than in tbl1? In such cases,
>> we may not invalidate child rel entries, so how will logical
>> replication behave for updates/deletes on child tables? There may not
>> be any problem here but it is better to do some analysis of such cases
>> to see how it behaves.
>
>
> I also haven't observed any specific issues. In the end, when the user (or 
> autovacuum) does ANALYZE on the child, it is when the statistics are updated 
> for the child.
>

Right, I also think that should be the behavior but I have not
verified it. However, I think it should be easy to verify if
autovacuum updates the stats for child tables when we operate on only
one of such tables and whether that will invalidate the cache for our
case.

> Although I do not have much experience with inherited tables, this sounds 
> like the expected behavior?
>
> I also pushed a test covering inherited tables. First, a basic test on the 
> parent. Then, show that updates on the parent can also use indexes of the 
> children. Also, after an ANALYZE on the child, we can re-calculate the index 
> and use the index with a higher cardinality column.
>
>>
>> > Also, for the majority of the use-cases, I think we'd probably expect an 
>> > index on a column with high cardinality -- hence use index scan. So, 
>> > bitmap index scans are probably not going to be that much common.
>> >
>>
>> You are probably right here but I don't think we can make such
>> assumptions. I think the safest way to avoid any regression here is to
>> choose an index when the planner selects an index scan. We can always
>> extend it later to bitmap scans if required. We can add a comment
>> indicating the same.
>>
>
> Alright, I got rid of the bitmap scans.
>
> Though, it caused a few of the new tests to fail. I think because of the data 
> size/distribution, the planner picks bitmap scans. To make the tests 
> consistent and small, I added `enable_bitmapscan to off` for this new test 
> file. Does that sound ok to you? Or, should we change the tests to make sure 
> they genuinely use index scans?
>

That sounds okay to me.

-- 
With Regards,
Amit Kapila.




Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-06 Thread Ranier Vilela
On Mon, Sep 5, 2022 at 23:02, David Rowley wrote:

> On Tue, 6 Sept 2022 at 13:52, Ranier Vilela  wrote:
> >
> > On Mon, Sep 5, 2022 at 22:29, David Rowley wrote:
> >> It feels like it would be good if we had a way to detect a few of
> >> these issues much earlier than we are currently.  There's been a long
> >> series of commits fixing up this sort of thing.  If we had a tool to
> >> parse the .c files and look for things like a function call to
> >> appendPQExpBuffer() and appendStringInfo() with only 2 parameters (i.e
> >> no va_arg arguments).
> >
> > StaticAssert could check va_arg no?
>
> I'm not sure exactly what you have in mind. If you think you have a
> way to make that work, it would be good to see a patch with it.
>
I will study the matter.
But first, I would like to continue with this kind of string-output correction,
in the following cases (a small example follows the list):
fprintf -> fputs -> fputc
printf -> puts -> putchar
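
For example, the kind of rewrite meant here (illustrative only):

```
/* format string with no format arguments: fputs() is enough */
fprintf(stderr, "done\n");
fputs("done\n", stderr);

/* a single character: fputc() is enough */
fprintf(stderr, "\n");
fputc('\n', stderr);
```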

There are many occurrences; do you think it would be worth the effort?

regards,
Ranier Vilela


(doc patch) psql version compatibility

2022-09-06 Thread Christoph Berg
Hi,

I didn't understand the current wording of the NOTES section in
psql(1) on which major versions psql is compatible with, so here's a
patch to make that more explicit.

Christoph
>From 89f947c215c679004e1f3fbf88751fe527e16f91 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Tue, 6 Sep 2022 13:21:48 +0200
Subject: [PATCH] psql-ref.sgml: Be more specific about major versions support

The old wording only mentioned \d, but in practise even `psql -l` would
fail for pre-9.2 servers, so be more clear about the version range.
---
 doc/src/sgml/ref/psql-ref.sgml | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 186f8c506a..09e3da7c24 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4946,20 +4946,19 @@ PSQL_EDITOR_LINENUMBER_ARG='--line '
 
 
   
-  psql works best with servers of the same
-   or an older major version.  Backslash commands are particularly likely
+  psql works with servers of the same
+   or an older major version, back to 9.2.  Backslash commands are particularly likely
to fail if the server is of a newer version than psql
-   itself.  However, backslash commands of the \d family should
-   work with servers of versions back to 9.2, though not necessarily with
-   servers newer than psql itself.  The general
+   itself, or older than 9.2.  The general
functionality of running SQL commands and displaying query results
-   should also work with servers of a newer major version, but this cannot
+   should also work with servers of a other major versions, but this cannot
be guaranteed in all cases.
   
   
If you want to use psql to connect to several
servers of different major versions, it is recommended that you use the
-   newest version of psql.  Alternatively, you
+   newest version of psql.  (To connect to pre-9.2 servers,
+   use psql 14 or earlier.)  Alternatively, you
can keep around a copy of psql from each
major version and be sure to use the version that matches the
respective server.  But in practice, this additional complication should
-- 
2.35.1



Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-09-06 Thread Dilip Kumar
On Tue, Aug 30, 2022 at 10:16 PM Simon Riggs
 wrote:

> PFA two patches, replacing earlier work
> 001_new_isolation_tests_for_subxids.v3.patch
> 002_minimize_calls_to_SubTransSetParent.v8.patch
>
> 001_new_isolation_tests_for_subxids.v3.patch
> Adds new test cases to master without adding any new code, specifically
> addressing the two areas of code that are not tested by existing tests.
> This gives us a baseline from which we can do test driven development.
> I'm hoping this can be reviewed and committed fairly smoothly.
>
> 002_minimize_calls_to_SubTransSetParent.v8.patch
> Reduces the number of calls to subtrans below 1% for the first 64 subxids,
> so overall will substantially reduce subtrans contention on master for the
> typical case, as well as smoothing the overflow case.
> Some discussion needed on this; there are various options.
> This combines the work originally posted here with another patch posted on the
> thread "Smoothing the subtrans performance catastrophe".
>
> I will do some performance testing also, but more welcome.

Thanks for the updated patch, I have some questions/comments.

1.
+ * This has the downside that anyone waiting for a lock on aborted
+ * subtransactions would not be released immediately; that may or
+ * may not be an acceptable compromise. If not acceptable, this
+ * simple call needs to be replaced with a loop to register the
+ * parent for the current subxid stack, so we can walk
back up it to
+ * the topxid.
+ */
+SubTransSetParent(subxid, GetTopTransactionId());

I do not understand in which situation we will see this downside.  I
mean if we see the logic of XactLockTableWait() then in the current
situation also if the subtransaction is committed we directly wait on
the top transaction by calling SubTransGetTopmostTransaction(xid);

So if the lock-taking subtransaction is committed then we will wait
directly for the top-level transaction and after that, it doesn't
matter if we abort any of the parent subtransactions, because it will
wait for the topmost transaction to complete.  And if the lock-taking
subtransaction is aborted then it will anyway stop waiting because
TransactionIdIsInProgress() should return false.
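
For reference, the loop in question has roughly this shape (paraphrased, not
the exact lmgr.c source):

```
/* paraphrased shape of the wait loop in XactLockTableWait() */
for (;;)
{
	SET_LOCKTAG_TRANSACTION(tag, xid);
	(void) LockAcquire(&tag, ShareLock, false, false);
	LockRelease(&tag, ShareLock, false);

	if (!TransactionIdIsInProgress(xid))
		break;					/* xid aborted or committed: stop waiting */

	/* xid was a finished subtransaction: wait on its (topmost) parent */
	xid = SubTransGetTopmostTransaction(xid);
}
```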

2.
/*
 * Notice that we update pg_subtrans with the top-level xid, rather than
 * the parent xid. This is a difference between normal processing and
 * recovery, yet is still correct in all cases. The reason is that
 * subtransaction commit is not marked in clog until commit processing, so
 * all aborted subtransactions have already been clearly marked in clog.
 * As a result we are able to refer directly to the top-level
 * transaction's state rather than skipping through all the intermediate
 * states in the subtransaction tree. This should be the first time we
 * have attempted to SubTransSetParent().
 */
for (i = 0; i < nsubxids; i++)
SubTransSetParent(subxids[i], topxid);

I think this comment needs some modification because in this patch now
in normal processing also we are setting the topxid as a parent right?

3.
+while (TransactionIdIsValid(parentXid))
+{
+previousXid = parentXid;
+
+/*
+ * Stop as soon as we are earlier than the cutoff. This saves multiple
+ * lookups against subtrans when we have a deeply nested subxid with
+ * a later snapshot with an xmin much higher than TransactionXmin.
+ */
+if (TransactionIdPrecedes(parentXid, cutoff_xid))
+{
+*xid = previousXid;
+return true;
+}
+parentXid = SubTransGetParent(parentXid);

Do we need this while loop if we are directly setting topxid as a
parent, so with that, we do not need multiple iterations to go to the
top xid?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




pg_upgrade major version compatibility

2022-09-06 Thread Christoph Berg
The pg_upgrade manpage in PG 14 and earlier claims that upgrades from
8.4 are supported, but that doesn't work:

/usr/lib/postgresql/14/bin/pg_upgrade -b /usr/lib/postgresql/8.4/bin -B 
/usr/lib/postgresql/14/bin -p 5432 -P 5433 -d /var/lib/postgresql/8.4/upgr -o 
-D /etc/postgresql/8.4/upgr -D /etc/postgresql/14/upgr
Finding the real data directory for the target cluster  ok
Performing Consistency Checks
-
Checking cluster versions   ok
The source cluster lacks some required control information:
  latest checkpoint oldestXID

Cannot continue without required control information, terminating
Failure, exiting

8.4 -> 14/13/12/11/10/9.6 are all broken in the same way (using the
target version's pg_upgrade of course)

9.0 -> 14 and 8.4 -> 9.5 work.

8.4 -> 15 "works" in the sense of that the non-support is correctly
documented in the manpage and in the pg_upgrade output:

/usr/lib/postgresql/15/bin/pg_upgrade -b /usr/lib/postgresql/8.4/bin -B 
/usr/lib/postgresql/15/bin -p 5432 -P 5433 -d /var/lib/postgresql/8.4/upgr -o 
-D /etc/postgresql/8.4/upgr -D /etc/postgresql/15/upgr
Finding the real data directory for the target cluster  ok
Performing Consistency Checks
-
Checking cluster versions
This utility can only upgrade from PostgreSQL version 9.2 and later.
Failure, exiting


Is that failure intentional, and just not documented properly, or is
that a bug?

Christoph




Re: [PATCH] Tab completion for SET COMPRESSION

2022-09-06 Thread Aleksander Alekseev
Hi Shinya,

> A minor modification has been made so that the composite type is also
> completed after "ALTER TABLE  OF".

LGTM. Here is v3 created with `git format-patch`. Unlike v2 it also
includes the commit message.

-- 
Best regards,
Aleksander Alekseev


v3-0001-Add-psql-tab-compression-for-OF-NOT-OF-with-ALTER.patch
Description: Binary data


Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-09-06 Thread Simon Riggs
On Tue, 6 Sept 2022 at 12:37, Dilip Kumar  wrote:
>
> On Tue, Aug 30, 2022 at 10:16 PM Simon Riggs
>  wrote:
>
> > PFA two patches, replacing earlier work
> > 001_new_isolation_tests_for_subxids.v3.patch
> > 002_minimize_calls_to_SubTransSetParent.v8.patch
> >
> > 001_new_isolation_tests_for_subxids.v3.patch
> > Adds new test cases to master without adding any new code, specifically
> > addressing the two areas of code that are not tested by existing tests.
> > This gives us a baseline from which we can do test driven development.
> > I'm hoping this can be reviewed and committed fairly smoothly.
> >
> > 002_minimize_calls_to_SubTransSetParent.v8.patch
> > Reduces the number of calls to subtrans below 1% for the first 64 subxids,
> > so overall will substantially reduce subtrans contention on master for the
> > typical case, as well as smoothing the overflow case.
> > Some discussion needed on this; there are various options.
> > This combines the work originally posted here with another patch posted on 
> > the
> > thread "Smoothing the subtrans performance catastrophe".
> >
> > I will do some performance testing also, but more welcome.
>
> Thanks for the updated patch, I have some questions/comments.

Thanks for the review.

> 1.
> + * This has the downside that anyone waiting for a lock on 
> aborted
> + * subtransactions would not be released immediately; that may or
> + * may not be an acceptable compromise. If not acceptable, this
> + * simple call needs to be replaced with a loop to register the
> + * parent for the current subxid stack, so we can walk
> back up it to
> + * the topxid.
> + */
> +SubTransSetParent(subxid, GetTopTransactionId());
>
> I do not understand in which situation we will see this downside.  I
> mean if we see the logic of XactLockTableWait() then in the current
> situation also if the subtransaction is committed we directly wait on
> the top transaction by calling SubTransGetTopmostTransaction(xid);
>
> So if the lock-taking subtransaction is committed then we will wait
> directly for the top-level transaction and after that, it doesn't
> matter if we abort any of the parent subtransactions, because it will
> wait for the topmost transaction to complete.  And if the lock-taking
> subtransaction is aborted then it will anyway stop waiting because
> TransactionIdIsInProgress() should return false.

Yes, correct.

> 2.
> /*
>  * Notice that we update pg_subtrans with the top-level xid, rather than
>  * the parent xid. This is a difference between normal processing and
>  * recovery, yet is still correct in all cases. The reason is that
>  * subtransaction commit is not marked in clog until commit processing, so
>  * all aborted subtransactions have already been clearly marked in clog.
>  * As a result we are able to refer directly to the top-level
>  * transaction's state rather than skipping through all the intermediate
>  * states in the subtransaction tree. This should be the first time we
>  * have attempted to SubTransSetParent().
>  */
> for (i = 0; i < nsubxids; i++)
> SubTransSetParent(subxids[i], topxid);
>
> I think this comment needs some modification because in this patch now
> in normal processing also we are setting the topxid as a parent right?

Correct

> 3.
> +while (TransactionIdIsValid(parentXid))
> +{
> +previousXid = parentXid;
> +
> +/*
> + * Stop as soon as we are earlier than the cutoff. This saves 
> multiple
> + * lookups against subtrans when we have a deeply nested subxid with
> + * a later snapshot with an xmin much higher than TransactionXmin.
> + */
> +if (TransactionIdPrecedes(parentXid, cutoff_xid))
> +{
> +*xid = previousXid;
> +return true;
> +}
> +parentXid = SubTransGetParent(parentXid);
>
> Do we need this while loop if we are directly setting topxid as a
> parent, so with that, we do not need multiple iterations to go to the
> top xid?

Correct. I think we can dispense with
SubTransGetTopmostTransactionPrecedes() entirely.
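
For the record, a minimal sketch of what the lookup could collapse to once
pg_subtrans always records the topmost xid as the parent; the function name
and cutoff handling below are illustrative only, not the patch's actual code:

```c
/*
 * Illustrative sketch only.  If pg_subtrans stores the topmost xid as
 * the parent of every subxid, a single lookup replaces the walk up the
 * chain performed by the hunk quoted above.
 */
static bool
TopmostXidPrecedesCutoff(TransactionId xid, TransactionId cutoff_xid)
{
	TransactionId parent = SubTransGetParent(xid);

	/* If no parent is recorded, xid is itself a top-level xid. */
	if (!TransactionIdIsValid(parent))
		parent = xid;

	return TransactionIdPrecedes(parent, cutoff_xid);
}
```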

I was initially trying to leave options open but that is confusing and
as a result, some parts are misleading after I merged the two patches.

I will update the patch, thanks for your scrutiny.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Modernizing our GUC infrastructure

2022-09-06 Thread Robert Haas
On Tue, Sep 6, 2022 at 1:43 AM Tom Lane  wrote:
> I think there's a good analogy here to temporary tables.  The SQL
> spec says that temp-table schemas are persistent and database-wide,
> but what we actually have is that they are session-local.  People
> occasionally propose that we implement the SQL semantics for that,
> but in the last twenty-plus years no one has bothered to write a
> committable patch to support it ... much less remove the existing
> behavior in favor of that, which I'm pretty sure no one would think
> is a good idea.

Well, I've thought about doing this a few times, but it's a real pain
in the neck, primarily because we store metadata that needs to be
per-instantiation in the catalog rows: relfrozenxid, relminmxid, and
the relation statistics. So I'm not sure "no one has bothered" is
quite the right way to characterize it. "no one has been able to
adequately untangle the mess" might be more accurate.

> So, is it actually a good idea to have persistent metadata for
> session variables?  I'd say that the issue is at best debatable,
> and at worst proven wrong by a couple of decades of experience.
> In what way are session variables less mutable than temp tables?

I haven't looked at that patch at all, but I would assume that
variables would have SQL types, and that we would never add GUCs with
SQL types, which seems like a pretty major semantic difference.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




psql -l and locales (Re: pgsql: Add option to use ICU as global locale provider)

2022-09-06 Thread Christoph Berg
Re: Tom Lane
> Christoph Berg  writes:
> > A possible solution might be to rip out all the locale columns except
> > "Encoding" from \l, and leave them in place for \l+.
> 
> I'd rather see a single column summarizing the locale situation.
> Perhaps it could be COALESCE(daticulocale, datcollate), or
> something using a CASE on datlocprovider?
> Then \l+ could replace that with all the underlying columns.

Fwiw I still think the default psql -l output should be more concise.
Any chance to have that happen for PG15?

I can try creating a patch if it has chances of getting through.

Christoph




Re: predefined role(s) for VACUUM and ANALYZE

2022-09-06 Thread Robert Haas
On Mon, Sep 5, 2022 at 2:56 PM Nathan Bossart  wrote:
> There are 2 bits remaining at the moment, so I didn't redesign the ACL
> system in the attached patch.  However, I did some research on a couple
> options.  Using a distinct set of bits for each catalog table should free
> up a handful of bits, which should indeed kick the can down the road a
> little.  Another easy option is to simply make AclMode a uint64, which
> would immediately free up another 16 privilege bits.  I was able to get
> this approach building and passing tests in a few minutes, but there might
> be performance/space concerns.

I believe Tom has expressed such concerns in the past, but it is not
clear to me that they are well-founded. I don't think we have much
code that manipulates large numbers of aclitems, so I can't quite see
where the larger size would be an issue. There may well be some
places, so I'm not saying that Tom or anyone else with concerns is
wrong, but I'm just having a hard time thinking of where it would be a
real issue.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: HOT chain validation in verify_heapam()

2022-09-06 Thread Aleksander Alekseev
Hi Himanshu,

Many thanks for working on this!

> Please find attached the patch with the above idea of HOT chain's validation

Please correct me if I'm wrong, but don't we have a race condition here:

```
+if ((TransactionIdDidAbort(pred_xmin) ||
TransactionIdIsInProgress(pred_xmin))
+&& !TransactionIdEquals(pred_xmin, curr_xmin))
 {
```

The scenario that concerns me is the following:

1. TransactionIdDidAbort(pred_xmin) returns false
2. The transaction aborts
3. TransactionIdIsInProgress(pred_xmin) returns false
4. (false || false) gives us false. An error is reported, although
actually the condition should have been true.

-- 
Best regards,
Aleksander Alekseev




Re: Proposal to provide the facility to set binary format output for specific OID's per session

2022-09-06 Thread Dave Cramer
On Tue, 6 Sept 2022 at 02:30, Ibrar Ahmed  wrote:

>
>
> On Fri, Aug 12, 2022 at 5:48 PM Dave Cramer  wrote:
>
>>
>>
>> On Fri, 5 Aug 2022 at 17:51, Justin Pryzby  wrote:
>>
>>> On Tue, Jul 26, 2022 at 08:11:04AM -0400, Dave Cramer wrote:
>>> > Attached patch to correct these deficiencies.
>>>
>>> You sent a patch to be applied on top of the first patch, but cfbot
>>> doesn't
>>> know that, so it says the patch doesn't apply.
>>> http://cfbot.cputube.org/dave-cramer.html
>>>
>>> BTW, a previous discussion about this idea is here:
>>>
>>> https://www.postgresql.org/message-id/flat/40cbb35d-774f-23ed-3079-03f938aac...@2ndquadrant.com
>>
>>
>> squashed patch attached
>>
>> Dave
>>
> The patch does not apply successfully; a rebase is required.
>
> === applying patch ./0001-add-format_binary.patch
> patching file src/backend/tcop/postgres.c
> Hunk #1 succeeded at 97 (offset -8 lines).
> patching file src/backend/tcop/pquery.c
> patching file src/backend/utils/init/globals.c
> patching file src/backend/utils/misc/guc.c
> Hunk #1 succeeded at 144 (offset 1 line).
> Hunk #2 succeeded at 244 with fuzz 2 (offset 1 line).
> Hunk #3 succeeded at 4298 (offset -1 lines).
> Hunk #4 FAILED at 12906.
> 1 out of 4 hunks FAILED -- saving rejects to file 
> src/backend/utils/misc/guc.c.rej
> patching file src/include/miscadmin.h
>
>
>
>

Thanks,

New rebased patch attached

Dave

>


0001-add-format_binary.patch
Description: Binary data


Re: HOT chain validation in verify_heapam()

2022-09-06 Thread Aleksander Alekseev
Hi again,

> I am looking into the process of adding the TAP test for these changes and 
> finding a way to corrupt a page in the tap test

Please note that currently the patch breaks many existing tests. I
suggest fixing these first.

For the details please see the cfbot report [1] or execute the tests
locally. Personally I'm using a little script for this [2].

[1]: http://cfbot.cputube.org/
[2]: https://github.com/afiskon/pgscripts/blob/master/full-build.sh

-- 
Best regards,
Aleksander Alekseev




Re: Tracking last scan time

2022-09-06 Thread Dave Page
Hi

On Thu, 1 Sept 2022 at 19:35, Andres Freund  wrote:

> Hi,
>
> On 2022-09-01 14:18:42 +0200, Matthias van de Meent wrote:
> > On Wed, 31 Aug 2022 at 20:56, Andres Freund  wrote:
> > > But given this is done when stats are flushed, which only happens
> after the
> > > transaction ended, we can just use
> GetCurrentTransactionStopTimestamp() - if
> > > we got to flushing the transaction stats we'll already have computed
> that.
> >
> > I'm not entirely happy with that, as that would still add function
> > call overhead, and potentially still call GetCurrentTimestamp() in
> > this somewhat hot loop.
>
> We already used GetCurrentTransactionStopTimestamp() (as you reference
> below)
> before we get to this point, so I doubt that we'll ever call
> GetCurrentTimestamp(). And it's hard to imagine that the function call
> overhead of GetCurrentTransactionStopTimestamp() matters compared to
> acquiring
> locks etc.


Vik and I looked at this a little, and found that we actually don't generally
have GetCurrentTransactionStopTimestamp() set at this point - a
simple 'select * from pg_class' will result in 9 passes of this code, none
of which have xactStopTimestamp != 0.

After discussing it a little, we came to the conclusion that for the stated
use case, xactStartTimestamp is actually accurate enough, provided that we
only ever update it with a newer value. It would only likely be in extreme
edge-cases where the difference between start and end transaction time
would have any bearing on whether or not one might drop a table/index for
lack of use.
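
As a rough sketch of that rule - record the transaction start time, but never
move the stored value backwards - with an illustrative field name rather than
the patch's actual one:

```c
/*
 * Sketch only: update the last-scan timestamp from the transaction start
 * time, never moving it backwards.  "lastscan" is an illustrative name.
 */
static void
pgstat_update_last_scan(TimestampTz *lastscan)
{
	TimestampTz scan_ts = GetCurrentTransactionStartTimestamp();

	if (scan_ts > *lastscan)
		*lastscan = scan_ts;
}
```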

Doing it this way also means we no longer need the GUC to enable the
feature, which as Bruce notes, is likely to lose 95% of users.

Updated patch attached:

- GUC removed.
- The timestamp recorded is xactStartTimestamp.
- Docs updated to make it clear we're recording transaction start time.

-- 
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: https://www.enterprisedb.com


last_scan_v3.diff
Description: Binary data


Re: HOT chain validation in verify_heapam()

2022-09-06 Thread Aleksander Alekseev
Hi hackers,

> Please correct me if I'm wrong, but don't we have a race condition here:
>
> ```
> +if ((TransactionIdDidAbort(pred_xmin) ||
> TransactionIdIsInProgress(pred_xmin))
> +&& !TransactionIdEquals(pred_xmin, curr_xmin))
>  {
> ```
>
> The scenario that concerns me is the following:
>
> 1. TransactionIdDidAbort(pred_xmin) returns false
> 2. The transaction aborts
> 3. TransactionIdIsInProgress(pred_xmin) returns false
> 4. (false || false) gives us false. An error is reported, although
> actually the condition should have been true.

It looks like I had a slight brain fade here.

In order to report a false error either TransactionIdDidAbort() or
TransactionIdIsInProgress() should return true and
TransactionIdEquals() should be false. So actually under rare
conditions the error will NOT be reported while it should. Other than
that we seem to be safe from the concurrency perspective, unless I'm
missing something again.

Personally I don't have a strong opinion on whether we should bother
about this scenario. Probably an explicit comment will not hurt.

Also I suggest checking TransactionIdEquals() first though since it's cheaper.

-- 
Best regards,
Aleksander Alekseev




Re: pg15b3: recovery fails with wal prefetch enabled

2022-09-06 Thread Jonathan S. Katz

On 9/5/22 10:03 PM, Thomas Munro wrote:

On Tue, Sep 6, 2022 at 1:51 PM Tom Lane  wrote:

"Jonathan S. Katz"  writes:

On 9/5/22 7:18 PM, Thomas Munro wrote:

Well I was about to commit this, but beta4 just got stamped (but not
yet tagged).  I see now that Jonathan (with RMT hat on, CC'd) meant
commits should be in by the *start* of the 5th AoE, not the end.  So
the procedural/RMT question is whether it's still possible to close
this item in beta4.



Presumably because Tom stamped it, the release is wrapped so it
wouldn't make Beta 4, but I defer to him to see if it can be included
with the tag.


I already made the tarballs available to packagers, so adding this
would involve a re-wrap and great confusion.  In any case, I'm not
a fan of pushing fixes within a day or two of the wrap deadline,
let alone after it; you get inadequate buildfarm coverage when you
cut corners that way.  I think this one missed the boat.


Got it.  Yeah I knew it was going to be a close thing with a problem
diagnosed on Thursday/Friday before a Monday wrap, even before I
managed to confuse myself about dates and times.  Thanks both.


To close this loop, I added a section for "fixed before RC1" to Open 
Items since this is presumably the next release. We can include it there 
once committed.


Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Modernizing our GUC infrastructure

2022-09-06 Thread Tom Lane
Robert Haas  writes:
> On Tue, Sep 6, 2022 at 1:43 AM Tom Lane  wrote:
>> I think there's a good analogy here to temporary tables.  The SQL
>> spec says that temp-table schemas are persistent and database-wide,
>> but what we actually have is that they are session-local.

> Well, I've thought about doing this a few times, but it's a real pain
> in the neck, primarily because we store metadata that needs to be
> per-instantiation in the catalog rows: relfrozenxid, relminmxid, and
> the relation statistics. So I'm not sure "no one has bothered" is
> quite the right way to characterize it. "no one has been able to
> adequately untangle the mess" might be more accurate.

I could agree on "no one has thought it was worth the work".  It could
be made to happen if we were sufficiently motivated, but we aren't.
I believe a big chunk of the reason is that the SQL semantics are not
obviously better than what we have.  And some of the advantages they
do have, like less catalog thrashing, wouldn't apply in the session
variable case.

> I haven't looked at that patch at all, but I would assume that
> variables would have SQL types, and that we would never add GUCs with
> SQL types, which seems like a pretty major semantic difference.

Yeah, I do not think we'd want to extend GUCs beyond the existing
bool/int/float/string cases, since they have to be readable under
non-transactional circumstances.  Having said that, that covers
an awful lot of practical territory.  Schema variables of
arbitrary SQL types sound cool, sure, but how many real use cases
are there that can't be met with the GUC types?

I think a large part of the reason the schema-variables patch
has gone sideways for so many years is that it's an ambitious
overdesign.

regards, tom lane




Re: pgsql: Add ALTER SUBSCRIPTION ... SKIP.

2022-09-06 Thread Masahiko Sawada
On Sun, Sep 4, 2022 at 8:39 PM Amit Kapila  wrote:
>
> On Sun, Sep 4, 2022 at 1:48 PM Alvaro Herrera  wrote:
> >
> > On 2022-Mar-22, Amit Kapila wrote:
> >
> > > Add ALTER SUBSCRIPTION ... SKIP.
> >
> > There are two messages here that seem oddly worded.
> >
> > msgid "start skipping logical replication transaction finished at %X/%X"
> > msgid "done skipping logical replication transaction finished at %X/%X"
> >
> > Two complaints here.  First, the phrases "start / finished" and "done /
> > finished" look very strange.  It took me a while to realize that
> > "finished" refers to the LSN, not to the skipping operation.  Do we ever
> > talk about a transaction "finished at XYZ" as opposed to a transaction
> > whose LSN is XYZ?  (This became particularly strange when I realized
> > that the LSN might come from a PREPARE.)
> >
>
> The reason to add "finished at ..." was to be explicit about whether
> it is a starting LSN or an end LSN of a transaction. We do have such
> differentiation in ReorderBufferTXN (first_lsn ... end_lsn).
>
> > Second, "logical replication transaction".  Is it not a regular
> > transaction that we happen to be processing via logical replication?
> >
> > I think they should say something like
> >
> > "logical replication starts skipping transaction with LSN %X/%X"
> > "logical replication completed skipping transaction with LSN %X/%X"
> >
>
> This looks better to me.

+1

> If you find the above argument to
> differentiate between the start and end LSN convincing then we can
> think of replacing "with" in the above messages with "finished at". I
> see your point related to using "finished at" for PREPARE may not be a
> good idea but I don't have better ideas for the same.

Given that the user normally doesn't need to be aware of the
difference between start LSN and end LSN in the context of using this
feature, I think we can use "with LSN %X/%X".

Regards,

-- 
Masahiko Sawada




Re: Return value of PathNameOpenFile()

2022-09-06 Thread Tom Lane
Daniel Gustafsson  writes:
> Agreed, zero should be valid as it's a non-negative integer.  However, callers
> in fd.c are themselves checking for (fd <= 0) in some cases, and some have 
> done
> so since the very early days of the codebase, so I wonder if there 
> historically
> used to be a platform which considered 0 an invalid fd?

I'm betting it's a thinko that never got caught because 0 would
always be taken up by stdin.  Maybe you'd notice if you tried to
close-and-reopen stdin, but that's not something the server ever does.

regards, tom lane




Re: (doc patch) psql version compatibility

2022-09-06 Thread Tom Lane
Christoph Berg  writes:
> I didn't understand the current wording of the NOTES section in
> psql(1) on which major versions psql is compatible with, so here's a
> patch to make that more explicit.

This seems both very repetitive and incorrect in detail.
Some operations will work fine with older servers, some won't.
I don't think that we want to undertake documenting exactly
which commands work how far back, so my inclination is to
add a nonspecific disclaimer along the lines of

   Some commands may not work with very old servers (before 9.2).

regards, tom lane




Re: Modernizing our GUC infrastructure

2022-09-06 Thread Pavel Stehule
Hi


> I think a large part of the reason the schema-variables patch
> has gone sideways for so many years is that it's an ambitious
> overdesign.
>

Over the last two weeks this patch has become shorter and shorter. I removed
a large part related to checking type consistency, because I can do this
check more easily - and the other work is done by dependencies.

Big thanks to Julien - he did a lot of work and showed me a lot of issues
and possibilities for how to fix them. With Julien's work this patch has
moved forward. For years before that it was just a prototype.

This patch is not too complex - the important part is session_variable.c
with 1500 lines, and it is mostly simple code - storing values in a hash
table, and cleaning the hash table on sinval or on transaction end or abort,
plus a debug routine.

[pavel@localhost commands]$ cloc session_variable.c
   1 text file.
   1 unique file.
   0 files ignored.

github.com/AlDanial/cloc v 1.90  T=0.02 s (50.0 files/s, 77011.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                1            257            463            820
-------------------------------------------------------------------------------

In the other files the code is more or less mechanical.





>
> regards, tom lane
>
>
>


Re: (doc patch) psql version compatibility

2022-09-06 Thread Christoph Berg
Re: Tom Lane
> Christoph Berg  writes:
> > I didn't understand the current wording of the NOTES section in
> > psql(1) on which major versions psql is compatible with, so here's a
> > patch to make that more explicit.
> 
> This seems both very repetitive and incorrect in detail.

Meh, I tried to preserve as much of the original text as reasonable,
but we can strip it down further.

> Some operations will work fine with older servers, some won't.
> I don't think that we want to undertake documenting exactly
> which commands work how far back, so my inclination is to
> add a nonspecific disclaimer along the lines of
> 
>Some commands may not work with very old servers (before 9.2).

I'd like it to say "it works with 9.2" when that's what the code is
implementing.

How about this?

  psql works with servers of the same
   or an older major version, back to 9.2. The general
   functionality of running SQL commands and displaying query results
   should also work with servers of other major versions, but
   backslash commands are particularly likely to fail.
  
  
   If you want to use psql to connect to several
   servers of different major versions, it is recommended that you use the
   newest version of psql.  (To connect to 
pre-9.2 servers,
   use psql 14 or earlier.)
  


Christoph
>From 2d83aaec5f398de041467b4fe78731401847d6a5 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Tue, 6 Sep 2022 13:21:48 +0200
Subject: [PATCH] psql-ref.sgml: Be more specific about major versions support

Explicitly say which version range we are supporting.
---
 doc/src/sgml/ref/psql-ref.sgml | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 186f8c506a..9e972f0c48 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4946,24 +4946,17 @@ PSQL_EDITOR_LINENUMBER_ARG='--line '
 
 
   
-  psql works best with servers of the same
-   or an older major version.  Backslash commands are particularly likely
-   to fail if the server is of a newer version than psql
-   itself.  However, backslash commands of the \d family should
-   work with servers of versions back to 9.2, though not necessarily with
-   servers newer than psql itself.  The general
+  psql works with servers of the same
+   or an older major version, back to 9.2. The general
functionality of running SQL commands and displaying query results
-   should also work with servers of a newer major version, but this cannot
-   be guaranteed in all cases.
+   should also work with servers of other major versions, but
+   backslash commands are particularly likely to fail.
   
   
If you want to use psql to connect to several
servers of different major versions, it is recommended that you use the
-   newest version of psql.  Alternatively, you
-   can keep around a copy of psql from each
-   major version and be sure to use the version that matches the
-   respective server.  But in practice, this additional complication should
-   not be necessary.
+   newest version of psql.  (To connect to pre-9.2 servers,
+   use psql 14 or earlier.)
   
   
 
-- 
2.35.1



Re: allowing for control over SET ROLE

2022-09-06 Thread Robert Haas
On Sat, Sep 3, 2022 at 3:46 PM Jeff Davis  wrote:
> > You are now connected to database "rhaas" as user "rhaas".
> > rhaas=# grant pg_read_all_settings to seer with set false;
> > GRANT ROLE
>
> You've defined this in terms of the mechanics -- allow SET ROLE or not
> -- but I assume you intend it as a security feature to allow/forbid
> some capabilities.
>
> Is this only about the capability to create objects owned by a role
> you're a member of? Or are there other implications?

I think there are some other implications, but I don't think they're
anything super-dramatic. For example, you could create a group that's
just for purposes of pg_hba.conf matching and make the grants both SET
FALSE and INHERIT FALSE, with the idea that the members shouldn't have
any access to the role; it's just there for grouping purposes. I
mentioned one other possible scenario, with oncallbot, in the original
post.

I'm not sure whether thinking about this in terms of security
capabilities is the most helpful way to view it. My view was, as you
say, more mechanical. I think sometimes you grant somebody a role and
you want them to be able to use SET ROLE to assume the privileges of
the target role, and sometimes you don't. I think that primarily
depends on the reason why you made the grant. In the case of a
predefined role, you're almost certainly granting membership so that
the privileges of the predefined role can be inherited. In other
cases, you may be doing it so that the member can SET ROLE to the
target role, or you may be doing it so that the member can administer
the role (because you give them ADMIN OPTION), or you may even be
doing it for pg_hba.conf matching.

And because of this, I think it follows that there may be some
capabilities conferred by role membership that you don't really want
to convey in particular cases, so I think it makes sense to have a way
to avoid conveying the ones that aren't necessary for the grant to
fulfill its purpose. I'm not exactly sure how far that gets you in
terms of building a system that is more secure than what you could
build otherwise, but it feels like a useful capability regardless.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: predefined role(s) for VACUUM and ANALYZE

2022-09-06 Thread Stephen Frost
Greetings,

* Nathan Bossart (nathandboss...@gmail.com) wrote:
> On Tue, Aug 23, 2022 at 07:46:47PM -0400, Stephen Frost wrote:
> > I've long felt that we should redefine the way the ACLs work to have a
> > distinct set of bits for each object type.  We don't need to support a
> > CONNECT bit on a table, yet we do today and we expend quite a few bits
> > in that way.  Having that handled on a per-object-type basis instead
> > would allow us to get quite a bit more mileage out of the existing 32bit
> > field before having to introduce more complicated storage methods like
> > using a bit to tell us to go look up more ACLs somewhere else.
> 
> There are 2 bits remaining at the moment, so I didn't redesign the ACL
> system in the attached patch.  However, I did some research on a couple
> options.  Using a distinct set of bits for each catalog table should free
> up a handful of bits, which should indeed kick the can down the road a
> little.  Another easy option is to simply make AclMode a uint64, which
> would immediately free up another 16 privilege bits.  I was able to get
> this approach building and passing tests in a few minutes, but there might
> be performance/space concerns.

Considering our burn rate of ACL bits is really rather slow (2 this
year, but prior to that was TRUNCATE in 2008 and CONNECT in 2006), I'd
argue that moving away from the current one-size-fits-all situation
would kick the can down the road more than just 'a little' and wouldn't
have any performance or space concerns.  Once we actually get to the
point where we've burned through all of those after the next few decades
then we can move to a uint64 or something else more complicated,
perhaps.

If we were to make the specific bits depend on the object type as I'm
suggesting, then we'd have 8 bits used for relations (10 with the vacuum
and analyze bits), leaving us with 6 remaining inside the existing
uint32, or more bits available than we've ever used since the original
implementation from what I can tell, or at least 15+ years.  That seems
like pretty darn good future-proofing without a lot of complication or
any change in physical size.  We would also be able to get rid of the
question of "well, is it more valuable to add the ability to GRANT
TRUNCATE on a relation, or GRANT CONNECT on databases" or other rather
odd debates between ultimately very different things.
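
To make the bit accounting concrete, here is a purely hypothetical sketch of
per-object-type bit assignments; none of these defines exist, and the names
and positions are illustrative only:

```c
/* Relation privileges get their own numbering ... */
#define ACL_REL_INSERT       (1 << 0)
#define ACL_REL_SELECT       (1 << 1)
#define ACL_REL_UPDATE       (1 << 2)
#define ACL_REL_DELETE       (1 << 3)
#define ACL_REL_TRUNCATE     (1 << 4)
#define ACL_REL_REFERENCES   (1 << 5)
#define ACL_REL_TRIGGER      (1 << 6)
#define ACL_REL_VACUUM       (1 << 7)
#define ACL_REL_ANALYZE      (1 << 8)

/* ... while database privileges reuse the same low positions. */
#define ACL_DB_CREATE        (1 << 0)
#define ACL_DB_CONNECT       (1 << 1)
#define ACL_DB_TEMPORARY     (1 << 2)
```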

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Modernizing our GUC infrastructure

2022-09-06 Thread Robert Haas
On Tue, Sep 6, 2022 at 10:05 AM Tom Lane  wrote:
> > I haven't looked at that patch at all, but I would assume that
> > variables would have SQL types, and that we would never add GUCs with
> > SQL types, which seems like a pretty major semantic difference.
>
> Yeah, I do not think we'd want to extend GUCs beyond the existing
> bool/int/float/string cases, since they have to be readable under
> non-transactional circumstances.  Having said that, that covers
> an awful lot of practical territory.  Schema variables of
> arbitrary SQL types sound cool, sure, but how many real use cases
> are there that can't be met with the GUC types?

Well, if you use an undefined custom GUC, you're just going to get a
string data type, I believe, which is pretty well equivalent to not
having any type checking at all. You could extend that in some way to
allow users to create dummy GUCs of any type supported by the
mechanism, but I think that's mostly stacking one hack on top of
another. I believe there's good evidence that users want variables
based on SQL data types, whereas I can't see any reason why users
would want variables based on GUC data types. It is of course true that the
GUC data types cover the cases people are most likely to want, but
that's just because it covers the most generally useful data types. If
you can want to pass an integer between one part of your application
and another, why can't you want to pass a numeric or a bytea? I think
you can, and I think people do.

This is not really an endorsement of the SQL variables patch, which I
haven't studied and which for all I know may have lots of problems,
either as to design or as to implementation. But I think it's a little
crazy to pretend that the ability to store strings - or even values of
any GUC type - into a fictional GUC is an adequate substitute for SQL
variables. Honestly, the fact that you can do that in the first place
seems more like an undesirable wart necessitated by the way loadable
modules interact with the GUC system than a feature -- but even if it
were a feature, it's not the same feature.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




New docs chapter on Transaction Management and related changes

2022-09-06 Thread Simon Riggs
New chapter on transaction management, plus a few related changes.

Markup and links are not polished yet, so please comment initially on
the topics, descriptions and wording.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


xact_docs.v2.patch
Description: Binary data


Re: predefined role(s) for VACUUM and ANALYZE

2022-09-06 Thread Robert Haas
On Tue, Sep 6, 2022 at 11:11 AM Stephen Frost  wrote:
> Considering our burn rate of ACL bits is really rather slow (2 this
> year, but prior to that was TRUNCATE in 2008 and CONNECT in 2006), I'd
> argue that moving away from the current one-size-fits-all situation
> would kick the can down the road more than just 'a little' and wouldn't
> have any performance or space concerns.  Once we actually get to the
> point where we've burned through all of those after the next few decades
> then we can move to a uint64 or something else more complicated,
> perhaps.

Our burn rate is slow because there's been a lot of pushback - mostly
from Tom - about consuming the remaining bits. It's not because people
haven't had ideas about how to use them up.

> If we were to make the specific bits depend on the object type as I'm
> suggesting, then we'd have 8 bits used for relations (10 with the vacuum
> and analyze bits), leaving us with 6 remaining inside the existing
> uint32, or more bits available than we've ever used since the original
> implementation from what I can tell, or at least 15+ years.  That seems
> like pretty darn good future-proofing without a lot of complication or
> any change in physical size.  We would also be able to get rid of the
> question of "well, is it more valuable to add the ability to GRANT
> TRUNCATE on a relation, or GRANT CONNECT on databases" or other rather
> odd debates between ultimately very different things.

I mostly agree with this. I don't think it's entirely clear how we
should try to get more bits going forward, but it's clear that we
cannot just forever hold our breath and refuse to find any more bits.
And of the possible ways of doing it, this seems like the one with the
lowest impact, so I think it likely makes sense to do this one first.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Tracking last scan time

2022-09-06 Thread Andres Freund
Hi,

On 2022-09-06 14:15:56 +0100, Dave Page wrote:
> Vik and I looked at this a little, and found that we actually don't generally
> have GetCurrentTransactionStopTimestamp() set at this point - a
> simple 'select * from pg_class' will result in 9 passes of this code, none
> of which have xactStopTimestamp != 0.

Huh, pgstat_report_stat() has used GetCurrentTransactionStopTimestamp() for a
long time. Wonder when that was broken. Looks like it's set only when an
xid is assigned. We should fix this.


> After discussing it a little, we came to the conclusion that for the stated
> use case, xactStartTimestamp is actually accurate enough, provided that we
> only ever update it with a newer value. It would only likely be in extreme
> edge-cases where the difference between start and end transaction time
> would have any bearing on whether or not one might drop a table/index for
> lack of use.

I don't at all agree with this. Since we already use
GetCurrentTransactionStopTimestamp() in this path we should fix it.

Greetings,

Andres Freund




Re: predefined role(s) for VACUUM and ANALYZE

2022-09-06 Thread Nathan Bossart
On Tue, Sep 06, 2022 at 11:24:18AM -0400, Robert Haas wrote:
> On Tue, Sep 6, 2022 at 11:11 AM Stephen Frost  wrote:
>> If we were to make the specific bits depend on the object type as I'm
>> suggesting, then we'd have 8 bits used for relations (10 with the vacuum
>> and analyze bits), leaving us with 6 remaining inside the existing
>> uint32, or more bits available than we've ever used since the original
>> implementation from what I can tell, or at least 15+ years.  That seems
>> like pretty darn good future-proofing without a lot of complication or
>> any change in physical size.  We would also be able to get rid of the
>> question of "well, is it more valuable to add the ability to GRANT
>> TRUNCATE on a relation, or GRANT CONNECT on databases" or other rather
>> odd debates between ultimately very different things.
> 
> I mostly agree with this. I don't think it's entirely clear how we
> should try to get more bits going forward, but it's clear that we
> cannot just forever hold our breath and refuse to find any more bits.
> And of the possible ways of doing it, this seems like the one with the
> lowest impact, so I think it likely makes sense to do this one first.

+1.  My earlier note wasn't intended to suggest that one approach was
better than the other, merely that there are a couple of options to choose
from once we run out of bits.  I don't think this work needs to be tied to
the VACUUM/ANALYZE stuff, but I am interested in it and hope to take it on
at some point.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: New docs chapter on Transaction Management and related changes

2022-09-06 Thread Erik Rijkers

Op 06-09-2022 om 17:16 schreef Simon Riggs:

New chapter on transaction management, plus a few related changes.

Markup and links are not polished yet, so please comment initially on
the topics, descriptions and wording.


[xact_docs.v2.patch] 


Very clear explanations, thank you.

Two typos:

'not yet yet part'  should be
'not yet part'

'extenal'  should be
'external'


Erik




Re: pg_auth_members.grantor is bunk

2022-09-06 Thread Robert Haas
On Fri, Sep 2, 2022 at 6:01 PM Jeff Davis  wrote:
> > Yes. I figured that, when GRANTED BY is not specified, it is OK to
> > infer a valid grantor
>
> The spec is clear that the grantor should be either the current user or
> the current role. We also have a concept of INHERIT, which allows us to
> choose a role we're a member of if the current one does not suffice.
>
> But to choose a different role (the bootstrap superuser) even when the
> current (super) user role *does* suffice seems like an outright
> violation of both the spec and the principle of least surprise.

I don't think that the current superuser role suffices. For non-role
objects, privileges originate in the table owner and can then be
granted to others. Roles don't have an explicit owner, so I treated
the bootstrap superuser as the implicit owner of every role. Perhaps
there is some other way we could go here - e.g. it's been proposed by
multiple people that maybe roles should have owners - but I do not
think it is viable to regard the owner of a role as being anyone who
happens to be a superuser right at the moment. To some extent that's
related to your concern about whether ALTER USER .. NOSUPERUSER should
be fast and immune to failure, but I also think that it is a good idea
to have all of the privileges originating from a single owner. That
ensures, for example, that anyone who can act as the object owner can
revoke any privilege, which wouldn't necessarily be true if the object
had multiple owners. Now if all of the owners are themselves
superusers who all have the power to become any of the other owners
then perhaps it wouldn't end up mattering too much, but it doesn't
seem like a good idea to rely on that. In fact, part of my goal here
is to get to a world where there's less need to rely on superuser
powers to do system administration. I also just think it's less
confusing if objects have single owners rather than nebulous groups of
owners.

> > set session authorization a;
> > grant select on table t1 to b;
> >
> > At this point, b has SELECT permission on table t1 and the grantor of
> > record is p1
>
> That's because "a" does not have permision to grant select on t1, so
> INHERIT kicks in to implicitly "SET ROLE p1". What keeps INHERIT sane
> is that it only kicks in when required (i.e. it would otherwise result
> in failure).
>
> But in the case I raised, the current user is an entirely valid
> grantor, so it doesn't make sense to me to infer a different grantor.

See above, but also, see the first stanza of select_best_grantor(). If
alice is a table owner, and grants permissions to bob WITH GRANT
OPTION, and bob is a superuser and grants permissions on the table,
the grantor will be alice, not bob.

> > As to the first, the algorithm being used to select the best grantor
> > here is analogous to the one we use for privileges on other object
> > types, such as tables, namely, we prefer to create a grant that is
> > not
> > dependent on some other grant, rather than one that is.
>
> I don't quite follow. It seems like we're conflating a policy based on
> INHERIT with the policy around grants by superusers.
>
> In the case of role membership and INHERIT, our current behavior seems
> wise (and closer to the standard): to prefer a grantor that is closer
> to the current user/role, and therefore less dependent on other grants.
>
> But for the new policy around superusers, the current superuser is a
> completely valid grantor, and we instead preferring the bootstrap
> superuser. That doesn't seem consistent or wise to me.

I hope that the above comments on treating the bootstrap superuser as
the object owner explain why it works this way.

> I certainly don't want to pin every weird thing about our privilege
> system on you just because you're the last one to touch it. But your
> changes did extend the behavior, and create some new analogous
> behavior, so it seems like a reasonable time to discuss whether those
> extensions are in the right direction.

Sure.

> > When you view this in the context of how other types of grants work,
> > ALTER ROLE ... NOSUPERUSER isn't as much of a special case. Just as
> > we
> > want ALTER ROLE ... NOSUPERUSER to succeed quickly, we also insist
> > that REVOKE role1 FROM role2 to succeed quickly. It isn't allowed to
> > fail due to the existence of dependent privileges, because there
> > aren't allowed to be any dependent privileges.
>
>   create user u1;
>   create user u2;
>   create user u3;
>   grant u2 to u1 with admin option;
>   \c - u1
>   grant u2 to u3;
>   \c - bootstrap_superuser
>   revoke u2 from u1;
>   ERROR:  dependent privileges exist

Hmm, I stand corrected. I was thinking of a case in which the grant
was used to perform an action on behalf of an inherited role. Here the
grant from u2 to u3 is performed as u1 and attributed to u1.

> And the whole reason we are jumping through all of these hoops is
> because we want to allow the removal of superuser privileges quickly
> without the 

Re: New docs chapter on Transaction Management and related changes

2022-09-06 Thread Simon Riggs
On Tue, 6 Sept 2022 at 17:19, Erik Rijkers  wrote:
>
> Op 06-09-2022 om 17:16 schreef Simon Riggs:
> > New chapter on transaction management, plus a few related changes.
> >
> > Markup and links are not polished yet, so please comment initially on
> > the topics, descriptions and wording.
>
> > [xact_docs.v2.patch]
>
> Very clear explanations, thank you.
>
> Two typos:
>
> 'not yet yet part'  should be
> 'not yet part'
>
> 'extenal'  should be
> 'external'

Thanks, new version attached.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


xact_docs.v3.patch
Description: Binary data


Re: making relfilenodes 56 bits

2022-09-06 Thread Robert Haas
On Tue, Sep 6, 2022 at 4:40 AM Dilip Kumar  wrote:
> I have explored this area more and also tried to come up with a
> working prototype, so while working on this I realized that we would
> have almost to execute all the code which is getting generated as part
> of the dumpDatabase() and dumpACL() which is basically,
>
> 1. UPDATE pg_catalog.pg_database SET datistemplate = false
> 2. DROP DATABASE
> 3. CREATE DATABASE with all the database properties like ENCODING,
> LOCALE_PROVIDER, LOCALE, LC_COLLATE, LC_CTYPE, ICU_LOCALE,
> COLLATION_VERSION, TABLESPACE
> 4. COMMENT ON DATABASE
> 5. Logic inside dumpACL()
>
> I feel duplicating logic like this is really error-prone, but I do not
> find any clear way to reuse the code as dumpDatabase() has a high
> dependency on the Archive handle and generating the dump file.

Yeah, I don't think this is the way to go at all. The duplicated logic
is likely to get broken, and is also likely to annoy the next person
who has to maintain it.

I suggest that for now we fall back on making the initial
RelFileNumber for a system table equal to pg_class.oid. I don't really
love that system and I think maybe we should change it at some point
in the future, but all the alternatives seem too complicated to cram
them into the current patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: predefined role(s) for VACUUM and ANALYZE

2022-09-06 Thread Nathan Bossart
On Mon, Sep 05, 2022 at 11:56:30AM -0700, Nathan Bossart wrote:
> Here is a first attempt at allowing users to grant VACUUM or ANALYZE
> per-relation.  Overall, this seems pretty straightforward.  I needed to
> adjust the permissions logic for VACUUM/ANALYZE a bit, which causes some
> extra WARNING messages for VACUUM (ANALYZE) in some cases, but this didn't
> seem particularly worrisome.  It may be desirable to allow granting ANALYZE
> on specific columns or to allow granting VACUUM/ANALYZE at the schema or
> database level, but that is left as a future exercise.

Here is a new patch set with some follow-up patches to implement $SUBJECT.
0001 is the same as v3.  0002 simplifies some WARNING messages as suggested
upthread [0].  0003 adds the new pg_vacuum_all_tables and
pg_analyze_all_tables predefined roles.  Instead of adjusting the
permissions logic in vacuum.c, I modified pg_class_aclmask_ext() to return
the ACL_VACUUM and/or ACL_ANALYZE bits as appropriate.
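
A rough sketch of what that could look like inside pg_class_aclmask_ext();
the predefined-role OID macros and the exact placement are assumptions based
on the description above, not the patch itself:

```c
/*
 * Sketch only: members of the new predefined roles implicitly hold the
 * corresponding bits on every relation.  The ROLE_* macros are assumed
 * to be generated for the new roles, as for other predefined roles.
 */
if ((mask & ACL_VACUUM) &&
	has_privs_of_role(roleid, ROLE_PG_VACUUM_ALL_TABLES))
	result |= ACL_VACUUM;

if ((mask & ACL_ANALYZE) &&
	has_privs_of_role(roleid, ROLE_PG_ANALYZE_ALL_TABLES))
	result |= ACL_ANALYZE;
```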

[0] 
https://postgr.es/m/20220726.104712.912995710251150228.horikyota.ntt%40gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From aa4796c3925fc7675d87df310f8057e62890138f Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sat, 3 Sep 2022 23:31:38 -0700
Subject: [PATCH v4 1/3] Allow granting VACUUM and ANALYZE privileges on
 relations.

---
 doc/src/sgml/ddl.sgml | 49 ---
 doc/src/sgml/func.sgml|  3 +-
 .../sgml/ref/alter_default_privileges.sgml|  4 +-
 doc/src/sgml/ref/analyze.sgml |  3 +-
 doc/src/sgml/ref/grant.sgml   |  4 +-
 doc/src/sgml/ref/revoke.sgml  |  2 +-
 doc/src/sgml/ref/vacuum.sgml  |  3 +-
 src/backend/catalog/aclchk.c  |  8 ++
 src/backend/commands/analyze.c|  2 +-
 src/backend/commands/vacuum.c | 24 --
 src/backend/parser/gram.y |  7 ++
 src/backend/utils/adt/acl.c   | 16 
 src/bin/pg_dump/dumputils.c   |  2 +
 src/bin/pg_dump/t/002_pg_dump.pl  |  2 +-
 src/bin/psql/tab-complete.c   |  4 +-
 src/include/nodes/parsenodes.h|  4 +-
 src/include/utils/acl.h   |  6 +-
 src/test/regress/expected/dependency.out  | 22 ++---
 src/test/regress/expected/privileges.out  | 86 ++-
 src/test/regress/expected/rowsecurity.out | 34 
 src/test/regress/expected/vacuum.out  |  6 ++
 src/test/regress/sql/dependency.sql   |  2 +-
 src/test/regress/sql/privileges.sql   | 40 +
 23 files changed, 249 insertions(+), 84 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 03c0193709..ed034a6b1d 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1691,8 +1691,9 @@ ALTER TABLE products RENAME TO items;
INSERT, UPDATE, DELETE,
TRUNCATE, REFERENCES, TRIGGER,
CREATE, CONNECT, TEMPORARY,
-   EXECUTE, USAGE, SET
-   and ALTER SYSTEM.
+   EXECUTE, USAGE, SET,
+   ALTER SYSTEM, VACUUM, and
+   ANALYZE.
The privileges applicable to a particular
object vary depending on the object's type (table, function, etc.).
More detail about the meanings of these privileges appears below.
@@ -1982,7 +1983,25 @@ REVOKE ALL ON accounts FROM PUBLIC;
   
  
 
-   
+
+   
+VACUUM
+
+ 
+  Allows VACUUM on a relation.
+ 
+
+   
+
+   
+ANALYZE
+
+ 
+  Allows ANALYZE on a relation.
+ 
+
+   
+  
 
The privileges required by other commands are listed on the
reference page of the respective command.
@@ -2131,6 +2150,16 @@ REVOKE ALL ON accounts FROM PUBLIC;
   A
   PARAMETER
  
+ 
+  VACUUM
+  v
+  TABLE
+ 
+ 
+  ANALYZE
+  z
+  TABLE
+ 
  

   
@@ -2221,7 +2250,7 @@ REVOKE ALL ON accounts FROM PUBLIC;
  
  
   TABLE (and table-like objects)
-  arwdDxt
+  arwdDxtvz
   none
   \dp
  
@@ -2279,12 +2308,12 @@ GRANT SELECT (col1), UPDATE (col1) ON mytable TO miriam_rw;
would show:
 
 => \dp mytable
-  Access privileges
- Schema |  Name   | Type  |   Access privileges   |   Column privileges   | Policies
-+-+---+---+---+--
- public | mytable | table | miriam=arwdDxt/miriam+| col1:+|
-| |   | =r/miriam+|   miriam_rw=rw/miriam |
-| |   | admin=arw/miriam  |   |
+   Access privileges
+ Schema |  Name   | Type  |Access privileges|   Column privileges   | Policies
++-+---+-+---+--
+ public | mytable | table | miriam=arwdDxtvz/miriam+| col1:+|
+| |   | =r/mir

Re: HOT chain validation in verify_heapam()

2022-09-06 Thread Robert Haas
On Tue, Sep 6, 2022 at 9:38 AM Aleksander Alekseev
 wrote:
> > Please correct me if I'm wrong, but don't we have a race condition here:
> >
> > ```
> > +if ((TransactionIdDidAbort(pred_xmin) ||
> > TransactionIdIsInProgress(pred_xmin))
> > +&& !TransactionIdEquals(pred_xmin, curr_xmin))
> >  {
> > ```
>
> It looks like I had a slight brain fade here.
>
> In order to report a false error either TransactionIdDidAbort() or
> TransactionIdIsInProgress() should return true and
> TransactionIdEquals() should be false. So actually under rare
> conditions the error will NOT be reported while it should. Other than
> that we seem to be safe from the concurrency perspective, unless I'm
> missing something again.
>
> Personally I don't have a strong opinion on whether we should bother
> about this scenario. Probably an explicit comment will not hurt.
>
> Also I suggest checking TransactionIdEquals() first though since it's cheaper.

I think the check should be written like this:

!TransactionIdEquals(pred_xmin, curr_xmin) && !TransactionIdDidCommit(pred_xmin)

The TransactionIdEquals check should be done first for the reason you
state: it's cheaper.

I think that we shouldn't be using TransactionIdDidAbort() at all,
because it can sometimes return false even when the transaction
actually did abort. See test_lockmode_for_conflict() and
TransactionIdIsInProgress() for examples of logic that copes with
this. What's happening here is that TransactionIdDidAbort doesn't ever
get called if the system crashes while a transaction is running. So we
can use TransactionIdDidAbort() only as an optimization: if it returns
true, the transaction is definitely aborted, but if it returns false,
we have to check whether it's still running. If not, it aborted
anyway.

TransactionIdDidCommit() does not have the same issue. A transaction
can abort without updating CLOG if the system crashes, but it can
never commit without updating CLOG. If the transaction didn't commit,
then it is either aborted or still in progress (and we don't care
which, because neither is an error here).

As to whether the existing formulation of the test has an error
condition, you're generally right that we should test
TransactionIdIsInProgress() before TransactionIdDidCommit/Abort,
because during commit or abort, we first set the status in CLOG -
which is queried by TransactionIdDidCommit/Abort - and only afterward
update the procarray - which is queried by TransactionIdIsInProgress.
So normally TransactionIdIsInProgress should be checked first, and
TransactionIdDidCommit/Abort should only be checked if it returns
false, at which point we know that the return values of the latter
calls can't ever change. Possibly there is an argument for including
the TransactionIdInProgress check here too:

!TransactionIdEquals(pred_xmin, curr_xmin) &&
(TransactionIdIsInProgress(pred_xmin) ||
!TransactionIdDidCommit(pred_xmin))

...but I don't think it could change the answer. Most places that
check TransactionIdIsInProgress() first are concerned with MVCC
semantics, and here we are not. I think the only effects of including
or excluding the TransactionIdIsInProgress() test are (1) performance,
in that searching the procarray might avoid expense if it's cheaper
than searching clog, or add expense if the reverse is true and (2)
slightly changing the time at which we're first able to detect this
form of corruption. I am inclined to prefer the simpler form of the
test without TransactionIdIsInProgress() unless someone can say why we
shouldn't go that route.
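
In code form, the suggested condition would look roughly like this (a sketch
against the quoted check, not the final patch wording):

```c
/*
 * Sketch: cheap equality test first, then the CLOG lookup.
 * TransactionIdDidAbort() is deliberately avoided because it can return
 * false for a transaction that aborted in a crash; "did not commit"
 * covers both aborted and still-in-progress predecessors.
 */
if (!TransactionIdEquals(pred_xmin, curr_xmin) &&
	!TransactionIdDidCommit(pred_xmin))
{
	/* report HOT chain corruption for this tuple */
}
```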

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: allowing for control over SET ROLE

2022-09-06 Thread Jeff Davis
On Tue, 2022-09-06 at 10:42 -0400, Robert Haas wrote:
> I think there are some other implications, but I don't think they're
> anything super-dramatic. For example, you could create a group that's
> just for purposes of pg_hba.conf matching and make the grants both
> SET
> FALSE and INHERIT FALSE, with the idea that the members shouldn't
> have
> any access to the role; it's just there for grouping purposes. I
> mentioned one other possible scenario, with oncallbot, in the
> original
> post.

Interesting. All of those seem like worthwhile use cases to me.

> I'm not sure whether thinking about this in terms of security
> capabilities is the most helpful way to view it. My view was, as you
> say, more mechanical. I think sometimes you grant somebody a role and
> you want them to be able to use SET ROLE to assume the privileges of
> the target role, and sometimes you don't.

By denying the ability to "SET ROLE pg_read_all_settings", I assumed
that we'd deny the ability to create objects owned by that
pg_read_all_settings. But on closer inspection:

  grant all privileges on schema public to public;
  create user u1;
  grant pg_read_all_settings to u1 with set false;
  \c - u1
  create table foo(i int);
  set role pg_read_all_settings;
  ERROR:  permission denied to set role "pg_read_all_settings"
  alter table foo owner to pg_read_all_settings;
  \d
List of relations
   Schema | Name | Type  |Owner 
  +--+---+--
   public | foo  | table | pg_read_all_settings
  (1 row)


Users will reasonably interpret any feature of GRANT to be a security
feature that allows or prevents certain users from causing certain
outcomes. But here, I was initially fooled, and the outcome is still
possible.

So I believe we do need to think in terms of what capabilities we are
really restricting with this feature rather than solely the mechanics.

Regards,
Jeff Davis





Re: Return value of PathNameOpenFile()

2022-09-06 Thread Daniel Gustafsson
> On 6 Sep 2022, at 16:12, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> Agreed, zero should be valid as it's a non-negative integer.  However, 
>> callers
>> in fd.c are themselves checking for (fd <= 0) in some cases, and some have 
>> done
>> so since the very early days of the codebase, so I wonder if there 
>> historically
>> used to be a platform which considered 0 an invalid fd?
> 
> I'm betting it's a thinko that never got caught because 0 would
> always be taken up by stdin.  Maybe you'd notice if you tried to
> close-and-reopen stdin, but that's not something the server ever does.

Doh, of course.  The attached is a quick (lightly make check tested) take on
allowing 0, but I'm not sure that's what we want?

--
Daniel Gustafsson   https://vmware.com/



pathnameopenfile_fd0.diff
Description: Binary data


Re: allowing for control over SET ROLE

2022-09-06 Thread Robert Haas
On Tue, Sep 6, 2022 at 2:45 PM Jeff Davis  wrote:
> > I'm not sure whether thinking about this in terms of security
> > capabilities is the most helpful way to view it. My view was, as you
> > say, more mechanical. I think sometimes you grant somebody a role and
> > you want them to be able to use SET ROLE to assume the privileges of
> > the target role, and sometimes you don't.
>
> By denying the ability to "SET ROLE pg_read_all_settings", I assumed
> that we'd deny the ability to create objects owned by that
> pg_read_all_settings. But on closer inspection:
>
>   grant all privileges on schema public to public;
>   create user u1;
>   grant pg_read_all_settings to u1 with set false;
>   \c - u1
>   create table foo(i int);
>   set role pg_read_all_settings;
>   ERROR:  permission denied to set role "pg_read_all_settings"
>   alter table foo owner to pg_read_all_settings;
>   \d
> List of relations
>Schema | Name | Type  |Owner
>   +--+---+--
>public | foo  | table | pg_read_all_settings
>   (1 row)

Yeah. Please note this paragraph in my original post:

"In order to apply this patch, we'd need to reach a conclusion about
the matters mentioned in
http://postgr.es/m/ca+tgmobheyynw9vrhvolvd8odspbjuu9cbk6tms6owd70hf...@mail.gmail.com
-- and thinking about this patch might shed some light on what we'd
want to do over there."

I hadn't quite gotten around to updating that thread based on posting
this, but this scenario was indeed on my mind.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: has_privs_of_role vs. is_member_of_role, redux

2022-09-06 Thread Robert Haas
Jeff Davis's comment in
http://postgr.es/m/4f8d536a9221bccc5a33bb784dace0ef2310ec4a.ca...@j-davis.com
reminds me that I need to update this thread based on the patch posted
over there. That patch allows you to grant membership in one role to
another while withholding the ability to SET ROLE to the target role.
And it's already possible to grant membership in one role to another
while not allowing for inheritance of privileges. And I think that
sheds new light on the two debatable points from my original email:

On Thu, Aug 25, 2022 at 12:12 PM Robert Haas  wrote:
> 1. robert can create new objects of various types owned by stuff:
>
> rhaas=> create schema stuff_by_robert authorization stuff;
> CREATE SCHEMA
> rhaas=> create schema unrelated_by_robert authorization unrelated;
> ERROR:  must be member of role "unrelated"
>
> 2. robert can change the owner of objects he owns to instead be owned by 
> stuff:
>
> rhaas=> alter table robert_table owner to unrelated;
> ERROR:  must be member of role "unrelated"
> rhaas=> alter table robert_table owner to stuff;
> ALTER TABLE

It now seems to me that we should disallow these, because if we adopt
the patch from that other thread, and then you GRANT
pg_read_all_settings TO alice WITH INHERIT false, SET false, you might
reasonably expect that alice is not going to be able to clutter the
system with a bunch of objects owned by pg_read_all_settings, but
because of (1) and (2), alice can do exactly that.

To be more precise, I propose that in order for alice to create
objects owned by bob or to change one of her objects to be owned by
bob, she must not only be a member of role bob, but also inherit bob's
privileges. If she has the ability to SET ROLE bob but not does not
inherit his privileges, she can create new objects owned by bob only
if she first does SET ROLE bob, and she cannot reassign ownership of
her objects to bob at all.

Meanwhile, the patch that I posted previously to fix point (3) from
the original email, that ALTER DEFAULT PRIVILEGES is allowed for no
good reason, still seems like a good idea. Any reviews appreciated.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Return value of PathNameOpenFile()

2022-09-06 Thread Tom Lane
Daniel Gustafsson  writes:
> Doh, of course.  The attached is a quick (lightly make check tested) take on
> allowing 0, but I'm not sure that's what we want?

Actually, wait a second.  At least some of these are not dealing
with kernel FDs but with our "virtual FD" abstraction.  For those,
zero is indeed an invalid value.  There might well be some "<= 0"
versus "< 0" errors in this area, but it requires closer inspection.

regards, tom lane




Re: Adding CI to our tree

2022-09-06 Thread Thomas Munro
On Wed, Oct 6, 2021 at 5:01 PM Thomas Munro  wrote:
> BTW, on those two OSes there are some messages like this each time a
> submake dumps its output to the log:
>
> [03:36:16.591] fcntl(): Bad file descriptor
>
> It seems worth putting up with these compared to the alternatives of
> either not using -j, not using -Otarget and having the output of
> parallel tests all mashed up and unreadable (that still happen
> sometimes but it's unlikely, because the submakes write() whole output
> chunks at infrequent intervals), or redirecting to a file so you can't
> see the realtime test output on the main CI page (not so fun, you have
> to wait until it's finished and view it as an 'artifact').  I tried to
> write a patch for GNU make to fix that[1], let's see if something
> happens.
>
> [1] https://savannah.gnu.org/bugs/?52922

For the record, GNU make finally fixed this problem (though Andres
found a workaround anyway), but in any case it won't be in the
relevant package repos before we switch over to Meson/Ninja :-)




Re: New docs chapter on Transaction Management and related changes

2022-09-06 Thread Justin Pryzby
On Tue, Sep 06, 2022 at 04:16:02PM +0100, Simon Riggs wrote:
> New chapter on transaction management, plus a few related changes.
> 
> Markup and links are not polished yet, so please comment initially on
> the topics, descriptions and wording.

This is useful.  Nitpicks follow.


+If executed in a subtransaction this will return the top-level xid.

I'd prefer it with a comma after "subtransaction"

+SQL Standard. PostgreSQL supports SAVEPOINTs through the implementation
+defined mechanism of subtransactions, which offer a superset of the required

implementation-defined

+features. Prepared transactions allow PostgreSQL to implement what SQL

what *the

+
+All transactions are identified by a unique VirtualTransactionId (virtualxid or
+vxid), e.g. 4/12532, which is comprised of the BackendId (in this example, 4)
+and a sequentially assigned number unique to that backend, known as LocalXid

sequentially-assigned

+
+If a transaction attempts to write to the database it will be assigned a

database comma

+property is used by the transaction system to say that if one xid is earlier
+than another xid then the earlier transaction attempted to write before the

xid comma then

+
+Currently executing transactions are shown in the pg_locks view in columns

currently-executing

+"virtualxid" (text) and "transactionid" (xid), if an xid has been assigned.

maybe remove the "if assigned" part, since it's described next?

+Read transactions will have a virtualxid but a NULL xid, while write
+transactions will have both a virtualxid and an xid assigned.
+

+
+Row-level read locks may require the assignment of a multixact ID (mxid), which
+are recorded in the pg_multixact directory.

which *is ?

+top-level transaction. Subtransactions can also be started from other
+subtransactions. As a result, the arrangement of transaction and subtransactions

transactions (plural) ?

+form a hierarchy or tree. Thus, each subtransaction has one parent transaction.

+At present in PostgreSQL, only one transaction or subtransaction can be active at
+one time.

one time per command/query/request.

+Subtransactions may end via a commit or abort without affecting their parent

may end either by committing or aborting, without ..

+transaction, allowing the parent transaction to continue.

+also be started in other ways, such as PL/pgSQL's EXCEPTION clause. PL/Python and
+PL/TCL also support explicit subtransactions. Working with C API, users may also
+call BeginInternalSubTransaction().

*the C API ?

+If a subtransaction is assigned an xid, we refer to this as a subxid. Read-only
+subtransactions are not assigned a subxid, but when a subtransaction attempts to
+write it will be assigned a subxid. We ensure that all of a subxid's parents, up

write comma.
Or say: "subxid is not assigned until the subtransaction attempts to
write" ?

+
+The more subtransactions each transaction uses, the greater the overhead for
+transaction management. Up to 64 subxids are cached in shmem for each backend,

backend semicolon

+Those commands are extensions to the SQL Standard, since the SQL syntax is not yet
+yet part of the standard, though the Standard does refer to encompassing

yet yet

+Information relating to these is stored in pg_twophase. Currently prepared

s/these/two-phase commits/

+transactions can be inspected using pg_prepared_xacts view.
+

currently-prepared ?




Re: Fix possible bogus array out of bonds (src/backend/access/brin/brin_minmax_multi.c)

2022-09-06 Thread David Rowley
On Tue, 6 Sept 2022 at 23:25, Ranier Vilela  wrote:
> But first, I would like to continue with this correction of using strings.
> In the following cases:
> fprintf -> fputs -> fputc
> printf -> puts -> putchar
>
> There are many occurrences, do you think it would be worth the effort?

I'm pretty unexcited about that. Quite a bit of churn and adding
another precedent that we currently have no good way to enforce or
maintain.

In addition to that, puts() is a fairly seldom used function, which
perhaps is because it's a bit quirky and appends a \n to the end of
the string. I'm just imagining all the bugs where we append an extra
newline. But, feel free to open another thread about it and see if you
can drum up any support.

David




Re: Add tracking of backend memory allocated to pg_stat_activity

2022-09-06 Thread Reid Thompson
On Thu, 2022-09-01 at 13:43 +0900, Kyotaro Horiguchi wrote:
> 
> > @@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size)
> >     return NULL;
> >   
> >     context->mem_allocated += blksize;
> > +   pgstat_report_backend_mem_allocated_increase(blksize);
> 
> I'm not sure this is acceptable. The function adds a branch even when
> the feature is turned off, which I think may cause a certain extent
> of
> performance degradation. A past threads [1], [2] and [3] might be
> informative.

 Regarding the statement above, '...even when the feature is turned off...':
 I want to note that this feature/patch (for tracking memory allocated)
 doesn't have an 'on/off' switch. Tracking would always occur.
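
 As a sketch of what such a reporting helper amounts to (my illustration,
 not the patch's actual implementation; backend_mem_allocated is an assumed
 field on the existing PgBackendStatus entry):

static inline void
pgstat_report_backend_mem_allocated_increase(uint64 bytes)
{
	volatile PgBackendStatus *beentry = MyBEEntry;

	if (beentry == NULL)		/* too early in startup to have an entry */
		return;

	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
	beentry->backend_mem_allocated += bytes;	/* no GUC test: always accounted */
	PGSTAT_END_WRITE_ACTIVITY(beentry);
}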

 I'm open to guidance on testing for performance degradation.  I did
 note some basic pgbench comparison numbers in the thread regarding
 limiting backend memory allocations. 

> -- 
> Kyotaro Horiguchi
> NTT Open Source Software Center





Re: HOT chain validation in verify_heapam()

2022-09-06 Thread Robert Haas
On Tue, Sep 6, 2022 at 6:34 AM Himanshu Upadhyaya
 wrote:
>> This isn't safe because the line pointer referenced by rditem may not
>> have been sanity-checked yet. Refer to the comment just below where it
>> says "Sanity-check the line pointer's offset and length values".
>>
> handled by creating a new function check_lp and calling it before accessing 
> the redirected tuple.

I think this is going to result in duplicate error messages, because
if A redirects to B, what keeps us from calling check_lp(B) once when
we reach A and again when we reach B?

I am kind of generally suspicious of the idea that, both for redirects
and for ctid links, you just have it check_lp() on the target line
pointer and then maybe try to skip doing that again later when we get
there. That feels error-prone to me. I think we should try to find a
way of organizing the code where we do the check_lp() checks on all
line pointers in order without skipping around or backing up. It's not
100% clear to me what the best way of accomplishing that is, though.

But here's one random idea: add a successor[] array and an lp_valid[]
array. In the first loop, set lp_valid[offset] = true if it passes the
check_lp() checks, and set successor[A] = B if A redirects to B or has
a CTID link to B, without matching xmin/xmax. Then, in a second loop,
iterate over the successor[] array. If successor[A] = B && lp_valid[A]
&& lp_valid[B], then check whether A.xmax = B.xmin; if so, then
complain if predecessor[B] is already set, else set predecessor[B] =
A. Then, in the third loop, iterate over the predecessor array just as
you're doing now. Then it's clear that we do the lp_valid checks
exactly once for every offset that might need them, and in order. And
it's also clear that the predecessor-based checks can never happen
unless the lp_valid checks passed for both of the offsets involved.
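
For concreteness, a rough sketch of that three-pass shape (not meant as the
real patch; check_lp(), tuple_xmin()/tuple_xmax() and next_offnum_from_ctid()
are stand-in helpers here, while the page macros are the real ones):

	OffsetNumber successor[MaxOffsetNumber + 1] = {0};
	OffsetNumber predecessor[MaxOffsetNumber + 1] = {0};
	bool		lp_valid[MaxOffsetNumber + 1] = {false};
	OffsetNumber off;

	/* Pass 1: sanity-check each line pointer exactly once, record targets */
	for (off = FirstOffsetNumber; off <= maxoff; off++)
	{
		ItemId		itemid = PageGetItemId(page, off);

		if (!check_lp(&ctx, off))
			continue;
		lp_valid[off] = true;

		if (ItemIdIsRedirected(itemid))
			successor[off] = ItemIdGetRedirect(itemid);
		else if (ItemIdIsNormal(itemid))
			successor[off] = next_offnum_from_ctid(&ctx, off);
	}

	/* Pass 2: link A -> B only if both ends are valid and A.xmax == B.xmin */
	for (off = FirstOffsetNumber; off <= maxoff; off++)
	{
		OffsetNumber nextoff = successor[off];

		if (nextoff == InvalidOffsetNumber ||
			!lp_valid[off] || !lp_valid[nextoff])
			continue;
		if (!TransactionIdEquals(tuple_xmax(&ctx, off),
								 tuple_xmin(&ctx, nextoff)))
			continue;
		if (predecessor[nextoff] != InvalidOffsetNumber)
			report_corruption(&ctx,
							  psprintf("tuple at offset %u has multiple predecessors",
									   (unsigned) nextoff));
		else
			predecessor[nextoff] = off;
	}

	/* Pass 3: the predecessor-based HOT-chain checks run here, as now */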

> Done, reformatted using pg_indent.

Thanks, but the new check_lp() function's declaration is not formatted
according to pgindent guidelines. It's not enough to fix the problems
once, you have to avoid reintroducing them.

>> +if (!TransactionIdIsValid(curr_xmax) &&
>> +HeapTupleHeaderIsHotUpdated(curr_htup))
>> +{
>> +report_corruption(&ctx,
>> +psprintf("Current tuple at offset %u is
>> HOT but is last tuple in the HOT chain.",
>> +(unsigned) ctx.offnum));
>> +}
>>
>> This check has nothing to do with the predecessor[] array, so it seems
>> like it belongs in check_tuple() rather than here. Also, the message
>> is rather confused, because the test is checking whether the tuple has
>> been HOT-updated, while the message is talking about whether the tuple
>> was *itself* created by a HOT update. Also, when we're dealing with
>> corruption, statements like "is last tuple in the HOT chain" are
>> pretty ambiguous. Also, isn't this an issue for both HOT-updated
>> tuples and also just regular updated tuples? i.e. maybe what we should
>> be complaining about here is something like "tuple has been updated,
>> but xmax is 0" and then make the test check exactly that.
>
> Moved to check_tuple_header. This should be applicable for both HOT and 
> normal updates but even the last updated tuple in the normal update is 
> HEAP_UPDATED so not sure how we can apply this check for a normal update?

Oh, yeah.  You're right. I was thinking that HEAP_UPDATED was like
HEAP_HOT_UPDATED, but it's not: HEAP_UPDATED gets set on the new
tuple, while HEAP_HOT_UPDATED gets set on the old tuple.
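
For anyone following along, the distinction in code terms is roughly this (a
paraphrase of htup_details.h / heap_update(), not a quote of it):

	/* old tuple (the one being replaced), only when the update is HOT: */
	oldtup->t_data->t_infomask2 |= HEAP_HOT_UPDATED;
	/* new tuple (the replacement), for every UPDATE, HOT or not: */
	newtup->t_data->t_infomask |= HEAP_UPDATED;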

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Switching XLog source from archive to streaming when primary available

2022-09-06 Thread Nathan Bossart
+  
+   wal_source_switch_interval configuration parameter
+  

I don't want to bikeshed on the name too much, but I do think we need
something more descriptive.  I'm thinking of something like
streaming_replication_attempt_interval or
streaming_replication_retry_interval.

+Specifies how long the standby server should wait before switching WAL
+source from WAL archive to primary (streaming replication). This can
+happen either during the standby initial recovery or after a previous
+failed attempt to stream WAL from the primary.

I'm not sure what the second sentence means.  In general, I think the
explanation in your commit message is much clearer:

The standby makes an attempt to read WAL from primary after
wal_retrieve_retry_interval milliseconds reading from archive.

+If this value is specified without units, it is taken as milliseconds.
+The default value is 5 seconds. A setting of 0
+disables the feature.

5 seconds seems low.  I would expect the default to be 1-5 minutes.  I
think it's important to strike a balance between interrupting archive
recovery to attempt streaming replication and letting archive recovery make
progress.

+* Try reading WAL from primary after every wal_source_switch_interval
+* milliseconds, when state machine is in XLOG_FROM_ARCHIVE state. If
+* successful, the state machine moves to XLOG_FROM_STREAM state, otherwise
+* it falls back to XLOG_FROM_ARCHIVE state.

It's not clear to me how this is expected to interact with the pg_wal phase
of standby recovery.  As the docs note [0], standby servers loop through
archive recovery, recovery from pg_wal, and streaming replication.  Does
this cause the pg_wal phase to be skipped (i.e., the standby goes straight
from archive recovery to streaming replication)?  I wonder if it'd be
better for this mechanism to simply move the standby to the pg_wal phase so
that the usual ordering isn't changed.

+   if (!first_time &&
+       TimestampDifferenceExceeds(last_switch_time, curr_time,
+                                  wal_source_switch_interval))

Shouldn't this also check that wal_source_switch_interval is not set to 0?

+   elog(DEBUG2,
+        "trying to switch WAL source to %s after fetching WAL from %s for %d milliseconds",
+        xlogSourceNames[XLOG_FROM_STREAM],
+        xlogSourceNames[currentSource],
+        wal_source_switch_interval);
+
+   last_switch_time = curr_time;

Shouldn't the last_switch_time be set when the state machine first enters
XLOG_FROM_ARCHIVE?  IIUC this logic is currently counting time spent
elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a
source switch.  This would mean that a standby that has spent a lot of time
in streaming replication before failing would flip to XLOG_FROM_ARCHIVE,
immediately flip back to XLOG_FROM_STREAM, and then likely flip back to
XLOG_FROM_ARCHIVE when it failed again.  Given the standby will wait for
wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it
seems like we could end up rapidly looping between sources.  Perhaps I am
misunderstanding how this is meant to work.
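
Something along these lines is what I had in mind (a sketch only, reusing the
names from the quoted patch; the surrounding control flow is assumed):

	if (currentSource != XLOG_FROM_ARCHIVE)
	{
		currentSource = XLOG_FROM_ARCHIVE;
		last_switch_time = GetCurrentTimestamp();	/* start the clock here */
	}
	else if (wal_source_switch_interval > 0 &&
			 TimestampDifferenceExceeds(last_switch_time,
										GetCurrentTimestamp(),
										wal_source_switch_interval))
	{
		/* spent long enough in archive recovery; try streaming again */
		last_switch_time = GetCurrentTimestamp();
		/* ... switch to XLOG_FROM_STREAM ... */
	}

That way, time spent in XLOG_FROM_STREAM never counts toward the interval, and
the 0 = disabled case is handled in the same test.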

+   {
+   {"wal_source_switch_interval", PGC_SIGHUP, REPLICATION_STANDBY,
+   gettext_noop("Sets the time to wait before switching WAL source from archive to primary"),
+   gettext_noop("0 turns this feature off."),
+   GUC_UNIT_MS
+   },
+   &wal_source_switch_interval,
+   5000, 0, INT_MAX,
+   NULL, NULL, NULL
+   },

I wonder if the lower bound should be higher to avoid switching
unnecessarily rapidly between WAL sources.  I see that
WaitForWALToBecomeAvailable() ensures that standbys do not switch from
XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE more often than once per
wal_retrieve_retry_interval.  Perhaps wal_retrieve_retry_interval should be
the lower bound for this GUC, too.  Or maybe WaitForWALToBecomeAvailable()
should make sure that the standby makes at least once attempt to restore
the file from archive before switching to streaming replication.

[0] 
https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pg_publication_tables show dropped columns

2022-09-06 Thread Tom Lane
"houzj.f...@fujitsu.com"  writes:
> Here is the new version patch.
> I think the query plan and cost looks better after applying the patch.

LGTM, pushed.

regards, tom lane




Re: pg_upgrade major version compatibility

2022-09-06 Thread Justin Pryzby
On Tue, Sep 06, 2022 at 01:50:10PM +0200, Christoph Berg wrote:
> The pg_upgrade manpage in PG 14 and earlier claims that upgrades from
> 8.4 are supported, but that doesn't work:

Tom discovered 2 months ago that this was broken since a year prior.

https://www.postgresql.org/message-id/1973418.1657040382%40sss.pgh.pa.us

Evidently the docs still aren't updated to say so, nor has the tool been
changed to fail gracefully.

-- 
Justin




Re: freeing LDAPMessage in CheckLDAPAuth

2022-09-06 Thread Michael Paquier
On Mon, Sep 05, 2022 at 02:50:09AM -0700, Zhihong Yu wrote:
> Here is patch v4.

FWIW, I am fine with what you are basically doing with v4, so I'd like
to apply that on HEAD on the basis of consistency with libpq.  As Tom
said, this authentication path will fail, but I'd like to think that
this is a good practice anyway.  I'll wait a few days first, in case
others have comments.
--
Michael


signature.asc
Description: PGP signature


Re: pg_auth_members.grantor is bunk

2022-09-06 Thread Jeff Davis
On Tue, 2022-09-06 at 13:15 -0400, Robert Haas wrote:


> Like, the logic to infer the grantor in check_role_grantor() and
> select_best_admin() is intended to be, and as far as I know actually
> is, an exact clone of the logic in select_best_grantor(). It is
> different only in that we regard the bootstrap superuser as the
> object
> owner because there is no other owner stored in the catalogs; and in
> that we check CREATEROLE permission rather than SUPERUSER permission.

There's at least one other difference: if you specify "GRANTED BY su1"
for a table grant, it still selects the table owner as the grantor;
whereas if you specify "GRANTED BY su1" for a role grant, it selects
"su1".

   grant all privileges on schema public to public;
   create user su1 superuser;
   create user u1;
   create user u2;
   create user aa;
   grant u2 to su1 with admin option;
   \c - aa
   create table t_aa(i int);
   grant all privileges on t_aa to su1 with grant option;
   \c - su1
   grant select on t_aa to u1 granted by su1;
   -- grantor aa
   select relname, relacl from pg_class where relname='t_aa';
   grant u2 to u1 granted by su1; -- grantor su1
   -- grantor su1
   select grantor::regrole from pg_auth_members
 where member='u1'::regrole;

[ If you run the same example but where su1 is not a superuser, then
both select "su1" as the grantor because that's the only valid grantor
that can be inferred. ]

Now that I understand the underlying philosophy better, and I've
experimented with more cases, I propose the following grantor inference
behavior which I believe is in the spirit of your changes:

  * Let the granting user be the one specified in the GRANTED BY clause
if it exists; otherwise the current user. In other words, omitting
GRANTED BY is the same as specifying "GRANTED BY current_user".
  * If the granting user has privileges to be the grantor (ADMIN OPTION
for roles, GRANT OPTION for other objects) then the granting user is
the grantor.
  * Else if the granting user inherits from a user with the privileges
to be the grantor, then it selects a role with the fewest inheritance
hops as the grantor.
  * Else if the current user is any superuser:
- If the grant is a role grant, it selects the bootstrap superuser
as the grantor.
- Else the object owner is the grantor.
  * Else error (or if an error would break important backwards
compatibility, silently make it work like before or perhaps issue a
WARNING).

In other words, try to issue the grant normally if at all possible, and
play the superuser card as a last resort. I believe that will lead to
the fewest surprising cases, and make them easiest to explain, because
superuser-ness doesn't influence the outcome in as many cases.

It cements the idea that the bootstrap superuser is the "real"
superuser, and must always remain so, and that all other superusers are
temporary stand-ins (kind of but not quite the same as inheritance).
And it leaves the ugliness that we lose the information about the
"real" grantor when we play the superuser card, but, as I say above,
that would be a last resort.

The proposal would be a slight behavior change from v15 in the
following case:

   grant all privileges on schema public to public;
   create user su1 superuser;
   create user u1;
   create user aa;
   \c - aa
   create table t_aa(i int);
   grant all privileges on t_aa to su1 with grant option;
   \c - su1
   grant select on t_aa to u1 granted by su1;
   -- grantor "aa" in v15, grantor "su1" after my proposal
   select relname, relacl from pg_class where relname='t_aa';

Another change in behavior would be that the bootstrap superuser could
be the grantor for table privileges, if the bootstrap superuser has
WITH GRANT OPTION privileges.

But those seem minor to me.

Regards,
Jeff Davis





START_REPLICATION SLOT causing a crash in an assert build

2022-09-06 Thread Jaime Casanova
Hi,

I'm not sure what is causing this, but I have seen this twice. The
second time without activity after changing the set of tables in a
PUBLICATION.

gdb says that debug_query_string contains:

"""
START_REPLICATION SLOT "sub_pgbench" LOGICAL 0/0 (proto_version '3', 
publication_names '"pub_pgbench"')START_REPLICATION SLOT "sub_pgbench" LOGICAL 
0/0 (proto_version '3', publication_names '"pub_pgbench"')
"""

attached the backtrace.


-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
set = {__val = {4194304, 140731069195120, 2, 6, 6807736, 
93843951759568, 
4611686018427388799, 140276820691622, 0, 281470681751456, 0, 0, 0, 
0, 0, 0}}
pid = 
tid = 
ret = 
#1  0x7f94bdd51535 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, 
sa_mask = {__val = {
  0, 0, 0, 0, 0, 140276818190325, 2, 4122259537847870528, 
7018350267711514210, 
  93843951759568, 7003715780713148896, 0, 1404880143296576, 
140731069195360, 0, 
  140731069196224}}, sa_flags = -1083658032, sa_restorer = 0x0}
sigs = {__val = {32, 0 }}
#2  0x5559bfd4f0ed in ExceptionalCondition (
conditionName=0x5559bff30e20 "namestrcmp(&statent->slotname, 
NameStr(slot->data.name)) == 0", errorType=0x5559bff30e0d "FailedAssertion", 
fileName=0x5559bff30dbb "pgstat_replslot.c", 
lineNumber=89) at assert.c:69
No locals.
#3  0x5559bfbd4353 in pgstat_report_replslot (slot=0x7f94bd839280, 
repSlotStat=0x7ffe81636820) at pgstat_replslot.c:89
entry_ref = 0x5559c0876130
shstatent = 0x7f94b48f2000
statent = 0x7f94b48f2018
#4  0x5559bfb09f76 in UpdateDecodingStats (ctx=0x5559c091ca50) at 
logical.c:1862
rb = 0x5559c093f5c0
repSlotStat = {slotname = {
data = 
"\300\365\223\300YU\000\000X\206\227\300YU\000\000\250\031\225\300YU\000\000\225\257\261\277\000\000\000\000\240hc\201\376\177\000\000aT\261\277YU",
 '\000' }, spill_txns = 0, spill_count = 0, spill_bytes = 0, 
stream_txns = 0, stream_count = 0, 
  stream_bytes = 0, total_txns = 1, total_bytes = 684, 
  stat_reset_timestamp = 140731069196560}
__func__ = "UpdateDecodingStats"
#5  0x5559bfb03d2e in DecodeCommit (ctx=0x5559c091ca50, buf=0x7ffe81636b30, 
parsed=0x7ffe81636930, xid=74033, two_phase=false) at decode.c:706
origin_lsn = 0
commit_time = 715805149160794
origin_id = 0
i = 0
#6  0x5559bfb03055 in xact_decode (ctx=0x5559c091ca50, buf=0x7ffe81636b30) 
at decode.c:216
xlrec = 0x5559c095b608
parsed = {xact_time = 715805149160794, xinfo = 73, dbId = 65390, tsId = 
1663, 
  nsubxacts = 0, subxacts = 0x0, nrels = 0, xnodes = 0x0, nstats = 0, 
stats = 0x0, 
  nmsgs = 5, msgs = 0x5559c095b620, twophase_xid = 0, 
  twophase_gid = '\000' , nabortrels = 0, abortnodes 
= 0x0, 
  nabortstats = 0, abortstats = 0x0, origin_lsn = 0, origin_timestamp = 
0}
xid = 74033
two_phase = false
builder = 0x5559c0951640
reorder = 0x5559c093f5c0
r = 0x5559c091ce10
info = 0 '\000'
__func__ = "xact_decode"
#7  0x5559bfb02d78 in LogicalDecodingProcessRecord (ctx=0x5559c091ca50, 
record=0x5559c091ce10) at decode.c:119
buf = {origptr = 18637212136, endptr = 18637212272, record = 
0x5559c091ce10}
txid = 0
rmgr = {rm_name = 0x5559bfdf3c45 "Transaction", rm_redo = 
0x5559bf792884 , 
  rm_desc = 0x5559bf75aa12 , rm_identify = 0x5559bf75ab8f 
, 
  rm_startup = 0x0, rm_cleanup = 0x0, rm_mask = 0x0, 
  rm_decode = 0x5559bfb02edd }
#8  0x5559bfb3aecd in XLogSendLogical () at walsender.c:3073
record = 0x5559c095b5d8
errm = 0x0
flushPtr = 18637212272
__func__ = "XLogSendLogical"
#9  0x5559bfb3a1b4 in WalSndLoop (send_data=0x5559bfb3ae2b 
)
at walsender.c:2503
__func__ = "WalSndLoop"
#10 0x5559bfb389fe in StartLogicalReplication (cmd=0x5559c08795d8) at 
walsender.c:1333
buf = {data = 0x0, len = 3, maxlen = 1024, cursor = 87}
qc = {commandTag = 2170776608, nprocessed = 93843959044340}
__func__ = "StartLogicalReplication"
#11 0x5559bfb394b0 in exec_replication_command (
cmd_string=0x5559c08518a0 "START_REPLICATION SLOT \"sub_pgbench\" LOGICAL 
0/0 (proto_version '3', publication_names '\"pub_pgbench\"')") at 
walsender.c:1843
cmd = 0x5559c08795d8
parse_rc = 0
cmd_node = 0x5559c08795d8
cmdtag = 0x5559bff17e7a "START_REPLICATION"
cmd_context = 0x5559c0878620
old_context = 0x5559c0851780
__func__ = "exec_replication_command"
#12 0x5559bfbadbbd in PostgresMain (dbname=0x5559c087df10 "pgbench", 
username=0x5559c087dee8 "jcasano

Re: Add the ability to limit the amount of memory that can be allocated to backends.

2022-09-06 Thread Reid Thompson
On Thu, 2022-09-01 at 11:48 +0900, Kyotaro Horiguchi wrote:
> > 
> > The patch seems to limit both of memory-context allocations and DSM
> > allocations happen on a specific process by the same budget. In the
> > fist place I don't think it's sensible to cap the amount of DSM
> > allocations by per-process budget.
> > 
> > DSM is used by pgstats subsystem. There can be cases where pgstat
> > complains for denial of DSM allocation after the budget has been
> > exhausted by memory-context allocations, or every command complains
> > for denial of memory-context allocation after once the per-process
> > budget is exhausted by DSM allocations. That doesn't seem
> > reasonable.
> > regards.

It's intended as a mechanism for administrators to limit total
PostgreSQL memory consumption to avoid the OOM killer causing a crash
and restart, or to ensure that resources are available for other
processes on shared hosts, etc. It limits all types of allocations in
order to accomplish this.  Our documentation will note this, so that
administrators that have the need to set it are aware that it can
affect all non-auxiliary processes and what the effect is.






Re: Add the ability to limit the amount of memory that can be allocated to backends.

2022-09-06 Thread Reid Thompson
On Fri, 2022-09-02 at 09:30 +0200, Drouvot, Bertrand wrote:
> Hi,
> 
> I'm not sure we are choosing the right victims here (aka the ones
> that are doing the request that will push the total over the limit).
>
> Imagine an extreme case where a single backend consumes say 99% of
> the limit, shouldn't it be the one to be "punished"? (and somehow forced
> to give the memory back).
> 
> The problem that i see with the current approach is that a "bad"
> backend could impact all the others and continue to do so.
> 
> what about punishing say the highest consumer , what do you think?
> (just speaking about the general idea here, not about the implementation)

Our initial belief is that punishing the detector is reasonable if it
helps administrators avoid the OOM killer and resource starvation. But
we can and should expand on this idea.

Another thought is that, rather than just failing the query/transaction,
we could have the affected backend do a clean exit, freeing all its resources.
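
To make the "punish the detector" policy concrete, the check amounts to
something like the sketch below (names are invented for illustration;
max_total_backend_memory and the shared counter are not the patch's actual
variables):

/* assumed to live in shared memory and be initialized at startup */
static pg_atomic_uint64 *total_backend_mem = NULL;
static int	max_total_backend_memory = 0;	/* GUC, in MB; 0 disables */

static bool
allocation_exceeds_limit(Size requested)
{
	uint64		newtotal;

	if (max_total_backend_memory <= 0 || total_backend_mem == NULL)
		return false;

	newtotal = pg_atomic_add_fetch_u64(total_backend_mem, requested);
	if (newtotal > (uint64) max_total_backend_memory * 1024 * 1024)
	{
		/* undo the reservation; the requesting backend is the one that fails */
		pg_atomic_sub_fetch_u64(total_backend_mem, requested);
		return true;
	}
	return false;
}

/* e.g. in AllocSetAlloc(): if (allocation_exceeds_limit(blksize)) return NULL; */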



-- 
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thomp...@crunchydata.com
www.crunchydata.com







Re: pg_auth_members.grantor is bunk

2022-09-06 Thread Jeff Davis
On Tue, 2022-09-06 at 16:26 -0700, Jeff Davis wrote:
> In other words, omitting
> GRANTED BY is the same as specifying "GRANTED BY current_user".

Let me correct this thinko to distinguish between specifying GRANTED BY
and not:

  * Let the granting user be the one specified in the GRANTED BY clause
if it exists; otherwise the current user.
  * If the granting user has privileges to be the grantor (ADMIN OPTION
for roles, GRANT OPTION for other objects) then the granting user is
the grantor.
  * Else if GRANTED BY was *not* specified, infer the grantor:
- If the granting user inherits from a role with the privileges
to be the grantor, then it selects a role with the fewest inheritance
hops as the grantor.
- Else if the current user is any superuser, the grantor is the top
"owner" (bootstrap superuser for roles; object owner for other objects)
  * Else error (or if an error would break important backwards
compatibility, silently make it work like before and perhaps issue a
WARNING).

The basic idea is to use superuser privileges as a last resort in order
to maximize the cases that work normally (independent of superuser-
ness).
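
As pseudocode, that inference order looks roughly like this (the helper names
are invented for illustration; only superuser_arg() and BOOTSTRAP_SUPERUSERID
are real):

	Oid
	infer_grantor(Oid granting_user, bool granted_by_given,
				  bool is_role_grant, Oid owner)
	{
		if (has_priv_to_grant(granting_user))	/* ADMIN/GRANT OPTION held directly */
			return granting_user;

		if (!granted_by_given)
		{
			Oid		via = best_inheriting_grantor(granting_user);	/* fewest hops */

			if (OidIsValid(via))
				return via;
			if (superuser_arg(granting_user))
				return is_role_grant ? BOOTSTRAP_SUPERUSERID : owner;
		}

		/* or silently fall back / WARN, if an error would break compatibility */
		ereport(ERROR, (errmsg("grantor cannot be inferred")));
		return InvalidOid;		/* not reached */
	}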

Regards,
Jeff Davis





PostgreSQL 15 Beta 4 release announcement draft

2022-09-06 Thread Jonathan S. Katz

Hi,

I've attached a draft of the PostgreSQL 15 Beta 4 release announcement. 
Please review for correctness and if there are any omissions.


Please provide feedback on the draft no later than Sep 8, 2022 0:00 AoE.

Thanks!

Jonathan
The PostgreSQL Global Development Group announces that the fourth beta release 
of
PostgreSQL 15 is now [available for 
download](https://www.postgresql.org/download/).
This release contains previews of all features that will be available when
PostgreSQL 15 is made generally available, though some details of the release
can change during the beta period.

You can find information about all of the PostgreSQL 15 features and changes in
the [release notes](https://www.postgresql.org/docs/15/release-15.html):

  
[https://www.postgresql.org/docs/15/release-15.html](https://www.postgresql.org/docs/15/release-15.html)

In the spirit of the open source PostgreSQL community, we strongly encourage you
to test the new features of PostgreSQL 15 on your systems to help us eliminate
bugs or other issues that may exist. While we do not advise you to run
PostgreSQL 15 Beta 4 in production environments, we encourage you to find ways
to run your typical application workloads against this beta release.

Your testing and feedback will help the community ensure that PostgreSQL 15
upholds our standards of delivering a stable, reliable release of the world's
most advanced open source relational database. Please read more about our
[beta testing process](https://www.postgresql.org/developer/beta/) and how you
can contribute:

  
[https://www.postgresql.org/developer/beta/](https://www.postgresql.org/developer/beta/)

Upgrading to PostgreSQL 15 Beta 4
-

To upgrade to PostgreSQL 15 Beta 4 from an earlier beta or previous version of
PostgreSQL, you will need to use a strategy similar to upgrading between major
versions of PostgreSQL (e.g. `pg_upgrade` or `pg_dump` / `pg_restore`). For more
information, please visit the documentation section on
[upgrading](https://www.postgresql.org/docs/15/static/upgrading.html).

Changes Since Beta 3


Fixes and changes in PostgreSQL 15 Beta 3 include:

* The SQL/JSON features proposed for this release have been removed.
* [`MERGE`](https://www.postgresql.org/docs/15/sql-merge.html) statements are
explicitly rejected inside of a
[common-table expression](https://www.postgresql.org/docs/15/queries-with.html)
(aka `WITH` query) and
[`COPY`](https://www.postgresql.org/docs/15/sql-copy.html) statements.
* Enable `table_rewrite` event triggers for `ALTER MATERIALIZED VIEW`.
* Fix crash in `CREATE DATABASE ... STRATEGY WAL_LOG`.
* Fix crash with parallel vacuum.
* Fix issue with
[recovery prefetching](https://www.postgresql.org/docs/15/runtime-config-wal.html#GUC-RECOVERY-PREFETCH)
that could cause a crash on standby promotion.
* Adjust costing to force parallelism with partition aggregates.
* Fix LSN returned for error reports of WAL read failures from the
[`pg_walinspect`](https://www.postgresql.org/docs/15/pgwalinspect.html) 
extension.

Please see the [release 
notes](https://www.postgresql.org/docs/15/release-15.html)
for a complete list of new and changed features:

  
[https://www.postgresql.org/docs/15/release-15.html](https://www.postgresql.org/docs/15/release-15.html)

Testing for Bugs & Compatibility


The stability of each PostgreSQL release greatly depends on you, the community,
to test the upcoming version with your workloads and testing tools to find bugs
and regressions before the general availability of PostgreSQL 15. As
this is a beta release, changes to database behaviors, feature details, and
APIs are still possible. Your feedback and testing will help determine the final
tweaks on the new features, so please test in the near future. The quality of
user testing helps determine when we can make a final release.

A list of [open 
issues](https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items)
is publicly available in the PostgreSQL wiki.  You can
[report bugs](https://www.postgresql.org/account/submitbug/) using this form on
the PostgreSQL website:

  
[https://www.postgresql.org/account/submitbug/](https://www.postgresql.org/account/submitbug/)

Beta Schedule
-

This is the fourth beta release of version 15. The PostgreSQL Project will
release additional betas as required for testing, followed by one or more
release candidates, until the final release in late 2022. For further
information please see the [Beta 
Testing](https://www.postgresql.org/developer/beta/) page.

Links
-

* [Download](https://www.postgresql.org/download/)
* [Beta Testing Information](https://www.postgresql.org/developer/beta/)
* [PostgreSQL 15 Beta Release 
Notes](https://www.postgresql.org/docs/15/release-15.html)
* [PostgreSQL 15 Open 
Issues](https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items)
* [Submit a Bug](https://www.postgresql.org/account/submitbug/)
* [Follow @

Re: PostgreSQL 15 Beta 4 release announcement draft

2022-09-06 Thread David Rowley
On Wed, 7 Sept 2022 at 13:40, Jonathan S. Katz  wrote:
> Please provide feedback on the draft no later than Sep 8, 2022 0:00 AoE.

"* Adjust costing to force parallelism with partition aggregates."

If that's about 01474f5698, then it does not need to be mentioned.
All that commit does is update the regression tests so that they're
properly exercising what they were originally meant to test.

David




RE: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-09-06 Thread kuroda.hay...@fujitsu.com
Dear Amit,

Thanks for giving comments!

> Did you get this new assertion failure after you applied the patch for
> the first failure? Because otherwise, how can you reach it with the
> same test case?

The first failure occurs only in HEAD, so I did not apply the first patch
to REL14 and REL15.
This difference exists because the commit [Fix catalog lookup...] in
REL15 (272248a) and older is different from the one in HEAD.
In older versions SnapBuildXidSetCatalogChanges() has been added. In that
function a transaction will be marked as containing catalog changes if the
transaction is in InitialRunningXacts, and after that the relationship
between sub- and top-level transactions is assigned based on
parsed->subxact.
The marking avoids the first failure, but the assignment triggers the new failure.


> About patch:
> else if (sub_needs_timetravel)
>   {
> - /* track toplevel txn as well, subxact alone isn't meaningful */
> + elog(DEBUG2, "forced transaction %u to do timetravel due to one of
> its subtransaction",
> + xid);
> + needs_timetravel = true;
>   SnapBuildAddCommittedTxn(builder, xid);
> 
> Why did you remove the above comment? I think it still makes sense to retain 
> it.

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



HEAD-v2-0001-Fix-assertion-failure-during-logical-decoding.patch
Description:  HEAD-v2-0001-Fix-assertion-failure-during-logical-decoding.patch


Re: more descriptive message for process termination due to max_slot_wal_keep_size

2022-09-06 Thread Kyotaro Horiguchi
(I noticed I sent a wrong version..)

At Tue, 6 Sep 2022 10:54:35 +0200, "Drouvot, Bertrand"  
wrote in 
> Thanks for the new patch version!. I did not realized (sorry about
> that) that we'd need to expose byte_size_pretty(). Now I wonder if we

I didn't think we needed units larger than MB, but I used
pretty_print to prevent small numbers from rounding to exactly zero. On
the other hand, in typical cases the value is longer than 6 digits in bytes,
which is a bit hard to read at a glance.

> LOG:  terminating process 16034 to release replication slot "rep1" because 
> its restart_lsn 0/3158000 exceeds the limit by 15368192 bytes

> should not simply report the number of bytes (like I can see it is
> done in many places). So something like:
..
> + (errmsg("terminating process %d to release replication slot \"%s\"
> because its restart_lsn %X/%X exceeds the limit by %lu bytes",
..
> and then forget about exposing/using byte_size_pretty() (that would be
> more consistent with the same kind of reporting in the existing code).
> 
> What do you think?

An alternative would be rounding up to the whole MB, or a sub-MB.

>ereport(LOG,
>(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds the limit by %.1lf MB",
>active_pid, NameStr(slotname),
>LSN_FORMAT_ARGS(restart_lsn),
>/* round-up at sub-MB */
>ceil((double) (oldestLSN - restart_lsn) / 1024 / 102.4) / 10),

> LOG:  terminating process 49539 to release replication slot "rep1" because 
> its restart_lsn 0/3038000 exceeds the limit by 15.8 MB

If the distance were 1 byte, it is shown as "0.1 MB".
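
A standalone check of that arithmetic (plain C, just to confirm the rounding
behaves as described):

	#include <math.h>
	#include <stdio.h>

	int
	main(void)
	{
		unsigned long bytes[] = {1, 524288, 15368192};	/* distance past the limit */

		for (int i = 0; i < 3; i++)
		{
			double		mb = ceil((double) bytes[i] / 1024 / 102.4) / 10;

			printf("%lu bytes -> %.1f MB\n", bytes[i], mb);
		}
		return 0;
	}

	/* prints 0.1 MB, 0.5 MB and 14.7 MB respectively */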

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pgsql: Add ALTER SUBSCRIPTION ... SKIP.

2022-09-06 Thread Amit Kapila
On Tue, Sep 6, 2022 at 7:40 PM Masahiko Sawada  wrote:
>
> > > I think they should say something like
> > >
> > > "logical replication starts skipping transaction with LSN %X/%X"
> > > "logical replication completed skipping transaction with LSN %X/%X"
> > >
> >
> > This looks better to me.
>
> +1
>
> > If you find the above argument to
> > differentiate between the start and end LSN convincing then we can
> > think of replacing "with" in the above messages with "finished at". I
> > see your point related to using "finished at" for PREPARE may not be a
> > good idea but I don't have better ideas for the same.
>
> Given that the user normally doesn't need to be aware of the
> difference between start LSN and end LSN in the context of using this
> feature, I think we can use "with LSN %X/%X".
>

Fair enough.

Alvaro, would you like to push your proposed change? Otherwise, I am
happy to take care of this.

-- 
With Regards,
Amit Kapila.




Re: Reducing the chunk header sizes on all memory context types

2022-09-06 Thread David Rowley
On Tue, 6 Sept 2022 at 01:41, David Rowley  wrote:
>
> On Fri, 2 Sept 2022 at 20:11, David Rowley  wrote:
> >
> > On Thu, 1 Sept 2022 at 12:46, Tom Lane  wrote:
> > >
> > > David Rowley  writes:
> > > > Maybe we should just consider always making room for a sentinel for
> > > > chunks that are on dedicated blocks. At most that's an extra 8 bytes
> > > > in some allocation that's either over 1024 or 8192 (depending on
> > > > maxBlockSize).
> > >
> > > Agreed, if we're not doing that already then we should.
> >
> > Here's a patch to that effect.
>
> If there are no objections, then I plan to push that patch soon.

I've now pushed the patch which adds the sentinel space in more cases.

The final analysis I did on the stats gathered during make
installcheck show that we'll now allocate about 19MBs more over the
entire installcheck run out of about 26GBs total allocations.

That analysis looks something like:

Before:

SELECT CASE
 WHEN pow2_size > 0
  AND pow2_size = size THEN 'No'
 WHEN pow2_size = 0
  AND size = maxalign_size THEN 'No'
 ELSE 'Yes'
   END AS has_sentinel,
   Count(*)   AS n_allocations,
   Sum(CASE
 WHEN pow2_size > 0 THEN pow2_size
 ELSE maxalign_size
   END) / 1024 / 1024 mega_bytes_alloc
FROM   memstats
GROUP  BY 1;
has_sentinel | n_allocations | mega_bytes_alloc
--+---+--
 No   |  26445855 |21556
 Yes  |  37602052 | 5044

After:

SELECT CASE
 WHEN pow2_size > 0
  AND pow2_size = size THEN 'No'
 WHEN pow2_size = 0
  AND size = maxalign_size THEN 'Yes' -- this part changed
 ELSE 'Yes'
   END AS has_sentinel,
   Count(*)   AS n_allocations,
   Sum(CASE
 WHEN pow2_size > 0 THEN pow2_size
 WHEN size = maxalign_size THEN maxalign_size + 8
 ELSE maxalign_size
   END) / 1024 / 1024 mega_bytes_alloc
FROM   memstats
GROUP  BY 1;
has_sentinel | n_allocations | mega_bytes_alloc
--+---+--
 No   |  23980527 | 2177
 Yes  |  40067380 |24442

That amounts to going from about 58.7% of allocations having a sentinel
previously to about 62.6% currently, during the installcheck run.

It seems a pretty large portion of allocation request sizes are
power-of-2 sized and use AllocSet.
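
For reference, the rule those queries encode can be sketched as follows (my
reading of it, not aset.c itself; MAXALIGN is assumed to be 8 bytes here):

	static size_t
	pow2_roundup(size_t size)
	{
		size_t		p = 8;

		while (p < size)
			p <<= 1;
		return p;
	}

	static bool
	chunk_has_sentinel(size_t request, size_t allocChunkLimit, bool after_patch)
	{
		if (request <= allocChunkLimit)		/* normal power-of-2 bucket */
			return pow2_roundup(request) > request;

		/* over-limit request: gets its own dedicated block */
		if (after_patch)
			return true;					/* space is now always reserved */
		return ((request + 7) & ~(size_t) 7) > request;	/* only alignment padding */
	}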

David




Re: Handle infinite recursion in logical replication setup

2022-09-06 Thread vignesh C
On Tue, 6 Sept 2022 at 09:31, shiy.f...@fujitsu.com
 wrote:
>
> On Tue, Sep 6, 2022 11:14 AM vignesh C  wrote:
> >
> > Thanks for the comments, the attached patch has the changes for the same.
> >
>
> Thanks for updating the patch. Here are some comments.
>
> 1.
> +   if (subrel_count)
> +   {
> +   /*
> +* In case of ALTER SUBSCRIPTION ... REFRESH, 
> subrel_local_oids
> +* contains the list of relation oids that are already 
> present on the
> +* subscriber. This check should be skipped for these tables.
> +*/
> +   appendStringInfo(&cmd, "AND C.oid NOT IN (\n"
> +"SELECT C.oid \n"
> +"FROM pg_class C\n"
> +" JOIN pg_namespace N ON 
> N.oid = C.relnamespace\n"
> +"WHERE ");
> +   get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);
> +   appendStringInfo(&cmd, ")");
> +   }
>
> I think we can skip the tables without using a subquery. See the SQL below.
> Would it be better?
>
> SELECT DISTINCT P.pubname AS pubname
> FROM pg_publication P,
>  LATERAL pg_get_publication_tables(P.pubname) GPT
>  LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
>  pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
> WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND P.pubname IN ('p1', 
> 'p3')
> AND NOT (N.nspname = 'public' AND C.relname = 'tbl')
> AND NOT (N.nspname = 'public' AND C.relname = 't1');

Modified

> 2.
> +   When using a subscription parameter combination of
> +   copy_data=true and origin=NONE, the
>
> Could we use "copy_data = true" and "origin = NONE" (add two spaces around the
> equals sign)? I think it looks clearer that way. And it is consistent with 
> other
> places in this file.

Modified

Thanks for the comments, the attached v47 patch has the changes for the same.

Regards,
Vignesh


v47-0001-Raise-a-warning-if-there-is-a-possibility-of-dat.patch
Description: Binary data


v47-0002-Document-the-steps-for-replication-between-prima.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-09-06 Thread vignesh C
On Tue, 6 Sept 2022 at 09:31, wangw.f...@fujitsu.com
 wrote:
>
> On Tues, 6 Sept 2022 at 11:14, vignesh C  wrote:
> > Thanks for the comments, the attached patch has the changes for the same.
>
> Thanks for updating the patch.
>
> Here is one comment for 0001 patch.
> 1. The query in function check_publications_origin.
> +   appendStringInfoString(&cmd,
> +  "SELECT DISTINCT P.pubname 
> AS pubname\n"
> +  "FROM pg_publication P,\n"
> +  " LATERAL 
> pg_get_publication_tables(P.pubname) GPT\n"
> +  " LEFT JOIN 
> pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> +  " pg_class C JOIN 
> pg_namespace N ON (N.oid = C.relnamespace)\n"
> +  "WHERE C.oid = GPT.relid 
> AND PS.srrelid IS NOT NULL AND P.pubname IN (");
>
> Since I found that we only use "PS.srrelid" in the WHERE statement by
> specifying "PS.srrelid IS NOT NULL", could we just use "[INNER] JOIN" to join
> the table pg_subscription_rel?

Thanks for the comment, the v47 patch attached at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm33T%3D23P-GWvy3O7cT1BfHuGV8dosAw1AVLf40MPvg2bg%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-09-06 Thread vignesh C
On Tue, 6 Sept 2022 at 15:25, Amit Kapila  wrote:
>
> On Tue, Sep 6, 2022 at 9:31 AM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Tues, 6 Sept 2022 at 11:14, vignesh C  wrote:
> > > Thanks for the comments, the attached patch has the changes for the same.
> >
> > Thanks for updating the patch.
> >
> > Here is one comment for 0001 patch.
> > 1. The query in function check_publications_origin.
> > +   appendStringInfoString(&cmd,
> > +  "SELECT DISTINCT 
> > P.pubname AS pubname\n"
> > +  "FROM pg_publication 
> > P,\n"
> > +  " LATERAL 
> > pg_get_publication_tables(P.pubname) GPT\n"
> > +  " LEFT JOIN 
> > pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> > +  " pg_class C JOIN 
> > pg_namespace N ON (N.oid = C.relnamespace)\n"
> > +  "WHERE C.oid = GPT.relid 
> > AND PS.srrelid IS NOT NULL AND P.pubname IN (");
> >
> > Since I found that we only use "PS.srrelid" in the WHERE statement by
> > specifying "PS.srrelid IS NOT NULL", could we just use "[INNER] JOIN" to 
> > join
> > the table pg_subscription_rel?
> >
>
> I also think we can use INNER JOIN here but maybe there is a reason
> why that is not used in the first place. If we change it in the code
> then also let's change the same in the docs section as well.

Modified

> Few minor points:
> ===
> 1.
> This scenario is detected and a WARNING is logged to the
> +   user, but the warning is only an indication of a potential problem; it is
> +   the user responsibility to make the necessary checks to ensure the copied
> +   data origins are really as wanted or not.
> +  
>
> /user/user's

Modified

> 2. How about a commit message like:
> Raise a warning if there is a possibility of data from multiple origins.
>
> This commit raises a warning message for a combination of options
> ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
> operations if the publication tables were also replicated from other
> publishers.
>
> During replication, we can skip the data from other origins as we have that
> information in WAL but that is not possible during initial sync so we raise
> a warning if there is such a possibility.

Yes, this looks good. Modified

Thanks for the comment, the v47 patch attached at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm33T%3D23P-GWvy3O7cT1BfHuGV8dosAw1AVLf40MPvg2bg%40mail.gmail.com

Regards,
Vignesh




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2022-09-06 Thread Amit Kapila
On Mon, Aug 15, 2022 at 4:56 PM Melih Mutlu  wrote:
>
> Hi Amit,
>
> Amit Kapila , 6 Ağu 2022 Cmt, 16:01 tarihinde şunu 
> yazdı:
>>
>> I think there is some basic flaw in slot reuse design. Currently, we
>> copy the table by starting a repeatable read transaction (BEGIN READ
>> ONLY ISOLATION LEVEL REPEATABLE READ) and create a slot that
>> establishes a snapshot which is first used for copy and then LSN
>> returned by it is used in the catchup phase after the copy is done.
>> The patch won't establish such a snapshot before a table copy as it
>> won't create a slot each time. If this understanding is correct, I
>> think we need to use ExportSnapshot/ImportSnapshot functionality to
>> achieve it or do something else to avoid the problem mentioned.
>
>
> I did not really think about the snapshot created by replication slot while 
> making this change. Thanks for pointing it out.
> I've been thinking about how to fix this issue. There are some points I'm 
> still not sure about.
> If the worker will not create a new replication slot, which snapshot should 
> we actually export and then import?
>

Can we (export/import) use the snapshot we used the first time when a
slot is created for future transactions that copy other tables?
Because if we can do that then I think we can use the same LSN as
returned for the slot when it was created for all other table syncs.

-- 
With Regards,
Amit Kapila.




Re: windows resource files, bugs and what do we actually want

2022-09-06 Thread Peter Eisentraut

On 30.08.22 00:13, Andres Freund wrote:

1) For make based builds, all libraries that are built with MODULES rather
than MODULES_big have the wrong "FILETYPE", because Makefile.win32 checks
$(shlib), which is only set for MODULES_big.

This used to be even more widely wrong until recently:

commit 16a4a3d59cd5574fdc697ea16ef5692ce34c54d5
Author: Peter Eisentraut 
Date:   2020-01-15 10:15:06 +0100

Remove libpq.rc, use win32ver.rc for libpq

Afaict before that we only set it correctly for pgevent.


Note, when I worked on this at that time, it was with the aim of 
simplifying the version stamping script.  So I don't really know much 
about this.



3) We don't add an icon to postgres ("This is a daemon process, which is why
it is not labeled as an executable"), but we do add icons to several
libraries, at least snowball, pgevent, libpq.

We should probably just remove the icon from the libraries?


Wouldn't the icon still show up in the file manager or something?  Where 
is the icon actually used?



4) We include the date, excluding 0 for some mysterious reason, in the version
number. This seems to unnecessarily contribute to making the build not
reproducible. Hails from long ago:


Yeah, that is evil.


5) We have a PGFILEDESC for (nearly?) every binary/library. They largely don't
seem more useful descriptions than the binary's name. Why don't we just
drop most of them and just set the description as something like
"PostgreSQL $name (binary|library)"? I doubt anybody ever looks into these
details except to perhaps check the version number or such.


We do an equivalent shortcut with the pkg-config files:

echo 'Description: PostgreSQL lib$(NAME) library' >>$@

Seems good enough.




Re: [RFC] building postgres with meson - v11

2022-09-06 Thread Peter Eisentraut

On 02.09.22 01:12, samay sharma wrote:

So, I was thinking of the following structure:
- Supported Platforms
- Getting the Source
- Building with make and autoconf
   -- Short version
   -- Requirements
   -- Installation Procedure and it's subsections
- Building with Meson
   -- Short version
   -- Requirements
   -- Installation Procedure and it's subsections
- Post-installation Setup
- Platform specific notes


I like that.

As a follow up patch, we could also try to fit the Windows part into 
this model. We could add a Building with visual C++ or Microsoft windows 
SDK section. It doesn't have a short version but follows the remaining 
template of requirements and installation procedure subsections 
(Building, Cleaning and Installing and Running Regression tests) well.


We were thinking about removing the old Windows build system for PG 16. 
Let's see how that goes.  Otherwise, yes, that would be good as well.





Re: [RFC] building postgres with meson - v11

2022-09-06 Thread Peter Eisentraut

On 02.09.22 19:16, samay sharma wrote:
Another thing I'd like input on. A common question I've heard from 
people who've tried out the docs is How do we do the equivalent of X in 
make with meson. As meson will be new for a bunch of people who are 
fluent with make, I won't be surprised if this is a common ask. To 
address that, I was planning to add a page to specify the key things one 
needs to keep in mind while "migrating" from make to meson and having a 
translation table of commonly used commands.


I was planning to add it in the meson section, but if we go ahead with 
the structure proposed above, it doesn't fit it into one as cleanly. 
Maybe, it still goes in the meson section? Thoughts?


This could go into the wiki.

For example, we have 
, which was added 
during the CVS->Git transition.


This avoids that we make the PostgreSQL documentation a substitute 
manual for a third-party product.





RE: pg_publication_tables show dropped columns

2022-09-06 Thread houzj.f...@fujitsu.com
On Wednesday, September 7, 2022 6:01 AM Tom Lane  wrote:
> Subject: Re: pg_publication_tables show dropped columns
> 
> "houzj.f...@fujitsu.com"  writes:
> > Here is the new version patch.
> > I think the query plan and cost looks better after applying the patch.
> 
> LGTM, pushed.

Thanks for pushing.

Best regards,
Hou zj




Re: Reducing the chunk header sizes on all memory context types

2022-09-06 Thread Tom Lane
David Rowley  writes:
> It seems a pretty large portion of allocation request sizes are
> power-of-2 sized and use AllocSet.

No surprise there, we've been programming with aset.c's behavior
in mind for ~20 years ...

regards, tom lane




Re: [RFC] building postgres with meson - v12

2022-09-06 Thread Peter Eisentraut



On 31.08.22 20:11, Andres Freund wrote:

src/port/win32ver.rc.in: This is redundant with src/port/win32ver.rc.
(Note that the latter is also used as an input file for text
substitution.  So having another file named *.in next to it would be
super confusing.)

Yea, this stuff isn't great. I think the better solution, both for meson and
for configure, would be to move to do all the substitution to the C
preprocessor.


Yeah, I think if we can get rid of the evil date-based versioning, then
this could be done like

diff --git a/src/makefiles/Makefile.win32 b/src/makefiles/Makefile.win32
index 17d6819644..609156382f 100644
--- a/src/makefiles/Makefile.win32
+++ b/src/makefiles/Makefile.win32
@@ -65,21 +65,12 @@ endif
 # win32ver.rc or furnish a rule for generating one.  Set $(PGFILEDESC) to
 # signal win32ver.rc availability to the dll build rule below.
 ifndef PGXS
-win32ver.rc: $(top_srcdir)/src/port/win32ver.rc
-   sed -e 's;FILEDESC;$(PGFILEDESC);' \
-   -e 's;VFT_APP;$(PGFTYPE);' \
-   -e 's;_ICO_;$(PGICOSTR);' \
-   -e 's;\(VERSION.*\),0 *$$;\1,'`date '+%y%j' | sed 's/^0*//'`';' \
-   -e '/_INTERNAL_NAME_/$(if $(shlib),s;_INTERNAL_NAME_;"$(basename $(shlib))";,d)' \
-   -e '/_ORIGINAL_NAME_/$(if $(shlib),s;_ORIGINAL_NAME_;"$(shlib)";,d)' \
- $< >$@
-
 # Depend on Makefile.global to force rebuild on re-run of configure.
 win32ver.rc: $(top_builddir)/src/Makefile.global
 endif

-win32ver.o: win32ver.rc
-   $(WINDRES) -i $< -o $@ --include-dir=$(top_builddir)/src/include --include-dir=$(srcdir)
+win32ver.o: $(top_srcdir)/src/port/win32ver.rc
+   $(WINDRES) -i $< -o $@ --include-dir=$(top_builddir)/src/include --include-dir=$(srcdir) -D FILEDESC=$(PGFILEDESC) -D VFT_APP=$(PGFTYPE) -D_ICO_=$(PGICOSTR) -D_INTERNAL_NAME_=$(if $(shlib),s;_INTERNAL_NAME_;"$(basename $(shlib))";,d) -D_ORIGINAL_NAME_=$(if $(shlib),s;_ORIGINAL_NAME_;"$(shlib)";,d)


Probably needs some careful checking of the quoting.  But that should be
the right thing in principle.




Re: [RFC] building postgres with meson - v12

2022-09-06 Thread Peter Eisentraut

On 02.09.22 18:57, Andres Freund wrote:

Is it worth running ninja -t missingdeps as a test? At the time we run tests
we'll obviously have built and thus collected "real" dependencies, so we would
have the necessary information to determine whether dependencies are missing.
I think it'd be fine to do so only for ninja >= 1.11, rather than falling back
to the llvm python implementation, which is much slower (0.068s vs
3.760s). And also because it's not as obvious how to include the python script.

Alternatively, we could just document that ninja -t missingdeps is worth
running. Perhaps at the top of the toplevel build.meson file?


In the GNU/make world there is a distinction between "check" and 
"maintainer-check" for this kind of thing.


I think here if we put these kinds of things into a different, what's 
the term, "suite", then that would be a clear way to collect them and be 
able to run them all easily.






Re: PostgreSQL 15 Beta 4 release announcement draft

2022-09-06 Thread Erik Rijkers

Op 07-09-2022 om 03:40 schreef Jonathan S. Katz:

Hi,

I've attached a draft of the PostgreSQL 15 Beta 4 release announcement. 
Please review for correctness and if there are any omissions.


Please provide feedback on the draft no later than Sep 8, 2022 0:00 AoE.


'Fixes and changes in PostgreSQL 15 Beta 3 include:'  should be
'Fixes and changes in PostgreSQL 15 Beta 4 include:'


Erik




Re: Different compression methods for FPI

2022-09-06 Thread Michael Paquier
On Tue, Sep 06, 2022 at 03:47:05PM +0900, Michael Paquier wrote:
> On Sun, Sep 04, 2022 at 07:23:20PM -0500, Justin Pryzby wrote:
>>  page = BufferGetPage(*buf);
>>  if (!RestoreBlockImage(record, block_id, page))
>> -elog(ERROR, "failed to restore block image");
>> +ereport(ERROR,
>> +errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> +errmsg("failed to restore block image"),
>> +errdetail("%s", record->errormsg_buf));
>> +
> 
> Yes, you are right here.  elog() should never be used for things
> that could be triggered by the user, even if this one depends on the
> build options.  I think that the error message ought to be updated to
> "could not restore block image" instead, to be more in line with the
> project policy.

In the end, I have not taken the approach of using errdetail() for this
problem, as errormsg_buf is designed to hold a complete error string.  So I
have ended up refining the error messages generated in
RestoreBlockImage() and consuming them with ERRCODE_INTERNAL_ERROR.
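
For reference, the consuming side then looks roughly like this (a sketch
only; the actual call sites in xlogrecovery.c and xlogutils.c may differ in
detail):

	page = BufferGetPage(*buf);
	if (!RestoreBlockImage(record, block_id, page))
		ereport(ERROR,
				(errcode(ERRCODE_INTERNAL_ERROR),
				 /* errormsg_buf was filled in by report_invalid_record() */
				 errmsg_internal("%s", record->errormsg_buf)));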

This approach actually addresses a second issue as well: we have
never provided any context when there are inconsistencies in the
decoded record for max_block_id, has_image or in_use while restoring a
block image.  That problem is older than v15, but as far as I know we have
never received complaints about it on ~14, so I would leave this
change for HEAD and REL_15_STABLE.
--
Michael
From d413109bcb6c239023effc64ead945ad594008a6 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 7 Sep 2022 15:25:50 +0900
Subject: [PATCH] Provide more context for errors of RestoreBlockImage()

---
 src/backend/access/transam/xlogreader.c   | 22 +-
 src/backend/access/transam/xlogrecovery.c |  4 +++-
 src/backend/access/transam/xlogutils.c|  4 +++-
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index cdcacc7803..94cbeba459 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -2021,7 +2021,8 @@ XLogRecGetBlockData(XLogReaderState *record, uint8 block_id, Size *len)
 /*
  * Restore a full-page image from a backup block attached to an XLOG record.
  *
- * Returns true if a full-page image is restored.
+ * Returns true if a full-page image is restored, and false on failure with
+ * an error to be consumed by the caller.
  */
 bool
 RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
@@ -2032,9 +2033,20 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 
 	if (block_id > record->record->max_block_id ||
 		!record->record->blocks[block_id].in_use)
+	{
+		report_invalid_record(record,
+			  "could not restore image at %X/%X with invalid block %d specified",
+			  LSN_FORMAT_ARGS(record->ReadRecPtr),
+			  block_id);
 		return false;
+	}
 	if (!record->record->blocks[block_id].has_image)
+	{
+		report_invalid_record(record, "could not restore image at %X/%X with invalid state, block %d",
+			  LSN_FORMAT_ARGS(record->ReadRecPtr),
+			  block_id);
 		return false;
+	}
 
 	bkpb = &record->record->blocks[block_id];
 	ptr = bkpb->bkp_image;
@@ -2057,7 +2069,7 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 	bkpb->bimg_len, BLCKSZ - bkpb->hole_length) <= 0)
 decomp_success = false;
 #else
-			report_invalid_record(record, "image at %X/%X compressed with %s not supported by build, block %d",
+			report_invalid_record(record, "could not restore image at %X/%X compressed with %s not supported by build, block %d",
   LSN_FORMAT_ARGS(record->ReadRecPtr),
   "LZ4",
   block_id);
@@ -2074,7 +2086,7 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 			if (ZSTD_isError(decomp_result))
 decomp_success = false;
 #else
-			report_invalid_record(record, "image at %X/%X compressed with %s not supported by build, block %d",
+			report_invalid_record(record, "could not restore image at %X/%X compressed with %s not supported by build, block %d",
   LSN_FORMAT_ARGS(record->ReadRecPtr),
   "zstd",
   block_id);
@@ -2083,7 +2095,7 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 		}
 		else
 		{
-			report_invalid_record(record, "image at %X/%X compressed with unknown method, block %d",
+			report_invalid_record(record, "could not restore image at %X/%X compressed with unknown method, block %d",
   LSN_FORMAT_ARGS(record->ReadRecPtr),
   block_id);
 			return false;
@@ -2091,7 +2103,7 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 
 		if (!decomp_success)
 		{
-			report_invalid_record(record, "invalid compressed image at %X/%X, block %d",
+			report_invalid_record(record, "could not decompress imag

Re: failing to build preproc.c on solaris with sun studio

2022-09-06 Thread Peter Eisentraut

On 05.09.22 23:34, Tom Lane wrote:

Peter Eisentraut  writes:

Why is this being proposed?


Andres is annoyed by the long build time of ecpg, which he has to
wait for whether he wants to test it or not.  I could imagine that
I might disable ecpg testing on my slowest buildfarm animals, too.

I suppose maybe we could compromise on inventing --with-ecpg but
having it default to "on", so that you have to take positive
action if you don't want it.


We already have "make all" vs. "make world" to build just the important 
stuff versus everything.  And we have "make world-bin" to build, 
approximately, everything except the slow stuff.  Let's try to work 
within the existing mechanisms.  For example, removing ecpg from "make 
all" might be sensible.


(Obviously, "all" is then increasingly becoming a lie.  Maybe a renaming 
like "all" -> "core" and "world" -> "all" could be in order.)


The approach with the make targets is better than a configure option, 
because it allows you to build a narrow set of things during development 
and then build everything for final confirmation, without having to 
re-run configure.  Also, it's less confusing for packagers.
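
For illustration, the sort of workflow this supports with the existing
targets (note that "world-bin" today still includes ecpg; the first comment
assumes the proposed change):

    make -j8 world-bin    # day-to-day build: skips the docs (and, per the proposal, ecpg)
    make -j8 world        # full build for final confirmation
    make check-world      # run all the test suites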