Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes

2022-04-22 Thread Thomas Munro
On Wed, Apr 6, 2022 at 5:07 AM Robert Haas  wrote:
> On Mon, Apr 4, 2022 at 10:20 PM Thomas Munro  wrote:
> > > The checkpointer never takes heavyweight locks, so the opportunity
> > > you're describing can't arise.
> >
> >   Hmm, oh, you probably meant the buffer interlocking
> > in SyncOneBuffer().  It's true that my most recent patch throws away
> > more requests than it could, by doing the level check at the end of
> > the loop over all buffers instead of adding some kind of
> > DropPendingWritebacks() in the barrier handler.  I guess I could find
> > a way to improve that, basically checking the level more often instead
> > of at the end, but I don't know if it's worth it; we're still throwing
> > out an arbitrary percentage of writeback requests.
>
> Doesn't every backend have its own set of pending writebacks?
> BufferAlloc() calls
> ScheduleBufferTagForWriteback(&BackendWritebackContext, ...)?

Yeah.  Alright, for my exaggeration above, s/can't arise/probably
won't often arise/, since when regular backends do this it's for
random dirty buffers, not necessarily stuff they hold a relation lock
on.

I think you are right that if we find ourselves calling smgrwrite() on
another buffer for that relfilenode a bit later, then it's safe to
"unzonk" the relation.  In the worst case, IssuePendingWritebacks()
might finish up calling sync_file_range() on a file that we originally
wrote to for generation 1 of that relfilenode, even though we now have
a file descriptor for generation 2 of that relfilenode that was
reopened by a later smgrwrite().  That would be acceptable, because a
bogus sync_file_range() call would be harmless.  The key point here is
that we absolutely must avoid re-opening files without careful
interlocking, because that could lead to later *writes* for generation
2 going to the defunct generation 1 file that we opened just as it was
being unlinked.

(For completeness:  we know of another way for smgrwrite() to write to
the wrong generation of a recycled relfilenode, but that's hopefully
very unlikely and will hopefully be addressed by adding more bits and
killing off OID-wraparound[1].  In this thread we're concerned only
with these weird "explicit OID recycling" cases we're trying to fix
with PROCSIGNAL_BARRIER_SMGRRELEASE sledgehammer-based cache
invalidation.)

My new attempt, attached, is closer to what Andres proposed, except at
the level of md.c segment objects instead of the level of SMgrRelation
objects.  This avoids the dangling SMgrRelation pointer problem
discussed earlier, and is also a little like your "zonk" idea, except
instead of a zonk flag, the SMgrRelation object is reset to a state
where it knows nothing at all except its relfilenode.  Any access
through smgrXXX() functions *except* smgrwriteback() will rebuild the
state, just as you would clear the hypothetical zonk flag.

So, to summarise the new patch that I'm attaching to this email as 0001:

1.  When PROCSIGNAL_BARRIER_SMGRRELEASE is handled, we call
smgrreleaseall(), which calls smgrrelease() on all SMgrRelation
objects, telling them to close all their files.

2.  All SMgrRelation objects are still valid, and any pointers to them
that were acquired before a CFI() and then used in an smgrXXX() call
afterwards will still work (the example I know of being
RelationCopyStorage()[2]).

3.  mdwriteback() internally uses a new "behavior" flag
EXTENSION_DONT_OPEN when getting its hands on the internal segment
object, which says that we should just give up immediately if we don't
already have the file open (concretely: if
PROCSIGNAL_BARRIER_SMGRRELEASE came along between our recent
smgrwrite() call and the writeback machinery's smgrwriteback() call,
it'll do nothing at all).

Someone might say that it's weird that smgrwriteback() has that
behaviour internally, and we might eventually want to make it more
explicit by adding a "behavior" argument to the function itself, so
that it's the caller that controls it.  It didn't seem worth it for
now though; the whole thing is a hint to the operating system anyway.

However it seems that I have something wrong, because CI is failing on
Windows; I ran out of time for looking into that today, but wanted to
post what I have so far since I know we have an open item or two to
close here ASAP...

Patches 0002-0004 are Andres's, with minor tweaks:

v3-0002-Fix-old-fd-issues-using-global-barriers-everywher.patch:

It seems useless to even keep the macro USE_BARRIER_SMGRRELEASE if
we're going to define it always, so I removed it.  I guess it's useful
to be able to disable that logic easily to see that the assertion in
the other patch fails, but you can do that by commenting out a line in
ProcessBarrierSmgrRelease().

I'm still slightly confused about whether we need an *extra* global
barrier in DROP TABLESPACE, not just if destroy_tablespace_directory()
failed.

v3-0003-WIP-test-for-file-reuse-dangers-around-database-a.patch

Was missing:

-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib

[PATCH] Compression dictionaries for JSONB

2022-04-22 Thread Aleksander Alekseev
Hi hackers,

This is a follow-up thread to `RFC: compression dictionaries for JSONB`
[1]. I would like to share my current progress in order to get early
feedback. The patch is currently in a draft state but implements the basic
functionality. I did my best to account for all the great feedback I
previously got from Alvaro and Matthias.

Usage example:

```
CREATE TYPE mydict AS DICTIONARY OF jsonb ('aaa', 'bbb');

SELECT '{"aaa":"bbb"}' ::  mydict;
 mydict

 {"aaa": "bbb"}

SELECT ('{"aaa":"bbb"}' ::  mydict) -> 'aaa';
 ?column?
--
 "bbb"
```

Here `mydict` works as a transparent replacement for `jsonb`. However, its
internal representation differs. The provided dictionary entries ('aaa',
'bbb') are stored in the new catalog table:

```
SELECT * FROM pg_dict;
  oid  | dicttypid | dictentry
-------+-----------+-----------
 39476 |     39475 | aaa
 39477 |     39475 | bbb
(2 rows)
```

When `mydict` sees 'aaa' in the document, it replaces it with the
corresponding code, in this case - 39476. For more details regarding the
compression algorithm and chosen compromises please see the comments in
the patch.

In pg_type `mydict` has typtype = TYPTYPE_DICT. It works the same way as
TYPTYPE_BASE with the only difference that the corresponding `_in`
(pg_type.typinput) and `_` (pg_cast.castfunc)
procedures receive the dictionary Oid as a `typmod` argument. This way the
procedures can distinguish `mydict1` from `mydict2` and use the proper
compression dictionary.
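
To illustrate (a hedged sketch that only reuses the syntax shown above; the
type and entry names are made up):

```
CREATE TYPE dict_logs AS DICTIONARY OF jsonb ('level', 'message');
CREATE TYPE dict_metrics AS DICTIONARY OF jsonb ('name', 'value');

-- Each cast passes its own dictionary Oid as the typmod, so the shared
-- input/cast procedures can compress against the right pg_dict entries.
SELECT '{"level":"info","message":"started"}' :: dict_logs;
SELECT '{"name":"cpu","value":"0.42"}' :: dict_metrics;
```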

The approach of giving `typmod` an alternative role is arguably a bit hacky, but
it was the least invasive way I've found to implement the feature. I'm open to
alternative suggestions.

Current limitations (todo):
- ALTER TYPE is not implemented
- Tests and documentation are missing
- Autocomplete is missing

Future work (out of scope of this patch):
- Support types other than JSONB: TEXT, XML, etc
- Automatically updated dictionaries, e.g. during VACUUM
- Alternative compression algorithms. Note that this will not require any
further changes in the catalog, only the values we write to pg_type and
pg_cast will differ.

Open questions:
- Dictionary entries are currently stored as NameData, the same type that
is used for enums. Are we OK with the accompanying limitations? Any
alternative suggestions?
- All in all, am I moving the right direction?

Your feedback is very much welcomed!

[1]:
https://postgr.es/m/CAJ7c6TPx7N-bVw0dZ1ASCDQKZJHhBYkT6w4HV1LzfS%2BUUTUfmA%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev


v1-0001-CREATE-TYPE-foo-AS-DICTIONARY-OF-JSONB.patch
Description: Binary data


Re: RFC: compression dictionaries for JSONB

2022-04-22 Thread Aleksander Alekseev
Hi hackers,

Many thanks for all your great feedback!

Please see the follow-up thread '[PATCH] Compression dictionaries for JSONB':

https://postgr.es/m/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: Estimating HugePages Requirements?

2022-04-22 Thread Magnus Hagander
On Wed, Apr 20, 2022, 00:12 Michael Paquier  wrote:

> On Thu, Mar 24, 2022 at 02:07:26PM +0100, Magnus Hagander wrote:
> > But neither would the suggestion of redirecting stderr to /dev/null.
> > In fact, doing the redirect it will *also* throw away any FATAL that
> > happens. In fact, using the 2>/dev/null method, we *also* remove the
> > message that says there's another postmaster running in this
> > directory, which is strictly worse than the override of
> > log_min_messages.
>
> Well, we could also tweak more the command with a redirection of
> stderr to a log file or such, and tell to look at it for errors.
>

That would be a pretty terrible UX though.



> > That said, the redirect can be removed without recompiling postgres,
> > so it is probably still the better choice as a temporary workaround.
> > But we should really look into getting a better solution in place once
> > we start on 16.
>
> But do we really need a better or more invasive solution for already
> running servers though?  A SHOW command would be able to do the work
> already in this case.  This would lack consistency compared to the
> offline case, but we are not without option either.  That leaves the
> case where the server is running, has allocated memory but is not
> ready to accept connections, like crash recovery, still this use case
> looks rather thin to me.



I agree that that's a very narrow use case. And I'm not sure the use case of
a running server is even that important here - it's really the offline one
that's important. Or rather, the really compelling one is when there is a
server running but I want to check the value offline because it will
change. SHOW doesn't help there because it shows the value based on the
currently running configuration, not the new one after a restart.
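
(For reference, the SHOW route discussed above would look something like the
sketch below, assuming the runtime-computed shared_memory_size_in_huge_pages
GUC is available on the server in question; as noted, it only reflects the
currently running configuration.)

```
-- On a server that is already up and accepting connections:
SHOW shared_memory_size_in_huge_pages;
-- or, equivalently:
SELECT current_setting('shared_memory_size_in_huge_pages');
```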

I don't agree that the redirect is a solution. It's a workaround.


> >> My solution for the docs is perhaps too confusing for the end-user,
> >> and we are talking about a Linux-only thing here anyway.  So, at the
> >> end, I am tempted to just add the "2> /dev/null" as suggested upthread
> >> by Nathan and call it a day.  Does that sound fine?
> >
> > What would be a linux only thing?
>
> Perhaps not at some point in the future.  Now that's under a section
> of the docs only for Linux.
>


Hmm. So what's the solution on Windows? I guess maybe it's not as important
there because there is no limit on huge pages, but generally getting the
expected shared memory usage might be useful? Just significantly less
important.

/Magnus


Re: BufferAlloc: don't take two simultaneous locks

2022-04-22 Thread Yura Sokolov
Btw, I've run tests on EPYC (80 cores).

1 key per select
  conns | master |  patch-v11 |  master 1G | patch-v11 1G 
--------+--------+------------+------------+--------------
  1 |  29053 |  28959 |  26715 |  25631 
  2 |  53714 |  53002 |  55211 |  53699 
  3 |  69796 |  72100 |  72355 |  71164 
  5 | 118045 | 112066 | 122182 | 119825 
  7 | 151933 | 156298 | 162001 | 160834 
 17 | 344594 | 347809 | 390103 | 386676 
 27 | 497656 | 527313 | 587806 | 598450 
 53 | 732524 | 853831 | 906569 | 947050 
 83 | 823203 | 991415 |1056884 |1222530 
107 | 812730 | 930175 |1004765 |1232307 
139 | 781757 | 938718 | 995326 |1196653 
163 | 758991 | 969781 | 990644 |1143724 
191 | 774137 | 977633 | 996763 |1210899 
211 | 771856 | 973361 |1024798 |1187824 
239 | 756925 | 940808 | 954326 |1165303 
271 | 756220 | 940508 | 970254 |1198773 
307 | 746784 | 941038 | 940369 |1159446 
353 | 710578 | 928296 | 923437 |1189575 
397 | 715352 | 915931 | 911638 |1180688 

3 keys per select

  conns | master |  patch-v11 |  master 1G | patch-v11 1G 
--------+--------+------------+------------+--------------
  1 |  17448 |  17104 |  18359 |  19077 
  2 |  30888 |  31650 |  35074 |  35861 
  3 |  44653 |  43371 |  47814 |  47360 
  5 |  69632 |  64454 |  76695 |  76208 
  7 |  96385 |  92526 | 107587 | 107930 
 17 | 195157 | 205156 | 253440 | 239740 
 27 | 302343 | 316768 | 386748 | 335148 
 53 | 334321 | 396359 | 402506 | 486341 
 83 | 300439 | 374483 | 408694 | 452731 
107 | 302768 | 369207 | 390599 | 453817 
139 | 294783 | 364885 | 379332 | 459884 
163 | 272646 | 344643 | 376629 | 460839 
191 | 282307 | 334016 | 363322 | 449928 
211 | 275123 | 321337 | 371023 | 445246 
239 | 263072 | 341064 | 356720 | 441250 
271 | 271506 | 333066 | 373994 | 436481 
307 | 261545 | 333489 | 348569 | 466673 
353 | 255700 | 331344 | 333792 | 455430 
397 | 247745 | 325712 | 326680 | 439245 




Re: [Proposal] vacuumdb --schema only

2022-04-22 Thread Gilles Darold

Le 20/04/2022 à 19:38, Nathan Bossart a écrit :

Thanks for the new patch!  I think this is on the right track.

On Wed, Apr 20, 2022 at 05:15:02PM +0200, Gilles Darold wrote:

Le 18/04/2022 à 23:56, Nathan Bossart a écrit :

-   if (!tables_listed)
+   if (!objects_listed || objfilter == OBJFILTER_SCHEMA)

Do we need to check for objects_listed here?  IIUC we can just check for
objfilter != OBJFILTER_TABLE.

Yes, we need it; otherwise the 'vacuumdb with view' test fails because we are not
trying to vacuum the view, so PostgreSQL doesn't report:

     WARNING:  cannot vacuum non-tables or special system tables

My point is that the only time we don't want to filter for relevant
relation types is when the user provides a list of tables.  So my
suggestion would be to simplify this to the following:

if (objfilter != OBJFILTER_TABLE)
{
appendPQExpBufferStr(...);
has_where = true;
}



Right, I must have gotten mixed up in the test results. Fixed.



Unless I'm missing something, schema_is_exclude appears to only be used for
error checking and doesn't impact the generated catalog query.  It looks
like the relevant logic disappeared after v4 of the patch.

Right, removed.

I don't think -N works at the moment.  I tested it out, and vacuumdb was
still processing tables in schemas I excluded.  Can we add a test case for
this, too?



Fixed, and regression tests added, as well as some others to test the 
compatibility of the filter options.




+/*
+ * Verify that the filters used at command line are compatible
+ */
+void
+check_objfilter(VacObjectFilter curr_objfilter, VacObjectFilter curr_option)
+{
+   switch (curr_option)
+   {
+   case OBJFILTER_NONE:
+   break;
+   case OBJFILTER_DATABASE:
+   /* When filtering on database name, vacuum on all 
database is not allowed. */
+   if (curr_objfilter == OBJFILTER_ALL)
+   pg_fatal("cannot vacuum all databases and a specific 
one at the same time");
+   break;
[...]
+   }
+}
I don't think this handles all combinations.  For example, the following
command does not fail:

vacuumdb -a -N test postgres



Right, I have fixed them all in this new patch.



Furthermore, do you think it'd be possible to dynamically generate the
message?  If it doesn't add too much complexity, this might be a nice way
to simplify this function.



I have tried to avoid reusing the same error message several times by 
using a new enum and the function filter_error(). I also use the same 
messages for the --schema and --exclude-schema related errors.



Patch v10 attached.


--
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..0de001ef24 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+   
+
+ 
+  -n
+  --schema
+ 
+ schema
+
+   
+
+   
+
+ 
+  -N
+  --exclude-schema
+ 
+ schema
+
+   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='foo' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..9ef5c789e0 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,38 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".bar/,
+	'vacuu

Re: Proposal for internal Numeric to Uint64 conversion function.

2022-04-22 Thread Amul Sul
On Fri, Mar 18, 2022 at 1:17 AM Greg Stark  wrote:
>
> On Fri, 11 Mar 2022 at 15:17, Tom Lane  wrote:
> >
> > Amul Sul  writes:
>
> > >  Yeah, that's true, I am too not sure if we really need to refactor
> > >  all those; If we want, I can give it a try.
> >
> > The patch as-presented isn't very compelling for
> > lack of callers of the new function
>
> Tom, are you saying you think we're not interested in just adding this
> function unless it's part of this refactoring?
>
> Amul, do you think if we did numeric_to_int64/numeric_to_uint64 as a
> refactored API and a second patch that made numeric_pg_lsn and other
> consumers use it it would clean up the code significantly?

Sorry for the long absence.

Yes, I think we can do cleanup to some extent.  Attaching the
following patches that first intend to remove DirectFunctionCall as
much as possible:

0001:
Refactors and moves numeric_pg_lsn to the pg_lsn.c file. The function's guts
are in numeric.c as a generalised numeric_to_uint64_type() function that can
be used elsewhere too.

0002:
Does a little more cleanup in the pg_lsn.c file -- replaces a few
DirectFunctionCall1() calls with numeric_to_uint64_type().

0003:
Refactors the numeric_int8() function and replaces a few
DirectFunctionCall1() calls to numeric_int8 with the newly added
numeric_to_int64() and numeric_to_int64_type().  The
numeric_to_int64_type() version should be called only when you want to
refer to a specific type name in the error message, like
numeric_to_uint64_type, e.g. for the MONEY type.

0004:
Adds float8_to_numeric and numeric_to_float8 by refactoring
float8_numeric and numeric_float8 respectively. I am a bit confused
about whether the type should be referred to as float8 or double.
Replaces a few DirectFunctionCall() calls with these C functions.

0005:
This one replaces all DirectFunctionCall needed for the numeric
arithmetic operations.

Regards,
Amul
From fd5e1ccffe2c1aa1c619fb0f31389f35fe554e5b Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Fri, 22 Apr 2022 06:17:51 -0400
Subject: [PATCH v2 5/5] Call numeric arithmetic functions directly

---
 contrib/btree_gist/btree_numeric.c  | 17 +++--
 src/backend/access/brin/brin_minmax_multi.c | 14 
 src/backend/utils/adt/cash.c| 15 
 src/backend/utils/adt/dbsize.c  | 33 +
 src/backend/utils/adt/formatting.c  | 15 
 src/backend/utils/adt/jsonb.c   |  7 ++--
 src/backend/utils/adt/numeric.c | 39 +++--
 src/backend/utils/adt/pg_lsn.c  | 16 -
 src/backend/utils/adt/rangetypes.c  | 10 +++---
 9 files changed, 80 insertions(+), 86 deletions(-)

diff --git a/contrib/btree_gist/btree_numeric.c b/contrib/btree_gist/btree_numeric.c
index 35e466cdd94..6c50cfa41f2 100644
--- a/contrib/btree_gist/btree_numeric.c
+++ b/contrib/btree_gist/btree_numeric.c
@@ -174,17 +174,10 @@ gbt_numeric_penalty(PG_FUNCTION_ARGS)
 	ok = gbt_var_key_readable(org);
 	uk = gbt_var_key_readable((GBT_VARKEY *) DatumGetPointer(uni));
 
-	us = DatumGetNumeric(DirectFunctionCall2(numeric_sub,
-			 PointerGetDatum(uk.upper),
-			 PointerGetDatum(uk.lower)));
+	us = numeric_sub_opt_error((Numeric) uk.upper, (Numeric) uk.lower, NULL);
+	os = numeric_sub_opt_error((Numeric) ok.upper, (Numeric) ok.lower, NULL);
 
-	os = DatumGetNumeric(DirectFunctionCall2(numeric_sub,
-			 PointerGetDatum(ok.upper),
-			 PointerGetDatum(ok.lower)));
-
-	ds = DatumGetNumeric(DirectFunctionCall2(numeric_sub,
-			 NumericGetDatum(us),
-			 NumericGetDatum(os)));
+	ds = numeric_sub_opt_error(us, os, NULL);
 
 	if (numeric_is_nan(us))
 	{
@@ -202,9 +195,7 @@ gbt_numeric_penalty(PG_FUNCTION_ARGS)
 		if (DirectFunctionCall2(numeric_gt, NumericGetDatum(ds), NumericGetDatum(nul)))
 		{
 			*result += FLT_MIN;
-			os = DatumGetNumeric(DirectFunctionCall2(numeric_div,
-	 NumericGetDatum(ds),
-	 NumericGetDatum(us)));
+			os = numeric_div_opt_error(ds, us, NULL);
 			*result += (float4) DatumGetFloat8(DirectFunctionCall1(numeric_float8_no_overflow, NumericGetDatum(os)));
 		}
 	}
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 764efa381f2..c84a70d1ad9 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2008,19 +2008,21 @@ brin_minmax_multi_distance_tid(PG_FUNCTION_ARGS)
 Datum
 brin_minmax_multi_distance_numeric(PG_FUNCTION_ARGS)
 {
-	Datum		d;
-	Datum		a1 = PG_GETARG_DATUM(0);
-	Datum		a2 = PG_GETARG_DATUM(1);
+	Numeric		res;
+	Numeric		a1 = PG_GETARG_NUMERIC(0);
+	Numeric		a2 = PG_GETARG_NUMERIC(1);
 
 	/*
 	 * We know the values are range boundaries, but the range may be collapsed
 	 * (i.e. single points), with equal values.
 	 */
-	Assert(DatumGetBool(DirectFunctionCall2(numeric_le, a1, a2)));
+	Assert(DatumGetBool(DirectFunctionCall2(numeric_le,
+			NumericGetDatum(a1),
+			NumericGetDatum(a2;
 
-	d 

Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)

2022-04-22 Thread Bharath Rupireddy
On Mon, Apr 18, 2022 at 8:48 PM Stephen Frost  wrote:
>
> Greeting,
>
> * Bharath Rupireddy (bharath.rupireddyforpostg...@gmail.com) wrote:
> > On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost  wrote:
> > > * Bharath Rupireddy (bharath.rupireddyforpostg...@gmail.com) wrote:
> > > > Thanks for the comments. Here's a new tool called pg_walcleaner which
> > > > basically deletes (optionally archiving before deletion) the unneeded
> > > > WAL files.
> > > >
> > > > Please provide your thoughts and review the patches.
> > >
> > > Alright, I spent some more time thinking about this and contemplating
> > > what the next steps are... and I feel like the next step is basically
> > > "add a HINT when the server can't start due to being out of disk space
> > > that one should consider running pg_walcleaner" and at that point... why
> > > aren't we just, uh, doing that?  This is all still quite hand-wavy, but
> > > it sure would be nice to be able to avoid downtime due to a broken
> > > archiving setup.  pgbackrest has a way of doing this and while we, of
> > > course, discourage the use of that option, as it means throwing away
> > > WAL, it's an option that users have.  PG could have a similar option.
> > > Basically, to archive_command/library what max_slot_wal_keep_size is for
> > > slots.
> >
> > Thanks. I get your point. The way I see it is that the postgres should
> > be self-aware of the about-to-get-full disk (probably when the data
> > directory size is 90%(configurable, of course) of total disk size) and
> > then freeze the new write operations (may be via new ALTER SYSTEM SET
> > READ-ONLY or setting default_transaction_read_only GUC) and then go
> > clean the unneeded WAL files by just invoking pg_walcleaner tool
> > perhaps. I think, so far, this kind of work has been done outside of
> > postgres. Even then, we might get into out-of-disk situations
> > depending on how frequently we check the data directory size to
> > compute the 90% configurable limit. Detecting the disk size is the KEY
> > here. Hence we need an offline invokable tool like pg_walcleaner.
>
> Ugh, last I checked, figuring out if a given filesystem is near being
> full is a pain to do in a cross-platform way.  Why not just do exactly
> what we already are doing for replication slots, but for
> archive_command?

Do you mean to say that if the archive_command fails, say, for "some
time" or "some number of attempts", the server should just not bother
about it and let checkpoints delete the WAL files instead of running out
of disk space? If this is the thought, then it's more dangerous, as we
might end up losing the WAL forever. For invalidating replication slots
it's okay, because the required WAL can exist somewhere (either on the
primary or in the archive location).

> > > That isn't to say that we shouldn't also have a tool like this, but it
> > > generally feels like we're taking a reactive approach here rather than a
> > > proactive one to addressing the root issue.
> >
> > Agree. The offline tool like pg_walcleaner can help greatly even with
> > some sort of above internal/external disk space monitoring tools.
>
> See, this seems like a really bad idea to me.  I'd be very concerned
> about people mis-using this tool in some way and automating its usage
> strikes me as absolutely exactly that..  Are we sure that we can
> guarantee that we don't remove things we shouldn't when this ends up
> getting run against a running cluster from someone's automated tooling?
> Or when someone sees that it refuses to run for $reason and tries to..
> "fix" that?  Seems quite risky to me..  I'd probably want to put similar
> caveats around using this tool as I do around pg_resetwal when doing
> training- that is, don't ever, ever, ever use this, heh.

The initial version of the patch doesn't check if the server crashed
or not before running it. I was thinking of looking at the
postmaster.pid or pg_control file (however they don't guarantee
whether the server is up or crashed because the server can crash
without deleting postmaster.pid or updating pg_control file). Another
idea is to let pg_walcleaner fire a sample query ('SELECT 1') to see
if the server is up and running, if yes, exit, otherwise proceed with
its work.

Also, to avoid losing WAL permanently, we must recommend using
archive_command so that the WAL can be moved to an alternative
location (which could be the same archive_location that the primary uses).

And yes, we must have clear usage guidelines in the docs.

Regards,
Bharath Rupireddy.




Re: Dump/Restore of non-default PKs

2022-04-22 Thread Peter Eisentraut

On 21.04.22 13:43, Simon Riggs wrote:

1. create the table without primary key
2. create the index
3. attach the index as primary key constraint

That doesn't sound attractive.

Can you explain what you find unattractive about it?


Well, if I want to create a table with a primary key, the established 
way is to say "primary key", not to have to assemble it from multiple 
pieces.


I think this case is very similar to exclusion constraints, which also 
have syntax to specify the index access method.
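
For reference, the three-step assembly quoted above looks roughly like this in
SQL (a hedged sketch with a plain btree index; names are made up):

```
-- 1. create the table without a primary key
CREATE TABLE t (id int NOT NULL);
-- 2. create the index (this is where a non-default definition would go)
CREATE UNIQUE INDEX t_pkey ON t USING btree (id);
-- 3. attach the index as the primary key constraint
ALTER TABLE t ADD CONSTRAINT t_pkey PRIMARY KEY USING INDEX t_pkey;
```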





Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

2022-04-22 Thread Bharath Rupireddy
On Tue, Apr 19, 2022 at 10:42 AM Michael Paquier  wrote:
>
> > I would like to know if there's any problem with the proposed fix.
>
> There is nothing done for the case of compressed segments, meaning
> that you would see the same problem when in the middle of writing a
> segment compressed with gzip or lz4, and that's what you want to avoid
> here.  So the important part is not the pre-padding, it is to make sure
> that there is enough space reserved for the handling of a full segment
> before beginning the work on it.

Right. We find enough disk space, go to write, and suddenly the
write operations fail for some reason, or the VM crashes for a
reason other than disk space. I think the foolproof solution is to
check the available disk space before prepadding or compressing,
and also to use the write-to-a-temp-file-and-then-rename-it-to-the-final-name
approach proposed in the earlier patches in this thread.

Having said that, is there a portable way to find out the available
disk space? I read your response upthread that statvfs isn't
portable to WIN32 platforms. So, do we just say that this part of the
proposed fix (checking disk space before prepadding or compressing)
isn't supported on WIN32, and do only the temp file thing for WIN32?

Regards,
Bharath Rupireddy.




Re: How to generate a WAL record spanning multiple WAL files?

2022-04-22 Thread Bharath Rupireddy
On Wed, Apr 6, 2022 at 6:56 AM Andy Fan  wrote:
>
> On Wed, Apr 6, 2022 at 12:41 AM Robert Haas  wrote:
>>
>> On Tue, Apr 5, 2022 at 10:10 AM Andy Fan  wrote:
>> >> > I wanted to have a WAL record spanning multiple WAL files of size, say
>> >> > 16MB. I'm wondering if the Full Page Images (FPIs) of a TOAST table
>> >> > would help here. Please let me know if there's any way to generate
>> >> > such large WAL records.
>> >>
>> >> It's easier to use pg_logical_emit_message().
>> >
>> > Not sure I understand the question correctly here.  What if I use the
>> > below code where the len might be very large?  like 64MB.
>> >
>> >  XLogBeginInsert();
>> > XLogRegisterData((char *)&xl_append, sizeof(xl_cstore_append));
>> > XLogRegisterData((char *)data, len);
>> >
>> > XLogInsert(..);
>>
>> Well, that's how to do it from C. And pg_logical_emit_message() is how
>> to do it from SQL.
>>
>
> OK, Thanks for your confirmation!

Thanks all for your responses. Yes, using pg_logical_emit_message() is
easy, but it might get in the way of logical decoding, as those
messages get decoded.
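
For completeness, the SQL route is roughly the following (a hedged sketch; the
64MB payload size and the prefix are arbitrary):

```
-- One large logical WAL message; with 16MB segments this record spans
-- several WAL files.  As noted above, logical decoding will see it.
SELECT pg_logical_emit_message(false, 'test-prefix',
                               repeat('x', 64 * 1024 * 1024));
```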

PS: I wrote a small extension (just for fun) called pg_synthesize_wal
[1] implementing functions to generate huge WAL records. I used the
"Custom WAL Resource Managers" feature [2] that got committed to PG15.

[1] https://github.com/BRupireddy/pg_synthesize_wal
[2] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5c279a6d350205cc98f91fb8e1d3e4442a6b25d1

Regards,
Bharath Rupireddy.




Re: Dump/Restore of non-default PKs

2022-04-22 Thread Tom Lane
Peter Eisentraut  writes:
> On 21.04.22 13:43, Simon Riggs wrote:
>> Can you explain what you find unattractive about it?

> Well, if I want to create a table with a primary key, the established 
> way is to say "primary key", not to have to assemble it from multiple 
> pieces.

> I think this case is very similar to exclusion constraints, which also 
> have syntax to specify the index access method.

That analogy would be compelling if exclusion constraints were a
SQL-standard feature; but they aren't so their clause syntax is
fully under our control.  The scenario that worries me is that
somewhere down the pike, the SQL committee might extend the
syntax of PKEY/UNIQUE constraint clauses in a way that breaks
our nonstandard extensions of them.

However, independently of whether we offer a syntax option or not,
it may still simplify pg_dump to make it treat the constraint and
the index as independent objects in all cases.

regards, tom lane




Re: why pg_walfile_name() cannot be executed during recovery?

2022-04-22 Thread Bharath Rupireddy
On Sat, Apr 9, 2022 at 10:21 PM Robert Haas  wrote:
>
> On Sat, Apr 9, 2022 at 12:25 PM Andrey Borodin  wrote:
> > Please excuse me if I'm not attentive enough. I've read this thread. And I 
> > could not find what is the problem that you are solving. What is the 
> > purpose of the WAL file name you want to obtain?
>
> Yeah, I'd also like to know this.

IMO, the uses of pg_walfile_{name, name_offset} are plentiful. Say I have
LSNs (flush, insert, replayed, or the WAL receiver's latest received LSN)
and I would like to know the WAL file name and offset in an app
connecting to postgres, or in a control plane, either for doing some
reporting, or for figuring out whether a WAL file exists for a given LSN, or
for some other reason. With these functions restricted while the server
is in recovery, the apps or control plane code can't use them and
have to do something like: if (!pg_is_in_recovery()) { select * from
pg_walfile_{name, name_offset} }.
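
A concrete sketch of that client-side dance (hedged; it assumes the ELSE
branch only ever runs on a primary):

```
SELECT CASE
         WHEN pg_is_in_recovery() THEN NULL
         ELSE pg_walfile_name(pg_current_wal_lsn())
       END AS current_walfile;
```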

Am I missing any other important use-cases?

Regards,
Bharath Rupireddy.




Re: Fix NULL pointer reference in _outPathTarget()

2022-04-22 Thread Peter Eisentraut



On 20.04.22 18:53, Tom Lane wrote:

I think we could put the if (node->fldname) inside the WRITE_INDEX_ARRAY
macro.


Yeah, that's another way to do it.  I think though that the unresolved
question is whether or not we want the field name to appear in the output
when the field is null.  I believe that I intentionally made it not appear
originally, so that that case could readily be distinguished.  You could
argue that that would complicate life greatly for a _readPathTarget()
function, which is true, but I don't foresee that we'll need one.


We could adapt the convention to print NULL values as "<>", like

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6a02f81ad5..4eb5be3787 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -127,8 +127,11 @@ static void outChar(StringInfo str, char c);
 #define WRITE_INDEX_ARRAY(fldname, len) \
do { \
appendStringInfoString(str, " :" CppAsString(fldname) " "); \
-   for (int i = 0; i < len; i++) \
-   appendStringInfo(str, " %u", node->fldname[i]); \
+   if (node->fldname) \
+   for (int i = 0; i < len; i++) \
+   appendStringInfo(str, " %u", node->fldname[i]); \
+   else \
+   appendStringInfoString(str, "<>"); \
} while(0)

 #define WRITE_INT_ARRAY(fldname, len) \

There is currently no read function for this that would need to be 
changed.  But looking at peers such as WRITE_INT_ARRAY/READ_INT_ARRAY it 
shouldn't be hard to sort out if it became necessary.





Re: Fix NULL pointer reference in _outPathTarget()

2022-04-22 Thread Tom Lane
Peter Eisentraut  writes:
> On 20.04.22 18:53, Tom Lane wrote:
>> Yeah, that's another way to do it.  I think though that the unresolved
>> question is whether or not we want the field name to appear in the output
>> when the field is null.  I believe that I intentionally made it not appear
>> originally, so that that case could readily be distinguished.  You could
>> argue that that would complicate life greatly for a _readPathTarget()
>> function, which is true, but I don't foresee that we'll need one.

> We could adapt the convention to print NULL values as "<>", like

Works for me.

regards, tom lane




Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?

2022-04-22 Thread Bharath Rupireddy
On Sat, Apr 9, 2022 at 6:38 PM Julien Rouhaud  wrote:
>
> On Sat, Apr 09, 2022 at 02:38:50PM +0530, Bharath Rupireddy wrote:
> > On Fri, Apr 8, 2022 at 10:22 PM SATYANARAYANA NARLAPURAM
> >  wrote:
> > >
> > >> >  wrote:
> > >> > >
> > >> > > Hi,
> > >> > >
> > >> > > I'm thinking if there's a way in core postgres to achieve $subject. In
> > >> > > reality, the sync/async standbys can either be closer/farther (which
> > >> > > means sync/async standbys can receive WAL at different times) to
> > >> > > primary, especially in cloud HA environments with primary in one
> > >> > > Availability Zone(AZ)/Region and standbys in different AZs/Regions.
> > >> > > $subject may not be possible on dev systems (say, for testing some HA
> > >> > > features) unless we can inject a delay in WAL senders before sending
> > >> > > WAL.
> > >
> > > Simulation will be helpful even for end customers to simulate faults in the
> > > production environments during availability zone/disaster recovery drills.
> >
> > Right.
>
> I'm not sure that's actually helpful.  If you want to do some realistic
> testing you need to fully simulate various network incidents, and only
> delaying postgres replication is never going to be close to that.  You
> should instead rely on a tool like tc, which can do much more than what
> $subject could ever do, and do that for your whole HA stack.  At the very
> least you don't want to validate that your setup is working as expected
> by just simulating a faulty postgres replication connection while all
> your clients and HA agent have no network issue at all.

Agree that the external networking tools and commands can be used.
IMHO, not everyone is familiar with those tools and the tools may not
be portable and reliable all the time. And developers may not be able
to use those tools to test some of the HA related features (which may
require sync and async standbys being closer/farther to the primary)
that I or some other postgres HA solution providers may develop.
Having a reliable way within the core would actually help.

Upon thinking further, how about we have hooks in WAL sender code
(perhaps with replication slot info that it manages and some other
info) and one can implement an extension of their choice (similar to
auth_delay and ClientAuthentication_hook)?

Regards,
Bharath Rupireddy.




Cryptohash OpenSSL error queue in FIPS enabled builds

2022-04-22 Thread Daniel Gustafsson
In trying out an OpenSSL 3.1 build with FIPS enabled I realized that our
cryptohash code had a small issue.  Calling a banned cipher generated two
different error messages interleaved:

  postgres=# select md5('foo');
  ERROR:  could not compute MD5 hash: unsupported
  postgres=# select md5('foo');
  ERROR:  could not compute MD5 hash: initialization error

It turns out that OpenSSL places two errors in the queue for this operation,
and we only consume one without clearing the queue in between, so we grab an
error from the previous run.

Consuming all (both) errors and creating a concatenated string seems overkill,
as it would alter the API from a const error string to something that needs
freeing etc (also, very few OpenSSL consumers actually drain the queue; OpenSSL
themselves don't).  Skimming the OpenSSL code I was unable to find another
example of two errors being generated.  The attached calls ERR_clear_error() as
we do in libpq, in order to avoid consuming earlier errors.

--
Daniel Gustafsson   https://vmware.com/



v1-0001-Clear-the-OpenSSL-error-queue-before-cryptohash-o.patch
Description: Binary data


Re: [PATCH] Compression dictionaries for JSONB

2022-04-22 Thread Zhihong Yu
On Fri, Apr 22, 2022 at 1:30 AM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi hackers,
>
> This is a follow-up thread to `RFC: compression dictionaries for JSONB`
> [1]. I would like to share my current progress in order to get early
> feedback. The patch is currently in a draft state but implements the basic
> functionality. I did my best to account for all the great feedback I
> previously got from Alvaro and Matthias.
>
> Usage example:
>
> ```
> CREATE TYPE mydict AS DICTIONARY OF jsonb ('aaa', 'bbb');
>
> SELECT '{"aaa":"bbb"}' ::  mydict;
>  mydict
> 
>  {"aaa": "bbb"}
>
> SELECT ('{"aaa":"bbb"}' ::  mydict) -> 'aaa';
>  ?column?
> --
>  "bbb"
> ```
>
> Here `mydict` works as a transparent replacement for `jsonb`. However, its
> internal representation differs. The provided dictionary entries ('aaa',
> 'bbb') are stored in the new catalog table:
>
> ```
> SELECT * FROM pg_dict;
>   oid  | dicttypid | dictentry
> ---+---+---
>  39476 | 39475 | aaa
>  39477 | 39475 | bbb
> (2 rows)
> ```
>
> When `mydict` sees 'aaa' in the document, it replaces it with the
> corresponding code, in this case - 39476. For more details regarding the
> compression algorithm and choosen compromises please see the comments in
> the patch.
>
> In pg_type `mydict` has typtype = TYPTYPE_DICT. It works the same way as
> TYPTYPE_BASE with only difference: corresponding `_in`
> (pg_type.typinput) and `_` (pg_cast.castfunc)
> procedures receive the dictionary Oid as a `typmod` argument. This way the
> procedures can distinguish `mydict1` from `mydict2` and use the proper
> compression dictionary.
>
> The approach with alternative `typmod` role is arguably a bit hacky, but
> it was the less invasive way to implement the feature I've found. I'm open
> to alternative suggestions.
>
> Current limitations (todo):
> - ALTER TYPE is not implemented
> - Tests and documentation are missing
> - Autocomplete is missing
>
> Future work (out of scope of this patch):
> - Support types other than JSONB: TEXT, XML, etc
> - Automatically updated dictionaries, e.g. during VACUUM
> - Alternative compression algorithms. Note that this will not require any
> further changes in the catalog, only the values we write to pg_type and
> pg_cast will differ.
>
> Open questions:
> - Dictionary entries are currently stored as NameData, the same type that
> is used for enums. Are we OK with the accompanying limitations? Any
> alternative suggestions?
> - All in all, am I moving the right direction?
>
> Your feedback is very much welcomed!
>
> [1]:
> https://postgr.es/m/CAJ7c6TPx7N-bVw0dZ1ASCDQKZJHhBYkT6w4HV1LzfS%2BUUTUfmA%40mail.gmail.com
>
> --
> Best regards,
> Aleksander Alekseev
>
Hi,
For src/backend/catalog/pg_dict.c, please add license header.

+   elog(ERROR, "skipbytes > decoded_size - outoffset");

Include the values for skipbytes, decoded_size and outoffset.

Cheers


Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-22 Thread David G. Johnston
On Wed, Apr 20, 2022 at 11:38 PM bu...@sohu.com  wrote:

> > > for now the function cost_subqueryscan always uses the *total* rows even
> > > for a parallel path, like this:
> > >
> > > Gather (rows=3)
> > >   Workers Planned: 2
> > >   ->  Subquery Scan  (rows=3) -- *total* rows, should be equal to subpath
> > >         ->  Parallel Seq Scan  (rows=1)
> >
> > OK, that's bad.
>

I don't understand how that plan shape is possible.  Gather requires a
parallel-aware subpath, so that the subpath can be executed multiple times in
parallel, and a Subquery Scan isn't one.  If there is parallelism happening
within a subquery, the results are consolidated using Append or Gather first -
and the output rows of that path entry (all subpaths of Subquery have the same
->rows value per set_subquery_size_estimates) become the input tuples for
Subquery, to which it then applies its selectivity multiplier and stores
the final result in baserel->rows, which the costing code then examines
when costing the RTE_SUBQUERY path entry.

David J.


Re: Skipping schema changes in publication

2022-04-22 Thread Bharath Rupireddy
On Tue, Mar 22, 2022 at 12:39 PM vignesh C  wrote:
>
> Hi,
>
> This feature adds an option to skip changes of all tables in specified
> schema while creating publication.
> This feature is helpful for use cases where the user wants to
> subscribe to all the changes except for the changes present in a few
> schemas.
> Ex:
> CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
> OR
> ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;
>
> A new column pnskip is added to table "pg_publication_namespace", to
> maintain the schemas that the user wants to skip publishing through
> the publication. Modified the output plugin (pgoutput) to skip
> publishing the changes if the relation is part of skip schema
> publication.
> As a continuation to this, I will work on implementing skipping tables
> from all tables in schema and skipping tables from all tables
> publication.
>
> Attached patch has the implementation for this.
> This feature is for the pg16 version.
> Thoughts?

The feature seems to be useful, especially when there are lots of
schemas in a database. However, I don't quite like the syntax. Do we
have a 'SKIP' keyword in any SQL statement in the SQL standard?
Can we think of adding skip_schema_list as an option, something like
below?

CREATE PUBLICATION foo FOR ALL TABLES (skip_schema_list = 's1, s2');
ALTER PUBLICATION foo SET (skip_schema_list = 's1, s2'); - to set
ALTER PUBLICATION foo SET (skip_schema_list = ''); - to reset

Regards,
Bharath Rupireddy.




Re: Handle infinite recursion in logical replication setup

2022-04-22 Thread vignesh C
On Tue, Apr 19, 2022 at 8:29 AM Peter Smith  wrote:
>
> I checked the latest v9-0001 patch. Below are my review comments.
>
> Other than these few trivial comments this 0001 patch looks good to me.
>
> ~~~
>
> 1. src/backend/replication/pgoutput/pgoutput.c - whitespace
>
> @@ -1696,6 +1714,10 @@ static bool
>  pgoutput_origin_filter(LogicalDecodingContext *ctx,
>  RepOriginId origin_id)
>  {
> + PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
> +
> + if (data->local_only && origin_id != InvalidRepOriginId)
> + return true;
>   return false;
>  }
>
> Suggest to add a blank line after the return true;

Modified

> ~~~
>
> 2. src/bin/psql/tab-complete.c - not alphabetical
>
> @@ -1874,7 +1874,7 @@ psql_completion(const char *text, int start, int end)
>   COMPLETE_WITH("(", "PUBLICATION");
>   /* ALTER SUBSCRIPTION  SET ( */
>   else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) &&
> TailMatches("SET", "("))
> - COMPLETE_WITH("binary", "slot_name", "streaming",
> "synchronous_commit", "disable_on_error");
> + COMPLETE_WITH("binary", "slot_name", "streaming", "local_only",
> "synchronous_commit", "disable_on_error");
>
> 2a. AFAIK the code intended that these options be listed in
> alphabetical order (I think the recent addition of disable_on_error is
> also wrong here). So "local_only" should be moved.

Modified

> @@ -3156,7 +3156,7 @@ psql_completion(const char *text, int start, int end)
>   /* Complete "CREATE SUBSCRIPTION  ...  WITH ( " */
>   else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "("))
>   COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
> -   "enabled", "slot_name", "streaming",
> +   "enabled", "slot_name", "streaming", "local_only",
> "synchronous_commit", "two_phase", "disable_on_error");
>
> 2b. ditto

Modified

> ~~~
>
> 3. src/test/subscription/t/032_localonly.pl - wrong message
>
> +$node_C->wait_for_catchup($appname_B2);
> +$node_B->wait_for_catchup($appname_A);
> +
> +$result = $node_B->safe_psql('postgres', "SELECT * FROM tab_full;");
> +is( $result, qq(11
> +12
> +13),
> + 'Inserted successfully without leading to infinite recursion in
> circular replication setup'
> +);
> +
> +# check that the data published from node_C to node_B is not sent to node_A
> +$result = $node_A->safe_psql('postgres', "SELECT * FROM tab_full;");
> +is( $result, qq(11
> +12),
> + 'Inserted successfully without leading to infinite recursion in
> circular replication setup'
> +);
> +
>
> The new test looked good, but the cut/paste text message ('Inserted
> successfully without leading to infinite recursion in circular
> replication setup') maybe needs changing because there is nothing
> really "circular" about this test case.

Modified

Attached v10 patch has the changes for the same.

Regards,
Vignesh
From 20a6e0f29a2abc63c141bdf72930011358d7b2db Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Fri, 8 Apr 2022 11:10:05 +0530
Subject: [PATCH v10 1/2] Skip replication of non local data.

This patch adds a new SUBSCRIPTION boolean option
"local_only". The default is false. When a SUBSCRIPTION is
created with this option enabled, the publisher will only publish data
that originated at the publisher node.
Usage:
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres port='
PUBLICATION pub1 with (local_only = true);
---
 doc/src/sgml/ref/alter_subscription.sgml  |   5 +-
 doc/src/sgml/ref/create_subscription.sgml |  12 ++
 src/backend/catalog/pg_subscription.c |   1 +
 src/backend/catalog/system_views.sql  |   4 +-
 src/backend/commands/subscriptioncmds.c   |  26 ++-
 .../libpqwalreceiver/libpqwalreceiver.c   |   5 +
 src/backend/replication/logical/worker.c  |   2 +
 src/backend/replication/pgoutput/pgoutput.c   |  23 +++
 src/bin/pg_dump/pg_dump.c |  17 +-
 src/bin/pg_dump/pg_dump.h |   1 +
 src/bin/psql/describe.c   |   8 +-
 src/bin/psql/tab-complete.c   |   4 +-
 src/include/catalog/pg_subscription.h |   3 +
 src/include/replication/logicalproto.h|  10 +-
 src/include/replication/pgoutput.h|   1 +
 src/include/replication/walreceiver.h |   1 +
 src/test/regress/expected/subscription.out| 142 ---
 src/test/regress/sql/subscription.sql |  10 ++
 src/test/subscription/t/032_localonly.pl  | 166 ++
 19 files changed, 369 insertions(+), 72 deletions(-)
 create mode 100644 src/test/subscription/t/032_localonly.pl

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 353ea5def2..c5ebcf5500 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -207,8 +207,9 @@ ALTER SUBSCRIPTION name RENAME TO <
   information.  The parameters that can be altered
   are slot_name,
   synchronous_commit,
-  binary, streaming, and
-  disable_on_error.
+  

Re: Handle infinite recursion in logical replication setup

2022-04-22 Thread vignesh C
On Wed, Apr 20, 2022 at 7:26 AM Peter Smith  wrote:
>
> Below are my review comments for the v9-0002 patch (except I did not
> yet look at the TAP tests).
>
> ~~~
>
> 1. General comment - describe.c
>
> I wondered why the copy_data enum value is not displayed by the psql
> \drs+ command. Should it be?
>

I noticed that we generally don't display the values for the options.
I did not add it to keep it consistent with other options.

> ~~~
>
> 2. General comment - SUBOPT_LOCAL_ONLY
>
> @@ -1134,7 +1204,7 @@ AlterSubscription(ParseState *pstate,
> AlterSubscriptionStmt *stmt,
>
>   case ALTER_SUBSCRIPTION_SET_PUBLICATION:
>   {
> - supported_opts = SUBOPT_COPY_DATA | SUBOPT_REFRESH;
> + supported_opts = SUBOPT_COPY_DATA | SUBOPT_REFRESH | SUBOPT_LOCAL_ONLY;
>   parse_subscription_options(pstate, stmt->options,
>  supported_opts, &opts);
>
> @@ -1236,7 +1308,8 @@ AlterSubscription(ParseState *pstate,
> AlterSubscriptionStmt *stmt,
>   errmsg("ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled
> subscriptions")));
>
>   parse_subscription_options(pstate, stmt->options,
> -SUBOPT_COPY_DATA, &opts);
> +SUBOPT_COPY_DATA | SUBOPT_LOCAL_ONLY,
> +&opts);
>
> I noticed there is some new code that appears to be setting the
> SUBOT_LOCAL_ONLY as a supported option. Shouldn't those changes belong
> in the patch 0001 instead of in this patch 0002?

Modified

> ~~~
>
> 3. Commit message - wording
>
> CURRENT
> a) Node1 - Publication publishing employee table.
> b) Node2 - Subscription subscribing from publication pub1 with
> local_only.
> c) Node2 - Publication publishing employee table.
> d) Node1 - Subscription subscribing from publication pub2 with
> local_only.
>
> SUGGESTION (I think below has the same meaning but is simpler)
> a) Node1 - PUBLICATION pub1 for the employee table
> b) Node2 - SUBSCRIPTION from pub1 with local_only=true
> c) Node2 - PUBLICATION pub2 for the employee table
> d) Node1 - SUBSCRIPTION from pub2 with local_only=true

Modified

> ~~~
>
> 4. Commit message - meaning?
>
> CURRENT
> Now when user is trying to add another node Node3 to the
> above Multi master logical replication setup:
> a) user will have to create one subscription subscribing from Node1 to
> Node3
> b) user wil have to create another subscription subscribing from
> Node2 to Node3 using local_only option and copy_data as true.
>
> While the subscription is created, server will identify that Node2 is
> subscribing from Node1 and Node1 is subscribing from Node2 and throw an
> error so that user can handle the initial copy data.
>
> ~
>
> The wording above confused me. Can you clarify it?
> e.g.
> a) What exactly was the user hoping to achieve here?
> b) Are the user steps doing something deliberately wrong just so you
> can describe later that an error gets thrown?

Modified

> ~~~
>
> 5. doc/src/sgml/logical-replication.sgml - how to get here?
>
> I didn’t see any easy way to get to this page. (No cross refs from anywhere?)

I have added a link from create subscription page

> ~~~
>
> 6. doc/src/sgml/logical-replication.sgml - section levels
>
> I think the section levels may be a bit messed up. e.g. The HTML
> rendering of sections looks a bit confused. Maybe this is same as my
> review comment #13.

Modified

> ~~
>
> 7. doc/src/sgml/logical-replication.sgml - headings
>
> Setting Bidirection logical replication between two nodes:
>
> 7a. Maybe better just to have a simpler main heading like
> "Bidirectional logical replication".

Modified

> 7b. Don't put ":" in the title.

Modified

> ~~~
>
> 8. doc/src/sgml/logical-replication.sgml - missing intro
>
> IMO this page needs some sort of introduction/blurb instead of leaping
> straight into examples without any preamble text to give context.

Modified

> ~~~
>
> 9. doc/src/sgml/logical-replication.sgml - bullets
>
> Suggest removing all the bullets from the example steps. (I had
> something similar a while ago for the "Row Filter" page but
> eventually, they all had to be removed).

Modified

> ~~~
>
> 10. doc/src/sgml/logical-replication.sgml - SQL too long
>
> Many of the example commands are much too long, and for me, they are
> giving scroll bars when rendered. It would be better if they can be
> wrapped in appropriate places so easier to read (and no resulting
> scroll bars).

Modified

> ~~~
>
> 11. doc/src/sgml/logical-replication.sgml - add the psql prompt
>
> IMO it would also be easier to understand the examples if you show the
> psql prompt. Then you can easily know the node context without having
> to describe it in the text so often.
>
> e.g.
>
> +
> + Create the subscription in node2 to subscribe the changes from node1:
> +
> +CREATE SUBSCRIPTION sub_node1_node2 ...
>
> SUGGGESTION
> +
> + Create the subscription in node2 to subscribe the changes from node1
> +
> +node_2=# CREATE SUBSCRIPTION sub_node1_node2 ...

Modified

> ~~~
>
> 12. doc/src/sgml/logical-replication.sgml - typo
>
> +  
> +   Now the BiDirectional logical re

Re: Handle infinite recursion in logical replication setup

2022-04-22 Thread vignesh C
On Wed, Apr 20, 2022 at 11:19 AM Peter Smith  wrote:
>
> Below are my review comments for the v9-0002 patch (TAP test part only).
>
> (The order of my comments is a bit muddled because I jumped around in
> the file a bit while reviewing it).
>
> ==
>
> 1. create_subscription - missing comment.
>
> +sub create_subscription
> +{
> + my ($node_subscriber, $node_publisher, $sub_name, $node_connstr,
> + $application_name, $pub_name, $copy_data_val)
> +   = @_;
>
> Maybe add a comment for this subroutine and describe the expected parameters.

Added

> ~~
>
> 2. create_subscription - hides the options
>
> IMO the "create_subscription" subroutine is hiding too many of the
> details, so now it is less clear (from the caller's POV) what the test
> is really doing. Perhaps the options could be passed more explicitly
> to this subroutine.
>
> e.g. Instead of
>
> create_subscription($node_B, $node_A, $appname_B1, $node_A_connstr,
>$appname_B1, 'tap_pub_A', 'on');
>
> perhaps explicitly set the WITH options like:
>
> create_subscription($node_B, $node_A, $appname_B1, $node_A_connstr,
>$appname_B1, 'tap_pub_A', 'local_only = on, copy_data = on');

Modified

> ~~~
>
> 3. the application names are confusing
>
> +my $appname_A1 = 'tap_sub_A_B';
> +my $appname_A2 = 'tap_sub_A_C';
> +my $appname_B1 = 'tap_sub_B_A';
> +my $appname_B2 = 'tap_sub_B_C';
> +my $appname_C1 = 'tap_sub_C_A';
> +my $appname_C2 = 'tap_sub_C_B';
>
> I found the application names using '1' and '2' to be a bit confusing.
> i.e  it was unnecessarily hard to associate them (in my head) with
> their relevant subscriptions. IMO it would be easier to name them
> using letters like below:
>
> SUGGESTED
> my $appname_AB = 'tap_sub_A_B';
> my $appname_AC = 'tap_sub_A_C';
> my $appname_BA = 'tap_sub_B_A';
> my $appname_BC = 'tap_sub_B_C';
> my $appname_CA = 'tap_sub_C_A';
> my $appname_CB = 'tap_sub_C_B';

Removed appname and used subscription names for application name

> ~~~
>
> 4. create_subscription - passing the $appname 2x.
>
> +create_subscription($node_B, $node_A, $appname_B1, $node_A_connstr,
> +   $appname_B1, 'tap_pub_A', 'on');
>
>
> It seemed confusing that the $app_name is passed twice.
>
> IMO should rename all those $appname_XX vars (see previous comment) to
> be $subname_XX, and just pass that create_subscription instead.
>
> my $subname_AB = 'tap_sub_A_B';
> my $subname_AC = 'tap_sub_A_C';
> my $subname_BA = 'tap_sub_B_A';
> my $subname_BC = 'tap_sub_B_C';
> my $subname_CA = 'tap_sub_C_A';
> my $subname_CB = 'tap_sub_C_B';
>
> Then create_subscription subroutine should have one less argument.
> Just add a comment saying that the appname is always assigned the same
> value as the subname.

Modified

> ~~~
>
> 5. cleanup part - seems a bit misleading
>
> +# cleanup
> +$node_B->safe_psql(
> +   'postgres', "
> +DROP SUBSCRIPTION $appname_B2");
> +$node_C->safe_psql(
> + 'postgres', "
> +DELETE FROM tab_full");
> +$node_B->safe_psql(
> + 'postgres', "
> +DELETE FROM tab_full where a = 13");
>
> Is that comment misleading? IIUC this is not really cleaning up
> everything. It is just a cleanup of the previous Node_C test part
> isn't it?

Modified it to clean up only the operations done by this test

> ~~~
>
> 6. Error case (when copy_data is true)
>
> +# Error when creating susbcription with local_only and copy_data as true when
> +# the publisher has replicated data
>
> 6a. Typo "susbcription"

Modified

> 6b. That comment maybe needs some more explanation - eg. say that
> since Node_A is already subscribing to Node_B so when Node_B makes
> another subscription to Node_A the copy doesn't really know if the
> data really originated from Node_A or not...

Slightly reworded and modified
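
(For context, a minimal sketch of the scenario described in 6b, using the
option names from the patch under review; the node names and connection
strings are illustrative and the final syntax may differ.)

-- On Node_B:
CREATE PUBLICATION tap_pub_B FOR TABLE tab_full;

-- On Node_A: subscribe to Node_B, so rows created on Node_B start
-- flowing into Node_A's copy of tab_full.
CREATE PUBLICATION tap_pub_A FOR TABLE tab_full;
CREATE SUBSCRIPTION tap_sub_A_B
    CONNECTION 'host=node_B dbname=postgres'
    PUBLICATION tap_pub_B
    WITH (local_only = on, copy_data = on);

-- On Node_B: the initial copy from Node_A would also fetch rows that
-- really originated on Node_B, and the copy cannot tell them apart, so
-- with copy_data = on the patch raises an error; copy_data = force
-- means "copy everything anyway".
CREATE SUBSCRIPTION tap_sub_B_A
    CONNECTION 'host=node_A dbname=postgres'
    PUBLICATION tap_pub_A
    WITH (local_only = on, copy_data = force);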

> 6c. Maybe the comment needed to be more like ## style to
> denote this (and the one that follows) is a separate test case.

Modified

> ~~~
>
> 7. Error case (when copy_data is force)
>
> +# Creating subscription with local_only and copy_data as force should be
> +# successful when the publisher has replicated data
>
> 7a. Same typo "subscription"

I felt "subscription" is spelled correctly in this case, so I made no change for this

> ~~~
>
> 8. Add 3rd node when the existing node has some data
>
> 8a. Maybe that comment needs to be expanded more. This isn't just
> "adding a node" - you are joining it to the others as another
> bi-directional participant in a 3-way group. And the comment should
> clarify exactly which nodes have the initial data. Both the existing 2
> you are joining to? The new one only? All of them?

Modified

> 8b. IMO it would be much clearer to do a SELECT from all the nodes at
> the start of this test, just to re-confirm what the initial data on
> these nodes is before joining the 3rd node.

Modified

> NOTE - These same review comments apply to the other test combinations too
> - Add 3rd node when the existing node has no data
> - Add 3rd node when the new node has some data

Modified

> ~~~
>
> 9. 

Re: Cryptohash OpenSSL error queue in FIPS enabled builds

2022-04-22 Thread Tom Lane
Daniel Gustafsson  writes:
> It turns out that OpenSSL places two errors in the queue for this operation,
> and we only consume one without clearing the queue in between, so we grab an
> error from the previous run.

Ugh.

> Consuming all (both) errors and creating a concatenated string seems
> overkill, as it would alter the API from a const error string to something
> that needs freeing etc (also, very few OpenSSL consumers actually drain the
> queue; OpenSSL themselves don't).  Skimming the OpenSSL code, I was unable
> to find another example where two errors are generated.  The attached calls
> ERR_clear_error(), as we do in libpq, in order to avoid consuming earlier
> errors.

This seems quite messy.  How would clearing the queue *before* creating
the object improve matters?  It seems like that solution means you're
leaving an extra error in the queue to break unrelated code.  Wouldn't
it be better to clear after grabbing the error?  (Or maybe do both.)

Also, a comment seems advisable.

regards, tom lane




Why is EXECUTE granted to PUBLIC for all routines?

2022-04-22 Thread Jacek Trocinski
Hi,

The default behavior on Postgres is to grant EXECUTE to PUBLIC on any
function or procedure that is created.

I feel this is a security concern, especially for procedures and
functions defined with the "SECURITY DEFINER" clause.

Normally, we don’t want everyone on the database to be able to run
procedures or functions without explicitly granting them the privilege
to do so.

Is there any reason to keep granting EXECUTE to PUBLIC on routines as the default?

Best,
Jacek Trocinski




Re: Why is EXECUTE granted to PUBLIC for all routines?

2022-04-22 Thread Tom Lane
Jacek Trocinski  writes:
> The default behavior on Postgres is to grant EXECUTE to PUBLIC on any
> function or procedure that is created.

> I feel this is a security concern, especially for procedures and
> functions defined with the "SECURITY DEFINER" clause.

There is zero security concern for non-SECURITY-DEFINER functions,
since they do nothing callers couldn't do for themselves.  For those,
you typically do want to grant out permissions.  As for SECURITY DEFINER
functions, there is no reason to make one unless it is meant to be called
by someone besides the owner.  Perhaps PUBLIC isn't the scope you want to
grant it to, but no-privileges wouldn't be a useful default there either.

In any case, changing this decision now would cause lots of problems,
such as breaking existing dump files.  We're unlikely to revisit it.

As noted in the docs, best practice is to adjust the permissions
as you want them in the same transaction that creates the function.
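
(For reference, a minimal sketch of that practice; the function body and the
role name app_admin are illustrative only.)

BEGIN;

-- A do-nothing SECURITY DEFINER function, just to show the pattern.
CREATE FUNCTION maintenance_ping() RETURNS void
LANGUAGE sql
SECURITY DEFINER
SET search_path = pg_catalog
AS $$ SELECT pg_catalog.pg_sleep(0); $$;

-- Remove the default PUBLIC grant and grant EXECUTE only to the intended
-- role before committing, so there is no window in which PUBLIC can call it.
REVOKE EXECUTE ON FUNCTION maintenance_ping() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION maintenance_ping() TO app_admin;

COMMIT;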

regards, tom lane




Re: Postgres perl module namespace

2022-04-22 Thread Andres Freund
On 2022-04-21 09:42:44 -0400, Andrew Dunstan wrote:
> On 2022-04-21 Th 00:11, Michael Paquier wrote:
> > On Wed, Apr 20, 2022 at 03:56:17PM -0400, Andrew Dunstan wrote:
> >> Basically I propose just to remove any mention of the Testlib items and
> >> get_free_port from the export and alias lists for versions where they
> >> are absent. If backpatchers need a function they can backport it if
> >> necessary.
> > Agreed.  I am fine to stick to that (I may have done that only once or
> > twice in the past years, so that does not happen a lot either IMO).
> > The patch in itself looks like an improvement in the right direction,
> > so +1 from me.

> Thanks, pushed.

Thanks for working on this!




Re: Building Postgres with lz4 on Visual Studio

2022-04-22 Thread Andres Freund
Hi,

Michael, Dilip, I think you worked most in this area? Based on
9ca40dcd4d0cad43d95a9a253fafaa9a9ba7de24

Robert, added you too, because zstd seems to have the same issue (based on the
tail of the quoted email below).

On 2022-04-13 17:21:41 +0300, Melih Mutlu wrote:
> I tried to build Postgres from source using Visual Studio 19. It all
> worked fine.
> Then I wanted to build it with some dependencies, starting with the ones
> listed here [1]. But I'm having some issues with lz4.
> 
> First I downloaded the latest release of lz4 from this link [2].
> Modified the src\tools\msvc\config.pl file as follows:
> 
> > use strict;
> > use warnings;
> >
> > our $config;
> > $config->{"tap_tests"} = 1;
> > $config->{"asserts"} = 1;
> >
> > $config->{"lz4"} = "c:/lz4/";
> > $config->{"openssl"} = "c:/openssl/1.1/";
> > $config->{"perl"} = "c:/strawberry/$ENV{DEFAULT_PERL_VERSION}/perl/";
> > $config->{"python"} = "c:/python/";
> >
> > 1;
> 
> based on /src/tools/ci/windows_build_config.pl
> 
> Then ran the following commands:
> 
> > vcvarsall x64
> > perl src/tools/msvc/mkvcbuild.pl
> > msbuild  -m -verbosity:minimal /p:IncludePath="C:\lz4" pgsql.sln

I don't think the /p:IncludePath should be needed; the build scripts should
add that.


> msbuild command fails with the following error:
> "LINK : fatal error LNK1181: cannot open input file
> 'c:\lz4\\lib\liblz4.lib' [C:\postgres\libpgtypes.vcxproj]"
> 
> What I realized is that c:\lz4\lib\liblz4.lib actually does not exist.
> The latest versions of lz4, downloaded from [2], no longer contain
> \liblz4.lib, as opposed to what's written here [3]. Also, there isn't a
> lib folder either.
> 
> After those changes on the lz4 side, AFAIU it seems like this line adds the
> library from the wrong path in the Solution.pm file [4].
> 
> > $proj->AddIncludeDir($self->{options}->{lz4} . '\include');
> > $proj->AddLibrary($self->{options}->{lz4} . '\lib\liblz4.lib');
> 
> 
> Even though I spent some time on this problem and tried to fix a few
> places, I'm not able to get a successful build yet.
> This is also the case for zstd: enabling it gives the same error.
> 
> Has anyone had this issue before? Is this something anyone is aware of and
> has managed to make work?
> I would appreciate any comments/suggestions.

Greetings,

Andres Freund




Re: pgsql: Allow db.schema.table patterns, but complain about random garbag

2022-04-22 Thread Robert Haas
On Fri, Apr 22, 2022 at 10:24 AM Andrew Dunstan  wrote:
> On 2022-04-22 Fr 10:04, Tom Lane wrote:
> > Andrew Dunstan  writes:
> >> This has upset the buildfarm's msys2 animals. There appears to be some
> >> wildcard expansion going on that causes the problem. I don't know why it
> >> should here when it's not causing trouble elsewhere. I have tried
> >> changing the way the tests are quoted, without success. Likewise,
> >> setting SHELLOPTS=noglob didn't work.
> >> At this stage I'm fresh out of ideas to fix it. It's also quite possible
> >> that my diagnosis is wrong.
> > When I was looking at this patch, I thought the number of test cases
> > was very substantially out of line anyway.  I suggest that rather
> > than investing a bunch of brain cells trying to work around this,
> > we just remove the failing test cases.
>
> WFM.

Sure, see also 
http://postgr.es/m/CA+TgmoYRGUcFBy6VgN0+Pn4f6Wv=2h0hzluphqsy6vc8ba7...@mail.gmail.com
where Andrew's opinion on how to fix this was sought.

I have to say the fact that IPC::Run does shell-glob expansion of its
arguments on some machines and not others seems ludicrous to me. This
patch may be overtested, but such a radical behavior difference is
completely nuts. How is anyone supposed to write reliable tests for
any feature in the face of such wildly inconsistent behavior?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Allow db.schema.table patterns, but complain about random garbag

2022-04-22 Thread Thomas Munro
On Sat, Apr 23, 2022 at 8:06 AM Robert Haas  wrote:
> Sure, see also 
> http://postgr.es/m/CA+TgmoYRGUcFBy6VgN0+Pn4f6Wv=2h0hzluphqsy6vc8ba7...@mail.gmail.com
> where Andrew's opinion on how to fix this was sought.
>
> I have to say the fact that IPC::Run does shell-glob expansion of its
> arguments on some machines and not others seems ludicrous to me. This
> patch may be overtested, but such a radical behavior difference is
> completely nuts. How is anyone supposed to write reliable tests for
> any feature in the face of such wildly inconsistent behavior?

Yeah, I was speculating that it's a bug in IPC::Run that has been
fixed (by our very own Noah), and some of the machines are still
running the buggy version.

(Not a Windows person, but I speculate the reason that such a stupid
bug is even possible may be that Windows lacks a way to 'exec' stuff
with a passed-in unadulterated argv[] array, so you always need to
build a full shell command subject to interpolation, so if you're
trying to emulate an argv[]-style interface you have to write the code
to do the escaping, and so everyone gets a chance to screw that up.)




[PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-04-22 Thread David Christensen
Hi -hackers,

Enclosed is a patch to allow extraction/saving of FPI from the WAL
stream via pg_waldump.

Description from the commit:

Extracts full-page images from the WAL stream into a target directory, which
must be empty or not exist.  These images are subject to the same filtering
rules as normal display in pg_waldump, which means that you can isolate the
full page writes to a target relation, among other things.

Files are saved with a filename of the form
<LSN>.<tablespace>.<database>.<relfilenode>.<fork>, with formatting to make
things somewhat sortable; for instance:

-01C0.1663.1.6117.0
-01000150.1664.0.6115.0
-010001E0.1664.0.6114.0
-01000270.1663.1.6116.0
-01000300.1663.1.6113.0
-01000390.1663.1.6112.0
-01000420.1663.1.8903.0
-010004B0.1663.1.8902.0
-01000540.1663.1.6111.0
-010005D0.1663.1.6110.0

It's noteworthy that the raw images do not have the current LSN stored with
them in the WAL stream (as would be true for on-heap versions of the blocks),
nor would the checksum be valid in them (though WAL itself has checksums, so
there is some protection there).  This patch chooses to place the LSN and
calculate the proper checksum (if non-zero in the source image) in the output
block.  (This could perhaps be a targeted flag if we decide we don't always
want this.)

These images could be loaded/inspected via `pg_read_binary_file()` and
used in the `pageinspect`
suite of tools to perform detailed analysis on the pages in question,
based on historical
information, and may come in handy for forensics work.
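
(As an illustrative sketch of that kind of inspection: the file path below is
made up, pg_read_binary_file() requires appropriate privileges and a path the
server can read, and the pageinspect extension must be installed.)

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Page header of one saved image.
SELECT lsn, checksum, lower, upper, special
FROM page_header(
       pg_read_binary_file('fpi_dump/00000001-010001E0.1664.0.6114.0'));

-- For a heap page, the line pointers and tuple headers can be decoded too.
SELECT lp, lp_off, lp_len, t_xmin, t_xmax
FROM heap_page_items(
       pg_read_binary_file('fpi_dump/00000001-010001E0.1664.0.6114.0'));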

Best,

David


v1-pg_waldump-save-fpi.patch
Description: Binary data


Re: pgsql: Allow db.schema.table patterns, but complain about random garbag

2022-04-22 Thread Noah Misch
On Sat, Apr 23, 2022 at 09:12:20AM +1200, Thomas Munro wrote:
> On Sat, Apr 23, 2022 at 8:06 AM Robert Haas  wrote:
> > I have to say the fact that IPC::Run does shell-glob expansion of its
> > arguments on some machines and not others seems ludicrous to me. This
> > patch may be overtested, but such a radical behavior difference is
> > completely nuts. How is anyone supposed to write reliable tests for
> > any feature in the face of such wildly inconsistent behavior?

The MinGW gcc crt*.o files do shell-glob expansion on the arguments before
entering main().  See https://google.com/search?q=mingw+command+line+glob for
various discussion of that behavior.  I suspect you experienced that, not any
IPC::Run behavior.  (I haven't tested, though.)  Commit 11e9caf likely had the
same cause, though the commit message attributed it to the msys shell rather
than to crt*.o.

Let's disable that MinGW compiler behavior.
https://willus.com/mingw/_globbing.shtml lists two ways of achieving that.

> Yeah, I was speculating that it's a bug in IPC::Run that has been
> fixed (by our very own Noah), and some of the machines are still
> running the buggy version.

That change affected arguments containing double quote characters, but mere
asterisks shouldn't need it.  It also has not yet been released, so few or no
buildfarm machines are using it.

> (Not a Windows person, but I speculate the reason that such a stupid
> bug is even possible may be that Windows lacks a way to 'exec' stuff
> with a passed-in unadulterated argv[] array, so you always need to
> build a full shell command subject to interpolation, so if you're
> trying to emulate an argv[]-style interface you have to write the code
> to do the escaping, and so everyone gets a chance to screw that up.)

You needn't involve any shell.  Other than that, your description pretty much
says it.  CreateProcessA() is the Windows counterpart of execve().  There's no
argv[] array, just a single string.  (The following trivia does not affect
PostgreSQL.)  Worse, while there's a typical way to convert that string to
argv[] at program start, some programs do it differently.  You need to know
your callee in order to construct the string:
https://github.com/toddr/IPC-Run/pull/148/commits/c299a86c9a292375fbfc39fb756883c80adac4b0#diff-5833a343d19ba684779743e2c90516fc65479609274731785364d2d2b49e2211





Re: [Proposal] vacuumdb --schema only

2022-04-22 Thread Nathan Bossart
On Fri, Apr 22, 2022 at 11:57:05AM +0200, Gilles Darold wrote:
> Patch v10 attached.

Thanks!  I've attached a v11 with some minor editorialization.  I think I
was able to improve the error handling for invalid combinations of
command-line options a bit, but please let me know what you think.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From c180a56055bb9c61c4bb81d586aeb99241596457 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 22 Apr 2022 22:34:38 -0700
Subject: [PATCH v11 1/1] Add --schema and --exclude-schema options to
 vacuumdb.

These two new options can be used to either process all tables in
specific schemas or to skip processing all tables in specific
schemas.  This change also refactors the handling of invalid
combinations of command-line options to a new helper function.

Author: Gilles Darold
Reviewed-by: Justin Pryzby, Nathan Bossart
Discussion: https://postgr.es/m/929fbf3c-24b8-d454-811f-1d5898ab3e91%40migops.com
---
 doc/src/sgml/ref/vacuumdb.sgml|  66 ++
 src/bin/scripts/t/100_vacuumdb.pl |  35 ++
 src/bin/scripts/t/101_vacuumdb_all.pl |   3 +
 src/bin/scripts/vacuumdb.c| 170 +++---
 4 files changed, 233 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..841aced3bd 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+   
+
+ 
+  -n
+  --schema
+ 
+ schema
+
+   
+
+   
+
+ 
+  -N
+  --exclude-schema
+ 
+ schema
+
+   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,30 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in
+schema only.  Multiple
+schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Do not clean or analyze any tables in
+schema.  Multiple schemas
+can be excluded by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +677,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the foo and bar schemas
+in a database named xyzzy:
+
+$ vacuumdb --schema='foo' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..e4aac53249 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,38 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".bar/,
+	'vacuumdb --schema schema only');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--exclude-schema', '"Foo"', 'postgres' ],
+	qr/(?:(?!VACUUM "Foo".bar).)*/,
+	'vacuumdb --exclude-schema schema');
+$node->command_fails(
+	[ 'vacuumdb', '-n', 'pg_catalog', '-t', 'pg_class', 'postgres' ],
+	'cannot use options -n and -t at the same time');
+$node->command_fails(
+	[ 'vacuumdb', '-n', 'pg_catalog', '-N', '"Foo"', 'postgres' ],
+	'cannot use options -n and -N at the same time');
+$node->command_fails(
+	[ 'vacuumdb', '-a', '-N', '"Foo"' ],
+	'cannot use options -a and -N at the same time');
+$node->command_fails(
+	[ 'vacuumdb', '-a', '-n', '"Foo"' ],
+	'cannot use options -a and -n at the same time');
+$node->command_fails(
+	[ 'vacuumdb', '-a', '-t', '"Foo".bar' ],
+	'cannot use options -a and -t at the same time');
+$node->command_fails(
+	[ 'vacuumdb', '-a', '-n', '"Foo"', 'postgres' ],
+	'cannot use options -a and -n at the same time');
+$node->command_fails(
+	[ 'vacuumdb', '-a', '-d', 'postgres' ],
+	'cannot use options -a and -d at the same time');
+$node->command_fails(
+	[ 'vacuumdb', '-a', 'postgres' ],
+	'cannot use option -a and a dbname as argument at the same time');
+
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..ccb05166df 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb',

multirange of arrays not working on postgresql 14

2022-04-22 Thread Jian He
select arraymultirange(arrayrange(array[1,2], array[2,1]));

ERROR:  42883: function arrayrange(integer[], integer[]) does not exist
LINE 1: select arraymultirange(arrayrange(array[1,2], array[2,1]));
                               ^
HINT:  No function matches the given name and argument types. You might
need to add explicit type casts.
LOCATION:  ParseFuncOrColumn, parse_func.c:629


Tested on PostgreSQL 14.

The regression tests do exercise this: see
src/test/regress/sql/multirangetypes.sql (lines 590 to 600) and
src/test/regress/expected/multirangetypes.out (lines 3200 to 3210) in the
postgresql.git repository on git.postgresql.org.

I searched the git log and found no mention of arrayrange, so the feature
seems to be available, but now it does not appear to work?

A db<>fiddle on Postgres 14 (dbfiddle.uk) reproduces this.
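
(For what it's worth, a likely explanation: arrayrange is not a built-in
function.  The regression test referenced above first creates the range type,
and that CREATE TYPE is what defines the arrayrange() constructor and the
corresponding arraymultirange type.  A sketch along the lines of the
regression test:)

CREATE TYPE arrayrange AS RANGE (subtype = int4[]);

SELECT arrayrange(ARRAY[1,2], ARRAY[2,1]);
SELECT arraymultirange(arrayrange(ARRAY[1,2], ARRAY[2,1]));

Without that CREATE TYPE, the error above is expected on a stock installation.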



Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2022-04-22 Thread Bharath Rupireddy
On Tue, Jan 11, 2022 at 2:11 PM Konstantin Knizhnik  wrote:
>
> We have faced a similar problem in Zenith (open source Aurora) and had to
> implement a back pressure mechanism to prevent overflow of WAL at stateless
> compute nodes and too-long delays of page reconstruction.  Our
> implementation is the following:
> 1. Three GUCs are added: max_replication_write/flush/apply_lag
> 2. Replication lags are checked in XLogInsert and if one of 3 thresholds
> is reached then InterruptPending is set.
> 3. In ProcessInterrupts we block backend execution until lag is within
> specified boundary:
>
>  #define BACK_PRESSURE_DELAY 10000L // 0.01 sec
>  while(true)
>  {
>  ProcessInterrupts_pg();
>
>  // Suspend writers until replicas catch up
>  lag = backpressure_lag();
>  if (lag <= 0)
>  break;
>
>  set_ps_display("backpressure throttling");
>
>  elog(DEBUG2, "backpressure throttling: lag %lu", lag);
>  pg_usleep(BACK_PRESSURE_DELAY);
>  }
>
> What is wrong here is that a backend can be blocked for a long time
> (causing failure of the client application due to timeout expiration)
> while holding acquired locks.

Do we ever call CHECK_FOR_INTERRUPTS() while holding "important"
locks? I haven't seen any asserts or anything of that sort in
ProcessInterrupts() though, looks like it's the caller's
responsibility to not process interrupts while holding heavy weight
locks, here are some points on this upthread [1].

I don't think we have a problem with the various postgres timeouts
(statement_timeout, lock_timeout, idle_in_transaction_session_timeout,
idle_session_timeout, client_connection_check_interval), because we would
be waiting for the replication lag to get better inside ProcessInterrupts().
I think SIGALRM can still be raised while we wait for the replication lag to
get better, but it can't be handled there.  Why can't we just disable these
timeouts before going to wait and reset/enable them right after the
replication lag gets better?

And the clients can always have their own no-reply-kill-the-transaction
sort of timeout; if so, let them fail and deal with it.  I don't think we
can do much about that.

> We are thinking about a smarter way of choosing the throttling delay (for
> example, exponentially increasing the throttling sleep interval until some
> maximal value is reached).
> But it is really hard to find a universal scheme which will be good for all
> use cases (for example, the case of short-lived sessions, whose clients
> connect to the server just to execute one query).

I think there has to be an upper limit on the wait, perhaps a
'preconfigured amount of time'.  I think others upthread aren't happy with
failing transactions because of the replication lag.  But my point is: how
much time would we let the backends wait or throttle WAL writes?  It mustn't
be forever (say, if a broken connection to the async standby is found).

[1] 
https://www.postgresql.org/message-id/20220105174643.lozdd3radxv4tlmx%40alap3.anarazel.de

Regards,
Bharath Rupireddy.