Re: New strategies for freezing, advancing relfrozenxid early

2022-08-31 Thread Peter Geoghegan
On Tue, Aug 30, 2022 at 11:28 PM Jeff Davis  wrote:
> That clarifies your point. It's still a challenge for me to reason
> about which of these potential new problems really need to be solved in
> v1, though.

I don't claim to understand it that well myself -- not just yet.
I feel like I have the right general idea, but the specifics
aren't all there (which is very often the case for me at this
point in the cycle). That seems like a good basis for further
discussion.

It's going to be quite a few months before some version of this
patchset is committed, at the very earliest. Obviously these are
questions that need answers, but the process of getting to those
answers is a significant part of the work itself IMV.

> > Why stop at a couple of dozens of lines of code? Why not just change
> > the default of vacuum_freeze_min_age and
> > vacuum_multixact_freeze_min_age to 0?
>
> I don't think that would actually solve the unbounded buildup of
> unfrozen pages. It would still be possible for pages to be marked all
> visible before being frozen, and then end up being skipped until an
> aggressive vacuum is forced, right?

With the 15 work in place, and with the insert-driven autovacuum
behavior from 13, it is likely that this would be enough to avoid all
antiwraparound vacuums in an append-only table. There is still one
case where we can throw away the opportunity to advance relfrozenxid
during non-aggressive VACUUMs for no good reason -- I didn't fix them
all just yet. But the remaining case (which is in lazy_scan_skip()) is
very narrow.

With vacuum_freeze_min_age = 0 and vacuum_multixact_freeze_min_age =
0, any page that is eligible to be set all-visible is also eligible to
have its tuples frozen and be set all-frozen instead, immediately.
When it isn't, then we'll scan it in the next VACUUM anyway.

Actually I'm also ignoring some subtleties with Multis that could make
this not quite happen, but again, that's only a super obscure corner case.
The idea that just setting vacuum_freeze_min_age = 0 and
vacuum_multixact_freeze_min_age = 0 will be enough is definitely true
in spirit. You don't need to touch vacuum_freeze_table_age (if you did
then you'd get aggressive VACUUMs, and one goal here is to avoid
those whenever possible -- especially aggressive antiwraparound
autovacuums).

--
Peter Geoghegan




Re: pg_stat_have_stats() returns true for dropped indexes (or for index creation transaction rolled back)

2022-08-31 Thread Kyotaro Horiguchi
At Thu, 25 Aug 2022 11:44:34 +0200, "Drouvot, Bertrand"  
wrote in 
> Looking closer at it, I think we are already good for the drop case on
> the tables (by making direct use of the pg_stat_get_* functions on the
> before dropped oid).
> 
> So I think we can remove all the "table" new checks: new patch
> attached is doing so.
> 
> On the other hand, for the index case, I think it's better to keep the
> "committed index creation one".

I agree.

> Indeed, to check that the drop behaves correctly I think it's better
> in "the same test" to ensure we've had the desired result before the
> drop (I mean having pg_stat_have_stats() returning false after a drop
> does not really help if we are not 100% sure it was returning true for
> the exact same index before the drop).

Sounds reasonable.

> > We have other variable-numbered stats kinds
> > FUNCTION/REPLSLOT/SUBSCRIPTION. Don't we need the same for these?
> 
> I don't think we need more tests for the FUNCTION case (as it looks to
> me it is already covered in stat.sql by the
> pg_stat_get_function_calls() calls on the dropped functions oids).

Right.

> For SUBSCRIPTION, i think this is covered in 026_stats.pl:
> 
> # Subscription stats for sub1 should be gone
> is( $node_subscriber->safe_psql(
>     $db, qq(SELECT pg_stat_have_stats('subscription', 0,
> $sub1_oid))),
>     qq(f),
>     qq(Subscription stats for subscription '$sub1_name' should be
> removed.));
> 
> For REPLSLOT, I agree that we can add one test: I added it in
> contrib/test_decoding/sql/stats.sql. It relies on pg_stat_have_stats()
> (as relying on pg_stat_replication_slots and/or
> pg_stat_get_replication_slot() would not help that much for this test
> given that the slot has been removed from ReplicationSlotCtl)

Thanks for checking.

+-- pg_stat_have_stats returns true for regression_slot_stats1
+-- Its index is 1 in ReplicationSlotCtl->replication_slots
+select pg_stat_have_stats('replslot', 0, 1);

This is wrong. The index is actually 0.  We cannot know the id
reliably since we don't expose it at all. We could slightly increase
robustness by assuming the range of the id but that is just moving the
problem to another place. If the test is broken by a change of
replslot id assignment policy, it would be easily found and fixed.

So is it fine to simply fix the comment with the correct ID?

Or, alternatively, we can be more sensitive to a change of the ID
assignment policy by checking all the replication slots.

select count(n) from generate_series(0, 2) as n where 
pg_stat_have_stats('replslot', 0, n);

The number changes from 3 to 0 across the slot drops. If any of the
slots has gone outside the range, the number before the drop decreases.
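
To spell that out, the check around the drops might look something like
this (just a sketch; how the slots are actually dropped is whatever the
existing test already does):

-- before dropping the slots: all three have stats
select count(n) from generate_series(0, 2) as n
 where pg_stat_have_stats('replslot', 0, n);   -- expected: 3

-- ... drop the slots as the test already does ...

-- after the drops: no replslot stats remain
select count(n) from generate_series(0, 2) as n
 where pg_stat_have_stats('replslot', 0, n);   -- expected: 0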

> Attaching v3-0001 (with the right "numbering" this time ;-) )

Yeah, looks fine :p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-31 Thread Michael Paquier
On Sun, Aug 28, 2022 at 02:52:43PM -0700, Nathan Bossart wrote:
> On Sat, Aug 27, 2022 at 02:06:32PM +0530, Bharath Rupireddy wrote:
>> PSA v17 patch with reorderbuffer.c changes reverted.
> 
> LGTM

Yeah, what you have here looks pretty fine to me, so this is IMO
committable.  Let's wait a bit to see if there are any objections from
the others, in case.
--
Michael


signature.asc
Description: PGP signature


Re: SQL/JSON features for v15

2022-08-31 Thread Amit Langote
On Wed, Aug 31, 2022 at 3:51 PM Amit Langote  wrote:
> On Wed, Aug 31, 2022 at 6:25 AM Nikita Glukhov  
> wrote:
> > v10 patches
>
> Finally, I get this warning:
>
> execExprInterp.c: In function ‘ExecJsonCoerceCStringToText’:
> execExprInterp.c:4765:3: warning: missing braces around initializer
> [-Wmissing-braces]
>NameData encoding = {0};
>^
> execExprInterp.c:4765:3: warning: (near initialization for
> ‘encoding.data’) [-Wmissing-braces]

Given the time constraints on making a decision on this, I'd like to
also mention that other than the things mentioned in my last email,
which don't sound like a big deal for Nikita to take care of, I don't
have any further comments on the patches.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: [RFC] building postgres with meson - v12

2022-08-31 Thread Peter Eisentraut
I found that the perl test modules are not installed.  See attached 
patch to correct this.


To the patches:

4e15ee0e24 Don't hardcode tmp_check/ as test directory for tap tests
1a3169bc3f Split TESTDIR into TESTLOGDIR and TESTDATADIR

It's a bit weird that the first patch changes the meaning of TESTDIR
and the second patch removes it.  Maybe these patches should be
squashed together?


96d1d0a0cf meson: prereq: Extend gendef.pl in preparation for meson

ok


581721fa99 meson: prereq: Add src/tools/gen_export.pl

Still wondering about the whitespace changes I reported recently, but
that can also be fine-tuned later.


4245cc888e meson: prereq: Refactor PG_TEST_EXTRA logic in autoconf build

ok


3afe803e0f meson: prereq: Fix warning compat_informix/rnull.pgc with msvc

ok


ae7733f46c meson: prereq: Move darwin sysroot determination into 
separate file


ok


a1fb97a81b meson: Add meson based buildsystem

I'm not a fan of all this business to protect the two build systems
from each other.  I don't like the build process touching a file under
version control every time.  How necessary is this?  What happens
otherwise?

conversion_helpers.txt: should probably be removed now.

doc/src/sgml/resolv.xsl: I don't understand what this is doing.  Maybe
at least add a comment in the file.

src/common/unicode/meson.build: The comment at the top of the file
should be moved next to the files it is describing (similar to how it
is in the makefile).  I don't see CLDR_VERSION set anywhere.  Is that
part implemented?

src/port/win32ver.rc.in: This is redundant with src/port/win32ver.rc.
(Note that the latter is also used as an input file for text
substitution.  So having another file named *.in next to it would be
super confusing.)

src/tools/find_meson: Could use a brief comment what it does.

src/tools/pgflex: Could use a not-brief comment about what it does,
why it's needed.  Also a comment where it's used.  Also run this
through pycodestyle.

src/tools/rcgen: This is connected with the comment on win32ver.rc.in
above.  We already have this equivalent code in
src/makefiles/Makefile.win32.  Let's figure out a way to share this
code.  (It could be a Perl script, which is already required on
Windows.)  Also pycodestyle.

src/tools/testwrap: also documentation/comments/pycodestyle


cd193eb3e8 meson: ci: Build both with meson and as before

I haven't reviewed this one in detail.  Maybe add a summary in the
commit message, like these are the new jobs, these are the changes to
existing jobs.  It looks like there is more in there than just adding
a few meson jobs.


If the above are addressed, I think this will be just about at the
point where the above patches can be committed.

Everything past these patches I'm mentally postponing right now.

From 80d6f4f9a574ddb6250e06490c610969aacdb824 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 31 Aug 2022 09:16:04 +0200
Subject: [PATCH] meson: Install test perl modules

---
 src/test/meson.build  |  2 ++
 src/test/perl/meson.build | 12 
 2 files changed, 14 insertions(+)
 create mode 100644 src/test/perl/meson.build

diff --git a/src/test/meson.build b/src/test/meson.build
index b86a0f3889..241d9d48aa 100644
--- a/src/test/meson.build
+++ b/src/test/meson.build
@@ -21,3 +21,5 @@ endif
 if icu.found()
   subdir('icu')
 endif
+
+subdir('perl')
diff --git a/src/test/perl/meson.build b/src/test/perl/meson.build
new file mode 100644
index 00..901bae7a56
--- /dev/null
+++ b/src/test/perl/meson.build
@@ -0,0 +1,12 @@
+# could use install_data's preserve_path option in >=0.64.0
+
+install_data(
+  'PostgreSQL/Version.pm',
+  install_dir: dir_pgxs / 'src/test/perl/PostgreSQL')
+
+install_data(
+  'PostgreSQL/Test/Utils.pm',
+  'PostgreSQL/Test/SimpleTee.pm',
+  'PostgreSQL/Test/RecursiveCopy.pm',
+  'PostgreSQL/Test/Cluster.pm',
+  install_dir: dir_pgxs / 'src/test/perl/PostgreSQL/Test')
-- 
2.37.1



Re: [RFC] building postgres with meson - v11

2022-08-31 Thread Peter Eisentraut

On 24.08.22 17:30, Andres Freund wrote:

0545eec895 meson: Add docs

We should think more about how to arrange the documentation.  We
probably don't want to copy-and-paste all the introductory and
requirements information.  I think we can make this initially much
briefer, like the Windows installation chapter.  For example, instead
of documenting each setup option again, just mention which ones exist
and then point (link) to the configure chapter for details.

The current docs, including the windows ones, are already hard to follow. I
think we should take some care to not make the meson bits even more
confusing. Cross referencing left and right seems problematic from that angle.


If you look at the current structure of the installation chapter

17.1. Short Version
17.2. Requirements
17.3. Getting the Source
17.4. Installation Procedure
17.5. Post-Installation Setup
17.6. Supported Platforms
17.7. Platform-Specific Notes

only 17.1, small parts of 17.2, and 17.4 should differ between make and 
meson.  There is no conceivable reason why the meson installation 
chapter should have a different "Getting the Source" section.  And some 
of the post-installation and platform-specific information doesn't 
appear at all in the meson chapter.


I think we can try to be a bit more ingenious in how we weave this 
together in the best way.  What I really wouldn't want is two separate 
chapters that duplicate the entire process.  I think we could do one 
chapter, like


- Short Version
- Requirements
- Getting the Source
- Installation Procedure
- Installation Procedure using Meson
- Post-Installation Setup
- Supported Platforms
- Platform-Specific Notes

Alternatively, if people prefer two separate chapters, let's think about 
some source-code level techniques to share the common contents.





Re: plpgsql-trigger.html: Format TG_ variables as table (patch)

2022-08-31 Thread Peter Eisentraut

On 30.08.22 15:16, Christoph Berg wrote:

I found the list of TG_ variables on
https://www.postgresql.org/docs/current/plpgsql-trigger.html#PLPGSQL-DML-TRIGGER
hard to read for several reasons: too much whitespace, all the lines
start with "Data type", and even after that, the actual content is
hiding behind some extra "variable that..." boilerplate.

The attached patch formats the list as a table, and removes some of
the clutter from the text.

I reused the catalog_table_entry table machinery, that is probably not
quite the correct thing, but I didn't find a better variant, and the
result looks ok.


I find the new version even harder to read.  The catalog_table_entry 
stuff doesn't really make sense here, since what you have before is 
already a definition list, and afterwards you have the same, just marked 
up "incorrectly".


We could move the data type into the <term>, similar to how you did it in 
your patch.


I agree the whitespace layout is weird, but that's a problem of the 
website CSS stylesheet.  I think it looks a bit better with the local 
stylesheet, but that can all be tweaked.






Re: First draft of the PG 15 release notes

2022-08-31 Thread Peter Eisentraut

On 30.08.22 22:42, Bruce Momjian wrote:

Good question --- the full text is:

  
   
Record and check the collation of each database (Peter Eisentraut)
   

   
This is designed to detect collation
mismatches to avoid data corruption.  Function
pg_database_collation_actual_version()
reports the underlying operating system collation version, and
ALTER DATABASE ...  REFRESH sets the database
to match the operating system collation version.  DETAILS?
   
  

I just can't figure out what the user needs to understand about this,
and I understand very little of it.


We already had this feature for (schema-level) collations, now we have 
it on the level of the database collation.





Re: plpgsql-trigger.html: Format TG_ variables as table (patch)

2022-08-31 Thread Daniel Gustafsson
> On 31 Aug 2022, at 11:35, Peter Eisentraut 
>  wrote:
> 
> On 30.08.22 15:16, Christoph Berg wrote:
>> I found the list of TG_ variables on
>> https://www.postgresql.org/docs/current/plpgsql-trigger.html#PLPGSQL-DML-TRIGGER
>> hard to read for several reasons: too much whitespace, all the lines
>> start with "Data type", and even after that, the actual content is
>> hiding behind some extra "variable that..." boilerplate.
>> The attached patch formats the list as a table, and removes some of
>> the clutter from the text.
>> I reused the catalog_table_entry table machinery, that is probably not
>> quite the correct thing, but I didn't find a better variant, and the
>> result looks ok.
> 
> I find the new version even harder to read.  The catalog_table_entry stuff 
> doesn't really make sense here, since what you have before is already a 
> definition list, and afterwards you have the same, just marked up 
> "incorrectly".

If we change variable lists they should get their own formatting in the xsl
and css stylesheets.

> We could move the data type into the <term>, similar to how you did it in your 
> patch.

That will make this look different from the trigger variable lists for other
languages (which typically don't list type), but I think it's worth it to avoid
the boilerplate which is a bit annoying.

Another thing we should change while there (but it's not directly related to
this patch) is that we document TG_RELID and $TG_relid as "object ID" but
TD["relid"] and $_TD->{relid} as "OID".  Punctuation of item descriptions is
also not consistent.

--
Daniel Gustafsson   https://vmware.com/





Re: plpgsql-trigger.html: Format TG_ variables as table (patch)

2022-08-31 Thread Christoph Berg
Re: Peter Eisentraut
> I find the new version even harder to read.  The catalog_table_entry stuff
> doesn't really make sense here, since what you have before is already a
> definition list, and afterwards you have the same, just marked up
> "incorrectly".

Fair enough. For comparison, this is what yesterday's patch looked
like: https://www.df7cb.de/s/2022-08-31.115813.w5UvAS.png

> We could move the data type into the <term>, similar to how you did it in your
> patch.

The new version of the patch just moves up the data types, and removes
the extra clutter from the beginnings of each description:

https://www.df7cb.de/s/2022-08-31.115857.LkkKl8.png

Christoph
>From 61bcef4b3a0a8a17c86114ef9747c6e793a5b48f Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Wed, 31 Aug 2022 11:52:50 +0200
Subject: [PATCH] plpgsql-trigger.html: Use more concise wording for TG_
 variables

To improve readability of the TG_ variables definition list, this moves
the datatypes up to the defined term to avoid having each entry start
with "Data type", and removes some more redundant clutter ("Variable
holding...") from the descriptions that didn't carry any information.
---
 doc/src/sgml/plpgsql.sgml | 48 +++
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index cf387dfc3f..b862af35a7 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -4032,10 +4032,10 @@ ASSERT condition  , 
 
- NEW
+ NEW (record)
  
   
-   Data type RECORD; variable holding the new
+   new
database row for INSERT/UPDATE operations in row-level
triggers. This variable is null in statement-level triggers
and for DELETE operations.
@@ -4044,10 +4044,10 @@ ASSERT condition  , 
 
 
- OLD
+ OLD (record)
  
   
-   Data type RECORD; variable holding the old
+   old
database row for UPDATE/DELETE operations in row-level
triggers. This variable is null in statement-level triggers
and for INSERT operations.
@@ -4056,20 +4056,20 @@ ASSERT condition  , 
 
 
- TG_NAME
+ TG_NAME (name)
  
   
-   Data type name; variable that contains the name of the trigger actually
+   name of the trigger actually
fired.
   
  
 
 
 
- TG_WHEN
+ TG_WHEN (text)
  
   
-   Data type text; a string of
+   string
BEFORE, AFTER, or
INSTEAD OF, depending on the trigger's definition.
   
@@ -4077,10 +4077,10 @@ ASSERT condition  , 
 
 
- TG_LEVEL
+ TG_LEVEL (text)
  
   
-   Data type text; a string of either
+   string
ROW or STATEMENT
depending on the trigger's definition.
   
@@ -4088,10 +4088,10 @@ ASSERT condition  , 
 
 
- TG_OP
+ TG_OP (text)
  
   
-   Data type text; a string of
+   string
INSERT, UPDATE,
DELETE, or TRUNCATE
telling for which operation the trigger was fired.
@@ -4100,20 +4100,20 @@ ASSERT condition  , 
 
 
- TG_RELID
+ TG_RELID (oid, references pg_class.oid)
  
   
-   Data type oid; the object ID of the table that caused the
+   object ID of the table that caused the
trigger invocation.
   
  
 
 
 
- TG_RELNAME
+ TG_RELNAME (name)
  
   
-   Data type name; the name of the table that caused the trigger
+   name of the table that caused the trigger
invocation. This is now deprecated, and could disappear in a future
release. Use TG_TABLE_NAME instead.
   
@@ -4121,40 +4121,40 @@ ASSERT condition  , 
 
 
- TG_TABLE_NAME
+ TG_TABLE_NAME (name)
  
   
-   Data type name; the name of the table that
+   name of the table that
caused the trigger invocation.
   
  
 
 
 
- TG_TABLE_SCHEMA
+ TG_TABLE_SCHEMA (name)
  
   
-   Data type name; the name of the schema of the
+   name of the schema of the
table that caused the trigger invocation.
   
  
 
 
 
- TG_NARGS
+ TG_NARGS (integer)
  
   
-   Data type integer; the number of arguments given to the trigger
+   number of arguments given to the trigger
function in the CREATE TRIGGER statement.
   
  
 
 
 
- TG_ARGV[]
+ TG_ARGV[] (text[])
  
   
-   Data type array of text; the arguments from
+   arguments from
the CREATE TRIGGER statement.
The index counts from 0. Invalid
indexes (less than 0 or greater than or equal to tg_nargs)
-- 
2.35.1



Re: [PATCH] Fix alter subscription concurrency errors

2022-08-31 Thread Asif Rehman
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

The patch applies with a few "Hunk succeeded, offset -3 lines" warnings. Tested 
against master '7d5852ca'.

+   if (!HeapTupleIsValid(tup))
+   {
+       if (!missing_ok)
+           ereport(ERROR,
+                   (errcode(ERRCODE_UNDEFINED_OBJECT),
+                    errmsg("subscription \"%s\" does not exist",
+                           subname)));
+       else
+           ereport(NOTICE,
+                   (errmsg("subscription \"%s\" does not exist, skipping",
+                           subname)));
+
+       return InvalidOid;
+   }
+

I think 'tup' should be released before returning, or the code should break out 
of the loop instead so it can be released there.

The new status of this patch is: Waiting on Author


Re: Handle infinite recursion in logical replication setup

2022-08-31 Thread Amit Kapila
On Wed, Aug 31, 2022 at 11:35 AM Peter Smith  wrote:
>
> Here are my review comments for v43-0001.
>
> ==
>
> 1. Commit message
>
> The initial copy phase has no way to know the origin of the row data,
> so if 'copy_data = true' in the step 4 below, log a warning to notify user
> that potentially non-local data might have been copied.
>
> 1a.
> "in the step 4" -> "in step 4"
>
> ~
>
> 1b.
> "notify user" -> "notify the user"
>
> ==
>
> 2. doc/src/sgml/ref/create_subscription.sgml
>
> +  
> +   If the subscription is created with origin = none and
> +   copy_data = true, it will check if the publisher has
> +   subscribed to the same table from other publishers and, if so, log a
> +   warning to notify the user to check the publisher tables. Before 
> continuing
> +   with other operations the user should check that publisher tables did not
> +   have data with different origins to prevent data inconsistency issues on 
> the
> +   subscriber.
> +  
>
> I am not sure about that wording saying "to prevent data inconsistency
> issues" because I thought maybe is already too late because any
> unwanted origin data is already copied during the initial sync. I
> think the user can do it try to clean up any problems caused before
> things become even worse (??)
>

Oh no, it is not too late in all cases. The problem can only occur if
the user tries to subscribe from all the different publications
with copy_data = true. We are anyway trying to clarify in the second
patch the right way to accomplish this. So, I am not sure what better
we can do here. The only bulletproof way is to provide some APIs where
users don't need to bother about all these cases, basically something
similar to what Kuroda-San is working on in the thread [1].

> ~~~
>
> You asked for my thoughts (see [1] 6b):
>
> > There is nothing much a user can do in this case. Only option would be
> > to take a backup before the operation and restore it and then recreate
> > the replication setup. I was not sure if we should document these
> > steps as similar inconsistent data could get created without the
> > origin option . Thoughts?
>
> I think those cases are a bit different:
>
> - For a normal scenario (i.e. origin=ANY) then inconsistent data just
> refers to problems like maybe PK violations so I think you just get
> what you get and have to deal with the errors...
>
> - But when the user said origin=NONE they are specifically saying they
> do NOT want any non-local data, yet they still might end up getting
> data contrary to their wishes. So I think maybe there ought to be
> something documented to warn about this potentially happening. My
> first impression is that all this seems like a kind of terrible
> problem because if it happens then cleanup could be difficult - if
> that subscriber in turn was also a publisher then this bad data might
> have propagated all over the place already! Anyway, I guess all this
> could be mentioned in some (new ?) section of the
> logical-replication.sgml file (i.e. patch 0002).
>
> IDEA: I was wondering if the documentation should recommend (or maybe
> we can even enforce this) that when using origin=NONE the user should
> always create the subscription with enabled=FALSE. That way if there
> are some warnings about potential bad origin data then there it might
> be possible to take some action or change their mind *before* any
> unwanted data pollutes everything.
>

If we want to give this suggestion, then we need to also say that this
is only valid when copy_data is true. The other drawback is that if users
always need to do this to use origin=NONE, it can be a burden. I am not
against having such a NOTE, but the tone should not be to force users
to do this.
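
For reference, the workflow being suggested would look something like
this (the names are made up, and this is only a sketch):

create subscription sub_node2
  connection 'host=node1 dbname=postgres'
  publication pub_node1
  with (origin = none, copy_data = true, enabled = false);
-- review any warning about the publisher's tables possibly containing
-- non-local data (assuming the check runs at CREATE time), and only then:
alter subscription sub_node2 enable;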

[1] - 
https://www.postgresql.org/message-id/CAHut%2BPuwRAoWY9pz%3DEubps3ooQCOBFiYPU9Yi%3DVB-U%2ByORU7OA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Small cleanups to tuplesort.c and a bonus small performance improvement

2022-08-31 Thread David Rowley
On Fri, 26 Aug 2022 at 16:48, David Rowley  wrote:
> 0003: Changes writetuple to tell it what it should do in regards to
> freeing and adjusting the memory accounting.
>
> Probably 0003 could be done differently. I'm certainly not set on the
> bool args. I understand that I'm never calling it with "freetup" ==
> true. So other options include 1) rip out the pfree code and that
> parameter; or 2) just do the inlining manually at both call sites.

This patch series needed to be rebased, and on looking at it again,
since the pfree() code is never used I felt it makes very little sense
to keep it, so I decided that it might be better just to keep the
WRITETUP macro, completely get rid of the writetuple function, and
have the macro call the function pointed to by the "writetup"
pointer.  The only extra code we needed from writetuple() was the
memory accounting code which was only used in dumptuples(), so I've
just included that code in that function instead.

I also noticed that dumptuples() had a pretty braindead method of
zeroing out state->memtupcount by subtracting 1 from it on each loop.
Since that's not being used to keep track of the loop's progress, I've
just moved it out of the loop and changed the code to set it to 0 once
the loop is done.

> I'll throw this in the September CF to see if anyone wants to look.
> There's probably lots more cleaning jobs that could be done in
> tuplesort.c.

My current thoughts are that this is a very trivial patch and unless
there's any objections I plan to push it soon.

David
From 7d9d960c6080f9511ecb2514defed386b9b65cdb Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Wed, 31 Aug 2022 18:52:11 +1200
Subject: [PATCH v2] Be smarter about freeing tuples during tuplesorts

During dumptuples() the call to writetuple() would pfree any non-null
tuple.  This was quite wasteful as this happens just before we perform a
reset of the context which stores all of those tuples.

It seems to make sense to do a bit of a code refactor to make this work,
so here we just get rid of the writetuple function and adjust the WRITETUP
macro to call the state's writetup function.  The WRITETUP usage in
mergeonerun() always has state->slabAllocatorUsed == true, so writetuple()
would never free the tuple or do any memory accounting.  The only call
path that needs memory accounting done is in dumptuples(), so let's just
do it manually there.

In passing, let's get rid of the state->memtupcount-- code that counts the
memtupcount down to 0 one tuple at a time inside the loop.  That seems to
be a rather inefficient way to set memtupcount to 0, so let's just zero it
after the loop instead.
---
 src/backend/utils/sort/tuplesort.c | 38 --
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 66950983e6..416f02ba3c 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -395,7 +395,7 @@ struct Sharedsort
 
 #define REMOVEABBREV(state,stup,count) ((*(state)->base.removeabbrev) (state, stup, count))
 #define COMPARETUP(state,a,b)  ((*(state)->base.comparetup) (a, b, state))
-#define WRITETUP(state,tape,stup)  (writetuple(state, tape, stup))
+#define WRITETUP(state,tape,stup)  ((*(state)->base.writetup) (state, tape, stup))
 #define READTUP(state,stup,tape,len) ((*(state)->base.readtup) (state, stup, tape, len))
 #define FREESTATE(state)   ((state)->base.freestate ? (*(state)->base.freestate) (state) : (void) 0)
 #define LACKMEM(state) ((state)->availMem < 0 && !(state)->slabAllocatorUsed)
@@ -453,8 +453,6 @@ struct Sharedsort
 
 
 static void tuplesort_begin_batch(Tuplesortstate *state);
-static void writetuple(Tuplesortstate *state, LogicalTape *tape,
-  SortTuple *stup);
 static bool consider_abort_common(Tuplesortstate *state);
 static void inittapes(Tuplesortstate *state, bool mergeruns);
 static void inittapestate(Tuplesortstate *state, int maxTapes);
@@ -1339,24 +1337,6 @@ tuplesort_puttuple_common(Tuplesortstate *state, 
SortTuple *tuple, bool useAbbre
MemoryContextSwitchTo(oldcontext);
 }
 
-/*
- * Write a stored tuple onto tape.  Unless the slab allocator is
- * used, after writing the tuple, pfree() the out-of-line data (not the
- * SortTuple struct!), and increase state->availMem by the amount of
- * memory space thereby released.
- */
-static void
-writetuple(Tuplesortstate *state, LogicalTape *tape, SortTuple *stup)
-{
-   state->base.writetup(state, tape, stup);
-
-   if (!state->slabAllocatorUsed && stup->tuple)
-   {
-   FREEMEM(state, GetMemoryChunkSpace(stup->tuple));
-   pfree(stup->tuple);
-   }
-}
-
 static bool
 consider_abort_common(Tuplesortstate *state)
 {
@@ -2260,6 +2240,8 @@ mergeonerun(Tuplesortstate *state)
 */
beginmerge(state);
 
+   Assert(state->slabAllocatorUsed);
+
/*
  

Re: SQL/JSON features for v15

2022-08-31 Thread Amit Langote
On Wed, Aug 31, 2022 at 3:51 PM Amit Langote  wrote:
>  SELECT JSON_VALUE(jsonb '"aaa"', '$' RETURNING int DEFAULT 111 ON ERROR);
> - json_value
> -
> -111
> -(1 row)
> -
> +ERROR:  syntax error at or near "DEFAULT"
> +LINE 1: ...ELECT JSON_VALUE(jsonb '"aaa"', '$' RETURNING int DEFAULT 11...
>
> Is it intentional that you left many instances of the regression test
> output changes like the above?

Actually, thinking more about this, I am wondering if we should not
remove the DEFAULT expression productions in gram.y.  Maybe we can
keep the syntax and give an unsupported error during parse-analysis,
like the last version of the patch did for DEFAULT ON EMPTY.  That
would also mean leaving JsonBehavior alone but with default_expr
always NULL for now.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Avoid overhead open-close indexes (catalog updates)

2022-08-31 Thread Ranier Vilela
Hi,

The commit
https://github.com/postgres/postgres/commit/b17ff07aa3eb142d2cde2ea00e4a4e8f63686f96
introduced the CopyStatistics function.

To do its work, CopyStatistics uses a less efficient function
to update/insert tuples in the system catalogs.

The comment at indexing.c says:
"Avoid using it for multiple tuples, since opening the indexes
 * and building the index info structures is moderately expensive.
 * (Use CatalogTupleInsertWithInfo in such cases.)"

So, inspired by the comment, I changed CatalogTupleInsert/CatalogTupleUpdate
to the more efficient CatalogTupleInsertWithInfo/CatalogTupleUpdateWithInfo
in a few places.

Quick tests show a small performance improvement.

head:

1. REINDEX TABLE CONCURRENTLY pgbench_accounts;
Time: 77,805 ms
Time: 74,836 ms
Time: 73,480 ms

2. REINDEX TABLE CONCURRENTLY pgbench_tellers;
Time: 22,260 ms
Time: 22,205 ms
Time: 21,008 ms

patched:

1. REINDEX TABLE CONCURRENTLY pgbench_accounts;
Time: 65,048 ms
Time: 61,853 ms
Time: 61,119 ms

2. REINDEX TABLE CONCURRENTLY pgbench_tellers;
Time: 15,999 ms
Time: 15,961 ms
Time: 13,264 ms

There are other places where this could be useful,
but careful analysis is necessary.

regards,
Ranier Vilela
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9b03579e6e..cf15051a05 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2856,8 +2856,7 @@ CopyStatistics(Oid fromrelid, Oid torelid)
 	SysScanDesc scan;
 	ScanKeyData key[1];
 	Relation	statrel;
-
-	statrel = table_open(StatisticRelationId, RowExclusiveLock);
+	CatalogIndexState indstate;
 
 	/* Now search for stat records */
 	ScanKeyInit(&key[0],
@@ -2865,6 +2864,9 @@ CopyStatistics(Oid fromrelid, Oid torelid)
 BTEqualStrategyNumber, F_OIDEQ,
 ObjectIdGetDatum(fromrelid));
 
+	statrel = table_open(StatisticRelationId, RowExclusiveLock);
+	indstate = CatalogOpenIndexes(statrel);
+
 	scan = systable_beginscan(statrel, StatisticRelidAttnumInhIndexId,
 			  true, NULL, 1, key);
 
@@ -2878,13 +2880,14 @@ CopyStatistics(Oid fromrelid, Oid torelid)
 
 		/* update the copy of the tuple and insert it */
 		statform->starelid = torelid;
-		CatalogTupleInsert(statrel, tup);
+		CatalogTupleInsertWithInfo(statrel, tup, indstate);
 
 		heap_freetuple(tup);
 	}
 
 	systable_endscan(scan);
 
+	CatalogCloseIndexes(indstate);
 	table_close(statrel, RowExclusiveLock);
 }
 
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index 114715498d..785fbc1669 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -69,6 +69,7 @@ EnumValuesCreate(Oid enumTypeOid, List *vals)
 	bool		nulls[Natts_pg_enum];
 	ListCell   *lc;
 	HeapTuple	tup;
+	CatalogIndexState indstate;
 
 	num_elems = list_length(vals);
 
@@ -113,6 +114,9 @@ EnumValuesCreate(Oid enumTypeOid, List *vals)
 	/* and make the entries */
 	memset(nulls, false, sizeof(nulls));
 
+	/* open the Catalog Indexes for Insert Tuples */
+	indstate = CatalogOpenIndexes(pg_enum);
+
 	elemno = 0;
 	foreach(lc, vals)
 	{
@@ -137,7 +141,7 @@ EnumValuesCreate(Oid enumTypeOid, List *vals)
 
 		tup = heap_form_tuple(RelationGetDescr(pg_enum), values, nulls);
 
-		CatalogTupleInsert(pg_enum, tup);
+		CatalogTupleInsertWithInfo(pg_enum, tup, indstate);
 		heap_freetuple(tup);
 
 		elemno++;
@@ -145,6 +149,7 @@ EnumValuesCreate(Oid enumTypeOid, List *vals)
 
 	/* clean up */
 	pfree(oids);
+	CatalogCloseIndexes(indstate);
 	table_close(pg_enum, RowExclusiveLock);
 }
 
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index a7966fff83..5cfcdf8ba7 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1624,12 +1624,14 @@ static void
 update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
 {
 	Relation	sd;
+	CatalogIndexState indstate;
 	int			attno;
 
 	if (natts <= 0)
 		return;	/* nothing to do */
 
 	sd = table_open(StatisticRelationId, RowExclusiveLock);
+	indstate = CatalogOpenIndexes(sd);
 
 	for (attno = 0; attno < natts; attno++)
 	{
@@ -1735,18 +1737,19 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
 	 nulls,
 	 replaces);
 			ReleaseSysCache(oldtup);
-			CatalogTupleUpdate(sd, &stup->t_self, stup);
+			CatalogTupleUpdateWithInfo(sd, &stup->t_self, stup, indstate);
 		}
 		else
 		{
 			/* No, insert new tuple */
 			stup = heap_form_tuple(RelationGetDescr(sd), values, nulls);
-			CatalogTupleInsert(sd, stup);
+			CatalogTupleInsertWithInfo(sd, stup, indstate);
 		}
 
 		heap_freetuple(stup);
 	}
 
+	CatalogCloseIndexes(indstate);
 	table_close(sd, RowExclusiveLock);
 }
 
diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
index 4cc4e3c00f..b95b989c5a 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -1004,14 +1004,16 @@ DefineTSConfiguration(List *names, List *parameters, ObjectAddress *copied)
 		ScanKeyData skey;
 		SysScanDesc scan;
 		HeapTuple	m

Re: plpgsql-trigger.html: Format TG_ variables as table (patch)

2022-08-31 Thread Christoph Berg
Re: To Peter Eisentraut
> The new version of the patch just moves up the data types, and removes
> the extra clutter from the beginnings of each description:

The last version had the brackets in TG_ARGV[] (text[]) duplicated.

Christoph
>From 1355794d56919a015bd1528f62428beaab0a681b Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Wed, 31 Aug 2022 11:52:50 +0200
Subject: [PATCH] plpgsql-trigger.html: Use more concise wording for TG_
 variables

To improve readability of the TG_ variables definition list, this moves
the datatypes up to the defined term to avoid having each entry start
with "Data type", and removes some more redundant clutter ("Variable
holding...") from the descriptions that didn't carry any information.
---
 doc/src/sgml/plpgsql.sgml | 48 +++
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index cf387dfc3f..13b2ab8829 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -4032,10 +4032,10 @@ ASSERT condition  , 
 
- NEW
+ NEW (record)
  
   
-   Data type RECORD; variable holding the new
+   new
database row for INSERT/UPDATE operations in row-level
triggers. This variable is null in statement-level triggers
and for DELETE operations.
@@ -4044,10 +4044,10 @@ ASSERT condition  , 
 
 
- OLD
+ OLD (record)
  
   
-   Data type RECORD; variable holding the old
+   old
database row for UPDATE/DELETE operations in row-level
triggers. This variable is null in statement-level triggers
and for INSERT operations.
@@ -4056,20 +4056,20 @@ ASSERT condition  , 
 
 
- TG_NAME
+ TG_NAME (name)
  
   
-   Data type name; variable that contains the name of the trigger actually
+   name of the trigger actually
fired.
   
  
 
 
 
- TG_WHEN
+ TG_WHEN (text)
  
   
-   Data type text; a string of
+   string
BEFORE, AFTER, or
INSTEAD OF, depending on the trigger's definition.
   
@@ -4077,10 +4077,10 @@ ASSERT condition  , 
 
 
- TG_LEVEL
+ TG_LEVEL (text)
  
   
-   Data type text; a string of either
+   string
ROW or STATEMENT
depending on the trigger's definition.
   
@@ -4088,10 +4088,10 @@ ASSERT condition  , 
 
 
- TG_OP
+ TG_OP (text)
  
   
-   Data type text; a string of
+   string
INSERT, UPDATE,
DELETE, or TRUNCATE
telling for which operation the trigger was fired.
@@ -4100,20 +4100,20 @@ ASSERT condition  , 
 
 
- TG_RELID
+ TG_RELID (oid, references pg_class.oid)
  
   
-   Data type oid; the object ID of the table that caused the
+   object ID of the table that caused the
trigger invocation.
   
  
 
 
 
- TG_RELNAME
+ TG_RELNAME (name)
  
   
-   Data type name; the name of the table that caused the trigger
+   name of the table that caused the trigger
invocation. This is now deprecated, and could disappear in a future
release. Use TG_TABLE_NAME instead.
   
@@ -4121,40 +4121,40 @@ ASSERT condition  , 
 
 
- TG_TABLE_NAME
+ TG_TABLE_NAME (name)
  
   
-   Data type name; the name of the table that
+   name of the table that
caused the trigger invocation.
   
  
 
 
 
- TG_TABLE_SCHEMA
+ TG_TABLE_SCHEMA (name)
  
   
-   Data type name; the name of the schema of the
+   name of the schema of the
table that caused the trigger invocation.
   
  
 
 
 
- TG_NARGS
+ TG_NARGS (integer)
  
   
-   Data type integer; the number of arguments given to the trigger
+   number of arguments given to the trigger
function in the CREATE TRIGGER statement.
   
  
 
 
 
- TG_ARGV[]
+ TG_ARGV (text[])
  
   
-   Data type array of text; the arguments from
+   arguments from
the CREATE TRIGGER statement.
The index counts from 0. Invalid
indexes (less than 0 or greater than or equal to tg_nargs)
-- 
2.35.1



Re: replacing role-level NOINHERIT with a grant-level option

2022-08-31 Thread Robert Haas
On Tue, Aug 30, 2022 at 5:20 PM Nathan Bossart  wrote:
> On Tue, Aug 30, 2022 at 03:24:56PM -0400, Robert Haas wrote:
> > - william => charles => elizabeth => uk is OK because the first two
> > hops have ADMIN and the last hop has INHERIT
>
> Don't you mean the first two hops have INHERIT and the last has ADMIN?

I do.

> > - william => charles => elizabeth => uk => parliament is probably not
> > OK because although the first two hops have ADMIN and the last has
> > INHERIT, the third hop probably lacks INHERIT
>
> Same here.  I don't mean to be pedantic, I just want to make sure I'm
> thinking of this correctly.

Right. The first two hops have INHERIT and the last has ADMIN, but the
third hop probably lacks INHERIT.

> Yes, this is very helpful.  I always appreciate your detailed examples.  I
> think what you are describing matches the mental model I was beginning to
> form.

Cool.

> Okay, now to take a closer look at the patch...

Thanks.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




pg_upgrade generated files in subdir follow-up

2022-08-31 Thread Daniel Gustafsson
Commit 38bfae36526 moved the .txt files pg_upgrade generates to a separate
subdir, but there are a few left which are written to cwd.  The thread
resulting in that patch doesn't discuss these files specifically, so it seems
they are just an oversight, unless I'm missing something.

Should something like the attached be applied to ensure all generated files are
placed in the subdirectory?

--
Daniel Gustafsson   https://vmware.com/



pg_upgrade_generated_files.diff
Description: Binary data


Re: plpgsql-trigger.html: Format TG_ variables as table (patch)

2022-08-31 Thread Daniel Gustafsson
> On 31 Aug 2022, at 13:55, Christoph Berg  wrote:
> 
> Re: To Peter Eisentraut
>> The new version of the patch just moves up the data types, and removes
>> the extra clutter from the beginnings of each description:
> 
> The last version had the brackets in TG_ARGV[] (text[]) duplicated.

This, and the other string variables, now read a bit strangely IMO:

-   Data type text; a string of
+   string
INSERT, UPDATE,
DELETE, or TRUNCATE

Wouldn't it be better with "string containing INSERT.." or something
similar?

--
Daniel Gustafsson   https://vmware.com/





Re: plpgsql-trigger.html: Format TG_ variables as table (patch)

2022-08-31 Thread Christoph Berg
Re: Daniel Gustafsson
> This, and the other string variables, now reads a bit strange IMO:
> 
> -   Data type text; a string of
> +   string
> INSERT, UPDATE,
> DELETE, or TRUNCATE
> 
> Wouldn't it be better with "string containing INSERT.." or something
> similar?

Right, that felt strange to me as well, but I couldn't think of
something better.

"string containing" is again pretty boilerplatish, how about just
"contains"?

Christoph
>From b675a9e63ac663817cf90ca3aee2acf398b3fa97 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Wed, 31 Aug 2022 11:52:50 +0200
Subject: [PATCH] plpgsql-trigger.html: Use more concise wording for TG_
 variables

To improve readability of the TG_ variables definition list, this moves
the datatypes up to the defined term to avoid having each entry start
with "Data type", and removes some more redundant clutter ("Variable
holding...") from the descriptions that didn't carry any information.
---
 doc/src/sgml/plpgsql.sgml | 52 +++
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index cf387dfc3f..b5cfe3c63d 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -4032,10 +4032,10 @@ ASSERT condition  , 
 
- NEW
+ NEW (record)
  
   
-   Data type RECORD; variable holding the new
+   new
database row for INSERT/UPDATE operations in row-level
triggers. This variable is null in statement-level triggers
and for DELETE operations.
@@ -4044,10 +4044,10 @@ ASSERT condition  , 
 
 
- OLD
+ OLD (record)
  
   
-   Data type RECORD; variable holding the old
+   old
database row for UPDATE/DELETE operations in row-level
triggers. This variable is null in statement-level triggers
and for INSERT operations.
@@ -4056,20 +4056,20 @@ ASSERT condition  , 
 
 
- TG_NAME
+ TG_NAME (name)
  
   
-   Data type name; variable that contains the name of the trigger actually
+   name of the trigger actually
fired.
   
  
 
 
 
- TG_WHEN
+ TG_WHEN (text)
  
   
-   Data type text; a string of
+   contains
BEFORE, AFTER, or
INSTEAD OF, depending on the trigger's definition.
   
@@ -4077,43 +4077,43 @@ ASSERT condition  , 
 
 
- TG_LEVEL
+ TG_LEVEL (text)
  
   
-   Data type text; a string of either
-   ROW or STATEMENT
+   contains
+   ROW or STATEMENT,
depending on the trigger's definition.
   
  
 
 
 
- TG_OP
+ TG_OP (text)
  
   
-   Data type text; a string of
+   contains
INSERT, UPDATE,
-   DELETE, or TRUNCATE
+   DELETE, or TRUNCATE,
telling for which operation the trigger was fired.
   
  
 
 
 
- TG_RELID
+ TG_RELID (oid, references pg_class.oid)
  
   
-   Data type oid; the object ID of the table that caused the
+   object ID of the table that caused the
trigger invocation.
   
  
 
 
 
- TG_RELNAME
+ TG_RELNAME (name)
  
   
-   Data type name; the name of the table that caused the trigger
+   name of the table that caused the trigger
invocation. This is now deprecated, and could disappear in a future
release. Use TG_TABLE_NAME instead.
   
@@ -4121,40 +4121,40 @@ ASSERT condition  , 
 
 
- TG_TABLE_NAME
+ TG_TABLE_NAME (name)
  
   
-   Data type name; the name of the table that
+   name of the table that
caused the trigger invocation.
   
  
 
 
 
- TG_TABLE_SCHEMA
+ TG_TABLE_SCHEMA (name)
  
   
-   Data type name; the name of the schema of the
+   name of the schema of the
table that caused the trigger invocation.
   
  
 
 
 
- TG_NARGS
+ TG_NARGS (integer)
  
   
-   Data type integer; the number of arguments given to the trigger
+   number of arguments given to the trigger
function in the CREATE TRIGGER statement.
   
  
 
 
 
- TG_ARGV[]
+ TG_ARGV (text[])
  
   
-   Data type array of text; the arguments from
+   arguments from
the CREATE TRIGGER statement.
The index counts from 0. Invalid
indexes (less than 0 or greater than or equal to tg_nargs)
-- 
2.35.1



Re: replacing role-level NOINHERIT with a grant-level option

2022-08-31 Thread Robert Haas
On Tue, Aug 30, 2022 at 6:10 PM Nathan Bossart  wrote:
> On Tue, Aug 30, 2022 at 08:33:46AM -0400, Robert Haas wrote:
> > Third try.
>
> Now that I have this grantor stuff fresh in my mind, this patch looks good
> to me.

Thanks for reviewing. Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: plpgsql-trigger.html: Format TG_ variables as table (patch)

2022-08-31 Thread Christoph Berg
Re: To Daniel Gustafsson
> "string containing" is again pretty boilerplatish, how about just
> "contains"?

Actually, just omitting the whole prefix works best.

TG_WHEN (text)

BEFORE, AFTER, or INSTEAD OF, depending on the trigger's definition.

I also shortened some "name of table" to just "table". Since the data
type is "name", it's clear what "table" means.

Christoph
>From a281bbb3ba80e14645f83d464ca6df892a85c1bb Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Wed, 31 Aug 2022 11:52:50 +0200
Subject: [PATCH] plpgsql-trigger.html: Use more concise wording for TG_
 variables

To improve readability of the TG_ variables definition list, this moves
the datatypes up to the defined term to avoid having each entry start
with "Data type", and removes some more redundant clutter ("Variable
holding...") from the descriptions that didn't carry any information.
---
 doc/src/sgml/plpgsql.sgml | 52 +++
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index cf387dfc3f..b5cfe3c63d 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -4032,10 +4032,10 @@ ASSERT condition  , 
 
- NEW
+ NEW (record)
  
   
-   Data type RECORD; variable holding the new
+   new
database row for INSERT/UPDATE operations in row-level
triggers. This variable is null in statement-level triggers
and for DELETE operations.
@@ -4044,10 +4044,10 @@ ASSERT condition  , 
 
 
- OLD
+ OLD (record)
  
   
-   Data type RECORD; variable holding the old
+   old
database row for UPDATE/DELETE operations in row-level
triggers. This variable is null in statement-level triggers
and for INSERT operations.
@@ -4056,20 +4056,20 @@ ASSERT condition  , 
 
 
- TG_NAME
+ TG_NAME (name)
  
   
-   Data type name; variable that contains the name of the trigger actually
+   name of the trigger actually
fired.
   
  
 
 
 
- TG_WHEN
+ TG_WHEN (text)
  
   
-   Data type text; a string of
+   contains
BEFORE, AFTER, or
INSTEAD OF, depending on the trigger's definition.
   
@@ -4077,43 +4077,43 @@ ASSERT condition  , 
 
 
- TG_LEVEL
+ TG_LEVEL (text)
  
   
-   Data type text; a string of either
-   ROW or STATEMENT
+   contains
+   ROW or STATEMENT,
depending on the trigger's definition.
   
  
 
 
 
- TG_OP
+ TG_OP (text)
  
   
-   Data type text; a string of
+   contains
INSERT, UPDATE,
-   DELETE, or TRUNCATE
+   DELETE, or TRUNCATE,
telling for which operation the trigger was fired.
   
  
 
 
 
- TG_RELID
+ TG_RELID (oid, references pg_class.oid)
  
   
-   Data type oid; the object ID of the table that caused the
+   object ID of the table that caused the
trigger invocation.
   
  
 
 
 
- TG_RELNAME
+ TG_RELNAME (name)
  
   
-   Data type name; the name of the table that caused the trigger
+   name of the table that caused the trigger
invocation. This is now deprecated, and could disappear in a future
release. Use TG_TABLE_NAME instead.
   
@@ -4121,40 +4121,40 @@ ASSERT condition  , 
 
 
- TG_TABLE_NAME
+ TG_TABLE_NAME (name)
  
   
-   Data type name; the name of the table that
+   name of the table that
caused the trigger invocation.
   
  
 
 
 
- TG_TABLE_SCHEMA
+ TG_TABLE_SCHEMA (name)
  
   
-   Data type name; the name of the schema of the
+   name of the schema of the
table that caused the trigger invocation.
   
  
 
 
 
- TG_NARGS
+ TG_NARGS (integer)
  
   
-   Data type integer; the number of arguments given to the trigger
+   number of arguments given to the trigger
function in the CREATE TRIGGER statement.
   
  
 
 
 
- TG_ARGV[]
+ TG_ARGV (text[])
  
   
-   Data type array of text; the arguments from
+   arguments from
the CREATE TRIGGER statement.
The index counts from 0. Invalid
indexes (less than 0 or greater than or equal to tg_nargs)
-- 
2.35.1



Re: SQL/JSON features for v15

2022-08-31 Thread Andrew Dunstan


On 2022-08-31 We 07:01, Amit Langote wrote:
> On Wed, Aug 31, 2022 at 3:51 PM Amit Langote  wrote:
>>  SELECT JSON_VALUE(jsonb '"aaa"', '$' RETURNING int DEFAULT 111 ON ERROR);
>> - json_value
>> -
>> -111
>> -(1 row)
>> -
>> +ERROR:  syntax error at or near "DEFAULT"
>> +LINE 1: ...ELECT JSON_VALUE(jsonb '"aaa"', '$' RETURNING int DEFAULT 11...
>>
>> Is it intentional that you left many instances of the regression test
>> output changes like the above?
> Actually, thinking more about this, I am wondering if we should not
> remove the DEFAULT expression productions in gram.y.  Maybe we can
> keep the syntax and give an unsupported error during parse-analysis,
> like the last version of the patch did for DEFAULT ON EMPTY.  That
> would also mean leaving JsonBehavior alone but with default_expr
> always NULL for now.
>

Producing an error in the parse analysis phase seems best to me.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





allowing for control over SET ROLE

2022-08-31 Thread Robert Haas
Hi,

There are two ways in which a role can exercise the privileges of some
other role which has been granted to it. First, it can implicitly
inherit the privileges of the granted role. Second, it can assume the
identity of the granted role using the SET ROLE command. It is
possible to control the former behavior, but not the latter. In v15
and prior release, we had a role-level [NO]INHERIT property which
controlled whether a role automatically inherited the privileges of
any role granted to it. This was all-or-nothing. Beginning in
e3ce2de09d814f8770b2e3b3c152b7671bcdb83f, the inheritance behavior of
role-grants can be overridden for individual grants, so that some
grants are inherited and others are not. However, there is no similar
facility for controlling whether a role can SET ROLE to some other
role of which it is a member. At present, if role A is a member of
role B, then A can SET ROLE B, and that's it.
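
As a refresher, the per-grant inheritance override added by that commit
looks like this (role names here are just placeholders):

grant role_b to role_a with inherit false;

There is currently no analogous per-grant option for SET ROLE, which is
what the patch below adds.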

In some circumstances, it may be desirable to control this behavior.
For example, if we GRANT pg_read_all_settings TO seer, we do want the
seer to be able to read all the settings, else we would not have
granted the role. But we might not want the seer to be able to do
this:

You are now connected to database "rhaas" as user "seer".
rhaas=> set role pg_read_all_settings;
SET
rhaas=> create table artifact (a int);
CREATE TABLE
rhaas=> \d
List of relations
 Schema |   Name   | Type  |Owner
+--+---+--
 public | artifact | table | pg_read_all_settings
(1 row)

I have attached a rather small patch which makes it possible to
control this behavior:

You are now connected to database "rhaas" as user "rhaas".
rhaas=# grant pg_read_all_settings to seer with set false;
GRANT ROLE
rhaas=# \c - seer
You are now connected to database "rhaas" as user "seer".
rhaas=> set role pg_read_all_settings;
ERROR:  permission denied to set role "pg_read_all_settings"

I think that this behavior is generally useful, and not just for the
predefined roles that we ship as part of PostgreSQL. I don't think
it's too hard to imagine someone wanting to use some locally created
role as a container for privileges but not wanting the users who
possess this role to run around creating new objects owned by it. To
some extent that can be controlled by making sure the role in question
doesn't have any excess privileges, but that's not really sufficient:
all you need is one schema anywhere in the system that grants CREATE
to PUBLIC. You could avoid creating such a schema, which might be a
good idea for other reasons anyway, but it feels like approaching the
problem from the wrong end. What you really want is to allow the users
to inherit the privileges of the role but not use SET ROLE to become
that role, so that's what this patch lets you do.
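
To make the "privilege container" case concrete, with the patch applied
it could look roughly like this (the role, schema, and user names are
invented):

create role app_privs nologin;
grant usage on schema app to app_privs;
grant select, insert on all tables in schema app to app_privs;
grant app_privs to alice with inherit true, set false;
-- alice now exercises app_privs' privileges implicitly, but
-- "set role app_privs" run as alice fails with a permission error.

So alice can use the privileges, but cannot become app_privs and start
creating objects owned by it.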

There's one other kind of case in which this sort of thing might be
somewhat useful, although it's more dubious. Suppose you have an
oncall group where you regularly add and remove members according to
who is on call. Naturally, you have an on-call bot which performs this
task automatically. The on-call bot has the ability to manage
memberships in the oncall group, but should not have the ability to
access any of its privileges, either by inheritance or via SET ROLE.
This patch KIND OF lets you accomplish this:

rhaas=# create role oncall;
CREATE ROLE
rhaas=# create role oncallbot login;
CREATE ROLE
rhaas=# grant oncall to oncallbot with inherit false, set false, admin true;
GRANT ROLE
rhaas=# create role anna;
CREATE ROLE
rhaas=# create role eliza;
CREATE ROLE
rhaas=# \c - oncallbot
You are now connected to database "rhaas" as user "oncallbot".
rhaas=> grant oncall to anna;
GRANT ROLE
rhaas=> revoke oncall from anna;
REVOKE ROLE
rhaas=> grant oncall to eliza;
GRANT ROLE
rhaas=> set role oncall;
ERROR:  permission denied to set role "oncall"

The problem here is that if a nasty evil hacker takes over the
oncallbot role, nothing whatsoever prevents them from executing "grant
oncall to oncallbot with set true" after which they can then "SET ROLE
oncall" using the privileges they just granted themselves. And even if
under some theory we blocked that, they could still maliciously grant
the sought-after on-call privileges to some other role i.e. "grant
oncall to accomplice". It's fundamentally difficult to allow people to
administer a set of privileges without giving them the ability to
usurp those privileges, and I wouldn't like to pretend that this patch
is in any way sufficient to accomplish such a thing. Nevertheless, I
think there's some chance it might be useful to someone building such
a system, in combination with other safeguards. Or maybe not: this
isn't the main reason I'm interested in this, and it's just an added
benefit if it turns out that someone can do something like this with
it.

In order to apply this patch, we'd need to reach a conclusion about
the matters mentioned in
http://postgr.es/m/ca+tgmobheyynw9vrhvolvd8odspbjuu9

Re: Slight refactoring of state check in pg_upgrade check_ function

2022-08-31 Thread Daniel Gustafsson
> On 30 Aug 2022, at 23:08, Bruce Momjian  wrote:
> On Sun, Aug 28, 2022 at 03:06:09PM -0700, Nathan Bossart wrote:

>> The patch looks reasonable to me.
> 
> +1. Those checks have accumulated over time with different authors,
> hence the stylistic differences.

Pushed, thanks for review!

--
Daniel Gustafsson   https://vmware.com/





Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

2022-08-31 Thread Imseih (AWS), Sami
>> For the benefit of anyone who may be looking at this thread in the
>> archive later, I believe this commit will have fixed this issue:

I can also confirm that indeed the commit 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6672d79
does fix this issue.

Thanks!

--
Sami Imseih
Amazon Web Services (AWS)





Re: pg_upgrade generated files in subdir follow-up

2022-08-31 Thread Tom Lane
Daniel Gustafsson  writes:
> Commit 38bfae36526 moved the .txt files pg_upgrade generates to a separate
> subdir, but there are a few left which are written to cwd.  The thread
> resulting in that patch doesn't discuss these files specifically so it seems
> they are just an oversight. Unless I'm missing something.

> Should something like the attached be applied to ensure all generated files are
> placed in the subdirectory?

It certainly looks inconsistent ATM.  I wondered if maybe the plan was to
put routine output into the log directory but problem-reporting files
into cwd --- but that isn't what's happening now.

As long as we report the path to where the file is, I don't see a reason
not to put problem-reporting files in the subdir too.

regards, tom lane




pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Justin Pryzby
An internal VM crashed last night due to OOM.

When I tried to start postgres, it failed like:

< 2022-08-31 08:44:10.495 CDT  >LOG:  checkpoint starting: end-of-recovery 
immediate wait
< 2022-08-31 08:44:10.609 CDT  >LOG:  request to flush past end of generated 
WAL; request 1201/1CAF84F0, current position 1201/1CADB730
< 2022-08-31 08:44:10.609 CDT  >CONTEXT:  writing block 0 of relation 
base/16881/2840_vm
< 2022-08-31 08:44:10.609 CDT  >ERROR:  xlog flush request 1201/1CAF84F0 is not 
satisfied --- flushed only to 1201/1CADB730
< 2022-08-31 08:44:10.609 CDT  >CONTEXT:  writing block 0 of relation 
base/16881/2840_vm
< 2022-08-31 08:44:10.609 CDT  >FATAL:  checkpoint request failed

I was able to start it with -c recovery_prefetch=no, so it seems like
prefetch tried to do too much.  The VM runs centos7 under qemu.
I'm making a copy of the data dir in case it's needed.

-- 
Justin




Re: plpgsql-trigger.html: Format TG_ variables as table (patch)

2022-08-31 Thread Dagfinn Ilmari Mannsåker
Christoph Berg  writes:

> Re: To Daniel Gustafsson
>> "string containing" is again pretty boilerplatish, how about just
>> "contains"?
>
> Actually, just omitting the whole prefix works best.
>
> TG_WHEN (text)
>
> BEFORE, AFTER, or INSTEAD OF, depending on the trigger's definition.

The attached patch does not reflect this, did you attach an old version?

> I also shortened some "name of table" to just "table". Since the data
> type is "name", it's clear what "table" means.

I think it reads better with the definite article and initial capital,
e.g. "The table that triggered ….".

> 
>  
> - NEW
> + NEW (record)

The type names should still be wrapped in , like they were before.

- ilmari




Re: SQL/JSON features for v15

2022-08-31 Thread Jonathan S. Katz

On 8/31/22 8:38 AM, Andrew Dunstan wrote:


On 2022-08-31 We 07:01, Amit Langote wrote:

On Wed, Aug 31, 2022 at 3:51 PM Amit Langote  wrote:

  SELECT JSON_VALUE(jsonb '"aaa"', '$' RETURNING int DEFAULT 111 ON ERROR);
- json_value
-
-111
-(1 row)
-
+ERROR:  syntax error at or near "DEFAULT"
+LINE 1: ...ELECT JSON_VALUE(jsonb '"aaa"', '$' RETURNING int DEFAULT 11...

Is it intentional that you left many instances of the regression test
output changes like the above?

Actually, thinking more about this, I am wondering if we should not
remove the DEFAULT expression productions in gram.y.  Maybe we can
keep the syntax and give an unsupported error during parse-analysis,
like the last version of the patch did for DEFAULT ON EMPTY.  Which
also means to also leave JsonBehavior alone but with default_expr
always NULL for now.



Producing an error in the parse analysis phase seems best to me.


Andres, Robert, Tom: With this recent work, have any of your opinions 
changed on including SQL/JSON in v15?


Thanks,

Jonathan




Re: pg_upgrade generated files in subdir follow-up

2022-08-31 Thread Daniel Gustafsson
> On 31 Aug 2022, at 15:59, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> Commit 38bfae36526 moved the .txt files pg_upgrade generates to a separate
>> subdir, but there are a few left which are written to cwd.  The thread
>> resulting in that patch doesn't discuss these files specifically so it seems
>> they are just an oversight. Unless I'm missing something.
> 
>> Should something like the attached be applied to ensure all generated files are
>> placed in the subdirectory?
> 
> It certainly looks inconsistent ATM.  I wondered if maybe the plan was to
> put routine output into the log directory but problem-reporting files
> into cwd --- but that isn't what's happening now.

Right, check_proper_datallowconn and check_for_isn_and_int8_passing_mismatch
and a few other check functions already place error reporting in the subdir.

> As long as we report the path to where the file is, I don't see a reason
> not to put problem-reporting files in the subdir too.

Agreed. The documentation states:

"pg_upgrade creates various working files, such as schema dumps, stored
within pg_upgrade_output.d in the directory of the new cluster.  Each
run creates a new subdirectory named with a timestamp formatted as per
ISO 8601 (%Y%m%dT%H%M%S), where all the generated files are stored."

The delete_old_cluster and reindex_hash scripts are still placed in CWD, which
isn't changed by this patch, as that seems correct (and might break scripts if
we move them).  Maybe we should amend the docs to mention that scripts aren't
generated in the subdir?

--
Daniel Gustafsson   https://vmware.com/





Re: introduce bufmgr hooks

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-29 15:24:49 -0700, Nathan Bossart wrote:
> I'd like to propose some new hooks for the buffer manager.  My primary goal
> is to allow users to create an additional caching mechanism between the
> shared buffers and disk for evicted buffers.

I'm very doubtful this is a good idea. These are quite hot paths. While not a
huge cost, adding an indirection still isn't free.  I also think it'll
make it harder to improve things in this area, which needs quite a bit of
work.

Greetings,

Andres Freund




Re: SQL/JSON features for v15

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-31 10:20:24 -0400, Jonathan S. Katz wrote:
> Andres, Robert, Tom: With this recent work, have any of your opinions
> changed on including SQL/JSON in v15?

I don't really know what to do here. It feels blatantly obvious that this code
isn't even remotely close to being releasable. I'm worried about the impact of
the big revert at this stage of the release cycle, and that's not getting
better by delaying further. And I'm getting weary of being asked to make the
obvious call that the authors of this feature as well as the RMT should have
made a while ago.

From my POV the only real discussion is whether we'd want to revert this in 15
and HEAD or just 15. There's imo a decent point to be made to just revert in
15 and aggressively press forward with the changes posted in this thread.

Greetings,

Andres Freund




Re: [PATCH] Query Jumbling for CALL and SET utility statements

2022-08-31 Thread Pavel Stehule
Hi


st 31. 8. 2022 v 17:34 odesílatel Drouvot, Bertrand 
napsal:

> Hi hackers,
>
> While query jumbling is provided for function calls that’s currently not
> the case for procedure calls.
> The reason behind this is that all utility statements are currently
> discarded for jumbling.
>
> We’ve recently seen performance impacts (LWLock contention) due to the
> lack of jumbling on procedure calls with pg_stat_statements and
> pg_stat_statements.track_utility enabled (think an application with a high
> rate of procedure calls with unique parameters for each call).
>
> Jeremy has had this conversation on twitter (see
> https://twitter.com/jer_s/status/1560003560116342785) and Nikolay
> reported that he also had to work on a similar performance issue with SET
> being used.
>
> That’s why we think it would make sense to allow jumbling for those 2
> utility statements: CALL and SET.
>
> Please find attached a patch proposal for doing so.
>
> With the attached patch we would get things like:
> CALL MINUS_TWO(3);
> CALL MINUS_TWO(7);
> CALL SUM_TWO(3, 8);
> CALL SUM_TWO(7, 5);
> set enable_seqscan=false;
> set enable_seqscan=true;
> set seq_page_cost=2.0;
> set seq_page_cost=1.0;
>
> postgres=# SELECT query, calls, rows FROM pg_stat_statements;
>query   | calls | rows
> ---+---+--
>  set seq_page_cost=$1  | 2 |0
>  CALL MINUS_TWO($1)| 2 |0
>  set enable_seqscan=$1 | 2 |0
>  CALL SUM_TWO($1, $2)  | 2 |0
>
> Looking forward to your feedback,
>
The idea is good, but I think you should use pg_stat_functions instead.
Maybe it is supported already (I didn't test it). I am not sure the SET
statement should be traced in pg_stat_statements - it is usually pretty
fast, and without context it says nothing. It looks like just overhead.

Regards

Pavel


> Thanks,
>
> Jeremy & Bertrand
>
> --
> Bertrand Drouvot
> Amazon Web Services: https://aws.amazon.com
>
>


Re: SQL/JSON features for v15

2022-08-31 Thread Andrew Dunstan


On 2022-08-30 Tu 17:25, Nikita Glukhov wrote:
>
>
>
 Patches 0001-0006:

 Yeah, these add the overhead of an extra function call (typin() ->
 typin_opt_error()) in possibly very common paths.  Other than
 refactoring *all* places that call typin() to use the new API, the
 only other option seems to be to leave the typin() functions alone and
 duplicate their code in typin_opt_error() versions for all the types
 that this patch cares about.  Though maybe, that's not necessarily a
 better compromise than accepting the extra function call overhead.
>>> I think another possibility is to create a static inline function in the
>>> corresponding .c module (say boolin_impl() in bool.c), which is called
>>> by both the opt_error variant as well as the regular one.  This would
>>> avoid the duplicate code as well as the added function-call overhead.
>> +1
> I always thought about such internal inline functions, I've added them in 
> v10.
>
>
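(For readers following along, a minimal sketch of the pattern being discussed;
the boolin_impl() name comes from the suggestion above, but the exact signature
and the shape of the opt_error variant are assumptions, not the actual v10 code:)

/* bool.c -- sketch only: shared implementation used by both entry points */
static inline bool
boolin_impl(const char *str, bool *result, bool *have_error)
{
	if (parse_bool(str, result))
		return true;

	if (have_error)
	{
		/* soft failure: let the caller decide what to do */
		*have_error = true;
		return false;
	}

	ereport(ERROR,
			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
			 errmsg("invalid input syntax for type %s: \"%s\"",
					"boolean", str)));
	return false;				/* keep compiler quiet */
}

Datum
boolin(PG_FUNCTION_ARGS)
{
	bool		result;

	(void) boolin_impl(PG_GETARG_CSTRING(0), &result, NULL);
	PG_RETURN_BOOL(result);
}

bool
boolin_opt_error(const char *str, bool *result)
{
	bool		have_error = false;

	(void) boolin_impl(str, result, &have_error);
	return !have_error;
}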

A couple of questions about these:


1. Patch 5 changes the API of DecodeDateTime() and DecodeTimeOnly() by
adding an extra parameter bool *error. Would it be better to provide
_opt_error flavors of these?

2. Patch 6 changes jsonb_from_cstring so that it's no longer static
inline. Shouldn't we have a static inline function that can be called
from inside jsonb.c and is called by the extern function?


changing both of these things would be quite trivial and should not hold
anything up.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Tracking last scan time

2022-08-31 Thread Dave Page
On Tue, 30 Aug 2022 at 19:46, Bruce Momjian  wrote:

> On Fri, Aug 26, 2022 at 02:05:36PM +0100, Dave Page wrote:
> > On Thu, 25 Aug 2022 at 01:44, David Rowley  wrote:
> > I don't have a particular opinion about the patch, I'm just pointing
> > out that there are other ways. Even just writing down the numbers on
> a
> > post-it note and coming back in a month to see if they've changed is
> > enough to tell if the table or index has been used.
> >
> >
> > There are usually other ways to perform monitoring tasks, but there is
> > something to be said for the convenience of having functionality built
> in and
> > not having to rely on tools, scripts, or post-it notes :-)
>
> Should we consider using something cheaper like time() so we don't need
> a GUC to enable this?
>

Interesting idea, but on my mac at least, 100,000,000 gettimeofday() calls
take about 2 seconds, whilst 100,000,000 time() calls take 14(!) seconds.
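
(For reference, roughly the kind of loop behind those numbers -- this is a
from-memory sketch, not the exact test that was run:)

#include <stdio.h>
#include <sys/time.h>
#include <time.h>

int
main(void)
{
	struct timeval tv;
	time_t		t = 0;
	long		i;

	/* time this loop externally, e.g. with the shell's time(1) */
	for (i = 0; i < 100000000L; i++)
		gettimeofday(&tv, NULL);

	/* ... then comment the loop above out and time this one instead */
	for (i = 0; i < 100000000L; i++)
		t = time(NULL);

	printf("%ld %ld\n", (long) tv.tv_sec, (long) t);
	return 0;
}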

-- 
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: https://www.enterprisedb.com


Add tracking of backend memory allocated to pg_stat_activity

2022-08-31 Thread Reid Thompson
Hi Hackers,

Attached is a patch to 
Add tracking of backend memory allocated to pg_stat_activity
    
This new field displays the current bytes of memory allocated to the
backend process.  It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included
in the displayed value.  Dynamic shared memory allocations are included
only in the value displayed for the backend that created them; they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero.


-- 
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thomp...@crunchydata.com
www.crunchydata.com
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..40ae638f25 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -947,6 +947,18 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  
 
 
+
+ 
+  backend_mem_allocated bigint
+ 
+ 
+  The byte count of memory allocated to this backend. Dynamic shared memory
+  allocations are included only in the value displayed for the backend that
+  created them, they are not included in the value for backends that are
+  attached to them to avoid double counting.
+ 
+
+
  
   
query text
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5a844b63a1..d23f0e9dbb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -863,6 +863,7 @@ CREATE VIEW pg_stat_activity AS
 S.backend_xid,
 s.backend_xmin,
 S.query_id,
+S.backend_mem_allocated,
 S.query,
 S.backend_type
 FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index e1b90c5de4..269ad2fe53 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
+#include "utils/backend_status.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 
@@ -232,6 +233,13 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 			name)));
 			return false;
 		}
+		/*
+		 * Detach and destroy pass through here, only decrease the memory
+		 * shown allocated in pgstat_activity when the creator destroys the
+		 * allocation.
+		 */
+		if (op == DSM_OP_DESTROY)
+			pgstat_report_backend_mem_allocated_decrease(*mapped_size);
 		*mapped_address = NULL;
 		*mapped_size = 0;
 		if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +340,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 		name)));
 		return false;
 	}
+
+	/*
+	 * Attach and create pass through here, only update backend memory
+	 * allocated in pgstat_activity for the creator process.
+	 */
+	if (op == DSM_OP_CREATE)
+	{
+		/*
+		 * Posix creation calls dsm_impl_posix_resize implying that resizing
+		 * occurs or may be added in the future. As implemented
+		 * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+		 * whole new size as input, growing the allocation as needed * (only
+		 * truncate supports shrinking). We update by replacing the * old
+		 * allocation with the new.
+		 */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+		/*
+		 * posix_fallocate does not shrink allocations, adjust only on
+		 * allocation increase.
+		 */
+		if (request_size > *mapped_size)
+		{
+			pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+			pgstat_report_backend_mem_allocated_increase(request_size);
+		}
+#else
+		pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+		pgstat_report_backend_mem_allocated_increase(request_size);
+#endif
+	}
 	*mapped_address = address;
 	*mapped_size = request_size;
 	close(fd);
@@ -537,6 +575,14 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 			name)));
 			return false;
 		}
+
+		/*
+		 * Detach and destroy pass through here, only decrease the memory
+		 * shown allocated in pgstat_activity when the creator destroys the
+		 * allocation.
+		 */
+		if (op == DSM_OP_DESTROY)
+			pgstat_report_backend_mem_allocated_decrease(*mapped_size);
 		*mapped_address = NULL;
 		*mapped_size = 0;
 		if (op == DSM_OP_DESTROY && shmctl(ident, IPC_RMID, NULL) < 0)
@@ -584,6 +630,13 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 		name)));
 		return false;
 	}
+
+	/*
+	 * Attach and create pass through here, only update backend memory
+	 * allocated in pgstat_activity for the creator process.
+	 */
+	if (op == DSM_OP_CREATE)
+		pgstat_report_backend_mem_allocated_increase(req

Re: SQL/JSON features for v15

2022-08-31 Thread Tom Lane
Andres Freund  writes:
> On 2022-08-31 10:20:24 -0400, Jonathan S. Katz wrote:
>> Andres, Robert, Tom: With this recent work, have any of your opinions
>> changed on including SQL/JSON in v15?

> I don't really know what to do here. It feels blatantly obvious that this code
> isn't even remotely close to being releasable. I'm worried about the impact of
> the big revert at this stage of the release cycle, and that's not getting
> better by delaying further. And I'm getting weary of being asked to make the
> obvious call that the authors of this feature as well as the RMT should have
> made a while ago.

I have to agree.  There is a large amount of code at stake here.
We're being asked to review a bunch of hastily-produced patches
to that code on an even more hasty schedule (and personally
I have other things I need to do today...)  I think the odds
of a favorable end result are small.

> From my POV the only real discussion is whether we'd want to revert this in 15
> and HEAD or just 15. There's imo a decent point to be made to just revert in
> 15 and aggressively press forward with the changes posted in this thread.

I'm not for that.  Code that we don't think is ready to ship
has no business being in the common tree, nor does it make
review any easier to be looking at one bulky set of
already-committed patches and another bulky set of deltas.

I'm okay with making an exception for the include/nodes/ and
backend/nodes/ files in HEAD, since the recent changes in that
area mean it'd be a lot of error-prone work to produce a reverting
patch there.  We can leave those in as dead code temporarily, I think.

regards, tom lane




Re: [PATCH] Query Jumbling for CALL and SET utility statements

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-31 17:33:44 +0200, Drouvot, Bertrand wrote:
> While query jumbling is provided for function calls that’s currently not the
> case for procedure calls.
> The reason behind this is that all utility statements are currently
> discarded for jumbling.
> [...]
> That’s why we think it would make sense to allow jumbling for those 2
> utility statements: CALL and SET.

Yes, I've seen this be an issue repeatedly. Although more heavily for PREPARE
/ EXECUTE than either of the two cases you handle here. IME not tracking
PREPARE / EXECUTE can distort statistics substantially - there appears to be
a surprising number of applications / frameworks resorting to them. Basically
requiring that track_utility is turned on.

I suspect we should carve out things like CALL, PREPARE, EXECUTE from
track_utility - it's more or less an architectural accident that they're
utility statements.  It's a bit less clear that SET should be dealt with that
way.



> @@ -383,6 +384,30 @@ JumbleExpr(JumbleState *jstate, Node *node)
>   APP_JUMB(var->varlevelsup);
>   }
>   break;
> + case T_CallStmt:
> + {
> + CallStmt   *stmt = (CallStmt *) node;
> + FuncExpr   *expr = stmt->funcexpr;
> +
> + APP_JUMB(expr->funcid);
> + JumbleExpr(jstate, (Node *) expr->args);
> + }
> + break;

Why do we need to take the arguments into account?


> + case T_VariableSetStmt:
> + {
> + VariableSetStmt *stmt = (VariableSetStmt *) 
> node;
> +
> + APP_JUMB_STRING(stmt->name);
> + JumbleExpr(jstate, (Node *) stmt->args);
> + }
> + break;

Same?


> + case T_A_Const:
> + {
> + int loc = ((const A_Const 
> *) node)->location;
> +
> + RecordConstLocation(jstate, loc);
> + }
> + break;

I suspect we only need this because of the jumbling of unparsed arguments I
questioned above?  If we do end up needing it, shouldn't we include the type
in the jumbling?

Greetings,

Andres Freund




Re: Tracking last scan time

2022-08-31 Thread Bruce Momjian
On Wed, Aug 31, 2022 at 05:02:33PM +0100, Dave Page wrote:
> 
> 
> On Tue, 30 Aug 2022 at 19:46, Bruce Momjian  wrote:
> 
> On Fri, Aug 26, 2022 at 02:05:36PM +0100, Dave Page wrote:
> > On Thu, 25 Aug 2022 at 01:44, David Rowley  wrote:
> >     I don't have a particular opinion about the patch, I'm just pointing
> >     out that there are other ways. Even just writing down the numbers on
> a
> >     post-it note and coming back in a month to see if they've changed is
> >     enough to tell if the table or index has been used.
> >
> >
> > There are usually other ways to perform monitoring tasks, but there is
> > something to be said for the convenience of having functionality built 
> in
> and
> > not having to rely on tools, scripts, or post-it notes :-)
> 
> Should we consider using something cheaper like time() so we don't need
> a GUC to enable this?
> 
> 
> Interesting idea, but on my mac at least, 100,000,000 gettimeofday() calls
> takes about 2 seconds, whilst 100,000,000 time() calls takes 14(!) seconds.

Wow.  I was just thinking you need second-level accuracy, which must be
cheap somewhere.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Bug fix. autovacuum.c do_start_worker() associates memory allocations with TopMemoryContext rather than 'Autovacuum start worker (tmp)'

2022-08-31 Thread Reid Thompson
Hi Hackers,

This patch ensures get_database_list() switches back to the memory
context in use upon entry rather than returning with TopMemoryContext
as the context.

This will address memory allocations in autovacuum.c being associated
with TopMemoryContext when they should not be.

autovacuum.c do_start_worker() with current context
'Autovacuum start worker (tmp)' invokes get_database_list(). Upon
return, the current context has been changed to TopMemoryContext by
AtCommit_Memory() as part of an internal transaction. Further down
in do_start_worker(), pgstat_fetch_stat_dbentry() is invoked.
Previously this didn't pose an issue; however, recent changes altered
how pgstat_fetch_stat_dbentry() is implemented. The new
implementation has a branch utilizing palloc. The patch ensures these
allocations are associated with the 'Autovacuum start worker (tmp)'
context rather than the TopMemoryContext. Prior to the change,
leaving an idle laptop PG instance running just shy of 3 days saw the
autovacuum launcher process grow to 42MB with most of that growth in
TopMemoryContext due to the palloc allocations issued with autovacuum
worker startup.



-- 
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thomp...@crunchydata.com
www.crunchydata.com



diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index b3b1afba86..92b1ef45c1 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1941,6 +1941,13 @@ get_database_list(void)
 
 	CommitTransactionCommand();
 
+	/*
+	 * CommitTransactionCommand returns with CurrentMemoryContext set
+	 * to TopMemoryContext. Switch back to the context that we entered
+	 * with.
+	 */
+	MemoryContextSwitchTo(resultcxt);
+
 	return dblist;
 }
 


Re: Tracking last scan time

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-23 10:55:09 +0100, Dave Page wrote:
> Often it is beneficial to review one's schema with a view to removing
> indexes (and sometimes tables) that are no longer required. It's very
> difficult to understand when that is the case by looking at the number of
> scans of a relation as, for example, an index may be used infrequently but
> may be critical in those times when it is used.
> 
> The attached patch against HEAD adds optional tracking of the last scan
> time for relations. It updates pg_stat_*_tables with new last_seq_scan and
> last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to
> help with this.
> 
> Due to the use of gettimeofday(), those values are only maintained if a new
> GUC, track_scans, is set to on. By default, it is off.
> 
> I did run a 12 hour test to see what the performance impact is. pgbench was
> run with scale factor 1 and 75 users across 4 identical bare metal
> machines running Rocky 8 in parallel which showed roughly a -2% average
> performance penalty against HEAD with track_scans enabled. Machines were
> PowerEdge R7525's with 128GB RAM, dual 16C/32T AMD 7302 CPUs, with the data
> directory on 6 x 800GB 12Gb/s SSD SAS drives in RAID 0. Kernel time source
> is tsc.
> 
>             HEAD          track_scans    Penalty (%)
> box1        19582.49735   19341.8881     -1.22869541
> box2        19936.55513   19928.07479    -0.04253664659
> box3        19631.78895   18649.64379    -5.002830696
> box4        19810.86767   19420.67192    -1.969604525
> Average     19740.42728   19335.06965    -2.05343896

Based on the size of those numbers this was a r/w pgbench. If it has this
noticeable an impact for r/w, with a pretty low number of scans/sec, how's the
overhead for r/o (which can have 2 orders of magnitude more scans/sec)? It
must be quite bad.

I don't think we should accept this feature with this overhead - but I also
think we can do better, by accepting a bit less accuracy. For this to be
useful we don't need a perfectly accurate timestamp. The statement start time
is probably not accurate enough, but we could just have bgwriter or such
update one in shared memory every time we wake up? Or perhaps we could go to
an even lower granularity, by putting in the current LSN or such?
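
(A very rough sketch of that coarse-timestamp idea -- the struct, the function
names, and where the update hook lives are all assumptions, not a worked-out
proposal:)

#include "postgres.h"
#include "port/atomics.h"
#include "utils/timestamp.h"

/* lives in shared memory; assumed to be set up via ShmemInitStruct() */
typedef struct CoarseTimeShmem
{
	pg_atomic_uint64	approx_now;		/* TimestampTz stored as uint64 */
} CoarseTimeShmem;

static CoarseTimeShmem *CoarseTime;

/* called from e.g. the bgwriter main loop, once per wakeup */
static void
coarse_time_update(void)
{
	pg_atomic_write_u64(&CoarseTime->approx_now,
						(uint64) GetCurrentTimestamp());
}

/* used by the stats code instead of a per-relation gettimeofday() */
static TimestampTz
coarse_time_read(void)
{
	return (TimestampTz) pg_atomic_read_u64(&CoarseTime->approx_now);
}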

Greetings,

Andres Freund




Re: SQL/JSON features for v15

2022-08-31 Thread Robert Haas
On Wed, Aug 31, 2022 at 10:20 AM Jonathan S. Katz  wrote:
> Andres, Robert, Tom: With this recent work, have any of your opinions
> changed on including SQL/JSON in v15?

No. Nothing's been committed, and there's no time to review anything
in detail, and there was never going to be. Nikita said he was ready
to start hacking in mid-August. That's good of him, but feature freeze
was in April. We don't start hacking on a feature 4 months after the
freeze. I'm unwilling to drop everything I'm working on to review
patches that were written in a last minute rush. Even if these patches
were more important to me than my own work, which they are not, I
couldn't possibly do a good job reviewing complex patches on top of
other complex patches in an area that I haven't studied in years. And
if I could do a good job, no doubt I'd find a bunch of problems -
whether they would be large or small, I don't know - and then that
would lead to more changes even closer to the intended release date.

I just don't understand what the RMT thinks it is doing here. When a
concern is raised about whether a feature is anywhere close to being
in a releasable state in August, "hack on it some more and then see
where we're at" seems like an obviously impractical way forward. It
seemed clear to me from the moment that Andres raised his concerns
that the only two viable strategies were (1) revert the feature and be
sad or (2) decide to ship it anyway and hope that Andres is incorrect
in thinking that it will become an embarrassment to the project. The
RMT has chosen neither of these, and in fact, really seems to want
someone else to make the decision. But that's not how it works. The
RMT concept was invented precisely to solve problems like this one,
where the patch authors don't really want to revert it but other
people think it's pretty busted. If such problems were best addressed
by waiting for a long time to see whether anything changes, we
wouldn't need an RMT. That's exactly how we used to handle these kinds
of problems, and it sucked.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: SQL/JSON features for v15

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-31 12:26:29 -0400, Robert Haas wrote:
> On Wed, Aug 31, 2022 at 10:20 AM Jonathan S. Katz  
> wrote:
> > Andres, Robert, Tom: With this recent work, have any of your opinions
> > changed on including SQL/JSON in v15?
> 
> No. Nothing's been committed, and there's no time to review anything
> in detail, and there was never going to be. Nikita said he was ready
> to start hacking in mid-August. That's good of him, but feature freeze
> was in April.

As additional context: I had started raising those concerns mid June.

Greetings,

Andres Freund




Re: SQL/JSON features for v15

2022-08-31 Thread Jonathan S. Katz

On 8/31/22 12:26 PM, Robert Haas wrote:

On Wed, Aug 31, 2022 at 10:20 AM Jonathan S. Katz  wrote:

Andres, Robert, Tom: With this recent work, have any of your opinions
changed on including SQL/JSON in v15?


No. Nothing's been committed, and there's no time to review anything
in detail, and there was never going to be.


OK. Based on this feedback, the RMT is going to request that this is 
reverted.


With RMT hat on -- Andrew can you please revert the patchset?


 Nikita said he was ready
to start hacking in mid-August. That's good of him, but feature freeze
was in April. We don't start hacking on a feature 4 months after the
freeze. I'm unwilling to drop everything I'm working on to review
patches that were written in a last minute rush. Even if these patches
were more important to me than my own work, which they are not, I
couldn't possibly do a good job reviewing complex patches on top of
other complex patches in an area that I haven't studied in years. And
if I could do a good job, no doubt I'd find a bunch of problems -
whether they would be large or small, I don't know - and then that
would lead to more changes even closer to the intended release date.

I just don't understand what the RMT thinks it is doing here. When a
concern is raised about whether a feature is anywhere close to being
in a releasable state in August, "hack on it some more and then see
where we're at" seems like an obviously impractical way forward. It
seemed clear to me from the moment that Andres raised his concerns
that the only two viable strategies were (1) revert the feature and be
sad or (2) decide to ship it anyway and hope that Andres is incorrect
in thinking that it will become an embarrassment to the project. The
RMT has chosen neither of these, and in fact, really seems to want
someone else to make the decision. But that's not how it works. The
RMT concept was invented precisely to solve problems like this one,
where the patch authors don't really want to revert it but other
people think it's pretty busted. If such problems were best addressed
by waiting for a long time to see whether anything changes, we
wouldn't need an RMT. That's exactly how we used to handle these kinds
of problems, and it sucked.


This is fair feedback. However, there are a few things to consider here:

1. When Andres raised his initial concerns, the RMT did recommend to 
revert but did not force it. Part of the RMT charter is to try to get 
consensus before doing so and after we've exhausted the community 
process. As we moved closer, the patch authors proposed some suggestions 
which other folks were amenable to trying.


Unfortunately, time has run out. However,

2. One of the other main goals of the RMT is to ensure the release ships 
"on time" which we define to be late Q3/early Q4. We factored that into 
the decision making process around this. We are still on time for the 
release.


I take responsibility for the decision making. I would be open to 
discuss this further around what worked / what didn't with the RMT and 
where we can improve in the future.


Thanks,

Jonathan




Add the ability to limit the amount of memory that can be allocated to backends.

2022-08-31 Thread Reid Thompson
Hi Hackers,

Add the ability to limit the amount of memory that can be allocated to
backends.

This builds on the work that adds backend memory allocated to
pg_stat_activity
https://www.postgresql.org/message-id/67bb5c15c0489cb499723b0340f16e10c22485ec.camel%40crunchydata.com
Both patches are attached.

Add GUC variable max_total_backend_memory.

Specifies a limit to the amount of memory (MB) that may be allocated to
backends in total (i.e. this is not a per user or per backend limit).
If unset, or set to 0 it is disabled. It is intended as a resource to
help avoid the OOM killer. A backend request that would push the total
over the limit will be denied with an out of memory error, causing that
backend's current query/transaction to fail. Due to the dynamic nature
of memory allocations, this limit is not exact. If the total is within
1.5MB of the limit and two backends request 1MB each at the same time,
both may be allocated, exceeding the limit. Further requests will not be allocated
until dropping below the limit. Keep this in mind when setting this
value to avoid the OOM killer. Currently, this limit does not affect
auxiliary backend processes, this list of non-affected backend
processes is open for discussion as to what should/should not be
included. Backend memory allocations are displayed in the
pg_stat_activity view. 
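
(A usage sketch, assuming the GUC described above -- e.g. setting
max_total_backend_memory = 4096 in postgresql.conf -- together with the
pg_stat_activity column added by the companion patch:)

-- observe per-backend totals, largest consumers first
SELECT pid, backend_type, backend_mem_allocated
FROM pg_stat_activity
ORDER BY backend_mem_allocated DESC;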



-- 
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thomp...@crunchydata.com
www.crunchydata.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..caf958310a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2079,6 +2079,32 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_total_backend_memory (integer)
+  
+   max_total_backend_memory configuration parameter
+  
+  
+  
+   
+Specifies a limit to the amount of memory (MB) that may be allocated to
+backends in total (i.e. this is not a per user or per backend limit).
+If unset, or set to 0 it is disabled.  A backend request that would push
+the total over the limit will be denied with an out of memory error
+causing that backends current query/transaction to fail. Due to the dynamic
+nature of memory allocations, this limit is not exact. If within 1.5MB of
+the limit and two backends request 1MB each at the same time both may be
+allocated exceeding the limit. Further requests will not be allocated until
+dropping below the limit. Keep this in mind when setting this value. This
+limit does not affect auxiliary backend processes
+ . Backend memory allocations
+(backend_mem_allocated) are displayed in the
+pg_stat_activity
+view.
+   
+  
+ 
+
  
  
 
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 269ad2fe53..808ffe75f2 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -253,6 +253,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 		return true;
 	}
 
+	/* Do not exceed maximum allowed memory allocation */
+	if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+		return false;
+
 	/*
 	 * Create new segment or open an existing one for attach.
 	 *
@@ -524,6 +528,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size,
 		int			flags = IPCProtection;
 		size_t		segsize;
 
+		/* Do not exceed maximum allowed memory allocation */
+		if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+			return false;
+
 		/*
 		 * Allocate the memory BEFORE acquiring the resource, so that we don't
 		 * leak the resource if memory allocation fails.
@@ -718,6 +726,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size,
 		return true;
 	}
 
+	/* Do not exceed maximum allowed memory allocation */
+	if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size))
+		return false;
+
 	/* Create new segment or open an existing one for attach. */
 	if (op == DSM_OP_CREATE)
 	{
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 17a00587f8..9137a000ae 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -44,6 +44,8 @@
  */
 bool		pgstat_track_activities = false;
 int			pgstat_track_activity_query_size = 1024;
+/* Max backend memory allocation allowed (MB). 0 = disabled */
+int			max_total_bkend_mem = 0;
 
 
 /* exposed so that backend_progress.c can access it */
@@ -1253,3 +1255,107 @@ pgstat_report_backend_mem_allocated_decrease(uint64 deallocation)
 	beentry->backend_mem_allocated -= deallocation;
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
+
+/* --
+ * pgstat_get_all_backend_memory_allocated() -
+ *
+ *	Return a uint64 representing the current shared memory allocated to all
+ *	backends.  This looks directly at the BackendStatusArray, and so will
+ *	pro

Re: [PATCH] Query Jumbling for CALL and SET utility statements

2022-08-31 Thread Pavel Stehule
st 31. 8. 2022 v 17:50 odesílatel Pavel Stehule 
napsal:

> Hi
>
>
> st 31. 8. 2022 v 17:34 odesílatel Drouvot, Bertrand 
> napsal:
>
>> Hi hackers,
>>
>> While query jumbling is provided for function calls that’s currently not
>> the case for procedure calls.
>> The reason behind this is that all utility statements are currently
>> discarded for jumbling.
>>
>> We’ve recently seen performance impacts (LWLock contention) due to the
>> lack of jumbling on procedure calls with pg_stat_statements and
>> pg_stat_statements.track_utility enabled (think an application with a high
>> rate of procedure calls with unique parameters for each call).
>>
>> Jeremy has had this conversation on twitter (see
>> https://twitter.com/jer_s/status/1560003560116342785) and Nikolay
>> reported that he also had to work on a similar performance issue with SET
>> being used.
>>
>> That’s why we think it would make sense to allow jumbling for those 2
>> utility statements: CALL and SET.
>>
>> Please find attached a patch proposal for doing so.
>>
>> With the attached patch we would get things like:
>> CALL MINUS_TWO(3);
>> CALL MINUS_TWO(7);
>> CALL SUM_TWO(3, 8);
>> CALL SUM_TWO(7, 5);
>> set enable_seqscan=false;
>> set enable_seqscan=true;
>> set seq_page_cost=2.0;
>> set seq_page_cost=1.0;
>>
>> postgres=# SELECT query, calls, rows FROM pg_stat_statements;
>>query   | calls | rows
>> ---+---+--
>>  set seq_page_cost=$1  | 2 |0
>>  CALL MINUS_TWO($1)| 2 |0
>>  set enable_seqscan=$1 | 2 |0
>>  CALL SUM_TWO($1, $2)  | 2 |0
>>
>> Looking forward to your feedback,
>>
> The idea is good, but I think you should use pg_stat_functions instead.
> Maybe it is supported already (I didn't test it). I am not sure the SET
> statement should be traced in pg_stat_statements - it is usually pretty
> fast, and without context it says nothing. It looks like just overhead.
>

I was wrong - there is an analogy with SELECT fx, and the statistics are in
pg_stat_statements, and in pg_stat_function too.

Regards

Pavel


>
> Regards
>
> Pavel
>
>
>> Thanks,
>>
>> Jeremy & Bertrand
>>
>> --
>> Bertrand Drouvot
>> Amazon Web Services: https://aws.amazon.com
>>
>>


Re: Add tracking of backend memory allocated to pg_stat_activity

2022-08-31 Thread Justin Pryzby
On Wed, Aug 31, 2022 at 12:03:06PM -0400, Reid Thompson wrote:
> Hi Hackers,
> 
> Attached is a patch to 
> Add tracking of backend memory allocated to pg_stat_activity

> +  proargmodes => 
> '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',

In the past, there was concern about making pg_stat_activity wider by
adding information that's less-essential than what's been there for
years.  This is only an int64, so it's not "wide", but I wonder if
there's another way to expose this information?  Like adding backends to
pg_get_backend_memory_contexts() , maybe with another view on top of
that ?

> +		 * shown allocated in pgstat_activity when the creator destroys the

pg_stat

> +  * Posix creation calls dsm_impl_posix_resize implying that 
> resizing
> +  * occurs or may be added in the future. As implemented
> +  * dsm_impl_posix_resize utilizes fallocate or truncate, 
> passing the
> +  * whole new size as input, growing the allocation as needed * 
> (only
> +  * truncate supports shrinking). We update by replacing the * 
> old

wrapping caused extraneous stars

> +  * Do not allow backend_mem_allocated to go below zero. ereport if we
> +  * would have. There's no need for a lock around the read here asit's

as it's

> + ereport(LOG, (errmsg("decrease reduces reported backend memory 
> allocated below zero; setting reported to 0")));

errmsg() doesn't require the outside parentheses since a couple years
ago.

> + /*
> +  * Until header allocation is included in context->mem_allocated cast to
> +  * slab and decrement the headerSize

add a comma before "cast" ?

-- 
Justin




Re: SQL/JSON features for v15

2022-08-31 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> From my POV the only real discussion is whether we'd want to revert this in 
>> 15
>> and HEAD or just 15. There's imo a decent point to be made to just revert in
>> 15 and aggressively press forward with the changes posted in this thread.

> I'm not for that.  Code that we don't think is ready to ship
> has no business being in the common tree, nor does it make
> review any easier to be looking at one bulky set of
> already-committed patches and another bulky set of deltas.

To enlarge on that a bit: it seems to me that the really fundamental
issue here is how to catch datatype-specific input and conversion
errors without using subtransactions, because those are too expensive
and can mask errors we'd rather not be masking, such as OOM.  (Andres
had some additional, more localized concerns, but I think this is the
one with big-picture implications.)

The currently proposed patchset hacks up a relatively small number
of core datatypes to be able to do that.  But it's just a hack
and there's no prospect of extension types being able to join
in the fun.  I think where we need to start, for v16, is making
an API design that will let any datatype have this functionality.
(I don't say that we'd convert every datatype to do so right away;
in the long run we should, but I'm content to start with just the
same core types touched here.)  Beside the JSON stuff, there is
another even more pressing application for such behavior, namely
the often-requested COPY functionality to be able to shunt bad data
off somewhere without losing the entire transfer.  In the COPY case
I think we'd want to be able to capture the error message that
would have been issued, which means the current patches are not
at all appropriate as a basis for that API design: they're just
returning a bool without any details.

So that's why I'm in favor of reverting and starting over.
There are probably big chunks of what's been done that can be
re-used, but it all needs to be re-examined with this sort of
design in mind.

As a really quick sketch of what such an API might look like:
we could invent a new node type, say IOCallContext, which is
intended to be passed as FunctionCallInfo.context to type
input functions and perhaps type conversion functions.
Call sites wishing to have no-thrown-error functionality would
initialize one of these to show "no error" and then pass it
to the data type's usual input function.  Old-style input
functions would ignore this and just throw errors as usual;
sorry, you don't get the no-error functionality you wanted.
But I/O functions that had been updated would know to store the
report of a relevant error into that node and then return NULL.
(Although I think there may be assumptions somewhere that
I/O functions don't return NULL, so maybe "just return any
dummy value" is a better idea?  Although likely it wouldn't
be hard to remove such assumptions from callers using this
functionality.)  The caller would detect the presence of an error
by examining the node contents and then do whatever it needs to do.
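
(To make that a bit more concrete, a purely illustrative sketch -- the field
names, the tag, and the helper are assumptions, not a settled design:)

/* hypothetical node passed as fcinfo->context */
typedef struct IOCallContext
{
	NodeTag		type;			/* would be a new T_IOCallContext tag */
	bool		error_occurred;	/* set instead of throwing */
	/* a fuller design could also carry an ErrorData with the details */
} IOCallContext;

/*
 * Inside an updated input function, at the point where it would otherwise
 * ereport(ERROR): report softly if the caller asked for that.
 */
static bool
soft_input_error(FunctionCallInfo fcinfo)
{
	if (fcinfo->context && IsA(fcinfo->context, IOCallContext))
	{
		((IOCallContext *) fcinfo->context)->error_occurred = true;
		return true;			/* caller then returns NULL or a dummy Datum */
	}
	return false;				/* fall through to the usual ereport(ERROR) */
}

/* Call-site sketch (COPY, JSON_VALUE, ...): */
static void
call_input_function_softly(FunctionCallInfo fcinfo)
{
	IOCallContext iocxt;

	iocxt.type = T_IOCallContext;	/* hypothetical tag */
	iocxt.error_occurred = false;
	fcinfo->context = (Node *) &iocxt;

	/* ... invoke the input function via FunctionCallInvoke(fcinfo) ... */

	if (iocxt.error_occurred)
	{
		/* handle the bad value without aborting the transaction */
	}
}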

regards, tom lane




Re: SQL/JSON features for v15

2022-08-31 Thread Robert Haas
On Wed, Aug 31, 2022 at 1:06 PM Tom Lane  wrote:
> The currently proposed patchset hacks up a relatively small number
> of core datatypes to be able to do that.  But it's just a hack
> and there's no prospect of extension types being able to join
> in the fun.  I think where we need to start, for v16, is making
> an API design that will let any datatype have this functionality.

This would be really nice to have.

> (I don't say that we'd convert every datatype to do so right away;
> in the long run we should, but I'm content to start with just the
> same core types touched here.)

I would be in favor of making more of an effort than just a few token
data types. The initial patch could just touch a few, but once the
infrastructure is in place we should really make a sweep through the
tree and tidy up.

> Beside the JSON stuff, there is
> another even more pressing application for such behavior, namely
> the often-requested COPY functionality to be able to shunt bad data
> off somewhere without losing the entire transfer.  In the COPY case
> I think we'd want to be able to capture the error message that
> would have been issued, which means the current patches are not
> at all appropriate as a basis for that API design: they're just
> returning a bool without any details.

Fully agreed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SQL/JSON features for v15

2022-08-31 Thread Tom Lane
Robert Haas  writes:
> On Wed, Aug 31, 2022 at 1:06 PM Tom Lane  wrote:
>> (I don't say that we'd convert every datatype to do so right away;
>> in the long run we should, but I'm content to start with just the
>> same core types touched here.)

> I would be in favor of making more of an effort than just a few token
> data types. The initial patch could just touch a few, but once the
> infrastructure is in place we should really make a sweep through the
> tree and tidy up.

Sure, but my point is that we can do that in a time-extended fashion
rather than having a flag day where everything must be updated.
The initial patch just needs to update a few types as proof of concept.

regards, tom lane




Re: [PATCH] Add native windows on arm64 support

2022-08-31 Thread Tom Lane
Michael Paquier  writes:
> At the end, I'd like to think that it would be wiser to just remove
> entirely the test for node() or reduce the contents of the string to
> be able to strictly control the output order (say a simple
> 'prepost' would do the trick).  I cannot think
> about a better idea now.  Note that removing the test case where we
> have node() has no impact on the coverage of xml.c.

Yeah, I confirm that: local code-coverage testing shows no change
in the number of lines reached in xml.c when I remove that column:

-SELECT * FROM XMLTABLE('*' PASSING 'pre&deeppost' COLUMNS x xml PATH 'node()', 
y xml PATH '/');
+SELECT * FROM XMLTABLE('*' PASSING 'pre&deeppost' COLUMNS y xml PATH '/');

Dropping the query altogether does result in a reduction in the
number of lines hit.

I did not check the other XMLTABLE infrastructure, so what probably
is best to do is keep the two output columns but change 'node()'
to something with a more stable result; any preferences?

regards, tom lane




Re: Add the ability to limit the amount of memory that can be allocated to backends.

2022-08-31 Thread Justin Pryzby
On Wed, Aug 31, 2022 at 12:50:19PM -0400, Reid Thompson wrote:
> Hi Hackers,
> 
> Add the ability to limit the amount of memory that can be allocated to
> backends.
> 
> This builds on the work that adds backend memory allocated to
> pg_stat_activity
> https://www.postgresql.org/message-id/67bb5c15c0489cb499723b0340f16e10c22485ec.camel%40crunchydata.com
> Both patches are attached.

You should name the patches with different prefixes, like
001, 002, 003.  Otherwise, cfbot may try to apply them in the wrong order.
git format-patch is the usual tool for that.

> +Specifies a limit to the amount of memory (MB) that may be allocated 
> to

MB are just the default unit, right ?
The user should be allowed to write max_total_backend_memory='2GB'

> +backends in total (i.e. this is not a per user or per backend limit).
> +If unset, or set to 0 it is disabled.  A backend request that would 
> push
> +the total over the limit will be denied with an out of memory error
> +causing that backends current query/transaction to fail. Due to the 
> dynamic

backend's

> +nature of memory allocations, this limit is not exact. If within 
> 1.5MB of
> +the limit and two backends request 1MB each at the same time both 
> may be
> +allocated exceeding the limit. Further requests will not be 
> allocated until

allocated, and exceed the limit

> +bool
> +exceeds_max_total_bkend_mem(uint64 allocation_request)
> +{
> + bool result = false;
> +
> + if (MyAuxProcType != NotAnAuxProcess)
> + return result;

The double negative is confusing, so could use a comment.

> + /* Convert max_total_bkend_mem to bytes for comparison */
> + if (max_total_bkend_mem &&
> + pgstat_get_all_backend_memory_allocated() +
> + allocation_request > (uint64)max_total_bkend_mem * 1024 * 1024)
> + {
> + /*
> +  * Explicitely identify the OOM being a result of this
> +  * configuration parameter vs a system failure to allocate OOM.
> +  */
> + elog(WARNING,
> +  "request will exceed postgresql.conf defined 
> max_total_backend_memory limit (%lu > %lu)",
> +  pgstat_get_all_backend_memory_allocated() +
> +  allocation_request, (uint64)max_total_bkend_mem * 1024 
> * 1024);

I think it should be ereport() rather than elog(), which is
internal-only, and not-translated.

> +   {"max_total_backend_memory", PGC_SIGHUP, RESOURCES_MEM,
> +   gettext_noop("Restrict total backend memory 
> allocations to this max."),
> +   gettext_noop("0 turns this feature off."),
> +   GUC_UNIT_MB
> +   },
> +   &max_total_bkend_mem,
> +   0, 0, INT_MAX,
> +   NULL, NULL, NULL

I think this needs a maximum like INT_MAX/1024/1024

> +uint64
> +pgstat_get_all_backend_memory_allocated(void)
> +{
...
> + for (i = 1; i <= NumBackendStatSlots; i++)
> + {

It's looping over every backend for each allocation.
Do you know if there's any performance impact of that ?

I think it may be necessary to track the current allocation size in
shared memory (with atomic increments?).  Maybe decrements would need to
be exactly accounted for, or otherwise Assert() that the value is not
negative.  I don't know how expensive it'd be to have conditionals for
each decrement, but maybe the value would only be decremented at
strategic times, like at transaction commit or backend shutdown.
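
(A rough sketch of that shared-counter idea -- the names are invented, and
this glosses over where the counter gets allocated and initialized:)

#include "postgres.h"
#include "port/atomics.h"

/* one cluster-wide total, kept in shared memory (set up at shmem init) */
static pg_atomic_uint64 *total_backend_mem;

static void
account_alloc(uint64 nbytes)
{
	pg_atomic_fetch_add_u64(total_backend_mem, nbytes);
}

static void
account_free(uint64 nbytes)
{
	uint64		old = pg_atomic_fetch_sub_u64(total_backend_mem, nbytes);

	/* we should never give back more than we accounted for */
	Assert(old >= nbytes);
}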

-- 
Justin




Re: PostgreSQL 15 release announcement draft

2022-08-31 Thread Michael Banck
Hi,

On Tue, Aug 30, 2022 at 03:58:48PM -0400, Jonathan S. Katz wrote:
> ### Other Notable Changes
> 
> PostgreSQL server-level statistics are now collected in shared memory,
> eliminating the statistics collector process and writing these stats to disk.
> PostgreSQL 15 also revokes the `CREATE` permission from all users except a
> database owner from the `public` (or default) schema.

It's a bit weird to lump those two in the same paragraph, but ok.

However, I think the "and writing these stats to disk." might not be
very clear to people not familiar with the feature; they might think
writing stats to disk is part of the new feature. So I propose "as well
as writing these stats to disk" instead or something.


Michael







Re: Tracking last scan time

2022-08-31 Thread Matthias van de Meent
On Wed, 31 Aug 2022 at 18:21, Andres Freund  wrote:
>
> Hi,
>
> On 2022-08-23 10:55:09 +0100, Dave Page wrote:
> > Often it is beneficial to review one's schema with a view to removing
> > indexes (and sometimes tables) that are no longer required. It's very
> > difficult to understand when that is the case by looking at the number of
> > scans of a relation as, for example, an index may be used infrequently but
> > may be critical in those times when it is used.
> >
> > The attached patch against HEAD adds optional tracking of the last scan
> > time for relations. It updates pg_stat_*_tables with new last_seq_scan and
> > last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to
> > help with this.
> >
> > Due to the use of gettimeofday(), those values are only maintained if a new
> > GUC, track_scans, is set to on. By default, it is off.
> >
> > I did run a 12 hour test to see what the performance impact is. pgbench was
> > run with scale factor 1 and 75 users across 4 identical bare metal
> > machines running Rocky 8 in parallel which showed roughly a -2% average
> > performance penalty against HEAD with track_scans enabled. Machines were
> > PowerEdge R7525's with 128GB RAM, dual 16C/32T AMD 7302 CPUs, with the data
> > directory on 6 x 800GB 12Gb/s SSD SAS drives in RAID 0. Kernel time source
> > is tsc.
> >
> >             HEAD          track_scans    Penalty (%)
> > box1        19582.49735   19341.8881     -1.22869541
> > box2        19936.55513   19928.07479    -0.04253664659
> > box3        19631.78895   18649.64379    -5.002830696
> > box4        19810.86767   19420.67192    -1.969604525
> > Average     19740.42728   19335.06965    -2.05343896
>
> Based on the size of those numbers this was a r/w pgbench. If it has this
> noticeable an impact for r/w, with a pretty low number of scans/sec, how's the
> overhead for r/o (which can have 2 orders of magnitude more scans/sec)? It
> must be quite bad.
>
> I don't think we should accept this feature with this overhead - but I also
> think we can do better, by accepting a bit less accuracy. For this to be
> useful we don't need a perfectly accurate timestamp. The statement start time
> is probably not accurate enough, but we could just have bgwriter or such
> update one in shared memory every time we wake up? Or perhaps we could go to
> an even lower granularity, by putting in the current LSN or such?

I don't think that LSN is precise enough. For example, if you're in a
(mostly) read-only system, the system may go for long periods without any
meaningful records being written.

As for having a lower granularity and preventing the
one-syscall-per-Relation issue, can't we reuse the query_start or
state_change timestamps that appear in pg_stat_activity (potentially
updated immediately before this stat flush), or some other per-backend
timestamp that is already maintained and considered accurate enough
for this use?
Regardless, with this patch as it is we get a new timestamp for each
relation processed, which I think is a waste of time (heh) even in
VDSO-enabled systems.

Apart from the above, I don't have any other meaningful opinion on
this patch - it might be a good addition, but I don't consume stats
often enough to make a good cost / benefit comparison.

Kind regards,

Matthias van de Meent




Re: Tracking last scan time

2022-08-31 Thread Bruce Momjian
On Wed, Aug 31, 2022 at 07:52:49PM +0200, Matthias van de Meent wrote:
> As for having a lower granularity and preventing the
> one-syscall-per-Relation issue, can't we reuse the query_start or
> state_change timestamps that appear in pg_stat_activity (potentially

Yeah, query start should be fine, but not transaction start time.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: [RFC] building postgres with meson - v12

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-31 10:28:05 +0200, Peter Eisentraut wrote:
> I found that the perl test modules are not installed.  See attached patch to
> correct this.
>
> To the patches:
>
> 4e15ee0e24 Don't hardcode tmp_check/ as test directory for tap tests
> 1a3169bc3f Split TESTDIR into TESTLOGDIR and TESTDATADIR
>
> It's a bit weird that the first patch changes the meaning of TESTDIR
> and the second patch removes it.  Maybe these patches should be
> squashed together?

Hm, to me they seem topically separate enough, but I don't have a strong
opinion on it.


> 581721fa99 meson: prereq: Add src/tools/gen_export.pl
>
> Still wondering about the whitespace changes I reported recently, but
> that can also be fine-tuned later.

I'll look into it in a bit.


> a1fb97a81b meson: Add meson based buildsystem
>
> I'm not a fan of all this business to protect the two build systems
> from each other.  I don't like the build process touching a file under
> version control every time.  How necessary is this?  What happens
> otherwise?

I added it after just about everyone trying meson hit problems caused by
conflicts between (past) in-tree configure builds and meson, because of files
left in the tree (picking up the wrong .h files; that cannot entirely be fixed
with -I arguments because of the "" includes). By adding the relevant check to
the meson configure phase, and by triggering a meson re-configure whenever an
in-tree configure build is done, these issues can be detected.

It'd of course be nicer to avoid the potential for such conflicts, but that
appears to be a huge chunk of work, see the bison/flex subthread.

So I don't really see an alternative.


> conversion_helpers.txt: should probably be removed now.

Done.


> doc/src/sgml/resolv.xsl: I don't understand what this is doing.  Maybe
> at least add a comment in the file.

It's only used for building epubs. Perhaps I should extract that into a
separate patch as well? The relevant section is:

> #
> # epub
> #
>
> # This was previously implemented using dbtoepub - but that doesn't seem to
> # support running in build != source directory (i.e. VPATH builds already
> # weren't supported).
> if pandoc.found() and xsltproc.found()
>   # XXX: Wasn't able to make pandoc successfully resolve entities
>   # XXX: Perhaps we should just make all targets use this, to avoid repeatedly
>   # building whole thing? It's comparatively fast though.
>   postgres_full_xml = custom_target('postgres-full.xml',
> input: ['resolv.xsl', 'postgres.sgml'],
> output: ['postgres-full.xml'],
> depends: doc_generated + [postgres_sgml_valid],
> command: [xsltproc, '--path', '@OUTDIR@/', xsltproc_flags,
>   '-o', '@OUTPUT@', '@INPUT@'],
> build_by_default: false,
>   )

As noted, I couldn't make pandoc resolve our entities, so I used resolv.xsl to
resolve them before calling pandoc.

I'll rename it to resolve-entities.xsl and add a comment.


> src/common/unicode/meson.build: The comment at the top of the file
> should be moved next to the files it is describing (similar to how it
> is in the makefile).

Done.


> I don't see CLDR_VERSION set anywhere.  Is that part implemented?

No, I didn't implement the generation parts of contrib/unaccent. I started
tackling the src/common/unicode bits after John Naylor asked whether that
could be done, but considered that good enough...


> src/port/win32ver.rc.in: This is redundant with src/port/win32ver.rc.
> (Note that the latter is also used as an input file for text
> substitution.  So having another file named *.in next to it would be
> super confusing.)

Yea, this stuff isn't great. I think the better solution, both for meson and
for configure, would be to move all the substitution to the C
preprocessor.


> src/tools/find_meson: Could use a brief comment what it does.

Added.


> src/tools/pgflex: Could use a not-brief comment about what it does,
> why it's needed.  Also a comment where it's used.  Also run this
> through pycodestyle.

Working on that.


> cd193eb3e8 meson: ci: Build both with meson and as before
>
> I haven't reviewed this one in detail.  Maybe add a summary in the
> commit message, like these are the new jobs, these are the changes to
> existing jobs.  It looks like there is more in there than just adding
> a few meson jobs.

I don't think we want to commit this as-is. It contains CI for a lot of
platforms - that's very useful for working on meson, but too much for
in-tree. I guess I'll split it into two, one patch for converting a reasonable
subset of the current CI tasks to meson and another to add (back) the current
array of tested platforms.


> If the above are addressed, I think this will be just about at the
> point where the above patches can be committed.

Woo!


> Everything past these patches I'm mentally postponing right now.

Makes sense.

Greetings,

Andres Freund




Re: SQL/JSON features for v15

2022-08-31 Thread Andrew Dunstan


On 2022-08-31 We 12:48, Jonathan S. Katz wrote:
>
>
> With RMT hat on -- Andrew can you please revert the patchset?


:-(


Yes, I'll do it, starting with the v15 branch. Might take a day or so.


cheers (kinda)


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: SQL/JSON features for v15

2022-08-31 Thread Bruce Momjian
On Wed, Aug 31, 2022 at 12:26:29PM -0400, Robert Haas wrote:
> someone else to make the decision. But that's not how it works. The
> RMT concept was invented precisely to solve problems like this one,
> where the patch authors don't really want to revert it but other
> people think it's pretty busted. If such problems were best addressed
> by waiting for a long time to see whether anything changes, we
> wouldn't need an RMT. That's exactly how we used to handle these kinds
> of problems, and it sucked.

I saw the RMT/Jonathan stated August 28 as the cut-off date for a
decision, which was later changed to September 1:

> The RMT is still inclined to revert, but will give folks until Sep 1 0:00
> AoE[1] to reach consensus on if SQL/JSON can be included in v15. This matches
> up to Andrew's availability timeline for a revert, and gives enough time to
> get through the buildfarm prior to the Beta 4 release[2].
 
I guess you are saying that setting a cut-off was a bad idea, or that
the cut-off was too close to the final release date.  For me, I think
there were three questions:

1.  Were subtransactions acceptable, consensus no
2.  Could trapping errors work for PG 15, consensus no
3.  Could the feature be trimmed back for PG 15 to avoid these, consensus ?

I don't think our community works well when there are three issues in
play at once.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: SQL/JSON features for v15

2022-08-31 Thread Bruce Momjian
On Wed, Aug 31, 2022 at 12:04:44PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > From my POV the only real discussion is whether we'd want to revert this in 
> > 15
> > and HEAD or just 15. There's imo a decent point to be made to just revert in
> > 15 and aggressively press forward with the changes posted in this thread.
> 
> I'm not for that.  Code that we don't think is ready to ship
> has no business being in the common tree, nor does it make
> review any easier to be looking at one bulky set of
> already-committed patches and another bulky set of deltas.

Agreed on removing from PG 15 and master --- it would be confusing to
have lots of incomplete code in master that is not in PG 15.

> I'm okay with making an exception for the include/nodes/ and
> backend/nodes/ files in HEAD, since the recent changes in that
> area mean it'd be a lot of error-prone work to produce a reverting
> patch there.  We can leave those in as dead code temporarily, I think.

I don't have an opinion on this.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: effective_multixact_freeze_max_age issue

2022-08-31 Thread Peter Geoghegan
On Tue, Aug 30, 2022 at 8:56 PM Nathan Bossart  wrote:
> LGTM

Pushed, thanks.

-- 
Peter Geoghegan




Re: SQL/JSON features for v15

2022-08-31 Thread Tom Lane
Bruce Momjian  writes:
> I guess you are saying that setting a cut-off was a bad idea, or that
> the cut-off was too close to the final release date.  For me, I think
> there were three questions:

> 1.  Were subtransactions acceptable, consensus no
> 2.  Could trapping errors work for PG 15, consensus no
> 3.  Could the feature be trimmed back for PG 15 to avoid these, consensus ?

We could probably have accomplished #3 if there was more time,
but we're out of time.  (I'm not entirely convinced that spending
effort towards #3 was productive anyway, given that we're now thinking
about a much differently-scoped patch with API changes.)

> I don't think our community works well when there are three issues in
> play at once.

To the extent that there was a management failure here, it was that
we didn't press for a resolution sooner.  Given the scale of the
concerns raised in June, I kind of agree with Andres' opinion that
fixing them post-freeze was doomed to failure.  It was definitely
doomed once we reached August with no real work done towards it.

regards, tom lane




Re: Tracking last scan time

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-31 19:52:49 +0200, Matthias van de Meent wrote:
> As for having a lower granularity and preventing the
> one-syscall-per-Relation issue, can't we reuse the query_start or
> state_change timestamps that appear in pg_stat_activity (potentially
> updated immediately before this stat flush), or some other per-backend
> timestamp that is already maintained and considered accurate enough
> for this use?

The problem is that it won't change at all for a query that runs for a week -
and we'll report the timestamp from a week ago when it finally ends.

But given this is done when stats are flushed, which only happens after the
transaction ended, we can just use GetCurrentTransactionStopTimestamp() - if
we got to flushing the transaction stats we'll already have computed that.


>   tabentry->numscans += lstats->t_counts.t_numscans;
> + if (pgstat_track_scans && lstats->t_counts.t_numscans)
> + tabentry->lastscan = GetCurrentTimestamp();

Besides replacing GetCurrentTimestamp() with
GetCurrentTransactionStopTimestamp(), this should then also check if
tabentry->lastscan is already newer than the new timestamp.
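
To make that concrete, a rough sketch of what the flush-time code could look
like with both adjustments (names taken from the quoted hunk; untested):

tabentry->numscans += lstats->t_counts.t_numscans;

if (pgstat_track_scans && lstats->t_counts.t_numscans)
{
	TimestampTz ts = GetCurrentTransactionStopTimestamp();

	/* only ever move the last-scan timestamp forwards */
	if (ts > tabentry->lastscan)
		tabentry->lastscan = ts;
}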

Greetings,

Andres Freund




Re: [PATCH] Query Jumbling for CALL and SET utility statements

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-31 11:00:05 -0700, Jeremy Schneider wrote:
> On 8/31/22 9:08 AM, Andres Freund wrote:
> > 
> > I suspect we should carve out things like CALL, PREPARE, EXECUTE from
> > track_utility - it's more or less an architectural accident that they're
> > utility statements.  It's a bit less clear that SET should be dealt with 
> > that
> > way.
> 
> Regarding SET, the compelling use case was around "application_name"
> whose purpose is to provide a label in pg_stat_activity and on log
> lines, which can be used to improve observability and connect queries to
> their source in application code.

I wasn't saying that SET shouldn't be jumbled, just that it seems more
reasonable to track it only when track_utility is enabled, rather than doing
so even when that's disabled. Tracking even with track_utility disabled does
make sense for executing a prepared statement and calling a procedure, since
those are really only utility statements by accident.


> Personally, at this point, I think pg_stat_statements is critical
> infrastructure for anyone running PostgreSQL at scale. The information
> it provides is indispensable. I don't think it's really defensible to
> tell people that if they want to scale, then they need to fly blind on
> any utility statements.

I wasn't suggesting doing so at all.

Greetings,

Andres Freund




Re: SQL/JSON features for v15

2022-08-31 Thread Andrew Dunstan


On 2022-08-31 We 14:45, Tom Lane wrote:
> To the extent that there was a management failure here, it was that
> we didn't press for a resolution sooner.  Given the scale of the
> concerns raised in June, I kind of agree with Andres' opinion that
> fixing them post-freeze was doomed to failure.  It was definitely
> doomed once we reached August with no real work done towards it.


I'm not going to comment publicly in general about this, you might
imagine what my reaction is. The decision is the RMT's to make and I
have no quarrel with that.

But I do want it understood that there was work being done right from
the time in June when Andres' complaints were published. These were
difficult issues, and we didn't let the grass grow looking for a fix. I
concede that might not have been visible until later.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-27 14:06:32 +0530, Bharath Rupireddy wrote:
> @@ -50,7 +51,7 @@ copydir(char *fromdir, char *todir, bool recurse)
>  
>   while ((xlde = ReadDir(xldir, fromdir)) != NULL)
>   {
> - struct stat fst;
> + PGFileType  xlde_type;
>  
>   /* If we got a cancel signal during the copy of the directory, 
> quit */
>   CHECK_FOR_INTERRUPTS();
> @@ -62,18 +63,15 @@ copydir(char *fromdir, char *todir, bool recurse)
>   snprintf(fromfile, sizeof(fromfile), "%s/%s", fromdir, 
> xlde->d_name);
>   snprintf(tofile, sizeof(tofile), "%s/%s", todir, xlde->d_name);
>  
> - if (lstat(fromfile, &fst) < 0)
> - ereport(ERROR,
> - (errcode_for_file_access(),
> -  errmsg("could not stat file \"%s\": 
> %m", fromfile)));
> + xlde_type = get_dirent_type(fromfile, xlde, false, ERROR);
>  
> - if (S_ISDIR(fst.st_mode))
> + if (xlde_type == PGFILETYPE_DIR)
>   {
>   /* recurse to handle subdirectories */
>   if (recurse)
>   copydir(fromfile, tofile, true);
>   }
> - else if (S_ISREG(fst.st_mode))
> + else if (xlde_type == PGFILETYPE_REG)
>   copy_file(fromfile, tofile);
>   }
>   FreeDir(xldir);

It continues to make no sense to me to add behaviour changes around
error-handling as part of a conversion to get_dirent_type(). I don't at all
understand why e.g. the above change to make copydir() silently skip over
files it can't stat is ok?

Greetings,

Andres Freund




Re: [PATCH] Add sortsupport for range types and btree_gist

2022-08-31 Thread Christoph Heiss

Hi!

Sorry for the long delay.

This fixes the crashes; now all tests run fine with and without the debug
configuration. That shouldn't have slipped through in the first place.


Notable changes from v1:
- gbt_enum_sortsupport() now passes on fcinfo->flinfo
  enum_cmp_internal() needs a place to cache the typcache entry.
- inet sortsupport now uses network_cmp() directly

Thanks,
Christoph Heiss
From 667779bc0761c1356141722181c5a54ac46d96b9 Mon Sep 17 00:00:00 2001
From: Christoph Heiss 
Date: Wed, 31 Aug 2022 19:20:43 +0200
Subject: [PATCH v2 1/1] Add sortsupport for range types and btree_gist

Incrementally building up a GiST index can result in a sub-optimal index
structure for range types.
Sorting the data before inserting it into the index results in a much
better index.

This can provide sizeable improvements in query execution time, I/O read
time and shared block hits/reads.

Signed-off-by: Christoph Heiss 
---
 contrib/btree_gist/Makefile |   3 +-
 contrib/btree_gist/btree_bit.c  |  19 
 contrib/btree_gist/btree_bool.c |  22 
 contrib/btree_gist/btree_cash.c |  22 
 contrib/btree_gist/btree_enum.c |  26 +
 contrib/btree_gist/btree_gist--1.7--1.8.sql | 110 
 contrib/btree_gist/btree_gist.control   |   2 +-
 contrib/btree_gist/btree_inet.c |  19 
 contrib/btree_gist/btree_interval.c |  19 
 contrib/btree_gist/btree_macaddr8.c |  19 
 contrib/btree_gist/btree_time.c |  19 
 src/backend/utils/adt/rangetypes_gist.c |  70 +
 src/include/catalog/pg_amproc.dat   |   3 +
 src/include/catalog/pg_proc.dat |   3 +
 14 files changed, 354 insertions(+), 2 deletions(-)
 create mode 100644 contrib/btree_gist/btree_gist--1.7--1.8.sql

diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
index 48997c75f6..4096de73ea 100644
--- a/contrib/btree_gist/Makefile
+++ b/contrib/btree_gist/Makefile
@@ -33,7 +33,8 @@ EXTENSION = btree_gist
 DATA = btree_gist--1.0--1.1.sql \
btree_gist--1.1--1.2.sql btree_gist--1.2.sql btree_gist--1.2--1.3.sql \
btree_gist--1.3--1.4.sql btree_gist--1.4--1.5.sql \
-   btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql
+   btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql \
+   btree_gist--1.7--1.8.sql
 PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes"
 
 REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz \
diff --git a/contrib/btree_gist/btree_bit.c b/contrib/btree_gist/btree_bit.c
index 5b246bcde4..bf32bfd628 100644
--- a/contrib/btree_gist/btree_bit.c
+++ b/contrib/btree_gist/btree_bit.c
@@ -7,6 +7,7 @@
 #include "btree_utils_var.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
+#include "utils/sortsupport.h"
 #include "utils/varbit.h"
 
 
@@ -19,10 +20,17 @@ PG_FUNCTION_INFO_V1(gbt_bit_picksplit);
 PG_FUNCTION_INFO_V1(gbt_bit_consistent);
 PG_FUNCTION_INFO_V1(gbt_bit_penalty);
 PG_FUNCTION_INFO_V1(gbt_bit_same);
+PG_FUNCTION_INFO_V1(gbt_bit_sortsupport);
 
 
 /* define for comparison */
 
+static int
+bit_fast_cmp(Datum x, Datum y, SortSupport ssup)
+{
+	return DatumGetInt32(DirectFunctionCall2(byteacmp, x, y));
+}
+
 static bool
 gbt_bitgt(const void *a, const void *b, Oid collation, FmgrInfo *flinfo)
 {
@@ -208,3 +216,14 @@ gbt_bit_penalty(PG_FUNCTION_ARGS)
 	PG_RETURN_POINTER(gbt_var_penalty(result, o, n, PG_GET_COLLATION(),
 	  &tinfo, fcinfo->flinfo));
 }
+
+Datum
+gbt_bit_sortsupport(PG_FUNCTION_ARGS)
+{
+	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
+
+	ssup->comparator = bit_fast_cmp;
+	ssup->ssup_extra = NULL;
+
+	PG_RETURN_VOID();
+}
diff --git a/contrib/btree_gist/btree_bool.c b/contrib/btree_gist/btree_bool.c
index 8b2af129b5..3a9f230e68 100644
--- a/contrib/btree_gist/btree_bool.c
+++ b/contrib/btree_gist/btree_bool.c
@@ -6,6 +6,7 @@
 #include "btree_gist.h"
 #include "btree_utils_num.h"
 #include "common/int.h"
+#include "utils/sortsupport.h"
 
 typedef struct boolkey
 {
@@ -23,6 +24,16 @@ PG_FUNCTION_INFO_V1(gbt_bool_picksplit);
 PG_FUNCTION_INFO_V1(gbt_bool_consistent);
 PG_FUNCTION_INFO_V1(gbt_bool_penalty);
 PG_FUNCTION_INFO_V1(gbt_bool_same);
+PG_FUNCTION_INFO_V1(gbt_bool_sortsupport);
+
+static int
+bool_fast_cmp(Datum x, Datum y, SortSupport ssup)
+{
+	bool	arg1 = DatumGetBool(x);
+	bool	arg2 = DatumGetBool(y);
+
+	return arg1 - arg2;
+}
 
 static bool
 gbt_boolgt(const void *a, const void *b, FmgrInfo *flinfo)
@@ -167,3 +178,14 @@ gbt_bool_same(PG_FUNCTION_ARGS)
 	*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
 	PG_RETURN_POINTER(result);
 }
+
+Datum
+gbt_bool_sortsupport(PG_FUNCTION_ARGS)
+{
+	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
+
+	ssup->comparator = bool_fast_cmp;
+	ssup->ssup_extra = NULL;
+
+	PG_RETURN_VOID();
+}
diff --git a/contrib/btree_gist/btree_cash.c b/contrib/btree_gist/btree_cash.c
index 6ac97e2b12..389b725130 1006

Re: Tracking last scan time

2022-08-31 Thread Bruce Momjian
On Wed, Aug 31, 2022 at 11:56:29AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-08-31 19:52:49 +0200, Matthias van de Meent wrote:
> > As for having a lower granularity and preventing the
> > one-syscall-per-Relation issue, can't we reuse the query_start or
> > state_change timestamps that appear in pg_stat_activity (potentially
> > updated immediately before this stat flush), or some other per-backend
> > timestamp that is already maintained and considered accurate enough
> > for this use?
> 
> The problem is that it won't change at all for a query that runs for a week -
> and we'll report the timestamp from a week ago when it finally ends.
> 
> But given this is done when stats are flushed, which only happens after the
> transaction ended, we can just use GetCurrentTransactionStopTimestamp() - if
> we got to flushing the transaction stats we'll already have computed that.

Oh, good point --- it is safer to show a more recent time than a too-old
time.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-31 Thread Nathan Bossart
On Wed, Aug 31, 2022 at 12:14:33PM -0700, Andres Freund wrote:
> On 2022-08-27 14:06:32 +0530, Bharath Rupireddy wrote:
>> -if (lstat(fromfile, &fst) < 0)
>> -ereport(ERROR,
>> -(errcode_for_file_access(),
>> - errmsg("could not stat file \"%s\": 
>> %m", fromfile)));
>> +xlde_type = get_dirent_type(fromfile, xlde, false, ERROR);
>>  
>> -if (S_ISDIR(fst.st_mode))
>> +if (xlde_type == PGFILETYPE_DIR)
>>  {
>>  /* recurse to handle subdirectories */
>>  if (recurse)
>>  copydir(fromfile, tofile, true);
>>  }
>> -else if (S_ISREG(fst.st_mode))
>> +else if (xlde_type == PGFILETYPE_REG)
>>  copy_file(fromfile, tofile);
>>  }
>>  FreeDir(xldir);
> 
> It continues to make no sense to me to add behaviour changes around
> error-handling as part of a conversion to get_dirent_type(). I don't at all
> understand why e.g. the above change to make copydir() silently skip over
> files it can't stat is ok?

In this example, the call to get_dirent_type() should ERROR if the call to
lstat() fails (the "elevel" argument is set to ERROR).
-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pg_stat_have_stats() returns true for dropped indexes (or for index creation transaction rolled back)

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-25 11:44:34 +0200, Drouvot, Bertrand wrote:
> For REPLSLOT, I agree that we can add one test: I added it in
> contrib/test_decoding/sql/stats.sql. It relies on pg_stat_have_stats() (as
> relying on pg_stat_replication_slots and/or pg_stat_get_replication_slot()
> would not help that much for this test given that the slot has been removed
> from ReplicationSlotCtl)

As Horiguchi-san noted, we can't rely on specific indexes being used. I feel
ok with the current coverage in that area, but if we *really* feel we need to
test it, we'd need to count the number of indexes with slots before dropping
the slot and after the drop.


> +-- pg_stat_have_stats returns true for committed index creation

Maybe another test for an uncommitted index creation would be good too?

Could you try running this test with debug_discard_caches = 1 - it's pretty
easy to write tests in this area that aren't reliable timing wise.


> +CREATE table stats_test_tab1 as select generate_series(1,10) a;
> +CREATE index stats_test_idx1 on stats_test_tab1(a);
> +SELECT oid AS dboid from pg_database where datname = current_database() \gset

Since you introduced this, maybe convert the other instance of this query at
the end of the file as well?


Greetings,

Andres Freund




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-31 12:24:28 -0700, Nathan Bossart wrote:
> On Wed, Aug 31, 2022 at 12:14:33PM -0700, Andres Freund wrote:
> > On 2022-08-27 14:06:32 +0530, Bharath Rupireddy wrote:
> > It continues to make no sense to me to add behaviour changes around
> > error-handling as part of a conversion to get_dirent_type(). I don't at all
> > understand why e.g. the above change to make copydir() silently skip over
> > files it can't stat is ok?
> 
> In this example, the call to get_dirent_type() should ERROR if the call to
> lstat() fails (the "elevel" argument is set to ERROR).

Oh, oops. Skimmed code too quickly...

Greetings,

Andres Freund




Re: First draft of the PG 15 release notes

2022-08-31 Thread Bruce Momjian
On Wed, Aug 31, 2022 at 11:38:33AM +0200, Peter Eisentraut wrote:
> On 30.08.22 22:42, Bruce Momjian wrote:
> > Good question --- the full text is:
> > 
> >   
> >
> > Record and check the collation of each database (Peter Eisentraut)
> >
> > 
> >
> > This is designed to detect collation
> > mismatches to avoid data corruption.  Function
> > pg_database_collation_actual_version()
> > reports the underlying operating system collation version, and
> > ALTER DATABASE ...  REFRESH sets the database
> > to match the operating system collation version.  DETAILS?
> >
> >   
> > 
> > I just can't figure out what the user needs to understand about this,
> > and I understand very little of it.
> 
> We already had this feature for (schema-level) collations, now we have it on
> the level of the database collation.

Okay, I figured out the interplay between OS collation version support,
collation libraries, and collation levels.  Here is an updated patch for
the release notes.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml
index e3ab13e53c..dca4550c03 100644
--- a/doc/src/sgml/release-15.sgml
+++ b/doc/src/sgml/release-15.sgml
@@ -578,17 +578,17 @@ Author: Peter Eisentraut 
 
  
   
-   Record and check the collation of each database (Peter Eisentraut)
   
 
   
-   This is designed to detect collation
+   This is designed to detect collation version
mismatches to avoid data corruption.  Function
pg_database_collation_actual_version()
reports the underlying operating system collation version, and
ALTER DATABASE ...  REFRESH sets the database
-   to match the operating system collation version.  DETAILS?
+   to match the operating system collation version.
   
  
 
@@ -605,9 +605,11 @@ Author: Peter Eisentraut 
   
 
   
-   Previously, ICU collations could only be
-   specified in CREATE
-   COLLATION and used with the
+   Previously, only libc-based
+   collations could be set at the cluster and database levels.
+   ICU collations were previously limited
+   to CREATE
+   COLLATION and referenced by the
COLLATE clause.
   
  


Re: Bug fix. autovacuum.c do_worker_start() associates memory allocations with TopMemoryContext rather than 'Autovacuum start worker (tmp)'

2022-08-31 Thread Tom Lane
Reid Thompson  writes:
> This patch ensures get_database_list() switches back to the memory
> context in use upon entry rather than returning with TopMemoryContext
> as the context.

> This will address memory allocations in autovacuum.c being associated
> with TopMemoryContext when they should not be.

> autovacuum.c do_start_worker() with current context
> 'Autovacuum start worker (tmp)' invokes get_database_list(). Upon
> return, the current context has been changed to TopMemoryContext by
> AtCommit_Memory() as part of an internal transaction. Further down
> in the do_start_worker(), pgstat_fetch_stat_dbentry() is invoked.
> Previously this didn't pose a issue, however recent changes altered
> how pgstat_fetch_stat_dbentry() is implemented. The new
> implementation has a branch utilizing palloc. The patch ensures these
> allocations are associated with the 'Autovacuum start worker (tmp)'
> context rather than the TopMemoryContext. Prior to the change,
> leaving an idle laptop PG instance running just shy of 3 days saw the
> autovacuum launcher process grow to 42MB with most of that growth in
> TopMemoryContext due to the palloc allocations issued with autovacuum
> worker startup.

Yeah, I can reproduce noticeable growth within a couple minutes
after setting autovacuum_naptime to 1s, and I confirm that the
launcher's space consumption stays static after applying this.

Even if there's only a visible leak in v15/HEAD, I'm inclined
to back-patch this all the way, because it's certainly buggy
on its own terms.  It's just the merest accident that neither
caller is leaking other stuff into TopMemoryContext, because
they both think they are using a short-lived context.
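
For the archives, the fix boils down to the usual pattern of remembering the
caller's context and switching back to it after the internal transaction.
Roughly (just a sketch, the real function obviously does more than this):

static List *
get_database_list(void)
{
	List	   *dblist = NIL;
	MemoryContext resultcxt = CurrentMemoryContext;

	StartTransactionCommand();
	/* ... scan pg_database, building dblist in resultcxt ... */
	CommitTransactionCommand();

	/*
	 * CommitTransactionCommand() has left us in TopMemoryContext, so switch
	 * back to the context we were called in before returning.
	 */
	MemoryContextSwitchTo(resultcxt);

	return dblist;
}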

Thanks for the report!

regards, tom lane




Re: Bug fix. autovacuum.c do_worker_start() associates memory allocations with TopMemoryContext rather than 'Autovacuum start worker (tmp)'

2022-08-31 Thread Andres Freund
Hi,

On 2022-08-31 16:05:03 -0400, Tom Lane wrote:
> Even if there's only a visible leak in v15/HEAD, I'm inclined
> to back-patch this all the way, because it's certainly buggy
> on its own terms.  It's just the merest accident that neither
> caller is leaking other stuff into TopMemoryContext, because
> they both think they are using a short-lived context.

+1

> Thanks for the report!

+1

Greetings,

Andres Freund




Re: SQL/JSON features for v15

2022-08-31 Thread Jonathan S. Katz

On 8/31/22 3:08 PM, Andrew Dunstan wrote:


On 2022-08-31 We 14:45, Tom Lane wrote:

To the extent that there was a management failure here, it was that
we didn't press for a resolution sooner.  Given the scale of the
concerns raised in June, I kind of agree with Andres' opinion that
fixing them post-freeze was doomed to failure.  It was definitely
doomed once we reached August with no real work done towards it.



I'm not going to comment publicly in general about this, you might
imagine what my reaction is. The decision is the RMT's to make and I
have no quarrel with that.

But I do want it understood that there was work being done right from
the time in June when Andres' complaints were published. These were
difficult issues, and we didn't let the grass grow looking for a fix. I
concede that might not have been visible until later.


June was a bit of a rough month too -- we had the issues that spawned
the out-of-cycle release at the top of the month, which started almost
right after Beta 1, and then we went almost immediately into Beta 2 after
14.4. I know that consumed a lot of my cycles. At that point in the v15
release process I was primarily focused on monitoring open items, so I
missed the June comments.


Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: SQL/JSON features for v15

2022-08-31 Thread Nikita Glukhov


On 31.08.2022 20:14, Tom Lane wrote:

Robert Haas  writes:

On Wed, Aug 31, 2022 at 1:06 PM Tom Lane  wrote:

The currently proposed patchset hacks up a relatively small number
of core datatypes to be able to do that.  But it's just a hack
and there's no prospect of extension types being able to join
in the fun.  I think where we need to start, for v16, is making
an API design that will let any datatype have this functionality.

(I don't say that we'd convert every datatype to do so right away;
in the long run we should, but I'm content to start with just the
same core types touched here.)  Beside the JSON stuff, there is
another even more pressing application for such behavior, namely
the often-requested COPY functionality to be able to shunt bad data
off somewhere without losing the entire transfer.  In the COPY case
I think we'd want to be able to capture the error message that
would have been issued, which means the current patches are not
at all appropriate as a basis for that API design: they're just
returning a bool without any details.


I would be in favor of making more of an effort than just a few token
data types. The initial patch could just touch a few, but once the
infrastructure is in place we should really make a sweep through the
tree and tidy up.

Sure, but my point is that we can do that in a time-extended fashion
rather than having a flag day where everything must be updated.
The initial patch just needs to update a few types as proof of concept.


And here is a quick POC patch with an example for COPY and float4:

=# CREATE TABLE test (i int, f float4);
CREATE TABLE

=# COPY test (f) FROM stdin WITH (null_on_error (f));
1
err
2
\.

COPY 3

=# SELECT f FROM test;
 f
---
 1
  
 2

(3 rows)

=# COPY test (i) FROM stdin WITH (null_on_error (i));
ERROR:  input function for datatype "integer" does not support error handling



PG_RETURN_ERROR() is a reincarnation of the ereport_safe() macro for returning
ErrorData, which was present in older versions (~v18) of the SQL/JSON patches.
Later it was replaced with `bool *have_error` and less magical
`if (have_error) ... else ereport(...)`.


Obviously, this needs a separate thread.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From 537e65e1ebcccdf4a760d3aa743c88611e7c Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Wed, 31 Aug 2022 22:24:33 +0300
Subject: [PATCH 1/5] Introduce PG_RETURN_ERROR and FuncCallError node

---
 src/include/nodes/execnodes.h |  7 +++
 src/include/utils/elog.h  | 25 +
 2 files changed, 32 insertions(+)

diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc09..b401d354656 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2729,4 +2729,11 @@ typedef struct LimitState
 	TupleTableSlot *last_slot;	/* slot for evaluation of ties */
 } LimitState;
 
+typedef struct FuncCallError
+{
+	NodeTag		type;
+	ErrorData  *error;			/* place where function should put
+ * error data instead of throwing it */
+} FuncCallError;
+
 #endif			/* EXECNODES_H */
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 56398176901..5fd3deed61f 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -159,6 +159,31 @@
 #define ereport(elevel, ...)	\
 	ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__)
 
+/*
+ * PG_RETURN_ERROR() -- special macro for copying error info into the
+ * FuncCallError node, if it was as FunctionCallInfo.context,
+ * instead of throwing it with ordinary ereport().
+ *
+ * This is intended for handling of errors of categories like
+ * ERRCODE_DATA_EXCEPTION without PG_TRY/PG_CATCH and substransactions,
+ * but not for errors like ERRCODE_OUT_OF_MEMORY.
+ */
+#define PG_RETURN_ERROR(...) \
+	do { \
+		if (fcinfo->context && IsA(fcinfo->context, FuncCallError)) { \
+			pg_prevent_errno_in_scope(); \
+			\
+			errstart(ERROR, TEXTDOMAIN); \
+			(__VA_ARGS__); \
+			castNode(FuncCallError, fcinfo->context)->error = CopyErrorData(); \
+			FlushErrorState(); \
+			\
+			PG_RETURN_NULL(); \
+		} else { \
+			ereport(ERROR, (__VA_ARGS__)); \
+		} \
+	} while (0)
+
 #define TEXTDOMAIN NULL
 
 extern bool message_level_is_interesting(int elevel);
-- 
2.25.1

From 7831b2c571784a614dd4ee4b48730187b0c0a8d1 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Wed, 31 Aug 2022 23:02:06 +0300
Subject: [PATCH 2/5] Introduce pg_proc.proissafe and func_is_safe()

---
 src/backend/utils/cache/lsyscache.c | 19 +++
 src/include/catalog/pg_proc.h   |  3 +++
 src/include/utils/lsyscache.h   |  1 +
 3 files changed, 23 insertions(+)

diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index a16a63f4957..ff3005077b2 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -1830,6 +1830,25 @@ get_func_leakproof(Oid funcid)
 	return result;
 }
 
+/*
+ * func_is_safe
+ 

Re: Reducing the chunk header sizes on all memory context types

2022-08-31 Thread Tomas Vondra


On 8/31/22 00:40, David Rowley wrote:
> On Wed, 31 Aug 2022 at 02:17, Tom Lane  wrote:
>>
>> I wrote:
>>> So maybe we should revisit the question.  It'd be worth collecting some
>>> stats about how much extra space would be needed if we force there
>>> to be room for a sentinel.
>>
>> Actually, after ingesting more caffeine, the problem with this for aset.c
>> is that the only way to add space for a sentinel that didn't fit already
>> is to double the space allocation.  That's a little daunting, especially
>> remembering how many places deliberately allocate power-of-2-sized
>> arrays.
> 
> I decided to try and quantify that by logging the size, MAXALIGN(size)
> and the power of 2 size during AllocSetAlloc and GenerationAlloc. I
> made the pow2_size 0 in GenerationAlloc and in AlloocSetAlloc when
> size > allocChunkLimit.
> 
> After running make installcheck, grabbing the records out of the log and
> loading them into Postgres, I see that if we did double the pow2_size
> when there's no space for the sentinel byte then we'd go from
> allocating a total of 10.2GB all the way to 16.4GB (!) of
> non-dedicated block aset.c allocations.
> 
> select
> round(sum(pow2_Size)::numeric/1024/1024/1024,3) as pow2_size,
> round(sum(case when maxalign_size=pow2_size then pow2_size*2 else
> pow2_size end)::numeric/1024/1024/1024,3) as method1,
> round(sum(case when maxalign_size=pow2_size then pow2_size+8 else
> pow2_size end)::numeric/1024/1024/1024,3) as method2
> from memstats
> where pow2_size > 0;
>  pow2_size | method1 | method2
> ---+-+-
> 10.194 |  16.382 |  10.463
> 
> if we did just add on an extra 8 bytes (or MAXALIGN(size+1) at
> least), then that would take the size up to 10.5GB.
> 

I've been experimenting with this a bit too, and my results are similar,
but not exactly the same. I've logged all Alloc/Realloc calls for the
two memory contexts, and when I aggregated the results I get this:

 f               |  size | pow2(size) | pow2(size+1)
-----------------+-------+------------+--------------
 AllocSetAlloc   | 23528 |      28778 |        31504
 AllocSetRealloc |   761 |        824 |         1421
 GenerationAlloc |    68 |         90 |          102

So the raw size (what we asked for) is ~23.5GB, but in practice we
allocate ~28.8GB because of the pow-of-2 logic. And by adding the extra
1B we end up allocating 31.5GB. That doesn't seem like a huge increase,
and it's far from the +60% you got.

I wonder where the difference comes from - I did make installcheck too,
so how come you get 10/16GB and I get 28/31GB? My patch is attached,
maybe I did something silly.

I also did a quick hack to see if always having the sentinel detects any
pre-existing issues, but it didn't detect any. I guess valgrind would find
those, but I'm not sure?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index b6eeb8abab..e6d86a54d7 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -674,6 +674,15 @@ AllocSetDelete(MemoryContext context)
 	free(set);
 }
 
+static Size pow2_size(Size size)
+{
+	Size s = 1;
+	while (s < size)
+		s *= 2;
+
+	return s;
+}
+
 /*
  * AllocSetAlloc
  *		Returns pointer to allocated memory of given size or NULL if
@@ -699,13 +708,16 @@ AllocSetAlloc(MemoryContext context, Size size)
 
 	AssertArg(AllocSetIsValid(set));
 
+	fprintf(stderr, "AllocSetAlloc %ld %ld %ld\n", size, pow2_size(size), pow2_size(size+1));
+	fflush(stderr);
+
 	/*
 	 * If requested size exceeds maximum for chunks, allocate an entire block
 	 * for this request.
 	 */
-	if (size > set->allocChunkLimit)
+	if (size + 1 > set->allocChunkLimit)
 	{
-		chunk_size = MAXALIGN(size);
+		chunk_size = MAXALIGN(size + 1);
 		blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
 		block = (AllocBlock) malloc(blksize);
 		if (block == NULL)
@@ -725,7 +737,16 @@ AllocSetAlloc(MemoryContext context, Size size)
 		chunk->requested_size = size;
 		/* set mark to catch clobber of "unused" space */
 		if (size < chunk_size)
+		{
 			set_sentinel(MemoryChunkGetPointer(chunk), size);
+			fprintf(stderr, "sentinel added\n");
+			fflush(stderr);
+		}
+		else
+		{
+			fprintf(stderr, "A sentinel not added\n");
+			fflush(stderr);
+		}
 #endif
 #ifdef RANDOMIZE_ALLOCATED_MEMORY
 		/* fill the allocated space with junk */
@@ -767,7 +788,7 @@ AllocSetAlloc(MemoryContext context, Size size)
 	 * If one is found, remove it from the free list, make it again a member
 	 * of the alloc set and return its data address.
 	 */
-	fidx = AllocSetFreeIndex(size);
+	fidx = AllocSetFreeIndex(size+1);
 	chunk = set->freelist[fidx];
 	if (chunk != NULL)
 	{
@@ -784,7 +805,16 @@ AllocSetAlloc(MemoryContext context, Size size)
 		chunk->requested_size = size;
 		/* set mark to catch clobber of "unused" space */
 		if (size < GetChunkSizeFromFreeListIdx(fidx))
+		{
 			set_senti

Re: Trivial doc patch

2022-08-31 Thread Bruce Momjian
On Fri, Aug 19, 2022 at 10:42:45AM -0400, Bruce Momjian wrote:
> > Given that 'optional, optional' has no independent meaning from
> > 'optional';  it requires one to scan the entire set looking for
> > the non-optional embedded in the option.  So no gain.
> 
> I originally had the same reaction Michael Paquier did, that having one
> big optional block is nice, but seeing that 'CREATE DATABASE name WITH'
> actually works, I can see the point in having our syntax be accurate,
> and removing the outer optional brackets now does seem like an
> improvement to me.

Backpatched to PG 10.  Thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Reducing the chunk header sizes on all memory context types

2022-08-31 Thread David Rowley
On Thu, 1 Sept 2022 at 08:53, Tomas Vondra
 wrote:
> So the raw size (what we asked for) is ~23.5GB, but in practice we
> allocate ~28.8GB because of the pow-of-2 logic. And by adding the extra
> 1B we end up allocating 31.5GB. That doesn't seem like a huge increase,
> and it's far from the +60% you got.
>
> I wonder where the difference comes from - I did make installcheck too,
> so how come you get 10/16GB and I get 28/31GB? My patch is attached,
> maybe I did something silly.

The reason my reported results were lower is that I ignored size >
allocChunkLimit allocations. These are not raised to the next power of
2, so I didn't think they should be included.

I'm not sure why you're seeing only a 3GB additional overhead. I
noticed a logic error in my query where I was checking
maxaligned_size=pow2_size and doubling that to give sentinel space.
That really should have been "case size=pow2_size then pow2_size * 2
else pow2_size end". However, after adjusting the query, it does not
seem to change the results much:

postgres=# select
postgres-# round(sum(pow2_Size)::numeric/1024/1024/1024,3) as pow2_size,
postgres-# round(sum(case when size=pow2_size then pow2_size*2 else
pow2_size end)::numeric/1024/1024/1024,3) as method1,
postgres-# round(sum(case when size=pow2_size then pow2_size+8 else
pow2_size end)::numeric/1024/1024/1024,3) as method2
postgres-# from memstats
postgres-# where pow2_size > 0;
 pow2_size | method1 | method2
-----------+---------+---------
    10.269 |  16.322 |  10.476

I've attached the crude patch I came up with for this.  For some
reason it was crashing on Linux, but it ran ok on Windows, so I used
the results from that instead.  Maybe that accounts for some
differences, e.g. sizeof(long) == 4 on 64-bit Windows. I'd be
surprised if that accounted for so many GBs though.

I also forgot to add code to GenerationRealloc and AllocSetRealloc.

David
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index b6eeb8abab..f4977f9bcc 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -46,6 +46,7 @@
 
 #include "postgres.h"
 
+#include "miscadmin.h"
 #include "port/pg_bitutils.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
@@ -696,9 +697,22 @@ AllocSetAlloc(MemoryContext context, Size size)
int fidx;
Sizechunk_size;
Sizeblksize;
+   static int  rlevel = 1;
 
AssertArg(AllocSetIsValid(set));
 
+   if (rlevel == 1 && GetProcessingMode() == NormalProcessing)
+   {
+   rlevel++;
+   elog(LOG, "AllocSetAlloc,%s,%zu,%zu,%zu,%zu",
+context->name,
+size,
+MAXALIGN(size),
+size > set->allocChunkLimit ? 0 : 
GetChunkSizeFromFreeListIdx(AllocSetFreeIndex(size)),
+set->allocChunkLimit);
+   rlevel--;
+   }
+
/*
 * If requested size exceeds maximum for chunks, allocate an entire 
block
 * for this request.
diff --git a/src/backend/utils/mmgr/generation.c 
b/src/backend/utils/mmgr/generation.c
index b39894ec94..9c5ff3c095 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -36,6 +36,7 @@
 #include "postgres.h"
 
 #include "lib/ilist.h"
+#include "miscadmin.h"
 #include "port/pg_bitutils.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
@@ -344,6 +345,18 @@ GenerationAlloc(MemoryContext context, Size size)
MemoryChunk *chunk;
Sizechunk_size = MAXALIGN(size);
Sizerequired_size = chunk_size + Generation_CHUNKHDRSZ;
+   static int  rlevel = 1;
+
+   if (rlevel == 1 && GetProcessingMode() == NormalProcessing)
+   {
+   rlevel++;
+   elog(LOG, "GenerationAlloc,%s,%zu,%zu,0,%zu",
+context->name,
+size,
+MAXALIGN(size),
+set->allocChunkLimit);
+   rlevel--;
+   }
 
/* is it an over-sized chunk? if yes, allocate special block */
if (chunk_size > set->allocChunkLimit)


Re: New strategies for freezing, advancing relfrozenxid early

2022-08-31 Thread Peter Geoghegan
On Thu, Aug 25, 2022 at 2:21 PM Peter Geoghegan  wrote:
> Attached patch series is a completely overhauled version of earlier
> work on freezing. Related work from the Postgres 15 cycle became
> commits 0b018fab, f3c15cbe, and 44fa8488.

Attached is v2.

This is just to keep CFTester happy, since v1 now has conflicts when
applied against HEAD. There are no notable changes in this v2 compared
to v1.

-- 
Peter Geoghegan


v2-0001-Add-page-level-freezing-to-VACUUM.patch
Description: Binary data


v2-0002-Teach-VACUUM-to-use-visibility-map-snapshot.patch
Description: Binary data


v2-0003-Add-eager-freezing-strategy-to-VACUUM.patch
Description: Binary data


v2-0004-Avoid-allocating-MultiXacts-during-VACUUM.patch
Description: Binary data


Solaris "sed" versus pre-v13 plpython tests

2022-08-31 Thread Tom Lane
I noticed that buildfarm member margay, which just recently started
running tests on REL_12_STABLE, is failing the plpython tests in that
branch [1], though it's happy in v13 and later.  The failures appear
due to syntax errors in Python "except" statements, and it's visible
in some of the tests that we are failing to convert ancient Python
"except" syntax to what Python 3 wants:

diff -U3 
/home/marcel/build-farm-14/buildroot/REL_12_STABLE/pgsql.build/src/pl/plpython/expected/python3/plpython_types_3.out
 
/home/marcel/build-farm-14/buildroot/REL_12_STABLE/pgsql.build/src/pl/plpython/results/python3/plpython_types.out
--- 
/home/marcel/build-farm-14/buildroot/REL_12_STABLE/pgsql.build/src/pl/plpython/expected/python3/plpython_types_3.out
2022-08-31 22:29:51.070597370 +0200
+++ 
/home/marcel/build-farm-14/buildroot/REL_12_STABLE/pgsql.build/src/pl/plpython/results/python3/plpython_types.out
   2022-08-31 22:29:53.053544840 +0200
@@ -400,15 +400,16 @@
 import marshal
 try:
 return marshal.loads(x)
-except ValueError as e:
+except ValueError, e:
 return 'FAILED: ' + str(e)
 $$ LANGUAGE plpython3u;
+ERROR:  could not compile PL/Python function "test_type_unmarshal"
+DETAIL:  SyntaxError: invalid syntax (<string>, line 6)

So it would seem that regress-python3-mangle.mk's sed command to
perform this transformation isn't working on margay's sed.

We've had to hack regress-python3-mangle.mk for Solaris "sed" before
(cf commit c3556f6fa).  This seems to be another instance of that
same crummy implementation of '*' patterns.  I suppose Noah missed
this angle at the time because the problematic pattern had already
been removed in v13 and up (45223fd9c).  But it's still there in v12.

I am not completely sure why buildfarm member wrasse isn't failing
similarly, but a likely theory is that Noah has got some other sed
in his search path there.

I confirmed on the gcc farm's Solaris 11 box that the pattern
doesn't work as expected with /usr/bin/sed:

tgl@gcc-solaris11:~$ echo except foo, bar: | sed -e 's/except 
\([[:alpha:]][[:alpha:].]*\), *\([[:alpha:]][[:alpha:]]*\):/except \1 as \2:/g'
except foo, bar:

We could perhaps do this instead:

$ echo except foo, bar: | sed -e '/^ *except.*,.*: *$/s/, / as /g'
except foo as bar:

It's a little bit more brittle, but Python doesn't allow any other
commands on the same line, does it?

Another idea is to try to find some other sed to use.  I see that
configure already does that on most platforms, because ax_pthread.m4
has "AC_REQUIRE([AC_PROG_SED])".  But if there's still someone
out there using --disable-thread-safety, they might think this is
an annoying movement of the build-requirements goalposts.

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2022-08-31%2020%3A00%3A05




Re: cataloguing NOT NULL constraints

2022-08-31 Thread Zhihong Yu
On Wed, Aug 31, 2022 at 3:19 PM Alvaro Herrera 
wrote:

> So I was wrong in thinking that "this case was simple to implement" as I
> replied upthread.  Doing that actually required me to rewrite large
> parts of the patch.  I think it ended up being a good thing, because in
> hindsight the approach I was using was somewhat bogus anyway, and the
> current one should be better.  Please find it attached.
>
> There are still a few problems, sadly.  Most notably, I ran out of time
> trying to fix a pg_upgrade issue with pg_dump in binary-upgrade mode.
> I have to review that again, but I think it'll need a deeper rethink of
> how we pg_upgrade inherited constraints.  So the pg_upgrade tests are
> known to fail.  I'm not aware of any other tests failing, but I'm sure
> the cfbot will prove me wrong.
>
> I reluctantly added a new ALTER TABLE subcommand type, AT_SetAttNotNull,
> to allow setting pg_attribute.attnotnull without adding a CHECK
> constraint (only used internally).  I would like to find a better way to
> go about this, so I may remove it again, therefore it's not fully
> implemented.
>
> There are *many* changed regress expect files and I didn't carefully vet
> all of them.  Mostly it's the addition of CHECK constraints in the
> footers of many \d listings and stuff like that.  At a quick glance they
> appear valid, but I need to review them more carefully still.
>
> We've had pg_constraint.conparentid for a while now, but for some
> constraints we continue to use conislocal/coninhcount.  I think we
> should get rid of that and rely on conparentid completely.
>
> An easily fixed issue is that of constraint naming.
> ChooseConstraintName has an argument for passing known constraint names,
> but this patch doesn't use it and it must.
>
> One issue that I don't currently know how to fix, is the fact that we
> need to know whether a column is a row type or not (because they need a
> different null test).  At table creation time that's easy to know,
> because we have the descriptor already built by the time we add the
> constraints; but if you do ALTER TABLE .. ADD COLUMN .., ADD CONSTRAINT
> then we don't.
>
> Some ancient code comments suggest that allowing a child table's NOT
> NULL constraint acquired from parent shouldn't be independently
> droppable.  This patch doesn't change that, but it's easy to do if we
> decide to.  However, that'd be a compatibility break, so I'd rather not
> do it in the same patch that introduces the feature.
>
> Overall, there's a lot more work required to get this to a good shape.
> That said, I think it's the right direction.
>
> --
> Álvaro Herrera   48°01'N 7°57'E  —
> https://www.EnterpriseDB.com/
> "La primera ley de las demostraciones en vivo es: no trate de usar el
> sistema.
> Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
>

Hi,
For findNotNullConstraintAttnum():

+   if (multiple == NULL)
+   break;

Shouldn't `pfree(arr)` be called before breaking ?

+static Constraint *makeNNCheckConstraint(Oid nspid, char *constraint_name,

You used `NN` because there is a method makeCheckNotNullConstraint, right?
I think it would be better to expand `NN` so that its meaning is easy to
understand.

Cheers


Re: cataloguing NOT NULL constraints

2022-08-31 Thread Zhihong Yu
On Wed, Aug 31, 2022 at 4:08 PM Zhihong Yu  wrote:

>
>
> On Wed, Aug 31, 2022 at 3:19 PM Alvaro Herrera 
> wrote:
>
>> So I was wrong in thinking that "this case was simple to implement" as I
>> replied upthread.  Doing that actually required me to rewrite large
>> parts of the patch.  I think it ended up being a good thing, because in
>> hindsight the approach I was using was somewhat bogus anyway, and the
>> current one should be better.  Please find it attached.
>>
>> There are still a few problems, sadly.  Most notably, I ran out of time
>> trying to fix a pg_upgrade issue with pg_dump in binary-upgrade mode.
>> I have to review that again, but I think it'll need a deeper rethink of
>> how we pg_upgrade inherited constraints.  So the pg_upgrade tests are
>> known to fail.  I'm not aware of any other tests failing, but I'm sure
>> the cfbot will prove me wrong.
>>
>> I reluctantly added a new ALTER TABLE subcommand type, AT_SetAttNotNull,
>> to allow setting pg_attribute.attnotnull without adding a CHECK
>> constraint (only used internally).  I would like to find a better way to
>> go about this, so I may remove it again, therefore it's not fully
>> implemented.
>>
>> There are *many* changed regress expect files and I didn't carefully vet
>> all of them.  Mostly it's the addition of CHECK constraints in the
>> footers of many \d listings and stuff like that.  At a quick glance they
>> appear valid, but I need to review them more carefully still.
>>
>> We've had pg_constraint.conparentid for a while now, but for some
>> constraints we continue to use conislocal/coninhcount.  I think we
>> should get rid of that and rely on conparentid completely.
>>
>> An easily fixed issue is that of constraint naming.
>> ChooseConstraintName has an argument for passing known constraint names,
>> but this patch doesn't use it and it must.
>>
>> One issue that I don't currently know how to fix, is the fact that we
>> need to know whether a column is a row type or not (because they need a
>> different null test).  At table creation time that's easy to know,
>> because we have the descriptor already built by the time we add the
>> constraints; but if you do ALTER TABLE .. ADD COLUMN .., ADD CONSTRAINT
>> then we don't.
>>
>> Some ancient code comments suggest that allowing a child table's NOT
>> NULL constraint acquired from parent shouldn't be independently
>> droppable.  This patch doesn't change that, but it's easy to do if we
>> decide to.  However, that'd be a compatibility break, so I'd rather not
>> do it in the same patch that introduces the feature.
>>
>> Overall, there's a lot more work required to get this to a good shape.
>> That said, I think it's the right direction.
>>
>> --
>> Álvaro Herrera   48°01'N 7°57'E  —
>> https://www.EnterpriseDB.com/
>> "La primera ley de las demostraciones en vivo es: no trate de usar el
>> sistema.
>> Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
>>
>
> Hi,
> For findNotNullConstraintAttnum():
>
> +   if (multiple == NULL)
> +   break;
>
> Shouldn't `pfree(arr)` be called before breaking ?
>
> +static Constraint *makeNNCheckConstraint(Oid nspid, char *constraint_name,
>
> You used `NN` because there is a method makeCheckNotNullConstraint, right?
> I think it would be better to expand `NN` so that its meaning is easy to
> understand.
>
> Cheers
>
Hi,
For tryExtractNotNullFromNode, in the block for `if (rel == NULL)`:

+   return false;

I think you meant to return NULL, since false is a boolean value.

Cheers


Re: Support tls-exporter as channel binding for TLSv1.3

2022-08-31 Thread Jacob Champion
On Sun, Aug 28, 2022 at 11:02 PM Michael Paquier  wrote:
> RFC9266, that has been released not so long ago, has added
> tls-exporter as a new channel binding type:
> https://www.rfc-editor.org/rfc/rfc5929.html

Hi Michael, thank you for sending this!

> Note also that tls-exporter is aimed for
> TLSv1.3 and newer protocols, but OpenSSL allows the thing to work with
> older protocols (testable with ssl_max_protocol_version, for example),
> and I don't see a need to prevent this scenario.

For protocols less than 1.3 we'll need to ensure that the extended
master secret is in use:

   This channel binding mechanism is defined only when the TLS handshake
   results in unique master secrets.  This is true of TLS versions prior
   to 1.3 when the extended master secret extension of [RFC7627] is in
   use, and it is always true for TLS 1.3 (see Appendix D of [RFC8446]).

OpenSSL should have an API for that (SSL_get_extms_support); I don't
know when it was introduced.
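
Just to sketch the kind of gate I have in mind (untested, the function name is
made up, and it assumes SSL_get_extms_support() and TLS1_3_VERSION are
available in the OpenSSL version we build against):

static bool
tls_exporter_usable(SSL *ssl)
{
	/* TLS 1.3 and newer always produce unique master secrets */
	if (SSL_version(ssl) >= TLS1_3_VERSION)
		return true;

	/* older protocols are only safe with the extended master secret */
	return SSL_get_extms_support(ssl) == 1;
}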

If we want to cross all our T's, we should also disallow tls-exporter
if the server was unable to set SSL_OP_NO_RENEGOTIATION.

> An extra thing is
> that attempting to use tls-exporter with a backend <= 15 and a client
> >= 16 causes a failure during the SASL exchange, where the backend
> complains about tls-exporter being unsupported.

Yep.

--

Did you have any thoughts about contributing the Python tests (or
porting them to Perl, bleh) so that we could test failure modes as
well? Unfortunately those Python tests were also OpenSSL-based, so
they're less powerful than an independent implementation...

Thanks,
--Jacob




Re: Small cleanups to tuplesort.c and a bonus small performance improvement

2022-08-31 Thread David Rowley
On Wed, 31 Aug 2022 at 22:39, David Rowley  wrote:
> My current thoughts are that this is a very trivial patch and unless
> there's any objections I plan to push it soon.

Pushed.

David




Re: Doc patch

2022-08-31 Thread Bruce Momjian
On Fri, Aug 19, 2022 at 12:04:54PM -0400, Bruce Momjian wrote:
> You are right about the above two changes.  The existing syntax shows
> FROM/IN is only possible if a direction is specified, while
> src/parser/gram.y says that FROM/IN with no direction is supported.
> 
> I plan to apply this second part of the patch soon.

Patch applied back to PG 10.  Thanks for the research on this.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Transparent column encryption

2022-08-31 Thread Jacob Champion
On Tue, Aug 30, 2022 at 4:53 AM Peter Eisentraut
 wrote:
> I would be interested in learning more about such padding systems.  I
> have done a lot of reading for this development project, and I have
> never come across a cryptographic approach to hide length differences by
> padding.  Of course, padding to the block cipher's block size is already
> part of the process, but that is done out of necessity, not because you
> want to disguise the length.  Are there any other methods?  I'm
> interested to learn more.

TLS 1.3 has one example. Here is a description from GnuTLS:
https://gnutls.org/manual/html_node/On-Record-Padding.html (Note the
option to turn on constant-time padding; that may not be a good
tradeoff for us if we're focusing on offline attacks.)
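
As a toy illustration of one such scheme (my own sketch, nothing from the
patch or from any particular library): pad the plaintext up to a fixed bucket
boundary before encrypting, so ciphertext lengths only reveal which bucket a
value falls into, not its exact length.

#include <stddef.h>
#include <string.h>

#define PAD_BUCKET 64           /* assumed bucket size; purely a tunable trade-off */

/*
 * Pad buf[0..len) up to the next bucket boundary, PKCS#7-style, so the
 * padding can be stripped again after decryption.  Returns the padded
 * length, or 0 if the caller did not leave enough head-room in buf.
 */
static size_t
pad_to_bucket(unsigned char *buf, size_t len, size_t bufsize)
{
    size_t padded = ((len / PAD_BUCKET) + 1) * PAD_BUCKET;  /* always adds >= 1 byte */

    if (padded > bufsize)
        return 0;

    memset(buf + len, (int) (padded - len), padded - len);
    return padded;
}

Bigger buckets hide more but waste more space, which is the usual trade-off
with this kind of padding.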

Here's a recent paper that claims to formally characterize length
hiding, but it's behind a wall and I haven't read it:
https://dl.acm.org/doi/abs/10.1145/3460120.3484590

I'll try to find more when I get the chance.

--Jacob




Re: Inconsistent error message for varchar(n)

2022-08-31 Thread Bruce Momjian
On Tue, Aug 16, 2022 at 09:56:17PM -0400, Bruce Momjian wrote:
> On Sun, Nov 14, 2021 at 10:33:19AM +0800, Japin Li wrote:
> > Oh! I didn't consider this situation.  The max size of varchar cannot
> > exceed 10485760; however, I cannot find this in the documentation [1]. Is there
> > something I missed? Should we mention this in the documentation?
> > 
> > [1] https://www.postgresql.org/docs/devel/datatype-character.html
> 
> Sorry for my long delay in reviewing this issue.  You are correct this
> should be documented --- patch attached.

Patch applied back to PG 10.  Thanks for the report.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Thomas Munro
On Thu, Sep 1, 2022 at 2:01 AM Justin Pryzby  wrote:
> < 2022-08-31 08:44:10.495 CDT  >LOG:  checkpoint starting: end-of-recovery 
> immediate wait
> < 2022-08-31 08:44:10.609 CDT  >LOG:  request to flush past end of generated 
> WAL; request 1201/1CAF84F0, current position 1201/1CADB730
> < 2022-08-31 08:44:10.609 CDT  >CONTEXT:  writing block 0 of relation 
> base/16881/2840_vm
> < 2022-08-31 08:44:10.609 CDT  >ERROR:  xlog flush request 1201/1CAF84F0 is 
> not satisfied --- flushed only to 1201/1CADB730
> < 2022-08-31 08:44:10.609 CDT  >CONTEXT:  writing block 0 of relation 
> base/16881/2840_vm
> < 2022-08-31 08:44:10.609 CDT  >FATAL:  checkpoint request failed
>
> I was able to start it with -c recovery_prefetch=no, so it seems like
> prefetch tried to do too much.  The VM runs centos7 under qemu.
> I'm making a copy of the data dir in case it's needed.

Hmm, a page with an LSN set 118208 bytes past the end of WAL.  It's a
vm fork page (which recovery prefetch should ignore completely).  Did
you happen to get a copy before the successful recovery?  After the
successful recovery, what LSN does that page have, and can you find
the references to it in the WAL with eg pg_waldump -R 1663/16681/2840
-F vm?  Have you turned FPW off (perhaps this is on ZFS?)?




Re: Reducing the chunk header sizes on all memory context types

2022-08-31 Thread Tomas Vondra


On 8/31/22 23:46, David Rowley wrote:
> On Thu, 1 Sept 2022 at 08:53, Tomas Vondra
>  wrote:
>> So the raw size (what we asked for) is ~23.5GB, but in practice we
>> allocate ~28.8GB because of the pow-of-2 logic. And by adding the extra
>> 1B we end up allocating 31.5GB. That doesn't seem like a huge increase,
>> and it's far from the +60% you got.
>>
>> I wonder where the difference comes from - I did make installcheck too,
>> so how come you get 10/16GB, and I get 28/31GB? My patch is attached,
>> maybe I did something silly.
> 
> The reason my reported results were lower is because I ignored size >
> allocChunkLimit allocations. These are not raised to the next power of
> 2, so I didn't think they should be included.

If I differentiate the large chunks allocated separately (v2 patch
attached), I get this:

        f        |    t     |  count   |  s1   |  s2   |  s3
-----------------+----------+----------+-------+-------+-------
 AllocSetAlloc   | normal   | 60714914 |  4982 |  6288 |  8185
 AllocSetAlloc   | separate |   824390 | 18245 | 18245 | 18251
 AllocSetRelloc  | normal   |   182070 |   763 |   826 |  1423
 GenerationAlloc | normal   |  2118115 |    68 |    90 |   102
 GenerationAlloc | separate |       28 |     0 |     0 |     0
(5 rows)

Where s1 is the sum of requested sizes, s2 is the sum of allocated
chunks, and s3 is chunks allocated with 1B sentinel.

Focusing on the aset, the vast majority of allocations (60M out of 64M) are
small enough to use the power-of-2 logic, and we go from 6.3GB to 8.2GB, so
~30%. Not great, not terrible.
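
As a toy illustration of where that jump comes from (same rounding as the
pow2_size() helper in the attached patch, nothing more): a request that
already sits exactly on a power-of-two boundary doubles as soon as the extra
sentinel byte is added, while other sizes are unaffected.

#include <stdio.h>

static size_t
pow2_round(size_t size)
{
    size_t s = 1;

    while (s < size)
        s *= 2;
    return s;
}

int
main(void)
{
    /* 64-byte request: 64 -> 128 once we ask for 65 bytes */
    printf("%zu -> %zu\n", pow2_round(64), pow2_round(64 + 1));
    /* 100-byte request: 128 either way */
    printf("%zu -> %zu\n", pow2_round(100), pow2_round(100 + 1));
    return 0;
}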

For the large allocations, there's almost no increase - in the last
query I used the power-of-2 logic in the calculations, but that was
incorrect, of course.


> 
> I'm not sure why you're seeing only a 3GB additional overhead. I
> noticed a logic error in my query where I was checking
> maxaligned_size=pow2_size and doubling that to give sentinel space.
> That really should have been "case size=pow2_size then pow2_size * 2
> else pow2_size end", However, after adjusting the query, it does not
> seem to change the results much:
> 
> postgres=# select
> postgres-# round(sum(pow2_Size)::numeric/1024/1024/1024,3) as pow2_size,
> postgres-# round(sum(case when size=pow2_size then pow2_size*2 else
> pow2_size end)::numeric/1024/1024/1024,3) as method1,
> postgres-# round(sum(case when size=pow2_size then pow2_size+8 else
> pow2_size end)::numeric/1024/1024/1024,3) as method2
> postgres-# from memstats
> postgres-# where pow2_size > 0;
>  pow2_size | method1 | method2
> ---+-+-
> 10.269 |  16.322 |  10.476
> 
> I've attached the crude patch I came up with for this.  For some
> reason it was crashing on Linux, but it ran ok on Windows, so I used
> the results from that instead.  Maybe that accounts for some
> differences as e.g sizeof(long) == 4 on 64-bit windows. I'd be
> surprised if that accounted for so many GBs though.
> 

I tried to use that patch, but "make installcheck" never completes for
me, for some reason. It seems to get stuck in infinite_recurse.sql, but
I haven't looked into the details.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index b6eeb8abab..b215940062 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -674,6 +674,15 @@ AllocSetDelete(MemoryContext context)
 	free(set);
 }
 
+static Size pow2_size(Size size)
+{
+	Size s = 1;
+	while (s < size)
+		s *= 2;
+
+	return s;
+}
+
 /*
  * AllocSetAlloc
  *		Returns pointer to allocated memory of given size or NULL if
@@ -703,9 +712,12 @@ AllocSetAlloc(MemoryContext context, Size size)
 	 * If requested size exceeds maximum for chunks, allocate an entire block
 	 * for this request.
 	 */
-	if (size > set->allocChunkLimit)
+	if (size + 1 > set->allocChunkLimit)
 	{
-		chunk_size = MAXALIGN(size);
+		fprintf(stderr, "AllocSetAlloc separate %ld %ld %ld\n", size, MAXALIGN(size), MAXALIGN(size+1));
+		fflush(stderr);
+
+		chunk_size = MAXALIGN(size+1);
 		blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
 		block = (AllocBlock) malloc(blksize);
 		if (block == NULL)
@@ -726,6 +738,11 @@ AllocSetAlloc(MemoryContext context, Size size)
 		/* set mark to catch clobber of "unused" space */
 		if (size < chunk_size)
 			set_sentinel(MemoryChunkGetPointer(chunk), size);
+		else
+		{
+			fprintf(stderr, "sentinel not added\n");
+			fflush(stderr);
+		}
 #endif
 #ifdef RANDOMIZE_ALLOCATED_MEMORY
 		/* fill the allocated space with junk */
@@ -761,13 +778,16 @@ AllocSetAlloc(MemoryContext context, Size size)
 		return MemoryChunkGetPointer(chunk);
 	}
 
+	fprintf(stderr, "AllocSetAlloc normal %ld %ld %ld\n", size, pow2_size(size), pow2_size(size+1));
+	fflush(stderr);
+
 	/*
 	 * Request is small enough to be treated as a chunk.  Look in the
 	 * corresponding 

Re: PostgreSQL 15 release announcement draft

2022-08-31 Thread Justin Pryzby
On Tue, Aug 30, 2022 at 03:58:48PM -0400, Jonathan S. Katz wrote:

> In this latest release, PostgreSQL improves on its in-memory and on-disk 
> sorting
> algorithms, with benchmarks showing speedups of 25% - 400% based on sort 
> types.

rather than "based on": "depending on the data types being sorted"

> Building on work from the previous PostgreSQL release for allowing async 
> remote
> queries, the PostgreSQL foreign data wrapper, `postgres_fdw`, can now commit
> transactions in parallel.

asynchronous

> benefits for certain workloads. On certain operating systems, PostgreSQL 15

s/certain/some ?

> supports the ability to prefetch WAL file contents and speed up recovery 
> times.

> PostgreSQL's built-in backup command, `pg_basebackup`, now supports 
> server-side
> compression of backup files with a choice of gzip, LZ4, and zstd.

remove "server-side", since they're also supported on the client-side.

> PostgreSQL 15 lets user create views that query data using the permissions of

users

> the caller, not the view creator. This option, called `security_invoker`, adds
> an additional layer of protection to ensure view callers have the correct
> permissions for working with the underlying data.

ensure *that ?

> alter server-level configuration parameters. Additionally, users can now 
> search
> for information about configuration using the `\dconfig` command from the 
> `psql`
> command-line tool.

rather than "search for information about configuration", say "list
configuration information" ?

> PostgreSQL server-level statistics are now collected in shared memory,
> eliminating the statistics collector process and writing these stats to disk.

and *the need to periodically* write these stats to disk

-- 
Justin




Re: Handle infinite recursion in logical replication setup

2022-08-31 Thread Peter Smith
On Wed, Aug 31, 2022 at 8:36 PM Amit Kapila  wrote:
>
> On Wed, Aug 31, 2022 at 11:35 AM Peter Smith  wrote:
> >
> > Here are my review comments for v43-0001.
> >
...
> > ==
> >
> > 2. doc/src/sgml/ref/create_subscription.sgml
> >
> > +  
> > +   If the subscription is created with origin = none and
> > +   copy_data = true, it will check if the publisher has
> > +   subscribed to the same table from other publishers and, if so, log a
> > +   warning to notify the user to check the publisher tables. Before 
> > continuing
> > +   with other operations the user should check that publisher tables did 
> > not
> > +   have data with different origins to prevent data inconsistency issues 
> > on the
> > +   subscriber.
> > +  
> >
> > I am not sure about that wording saying "to prevent data inconsistency
> > issues" because I thought maybe is already too late because any
> > unwanted origin data is already copied during the initial sync. I
> > think the user can do it try to clean up any problems caused before
> > things become even worse (??)
> >
>
> Oh no, it is not too late in all cases. The problem can only occur if
> the user will try to subscribe from all the different publications
> with copy_data = true. We are anyway trying to clarify in the second
> patch the right way to accomplish this. So, I am not sure what better
> we can do here. The only bulletproof way is to provide some APIs where
> users don't need to bother about all these cases, basically something
> similar to what Kuroda-San is working on in the thread [1].
>

The point of my review comment was only about the wording of the note
- specifically, you cannot "prevent" something (e.g. data
inconsistency) if it has already happened.

Maybe modify the wording like below?

BEFORE
Before continuing with other operations the user should check that
publisher tables did not have data with different origins to prevent
data inconsistency issues on the subscriber.

AFTER
If a publisher table with data from different origins was found to
have been copied to the subscriber, then some corrective action may be
necessary before continuing with other operations.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Reducing the chunk header sizes on all memory context types

2022-08-31 Thread Tom Lane
Tomas Vondra  writes:
> Focusing on the aset, the vast majority of allocations (60M out of 64M) are
> small enough to use the power-of-2 logic, and we go from 6.3GB to 8.2GB, so
> ~30%. Not great, not terrible.

Not sure why this escaped me before, but I remembered another argument
for not forcibly adding space for a sentinel: if you don't have room,
that means the chunk end is up against the header for the next chunk,
which means that any buffer overrun will clobber that header.  So we'll
detect the problem anyway if we validate the headers to a reasonable
extent.

The hole in this argument is that the very last chunk allocated in a
block might have no following chunk to validate.  But we could probably
special-case that somehow, maybe by laying down a sentinel in the free
space, where it will get overwritten by the next chunk when that does
get allocated.
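
A throwaway model of that special case (my sketch, not aset.c; the only thing
borrowed is the 0x7E marker that memdebug.h's set_sentinel() writes):

#include <assert.h>
#include <string.h>

#define SENTINEL_BYTE 0x7E      /* same value PostgreSQL's set_sentinel() uses */

typedef struct Block
{
    char   *freeptr;            /* first unused byte in the block */
    char   *endptr;             /* one past the end of the block */
} Block;

/* Lay a sentinel down in the free space following the last chunk. */
static void
mark_block_tail(Block *block)
{
    if (block->freeptr < block->endptr)
        *block->freeptr = SENTINEL_BYTE;
}

/* An overrun of the last chunk clobbers the marker until the space is reused. */
static int
block_tail_clobbered(const Block *block)
{
    return block->freeptr < block->endptr &&
        *block->freeptr != SENTINEL_BYTE;
}

int
main(void)
{
    char    space[64];
    Block   b = {space + 16, space + sizeof(space)};    /* last chunk ends at offset 16 */

    mark_block_tail(&b);
    memset(space, 'x', 17);     /* overrun the final chunk by one byte */
    assert(block_tail_clobbered(&b));
    return 0;
}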

30% memory bloat seems like a high price to pay if it's adding negligible
detection ability, which it seems is true if this argument is valid.
Is there reason to think we can't validate headers enough to catch
clobbers?

regards, tom lane




Re: Reducing the chunk header sizes on all memory context types

2022-08-31 Thread David Rowley
On Thu, 1 Sept 2022 at 12:23, Tom Lane  wrote:
> Is there reason to think we can't validate headers enough to catch
> clobbers?

For non-sentinel chunks, the next byte after the end of the chunk will
be storing the block offset for the following chunk.  I think:

if (block != MemoryChunkGetBlock(chunk))
    elog(WARNING, "problem in alloc set %s: bad block offset for chunk %p in block %p",
         name, chunk, block);

should catch those.

Maybe we should just consider always making room for a sentinel for
chunks that are on dedicated blocks. At most that's an extra 8 bytes
in some allocation that's either over 1024 or 8192 (depending on
maxBlockSize).

David




Re: [PATCH] Add native windows on arm64 support

2022-08-31 Thread Michael Paquier
On Wed, Aug 31, 2022 at 01:33:53PM -0400, Tom Lane wrote:
> I did not check the other XMLTABLE infrastructure, so what probably
> is best to do is keep the two output columns but change 'node()'
> to something with a more stable result; any preferences?

The input object could also be reworked so that we would not have any
ordering issues, say "pre&deeppost".
Changing only the path, you could use "/e/n2" instead of "node()", so
as we'd always get the most internal member.
--
Michael




Re: Bug fix. autovacuum.c do_worker_start() associates memory allocations with TopMemoryContext rather than 'Autovacuum start worker (tmp)'

2022-08-31 Thread Michael Paquier
On Wed, Aug 31, 2022 at 01:09:22PM -0700, Andres Freund wrote:
> On 2022-08-31 16:05:03 -0400, Tom Lane wrote:
>> Even if there's only a visible leak in v15/HEAD, I'm inclined
>> to back-patch this all the way, because it's certainly buggy
>> on its own terms.  It's just the merest accident that neither
>> caller is leaking other stuff into TopMemoryContext, because
>> they both think they are using a short-lived context.
> 
> +1

Ouch.  Thanks for the quick fix and the backpatch.
--
Michael




Re: Reducing the chunk header sizes on all memory context types

2022-08-31 Thread Tom Lane
David Rowley  writes:
> Maybe we should just consider always making room for a sentinel for
> chunks that are on dedicated blocks. At most that's an extra 8 bytes
> in some allocation that's either over 1024 or 8192 (depending on
> maxBlockSize).

Agreed, if we're not doing that already then we should.

regards, tom lane




Re: Reducing the chunk header sizes on all memory context types

2022-08-31 Thread Tomas Vondra



On 9/1/22 02:23, Tom Lane wrote:
> Tomas Vondra  writes:
>> Focusing on the aset, the vast majority of allocations (60M out of 64M) are
>> small enough to use the power-of-2 logic, and we go from 6.3GB to 8.2GB, so
>> ~30%. Not great, not terrible.
> 
> Not sure why this escaped me before, but I remembered another argument
> for not forcibly adding space for a sentinel: if you don't have room,
> that means the chunk end is up against the header for the next chunk,
> which means that any buffer overrun will clobber that header.  So we'll
> detect the problem anyway if we validate the headers to a reasonable
> extent.
> 
> The hole in this argument is that the very last chunk allocated in a
> block might have no following chunk to validate.  But we could probably
> special-case that somehow, maybe by laying down a sentinel in the free
> space, where it will get overwritten by the next chunk when that does
> get allocated.
> 
> 30% memory bloat seems like a high price to pay if it's adding negligible
> detection ability, which it seems is true if this argument is valid.
> Is there reason to think we can't validate headers enough to catch
> clobbers?
> 

I'm not quite convinced the 30% figure is correct - it might be if you
ignore cases exceeding allocChunkLimit, but that also makes it pretty
bogus (because large allocations represent ~2x as much space).

You're probably right we'll notice the clobber cases due to corruption
of the next chunk header. The annoying thing is having a corrupted
header only tells you there's a corruption somewhere, but it may be hard
to know which part of the code caused it. I was hoping the sentinel
would make it easier, because we mark it as NOACCESS for valgrind. But
now I see we mark the first part of a MemoryChunk too, so maybe that's
enough.

OTOH we have platforms where valgrind is either not supported or no one
runs tests with it (e.g. on rpi4 it'd take an insane amount of time). In that
case the sentinel might be helpful, especially considering alignment on
those platforms can cause funny memory issues, as evidenced by this thread.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: Better infrastructure for automated testing of concurrency issues

2022-08-31 Thread Alexander Korotkov
Hi!

On Tue, Feb 23, 2021 at 3:09 AM Peter Geoghegan  wrote:
> On Tue, Dec 8, 2020 at 2:42 AM Alexander Korotkov  
> wrote:
> > Thank you for your feedback!
>
> It would be nice to use this patch to test things that are important
> but untested inside vacuumlazy.c, such as the rare
> HEAPTUPLE_DEAD/tupgone case (grep for "Ordinarily, DEAD tuples would
> have been removed by..."). Same is true of the closely related
> heap_prepare_freeze_tuple()/heap_tuple_needs_freeze() code.

I'll continue work on this patch.  The rebased patch is attached.  It
implements stop events as a configure option (not a runtime GUC option).

--
Regards,
Alexander Korotkov


0001-Stopevents-v3.patch
Description: Binary data


Re: pg15b3: recovery fails with wal prefetch enabled

2022-08-31 Thread Justin Pryzby
On Thu, Sep 01, 2022 at 12:05:36PM +1200, Thomas Munro wrote:
> On Thu, Sep 1, 2022 at 2:01 AM Justin Pryzby  wrote:
> > < 2022-08-31 08:44:10.495 CDT  >LOG:  checkpoint starting: end-of-recovery 
> > immediate wait
> > < 2022-08-31 08:44:10.609 CDT  >LOG:  request to flush past end of 
> > generated WAL; request 1201/1CAF84F0, current position 1201/1CADB730
> > < 2022-08-31 08:44:10.609 CDT  >CONTEXT:  writing block 0 of relation 
> > base/16881/2840_vm
> > < 2022-08-31 08:44:10.609 CDT  >ERROR:  xlog flush request 1201/1CAF84F0 is 
> > not satisfied --- flushed only to 1201/1CADB730
> > < 2022-08-31 08:44:10.609 CDT  >CONTEXT:  writing block 0 of relation 
> > base/16881/2840_vm
> > < 2022-08-31 08:44:10.609 CDT  >FATAL:  checkpoint request failed
> >
> > I was able to start it with -c recovery_prefetch=no, so it seems like
> > prefetch tried to do too much.  The VM runs centos7 under qemu.
> > I'm making a copy of the data dir in case it's needed.
> 
> Hmm, a page with an LSN set 118208 bytes past the end of WAL.  It's a
> vm fork page (which recovery prefetch should ignore completely).  Did
> you happen to get a copy before the successful recovery?  After the
> successful recovery, what LSN does that page have, and can you find
> the references to it in the WAL with eg pg_waldump -R 1663/16681/2840
> -F vm?  Have you turned FPW off (perhaps this is on ZFS?)?

Yes, I have a copy that reproduces the issue:

#1  0x009a0df4 in errfinish (filename=, 
filename@entry=0xa15535 "xlog.c", lineno=lineno@entry=2671, 
funcname=funcname@entry=0xa1da27 <__func__.22763> "XLogFlush") at elog.c:588
#2  0x0055f1cf in XLogFlush (record=19795985532144) at xlog.c:2668
#3  0x00813b24 in FlushBuffer (buf=0x7fffdf1f8900, reln=) at bufmgr.c:2889
#4  0x00817a5b in SyncOneBuffer (buf_id=buf_id@entry=7796, 
skip_recently_used=skip_recently_used@entry=false, 
wb_context=wb_context@entry=0x7fffcdf0) at bufmgr.c:2576
#5  0x008181c2 in BufferSync (flags=flags@entry=358) at bufmgr.c:2164
#6  0x008182f5 in CheckPointBuffers (flags=flags@entry=358) at 
bufmgr.c:2743
#7  0x005587b2 in CheckPointGuts (checkPointRedo=19795985413936, 
flags=flags@entry=358) at xlog.c:6855
#8  0x0055feb3 in CreateCheckPoint (flags=flags@entry=358) at 
xlog.c:6534
#9  0x007aceaa in CheckpointerMain () at checkpointer.c:455
#10 0x007aac52 in AuxiliaryProcessMain 
(auxtype=auxtype@entry=CheckpointerProcess) at auxprocess.c:153
#11 0x007b0bd8 in StartChildProcess (type=) at 
postmaster.c:5430
#12 0x007b388f in PostmasterMain (argc=argc@entry=7, 
argv=argv@entry=0xf139e0) at postmaster.c:1463
#13 0x004986a6 in main (argc=7, argv=0xf139e0) at main.c:202

It's not on zfs, and FPW has never been turned off.

I should add that this instance has been pg_upgraded since v10.

BTW, base/16881 is the postgres DB (which has 43GB of logfiles imported from
CSV, plus 2GB of snapshots of pg_control_checkpoint, pg_settings,
pg_stat_bgwriter, pg_stat_database, pg_stat_wal).

postgres=# SELECT * FROM page_header(get_raw_page('pg_toast.pg_toast_2619', 'main', 0));
      lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
---------------+----------+-------+-------+-------+---------+----------+---------+------------
 1201/1CDD1F98 |    -6200 |     1 |    44 |   424 |    8192 |     8192 |       4 | 3681043287
(1 row)

postgres=# SELECT * FROM page_header(get_raw_page('pg_toast.pg_toast_2619', 'vm', 0));
      lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
---------------+----------+-------+-------+-------+---------+----------+---------+-----------
 1201/1CAF84F0 |    -6010 |     0 |    24 |  8192 |    8192 |     8192 |       4 |         0

I found this in waldump (note that you had a typo - it's 16881).

[pryzbyj@template0 ~]$ sudo /usr/pgsql-15/bin/pg_waldump -R 1663/16881/2840 -F 
vm -p /mnt/tmp/15/data/pg_wal 00011201001C
rmgr: Heap2   len (rec/tot): 64/   122, tx:  0, lsn: 
1201/1CAF2658, prev 1201/1CAF2618, desc: VISIBLE cutoff xid 3681024856 flags 
0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0 FPW, blkref #1: rel 
1663/16881/2840 blk 54
rmgr: Heap2   len (rec/tot): 59/59, tx:  0, lsn: 
1201/1CAF3AF8, prev 1201/1CAF2788, desc: VISIBLE cutoff xid 2 flags 0x03, 
blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 1663/16881/2840 
blk 0
rmgr: Heap2   len (rec/tot): 59/59, tx:  0, lsn: 
1201/1CAF3B70, prev 1201/1CAF3B38, desc: VISIBLE cutoff xid 3671427998 flags 
0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 
1663/16881/2840 blk 2
rmgr: Heap2   len (rec/tot): 59/59, tx:  0, lsn: 
1201/1CAF4DC8, prev 1201/1CAF3BB0, desc: VISIBLE cutoff xid 3672889900 flags 
0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0, blkref #1: rel 
1663/16881/2840 blk 4
rmgr: Heap2   len (rec
