Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-11-03 Thread Jeff Davis
On Thu, 2023-11-02 at 22:38 +0530, Bharath Rupireddy wrote:
> > I suppose the question is: should reading from the WAL buffers be an
> > intentional thing that specific callers do explicitly?
> > Or is it an optimization that should be hidden from the caller?
> > 
> > I tend toward the former, at least for now.
> 
> Yes, it's an optimization that must be hidden from the caller.

As I said, I tend toward the opposite: that specific callers should
read from the buffers explicitly in the cases where it makes sense.

I don't think this is the most important point right now though, let's
sort out the other details.

> > 
> At any given point of time, WAL buffer pages are maintained as a
> circularly sorted array in an ascending order from
> OldestInitializedPage to InitializedUpTo (new pages are inserted at
> this end).

I don't see any reference to OldestInitializedPage or anything like it,
with or without your patch. Am I missing something?

> - Read 6 pages starting from LSN 80. Nothing is read from WAL buffers
> as the page at LSN 80 doesn't exist, despite the other 5 pages
> starting from LSN 90 existing.

This is what I imagined a "partial hit" was: read the 5 pages starting
at 90. The caller would then need to figure out how to read the page at
LSN 80 from the segment files.

I am not saying we should support this case; perhaps it doesn't matter.
I'm just describing why that term was confusing for me.

> If a caller asks for an unflushed WAL read
> intentionally or unintentionally, XLogReadFromBuffers() reads only 4
> pages starting from LSN 150 to LSN 180 and will leave the remaining 2
> pages for the caller to deal with. This is the partial hit that can
> happen.

To me that's more like an EOF case. "Partial hit" sounds to me like the
data exists but is not available in the cache (i.e. go to the segment
files); whereas if it encountered the end, the data is not available at
all.
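
Either way, the caller-side composition looks the same: take the leading
pages the buffers can supply and fall back to the segment files for the
remainder. A toy sketch of that shape (invented signatures, nothing from
the actual patch; one char stands in for one WAL page):

```c
#include <stddef.h>
#include <stdint.h>

/* Invented signature: read up to npages starting at start_page into dst,
 * returning how many leading pages could be supplied.  One char = one page. */
typedef size_t (*read_pages_fn) (uint64_t start_page, size_t npages, char *dst);

static size_t
wal_read_with_fallback(uint64_t start_page, size_t npages, char *dst,
                       read_pages_fn from_buffers,
                       read_pages_fn from_segments)
{
    /* Cheap path first: whatever leading pages the WAL buffers still hold. */
    size_t hit = from_buffers(start_page, npages, dst);

    if (hit == npages)
        return npages;          /* full hit */

    /* Partial hit (or miss): fetch the remainder from the segment files. */
    return hit + from_segments(start_page + hit, npages - hit, dst + hit);
}

/* Demo stubs: pretend the buffers hold only the first 4 requested pages. */
static size_t
demo_from_buffers(uint64_t start_page, size_t npages, char *dst)
{
    size_t hit = npages < 4 ? npages : 4;

    (void) start_page;
    for (size_t i = 0; i < hit; i++)
        dst[i] = 'B';
    return hit;
}

static size_t
demo_from_segments(uint64_t start_page, size_t npages, char *dst)
{
    (void) start_page;
    for (size_t i = 0; i < npages; i++)
        dst[i] = 'S';
    return npages;
}
```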

> > 
> WALBufMappingLock protects both xlblocks and WAL buffer pages [1][2].
> I'm not sure how using the memory barrier, not WALBufMappingLock,
> prevents writers from replacing WAL buffer pages while readers
> reading
> the pages.

It doesn't *prevent* that case, but it does *detect* that case. We
don't want to prevent writers from replacing WAL buffers, because that
would mean we are slowing down the critical WAL writing path.

Let me explain the potential problem cases, and how the barriers
handle them:

Potential problem 1: the page is not yet resident in the cache at the
time the memcpy begins. In this case, the first read barrier would
ensure that the page is also not yet resident at the time xlblocks[idx]
is read into endptr1, and we'd break out of the loop.

Potential problem 2: the page is evicted before the memcpy finishes. In
this case, the second read barrier would ensure that the page was also
evicted before xlblocks[idx] is read into endptr2, and again we'd
detect the problem and break out of the loop.

I assume here that, if xlblocks[idx] holds the endPtr of the desired
page, all of the bytes for that page are resident at that moment. I
don't think that's true right now: AdvanceXLInsertBuffers() zeroes the
old page before updating xlblocks[nextidx]. I think it needs something
like:

  pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], InvalidXLogRecPtr);
  pg_write_barrier();

before the MemSet.
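
The two cases above can be sketched in miniature with C11 atomics (this
is an illustration of the protocol only, not the actual XLogCtl or
XLogReadFromBuffers code; it also assumes the writer performs the
invalidate-before-MemSet step proposed here):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 16              /* toy page size */

/* One buffer slot: xlblock holds the end-LSN of the page in buf,
 * 0 meaning "no valid page".  Names are illustrative. */
static _Atomic uint64_t xlblock;
static char buf[PAGE_SZ];

/* Writer: invalidate, write barrier, overwrite, write barrier, publish. */
static void
writer_replace(uint64_t new_endptr, char fill)
{
    atomic_store_explicit(&xlblock, 0, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* like pg_write_barrier() */
    memset(buf, fill, PAGE_SZ);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&xlblock, new_endptr, memory_order_relaxed);
}

/* Reader: optimistic copy; returns false if the wanted page was not (or
 * no longer) resident, in which case dst must be discarded. */
static bool
reader_copy(uint64_t want_endptr, char *dst)
{
    uint64_t endptr1;
    uint64_t endptr2;

    endptr1 = atomic_load_explicit(&xlblock, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);  /* like pg_read_barrier() */
    if (endptr1 != want_endptr)
        return false;                       /* problem 1: not resident */
    memcpy(dst, buf, PAGE_SZ);
    atomic_thread_fence(memory_order_acquire);
    endptr2 = atomic_load_explicit(&xlblock, memory_order_relaxed);
    return endptr2 == want_endptr;          /* problem 2: evicted mid-copy */
}
```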

I didn't review your latest v14 patch yet.

Regards,
Jeff Davis








Improvement discussion of custom and generic plans

2023-11-03 Thread Quan Zongliang

Hi

We have run into a problem of this kind: a table column has skewed data. Statistics:
n_distinct | -0.4481973
most_common_vals   | {5f006ca25b52ed78e457b150ee95a30c}
most_common_freqs  | {0.5518474}


Data generation:

CREATE TABLE s_user (
 user_id varchar(32) NOT NULL,
 corp_id varchar(32), 


 status int NOT NULL
 );

insert into s_user
select md5('user_id ' || a), md5('corp_id ' || a),
 case random()<0.877675 when true then 1 else -1 end
  FROM generate_series(1,10031) a;

insert into s_user
select md5('user_id ' || a), md5('corp_id 10032'),
 case random()<0.877675 when true then 1 else -1 end
  FROM generate_series(10031,22383) a;

CREATE INDEX s_user_corp_id_idx ON s_user USING btree (corp_id);

analyze s_user;


1. First, define a PREPARE statement
prepare stmt as select count(*) from s_user where status=1 and corp_id = $1;

2. Run it five times; the custom plan is chosen each time.
explain (analyze,buffers) execute stmt('5f006ca25b52ed78e457b150ee95a30c');

Here's the plan:
 Aggregate  (cost=639.84..639.85 rows=1 width=8) (actual time=4.653..4.654 rows=1 loops=1)
   Buffers: shared hit=277
   ->  Seq Scan on s_user  (cost=0.00..612.76 rows=10830 width=0) (actual time=1.402..3.747 rows=10836 loops=1)
         Filter: ((status = 1) AND ((corp_id)::text = '5f006ca25b52ed78e457b150ee95a30c'::text))
         Rows Removed by Filter: 11548
         Buffers: shared hit=277
 Planning Time: 0.100 ms
 Execution Time: 4.674 ms
(8 rows)

3. From the sixth execution on, the generic plan is chosen.
We can see that there is a huge deviation between the estimate and the
actual value:
 Aggregate  (cost=11.83..11.84 rows=1 width=8) (actual time=4.424..4.425 rows=1 loops=1)
   Buffers: shared hit=154 read=13
   ->  Bitmap Heap Scan on s_user  (cost=4.30..11.82 rows=2 width=0) (actual time=0.664..3.371 rows=10836 loops=1)
         Recheck Cond: ((corp_id)::text = $1)
         Filter: (status = 1)
         Rows Removed by Filter: 1517
         Heap Blocks: exact=154
         Buffers: shared hit=154 read=13
         ->  Bitmap Index Scan on s_user_corp_id_idx  (cost=0.00..4.30 rows=2 width=0) (actual time=0.635..0.635 rows=12353 loops=1)
               Index Cond: ((corp_id)::text = $1)
               Buffers: shared read=13
 Planning Time: 0.246 ms
 Execution Time: 4.490 ms
(13 rows)

This is because in the choose_custom_plan function, the generic plan is 
attempted after executing the custom plan five times.


if (plansource->num_custom_plans < 5)
return true;

The generic plan uses var_eq_non_const to estimate the average selectivity:
with roughly 10032 distinct corp_id values over 22384 rows, that works out
to about 2 rows per value, which is exactly the rows=2 estimate in the
generic plan above.

These are facts that many people already know, so only a brief introduction.


Our users actually use such parameter conditions in very complex PREPARE 
statements. Once the generic plan is used from the sixth execution on, 
the execution time jumps from 5 milliseconds to 5 minutes.



To improve this situation, the following approaches could be considered:

1. Determine whether data skew exists in the PREPARE statement parameter 
conditions based on the statistics.

However, there is no way to know if the user will use the skewed parameter.

2. When comparing the cost of the generic plan with the average cost of 
the custom plans (in choose_custom_plan), also consider whether the 
maximum cost of any custom plan executed so far differs from the cost 
of the generic plan by an order of magnitude.
But if the first five executions use a low-selectivity parameter and the 
sixth uses a high-selectivity one, the problem still arises.


3. Track the execution time of the PREPARE statement. When an execution 
is found to take much longer than the average execution time, force the 
custom plan to be used.
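
Approach 2, for instance, could be prototyped as a small tweak to the
existing decision. A sketch with invented field names and a made-up
order-of-magnitude threshold, not the actual plancache.c code:

```c
#include <stdbool.h>

/* Illustrative bookkeeping only; plancache.c does not track max cost today. */
typedef struct
{
    int     num_custom_plans;
    double  avg_custom_cost;
    double  max_custom_cost;    /* hypothetical extra field */
} PlanStats;

static bool
choose_custom_plan_sketch(const PlanStats *s, double generic_cost)
{
    /* As today: always collect five custom plans first. */
    if (s->num_custom_plans < 5)
        return true;

    /* Existing rule: prefer the generic plan when it is not costlier on
     * average than the custom plans seen so far. */
    if (generic_cost <= s->avg_custom_cost)
    {
        /*
         * Proposed extra guard (approach 2): if the costliest custom plan
         * seen so far dwarfs the generic estimate by an order of magnitude,
         * the generic estimate is probably hiding skew - stay custom.
         */
        if (s->max_custom_cost >= generic_cost * 10.0)
            return true;
        return false;
    }
    return true;
}
```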



Is there any better idea?

--
Quan Zongliang





Re: Pre-proposal: unicode normalized text

2023-11-03 Thread Jeff Davis
On Fri, 2023-11-03 at 10:51 +1300, Thomas Munro wrote:
> bowerbird and hammerkop didn't like commit a02b37fc.  They're still
> using the old 3rd build system that is not tested by CI.  It's due
> for
> removal in the 17 cycle IIUC but in the meantime I guess the new
> codegen script needs to be invoked by something under src/tools/msvc?
> 
>   varlena.obj : error LNK2019: unresolved external symbol
> unicode_category referenced in function unicode_assigned
> [H:\\prog\\bf\\root\\HEAD\\pgsql.build\\postgres.vcxproj]

I think I just need to add unicode_category.c to @pgcommonallfiles in
Mkvcbuild.pm. I'll do a trial commit tomorrow and see if that fixes it
unless someone has a better suggestion.

Regards,
Jeff Davis





Re: Pre-proposal: unicode normalized text

2023-11-03 Thread David Rowley
On Fri, 3 Nov 2023 at 20:49, Jeff Davis  wrote:
>
> On Fri, 2023-11-03 at 10:51 +1300, Thomas Munro wrote:
> > bowerbird and hammerkop didn't like commit a02b37fc.  They're still
> > using the old 3rd build system that is not tested by CI.  It's due
> > for
> > removal in the 17 cycle IIUC but in the meantime I guess the new
> > codegen script needs to be invoked by something under src/tools/msvc?
> >
> >   varlena.obj : error LNK2019: unresolved external symbol
> > unicode_category referenced in function unicode_assigned
> > [H:\\prog\\bf\\root\\HEAD\\pgsql.build\\postgres.vcxproj]
>
> I think I just need to add unicode_category.c to @pgcommonallfiles in
> Mkvcbuild.pm. I'll do a trial commit tomorrow and see if that fixes it
> unless someone has a better suggestion.

(I didn't realise this was being discussed.)

Thomas mentioned this to me earlier today. After looking I also
concluded that unicode_category.c needed to be added to
@pgcommonallfiles. After looking at the time, I didn't expect you to
be around so opted just to push that to fix the MSVC buildfarm
members.

Sorry for the duplicate effort and/or stepping on your toes.

David




Re: Tab completion regression test failed on illumos

2023-11-03 Thread Thomas Munro
On Fri, Nov 3, 2023 at 3:14 PM Japin Li  wrote:
> Thanks for your explanation, the termios c_oflag on Illumos enables TABDLY
> by default.

It seems that various other open source Unixen dropped that between 29
and 2 years ago, but not illumos.  I guess no one ever had IO::Pty
installed on an older OpenBSD or NetBSD machine or we'd have seen this
problem there too, but as of a few years ago they behave like Linux
and FreeBSD: no tab expansion.

https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/bsd/sys/ttydefaults.h#L79
https://github.com/freebsd/freebsd-src/commit/210df5b10c855161149dd7a1e88f610972f2afaa
https://github.com/NetBSD/src/commit/44a07dbdbdcb2b9e14340856c8267dc659a0ebd8
https://github.com/openbsd/src/commit/818e463522f2237e9da1be8aa7958dcc8af28fca
https://github.com/illumos/illumos-gate/blob/0f9b8dcfdb872a210003f6b077d091b793c24a6e/usr/src/uts/common/io/tty_common.c#L35
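
For reference, the flag in question can be cleared on a pty's termios
settings. A sketch only, not code that exists in the test harness today:

```c
#include <termios.h>

/*
 * Clear kernel tab-to-spaces expansion on a pty's termios settings, so an
 * illumos pty behaves like modern Linux/BSD for tests like this one.
 */
static void
disable_tab_expansion(struct termios *tio)
{
#ifdef TABDLY
    tio->c_oflag &= ~TABDLY;    /* select TAB0: no tab expansion/delay */
#endif
#ifdef OXTABS
    tio->c_oflag &= ~OXTABS;    /* BSD spelling of the same knob */
#endif
}
```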




Re: Report search_path value back to the client.

2023-11-03 Thread Jelte Fennema-Nio
I wanted to revive this thread, since it's by far one of the most
common foot guns that people run into with PgBouncer. Almost all
session level SET commands leak across transactions, but SET
search_path is by far the one with the biggest impact when it is not
the setting that you expect. As well as being a very common setting to
change. In the Citus extension we actually mark search_path as
GUC_REPORT so that PgBouncer can track it, because otherwise Citus's
schema-based sharding breaks completely when using PgBouncer.
[1]

On Fri, 3 Nov 2023 at 09:38, Tom Lane  wrote:
> I'm against this on a couple of different grounds:
>
> 1. Performance.  ...
>
> 2. Semantics.  The existing GUC_REPORT variables are all things that
> directly relate to client-visible behavior, eg how values of type
> timestamp will be interpreted and printed.  search_path is no such thing,

>
> We could possibly alleviate problem #1 by changing the behavior of guc.c
> so it doesn't report every single transition of flagged variables, but
> only (say) once just before ReadyForQuery if the variable changed in
> the just-finished command.  That's not exactly a one-line fix though.

The proposed fix for #1 has been committed since PG14 in
2432b1a04087edc2fd9536c7c9aa4ca03fd1b363

So that only leaves #2. I think search_path can arguably be considered
to be client visible, because it changes how the client and its
queries are parsed by Postgres. And even if you don't agree with that
argument, it's simply not true that the only GUCs that are GUC_REPORT
are related to client-visible behaviour. Things like application_name
and is_superuser are also GUC_REPORT [2].

> so it's hard to make a principled argument for reporting it that doesn't
> lead to the conclusion that you want *everything* reported.  (In
> particular, I don't believe at all your argument that this would help
> pgbouncer significantly.)

To be clear, I would like it to be configurable which settings are
GUC_REPORT, because there are others that are useful for PgBouncer to
track (e.g. statement_timeout and lock_timeout). That's why I've been
active on the thread proposing such a change [3]. But that thread has
not been getting a lot of attention, I guess because it's a large
change that includes protocol changes. So that's why I'm reviving this
thread again, since search_path is by far the most painful setting for
PgBouncer users. A common foot-gun is that running pg_dump causes
breakage for other clients, because its "SET search_path" is leaked to
the next transaction [4].

[1]: 
https://github.com/citusdata/citus/blob/21646ca1e96175370be1472a14d5ab1baa55b471/src/backend/distributed/shared_library_init.c#L2686-L2695
[2]: https://www.postgresql.org/docs/15/protocol-flow.html#PROTOCOL-ASYNC
[3]: 
https://www.postgresql.org/message-id/flat/0a6f5e6b-65b3-4272-260d-a17ce8f5b3a4%40eisentraut.org#bd6d06db7db9e4ded8100e4160050b17
[4]: https://github.com/pgbouncer/pgbouncer/issues/452




Re: Report search_path value back to the client.

2023-11-03 Thread Jelte Fennema-Nio
For completeness attached is a tiny patch implementing this, so this
thread can be added to the commit fest.


v1-0001-Mark-search_path-as-GUC_REPORT.patch
Description: Binary data


Re: Incorrect file reference in comment in procarray.c

2023-11-03 Thread Etsuro Fujita
On Thu, Nov 2, 2023 at 10:20 PM Daniel Gustafsson  wrote:
> > On 2 Nov 2023, at 13:40, Etsuro Fujita  wrote:
> > Attached is a small patch for that: s/heapam_visibility.c/snapmgr.c/.
>
> No objections to the patch, the change is correct.  However, with git grep and
> ctags and other ways of discovery I wonder if we're not better off avoiding
> such references to filenames which are prone to going stale (and do from time
> to time).

Agreed.  As XidInMVCCSnapshot() is an extern function, such a tool
would allow the reader to easily find the source file that contains
the definition of that function.

Thanks for the comment!

Best regards,
Etsuro Fujita




Re: Pre-proposal: unicode normalized text

2023-11-03 Thread John Naylor
On Sat, Oct 28, 2023 at 4:15 AM Jeff Davis  wrote:
>
> I plan to commit something like v3 early next week unless someone else
> has additional comments or I missed a concern.

Hi Jeff, is the CF entry titled "Unicode character general category
functions" ready to be marked committed?




Regression on pg_restore to 16.0: DOMAIN not available to SQL function

2023-11-03 Thread Mark Hills
I'm having errors restoring with pg_restore to v16.0; it appears to be a 
regression or bug. The same file restores to v15.4 without problem.

During the restore:

  pg_restore: error: could not execute query: ERROR:  type "hash" does not exist
  LINE 7: )::hash;
  [...]
  CONTEXT:  SQL function "gen_hash" during inlining

It prompted me to separate the restore into steps:

* An initial "--schema-only" pass completes
* The "--data-only" pass is where the error takes place

I also double-checked for no mismatch of client/server etc.

For now, I can use 15.4 for this one-off task so will have to kick this 
can down the road.

But I think it is worth reporting that something in 16.0 appears to be 
failing on valid data (or maybe there is an incompatibility with a dump 
from 13.5?)

Thanks

-- 
Mark



$ export DUMP="$HOME/tmp/production.pgdump"


$ pg_restore --dbname=stattrx --no-owner --no-privileges --schema-only 
--verbose --exit-on-error $DUMP
[succeeds, no errors]


$ pg_restore --dbname=stattrx --no-owner --no-privileges --data-only --verbose 
--exit-on-error $DUMP
pg_restore: connecting to database for restore
pg_restore: processing data for table "public.authentic"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 4183; 0 58291 TABLE DATA authentic postgres
pg_restore: error: could not execute query: ERROR:  type "hash" does not exist
LINE 7: )::hash;
   ^
QUERY:
SELECT
substring(
regexp_replace(
encode(gen_random_bytes(1024), 'base64'),
'[^a-zA-Z0-9]', '', 'g') for $1
)::hash;

CONTEXT:  SQL function "gen_hash" during inlining
Command was: COPY public.authentic (key, generated, peer, expires, studio) FROM 
stdin;


$ pg_restore --version
pg_restore (PostgreSQL) 16.0


$ pg_restore --list $DUMP
;
; Archive created at 2023-10-30 06:47:01 GMT
; dbname: production
; TOC Entries: 227
; Compression: gzip
; Dump Version: 1.14-0
; Format: CUSTOM
; Integer: 4 bytes
; Offset: 8 bytes
; Dumped from database version: 13.5
; Dumped by pg_dump version: 13.5
;
;
; Selected TOC Entries:
;
4; 3079 57533 EXTENSION - btree_gist
4212; 0 0 COMMENT - EXTENSION btree_gist
2; 3079 492253 EXTENSION - ltree
4213; 0 0 COMMENT - EXTENSION ltree
3; 3079 58156 EXTENSION - pgcrypto
4214; 0 0 COMMENT - EXTENSION pgcrypto
1022; 1247 58194 DOMAIN public handle postgres
1026; 1247 58197 DOMAIN public hash postgres
[...]
504; 1255 58233 FUNCTION public gen_hash(integer) postgres
[...]


--
-- Relevant SQL declarations
--

CREATE DOMAIN hash AS text
CHECK (VALUE ~ E'^[a-zA-Z0-9]{8,32}$');

CREATE OR REPLACE FUNCTION gen_hash(int)
RETURNS hash AS
$$
SELECT
substring(
regexp_replace(
encode(gen_random_bytes(1024), 'base64'),
'[^a-zA-Z0-9]', '', 'g') for $1
)::hash;
$$ LANGUAGE SQL;




Re: pg_upgrade and logical replication

2023-11-03 Thread vignesh C
On Thu, 2 Nov 2023 at 11:05, Peter Smith  wrote:
>
> ~~~
>
> 2c.
> In a recent similar thread [1], they chose to implement a guc_hook to
> prevent a user from overriding this via the command line option during
> the upgrade. Shouldn't this patch do the same thing, for consistency?

Added GUC hook for consistency.

> ~~~
>
> 2d.
> If you do implement such a guc_hook (per #2c above), then should the
> patch also include a test case for getting an ERROR if the user tries
> to override that GUC?

Added a test for the same.

We can use this patch if we are planning to go ahead with guc_hooks
for max_slot_wal_keep_size as discussed at [1].
The attached patch has the changes for the same.

[1] - 
https://www.postgresql.org/message-id/CAHut%2BPsTrB%3DmjBA-Y-%2BW4kK63tao9%3DXBsMXG9rkw4g_m9WatwA%40mail.gmail.com


Regards,
Vignesh
From 00050247bad78b331dc1f841296dd40b3f37ecaf Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Fri, 3 Nov 2023 14:57:48 +0530
Subject: [PATCH] Added GUC hook for max_logical_replication_workers.

During a binary upgrade, pg_upgrade sets this variable to 0 via the command
line in an attempt to prevent startup of the logical replication launcher
which may start apply workers that could start receiving changes from the
publisher before the physical files are put in place, causing corruption on
the new cluster upgrading to, but users have ways to override it. Added
GUC hook to prevent overriding of max_logical_replication_workers
configuration.
---
 src/backend/utils/init/postinit.c  | 24 +
 src/backend/utils/misc/guc_tables.c|  2 +-
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 36 ++
 src/include/utils/guc_hooks.h  |  2 ++
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 552cf9d950..22e37d34e7 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -617,6 +617,30 @@ check_max_wal_senders(int *newval, void **extra, GucSource source)
 	return true;
 }
 
+/*
+ * GUC check_hook for max_logical_replication_workers
+ *
+ * During a binary upgrade, pg_upgrade sets this variable to 0 via the command
+ * line in an attempt to prevent startup of the logical replication launcher
+ * which may start apply workers that could start receiving changes from the
+ * publisher before the physical files are put in place, causing corruption on
+ * the new cluster upgrading to, but users have ways to override it. To ensure
+ * the successful completion of the upgrade, it's essential to keep this
+ * variable unaltered.  See start_postmaster() in pg_upgrade for more details.
+ */
+bool
+check_max_logical_replication_workers(int *newval, void **extra,
+	  GucSource source)
+{
+	if (IsBinaryUpgrade && *newval)
+	{
+		GUC_check_errdetail("\"%s\" must be set to 0 during binary upgrade mode.",
+			"max_logical_replication_workers");
+		return false;
+	}
+	return true;
+}
+
 /*
  * Early initialization of a backend (either standalone or under postmaster).
  * This happens even before InitPostgres.
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 7605eff9b9..4a5ff3d317 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3049,7 +3049,7 @@ struct config_int ConfigureNamesInt[] =
 		},
 		&max_logical_replication_workers,
 		4, 0, MAX_BACKENDS,
-		NULL, NULL, NULL
+		check_max_logical_replication_workers, NULL, NULL
 	},
 
 	{
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index c6d83d3c21..a6ca422c58 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -371,6 +371,42 @@ $oldnode->start;
 $oldnode->safe_psql('postgres', 'DROP DATABASE regression_invalid');
 $oldnode->stop;
 
+command_fails(
+	[
+		'pg_upgrade', '--no-sync','-d', $oldnode->data_dir,
+		'-D', $newnode->data_dir, '-b', $oldbindir,
+		'-B', $newbindir, '-s', $newnode->host,
+		'-p', $oldnode->port, '-P', $newnode->port,
+		'-O -c max_logical_replication_workers=1',
+		$mode,'--check',
+	],
+	'run of pg_upgrade with invalid max_logical_replication_workers');
+ok(-d $newnode->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+# Verify the reason why the logical replication slot cannot be upgraded
+my $upgrade_logfile;
+
+# Find pg_upgrade_server text file. We cannot predict the file's path because
+# the output directory contains a milliseconds timestamp.
+# File::Find::find must be used.
+find(
+	sub {
+		if ($File::Find::name =~ m/pg_upgrade_server\.log/)
+		{
+			$upgrade_logfile = $File::Find::name;
+		}
+	},
+	$newnode->data_dir . "/pg_upgrade_output.d");
+
+# Check that the server has logged a message saying server start failed with
+# invalid max_logical_replication_workers error.
+like(
+	slurp_f

RE: CRC32C Parallel Computation Optimization on ARM

2023-11-03 Thread Xiang Gao
On Date: Thu, 2 Nov 2023 09:35:50AM -0500, Nathan Bossart wrote:

>On Thu, Nov 02, 2023 at 06:17:20AM +, Xiang Gao wrote:
>> After reading the discussion, I understand that in order to avoid performance
>> regression in some instances, we need to try our best to avoid runtime 
>> checks.
>> I don't know if I understand it correctly.

>The idea is that we don't want to start forcing runtime checks on builds
>where we aren't already doing runtime checks.  IOW if the compiler can use
>the ARMv8 CRC instructions with the default compiler flags, we should only
>use vmull_p64() if it can also be used with the default compiler flags.  I
>suspect this limitation sounds worse than it actually is in practice.  The
>vast majority of the buildfarm uses runtime checks, and at least some of
>the platforms that don't, such as the Apple M-series machines, seem to
>include +crypto by default.

>Of course, if a compiler picks up +crc but not +crypto in its defaults, we
>could lose the vmull_p64() optimization on that platform.  But as noted in
>the other thread, we can revisit if these kinds of hypothetical situations
>become reality.

>> Could you please give me some suggestions about how to refine this patch?

>Of course.  I think we'll ultimately want to independently check for the
>availability of the new instruction like we do for the other sets of
>intrinsics:
>
>   PGAC_ARMV8_VMULL_INTRINSICS([])
>   if test x"$pgac_armv8_vmull_intrinsics" != x"yes"; then
>   PGAC_ARMV8_VMULL_INTRINSICS([-march=armv8-a+crypto])
>   fi
>
>My current thinking is that we'll want to add USE_ARMV8_VMULL and
>USE_ARMV8_VMULL_WITH_RUNTIME_CHECK and use those to decide exactly what to
>compile.  I'll admit I haven't fully thought through every detail yet, but
>I'm cautiously optimistic that we can avoid too much complexity in the
>autoconf/meson scripts.

Thank you so much!
This is the newest patch. I think the code for choosing which CRC 
algorithm to use is a bit complicated. Maybe we could use only 
USE_ARMV8_VMULL and always do a runtime check for the vmull_p64 
instruction. This would not affect existing builds, because this is a new 
instruction and new logic, and it would also reduce the complexity of 
the code.
Very much looking forward to receiving your suggestions, thank you!
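
The always-do-a-runtime-check idea amounts to a one-time dispatch through
a function pointer, roughly as below. Names are illustrative, the
capability probe is stubbed out, and the software fallback here is a plain
bitwise Castagnoli loop rather than PostgreSQL's actual slicing-by-8 code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

/* Bitwise software fallback (Castagnoli polynomial, reflected form). */
static uint32_t
crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78U & (0U - (crc & 1)));
    }
    return crc;
}

/* Stubbed probe: a real build would test for vmull_p64 support here. */
static bool
vmull_p64_available(void)
{
    return false;
}

static crc32c_fn pg_comp_crc32c_impl = NULL;

/* First call picks an implementation; later calls go straight through. */
static uint32_t
pg_comp_crc32c(uint32_t crc, const void *data, size_t len)
{
    if (pg_comp_crc32c_impl == NULL)
        pg_comp_crc32c_impl = vmull_p64_available()
            ? crc32c_sw         /* would be the vmull_p64 variant */
            : crc32c_sw;
    return pg_comp_crc32c_impl(crc, data, len);
}
```

Callers pass a running CRC initialized to 0xFFFFFFFF and flip the bits at
the end, mirroring the INIT_CRC32C/FIN_CRC32C convention.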


0005-crc32c-parallel-computation-optimization-on-arm.patch
Description:  0005-crc32c-parallel-computation-optimization-on-arm.patch


Re: Popcount optimization using AVX512

2023-11-03 Thread Matthias van de Meent
On Thu, 2 Nov 2023 at 15:22, Amonson, Paul D  wrote:
>
> This proposal showcases the speed-up provided to popcount feature when using 
> AVX512 registers. The intent is to share the preliminary results with the 
> community and get feedback for adding avx512 support for popcount.
>
> Revisiting the previous discussion/improvements around this feature, I have 
> created a micro-benchmark based on the pg_popcount() in PostgreSQL's current 
> implementations for x86_64 using the newer AVX512 intrinsics. Playing with 
> this implementation has improved performance up to 46% on Intel's Sapphire 
> Rapids platform on AWS. Such gains will benefit scenarios relying on popcount.

How does this compare to older CPUs, and more mixed workloads? IIRC,
the use of AVX512 (which I believe this instruction to be included in)
has significant implications for core clock frequency when those
instructions are being executed, reducing overall performance if
they're not a large part of the workload.

> My setup:
>
> Machine: AWS EC2 m7i - 16vcpu, 64gb RAM
> OS : Ubuntu 22.04
> GCC: 11.4 and 12.3 with flags "-mavx -mavx512vpopcntdq -mavx512vl 
> -march=native -O2".
>
> 1. I copied the pg_popcount() implementation into a new C/C++ project using 
> cmake/make.
> a. Software only and
> b. SSE 64 bit version
> 2. I created an implementation using the following AVX512 intrinsics:
> a. _mm512_popcnt_epi64()
> b. _mm512_reduce_add_epi64()
> 3. I tested random bit streams from 64 MiB to 1024 MiB in length (5 sizes; 
> repeatable with RNG seed [std::mt19937_64])

Apart from the two type functions bytea_bit_count and bit_bit_count
(which are not accessed in postgres' own systems, but which could want
to cover bytestreams of >BLCKSZ) the only popcount usages I could find
were on objects that fit on a page, i.e. <8KiB in size. How does
performance compare for bitstreams of such sizes, especially after any
CPU clock implications are taken into account?
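
As a point of reference for such measurements, the comparison target is
roughly a word-at-a-time scalar loop like the one below (an assumption,
not PostgreSQL's actual pg_popcount() code; __builtin_popcountll is a
GCC/Clang builtin):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Portable word-at-a-time popcount baseline: any AVX512 variant has to
 * beat this on realistic, page-sized (<8KiB) inputs to be worthwhile.
 */
static uint64_t
popcount_buf(const char *buf, size_t len)
{
    uint64_t cnt = 0;

    while (len >= sizeof(uint64_t))
    {
        uint64_t word;

        memcpy(&word, buf, sizeof(word));   /* avoids alignment issues */
        cnt += (uint64_t) __builtin_popcountll(word);
        buf += sizeof(word);
        len -= sizeof(word);
    }
    while (len-- > 0)                       /* tail bytes */
        cnt += (uint64_t) __builtin_popcountll((unsigned char) *buf++);
    return cnt;
}
```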

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Synchronizing slots from primary to standby

2023-11-03 Thread Amit Kapila
On Thu, Nov 2, 2023 at 2:35 PM Zhijie Hou (Fujitsu)
 wrote:
>
> Here is the new version patch set(V29) which addressed Peter comments[1][2] 
> and
> fixed one doc compile error.
>

Few comments:
==
1.
+   
+failover (boolean)
+
+ 
+  Specifies whether the replication slot associated with the
subscription
+  is enabled to be synced to the physical standbys so that logical
+  replication can be resumed from the new primary after failover.
+  The default is true.

Why do you think it is a good idea to keep the default value as true?
I think the user needs to enable standby for syncing slots which is
not a default feature, so by default, the failover property should
also be false. AFAICS, it is false for create_slot SQL API as per the
below change; so that way also keeping default true for a subscription
doesn't make sense.
@@ -479,6 +479,7 @@ CREATE OR REPLACE FUNCTION
pg_create_logical_replication_slot(
 IN slot_name name, IN plugin name,
 IN temporary boolean DEFAULT false,
 IN twophase boolean DEFAULT false,
+IN failover boolean DEFAULT false,
 OUT slot_name name, OUT lsn pg_lsn)

BTW, the below change indicates that the code treats default as false;
so, it seems to be a documentation error.
@@ -157,6 +158,8 @@ parse_subscription_options(ParseState *pstate,
List *stmt_options,
  opts->runasowner = false;
  if (IsSet(supported_opts, SUBOPT_ORIGIN))
  opts->origin = pstrdup(LOGICALREP_ORIGIN_ANY);
+ if (IsSet(supported_opts, SUBOPT_FAILOVER))
+ opts->failover = false;

2.
-
 /*
  * Common option parsing function for CREATE and ALTER SUBSCRIPTION commands.
  *

Spurious line removal.

3.
+ else if (opts.slot_name && failover_enabled)
+ {
+ walrcv_alter_slot(wrconn, opts.slot_name, opts.failover);
+ ereport(NOTICE,
+ (errmsg("altered replication slot \"%s\" on publisher",
+ opts.slot_name)));
+ }

I think we can add a comment to describe why it makes sense to enable
the failover property of the slot in this case. Can we change the
notice message to: "enabled failover for replication slot \"%s\" on
publisher"

4.
 libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
- bool temporary, bool two_phase, CRSSnapshotAction snapshot_action,
- XLogRecPtr *lsn)
+ bool temporary, bool two_phase, bool failover,
+ CRSSnapshotAction snapshot_action, XLogRecPtr *lsn)
 {
  PGresult   *res;
  StringInfoData cmd;
@@ -913,7 +917,14 @@ libpqrcv_create_slot(WalReceiverConn *conn, const
char *slotname,
  else
  appendStringInfoChar(&cmd, ' ');
  }
-
+ if (failover)
+ {
+ appendStringInfoString(&cmd, "FAILOVER");
+ if (use_new_options_syntax)
+ appendStringInfoString(&cmd, ", ");
+ else
+ appendStringInfoChar(&cmd, ' ');
+ }

I don't see a corresponding change in repl_gram.y. I think the
following part of the code needs to be changed:
/* CREATE_REPLICATION_SLOT slot [TEMPORARY] LOGICAL plugin [options] */
| K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT
create_slot_options

You also need to update the docs for the same. See [1].

5.
@@ -228,6 +230,28 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo
fcinfo, bool confirm, bool bin
  NameStr(MyReplicationSlot->data.plugin),
  format_procedure(fcinfo->flinfo->fn_oid;
..
+ if (XLogRecPtrIsInvalid(upto_lsn))
+ wal_to_wait = end_of_wal;
+ else
+ wal_to_wait = Min(upto_lsn, end_of_wal);
+
+ /* Initialize standby_slot_names_list */
+ SlotSyncInitConfig();
+
+ /*
+ * Wait for specified streaming replication standby servers (if any)
+ * to confirm receipt of WAL upto wal_to_wait.
+ */
+ WalSndWaitForStandbyConfirmation(wal_to_wait);
+
+ /*
+ * The memory context used to allocate standby_slot_names_list will be
+ * freed at the end of this call. So free and nullify the list in
+ * order to avoid usage of freed list in the next call to this
+ * function.
+ */
+ SlotSyncFreeConfig();

What if there is an error in WalSndWaitForStandbyConfirmation() before
calling SlotSyncFreeConfig()? I think the problem you are trying to
avoid by freeing it here can occur. I think it is better to do this in
a logical decoding context and free the list along with it as we are
doing in commit c7256e6564 (see PG15). Also, it is better to allocate
this list somewhere in WalSndWaitForStandbyConfirmation(), probably in
WalSndGetStandbySlots, that will make the code look neat and also
avoid allocating this list when failover is not enabled for the slot.

6.
+/* ALTER_REPLICATION_SLOT slot */
+alter_replication_slot:
+ K_ALTER_REPLICATION_SLOT IDENT '(' generic_option_list ')'

I think you need to update the docs for this new command. See existing docs [1].

[1] - https://www.postgresql.org/docs/devel/protocol-replication.html

-- 
With Regards,
Amit Kapila.




Re: Intermittent failure with t/003_logical_slots.pl test on windows

2023-11-03 Thread Nisha Moond
On Thu, Nov 2, 2023 at 11:52 AM Kyotaro Horiguchi
 wrote:
>
> At Tue, 31 Oct 2023 18:11:48 +0530, vignesh C  wrote in
> > Few others are also facing this problem with similar code like in:
> > https://stackoverflow.com/questions/15882799/fgets-returning-error-for-file-returned-by-popen
>
> I'm inclined to believe that the pipe won't enter the EOF state until
> the target command terminates (and then the top-level cmd.exe). The
> target command likely terminated prematurely due to an error before
> printing any output.
>
> If we append "2>&1" to the command line, we can capture the error
> message through the returned pipe if any. Such error messages will
> cause the subsequent code to fail with an error such as "unexpected
> string: 'the output'". I'm not sure, but if this is permissible, the
> returned error messages could potentially provide insight into the
> underlying issue, paving the way for a potential solution.
>

Appending '2>&1' test:
The command still returns NULL and fails because no data is returned,
which means not even an error message comes back. The error log with
'2>&1' appended is:

no data was returned by command
""D:/Project/pg1/postgres/tmp_install/bin/pg_resetwal" -V 2>&1"

check for "D:/Project/pg1/postgres/tmp_install/bin/pg_resetwal"
failed: cannot execute
Failure, exiting

Further observations:
1. To make sure the forked process completes before fgets(), I tested
with Sleep(100) before the fgets() call.
...
...
if ((pgver = popen(cmd, "r")) == NULL)
{
perror("popen failure");
return NULL;
}

errno = 0;
Sleep(100);
if (fgets(line, maxsize, pgver) == NULL)
{
if (feof(pgver))
fprintf(stderr, "no data was returned by command \"%s\"\n", cmd);
...
...

This also doesn't resolve the issue; the error is still seen intermittently.

2. Even though fgets() fails, the output is still captured in the
'line' string.
Tested by printing 'line' in the failure case:
...
...
if ((pgver = popen(cmd, "r")) == NULL)
{
perror("popen failure");
return NULL;
}

errno = 0;
if (fgets(line, maxsize, pgver) == NULL)
{
if (line)
  fprintf(stderr, "cmd output - %s\n", line);

if (feof(pgver))
  fprintf(stderr, "no data was returned by command \"%s\"\n", cmd);
…
…
And the log looks like -
cmd output - postgres (PostgreSQL) 17devel
no data was returned by command
""D:/Project/pg1/postgres/tmp_install/bin/pg_controldata" -V"

check for "D:/Project/pg1/postgres/tmp_install/bin/pg_controldata"
failed: cannot execute
Failure, exiting
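
For what it's worth, a more defensive read loop might help distinguish a
transient failure from a genuinely empty pipe. The following is only a
sketch (the pipe_gets() helper is hypothetical, not part of PostgreSQL),
under the assumption that the intermittent NULL comes from a spurious
error indication on the stream rather than truly missing data:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical defensive replacement for the bare fgets() call: retry
 * while the failure looks transient (EINTR), and distinguish a true
 * EOF from a stream error before giving up.
 */
static char *
pipe_gets(char *line, int maxsize, FILE *pgver)
{
	for (;;)
	{
		errno = 0;
		if (fgets(line, maxsize, pgver) != NULL)
			return line;		/* got a line of output */

		if (ferror(pgver) && errno == EINTR)
		{
			clearerr(pgver);	/* interrupted system call: retry */
			continue;
		}

		if (feof(pgver))
			fprintf(stderr, "no data was returned (EOF)\n");
		else
			fprintf(stderr, "pipe read error: %s\n", strerror(errno));
		return NULL;
	}
}
```

If the buffer still ends up populated while fgets() reports failure, as
in observation 2 above, logging ferror() and errno at the failure point
would at least show whether the stream error flag is actually being set.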

Attached test result log for the same - "regress_log_003_logical_slots".

Thanks,
Nisha Moond


regress_log_003_logical_slots
Description: Binary data


Re: Regression on pg_restore to 16.0: DOMAIN not available to SQL function

2023-11-03 Thread Andrew Dunstan



On 2023-11-03 Fr 06:17, Mark Hills wrote:

I'm having errors restoring with pg_restore to v16.0, it appears to be a
regression or bug. The same file restored to v15.4 without problem.

During the restore:

   pg_restore: error: could not execute query: ERROR:  type "hash" does not 
exist
   LINE 7: )::hash;
   [...]
   CONTEXT:  SQL function "gen_hash" during inlining

It prompted me to separate the restore into steps:

* An initial "--schema-only" completes
* The "--data-only" when the error takes place

I also double-checked for no mismatch of client/server etc.

For now, I can use 15.4 for this one-off task so will have to kick this
can down the road.

But I think it worth reporting that something in 16.0 appears to be
failing on valid data (or maybe there is an incompatibility with a dump
from 13.5?)



In general you should use pg_dump from the version you want to restore 
into. Dumps from earlier versions might work in some cases, but there is 
no guarantee.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PoC] Federated Authn/z with OAUTHBEARER

2023-11-03 Thread Shlok Kyal
Hi,

On Fri, 3 Nov 2023 at 17:14, Jacob Champion  wrote:
>
> v12 implements a first draft of a client hook, so applications can
> replace either the device prompt or the entire OAuth flow. (Andrey and
> Mahendrakar: hopefully this is close to what you need.) It also cleans
> up some of the JSON tech debt.

I went through CFbot and found that build is failing, links:

https://cirrus-ci.com/task/6061898244816896
https://cirrus-ci.com/task/6624848198238208
https://cirrus-ci.com/task/5217473314684928
https://cirrus-ci.com/task/6343373221527552

Just want to make sure you are aware of these failures.

Thanks,
Shlok Kumar Kyal




Re: RFC: Logging plan of the running query

2023-11-03 Thread Ashutosh Bapat
I have following questions related to the functionality. (Please point
me to the relevant discussion if this has been already discussed.)

1. When a backend is running nested queries, we will see the plan of
the innermost query. That query may not be the actual culprit if the
user query is running slowly, e.g. a query being run as part of the
inner side of a nested loop when the nested loop itself is the
bottleneck. I think it would be useful to print the plans of the whole
query stack.

2. When a query is running in a parallel worker, do we want to print
that query? It may or may not be interesting, depending on the
situation. If the overall plan itself is faulty, the output of the
parallel worker query is not useful. If the plan is fine but a given
worker's leg is running slowly, it may be interesting.

As a side note, you may want to fix the indentation in
ExplainAssembleLogOutput().

On Fri, Oct 27, 2023 at 6:24 PM Étienne BERSAC
 wrote:
>
> > Hi,
> >
>
> > If we use client log message, that message is shown on the target
> > process whose pid is specified by the parameter of
> > pg_log_query_plan():
> >
> >(pid:1000)=# select pg_sleep(60);
> >(pid:1001)=# select pg_log_query_plan(1000);
> >(pid:1000)=# LOG:  query plan running on backend with PID 1000 is:
> > Query Text: select pg_sleep(1000);
> > Result  (cost=0.00..0.01 rows=1 width=4)
> >   Output: pg_sleep('1000'::double precision)
> >
> > I think this is not an expected behavior and we set elevel to
> > LOG_SERVER_ONLY.
>
>
> Makes sens. Thanks.

-- 
Best Wishes,
Ashutosh Bapat




Re: Regression on pg_restore to 16.0: DOMAIN not available to SQL function

2023-11-03 Thread David G. Johnston
On Friday, November 3, 2023, Mark Hills  wrote:

>
>   pg_restore: error: could not execute query: ERROR:  type "hash" does not
> exist
>   LINE 7: )::hash;
>   [...]
>   CONTEXT:  SQL function "gen_hash" during inlining
>
> --
> -- Relevant SQL declarations
> --
>

Those were not all of the relevant SQL declarations.  In particular you
haven’t shown where in your schema the gen_hash gets called.

Odds are you’ve violated a “cannot execute queries in …” rule in something
like a generated column or a check expression.  That it didn’t fail before
now is just a fluke.

I seem to recall another recent report of this for v16 that goes into more
detail.

David J.


Re: meson documentation build open issues

2023-11-03 Thread Christoph Berg
Re: Peter Eisentraut
> > "meson compile" doesn't seem to build the docs by default ( see 
> > ),
> > and I'd rather it didn't, building the docs is a separate and optional
> > step for the buildfarm.
> 
> You can control this with the "docs" option for meson, as of recently.

I've been looking into switching the Debian PG 17 build to meson, but
I'm running into several problems.

* The docs are still not built by default, and -Ddocs=enabled doesn't
  change that

* None of the "build docs" targets are documented in install-meson.html

* "ninja -C build alldocs" works, but it's impossible to see what
  flavors it's actually building. Everything is autodetected, and
  perhaps I would like to not build the .txt/something variants,
  but I have no idea what switch that is, or what package I have to
  uninstall so it's not autodetected (only html and pdf are
  documented).

  Are there any other targets for the individual formats? (I could
  probably use one for the manpages only, without the html.)

Non-doc issues:

* LLVM is off by default (ok), when I enable it with -Dllvm=auto, it
  gets detected, but no .bc files are built, nor installed

* selinux is not autodetected. It needs -Dselinux=auto, but that's not
  documented in install-meson.html

* There is no split between libdir and pkglibdir. We had used that in
  the past for libpq -> /usr/lib/x86_64-linux-gnu and PG stuff ->
  /usr/lib/postgresql/17/lib.

Christoph




Re: Regression on pg_restore to 16.0: DOMAIN not available to SQL function

2023-11-03 Thread Tom Lane
Mark Hills  writes:
> I'm having errors restoring with pg_restore to v16.0, it appears to be a 
> regression or bug. The same file restored to v15.4 without problem.

> During the restore:

>   pg_restore: error: could not execute query: ERROR:  type "hash" does not 
> exist
>   LINE 7: )::hash;
>   [...]
>   CONTEXT:  SQL function "gen_hash" during inlining

It looks like your gen_hash() function is not proof against being
run with a minimal search_path, which is how the restore script
would call it.  However, then it's not clear why it would've worked
in 15.4 which does the same thing.  I wonder whether you are
using this function in a column default for the troublesome
table.  If so, the discrepancy might be explained by this
fix that I just got done writing a 16.1 release note for:



 
  In COPY FROM, avoid evaluating column default
  values that will not be needed by the command (Laurenz Albe)
 

 
  This avoids a possible error if the default value isn't actually
  valid for the column.  Previous releases did not fail in this edge
  case, so prevent v16 from doing so.
 



> It prompted me to separate the restore into steps:
> * An initial "--schema-only" completes
> * The "--data-only" when the error takes place

Uh, *what* prompted you to do that?  By and large, separating a
restore into two steps creates more problems than it solves.

regards, tom lane




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-11-03 Thread Bharath Rupireddy
On Fri, Nov 3, 2023 at 12:35 PM Jeff Davis  wrote:
>
> On Thu, 2023-11-02 at 22:38 +0530, Bharath Rupireddy wrote:
> > > I suppose the question is: should reading from the WAL buffers an
> > > intentional thing that the caller does explicitly by specific
> > > callers?
> > > Or is it an optimization that should be hidden from the caller?
> > >
> > > I tend toward the former, at least for now.
> >
> > Yes, it's an optimization that must be hidden from the caller.
>
> As I said, I tend toward the opposite: that specific callers should
> read from the buffers explicitly in the cases where it makes sense.

How about adding a bool flag (read_from_wal_buffers) to
XLogReaderState so that the callers can set it if they want this
facility via XLogReaderAllocate()?

> > At any given point of time, WAL buffer pages are maintained as a
> > circularly sorted array in an ascending order from
> > OldestInitializedPage to InitializedUpTo (new pages are inserted at
> > this end).
>
> I don't see any reference to OldestInitializedPage or anything like it,
> with or without your patch. Am I missing something?

OldestInitializedPage is introduced in v14-0001 patch. Please have a look.

> > - Read 6 pages starting from LSN 80. Nothing is read from WAL buffers
> > as the page at LSN 80 doesn't exist despite other 5 pages starting
> > from LSN 90 exist.
>
> This is what I imagined a "partial hit" was: read the 5 pages starting
> at 90. The caller would then need to figure out how to read the page at
> LSN 80 from the segment files.
>
> I am not saying we should support this case; perhaps it doesn't matter.
> I'm just describing why that term was confusing for me.

Okay. Current patch doesn't support this case.

> > If a caller asks for an unflushed WAL read
> > intentionally or unintentionally, XLogReadFromBuffers() reads only 4
> > pages starting from LSN 150 to LSN 180 and will leave the remaining 2
> > pages for the caller to deal with. This is the partial hit that can
> > happen.
>
> To me that's more like an EOF case. "Partial hit" sounds to me like the
> data exists but is not available in the cache (i.e. go to the segment
> files); whereas if it encountered the end, the data is not available at
> all.

Right. We can tweak the comments around "partial hit" if required.
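
To make the distinction concrete, here is a tiny standalone sketch of
the contract being discussed (toy units and a hypothetical helper, not
patch code): a short return count means the remaining pages lie beyond
the flushed point and are left for the caller, not that cached data was
skipped over.

```c
#include <stdint.h>

/*
 * Toy model of the read contract: serve up to npages pages starting at
 * startptr from the buffers, but never past flushed_upto.  Returns the
 * number of pages served; the caller deals with the rest.  Names and
 * units are illustrative only.
 */
static int
pages_readable_from_buffers(uint64_t startptr, int npages,
							uint64_t flushed_upto, uint64_t page_size)
{
	int			i;

	for (i = 0; i < npages; i++)
	{
		uint64_t	page_end = startptr + (uint64_t) (i + 1) * page_size;

		if (page_end > flushed_upto)
			break;				/* unflushed tail: leave it to the caller */
	}
	return i;
}
```

With the numbers from the example quoted above (pages of 10 units
starting at LSN 150, flush pointer at 190), this serves 4 pages, i.e.
LSN 150 through 180, and leaves the remaining 2 for the caller.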

> > WALBufMappingLock protects both xlblocks and WAL buffer pages [1][2].
> > I'm not sure how using the memory barrier, not WALBufMappingLock,
> > prevents writers from replacing WAL buffer pages while readers
> > reading
> > the pages.
>
> It doesn't *prevent* that case, but it does *detect* that case. We
> don't want to prevent writers from replacing WAL buffers, because that
> would mean we are slowing down the critical WAL writing path.
>
> Let me explain the potential problem cases, and how the barriers
> prevent them:
>
> Potential problem 1: the page is not yet resident in the cache at the
> time the memcpy begins. In this case, the first read barrier would
> ensure that the page is also not yet resident at the time xlblocks[idx]
> is read into endptr1, and we'd break out of the loop.
>
> Potential problem 2: the page is evicted before the memcpy finishes. In
> this case, the second read barrier would ensure that the page was also
> evicted before xlblocks[idx] is read into endptr2, and again we'd
> detect the problem and break out of the loop.

Understood.

> I assume here that, if xlblocks[idx] holds the endPtr of the desired
> page, all of the bytes for that page are resident at that moment. I
> don't think that's true right now: AdvanceXLInsertBuffers() zeroes the
> old page before updating xlblocks[nextidx].

Right.

> I think it needs something like:
>
>   pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], InvalidXLogRecPtr);
>   pg_write_barrier();
>
> before the MemSet.

I think it works. First, xlblocks needs to be turned into an array of
64-bit atomics, and then the above change can be applied. With this, all
code that reads xlblocks, with or without WALBufMappingLock, also needs
to check whether xlblocks[idx] is InvalidXLogRecPtr and take appropriate
action.

I'm sure you have seen the following. It looks like I'm leaning
towards the claim that it's safe to read xlblocks without
WALBufMappingLock. I'll put up a patch for these changes separately.

/*
 * Make sure the initialization of the page becomes visible to others
 * before the xlblocks update. GetXLogBuffer() reads xlblocks without
 * holding a lock.
 */
pg_write_barrier();

*((volatile XLogRecPtr *) &XLogCtl->xlblocks[nextidx]) = NewPageEndPtr;

I think the three things that allow reading from WAL buffers without
WALBufMappingLock are: 1) the couple of read barriers in
XLogReadFromBuffers(), 2) atomically initializing xlblocks[idx] to
InvalidXLogRecPtr plus a write barrier in AdvanceXLInsertBuffer(), and
3) the following sanity check to see if the read page is valid in
XLogReadFromBuffers(). If that sounds sensible, I'll work towards coding
it up. Thoughts?
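
As a self-contained illustration of the ordering under discussion, the
sketch below uses C11 atomics and fences standing in for
pg_read_barrier()/pg_write_barrier(); the names and structure are
simplified (a single page, no circular array), and the formal C11
treatment of this seqlock-style pattern is subtler than shown. The
writer invalidates, initializes, then publishes; the reader checks,
copies, then re-checks.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 8192
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef struct
{
	_Atomic XLogRecPtr endptr;	/* end+1 LSN of the page held in buf */
	char		buf[PAGE_SZ];
} WalPage;

/* Writer (cf. AdvanceXLInsertBuffer): invalidate, zero, then publish. */
static void
replace_page(WalPage *p, XLogRecPtr new_endptr)
{
	atomic_store_explicit(&p->endptr, InvalidXLogRecPtr,
						  memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);	/* ~ pg_write_barrier() */
	memset(p->buf, 0, PAGE_SZ);					/* initialize the new page */
	atomic_thread_fence(memory_order_seq_cst);	/* ~ pg_write_barrier() */
	atomic_store_explicit(&p->endptr, new_endptr,
						  memory_order_release);
}

/* Reader (cf. XLogReadFromBuffers): check, copy, re-check. */
static bool
read_page(WalPage *p, XLogRecPtr expected_endptr, char *dst)
{
	XLogRecPtr	endptr1,
				endptr2;

	endptr1 = atomic_load_explicit(&p->endptr, memory_order_acquire);
	if (endptr1 != expected_endptr)
		return false;			/* page not resident: "problem 1" */

	atomic_thread_fence(memory_order_seq_cst);	/* ~ pg_read_barrier() */
	memcpy(dst, p->buf, PAGE_SZ);
	atomic_thread_fence(memory_order_seq_cst);	/* ~ pg_read_barrier() */

	endptr2 = atomic_load_explicit(&p->endptr, memory_order_acquire);
	if (endptr2 != expected_endptr)
		return false;			/* page replaced mid-copy: "problem 2" */

	return true;				/* the copy is consistent */
}
```

The key point is that the writer never lets a stale endptr coexist with
new page contents, so a reader that sees the expected endptr both before
and after its memcpy knows the copy was not torn.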


Re: remaining sql/json patches

2023-11-03 Thread Nikita Malakhov
Hi!

I read Tom Lane's note in the previous discussion (quite long, so I'd
missed it) on the pg_proc column:

>I strongly recommend against having a new pg_proc column at all.
>I doubt that you really need it, and having one will create
>enormous mechanical burdens to making the conversion.  (For example,
>needing a catversion bump every time we convert one more function,
>or an extension version bump to convert extensions.)

so we should figure out another way to do it.

Regards,
--
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: [PATCH] Add XMLText function (SQL/XML X038)

2023-11-03 Thread Daniel Gustafsson
> On 25 Aug 2023, at 17:40, Jim Jones  wrote:
> On 25.08.23 16:49, Vik Fearing wrote:

>> I do not think this should be addressed in this patch because there are 
>> quite a lot of functions that need to handle this.
> 
> v4 attached.

I had a look at v4 of this patch and apart from pgindenting and moving the
function within xml.c next to xmlcomment() I think this is ready.

Just like Vik says upthread we can't really claim X038 conformance without a
disclaimer, so I've added a 0002 which adds this to the XML spec conformance
page in the docs.

The attached v5 contains the above mentioned changes.  I've marked this ready
for committer in the CF app as well.

--
Daniel Gustafsson



v5-0001-Add-XMLText-function-SQL-XML-X038.patch
Description: Binary data


v5-0002-doc-Add-a-note-about-not-supporting-XML-SEQUENCE.patch
Description: Binary data


Re: Regression on pg_restore to 16.0: DOMAIN not available to SQL function

2023-11-03 Thread Mark Hills
On Fri, 3 Nov 2023, Tom Lane wrote:

> Mark Hills  writes:
> > I'm having errors restoring with pg_restore to v16.0, it appears to be a 
> > regression or bug. The same file restored to v15.4 without problem.
> 
> > During the restore:
> 
> >   pg_restore: error: could not execute query: ERROR:  type "hash" does not 
> > exist
> >   LINE 7: )::hash;
> >   [...]
> >   CONTEXT:  SQL function "gen_hash" during inlining
> 
> It looks like your gen_hash() function is not proof against being
> run with a minimal search_path, which is how the restore script
> would call it.

Yes, that makes sense.

I suppose I didn't expect these functions to be invoked at all on 
pg_restore, as seems to have been the case before, because...

> However, then it's not clear why it would've worked
> in 15.4 which does the same thing.  I wonder whether you are
> using this function in a column default for the troublesome
> table.

Yes, it's just a simple DEFAULT:

  CREATE TABLE authentic (
   key hash NOT NULL UNIQUE DEFAULT gen_hash(32),

and so every row would have a value.

> If so, the discrepancy might be explained by this fix that I just got 
> done writing a 16.1 release note for:
> 
> 
> 
>  
>   In COPY FROM, avoid evaluating column default
>   values that will not be needed by the command (Laurenz Albe)
>  
> 
>  
>   This avoids a possible error if the default value isn't actually
>   valid for the column.  Previous releases did not fail in this edge
>   case, so prevent v16 from doing so.
>  
> 

Indeed, that looks like a good match to this issue.

Is there a thread or link for this? Interested in the positive change that 
had this side effect.

And I think this could imply that v16 can pg_dump a data set which it
itself cannot restore? I imagine that might be considered a more serious
bug. It only needs a column default that invokes another function or
type, and that would seem relatively common.

> > It prompted me to separate the restore into steps:
> > * An initial "--schema-only" completes
> > * The "--data-only" when the error takes place
> 
> Uh, *what* prompted you to do that?  By and large, separating a
> restore into two steps creates more problems than it solves.

Only to try and bisect the problem in some way to try and make a 
reasonable bug report :)

Thanks

-- 
Mark




Re: [PATCH] Add XMLText function (SQL/XML X038)

2023-11-03 Thread Vik Fearing

On 11/3/23 16:30, Daniel Gustafsson wrote:

On 25 Aug 2023, at 17:40, Jim Jones  wrote:


Just like Vik says upthread we can't really claim X038 conformance without a
disclaimer, so I've added a 0002 which adds this to the XML spec conformance
page in the docs.



We should put a short version of the disclaimer in sql_features.txt as well.
--
Vik Fearing





Re: Regression on pg_restore to 16.0: DOMAIN not available to SQL function

2023-11-03 Thread Tom Lane
Mark Hills  writes:
> On Fri, 3 Nov 2023, Tom Lane wrote:
>> However, then it's not clear why it would've worked
>> in 15.4 which does the same thing.  I wonder whether you are
>> using this function in a column default for the troublesome
>> table.

> Yes, it's just a simple DEFAULT:

>   CREATE TABLE authentic (
>key hash NOT NULL UNIQUE DEFAULT gen_hash(32),

> and so every row would have a value.

Right, so the 910eb61b2 fix explains it.  I guess I'd better
expand the release note entry, because we'd not foreseen this
particular failure mode.

> Is there a thread or link for this? Interested in the positive change that 
> had this side effect.

You could look at the commit:

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=910eb61b2

Our modern practice is for the commit log message to link to the mailing
list discussion that led up to the commit, and this one does so:

> Discussion: 
> https://postgr.es/m/75a7b7483aeb331aa017328d606d568fc715b90d.ca...@cybertec.at

That message says

>> Bisecting shows that the regression was introduced by commit 9f8377f7a2,
>> which introduced DEFAULT values for COPY FROM.

regards, tom lane




Re: [PATCH] Add XMLText function (SQL/XML X038)

2023-11-03 Thread Jim Jones
Hi Daniel, hi Vik,

Thanks a lot for the review!

On 03.11.23 16:45, Vik Fearing wrote:
> We should put a short version of the disclaimer in sql_features.txt as
> well.
You mean to add a disclaimer in the X038 entry? Something along these
lines perhaps?

X038    XMLText            YES     It does not address the RETURNING
clause, as it is not supported in PostgreSQL.

Jim




Re: meson documentation build open issues

2023-11-03 Thread Andres Freund
Hi,

On 2023-11-03 15:26:05 +0100, Christoph Berg wrote:
> Re: Peter Eisentraut
> > > "meson compile" doesn't seem to build the docs by default ( see 
> > > ),
> > > and I'd rather it didn't, building the docs is a separate and optional
> > > step for the buildfarm.
> >
> > You can control this with the "docs" option for meson, as of recently.
>
> I've been looking into switching the Debian PG 17 build to meson, but
> I'm running into several problems.
>
> * The docs are still not built by default, and -Ddocs=enabled doesn't
>   change that

Maybe I am missing something - they aren't built by default in autoconf
either?


> * None of the "build docs" targets are documented in install-meson.html

Hm, odd, I thought they were, but you are right. There were some docs patches
that we never really could find agreement upon :/


> * "ninja -C build alldocs" works, but it's impossible to see what
>   flavors it's actually building. Everything is autodetected, and
>   perhaps I would like to not build the .txt/something variants,
>   but I have no idea what switch that is, or what package I have to
>   uninstall so it's not autodetected (only html and pdf are
>   documented).

I think a package build should probably turn off auto-detection (
  meson setup --auto-features=disabled) and enable specific features that are
desired - in which case you get errors if they are not available. Which
presumably is the behaviour you'd like?



>   Are there any other targets for the individual formats? (I could
>   probably use one for the manpages only, without the html.)

Yes, there are.
ninja doc/src/sgml/{postgres-A4.pdf,html,postgres.html,man1}

Perhaps more interesting for your purposes, there are the
install-doc-{html,man} targets.

I remember discussing adding doc-{html,man} targets alongside
install-doc-{html,man}, not sure why we ended up not doing that. I'd be in
favor of adding them.

I've also been wondering about a 'help' target that documents important
targets in an interactively usable way.


> Non-doc issues:
>
> * LLVM is off by default (ok), when I enable it with -Dllvm=auto, it
>   gets detected, but no .bc files are built, nor installed

Support for that has not yet been merged.


> * selinux is not autodetected. It needs -Dselinux=auto, but that's not
>   documented in install-meson.html

Uh, huh. There's no documentation for --with-selinux in the installation.sgml
either, just in sepgsql.sgml. So when the relevant docs got translated to
meson, -Dselinux= wasn't documented either.


> * There is no split between libdir and pkglibdir. We had used that in
>   the past for libpq -> /usr/lib/x86_64-linux-gnu and PG stuff ->
>   /usr/lib/postgresql/17/lib.

I don't think the autoconf build currently exposes separately configuring
pkglibdir either, I think that's a debian patch? I'm entirely open to adding
an explicit configuration option for this though.


Thanks for looking at this, it's quite helpful!

Greetings,

Andres Freund




Re: MergeJoin beats HashJoin in the case of multiple hash clauses

2023-11-03 Thread Tomas Vondra
On 9/11/23 10:04, Lepikhov Andrei wrote:
> 
> 
> On Mon, Sep 11, 2023, at 11:51 AM, Andy Fan wrote:
>> Hi, 
>>
>> On Thu, Jun 15, 2023 at 4:30 PM Andrey Lepikhov 
>>  wrote:
>>> Hi, all.
>>>
>>> Some of my clients use JOIN's with three - four clauses. Quite 
>>> frequently, I see complaints on unreasonable switch of JOIN algorithm to 
>>> Merge Join instead of Hash Join. Quick research have shown one weak 
>>> place - estimation of an average bucket size in final_cost_hashjoin (see 
>>> q2.sql in attachment) with very conservative strategy.
>>> Unlike estimation of groups, here we use smallest ndistinct value across 
>>> all buckets instead of multiplying them (or trying to make multivariate 
>>> analysis).
>>> It works fine for the case of one clause. But if we have many clauses, 
>>> and if each has high value of ndistinct, we will overestimate average 
>>> size of a bucket and, as a result, prefer to use Merge Join. As the 
>>> example in attachment shows, it leads to worse plan than possible, 
>>> sometimes drastically worse.
>>> I assume, this is done with fear of functional dependencies between hash 
>>> clause components. But as for me, here we should go the same way, as 
>>> estimation of groups.
>>

Yes, this analysis is correct - final_cost_hashjoin assumes the clauses
may be correlated (not necessarily by functional dependencies, just that
the overall ndistinct is not a simple product of per-column ndistincts).

And it even says so in the comment before calculating bucket size:

 * Determine bucketsize fraction and MCV frequency for the inner
 * relation. We use the smallest bucketsize or MCV frequency estimated
 * for any individual hashclause; this is undoubtedly conservative.

I'm sure this may lead to inflated cost for "good" cases (where the
actual bucket size really is a product), which may push the optimizer to
use the less efficient/slower join method.

Unfortunately, AFAICS the patch simply assumes the extreme in the
opposite direction - it assumes each clause splits the bucket for each
distinct value in the column, which works great when that's true, but
surely it'd have issues when the columns are correlated?

I think this deserves more discussion, i.e. what happens if the
assumptions do not hold? We know what happens for the conservative
approach, but what's the worst thing that would happen for the
optimistic one?
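
To see how far apart the two extremes are, here is a toy calculation
(not planner code; the helper functions and numbers are made up for
illustration) of the expected bucket-size fraction for a two-clause
join:

```c
/*
 * Toy estimate of the fraction of inner rows falling into one hash
 * bucket.  The conservative estimate uses only the clause with the
 * largest ndistinct (smallest per-clause fraction); the optimistic one
 * multiplies the ndistincts, assuming the columns are independent.
 * Both are clamped so a bucket is never finer-grained than the hash
 * table itself.
 */
static double
clamp_fraction(double frac, double nbuckets)
{
	return (frac < 1.0 / nbuckets) ? 1.0 / nbuckets : frac;
}

static double
conservative_bucket_fraction(double nd1, double nd2, double nbuckets)
{
	double		nd_max = (nd1 > nd2) ? nd1 : nd2;

	return clamp_fraction(1.0 / nd_max, nbuckets);
}

static double
optimistic_bucket_fraction(double nd1, double nd2, double nbuckets)
{
	return clamp_fraction(1.0 / (nd1 * nd2), nbuckets);
}
```

With nd1 = nd2 = 1000 and 2^20 buckets, the conservative estimate
(1/1000) is a thousand times larger than the optimistic one (1e-6),
which is roughly the kind of gap in expected bucket cost that can flip
the plan choice between Hash Join and Merge Join.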

I doubt we can simply switch from the conservative approach to the
optimistic one. Yes, it'll make some queries faster, but for other
queries it will likely cause problems and slowdowns.


IMHO the only principled way forward is to get a better ndistinct
estimate (which this implicitly does), perhaps by using extended
statistics. I haven't tried, but I guess it'd need to extract the
clauses for the inner side, and call estimate_num_groups() on it.


This however reminds me we don't use extended statistics for join
clauses at all. Which means that even with accurate extended statistics,
we can still get stuff like this for multiple join clauses:

   Hash Join  (cost=1317.00..2386.00 rows=200 width=24)
  (actual time=85.781..8574.784 rows=800 loops=1)

This is unrelated to the issue discussed here, of course, as it won't
affect join method selection for that join. But it certainly will affect
all estimates/costs above that join, which can be pretty disastrous.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Version 14/15 documentation Section "Alter Default Privileges"

2023-11-03 Thread Bruce Momjian
On Sat, Oct 28, 2023 at 11:01:59AM +0200, Michael Banck wrote:
> On Fri, Oct 27, 2023 at 05:49:42PM +0200, Laurenz Albe wrote:
> > True.  I have done that in the attached patch.
> > In this patch, it is mentioned *twice* that ALTER DEFAULT PRIVILEGES
> > only affects objects created by the current user.  I thought that
> > would not harm, but if it is too redundant, I can remove the second
> > mention.
> 
> I think it is fine, and I have marked the patch as ready-for-committer.
> 
> I think it should be applied to all branches, not just 14/15 as
> mentioned in the subject.

I have developed the attached patch on top of the alter default patch I
just applied.  It is more radical, making FOR ROLE clearer, and also
moving my new FOR ROLE text up to the first paragraph, and reordering
the paragraphs to be clearer.

I think this is too radical for backpatch to 11/12, but I think
16/master makes sense after the minor releases next week.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/ref/alter_default_privileges.sgml b/doc/src/sgml/ref/alter_default_privileges.sgml
index 8a6006188d..43fd2c3888 100644
--- a/doc/src/sgml/ref/alter_default_privileges.sgml
+++ b/doc/src/sgml/ref/alter_default_privileges.sgml
@@ -90,23 +90,14 @@ REVOKE [ GRANT OPTION FOR ]
   
ALTER DEFAULT PRIVILEGES allows you to set the privileges
that will be applied to objects created in the future.  (It does not
-   affect privileges assigned to already-existing objects.)  Currently,
-   only the privileges for schemas, tables (including views and foreign
-   tables), sequences, functions, and types (including domains) can be
-   altered.  For this command, functions include aggregates and procedures.
-   The words FUNCTIONS and ROUTINES are
-   equivalent in this command.  (ROUTINES is preferred
-   going forward as the standard term for functions and procedures taken
-   together.  In earlier PostgreSQL releases, only the
-   word FUNCTIONS was allowed.  It is not possible to set
-   default privileges for functions and procedures separately.)
-  
-
-  
-   You can change default privileges only for objects that will be created by
-   yourself or by roles that you are a member of.  The privileges can be set
-   globally (i.e., for all objects created in the current database),
-   or just for objects created in specified schemas.
+   affect privileges assigned to already-existing objects.)  
+   As a non-superuser, you can change default privileges only for yourself
+   and for roles that you are a member of.  These privileges are not
+   inherited, so member roles must use SET ROLE to
+   access these privileges, or ALTER DEFAULT PRIVILEGES
+   must be run for each member role.  Privileges can be set globally
+   (i.e., for all objects created in the current database), or just for
+   objects created in specified schemas.
   
 
   
@@ -118,6 +109,19 @@ REVOKE [ GRANT OPTION FOR ]
ALTER DEFAULT PRIVILEGES.
   
 
+  
+   Currently,
+   only the privileges for schemas, tables (including views and foreign
+   tables), sequences, functions, and types (including domains) can be
+   altered.  For this command, functions include aggregates and procedures.
+   The words FUNCTIONS and ROUTINES are
+   equivalent in this command.  (ROUTINES is preferred
+   going forward as the standard term for functions and procedures taken
+   together.  In earlier PostgreSQL releases, only the
+   word FUNCTIONS was allowed.  It is not possible to set
+   default privileges for functions and procedures separately.)
+  
+
   
Default privileges that are specified per-schema are added to whatever
the global default privileges are for the particular object type.
@@ -136,12 +140,9 @@ REVOKE [ GRANT OPTION FOR ]
 target_role
 
  
-  The name of an existing role of which the current role is a member.
-  Default access privileges are not inherited, so member roles
-  must use SET ROLE to access these privileges,
-  or ALTER DEFAULT PRIVILEGES must be run for
-  each member role.  If FOR ROLE is omitted,
-  the current role is assumed.
+  If FOR ROLE is specified, this is the role that
+  will be assigned the new default privileges, or the current role
+  if not specified.
  
 



Re: Document aggregate functions better w.r.t. ORDER BY

2023-11-03 Thread Bruce Momjian
On Thu, Oct 26, 2023 at 04:05:12PM -0700, David G. Johnston wrote:
> On Thu, Oct 26, 2023 at 4:03 PM Bruce Momjian  wrote:
> 
> 
> Sure, done in the attached patch.
> 
> 
> 
> WFM.  Thank You!

Patch applied to master.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: [PATCH] Add XMLText function (SQL/XML X038)

2023-11-03 Thread Vik Fearing

On 11/3/23 17:14, Jim Jones wrote:

Hi Daniel, hi Vik,

Thanks a lot for the review!

On 03.11.23 16:45, Vik Fearing wrote:

We should put a short version of the disclaimer in sql_features.txt as
well.

You mean to add a disclaimer in the X038 entry? Something along these
lines perhaps?

X038    XMLText            YES     It does not address the , as it is not supported in
PostgreSQL.


I was thinking of something much shorter than that.  Such as

X038XMLText YES supported except for RETURNING
--
Vik Fearing





Re: meson documentation build open issues

2023-11-03 Thread Christoph Berg
Re: Andres Freund
> > > You can control this with the "docs" option for meson, as of recently.
> >
> > I've been looking into switching the Debian PG 17 build to meson, but
> > I'm running into several problems.
> >
> > * The docs are still not built by default, and -Ddocs=enabled doesn't
> >   change that
> 
> Maybe I am missing something - they aren't built by default in autoconf
> either?

True, but the documentation (and this thread) reads like it should. Or
at least it should, when I explicitly say -Ddocs=enabled.

What would also help is when the tail of the meson output had a list
of features that are enabled. There's the list of "External libraries"
which is quite helpful at figuring out what's still missing, but
perhaps this could be extended:

  Features
LLVM : YES (/usr/bin/llvm-config-16)
DOCS : YES (html pdf texinfo)

Atm it's hidden in the long initial blurb of "Checking for.." and the
"NO" in there don't really stand out as much, since some of them are
normal.

> > * "ninja -C build alldocs" works, but it's impossible to see what
> >   flavors it's actually building. Everything is autodetected, and
> >   perhaps I would like to not build the .txt/something variants,
> >   but I have no idea what switch that is, or what package I have to
> >   uninstall so it's not autodetected (only html and pdf are
> >   documented.)
> 
> I think a package build should probably turn off auto-detection (
>   meson setup --auto-features=disabled) and enable specific features that are
> desired - in which case you get errors if they are not available. Which
> presumably is the behaviour you'd like?

I'm still trying to figure out the best spot in that space of options.
Currently I'm still in the phase of getting it to work at all; the end
result might well use that option.

> >   Are there any other targets for the individual formats? (I could
> >   probably use one for the manpages only, without the html.)
> 
> Yes, there are.
> ninja doc/src/sgml/{postgres-A4.pdf,html,postgres.html,man1}

Oh, it was not obvious to me that this "make $some_file" style
command would work. (But it still leaves the problem of knowing which
targets there are.)

> Perhaps more interesting for your purposes, there are the
> install-doc-{html,man} targets.

Hmm, I thought I had tried these, but apparently managed to miss them.
Thanks.

install-doc-man seems to install "man1" only, though?
(It seems to compile man5/man7, but not install them.)

> I remember discussing adding doc-{html,man} targets alongside
> install-doc-{html,man}, not sure why we ended up not doing that. I'd be in
> favor of adding them.
> 
> I've also been wondering about a 'help' target that documents important
> targets in a interactively usable way.

That is definitely missing, yes. I found out about "alldocs" only
after reading the meson files, and that took more than it should have.

> > Non-doc issues:
> >
> > * LLVM is off by default (ok), when I enable it with -Dllvm=auto, it
> >   gets detected, but no .bc files are built, nor installed
> 
> Support for that has not yet been merged.

Oh, that's a showstopper. I thought meson would already be ready for
production use. There is indeed an "experimental" note in
install-requirements.html, but not in install-meson.html

> > * selinux is not autodetected. It needs -Dselinux=auto, but that's not
> >   documented in install-meson.html
> 
> Uh, huh. There's no documentation for --with-selinux in the installation.sgml
> either, just in sepgsql.sgml. So when the relevant docs got translated to
> meson, -Dselinux= wasn't documented either.

Ok. It does show up in "External libraries" and was enabled in the
Debian packages before.

Why isn't it "auto" like the others?

> > * There is no split between libdir and pkglibdir. We had used that in
> >   the past for libpq -> /usr/lib/x86_64-linux-gnu and PG stuff ->
> >   /usr/lib/postgresql/17/lib.
> 
> I don't think the autoconf build currently exposes separately configuring
> pkglibdir either, I think that's a debian patch? I'm entirely open to adding
> an explicit configuration option for this though.

That would definitely be helpful.

> Thanks for looking at this, it's quite helpful!

Thanks for the feedback!
Christoph




Re: brininsert optimization opportunity

2023-11-03 Thread Tomas Vondra
Hi,

I took a look at this patch today. I had to rebase the patch, due to
some minor bitrot related to 9f0602539d (but nothing major). I also did
a couple tiny cosmetic tweaks, but other than that the patch seems OK.
See the attached v6.

I did some simple performance tests too, similar to those in the initial
message:

  CREATE UNLOGGED TABLE heap (i int);
  CREATE INDEX ON heap USING brin(i) WITH (pages_per_range=1);
  --
  TRUNCATE heap;
  INSERT INTO heap SELECT 1 FROM generate_series(1, 2000);

And the results look like this (5 runs each):

  master:   16448.338  16066.473  16039.166  16067.420  16080.066
  patched:  13260.065  13229.800  13254.454  13265.479  13273.693

So that's a nice improvement, even though enabling WAL will make the
relative speedup somewhat smaller.

The one thing I'm not entirely sure about is adding new stuff to the
IndexAmRoutine. I don't think we want to end up with too many callbacks
that all AMs have to initialize etc. I can't think of a different/better
way to do this, though.
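For readers following along, the cache-then-cleanup shape the patch proposes can be sketched in isolation. The structs and names below are a toy stand-in for the sketch, not the actual PostgreSQL definitions:

```c
#include <stdlib.h>
#include <assert.h>

/* Toy illustration of the pattern: the insert callback lazily builds
 * per-command state cached in ii_AmCache, and a separate cleanup
 * callback releases it once all inserts are done. */
typedef struct InsertState
{
	int			ninserts;	/* stand-in for revmap access, BrinDesc, etc. */
} InsertState;

typedef struct IndexInfo
{
	void	   *ii_AmCache;	/* NULL until the first insert of the command */
} IndexInfo;

static void
my_aminsert(IndexInfo *indexInfo)
{
	InsertState *state = (InsertState *) indexInfo->ii_AmCache;

	if (state == NULL)			/* first call: pay the setup cost once */
	{
		state = calloc(1, sizeof(InsertState));
		indexInfo->ii_AmCache = state;
	}
	state->ninserts++;			/* ... the actual per-tuple work ... */
}

static void
my_aminsertcleanup(IndexInfo *indexInfo)
{
	InsertState *state = (InsertState *) indexInfo->ii_AmCache;

	if (state != NULL)
	{
		/* release resources a plain pfree of the cache would not cover */
		free(state);
		indexInfo->ii_AmCache = NULL;
	}
}
```

The patch itself does the analogous thing with BrinInsertState (bs_rmAccess, bs_desc) and the new aminsertcleanup member of IndexAmRoutine.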

Barring objections, I'll try to push this early next week, after another
round of cleanup.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL CompanyFrom b1c47527132f0e8c39ff221b6b5adcc9678ba6ce Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 3 Nov 2023 19:17:21 +0100
Subject: [PATCH v6] Reuse revmap and brin desc in brininsert

brininsert() used to have code that performed per-tuple initialization
of the revmap. That had some overhead.

To mitigate, we introduce a new AM callback to clean up state at the end
of all inserts, and perform the revmap cleanup there.
---
 contrib/bloom/blutils.c  |  1 +
 doc/src/sgml/indexam.sgml| 14 +-
 src/backend/access/brin/brin.c   | 74 +++-
 src/backend/access/gin/ginutil.c |  1 +
 src/backend/access/gist/gist.c   |  1 +
 src/backend/access/hash/hash.c   |  1 +
 src/backend/access/index/indexam.c   | 15 ++
 src/backend/access/nbtree/nbtree.c   |  1 +
 src/backend/access/spgist/spgutils.c |  1 +
 src/backend/executor/execIndexing.c  |  5 ++
 src/include/access/amapi.h   |  4 ++
 src/include/access/brin_internal.h   |  1 +
 src/include/access/genam.h   |  2 +
 13 files changed, 109 insertions(+), 12 deletions(-)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index f23fbb1d9e0..4830cb3fee6 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -131,6 +131,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = blbuild;
 	amroutine->ambuildempty = blbuildempty;
 	amroutine->aminsert = blinsert;
+	amroutine->aminsertcleanup = NULL;
 	amroutine->ambulkdelete = blbulkdelete;
 	amroutine->amvacuumcleanup = blvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 30eda37afa8..a36d2eaebe7 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -139,6 +139,7 @@ typedef struct IndexAmRoutine
 ambuild_function ambuild;
 ambuildempty_function ambuildempty;
 aminsert_function aminsert;
+aminsertcleanup_function aminsertcleanup;
 ambulkdelete_function ambulkdelete;
 amvacuumcleanup_function amvacuumcleanup;
 amcanreturn_function amcanreturn;   /* can be NULL */
@@ -359,11 +360,22 @@ aminsert (Relation indexRelation,
within an SQL statement, it can allocate space
in indexInfo->ii_Context and store a pointer to the
data in indexInfo->ii_AmCache (which will be NULL
-   initially).
+   initially). If the cached data contains structures that can simply be
+   pfree'd, they will implicitly be pfree'd. But if it contains more complex
+   data, such as Buffers or TupleDescs, additional cleanup is necessary. This
+   additional cleanup can be performed in aminsertcleanup.
   
 
   
 
+bool
+aminsertcleanup (IndexInfo *indexInfo);
+
+   Clean up state that was maintained across successive inserts in
+   indexInfo->ii_AmCache. This is useful for
+   complex cached data, such as Buffers or TupleDescs, which needs additional
+   cleanup, unlike simple structures that will be implicitly pfree'd.
+
 IndexBulkDeleteResult *
 ambulkdelete (IndexVacuumInfo *info,
   IndexBulkDeleteResult *stats,
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 25338a90e29..fd99b6eb11e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -58,6 +58,17 @@ typedef struct BrinBuildState
 	BrinMemTuple *bs_dtuple;
 } BrinBuildState;
 
+/*
+ * We use a BrinInsertState to capture running state spanning multiple
+ * brininsert invocations, within the same command.
+ */
+typedef struct BrinInsertState
+{
+	BrinRevmap *bs_rmAccess;
+	BrinDesc   *bs_desc;
+	BlockNumber bs_pages_per_range;
+}			BrinInsertState;
+
 /*
  * Struct used as "opaque" during index scans
  */
@@ -72,6 +83,7 @@ typedef struct BrinOpaque
 
 static BrinBuildState *initializ

Re: Pre-proposal: unicode normalized text

2023-11-03 Thread Jeff Davis
On Fri, 2023-11-03 at 21:01 +1300, David Rowley wrote:
> Thomas mentioned this to me earlier today. After looking I also
> concluded that unicode_category.c needed to be added to
> @pgcommonallfiles. After looking at the time, I didn't expect you to
> be around so opted just to push that to fix the MSVC buildfarm
> members.
> 
> Sorry for the duplicate effort and/or stepping on your toes.

Thank you, no apology necessary.

Regards,
Jeff Davis





Re: Pre-proposal: unicode normalized text

2023-11-03 Thread Jeff Davis
On Fri, 2023-11-03 at 17:11 +0700, John Naylor wrote:
> On Sat, Oct 28, 2023 at 4:15 AM Jeff Davis  wrote:
> > 
> > I plan to commit something like v3 early next week unless someone
> > else
> > has additional comments or I missed a concern.
> 
> Hi Jeff, is the CF entry titled "Unicode character general category
> functions" ready to be marked committed?

Done, thank you.

Regards,
Jeff Davis





Re: DRAFT GIST support for ORDER BY

2023-11-03 Thread Matthias van de Meent
On Mon, 30 Oct 2023 at 14:39, Michał Kłeczek  wrote:
>> On 30 Oct 2023, at 13:31, Matthias van de Meent 
>>  wrote:
>>
>>> The problem is though that right now handling of ORDER BY column clauses is 
>>> tightly coupled to BTree.
>>> It would be good to refactor the code so that semantics of ORDER BY column 
>>> could be more flexible.
>>
>> The existence of a BTREE operator class for the type is the indicator
>> that (and how) the type can be ordered - that is where PostgreSQL gets
>> its methods for ordering most types. Although I agree that it's a
>> quirk, I don't mind it that much as an indicator of how a type is
>> ordered.
>> I do agree, though, that operator classes by themselves should be able
>> to say "hey, we support full ordered retrieval as well". Right now,
>> that seems to be limited to btrees, but indeed a GiST index with
>> btree_gist columns should be able to support the same.
>
> Right now opfamily and strategy are set in PathKey before creating index scan 
> paths.
>
> The patch actually copies existing code from create_indexscan_plan
> that finds an operator OID for (pk_opfamily, pk_strategy).
> The operator is supposed to be binary with specific operand types.
>
> To create a path:
> 1) do the operator OID lookup as above
> 2) look for sortfamily of pg_amop entry for (operator did, index opfamily)
> If the sort family is the same as pk_opfamily we can create a path.
>
> The side effect is that it is possible to “ORDER BY column < ‘constant’” as 
> we have more ordering operators in pg_amop.
>
> Ideally we could look up _unary_ operator in pg_amop instead - that would 
> make sense we are actually measuring some “absolute distance”.
> But this would require more changes - createplan.c would need to decide when 
> to lookup unary and when - binary operator.

After researching this a bit more, I'm confused: If I register an opclass

CREATE OPERATOR CLASS gist_mytype_btree
DEFAULT FOR mytype USING gist
AS
OPERATOR 1 < (mytype, mytype) FOR ORDER BY mytype_ops, -- operator
<(mytype, mytype) returns bool
...
OPERATOR 15 <-> (mytype, mytype) FOR ORDER BY mytype_ops, --
operator <->(mytype, mytype) returns mytype
...

Then which order of values does the system expect the index to return
tuples in when either of these operators is applied?
Is that
  ORDER BY (index_column opr constant); but bool isn't the type
supported by the FOR ORDER BY opclass, or
  ORDER BY (index_column); but this makes no sense for distance operators.

After looking at get_relation_info() in optimizer/util/plancat.c, I
guess the difference is the difference between amhandler->amcanorder
vs amhandler->amcanorderbyop? But still it's not quite clear what the
implication for this is. Does it mean an index AM can either provide
natural ordering, or operator ordering, but not both?

>>> ORDER BY a == ORDER BY a <-> MIN_VALUE
>>> and
>>> ORDER BY a DESC == ORDER BY a <-> MAX_VALUE
>>>
>>> This allows implementing GIST ordered scans for btree_gist datatypes.
>>>
>>> This in turn makes using GIST with partitioning feasible (I have described 
>>> issues with such usage in my previous e-mails - see below).
>>
>> Did you take into account that GiST's internal distance function uses
>> floating point, and is thus only an approximation for values that
>> require more than 2^54 significant bits in their distance function?
>> For example, GiST wouldn't be guaranteed to yield correct ordering of
>> int8/bigint when you use `my_column <-> UINT64_MAX` because as far as
>> the floating point math is concerned, 0 is about as far away from
>> INT64_MAX as (say) 20 and -21.
>
> Hmm… Good point but it means ORDER BY <-> is broken for these types then?
> The patch assumes it works correctly and just uses it for ordered scans.

Huh, I didn't know this before, but apparently values are pushed onto
a reorderqueue/pairingheap if the index scan is marked
xs_recheckorderby (i.e. when the tuple order is not exact), which
would be used in this case.

So it seems like this wouldn't be much of an issue for the patch,
apart from the potential issue where this could use the pairingheap
much more than the usual ordered scan operations, which could result
in larger-than-normal memory usage. E.g. float btree ops wouldn't work
effectively at all because every reasonable value is extremely distant
from its max value.
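The precision hazard is easy to check in plain C: GiST's distance values are float8, and near INT64_MAX a double's granularity exceeds 1000, so distances that differ by 20 collapse to the same value. The helper name below is invented for the sketch:

```c
#include <stdint.h>
#include <assert.h>

/* Return 1 if the float8 distances |target - a| and |target - b|
 * collapse to the same double, i.e. float8 cannot tell them apart. */
static int
distances_collapse(int64_t target, int64_t a, int64_t b)
{
	double		da = (double) target - (double) a;
	double		db = (double) target - (double) b;

	return da == db;
}
```

With target = INT64_MAX, the values 0 and 20 yield the same double distance, while at small magnitudes (say, target = 1000) they remain distinct, so only the extreme ranges fall back on the recheck/pairingheap path.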

Kind regards,

Matthias van de Meent




Re: meson documentation build open issues

2023-11-03 Thread Andres Freund
Hi,

On 2023-11-03 19:19:17 +0100, Christoph Berg wrote:
> Re: Andres Freund
> > > > You can control this with the "docs" option for meson, as of recently.
> > >
> > > I've been looking into switching the Debian PG 17 build to meson, but
> > > I'm running into several problems.
> > >
> > > * The docs are still not built by default, and -Ddocs=enabled doesn't
> > >   change that
> >
> > Maybe I am missing something - they aren't built by default in autoconf
> > either?
>
> True, but the documentation (and this thread) reads like it should. Or
> at least it should, when I explicitly say -Ddocs=enabled.

My understanding of the intent of the options is to make meson error out if
the required dependencies are not available, not that it controls when the
build targets are built.

The reason for that is simply that the docs take too long to build.


> What would also help is when the tail of the meson output had a list
> of features that are enabled. There's the list of "External libraries"
> which is quite helpful at figuring out what's still missing, but
> perhaps this could be extended:
>
>   Features
> LLVM : YES (/usr/bin/llvm-config-16)
> DOCS : YES (html pdf texinfo)
>
> Atm it's hidden in the long initial blurb of "Checking for.." and the
> "NO" in there don't really stand out as much, since some of them are
> normal.

The summary does include both. LLVM is 'llvm', man/html docs is 'docs' and pdf
docs as 'docs_pdf'.


> > >   Are there any other targets for the individual formats? (I could
> > >   probably use one for the manpages only, without the html.)
> >
> > Yes, there are.
> > ninja doc/src/sgml/{postgres-A4.pdf,html,postgres.html,man1}
>
> Oh, it was not obvious to me that this "make $some_file" style
> command would work. (But it still leaves the problem of knowing which
> targets there are.)

Yes, you can trigger building any file that way.

The following is *not* an argument the docs targets shouldn't be documented
(working on a patch), just something that might be helpful until then /
separately. You can see which targets are built with

ninja -t targets all|grep doc/src/


> > Perhaps more interesting for your purposes, there are the
> > install-doc-{html,man} targets.
>
> Hmm, I thought I had tried these, but apparently managed to miss them.
> Thanks.
>
> install-doc-man seems to install "man1" only, though?
> (It seems to compile man5/man7, but not install them.)

Ugh, that's obviously a bug. I'll fix it.


> > > Non-doc issues:
> > >
> > > * LLVM is off by default (ok), when I enable it with -Dllvm=auto, it
> > >   gets detected, but no .bc files are built, nor installed
> >
> > Support for that has not yet been merged.
>
> Oh, that's a showstopper. I thought meson would already be ready for
> production use. There is indeed an "experimental" note in
> install-requirements.html, but not in install-meson.html

I'm working on merging it. Having it for core PG isn't a huge difficulty, the
extension story is what's been holding me back...


> > > * selinux is not autodetected. It needs -Dselinux=auto, but that's not
> > >   documented in install-meson.html
> >
> > Uh, huh. There's no documentation for --with-selinux in the 
> > installation.sgml
> > either, just in sepgsql.sgml. So when the relevant docs got translated to
> > meson, -Dselinux= wasn't documented either.
>
> Ok. It does show up in "External libraries" and was enabled in the
> Debian packages before.
>
> Why isn't it "auto" like the others?

I don't really remember why I did that, but it's platform specific, maybe
that's why I did it that way?


> > > * There is no split between libdir and pkglibdir. We had used that in
> > >   the past for libpq -> /usr/lib/x86_64-linux-gnu and PG stuff ->
> > >   /usr/lib/postgresql/17/lib.
> >
> > I don't think the autoconf build currently exposes separately configuring
> > pkglibdir either, I think that's a debian patch? I'm entirely open to adding
> > an explicit configuration option for this though.
>
> That would definitely be helpful.

I have a patch locally, will send it together with a few others in a bit.

Greetings,

Andres Freund




Re: brininsert optimization opportunity

2023-11-03 Thread Matthias van de Meent
On Fri, 3 Nov 2023 at 19:37, Tomas Vondra  wrote:
>
> Hi,
>
> I took a look at this patch today. I had to rebase the patch, due to
> some minor bitrot related to 9f0602539d (but nothing major). I also did
> a couple tiny cosmetic tweaks, but other than that the patch seems OK.
> See the attached v6.
> [...]
> Barring objections, I'll try to push this early next week, after another
> round of cleanup.

No hard objections: The principle looks fine.

I do think we should choose a better namespace than bs_* for the
fields of BrinInsertState, as BrinBuildState already uses the bs_*
namespace for its fields in the same file, but that's only cosmetic.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: meson documentation build open issues

2023-11-03 Thread Christoph Berg
Re: Andres Freund
> The reason for that is simply that the docs take too long to build.

That's why I'd prefer to be able to separate arch:all and arch:any
builds, yes.

> The summary does include both. LLVM is 'llvm', man/html docs is 'docs' and pdf
> docs as 'docs_pdf'.

Sorry, I should have looked closer. :(

> The following is *not* an argument the docs targets shouldn't be documented
> (working on a patch), just something that might be helpful until then /
> separately. You can see which targets are built with
> 
> ninja -t targets all|grep doc/src/

Thanks.

> > Oh, that's a showstopper. I thought meson would already be ready for
> > production use. There is indeed an "experimental" note in
> > install-requirements.html, but not in install-meson.html
> 
> I'm working on merging it. Having it for core PG isn't a huge difficulty, the
> extension story is what's been holding me back...

In-core extensions or external ones?

> > Why isn't it "auto" like the others?
> 
> I don't really remember why I did that, but it's platform specific, maybe
> that's why I did it that way?

Isn't that kind of the point of autodetecting things? Aren't bonjour and
bsd_auth autodetected as well?

> > > I don't think the autoconf build currently exposes separately configuring
> > > pkglibdir either, I think that's a debian patch? I'm entirely open to 
> > > adding
> > > an explicit configuration option for this though.
> >
> > That would definitely be helpful.
> 
> I have a patch locally, will send it together with a few others in a bit.

Thanks!

Christoph




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-11-03 Thread Jeff Davis
On Fri, 2023-11-03 at 20:23 +0530, Bharath Rupireddy wrote:
> > 
> OldestInitializedPage is introduced in v14-0001 patch. Please have a
> look.

I don't see why that's necessary if we move to the algorithm I
suggested below that doesn't require a lock.

> > 
> Okay. Current patch doesn't support this [partial hit of newer pages]
> case.

OK, no need to support it until you see a reason.
> > 

> > 
> > I think it needs something like:
> > 
> >   pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx],
> > InvalidXLogRecPtr);
> >   pg_write_barrier();
> > 
> > before the MemSet.
> 
> I think it works. First, xlblocks needs to be turned to an array of
> 64-bit atomics and then the above change.

Does anyone see a reason we shouldn't move to atomics here?

> 
>     pg_write_barrier();
> 
>     *((volatile XLogRecPtr *) &XLogCtl->xlblocks[nextidx]) =
> NewPageEndPtr;

I am confused why the "volatile" is required on that line (not from
your patch). I sent a separate message about that:

https://www.postgresql.org/message-id/784f72ac09061fe5eaa5335cc347340c367c73ac.ca...@j-davis.com

> I think the 3 things that helps read from WAL buffers without
> WALBufMappingLock are: 1) couple of the read barriers in
> XLogReadFromBuffers, 2) atomically initializing xlblocks[idx] to
> InvalidXLogRecPtr plus a write barrier in AdvanceXLInsertBuffer(), 3)
> the following sanity check to see if the read page is valid in
> XLogReadFromBuffers(). If it sounds sensible, I'll work towards
> coding
> it up. Thoughts?

I like it. I think it will ultimately be a fairly simple loop. And by
moving to atomics, we won't need the delicate comment in
GetXLogBuffer().
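As a standalone illustration of that loop, the invalidate-initialize-publish sequence and the reader's check-copy-recheck can be written with C11 atomics. This uses invented names and C11 memory orders rather than PostgreSQL's pg_atomic and barrier primitives, so it is a sketch of the ordering protocol, not the patch itself:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

#define INVALID_LSN		0		/* stand-in for InvalidXLogRecPtr */
#define PAGE_SIZE		8192
#define NBUFFERS		4

static _Atomic uint64_t xlblocks[NBUFFERS];	/* end LSN of each cached page */
static char pages[NBUFFERS][PAGE_SIZE];

/* Writer: mark the slot invalid, reinitialize the page, then publish
 * the new end pointer, with release ordering around each store. */
static void
advance_buffer(int idx, uint64_t new_page_end)
{
	atomic_store_explicit(&xlblocks[idx], INVALID_LSN, memory_order_release);
	memset(pages[idx], 0, PAGE_SIZE);	/* reinitialize the page */
	atomic_store_explicit(&xlblocks[idx], new_page_end, memory_order_release);
}

/* Reader: check that the slot holds the wanted page, copy it, then
 * re-check that it was not recycled mid-copy; bail out otherwise. */
static int
read_buffer(int idx, uint64_t expected_end, char *dst)
{
	if (atomic_load_explicit(&xlblocks[idx],
							 memory_order_acquire) != expected_end)
		return 0;				/* page not (or no longer) cached */
	memcpy(dst, pages[idx], PAGE_SIZE);
	if (atomic_load_explicit(&xlblocks[idx],
							 memory_order_acquire) != expected_end)
		return 0;				/* slot was recycled while we copied */
	return 1;
}
```

In the actual proposal the writer and reader roles correspond to AdvanceXLInsertBuffer() and XLogReadFromBuffers(), with pg_write_barrier()/pg_read_barrier() providing the ordering.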


Regards,
Jeff Davis





Re: trying again to get incremental backup

2023-11-03 Thread Robert Haas
On Wed, Nov 1, 2023 at 8:57 AM Jakub Wartak
 wrote:
> Thanks for answering! It all sounds like this
> resync-standby-using-primary-incrbackup idea isn't fit for the current
> pg_combinebackup, but rather for a new tool hopefully in future. It
> could take the current LSN from stuck standby, calculate manifest on
> the lagged and offline standby (do we need to calculate manifest
> Checksum in that case? I cannot find code for it), deliver it via
> "UPLOAD_MANIFEST" to primary and start fetching and applying the
> differences while doing some form of copy-on-write from old & incoming
> incrbackup data to "$relfilenodeid.new" and then durable_unlink() old
> one and durable_rename("$relfilenodeid.new", "$relfilenodeid"). Would
> it still be possible in theory? (it could use additional safeguards
> like rename controlfile when starting and just before ending to
> additionally block startup if it hasn't finished). Also it looks as
> per comment nearby struct IncrementalBackupInfo.manifest_files that
> even checksums are just more for safeguarding rather than core
> implementation (?)
>
> What I've meant in the initial idea is not to hinder current efforts,
> but asking if the current design will not stand in a way for such a
> cool new addition in future ?

Hmm, interesting idea. I think something like that could be made to
work. My first thought was that it would sort of suck to have to
compute a manifest as a precondition of doing this, but then I started
to think maybe it wouldn't, really. I mean, you'd have to scan the
local directory tree and collect all the filenames so that you could
remove any files that are no longer present in the current version of
the data directory which the incremental backup would send to you. If
you're already doing that, the additional cost of generating a
manifest isn't that high, at least if you don't include checksums,
which aren't required. On the other hand, if you didn't need to send
the server a manifest and just needed to send the required WAL ranges,
that would be even cheaper. I'll spend some more time thinking about
this next week.

> As per earlier test [1], I've already tried to simulate that in
> incrbackuptests-0.1.tgz/test_across_wallevelminimal.sh , but that
> worked (but that was with CTAS-wal-minimal-optimization -> new
> relfilenodeOID is used for CTAS which got included in the incremental
> backup as it's new file) Even retested that with Your v7 patch with
> asserts, same. When simulating with "BEGIN; TRUNCATE nightmare; COPY
> nightmare FROM '/tmp/copy.out'; COMMIT;" on wal_level=minimal it still
> recovers using incremental backup because the WAL contains:

TRUNCATE itself is always WAL-logged, but data added to the relation
in the same transaction as the TRUNCATE isn't always WAL-logged (but
sometimes it is, depending on the relation size). So the failure case
wouldn't be missing the TRUNCATE but missing some data-containing
blocks within the relation shortly after it was created or truncated.
I think what I need to do here is avoid summarizing WAL that was
generated under wal_level=minimal. The walsummarizer process should
just refuse to emit summaries for any such WAL.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pre-proposal: unicode normalized text

2023-11-03 Thread Phil Krylov

On 2023-10-04 23:32, Chapman Flack wrote:

Well, for what reason does anybody run PG now with the encoding set
to anything besides UTF-8? I don't really have my finger on that pulse.
Could it be that it bloats common strings in their local script, and
with enough of those to store, it could matter to use the local
encoding that stores them more economically?


I do use CP1251 for storing some data which is coming in as XMLs in 
CP1251, and thus definitely fits. In UTF-8, that data would take exactly 
2x the size on disk (before compression, and pglz/lz4 won't help much 
with that).
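The 2x figure is easy to verify for Cyrillic text. The helper names below are invented for the sketch:

```c
#include <string.h>
#include <assert.h>

/* The Cyrillic letter "А" (U+0410) encodes as the single byte 0xC0 in
 * CP1251 but as the two bytes 0xD0 0x90 in UTF-8, so mostly-Cyrillic
 * text roughly doubles in size when stored as UTF-8. */
static size_t
encoded_len_utf8(void)
{
	return strlen("\xD0\x90");	/* one letter, two bytes in UTF-8 */
}

static size_t
encoded_len_cp1251(void)
{
	return strlen("\xC0");		/* the same letter, one byte in CP1251 */
}
```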


-- Ph.




Re: [PATCH] Add XMLText function (SQL/XML X038)

2023-11-03 Thread Jim Jones

On 03.11.23 19:05, Vik Fearing wrote:
> I was thinking of something much shorter than that.  Such as
>
>     X038    XMLText YES supported except for RETURNING

v6 attached includes this change and the doc addition from Daniel.

Thanks!

--
JimFrom 703c882c254826f10f7bc076a2071741d086e8f6 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Fri, 3 Nov 2023 21:15:21 +0100
Subject: [PATCH v6] Add XMLText function (SQL/XML X038)

This function implements the standard XMLText function, which
converts text into xml text nodes. It uses the libxml2 function
xmlEncodeSpecialChars to escape predefined entities (&"<>), so
that those do not cause any conflict when concatenating the text
node output with existing xml documents.

This also adds a note in  features.sgml about not supporting
XML(SEQUENCE). The SQL specification defines a RETURNING clause
to a set of XML functions, where RETURNING CONTENT or RETURNING
SEQUENCE can be defined. Since PostgreSQL doesn't support
XML(SEQUENCE) all of these functions operate with an
implicit RETURNING CONTENT.

Author: Jim Jones 
Reviewed-by: Daniel Gustafsson 
Reviewed-by: Vik Fearing 
Discussion: https://postgr.es/m/86617a66-ec95-581f-8d54-08059cca8...@uni-muenster.de
---
 doc/src/sgml/features.sgml   |  9 +++
 doc/src/sgml/func.sgml   | 30 +++
 src/backend/catalog/sql_features.txt |  2 +-
 src/backend/utils/adt/xml.c  | 22 +
 src/include/catalog/pg_proc.dat  |  3 +++
 src/test/regress/expected/xml.out| 36 
 src/test/regress/expected/xml_1.out  | 23 ++
 src/test/regress/expected/xml_2.out  | 36 
 src/test/regress/sql/xml.sql |  7 ++
 9 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/features.sgml b/doc/src/sgml/features.sgml
index 575afa3476..966fd39882 100644
--- a/doc/src/sgml/features.sgml
+++ b/doc/src/sgml/features.sgml
@@ -199,6 +199,15 @@
standard.
   
  
+
+ 
+  
+   PostgreSQL does not support the
+   RETURNING CONTENT or RETURNING SEQUENCE
+   clauses; functions which the specification defines to accept these
+   clauses implicitly return content.
+  
+ 
 

 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a6fcac0824..d963f0a0a0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14180,6 +14180,36 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 documents for processing in client applications.

 
+  
+xmltext
+
+
+ xmltext
+
+
+
+xmltext ( text ) xml
+
+
+
+ The function xmltext returns an XML value with a single
+ text node containing the input argument as its content. Predefined entities
+ like ampersand (), left and right angle brackets
+ (), and quotation marks ()
+ are escaped.
+
+
+
+ Example:
+
+
+   
+

 xmlcomment
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index b33065d7bf..80c40eaf57 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -633,7 +633,7 @@ X034	XMLAgg			YES
 X035	XMLAgg: ORDER BY option			YES	
 X036	XMLComment			YES	
 X037	XMLPI			YES	
-X038	XMLText			NO	
+X038	XMLText			YES	supported except for RETURNING
 X040	Basic table mapping			YES	
 X041	Basic table mapping: null absent			YES	
 X042	Basic table mapping: null as nil			YES	
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 2300c7ebf3..c401e7b821 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -47,6 +47,7 @@
 
 #ifdef USE_LIBXML
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -513,6 +514,27 @@ xmlcomment(PG_FUNCTION_ARGS)
 }
 
 
+Datum
+xmltext(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+	text	   *arg = PG_GETARG_TEXT_PP(0);
+	text	   *result;
+	xmlChar    *xmlbuf = NULL;
+
+	xmlbuf = xmlEncodeSpecialChars(NULL, xml_text2xmlChar(arg));
+
+	Assert(xmlbuf);
+
+	result = cstring_to_text_with_len((const char *) xmlbuf, xmlStrlen(xmlbuf));
+	xmlFree(xmlbuf);
+	PG_RETURN_XML_P(result);
+#else
+	NO_XML_SUPPORT();
+	return 0;
+#endif			/* not USE_LIBXML */
+}
+
 
 /*
  * TODO: xmlconcat needs to merge the notations and unparsed entities
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 091f7e343c..f14aed422a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8793,6 +8793,9 @@
 { oid => '2922', descr => 'serialize an XML value to a character string',
   proname => 'text', prorettype => 'text', proargtypes => 'xml',
   prosrc => 'xmltotext' },
+{ oid => '3813', descr => 'generate XML text node',
+  proname => 'xmltext', proisstrict => 't', prorettype => 'xml',
+  proargtypes => 'text', prosrc => 'xmltext' },
 
 { oid => '2923', descr => 'map table contents to XML',
   proname => 'table_to

Re: Performance issues with parallelism and LIMIT

2023-11-03 Thread Tomas Vondra
On 2/22/23 13:22, Tomas Vondra wrote:
> ...
> 
>>> No opinion on these options, but worth a try. Alternatively, we could
>>> try the usual doubling approach - start with a low threshold (and set
>>> the latch frequently), and then gradually increase it up to the 1/4.
>>>
>>> That should work both for queries expecting only few rows and those
>>> producing a lot of data.
>>
>> I was thinking about this variant as well. One more alternative would be
>> latching the leader once a worker has produced 1/Nth of the LIMIT where
>> N is the number of workers. Both variants have the disadvantage that
>> there are still corner cases where the latch is set too late; but it
>> would for sure be much better than what we have today.
>>
>> I also did some profiling and - at least on my development laptop with 8
>> physical cores - the original example, motivating the batching change is
>> slower than when it's disabled by commenting out:
>>
>>     if (force_flush || mqh->mqh_send_pending > (mq->mq_ring_size >> 2))
>>
>> SET parallel_tuple_cost TO 0;
>> CREATE TABLE b (a int);
>> INSERT INTO b SELECT generate_series(1, 2);
>> ANALYZE b;
>> EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM b;
>>
>>  Gather  (cost=1000.00..1200284.61 rows=200375424 width=4) (actual
>> rows=2 loops=1)
>>    Workers Planned: 7
>>    Workers Launched: 7
>>    ->  Parallel Seq Scan on b  (cost=0.00..1199284.61 rows=28625061
>> width=4) (actual rows=2500 loops=8)
>>
>> Always latch: 19055 ms
>> Batching:     19575 ms
>>
>> If I find some time, I'll play around a bit more and maybe propose a patch.
>>
> 
> OK. Once you have a WIP patch maybe share it and I'll try to do some
> profiling too.
> 
 ...

 We would need something similar to CHECK_FOR_INTERRUPTS() which returns
 a NULL slot if a parallel worker is supposed to stop execution (we could
 e.g. check if the queue got detached). Or could we amend
 CHECK_FOR_INTERRUPTS() to just stop the worker gracefully if the queue
 got detached?

>>> That sounds reasonable, but I'm not very familiar the leader-worker
>>> communication, so no opinion on how it should be done.
>>
>> I think an extra macro that needs to be called from dozens of places to
>> check if parallel execution is supposed to end is the least preferred
>> approach. I'll read up more on how CHECK_FOR_INTERRUPTS() works and if
>> we cannot actively signal the workers that they should stop.
>>
> 
> IMHO if this requires adding another macro to a bunch of ad hoc places
> is rather inconvenient. It'd be much better to fix this in a localized
> manner (especially as it seems related to a fairly specific place).
> 

David, do you still plan to try fixing these issues? I have a feeling
those issues may be fairly common but often undetected, or just brushed
off as "slow query" (AFAICS it was only caught thanks to comparing
timings before/after upgrade). Would be great to improve this.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: meson documentation build open issues

2023-11-03 Thread Andres Freund
Hi,


On 2023-11-03 20:19:18 +0100, Christoph Berg wrote:
> Re: Andres Freund
> > The reason for that is simply that the docs take too long to build.
>
> That why I'd prefer to be able to separate arch:all and arch:any
> builds, yes.

What's stopping you from doing that?  I think the only arch:any content we
have is the docs, and those you can build separately? Doc builds do trigger
generation of a handful of files besides the docs, but not more.


> > > Oh, that's a showstopper. I thought meson would already be ready for
> > > production use. There is indeed an "experimental" note in
> > > install-requirements.html, but not in install-meson.html
> >
> > I'm working on merging it. Having it for core PG isn't a huge difficulty, 
> > the
> > extension story is what's been holding me back...
>
> In-core extensions or external ones?

Both, although the difficulty of doing it is somewhat separate for each.


> > > Why isn't it "auto" like the others?
> >
> > I don't really remember why I did that, but it's platform specific, maybe
> > that's why I did it that way?
>
> Isn't that kind of the point of autodetecting things? Aren't bonjour and
> bsd_auth autodetected as well?

I'd be happy to change it, unless somebody objects?


> > > > I don't think the autoconf build currently exposes separately 
> > > > configuring
> > > > pkglibdir either, I think that's a debian patch? I'm entirely open to 
> > > > adding
> > > > an explicit configuration option for this though.
> > >
> > > That would definitely be helpful.
> >
> > I have a patch locally, will send it together with a few others in a bit.
>
> Thanks!

Attached.

0001 - the bugfix for install-man only installing man1, I'll push that soon
0002 - Document --with-selinux/-Dselinux options centrally
0003 - Add doc-{html,man} targets

   I'm not quite sure it's worth it, but it's basically free, so ...

0004 - Documentation for important build targets

   I'm not entirely happy with the formatting, but it looks like that's
   mostly a CSS issue. I started a thread on fixing that on -www.

0005 - Add -Dpkglibdir option

   I guess we might want to do the same for configure if we decide to do
   this?

Greetings,

Andres Freund
>From 146f4e5a76e68c551aee55cb46bb2197166da63d Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 3 Nov 2023 11:46:52 -0700
Subject: [PATCH v1 1/5] meson: docs: Install all manpages, not just ones in
 man1

In f13eb16485f I made a mistake leading to only man1 being installed. I will
report a bug suggesting that meson warn about mistakes of this sort.

Reported-by: Christoph Berg 
Discussion: https://postgr.es/m/zuu5prqo6zueb...@msg.df7cb.de
Backpatch: 16-, where the meson build was introduced
---
 doc/src/sgml/meson.build | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index 16c1aa980c9..90e2c062fa8 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -225,9 +225,10 @@ if docs_dep.found()
 
   install_doc_man = custom_target('install-man',
 output: 'install-man',
+input: man,
 command: [
   python, install_files, '--prefix', dir_prefix,
-  '--install-dirs', dir_man, man],
+  '--install-dirs', dir_man, '@INPUT@'],
 build_always_stale: true, build_by_default: false,
   )
   alias_target('install-doc-man', install_doc_man)
-- 
2.38.0

>From 16109154c197d77b3e5d16144cfd383c74459025 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 3 Nov 2023 10:21:13 -0700
Subject: [PATCH v1 2/5] docs: Document --with-selinux/-Dselinux options
 centrally

Previously --with-selinux was documented for autoconf in the sepgsql
documentation and not at all for meson. There are further improvements related
to this that could be made, but this seems like a clear improvement.

Author:
Reviewed-by:
Reported-by: Christoph Berg 
Discussion: https://postgr.es/m/20231103163848.26egkh5qdgw3v...@awork3.anarazel.de
Backpatch:
---
 doc/src/sgml/installation.sgml | 21 +
 doc/src/sgml/sepgsql.sgml  | 11 ---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 9b25e9fdb1b..e1c03e21414 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -1209,6 +1209,16 @@ build-postgresql:

   
 
+  
+   --with-selinux
+   
+
+ Build with selinux support, enabling the sepgsql
+ extension.
+
+   
+  
+
  
 

@@ -2640,6 +2650,17 @@ ninja install

   
  
+
+ 
+  -Dselinux={ disabled | auto | enabled }
+  
+   
+Build with selinux support, enabling the sepgsql
+extension.
+   
+  
+ 
+
 

 
diff --git a/doc/src/sgml/sepgsql.sgml b/doc/src/sgml/sepgsql.sgml
index b368e587cbf..1b848f1977c 100644
--- a/doc/src/sgml/sepgsql.sgml
+++ b/doc/src/sgml/sepgsql.sgml
@@ -87,9 +87,14 @@ Policy from con

Re: Unnecessary confirm work on logical replication

2023-11-03 Thread Tomas Vondra
On 4/11/23 14:58, Emre Hasegeli wrote:
>> In fact, the WAL sender always starts reading WAL from restart_lsn,
>> which in turn is always <= confirmed_flush_lsn. While reading WAL, WAL
>> sender may read XLOG_RUNNING_XACTS WAL record with lsn <=
>> confirmed_flush_lsn. While processing XLOG_RUNNING_XACTS record it may
> update its restart_lsn and catalog_xmin with current_lsn = lsn of
>> XLOG_RUNNING_XACTS record. In this situation current_lsn <=
>> confirmed_flush_lsn.
> 
> This can only happen when the WAL sender is restarted.  However in
> this case, the restart_lsn and catalog_xmin should have already been
> persisted by the previous run of the WAL sender.
> 
> I still doubt these calls are necessary.  I think there is a
> complicated chicken and egg problem here.  Here is my logic:
> 
> 1) LogicalConfirmReceivedLocation() is called explicitly when
> confirmed_flush is sent by the replication client.
> 
> 2) LogicalConfirmReceivedLocation() is the only place that updates
> confirmed_flush.
> 
> 3) The replication client can only send a confirmed_flush for a
> current_lsn it has already received.
> 
> 4) These two functions have already run for any current_lsn the
> replication client has received.
> 
> 5) These two functions call LogicalConfirmReceivedLocation() only if
> current_lsn <= confirmed_flush.
> 
> Thank you for your patience.
> 

Hi Emre,

I was going through my TODO list of messages, and I stumbled on this
thread from a couple months back. Do you still think this is something
we should do?

I see there was some discussion about whether this update is needed, or
whether current_lsn can ever be <= confirmed_flush_lsn. I think you may
be right this can't happen, but I wonder if we could verify that by an
assert in a convenient place (instead of just removing the update).

Also, did you try to quantify how expensive this is? The update seems
very cheap, but I guess you just noticed by eye-balling the code, which
is fine ofc. Even if it's cheap/not noticeable, it still may be worth
removing so as not to confuse people improving the code in the future.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Draft back-branch release notes are up

2023-11-03 Thread Tom Lane
I've finished the first draft of 16.1 release notes; see

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1e774846b870a858f8eb88b3f512562009177f45

Please send comments/corrections by Sunday.

regards, tom lane




Re: Pre-proposal: unicode normalized text

2023-11-03 Thread Thomas Munro
On Fri, Nov 3, 2023 at 9:01 PM David Rowley  wrote:
> On Fri, 3 Nov 2023 at 20:49, Jeff Davis  wrote:
> > On Fri, 2023-11-03 at 10:51 +1300, Thomas Munro wrote:
> > > bowerbird and hammerkop didn't like commit a02b37fc.  They're still
> > > using the old 3rd build system that is not tested by CI.  It's due
> > > for
> > > removal in the 17 cycle IIUC but in the meantime I guess the new
> > > codegen script needs to be invoked by something under src/tools/msvc?
> > >
> > >   varlena.obj : error LNK2019: unresolved external symbol
> > > unicode_category referenced in function unicode_assigned
> > > [H:\\prog\\bf\\root\\HEAD\\pgsql.build\\postgres.vcxproj]
> >
> > I think I just need to add unicode_category.c to @pgcommonallfiles in
> > Mkvcbuild.pm. I'll do a trial commit tomorrow and see if that fixes it
> > unless someone has a better suggestion.
>
> (I didn't realise this was being discussed.)
>
> Thomas mentioned this to me earlier today. After looking I also
> concluded that unicode_category.c needed to be added to
> @pgcommonallfiles. After looking at the time, I didn't expect you to
> be around so opted just to push that to fix the MSVC buildfarm
> members.

Shouldn't it be added unconditionally near unicode_norm.c?  It looks
like it was accidentally made conditional on openssl, which might
explain why it worked for David but not for bowerbird.




Re: Inconsistent use of "volatile" when accessing shared memory?

2023-11-03 Thread Andres Freund
Hi,

On 2023-11-02 23:19:03 -0700, Jeff Davis wrote:
> I noticed that we occasionally use "volatile" to access shared memory,
> but usually not; and I'm not clear on the rules for doing so. For
> instance, AdvanceXLInsertBuffer() writes to XLogCtl->xlblocks[nextidx]
> through a volatile pointer; but then immediately writes to XLogCtl-
> >InitializedUpTo with a non-volatile pointer. There are also places in
> procarray.c that make use of volatile through UINT32_ACCESS_ONCE(), and
> of course there are atomics (which use volatile as well as guaranteeing
> atomicity).

> In theory, I think we're always supposed to access shared memory
> through a volatile pointer, right? Otherwise a sufficiently smart (and
> cruel) compiler could theoretically optimize away the load/store in
> some surprising cases, or hold a value in a register longer than we
> expect, and then any memory barriers would be useless.

I don't think so. We used to use volatile for most shared memory accesses, but
volatile doesn't provide particularly useful semantics - and generates
*vastly* slower code in a lot of circumstances. Most of that usage predates
spinlocks being proper compiler barriers, which was rectified in:

commit 0709b7ee72e
Author: Robert Haas 
Date:   2014-09-09 17:45:20 -0400

Change the spinlock primitives to function as compiler barriers.

or the introduction of compiler/memory barriers in

commit 0c8eda62588
Author: Robert Haas 
Date:   2011-09-23 17:52:43 -0400

Memory barrier support for PostgreSQL.


Most instances of volatile used for shared memory access should be replaced
with explicit compiler/memory barriers, as appropriate.

Note that use of volatile does *NOT* guarantee anything about memory ordering!


> But in practice we don't do that even for sensitive structures like the
> one referenced by XLogCtl. My intuition up until now was that if we
> access through a global pointer, then the compiler wouldn't completely
> optimize away the store/load.

That's not guaranteed at all - luckily, as such a guarantee would lead to
bulkier code. It's not that the global variable will be optimized away entirely in
many situations, but repeated accesses can sometimes be merged and the access
can be moved around.


> What is the guidance here? Is the volatile pointer use in
> AdvanceXLInsertBuffer() required, and if so, why not other places?

I don't think it's required. The crucial part is to avoid memory reordering
between zeroing the block / initializing fields and changing that field - the
relevant part for that is the pg_write_barrier(); *not* the volatile.

The volatile does prevent the compiler from deferring the update of
xlblocks[idx] to the next loop iteration. Which I guess isn't a bad idea - but
it's not required for correctness.


When an individual value is read/written from memory, and all that's desired
is to prevent the compiler for eliding/moving that operation, it can lead to
better code to use volatile *on the individual access* compared to using
pg_compiler_barrier(), because it allows the compiler to keep other variables
in registers.

Greetings,

Andres Freund




Re: [PoC] Federated Authn/z with OAUTHBEARER

2023-11-03 Thread Jacob Champion
On Fri, Nov 3, 2023 at 5:28 AM Shlok Kyal  wrote:
> Just want to make sure you are aware of these failures.

Thanks for the nudge! Looks like I need to reconcile with the changes
to JsonLexContext in 1c99cde2. I should be able to get to that next
week; in the meantime I'll mark it Waiting on Author.

--Jacob




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-11-03 Thread Andres Freund
Hi,

On 2023-11-03 20:23:30 +0530, Bharath Rupireddy wrote:
> On Fri, Nov 3, 2023 at 12:35 PM Jeff Davis  wrote:
> >
> > On Thu, 2023-11-02 at 22:38 +0530, Bharath Rupireddy wrote:
> > > > I suppose the question is: should reading from the WAL buffers an
> > > > intentional thing that the caller does explicitly by specific
> > > > callers?
> > > > Or is it an optimization that should be hidden from the caller?
> > > >
> > > > I tend toward the former, at least for now.
> > >
> > > Yes, it's an optimization that must be hidden from the caller.
> >
> > As I said, I tend toward the opposite: that specific callers should
> > read from the buffers explicitly in the cases where it makes sense.
> 
> How about adding a bool flag (read_from_wal_buffers) to
> XLogReaderState so that the callers can set it if they want this
> facility via XLogReaderAllocate()?

That seems wrong architecturally - why should xlogreader itself know about any
of this? What would it mean in frontend code if read_from_wal_buffers were
set? IMO this is something that should happen purely within the read function.

Greetings,

Andres Freund




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-11-03 Thread Andres Freund
Hi,

On 2023-11-02 22:38:38 +0530, Bharath Rupireddy wrote:
> From 5b5469d7dcd8e98bfcaf14227e67356bbc1f5fe8 Mon Sep 17 00:00:00 2001
> From: Bharath Rupireddy 
> Date: Thu, 2 Nov 2023 15:10:51 +
> Subject: [PATCH v14] Track oldest initialized WAL buffer page
>
> ---
>  src/backend/access/transam/xlog.c | 170 ++
>  src/include/access/xlog.h |   1 +
>  2 files changed, 171 insertions(+)
>
> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index b541be8eec..fdf2ef310b 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -504,6 +504,45 @@ typedef struct XLogCtlData
>   XLogRecPtr *xlblocks;   /* 1st byte ptr-s + XLOG_BLCKSZ */
>   int XLogCacheBlck;  /* highest allocated xlog 
> buffer index */
>
> + /*
> +  * Start address of oldest initialized page in XLog buffers.
> +  *
> +  * We mainly track oldest initialized page explicitly to quickly tell 
> if a
> +  * given WAL record is available in XLog buffers. It also can be used 
> for
> +  * other purposes, see notes below.
> +  *
> +  * OldestInitializedPage gives XLog buffers following properties:
> +  *
> +  * 1) At any given point of time, pages in XLog buffers array are sorted
> +  * in an ascending order from OldestInitializedPage till 
> InitializedUpTo.
> +  * Note that we verify this property for assert-only builds, see
> +  * IsXLogBuffersArraySorted() for more details.

This is true - but also not, if you look at it a bit too literally. The
buffers in xlblocks itself obviously aren't ordered when wrapping around
between XLogRecPtrToBufIdx(OldestInitializedPage) and
XLogRecPtrToBufIdx(InitializedUpTo).


> +  * 2) OldestInitializedPage is monotonically increasing (by virtue of 
> how
> +  * postgres generates WAL records), that is, its value never decreases.
> +  * This property lets someone read its value without a lock. There's no
> +  * problem even if its value is slightly stale i.e. concurrently being
> +  * updated. One can still use it for finding if a given WAL record is
> +  * available in XLog buffers. At worst, one might get false positives
> +  * (i.e. OldestInitializedPage may tell that the WAL record is available
> +  * in XLog buffers, but when one actually looks at it, it isn't really
> +  * available). This is more efficient and performant than acquiring a 
> lock
> +  * for reading. Note that we may not need a lock to read
> +  * OldestInitializedPage but we need to update it holding
> +  * WALBufMappingLock.

I'd
s/may not need/do not need/

But perhaps rephrase it a bit more, to something like:

To update OldestInitializedPage, WALBufMappingLock needs to be held
exclusively, for reading no lock is required.


> +  *
> +  * 3) One can start traversing XLog buffers from OldestInitializedPage
> +  * till InitializedUpTo to list out all valid WAL records and stats, and
> +  * expose them via SQL-callable functions to users.
> +  *
> +  * 4) XLog buffers array is inherently organized as a circular, sorted 
> and
> +  * rotated array with OldestInitializedPage as pivot with the property
> +  * where LSN of previous buffer page (if valid) is greater than
> +  * OldestInitializedPage and LSN of next buffer page (if valid) is 
> greater
> +  * than OldestInitializedPage.
> +  */
> + XLogRecPtr  OldestInitializedPage;

It seems a bit odd to name an LSN-containing variable *Page...


>   /*
>* InsertTimeLineID is the timeline into which new WAL is being inserted
>* and flushed. It is zero during recovery, and does not change once 
> set.
> @@ -590,6 +629,10 @@ static ControlFileData *ControlFile = NULL;
>  #define NextBufIdx(idx)  \
>   (((idx) == XLogCtl->XLogCacheBlck) ? 0 : ((idx) + 1))
>
> +/* Macro to retreat to previous buffer index. */
> +#define PreviousBufIdx(idx)  \
> + (((idx) == 0) ? XLogCtl->XLogCacheBlck : ((idx) - 1))

I think it might be worth making these inlines and adding assertions that idx
is not bigger than XLogCtl->XLogCacheBlck?


>  /*
>   * XLogRecPtrToBufIdx returns the index of the WAL buffer that holds, or
>   * would hold if it was in cache, the page containing 'recptr'.
> @@ -708,6 +751,10 @@ static void WALInsertLockAcquireExclusive(void);
>  static void WALInsertLockRelease(void);
>  static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
>
> +#ifdef USE_ASSERT_CHECKING
> +static bool IsXLogBuffersArraySorted(void);
> +#endif
> +
>  /*
>   * Insert an XLOG record represented by an already-constructed chain of data
>   * chunks.  This is a low-level routine; to construct the WAL record header
> @@ -1992,6 +2039,52 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, 
> bool opportunistic)
>   XLogCtl->InitializedUpTo

Re: Inconsistent use of "volatile" when accessing shared memory?

2023-11-03 Thread Jeff Davis
On Fri, 2023-11-03 at 15:59 -0700, Andres Freund wrote:
> I don't think so. We used to use volatile for most shared memory
> accesses, but
> volatile doesn't provide particularly useful semantics - and
> generates
> *vastly* slower code in a lot of circumstances. Most of that usage
> predates
> spinlocks being proper compiler barriers, 

A compiler barrier doesn't always force the compiler to generate loads
and stores, though.

For instance (example code I placed at the bottom of xlog.c):

  typedef struct DummyStruct {
  XLogRecPtr recptr;
  } DummyStruct;
  extern void DummyFunction(void);
  static DummyStruct Dummy = { 5 };
  static DummyStruct *pDummy = &Dummy;
  void
  DummyFunction(void)
  {
  while(true)
  {
  pg_compiler_barrier();
  pg_memory_barrier();
  if (pDummy->recptr == 0)
  break;
  pg_compiler_barrier();
  pg_memory_barrier();
  }
  }


Generates the following code (clang -O2):

  0016ed10 :
16ed10:   f0 83 04 24 00  lock addl $0x0,(%rsp)
16ed15:   f0 83 04 24 00  lock addl $0x0,(%rsp)
16ed1a:   eb f4   jmp16ed10 
16ed1c:   0f 1f 40 00 nopl   0x0(%rax)

Obviously this is an oversimplified example and if I complicate it in
any number of ways then it will start generating actual loads and
stores, and then the compiler and memory barriers should do their job.


> Note that use of volatile does *NOT* guarantee anything about memory
> ordering!

Right, but it does force loads/stores to be emitted by the compiler;
and without loads/stores a memory barrier is useless.

I understand that my example is too simple and I'm not claiming that
there's a problem. I'd just like to understand the key difference
between my example and what we do with XLogCtl.

Another way to phrase my question: under what specific circumstances
must we use something like UINT32_ACCESS_ONCE()? That seems to be used
for local pointers, but it's not clear to me exactly why that matters.
Intuitively, access through a local pointer seems much more likely to
be optimized and therefore more dangerous, but that doesn't imply that
access through global variables is not dangerous.

Regards,
Jeff Davis






Re: Explicitly skip TAP tests under Meson if disabled

2023-11-03 Thread Andres Freund
Hi,

On 2023-10-30 05:45:52 -0400, Peter Eisentraut wrote:
> Under Meson, it is not very easy to see if TAP tests have been enabled or
> disabled, if you rely on the default auto setting.  You either need to
> carefully study the meson setup output, or you notice, what a minute, didn't
> there use to be like 250 tests, not only 80?
> 
> I think it would be better if we still registered the TAP tests in Meson
> even if the tap_tests option is disabled, but with a dummy command that
> registers them as skipped.  That way you get a more informative output like

Hm, ok. I've never felt I needed this, but I can see the point.


> See attached patch for a possible implementation.  (This uses perl as a hard
> build requirement.  We are planning to do that anyway, but obviously other
> implementations, such as using python, would also be possible.)

There's already other hard dependencies on perl in the meson build (generating
kwlist etc). We certainly error out if it's not available.


>  
> -test(test_dir['name'] / onetap_p,
> -  python,
> -  kwargs: test_kwargs,
> -  args: testwrap_base + [
> -'--testgroup', test_dir['name'],
> -'--testname', onetap_p,
> -'--', test_command,
> -test_dir['sd'] / onetap,
> -  ],
> -)
> +if tap_tests_enabled
> +  test(test_dir['name'] / onetap_p,
> +python,
> +kwargs: test_kwargs,
> +args: testwrap_base + [
> +  '--testgroup', test_dir['name'],
> +  '--testname', onetap_p,
> +  '--', test_command,
> +  test_dir['sd'] / onetap,
> +],
> +  )
> +else
> +  test(test_dir['name'] / onetap_p,
> +   perl,
> +   args: ['-e', 'print "1..0 # Skipped: TAP tests not enabled"'],
> +   kwargs: test_kwargs)
> +endif

I'd just use a single test() invocation here, and add an argument to testwrap
indicating that it should print out the skipped message. That way we a) don't
need two test() invocations, b) could still see the test name etc in the test
invocation.

Greetings,

Andres Freund




Re: Inconsistent use of "volatile" when accessing shared memory?

2023-11-03 Thread Andres Freund
Hi,

On 2023-11-03 17:44:44 -0700, Jeff Davis wrote:
> On Fri, 2023-11-03 at 15:59 -0700, Andres Freund wrote:
> > I don't think so. We used to use volatile for most shared memory
> > accesses, but
> > volatile doesn't provide particularly useful semantics - and
> > generates
> > *vastly* slower code in a lot of circumstances. Most of that usage
> > predates
> > spinlocks being proper compiler barriers,
>
> A compiler barrier doesn't always force the compiler to generate loads
> and stores, though.

I don't think volatile does so entirely either, but I don't think it's
relevant anyway.


> For instance (example code I placed at the bottom of xlog.c):
>
>   typedef struct DummyStruct {
>   XLogRecPtr recptr;
>   } DummyStruct;
>   extern void DummyFunction(void);
>   static DummyStruct Dummy = { 5 };
>   static DummyStruct *pDummy = &Dummy;
>   void
>   DummyFunction(void)
>   {
>   while(true)
>   {
>   pg_compiler_barrier();
>   pg_memory_barrier();
>   if (pDummy->recptr == 0)
>   break;
>   pg_compiler_barrier();
>   pg_memory_barrier();
>   }
>   }
>
>
> Generates the following code (clang -O2):
>
>   0016ed10 :
> 16ed10:   f0 83 04 24 00  lock addl $0x0,(%rsp)
> 16ed15:   f0 83 04 24 00  lock addl $0x0,(%rsp)
> 16ed1a:   eb f4   jmp16ed10 
> 16ed1c:   0f 1f 40 00 nopl   0x0(%rax)
>
> Obviously this is an oversimplified example and if I complicate it in
> any number of ways then it will start generating actual loads and
> stores, and then the compiler and memory barriers should do their job.

All that is happening here is that clang can prove that nothing ever could
change the variable (making this actually undefined behaviour, IIRC). If you
add another function like:

extern void
SetDummy(uint64_t lsn)
{
pDummy->recptr = lsn;
}

clang does generate the code you'd expect - there's a possibility that it
could be true.

See https://godbolt.org/z/EaM77E8jK

We don't gain anything from preventing the compiler from making such
conclusions afaict.


> > Note that use of volatile does *NOT* guarantee anything about memory
> > ordering!
>
> Right, but it does force loads/stores to be emitted by the compiler;
> and without loads/stores a memory barrier is useless.

There would not have been the point in generating that load...


> I understand that my example is too simple and I'm not claiming that
> there's a problem. I'd just like to understand the key difference
> between my example and what we do with XLogCtl.

The key difference is that there's code changing relevant variables :)


> Another way to phrase my question: under what specific circumstances
> must we use something like UINT32_ACCESS_ONCE()? That seems to be used
> for local pointers

I guess I don't really know what you mean with global or local pointers?
There's no point in ever using something like it if the memory is local to a
function.

The procarray uses all are for shared memory. Sure, we save a pointer to a
subset of shared memory in a local variable, but that doesn't change very much
(it does tell the compiler that it doesn't need to reload ProcGlobal->xid from
memory after calling an external function (which could change it), so it's a
minor efficiency improvement).

The reason for
/* Fetch xid just once - see GetNewTransactionId */
pxid = UINT32_ACCESS_ONCE(other_xids[pgxactoff]);
is explained in a comment in GetNewTransactionId():
 *
 * Note that readers of ProcGlobal->xids/PGPROC->xid should be careful 
to
 * fetch the value for each proc only once, rather than assume they can
 * read a value multiple times and get the same answer each time.  Note 
we
 * are assuming that TransactionId and int fetch/store are atomic.

without the READ_ONCE(), the compiler could decide to not actually load the
memory contents into a register through the loop body, but to just refetch it
every time.

We could also implement this with a compiler barrier between fetching pxid and
using it - but it'd potentially lead to worse code, because instead of just
forcing one load to come from memory, it'd also force reloading *other*
variables from memory.


> but it's not clear to me exactly why that matters.  Intuitively, access
> through a local pointer seems much more likely to be optimized and therefore
> more dangerous, but that doesn't imply that access through global variables
> is not dangerous.

I really don't think there's a meaningful difference between the two. What is
interesting is where the memory points to and whether the compiler can reason
about other code potentially having had access to the memory.

E.g. with code like this (overly simple example)

int foo = 0;
int *foop = &foo;

pg_memory_barrier();

if (*foop != 0)
   return false;

the compiler can easily prove that the variable could not have

Re: Add new option 'all' to pg_stat_reset_shared()

2023-11-03 Thread Andres Freund
Hi,

On 2023-11-03 00:55:00 +0530, Bharath Rupireddy wrote:
> On Wed, Nov 1, 2023 at 4:24 AM Michael Paquier  wrote:
> >
> > On Tue, Oct 31, 2023 at 04:26:18PM +0900, torikoshia wrote:
> > > Yes, calling pg_stat_reset_shared() for all stats types can do what I
> > > wanted to do.
> > > But calling it with 6 different parameters seems tiresome and I thought it
> > > would be convenient to have a parameter to delete all cluster-wide
> > > statistics at once.
> > > I may be wrong, but I imagine that it's more common to want to delete
> > > all of the statistics for an entire cluster rather than just a portion
> > > of it.

Yes, agreed. However, I'd suggest adding pg_stat_reset_shared(), instead of
pg_stat_reset_shared('all') for this purpose.


> > If more flexibility is wanted in this function, could it be an option
> > to consider a flavor like pg_stat_reset_shared(text[]), where it is
> > possible to specify a list of shared stats types to reset?  Perhaps
> > there are no real use cases for it, just wanted to mention it anyway
> > regarding the fact that it could have benefits to refactor this code
> > to use a bitwise operator for its internals with bit flags for each
> > type.

I don't think there is much point in such an API - if the caller actually
wants to delete individual stats, it's not too hard to loop.

But most of the time resetting individual stats doesn't make sense. IMO it was
a mistake to ever add the ability. But that ship has sailed.


> I don't see a strong reason to introduce yet-another API when someone
> can just call things in a loop.

I don't agree at all. That requires callers to know the set of possible values
that stats need to be reset for - which has grown over time. But nearly all
the time the right thing to do is to reset *all* shared stats, not just some.

> I could recollect a recent analogy - a
> proposal to have a way to define multiple custom wait events with a
> single function call instead of callers defining in a loop didn't draw
> much interest.

That's not analogous - in your example the caller by definition knows the set
of wait events it wants to create. Introducing a batch API wouldn't change
that.  But in case of resetting all stats the caller does *not* inherently
know the set of stats types.

Greetings,

Andres Freund




Re: make pg_ctl more friendly

2023-11-03 Thread Andres Freund
Hi,

On 2023-11-02 14:50:14 +0800, Crisp Lee wrote:
> I got a basebackup using pg_basebackup -R. After that, I created a restore
> point named test on primary, and set recovery_target_name to test,
> recovery_target_action to shutdown in standby datadir. I got a failure
> startup message  after 'pg_ctl start -D $standby_datadir'. I think it is
> not a failure, and makes users nervous, especially for newbies.
> 
> My thought is to generate a recovery.done file if the postmaster receives
> exit code 3 from the startup process. When postmaster  exits, pg_ctl will
> give a more friendly message to users.

I think we can detect this without any additional state - pg_ctl already
accesses pg_control (via get_control_dbstate()). We should be able to detect
your case by issuing a different warning if

a) get_control_dbstate() at the start was *not* DB_SHUTDOWNED
b) get_control_dbstate() at the end is DB_SHUTDOWNED

Greetings,

Andres Freund




Re: Something seems weird inside tts_virtual_copyslot()

2023-11-03 Thread Andres Freund
Hi,

On 2023-11-01 11:35:50 +1300, David Rowley wrote:
> Looking at the Assert inside tts_virtual_copyslot(), it does:
> 
> Assert(srcdesc->natts <= dstslot->tts_tupleDescriptor->natts);

I think that assert was intended to be the other way round.


> So, that seems to indicate that it's ok for the src slot to have fewer
> attributes than the destination.  The code then calls
> tts_virtual_clear(dstslot), then slot_getallattrs(srcslot); then does
> the following loop:

> for (int natt = 0; natt < srcdesc->natts; natt++)
> {
> dstslot->tts_values[natt] = srcslot->tts_values[natt];
> dstslot->tts_isnull[natt] = srcslot->tts_isnull[natt];
> }
> 
> Seems ok so far.
>
> If the srcslot has fewer attributes then that'll leave the extra dstslot
> array elements untouched.

It is not ok even up to just here! Any access to dstslot->tts_{values,isnull}
for an attribute bigger than srcdesc->natts would now be stale, potentially
pointing to another attribute.


> Where it gets weird is inside tts_virtual_materialize().  In that
> function, we materialize *all* of the dstslot attributes, even the
> extra ones that were left alone in the for loop shown above.  Why do
> we need to materialize all of those attributes? We only need to
> materialize up to srcslot->natts.
> 
> Per the following code, only up to the srcdesc->natts would be
> accessible anyway:
> 
> dstslot->tts_nvalid = srcdesc->natts;
> 
> Virtual slots don't need any further deforming and
> tts_virtual_getsomeattrs() is coded in a way that we'll find out if
> anything tries to deform a virtual slot.
> 
> I changed the Assert in tts_virtual_copyslot() to check the natts
> match in each of the slots and all of the regression tests still pass,
> so it seems we have no tests where there's an attribute number
> mismatch...

If we want to prohibit that, I think we ought to assert this in
ExecCopySlot(), rather than just tts_virtual_copyslot.

Even that does survive the test - but I don't think it'd be really wrong to
copy from a slot with more columns into one with fewer. And it seems plausible
that that could happen somewhere, e.g. when copying from a slot in a child
partition with additional columns into a slot from the parent, where the
column types/order otherwise matches, so we don't have to use the attribute
mapping infrastructure.


> I think if we are going to support copying slots where the source and
> destination don't have the same number of attributes then the
> following comment should explain what's allowed and what's not
> allowed:
> 
> /*
> * Copy the contents of the source slot into the destination slot's own
> * context. Invoked using callback of the destination slot.
> */
> void (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot);

Arguably the more relevant place to document this would be ExecCopySlot(), as
that's what "users" of ExecCopySlot() would presumably look at.  I dug a
bit in the history, and we used to say

The caller must ensure the slots have compatible tupdescs.

whatever that precisely means.


> Is the Assert() in tts_virtual_copyslot() wrong?

Yes, it's inverted.

Greetings,

Andres Freund




Re: Moving forward with TDE [PATCH v3]

2023-11-03 Thread Andres Freund
Hi,

On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> I'm quite surprised at the significant number of changes being made
> outside the core storage manager files. I thought that changing out
> mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> would be the most obvious change to implement cluster-wide encryption
> with the least code touched, as relations don't need to know whether
> the files they're writing are encrypted, right? Is there a reason to
> not implement this at the smgr level that I overlooked in the
> documentation of these patches?

You can't really implement encryption transparently inside an smgr without
significant downsides. You need a way to store an initialization vector
associated with the page (or you can store that elsewhere, but then you've
doubled the worst-case amount of random reads/writes). The patch uses the LSN
as the IV (which I doubt is a good idea). For authenticated encryption further
additional storage space is required.

To be able to use the LSN as the IV, the patch needs to ensure that the LSN
increases in additional situations (a more aggressive version of
wal_log_hint_bits) - which can't be done below smgr, where we don't know about
what WAL logging was done. Nor can you easily just add space on the page below
md.c, for the purpose of storing an LSN independent IV and the authentication
data.

You could decide that the security that XTS provides is sufficient (XTS could
be used without a real IV, with some downsides), but we'd pretty much prevent
ourselves from ever implementing authenticated encryption.

Greetings,

Andres Freund




Re: Pre-proposal: unicode normalized text

2023-11-03 Thread David Rowley
On Sat, 4 Nov 2023 at 10:57, Thomas Munro  wrote:
>
> On Fri, Nov 3, 2023 at 9:01 PM David Rowley  wrote:
> > On Fri, 3 Nov 2023 at 20:49, Jeff Davis  wrote:
> > > I think I just need to add unicode_category.c to @pgcommonallfiles in
> > > Mkvcbuild.pm. I'll do a trial commit tomorrow and see if that fixes it
> > > unless someone has a better suggestion.
> >
> > Thomas mentioned this to me earlier today. After looking I also
> > concluded that unicode_category.c needed to be added to
> > @pgcommonallfiles. After looking at the time, I didn't expect you to
> > be around so opted just to push that to fix the MSVC buildfarm
> > members.
>
> Shouldn't it be added unconditionally near unicode_norm.c?  It looks
> like it was accidentally made conditional on openssl, which might
> explain why it worked for David but not for bowerbird.

Well, I did that one pretty poorly :-(

I've just pushed a fix for that.  Thanks.

David




Re: Moving forward with TDE [PATCH v3]

2023-11-03 Thread Andres Freund
On 2023-11-02 19:32:28 -0700, Andres Freund wrote:
> > From 327e86d52be1df8de9c3a324cb06b85ba5db9604 Mon Sep 17 00:00:00 2001
> > From: David Christensen 
> > Date: Fri, 29 Sep 2023 15:16:00 -0400
> > Subject: [PATCH v3 5/5] Add encrypted/authenticated WAL
> >
> > When using an encrypted cluster, we need to ensure that the WAL is also
> > encrypted. While we could go with a page-based approach, we use instead a
> > per-record approach, using GCM for the encryption method and storing the AuthTag
> > in the xl_crc field.

What was the reason for this decision?





Re: Add the ability to limit the amount of memory that can be allocated to backends.

2023-11-03 Thread Andres Freund
Hi,

On 2023-10-31 17:11:26 +, John Morris wrote:
> Postgres memory reservations come from multiple sources.
> 
>   *   Malloc calls made by the Postgres memory allocators.
>   *   Static shared memory created by the postmaster at server startup,
>   *   Dynamic shared memory created by the backends.
>   *   A fixed amount (1Mb) of “initial” memory reserved whenever a process 
> starts up.
> 
> Each process also maintains an accurate count of its actual memory
> allocations. The process-private variable “my_memory” holds the total
> allocations for that process. Since there can be no contention, each process
> updates its own counters very efficiently.

I think this will introduce measurable overhead in low concurrency cases and
very substantial overhead / contention when there is a decent amount of
concurrency.  This makes all memory allocations > 1MB contend on a single
atomic.  Massive amount of energy have been spent writing multi-threaded
allocators that have far less contention than this - the current state is to
never contend on shared resources on any reasonably common path. This gives
away one of the few major advantages our process model has.

The patch doesn't just introduce contention when limiting is enabled - it
introduces it even when memory usage is just tracked. It makes absolutely no
sense to have a single contended atomic in that case - just have a per-backend
variable in shared memory that's updated. It's *WAY* cheaper to compute the
overall memory usage during querying than to keep a running total in shared
memory.



> Pgstat now includes global memory counters. These shared memory counters
> represent the sum of all reservations made by all Postgres processes. For
> efficiency, these global counters are only updated when new reservations
> exceed a threshold, currently 1 Mb for each process. Consequently, the
> global reservation counters are approximate totals which may differ from the
> actual allocation totals by up to 1 Mb per process.

I see that you added them to the "cumulative" stats system - that doesn't
immediately make sense to me - what you're tracking here isn't an
accumulating counter, it's something showing the current state, right?


> The max_total_memory limit is checked whenever the global counters are
> updated. There is no special error handling if a memory allocation exceeds
> the global limit. That allocation returns a NULL for malloc style
> allocations or an ENOMEM for shared memory allocations. Postgres has
> existing mechanisms for dealing with out of memory conditions.

I still think it's extremely unwise to do tracking of memory and limiting of
memory in one patch.  You should work towards an acceptable patch that just
tracks memory usage in an as simple and low overhead way as possible. Then we
later can build on that.



> For sanity checking, pgstat now includes the pg_backend_memory_allocation
> view showing memory allocations made by the backend process. This view
> includes a scan of the top memory context, so it compares memory allocations
> reported through pgstat with actual allocations. The two should match.

Can't you just do this using the existing pg_backend_memory_contexts view?


> Performance-wise, there was no measurable impact with either pgbench or a
> simple “SELECT * from series” query.

That seems unsurprising - allocations aren't a major part of the work there,
you'd have to regress by a lot to see memory allocator changes to show a
significant performance decrease.


> diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
> index 7a6f36a6a9..6c813ec465 100644
> --- a/src/test/regress/expected/opr_sanity.out
> +++ b/src/test/regress/expected/opr_sanity.out
> @@ -468,9 +468,11 @@ WHERE proallargtypes IS NOT NULL AND
>ARRAY(SELECT proallargtypes[i]
>  FROM generate_series(1, array_length(proallargtypes, 1)) g(i)
>  WHERE proargmodes IS NULL OR proargmodes[i] IN ('i', 'b', 'v'));
> - oid | proname | proargtypes | proallargtypes | proargmodes 
> --+-+-++-
> -(0 rows)
> + oid  | proname  | proargtypes |  proallargtypes   | proargmodes
> +--+--+-+---+---
> + 9890 | pg_stat_get_memory_reservation   | | {23,23,20,20,20,20,20,20} | {i,o,o,o,o,o,o,o}
> + 9891 | pg_get_backend_memory_allocation | | {23,23,20,20,20,20,20}| {i,o,o,o,o,o,o}
> +(2 rows)

This indicates that your pg_proc entries are broken, they need to fixed rather
than allowed here.

Greetings,

Andres Freund




Re: Version 14/15 documentation Section "Alter Default Privileges"

2023-11-03 Thread Laurenz Albe
On Fri, 2023-11-03 at 12:53 -0400, Bruce Momjian wrote:
> I have developed the attached patch on top of the alter default patch I
> just applied.  It is more radical, making FOR ROLE clearer, and also
> moving my new FOR ROLE text up to the first paragraph, and reordering
> the paragraphs to be clearer.
> 
> I think this is too radical for backpatch to 11/12, but I think
> 16/master makes sense after the minor releases next week.

I think it is a good idea to move part of the text to a new paragraph.

> --- a/doc/src/sgml/ref/alter_default_privileges.sgml
> +++ b/doc/src/sgml/ref/alter_default_privileges.sgml
> @@ -90,23 +90,14 @@ REVOKE [ GRANT OPTION FOR ]
> [...]
> +   As a non-superuser, you can change default privileges only for yourself
> +   and for roles that you are a member of.  These privileges are not
> +   inherited, so member roles must use SET ROLE to
> +   access these privileges, or ALTER DEFAULT PRIVILEGES
> +   must be run for each member role.  Privileges can be set globally
> +   (i.e., for all objects created in the current database), or just for
> +   objects created in specified schemas.

This paragraph is not clear enough about who gets the privileges and who
creates the objects, and that is one of the difficulties in understanding
ALTER DEFAULT PRIVILEGES.

Perhaps:

 
  ALTER DEFAULT PRIVILEGES allows you to set the privileges
  that will be applied to objects created in the future.  (It does not
  affect privileges assigned to already-existing objects.)   Privileges can be
  set globally (i.e., for all objects created in the current database), or
  just for objects created in specified schemas.
 

 
  As a non-superuser, you can change default privileges only on objects created
  by yourself or by roles that you are a member of.  If you alter the default
  privileges for a role, only objects created by that role will be affected.
  It is not sufficient to be a member of that role; member roles must use
  SET ROLE to assume the identity of the role for which
  default privileges were altered.
 

 
  There is no way to change the default privileges for objects created by
  every role at once.  You have to run ALTER DEFAULT PRIVILEGES for all
  roles that can create objects whose default privileges should be modified.
 

> @@ -136,12 +140,9 @@ REVOKE [ GRANT OPTION FOR ]
>  target_role
>  
>   
> -  The name of an existing role of which the current role is a member.
> -  Default access privileges are not inherited, so member roles
> -  must use SET ROLE to access these privileges,
> -  or ALTER DEFAULT PRIVILEGES must be run for
> -  each member role.  If FOR ROLE is omitted,
> -  the current role is assumed.
> +  If FOR ROLE is specified, this is the role that
> +  will be assigned the new default privileges, or the current role
> +  if not specified.

This is downright wrong; the "target_role" will *not* be assigned any
privileges.

Perhaps:

 
  Default privileges are changed only for objects created by
  target_role.  If FOR ROLE
  is omitted, the current role is assumed.
 

Yours,
Laurenz Albe