Re: New "raw" COPY format

2024-10-19 Thread Joel Jacobson
On Fri, Oct 18, 2024, at 19:24, Joel Jacobson wrote:
> Attachments:
> * v11-0001-Refactor-ProcessCopyOptions-introduce-CopyFormat-enu.patch
> * v11-0002-Add-raw-format-to-COPY-command.patch

Here is a demo of importing a decently sized real text file
that currently can't be imported without the CSV hack:

$ head /var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64
.package-cache-mutate   devel/cargo
bin admin/base-files
bin/archdetect  admin/ubiquity
bin/autopartition   admin/ubiquity
bin/autopartition-cryptoadmin/ubiquity
bin/autopartition-loop  admin/ubiquity
bin/autopartition-lvm   admin/ubiquity
bin/block-attr  admin/ubiquity
bin/blockdev-keygen admin/ubiquity
bin/blockdev-wipe   admin/ubiquity

This file uses a combination of tabs and spaces between the two columns,
so none of the existing formats can deal with it.
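
For example, with the single-column table created below, the text format
treats every tab as a column separator, so the load fails (a hedged sketch;
error wording from memory):

COPY package_contents FROM
'/var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64'
(FORMAT text);
-- ERROR:  extra data after last expected column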

$ ls -lah /var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64
-rw-r--r-- 1 root root 791M Apr 24 02:07 /var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64

To import using the CSV hack, we first have to find two bytes that don't
exist anywhere in the file, which can be done using e.g. ripgrep. The command
below verifies that \x01 and \x02 don't exist anywhere:

$ rg -uuu --multiline '(?-u)[\x01\x02]' /var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64
$

Knowing these bytes don't exist anywhere, we can then safely use them as the
delimiter and quote characters, as a hack to effectively disable these
features:

CREATE TABLE package_contents (raw_line text);

COPY package_contents FROM 
'/var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64' 
(FORMAT CSV, DELIMITER E'\x01', QUOTE E'\x02');
COPY 8443588
Time: 3882.100 ms (00:03.882)
Time: 3552.991 ms (00:03.553)
Time: 3748.038 ms (00:03.748)
Time: 3775.947 ms (00:03.776)
Time: 3729.020 ms (00:03.729)

I tested writing a Rust program that reads the file line-by-line and
INSERTs each line instead. This is of course a lot slower, since it has to
execute each insert separately:

$ cargo run --release
   Compiling insert_package_contents v0.1.0 (/home/joel/insert_package_contents)
    Finished `release` profile [optimized] target(s) in 0.70s
     Running `target/release/insert_package_contents`
Connecting to the PostgreSQL database...
Successfully connected to the database.
Starting to insert lines from the file...
Successfully inserted 8443588 lines into package_contents in 134.65s.

New approach using the RAW format:

COPY package_contents FROM 
'/var/lib/apt/lists/se.archive.ubuntu.com_ubuntu_dists_noble_Contents-amd64' 
(FORMAT RAW, DELIMITER E'\n');
COPY 8443588
Time: 2918.489 ms (00:02.918)
Time: 3020.372 ms (00:03.020)
Time: 3336.589 ms (00:03.337)
Time: 3067.268 ms (00:03.067)
Time: 3343.694 ms (00:03.344)

Apart from the convenience improvement,
it seems to be somewhat faster already.

/Joel




Using read_stream in index vacuum

2024-10-19 Thread Andrey M. Borodin
Hi hackers!

At a recent hacking workshop [0], Thomas mentioned that patches using the new
API would be welcome. So I prototyped streaming B-tree vacuum for discussion.
When cleaning an index we must visit every index tuple, so we uphold a
special invariant: after checking the trailing block, it must still be the
last block according to a subsequent RelationGetNumberOfBlocks(rel) call.

This invariant does not allow us to completely replace the block loop with
streaming. That's why streaming is used only for the number of blocks
returned by the first RelationGetNumberOfBlocks(rel) call. The tail is
processed with regular ReadBufferExtended().

Also, it's worth mentioning that we have to jump to blocks to the left of
recently split pages. We also do that with regular ReadBufferExtended().
That's why btvacuumpage() now accepts a buffer, not a block number.


I've benchmarked the patch on my laptop (MacBook Air M3) with the following
workload:
1. Initialization
create unlogged table x as select random() r from generate_series(1,1e7);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
vacuum;
2. pgbench with 1 client
insert into x select random() from generate_series(0,10) x;
vacuum x;

On my laptop I see a ~3% increase in pgbench TPS (from ~101 to ~104), but
the statistical noise is very significant, bigger than the performance
change. Perhaps a less noisy benchmark can be devised.

What do you think? If this approach seems worthwhile, I can adapt the same
technique to other AMs.


Best regards, Andrey Borodin.

[0] 
https://rhaas.blogspot.com/2024/08/postgresql-hacking-workshop-september.html



0001-Prototype-B-tree-vacuum-streamlineing.patch
Description: Binary data


Re: Wrong security context for deferred triggers?

2024-10-19 Thread Pavel Stehule
On Sat, Oct 19, 2024 at 13:02, Laurenz Albe wrote:

> On Sat, 2024-10-19 at 06:17 +0200, Pavel Stehule wrote:
> > Maybe we should document that the trigger is always executed with the
> > identity used by the DML command, when the trigger function has no
> > SECURITY DEFINER flag?
>
> Good idea.  Version 3 has documentation.
>

perfect

all tests passed

I'll mark this patch as ready for committer

Regards

Pavel


>
> Yours,
> Laurenz Albe
>


Re: New "raw" COPY format

2024-10-19 Thread Joel Jacobson
On Sat, Oct 19, 2024, at 12:13, jian he wrote:
> We already make RAW allow only one column.
> If RAW has no default delimiter, then COPY FROM a text file will
> become one datum value, which makes it look like importing a Large Object.
> (https://www.postgresql.org/docs/17/lo-funcs.html)

The single datum value might not come from a physical column; it could be
an aggregated JSON value, as in the example Daniel mentioned:

On Wed, Oct 16, 2024, at 18:34, Daniel Verite wrote:
>   copy (select json_agg(col) from table ) to 'file' RAW
>
> This is a variant of the discussion in [1] where the OP does:
>
>  copy (select json_agg(row_to_json(t)) from  t) TO 'file' 
>
> and he complains that both text and csv "break the JSON".
> That discussion morphed into a proposed patch adding JSON
> format to COPY,  but RAW would work directly as the OP
> expected.
>
> That is, unless  happens to include JSON fields with LF/CRLF
> in them, and the RAW format says this is an error condition.
> In that case it's quite annoying to make it an error, rather than
> simply let it pass.
>
> [1]
> https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=k...@mail.gmail.com

In such cases, a user could perform the following:

CREATE TABLE customers (id int, name text, email text);

INSERT INTO customers (id, name, email) VALUES
(1, 'John Doe', 'john@example.com'),
(2, 'Jane Smith', 'jane.sm...@example.com'),
(3, 'Alice Johnson', 'alice.john...@example.com');

COPY (SELECT json_agg(row_to_json(t)) FROM customers t) TO '/tmp/file' 
(FORMAT raw);

% cat /tmp/file
[{"id":1,"name":"John Doe","email":"john@example.com"}, 
{"id":2,"name":"Jane Smith","email":"jane.sm...@example.com"}, 
{"id":3,"name":"Alice Johnson","email":"alice.john...@example.com"}]%

> I think most of the time you have more than one row/value to import
> and export?

Yes, probably, but it might not be a physical row. It could be an aggregated
one, like in the example above. When importing, it might be a large JSON array
of objects that is imported into a temporary table and then deserialized into
a proper schema.
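
A sketch of that round trip, assuming the proposed raw format and the
customers table from the example above (file path and staging table are
made up):

CREATE TEMP TABLE raw_import (data jsonb);

-- the whole file becomes a single jsonb datum:
COPY raw_import FROM '/tmp/customers.json' (FORMAT raw);

-- deserialize into the proper schema:
INSERT INTO customers (id, name, email)
SELECT (elem->>'id')::int, elem->>'name', elem->>'email'
FROM raw_import, jsonb_array_elements(data) AS elem;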

The need to load entire files is already fulfilled by
pg_read_file(text) -> text, but there is no pg_write_file(), likely for
security reasons.
So COPY TO ... (FORMAT RAW) with no delimiter seems necessary,
and then COPY FROM also needs to work accordingly.

>> The refactoring is now in a separate first commit, which seems necessary
>> to separate the new functionality from the refactoring.
> I agree.

> ProcessCopyOptions
> /* Extract options from the statement node tree */
> foreach(option, options)
> {
> }
> /* --- DELIMITER option --- */
> /* --- NULL option --- */
> /* --- QUOTE option --- */
> Currently the regress tests pass; I think that means your refactoring is fine.

I believe that a passing test indicates it might be okay,
but a failing test definitely means it's not. :D

I've meticulously refactored one option at a time, checking which code in
ProcessCopyOptions depends on each option field to ensure the semantics
are preserved.

I think the changes are easy to follow, and it's clear that each change is
correct when looking at them individually, though it might be more challenging
when viewing the total change.

I've tried to minimize code movement, preserving as much of the original
code placement as possible.

> in ProcessCopyOptions, maybe we can rearrange the code after the
> foreach(option, options) loop based on the parameter order in the
> Parameters section of https://www.postgresql.org/docs/devel/sql-copy.html,
> so we can review the refactoring by comparing it with that section's
> descriptions.

That would be nice, but unfortunately, it's not possible because the order of
the option code blocks matters due to the setting of defaults in else/else
if branches when an option is not specified.

For example, in the documentation, DEFAULT precedes QUOTE,
but in ProcessCopyOptions, the QUOTE code block must come before
the DEFAULT code block due to the check:

   /* Don't allow the CSV quote char to appear in the default string. */

I also believe there's value in minimizing code movement.
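
To illustrate the dependency (hypothetical example; error wording
approximate):

COPY t FROM stdin (FORMAT csv, DEFAULT '"x"', QUOTE '"');
-- fails because the quote character appears in the DEFAULT string,
-- a check that can only run after the QUOTE block has set the quote char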


>> > We already did column length checking at BeginCopyTo.
>> > No need for the "if (list_length(cstate->attnumlist) != 1)" error check in
>> > CopyOneRowTo?
>>
>> Hmm, not sure really, since DoCopy() calls both BeginCopyTo()
>> and DoCopyTo() which in turn calls CopyOneRowTo(),
>> but CopyOneRowTo() is also being called from copy_dest_receive().
>>
>
> BeginCopyTo does the preparation work.
> cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
>
> After CopyGetAttnums, the number of attributes for COPY TO cannot be changed.
> Checking the length of cstate->attnumlist right after the CopyGetAttnums call
> seems fine to me.
> I think in CopyOneRowTo we can actually
> Assert(list_length(cstate->attnumlist) == 1)
> for the raw format.

Right, I've changed it t

Re: Using read_stream in index vacuum

2024-10-19 Thread Junwang Zhao
Hi Andrey,

On Sat, Oct 19, 2024 at 5:39 PM Andrey M. Borodin  wrote:
>
> Hi hackers!
>
> At a recent hacking workshop [0], Thomas mentioned that patches using the new
> API would be welcome. So I prototyped streaming B-tree vacuum for discussion.
> When cleaning an index we must visit every index tuple, so we uphold a
> special invariant: after checking the trailing block, it must still be the
> last block according to a subsequent RelationGetNumberOfBlocks(rel) call.
>
> This invariant does not allow us to completely replace the block loop with
> streaming. That's why streaming is used only for the number of blocks
> returned by the first RelationGetNumberOfBlocks(rel) call. The tail is
> processed with regular ReadBufferExtended().

I'm wondering why that is the case. ISTM that we can do *p.current_blocknum
= scanblkno* and *p.last_exclusive = num_pages* in each iteration of the
outer for loop?

+ /* We only streamline number of blocks that are know at the beginning */
know -> known

+ * However, we do not depent on it much, and in future ths
+ * expetation might change.

depent -> depend
ths -> this
expetation -> expectation

>
> Also, it's worth mentioning that we have to jump to blocks to the left of
> recently split pages. We also do that with regular ReadBufferExtended().
> That's why btvacuumpage() now accepts a buffer, not a block number.
>
>
> I've benchmarked the patch on my laptop (MacBook Air M3) with the following
> workload:
> 1. Initialization
> create unlogged table x as select random() r from generate_series(1,1e7);
> create index on x(r);
> create index on x(r);
> create index on x(r);
> create index on x(r);
> create index on x(r);
> create index on x(r);
> create index on x(r);
> vacuum;
> 2. pgbench with 1 client
> insert into x select random() from generate_series(0,10) x;
> vacuum x;
>
> On my laptop I see a ~3% increase in pgbench TPS (from ~101 to ~104), but
> the statistical noise is very significant, bigger than the performance
> change. Perhaps a less noisy benchmark can be devised.
>
> What do you think? If this approach seems worthwhile, I can adapt the same
> technique to other AMs.
>

I think this is a use case where the read stream API fits very well, thanks.

>
> Best regards, Andrey Borodin.
>
> [0] 
> https://rhaas.blogspot.com/2024/08/postgresql-hacking-workshop-september.html
>


-- 
Regards
Junwang Zhao




Re: Make default subscription streaming option as Parallel

2024-10-19 Thread Dilip Kumar
On Fri, 18 Oct 2024 at 5:24 PM, Amit Kapila  wrote:

> On Fri, Oct 18, 2024 at 9:48 AM Dilip Kumar  wrote:
> >
> > On Tue, Oct 8, 2024 at 3:38 PM Amit Kapila 
> wrote:
> >>
> >> On Tue, Oct 8, 2024 at 2:25 PM shveta malik 
> wrote:
> >> >
> >> > On Mon, Oct 7, 2024 at 4:03 PM vignesh C  wrote:
> >> > >
> >> >
> >> > With parallel streaming as default, do you think there is a need to
> >> > increase the default for 'max_logical_replication_workers' as IIUC
> >> > parallel workers are taken from the same pool.
> >> >
> >>
> >> Good question. But then one may say that we should proportionately
> >> increase max_worker_processes as well. I don't know what should be
> >> reasonable new defaults. I think we should make parallel streaming as
> >> default and then wait for some user feedback before changing other
> >> defaults.
> >>
> >
> > I agree, actually streaming of in progress transactions is a useful
> feature for performance in case of large transactions, so it makes sense to
> make it "on" by default.  So +1 from my side.
> >
>
> Your response is confusing. AFAIU, this proposal is to change the
> default value of the streaming option to 'parallel' but you are
> suggesting to make 'on' as default which is different from the
> proposed default but OTOH you are saying +1 as well. So, both can't be
> true.


Sorry for the confusion, I meant to say change the default to ‘parallel’.
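
For reference, the proposed default corresponds to what can already be
requested explicitly (illustrative names and connection string):

CREATE SUBSCRIPTION mysub
CONNECTION 'host=publisher dbname=postgres'
PUBLICATION mypub
WITH (streaming = parallel);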

—
Dilip

>


report a typo in WaitReadBuffers

2024-10-19 Thread Junwang Zhao
While reading the stream read interfaces, I found a typo.

/*
* How many neighboring-on-disk blocks can we can scatter-read into
* other buffers at the same time? In this case we don't wait if we

Seems the second *can* is not necessary.

-- 
Regards
Junwang Zhao




Re: Wrong security context for deferred triggers?

2024-10-19 Thread Laurenz Albe
On Sat, 2024-10-19 at 06:17 +0200, Pavel Stehule wrote:
> Maybe we should document that the trigger is always executed with the
> identity used by the DML command, when the trigger function has no
> SECURITY DEFINER flag?

Good idea.  Version 3 has documentation.

Yours,
Laurenz Albe
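
A minimal sketch of the scenario the patch addresses (hypothetical names;
the regression tests in the patch are the authoritative coverage):

CREATE TABLE audited (a int);

CREATE FUNCTION report_role() RETURNS trigger LANGUAGE plpgsql AS
$$BEGIN RAISE NOTICE 'deferred trigger runs as %', current_user; RETURN NULL; END;$$;

CREATE CONSTRAINT TRIGGER audit_role AFTER INSERT ON audited
DEFERRABLE INITIALLY DEFERRED FOR EACH ROW
EXECUTE FUNCTION report_role();

CREATE FUNCTION do_insert() RETURNS void LANGUAGE sql
SECURITY DEFINER AS $$INSERT INTO audited VALUES (1);$$;

-- executed by some other role:
BEGIN;
SELECT do_insert(); -- trigger queued while current_user is the function owner
COMMIT;             -- with the patch, the trigger also runs as the function owner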
From f6f02eec3dcb29482c97a6c49c131a4791a87b2e Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Sat, 19 Oct 2024 13:01:01 +0200
Subject: [PATCH v3] Make AFTER triggers run with the correct user

With deferred triggers, it is possible that the current role changes
between the time when the trigger is queued and the time it is
executed (for example, the triggering data modification could have been
executed in a SECURITY DEFINER function).

Up to now, deferred trigger functions would run with the current role
set to whatever was active at commit time.  That does not matter for
regular constraints, whose correctness doesn't depend on the current
role.  But for user-written constraint triggers, the current role
certainly matters.

Security considerations:
- The trigger function could be modified between the time the trigger
  is queued and the time it runs.  If the trigger was executed by a
  privileged user, the new behavior could be used for privilege
  escalation.  But if a privileged user executes DML on a table owned
  by an untrusted user, all bets are off anyway --- the malicious code
  could as well be in the trigger function from the beginning.
  So we don't consider this a security hazard.
- The previous behavior could lead to code inadvertently running with
  elevated privileges if a privileged user temporarily assumes lower
  privileges while executing DML on an untrusted table, but the deferred
  trigger runs with the user's original privileges.  Also, that only
  applies if the privileged user commits *after* resuming the original
  role.  This should be seen as deliberate self-damage rather than a
  security problem.

Author: Laurenz Albe
Discussion: https://postgr.es/m/77ee784cf248e842f74588418f55c2931e47bd78.camel%40cybertec.at
---
 doc/src/sgml/trigger.sgml  |  4 ++
 src/backend/commands/trigger.c | 36 
 src/test/regress/expected/triggers.out | 81 ++
 src/test/regress/sql/triggers.sql  | 75 
 4 files changed, 196 insertions(+)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index a9abaab9056..c3c0faf7a1b 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -129,6 +129,10 @@
 In all cases, a trigger is executed as part of the same transaction as
 the statement that triggered it, so if either the statement or the
 trigger causes an error, the effects of both will be rolled back.
+Also, the trigger will always run in the security context of the role
+that executed the statement that caused the trigger to fire, unless
+the trigger function is defined as SECURITY DEFINER,
+in which case it will run as the function owner.

 

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 09356e46d16..f762e484e28 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3634,6 +3634,7 @@ typedef struct AfterTriggerSharedData
 	TriggerEvent ats_event;		/* event type indicator, see trigger.h */
 	Oid			ats_tgoid;		/* the trigger's ID */
 	Oid			ats_relid;		/* the relation it's on */
+	Oid			ats_rolid;		/* role to execute the trigger */
 	CommandId	ats_firing_id;	/* ID for firing cycle */
 	struct AfterTriggersTableData *ats_table;	/* transition table access */
 	Bitmapset  *ats_modifiedcols;	/* modified columns */
@@ -4118,6 +4119,7 @@ afterTriggerAddEvent(AfterTriggerEventList *events,
 	{
 		if (newshared->ats_tgoid == evtshared->ats_tgoid &&
 			newshared->ats_relid == evtshared->ats_relid &&
+			newshared->ats_rolid == evtshared->ats_rolid &&
 			newshared->ats_event == evtshared->ats_event &&
 			newshared->ats_table == evtshared->ats_table &&
 			newshared->ats_firing_id == 0)
@@ -4292,6 +4294,8 @@ AfterTriggerExecute(EState *estate,
 	int			tgindx;
 	bool		should_free_trig = false;
 	bool		should_free_new = false;
+	Oid			save_rolid;
+	int			save_sec_context;
 
 	/*
 	 * Locate trigger in trigdesc.  It might not be present, and in fact the
@@ -4491,6 +4495,33 @@ AfterTriggerExecute(EState *estate,
 
 	MemoryContextReset(per_tuple_context);
 
+	/*
+	 * If the current role was different when the trigger got queued,
+	 * temporarily change the current role.
+	 */
+	GetUserIdAndSecContext(&save_rolid, &save_sec_context);
+	if (save_rolid != evtshared->ats_rolid)
+	{
+		HeapTuple	tuple;
+
+		/*
+		 * We have to check if the role still exists.  Since the trigger queue
+		 * is in memory only, no dependencies to database objects are tracked,
+		 * and the role could have been dropped after the trigger was queued.
+		 */
+		tuple = SearchSysCache1(AUTHOID, ObjectIdGetDatum(evtshared->ats_rolid));
+
+		if (!HeapTupleIsValid(t

Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL

2024-10-19 Thread Joel Jacobson
On Sat, Oct 19, 2024, at 03:32, Michael Paquier wrote:
> If this area of the code is refactored so as a different error is
> triggered for these two specific queries, we'd still be alerted that
> something is wrong the same way for HEAD or what you are suggesting.
> I can see your argument, but it does not really matter because the
> errors triggered by these option combinations don't link specifically
> to COPY TO or COPY FROM.  At the end, I'm OK with leaving things the
> way they are on HEAD, and let that be.

OK, I understand your point of view, and agree there is no problem from a
safety perspective, as the tests would still alert us, just with a suboptimal
error message.

I can see how such a small change might not be worth doing in a single commit.

However, since my last email I've found some other problems in this area,
and I think we should make a more ambitious improvement by rearranging the
incorrect-options tests into three categories:

1. incorrect COPY {FROM|TO} options
2. incorrect COPY FROM options
3. incorrect COPY TO options
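
Hedged examples of what each category would cover (error texts approximate):

-- 1. invalid regardless of direction:
COPY x FROM stdin (FORMAT binary, NULL 'x');     -- NULL not allowed in binary mode
-- 2. invalid for COPY FROM:
COPY x FROM stdin (FORMAT csv, FORCE_QUOTE (a)); -- FORCE_QUOTE is COPY TO only
-- 3. invalid for COPY TO:
COPY x TO stdout (FORMAT csv, FORCE_NULL (a));   -- FORCE_NULL is COPY FROM only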

Also, I've found two new problems:

1. Incorrect options tests are missing for the QUOTE and ESCAPE options.
   This was discovered by Jian He in a different thread.

2. One of the ON_ERROR incorrect options tests also depends on the order
   of checks in ProcessCopyOptions.

To explain my current focus on the COPY tests: we're currently working on
the new raw format for COPY, and it would be nice to clean up these tests
as a preparatory step first.

New patch attached.

/Joel


0001-Improve-incorrect-COPY-options-tests.patch
Description: Binary data


Re: New "raw" COPY format

2024-10-19 Thread jian he
On Sat, Oct 19, 2024 at 1:24 AM Joel Jacobson  wrote:
>>
> Handling e.g. JSON and other structured text files that could contain
> newlines in a seamless way seems important, so the default is no delimiter
> for the raw format: the entire input is read as one data value for COPY
> FROM, and all column data is concatenated without a delimiter for COPY TO.
>
> When specifying a delimiter for the raw format, it separates *rows*, and
> can be a multi-byte string, such as E'\r\n' to handle Windows text files.
>
> This has been documented under the DELIMITER option, as well as under the
> Raw Format section.
>
We already make RAW allow only one column.
If RAW has no default delimiter, then COPY FROM a text file will
become one datum value, which makes it look like importing a Large Object.
(https://www.postgresql.org/docs/17/lo-funcs.html)

I think most of the time you have more than one row/value to import
and export?


> The refactoring is now in a separate first commit, which seems necessary
> to separate the new functionality from the refactoring.
I agree.


ProcessCopyOptions
/* Extract options from the statement node tree */
foreach(option, options)
{
}
/* --- DELIMITER option --- */
/* --- NULL option --- */
/* --- QUOTE option --- */
Currently the regress tests pass; I think that means your refactoring is fine.

in ProcessCopyOptions, maybe we can rearrange the code after the
foreach(option, options) loop based on the parameter order in the
Parameters section of https://www.postgresql.org/docs/devel/sql-copy.html,
so we can review the refactoring by comparing it with that section's
descriptions.


>
> > We already did column length checking at BeginCopyTo.
> > No need for the "if (list_length(cstate->attnumlist) != 1)" error check in
> > CopyOneRowTo?
>
> Hmm, not sure really, since DoCopy() calls both BeginCopyTo()
> and DoCopyTo() which in turn calls CopyOneRowTo(),
> but CopyOneRowTo() is also being called from copy_dest_receive().
>

BeginCopyTo does the preparation work.
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);

After CopyGetAttnums, the number of attributes for COPY TO cannot be changed.
Checking the length of cstate->attnumlist right after the CopyGetAttnums call
seems fine to me.
I think in CopyOneRowTo we can actually
Assert(list_length(cstate->attnumlist) == 1)
for the raw format.


src10=# drop table if exists x;
create table x(a int);
COPY x from stdin (FORMAT raw);
DROP TABLE
CREATE TABLE
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 11
>> 12
>> \.
ERROR:  invalid input syntax for type integer: "11
12
"
CONTEXT:  COPY x, line 1, column a: "11
12
"

The above case means COPY FROM STDIN (FORMAT RAW) can only import one
single value (when successful). The user needs to specify something like:

COPY x from stdin (FORMAT raw, delimiter E'\n');

It seems the raw format defaulting to no delimiter is not user friendly.




Add pg_ownerships and pg_privileges system views

2024-10-19 Thread Joel Jacobson
Hi hackers,

Here is an attempt to revive this patch from 2021-2022, which has been ready
for a while now, thanks to the pg_get_acl() function committed in
4564f1c and d898665.

I've renamed the $subject of the email thread, to match the commitfest entry:
https://commitfest.postgresql.org/50/5033/

---

Add pg_ownerships and pg_privileges system views.

These new views provide a more accessible and user-friendly way to retrieve
information about object ownerships and privileges.

The view pg_ownerships provides access to information about object ownerships.

The view pg_privileges provides access to information about explicitly
granted privileges on database objects. The special grantee value "-" means
the privilege is granted to PUBLIC.

Example usage:

CREATE ROLE alice;
CREATE ROLE bob;
CREATE ROLE carol;

CREATE TABLE alice_table ();
ALTER TABLE alice_table OWNER TO alice;
REVOKE ALL ON alice_table FROM alice;
GRANT SELECT ON alice_table TO bob;

CREATE TABLE bob_table ();
ALTER TABLE bob_table OWNER TO bob;
REVOKE ALL ON bob_table FROM bob;
GRANT SELECT, UPDATE ON bob_table TO carol;

SELECT * FROM pg_ownerships ORDER BY owner;

 classid  | objid | objsubid | type  | schema |    name     |      identity      | owner
----------+-------+----------+-------+--------+-------------+--------------------+-------
 pg_class | 16388 |        0 | table | public | alice_table | public.alice_table | alice
 pg_class | 16391 |        0 | table | public | bob_table   | public.bob_table   | bob
(2 rows)

SELECT * FROM pg_privileges ORDER BY grantee;

 classid  | objid | objsubid | type  | schema |    name     |      identity      | grantor | grantee | privilege_type | is_grantable
----------+-------+----------+-------+--------+-------------+--------------------+---------+---------+----------------+--------------
 pg_class | 16388 |        0 | table | public | alice_table | public.alice_table | alice   | bob     | SELECT         | f
 pg_class | 16391 |        0 | table | public | bob_table   | public.bob_table   | bob     | carol   | SELECT         | f
 pg_class | 16391 |        0 | table | public | bob_table   | public.bob_table   | bob     | carol   | UPDATE         | f
(3 rows)

---

Recap:

During the work on this, the need for a pg_get_acl() function was identified.

Thanks to feedback from Peter Eisentraut, the view "pg_permissions" was renamed
to "pg_privileges", since "permissions" is not an SQL term.

David Fetter:
> +1 for both this and the ownerships view.

Joe Conway:
> While this is interesting and probably useful for troubleshooting, it does not
> provide the complete picture if what you care about is something like "what
> stuff can joel do in my database".
> 
> The reasons for this include default grants to PUBLIC and role membership, and
> even that is convoluted by INHERIT/NOINHERIT role attributes.

Chapman Flack expressed interest in reviewing the patch, but at that time
pg_get_acl() had not yet been committed and the view had not been renamed.

Michael Paquier alerted me that the CF bot had been red, and the patch was rebased.

/Joel


v2-0001-Add-pg_ownerships-and-pg_privileges-system-views.patch
Description: Binary data


Re: System views for versions reporting

2024-10-19 Thread Dmitry Dolgov
> On Mon, Oct 07, 2024 at 11:26:41AM GMT, Dmitry Dolgov wrote:
> > On Sun, Oct 06, 2024 at 12:01:29PM GMT, Joe Conway wrote:
> > I'm not sure why ICU is "Compile Time" rather than "Run Time" when it is not
> > statically linked.
>
> It reports U_UNICODE_VERSION at compile time. It's not necessarily
> correct though, I can try to replace it with the runtime version. I
> think there was some ICU functionality (something like
> u_getUnicodeVersion), which is maybe a better fit.
>
> > Also, if we are going to include ICU here, shouldn't we
> > also include libc version?
>
> Yeah, why not. One of my goals here is to identify a balanced set of
> useful versions to report.

Here is how it would look; I've added the ICU and glibc runtime
versions to the second patch.
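
With the patches applied, usage would be along these lines (component names
and output purely illustrative):

SELECT * FROM pg_system_versions;

  name  | version |     type
--------+---------+--------------
 core   | 18devel | Compile Time
 icu    | 15.1    | Run Time
 glibc  | 2.38    | Run Time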
From d903142d178dd8a9c03d07ee4809ac582b9b7818 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Sat, 5 Oct 2024 18:27:57 +0200
Subject: [PATCH v2 1/4] Add infrastructure for pg_system_versions view

Introduce a unified way of reporting versions (PostgreSQL itself, the
compiler, the host system, compile and runtime dependencies, etc.) via a
new system view pg_system_versions. This is going to be useful for
troubleshooting and should enhance bug reports, replacing manual,
bug-prone collection of the same information.

The view is backed by a hash table that contains callbacks returning the
version string for a particular component. The idea is to allow some
flexibility in reporting, making components responsible for how and when
the information is exposed.
---
 doc/src/sgml/system-views.sgml  |  56 
 src/backend/catalog/system_views.sql|   8 ++
 src/backend/utils/misc/Makefile |   3 +-
 src/backend/utils/misc/meson.build  |   1 +
 src/backend/utils/misc/system_version.c | 108 
 src/include/catalog/pg_proc.dat |   6 ++
 src/include/utils/system_version.h  |  40 +
 src/test/regress/expected/rules.out |   8 ++
 8 files changed, 229 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/utils/misc/system_version.c
 create mode 100644 src/include/utils/system_version.h

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 634a4c0..df0b9d3 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -226,6 +226,11 @@
   wait events
  
 
+ 
+  pg_system_versions
+  system versions
+ 
+
 

   
@@ -5064,4 +5069,55 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
   
  
 
+ 
+  pg_system_versions
+
+  
+   pg_system_versions
+  
+
+  
+   The view pg_system_versions provides description
+   about versions of various system components, e.g. PostgreSQL itself,
+   compiler used to build it, dependencies, etc.
+  
+
+  
+   pg_system_versions Columns
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   name text
+  
+  
+   Component name
+  
+ 
+
+ 
+  
+   version text
+  
+  
+   Component version
+  
+ 
+
+
+   
+  
+ 
+
 
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 7fd5d25..091013d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1377,3 +1377,11 @@ CREATE VIEW pg_stat_subscription_stats AS
 
 CREATE VIEW pg_wait_events AS
 SELECT * FROM pg_get_wait_events();
+
+CREATE VIEW pg_system_versions AS
+SELECT
+name, version,
+CASE type WHEN 0 THEN 'Compile Time'
+  WHEN 1 THEN 'Run Time'
+  END AS "type"
+FROM pg_get_system_versions();
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index d9f5978..fcb3be7 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -31,7 +31,8 @@ OBJS = \
sampling.o \
superuser.o \
timeout.o \
-   tzparser.o
+   tzparser.o \
+   system_version.o
 
 # This location might depend on the installation directories. Therefore
 # we can't substitute it into pg_config.h.
diff --git a/src/backend/utils/misc/meson.build 
b/src/backend/utils/misc/meson.build
index 6669502..ca2abc5 100644
--- a/src/backend/utils/misc/meson.build
+++ b/src/backend/utils/misc/meson.build
@@ -15,6 +15,7 @@ backend_sources += files(
   'rls.c',
   'sampling.c',
   'superuser.c',
+  'system_version.c',
   'timeout.c',
   'tzparser.c',
 )
diff --git a/src/backend/utils/misc/system_version.c 
b/src/backend/utils/misc/system_version.c
new file mode 100644
index 000..4d633fc
--- /dev/null
+++ b/src/backend/utils/misc/system_version.c
@@ -0,0 +1,108 @@
+/*
+ *
+ * system_version.c
+ *   Functions for reporting version of system components.
+ *
+ * A system component is defined very broadly here, it might 

Improve documentation regarding custom settings, placeholders, and the administrative functions

2024-10-19 Thread David G. Johnston
Hey!

Motivated by recent user complaints regarding our documentation of behavior
in this area, I propose the following to shore things up a bit.

There may be other places that need this coverage but it seems the most
likely place our users are going to experience this behavior is in using
set_config and current_setting.

Mostly I'm pointing out the fact that one can never take the null value to
be the actual value of a setting.  In terms of current_setting this then
establishes the fact that the null value it may return is an error-handling
alternative only and not something to be relied upon as being an actual
value of the setting.

The changes to custom options and set_config expand on the
"placeholder" concept already introduced and just document that set_config
can and will create such placeholders implicitly.

Also, in realizing that the null value is not technically a valid input for
new_value in set_config, a paragraph is added to explain how the null value
gets interpreted.
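
A sketch of the documented behavior (per the proposed wording; values
illustrative):

-- NULL new_value is interpreted as a request for the setting's default:
SELECT set_config('work_mem', NULL, false);  -- returns e.g. '4MB'

-- a syntactically valid two-part name implicitly creates a placeholder:
SELECT set_config('myapp.user_id', '42', false);
SELECT current_setting('myapp.user_id');     -- '42'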

I sorta get not wanting to encourage the use of custom options but saying
"have no function" just doesn't make sense. They certainly exist, and as
data they don't have any function on their own anyway. The surrounding
discussion regarding extensions gets the same point across sufficiently
without stating that an external loadable module is required when it is not.

David J.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 934ef5e469..4478d0aa91 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -23,7 +23,7 @@

 
  All parameter names are case-insensitive. Every parameter takes a
- value of one of five types: boolean, string, integer, floating point,
+ non-null value of one of five types: boolean, string, integer,
floating point,
  or enumerated (enum).  The type determines the syntax for setting the
  parameter:
 
@@ -11350,14 +11350,20 @@ dynamic_library_path =
'C:\tools\postgresql;H:\my_project\lib;$libdir'
 
  Because custom options may need to be set in processes that have not
  loaded the relevant extension module,
PostgreSQL
- will accept a setting for any two-part parameter name.  Such variables
- are treated as placeholders and have no function until the module that
- defines them is loaded. When an extension module is loaded, it will
add
+ will accept a setting for any two-part parameter name.
+ When an extension module is loaded, it will add
  its variable definitions and convert any placeholder values according
to
  those definitions.  If there are any unrecognized placeholders
  that begin with its extension name, warnings are issued and those
  placeholders are removed.
 
+
+
+ If a placeholder is created in a session it will exist for the
+ lifetime of the session unless removed by an extension.
+ Placeholders have a string data type with a reset value of the empty
string.
+
+



diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ad663c94d7..605bf533ee 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28157,7 +28157,7 @@ acl  |
{postgres=arwdDxtm/postgres,foo=r/postgres}
 text


-Returns the current value of the
+Returns the current non-null value of the
 setting setting_name.  If there is no such
 setting, current_setting throws an error
 unless missing_ok is supplied and
@@ -28191,6 +28191,17 @@ acl  |
{postgres=arwdDxtm/postgres,foo=r/postgres}
 use false instead. This function corresponds to
 the SQL command .

+   
+set_config accepts the NULL value for
+new_value, but as settings cannot be null
this input
+is interpreted as a request to set the setting to its default
value.
+   
+   
+If setting_name does not already exist
+set_config throws an error unless the
identifier is a valid
+custom option name,
in which it
+creates a placeholder with the empty string as its old value.
+   

 set_config('log_statement_stats', 'off', false)
 off


Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION

2024-10-19 Thread Noah Misch
On Thu, Sep 19, 2024 at 05:35:33PM -0400, Tom Lane wrote:
> So the fix seems clear to me: RestoreRelationMap needs to happen
> before anything that could result in catalog lookups.  I'm kind
> of inclined to move up the adjacent restores of non-transactional
> low-level stuff too, particularly RestoreReindexState which has
> direct impact on how catalog lookups are done.

Thanks for debugging that.  RestorePendingSyncs() also changes what
RelationInitPhysicalAddr() puts in the relcache entry, so it needs to stay
with RestoreRelationMap().  I'm attaching the fix I have in mind.  Since the
above (commit 126ec0b), the following has been getting an
AssertPendingSyncConsistency() trap:

  make check-tests TESTS=reindex_catalog PG_TEST_INITDB_EXTRA_OPTS='-cwal_level=minimal -cmax_wal_senders=0'

In commit range [6e086fa,126ec0b^], that failed differently, reflecting the
outdated relmapper.  (6e086fa is mostly innocent here, though.)

I looked for ways we'd regret not back-patching commit 126ec0b and this, but I
didn't find a concrete problem.  After InvalidateSystemCaches(), the
back-branch parallel worker relcache contains the nailed relations.  Each
entry then has:

- rd_locator possibly outdated
- rd_firstRelfilelocatorSubid==0 (correct for rd_locator)
- rd_isvalid==false, from RelationReloadNailed()

From code comments, I gather RelationCacheInitializePhase2() is the latest we
use an rd_locator without first making the relcache entry rd_isvalid=true.  If
that's right, so long as no catalogs get rd_isvalid=true between
InvalidateSystemCaches() and RestorePendingSyncs(), we're okay without a
back-patch.

I was going to add check-world coverage of this case, but I stopped in light
of the tricky deadlocks that led to commit fe4d022 disabling the
reindex_catalog test.  check-world does reach it, in inplace.spec, if one runs
with both wal_level=minimal and debug_parallel_query=regress.  (inplace.spec
may eventually fail with the same deadlocks.  I've not heard of it deadlocking
so far, and being a NO_INSTALLCHECK test helps.)
Author: Noah Misch 
Commit: Noah Misch 

Fix parallel worker tracking of new catalog relfilenumbers.

Reunite RestorePendingSyncs() with RestoreRelationMap().  If
RelationInitPhysicalAddr() ran after RestoreRelationMap() but before
RestorePendingSyncs(), the relcache entry could cause RelationNeedsWAL()
to return true erroneously.  Trouble required commands of the current
transaction to include REINDEX or CLUSTER of a system catalog.  The
parallel leader correctly derived RelationNeedsWAL()==false from the new
relfilenumber, but the worker saw RelationNeedsWAL()==true.  Worker
MarkBufferDirtyHint() then wrote unwanted WAL.  Recovery of that
unwanted WAL could lose tuples like the system could before commit
c6b92041d38512a4176ed76ad06f713d2e6c01a8 introduced this tracking.
RestorePendingSyncs() and RestoreRelationMap() were adjacent till commit
126ec0bc76d044d3a9eb86538b61242bf7da6db4, so no back-patch for now.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/access/transam/parallel.c 
b/src/backend/access/transam/parallel.c
index d4e84aa..4a2e352 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1421,17 +1421,18 @@ ParallelWorkerMain(Datum main_arg)
StartParallelWorkerTransaction(tstatespace);
 
/*
-* Restore relmapper and reindex state early, since these affect catalog
-* access.  Ideally we'd do this even before calling InitPostgres, but
-* that has order-of-initialization problems, and also the relmapper 
would
-* get confused during the CommitTransactionCommand call above.
+* Restore state that affects catalog access.  Ideally we'd do this even
+* before calling InitPostgres, but that has order-of-initialization
+* problems, and also the relmapper would get confused during the
+* CommitTransactionCommand call above.
 */
+   pendingsyncsspace = shm_toc_lookup(toc, PARALLEL_KEY_PENDING_SYNCS,
+  
false);
+   RestorePendingSyncs(pendingsyncsspace);
relmapperspace = shm_toc_lookup(toc, PARALLEL_KEY_RELMAPPER_STATE, 
false);
RestoreRelationMap(relmapperspace);
reindexspace = shm_toc_lookup(toc, PARALLEL_KEY_REINDEX_STATE, false);
RestoreReindexState(reindexspace);
-
-   /* Restore combo CID state. */
combocidspace = shm_toc_lookup(toc, PARALLEL_KEY_COMBO_CID, false);
RestoreComboCIDState(combocidspace);
 
@@ -1488,11 +1489,6 @@ ParallelWorkerMain(Datum main_arg)
SetTempNamespaceState(fps->temp_namespace_id,
  fps->temp_toast_namespace_id);
 
-   /* Restore pending syncs. */
-   pendingsyncsspace = shm_toc_lookup(toc, PARALLEL_KEY_PENDING_SYNCS,
- 

Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION

2024-10-19 Thread Tom Lane
Noah Misch  writes:
> Thanks for debugging that.  RestorePendingSyncs() also changes what
> RelationInitPhysicalAddr() puts in the relcache entry, so it needs to stay
> with RestoreRelationMap().  I'm attaching the fix I have in mind.

Ah.  No objections here.

regards, tom lane