Re: transforms

2025-04-19 Thread Chapman Flack
On 04/19/25 15:03, Tom Lane wrote:
> So what are the odds that outside PLs do it correctly (for whatever
> you think "correctly" is)?  It doesn't help any that we document
> none of this.

My sense of Déjà vu turns out to be because I had a bunch of proposed
new documentation and example code for plsample in a commitfest a few
years back[0] but had to set it aside at the time. And it's becoming clear
there's even more to be nailed down than I realized at the time anyway.

> If I had my druthers I would flush every one of these special rules,
> and just say that an argument/result gets transformed if there's a
> transform matching its declared type, full stop.  You want to
> transform arrays or domains, you make a transform to match.
> This would be a lot simpler to understand and potentially more
> flexible.  But I don't know if it falls foul of the wording of
> the spec, and I concede that in some cases it'd require users
> to do more work creating those additional transforms.

I don't see right offhand that it would fall afoul of the spec.

I had also proposed in that earlier CF to change some of what our doc
says about the spec. We currently say "There is a CREATE TRANSFORM command
in the SQL standard, but it is for adapting data types to client languages"
and I don't think it's as different as that suggests. 4.9.6 clearly says
the from-sql function is invoked whenever a value "is passed to a host-
language program *or an external routine*" and the to-sql function is
invoked whenever a value "is supplied by a host-language program *or an
external routine*" (asterisks mine).

The biggest difference is our use of type internal to allow transforming
to arbitrary PL data types: the ISO to- and from-sql functions have to
interface with the external routine using predefined SQL types.

ISO does have a notion of user-defined types being subtypes or supertypes
of each other, and sections 9.31 and 9.33 on determining the applicable
from-sql or to-sql function do involve examining a type's supertypes to
find one that has a transform matching the wanted transform group name.
I don't see anything, though, about peering into array or range or domain
types.

Perhaps we could regard a domain as a subtype of its base type, but
I do not see for sure that the spec does.

I think that a transform-must-match-declared-type rule would not be
at odds with the spec.

Regards,
-Chap


[0] https://commitfest.postgresql.org/patch/3554/




Re: transforms

2025-04-19 Thread Chapman Flack
On 04/19/25 19:32, Chapman Flack wrote:
> On 04/19/25 19:12, Tom Lane wrote:
>> You could argue that
>> CREATE FUNCTION foo(anyelement) RETURNS anyelement
>> TRANSFORM FOR TYPE int
>> AS ...;
>> should mean that if the actual argument type is int, then the
>> mentioned transform should be applied to the input and result;

Also if foo() RETURNS TRIGGER and tg_trigtuple/tg_newtuple have
int components.

Regards,
-Chap




Re: Rename injection point names in test_aio

2025-04-19 Thread Michael Paquier
On Wed, Apr 16, 2025 at 03:25:19PM -0700, Michael Paquier wrote:
> If anybody objects about the points getting renamed here, please let
> me know.

Applied this one as 114f7fa81c72.
--
Michael


signature.asc
Description: PGP signature


Re: [BUG] temporary file usage report with extended protocol and unnamed portals

2025-04-19 Thread Sami Imseih
> Sami Imseih  writes:
> > I think the solution proposed by Frédéric seems reasonable: to switch
> > the debug_query_string
> > inside PortalDrop. However, I do not like the way the
> > debug_query_string changes values
> > after the CreatePortal call inside exec_bind_message; that seems incorrect.
> > So, I believe we should temporarily switch the debug_query_string
> > value only while
> > running PortalDrop. Attached is what I think could be safer to do.
> > What do you think?
>
> I don't think this is safe at all.  The portal's query string
> is potentially shorter-lived than the portal, see in particular
> exec_simple_query which just passes a pointer to the original
> string in MessageContext.

Yes, you are correct. The comments about MessageContect in
mmgr/README give me pause.

"
This
is kept separate from per-transaction and per-portal contexts because a
query string might need to live either a longer or shorter time than any
single transaction or portal.
"""

> I'm frankly inclined to do nothing, but if we must do something,
> the way to fix it here would be to transiently set debug_query_string
> to NULL so that the actions of PortalDrop aren't blamed on any
> particular query.  (In my testing, the "temporary file:" message comes
> out without any attached STATEMENT most of the time already, so this
> isn't losing much as far as that's concerned.)

moreover, as I was looking into why the case mentioned earlier

```
SELECT FROM foo ORDER BY a \bind
;
SELECT 1 \bind
;
```

does not show STATEMENT after the temp file logging, I realized
it's because the temp files are cleaned up and reported at
the end of transaction, which means that debug_query_string is NULL at the
time the portal is dropped in the next query.  This causes
check_log_of_query to return false.

/* query string available? */
if (debug_query_string == NULL)
  return false;

> Perhaps a cleaner answer is to rearrange things in postgres.c
> so that if there's a pre-existing unnamed portal, we drop that
> before we ever set debug_query_string and friends at all.

That did cross my mind as well, but I was trying to avoid doing this
type of rearranging. I still rather not go down that path considering the
case mentioned above will still not display the query text in a STATEMENT log.

> I'm frankly inclined to do nothing, but if we must do something,
> the way to fix it here would be to transiently set debug_query_string
> to NULL so that the actions of PortalDrop aren't blamed on any
> particular query.

I think this is better, because I rather we avoid lines like the below in which
there are temp files being reported all with STATEMENT logging
of a different query. It's better to just not show STATEMENT at all.

```
2025-04-19 16:44:38.082 CDT [38963] STATEMENT:  SELECT 1
2025-04-19 16:44:38.115 CDT [38963] LOG:  temporary file: path
"base/pgsql_tmp/pgsql_tmp38963.1", size 1073741824
2025-04-19 16:44:38.115 CDT [38963] STATEMENT:  SELECT 1
2025-04-19 16:44:38.149 CDT [38963] LOG:  temporary file: path
"base/pgsql_tmp/pgsql_tmp38963.2", size 1073741824
2025-04-19 16:44:38.149 CDT [38963] STATEMENT:  SELECT 1
2025-04-19 16:44:38.305 CDT [38963] LOG:  temporary file: path
"base/pgsql_tmp/pgsql_tmp38963.3", size 1073741824
2025-04-19 16:44:38.305 CDT [38963] STATEMENT:  SELECT 1
2025-04-19 16:44:38.558 CDT [38963] LOG:  temporary file: path
"base/pgsql_tmp/pgsql_tmp38963.4", size 1073741824
2025-04-19 16:44:38.558 CDT [38963] STATEMENT:  SELECT 1
2025-04-19 16:44:38.744 CDT [38963] LOG:  temporary file: path
"base/pgsql_tmp/pgsql_tmp38963.5", size 1073741824
```


--
Sami Imseih
Amazon Web Services (AWS)




Re: transforms

2025-04-19 Thread Tom Lane
Oh, just in case this wasn't complicated enough already: what
to do with polymorphic arguments/results?

You could argue that

CREATE FUNCTION foo(anyelement) RETURNS anyelement
TRANSFORM FOR TYPE int
AS ...;

should mean that if the actual argument type is int, then the
mentioned transform should be applied to the input and result;
but if it's some other type then just do the normal conversions.

You could perhaps also argue that that's a bad idea.  I'm not sure.
It does make for very dynamic matching of transforms to arguments,
which feels dubious to me, but I can't put my finger on a bad
consequence.

We have not faced this issue in the in-core PLs because neither
plperl nor plpython allow polymorphic arguments/results.  (I'm
surprised that no one has yet wanted to fix that.)  But I will
bet that there are other PLs that do allow that, and if they
also implement transforms then there's probably precedent
out there already.

regards, tom lane




Re: transforms

2025-04-19 Thread Chapman Flack
On 04/19/25 19:12, Tom Lane wrote:
> You could argue that
> CREATE FUNCTION foo(anyelement) RETURNS anyelement
> TRANSFORM FOR TYPE int
> AS ...;
> should mean that if the actual argument type is int, then the
> mentioned transform should be applied to the input and result;
> but if it's some other type then just do the normal conversions.
> 
> You could perhaps also argue that that's a bad idea.  I'm not sure.

My position would be that it's not a bad idea. It would be consistent
and unastonishing.

A PL that uses a staging approach has probably precomputed and cached
a good deal of information, including what transforms are requested and
what parameter positions are polymorphic if any, and only needs to check
those when specializing, and funccache can preserve those results too.

Regards,
-Chap




Re: jsonapi: scary new warnings with LTO enabled

2025-04-19 Thread Daniel Gustafsson
> On 17 Apr 2025, at 17:48, Jacob Champion  
> wrote:
> 
> On Thu, Apr 17, 2025 at 8:20 AM Tom Lane  wrote:
>> I confirm this silences those warnings on my Fedora 41 box.
> 
> Instead of doing
> 
>lex = calloc(...);
>/* (error out on NULL return) */
>makeJsonLexContextCstringLen(lex, ...);
> 
> we need to do
> 
>lex = makeJsonLexContextCstringLen(NULL, ...);
>/* (error out on NULL return) */
> 
> so that JSONLEX_FREE_STRUCT is set correctly. Otherwise we'll leak the
> main allocation:

Since there is no way to determine if the allocation succeeded from outside of
the JSON api it might be better to keep the calloc and explicitly free it?

(Could a JsonLexContextBroken() function be useful perhaps?)

--
Daniel Gustafsson



v2-0001-Allocate-JsonLexContexts-on-the-heap-to-avoid-war.patch
Description: Binary data


Re: Typos in the code and README

2025-04-19 Thread David Rowley
On Sat, 19 Apr 2025 at 20:00, Alexander Lakhin  wrote:
> I've gathered the following collection of typos and inconsistencies
> appeared since 2025-01-01:

Here are a few more fixes of a similar ilk. All new in 2025,
predominantly from the last few days before feature freeze.

The few I wasn't a full 100% certain on were "big big" *could* be used
as an adjective, but "huge" or "massive" would have been better. I
didn't see the need so assumed it was a mistake. The "now now"
technically is ok in South Africa [1], but I don't believe that
meaning was the intention.

I found these with a manual trawl through:

git grep -E "\b(\w+[^long|^that]+)\s+\1\b" -- ':!*.po' ':!*.out'
':!*.data' ':!*.dat' ':!*.xml'

David

[1] https://en.wikipedia.org/wiki/List_of_South_African_slang_words


fix_duplicate_words.patch
Description: Binary data


Re: Typos in the code and README

2025-04-19 Thread Alexander Lakhin

Hello hackers,

I've gathered the following collection of typos and inconsistencies
appeared since 2025-01-01:
aio_worker.c -> method_worker.c

amcheck_lock_relation -> amcheck_lock_relation_and_check

be_key -> be_cancel_key

belonds -> belongs

cann -> can

CheckConstraintFetch -> CheckNNConstraintFetch
(renamed by a379061a2)

COLLID -> COLLOID

coltypemods -> coltypmods

columsn -> column

combinig -> combining

containting -> containing

non-exist datbase -> non-existent database

dead_end -> dead-end
(see a78af0427)

derefefence -> dereference

displyed -> displayed

droping -> dropping

es_eqp_active -> es_epq_active

es_part_prune_result -> es_part_prune_results

ExceDelete -> ExecDelete

excludes-chema -> exclude-schema

it fhe -> if the

GIN_TUPLE_ -> GIN_TUPLE_H

HandleFatalErrors -> HandleFatalError

happend -> reached
(to preserve the line width)

happenning -> happening

HIKEY -> P_HIKEY

IGNORE_CHECKSUM_FAILURE -> PIV_IGNORE_CHECKSUM_FAILURE

InitExecPartitionPruneContext -> InitExecPartitionPruneContexts

inititalization -> initialization

Injection_point -> Injection point

iterm -> item

$log_end -> remove unused variable
(align with src/test/modules/oauth_validator/t/001_server.pl)

looplback -> loopback

mappping -> mapping

MyWorkerId -> MyIoWorkerId

negotation -> negotiation

network-byte-order -> network byte order

parititioned* -> partitioned*

pending_null -> pending_nulls

permanentaly -> permanently

pg_localeconf_free -> pg_localeconv_free

PG_READ_ALL_STATS -> ROLE_PG_READ_ALL_STATS

pk_strategy -> pk_cmptype
(see 8123e91f5)

PM_WAIT_BACKEND -> PM_WAIT_BACKENDS

pollset -> poll set

predicable -> predictable

PRINT_STATS_STDERR -> PRINT_STATS_TO_STDERR

procced -> proceed

remining -> remaining

save_xmin -> saved_xmin

smgstartreadv -> smgrstartreadv

StartLocalBufer -> StartLocalBufferIO

statitistics -> statistics

swaping -> swapping

synq_queue_destroy -> sync_queue_destroy

thats -> that's

ther -> there

test-server -> test server

unfronzen -> unfrozen

upin -> unpin

Please find attached the complete patch for your convenience.

Best regards,
Alexander Lakhin
Neon (https://neon.tech)diff --git a/contrib/amcheck/verify_common.h b/contrib/amcheck/verify_common.h
index b2565bfbbab..e78adb68808 100644
--- a/contrib/amcheck/verify_common.h
+++ b/contrib/amcheck/verify_common.h
@@ -16,7 +16,7 @@
 #include "utils/relcache.h"
 #include "miscadmin.h"
 
-/* Typedefs for callback functions for amcheck_lock_relation */
+/* Typedefs for callback functions for amcheck_lock_relation_and_check */
 typedef void (*IndexCheckableCallback) (Relation index);
 typedef void (*IndexDoCheckCallback) (Relation rel,
 	  Relation heaprel,
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 318fe330518..3629243ed01 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -359,7 +359,7 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
 ptr->depth = stack->depth + 1;
 
 /*
- * Set rightmost parent key to invalid iterm pointer. Its
+ * Set rightmost parent key to invalid item pointer. Its
  * value is 'Infinity' and not explicitly stored.
  */
 if (rightlink == InvalidBlockNumber)
@@ -587,7 +587,7 @@ gin_check_parent_keys_consistency(Relation rel,
 
 		/*
 		 * Check if it is properly adjusted. If succeed,
-		 * procced to the next key.
+		 * proceed to the next key.
 		 */
 		if (ginCompareEntries(&state, attnum, current_key,
 			  current_key_category, parent_key,
diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index e1701bd56ef..6bfb9fb669e 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -289,7 +289,7 @@ pg_buffercache_pages(PG_FUNCTION_ARGS)
  *
  * Returns NUMA node ID for each memory page used by the buffer. Buffers may
  * be smaller or larger than OS memory pages. For each buffer we return one
- * entry for each memory page used by the buffer (it fhe buffer is smaller,
+ * entry for each memory page used by the buffer (if the buffer is smaller,
  * it only uses a part of one memory page).
  *
  * We expect both sizes (for buffers and memory pages) to be a power-of-2, so
@@ -335,7 +335,7 @@ pg_buffercache_numa_pages(PG_FUNCTION_ARGS)
 		 * how the pages and buffers "align" in memory - the buffers may be
 		 * shifted in some way, using more memory pages than necessary.
 		 *
-		 * So we need to be careful about mappping buffers to memory pages. We
+		 * So we need to be careful about mapping buffers to memory pages. We
 		 * calculate the maximum number of pages a buffer might use, so that
 		 * we allocate enough space for the entries. And then we count the
 		 * actual number of entries as we scan the buffers.
diff --git a/contrib/pg_overexplain/expected/pg_overexplain.out b/contrib/pg_overexplain/expected/pg_overexpl

Re: disabled SSL log_like tests

2025-04-19 Thread Tom Lane
Andrew Dunstan  writes:
> On 2025-04-18 Fr 7:26 PM, Tom Lane wrote:
> +See C.  CAUTION: use of either option requires that
> +the server's log_min_messages be at least DEBUG2, and that no other
> +client backend is launched concurrently.  These requirements allow
> +C to wait to see the postmaster-log report of backend
> +exit, without which there is a race condition as to whether we will
> +see the expected backend log output.

> That seems a little fragile. I can imagine test authors easily 
> forgetting this. Is it worth sanity checking to make sure 
> log_min_messages is appropriately set?

Setting log_min_messages is not so easily forgotten, because
connect_fails will just hang until timeout if you didn't.

I'm more worried about the "no other backend" requirement.
I think v2 is reasonably proof against that, but whether it's
sufficiently bulletproof to withstand the buildfarm environment
remains to be seen.  I wish there were a better way to
determine the backend PID for a failed connection...

regards, tom lane




Re: What's our minimum supported Python version?

2025-04-19 Thread Florents Tselai



> On 19 Apr 2025, at 7:17 PM, Tom Lane  wrote:
> 
> Our fine manual claims the answer to $SUBJECT is 3.2.
> 
> However, there is room to doubt that our code still actually
> works with 3.2, because the oldest version I can find being
> tested by the buildfarm is topminnow's 3.4.2.  The next oldest
> is shelduck's 3.4.10, and everything else has 3.5.2 or newer.
> topminnow hasn't reported in a couple months, so it may
> actually be dead.
> 
> The reason I bring this up is that I found out the hard way
> that src/test/modules/oauth_validator fails on RHEL8, because
> its oauth_server.py script is not compatible with the 3.6.8
> version of Python supplied by this distro.  (There are 22
> buildfarm animals running 3.6.8, presumably mostly also Red
> Hat-derived platforms, so I'm not going to apologize for
> my workstation being a little long in the tooth.)
> 
> I think we need to do some combination of moving our
> minimum-supported-version goalposts forward, making sure that
> whatever we claim is the minimum Python version is actually
> being tested in the buildfarm, and fixing oauth_server.py
> so that it works on that version.
> 
> I am not familiar enough with the Python landscape to
> have an informed opinion about what the minimum supported
> version should be.  But I'll present this data scraped
> from the buildfarm about how many animals are running what
> (counting animals that have reported since 2025-01-01):
> 
>  Count Version
> 
>  1 3.4.2
>  1 3.4.10
>  2 3.5.2
>  3 3.5.3
>  2 3.6.5
> 22 3.6.8
>  3 3.6.9
>  5 3.6.15
>  1 3.7.1
>  2 3.7.3
>  1 3.7.10
>  2 3.7.16
>  1 3.8.8
>  4 3.8.10
> 15 3.9.2
>  6 3.9.16
>  6 3.9.18
>  1 3.9.19
>  2 3.9.20
> 13 3.9.21
>  6 3.10.x
> 18 3.11.x
> 21 3.12.x
> 22 3.13.x
> 
>160 total
> 
> Thoughts?
> 
>   regards, tom lane
> 

From an Python ecosystem perspective,
 3.9 is the usual minimum that people use in CI matrices nowdays.
So if it was up to me, that’s what I’d choose. 

Mabe 3.7 if I was too liberal, but <3.6 and older it’s probably too old to 
bother.
Then again, I’m not a buildfarm power user, so my 2 cents are worth a bit less.







Re: [BUG] temporary file usage report with extended protocol and unnamed portals

2025-04-19 Thread Sami Imseih
I looked into this a bit more.

What occurs during exec_bind_message is that the debug_query_string of the
query is set early on. Later in that routine, a new portal is created
via CreatePortal,
which also drops the existing unnamed portal from the previous
execution and which
also logs the temp file information. So the logging is now using the
debug_query_string
of the current query and not the one referenced by the to-be-dropped portal.

This behavior means that there is a delay in log_temp_files logging,
since the portal
must be dropped, which can occur after the statement has completed
execution. We can look
into improving that, but I am not sure whether it is possible or can
be done safely.

I think the solution proposed by Frédéric seems reasonable: to switch
the debug_query_string
inside PortalDrop. However, I do not like the way the
debug_query_string changes values
after the CreatePortal call inside exec_bind_message; that seems incorrect.
So, I believe we should temporarily switch the debug_query_string
value only while
running PortalDrop. Attached is what I think could be safer to do.
What do you think?

--
Sami Imseih
Amazon Web Services (AWS)


v2-0001-Fix-race-condition-between-debug_query_string-and.patch
Description: Binary data


Re: transforms

2025-04-19 Thread Tom Lane
Chapman Flack  writes:
>Therefore:
> 5. You can create a transform for an array type, but can never refer to it
>in TRANSFORM FOR TYPE (you end up referring to the element type).
> 6. You can refer to a domain in TRANSFORM FOR TYPE, but that transform
>can never be created.
> These don't seem to be what the rules ought to be, but I don't have
> a strong intuition for what the rules ought to be instead.

Yeah, I noticed that inconsistency too, but hadn't gotten around to
researching what the spec says about it.  I don't especially like it
because it puts the onus on PL implementations to dig for the
appropriate transform to apply to an argument or result, and I don't
think they're likely to do that consistently.  It looks to me like
plperl and plpython are already inconsistent: plpython seems to be
digging down into arrays and domains, but plperl does not AFAICS.
So what are the odds that outside PLs do it correctly (for whatever
you think "correctly" is)?  It doesn't help any that we document
none of this.

If I had my druthers I would flush every one of these special rules,
and just say that an argument/result gets transformed if there's a
transform matching its declared type, full stop.  You want to
transform arrays or domains, you make a transform to match.
This would be a lot simpler to understand and potentially more
flexible.  But I don't know if it falls foul of the wording of
the spec, and I concede that in some cases it'd require users
to do more work creating those additional transforms.

regards, tom lane




Re: Typos in the code and README

2025-04-19 Thread Michael Paquier
On Sat, Apr 19, 2025 at 11:00:00AM +0300, Alexander Lakhin wrote:
> I've gathered the following collection of typos and inconsistencies
> appeared since 2025-01-01:

Good finds, Alexander, particularly for the set of function and
variable names used in the comments that do not match with the
reality.  I've checked all these, and they're good for me.

> GIN_TUPLE_ -> GIN_TUPLE_H

This one is interested.  It should be harmless in practice, but it
could cause conflicts in theory.
--
Michael


signature.asc
Description: PGP signature


Re: disabled SSL log_like tests

2025-04-19 Thread Andrew Dunstan



On 2025-04-18 Fr 7:26 PM, Tom Lane wrote:

I wrote:

What I think happened here is that a previous backend hadn't exited
yet when we start the test, and when its report does come out,
connect_fails prematurely stops waiting.  (In the above, evidently
the child process we want to wait for is 599, but we're fooled by
a delayed report for 25401.)  So my v1 patch needs work.
Maybe we can make the test compare the PIDs in the "forked new client
backend" and "client backend exited" log messages.  Stay tuned...

Okay, this version seems more reliable.



+See C.  CAUTION: use of either option requires that
+the server's log_min_messages be at least DEBUG2, and that no other
+client backend is launched concurrently.  These requirements allow
+C to wait to see the postmaster-log report of backend
+exit, without which there is a race condition as to whether we will
+see the expected backend log output.


That seems a little fragile. I can imagine test authors easily 
forgetting this. Is it worth sanity checking to make sure 
log_min_messages is appropriately set?



cheers


andrew







--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-19 Thread Christoph Berg
Re: Jacob Champion
> > libpq_append_conn_error(conn, "no custom OAuth flows are available,
> > and libpq-oauth could not be loaded library could not be loaded. Try
> > installing the libpq-oauth package from the same source that you
> > installed libpq from");
> 
> Thanks! I think that's a little too prescriptive for packagers,
> personally, but I agree that the current message isn't correct
> anymore. I've gone with "no custom OAuth flows are available, and the
> builtin flow is not installed".

This whole oauth business is highly confusing if you aren't a web
security expert. It's a pretty long way from "the builtin flow is not
installed" to "if you want this to work, you need to install an extra
library/package on the client", so I don't think this message is
helpful.

The originally suggested message was pretty good in that regard. The
distinction about custom flows could probably be dropped.

How about this:

  No libpq OAuth flows are available. (Try installing the libpq-oauth package.)

People who have custom flows will likely know that they have to do
anyway.

Devrim: Does that match the package name you'd use?

> (I suppose packagers could patch in a
> platform-specific message if they really wanted?)

We could, but I'd prefer if we didn't have to. :*)

Christoph




Re: [BUG] temporary file usage report with extended protocol and unnamed portals

2025-04-19 Thread Tom Lane
Sami Imseih  writes:
> I think the solution proposed by Frédéric seems reasonable: to switch
> the debug_query_string
> inside PortalDrop. However, I do not like the way the
> debug_query_string changes values
> after the CreatePortal call inside exec_bind_message; that seems incorrect.
> So, I believe we should temporarily switch the debug_query_string
> value only while
> running PortalDrop. Attached is what I think could be safer to do.
> What do you think?

I don't think this is safe at all.  The portal's query string
is potentially shorter-lived than the portal, see in particular
exec_simple_query which just passes a pointer to the original
string in MessageContext.

I'm frankly inclined to do nothing, but if we must do something,
the way to fix it here would be to transiently set debug_query_string
to NULL so that the actions of PortalDrop aren't blamed on any
particular query.  (In my testing, the "temporary file:" message comes
out without any attached STATEMENT most of the time already, so this
isn't losing much as far as that's concerned.)

The whole thing is just a band-aid, though.  debug_query_string
is not the only indicator of what the backend is currently doing.
If somebody comes along and complains that the pg_stat_activity
entry doesn't reflect what's happening, are we going to take
that seriously?

Perhaps a cleaner answer is to rearrange things in postgres.c
so that if there's a pre-existing unnamed portal, we drop that
before we ever set debug_query_string and friends at all.
This would add an extra portal hashtable lookup, but I'm not sure
if that would be measurable or not.  (Possibly we could get that
back by simplifying CreatePortal's API so that it doesn't need to
perform an initial lookup.)

regards, tom lane




Re: What's our minimum supported Python version?

2025-04-19 Thread Tom Lane
Florents Tselai  writes:
> On 19 Apr 2025, at 7:17 PM, Tom Lane  wrote:
>> I think we need to do some combination of moving our
>> minimum-supported-version goalposts forward, making sure that
>> whatever we claim is the minimum Python version is actually
>> being tested in the buildfarm, and fixing oauth_server.py
>> so that it works on that version.

> From an Python ecosystem perspective,
>  3.9 is the usual minimum that people use in CI matrices nowdays.
> So if it was up to me, that’s what I’d choose. 

Per these numbers, that would be cutting off 31% of the buildfarm,
including a lot of still-in-support distros such as RHEL8.
So I would say that's not likely to be our choice.

regards, tom lane




Re: SQL function to access to `creating_extension`

2025-04-19 Thread Florents Tselai


> On 19 Apr 2025, at 4:18 AM, Yurii Rashkovskii  wrote:
> 
> Hi,
> 
> I propose to introduce `pg_creating_extension()` function that would return 
> the OID of the extension being currently created (or `null` if none is).
> 
> The core motivation for it is to be able to establish data provenance in 
> tables created by extensions to be used in `pg_extension_config_dump` 
> configuration. This way, even if a secondary extension inserts data, it can 
> still be excluded from the dump by tracking data provenance in a column with 
> a default. Something like
> 
> ```
> create table my_config (
>--- ...
>extension_provenance oid default pg_creating_extension()
> )
> ```
> 
> This would allow for a generalized exclusion in pg_extension_config_dump.
> 
> I've attached a draft patch for this simple function. I am happy to finalize 
> it with tests and documentation if there is a consensus on the shape.
> 
> --
> Founder at Omnigres
> https://omnigres.com 
> 


Hi Yurii,  

+1 from me.  

I can see this being helpful, especially when dealing with shared tables or 
managing extension-specific data in configuration dumps.



Re: transforms

2025-04-19 Thread Chapman Flack
Also noticing about transforms:

1. protrftypes can have duplicates.

   In part, this is because CreateFunction does nothing to stop you saying
   redundant things like TRANSFORM FOR TYPE circle, FOR TYPE circle.

   But it is also because:

2. CreateFunction hands every type seen in TRANSFORM FOR TYPE
   to get_base_element_type, so if you write something like
   TRANSFORM FOR TYPE circle, FOR TYPE circle[], FOR TYPE arrcircle
   (where arrcircle is a domain over circle[]), protrftypes simply
   ends up {circle,circle,circle}.

3. get_base_element_type leaves alone a domain whose base type is not
   an array, so if bigcircle is a domain over circle, then
   TRANSFORM FOR TYPE bigcircle, FOR TYPE bigcircle[] will add
   {bigcircle,bigcircle} to protrftypes.

4. CreateTransform makes no use of get_base_element_type, but does reject
   any type that is a domain.

   Therefore:

5. You can create a transform for an array type, but can never refer to it
   in TRANSFORM FOR TYPE (you end up referring to the element type).

6. You can refer to a domain in TRANSFORM FOR TYPE, but that transform
   can never be created.

These don't seem to be what the rules ought to be, but I don't have
a strong intuition for what the rules ought to be instead.

Regards,
-Chap




Re: Typos in the code and README

2025-04-19 Thread Alexander Lakhin

Hello Michael,

19.04.2025 13:04, Michael Paquier wrote:

Good finds, Alexander, particularly for the set of function and
variable names used in the comments that do not match with the
reality.  I've checked all these, and they're good for me.


Thank you for your attention to this!


GIN_TUPLE_ -> GIN_TUPLE_H

This one is interested.  It should be harmless in practice, but it
could cause conflicts in theory.
--


I've included it in the collection because of:
#ifndef GIN_TUPLE_
...
#endif   /* GIN_TUPLE_H */

Best regards,
Alexander Lakhin
Neon (https://neon.tech)

What's our minimum supported Python version?

2025-04-19 Thread Tom Lane
Our fine manual claims the answer to $SUBJECT is 3.2.

However, there is room to doubt that our code still actually
works with 3.2, because the oldest version I can find being
tested by the buildfarm is topminnow's 3.4.2.  The next oldest
is shelduck's 3.4.10, and everything else has 3.5.2 or newer.
topminnow hasn't reported in a couple months, so it may
actually be dead.

The reason I bring this up is that I found out the hard way
that src/test/modules/oauth_validator fails on RHEL8, because
its oauth_server.py script is not compatible with the 3.6.8
version of Python supplied by this distro.  (There are 22
buildfarm animals running 3.6.8, presumably mostly also Red
Hat-derived platforms, so I'm not going to apologize for
my workstation being a little long in the tooth.)

I think we need to do some combination of moving our
minimum-supported-version goalposts forward, making sure that
whatever we claim is the minimum Python version is actually
being tested in the buildfarm, and fixing oauth_server.py
so that it works on that version.

I am not familiar enough with the Python landscape to
have an informed opinion about what the minimum supported
version should be.  But I'll present this data scraped
from the buildfarm about how many animals are running what
(counting animals that have reported since 2025-01-01):

  Count Version

  1 3.4.2
  1 3.4.10
  2 3.5.2
  3 3.5.3
  2 3.6.5
 22 3.6.8
  3 3.6.9
  5 3.6.15
  1 3.7.1
  2 3.7.3
  1 3.7.10
  2 3.7.16
  1 3.8.8
  4 3.8.10
 15 3.9.2
  6 3.9.16
  6 3.9.18
  1 3.9.19
  2 3.9.20
 13 3.9.21
  6 3.10.x
 18 3.11.x
 21 3.12.x
 22 3.13.x

160 total

Thoughts?

regards, tom lane




Re: transforms

2025-04-19 Thread Tom Lane
BTW, there is another way in which the current rules for transform
application are not self-consistent.  Postgres has three kinds of
container types: arrays, composites, and ranges.  But the transform
code special-cases only arrays.  I'd have thought that if the idea is
to specify transforms only for base types, then TRANSFORM FOR TYPE
ought to recurse into composites and list their field types in
protrftypes.  It doesn't, but if you look at plpython it appears
that

(1) you can create a transform for a composite type and it will be
applied;
(2) if you don't, the code recurses to the fields and will apply
per-field-type transforms then.

plperl will effectively do (1) but not (2), if I'm reading it right.

Ranges are a bit odd in that their semantics are not just a collection
of element/field types, so it perhaps makes sense that we don't
attempt to apply base-type transforms to their contents.  But it's
sure inconsistent that arrays and composites aren't handled more
nearly alike here.

regards, tom lane




Re: Unicode full case mapping: PG_UNICODE_FAST, and standard-compliant UCS_BASIC

2025-04-19 Thread Jeff Davis
On Thu, 2025-04-17 at 06:58 -0700, Noah Misch wrote:
> Should initcap_wbnext() pass in a locale-dependent "bool posix"
> argument like
> the others calls the commit changed?

Yes, I believe you are correct. Patch and tests attached.

> Long-term, pg_u_isword() should have a "bool posix" argument. 
> Currently, only
> tests call that function.  If it got a non-test caller,
> https://www.unicode.org/reports/tr18/#word would have pg_u_isword()
> follow the
> choice of posix compatibility like pg_u_isalnum() does.

I based those functions on:

https://www.unicode.org/reports/tr18/#Compatibility_Properties

and the "word" class does not have a POSIX variant. But Postgres has
two documented definitions for "word" characters:

 * for regexes, alnum + "_"
 * for INITCAP(), just alnum

and the above definition doesn't match up with either one, which is why
we don't use it.

ICU INITCAP() uses the ICU definition of word boundaries, so doesn't
match our documentation.

We could adjust our documentation to allow for provider-dependent
definitions of word characters, which might be a good idea. But that
still doesn't quite capture ICU's more complex definition of word
boundaries.

Or, we could remove those unused functions for now, and figure out if
there's a reason to add them back later. They are probably adding more
confusion than anything.

Regards,
Jeff Davis

From ff3f5cc9e7a725e2c1d324c77dd534d77ea085a4 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Sat, 19 Apr 2025 11:47:44 -0700
Subject: [PATCH v1] Fix INITCAP() word boundaries for PG_UNICODE_FAST.

Word boundaries are based on whether a character is alphanumeric or
not. For the PG_UNICODE_FAST collation, alphanumeric includes
non-ASCII digits; whereas for the PG_C_UTF8 collation, it only
includes digits 0-9. Pass down the right information from the
pg_locale_t into initcap_wbnext to differentiate the behavior.

Reported-by: Noah Misch 
Discussion: https://postgr.es/m/20250417135841.33.nmi...@google.com
---
 src/backend/utils/adt/pg_locale_builtin.c  |  4 +++-
 src/common/unicode/case_test.c | 13 -
 src/test/regress/expected/collate.utf8.out |  8 ++--
 src/test/regress/sql/collate.utf8.sql  |  2 ++
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 125b10ff7ab..f51768830cd 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -40,6 +40,7 @@ struct WordBoundaryState
 	const char *str;
 	size_t		len;
 	size_t		offset;
+	bool		posix;
 	bool		init;
 	bool		prev_alnum;
 };
@@ -58,7 +59,7 @@ initcap_wbnext(void *state)
 	{
 		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
 		wbstate->offset);
-		bool		curr_alnum = pg_u_isalnum(u, true);
+		bool		curr_alnum = pg_u_isalnum(u, wbstate->posix);
 
 		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
 		{
@@ -92,6 +93,7 @@ strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		.str = src,
 		.len = srclen,
 		.offset = 0,
+		.posix = !locale->info.builtin.casemap_full,
 		.init = false,
 		.prev_alnum = false,
 	};
diff --git a/src/common/unicode/case_test.c b/src/common/unicode/case_test.c
index f0b38b3bdd7..fdfb62e8552 100644
--- a/src/common/unicode/case_test.c
+++ b/src/common/unicode/case_test.c
@@ -41,6 +41,7 @@ struct WordBoundaryState
 	const char *str;
 	size_t		len;
 	size_t		offset;
+	bool		posix;
 	bool		init;
 	bool		prev_alnum;
 };
@@ -55,7 +56,7 @@ initcap_wbnext(void *state)
 	{
 		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
 		wbstate->offset);
-		bool		curr_alnum = pg_u_isalnum(u, true);
+		bool		curr_alnum = pg_u_isalnum(u, wbstate->posix);
 
 		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
 		{
@@ -112,10 +113,13 @@ icu_test_full(char *str)
 	char		icu_upper[BUFSZ];
 	char		icu_fold[BUFSZ];
 	UErrorCode	status;
+
+	/* full case mapping doesn't use posix semantics */
 	struct WordBoundaryState wbstate = {
 		.str = str,
 		.len = strlen(str),
 		.offset = 0,
+		.posix = false,
 		.init = false,
 		.prev_alnum = false,
 	};
@@ -344,6 +348,12 @@ test_convert_case()
 	test_convert(tfunc_lower, "σς'Σ' ΣΣ'Σ'", "σς'ς' σσ'ς'");
 	test_convert(tfunc_title, "σςΣ ΣΣΣ", "Σςς Σσς");
 	test_convert(tfunc_fold, "σςΣ ΣΣΣ", "σσσ σσσ");
+	/* test that alphanumerics are word characters */
+	test_convert(tfunc_title, "λλ", "Λλ");
+	test_convert(tfunc_title, "1a", "1a");
+	/* U+FF11 FULLWIDTH ONE is alphanumeric for full case mapping */
+	test_convert(tfunc_title, "\uFF11a", "\uFF11a");
+
 
 #ifdef USE_ICU
 	icu_test_full("");
@@ -354,6 +364,7 @@ test_convert_case()
 	icu_test_full("abc 123xyz");
 	icu_test_full("σςΣ ΣΣΣ");
 	icu_test_full("ıiIİ");
+	icu_test_full("\uFF11a");
 	/* test  */
 	icu_test_full("\u0391\u0345\u0301");
 #endif
diff --git a/src/test/regress/expected/collate.utf8.out b/src/test/regress/expected/collate.utf8.out