Problem with psprintf and intmax_t (%jd)

2020-11-13 Thread Jan Behrens
Dear all,

I'm facing a problem with psprintf and the %jd format string. I used
the following C-code:

PG_RETURN_CSTRING(psprintf("%d@%jd", (int)1, (intmax_t)2));

While this worked fine in past, I recently get (with PostgreSQL 13):

ERROR:  vsnprintf failed: Invalid argument with format string "%d@%jd"

I was not able to figure out what's the problem. Apparently this error
is thrown when vsnprintf(buf, len, fmt, args) in src/common/psprintf.c
returns a negative value. However, calling snprintf(buf, len, "%d@%jd",
(int)1, (intmax_t)2)) in a separate test program works fine on my
system.

I found a workaround. The following call works fine:

PG_RETURN_CSTRING(psprintf("%d@%lld", (int)1, (long long)2));

I could live with the workaround, but it is a bit scary, as I don't
really understand why "%jd" fails while "%lld" does not. Does anyone
know why, and/or is it possible for anyone else to reproduce the
problem?

Kind regards,
Jan Behrens




Transaction isolation level Repeatable Read Read Only vs Serializable Read Only

2020-11-26 Thread Jan Behrens
Dear all,

I have some questions regarding the transaction isolation level REPEATABLE READ 
and it's documentation at

[1] https://www.postgresql.org/docs/13/transaction-iso.html
and
[2] https://www.postgresql.org/docs/13/sql-set-transaction.html

As far as I understood, a read-only transaction with isolation level REPEATABLE 
READ will see only changes committed before that transaction started. [1] 
states that, "The Repeatable Read isolation level only sees data committed 
before the transaction began; it never sees either uncommitted data or changes 
committed during transaction execution by concurrent transactions.", and [2] 
states for REPEATABLE READ: "All statements of the current transaction can only 
see rows committed before the first query or data-modification statement was 
executed in this transaction."

I understand that in a read+write scenario, two concurrent transactions may 
still lead to a result that could not have occurred if those two transactions 
were executed one after the other. However, in a read-only case, I do not see 
how REPEATABLE READ could differ from SERIALIZABLE. Yet [1] explains that:

"The Repeatable Read mode provides a rigorous guarantee that each transaction 
sees a completely stable view of the database. However, this view will not 
necessarily always be consistent with some serial (one at a time) execution of 
concurrent transactions of the same level. For example, even a read only 
transaction at this level may see a control record updated to show that a batch 
has been completed but not see one of the detail records which is logically 
part of the batch because it read an earlier revision of the control record."

If a REPEATABLE READ READ ONLY transaction only sees data commited from 
transactions before it began (more precisely "before the first query or 
data-modification statement was executed", as explained in [2]), I do not 
understand how this can lead to an inconsistent view. Of course, two other 
reading+writing transactions could create an inconsistent result, but such a 
result would also be read by a subsequent SERIALIZABLE READ ONLY transaction. 
Thus, what is the difference between "REPEATABLE READ READ ONLY" and 
"SERIALIZABLE READ ONLY"?

There should be a difference, as [2] explicitly says that the "DEFERRABLE" 
option is available only for SERIALZABLE READ ONLY transactions. I therefore 
conclude that the two levels REPEATABLE READ and SERIALIZABLE act different -- 
also in the READ ONLY case.

However, [1] states that REPEATABLE READ is implemented as "Snapshots 
Isolation" as defined in [berenson95] ("A Critique of ANSI SQL Isolation 
Levels"), see: 
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf

In that paper, Table 4 on page 11 states that Snapshot Isolation allows only 
the A5B (Write Skew) anomaly, which is a scenario where two transactions 
concurrently write. The A5A case (Read Skew) is explicitly prohibited. This 
also matches [2], where it says: "All statements of the current transaction can 
only see rows committed before the first query or data-modification statement 
was executed in this transaction."

Thus, what does SERIALIZABLE READ ONLY achieve that REPEATABLE READ READ ONLY 
does not? And what is SERIALIZABLE READ ONLY DEFERRABLE for?

Any hints to make me better understand this issue are appreciated.

Kind regards,
Jan Behrens




Re: Transaction isolation level Repeatable Read Read Only vs Serializable Read Only

2020-11-26 Thread Jan Behrens
On Thu, 26 Nov 2020 19:13:53 -0500
Mohamed Wael Khobalatte  wrote:

> >
> > Thus, what does SERIALIZABLE READ ONLY achieve that REPEATABLE READ READ
> > ONLY does not? And what is SERIALIZABLE READ ONLY DEFERRABLE for?
> >
> 
> There is a nice walkthrough of the "control" and "batches/details" scenario
> mentioned in the docs, you can find it in the wiki,
>  and I think it
> explains the difference well.

Thank you, it helped me a lot.

If there are two REPEATABLE READ READ WRITE transactions, which do *not* 
require SERIALIZABLE isolation, there could still be a third READ ONLY 
transaction, which has stricter requirements on isolation when retrieving data. 
SERIALIZABLE READ ONLY will fail under certain circumstances in which case the 
retrieved information should be deemed void and re-requested in a new 
transaction in order to fit the extra requirements on isolation of the 
read-only transaction.

Then "DEFERRABLE" makes sense too, as it will make the third transaction block 
for a certain time instead of risking cancellation.

Thanks
Jan




GiST index on INT8, possible bug in query planner?

2018-12-03 Thread Jan Behrens
Dear colleagues,

I have developed two indices using PostgreSQL's awesome GiST support,
one of them available here:

http://www.public-software-group.org/pgLatLon

(which is a lightweight and MIT-licensed alternative to PostGIS for
certain simple tasks involving geographic coordinates on the WGS-84
spheroid)


Recently I had the requirement of creating a multi-column index on
an integer in the first column and a custom data type in the second
column of the index. Since integers are not supported by GiST indices
by default, I used the btree_gist extension by Teodor Sigaev,
Oleg Bartunov, Janko Richter, and Paul Jungwirth, see:
https://www.postgresql.org/docs/10/btree-gist.html

However, the GiST index seems not to work as expected by me when
64-bit integers are involved. I tried to create a minimal
proof-of-concept to demonstrate this. Consider the following setup:

CREATE EXTENSION btree_gist;

CREATE TABLE test4_btree (id SERIAL4, ctx INT4);
CREATE TABLE test8_btree (id SERIAL4, ctx INT8);
CREATE TABLE test4_gist (id SERIAL4, ctx INT4);
CREATE TABLE test8_gist (id SERIAL4, ctx INT8);

I create multi-column indices on all four tables, with "ctx" as primary
and "id" as secondary column:

CREATE INDEX ON test4_btree (ctx, id);
CREATE INDEX ON test8_btree (ctx, id);
CREATE INDEX ON test4_gist USING gist (ctx, id);
CREATE INDEX ON test8_gist USING gist (ctx, id);

Now we add some data:

INSERT INTO test4_btree (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 1);
INSERT INTO test8_btree (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 1);
INSERT INTO test4_gist (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 1);
INSERT INTO test8_gist (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 1);

Only the tables directly using the B-tree index ("test4_btree" and
"test8_btree") and the table where "ctx" is 32-bit wide seem to work
properly:

EXPLAIN SELECT * FROM test4_btree WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test8_btree WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test4_gist WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test8_gist WHERE ctx = 1 AND id = 2;
-- uses Index Cond: (id = 2)

The query planning for the select on table "test8_gist" does not
include "ctx" in the "Index Cond".


To verify that the above problem isn't just an optimization because of
a low row count, I created a larger example with different values:

CREATE EXTENSION btree_gist;

CREATE TABLE test4_btree (ctx INT4, src INT4);
CREATE TABLE test8_btree (ctx INT8, src INT4);
CREATE TABLE test4_gist (ctx INT4, src INT4);
CREATE TABLE test8_gist (ctx INT8, src INT4);

CREATE INDEX ON test4_btree (ctx, src);
CREATE INDEX ON test8_btree (ctx, src);
CREATE INDEX ON test4_gist USING gist (ctx, src);
CREATE INDEX ON test8_gist USING gist (ctx, src);

INSERT INTO test4_btree SELECT floor(random()*1)+1, floor(random()*2)+1 
FROM generate_series(1, 100);
INSERT INTO test8_btree SELECT floor(random()*1)+1, floor(random()*2)+1 
FROM generate_series(1, 100);
INSERT INTO test4_gist SELECT floor(random()*1)+1, floor(random()*2)+1 FROM 
generate_series(1, 100);
INSERT INTO test8_gist SELECT floor(random()*1)+1, floor(random()*2)+1 FROM 
generate_series(1, 100);

EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: (src = 2)

ANALYZE;

EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;
-- does not use Index Cond at all, but Filter: ((ctx = 1) AND (src = 2))

SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;  -- slow!

The query on "test8_gist" is significantly slower than in all other
three cases.


I wonder if this is a bug in the query planner, in the GiST facilities
of PostgreSQL, a problem of the "btree_gist" extension, or something
else? Can anyone help me?


Kind rega

Re: GiST index on INT8, possible bug in query planner?

2018-12-03 Thread Jan Behrens
On Mon, 03 Dec 2018 11:47:17 -0500
Tom Lane  wrote:

> Jan Behrens  writes:
> 
> > However, the GiST index seems not to work as expected by me when
> > 64-bit integers are involved. I tried to create a minimal
> > proof-of-concept to demonstrate this. Consider the following setup:
> > 
> > CREATE TABLE test8_gist (id SERIAL4, ctx INT8);
> > CREATE INDEX ON test8_gist USING gist (ctx, id);
> > EXPLAIN SELECT * FROM test8_gist WHERE ctx = 1 AND id = 2;
> > -- uses Index Cond: (id = 2)
> > 
> > The query planning for the select on table "test8_gist" does not
> > include "ctx" in the "Index Cond".
> 
> Probably it would if you'd written "WHERE ctx = 1::int8".  Without
> the cast, what you'll have is "int8 = int4", and I suspect that
> btree_gist doesn't include cross-type operators in its opclasses.
> 
>   regards, tom lane

You are right! I just tested it and ::int8 does the job.

It might be good to add a short notice or warning in the documentation
at: https://www.postgresql.org/docs/current/btree-gist.html

It might help other people who run into the same problem.


Thanks for helping me,
Jan Behrens

-- 
Public Software Group e. V.
Johannisstr. 12, 10117 Berlin, Germany

www.public-software-group.org
vorst...@public-software-group.org

eingetragen in das Vereinregister
des Amtsgerichtes Charlottenburg
Registernummer: VR 28873 B

Vorstände (einzelvertretungsberechtigt):
Jan Behrens
Axel Kistner
Andreas Nitsche
Björn Swierczek



(When) can a single SQL statement return multiple result sets?

2024-04-10 Thread Jan Behrens
Hello,

While writing a PostgreSQL client library for Lua supporting
Pipelining (using PQsendQueryParams), I have been wondering if there
are any single SQL commands that return multiple result sets. It is
indeed possible to create such a case by using the RULE system:

db=> CREATE VIEW magic AS SELECT;
CREATE VIEW
db=> CREATE RULE r1 AS ON DELETE TO magic
db-> DO INSTEAD SELECT 42 AS "answer";
CREATE RULE
db=> CREATE RULE r2 AS ON DELETE TO magic
db-> DO ALSO SELECT 'Hello' AS "col1", 'World!' AS "col2";
CREATE RULE
db=> DELETE FROM magic; -- single SQL statement!
 answer 

 42
(1 row)

 col1  |  col2  
---+
 Hello | World!
(1 row)

DELETE 0

Here, "DELETE FROM magic" returns multiple result sets, even though it
is only a single SQL statement.

(Note that this isn't possible with rules ON SELECT because it is only
allowed to create a single SELECT rule on a view.)

The case outlined above seems to be a somewhat special case. I haven't
found any other way to return multiple results (other than sending
several semicolon-separated statements, which is not supported by
PQsendQueryParams). So is there any (other) case where I reasonably
should expect several result sets returned by PQgetResult (before
PQgetResult returns NULL)? Wouldn't it make sense to disallow such
behavior altogether? And if not, why can't I write a stored procedure
or function that returns multiple result sets?

These questions are relevant to me because it may have an effect on the
API design if a statement can return several result sets.

Kind regards,
Jan Behrens




No error message/code for commands after libpq pipeline abortion

2024-04-15 Thread Jan Behrens
Hello,

I am writing a Lua client library for PostgreSQL using libpq that
supports pipelining in an async/effect-based context.

I understand that when an error occurs, all subsequent commands also
fail (where PQgetResult returns PGRES_PIPELINE_ABORTED).

However, only the first command has an error message
via PQresultErrorMessage and an error code via
PQresultErrorField(..., PG_DIAG_SQLSTATE) set.

All other, subsequently obtained results lack an error message or an
SQLSTATE. I would expect something like "ERROR:  command failed due to
previous failure in pipeline", but I don't get anything like that.

Why is that? Is it intentional? Is it documented somewhere?

Further question: How do I deal with it with regard to the user of my
library? Should I make up my own error message? Which error code could
I set? I didn't find any appropriate error code in Appendix A of the
documentation.

https://www.postgresql.org/docs/16/errcodes-appendix.html

My current approach is to just set "*" as an error code, which doesn't
feel good:

https://github.com/JanBeh/neumond/blob/9889c5fc19dbb23dff483e4a1cb2e4ef3ab74085/pgeff.c#L332-L345

Maybe I'm just overlooking something?

Kind regards,
Jan Behrens




Re: (When) can a single SQL statement return multiple result sets?

2024-04-15 Thread Jan Behrens
On Wed, 10 Apr 2024 19:02:48 -0400
Tom Lane  wrote:

> Jan Behrens  writes:
> > While writing a PostgreSQL client library for Lua supporting
> > Pipelining (using PQsendQueryParams), I have been wondering if there
> > are any single SQL commands that return multiple result sets.
> 
> Right now, I don't think so.  I believe the current protocol design
> intends to support that, and I think this may trace back to some
> ancient idea at Berkeley that if you select from an inheritance
> hierarchy where the child tables aren't all alike, you should be
> able to see all the child data, which'd require changing tuple
> descriptors midstream.  But our current interpretation of SQL
> SELECT forbids that.

I thought multiple result sets are supported for commands like PQexec,
where "Multiple queries sent in a single PQexec call" are explictly
supported, and which then return multiple result set. This, however,
doesn't apply to pipelining because PQexec is not available in
pipelining mode.

> 
> > Here, "DELETE FROM magic" returns multiple result sets, even though it
> > is only a single SQL statement.
> 
> Right, so it's kind of a case that you have to support.  We're not
> likely to rip out rules anytime soon, even if they're a bit
> deprecated.

As it seems to be a corner case that rarely occurs in practice, I was
considering to simply not support this case in my client library. I
don't know which SQL error code I could return in that case though.
Maybe "0A000" (feature_not_supported) or
"21000" (cardinality_violation). Not sure if either of those is a good
choice. Any better idea?

> 
> > The case outlined above seems to be a somewhat special case. I haven't
> > found any other way to return multiple results (other than sending
> > several semicolon-separated statements, which is not supported by
> > PQsendQueryParams). So is there any (other) case where I reasonably
> > should expect several result sets returned by PQgetResult (before
> > PQgetResult returns NULL)? Wouldn't it make sense to disallow such
> > behavior altogether?
> 
> No.  For one thing, there's too much overlap between what you're
> suggesting and pipelined queries.

To which question was "no" the answer to. I'm not sure if I understand.

> 
>   regards, tom lane
> 

Regards,
Jan Behrens




Re: search_path for PL/pgSQL functions partially cached?

2024-12-27 Thread Jan Behrens
On Fri, 27 Dec 2024 13:26:28 -0700
"David G. Johnston"  wrote:

> > Or is it documented somewhere?
> 
> https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING

I can't find any notes regarding functions and schemas in that section.

> Can someone explain to me what's going on, and what is the best practice to
> > deal with it? Is there a way to avoid fully qualifying every type and
> > expression? Which parts do I have to qualify or is this something that
> > could be fixed in a future version of PostgreSQL?
> >
> 
> Add qualification or attach a “set search_path” clause to “create
> function”.  Code stored in the server should not rely on the session
> search_path.
> 
> David J.

In my (real world) case, I was unable to use "SET search_path FROM
CURRENT" because it isn't possible to use "SET" in procedures that use
transactions, due to this documented limitation:

"If a SET clause is attached to a procedure, then that procedure cannot
execute transaction control statements (for example, COMMIT and
ROLLBACK, depending on the language)."

https://www.postgresql.org/docs/17/sql-createprocedure.html

My procedure looks more or less like this:

CREATE PROCEDURE "myfunc"()
  LANGUAGE plpgsql AS
  $$
  DECLARE
"old_search_path" TEXT;
-- some more variables
  BEGIN
SELECT current_setting('search_path') INTO "old_search_path";
SET search_path TO 'myschema';
-- some code that uses COMMIT and SET TRANSACTION ISOLATION LEVEL
PERFORM set_config('search_path', "old_search_path", FALSE);
  END;
  $$;

My question is: Am I safe if I use fully-qualified types in the DECLARE
section only? Or do I need to provide full qualification also in the
code below (after SET search_path TO 'myschema')?

And bonus question: Is it documented somewhere?

Maybe not many people run into these issues because schemas and
functions aren't used as often in combination?

Kind Regards
Jan Behrens




Re: search_path for PL/pgSQL functions partially cached?

2025-01-03 Thread Jan Behrens
On Fri, 3 Jan 2025 08:34:57 -0700
"David G. Johnston"  wrote:

> On Friday, January 3, 2025, Jan Behrens  wrote:
> >
> > I would like to know if the above example is correct. It seems overall
> > bulky, but I haven't found a better way, assuming that it can be
> > unknown where a particular extension has been installed to. In
> > particular I feel a bit insecure about where I have to fully qualify,
> > and where not. See the comments in the code above.
> 
> 
> Short answer, you cannot looking at a definition and know the answer -
> whether the code is going to be executed in a sanitized search_path is what
> matters.

I don't understand. Do you mean my last example is wrong / insecure?
If so, why?

> Anything that would be executed during pg_restore has to be made
> safe.  Therefore, code that is only ever executed by applications directly
> can use swarch_path.

Why should the function be executed during pg_restore?

> 
> I’d probably modify the function signature to take search_path as a second
> optional argument and then invoke a set search_path within the function.
> At worse the caller can place current_setting(search_path) as the value of
> that argument though being explicit would be recommended.
> 
> David J.

I could do that, but I would like to understand if that is really
necessary as it makes the interface more complicated, and I would like
to avoid unnecessary complexity in my interface.

Is it really impossible to have functions without SET search_path in
the definition of a PL/pgSQL function if I fully-qualify all types in
the DECLARE section and if all other non-qualified identifiers occur
after set_config('search_path', ...)?

Kind regards,
Jan Behrens




Re: search_path for PL/pgSQL functions partially cached?

2025-01-03 Thread Jan Behrens
On Fri, 3 Jan 2025 10:16:15 -0700
"David G. Johnston"  wrote:

> It is at risk because it depends on the session search_path.  That is all.
> Whether that risk turns into a failure to execute depends on how/when it is
> executed.  I'm not that comfortable talking about security risks in this
> context though the current design goal is to mitigate such security issues
> by setting things up so the function execution fails rather than is
> executed insecurely.  This is presently mainly done by setting the
> search_path to just effectively pg_catalog before executing the query,
> breaking any code depending on other schemas existing in the search_path.

I'm not sure if there is a misunderstanding. In my last example (e-mail
dated Fri, 3 Jan 2025 13:53:32 +0100), the user who has control over
the contents of the "query_p" argument is an application programmer,
not a real end-user. The function is also *not* marked as SECURITY
DEFINER, so it always runs with the privileges of the caller. I don't
see any specific security risk here, except that I'm unsure if the
function is written properly with regard to qualification of the used
types after PL/pgSQL's BEGIN. As I learned, I must fully-qualify types
*before* the BEGIN, i.e. in the DECLARE section. But does this also
hold for types after the BEGIN when I previously ensure that the
search_path is correctly set (set within the function's body)?

> 
> > > Anything that would be executed during pg_restore has to be made
> > > safe.  Therefore, code that is only ever executed by applications
> > directly
> > > can use swarch_path.
> >
> > Why should the function be executed during pg_restore?
> 
> 
> If the function is used in building an index, or a materialized view, are
> the common cases.  Trigger functions too.
> 
> Note, this is talking about evaluating functions generally, not the one
> provided here specifically.

I don't think my function would be evaluated during a pg_restore then.

> 
> > I could do that, but I would like to understand if that is really
> > necessary as it makes the interface more complicated, and I would like
> > to avoid unnecessary complexity in my interface.
> >
> > Is it really impossible to have functions without SET search_path in
> > the definition of a PL/pgSQL function if I fully-qualify all types in
> > the DECLARE section and if all other non-qualified identifiers occur
> > after set_config('search_path', ...)?
> >
> If you add a set_config to the body of the function then you indeed avoid
> the problem.  It is basically equivalent to adding a SET clause to the
> create function command.  In this case even when the function is executed
> in a sanitized search_path environment (such as the one established by
> pg_restore) you are not relying on it.  That non-reliance is all that
> really matters.
> 
> David J.

But if I use "SET search_path FROM CURRENT", then the called function
won't know the search_path that is set at the caller's side (which is
what I need to make my interface nice to use).

I would prefer my current solution, but I would like to be sure that my
example (the one in my e-mail dated Fri, 3 Jan 2025 13:53:32 +0100) is
correct. I still am not sure about that.

Kind Regards,
Jan Behrens




Re: search_path for PL/pgSQL functions partially cached?

2025-01-03 Thread Jan Behrens
On Fri, 3 Jan 2025 13:56:02 -0800
Adrian Klaver  wrote:

> At this point I am lost as to what the overall goal of this is.
> 
> Can you provide a 1 ft view if what it is you are trying to achieve?

Sure! I would like to create a component (e.g. a PostgreSQL extension)
that provides a function which processes some complex data, without
making any requirements regarding where the data is stored. To pass
this data to the function, I could use arrays of composite types, but
that seems to be very bulky. Another option would be to use cursors,
but that didn't turn out to work very smooth either.

Instead, I plan to expect the function to receive a query string that
will get the data that is being processed by the function.

That query string should be allowed to refer to tables in the
search_path at the caller's side.

Therefore, I cannot use the "SET search_path FROM CURRENT" in my
"CREATE FUNCTION" statement, because it would overwrite the current
search_path on each call of the function.

Thus my idea is to do this (simplified):

CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS "some_type"

  
  -- I cannot use SET search_path FROM CURRENT here, because "query_p"
  -- shall refer to tables in the search_path of the caller.
  

  LANGUAGE plpgsql AS $$
DECLARE
  "old_search_path" TEXT;

  
  -- I have to fully qualify types in the DECLARE section.
  

  "some_variable" "some_schema"."some_type";
BEGIN
  SELECT current_setting('search_path') INTO "old_search_path";
  PERFORM set_config(
'search_path',
'some_schema, pg_temp, ' || "old_search_path",
TRUE
  );

  
  -- Do I have to fully qualify types and operators from
  -- "myschema" here? Or is it safe to not fully qualify them?
  
END;
  $$;

That is my overall idea.

My problem is that I'm confused about WHEN EXACTLY I have to qualify
tables/types, etc. It is very hard to understand from reading (just) the
documentation.

Kind Regards,
Jan Behrens




Re: search_path for PL/pgSQL functions partially cached?

2025-01-03 Thread Jan Behrens
On Sat, 4 Jan 2025 00:22:03 +0100
Jan Behrens  wrote:

>   
>   -- Do I have to fully qualify types and operators from
>   -- "myschema" here? Or is it safe to not fully qualify them?
>   

This was meant to read:

...from "some_schema" here.




Re: search_path for PL/pgSQL functions partially cached?

2025-01-01 Thread Jan Behrens
On Sat, 28 Dec 2024 00:40:09 +0100
Jan Behrens  wrote:

> On Fri, 27 Dec 2024 13:26:28 -0700
> "David G. Johnston"  wrote:
> 
> > > Or is it documented somewhere?
> > 
> > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
> 
> I can't find any notes regarding functions and schemas in that section.

Actually, I found another note in the documentation. But it doesn't
explain things correctly. In the documentation for PostgreSQL 17,
section 36.17.6.1. (Security Considerations for Extension Functions)
says:

"SQL-language and PL-language functions provided by extensions are at
risk of search-path-based attacks when they are executed, since parsing
of these functions occurs at execution time not creation time."

https://www.postgresql.org/docs/17/extend-extensions.html#EXTEND-EXTENSIONS-SECURITY

So here, the manual explicity states that functions are parsed at
execution, not creation time. As seen in my original example in this
thread, this isn't (fully) true. Moreover, it isn't true for all
SQL-language functions, as can be demonstrated with the following code:



CREATE SCHEMA s1;
CREATE SCHEMA s2;

CREATE VIEW s1.v AS SELECT 'creation' AS col;
CREATE VIEW s2.v AS SELECT 'runtime' AS col;

SET search_path TO 'public', 's1';

CREATE FUNCTION use_sql_atomic() RETURNS TEXT LANGUAGE sql BEGIN ATOMIC
  SELECT 'use_sql_atomic = ' || col FROM v;
END;

CREATE FUNCTION use_sql_string() RETURNS TEXT LANGUAGE sql AS $$
  SELECT 'use_sql_string = ' || col FROM v;
$$;

CREATE FUNCTION use_plpgsql() RETURNS TEXT LANGUAGE plpgsql AS $$ BEGIN
  RETURN (SELECT 'use_plpgsql = ' || col FROM v);
END; $$;

SET search_path TO 'public', 's2';

SELECT use_sql_atomic() AS "output" UNION ALL
SELECT use_sql_string() AS "output" UNION ALL
SELECT use_plpgsql() AS "output";



This generates the following output:

  output   
---
 use_sql_atomic = creation
 use_sql_string = runtime
 use_plpgsql = runtime
(3 rows)

Overall, PostgreSQL doesn't behave consistent, and to me it seems that
the documentation isn't describing its behavior correctly either.

I understand if fixing this is too much work (even though I would
really like to see this fixed). But given that the current behavior is
highly surprising and inconsistent - and keeping in mind that this is a
subject that may affect security - I think the documentation should
reflect the current behavior at least. I thus see this as a
documentation issue.

Kind regards,
Jan Behrens




search_path for PL/pgSQL functions partially cached?

2024-12-27 Thread Jan Behrens
Hello,

I'm experiencing some weird issues when running the following code in a psql 
session:



CREATE TABLE "tbl" ("col" NUMERIC(15, 0));

CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
  BEGIN
RETURN '2.4';
  END;
$$;

BEGIN;

CREATE SCHEMA "myschema";
SET LOCAL search_path TO 'myschema';

CREATE TABLE "tbl" ("col" NUMERIC);

CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
  BEGIN
RETURN '5.4';
  END;
$$;

CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql AS $$
  DECLARE
"variable" "tbl"."col"%TYPE;
  BEGIN
"variable" := "foo"();
RETURN "variable";
  END;
$$;

COMMIT;

SELECT "myschema"."run"(); -- returns '2.4' (when run in the same session)

-- reconnect to database here:
\c

SELECT "myschema"."run"(); -- returns '2'
SET search_path TO 'myschema';
SELECT "myschema"."run"(); -- returns '5'

-- reconnect to database again:
\c

SET search_path TO 'myschema';
SELECT "myschema"."run"(); -- returns '5.4'
SET search_path TO 'public';
SELECT "myschema"."run"(); -- returns '2.4' again



I'm using PostgreSQL verison 16.4.

Is this the expected behavior? If yes, where is this documented? If no, what 
would be the expected behavior?

Of course, I could fix this by fully qualifying the table name "tbl" in the 
function. Nonetheless, I'm not really sure what's going on here.

It seems that it matters *both* how the search_path was set during the *first* 
invocation of the function within a session *and* how it is set during the 
actual call of the function. So even if there are just two schemas involved, 
there are 4 possible outcomes for the "run" function's result ('2.4', '2', '5', 
and '5.4'). To me, this behavior seems to be somewhat dangerous. Maybe it is 
even considered a bug? Or is it documented somewhere? I remember running into 
some problems like that in the past already, but unfortunately, I don't 
remember details.

I suppose this is because there is some caching mechanism in place. But 
apparently it only caches the "tbl"."col"%TYPE and not the "foo"() function 
call expression. Can someone explain to me what's going on, and what is the 
best practice to deal with it? Is there a way to avoid fully qualifying every 
type and expression? Which parts do I have to qualify or is this something that 
could be fixed in a future version of PostgreSQL?

Many thanks and kind regards,
Jan Behrens




Allowed to return possibly TOASTed datums?

2024-12-14 Thread Jan Behrens
Hi,

I would like to implement a function that simply returns its argument
(for the unary plus operator of a custum data type).

Is it allowed to use the following function, assuming the data type is
TOASTable?

PG_FUNCTION_INFO_V1(mytype_pos);
Datum mytype_pos(PG_FUNCTION_ARGS) { 
  PG_RETURN_DATUM(GETARG_DATUM(0));
}

CREATE FUNCTION "mytype_pos"("mytype") RETURNS "mytype"
  STRICT IMMUTABLE LANGUAGE c AS 'mylib.so', 'mytype_pos';

CREATE OPERATOR + (
  rightarg = "mytype",
  function = "mytype_pos"
);

Or do I need to always return a DETOASTed value? In addition to the
answer, I would like to know where I can find such information. It
seems to be difficult (for me) to find it in the source. Is there any
other documentation I can use, or any hints on WHERE in the source I
find more information about when to DETOAST and when not?

Also: Is my approach an idiomatic way to implement unary plus? Are there
pre-existing functions for that?

Many thanks in advance for your help and kind regards

Jan Behrens




Re: search_path for PL/pgSQL functions partially cached?

2025-01-02 Thread Jan Behrens
On Wed, 1 Jan 2025 11:19:32 -0700
"David G. Johnston"  wrote:

> On Wed, Jan 1, 2025 at 10:55 AM Jan Behrens  wrote:
> 
> > On Sat, 28 Dec 2024 00:40:09 +0100
> > Jan Behrens  wrote:
> >
> > > On Fri, 27 Dec 2024 13:26:28 -0700
> > > "David G. Johnston"  wrote:
> > >
> > > > > Or is it documented somewhere?
> > > >
> > > >
> > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
> > >
> > > I can't find any notes regarding functions and schemas in that section.
> >
> >
> "Because PL/pgSQL saves prepared statements and sometimes execution plans
> in this way, SQL commands that appear directly in a PL/pgSQL function must
> refer to the same tables and columns on every execution; that is, you
> cannot use a parameter as the name of a table or column in an SQL command."
> 
> Changing search_path is just one possible way to change out which object a
> name tries to refer to so it is not called out explicitly.

The first part of the cited sentence seems helpful ("you must always
refer to the same tables and columns on every execution"). I would thus
conclude that using a dynamic search_path when running functions or
procedures is *always* considered errorneous (even though not reported
by the database as an error), except when using EXECUTE.

I wonder if the database could/should generate an error (or at least a
warning?) when a function or procedure without a "SET search_path"
statement uses a non-qualified name? According to the documentation
using a dynamic search_path to refer to different entities in the
database is a case that "must" not happen.

But following through, this might lead to more warnings one might
expect, e.g. when using simple operators such as "=" or the "IN" or
"CASE expression WHEN" statements, as these rely on the search_path as
well. Should such code be considered non-idiomatic, dangerous, or even
errorneous if a "SET search_path" option is missing in the
function's/procedure's definition?

Maybe I'm overthinking this. But in practice, I've been running into
surprising issues whenever functions and schemas are involved, and I'm
not sure if every programmer will be aware of how important it is to
properly set a search_path in the function's defintion after reading
the documentation. (Besides, it's not always possible in procedures.)

> 
> > "SQL-language and PL-language functions provided by extensions are at
> > risk of search-path-based attacks when they are executed, since parsing
> > of these functions occurs at execution time not creation time."
> 
> > Moreover, it isn't true for all
> > SQL-language functions, as can be demonstrated with the following code:
> 
> Yeah, when we added a second method to write an SQL-language function, one
> that doesn't simply accept a string body, we didn't update that section to
> point out that is the string input variant of create function that is
> affected in this manner, the non-string (atomic) variant stores the result
> of parsing the inline code as opposed to storing the raw text.
> 
> David J.

I missed that other part in the manual (which is in a totally different
section). Should I report the missing update in section 36.17.6.1. of
the documentation as a documentation issue, or is it not necessary?

Kind regards,
Jan Behrens




Re: search_path for PL/pgSQL functions partially cached?

2025-01-02 Thread Jan Behrens
On Thu, 2 Jan 2025 13:48:29 +0100
Pavel Stehule  wrote:

> čt 2. 1. 2025 v 13:15 odesílatel Jan Behrens 
> napsal:
> 
> > On Thu, 2 Jan 2025 12:40:59 +0100
> > Pavel Stehule  wrote:
> >
> > > How can you identify unwanted usage of non qualified identifiers from
> > > wanted usage of non qualified identifiers? It is a common pattern for
> > > sharding. Using not qualified identifiers of operators, functions is
> > common
> > > when you are using orafce extensions, etc.
> >
> > I don't fully understand the use-case. Could you elaborate?
> >
> > As I understand, even if identifiers are not fully-qualified, it is
> > forbidden to use the search_path to refer to different database
> > entities at run-time (as David pointed out).
> >
> > So I don't understand how a dynamic "search_path" could be used in any
> > scenario within functions except when EXECUTE is involved.
> >
> 
> you don't need more databases
> 
> schema one - customer x
> schema two - customer y
> 
> create table one.t1(..); create table one.t2(..);
> create table two.t1(..); create table two.t2(..);
> 
> set search_path to one;
> -- work with data set of customer x
> 
> set search_path to two;
> -- work wit data set of customer y
> 
> some times can be pretty ineffective to have database per customer - more
> connect, disconnect in postgres is much more expensive than SET search_path
> TO .. and maybe RESET plans;

I guess that means there is a practical application where search_path
MAY change at runtime IF done in different sessions or if the cache is
reset using the DISCARD command:

https://www.postgresql.org/docs/17/sql-discard.html

I assume DISCARD PLANS would be the right command?

This seems to be a very special case though. I think there should be a
warning in the documentation of CREATE FUNCTION with regard to schemas
anyway, though.

Regards,
Jan




Re: search_path for PL/pgSQL functions partially cached?

2025-01-03 Thread Jan Behrens
On Sat, 28 Dec 2024 00:40:09 +0100
Jan Behrens  wrote:

> > Add qualification or attach a “set search_path” clause to “create
> > function”.  Code stored in the server should not rely on the session
> > search_path.
> > 
> > David J.

I have been trying to adjust some of my code, and I still have cases
where I have to rely on the session's search_path. I'll provide an
example below.

> 
> [...]
> 
> My question is: Am I safe if I use fully-qualified types in the DECLARE
> section only? Or do I need to provide full qualification also in the
> code below (after SET search_path TO 'myschema')?
> 
> And bonus question: Is it documented somewhere?
>
> [...]
> 
> Kind Regards
> Jan Behrens

The following code is taken from a project I'm currently working on:



-- Let's assume we don't know the name of the schema in which the
-- "pgratio" extension with the RATIONAL data type is installed.
CREATE SCHEMA "qwertyuiop";
CREATE EXTENSION "pgratio" WITH SCHEMA "qwertyuiop";

-- This installs schema "myschema" with some dynamic function:
BEGIN;

CREATE SCHEMA "myschema";
SET LOCAL search_path TO "myschema";

-- Append schema of "pgratio" extension, which provides the RATIONAL
-- data type, to search_path:
SELECT set_config(
  'search_path',
  current_setting('search_path') || ', ' || quote_ident(nspname),
  TRUE
) FROM pg_namespace, pg_extension
  WHERE pg_namespace.oid = extnamespace AND extname = 'pgratio';

CREATE DOMAIN "rational_wrapper" AS RATIONAL;

CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS RATIONAL

  
  -- I cannot use SET search_path FROM CURRENT here, because "query_p"
  -- shall refer to tables in the search_path of the caller.
  

  LANGUAGE plpgsql AS $$
DECLARE
  "old_search_path" TEXT;

  
  -- I have to fully qualify the following type.
  -- Moreover, I can't use RATIONAL as I don't know its schema.
  

  "result" "myschema"."rational_wrapper";
BEGIN
  SELECT current_setting('search_path') INTO "old_search_path";
  PERFORM set_config(
'search_path',
'myschema, ' || quote_ident(nspname) || ', pg_temp, ' ||
"old_search_path",
TRUE
  ) FROM pg_namespace, pg_extension
WHERE pg_namespace.oid = extnamespace AND extname = 'pgratio';

  
  -- Is it safe to not fully qualify type RATIONAL below?
  -- And, if yes, where in the documentation is this explained?
  

  CREATE TEMP TABLE "mytemptable" ("val" RATIONAL);
  EXECUTE 'INSERT INTO "mytemptable" '
'SELECT "query"."a" * "query"."b" '
'FROM (' || "query_p" || ') AS "query"';
  -- Do some things here.
  SELECT sum("val") INTO "result" FROM "mytemptable";
  PERFORM set_config('search_path', "old_search_path", TRUE);
  RETURN "result";
END;
  $$;

COMMIT;

CREATE TABLE "tbl" ("foo" INT8, "bar" INT8);
INSERT INTO "tbl" VALUES (5, 7), (1, 10);

SELECT "myschema"."some_function"(
  'SELECT "foo" AS "a", "bar" AS "b" FROM "tbl"'
);
\c
SELECT "myschema"."some_function"(
  'SELECT "foo" AS "a", "bar" AS "b" FROM "tbl"'
);



The code for the pgratio extension that provides the RATIONAL data type
is found here: https://www.public-software-group.org/pgratio

Running that code on my machine correctly gives:

some_function 
---
 45
(1 row)

You are now connected to database "jbe" as user "jbe".
 some_function 
---
 45
(1 row)

Because extensions can only be installed in one schema, it may be a bad
idea to have a component requiring an extension to be installed in a
particular schema (because if different components have different
expectations on the schema name, e.g. some might expect "pgratio" to be
installed in "public" and others might expect it in "pgratio" or some
other schema such as "qwertyuiop", this would lead to an unresolvable
conflict).

I would like to know if the above example is correct. It seems overall
bulky, but I haven't found a better way, assuming that it can be
unknown where a particular extension has been installed to. In
particular I feel a bit insecure about where I have to fully qualify,
and where not. See the comments in the code above.

Note that I want the function to accept a query that makes sense in the
caller's search_path. Thus using "SET search_path FROM CURRENT" is not
an option for me, I believe.

Regards,
Jan Behrens




Re: search_path for PL/pgSQL functions partially cached?

2025-01-04 Thread Jan Behrens
On Fri, 3 Jan 2025 18:36:13 -0500
Isaac Morland  wrote:

> On Fri, 3 Jan 2025 at 18:22, Jan Behrens  wrote:
> 
> 
> > Instead, I plan to expect the function to receive a query string that
> > will get the data that is being processed by the function.
> >
> > That query string should be allowed to refer to tables in the
> > search_path at the caller's side.
> >
> > Therefore, I cannot use the "SET search_path FROM CURRENT" in my
> > "CREATE FUNCTION" statement, because it would overwrite the current
> > search_path on each call of the function.
> >
> 
>  I wonder if it would help if EXECUTE took an optional search_path to use
> while executing the query.

That wouldn't solve my problem, because the function that includes the
EXECUTE still needs to know the search_path set on the caller side.

This only works if I omit the "SET search_path FROM CURRENT" option in
the function's definition OR if I pass a search_path as an argument. I
guess I could write a wrapper:



BEGIN;

CREATE SCHEMA "some_schema";
SET LOCAL search_path TO "some_schema";

CREATE TYPE "some_type" AS ("name" TEXT, "height" INT8);

CREATE FUNCTION "foo_impl"("query_p" TEXT, "search_path_p" TEXT)
  RETURNS "some_type"
  LANGUAGE plpgsql SET search_path FROM CURRENT AS $$
DECLARE
  "old_search_path" TEXT;
  "result" "some_type";
BEGIN
  "old_search_path" = current_setting('search_path');
  PERFORM set_config('search_path', "search_path_p", TRUE);
  EXECUTE "query_p" INTO "result";
  PERFORM set_config('search_path', "old_search_path", TRUE);
  RETURN "result";
END;
  $$;

CREATE FUNCTION "foo"("query_p" TEXT)
  RETURNS "some_type"
  RETURN "foo_impl"("query_p", current_setting('search_path'));

COMMIT;

CREATE TABLE "tbl" ("id" SERIAL8, "name" TEXT, "height" INT8);
INSERT INTO "tbl" ("name", "height") VALUES ('Alice', 200);

SELECT * FROM "some_schema"."foo"('SELECT "name" FROM "tbl"');



Not sure which variant (this or my previous attempt) is better and if
either is safe/correct.

Regards,
Jan Behrens




Re: search_path for PL/pgSQL functions partially cached?

2025-01-04 Thread Jan Behrens
On Sat, 4 Jan 2025 09:37:14 -0800
Adrian Klaver  wrote:

> On 1/3/25 15:22, Jan Behrens wrote:
> > On Fri, 3 Jan 2025 13:56:02 -0800
> > Adrian Klaver  wrote:
> > 
> >> At this point I am lost as to what the overall goal of this is.
> >>
> >> Can you provide a 1 ft view if what it is you are trying to achieve?
> > 
> > Sure! I would like to create a component (e.g. a PostgreSQL extension)
> > that provides a function which processes some complex data, without
> > making any requirements regarding where the data is stored. To pass
> > this data to the function, I could use arrays of composite types, but
> > that seems to be very bulky. Another option would be to use cursors,
> > but that didn't turn out to work very smooth either.
> > 
> > Instead, I plan to expect the function to receive a query string that
> > will get the data that is being processed by the function.
> > 
> > That query string should be allowed to refer to tables in the
> > search_path at the caller's side.
> > 
> > Therefore, I cannot use the "SET search_path FROM CURRENT" in my
> > "CREATE FUNCTION" statement, because it would overwrite the current
> > search_path on each call of the function.
> > 
> > Thus my idea is to do this (simplified):
> > 
> > CREATE FUNCTION "some_function" ("query_p" TEXT) RETURNS "some_type"
> > 
> >
> >-- I cannot use SET search_path FROM CURRENT here, because "query_p"
> >-- shall refer to tables in the search_path of the caller.
> >
> > 
> >LANGUAGE plpgsql AS $$
> >  DECLARE
> >"old_search_path" TEXT;
> > 
> >
> >-- I have to fully qualify types in the DECLARE section.
> >
> > 
> >"some_variable" "some_schema"."some_type";
> >  BEGIN
> >SELECT current_setting('search_path') INTO "old_search_path";
> >PERFORM set_config(
> >  'search_path',
> >  'some_schema, pg_temp, ' || "old_search_path",
> >  TRUE
> >);
> > 
> >
> >-- Do I have to fully qualify types and operators from
> >-- "myschema" here? Or is it safe to not fully qualify them?
 (correction: "some_schema")
> >
> >  END;
> >$$;
> > 
> > That is my overall idea.
> 
> Is 'some_schema' a known item when installing?

Yes, fortunately "some_schema" is a fixed name.

> 
> Once you have the search_path defined and assuming all the objects you 
> want are in that path, then yes you can drop the schema qualification.

That would be nice, but it doesn't seem to be the case. At least not
always. I constructed the following new example:



CREATE TABLE "tbl" ("col" NUMERIC(15, 0));

CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
  BEGIN
RETURN '2.4';
  END;
$$;

BEGIN;

CREATE SCHEMA "myschema";
SET LOCAL search_path TO 'myschema';

CREATE TABLE "tbl" ("col" NUMERIC);

CREATE FUNCTION "foo"() RETURNS TEXT LANGUAGE plpgsql AS $$
  BEGIN
RETURN '5.4';
  END;
$$;

CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql AS $$
  DECLARE
"old_search_path" TEXT;
  BEGIN
"old_search_path" := current_setting('search_path');
SET LOCAL search_path TO "myschema";
-- At this point, search_path is always set to 'myschema'!
DECLARE
  "variable" "tbl"."col"%TYPE;
BEGIN
  "variable" := "foo"();
  RETURN "variable";
END;
PERFORM set_config('search_path', "old_search_path", TRUE);
  END;
$$;

COMMIT;

SELECT "myschema"."run"(); -- returns '5.4' (when run in the same session)

-- reconnect to database here:
\c

SELECT "myschema"."run"(); -- returns '5'
SET search_path TO 'myschema';
SELECT "myschema"."run"(); -- returns '5'

-- reconnect to database again:
\c

Re: search_path for PL/pgSQL functions partially cached?

2025-01-05 Thread Jan Behrens
On Sun, 05 Jan 2025 07:48:56 +0100
Laurenz Albe  wrote:

> So what you should do is set the "search_path" *on* the function, not *in*
> the function:
> 
> CREATE FUNCTION "run"() RETURNS TEXT LANGUAGE plpgsql
> SET search_path = myschema
> AS $$
>   DECLARE
> "variable" "tbl"."col"%TYPE;
>   BEGIN
> "variable" := "foo"();
> RETURN "variable";
>   END;
> $$;
> 
> Yours,
> Laurenz Albe

Yes, that's what works and what I would also do whenever possible
(probably in the form "SET search_path FROM CURRENT").

Summarizing the remaining thread, some issues are:

* The documentation isn't providing a prominent warning that behavior
  can be surprising if "SET search_path" is not used in the function's
  or procedure's defintion. (E.g. searching for "schema" in the
  documentation page for "CREATE FUNCTION" doesn't give any helpful
  hints or warning.)

* Things get more complicated when it's impossible to use
  "SET search_path" in the function's/procedure's definition, for which
  there are two scenarios:

  Scenario 1: The function or procedure needs or wants to access or use
  the search_path of the caller.

  Scenario 2: A procedure wants to execute transactional statements
  such as COMMIT or ROLLBACK within its body.

  In scenario 1, using "SET search_path" will overwrite the caller's
  search_path at runtime, so I cannot access it. (In my post from Sat,
  4 Jan 2025 14:23:10 +0100, I have proposed a wrapper function to work
  around that.)

  In scenario 2, using "SET search_path" is simply not possible and
  will be rejected by PostgreSQL.

* It is a bit unclear how the exact behavior is when I set a
  search_path from within the functions body (e.g. due to one of the two
  scenarios above). There are some examples that show some quite
  surprising behavior, at least if you don't fully understand the
  plan caching mechanism that is used.

Kind regards,
Jan Behrens




Re: search_path for PL/pgSQL functions partially cached?

2025-01-02 Thread Jan Behrens
On Thu, 2 Jan 2025 12:40:59 +0100
Pavel Stehule  wrote:

> How can you identify unwanted usage of non qualified identifiers from
> wanted usage of non qualified identifiers? It is a common pattern for
> sharding. Using not qualified identifiers of operators, functions is common
> when you are using orafce extensions, etc.

I don't fully understand the use-case. Could you elaborate?

As I understand, even if identifiers are not fully-qualified, it is
forbidden to use the search_path to refer to different database
entities at run-time (as David pointed out).

So I don't understand how a dynamic "search_path" could be used in any
scenario within functions except when EXECUTE is involved.

> 
> Using qualified identifiers everywhere strongly reduces readability. There
> are no aliases to the schema, so aliases cannot help.

Yes, I agree on that. Using "SET search_path" in the function's
definition fixes that problem, but it's easy to miss how important this
is from reading the documentation:

The manual regarding "CREATE FUNCTION" refers to "search_path" only
within the "Writing SECURITY DEFINER Functions Safely" section. It's
easy to skip that part unless you use that feature. Moreover, that
section alone doesn't explain the weird behavior of four different
outcomes of a function with only two schemas involved which I brought
up in the beginning of this thread.

The part on "SET configuration_parameter" part in the "CREATE FUNCTION"
documentation doesn't mention the search_path or schemas. And I don't
think you can expect every programmer will read the "Plan Caching"
subsection in the "PL/pgSQL under the Hood" section. But even then, the
information is just provided indirectly.

Searching for "schema" in "CREATE FUNCTION"'s documentation doesn't
give any hint either.

I think (assuming that the behavior isn't fixed) that some slighly more
prominent warning would be reasonable.

> 
> you can identify the functions where search_path is not explicitly assigned
> 
> select oid::regprocedure
>   from pg_proc
> where pronamespace::regnamespace not in ('pg_catalog',
> 'information_schema')
>and not exists(select 1 from unnest(proconfig) g(v) where  v ~
> '^search_path');
> 
> 
> Regards
> 
> Pavel

Kind regards,
Jan




Calling set-returning functions in a non-set-expecting context

2025-02-17 Thread Jan Behrens
Hi,

I wonder if it is guaranteed that when calling a set-returning function
in a non-set-expecting context, the used row is guaranteed to be the
first row returned.

I.e. if I have the following function definition

CREATE FUNCTION foo() RETURNS INTEGER RETURN generate_series(1, 10);

is it then guaranteed, that foo() always returns 1? And if so, is that
documented somewhere? I didn't find it.

I know that generate_series creates an ordered result, so that's not my
concern, but I'm not sure whether the first row will be picked. There
is something written here:

https://www.postgresql.org/docs/17/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET

"SQL functions execute an arbitrary list of SQL statements, returning
the result of the last query in the list. In the simple (non-set) case,
the first row of the last query's result will be returned. (Bear in
mind that 'the first row' of a multirow result is not well-defined
unless you use ORDER BY.) If the last query happens to return no rows
at all, the null value will be returned."

But this part explicitly mentions queries. Using the "RETURN"
statement, I don't give a query but an expression. So does the "first
row gets used" rule also apply in my context, and why so?

For example, the following command results in an error:

SELECT (SELECT generate_series(1, 10));
ERROR:  more than one row returned by a subquery used as an expression

Kind regards and thanks for your advice,
Jan Behrens