Re: libpq: COPY FROM STDIN BINARY of arrays

2023-03-31 Thread Dominique Devienne
On Thu, Mar 30, 2023 at 6:00 PM Tom Lane  wrote:

> Dominique Devienne  writes:
> > Does the wire-format of arrays differ between binary-binds and
> binary-copy?
>
> No.
>

An update on this, I think I figure it out, by comparing with COPY TO
STDOUT WITH (FORMAT BINARY).
I was missing the byte-count "header" for the array. Since I'm reusing my
binary-bind code for COPY,
I'm missing the 4 bytes for the array's "total encoded byte length" before
the array data proper.
Now that I understand the issue, should be a simple matter to fix. Thanks
for chiming in Tom. --DD


Cannot terminate backend

2023-03-31 Thread Arnaud Lesauvage
Hi all,

I have a long running query that I seem unable to either cancel or terminate.
What could be the reason for this, and what is the bet way to terminate this 
kind of query ?

The query is a CTE using postgis ST_ClusterDBSCAN function. The CTE returns 
approximately 150k rows.
The SQL is as follows :

EXPLAIN ANALYZE

WITH subq AS (

SELECT id, geom, ST_ClusterDBSCAN(geom, eps := 1000, minpoints := 1) 
OVER() AS cluster_id
FROM mytable
)

SELECT cluster_id, count(id), ST_Collect(geom)

FROM subq

GROUP BY cluster_id;


pg_stat_activity show no wait event.
pg_cancel_backend(mypid) returns true but the state does not change in 
pg_stat_activity.
pg_terminate_backend(mypid) yields the same result (as superuser)
Pg_stat_activity show no wait_event.


SELECT version();

PostgreSQL 14.5 (Ubuntu 14.5-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by 
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit


Any idea ? Thanks a lot !

Regards
Arnaud


Re: Question on creating keys on partitioned tables

2023-03-31 Thread Laurenz Albe
On Thu, 2023-03-30 at 17:05 -0700, Siddharth Jain wrote:
> I have this question. Say I create a partitioned table on column X.
> 
> Option 1:
> 
> I add a primary key on (X,Y). Y is another column. Even though Y is a 
> globally unique PK (global meaning it is unique across partitions, not just 
> in one partition), Postgres does not allow me to
> create a PK on Y in a partitioned table.
> 
> Option 2:
> 
> I add PK on Y on each of the partitions
> 
> Are these not equivalent? If not, which is better and why?

No, they are not equivalent.

Option 2 comes closer to guaranteeing uniqueness for column X, so use that.

> PS: This is what my best friend had to say:
> 
> [...] If you are using the "table inheritance" approach [...]

Don't even consider that.  Declarative partitioning is so much better.

Yours,
Laurenz Albe




Re: Cannot terminate backend

2023-03-31 Thread Laurenz Albe
On Fri, 2023-03-31 at 13:46 +, Arnaud Lesauvage wrote:
> I have a long running query that I seem unable to either cancel or terminate.
> What could be the reason for this, and what is the bet way to terminate this 
> kind of query ?
>  
> The query is a CTE using postgis ST_ClusterDBSCAN function. The CTE returns 
> approximately 150k rows.
> The SQL is as follows : 
> 
> EXPLAIN ANALYZE
> WITH subq AS (
>     SELECT id, geom, ST_ClusterDBSCAN(geom, eps := 1000, minpoints := 1) 
> OVER() AS cluster_id
>     FROM mytable
> )
> SELECT cluster_id, count(id), ST_Collect(geom) 
> FROM subq
> GROUP BY cluster_id;
>  
> pg_stat_activity show no wait event.
> pg_cancel_backend(mypid) returns true but the state does not change in 
> pg_stat_activity.
> pg_terminate_backend(mypid) yields the same result (as superuser)
> Pg_stat_activity show no wait_event.
>  
> SELECT version();
> PostgreSQL 14.5 (Ubuntu 14.5-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled 
> by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit

The most likely explanation is that one of the PostGIS functions runs for
a long time without checking CHECK_FOR_INTERRUPTS().
That would be a PostGIS bug.  Try to construct a reproducible test case
that you can share!

Perhaps this trick can help:
https://www.cybertec-postgresql.com/en/cancel-hanging-postgresql-query/

Yours,
Laurenz Albe




RE: Cannot terminate backend

2023-03-31 Thread Arnaud Lesauvage
> On Fri, 2023-03-31 at 13:46 +, Arnaud Lesauvage wrote:
> > I have a long running query that I seem unable to either cancel or
> terminate.
> > What could be the reason for this, and what is the bet way to terminate
> this kind of query ?
> >
> > The query is a CTE using postgis ST_ClusterDBSCAN function. The CTE
> returns approximately 150k rows.
> > The SQL is as follows :
> >
> > EXPLAIN ANALYZE
> > WITH subq AS (
> >     SELECT id, geom, ST_ClusterDBSCAN(geom, eps := 1000, minpoints
> > := 1) OVER() AS cluster_id
> >     FROM mytable
> > )
> > SELECT cluster_id, count(id), ST_Collect(geom) FROM subq GROUP BY
> > cluster_id;
> >
> > pg_stat_activity show no wait event.
> > pg_cancel_backend(mypid) returns true but the state does not change in
> pg_stat_activity.
> > pg_terminate_backend(mypid) yields the same result (as superuser)
> > Pg_stat_activity show no wait_event.
> >
> > SELECT version();
> > PostgreSQL 14.5 (Ubuntu 14.5-1.pgdg20.04+1) on x86_64-pc-linux-gnu,
> > compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
> 
> The most likely explanation is that one of the PostGIS functions runs for a
> long time without checking CHECK_FOR_INTERRUPTS().
> That would be a PostGIS bug.  Try to construct a reproducible test case that
> you can share!
> 
> Perhaps this trick can help:
> https://www.cybertec-postgresql.com/en/cancel-hanging-postgresql-
> query/


Thanks Laurenz
Unfortunately, I don't have a shell access to the server, so I guess I'll have 
to ask to sysadmin to kill -9 ?

Regards
Arnaud



Re: Question on creating keys on partitioned tables

2023-03-31 Thread Siddharth Jain
Thanks Laurenz.

I think the two are equivalent. If not, could you please explain why?

On Fri, Mar 31, 2023 at 6:46 AM Laurenz Albe 
wrote:

> On Thu, 2023-03-30 at 17:05 -0700, Siddharth Jain wrote:
> > I have this question. Say I create a partitioned table on column X.
> >
> > Option 1:
> >
> > I add a primary key on (X,Y). Y is another column. Even though Y is a
> globally unique PK (global meaning it is unique across partitions, not just
> in one partition), Postgres does not allow me to
> > create a PK on Y in a partitioned table.
> >
> > Option 2:
> >
> > I add PK on Y on each of the partitions
> >
> > Are these not equivalent? If not, which is better and why?
>
> No, they are not equivalent.
>
> Option 2 comes closer to guaranteeing uniqueness for column X, so use that.
>
> > PS: This is what my best friend had to say:
> >
> > [...] If you are using the "table inheritance" approach [...]
>
> Don't even consider that.  Declarative partitioning is so much better.
>
> Yours,
> Laurenz Albe
>


Re: Cannot terminate backend

2023-03-31 Thread Paul Ramsey
Afraid so. You can wait longer, I guess. You may have found two bugs… the lack 
of an interrupt in the dbscan loop, which I’m working on now. And maybe an 
infinite looping case? In which case if you want a fix on that, you’ll have to 
share your data and query. 

P.

> On Mar 31, 2023, at 7:41 AM, Arnaud Lesauvage  wrote:
> 
>> On Fri, 2023-03-31 at 13:46 +, Arnaud Lesauvage wrote:
>>> I have a long running query that I seem unable to either cancel or
>> terminate.
>>> What could be the reason for this, and what is the bet way to terminate
>> this kind of query ?
>>> 
>>> The query is a CTE using postgis ST_ClusterDBSCAN function. The CTE
>> returns approximately 150k rows.
>>> The SQL is as follows :
>>> 
>>> EXPLAIN ANALYZE
>>> WITH subq AS (
>>> SELECT id, geom, ST_ClusterDBSCAN(geom, eps := 1000, minpoints
>>> := 1) OVER() AS cluster_id
>>> FROM mytable
>>> )
>>> SELECT cluster_id, count(id), ST_Collect(geom) FROM subq GROUP BY
>>> cluster_id;
>>> 
>>> pg_stat_activity show no wait event.
>>> pg_cancel_backend(mypid) returns true but the state does not change in
>> pg_stat_activity.
>>> pg_terminate_backend(mypid) yields the same result (as superuser)
>>> Pg_stat_activity show no wait_event.
>>> 
>>> SELECT version();
>>> PostgreSQL 14.5 (Ubuntu 14.5-1.pgdg20.04+1) on x86_64-pc-linux-gnu,
>>> compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
>> 
>> The most likely explanation is that one of the PostGIS functions runs for a
>> long time without checking CHECK_FOR_INTERRUPTS().
>> That would be a PostGIS bug.  Try to construct a reproducible test case that
>> you can share!
>> 
>> Perhaps this trick can help:
>> https://www.cybertec-postgresql.com/en/cancel-hanging-postgresql-
>> query/
> 
> 
> Thanks Laurenz
> Unfortunately, I don't have a shell access to the server, so I guess I'll 
> have to ask to sysadmin to kill -9 ?
> 
> Regards
> Arnaud



Re: Question on creating keys on partitioned tables

2023-03-31 Thread Tom Lane
Siddharth Jain  writes:
> I think the two are equivalent. If not, could you please explain why?

Well, they're formally equivalent if you require there to be only one
X value per partition (ie, PARTITION BY LIST with only one listed value
per partition); if there's more, they're not the same thing.

Neither one guarantees that Y is globally unique.  We have no mechanism
for enforcing uniqueness across partitions except for partition key
columns.

regards, tom lane




Re: Question on creating keys on partitioned tables

2023-03-31 Thread Siddharth Jain
On Fri, Mar 31, 2023 at 9:07 AM Tom Lane  wrote:

> Siddharth Jain  writes:
> > I think the two are equivalent. If not, could you please explain why?
>
> Well, they're formally equivalent if you require there to be only one
> X value per partition (ie, PARTITION BY LIST with only one listed value
> per partition); if there's more, they're not the same thing.
>

Yes. This is what I have. I think I understand now. Is following correct?

when a PK is created on (X,Y) on the parent table what happens internally
is that the command is run individually on each of the child tables.
nothing more. nothing less.



>
> Neither one guarantees that Y is globally unique.  We have no mechanism
> for enforcing uniqueness across partitions except for partition key
> columns.
>
> regards, tom lane
>


My tests show that a WITH HOLD cursor has subtly different semantics from a WITHOUT HOLD cursor

2023-03-31 Thread Bryn Llewellyn
*Summary*

My tests show that, when a WITHOUT HOLD cursor has to cache results (see Note 
1), then the WHERE clause (if present) is stripped off the cursor's defining 
SELECT statement and the entire unrestricted result set is cached. But when a 
WITH HOLD cursor is used, then it’s the *restricted* result set that’s cached.

I do see that this wouldn't have a detectable effect when the cursor's defining 
query doesn't involve any volatile functions. But it does seem that too much 
data is cached in the "not holdable" case—and this seems to be a bad thing for 
space use and for speed.

I'd like to know the rationale for this design choice but I've been unable to 
find any mention of it in the PG doc (see Note 2).

My test design used this statement:

select v from rndm_series() [where rndm_filter()]

And I used a trivial table function "cursor_rows()” that simply iterated over 
all the rows that the cursor defined, from first through last, to display the 
cursor's result set three times in succession.

This lead to four runs: "WITH or WITHOUT HOLD" by "with or without the WHERE 
clause".

I might just as well have used a simple "series()" function wrapped around 
"generate_series()" that used "raise info" to report that it was called 
together with a simple "filter()" function that always returned true and, 
again, that used "raise info" to report that it was called. But I find it too 
hard to divert "raise info" output to a spool file and have it interleave 
properly with regular SQL output. So I used a volatile "rndm_series() that 
always returned ten rows but (using the random() built-in function) produced a 
different set of values on each call. And I used a volatile "rndm_filter()" 
function that similarly randomly returned TRUE or FALSE. These devices 
dramatized the effect I'm that reporting here and made it simple to record the 
results ordinarily with the \o meta-command.

I'm assuming that my PL/pgSQL functions are opaque to the planner and so that 
it cannot run the execution plan in backwards order and must, therefore, cash 
the cursor's result set.

I would have expected the *restricted* result set to be cached for both flavors 
of cursor—holdable and not. After all, we're taught to expect nonsense results 
when a volatile function's results are cached—so a cursor should be no 
different w.r.t. this thinking.

(Of course, I've read the CAUTION in the PG doc for the DECLARE statement that 
says « Scrollable cursors may give unexpected results if they invoke any 
volatile functions » and I wouldn't use a volatile function in real life.)
——

*Note 1*

I read Laurenz's blogpost "WITH HOLD cursors and transactions in PostgreSQL" 
(www.cybertec-postgresql.com/en/with-hold-cursors-and-transactions-in-postgresql/)
 and I noted this:

«
[Some] execution plans [for a WITHOUT HOLD cursor] require the explicit keyword 
SCROLL for the cursor to become scrollable. Such cursors incur an overhead, 
because the server must cache the entire result set.
»

and then this:

«
PostgreSQL must calculate the complete result set [for a WITH HOLD cursor] at 
COMMIT time and cache it on the server.
»

In other words, both a WITH HOLD cursor (always) and a WITHOUT HOLD cursor 
(sometimes) must cache the entire result set.
——

*Note 2*

Laurenz also wrote this in the "Is the PL/pgSQL refcursor useful in a modern 
three-tier app?" thread that I started:

«
 I understand your confusion, and I believe that the documentation could be 
improved... I personally find that reading the PostgreSQL documentation gets 
you far, but only so far: for deep understanding, you have to read the code... 
I have come to see it as an extension of the documentation that covers the 
details.
»
——

*Testcase*

I used PG Version 15.2. The code that follows is self-contained. You need just 
to connect as an ordinary user to a database where it has the CREATE privilege. 
Copy it into, say, "t.sql" and start it in psql. (Make sure that you have the 
usual « AUTOCOMMIT = 'on' » setting.)

\c :db :u1

drop schema if exists s1 cascade;
create schema s1;
set search_path = s1, pg_catalog, pg_temp;

create procedure init_rndm()
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
begin
  perform setseed(0.0::double precision);
end;
$body$;

create function rndm_filter()
  returns boolean
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  select (random() > 0.7::float8);
$body$;

-- Ensure that the planner cannot know how the rows are delivered.
create function rndm_series()
  returns table(v int)
  set search_path = s1, pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  val int not null := 0;
  ctr int not null := 0;
begin
  loop
val := val + 1;
if (random() > 0.9::float8) then
  ctr := ctr + 1;
  exit when ctr > 10;
  v := val; return next;
end if;
  end loop;
end;
$body$;

create procedure open_holdable_curso

Re: My tests show that a WITH HOLD cursor has subtly different semantics from a WITHOUT HOLD cursor

2023-03-31 Thread David G. Johnston
On Fri, Mar 31, 2023 at 12:35 PM Bryn Llewellyn  wrote:

> **Summary**
>
> My tests show that, when a WITHOUT HOLD cursor has to cache results (see
> Note 1), then the WHERE clause (if present) is stripped off the cursor's
> defining SELECT statement and the entire unrestricted result set is cached.
> But when a WITH HOLD cursor is used, then it’s the *restricted* result set
> that’s cached.
>
> I do see that this wouldn't have a detectable effect when the cursor's
> defining query doesn't involve any volatile functions. But it does seem
> that too much data is cached in the "not holdable" case—and this seems to
> be a bad thing for space use and for speed.
>
>
IIUC, all you've demonstrated here is the (sometimes) case for the WITHOUT
HOLD cursor where a cache is not used (i.e., the typical case).  In this
situation the executor, when asked to rewind back to the beginning, goes
and restarts execution at the beginning (executor nodes form a tree, it is
probable that certain nodes are more efficient at this "start over" thing
that others - e.g., I suspect a materialize node sitting in the tree would
prevent a sequential scan node from being asked to "start over"), which
necessarily involves potentially re-evaluating volatile
functions/expressions as noted.

David J.


Re: My tests show that a WITH HOLD cursor has subtly different semantics from a WITHOUT HOLD cursor

2023-03-31 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> *Summary*
>> 
>> My tests show that, when a WITHOUT HOLD cursor has to cache results (see 
>> Note 1), then the WHERE clause (if present) is stripped off the cursor's 
>> defining SELECT statement and the entire unrestricted result set is cached. 
>> But when a WITH HOLD cursor is used, then it’s the *restricted* result set 
>> that’s cached.
>> 
>> I do see that this wouldn't have a detectable effect when the cursor's 
>> defining query doesn't involve any volatile functions. But it does seem that 
>> too much data is cached in the "not holdable" case—and this seems to be a 
>> bad thing for space use and for speed.
> 
> IIUC, all you've demonstrated here is the (sometimes) case for the WITHOUT 
> HOLD cursor where a cache is not used (i.e., the typical case). In this 
> situation the executor, when asked to rewind back to the beginning, goes and 
> restarts execution at the beginning (executor nodes form a tree, it is 
> probable that certain nodes are more efficient at this "start over" thing 
> that others - e.g., I suspect a materialize node sitting in the tree would 
> prevent a sequential scan node from being asked to "start over"), which 
> necessarily involves potentially re-evaluating volatile functions/expressions 
> as noted.

Forgive me. I don't understand your reply. I do understand (having read 
Laurenz's blog post) that sometimes the execution plan for the cursor's 
defining SELECT cannot be run backwards. I'm not sure that it matters whether 
this case is typical or not. It's enough that it can occur. And this is the 
case that I'm interested in. Laurenz says that in this case, for a WITHOUT HOLD 
cursor, the results must be cached to allow scrollability. And the results of 
my tests are consistent with this—up to a point.

However, my results show that for the WITHOUT HOLD case, the restriction that 
the cursor's SELECT might have is *not* applied to what's cached. But the 
restriction *is* applied when the WITH HOLD cache is populated.

And it's this that I'm asking about.

Forget that I ever said "volatile". I just edited the code that I included in 
my previous post. I globally replaced "rndm_series" with "series". And I 
globally replaced "rndm_filter" with "filter". I also removed the "create 
procedure init_rndm()" statement and removed the calls of the procedure. Here 
are the new implementations of "series()" and "filter()""

create function series()
  returns table(v int)
  set search_path = s1, pg_catalog, pg_temp
  language plpgsql
as $body$
begin
  raise info 'series() invoked';

  for v in (select generate_series(1, 10))loop
return next;
  end loop;
end;
$body$;

and

create function filter()
  returns boolean
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
begin
  raise info 'filter() invoked';
  return true;
end;
$body$;

Then I ran the four tests by hand because I don't know how to spool the "raise 
info" output to a file. In all cases, the "cursor_rows()" invocation just 
reports the ten rows with values in 1 through 10 — of course.

Here's what I saw:

* (1) open_holdable_cursor(holdable=>false, filter_series=>false, caption=>'') *

The "open_holdable_cursor()" call completes silently.

The first "cursor_rows()" invocation reports "series() invoked" once.

Subsequent "cursor_rows()" invocations produce their rows without that message.

* (2) call open_holdable_cursor(holdable=>false, filter_series=>true, 
caption=>'') *

The "open_holdable_cursor()" call completes silently again.

The first "cursor_rows()" invocation again reports "series() invoked" once. And 
then it reports "filter() invoked" ten times.

The second "cursor_rows()" invocation again does *not* report "series() 
invoked". But it *does* report "filter() invoked" ten times. This tells me that 
its the *unrestricted* results that are cached.

It's the same for the third invocation (and any more that I care to do).

*  (3) open_holdable_cursor(holdable=>true, filter_series=>false, caption=>'') *

The "open_holdable_cursor()" call now reports "series() invoked".

The first, and all subsequent, "cursor_rows()" invocations do not say "series() 
invoked".

* (4) open_holdable_cursor(holdable=>true, filter_series=>true, caption=>'') *

The "open_holdable_cursor()" call now reports "series() invoked" followed by 
"filter() invoked" ten times.

The first, and all subsequent, "cursor_rows()" invocations do not bring any 
"raise info" output because the *restricted* results are cached.

I hope that my question is clearer now.



​jsonb @@ jsonpath operator doc: ​Only the first item of the result is taken into account

2023-03-31 Thread jian he
Hi,
https://www.postgresql.org/docs/current/functions-json.html

> jsonb @@ jsonpath → boolean
>
> Returns the result of a JSON path predicate check for the specified JSON
> value. Only the first item of the result is taken into account. If the
> result is not Boolean, then NULL is returned.
>
> '{"a":[1,2,3,4,5]}'::jsonb @@ '$.a[*] > 2' → t
>

select jsonb_path_query('{"a":[1,2,3,4,5]}',  '$.a[*]');
return

 jsonb_path_query
> --
>  1
>  2
>  3
>  4
>  5
> (5 rows)
>

I don't understand: "Only the first item of the result is taken into
account.".

Here, JSON path predicate check for the specified JSON value return true,
some return false. (1 > 2 is false, 2 > 2 is false).