Re: Undocumented array_val[generate_series(...)] functionality?

2021-07-12 Thread Dean Rasheed
On Mon, 12 Jul 2021 at 02:39, David G. Johnston
 wrote:
>
> One, the select generate_series(1,3) function call causes multiple rows to be 
> generated where there would usually be only one.

Yes.

> Two, composition results in an inside-to-outside execution order: the SRF is 
> evaluated first, the additional rows added, then the outer function (abs or 
> the subscript function respectively in these examples) is evaluated for 
> whatever rows are now present in the result.

Yes.

> Is the above something one can learn from our documentation?

Yes, but only if you know where to look.

> Is this syntax we are discouraging users from using and thus intentionally 
> not documenting it?

On the contrary, I would say that this is the expected behaviour, and
that it is documented, though not in the most obvious place:

https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET

That's probably not the first place one would go looking for it, and
might also (wrongly) imply that it only works for functions written in
language SQL.

BTW, this is something that started working in PG10 (in 9.6, an error
is thrown), and I think that it's a result of this release note item,
which matches your conclusions:

  Change the implementation of set-returning functions appearing in
  a query's SELECT list (Andres Freund)

  Set-returning functions are now evaluated before evaluation of
  scalar expressions in the SELECT list, much as though they had been
  placed in a LATERAL FROM-clause item. This allows saner semantics for
  cases where multiple set-returning functions are present. If they
  return different numbers of rows, the shorter results are extended to
  match the longest result by adding nulls. Previously the results were
  cycled until they all terminated at the same time, producing a number
  of rows equal to the least common multiple of the functions'
  periods. In addition, set-returning functions are now disallowed
  within CASE and COALESCE constructs. For more information see Section
  37.4.8.

Another interesting consequence of that is that it's possible to do a
similar thing with the array slice syntax, and a pair of
generate_series() calls, for example:

SELECT (array[1,2,3,4]::int[])[generate_series(1,4) : generate_series(2,4)];

 array
---
 {1,2}
 {2,3}
 {3,4}

(4 rows)

Note: there are 4 rows in that result, and the last one is NULL, which
is also consistent with the documentation, and the fact that the array
slice function returns NULL if either subscript is NULL.

I'd agree that there's an opportunity to improve the docs here.

Regards,
Dean




Re: pg_upgrade as a way of cloning an instance?

2021-07-12 Thread Luca Ferrari
On Thu, Jul 8, 2021 at 2:30 AM Bruce Momjian  wrote:
> Also, pg_upgrade will throw an error if copying from one version to the
> same version if there are tablespaces since the tablespace directory
> name is the catalog version name.

A very good point to discourage this usage!

Thanks,
Luca




Re: Removing a subscription that does not exist

2021-07-12 Thread Jeff Ross

On 7/11/21 7:38 PM, Kyotaro Horiguchi wrote:

At Fri, 9 Jul 2021 10:49:46 -0600, Jeff Ross  wrote in

Hello,

I'm working with an RDS instance running 12 that has an old
subscription that I can't seem to drop.

...
Look at the subdbid field in the first query result.  You were logging
into the databsae with OID=66754 and the subscription "metro" belongs
to the database 14313.  The second command doesn't show metro which is
not of the current database.



| What else can I try to remove this old non-functional subscription?

...

Thus you need to log in to the databse OID=14313 to manipulate on the
subsciption metro.

regards.

That was it exactly. Once I connected to that database the subscription 
could be disabled, its slot name set to None and finally dropped.


Thank you!

Jeff


Re: Undocumented array_val[generate_series(...)] functionality?

2021-07-12 Thread Tom Lane
Dean Rasheed  writes:
> BTW, this is something that started working in PG10 (in 9.6, an error
> is thrown), and I think that it's a result of this release note item,
> which matches your conclusions:
>   Change the implementation of set-returning functions appearing in
>   a query's SELECT list (Andres Freund)

Interesting.  When I first looked at this thread I figured "oh, that's
always worked, nothing to see here".  But you're right, we didn't use
to allow SRFs in subscripts.  Still, I'm with David that no new docs
are needed.  IMO the former restriction was the surprising thing, and
the current behavior is simply what one would expect from assembling
those parts in that order.

regards, tom lane




Re: Undocumented array_val[generate_series(...)] functionality?

2021-07-12 Thread David G. Johnston
On Monday, July 12, 2021, Tom Lane  wrote:

> Still, I'm with David that no new docs
> are needed.  IMO the former restriction was the surprising thing, and
> the current behavior is simply what one would expect from assembling
> those parts in that order.
>
>
I agree the material in Extending SQL - SQL Functions, plus normal
expectations, mean that we do indeed cover the topic sufficiently.  The
location and specifics of the material and use of cross-references could
use some attention though its also not an area that gets too many questions
so people are figuring this all out one way or another (or not finding a
need and just don’t know what they are missing…)

It doesn’t really fit that well in a syntax chapter since the calling
syntax is indeed the same.  But the Queries Chapter is where I was looking
this past time (I knew about and read the SQL Functions Chapter previously
but didn’t get there is this skim).

David J.


RE: 3867653, EMM1 cluster issue on 23rd May--core generated --design feedback

2021-07-12 Thread M Tarkeshwar Rao
Hi All,

We are getting following core with following use case:

Case: We made max_connections as 15 and 10 different processes opening 15 
connections with server. It is running fine. But after few hours it giving 
following
  core. When we increase the max_connections then core come again but 
it take more hours to come.


Program terminated with signal 11, Segmentation fault.

#0  0x7f712e7a3bf0 in pqRowProcessor () from 
/tmp/3pp/postgres//lib/libpq.so.5

Missing separate debuginfos, use: debuginfo-install 
CXC1741717-R2N-EM20_ICP2001.x86_64

(gdb)

(gdb) bt

#0  0x7f712e7a3bf0 in pqRowProcessor () from /3pp/postgres//lib/libpq.so.5

#1  0x7f712e7adef2 in pqParseInput3 () from /3pp/postgres//lib/libpq.so.5

#2  0x7f712e7a4e78 in PQgetResult () from /3pp/postgres//lib/libpq.so.5

#3  0x7f712e7a50af in PQexecStart () from /3pp/postgres//lib/libpq.so.5

#4  0x7f712e7a5261 in PQexec () from /3pp/postgres//lib/libpq.so.5

#5  0x7f712ea08add in PostGresSqlExecutor::executeSql (this=0x14c50090, 
aSqlStatement=0x1341fee8 "SELECT * from tab_1",

thePrefetchCount=) at PostGresSqlExecutor.cc:107

Regards
Tarkeshwar


Re: Why can't I drop a tablespace?

2021-07-12 Thread Tom Lane
"Phil Endecott"  writes:
> Thanks Laurenz. I was looking at the source for "alter table set
> tablespace" yesterday trying to work out what is supposed to happen.
> There is a comment at tablecmds.c line 3989: "Thanks to the magic of
> MVCC, an error anywhere along the way rolls back the whole operation;
> we don't have to do anything special to clean up." But I guess that
> creating an entirely new file on a different filesystem is an
> exception to that.

No, but PANIC'ing during commit is :-(.  Ordinarily the files created by a
failed transaction would be removed during transaction cleanup, but we
did not reach that code.  So these were left behind, but the table's
original files in the original tablespace should be undamaged.

regards, tom lane




Re: Undocumented array_val[generate_series(...)] functionality?

2021-07-12 Thread Tom Lane
"David G. Johnston"  writes:
> I agree the material in Extending SQL - SQL Functions, plus normal
> expectations, mean that we do indeed cover the topic sufficiently.  The
> location and specifics of the material and use of cross-references could
> use some attention though its also not an area that gets too many questions
> so people are figuring this all out one way or another (or not finding a
> need and just don’t know what they are missing…)

Yeah, there is a lot of material in that chapter that is about using
functions, not just writing them.  I don't think anyone's entirely
happy about the current factorization of those parts of the manual;
but it's not entirely clear how to make it better, either.

regards, tom lane




Use of '&' as table prefix in query

2021-07-12 Thread Rich Shepard

Long ago I wrote a query which was greatly improved (i.e., it actually
worked as intended) by help here):

/* This query selects all activity information for a named person */

SELECT p.lname, p.fname, p.loc_nbr, p.job_title, p.direct_phone, p.active,
   o.org_name,
   l.loc_nbr, l.loc_name,
   a.act_date, a.act_type, a.notes, a.next_contact
FROM People AS p
 JOIN Organizations AS o ON o.org_nbr = p.org_nbr
 JOIN Locations AS l ON l.org_nbr = o.org_nbr and l.loc_nbr = p.loc_nbr
 JOIN Activities AS a ON a.person_nbr = p.person_nbr
WHERE p.lname = &p.lname AND p.fname = &p.fname;

I did not save the reason why the ampersand is used in the WHERE row selection
phrase and want now to learn why it's there. Probably needed to concatenate
separate names?

TIA,

Rich




Re: Use of '&' as table prefix in query

2021-07-12 Thread Tom Lane
Rich Shepard  writes:
> Long ago I wrote a query which was greatly improved (i.e., it actually
> worked as intended) by help here):

> SELECT ...
> WHERE p.lname = &p.lname AND p.fname = &p.fname;

> I did not save the reason why the ampersand is used in the WHERE row selection
> phrase and want now to learn why it's there. Probably needed to concatenate
> separate names?

AFAICS this is invoking a prefix operator named "&".  There is no such
operator built into Postgres.  Maybe psql's "\do+ &" would jog your
memory about where yours came from.

regards, tom lane




Re: How to debug a connection that's "active" but hanging?

2021-07-12 Thread Jurrie Overgoor

On 10-07-2021 10:26, Vijaykumar Jain wrote:
On Sat, 10 Jul 2021 at 00:29, Jurrie Overgoor 
> wrote:


Hi everyone,

We are in the process of upgrading from PostgreSQL 9.6 to 13. When
our
database gets created in our regression tests, we run some unit tests
first. We see one of those tests hang.

It seems the client is waiting on more data to arrive from the
PostgreSQL server. A thread dump shows it waiting on the socket.

On the server, I see the connection from the client. Looking at
pg_stat_activity, I see it is in state 'active'. I have seen
query_start
be as old as eight hours ago. The documentation states that 'active'
means a query is being executed. The query in question is:

Looking at pg_locks, I only see locks with granted = true for the PID.


Is this reproducible, I mean this happens multiple times?



Hi Vijaykumar, thanks for replying. Yes, this is reproducible. About 50% 
of the times, my connection is hanging. It's always on the same query, 
which I shared in the previous post. These are unit tests that are 
executed just after the database is created from scratch using Liquibase.



can you please run the below query in a separate session and share the 
result, feel free to anonymize sensitive stuff.


SELECT db.datname, locktype, relation::regclass, mode, transactionid
AS tid, virtualtransaction AS vtid, pid, granted
FROM pg_catalog.pg_locks l
LEFT JOIN pg_catalog.pg_database db ON db.oid = l.database
WHERE NOT pid = pg_backend_pid();



Sure! This is all testdata; there is not much that needs to be 
anonymized :) Here it is:



|datname|locktype |relation |mode 
|tid|vtid   |pid  |granted|

|---|--|-|---|---|---|-|---|
|wildfly|relation  |pg_constraint_contypid_index |AccessShareLock|   
|4/46389|22928|true   |
|wildfly|relation 
|pg_constraint_conrelid_contypid_conname_index|AccessShareLock| 
|4/46389|22928|true   |
|wildfly|relation  |pg_constraint_conname_nsp_index |AccessShareLock|   
|4/46389|22928|true   |
|wildfly|relation  |pg_constraint_conparentid_index |AccessShareLock|   
|4/46389|22928|true   |
|wildfly|relation  |pg_namespace_oid_index |AccessShareLock|   
|4/46389|22928|true   |
|wildfly|relation  |pg_namespace_nspname_index |AccessShareLock|   
|4/46389|22928|true   |
|wildfly|relation  |pg_attribute_relid_attnum_index |AccessShareLock|   
|4/46389|22928|true   |
|wildfly|relation  |pg_attribute_relid_attnam_index |AccessShareLock|   
|4/46389|22928|true   |

|wildfly|relation  |pg_depend |AccessShareLock|   |4/46389|22928|true   |
|wildfly|relation  |pg_constraint |AccessShareLock|   
|4/46389|22928|true   |

|wildfly|relation  |pg_class |AccessShareLock|   |4/46389|22928|true   |
|wildfly|relation  |pg_namespace |AccessShareLock|   |4/46389|22928|true   |
|wildfly|relation  |pg_attribute |AccessShareLock|   |4/46389|22928|true   |
|wildfly|relation  |information_schema.table_constraints 
|AccessShareLock|   |4/46389|22928|true   |
|wildfly|relation  |information_schema.referential_constraints 
|AccessShareLock|   |4/46389|22928|true   |
|wildfly|relation  |information_schema.key_column_usage 
|AccessShareLock|   |4/46389|22928|true   |

|   |virtualxid| |ExclusiveLock  |   |4/46389|22928|true   |
|wildfly|relation  |pg_depend_depender_index |AccessShareLock|   
|4/46389|22928|true   |
|wildfly|relation  |pg_class_oid_index |AccessShareLock|   
|4/46389|22928|true   |
|wildfly|relation  |pg_constraint_oid_index |AccessShareLock|   
|4/46389|22928|true   |
|wildfly|relation  |pg_class_relname_nsp_index |AccessShareLock|   
|4/46389|22928|true   |
|wildfly|relation  |pg_depend_reference_index |AccessShareLock|   
|4/46389|22928|true   |
|wildfly|relation  |pg_class_tblspc_relfilenode_index 
|AccessShareLock|   |4/46389|22928|true   |



Indeed, this time it's PID 22928 that's hanging.



There is nothing in the logs as far as I can see. Configuration
variables log_min_messages and log_min_error_statement are on
'debug1'.
This is a snippet of the logs:

2021-07-09 20:35:16.374 CEST [30399] STATEMENT: START TRANSACTION
2021-07-09 20:35:18.703 CEST [30399] WARNING:  there is already a
transaction in progress

Are you having multiple BEGIN tx not matching commit/rollbacks ? not 
sure related, but fixing this will help reduce noise.



Actually, I'm not quite sure where those messages come from. They of 
course indicate starting a transaction that's already in place. But I'd 
have to dig into the framework we use to know where these come from. My 
gut feeling is that this is not part of the problem, rather just noise.




I am a bit out of ideas - does anyone have any tips where I should
look
to see what is causing the query to hang?


Although I am not an expert at this, I am exploring myself.
Can you collect backtrace for this 

Re: Use of '&' as table prefix in query

2021-07-12 Thread Rob Sargent

On 7/12/21 11:25 AM, Rich Shepard wrote:

Long ago I wrote a query which was greatly improved (i.e., it actually
worked as intended) by help here):

/* This query selects all activity information for a named person */

SELECT p.lname, p.fname, p.loc_nbr, p.job_title, p.direct_phone, 
p.active,

   o.org_name,
   l.loc_nbr, l.loc_name,
   a.act_date, a.act_type, a.notes, a.next_contact
FROM People AS p
 JOIN Organizations AS o ON o.org_nbr = p.org_nbr
 JOIN Locations AS l ON l.org_nbr = o.org_nbr and l.loc_nbr = 
p.loc_nbr

 JOIN Activities AS a ON a.person_nbr = p.person_nbr
WHERE p.lname = &p.lname AND p.fname = &p.fname;

These look like value substitutions, usually done on the client at it 
sends the sql.  How is this sql getting to the server (presumably after 
substitution).





Re: Use of '&' as table prefix in query

2021-07-12 Thread Rich Shepard

On Mon, 12 Jul 2021, Tom Lane wrote:


AFAICS this is invoking a prefix operator named "&". There is no such
operator built into Postgres. Maybe psql's "\do+ &" would jog your memory
about where yours came from.


tom,

I thought it wasn't part of postgres. I've no idea why it's there, but I'll
remove it and see what happens.

Thanks!

Rich




Re: How to debug a connection that's "active" but hanging?

2021-07-12 Thread Tom Lane
Jurrie Overgoor  writes:
> Hi Vijaykumar, thanks for replying. Yes, this is reproducible. About 50% 
> of the times, my connection is hanging. It's always on the same query, 
> which I shared in the previous post.

The backtraces you captured look like the query is not "hung", it's
just computing away.

A plausible interpretation of the facts you've given is that the
query's plan is unstable, and sometimes the server is choosing a plan
that takes much longer to run than other times.  Ordinarily I'd suggest
that the auto_explain extension might help you debug that, but I think
it only works on queries that do eventually complete.  You might need
to investigate by altering your application to capture "EXPLAIN ..."
output just before the troublesome query, so you can see if it gets
a different plan in the slow cases.

regards, tom lane




Re: Use of '&' as table prefix in query

2021-07-12 Thread Rich Shepard

On Mon, 12 Jul 2021, Rob Sargent wrote:


These look like value substitutions, usually done on the client at it
sends the sql. How is this sql getting to the server (presumably after
substitution).


Rob,

I was running queries from the psql back then. Now I'm adding a GUI
(tkinter) and using psycopg2.

Thanks,

Rich




Re: Use of '&' as table prefix in query

2021-07-12 Thread Adrian Klaver

On 7/12/21 10:45 AM, Rich Shepard wrote:

On Mon, 12 Jul 2021, Tom Lane wrote:


AFAICS this is invoking a prefix operator named "&". There is no such
operator built into Postgres. Maybe psql's "\do+ &" would jog your memory
about where yours came from.


tom,

I thought it wasn't part of postgres. I've no idea why it's there, but I'll
remove it and see what happens.


Before you do that see Rob's post.



Thanks!

Rich





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Use of '&' as table prefix in query

2021-07-12 Thread Adrian Klaver

On 7/12/21 10:45 AM, Rich Shepard wrote:

On Mon, 12 Jul 2021, Tom Lane wrote:


AFAICS this is invoking a prefix operator named "&". There is no such
operator built into Postgres. Maybe psql's "\do+ &" would jog your memory
about where yours came from.


tom,

I thought it wasn't part of postgres. I've no idea why it's there, but I'll
remove it and see what happens.


You should also follow Tom's suggestion and do:

\do+ &

in psql.



Thanks!

Rich





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Use of '&' as table prefix in query

2021-07-12 Thread Rob Sargent

On 7/12/21 11:47 AM, Rich Shepard wrote:

On Mon, 12 Jul 2021, Rob Sargent wrote:


These look like value substitutions, usually done on the client at it
sends the sql. How is this sql getting to the server (presumably after
substitution).


Rob,

I was running queries from the psql back then. Now I'm adding a GUI
(tkinter) and using psycopg2.

Thanks,

Rich



Having a hard time seeing the value in p.lname = p.lname and straight sql.





Re: Use of '&' as table prefix in query

2021-07-12 Thread Rich Shepard

On Mon, 12 Jul 2021, Adrian Klaver wrote:



You should also follow Tom's suggestion and do:
\do+ &
in psql.


It's the bitwise 'and':
# \do+ &
   List of operators
   Schema   | Name | Left arg type | Right arg type | Result type |   Function   | Description 
+--+---++-+--+-

 pg_catalog | &| bigint| bigint | bigint  | int8and 
 | bitwise and
 pg_catalog | &| bit   | bit| bit | bitand  
 | bitwise and
 pg_catalog | &| inet  | inet   | inet| inetand 
 | bitwise and
 pg_catalog | &| integer   | integer| integer | int4and 
 | bitwise and
 pg_catalog | &| macaddr   | macaddr| macaddr | macaddr_and 
 | bitwise and
 pg_catalog | &| macaddr8  | macaddr8   | macaddr8| 
macaddr8_and | bitwise and
 pg_catalog | &| smallint  | smallint   | smallint| int2and 
 | bitwise and
(7 rows)

which doesn't seem appropriate in this context.

Thanks,

Rich




Re: Use of '&' as table prefix in query

2021-07-12 Thread Rich Shepard

On Mon, 12 Jul 2021, Rob Sargent wrote:


Having a hard time seeing the value in p.lname = p.lname and straight sql.


Me, too. That's why I wanted an explanation. Seems to me I added it sometime
for no valid reason. As I've not run that query in a very long time I'll
clean up the query and test it with the next opportunity.

Rich




Re: Use of '&' as table prefix in query [RESOLVED]

2021-07-12 Thread Rich Shepard

On Mon, 12 Jul 2021, Rich Shepard wrote:


Long ago I wrote a query which was greatly improved (i.e., it actually
worked as intended) by help here):


Ah! It finally came back to me as I looked to revise it. What I want both
&p.lname and &p.fname are specific names passed in from the user. Now I know
what they represent I'll find just how to specify them using python and
psycopg2.

Rich




Re: How to debug a connection that's "active" but hanging?

2021-07-12 Thread Vijaykumar Jain
On Mon, 12 Jul 2021 at 23:16, Tom Lane  wrote:

>
> The backtraces you captured look like the query is not "hung", it's
> just computing away.
>
>
He mentioned earlier that the query was hung as 'active' for 8 hours and on.

incase this is due to bad plan,
@Jurrie Overgoor   is it also possible for
you to run manually

`vacuumdb  -a -v`  from the terminal, each time before you run your test
suite for some runs, do you still get the same issue?

I have a feeling repeated runs may have caused a lot of bloat on some
tables which might have not been reclaimed by autovacuum runs.


Re: Why can't I drop a tablespace?

2021-07-12 Thread Phil Endecott

Laurenz Albe wrote:

On Fri, 2021-07-09 at 20:04 +0100, Phil Endecott wrote:

=# create tablespace tempspace location "/db_temp";
=# alter table requests set tablespace tempspace;

That didn't work; I think disk space had actually reached zero:

PANIC:  could not write to file "pg_wal/xlogtemp.19369": No space left on device
STATEMENT:  alter table requests set tablespace tempspace;

So I shut down the database and resolved the problem in a more conventional 
way by resizing the filesystem. It is now (apparently) functioning normally.


BUT: I am unable to drop the tablespace that I created:

=# drop tablespace tempspace;
ERROR:  tablespace "tempspace" is not empty

On inspection /db_temp does contain a few GB of data that looks 
consistent with my table "requests" that I had tried to move.


postgres:/db_temp/PG_11_201809051$ ls -l 17829/
total 2894972
-rw--- 1 postgres postgres   32137216 Jul  8 18:35 486095
-rw--- 1 postgres postgres   37240832 Jul  8 18:57 494286
-rw--- 1 postgres postgres 1073741824 Jul  8 19:02 502478
-rw--- 1 postgres postgres 1073741824 Jul  8 19:03 502478.1
-rw--- 1 postgres postgres  747577344 Jul  8 19:03 502478.2

I can't find what is using it:


These files don't get cleaned up after a crash, so they may well be
leftovers you can remove.


Thanks Laurenz. I was looking at the source for "alter table set
tablespace" yesterday trying to work out what is supposed to happen.
There is a comment at tablecmds.c line 3989: "Thanks to the magic of
MVCC, an error anywhere along the way rolls back the whole operation;
we don't have to do anything special to clean up." But I guess that
creating an entirely new file on a different filesystem is an
exception to that.


Use \dt+ and \di+ to determine if the table or any of its indexes
actually resides in the new tablespace.  Don't forget the TOAST table.


"select distinct(reltablespace) from pg_class" gives only pg_global and
null. Is that including TOAST tables etc.?


If they are all still in the original tablespace as they should be
on account of the transactional guarantees, go ahead and manually
remove the files.


My plan is to wait for a couple of days to see if there are any other
opinions here, and then do that.


Thanks again,

Phil.








Re: How to debug a connection that's "active" but hanging?

2021-07-12 Thread Jurrie Overgoor

On 12-07-2021 20:56, Vijaykumar Jain wrote:
On Mon, 12 Jul 2021 at 23:16, Tom Lane > wrote:



The backtraces you captured look like the query is not "hung", it's
just computing away.


He mentioned earlier that the query was hung as 'active' for 8 hours 
and on.


incase this is due to bad plan,
@Jurrie Overgoor   is it also 
possible for you to run manually


`vacuumdb  -a -v`  from the terminal, each time before you run your 
test suite for some runs, do you still get the same issue?


I have a feeling repeated runs may have caused a lot of bloat on some 
tables which might have not been reclaimed by autovacuum runs.



I configured Jenkins to run that command prior to executing the tests. I 
got 5 successful runs, no hanging queries. Then I reverted and ran 
again. The first and second run were ok; the third run hung again. So 
your hunch might be right.



On 12-07-2021 19:46, Tom Lane wrote:

You might need
to investigate by altering your application to capture "EXPLAIN ..."
output just before the troublesome query, so you can see if it gets
a different plan in the slow cases. 



Then I tried this. The query plans are indeed not consistent.

Most of the time the first line of the query plan is: Unique  
(cost=4892.35..4892.35 rows=1 width=64) [1]


I have seen other costs: 5818.30, 6350.85 and 6514.73. They all complete 
correctly. [2], [3], [4]


The plan that leaves the query hanging in the 'active' state starts 
with: Unique  (cost=241.81..241.82 rows=1 width=64) [5]


That's clearly much lower than the rest. So I suspect the planner making 
a 'wrong' guess there, causing a bad plan, and a long time to execute. 
For reference, the executed query is [6].


Now, where to go from here? Is this considered a bug in PostgreSQL, or 
am I misusing the database engine by doing DROP DATABASE and CREATE 
DATABASE over and over again? I must say that I never saw this behavior 
on PostgreSQL 9.6, so in that regard it might be considered a bug.?


What can I do to get to the bottom of this? Should I export the content 
of some metadata tables prior to executing the hanging query? Should I 
`vacuumdb -a -v` prior to logging the EXPLAIN for the hanging query?


With kind regards,

Jurrie


[1] https://jurr.org/PostgreSQL_13_hanging_query/normal.txt
[2] https://jurr.org/PostgreSQL_13_hanging_query/alt1.txt
[3] https://jurr.org/PostgreSQL_13_hanging_query/alt3.txt
[4] https://jurr.org/PostgreSQL_13_hanging_query/alt2.txt
[5] https://jurr.org/PostgreSQL_13_hanging_query/hang.txt
[6] https://jurr.org/PostgreSQL_13_hanging_query/query.txt




Re: How to debug a connection that's "active" but hanging?

2021-07-12 Thread Tom Lane
Jurrie Overgoor  writes:
> Then I tried this. The query plans are indeed not consistent.

> Most of the time the first line of the query plan is: Unique  
> (cost=4892.35..4892.35 rows=1 width=64) [1]

> I have seen other costs: 5818.30, 6350.85 and 6514.73. They all complete 
> correctly. [2], [3], [4]

> The plan that leaves the query hanging in the 'active' state starts 
> with: Unique  (cost=241.81..241.82 rows=1 width=64) [5]

> That's clearly much lower than the rest. So I suspect the planner making 
> a 'wrong' guess there, causing a bad plan, and a long time to execute. 

Yeah, evidently.

> Now, where to go from here?

The most likely bet here is that you're populating a table and then
running a query on it before autovacuum has had a chance to catch up
with what you did.  Then the planner is working with obsolete stats
or none at all, and it guesses wrong about what to do.  The standard
fix is to issue a manual ANALYZE on the table between the data-load
and querying steps of your application.

> Is this considered a bug in PostgreSQL, or 
> am I misusing the database engine by doing DROP DATABASE and CREATE 
> DATABASE over and over again?

It's not a bug.  I suppose in a perfect world the stats would
automatically be up to date all the time, but in the real world
it seems like the cost of that would be exorbitant.

regards, tom lane