Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Mladen Marinović
Hi,

We recently migrated our production instances from PG11 to PG17. While
doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As
everything worked on the test servers we pushed this to production a few
weeks ago. We did not notice any problems until a few days ago (but the
problems were here from the start). The main manifestation of the problems
is a service that runs a fixed query to get a backlog of unprocessed data
(limited to 1000 rows). When testing the query using pgAdmin connected
directly to the database we get a result in cca. 20 seconds. The same query
runs for 2 hours when using pgBouncer to connect to the same database.

The more interesting part is that when we issue an explain of the same
query we get different plans. We did this a few seconds apart so there
should be no difference in collected statistics. We ruled out prepared
statements, as we suspected the generic plan might be the problem, but it
is not. Is there any pgBouncer or PG17 parameter that might be the cause of
this?

Regards,
Mladen Marinović


Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Efrain J. Berdecia
Is the query using parameter markers? Is the source executing the query forcing 
a "bad" data type casting?


Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread SERHAD ERDEM
Hi, wish you good luck with the "transaction mode" 🙂. If pgbouncer is not
really needed, remove it and use plain connections. I have experienced
pgbouncer in session mode for over 2 years, with situations that were a "pain
in the ass", and finally removed this bouncing layer.



Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread SERHAD ERDEM
Hi, if you are sure that the execution plans are the same, try a select
count(*) from ... instead of select * from ... limit ...; that way you may
find out whether the problem is due to returning the rows or not.



Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Mladen Marinović
Hi,

Mystery not solved...but identified. The pool is in transaction mode and
some connections use SET enable_mergejoin=off, but they do not set it back
to on. When a connection is later handed out from the pool, the parameter is
still set to off, so the planner cannot use that kind of join, which results
in different plans when using this tainted pgbouncer connection instead of a
clean PG17 connection.

The problem is that server_reset_query is not used when the pool is in
transaction mode. Now we have to see how to fix this problem.

Regards,
Mladen Marinović
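
A minimal sketch of the failure mode described above (the pool setup and the
diagnostic query are assumptions, not taken from the thread): with a
transaction-mode pool, a plain SET issued by one client stays on the pooled
server connection and is inherited by the next client that receives it.

-- Sketch only: assumes a pgBouncer pool in transaction mode.
-- Client A taints the pooled server connection and never resets it:
SET enable_mergejoin = off;

-- Client B, later handed the same server connection, inherits the setting:
SHOW enable_mergejoin;   -- returns "off" even though this client never set it

-- One way to spot leaked session-level settings on the backend:
SELECT name, setting FROM pg_settings WHERE source = 'session';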


Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Achilleas Mantzios


On 5/5/25 09:52, Mladen Marinović wrote:

Hi,

We recently migrated our production instances from PG11 to PG17. While 
doing so we upgraded our pgBouncer instances from 1.12 to 1.24. As 
everything worked on the test servers we pushed this to production a 
few weeks ago. We did not notice any problems until a few days ago 
(but the problems were here from the start). The main manifestation of 
the problems is a service that runs a fixed query to get a backlog of 
unprocessed data (limited to a 1000 rows). When testing the query 
using pgAdmin connected directly to the database we get a result in 
cca. 20 seconds. The same query runs for 2 hours when using pgBouncer 
to connect to the same database.



That's a huge jump, I hope you guys did extensive testing of your app. 
In which language is your app written? If java, then define 
prepareThreshold=0 in your jdbc and set max_prepared_statements = 0 in 
pgbouncer.


How about search paths ? any difference on those between the two runs ? 
Do you set search_path in pgbouncer ? what is "cca." btw ?




The more interesting part is that when we issue an explain of the same 
query we get different plans. We did this a few seconds apart so there 
should be no difference in collected statistics. We ruled out prepared 
statements, as we suspected the generic plan might be the problem, but 
it is not. Is there any pgBouncer or PG17 parameter that might be the 
cause of this?



Does this spawn any connections (such as dblink) ? are there limits per 
user/db pool_size in pgbouncer ?


Pgbouncer, in contrast to its old friend PgPool-II, is completely 
passive; it just passes SQL through to the server as fast as it can. But I am 
sure you know that. Good luck, keep us posted!




Regards,
Mladen Marinović

Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Achilleas Mantzios


On 5/5/25 11:00, Mladen Marinović wrote:



On Mon, May 5, 2025 at 11:24 AM Achilleas Mantzios 
 wrote:



On 5/5/25 09:52, Mladen Marinović wrote:

Hi,

We recently migrated our production instances from PG11 to PG17.
While doing so we upgraded our pgBouncer instances from 1.12 to
1.24. As everything worked on the test servers we pushed this to
production a few weeks ago. We did not notice any problems until
a few days ago (but the problems were here from the start). The
main manifestation of the problems is a service that runs a fixed
query to get a backlog of unprocessed data (limited to a 1000
rows). When testing the query using pgAdmin connected directly to
the database we get a result in cca. 20 seconds. The same query
runs for 2 hours when using pgBouncer to connect to the same
database.



That's a huge jump, I hope you guys did extensive testing of your
app. In which language is your app written? If java, then define
prepareThreshold=0 in your jdbc and set max_prepared_statements =
0 in pgbouncer.

Mainly python, but the problem was noticed in a java service.
Prepare threshold was already set to 0. We changed the 
max_prepared_statements to 0 from the default (200) but no change was 
noticed.


How about search paths ? any difference on those between the two
runs ? Do you set search_path in pgbouncer ? what is "cca." btw ?



The more interesting part is that when we issue an explain of the
same query we get different plans. We did this a few seconds
apart so there should be no difference in collected statistics.
We ruled out prepared statements, as we suspected the generic
plan might be the problem, but it is not. Is there any pgBouncer
or PG17 parameter that might be the cause of this?



Does this spawn any connections (such as dblink) ? are there
limits per user/db pool_size in pgbouncer ?

No additional connections nor dblink. Just plain SQL (CTE, SELECT, 
INSERT, UPDATE, DELETE, ...). There are limits, but they are not hit. 
The query just uses a different plan and runs slower because of that.


Pgbouncer, in contrast to its old friend PgPool-II, is completely
passive; it just passes SQL through to the server as fast as it can.
But I am sure you know that. Good luck, keep us posted!

Yes, that is what puzzles me.


What is the pgbouncer's timeout in the server connections ?

How about "idle in transaction" ? do you get any of those? What's the 
isolation level ?


How about the user ? is this the same user doing pgadmin queries VS via 
the app ?


Can you identify the user under which the problem is manifested and :

ALTER user "unlucky_user" SET log_statement = 'all';

ALTER user "unlucky_user" SET log_min_duration_statement = 0; -- to help 
you debug the prepared statements .. just in case , and other stuff not 
printed by log_statement = all.



Regards,
Mladen Marinović

Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Mladen Marinović
None of those parameters should affect the fact that issuing the explain of
the select query (the statement is not prepared) from psql directly gives a
different result than issuing it over the pgbouncer connection. The result is
repeatable.

We have rolled back pgbouncer to 1.12 and it seems the problem persists.
This is one of the weirdest things I have ever seen with PostgreSQL.

Regards,
Mladen Marinović


Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Achilleas Mantzios

On 5/5/25 11:30, Ruben Morais wrote:


Hi,

Could be a hint, but test with jit set to off.
If I am not wrong, since you changed from 11 to 17 that could be a cause; just
try it, because in some cases plans changed when jit is on.


Not only JIT but also other extensions (such as timescale) could greatly 
affect the plan.


He could find out whether any GUCs are set for the particular user:

select * from pg_db_role_setting where setrole = 
to_regrole('');
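
As a sketch (not from the thread), the same catalog can also be checked for
per-database overrides, which can change planner GUCs just as silently:

-- List every per-role and per-database GUC override;
-- a zero setrole/setdatabase means "all roles"/"all databases".
SELECT coalesce(r.rolname, '(all roles)')     AS role,
       coalesce(d.datname, '(all databases)') AS db,
       s.setconfig                            AS overrides
FROM pg_db_role_setting s
LEFT JOIN pg_roles    r ON r.oid = s.setrole
LEFT JOIN pg_database d ON d.oid = s.setdatabase;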




Regards,
Rúben Morais





Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Achilleas Mantzios


On 5/5/25 13:27, Mladen Marinović wrote:




None of those parameters should affect the fact that when issuing the 
explain select query (the statement is not prepared) from psql 
directly gives a different result than issuing it over the pgbouncer 
connection. The result is repeatable.


We have rolled back pgbouncer to 1.12. and it seems the problem 
persists. This is one of the weirdest things I have ever seen with 
PostgreSQL.



OK, this is something, at least one more extreme thought ruled out. How about
search_path? Is this the SAME user that is issuing the statements in pgadmin
vs pgbouncer?

Is there a connect_query inside pgbouncer's conf?

You have to show all configuration involved and also full logging on the
backend for said user.




Regards,
Mladen Marinović


Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread SERHAD ERDEM
Hi, you had better try VACUUM ANALYZE for the whole db; the pgbouncer
connection layer cannot cause slow queries.




Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Mladen Marinović
On Mon, May 5, 2025 at 2:36 PM Achilleas Mantzios <
a.mantz...@cloud.gatewaynet.com> wrote:

> ok, this is something, at least one more extreme thought ruled out. How
> about search_path ?  is this the SAME user that is issuing the statements
> in pgadmin VS pgbouncer ?
>
The user is the same, the search path is the same:

show search_path ;
   search_path
-----------------
 "$user", public
(1 row)

There is only one schema (public) so there is no possibility that one
connection uses different tables. As I remember correctly, the planner uses
the postgresql parameters (from the conf file, visible in pg_settings) and
table/row statistics to choose a plan. It is unclear to me how those
numbers can be different when using pgbouncer (I have tried selecting the
configuration using both connections and it is the same).
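
One way to narrow that comparison down, as a sketch (the category filter is an
assumption about where the relevant GUCs live, not something from the thread):
planner behaviour is driven by the query-tuning settings, so listing only those
over both connections keeps the diff small enough to eyeball.

-- Run over psql directly and over pgBouncer, then compare the two outputs;
-- enable_mergejoin and the other planner toggles are in this category.
SELECT name, setting
FROM pg_settings
WHERE category LIKE 'Query Tuning%'
ORDER BY name;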


> Is there a connect_query inside pgbouncer's conf ?
>
The connect_query parameter is not used.

> you have to show all configuration involved and also full logging on the
> backend for said user.
>
I can provide redacted explains if it would help. Full logging is not
feasible.



Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Mladen Marinović
On Mon, May 5, 2025 at 2:38 PM SERHAD ERDEM  wrote:

> Hi, you had better try VACUUM ANALYZE for the whole db; the pgbouncer
> connection layer cannot cause slow queries.
>

I did that already. But the slow query is the consequence of the different
plan, not the statistics.


>


Re: Is anyone up for hosting the online PG game "Schemaverse"?

2025-05-05 Thread Merlin Moncure
On Thu, May 1, 2025 at 5:23 PM Justin Clift  wrote:

> Hi all,
>
> The PostgreSQL game "Schemaverse" was removed from the PostgreSQL
> website's
> links a few months ago because it no longer had hosting.
>
> Does anyone around have spare server/vm/something that could be used to
> host it (for free)?
>

I might be interested.  Is it difficult to set up?

merlin


Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Álvaro Herrera
On 2025-May-05, Mladen Marinović wrote:

> Hi,
> 
> Mystery not solved...but identified. The pool is in transaction mode and
> some connections use set enable_mergejoin=off, but they do not set it back
> to on.

Maybe instead of "SET enable_mergejoin=off" these connections could be
changed to use SET LOCAL enable_mergejoin=off.  That way, the setting
reverts to its original value automatically at the end of the
transaction.  It seems more appropriate when using transaction mode
anyway.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
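
A sketch of the difference, assuming an explicit transaction (not taken from
the thread):

BEGIN;
SET LOCAL enable_mergejoin = off;   -- scoped to this transaction only
-- ... statements that need merge joins disabled ...
COMMIT;                             -- the setting reverts here automatically

-- A plain SET, by contrast, sticks to the server session until RESET,
-- which is exactly what leaks through a transaction-mode pool:
SET enable_mergejoin = off;
RESET enable_mergejoin;             -- explicit cleanup if SET LOCAL cannot be used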




Re: Different execution plans in PG17 and pgBouncer...

2025-05-05 Thread Achilleas Mantzios


On 5/5/25 14:26, Mladen Marinović wrote:

Hi,

Mystery not solved...but identified. The pool is in transaction mode 
and some connections use set enable_mergejoin=off, but they do not set 
it back to on. Upon getting the connection from the pool the parameter 
is still set to off causing the planner to not use this kind of join 
which results in different plans when using this tainted pgbouncer 
connection instead of the clean one from pg17.


The problem is that server_reset_query is not used when the pool is in 
transaction mode. Now, we have to see how to fix this problem.


But you've got this: https://www.pgbouncer.org/config.html

"server_reset_query_always

Whether server_reset_query should be run in all pooling modes. When this
setting is off (default), the server_reset_query will be run only in pools
that are in session-pooling mode. Connections in transaction-pooling mode
should not have any need for a reset query.

This setting is for working around broken setups that run applications that
use session features over a transaction-pooled PgBouncer. It changes
non-deterministic breakage to deterministic breakage: Clients always lose
their state after each transaction."


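For reference, a sketch of what such a reset amounts to (the default
server_reset_query is DISCARD ALL per the same page; the narrower RESET is an
application-side alternative, not something suggested in the thread):

-- What the reset query runs by default, dropping all session state:
DISCARD ALL;

-- Narrower cleanup an application could issue when it takes a connection
-- from the pool, undoing only the known leaked setting:
RESET enable_mergejoin;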

