Missing query plan for auto_explain.
Our Postgres recently started reporting considerably different execution times for the same query. When executed from our JDBC application the Postgres logs report an average execution time of 1500 ms but when the query is manually executed through `psql` it doesn't take longer than 50 ms. With a view to investigate discrepancies in the plan we enabled `auto_explain` in `session_preload_libraries` with `auto_explain.log_min_duration = '1s'`. All application servers were bounced to ensure new connections were created and picked up the changes. However this trouble query does not have an explain plan printed, even when its execution time exceeds the threshold (other queries do though). Does anyone have ideas of why the explain plan is not being printed? Sample log entry for trouble query executed from application: ``` Aug 26 09:11:33 db-931 postgres[8106]: [66-1] 2022-08-26 09:11:33 GMT [8106]: [5-1] db=betwave,user=betwave_app_readonly_user LOG: duration: 1423.481 ms bind : /*@ org.sixty6.as66.QueryID:report.framework.userSearch */select users.user_id, users.unique_identifier, user_profile.name, user_profile.email, organisation.organisation_id, organisation.description, user_realm.user_realm_id, user_realm.description as col_8, user_realm.type, user_realm_category.description as col_10, vip_schema.description as col_11, vip_level.description as col_12, affiliate.description as col_13, users.status, users.creation_date, user_statistics.last_user_session_creation_date, users.real_base_currency, users.fun_base_currency, local_balance_account.balance Aug 26 09:11:33 db-931 postgres[8106]: [66-2] #011from users users Aug 26 09:11:33 db-931 postgres[8106]: [66-3] #011join user_profile user_profile Aug 26 09:11:33 db-931 postgres[8106]: [66-4] #011on user_profile.user_id = users.user_id Aug 26 09:11:33 db-931 postgres[8106]: [66-5] #011join user_realm user_realm Aug 26 09:11:33 db-931 postgres[8106]: [66-6] #011on user_realm.user_realm_id = users.user_realm_id Aug 26 09:11:33 db-931 postgres[8106]: [66-7] #011join organisation organisation Aug 26 09:11:33 db-931 postgres[8106]: [66-8] #011on organisation.organisation_id = user_realm.organisation_id Aug 26 09:11:33 db-931 postgres[8106]: [66-9] #011join acl_allowed_organisation acl_allowed_organisation Aug 26 09:11:33 db-931 postgres[8106]: [66-10] #011on acl_allowed_organisation.organisation_id = organisation.organisation_id and acl_allowed_organisation.permission = $1 and acl_allowed_organisation.user_id = $2 Aug 26 09:11:33 db-931 postgres[8106]: [66-11] #011join affiliate affiliate Aug 26 09:11:33 db-931 postgres[8106]: [66-12] #011on affiliate.affiliate_id = users.affiliate_id Aug 26 09:11:33 db-931 postgres[8106]: [66-13] #011join acl_allowed_affiliate acl_allowed_affiliate Aug 26 09:11:33 db-931 postgres[8106]: [66-14] #011on acl_allowed_affiliate.affiliate_id = affiliate.affiliate_id and acl_allowed_affiliate.permission = $3 and acl_allowed_affiliate.user_id = $4 Aug 26 09:11:33 db-931 postgres[8106]: [66-15] #011join acl_allowed_user_realm_category acl_allowed_user_realm_category Aug 26 09:11:33 db-931 postgres[8106]: [66-16] #011on acl_allowed_user_realm_category.user_realm_category_id = user_realm.user_realm_category_id and acl_allowed_user_realm_category.permission = $5 and acl_allowed_user_realm_category.user_id = $6 Aug 26 09:11:33 db-931 postgres[8106]: [66-17] #011join user_statistics user_statistics Aug 26 09:11:33 db-931 postgres[8106]: [66-18] #011on user_statistics.user_id = users.user_id Aug 26 09:11:33 db-931 postgres[8106]: [66-19] #011join vip_level vip_level Aug 26 09:11:33 db-931 postgres[8106]: [66-20] #011on vip_level.vip_level_id = users.vip_level_id Aug 26 09:11:33 db-931 postgres[8106]: [66-21] #011join vip_schema vip_schema Aug 26 09:11:33 db-931 postgres[8106]: [66-22] #011on vip_schema.vip_schema_id = vip_level.vip_schema_id Aug 26 09:11:33 db-931 postgres[8106]: [66-23] #011join user_realm_category user_realm_category Aug 26 09:11:33 db-931 postgres[8106]: [66-24] #011on user_realm_category.user_realm_category_id = user_realm.user_realm_category_id Aug 26 09:11:33 db-931 postgres[8106]: [66-25] #011left join local_balance_account local_balance_account Aug 26 09:11:33 db-931 postgres[8106]: [66-26] #011on local_balance_account.user_id = users.user_id and local_balance_account.currency = users.real_base_currency and local_balance_account.balance_category = $7 and local_balance_account.primary_balance = $8 Aug 26 09:11:33 db-931 postgres[8106]: [66-27] #011where users.type in ($9, $10) and user_realm.type in ($11, $12, $13) and users.unique_identifier like $14 escape '\' Aug 26 09:11:33 db-931 postgres[8106]: [66-28] #011order by users.unique_identifier asc, user_realm.user_realm_id asc Aug 26 09:11:33 db-931 postgres[8106]: [66-29] #011limit $15 Aug 26 09:11:33 db-931 postgres[8106]: [66-30] 2022-08-26 09:11:33 GMT [8106]: [6-1] db=betwave,user=betwave_app_readonly_user DETAIL: parameters: $1 = 'READ', $2 = '1468137', $3 = 'READ
Re: Missing query plan for auto_explain.
Good idea on using an actual prepared statement but unfortunately it didn't produce any different result. Could you please elaborate a bit on your advice concerning ExecutorEnd/PortalCleanup? I am afraid it doesn't mean much to me. On Tue, 30 Aug 2022 at 12:16, Alvaro Herrera wrote: > On 2022-Aug-30, Matheus Martin wrote: > > > Our Postgres recently started reporting considerably different > > execution times for the same query. When executed from our JDBC > > application the Postgres logs report an average execution time of 1500 > > ms but when the query is manually executed through `psql` it doesn't > > take longer than 50 ms. > > I don't know why the plan is not saved by auto_explain (maybe we're > missing ExecutorEnd calls somewhere? that would be strange), but one > frequent reason for queries to show different plan in JDBC than psql is > the use of prepared statements. Did you try using "PREPARE > yourquery(...)" and then EXPLAIN EXECUTE(...)? Sometimes that helps to > recreate the original problem. > > > (Apparently, ExecutorEnd is called from PortalCleanup; what happens with > the portal for an extended-protocol query?) > > -- > Álvaro Herrera PostgreSQL Developer — > https://www.EnterpriseDB.com/ >
Re: Missing query plan for auto_explain.
The threshold for `auto_explain` was changed to 500 ms and explain plans are still not being logged. On Tue, 30 Aug 2022 at 13:30, Julien Rouhaud wrote: > Hi, > > On Tue, Aug 30, 2022 at 01:16:43PM +0200, Alvaro Herrera wrote: > > On 2022-Aug-30, Matheus Martin wrote: > > > > > Our Postgres recently started reporting considerably different > > > execution times for the same query. When executed from our JDBC > > > application the Postgres logs report an average execution time of 1500 > > > ms but when the query is manually executed through `psql` it doesn't > > > take longer than 50 ms. > > > > I don't know why the plan is not saved by auto_explain (maybe we're > > missing ExecutorEnd calls somewhere? that would be strange), but one > > frequent reason for queries to show different plan in JDBC than psql is > > the use of prepared statements. Did you try using "PREPARE > > yourquery(...)" and then EXPLAIN EXECUTE(...)? Sometimes that helps to > > recreate the original problem. > > > > (Apparently, ExecutorEnd is called from PortalCleanup; what happens with > > the portal for an extended-protocol query?) > > AFAICS log_min_duration_statements threshold is based on the full query > processing time while auto_explain is only based on the executor runtime, > so > one more likely explanation is that out of the 1423ms, more than 423ms were > spent in the planner? >
Re: Missing query plan for auto_explain.
We tried running the prepared statement six times as suggested but wasn't still able to recreate the original problem. Perhaps more concerning/relevant is that we have not found any explanation to why the explain plan is not being logged by `auto_explain`. Could this be a bug? Shall we report it? On Tue, 30 Aug 2022 at 17:45, Alvaro Herrera wrote: > On 2022-Aug-30, Matheus Martin wrote: > > > Good idea on using an actual prepared statement but unfortunately it > didn't > > produce any different result. > > I should have also mentioned to try the EXPLAIN EXECUTE six times and > see if the last one produces a different plan. That's when it switches > from planning every time to planning with generic arguments, as I > recall. > > > Could you please elaborate a bit on your advice concerning > > ExecutorEnd/PortalCleanup? I am afraid it doesn't mean much to me. > > That wasn't advice actually, just a note that the code might be doing > that thing wrong, causing auto_explain to miss it. This is an unproven > hypothesis that is likely to be all wrong. > > -- > Álvaro Herrera PostgreSQL Developer — > https://www.EnterpriseDB.com/ > "I'm impressed how quickly you are fixing this obscure issue. I came from > MS SQL and it would be hard for me to put into words how much of a better > job > you all are doing on [PostgreSQL]." > Steve Midgley, > http://archives.postgresql.org/pgsql-sql/2008-08/msg0.php >
Re: Missing query plan for auto_explain.
Yes, we do see some plans logged by the auto_explain. We couldn't find a `auto_explain.log_min_duration_statements` setting but `log_min_duration_statement` as in https://www.postgresql.org/docs/current/runtime-config-logging.html is set to 100 ms. Unfortunately, due to the amount of traffic we have, we cannot lower the threshold for the auto_explain. `log_lock_waits` is turned on and the logs do not indicate any locks related to the tables in the query. On Fri, 2 Sept 2022 at 05:49, Julien Rouhaud wrote: > Hi, > > On Thu, Sep 01, 2022 at 08:20:13PM +0100, Matheus Martin wrote: > > We tried running the prepared statement six times as suggested but wasn't > > still able to recreate the original problem. > > > > Perhaps more concerning/relevant is that we have not found any > explanation > > to why the explain plan is not being logged by `auto_explain`. Could this > > be a bug? Shall we report it? > > Just to be sure, do you get at least some plans logged by auto_explain when > queries are executed by the JDBC application? > > Can you try to temporarily lower auto_explain.log_min_duration_statements > to > less than 50ms and see what auto_explain sees for the execution time (and > planning time). > > Another possibility would be some conflicting locks held. If the conflict > happens during the planning auto_explain still won't be triggered as it's > outside the executor. Also, have you enabled log_lock_waits? >
Re: Missing query plan for auto_explain.
`auto_explain.log_min_duration` is set to 500 ms. On Mon, 5 Sept 2022 at 12:35, Peter J. Holzer wrote: > On 2022-09-02 10:58:58 +0100, Matheus Martin wrote: > > Yes, we do see some plans logged by the auto_explain. We couldn't find a > > `auto_explain.log_min_duration_statements` setting > > This is weird as the documentation says: > > | Note that the default behavior is to do nothing, so you must set at > | least auto_explain.log_min_duration if you want any results. > > What does > show auto_explain.log_min_duration; > return? > > > but `log_min_duration_statement` as in > > https://www.postgresql.org/docs/current/ runtime-config-logging.html > > is set to 100 ms. > > I don't think this affects auto_explain. > > hp > > -- >_ | Peter J. Holzer| Story must make more sense than reality. > |_|_) || > | | | h...@hjp.at |-- Charles Stross, "Creative writing > __/ | http://www.hjp.at/ | challenge!" >
Re: Missing query plan for auto_explain.
We do have JIT enabled `jit=on` with `jit_above_cost=10`. I am sorry but I don't quite understand what role JIT plays in this situation with `auto_explain`. Could you please elaborate on that? On Tue, 6 Sept 2022 at 00:29, Maxim Boguk wrote: > > > On Tue, Aug 30, 2022 at 1:38 PM Matheus Martin < > matheus.mar...@voidbridge.com> wrote: > >> Our Postgres recently started reporting considerably different execution >> times for the same query. When executed from our JDBC application the >> Postgres logs report an average execution time of 1500 ms but when the query >> is manually executed through `psql` it doesn't take longer than 50 ms. >> >> With a view to investigate discrepancies in the plan we enabled >> `auto_explain` in `session_preload_libraries` with >> `auto_explain.log_min_duration = '1s'`. All application servers were bounced >> to ensure new connections were created and picked up the changes. However >> this trouble query does not have an explain plan printed, even when its >> execution time exceeds the threshold (other queries do though). >> >> Does anyone have ideas of why the explain plan is not being printed? >> >> Sample log entry for trouble query executed from application: >> ``` >> Aug 26 09:11:33 db-931 postgres[8106]: [66-1] 2022-08-26 09:11:33 GMT >> [8106]: [5-1] db=betwave,user=betwave_app_readonly_user LOG: duration: >> 1423.481 ms bind : >> >> > My understanding of how to auto_explain work - it deals only for execution > calls, but in your case duration: 1423.481 ms on BIND call, before query > execution. > At least in my understanding - auto_explain cannot work and will not help > in case of a slow BIND call (because it's a time when the query is planned > but not executed). > According documentation: > "Query planning typically occurs when the Bind message is processed. If > the prepared statement has no parameters, or is executed repeatedly, the > server might save the created plan and re-use it during subsequent Bind > messages for the same prepared statement." > Hard to say what the reason for slow planning, but one (there could be > others) likely reason is JIT work. Do you have JIT enabled? > > > -- > Maxim Boguk > Senior Postgresql DBA > https://dataegret.com/ > > Phone UA: +380 99 143 > Phone AU: +61 45 218 5678 > > LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b > Skype: maxim.boguk > > "Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно > когда я так делаю ещё раз?" > >
Re: Missing query plan for auto_explain.
rows=1 width=60) (actual time=0.008..0.008 rows=1 loops=23) Index Cond: ((user_realm_id)::text = (users.user_realm_id)::text) Filter: ((type)::text = ANY ('{ADMIN,HUB,PLAYER}'::text[])) Buffers: shared hit=69 -> Index Scan using pk_organisation on organisation (cost=0.27..0.43 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=23) Index Cond: ((organisation_id)::text = (user_realm.organisation_id)::text) Buffers: shared hit=69 -> Index Only Scan using pk_acl_allowed_organisation on acl_allowed_organisation (cost=0.29..1.38 rows=1 width=9) (actual time=0.007..0.007 rows=1 loops=23) Index Cond: ((user_id = '1468137'::bigint) AND (organisation_id = (user_realm.organisation_id)::text) AND (permission = 'READ'::text)) Heap Fetches: 0 Buffers: shared hit=47 -> Index Scan using pk_affiliate on affiliate (cost=0.28..0.86 rows=1 width=21) (actual time=0.005..0.005 rows=1 loops=23) Index Cond: (affiliate_id = users.affiliate_id) Buffers: shared hit=69 -> Index Only Scan using pk_acl_allowed_user_realm_category on acl_allowed_user_realm_category (cost=0.28..0.59 rows=1 width=9) (actual time=0.008..0.008 rows=1 loops=23) Index Cond: ((user_id = '1468137'::bigint) AND (user_realm_category_id = (user_realm.user_realm_category_id)::text) AND (permission = 'READ'::text)) Heap Fetches: 0 Buffers: shared hit=47 -> Index Scan using pk_user_statistics on user_statistics (cost=0.43..0.51 rows=1 width=16) (actual time=0.030..0.030 rows=1 loops=23) Index Cond: (user_id = user_profile.user_id) Buffers: shared hit=92 dirtied=1 -> Index Scan using pk_vip_level on vip_level (cost=0.14..0.22 rows=1 width=20) (actual time=0.005..0.005 rows=1 loops=23) Index Cond: (vip_level_id = users.vip_level_id) Buffers: shared hit=46 -> Index Scan using pk_vip_schema on vip_schema (cost=0.13..0.63 rows=1 width=22) (actual time=0.005..0.005 rows=1 loops=23) Index Cond: ((vip_schema_id)::text = (vip_level.vip_schema_id)::text) Buffers: shared hit=46 -> Index Scan using pk_user_realm_category on user_realm_category (cost=0.13..0.18 rows=1 width=1032) (actual time=0.005..0.005 rows=1 loops=23) Index Cond: ((user_realm_category_id)::text = (user_realm.user_realm_category_id)::text) Buffers: shared hit=46 -> Index Scan using local_balance_account_unq01 on local_balance_account (cost=0.29..6.45 rows=1 width=15) (actual time=0.020..0.020 rows=1 loops=23) Index Cond: ((user_id = users.user_id) AND ((balance_category)::text = 'CASH'::text)) Filter: ((currency)::text = (users.real_base_currency)::text) Buffers: shared hit=67 Planning: Buffers: shared hit=859 Planning Time: 50.833 ms Execution Time: 5.480 ms (85 rows) On Fri, 9 Sept 2022 at 22:06, Maxim Boguk wrote: > > > On Thu, Sep 8, 2022 at 1:18 PM Matheus Martin < > matheus.mar...@voidbridge.com> wrote: > >> We do have JIT enabled `jit=on` with `jit_above_cost=10`. >> >> I am sorry but I don't quite understand what role JIT plays in >> this situation with `auto_explain`. Could you please elaborate on that? >> >> > In your log - time spent during the execution stage (where auto-exlain can > help) but during the bind/planning stage (e.g. generation of plan). > So you have a problem not with slow query execution, but with slow query > planning, so autoexplain cannot help in that case. > JIT is one possible explanation of planning stage slowdown. > > Can you run explain (analyze, costs, buffers, timing) of your query? > And check how long the planning stage took and check if JIT was used or > not (and how much time spent during JIT if it had been used). > > > > -- > Maxim Boguk > Senior Postgresql DBA > https://dataegret.com/ > > Phone UA: +380 99 143 > Phone AU: +61 45 218 5678 > >