even though the explain plan suggests differently but its taking long long time
On Thu, Jul 17, 2014 at 4:26 PM, Prabhjot Sheena < prabhjot.she...@rivalwatch.com> wrote: > Here is the explain plan. There query time went backup up to 2 to 3 > minutes from 3 second in just 2 hrs. Can anyone suggest something on how > to fix this or why this is happening > > explain SELECT account.id, account.organization_id, run.application_id, > work_unit.script, work_unit.id, work_unit.start_time, run.id, > work_unit.priority FROM work_unit, run, account WHERE work_unit.status = > 3 AND work_unit.run_id = run.id AND work_unit.type != 1 AND run.status = > 1 AND run.account_id = account.id; > QUERY PLAN > > ----------------------------------------------------------------------------------------------------- > Nested Loop (cost=0.00..24.84 rows=1 width=64) > Join Filter: (run.id = work_unit.run_id) > -> Nested Loop (cost=0.00..16.55 rows=1 width=16) > -> Index Scan using run_status_1 on run (cost=0.00..8.27 rows=1 > width=12) > Index Cond: (status = 1) > -> Index Scan using account_pkey on account (cost=0.00..8.27 > rows=1 width=8) > Index Cond: (account.id = run.account_id) > -> Index Scan using idx_work_unit_1_status_3_new on work_unit > (cost=0.00..8.28 rows=1 width=52) > Index Cond: (work_unit.status = 3) > > Table Structure > > \d work_unit > Table "public.work_unit" > Column | Type | Modifiers > ------------+-----------------------------+----------- > id | integer | not null > run_id | integer | > status | integer | > script | character varying | > parameters | character varying | > input | character varying | > start_time | timestamp without time zone | > stop_time | timestamp without time zone | > priority | integer | > type | integer | > lineitems | integer | > outputs | integer | > sub_type | integer | > site_code | text | > Indexes: > "work_unit_pkey" PRIMARY KEY, btree (id) > "idx_work_unit_1_partial" btree (run_id, start_time) WHERE status >= 3 > "idx_work_unit_1_run_id" btree (run_id) > "idx_work_unit_1_script" btree (script) > "idx_work_unit_1_site_code" btree (site_code) > "idx_work_unit_1_starttime" btree (start_time) > "idx_work_unit_1_status_3_new" btree (status, type) WHERE status = 3 > AND type <> 1 > "idx_work_unit_1_status_5" btree (status) WHERE status >= 4 > "idx_work_unit_1_status_part_new" btree (status) WHERE status < 4 > "idx_work_unit_1_stop_time" btree (stop_time) > > > \d run > Table "public.run" > Column | Type | Modifiers > ---------------------+--------------------------+--------------- > id | integer | not null > job_id | integer | > start_time | timestamp with time zone | not null > status | integer | not null > job_name | character varying | > account_id | integer | > application_id | integer | > postproc_script | character varying | > upload_url | character varying | > complete_time | timestamp with time zone | > email | character varying | > size | integer | > errors | integer | > raw_count | integer | > munge | integer | > job_details | character varying | > user_settings | character varying | > run_type | integer | > stop_after | timestamp with time zone | > total_work_time | integer | > flags | integer | > cluster_info_id | integer | > skiplineitems_count | integer | > last_update_ts | timestamp with time zone | > result_exists | boolean | default false > abort_type | integer | > Indexes: > "run_pkey" PRIMARY KEY, btree (id) > "idx_run_acc_stat_comp" btree (account_id, status, complete_time) > "idx_run_app" btree (application_id) > "idx_run_complete_time_2" btree (complete_time) > "idx_run_job_name" btree (job_name) > "run_application_account_idx" btree (application_id, account_id, > status) > "run_job" btree (job_id) > "run_result_exists" btree (result_exists) > "run_start_time" btree (start_time) > "run_status" btree (status) > "run_status_1" btree (status) WHERE status = 1 > "run_status_part_idx" btree (status) WHERE status < 3 > Triggers: > run_upd_ts BEFORE INSERT OR UPDATE ON run FOR EACH ROW EXECUTE > PROCEDURE set_last_update_ts() > tr_proc_update_job_summary AFTER UPDATE ON run FOR EACH ROW EXECUTE > PROCEDURE tr_proc_update_job_summary() > > > > > On Thu, Jul 17, 2014 at 12:00 PM, Potentialtech <wmo...@potentialtech.com> > wrote: > >> On Thu, 17 Jul 2014 11:28:04 -0700 >> Prabhjot Sheena <prabhjot.she...@rivalwatch.com> wrote: >> >> > Thanks i have changed that to 64 and reloaded it. >> > >> > When i had load issue today there was this exact same query that hits >> the >> > db like 50 to 60 times from different machines in 3 to 4 minutes and was >> > taking long time to execute and was holding up the database. i did >> recreate >> > an index and it started performing better. My question is why it is not >> > fetching the result from the memory since its the same query that runs >> > again and again. >> > >> > This is the actual query i m taking about: >> > >> > SELECT account.id, account.organization_id, run.application_id, >> > work_unit.script, work_unit.id, work_unit.start_time, run.id, >> > work_unit.priority FROM work_unit, run, account WHERE >> work_unit.status = >> > 3 AND work_unit.run_id = run.id AND work_unit.type != 1 AND run.status >> = 1 >> > AND run.account_id = account.id >> > >> > Pls suggest if i can do something to fix this >> >> Provide a lot more information if you want anyone on the list to be able >> to help: such as explain output while the problem is happening, and some >> information about the makeup of the tables (column types/indexes/# rows). >> >> Guessing, based on the little information you've provided, it's likely >> that you have something else going on at the same time that you're not >> aware of, and this particular query is only a symptom. I'm saying that >> because SELECTs don't generally create any WAL traffic, so there were >> probably some INSERT/UPDATE/DELETE running at the same time that both >> pushed those 3 tables out of memory and/or saturated disk activity to >> the point that accessing everything becomes slow for a short while, and >> it's just those queries that you noticed. >> >> Are you making the mistake where you set log_min_duration to 1s and only >> worry about queries that take longer than 1s? Because I've seen (on >> multiple occasions) where many 1000s of queries, each running less than >> 1s, are the actual cause of the problem. pgBadger is particularly helpful >> in tracking down situations like that. >> >> > On Thu, Jul 17, 2014 at 11:06 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> > >> > > Potentialtech <wmo...@potentialtech.com> writes: >> > > > If the warning isn't happening too often, I would try increasing it >> only >> > > a >> > > > little and see if it helps. If it's not enough you can then >> increase it >> > > some >> > > > more. Various sources around the Internet suggest that you don't >> want >> > > to go >> > > > much larger than 256 for this (if only because it's uncommon to do >> so >> > > and is >> > > > probably indicative of other tuning that you need to do). >> > > Unfortunatley, you >> > > > need to restart PG for the change to take effect, so you have to >> balance >> > > > experimenting with your tuning against how often you can get away >> with a >> > > server >> > > > restart. >> > > >> > > Huh? You don't need a restart, just a reload (SIGHUP) to change that. >> > > >> > > regards, tom lane >> > > >> >> >> -- >> Potentialtech <wmo...@potentialtech.com> >> > >