Re: [GENERAL] Checkpoint_segments optimal value

Prabhjot Sheena Thu, 17 Jul 2014 16:41:34 -0700

even though the explain plan suggests differently but its taking long long
time




On Thu, Jul 17, 2014 at 4:26 PM, Prabhjot Sheena <
prabhjot.she...@rivalwatch.com> wrote:

> Here is the explain plan. There query time went backup up to 2 to 3
> minutes from 3 second  in just 2 hrs.  Can anyone suggest something on how
> to fix this or why this is happening
>
>  explain SELECT account.id, account.organization_id, run.application_id,
> work_unit.script, work_unit.id, work_unit.start_time, run.id,
> work_unit.priority  FROM work_unit, run, account  WHERE  work_unit.status =
> 3 AND work_unit.run_id = run.id AND work_unit.type != 1 AND run.status =
> 1 AND run.account_id = account.id;
>                                              QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..24.84 rows=1 width=64)
>    Join Filter: (run.id = work_unit.run_id)
>    ->  Nested Loop  (cost=0.00..16.55 rows=1 width=16)
>          ->  Index Scan using run_status_1 on run  (cost=0.00..8.27 rows=1
> width=12)
>                Index Cond: (status = 1)
>          ->  Index Scan using account_pkey on account  (cost=0.00..8.27
> rows=1 width=8)
>                Index Cond: (account.id = run.account_id)
>    ->  Index Scan using idx_work_unit_1_status_3_new on work_unit
> (cost=0.00..8.28 rows=1 width=52)
>          Index Cond: (work_unit.status = 3)
>
> Table Structure
>
>  \d work_unit
>                Table "public.work_unit"
>    Column   |            Type             | Modifiers
> ------------+-----------------------------+-----------
>  id         | integer                     | not null
>  run_id     | integer                     |
>  status     | integer                     |
>  script     | character varying           |
>  parameters | character varying           |
>  input      | character varying           |
>  start_time | timestamp without time zone |
>  stop_time  | timestamp without time zone |
>  priority   | integer                     |
>  type       | integer                     |
>  lineitems  | integer                     |
>  outputs    | integer                     |
>  sub_type   | integer                     |
>  site_code  | text                        |
> Indexes:
>     "work_unit_pkey" PRIMARY KEY, btree (id)
>     "idx_work_unit_1_partial" btree (run_id, start_time) WHERE status >= 3
>     "idx_work_unit_1_run_id" btree (run_id)
>     "idx_work_unit_1_script" btree (script)
>     "idx_work_unit_1_site_code" btree (site_code)
>     "idx_work_unit_1_starttime" btree (start_time)
>     "idx_work_unit_1_status_3_new" btree (status, type) WHERE status = 3
> AND type <> 1
>     "idx_work_unit_1_status_5" btree (status) WHERE status >= 4
>     "idx_work_unit_1_status_part_new" btree (status) WHERE status < 4
>     "idx_work_unit_1_stop_time" btree (stop_time)
>
>
>  \d run
>                        Table "public.run"
>        Column        |           Type           |   Modifiers
> ---------------------+--------------------------+---------------
>  id                  | integer                  | not null
>  job_id              | integer                  |
>  start_time          | timestamp with time zone | not null
>  status              | integer                  | not null
>  job_name            | character varying        |
>  account_id          | integer                  |
>  application_id      | integer                  |
>  postproc_script     | character varying        |
>  upload_url          | character varying        |
>  complete_time       | timestamp with time zone |
>  email               | character varying        |
>  size                | integer                  |
>  errors              | integer                  |
>  raw_count           | integer                  |
>  munge               | integer                  |
>  job_details         | character varying        |
>  user_settings       | character varying        |
>  run_type            | integer                  |
>  stop_after          | timestamp with time zone |
>  total_work_time     | integer                  |
>  flags               | integer                  |
>  cluster_info_id     | integer                  |
>  skiplineitems_count | integer                  |
>  last_update_ts      | timestamp with time zone |
>  result_exists       | boolean                  | default false
>  abort_type          | integer                  |
> Indexes:
>     "run_pkey" PRIMARY KEY, btree (id)
>     "idx_run_acc_stat_comp" btree (account_id, status, complete_time)
>     "idx_run_app" btree (application_id)
>     "idx_run_complete_time_2" btree (complete_time)
>     "idx_run_job_name" btree (job_name)
>     "run_application_account_idx" btree (application_id, account_id,
> status)
>     "run_job" btree (job_id)
>     "run_result_exists" btree (result_exists)
>     "run_start_time" btree (start_time)
>     "run_status" btree (status)
>     "run_status_1" btree (status) WHERE status = 1
>     "run_status_part_idx" btree (status) WHERE status < 3
> Triggers:
>     run_upd_ts BEFORE INSERT OR UPDATE ON run FOR EACH ROW EXECUTE
> PROCEDURE set_last_update_ts()
>     tr_proc_update_job_summary AFTER UPDATE ON run FOR EACH ROW EXECUTE
> PROCEDURE tr_proc_update_job_summary()
>
>
>
>
> On Thu, Jul 17, 2014 at 12:00 PM, Potentialtech <wmo...@potentialtech.com>
> wrote:
>
>> On Thu, 17 Jul 2014 11:28:04 -0700
>> Prabhjot Sheena <prabhjot.she...@rivalwatch.com> wrote:
>>
>> > Thanks i have changed that to 64 and reloaded it.
>> >
>> > When i had load issue today there was this exact same query that hits
>> the
>> > db like 50 to 60 times from different machines in 3 to 4 minutes and was
>> > taking long time to execute and was holding up the database. i did
>> recreate
>> > an index and it started performing better. My question is why it is not
>> > fetching the result from the memory since  its the same query that runs
>> > again and again.
>> >
>> > This is the actual query i m taking about:
>> >
>> > SELECT account.id, account.organization_id, run.application_id,
>> > work_unit.script, work_unit.id, work_unit.start_time, run.id,
>> > work_unit.priority  FROM work_unit, run, account  WHERE
>>  work_unit.status =
>> > 3 AND work_unit.run_id = run.id AND work_unit.type != 1 AND run.status
>> = 1
>> > AND run.account_id = account.id
>> >
>> > Pls suggest if i can do something to fix this
>>
>> Provide a lot more information if you want anyone on the list to be able
>> to help: such as explain output while the problem is happening, and some
>> information about the makeup of the tables (column types/indexes/# rows).
>>
>> Guessing, based on the little information you've provided, it's likely
>> that you have something else going on at the same time that you're not
>> aware of, and this particular query is only a symptom.  I'm saying that
>> because SELECTs don't generally create any WAL traffic, so there were
>> probably some INSERT/UPDATE/DELETE running at the same time that both
>> pushed those 3 tables out of memory and/or saturated disk activity to
>> the point that accessing everything becomes slow for a short while, and
>> it's just those queries that you noticed.
>>
>> Are you making the mistake where you set log_min_duration to 1s and only
>> worry about queries that take longer than 1s?  Because I've seen (on
>> multiple occasions) where many 1000s of queries, each running less than
>> 1s, are the actual cause of the problem.  pgBadger is particularly helpful
>> in tracking down situations like that.
>>
>> > On Thu, Jul 17, 2014 at 11:06 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> >
>> > > Potentialtech <wmo...@potentialtech.com> writes:
>> > > > If the warning isn't happening too often, I would try increasing it
>> only
>> > > a
>> > > > little and see if it helps.  If it's not enough you can then
>> increase it
>> > > some
>> > > > more.  Various sources around the Internet suggest that you don't
>> want
>> > > to go
>> > > > much larger than 256 for this (if only because it's uncommon to do
>> so
>> > > and is
>> > > > probably indicative of other tuning that you need to do).
>> > >  Unfortunatley, you
>> > > > need to restart PG for the change to take effect, so you have to
>> balance
>> > > > experimenting with your tuning against how often you can get away
>> with a
>> > > server
>> > > > restart.
>> > >
>> > > Huh?  You don't need a restart, just a reload (SIGHUP) to change that.
>> > >
>> > >                         regards, tom lane
>> > >
>>
>>
>> --
>> Potentialtech <wmo...@potentialtech.com>
>>
>
>

Re: [GENERAL] Checkpoint_segments optimal value

Reply via email to