[Launchpad-dev] 9 second hard timeout (no new timeout bugs)

Robert Collins Wed, 20 Apr 2011 20:23:56 -0700

Of course, we have some really hard pages to fix, and many more that
time out only a few times a day, but we've fixed enough pages to be
manageably down to a 9 second cap.


This doesn't mean we can stop working on timeouts :) - but it does
mean, at least for a while, that I won't be moving the hard timeout
value down if doing so would add new timeout bugs. It is time to
consolidate and focus on the second half of the stretch goal Francis
set: 9 second timeout + no critical bugs. (1/3 of the critical bugs
are timeouts.)

The longer term goal is still a 5 second timeout with 1 second 99th
percentile... and we had a discussion a few weeks back about setting
the timeout for *new* pages to 5 seconds straight away. That's still
not totally settled, but I think it's time we looked into how to make
that work. In the meantime the hard timeout default value can sit at
9 seconds. If we get to the point where it could be dropped another
second without adding critical bugs, I'll definitely do that - but
only if it won't be adding bugs :). (Dropping it provides a backstop
against misbehaving pages; it's important overall to get it low.)

The following pages have timeout exceptions at the moment:
hard_timeout    default                                                 0       9000
hard_timeout    pageid:BugTask:+create-question                         12      20000
hard_timeout    pageid:Distribution:+bugs                               4       10000
hard_timeout    pageid:Distribution:+bugtarget-portlet-tags-content     3       10000
hard_timeout    pageid:Distribution:EntryResource:searchTasks           5       10000
hard_timeout    pageid:Question:+index                                  18      11000
hard_timeout    pageid:RootObject:+login                                1       20000
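
As a rough illustration of how rules like these could be resolved for a
given page, here is a minimal Python sketch. It is not Launchpad's actual
feature-flag code. It assumes the four columns are flag name, scope,
priority and value, that the value is in milliseconds, that a scope
matches only on an exact pageid, and that the matching rule with the
highest priority wins. The names parse_rules and hard_timeout_ms are
hypothetical.

```python
# Rules copied from the list above. Column meaning (flag, scope, priority,
# value in milliseconds) is an assumption made for this sketch.
RULES_TEXT = """\
hard_timeout default 0 9000
hard_timeout pageid:BugTask:+create-question 12 20000
hard_timeout pageid:Distribution:+bugs 4 10000
hard_timeout pageid:Distribution:+bugtarget-portlet-tags-content 3 10000
hard_timeout pageid:Distribution:EntryResource:searchTasks 5 10000
hard_timeout pageid:Question:+index 18 11000
hard_timeout pageid:RootObject:+login 1 20000
"""


def parse_rules(text):
    """Parse whitespace-separated rule lines into (flag, scope, priority, value)."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        flag, scope, priority, value = parts
        rules.append((flag, scope, int(priority), int(value)))
    return rules


def hard_timeout_ms(rules, pageid):
    """Return the hard timeout for a pageid (hypothetical resolution logic).

    Assumes exact scope matching and highest-priority-wins; the 'default'
    scope is treated as always active.
    """
    active_scopes = {"default", "pageid:" + pageid}
    matches = [
        rule for rule in rules
        if rule[0] == "hard_timeout" and rule[1] in active_scopes
    ]
    if not matches:
        raise LookupError("no hard_timeout rule applies")
    # Pick the matching rule with the largest priority and return its value.
    return max(matches, key=lambda rule: rule[2])[3]


RULES = parse_rules(RULES_TEXT)
```

Under those assumptions, hard_timeout_ms(RULES, "Question:+index") picks
the page-specific rule, while a pageid with no exception of its own falls
back to the default rule.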

Question:+index because it takes a very long time before it does its
commit - even without mail spooling it's a slow page that doesn't
improve with retries.
Ditto BugTask:+create-question.
Distribution:+bugs because we have some difficult performance work to
do on search, and it's not inside the time frames suitable for
maintenance squads - at least, as assessed so far.
The tags portlet should be temporary until we deploy the bugsummary table.
pageid:Distribution:EntryResource:searchTasks seems to be driven by a
script - perhaps arsenal - but it's getting into offsets of *thousands*
in the DB: we really need to address the batching logic.
Finally, RootObject:+login is exempted because we're running into SSO
backend delays which we have little visibility into as a team - there
is a bug open on canonical-identity-provider about performance, and
I've volunteered our collective knowledge if the ISD team have any
trouble analysing how or why the thing is slow - we've all learnt a
lot about addressing performance in the last 10 months, so please feel
free to share :)

-Rob

