Re: excessive time spent in "statistics" status

Shawn Green Sat, 01 Sep 2007 08:22:41 -0700

Hello Lucio,

(reply below)
Lucio Chiappetti wrote:

I have some queries, involving a largish number of JOIN, whichare apparently very slow or even take forever (a mysqladmin processlistshows them remain in the "statistics" status for a long time, in mostcases I have to kill them after several minutes).
When I first had the problem I googled around and found some reference(which I've lost) saying that the "statistics" status is actually what onedoes with an EXPLAIN SELECT, and that this is done preliminarily to theactual query. It also said it might occur with a large number of joinsbecause this analysis, for n joins MIGHT try up to n! combinations, unlessone somehow specified the priorities (but the author did not rememberhow).

You can find those explanations in our manual athttp://dev.mysql.com/doc/refman/5.0/en/thread-information.html

I thought to have overcome the problem using a feature of the CREATEVIEW command (see below), but apparently I simply moved it to an higher n.
Now I tried to see how it scales with the number of joins, and the curiousthings is that e.g. for n=9 it works fast, for n=15 it works slowly, forn=18 works fast again and for n=20 takes an infinite time.
I'll first explain my background :

 - I have a number of tables (let's call them t0, t1, t2 ...)
 - all of them have an auto_increment column called seq which is also
   an index

 - one table (t0) is "more important" (actually it is list of
   celestial X-ray sources while the other are celestial objects
   in other wavebands but this is irrelevant to you).

 - I have precomputed correlation tables among t0 and each of
   the other. These tables are called eg. t0ti, have two columns
   t0 and ti corresponding to the t0.seq and ti.seq of objects which
   are "associated". They are indexed on (t0,ti). Note that an
   object in t0 can be associated with 1 or more or zero (t0ti.ti null)
   objects in ti.
- I originally (already under mysql 3) devised a way to identifycounterparts in MORE tables (all these associations are based onspherical distance + other criteria). This involved creating a
   working table G

   This table has columns named t0 t1 ... tn (containing the pointers
t0.seq t1.seq ... for counterparts associated and validated accordingto some criteria) plus other service columns
   The simultaneous access was achieved in our interface by a mechanism
   we called virtual tables, which essentially was

     SELECT
      some subset of columns in some of the t0...tn
      or some expression thereof
     FROM
      G left join t0 on G.t0=t0.seq
        left join t1 on G.t1=t1.seq
        ...
left join tn on G.tn=tn.seq
   We refer to the t0...tn as "the member tables" of G.

I normally do not get lost in symbolic descriptions such as yours.However your description of G and how you build it leaves me in the dark.

Can you show me a few sample rows of G (symbolically, if you like) anddescribe what is in each column G.t0 to G.tn? What I am curious to knowis what do each of these n object have in common that allows them torepresented as a single tuple on the G table.

You also mention other "service columns". What kinds of information areyou keeping in those?


   We have different versions of G corresponding to different sets
   of member tables and different association criteria.

   The largest of our cases has 26 different members.
<snip>
 - for each G we define a view as

"Each G" ? Again, that makes the concept of what a G really is moreconfusing to me. I understand databases and I know more than a littleabout stellar cartography, cosmology, and physics. Please don't hold back.

<snipped (to be revisited later)>

Questions :

 - what does "explain select" actually do and why sometimes hangs ?

 - can this be overcome rearranging the order of the joins (note that
   the bulk of the members are all joined with G), or introducing
   parentheses or with other syntax changes ?

 - or has it to do with some configuration parameter, maybe related
   to what is cached, cache size or other ?

Thanks in advance to whoever is able to give hints.

EXPLAIN SELECT simply stops a normal SELECT statement from actuallyperforming the data retrieval steps and shows us (the users) adescription of the techniques the query engine was about to use to getat the data. That means that the steps of parsing the query, tokenizingthe symbols, and optimizing the execution plan still take place. It isduring this optimization phase that most of your CPU time is being usedas the engine will work many permutations of joining one table toanother until it reaches a decision about which plan is "less expensive"that all of the others.


http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
http://dev.mysql.com/doc/refman/5.0/en/left-join-optimization.html

and the rest of the optimization chapter describe this process inconsiderable details

http://dev.mysql.com/doc/refman/5.0/en/query-speed.html

Please do explain the data problems you are trying to solve in some moredetail as I may be able to help you to design a less cumbersome methodof achieving the same goals.


--
Shawn Green, Support Engineer
MySQL Inc., USA, www.mysql.com
Office: Blountville, TN
    __  ___     ___ ____  __
   /  |/  /_ __/ __/ __ \/ /
  / /|_/ / // /\ \/ /_/ / /__
 /_/  /_/\_, /___/\___\_\___/
        <___/
 Join the Quality Contribution Program Today!
 http://dev.mysql.com/qualitycontribution.html

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: excessive time spent in "statistics" status

Reply via email to