Hello Lucio,

(reply below)
Lucio Chiappetti wrote:
I have some queries, involving a largish number of JOIN, which are apparently very slow or even take forever (a mysqladmin processlist shows them remain in the "statistics" status for a long time, in most cases I have to kill them after several minutes).

When I first had the problem I googled around and found some reference (which I've lost) saying that the "statistics" status is actually what one does with an EXPLAIN SELECT, and that this is done preliminarily to the actual query. It also said it might occur with a large number of joins because this analysis, for n joins MIGHT try up to n! combinations, unless one somehow specified the priorities (but the author did not remember how).


You can find those explanations in our manual at http://dev.mysql.com/doc/refman/5.0/en/thread-information.html


I thought to have overcome the problem using a feature of the CREATE VIEW command (see below), but apparently I simply moved it to an higher n.

Now I tried to see how it scales with the number of joins, and the curious things is that e.g. for n=9 it works fast, for n=15 it works slowly, for n=18 works fast again and for n=20 takes an infinite time.

I'll first explain my background :

 - I have a number of tables (let's call them t0, t1, t2 ...)
 - all of them have an auto_increment column called seq which is also
   an index

 - one table (t0) is "more important" (actually it is list of
   celestial X-ray sources while the other are celestial objects
   in other wavebands but this is irrelevant to you).

 - I have precomputed correlation tables among t0 and each of
   the other. These tables are called eg. t0ti, have two columns
   t0 and ti corresponding to the t0.seq and ti.seq of objects which
   are "associated". They are indexed on (t0,ti). Note that an
   object in t0 can be associated with 1 or more or zero (t0ti.ti null)
   objects in ti.

- I originally (already under mysql 3) devised a way to identify counterparts in MORE tables (all these associations are based on spherical distance + other criteria). This involved creating a
   working table G

   This table has columns named t0 t1 ... tn (containing the pointers
t0.seq t1.seq ... for counterparts associated and validated according to some criteria) plus other service columns

   The simultaneous access was achieved in our interface by a mechanism
   we called virtual tables, which essentially was

     SELECT
      some subset of columns in some of the t0...tn
      or some expression thereof
     FROM
      G left join t0 on G.t0=t0.seq
        left join t1 on G.t1=t1.seq
        ...
left join tn on G.tn=tn.seq
   We refer to the t0...tn as "the member tables" of G.


I normally do not get lost in symbolic descriptions such as yours. However your description of G and how you build it leaves me in the dark.

Can you show me a few sample rows of G (symbolically, if you like) and describe what is in each column G.t0 to G.tn? What I am curious to know is what do each of these n object have in common that allows them to represented as a single tuple on the G table.

You also mention other "service columns". What kinds of information are you keeping in those?



   We have different versions of G corresponding to different sets
   of member tables and different association criteria.

   The largest of our cases has 26 different members.
<snip>
 - for each G we define a view as


"Each G" ? Again, that makes the concept of what a G really is more confusing to me. I understand databases and I know more than a little about stellar cartography, cosmology, and physics. Please don't hold back.


<snipped (to be revisited later)>

Questions :

 - what does "explain select" actually do and why sometimes hangs ?

 - can this be overcome rearranging the order of the joins (note that
   the bulk of the members are all joined with G), or introducing
   parentheses or with other syntax changes ?

 - or has it to do with some configuration parameter, maybe related
   to what is cached, cache size or other ?
Thanks in advance to whoever is able to give hints.


EXPLAIN SELECT simply stops a normal SELECT statement from actually performing the data retrieval steps and shows us (the users) a description of the techniques the query engine was about to use to get at the data. That means that the steps of parsing the query, tokenizing the symbols, and optimizing the execution plan still take place. It is during this optimization phase that most of your CPU time is being used as the engine will work many permutations of joining one table to another until it reaches a decision about which plan is "less expensive" that all of the others.

http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
http://dev.mysql.com/doc/refman/5.0/en/left-join-optimization.html
and the rest of the optimization chapter describe this process in considerable details
http://dev.mysql.com/doc/refman/5.0/en/query-speed.html

Please do explain the data problems you are trying to solve in some more detail as I may be able to help you to design a less cumbersome method of achieving the same goals.

--
Shawn Green, Support Engineer
MySQL Inc., USA, www.mysql.com
Office: Blountville, TN
    __  ___     ___ ____  __
   /  |/  /_ __/ __/ __ \/ /
  / /|_/ / // /\ \/ /_/ / /__
 /_/  /_/\_, /___/\___\_\___/
        <___/
 Join the Quality Contribution Program Today!
 http://dev.mysql.com/qualitycontribution.html

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to