Hello Lucio,
(reply below)
Lucio Chiappetti wrote:
I have some queries, involving a largish number of JOIN, which
are apparently very slow or even take forever (a mysqladmin processlist
shows them remain in the "statistics" status for a long time, in most
cases I have to kill them after several minutes).
When I first had the problem I googled around and found some reference
(which I've lost) saying that the "statistics" status is actually what one
does with an EXPLAIN SELECT, and that this is done preliminarily to the
actual query. It also said it might occur with a large number of joins
because this analysis, for n joins MIGHT try up to n! combinations, unless
one somehow specified the priorities (but the author did not remember
how).
You can find those explanations in our manual at
http://dev.mysql.com/doc/refman/5.0/en/thread-information.html
I thought to have overcome the problem using a feature of the CREATE
VIEW command (see below), but apparently I simply moved it to an higher n.
Now I tried to see how it scales with the number of joins, and the curious
things is that e.g. for n=9 it works fast, for n=15 it works slowly, for
n=18 works fast again and for n=20 takes an infinite time.
I'll first explain my background :
- I have a number of tables (let's call them t0, t1, t2 ...)
- all of them have an auto_increment column called seq which is also
an index
- one table (t0) is "more important" (actually it is list of
celestial X-ray sources while the other are celestial objects
in other wavebands but this is irrelevant to you).
- I have precomputed correlation tables among t0 and each of
the other. These tables are called eg. t0ti, have two columns
t0 and ti corresponding to the t0.seq and ti.seq of objects which
are "associated". They are indexed on (t0,ti). Note that an
object in t0 can be associated with 1 or more or zero (t0ti.ti null)
objects in ti.
- I originally (already under mysql 3) devised a way to identify
counterparts in MORE tables (all these associations are based on
spherical distance + other criteria). This involved creating a
working table G
This table has columns named t0 t1 ... tn (containing the pointers
t0.seq t1.seq ... for counterparts associated and validated according
to some criteria) plus other service columns
The simultaneous access was achieved in our interface by a mechanism
we called virtual tables, which essentially was
SELECT
some subset of columns in some of the t0...tn
or some expression thereof
FROM
G left join t0 on G.t0=t0.seq
left join t1 on G.t1=t1.seq
...
left join tn on G.tn=tn.seq
We refer to the t0...tn as "the member tables" of G.
I normally do not get lost in symbolic descriptions such as yours.
However your description of G and how you build it leaves me in the dark.
Can you show me a few sample rows of G (symbolically, if you like) and
describe what is in each column G.t0 to G.tn? What I am curious to know
is what do each of these n object have in common that allows them to
represented as a single tuple on the G table.
You also mention other "service columns". What kinds of information are
you keeping in those?
We have different versions of G corresponding to different sets
of member tables and different association criteria.
The largest of our cases has 26 different members.
<snip>
- for each G we define a view as
"Each G" ? Again, that makes the concept of what a G really is more
confusing to me. I understand databases and I know more than a little
about stellar cartography, cosmology, and physics. Please don't hold back.
<snipped (to be revisited later)>
Questions :
- what does "explain select" actually do and why sometimes hangs ?
- can this be overcome rearranging the order of the joins (note that
the bulk of the members are all joined with G), or introducing
parentheses or with other syntax changes ?
- or has it to do with some configuration parameter, maybe related
to what is cached, cache size or other ?
Thanks in advance to whoever is able to give hints.
EXPLAIN SELECT simply stops a normal SELECT statement from actually
performing the data retrieval steps and shows us (the users) a
description of the techniques the query engine was about to use to get
at the data. That means that the steps of parsing the query, tokenizing
the symbols, and optimizing the execution plan still take place. It is
during this optimization phase that most of your CPU time is being used
as the engine will work many permutations of joining one table to
another until it reaches a decision about which plan is "less expensive"
that all of the others.
http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
http://dev.mysql.com/doc/refman/5.0/en/left-join-optimization.html
and the rest of the optimization chapter describe this process in
considerable details
http://dev.mysql.com/doc/refman/5.0/en/query-speed.html
Please do explain the data problems you are trying to solve in some more
detail as I may be able to help you to design a less cumbersome method
of achieving the same goals.
--
Shawn Green, Support Engineer
MySQL Inc., USA, www.mysql.com
Office: Blountville, TN
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ /
/ /|_/ / // /\ \/ /_/ / /__
/_/ /_/\_, /___/\___\_\___/
<___/
Join the Quality Contribution Program Today!
http://dev.mysql.com/qualitycontribution.html
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]