Thanks a lot Shawn. As always, your advice has been very helpful.
On 2/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> Scott Klarenbach <[EMAIL PROTECTED]> wrote on 02/02/2006 02:01:11 PM:
>
> > I have a table `requirement` which is left joined to a table `inventory`
> > based on a matching `partNumber` column. The inventory table has millions
> > of records; the requirement table has tens of thousands. I'm noticing that
> > the left join between requirement and inventory doesn't take advantage of
> > a LIMIT clause. Whether I select all records from requirement or limit it
> > to 50, the LEFT JOIN operation still seems to be calculated for ALL
> > requirement records against ALL inventory records. (The query takes the
> > exact same amount of time whether I pull 50 requirement records or 10,000.)
> >
> > How can I force MySQL to join the inventory table only for those 50
> > records brought back by the LIMIT clause?
> >
> > What I would do in a more powerful DB like SQL Server is build a temporary
> > table with my 50 requirement rows, and then perform the inventory join on
> > the temp table. But due to MySQL stored-procedure limitations (i.e., LIMIT
> > clauses must take integer constants, not parameters) and view limitations
> > (i.e., no indexing of views), I'd have to build this temporary table and
> > the rest of the query in PHP first, which is really ugly.
> >
> > I'm hoping there is a nice SQL trick I can use with MySQL to restrict the
> > join to only those records that would come back from the limit set.
> >
> > Thanks,
> > Scott Klarenbach
>
> Yes, and no. You cannot apply a LIMIT specifically to a JOIN clause unless
> you break your query into separate pieces and put limits on each of them.
> What happens during the normal execution of a query is that, after parsing
> and planning, the engine begins collecting and combining the source data.
> Which records are combined and matched against which others is defined in
> the FROM clause and all of the JOIN clauses.
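One SQL-only trick along these lines is to put the LIMIT inside a derived table in the FROM clause, so it is applied before the join rather than after it. This is only a sketch using the table and column names from the question; the `id` ordering column is an assumption:

```sql
-- Sketch: limit `requirement` first inside a derived table, then join.
-- Assumes an `id` column exists to give the LIMIT a deterministic order.
SELECT r.*, i.*
FROM (SELECT *
      FROM requirement
      ORDER BY id
      LIMIT 50) AS r
LEFT JOIN inventory AS i
       ON i.partNumber = r.partNumber;
```

Because the derived table holds only 50 rows, the LEFT JOIN probes inventory's `partNumber` index at most 50 times instead of once per requirement row.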
>
> The equivalent of a large virtual table (similar to saying "SELECT * FROM
> <all involved tables>") is created in memory. The only restrictions on which
> rows of data make it into this first processing stage come from the ON
> clauses (and any WHERE clauses the optimizer _may_ choose to include)
> defined between the JOINed tables. Next comes WHERE clause processing, then
> GROUP BY processing, HAVING processing, ORDER BY processing, and finally
> LIMIT processing.
>
> As you can see from the flow of query execution, LIMIT clauses are really
> only useful for restricting how much data is finally sent to the user. To
> minimize how much processing your CPU has to do to compute a particular
> query, you have several tools at your disposal: indexes, temporary tables,
> and stepwise result construction.
>
> JOINing tables is a geometrically expensive action. The number of potential
> row matches increases as the product of the number of rows in each table
> involved in the join. If you can preselect certain target rows from your
> really large tables into smaller temporary tables and build your final
> result set from them, the query processor will only need to compute a small
> fraction of the row comparisons it would have had to perform to JOIN your
> original tables. Take this rough math as an example:
>
> TABLE A: 10000 rows
> TABLE B: 10000 rows
>
> SELECT * FROM A INNER JOIN B ON A.id = B.A_ic;
>
> There are potentially 10000 x 10000 = 100000000 (1.0e+08) row combinations
> to be checked. If, instead of joining A to B, we create two derivative
> tables called C and D (assuming we don't change the column names):
>
> TABLE A -> TABLE C: 5000 rows
> TABLE B -> TABLE D: 1000 rows
>
> SELECT * FROM C INNER JOIN D ON C.id = D.A_ic;
>
> That means there are now 5000 x 1000 = 5000000 (5.0e+06) comparisons to
> run, or 1/20th the original number.
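The A/B example above, written out with MySQL temporary tables. The WHERE predicates here are hypothetical placeholders; substitute whatever actually narrows A and B in your case:

```sql
-- Sketch of the stepwise construction described above.
-- Filters are hypothetical; use real predicates with index coverage.
CREATE TEMPORARY TABLE C
  SELECT * FROM A WHERE created >= '2006-01-01';  -- hypothetical filter
CREATE TEMPORARY TABLE D
  SELECT * FROM B WHERE qty > 0;                  -- hypothetical filter

-- The join now compares 5000 x 1000 candidate rows instead of 10000 x 10000.
SELECT * FROM C INNER JOIN D ON C.id = D.A_ic;

DROP TEMPORARY TABLE C, D;
```

Temporary tables are private to the connection and vanish when it closes, so the explicit DROP is just tidy housekeeping.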
> Computing tables C and D should take linear or logarithmic time (because
> you should have good index coverage), so there will usually be a net gain
> in performance. This is the secret to stepwise result construction.
>
> To help you optimize your particular query, I would need to see it and the
> table definitions it is working against (SHOW CREATE TABLE works best for
> me).
>
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine
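For the archives, this temporary-table approach applied to the original requirement/inventory problem might look roughly like the following. Apart from `partNumber`, which comes from the question, the column names here are assumptions:

```sql
-- Sketch: preselect the 50 requirement rows into a temp table,
-- index the join column, then run the expensive join on the small set.
CREATE TEMPORARY TABLE req_page
  SELECT * FROM requirement ORDER BY id LIMIT 50;  -- `id` is assumed

-- Index the join column so the LEFT JOIN against inventory stays cheap.
ALTER TABLE req_page ADD INDEX (partNumber);

SELECT r.*, i.*
FROM req_page AS r
LEFT JOIN inventory AS i ON i.partNumber = r.partNumber;

DROP TEMPORARY TABLE req_page;
```

Since the LIMIT lives in ad-hoc SQL rather than a stored procedure, the "integer constant only" restriction mentioned in the question doesn't bite: the calling code can interpolate the page size before sending the statement.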