Thanks a lot Shawn. As always, your advice has been very helpful.
On 2/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> Scott Klarenbach <[EMAIL PROTECTED]> wrote on 02/02/2006 02:01:11 PM:
>
> > I have a table `requirement` which is left joined to a table `inventory`
> > based on a matching `partNumber` column. The inventory table has millions
> > of records; the requirement table has tens of thousands. I'm noticing that
> > the left join between requirement and inventory doesn't take advantage of
> > a LIMIT clause. Whether I select all records from requirement or limit it
> > to 50, the LEFT JOIN operation still seems to be calculated for ALL
> > requirement records against ALL inventory records. (The query takes the
> > exact same amount of time whether I pull 50 requirement records or 10,000.)
> >
> > How can I force MySQL to join the inventory table only for those 50
> > records brought back by the LIMIT clause?
> >
> > What I would do in a more powerful DB like SQL Server is build a temporary
> > table with my 50 requirement rows, and then perform the inventory join on
> > the temp table. But due to MySQL stored-procedure limitations (i.e., LIMIT
> > clauses must take integer constants, not parameters) and view limitations
> > (i.e., no indexing of views), I'd have to build this temporary table and
> > the rest of the query in PHP first, which is really ugly.
> >
> > I'm hoping there is a nice SQL trick I can use with MySQL to restrict the
> > join to only those records that would come back from the limit set.
> >
> > Thanks,
> > Scott Klarenbach
>
> Yes, and no. You cannot apply a LIMIT specifically to a JOIN clause unless
> you break your query into separate pieces and put limits on each of them.
> What happens during the normal execution of a query is that, after parsing
> and planning, the engine begins collecting and combining the source data.
> Which records are combined and matched against which others is defined in
> the FROM clause and all of the JOIN clauses.
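One SQL-only trick along these lines is to put the LIMIT inside a derived table in the FROM clause, so it is applied before the join rather than after it. This is only a sketch using the table and column names from the question; the `id` ordering column is an assumption:

```sql
-- Sketch: limit `requirement` first inside a derived table, then join.
-- Assumes an `id` column exists to give the LIMIT a deterministic order.
SELECT r.*, i.*
FROM (SELECT *
      FROM requirement
      ORDER BY id
      LIMIT 50) AS r
LEFT JOIN inventory AS i
       ON i.partNumber = r.partNumber;
```

Because the derived table holds only 50 rows, the LEFT JOIN probes inventory's `partNumber` index at most 50 times instead of once per requirement row.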
>
> The equivalent of a large virtual table (similar to saying "SELECT * FROM
> <all involved tables>") is created in memory. The only restrictions on which
> rows of data make it into this first processing stage come from the ON
> clauses (and any WHERE clauses the optimizer _may_ choose to include)
> defined between the JOINed tables. Next comes WHERE clause processing, then
> GROUP BY processing, HAVING processing, ORDER BY processing, and finally
> LIMIT processing.
>
> As you can see from the flow of query execution, LIMIT clauses are really
> only useful for restricting how much data is finally sent to the user. To
> minimize how much processing your CPU has to do to compute a particular
> query, you have several tools at your disposal: indexes, temporary tables,
> and stepwise result construction.
>
> JOINing tables is a geometrically expensive action. The number of potential
> row matches increases as the product of the number of rows in each table
> involved in the join. If you can preselect certain target rows from your
> really large tables into smaller temporary tables and build your final
> result set from them, the query processor will only need to compute a small
> fraction of the row comparisons it would have had to perform to JOIN your
> original tables. Take this rough math as an example:
>
> TABLE A: 10000 rows
> TABLE B: 10000 rows
>
> SELECT * FROM A INNER JOIN B ON A.id = B.A_ic;
>
> There are potentially 10000 x 10000 = 100000000 (1.0e+08) row combinations
> to be checked. If, instead of joining A to B, we create two derivative
> tables called C and D (assuming we don't change the column names):
>
> TABLE A -> TABLE C: 5000 rows
> TABLE B -> TABLE D: 1000 rows
>
> SELECT * FROM C INNER JOIN D ON C.id = D.A_ic;
>
> That means there are now 5000 x 1000 = 5000000 (5.0e+06) comparisons to
> run, or 1/20th the original number.
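The A/B example above, written out with MySQL temporary tables. The WHERE predicates here are hypothetical placeholders; substitute whatever actually narrows A and B in your case:

```sql
-- Sketch of the stepwise construction described above.
-- Filters are hypothetical; use real predicates with index coverage.
CREATE TEMPORARY TABLE C
  SELECT * FROM A WHERE created >= '2006-01-01';  -- hypothetical filter
CREATE TEMPORARY TABLE D
  SELECT * FROM B WHERE qty > 0;                  -- hypothetical filter

-- The join now compares 5000 x 1000 candidate rows instead of 10000 x 10000.
SELECT * FROM C INNER JOIN D ON C.id = D.A_ic;

DROP TEMPORARY TABLE C, D;
```

Temporary tables are private to the connection and vanish when it closes, so the explicit DROP is just tidy housekeeping.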
> Computing tables C and D should take linear or logarithmic time (because
> you should have good index coverage), so there will usually be a net gain
> in performance. This is the secret to stepwise result construction.
>
> To help you optimize your particular query, I would need to see it and the
> table definitions it is working against (SHOW CREATE TABLE works best for
> me).
>
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine
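For the archives, this temporary-table approach applied to the original requirement/inventory problem might look roughly like the following. Apart from `partNumber`, which comes from the question, the column names here are assumptions:

```sql
-- Sketch: preselect the 50 requirement rows into a temp table,
-- index the join column, then run the expensive join on the small set.
CREATE TEMPORARY TABLE req_page
  SELECT * FROM requirement ORDER BY id LIMIT 50;  -- `id` is assumed

-- Index the join column so the LEFT JOIN against inventory stays cheap.
ALTER TABLE req_page ADD INDEX (partNumber);

SELECT r.*, i.*
FROM req_page AS r
LEFT JOIN inventory AS i ON i.partNumber = r.partNumber;

DROP TEMPORARY TABLE req_page;
```

Since the LIMIT lives in ad-hoc SQL rather than a stored procedure, the "integer constant only" restriction mentioned in the question doesn't bite: the calling code can interpolate the page size before sending the statement.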