Matthias Pigulla wrote:
Hi all,

can someone explain me why this query gets executed the way it is? I
simplified it as much as possible and think it's a conceptual/logical
thing, so I'll omit - at least for now - the table definitions and
sample data for brevity.

-- "Superquery"
EXPLAIN SELECT t1.id, t1.name
FROM document AS t1
WHERE t1.id
IN (
        -- "Subquery"
        SELECT DISTINCT data_id
        FROM wfd_reference
        WHERE wfd_field_id =699
        AND ref_data_id
        IN ( 171 )
)

This gives:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 277
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: wfd_reference
         type: ref
possible_keys: field_data,test
          key: field_data
      key_len: 4
          ref: const,func
         rows: 4
        Extra: Using where; Using temporary

However - why do we need the *dependent* subquery at all? The subquery
can be executed on its own, as it does not depend on any information of
the "superquery".

EXPLAIN SELECT DISTINCT data_id
FROM wfd_reference
WHERE wfd_field_id =699
AND ref_data_id
IN ( 171 )
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wfd_reference
         type: ref
possible_keys: field_data,test
          key: test
      key_len: 4
          ref: const,const
         rows: 9
        Extra: Using where; Using temporary

Now if I just take the result of this query, concat the data_ids on the
application level and build the superquery as follows:

-- "two-staged superquery variant"
EXPLAIN SELECT t1.id, t1.name
FROM document AS t1
WHERE t1.id
IN (
32, 31, 30, 53, 56, 57, 58, 59, 60, 111
)

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 10
        Extra: Using where

the results seem to be much better.
I was afraid of the dependent subquery for the "ALL" scan of t1, as t1
will become huge. OTOH, the subquery will be very restrictive: The
number of data_ids will always be very small, at least compared to the
number of rows in t1.

Are there any non-obvious reasons for the behaviour described above? Is
that something that cannot be optimized right now? Am I too
short-sighted with my "optimization" approach?

Thanks a lot,
Matthias

As others have already pointed out, mysql often doesn't optimize subqueries properly, and fixing that doesn't seem to be a high priority right now. Subqueries are relatively new in mysql, so it probably shouldn't be surprising that they aren't optimized as well as other things. Fortunately, most subqueries can be rewritten as joins <http://dev.mysql.com/doc/mysql/en/rewriting-subqueries.html>, which mysql does a good job of optimizing.

I believe your query is equivalent to

  SELECT DISTINCT t1.id, t1.name
  FROM document AS t1
  JOIN wfd_reference AS wfd ON t1.id = wfd.data_id
  WHERE wfd.wfd_field_id = 699
    AND wfd.ref_data_id IN (171);

which should perform better than the (incorrectly) dependent subquery, because it will be properly optimized. Mysql will use the 'test' index in wfd_reference to pick the rows which match the conditions, then use t1's primary key to get the corresponding rows from t1.

I notice the DISTINCT in your subquery. I take it that there may be multiple rows in wfd_reference with the same data_id, wfd_field_id = 699, and ref_data_id IN (171)? (If not, just leave out the DISTINCT in the above.) In that case, it may be possible that your 2-step approach will be even faster than the JOIN with DISTINCT. You can do this directly in mysql, instead of in your app, by using a temporary table to store the inner query result. Something like

  CREATE TEMPORARY TABLE matches
    SELECT DISTINCT data_id
    FROM wfd_reference
    WHERE wfd_field_id = 699
      AND ref_data_id IN (171);

  SELECT t1.id, t1.name
  FROM document AS t1
  JOIN matches ON t1.id = matches.data_id;

  DROP TABLE matches;

Michael



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to