Re: unexpected EXPLAIN result with subselect

Michael Stassen Wed, 14 Sep 2005 08:23:46 -0700

Matthias Pigulla wrote:

Hi all,


can someone explain me why this query gets executed the way it is? I
simplified it as much as possible and think it's a conceptual/logical
thing, so I'll omit - at least for now - the table definitions and
sample data for brevity.

-- "Superquery"
EXPLAIN SELECT t1.id, t1.name
FROM document AS t1
WHERE t1.id
IN (
        -- "Subquery"
        SELECT DISTINCT data_id
        FROM wfd_reference
        WHERE wfd_field_id =699
        AND ref_data_id
        IN ( 171 )
)

This gives:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 277
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: wfd_reference
         type: ref
possible_keys: field_data,test
          key: field_data
      key_len: 4
          ref: const,func
         rows: 4
        Extra: Using where; Using temporary

However - why do we need the *dependent* subquery at all? The subquery
can be executed on its own, as it does not depend on any information of
the "superquery".

EXPLAIN SELECT DISTINCT data_id
FROM wfd_reference
WHERE wfd_field_id =699
AND ref_data_id

IN ( 171 )

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wfd_reference
         type: ref
possible_keys: field_data,test
          key: test
      key_len: 4
          ref: const,const
         rows: 9
        Extra: Using where; Using temporary

Now if I just take the result of this query, concat the data_ids on the
application level and build the superquery as follows:

-- "two-staged superquery variant"
EXPLAIN SELECT t1.id, t1.name
FROM document AS t1
WHERE t1.id
IN (
32, 31, 30, 53, 56, 57, 58, 59, 60, 111
)

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 10
        Extra: Using where

the results seem to be much better.

I was afraid of the dependent subquery for the "ALL" scan of t1, as t1
will become huge. OTOH, the subquery will be very restrictive: The
number of data_ids will always be very small, at least compared to the
number of rows in t1.

Are there any non-obvious reasons for the behaviour described above? Is
that something that cannot be optimized right now? Am I too
short-sighted with my "optimization" approach?

Thanks a lot,
Matthias

As others have already pointed out, mysql often doesn't optimize subqueriesproperly, and fixing that doesn't seem to be a high priority right now.Subqueries are relatively new in mysql, so it probably shouldn't besurprising that they aren't optimized as well as other things. Fortunately,most subqueries can be rewritten as joins<http://dev.mysql.com/doc/mysql/en/rewriting-subqueries.html>, which mysqldoes a good job of optimizing.


I believe your query is equivalent to

  SELECT DISTINCT t1.id, t1.name
  FROM document AS t1
  JOIN wfd_reference AS wfd ON t1.id = wfd.data_id
  WHERE wfd.wfd_field_id = 699
    AND wfd.ref_data_id IN (171);

which should perform better than the (incorrectly) dependent subquery,because it will be properly optimized. Mysql will use the 'test' index inwfd_reference to pick the rows which match the conditions, then use t1'sprimary key to get the corresponding rows from t1.

I notice the DISTINCT in your subquery. I take it that there may bemultiple rows in wfd_reference with the same data_id, wfd_field_id = 699,and ref_data_id IN (171)? (If not, just leave out the DISTINCT in theabove.) In that case, it may be possible that your 2-step approach will beeven faster than the JOIN with DISTINCT. You can do this directly in mysql,instead of in your app, by using a temporary table to store the inner queryresult. Something like


  CREATE TEMPORARY TABLE matches
    SELECT DISTINCT data_id
    FROM wfd_reference
    WHERE wfd_field_id = 699
      AND ref_data_id IN (171);

  SELECT t1.id, t1.name
  FROM document AS t1
  JOIN matches ON t1.id = matches.data_id;

  DROP TABLE matches;

Michael



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: unexpected EXPLAIN result with subselect

Reply via email to