Re: Same condition in the CTE and in the subsequent JOIN using it

Alban Hertroys Wed, 25 Apr 2018 13:04:28 -0700

> On 25 Apr 2018, at 17:45, Alexander Farber <[email protected]> wrote:


(…)

> And here is the function source code:
> 
> CREATE OR REPLACE FUNCTION words_stat_scores(
>                 in_social integer,
>                 in_sid    text
>         ) RETURNS TABLE (
>                 out_day   text,
>                 out_diff  numeric,
>                 out_score numeric
>         ) AS
> $func$
>         WITH cte AS (
>                 SELECT
>                         DATE_TRUNC('day', m.played) AS day,
>                         m.mid,
>                         EXTRACT(EPOCH FROM m.played - LAG(m.played) OVER 
> (PARTITION BY m.gid ORDER BY m.played))::int/60 AS diff
>                 FROM    words_moves m
>                 JOIN    words_games g ON (m.gid = g.gid)
>                 JOIN    words_social s ON (s.uid IN (g.player1, g.player2))
>                 WHERE   s.social = in_social                           -- CAN 
> THIS BE REFERRED TO FROM BELOW?
>                 AND     s.sid = in_sid
>                 AND     m.played > CURRENT_TIMESTAMP - interval '1 month'
>         )
>         SELECT
>                 TO_CHAR(c.day, 'DD.MM.YYYY'),
>                 ROUND(AVG(c.diff)),
>                 ROUND(AVG(m.score), 1)
>         FROM    words_moves m
>         JOIN    cte c using(mid)
>         JOIN    words_social s USING(uid)
>         WHERE   s.social = in_social
>         AND     s.sid = in_sid
>         AND     m.action = 'play'
>         GROUP BY c.day
>         ORDER BY c.day;
> 
> $func$ LANGUAGE sql STABLE;
> 
> By looking at the above source code, do you think, that the condition being 
> used twice (the s.social = in_social AND s.sid = in_sid) is "too much" and 
> can be optimized? :-)

Actually, no. The conditions are part of different joins.

Within the CTE, you have a join that boils down to:

>   FROM    words_games g ON (m.gid = g.gid)
>   JOIN    words_social s ON (s.uid IN (g.player1, g.player2) AND s.social = 
> in_social AND s.sid = in_sid)


In your outer query, you have:

>   FROM    words_moves m
>   JOIN    words_social s ON (s.uid = m.uid AND s.social = in_social AND s.sid 
> = in_sid)


The joins are on different fields, in different tables even, so you can't just 
leave the conditions out because they filter different rows.

What you _can_ do is move the words_social JOIN and it's conditions into a new 
CTE and join with that instead.  Something like so:

WITH words_in_social AS (
        SELECT sid, uid
          FROM words_social
         WHERE social = in_social
           AND sid = in_sid
),
cte AS (
        SELECT
                DATE_TRUNC('day', m.played) AS day,
                m.mid,
                EXTRACT(EPOCH FROM m.played - LAG(m.played) OVER (PARTITION BY 
m.gid ORDER BY m.played))::int/60 AS diff
          FROM    words_moves m
          JOIN    words_games g ON (m.gid = g.gid)
          JOIN    words_in_social s ON (s.uid IN (g.player1, g.player2))
          WHERE   m.played > CURRENT_TIMESTAMP - interval '1
)
SELECT
        TO_CHAR(c.day, 'DD.MM.YYYY'),
        ROUND(AVG(c.diff)),
        ROUND(AVG(m.score), 1)
  FROM words_moves m
  JOIN cte c using(mid)
  JOIN words_in_social s USING(uid)
 WHERE m.action = 'play'
 GROUP BY c.day
 ORDER BY c.day;

That may look cleaner, but your original query probably performs better, since 
CTE's also act as an optimisation fence.

BTW, I suggest to use a better name for your CTE than cte; I'd rather use a 
name that clarifies its purpose.

> Thank you for any hints, I apologize if my question is too specific and 
> difficult to answer...
> 
> Regards
> Alex

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Re: Same condition in the CTE and in the subsequent JOIN using it

Reply via email to