[HACKERS] 7.4beta2 vs 7.3.3

Gaetano Mendola Thu, 18 Sep 2003 12:45:35 -0700

Hi all,
my impression on 7.4 performances:

===== Good news =====

How expected the use of clausole 'IN' with Postgres 7.4beta2
now is really fast ( see the result below )

===== Bad news =====

I posted time ago about a slow query:

SELECT ul.*
FROM user_logs ul,
     user_data ud,
     class_default cd
WHERE ul.id_user = ud.id_user AND
      ud.id_class = cd.id_class AND
      cd.id_provider = 39;

these are the information about the tables involved:

user_logs:    ~1.5 Milion rows
user_data:    ~10000 rows
class_default ~100 rows

and I found also that was better do this query
in three steps:

SELECT id_class
FROM class_default
WHERE id_provider = 39;

SELECT id_user
FROM user_data
WHERE id_class IN ( 48 ); <= result of query above

SELECT *
FROM user_logs
WHERE id_user IN ( 11126, ...., 11769 );   <= 43 values result

This last query runs for 10.30 msec !!!

Here the comparison between Postgres7.3.3 and
Postgres7.4beta2 for that original query:

Postgres 7.3.3

test=# explain analyze select ul.*
test-# from user_logs ul,
test-#      user_data ud,
test-#      class_default cd
test-# where ul.id_user = ud.id_user and
test-#       ud.id_class = cd.id_class and
test-#       cd.id_provider = 39;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=325.79..36234.20 rows=41957 width=60) (actual time=6151.29..7022.29 rows=702 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Seq Scan on user_logs ul (cost=0.00..28251.30 rows=1426530 width=48) (actual time=0.02..5427.07 rows=1426530 loops=1) -> Hash (cost=324.97..324.97 rows=329 width=12) (actual time=320.97..320.97 rows=0 loops=1) -> Nested Loop (cost=0.00..324.97 rows=329 width=12) (actual time=0.24..320.89 rows=43 loops=1) -> Seq Scan on class_default cd (cost=0.00..1.43 rows=1 width=4) (actual time=0.05..0.07 rows=1 loops=1) Filter: (id_provider = 39) -> Index Scan using idx_user_data_class on user_data ud (cost=0.00..318.55 rows=400 width=8) (actual time=0.19..320.72 rows=43 loops=1) Index Cond: (ud.id_class = "outer".id_class) Total runtime: 7023.15 msec (10 rows)

Postgres 7.4beta2

test=# explain analyze select ul.*
test-# from user_logs ul,
test-#      user_data ud,
test-#      class_default cd
test-# where ul.id_user = ud.id_user and
test-#       ud.id_class = cd.id_class and
test-#       cd.id_provider = 39;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=322.94..36127.70 rows=42081 width=47) (actual time=2629.84..2990.00 rows=702 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Seq Scan on user_logs ul (cost=0.00..28251.30 rows=1426530 width=47) (actual time=0.03..1738.65 rows=1426530 loops=1) -> Hash (cost=322.12..322.12 rows=330 width=4) (actual time=0.78..0.78 rows=0 loops=1) -> Nested Loop (cost=0.00..322.12 rows=330 width=4) (actual time=0.19..0.71 rows=43 loops=1) -> Seq Scan on class_default cd (cost=0.00..1.43 rows=1 width=4) (actual time=0.07..0.08 rows=1 loops=1) Filter: (id_provider = 39) -> Index Scan using idx_user_data_class on user_data ud (cost=0.00..315.87 rows=386 width=8) (actual time=0.11..0.54 rows=43 loops=1) Index Cond: (ud.id_class = "outer".id_class) Total runtime: 2990.70 msec (10 rows)

The performance are really improved but not yet 10 msecs.


I tried nesting the 3 queries that I was speak about,
I did it only with 7.4 due the fact with 7.3.3 I never
seen the result.

test=# EXPLAIN ANALYZE SELECT *
test-# FROM user_logs
test-# WHERE id_user in (
test(#    SELECT id_user
test(#    FROM user_data
test(#    WHERE id_class in (
test(#        SELECT id_class FROM class_default WHERE id_provider = 39
test(#    )
test(# );

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------- Hash IN Join (cost=322.94..36127.70 rows=42081 width=47) (actual time=2626.20..2985.53 rows=702 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Seq Scan on user_logs (cost=0.00..28251.30 rows=1426530 width=47) (actual time=0.03..1731.59 rows=1426530 loops=1) -> Hash (cost=322.12..322.12 rows=330 width=4) (actual time=0.80..0.80 rows=0 loops=1) -> Nested Loop (cost=1.43..322.12 rows=330 width=4) (actual time=0.22..0.72 rows=43 loops=1) -> HashAggregate (cost=1.43..1.43 rows=1 width=4) (actual time=0.10..0.10 rows=1 loops=1) -> Seq Scan on class_default (cost=0.00..1.43 rows=1 width=4) (actual time=0.07..0.08 rows=1 loops=1) Filter: (id_provider = 39) -> Index Scan using idx_user_data_class on user_data (cost=0.00..315.87 rows=386 width=8) (actual time=0.11..0.54 rows=43 loops=1) Index Cond: (user_data.id_class = "outer".id_class) Total runtime: 2986.33 msec (11 rows)


How you can see with 7.4 the two queries ( the original with the join
and with the IN nested ) are performing at the same way.

Regards
Gaetano Mendola

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
     joining column's datatypes do not match

[HACKERS] 7.4beta2 vs 7.3.3

Reply via email to