Hi all, my impression on 7.4 performances:
===== Good news =====
How expected the use of clausole 'IN' with Postgres 7.4beta2 now is really fast ( see the result below )
===== Bad news =====
I posted time ago about a slow query:
SELECT ul.* FROM user_logs ul, user_data ud, class_default cd WHERE ul.id_user = ud.id_user AND ud.id_class = cd.id_class AND cd.id_provider = 39;
these are the information about the tables involved:
user_logs: ~1.5 Milion rows user_data: ~10000 rows class_default ~100 rows
and I found also that was better do this query in three steps:
SELECT id_class FROM class_default WHERE id_provider = 39;
SELECT id_user FROM user_data WHERE id_class IN ( 48 ); <= result of query above
SELECT * FROM user_logs WHERE id_user IN ( 11126, ...., 11769 ); <= 43 values result
This last query runs for 10.30 msec !!!
Here the comparison between Postgres7.3.3 and Postgres7.4beta2 for that original query:
Postgres 7.3.3
test=# explain analyze select ul.* test-# from user_logs ul, test-# user_data ud, test-# class_default cd test-# where ul.id_user = ud.id_user and test-# ud.id_class = cd.id_class and test-# cd.id_provider = 39;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=325.79..36234.20 rows=41957 width=60) (actual time=6151.29..7022.29 rows=702 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Seq Scan on user_logs ul (cost=0.00..28251.30 rows=1426530 width=48) (actual time=0.02..5427.07 rows=1426530 loops=1)
-> Hash (cost=324.97..324.97 rows=329 width=12) (actual time=320.97..320.97 rows=0 loops=1)
-> Nested Loop (cost=0.00..324.97 rows=329 width=12) (actual time=0.24..320.89 rows=43 loops=1)
-> Seq Scan on class_default cd (cost=0.00..1.43 rows=1 width=4) (actual time=0.05..0.07 rows=1 loops=1)
Filter: (id_provider = 39)
-> Index Scan using idx_user_data_class on user_data ud (cost=0.00..318.55 rows=400 width=8) (actual time=0.19..320.72 rows=43 loops=1)
Index Cond: (ud.id_class = "outer".id_class)
Total runtime: 7023.15 msec
(10 rows)
Postgres 7.4beta2
test=# explain analyze select ul.* test-# from user_logs ul, test-# user_data ud, test-# class_default cd test-# where ul.id_user = ud.id_user and test-# ud.id_class = cd.id_class and test-# cd.id_provider = 39;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=322.94..36127.70 rows=42081 width=47) (actual time=2629.84..2990.00 rows=702 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Seq Scan on user_logs ul (cost=0.00..28251.30 rows=1426530 width=47) (actual time=0.03..1738.65 rows=1426530 loops=1)
-> Hash (cost=322.12..322.12 rows=330 width=4) (actual time=0.78..0.78 rows=0 loops=1)
-> Nested Loop (cost=0.00..322.12 rows=330 width=4) (actual time=0.19..0.71 rows=43 loops=1)
-> Seq Scan on class_default cd (cost=0.00..1.43 rows=1 width=4) (actual time=0.07..0.08 rows=1 loops=1)
Filter: (id_provider = 39)
-> Index Scan using idx_user_data_class on user_data ud (cost=0.00..315.87 rows=386 width=8) (actual time=0.11..0.54 rows=43 loops=1)
Index Cond: (ud.id_class = "outer".id_class)
Total runtime: 2990.70 msec
(10 rows)
The performance are really improved but not yet 10 msecs.
I tried nesting the 3 queries that I was speak about, I did it only with 7.4 due the fact with 7.3.3 I never seen the result.
test=# EXPLAIN ANALYZE SELECT * test-# FROM user_logs test-# WHERE id_user in ( test(# SELECT id_user test(# FROM user_data test(# WHERE id_class in ( test(# SELECT id_class FROM class_default WHERE id_provider = 39 test(# ) test(# );
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Hash IN Join (cost=322.94..36127.70 rows=42081 width=47) (actual time=2626.20..2985.53 rows=702 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Seq Scan on user_logs (cost=0.00..28251.30 rows=1426530 width=47) (actual time=0.03..1731.59 rows=1426530 loops=1)
-> Hash (cost=322.12..322.12 rows=330 width=4) (actual time=0.80..0.80 rows=0 loops=1)
-> Nested Loop (cost=1.43..322.12 rows=330 width=4) (actual time=0.22..0.72 rows=43 loops=1)
-> HashAggregate (cost=1.43..1.43 rows=1 width=4) (actual time=0.10..0.10 rows=1 loops=1)
-> Seq Scan on class_default (cost=0.00..1.43 rows=1 width=4) (actual time=0.07..0.08 rows=1 loops=1)
Filter: (id_provider = 39)
-> Index Scan using idx_user_data_class on user_data (cost=0.00..315.87 rows=386 width=8) (actual time=0.11..0.54 rows=43 loops=1)
Index Cond: (user_data.id_class = "outer".id_class)
Total runtime: 2986.33 msec
(11 rows)
How you can see with 7.4 the two queries ( the original with the join and with the IN nested ) are performing at the same way.
Regards Gaetano Mendola
---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match