Not sure if I'm showing you what you asked for, but here it is:
select * from pg_stats where tablename='user_url_tag' and
attname='user_url_id';
schemaname | tablename | attname | null_frac | avg_width | n_distinct |
most_common_vals |
most_common_freqs |
histogram_bounds
| correlation
------------+--------------+-------------+-----------+-----------+------------+------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-------------
public | user_url_tag | user_url_id | 0 | 4 | 60825 |
{458321,1485346,16304,68027,125417,153465,182503,201175,202973,218423} |
{0.00133333,0.001,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}
|
{195,195993,325311,480323,647778,782598,1014527,1201726,1424822,1614712,1853719}
| 0.795521
You asked if the table has been analyzed recently.
I think so - I run ANALYZE on the whole DB every night, like this:
$ psql -U me -c "ANALYZE;" mydb
For a good measure, I just analyzed the table now: $ psql -U me -c "ANALYZE
user_url_tag;" mydb
Then I set the enable_hashjoin back to ON and re-run the EXPLAIN ANALYZE.
I still get the sequential scan, even after analyzing the table :(
I'm not sure which numbers you are referring to when you said the estimate is
off, but here are some numbers:
The whole table has 6-7 M rows.
That query matches about 2500 rows.
If there are other things I can play with and help narrow this down, please let
me know.
Thanks,
Otis
----- Original Message ----
From: Tom Lane <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [email protected]
Sent: Wednesday, May 10, 2006 9:53:49 PM
Subject: Re: [SQL] Help with a seq scan on multi-million row table
<[EMAIL PROTECTED]> writes:
> Aha! set hashjoin=off did the trick.
> -> Index Scan using ix_user_url_tag_user_url_id on
> user_url_tag userurltag0_ (cost=0.00..157.34 rows=103 width=14) (actual
> time=1.223..1.281 rows=5 loops=1666)
> Index Cond: (userurltag0_.user_url_id = "outer".id)
This seems to be the problem right here: the estimate of matching rows
is off by a factor of 20, and that inflates the overall cost estimate
for this plan about the same, causing the planner to think the other way
is cheaper.
What does the pg_stats row for user_url_tag.user_url_id contain?
Have you analyzed that table recently?
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings