Cultural Sublimation wrote:
CREATE TABLE users
(
user_id int UNIQUE NOT NULL,
user_name text,
PRIMARY KEY (user_id)
);
CREATE TABLE stories
(
story_id int UNIQUE NOT NULL,
story_title text,
story_body text,
story_timestamp timestamptz,
story_author int REFERENCES users (user_id) NOT NULL,
PRIMARY KEY (story_id)
);
CREATE TABLE comments
(
comment_id int UNIQUE NOT NULL,
comment_title text,
comment_body text,
comment_timestamp timestamptz,
comment_story int REFERENCES stories (story_id) NOT NULL,
comment_author int REFERENCES users (user_id) NOT NULL,
PRIMARY KEY (comment_id)
);
I've also populated the database with some test data, comprising 1,000
users, 1,000 stories (one per user), and 1,000,000 comments (one comment
per user per story).
Now, the query I wish to optimise is also simple: get me all comments (the
comment_id suffices) and corresponding user *names* for a given story.
If for example the story_id is 100, the query looks like this:
SELECT comments.comment_id, users.user_name
FROM comments, users
WHERE comments.comment_story = 100 AND comments.comment_author = users.user_id;
The problem is that this query takes a *very* long time. With the said
1,000,000 comments, it needs at least 1100ms on my system. "Explain
analyze" tells me that a sequential scan is being performed on both
users and comments:
What else is it supposed to do? You haven't created any indexes. I'm
also guessing that you haven't analysed the tables either.
--
Richard Huxton
Archonet Ltd
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend