First question: did you create indexes on the appropriate columns of these tables? Foreign keys do not implicitly create indexes in Postgres.
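
For example (a sketch based on the table definitions quoted below; the index names are only illustrative), indexing the columns the query filters and joins on should let the planner avoid the sequential scan over the million-row comments table:

-- indexes on the foreign-key columns of comments (names are illustrative)
CREATE INDEX comments_comment_story_idx ON comments (comment_story);
CREATE INDEX comments_comment_author_idx ON comments (comment_author);

-- refresh planner statistics after the bulk load of test data
ANALYZE comments;

For this particular query it is the index on comment_story that matters; the seq scan on users is harmless with only 1,000 rows.
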
Bryan

On 7/30/07, Cultural Sublimation <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I'm fairly new with Postgresql, so I am not sure if the performance
> problems I'm having are due to poorly constructed queries/indices,
> or if I bumped into more fundamental problems requiring a redesign of
> my database structure. That's why I'm requesting your help.
>
> Here's the situation: I have three tables: Users, Stories, and Comments.
> Stories have an author (a user), and a comment is associated with a
> story and with the user who posted it. The structure of the database
> is therefore fairly simple (there are also some sequences, which I've
> omitted for clarity):
>
> CREATE TABLE users
> (
>     user_id    int UNIQUE NOT NULL,
>     user_name  text,
>     PRIMARY KEY (user_id)
> );
>
> CREATE TABLE stories
> (
>     story_id         int UNIQUE NOT NULL,
>     story_title      text,
>     story_body       text,
>     story_timestamp  timestamptz,
>     story_author     int REFERENCES users (user_id) NOT NULL,
>     PRIMARY KEY (story_id)
> );
>
> CREATE TABLE comments
> (
>     comment_id         int UNIQUE NOT NULL,
>     comment_title      text,
>     comment_body       text,
>     comment_timestamp  timestamptz,
>     comment_story      int REFERENCES stories (story_id) NOT NULL,
>     comment_author     int REFERENCES users (user_id) NOT NULL,
>     PRIMARY KEY (comment_id)
> );
>
> I've also populated the database with some test data, comprising 1,000
> users, 1,000 stories (one per user), and 1,000,000 comments (one comment
> per user per story).
>
> Now, the query I wish to optimise is also simple: get me all comments (the
> comment_id suffices) and the corresponding user *names* for a given story.
> If for example the story_id is 100, the query looks like this:
>
> SELECT comments.comment_id, users.user_name
> FROM comments, users
> WHERE comments.comment_story = 100 AND comments.comment_author = users.user_id;
>
> The problem is that this query takes a *very* long time. With the said
> 1,000,000 comments, it needs at least 1100ms on my system. "Explain
> analyze" tells me that a sequential scan is being performed on both
> users and comments:
>
> Hash Join  (cost=28.50..21889.09 rows=988 width=14) (actual time=3.674..1144.779 rows=1000 loops=1)
>   Hash Cond: ((comments.comment_author)::integer = (users.user_id)::integer)
>   ->  Seq Scan on comments  (cost=0.00..21847.00 rows=988 width=8) (actual time=0.185..1136.067 rows=1000 loops=1)
>         Filter: ((comment_story)::integer = 100)
>   ->  Hash  (cost=16.00..16.00 rows=1000 width=14) (actual time=3.425..3.425 rows=1000 loops=1)
>         ->  Seq Scan on users  (cost=0.00..16.00 rows=1000 width=14) (actual time=0.068..1.845 rows=1000 loops=1)
> Total runtime: 1146.424 ms
>
> In the long run, I guess one possible solution to this problem will be
> to partition the comments table into a number of sub-tables, most likely
> based on the timestamp attribute (by having current versus historic data).
> Nevertheless, I am wondering if there are other, more straightforward ways
> to optimise this query. Some clever use of indices, perhaps? Or is
> the way I am now constructing the select non-optimal? Or do I need
> some pixie-magic on the Postgresql settings? Anyway, any suggestions
> are welcome! (and thanks in advance)
>
> Regards,
> C.S.
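
P.S. Once the index exists, re-running the quoted query under EXPLAIN ANALYZE should show whether the planner has switched from the sequential scan on comments to an index scan; the output is omitted here since it depends on your data and settings:

EXPLAIN ANALYZE
SELECT comments.comment_id, users.user_name
FROM comments, users
WHERE comments.comment_story = 100
  AND comments.comment_author = users.user_id;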