Thanks Joris. I'm familiar with explain and I do use it quite frequently. My site is just starting out so I don't think performance is that crucial of an issue at the onset (premature optimization). Also, there is not that much data yet so its difficult to have a very accurate picture of what a full production load would look like.
So I guess I was looking for some type of rule of thumb for when the different options would be better. I guess I'll just have to keep an eye out for my trouble queries and work with my indexes and techniques as I progress. Thanks again. On Sat, Dec 13, 2008 at 6:51 PM, Joris Dobbelsteen < jo...@familiedobbelsteen.nl> wrote: > Jonathon Suggs wrote, On 10-12-08 20:12: > >> I'm asking this as a more general question on which will perform better. >> I'm trying to get a set of comments and their score/rankings from two >> tables. >> >> *comments* >> cid (integer, primary key) >> title >> body >> >> *comment_ratings* >> cid (integer, primary key) >> uid (integer, primary key) >> score >> >> *Option 1* (Single group by query) >> select c.cid,c.title,c.body,coalsece(sum(r.score),0) as score >> from comments c left outer join comment_ratings r on c.cid = r.cid >> group by c.cid,c.title,c.body; >> >> *Option 2* (Two simple queries) >> select cid,title,body from comments; >> select cid,sum(score) from comment_ratings; >> >> I know that in general you always want to keep the number of queries to a >> minimum and that Option 1 is the most logical/preferred way to proceed. >> However, I've greatly simplified Option 1. In practice I'm going to be >> pulling a lot more columns and will be inner joining to additional tables. >> /So my real question is whether having a lot of items in the group by >> clause will have an adverse effect on performance./ >> > > You can answer that yourself... Create the database and fill it with a > decent set of sample data (if that doesn't already exists). Run analyze to > get your statistics. Use explain/explain analyze on the queries you are > interested in. > > The manual has very good references. If you like, pgadmin can even create > graphical representation (which is what I prefer). > > The obvious advantage depends on what you want to achieve? If its just a > dump of a table and an aggregate, you second option might be better. > If you need both data to transform it into something the user will see, > probably the first option is better, as you leave combining to the highly > specialized and optimized software and don't have to do it yourself. The > choice has to do with index scans and table scans, that depend on the > situation. > > In fact, if the first one doesn't come out nice (first join than aggregate, > do aggeration in a subquery). So: learn explain!!! Every (decent) RDBMS has > such a facility. > > - Joris >