Thanks Joris.

I'm familiar with explain and I do use it quite frequently.  My site is just
starting out so I don't think performance is that crucial of an issue at the
onset (premature optimization).  Also, there is not that much data yet so
its difficult to have a very accurate picture of what a full production load
would look like.

So I guess I was looking for some type of rule of thumb for when the
different options would be better.  I guess I'll just have to keep an eye
out for my trouble queries and work with my indexes and techniques as I
progress.  Thanks again.

On Sat, Dec 13, 2008 at 6:51 PM, Joris Dobbelsteen <
jo...@familiedobbelsteen.nl> wrote:

> Jonathon Suggs wrote, On 10-12-08 20:12:
>
>> I'm asking this as a more general question on which will perform better.
>>  I'm trying to get a set of comments and their score/rankings from two
>> tables.
>>
>> *comments*
>> cid (integer, primary key)
>> title
>> body
>>
>> *comment_ratings*
>> cid (integer, primary key)
>> uid (integer, primary key)
>> score
>>
>> *Option 1* (Single group by query)
>> select c.cid,c.title,c.body,coalsece(sum(r.score),0) as score
>> from comments c left outer join comment_ratings r on c.cid = r.cid
>> group by c.cid,c.title,c.body;
>>
>> *Option 2* (Two simple queries)
>> select cid,title,body from comments;
>> select cid,sum(score) from comment_ratings;
>>
>> I know that in general you always want to keep the number of queries to a
>> minimum and that Option 1 is the most logical/preferred way to proceed.
>>  However, I've greatly simplified Option 1.  In practice I'm going to be
>> pulling a lot more columns and will be inner joining to additional tables.
>>  /So my real question is whether having a lot of items in the group by
>> clause will have an adverse effect on performance./
>>
>
> You can answer that yourself... Create the database and fill it with a
> decent set of sample data (if that doesn't already exists). Run analyze to
> get your statistics. Use explain/explain analyze on the queries you are
> interested in.
>
> The manual has very good references. If you like, pgadmin can even create
> graphical representation (which is what I prefer).
>
> The obvious advantage depends on what you want to achieve? If its just a
> dump of a table and an aggregate, you second option might be better.
> If you need both data to transform it into something the user will see,
> probably the first option is better, as you leave combining to the highly
> specialized and optimized software and don't have to do it yourself. The
> choice has to do with index scans and table scans, that depend on the
> situation.
>
> In fact, if the first one doesn't come out nice (first join than aggregate,
> do aggeration in a subquery). So: learn explain!!! Every (decent) RDBMS has
> such a facility.
>
> - Joris
>

Reply via email to