Hi guys! This'll hopefully be an easy question for one of the perf guys like Tom..

I'm wondering what the performance advantage is to using a clustered index. Take the following example: I have a table of recipes with a recipeid, and a table of comments that various users might have entered about those recipes ("this tastes like crap!", etc) with a commentid and a FK on recipeid. Naturally I want to be able to say "give me all the comments about recipe x". To me, this seems like the perfect opportunity to use a cluster. Inserts are fairly rare and reads need to be super fast. If all the comments were clustered by recipeid, all the data would be right next to each other on disk. It seems like the query planner could count on this and make a lot of cool optimizations. It could just find the first row with recipe x and traverse the rows until it got to the next recipe.

However, the thing that really confuses me is I can insert as many rows as I want, and they're not clustered. In fact, the only time they'll cluster is if I run the "CLUSTER RecipeComments" command. Sure, I could write a trigger to do this on insert, or I could write a script that runs every midnight to do this, or what not. However, if the query planner can't assume the data is in a certain order on disk, what's the point of having this at all?

I'm quite sure SQL Server doesn't work this way and I'm not sure about Oracle. Can someone enlighten me on the exact benefit of this? Thanks!!

Mike

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to