After playing with this in benchmarks and researching the weird results I got, I'm going to advise dropping the TODO for now unless something happens to change how postgres handles clustering. You guys probably already grokked this, so I'm just recording it for the list archives.
The primary factor here is that postgres doesn't maintain clustered indexes. Clustering is a one-time operation that orders the table as it exists at that moment. Basically, there really isn't any such thing in postgres as a clustered index. There is an operation - CLUSTER - which takes an index and a table as input and re-orders the table according to the index. But it is borderline fiction to call that index "clustered", because the next row inserted will pop in at the end of the table instead of slipping into the middle per the desired ordering. (The first snippet at the bottom of this mail demonstrates this.)

The catalog tables that are cluster candidates are candidates because they hold several rows per table - one per column, one per trigger, and so on - and we expect that a query will want to get several of these rows at once. These rows are naturally clustered because the scripts that create them insert their information into the catalog contiguously. When you create a table, the pg_attribute rows for its columns are inserted together. When you then create all its triggers, they too are put into pg_trigger one after the other. So calling CLUSTER after dbinit doesn't help anything.

Over time, table alterations can fragment this information. If a user loads a bunch of tables and then alters them over time, the columns added later on will have their metadata stored separately from the columns created originally.

Which gets us to the down and dirty of how CLUSTER works. It puts an ACCESS EXCLUSIVE lock on the entire table - blocking all attempts to read and write the table - creates a copy of the table in the desired order, drops the original, and renames the copy (the second snippet below is roughly the equivalent SQL). Doing this to a catalog table that is relevant to queries pretty much brings everything else in the database to a halt while the system table is locked up. And the brute-force logic makes this time consuming even if the table is already perfectly ordered. Additionally, snapshots taken of the table during a CLUSTER make the table appear to be empty, which introduces the possibility of system table corruption if transactions are run concurrently with the CLUSTER.

So basically, CLUSTER in its current form is not something you want running automatically on a bunch of system tables. It gives your system the hiccups. You would only want to run it manually during downtime, and the third snippet shows a cheap way to check whether that's even worth doing. And you can do that just as easily with or without any preparation during dbinit.
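First snippet: a quick way to see the one-time-ness for yourself on a throwaway table (the table and index names here are just made up for the demo):

    -- populate a table in random physical order, then cluster it on its pkey
    CREATE TABLE t (id int PRIMARY KEY, payload text);
    INSERT INTO t
        SELECT g * 2, 'row ' || g
        FROM generate_series(1, 1000) AS g
        ORDER BY random();
    CLUSTER t USING t_pkey;           -- one-time rewrite in id order

    -- a later insert that *should* sort between ids 500 and 502 ...
    INSERT INTO t VALUES (501, 'latecomer');

    -- ... lands at the tail of the heap instead
    SELECT ctid, id FROM t WHERE id IN (500, 501, 502);

The ctid for 501 points at the last heap page while its neighbors sit together near the front of the table - the "clustered" index did nothing for the new row.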
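Second snippet: roughly the lock-copy-drop-rename sequence as plain SQL. The real implementation works at the heap level, keeps the table's OID, and rebuilds its indexes, but this is the essential shape - and it shows why the operation costs a full rewrite (and double the disk space) even when the table is already perfectly ordered:

    BEGIN;
    -- the same lock CLUSTER takes: nobody can read or write t until commit
    LOCK TABLE t IN ACCESS EXCLUSIVE MODE;
    -- write out a complete second copy of the heap, in index order
    CREATE TABLE t_sorted AS SELECT * FROM t ORDER BY id;
    -- discard the old heap and adopt the copy
    DROP TABLE t;
    ALTER TABLE t_sorted RENAME TO t;
    COMMIT;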
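Third snippet: if you want to eyeball whether a catalog has fragmented enough to be worth a downtime CLUSTER, pg_stats exposes the planner's estimate of physical-vs-logical ordering. This assumes the catalog has been analyzed; attrelid is just the pg_attribute column whose ordering we care about here, since that's what groups a table's columns together:

    -- correlation near 1.0: heap still in near-index order, CLUSTER buys nothing
    -- correlation near 0:   rows scattered, a downtime CLUSTER might help
    SELECT tablename, attname, correlation
    FROM pg_stats
    WHERE schemaname = 'pg_catalog'
      AND tablename = 'pg_attribute'
      AND attname = 'attrelid';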
Thanks everyone,

-Simone Aiken