[GENERAL] CLUSTERing on Insert

CG Sun, 17 Sep 2006 22:28:48 -0700

As I'm waiting for a CLUSTER operation to finish, it occurs to me that in a lot 
of cases, the performance benefit of having one's data stored on disk in index 
order can outweigh the overhead of inserting each row on disk in index order 
in the first place. Just an idea I thought I'd throw out. :) 
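
To illustrate the idea only (a toy sketch, not how PostgreSQL stores rows): 
assume the "table" is an in-memory Python list and the index key is the first 
column. The helper name insert_clustered is made up for this example.

```python
import bisect

def insert_clustered(heap, row):
    """Insert row so that heap stays sorted by its leading key column.

    Toy model: the extra work per insert is a binary search plus shifting
    later rows, in exchange for the table always being in index order.
    """
    bisect.insort(heap, row)

heap = []
for row in [(3, "c"), (1, "a"), (2, "b")]:
    insert_clustered(heap, row)
# heap is now ordered by key without a separate CLUSTER-style rewrite
```

The shifting cost in the list stands in for the on-disk overhead mentioned 
above; a real implementation would have to deal with pages and free space.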
 
Also, the CLUSTER operation is about as straightforward as one can get. It 
basically copies each row, one by one, in index order over to a new table, 
reindexes, then renames the new table to preserve references. I've been 
thinking about how to speed up the copy process. Perhaps taking contiguous 
blocks of data and moving them into place would save some I/O time. Locking the 
table is another problem. Would it be impossible to perform the CLUSTER within 
the context of a READ COMMITTED transaction, and then pick up the leftover CRUD 
rows and put them at the end of the file? The existing code assumes that the 
table was not altered in the meantime. That assumption would no longer be needed. 
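
The copy, reindex, rename sequence described above can be sketched roughly 
like this. It is a toy model under stated assumptions: the table is a Python 
list, keys are unique, and the function name cluster is hypothetical. It is 
not the actual PostgreSQL code.

```python
def cluster(heap, key):
    """Return (new_heap, new_index) with rows rewritten in index order."""
    # Stand-in for an index scan: row positions ordered by key.
    index_order = sorted(range(len(heap)), key=lambda i: key(heap[i]))
    # Copy rows one by one, in index order, into a new table.
    new_heap = [heap[i] for i in index_order]
    # Reindex: map each (assumed unique) key to its new position.
    new_index = {key(row): pos for pos, row in enumerate(new_heap)}
    return new_heap, new_index

table = [(3, "c"), (1, "a"), (2, "b")]
# "Rename" step: rebind the old name to the freshly written table.
table, index = cluster(table, key=lambda row: row[0])
```

Nothing here handles rows changed while the copy is running, which is exactly 
the locking problem raised above.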
 
I'm sure I'm not the first person to scratch his head thinking about CLUSTER. 
Maybe I just don't really understand the limitations that are out there 
preventing these things from being created. But, what else is there to do at 
1AM on a Sunday night waiting for a 500MB table to CLUSTER? :)
 
 
CG

