On Thu, Dec 08, 2005 at 11:58:50AM +0200, Hannu Krosing wrote: > ??hel kenal p??eval, N, 2005-12-08 kell 01:08, kirjutas Jim C. Nasby: > > On Thu, Dec 08, 2005 at 08:57:42AM +0200, Hannu Krosing wrote: > > > ??hel kenal p??eval, N, 2005-12-08 kell 00:16, kirjutas Jim C. Nasby: > > > > On Sat, Dec 03, 2005 at 10:15:25AM -0500, Greg Stark wrote: > > > > > Tom Lane <[EMAIL PROTECTED]> writes: > > > > > > What's worse, once you have excluded writes you have to rescan the > > > > > > entire > > > > > > table to be sure you haven't missed anything. So in the scenarios > > > > > > where this > > > > > > whole thing is actually interesting, ie enormous tables, you're > > > > > > still > > > > > > talking about a fairly long interval with writes locked out. Maybe > > > > > > not as > > > > > > long as a complete REINDEX, but long. > > > > > > > > > > I was thinking you would set a flag to disable use of the FSM for > > > > > inserts/updates while the reindex was running. So you would know > > > > > where to find > > > > > the new tuples, at the end of the table after the last tuple you read. > > > > > > > > What about keeping a separate list of new tuples? Obviously we'd only do > > > > this when an index was being built on a table. > > > > > > The problem with separate list is that it can be huge. For example on a > > > table with 200 inserts/updates per second an index build lasting 6 hours > > > would accumulate total on 6*3600*200 = 4320000 new tuples. > > > > Sure, but it's unlikely that such a table would be very wide, so 4.3M > > tuples would probably only amount to a few hundred MB of data. It's also > > possible that this list could be vacuumed by whatever the regular vacuum > > process is for the table. > > I think that keeping such list as part the table at well defined > location (like pages from N to M) is the best strategy, as it will > automatically make all new tuples available to parallel processes and > avoids both duplicate storage as well as the the need for changing > insert/update code.
There's one thing I hate about that idea though: good luck trying to move those tuples somewhere else after the index build is done and you now want to shrink the table back down to a more normal size. If we had a better way to do that it would be much more palatable, but right now on a heavily updated table this would result in a lot of bloat. Along those lines, I've wondered if it makes sense to add more flexibility in how free space is reclaimed in a table. One obvious possibility is to have a strategy where new tuples will always look to the FSM for space (instead of going into the current page if possible), and the FSM will always hand out the earliest page in the table it has. This mode would have the effect of moving tuples towards the front of the table, allowing for space reclamation. A variation might be that this mode will not effect tuples that are generated as part of an UPDATE and are in the first x% of the table, since it doesn't make sense to move a tuple from page 2 to page 1 in a 1000 page table. Another possibility is to always go to the FSM and to have the FSM hand back the page that is closest to the new tuple according to a certain index. This would allow for ALTER TABLE CLUSTER to be much more self-maintaining. The downside is that you'd have to do a lookup on that index, but presumably if the index is important enough to cluster on then it should be well-cached. There's probably some other tweaks that could be done as well to make this more performant. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match