I'm a bit embarrassed to bring this up here because I don't know much about storage layout and indexing. It's probably a silly notion, but if so, could someone please tell me how and why?
First I'll describe the situation that led me to write this. I'm seeing performance problems in an application that keeps a large table of logged events. Rows are rarely if ever updated, but new ones are inserted all the time. Several columns contain various kinds of timestamps, some exact and some sloppy. These, and perhaps other columns, are loosely correlated with the order in which the rows are inserted.

[Q: Does this happen to a lot of users? Is it worth bothering with?]

Naturally the table is frequently queried for ranges of timestamps of one kind or another. It's probably not atypical. Some of the timestamp columns are indexed, but I'm concerned about both the size of the indexes and the cost they add to inserts. Both queries using the indexes and insertions can become unacceptably slow at times.

[Q: Are the access and insertion costs of an index really non-negligible compared to those of the table itself?]

It sounded to me like this might possibly be a job for spatial indexes (degraded down to a single dimension), but I couldn't find enough documentation to figure out whether they generalize to this usage. From what I did find, it didn't sound likely.

[Q: Do spatial indexes work on simple scalar values and operators? If not, is there any reason why not? Otherwise, would they help?]

Bruce suggested partitioning the table, but that won't work well for multiple independent criteria, and it comes with a host of scalability and management issues. I'd also want something that is transparent to the application. Perhaps I'm asking for too much, but AFAIK it's exactly that ability to separate functionality from optimization that made SQL a success in the first place.

[Q: Is there some other transparent optimization for values that correlate with insertion/update order?]

So I was wondering whether it would make sense to have a more compact kind of index.
One that partitions the value range of a given column into sub-ranges, and for each of those merely keeps loose track of where rows in that value range might be found: "Dates from 2006-07-15 to 2006-08-04: try blocks 99-126 and 175." A fixed maximum of two or three contiguous block ranges per value range would probably do the trick. The risk of having too few, of course, is that one oddly positioned block could make the index look as if a particular value range were spread throughout most of the table.

[Q: Am I re-inventing the wheel? If not, is there really a robust, linear correlation between a row's time of insertion or last update and its storage block number?]

The index would not need to be exact, just a bit pessimistic. Any re-partitioning or re-balancing could be done offline, "vacuum analyze" style. AFAICS that would allow O(1) insertion cost at runtime, or more aggressive tradeoffs to reduce run-time degradation of indexing quality. Most maintenance could be done incrementally, to smooth out its cost over time.

With an "index" like this, an index scan would be very cheap indeed, but you'd then also have to scan a small subset of the table itself for exact matches. I would hope that would be a win whenever the indexed value accounts for more than an infinitesimal portion of table size and is reasonably correlated with insertion/update time. Also, I guess bitmap scans on the indexed values could in principle use compressed bitmaps, excluding areas that according to the index contain no matching rows. There might be some marginal benefit there.

[Q: Would this fit in at all? Any particular reason why it doesn't make sense?]

Ready and grateful for any criticism,

Jeroen
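To make the idea concrete, here is a minimal sketch in Python of what I have in mind, inverted slightly for simplicity: instead of mapping value sub-ranges to block ranges, it summarizes each run of table blocks by the min/max value found there, which gives the same "pessimistic lookup, then rescan a small slice of the table" behavior. All the names (BlockRangeIndex, ROWS_PER_BLOCK, etc.) are invented for illustration; nothing here reflects an existing implementation.

```python
# Illustrative sketch only: a "loose" index that summarizes each run of
# BLOCKS_PER_RANGE table blocks by the min/max of one column. A range
# query returns candidate blocks (pessimistically), and the caller then
# re-checks every row in those blocks for exact matches.

BLOCKS_PER_RANGE = 4
ROWS_PER_BLOCK = 100

class BlockRangeIndex:
    def __init__(self, table, key):
        # table: list of rows (dicts), in storage order; key: column name.
        self.key = key
        self.summaries = []  # (first_block, last_block, min_val, max_val)
        nblocks = (len(table) + ROWS_PER_BLOCK - 1) // ROWS_PER_BLOCK
        for first in range(0, nblocks, BLOCKS_PER_RANGE):
            last = min(first + BLOCKS_PER_RANGE, nblocks) - 1
            rows = table[first * ROWS_PER_BLOCK:(last + 1) * ROWS_PER_BLOCK]
            vals = [r[key] for r in rows]
            self.summaries.append((first, last, min(vals), max(vals)))

    def candidate_blocks(self, lo, hi):
        # Pessimistic: a block range is returned whenever its [min, max]
        # overlaps [lo, hi]; it may still contain no matching rows.
        out = []
        for first, last, mn, mx in self.summaries:
            if mx >= lo and mn <= hi:
                out.extend(range(first, last + 1))
        return out

def range_query(table, index, lo, hi):
    # Index scan, then exact re-check of only the candidate blocks.
    rows = []
    for b in index.candidate_blocks(lo, hi):
        for r in table[b * ROWS_PER_BLOCK:(b + 1) * ROWS_PER_BLOCK]:
            if lo <= r[index.key] <= hi:
                rows.append(r)
    return rows
```

The summaries stay tiny (one tuple per few hundred rows), an insert at the end of the table only ever widens the last summary, and the whole thing degrades gracefully: a poorly correlated column just means more candidate blocks, never wrong answers.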