On Fri, 2008-01-11 at 20:03 +0100, Gavin Sherry wrote:
> Okay, it's good that you want the planner to look at those. Did you
> consider the point I made about the sheer amount of data the planner
> would have to consider for large cases?
Sorry, thought I had, somewhere in all those emails...

If you really do have 16TB of data and it's in the mostly-read-only use
case, volatile only in the last portion of the table, then it would be
straightforward to increase the segment size.

Whatever form of partitioning we go for, we do need to allow 1-2,000
partitions fairly easily. We can't sustain a sequential method of
applying the rules, which gives O(n) behaviour. We need some form of
indexing/tree-search mechanism that gives roughly O(log n) behaviour.

> > We're back to saying that if the visibility map is volatile, then SE
> > won't help you much. I agree with that and haven't argued otherwise.
> > Does saying it make us throw away SE? No, at least, not yet and not
> > for that reason.
>
> Yes, I'm not against SE, I just think that only having it would see a
> serious regression for larger users.

If we do find regressions, then I'd suggest we look at ways of turning
it on/off automatically. We can keep track of the volatility in the
map. We can also have an off switch if you're really against it.

But I'm still skeptical. I think we can sustain the last few segments
in a table being volatile, as long as the greater proportion of
segments are not. The volatile part of the table is the subject of most
of the queries anyway, so excluding it from seq scans isn't that
important. Index access to historical data doesn't slow down whether or
not you have a volatile map.

> Personally, I think SE would be a great idea for append-only tables
> since it removes the thing I'm most worried about with it: the need
> to vacuum to 'turn it on'.

You do need to VACUUM anyway, so that's no problem. Sure, you have to
scan the table to derive the boundary values, but that's one scan that
saves many in the future.

> I'll go back to what I said above. SE looks like a good performance
> boost for archival read-only data. If we tighten up the definitions
> of how some tables can be used -- append only -- then we can remove
> the vacuum requirement and also change other characteristics of the
> storage. For example, reduced visibility information, compression,
> etc. These are hot topics for people with that kind of data.

SE isn't aimed solely at INSERT-only data. It's much wider than that.
An ERP/SCM-type application would easily benefit, say where orders are
received and processed within a relatively short period, but order data
needs to be maintained online for 2+ years. Anything where the table
grows either by date or by an increasing key, and so has a read-only
"tail", would benefit. That's a lot of applications and a ton of data.
It might not apply to the "Customer" table in that app, but it will
apply to Orders, OrderItem, etc.

We can compress older read-only data as a way of shrinking file size.
That's way easier, plus it applies to more real-world cases than trying
to remove all the visibility stuff. The insert-only people will still
be able to take advantage of compression, but the more general purpose
people will never be able to make use of the visibility removal code.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com
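P.S. To make the O(log n) point above concrete, here is a minimal
sketch, assuming each segment carries min/max boundary values and the
segments are ordered by key range (the append-mostly case). None of
the type or function names below come from the actual SE patch; they
are purely illustrative.

/*
 * Hypothetical sketch, not the SE patch itself: per-segment boundary
 * values let a scan skip whole segments whose min/max range cannot
 * match the query predicate, and a binary search over the ordered
 * boundaries finds the first candidate segment in O(log n) steps
 * rather than applying an exclusion rule to every segment.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct SegmentBoundary
{
    int         segno;      /* segment number within the relation */
    long        min_key;    /* smallest key value seen in the segment */
    long        max_key;    /* largest key value seen in the segment */
    bool        read_only;  /* false for the volatile "tail" segments */
} SegmentBoundary;

/*
 * Binary search for the first segment whose max_key >= scan_min,
 * assuming the segments are ordered by key range.
 */
static int
first_candidate_segment(const SegmentBoundary *segs, int nsegs,
                        long scan_min)
{
    int         low = 0;
    int         high = nsegs;

    while (low < high)
    {
        int         mid = (low + high) / 2;

        if (segs[mid].max_key < scan_min)
            low = mid + 1;
        else
            high = mid;
    }
    return low;
}

int
main(void)
{
    /* Three historical (read-only) segments and one volatile tail. */
    SegmentBoundary segs[] = {
        {0,    1, 1000, true},
        {1, 1001, 2000, true},
        {2, 2001, 3000, true},
        {3, 3001, 3500, false},     /* still being written to */
    };
    int         nsegs = 4;
    long        scan_min = 2500;    /* WHERE key BETWEEN 2500 AND 2600 */
    long        scan_max = 2600;

    for (int i = first_candidate_segment(segs, nsegs, scan_min);
         i < nsegs; i++)
    {
        /* Volatile segments have untrustworthy boundaries: always scan. */
        if (!segs[i].read_only || segs[i].min_key <= scan_max)
            printf("scan segment %d\n", segs[i].segno);
        else
            break;      /* min_key already past the range: done */
    }
    return 0;
}

The point is only that the scan touches the read-only segments whose
boundary range overlaps the predicate, plus the volatile tail, so the
exclusion check stays cheap even with a couple of thousand segments.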