"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > Which means we need all the interface bits to be able to tell PostgreSQL > where every single temp storage area is. Presumably much of the > tablespace mechanism could be used for this, but it's still a bunch of > work. And you can't just say "I have 8 spindles", you have to tell > PostgreSQL exactly where to put each temporary area (unless you just > have it put one on every tablespace you have defined).
Yes, if you have more than one temporary area you definitely need a way to tell Postgres where to put them since obviously they won't be in the postgres base directory. But I think that's all Postgres really needs to know. One could imagine a more complex version where Postgres has meta information about the bandwidth and seek penalty for each sort area separately. Presumably also for each table space. But that's a whole lot more complexity than Postgres's current cost model. > > that it should strive to maximize sequential reads within one temp area and > > expect switching between temp areas (which represent multiple spindles) to > > be > > better than multiplexing multiple tapes within a single temp area (which > > represents a single spindle). > > Which adds yet more complexity to all the code that uses the temp area. > And as others have brought up, you still have to allow for the case when > splitting all of this out into multiple files means you end up using > substantially more disk space. That further drives up the complexity. You also have to consider that it won't always be a benefit to spread the sort over multiple sort areas. If there's only one sort going on and you can reach a 1:1 ratio between tapes and spindles then I think it would be a huge benefit. Effectively boosting the sort speed by random_page_cost. But if you don't have as many spindles as your algorithm needs tapes then it's unclear which to multiplex down and whether you gain any benefit once you're multiplexing over simply using a single sort area. And worse, if there are multiple sorts going on in the system then you're not going to get sequential access even if you have multiple sort areas available. If you have N sort areas and N sorts are going on then you're probably better off multiplexing each one down to a single sort area and letting them each proceed without interfering with each other rather than having each one hog all the sort areas and forcing the OS to do the multiplexing blindly. > My point is that unless someone shows that there's a non-trivial > performance gain here, it's not going to happen. I think two extreme cases are well worth pursuing: 1) Use n sort areas for n tapes making everything purely sequential access. That would be useful for large DSS systems where large sorts are running and i/o bandwidth is high for sequential access. That gives effectively a random_page_cost speed boost. 2) Use the current algorithm unchanged but have each sort use a different sort area in some sort of round-robin fashion. That helps the OLTP type environment (ignoring for the moment that OLTP environments really ought not be doing disk sorts) where people complain about unreliable execution times more than slow execution times. If you can provide enough sort areas then it would remove one big reason other queries concurrent impact the execution time of queries. -- greg ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq