On Thu, Jun 28, 2018 at 9:47 AM, Teodor Sigaev <teo...@sigaev.ru> wrote:
> Current estimation of sort cost has following issues:
>  - it doesn't differ one and many columns sort
>  - it doesn't pay attention to comparison function cost and column width
>  - it doesn't try to count number of calls of comparison function on per
>    column basis

I've been suspicious of the arbitrary way in which I/O for external sorts is costed by cost_sort() for a long time. I'm not 100% sure about how we should think about this question, but I am sure that it needs to be improved in *some* way.

It's really not difficult to show that external sorts are now often faster than internal sorts, because they're able to be completed on-the-fly, which can have very good CPU cache characteristics, and because the I/O latency can be hidden fairly well much of the time. Of course, as memory is taken away, external sorts will eventually get slower and slower, but it's surprising how little difference it makes. (This makes me tempted to look into a sort_mem GUC, even though I suspect that that will be controversial.)

Clearly there is a cost to doing I/O even when an external sort is faster than an internal sort "in isolation"; I/O does not magically become something that we don't have to worry about. However, the I/O cost seems more and more like a distributed cost. We don't really have a way of thinking about that at all. I'm not sure if that much bigger problem needs to be addressed before this specific problem with cost_sort() can be addressed.

--
Peter Geoghegan