On 07/14/2018 12:04 AM, Kefan Yang wrote: > > ... > > And finally, I see the PDF reports "CPU clocks" but I'm not sure what > that actually is? Is that elapsed time in milliseconds or something > else? > > > Sorry for the confusion, but "CPU clocks" actually means CPU clock > ticks, which are units of time of a constant but system-specific length. >
OK, how do you measure this metric? > After reading your test results, I think the main problems of intro sort are > > 1. Slow on CREATE INDEX cases. > > I am still trying to figure out where the bottleneck is. Is the data > pattern in index creation very different from other cases? Also, > pg_qsort has 10%-20% advantage at creating index even on sorted data > (faster CPU, N = 1000000). This is very strange to me since the two > sorting routines execute exactly the same code when the input data is > sorted. > > 2. Slow on faster CPU and large data set. > > The main difference is still on CREATE INDEX. But there are several > SELECT cases(faster CPU, line 179, 206, 377) where pg_qsort can have > more than 60% advantage, which is crazy. All these cases are dealing > with mostly sorted data. > > Personally, I would say intro sort is good as long as it has nearly the > same performance as pg_qsort on average cases because of the better > worst-case complexity. In fact, we can make the two sorting routines as > similar as we want by increasing the threshold that intro sort switches > to heap sort. Therefore, I think by no means could pg_qsort be 10%-20% > faster than a well-optimized intro sort because they execute the same > code most of the time. There must be something special about CREATE > INDEX test cases, and I am trying to figure it out. Also, I am > considering performing the preordered check on every recursion, like > what pg_qsort does, and see how it goes. > I don't know. All I have is the results I shared. I suggest you try reproducing the tests on your system - the scripts are in the git repositories. > Finally, you've mentioned > > But even if it was, I guess the easiest way to deal with it would be to > randomize the selection of pivots. > > > I am wondering if pg_sort with random pivot selecting could be faster > than intro sort. I've just done some simple tests, which shows that > intro sort in faster in all cases. But I guess it depends on how we > would implement the random number generation. > Unlikely. The pivot randomization is merely a way to defeat an adversary attempting to perform DoS by triggering sorts on a killer sequence. Randomization makes it much harder/impossible, because the killer sequence changes over time. It's not a regular performance optimization. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services