Re: Fwd: GSOC 2018 Project - A New Sorting Routine

Tomas Vondra Fri, 13 Jul 2018 18:04:29 -0700

On 07/14/2018 12:04 AM, Kefan Yang wrote:
> 
> ...
>
>     And finally, I see the PDF reports "CPU clocks" but I'm not sure what
>     that actually is? Is that elapsed time in milliseconds or something
>     else?
> 
> 
> Sorry for the confusion, but "CPU clocks" actually means CPU clock
> ticks, which are units of time of a constant but system-specific length.
>


OK, how do you measure this metric?

> After reading your test results, I think the main problems of intro sort are
> 
> 1. Slow on CREATE INDEX cases.
> 
> I am still trying to figure out where the bottleneck is. Is the data
> pattern in index creation very different from other cases? Also,
> pg_qsort has 10%-20% advantage at creating index even on sorted data
> (faster CPU, N = 1000000). This is very strange to me since the two
> sorting routines execute exactly the same code when the input data is
> sorted.
> 
> 2. Slow on faster CPU and large data set.
> 
> The main difference is still on CREATE INDEX.  But there are several
> SELECT cases(faster CPU, line 179, 206, 377) where pg_qsort can have
> more than 60% advantage, which is crazy. All these cases are dealing
> with mostly sorted data. 
> 
> Personally, I would say intro sort is good as long as it has nearly the
> same performance as pg_qsort on average cases because of the better
> worst-case complexity. In fact, we can make the two sorting routines as
> similar as we want by increasing the threshold that intro sort switches
> to heap sort. Therefore, I think by no means could pg_qsort be 10%-20%
> faster than a well-optimized intro sort because they execute the same
> code most of the time. There must be something special about CREATE
> INDEX test cases, and I am trying to figure it out. Also, I am
> considering performing the preordered check on every recursion, like
> what pg_qsort does, and see how it goes.
> 

I don't know. All I have is the results I shared. I suggest you try
reproducing the tests on your system - the scripts are in the git
repositories.

> Finally, you've mentioned
> 
>     But even if it was, I guess the easiest way to deal with it would be to
>     randomize the selection of pivots.
> 
> 
> I am wondering if pg_sort with random pivot selecting could be faster
> than intro sort. I've just done some simple tests, which shows that
> intro sort in faster in all cases. But I guess it depends on how we
> would implement the random number generation.
> 

Unlikely. The pivot randomization is merely a way to defeat an adversary
attempting to perform DoS by triggering sorts on a killer sequence.
Randomization makes it much harder/impossible, because the killer
sequence changes over time. It's not a regular performance optimization.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fwd: GSOC 2018 Project - A New Sorting Routine

Reply via email to