Hi Vladimir,
There could be many reasons for this.
The most plausible ones are imprecise time measurement with highly dispersed
results and biased samples.
Another reason is that dividing the whole array into equal-size partitions
does not always give the best result, and hence choosing "wrong" pivots can
actually lead to a slightly better split.
Let me clarify this counter-intuitive statement about unequal partitioning.
Consider the following quite straightforward dual-pivot quicksort (pseudocode):
sort(a[]) {
    pivot1, pivot2 = choosePivots(a);
    // partitioning
    forall (a[k] in a) {
        if (a[k] < pivot1) {
            a[k] goes to the left partition
        } else if (a[k] > pivot2) {
            a[k] goes to the right partition
        } else {
            a[k] goes to the middle partition
        }
    }
    sort(left partition);
    sort(middle partition);
    sort(right partition);
}
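For concreteness, here is a minimal runnable Java sketch of this scheme (my own
illustration, not the real DPQ: the partitions are collected into lists rather
than rearranged in place, the pivot positions len/3 and 2*len/3 are an arbitrary
assumption, and the insertion-sort fallback only exists to guarantee termination
when the pivots fail to split the range). The comments on the three lists mark
how many pivot comparisons each case costs, which is what the rest of the
argument is about.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThreeWaySplitSketch {

    static void sort(int[] a) {
        if (a.length < 2) {
            return;
        }
        // Pick two pivots; these positions are an assumption for illustration,
        // the real implementation orders 5 sample elements first.
        int p = a[a.length / 3];
        int q = a[2 * a.length / 3];
        int pivot1 = Math.min(p, q);
        int pivot2 = Math.max(p, q);

        List<Integer> left = new ArrayList<>();    // a[k] < pivot1 : 1 comparison
        List<Integer> middle = new ArrayList<>();  // otherwise     : 2 comparisons
        List<Integer> right = new ArrayList<>();   // a[k] > pivot2 : 2 comparisons
        for (int x : a) {
            if (x < pivot1) {
                left.add(x);
            } else if (x > pivot2) {
                right.add(x);
            } else {
                middle.add(x);
            }
        }

        if (middle.size() == a.length) {
            // The pivots did not split the range at all (every value lies in
            // [pivot1, pivot2]); fall back to insertion sort so the recursion
            // always makes progress. The real DPQ handles such inputs with its
            // own special-case code.
            insertionSort(a);
            return;
        }

        int[] l = toArray(left);
        int[] m = toArray(middle);
        int[] r = toArray(right);
        sort(l);
        sort(m);
        sort(r);

        // Write the three sorted partitions back in order.
        int i = 0;
        for (int x : l) a[i++] = x;
        for (int x : m) a[i++] = x;
        for (int x : r) a[i++] = x;
    }

    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int x = a[i];
            int j = i - 1;
            for (; j >= 0 && a[j] > x; j--) {
                a[j + 1] = a[j];
            }
            a[j + 1] = x;
        }
    }

    static int[] toArray(List<Integer> list) {
        int[] result = new int[list.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = list.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {3, 7, 1, 9, 4, 4, 8, 2, 6, 5};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 4, 4, 5, 6, 7, 8, 9]
    }
}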
Here you can see that during partitioning every item is compared with one or
two pivots: an item is compared with the second pivot only if it is not less
than the first one. So what is the average number of comparisons per item
during partitioning?
If we manage to choose pivots that divide the whole array into 3 equal
partitions, we get 1*(1/3) + 2*(1/3) + 2*(1/3) = 5/3 comparisons per item.
Is this ideal? What if the pivots divide the array in the proportions 1/2,
1/4, 1/4? Then we get 1*(1/2) + 2*(1/4) + 2*(1/4) = 3/2, and 3/2 is less
than 5/3.
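As a trivial sanity check of that arithmetic (a helper of my own, nothing
DPQ-specific):

class ComparisonCost {
    // Weighted comparisons per item: 1 for the left partition, 2 for the others.
    static double perItem(double left, double middle, double right) {
        return 1 * left + 2 * middle + 2 * right;
    }
    // perItem(1.0/3, 1.0/3, 1.0/3) == 1.666...  (5/3)
    // perItem(0.5, 0.25, 0.25)     == 1.5       (3/2)
}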
Now let's find the ideal proportions of the partitions, taking into account
the number of comparisons of the sort as a whole.
Assume that the total number of comparisons is A*n*ln(n) + B*n + o(n) and that
we divide the whole array in the proportions
[alpha, (1 - alpha)/2, (1 - alpha)/2], where 0 < alpha < 1.
Then, ignoring the o(n) term:

A*n*ln(n) + B*n =
    n * (alpha + 2*(1 - alpha))                             { partitioning }
  + A * alpha*n * ln(alpha*n) + B * alpha*n                 { sorting left partition }
  + 2 * A * (1-alpha)*n/2 * ln((1-alpha)*n/2) + 2 * B * (1-alpha)*n/2
                                                            { sorting middle and right partitions }

Expanding the logarithms with ln(x*n) = ln(x) + ln(n), this becomes

A*n*ln(n) + B*n =
    A*alpha*n*ln(n) + A*(1-alpha)*n*ln(n)
  + B*alpha*n + B*(1-alpha)*n
  + n * (alpha + 2*(1-alpha))
  + A*alpha*n*ln(alpha) + A*(1-alpha)*n*ln((1-alpha)/2)

The A*n*ln(n) and B*n terms cancel on both sides; dividing the remainder by n gives

0 = (alpha + 2*(1-alpha)) + A*alpha*ln(alpha) + A*(1-alpha)*ln((1-alpha)/2)

and, since alpha + 2*(1-alpha) = 2 - alpha,

A = (alpha - 2) / (alpha*ln(alpha) + (1-alpha)*ln((1-alpha)/2))
alpha        A
 1/12        2.078316236
 2/12        1.783079278
 3/12        1.617083005
 4/12        1.517065378
 5/12        1.461274369
 6/12        1.442695041   !!!
 7/12        1.463491681
 8/12        1.536871653
 9/12        1.699242413
10/12        2.060936332
11/12        3.143757518
It appears that the best alpha is about 1/2. Thus the ideal partitioning is
something like [1/2, 1/4, 1/4].
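If you want to check this numerically, here is a small sketch (names are mine)
that evaluates the formula for A on a fine grid and confirms that the minimum
is at alpha = 1/2, where A = 1/ln(2) = 1.442695...:

public class AlphaScan {

    // A(alpha) from the formula derived above.
    static double a(double alpha) {
        return (alpha - 2)
                / (alpha * Math.log(alpha) + (1 - alpha) * Math.log((1 - alpha) / 2));
    }

    public static void main(String[] args) {
        double bestAlpha = 0;
        double bestA = Double.POSITIVE_INFINITY;
        for (double alpha = 0.001; alpha < 1; alpha += 0.001) {
            double value = a(alpha);
            if (value < bestA) {
                bestA = value;
                bestAlpha = alpha;
            }
        }
        // Prints: alpha = 0.500, A = 1.442695041
        System.out.printf(java.util.Locale.ROOT,
                "alpha = %.3f, A = %.9f%n", bestAlpha, bestA);
    }
}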
Of course, these considerations do not fully apply to the real DPQ, because in
the real DPQ items are not compared with the pivots in a fixed order, and it
contains numerous special cases, which make it harder to analyze.
Regards,
Dmytro Sheyko
> From: [email protected]
> To: [email protected]
> Subject: Question on sorting
> Date: Fri, 30 Jul 2010 22:55:00 +0400
> CC: [email protected]
>
> Hello,
>
> I have a performance question about sorting.
>
> In an implementation of Dual-Pivot Quicksort 5 elements
>
> a[e1], a[e2], a[e3], a[e4], a[e5]
>
> where e1 = len/6, e2 = len/3, e3 = len/2, e4 = len*2/3, e5 = len*5/6,
> are sorted and then elements a[e2], a[e4] are taken as pivots.
>
> But if I take a[e1] and a[e3] as pivots, it works 2-3% faster on
> *random* inputs.
>
> I tried different sorting for these 5 elements: network, bubble,
> insertion - with a[e1] and a[e3] pivots it is faster than with
> a[e2] and a[e4].
>
> If I sort these 5 elements in descending order, it works faster
> with a[e5] and a[e3] pivots.
>
> Do you have any idea why it happens?
>
> Thank you,
> Vladimir