Jeff Davis wrote:
On Wed, 2007-12-19 at 12:08 -0500, Mark Mielke wrote:
Stupid question #2: Is it well recognized that the CPU is the
bottleneck in the PostgreSQL sorting mechanism? Or might it be memory
bandwidth and I/O?
I think it depends a lot on several factors. It's probably a different
bottleneck for integers versus localized text, and depends on the
available memory and I/O characteristics.
Makes sense.
It would seem to me that any sort worth parallelizing (administrative
and synchronization overhead), must have data larger than the L2
cache. If larger than the L2 cache, it becomes real memory speed. If
real memory speed, wouldn't one CPU without hardware synchronization,
be able to fill the memory read/write pipe?
We do an external merge sort, which involves merging M runs together.
You seem to be implying that we can generate the output run at disk
speed, and therefore the CPU speed is unimportant.
Correct. Or, alternatively, you could achieve the same effect using
asychronous I/O or read ahead.
I suspect that the comparison costs are enough that the above statement
isn't true in all cases, particularly in the case of localized text.
That sounds possible, but I still feel myself suspecting that disk reads
will be much slower than localized text comparison. Perhaps I am
overestimating the performance of the comparison function?
Also, there is probably a lot of memory copying going on, and that
probably destroys a lot of the effectiveness of L2 caching. When L2
caching is ineffective, the CPU spends a lot of time just waiting on
memory. In that case, it's better to have P threads of execution all
waiting on memory operations in parallel.
I didn't consider the high throughput / high latency effect. This could
be true if the CPU prefetch isn't effective enough.
This would explain why "1p2t" would outperform a "1p1t" in Ron's
reference above.
These are just my first thoughts, however. There is a lot of existing
research out there that we can look into, and also a lot of tests that
we can run before jumping into this.
I think parallel sorting can be looked into separately from the other
sorting improvements.
Yep - I started to read up on it. It still sounds like it's a hard-ish
problem (to achieve near N times speedup for N CPU cores without
degrading performance for existing loads), but that doesn't mean
impossible. :-)
Cheers,
mark
--
Mark Mielke <[EMAIL PROTECTED]>