Parallel sort becomes roughly 10 times slower when the vector size increases from 4*10^9 to 5*10^9 elements. The slowdown may begin at exactly 2^32 elements (4,294,967,296, which lies between the two sizes), but this was not checked. The example below uses a vector of ints, but the same behavior is observed with a vector of long longs.
To reproduce (sort_test.cc is below):

0. Adjust 'processors' in sort_test.cc.
1. g++ -O3 -fopenmp sort_test.cc -lgomp
2. ./a.out

output:

58 seconds used in sort     [for vector of size 4,000,000,000]
667 seconds used in sort    [for vector of size 5,000,000,000]

gcc version information:

crd4% gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.4.1/configure --with-gmp=/broad/tools/Linux/x86_64/pkgs/gcc_4.4.1 --with-mpfr=/broad/tools/Linux/x86_64/pkgs/gcc_4.4.1 --prefix=/broad/tools/Linux/x86_64/pkgs/gcc_4.4.1
Thread model: posix
gcc version 4.4.1 (GCC)

We first observed the problem under gcc 4.3.3.

hardware info:

crd4% uname -a
Linux crd4 2.6.16.54-0.2.5-smp #1 SMP Mon Jan 21 13:29:51 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux

This is a 32-processor machine with 256 GB of memory, but I don't think the problem is specific to this architecture.

sort_test.cc:

#include <algorithm>   // declares sort
#include <iostream>
#include <omp.h>
#include <time.h>
#include <vector>

using namespace std;

int main( )
{
     // Run once with 4*10^9 entries and once with 5*10^9 entries.
     for ( long long m = 4; m <= 5; m++ )
     {
          const long long entries = m * (long long) 1000000000;
          const int processors = 32;   // adjust to the machine

          // Fill the vector with pseudo-random values.
          vector<int> x(entries);
          for ( long long i = 0; i < entries; i++ )
               x[i] = (i*i) % 123456789;

          // Time only the sort, in wall-clock seconds.
          time_t clock1, clock2;
          time( &clock1 );
          omp_set_num_threads(processors);
          sort( x.begin( ), x.end( ) );
          time( &clock2 );
          cout << clock2 - clock1 << " seconds used in sort" << endl;
     }
}

--
           Summary: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jaffe at broad dot mit dot edu
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852