[PATCH] sort: Add --threads option, which parallelizes internal sort.

Paul Eggert Wed, 25 Mar 2009 13:47:14 -0700

This patch is by Glen Lenker, Matt Pham, Benjamin Nuernberger, Sky
Lin, TaeSung Roh, and Paul Eggert.  It adds support for parallelism
within an internal sort.  On our simple tests on a 2-core desktop x86,
overall performance improved by roughly a factor of 1.6.
* THANKS: Add Benjamin Nuernberger, Sky Lin, TaeSung Roh.  Sort.
* TODO: Add note about how to make this algorithm better.
* bootstrap.conf: Add nproc, pthread.
* doc/coreutils.texi (sort invocation): Document new --threads option.
* src/Makefile.am (sort_LDADD): Add $(LIB_PTHREAD).
* src/sort.c: Include pthread.h, nproc.h.
(SUBTHREAD_LINES_HEURISTIC, THREADS_OPTION): New constants.
(sortlines_temp): Remove decl.
(usage, long_options, main): New option --threads.
(specify_nthreads): New function.
(mergelines): New signature, to emphasize the fact that the HI area must be
part of the destination.  All callers changed.
(sortlines): Rewrite to support and use parallelism, with a new signature.
All uses changed.  Merge in the functionality of sortlines_temp.
(struct thread_args): New struct.
(sortlines_thread): New function.
(sortlines_temp): Remove.
(sort): New arg NTHREADS.  All uses changed.
(main): Disable threading if we are sorting at random.
* tests/Makefile.am (TESTS): Add misc/sort-benchmark-random.
* tests/misc/sort-benchmark-random: New file.
---
 THANKS                           |   23 +++--
 TODO                             |    8 ++
 bootstrap.conf                   |    9 ++-
 doc/coreutils.texi               |    8 ++
 src/Makefile.am                  |    2 +-
 src/sort.c                       |  220 +++++++++++++++++++++++++++-----------
 tests/Makefile.am                |    1 +
 tests/misc/sort-benchmark-random |   67 ++++++++++++
 8 files changed, 263 insertions(+), 75 deletions(-)
 create mode 100644 tests/misc/sort-benchmark-random


diff --git a/THANKS b/THANKS
index 665a9ef..e37fdb3 100644
--- a/THANKS
+++ b/THANKS
@@ -24,12 +24,11 @@ aldomel                             aldo...@ix.netcom.com
 Alen Muzinic                        zv...@fly.cc.fer.hr
 Alexander Nguyen                    v...@seas.ucla.edu
 Alexander V. Lukyanov               l...@netis.ru
-Allen Hewes                         al...@decisiv.net
-Axel Dörfler                        ax...@pinc-software.de
 Alexandre Duret-Lutz                dure...@epita.fr
 Alexey Solovyov                     ale...@math.uu.se
 Alexey Vyskubov                     ale...@pippuri.mawhrin.net
 Alfred M. Szmidt                    a...@kemisten.nu
+Allen Hewes                         al...@decisiv.net
 Andi Kleen                          frei...@alancoxonachip.com
 Andre Novaes Cunha                  andre.cu...@br.global-one.net
 Andreas Frische                     andreasfris...@gmail.com
@@ -61,13 +60,15 @@ Arvind Autar                        autar...@planet.nl
 Augey Mikus                         mi...@dqc.org
 Aurelien Jarno                      aure...@debian.org
 Austin Donnelly                     austin.donne...@cl.cam.ac.uk
+Axel Dörfler                        ax...@pinc-software.de
 Axel Kittenberger                   ans...@gmx.net
 Barry Kelly                         http://barrkel.blogspot.com/
 Bauke Jan Douma                     bjdo...@xs4all.nl
 Ben Elliston                        b...@air.net.au
 Ben Harris                          bj...@netbsd.org
-Benjamin Cutler                     cutle...@simla.colostate.edu
 Bengt Martensson                    be...@mathematik.uni-bremen.de
+Benjamin Cutler                     cutle...@simla.colostate.edu
+Benjamin Nuernberger                benjamin...@gmail.com
 Bernard Giroud                      bernard.gir...@creditlyonnais.ch
 Bernd Eckenfels                     e...@debian.org
 Bernd Leibing                       bernd.leib...@rz.uni-ulm.de
@@ -169,12 +170,12 @@ Elbert Pol                          elbert....@gmail.com
 Eli Zaretskii                       e...@is.elta.co.il
 Elias Pipping                       pipp...@gentoo.org
 Emile LeBlanc                       lebl...@math.toronto.edu
-Erik Auerswald                      auers...@unix-ag.uni-kl.de
 Eric Backus                         er...@lsid.hp.com
 Eric Blake                          e...@byu.net
 Eric G. Miller                      e...@jps.net
 Eric Pemente                        peme...@northpark.edu
 Eric S. Raymond                     e...@snark.thyrsus.com
+Erik Auerswald                      auers...@unix-ag.uni-kl.de
 Erik Bennett                        benn...@cvo.oneworld.com
 Erik Corry                          e...@kroete2.freinet.de
 Evan Hunt                           etha...@armory.com
@@ -207,7 +208,6 @@ Gerhard Poul                        gp...@gnu.org
 Germano Leichsenring                germ...@jedi.cs.kobe-u.ac.jp
 Glen Lenker                         glen.len...@gmail.com
 Göran Uddeborg                      goe...@uddeborg.se
-Guochun Shi                         g...@ncsa.uiuc.edu
 GOTO Masanori                       go...@debian.or.jp
 Greg Louis                          glo...@dynamicro.on.ca
 Greg McGary                         g...@gnu.org
@@ -218,6 +218,7 @@ Greg Wooledge                       gawoole...@sherwin.com
 Gregory Leblanc                     glebl...@cu-portland.edu
 Guido Leenders                      guido.leend...@invantive.com
 Guntram Blohm                       extern.guntram.bl...@audi.de
+Guochun Shi                         g...@ncsa.uiuc.edu
 H. J. Lu                            h...@valinux.com
 Hans Ginzel                         h...@matfyz.cz
 Hans Lermen                         ler...@fgan.de
@@ -232,8 +233,8 @@ Herbert Xu                          
herb...@gondor.apana.org.au
 Holger Berger                       hber...@ess.nec.de
 Hon-Yin Kok                         h...@yoda.unl.edu
 Hugh Daniel                         h...@xanadu.com
-Ian Bruce                           ian.br...@myrealbox.com
 Iain Calder                         i...@rogers.com
+Ian Bruce                           ian.br...@myrealbox.com
 Ian Jackson                         ijack...@chiark.greenend.org.uk
 Ian Lance Taylor                    i...@cygnus.com
 Ian Turner                          vec...@pipeline.com
@@ -243,8 +244,8 @@ Ingo Saitz                          i...@debian.org
 Ivo Timmermans                      i...@debian.org
 James                               ja...@albion.glarp.com
 James Antill                        jmanti%essex.ac...@seralph21.essex.ac.uk
-James Lemley                        james.lem...@acxiom.com
 James Hunt                          jamesodh...@hotmail.com
+James Lemley                        james.lem...@acxiom.com
 James Ralston                       rals...@pobox.com
 James Sneeringer                    j...@ocslink.com
 James Tanis                         j...@soscorp.com
@@ -318,8 +319,8 @@ Keith Thompson                      k...@cts.com
 Ken Pizzini                         k...@halcyon.com
 Kevin Mudrick                       kmudr...@healthmarketscience.com
 Kirk Kelsey                         kirk.kel...@0x4b.net
-Kristin E Thomas                    krist...@us.ibm.com
 Kjetil Torgrim Homme                kjeti...@ifi.uio.no
+Kristin E Thomas                    krist...@us.ibm.com
 Kristoffer Rose                     k...@diku.dk
 Larry McVoy                         l...@sgi.com
 Lars Hecking                        lheck...@nmrc.ucc.ie
@@ -408,8 +409,8 @@ Michiel Bacchiani                   bacch...@raven.bu.edu
 Mikael Magnusson                    mika...@gmail.com
 Mike Castle                         dalg...@ix.netcom.com
 Mike Coleman                        m...@mathdogs.com
-Mike Jetzer                         mjet...@mke.catalystwms.com
 Mike Frysinger                      vap...@gentoo.org
+Mike Jetzer                         mjet...@mke.catalystwms.com
 Mikko Tuumanen                      m...@sorvankyla.yok.utu.fi
 Mikulas Patocka                     miku...@artax.karlin.mff.cuni.cz
 Miles Bader                         mi...@gnu.ai.mit.edu
@@ -510,6 +511,7 @@ Savochkin Andrey Vladimirovich      s...@msu.ru
 Scott Lurndal                       sl...@griffin.engr.sgi.com
 Sébastien Maret                     sma...@umich.edu
 Shing-Shong Shei                    s...@cs.indiana.edu
+Sky Lin                             chip...@gmail.com
 Soeren Sonnenburg                   sonnenb...@informatik.hu-berlin.de
 Solar Designer                      so...@owl.openwall.com
 Stanislav Ievlev                    in...@altlinux.ru
@@ -530,9 +532,10 @@ Stuart Shelton                      stu...@shelton.me
 Sven Joachim                        svenj...@gmx.de
 Szakacsits Szabolcs                 sz...@sienet.hu
 Tadayoshi Funaba                    t...@kt.rim.or.jp
+TaeSung Roh                         abraham....@gmail.com
 TAKAI Kousuke                       ta...@vlsi.kuee.kyoto-u.ac.jp
-Theodore Ts'o                       ty...@rsts-11.mit.edu
 The Wanderer                        inversepara...@comcast.net
+Theodore Ts'o                       ty...@rsts-11.mit.edu
 Theodoros V. Kalamatianos           n...@users.sourceforge.net
 Thomas Bushnell                     tho...@gnu.ai.mit.edu
 Thomas Goerlich                     tho...@schnappmatik.de
diff --git a/TODO b/TODO
index 7288285..ee7e850 100644
--- a/TODO
+++ b/TODO
@@ -102,6 +102,14 @@ sort: Investigate better sorting algorithms; see Knuth 
vol. 3.
   5.3.1, who credits Lester Ford, Jr. and Selmer Johnson, American
   Mathematical Monthly 66 (1959), 387-389.
 
+  The current method of parallelization can be improved.  Parent
+  threads wait for children to finish, which can be a bottleneck.  It
+  might be better, at least when computation nears the root of the
+  merge tree, for parent threads to accept outputs from their children
+  one by one, and thus merge while their children are merging.  This
+  will require some inter-thread communication (e.g., via shared
+  objects that are accessed safely via mutual exclusion).
+
 shred: Update shred as described here to conform to DoD 5220 rules:
 http://lists.gnu.org/archive/html/bug-coreutils/2007-05/msg00075.html
 
diff --git a/bootstrap.conf b/bootstrap.conf
index 0747bb8..7358e10 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -74,12 +74,19 @@ gnulib_modules="
        memcasecmp memcmp2 mempcpy
        memrchr mgetgroups
        mkancesdirs mkdir mkdir-p mkstemp mktime modechange
-       mountlist mpsort obstack pathmax perl physmem
+       mountlist
+       mpsort
+       nproc
+       obstack
+       pathmax
+       perl
+       physmem
        posix-shell
        posixtm
        posixver
        progname
        propername
+       pthread
        putenv
        quote quotearg raise readlink areadlink-with-size
        randint
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 70effa1..a9ad20e 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -4043,6 +4043,14 @@ have a large sort or merge that is I/O-bound, you can 
often improve
 performance by using this option to specify directories on different
 disks and controllers.
 
+...@item --threa...@var{n}
+Use no more than @var{n} threads to improve parallelism and thus
+reduce the wall-clock time needed for the sort.  The default @var{n}
+is the number of processors currently online, rounded down to the
+nearest power of two.  This default may change in the future.  If
+...@var{n} is a power of two, increasing it to a value less than 2...@var{n}
+does not typically help performance.
+
 @item -u
 @itemx --unique
 @opindex -u
diff --git a/src/Makefile.am b/src/Makefile.am
index 2313ed3..6113e5f 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -118,7 +118,7 @@ tac_LDADD = $(LDADD) $(LIB_CLOCK_GETTIME)
 vdir_LDADD = $(LDADD) $(LIB_CLOCK_GETTIME) $(LIB_SELINUX) $(LIB_CAP)
 
 ## If necessary, add -lm to resolve use of pow in lib/strtod.c.
-sort_LDADD = $(LDADD) $(POW_LIB) $(LIB_GETHRXTIME)
+sort_LDADD = $(LDADD) $(POW_LIB) $(LIB_GETHRXTIME) $(LIB_PTHREAD)
 
 # for get_date and gettime
 date_LDADD = $(LDADD) $(LIB_CLOCK_GETTIME)
diff --git a/src/sort.c b/src/sort.c
index ced0f2d..b4f0dfb 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -23,6 +23,7 @@
 #include <config.h>
 
 #include <getopt.h>
+#include <pthread.h>
 #include <sys/types.h>
 #include <sys/wait.h>
 #include <signal.h>
@@ -32,6 +33,7 @@
 #include "filevercmp.h"
 #include "hash.h"
 #include "md5.h"
+#include "nproc.h"
 #include "physmem.h"
 #include "posixver.h"
 #include "quote.h"
@@ -83,6 +85,13 @@ struct rlimit { size_t rlim_cur; };
 # define OPEN_MAX 20
 #endif
 
+/* Heuristic value for the number of lines for which it is worth
+   creating a subthread, during an internal merge sort, on a machine
+   that has processors galore.  Currently this number is just a guess.
+   This value must be at least 4.  We don't know of any machine where
+   this number has any practical effect.  */
+enum { SUBTHREAD_LINES_HEURISTIC = 4 };
+
 #define UCHAR_LIM (UCHAR_MAX + 1)
 
 #ifndef DEFAULT_TMPDIR
@@ -289,8 +298,6 @@ static char const *compress_program;
    number are present, temp files will be used. */
 static unsigned int nmerge = NMERGE_DEFAULT;
 
-static void sortlines_temp (struct line *, size_t, struct line *);
-
 /* Report MESSAGE for FILE, then clean up and exit.
    If FILE is null, it represents standard output.  */
 
@@ -380,6 +387,7 @@ Other options:\n\
   -t, --field-separator=SEP  use SEP instead of non-blank to blank 
transition\n\
   -T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or %s;\n\
                               multiple options specify multiple directories\n\
+      --threads=N           use no more than N threads to improve 
parallelism\n\
   -u, --unique              with -c, check for strict ordering;\n\
                               without -c, output only the first of an equal 
run\n\
 "), DEFAULT_TMPDIR);
@@ -423,7 +431,8 @@ enum
   FILES0_FROM_OPTION,
   NMERGE_OPTION,
   RANDOM_SOURCE_OPTION,
-  SORT_OPTION
+  SORT_OPTION,
+  THREADS_OPTION
 };
 
 static char const short_options[] = "-bcCdfgik:mMno:rRsS:t:T:uVy:z";
@@ -455,6 +464,7 @@ static struct option const long_options[] =
   {"temporary-directory", required_argument, NULL, 'T'},
   {"unique", no_argument, NULL, 'u'},
   {"zero-terminated", no_argument, NULL, 'z'},
+  {"threads", required_argument, NULL, THREADS_OPTION},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
   {NULL, 0, NULL, 0},
@@ -1253,6 +1263,21 @@ specify_sort_size (int oi, char c, char const *s)
   xstrtol_fatal (e, oi, c, long_options, s);
 }
 
+/* Specify the number of threads to spawn during internal sort.  */
+static unsigned long int
+specify_nthreads (int oi, char c, char const *s)
+{
+  unsigned long int nthreads;
+  enum strtol_error e = xstrtoul (s, NULL, 10, &nthreads, "");
+  if (e == LONGINT_OVERFLOW)
+    return ULONG_MAX;
+  if (e != LONGINT_OK)
+    xstrtol_fatal (e, oi, c, long_options, s);
+  if (nthreads == 0)
+    error (SORT_FAILURE, 0, _("number of threads must be nonzero"));
+  return nthreads;
+}
+
 /* Return the default sort size.  */
 static size_t
 default_sort_size (void)
@@ -2435,25 +2460,28 @@ mergefiles (struct sortfile *files, size_t ntemps, 
size_t nfiles,
   return nopened;
 }
 
-/* Merge into T the two sorted arrays of lines LO (with NLO members)
-   and HI (with NHI members).  T, LO, and HI point just past their
-   respective arrays, and the arrays are in reverse order.  NLO and
-   NHI must be positive, and HI - NHI must equal T - (NLO + NHI).  */
+/* Merge into T (of size NLINES) the two sorted arrays of lines
+   LO (with NLINES / 2 members), and
+   T - (NLINES / 2) (with NLINES - NLINES / 2 members).
+   T and LO point just past their respective arrays, and the arrays
+   are in reverse order.  NLINES must be at least 2.  */
 
 static inline void
-mergelines (struct line *t,
-           struct line const *lo, size_t nlo,
-           struct line const *hi, size_t nhi)
+mergelines (struct line *restrict t, size_t nlines,
+           struct line const *restrict lo)
 {
+  size_t nlo = nlines / 2;
+  size_t nhi = nlines - nlo;
+  struct line const *hi = t - nlo;
+
   for (;;)
     if (compare (lo - 1, hi - 1) <= 0)
       {
        *--t = *--lo;
        if (! --nlo)
          {
-           /* HI - NHI equalled T - (NLO + NHI) when this function
-              began.  Therefore HI must equal T now, and there is no
-              need to copy from HI to T.  */
+           /* HI must equal T now, and there is no need to copy from
+              HI to T.  */
            return;
          }
       }
@@ -2471,53 +2499,54 @@ mergelines (struct line *t,
       }
 }
 
-/* Sort the array LINES with NLINES members, using TEMP for temporary space.
-   NLINES must be at least 2.
-   The input and output arrays are in reverse order, and LINES and
-   TEMP point just past the end of their respective arrays.
 
-   Use a recursive divide-and-conquer algorithm, in the style
-   suggested by Knuth volume 3 (2nd edition), exercise 5.2.4-23.  Use
-   the optimization suggested by exercise 5.2.4-10; this requires room
-   for only 1.5*N lines, rather than the usual 2*N lines.  Knuth
-   writes that this memory optimization was originally published by
-   D. A. Bell, Comp J. 1 (1958), 75.  */
+static void sortlines (struct line *restrict, size_t, struct line *restrict,
+                      unsigned long int, bool);
 
-static void
-sortlines (struct line *lines, size_t nlines, struct line *temp)
+/* Thread arguments for sortlines_thread.  */
+struct thread_args
 {
-  if (nlines == 2)
-    {
-      if (0 < compare (&lines[-1], &lines[-2]))
-       {
-         struct line tmp = lines[-1];
-         lines[-1] = lines[-2];
-         lines[-2] = tmp;
-       }
-    }
-  else
-    {
-      size_t nlo = nlines / 2;
-      size_t nhi = nlines - nlo;
-      struct line *lo = lines;
-      struct line *hi = lines - nlo;
-      struct line *sorted_lo = temp;
-
-      sortlines (hi, nhi, temp);
-      if (1 < nlo)
-       sortlines_temp (lo, nlo, sorted_lo);
-      else
-       sorted_lo[-1] = lo[-1];
+  struct line *lines;
+  size_t nlines;
+  struct line *temp;
+  unsigned long int nthreads;
+  bool to_temp;
+};
 
-      mergelines (lines, sorted_lo, nlo, hi, nhi);
-    }
+/* Like sortlines, except with a signature acceptable to pthread_create.  */
+static void *
+sortlines_thread (void *data)
+{
+  struct thread_args const *args = data;
+  sortlines (args->lines, args->nlines, args->temp, args->nthreads,
+            args->to_temp);
+  return NULL;
 }
 
-/* Like sortlines (LINES, NLINES, TEMP), except output into TEMP
-   rather than sorting in place.  */
+/* Sort the array LINES with NLINES members, using TEMP for temporary space,
+   spawning NTHREADS threads.  NLINES must be at least 2.
+   The input and output arrays are in reverse order, and LINES and
+   TEMP point just past the end of their respective arrays.
+
+   If TO_TEMP, place the result in TEMP (trashing LINES in the
+   process); otherwise, place the result back into LINES so that it is
+   an in-place sort (trashing TEMP in the process).
+
+   Use a recursive divide-and-conquer algorithm, in the style
+   suggested by Knuth volume 3 (2nd edition), exercise 5.2.4-23.  If
+   multithreaded, this requires that TEMP contain NLINES entries; if
+   singlethreaded, use the optimization suggested by Knuth exercise
+   5.2.4-10, which requires room for only 1.5*N lines, rather than the
+   usual 2*N lines.  Knuth writes that this memory optimization was
+   originally published by D. A. Bell, Comp J. 1 (1958), 75.
+
+   This function is inline so that its tests for multthreadedness and
+   inplacedness can be optimized away in common cases.  */
 
 static void
-sortlines_temp (struct line *lines, size_t nlines, struct line *temp)
+sortlines (struct line *restrict lines, size_t nlines,
+          struct line *restrict temp,
+          unsigned long int nthreads, bool to_temp)
 {
   if (nlines == 2)
     {
@@ -2525,8 +2554,17 @@ sortlines_temp (struct line *lines, size_t nlines, 
struct line *temp)
         <http://lists.gnu.org/archive/html/bug-coreutils/2005-10/msg00086.html>
         in the IBM xlc 6.0.0.0 compiler in 64-bit mode.  */
       int swap = (0 < compare (&lines[-1], &lines[-2]));
-      temp[-1] = lines[-1 - swap];
-      temp[-2] = lines[-2 + swap];
+      if (to_temp)
+       {
+         temp[-1] = lines[-1 - swap];
+         temp[-2] = lines[-2 + swap];
+       }
+      else if (swap)
+       {
+         temp[-1] = lines[-1];
+         lines[-1] = lines[-2];
+         lines[-2] = temp[-1];
+       }
     }
   else
     {
@@ -2534,13 +2572,43 @@ sortlines_temp (struct line *lines, size_t nlines, 
struct line *temp)
       size_t nhi = nlines - nlo;
       struct line *lo = lines;
       struct line *hi = lines - nlo;
-      struct line *sorted_hi = temp - nlo;
 
-      sortlines_temp (hi, nhi, sorted_hi);
-      if (1 < nlo)
-       sortlines (lo, nlo, temp);
+      unsigned long int child_subthreads = nthreads / 2;
+      unsigned long int my_subthreads = nthreads - child_subthreads;
+      pthread_t thread;
+      struct thread_args args = {hi, nhi, temp - nlo, child_subthreads, 
to_temp};
+
+      if (child_subthreads != 0 && SUBTHREAD_LINES_HEURISTIC <= nlines
+         && pthread_create (&thread, NULL, sortlines_thread, &args) == 0)
+       {
+         /* Guarantee that nlo and nhi are each at least 2.  */
+         verify (4 <= SUBTHREAD_LINES_HEURISTIC);
+
+         sortlines (lo, nlo, temp, my_subthreads, !to_temp);
+         pthread_join (thread, NULL);
+       }
+      else
+       {
+         sortlines (hi, nhi, temp - (to_temp ? nlo : 0), 1, to_temp);
+         if (1 < nlo)
+           sortlines (lo, nlo, temp, 1, !to_temp);
+         else if (!to_temp)
+           temp[-1] = lo[-1];
+       }
 
-      mergelines (temp, lo, nlo, sorted_hi, nhi);
+      struct line *dest;
+      struct line const *sorted_lo;
+      if (to_temp)
+       {
+         dest = temp;
+         sorted_lo = lines;
+       }
+      else
+       {
+         dest = lines;
+         sorted_lo = temp;
+       }
+      mergelines (dest, nlines, sorted_lo);
     }
 }
 
@@ -2744,7 +2812,8 @@ merge (struct sortfile *files, size_t ntemps, size_t 
nfiles,
 /* Sort NFILES FILES onto OUTPUT_FILE. */
 
 static void
-sort (char * const *files, size_t nfiles, char const *output_file)
+sort (char * const *files, size_t nfiles, char const *output_file,
+      unsigned long int nthreads)
 {
   struct buffer buf;
   size_t ntemps = 0;
@@ -2758,8 +2827,11 @@ sort (char * const *files, size_t nfiles, char const 
*output_file)
       char const *file = *files;
       FILE *fp = xfopen (file, "r");
       FILE *tfp;
+
+      /* If singlethreaded, the merge uses the memory optimization
+        suggested in Knuth exercise 5.2.4-10; see sortlines.  */
       size_t bytes_per_line = (2 * sizeof (struct line)
-                              - sizeof (struct line) / 2);
+                              - (1 < nthreads ? 0 : sizeof (struct line) / 2));
 
       if (! buf.alloc)
        initbuf (&buf, bytes_per_line,
@@ -2787,7 +2859,7 @@ sort (char * const *files, size_t nfiles, char const 
*output_file)
          line = buffer_linelim (&buf);
          linebase = line - buf.nlines;
          if (1 < buf.nlines)
-           sortlines (line, buf.nlines, linebase);
+           sortlines (line, buf.nlines, linebase, nthreads, false);
          if (buf.eof && !nfiles && !ntemps && !buf.left)
            {
              xfclose (fp, file);
@@ -3039,6 +3111,7 @@ main (int argc, char **argv)
   bool mergeonly = false;
   char *random_source = NULL;
   bool need_random = false;
+  unsigned long int nthreads = 0;
   size_t nfiles = 0;
   bool posixly_correct = (getenv ("POSIXLY_CORRECT") != NULL);
   bool obsolete_usage = (posix2_version () < 200112);
@@ -3361,6 +3434,10 @@ main (int argc, char **argv)
          add_temp_dir (optarg);
          break;
 
+       case THREADS_OPTION:
+         nthreads = specify_nthreads (oi, c, optarg);
+         break;
+
        case 'u':
          unique = true;
          break;
@@ -3506,6 +3583,9 @@ main (int argc, char **argv)
 
   if (need_random)
     {
+      /* Threading does not lock the randread_source structure, so
+        downgrade to one thread to avoid race conditions. */
+      nthreads = 1;
       randread_source = randread_new (random_source, MD5_DIGEST_SIZE);
       if (! randread_source)
        die (_("open failed"), random_source);
@@ -3560,7 +3640,21 @@ main (int argc, char **argv)
       IF_LINT (free (sortfiles));
     }
   else
-    sort (files, nfiles, outfile);
+    {
+      if (!nthreads)
+       {
+         /* The default NTHREADS is 2 ** floor (log2 (number of processors)).
+            If comparisons do not vary in cost and threads are
+            scheduled evenly, with the current algorithm there is no
+            performance advantage to using a number of threads that
+            is not a power of 2.  */
+         unsigned long int np2 = num_processors () / 2;
+         for (nthreads = 1; nthreads <= np2; nthreads *= 2)
+           continue;
+       }
+
+      sort (files, nfiles, outfile, nthreads);
+    }
 
   if (have_read_stdin && fclose (stdin) == EOF)
     die (_("close failed"), "-");
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 5f150ad..2e2bfec 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -201,6 +201,7 @@ TESTS =                                             \
   misc/shred-remove                            \
   misc/shuf                                    \
   misc/sort                                    \
+  misc/sort-benchmark-random                   \
   misc/sort-compress                           \
   misc/sort-continue                           \
   misc/sort-files0-from                                \
diff --git a/tests/misc/sort-benchmark-random b/tests/misc/sort-benchmark-random
new file mode 100644
index 0000000..2bace4f
--- /dev/null
+++ b/tests/misc/sort-benchmark-random
@@ -0,0 +1,67 @@
+#!/bin/sh
+# Benchmark sort on randomly generated data.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Written by Glen Lenker.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  sort --version
+fi
+
+. $srcdir/test-lib.sh
+
+very_expensive_
+
+# Use the 'C' locale to avoid problems in case Perl's sort isn't stable.
+LC_ALL=C
+export LC_ALL
+
+fail=0
+
+perl -e '
+my $num_lines = 500000;
+my $length = 100;
+
+for (my $i=0; $i < $num_lines; $i++)
+{
+    for (my $j=0; $j < $length; $j++)
+    {
+       printf "%c", 32 + rand(94);
+    }
+    print "\n";
+}' > in || framework_failure
+
+# We need to generate a lot of data for sort to show a noticeable
+# improvement in performance. Sorting it in PERL may take awhile.
+
+perl -e '
+open (FILE, "<in");
+my @list = <FILE>;
+print sort(@list);
+close (FILE);
+' > exp || framework_failure
+
+#start=$(date +%s)
+time sort in > out || fail=1
+#duration=$(expr $(date +%s) - $start)
+
+#echo sorting the random data took $duration seconds
+
+compare out exp || fail=1
+
+Exit $fail
-- 
1.6.2.1



_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils

[PATCH] sort: Add --threads option, which parallelizes internal sort.

Reply via email to