< Paul Eggert <[EMAIL PROTECTED]> wrote:
< ...
< > Hmm, it sounds like your input data has some very long lines, then.
< > That would explain at least part of your problem, then.  'sort' needs
< > to keep at least two lines in main memory to compare them: if single
< > input lines are many gigabytes long, then 'sort' must consume many
< > gigabytes of memory, regardless of what parameter you specify with '-S'.
<
< You can run this to find the maximum line length:
<
<   wc --max-line-length your-data

Ok, first, let me thank Jim, Bob and Paul.

Here is the problem in a nutshell: wc is counting with long ints, and
the first line of this 50GB file is a string of \0 whose length appears
to be negative when counted with long ints. (Details below.) I believe
that this must be an error in the header file where 'uintmax_t' is
defined.

I do not know whether this behaviour counts as a bug in sort, but it
seems to me that sort might issue a warning if it encounters 'n>0'
consecutive null characters in a file.

---

I have squeezed out the null characters with tr and am attempting to
sort the transformed file. This has shrunk the file from 50GB to 7GB,
so I anticipate no problems. I will report back.

---

Leo Butler.

Details:
-------
In my original post I mentioned that I did count the max line length:

$ /usr/bin/wc -L /data/espace/k_400_a.out
107

Here is the censored output of a routine that counts the occurrences of
all ascii characters:

$ ./census /data/espace/k_400_a.out
Ascii char              Count
----------              -----
\0 Null character       -1363090872
(snip)
The longest line was identified at about line 65x10^6 with 108 chars incl. \n.

Ouch! Look at that count of \0. The routine was counting with long
ints, so I recompiled it with unsigned longs, and got

Ascii char              Count
----------              -----
\0 Null character       2931876424
(snip)
Longest line 2931876444 chars at line 1

The two counts of \0 are congruent mod 2^32 (that is, mod ULONG_MAX + 1
with a 32-bit long): 2931876424 + 1363090872 = 4294967296. Apparently,
the first line contained roughly 42GB worth of null characters. I have
no bleeding idea how this crept in.

LB.

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
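
For anyone wanting to reproduce the arithmetic in the Details above, here
is a minimal sketch in C. It is not the census routine from this message:
the names census_stream, struct census and as_wrapped_int32 are made up for
illustration, and the line length here excludes the trailing newline. It
counts with uintmax_t (at least 64 bits in C99) so the totals cannot wrap
on a 50GB file, and it shows what a 32-bit signed counter would have
displayed for a given true count.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result type; not taken from the original census routine. */
struct census {
    uintmax_t nul_count;    /* number of \0 bytes seen */
    uintmax_t longest_line; /* longest line in bytes, excluding the newline */
    uintmax_t longest_at;   /* 1-based number of that line (0 if input is empty) */
};

/* Count NUL bytes and track the longest line, using uintmax_t throughout. */
static struct census census_stream(FILE *stream)
{
    struct census c = {0, 0, 0};
    uintmax_t line = 1, len = 0;
    int ch;

    assert(stream != NULL);
    while ((ch = getc(stream)) != EOF) {
        if (ch == '\n') {
            if (len > c.longest_line) {
                c.longest_line = len;
                c.longest_at = line;
            }
            line++;
            len = 0;
        } else {
            if (ch == '\0')
                c.nul_count++;
            len++;
        }
    }
    /* Account for a final line that has no trailing newline. */
    if (len > c.longest_line) {
        c.longest_line = len;
        c.longest_at = line;
    }
    return c;
}

/* Value a wrapping 32-bit signed counter would show for a true count.
   Done by hand, because an out-of-range conversion to a signed type is
   implementation-defined in C. */
static int32_t as_wrapped_int32(uintmax_t true_count)
{
    uint32_t low = (uint32_t)(true_count & UINT32_C(0xFFFFFFFF));

    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)((int64_t)low - INT64_C(4294967296));
}
```

Calling as_wrapped_int32 on 2931876424 gives -1363090872, the negative
count shown above, and adding any multiple of 2^32 to the argument gives
the same result. That is why the unsigned 32-bit figure can itself be
short of the true count by a multiple of 2^32. To print the uintmax_t
fields, "%ju" (or PRIuMAX from <inttypes.h>) is the matching printf
conversion.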