Jeremy Chadwick <[EMAIL PROTECTED]> writes: > As you said: the code shows that when no files are specified (e.g. read > off a pipe), sort will make some assumptions regarding the initial > buffer size to read data into. The buffer size allocated in that case > is fairly large, rather than basing it off of the first line off stdin; > it looks like this is done to save CPU time in the long run (otherwise > you'd have to rellocate more later and take a hit; initbuf() is > responsible for that).
Oh give me a break, the self-starting exponential algorithm for growth of dynamically allocated buffers has been known for decades. In case any GNU sort developers are reading this, here it comes, free of charge: static char *buf = NULL; static size_t bufsz = 0; static size_t buflen = 0; int buf_append(const char *str) { size_t len; len = strlen(str); if (buflen + len + 1 > bufsz) { size_t nbufsz = bufsz; char *nbuf; while (buflen + len + 1 > nbufsz) nbufsz = nbufsz * 2 + 1; nbuf = realloc(buf, nbufsz); if (nbuf == NULL) return (-1); buf = nbuf; bufsz = nbufsz; } memcpy(buf + buflen, str, len); buf[buflen + len] = '\0'; buflen += len; return (0); } With a good allocator - and depending to some extent on the memory usage pattern of the rest of your program - if you jump-start it by initally allocating 16 kB or so (and setting bufsz accordingly), realloc() will never need to copy anything - but even in the worst case, the amortized cost will be O(2n), IIRC. This is practically unnoticeable next to the cost of the sorting algorithm itself, which will be O(n log n) at best and O(n*n) at worst. > > Looking at the code, it seems to go to extreme lengths to get it > > absolutely wrong. For instance, if hw.physmem / 8 > hw.usermem, it will > > pick the former, which means it's pretty much guaranteed to either fail > > or hose your system (or both). > Can you expand on this? Looking at the code, it doesn't appear that's > possible. The code in question is default_sort_size(), which is used > when no -S or --buffer-size argument is specified. I looked at how it computes the cap, which is MAX(total / 8, avail) - in other words, never mind what's actually feasible, I want more! More! More, I say! > > Count this as a vote for ditching GNU sort in favor of a BSD-licensed > > implementation (from {Net,Open}BSD for instance). > In this specific case, I think you're bashing GNU just because you feel > like it. Come on man... =/ No, I'm bashing GNU because it's bloated crap, as this example clearly shows. It wouldn't be the first time a BSD rewrite of a GNU tool ended up performing better; see for instance bsdtar. Besides, the FreeBSD project has a tradition of replacing GNU tools with BSD-licensed equivalents as long as no functionality is lost. DES -- Dag-Erling Smørgrav - [EMAIL PROTECTED] _______________________________________________ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"