Jim Meyering wrote:
> Jim Meyering wrote:
> ...
>> In case anyone is chomping at the bit, here's a preliminary patch:
>>
>> Here's a smaller test case that appears to be host/nproc-independent:
>> It should print two lines: 1, then 7.
>> Without this patch, it prints only "7".
>>
>>     (yes 7|head -11; echo 1)|sort --parallel=1 -S32b -u
> ...
> Here's a complete patch:
>
>>From 431102766cbf7c360ee6fa1f157ebcd7d8b9ca0e Mon Sep 17 00:00:00 2001
> From: Jim Meyering <[email protected]>
> Date: Wed, 15 Aug 2012 12:30:44 +0200
> Subject: [PATCH] sort: sort --unique (-u) could cause data loss
>
> sort -u could omit one or more lines of expected output.
> This bug arose because sort recorded the most recently printed line via
> reference, and if you were unlucky, the storage for that line would be
> reused (overwritten) as additional input was read into memory.  If you
> were doubly unlucky, the new value of the "saved" line would not only
> match the very next line, but if that next line were also the first in
> a series of identical, not-yet-printed lines, then the corrupted "saved"
> line value would result in the omission of all matching lines.
>
> * src/sort.c (saved_line): New static/global, renamed and moved from...
> (write_unique): ...here.  Old name was "saved", which was too generic
> for its new role as file-scoped global.
> (fillbuf): With --unique, when we're about to read into a buffer that
> overlaps the saved "preceding" line (saved_line), copy the line's .text
> member to a realloc'd-as-needed temporary buffer and adjust the line's
> key-defining members if they're set.
> (overlap): New function.
> * tests/misc/sort: New tests.
> * NEWS (Bug fixes): Mention it.
> * THANKS.in: Update.
> Bug introduced via commit v8.5-89-g9face83.
> Reported by Rasmus Borup Hansen in
> http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/23173/focus=24647
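
For anyone following along without the diff, the idea in the log above
can be sketched roughly as follows.  This is only an illustration, not
the code from src/sort.c: "struct line_ref", "regions_overlap" and
"preserve_saved_line" are hypothetical names made up for this sketch,
and the real patch works on sort's own line and buffer structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a line that is remembered by reference.  */
struct line_ref
{
  char *text;     /* start of the line's bytes */
  size_t length;  /* number of bytes in the line */
  char *keybeg;   /* start of the key within TEXT, or NULL if unset */
  char *keylim;   /* end of the key within TEXT, or NULL if unset */
};

/* Return true if the SIZE bytes at BUF intersect the bytes of LINE.  */
static bool
regions_overlap (char const *buf, size_t size, struct line_ref const *line)
{
  uintptr_t b = (uintptr_t) buf;
  uintptr_t t = (uintptr_t) line->text;
  return line->text && b < t + line->length && t < b + size;
}

/* Before the SIZE bytes at BUF are overwritten, move LINE's text into
   *SCRATCH (grown as needed) if it lives inside that region, and
   rebase the key pointers.  Return false only if allocation fails.  */
static bool
preserve_saved_line (struct line_ref *line, char const *buf, size_t size,
                     char **scratch, size_t *scratch_size)
{
  if (!regions_overlap (buf, size, line))
    return true;                /* nothing is about to be clobbered */

  if (*scratch_size < line->length)
    {
      char *p = realloc (*scratch, line->length);
      if (!p)
        return false;
      *scratch = p;
      *scratch_size = line->length;
    }

  memcpy (*scratch, line->text, line->length);

  /* Key pointers refer into the old storage; keep their offsets.  */
  if (line->keybeg)
    line->keybeg = *scratch + (line->keybeg - line->text);
  if (line->keylim)
    line->keylim = *scratch + (line->keylim - line->text);

  line->text = *scratch;
  return true;
}
```

The point of the sketch is just that the comparison in the unique-output
path must never look at bytes that a later read is free to overwrite.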

That sort -u can cause data loss is a big deal.
I want to make a release with this fix as soon as possible.

Since I'm making this a mostly-bug-fix release, the du and md5 --tag
changes will have to wait for 8.20.  However, I'll be happy to apply
documentation-correcting changes if someone would post a complete,
updated patch or two.

If Bruce and Paul find that changing gnulib's parse-datetime test
will avoid a failure on LFS, I'll pull in a gnulib update for that.

Any other bug-fix-like changes that people can suggest?
