Pádraig Brady <[EMAIL PROTECTED]> wrote:
> Jim Meyering wrote:
>>
>> Hi Matt,
>>
>> I'm glad you're willing to work on this.
>> It's an often-requested feature.
>> Unfortunately, the Debian -W patch was not acceptable.
>> It did not allow the same flexibility that sort does in
>> selecting keys.  To provide that, GNU uniq will eventually
>> accept at least the following options, just as sort does:
>>
>>   -k, --key=POS1[,POS2]       start a key at POS1, end it at POS2 (origin 1)
>>   -t, --field-separator=SEP   use SEP instead of non-blank to blank transition
>>   -z, --zero-terminated       end lines with 0 byte, not newline
>>
>> and even most, if not all, of these (for flexibility/interoperability
>> with sort, as well as to ease code sharing between uniq and sort):
>>
>>   -b, --ignore-leading-blanks ignore leading blanks
>>   -d, --dictionary-order      consider only blanks and alphanumeric characters
>>   -i, --ignore-nonprinting    consider only printable characters
>
> agreed
>
>>   -f, --ignore-case           fold lower case to upper case characters
>
> It has this already. See below.
>
>>   -g, --general-numeric-sort  compare according to general numerical value
>>   -M, --month-sort            compare (unknown) < `JAN' < ... < `DEC'
>>   -n, --numeric-sort          compare according to string numerical value
>>   -r, --reverse               reverse the result of comparisons
>
> These 4 deal with specific order which I don't think uniq should worry about?

You're right about --reverse.  Thanks.
However, the others change sort's idea of which values are equal,
so they are relevant.  For -g, 0.0 == 0 == 00, etc.
For -M, FEB == feb == Feb, etc.  For -n, 00 == 0.

The idea is to be able to use uniq with the same keyspec options
as you used when sorting the data.  That means the command-line options
listed above as well as the key spec modifier options like b, d, g, M etc.
used e.g., in -k 1b,1 -k 2n.

> uniq can be efficient and assume LANG=C always as
> it need only care if adjacent items match or not.
> Assuming LANG=C may be an issue for --ignore-case though?
> However I notice v5.2.1 at least only seems to handle ascii:
>
> $ LANG=ga_IE.utf8 uniq -i < Pádraig
> Pádraig
> PÁdraig

Yes, that's still a problem.
Would you like to work on it?
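
To make the equality point concrete, here is a rough Python sketch (not
coreutils code, and not a proposed implementation) of a uniq that collapses
adjacent lines whose keys compare equal. The names uniq, general_numeric,
and month are hypothetical, and the key functions only approximate what
sort's -g and -M do: no locale handling, no -k character offsets, and no
special treatment of NaN.

```python
from itertools import groupby

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def general_numeric(field):
    """Approximate -g: compare by floating-point value.

    Assumption: fields that do not parse as numbers all map to None,
    so they compare equal to each other.
    """
    try:
        return float(field)
    except ValueError:
        return None


def month(field):
    """Approximate -M: case-insensitive month name, unknown maps to 0."""
    name = field.strip()[:3].upper()
    return MONTHS.index(name) + 1 if name in MONTHS else 0


def uniq(lines, key=lambda f: f, field=None, sep=None):
    """Yield the first line of each run of adjacent lines with equal keys.

    field is a 1-based field number (a simplified stand-in for -k N,N);
    sep=None splits on runs of whitespace.
    """
    def line_key(line):
        if field is None:
            return key(line)
        parts = line.split(sep)
        return key(parts[field - 1]) if field <= len(parts) else key("")

    for _, run in groupby(lines, key=line_key):
        yield next(run)


if __name__ == "__main__":
    print(list(uniq(["0.0", "0", "00", "1"], key=general_numeric)))
    print(list(uniq(["FEB", "feb", "Feb", "Mar"], key=month)))
```

With a plain byte-wise comparison, "0" and "00" stay distinct; with the
numeric key they collapse into one line, which is why uniq needs the same
key options that were used to sort the data.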