Living on the edge and top posting aggressively, these changes would be good to have.
Thanks,
Alistair

On Saturday, May 28, 2016, Robert Elz <k...@munnari.oz.au> wrote:

> Inspired by Paul Goyette's question (on netbsd-users) I took a look at
> sort, and I'd like to commit the following updates if no-one objects.
>
> The only changes that should affect anything are the addition of the
> POSIX -C option, which is identical to -c, but doesn't write messages
> to stderr if the input file is not sorted, and fixing bugs in the
> processing of -R such that if -R is used (and without setting it to \n -
> the processing of which is also fixed in case it is set that way using
> -R 10) \n does not become a field separator regardless of what might be
> set later with -t (if -t preceded -R it would have worked correctly, but
> not the other way.)
>
> Aside from that the changes are more or less cosmetic - they enforce
> using only one of -c -C and -m (which make no sense used together), make
> the usage() reflect reality (including formatting it to stop assuming it
> is outputting to an 80 column display..., and reflect the man page
> changes mentioned next), and fix a minor bug in a comment, removed the
> unused 'x' option (what was that?) from SORT_OPTS (no effect, generates
> usage() either way) and sorted the option processing (R comes before
> S...)
>
> In the man page, -C is documented, the synopsis is split to show the
> (only one file allowed) different usage for -C or -c, and perhaps most
> importantly, the names "field1" and "field2" are changed to "kstart" and
> "kend" to make it (a little more) clear that the -k argument does not
> specify or use a field as such, but designates the start and end of the
> sort keys (with the designators using fields as an addressing object -
> which is all fields are used for in sort, unlike awk, cut, etc.) and -R
> is fully documented.
>
> There are no changes (at all) to anything actually related to sorting...
>
> Anyone object to these changes?
> (patch appended)
>
> kre
>
> Index: msort.c
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/msort.c,v
> retrieving revision 1.30
> diff -u -r1.30 msort.c
> --- msort.c 5 Feb 2010 21:58:42 -0000 1.30
> +++ msort.c 29 May 2016 05:00:34 -0000
> @@ -365,7 +365,7 @@
>   * check order on one file
>   */
>  void
> -order(struct filelist *filelist, struct field *ftbl)
> +order(struct filelist *filelist, struct field *ftbl, int quiet)
>  {
>  	get_func_t get = SINGL_FLD ? makeline : makekey;
>  	RECHEADER *crec, *prec, *trec;
> @@ -387,10 +387,14 @@
>  		exit(0);
>  	while (get(fp, crec, crec_end, ftbl) == 0) {
>  		if (0 < (c = cmp(prec, crec))) {
> +			if (quiet)
> +				exit(1);
>  			crec->data[crec->length-1] = 0;
>  			errx(1, "found disorder: %s",
>  			    crec->data+crec->offset);
>  		}
>  		if (UNIQUE && !c) {
> +			if (quiet)
> +				exit(1);
>  			crec->data[crec->length-1] = 0;
>  			errx(1, "found non-uniqueness: %s",
>  			    crec->data+crec->offset);
> Index: sort.1
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/sort.1,v
> retrieving revision 1.34
> diff -u -r1.34 sort.1
> --- sort.1 29 May 2013 15:00:35 -0000 1.34
> +++ sort.1 29 May 2016 05:00:34 -0000
> @@ -67,16 +67,26 @@
>  .Nd sort or merge text files
>  .Sh SYNOPSIS
>  .Nm
> -.Op Fl bcdfHilmnrSsu
> +.Op Fl bdfHilmnrSsu
>  .Oo
>  .Fl k
> -.Ar field1 Ns Op Li \&, Ns Ar field2
> +.Ar kstart Ns Op Li \&, Ns Ar kend
>  .Oc
>  .Op Fl o Ar output
>  .Op Fl R Ar char
>  .Op Fl T Ar dir
>  .Op Fl t Ar char
>  .Op Ar
> +.Nm
> +.Fl c|C
> +.Op Fl bdfilnru
> +.Oo
> +.Fl k
> +.Ar kstart Ns Op Li \&, Ns Ar kend
> +.Op Fl t Ar char
> +.Oc
> +.Op Fl R Ar char
> +.Op Ar file
>  .Sh DESCRIPTION
>  The
>  .Nm
> @@ -101,6 +111,10 @@
>  produces no output.
>  See also
>  .Fl u .
> +.It Fl C
> +Identical to
> +.Fl c
> +without the error messages in the case of unsorted input.
>  .It Fl H
>  Ignored for compatibility with earlier versions of
>  .Nm .
> @@ -137,9 +151,13 @@
>  option, check that there are no lines with duplicate keys.
>  .El
>  .Pp
> -The following options override the default ordering rules.
> -When ordering options appear independent of key field
> -specifications, the requested field ordering rules are
> +The following options,
> +which should be given before any
> +.Fl k
> +options, override the default ordering rules.
> +When ordering options appear independent of,
> +and before, key field specifications,
> +the requested field ordering rules are
>  applied globally to all sort keys.
>  When attached to a specific key (see
>  .Fl k ) ,
> @@ -224,12 +242,21 @@
>  This should be used with discretion;
>  .Fl R Aq Ar alphanumeric
>  usually produces undesirable results.
> +If char is not a single character, then it
> +specifies the value of the desired record
> +separator as an integer specified in any
> +of the normal NNN, 0ooo, or 0xXXX ways,
> +or as an octal value preceded by \e.
> +Caution: do not attempt to specify Ctl-A
> +as
> +.Dq -R 1
> +which will not do what was intended at all!
>  The default record separator is newline.
> -.It Fl k Ar field1 Ns Op Li \&, Ns Ar field2
> +.It Fl k Ar kstart Ns Op Li \&, Ns Ar kend
>  Designates the starting position,
> -.Ar field1 ,
> +.Ar kstart ,
>  and optional ending position,
> -.Ar field2 ,
> +.Ar kend ,
>  of a key field.
>  The
>  .Fl k
> @@ -265,16 +292,16 @@
>  Fields are specified
>  by the
>  .Fl k
> -.Ar field1 Ns Op \&, Ns Ar field2
> +.Ar kstart Ns Op \&, Ns Ar kend
>  argument.
>  A missing
> -.Ar field2
> +.Ar kend
>  argument defaults to the end of a line.
>  .Pp
>  The arguments
> -.Ar field1
> +.Ar kstart
>  and
> -.Ar field2
> +.Ar kend
>  have the form
>  .Ar m Ns Li \&. Ns Ar n
>  and can be followed by one or more of the letters
> @@ -284,7 +311,7 @@
>  .Cm r ,
>  which correspond to the options discussed above.
>  A
> -.Ar field1
> +.Ar kstart
>  position specified by
>  .Ar m Ns Li \&. Ns Ar n
>  .Pq Ar m , n No \*[Gt] 0
> @@ -296,7 +323,7 @@
>  A missing
>  .Li \&. Ns Ar n
>  in
> -.Ar field1
> +.Ar kstart
>  means
>  .Ql \&.1 ,
>  indicating the first character of the
> @@ -314,7 +341,7 @@
>  field.
>  .Pp
>  A
> -.Ar field2
> +.Ar kend
>  position specified by
>  .Ar m Ns Li \&. Ns Ar n
>  is interpreted as
> @@ -451,7 +478,7 @@
>  Thus performance depends highly on efficient choice of sort keys, and the
>  .Fl b
>  option and the
> -.Ar field2
> +.Ar kend
>  argument of the
>  .Fl k
>  option should be used whenever possible.
> Index: sort.c
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/sort.c,v
> retrieving revision 1.61
> diff -u -r1.61 sort.c
> --- sort.c 16 Sep 2011 15:39:29 -0000 1.61
> +++ sort.c 29 May 2016 05:00:34 -0000
> @@ -117,7 +117,7 @@
>  main(int argc, char *argv[])
>  {
>  	int ch, i, stdinflag = 0;
> -	char cflag = 0, mflag = 0;
> +	char mode = 0;
>  	char *outfile, *outpath = 0;
>  	struct field *fldtab;
>  	size_t fldtab_sz, fld_cnt;
> @@ -145,9 +145,9 @@
>  	fldtab = emalloc(fldtab_sz * sizeof(*fldtab));
>  	memset(fldtab, 0, fldtab_sz * sizeof(*fldtab));
>
> -#define SORT_OPTS "bcdD:fHik:lmno:rR:sSt:T:ux"
> +#define SORT_OPTS "bcCdD:fHik:lmno:rR:sSt:T:u"
>
> -	/* Convert "+field" args to -f format */
> +	/* Convert "+field" args to -k format */
>  	fixit(&argc, argv, SORT_OPTS);
>
>  	if (!(tmpdir = getenv("TMPDIR")))
> @@ -158,8 +158,10 @@
>  		case 'b':
>  			fldtab[0].flags |= BI | BT;
>  			break;
> -		case 'c':
> -			cflag = 1;
> +		case 'c': case 'C': case 'm':
> +			if (mode)
> +				usage("Incompatible operation modes");
> +			mode = ch;
>  			break;
>  		case 'D': /* Debug flags */
>  			for (i = 0; optarg[i]; i++)
> @@ -179,15 +181,33 @@
>
>  			setfield(optarg, &fldtab[++fld_cnt], fldtab[0].flags);
>  			break;
> -		case 'm':
> -			mflag = 1;
> -			break;
>  		case 'o':
>  			outpath = optarg;
>  			break;
>  		case 'r':
>  			REVERSE = 1;
>  			break;
> +		case 'R':
> +			if (REC_D != '\n')
> +				usage("multiple record delimiters");
> +			REC_D = *optarg;
> +			if (optarg[1] != '\0') {
> +				char *ep;
> +				int t = 0;
> +
> +				if (optarg[0] == '\\')
> +					optarg++, t = 8;
> +				REC_D = (int)strtol(optarg, &ep, t);
> +				if (*ep != '\0' || REC_D < 0 ||
> +				    REC_D >= (int)__arraycount(d_mask))
> +					errx(2, "invalid record delimiter %s",
> +					    optarg);
> +			}
> +			if (REC_D == '\n')
> +				break;
> +			d_mask['\n'] = d_mask[' '];
> +			d_mask[REC_D] = REC_D_F;
> +			break;
>  		case 's':
>  			/*
>  			 * Nominally 'stable sort', keep lines with equal keys
> @@ -213,30 +233,11 @@
>  			SEP_FLAG = 1;
>  			d_mask[' '] &= ~FLD_D;
>  			d_mask['\t'] &= ~FLD_D;
> +			d_mask['\n'] &= ~FLD_D;
>  			d_mask[(u_char)*optarg] |= FLD_D;
>  			if (d_mask[(u_char)*optarg] & REC_D_F)
>  				errx(2, "record/field delimiter clash");
>  			break;
> -		case 'R':
> -			if (REC_D != '\n')
> -				usage("multiple record delimiters");
> -			REC_D = *optarg;
> -			if (REC_D == '\n')
> -				break;
> -			if (optarg[1] != '\0') {
> -				char *ep;
> -				int t = 0;
> -				if (optarg[0] == '\\')
> -					optarg++, t = 8;
> -				REC_D = (int)strtol(optarg, &ep, t);
> -				if (*ep != '\0' || REC_D < 0 ||
> -				    REC_D >= (int)__arraycount(d_mask))
> -					errx(2, "invalid record delimiter %s",
> -					    optarg);
> -			}
> -			d_mask['\n'] = d_mask[' '];
> -			d_mask[REC_D] = REC_D_F;
> -			break;
>  		case 'T':
>  			/* -T tmpdir */
>  			tmpdir = optarg;
> @@ -254,13 +255,13 @@
>  		/* Don't sort on raw record if keys match */
>  		posix_sort = 0;
>
> -	if (cflag && argc > optind+1)
> +	if ((mode == 'c' || mode == 'C') && argc > optind+1)
>  		errx(2, "too many input files for -c option");
>  	if (argc - 2 > optind && !strcmp(argv[argc-2], "-o")) {
>  		outpath = argv[argc-1];
>  		argc -= 2;
>  	}
> -	if (mflag && argc - optind > (MAXFCT - (16+1))*16)
> +	if (mode == 'm' && argc - optind > (MAXFCT - (16+1))*16)
>  		errx(2, "too many input files for -m option");
>
>  	for (i = optind; i < argc; i++) {
> @@ -309,8 +310,8 @@
>  		num_input_files = argc - optind;
>  	}
>
> -	if (cflag) {
> -		order(&filelist, fldtab);
> +	if (mode == 'c' || mode == 'C') {
> +		order(&filelist, fldtab, mode == 'C');
>  		/* NOT REACHED */
>  	}
>
> @@ -348,7 +349,7 @@
>  			err(2, "output file %s", outfile);
>  	}
>
> -	if (mflag)
> +	if (mode == 'm')
>  		fmerge(&filelist, num_input_files, outfp, fldtab);
>  	else
>  		fsort(&filelist, num_input_files, outfp, fldtab);
> @@ -393,13 +394,20 @@
>  static void
>  usage(const char *msg)
>  {
> +	const char *pn = getprogname();
> +
>  	if (msg != NULL)
>  		(void)fprintf(stderr, "%s: %s\n", getprogname(), msg);
>  	(void)fprintf(stderr,
> -	    "usage: %s [-bcdfHilmnrSsu] [-k field1[,field2]] [-o output]"
> -	    " [-R char] [-T dir]", getprogname());
> +	    "usage: %s [-bdfHilmnrSsu] [-k kstart[,kend]] [-o output]"
> +	    " [-R char] [-T dir]\n", pn);
>  	(void)fprintf(stderr,
>  	    " [-t char] [file ...]\n");
> +	(void)fprintf(stderr,
> +	    " or: %s -[cC] [-bdfilnru] [-k kstart[,kend]] [-o output]"
> +	    " [-R char]\n", pn);
> +	(void)fprintf(stderr,
> +	    " [-t char] [file]\n");
>  	exit(2);
>  }
>
> Index: sort.h
> ===================================================================
> RCS file: /cvsroot/src/usr.bin/sort/sort.h,v
> retrieving revision 1.35
> diff -u -r1.35 sort.h
> --- sort.h 5 Aug 2015 07:10:03 -0000 1.35
> +++ sort.h 29 May 2016 05:00:34 -0000
> @@ -191,7 +191,7 @@
>  int makeline(FILE *, RECHEADER *, u_char *, struct field *);
>  void makeline_copydown(RECHEADER *);
>  int optval(int, int);
> -__dead void order(struct filelist *, struct field *);
> +__dead void order(struct filelist *, struct field *, int);
>  void putline(const RECHEADER *, FILE *);
>  void putrec(const RECHEADER *, FILE *);
>  void putkeydump(const RECHEADER *, FILE *);
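
As a reading aid, the -R argument handling in the patch can be sketched as a standalone C function. This is only a sketch under stated assumptions: parse_record_delim and its table_size parameter are hypothetical names that do not appear in sort.c (table_size stands in for __arraycount(d_mask)), the function returns -1 where the patch calls errx(), and the rejection of an empty argument is an addition of the sketch, not something the patch does.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not part of the patch: returns the record
 * delimiter named by a -R argument, or -1 if the argument is invalid.
 * table_size stands in for __arraycount(d_mask) in sort.c.
 */
static int
parse_record_delim(const char *arg, int table_size)
{
	char *ep;
	long v;
	int base = 0;	/* 0 lets strtol() accept NNN, 0ooo and 0xXXX */

	if (arg[0] == '\0')
		return -1;	/* assumption of this sketch: reject "" */

	/* A single character is taken literally, so "1" means '1'. */
	if (arg[1] == '\0')
		return (unsigned char)arg[0];

	/* A leading backslash forces octal, as in "\012". */
	if (arg[0] == '\\') {
		arg++;
		base = 8;
	}

	v = strtol(arg, &ep, base);
	if (ep == arg || *ep != '\0' || v < 0 || v >= table_size)
		return -1;
	return (int)v;
}
```

Read this way, "-R 1" selects the character '1' rather than Ctl-A, while "-R 01", "-R 0x1" or "-R \1" select the value 1, which is consistent with the caution added to the man page.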