"Dmitry V. Levin" <[EMAIL PROTECTED]> wrote: > On Tue, Feb 19, 2008 at 03:44:00PM +0100, Jim Meyering wrote: >> James Youngman <[EMAIL PROTECTED]> wrote: >> > This is a consolidated patch including all the previous changes, >> > implementing order checking in join by default. "make distcheck" >> > works (if I move distcheck-hook from Makefile.am to GNUmakefile). >> > >> > 2008-02-16 James Youngman <[EMAIL PROTECTED]> >> > >> > * src/join.c: Support --check-order and --nocheck-order. >> > New variables check_input_order, seen_unpairable and >> > issued_disorder_warning[]. For --check-order, verify that the >> ... >> >> Thanks again. >> I've just pushed that. > > Looks like this commit v6.10-70-ga1e7156 introduces a regression: > > $ printf '%s\n%s\n' '1 b' '2 a' | sort -k2,2 | join -12 -21 - /dev/null > join: File 1 is not in sorted order > $ echo $? > 1 > $ join --version |grep join > join (GNU coreutils) 6.10.133-677610
Thank you for catching that. The new check_order function uses keycmp in a new way: with both lines coming from the same file, so that that function's hard-coded use of join_field_1 and join_field_2 globals was not appropriate. Here's a bare, knee-jerk patch: no NEWS, no comment adjustments, no new tests. Yet. Now, I'm hoping someone will find time to do a little testing to see whether the new feature (esp. with this additional change or something equivalent) impacts performance. diff --git a/src/join.c b/src/join.c index 8f0d436..aaf66b0 100644 --- a/src/join.c +++ b/src/join.c @@ -300,7 +300,8 @@ freeline (struct line *line) Report an error and exit if the comparison fails. */ static int -keycmp (struct line const *line1, struct line const *line2) +keycmp (struct line const *line1, struct line const *line2, + size_t jf_1, size_t jf_2) { /* Start of field to compare in each file. */ char *beg1; @@ -310,10 +311,10 @@ keycmp (struct line const *line1, struct line const *line2) size_t len2; /* Length of fields to compare. */ int diff; - if (join_field_1 < line1->nfields) + if (jf_1 < line1->nfields) { - beg1 = line1->fields[join_field_1].beg; - len1 = line1->fields[join_field_1].len; + beg1 = line1->fields[jf_1].beg; + len1 = line1->fields[jf_1].len; } else { @@ -321,10 +322,10 @@ keycmp (struct line const *line1, struct line const *line2) len1 = 0; } - if (join_field_2 < line2->nfields) + if (jf_2 < line2->nfields) { - beg2 = line2->fields[join_field_2].beg; - len2 = line2->fields[join_field_2].len; + beg2 = line2->fields[jf_2].beg; + len2 = line2->fields[jf_2].len; } else { @@ -376,7 +377,8 @@ check_order (const struct line *prev, { if (!issued_disorder_warning[whatfile-1]) { - if (keycmp (prev, current) > 0) + size_t join_field = whatfile == 1 ? join_field_1 : join_field_2; + if (keycmp (prev, current, join_field, join_field) > 0) { error ((check_input_order == CHECK_ORDER_ENABLED ? EXIT_FAILURE : 0), @@ -404,6 +406,7 @@ get_line (FILE *fp, struct line *line, int which) error (EXIT_FAILURE, errno, _("read error")); free (line->buf.buffer); line->buf.buffer = NULL; + // prevline[which - 1] = NULL; return false; } @@ -605,7 +608,8 @@ join (FILE *fp1, FILE *fp2) while (seq1.count && seq2.count) { size_t i; - diff = keycmp (&seq1.lines[0], &seq2.lines[0]); + diff = keycmp (&seq1.lines[0], &seq2.lines[0], + join_field_1, join_field_2); if (diff < 0) { if (print_unpairables_1) @@ -633,7 +637,8 @@ join (FILE *fp1, FILE *fp2) ++seq1.count; break; } - while (!keycmp (&seq1.lines[seq1.count - 1], &seq2.lines[0])); + while (!keycmp (&seq1.lines[seq1.count - 1], &seq2.lines[0], + join_field_1, join_field_2)); /* Keep reading lines from file2 as long as they continue to match the current line from file1. */ @@ -645,7 +650,8 @@ join (FILE *fp1, FILE *fp2) ++seq2.count; break; } - while (!keycmp (&seq1.lines[0], &seq2.lines[seq2.count - 1])); + while (!keycmp (&seq1.lines[0], &seq2.lines[seq2.count - 1], + join_field_1, join_field_2)); if (print_pairables) { -- 1.5.5.rc0.21.g740fd _______________________________________________ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils