Re: [PATCH] Implement join --check-order and --nocheck-order.

2008-03-23 Thread Jim Meyering
"Dmitry V. Levin" <[EMAIL PROTECTED]> wrote:
> On Tue, Feb 19, 2008 at 03:44:00PM +0100, Jim Meyering wrote:
>> James Youngman <[EMAIL PROTECTED]> wrote:
>> > This is a consolidated patch including all the previous changes,
>> > implementing order checking in join by default.  "make distcheck"
>> > works (if I move distcheck-hook from Makefile.am to GNUmakefile).
>> >
>> > 2008-02-16  James Youngman  <[EMAIL PROTECTED]>
>> >
>> >* src/join.c: Support --check-order and --nocheck-order.
>> >New variables check_input_order, seen_unpairable and
>> >issued_disorder_warning[]. For --check-order, verify that the
>> ...
>>
>> Thanks again.
>> I've just pushed that.
>
> Looks like this commit v6.10-70-ga1e7156 introduces a regression:
>
> $ printf '%s\n%s\n' '1 b' '2 a' | sort -k2,2 | join -12 -21 - /dev/null
> join: File 1 is not in sorted order
> $ echo $?
> 1
> $ join --version |grep join
> join (GNU coreutils) 6.10.133-677610

Thank you for catching that.
The new check_order function uses keycmp in a new way: with both
lines coming from the same file, so that that function's hard-coded
use of join_field_1 and join_field_2 globals was not appropriate.

Here's a bare, knee-jerk patch: no NEWS,
no comment adjustments, no new tests.  Yet.

Now, I'm hoping someone will find time to do a little testing to see
whether the new feature (esp. with this additional change or something
equivalent) impacts performance.

diff --git a/src/join.c b/src/join.c
index 8f0d436..aaf66b0 100644
--- a/src/join.c
+++ b/src/join.c
@@ -300,7 +300,8 @@ freeline (struct line *line)
Report an error and exit if the comparison fails.  */

 static int
-keycmp (struct line const *line1, struct line const *line2)
+keycmp (struct line const *line1, struct line const *line2,
+   size_t jf_1, size_t jf_2)
 {
   /* Start of field to compare in each file.  */
   char *beg1;
@@ -310,10 +311,10 @@ keycmp (struct line const *line1, struct line const 
*line2)
   size_t len2; /* Length of fields to compare.  */
   int diff;

-  if (join_field_1 < line1->nfields)
+  if (jf_1 < line1->nfields)
 {
-  beg1 = line1->fields[join_field_1].beg;
-  len1 = line1->fields[join_field_1].len;
+  beg1 = line1->fields[jf_1].beg;
+  len1 = line1->fields[jf_1].len;
 }
   else
 {
@@ -321,10 +322,10 @@ keycmp (struct line const *line1, struct line const 
*line2)
   len1 = 0;
 }

-  if (join_field_2 < line2->nfields)
+  if (jf_2 < line2->nfields)
 {
-  beg2 = line2->fields[join_field_2].beg;
-  len2 = line2->fields[join_field_2].len;
+  beg2 = line2->fields[jf_2].beg;
+  len2 = line2->fields[jf_2].len;
 }
   else
 {
@@ -376,7 +377,8 @@ check_order (const struct line *prev,
 {
   if (!issued_disorder_warning[whatfile-1])
{
- if (keycmp (prev, current) > 0)
+ size_t join_field = whatfile == 1 ? join_field_1 : join_field_2;
+ if (keycmp (prev, current, join_field, join_field) > 0)
{
  error ((check_input_order == CHECK_ORDER_ENABLED
  ? EXIT_FAILURE : 0),
@@ -404,6 +406,7 @@ get_line (FILE *fp, struct line *line, int which)
error (EXIT_FAILURE, errno, _("read error"));
   free (line->buf.buffer);
   line->buf.buffer = NULL;
+  // prevline[which - 1] = NULL;
   return false;
 }

@@ -605,7 +608,8 @@ join (FILE *fp1, FILE *fp2)
   while (seq1.count && seq2.count)
 {
   size_t i;
-  diff = keycmp (&seq1.lines[0], &seq2.lines[0]);
+  diff = keycmp (&seq1.lines[0], &seq2.lines[0],
+join_field_1, join_field_2);
   if (diff < 0)
{
  if (print_unpairables_1)
@@ -633,7 +637,8 @@ join (FILE *fp1, FILE *fp2)
++seq1.count;
break;
  }
-  while (!keycmp (&seq1.lines[seq1.count - 1], &seq2.lines[0]));
+  while (!keycmp (&seq1.lines[seq1.count - 1], &seq2.lines[0],
+ join_field_1, join_field_2));

   /* Keep reading lines from file2 as long as they continue to
  match the current line from file1.  */
@@ -645,7 +650,8 @@ join (FILE *fp1, FILE *fp2)
++seq2.count;
break;
  }
-  while (!keycmp (&seq1.lines[0], &seq2.lines[seq2.count - 1]));
+  while (!keycmp (&seq1.lines[0], &seq2.lines[seq2.count - 1],
+ join_field_1, join_field_2));

   if (print_pairables)
{
--
1.5.5.rc0.21.g740fd


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: configure fails on Maemo/OS2008 (sed: no previous regexp)

2008-03-23 Thread Jim Meyering
Vincent Lefevre <[EMAIL PROTECTED]> wrote:
> On 2008-03-22 19:20:37 +0100, Jim Meyering wrote:
>> This appears to be a busybox "limitation".
>> I expect to work around it with this patch:
...
> Thanks, I could build the coreutils on the N810.

Thanks for the quick confirmation.
I pushed that change.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: [PATCH] Implement join --check-order and --nocheck-order.

2008-03-23 Thread Jim Meyering
Jim Meyering <[EMAIL PROTECTED]> wrote:
...
> diff --git a/src/join.c b/src/join.c
> index 8f0d436..aaf66b0 100644
> --- a/src/join.c
> +++ b/src/join.c
...
> @@ -404,6 +406,7 @@ get_line (FILE *fp, struct line *line, int which)
>   error (EXIT_FAILURE, errno, _("read error"));
>free (line->buf.buffer);
>line->buf.buffer = NULL;
> +  // prevline[which - 1] = NULL;
>return false;
>  }
>

BTW, that chunk is obviously not going to be pushed.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


seq bug -- out of bounds read (off by one)

2008-03-23 Thread Daniel Dunbar
Hi Jim, 
 
Thanks for the quick reply on the ptx bug. 
 
We have found another minor bug (non-crashing) in seq. An example problem input 
is:  
-- 
[EMAIL PROTECTED]:src$ ./seq -f %0 1 
./seq: memory exhausted 
-- 
 
The problem is in seq.c:213 (in 6.10 source),  
  if (! strchr ("efgaEFGA", fmt[i])) 
which I believe should be 
  if (!fmt[i] || ! strchr ("efgaEFGA", fmt[i])) 
  
In the cases when fmt[i]=='\0' then strchr returns a pointer to the end of the 
"efgaEFGA" string, 
and the subsequent for loop preface increments i, thus jumping over the 
terminating null. The 
for loop then performs out of bounds reads until the next 0 or %[^%]. We 
attempted to get a 
more interesting crash since this loop is incrementing suffix_len in the 
meantime, but it appears 
that all format string arguments that generate this bug are later rejected by 
asprintf in print_numbers(). 
 
 - Daniel 
 
- Original Message  
From: Jim Meyering <[EMAIL PROTECTED]> 
To: Cristian Cadar <[EMAIL PROTECTED]> 
Cc: bug-coreutils@gnu.org; [EMAIL PROTECTED]; [EMAIL PROTECTED] 
Sent: Friday, March 21, 2008 3:11:02 AM 
Subject: Re: ptx bug -- unbounded buffer overflow 
 
 Cristian  Cadar  <[EMAIL PROTECTED]>  wrote: 
> Hello,  I'm  part  of  a  research  group  at  Stanford,  working  on  
> automatic 
>  bug-finding  tools.   We  are  currently  testing  coreutils,  and  we  
> found  a 
>  crash  bug  in  ptx  due  to  an  unbounded  buffer  overflow. 
> 
> Here  is  a  trivial  test  case  that  triggers  the  bug  in  the  
> current 
>  version  of  coreutils  (6.10): 
> 
>  $  ptx  -F\\ 
> 
> Another  example,  which  overflows  more  bytes  would  be: 
>  $  ptx  -F\\  abcdef 
> 
>  (the  overflow  increases  w/  the  length  of  the  second  argument). 
> 
> The  problem  is  in  function  copy_unescaped_string(const  char  
> *string), 
>  which  in  the  presence  of  backslashes  can  advance  the  pointer  
> "string" 
>  past  the  end  of  the  buffer.   This  in  turn  causes  an  unbounded  
> overflow 
>  of  the  buffer  malloc-ed  at  the  very  beginning  of  the  function,  
> which  in 
>  turn  can  be  used  to  corrupt  the  heap  metadata  and  crash  the  
> program. 
 
Thank  you  for  finding/reporting  that.   It  is  indeed  a  bug. 
I've  included  a  patch  and  a  test-suite  addition  below. 
 
Are  your  tools  freely  available? 
 
If  you  have  any  more  reports  like  that,  please  send  them  in  soon. 
I  expect  to  make  a  new  release  in  the  next  week  or  two. 
 
 
 
 
 





___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils