Re: Java-related Bison tests all fail

2007-02-11 Thread Bruno Haible
Joel E. Denny wrote:
> I am able to reproduce the problem with coreutils 5.2.1 on both an x86 
> Linux and an x86 Solaris.  I've attached the erroneous output.
> 
> The trouble is the last tr.  The following replacement, which I discovered 
> by trial and error, works.  I just inserted zz after FG:

Thanks. I'm applying this workaround:


2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

* m4/javacomp.m4 (gt_JAVACOMP): Work around a 'tr' bug in coreutils.
Reported by Joel E. Denny <[EMAIL PROTECTED]>,

--- m4/javacomp.m4  4 Feb 2007 16:28:01 -   1.3
+++ m4/javacomp.m4  11 Feb 2007 12:24:50 -
@@ -1,4 +1,4 @@
-# javacomp.m4 serial 9 (gettext-0.16.2)
+# javacomp.m4 serial 10 (gettext-0.16.2)
 dnl Copyright (C) 2001-2003, 2006-2007 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -86,9 +86,11 @@
dnl to assume the presence of uudecode, use the command
dnl   $ od -A n -t o1 < conftestver.class | tr ' ' '\012'| sort | uniq 
| sed -e '/^$/d' -e 's,^,\\,' | tr -d '\012'
dnl and the long tr command in opposite direction.
+   dnl Finally move the position corresponding to \055 to the last 
position,
+   dnl to work around a coreutils-5.x bug.
echo 
'xyvw!$!H!C,!)!2+!3!4*!5,!3!6,!7!8)!9)!:"!(LdhdmM"!$EFV"!%Ni_a"!1PdhaQngYakUXYfa"!%gXdh"!8EWPeXoXJfXhcJTmkdhcKFV"!,TinkZaOdfa"!2ZihbmalmoakIeXoX.!*!+)!;.!!?)[EMAIL
 
PROTECTED]"!-Zihbmalmoak"!2eXoXJfXhcJRYeaZm"!2eXoXJfXhcJTplmag"!$inm"!7PeXoXJdiJSkdhmTmkaXgK"!-camSkijakmp"!DEPeXoXJfXhcJTmkdhcKFPeXoXJfXhcJTmkdhcK"!5eXoXJdiJSkdhmTmkaXg"!)jkdhmfh"!7EPeXoXJfXhcJTmkdhcKFV!C!(!)!#!"!*!+!"!,!!!?!"!"!!!&Gt!"q!!!"!-!!!(!"!!!"!+!.!/!"!,!!!E!#!"!!!.r!#4$u!%s!&q!!!"!-!!!,!#!!!$!-!%!"!0!!!#!1'
 \
  | tr -d '\012\015' \
- | tr '!"#$%&()*+,-./0123456789:;<=>[EMAIL PROTECTED]' 
'\000\001\002\003\004\005\006\007\010\011\012\013\014\015\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\040\041\046\050\051\052\055\056\057\073\074\076\103\106\114\116\117\120\123\124\126\133\141\142\143\144\145\146\147\151\152\154\155\156\157\160\162\163\164\165\166\171\261\262\266\267\270\272\276\312\376'
 \
+ | tr '!"#$%&()*+,-./0123456789:;<=>[EMAIL PROTECTED]' 
'\000\001\002\003\004\005\006\007\010\011\012\013\014\015\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\040\041\046\050\051\052\056\057\073\074\076\103\106\114\116\117\120\123\124\126\133\141\142\143\144\145\146\147\151\152\154\155\156\157\160\162\163\164\165\166\171\261\262\266\267\270\272\276\312\376\055'
 \
  > conftestver.class
target_version=`{
  unset JAVA_HOME





c-strstr: add testcase

2007-02-11 Thread Bruno Haible
Hi,

I'm starting to speed up the substring search functions. First, a test case.

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

* modules/c-strstr-tests: New file.
* tests/test-c-strstr.c: New file.

== modules/c-strstr-tests ==
Files:
tests/test-c-strstr.c

Depends-on:

configure.ac:

Makefile.am:
TESTS += test-c-strstr
check_PROGRAMS += test-c-strstr

=== tests/test-c-strstr.c ==
/* Test of searching in a string.
   Copyright (C) 2007 Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software Foundation,
   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */

/* Written by Bruno Haible <[EMAIL PROTECTED]>, 2007.  */

#ifdef HAVE_CONFIG_H
# include 
#endif

#include "c-strstr.h"

#include 

#define ASSERT(expr) if (!(expr)) abort ();

int
main ()
{
  {
const char input[] = "foo";
const char *result = c_strstr (input, "");
ASSERT (result == input);
  }

  {
const char input[] = "foo";
const char *result = c_strstr (input, "o");
ASSERT (result == input + 1);
  }

  {
const char input[] = "ABC ABCDAB ABCDABCDABDE";
const char *result = c_strstr (input, "ABCDABD");
ASSERT (result == input + 15);
  }

  {
const char input[] = "ABC ABCDAB ABCDABCDABDE";
const char *result = c_strstr (input, "ABCDABE");
ASSERT (result == NULL);
  }

  /* Check that a very long haystack is handled quickly if the needle is
 short and occurs near the beginning.  */
  {
size_t repeat = 1;
size_t m = 100;
char *needle =
  ""
  "";
char *haystack = (char *) malloc (m + 1);
if (haystack != NULL)
  {
memset (haystack, 'A', m);
haystack[0] = 'B';
haystack[m] = '\0';

for (; repeat > 0; repeat--)
  {
ASSERT (c_strstr (haystack, needle) == haystack + 1);
  }

free (haystack);
  }
  }

  /* Check that a very long needle is discarded quickly if the haystack is
 short.  */
  {
size_t repeat = 1;
size_t m = 100;
char *haystack =
  ""
  "ABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABAB";
char *needle = (char *) malloc (m + 1);
if (needle != NULL)
  {
memset (needle, 'A', m);
needle[m] = '\0';

for (; repeat > 0; repeat--)
  {
ASSERT (c_strstr (haystack, needle) == NULL);
  }

free (needle);
  }
  }

  /* Check that the asymptotic worst-case complexity is not quadratic.  */
  {
size_t m = 100;
char *haystack = (char *) malloc (2 * m + 2);
char *needle = (char *) malloc (m + 2);
if (haystack != NULL && needle != NULL)
  {
const char *result;

memset (haystack, 'A', 2 * m);
haystack[2 * m] = 'B';
haystack[2 * m + 1] = '\0';

memset (needle, 'A', m);
needle[m] = 'B';
needle[m + 1] = '\0';

result = c_strstr (haystack, needle);
ASSERT (result == haystack + m);
  }
if (needle != NULL)
  free (needle);
if (haystack != NULL)
  free (haystack);
  }

  return 0;
}





Re: c-strstr: add testcase

2007-02-11 Thread Bruno Haible
The "check that the asymptotic worst-case complexity is not quadratic"
fails: it takes eternities. I'm therefore going to add a catch for
worst-case linear complexity. This is impossible with the current over-
optimized 'goto' spaghetti implementation, so I'm replacing it with a
maintainable implementation first.

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

* lib/c-strstr.c: Complete rewrite for maintainability.

 lib/c-strstr.c 
/* c-strstr.c -- substring search in C locale
   Copyright (C) 2005-2007 Free Software Foundation, Inc.
   Written by Bruno Haible <[EMAIL PROTECTED]>, 2005, 2007.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software Foundation,
   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */

#include 

/* Specification.  */
#include "c-strstr.h"

#include 

/* Find the first occurrence of NEEDLE in HAYSTACK.  */
char *
c_strstr (const char *haystack, const char *needle)
{
  /* Be careful not to look at the entire extent of haystack or needle
 until needed.  This is useful because of these two cases:
   - haystack may be very long, and a match of needle found early,
   - needle may be very long, and not even a short initial segment of
 needle may be found in haystack.  */
  if (*needle != '\0')
{
  /* Speed up the following searches of needle by caching its first
 character.  */
  unsigned char b = (unsigned char) *needle;

  needle++;
  for (;; haystack++)
{
  if (*haystack == '\0')
/* No match.  */
return NULL;
  if ((unsigned char) *haystack == b)
/* The first character matches.  */
{
  const char *rhaystack = haystack + 1;
  const char *rneedle = needle;

  for (;; rhaystack++, rneedle++)
{
  if (*rneedle == '\0')
/* Found a match.  */
return (char *) haystack;
  if (*rhaystack == '\0')
/* No match.  */
return NULL;
  if ((unsigned char) *rhaystack != (unsigned char) *rneedle)
/* Nothing in this round.  */
break;
}
}
}
}
  else
return (char *) haystack;
}





Re: c-strstr: add testcase

2007-02-11 Thread Bruno Haible
Now here's the fix that brings the worst-case complexity down from O(n*m)
to O(n). One could use either the Knuth-Morris-Pratt or the Boyer-Moore
algorithm. I'm using the former one, because Boyer-Moore works "backwards",
which is harder to generalize to the mbsstr case.

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

Ensure O(n) worst-case complexity of c_strstr.
* lib/c-strstr.c: Include stdbool.h, string.h.
(knuth_morris_pratt): New function.
(c_strstr): Add some bookkeeping. Invoke knuth_morris_pratt when the
bookkeeping indicates that it's worth it.
* modules/c-strstr (Depends-on): Add stdbool, strnlen.

*** lib/c-strstr.c  11 Feb 2007 13:58:43 -  1.4
--- lib/c-strstr.c  11 Feb 2007 14:01:35 -
***
*** 21,27 
  /* Specification.  */
  #include "c-strstr.h"
  
! #include 
  
  /* Find the first occurrence of NEEDLE in HAYSTACK.  */
  char *
--- 21,117 
  /* Specification.  */
  #include "c-strstr.h"
  
! #include 
! #include 
! #include 
! 
! /* Knuth-Morris-Pratt algorithm.
!See http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm
!Return a boolean indicating success.  */
! static bool
! knuth_morris_pratt (const char *haystack, const char *needle,
!   const char **resultp)
! {
!   size_t m = strlen (needle);
! 
!   /* Allocate the table.  */
!   size_t *table = (size_t *) malloc (m * sizeof (size_t));
!   if (table == NULL)
! return false;
!   /* Fill the table.
!  For 0 < i < m:
!0 < table[i] <= i is defined such that
!rhaystack[0..i-1] == needle[0..i-1] and rhaystack[i] != needle[i]
!implies
!forall 0 <= x < table[i]: rhaystack[x..x+m-1] != needle[0..m-1],
!and table[i] is as large as possible with this property.
!  table[0] remains uninitialized.  */
!   {
! size_t i, j;
! 
! table[1] = 1;
! j = 0;
! for (i = 2; i < m; i++)
!   {
!   unsigned char b = (unsigned char) needle[i - 1];
! 
!   for (;;)
! {
!   if (b == (unsigned char) needle[j])
! {
!   table[i] = i - ++j;
!   break;
! }
!   if (j == 0)
! {
!   table[i] = i;
!   break;
! }
!   j = j - table[j];
! }
!   }
!   }
! 
!   /* Search, using the table to accelerate the processing.  */
!   {
! size_t j;
! const char *rhaystack;
! const char *phaystack;
! 
! *resultp = NULL;
! j = 0;
! rhaystack = haystack;
! phaystack = haystack;
! /* Invariant: phaystack = rhaystack + j.  */
! while (*phaystack != '\0')
!   if ((unsigned char) needle[j] == (unsigned char) *phaystack)
!   {
! j++;
! phaystack++;
! if (j == m)
!   {
! /* The entire needle has been found.  */
! *resultp = rhaystack;
! break;
!   }
!   }
!   else if (j > 0)
!   {
! /* Found a match of needle[0..j-1], mismatch at needle[j].  */
! rhaystack += table[j];
! j -= table[j];
!   }
!   else
!   {
! /* Found a mismatch at needle[0] already.  */
! rhaystack++;
! phaystack++;
!   }
!   }
! 
!   free (table);
!   return true;
! }
  
  /* Find the first occurrence of NEEDLE in HAYSTACK.  */
  char *
***
*** 34,39 
--- 124,149 
   needle may be found in haystack.  */
if (*needle != '\0')
  {
+   /* Minimizing the worst-case complexity:
+Let n = strlen(haystack), m = strlen(needle).
+The naïve algorithm is O(n*m) worst-case.
+The Knuth-Morris-Pratt algorithm is O(n) worst-case but it needs a
+memory allocation.
+To achieve linear complexity and yet amortize the cost of the memory
+allocation, we activate the Knuth-Morris-Pratt algorithm only once
+the naïve algorithm has already run for some time; more precisely,
+when
+  - the outer loop count is >= 10,
+  - the average number of comparisons per outer loop is >= 5,
+  - the total number of comparisons is >= m.
+But we try it only once.  If the memory allocation attempt failed,
+we don't retry it.  */
+   bool try_kmp = true;
+   size_t outer_loop_count = 0;
+   size_t comparison_count = 0;
+   size_t last_ccount = 0; /* last comparison count */
+   const char *needle_last_ccount = needle;/* = needle + 
last_ccount */
+ 
/* Speed up the following searches of needle by caching its first
 character.  */
unsigned char b = (unsigned char) *needle;
***
*** 44,49 
--- 154,190 
  if (*haystack == '\0')
/* No match.  */
return NULL;
+ 
+ /* See whether it's advisable to use an asymptotically faster
+algorithm.  */
+ if (try_kmp
+ && 

c-strcasestr: avoid quadratic time consumption

2007-02-11 Thread Bruno Haible
Like c_strstr, here is a patch for c_strcasestr that avoids quadratic
asymptotic complexity.

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

Ensure O(n) worst-case complexity of c_strcasestr.
* lib/c-strcasestr.c: Include stdbool.h, string.h.
(knuth_morris_pratt): New function.
(c_strcasestr): Add some bookkeeping. Invoke knuth_morris_pratt when
the bookkeeping indicates that it's worth it.
* modules/c-strcasestr (Depends-on): Add stdbool, strnlen.

* modules/c-strcasestr-tests: New file.
* tests/test-c-strcasestr.c: New file.

*** lib/c-strcasestr.c  14 Sep 2006 14:18:36 -  1.2
--- lib/c-strcasestr.c  11 Feb 2007 15:00:17 -
***
*** 21,30 
--- 21,121 
  /* Specification.  */
  #include "c-strcasestr.h"
  
+ #include 
  #include 
+ #include 
  
  #include "c-ctype.h"
  
+ /* Knuth-Morris-Pratt algorithm.
+See http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm
+Return a boolean indicating success.  */
+ static bool
+ knuth_morris_pratt (const char *haystack, const char *needle,
+   const char **resultp)
+ {
+   size_t m = strlen (needle);
+ 
+   /* Allocate the table.  */
+   size_t *table = (size_t *) malloc (m * sizeof (size_t));
+   if (table == NULL)
+ return false;
+   /* Fill the table.
+  For 0 < i < m:
+0 < table[i] <= i is defined such that
+rhaystack[0..i-1] == needle[0..i-1] and rhaystack[i] != needle[i]
+implies
+forall 0 <= x < table[i]: rhaystack[x..x+m-1] != needle[0..m-1],
+and table[i] is as large as possible with this property.
+  table[0] remains uninitialized.  */
+   {
+ size_t i, j;
+ 
+ table[1] = 1;
+ j = 0;
+ for (i = 2; i < m; i++)
+   {
+   unsigned char b = c_tolower ((unsigned char) needle[i - 1]);
+ 
+   for (;;)
+ {
+   if (b == c_tolower ((unsigned char) needle[j]))
+ {
+   table[i] = i - ++j;
+   break;
+ }
+   if (j == 0)
+ {
+   table[i] = i;
+   break;
+ }
+   j = j - table[j];
+ }
+   }
+   }
+ 
+   /* Search, using the table to accelerate the processing.  */
+   {
+ size_t j;
+ const char *rhaystack;
+ const char *phaystack;
+ 
+ *resultp = NULL;
+ j = 0;
+ rhaystack = haystack;
+ phaystack = haystack;
+ /* Invariant: phaystack = rhaystack + j.  */
+ while (*phaystack != '\0')
+   if (c_tolower ((unsigned char) needle[j])
+ == c_tolower ((unsigned char) *phaystack))
+   {
+ j++;
+ phaystack++;
+ if (j == m)
+   {
+ /* The entire needle has been found.  */
+ *resultp = rhaystack;
+ break;
+   }
+   }
+   else if (j > 0)
+   {
+ /* Found a match of needle[0..j-1], mismatch at needle[j].  */
+ rhaystack += table[j];
+ j -= table[j];
+   }
+   else
+   {
+ /* Found a mismatch at needle[0] already.  */
+ rhaystack++;
+ phaystack++;
+   }
+   }
+ 
+   free (table);
+   return true;
+ }
+ 
  /* Find the first occurrence of NEEDLE in HAYSTACK, using case-insensitive
 comparison.
 Note: This function may, in multibyte locales, return success even if
***
*** 39,44 
--- 130,155 
   needle may be found in haystack.  */
if (*needle != '\0')
  {
+   /* Minimizing the worst-case complexity:
+Let n = strlen(haystack), m = strlen(needle).
+The naïve algorithm is O(n*m) worst-case.
+The Knuth-Morris-Pratt algorithm is O(n) worst-case but it needs a
+memory allocation.
+To achieve linear complexity and yet amortize the cost of the memory
+allocation, we activate the Knuth-Morris-Pratt algorithm only once
+the naïve algorithm has already run for some time; more precisely,
+when
+  - the outer loop count is >= 10,
+  - the average number of comparisons per outer loop is >= 5,
+  - the total number of comparisons is >= m.
+But we try it only once.  If the memory allocation attempt failed,
+we don't retry it.  */
+   bool try_kmp = true;
+   size_t outer_loop_count = 0;
+   size_t comparison_count = 0;
+   size_t last_ccount = 0; /* last comparison count */
+   const char *needle_last_ccount = needle;/* = needle + 
last_ccount */
+ 
/* Speed up the following searches of needle by caching its first
 character.  */
unsigned char b = c_tolower ((unsigned char) *needle);
***
*** 49,54 
--- 160,196 
  if (*haystack == '\0')
/* No match.  */
return NULL;
+ 
+ /* See whether it's advisable to use an asymptotically faster
+algorithm.  */
+ if (try_kmp
+ &

strcasestr: avoid quadratic time consumption

2007-02-11 Thread Bruno Haible
Like c_strstr, here is a patch for strcasestr that avoids quadratic
asymptotic complexity.

One could consider introducing new modules 'strstr-linear' and
'strcasestr-linear' which override the system's strstr() or strcasestr()
functions if they have a quadratic worst-case time consumption.
But since gnulib recommends to use c-str* or mbs* anyway, I don't see
much point in it.

Bruno

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

Ensure O(n) worst-case complexity of strcasestr substitute.
* lib/strcasestr.c: Include stdbool.h.
(knuth_morris_pratt): New function.
(strcasestr): Add some bookkeeping. Invoke knuth_morris_pratt when the
bookkeeping indicates that it's worth it.
* modules/strcasestr (Depends-on): Add stdbool, strnlen.

* modules/strcasestr-tests: New file.
* tests/test-strcasestr.c: New file.

*** lib/strcasestr.c5 Feb 2007 02:42:27 -   1.6
--- lib/strcasestr.c11 Feb 2007 15:46:48 -
***
*** 22,31 
--- 22,121 
  #include 
  
  #include 
+ #include 
  #include   /* for NULL, in case a nonstandard string.h lacks it */
  
  #define TOLOWER(Ch) (isupper (Ch) ? tolower (Ch) : (Ch))
  
+ /* Knuth-Morris-Pratt algorithm.
+See http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm
+Return a boolean indicating success.  */
+ static bool
+ knuth_morris_pratt (const char *haystack, const char *needle,
+   const char **resultp)
+ {
+   size_t m = strlen (needle);
+ 
+   /* Allocate the table.  */
+   size_t *table = (size_t *) malloc (m * sizeof (size_t));
+   if (table == NULL)
+ return false;
+   /* Fill the table.
+  For 0 < i < m:
+0 < table[i] <= i is defined such that
+rhaystack[0..i-1] == needle[0..i-1] and rhaystack[i] != needle[i]
+implies
+forall 0 <= x < table[i]: rhaystack[x..x+m-1] != needle[0..m-1],
+and table[i] is as large as possible with this property.
+  table[0] remains uninitialized.  */
+   {
+ size_t i, j;
+ 
+ table[1] = 1;
+ j = 0;
+ for (i = 2; i < m; i++)
+   {
+   unsigned char b = TOLOWER ((unsigned char) needle[i - 1]);
+ 
+   for (;;)
+ {
+   if (b == TOLOWER ((unsigned char) needle[j]))
+ {
+   table[i] = i - ++j;
+   break;
+ }
+   if (j == 0)
+ {
+   table[i] = i;
+   break;
+ }
+   j = j - table[j];
+ }
+   }
+   }
+ 
+   /* Search, using the table to accelerate the processing.  */
+   {
+ size_t j;
+ const char *rhaystack;
+ const char *phaystack;
+ 
+ *resultp = NULL;
+ j = 0;
+ rhaystack = haystack;
+ phaystack = haystack;
+ /* Invariant: phaystack = rhaystack + j.  */
+ while (*phaystack != '\0')
+   if (TOLOWER ((unsigned char) needle[j])
+ == TOLOWER ((unsigned char) *phaystack))
+   {
+ j++;
+ phaystack++;
+ if (j == m)
+   {
+ /* The entire needle has been found.  */
+ *resultp = rhaystack;
+ break;
+   }
+   }
+   else if (j > 0)
+   {
+ /* Found a match of needle[0..j-1], mismatch at needle[j].  */
+ rhaystack += table[j];
+ j -= table[j];
+   }
+   else
+   {
+ /* Found a mismatch at needle[0] already.  */
+ rhaystack++;
+ phaystack++;
+   }
+   }
+ 
+   free (table);
+   return true;
+ }
+ 
  /* Find the first occurrence of NEEDLE in HAYSTACK, using case-insensitive
 comparison.
 Note: This function may, in multibyte locales, return success even if
***
*** 35,40 
--- 125,150 
  {
if (*needle != '\0')
  {
+   /* Minimizing the worst-case complexity:
+Let n = strlen(haystack), m = strlen(needle).
+The naïve algorithm is O(n*m) worst-case.
+The Knuth-Morris-Pratt algorithm is O(n) worst-case but it needs a
+memory allocation.
+To achieve linear complexity and yet amortize the cost of the memory
+allocation, we activate the Knuth-Morris-Pratt algorithm only once
+the naïve algorithm has already run for some time; more precisely,
+when
+  - the outer loop count is >= 10,
+  - the average number of comparisons per outer loop is >= 5,
+  - the total number of comparisons is >= m.
+But we try it only once.  If the memory allocation attempt failed,
+we don't retry it.  */
+   bool try_kmp = true;
+   size_t outer_loop_count = 0;
+   size_t comparison_count = 0;
+   size_t last_ccount = 0; /* last comparison count */
+   const char *needle_last_ccount = needle;/* = needle + 
last_ccount */
+ 
/* Speed up the following searches of needle by caching its first
 character.  */
unsigned char b = TOLOWER ((unsigned char) *ne

new module 'mbslen'

2007-02-11 Thread Bruno Haible
The function 'mbslen' turns out to be a common utility function, needed if
one needs to a memory allocation or similar before iterating through a string.

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

New module mbslen.
* modules/mbslen: New file.
* lib/mbslen.c: New file.
* lib/string_.h (mbslen): New declaration.
* m4/mbslen.m4: New file.
* m4/string_h.m4 (gl_STRING_MODULE_INDICATOR_DEFAULTS): Initialize
GNULIB_MBSLEN.
* modules/string (string.h): Also substitute GNULIB_MBSLEN.
* MODULES.html.sh (Internationalization functions): Add mbslen.

=== modules/mbslen ===
Description:
mbslen() function: Determine the number of multibyte characters in a string.

Files:
lib/mbslen.c
m4/mbslen.m4
m4/mbrtowc.m4

Depends-on:
mbuiter
string

configure.ac:
gl_FUNC_MBSLEN
gl_STRING_MODULE_INDICATOR([mbslen])

Makefile.am:
lib_SOURCES += mbslen.c

Include:


License:
LGPL

Maintainer:
Bruno Haible

 lib/mbslen.c 
/* Counting the multibyte characters in a string.
   Copyright (C) 2007 Free Software Foundation, Inc.
   Written by Bruno Haible <[EMAIL PROTECTED]>, 2007.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software Foundation,
   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */

#include 

/* Specification.  */
#include 

#if HAVE_MBRTOWC
# include "mbuiter.h"
#endif

/* Return the number of multibyte characters in the character string STRING.  */
size_t
mbslen (const char *string)
{
#if HAVE_MBRTOWC
  if (MB_CUR_MAX > 1)
{
  size_t count;
  mbui_iterator_t iter;

  count = 0;
  for (mbui_init (iter, string); mbui_avail (iter); mbui_advance (iter))
count++;

  return count;
}
  else
#endif
return strlen (string);
}
 m4/mbslen.m4 
# mbslen.m4 serial 1
dnl Copyright (C) 2007 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation
dnl gives unlimited permission to copy and/or distribute it,
dnl with or without modifications, as long as this notice is preserved.

AC_DEFUN([gl_FUNC_MBSLEN],
[
  gl_PREREQ_MBSLEN
])

# Prerequisites of lib/mbslen.c.
AC_DEFUN([gl_PREREQ_MBSLEN], [
  AC_REQUIRE([gl_FUNC_MBRTOWC])
  :
])
==
--- lib/string_.h   6 Feb 2007 01:59:41 -   1.19
+++ lib/string_.h   11 Feb 2007 16:57:48 -
@@ -349,6 +349,12 @@
 /* The following functions are not specified by POSIX.  They are gnulib
extensions.  */
 
+#if @GNULIB_MBSLEN@
+/* Return the number of multibyte characters in the character string STRING.
+   This considers multibyte characters, unlike strlen, which counts bytes.  */
+extern size_t mbslen (const char *string);
+#endif
+
 #if @GNULIB_MBSCHR@
 /* Locate the first single-byte character C in the character string STRING,
and return a pointer to it.  Return NULL if C is not found in STRING.
--- m4/string_h.m4  6 Feb 2007 01:59:41 -   1.17
+++ m4/string_h.m4  11 Feb 2007 16:57:48 -
@@ -67,6 +67,7 @@
   GNULIB_STRSEP=0;  AC_SUBST([GNULIB_STRSEP])
   GNULIB_STRCASESTR=0;  AC_SUBST([GNULIB_STRCASESTR])
   GNULIB_STRTOK_R=0;AC_SUBST([GNULIB_STRTOK_R])
+  GNULIB_MBSLEN=0;  AC_SUBST([GNULIB_MBSLEN])
   GNULIB_MBSCHR=0;  AC_SUBST([GNULIB_MBSCHR])
   GNULIB_MBSRCHR=0; AC_SUBST([GNULIB_MBSRCHR])
   GNULIB_MBSSTR=0;  AC_SUBST([GNULIB_MBSSTR])
--- modules/string  6 Feb 2007 01:59:41 -   1.16
+++ modules/string  11 Feb 2007 16:57:48 -
@@ -21,6 +21,7 @@
rm -f [EMAIL PROTECTED] $@
{ echo '/* DO NOT EDIT! GENERATED AUTOMATICALLY! */' && \
  sed -e 's|@''ABSOLUTE_STRING_H''@|$(ABSOLUTE_STRING_H)|g' \
+ -e 's|@''GNULIB_MBSLEN''@|$(GNULIB_MBSLEN)|g' \
  -e 's|@''GNULIB_MBSCHR''@|$(GNULIB_MBSCHR)|g' \
  -e 's|@''GNULIB_MBSRCHR''@|$(GNULIB_MBSRCHR)|g' \
  -e 's|@''GNULIB_MBSSTR''@|$(GNULIB_MBSSTR)|g' \
--- MODULES.html.sh 6 Feb 2007 01:59:41 -   1.190
+++ MODULES.html.sh 11 Feb 2007 16:57:48 -
@@ -2160,6 +2160,7 @@
   func_module iconvme
   func_module localcharset
   func_module hard-locale
+  func_module mbslen
   func_module mbschr
   func_module mbsrchr
   fun

copying multibyte string iterators

2007-02-11 Thread Bruno Haible
In rare cases, it is necessary to copy iterators (including the multibyte
state), rather than creating a new iterator.

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

* lib/mbiter.h: Include .
(mbiter_multi_copy): New function.
(mbi_copy): New macro.
* lib/mbuiter.h: Include .
(mbuiter_multi_copy): New function.
(mbui_copy): New macro.

*** lib/mbiter.h17 Aug 2005 13:56:26 -  1.2
--- lib/mbiter.h11 Feb 2007 17:11:21 -
***
*** 1,5 
  /* Iterating through multibyte strings: macros for multi-byte encodings.
!Copyright (C) 2001, 2005 Free Software Foundation, Inc.
  
 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
--- 1,5 
  /* Iterating through multibyte strings: macros for multi-byte encodings.
!Copyright (C) 2001, 2005, 2007 Free Software Foundation, Inc.
  
 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
***
*** 65,70 
--- 65,73 
 mbi_reloc (iter, ptrdiff)
   relocates iterator when the string is moved by ptrdiff bytes.
  
+mbi_copy (&destiter, &srciter)
+  copies srciter to destiter.
+ 
 Here are the function prototypes of the macros.
  
 extern voidmbi_init (mbi_iterator_t iter,
***
*** 74,79 
--- 77,83 
 extern mbchar_tmbi_cur (mbi_iterator_t iter);
 extern const char *mbi_cur_ptr (mbi_iterator_t iter);
 extern voidmbi_reloc (mbi_iterator_t iter, ptrdiff_t 
ptrdiff);
+extern voidmbi_copy (mbi_iterator_t *new, const 
mbi_iterator_t *old);
   */
  
  #ifndef _MBITER_H
***
*** 81,86 
--- 85,91 
  
  #include 
  #include 
+ #include 
  
  /* Tru64 with Desktop Toolkit C has a bug:  must be included before
 .
***
*** 174,179 
--- 179,196 
iter->limit += ptrdiff;
  }
  
+ static inline void
+ mbiter_multi_copy (struct mbiter_multi *new_iter, const struct mbiter_multi 
*old_iter)
+ {
+   new_iter->limit = old_iter->limit;
+   if ((new_iter->in_shift = old_iter->in_shift))
+ memcpy (&new_iter->state, &old_iter->state, sizeof (mbstate_t));
+   else
+ memset (&new_iter->state, 0, sizeof (mbstate_t));
+   new_iter->next_done = old_iter->next_done;
+   mb_copy (&new_iter->cur, &old_iter->cur);
+ }
+ 
  /* Iteration macros.  */
  typedef struct mbiter_multi mbi_iterator_t;
  #define mbi_init(iter, startptr, length) \
***
*** 192,195 
--- 209,215 
  /* Relocation.  */
  #define mbi_reloc(iter, ptrdiff) mbiter_multi_reloc (&iter, ptrdiff)
  
+ /* Copying an iterator.  */
+ #define mbi_copy mbiter_multi_copy
+ 
  #endif /* _MBITER_H */
*** lib/mbuiter.h   17 Aug 2005 13:58:34 -  1.1
--- lib/mbuiter.h   11 Feb 2007 17:11:21 -
***
*** 1,5 
  /* Iterating through multibyte strings: macros for multi-byte encodings.
!Copyright (C) 2001, 2005 Free Software Foundation, Inc.
  
 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
--- 1,5 
  /* Iterating through multibyte strings: macros for multi-byte encodings.
!Copyright (C) 2001, 2005, 2007 Free Software Foundation, Inc.
  
 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
***
*** 73,78 
--- 73,81 
 mbui_reloc (iter, ptrdiff)
   relocates iterator when the string is moved by ptrdiff bytes.
  
+mbui_copy (&destiter, &srciter)
+  copies srciter to destiter.
+ 
 Here are the function prototypes of the macros.
  
 extern voidmbui_init (mbui_iterator_t iter, const char 
*startptr);
***
*** 81,86 
--- 84,90 
 extern mbchar_tmbui_cur (mbui_iterator_t iter);
 extern const char *mbui_cur_ptr (mbui_iterator_t iter);
 extern voidmbui_reloc (mbui_iterator_t iter, ptrdiff_t 
ptrdiff);
+extern voidmbui_copy (mbui_iterator_t *new, const 
mbui_iterator_t *old);
   */
  
  #ifndef _MBUITER_H
***
*** 89,94 
--- 93,99 
  #include 
  #include 
  #include 
+ #include 
  
  /* Tru64 with Desktop Toolkit C has a bug:  must be included before
 .
***
*** 182,187 
--- 187,203 
iter->cur.ptr += ptrdiff;
  }
  
+ static inline void
+ mbuiter_multi_copy (struct mbuiter_multi *new_iter, const struct 
mbuiter_multi *old_iter)
+ {
+   if ((new_iter->in_shift = old_iter->in_shift))
+ memcpy (&new_iter->state, &old_iter->state, sizeof (mbstate_t));
+   else
+ memset (&new_iter->state, 0, sizeof (mbstate_t));
+   new_iter->next_done = old_iter->next_done;
+   mb_copy (&

Re: Java-related Bison tests all fail

2007-02-11 Thread Joel E. Denny
On Sun, 11 Feb 2007, Bruno Haible wrote:

> Joel E. Denny wrote:
> > I am able to reproduce the problem with coreutils 5.2.1 on both an x86 
> > Linux and an x86 Solaris.  I've attached the erroneous output.
> > 
> > The trouble is the last tr.  The following replacement, which I discovered 
> > by trial and error, works.  I just inserted zz after FG:
> 
> Thanks. I'm applying this workaround:

Thanks.  That works for me.




Re: javacomp.m4 usage [Fwd: Re: Java-related Bison tests all fail]

2007-02-11 Thread Joel E. Denny
On Sat, 10 Feb 2007, Bruno Haible wrote:

> 2007-02-10  Bruno Haible  <[EMAIL PROTECTED]>
> 
>   Enable the Java related testsuite tests when the only Java compiler
>   found is a gcj < 4.3.
>   * configure.ac (gt_JAVACOMP): Don't specify a target_version.
> 
> --- configure.ac29 Jan 2007 10:54:42 -  1.79
> +++ configure.ac10 Feb 2007 20:53:22 -
> @@ -136,7 +136,7 @@
>  O0CXXFLAGS=`echo $CXXFLAGS | sed 's/-O[[0-9]] *//'`
>  AC_SUBST([O0CXXFLAGS])
>  
> -gt_JAVACOMP([1.3], [1.3])
> +gt_JAVACOMP([1.3])
>  gt_JAVAEXEC
>  
>  AC_CONFIG_FILES([Makefile

With Bruno's recent Gnulib patch (working around the old tr bug), the 
above fixes Java support in Bison for me.  Paolo, is this fine with you?




Re: new module 'mbschr'

2007-02-11 Thread Bruno Haible
I'm adding this fix, and a test case.

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

* lib/mbschr.c (mbschr): Fix bug.

* modules/mbschr-tests: New file.
* tests/test-mbschr.sh: New file.
* tests/test-mbschr.c: New file.
* m4/locale-zh.m4: New file.

*** lib/mbschr.c5 Feb 2007 01:01:37 -   1.1
--- lib/mbschr.c11 Feb 2007 17:39:26 -
***
*** 41,51 
  
for (mbui_init (iter, string);; mbui_advance (iter))
{
  if (mb_len (mbui_cur (iter)) == 1
  && (unsigned char) * mbui_cur_ptr (iter) == (unsigned char) c)
break;
- if (!mbui_avail (iter))
-   goto notfound;
}
return (char *) mbui_cur_ptr (iter);
   notfound:
--- 41,51 
  
for (mbui_init (iter, string);; mbui_advance (iter))
{
+ if (!mbui_avail (iter))
+   goto notfound;
  if (mb_len (mbui_cur (iter)) == 1
  && (unsigned char) * mbui_cur_ptr (iter) == (unsigned char) c)
break;
}
return (char *) mbui_cur_ptr (iter);
   notfound:





Re: new module 'mbsrchr'

2007-02-11 Thread Bruno Haible
This module too requires a fix.

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

* lib/mbsrchr.c (mbsrchr): Fix bug.

* modules/mbsrchr-tests: New file.
* tests/test-mbsrchr.sh: New file.
* tests/test-mbsrchr.c: New file.

*** lib/mbsrchr.c   5 Feb 2007 01:07:28 -   1.1
--- lib/mbsrchr.c   11 Feb 2007 17:41:58 -
***
*** 40,52 
const char *result = NULL;
mbui_iterator_t iter;
  
!   for (mbui_init (iter, string);; mbui_advance (iter))
{
  if (mb_len (mbui_cur (iter)) == 1
  && (unsigned char) * mbui_cur_ptr (iter) == (unsigned char) c)
result = mbui_cur_ptr (iter);
- if (!mbui_avail (iter))
-   break;
}
return (char *) result;
  }
--- 40,50 
const char *result = NULL;
mbui_iterator_t iter;
  
!   for (mbui_init (iter, string); mbui_avail (iter); mbui_advance (iter))
{
  if (mb_len (mbui_cur (iter)) == 1
  && (unsigned char) * mbui_cur_ptr (iter) == (unsigned char) c)
result = mbui_cur_ptr (iter);
}
return (char *) result;
  }





Re: javacomp.m4 usage [Fwd: Re: Java-related Bison tests all fail]

2007-02-11 Thread Paolo Bonzini


With Bruno's recent Gnulib patch (working around the old tr bug), the 
above fixes Java support in Bison for me.  Paolo, is this fine with you?


Sure it is.

Paolo




mbsstr: avoid quadratic time consumption

2007-02-11 Thread Bruno Haible
This patch uses the Knuth-Morris-Pratt algorithm to ensure that mbsstr()
is O(n) worst-case.

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

Ensure O(n) worst-case complexity of mbsstr.
* lib/mbsstr.c: Include stdbool.h.
(knuth_morris_pratt_unibyte, knuth_morris_pratt_multibyte): New
functions.
(mbsstr): Add some bookkeeping. Invoke knuth_morris_pratt_* when the
bookkeeping indicates that it's worth it.
* modules/mbsstr (Depends-on): Add stdbool, mbslen, strnlen.

*** lib/mbsstr.c5 Feb 2007 01:36:34 -   1.2
--- lib/mbsstr.c11 Feb 2007 19:22:31 -
***
*** 21,32 
--- 21,232 
  /* Specification.  */
  #include 
  
+ #include 
  #include   /* for NULL, in case a nonstandard string.h lacks it */
  
  #if HAVE_MBRTOWC
  # include "mbuiter.h"
  #endif
  
+ /* Knuth-Morris-Pratt algorithm.
+See http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm
+Return a boolean indicating success.  */
+ 
+ static bool
+ knuth_morris_pratt_unibyte (const char *haystack, const char *needle,
+   const char **resultp)
+ {
+   size_t m = strlen (needle);
+ 
+   /* Allocate the table.  */
+   size_t *table = (size_t *) malloc (m * sizeof (size_t));
+   if (table == NULL)
+ return false;
+   /* Fill the table.
+  For 0 < i < m:
+0 < table[i] <= i is defined such that
+rhaystack[0..i-1] == needle[0..i-1] and rhaystack[i] != needle[i]
+implies
+forall 0 <= x < table[i]: rhaystack[x..x+m-1] != needle[0..m-1],
+and table[i] is as large as possible with this property.
+  table[0] remains uninitialized.  */
+   {
+ size_t i, j;
+ 
+ table[1] = 1;
+ j = 0;
+ for (i = 2; i < m; i++)
+   {
+   unsigned char b = (unsigned char) needle[i - 1];
+ 
+   for (;;)
+ {
+   if (b == (unsigned char) needle[j])
+ {
+   table[i] = i - ++j;
+   break;
+ }
+   if (j == 0)
+ {
+   table[i] = i;
+   break;
+ }
+   j = j - table[j];
+ }
+   }
+   }
+ 
+   /* Search, using the table to accelerate the processing.  */
+   {
+ size_t j;
+ const char *rhaystack;
+ const char *phaystack;
+ 
+ *resultp = NULL;
+ j = 0;
+ rhaystack = haystack;
+ phaystack = haystack;
+ /* Invariant: phaystack = rhaystack + j.  */
+ while (*phaystack != '\0')
+   if ((unsigned char) needle[j] == (unsigned char) *phaystack)
+   {
+ j++;
+ phaystack++;
+ if (j == m)
+   {
+ /* The entire needle has been found.  */
+ *resultp = rhaystack;
+ break;
+   }
+   }
+   else if (j > 0)
+   {
+ /* Found a match of needle[0..j-1], mismatch at needle[j].  */
+ rhaystack += table[j];
+ j -= table[j];
+   }
+   else
+   {
+ /* Found a mismatch at needle[0] already.  */
+ rhaystack++;
+ phaystack++;
+   }
+   }
+ 
+   free (table);
+   return true;
+ }
+ 
+ #if HAVE_MBRTOWC
+ static bool
+ knuth_morris_pratt_multibyte (const char *haystack, const char *needle,
+ const char **resultp)
+ {
+   size_t m = mbslen (needle);
+   mbchar_t *needle_mbchars;
+   size_t *table;
+ 
+   /* Allocate room for needle_mbchars and the table.  */
+   char *memory = (char *) malloc (m * (sizeof (mbchar_t) + sizeof (size_t)));
+   if (memory == NULL)
+ return false;
+   needle_mbchars = (mbchar_t *) memory;
+   table = (size_t *) (memory + m * sizeof (mbchar_t));
+ 
+   /* Fill needle_mbchars.  */
+   {
+ mbui_iterator_t iter;
+ size_t j;
+ 
+ j = 0;
+ for (mbui_init (iter, needle); mbui_avail (iter); mbui_advance (iter), 
j++)
+   mb_copy (&needle_mbchars[j], &mbui_cur (iter));
+   }
+ 
+   /* Fill the table.
+  For 0 < i < m:
+0 < table[i] <= i is defined such that
+rhaystack[0..i-1] == needle[0..i-1] and rhaystack[i] != needle[i]
+implies
+forall 0 <= x < table[i]: rhaystack[x..x+m-1] != needle[0..m-1],
+and table[i] is as large as possible with this property.
+  table[0] remains uninitialized.  */
+   {
+ size_t i, j;
+ 
+ table[1] = 1;
+ j = 0;
+ for (i = 2; i < m; i++)
+   {
+   mbchar_t *b = &needle_mbchars[i - 1];
+ 
+   for (;;)
+ {
+   if (mb_equal (*b, needle_mbchars[j]))
+ {
+   table[i] = i - ++j;
+   break;
+ }
+   if (j == 0)
+ {
+   table[i] = i;
+   break;
+ }
+   j = j - table[j];
+ }
+   }
+   }
+ 
+   /* Search, using the table to accelerate the processing.  */
+   {
+ size_t j;
+ mbui_iterator_t rhaystack;
+ mbui_iterator_t phaystack;
+ 
+ *resultp = NULL;
+ j = 0;
+ mb

Re: javacomp.m4 usage [Fwd: Re: Java-related Bison tests all fail]

2007-02-11 Thread Joel E. Denny
On Sun, 11 Feb 2007, Paolo Bonzini wrote:

> 
> > With Bruno's recent Gnulib patch (working around the old tr bug), the above
> > fixes Java support in Bison for me.  Paolo, is this fine with you?
> 
> Sure it is.

Thanks.  Committed.




Re: gcc-4.3, glibc and AC_PROG_CC_STDC

2007-02-11 Thread Ralf Wildenhues
Hello Bruno,

* Bruno Haible wrote on Sat, Feb 10, 2007 at 11:44:43PM CET:
>
> glibc's  from 2.3.x up to the most recent ones has a bug
[...]

> I think autoconf should do something about it.
> 
> There are two possibilities:
>   - Modify AC_PROG_CC_STDC so that it doesn't activate --std=gnu99 in this
> case,

This sounds far too drastic IMHO.  It would be seriously silly if
the GNU system would detect its own system as non-C99 one just for
one C99 related bug.  There are loads of useful C99 features and
programs that make use of them, independent of this issue.

Wait.  Did you mean for it to activate --std=c99 instead, or gnu89?

>   - Modify AC_CHECK_HEADERS so that it reports  missing in this case.

- Write a gl_HEADER_WCHAR_H that DTRT.  Use that.  If it's good,
  suggest it for inclusion in Autoconf as AC_HEADER_WCHAR_H.

Isn't GCC's fixincludes meant for exactly this sort of issues?
Why can't it be "fixed" there?  I mean, this issue is one that
touches all users of GCC, not just those that also happen to use
Autoconf.  And a fix there would kill the issue for all of them
(next to the obvious fix place: in (distributors' versions of)
glibc).

If the absolute header inclusion technique employed by gnulib is
preventing the fixinclude thingy to work, then I suggest that
gnulib be fixed.  I haven't checked though whether that is the
case here.

>  - gcc-4.3's behaviour is not likely to change (they have made the change
>precisely for implementing ISO C 99 more closely),
>  - gcc-4.3 should be released in spring 2008,

Hmm.  Given that there is still discussion and work on GCC for this
topic anyway[1], I don't think Autoconf should be doing anything just
yet.

Cheers,
Ralf

[1] 




wint_t in printf-args.c under mingw32

2007-02-11 Thread Ben Pfaff
While compiling GNU PSPP under mingw32, for portability testing
purposes, I noticed an error from printf-args.c in gnulib:

../../intl/printf-args.c: In function `printf_fetchargs':
../../intl/printf-args.c:83: warning: `wint_t' is promoted to `int' when passed 
through `...'
../../intl/printf-args.c:83: warning: (so you should pass `int' not `wint_t' to 
`va_arg')
../../intl/printf-args.c:83: note: if this code is reached, the program will 
abort

which corresponds to these source lines:

#ifdef HAVE_WINT_T
  case TYPE_WIDE_CHAR:
ap->a.a_wide_char = va_arg (args, wint_t);
break;
#endif

I see that in other cases in that file where the type passed to
va_arg is likely to be narrower than int, "int" is what is passed
to va_arg as the type.  Should the same tactic be used here?
-- 
Ben Pfaff 
[EMAIL PROTECTED]
http://benpfaff.org





mbscasestr: avoid quadratic time consumption

2007-02-11 Thread Bruno Haible
This patch uses the Knuth-Morris-Pratt algorithm to ensure that mbscasestr()
is O(n) worst-case.

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

Ensure O(n) worst-case complexity of mbscasestr.
* lib/mbscasestr.c: Include stdbool.h.
(knuth_morris_pratt_unibyte, knuth_morris_pratt_multibyte): New
functions.
(mbscasestr): Add some bookkeeping. Invoke knuth_morris_pratt_* when
the bookkeeping indicates that it's worth it.
* modules/mbscasestr (Depends-on): Add stdbool, mbslen, strnlen.

*** lib/mbscasestr.c5 Feb 2007 02:42:27 -   1.2
--- lib/mbscasestr.c11 Feb 2007 21:32:25 -
***
*** 22,27 
--- 22,28 
  #include 
  
  #include 
+ #include 
  #include   /* for NULL, in case a nonstandard string.h lacks it */
  
  #if HAVE_MBRTOWC
***
*** 30,35 
--- 31,247 
  
  #define TOLOWER(Ch) (isupper (Ch) ? tolower (Ch) : (Ch))
  
+ /* Knuth-Morris-Pratt algorithm.
+See http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm
+Return a boolean indicating success.  */
+ 
+ static bool
+ knuth_morris_pratt_unibyte (const char *haystack, const char *needle,
+   const char **resultp)
+ {
+   size_t m = strlen (needle);
+ 
+   /* Allocate the table.  */
+   size_t *table = (size_t *) malloc (m * sizeof (size_t));
+   if (table == NULL)
+ return false;
+   /* Fill the table.
+  For 0 < i < m:
+0 < table[i] <= i is defined such that
+rhaystack[0..i-1] == needle[0..i-1] and rhaystack[i] != needle[i]
+implies
+forall 0 <= x < table[i]: rhaystack[x..x+m-1] != needle[0..m-1],
+and table[i] is as large as possible with this property.
+  table[0] remains uninitialized.  */
+   {
+ size_t i, j;
+ 
+ table[1] = 1;
+ j = 0;
+ for (i = 2; i < m; i++)
+   {
+   unsigned char b = TOLOWER ((unsigned char) needle[i - 1]);
+ 
+   for (;;)
+ {
+   if (b == TOLOWER ((unsigned char) needle[j]))
+ {
+   table[i] = i - ++j;
+   break;
+ }
+   if (j == 0)
+ {
+   table[i] = i;
+   break;
+ }
+   j = j - table[j];
+ }
+   }
+   }
+ 
+   /* Search, using the table to accelerate the processing.  */
+   {
+ size_t j;
+ const char *rhaystack;
+ const char *phaystack;
+ 
+ *resultp = NULL;
+ j = 0;
+ rhaystack = haystack;
+ phaystack = haystack;
+ /* Invariant: phaystack = rhaystack + j.  */
+ while (*phaystack != '\0')
+   if (TOLOWER ((unsigned char) needle[j])
+ == TOLOWER ((unsigned char) *phaystack))
+   {
+ j++;
+ phaystack++;
+ if (j == m)
+   {
+ /* The entire needle has been found.  */
+ *resultp = rhaystack;
+ break;
+   }
+   }
+   else if (j > 0)
+   {
+ /* Found a match of needle[0..j-1], mismatch at needle[j].  */
+ rhaystack += table[j];
+ j -= table[j];
+   }
+   else
+   {
+ /* Found a mismatch at needle[0] already.  */
+ rhaystack++;
+ phaystack++;
+   }
+   }
+ 
+   free (table);
+   return true;
+ }
+ 
+ #if HAVE_MBRTOWC
+ static bool
+ knuth_morris_pratt_multibyte (const char *haystack, const char *needle,
+ const char **resultp)
+ {
+   size_t m = mbslen (needle);
+   mbchar_t *needle_mbchars;
+   size_t *table;
+ 
+   /* Allocate room for needle_mbchars and the table.  */
+   char *memory = (char *) malloc (m * (sizeof (mbchar_t) + sizeof (size_t)));
+   if (memory == NULL)
+ return false;
+   needle_mbchars = (mbchar_t *) memory;
+   table = (size_t *) (memory + m * sizeof (mbchar_t));
+ 
+   /* Fill needle_mbchars.  */
+   {
+ mbui_iterator_t iter;
+ size_t j;
+ 
+ j = 0;
+ for (mbui_init (iter, needle); mbui_avail (iter); mbui_advance (iter), 
j++)
+   {
+   mb_copy (&needle_mbchars[j], &mbui_cur (iter));
+   if (needle_mbchars[j].wc_valid)
+ needle_mbchars[j].wc = towlower (needle_mbchars[j].wc);
+   }
+   }
+ 
+   /* Fill the table.
+  For 0 < i < m:
+0 < table[i] <= i is defined such that
+rhaystack[0..i-1] == needle[0..i-1] and rhaystack[i] != needle[i]
+implies
+forall 0 <= x < table[i]: rhaystack[x..x+m-1] != needle[0..m-1],
+and table[i] is as large as possible with this property.
+  table[0] remains uninitialized.  */
+   {
+ size_t i, j;
+ 
+ table[1] = 1;
+ j = 0;
+ for (i = 2; i < m; i++)
+   {
+   mbchar_t *b = &needle_mbchars[i - 1];
+ 
+   for (;;)
+ {
+   if (mb_equal (*b, needle_mbchars[j]))
+ {
+   table[i] = i - ++j;
+   break;
+ }
+   if (j == 0)
+ {
+   table[i] = i;
+   break;
+ }
+ 

Re: new module mbscspn

2007-02-11 Thread Bruno Haible
A small optimization was possible:

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

* lib/mbscspn.c (mbscspn): Remove unnecessary strlen call and
unneeded cast.

--- lib/mbscspn.c   5 Feb 2007 02:52:43 -   1.1
+++ lib/mbscspn.c   11 Feb 2007 22:27:51 -
@@ -50,8 +50,8 @@
{
  if (mb_len (mbui_cur (iter)) == 1)
{
- if (mbschr (accept, (unsigned char) * mbui_cur_ptr (iter)))
-   return mbui_cur_ptr (iter) - string;
+ if (mbschr (accept, * mbui_cur_ptr (iter)))
+   goto found;
}
  else
{
@@ -61,10 +61,11 @@
   mbui_avail (aiter);
   mbui_advance (aiter))
if (mb_equal (mbui_cur (aiter), mbui_cur (iter)))
- return mbui_cur_ptr (iter) - string;
+ goto found;
}
}
-  return strlen (string);
+ found:
+  return mbui_cur_ptr (iter) - string;
 }
   else
 #endif





Re: new module mbspbrk

2007-02-11 Thread Bruno Haible
A cosmetic tweak:

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

* lib/mbspbrk.c (mbspbrk): Remove unneeded cast.

--- lib/mbspbrk.c   5 Feb 2007 03:12:26 -   1.1
+++ lib/mbspbrk.c   11 Feb 2007 22:32:41 -
@@ -46,7 +46,7 @@
{
  if (mb_len (mbui_cur (iter)) == 1)
{
- if (mbschr (accept, (unsigned char) * mbui_cur_ptr (iter)))
+ if (mbschr (accept, * mbui_cur_ptr (iter)))
return (char *) mbui_cur_ptr (iter);
}
  else





Re: new module mbsspn

2007-02-11 Thread Bruno Haible
This bug fix was necessary:

2007-02-11  Bruno Haible  <[EMAIL PROTECTED]>

* lib/mbsspn.c (mbsspn): Fix bug. Remove unnecessary strlen call.

--- lib/mbsspn.c5 Feb 2007 03:23:34 -   1.1
+++ lib/mbsspn.c11 Feb 2007 22:36:25 -
@@ -47,8 +47,8 @@
  for (mbui_init (iter, string); mbui_avail (iter); mbui_advance (iter))
if (!(mb_len (mbui_cur (iter)) == 1
  && (unsigned char) * mbui_cur_ptr (iter) == uc))
- return mbui_cur_ptr (iter) - string;
- return strlen (string);
+ break;
+ return mbui_cur_ptr (iter) - string;
}
   else
 #endif
@@ -71,25 +71,24 @@
{
  if (mb_len (mbui_cur (iter)) == 1)
{
- if (mbschr (reject, (unsigned char) * mbui_cur_ptr (iter)) == 
NULL)
-   return mbui_cur_ptr (iter) - string;
+ if (mbschr (reject, * mbui_cur_ptr (iter)) == NULL)
+   goto found;
}
  else
{
  mbui_iterator_t aiter;
 
- for (mbui_init (aiter, reject);
-  mbui_avail (aiter);
-  mbui_advance (aiter))
+ for (mbui_init (aiter, reject);; mbui_advance (aiter))
{
  if (!mbui_avail (aiter))
-   return mbui_cur_ptr (iter) - string;
+   goto found;
  if (mb_equal (mbui_cur (aiter), mbui_cur (iter)))
break;
}
}
}
-  return strlen (string);
+ found:
+  return mbui_cur_ptr (iter) - string;
 }
   else
 #endif





Re: wint_t in printf-args.c under mingw32

2007-02-11 Thread Bruno Haible
Ben Pfaff wrote:
> While compiling GNU PSPP under mingw32, for portability testing
> purposes, I noticed an error from printf-args.c in gnulib:
> 
> ../../intl/printf-args.c: In function `printf_fetchargs':
> ../../intl/printf-args.c:83: warning: `wint_t' is promoted to `int' when 
> passed through `...'
> ../../intl/printf-args.c:83: warning: (so you should pass `int' not `wint_t' 
> to `va_arg')
> ../../intl/printf-args.c:83: note: if this code is reached, the program will 
> abort
> 
> which corresponds to these source lines:
> 
> #ifdef HAVE_WINT_T
>   case TYPE_WIDE_CHAR:
>   ap->a.a_wide_char = va_arg (args, wint_t);
>   break;
> #endif

Thanks for reporting this. This is already fixed in gnulib for 6 months:

2006-07-22  Bruno Haible  <[EMAIL PROTECTED]>

Merge from GNU gettext 0.15.

2005-07-05  Bruno Haible  <[EMAIL PROTECTED]>

* printf-args.c (printf_fetchargs): Work around broken
definition of wint_t on mingw.





Re: wint_t in printf-args.c under mingw32

2007-02-11 Thread Ben Pfaff
Bruno Haible <[EMAIL PROTECTED]> writes:

> Thanks for reporting this. This is already fixed in gnulib for 6 months:

I thought I'd done a "cvs update" recently, but obviously not.
Thanks for taking the time to reply.
-- 
Ben Pfaff 
[EMAIL PROTECTED]
http://benpfaff.org





Re: gcc-4.3, glibc and AC_PROG_CC_STDC

2007-02-11 Thread Paul Eggert
I am CC'ing this message to gcc@gcc.gnu.org to give GCC developers a
heads-up on the situation.  For GCC developers: please see

for the problem with a gcc-4.3 snapshot with --std=gnu99 -O2" and
glibc wchar.h.

Ralf Wildenhues <[EMAIL PROTECTED]> writes:

> Given that there is still discussion and work on GCC for this topic
> anyway[1], I don't think Autoconf should be doing anything just yet.

Both of the solutions that Bruno suggested seem too drastic to me as
well.  I hope we can come up with something less intrusive.

Once GCC 4.3 settles down, perhaps we can work around any remaining
problems with gnulib's wchar module.

A less-appetizing possibility would be to modify AC_PROG_CC_STDC to
specify --std=gnu99 -fgnu89-extern-inline rather than plain
--std=gnu99 if 'configure' detects the problem with wchar.h, or with
any other standard include file.  Presumably this would occur only if
GCC 4.3 (or 4.4?) is used with an older glibc.  But for this case I'm
inclined to ask, along with Ralf, why GCC 4.3 doesn't apply
fixincludes to the troublesome include files.

> If the absolute header inclusion technique employed by gnulib is
> preventing the fixinclude thingy to work,

That should not be a problem.  gnulib's absolute header method should
work even when fixincludes has generated the include file.  (I haven't
tested this, though.)