Re: memmem issues

2007-12-31 Thread Bruno Haible
Ben Pfaff wrote:
> >> > +unsigned char b = (unsigned char) needle[i - 1];
> >> > ...
> >> > +if (b == (unsigned char) needle[j])
> >>
> >> Would it be cleaner to declare 'b' to be of type 'char' and avoid the
> >> casts?
> >
> > No; ISO C 99 section 7.21.4 says that when byte strings are compared the
> > elements are considered as 'unsigned char' values.
> 
> The first cast to unsigned char quoted above seems to be
> unnecessary: assigning a value to an object of type unsigned char
> will implicitly convert it to unsigned char.

The answer was already in the mail to which you replied:

  Why risk bugs when the approach is very simple: after fetching
  any 'char' from any of the strings, cast it to 'unsigned char'.
  Simple rule, simple to remember, works fine.

When you omit the cast, the next person who does a refactoring of the code
and who was not aware that you were relying on implicit conversions will
introduce a bug.

Code maintainability is about clearly expressing the code's intent. Either
in comments or - here - through seemingly redundant casts.

Bruno




Re: memmem issues

2007-12-31 Thread Bruno Haible
Paul Eggert wrote:
> The 'char' type can have padding bits, which means that code like this:
> 
>  char const *needle = (char const *) needle_start;
>  char const *haystack = (char const *) haystack_start;
>  if ((unsigned char) *needle == (unsigned char) *haystack) ...
> 
> might ignore some of the bits in the storage addressed by A and B.
> To get a proper memcmp-style comparison one must do something like this:
> 
>  unsigned char const *needle = (unsigned char const *) needle_start;
>  unsigned char const *haystack = (unsigned char const *) haystack_start;
>  if (*needle == *haystack) ...
> 
> (In the latter version, the casts are needed only because this code is
> intended to be portable to C++.)

I disagree. Citing ISO C 99, section 7.21.4:

   "The sign of a nonzero value returned by the comparison functions
memcmp, strcmp, and strncmp is determined by the sign of the
difference between the values of the first pair of characters
(both interpreted as unsigned char) that differ in the objects
being compared."

It says that the _values_ of the characters are interpreted as
unsigned char. I.e. like this:

if ((unsigned char) *needle == (unsigned char) *haystack) ...

It does _not_ say that the memory storage should be interpreted as
'unsigned char[]'.

Bruno





Re: memmem issues

2007-12-31 Thread Bruno Haible
Paul Eggert wrote:
> OK, I installed this patch.
> 
> 2007-12-29  Paul Eggert  <[EMAIL PROTECTED]>
> 
>   * lib/memmem.c (knuth_morris_pratt): Check for size_t overflow
>   when multiplying M by sizeof (size_t).

Thanks. Let me generalize it, as follows.


2007-12-30  Bruno Haible  <[EMAIL PROTECTED]>

* lib/malloca.h (nmalloca): New macro.
* lib/c-strcasestr.c (knuth_morris_pratt): Use it.
* lib/c-strstr.c (knuth_morris_pratt): Likewise.
* lib/mbscasestr.c (knuth_morris_pratt_unibyte,
knuth_morris_pratt_multibyte): Likewise.
* lib/mbsstr.c (knuth_morris_pratt_unibyte,
knuth_morris_pratt_multibyte): Likewise.
* lib/memmem.c (knuth_morris_pratt): Likewise.
* lib/strcasestr.c (knuth_morris_pratt): Likewise.

*** lib/malloca.h.orig  2007-12-30 16:44:24.0 +0100
--- lib/malloca.h   2007-12-30 16:21:32.0 +0100
***
*** 70,78 
  # define freea free
  #endif
  
! /* Maybe we should also define a variant
! nmalloca (size_t n, size_t s) - behaves like malloca (n * s)
!If this would be useful in your application. please speak up.  */
  
  
  #ifdef __cplusplus
--- 70,88 
  # define freea free
  #endif
  
! /* nmalloca(N,S) is an overflow-safe variant of malloca (N * S).
!It allocates an array of N objects, each with S bytes of memory,
!on the stack.  S must be positive and N must be nonnegative.
!The array must be freed using freea() before the function returns.  */
! #if 1
! /* Cf. the definition of xalloc_oversized.  */
! # define nmalloca(n, s) \
! ((n) > (size_t) (sizeof (ptrdiff_t) <= sizeof (size_t) ? -1 : -2) / (s) \
!  ? NULL \
!  : malloca ((n) * (s)))
! #else
! extern void * nmalloca (size_t n, size_t s);
! #endif
  
  
  #ifdef __cplusplus
*** lib/c-strcasestr.c.orig 2007-12-30 16:44:24.0 +0100
--- lib/c-strcasestr.c  2007-12-30 16:37:08.0 +0100
***
*** 37,43 
size_t m = strlen (needle);
  
/* Allocate the table.  */
!   size_t *table = (size_t *) malloca (m * sizeof (size_t));
if (table == NULL)
  return false;
/* Fill the table.
--- 37,43 
size_t m = strlen (needle);
  
/* Allocate the table.  */
!   size_t *table = (size_t *) nmalloca (m, sizeof (size_t));
if (table == NULL)
  return false;
/* Fill the table.
*** lib/c-strstr.c.orig 2007-12-30 16:44:24.0 +0100
--- lib/c-strstr.c  2007-12-30 16:37:22.0 +0100
***
*** 36,42 
size_t m = strlen (needle);
  
/* Allocate the table.  */
!   size_t *table = (size_t *) malloca (m * sizeof (size_t));
if (table == NULL)
  return false;
/* Fill the table.
--- 36,42 
size_t m = strlen (needle);
  
/* Allocate the table.  */
!   size_t *table = (size_t *) nmalloca (m, sizeof (size_t));
if (table == NULL)
  return false;
/* Fill the table.
*** lib/mbscasestr.c.orig   2007-12-30 16:44:24.0 +0100
--- lib/mbscasestr.c2007-12-30 16:37:58.0 +0100
***
*** 42,48 
size_t m = strlen (needle);
  
/* Allocate the table.  */
!   size_t *table = (size_t *) malloca (m * sizeof (size_t));
if (table == NULL)
  return false;
/* Fill the table.
--- 42,48 
size_t m = strlen (needle);
  
/* Allocate the table.  */
!   size_t *table = (size_t *) nmalloca (m, sizeof (size_t));
if (table == NULL)
  return false;
/* Fill the table.
***
*** 164,170 
size_t *table;
  
/* Allocate room for needle_mbchars and the table.  */
!   char *memory = (char *) malloca (m * (sizeof (mbchar_t) + sizeof (size_t)));
if (memory == NULL)
  return false;
needle_mbchars = (mbchar_t *) memory;
--- 164,170 
size_t *table;
  
/* Allocate room for needle_mbchars and the table.  */
!   char *memory = (char *) nmalloca (m, sizeof (mbchar_t) + sizeof (size_t));
if (memory == NULL)
  return false;
needle_mbchars = (mbchar_t *) memory;
*** lib/mbsstr.c.orig   2007-12-30 16:44:24.0 +0100
--- lib/mbsstr.c2007-12-30 16:39:36.0 +0100
***
*** 39,45 
size_t m = strlen (needle);
  
/* Allocate the table.  */
!   size_t *table = (size_t *) malloca (m * sizeof (size_t));
if (table == NULL)
  return false;
/* Fill the table.
--- 39,45 
size_t m = strlen (needle);
  
/* Allocate the table.  */
!   size_t *table = (size_t *) nmalloca (m, sizeof (size_t));
if (table == NULL)
  return false;
/* Fill the table.
***
*** 160,166 
size_t *table;
  
/* Allocate room for needle_mbchars and the table.  */
!   char *memory = (char *) malloca (m * (sizeof (mbchar_t) + sizeof (size_t)));
if (memory == NULL)
  return false;
needle_mbchars = (mbchar_t *) memory;
--- 160,166 
size_t *table;
  
/* Allocate room for needle_mbchars and the table.  */
!   char *memory = (char 

new macro xnmalloca

2007-12-31 Thread Bruno Haible
Since there was need for a macro 'nmalloca', there will also be need for
'xnmalloca'.

This also fixes a bug: On platforms without working alloca(), xmalloc()
could be used without being declared.


2007-12-30  Bruno Haible  <[EMAIL PROTECTED]>

* lib/xmalloca.h: Include xalloc.h.
(xnmalloca): New macro.

*** lib/xmalloca.h.orig 2007-12-30 16:44:24.0 +0100
--- lib/xmalloca.h  2007-12-30 16:30:56.0 +0100
***
*** 19,24 
--- 19,25 
  #define _XMALLOCA_H
  
  #include "malloca.h"
+ #include "xalloc.h"
  
  
  #ifdef __cplusplus
***
*** 40,48 
xmalloc (N)
  #endif
  
! /* Maybe we should also define a variant
! xnmalloca (size_t n, size_t s) - behaves like xmalloca (n * s)
!If this would be useful in your application. please speak up.  */
  
  
  #ifdef __cplusplus
--- 41,59 
xmalloc (N)
  #endif
  
! /* xnmalloca(N,S) is an overflow-safe variant of xmalloca (N * S).
!It allocates an array of N objects, each with S bytes of memory,
!on the stack.  S must be positive and N must be nonnegative.
!The array must be freed using freea() before the function returns.
!Upon failure, it exits with an error message.  */
! #if HAVE_ALLOCA
! /* Rely on xmalloca (SIZE_MAX) calling xalloc_die ().  */
! # define xnmalloca(n, s) \
! xmalloca (xalloc_oversized ((n), (s)) ? (size_t) (-1) : (n) * (s))
! #else
! # define xnmalloca(n, s) \
! xnmalloc ((n), (s))
! #endif
  
  
  #ifdef __cplusplus





xmalloca: small optimization

2007-12-31 Thread Bruno Haible
This is a small optimization: On platform without alloca(), the xmmalloca()
does not need to be defined, since nothing uses it.

2007-12-30  Bruno Haible  <[EMAIL PROTECTED]>

* lib/xmalloca.c (xmmalloca): Don't define if HAVE_ALLOCA is not
defined.

*** lib/xmalloca.c.orig 2007-12-30 16:44:24.0 +0100
--- lib/xmalloca.c  2007-12-30 16:34:13.0 +0100
***
*** 22,27 
--- 22,29 
  
  #include "xalloc.h"
  
+ #if HAVE_ALLOCA
+ 
  void *
  xmmalloca (size_t n)
  {
***
*** 32,34 
--- 34,38 
  xalloc_die ();
return p;
  }
+ 
+ #endif





Re: memmem issues

2007-12-31 Thread Bruno Haible
Paul Eggert wrote:
> Bruno Haible <[EMAIL PROTECTED]> writes:
> 
> > Most of your comments apply to all copies of the KMP code in gnulib.
> 
> Ouch!  Should these be coalesced?

Out of the 8 copies of the code currently in gnulib, I can easily coalesce
5 into 1; see attached patch. The remaining 4 copies, however, cannot be
merged without making them less efficient or without introducing ugly #ifs.

Bruno


2007-12-30  Bruno Haible  <[EMAIL PROTECTED]>

Unify 5 copies of the KMP code.
* lib/str-kmp.h: New file.
* lib/c-strcasestr.c: Include str-kmp.h.
(knuth_morris_pratt): Remove function.
(c_strcasestr): Update.
* lib/c-strstr.c: Include str-kmp.h.
(knuth_morris_pratt): Remove function.
(c_strcasestr): Update.
* lib/mbscasestr.c: Include str-kmp.h.
(knuth_morris_pratt_unibyte): Remove function.
* lib/mbsstr.c: Include str-kmp.h.
(knuth_morris_pratt_unibyte): Remove function.
* lib/strcasestr.c: Include str-kmp.h.
(knuth_morris_pratt): Remove function.
(strcasestr): Update.
* modules/c-strcasestr (Files): Add lib/str-kmp.h.
* modules/c-strstr (Files): Likewise.
* modules/mbscasestr (Files): Likewise.
* modules/mbsstr (Files): Likewise.
* modules/strcasestr (Files): Likewise.
Suggested by Paul Eggert.

*** lib/c-strcasestr.c.orig 2007-12-30 17:20:46.0 +0100
--- lib/c-strcasestr.c  2007-12-30 17:13:41.0 +0100
***
*** 27,153 
  #include "malloca.h"
  #include "c-ctype.h"
  
! /* Knuth-Morris-Pratt algorithm.
!See http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm
!Return a boolean indicating success.  */
! static bool
! knuth_morris_pratt (const char *haystack, const char *needle,
!   const char **resultp)
! {
!   size_t m = strlen (needle);
! 
!   /* Allocate the table.  */
!   size_t *table = (size_t *) nmalloca (m, sizeof (size_t));
!   if (table == NULL)
! return false;
!   /* Fill the table.
!  For 0 < i < m:
!0 < table[i] <= i is defined such that
!forall 0 < x < table[i]: needle[x..i-1] != needle[0..i-1-x],
!and table[i] is as large as possible with this property.
!  This implies:
!  1) For 0 < i < m:
!   If table[i] < i,
!   needle[table[i]..i-1] = needle[0..i-1-table[i]].
!  2) For 0 < i < m:
!   rhaystack[0..i-1] == needle[0..i-1]
!   and exists h, i <= h < m: rhaystack[h] != needle[h]
!   implies
!   forall 0 <= x < table[i]: rhaystack[x..x+m-1] != needle[0..m-1].
!  table[0] remains uninitialized.  */
!   {
! size_t i, j;
! 
! /* i = 1: Nothing to verify for x = 0.  */
! table[1] = 1;
! j = 0;
! 
! for (i = 2; i < m; i++)
!   {
!   /* Here: j = i-1 - table[i-1].
!  The inequality needle[x..i-1] != needle[0..i-1-x] is known to hold
!  for x < table[i-1], by induction.
!  Furthermore, if j>0: needle[i-1-j..i-2] = needle[0..j-1].  */
!   unsigned char b = c_tolower ((unsigned char) needle[i - 1]);
! 
!   for (;;)
! {
!   /* Invariants: The inequality needle[x..i-1] != needle[0..i-1-x]
!  is known to hold for x < i-1-j.
!  Furthermore, if j>0: needle[i-1-j..i-2] = needle[0..j-1].  */
!   if (b == c_tolower ((unsigned char) needle[j]))
! {
!   /* Set table[i] := i-1-j.  */
!   table[i] = i - ++j;
!   break;
! }
!   /* The inequality needle[x..i-1] != needle[0..i-1-x] also holds
!  for x = i-1-j, because
!needle[i-1] != needle[j] = needle[i-1-x].  */
!   if (j == 0)
! {
!   /* The inequality holds for all possible x.  */
!   table[i] = i;
!   break;
! }
!   /* The inequality needle[x..i-1] != needle[0..i-1-x] also holds
!  for i-1-j < x < i-1-j+table[j], because for these x:
!needle[x..i-2]
!= needle[x-(i-1-j)..j-1]
!!= needle[0..j-1-(x-(i-1-j))]  (by definition of table[j])
!   = needle[0..i-2-x],
!  hence needle[x..i-1] != needle[0..i-1-x].
!  Furthermore
!needle[i-1-j+table[j]..i-2]
!= needle[table[j]..j-1]
!= needle[0..j-1-table[j]]  (by definition of table[j]).  */
!   j = j - table[j];
! }
!   /* Here: j = i - table[i].  */
!   }
!   }
! 
!   /* Search, using the table to accelerate the processing.  */
!   {
! size_t j;
! const char *rhaystack;
! const char *phaystack;
! 
! *resultp = NULL;
! j = 0;
! rhaystack = haystack;
! phaystack = haystack;
! /* Invariant: phaystack = rhaystack + j.  */
! while (*phaystack != '\0')
!   if (c_tolower ((unsigned char) needle[j])
! == c_tolower (

commit to gnulib.texi

2007-12-31 Thread Bruno Haible
Hi Karl,

You did this commit:
  
http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=bf156d57d0488caa843739e039238d56fde4507d

Could you please add a ChangeLog entry and notify the mailing list about
the change, preferably with attached diffs? That's the common process here
for peer review, and peer review ensures good quality of the commits.

If you don't believe this, please take a look at the other one of your
commits yesterday:
  
http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=b463b61cfbce5128bbec87be04104535349fdec8
It contains a grammatical mistake.

Bruno




[PATCH] Avoid use of private FTS type name.

2007-12-31 Thread Jim Meyering
FYI, I've just pushed this change:

Avoid use of private FTS type name.
* lib/fts.c (fts_sort): Use FTSENT rather than "struct _ftsent".

diff --git a/lib/fts.c b/lib/fts.c
index ceb8935..82ea8f6 100644
--- a/lib/fts.c
+++ b/lib/fts.c
@@ -1475,7 +1475,7 @@ fts_sort (FTS *sp, FTSENT *head, register size_t nitems)
 * 40 so don't realloc one entry at a time.
 */
if (nitems > sp->fts_nitems) {
-   struct _ftsent **a;
+   FTSENT **a;

sp->fts_nitems = nitems + 40;
if (SIZE_MAX / sizeof *a < sp->fts_nitems
--
1.5.4.rc2.1.gc00e




Re: memmem issues

2007-12-31 Thread Paul Eggert
Bruno Haible <[EMAIL PROTECTED]> writes:

> It says that the _values_ of the characters are interpreted as
> unsigned char. I.e. like this:

Sure, but a character, as used in the sense here, is a bit
representation that fits in a byte (see section 3.7.1); it is not a
'char' value.  The intent is that memcmp looks at all the bits of the
underlying representation.  See, for example, footnote 43 of section
6.2.6.1:

   It is possible for objects x and y with the same effective type T
   to have the same value when they are accessed as objects of type T,
   but to have different values in other contexts. In particular, if
   == is defined for type T, then x == y does not imply that
   memcmp(&x, &y, sizeof (T)) == 0.  Furthermore, x == y does not
   necessarily imply that x and y have the same value; other
   operations on values of type T may distinguish between them.

Here again, the intent is that memcmp(&x, &y, sizeof (T)) compares all
the bits in the underlying representation.  The only way to do that
portably is via "unsigned char *".




Re: commit to gnulib.texi

2007-12-31 Thread Karl Berry
You did this commit:
  
http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=bf156d57d0488caa843739e039238d56fde4507d

Could you please add a ChangeLog entry and notify the mailing list about
the change, preferably with attached diffs? 

Yes, I apologize for forgetting to do that.  I'll send a separate msg
with the info.

please take a look at the other one of your
commits yesterday:
  
http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=b463b61cfbce5128bbec87be04104535349fdec8
It contains a grammatical mistake.

I do not see it.  Please tell me.

karl




Re: commit to gnulib.texi

2007-12-31 Thread Karl Berry
>Could you please add a ChangeLog entry 

2007-12-30  Karl Berry  <[EMAIL PROTECTED]>
* doc/gnulib.texi (Library vs. Reusable Code): remove period, to
work around defect in Texinfo and/or the standalone Info browser.

>preferably with attached diffs? 

In this case, you already gave the url:
http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=bf156d57d0488caa843739e039238d56fde4507d

Rationale: this is simply the most expedient way to work around the
present suboptimal situation wrt to periods in Texinfo node names.

karl