Re: memmem issues
Ben Pfaff wrote: > >> > +unsigned char b = (unsigned char) needle[i - 1]; > >> > ... > >> > +if (b == (unsigned char) needle[j]) > >> > >> Would it be cleaner to declare 'b' to be of type 'char' and avoid the > >> casts? > > > > No; ISO C 99 section 7.21.4 says that when byte strings are compared the > > elements are considered as 'unsigned char' values. > > The first cast to unsigned char quoted above seems to be > unnecessary: assigning a value to an object of type unsigned char > will implicitly convert it to unsigned char. The answer was already in the mail to which you replied: Why risk bugs when the approach is very simple: after fetching any 'char' from any of the strings, cast it to 'unsigned char'. Simple rule, simple to remember, works fine. When you omit the cast, the next person who does a refactoring of the code and who was not aware that you were relying on implicit conversions will introduce a bug. Code maintainability is about clearly expressing the code's intent. Either in comments or - here - through seemingly redundant casts. Bruno
Re: memmem issues
Paul Eggert wrote: > The 'char' type can have padding bits, which means that code like this: > > char const *needle = (char const *) needle_start; > char const *haystack = (char const *) haystack_start; > if ((unsigned char) *needle == (unsigned char) *haystack) ... > > might ignore some of the bits in the storage addressed by A and B. > To get a proper memcmp-style comparison one must do something like this: > > unsigned char const *needle = (unsigned char const *) needle_start; > unsigned char const *haystack = (unsigned char const *) haystack_start; > if (*needle == *haystack) ... > > (In the latter version, the casts are needed only because this code is > intended to be portable to C++.) I disagree. Citing ISO C 99, section 7.21.4: "The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared." It says that the _values_ of the characters are interpreted as unsigned char. I.e. like this: if ((unsigned char) *needle == (unsigned char) *haystack) ... It does _not_ say that the memory storage should be interpreted as 'unsigned char[]'. Bruno
Re: memmem issues
Paul Eggert wrote: > OK, I installed this patch. > > 2007-12-29 Paul Eggert <[EMAIL PROTECTED]> > > * lib/memmem.c (knuth_morris_pratt): Check for size_t overflow > when multiplying M by sizeof (size_t). Thanks. Let me generalize it, as follows. 2007-12-30 Bruno Haible <[EMAIL PROTECTED]> * lib/malloca.h (nmalloca): New macro. * lib/c-strcasestr.c (knuth_morris_pratt): Use it. * lib/c-strstr.c (knuth_morris_pratt): Likewise. * lib/mbscasestr.c (knuth_morris_pratt_unibyte, knuth_morris_pratt_multibyte): Likewise. * lib/mbsstr.c (knuth_morris_pratt_unibyte, knuth_morris_pratt_multibyte): Likewise. * lib/memmem.c (knuth_morris_pratt): Likewise. * lib/strcasestr.c (knuth_morris_pratt): Likewise. *** lib/malloca.h.orig 2007-12-30 16:44:24.0 +0100 --- lib/malloca.h 2007-12-30 16:21:32.0 +0100 *** *** 70,78 # define freea free #endif ! /* Maybe we should also define a variant ! nmalloca (size_t n, size_t s) - behaves like malloca (n * s) !If this would be useful in your application. please speak up. */ #ifdef __cplusplus --- 70,88 # define freea free #endif ! /* nmalloca(N,S) is an overflow-safe variant of malloca (N * S). !It allocates an array of N objects, each with S bytes of memory, !on the stack. S must be positive and N must be nonnegative. !The array must be freed using freea() before the function returns. */ ! #if 1 ! /* Cf. the definition of xalloc_oversized. */ ! # define nmalloca(n, s) \ ! ((n) > (size_t) (sizeof (ptrdiff_t) <= sizeof (size_t) ? -1 : -2) / (s) \ ! ? NULL \ ! : malloca ((n) * (s))) ! #else ! extern void * nmalloca (size_t n, size_t s); ! #endif #ifdef __cplusplus *** lib/c-strcasestr.c.orig 2007-12-30 16:44:24.0 +0100 --- lib/c-strcasestr.c 2007-12-30 16:37:08.0 +0100 *** *** 37,43 size_t m = strlen (needle); /* Allocate the table. */ ! size_t *table = (size_t *) malloca (m * sizeof (size_t)); if (table == NULL) return false; /* Fill the table. --- 37,43 size_t m = strlen (needle); /* Allocate the table. */ ! size_t *table = (size_t *) nmalloca (m, sizeof (size_t)); if (table == NULL) return false; /* Fill the table. *** lib/c-strstr.c.orig 2007-12-30 16:44:24.0 +0100 --- lib/c-strstr.c 2007-12-30 16:37:22.0 +0100 *** *** 36,42 size_t m = strlen (needle); /* Allocate the table. */ ! size_t *table = (size_t *) malloca (m * sizeof (size_t)); if (table == NULL) return false; /* Fill the table. --- 36,42 size_t m = strlen (needle); /* Allocate the table. */ ! size_t *table = (size_t *) nmalloca (m, sizeof (size_t)); if (table == NULL) return false; /* Fill the table. *** lib/mbscasestr.c.orig 2007-12-30 16:44:24.0 +0100 --- lib/mbscasestr.c2007-12-30 16:37:58.0 +0100 *** *** 42,48 size_t m = strlen (needle); /* Allocate the table. */ ! size_t *table = (size_t *) malloca (m * sizeof (size_t)); if (table == NULL) return false; /* Fill the table. --- 42,48 size_t m = strlen (needle); /* Allocate the table. */ ! size_t *table = (size_t *) nmalloca (m, sizeof (size_t)); if (table == NULL) return false; /* Fill the table. *** *** 164,170 size_t *table; /* Allocate room for needle_mbchars and the table. */ ! char *memory = (char *) malloca (m * (sizeof (mbchar_t) + sizeof (size_t))); if (memory == NULL) return false; needle_mbchars = (mbchar_t *) memory; --- 164,170 size_t *table; /* Allocate room for needle_mbchars and the table. */ ! char *memory = (char *) nmalloca (m, sizeof (mbchar_t) + sizeof (size_t)); if (memory == NULL) return false; needle_mbchars = (mbchar_t *) memory; *** lib/mbsstr.c.orig 2007-12-30 16:44:24.0 +0100 --- lib/mbsstr.c2007-12-30 16:39:36.0 +0100 *** *** 39,45 size_t m = strlen (needle); /* Allocate the table. */ ! size_t *table = (size_t *) malloca (m * sizeof (size_t)); if (table == NULL) return false; /* Fill the table. --- 39,45 size_t m = strlen (needle); /* Allocate the table. */ ! size_t *table = (size_t *) nmalloca (m, sizeof (size_t)); if (table == NULL) return false; /* Fill the table. *** *** 160,166 size_t *table; /* Allocate room for needle_mbchars and the table. */ ! char *memory = (char *) malloca (m * (sizeof (mbchar_t) + sizeof (size_t))); if (memory == NULL) return false; needle_mbchars = (mbchar_t *) memory; --- 160,166 size_t *table; /* Allocate room for needle_mbchars and the table. */ ! char *memory = (char
new macro xnmalloca
Since there was need for a macro 'nmalloca', there will also be need for 'xnmalloca'. This also fixes a bug: On platforms without working alloca(), xmalloc() could be used without being declared. 2007-12-30 Bruno Haible <[EMAIL PROTECTED]> * lib/xmalloca.h: Include xalloc.h. (xnmalloca): New macro. *** lib/xmalloca.h.orig 2007-12-30 16:44:24.0 +0100 --- lib/xmalloca.h 2007-12-30 16:30:56.0 +0100 *** *** 19,24 --- 19,25 #define _XMALLOCA_H #include "malloca.h" + #include "xalloc.h" #ifdef __cplusplus *** *** 40,48 xmalloc (N) #endif ! /* Maybe we should also define a variant ! xnmalloca (size_t n, size_t s) - behaves like xmalloca (n * s) !If this would be useful in your application. please speak up. */ #ifdef __cplusplus --- 41,59 xmalloc (N) #endif ! /* xnmalloca(N,S) is an overflow-safe variant of xmalloca (N * S). !It allocates an array of N objects, each with S bytes of memory, !on the stack. S must be positive and N must be nonnegative. !The array must be freed using freea() before the function returns. !Upon failure, it exits with an error message. */ ! #if HAVE_ALLOCA ! /* Rely on xmalloca (SIZE_MAX) calling xalloc_die (). */ ! # define xnmalloca(n, s) \ ! xmalloca (xalloc_oversized ((n), (s)) ? (size_t) (-1) : (n) * (s)) ! #else ! # define xnmalloca(n, s) \ ! xnmalloc ((n), (s)) ! #endif #ifdef __cplusplus
xmalloca: small optimization
This is a small optimization: On platform without alloca(), the xmmalloca() does not need to be defined, since nothing uses it. 2007-12-30 Bruno Haible <[EMAIL PROTECTED]> * lib/xmalloca.c (xmmalloca): Don't define if HAVE_ALLOCA is not defined. *** lib/xmalloca.c.orig 2007-12-30 16:44:24.0 +0100 --- lib/xmalloca.c 2007-12-30 16:34:13.0 +0100 *** *** 22,27 --- 22,29 #include "xalloc.h" + #if HAVE_ALLOCA + void * xmmalloca (size_t n) { *** *** 32,34 --- 34,38 xalloc_die (); return p; } + + #endif
Re: memmem issues
Paul Eggert wrote: > Bruno Haible <[EMAIL PROTECTED]> writes: > > > Most of your comments apply to all copies of the KMP code in gnulib. > > Ouch! Should these be coalesced? Out of the 8 copies of the code currently in gnulib, I can easily coalesce 5 into 1; see attached patch. The remaining 4 copies, however, cannot be merged without making them less efficient or without introducing ugly #ifs. Bruno 2007-12-30 Bruno Haible <[EMAIL PROTECTED]> Unify 5 copies of the KMP code. * lib/str-kmp.h: New file. * lib/c-strcasestr.c: Include str-kmp.h. (knuth_morris_pratt): Remove function. (c_strcasestr): Update. * lib/c-strstr.c: Include str-kmp.h. (knuth_morris_pratt): Remove function. (c_strcasestr): Update. * lib/mbscasestr.c: Include str-kmp.h. (knuth_morris_pratt_unibyte): Remove function. * lib/mbsstr.c: Include str-kmp.h. (knuth_morris_pratt_unibyte): Remove function. * lib/strcasestr.c: Include str-kmp.h. (knuth_morris_pratt): Remove function. (strcasestr): Update. * modules/c-strcasestr (Files): Add lib/str-kmp.h. * modules/c-strstr (Files): Likewise. * modules/mbscasestr (Files): Likewise. * modules/mbsstr (Files): Likewise. * modules/strcasestr (Files): Likewise. Suggested by Paul Eggert. *** lib/c-strcasestr.c.orig 2007-12-30 17:20:46.0 +0100 --- lib/c-strcasestr.c 2007-12-30 17:13:41.0 +0100 *** *** 27,153 #include "malloca.h" #include "c-ctype.h" ! /* Knuth-Morris-Pratt algorithm. !See http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm !Return a boolean indicating success. */ ! static bool ! knuth_morris_pratt (const char *haystack, const char *needle, ! const char **resultp) ! { ! size_t m = strlen (needle); ! ! /* Allocate the table. */ ! size_t *table = (size_t *) nmalloca (m, sizeof (size_t)); ! if (table == NULL) ! return false; ! /* Fill the table. ! For 0 < i < m: !0 < table[i] <= i is defined such that !forall 0 < x < table[i]: needle[x..i-1] != needle[0..i-1-x], !and table[i] is as large as possible with this property. ! This implies: ! 1) For 0 < i < m: ! If table[i] < i, ! needle[table[i]..i-1] = needle[0..i-1-table[i]]. ! 2) For 0 < i < m: ! rhaystack[0..i-1] == needle[0..i-1] ! and exists h, i <= h < m: rhaystack[h] != needle[h] ! implies ! forall 0 <= x < table[i]: rhaystack[x..x+m-1] != needle[0..m-1]. ! table[0] remains uninitialized. */ ! { ! size_t i, j; ! ! /* i = 1: Nothing to verify for x = 0. */ ! table[1] = 1; ! j = 0; ! ! for (i = 2; i < m; i++) ! { ! /* Here: j = i-1 - table[i-1]. ! The inequality needle[x..i-1] != needle[0..i-1-x] is known to hold ! for x < table[i-1], by induction. ! Furthermore, if j>0: needle[i-1-j..i-2] = needle[0..j-1]. */ ! unsigned char b = c_tolower ((unsigned char) needle[i - 1]); ! ! for (;;) ! { ! /* Invariants: The inequality needle[x..i-1] != needle[0..i-1-x] ! is known to hold for x < i-1-j. ! Furthermore, if j>0: needle[i-1-j..i-2] = needle[0..j-1]. */ ! if (b == c_tolower ((unsigned char) needle[j])) ! { ! /* Set table[i] := i-1-j. */ ! table[i] = i - ++j; ! break; ! } ! /* The inequality needle[x..i-1] != needle[0..i-1-x] also holds ! for x = i-1-j, because !needle[i-1] != needle[j] = needle[i-1-x]. */ ! if (j == 0) ! { ! /* The inequality holds for all possible x. */ ! table[i] = i; ! break; ! } ! /* The inequality needle[x..i-1] != needle[0..i-1-x] also holds ! for i-1-j < x < i-1-j+table[j], because for these x: !needle[x..i-2] != needle[x-(i-1-j)..j-1] !!= needle[0..j-1-(x-(i-1-j))] (by definition of table[j]) ! = needle[0..i-2-x], ! hence needle[x..i-1] != needle[0..i-1-x]. ! Furthermore !needle[i-1-j+table[j]..i-2] != needle[table[j]..j-1] != needle[0..j-1-table[j]] (by definition of table[j]). */ ! j = j - table[j]; ! } ! /* Here: j = i - table[i]. */ ! } ! } ! ! /* Search, using the table to accelerate the processing. */ ! { ! size_t j; ! const char *rhaystack; ! const char *phaystack; ! ! *resultp = NULL; ! j = 0; ! rhaystack = haystack; ! phaystack = haystack; ! /* Invariant: phaystack = rhaystack + j. */ ! while (*phaystack != '\0') ! if (c_tolower ((unsigned char) needle[j]) ! == c_tolower (
commit to gnulib.texi
Hi Karl, You did this commit: http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=bf156d57d0488caa843739e039238d56fde4507d Could you please add a ChangeLog entry and notify the mailing list about the change, preferably with attached diffs? That's the common process here for peer review, and peer review ensures good quality of the commits. If you don't believe this, please take a look at the other one of your commits yesterday: http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=b463b61cfbce5128bbec87be04104535349fdec8 It contains a grammatical mistake. Bruno
[PATCH] Avoid use of private FTS type name.
FYI, I've just pushed this change: Avoid use of private FTS type name. * lib/fts.c (fts_sort): Use FTSENT rather than "struct _ftsent". diff --git a/lib/fts.c b/lib/fts.c index ceb8935..82ea8f6 100644 --- a/lib/fts.c +++ b/lib/fts.c @@ -1475,7 +1475,7 @@ fts_sort (FTS *sp, FTSENT *head, register size_t nitems) * 40 so don't realloc one entry at a time. */ if (nitems > sp->fts_nitems) { - struct _ftsent **a; + FTSENT **a; sp->fts_nitems = nitems + 40; if (SIZE_MAX / sizeof *a < sp->fts_nitems -- 1.5.4.rc2.1.gc00e
Re: memmem issues
Bruno Haible <[EMAIL PROTECTED]> writes: > It says that the _values_ of the characters are interpreted as > unsigned char. I.e. like this: Sure, but a character, as used in the sense here, is a bit representation that fits in a byte (see section 3.7.1); it is not a 'char' value. The intent is that memcmp looks at all the bits of the underlying representation. See, for example, footnote 43 of section 6.2.6.1: It is possible for objects x and y with the same effective type T to have the same value when they are accessed as objects of type T, but to have different values in other contexts. In particular, if == is defined for type T, then x == y does not imply that memcmp(&x, &y, sizeof (T)) == 0. Furthermore, x == y does not necessarily imply that x and y have the same value; other operations on values of type T may distinguish between them. Here again, the intent is that memcmp(&x, &y, sizeof (T)) compares all the bits in the underlying representation. The only way to do that portably is via "unsigned char *".
Re: commit to gnulib.texi
You did this commit: http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=bf156d57d0488caa843739e039238d56fde4507d Could you please add a ChangeLog entry and notify the mailing list about the change, preferably with attached diffs? Yes, I apologize for forgetting to do that. I'll send a separate msg with the info. please take a look at the other one of your commits yesterday: http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=b463b61cfbce5128bbec87be04104535349fdec8 It contains a grammatical mistake. I do not see it. Please tell me. karl
Re: commit to gnulib.texi
>Could you please add a ChangeLog entry 2007-12-30 Karl Berry <[EMAIL PROTECTED]> * doc/gnulib.texi (Library vs. Reusable Code): remove period, to work around defect in Texinfo and/or the standalone Info browser. >preferably with attached diffs? In this case, you already gave the url: http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=bf156d57d0488caa843739e039238d56fde4507d Rationale: this is simply the most expedient way to work around the present suboptimal situation wrt to periods in Texinfo node names. karl