Simon Josefsson <simon <at> josefsson.org> writes: > > return !result || !memmem ("a", 1, 0, 0);]])], > > > Ah, thanks. This was not part of the old memmem.m4 checks. Still, the > glibc documentation (which I regard as the memmem specification) doesn't > say what should be returned for empty needles. My conclusion is that > empty needles should lead to undefined behaviour.
Fair enough for the memmem-simple module. But since POSIX requires that 0- length needles in strstr(ptr,"") return ptr, and I consider that the similarity between memmem and strstr is essential, (in part so someone could do a naive implementation of strstr as: return memmem (haystack, strlen (haystack), needle, strlen (needle)); and in part because if the haystack length is known in advance, memmem can be more efficient than strstr). Therefore, the full memmem module should enforce an empty needle as well-defined behavior. (Or, put another way, I want 'echo "index(abc,)"|m4' to work without me having to special case whether the second argument to index is empty). > Unfortunately, cross-compilation is in practice the typical scenario > where people care about size issues, so I would prefer to rely on > AC_CHECK_FUNC instead. OK then, based on this email, the gl_FUNC_MEMMEM_SIMPLE macro uses AC_CHECK_FUNC when cross-compiling, while gl_FUNC_MEMMEM is pessimistic and always uses the replacement (unless you did primed the cache with gl_cv_func_memmem_works=yes first). Clients of memmem-simple must avoid empty needles, and should avoid arbitrary needles, and as a result get smaller code size on glibc and cygwin. Here's what I'm committing (I reversed your proposed macro dependency - the full module should depend on and augment the simple check, not the other way around): From: Eric Blake <[EMAIL PROTECTED]> Date: Wed, 9 Jan 2008 10:19:13 -0700 Subject: [PATCH] Add memmem-simple module. * m4/memmem.m4 (gl_FUNC_MEMMEM_SIMPLE): New macro. (gl_FUNC_MEMMEM): Separate performance from presence checks. * modules/memmem-simple: New file. * modules/memmem (Description): Tweak. * MODULES.html.sh (string handling): Mention it. * doc/functions/memmem.texi (memmem): Distinguish which flaws are addressed by memmem-simple. * NEWS: Document the difference. Signed-off-by: Eric Blake <[EMAIL PROTECTED]> --- ChangeLog | 15 ++++++++++++++- MODULES.html.sh | 3 ++- NEWS | 6 ++++++ doc/functions/memmem.texi | 6 +++--- m4/memmem.m4 | 20 +++++++++++++++----- modules/memmem | 2 +- modules/memmem-simple | 28 ++++++++++++++++++++++++++++ 7 files changed, 69 insertions(+), 11 deletions(-) create mode 100644 modules/memmem-simple diff --git a/ChangeLog b/ChangeLog index 975665d..721b9e2 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,16 @@ +2008-01-09 Simon Josefsson <[EMAIL PROTECTED]> + and Eric Blake <[EMAIL PROTECTED]> + + Add memmem-simple module. + * m4/memmem.m4 (gl_FUNC_MEMMEM_SIMPLE): New macro. + (gl_FUNC_MEMMEM): Separate performance from presence checks. + * modules/memmem-simple: New file. + * modules/memmem (Description): Tweak. + * MODULES.html.sh (string handling): Mention new module. + * doc/functions/memmem.texi (memmem): Distinguish which flaws are + addressed by memmem-simple. + * NEWS: Document the difference. + 2008-01-09 Eric Blake <[EMAIL PROTECTED]> Give gcc some memmem optimization hints. diff --git a/MODULES.html.sh b/MODULES.html.sh index 324b3f3..66e0b01 100755 --- a/MODULES.html.sh +++ b/MODULES.html.sh @@ -1,6 +1,6 @@ #!/bin/sh # -# Copyright (C) 2002-2007 Free Software Foundation, Inc. +# Copyright (C) 2002-2008 Free Software Foundation, Inc. # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by @@ -1654,6 +1654,7 @@ func_all_modules () func_begin_table func_module bcopy func_module memmem + func_module memmem-simple func_module mempcpy func_module memrchr func_module stpcpy diff --git a/NEWS b/NEWS index c30b272..df2676f 100644 --- a/NEWS +++ b/NEWS @@ -6,6 +6,12 @@ User visible incompatible changes Date Modules Changes +2008-01-08 memmem This module now replaces worst-case inefficient + implementations; clients that use controlled + needles and thus do not care about worst-case + efficiency should use the new memmem-simple + module instead for smaller code size. + 2007-12-24 setenv The include file is changed from "setenv.h" to <stdlib.h>. Also, the unsetenv function is no longer declared in this module; use the 'unsetenv' diff --git a/doc/functions/memmem.texi b/doc/functions/memmem.texi index fefbe3c..4925ac8 100644 --- a/doc/functions/memmem.texi +++ b/doc/functions/memmem.texi @@ -4,7 +4,7 @@ Unspecified by POSIX, but comparable to @code{strstr}. -Gnulib module: memmem +Gnulib module: memmem, memmem-simple Portability problems fixed by Gnulib: @itemize @@ -14,12 +14,12 @@ Linux libc 5.0.9 @item This function returns incorrect values in some cases, such as when -given an empty needle: +given an empty needle (not fixed in memmem-simple): glibc <= 2.0, cygwin 1.5.x @item This function has quadratic instead of linear complexity on some -platforms: +platforms (not fixed in memmem-simple): glibc <= 2.6.1, cygwin 1.5.x @item diff --git a/m4/memmem.m4 b/m4/memmem.m4 index 9767354..d862027 100644 --- a/m4/memmem.m4 +++ b/m4/memmem.m4 @@ -1,10 +1,11 @@ -# memmem.m4 serial 7 +# memmem.m4 serial 8 dnl Copyright (C) 2002, 2003, 2004, 2007, 2008 Free Software Foundation, Inc. dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, dnl with or without modifications, as long as this notice is preserved. -AC_DEFUN([gl_FUNC_MEMMEM], +dnl Check that memmem is present. +AC_DEFUN([gl_FUNC_MEMMEM_SIMPLE], [ dnl Persuade glibc <string.h> to declare memmem(). AC_REQUIRE([AC_USE_SYSTEM_EXTENSIONS]) @@ -14,7 +15,15 @@ AC_DEFUN([gl_FUNC_MEMMEM], AC_CHECK_DECLS_ONCE(memmem) if test $ac_cv_have_decl_memmem = no; then HAVE_DECL_MEMMEM=0 - else + fi + gl_PREREQ_MEMMEM +]) # gl_FUNC_MEMMEM_SIMPLE + +dnl Additionally, check that memmem is efficient and handles empty needles. +AC_DEFUN([gl_FUNC_MEMMEM], +[ + AC_REQUIRE([gl_FUNC_MEMMEM_SIMPLE]) + if test $ac_cv_have_decl_memmem = yes; then AC_CACHE_CHECK([whether memmem works in linear time], [gl_cv_func_memmem_works], [AC_RUN_IFELSE([AC_LANG_PROGRAM([ @@ -28,6 +37,7 @@ AC_DEFUN([gl_FUNC_MEMMEM], /* Failure to compile this test due to missing alarm is okay, since all such platforms (mingw) also lack memmem. */ alarm (5); + /* Check for quadratic performance. */ if (haystack && needle) { memset (haystack, 'A', 2 * m); @@ -36,6 +46,7 @@ AC_DEFUN([gl_FUNC_MEMMEM], needle[m] = 'B'; result = memmem (haystack, 2 * m + 1, needle, m + 1); } + /* Check for empty needle behavior. */ return !result || !memmem ("a", 1, 0, 0);]])], [gl_cv_func_memmem_works=yes], [gl_cv_func_memmem_works=no], [dnl pessimistically assume the worst, since even glibc 2.6.1 @@ -46,8 +57,7 @@ AC_DEFUN([gl_FUNC_MEMMEM], AC_LIBOBJ([memmem]) fi fi - gl_PREREQ_MEMMEM -]) +]) # gl_FUNC_MEMMEM # Prerequisites of lib/memmem.c. AC_DEFUN([gl_PREREQ_MEMMEM], [:]) diff --git a/modules/memmem b/modules/memmem index 71f770f..28e386d 100644 --- a/modules/memmem +++ b/modules/memmem @@ -1,5 +1,5 @@ Description: -memmem() function: locate first substring in a buffer. +memmem() function: efficiently locate first substring in a buffer. Files: lib/memmem.c diff --git a/modules/memmem-simple b/modules/memmem-simple new file mode 100644 index 0000000..bbe89cd --- /dev/null +++ b/modules/memmem-simple @@ -0,0 +1,28 @@ +Description: +memmem() function: locate first substring in a buffer. + +Files: +lib/memmem.c +m4/memmem.m4 + +Depends-on: +extensions +string +stdint +memchr +memcmp + +configure.ac: +gl_FUNC_MEMMEM_SIMPLE +gl_STRING_MODULE_INDICATOR([memmem]) + +Makefile.am: + +Include: +<string.h> + +License: +LGPLv2+ + +Maintainer: +libc, Eric Blake, Simon Josefsson -- 1.5.3.5