-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 [this portion of the thread can live on just autoconf-patches]
According to Eric Blake on 9/29/2007 1:31 PM: > Also, some of those most frequent patterns can be done with index() rather > than regexp() in m4sugar, offering some speedups even without m4 improvements. > Proof: $ cd m4/example $ cat search.m4 include(forloop2.m4)ifdef(`limit',,`define(limit,10000)')divert(-1) forloop(i,1,limit,`index(abc,b)index(def,b)') $ time m4 search.m4 real 0m0.262s user 0m0.249s sys 0m0.030s $ time m4 -Dindex=regexp'($@)' search.m4 real 0m1.655s user 0m1.640s sys 0m0.030s $ time m4 -Dindex='builtin(`index'\'',$@)' search.m4 real 0m0.303s user 0m0.296s sys 0m0.015s patsubst is harder to replace than regexp in most cases. But here's a few low-hanging fruit that I'm installing. With this installed, the same coreutils example now runs much faster (after priming the cache each time, I dropped from 58 seconds pre-patch down to 34), and has only 18k regular expressions (compared to 61k before), with the worst offenders: $ sort m4.trace | uniq -c |sort -n -k1,1 |tail -n 20 50 p:y:{\([0-9]+\)\([tuvwxyz]\)} 50 p:y:{\.} 98 p:y:{.} 122 p:n:{\..*} 122 p:n:{^[^.]*\.} 208 p:y:{[\"$`]} 287 p:y:{[\\`]} 364 p:n:{(.*)} 415 p:n:{@&[EMAIL PROTECTED] 432 p:y:{[][*+.?\^$]} 432 r:n:{ 547 p:y:{[^A-Z0-9_]} 740 p:y:{[\\'']} 863 r:n:{^[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_][abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_0123456789]*$} 1163 p:n:{\\ 1163 p:y:{^. ?\(.*\) .$} 1596 } 2324 p:y:{[ ]+} 3826 p:y:{[^a-zA-Z0-9_]} 5020 p:y:{[`""]} Key: p:y is patsubst with replacement (usually, no alternative). p:n is patsubst with deletion (translit can delete faster than regex bracket expressions, and substr faster than regex literals). r:y is regexp with alternate string, and r:n is regexp with location (index is faster for location, and index+if can quickly do alternate string). Of the remaining regular expressions, the most benefit might be had by writing AS_WORD_IF, which could more efficiently do: m4_bmatch(m4_bpatsubst([[$1]], [@&[EMAIL PROTECTED]), ^m4_defn([m4_re_word])$, [], [AC_FATAL([$0: `$1' is not a valid shell variable name])]) used by both AC_SUBST and AC_DEFINE, among others (m4_re_word is ^[a-zA-Z_][a-zA-Z_0-9]*$). At any rate, I'm committing the following. - -- Don't work too hard, make some time for fun as well! Eric Blake [EMAIL PROTECTED] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG/xyh84KuGfSFAYARAum8AJ9zsTo5YFDOX3/vhpjGrBcpIcJc1wCgr+tX TVUvK5pgc+f4gM13CX3Xs9k= =sQEt -----END PGP SIGNATURE-----
>From b2dda0adc778b0d3004950048211289958da2abc Mon Sep 17 00:00:00 2001 From: Eric Blake <[EMAIL PROTECTED]> Date: Sat, 29 Sep 2007 21:44:10 -0600 Subject: [PATCH] Speed optimization: avoid m4 regex when other algorithms work. * lib/m4sugar/m4sh.m4 (AS_LITERAL_IF): Rewrite without regex. (_AS_QUOTE_IFELSE): Likewise. * lib/m4sugar/m4sugar.m4 (m4_strip): Reduce from 3 to 2 regex. (m4_bpatsubsts): Split... (_m4_bpatsubsts): ...so that recursion can avoid patsubst on empty regex. (_m4_divert()): Define, to avoid m4 warning on `m4_divert'. (m4_qlen): Optimize on short strings, to avoid regex. (m4_sign): Avoid regex, and fix bug with `01' and `-0'. * lib/autoconf/general.m4 (AC_CACHE_VAL): Rewrite without regex. (AC_DEFINE_TRACE): Likewise. Signed-off-by: Eric Blake <[EMAIL PROTECTED]> --- ChangeLog | 15 +++++++++++++++ lib/autoconf/general.m4 | 19 ++++++++++++------- lib/m4sugar/m4sh.m4 | 28 +++++++++++++++++++++------- lib/m4sugar/m4sugar.m4 | 44 ++++++++++++++++++++++++++++++++------------ 4 files changed, 80 insertions(+), 26 deletions(-) diff --git a/ChangeLog b/ChangeLog index 48c6c20..69d2053 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,18 @@ +2007-09-29 Eric Blake <[EMAIL PROTECTED]> + + Speed optimization: avoid m4 regex when other algorithms work. + * lib/m4sugar/m4sh.m4 (AS_LITERAL_IF): Rewrite without regex. + (_AS_QUOTE_IFELSE): Likewise. + * lib/m4sugar/m4sugar.m4 (m4_strip): Reduce from 3 to 2 regex. + (m4_bpatsubsts): Split... + (_m4_bpatsubsts): ...so that recursion can avoid patsubst on empty + regex. + (_m4_divert()): Define, to avoid m4 warning on `m4_divert'. + (m4_qlen): Optimize on short strings, to avoid regex. + (m4_sign): Avoid regex, and fix bug with `01' and `-0'. + * lib/autoconf/general.m4 (AC_CACHE_VAL): Rewrite without regex. + (AC_DEFINE_TRACE): Likewise. + 2007-09-28 Eric Blake <[EMAIL PROTECTED]> Oops - my earlier 'optimization' caused a regression. diff --git a/lib/autoconf/general.m4 b/lib/autoconf/general.m4 index a0f473a..7835f24 100644 --- a/lib/autoconf/general.m4 +++ b/lib/autoconf/general.m4 @@ -1950,14 +1950,15 @@ rm -f confcache[]dnl # The name of shell var CACHE-ID must contain `_cv_' in order to get saved. # Should be dnl'ed. Try to catch common mistakes. m4_defun([AC_CACHE_VAL], -[AS_LITERAL_IF([$1], [m4_bmatch(m4_quote($1), [_cv_], [], - [AC_DIAGNOSE([syntax], +[AS_LITERAL_IF([$1], [m4_if(m4_index(m4_quote($1), [_cv_]), [-1], + [AC_DIAGNOSE([syntax], [$0($1, ...): suspicious cache-id, must contain _cv_ to be cached])])])dnl -m4_bmatch([$2], [AC_DEFINE], - [AC_DIAGNOSE([syntax], +m4_if(m4_index([$2], [AC_DEFINE]), [-1], [], + [AC_DIAGNOSE([syntax], [$0($1, ...): suspicious presence of an AC_DEFINE in the second argument, ]dnl -[where no actions should be taken])], - [AC_SUBST], [AC_DIAGNOSE([syntax], +[where no actions should be taken])])dnl +m4_if(m4_index([$2], [AC_SUBST]), [-1], [], + [AC_DIAGNOSE([syntax], [$0($1, ...): suspicious presence of an AC_SUBST in the second argument, ]dnl [where no actions should be taken])])dnl AS_VAR_SET_IF([$1], @@ -2006,8 +2007,12 @@ m4_bmatch([$1], ^m4_defn([m4_re_word])$, [], # --------------------------- # This macro is a wrapper around AC_DEFINE_TRACE_LITERAL which filters # out non literal symbols. +# +# m4_index is roughly 5 to 8 times faster than m4_bpatsubst. m4_define([AC_DEFINE_TRACE], -[AS_LITERAL_IF([$1], [AC_DEFINE_TRACE_LITERAL(m4_bpatsubst([[$1]], [(.*)]))])]) +[AS_LITERAL_IF([$1], [AC_DEFINE_TRACE_LITERAL( + m4_if(m4_index([[$1]], [(]), [-1], [[$1]], + [m4_substr([[$1]], [0], m4_index([[$1]], [(]))]))])]) # AC_DEFINE(VARIABLE, [VALUE], [DESCRIPTION]) diff --git a/lib/m4sugar/m4sh.m4 b/lib/m4sugar/m4sh.m4 index 3334bdd..6f9e9a6 100644 --- a/lib/m4sugar/m4sh.m4 +++ b/lib/m4sugar/m4sh.m4 @@ -573,12 +573,22 @@ m4_define([AS_ESCAPE], # If STRING contains `\\' or `\$', it's modern. # If STRING contains `\"' or `\`', it's old. # Otherwise it's modern. -# We use two quotes in the pattern to keep highlighting tools at peace. +# +# Profiling shows that m4_index is 5 to 8x faster than m4_bregexp. The +# slower implementation used: +# m4_bmatch([$1], +# [\\[\\$]], [$2], +# [\\[`"]], [$3], +# [$2]) +# The current implementation caters to the common case of no backslashes, +# to minimize m4_index expansions (hence the nested if). m4_define([_AS_QUOTE_IFELSE], -[m4_bmatch([$1], - [\\[\\$]], [$2], - [\\[`""]], [$3], - [$2])]) +[m4_if(m4_index([$1], [\]), [-1], [$2], + [m4_if(m4_eval(m4_index([$1], [\\]) >= 0), [1], [$2], + m4_eval(m4_index([$1], [\$]) >= 0), [1], [$2], + m4_eval(m4_index([$1], [\`]) >= 0), [1], [$3], + m4_eval(m4_index([$1], [\"]) >= 0), [1], [$3], + [$2])])]) # _AS_QUOTE(STRING, [CHARS = `"]) @@ -1217,9 +1227,13 @@ m4_popdef([AS_Prefix])dnl # IF-INDIR, else IF-NOT-INDIR. # This is an *approximation*: for instance EXPRESSION = `\$' is # definitely a literal, but will not be recognized as such. +# +# We used to use m4_bmatch(m4_quote($1), [[`$]], [$3], [$2]), but +# profiling shows that it is faster to use m4_translit. m4_define([AS_LITERAL_IF], -[m4_bmatch(m4_quote($1), [[`$]], - [$3], [$2])]) +[m4_if(m4_len(m4_quote($1)), + m4_len(m4_translit(m4_dquote(m4_quote($1)), [`$])), + [$2], [$3])]) # AS_TMPDIR(PREFIX, [DIRECTORY = $TMPDIR [= /tmp]]) diff --git a/lib/m4sugar/m4sugar.m4 b/lib/m4sugar/m4sugar.m4 index b6f3b9b..8fea736 100644 --- a/lib/m4sugar/m4sugar.m4 +++ b/lib/m4sugar/m4sugar.m4 @@ -438,10 +438,16 @@ m4_define([m4_map_sep], # I would have liked to name this macro `m4_bpatsubst', unfortunately, # due to quotation problems, I need to double quote $1 below, therefore # the anchors are broken :( I can't let users be trapped by that. +# +# Recall that m4_shiftn always results in an argument. Hence, we need +# to distinguish between a final deletion vs. and ending recursion. m4_define([m4_bpatsubsts], [m4_if([$#], 0, [m4_fatal([$0: too few arguments: $#])], [$#], 1, [m4_fatal([$0: too few arguments: $#: $1])], [$#], 2, [m4_builtin([patsubst], $@)], + [_$0([EMAIL PROTECTED](m4_eval($# & 1), 0, [,]))])]) +m4_define([_m4_bpatsubsts], +[m4_if([$#], 2, [$1], [$0(m4_builtin([patsubst], [[$1]], [$2], [$3]), m4_shiftn(3, $@))])]) @@ -737,6 +743,9 @@ m4_define([_m4_divert], # KILL is only used to suppress output. m4_define([_m4_divert(KILL)], -1) +# The empty diversion name is a synonym for 0. +m4_define([_m4_divert()], 0) + # _m4_divert_n_stack # ------------------ @@ -1449,16 +1458,20 @@ m4_define([m4_flatten], # # Because we want to preserve active symbols, STRING must be double-quoted. # -# Then notice the 2 last patterns: they are in charge of removing the +# First, notice that we guarantee trailing space. Why? Because regex +# are greedy, and `.* ?' always groups the space into the .* portion. +# The algorithm is simpler by avoiding `?' at the end. The algorithm +# correctly strips everything if STRING is just ` '. +# +# Then notice the second pattern: it is in charge of removing the # leading/trailing spaces. Why not just `[^ ]'? Because they are -# applied to doubly quoted strings, i.e. more or less [[STRING]]. So -# if there is a leading space in STRING, then it is the *third* -# character, since there are two leading `['; equally for the last pattern. +# applied to over-quoted strings, i.e. more or less [STRING], due +# to the limitations of m4_bpatsubsts. So the leading space in STRING +# is the *second* character; equally for the trailing space. m4_define([m4_strip], -[m4_bpatsubsts([[$1]], +[m4_bpatsubsts([$1 ], [[ ]+], [ ], - [^\(..\) ], [\1], - [ \(..\)$], [\1])]) + [^. ?\(.*\) .$], [[[\1]]])]) # m4_normalize(STRING) @@ -1620,8 +1633,11 @@ m4_define([m4_text_box], # m4_qlen(STRING) # --------------- # Expands to the length of STRING after autom4te converts all quadrigraphs. +# +# Avoid bpatsubsts for the common case of no quadrigraphs. m4_define([m4_qlen], -[m4_len(m4_bpatsubsts([[$1]], [EMAIL PROTECTED](<:\|:>\|S|\|%:\)@], [P], [@&[EMAIL PROTECTED]))]) +[m4_if(m4_index([$1], [EMAIL PROTECTED]), [-1], [m4_len([$1])], + [m4_len(m4_bpatsubsts([[$1]], [EMAIL PROTECTED](<:\|:>\|S|\|%:\)@], [P], [@&[EMAIL PROTECTED]))])]) # m4_qdelta(STRING) @@ -1641,11 +1657,15 @@ m4_define([m4_qdelta], # ---------- # # The sign of the integer A. +# +# Rather than resort to eval or regex, we merely delete [0\t ], collapse +# all other digits to 1, then use the first two characters to decide. m4_define([m4_sign], -[m4_bmatch([$1], - [^-], -1, - [^0+], 0, - 1)]) +[m4_case(m4_substr(m4_translit([[$1]], [2-90 ], [11111111]), 0, 2), + [-1], [-1], + [-], [0], + [], [0], + [1])]) # m4_cmp(A, B) # ------------ -- 1.5.3.2
_______________________________________________ M4-discuss mailing list M4-discuss@gnu.org http://lists.gnu.org/mailman/listinfo/m4-discuss