-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Bruno Haible on 2/28/2008 7:02 AM:
| i.e. an additional 10% speedup.
|
| The speed-relevant code is only the case for INPUT_CHAIN. I added the one
| for INPUT_STRING because it may be useful in situations I don't know of.

Thanks for the ideas - while this patch isn't ready to apply yet, it
definitely has merit before M4 1.4.11 (actually, I'm really starting to
lean towards calling the next non-beta release 1.5, because of the speed
increases).

First off, you don't appear to have copyright assignment on file for M4
yet, and this is not a trivial patch. Care to follow through with that?

Next, I'd like to finish merging the argv_ref branch into both branch-1_4
and master before applying your patch, while it's still fresh in my mind.
If you could rework your patch to apply atop the argv_ref branch, that
would make it a bit easier, as several of the remaining patches on that
branch already rearrange the behavior of next_char and friends.

| + if (curr_quote.len1 == 1 && curr_quote.len2 == 1 && !input_change
| + && isp)

Not quite right. The condition here needs to be that quote_age is
non-zero (certain 1-byte quote combinations don't optimize well according
to the semantics required by M4, such as when they overlap with comment
delimiters; the calculation of quote_age already took this into account).
I'm also considering adding a placeholder input block that always results
in CHAR_EOF, so that isp is guaranteed to be non-null and we have one
less branch in this loop.

| + {
| + /* The optimized case. It heavily inlines the MATCH macro and
| + the next_char and next_char_1 functions, to the point that
| + the scan is a loop over a region of memory followed by a
| + simple memory copy operation.
| + The case with INPUT_CHAIN alone can speed up GNU autoconf
| + runs by 10%. */

Yes, I can see where this idea has merit. The whole reason that next_char
was turned into a macro with a fallback of next_char_1 was that it
eliminated a huge number of function calls in the common case, but then I
went and blew that optimization away since more input is now chain based
rather than string based.

But I think I'd rather factor it a bit differently (particularly since
master has 'virtualized' the four input block types, making it harder to
optimize on just one, but making it easier to add a new access pattern).
Basically, rather than calling next_char() all the time, I envision:

/* Return a pointer into the buffer available from the current input
   block, and set *LEN to the length of the result. If the next
   character to be parsed cannot be represented as an unsigned char
   (such as CHAR_EOF), or if the input block does not have read-ahead
   data buffered at the moment, return NULL, and the caller must fall
   back to using next_char(). */
char *curr_buf (size_t *len);

/* Discard LEN bytes from the current input block. LEN must be less
   than or equal to the previous length returned by a successful call
   to curr_buf(). */
void consume (size_t len);

Then you can easily operate on the buffer, with no intervening function
calls while the buffer is not emptied, and consume the appropriate number
of bytes from the block all at once.

| + else if (chain->type == CHAIN_STR && chain->u.u_s.len > 0)
| + {
| + unsigned char curr_quote_1 =
| + to_uchar (curr_quote.str1[0]);

Unnecessary cast. char is assignable to unsigned char without issues. The
cast is only needed when assigning char to int where the int will be
treated as an extended unsigned char (such as when CHAR_EOF or CHAR_QUOTE
factors into the picture).

It seems a shame that there isn't really a function optimized for
searching for the first of two characters among a fixed-size memory
block. These days, it is much more efficient to do a vector-based
approach - calculate one or two vector-sized masks (8 or even 16 bytes),
then step through memory in large chunks using the mask to see if we've
hit an interesting vector, rather than searching one byte at a time.
strchr() is close (searching for the first of NUL or one character of
choice), but it is not length-limited, and it fixes one of the two bytes.
getndelim2 is also close (searching for the first of two characters of
choice in a length-limited manner), but is constrained to reading from a
stream (and fmemopen is not yet portable). Maybe it's time to write a
function along these lines, using the best of strchr and getndelim2 in
its implementation:

/* Return the address of the first character of either C1 or C2,
   treated as unsigned char, that occurs within the first N bytes of S;
   else return NULL if neither character occurs. */
void *memchr2 (void const *s, int c1, int c2, size_t n);

- --
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHx28x84KuGfSFAYARAo5FAJ9adfyCRVKOacS3fZleA8YvQ02aMwCdHz+/
qL5/3+HGLQFq63FwufMRrwY=
=yTSW
-----END PGP SIGNATURE-----

_______________________________________________
M4-discuss mailing list
M4-discuss@gnu.org
http://lists.gnu.org/mailman/listinfo/m4-discuss
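
For reference, a byte-at-a-time version of the memchr2 prototype proposed
in the message might look like the sketch below. Only the prototype comes
from the message; the body is an assumption that illustrates the intended
semantics (both characters compared as unsigned char, search limited to N
bytes). It is not the word- or vector-at-a-time implementation the
message argues for, which would need extra care with alignment.

```c
#include <stddef.h>

/* Hypothetical reference implementation: return the address of the
   first byte equal to C1 or C2 (each converted to unsigned char)
   within the first N bytes of S, or NULL if neither occurs.  */
void *
memchr2 (void const *s, int c1, int c2, size_t n)
{
  unsigned char const *p = s;
  unsigned char const uc1 = (unsigned char) c1;
  unsigned char const uc2 = (unsigned char) c2;

  /* Plain linear scan, one byte per iteration.  */
  for (; n > 0; p++, n--)
    if (*p == uc1 || *p == uc2)
      return (void *) p;
  return NULL;
}
```

In the quote-scanning loop, a caller would presumably pass the two
single-byte quote delimiters as C1 and C2 and the remaining length of the
current buffer as N, then copy everything before the returned address in
one operation.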