Re: M4 1.4.10b [beta] released

Eric Blake Thu, 28 Feb 2008 18:34:08 -0800


According to Bruno Haible on 2/28/2008 7:02 AM:
| i.e. an additional 10% speedup.
|
| The speed-relevant code is only the case for INPUT_CHAIN. I added the one
| for INPUT_STRING because it may be useful in situations I don't know of.


Thanks for the ideas - while this patch isn't ready to apply yet, it
definitely has merit, and I'd like to see it in before M4 1.4.11 (actually,
I'm really starting to lean towards calling the next non-beta release 1.5,
because of the speed increases).

First off, you don't appear to have copyright assignment on file for M4
yet, and this is not a trivial patch.  Care to follow through with that?

Next, I'd like to finish merging the argv_ref into both branch-1_4 and
master before applying your patch, while it's still fresh on my mind.  If
you could rework your patch to apply atop the argv_ref branch, that would
make it a bit easier, as several of the remaining patches on that branch
already rearrange the behavior of next_char and friends.

| +       if (curr_quote.len1 == 1 && curr_quote.len2 == 1 && !input_change
| +           && isp)

Not quite right.  The condition here needs to be that quote_age is
non-zero (certain 1-byte quote combinations don't optimize well according
to the semantics required by M4, such as when they overlap with comment
delimiters; the calculation of quote_age already took this into account).
I'm also considering adding a placeholder input block that always results
in CHAR_EOF, so that isp is guaranteed to be non-null and we have one less
branch in this loop.
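
A minimal sketch of the placeholder-block idea, not the actual M4 code: the
struct toy_block type, the is_sentinel flag, toy_next_char, and the value
chosen for CHAR_EOF are all hypothetical stand-ins.  The bottom of the input
stack is a block that only ever yields CHAR_EOF, so the reader never has to
test isp against NULL.

```c
#include <assert.h>
#include <stddef.h>

enum { CHAR_EOF = 256 };  /* assumed value, outside the unsigned char range */

/* Hypothetical, simplified input block; not M4's real structure.  */
struct toy_block
{
  struct toy_block *prev;  /* block to resume once this one is drained */
  const char *str;         /* unread bytes */
  size_t len;              /* number of unread bytes */
  int is_sentinel;         /* nonzero only for the placeholder block */
};

/* The placeholder: permanently empty, never popped.  */
static struct toy_block eof_block = { NULL, NULL, 0, 1 };

/* Current input block; starts at the placeholder, so it is never NULL.  */
static struct toy_block *isp = &eof_block;

/* Return the next byte as an unsigned char value, or CHAR_EOF.  */
int
toy_next_char (void)
{
  for (;;)
    {
      assert (isp != NULL);  /* invariant provided by the placeholder */
      if (isp->len > 0)
        {
          isp->len--;
          return (unsigned char) *isp->str++;
        }
      if (isp->is_sentinel)
        return CHAR_EOF;
      isp = isp->prev;  /* pop the drained block */
    }
}
```

The sketch still tests for the sentinel once a block is drained, but that
test sits off the per-byte path; the NULL check on isp is gone entirely.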

| +         {
| +           /* The optimized case.  It heavily inlines the MATCH macro and
| +              the next_char and next_char_1 functions, to the point that
| +              the scan is a loop over a region of memory followed by a
| +              simple memory copy operation.
| +              The case with INPUT_CHAIN alone can speed up GNU autoconf
| +              runs by 10%.  */

Yes, I can see where this idea has merit.  The whole reason that next_char
was turned into a macro with a fallback of next_char_1 was because it
reduced a huge number of function calls in the common case, but then I
went and blew that optimization away since more input is now chain based
rather than string based.  But I think I'd rather factor it a bit
differently (particularly since master has 'virtualized' the four input
block types, making it harder to optimize on just one, but making it
easier to add a new access pattern).  Basically, rather than calling
next_char() all the time, I envision:

/* Return a pointer into the buffer available from the current input
block, and set *LEN to the length of the result.  If the next character to
be parsed cannot be represented as an unsigned char (such as CHAR_EOF), or
if the input block does not have read-ahead data buffered at the moment,
return NULL, and the caller must fall back to using next_char().  */
char *curr_buf (size_t *len);

/* Discard LEN bytes from the current input block.  LEN must be less than
or equal to the previous length returned by a successful call to
curr_buf().  */
void consume (size_t len);

Then you can easily operate on the buffer, with no intervening function
calls while the buffer is not emptied, and consume the appropriate number
of bytes from the block all at once.
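
To make the intended calling pattern concrete, here is a rough sketch under
stated assumptions: struct toy_input, last_len, and skip_to_quote are
hypothetical, the input is a single string-backed block, and curr_buf
returns const char * here rather than the char * in the prototype above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one string-backed input block.  */
struct toy_input
{
  const char *data;  /* unread bytes */
  size_t len;        /* number of unread bytes */
};

static struct toy_input *isp;  /* current block, may be NULL in this toy */
static size_t last_len;        /* length handed out by the last curr_buf */

/* Return the buffered bytes and set *LEN, or NULL if nothing is buffered.  */
const char *
curr_buf (size_t *len)
{
  if (isp == NULL || isp->len == 0)
    {
      last_len = 0;
      return NULL;
    }
  *len = last_len = isp->len;
  return isp->data;
}

/* Discard LEN bytes; LEN must not exceed what curr_buf last reported.  */
void
consume (size_t len)
{
  assert (len <= last_len);
  if (len == 0)
    return;
  isp->data += len;
  isp->len -= len;
  last_len -= len;
}

/* Skip everything before QUOTE (or the whole buffer if QUOTE is absent)
   with one scan and one consume call; return the number of bytes skipped.  */
size_t
skip_to_quote (char quote)
{
  size_t len;
  const char *buf = curr_buf (&len);
  const char *hit;
  size_t n;

  if (buf == NULL)
    return 0;  /* a real caller would fall back to next_char() here */
  hit = memchr (buf, quote, len);
  n = hit ? (size_t) (hit - buf) : len;
  consume (n);
  return n;
}
```

The scan inside skip_to_quote touches only the buffer; the input block is
updated once, after the scan, rather than once per character.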

| +                   else if (chain->type == CHAIN_STR && chain->u.u_s.len > 0)
| +                     {
| +                       unsigned char curr_quote_1 =
| +                         to_uchar (curr_quote.str1[0]);

Unnecessary cast.  char is assignable to unsigned char without issues.
The cast is only needed when assigning char to int where the int will be
treated as an extended unsigned char (such as when CHAR_EOF or CHAR_QUOTE
factors into the picture).
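
A small illustration of the distinction, as a sketch only: the to_uchar
below is written the way such a helper is commonly defined and is assumed
rather than copied from M4, and the three wrapper functions are made up for
the example.

```c
#include <assert.h>
#include <limits.h>

/* Assumed definition of the helper: convert char to unsigned char.  */
static unsigned char
to_uchar (char ch)
{
  return ch;
}

/* char to unsigned char: the implicit conversion already yields a value
   in 0..UCHAR_MAX, so no helper is needed.  */
unsigned char
assign_direct (char ch)
{
  unsigned char u = ch;
  return u;
}

/* char to int without the helper: negative where plain char is signed
   and the high bit is set.  */
int
widen_plain (char ch)
{
  return ch;
}

/* char to int through the helper: always 0..UCHAR_MAX, which keeps it
   distinct from out-of-range markers such as CHAR_EOF.  */
int
widen_via_uchar (char ch)
{
  int r = to_uchar (ch);
  assert (0 <= r && r <= UCHAR_MAX);
  return r;
}
```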

It seems a shame that there isn't really a function optimized for
searching for the first of two characters among a fixed-size memory block.
These days, it is much more efficient to do a vector-based approach -
calculate one or two vector-sized masks (8 or even 16 bytes), then step
through memory in large chunks using the mask to see if we've hit an
interesting vector, rather than searching one byte at a time.  strchr() is
close (searching for the first of NUL or one character of choice), but it
is not length-limited, and it fixes one of the two bytes. getndelim2 is
also close (searching for the first of two characters of choice in a
length-limited manner), but is constrained to reading from a stream (and
fmemopen is not yet portable).  Maybe it's time to also write a function
along these lines, using the best of strchr and getndelim2 in its
implementation:

/* Return the address of the first occurrence of either C1 or C2, each
treated as unsigned char, within the first N bytes of S; return NULL if
neither character occurs.  */
void *memchr2 (void const *s, int c1, int c2, size_t n);
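
One way such a function might look, as an untuned sketch rather than a
proposed implementation: the name memchr2_sketch is hypothetical, the
"vector" is a plain 64-bit word, and the zero-byte test is the well-known
bit trick rather than anything taken from strchr or getndelim2.  Whole
words are skipped while neither byte can be present; the final bytes are
checked one at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void *
memchr2_sketch (void const *s, int c1, int c2, size_t n)
{
  const unsigned char *p = s;
  unsigned char b1 = (unsigned char) c1;
  unsigned char b2 = (unsigned char) c2;
  const uint64_t ones = UINT64_C (0x0101010101010101);
  const uint64_t highs = UINT64_C (0x8080808080808080);
  uint64_t mask1 = ones * b1;  /* b1 repeated in every byte */
  uint64_t mask2 = ones * b2;  /* b2 repeated in every byte */

  assert (s != NULL || n == 0);

  /* Skip 8 bytes at a time while the word holds neither byte.  */
  while (n >= sizeof (uint64_t))
    {
      uint64_t word, x1, x2;
      memcpy (&word, p, sizeof word);  /* avoids unaligned access */
      x1 = word ^ mask1;  /* has a zero byte wherever word has b1 */
      x2 = word ^ mask2;  /* has a zero byte wherever word has b2 */
      /* (x - ones) & ~x & highs is nonzero iff x has a zero byte.  */
      if ((((x1 - ones) & ~x1) | ((x2 - ones) & ~x2)) & highs)
        break;
      p += sizeof word;
      n -= sizeof word;
    }

  /* Locate the exact byte (or finish the tail) one byte at a time.  */
  for (; n > 0; p++, n--)
    if (*p == b1 || *p == b2)
      return (void *) p;
  return NULL;
}
```

For example, memchr2_sketch ("a,b:c", ':', ',', 5) returns a pointer to the
comma, since that is the earlier of the two bytes.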

--
Don't work too hard, make some time for fun as well!

Eric Blake             [EMAIL PROTECTED]

