Eric Blake wrote: > First off, you don't appear to have copyright assignment on file for M4 > yet, and this is not a trivial patch. Care to follow through with that?
Yes, I'll do this ASAP. > Next, I'd like to finish merging the argv_ref into both branch-1_4 and > master before applying your patch, while it's still fresh on my mind. Sure, algorithmic changes should go in first. > | + if (curr_quote.len1 == 1 && curr_quote.len2 == 1 && !input_change > | + && isp) > > Not quite right. The condition here needs to be that quote_age is > non-zero (certain 1-byte quote combinations don't optimize well according > to the semantics required by M4, such as when they overlap with comment > delimiters; the calculation of quote_age already took this into account). I don't understand. In the current code, quote_age is taken is only taken into account by the INPUT_CHAIN part of next_char and next_char_1. If a test of quote_change is missing here in next_token, it is also missing in next_char and next_char_1. - Or do you mean that a new boolean variable 'singlechar_quotes' could be introduced, which shall always be equivalent to curr_quote.len1 == 1 && curr_quote.len2 == 1, and which shall be updated by set_quotes? > ~ I'm also considering adding a placeholder input block that always results > in CHAR_EOF, so that isp is guaranteed to be non-null and we have one less > branch in this loop. Tried this already (see attached patch). It does not provide a noticeable speedup: 41.0 sec instead of 41.1, that's less than 0.5%. You get more speedup (about 1%) by use of __builtin_expect (also attached). > Basically, rather than calling next_char() all the time, I envision: > > /* Return a pointer into the buffer available from the current input > block, and set *LEN to the length of the result. If the next character to > be parsed cannot be represented as an unsigned char (such as CHAR_EOF), or > if the input block does not have read-ahead data buffered at the moment, > return NULL, and the caller must fall back to using next_char(). */ > char *curr_buf (size_t *len); > > /* Discard LEN bytes from the current input block. LEN must be less than > or equal to the previous length returned by a successful call to > curr_buf(). */ > void consume (size_t len); Very good. The entire idea of my patch was to 1. avoid a function call for every character, 2. treat consecutive characters which are all alike regarding syntax in a single operation. Inlining next_char and next_char_1 was a way to do it; these new primitives are another way to do it, and a better one - since they don't require as much code duplication. Now I see where freadptr() and freadseek() come in handy... > | + else if (chain->type == CHAIN_STR && chain->u.u_s.len > 0) > | + { > | + unsigned char curr_quote_1 = > | + to_uchar (curr_quote.str1[0]); > > Unnecessary cast. char is assignable to unsigned char without issues. Yes, but I prefer casts to be explicit rather than implicit. > It seems a shame that there isn't really a function optimized for > searching for the first of two characters among a fixed-size memory block. Yes. We have strcspn but it is not well optimized for the case of only 1 or 2 delimiter chars. > /* Return the address of the first character of either C1 or C2, treated > as unsigned int, that occurs within the first N bytes of S; else return > NULL if neither character occurs. */ > void *memchr2 (void const *s, int c1, int c2, size_t n); Yes, I see where this function would come in handy. Bruno
41.0 sec user + 0.9 sec system *** input.c.patch1 2008-02-28 01:45:09.000000000 +0100 --- input.c 2008-02-28 01:59:43.000000000 +0100 *************** *** 75,81 **** INPUT_STRING, /* String resulting from macro expansion. */ INPUT_FILE, /* File from command line or include. */ INPUT_MACRO, /* Builtin resulting from defn. */ ! INPUT_CHAIN /* FIFO chain of separate strings and $@ refs. */ }; typedef enum input_type input_type; --- 75,82 ---- INPUT_STRING, /* String resulting from macro expansion. */ INPUT_FILE, /* File from command line or include. */ INPUT_MACRO, /* Builtin resulting from defn. */ ! INPUT_CHAIN, /* FIFO chain of separate strings and $@ refs. */ ! INPUT_NONE /* End of the input block stack. */ }; typedef enum input_type input_type; *************** *** 115,120 **** --- 116,126 ---- u; }; + /* This input block is an indicator that the input stack is at its end. + This fake input block avoids the need for testing 'isp' in next_char() + and next_char_1(). */ + static struct input_block input_block_none = { NULL, INPUT_NONE }; + /* Current input file name. */ const char *current_file; *************** *** 307,313 **** { /* Free any memory occupied by completely parsed strings. */ assert (next == NULL); ! while (isp && pop_input (false)); /* Reserve the next location on the obstack. */ next = (input_block *) obstack_alloc (current_input, sizeof *next); --- 313,319 ---- { /* Free any memory occupied by completely parsed strings. */ assert (next == NULL); ! while (isp->type != INPUT_NONE && pop_input (false)); /* Reserve the next location on the obstack. */ next = (input_block *) obstack_alloc (current_input, sizeof *next); *************** *** 607,613 **** return false; if (debug_level & DEBUG_TRACE_INPUT) { ! if (tmp) DEBUG_MESSAGE2 ("input reverted to %s, line %d", tmp->file, tmp->line); else --- 613,619 ---- return false; if (debug_level & DEBUG_TRACE_INPUT) { ! if (tmp->type != INPUT_NONE) DEBUG_MESSAGE2 ("input reverted to %s, line %d", tmp->file, tmp->line); else *************** *** 652,658 **** obstack_free (current_input, NULL); free (current_input); ! if (wsp == NULL) { /* End of the program. Free all memory even though we are about to exit, since it makes leak detection easier. */ --- 658,664 ---- obstack_free (current_input, NULL); free (current_input); ! if (wsp->type == INPUT_NONE) { /* End of the program. Free all memory even though we are about to exit, since it makes leak detection easier. */ *************** *** 671,677 **** obstack_init (wrapup_stack); isp = wsp; ! wsp = NULL; input_change = true; return true; --- 677,683 ---- obstack_init (wrapup_stack); isp = wsp; ! wsp = &input_block_none; input_change = true; return true; *************** *** 753,761 **** while (1) { - if (block == NULL) - return CHAR_EOF; - switch (block->type) { case INPUT_STRING: --- 759,764 ---- *************** *** 819,824 **** --- 822,830 ---- } break; + case INPUT_NONE: + return CHAR_EOF; + default: assert (!"peek_input"); abort (); *************** *** 841,847 **** `-------------------------------------------------------------------*/ #define next_char(AQ) \ ! (isp && isp->str_len && !input_change \ ? (isp->str_len--, to_uchar (*isp->u.u_s.str++)) \ : next_char_1 (AQ)) --- 847,853 ---- `-------------------------------------------------------------------*/ #define next_char(AQ) \ ! (isp->str_len && !input_change \ ? (isp->str_len--, to_uchar (*isp->u.u_s.str++)) \ : next_char_1 (AQ)) *************** *** 853,865 **** while (1) { - if (isp == NULL) - { - current_file = ""; - current_line = 0; - return CHAR_EOF; - } - if (input_change) { current_file = isp->file; --- 859,864 ---- *************** *** 949,954 **** --- 948,958 ---- } break; + case INPUT_NONE: + current_file = ""; + current_line = 0; + return CHAR_EOF; + default: assert (!"next_char_1"); abort (); *************** *** 1195,1202 **** obstack_init (&token_stack); token_bottom = obstack_finish (&token_stack); ! isp = NULL; ! wsp = NULL; next = NULL; start_of_input_line = false; --- 1199,1206 ---- obstack_init (&token_stack); token_bottom = obstack_finish (&token_stack); ! isp = &input_block_none; ! wsp = &input_block_none; next = NULL; start_of_input_line = false;
40.7 sec user + 0.9 sec system *** input.c.patch3 2008-02-28 02:11:11.000000000 +0100 --- input.c 2008-02-28 02:23:00.000000000 +0100 *************** *** 64,69 **** --- 64,78 ---- # include "regex.h" #endif /* ENABLE_CHANGEWORD */ + /* __builtin_expect(condition, expected_value) tells GCC about the expected + value of boolean conditions. */ + #if __GNUC__ < 3 + # undef __builtin_expect + # define __builtin_expect(condition, expected_value) (condition) + #endif + #define likely(condition) __builtin_expect (condition, 1) + #define unlikely(condition) __builtin_expect (condition, 0) + /* Number of bytes where it is more efficient to inline the reference as a string than it is to track reference bookkeeping for those bytes. */ *************** *** 762,774 **** switch (block->type) { case INPUT_STRING: ! if (!block->str_len) break; return to_uchar (block->u.u_s.str[0]); case INPUT_FILE: ch = getc (block->u.u_f.fp); ! if (ch != EOF) { ungetc (ch, block->u.u_f.fp); return ch; --- 771,783 ---- switch (block->type) { case INPUT_STRING: ! if (unlikely (!block->str_len)) break; return to_uchar (block->u.u_s.str[0]); case INPUT_FILE: ch = getc (block->u.u_f.fp); ! if (likely (ch != EOF)) { ungetc (ch, block->u.u_f.fp); return ch; *************** *** 847,853 **** `-------------------------------------------------------------------*/ #define next_char(AQ) \ ! (isp->str_len && !input_change \ ? (isp->str_len--, to_uchar (*isp->u.u_s.str++)) \ : next_char_1 (AQ)) --- 856,862 ---- `-------------------------------------------------------------------*/ #define next_char(AQ) \ ! (likely (isp->str_len) && likely (!input_change) \ ? (isp->str_len--, to_uchar (*isp->u.u_s.str++)) \ : next_char_1 (AQ)) *************** *** 859,865 **** while (1) { ! if (input_change) { current_file = isp->file; current_line = isp->line; --- 868,874 ---- while (1) { ! if (unlikely (input_change)) { current_file = isp->file; current_line = isp->line; *************** *** 869,881 **** switch (isp->type) { case INPUT_STRING: ! if (!isp->str_len) break; isp->str_len--; return to_uchar (*isp->u.u_s.str++); case INPUT_FILE: ! if (start_of_input_line) { start_of_input_line = false; current_line = ++isp->line; --- 878,890 ---- switch (isp->type) { case INPUT_STRING: ! if (unlikely (!isp->str_len)) break; isp->str_len--; return to_uchar (*isp->u.u_s.str++); case INPUT_FILE: ! if (unlikely (start_of_input_line)) { start_of_input_line = false; current_line = ++isp->line; *************** *** 884,895 **** /* If stdin is a terminal, calling getc after peek_input already called it would make the user have to hit ^D twice to quit. */ ! if (!isp->u.u_f.end) { ch = getc (isp->u.u_f.fp); ! if (ch != EOF) { ! if (ch == '\n') start_of_input_line = true; return ch; } --- 893,904 ---- /* If stdin is a terminal, calling getc after peek_input already called it would make the user have to hit ^D twice to quit. */ ! if (likely (!isp->u.u_f.end)) { ch = getc (isp->u.u_f.fp); ! if (likely (ch != EOF)) { ! if (unlikely (ch == '\n')) start_of_input_line = true; return ch; } *************** *** 1516,1522 **** /* Can't consume character until after CHAR_MACRO is handled. */ TOKEN_DATA_TYPE (td) = TOKEN_VOID; ch = peek_input (allow_argv && current_quote_age); ! if (ch == CHAR_EOF) { #ifdef DEBUG_INPUT xfprintf (stderr, "next_token -> EOF\n"); --- 1525,1531 ---- /* Can't consume character until after CHAR_MACRO is handled. */ TOKEN_DATA_TYPE (td) = TOKEN_VOID; ch = peek_input (allow_argv && current_quote_age); ! if (unlikely (ch == CHAR_EOF)) { #ifdef DEBUG_INPUT xfprintf (stderr, "next_token -> EOF\n"); *************** *** 1524,1530 **** next_char (false); return TOKEN_EOF; } ! if (ch == CHAR_MACRO) { init_macro_token (td); next_char (false); --- 1533,1539 ---- next_char (false); return TOKEN_EOF; } ! if (unlikely (ch == CHAR_MACRO)) { init_macro_token (td); next_char (false); *************** *** 1534,1540 **** #endif /* DEBUG_INPUT */ return TOKEN_MACDEF; } ! if (ch == CHAR_ARGV) { init_argv_token (obs, td); #ifdef DEBUG_INPUT --- 1543,1549 ---- #endif /* DEBUG_INPUT */ return TOKEN_MACDEF; } ! if (unlikely (ch == CHAR_ARGV)) { init_argv_token (obs, td); #ifdef DEBUG_INPUT *************** *** 1549,1555 **** next_char (false); /* Consume character we already peeked at. */ file = current_file; *line = current_line; ! if (MATCH (ch, curr_comm.str1, true)) { if (obs) obs_td = obs; --- 1558,1564 ---- next_char (false); /* Consume character we already peeked at. */ file = current_file; *line = current_line; ! if (unlikely (MATCH (ch, curr_comm.str1, true))) { if (obs) obs_td = obs; *************** *** 1570,1576 **** type = TOKEN_STRING; } ! else if (default_word_regexp && (isalpha (ch) || ch == '_')) { obstack_1grow (&token_stack, ch); while ((ch = peek_input (false)) < CHAR_EOF --- 1579,1585 ---- type = TOKEN_STRING; } ! else if (unlikely (default_word_regexp && (isalpha (ch) || ch == '_'))) { obstack_1grow (&token_stack, ch); while ((ch = peek_input (false)) < CHAR_EOF *************** *** 1584,1590 **** #ifdef ENABLE_CHANGEWORD ! else if (!default_word_regexp && word_regexp.fastmap[ch]) { obstack_1grow (&token_stack, ch); while (1) --- 1593,1599 ---- #ifdef ENABLE_CHANGEWORD ! else if (unlikely (!default_word_regexp && word_regexp.fastmap[ch])) { obstack_1grow (&token_stack, ch); while (1) *************** *** 1644,1650 **** while (1) { ch = next_char (obs != NULL && current_quote_age); ! if (ch == CHAR_EOF) /* Current_file changed to "" if we see CHAR_EOF, use the previous value we stored earlier. */ m4_error_at_line (EXIT_FAILURE, 0, file, *line, caller, --- 1653,1659 ---- while (1) { ch = next_char (obs != NULL && current_quote_age); ! if (unlikely (ch == CHAR_EOF)) /* Current_file changed to "" if we see CHAR_EOF, use the previous value we stored earlier. */ m4_error_at_line (EXIT_FAILURE, 0, file, *line, caller,
_______________________________________________ M4-discuss mailing list M4-discuss@gnu.org http://lists.gnu.org/mailman/listinfo/m4-discuss