Hello GCC-team, I use GCC for all my C and C++ programs. I know how to use GCC, but I am not a contributor to GCC (yet). I often discover some problems C and C++ code have in general. There is often the choice between fast or readable code. Some code I have seen performs well but looks ugly (goto, etc.); other code is simple, but has some performance problems. What if we could make the simple code perform well?
There is a common problem with conditional branches inside loops. This can decrease the performance of a program. To fix this issue, the conditional branch should be moved outside of the loop. Sometimes this optimization is done by the compiler, but guessing on optimizations done by the compiler is really bad. Often it is not easy to transform the source code to have the conditional branching outside the loop. Instead I propose a new attribute, which forces the compiler to do a conditional branch (based on the annotated parameter) at the beginning of a function. It branches to the corresponding code of the function compiled with the value being constant. Here is a code example, which contains this issue. enum reduce_op { REDUCE_ADD, REDUCE_MULT, REDUCE_MIN, REDUCE_MAX }; /* performance critical function */ void reduce_data(enum reduce_op reduce, unsigned const* data, unsigned data_size) { unsigned i, result, item; result = reduce == REDUCE_MULT ? 1u : reduce == REDUCE_MIN ? ~0u // ~0u is UINT_MAX : 0u; for(i = 0; i < data_size; ++i) { item = data[i]; switch(reduce) { case REDUCE_ADD: result += item; break; case REDUCE_MULT: result *= item; break; case REDUCE_MIN: if(item < result) result = item; // RIP: result <?= item; break; case REDUCE_MAX: if(item > result) result = item; // RIP: result >?= item; break; } } return result; } The value of reduce does not change inside the function. For this example, the optimization is trivial. But consider more complex examples. The function should be optimized to: void reduce_data(enum reduce_op reduce, unsigned const* data, unsigned data_size) { unsigned i, result, item; switch(reduce) { case REDUCE_ADD: result = 0; for(i = 0; i < data_size; ++i) result += data[i]; return result; case REDUCE_MULT: result = 1; for(i = 0; i < data_size; ++i) result *= data[i]; return result; case REDUCE_MIN: result = ~0u; for(i = 0; i < data_size; ++i) { item = data[i]; if(item < result) result = item; } return result; case REDUCE_MAX: result = 0; for(i = 0; i < data_size; ++i) { item = data[i]; if(item > result) result = item; } return result; default: return 0u; } } As I have mentioned, the value is treated as constant in the code. The compiler simply compiles the body of the function for each enum-value as if it was constant and puts it in a conditional branch. This means that __builtin_constant_p evaluates to true/1, when the parameter has the attribute. But duplicate code should also be avoided, the initialization of the loop counter i could be moved before the branch. You might ask, what about values which are not declared in the enum? The compiler simply compiles a default case as in the example code above. The default case equals to the normal code (as if the variable was not annotated with the attribute), except that the compiler knows, that the enum value is never a value declared in the enum. In this case the return value is always 0. In the default case __builtin_constant_p with the annotated variable evaluates to false. (But as reduce == REDUCE_ADD is always false __builtin_constant_p(reduce == REDUCE_ADD) should be always true.) The attribute can only be applied in the implementation of a function. I call the attribute early_branch. The usage of the attribute could be void reduce_data(enum reduce_op reduce __attribute__((early_branch)), unsigned const* data, unsigned data_size) { ... } The way to remove the default case could be to just test against non-enum values and insert __builtin_unreachable() if(reduce != REDUCE_ADD && reduce != REDUCE_MULT && reduce != REDUCE_MIN && reduce != REDUCE_MAX) __builtin_unreachable(); Or test if it is not constant if(!__builtin_constant_p(reduce)) __builtin_unreachable(); For this case, there could be a problem when the proposed attribute is not available. There could be either a builtin __builtin_default_branch_p or a macro definition. #if __has_attribute(early_branch) #define is_default_branch(v) (!__builtin_constant_p(v)) #else #define is_default_branch(v) 0 #endif The attribute can take additional parameters, which can specify the values to compile the code seperately for, this can include non-enum values void do_stuff(unsigned element_size __attribute__((early_branch(1, 2, 4, 8)))) { ... } The bool/_Bool type is interpreted as an enum with two values. What if the variable changes? When the variable changes there should be a jump to the corresponding parallel code of the compiled code with new value. Unoptimized /* removes comments starting with # and ending in a newline */ void remove_comment(char* dst, char const* src) { // initialization nessecary int state __attribute__((early_branch(0, 1))) = 0; char c; while(*src) { c = *src++; switch(state) { case 0: if(c == '#') state = 1; else *dst++ = c; break; case 1: if(c == '\n') { *dst++ = '\n'; state = 0; } break; } } *dst = '\0'; } changed to void remove_comment(char* dst, char const* src) { char c; switch(0) { case 0: while(*src) { c = *src++; if(c == '#') goto branch1; else *dst++ = c; branch0: } break; case 1: while(*src) { if(*src++ == '\n') { *dst++ = '\n'; goto branch0; } branch1: } break; } *dst = '\0'; } You see that the state is not tested in each iteration of the loop(s). For complex cases (like parsers), this can improve performance. This attribute is needed to avoid such a goto usage. I have seen such code, which is unmaintainable but performs better than the first solution. I hope that some of this is implementable. I'd like to discuss about this and want to know what you think. cmdLP