On Tue, Apr 30, 2019 at 9:53 PM Jeff Law <l...@redhat.com> wrote: > > On 4/30/19 12:34 PM, cmdLP #CODE wrote: > > Hello GCC-team, > > > > I use GCC for all my C and C++ programs. I know how to use GCC, but I am > > not a contributor to GCC (yet). I often discover some problems C and C++ > > code have in general. There is often the choice between fast or readable > > code. Some code I have seen performs well but looks ugly (goto, etc.); > > other code is simple, but has some performance problems. What if we could > > make the simple code perform well? > > > > There is a common problem with conditional branches inside loops. This can > > decrease the performance of a program. To fix this issue, the conditional > > branch should be moved outside of the loop. Sometimes this optimization is > > done by the compiler, but guessing on optimizations done by the compiler is > > really bad. Often it is not easy to transform the source code to have the > > conditional branching outside the loop. Instead I propose a new attribute, > > which forces the compiler to do a conditional branch (based on the > > annotated parameter) at the beginning of a function. It branches to the > > corresponding code of the function compiled with the value being constant. > > > > Here is a code example, which contains this issue. > > > > enum reduce_op > > { > > REDUCE_ADD, > > REDUCE_MULT, > > REDUCE_MIN, > > REDUCE_MAX > > }; > > > > /* performance critical function */ > > void reduce_data(enum reduce_op reduce, > > unsigned const* data, > > unsigned data_size) > > { > > unsigned i, result, item; > > > > result = reduce == REDUCE_MULT ? 1u > > : reduce == REDUCE_MIN ? ~0u // ~0u is UINT_MAX > > : 0u; > > > > for(i = 0; i < data_size; ++i) > > { > > item = data[i]; > > > > switch(reduce) > > { > > case REDUCE_ADD: > > result += item; > > break; > > > > case REDUCE_MULT: > > result *= item; > > break; > > > > case REDUCE_MIN: > > if(item < result) result = item; > > // RIP: result <?= item; > > break; > > > > case REDUCE_MAX: > > if(item > result) result = item; > > // RIP: result >?= item; > > break; > > } > > } > > > > return result; > > } > > > > The value of reduce does not change inside the function. For this > > example, the optimization is trivial. But consider more complex examples. > > The function should be optimized to: > [ .... ] > This is loop unswitching. It's a standard GCC optimization. If it's > not working as well as it should, we're far better off determining why > and fixing the automatic transformation rather than relying on > attributes to drive the transformation.
It's currently not implemented for switch () stmts, just for conditionals. This also hurts SPEC cactusADM. There might be a missed-optimization bug about this. A simple recursive implementation might be possible; unswitch one case at a time - maybe order by profile probability. We already recurse on the unswitched bodies (in case multiple conditions can be unswitched) > > > What if the variable changes? > > > > When the variable changes there should be a jump to the corresponding > > parallel code of the compiled code with new value. > > > > Unoptimized > > > > /* removes comments starting with # and ending in a newline */ > > void remove_comment(char* dst, > > char const* src) > > { > > // initialization nessecary > > int state __attribute__((early_branch(0, 1))) = 0; > > > > char c; > > > > while(*src) > > { > > c = *src++; > > > > switch(state) > > { > > case 0: > > if(c == '#') > > state = 1; > > else > > *dst++ = c; > > > > break; > > > > case 1: > > if(c == '\n') > > { > > *dst++ = '\n'; > > state = 0; > > } > > > > break; > > } > > } > > *dst = '\0'; > > } > > > > changed to > > > > void remove_comment(char* dst, > > char const* src) > > { > > char c; > > > > switch(0) > > { > > case 0: > > while(*src) > > { > > c = *src++; > > if(c == '#') > > goto branch1; > > else > > *dst++ = c; > > branch0: > > } > > break; > > > > case 1: > > while(*src) > > { > > if(*src++ == '\n') > > { > > *dst++ = '\n'; > > goto branch0; > > } > > branch1: > > } > > break; > > } > > *dst = '\0'; > > } > > > > You see that the state is not tested in each iteration of the loop(s). For > > complex cases (like parsers), this can improve performance. This attribute > > is needed to avoid such a goto usage. I have seen such code, which is > > unmaintainable but performs better than the first solution. > This is backwards jump threading which was primarily designed to help > state machines by avoiding evaluation of the switch at each iteration of > the loop and instead jumping directly to the desired target. > > Again, if it's not working for a particular case we're far better off > figuring out why than relying on attributes to drive the transformation. > > jeff >