Hi there.
I recently came across a nice micro-optimization to permutate bytes
within a dword. On x86 all 24 combinations can be done using a maximum
of 3 instructions (bswap, rotate32 and rotate16).
I'd like to give it a try and write an optimization pass that detects
such dword permutations and replaces them with the optimal sequences. In
my tests this seems to be a good win in size and speed. I have the
feeling that the register-allocator likes the smaller sequences as well.
Anyway, I don't know where to start. I've browsed the source-code and
searched for related optimization passes that I could use as a
boiler-plate for my experiments.
So far I found two places that look interesting for me:
In optabs.c I've seen the code that combines two shifts and one or
into a rotate. Looks like a hard-coded special case for me.
In tree-ssa-math-opts.c I've seen a code that scans for some common
math operations and replaces them with faster code.
Since the codebase is huge I have the feeling that I have overlooked
something. Does some kind of infrastructure to detect patterns within a
SSA tree already exists somewhere else? Where would be the best place
in gcc to add an automatic byteswap detection?
I don't know if I'll ever finish the experiment and submit a patch. The
code-base *huge* and scary, but I'd at least like to give it a try.
Nils Pipenbrinck