Hi there.

I recently came across a nice micro-optimization to permutate bytes within a dword. On x86 all 24 combinations can be done using a maximum of 3 instructions (bswap, rotate32 and rotate16).

I'd like to give it a try and write an optimization pass that detects such dword permutations and replaces them with the optimal sequences. In my tests this seems to be a good win in size and speed. I have the feeling that the register-allocator likes the smaller sequences as well.

Anyway, I don't know where to start. I've browsed the source-code and searched for related optimization passes that I could use as a boiler-plate for my experiments.

So far I found two places that look interesting for me:

In optabs.c I've seen the code that combines two shifts and one or into a rotate. Looks like a hard-coded special case for me.

In tree-ssa-math-opts.c I've seen a code that scans for some common math operations and replaces them with faster code.

Since the codebase is huge I have the feeling that I have overlooked something. Does some kind of infrastructure to detect patterns within a SSA tree already exists somewhere else? Where would be the best place in gcc to add an automatic byteswap detection?

I don't know if I'll ever finish the experiment and submit a patch. The code-base *huge* and scary, but I'd at least like to give it a try.

 Nils Pipenbrinck



Reply via email to