------- Additional Comments From mmitchel at gcc dot gnu dot org 2005-09-29 03:52 -------

Here is the current status for the four functions in Andi's testcase, with "f2" changed to use "32 - y" so that it is a proper rotation:

* f still generates a complex code sequence, but I'm not sure how much better we can do. At first glance, our code sequence doesn't look a lot worse than the sequence generated by icc 9.0. We could try something like:

    if %ecx > 31:
      mov %eax, %ebx
      shldl $31, %edx, %eax
      shldl $31, %ebx, %edx
      %ecx -= 31
    if %ecx > 31:
      mov %eax, %ebx
      shldl $31, %edx, %eax
      shldl $31, %ebx, %edx
      %ecx -= 31
    if %ecx != 0:
      mov %eax, %ebx
      shldl %cl, %edx, %eax
      shldl %cl, %ebx, %edx

  but that doesn't seem clearly better than what we presently generate.

* f2 uses the roll instruction, which appears optimal.

* f3 uses two shldl instructions, which appears optimal.

* f4 uses the rorl instruction, which appears optimal.

For f2 and f3, it looks like we generate better code than you get with icc 9.0.

I have no plans to work on this further for the time being, but I won't close the PR; someone else might want to attack the code generated for the variable rotation case. Or, if people are satisfied, we can close the PR.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|mark at codesourcery dot com|unassigned at gcc dot gnu
                   |                            |dot org
             Status|ASSIGNED                    |NEW

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17886
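
For anyone who wants to sanity-check the stepwise shldl sequence sketched for f in the comment above, here is a rough C model of it. This is only an illustration under stated assumptions, not code from the PR or from GCC: it assumes %edx holds the high half and %eax the low half of the 64-bit value, that the rotation is to the left, and that the count in %ecx is in the range 0..63. The names shld32, rotl64_stepwise and rotl64_ref are hypothetical.

```c
#include <stdint.h>

/* Model of "shldl $n, src, dst" for 1 <= n <= 31:
   dst is shifted left by n and filled from the top bits of src. */
static uint32_t shld32(uint32_t dst, uint32_t src, unsigned n)
{
    return (dst << n) | (src >> (32 - n));
}

/* Stepwise 64-bit left rotation, mirroring the three "if" blocks of the
   sketched sequence. Assumption: hi models %edx, lo models %eax. */
uint64_t rotl64_stepwise(uint64_t x, unsigned count)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t old_lo;                    /* plays the role of %ebx */
    int step;

    count &= 63;                        /* assumed range of %ecx */

    /* Two conditional rotations by 31 bring the count down to 31 or less. */
    for (step = 0; step < 2; step++) {
        if (count > 31) {
            old_lo = lo;
            lo = shld32(lo, hi, 31);
            hi = shld32(hi, old_lo, 31);
            count -= 31;
        }
    }

    /* Final rotation by the remaining count (1..31), if any. */
    if (count != 0) {
        old_lo = lo;
        lo = shld32(lo, hi, count);
        hi = shld32(hi, old_lo, count);
    }

    return ((uint64_t)hi << 32) | lo;
}

/* Straightforward reference rotation used only for comparison. */
uint64_t rotl64_ref(uint64_t x, unsigned count)
{
    count &= 63;
    return count ? (x << count) | (x >> (64 - count)) : x;
}
```

The steps of 31 are there because shld with a count in %cl only looks at the low five bits, so a count of 32 or more cannot be handled by a single pair of shldl instructions. Comparing rotl64_stepwise against rotl64_ref for every count from 0 to 63 is a cheap way to convince yourself that the sequence computes a rotation.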