I was writing a routine to shift bits along an array, and I want the compiler to use the shrdl assembler instruction. It will do this if I load the values into an (unsigned long long) value, but it then does a whole lot more, taking 22 seconds to shift 1024000000 words on a 500MHz Intel PIII. The enclosed C routine doesn't use (unsigned long long) and is faster, taking 14 seconds.
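For illustration, the two approaches might look roughly like the following. This is a minimal sketch, not the enclosed routine: the function names, the choice of a right shift over little-endian word order, and the restriction to 0 < n < 32 are all assumptions made for the example. Which instructions the compiler emits for either variant depends on the GCC version and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant A: widen each adjacent pair of words to 64 bits,
 * shift the pair, and keep the low half. This is the kind of source that
 * maps naturally onto a double-word shift such as shrdl. */
void shift_right_wide(uint32_t *a, size_t len, unsigned n)
{
    assert(n > 0 && n < 32);
    if (len == 0)
        return;
    for (size_t i = 0; i + 1 < len; i++) {
        uint64_t pair = ((uint64_t)a[i + 1] << 32) | a[i];
        a[i] = (uint32_t)(pair >> n);
    }
    a[len - 1] >>= n;   /* top word: zeros shift in */
}

/* Hypothetical variant B: two 32-bit shifts per word and a counter that
 * runs down to zero, so the loop test can in principle be a single
 * decrement-and-branch. */
void shift_right_narrow(uint32_t *a, size_t len, unsigned n)
{
    assert(n > 0 && n < 32);
    if (len == 0)
        return;
    uint32_t lo = *a;
    for (size_t left = len - 1; left != 0; left--) {
        uint32_t hi = a[1];
        *a++ = (lo >> n) | (hi << (32 - n));
        lo = hi;        /* reuse the word already loaded */
    }
    *a = lo >> n;       /* top word: zeros shift in */
}
```

Both functions compute the same result; they differ only in how the shift of each word pair is expressed and in the direction the loop counter runs.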
1. I would like the optimiser to recognise a shrd like it recognises a ror.
2. I would like the optimiser to respect my choice of decrement to zero. (I think earlier versions (3) of the gcc C compiler would respect my choice to decrement to zero rather than increment from zero to a limit.)

The hand-optimised assembler, my_bit32.s, takes just 8 seconds. Some 4 seconds were gained by using shrdl, and a further 2 seconds were gained by using decl instead of incl and cmpl.

--
           Summary: optimize array bit shift
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ajrobb at bigfoot dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27125