On Wed, Jan 28, 2015 at 11:54 PM, Jan Beulich <jbeul...@suse.com> wrote:
> Hello,
>
> in the Xen project we had (meanwhile fixed) code like this (meant to
> be uniform between 32- and 64-bit):
>
> static inline int fls(unsigned int x) {
>         int ret;
>         asm("clz\t%0, %1" : "=r" (ret) : "r" (x));
>         return BITS_PER_LONG - ret;
> }

You want:
    asm("clz\t%w0, %w1" : "=r" (ret) : "r" (x));

The 'w' modifier should be documented, if it is not already. (Note that
once the instruction operates on w registers, the count is at most 32,
so the subtraction has to be from 32 rather than from BITS_PER_LONG.)

>
> Being mainly an x86 person, when I first saw this I didn't understand
> how this could be correct, as for aarch64 BITS_PER_LONG is 64, and
> both operands being 32-bit I expected "clz w<a>, w<b>" to result.
> Yet I had to learn that no matter what size the C operands, x<n>
> registers are always being picked. Which still doesn't mean the above
> is correct - a suitable call chain can leave a previous operation's
> 64-bit result unconverted, making the above produce a supposedly
> impossible result greater than 32.

That is because the full register is xN, but you want only its 32-bit
part, wN. It is the same issue as on x86_64, where you want the lower
32 bits of the register, i.e. eax rather than rax.

>
> Therefore I wonder whether aarch64_print_operand() shouldn't,
> when neither the 'x' nor the 'w' modifier is given, either - like
> ix86_print_operand() (via print_reg()) - honor
> GET_MODE_SIZE (GET_MODE (x)), or at the very least warn
> when that one is more narrow than 64 bits. And yes, I realize that
> this isn't going to be optimal (and could even be considered
> inconsistent) as there's no way to express the low half word or
> byte of a general register, i.e. operands more narrow than 32 bits
> couldn't be fully checked without also knowing/evaluating the
> instruction suffix, e.g. by introducing a 'z' operand modifier like
> x86 has, or extending the existing 'e' one.

No, because sometimes you want to use the full register: not every
place where a register is used allows wN (memory addresses, for one).

Thanks,
Andrew Pinski

>
> Jan
>
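
For reference, here is a minimal sketch of how the corrected function could look with the 'w' modifier applied. It assumes GCC-style extended asm on aarch64. The name fls32 and the plain C loop used on other targets are illustrative additions, not taken from the Xen code.

```c
/*
 * Hypothetical helper: returns the 1-based index of the highest set bit
 * in x, or 0 when x is 0.
 */
static inline int fls32(unsigned int x)
{
#if defined(__aarch64__) && defined(__GNUC__)
    unsigned int lz;

    /*
     * %w0 and %w1 print the operands as w<n>, so CLZ only looks at the
     * low 32 bits and yields a count in the range 0..32.
     */
    __asm__("clz\t%w0, %w1" : "=r" (lz) : "r" (x));
    return 32 - (int)lz;
#else
    /* Fallback for other targets (an assumption made for illustration). */
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
#endif
}
```

Because the instruction now reads only w<n>, stale upper bits left in the 64-bit register by an earlier operation cannot affect the result, and the return value stays within 0..32.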