Hello, in the Xen project we had (meanwhile fixed) code like this (meant to be uniform between 32- and 64-bit):

    static inline int fls(unsigned int x)
    {
            int ret;

            asm("clz\t%0, %1" : "=r" (ret) : "r" (x));

            return BITS_PER_LONG - ret;
    }

Being mainly an x86 person, when I first saw this I didn't understand how it could be correct: for aarch64 BITS_PER_LONG is 64, and with both operands being 32-bit I expected "clz w<a>, w<b>" to result. Yet I had to learn that, no matter what size the C operands have, x<n> registers are always picked.

That still doesn't mean the above is correct - a suitable call chain can leave a previous operation's 64-bit result unconverted, so the upper half of the input register is not guaranteed to be zero, making the above produce a supposedly impossible result greater than 32.

Therefore I wonder whether aarch64_print_operand() shouldn't, when neither the 'x' nor the 'w' modifier is given, either - like ix86_print_operand() (via print_reg()) - honor GET_MODE_SIZE (GET_MODE (x)), or at the very least warn when that size is narrower than 64 bits.

And yes, I realize that this isn't going to be optimal (and could even be considered inconsistent), as there's no way to express the low half word or byte of a general register. That is, operands narrower than 32 bits couldn't be fully checked without also knowing/evaluating the instruction suffix, e.g. by introducing a 'z' operand modifier like x86 has, or by extending the existing 'e' one.

Jan
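
For illustration, here is a minimal portable C sketch of the failure mode described above. It deliberately uses no inline assembly: clz64() and clz32() are hypothetical helpers that merely model what "clz x<a>, x<b>" and "clz w<a>, w<b>" compute, and the uint64_t argument stands in for a general register whose upper half may hold stale bits left by an earlier 64-bit operation. None of these names come from Xen or GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "clz xD, xN": leading zeros over all 64 bits. */
static int clz64(uint64_t reg)
{
    int n = 0;

    if (reg == 0)
        return 64;
    while (!(reg >> 63)) {
        reg <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical model of "clz wD, wN": only the low 32 bits are examined. */
static int clz32(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;
    int n = 0;

    if (low == 0)
        return 32;
    while (!(low >> 31)) {
        low <<= 1;
        n++;
    }
    return n;
}

/* fls() as in the quoted code: the 64-bit form with BITS_PER_LONG == 64. */
static int fls_x_form(uint64_t reg)
{
    return 64 - clz64(reg);
}

/* fls() as it would behave if the 32-bit register form were emitted. */
static int fls_w_form(uint64_t reg)
{
    return 32 - clz32(reg);
}

/* Small self-check exercising a clean and a "dirty" register value. */
static void self_check(void)
{
    uint64_t clean = 0x10;                 /* upper 32 bits are zero  */
    uint64_t dirty = 0x0000000100000010;   /* stale bit 32 left over  */

    assert(fls_x_form(clean) == 5);
    assert(fls_w_form(clean) == 5);
    assert(fls_x_form(dirty) == 33);       /* "impossible" for 32-bit x */
    assert(fls_w_form(dirty) == 5);
}
```

With a clean register both forms agree; with stale upper bits only the w-register model still returns a value within 0..32, which is the behavior the C-level types suggest.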