On Fri, Feb 13 2015, "George Spelvin" <li...@horizon.com> wrote:
>> the main loop is 20--3b. The test instruction at 2e seems to be
>> redundant. The same at 37: the sub instruction already sets plenty of
>> flags that could be used, so explicitly comparing %rbx to -1 seems
>> redundant.
>
> Er... I think you hand-edited that code; it's wrong. The loop assumes that
> %rbx is in units of words, but the prologue sets it up in units of bits.

No, but I messed up the source by hand :-) My DIV_ROUND_UP macro was
bogus. Well spotted. Fixing that I still see the redundant cmp and test,
though.

> The mov to %rcx is also redundant, since it could be eliminated with
> some minor rescheduling.
>
> The code generation I *want* for that function is:
>
> 	# addr in %rdi, size in %rsi
> 	movl	%esi, %ecx
> 	leaq	0x3f(%rsi), %rax
> 	negl	%ecx
> 	movq	$-1, %rdx
> 	shrq	$6, %rax
> 	shrq	%cl, %rdx
> 	jmp	2f
> 1:
> 	movq	$-1, %rdx
> 2:
> 	subq	$1, %rax
> 	jc	3f
> 	andq	(%rdi,%rax,8), %rdx
> 	jeq	1b
>
> 	bsrq	%rdx, %rdx
> 	salq	$6, %rax
> 	addq	%rdx, %rax
> 	ret
> 3:
> 	movq	%rsi, %rax
> 	retq

Nice. But I don't think find_last_bit is important enough to warrant
arch-specific versions.

So, where are we with this? Have we reached some kind of consensus?

Rasmus
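
For readers following the quoted assembly, here is a rough C rendering of
the loop it appears to implement. This is only a sketch, not the patch
under discussion: the names find_last_bit_sketch and BITS_PER_LONG_SKETCH
are made up for illustration, and __builtin_clzl is a GCC/Clang builtin
standing in for the bsrq instruction.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical name; stands in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the highest set bit among the first 'size' bits
 * of 'addr', or 'size' if none of them is set.
 */
unsigned long find_last_bit_sketch(const unsigned long *addr,
                                   unsigned long size)
{
	/* Word count, rounded up (the leaq/shrq pair in the assembly). */
	unsigned long idx = (size + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH;
	/* Mask off bits beyond 'size' in the last word (negl + shrq %cl). */
	unsigned long mask = ~0UL >> (-size & (BITS_PER_LONG_SKETCH - 1));

	while (idx-- > 0) {
		unsigned long val = addr[idx] & mask;

		if (val) {
			/* Position of the highest set bit in val (bsrq). */
			unsigned long top = BITS_PER_LONG_SKETCH - 1 -
					    (unsigned long)__builtin_clzl(val);
			return idx * BITS_PER_LONG_SKETCH + top;
		}
		/* Only the last word is partial; earlier words use all bits. */
		mask = ~0UL;
	}
	return size;
}
```

With size == 0 the loop body never runs and the function returns 0, which
matches the "return size" convention of the fall-through at label 3.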