On Wed, Dec 20, 2017 at 09:57:47AM +0100, Peter Zijlstra wrote:
> On Fri, Dec 15, 2017 at 03:41:40PM +0000, Patrick Bellasi wrote:
> > Close enough, the actual code is:
> >
> >    util_est = p->util_est.ewma;
> >     5218:   f9403ba3   ldr   x3, [x29,#112]
> >     521c:   f9418462   ldr   x2, [x3,#776]
> >    if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
> >     5220:   eb010040   subs  x0, x2, x1
> >     5224:   da805400   cneg  x0, x0, mi
> >     5228:   f100281f   cmp   x0, #0xa
> >     522c:   54fff9cd   b.le  5164 <dequeue_task_fair+0xa04>
>
> Ah, that cneg instruction is cute; on x86 we end up with something like:
>
> bool abs_test(long s)
> {
>         return abs(s) < 32;
> }
>
>         cmpl    $-31, %eax
>         jl      .L107
>         movq    -8(%rbp), %rax
>         cmpl    $31, %eax
>         jg      .L107
>         movl    $1, %eax
>         jmp     .L108
> .L107:
>         movl    $0, %eax
> .L108:
>
>
> But I figured you can actually do:
>
>         abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1)
>
> Which, if y is a constant, should result in nicer code, and it does for
> x86:
>
>         addq    $31, %rax
>         cmpq    $62, %rax
>         setbe   %al
>         movzbl  %al, %eax
>
> Just not measurably faster, I suppose because of all the dependencies :/

Ah no, it actually is; I'm an idiot and used 'long' for the return value. If I use bool we lose that last movzbl and go from around 4.0 cycles down to 3.4 cycles.
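
For anyone who wants to convince themselves of the identity, something like the below should do. It is only a sketch: the function names (abs_lt_ref, abs_lt_trick, abs_lt_forms_agree) are made up for illustration and are not from the patch, it assumes y > 0, and it does the addition in unsigned long rather than casting afterwards so that the wrap-around is well defined in C.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Reference form: the straightforward abs() compare. */
bool abs_lt_ref(long x, long y)
{
	return labs(x) < y;
}

/*
 * Single unsigned compare. Assumes y > 0.
 * abs(x) < y means -(y - 1) <= x <= (y - 1); adding (y - 1) shifts that
 * range to [0, 2y - 2], and any x outside it ends up >= 2y - 1 once the
 * sum is viewed as unsigned.
 */
bool abs_lt_trick(long x, long y)
{
	return (unsigned long)x + (unsigned long)y - 1 <
	       2 * (unsigned long)y - 1;
}

/* Hypothetical helper: check both forms agree for x in [-range, range]. */
bool abs_lt_forms_agree(long y, long range)
{
	for (long x = -range; x <= range; x++) {
		if (abs_lt_ref(x, y) != abs_lt_trick(x, y))
			return false;
	}
	return true;
}
```

With y = 32 the trick reduces to (x + 31) < 63 as an unsigned compare, which matches the addq $31 / cmpq $62 / setbe sequence above. The "<=" test in the original code can be handled the same way, since abs(x) <= n is just abs(x) < n + 1.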