On Thu, Jan 17, 2013 at 10:37 AM, Brian Paul <bri...@vmware.com> wrote:
> In compiler.h we define the likely(), unlikely() macros which wrap GCC's
> __builtin_expect(). But we only use them in a handful of places.
>
> It seems to me that an obvious place to possibly use these would be for GL
> error testing. For example, in glDrawArrays():
>
>    if (unlikely(count <= 0)) {
>       _mesa_error();
>    }
>
> Plus, in some of the glBegin/End per-vertex calls such as
> glVertexAttrib3fARB() where we error test the index parameter.
>
> I guess the key question is how much might we gain from this. I don't
> really have a good feel for the value at this level. In a tight inner
> loop, sure, but the GL error checking is pretty high-level code.

This is basically a micro-optimization, to be honest. Not that
micro-optimization is "bad", but while it should "improve" performance, it
would take a lot for that to show up on profiles. In the case of error
checking at the start of a function, you'd be lucky to save a few
cycles -- virtually unnoticeable.

> I haven't found much on the web about performance gains from
> __builtin_expect(). Anyone?

I read a few hearsay posts, but this one comes with actual numbers:
http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html

Long story short: if you're wrong, it's slower; if you're right, it's a
marginal improvement. Its use is to change the ordering of jumps away from
gcc's default, which assumes linear execution. For example, code like this:

---
if (A == NULL)  // not likely
    return ERR_NULL;
if (B >= MAX)   // not likely
    return ERR_MAX;
if (C < MIN)    // not likely
    return ERR_MIN;
doStuff();
---

generates jumps around the return statements, so in the normal case you
are taking a jump, which can mean a pipeline delay and possibly an
instruction refetch. If you don't jump, the CPU already has the
fall-through instructions loaded in the icache. The "optimal" ordering
is then:

---
if (A != NULL) {
    if (B < MAX) {
        if (C >= MIN) {
            doStuff();
        } else
            return ERR_MIN;
    } else
        return ERR_MAX;
} else
    return ERR_NULL;
---

In the common case, the code does not branch but executes a linear
stream of instructions.

On modern x86 CPUs this matters very little, except for maybe a few
in-order CPUs (Intel Atom, perhaps?). You're a lot more likely to see
some improvement on non-x86 architectures where branch prediction is
weaker or unavailable and/or the CPU is in-order; ARM and older SPARC
CPUs come to mind. Also, some architectures let you encode a branch
prediction hint inside the branch instruction itself, e.g. IA64's
"br.call.sptk.many" (Branch / Call / Static Predict Taken / Many Times),
which gcc can take advantage of.

Still, overall this is well within the realm of micro-optimization.

Patrick
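
P.S. For concreteness, here's roughly what the whole thing looks like end
to end. The macro definitions below are the usual !!-plus-__builtin_expect
idiom, not a copy of Mesa's compiler.h, and process(), MAX/MIN, doStuff()
and the ERR_* constants are made-up placeholders so the snippet compiles
on its own:

---
#include <stddef.h>  /* NULL */

/* Common definition pattern for the hint macros.  The !!(x) folds any
 * nonzero value to exactly 1, since __builtin_expect() compares against
 * the expected value literally.  Non-GCC compilers get a no-op fallback. */
#ifdef __GNUC__
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Placeholder constants and a stub so this compiles stand-alone. */
enum { ERR_NULL = -1, ERR_MAX = -2, ERR_MIN = -3 };
#define MAX 100
#define MIN 0
static void doStuff(void) { /* the common-case work */ }

/* The same error-check chain as the first example above, but hinted:
 * gcc keeps the doStuff() path as the straight-line fall-through and
 * moves the error returns out of line, so you get the "optimal"
 * ordering without hand-nesting the source. */
static int process(const void *A, int B, int C)
{
   if (unlikely(A == NULL))
      return ERR_NULL;
   if (unlikely(B >= MAX))
      return ERR_MAX;
   if (unlikely(C < MIN))
      return ERR_MIN;
   doStuff();
   return 0;
}
---

Comparing the -O2 assembly with and without the hints should show the
error returns shuffled toward the end of the function.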