On 11.04.2014 09:52, Lauri Kasanen wrote:
> On Thu, 10 Apr 2014 21:30:03 +0200
> Christian König <deathsimple at vodafone.de> wrote:
>
>>>>> Quick thought from someone entirely unfamiliar with the hardware:
>>>>> perhaps you can get the performance benefit without the size increase
>>>>> by moving the else portion into a non-inline function? I'm guessing
>>>>> that most accesses happen in the "if" branch.
>>>> The function call overhead is about equal to branching overhead, so
>>>> splitting it would only help about half that. It's called from many
>>>> places, and a lot of calls per sec.
>> Actually direct register access shouldn't be necessary so often. Apart
>> from page flips, write/read pointer updates and irq processing there
>> shouldn't be so many of them. Could you clarify a bit more what issue
>> you are seeing here?
> Too much cpu usage for such a simple function. 2% makes it #2 in top-10
> radeon.ko functions, right after evergreen_cs_parse. For reference, #3
> (radeon_cs_packet_parse) is only 0.5%, one fourth of this function's
> usage.
I think you misunderstood me here. I do believe your numbers that it
makes a noticeable difference.

But I did a couple of perf tests recently on SI and CIK while hacking
on VM support, and IIRC r100_mm_rreg didn't show up in the top 10 on
those systems.

So what puzzles me is who the heck is calling r100_mm_rreg so often that
it makes a noticeable difference on evergreen/NI?

Christian.

>
> As proved by the perf increase, it's called often enough that getting
> rid of the function call overhead (and compiling the if out
> compile-time) helps measurably.
>
> - Lauri
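
For readers following the thread: the suggestion quoted earlier (keep the common "if" branch inline and move the "else" portion into a non-inline function) could look roughly like the sketch below. This is a user-space illustration only, not the radeon code. struct fake_dev, fake_rreg, fake_rreg_indirect and the sizes are all hypothetical names and values, and the register file is simulated with a plain array. It assumes the "if" branch is a cheap direct read and the "else" branch is a slower indirect access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device: a simulated register file. */
#define FAKE_REG_COUNT  64u
#define FAKE_RMMIO_SIZE (16u * sizeof(uint32_t)) /* directly reachable window, in bytes */

struct fake_dev {
    uint32_t regs[FAKE_REG_COUNT]; /* simulated register storage */
    uint32_t rmmio_size;           /* byte offsets below this take the fast path */
    unsigned slow_reads;           /* counts trips through the slow path */
};

/*
 * Slow path ("else" portion): kept out of line so that its code is emitted
 * once instead of being duplicated at every call site.
 */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
uint32_t fake_rreg_indirect(struct fake_dev *dev, uint32_t reg)
{
    assert(reg / 4 < FAKE_REG_COUNT); /* stay inside the simulated file */
    dev->slow_reads++;
    return dev->regs[reg / 4];
}

/*
 * Fast path ("if" portion): small enough to inline. Only the range check
 * and the direct read end up at each call site.
 */
static inline uint32_t fake_rreg(struct fake_dev *dev, uint32_t reg)
{
    if (reg < dev->rmmio_size)
        return dev->regs[reg / 4];
    return fake_rreg_indirect(dev, reg);
}
```

Whether such a split pays off is what the thread is debating: Lauri's estimate above is that it would recover only about half of the gain from full inlining, since the branch itself remains.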