On Fri, Nov 12, 2010 at 10:50 PM, Francisco Jerez <curroje...@riseup.net> wrote:
> Jerome Glisse <j.gli...@gmail.com> writes:
>
>> Hi,
>>
>>[...]
>> In order to find out which part of the stack is underperforming in
>> front of state changes I slowly disabled layers, starting from the
>> bottom (which is the only way to do this ;o)). Thus I disabled the
>> command buffer submission to the GPU (r600g-nogpu) and made sure the
>> driver still believed things were happening. Drawoverhead with state
>> change goes from 123t (call/sec, r600g) to 220t (call/sec,
>> r600g-nogpu). So the GPU is slowing things down a bit but not that
>> much; also, comparing sysprof profiles shows that we are spending a
>> lot of time in the cs ioctl.
>>
> In nouveau we also had a little performance problem with our pushbuf
> ioctl, larger command buffers helped a lot because that allowed
> userspace to pile up a considerable amount of rendering before coming
> back to kernel mode (this fix might be completely irrelevant to your
> case though, apparently the radeon CS IOCTL is O(n) on the number of
> dwords submitted while its nouveau counterpart is O(1)).
>

Yes, sadly for us the kernel is our next bottleneck, but I don't think
we are hitting it yet.
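The batching trick you describe amounts to something like the sketch
below: accumulate packets in a big userspace buffer and only pay the
submission ioctl when it fills up or at an explicit flush. Names and
sizes are made up, this is not the actual nouveau or radeon code, just
an illustration of how the kernel round trip gets amortized over many
draws:

    /* Hypothetical sketch of the "larger command buffer" idea: pile up
     * rendering in userspace and only pay the submission ioctl when the
     * buffer fills up (or at an explicit flush).  Made-up names, not the
     * actual nouveau or radeon code. */

    #include <stdint.h>
    #include <string.h>

    #define CS_DWORDS (64 * 1024)            /* room for many draws */

    struct cs {
        uint32_t buf[CS_DWORDS];
        unsigned cdw;                        /* dwords emitted so far */
    };

    /* Stand-in for the real DRM submission ioctl. */
    static void submit_ioctl(const uint32_t *buf, unsigned ndw)
    {
        (void)buf; (void)ndw;
    }

    static void cs_flush(struct cs *cs)
    {
        if (cs->cdw) {
            submit_ioctl(cs->buf, cs->cdw);  /* one kernel round trip */
            cs->cdw = 0;
        }
    }

    static void cs_emit(struct cs *cs, const uint32_t *pkt, unsigned ndw)
    {
        if (cs->cdw + ndw > CS_DWORDS)
            cs_flush(cs);                    /* rare with a big buffer */
        memcpy(&cs->buf[cs->cdw], pkt, ndw * sizeof(uint32_t));
        cs->cdw += ndw;
    }

With a buffer that large the ioctl cost is spread over many draws
instead of being paid per draw; how much that helps radeon compared to
nouveau depends on the CS ioctl being O(n) in the dwords submitted, as
you point out.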
>> Next was to disable the r600g pipe driver, basically turning the
>> driver into a no-op where each call into it is ignored except for
>> buffer/resource/texture allocations. Drawoverhead with state change
>> goes from 220t (call/sec, r600g-nogpu) to 1700t (call/sec,
>> r600g-nogpu-nopipe). Obviously the r600g pipe is a CPU intensive
>> task, a lot of register marshalling. But the most disturbing fact is
>> that we achieve 24.6 times fewer draw calls per second when there is
>> a state change than when there is none, pointing out that the pipe
>> driver is likely not the only one to blame.
>>
> Relative terms are somewhat misleading here, these are absolute
> overheads calculated from your results:
>
> r600g ngnpnb   0.005155 ms/draw
> r600g ngnp     0.013905 ms/draw
> r600g ng       0.017194 ms/draw
> r600g          0.021282 ms/draw
> nv47g          0.006248 ms/draw
>
> So, yes, the pipe driver is definitely not the only one to be blamed,
> but at least 75% of the total overhead comes from below the mesa state
> tracker.
>

Yes, I described the r600g pipe issues in my reply to Marek.

>> Last was to see if our memory allocation through gem/ttm was hurting
>> us. Yes it does (drawoverhead no state change 1600t (call/sec,
>> r600g-nogpu-nopipe-nobo), drawoverhead state change 173t (call/sec,
>> r600g-nogpu-nopipe-nobo)). So when we use malloc for buffer
>> allocation the performance drop between no state change and a state
>> change is only a factor of 9.4. So obviously GPU buffer allocation is
>> costing us a lot.
>>
> The question is why is GEM/TTM costing you anything at all in this
> particular case, given working and smart enough buffer suballocation or
> caching, "drawoverhead" wouldn't have ever met TTM inside its main loop.
>

According to sysprof most of the overhead is in the pb_bufmgr_* helpers
iirc.
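The kind of caching you describe would look roughly like the sketch
below: a per-size free list that recycles GPU buffers in userspace so
the hot path of drawoverhead never reaches the GEM/TTM allocation ioctl.
Names are made up and fencing/busy tracking is left out, so this is only
an illustration of the idea, not what the real pb_bufmgr_* helpers do:

    /* Hypothetical userspace BO cache: recycle freed buffers on a
     * per-size-bucket free list so the steady state never hits the
     * GEM/TTM allocation path.  Made-up names, no fencing or busy
     * tracking, only an illustration. */

    #include <stddef.h>
    #include <stdlib.h>

    struct gpu_bo {
        size_t size;
        struct gpu_bo *next;                /* free-list link */
        /* handle, cpu map, ... */
    };

    #define NUM_BUCKETS 16
    static struct gpu_bo *cache[NUM_BUCKETS]; /* power-of-two size buckets */

    static unsigned bucket_for(size_t size)
    {
        unsigned b = 0;
        while (((size_t)4096 << b) < size && b + 1 < NUM_BUCKETS)
            b++;
        return b;
    }

    /* Stand-in for the real GEM/TTM allocation ioctl (the slow path). */
    static struct gpu_bo *bo_alloc_from_kernel(size_t size)
    {
        struct gpu_bo *bo = calloc(1, sizeof(*bo));
        if (bo)
            bo->size = (size_t)4096 << bucket_for(size);
        return bo;
    }

    struct gpu_bo *bo_get(size_t size)
    {
        unsigned b = bucket_for(size);
        struct gpu_bo *bo = cache[b];
        if (bo) {                           /* cache hit: no kernel call */
            cache[b] = bo->next;
            return bo;
        }
        return bo_alloc_from_kernel(size);  /* cold path only */
    }

    void bo_put(struct gpu_bo *bo)
    {
        unsigned b = bucket_for(bo->size);  /* recycle instead of freeing */
        bo->next = cache[b];
        cache[b] = bo;
    }

If sysprof shows the time inside the pb_bufmgr_* helpers, then
presumably either the cache misses on drawoverhead's allocation pattern
or the bookkeeping on the hot path is itself too expensive.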
>>[...]
>> this? I didn't spot any obvious mistake in the mesa state tracker. Of
>> course one could argue that it's the pipe driver which is slow, but I
>> don't think it's the only one to blame. The classic driver doesn't
>> fall over in the drawoverhead test, though classic drivers are a lot
>> less performant on this benchmark, so maybe the bottleneck in classic
>> is also somewhere in the state world.
>>
> I'm not going to take your r600c results literally because something
> else was seriously slowing it down in this test, for comparison I've
> repeated the same benchmark with the nouveau classic driver, this is
> what I get:
>
>                 draw only   draw nop sc   draw sc   overhead
> nv17 (classic)  1600t       1500t         685.3t    0.000835 ms/draw
> nv17 (blob)     6100t       6100t         303.8t    0.003127 ms/draw
>
> nouveau classic seems *less* affected by state changes than the nvidia
> blob, so I wouldn't blame the fact that you're going through mesa
> (instead of an alternative non-existent state tracker) for this.
>

I think r600c is just a bit too naive and so it ends up being very
expensive to change any state with it. But I haven't taken a closer
look.

I don't think we should look too much at the relative cost of changing
state. I think fglrx optimized the function call cost just enough so
that it didn't impact performance, while nvidia went nuts and
over-optimized function call overhead. Thus I think the target should
be more about making sure core mesa + gallium with a no-op pipe driver
can keep up at 500t draw calls/sec when state changes occur (of course
this could vary depending on which states change), and not 173t
calls/sec.

Cheers,
Jerome Glisse