Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

Jose Fonseca Tue, 20 Oct 2015 15:59:18 -0700

On 20/10/15 23:16, Rowley, Timothy O wrote:

On Oct 20, 2015, at 4:23 PM, Jose Fonseca <jfons...@vmware.com> wrote:

I tried it on my i7-5500U, but I ran into two issues:

- OpenSWR seems to use only 2 threads (even though my system supports 4 threads)

- and even when I compensate by limiting llvmpipe to 2 rasterizer threads,
OpenSWR still gets only half the framerate of llvmpipe with the "gloss" Mesa
demo (a very simple texturing demo):

$ ./gloss
SWR create screen!
This processor supports AVX2.
720 frames in 5.004 seconds = 143.885 FPS
737 frames in 5.005 seconds = 147.253 FPS
729 frames in 5.004 seconds = 145.683 FPS
732 frames in 5.002 seconds = 146.341 FPS
735 frames in 5.001 seconds = 146.971 FPS
[...]
$ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
1539 frames in 5.002 seconds = 307.677 FPS
1719 frames in 5 seconds = 343.8 FPS
1780 frames in 5.002 seconds = 355.858 FPS
1497 frames in 5.002 seconds = 299.28 FPS
1548 frames in 5.001 seconds = 309.538 FPS
[..]

I see a similar ratio with a more complex workload, using the trace from:

  http://people.freedesktop.org/~jrfonseca/traces/furmark-1.8.2-svga.trace

(you'll need to download and build apitrace from https://github.com/apitrace/apitrace)

My questions are:

- Is this the expected performance when texturing is used? Or is there 
something wrong with my setup?


Two things are happening here to cause the behavior you’re seeing.  First, 
OpenSWR only creates as many threads as there are physical cores.  On our
workloads, going beyond that and using hyperthreads gave a minimal performance
gain or even a loss.  Second, one thread is reserved for the API thread, which
does not participate in either frontend (geometry) or backend (fragment) work.
Thus on your two-core 5500U, OpenSWR had only one raster thread versus
llvmpipe’s two, giving half the performance.  If you want to switch OpenSWR to
using hyperthreads, set the environment variable KNOB_MAX_THREADS_PER_CORE=0.
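
As a rough illustration of the thread accounting described above, here is a
small Python sketch. The function name and its parameters are hypothetical and
are not taken from the OpenSWR source. It also assumes that the knob defaults
to one thread per core and that a value of 0 means "no limit", which is only
inferred from this thread.

```python
def swr_raster_threads(physical_cores, hw_threads_per_core,
                       max_threads_per_core=1, api_threads=1):
    """Estimate the raster (worker) thread count under the stated assumptions.

    Hypothetical model, not OpenSWR code:
      - max_threads_per_core == 0 is assumed to mean "use every hardware thread"
      - one thread is set aside for the API and does no raster work
    """
    if max_threads_per_core == 0:
        per_core = hw_threads_per_core
    else:
        per_core = min(hw_threads_per_core, max_threads_per_core)
    total = physical_cores * per_core
    # Keep at least one raster thread, even on a single-core machine.
    return max(total - api_threads, 1)


# i7-5500U: 2 physical cores, 2 hardware threads per core.
print(swr_raster_threads(2, 2))                          # default knob
print(swr_raster_threads(2, 2, max_threads_per_core=0))  # hyperthreads enabled
```

Under these assumptions the default gives one raster thread on a two-core
part, and enabling hyperthreads gives three.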


Thanks for the explanations.  It's closer now, but there is still a bit of a gap:

$ KNOB_MAX_THREADS_PER_CORE=0 ./gloss
SWR create screen!
This processor supports AVX2.
--> numThreads = 3
1102 frames in 5.002 seconds = 220.312 FPS
1133 frames in 5.001 seconds = 226.555 FPS
1130 frames in 5.002 seconds = 225.91 FPS
^C
$ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
1456 frames in 5 seconds = 291.2 FPS
1617 frames in 5.003 seconds = 323.206 FPS
1571 frames in 5.002 seconds = 314.074 FPS
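
To put a number on the gap, the FPS figures from the logs in this message can
be averaged and compared. The Python sketch below is only a convenience
calculation over those logged values. The dictionary keys and the helper name
are made up for illustration.

```python
from statistics import mean

# FPS samples copied from the gloss runs quoted in this message.
runs = {
    "default": {
        "swr": [143.885, 147.253, 145.683, 146.341, 146.971],
        "llvmpipe": [307.677, 343.8, 355.858, 299.28, 309.538],
    },
    "KNOB_MAX_THREADS_PER_CORE=0": {
        "swr": [220.312, 226.555, 225.91],
        "llvmpipe": [291.2, 323.206, 314.074],
    },
}


def fps_ratio(swr, llvmpipe):
    """Ratio of average OpenSWR FPS to average llvmpipe FPS."""
    return mean(swr) / mean(llvmpipe)


ratios = {name: fps_ratio(r["swr"], r["llvmpipe"]) for name, r in runs.items()}
for name, ratio in ratios.items():
    print(f"{name}: OpenSWR reaches {ratio:.0%} of llvmpipe's average FPS")
```

On these samples OpenSWR goes from roughly 45% to roughly 72% of llvmpipe's
average frame rate once hyperthreads are enabled.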

One final question: you said that one thread is reserved for the API, but
I see all threads (with top `H`) maxing out the CPU. So if the thread
reserved for the API is not doing vertex/fragment processing, then what
is it using 100% of a CPU thread for?

Final thoughts: I understand this project has its own history, but I
echo what Roland said -- it would be nice to unify with llvmpipe at some
point, in some way or fashion. Our (VMware's) focus has been desktop
composition, but there's no reason why a single SW renderer can't
satisfy both ends of the spectrum, especially for JIT-enabled renderers,
since they can emit at runtime the code most suited for the workload.

That said, it's really nice seeing Mesa and Gallium enabling this sort
of experiment with SW rendering.



Jose
