On 20/10/15 23:16, Rowley, Timothy O wrote:
On Oct 20, 2015, at 4:23 PM, Jose Fonseca <jfons...@vmware.com> wrote:
I tried it on my i7-5500U, but I ran into two issues:
- OpenSWR seems to use only 2 threads (even though my system supports 4 threads)
- and even when I compensate by limiting llvmpipe to 2 rasterizer threads, I still get only
half the framerate of llvmpipe with the "gloss" Mesa demo (a very simple
texturing demo):
$ ./gloss
SWR create screen!
This processor supports AVX2.
720 frames in 5.004 seconds = 143.885 FPS
737 frames in 5.005 seconds = 147.253 FPS
729 frames in 5.004 seconds = 145.683 FPS
732 frames in 5.002 seconds = 146.341 FPS
735 frames in 5.001 seconds = 146.971 FPS
[...]
$ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
1539 frames in 5.002 seconds = 307.677 FPS
1719 frames in 5 seconds = 343.8 FPS
1780 frames in 5.002 seconds = 355.858 FPS
1497 frames in 5.002 seconds = 299.28 FPS
1548 frames in 5.001 seconds = 309.538 FPS
[..]
I see a similar ratio with a more complex workload, using the trace from:
http://people.freedesktop.org/~jrfonseca/traces/furmark-1.8.2-svga.trace
(you'll need to download and build https://github.com/apitrace/apitrace)
My questions are:
- Is this the expected performance when texturing is used? Or is there
something wrong with my setup?
Two things are happening here to cause the behavior you’re seeing. First,
OpenSWR only creates as many threads as there are physical cores. On our
workloads, going beyond that and using hyperthreads gave a minimal gain or even
a loss in performance. Second, one thread is reserved for the API thread, which
does not participate in either frontend (geometry) or backend (fragment) work.
Thus on your two-core 5500U, OpenSWR had only one raster thread versus
llvmpipe’s two, giving half the performance. If you want to switch OpenSWR to
using hyperthreads, set the environment variable KNOB_MAX_THREADS_PER_CORE=0.
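The thread accounting described here can be sketched as a small model. This is only an illustration of the arithmetic, not OpenSWR code: the function name, its parameters, and the floor of one worker are all hypothetical, and the figure of 2 hardware threads per core is taken from the "4 threads" reported for the two-core i7-5500U.

```python
# Hypothetical model of the thread accounting described in this thread.
# None of these names come from the OpenSWR source.

def swr_worker_threads(physical_cores: int, hw_threads_per_core: int,
                       use_hyperthreads: bool) -> int:
    """Return the number of threads left for frontend/backend work."""
    # By default only one thread per physical core is created.
    per_core = hw_threads_per_core if use_hyperthreads else 1
    total = physical_cores * per_core
    # One thread is reserved for the API and does no raster work.
    # The floor of 1 is an assumption for the single-core case.
    return max(total - 1, 1)

# Two-core i7-5500U with two hardware threads per core.
default_workers = swr_worker_threads(2, 2, use_hyperthreads=False)
ht_workers = swr_worker_threads(2, 2, use_hyperthreads=True)
print(default_workers, ht_workers)
```

Under these assumptions the default configuration leaves one worker against llvmpipe's two, which matches the halved framerate, and enabling hyperthreads leaves three.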
Thanks for the explanations. It's closer now, but there is still a bit of a gap:
$ KNOB_MAX_THREADS_PER_CORE=0 ./gloss
SWR create screen!
This processor supports AVX2.
--> numThreads = 3
1102 frames in 5.002 seconds = 220.312 FPS
1133 frames in 5.001 seconds = 226.555 FPS
1130 frames in 5.002 seconds = 225.91 FPS
^C
$ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
1456 frames in 5 seconds = 291.2 FPS
1617 frames in 5.003 seconds = 323.206 FPS
1571 frames in 5.002 seconds = 314.074 FPS
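To put a number on the gap, the FPS readings from the runs in this thread can be averaged and compared. The sketch below only does that arithmetic; the dictionary layout and the function name are made up for illustration, and the samples are the few readings quoted here, so the result is a rough indication rather than a proper benchmark.

```python
from statistics import mean

# FPS readings copied from the gloss runs quoted in this thread.
# llvmpipe was run with LP_NUM_THREADS=2 in both cases.
runs = {
    "default": {
        "swr": [143.885, 147.253, 145.683, 146.341, 146.971],
        "llvmpipe": [307.677, 343.8, 355.858, 299.28, 309.538],
    },
    "KNOB_MAX_THREADS_PER_CORE=0": {
        "swr": [220.312, 226.555, 225.91],
        "llvmpipe": [291.2, 323.206, 314.074],
    },
}

def fps_ratio(samples):
    """Mean OpenSWR FPS divided by mean llvmpipe FPS (hypothetical helper)."""
    return mean(samples["swr"]) / mean(samples["llvmpipe"])

ratios = {name: fps_ratio(samples) for name, samples in runs.items()}
for name, ratio in ratios.items():
    print(f"{name}: OpenSWR at {ratio:.0%} of llvmpipe")
```

With these samples OpenSWR comes out at roughly 45% of llvmpipe's framerate in the default configuration and roughly 72% with hyperthreads enabled.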
One final question: you said that one thread is reserved for the API,
but I see all threads (with top `H`) maxing out the CPU. So if the
thread reserved for the API is not doing vertex/fragment processing,
then what is it using 100% of a CPU thread for?
Final thoughts: I understand this project has its own history, but I
echo what Roland said -- it would be nice to unify with llvmpipe at one
point, in some way or fashion. Our (VMware's) focus has been desktop
composition, but there's no reason why a single SW renderer can't
satisfy both ends of the spectrum, especially for JIT-enabled renderers,
since they can emit at runtime the code most suited for the workload.
That said, it's really nice to see Mesa and Gallium enabling this sort
of experimentation with SW rendering.
Jose
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev