On 09/12/2014 08:09 AM, Brian Paul wrote:
On 09/11/2014 04:58 PM, Jason Ekstrand wrote:
On Thu, Sep 11, 2014 at 3:53 PM, Dieter Nützel <die...@nuetzel-hh.de> wrote:
On 12.09.2014 00:31, Jason Ekstrand wrote:
On Thu, Sep 11, 2014 at 2:55 PM, Dieter Nützel <die...@nuetzel-hh.de> wrote:
On 15.08.2014 04:50, Jason Ekstrand wrote:
On Aug 14, 2014 7:13 PM, "Dieter Nützel" <die...@nuetzel-hh.de> wrote:
On 15.08.2014 02:36, Dave Airlie wrote:
On 08/02/2014 02:11 PM, Jason Ekstrand wrote:
Most format conversion operations required by GL can be performed by converting one channel at a time, shuffling the channels around, and optionally filling missing channels with zeros and ones. This adds a function to do just that in a general, yet efficient, way.
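The per-channel idea described in the patch can be sketched as follows. This is a minimal illustration, not the patch's code: the function name, the single unorm8-to-float conversion, and the SWZ_ZERO/SWZ_ONE selector values are all hypothetical. Each destination channel either copies a converted source channel chosen by the swizzle or is filled with a constant 0 or 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical swizzle selectors: 0-3 pick a source channel,
 * SWZ_ZERO / SWZ_ONE fill the destination channel with a constant. */
enum { SWZ_ZERO = 4, SWZ_ONE = 5 };

/* Convert one normalized 8-bit channel to a float in [0, 1]. */
static inline float unorm8_to_float(uint8_t x)
{
   return x / 255.0f;
}

/* Convert `count` pixels of unorm8 data with `num_src` channels into
 * float data with `num_dst` channels. For every destination channel,
 * convert the source channel named by `swizzle`, or write 0 / 1. */
static void
swizzle_convert_unorm8_to_float(float *dst, int num_dst,
                                const uint8_t *src, int num_src,
                                const uint8_t swizzle[4], size_t count)
{
   assert(num_dst >= 1 && num_dst <= 4);
   assert(num_src >= 1 && num_src <= 4);

   for (size_t i = 0; i < count; i++) {
      for (int j = 0; j < num_dst; j++) {
         uint8_t s = swizzle[j];
         if (s == SWZ_ZERO)
            dst[j] = 0.0f;
         else if (s == SWZ_ONE)
            dst[j] = 1.0f;
         else {
            assert(s < num_src);
            dst[j] = unorm8_to_float(src[s]);
         }
      }
      src += num_src;   /* advance one source pixel */
      dst += num_dst;   /* advance one destination pixel */
   }
}
```

For example, expanding two-channel data to RGBA with the swizzle {1, 0, SWZ_ZERO, SWZ_ONE} swaps the two channels, zeroes blue, and sets alpha to 1.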
v2:
* Add better comments including full docs for functions
* Don't use __typeof__
* Use inline helpers instead of writing out conversions by hand
* Force full loop unrolling for better performance
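To illustrate what "inline helpers instead of writing out conversions by hand" can look like, here is a small sketch. The helper names are hypothetical and these are not the helpers from the patch. Each conversion rule is written once, following the usual GL conventions for normalized values, and the compiler can inline it at every use site.

```c
#include <assert.h>
#include <stdint.h>

/* unorm8 -> float: 0..255 maps linearly onto 0.0..1.0. */
static inline float unorm8_to_float(uint8_t x)
{
   return x / 255.0f;
}

/* float -> unorm8: clamp to [0, 1], scale, and round to nearest. */
static inline uint8_t float_to_unorm8(float f)
{
   if (!(f > 0.0f))          /* also maps NaN to 0 */
      return 0;
   if (f >= 1.0f)
      return 255;
   return (uint8_t)(f * 255.0f + 0.5f);
}

/* snorm8 -> float: -127..127 maps onto -1.0..1.0; -128 clamps to -1.0. */
static inline float snorm8_to_float(int8_t x)
{
   float f = x / 127.0f;
   return f < -1.0f ? -1.0f : f;
}
```

Keeping clamping and rounding in a single helper makes it harder for hand-written copies of the same conversion to drift apart.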
This file seems to anger gcc a lot.
It seems to take upwards of a minute or two to compile here.
gcc 4.8.3 on 32-bit x86.
Dave.
For me (on our poor little Duron 1800/2 GB) it took ~5 minutes to compile...
gcc 4.8.1 on 32-bit x86.
If we'd like, the way the macros are set up, it would be easy to change them so that we do less unrolling in the cases where we are actually doing substantial format conversion and wouldn't notice the extra logic quite as much. I'll play with it a bit tomorrow or next week and see how much of a hit we would actually take if we unrolled a little less in places.
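One way to read "less unrolling where we are doing substantial format conversion" is sketched below, under the assumption that a dispatcher chooses between two kinds of loops. All names are hypothetical. When a per-channel conversion is relatively expensive (here, unorm8 to float), a single compact loop with a runtime channel count is used. A hand-unrolled loop is kept only for a pure byte swizzle, where loop overhead is a large share of the work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic path: the channel count is a runtime value, so the compiler
 * emits a single compact loop. Intended for conversions whose per-channel
 * cost dominates, where unrolling would buy little. */
static void
convert_generic(float *dst, int num_dst, const uint8_t *src, int num_src,
                const uint8_t swizzle[4], size_t count)
{
   for (size_t i = 0; i < count; i++) {
      for (int j = 0; j < num_dst; j++)
         dst[j] = src[swizzle[j]] / 255.0f;   /* unorm8 -> float */
      src += num_src;
      dst += num_dst;
   }
}

/* Unrolled path: only for a plain 4-channel byte swizzle with no type
 * conversion, where the per-channel work is a single copy. */
static void
swizzle_rgba8_unrolled(uint8_t *dst, const uint8_t *src,
                       const uint8_t swizzle[4], size_t count)
{
   for (size_t i = 0; i < count; i++) {
      dst[0] = src[swizzle[0]];
      dst[1] = src[swizzle[1]];
      dst[2] = src[swizzle[2]];
      dst[3] = src[swizzle[3]];
      src += 4;
      dst += 4;
   }
}
```

The trade-off is fewer specialized copies for the optimizer to process, paid for with some loop overhead in exactly the cases where the conversion arithmetic should hide it.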
--Jason Ekstrand
Ping.
On a second run, it took 11+ minutes here...
11 minutes! What system are you running? And are you using -O3 or something? Yes, we can do something to cut it down, but it will probably require a configure flag; the question is what flag.
--Jason
See above, the old children's system... ;-)
-O2 -m32 -march=athlon-mp -mtune=athlon-mp -m3dnow -msse -mmmx -mfpmath=sse,387 -pipe
Bad? - Worked for ages on AthlonMP....8-)
Maybe it is bad on the Duron now (the MP thing, a much smaller cache, and a better GCC).
Dieter
Yeah, my recommendation would be to hack the macros to not unroll and keep the patch locally. If you've got a better idea as to how to organize the code so the compiler likes it, I'm open, as long as we don't lose performance.
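A local "don't unroll" hack could look roughly like the sketch below. It assumes a macro-generated loop in which each expansion receives a literal channel count. SWIZZLE_NO_UNROLL, SWIZZLE_LOOP and swizzle_unorm8_to_float are hypothetical names, not Mesa's. By default, the switch produces one specialized, fully unrollable copy of the loop per channel count. Building with -DSWIZZLE_NO_UNROLL collapses this to a single generic copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static inline float unorm8_to_float(uint8_t x)
{
   return x / 255.0f;
}

/* Loop body shared by both build modes. It relies on dst, src, num_src,
 * swizzle and count being in scope. With a literal DST_CHANS the inner
 * loop has a constant trip count that the compiler can unroll. */
#define SWIZZLE_LOOP(DST_CHANS)                              \
   do {                                                      \
      for (size_t i = 0; i < count; i++) {                   \
         for (int j = 0; j < (DST_CHANS); j++)               \
            dst[j] = unorm8_to_float(src[swizzle[j]]);       \
         src += num_src;                                     \
         dst += (DST_CHANS);                                 \
      }                                                      \
   } while (0)

static void
swizzle_unorm8_to_float(float *dst, int num_dst,
                        const uint8_t *src, int num_src,
                        const uint8_t swizzle[4], size_t count)
{
#ifdef SWIZZLE_NO_UNROLL
   /* Local hack: a single generic copy with a runtime channel count. */
   SWIZZLE_LOOP(num_dst);
#else
   /* Default: one specialized copy of the loop per channel count. */
   switch (num_dst) {
   case 1: SWIZZLE_LOOP(1); break;
   case 2: SWIZZLE_LOOP(2); break;
   case 3: SWIZZLE_LOOP(3); break;
   case 4: SWIZZLE_LOOP(4); break;
   default: assert(!"unsupported channel count");
   }
#endif
}
```

Multiply the four copies in the default build by every pair of source and destination types, and the amount of code the optimizer has to work through grows quickly. That may go some way toward explaining the compile times reported in this thread.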
It looks like a release build with MSVC is taking quite a while to
compile this file too (actually at link time when the optimizer kicks in).
But even on my fast Linux system with gcc, the difference in compile
time between -O0 and -O3 is pretty big (2 seconds vs. 1 minute, 3 seconds).
I'm still prototyping something but it looks like breaking the top-level
switch cases in _mesa_swizzle_and_convert() into separate functions
reduces the time quite a bit. Let me pursue that a bit further and see
how it goes...
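The restructuring Brian describes, moving each top-level switch case into its own function, follows roughly this pattern. This is a simplified sketch with hypothetical names and types, not the actual change to _mesa_swizzle_and_convert(). The entry point becomes a small dispatcher, so the optimizer handles several medium-sized functions instead of one enormous one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical destination types standing in for the real format list. */
enum dst_type { DST_FLOAT, DST_UBYTE };

/* Each former top-level case now lives in its own function. */
static void
convert_to_float(float *dst, const uint8_t *src, size_t n)
{
   for (size_t i = 0; i < n; i++)
      dst[i] = src[i] / 255.0f;
}

static void
convert_to_ubyte(uint8_t *dst, const uint8_t *src, size_t n)
{
   for (size_t i = 0; i < n; i++)
      dst[i] = src[i];
}

/* The entry point is reduced to a small dispatcher. */
static void
swizzle_and_convert(void *dst, enum dst_type type,
                    const uint8_t *src, size_t n)
{
   switch (type) {
   case DST_FLOAT: convert_to_float(dst, src, n); break;
   case DST_UBYTE: convert_to_ubyte(dst, src, n); break;
   default: assert(!"unknown destination type");
   }
}
```

One caveat: a compiler may inline a static function that has a single call site back into its caller, which would undo the split. GCC and Clang accept `__attribute__((noinline))` to prevent this if it turns out to matter.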
OK, I'm posting a couple patches:
mesa: break up _mesa_swizzle_and_convert() to reduce compile time
This reduces -O3 compile time with gcc to 1/4 of what it was. Seems to
reduce compile time with MSVC too, but I haven't really measured it.
Dieter, can you test this patch on your system?
mesa: move i, j var decls into SWIZZLE_CONVERT_LOOP() macro
I think the optimizer can sometimes do a better job when loop variables
are declared per-loop, rather than declared per-function. But this
patch increases the size of the .o file from 2556528 to 2933216 bytes
(15%). Jason, if you have a benchmark to measure the speed of this
code, I'd be interested to know if this patch helps much. Moving the
declaration of 'j' inside the 's' loop makes it even bigger (3074472 bytes).
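The second patch is about where the loop counters are declared. The sketch below shows only the declaration placement. The macro and function names are hypothetical, and it makes no claim about the effect on speed or object size. Instead of one function-scope `int i, j;` shared by every macro expansion, each expansion declares its own counters, whose scope ends with that loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static inline float unorm8_to_float(uint8_t x)
{
   return x / 255.0f;
}

/* Hypothetical macro: i and j are declared inside the expansion itself,
 * so every use gets fresh counters that live only for that loop. */
#define CONVERT_LOOP(DST, SRC, CHANS, COUNT, CONV)                  \
   do {                                                             \
      for (size_t i = 0; i < (COUNT); i++) {                        \
         for (int j = 0; j < (CHANS); j++)                          \
            (DST)[i * (CHANS) + j] = CONV((SRC)[i * (CHANS) + j]);  \
      }                                                             \
   } while (0)

static void
convert_rgba8_to_float(float *dst, const uint8_t *src, size_t count)
{
   /* No function-scope 'int i, j;' is needed here. */
   CONVERT_LOOP(dst, src, 4, count, unorm8_to_float);
}
```

Whether narrower counter scopes actually help the optimizer is the open question in the message above. The quoted .o sizes suggest the compiler does generate noticeably different code.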
I haven't done a full piglit run on these changes yet.
-Brian
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev