Forgot to reply-all.

On Sep 12, 2014 9:05 AM, "Jason Ekstrand" <ja...@jlekstrand.net> wrote:
> The teximage-colors test that I pushed to piglit a week or two ago takes a
> --benchmark flag that bumps the texture size, does the upload 1000 times,
> and gives you the average time to upload.
> --Jason
>
> On Sep 12, 2014 9:01 AM, "Brian Paul" <bri...@vmware.com> wrote:
>> On 09/12/2014 08:49 AM, Jason Ekstrand wrote:
>>> On Sep 12, 2014 7:09 AM, "Brian Paul" <bri...@vmware.com> wrote:
>>>> On 09/11/2014 04:58 PM, Jason Ekstrand wrote:
>>>>> On Thu, Sep 11, 2014 at 3:53 PM, Dieter Nützel <die...@nuetzel-hh.de> wrote:
>>>>>> On 12.09.2014 00:31, Jason Ekstrand wrote:
>>>>>>> On Thu, Sep 11, 2014 at 2:55 PM, Dieter Nützel <die...@nuetzel-hh.de> wrote:
>>>>>>>> On 15.08.2014 04:50, Jason Ekstrand wrote:
>>>>>>>>> On Aug 14, 2014 7:13 PM, "Dieter Nützel" <die...@nuetzel-hh.de> wrote:
>>>>>>>>>> On 15.08.2014 02:36, Dave Airlie wrote:
>>>>>>>>>>> On 08/02/2014 02:11 PM, Jason Ekstrand wrote:
>>>>>>>>>>>> Most format conversion operations required by GL can be
>>>>>>>>>>>> performed by converting one channel at a time, shuffling
>>>>>>>>>>>> the channels around, and optionally filling missing
>>>>>>>>>>>> channels with zeros and ones. This adds a function to do
>>>>>>>>>>>> just that in a general, yet efficient, way.
>>>>>>>>>>>>
>>>>>>>>>>>> v2:
>>>>>>>>>>>>  * Add better comments including full docs for functions
>>>>>>>>>>>>  * Don't use __typeof__
>>>>>>>>>>>>  * Use inline helpers instead of writing out conversions
>>>>>>>>>>>>    by hand
>>>>>>>>>>>>  * Force full loop unrolling for better performance
>>>>>>>>>>>
>>>>>>>>>>> This file seems to anger gcc a lot.
>>>>>>>>>>>
>>>>>>>>>>> It seems to take upwards of a minute or two to compile here.
>>>>>>>>>>>
>>>>>>>>>>> gcc 4.8.3 on 32-bit x86.
>>>>>>>>>>>
>>>>>>>>>>> Dave.
>>>>>>>>>>
>>>>>>>>>> For me (on our poor little Duron 1800/2 GB) it ran ~5 minutes...
>>>>>>>>>>
>>>>>>>>>> gcc 4.8.1 on 32-bit x86.
>>>>>>>>>
>>>>>>>>> If we'd like, the way the macros are set up, it would be easy
>>>>>>>>> to change it so that we do less unrolling in the cases where
>>>>>>>>> we are actually doing substantial format conversion and
>>>>>>>>> wouldn't notice the extra logic quite as much. I'll play with
>>>>>>>>> it a bit tomorrow or next week and see how much of a hit we
>>>>>>>>> would actually take if we unrolled a little less in places.
>>>>>>>>> --Jason Ekstrand
>>>>>>>>
>>>>>>>> Ping.
>>>>>>>>
>>>>>>>> In a second it took 11+ minutes, here...
>>>>>>>
>>>>>>> 11 minutes! What system are you running, and are you using -O3
>>>>>>> or something? Yes, we can do something to cut it down, but it
>>>>>>> will probably require a configure flag; the question is what
>>>>>>> flag.
>>>>>>>
>>>>>>> --Jason
>>>>>>
>>>>>> See above, the old children's system... ;-)
>>>>>> -O2 -m32 -march=athlon-mp -mtune=athlon-mp -m3dnow -msse -mmmx
>>>>>> -mfpmath=sse,387 -pipe
>>>>>>
>>>>>> Bad? - Worked for ages on AthlonMP....8-)
>>>>>> Maybe it is bad on Duron (the MP thing, much smaller cache and
>>>>>> better GCC), now.
>>>>>>
>>>>>> Dieter
>>>>>
>>>>> Yeah, my recommendation would be hacking the macros to not unroll
>>>>> and keep the patch locally. If you've got a better idea as to how
>>>>> to organize the code so the compiler likes it, I'm open as long as
>>>>> we don't lose performance.
>>>>
>>>> It looks like a release build with MSVC is taking quite a while to
>>>> compile this file too (actually at link time when the optimizer
>>>> kicks in).
>>>>
>>>> But even on my fast Linux system with gcc, the difference in
>>>> compile time between -O0 and -O3 is pretty big (2 seconds vs.
>>>> 1 minute, 3 seconds).
>>>
>>> The unfortunate thing is that I doubt -O3 gains you anything on this
>>> function given how thoroughly things are unrolled. :-(
>>
>> Do you have a benchmark program to test the speed of this code? Have
>> you compared -O0 .. -O3? I'd be very interested in that.
>>
>> -Brian
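
For reference, the approach in the patch description quoted in this thread (convert one channel at a time, shuffle the channels, fill missing channels with zeros and ones) and the kind of averaged timing run Brian asks about can be sketched in C. This is only a minimal illustration under stated assumptions, not the Mesa code: `SWZ_*`, `swizzle_ubyte_to_float`, and `average_convert_seconds` are hypothetical names, only 8-bit unsigned normalized to float is handled, and the per-channel loop is left for the compiler to unroll or not rather than being forced through macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical swizzle selectors (not Mesa's names): the first four pick a
 * source channel, the last two fill the destination with a constant. */
enum { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* Convert one 8-bit unsigned normalized channel to a float in [0, 1]. */
static inline float
ubyte_to_float(uint8_t v)
{
    return (float)v / 255.0f;
}

/* Convert `count` pixels from `src` (src_channels bytes per pixel) to `dst`
 * (dst_channels floats per pixel), one channel at a time. swizzle[c] says
 * where destination channel c comes from. */
static void
swizzle_ubyte_to_float(float *dst, int dst_channels,
                       const uint8_t *src, int src_channels,
                       const uint8_t swizzle[4], size_t count)
{
    assert(dst_channels >= 1 && dst_channels <= 4);
    assert(src_channels >= 1 && src_channels <= 4);

    for (size_t i = 0; i < count; i++) {
        for (int c = 0; c < dst_channels; c++) {
            uint8_t s = swizzle[c];
            float v;

            if (s == SWZ_ZERO) {
                v = 0.0f;               /* missing channel filled with 0 */
            } else if (s == SWZ_ONE) {
                v = 1.0f;               /* missing channel filled with 1 */
            } else {
                assert(s < src_channels);
                v = ubyte_to_float(src[i * (size_t)src_channels + s]);
            }
            dst[i * (size_t)dst_channels + c] = v;
        }
    }
}

/* Run an RGB -> BGRA conversion `iterations` times and return the average
 * CPU time per run in seconds, as measured by clock(). */
static double
average_convert_seconds(float *dst, const uint8_t *src, size_t count,
                        int iterations)
{
    static const uint8_t bgra_from_rgb[4] = { SWZ_Z, SWZ_Y, SWZ_X, SWZ_ONE };

    assert(iterations > 0);

    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        swizzle_ubyte_to_float(dst, 4, src, 3, bgra_from_rgb, count);
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC / iterations;
}
```

Building something like this at -O0 through -O3 and comparing the averages over a large buffer is one way to run the comparison discussed above. Note that clock() reports CPU time, so a wall-clock timer may be a better fit for real measurements.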
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev