On Thu, Sep 11, 2014 at 3:53 PM, Dieter Nützel <die...@nuetzel-hh.de> wrote:
> On 12.09.2014 00:31, Jason Ekstrand wrote:
>> On Thu, Sep 11, 2014 at 2:55 PM, Dieter Nützel <die...@nuetzel-hh.de> wrote:
>>> On 15.08.2014 04:50, Jason Ekstrand wrote:
>>>> On Aug 14, 2014 7:13 PM, "Dieter Nützel" <die...@nuetzel-hh.de> wrote:
>>>>> On 15.08.2014 02:36, Dave Airlie wrote:
>>>>>> On 08/02/2014 02:11 PM, Jason Ekstrand wrote:
>>>>>>> Most format conversion operations required by GL can be performed by
>>>>>>> converting one channel at a time, shuffling the channels around, and
>>>>>>> optionally filling missing channels with zeros and ones. This adds a
>>>>>>> function to do just that in a general, yet efficient, way.
>>>>>>>
>>>>>>> v2:
>>>>>>>  * Add better comments including full docs for functions
>>>>>>>  * Don't use __typeof__
>>>>>>>  * Use inline helpers instead of writing out conversions by hand
>>>>>>>  * Force full loop unrolling for better performance
>>>>>>
>>>>>> This file seems to anger gcc a lot.
>>>>>>
>>>>>> It seems to take upwards of a minute or two to compile here.
>>>>>>
>>>>>> gcc 4.8.3 on 32-bit x86.
>>>>>>
>>>>>> Dave.
>>>>>
>>>>> For me (on our poor little Duron 1800/2 GB) it ran ~5 minutes...
>>>>>
>>>>> gcc 4.8.1 on 32-bit x86.
>>>>
>>>> If we'd like, the way the macros are set up, it would be easy to
>>>> change it so that we do less unrolling in the cases where we are
>>>> actually doing substantial format conversion and wouldn't notice the
>>>> extra logic quite as much. I'll play with it a bit tomorrow or next
>>>> week and see how much of a hit we would actually take if we unrolled
>>>> a little less in places.
>>>>
>>>> --Jason Ekstrand
>>>
>>> Ping.
>>>
>>> On a second attempt, it took 11+ minutes here...
>>
>> 11 minutes! What system are you running? And are you using -O3 or
>> something? Yes, we can do something to cut it down, but it will
>> probably require a configure flag; the question is what flag.
>>
>> --Jason
>
> See above, the kids' old system... ;-)
> -O2 -m32 -march=athlon-mp -mtune=athlon-mp -m3dnow -msse -mmmx
> -mfpmath=sse,387 -pipe
>
> Bad? It worked for ages on the AthlonMP... 8-)
> Maybe it is bad on the Duron now (the MP thing, much smaller cache, and
> a better GCC).
>
> Dieter

Yeah, my recommendation would be to hack the macros so they don't unroll
and keep the patch locally. If you've got a better idea as to how to
organize the code so the compiler likes it, I'm open to it, as long as we
don't lose performance.

--Jason
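For readers following the unrolling discussion, here is a minimal C sketch of the technique the commit message describes (convert one channel at a time, shuffle channels through a swizzle, fill missing channels with zero or one). It is not the code from the patch: every name in it (`swizzle_convert_ubyte_to_float`, `SWZ_ZERO`, `SWZ_ONE`, `SWIZZLE_CONVERT_LOOP`, `SWIZZLE_CONVERT_NO_UNROLL`) is hypothetical, only the ubyte-to-float case is shown, and `SWIZZLE_CONVERT_NO_UNROLL` merely stands in for the kind of configure flag mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical swizzle encoding: 0-3 pick a source channel,
 * SWZ_ZERO / SWZ_ONE fill the destination channel with a constant. */
enum { SWZ_ZERO = 4, SWZ_ONE = 5 };

/* Convert one 8-bit unsigned-normalized channel to float (0..255 -> 0.0..1.0). */
static inline float unorm8_to_float(uint8_t v)
{
   return (float)v / 255.0f;
}

/* Loop body shared by every specialization. It expects dst, src,
 * src_chans, swizzle and count to be in scope. When DST_CHANS is a
 * literal, the inner loop has a constant trip count that the compiler
 * can fully unroll. */
#define SWIZZLE_CONVERT_LOOP(DST_CHANS)                              \
   for (size_t i = 0; i < count; i++) {                              \
      for (int j = 0; j < (DST_CHANS); j++) {                        \
         int s = swizzle[j];                                         \
         if (s == SWZ_ZERO)                                          \
            dst[j] = 0.0f;                                           \
         else if (s == SWZ_ONE)                                      \
            dst[j] = 1.0f;                                           \
         else                                                        \
            dst[j] = unorm8_to_float(src[s]);                        \
      }                                                              \
      src += src_chans;                                              \
      dst += (DST_CHANS);                                            \
   }

/* Convert `count` pixels of src_chans ubyte channels into dst_chans
 * float channels. swizzle[j] says where destination channel j comes
 * from; channel selectors must be < src_chans. */
static void swizzle_convert_ubyte_to_float(float *dst, int dst_chans,
                                           const uint8_t *src, int src_chans,
                                           const int swizzle[4], size_t count)
{
   assert(dst_chans >= 1 && dst_chans <= 4);

#ifdef SWIZZLE_CONVERT_NO_UNROLL
   /* One generic copy: smaller object code and faster compiles. */
   SWIZZLE_CONVERT_LOOP(dst_chans)
#else
   /* One copy per destination channel count, each unrollable. */
   switch (dst_chans) {
   case 1: SWIZZLE_CONVERT_LOOP(1) break;
   case 2: SWIZZLE_CONVERT_LOOP(2) break;
   case 3: SWIZZLE_CONVERT_LOOP(3) break;
   case 4: SWIZZLE_CONVERT_LOOP(4) break;
   default: break; /* rejected by the assert above */
   }
#endif
}
```

The trade-off is the one debated in the thread: each specialized case is a separate copy of the loop with a constant trip count the compiler may unroll, so code size and compile time grow with the number of type and channel combinations, while the generic path is compiled once but keeps the per-channel loop at run time.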
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev