Have you considered turning all inline functions into macros, so that the compiler doesn't have to inline them?
Marek

On Fri, Sep 12, 2014 at 12:58 AM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
>
> On Thu, Sep 11, 2014 at 3:53 PM, Dieter Nützel <die...@nuetzel-hh.de> wrote:
>>
>> On 12.09.2014 00:31, Jason Ekstrand wrote:
>>
>>> On Thu, Sep 11, 2014 at 2:55 PM, Dieter Nützel <die...@nuetzel-hh.de>
>>> wrote:
>>>
>>>> On 15.08.2014 04:50, Jason Ekstrand wrote:
>>>>
>>>>> On Aug 14, 2014 7:13 PM, "Dieter Nützel" <die...@nuetzel-hh.de>
>>>>> wrote:
>>>>>>
>>>>>> On 15.08.2014 02:36, Dave Airlie wrote:
>>>>>>
>>>>>>>>> On 08/02/2014 02:11 PM, Jason Ekstrand wrote:
>>>>>>>>>>
>>>>>>>>>> Most format conversion operations required by GL can be performed by
>>>>>>>>>> converting one channel at a time, shuffling the channels around, and
>>>>>>>>>> optionally filling missing channels with zeros and ones. This adds a
>>>>>>>>>> function to do just that in a general, yet efficient, way.
>>>>>>>>>>
>>>>>>>>>> v2:
>>>>>>>>>> * Add better comments including full docs for functions
>>>>>>>>>> * Don't use __typeof__
>>>>>>>>>> * Use inline helpers instead of writing out conversions by hand
>>>>>>>>>> * Force full loop unrolling for better performance
>>>>>>>>>>
>>>>>>>
>>>>>>> This file seems to anger gcc a lot.
>>>>>>>
>>>>>>> It seems to take upwards of a minute or two to compile here.
>>>>>>>
>>>>>>> gcc 4.8.3 on 32-bit x86.
>>>>>>>
>>>>>>> Dave.
>>>>>>
>>>>>> For me (on our poor little Duron 1800/2 GB) it ran ~5 minutes...
>>>>>>
>>>>>> gcc 4.8.1 on 32-bit x86.
>>>>>
>>>>> If we'd like, the way the macros are set up, it would be easy to
>>>>> change it so that we do less unrolling in the cases where we are
>>>>> actually doing substantial format conversion and wouldn't notice the
>>>>> extra logic quite as much. I'll play with it a bit tomorrow or next
>>>>> week and see how much of a hit we would actually take if we
>>>>> unrolled a little less in places.
>>>>> --Jason Ekstrand
>>>>
>>>> Ping.
>>>>
>>>> In a second it took 11+ minutes, here...
>>>
>>> 11 minutes! What system are you running? And are you using -O3 or
>>> something? Yes, we can do something to cut it down, but it will
>>> probably require a configure flag; the question is what flag.
>>>
>>> --Jason
>>
>> See above, the old children's system... ;-)
>> -O2 -m32 -march=athlon-mp -mtune=athlon-mp -m3dnow -msse -mmmx
>> -mfpmath=sse,387 -pipe
>>
>> Bad? - Worked for ages on AthlonMP....8-)
>> Maybe it is bad on Duron (the MP thing, much smaller cache and better
>> GCC), now.
>>
>> Dieter
>
> Yeah, my recommendation would be hacking the macros to not unroll and keep
> the patch locally. If you've got a better idea as to how to organize the
> code so the compiler likes it, I'm open as long as we don't lose
> performance.
> --Jason
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
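
For readers following the thread, the suggestion at the top of this message (replacing inline helpers with macros) can be sketched in C as below. This is only a minimal illustration, not code from the patch: the names unorm8_to_unorm16, UNORM8_TO_UNORM16, variants_agree and self_check are hypothetical, and nothing in the thread establishes whether the macro form would actually shorten compile times.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inline helper (illustrative name, not Mesa's API):
 * widen an 8-bit unsigned-normalized channel to 16 bits.
 * 65535 / 255 == 257 == 0x101, so the scaling is exact. */
static inline uint16_t
unorm8_to_unorm16(uint8_t x)
{
   return (uint16_t)(x * 0x101u);
}

/* Macro form of the same conversion. The preprocessor expands it
 * textually, so the compiler has no inlining decision to make.
 * The argument appears only once; a macro that used it twice would
 * repeat any side effects in the argument expression. */
#define UNORM8_TO_UNORM16(x) ((uint16_t)((uint8_t)(x) * 0x101u))

/* Return 1 if both variants give the same result for n channels. */
static int
variants_agree(const uint8_t *src, int n)
{
   for (int i = 0; i < n; i++) {
      if (unorm8_to_unorm16(src[i]) != UNORM8_TO_UNORM16(src[i]))
         return 0;
   }
   return 1;
}

/* Small self-check over a few representative channel values. */
static void
self_check(void)
{
   const uint8_t samples[] = { 0, 1, 128, 255 };
   assert(variants_agree(samples, 4));
}
```

Calling self_check() from a test program confirms that the two forms agree on the sample values. The trade-off is that the macro loses the type checking and single-evaluation guarantees the inline function provides.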