On Mon, Sep 12, 2011 at 11:24:22PM +0200, Marie E. Rognes wrote:
> On 09/12/11 21:56, Kristian Ølgaard wrote:
> >On 12 September 2011 21:36, Marie E. Rognes<m...@simula.no> wrote:
> >>On 09/12/11 20:00, Marie E. Rognes wrote:
> >>>On 09/12/11 19:54, Garth N. Wells wrote:
> >>>>On 12 September 2011 18:49, Marie E. Rognes<m...@simula.no> wrote:
> >>>>>On 09/12/11 19:40, Garth N. Wells wrote:
> >>>>>>Which compiler options did you use when evaluating the speed up?
> >>>>>>
> >>>>>Tested Extrapolation.h with vanilla dolfin (which is dominated by
> >>>>>evaluate_basis calls). No additional compiler options set.
> >>>>>
> >>>>>What are the default compiler options?
> >>>>>
> >>>>'-g' for plain JIT, which is dead slow. You should test with at least:
> >>>>
> >>>>    parameters["form_compiler"]["cpp_optimize"] = True
> >>>>
> >>>>in the Python code. This will use '-O2'.
> >Isn't this limited in a way? Would it be a problem to let users do:
> >
> >parameters["form_compiler"]["cpp_optimize"] = '-O2 -funroll-loops'
> >parameters["form_compiler"]["cpp_optimize"] = '-O3'
> >
> >and then perhaps let
> >
> >parameters["form_compiler"]["cpp_optimize"] = True
> >
> >default to '-O2' as we do now?
> >Just a thought.
> >
> >>>Ok, thanks -- I'll take a closer look.
> >>>
> >>Take a look at the attached results in old_evaluate_basis.txt (results with
> >>"old" FFC),
> >>and new_evaluate_basis.txt (results with "new" FFC) from running the
> >>attached
> >>test_evaluate_basis.py.
> >>
> >>Acceptable?
> >Looks good, and the generated code is much nicer now. :)
> >It could have been fun to see the impact of the '-O2 -funroll-loops'
> >option on the old code, but then you'll have to switch to C++. Anyway,
> >I'm quite sure that the old code will never perform as well as the new
> >code even with this option.
> >
> >As you have probably found out, the generated code was simply a mirror
> >of what is going on in FIAT (translated to C++).
>
> Yep.
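The proposal quoted above, letting "cpp_optimize" accept either a boolean or a string of compiler flags, could be sketched roughly as follows. The helper name resolve_cpp_flags and the two constants are hypothetical and are not part of FFC or DOLFIN; the mapping of True to '-O2' and of the unoptimized case to '-g' is taken from the thread, everything else is an assumption.

```python
# Hypothetical helper, NOT an FFC/DOLFIN API: a sketch of how a
# "cpp_optimize" value that is either a bool or a flag string
# might be turned into a list of compiler flags.

OPTIMIZE_FLAGS = "-O2"  # what True maps to, according to the thread
DEBUG_FLAGS = "-g"      # flags for plain JIT, according to the thread


def resolve_cpp_flags(cpp_optimize):
    """Return the compiler flags implied by a cpp_optimize setting."""
    if cpp_optimize is True:
        return OPTIMIZE_FLAGS.split()
    if cpp_optimize is False or cpp_optimize is None:
        return DEBUG_FLAGS.split()
    if isinstance(cpp_optimize, str):
        flags = cpp_optimize.split()
        if not flags:
            raise ValueError("cpp_optimize string contains no flags")
        return flags
    raise TypeError("cpp_optimize must be a bool or a string of flags")


if __name__ == "__main__":
    for value in (True, False, "-O3", "-O2 -funroll-loops"):
        print(repr(value), "->", resolve_cpp_flags(value))
```

With this shape, existing scripts that set the parameter to True keep their current behaviour, while a string is passed through unchanged.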
>
> >Perhaps there are more places where we can simplify the generated code?
> >
>
> Probably, did you have anything particular in mind?
>
> One thing we could do to reduce code size
> would be to move the evaluation of the modal(?) basis functions
> outside of the switch and just do the vector-vector product inside.
>
> Also, I think it would significantly speed up evaluate_basis_all,
> if we just did the evaluation of the modal basis functions once,
> and then the vector-vector product 'local_dimension'-times.
>
> Actually, I plan on doing that unless anyone protests vehemently.
> The reduction in generated code from the one should more or less
> counteract the increase in generated code from the other.
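The idea in the quoted text (evaluate the modal basis once, then take one vector-vector product per basis function) can be illustrated with a small Python sketch. This is not the generated C++ code: Legendre polynomials on [-1, 1] stand in for the orthonormal expansion basis that FIAT provides, and the function names and the coefficient table are illustrative assumptions.

```python
# Sketch only: a 1D stand-in for the structure discussed in the thread.
# Legendre polynomials replace FIAT's expansion basis (an assumption).


def legendre_values(degree, x):
    """Return [P_0(x), ..., P_degree(x)] via the three-term recurrence."""
    values = [1.0]
    if degree >= 1:
        values.append(x)
    for n in range(1, degree):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


def evaluate_basis(i, x, coefficients):
    """Value of basis function i at x: modal evaluation plus one dot product."""
    modal = legendre_values(len(coefficients[i]) - 1, x)
    return sum(c * m for c, m in zip(coefficients[i], modal))


def evaluate_basis_all(x, coefficients):
    """All basis functions at x: modal values computed once, then
    one dot product per basis function (local_dimension of them)."""
    modal = legendre_values(len(coefficients[0]) - 1, x)
    return [sum(c * m for c, m in zip(row, modal)) for row in coefficients]


if __name__ == "__main__":
    # Nodal P1 basis on [-1, 1] expressed in the modal basis:
    # phi_0 = (1 - x)/2 = 0.5*P_0 - 0.5*P_1, phi_1 = (1 + x)/2
    coefficients = [[0.5, -0.5], [0.5, 0.5]]
    print(evaluate_basis_all(0.5, coefficients))
```

Calling evaluate_basis in a loop would repeat the modal evaluation local_dimension times, which is the cost the proposal removes.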
Another thing to try would be to use BLAS to do the vector-vector
products (call ddot from BLAS) or, even better, if it can be written
as one big matrix-vector product (call dgemv from BLAS).

--
Anders

> >Another thing in relation to improving the evaluate_basis* functions
> >that I have thought about is if it's really necessary to support
> >derivatives of arbitrary order. If we only generate code for the first
> >derivative by default (and support arbitrary derivatives by a command
> >line argument) the code will be a lot simpler (easier on C++ compiler)
> >and much faster irrespective of which gcc optimisation is being used.
> >
>
> Sounds neat to me.
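The BLAS suggestion amounts to stacking the coefficient vectors of all basis functions into one matrix, so that the per-function dot products (ddot) collapse into a single matrix-vector product (dgemv). A rough NumPy illustration follows. The function names are hypothetical, and it is an assumption about the local NumPy build that float64 products are delegated to a BLAS library; the generated C++ code would call BLAS directly instead.

```python
import numpy as np

# Sketch only: compares per-row dot products with one matrix-vector
# product. Whether NumPy dispatches to BLAS ddot/dgemv depends on how
# it was built (assumption, not verified here).


def basis_values_by_dots(coefficients, modal_values):
    """One vector-vector product per basis function (ddot-style)."""
    C = np.asarray(coefficients, dtype=np.float64)
    m = np.asarray(modal_values, dtype=np.float64)
    return np.array([np.dot(row, m) for row in C])


def basis_values_by_matvec(coefficients, modal_values):
    """All basis functions in a single matrix-vector product (dgemv-style)."""
    C = np.asarray(coefficients, dtype=np.float64)
    m = np.asarray(modal_values, dtype=np.float64)
    return C @ m


if __name__ == "__main__":
    # Rows: coefficients of each basis function in the modal basis.
    C = [[0.5, -0.5], [0.5, 0.5]]
    modal = [1.0, 0.5]  # modal basis values at one point
    print(basis_values_by_dots(C, modal))
    print(basis_values_by_matvec(C, modal))
```

Both functions return the same values; the second form hands the whole computation to one library call, which is where a tuned BLAS can help for larger local dimensions.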