Hi,

we could do it similarly to the LOG macro, where you can choose between
more efficient compile-time settings and less efficient run-time settings.
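
For illustration, a minimal sketch (with made-up names, not the actual
LOG macro code) of how such a compile-time vs. run-time choice could look:

    #ifdef DYNAMIC_CORE_COUNT         // run-time setting (more flexible)
       extern int core_count;         // variable, settable while APL runs
    #  define CORE_COUNT core_count
    #else                             // compile-time setting (faster code)
    #  define CORE_COUNT 4            // constant, fixed at ./configure time
    #endif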

It is important that we do these things properly from the outset to avoid
too many changes later on.

/// Jürgen


On 03/11/2014 04:10 PM, Elias Mårtenson wrote:
May I suggest that being able to choose the number of cores at runtime should actually be the default. Remember that most Linux distributions will not compile the source on the local machine and will instead distribute binaries.

Having some #ifdefs would be good, but for this reason it is important that the default is a runtime-selected number of threads, chosen by the user or derived automatically from the number of cores.
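
As a rough sketch of the automatic default (the two calls are the standard
OpenMP runtime functions; the function and parameter names around them are
made up):

    #include <omp.h>

    // pick the number of worker threads: a user-supplied value if given,
    // otherwise one thread per available core (requested == 0 means "auto")
    static void set_thread_count(int requested)
    {
       const int threads = requested > 0 ? requested : omp_get_num_procs();
       omp_set_num_threads(threads);
    }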

Regards,
Elias


On 11 March 2014 23:07, Juergen Sauermann <juergen.sauerm...@t-online.de> wrote:

    Hi David,

    looks good! Some comments, though.

    1. You could adapt src/testcases/Performance.pt with some longer-running
    scalar functions in order to get some performance figures. You can
    start it like this:

    ./apl -T testcases/Performance.pt

    2. I believe we should not bother the user with specifying
    parallelization parameters in ⎕SYL. I would rather use ./configure CORES=n,
    with n=1 meaning no parallel execution, CORES=auto meaning the number of
    cores on the build machine, and explicit numbers n>1 meaning that n cores
    shall be used. This would generate slightly faster code than computing
    array bounds at runtime. It's a bit more hassle for the user, but may
    pay off soon.

    3. Yes, GNU APL throws many exceptions (almost every APL error is
    thrown from somewhere), and I was expecting that we would have to catch
    them on the throwing processor. Not too difficult if we do it at the
    top level.
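
    A sketch of what that could look like (using C++11 std::exception_ptr;
    do_one_item() is just a made-up placeholder for the per-element work):

        #include <exception>

        void do_one_item(int i);   // hypothetical worker, may throw an APL error

        void parallel_loop(int len)
        {
           std::exception_ptr thrown;        // first exception seen, if any

           #pragma omp parallel for
           for (int i = 0; i < len; ++i)
           {
              try { do_one_item(i); }
              catch (...)                    // must be caught on the throwing thread
              {
                 #pragma omp critical
                 if (!thrown)   thrown = std::current_exception();
              }
           }

           // back on the master thread: re-throw so that the normal error
           // handling at the top level sees the exception as usual
           if (thrown)   std::rethrow_exception(thrown);
        }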

    4. It would be good to understand how the OpenMP loops work. I
    could imagine one of two strategies:

    - in loop(j, MAX), thread j executes iterations j, j+CORES, ...
    - thread j executes iterations j*MAX/CORES ... (j+1)*MAX/CORES

    The first strategy interleaves the data and is more intuitive,
    while the second uses blocks of data and is more cache-friendly,
    and therefore probably gives better performance.
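
    If I understand the OpenMP scheduling clauses correctly, the two
    strategies correspond roughly to schedule(static, 1) (interleaved) and
    plain schedule(static) (contiguous blocks). A sketch, with f(), a and z
    standing in for whatever the scalar function actually does:

        double f(double x);                      // placeholder scalar function

        void interleaved(double * z, const double * a, int MAX)
        {
           // thread j executes iterations j, j+CORES, j+2*CORES, ...
           #pragma omp parallel for schedule(static, 1)
           for (int i = 0; i < MAX; ++i)   z[i] = f(a[i]);
        }

        void blocked(double * z, const double * a, int MAX)
        {
           // thread j executes one contiguous block of about MAX/CORES iterations
           #pragma omp parallel for schedule(static)
           for (int i = 0; i < MAX; ++i)   z[i] = f(a[i]);
        }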

    5. Not sure if your earlier comment on letting the scheduler decide is
    correct. I have done pthread programming in the past and have seen cases
    where the scheduler fooled itself, so that the same problem took more
    than double the capacity compared to explicit affinity on a 4-core CPU.
    I would expect that APL generates very fine-grained and short-lived
    pieces of execution, and the scheduler may not be optimized for that.
    I guess we have to try that out.
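
    For reference, the kind of explicit affinity I mean is the Linux-specific
    pthread call below; just a sketch of the mechanism, not a proposal for
    where to put it in GNU APL:

        #define _GNU_SOURCE
        #include <pthread.h>
        #include <sched.h>

        // pin the calling thread to one CPU; returns 0 on success
        int pin_to_cpu(int cpu)
        {
           cpu_set_t set;
           CPU_ZERO(&set);
           CPU_SET(cpu, &set);
           return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        }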

    /// Jürgen




    On 03/11/2014 08:02 AM, David B. Lamkins wrote:

        Juergen's suggestion prompted me to attempt an implementation using
        OpenMP rather than the by-hand coding that I had been anticipating.
        Attached is a quick-and-dirty patch to enable GNU APL to be built
        with OpenMP support.

        ./configure --with-openmp

        There are many rough edges, both in the Makefile and the code.

        --with-openmp would ideally check to see whether the compiler
        supports
        OpenMP. It may be necessary to check the compiler version, as
        different
        compilers support different versions of OpenMP. Also, I've assumed
        compilation on/for Linux despite the fact that GNU APL and
        OpenMP should
        be buildable with the right Windows compiler.
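
        (A configure check could, for instance, try to compile and run
        something like the following: a conforming compiler defines _OPENMP
        as the yyyymm date of the OpenMP specification it supports, e.g.
        201107 for OpenMP 3.1.)

            #include <stdio.h>

            #ifndef _OPENMP
            #error "compiler was not invoked with OpenMP support"
            #endif

            int main(void)   { printf("%d\n", _OPENMP);   return 0; }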

        As one might expect, OpenMP requires that any throw from a
        worker thread
        must be caught by the same thread. I'm almost certain that this
        restriction could be violated by GNU APL code as currently
        written.

        The good news, though, is that the changes are benign; in the
        absence of
        --with-openmp, GNU APL's behavior is unchanged.

        With OpenMP support, ⎕syl is extended to access some of OpenMP's
        parameters.
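
        (As an illustration only: the kind of standard OpenMP runtime queries
        such an extension could expose, not necessarily exactly the ones in
        the patch. Corresponding setters such as omp_set_num_threads() exist
        as well.)

            #include <omp.h>
            #include <stdio.h>

            void print_openmp_parameters(void)
            {
               printf("max threads: %d\n", omp_get_max_threads());
               printf("processors:  %d\n", omp_get_num_procs());
               printf("dynamic:     %d\n", omp_get_dynamic());
            }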

        I've done only trivial testing at this point; just enough to
        verify that
        compiling OpenMP support doesn't obviously break GNU APL.

        I haven't confirmed that the OpenMP #pragmas on the key loops in
        SkalarFunction.cc have any effect on execution time or
        processor core
        utilization. I hope to do more testing later this week.

        Best wishes,
           David




