Peter,

A patch is available at https://github.com/ggouaillardet/ompi-release/commit/0b62eabcae403b95274ce55973a7ce29483d0c98.patch

It is now under review.

Cheers,

Gilles

On 2/2/2016 11:22 PM, Gilles Gouaillardet wrote:
Thanks Peter,

This is just a workaround for a bug we have just identified; the fix will come soon.

Cheers,

Gilles

On Tuesday, February 2, 2016, Peter Wind <peter.w...@met.no> wrote:

    That worked!

    i.e. with the change you proposed, the code gives the right result.

    That was efficient work, thank you Gilles :)

    Best wishes,
    Peter


    ------------------------------------------------------------------------

        Thanks Peter,

        that is quite unexpected ...

        let's try another workaround. Can you replace

        integer            :: comm_group

        with

        integer            :: comm_group, comm_tmp

        and

        call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_group, ierr)

        with

        call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_tmp, ierr)

        if (irank < (num_procs/2)) then
             comm_group = comm_tmp
        else
             call MPI_Comm_dup(comm_tmp, comm_group, ierr)
        endif
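
        Put together, the suggested change would look like the following minimal
        sketch (assuming comm, irank and num_procs are set up as in your test
        program; the MPI_Win_allocate_shared call itself stays unchanged):

        program split_workaround
           use mpi
           implicit none
           integer :: comm, comm_group, comm_tmp
           integer :: irank, num_procs, ierr

           call MPI_Init(ierr)
           comm = MPI_COMM_WORLD
           call MPI_Comm_rank(comm, irank, ierr)
           call MPI_Comm_size(comm, num_procs, ierr)

           ! Split into two halves, but into a temporary communicator first.
           call MPI_Comm_split(comm, irank*2/num_procs, irank, comm_tmp, ierr)

           ! The lower half keeps the split communicator as-is; the upper half
           ! duplicates it, so the two groups hold distinct communicator objects.
           if (irank < (num_procs/2)) then
              comm_group = comm_tmp
           else
              call MPI_Comm_dup(comm_tmp, comm_group, ierr)
           endif

           ! ... MPI_Win_allocate_shared(..., comm_group, ...) as before ...

           call MPI_Finalize(ierr)
        end program split_workaround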

        If it works, I will make a fix tomorrow when I can access my
        workstation.
        If not, can you please run
        mpirun --mca osc_base_verbose 100 ...
        and post the output?

        I will then try to reproduce the issue and investigate it.

        Cheers,

        Gilles

        On Tuesday, February 2, 2016, Peter Wind <peter.w...@met.no>
        wrote:

            Thanks Gilles,

            I get the following output (I guess it is not what you
            wanted?).

            Peter


            $ mpirun --mca osc pt2pt -np 4 a.out
            --------------------------------------------------------------------------
            A requested component was not found, or was unable to be opened.  This
            means that this component is either not installed or is unable to be
            used on your system (e.g., sometimes this means that shared libraries
            that the component requires are unable to be found/loaded).  Note that
            Open MPI stopped checking at the first component that it did not find.

            Host:      stallo-2.local
            Framework: osc
            Component: pt2pt
            --------------------------------------------------------------------------
            --------------------------------------------------------------------------
            It looks like MPI_INIT failed for some reason; your parallel process is
            likely to abort.  There are many reasons that a parallel process can
            fail during MPI_INIT; some of which are due to configuration or environment
            problems.  This failure appears to be an internal failure; here's some
            additional information (which may only be relevant to an Open MPI
            developer):

              ompi_osc_base_open() failed
              --> Returned "Not found" (-13) instead of "Success" (0)
            --------------------------------------------------------------------------
            *** An error occurred in MPI_Init
            *** on a NULL communicator
            *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
            ***    and potentially your MPI job)
            [stallo-2.local:38415] Local abort before MPI_INIT completed successfully;
            not able to aggregate error messages, and not able to guarantee that all
            other processes were killed!
            *** An error occurred in MPI_Init
            *** on a NULL communicator
            *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
            ***    and potentially your MPI job)
            [stallo-2.local:38418] Local abort before MPI_INIT completed successfully;
            not able to aggregate error messages, and not able to guarantee that all
            other processes were killed!
            *** An error occurred in MPI_Init
            *** on a NULL communicator
            *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
            ***    and potentially your MPI job)
            [stallo-2.local:38416] Local abort before MPI_INIT completed successfully;
            not able to aggregate error messages, and not able to guarantee that all
            other processes were killed!
            *** An error occurred in MPI_Init
            *** on a NULL communicator
            *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
            ***    and potentially your MPI job)
            [stallo-2.local:38417] Local abort before MPI_INIT completed successfully;
            not able to aggregate error messages, and not able to guarantee that all
            other processes were killed!
            -------------------------------------------------------
            Primary job  terminated normally, but 1 process returned
            a non-zero exit code.. Per user-direction, the job has been aborted.
            -------------------------------------------------------
            --------------------------------------------------------------------------
            mpirun detected that one or more processes exited with non-zero status,
            thus causing the job to be terminated. The first process to do so was:

              Process name: [[52507,1],0]
              Exit code:    1
            --------------------------------------------------------------------------
            [stallo-2.local:38410] 3 more processes have sent help message
            help-mca-base.txt / find-available:not-valid
            [stallo-2.local:38410] Set MCA parameter "orte_base_help_aggregate" to 0
            to see all help / error messages
            [stallo-2.local:38410] 2 more processes have sent help message
            help-mpi-runtime / mpi_init:startup:internal-failure

            ------------------------------------------------------------------------

                Peter,

                At first glance, your test program looks correct.

                Can you please try to run
                mpirun --mca osc pt2pt -np 4 ...

                I might have identified a bug with the sm osc component.

                Cheers,

                Gilles

                On Tuesday, February 2, 2016, Peter Wind
                <peter.w...@met.no> wrote:

                    Enclosed is a short (< 100 lines) Fortran code
                    example that uses shared memory.
                    It seems to me that it behaves wrongly when Open MPI
                    is used; compiled with SGI/MPT, it gives the right result.

                    To reproduce the failure, the code must be run on a
                    single node. It creates two groups of 2 processes
                    each, and memory is shared within each group.
                    The error is that the two groups get the same
                    memory allocated, but they should not.

                    Tested with Open MPI 1.8.4, 1.8.5 and 1.10.2,
                    with gfortran, Intel 13.0 and Intel 14.0;
                    all fail.

                    The call

                       call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, &
                                                    comm_group, cp1, win, ierr)

                    should allocate memory only within the group. But
                    when the other group allocates memory, the
                    pointers from the two groups point to the same
                    address in memory.
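
                    In condensed form, the relevant pattern is the following
                    (an illustrative sketch, not the attached program itself;
                    win_size, disp_unit, cp1 and comm_group mirror the names
                    above, everything else, including the 4-byte integer size,
                    is assumed):

                    program shared_groups
                       use mpi
                       use, intrinsic :: iso_c_binding, only: c_ptr, c_f_pointer
                       implicit none
                       integer :: comm_group, win, irank, num_procs, ierr, disp_unit
                       integer(kind=MPI_ADDRESS_KIND) :: win_size
                       type(c_ptr) :: cp1
                       integer, pointer :: a(:)

                       call MPI_Init(ierr)
                       call MPI_Comm_rank(MPI_COMM_WORLD, irank, ierr)
                       call MPI_Comm_size(MPI_COMM_WORLD, num_procs, ierr)

                       ! Split the ranks into two groups of equal size; memory is
                       ! shared only within a group.
                       call MPI_Comm_split(MPI_COMM_WORLD, irank*2/num_procs, irank, &
                                           comm_group, ierr)

                       ! Each group allocates its own shared window (10 integers per
                       ! rank, assuming 4-byte integers); the two groups should be
                       ! given distinct memory.
                       disp_unit = 4
                       win_size  = 10*disp_unit
                       call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, &
                                                    comm_group, cp1, win, ierr)
                       call c_f_pointer(cp1, a, [10])

                       ! Each rank writes into its own segment of its group's window.
                       a(1) = irank

                       call MPI_Win_free(win, ierr)
                       call MPI_Finalize(ierr)
                    end program shared_groups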

                    Could you please confirm that this is the wrong
                    behaviour?

                    Best regards,
                    Peter Wind


                _______________________________________________
                users mailing list
                us...@open-mpi.org
                Subscription:
                http://www.open-mpi.org/mailman/listinfo.cgi/users
                Link to this post:
                http://www.open-mpi.org/community/lists/users/2016/02/28429.php



        _______________________________________________
        users mailing list
        us...@open-mpi.org
        Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
        Link to this post:
        http://www.open-mpi.org/community/lists/users/2016/02/28431.php




_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/02/28436.php
