Hi, how exactly do you run this to get this error? I tried and it worked for me.

burl-ct-x2200-16 50 =>mpirun -mca btl_openib_warn_default_gid_prefix 0 -mca btl self,sm,openib -np 2 -host burl-ct-x2200-16,burl-ct-x2200-17 -mca btl_openib_ib_timeout 16 a.out
I am 0 at 1252670691
I am 1 at 1252670559
I am 0 at 1252670692
I am 1 at 1252670559
burl-ct-x2200-16 51 =>

Rolf

On 09/11/09 07:18, Ake Sandgren wrote:
Hi!

The following code shows a bad behaviour when running over openib.

Open MPI: 1.3.3
With openib it dies with "error polling HP CQ with status WORK REQUEST
FLUSHED ERROR status number 5"; with tcp or shmem it works as expected.


#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>   /* sleep() */
#include "mpi.h"

int main(int argc, char *argv[])
{
    int          rank;
    int          n;

    MPI_Init( &argc, &argv );

    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    fprintf(stderr, "I am %d at %ld\n", rank, (long)time(NULL));
    fflush(stderr);

    n = 4;
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* MPI_INT is the C int type; MPI_INTEGER is the Fortran one */
    fprintf(stderr, "I am %d at %ld\n", rank, (long)time(NULL));
    fflush(stderr);
    if (rank == 0) {
        /* The Bcast root makes no MPI calls for 60 s while the other rank waits in MPI_Barrier. */
        sleep(60);
    }
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize( );
    exit(0);
}

I know the internal Open MPI reason why it behaves this way.
But I think a program should be allowed to behave as this one does.

This example is a bit engineered but there are codes where a similar
situation can occur, i.e. the Bcast sender doing lots of other work
after the Bcast before the next MPI call. VASP is a candidate for this.



--

=========================
rolf.vandeva...@sun.com
781-442-3043
=========================
