vasilis wrote:
Thank you, Eugene, for your suggestion. I used different tags for each variable,
and now I no longer get this error.
The problem now is that I get a different solution when I use more than
2 CPUs. I checked the matrices and found that they differ by a very small
amount, on the order of 10^(-10). In fact, I even get different solutions
with 4 CPUs and with 16 CPUs!
Do you have any idea what could cause this behavior?
Sure.
Rank 0 accumulates all the res_cpu values into a single array, res. It
starts with its own res_cpu and then adds in the contributions from the
other processes. When np=2, there is only one other process, so the order
of the additions is prescribed. When np>2, the order is no longer
prescribed, and floating-point rounding variations can start to occur.
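Just to illustrate the rounding point: floating-point addition is not
associative, so simply regrouping the same three numbers can change the
result. A tiny standalone demonstration (nothing here is from your code):

program addition_order
  implicit none
  double precision :: a, b, c
  a =  1.0d0
  b =  1.0d-16
  c = -1.0d0
  ! The two groupings round differently, so the printed sums differ
  print *, (a + b) + c     ! the 1.0d-16 is lost when added to 1.0d0 first
  print *, a + (b + c)     ! here it survives, giving a tiny nonzero result
end program addition_order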
If you want results to be more deterministic, you need to fix the order
in which res is aggregated. E.g., instead of using MPI_ANY_SOURCE, loop
over the peer processes in a specific order.
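A minimal sketch of what that could look like on rank 0, assuming rank,
nprocs, total_unknown, and a scratch buffer res_tmp (same size as res_cpu)
are available; the tag name tag_res is made up:

integer :: source, ierr, status(MPI_STATUS_SIZE)

if (rank == 0) then
   res = res_cpu                         ! rank 0's own contribution first
   do source = 1, nprocs - 1             ! then ranks 1, 2, ... in a fixed order
      call MPI_Recv(res_tmp, total_unknown, MPI_DOUBLE_PRECISION, &
                    source, tag_res, MPI_COMM_WORLD, status, ierr)
      res = res + res_tmp
   end do
else
   call MPI_Send(res_cpu, total_unknown, MPI_DOUBLE_PRECISION, &
                 0, tag_res, MPI_COMM_WORLD, ierr)
end if

Because every run receives and adds in the same rank order, the rounding
is reproducible regardless of which message happens to arrive first.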
P.S. It seems to me that you could use MPI collective operations to
implement what you're doing. E.g., something like:
call MPI_Reduce(res_cpu, res, total_unknown, MPI_DOUBLE_PRECISION, &
                MPI_SUM, 0, MPI_COMM_WORLD, ierr)
call MPI_Gather(jacob_cpu, total_elem_cpu * unique, MPI_DOUBLE_PRECISION, &
                jacob,     total_elem_cpu * unique, MPI_DOUBLE_PRECISION, &
                0, MPI_COMM_WORLD, ierr)
call MPI_Gather(row_cpu,   total_elem_cpu * unique, MPI_INTEGER, &
                row,       total_elem_cpu * unique, MPI_INTEGER, &
                0, MPI_COMM_WORLD, ierr)
call MPI_Gather(col_cpu,   total_elem_cpu * unique, MPI_INTEGER, &
                col,       total_elem_cpu * unique, MPI_INTEGER, &
                0, MPI_COMM_WORLD, ierr)
I think the res part is right. The jacob/row/col parts are not quite
right, since you don't just want to gather the elements, you want to add
them into particular arrays. I'm not sure whether you'd really want to
allocate a new scratch array for that purpose or not (a rough sketch of
that route follows below). Nor would any of this solve the res_cpu
nondeterminism you had. I just wanted to make sure you knew about the MPI
collective operations as an alternative to your point-to-point
implementation.
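If you did go the scratch-array route, it might look roughly like the
following. The names jacob_all/row_all/col_all, the variable nprocs, and
the dense jacob(row,col) indexing at the end are all assumptions on my
part about your storage, not taken from your code:

double precision, allocatable :: jacob_all(:)
integer, allocatable :: row_all(:), col_all(:)
integer :: p, i, off, n, ierr

n = total_elem_cpu * unique
allocate(jacob_all(n * nprocs), row_all(n * nprocs), col_all(n * nprocs))

call MPI_Gather(jacob_cpu, n, MPI_DOUBLE_PRECISION, &
                jacob_all, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
call MPI_Gather(row_cpu, n, MPI_INTEGER, &
                row_all, n, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
call MPI_Gather(col_cpu, n, MPI_INTEGER, &
                col_all, n, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

if (rank == 0) then
   do p = 0, nprocs - 1          ! fixed rank order keeps the sums reproducible
      off = p * n
      do i = 1, n
         ! Replace this line with whatever "add into the particular array"
         ! means for your Jacobian storage; shown here for a dense jacob(:,:).
         jacob(row_all(off+i), col_all(off+i)) = &
            jacob(row_all(off+i), col_all(off+i)) + jacob_all(off+i)
      end do
   end do
end if

The gathered blocks arrive in rank order by definition of MPI_Gather, and
the accumulation loop then runs over them in that same fixed order.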