Re: [OMPI users] MPI_Bcast/MPI_Finalize hang with Open MPI 1.1

2006-06-30 Thread Doug Gregor
On Jun 29, 2006, at 11:16 PM, Graham E Fagg wrote: On Thu, 29 Jun 2006, Doug Gregor wrote: When I use algorithm 6, I get: [odin003.cs.indiana.edu:14174] *** An error occurred in MPI_Bcast [odin005.cs.indiana.edu:10510] *** An error occurred in MPI_Bcast Broadcasting integers from root 0...[od

Re: [OMPI users] MPI_Bcast/MPI_Finalize hang with Open MPI 1.1

2006-06-29 Thread Graham E Fagg
On Thu, 29 Jun 2006, Doug Gregor wrote: When I use algorithm 6, I get: [odin003.cs.indiana.edu:14174] *** An error occurred in MPI_Bcast [odin005.cs.indiana.edu:10510] *** An error occurred in MPI_Bcast Broadcasting integers from root 0...[odin004.cs.indiana.edu:11752] *** An error occurred in

Re: [OMPI users] MPI_Bcast/MPI_Finalize hang with Open MPI 1.1

2006-06-29 Thread Graham E Fagg
On Thu, 29 Jun 2006, Doug Gregor wrote: Are there other settings I can tweak to try to find the algorithm that it's deciding to use at run-time? Yes, just: -mca coll_base_verbose 1 will show what's being decided at run time. i.e. [reliant:25351] ompi_coll_tuned_bcast_intra_dec_fixed [reliant:25
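
As a hedged illustration only (the binary name ./bcast_test and the process count are placeholders, not taken from the thread), the verbosity flag would be passed on the mpirun command line like this:

    mpirun -np 8 -mca coll_base_verbose 1 ./bcast_test

With that flag set, the tuned collective component prints decision lines such as the ompi_coll_tuned_bcast_intra_dec_fixed output quoted above.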

Re: [OMPI users] MPI_Bcast/MPI_Finalize hang with Open MPI 1.1

2006-06-29 Thread Doug Gregor
On Jun 29, 2006, at 5:23 PM, Graham E Fagg wrote: Hi Doug, wow, looks like some messages are getting lost (or even delivered to the wrong peer on the same node.. ) Could you also try with: -mca coll_base_verbose 1 -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_bcast_algorithm <1,2,3,

Re: [OMPI users] MPI_Bcast/MPI_Finalize hang with Open MPI 1.1

2006-06-29 Thread Graham E Fagg
Hi Doug, wow, looks like some messages are getting lost (or even delivered to the wrong peer on the same node.. ) Could you also try with: -mca coll_base_verbose 1 -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_bcast_algorithm <1,2,3,4,5,6> The values 1-6 control which topology/algorithm
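
A hedged sketch of the suggested invocation (again, ./bcast_test and the process count are placeholder assumptions, not from the thread); forcing, for example, algorithm 3 for broadcast would look like:

    mpirun -np 8 -mca coll_base_verbose 1 -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_bcast_algorithm 3 ./bcast_test

Note that coll_tuned_use_dynamic_rules must be set to 1 for the forced coll_tuned_bcast_algorithm value to take effect.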

[OMPI users] MPI_Bcast/MPI_Finalize hang with Open MPI 1.1

2006-06-29 Thread Doug Gregor
I am running into a problem with a simple program (which performs several MPI_Bcast operations) hanging. Most processes hang in MPI_Finalize, the others hang in MPI_Bcast. Interestingly enough, this only happens when I oversubscribe the nodes. For instance, using IU's Odin cluster, I take 4
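
A minimal C sketch of the kind of reproducer described above (the loop count, buffer size, and printed message are assumptions, not the original test code):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, i, data[1024];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Several broadcasts from root 0, as in the report. */
        for (i = 0; i < 10; ++i) {
            if (rank == 0) {
                int j;
                for (j = 0; j < 1024; ++j)
                    data[j] = i;
            }
            MPI_Bcast(data, 1024, MPI_INT, 0, MPI_COMM_WORLD);
        }

        if (rank == 0)
            printf("Broadcasting integers from root 0... done\n");

        /* The reported hang appears in MPI_Finalize on most ranks
           (and inside MPI_Bcast on the rest), but only when the nodes
           are oversubscribed, i.e. more processes are launched than
           there are slots available. */
        MPI_Finalize();
        return 0;
    }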