On 01/04/2010 01:23 AM, Eugene Loh wrote:
1) What about "-mca coll_sync_barrier_before 100"? (The default may be 1000, so you can try various values less than 1000. I'm suggesting 100.) Note that broadcast has a somewhat one-way traffic flow, which can cause some undesirable flow-control issues.

Louis Rossi wrote:
Hi Eugene,

Great. Next time, go ahead and respond to the wider mail alias so that everyone learns that your particular problem was resolved. I will update the trac ticket to point to this as another instance of this problem. One signature of the problem is that GCC 4.4.0 or later exposes it, while earlier revisions do not. I can't tell for sure, but it appears to me that this condition is met with Fedora 11.

Our understanding of trac ticket 2043 has recently improved immensely. It would be great if you could confirm the fix. The ticket is at https://svn.open-mpi.org/trac/ompi/ticket/2043 . r22324 should fix the problem. If you get that version and build it with GCC (presumably 4.4.0 or more recent), the workaround should no longer be needed.
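To see why a value below the default should help with the one-way Bcast traffic, here is a rough, hypothetical model. It assumes (our reading, not stated in this thread) that "coll_sync_barrier_before N" makes Open MPI insert a barrier before every Nth collective on a communicator, so a smaller N throttles the broadcasting rank more often. The names sync_model, sync_model_before_collective and barriers_injected are invented for illustration and are not Open MPI's source or API.

```c
#include <stdbool.h>

/* Hypothetical model of a "barrier before every Nth collective" counter.
   Illustrative only; this is NOT Open MPI's implementation. */
typedef struct {
    int barrier_every_n;   /* value given to coll_sync_barrier_before (assumed semantics) */
    int ops_since_barrier; /* collectives seen since the last injected barrier */
} sync_model;

/* Call once per collective (e.g. once per MPI_Bcast).
   Returns true when a barrier would be injected before this collective. */
static bool sync_model_before_collective(sync_model *s)
{
    if (s->barrier_every_n <= 0) {
        return false; /* assumption: non-positive values mean "disabled" */
    }
    if (++s->ops_since_barrier == s->barrier_every_n) {
        s->ops_since_barrier = 0; /* restart the count after each barrier */
        return true;
    }
    return false;
}

/* Count how many barriers the model injects over n_collectives operations. */
static int barriers_injected(int barrier_every_n, int n_collectives)
{
    sync_model s = { barrier_every_n, 0 };
    int barriers = 0;
    for (int i = 0; i < n_collectives; ++i) {
        if (sync_model_before_collective(&s)) {
            ++barriers;
        }
    }
    return barriers;
}
```

Under this model, a loop of 1000 broadcasts gets 10 barriers with a value of 100 but only 1 with a value of 1000, which is the kind of extra synchronization the suggestion is aiming for.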