Re: [O-MPI users] 1.0rc5 is up

2005-11-14 Thread Timothy S. Woodall
Troy, I've been able to reproduce this. Should have this corrected shortly. Thanks, Tim On Mon, 14 Nov 2005 10:38:03 -0700, Troy Telford wrote: My mvapi config is using the Mellanox IB Gold 1.8 IB software release. Kernel 2.6.5-7.201 (SLES 9 SP2) When I ran IMB using mvapi, I

Re: [O-MPI users] 1.0rc5 is up

2005-11-14 Thread Troy Telford
On Mon, 14 Nov 2005 17:28:15 -0700, Troy Telford wrote: I've just finished a build of RC7, so I'll go give that a whirl and report. RC7: With *both* mvapi and openib, I receive the following when using IMB-MPI1: ***mvapi*** [0,1,3][btl_mvapi_component.c:637:mca_btl_mvapi_component_progr

Re: [O-MPI users] 1.0rc5 is up

2005-11-14 Thread Troy Telford
On Mon, 14 Nov 2005 10:38:03 -0700, Troy Telford wrote: My mvapi config is using the Mellanox IB Gold 1.8 IB software release. Kernel 2.6.5-7.201 (SLES 9 SP2) When I ran IMB using mvapi, I received the following error: *** [0,1,2][btl_mvapi_component.c:637:mca_btl_mvapi_component_progress] e

Re: [O-MPI users] 1.0rc5 is up

2005-11-14 Thread Troy Telford
Thus far, it appears that moving to MX 1.1.0 didn't change the error message I've been getting about parts being 'not implemented.' I also re-provisioned four of the IB nodes (leaving me with 3 four-node clusters: one using mvapi, one using openib, and one using myrinet). My mvapi config is

Re: [O-MPI users] 1.0rc5 is up

2005-11-14 Thread Troy Telford
On Sun, 13 Nov 2005 17:53:40 -0700, Jeff Squyres wrote: I can't believe I missed that, sorry. :-( None of the btl's are capable of doing loopback communication except "self." Hence, you really can't run "--mca btl foo" if your app ever sends to itself -- you really need to run "--mca btl f

Re: [O-MPI users] 1.0rc5 is up

2005-11-14 Thread Jeff Squyres
1.0rc6 is now available; we made some minor fixes in gm, the datatype engine, and the shared memory btl. I'm not sure if your MX problem will be fixed, but please give it a whirl. Let us know exactly which version of MX you are using, too. http://www.open-mpi.org/software/v1.0/ Than

Re: [O-MPI users] 1.0rc5 is up

2005-11-13 Thread Jeff Squyres
I can't believe I missed that, sorry. :-( None of the btl's are capable of doing loopback communication except "self." Hence, you really can't run "--mca btl foo" if your app ever sends to itself -- you really need to run "--mca btl foo,self" at a minimum. This is not so much an optimizati
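A minimal sketch of the point above, assuming you assemble the mpirun command line from a script: it makes sure "self" is always in the btl list before launching. The helper names ensure_self_btl and mpirun_args are hypothetical, not part of Open MPI; only the "--mca btl <list>" form comes from the messages in this thread, and "-np" is the usual mpirun process-count option.

```python
def ensure_self_btl(btl_spec):
    """Return a comma-separated btl list that includes "self".

    Hypothetical helper. It only handles plain inclusion lists such as
    "openib" or "mvapi,sm"; it does not interpret any other list syntax.
    """
    names = [name.strip() for name in btl_spec.split(",") if name.strip()]
    if "self" not in names:
        # "self" is the only btl that can do loopback (send-to-self) traffic.
        names.append("self")
    return ",".join(names)


def mpirun_args(btl_spec, nprocs, program):
    """Build an mpirun argument list with "self" added to the btl list."""
    return [
        "mpirun",
        "--mca", "btl", ensure_self_btl(btl_spec),
        "-np", str(nprocs),
        program,
    ]


if __name__ == "__main__":
    # The process count and benchmark path are placeholders for illustration.
    print(" ".join(mpirun_args("openib", 4, "./IMB-MPI1")))
```

Passing the resulting list to subprocess.run would launch the job; a list that already names "self" is returned unchanged.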

Re: [O-MPI users] 1.0rc5 is up

2005-11-13 Thread Brian Barrett
One other thing I noticed... You specify -mca btl openib. Try specifying -mca btl openib,self. The self component is used for "send to self" operations. This could be the cause of your failures. Brian On Nov 13, 2005, at 3:02 PM, Jeff Squyres wrote: Troy -- Were you perchance using mu

Re: [O-MPI users] 1.0rc5 is up

2005-11-13 Thread Jeff Squyres
Troy -- Were you perchance using multiple processes per node? If so, we literally just fixed some sm btl bugs that could have been affecting you (they could have caused hangs). They're fixed in the nightly snapshots from today (both trunk and v1.0): r8140. If you were using the sm btl and

Re: [O-MPI users] 1.0rc5 is up

2005-11-12 Thread Troy Telford
We have very limited openib resources for testing at the moment. Can you provide details on how to reproduce? My bad; I must've been in a bigger hurry to go home for the weekend than I thought. I'm going to start with the assumption you're interested in the steps to reproduce it in OpenMPI

Re: [O-MPI users] 1.0rc5 is up

2005-11-11 Thread Timothy S. Woodall
Hello Troy, We have very limited openib resources for testing at the moment. Can you provide details on how to reproduce? Thanks, Tim On Fri, 11 Nov 2005 13:12:13 -0700, Jeff Squyres wrote: At long last, 1.0rc5 is available for download. It fixes all known issues reported here on t

Re: [O-MPI users] 1.0rc5 is up

2005-11-11 Thread Galen M. Shipman
The bad: OpenIB frequently crashes with the error: *** [0,1,2][btl_openib_endpoint.c:135:mca_btl_openib_endpoint_post_send] error posting send request errno says Operation now in progress[0,1,2d [0,1,3][btl_openib_endpoint.c:135:mca_btl_openib_endpoint_post_send] error posting s

Re: [O-MPI users] 1.0rc5 is up

2005-11-11 Thread Troy Telford
On Fri, 11 Nov 2005 13:12:13 -0700, Jeff Squyres wrote: At long last, 1.0rc5 is available for download. It fixes all known issues reported here on the mailing list. We still have a few minor issues to work out, but things appear to generally be working now. Please try to break it:

[O-MPI users] 1.0rc5 is up

2005-11-11 Thread Jeff Squyres
At long last, 1.0rc5 is available for download. It fixes all known issues reported here on the mailing list. We still have a few minor issues to work out, but things appear to generally be working now. Please try to break it: http://www.open-mpi.org/software/v1.0/ -- {+} Jeff Squyr