Re: [OMPI users] Error Polling HP CQ Status on PPC64 Linux with IB

2006-06-29 Thread Jeff Squyres (jsquyres)
Owen -- Sorry, we all fell [way] behind on e-mail because many of us were at an OMPI developer's meeting last week. :-( In the interim, we have finally released Open MPI v1.1. Could you give this version a whirl and see if it fixes your problems? > -Original Message- > From: users-bo

Re: [OMPI users] Error Polling HP CQ Status on PPC64 Linux with IB

2006-06-29 Thread Galen M. Shipman
I'm currently working with Owen on this issue.. will continue my findings on list.. - Galen On Jun 29, 2006, at 7:56 AM, Jeff Squyres (jsquyres) wrote: Owen -- Sorry, we all fell [way] behind on e-mail because many of us were at an OMPI developer's meeting last week. :-( In the interi

Re: [OMPI users] Why does it suddenly not run?

2006-06-29 Thread Jeff Squyres (jsquyres)
Jens -- I'm trolling through old e-mails on this list and it doesn't look like you ever got an answer to this message. Did you ever figure out the problem? > -Original Message- > From: users-boun...@open-mpi.org > [mailto:users-boun...@open-mpi.org] On Behalf Of Jens Klostermann > Sent

Re: [OMPI users] mpirun and ncurses

2006-06-29 Thread Jeff Squyres (jsquyres)
It doesn't look like you ever got an answer to this question -- sorry! We sometimes get very bad at mail management. :-( I'm guessing that this is always going to be a problematic scenario for Open MPI. We have to do forwarding of stdin/out/err between the MPI process and mpirun. I'm guessing

Re: [OMPI users] mca_btl_tcp_frag_send: writev failed with errno=110

2006-06-29 Thread Jeff Squyres (jsquyres)
Sorry for the delay in replying -- sometimes we just get overwhelmed with all the incoming mail. :-( > -Original Message- > From: users-boun...@open-mpi.org > [mailto:users-boun...@open-mpi.org] On Behalf Of Tony Ladd > Sent: Saturday, June 17, 2006 9:47 AM > To: us...@open-mpi.org > Sub

Re: [OMPI users] auto detect hosts

2006-06-29 Thread Jeff Squyres (jsquyres)
Sorry for the delay in replying. Too much travel, and too much e-mail! :-) > -Original Message- > From: users-boun...@open-mpi.org > [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Kluskens > Sent: Monday, June 19, 2006 4:56 PM > To: Open MPI Users > Subject: Re: [OMPI users] a

Re: [OMPI users] users Digest, Vol 318, Issue 1

2006-06-29 Thread openmpi-user
@Terry I hope this is of some help (debugged with TotalView): Enclosed you will find a graph from TotalView as well as this: Created process 2 (7633), named "mpirun" Thread 2.1 has appeared Thread 2.2 has appeared Thread 2.1 received a signal (Segmentation Violation) and the stack trace:

Re: [OMPI users] OpenMPI 1.1 backward compatible?

2006-06-29 Thread Jeff Squyres (jsquyres)
I think you may have caught us in an unintentional breakage. If your Open MPI was compiled as shared libraries and dynamic shared objects (the default), this error should not have happened since we did not change mpi.h. So there must be a second-order effect going on here (somehow the size of

Re: [OMPI users] OpenMPI 1.1 backward compatible?

2006-06-29 Thread Jeff Squyres (jsquyres)
I should have tried this before I replied. I had a further thought (after I replied, of course) -- I was wondering if one of our components had a reference to ompi_comm_world (and not your application) and that caused the problem. If you installed 1.1 over 1.0.2 and didn't uninstall first, an

Re: [OMPI users] keyval parser error after v1.1 upgrade

2006-06-29 Thread Jeff Squyres (jsquyres)
Patrick -- I'm a little confused about your response. Are you replying to the "keyval parser" thread (i.e., saying that you had the same problem as Benjamin Landsteiner), or are you replying to the "mca_oob_tcp_accept" thread? > -Original Message- > From: users-boun...@open-mpi.org > [

[OMPI users] MPI_Bcast/MPI_Finalize hang with Open MPI 1.1

2006-06-29 Thread Doug Gregor
I am running into a problem with a simple program (which performs several MPI_Bcast operations) hanging. Most processes hang in MPI_Finalize, the others hang in MPI_Bcast. Interestingly enough, this only happens when I oversubscribe the nodes. For instance, using IU's Odin cluster, I take 4
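For reference, here is a minimal C sketch of the kind of broadcast test described above; the original program is not shown in the preview, so the loop count, root rank, and message size are illustrative assumptions only:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, value = 0, i;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* Repeatedly broadcast one integer from root 0.  The report
       * above says that, when the nodes are oversubscribed, some
       * ranks hang inside MPI_Bcast and the rest inside MPI_Finalize. */
      for (i = 0; i < 100; ++i) {
          if (rank == 0) {
              value = i;
          }
          MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
  }

To match the scenario described, the job would be run oversubscribed, i.e. with more ranks requested from mpirun than the nodes have slots.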

Re: [OMPI users] keyval parser error after v1.1 upgrade

2006-06-29 Thread Patrick Jessee
Jeff, Sorry for the confusion. It's for the "mca_oob_tcp_accept" thread. I mistakenly replied to the wrong message ("keyval parser"). As for the "mca_oob_tcp_accept" thread, I have since found some more information on the problem (1.1 no longer works if stdin is closed; 1.0

[OMPI users] OpenMpi 1.1 and Torque 2.1.1

2006-06-29 Thread Justin Bronder
I'm having trouble getting OpenMPI to execute jobs when submitting through Torque. Everything works fine if I am to use the included mpirun scripts, but this is obviously not a good solution for the general users on the cluster. I'm running under OS X 10.4, Darwin 8.6.0. I configured OpenMpi wit

Re: [OMPI users] MPI_Bcast/MPI_Finalize hang with Open MPI 1.1

2006-06-29 Thread Graham E Fagg
Hi Doug wow, looks like some messages are getting lost (or even delivered to the wrong peer on the same node.. ) Could you also try with: -mca coll_base_verbose 1 -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_bcast_algorithm <1,2,3,4,5,6> The values 1-6 control which topology/algorithm
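For anyone reproducing this: the flags above are MCA parameters passed on the mpirun command line, but Open MPI also reads them from OMPI_MCA_-prefixed environment variables, so a test program can select them itself before MPI_Init. A minimal sketch under that assumption (the algorithm number 3 is just an illustrative choice, not a recommendation from this thread):

  #include <mpi.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      /* Roughly equivalent to:
       *   mpirun -mca coll_base_verbose 1
       *          -mca coll_tuned_use_dynamic_rules 1
       *          -mca coll_tuned_bcast_algorithm 3 ...
       * Setting the variables before MPI_Init lets Open MPI see
       * them when the collective framework initializes. */
      setenv("OMPI_MCA_coll_base_verbose", "1", 1);
      setenv("OMPI_MCA_coll_tuned_use_dynamic_rules", "1", 1);
      setenv("OMPI_MCA_coll_tuned_bcast_algorithm", "3", 1);

      MPI_Init(&argc, &argv);
      /* ... broadcast test goes here ... */
      MPI_Finalize();
      return 0;
  }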

Re: [OMPI users] MPI_Bcast/MPI_Finalize hang with Open MPI 1.1

2006-06-29 Thread Doug Gregor
On Jun 29, 2006, at 5:23 PM, Graham E Fagg wrote: Hi Doug wow, looks like some messages are getting lost (or even delivered to the wrong peer on the same node.. ) Could you also try with: -mca coll_base_verbose 1 -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_bcast_algorithm <1,2,3,

Re: [OMPI users] MPI_Bcast/MPI_Finalize hang with Open MPI 1.1

2006-06-29 Thread Graham E Fagg
On Thu, 29 Jun 2006, Doug Gregor wrote: Are there other settings I can tweak to try to find the algorithm that it's deciding to use at run-time? Yes, just: -mca coll_base_verbose 1 will show what's being decided at run time. i.e. [reliant:25351] ompi_coll_tuned_bcast_intra_dec_fixed [reliant:25

[OMPI users] Testing one-sided message passing with 1.1

2006-06-29 Thread Tom Rosmond
I am testing the one-sided message passing (mpi_put, mpi_get) that is now supported in the 1.1 release. It seems to work OK for some simple test codes, but when I run my big application, it fails. This application is a large weather model that runs operationally on the SGI Origin 3000, using
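For context, the calls in question are the MPI-2 one-sided operations; below is a minimal fence-synchronized MPI_Put sketch in C. The failing application is a large weather model (presumably Fortran, given the mpi_put/mpi_get spelling), so nothing here is taken from it; the buffer size and fence-based synchronization are illustrative assumptions:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, nprocs;
      double buf[4] = {0.0, 0.0, 0.0, 0.0};
      MPI_Win win;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      /* Every rank exposes a small window that others may write into. */
      MPI_Win_create(buf, 4 * sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &win);

      MPI_Win_fence(0, win);
      if (rank == 0 && nprocs > 1) {
          double value = 42.0;
          /* Put one double into the first slot of rank 1's window. */
          MPI_Put(&value, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
      }
      MPI_Win_fence(0, win);   /* completes the put on both sides */

      if (rank == 1) {
          printf("rank 1 received %f\n", buf[0]);
      }

      MPI_Win_free(&win);
      MPI_Finalize();
      return 0;
  }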

Re: [OMPI users] MPI_Bcast/MPI_Finalize hang with Open MPI 1.1

2006-06-29 Thread Graham E Fagg
On Thu, 29 Jun 2006, Doug Gregor wrote: When I use algorithm 6, I get: [odin003.cs.indiana.edu:14174] *** An error occurred in MPI_Bcast [odin005.cs.indiana.edu:10510] *** An error occurred in MPI_Bcast Broadcasting integers from root 0...[odin004.cs.indiana.edu:11752] *** An error occurred in