Re: [OMPI users] EXTERNAL: Re: unacceptable latency in gathering process

2012-10-03 Thread Ralph Castain
Hmmm...you probably can't without digging down into the diagnostics. Perhaps we could help more if we had some idea how you are measuring this "latency". I ask because that is orders of magnitude worse than anything we measure - so I suspect the problem is in your app (i.e., that the time you ar

Re: [OMPI users] EXTERNAL: Re: unacceptable latency in gathering process

2012-10-03 Thread Hodge, Gary C
How do I tell the difference between when the message was received and when the message was picked up in MPI_Test? From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Wednesday, October 03, 2012 1:00 PM To: Open MPI Users Subject: EXTERNAL: Re: [

Re: [OMPI users] Need solution- nodes can't find the paths.

2012-10-03 Thread Jeff Squyres
This list is intended for Open MPI support, not general Linux cluster support. You might be able to get more detailed help from other forums and/or your local cluster support admin / vendor. Thanks! On Oct 3, 2012, at 6:58 AM, Syed Ahsan Ali wrote: > Thanks John for the detailed procedure. t

Re: [OMPI users] unacceptable latency in gathering process

2012-10-03 Thread Ralph Castain
Out of curiosity, have you logged the time when the SP called "send" and compared it to the time when the message was received, and when that message is picked up in MPI_Test? In other words, have you actually verified that the delay is in the MPI library as opposed to in your application? On

[OMPI users] unacceptable latency in gathering process

2012-10-03 Thread Hodge, Gary C
Hi all, I am running on an IBM BladeCenter, using Open MPI 1.4.1 and the opensm subnet manager for InfiniBand. Our application has real-time requirements, and it has recently been proven that it does not scale to meet future requirements. Presently, I am re-organizing the application to process work

Re: [OMPI users] one more problem with process bindings on openmpi-1.6.2

2012-10-03 Thread Ralph Castain
On Oct 3, 2012, at 8:40 AM, Siegmar Gross wrote: > Hi, > >> As I said, in the absence of a hostfile, -host assigns ONE slot for >> each time a host is named. So the equivalent hostfile would have >> "slots=1" to create the same pattern as your -host cmd line. > > That would mean that a hostfi

Re: [OMPI users] one more problem with process bindings on openmpi-1.6.2

2012-10-03 Thread Siegmar Gross
Hi, > As I said, in the absence of a hostfile, -host assigns ONE slot for > each time a host is named. So the equivalent hostfile would have > "slots=1" to create the same pattern as your -host cmd line. That would mean that a hostfile has nothing to do with the underlying hardware and that it wo

Re: [OMPI users] Load and link MPI Host at runtime

2012-10-03 Thread Jeff Squyres
On Oct 3, 2012, at 2:30 AM, wrote: > I'm looking for a document in 'Run MPI At Run-time' topic. I can't quite parse this. Are you looking for a document with that name? If so, I suggest Google. > My idea is to load MPI and link host at run-time in special situation. please > help. I'm

Re: [OMPI users] one more problem with process bindings on openmpi-1.6.2

2012-10-03 Thread Ralph Castain
As I said, in the absence of a hostfile, -host assigns ONE slot for each time a host is named. So the equivalent hostfile would have "slots=1" to create the same pattern as your -host cmd line. On Oct 3, 2012, at 7:12 AM, Siegmar Gross wrote: > Hi, > > I thought that "slot" is the smallest
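The rule stated above (without a hostfile, each mention of a host on -host counts as one slot) can be sketched in a few lines of Python. This is only an illustration of the counting rule as described in the message, not Open MPI's own code; the function name `host_list_to_hostfile` is hypothetical.

```python
from collections import Counter


def host_list_to_hostfile(host_arg: str) -> list[str]:
    """Turn a comma-separated -host argument into equivalent hostfile lines.

    Assumption (taken from the message above): each time a host is named,
    it contributes exactly one slot.
    """
    names = [h.strip() for h in host_arg.split(",") if h.strip()]
    counts = Counter(names)  # keeps hosts in order of first mention
    return [f"{host} slots={n}" for host, n in counts.items()]


# The -host list used in this thread
print("\n".join(host_list_to_hostfile("sunpc0,sunpc1")))
```

Under this rule, a hostfile that says "slots=4" for each machine describes a different allocation than "-host sunpc0,sunpc1", which would explain why the two command lines need not behave the same way.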

Re: [OMPI users] one more problem with process bindings on openmpi-1.6.2

2012-10-03 Thread Siegmar Gross
Hi, I thought that "slot" is the smallest manageable entity so that I must set "slot=4" for a dual-processor dual-core machine with one hardware-thread per core. Today I learned about the new keyword "sockets" for a hostfile (I didn't find it in "man orte_hosts"). How would I specify a system with

Re: [OMPI users] problem with rankfile and openmpi-1.6.2

2012-10-03 Thread Ralph Castain
I filed a bug fix for this one. However, there is something you should note: if you fail to provide a "-np N" argument to mpiexec, we assume you want ALL available slots filled. The rankfile will contain only those procs that you want specifically bound. The remaining procs will be unbound. So with
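The arithmetic behind that note can be sketched as follows. This is a minimal illustration of the rule as described in the message, not Open MPI's implementation; the function name `launch_plan` and the two-rank rankfile in the example are hypothetical, and the sketch assumes the rankfile names no more ranks than there are slots.

```python
def launch_plan(slots_per_host: dict[str, int], rankfile_ranks: set[int]) -> dict[str, int]:
    """Count launched, bound and unbound procs when no "-np N" is given.

    Assumption (from the message above): without -np, every available slot
    is filled, and only the ranks listed in the rankfile are bound.
    """
    total = sum(slots_per_host.values())
    bound = len(rankfile_ranks)
    return {"launched": total, "bound": bound, "unbound": total - bound}


# Two hosts with 4 slots each; a hypothetical rankfile naming ranks 0 and 1
print(launch_plan({"sunpc0": 4, "sunpc1": 4}, {0, 1}))
```

Passing "-np N" with N equal to the number of ranks in the rankfile avoids the extra unbound procs.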

Re: [OMPI users] question to binding options in openmpi-1.6.2

2012-10-03 Thread Siegmar Gross
Hi, thank you very much for your help. Now the command with "-npersocket" works. Unfortunately it is not a solution for the other problem, which I reported a few minutes ago. tyr fd1026 191 cat host_sunpc0_1 sunpc0 sockets=2 slots=4 sunpc1 sockets=2 slots=4 tyr fd1026 192 mpiexec -report-bindin

Re: [OMPI users] one more problem with process bindings on openmpi-1.6.2

2012-10-03 Thread Ralph Castain
On Oct 3, 2012, at 6:19 AM, Siegmar Gross wrote: > Hi, > > I recognized another problem with process bindings. The command > works, if I use "-host" and it breaks, if I use "-hostfile" with > the same machines. > > tyr fd1026 178 mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 \ > -cpus

[OMPI users] one more problem with process bindings on openmpi-1.6.2

2012-10-03 Thread Siegmar Gross
Hi, I recognized another problem with process bindings. The command works, if I use "-host" and it breaks, if I use "-hostfile" with the same machines. tyr fd1026 178 mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 \ -cpus-per-proc 2 -bind-to-core hostname sunpc1 [sunpc1:00086] MCW rank 1

Re: [OMPI users] crashes in VASP with openmpi 1.6.x

2012-10-03 Thread Noam Bernstein
Thanks to everyone who answered, in particular Ake Sandgren, it appears to be a weird problem with acml that somehow triggers a seg fault in libmpi, but only when running on Opterons. I'd still be interested in figuring out how to get a more complete backtrace, but at least the immediate problem i

Re: [OMPI users] question to binding options in openmpi-1.6.2

2012-10-03 Thread Ralph Castain
Okay, I looked at this and the problem isn't in the code. The problem is that the 1.6 series doesn't have the more sophisticated discovery and mapping algorithms of the 1.7 series. In this case, the specific problem is that the 1.6 series doesn't automatically detect the number of sockets on a n

Re: [OMPI users] problem with rankfile and openmpi-1.6.2

2012-10-03 Thread Ralph Castain
I saw your earlier note about this too. Just a little busy right now, but hope to look at it soon. Your rankfile looks fine, so undoubtedly a bug has crept into this rarely-used code path. On Oct 3, 2012, at 3:03 AM, Siegmar Gross wrote: > Hi, > > I want to test process bindings with a ran

Re: [OMPI users] Need solution- nodes can't find the paths.

2012-10-03 Thread Syed Ahsan Ali
Thanks, John, for the detailed procedure. The fstab approach was on my mind, but I was not sure how to make it happen on the compute nodes. I'll try this and let you know. Actually, the cluster and SAN were deployed by a local Dell vendor, and they are not very sure about this. On Wed, Oct 3, 2012 at 3

Re: [OMPI users] Need solution- nodes can't find the paths.

2012-10-03 Thread John Hearns
If I may ask, which company installed this cluster for you? Surely they will advise on how to NFS mount the storage on the compute nodes?

Re: [OMPI users] Need solution- nodes can't find the paths.

2012-10-03 Thread John Hearns
Data is large and cannot be copied to the local drives of the compute nodes as the data is large. I understand that. I think that you have storage attached to your cluster head node - the 'SAN storage' you refer to. Let's call that volume /data. All you need to do is edit the /etc/exports file o

[OMPI users] problem with rankfile and openmpi-1.6.2

2012-10-03 Thread Siegmar Gross
Hi, I want to test process bindings with a rankfile in openmpi-1.6.2. Both machines are dual-processor dual-core machines running Solaris 10 x86_64. tyr fd1026 138 cat host_sunpc0_1 sunpc0 slots=4 sunpc1 slots=4 tyr fd1026 139 cat rankfile rank 0=sunpc0 slot=0:0-1,1:0-1 rank 1=sunpc1 slot=0:0-

Re: [OMPI users] Need solution- nodes can't find the paths.

2012-10-03 Thread Syed Ahsan Ali
The data is too large to be copied to the local drives of the compute nodes. The second option is good, but what I don't understand is why, when everything else is NFS mounted on the compute nodes, they can't take the external SAN drives too. I don't know how to expo

Re: [OMPI users] Need solution- nodes can't find the paths.

2012-10-03 Thread John Hearns
You need to either copy the data to storage which the cluster nodes have mounted (surely your cluster vendor included local storage?), or configure the cluster head node to export the SAN volume by NFS.

[OMPI users] Need solution- nodes can't find the paths.

2012-10-03 Thread Syed Ahsan Ali
Dear All, I have a Dell cluster running Platform Cluster Manager (PCM); the compute nodes are NFS mounted with the master node. Storage (SAN) is mounted on the installer node only. The problem is that I am running a programme which uses data which resides on the storage, so as far as running the prog

[OMPI users] Load and link MPI Host at runtime

2012-10-03 Thread mostafa . barmshory
Hi: I'm looking for a document on the 'Run MPI At Run-time' topic. My idea is to load MPI and link host at run-time in a special situation. Please help. Thanks