Hmmm...you probably can't without digging down into the diagnostics.
Perhaps we could help more if we had some idea how you are measuring this
"latency". I ask because that is orders of magnitude worse than anything we
measure - so I suspect the problem is in your app (i.e., that the time you ar
how do I tell the difference between when the message was received and when the
message was picked up in MPI_Test?
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
Of Ralph Castain
Sent: Wednesday, October 03, 2012 1:00 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [
This list is intended for Open MPI support, not general Linux cluster support.
You might be able to get more detailed help from other forums and/or your local
cluster support admin / vendor.
Thanks!
On Oct 3, 2012, at 6:58 AM, Syed Ahsan Ali wrote:
> Thanks John for the detailed procedure. t
Out of curiosity, have you logged the time when the SP called "send" and
compared it to the time when the message was received, and to the time when that
message was picked up in MPI_Test? In other words, have you actually verified that the
delay is in the MPI library as opposed to in your application?
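To show the sort of check I mean, here is a bare-bones sketch (my own toy example,
not your application's code) that timestamps the send on one side and the first
successful MPI_Test on the other. Keep in mind that MPI_Wtime values are not
necessarily synchronized across hosts (see MPI_WTIME_IS_GLOBAL), so cross-host
comparisons need a synchronized clock or a round-trip measurement.

  /* rank 0 timestamps the send; rank 1 timestamps the moment MPI_Test
   * first reports the message as complete. */
  #include <mpi.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      int rank, payload = 42;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          double t_send = MPI_Wtime();
          MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
          printf("rank 0: send called at %.6f\n", t_send);
      } else if (rank == 1) {
          MPI_Request req;
          MPI_Status  status;
          int flag = 0;
          MPI_Irecv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
          while (!flag) {
              usleep(1000);              /* stand-in for application work */
              double t_poll = MPI_Wtime();
              MPI_Test(&req, &flag, &status);
              if (flag)
                  printf("rank 1: MPI_Test saw completion at %.6f\n", t_poll);
          }
      }
      MPI_Finalize();
      return 0;
  }

Compile with mpicc and run with two ranks, e.g. mpiexec -np 2 ./a.out.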
On
Hi all,
I am running on an IBM BladeCenter, using Open MPI 1.4.1 and the opensm subnet
manager for InfiniBand.
Our application has real time requirements and it has recently been proven that
it does not scale to meet future requirements.
Presently, I am re-organizing the application to process work
On Oct 3, 2012, at 8:40 AM, Siegmar Gross
wrote:
> Hi,
>
>> As I said, in the absence of a hostfile, -host assigns ONE slot for
>> each time a host is named. So the equivalent hostfile would have
>> "slots=1" to create the same pattern as your -host cmd line.
>
> That would mean that a hostfi
Hi,
> As I said, in the absence of a hostfile, -host assigns ONE slot for
> each time a host is named. So the equivalent hostfile would have
> "slots=1" to create the same pattern as your -host cmd line.
That would mean that a hostfile has nothing to do with the underlying
hardware and that it wo
On Oct 3, 2012, at 2:30 AM,
wrote:
> I'm looking for a document on the 'Run MPI At Run-time' topic.
I can't quite parse this. Are you looking for a document with that name? If
so, I suggest Google.
> My idea is to load MPI and link host at run-time in a special situation. Please
> help.
I'm
As I said, in the absence of a hostfile, -host assigns ONE slot for each time a
host is named. So the equivalent hostfile would have "slots=1" to create the
same pattern as your -host cmd line.
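In other words, for example:

  -host sunpc0,sunpc1

behaves like a hostfile containing

  sunpc0 slots=1
  sunpc1 slots=1

and naming a host twice on the -host line would give that host two slots.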
On Oct 3, 2012, at 7:12 AM, Siegmar Gross
wrote:
> Hi,
>
> I thought that "slot" is the smallest
Hi,
I thought that "slot" is the smallest manageable entity so that I
must set "slot=4" for a dual-processor dual-core machine with one
hardware-thread per core. Today I learned about the new keyword
"sockets" for a hostfile (I didn't find it in "man orte_hosts").
How would I specify a system with
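(For reference, the kind of hostfile line I have in mind for such a machine is
the one from my other mail, for example

  sunpc0 sockets=2 slots=4

though I am not sure that this is the intended use of the new keyword.)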
I filed a bug fix for this one. However, there is something you should note.
If you fail to provide a "-np N" argument to mpiexec, we assume you want ALL
available slots filled. The rankfile will contain only those procs that you
want specifically bound. The remaining procs will be unbound.
So with
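For example (my own illustration, using the hostfile and rankfile from your
mail): those two hosts provide 8 slots in total while the rankfile names only
ranks 0 and 1, so

  mpiexec -rf rankfile ./a.out

launches 8 processes, with ranks 0 and 1 bound as the rankfile specifies and
the other six unbound, whereas adding "-np 2" limits the job to just the two
ranks described in the rankfile.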
Hi,
thank you very much for your help. Now the command with "-npersocket"
works. Unfortunately it is not a solution for the other problem, which
I reported a few minutes ago.
tyr fd1026 191 cat host_sunpc0_1
sunpc0 sockets=2 slots=4
sunpc1 sockets=2 slots=4
tyr fd1026 192 mpiexec -report-bindin
On Oct 3, 2012, at 6:19 AM, Siegmar Gross
wrote:
> Hi,
>
> I recognized another problem with process bindings. The command
> works if I use "-host" and it breaks if I use "-hostfile" with
> the same machines.
>
> tyr fd1026 178 mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 \
> -cpus
Hi,
I recognized another problem with process bindings. The command
works if I use "-host" and it breaks if I use "-hostfile" with
the same machines.
tyr fd1026 178 mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 \
-cpus-per-proc 2 -bind-to-core hostname
sunpc1
[sunpc1:00086] MCW rank 1
Thanks to everyone who answered, in particular Ake Sandgren. It appears
to be a weird problem with acml that somehow triggers a seg fault in
libmpi, but only when running on Opterons. I'd still be interested in
figuring out how to get a more complete backtrace, but at least the
immediate problem i
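For the backtrace part, a generic recipe (nothing Open MPI specific; the program
and core file names below are placeholders): rebuild with -g, allow core dumps
with "ulimit -c unlimited" on the compute nodes, re-run so the crashing rank
leaves a core file, then load it in gdb:

  gdb ./myprog core.12345
  (gdb) bt full

That usually gives a fuller trace than the one printed at abort time.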
Okay, I looked at this and the problem isn't in the code. The problem is that
the 1.6 series doesn't have the more sophisticated discovery and mapping
algorithms of the 1.7 series. In this case, the specific problem is that the
1.6 series doesn't automatically detect the number of sockets on a n
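(That is presumably why adding the socket count to the hostfile, as in

  sunpc0 sockets=2 slots=4
  sunpc1 sockets=2 slots=4

from earlier in this thread, makes the -npersocket run work: it supplies
explicitly the information the 1.6 mapper cannot discover on its own.)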
I saw your earlier note about this too. Just a little busy right now, but hope
to look at it soon.
Your rankfile looks fine, so undoubtedly a bug has crept into this rarely-used
code path.
On Oct 3, 2012, at 3:03 AM, Siegmar Gross
wrote:
> Hi,
>
> I want to test process bindings with a ran
Thanks John for the detailed procedure. The fstab thing was on my mind but I
was not sure how to make it happen on the compute nodes. I'll try this and let
you know.
Actually the cluster and SAN were deployed by a local Dell vendor, and
they are not very sure about this.
On Wed, Oct 3, 2012 at 3
If I may ask, which company installed this cluster for you?
Surely they will advise on how to NFS mount the storage on the compute nodes?
The data is large and cannot be copied to the local drives of the compute
nodes.
I understand that.
I think that you have storage attached to your cluster head node - the
'SAN storage' you refer to.
Let's call that volume /data
All you need to do is edit the /etc/exports file o
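A sketch of what that looks like (the subnet and the name "headnode" below are
only placeholders; use your own): on the head node add a line such as

  /data 192.168.1.0/24(rw,sync,no_root_squash)

to /etc/exports and run "exportfs -ra", then on each compute node create the
/data mount point, add

  headnode:/data  /data  nfs  defaults  0 0

to /etc/fstab and run "mount /data".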
Hi,
I want to test process bindings with a rankfile in openmpi-1.6.2. Both
machines are dual-processor dual-core machines running Solaris 10 x86_64.
tyr fd1026 138 cat host_sunpc0_1
sunpc0 slots=4
sunpc1 slots=4
tyr fd1026 139 cat rankfile
rank 0=sunpc0 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-
The data is large and cannot be copied to the local drives of the compute nodes.
The second option is good, but the thing I don't understand is: when everything
else is NFS mounted to the compute nodes, why can't it take
the external SAN drives too? I don't know how to expo
You need to either copy the data to storage which the cluster nodes have
mounted (surely your cluster vendor included local storage?),
or configure the cluster head node to export the SAN volume by NFS.
Dear All
I have a Dell cluster running Platform Cluster Manager (PCM); the compute
nodes are NFS mounted from the master node. Storage (SAN) is mounted on the
installer node only. The problem is that I am running a programme which
uses data that resides on the storage, so as far as running the prog
Hi: I'm looking for a document on the 'Run MPI At Run-time' topic. My idea is to
load MPI and link host at run-time in a special situation. Please help. Thanks