Hmmm... Tetsuya is quite correct. Afraid I got distracted by the segfault
(still investigating that one). Our default policy for 2 processes is to map-by
core, and that would indeed fail when cpus-per-proc > 1. However, that seems
like a non-intuitive requirement, so let me see if I can make thi
Looks like there is some strange interaction there, but I doubt I'll get
around to fixing it soon unless someone has a burning reason to not use tree
spawn when preloading binaries. I'll mark it down as something to look at as
time permits.
On Jun 6, 2014, at 4:28 PM, Ralph Castain wrote:
>
Don't know - I haven't seen someone use that option in a while. Is there some
reason to do so?
On Jun 6, 2014, at 3:44 PM, E.O. wrote:
> Thank you!
> With the patch, --preload-binary option is working fine.
> However, if I add "--gmca plm_rsh_no_tree_spawn 1" as a mpirun command line
> option,
Thank you!
With the patch, --preload-binary option is working fine.
However, if I add "--gmca plm_rsh_no_tree_spawn 1" as a mpirun command line
option, it hangs.
# /mpi/bin/mpirun --allow-run-as-root --gmca plm_rsh_no_tree_spawn 1
--preload-binary --hostfile /root/.hosts --prefix /mpi --np 120 a.o
Hi Josh,
I asked one of our more advanced users to add the "-mca btl_openib_if_include
mlx4_0:1" argument to his job script. Unfortunately, the same error
occurred as before.
We'll keep digging on our end; if you have any other suggestions, please
let us know.
Tim
On Thu, Jun 5, 2014 at 7:32 P
Okay, I found the problem and think I have a fix that I posted (copied EO on
it). You are welcome to download the patch and try it. Scheduled for release in
1.8.2
Thanks
Ralph
On Jun 6, 2014, at 1:01 PM, Ralph Castain wrote:
> Yeah, it doesn't require ssh any more - but I haven't tested it i
Hi Dan,
Please try:
mpirun -np 2 --map-by socket:pe=8 ./hello
or
mpirun -np 2 --map-by slot:pe=8 ./hello
You cannot bind 8 cpus to the object "core", which has
only one cpu. This limitation started with the 1.8 series.
The object "socket" has 8 cores in your case, so you
can do it. And, the object
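As a quick check (not part of the original mail, and assuming the same ./hello binary), adding --report-bindings prints where each rank was bound, so you can confirm that every process really gets its 8 cores:
mpirun -np 2 --map-by socket:pe=8 --report-bindings ./hello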
Yeah, it doesn't require ssh any more - but I haven't tested it in a bit, and
so it's possible something crept in there.
On Jun 6, 2014, at 12:27 PM, Reuti wrote:
> Am 06.06.2014 um 21:04 schrieb Ralph Castain:
>
>> Supposed to, yes - but I don't know how much testing it has seen. I can try
>
Okay, I'll poke into this - thanks!
On Jun 6, 2014, at 12:48 PM, Dan Dietz wrote:
> No problem -
>
> These are model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz chips.
> 2 per node, 8 cores each. No threading enabled.
>
> $ lstopo
> Machine (64GB)
> NUMANode L#0 (P#0 32GB)
> Socket L#0
No problem -
These are model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz chips.
2 per node, 8 cores each. No threading enabled.
$ lstopo
Machine (64GB)
  NUMANode L#0 (P#0 32GB)
    Socket L#0 + L3 L#0 (20MB)
      L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
Am 06.06.2014 um 21:04 schrieb Ralph Castain:
> Supposed to, yes - but I don't know how much testing it has seen. I can try
> to take a look
Wasn't it on the list recently, that 1.8.1 should do it even without
passphraseless SSH between the nodes?
-- Reuti
> On Jun 6, 2014, at 12:02 PM, E.O.
You might want to update to 1.6.5, if you can - I'll see what I can find
On Jun 6, 2014, at 12:07 PM, Sasso, John (GE Power & Water, Non-GE)
wrote:
> Version 1.6 (i.e. prior to 1.6.1)
>
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
>
Version 1.6 (i.e. prior to 1.6.1)
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Friday, June 06, 2014 3:03 PM
To: Open MPI Users
Subject: Re: [OMPI users] Determining what parameters a scheduler passes to
OpenMPI
It's possible that yo
Supposed to, yes - but I don't know how much testing it has seen. I can try to
take a look
On Jun 6, 2014, at 12:02 PM, E.O. wrote:
> Hello
> I am using OpenMPI ver 1.8.1 on a cluster of 4 machines.
> One Redhat 6.2 and three busybox machine. They are all 64bit environment.
>
> I want to use -
It's possible that you are hitting a bug - not sure how much the cpus-per-proc
option has been exercised in 1.6. Is this 1.6.5, or some other member of that
series?
I don't have a Torque machine handy any more, but should be able to test this
scenario on my boxes
On Jun 6, 2014, at 10:51 AM,
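For reference, a 1.6-series invocation of that option would look roughly like the sketch below (binary name and counts are illustrative; under Torque the hosts come from the allocation automatically):
mpirun -np 24 -cpus-per-proc 2 -bind-to-core ./a.out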
Hello
I am using OpenMPI ver 1.8.1 on a cluster of 4 machines.
One Redhat 6.2 machine and three busybox machines. They are all 64-bit environments.
I want to use the --preload-binary option to send the binary file to the hosts, but
it's not working.
# /mpi/bin/mpirun --prefix /mpi --preload-files ./a.out --allow-run
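(Side note, not from the original mail: in 1.8 --preload-binary is a flag that preloads the executable itself, while --preload-files takes a list of extra files. A sketch of the binary-preload form, modeled on the working command shown later in this thread:
/mpi/bin/mpirun --allow-run-as-root --preload-binary --hostfile /root/.hosts --prefix /mpi --np 120 ./a.out )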
Sorry to pester with questions, but I'm trying to narrow down the issue.
* What kind of chips are on these machines?
* If they have h/w threads, are they enabled?
* you might have lstopo on one of those machines - could you pass along its
output? Otherwise, you can run a simple "mpirun -n 1 -mc
Re: $PBS_NODEFILE, we use that to create the hostfile that is passed via
--hostfile (i.e. the two are the same).
To further debug this, I passed "--display-allocation --display-map" to
orterun, which resulted in:
== ALLOCATED NODES ==
Data for nod
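For clarity, a sketch of the job-script lines being described here (paths and process count are illustrative, not taken from the actual script):
cat $PBS_NODEFILE > hostfile
/usr/local/openmpi/bin/orterun -n 24 --hostfile hostfile --display-allocation --display-map ./a.out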
fixed here: https://svn.open-mpi.org/trac/ompi/changeset/31962
Thanks for the report.
On Thu, Jun 5, 2014 at 7:45 PM, Mike Dubman
wrote:
> seems oshmem_info uses uninitialized value.
> we will check it, thanks for report.
>
>
> On Thu, Jun 5, 2014 at 6:56 PM, Timur Ismagilov
> wrote:
>
>> Hello!
Thanks for the reply. I tried out the --display-allocation option with
several different combinations and have attached the output. I see
this behavior on RHEL6.4, RHEL6.5, and RHEL5.10 clusters.
Here's debugging info on the segfault. Does that help? FWIW this does
not seem to crash on the R
On Jun 6, 2014, at 10:24 AM, Gus Correa wrote:
> On 06/06/2014 01:05 PM, Ralph Castain wrote:
>> You can always add --display-allocation to the cmd line to see what we
>> thought we received.
>>
>> If you configure OMPI with --enable-debug, you can set --mca
>> ras_base_verbose 10 to see the de
On 06/06/2014 01:05 PM, Ralph Castain wrote:
You can always add --display-allocation to the cmd line to see what we
thought we received.
If you configure OMPI with --enable-debug, you can set --mca
ras_base_verbose 10 to see the details
Hi John
On the Torque side, you can put a line "cat $P
You can always add --display-allocation to the cmd line to see what we thought
we received.
If you configure OMPI with --enable-debug, you can set --mca ras_base_verbose
10 to see the details
On Jun 6, 2014, at 10:01 AM, Reuti wrote:
> Am 06.06.2014 um 18:58 schrieb Sasso, John (GE Power & W
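Put together, a debug-enabled build would show both the allocation and the RAS details with something like (binary and process count are placeholders):
mpirun --display-allocation --mca ras_base_verbose 10 -np 24 ./a.out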
Am 06.06.2014 um 18:58 schrieb Sasso, John (GE Power & Water, Non-GE):
> OK, so at the least, how can I get the node and slots/node info that is
> passed from PBS?
>
> I ask because I’m trying to troubleshoot a problem w/ PBS and the build of
> OpenMPI 1.6 I noted. If I submit a 24-process si
OK, so at the least, how can I get the node and slots/node info that is passed
from PBS?
I ask because I'm trying to troubleshoot a problem w/ PBS and the build of
OpenMPI 1.6 I noted. If I submit a 24-process simple job through PBS using a
script which has:
/usr/local/openmpi/bin/orterun -n
Fascinating - I can only assume that Torque is setting something in the
environment that is creating the confusion. Sadly, Nathan is at the MPI Forum
this week, so we may have to wait until Mon to get his input on the problem as
he wrote the udcm code.
On Jun 6, 2014, at 8:51 AM, Fischer, Greg
We currently only get the node and slots/node info from PBS - we don't get any
task placement info at all. We then use the mpirun cmd options and built-in
mappers to map the tasks to the nodes.
I suppose we could do more integration in that regard, but haven't really seen
a reason to do so - th
For the PBS scheduler and using a build of OpenMPI 1.6 built against PBS
include files + libs, is there a way to determine (perhaps via some debugging
flags passed to mpirun) what job placement parameters are passed from the PBS
scheduler to OpenMPI? In particular, I am talking about task place
Yep, TCP works fine when launched via Torque/qsub:
[binf315:fischega] $ mpirun -np 2 -mca btl tcp,sm,self ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decre
Huh - how strange. I can't imagine what it has to do with Torque vs rsh - this
is failing when the openib BTL is trying to create the connection, which comes
way after the launch is complete.
Are you able to run this with btl tcp,sm,self? If so, that would confirm that
everything else is correc
Here are the results when logging in to the compute node via ssh and running as
you suggest:
[binf102:fischega] $ mpirun -np 2 -mca btl openib,sm,self ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Possible - honestly don't know
On Jun 6, 2014, at 12:16 AM, Timur Ismagilov wrote:
> Sometimes, after termination of the program, launched with the command
> "sbatch ... -o myprogram.out .", no file "myprogram.out" is being
> produced. Could this be due to the above mentioned problem?
>
Could you please provide the command line?
On Fri, Jun 6, 2014 at 10:56 AM, Timur Ismagilov wrote:
> Hello!
>
> I am using Open MPI v1.8.1 in
> example program hello_oshmem.cpp.
>
> When I put spml_ikrit_np = 1000 (more than 4) and run task on 4 (2,1)
> nodes, I get an:
> in out file:
> No availa
Hello!
I am using Open MPI v1.8.1 with the
example program hello_oshmem.cpp.
When I set spml_ikrit_np = 1000 (more than 4) and run the task on 4 (2, 1) nodes, I
get the following
in the output file:
No available spml components were found!
This means that there are no components of this type installed on your
system or al
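For context, MCA parameters like spml_ikrit_np can also be passed straight on the command line; a sketch with an assumed binary name:
mpirun -np 4 --mca spml_ikrit_np 1000 ./hello_oshmem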
Sometimes, after termination of the program launched with the command
"sbatch ... -o myprogram.out .", no file "myprogram.out" is
produced. Could this be due to the above-mentioned problem?
Thu, 5 Jun 2014 07:45:01 -0700 from Ralph Castain:
>FWIW: support for the --resv-ports opti