Sometimes, after the program launched with the command "sbatch ... -o myprogram.out ."
terminates, no file "myprogram.out" is produced. Could this be due to the
above-mentioned problem?
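For reference, a minimal Slurm submission along the lines described might be (node count, script name, and program name are placeholders, not taken from the message):
$ sbatch -N 2 -o myprogram.out job.sh
where job.sh is a batch script that runs the program, e.g. containing a line such as "mpirun ./myprogram"; -o names the file that should receive the job's stdout.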
Thu, 5 Jun 2014 07:45:01 -0700 from Ralph Castain:
>FWIW: support for the --resv-ports option
Hello!
I am using Open MPI v1.8.1 with the
example program hello_oshmem.cpp.
When I set spml_ikrit_np = 1000 (more than 4) and run the task on 4 (2, 1) nodes, I
get the following in the output file:
No available spml components were found!
This means that there are no components of this type installed on your
system or al
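The actual launch command is not shown here (it is requested in the next message); purely as a hedged illustration, an MCA parameter such as spml_ikrit_np is normally set either on the launcher command line or through the environment, e.g.:
$ oshrun -np 4 --mca spml_ikrit_np 1000 ./hello_oshmem
$ export OMPI_MCA_spml_ikrit_np=1000    # equivalent, via the environment
(the process count and program name above are placeholders).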
Could you please provide the command line?
On Fri, Jun 6, 2014 at 10:56 AM, Timur Ismagilov wrote:
> Hello!
>
> I am using Open MPI v1.8.1 with the example program hello_oshmem.cpp.
>
> When I set spml_ikrit_np = 1000 (more than 4) and run the task on 4 (2, 1)
> nodes, I get the following in the output file:
> No available spml components were found!
Possible - honestly don't know
On Jun 6, 2014, at 12:16 AM, Timur Ismagilov wrote:
> Sometimes, after the program launched with the command "sbatch ... -o myprogram.out ."
> terminates, no file "myprogram.out" is produced. Could this be due to the
> above-mentioned problem?
>
Here are the results when logging in to the compute node via ssh and running as
you suggest:
[binf102:fischega] $ mpirun -np 2 -mca btl openib,sm,self ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Huh - how strange. I can't imagine what it has to do with Torque vs rsh - this
is failing when the openib BTL is trying to create the connection, which comes
way after the launch is complete.
Are you able to run this with btl tcp,sm,self? If so, that would confirm that
everything else is correct.
Yep, TCP works fine when launched via Torque/qsub:
[binf315:fischega] $ mpirun -np 2 -mca btl tcp,sm,self ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decre
For the PBS scheduler and using a build of OpenMPI 1.6 built against PBS
include files + libs, is there a way to determine (perhaps via some debugging
flags passed to mpirun) what job placement parameters are passed from the PBS
scheduler to OpenMPI? In particular, I am talking about task placement.
We currently only get the node and slots/node info from PBS - we don't get any
task placement info at all. We then use the mpirun cmd options and built-in
mappers to map the tasks to the nodes.
I suppose we could do more integration in that regard, but haven't really seen
a reason to do so - th
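As a hedged sketch of what "the mpirun cmd options and built-in mappers" means in practice (values are illustrative only; note that 1.6 spells some of these options differently, e.g. -npersocket and -bind-to-core):
$ mpirun -np 24 --map-by node --bind-to core --report-bindings ./a.out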
Fascinating - I can only assume that Torque is setting something in the
environment that is creating the confusion. Sadly, Nathan is at the MPI Forum
this week, so we may have to wait until Mon to get his input on the problem as
he wrote the udcm code.
On Jun 6, 2014, at 8:51 AM, Fischer, Greg
OK, so at the least, how can I get the node and slots/node info that is passed
from PBS?
I ask because I'm trying to troubleshoot a problem w/ PBS and the build of
OpenMPI 1.6 I noted. If I submit a 24-process simple job through PBS using a
script which has:
/usr/local/openmpi/bin/orterun -n
On 06.06.2014 at 18:58, Sasso, John (GE Power & Water, Non-GE) wrote:
> OK, so at the least, how can I get the node and slots/node info that is
> passed from PBS?
>
> I ask because I’m trying to troubleshoot a problem w/ PBS and the build of
> OpenMPI 1.6 I noted. If I submit a 24-process si
You can always add --display-allocation to the cmd line to see what we thought
we received.
If you configure OMPI with --enable-debug, you can set --mca ras_base_verbose
10 to see the details
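For example (a sketch; the process count and program name are placeholders):
$ mpirun --display-allocation -np 24 ./a.out
$ mpirun --mca ras_base_verbose 10 -np 24 ./a.out    # verbose output requires an --enable-debug build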
On Jun 6, 2014, at 10:01 AM, Reuti wrote:
> On 06.06.2014 at 18:58, Sasso, John (GE Power & Water, Non-GE) wrote:
On 06/06/2014 01:05 PM, Ralph Castain wrote:
You can always add --display-allocation to the cmd line to see what we
thought we received.
If you configure OMPI with --enable-debug, you can set --mca
ras_base_verbose 10 to see the details
Hi John
On the Torque side, you can put a line "cat $PBS_NODEFILE" in your job script.
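A hedged sketch of such a Torque/PBS job script (the resource request and program name are placeholders):
#!/bin/bash
#PBS -l nodes=2:ppn=12
cat $PBS_NODEFILE                       # the node/slot list Torque gives the job
mpirun --display-allocation ./a.out     # compare with what Open MPI thinks it received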
On Jun 6, 2014, at 10:24 AM, Gus Correa wrote:
> On 06/06/2014 01:05 PM, Ralph Castain wrote:
>> You can always add --display-allocation to the cmd line to see what we
>> thought we received.
>>
>> If you configure OMPI with --enable-debug, you can set --mca
>> ras_base_verbose 10 to see the details
Thanks for the reply. I tried out the --display-allocation option with
several different combinations and have attached the output. I see
this behavior on RHEL6.4, RHEL6.5, and RHEL5.10 clusters.
Here's debugging info on the segfault. Does that help? FWIW this does
not seem to crash on the R
fixed here: https://svn.open-mpi.org/trac/ompi/changeset/31962
Thanks for the report.
On Thu, Jun 5, 2014 at 7:45 PM, Mike Dubman
wrote:
> seems oshmem_info uses an uninitialized value.
> we will check it, thanks for the report.
>
>
> On Thu, Jun 5, 2014 at 6:56 PM, Timur Ismagilov
> wrote:
>
>> Hello!
Re: $PBS_NODEFILE, we use that to create the hostfile that is passed via
--hostfile (i.e. the two are the same).
To further debug this, I passed "--display-allocation --display-map" to
orterun, which resulted in:
== ALLOCATED NODES ==
Data for nod
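The exact orterun invocation is not quoted; a hedged reconstruction of the kind of command described would be (process count and program name are placeholders):
$ orterun --hostfile $PBS_NODEFILE --display-allocation --display-map -np 24 ./a.out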
Sorry to pester with questions, but I'm trying to narrow down the issue.
* What kind of chips are on these machines?
* If they have h/w threads, are they enabled?
* you might have lstopo on one of those machines - could you pass along its
output? Otherwise, you can run a simple "mpirun -n 1 -mc
Hello
I am using OpenMPI ver 1.8.1 on a cluster of 4 machines.
One Red Hat 6.2 machine and three BusyBox machines. They are all 64-bit environments.
I want to use the --preload-binary option to send the binary file to the hosts, but
it's not working.
# /mpi/bin/mpirun --prefix /mpi --preload-files ./a.out --allow-run-as-root
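Note that the command above uses --preload-files; as a hedged sketch only (the paths and process count mirror the later message in this thread, the rest is illustrative), preloading the executable itself would look like:
# /mpi/bin/mpirun --prefix /mpi --hostfile /root/.hosts --allow-run-as-root --preload-binary --np 120 ./a.out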
It's possible that you are hitting a bug - not sure how much the cpus-per-proc
option has been exercised in 1.6. Is this 1.6.5, or some other member of that
series?
I don't have a Torque machine handy any more, but should be able to test this
scenario on my boxes
On Jun 6, 2014, at 10:51 AM,
Supposed to, yes - but I don't know how much testing it has seen. I can try to
take a look
On Jun 6, 2014, at 12:02 PM, E.O. wrote:
> Hello
> I am using OpenMPI ver 1.8.1 on a cluster of 4 machines.
> One Red Hat 6.2 machine and three BusyBox machines. They are all 64-bit environments.
>
> I want to use -
Version 1.6 (i.e. prior to 1.6.1)
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Friday, June 06, 2014 3:03 PM
To: Open MPI Users
Subject: Re: [OMPI users] Determining what parameters a scheduler passes to
OpenMPI
It's possible that yo
You might want to update to 1.6.5, if you can - I'll see what I can find
On Jun 6, 2014, at 12:07 PM, Sasso, John (GE Power & Water, Non-GE)
wrote:
> Version 1.6 (i.e. prior to 1.6.1)
>
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
>
On 06.06.2014 at 21:04, Ralph Castain wrote:
> Supposed to, yes - but I don't know how much testing it has seen. I can try
> to take a look
Wasn't it mentioned on the list recently that 1.8.1 should do it even without
passphraseless SSH between the nodes?
-- Reuti
> On Jun 6, 2014, at 12:02 PM, E.O.
No problem -
These are model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz chips.
2 per node, 8 cores each. No threading enabled.
$ lstopo
Machine (64GB)
  NUMANode L#0 (P#0 32GB)
    Socket L#0 + L3 L#0 (20MB)
      L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
Okay, I'll poke into this - thanks!
On Jun 6, 2014, at 12:48 PM, Dan Dietz wrote:
> No problem -
>
> These are model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz chips.
> 2 per node, 8 cores each. No threading enabled.
>
> $ lstopo
> Machine (64GB)
>   NUMANode L#0 (P#0 32GB)
>     Socket L#0
Yeah, it doesn't require ssh any more - but I haven't tested it in a bit, and
so it's possible something crept in there.
On Jun 6, 2014, at 12:27 PM, Reuti wrote:
> On 06.06.2014 at 21:04, Ralph Castain wrote:
>
>> Supposed to, yes - but I don't know how much testing it has seen. I can try
>
Hi Dan,
Please try:
mpirun -np 2 --map-by socket:pe=8 ./hello
or
mpirun -np 2 --map-by slot:pe=8 ./hello
You cannot bind 8 cpus to the object "core", which has
only one cpu. This limitation started with the 1.8 series.
The object "socket" has 8 cores in your case, so you
can do it. And the object
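A hedged way to check the resulting binding is to add --report-bindings, which prints where each rank is bound:
$ mpirun -np 2 --map-by socket:pe=8 --report-bindings ./hello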
Okay, I found the problem and think I have a fix that I posted (copied EO on
it). You are welcome to download the patch and try it. Scheduled for release in
1.8.2
Thanks
Ralph
On Jun 6, 2014, at 1:01 PM, Ralph Castain wrote:
> Yeah, it doesn't require ssh any more - but I haven't tested it i
Hi Josh,
I asked one of our more advanced users to add the "-mca btl_openib_if_include
mlx4_0:1" argument to his job script. Unfortunately, the same error
occurred as before.
We'll keep digging on our end; if you have any other suggestions, please
let us know.
Tim
On Thu, Jun 5, 2014 at 7:32 P
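For reference, such an argument would typically appear on the mpirun line of the job script roughly as follows (a sketch; the BTL list and program name are illustrative, only the btl_openib_if_include value comes from the message):
mpirun --mca btl openib,sm,self --mca btl_openib_if_include mlx4_0:1 ./app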
Thank you!
With the patch, the --preload-binary option is working fine.
However, if I add "--gmca plm_rsh_no_tree_spawn 1" as an mpirun command-line
option, it hangs.
# /mpi/bin/mpirun --allow-run-as-root --gmca plm_rsh_no_tree_spawn 1
--preload-binary --hostfile /root/.hosts --prefix /mpi --np 120 a.out
Don't know - I haven't seen someone use that option in a while. Is there some
reason to do so?
On Jun 6, 2014, at 3:44 PM, E.O. wrote:
> Thank you!
> With the patch, the --preload-binary option is working fine.
> However, if I add "--gmca plm_rsh_no_tree_spawn 1" as an mpirun command-line
> option,
Looks like there is some strange interaction there, but I doubt I'll get
around to fixing it soon unless someone has a burning reason to not use tree
spawn when preloading binaries. I'll mark it down as something to look at as
time permits.
On Jun 6, 2014, at 4:28 PM, Ralph Castain wrote:
>
Hmmm... Tetsuya is quite correct. Afraid I got distracted by the segfault
(still investigating that one). Our default policy for 2 processes is to map-by
core, and that would indeed fail when cpus-per-proc > 1. However, that seems
like a non-intuitive requirement, so let me see if I can make this