Hi,
On 26.11.2013 at 01:22, tmish...@jcity.maeda.co.jp wrote:
> Thank you very much for your quick response.
>
> I'm afraid I found one more issue...
>
> It's not so serious, so please check it whenever you have time.
>
> The problem is cpus-per-proc with -map-by option under T
Hi,
I used interactive mode just because it was easy to report the behavior.
I'm sure that submitting a job gives the same result.
Therefore, I think the environment variables are also set in the session.
Anyway, I'm away from the cluster now. Regarding "$ env | grep PBS",
I'll send it later.
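For reference, the kind of batch script I mean would look roughly like this; the
process count, cpus-per-proc value and executable name below are only placeholders
for this sketch, not the real job:

#!/bin/sh
#PBS -l nodes=1:ppn=32
# run from the directory the job was submitted from
cd $PBS_O_WORKDIR
# same command line as in the interactive session
mpirun -np 8 -cpus-per-proc 4 -map-by socket ./my_program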
Reg
Hello,
Just like r29736, I believe that there are some missing tests in
ompi/mca/coll/libnbc/nbc_iscatterv.c and ompi/mca/coll/libnbc/nbc_igatherv.c
Thoughts?
Pierre
Index: nbc_igatherv.c
===================================================================
--- nbc_igatherv.c (revision 29756)
Here are the results of those two commands:
$ mpic++ -show
g++ -I/Users/meredithk/tools/openmpi/include
-L/Users/meredithk/tools/openmpi/lib -lmpi_cxx -lmpi -lm
$ otool -L /Users/meredithk/tools/openmpi/lib/libmpi_cxx.dylib
/Users/meredithk/tools/openmpi/lib/libmpi_cxx.dylib:
/Users/me
Nathan,
(Please forget about the segfault. It was my mistake).
I use OpenMPI-1.7.2 (built with gcc-4.7.2) to run the program. I used
contrib/platform/lanl/cray_xe6/optimized_lustre and
--enable-mpirun-prefix-by-default for configuration. As I said, it works
fine with aprun, but fails with mpirun.
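For reference, the configure invocation was essentially of this form (shown here as a
sketch; the install prefix and any extra flags are omitted):

$ ./configure --with-platform=contrib/platform/lanl/cray_xe6/optimized_lustre \
      --enable-mpirun-prefix-by-default
$ make && make install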
Weird. That is the same configuration we have deployed on Cielito and Cielo.
Does
it work under an msub allocation?
BTW, with that configuration you should not set
plm_base_strip_prefix_from_node_names
to 0. That will confuse orte since the node hostname will not match what was
supplied by alps.
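In other words, if either of the following appears in your setup it should be removed on
this machine; the params file path below is just the usual per-user default and may
differ in your installation:

# on the mpirun command line
mpirun -mca plm_base_strip_prefix_from_node_names 0 ...
# or as a line in $HOME/.openmpi/mca-params.conf
plm_base_strip_prefix_from_node_names = 0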
Nathan,
I have now removed the strip_prefix setting, which was applied to the other
versions of OpenMPI.
I still have the same problem with the msub run.
knteran@mzlogin01:~> msub -lnodes=2:ppn=16 -I
qsub: waiting for job 7754058.sdb to start
qsub: job 7754058.sdb ready
knteran@mzlogin01:~> cd test-ope
Seems like something is going wrong with processor binding. Can you run with
-mca plm_base_verbose 100? That might shed some light on why it thinks there are
not enough slots.
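Something along these lines would do; the process count and ./a.out are placeholders:

$ mpirun -np 4 -mca plm_base_verbose 100 ./a.out 2>&1 | tee plm_verbose.log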
-Nathan Hjelm
Application Readiness, HPC-5, LANL
On Tue, Nov 26, 2013 at 09:18:14PM +, Teranishi, Keita wrote:
> Nathan,
Nathan,
Please see the attached output from the two cases (-np 2 and -np 4).
Thanks,
--
Keita Teranishi
Principal Member of Technical Staff
Scalable Modeling and Analysis Systems
Sandia National Laboratories
Livermore, CA 9
Well, no hints as to the error there. It looks identical to the output on my XE-6.
How about setting -mca rmaps_base_verbose 100? See what is going on with the
mapper.
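That is, the same run as before but with the mapper debugging enabled, e.g. (again
with a placeholder executable):

$ mpirun -np 4 -mca rmaps_base_verbose 100 ./a.out 2>&1 | tee rmaps_verbose.log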
-Nathan Hjelm
Application Readiness, HPC-5, LANL
On Tue, Nov 26, 2013 at 09:33:20PM +, Teranishi, Keita wrote:
> Nathan,
>
>
Nathan,
I hope these files will help you.
Thanks,
Keita
On 11/26/13 1:41 PM, "Nathan Hjelm" wrote:
>Well, no hints as to the error there. Looks identical to the output on my
>XE-6. How
>about setting -mca rmaps_base_verbose 100 . See what is going on with the
>mapper.
>
>-Nathan Hjelm
Ok, that sheds a little more light on the situation. For some reason it sees 2
nodes, apparently with one slot each. One more set of outputs would be helpful.
Please run with -mca ras_base_verbose 100. That way I can see what was read from alps.
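For example, keeping the mapper verbosity from before so the two sets of output line
up (executable and process count are placeholders again):

$ mpirun -np 4 -mca ras_base_verbose 100 -mca rmaps_base_verbose 100 ./a.out 2>&1 | tee ras_verbose.log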
-Nathan
On Tue, Nov 26, 2013 at 10:14:11PM +
Nathan,
Here it is.
Keita
On 11/26/13 3:02 PM, "Nathan Hjelm" wrote:
>Ok, that sheds a little more light on the situation. For some reason it
>sees 2 nodes
>apparently with one slot each. One more set out outputs would be helpful.
>Please run
>with -mca ras_base_verbose 100 . That way I ca
??? Alps reports that the two nodes each have one slot. What PE release
are you using? A quick way to find out is ls -l /opt/cray/xe-sysroot on the
external login node (this directory does not exist on the internal login nodes).
-Nathan
On Tue, Nov 26, 2013 at 11:07:36PM +, Teranishi, Keita w
Hi,
Here is the output of "printenv | grep PBS". It seems that all variables
are set as I expected.
[mishima@manage mpi_demo]$ qsub -I -l nodes=1:ppn=32
qsub: waiting for job 8120.manage.cluster to start
qsub: job 8120.manage.cluster ready
[mishima@node03 ~]$ printenv | grep PBS
PBS_VERSION=TO
Here is what we can see:
knteran@mzlogin01e:~> ls -l /opt/cray/xe-sysroot
total 8
drwxr-xr-x 6 bin bin 4096 2012-02-04 11:05 4.0.36.securitypatch.20111221
drwxr-xr-x 6 bin bin 4096 2013-01-11 15:17 4.1.40
lrwxrwxrwx 1 root root    6 2013-01-11 15:19 default -> 4.1.40
Thanks,
Keita
On 11/2
Alright, everything is identical to Cielito, but it looks like you are getting
bad data from alps.
I think we changed some of the alps parsing for 1.7.3. Can you give that
version a try and let me know if it resolves your issue? If not, I can add
better debugging to the ras/alps module.
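Once the 1.7.3 build is installed, a quick sanity check that the new mpirun and
libraries are the ones being picked up would be something like:

$ which mpirun
$ ompi_info | grep "Open MPI:"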
-Nathan
On