Hooray!
On Dec 19, 2013, at 10:14 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph,
Thank you for your fix. It works for me.
Tetsuya Mishima
> On Dec 19, 201
Actually, it looks like it would happen with hetero-nodes set - only required
that at least two nodes have the same architecture. So you might want to give
the trunk a shot as it may well now be fixed.
On Dec 19, 2013, at 8:35 AM, Ralph Castain wrote:
Hmmm...not having any luck tracking this down yet. If anything, based on what I
saw in the code, I would have expected it to fail when hetero-nodes was false,
not the other way around.
I'll keep poking around - just wanted to provide an update.
On Dec 19, 2013, at 12:54 AM, tmish...@jcity.maeda
Hi Ralph, sorry for interrupting with another post.
Your advice about -hetero-nodes in the other thread gave me a hint.
I already put "orte_hetero_nodes = 1" in my mca-params.conf, because
you told me a month ago that my environment would need this option.
Removing this line from mca-params.conf, then it work
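For reference, that MCA parameter is set in a plain-text params file. A minimal sketch, assuming the per-user location (the site may instead use the system-wide openmpi-mca-params.conf):

```
# $HOME/.openmpi/mca-params.conf (assumed per-user location)
# The line Tetsuya removed to avoid the mapping error:
orte_hetero_nodes = 1
```

The same parameter can also be set for a single run with `-mca orte_hetero_nodes 1` on the mpirun command line, which makes it easier to toggle while testing.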
Yes, it's very strange. But I don't think there's any chance that
I have < 8 actual cores on the node. I guess that you can replicate
it with SLURM, please try it again.
I changed to use node10 and node11, then I got the warning against
node11.
Furthermore, just as an information for you, I tri
Very strange - I can't seem to replicate it. Is there any chance that you have
< 8 actual cores on node12?
On Dec 18, 2013, at 4:53 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph, sorry for confusing you.
At that time, I cut and paste the part of "cat $PBS_NODEFILE".
I guess I didn't paste the last line by my mistake.
I retried the test and below one is exactly what I got when I did the test.
[mishima@manage ~]$ qsub -I -l nodes=node11:ppn=8+node12:ppn=8
qsub:
I removed the debug in #2 - thanks for reporting it
For #1, it actually looks to me like this is correct. If you look at your
allocation, there are only 7 slots being allocated on node12, yet you have
asked for 8 cpus to be assigned (2 procs with 2 cpus/proc). So the warning is
in fact correct
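The slot arithmetic behind that warning can be sketched in a few lines of shell. This is an illustration, not Open MPI code, and the 4 x 2 split is an assumption; the report only establishes that 8 cpus were requested against 7 allocated slots.

```shell
# Sketch of the per-node oversubscription check mpirun performs:
# compare requested cpus (procs x cpus-per-proc) against allocated slots.
slots=7          # slots Torque allocated on node12
np=4             # assumed process count landing on that node
cpus_per_proc=2  # from -cpus-per-proc 2
needed=$((np * cpus_per_proc))
if [ "$needed" -gt "$slots" ]; then
  echo "not enough slots on node12: need $needed, have $slots"
fi
```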
On Dec 18, 2013, at 7:04 PM,
wrote:
> 3) I use the PGI compiler. It cannot accept the compiler switch
> "-Wno-variadic-macros", which is
> included in configure script.
>
> btl_usnic_CFLAGS="-Wno-variadic-macros"
Yoinks. I'll fix (that flag is only intended for our private copy of v1.6 --
tr
Hi Ralph, I found that openmpi-1.7.4rc1 was already uploaded. So I'd like
to report
3 issues mainly regarding -cpus-per-proc.
1) When I use 2 nodes (node11, node12), which have 8 cores each (= 2 sockets x
4 cores/socket),
it starts to produce the error again as shown below. At least,
openmpi-1.7.4a1
Thank you, Ralph.
I just hope that it helps you to improve the quality of openmpi-1.7 series.
Tetsuya Mishima
Hmmm...okay, I understand the scenario. Must be something in the algo when it
only has one node, so it shouldn't be too hard to track down.
I'm off on travel for a few days, but will return to this when I get back.
Sorry for delay - will try to look at this while I'm gone, but can't promise
any
Hi Ralph, sorry for the confusion.
We usually logon to "manage", which is our control node.
From manage, we submit a job or enter a remote node such as
node03 by torque interactive mode(qsub -I).
At that time, instead of torque, I just did rsh to node03 from manage
and ran myprog on the node. I hope
On Dec 10, 2013, at 6:05 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph,
I tried again with -cpus-per-proc 2 as shown below.
Here, I found that "-map-by socket:span" worked well.
[mishima@node03 demos]$ mpirun -np 8 -report-bindings -cpus-per-proc 2
-map-by socket:span myprog
[node03.cluster:10879] MCW rank 2 bound to socket 1[core 8[hwt 0]], socket
1[core
Hmmm...that's strange. I only have 2 sockets on my system, but let me poke
around a bit and see what might be happening.
On Dec 10, 2013, at 4:47 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph,
Thanks. I didn't know the meaning of "socket:span".
But it still causes the problem; it seems that socket:span doesn't work.
[mishima@manage demos]$ qsub -I -l nodes=node03:ppn=32
qsub: waiting for job 8265.manage.cluster to start
qsub: job 8265.manage.cluster ready
[mishima@node03 ~]
No, that is actually correct. We map a socket until full, then move to the
next. What you want is --map-by socket:span
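The difference between the two policies is easy to picture with a toy mapper in shell (pure illustration, not Open MPI code; 4 sockets with 4 two-core slots each is an assumption based on the 32-core node):

```shell
# Toy illustration for 8 ranks on 4 sockets:
# default "map-by socket" fills a socket before moving to the next;
# "socket:span" round-robins ranks across every socket in the allocation.
nprocs=8; nsockets=4; slots_per_socket=4
echo "fill-first:"
rank=0
while [ "$rank" -lt "$nprocs" ]; do
  echo "  rank $rank -> socket $((rank / slots_per_socket))"
  rank=$((rank + 1))
done
echo "span:"
rank=0
while [ "$rank" -lt "$nprocs" ]; do
  echo "  rank $rank -> socket $((rank % nsockets))"
  rank=$((rank + 1))
done
```

With fill-first, ranks 0-3 all land on socket 0; with span, consecutive ranks spread across sockets 0, 1, 2, 3 and wrap around.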
On Dec 10, 2013, at 3:42 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph,
I had a time to try your patch yesterday using openmpi-1.7.4a1r29646.
It stopped the error but unfortunately "mapping by socket" itself didn't
work well as shown below:
[mishima@manage demos]$ qsub -I -l nodes=1:ppn=32
qsub: waiting for job 8260.manage.cluster to start
qsub: job 826
Hi Ralph,
Thank you for providing the fix. I'll check it in 1.7.4.
Regards,
Tetsuya Mishima
I fixed this under the trunk (was an issue regardless of RM) and have scheduled
it for 1.7.4.
Thanks!
Ralph
On Nov 25, 2013, at 4:22 PM, tmish...@jcity.maeda.co.jp wrote:
Hi,
Here is the output of "printenv | grep PBS". It seems that all variables
are set as I expected.
[mishima@manage mpi_demo]$ qsub -I -l nodes=1:ppn=32
qsub: waiting for job 8120.manage.cluster to start
qsub: job 8120.manage.cluster ready
[mishima@node03 ~]$ printenv | grep PBS
PBS_VERSION=TO
Hi,
I used interactive mode just because it was easy to report the behavior.
I'm sure that submitting a job gives the same result.
Therefore, I think the environment variables are also set in the session.
Anyway, I'm away from the cluster now. Regarding "$ env | grep PBS",
I'll send it later.
Reg
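A batch equivalent of the interactive test would be a short job script (a sketch under stated assumptions: the original script is not shown, and the resource line mirrors the interactive request). The same PBS_* variables should appear in a batch session as under `qsub -I`:

```
#!/bin/sh
#PBS -l nodes=1:ppn=32
# Illustrative job script (assumed, not from the report):
# dump the Torque environment the same way as the interactive check.
printenv | grep PBS
```

Submitted with `qsub <scriptname>`, the job's output file would then contain the PBS variable listing for comparison.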
Hi,
On 26.11.2013 at 01:22, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph,
Thank you very much for your quick response.
I'm afraid to say that I found one more issue...
It's not so serious. Please check it when you have a lot of time.
The problem is cpus-per-proc with -map-by option under Torque manager.
It doesn't work as shown below. I guess you can get
Fixed and scheduled to move to 1.7.4. Thanks again!
On Nov 17, 2013, at 6:11 PM, Ralph Castain wrote:
Thanks! That's precisely where I was going to look when I had time :-)
I'll update tomorrow.
Ralph
On Sun, Nov 17, 2013 at 7:01 PM, wrote:
>
>
> Hi Ralph,
>
> This is the continuous story of "Segmentation fault in oob_tcp.c of
> openmpi-1.7.4a1r29646".
>
> I found the cause.
>
> Firstly, I n