Hooray!
On Dec 19, 2013, at 10:14 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph,
Thank you for your fix. It works for me.
Tetsuya Mishima
> On Dec 19, 201
Actually, it looks like it would happen with hetero-nodes set - only required
that at least two nodes have the same architecture. So you might want to give
the trunk a shot as it may well now be fixed.
On Dec 19, 2013, at 8:35 AM, Ralph Castain wrote:
Hmmm...not having any luck tracking this down yet. If anything, based on what I
saw in the code, I would have expected it to fail when hetero-nodes was false,
not the other way around.
I'll keep poking around - just wanted to provide an update.
On Dec 19, 2013, at 12:54 AM, tmish...@jcity.maeda
Hi Ralph, sorry for interrupting with another post.
Your advice about -hetero-nodes in the other thread gave me a hint.
I already put "orte_hetero_nodes = 1" in my mca-params.conf, because
you told me a month ago that my environment would need this option.
Removing this line from mca-params.conf, then it work
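For reference, that MCA parameter is set in a plain-text params file. A minimal sketch, assuming the per-user location (the site may instead use the system-wide openmpi-mca-params.conf):

```
# $HOME/.openmpi/mca-params.conf (assumed per-user location)
# The line Tetsuya removed to avoid the mapping error:
orte_hetero_nodes = 1
```

The same parameter can also be set for a single run with `-mca orte_hetero_nodes 1` on the mpirun command line, which makes it easier to toggle while testing.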
Yes, it's very strange. But I don't think there's any chance that
I have < 8 actual cores on the node. I guess that you can replicate
it with SLURM, please try it again.
I changed to use node10 and node11, then I got the warning against
node11.
Furthermore, just as an information for you, I tri
Very strange - I can't seem to replicate it. Is there any chance that you have
< 8 actual cores on node12?
On Dec 18, 2013, at 4:53 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph, sorry for confusing you.
At that time, I cut and paste the part of "cat $PBS_NODEFILE".
I guess I didn't paste the last line by my mistake.
I retried the test and below one is exactly what I got when I did the test.
[mishima@manage ~]$ qsub -I -l nodes=node11:ppn=8+node12:ppn=8
qsub:
I removed the debug in #2 - thanks for reporting it
For #1, it actually looks to me like this is correct. If you look at your
allocation, there are only 7 slots being allocated on node12, yet you have
asked for 8 cpus to be assigned (2 procs with 2 cpus/proc). So the warning is
in fact correct
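The slot arithmetic behind that warning can be sketched in a few lines of shell. This is an illustration, not Open MPI code, and the 4 x 2 split is an assumption; the report only establishes that 8 cpus were requested against 7 allocated slots.

```shell
# Sketch of the per-node oversubscription check mpirun performs:
# compare requested cpus (procs x cpus-per-proc) against allocated slots.
slots=7          # slots Torque allocated on node12
np=4             # assumed process count landing on that node
cpus_per_proc=2  # from -cpus-per-proc 2
needed=$((np * cpus_per_proc))
if [ "$needed" -gt "$slots" ]; then
  echo "not enough slots on node12: need $needed, have $slots"
fi
```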
On Dec 18, 2013, at 7:04 PM,
wrote:
> 3) I use the PGI compiler. It cannot accept the compiler switch
> "-Wno-variadic-macros", which is
> included in configure script.
>
> btl_usnic_CFLAGS="-Wno-variadic-macros"
Yoinks. I'll fix (that flag is only intended for our private copy of v1.6 --
tr
Hi Ralph, I found that openmpi-1.7.4rc1 was already uploaded. So I'd like
to report
3 issues mainly regarding -cpus-per-proc.
1) When I use 2 nodes (node11, node12), which have 8 cores each (= 2 sockets x
4 cores/socket),
it starts to produce the error again as shown below. At least,
openmpi-1.7.4a1
Thank you, Ralph.
I just hope that it helps you to improve the quality of openmpi-1.7 series.
Tetsuya Mishima
Hmmm...okay, I understand the scenario. Must be something in the algo when it
only has one node, so it shouldn't be too hard to track down.
I'm off on travel for a few days, but will return to this when I get back.
Sorry for delay - will try to look at this while I'm gone, but can't promise
any
Hi Ralph, sorry for the confusion.
We usually logon to "manage", which is our control node.
From manage, we submit a job or enter a remote node such as
node03 by torque interactive mode(qsub -I).
At that time, instead of torque, I just did rsh to node03 from manage
and ran myprog on the node. I hope
On Dec 10, 2013, at 6:05 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph,
I tried again with -cpus-per-proc 2 as shown below.
Here, I found that "-map-by socket:span" worked well.
[mishima@node03 demos]$ mpirun -np 8 -report-bindings -cpus-per-proc 2
-map-by socket:span myprog
[node03.cluster:10879] MCW rank 2 bound to socket 1[core 8[hwt 0]], socket
1[core
Hmmm...that's strange. I only have 2 sockets on my system, but let me poke
around a bit and see what might be happening.
On Dec 10, 2013, at 4:47 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph,
Thanks. I didn't know the meaning of "socket:span".
But it still causes the problem; it seems that socket:span doesn't work.
[mishima@manage demos]$ qsub -I -l nodes=node03:ppn=32
qsub: waiting for job 8265.manage.cluster to start
qsub: job 8265.manage.cluster ready
[mishima@node03 ~]
No, that is actually correct. We map a socket until full, then move to the
next. What you want is --map-by socket:span
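The difference between the two policies is easy to picture with a toy mapper in shell (pure illustration, not Open MPI code; 4 sockets with 4 two-core slots each is an assumption based on the 32-core node):

```shell
# Toy illustration for 8 ranks on 4 sockets:
# default "map-by socket" fills a socket before moving to the next;
# "socket:span" round-robins ranks across every socket in the allocation.
nprocs=8; nsockets=4; slots_per_socket=4
echo "fill-first:"
rank=0
while [ "$rank" -lt "$nprocs" ]; do
  echo "  rank $rank -> socket $((rank / slots_per_socket))"
  rank=$((rank + 1))
done
echo "span:"
rank=0
while [ "$rank" -lt "$nprocs" ]; do
  echo "  rank $rank -> socket $((rank % nsockets))"
  rank=$((rank + 1))
done
```

With fill-first, ranks 0-3 all land on socket 0; with span, consecutive ranks spread across sockets 0, 1, 2, 3 and wrap around.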
On Dec 10, 2013, at 3:42 PM, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph,
I had a time to try your patch yesterday using openmpi-1.7.4a1r29646.
It stopped the error but unfortunately "mapping by socket" itself didn't
work well as shown below:
[mishima@manage demos]$ qsub -I -l nodes=1:ppn=32
qsub: waiting for job 8260.manage.cluster to start
qsub: job 826
Hi Ralph,
Thank you for providing the fix. I'll check it in 1.7.4.
Regards,
Tetsuya Mishima
I fixed this under the trunk (was an issue regardless of RM) and have scheduled
it for 1.7.4.
Thanks!
Ralph
On Nov 25, 2013, at 4:22 PM, tmish...@jcity.maeda.co.jp wrote:
Hi,
Here is the output of "printenv | grep PBS". It seems that all variables
are set as I expected.
[mishima@manage mpi_demo]$ qsub -I -l nodes=1:ppn=32
qsub: waiting for job 8120.manage.cluster to start
qsub: job 8120.manage.cluster ready
[mishima@node03 ~]$ printenv | grep PBS
PBS_VERSION=TO
Hi,
I used interactive mode just because it was easy to report the behavior.
I'm sure that submitting a job gives the same result.
Therefore, I think the environment variables are also set in the session.
Anyway, I'm away from the cluster now. Regarding "$ env | grep PBS",
I'll send it later.
Reg
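A batch equivalent of the interactive test would be a short job script (a sketch under stated assumptions: the original script is not shown, and the resource line mirrors the interactive request). The same PBS_* variables should appear in a batch session as under `qsub -I`:

```
#!/bin/sh
#PBS -l nodes=1:ppn=32
# Illustrative job script (assumed, not from the report):
# dump the Torque environment the same way as the interactive check.
printenv | grep PBS
```

Submitted with `qsub <scriptname>`, the job's output file would then contain the PBS variable listing for comparison.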
Hi,
On 26.11.2013 at 01:22, tmish...@jcity.maeda.co.jp wrote:
Hi Ralph,
Thank you very much for your quick response.
I'm afraid to say that I found one more issue...
It's not so serious. Please check it when you have a lot of time.
The problem is cpus-per-proc with -map-by option under Torque manager.
It doesn't work as shown below. I guess you can get
Fixed and scheduled to move to 1.7.4. Thanks again!
On Nov 17, 2013, at 6:11 PM, Ralph Castain wrote:
Thanks! That's precisely where I was going to look when I had time :-)
I'll update tomorrow.
Ralph
On Sun, Nov 17, 2013 at 7:01 PM, wrote:
>
>
> Hi Ralph,
>
> This is the continuous story of "Segmentation fault in oob_tcp.c of
> openmpi-1.7.4a1r29646".
>
> I found the cause.
>
> Firstly, I n