Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Zhiliang Hu
Thanks to the several people who helped diagnose this and shared their thoughts on this thread. That gave me more clues, and the courage to go back to our vendor. My question on the Torque list is still awaiting replies... Best regards to you all, Zhiliang At 11:22 AM 9/30/2008 +1

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Terry Frankcombe
On Mon, 2008-09-29 at 17:30 -0500, Zhiliang Hu wrote: > >As you blank out some addresses: do the nodes and the headnode have one > >or two network cards installed? Are all the names like node001 et al. > >known on each node by the correct address? I.e. 172.16.100.1 = node001? > > > >-- Reuti >

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Reuti
On 30.09.2008 at 00:30, Zhiliang Hu wrote: At 12:10 AM 9/30/2008 +0200, you wrote: Can you please try this jobscript instead: #!/bin/sh set | grep PBS /path/to/mpirun /path/to/my_program All should be handled by Open MPI automatically. With the "set" bash command you will get a list with al
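
The suggested jobscript can be sketched as a file written by a small helper. This is only an illustration under stated assumptions: the helper name write_jobscript and the file name job.sh are hypothetical, and the "#PBS -l nodes=6:ppn=2" directive is an assumption that moves the resource request used elsewhere in this thread into the script itself. No -np is passed to mpirun, in line with the remark that Open MPI should handle the allocation automatically.

```shell
#!/bin/sh
# Hypothetical helper: write a Torque jobscript to the file named in $1.
# $2 is the path to mpirun, $3 the path to the MPI program.
write_jobscript() {
    cat > "$1" <<EOF
#!/bin/sh
#PBS -l nodes=6:ppn=2
set | grep PBS
$2 $3
EOF
}

# Generate the script with the placeholder paths used in the thread.
write_jobscript job.sh /path/to/mpirun /path/to/my_program
cat job.sh
```

The generated file would then be submitted with "qsub job.sh"; the "set | grep PBS" line should list the PBS_* variables in the job's output file if Torque is exporting them.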

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Zhiliang Hu
At 12:10 AM 9/30/2008 +0200, you wrote: >>Can you please try this jobscript instead: >> >>#!/bin/sh >>set | grep PBS >>/path/to/mpirun /path/to/my_program >> >>All should be handled by Open MPI automatically. With the "set" >>bash >>command you will get a list

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Reuti
On 29.09.2008 at 23:15, Doug Reeder wrote: It sounds like you may not have set up passwordless ssh between all your nodes. If you have a tight integration of Open MPI and use the task manager from Torque, this shouldn't be necessary. Continued below... Doug Reeder On Sep 29, 2008, at 2:

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Jeff Squyres

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Zhiliang Hu
At 02:15 PM 9/29/2008 -0700, you wrote: >It sounds like you may not have set up passwordless ssh between all >your nodes. > >Doug Reeder That's not the case. Passwordless ssh is set up and it works fine -- that's how I can do "mpirun -np 6 -machinefiles .." fine. Zhiliang >On Sep 29, 2008,

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Doug Reeder
It sounds like you may not have set up passwordless ssh between all your nodes. Doug Reeder On Sep 29, 2008, at 2:12 PM, Zhiliang Hu wrote: At 10:45 PM 9/29/2008 +0200, you wrote: On 29.09.2008 at 22:33, Zhiliang Hu wrote: At 07:37 PM 9/29/2008 +0200, Reuti wrote: "-l nodes=6:ppn=2" is al

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Zhiliang Hu
At 10:45 PM 9/29/2008 +0200, you wrote: >On 29.09.2008 at 22:33, Zhiliang Hu wrote: > >>At 07:37 PM 9/29/2008 +0200, Reuti wrote: >> "-l nodes=6:ppn=2" is all I have to specify the node requests: >>> >>>this might help: http://www.open-mpi.org/faq/?category=tm >> >>Essentially the examples giv

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Reuti
On 29.09.2008 at 22:33, Zhiliang Hu wrote: At 07:37 PM 9/29/2008 +0200, Reuti wrote: "-l nodes=6:ppn=2" is all I have to specify the node requests: this might help: http://www.open-mpi.org/faq/?category=tm Essentially, the examples given on this web page are no different from what I did. Only

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Zhiliang Hu
At 07:37 PM 9/29/2008 +0200, Reuti wrote: >>"-l nodes=6:ppn=2" is all I have to specify the node requests: > >this might help: http://www.open-mpi.org/faq/?category=tm Essentially, the examples given on this web page are no different from what I did. The only new thing is, I suppose "qsub -I " is for inter

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Reuti
Hi, On 29.09.2008 at 19:06, Zhiliang Hu wrote: At 06:55 PM 9/29/2008 +0200, Reuti wrote: On 29.09.2008 at 18:27, Zhiliang Hu wrote: How do you run that command line from *inside a Torque* job? -- I am only a poor biologist, reading through the manuals/tutorials but still don't have good c

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Zhiliang Hu
At 06:55 PM 9/29/2008 +0200, Reuti wrote: >On 29.09.2008 at 18:27, Zhiliang Hu wrote: > >>How do you run that command line from *inside a Torque* job? >> >>-- I am only a poor biologist, reading through the manuals/tutorials but >>still don't have good clues... (thanks in advance ;-) > >What is the

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Zhiliang Hu
I am the "system admin" here (so far so good on several servers over several years, but this PBS thing appears to be daunting ;-) I suppose **run ... from *inside a Torque*** is to run things with a PBS script. I thought "qsub -l nodes=6:ppn=2 mpirun ..." already brings things into a PBS environm

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Reuti
On 29.09.2008 at 18:27, Zhiliang Hu wrote: How do you run that command line from *inside a Torque* job? -- I am only a poor biologist, reading through the manuals/tutorials but still don't have good clues... (thanks in advance ;-) What is the content of your jobscript? Did you request more th

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Jeff Squyres
On Sep 29, 2008, at 12:27 PM, Zhiliang Hu wrote: How do you run that command line from *inside a Torque* job? -- I am only a poor biologist, reading through the manuals/tutorials but still don't have good clues... (thanks in advance ;-) Ah, gotcha. I'm guessing that you're running OMPI outsid

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Zhiliang Hu
How do you run that command line from *inside a Torque* job? -- I am only a poor biologist, reading through the manuals/tutorials but still don't have good clues... (thanks in advance ;-) Zhiliang At 11:48 AM 9/29/2008 -0400, you wrote: >We need to see that command line from *inside a Torque* job

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Jeff Squyres
We need to see that command line from *inside a Torque* job. That's the only place where those PBS_* environment variables will exist -- OMPI's mpirun should be seeing these environment variables (when inside a Torque job) and then reacting to them by using the Torque native launcher, etc
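
As a minimal sketch of the check being asked for, the snippet below tests for two variables that Torque normally exports to a running job, PBS_JOBID and PBS_NODEFILE. The function name in_torque_job is hypothetical, and the exact set of PBS_* variables may differ between Torque installations, so treat this as an illustration rather than a definitive test.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the shell appears to be running
# inside a Torque/PBS job, judged by two variables Torque usually sets.
in_torque_job() {
    [ -n "${PBS_JOBID:-}" ] && [ -n "${PBS_NODEFILE:-}" ]
}

if in_torque_job; then
    echo "inside Torque job $PBS_JOBID; node list in $PBS_NODEFILE"
else
    echo "no PBS_* variables found; mpirun cannot see a Torque allocation"
fi
```

Run from a login shell, this should take the second branch; the result only means something when the snippet is part of a script submitted with qsub.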

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Zhiliang Hu
At 11:29 AM 9/29/2008 -0400, Jeff Squyres wrote: >On Sep 28, 2008, at 10:07 PM, Zhiliang Hu wrote: > >>Indeed as you expected, "printenv | grep PBS" produced nothing. > >Are you *sure*? I find it very hard to believe that if you run that >command ***in a Torque job*** that you will get no output

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Jeff Squyres
On Sep 28, 2008, at 10:07 PM, Zhiliang Hu wrote: Indeed as you expected, "printenv | grep PBS" produced nothing. Are you *sure*? I find it very hard to believe that if you run that command ***in a Torque job*** that you will get no output. Torque would have to be *seriously* misbehaving

Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Ralph Castain
Hi Zhiliang This has nothing to do with how you configured Open MPI. The issue is that your Torque queue manager isn't setting the expected environment variables to tell us the allocation. I'm not sure why it wouldn't be doing so, and I'm afraid I'm not enough of a Torque person to know how

Re: [OMPI users] qsub - mpirun problem

2008-09-28 Thread Zhiliang Hu
Ralph, Thank you for your quick response. Indeed as you expected, "printenv | grep PBS" produced nothing. BTW, I have: > qmgr -c 'p s' # Create queues and set their attributes. # # # Create and define queue default # create queue default set queue default queue_type = Execution set queue defau

Re: [OMPI users] qsub - mpirun problem

2008-09-28 Thread Ralph Castain
Hi Zhiliang First thing to check is that your Torque system is defining and setting the environment variables we are expecting in a Torque system. It is quite possible that your Torque system isn't configured as we expect. Can you run a job and send us the output from "printenv | grep P

[OMPI users] qsub - mpirun problem

2008-09-28 Thread Zhiliang Hu
I have asked this question on the TorqueUsers list. Responses from that list suggest that the question be asked on this list: The situation is: I can submit my jobs as in: > qsub -l nodes=6:ppn=2 /path/to/mpi_program where "mpi_program" is: /path/to/mpirun -np 12 /path/to/my_program -- however e
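
The arithmetic behind the request above is 6 nodes times 2 processors per node, which gives the 12 processes passed to -np. As a hedged sketch, assuming Torque writes one line per allocated slot into the file named by PBS_NODEFILE, that count can be derived instead of hard-coded. The helper count_slots and the stand-in node file built here are hypothetical and only mimic what a real job would see.

```shell
#!/bin/sh
# Hypothetical helper: count the non-empty lines of a node file,
# i.e. the number of process slots it describes.
count_slots() {
    grep -c . "$1"
}

# Stand-in for $PBS_NODEFILE with nodes=6:ppn=2: each of six
# node names appears twice, once per processor slot.
nodefile=$(mktemp)
for n in 1 2 3 4 5 6; do
    echo "node00$n"
    echo "node00$n"
done > "$nodefile"

echo "slots: $(count_slots "$nodefile")"
rm -f "$nodefile"
```

Inside a real job the same helper could feed mpirun, for example /path/to/mpirun -np "$(count_slots "$PBS_NODEFILE")" /path/to/my_program, although an Open MPI build with Torque support is expected to pick up the allocation without -np.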