Thanks to the several people who tried to help diagnose this and shared their
thoughts on this thread. That gave me more clues, and the courage, to talk
back to our vendor.
My question on the Torque list is still pending replies...
Best regards to you all,
Zhiliang
At 11:22 AM 9/30/2008 +1...
On Mon, 2008-09-29 at 17:30 -0500, Zhiliang Hu wrote:
> >As you blank out some addresses: have the nodes and the headnode one
> >or two network cards installed? All the names like node001 et al. are
> >known on each node by the correct address? I.e. 172.16.100.1 = node001?
> >
> >-- Reuti
>
On 30.09.2008 at 00:30, Zhiliang Hu wrote:
At 12:10 AM 9/30/2008 +0200, you wrote:
Can you please try this jobscript instead:
#!/bin/sh
set | grep PBS
/path/to/mpirun /path/to/my_program
All should be handled by Open MPI automatically. With the "set" bash
command you will get a list with all...
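For reference, that check written out as a complete jobscript; a minimal
sketch, where the #PBS directive, script name, and paths are illustrative:
#!/bin/sh
#PBS -l nodes=6:ppn=2
# Dump every PBS_* variable Torque should have set inside the job:
set | grep PBS
# With tm support compiled in, no -np and no machinefile are needed here:
/path/to/mpirun /path/to/my_program
Submit it with "qsub jobscript.sh" rather than handing mpirun to qsub directly.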
On 29.09.2008 at 23:15, Doug Reeder wrote:
It sounds like you may not have set up passwordless ssh between all
your nodes.
If you have a tight integration of Open MPI and use the task manager
from Torque, this shouldn't be necessary.
Continued below...
Doug Reeder
On Sep 29, 2008, at 2:12 PM, Zhiliang Hu wrote:
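Whether that tight integration is even available can be checked directly on
the head node; a minimal sketch (the exact output wording varies by Open MPI
version):
# List the Torque/PBS (tm) components compiled into Open MPI:
ompi_info | grep tm
# Lines mentioning tm components (e.g. "MCA ras: tm") mean mpirun can
# launch through the Torque task manager; no output suggests Open MPI
# was built without --with-tm and will fall back to ssh.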
At 02:15 PM 9/29/2008 -0700, you wrote:
>It sounds like you may not have set up passwordless ssh between all
>your nodes.
>
>Doug Reeder
That's not the case: passwordless ssh is set up and it works fine
-- that's how I can do "mpirun -np 6 -machinefile .." fine.
Zhiliang
>On Sep 29, 2008, at 2:12 PM, Zhiliang Hu wrote:
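The distinction being drawn there, as a minimal sketch (the paths and the
machinefile name are placeholders):
# Standalone: the user supplies the host list and process count by hand.
/path/to/mpirun -np 6 -machinefile ./nodes /path/to/my_program
# Inside a Torque job with tight integration: both come from the
# allocation, so neither flag is given.
/path/to/mpirun /path/to/my_program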
It sounds like you may not have set up passwordless ssh between all
your nodes.
Doug Reeder
On Sep 29, 2008, at 2:12 PM, Zhiliang Hu wrote:
At 10:45 PM 9/29/2008 +0200, you wrote:
On 29.09.2008 at 22:33, Zhiliang Hu wrote:
At 07:37 PM 9/29/2008 +0200, Reuti wrote:
"-l nodes=6:ppn=2" is al
On 29.09.2008 at 22:33, Zhiliang Hu wrote:
At 07:37 PM 9/29/2008 +0200, Reuti wrote:
"-l nodes=6:ppn=2" is all I have to specify the node requests:
this might help: http://www.open-mpi.org/faq/?category=tm
Essentially the examples given on that web page are no different from
what I did.
Only...
At 07:37 PM 9/29/2008 +0200, Reuti wrote:
>>"-l nodes=6:ppn=2" is all I have to specify the node requests:
>
>this might help: http://www.open-mpi.org/faq/?category=tm
Essentially the examples given on that web page are no different from what I did.
Only thing new is, I suppose "qsub -I" is for interactive...
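"qsub -I" gives a quick way to poke at the PBS environment by hand; a
minimal sketch, reusing the resource request from the thread:
# Request an interactive job; the prompt that returns runs *inside*
# the allocation, on the first allocated node.
qsub -I -l nodes=6:ppn=2
# At that prompt the PBS_* variables should be visible:
printenv | grep PBS
cat $PBS_NODEFILE    # one line per allocated slot (12 here)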
Hi,
On 29.09.2008 at 19:06, Zhiliang Hu wrote:
At 06:55 PM 9/29/2008 +0200, Reuti wrote:
On 29.09.2008 at 18:27, Zhiliang Hu wrote:
How do you run that command line from *inside a Torque* job?
-- I am only a poor biologist, reading through the manuals/
tutorials but still don't have good clues... (thanks in advance ;-)
At 06:55 PM 9/29/2008 +0200, Reuti wrote:
>On 29.09.2008 at 18:27, Zhiliang Hu wrote:
>
>>How do you run that command line from *inside a Torque* job?
>>
>>-- I am only a poor biologist, reading through the manuals/ tutorials but
>>still don't have good clues... (thanks in advance ;-)
>
>What is the content of your jobscript? Did you request more than...
I am the "system admin" here (so far so good on several servers over several
years, but this PBS thing appears to be daunting ;-)
I suppose **run ... from *inside a Torque*** means to run things with a PBS
script. I thought "qsub -l nodes=6:ppn=2 mpirun ..." already brings things into
a PBS environment...
On 29.09.2008 at 18:27, Zhiliang Hu wrote:
How do you run that command line from *inside a Torque* job?
-- I am only a poor biologist, reading through the manuals/
tutorials but still don't have good clues... (thanks in advance ;-)
What is the content of your jobscript? Did you request more than...
On Sep 29, 2008, at 12:27 PM, Zhiliang Hu wrote:
How do you run that command line from *inside a Torque* job?
-- I am only a poor biologist, reading through the manuals/tutorials
but still don't have good clues... (thanks in advance ;-)
Ah, gotcha.
I'm guessing that you're running OMPI outside...
How do you run that command line from *inside a Torque* job?
-- I am only a poor biologist, reading through the manuals/tutorials but still
don't have good clues... (thanks in advance ;-)
Zhiliang
At 11:48 AM 9/29/2008 -0400, you wrote:
>We need to see that command line from *inside a Torque* job
We need to see that command line from *inside a Torque* job. That's
the only place where those PBS_* environment variables will exist --
OMPI's mpirun should be seeing these environment variables (when
inside a Torque job) and then reacting to them by using the Torque
native launcher, etc.
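What that reaction looks like from the jobscript side, as a minimal sketch
(the paths are placeholders):
#!/bin/sh
# Torque records the allocated hosts, one line per slot, in $PBS_NODEFILE:
echo "Allocation for job $PBS_JOBID:"
cat "$PBS_NODEFILE"
# Seeing the PBS_* variables, mpirun launches through the Torque TM
# interface instead of ssh; no -np, no machinefile:
/path/to/mpirun /path/to/my_program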
At 11:29 AM 9/29/2008 -0400, Jeff Squyres wrote:
>On Sep 28, 2008, at 10:07 PM, Zhiliang Hu wrote:
>
>>Indeed as you expected, "printenv | grep PBS" produced nothing.
>
>Are you *sure*? I find it very hard to believe that if you run that
>command ***in a Torque job*** you will get no output...
On Sep 28, 2008, at 10:07 PM, Zhiliang Hu wrote:
Indeed as you expected, "printenv | grep PBS" produced nothing.
Are you *sure*? I find it very hard to believe that if you run that
command ***in a Torque job*** you will get no output. Torque
would have to be *seriously* misbehaving...
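A minimal way to settle that question, assuming a script name of
pbs_check.sh (a placeholder):
cat > pbs_check.sh <<'EOF'
#!/bin/sh
printenv | grep PBS
EOF
qsub pbs_check.sh
# Torque writes the job's stdout to pbs_check.sh.o<jobid> once it
# finishes; a healthy setup fills it with PBS_JOBID, PBS_NODEFILE, etc.
cat pbs_check.sh.o*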
Hi Zhiliang
This has nothing to do with how you configured Open MPI. The issue is
that your Torque queue manager isn't setting the expected environment
variables to tell us the allocation. I'm not sure why it wouldn't be
doing so, and I'm afraid I'm not enough of a Torque person to know how...
Ralph,
Thank you for your quick response.
Indeed as you expected, "printenv | grep PBS" produced nothing.
BTW, I have:
> qmgr -c 'p s'
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default ...
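Two more server-side checks worth sending along with that qmgr output; a
minimal sketch, run on the head node:
# Full server and queue configuration, including any resources_default:
qmgr -c 'print server'
# Node states as the server sees them; nodes must be free, not
# down or offline, for an allocation to start:
pbsnodes -a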
Hi Zhiliang
First thing to check is that your Torque system is defining and
setting the environment variables we are expecting in a Torque
system. It is quite possible that your Torque system isn't configured
as we expect.
Can you run a job and send us the output from "printenv | grep PBS"?
I have asked this question on the TorqueUsers list. Responses from that list
suggest that the question be asked on this list:
The situation is:
I can submit my jobs as in:
> qsub -l nodes=6:ppn=2 /path/to/mpi_program
where "mpi_program" is:
/path/to/mpirun -np 12 /path/to/my_program
-- however e...
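For the record, the shape the earlier replies point toward: keep the wrapper
script, but drop the hand-set process count and let the nodes=6:ppn=2
allocation supply the 12 slots. A minimal sketch using the paths from the post:
#!/bin/sh
# mpi_program, submitted as: qsub -l nodes=6:ppn=2 /path/to/mpi_program
# Under tight integration Open MPI reads the 6x2=12-slot allocation
# from Torque, so -np 12 and any machinefile are omitted:
/path/to/mpirun /path/to/my_program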