Tom and I talked more about this off list, and I eventually logged in to his
cluster to see what I could see.
The issue turned out not to be related to SGE or THREAD_MULTIPLE at all. The
problem was that RHEL6, by default, activated a virtualization IP interface on
all of Tom's nodes. All nodes
On 08.02.2012 at 22:52, Tom Bryan wrote:
>
> Yes, this should work across multiple machines. And it's using `qrsh
> -inherit ...`, so it's failing somewhere in Open MPI - is it working
> with 1.4.4?
>>>
>>> I'm not sure. We no longer have our 1.4 test environment, so I'm in the
>>> process
On 2/8/12 4:52 PM, "Tom Bryan" wrote:
> Got it. Unfortunately, we *definitely* need THREAD_MULTIPLE in our case.
> I rebuilt my code against 1.4.4.
>
> When I run my test "e" from before, which is basically just
> mpiexec -np 1 ./mpitest
> I get the following [errors]
Talking to Jeff, it soun
On 2/6/12 5:10 PM, "Reuti" wrote:
> On 06.02.2012 at 22:28, Tom Bryan wrote:
>
>> On 2/6/12 8:14 AM, "Reuti" wrote:
>>
If I need MPI_THREAD_MULTIPLE, and Open MPI is compiled with thread support,
it's not clear to me whether MPI::Init_thread() and
MPI::Init_thread(MPI::THREAD_
On 06.02.2012 at 22:28, Tom Bryan wrote:
> On 2/6/12 8:14 AM, "Reuti" wrote:
>
>>> If I need MPI_THREAD_MULTIPLE, and Open MPI is compiled with thread support,
>>> it's not clear to me whether MPI::Init_thread() and
>>> MPI::Init_thread(MPI::THREAD_MULTIPLE) would give me the same behavior from
On 2/6/12 8:14 AM, "Reuti" wrote:
>> If I need MPI_THREAD_MULTIPLE, and Open MPI is compiled with thread support,
>> it's not clear to me whether MPI::Init_thread() and
>> MPI::Init_thread(MPI::THREAD_MULTIPLE) would give me the same behavior from
>> Open MPI.
>
> If you need thread support, you
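For reference, a minimal sketch of the call being discussed (not code from
this thread; everything here is illustrative). The point is that
MPI::Init_thread takes the thread level you require and returns the level
Open MPI actually provides, so the program can verify that THREAD_MULTIPLE
was really granted:

  #include <mpi.h>
  #include <cstdio>

  int main(int argc, char **argv)
  {
      // Request full multithreading; the return value is what the library
      // actually provides and may be lower than what was requested.
      int provided = MPI::Init_thread(argc, argv, MPI::THREAD_MULTIPLE);
      if (provided < MPI::THREAD_MULTIPLE)
          std::printf("warning: only thread level %d provided\n", provided);
      // ... MPI work (e.g. Spawn_multiple) goes here ...
      MPI::Finalize();
      return 0;
  }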
On 04.02.2012 at 00:15, Tom Bryan wrote:
> OK. I misunderstood you. I thought that you were saying that spawn_multiple
> had to call mpiexec for each spawned process. If you just meant that mpi.sh
> should launch the initial process with mpiexec, that seems reasonable. I
> tried it with and wi
On Feb 5, 2012, at 6:51 AM, Reuti wrote:
> Hi,
>
>>> Not sure whether I get it right. When I launch the same application with:
>>>
>>> "mpiexec -np1 ./Mpitest" (and get an allocation of 2+2 on the two machines):
>>>
>>> 27422 ?   Sl    4:12 /usr/sge/bin/lx24-x86/sge_execd
>>> 9504 ?
Hi,
>> Not sure whether I get it right. When I launch the same application with:
>>
>> "mpiexec -np1 ./Mpitest" (and get an allocation of 2+2 on the two machines):
>>
>> 27422 ?   Sl    4:12 /usr/sge/bin/lx24-x86/sge_execd
>>  9504 ?   S     0:00  \_ sge_shepherd-3791 -bg
>> 9506 ?
On 04.02.2012 at 00:15, Tom Bryan wrote:
A more detailed answer later, as it's late here. But one short note:
-pe orte 5 => give me exactly 5 slots
-pe orte 5-5 => the same
-pe orte 5- => give me at least 5 slots, up to the maximum you can get right
now in the cluster
The output in `qstat -g
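To make the slot arithmetic concrete, a hypothetical job script in the style
discussed in this thread (the script name mpi.sh and the test binary appear
elsewhere in the thread; the slot count of 5 is just an example):

  #!/bin/sh
  # submitted with: qsub mpi.sh
  #$ -pe orte 5      # exactly 5 slots: the master plus 4 children to spawn
  #$ -cwd
  mpiexec -np 1 ./mpitest   # launch only the master; Spawn_multiple starts the rest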
OK. Sorry for the delay. I needed to read through this thread a few times
and try some experiments. Let me reply to a few of these pieces, and then
I'll talk about those experiments.
On 1/31/12 9:26 AM, "Reuti" wrote:
>>> I never used spawn_multiple, but isn't it necessary to start it with mpi
FWIW: I have fixed this on the developer's trunk, and Jeff has scheduled it
for the upcoming 1.6 release (when the 1.5 series rolls over). I don't
expect we'll backport it to 1.4 unless someone really needs it there.
Thanks!
Ralph
On Feb 1, 2012, at 9:31 AM, Ralph Castain wrote:
> Ah -
Ah - crud. Looks like the default-hostfile mca param isn't getting set to the
default value. Will resolve - thanks!
On Feb 1, 2012, at 9:28 AM, Reuti wrote:
> On 01.02.2012 at 17:16, Ralph Castain wrote:
>
>> Could you add --display-allocation to your cmd line? This will tell us if it
>> foun
On 01.02.2012 at 17:16, Ralph Castain wrote:
> Could you add --display-allocation to your cmd line? This will tell us if it
> found/read the default hostfile, or if the problem is with the mapper.
Sure:
reuti@pc15370:~> mpiexec --display-allocation -np 4 ./mpihello
== AL
Could you add --display-allocation to your cmd line? This will tell us if it
found/read the default hostfile, or if the problem is with the mapper.
On Feb 1, 2012, at 7:58 AM, Reuti wrote:
> On 01.02.2012 at 15:38, Ralph Castain wrote:
>
>> On Feb 1, 2012, at 3:49 AM, Reuti wrote:
>>
>>> On
On 01.02.2012 at 15:38, Ralph Castain wrote:
> On Feb 1, 2012, at 3:49 AM, Reuti wrote:
>
>> On 31.01.2012 at 21:25, Ralph Castain wrote:
>>
>>> On Jan 31, 2012, at 12:58 PM, Reuti wrote:
>>
>> BTW: is there any default for a hostfile for Open MPI - I mean any in my
>> home directory or /etc
On Feb 1, 2012, at 3:49 AM, Reuti wrote:
> On 31.01.2012 at 21:25, Ralph Castain wrote:
>
>>
>> On Jan 31, 2012, at 12:58 PM, Reuti wrote:
>
> BTW: is there any default for a hostfile for Open MPI - I mean any in my home
> directory or /etc? When I check `man orte_hosts`, and all possible op
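For readers following the hostfile question: the default hostfile Ralph
refers to is controlled by an MCA parameter (which normally points at
$prefix/etc/openmpi-default-hostfile), not by anything in $HOME or /etc.
A hedged sketch of the two usual ways to point Open MPI at one - the path
below is just an example:

  mpiexec --mca orte_default_hostfile /path/to/hostfile -np 4 ./mpihello

or, system-wide, one line in $prefix/etc/openmpi-mca-params.conf:

  orte_default_hostfile = /path/to/hostfile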
On 31.01.2012 at 21:25, Ralph Castain wrote:
>
> On Jan 31, 2012, at 12:58 PM, Reuti wrote:
>
>>
>> On 31.01.2012 at 20:38, Ralph Castain wrote:
>>
>>> Not sure I fully grok this thread, but will try to provide an answer.
>>>
>>> When you start a singleton, it spawns off a daemon that is th
On Jan 31, 2012, at 12:58 PM, Reuti wrote:
>
> On 31.01.2012 at 20:38, Ralph Castain wrote:
>
>> Not sure I fully grok this thread, but will try to provide an answer.
>>
>> When you start a singleton, it spawns off a daemon that is the equivalent of
>> "mpirun". This daemon is created for th
On 31.01.2012 at 20:38, Ralph Castain wrote:
> Not sure I fully grok this thread, but will try to provide an answer.
>
> When you start a singleton, it spawns off a daemon that is the equivalent of
> "mpirun". This daemon is created for the express purpose of allowing the
> singleton to use M
Not sure I fully grok this thread, but will try to provide an answer.
When you start a singleton, it spawns off a daemon that is the equivalent of
"mpirun". This daemon is created for the express purpose of allowing the
singleton to use MPI dynamics like comm_spawn - without it, the singleton wo
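A short illustration of the two launch modes being contrasted (the binary
name is taken from elsewhere in the thread; this is a sketch, not output from
the cluster in question):

  mpiexec -np 1 ./mpitest   # mpirun itself is the daemon (HNP) serving Spawn_multiple
  ./mpitest                 # singleton: the process forks its own orted to play that role

In both cases the spawned children still have to be mapped onto the SGE
allocation, which is where the hostfile and allocation questions elsewhere in
this thread come in.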
On 31.01.2012 at 20:12, Jeff Squyres wrote:
> I only noticed after the fact that Tom is also here at Cisco (it's a big
> company, after all :-) ).
>
> I've contacted him using our proprietary super-secret Cisco handshake (i.e.,
> the internal phone network); I'll see if I can figure out the is
I only noticed after the fact that Tom is also here at Cisco (it's a big
company, after all :-) ).
I've contacted him using our proprietary super-secret Cisco handshake (i.e.,
the internal phone network); I'll see if I can figure out the issues off-list.
On Jan 31, 2012, at 1:08 PM, Dave Love
Reuti writes:
> Maybe it's a side effect of the tight integration that it starts on
> the correct nodes (but I see an incorrect allocation of slots and an
> error message at the end if it is started without mpiexec), since in this
> case there is no command-line option for the hostfile. How to get the
>
On 31.01.2012 at 05:33, Tom Bryan wrote:
>> If you want to start 4 additional tasks, you would need 5 in total from
>> SGE.
>
> OK, thanks. I'll try other values.
BTW: there is a setting in the PE definition to allow one additional task:
$ qconf -sp openmpi
...
job_is_first_task FALSE
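For context, a complete tight-integration PE typically looks roughly like the
following sketch; apart from job_is_first_task, none of these values are
quoted from this thread:

  $ qconf -sp openmpi
  pe_name            openmpi
  slots              999
  allocation_rule    $round_robin
  control_slaves     TRUE
  job_is_first_task  FALSE
  ...

control_slaves TRUE is what allows Open MPI to start its daemons via
`qrsh -inherit`, and job_is_first_task FALSE tells SGE not to count the
master job script against the granted slots - the "one additional task"
mentioned above.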
On 31.01.2012 at 06:33, Rayson Ho wrote:
> On Mon, Jan 30, 2012 at 11:33 PM, Tom Bryan wrote:
>> For our use, yes, spawn_multiple makes sense. We won't be spawning lots and
> lots of jobs in quick succession. We're using MPI as a robust way to get
>> IPC as we spawn multiple child processes
On Mon, Jan 30, 2012 at 11:33 PM, Tom Bryan wrote:
> For our use, yes, spawn_multiple makes sense. We won't be spawning lots and
> lots of jobs in quick succession. We're using MPI as a robust way to get
> IPC as we spawn multiple child processes while using SGE to help us with
> load balancing
On 1/29/12 5:44 PM, "Reuti" wrote:
> you compiled Open MPI --with-sge I assume, as the above is working - fine.
Yes, we compiled --with-sge.
>> #$ -pe orte 1-
>
> This number should match the processes you want to start plus one for the master.
> Otherwise SGE might refuse to start a process on a
On 27.01.2012 at 23:19, Tom Bryan wrote:
> I am in the process of setting up a grid engine (SGE) cluster for running
> Open MPI applications. I'll detail the setup below, but my current problem
> is that this call to Spawn_multiple never seems to return.
>
> // Spawn all of the children process
I am in the process of setting up a grid engine (SGE) cluster for running
Open MPI applications. I'll detail the setup below, but my current problem
is that this call to Spawn_multiple never seems to return.
// Spawn all of the children processes.
_intercomm = MPI::COMM_WORLD.Spawn_multiple( _nPr
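Since the call above is cut off, here is a minimal, self-contained sketch of
the Spawn_multiple pattern being described. The child binary names, counts,
and lack of arguments are hypothetical and only meant to show the shape of
the call, not Tom's actual code:

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI::Init_thread(argc, argv, MPI::THREAD_MULTIPLE);

      // Spawn two copies each of two (made-up) child executables.
      const int nCommands = 2;
      const char *commands[nCommands]  = { "./child_a", "./child_b" };
      const int   maxProcs[nCommands]  = { 2, 2 };
      const MPI::Info infos[nCommands] = { MPI::INFO_NULL, MPI::INFO_NULL };

      // Collective over COMM_WORLD; returns an intercommunicator whose
      // remote group is the set of newly spawned children.
      MPI::Intercomm children =
          MPI::COMM_WORLD.Spawn_multiple(nCommands, commands, MPI::ARGVS_NULL,
                                         maxProcs, infos, 0 /* root */);

      // ... exchange messages with the children over 'children' ...
      children.Disconnect();
      MPI::Finalize();
      return 0;
  }

The children themselves call MPI::Init (or Init_thread) as usual and can
reach the parent through MPI::Comm::Get_parent().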