Quite possible - the two sections were written by different people several
years apart. I'll take a look and see what can be done.
Thanks!
On Dec 1, 2009, at 8:45 PM, kevin.buck...@ecs.vuw.ac.nz wrote:
>> Interesting - especially since the existing code works quite well over a
>> wide range of
On Tue, Dec 1, 2009 at 18:03, Ralph Castain wrote:
> You may want to check your limits as defined by the shell/system. I can
> also run this for as long as I'm willing to let it run, so something else
> appears to be going on.
>
>
>
Is that with 1.3.3? I found that with 1.3.4 I can run the exampl
> Interesting - especially since the existing code works quite well over a
> wide range of platforms. So I'm not quite so eager to declare it incorrect
> and only working by accident.
>
> However, I would welcome a proposed patch so we can look at it. This is
> always an important area for us, so t
Hi Belaid
PBS loves to read the nodes' list backwards.
If you want to start with WN1,
put it last in the Torque/PBS "nodes" file.
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY,
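As a concrete illustration of the ordering trick above, a Torque/PBS "nodes" file that hands WN1 out first could look like the sketch below (the file path and np counts are assumptions, not from the original post):
# e.g. /var/spool/torque/server_priv/nodes -- WN1 listed last so PBS starts with it
WN2 np=1
WN1 np=1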
Hi Belaid
Belaid MOA wrote:
You made my day Gus! Thank you very much.
I'm glad it helped.
I hope it is working for you now.
If I had asked earlier, I would
have finished within two hours
(but I guess that's part of the learning process).
Oh, well, that's nothing to worry about.
On these mailin
Interesting - especially since the existing code works quite well over a wide
range of platforms. So I'm not quite so eager to declare it incorrect and only
working by accident.
However, I would welcome a proposed patch so we can look at it. This is always
an important area for us, so the more
>> I assume that both of you have seen the reply from Aleksej Saushev,
>> who seems to be the bloke looking after the port of OpenMPI to the
>> NetBSD platform.
>>
>>
>> Aleksej suggested some mods he had partially looked at, in
>>
>> opal/util/if.c
>
> Nope - didn't see anything like that :-/
Aah
I actually tried both:
-- in the interactive mode, as soon as I hit enter, the PBS sends me to a
worker node (WN2) that does not have tm support.
I guess if I added the head node to the list of PBS nodes, I would not run
into the problem. However, I am glad I did run into
the problem. Y
You made my day Gus! Thank you very much. If I had asked earlier, I would have
finished within two hours
(but I guess that's part of the learning process). Very straightforward!
Although I tried doing exactly
what you said, the information I found by Googling is not clear and sometimes misleading
about what t
Just to further show my confusion (since I wrote much of the TM support):
If you get an interactive allocation and then type "mpirun ", mpirun will
execute on the node upon which you are sitting. Jeff's statement is -only- true
if you "qsub" the job - i.e., you run it in batch mode.
From yo
> Yes, this page is definitely incorrect if you want to run with PBS/TM
> support -- you definitely need to install with TM support on all nodes.
>
> The reason is that PBS will launch your script (and therefore
> "mpirun") on the first node of the job. This node must have an Open
> MPI mp
On Dec 1, 2009, at 6:43 PM, kevin.buck...@ecs.vuw.ac.nz wrote:
>
>> "Jeff Squyres"
>>
>>
>> Oy. This is ick, because this error code is coming from horrendously
>> complex code deep in the depths of OMPI that is probing the OS to
>> figure out what ethernet interfaces you have. It may or ma
> "Jeff Squyres"
>
>
> Oy. This is ick, because this error code is coming from horrendously
> complex code deep in the depths of OMPI that is probing the OS to
> figure out what ethernet interfaces you have. It may or may not be
> simple to fix this.
>
> Do you mind diving into the OMPI code a
Jeff Squyres wrote:
(for the web archives)
Brock and I talked about this .f90 code a bit off list -- he's going
to investigate with the test author a bit more because both of us are
a bit confused by the F90 array syntax used.
Attached is a simple send/recv code written in (procedural) C++ that
You may want to check your limits as defined by the shell/system. I can also
run this for as long as I'm willing to let it run, so something else appears to
be going on.
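For reference, a quick way to inspect the limits being referred to (exact names and values vary by system):
ulimit -a      # show all per-shell resource limits
ulimit -n      # max open file descriptors; pipes count against this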
On Dec 1, 2009, at 4:38 PM, Nicolas Bock wrote:
>
>
> On Tue, Dec 1, 2009 at 16:28, Abhishek Kulkarni wrote:
> On Tue, De
I believe what this is saying is that we are not finding any TCP interfaces -
the ioctl itself is failing. So yes - mpirun failing at that point is going to
happen because we have no way to communicate for launch.
Do you see interfaces if you do an /sbin/ifconfig? Do they have valid IP
address
You might also want to ensure that your Open MPI was built with
OpenFabrics support (i.e., to use verbs directly instead of IPoIB).
Try this:
ompi_info | grep openib
If that returns a line with "openib" and "btl" in it, then your Open
MPI has OpenFabrics support (we named the plugin "o
On Nov 29, 2009, at 6:15 PM, > wrote:
$ mpirun -n 4 hello_f77
[somebox.ecs.vuw.ac.nz:04414] opal_ifinit: ioctl(SIOCGIFFLAGS)
failed with
errno=6
Oy. This is ick, because this error code is coming from horrendously
complex code deep in the depths of OMPI that is probing the OS to
figu
(for the web archives)
Brock and I talked about this .f90 code a bit off list -- he's going
to investigate with the test author a bit more because both of us are
a bit confused by the F90 array syntax used.
On Dec 1, 2009, at 10:46 AM, Brock Palen wrote:
The attached code is an example
Hi Belaid Moa
Belaid MOA wrote:
In that case, the way I installed it is not right. I thought that only
the HN should be configured with the tm support,
not the worker nodes; the worker nodes only have the PBS daemon clients
- No need for tm support on the worker nodes.
When I ran ompi_info | g
On Dec 1, 2009, at 7:02 PM, Belaid MOA wrote:
The information on the following link has misled me then:
http://www.physics.iitm.ac.in/~sanoop/linux_files/cluster.html
(check OpenMPI Configuration section.)
Yes, this page is definitely incorrect if you want to run with PBS/TM
support -- you
Thanks a lot Jeff. That's what I will do next :)
With Many Thanks to everyone.
~Belaid.
> From: jsquy...@cisco.com
> To: us...@open-mpi.org
> Date: Tue, 1 Dec 2009 18:59:52 -0500
> Subject: Re: [OMPI users] mpirun is using one PBS node only
>
> You need to install with TM support on all nodes
In that case, the way I installed it is not right. I thought that only the HN
should be configured with the tm support,
not the worker nodes; the worker nodes only have the PBS daemon clients - No
need for tm support on the worker nodes.
When I ran ompi_info | grep tm on the worker nodes, the
You need to install with TM support on all nodes.
On Dec 1, 2009, at 6:08 PM, Belaid MOA wrote:
I tried the -bynode option but it did not change anything. I also tried
the "hostname" command and
I keep getting only the name of one node repeated according to the -n value.
Just to make sure
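For reference, the kind of per-node check and rebuild being discussed looks roughly like the sketch below; the /opt/torque path is only an illustration and should point at wherever Torque is installed:
# on each node (or on the shared install), check for the tm components:
ompi_info | grep tm
# if nothing shows up, reconfigure and rebuild against the Torque libraries:
./configure --with-tm=/opt/torque
make all install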
Thank you very much Ralph for your help.
>I'm having a little trouble following this email thread, so forgive any
>misunderstanding.
>If I understand this correctly, you are able to correctly run if you provide a
>-hostfile option. The issue is that mpirun does not appear to be picking up
On Tue, Dec 1, 2009 at 16:28, Abhishek Kulkarni wrote:
> On Tue, Dec 1, 2009 at 6:15 PM, Nicolas Bock
> wrote:
> > After reading Anthony's question again, I am not sure now that we are
> having
> > the same problem, but we might. In any case, the attached example
> programs
> > trigger the issue
Hi Belaid Moa
The OpenMPI I install and use is on an NFS-mounted directory.
Hence, all the nodes see the same version, which has "tm" support.
After reading your OpenMPI configuration parameters on the headnode
and working nodes (and the difference between them),
I would guess (just a guess) that
I'm having a little trouble following this email thread, so forgive any
misunderstanding.
If I understand this correctly, you are able to correctly run if you provide a
-hostfile option. The issue is that mpirun does not appear to be picking up the
PBS_NODEFILE automatically and using it - corr
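A simple check from inside the job (interactive or batch) shows what mpirun should be picking up automatically:
echo $PBS_NODEFILE
cat $PBS_NODEFILE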
On Tue, Dec 1, 2009 at 6:15 PM, Nicolas Bock wrote:
> After reading Anthony's question again, I am not sure now that we are having
> the same problem, but we might. In any case, the attached example programs
> trigger the issue of running out of pipes. I don't see how orted could, even
> if it was
Linux mujo 2.6.30-gentoo-r5 #1 SMP PREEMPT Thu Sep 17 07:47:12 MDT 2009
x86_64 Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz GenuineIntel GNU/Linux
On Tue, Dec 1, 2009 at 16:24, Ralph Castain wrote:
> It really does help if we have some idea what OMPI version you are talking
> about, and on what k
Sorry,
openmpi-1.3.3 compiled with gcc-4.4.2
nick
On Tue, Dec 1, 2009 at 16:24, Ralph Castain wrote:
> It really does help if we have some idea what OMPI version you are talking
> about, and on what kind of platform.
>
> This issue was fixed to the best of my knowledge (not all the pipes were
It really does help if we have some idea what OMPI version you are talking
about, and on what kind of platform.
This issue was fixed to the best of my knowledge (not all the pipes were
getting closed), but I would have to look and see what release might contain
the fix...would be nice to know w
After reading Anthony's question again, I am not sure now that we are having
the same problem, but we might. In any case, the attached example programs
trigger the issue of running out of pipes. I don't see how orted could, even
if it was reused. There is only a very limited number of processes run
Hello list,
a while back in January of this year, a user (Anthony Thevenin) had the
problem of running out of open pipes when he tried to use MPI_Comm_spawn a
few times. As I found the thread he started in the mailing list archives and have
just joined the mailing list myself, I unfortunately can't rep
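The attached example programs are not reproduced in this archive; a minimal C sketch of the pattern under discussion (the "./child" program name and the loop count are hypothetical) would be:
/* Sketch only: repeatedly spawn and then disconnect from a child job.
   On affected versions, each spawn reportedly left pipes open in orted. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    for (int i = 0; i < 1000; i++) {
        MPI_Comm child;
        MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&child);
    }
    MPI_Finalize();
    return 0;
}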
I tried the -bynode option but it did not change anything. I also tried the
"hostname" command and
I keep getting only the name of one node repeated according to the -n value.
Just to make sure I did the right installation, here is what I did:
-- On the head node (HN), I installed openMPI u
Hi Belaid Moa
Belaid MOA wrote:
Thanks a lot Gus for your help again. I only have one CPU per node.
The -n X option (no matter what the value of X is) shows X processes
running on one node only (the other one is free).
So, somehow it is oversubscribing your single processor
on the first node.
Thanks a lot Gus for your help again. I only have one CPU per node.
The -n X option (no matter what the value of X is) shows X processes running on
one node only (the other one is free).
If I add the machinefile option with WN1 and WN2 in it, the right behavior is
manifested. According to the do
Hi Belaid Moa
Belaid MOA wrote:
Hi everyone,
Here is another elementary question. I tried the following steps found
in the FAQ section of www.open-mpi.org with a simple hello world example
(with PBS/torque):
$ qsub -l nodes=2 my_script.sh
my_script.sh is pasted below:
Hi everyone,
Here is another elementary question. I tried the following steps found in the
FAQ section of www.open-mpi.org with a simple hello world example (with
PBS/torque):
$ qsub -l nodes=2 my_script.sh
my_script.sh is pasted below:
#!/bin/sh -l
#PBS -N hell
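The script is cut off in this excerpt; a minimal sketch of a job script of this kind (not the original attachment; the job name, cd line, and program path are assumptions) would be:
#!/bin/sh -l
#PBS -N hello
# run from the directory the job was submitted from
cd $PBS_O_WORKDIR
# with TM support compiled in, mpirun reads the PBS node list itself,
# so no -hostfile/-machinefile should be needed:
mpirun -n 2 ./hello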
I saw those options before but somehow I did not pay attention to them :(.
I was thinking that the copying is done automatically, so I felt the options
were useless but I was wrong.
Thanks a lot Gus; that's exactly what I was looking for. I will try them then.
Best Regards.
~Belaid.
> Date: T
Hi Belaid Moa
Belaid MOA wrote:
Thanks a lot Gus for your help. Although I used stage_in/stage_out
features before, I found NFS mounting much better and cleaner.
Best Regards.
~Belaid.
Yes, unless you have very heavy I/O programs (some computational
Chemistry and genome programs are like th
Hi Belaid Moa
I spoke too fast, and burnt my tongue.
I should have double checked before speaking out.
I just looked up "man mpiexec" and found the options below.
I never used or knew about them, but you may want to try.
They seem to be similar to the Torque/PBS stage_in feature.
I would guess th
Thanks a lot Gus for your help. Although I used stage_in/stage_out features
before, I found NFS mounting much better and cleaner.
Best Regards.
~Belaid.
> Date: Tue, 1 Dec 2009 14:55:53 -0500
> From: g...@ldeo.columbia.edu
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] Elementary question
Hi Belaid Moa
Belaid MOA wrote:
Thank you very very much Gus. Does this mean that OpenMPI does not copy
the executable from the master node to the worker nodes?
Not that I know of.
Making the executable available on the nodes, and any
input files the program may need, is the user's responsibility
Thank you very very much Gus. Does this mean that OpenMPI does not copy the
executable from the master node to the worker nodes?
If that's case, I will go ahead and NFS mount my working directory.
~Belaid.
> Date: Tue, 1 Dec 2009 13:50:57 -0500
> From: g...@ldeo.columbia.edu
> To: us...@open-m
Hi Belaid MOA
See this FAQ:
http://www.open-mpi.org/faq/?category=running#do-i-need-a-common-filesystem
http://www.open-mpi.org/faq/?category=building#where-to-install
http://www.open-mpi.org/faq/?category=tm#tm-obtain-host
Your executable needs to be in a directory that is accessible
by all nod
Hello everyone,
I am new to this list and I have a very elementary question: suppose we have
three machines, HN (Head Node hosting the pbs server), WN1 (a worker node) and
WN2 (another worker node). The PBS nodefile has WN1 and WN2 in it (DOES NOT HAVE
HN).
My openMPI program (hello) and PBS sc
On Dec 1, 2009, at 12:20 PM, Jurgen Heymann wrote:
Thank you for your feedback. In all cases, I installed it clean.
After consulting the mpiBLAST user group, I learned that there is an
issue building mpiBLAST with openMPI on Intel platforms and that it
is being investigated. Once I learn mo
Hi Jeff,
Thank you for your feedback. In all cases, I installed it clean. After
consulting the mpiBLAST user group, I learned that there is an issue building
mpiBLAST with openMPI on Intel platforms and that it is being investigated.
Once I learn more about it, I will post it here. I would stil
On Tue, 2009-12-01 at 10:46 -0500, Brock Palen wrote:
> The attached code is an example where openmpi/1.3.2 will lock up if
> run on 48 cores over IB (4 cores per node).
> The code loops over recvs from all processors on rank 0 and sends from
> all other ranks; as far as I know this should work
The attached code is an example where openmpi/1.3.2 will lock up if
run on 48 cores over IB (4 cores per node).
The code loops over recvs from all processors on rank 0 and sends from
all other ranks; as far as I know this should work, and I can't see
why not.
Note, yes, I know we can do the sam
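The F90 attachment itself is not included in this archive; a minimal C sketch of the communication pattern being described (rank 0 receives one message from every other rank, everyone else sends once; the integer payload is arbitrary) would be:
/* Sketch only, not the actual attachment: rank 0 loops over receives
   from every other rank; all other ranks send a single message. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, buf;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        for (int src = 1; src < size; src++) {
            MPI_Recv(&buf, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 0 received %d from rank %d\n", buf, src);
        }
    } else {
        buf = rank;
        MPI_Send(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}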
On Nov 23, 2009, at 12:51 PM, Jurgen Heymann wrote:
I am trying to understand what parameters are essential to include
when running ./configure with openmpi-1.3.3 when working on a Mac
using PPC (OS 10.4) or an Intel platform (OS 10.5). What compilers and
settings work best for the individual
I can't think of what OMPI would be doing related to the predefined
stack size -- I am not aware of anywhere in the code where we look up
the predefined stack size and then do something with it.
That being said, I don't know what the OS and resource consumption
effects are of setting 1GB+ st
This is probably best taken up on the MTT list -- it doesn't look like
an OMPI error, but rather an MTT configuration error (if you're
running OMPI 1.3.3 through MTT, it shouldn't be trying to find OMPI
1.3.2).
On Dec 1, 2009, at 1:45 AM, Vishal Shorrghar wrote:
Hi ALL,
I tried to run
Hi
We have solved the problem by rewriting the starter.sh...
The script remained the same except for the very final part where the
command is executed... Instead of plain exec "$@", we replaced it by:
==
#need for exec to fail on non-script jobs
shopt -s execfail
#start the job in this shell
amjad ali wrote:
Hi,
thanks T.Prince,
Your statement:
"I'll just mention that we are well into the era of 3 levels of
programming parallelization: vectorization, threaded parallel (e.g.
OpenMP), and process parallel (e.g. MPI)." is really a great new
lesson for me. Now I can understand things better.
Hi,
thanks T.Prince,
Your statement:
"I'll just mention that we are well into the era of 3 levels of programming
parallelization: vectorization, threaded parallel (e.g. OpenMP), and
process parallel (e.g. MPI)." is really a great new lesson for me. Now I
can understand things better.
Can you please expl
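To make the three levels concrete, here is a small illustrative C sketch (a hypothetical example, not taken from the original posts): MPI provides the process-level parallelism, the OpenMP pragma the thread-level parallelism, and the plain loop body is something the compiler can auto-vectorize.
/* Hypothetical illustration of the three levels:
   MPI = process parallel, OpenMP = threaded parallel,
   and a simple loop body the compiler can vectorize. */
#include <mpi.h>
#include <stdio.h>

#define N 1024

int main(int argc, char **argv)
{
    int rank, provided;
    double a[N], b[N], sum = 0.0;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    #pragma omp parallel for reduction(+:sum)   /* threads within the process */
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];                     /* vectorizable inner body */

    printf("rank %d: local dot product = %f\n", rank, sum);
    MPI_Finalize();
    return 0;
}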
amjad ali wrote:
Hi,
Suppose we run a parallel MPI code with 64 processes on a cluster, say
of 16 nodes. The cluster nodes have multicore CPUs, say 4 cores on each node.
Now all 64 cores on the cluster are running a process. The program is SPMD,
meaning all processes have the same workload.
Now if we
On 01.12.2009 at 10:32, Ondrej Glembek wrote:
Just to add more info:
Reuti wrote:
On 30.11.2009 at 20:07, Ondrej Glembek wrote:
But I think the real problem is that Open MPI assumes you are outside
of SGE and so uses a different startup. Are you resetting any of SGE's
environment vari
Hi,
On 01.12.2009 at 10:00, Ondrej Glembek wrote:
Reuti wrote:
./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64
--with-sge --enable-shared --enable-static --host=x86_64-linux
--build=x86_64-linux NM=x86_64-linux-nm
Is there any list of valid values for --host, --build and NM
Just to add more info:
Reuti wrote:
> On 30.11.2009 at 20:07, Ondrej Glembek wrote:
>
> But I think the real problem is that Open MPI assumes you are outside
> of SGE and so uses a different startup. Are you resetting any of SGE's
> environment variables in your custom starter method (like $JOB
Hi
Reuti wrote:
>>
>> ./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64
>> --with-sge --enable-shared --enable-static --host=x86_64-linux
>> --build=x86_64-linux NM=x86_64-linux-nm
>
> Is there any list of valid values for --host, --build and NM - and what
> is NM for? From the ./co
Hi ALL,
I tried to run a trivial test between two nodes; it seems to be running
without any memory issue, but it gives some errors (a path mismatch - it is
taking openmpi 1.3.2 instead of 1.3.3) while fetching/executing some
test binaries, i.e. cxx_ring, c_ring, etc. I am using openmpi-1.3.3. The path
is
Hi,
Suppose we run a parallel MPI code with 64 processes on a cluster, say of 16
nodes. The cluster nodes have multicore CPUs, say 4 cores on each node.
Now all 64 cores on the cluster are running a process. The program is SPMD,
meaning all processes have the same workload.
Now if we had done auto-vectoriz