Re: [OMPI users] Measuring MPI message size used by application

2007-03-30 Thread stephen mulcahy

George Bosilca wrote:
> I used it on an IA64 platform, so I assumed x86_64 is supported, but
> I have never used it on an AMD64. On the mpiP web page they claim they
> support the Cray XT3, which as far as I know is based on 64-bit AMD
> Opterons. So, there is at least a spark of hope in the dark ...
> 
> I decided to give it a try on my x86_64 AMD box (Debian-based system).
> First problem: my box didn't have libunwind. Not a big deal, it's
> freely available on the HP website
> (http://www.hpl.hp.com/research/linux/libunwind/download.php4). A few
> minutes later, libunwind was installed in /lib64. Now, time to focus
> on mpiP ... For some obscure reason the configure script was unable
> to detect my g77 compiler (whatever!!!) or the installation of
> libunwind. Moreover, it kept trying to use the clock_gettime call.
> Fortunately (which makes me think I'm not the only one having trouble
> with this), mpiP provides configure options for all of these. The
> final configure line was:
> ./configure --prefix=/opt/ --without-f77 --with-wtime
>   --with-include=-I/include --with-lib=-L/lib64
> Then a quick "make shared" followed by "make install" completed the
> work. So, at least mpiP can compile on an x86_64 box.
> 

Hi George,

I'm not onsite with the cluster now but this sounds promising enough
that I'll give it a shot the next time I'm back onsite.

Thanks for your reply,

-stephen

-- 
Stephen Mulcahy, Applepie Solutions Ltd, Innovation in Business Center,
   GMIT, Dublin Rd, Galway, Ireland.  http://www.aplpi.com
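For anyone who wants to try this, a minimal MPI test program in C (a
generic sketch, not code from this thread) is enough to exercise mpiP.
Link it against mpiP as George describes, e.g. "mpicc -g pingpong.c -o
pingpong -lmpiP -lunwind" (the file name is just an example); compiling
with -g lets mpiP resolve file/line information for the callsite report.

/* pingpong.c -- tiny ping-pong test whose Send/Recv/Barrier callsites
 * mpiP can report; run with at least 2 processes. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, i, buf[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < 1024; i++)
        buf[i] = i;

    for (i = 0; i < 100 && size > 1; i++) {
        int count = (i % 1024) + 1;   /* vary the message size a bit */
        if (rank == 0) {
            MPI_Send(buf, count, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, 1024, MPI_INT, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, 1024, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, count, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}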


Re: [OMPI users] Jeff Squyres: "Re: password orted problem"

2007-03-30 Thread Jeff Squyres

On Mar 29, 2007, at 1:08 PM, Jens Klostermann wrote:


In reply to
http://www.open-mpi.org/community/lists/users/2006/12/2286.php

I recently switched to Open MPI 1.2; unfortunately, the password problem
still persists! I generated new RSA keys and made passwordless ssh
available. This was tested by logging in to each node via passwordless
ssh; fortunately there are only 16 nodes :-).
The funny thing is that it seems to be a problem only with my user and
appears randomly, but is more likely if I use more nodes.


Is the problem still something like this:


[say_at_wolf45 tmp]$ mpirun -np 2 --host wolf45,wolf46 /tmp/test.x
orted: Command not found.

Because if so, it's a larger / non-MPI issue.  If the orted  
executable cannot be found on the remote node, there's no way Open  
MPI will succeed.


The question of *why* the orted can't be found may be a bit deeper of  
a problem -- if you have your PATH set right, etc., perhaps it's an  
NFS issue...?



One cure for the problem until now is using the option --mca
pls_rsh_debug. What does this switch do, other than producing more
output, that resolves my problem?


It also slows the code down a bit such that the timing is different.


Two other questions: what is the
ras (resource allocation subsystem), and how can I set this up / what
options does it have?


I would doubt that the ras is involved in the issue -- the ras is
used to read hostfiles, analyze lists of hosts from resource
managers, etc.  It doesn't do anything in the actual launch.



And what is the pls (process launch subsystem), and how can I set this
up / what options does it have?


I assume you're using the RSH launcher; you can use the ompi_info  
command to see what parameters are available for that component:


 ompi_info --param pls rsh

--
Jeff Squyres
Cisco Systems



[OMPI users] Getting a core-dump with OpenMPI

2007-03-30 Thread Jeff Stuart

I'm using Open MPI, and the documentation says that only a TotalView-style
debugger can be used. With that in mind, all I want to do is
get a core dump when a process crashes. I can then just load the core
into GDB. Is there any easy way to do this?

I tried calling signal(SIGSEGV, SIG_DFL); signal(SIGABRT, SIG_DFL); to
no avail. All that happens is that I don't get a call stack dump
anymore.

Thanks,
-Jeff


[OMPI users] Newbie Hostfile Question

2007-03-30 Thread Warner Yuen
In LAM/MPI, I can use "portal.private schedule=no" if I want to  
launch a job from a specific node but not schedule it for any work. I
can't seem to find a reference to an equivalent in Open MPI.


Thanks.

-Warner


Warner Yuen
Scientific Computing Consultant
Apple Computer
email: wy...@apple.com
Tel: 408.718.2859




Re: [OMPI users] Newbie Hostfile Question

2007-03-30 Thread Jeff Squyres

Short version: just don't list that host in the OMPI hostfile.

Long version:

In LAM, we had the constraint that you *had* to include the local  
host in the hostfile that you lambooted.  This was undesirable in  
some cases, such as lambooting from a cluster's head node (where you  
didn't want to launch any MPI processes).  So as a workaround, we
created the "schedule=no" attribute such that your lamboot would  
include the node, but we wouldn't [by default] run any MPI processes  
on it.


In Open MPI, we do not have the restriction that you must include the
local host in the hostfile that you give to mpirun.  So the equivalent
functionality in Open MPI is simply to leave the local host out of
the hostfile.


Sidenote -- maybe I should create a "I used to be a LAM user" section  
of the FAQ...






--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Getting a core-dump with OpenMPI

2007-03-30 Thread Jeff Squyres
You should be able to get a core dump pretty easily by doing  
something like this:


{ char *foo = 0; *foo = 13; }

Ensure that your coredumpsize limit is set to "unlimited" in the
shell on all nodes where you are running your MPI processes.  It's
also helpful to set Linux (I'm assuming you're using Linux...?) to
append the PID of the crashing process to the core filename, so that
you don't have multiple processes all writing to a single file named
"core" (particularly in a network-filesystem kind of scenario), where
you'd end up with a single file that may or may not be usable.
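
As a sketch combining these suggestions (assuming Linux and a C code;
the file and variable names are just placeholders), the idea above
wrapped in a complete program looks like the following. Run it under
mpirun after "ulimit -c unlimited" has been set in the shell on every
node; on Linux, /proc/sys/kernel/core_uses_pid (or a %p in
/proc/sys/kernel/core_pattern) makes each core file name include the
PID. The resulting core can then be loaded into gdb together with the
executable.

/* crashme.c -- force a segfault on one rank so the OS writes a core file */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* same trick as above: write through a NULL pointer */
        char *foo = 0;
        *foo = 13;   /* SIGSEGV; the default action dumps core if the
                        core-size limit allows it */
    }

    /* surviving ranks wait here; mpirun normally cleans them up once
       rank 0 dies */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}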






--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Getting a core-dump with OpenMPI

2007-03-30 Thread George Bosilca
Generating core files is not a feature of Open MPI but of the
operating system. Depending on the shell you're using, there is a
different way to reach this goal, usually via limit (or ulimit). This
web page can give you more information about it:
http://www.faqs.org/faqs/hp/hpux-faq/section-257.html


  george.



"Half of what I say is meaningless; but I say it so that the other  
half may reach you"

  Kahlil Gibran
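
A side note on the limit mentioned above (an illustration, not part of
the original reply): the core-file size limit can also be inspected,
and raised up to the hard limit, from inside a program via
getrlimit()/setrlimit(). Launching a small checker like this sketch
with mpirun is one way to see which limit remotely started processes
actually inherit on each node.

/* corelimit.c -- print and, if possible, raise the core-file size limit */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("core limit: soft=%llu hard=%llu (RLIM_INFINITY=%llu)\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max,
           (unsigned long long)RLIM_INFINITY);

    /* raise the soft limit to the hard limit; an unprivileged process
       cannot go beyond the hard limit set by the shell/administrator */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        perror("setrlimit");

    return 0;
}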




[OMPI users] Spawning to processors outside of the process manager assigned nodes

2007-03-30 Thread Prakash Velayutham
Hello,

I have Torque as the batch manager and Open MPI (1.0.1) as the MPI
library. Initially I request 'n' processors through Torque. After
the Open MPI job starts, based on certain conditions, I want to acquire
more processors outside of the nodes initially assigned by Torque. Is
this a problem? Is this why my MPI_Comm_spawn is failing (where I set
the MPI_Info element's key to "host" and its value to the hostname of a
new node outside of Torque's initial assignment)?

Any ideas?

Thanks,
Prakash
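
For reference, a spawn call of the kind described above might look
roughly like the sketch below (the hostname "node17" and the child
binary "./worker" are placeholders; whether a host outside the Torque
allocation is accepted is exactly the question being asked here).

/* spawn_host.c -- spawn one child, asking for a specific host via the
 * reserved "host" MPI_Info key */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "node17");   /* placeholder hostname */

    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    /* a real code would typically communicate with the child over
       intercomm before finalizing */
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}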


Re: [OMPI users] Measuring MPI message size used by application

2007-03-30 Thread Heywood, Todd
George,

It turns out I didn't have libunwind either, but didn't notice since mpiP
compiled/linked without it (OK, so I should have checked the config log).
However, once I got it, it wouldn't compile on my RHEL system.

So, following this thread:

http://www.mail-archive.com/libunwind-devel@nongnu.org/msg00067.html

I had to download an alpha version of libunwind:

http://download.savannah.nongnu.org/releases/libunwind/libunwind-snap-070224.tar.gz

... And build it with:

CFLAGS=-fPIC ./configure
make CFLAGS=-fPIC LDFLAGS=-fPIC shared
make CFLAGS=-fPIC LDFLAGS=-fPIC install

After that, everything went as you described. The "strange readings" in the
output did list the Parent_Funct names, though:

---
@--- Callsites: 5 -
---
 ID Lev File/Address  Line Parent_Funct MPI_Call
  1   0 0x0041341d   RecvData Recv
  2   0 0x004133c7   SendData Send
  3   0 0x004134b9   SendRepeat   Send
  4   0 0x00413315   Sync Barrier
  5   0 0x004134ef   RecvRepeat   Recv


Thanks for the help!

Todd


On 3/29/07 5:48 PM, "George Bosilca"  wrote:

> I used it on an IA64 platform, so I assumed x86_64 is supported, but
> I have never used it on an AMD64. On the mpiP web page they claim they
> support the Cray XT3, which as far as I know is based on 64-bit AMD
> Opterons. So, there is at least a spark of hope in the dark ...
> 
> I decided to give it a try on my x86_64 AMD box (Debian-based system).
> First problem: my box didn't have libunwind. Not a big deal, it's
> freely available on the HP website
> (http://www.hpl.hp.com/research/linux/libunwind/download.php4). A few
> minutes later, libunwind was installed in /lib64. Now, time to focus
> on mpiP ... For some obscure reason the configure script was unable
> to detect my g77 compiler (whatever!!!) or the installation of
> libunwind. Moreover, it kept trying to use the clock_gettime call.
> Fortunately (which makes me think I'm not the only one having trouble
> with this), mpiP provides configure options for all of these. The
> final configure line was:
> ./configure --prefix=/opt/ --without-f77 --with-wtime
>   --with-include=-I/include --with-lib=-L/lib64
> Then a quick "make shared" followed by "make install" completed the
> work. So, at least mpiP can compile on an x86_64 box.
> 
> Now I modified the makefile of NetPIPE and added "-lmpiP -lunwind",
> compiled NetPIPE, and ran it. The mpiP headers showed up, the
> application ran to completion, and my human-readable output was there.
> 
> @ mpiP
> @ Command : ./NPmpi
> @ Version  : 3.1.0
> @ MPIP Build date  : Mar 29 2007, 13:35:47
> @ Start time   : 2007 03 29 13:43:40
> @ Stop time: 2007 03 29 13:44:42
> @ Timer Used   : PMPI_Wtime
> @ MPIP env var : [null]
> @ Collector Rank   : 0
> @ Collector PID: 22838
> @ Final Output Dir : .
> @ Report generation: Single collector task
> @ MPI Task Assignment  : 0 dancer
> @ MPI Task Assignment  : 1 dancer
> 
> However, I got some strange readings in the output.
> 
> ---
> @--- Callsites: 5 ---
> ---
>  ID Lev File/Address  Line Parent_Funct MPI_Call
>   1   0 0x00402ffb        [unknown]    Barrier
>   2   0 0x00403103        [unknown]    Recv
>   3   0 0x004030ad        [unknown]    Send
>   4   0 0x0040319f        [unknown]    Send
>   5   0 0x004031d5        [unknown]    Recv
> 
> I didn't dig further to see why. But this proves that, at least for
> basic usage (general statistics gathering), mpiP works on x86_64
> platforms.
> 
>Have fun,
>  george.
> 
> On Mar 29, 2007, at 11:32 AM, Heywood, Todd wrote:
> 
>> George,
>> 
>> Any other simple, small, text-based (!) suggestions? mpiP seg faults
>> on x86_64, and indeed its web page doesn't list x86_64 Linux as a
>> supported platform.
>> 
>> Todd
>> 
>> 
>> On 3/28/07 10:39 AM, "George Bosilca"  wrote:
>> 
>>> Stephen,
>>> 
>>> There are a huge number of MPI profiling tools out there. My
>>> preference is for something small, fast, and with output in
>>> human-readable text format (and not fancy graphics). The tool I'm
>>> talking about is called mpiP (http://mpip.sourceforge.net/). It's not
>>> Open MPI specific, but it's really simple to use.
>>> 
>>>george.
>>> 
>>> On Mar 28, 2007, at 10:10 AM, stephen mulcahy wrote:
>>> 
 Hi,

Re: [OMPI users] Measuring MPI message size used by application

2007-03-30 Thread Heywood, Todd
P.s. I just found out you have to recompile/relink the MPI code with -g in
order for the File/Address field to show non-garbage.



Re: [OMPI users] Measuring MPI message size used by application

2007-03-30 Thread Chris Chambreau


Hi Folks,

It's great to hear that people are interested in mpiP!

Currently, I am configuring mpiP on x86_64 with gcc 3.4.4 with -O2 and
without libunwind.

When running some simple tests, I'm having good luck using both mpiP's
own stack walking and libunwind when compiling with gcc and -O2.  However,
it looks to me like compiling the mpiP library or an application with
-O3 will cause the mpiP-implemented stack-walking code to segfault.
If I configure mpiP to use libunwind and compile my
application with -O3, some libunwind calls fail and some MPI call sites
are not recorded.  It looks to me like building and running with -O3
with the Intel compiler (v9.1) is successful.

At this point the work-around for gcc appears to be building with -O2.
Hopefully we can get this sorted out by the next mpiP release.

The issue that George points out with the failed address lookup could be
due either to not compiling with -g or to the version of binutils he
is using.  I've successfully used binutils versions 2.15.92 and 2.16.1,
but have run into some issues with binutils-2.17.

We don't hear all that much from mpiP users, so if you run into annoying
issues with mpiP that you'd like sorted out or just have general
questions or comments about the tool, please let us know at
mpip-h...@lists.sourceforge.net .

Thanks!

-Chris




Re: [OMPI users] Measuring MPI message size used by application

2007-03-30 Thread Chris Chambreau


The mpip-help mailing list is mpip-help at lists.sourceforge.net.

-Chris








[OMPI users] migration FAQ

2007-03-30 Thread Geoff Galitz



Sidenote -- maybe I should create a "I used to be a LAM user" section
of the FAQ...



Actually a migration FAQ would be a good idea.  I am another former  
LAM user and had lots of questions about parameter syntax and "I did  
it in LAM this way, how do I do it here?"  I had the luxury of time  
to do some empirical testing but a migration FAQ would be useful to  
folks, I think.


-geoff

Geoff Galitz
ge...@galitz.org





[OMPI users] mca_btl_mx_init: mx_open_endpoint() failed with status=20

2007-03-30 Thread de Almeida, Valmor F.

Hello,

I am getting this error any time the number of processes requested per
machine is greater than the number of CPUs. I suspect it is something in
the configuration of MX / Open MPI that I am missing, since another
machine I have without MX installed runs Open MPI correctly with
oversubscription.

Thanks for any help.

--
Valmor


->mpirun -np 3 --machinefile mymachines-1 a.out
[x1:23624] mca_btl_mx_init: mx_open_endpoint() failed with status=20
[x1:23624] *** Process received signal ***
[x1:23624] Signal: Segmentation fault (11)
[x1:23624] Signal code: Address not mapped (1)
[x1:23624] Failing at address: 0x20
[x1:23624] [ 0] [0xb7f7f440]
[x1:23624] [ 1] /opt/openmpi-1.2/lib/openmpi/mca_btl_mx.so(mca_btl_mx_finalize+0x25) [0xb7aca825]
[x1:23624] [ 2] /opt/openmpi-1.2/lib/openmpi/mca_btl_mx.so(mca_btl_mx_component_init+0x6f8) [0xb7acc658]
[x1:23624] [ 3] /opt/ompi/lib/libmpi.so.0(mca_btl_base_select+0x1a0) [0xb7f41900]
[x1:23624] [ 4] /opt/openmpi-1.2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x26) [0xb7ad1006]
[x1:23624] [ 5] /opt/ompi/lib/libmpi.so.0(mca_bml_base_init+0x78) [0xb7f41198]
[x1:23624] [ 6] /opt/openmpi-1.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_component_init+0x7d) [0xb7af866d]
[x1:23624] [ 7] /opt/ompi/lib/libmpi.so.0(mca_pml_base_select+0x176) [0xb7f49b56]
[x1:23624] [ 8] /opt/ompi/lib/libmpi.so.0(ompi_mpi_init+0x4cf) [0xb7f0fe2f]
[x1:23624] [ 9] /opt/ompi/lib/libmpi.so.0(MPI_Init+0xab) [0xb7f3204b]
[x1:23624] [10] a.out(_ZN3MPI4InitERiRPPc+0x18) [0x8052cbe]
[x1:23624] [11] a.out(main+0x21) [0x804f4a7]
[x1:23624] [12] /lib/libc.so.6(__libc_start_main+0xdc) [0xb7be9824]

content of mymachines-1 file

x1  max_slots=4





[OMPI users] OpenMPI for Windows?

2007-03-30 Thread Rohit_Singh

Hi all,

I'm somewhat new to OpenMPI, but I'm currently evaluating it as a
communications mechanism between Windows and Unix servers.

I noticed that under your FAQs (
http://www.open-mpi.org/faq/?category=supported-systems), it says:

  There are plans to support Microsoft Windows in the not-distant
future.

When is the not-distant future?  Is it in scope for this year?  Will the
Windows support require a Unix emulation layer like Cygwin?

I do apologize if the information I'm requesting here is confidential in
nature.

Thanks again.

---
Rohit S. Singh  Phone: (905) 366-6164
Software Developer  Fax: (905) 273-9789
Logitech International S. A.Toll Free: 1-866-291-1505
Remote Control Business Unithttp://www.logitech.com/harmony


Re: [OMPI users] mca_btl_mx_init: mx_open_endpoint() failed with status=20

2007-03-30 Thread Tim Prins
Hi Valmor,

What is happening here is that when Open MPI tries to create an MX endpoint for
communication, MX returns code 20, which is MX_BUSY.

At this point we should gracefully move on, but there is a bug in Open MPI 1.2
that causes a segmentation fault on this type of error. This will be
fixed in 1.2.1, and the fix is available now in the 1.2 nightly tarballs.

Hope this helps,

Tim
