Re: [OMPI users] Heterogeneous OpenFabrics hardware

2009-01-27 Thread Samuel Sarholz

Hi,

I can think of a few scenarios where interoperability would be helpful,
but I guess in most cases you can live without it.

1. Some university departments buy tiny clusters (4-8 nodes) and buy the
next one when more projects/funding become available, thus ending up with
2-4 different CPU generations or steppings and probably different HCA
versions. If your MPI program does load balancing you probably don't care
about slightly different CPU speeds, and you are glad if you can use all
machines.


2. You operate a medium to large cluster (300+ nodes) and after, e.g., a
year a few HCAs might break and have to be replaced. I can imagine that
it is hard to get an HCA with exactly the same chipset.
If you end up with a few nodes that can't run MPI programs with the rest,
that would be unfortunate.


best regards,
Samuel

Don Kerr wrote:

Jeff,

Did the IWG say anything about there being a chipset issue?  For example,
what if a vendor, say Sun, wraps Mellanox chips on its own HCAs; would a
Mellanox HCA and a Sun HCA work together?


-DON

On 01/26/09 14:19, Jeff Squyres wrote:
The Interop Working Group (IWG) of the OpenFabrics Alliance asked me 
to bring a question to the Open MPI user and developer communities: is 
anyone interested in having a single MPI job span HCAs or RNICs from 
multiple vendors?  (pardon the cross-posting, but I did want to ask 
each group separately -- because the answers may be different)


The interop testing lab at the University of New Hampshire 
(http://www.iol.unh.edu/services/testing/ofa/) discovered that most 
(all?) MPI implementations fail when having a single MPI job span HCAs 
from multiple vendors and/or span RNICs from multiple vendors.  I 
don't remember the exact details (and they may not be public, anyway), 
but I'm pretty sure that OMPI failed when used with QLogic and 
Mellanox HCAs in a single MPI job.  This is fairly unsurprising, given 
how we tune Open MPI's use of OpenFabrics-capable hardware based on 
our .ini file.


So my question is: does anyone want/need to support jobs that span 
HCAs from multiple vendors and/or RNICs from multiple vendors?










Re: [OMPI users] Asynchronous behaviour of MPI Collectives

2009-01-27 Thread Gabriele Fatigati
Wow! Great and useful explanation.
Thanks, Jeff.

2009/1/23 Jeff Squyres :
> FWIW, OMPI v1.3 is much better about registered memory usage than the 1.2
> series.  We introduced some new things, including the ability to specify
> exactly what receive queues you want.  See:
>
> ...gaaah!  It's not on our FAQ yet.  :-(
>
> The main idea is that there is a new MCA parameter for the openib BTL:
> btl_openib_receive_queues.  It takes a colon-delimited string listing one or
> more receive queues of specific sizes and characteristics.  For now, all
> processes in the job *must* use the same string.  You can specify three
> kinds of receive queues:
>
> - P: per-peer queues
> - S: shared receive queues
> - X: XRC queues (with OFED 1.4 and later with specific Mellanox hardware)
>
> Here's a copy-n-paste of our help file describing the format of each:
>
> Per-peer receive queues require between 1 and 5 parameters:
>
>  1. Buffer size in bytes (mandatory)
>  2. Number of buffers (optional; defaults to 8)
>  3. Low buffer count watermark (optional; defaults to (num_buffers / 2))
>  4. Credit window size (optional; defaults to (low_watermark / 2))
>  5. Number of buffers reserved for credit messages (optional;
> defaults to (num_buffers*2-1)/credit_window)
>
>  Example: P,128,256,128,16
>  - 128 byte buffers
>  - 256 buffers to receive incoming MPI messages
>  - When the number of available buffers reaches 128, re-post 128 more
>buffers to reach a total of 256
>  - If the number of available credits reaches 16, send an explicit
>credit message to the sender
>  - Defaulting to ((256 * 2) - 1) / 16 = 31; this many buffers are
>reserved for explicit credit messages
>
> Shared receive queues can take between 1 and 4 parameters:
>
>  1. Buffer size in bytes (mandatory)
>  2. Number of buffers (optional; defaults to 16)
>  3. Low buffer count watermark (optional; defaults to (num_buffers / 2))
>  4. Maximum number of outstanding sends a sender can have (optional;
> defaults to (low_watermark / 4))
>
>  Example: S,1024,256,128,32
>  - 1024 byte buffers
>  - 256 buffers to receive incoming MPI messages
>  - When the number of available buffers reaches 128, re-post 128 more
>buffers to reach a total of 256
>  - A sender will not send to a peer unless it has less than 32
>outstanding sends to that peer.
>
> IIRC, "X" takes the same parameters as "S"...?  Note that if you use *any*
> XRC queues, then *all* of your queues must be XRC.
>
> OMPI defaults to a btl_openib_receive_queues value that may be specific to your
> hardware.  For example, connectx defaults to the following value:
>
> shell$ ompi_info --param btl openib --parsable | grep receive_queues
> mca:btl:openib:param:btl_openib_receive_queues:value:P,128,256,192,128:S,2048,256,128,32:S,12288,256,128,32:S,65536,256,128,32
> mca:btl:openib:param:btl_openib_receive_queues:data_source:default value
> mca:btl:openib:param:btl_openib_receive_queues:status:writable
> mca:btl:openib:param:btl_openib_receive_queues:help:Colon-delimited, comma
> delimited list of receive queues: P,4096,8,6,4:P,32768,8,6,4
> mca:btl:openib:param:btl_openib_receive_queues:deprecated:no
>
> Hope that helps!
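
[A minimal sketch of how such a string might be passed on the command line,
using the btl_openib_receive_queues parameter described above; the queue
values and the program name ./a.out are illustrative only, not a
recommendation:

shell$ mpirun -np 16 --mca btl_openib_receive_queues \
    P,128,256,128,16:S,65536,256,128,32 ./a.out

The same value can also be applied to every job by putting a line such as
"btl_openib_receive_queues = P,128,256,128,16:S,65536,256,128,32" into an
MCA parameter file, e.g. $HOME/.openmpi/mca-params.conf.]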
>
>
>
>
> On Jan 23, 2009, at 9:27 AM, Igor Kozin wrote:
>
>> Hi Gabriele,
>> it might be that your message size is too large for the available memory per
>> node.
>> I had a problem with IMB when I was not able to run Alltoall to completion
>> on N=128, ppn=8 on our cluster with 16 GB per node. You'd think 16 GB is
>> quite a lot, but when you do the maths:
>> 2 * 4 MB * 128 procs * 8 procs/node = 8 GB/node, plus you need to double
>> because of buffering. I was told by Mellanox (our cards are ConnectX cards)
>> that they introduced XRC in OFED 1.3 in addition to Shared Receive Queues,
>> which should reduce the memory footprint, but I have not tested this yet.
>> HTH,
>> Igor
>> 2009/1/23 Gabriele Fatigati 
>> Hi Igor,
>> My message size is 4096 KB and I have 4 procs per core.
>> There isn't any difference using different algorithms.
>>
>> 2009/1/23 Igor Kozin :
>> > what is your message size and the number of cores per node?
>> > is there any difference using different algorithms?
>> >
>> > 2009/1/23 Gabriele Fatigati 
>> >>
>> >> Hi Jeff,
>> >> I would like to understand why, if I run on 512 procs or more, my
>> >> code hangs in an MPI collective, even with a small send buffer. All
>> >> processes are stuck in the call, doing nothing. But if I add an
>> >> MPI_Barrier after the MPI collective, it works! I run over an
>> >> InfiniBand network.
>> >>
>> >> I know many people who have this strange problem; I think there is a
>> >> strange interaction between InfiniBand and Open MPI that causes it.
>> >>
>> >>
>> >>
>> >> 2009/1/23 Jeff Squyres :
>> >> > On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote:
>> >> >
>> >> >> I've noted that OpenMPI has an asynchronous behaviour in the
>> >> >> collective calls.
>> >> >> The processes don't wait for the other procs to arrive in the call.
>> >> >
>> >> > T

Re: [OMPI users] Heterogeneous OpenFabrics hardware

2009-01-27 Thread Peter Kjellstrom
On Monday 26 January 2009, Jeff Squyres wrote:
> The Interop Working Group (IWG) of the OpenFabrics Alliance asked me
> to bring a question to the Open MPI user and developer communities: is
> anyone interested in having a single MPI job span HCAs or RNICs from
> multiple vendors?  (pardon the cross-posting, but I did want to ask
> each group separately -- because the answers may be different)
>
> The interop testing lab at the University of New Hampshire
> (http://www.iol.unh.edu/services/testing/ofa/ ) discovered that most (all?)
> MPI implementations fail when having a single MPI job span HCAs from
> multiple vendors and/or span RNICs from multiple vendors.  I don't remember
> the exact details (and they may not be public, anyway), but I'm pretty sure
> that OMPI failed when used with QLogic and Mellanox HCAs in a single MPI
> job.  This is fairly unsurprising, given how we tune Open MPI's use of
> OpenFabrics-capable hardware based on our .ini file.
>
> So my question is: does anyone want/need to support jobs that span
> HCAs from multiple vendors and/or RNICs from multiple vendors?

For these three cases:

1) Different vendor ID but same OFED driver and basic chip
2) Same chip vendor, different OFED driver (mthca vs mlx4)
3) Any OFED-supported IB HCA

IMHO:

Number one should just work. We may at times have some nodes with HCAs that 
have been flashed with non-standard/non-vendor firmware.

Number two is something I would kind of expect to work. A possible situation 
where I'd need it is if I temporarily use an older HCA (mthca) to get a node 
going on a cluster with ConnectX (mlx4). Another case could be a cluster with 
two partitions with different HCAs.

Number three would be nice to have. I think many users would assume it to 
work. Why not? They have symmetric software, all nodes run OFED, all have 
working IB... It would have worked if their nodes had had different kinds of 
Ethernet NICs...

/Peter




Re: [OMPI users] Heterogeneous OpenFabrics hardware

2009-01-27 Thread Jeff Squyres
It is worth clarifying a point in this discussion that I neglected to
mention in my initial post: although Open MPI may not work *by default*
with heterogeneous HCAs/RNICs, it is quite possible/likely that if you
manually configure Open MPI to use the same verbs/hardware settings across
all your HCAs/RNICs (assuming that you use a set of values that is
compatible with all your hardware), MPI jobs spanning multiple different
kinds of HCAs or RNICs will work fine.


See this post on the devel list for a few more details:

http://www.open-mpi.org/community/lists/devel/2009/01/5314.php



On Jan 27, 2009, at 6:08 AM, Peter Kjellstrom wrote:


On Monday 26 January 2009, Jeff Squyres wrote:

The Interop Working Group (IWG) of the OpenFabrics Alliance asked me
to bring a question to the Open MPI user and developer communities: is
anyone interested in having a single MPI job span HCAs or RNICs from
multiple vendors?  (pardon the cross-posting, but I did want to ask
each group separately -- because the answers may be different)

The interop testing lab at the University of New Hampshire
(http://www.iol.unh.edu/services/testing/ofa/) discovered that most (all?)
MPI implementations fail when having a single MPI job span HCAs from
multiple vendors and/or span RNICs from multiple vendors.  I don't remember
the exact details (and they may not be public, anyway), but I'm pretty sure
that OMPI failed when used with QLogic and Mellanox HCAs in a single MPI
job.  This is fairly unsurprising, given how we tune Open MPI's use of
OpenFabrics-capable hardware based on our .ini file.

So my question is: does anyone want/need to support jobs that span
HCAs from multiple vendors and/or RNICs from multiple vendors?

For these three cases:

1) Different vendor ID but same OFED driver and basic chip
2) Same chip vendor, different OFED driver (mthca vs mlx4)
3) Any OFED-supported IB HCA

IMHO:

Number one should just work. We may at times have some nodes with HCAs that
have been flashed with non-standard/non-vendor firmware.

Number two is something I would kind of expect to work. A possible situation
where I'd need it is if I temporarily use an older HCA (mthca) to get a node
going on a cluster with ConnectX (mlx4). Another case could be a cluster with
two partitions with different HCAs.

Number three would be nice to have. I think many users would assume it to
work. Why not? They have symmetric software, all nodes run OFED, all have
working IB... It would have worked if their nodes had had different kinds of
Ethernet NICs...

/Peter



--
Jeff Squyres
Cisco Systems



[OMPI users] Doing a lot of spawns does not work with ompi 1.3 BUT works with ompi 1.2.7

2009-01-27 Thread Anthony Thevenin

Hello,

I have two C codes :
   - master.c : spawns a slave
   - slave.c : spawned by the master

If the spawn is included in a do-loop, I can do only 123 spawns before getting
the following errors:

ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file base/iof_base_setup.c at line 112
ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 203

This test works perfectly even for a lot of spawns (more than 1000) with 
Open-MPI 1.2.7.

You will find the following files attached:
config.log.tgz
ompi_info.out.tgz
ifconfig.out.tgz
master.c.tgz
slave.c.tgz


command used to run my application :
mpirun -n 1 ./master

COMPILER:
PGI 7.1

PATH : 
/space/thevenin/openmpi-1.3_pgi/bin:/usr/local/tecplot/bin:/usr/local/pgi/linux86-64/7.1/bin:/usr/totalview/bin:/usr/local/matlab71/bin:/usr/bin:/usr/ucb:/usr/sbin:/usr/bsd:/sbin:/bin:/usr/bin/X11:/usr/etc:/usr/local/bin:/usr/bin:/usr/bsd:/sbin:/usr/bin/X11:.


LD_LIBRARY_PATH:
/space/thevenin/openmpi-1.3_pgi/lib:/usr/local/lib


If you have any idea why this occurs, please tell me what to do to
make it work.

Thank you very much


Anthony





config.log.tgz
Description: application/compressed-tar


ifconfig.out.tgz
Description: application/compressed-tar


master.c.tgz
Description: application/compressed-tar


ompi_info.out.tgz
Description: application/compressed-tar


slave.c.tgz
Description: application/compressed-tar


Re: [OMPI users] Heterogeneous OpenFabrics hardware

2009-01-27 Thread Peter Kjellstrom
On Tuesday 27 January 2009, Jeff Squyres wrote:
> It is worth clarifying a point in this discussion that I neglected to
> mention in my initial post: although Open MPI may not work *by
> default* with heterogeneous HCAs/RNICs, it is quite possible/likely
> that if you manually configure Open MPI to use the same verbs/hardware
> settings across all your HCAs/RNICs (assuming that you use a set of
> values that is compatible with all your hardware) that MPI jobs
> spanning multiple different kinds of HCAs or RNICs will work fine.
>
> See this post on the devel list for a few more details:
>
>  http://www.open-mpi.org/community/lists/devel/2009/01/5314.php

So is it correct that each rank will check its HCA-model and then pick up 
suitable settings for that HCA?

If so, maybe Open MPI could fall back to very conservative settings if more
than one HCA model was detected among the ranks. Or would this require
communication at a stage where that would be complicated and/or ugly?

/Peter




Re: [OMPI users] Doing a lot of spawns does not work with ompi 1.3 BUT works with ompi 1.2.7

2009-01-27 Thread Ralph Castain
Just to be clear - you are doing over 1000 MPI_Comm_spawn calls to  
launch all the procs on a single node???


In the 1.2 series, every call to MPI_Comm_spawn would launch another  
daemon on the node, which would then fork/exec the specified app. If  
you look at your process table, you will see a whole lot of "orted"  
processes. Thus, you wouldn't run out of pipes because every orted  
only opened enough for a single process.


In the 1.3 series, there is only one daemon on each node (mpirun fills  
that function on its node). MPI_Comm_spawn simply reuses that daemon  
to launch the new proc(s). Thus, there is a limit to the number of  
procs you can start on any node that is set by the #pipes a process  
can open.


You can adjust that number, of course. You can look it up readily  
enough for your particular system. However, you may find that 1000  
comm_spawns on a single node will lead to poor performance as the  
procs contend for processor attention.
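
[A minimal sketch, assuming the pipe limit in question maps to the
per-process open-file-descriptor limit that most Linux systems expose
through ulimit; the value 4096 is only an example:

shell$ ulimit -n          # show the current per-process open-file limit
shell$ ulimit -n 4096     # raise the soft limit for this shell, if the hard limit allows
shell$ mpirun -n 1 ./master
]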


Hope that helps
Ralph


On Jan 27, 2009, at 7:59 AM, Anthony Thevenin wrote:


Hello,

I have two C codes :
  - master.c : spawns a slave
  - slave.c : spawned by the master

If the spawn is included in a do-loop, I can do only 123 spawns
before getting the following errors:


ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file base/iof_base_setup.c at line 112
ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 203


This test works perfectly even for a lot of spawns (more than 1000)  
with Open-MPI 1.2.7.


You will find the following files attached:
config.log.tgz
ompi_info.out.tgz
ifconfig.out.tgz
master.c.tgz
slave.c.tgz


command used to run my application :
mpirun -n 1 ./master

COMPILER:
PGI 7.1

PATH : /space/thevenin/openmpi-1.3_pgi/bin:/usr/local/tecplot/bin:/usr/local/pgi/linux86-64/7.1/bin:/usr/totalview/bin:/usr/local/matlab71/bin:/usr/bin:/usr/ucb:/usr/sbin:/usr/bsd:/sbin:/bin:/usr/bin/X11:/usr/etc:/usr/local/bin:/usr/bin:/usr/bsd:/sbin:/usr/bin/X11:.


LD_LIBRARY_PATH:
/space/thevenin/openmpi-1.3_pgi/lib:/usr/local/lib


If you have any idea why this occurs, please tell me what to do
to make it work.

Thank you very much


Anthony



<config.log.tgz> <ifconfig.out.tgz> <master.c.tgz> <ompi_info.out.tgz>





Re: [OMPI users] Doing a lot of spawns does not work with ompi 1.3 BUT works with ompi 1.2.7

2009-01-27 Thread Anthony Thevenin

Thank you!

Yes, I am trying to do over 1000 MPI_Comm_spawn calls on a single node.
But as I mentioned in my previous email, the MPI_Comm_spawn is in a
do-loop, so on this single node I only have 2 procs (master and slave).
The next spawned slave comes only when the previous slave is dead.
We (my team and I) are developing a coupler which launches the codes
dynamically. Sometimes, depending on the coupling algorithm, we need to
spawn a code (which can be parallel or not) a lot of times (more than 1000).
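
[A minimal sketch of the spawn-in-a-loop pattern described above
(hypothetical illustration only; this is not the attached master.c):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm child;
    int i;

    MPI_Init(&argc, &argv);
    for (i = 0; i < 1000; i++) {
        /* Launch one slave; MPI_COMM_SELF means only this master takes part. */
        MPI_Comm_spawn("./slave", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
        /* ... exchange data with the slave over the intercommunicator ... */
        /* Disconnect before spawning the next slave; in the pattern described
           above, the next spawn happens only after this slave has finished. */
        MPI_Comm_disconnect(&child);
    }
    MPI_Finalize();
    return 0;
}
]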


Anthony




Ralph Castain wrote:
Just to be clear - you are doing over 1000 MPI_Comm_spawn calls to 
launch all the procs on a single node???


In the 1.2 series, every call to MPI_Comm_spawn would launch another 
daemon on the node, which would then fork/exec the specified app. If 
you look at your process table, you will see a whole lot of "orted" 
processes. Thus, you wouldn't run out of pipes because every orted 
only opened enough for a single process.


In the 1.3 series, there is only one daemon on each node (mpirun fills 
that function on its node). MPI_Comm_spawn simply reuses that daemon 
to launch the new proc(s). Thus, there is a limit to the number of 
procs you can start on any node that is set by the #pipes a process 
can open.


You can adjust that number, of course. You can look it up readily 
enough for your particular system. However, you may find that 1000 
comm_spawns on a single node will lead to poor performance as the 
procs contend for processor attention.


Hope that helps
Ralph


On Jan 27, 2009, at 7:59 AM, Anthony Thevenin wrote:


Hello,

I have two C codes :
  - master.c : spawns a slave
  - slave.c : spawned by the master

If the spawn is included in a do-loop, I can do only 123 spawns before
getting the following errors:


ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file base/iof_base_setup.c at line 112
ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 203


This test works perfectly even for a lot of spawns (more than 1000) 
with Open-MPI 1.2.7.


You will find the following files attached:
config.log.tgz
ompi_info.out.tgz
ifconfig.out.tgz
master.c.tgz
slave.c.tgz


command used to run my application :
mpirun -n 1 ./master

COMPILER:
PGI 7.1

PATH : 
/space/thevenin/openmpi-1.3_pgi/bin:/usr/local/tecplot/bin:/usr/local/pgi/linux86-64/7.1/bin:/usr/totalview/bin:/usr/local/matlab71/bin:/usr/bin:/usr/ucb:/usr/sbin:/usr/bsd:/sbin:/bin:/usr/bin/X11:/usr/etc:/usr/local/bin:/usr/bin:/usr/bsd:/sbin:/usr/bin/X11:. 



LD_LIBRARY_PATH:
/space/thevenin/openmpi-1.3_pgi/lib:/usr/local/lib


If you have any idea why this occurs, please tell me what to do
to make it work.

Thank you very much


Anthony








Re: [OMPI users] Heterogeneous OpenFabrics hardware

2009-01-27 Thread Jeff Squyres

On Jan 27, 2009, at 10:19 AM, Peter Kjellstrom wrote:


It is worth clarifying a point in this discussion that I neglected to
mention in my initial post: although Open MPI may not work *by default*
with heterogeneous HCAs/RNICs, it is quite possible/likely that if you
manually configure Open MPI to use the same verbs/hardware settings across
all your HCAs/RNICs (assuming that you use a set of values that is
compatible with all your hardware), MPI jobs spanning multiple different
kinds of HCAs or RNICs will work fine.

See this post on the devel list for a few more details:

http://www.open-mpi.org/community/lists/devel/2009/01/5314.php


So is it correct that each rank will check its HCA-model and then pick up
suitable settings for that HCA?


Correct.  We have an INI-style file that is installed in
$pkgdir/mca-btl-openib-device-params.ini (typically expands to
$prefix/share/openmpi/mca-btl-openib-device-params.ini).  This file contains
a bunch of device-specific parameters, but it also has a "general" section
that can be applied to any device if no specific match is found.
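
[A quick way to see what that file contains for a given installation; the
path below assumes a --prefix of /usr/local/openmpi (adjust to your own),
and the section names in the file vary by device and release:

shell$ less /usr/local/openmpi/share/openmpi/mca-btl-openib-device-params.ini
]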


If so, maybe Open MPI could fall back to very conservative settings if more
than one HCA model was detected among the ranks. Or would this require
communication at a stage where that would be complicated and/or ugly?



Today we don't do this kind of check; we just assume that every other
MPI process is using the same hardware and/or that the settings pulled from
the INI file will be compatible.  AFAIK, most (all?) other MPIs do
the same thing.


We *could* do that kind of check, but:

a) there hasn't been enough customer demand for it / no one has  
submitted a patch to do so
b) it might be a bit complicated because the startup sequence in the  
openib BTL is a little complex
c) we are definitely moving to a scenario (at scale) where there is  
little/no communication at startup about coordinating information from  
all of the MPI peer processes; this strategy might be problematic in  
those scenarios (i.e., the coordination / determination of  
"conservative" settings would have to be done by a human and likely  
pre-posted to a file on each node -- still hand-waving a bit because  
that design isn't finalized/implemented yet)
d) programmatically finding what "conservative" settings are workable
across a wide variety of devices may be problematic because individual  
device capabilities can vary wildly (does it have SRQ?  can it support  
more than one BSRQ?  what's a good MTU?  ...?)


I think d) is a big sticking point; we *could* make extremely  
conservative settings that should probably work everywhere.  I can see  
at least one potential problematic scenario:


- cluster has N nodes
- a year later, an HCA in 1 node dies
- get a new HCA, perhaps even from a different vendor
- capabilities of the new HCA and old HCAs are different
- so OMPI falls back to "extreme conservative" settings
- jobs that run on that one node suffer in performance
- jobs that do not run on that node see "normal" performance
- users are confused

I suppose that we could print a Big Hairy Warning(tm) if we fall back  
to extreme conservative settings, but it still seems to create the  
potential to violate the Law of Least Astonishment.


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] OpenMPI-1.3 and XGrid

2009-01-27 Thread Jeff Squyres
Thanks for reporting this Frank -- looks like we borked a symbol in  
the xgrid component in 1.3.  It seems that the compiler doesn't  
complain about the missing symbol; it only shows up when you try to  
*run* with it.  Whoops!


I filed ticket https://svn.open-mpi.org/trac/ompi/ticket/1777 about  
this issue.



On Jan 23, 2009, at 3:11 PM, Frank Kahle wrote:

I'm running OpenMPI on OS X 10.4.11. After upgrading to OpenMPI-1.3 I
get the following error when submitting a job via XGrid:


dyld: lazy symbol binding failed: Symbol not found: _orte_pointer_array_add

 Referenced from: /usr/local/mpi/lib/openmpi/mca_plm_xgrid.so
 Expected in: flat namespace

Here you'll find ompi_info's output:
[g5-node-1:~] motte% ompi_info
Package: Open MPI root@ibi.local Distribution
   Open MPI: 1.3
  Open MPI SVN revision: r20295
  Open MPI release date: Jan 19, 2009
   Open RTE: 1.3
  Open RTE SVN revision: r20295
  Open RTE release date: Jan 19, 2009
   OPAL: 1.3
  OPAL SVN revision: r20295
  OPAL release date: Jan 19, 2009
   Ident string: 1.3
 Prefix: /usr/local/mpi
Configured architecture: powerpc-apple-darwin8
 Configure host: ibi.local
  Configured by: root
  Configured on: Tue Jan 20 19:45:26 CET 2009
 Configure host: ibi.local
   Built by: root
   Built on: Tue Jan 20 20:49:48 CET 2009
 Built host: ibi.local
 C bindings: yes
   C++ bindings: yes
 Fortran77 bindings: yes (single underscore)
 Fortran90 bindings: yes
Fortran90 bindings size: small
 C compiler: gcc-4.3
C compiler absolute: /usr/local/bin/gcc-4.3
   C++ compiler: c++-4.3
  C++ compiler absolute: /usr/local/bin/c++-4.3
 Fortran77 compiler: gfortran-4.3
 Fortran77 compiler abs: /usr/local/bin/gfortran-4.3
 Fortran90 compiler: gfortran-4.3
 Fortran90 compiler abs: /usr/local/bin/gfortran-4.3
C profiling: yes
  C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
 C++ exceptions: no
 Thread support: posix (mpi: no, progress: no)
  Sparse Groups: no
 Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
  Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
  MPI_WTIME support: gettimeofday
Symbol visibility support: yes
  FT Checkpoint support: no  (checkpoint thread: no)
  MCA backtrace: darwin (MCA v2.0, API v2.0, Component v1.3)
  MCA paffinity: darwin (MCA v2.0, API v2.0, Component v1.3)
  MCA carto: auto_detect (MCA v2.0, API v2.0, Component  
v1.3)

  MCA carto: file (MCA v2.0, API v2.0, Component v1.3)
  MCA maffinity: first_use (MCA v2.0, API v2.0, Component  
v1.3)

  MCA timer: darwin (MCA v2.0, API v2.0, Component v1.3)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3)
 MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3)
  MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3)
  MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3)
   MCA coll: basic (MCA v2.0, API v2.0, Component v1.3)
   MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3)
   MCA coll: inter (MCA v2.0, API v2.0, Component v1.3)
   MCA coll: self (MCA v2.0, API v2.0, Component v1.3)
   MCA coll: sm (MCA v2.0, API v2.0, Component v1.3)
   MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3)
 MCA io: romio (MCA v2.0, API v2.0, Component v1.3)
  MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3)
  MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3)
  MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3)
MCA pml: cm (MCA v2.0, API v2.0, Component v1.3)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3)
MCA pml: v (MCA v2.0, API v2.0, Component v1.3)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3)
 MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3)
MCA btl: self (MCA v2.0, API v2.0, Component v1.3)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.3)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3)
   MCA topo: unity (MCA v2.0, API v2.0, Component v1.3)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.3)
MCA iof: tool (MCA v2.0, API v2.0, Compone

Re: [OMPI users] Cannot compile on Linux Itanium system

2009-01-27 Thread Jeff Squyres

Thanks Joe -- let us know what you find...

From his config.log, I think his configure line was:

./configure --prefix=/opt/openmpi-1.3

See the full attachment here (scroll down to the bottom of the web  
page):


http://www.open-mpi.org/community/lists/users/2009/01/7810.php



On Jan 26, 2009, at 4:31 PM, Joe Griffin wrote:


Tony,

I don't know what iac is.  I use ias for my ASM code:

ia64b <82> cd /opt/intel
ia64b <83> find . -name 'iac'
ia64b <84> find . -name 'ias'
./fc/10.1.012/bin/ias
./cc/10.1.012/bin/ias

Anyway, if you want another data point to see if my compilers work, I
will gladly try to compile if you send me your configure / make lines.

Aiming to help if I can,
Joe



-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Iannetti, Anthony C. (GRC-RTB0)
Sent: Monday, January 26, 2009 12:45 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] Cannot compile on Linux Itanium system

Jeff,

   I could successfully compile OpenMPI versions 1.2.X on Itanium Linux
with the same compilers.  I was never able to compile the 1.3 beta
versions on IA64 Linux.

Joe,

  I am using whatever assembler that ./configure provides.  I believe it
is icc.  Should I set AS (I think) to iac?


Thanks,
Tony



--
Jeff Squyres
Cisco Systems



[OMPI users] Compilers

2009-01-27 Thread Amos Leffler
Hi all,
 I want to compile Open-mpi  using intel compilers.
Unfortunately the Series 10 C compiler(icc) license has expired.  I
downloaded and looked at the Series 11 C++ compiler (no C compiler listed)
and would like to know if you can use this together with an enclosed or
obtained C compiler from Intel.  The release notes are a bit overwhelming!
Is it possible to use the standard Linux gcc instead?

Amos Leffler
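
[A sketch of the two options, assuming the standard configure compiler
variables (CC/CXX/F77/FC) that Open MPI's configure accepts; the prefix is
just an example:

shell$ ./configure --prefix=/opt/openmpi CC=gcc CXX=g++ F77=gfortran FC=gfortran
shell$ ./configure --prefix=/opt/openmpi CC=icc CXX=icpc F77=ifort FC=ifort

Either the GNU toolchain or the Intel compilers can be selected explicitly
at configure time this way.]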


[OMPI users] v1.3: mca_common_sm_mmap_init error

2009-01-27 Thread Prentice Bisbal
I just installed OpenMPI 1.3 with tight integration for SGE. Version
1.2.8 was working just fine for several months in the same arrangement.

Now that I've upgraded to 1.3, I get the following errors in my standard
error file:

mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node09.aurora_0/21400/1/shared_mem_pool.node09.aurora failed with errno=2
[node23.aurora:20601] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node23.aurora_0/21400/1/shared_mem_pool.node23.aurora failed with errno=2
[node46.aurora:12118] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node46.aurora_0/21400/1/shared_mem_pool.node46.aurora failed with errno=2
[node15.aurora:12421] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node15.aurora_0/21400/1/shared_mem_pool.node15.aurora failed with errno=2
[node20.aurora:12534] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node20.aurora_0/21400/1/shared_mem_pool.node20.aurora failed with errno=2
[node16.aurora:12573] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node16.aurora_0/21400/1/shared_mem_pool.node16.aurora failed with errno=2

I've tested 3-4 different times, and the number of hosts that produces
this error varies, as well as which hosts produce it. My program
seems to run fine, but it's just a simple "Hello, World!" program. Any
ideas? Is this a bug in 1.3?


-- Prentice
-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-01-27 Thread Mostyn Lewis

Sort of ditto but with SVN release at 20123 (and earlier):

e.g.

[r2250_46:30018] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_46_0/25682/1/shared_mem_pool.r2250_46 failed with errno=2
[r2250_63:05292] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_63_0/25682/1/shared_mem_pool.r2250_63 failed with errno=2
[r2250_57:17527] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_57_0/25682/1/shared_mem_pool.r2250_57 failed with errno=2
[r2250_68:13553] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_68_0/25682/1/shared_mem_pool.r2250_68 failed with errno=2
[r2250_50:06541] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_50_0/25682/1/shared_mem_pool.r2250_50 failed with errno=2
[r2250_49:29237] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_49_0/25682/1/shared_mem_pool.r2250_49 failed with errno=2
[r2250_66:19066] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_66_0/25682/1/shared_mem_pool.r2250_66 failed with errno=2
[r2250_58:24902] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_58_0/25682/1/shared_mem_pool.r2250_58 failed with errno=2
[r2250_69:27426] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_69_0/25682/1/shared_mem_pool.r2250_69 failed with errno=2
[r2250_60:30560] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_60_0/25682/1/shared_mem_pool.r2250_60 failed with errno=2

File not found in sm.

10 of them across 32 nodes (8 cores per node (2 sockets x quad-core))
"Apparently harmless"?

DM

On Tue, 27 Jan 2009, Prentice Bisbal wrote:


I just installed OpenMPI 1.3 with tight integration for SGE. Version
1.2.8 was working just fine for several months in the same arrangement.

Now that I've upgraded to 1.3, I get the following errors in my standard
error file:

mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node09.aurora_0/21400/1/shared_mem_pool.node09.aurora failed with errno=2
[node23.aurora:20601] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node23.aurora_0/21400/1/shared_mem_pool.node23.aurora failed with errno=2
[node46.aurora:12118] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node46.aurora_0/21400/1/shared_mem_pool.node46.aurora failed with errno=2
[node15.aurora:12421] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node15.aurora_0/21400/1/shared_mem_pool.node15.aurora failed with errno=2
[node20.aurora:12534] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node20.aurora_0/21400/1/shared_mem_pool.node20.aurora failed with errno=2
[node16.aurora:12573] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node16.aurora_0/21400/1/shared_mem_pool.node16.aurora failed with errno=2

I've tested 3-4 different times, and the number of hosts that produces
this error varies, as well as which hosts produce it. My program
seems to run fine, but it's just a simple "Hello, World!" program. Any
ideas? Is this a bug in 1.3?


-- Prentice
--
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ