Re: [OMPI users] Heterogeneous OpenFabrics hardware
Hi, I can think of a few scenarios where interoperability would be helpful, but I guess in most cases you can live without it. 1. Some university departments buy tiny clusters (4-8 nodes) and buy the next one when more projects/funding become available, thus ending up with 2-4 different CPU generations or steppings and probably different HCA versions. If your MPI program does load balancing you probably don't care about slightly different CPU speeds and you are glad if you can use all machines. 2. You operate a medium to large cluster (300+ nodes) and after, say, a year a few HCAs might break and have to be replaced. I can imagine that it is hard to get an HCA with exactly the same chipset. If you end up with a few nodes that can't run MPI programs with the rest, that would be unfortunate. best regards, Samuel

Don Kerr wrote: Jeff, Did the IWG say anything about there being a chipset issue? For example, if a vendor, say Sun, wraps Mellanox chips in its own HCAs, would a Mellanox HCA and a Sun HCA work together? -DON

On 01/26/09 14:19, Jeff Squyres wrote: The Interop Working Group (IWG) of the OpenFabrics Alliance asked me to bring a question to the Open MPI user and developer communities: is anyone interested in having a single MPI job span HCAs or RNICs from multiple vendors? (pardon the cross-posting, but I did want to ask each group separately -- because the answers may be different) The interop testing lab at the University of New Hampshire (http://www.iol.unh.edu/services/testing/ofa/) discovered that most (all?) MPI implementations fail when having a single MPI job span HCAs from multiple vendors and/or span RNICs from multiple vendors. I don't remember the exact details (and they may not be public, anyway), but I'm pretty sure that OMPI failed when used with QLogic and Mellanox HCAs in a single MPI job. This is fairly unsurprising, given how we tune Open MPI's use of OpenFabrics-capable hardware based on our .ini file. So my question is: does anyone want/need to support jobs that span HCAs from multiple vendors and/or RNICs from multiple vendors?
Re: [OMPI users] Asynchronous behaviour of MPI Collectives
Wow! Great and useful explanation. Thanks, Jeff. 2009/1/23 Jeff Squyres : > FWIW, OMPI v1.3 is much better about registered memory usage than the 1.2 > series. We introduced some new things, including being able to specify > exactly what receive queues you want. See: > > ...gaaah! It's not on our FAQ yet. :-( > > The main idea is that there is a new MCA parameter for the openib BTL: > btl_openib_receive_queues. It takes a colon-delimited string listing one or > more receive queues of specific sizes and characteristics. For now, all > processes in the job *must* use the same string. You can specify three > kinds of receive queues: > > - P: per-peer queues > - S: shared receive queues > - X: XRC queues (with OFED 1.4 and later with specific Mellanox hardware) > > Here's a copy-n-paste of our help file describing the format of each: > > Per-peer receive queues require between 1 and 5 parameters: > > 1. Buffer size in bytes (mandatory) > 2. Number of buffers (optional; defaults to 8) > 3. Low buffer count watermark (optional; defaults to (num_buffers / 2)) > 4. Credit window size (optional; defaults to (low_watermark / 2)) > 5. Number of buffers reserved for credit messages (optional; > defaults to (num_buffers*2-1)/credit_window) > > Example: P,128,256,128,16 > - 128 byte buffers > - 256 buffers to receive incoming MPI messages > - When the number of available buffers reaches 128, re-post 128 more > buffers to reach a total of 256 > - If the number of available credits reaches 16, send an explicit > credit message to the sender > - Defaulting to ((256 * 2) - 1) / 16 = 31; this many buffers are > reserved for explicit credit messages > > Shared receive queues can take between 1 and 4 parameters: > > 1. Buffer size in bytes (mandatory) > 2. Number of buffers (optional; defaults to 16) > 3. Low buffer count watermark (optional; defaults to (num_buffers / 2)) > 4. Maximum number of outstanding sends a sender can have (optional; > defaults to (low_watermark / 4)) > > Example: S,1024,256,128,32 > - 1024 byte buffers > - 256 buffers to receive incoming MPI messages > - When the number of available buffers reaches 128, re-post 128 more > buffers to reach a total of 256 > - A sender will not send to a peer unless it has less than 32 > outstanding sends to that peer. > > IIRC, "X" takes the same parameters as "S"...? Note that if you use *any* > XRC queues, then *all* of your queues must be XRC. > > OMPI defaults to a btl_openib_receive_queues value that may be specific to your > hardware. For example, ConnectX defaults to the following value: > > shell$ ompi_info --param btl openib --parsable | grep receive_queues > mca:btl:openib:param:btl_openib_receive_queues:value:P,128,256,192,128:S,2048,256,128,32:S,12288,256,128,32:S,65536,256,128,32 > mca:btl:openib:param:btl_openib_receive_queues:data_source:default value > mca:btl:openib:param:btl_openib_receive_queues:status:writable > mca:btl:openib:param:btl_openib_receive_queues:help:Colon-delimited, comma delimited list of receive queues: P,4096,8,6,4:P,32768,8,6,4 > mca:btl:openib:param:btl_openib_receive_queues:deprecated:no > > Hope that helps! > > > > > On Jan 23, 2009, at 9:27 AM, Igor Kozin wrote: > >> Hi Gabriele, >> it might be that your message size is too large for the available memory per >> node. >> I had a problem with IMB when I was not able to run Alltoall to completion >> on N=128, ppn=8 on our cluster with 16 GB per node. 
You'd think 16 GB is >> quite a lot, but when you do the maths: >> 2 * 4 MB * 128 procs * 8 procs/node = 8 GB/node, plus you need to double >> because of buffering. I was told by Mellanox (our cards are ConnectX cards) >> that they introduced XRC in OFED 1.3 in addition to Shared Receive Queues, >> which should reduce the memory footprint, but I have not tested this yet. >> HTH, >> Igor >> 2009/1/23 Gabriele Fatigati >> Hi Igor, >> My message size is 4096 KB and I have 4 procs per core. >> There isn't any difference using different algorithms. >> >> 2009/1/23 Igor Kozin : >> > what is your message size and the number of cores per node? >> > is there any difference using different algorithms? >> > >> > 2009/1/23 Gabriele Fatigati >> >> >> >> Hi Jeff, >> >> I would like to understand why, if I run over 512 procs or more, my >> >> code stops in an MPI collective, even with a small send buffer. All >> >> processors are locked in the call, doing nothing. But if I add an >> >> MPI_Barrier after the MPI collective, it works! I run over an InfiniBand >> >> network. >> >> >> >> I know many people with this strange problem; I think there is a >> >> strange interaction between InfiniBand and Open MPI that causes it. >> >> >> >> >> >> >> >> 2009/1/23 Jeff Squyres : >> >> > On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote: >> >> > >> >> >> I've noted that OpenMPI has an asynchronous behaviour in the >> >> >> collective >> >> >> calls. >> >> >> The processors don't wait for the other procs to arrive in the call. >> >> > >> >> > T
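One practical note on the btl_openib_receive_queues parameter described above: since all processes in the job must use the same string, the usual way to set it is once on the mpirun command line (or in an MCA parameter file), for example

mpirun --mca btl_openib_receive_queues P,128,256,128,16:S,65536,256,128,32 -np 64 ./my_app

Here ./my_app and the queue values are only placeholders taken from the examples quoted in this thread, not a recommendation; the right buffer sizes and counts depend on your hardware and on the per-node memory budget discussed above.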
Re: [OMPI users] Heterogeneous OpenFabrics hardware
On Monday 26 January 2009, Jeff Squyres wrote: > The Interop Working Group (IWG) of the OpenFabrics Alliance asked me > to bring a question to the Open MPI user and developer communities: is > anyone interested in having a single MPI job span HCAs or RNICs from > multiple vendors? (pardon the cross-posting, but I did want to ask > each group separately -- because the answers may be different) > > The interop testing lab at the University of New Hampshire > (http://www.iol.unh.edu/services/testing/ofa/ ) discovered that most (all?) > MPI implementations fail when having a single MPI job span HCAs from > multiple vendors and/or span RNICs from multiple vendors. I don't remember > the exact details (and they may not be public, anyway), but I'm pretty sure > that OMPI failed when used with QLogic and Mellanox HCAs in a single MPI > job. This is fairly unsurprising, given how we tune Open MPI's use of > OpenFabrics-capable hardware based on our .ini file. > > So my question is: does anyone want/need to support jobs that span > HCAs from multiple vendors and/or RNICs from multiple vendors? For these three cases: 1) Different vendor id but same OFED driver and basic chip 2) Same chip vendor, different OFED driver (mthca vs mlx4) 3) Any OFED-supported IB HCA IMHO: Number one should just work. We may at times have some nodes with HCAs that have been flashed with non-standard/non-vendor firmware. Number two is something I would kind of expect to work. A possible situation where I'd need it is if I temporarily use an older HCA (mthca) to get a node going on a cluster with ConnectX (mlx4). Another case could be a cluster with two partitions with different HCAs. Number three would be nice to have. I think many users would assume it to work. Why not? They have symmetric software, all nodes run OFED, all have working IB... It would have worked if their nodes had had different kinds of Ethernet NICs... /Peter
Re: [OMPI users] Heterogeneous OpenFabrics hardware
It is worth clarifying a point in this discussion that I neglected to mention in my initial post: although Open MPI may not work *by default* with heterogeneous HCAs/RNICs, it is quite possible/likely that MPI jobs spanning multiple different kinds of HCAs or RNICs will work fine if you manually configure Open MPI to use the same verbs/hardware settings across all of them (assuming that you use a set of values that is compatible with all your hardware). See this post on the devel list for a few more details: http://www.open-mpi.org/community/lists/devel/2009/01/5314.php On Jan 27, 2009, at 6:08 AM, Peter Kjellstrom wrote: On Monday 26 January 2009, Jeff Squyres wrote: The Interop Working Group (IWG) of the OpenFabrics Alliance asked me to bring a question to the Open MPI user and developer communities: is anyone interested in having a single MPI job span HCAs or RNICs from multiple vendors? (pardon the cross-posting, but I did want to ask each group separately -- because the answers may be different) The interop testing lab at the University of New Hampshire (http://www.iol.unh.edu/services/testing/ofa/ ) discovered that most (all?) MPI implementations fail when having a single MPI job span HCAs from multiple vendors and/or span RNICs from multiple vendors. I don't remember the exact details (and they may not be public, anyway), but I'm pretty sure that OMPI failed when used with QLogic and Mellanox HCAs in a single MPI job. This is fairly unsurprising, given how we tune Open MPI's use of OpenFabrics-capable hardware based on our .ini file. So my question is: does anyone want/need to support jobs that span HCAs from multiple vendors and/or RNICs from multiple vendors? For these three cases: 1) Different vendor id but same OFED driver and basic chip 2) Same chip vendor, different OFED driver (mthca vs mlx4) 3) Any OFED-supported IB HCA IMHO: Number one should just work. We may at times have some nodes with HCAs that have been flashed with non-standard/non-vendor firmware. Number two is something I would kind of expect to work. A possible situation where I'd need it is if I temporarily use an older HCA (mthca) to get a node going on a cluster with ConnectX (mlx4). Another case could be a cluster with two partitions with different HCAs. Number three would be nice to have. I think many users would assume it to work. Why not? They have symmetric software, all nodes run OFED, all have working IB... It would have worked if their nodes had had different kinds of Ethernet NICs... /Peter -- Jeff Squyres Cisco Systems
[OMPI users] Doing a lot of spawns does not work with ompi 1.3 BUT works with ompi 1.2.7
Hello, I have two C codes: - master.c : spawns a slave - slave.c : spawned by the master If the spawn is included in a do-loop, I can do only 123 spawns before getting the following errors: ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file base/iof_base_setup.c at line 112 ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 203 This test works perfectly, even for a lot of spawns (more than 1000), with Open MPI 1.2.7. You will find the following files attached: config.log.tgz ompi_info.out.tgz ifconfig.out.tgz master.c.tgz slave.c.tgz command used to run my application : mpirun -n 1 ./master COMPILER: PGI 7.1 PATH : /space/thevenin/openmpi-1.3_pgi/bin:/usr/local/tecplot/bin:/usr/local/pgi/linux86-64/7.1/bin:/usr/totalview/bin:/usr/local/matlab71/bin:/usr/bin:/usr/ucb:/usr/sbin:/usr/bsd:/sbin:/bin:/usr/bin/X11:/usr/etc:/usr/local/bin:/usr/bin:/usr/bsd:/sbin:/usr/bin/X11:. LD_LIBRARY_PATH: /space/thevenin/openmpi-1.3_pgi/lib:/usr/local/lib If you have any idea why this occurs, please tell me what to do to make it work. Thank you very much Anthony
Re: [OMPI users] Heterogeneous OpenFabrics hardware
On Tuesday 27 January 2009, Jeff Squyres wrote: > It is worth clarifying a point in this discussion that I neglected to > mention in my initial post: although Open MPI may not work *by > default* with heterogeneous HCAs/RNICs, it is quite possible/likely > that MPI jobs spanning multiple different kinds of HCAs or RNICs will > work fine if you manually configure Open MPI to use the same verbs/hardware > settings across all of them (assuming that you use a set of > values that is compatible with all your hardware). > > See this post on the devel list for a few more details: > > http://www.open-mpi.org/community/lists/devel/2009/01/5314.php So is it correct that each rank will check its HCA model and then pick up suitable settings for that HCA? If so, maybe OpenMPI could fall back to very conservative settings if more than one HCA model was detected among the ranks. Or would this require communication in a stage where that would be complicated and/or ugly? /Peter
Re: [OMPI users] Doing a lot of spawns does not work with ompi 1.3 BUT works with ompi 1.2.7
Just to be clear - you are doing over 1000 MPI_Comm_spawn calls to launch all the procs on a single node??? In the 1.2 series, every call to MPI_Comm_spawn would launch another daemon on the node, which would then fork/exec the specified app. If you look at your process table, you will see a whole lot of "orted" processes. Thus, you wouldn't run out of pipes because every orted only opened enough for a single process. In the 1.3 series, there is only one daemon on each node (mpirun fills that function on its node). MPI_Comm_spawn simply reuses that daemon to launch the new proc(s). Thus, there is a limit to the number of procs you can start on any node that is set by the number of pipes a process can open. You can adjust that number, of course; you can look it up readily enough for your particular system. However, you may find that 1000 comm_spawns on a single node will lead to poor performance as the procs contend for processor attention. Hope that helps Ralph On Jan 27, 2009, at 7:59 AM, Anthony Thevenin wrote: Hello, I have two C codes: - master.c : spawns a slave - slave.c : spawned by the master If the spawn is included in a do-loop, I can do only 123 spawns before getting the following errors: ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file base/iof_base_setup.c at line 112 ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 203 This test works perfectly, even for a lot of spawns (more than 1000), with Open MPI 1.2.7. You will find the following files attached: config.log.tgz ompi_info.out.tgz ifconfig.out.tgz master.c.tgz slave.c.tgz command used to run my application : mpirun -n 1 ./master COMPILER: PGI 7.1 PATH : /space/thevenin/openmpi-1.3_pgi/bin:/usr/local/tecplot/bin:/usr/local/pgi/linux86-64/7.1/bin:/usr/totalview/bin:/usr/local/matlab71/bin:/usr/bin:/usr/ucb:/usr/sbin:/usr/bsd:/sbin:/bin:/usr/bin/X11:/usr/etc:/usr/local/bin:/usr/bin:/usr/bsd:/sbin:/usr/bin/X11:. LD_LIBRARY_PATH: /space/thevenin/openmpi-1.3_pgi/lib:/usr/local/lib If you have any idea why this occurs, please tell me what to do to make it work. Thank you very much Anthony
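A small aside on "you can adjust that number": on most Linux systems the limit Ralph refers to is the per-process open file descriptor limit (pipes count against it), which you can inspect with ulimit -n and raise, e.g. ulimit -n 4096, in the shell that launches mpirun, subject to the hard limit your administrator has set. Since in 1.3 the single daemon (mpirun, on its own node) opens the pipes for every spawned process, that is presumably the process whose limit matters; treat this as a pointer to check rather than a verified fix.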
Re: [OMPI users] Doing a lot of spawns does not work with ompi 1.3 BUT works with ompi 1.2.7
Thank you! Yes, I am trying to do over 1000 MPI_Comm_spawn calls on a single node. But as I mentioned in my previous email, the MPI_Comm_spawn is in a do-loop, so on this single node I only have 2 procs (master and slave). The next spawned slave comes only when the previous slave is dead. We (my team and I) are developing a coupler which launches codes dynamically. Sometimes, depending on the coupling algorithm, we need to spawn a code (which can be parallel or not) a lot of times (more than 1000). Anthony Ralph Castain wrote: Just to be clear - you are doing over 1000 MPI_Comm_spawn calls to launch all the procs on a single node??? In the 1.2 series, every call to MPI_Comm_spawn would launch another daemon on the node, which would then fork/exec the specified app. If you look at your process table, you will see a whole lot of "orted" processes. Thus, you wouldn't run out of pipes because every orted only opened enough for a single process. In the 1.3 series, there is only one daemon on each node (mpirun fills that function on its node). MPI_Comm_spawn simply reuses that daemon to launch the new proc(s). Thus, there is a limit to the number of procs you can start on any node that is set by the number of pipes a process can open. You can adjust that number, of course; you can look it up readily enough for your particular system. However, you may find that 1000 comm_spawns on a single node will lead to poor performance as the procs contend for processor attention. Hope that helps Ralph On Jan 27, 2009, at 7:59 AM, Anthony Thevenin wrote: Hello, I have two C codes: - master.c : spawns a slave - slave.c : spawned by the master If the spawn is included in a do-loop, I can do only 123 spawns before getting the following errors: ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file base/iof_base_setup.c at line 112 ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 203 This test works perfectly, even for a lot of spawns (more than 1000), with Open MPI 1.2.7. You will find the following files attached: config.log.tgz ompi_info.out.tgz ifconfig.out.tgz master.c.tgz slave.c.tgz command used to run my application : mpirun -n 1 ./master COMPILER: PGI 7.1 PATH : /space/thevenin/openmpi-1.3_pgi/bin:/usr/local/tecplot/bin:/usr/local/pgi/linux86-64/7.1/bin:/usr/totalview/bin:/usr/local/matlab71/bin:/usr/bin:/usr/ucb:/usr/sbin:/usr/bsd:/sbin:/bin:/usr/bin/X11:/usr/etc:/usr/local/bin:/usr/bin:/usr/bsd:/sbin:/usr/bin/X11:. LD_LIBRARY_PATH: /space/thevenin/openmpi-1.3_pgi/lib:/usr/local/lib If you have any idea why this occurs, please tell me what to do to make it work. Thank you very much Anthony
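For readers following along, a minimal sketch of the spawn-in-a-loop pattern Anthony describes might look like the code below. This is illustrative only: his actual master.c and slave.c are attachments that are not reproduced here, and "./slave", the loop count, and the disconnect call are assumptions about how such a coupler could be structured, not a statement of what his code does.

/* master_sketch.c - repeatedly spawn a single slave, one at a time. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm child;
    int i;

    MPI_Init(&argc, &argv);

    for (i = 0; i < 1000; i++) {
        /* Spawn one slave; 'child' is the intercommunicator to it. */
        MPI_Comm_spawn("./slave", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);

        /* ... exchange whatever the coupling step needs over 'child' ... */

        /* Drop the connection once this slave has finished, before
           spawning the next one. */
        MPI_Comm_disconnect(&child);
    }

    MPI_Finalize();
    return 0;
}

Whether disconnecting (rather than simply letting each slave exit) is enough to release the pipes held by the single 1.3 daemon is exactly the open question in this thread, so treat the sketch as a starting point for experimentation rather than a workaround.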
Re: [OMPI users] Heterogeneous OpenFabrics hardware
On Jan 27, 2009, at 10:19 AM, Peter Kjellstrom wrote: It is worth clarifying a point in this discussion that I neglected to mention in my initial post: although Open MPI may not work *by default* with heterogeneous HCAs/RNICs, it is quite possible/likely that MPI jobs spanning multiple different kinds of HCAs or RNICs will work fine if you manually configure Open MPI to use the same verbs/hardware settings across all of them (assuming that you use a set of values that is compatible with all your hardware). See this post on the devel list for a few more details: http://www.open-mpi.org/community/lists/devel/2009/01/5314.php So is it correct that each rank will check its HCA model and then pick up suitable settings for that HCA? Correct. We have an INI-style file that is installed in $pkgdir/mca-btl-openib-device-params.ini (typically expands to $prefix/share/openmpi/mca-btl-openib-device-params.ini). This file contains a bunch of device-specific parameters, but it also has a "general" section that can be applied to any device if no specific match is found. If so, maybe OpenMPI could fall back to very conservative settings if more than one HCA model was detected among the ranks. Or would this require communication in a stage where that would be complicated and/or ugly? Today we don't do this kind of check; we just assume that every other MPI process is using the same hardware and/or that the settings pulled from the INI file will be compatible. AFAIK, most (all?) other MPIs do the same thing. We *could* do that kind of check, but: a) there hasn't been enough customer demand for it / no one has submitted a patch to do so b) it might be a bit complicated because the startup sequence in the openib BTL is a little complex c) we are definitely moving to a scenario (at scale) where there is little/no communication at startup about coordinating information from all of the MPI peer processes; this strategy might be problematic in those scenarios (i.e., the coordination / determination of "conservative" settings would have to be done by a human and likely pre-posted to a file on each node -- still hand-waving a bit because that design isn't finalized/implemented yet) d) programmatically finding "conservative" settings that are workable across a wide variety of devices may be problematic because individual device capabilities can vary wildly (does it have SRQ? can it support more than one BSRQ? what's a good MTU? ...?) I think d) is a big sticking point; we *could* make extremely conservative settings that should probably work everywhere. I can see at least one potentially problematic scenario: - cluster has N nodes - a year later, an HCA in 1 node dies - get a new HCA, perhaps even from a different vendor - capabilities of the new HCA and old HCAs are different - so OMPI falls back to "extremely conservative" settings - jobs that run on that one node suffer in performance - jobs that do not run on that node see "normal" performance - users are confused I suppose that we could print a Big Hairy Warning(tm) if we fall back to extremely conservative settings, but it still seems to create the potential to violate the Law of Least Astonishment. -- Jeff Squyres Cisco Systems
Re: [OMPI users] OpenMPI-1.3 and XGrid
Thanks for reporting this Frank -- looks like we borked a symbol in the xgrid component in 1.3. It seems that the compiler doesn't complain about the missing symbol; it only shows up when you try to *run* with it. Whoops! I filed ticket https://svn.open-mpi.org/trac/ompi/ticket/1777 about this issue. On Jan 23, 2009, at 3:11 PM, Frank Kahle wrote: I'm running OpenMPI on OS X 4.11. After upgrading to OpenMPI-1.3 I get the following error when submitting a job via XGrid: dyld: lazy symbol binding failed: Symbol not found: _orte_pointer_array_add Referenced from: /usr/local/mpi/lib/openmpi/mca_plm_xgrid.so Expected in: flat namespace Here you'll find ompi_info's output: [g5-node-1:~] motte% ompi_info Package: Open MPI root@ibi.local Distribution Open MPI: 1.3 Open MPI SVN revision: r20295 Open MPI release date: Jan 19, 2009 Open RTE: 1.3 Open RTE SVN revision: r20295 Open RTE release date: Jan 19, 2009 OPAL: 1.3 OPAL SVN revision: r20295 OPAL release date: Jan 19, 2009 Ident string: 1.3 Prefix: /usr/local/mpi Configured architecture: powerpc-apple-darwin8 Configure host: ibi.local Configured by: root Configured on: Tue Jan 20 19:45:26 CET 2009 Configure host: ibi.local Built by: root Built on: Tue Jan 20 20:49:48 CET 2009 Built host: ibi.local C bindings: yes C++ bindings: yes Fortran77 bindings: yes (single underscore) Fortran90 bindings: yes Fortran90 bindings size: small C compiler: gcc-4.3 C compiler absolute: /usr/local/bin/gcc-4.3 C++ compiler: c++-4.3 C++ compiler absolute: /usr/local/bin/c++-4.3 Fortran77 compiler: gfortran-4.3 Fortran77 compiler abs: /usr/local/bin/gfortran-4.3 Fortran90 compiler: gfortran-4.3 Fortran90 compiler abs: /usr/local/bin/gfortran-4.3 C profiling: yes C++ profiling: yes Fortran77 profiling: yes Fortran90 profiling: yes C++ exceptions: no Thread support: posix (mpi: no, progress: no) Sparse Groups: no Internal debug support: no MPI parameter check: runtime Memory profiling support: no Memory debugging support: no libltdl support: yes Heterogeneous support: no mpirun default --prefix: no MPI I/O support: yes MPI_WTIME support: gettimeofday Symbol visibility support: yes FT Checkpoint support: no (checkpoint thread: no) MCA backtrace: darwin (MCA v2.0, API v2.0, Component v1.3) MCA paffinity: darwin (MCA v2.0, API v2.0, Component v1.3) MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3) MCA carto: file (MCA v2.0, API v2.0, Component v1.3) MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3) MCA timer: darwin (MCA v2.0, API v2.0, Component v1.3) MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3) MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3) MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3) MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3) MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3) MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3) MCA coll: basic (MCA v2.0, API v2.0, Component v1.3) MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3) MCA coll: inter (MCA v2.0, API v2.0, Component v1.3) MCA coll: self (MCA v2.0, API v2.0, Component v1.3) MCA coll: sm (MCA v2.0, API v2.0, Component v1.3) MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3) MCA io: romio (MCA v2.0, API v2.0, Component v1.3) MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3) MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3) MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3) MCA pml: cm (MCA v2.0, API v2.0, Component v1.3) MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3) MCA pml: v (MCA v2.0, API v2.0, Component v1.3) MCA 
bml: r2 (MCA v2.0, API v2.0, Component v1.3) MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3) MCA btl: self (MCA v2.0, API v2.0, Component v1.3) MCA btl: sm (MCA v2.0, API v2.0, Component v1.3) MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3) MCA topo: unity (MCA v2.0, API v2.0, Component v1.3) MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3) MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3) MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3) MCA iof: orted (MCA v2.0, API v2.0, Component v1.3) MCA iof: tool (MCA v2.0, API v2.0, Compone
Re: [OMPI users] Cannot compile on Linux Itanium system
Thanks Joe -- let us know what you find... From his config.log, I think his configure line was: ./configure --prefix=/opt/openmpi-1.3 See the full attachment here (scroll down to the bottom of the web page): http://www.open-mpi.org/community/lists/users/2009/01/7810.php On Jan 26, 2009, at 4:31 PM, Joe Griffin wrote: Tony, I don't know what iac is. I use ias for my ASM code: ia64b <82> cd /opt/intel ia64b <83> find . -name 'iac' ia64b <84> find . -name 'ias' ./fc/10.1.012/bin/ias ./cc/10.1.012/bin/ias Anyway, if you want another data point and see if my compilers work I will gladly try to compile if you send me your configure / make lines. Aiming to help if I can, Joe -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Iannetti, Anthony C. (GRC-RTB0) Sent: Monday, January 26, 2009 12:45 PM To: us...@open-mpi.org Subject: Re: [OMPI users] Cannot compile on Linux Itanium system Jeff, I could successfully compile OpenMPI versions 1.2.X on Itanium Linux with the same compilers. I was never able to compile the 1.3 beta versions on IA64 Linux. Joe, I am using whatever assembler that ./configure provides. I believe it is icc. Should I set AS (I think) to iac? Thanks, Tony -- Jeff Squyres Cisco Systems
[OMPI users] Compilers
Hi all, I want to compile Open MPI using the Intel compilers. Unfortunately the Series 10 C compiler (icc) license has expired. I downloaded and looked at the Series 11 C++ compiler (no C compiler is listed) and would like to know whether it can be used together with a C compiler included with it or obtained separately from Intel. The release notes are a bit overwhelming! Is it possible to use the standard Linux gcc instead? Amos Leffler
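For what it's worth, building Open MPI with the standard GNU compilers generally just means naming them explicitly at configure time, along the lines of ./configure CC=gcc CXX=g++ F77=gfortran FC=gfortran --prefix=/opt/openmpi followed by make all install; the prefix here is only an example, and the Fortran compilers can be left out if you don't need the Fortran bindings. Whether the Series 11 C++ compiler can be mixed with a separately obtained Intel C compiler is a question for Intel's release notes rather than something settled here.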
[OMPI users] v1.3: mca_common_sm_mmap_init error
I just installed OpenMPI 1.3 with tight integration for SGE. Version 1.2.8 was working just fine for several months in the same arrangement. Now that I've upgraded to 1.3, I get the following errors in my standard error file: mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node09.aurora_0/21400/1/shared_mem_pool.node09.aurora failed with errno=2 [node23.aurora:20601] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node23.aurora_0/21400/1/shared_mem_pool.node23.aurora failed with errno=2 [node46.aurora:12118] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node46.aurora_0/21400/1/shared_mem_pool.node46.aurora failed with errno=2 [node15.aurora:12421] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node15.aurora_0/21400/1/shared_mem_pool.node15.aurora failed with errno=2 [node20.aurora:12534] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node20.aurora_0/21400/1/shared_mem_pool.node20.aurora failed with errno=2 [node16.aurora:12573] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node16.aurora_0/21400/1/shared_mem_pool.node16.aurora failed with errno=2 I've tested 3-4 different times, and the number of hosts that produces this error varies, as well as which hosts produce this error. My program seems to run fine, but it's just a simple "Hello, World!" program. Any ideas? Is this a bug in 1.3? -- Prentice -- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ
Re: [OMPI users] v1.3: mca_common_sm_mmap_init error
Sort of ditto, but with SVN release at 20123 (and earlier): e.g. [r2250_46:30018] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_46_0/25682/1/shared_mem_pool.r2250_46 failed with errno=2 [r2250_63:05292] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_63_0/25682/1/shared_mem_pool.r2250_63 failed with errno=2 [r2250_57:17527] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_57_0/25682/1/shared_mem_pool.r2250_57 failed with errno=2 [r2250_68:13553] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_68_0/25682/1/shared_mem_pool.r2250_68 failed with errno=2 [r2250_50:06541] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_50_0/25682/1/shared_mem_pool.r2250_50 failed with errno=2 [r2250_49:29237] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_49_0/25682/1/shared_mem_pool.r2250_49 failed with errno=2 [r2250_66:19066] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_66_0/25682/1/shared_mem_pool.r2250_66 failed with errno=2 [r2250_58:24902] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_58_0/25682/1/shared_mem_pool.r2250_58 failed with errno=2 [r2250_69:27426] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_69_0/25682/1/shared_mem_pool.r2250_69 failed with errno=2 [r2250_60:30560] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_60_0/25682/1/shared_mem_pool.r2250_60 failed with errno=2 File not found in sm. 10 of them across 32 nodes (8 cores per node (2 sockets x quad-core)). "Apparently harmless"? DM On Tue, 27 Jan 2009, Prentice Bisbal wrote: I just installed OpenMPI 1.3 with tight integration for SGE. Version 1.2.8 was working just fine for several months in the same arrangement. Now that I've upgraded to 1.3, I get the following errors in my standard error file: mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node09.aurora_0/21400/1/shared_mem_pool.node09.aurora failed with errno=2 [node23.aurora:20601] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node23.aurora_0/21400/1/shared_mem_pool.node23.aurora failed with errno=2 [node46.aurora:12118] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node46.aurora_0/21400/1/shared_mem_pool.node46.aurora failed with errno=2 [node15.aurora:12421] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node15.aurora_0/21400/1/shared_mem_pool.node15.aurora failed with errno=2 [node20.aurora:12534] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node20.aurora_0/21400/1/shared_mem_pool.node20.aurora failed with errno=2 [node16.aurora:12573] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node16.aurora_0/21400/1/shared_mem_pool.node16.aurora failed with errno=2 I've tested 3-4 different times, and the number of hosts that produces this error varies, as well as which hosts produce this error. My program seems to run fine, but it's just a simple "Hello, World!" program. Any ideas? Is this a bug in 1.3? -- Prentice -- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ