[OMPI users] Ethernet tuning on Solaris Opteron ?
I am now attempting to tune openmpi-1.1a1r9260 on Solaris Opteron. Each
quadripro node possesses two ethernet interfaces, bge0 and bge1. The bge0
interfaces are dedicated to parallel jobs and correspond to node names pxx,
over a dedicated gigabit switch. The bge1 interfaces provide NFS sharing
etc. and correspond to node names nxx, over another gigabit switch.

1) I allocated 4 quadripro nodes. As documented in the FAQ,

   mpirun -np 4 -hostfile $OAR_FILE_NODES

runs 4 tasks on the first SMP, and

   mpirun -np 4 -hostfile $OAR_FILE_NODES --bynode

distributes one task on each node.

2) According to the users list,

   mpirun --mca pml teg

should revert to the 2nd-generation TCP transport instead of the default
ob1 (3rd generation). Unfortunately I get the message

   No available pml components were found!

Have you removed the 2nd-generation TCP transport? Do you consider the new
ob1 competitive now?

3) According to the users list, tuned collective primitives are available.
Apparently they are now compiled in by default, but they don't seem
functional at all:

   mpirun --mca coll tuned
   Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
   Failing at addr:0
   *** End of error message ***

4) According to the FAQ and to the users list, openmpi attempts to
discover and use all interfaces. I attempted to force using bge0 only,
with no success:

   mpirun --mca btl_tcp_if_exclude bge1
   [n33:04784] *** An error occurred in MPI_Barrier
   [n33:04784] *** on communicator MPI_COMM_WORLD
   [n33:04784] *** MPI_ERR_INTERN: internal error
   [n33:04784] *** MPI_ERRORS_ARE_FATAL (goodbye)
   1 process killed (possibly by Open MPI)

The FAQ states that a new syntax should be available soon. I tried whether
it is already implemented in openmpi-1.1a1r9260:

   mpirun --mca btl_tcp_if ^bge0,bge1
   mpirun --mca btl_tcp_if ^bge1

works with identical performances. However I doubt this option is
functional, because if I disable all ethernet interfaces,

   mpirun --mca btl_tcp_if ^bge0,bge1

the job still works!

I would be happy to have more control over the interfaces being used. What
is expected to work on other platforms? What could be issues specific to
Solaris Opteron?

Have a nice openmpi day!

--
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/

Dr. Pierre VALIRON
Laboratoire d'Astrophysique, Observatoire de Grenoble / UJF
BP 53 F-38041 Grenoble Cedex 9 (France)
http://www-laog.obs.ujf-grenoble.fr/~valiron/
Mail: pierre.vali...@obs.ujf-grenoble.fr
Phone: +33 4 7651 4787   Fax: +33 4 7644 8821
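A quick way to check what a given build actually provides -- a suggested
sketch, not part of the original report, assuming the standard ompi_info
tool shipped with the installation is on the PATH -- is:

   ompi_info | grep pml         # which PML components are compiled in
   ompi_info --param btl tcp    # tunable TCP BTL parameters, including
                                # btl_tcp_if_include / btl_tcp_if_exclude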
Re: [OMPI users] MPI_COMM_SPAWN f90 interface bug?
> -----Original Message-----
> > [-:13327] mca: base: component_find: unable to open:
> > dlopen(/usr/local/lib/openmpi/mca_pml_teg.so, 9): Symbol not found:
> > _mca_ptl_base_recv_request_t_class
> >   Referenced from: /usr/local/lib/openmpi/mca_pml_teg.so
> >   Expected in: flat namespace
> > (ignored)
>
> I have determined that the above error/warning is caused by installing
> openmpi 1.1r9212 on a machine where openmpi 1.0.1 was previously
> installed. I had to manually delete the files in /usr/local/lib/openmpi
> and then reinstall. This implies an error with the 1.1 install script.

Just to clarify on this issue -- Open MPI uses Automake for its
installation / uninstallation. As such, it only copies in the files that
are relevant to each version of Open MPI. It does *not* uninstall any
previous versions of Open MPI.

Specifically, the plugins that are installed between Open MPI 1.0.x and
1.1.x are different. When you installed Open MPI 1.1.x over the same tree
as 1.0.x, although most of the 1.0.x plugins were overwritten, some were
not (because they only exist in 1.0.x). At run time, Open MPI 1.1.x tried
to open the 1.0.x plugins, resulting in the "symbol not found" errors that
you saw.

So this is actually exactly what the Open MPI installation process is
supposed to do (only touch the files that are relevant to it, not any
others). We could probably be a bit smarter and not have Open MPI try to
open plugins from earlier versions, but that's a low priority at the
moment.

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
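The workaround described in the quoted report amounts to something like
the following sketch (the paths assume the default --prefix=/usr/local,
and the build-directory name is an assumption):

   rm -rf /usr/local/lib/openmpi      # clear out the stale 1.0.x plugins
   cd openmpi-1.1* && make install    # reinstall the 1.1.x tree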
Re: [OMPI users] Ethernet tuning on Solaris Opteron ?
On Mar 14, 2006, at 4:42 AM, Pierre Valiron wrote:

> I am now attempting to tune openmpi-1.1a1r9260 on Solaris Opteron.

I guess I should have pointed this out more clearly earlier. Open MPI
1.1a1 is a nightly alpha build from our development trunk. It isn't
guaranteed to be stable. About the only guarantee made is that it passed
"make distcheck" on the Linux box we use to make tarballs. The Solaris
patches have been moved over to the v1.0 release branch, so if stability
is a concern, you might want to switch back to a nightly tarball from the
v1.0 branch. We should also be having another beta of the 1.0.2 release
in the near future.

> Each quadripro node possesses two ethernet interfaces, bge0 and bge1.
> The bge0 interfaces are dedicated to parallel jobs and correspond to
> node names pxx, over a dedicated gigabit switch. The bge1 interfaces
> provide NFS sharing etc. and correspond to node names nxx, over another
> gigabit switch.
>
> 1) I allocated 4 quadripro nodes. As documented in the FAQ, mpirun -np
> 4 -hostfile $OAR_FILE_NODES runs 4 tasks on the first SMP, and mpirun
> -np 4 -hostfile $OAR_FILE_NODES --bynode distributes one task on each
> node.
>
> 2) According to the users list, mpirun --mca pml teg should revert to
> the 2nd-generation TCP transport instead of the default ob1 (3rd
> generation). Unfortunately I get the message "No available pml
> components were found!" Have you removed the 2nd-generation TCP
> transport? Do you consider the new ob1 competitive now?

On the development trunk, we have removed the TEG PML and all the PTLs.
The OB1 PML provides competitive (and most of the time better)
performance than the TEG PML for most transports. The major issue is that
when we added one-sided communication, we used the BTL transports
directly. The BTL and PTL frameworks were not designed to live together,
so issues were caused with the TEG PML.

> 3) According to the users list, tuned collective primitives are
> available. Apparently they are now compiled in by default, but they
> don't seem functional at all:
>
>    mpirun --mca coll tuned
>    Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
>    Failing at addr:0
>    *** End of error message ***

Tuned collectives are available, but not as heavily tested as the basic
collectives. Do you have a test case in particular that causes problems?

> 4) According to the FAQ and to the users list, openmpi attempts to
> discover and use all interfaces. I attempted to force using bge0 only,
> with no success:
>
>    mpirun --mca btl_tcp_if_exclude bge1
>    [n33:04784] *** An error occurred in MPI_Barrier
>    [n33:04784] *** on communicator MPI_COMM_WORLD
>    [n33:04784] *** MPI_ERR_INTERN: internal error
>    [n33:04784] *** MPI_ERRORS_ARE_FATAL (goodbye)
>    1 process killed (possibly by Open MPI)

That definitely shouldn't happen. Can you reconfigure / compile with the
option --enable-debug, then run with the added option --mca
btl_base_debug 2 and send the output you see to us? That might help in
diagnosing the problem.

> In the FAQ it is stated that a new syntax should be available soon. I
> tried whether it is already implemented in openmpi-1.1a1r9260:
>
>    mpirun --mca btl_tcp_if ^bge0,bge1
>    mpirun --mca btl_tcp_if ^bge1
>
> works with identical performances.

This syntax only works for specifying component names, not interface
names. So you would still need to use the btl_tcp_if_include and
btl_tcp_if_exclude options.

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
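For reference, the include/exclude form mentioned above looks like the
following sketch (./app is a placeholder program name; bge0/bge1 as in
Pierre's setup). One caveat worth hedging: overriding btl_tcp_if_exclude
replaces its default value, which excludes the loopback interface, so lo
normally has to be listed again:

   mpirun --mca btl_tcp_if_include bge0 -np 4 -hostfile $OAR_FILE_NODES ./app
   mpirun --mca btl_tcp_if_exclude lo,bge1 -np 4 -hostfile $OAR_FILE_NODES ./app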
Re: [OMPI users] MPI_COMM_SPAWN f90 interface bug?
I see responses to noncritical parts of my discussion but not to the
following. Is it a known issue, a fixed issue, or a we-don't-want-to-
discuss-it issue?

Michael

On Mar 7, 2006, at 4:39 PM, Michael Kluskens wrote:

> The following errors/warnings also exist when running my spawn test on
> a clean installation of r9212:
>
>    [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
>       base/soh_base_get_proc_soh.c at line 100
>    [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
>       base/oob_base_xcast.c at line 108
>    [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
>       base/rmgr_base_stage_gate.c at line 276
>    [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
>       base/soh_base_get_proc_soh.c at line 100
>    [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
>       base/oob_base_xcast.c at line 108
>    [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
>       base/rmgr_base_stage_gate.c at line 276
>
> OS X 10.4.5 with g95 from current fink install for FC & F77. Running on
> a single machine and launching a single spawned subprocess as a test
> case for now. Also on Debian Sarge on Opteron, built using "./configure
> --with-gnu-ld F77=pgf77 FFLAGS=-fastsse FC=pgf90 FCFLAGS=-fastsse" with
> PG 6.1.
>
> Are these diagnostic messages of errors in OpenMPI 1.1r9212 or related
> to errors in my test code? Is this information helpful for development
> purposes?
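For context, a minimal spawn test along these lines might look like the
following C sketch. This is a hypothetical reconstruction, not Michael's
actual code -- the original used the f90 bindings, and "./child" is an
assumed executable name:

   /* spawn_test.c -- hypothetical minimal spawn test */
   #include <mpi.h>
   #include <stdio.h>

   int main(int argc, char **argv)
   {
       MPI_Comm child;
       int errcode;

       MPI_Init(&argc, &argv);
       /* spawn one copy of "./child" from a single parent process */
       MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                      0, MPI_COMM_SELF, &child, &errcode);
       printf("spawn returned, errcode %d\n", errcode);
       MPI_Finalize();
       return 0;
   }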
[OMPI users] comm_join and singleton init
Hi

I've got a bit of an odd bug here. I've been playing around with MPI
process management routines and I noticed the following behavior with
openmpi-1.0.1.

Two processes (a and b), linked with ompi, but started independently (no
mpiexec, just started the programs directly):

- a and b: call MPI_Init
- a: open a unix network socket on 'fd'
- b: connect to a's socket
- a and b: call MPI_Comm_join over 'fd'
- a and b: call MPI_Intercomm_merge, get an intracommunicator

These steps all work fine (a sketch of them follows below). Now the odd
part: a and b call MPI_Comm_rank and MPI_Comm_size over the
intracommunicator. Both (correctly) think Comm_size is two, but both also
think (incorrectly) that they are rank 1.

==rob

--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B
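The steps above correspond roughly to the following C sketch (a
reconstruction, not Rob's actual code; the TCP port 5555 and the loopback
address are arbitrary assumptions, and error checking is omitted). Run
one copy as './join_test server' and a second as './join_test client':

   /* join_test.c -- sketch of two singletons joining over a socket */
   #include <mpi.h>
   #include <stdio.h>
   #include <string.h>
   #include <unistd.h>
   #include <sys/socket.h>
   #include <netinet/in.h>
   #include <arpa/inet.h>

   int main(int argc, char **argv)
   {
       int fd, rank, size;
       MPI_Comm inter, intra;
       struct sockaddr_in addr;

       MPI_Init(&argc, &argv);                  /* a and b: call MPI_Init */

       memset(&addr, 0, sizeof(addr));
       addr.sin_family = AF_INET;
       addr.sin_port = htons(5555);
       addr.sin_addr.s_addr = inet_addr("127.0.0.1");

       if (argc > 1 && strcmp(argv[1], "server") == 0) {
           /* a: open a socket and wait for b to connect */
           int lfd = socket(AF_INET, SOCK_STREAM, 0);
           bind(lfd, (struct sockaddr *) &addr, sizeof(addr));
           listen(lfd, 1);
           fd = accept(lfd, NULL, NULL);
           close(lfd);
       } else {
           /* b: connect to a's socket */
           fd = socket(AF_INET, SOCK_STREAM, 0);
           connect(fd, (struct sockaddr *) &addr, sizeof(addr));
       }

       /* both: join over 'fd', then merge into an intracommunicator.
        * Note both sides pass the same 'high' value -- two singletons
        * have no obvious way to coordinate different ones. */
       MPI_Comm_join(fd, &inter);
       MPI_Intercomm_merge(inter, 0, &intra);

       MPI_Comm_rank(intra, &rank);
       MPI_Comm_size(intra, &size);
       printf("rank %d of %d\n", rank, size);   /* reported: both print rank 1 */

       MPI_Finalize();
       return 0;
   }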
[OMPI users] MPI_Comm_connect and singleton init
Hello

In playing around with process management routines, I found another
issue. This one might very well be operator error, or something
implementation specific.

I've got two processes (a and b), linked with openmpi, but started
independently (no mpiexec):

- A starts up and calls MPI_Init.
- A calls MPI_Open_port, prints out the port name to stdout, then calls
  MPI_Comm_accept and blocks.
- B takes as a command line argument the port name printed out by A. It
  calls MPI_Init and then passes that port name to MPI_Comm_connect.
- B gets the following error:

   [leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
      in file ../../../orte/dps/dps_unpack.c at line 121
   [leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
      in file ../../../orte/dps/dps_unpack.c at line 95
   [leela.mcs.anl.gov:04177] *** An error occurred in MPI_Comm_connect
   [leela.mcs.anl.gov:04177] *** on communicator MPI_COMM_WORLD
   [leela.mcs.anl.gov:04177] *** MPI_ERR_UNKNOWN: unknown error
   [leela.mcs.anl.gov:04177] *** MPI_ERRORS_ARE_FATAL (goodbye)
   [leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Not found in file
      ../../../../../orte/mca/pls/base/pls_base_proxy.c at line 183

- A is still waiting for someone to connect to it.

Did I pass MPI port strings between programs the correct way, or is
MPI_Publish_name/MPI_Lookup_name the preferred way to pass around this
information?

Thanks
==rob

--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B
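A minimal sketch of the two roles follows (again a reconstruction, not
the original test program). Note that the port string printed by A
typically contains characters that need shell quoting when passed back
on B's command line:

   /* port_test.c -- sketch: run A with no arguments, then run B with
    * A's printed port string as argv[1] */
   #include <mpi.h>
   #include <stdio.h>

   int main(int argc, char **argv)
   {
       char port[MPI_MAX_PORT_NAME];
       MPI_Comm inter;

       MPI_Init(&argc, &argv);

       if (argc < 2) {
           /* A: open a port, print it, then block in accept */
           MPI_Open_port(MPI_INFO_NULL, port);
           printf("%s\n", port);
           fflush(stdout);
           MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
       } else {
           /* B: connect using the port string from the command line */
           MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
       }

       MPI_Comm_disconnect(&inter);
       MPI_Finalize();
       return 0;
   }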
Re: [OMPI users] comm_join and singleton init
Could you provide me a simple test code for that? Comm_join and
Intercomm_merge should work; I will have a look at that... (a separate
answer to your second email is coming soon)

Thanks
Edgar

Robert Latham wrote:
> Hi
> I've got a bit of an odd bug here. I've been playing around with MPI
> process management routines and I noticed the following behavior with
> openmpi-1.0.1.
>
> Two processes (a and b), linked with ompi, but started independently
> (no mpiexec, just started the programs directly):
> - a and b: call MPI_Init
> - a: open a unix network socket on 'fd'
> - b: connect to a's socket
> - a and b: call MPI_Comm_join over 'fd'
> - a and b: call MPI_Intercomm_merge, get an intracommunicator
>
> These steps all work fine. Now the odd part: a and b call MPI_Comm_rank
> and MPI_Comm_size over the intracommunicator. Both (correctly) think
> Comm_size is two, but both also think (incorrectly) that they are
> rank 1.
>
> ==rob
Re: [OMPI users] MPI_Comm_connect and singleton init
You are touching here on a difficult area in Open MPI:

- Name publishing across independent jobs unfortunately does not work
  right now. (It does work if all processes have been started by the
  same mpirun, or if they have been spawned by a parent process using
  MPI_Comm_spawn.) Your approach of passing the port as a command line
  option should work, however.

- You have to start the orted daemon *before* starting both jobs, using
  the flags 'orted --seed --persistent --scope public'. These flags are
  currently only lightly tested, since a brand new runtime environment
  with much better support for these operations is currently under
  development.

- Regarding the 'pack data mismatch': do both machines which you are
  using have the same data representation? The reason I ask is that this
  looks like a data type mismatch error, and Open MPI currently does have
  some restrictions regarding different data formats and endianness...

Thanks
Edgar

Robert Latham wrote:
> Hello
> In playing around with process management routines, I found another
> issue. This one might very well be operator error, or something
> implementation specific.
>
> I've got two processes (a and b), linked with openmpi, but started
> independently (no mpiexec):
> - A starts up and calls MPI_Init.
> - A calls MPI_Open_port, prints out the port name to stdout, then calls
>   MPI_Comm_accept and blocks.
> - B takes as a command line argument the port name printed out by A. It
>   calls MPI_Init and then passes that port name to MPI_Comm_connect.
> - B gets the following error:
>
>    [leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
>       in file ../../../orte/dps/dps_unpack.c at line 121
>    [leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
>       in file ../../../orte/dps/dps_unpack.c at line 95
>    [leela.mcs.anl.gov:04177] *** An error occurred in MPI_Comm_connect
>    [leela.mcs.anl.gov:04177] *** on communicator MPI_COMM_WORLD
>    [leela.mcs.anl.gov:04177] *** MPI_ERR_UNKNOWN: unknown error
>    [leela.mcs.anl.gov:04177] *** MPI_ERRORS_ARE_FATAL (goodbye)
>    [leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Not found in file
>       ../../../../../orte/mca/pls/base/pls_base_proxy.c at line 183
>
> - A is still waiting for someone to connect to it.
>
> Did I pass MPI port strings between programs the correct way, or is
> MPI_Publish_name/MPI_Lookup_name the preferred way to pass around this
> information?
>
> Thanks
> ==rob

--
Edgar Gabriel
Assistant Professor
Department of Computer Science        email: gabr...@cs.uh.edu
University of Houston                 http://www.cs.uh.edu/~gabriel
Philip G. Hoffman Hall, Room 524      Tel: +1 (713) 743-3857
Houston, TX-77204, USA                Fax: +1 (713) 743-3335
Re: [OMPI users] comm_join and singleton init
I think I know what goes wrong. Since they are in different 'universes',
the two processes will have exactly the same 'Open MPI name', and
therefore the algorithm in intercomm_merge cannot determine which process
should be first and which second. Practically, all jobs which are
connected at a certain point in their lifetime have to be in the same MPI
universe, so that all jobs have different jobids and therefore different
names.

To use the same universe, you have to start the orted daemon in
persistent mode, so the sequence should be:

   orted --seed --persistent --scope public
   mpirun -np x ./app1
   mpirun -np y ./app2

In this case everything should work as expected: you could do the
comm_join between app1 and app2, and the intercomm_merge should work as
well.

Hope this helps
Edgar

Edgar Gabriel wrote:
> Could you provide me a simple test code for that? Comm_join and
> Intercomm_merge should work; I will have a look at that... (a separate
> answer to your second email is coming soon)
>
> Thanks
> Edgar
>
> Robert Latham wrote:
>> Hi
>> I've got a bit of an odd bug here. I've been playing around with MPI
>> process management routines and I noticed the following behavior with
>> openmpi-1.0.1.
>>
>> Two processes (a and b), linked with ompi, but started independently
>> (no mpiexec, just started the programs directly):
>> - a and b: call MPI_Init
>> - a: open a unix network socket on 'fd'
>> - b: connect to a's socket
>> - a and b: call MPI_Comm_join over 'fd'
>> - a and b: call MPI_Intercomm_merge, get an intracommunicator
>>
>> These steps all work fine. Now the odd part: a and b call
>> MPI_Comm_rank and MPI_Comm_size over the intracommunicator. Both
>> (correctly) think Comm_size is two, but both also think (incorrectly)
>> that they are rank 1.
>>
>> ==rob

--
Edgar Gabriel
Assistant Professor
Department of Computer Science        email: gabr...@cs.uh.edu
University of Houston                 http://www.cs.uh.edu/~gabriel
Philip G. Hoffman Hall, Room 524      Tel: +1 (713) 743-3857
Houston, TX-77204, USA                Fax: +1 (713) 743-3335
Re: [OMPI users] MPI_Comm_connect and singleton init
On Tue, Mar 14, 2006 at 12:00:57PM -0600, Edgar Gabriel wrote:
> You are touching here on a difficult area in Open MPI:

I don't doubt it. I haven't found an MPI implementation yet that does
this without any quirks or oddities :>

> - Name publishing across independent jobs unfortunately does not work
>   right now. (It does work if all processes have been started by the
>   same mpirun, or if they have been spawned by a parent process using
>   MPI_Comm_spawn.) Your approach of passing the port as a command line
>   option should work, however.
>
> - You have to start the orted daemon *before* starting both jobs, using
>   the flags 'orted --seed --persistent --scope public'. These flags are
>   currently only lightly tested, since a brand new runtime environment
>   with much better support for these operations is currently under
>   development.

Ok, got it. If there is some sort of setup beforehand (in this case,
launching orted), then these independent MPI processes will have a much
easier time talking to each other. Makes sense.

> - Regarding the 'pack data mismatch': do both machines which you are
>   using have the same data representation? The reason I ask is that
>   this looks like a data type mismatch error, and Open MPI currently
>   does have some restrictions regarding different data formats and
>   endianness...

I'm just running this on the same machine.

Thanks for the quick response.
==rob

--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B