[OMPI users] OpenMPI 1.2.4 vs 1.2
Hello Guys, I had openmpi v1.2 installed on my cluster. A couple of days back, I thought to upgrade it to v1.2.4 (the latest release, I suppose). Since I didn't want to take any risk, I first installed it in a temporary location and ran the bandwidth and bidirectional bandwidth tests provided by the OSU guys, and to my surprise, the old version performs better in both scenarios. Could anyone give me the reason for this? I repeated the above point-to-point tests between all sets of nodes, but the results were the same :( -Neeraj
[OMPI users] Bug in common_mx.c (1.2.5a0r16522)
Hi!

In common_mx.c the following looks wrong:

    ompi_common_mx_finalize(void)
    {
        mx_return_t mx_return;
        ompi_common_mx_initialize_ref_cnt--;
        if(ompi_common_mx_initialize == 0) {

That should be

        if(ompi_common_mx_initialize_ref_cnt == 0)

right?

-- Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden Internet: a...@hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90 7866126 Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
Re: [OMPI users] Parallel Genetic Algorithms - Open MPI Implementation
Hi Dirk,

On 10/24/07, Dirk Eddelbuettel wrote:
> On 24 October 2007 at 01:01, Amit Kumar Saha wrote:
> | Hello all!
> |
> | After some background research, I am soon going to start working on
> | "Parallel Genetic Algorithms". When I reach the point of practical
> | implementation, I am going to use Open MPI for the purpose.
> |
> | Has anyone here worked on similar things? It would be nice if you could
> | share some views/comments.
>
> Yes. PGAPACK, developed in the mid-1990s by David Levine while at Argonne,
> works perfectly well in parallel under various MPI implementations.
>
> I have been in contact with David and Argonne to coordinate a re-release
> under a newer license [1], but we're not quite there yet, and I have
> been the one holding this up. Hopefully more news 'soon' but I've been
> mumbling that all summer while I kept busy...
>
> You may want to look at PGAPACK and study it for possible extensions and
> refactorings, rather than starting again from scratch.

I had come across PGAPack some time back, though I did not spend much time with it. But after I am through with some of the theoretical aspects of both genetic algorithms and parallel genetic algorithms, I shall definitely start off with PGAPack. By the way, if time permits, could you kindly point me to some relevant resources you may know of, though I shall turn to Google soon. Will get back to you after I have started looking at PGAPack. Thanks, Amit -- Amit Kumar Saha *NetBeans Community Docs Contribution Coordinator* me blogs@ http://amitksaha.blogspot.com URL:http://amitsaha.in.googlepages.com
Re: [OMPI users] Bug in common_mx.c (1.2.5a0r16522)
On Wed, 2007-10-24 at 09:00 +0200, Åke Sandgren wrote:
> Hi!
>
> In common_mx.c the following looks wrong.
>
>     ompi_common_mx_finalize(void)
>     {
>         mx_return_t mx_return;
>         ompi_common_mx_initialize_ref_cnt--;
>         if(ompi_common_mx_initialize == 0) {
>
> That should be
>     if(ompi_common_mx_initialize_ref_cnt == 0)
> right?

And there was a missing return too. Complete ompi_common_mx_finalize should be

    int
    ompi_common_mx_finalize(void)
    {
        mx_return_t mx_return;
        ompi_common_mx_initialize_ref_cnt--;
        if(ompi_common_mx_initialize_ref_cnt == 0) {
            mx_return = mx_finalize();
            if(mx_return != MX_SUCCESS) {
                opal_output(0, "Error in mx_finalize (error %s)\n",
                            mx_strerror(mx_return));
                return OMPI_ERROR;
            }
        }
        return OMPI_SUCCESS;
    }
Re: [OMPI users] Bug in common_mx.c (1.2.5a0r16522)
You're absolutely right. Thanks for the patch, I applied it on the trunk (revision 16560).

Thanks,
george.

On Oct 24, 2007, at 8:17 AM, Åke Sandgren wrote:

On Wed, 2007-10-24 at 09:00 +0200, Åke Sandgren wrote: Hi! In common_mx.c the following looks wrong.

    ompi_common_mx_finalize(void)
    {
        mx_return_t mx_return;
        ompi_common_mx_initialize_ref_cnt--;
        if(ompi_common_mx_initialize == 0) {

That should be if(ompi_common_mx_initialize_ref_cnt == 0) right? And there was a missing return too. Complete ompi_common_mx_finalize should be

    int
    ompi_common_mx_finalize(void)
    {
        mx_return_t mx_return;
        ompi_common_mx_initialize_ref_cnt--;
        if(ompi_common_mx_initialize_ref_cnt == 0) {
            mx_return = mx_finalize();
            if(mx_return != MX_SUCCESS) {
                opal_output(0, "Error in mx_finalize (error %s)\n",
                            mx_strerror(mx_return));
                return OMPI_ERROR;
            }
        }
        return OMPI_SUCCESS;
    }

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
[OMPI users] Cannot suppress openib error message
I've been scratching my head over this:

    lnx01:/usr/lib> orterun -n 2 --mca btl ^openib ~/c++/tests/mpitest
    [lnx01:14417] mca: base: component_find: unable to open btl openib: file not found (ignored)
    [lnx01:14418] mca: base: component_find: unable to open btl openib: file not found (ignored)
    Hello world, I'm process 0
    Hello world, I'm process 1
    lnx01:/usr/lib> grep openib /etc/openmpi/openmpi-mca-params.conf
    # btl = ^openib
    btl = ^openib
    lnx01:/usr/lib> orterun -n 2 ~/c++/tests/mpitest
    [lnx01:14429] mca: base: component_find: unable to open btl openib: file not found (ignored)
    [lnx01:14430] mca: base: component_find: unable to open btl openib: file not found (ignored)
    Hello world, I'm process 0
    Hello world, I'm process 1

and when I strace it, I get

    uname({sys="Linux", node="lnx01", ...}) = 0
    open("/etc/openmpi/openmpi-mca-params.conf", O_RDONLY) = 3
    ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf820698) = -1 ENOTTY (Inappropriate ioctl for device)
    fstat64(3, {st_mode=S_IFREG|0644, st_size=2877, ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f72000
    read(3, "#\n# Copyright (c) 2004-2005 The "..., 8192) = 2877
    read(3, "", 4096) = 0
    read(3, "", 8192) = 0
    ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf8205f8) = -1 ENOTTY (Inappropriate ioctl for device)
    close(3) = 0
    munmap(0xb7f72000, 4096) = 0

Why can't I suppress the dreaded Infinityband message?

System is Ubuntu 7.04 with 'ported' (i.e. locally recompiled) current Open MPI packages from Debian.

Dirk

-- Three out of two people have difficulties with fractions.
[OMPI users] orterun "by hand"
Hello, I'd like to run Open MPI "by hand". I have a few ordinary workstations I'd like to run a code using Open MPI on. They're in the same LAN, have unique IP addresses and hostnames, and I've installed the default Open MPI package, and I've compiled an MPI app against the Open MPI libraries and copied the executable to each machine, but let's assume these machines do not have BProc, Torque, PBS, SLURM, rsh or ssh access to each other, or NFS. I'm looking at the shell of each node: what do I type in to make Open MPI go? If it matters, they're OS X Macs. I am welcome to be enlightened if I've missed the documentation for this scenario. Thanks, Dean
Re: [OMPI users] orterun "by hand"
On 10/24/07, Dean Dauger, Ph. D. wrote: > Hello, > > I'd like to run Open MPI "by hand". I have a few ordinary > workstations I'd like to run a code using Open MPI on. They're in > the same LAN, have unique IP addresses and hostnames, and I've > installed the default Open MPI package, and I've compiled an MPI app > against the Open MPI libraries and copied the executable to each > machine, but let's assume these machines do not have BProc, Torque, > PBS, SLURM, rsh or ssh access to each other, or NFS. I'm looking at > the shell of each node: what do I type in to make Open MPI go? > If I understand your question correctly, you need: mpirun /path/to/executable (depending on the program you may have to give -np N argument where N is the number of instances you'd like to run) and also read: http://www.open-mpi.org/faq/?category=running Hope this helps. Thanks. Gurhan > If it matters, they're OS X Macs. I am welcome to be enlightened if > I've missed the documentation for this scenario. > > Thanks, > Dean > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] orterun "by hand"
Dean,

There is no way to run Open MPI by hand, or at least not a simple way. How about xgrid on your OS X cluster? Anyway, without a way to start processes remotely it is really difficult to start up any kind of parallel job.

george.

On Oct 24, 2007, at 12:06 PM, Dean Dauger, Ph. D. wrote:

Hello, I'd like to run Open MPI "by hand". I have a few ordinary workstations I'd like to run a code using Open MPI on. They're in the same LAN, have unique IP addresses and hostnames, and I've installed the default Open MPI package, and I've compiled an MPI app against the Open MPI libraries and copied the executable to each machine, but let's assume these machines do not have BProc, Torque, PBS, SLURM, rsh or ssh access to each other, or NFS. I'm looking at the shell of each node: what do I type in to make Open MPI go? If it matters, they're OS X Macs. I am welcome to be enlightened if I've missed the documentation for this scenario. Thanks, Dean ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] orterun "by hand"
Hi,

On 24.10.2007 at 19:21, George Bosilca wrote:

There is no way to run Open MPI by hand, or at least not simple way. How about xgrid on your OS X cluster ? Anyway, without a way to start processes remotely it is really difficult to start up any kind of parallel job.

Just to note: with PVM it's possible, but rarely used I think. -- Reuti

george. On Oct 24, 2007, at 12:06 PM, Dean Dauger, Ph. D. wrote: Hello, I'd like to run Open MPI "by hand". I have a few ordinary workstations I'd like to run a code using Open MPI on. They're in the same LAN, have unique IP addresses and hostnames, and I've installed the default Open MPI package, and I've compiled an MPI app against the Open MPI libraries and copied the executable to each machine, but let's assume these machines do not have BProc, Torque, PBS, SLURM, rsh or ssh access to each other, or NFS. I'm looking at the shell of each node: what do I type in to make Open MPI go? If it matters, they're OS X Macs. I am welcome to be enlightened if I've missed the documentation for this scenario. Thanks, Dean ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] orterun "by hand"
If they are OS X machines, adding password-less ssh is easy; then you can make a nodefile with all the unique IPs. If you can do that, you can avoid putting a full resource manager on them.

Brock Palen Center for Advanced Computing bro...@umich.edu (734)936-1985

On Oct 24, 2007, at 1:39 PM, Reuti wrote:

Hi, on 24.10.2007 at 19:21, George Bosilca wrote: There is no way to run Open MPI by hand, or at least not simple way. How about xgrid on your OS X cluster ? Anyway, without a way to start processes remotely it is really difficult to start up any kind of parallel job. just to note: with PVM it's possible, but rarely used I think. -- Reuti george. On Oct 24, 2007, at 12:06 PM, Dean Dauger, Ph. D. wrote: Hello, I'd like to run Open MPI "by hand". I have a few ordinary workstations I'd like to run a code using Open MPI on. They're in the same LAN, have unique IP addresses and hostnames, and I've installed the default Open MPI package, and I've compiled an MPI app against the Open MPI libraries and copied the executable to each machine, but let's assume these machines do not have BProc, Torque, PBS, SLURM, rsh or ssh access to each other, or NFS. I'm looking at the shell of each node: what do I type in to make Open MPI go? If it matters, they're OS X Macs. I am welcome to be enlightened if I've missed the documentation for this scenario. Thanks, Dean ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
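A minimal sketch of such a nodefile (the hostnames and slot counts below are hypothetical, not taken from the thread):

```text
# nodefile: one machine per line; "slots" is the number of
# processes Open MPI may start on that machine
mac01.local slots=2
mac02.local slots=2
```

With password-less ssh set up between the machines, it would then be used as, e.g., `mpirun -np 4 --hostfile nodefile ./mpitest`.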
Re: [OMPI users] OpenMPI 1.2.4 vs 1.2
The changes in the 1.2 series are listed here: http://svn.open-mpi.org/svn/ompi/branches/v1.2/NEWS I'm surprised that your performance went down from v1.2 to v1.2.4. What networks were you testing, and how exactly did you test? On Oct 24, 2007, at 12:14 AM, Neeraj Chourasia wrote: Hello Guys, I had openmpi v1.2 installed on my cluster. Couple of days back, i thought to upgrade it to v1.2.4(latest release i suppose). Since i didnt want to take risk, i first installed it on temporary location and did bandwidth and bidirectional bandwidth test provided by the OSU guys, and to my surprise, old version performs better in both scenarios. Could anyone give me the reason for the same? I repeated the above point to point tests between all set of nodes, but the result were same :( -Neeraj ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
Re: [OMPI users] Cannot suppress openib error message
This is quite likely because of a "feature" in how the OMPI v1.2 series handles its plugins. In OMPI <=v1.2.x, Open MPI opens all plugins that it can find and *then* applies the filter that you provide (e.g., via the "btl" MCA param) to close / ignore certain plugins.

In OMPI >=v1.3, we [effectively] apply the filter *before* opening plugins. So "--mca btl ^openib" will actually prevent the openib BTL plugin from being loaded.

I'm guessing that what you're seeing today is because we're opening the openib BTL on a system where the OpenFabrics support libraries are not available, and therefore the dlopen() fails. The error string that we get back from libltdl is the somewhat-misleading "file not found (ignored)", and that's what we print (note that ltdl is referring to the fact that a dependent library is not found).

On Oct 24, 2007, at 9:51 AM, Dirk Eddelbuettel wrote:

I've been scratching my head over this:

    lnx01:/usr/lib> orterun -n 2 --mca btl ^openib ~/c++/tests/mpitest
    [lnx01:14417] mca: base: component_find: unable to open btl openib: file not found (ignored)
    [lnx01:14418] mca: base: component_find: unable to open btl openib: file not found (ignored)
    Hello world, I'm process 0
    Hello world, I'm process 1
    lnx01:/usr/lib> grep openib /etc/openmpi/openmpi-mca-params.conf
    # btl = ^openib
    btl = ^openib
    lnx01:/usr/lib> orterun -n 2 ~/c++/tests/mpitest
    [lnx01:14429] mca: base: component_find: unable to open btl openib: file not found (ignored)
    [lnx01:14430] mca: base: component_find: unable to open btl openib: file not found (ignored)
    Hello world, I'm process 0
    Hello world, I'm process 1

and when I strace it, I get

    uname({sys="Linux", node="lnx01", ...}) = 0
    open("/etc/openmpi/openmpi-mca-params.conf", O_RDONLY) = 3
    ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf820698) = -1 ENOTTY (Inappropriate ioctl for device)
    fstat64(3, {st_mode=S_IFREG|0644, st_size=2877, ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f72000
    read(3, "#\n# Copyright (c) 2004-2005 The "..., 8192) = 2877
    read(3, "", 4096) = 0
    read(3, "", 8192) = 0
    ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf8205f8) = -1 ENOTTY (Inappropriate ioctl for device)
    close(3) = 0
    munmap(0xb7f72000, 4096) = 0

Why can't I suppress the dreaded Infinityband message?

System is Ubuntu 7.04 with 'ported' (ie locally recompiled) current Open MPI packages from Debian.

Dirk -- Three out of two people have difficulties with fractions. ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
Re: [OMPI users] Cannot suppress openib error message
Hi Jeff, On 24 October 2007 at 15:43, Jeff Squyres wrote: | This is quite likely because of a "feature" in how the OMPI v1.2 | series handles its plugins. In OMPI <=v1.2.x, Open MPI opens all | plugins that it can find and *then* applies the filter that you | provide (e.g., via the "btl" MCA param) to close / ignore certain | plugins. | | In OMPI >=v1.3, we [effectively] apply the filter *before* opening | plugins. So "--mca btl ^openib" will actually prevent the openib BTL | plugin from being loaded. | | I'm guessing that what you're seeing today is because we're opening | the openib BTL on a system where the OpenFabrics support libraries | are not available, and therefore the dlopen() fails. The error | string that we get back from libltdl is the somewhat-misleading "file | not found (ignored)", and that's what we print (note that ltdl is | referring to the fact that a dependent library is not found). I buy that explanation any day, but what is funny is that the btl = ^openib does suppress the warning on some of my systems (all running 1.2.4) but not others (also running 1.2.4). Hm. 
Dirk | On Oct 24, 2007, at 9:51 AM, Dirk Eddelbuettel wrote: | | > | > I've been scratching my head over this: | > | > lnx01:/usr/lib> orterun -n 2 --mca btl ^openib ~/c++/tests/mpitest | > [lnx01:14417] mca: base: component_find: unable to open btl openib: | > file not found (ignored) | > [lnx01:14418] mca: base: component_find: unable to open btl openib: | > file not found (ignored) | > Hello world, I'm process 0 | > Hello world, I'm process 1 | > lnx01:/usr/lib> grep openib /etc/openmpi/openmpi-mca-params.conf | > # btl = ^openib | > btl = ^openib | > lnx01:/usr/lib> orterun -n 2 ~/c++/tests/mpitest | > [lnx01:14429] mca: base: component_find: unable to open btl openib: | > file not found (ignored) | > [lnx01:14430] mca: base: component_find: unable to open btl openib: | > file not found (ignored) | > Hello world, I'm process 0 | > Hello world, I'm process 1 | > | > and when I strace it, I get | > | > uname({sys="Linux", node="lnx01", ...}) = 0 | > open("/etc/openmpi/openmpi-mca-params.conf", O_RDONLY) = 3 | > ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf820698) = -1 ENOTTY | > (Inappropriate ioctl for device) | > fstat64(3, {st_mode=S_IFREG|0644, st_size=2877, ...}) = 0 | > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, | > -1, 0) = 0xb7f72000 | > read(3, "#\n# Copyright (c) 2004-2005 The "..., 8192) = 2877 | > read(3, "", 4096) = 0 | > read(3, "", 8192) = 0 | > ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf8205f8) = -1 ENOTTY | > (Inappropriate ioctl for device) | > close(3)= 0 | > munmap(0xb7f72000, 4096)= 0 | > | > Why can't I suppress the dreaded Infinityband message? | > | > System is Ubuntu 7.04 with 'ported' (ie locally recompiled) current | > Open MPI packages | > from Debian. | > | > Dirk | > | > -- | > Three out of two people have difficulties with fractions. 
| > ___ | > users mailing list | > us...@open-mpi.org | > http://www.open-mpi.org/mailman/listinfo.cgi/users | | | -- | Jeff Squyres | Cisco Systems | | ___ | users mailing list | us...@open-mpi.org | http://www.open-mpi.org/mailman/listinfo.cgi/users -- Three out of two people have difficulties with fractions.
Re: [OMPI users] Cannot suppress openib error message
On Oct 24, 2007, at 4:16 PM, Dirk Eddelbuettel wrote: I buy that explanation any day, but what is funny is that the btl = ^openib does suppress the warning on some of my systems (all running 1.2.4) but not others (also running 1.2.4). If I had to guess, the systems where you don't see the warning are systems that have OFED loaded. -- Jeff Squyres Cisco Systems
Re: [OMPI users] orterun "by hand"
On Oct 24, 2007, at 1:21 PM, George Bosilca wrote: There is no way to run Open MPI by hand, or at least not simple way. How about xgrid on your OS X cluster ? Anyway, without a way to start processes remotely it is really difficult to start up any kind of parallel job. More specifically, Open MPI assumes that it can invoke some kind of action on the local node to cause processes to be started remotely (whether that's via rsh/ssh or some kind of resource manager mechanism). We don't really have a simple way for a user to start a bunch of jobs manually and have them magically join together into a single parallel job. You conceivably *could* replicate the commands the rsh/ssh starter executes, but I wouldn't really advise it, for two reasons: 1. they're long, complicated commands (which are generated automatically with variable arguments) 2. the specific arguments have changed between different versions of Open MPI -- we consider these interfaces to be internal and therefore subject to change without warning -- Jeff Squyres Cisco Systems
Re: [OMPI users] Syntax error in remote rsh execution
Hi Tim,

Thank you for your reply. You are right, my openMPI version is rather old. However, I am stuck with it until I can compile v1.2.4; I have had some problems with it (I already opened a case on Oct 15th). You were also right about my hostname: uname -n reports (none) and the "hostname" command did not exist in the nodes of my cluster. I already added it to the nodes and modified the /etc/hosts file. The error went away and now I can see that orted runs in the remote node. It is strange to me that orted runs with --num_procs 3 when mpirun was executed with -np 2. Does this sound correct to you? I might open a new case for it though...

Thank you for your help, Jorge

On Mon, 22 Oct 2007, Tim Prins wrote:

Sorry to reply to my own mail. Just browsing through the logs you sent, and I see that 'hostname' should be working fine. However, you are using v1.1.5 which is very old. I would strongly suggest upgrading to v1.2.4. It is a huge improvement over the old v1.1 series (which is not being maintained anymore). Tim

On Monday 22 October 2007 08:41:30 pm Tim Prins wrote:

Hi Jorge, This is interesting. The problem is the universe name: root@(none):default-universe The "(none)" part is supposed to be the hostname where mpirun is executed. Try running: hostname and: uname -n These should both return valid hostnames for your machine. Open MPI pretty much assumes that all nodes have a valid (preferably unique) hostname. If the above commands don't work, you probably need to fix your cluster. Let me know if this does not work.
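For reference, the fix described above amounts to giving every node a resolvable name; a minimal /etc/hosts along those lines (addresses and hostnames here are illustrative, not taken from the original cluster) would be:

```text
127.0.0.1       localhost
192.168.1.102   node01
192.168.1.103   node02
```

Each node can then resolve its own name and its peers' names without DNS, which is what Open MPI's universe naming (root@hostname:default-universe) relies on.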
Thanks, Tim

On Thursday 18 October 2007 09:22:09 pm Jorge Parra wrote:

Hi, When trying to execute an application that spawns to another node, I obtain the following message:

    # ./mpirun --hostfile /root/hostfile -np 2 greetings
    Syntax error: "(" unexpected (expecting ")")
    --------------------------------------------------------------------------
    Could not execute the executable "/opt/OpenMPI/OpenMPI-1.1.5b/exec/bin/greetings": Exec format error
    This could mean that your PATH or executable name is wrong, or that you do
    not have the necessary permissions. Please ensure that the executable is
    able to be found and executed.
    --------------------------------------------------------------------------

and in the remote node:

    # pam_rhosts_auth[183]: user root has a `+' user entry
    pam_rhosts_auth[183]: allowed to root@192.168.1.102 as root
    PAM_unix[183]: (rsh) session opened for user root by (uid=0)
    in.rshd[184]: root@192.168.1.102 as root: cmd='( ! [ -e ./.profile ] || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename 192.168.1.103 --universe root@(none):default-universe --nsreplica "0.0.0;tcp://192.168.1.102:32774" --gprreplica "0.0.0;tcp://192.168.1.102:32774" --mpi-call-yield 0 )'
    PAM_unix[183]: (rsh) session closed for user root

I suspect the command that rsh is trying to execute in the remote node fails. It seems to me that the first parenthesis in cmd='( ! is not well interpreted, thus causing the syntax error. This might prevent .profile from running and correctly setting PATH. Therefore, "greetings" is not found. I am attaching to this email the appropriate configuration files of my system and openmpi on it. This is a system in an isolated network, so I don't care too much for security. Therefore I am using rsh on it. I would really appreciate any suggestions to correct this problem. Thank you, Jorge ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] MPI::BOTTOM vs MPI_BOTTOM
Wow -- that has survived since LAM/MPI -- you're the first person to have ever noticed it. :-) I *think* it's just a wrong type, but I'd prefer to file a ticket so that someone gives it a bit more than a cursory examination before making the change. Thanks for pointing it out!

On Oct 10, 2007, at 9:19 PM, Stephen Guzik wrote:

Hi, To the Devs. I just noticed that MPI::BOTTOM requires a cast. Not sure if that was intended. Compiling 'MPI::COMM_WORLD.Bcast(MPI::BOTTOM, 1, someDataType, 0);' results in:

    error: invalid conversion from ‘const void*’ to ‘void*’
    error: initializing argument 1 of ‘virtual void MPI::Comm::Bcast(void*, int, const MPI::Datatype&, int) const’

MPI_BOTTOM, on the other hand, works without a cast. Stephen ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
Re: [OMPI users] Tuning Openmpi with IB Interconnect
Sorry I missed this message before... it got lost in the deluge that is my inbox. Are you using the mpi_leave_pinned MCA parameter? That will make a big difference on the typical ping-pong benchmarks:

    mpirun --mca mpi_leave_pinned 1

On Oct 11, 2007, at 11:44 AM, Matteo Cicuttin wrote:

On 11 Oct 2007, at 07:16, Neeraj Chourasia wrote:

Dear All, Could anyone tell me the important tuning parameters in openmpi with IB interconnect? I tried setting the eager_rdma, min_rdma_size, and mpi_leave_pinned parameters from the mpirun command line on a 38-node cluster (38*2 processors), but in vain. I found simple mpirun with no mca parameters performing better. I conducted tests on P2P send/receive with a data size of 8 MB. Similarly, I patched the HPL Linpack code with libnbc (non-blocking collectives) and found no performance benefits. I went through its patch and found that it's probably not overlapping computation with communication. Any help in this direction would be appreciated. -Neeraj

Hi! I'm Matteo, and I work for a company that produces HPC systems, in Italy. I'm new in that company and I'm looking for some help, and this thread seems to be good :) In the last days we're benchmarking a system, and I'm interested in some performance scores of the infiniband interconnect. The nodes are dual dual-core opteron machines and we use the PCI-X IB interfaces Mellanox Cougar Cub. Machines have the 8111 system controller and the 8131 PCI-X bridge. We reach a rate of about 600 MB/s in the point-to-point tests. This rate (more or less) is reported both by the ib_*_bw benchmarks and the IMB-MPI (sendrecv) benchmarks, version 3. MPI implementation is, of course, openmpi. I've read in a few places that a similar setup can reach about 800 MB/s on machines similar to those described above. Can someone confirm this? Does someone have similar hardware where the measured bandwidth is better than 600 MB/s? Hints? Comments?
Thank you in advance, Best regards, --- Cicuttin Matteo http://www.matteocicuttin.it Black holes are where god divided by zero ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
Re: [OMPI users] 1.2.4 cross-compilation problem
Well that's fun; I'm not sure why that would happen. Can you send all the information listed here: http://www.open-mpi.org/community/help/

On Oct 15, 2007, at 5:36 PM, Jorge Parra wrote:

Hi, I am trying to cross-compile Open-mpi 1.2.4 for an embedded system. The development system is a i686 Linux and the target system is a ppc 405 based. When trying "make all" I get the following error:

    /bin/sh ../../../libtool --tag=CC --mode=link /opt/powerpc-405-linux/bin/powerpc-405-linux-gnu-gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lnsl -lutil -lm
    libtool: link: /opt/powerpc-405-linux/bin/powerpc-405-linux-gnu-gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -o opal_wrapper opal_wrapper.o -Wl,--export-dynamic ../../../opal/.libs/libopen-pal.a -ldl -lnsl -lutil -lm -pthread
    ../../../opal/.libs/libopen-pal.a(libltdlc_la-ltdl.o)(.text+0xbe): In function `lt_dlinit':
    : undefined reference to `lt_libltdlc_LTX_preloaded_symbols'
    ../../../opal/.libs/libopen-pal.a(libltdlc_la-ltdl.o)(.text+0xc2): In function `lt_dlinit':
    : undefined reference to `lt_libltdlc_LTX_preloaded_symbols'
    collect2: ld returned 1 exit status
    make[2]: *** [opal_wrapper] Error 1
    make[2]: Leaving directory `/opt/openmpi-1.2.4/opal/tools/wrappers'
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory `/opt/openmpi-1.2.4/opal'
    make: *** [all-recursive] Error 1

Older versions of open-mpi have been successfully compiled in the same development system. I am attaching to this email all the output and the configuration information. Any help will be greatly appreciated. Thank you, Jorge ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
Re: [OMPI users] Syntax error in remote rsh execution
Glad you found the problem. Don't worry about the '--num_proc 3'. This does not refer to the number of application processes, but rather the number of 'daemon' processes plus 1 for mpirun. However, this is an internal interface which changes on different versions of Open MPI, so this explanation is subject to change :) Tim Jorge Parra wrote: Hi Tim, Thank you for your reply. You are right, my openMPI version is rather old. However I am stuck with it while I can compile v1.2.4. I have had some problems with it (I already opened a case on Oct 15th). You were also right about my hostname. uname -n reports (none) and the "hostname" command did not exist in the nodes of my cluster. I already added it to the nodes and modified the /etc/hosts file. The error went away and now I can see that orted runs in the remote node. It is strange to me that orted runs with --num_proc 3 when mpirun was executed with -np 2. Does this sound correct to you? I might open a new case for it though... Thank you for your help, Jorge On Mon, 22 Oct 2007, Tim Prins wrote: Sorry to reply to my own mail. Just browsing through the logs you sent, and I see that 'hostname' should be working fine. However, you are using v1.1.5 which is very old. I would strongly suggest upgrading to v1.2.4. It is a huge improvement over the old v1.1 series (which is not being maintained anymore). Tim On Monday 22 October 2007 08:41:30 pm Tim Prins wrote: Hi Jorge, This is interesting. The problem is the universe name: root@(none):default-universe The "(none)" part is supposed to be the hostname where mpirun is executed. Try running: hostname and: uname -n These should both return valid hostnames for your machine. Open MPI pretty much assumes that all nodes have a valid (preferably unique) hostname. If the above commands don't work, you probably need to fix your cluster. Let me know if this does not work. 
Thanks, Tim

On Thursday 18 October 2007 09:22:09 pm Jorge Parra wrote:

Hi, When trying to execute an application that spawns to another node, I obtain the following message:

    # ./mpirun --hostfile /root/hostfile -np 2 greetings
    Syntax error: "(" unexpected (expecting ")")
    --------------------------------------------------------------------------
    Could not execute the executable "/opt/OpenMPI/OpenMPI-1.1.5b/exec/bin/greetings": Exec format error
    This could mean that your PATH or executable name is wrong, or that you do
    not have the necessary permissions. Please ensure that the executable is
    able to be found and executed.
    --------------------------------------------------------------------------

and in the remote node:

    # pam_rhosts_auth[183]: user root has a `+' user entry
    pam_rhosts_auth[183]: allowed to root@192.168.1.102 as root
    PAM_unix[183]: (rsh) session opened for user root by (uid=0)
    in.rshd[184]: root@192.168.1.102 as root: cmd='( ! [ -e ./.profile ] || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename 192.168.1.103 --universe root@(none):default-universe --nsreplica "0.0.0;tcp://192.168.1.102:32774" --gprreplica "0.0.0;tcp://192.168.1.102:32774" --mpi-call-yield 0 )'
    PAM_unix[183]: (rsh) session closed for user root

I suspect the command that rsh is trying to execute in the remote node fails. It seems to me that the first parenthesis in cmd='( ! is not well interpreted, thus causing the syntax error. This might prevent .profile from running and correctly setting PATH. Therefore, "greetings" is not found. I am attaching to this email the appropriate configuration files of my system and openmpi on it. This is a system in an isolated network, so I don't care too much for security. Therefore I am using rsh on it. I would really appreciate any suggestions to correct this problem. Thank you, Jorge ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Merging Intracommunicators
I believe that the second scenario that Sriram described is incorrect: you cannot merge independent intercommunicators into a single communicator (either intra or inter). On Oct 18, 2007, at 4:36 PM, Murat Knecht wrote: Hi, I have a question regarding merging intracommunicators. Using MPI_Comm_spawn, I create child processes on designated machines, retrieving an intercommunicator each time. With MPI_Intercomm_merge it is possible to get an intracommunicator containing the master process(es) and the newly spawned child process. The problem is to merge the intracommunicators into a single one. I understand there is the possibility to use the intracommunicator created on the first try in order to spawn the second child, merge that one into the intracomm, and continue like this. This brings considerable administrative overhead with it, as all already-spawned children must (be informed to) participate in the spawn call. I would rather merge all intercommunicators together at the end, using only the master process for spawning. Both these possibilities have been mentioned in the following post: http://www.lam-mpi.org/MailArchives/lam/2003/06/6226.php While I understand the first one, I do not follow the second - I cannot seem to find any method to merge multiple inter- or intracomms into a single intracomm. Groups cannot be used either, to collect the children and retrieve the intracomm, because they are only used for subgrouping within an already existing intracommunicator's group. Is there a way to merge them the easy way, or did I misread the post above? Thanks & best regards, Murat -- Jeff Squyres Cisco Systems
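The iterative spawn-and-merge approach discussed above can be sketched as follows. This is an illustrative sketch, not code from the thread: the child executable name "child" is hypothetical, and note that every process already in the growing intracommunicator must participate in each spawn call, which is exactly the administrative overhead Murat describes.

```c
/* Sketch: grow an intracommunicator one spawned child at a time.
 * Assumes a separate child executable named "child" (hypothetical). */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm everyone;                  /* the intracomm that grows */
    MPI_Comm_dup(MPI_COMM_WORLD, &everyone);

    for (int i = 0; i < 3; i++) {
        MPI_Comm inter, merged;
        /* All current members of 'everyone' take part in the spawn. */
        MPI_Comm_spawn("child", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, everyone, &inter, MPI_ERRCODES_IGNORE);
        /* Merge the parents (low group) with the new child (high group). */
        MPI_Intercomm_merge(inter, /* high = */ 0, &merged);
        MPI_Comm_free(&inter);
        MPI_Comm_free(&everyone);
        everyone = merged;              /* spawn from the larger comm next */
    }

    MPI_Comm_free(&everyone);
    MPI_Finalize();
    return 0;
}
```

The child side would call MPI_Comm_get_parent and then the matching MPI_Intercomm_merge with high = 1.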
Re: [OMPI users] problem with 'orted'
By default, I believe that orte assumes that the orted is in the same location on all nodes. If it's not, you should be able to use one of the following: 1. Make a sym link such that /usr/local/bin/orted appears on all of your nodes. You implied that you tried this, but I find it hard to believe that it didn't work -- the error message you show clearly indicates that it's looking for /usr/local/bin/orted. If it's there (and executable), it should work. 2. I assume you're using the rsh/ssh launcher. If this is the case, set the pls_rsh_orted MCA parameter to /usr/bin/orted. E.g.: mpirun --mca pls_rsh_orted /usr/bin/orted ... On Oct 1, 2007, at 8:26 AM, Amit Kumar Saha wrote: hello, I am using Open MPI 1.2.3 to run a task on 4 hosts as follows: amit@ubuntu-desktop-1:~/mpi-exec$ mpirun --np 4 --hostfile mpi-host-file ParallelSearch bash: /usr/local/bin/orted: No such file or directory The problem is that 'orted' is not found on one of the 4 hosts. I investigated the problem and found that whereas 'orted' is stored in /usr/local/bin on all the other 3 hosts, it is in /usr/bin on the erroneous host. I tried to create a soft link to solve the problem but sadly it is not so simple, it seems. It would be nice to know how to get around this problem. Thanks, Amit -- Amit Kumar Saha *NetBeans Community Docs Coordinator* me blogs@ http://amitksaha.blogspot.com URL:http://amitsaha.in.googlepages.com -- Jeff Squyres Cisco Systems
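For reference, the two workarounds above look roughly like this on the erroneous host (a sketch using the paths from this thread; the MCA parameter file path assumes a default-prefix install):

```
# Workaround 1: symlink so the expected path exists on the odd host
ln -s /usr/bin/orted /usr/local/bin/orted

# Workaround 2a: override per run on the command line
mpirun --mca pls_rsh_orted /usr/bin/orted --np 4 --hostfile mpi-host-file ParallelSearch

# Workaround 2b: persist the override in the system-wide MCA parameter file
#   (file location depends on the install prefix)
echo "pls_rsh_orted = /usr/bin/orted" >> /usr/local/etc/openmpi-mca-params.conf
```

The command-line form wins over the file if both are set, since MCA parameters given to mpirun take precedence over the parameter file.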
Re: [OMPI users] Number of processes and number of the cores.
We don't really have this kind of fine-grained processor affinity control in Open MPI yet. Is there a reason you want to oversubscribe cores this way? Open MPI assumes that each process should be as aggressive as possible in terms of performance -- spinning heavily until progress can be made on message passing, etc. On Oct 23, 2007, at 3:15 PM, Siamak Riahi wrote: I have a question about using Open MPI. I want to tie "N" processes to one core and "M" processes to another core. I want to know if Open MPI is capable of doing that. Thanks, Siamak -- Jeff Squyres Cisco Systems
Re: [OMPI users] xcode and ompi
Those are the three libraries that are typically required. I don't know anything about Xcode, so I don't know if there's any other secret sauce that you need to use. Warner -- can you shed any light here? To verify your Open MPI installation, you might want to try compiling a trivial MPI application outside of Xcode with the simple "mpicc" wrapper compiler, such as: mpicc mpi_hello_world.c -o mpi_hello_world You can also see what underlying command mpicc is invoking with: mpicc mpi_hello_world.c -o mpi_hello_world --showme But I will pretty much guarantee that if you have mixed multiple MPI implementations (LAM and Open MPI) in the same directory tree, things won't work. It would be best to fully uninstall one (e.g., LAM) and then re-install the other (e.g., Open MPI). If you've lost the build directory for LAM, you can download a new source tarball from www.lam-mpi.org. On Oct 21, 2007, at 11:13 PM, Tony Sheh wrote: Hi all, I'm working in Xcode and i'm trying to build an application that links against the OMPI libraries. So far i've included the following files in the build: libmpi.dylib libopen-pal.dylib libopen-rte.dylib and the errors i get are: Undefined symbols: all the MPI functions you can think of... as well as a warning: "suggest use of -bind_at_load, as lazy binding may result in errors or different symbols being used". I've compiled and linked to the static libraries (using ./configure --enable-static) and i get the same errors. Also, i previously had the latest version of LAM/MPI installed. I didn't uninstall it since i lost the original directory as well as the make and configure settings. If that is the conflict then any information about how to resolve it would be good. Thanks! Tony -- Jeff Squyres Cisco Systems
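For anyone following along, a minimal mpi_hello_world.c along the lines of what the mpicc test above assumes would look like this (standard MPI, nothing Xcode-specific):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

If this compiles and runs under mpirun, the Open MPI install itself is fine and the problem is confined to the Xcode link settings.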
Re: [OMPI users] Cannot suppress openib error message
On 24 October 2007 at 16:22, Jeff Squyres wrote: | On Oct 24, 2007, at 4:16 PM, Dirk Eddelbuettel wrote: | | > I buy that explanation any day, but what is funny is that the | > btl = ^openib | > does suppress the warning on some of my systems (all running 1.2.4) | > but not | > others (also running 1.2.4). | | If I had to guess, the systems where you don't see the warning are | systems that have OFED loaded. I am pretty sure that none of the systems (at work) have IB hardware. I am very sure that my home systems do not, and there the 'btl = ^openib' successfully suppresses the warning --- whereas at work it doesn't. Must be a side-effect of something else. I made sure no LAM libs were left around. Dirk -- Three out of two people have difficulties with fractions.
Re: [OMPI users] Cannot suppress openib error message
On Oct 24, 2007, at 9:23 PM, Dirk Eddelbuettel wrote: | If I had to guess, the systems where you don't see the warning are | systems that have OFED loaded. I am pretty sure that none of the systems (at work) have IB hardware. I am very sure that my home systems do not, and there the 'btl = ^openib' successfully suppresses the warning --- whereas at work it doesn't. Note that you don't need to have IB hardware -- all you need is the OFED software loaded. I don't know if Debian ships the OFED libraries by default...? In particular, look for libibverbs: [18:28] svbu-mpi:~/svn/ompi % ldd $bogus/lib/openmpi/mca_btl_openib.so libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x002a956c2000) libnsl.so.1 => /lib64/libnsl.so.1 (0x002a957cd000) libutil.so.1 => /lib64/libutil.so.1 (0x002a958e4000) libm.so.6 => /lib64/tls/libm.so.6 (0x002a959e8000) libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x002a95b6e000) libc.so.6 => /lib64/tls/libc.so.6 (0x002a95c83000) libdl.so.2 => /lib64/libdl.so.2 (0x002a95eb8000) /lib64/ld-linux-x86-64.so.2 (0x00552000) However, I note something in your last reply that I may have missed before -- can you clarify a point for me: are you saying that on your home machine, this generates the openib "file not found" warning: mpirun -np 2 hello but this does not: mpirun -np 2 --mca btl ^openib hello If so, can you confirm which version of Open MPI you are running? The only reason that I can think that that would happen is if you are running a trunk nightly download of Open MPI... If not, then there's something else going on that would be worth understanding. -- Jeff Squyres Cisco Systems
Re: [OMPI users] Cannot suppress openib error message
On 24 October 2007 at 21:31, Jeff Squyres wrote: | On Oct 24, 2007, at 9:23 PM, Dirk Eddelbuettel wrote: | | > | If I had to guess, the systems where you don't see the warning are | > | systems that have OFED loaded. | > | > I am pretty sure that none of the systems (at work) have IB | > hardware. I am | > very sure that my home systems do not, and there the 'btl = ^openib' | > successfully suppresses the warning --- whereas at work it doesn't. | | Note that you don't need to have IB hardware -- all you need is the | OFED software loaded. I don't know if Debian ships the OFED | libraries by default...? In particular, look for libibverbs: | | [18:28] svbu-mpi:~/svn/ompi % ldd $bogus/lib/openmpi/mca_btl_openib.so | libibverbs.so.1 => /usr/lib64/libibverbs.so.1 | (0x002a956c2000) | libnsl.so.1 => /lib64/libnsl.so.1 (0x002a957cd000) | libutil.so.1 => /lib64/libutil.so.1 (0x002a958e4000) | libm.so.6 => /lib64/tls/libm.so.6 (0x002a959e8000) | libpthread.so.0 => /lib64/tls/libpthread.so.0 | (0x002a95b6e000) | libc.so.6 => /lib64/tls/libc.so.6 (0x002a95c83000) | libdl.so.2 => /lib64/libdl.so.2 (0x002a95eb8000) | /lib64/ld-linux-x86-64.so.2 (0x00552000) Good point. However, I use the .deb packages which I build for Debian, and they use libibverbs where available: Build-Depends: [...], libibverbs-dev [!kfreebsd-i386 !kfreebsd-amd64 \ !hurd-i386], gfortran, libsysfs-dev, automake, gcc (>= 4:4.1.2) in particular on i386. Consequently, the binary package ends up with a Depends on the run-time package 'libibverbs1' -- and this will hence always be present, as all my systems use the .deb packages (either from Debian or locally rebuilt) that force libibverbs1 in via this Depends. At work, I re-build these same packages under Ubuntu on my "head node". And on the head node, no warning is seen -- whereas my compute nodes issue the warning. Could this be another one of the dlopen issues where basically dlopen("libibverbs.so") is executed?
Because the compute nodes do NOT have libibverbs.so (from the -dev package) but only libibverbs.so.1.0.0 and its matching symlink libibverbs.so.1. I just tested that hypothesis and installed libibverbs-dev, but no beans. Still get the warning. | However, I note something in your last reply that I may have missed | before -- can you clarify a point for me: are you saying that on your | home machine, this generates the openib "file not found" warning: | | mpirun -np 2 hello | | but this does not: | | mpirun -np 2 --mca btl ^openib hello More or less, but I use /etc/openmpi/openmpi-mca-params.conf to toggle ^openib. Adding it again as --mca btl ^openib changes nothing, unfortunately. | If so, can you confirm which version of Open MPI you are running? | The only reason that I can think that that would happen is if you are | running a trunk nightly download of Open MPI... If not, then there's | something else going on that would be worth understanding. No, plain 1.2.4 from the original tarballs. Still puzzled. To recap, the head node and the compute nodes all use the same Ubuntu release and use the same binary .deb packages of Open MPI 1.2.4 that I rebuilt there. The 'sole' difference is that the 'head node' has more development packages and tools installed -- but that should not matter. I just re-checked and the compute node does not have any LAM or MPICH parts remaining. Dirk -- Three out of two people have difficulties with fractions.
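For completeness, the file-based toggle this thread keeps referring to is a plain-text MCA parameter file (a config fragment; the /etc/openmpi path is the one Dirk mentions for the Debian packaging, while tarball installs typically read <prefix>/etc/openmpi-mca-params.conf):

```
# /etc/openmpi/openmpi-mca-params.conf
# Exclude the openib BTL so Open MPI never tries to open the
# InfiniBand verbs library on machines without IB hardware.
btl = ^openib
```

The equivalent one-off form is `mpirun --mca btl ^openib ...`; the two should behave identically, which is why the asymmetry Dirk observes between head and compute nodes is surprising.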