Re: [OMPI users] Communication between OpenMPI and ClusterTools
One last note to close this out. After some discussion on the developers list, it was pointed out that this problem was fixed with new code in the trunk and the 1.3 branch. So my statement below, that the trunk, 1.3, and CT8 EA2 support nodes on different subnets, can be made stronger: we really do expect this to work. --td

Terry Dontje wrote:

On Tue, 29 Jul 2008, "Alexander Shabarshin" wrote:

Hello

>>> One idea that comes to mind is whether the two nodes are on the same
>>> subnet? If they are not on the same subnet, I think there is a bug in
>>> which the TCP BTL will recuse itself from communications between the
>>> two nodes.
>> You are right - the subnets are different, but the routes are set up
>> correctly and everything like ping, ssh, etc. works OK between them.
> But it isn't a routing problem; it is a matter of how the TCP BTL in
> Open MPI decides which interface the nodes can communicate with
> (completely out of the hands of the TCP stack and lower).

Do you know when it can be fixed in official OpenMPI? Is a patch available or something?

Well, this problem is captured in ticket 972 (https://svn.open-mpi.org/trac/ompi/ticket/972). There is a question as to whether this ticket has been fixed or not (that is, whether the code was actually put back). Sun's experience with the trunk, the 1.3 branch, and the CT8 EA2 release seems to be that you now can run jobs across subnets, but we (Sun) are not completely sure whether the support is truly in there or we just got lucky in how our setup was configured. (I guess I should have ended that with "mumble..mumble" :-) ) --td

FWIW, it looks like that code has had a lot of changes in it between 1.2 and 1.3. --td
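[Editor's note: a common workaround while the interface-selection logic is in flux is to pin the TCP BTL to known-good interfaces with MCA parameters. This is a hedged sketch - the interface name `eth0` is a placeholder for whatever interfaces your nodes actually have:]

```
# Limit the TCP BTL to one interface (hypothetical name eth0):
mpirun --mca btl_tcp_if_include eth0 -np 2 ./a.out

# The same setting can be made persistent in
# $HOME/.openmpi/mca-params.conf:
btl_tcp_if_include = eth0
```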
Re: [OMPI users] TCP Latency
Thanks again for all the answers. It seems that there was a bug in the driver in combination with SUSE Linux Enterprise Server 10. It was fixed with version 1.0.146. Now we have 12us with NPtcp and 22us with NPmpi. This is still not fast enough, but acceptable for the time being. I will check the alternatives as soon as possible and look forward to Open MPI 1.3. Then we will see what iWARP brings ;-).

Best regards, Andy

Kozin, I (Igor) wrote:

Thanks for the fast answer. So is this latency normal for TCP communications over MPI!? Could RDMA maybe reduce the latency? It should work with those cards, but there are still problems with OFED. iWARP is also one of the features they offer, but if it works...

Hi Andy,

Yes, ~40us TCP latency is normal (it can be worse, too). If you need lower MPI latency you need to look elsewhere (it's not going to be TCP). Check SCore, Open-MX and GAMMA. SCore is the most mature of the three, but Open-MX looks promising too. We get less than 15us using SCore MPI and Intel NICs (IMB PingPong). Of course, commercial MPI libraries offer low latency too, e.g. Scali MPI.

Best, Igor

-- Dresden University of Technology, Center for Information Services and High Performance Computing (ZIH), D-01062 Dresden, Germany, e-mail: andy.geo...@zih.tu-dresden.de, WWW: http://www.tu-dresden.de/zih

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

I. Kozin (i.kozin at dl.ac.uk), Computational Science and Engineering Dept.
STFC Daresbury Laboratory, Daresbury Science and Innovation Centre, Daresbury, Warrington, WA4 4AD, skype: in_kozin, tel: +44 (0) 1925 603308, fax: +44 (0) 1925 603634, http://www.cse.clrc.ac.uk/disco
Re: [OMPI users] Segmentation fault: Address not mapped
Hi,

OK, to answer my own question: I recompiled OpenMPI appending '--with-memory-manager=none' to configure, and now things seem to run fine. I'm not sure how this might affect performance, but at least it's working now. Maybe this can be put in the FAQ?

James

On Wed, Jul 30, 2008 at 2:02 AM, James Philbin wrote:
> Hi,
>
> I'm running an MPI module in python (pypar), but I believe (after
> googling) that this might be a problem with openmpi.
> When I run: 'python -c "import pypar"', I get:
> [titus:21965] *** Process received signal ***
> [titus:21965] Signal: Segmentation fault (11)
> [titus:21965] Signal code: Address not mapped (1)
> [titus:21965] Failing at address: 0x837a004
> [titus:21965] [ 0] /lib/i686/libpthread.so.0 [0x40035f93]
> [titus:21965] [ 1] python [0x42029180]
> [titus:21965] [ 2] /users/james/lib/libopen-pal.so.0(free+0xbc) [0x40e112fc]
> [titus:21965] [ 3] /users/james/lib/libopen-pal.so.0(mca_base_components_open+0x83) [0x40dff9b3]
> [titus:21965] [ 4] /users/james/lib/libmpi.so.0(mca_allocator_base_open+0x46) [0x40cb03b6]
> [titus:21965] [ 5] /users/james/lib/libmpi.so.0(ompi_mpi_init+0x3dd) [0x40c7b7dd]
> [titus:21965] [ 6] /users/james/lib/libmpi.so.0(MPI_Init+0xef) [0x40c9fb1f]
> [titus:21965] [ 7] /users/james/lib/python2.5/site-packages/pypar/mpiext.so [0x40576613]
> [titus:21965] [ 8] python(PyCFunction_Call+0x5a) [0x810c9ea]
> [titus:21965] [ 9] python [0x80bb2fb]
> [titus:21965] [10] python(PyEval_EvalFrameEx+0x22d2) [0x80b97a2]
> [titus:21965] [11] python(PyEval_EvalCodeEx+0x376) [0x80ba0b6]
> [titus:21965] [12] python(PyEval_EvalCode+0x57) [0x80bcfe7]
> [titus:21965] [13] python(PyImport_ExecCodeModuleEx+0x13a) [0x80d0b9a]
> [titus:21965] [14] python [0x80d3eeb]
> [titus:21965] [15] python [0x80d180e]
> [titus:21965] [16] python [0x80d27b6]
> [titus:21965] [17] python [0x80d2309]
> [titus:21965] [18] python [0x80d45bf]
> [titus:21965] [19] python(PyImport_ImportModuleLevel+0x90) [0x80d3a40]
> [titus:21965] [20] python [0x80b3dda]
> [titus:21965] [21] python(PyCFunction_Call+0xce) [0x810ca5e]
> [titus:21965] [22] python(PyObject_Call+0x29) [0x805eca9]
> [titus:21965] [23] python(PyEval_CallObjectWithKeywords+0x75) [0x80bae95]
> [titus:21965] [24] python(PyEval_EvalFrameEx+0x2041) [0x80b9511]
> [titus:21965] [25] python(PyEval_EvalCodeEx+0x376) [0x80ba0b6]
> [titus:21965] [26] python(PyEval_EvalCode+0x57) [0x80bcfe7]
> [titus:21965] [27] python(PyImport_ExecCodeModuleEx+0x13a) [0x80d0b9a]
> [titus:21965] [28] python [0x80d3eeb]
> [titus:21965] [29] python [0x80d180e]
>
> I've built openmpi from the 1.2.6 sources with the following configure
> flag: './configure --disable-dlopen --prefix=/users/james'.
>
> pypar seems to work fine on my ubuntu system also with openmpi
> (installed from repositories). I'm tearing my hair out trying to solve
> this, so any advice would be very welcome.
>
> Thanks,
> James
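[Editor's note: James's fix can be written out as a build recipe. The prefix path is the one from his message; note that disabling the memory manager trades away the registration-cache ("leave pinned") optimizations used on RDMA interconnects, which usually doesn't matter for TCP-only runs:]

```
# Rebuild Open MPI 1.2.6 without the ptmalloc2 memory manager, which can
# clash with hosts (such as the Python interpreter) that hook the heap:
./configure --disable-dlopen --with-memory-manager=none --prefix=/users/james
make all install
```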
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
I keep checking my email in hopes that someone will come up with something that Matt or I might've missed. I'm just having a hard time accepting that something so fundamental would be so broken. The MPI_Comm_spawn command is essentially useless without the ability to spawn processes on other nodes. If this is true, then my personal scorecard reads:

# Days spent using openmpi: 4 (off and on)
# Identified bugs in openmpi: 2
# Useful programs built: 0

Please prove me wrong. I'm eager to be shown my ignorance -- to find out where I've been stupid and what documentation I should've read.

Matt Hughes wrote:

I've found that I always have to use mpirun to start my spawner process, due to the exact problem you are having: the need to give OMPI a hosts file! It seems the singleton functionality is lacking somehow... it won't allow you to spawn on arbitrary hosts. I have not tested whether this is fixed in the 1.3 series.

Try: mpiexec -np 1 -H op2-1,op2-2 spawner op2-2

mpiexec should start the first process on op2-1, and the spawn call should start the second on op2-2. If you don't use the Info object to set the hostname specifically, then on 1.2.x it will automatically start on op2-2. With 1.3, the spawn call will start processes starting with the first item in the host list.

mch [snip]
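[Editor's note: the spawn-with-Info pattern Matt describes can be sketched in C. This is an illustrative, untested-here sketch that needs an MPI installation to build and run; the host name and worker binary are the placeholders from his example:]

```c
/* Parent, launched e.g. as: mpiexec -np 1 -H op2-1,op2-2 ./spawner op2-2
 * The Info key "host" asks Open MPI to place the child on that node. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm child;
    MPI_Info info;
    /* Target host comes from the command line; "op2-2" is a placeholder. */
    const char *host = (argc > 1) ? argv[1] : "op2-2";

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    MPI_Info_set(info, "host", (char *)host);

    /* Spawn one copy of ./worker on the requested host. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info,
                   0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Comm_disconnect(&child);
    MPI_Finalize();
    return 0;
}
```

Without the "host" key, placement falls back to the behavior Matt notes: 1.2.x picks the next host in the list, while 1.3 starts from the first item.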
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
As your own tests have shown, it works fine if you just "mpirun -n 1 ./spawner". It is only singleton comm_spawn that appears to be having a problem in the latest 1.2 release. So I don't think comm_spawn is "useless". ;-)

I'm checking this morning to ensure that singletons properly spawn on other nodes in the 1.3 release. I sincerely doubt we will backport a fix to 1.2.

On Jul 30, 2008, at 6:49 AM, Mark Borgerding wrote: [snip]
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
Singleton comm_spawn works fine on the 1.3 release branch - if singleton comm_spawn is critical to your plans, I suggest moving to that version. You can get a pre-release version off of the www.open-mpi.org web site.

On Jul 30, 2008, at 6:58 AM, Ralph Castain wrote: [snip]
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
Just to clarify: the test code I wrote does *not* use MPI_Comm_spawn in the mpirun case. The problem may or may not exist under mpirun.

Ralph Castain wrote: [snip]
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
I'm afraid I can't dictate to the customer that they must upgrade. The target platform is RHEL 5.2 (which uses openmpi 1.2.6). I will try to find some sort of workaround. Any suggestions on how to "fake" the functionality of MPI_Comm_spawn are welcome.

To reiterate my needs:
* I am writing a shared object that plugs into an existing framework.
* I do not control how the framework launches its processes (no mpirun).
* I want to start remote processes to crunch the data.
* The shared object marshals the I/O between the framework and the remote processes.

-- Mark

Ralph Castain wrote: [snip]
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
Mark, if you can run a server process on the remote machine, you could send a request from your local MPI app to your server, then use an Intercomm to link the local process to the new remote process?

On Jul 30, 2008, at 9:55 AM, Mark Borgerding wrote: [snip]
Re: [OMPI users] Communication between OpenMPI and ClusterTools
OK, thanks! Is it possible to fix it somehow directly in the 1.2.x codebase?

----- Original Message ----- From: "Terry Dontje" Sent: Wednesday, July 30, 2008 7:15 AM Subject: Re: [OMPI users] Communication between OpenMPI and ClusterTools

[snip]
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
The problem would be finding a way to tell all the MPI apps how to contact each other, as the Intercomm procedure needs that info to complete. I don't recall whether the MPI_Publish_name/MPI_Lookup_name functions worked in 1.2 - I'm building the code now to see. If they do, then you could use them to get the required contact info and wire up the Intercomm... it's a lot of what goes on under the comm_spawn covers anyway. The only difference is the necessity for the server...

On Jul 30, 2008, at 8:24 AM, Robert Kubrick wrote: [snip]
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
Okay, I tested it and MPI_Publish_name and MPI_Lookup_name work on 1.2.6, so this may provide an avenue (albeit cumbersome) for you to get this to work. It may require a server, though, to make it work - your first MPI proc may be able to play that role if you pass its contact info to the others, but I'd have to play with it for a while to be sure. Haven't really tried that before.

Otherwise, even if we devised a fix for singleton comm_spawn in 1.2, it would still require an upgrade by the customer, as it wouldn't be in 1.2.6 - the best that could happen is for it to appear in 1.2.7, assuming we created the fix for that impending release (far from certain). So if this doesn't work, and the customer cannot or will not upgrade from 1.2.6, I fear you probably cannot do this with OMPI under the constraints you describe.

On Jul 30, 2008, at 8:36 AM, Ralph Castain wrote: [snip]
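[Editor's note: the name-service rendezvous Ralph is referring to uses the MPI-2 calls spelled MPI_Publish_name / MPI_Lookup_name in the standard. A hedged, untested-here sketch (requires an MPI runtime whose processes share a name server; the service name "crunch-service" is a placeholder):]

```c
#include <mpi.h>

/* Server side: open a port and publish it under a well-known name. */
static void serve(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("crunch-service", MPI_INFO_NULL, port);

    /* Block until a client connects; 'client' is an intercommunicator. */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
    /* ... exchange data over 'client' ... */

    MPI_Unpublish_name("crunch-service", MPI_INFO_NULL, port);
    MPI_Comm_disconnect(&client);
    MPI_Close_port(port);
}

/* Client side: look the port up by name and connect to it. */
static void attach(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm server;

    MPI_Lookup_name("crunch-service", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
    /* ... exchange data over 'server' ... */
    MPI_Comm_disconnect(&server);
}
```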
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
I appreciate the suggestion about running a daemon on each of the remote nodes, but wouldn't I kind of be reinventing the wheel there? Process management is one of the things I'd like to be able to count on ORTE for. Would the following work to give the parent process an intercomm with each child? parent i.e. my non-mpirun-started process calls MPI_Init then MPI_Open_port parent spawns mpirun command via system/exec to create the remote children . The name from MPI_Open_port is placed in the environment. parent calls MPI_Comm_accept (once for each child?) all children call MPI_connect to the name I think this would give one intercommunicator back to the parent for each remote process (not ideal, but I can worry about broadcast data later) The remote processes can communicate to each other through MPI_COMM_WORLD. Actually when I think through the details, much of this is pretty similar to the daemon MPI_Publish_name+MPI_Lookup_name approach. The main difference being which processes come first. Mark Borgerding wrote: I'm afraid I can't dictate to the customer that they must upgrade. The target platform is RHEL 5.2 ( uses openmpi 1.2.6 ) I will try to find some sort of workaround. Any suggestions on how to "fake" the functionality of MPI_Comm_spawn are welcome. To reiterate my needs: I am writing a shared object that plugs into an existing framework. I do not control how the framework launches its processes (no mpirun). I want to start remote processes to crunch the data. The shared object marshall the I/O between the framework and the remote processes. -- Mark Ralph Castain wrote: Singleton comm_spawn works fine on the 1.3 release branch - if singleton comm_spawn is critical to your plans, I suggest moving to that version. You can get a pre-release version off of the www.open-mpi.org web site. On Jul 30, 2008, at 6:58 AM, Ralph Castain wrote: As your own tests have shown, it works fine if you just "mpirun -n 1 ./spawner". 
It is only singleton comm_spawn that appears to be having a problem in the latest 1.2 release. So I don't think comm_spawn is "useless". ;-) I'm checking this morning to ensure that singletons properly spawn on other nodes in the 1.3 release. I sincerely doubt we will backport a fix to 1.2.

On Jul 30, 2008, at 6:49 AM, Mark Borgerding wrote: I keep checking my email in hopes that someone will come up with something that Matt or I might've missed. I'm just having a hard time accepting that something so fundamental would be so broken. The MPI_Comm_spawn command is essentially useless without the ability to spawn processes on other nodes. If this is true, then my personal scorecard reads:

# Days spent using openmpi: 4 (off and on)
# identified bugs in openmpi: 2
# useful programs built: 0

Please prove me wrong. I'm eager to be shown my ignorance -- to find out where I've been stupid and what documentation I should've read.

Matt Hughes wrote: I've found that I always have to use mpirun to start my spawner process, due to the exact problem you are having: the need to give OMPI a hosts file! It seems the singleton functionality is lacking somehow... it won't allow you to spawn on arbitrary hosts. I have not tested whether this is fixed in the 1.3 series. Try:

mpiexec -np 1 -H op2-1,op2-2 spawner op2-2

mpiexec should start the first process on op2-1, and the spawn call should start the second on op2-2. If you don't use the Info object to set the hostname specifically, then on 1.2.x it will automatically start on op2-2. With 1.3, the spawn call will start processes starting with the first item in the host list.
mch [snip]
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
On Jul 30, 2008, at 11:12 AM, Mark Borgerding wrote: I appreciate the suggestion about running a daemon on each of the remote nodes, but wouldn't I kind of be reinventing the wheel there? Process management is one of the things I'd like to be able to count on ORTE for. Keep in mind that the daemons here are not for process management -- they're for name service. Would the following work to give the parent process an intercomm with each child? parent i.e. my non-mpirun-started process calls MPI_Init then MPI_Open_port parent spawns mpirun command via system/exec to create the remote children . The name from MPI_Open_port is placed in the environment. parent calls MPI_Comm_accept (once for each child?) all children call MPI_Comm_connect to the name It may be problematic to call system/exec in some environments (e.g., if using OpenFabrics networks). Bad Things can happen. I think this would give one intercommunicator back to the parent for each remote process (not ideal, but I can worry about broadcast data later) The remote processes can communicate to each other through MPI_COMM_WORLD. Actually when I think through the details, much of this is pretty similar to the daemon MPI_Publish_name+MPI_Lookup_name approach. The main difference being which processes come first. Instead of having the framework call MPI_Init in your plugin, can your plugin system/exec "mpirun -np 1 my_parent_app"? And perhaps use a pipe (or socket or some other IPC) to communicate between the framework process and my_parent_app? I realize it's a kludgey workaround, but it looks like we clearly have a bug in the 1.2 series with singletons in this area... -- Jeff Squyres Cisco Systems
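[Editor's sketch] The parent side of the port/accept scheme discussed in this thread might look like the following in C. This is only an illustrative sketch, not code from the thread: the environment variable name CHILD_PORT_NAME, the child count, and the mpirun command line are all assumptions, and error handling is omitted. It also needs an MPI installation to compile.

```c
/* Parent side of the scheme discussed above: open a port, hand its name
 * to the children through the environment, launch them out-of-band with
 * mpirun, then accept one connection per child.
 * CHILD_PORT_NAME and the child count of 4 are illustrative assumptions. */
#include <mpi.h>
#include <stdlib.h>

enum { NCHILDREN = 4 };

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm child[NCHILDREN];

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port_name);

    /* Place the port name in the environment so the spawned mpirun job
     * inherits it, then launch the children in the background. */
    setenv("CHILD_PORT_NAME", port_name, 1);
    system("mpirun -np 4 ./child &");

    /* If each child connects individually (over MPI_COMM_SELF), the
     * parent accepts once per child and gets one intercommunicator each. */
    for (int i = 0; i < NCHILDREN; i++)
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &child[i]);

    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}
```

Note that, as Jeff cautions above, calling system/exec from an MPI process can be problematic in some environments (e.g., with OpenFabrics networks).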
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
On Jul 30, 2008, at 11:12 AM, Mark Borgerding wrote: I appreciate the suggestion about running a daemon on each of the remote nodes, but wouldn't I kind of be reinventing the wheel there? Process management is one of the things I'd like to be able to count on ORTE for. Would the following work to give the parent process an intercomm with each child? parent i.e. my non-mpirun-started process calls MPI_Init then MPI_Open_port parent spawns mpirun command via system/exec to create the remote children . The name from MPI_Open_port is placed in the environment. parent calls MPI_Comm_accept (once for each child?) I think you have to create a separate thread to run the accept, in order to accept multiple client connections. This should be supported by Open MPI, as handling multiple client connections was the original idea of the API design. There is an MPI_Comm_accept multi-thread example in the book Using MPI-2. all children call MPI_Comm_connect to the name I think this would give one intercommunicator back to the parent for each remote process (not ideal, but I can worry about broadcast data later) The remote processes can communicate to each other through MPI_COMM_WORLD. You should be able to merge each child communicator from each accept thread into a global comm anyway. Actually when I think through the details, much of this is pretty similar to the daemon MPI_Publish_name+MPI_Lookup_name approach. The main difference being which processes come first. You can run a daemon through system/exec the same way you run mpiexec. Just use ssh or rsh on the system/exec call. Mark Borgerding wrote: I'm afraid I can't dictate to the customer that they must upgrade. The target platform is RHEL 5.2 (uses openmpi 1.2.6). I will try to find some sort of workaround. Any suggestions on how to "fake" the functionality of MPI_Comm_spawn are welcome. To reiterate my needs: I am writing a shared object that plugs into an existing framework.
I do not control how the framework launches its processes (no mpirun). I want to start remote processes to crunch the data. The shared object marshals the I/O between the framework and the remote processes. -- Mark Ralph Castain wrote: Singleton comm_spawn works fine on the 1.3 release branch - if singleton comm_spawn is critical to your plans, I suggest moving to that version. You can get a pre-release version off of the www.open-mpi.org web site. On Jul 30, 2008, at 6:58 AM, Ralph Castain wrote: As your own tests have shown, it works fine if you just "mpirun -n 1 ./spawner". It is only singleton comm_spawn that appears to be having a problem in the latest 1.2 release. So I don't think comm_spawn is "useless". ;-) I'm checking this morning to ensure that singletons properly spawn on other nodes in the 1.3 release. I sincerely doubt we will backport a fix to 1.2. On Jul 30, 2008, at 6:49 AM, Mark Borgerding wrote: I keep checking my email in hopes that someone will come up with something that Matt or I might've missed. I'm just having a hard time accepting that something so fundamental would be so broken. The MPI_Comm_spawn command is essentially useless without the ability to spawn processes on other nodes. If this is true, then my personal scorecard reads: # Days spent using openmpi: 4 (off and on) # identified bugs in openmpi: 2 # useful programs built: 0 Please prove me wrong. I'm eager to be shown my ignorance -- to find out where I've been stupid and what documentation I should've read. Matt Hughes wrote: I've found that I always have to use mpirun to start my spawner process, due to the exact problem you are having: the need to give OMPI a hosts file! It seems the singleton functionality is lacking somehow... it won't allow you to spawn on arbitrary hosts. I have not tested whether this is fixed in the 1.3 series. Try mpiexec -np 1 -H op2-1,op2-2 spawner op2-2 mpiexec should start the first process on op2-1, and the spawn call should start the second on op2-2.
If you don't use the Info object to set the hostname specifically, then on 1.2.x it will automatically start on op2-2. With 1.3, the spawn call will start processes starting with the first item in the host list. mch [snip]
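[Editor's sketch] The matching child side of the scheme, including the intercommunicator merge Robert suggests above, might look like this in C. It is a sketch only: CHILD_PORT_NAME is the same assumed environment variable name as in the parent sketch, error handling is omitted, and an MPI installation is required to build it.

```c
/* Child side: read the port name the parent placed in the environment
 * and connect back.  Each child connecting over MPI_COMM_SELF matches
 * one MPI_Comm_accept on the parent, yielding one intercommunicator per
 * child; connecting collectively over MPI_COMM_WORLD instead would need
 * only a single accept for the whole child job. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, merged;
    const char *port_name;

    MPI_Init(&argc, &argv);
    port_name = getenv("CHILD_PORT_NAME");  /* assumed variable name */

    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &parent);

    /* Optionally collapse the intercommunicator into an ordinary
     * intracommunicator, per the "merge into a global comm" suggestion.
     * high=1 ranks the child after the parent in the merged ordering. */
    MPI_Intercomm_merge(parent, 1, &merged);

    MPI_Finalize();
    return 0;
}
```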
Re: [OMPI users] How to specify hosts for MPI_Comm_spawn
Just to be clear: you do not require a daemon on every node. You just need one daemon - sitting somewhere - that can act as the data server for MPI_Publish_name/MPI_Lookup_name. You then tell each app where to find it. Normally, mpirun fills that function. But if you don't have it, you can kick off a persistent orted (perhaps just have the parent application fork/exec it) for that purpose. On Jul 30, 2008, at 9:50 AM, Robert Kubrick wrote: On Jul 30, 2008, at 11:12 AM, Mark Borgerding wrote: I appreciate the suggestion about running a daemon on each of the remote nodes, but wouldn't I kind of be reinventing the wheel there? Process management is one of the things I'd like to be able to count on ORTE for. Would the following work to give the parent process an intercomm with each child? parent i.e. my non-mpirun-started process calls MPI_Init then MPI_Open_port parent spawns mpirun command via system/exec to create the remote children . The name from MPI_Open_port is placed in the environment. parent calls MPI_Comm_accept (once for each child?) I think you have to create a separate thread to run the accept, in order to accept multiple client connections. This should be supported by Open MPI, as handling multiple client connections was the original idea of the API design. There is an MPI_Comm_accept multi-thread example in the book Using MPI-2. all children call MPI_Comm_connect to the name I think this would give one intercommunicator back to the parent for each remote process (not ideal, but I can worry about broadcast data later) The remote processes can communicate to each other through MPI_COMM_WORLD. You should be able to merge each child communicator from each accept thread into a global comm anyway. Actually when I think through the details, much of this is pretty similar to the daemon MPI_Publish_name+MPI_Lookup_name approach. The main difference being which processes come first. You can run a daemon through system/exec the same way you run mpiexec.
Just use ssh or rsh on the system/exec call. Mark Borgerding wrote: I'm afraid I can't dictate to the customer that they must upgrade. The target platform is RHEL 5.2 (uses openmpi 1.2.6). I will try to find some sort of workaround. Any suggestions on how to "fake" the functionality of MPI_Comm_spawn are welcome. To reiterate my needs: I am writing a shared object that plugs into an existing framework. I do not control how the framework launches its processes (no mpirun). I want to start remote processes to crunch the data. The shared object marshals the I/O between the framework and the remote processes. -- Mark Ralph Castain wrote: Singleton comm_spawn works fine on the 1.3 release branch - if singleton comm_spawn is critical to your plans, I suggest moving to that version. You can get a pre-release version off of the www.open-mpi.org web site. On Jul 30, 2008, at 6:58 AM, Ralph Castain wrote: As your own tests have shown, it works fine if you just "mpirun -n 1 ./spawner". It is only singleton comm_spawn that appears to be having a problem in the latest 1.2 release. So I don't think comm_spawn is "useless". ;-) I'm checking this morning to ensure that singletons properly spawn on other nodes in the 1.3 release. I sincerely doubt we will backport a fix to 1.2. On Jul 30, 2008, at 6:49 AM, Mark Borgerding wrote: I keep checking my email in hopes that someone will come up with something that Matt or I might've missed. I'm just having a hard time accepting that something so fundamental would be so broken. The MPI_Comm_spawn command is essentially useless without the ability to spawn processes on other nodes. If this is true, then my personal scorecard reads: # Days spent using openmpi: 4 (off and on) # identified bugs in openmpi: 2 # useful programs built: 0 Please prove me wrong. I'm eager to be shown my ignorance -- to find out where I've been stupid and what documentation I should've read.
Matt Hughes wrote: I've found that I always have to use mpirun to start my spawner process, due to the exact problem you are having: the need to give OMPI a hosts file! It seems the singleton functionality is lacking somehow... it won't allow you to spawn on arbitrary hosts. I have not tested if this is fixed in the 1.3 series. Try mpiexec -np 1 -H op2-1,op2-2 spawner op2-2 mpiexec should start the first process on op2-1, and the spawn call should start the second on op2-2. If you don't use the Info object to set the hostname specifically, then on 1.2.x it will automatically start on op2-2. With 1.3, the spawn call will start processes starting with the first item in the host list. mch [snip]
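[Editor's sketch] For the single-daemon name-service route Ralph describes, the server side could use the MPI-2 name publishing calls roughly as below. This is an illustrative sketch: the service name "my_service" is an assumption, error handling is omitted, and it presumes a running name server (mpirun or a persistent orted) that both sides can reach.

```c
/* Server side of the publish/lookup variant: open a port and publish it
 * under an agreed service name with the name-service daemon, instead of
 * passing the port name through the environment. */
#include <mpi.h>

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port_name);
    MPI_Publish_name("my_service", MPI_INFO_NULL, port_name);  /* assumed name */

    /* Accept a connection from a client that looked the name up. */
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);

    MPI_Unpublish_name("my_service", MPI_INFO_NULL, port_name);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}
```

A client would call MPI_Lookup_name("my_service", MPI_INFO_NULL, port_name) followed by MPI_Comm_connect on the returned port name.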
[OMPI users] Missing F90 modules
I'm attempting to move to OpenMPI from another MPICH-derived implementation. I compiled openmpi 1.2.6 using the following configure: ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --target=x86_64-redhat-linux-gnu --program-prefix= --prefix=/usr/mpi/pathscale/openmpi-1.2.6 --exec-prefix=/usr/mpi/pathscale/openmpi-1.2.6 --bindir=/usr/mpi/pathscale/openmpi-1.2.6/bin --sbindir=/usr/mpi/pathscale/openmpi-1.2.6/sbin --sysconfdir=/usr/mpi/pathscale/openmpi-1.2.6/etc --datadir=/usr/mpi/pathscale/openmpi-1.2.6/share --includedir=/usr/mpi/pathscale/openmpi-1.2.6/include --libdir=/usr/mpi/pathscale/openmpi-1.2.6/lib64 --libexecdir=/usr/mpi/pathscale/openmpi-1.2.6/libexec --localstatedir=/var --sharedstatedir=/usr/mpi/pathscale/openmpi-1.2.6/com --mandir=/usr/mpi/pathscale/openmpi-1.2.6/share/man --infodir=/usr/share/info --with-openib=/usr --with-openib-libdir=/usr/lib64 CC=pathcc CXX=pathCC F77=pathf90 FC=pathf90 --with-psm-dir=/usr --enable-mpirun-prefix-by-default --with-mpi-f90-size=large It looks like there is a single MPI.mod generated upon compilation and installation. Is this normal? I have a user complaining that MPI1.mod, MPI2.mod, and the f90base directory among others are missing (and thus the installation is incomplete). Are these modules provided by OpenMPI? I see in the configure help that the f90 bindings are enabled by default so I didn't add the "--enable-mpi-f90" option. Scot
Re: [OMPI users] Missing F90 modules
On all MPI's I have always used there was only MPI use mpi; Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Jul 30, 2008, at 1:45 PM, Scott Beardsley wrote: I'm attempting to move to OpenMPI from another MPICH-derived implementation. I compiled openmpi 1.2.6 using the following configure: ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --target=x86_64-redhat-linux-gnu --program-prefix= --prefix=/usr/mpi/pathscale/openmpi-1.2.6 --exec-prefix=/usr/mpi/pathscale/openmpi-1.2.6 --bindir=/usr/mpi/pathscale/openmpi-1.2.6/bin --sbindir=/usr/mpi/pathscale/openmpi-1.2.6/sbin --sysconfdir=/usr/mpi/pathscale/openmpi-1.2.6/etc --datadir=/usr/mpi/pathscale/openmpi-1.2.6/share --includedir=/usr/mpi/pathscale/openmpi-1.2.6/include --libdir=/usr/mpi/pathscale/openmpi-1.2.6/lib64 --libexecdir=/usr/mpi/pathscale/openmpi-1.2.6/libexec --localstatedir=/var --sharedstatedir=/usr/mpi/pathscale/openmpi-1.2.6/com --mandir=/usr/mpi/pathscale/openmpi-1.2.6/share/man --infodir=/usr/share/info --with-openib=/usr --with-openib-libdir=/usr/lib64 CC=pathcc CXX=pathCC F77=pathf90 FC=pathf90 --with-psm-dir=/usr --enable-mpirun-prefix-by-default --with-mpi-f90-size=large It looks like there is a single MPI.mod generated upon compilation and installation. Is this normal? I have a user complaining that MPI1.mod, MPI2.mod, and the f90base directory among others are missing (and thus the installation is incomplete). Are these modules provided by OpenMPI? I see in the configure help that the f90 bindings are enabled by default so I didn't add the "--enable-mpi-f90" option. Scot
Re: [OMPI users] Missing F90 modules
Brock Palen wrote: On all MPI's I have always used there was only MPI use mpi; Please excuse my admittedly gross ignorance of all things Fortran but why does "include 'mpif.h'" work but "use mpi" does not? When I try the "use mpi" method I get errors like:

$ mpif90 -c cart.f

      call mpi_cart_get( igcomm,2,ivdimx,lvperx, mygrid, ierr)
               ^
pathf95-389 pathf90: ERROR CART, File = cart.f, Line = 34, Column = 12
   No specific match can be found for the generic subprogram call "MPI_CART_GET"

$ mpif90 -c cartfoo.f
$ diff cart.f cartfoo.f
3,4c3,4
< C     include 'mpif.h'
<       use mpi;
---
>       include 'mpif.h'
> C     use mpi;
$

From the googling I've done it seems like "use mpi" is preferred[1]. I've made sure that my $LD_LIBRARY_PATH has the directory that MPI.mod is in.

Scott

[1] http://www.mpi-forum.org/docs/mpi-20-html/node243.htm
Re: [OMPI users] Missing F90 modules
This is correct; Open MPI only generates MPI.mod so that you can "use mpi" in your Fortran app. I'm not sure what MPI1.mod and MPI2.mod and f90base are -- perhaps those are somehow specific artifacts of the other MPI implementation, and/or artifacts of the Fortran compiler...? On Jul 30, 2008, at 1:45 PM, Scott Beardsley wrote: I'm attempting to move to OpenMPI from another MPICH-derived implementation. I compiled openmpi 1.2.6 using the following configure: ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --target=x86_64-redhat-linux-gnu --program-prefix= --prefix=/usr/mpi/pathscale/openmpi-1.2.6 --exec-prefix=/usr/mpi/pathscale/openmpi-1.2.6 --bindir=/usr/mpi/pathscale/openmpi-1.2.6/bin --sbindir=/usr/mpi/pathscale/openmpi-1.2.6/sbin --sysconfdir=/usr/mpi/pathscale/openmpi-1.2.6/etc --datadir=/usr/mpi/pathscale/openmpi-1.2.6/share --includedir=/usr/mpi/pathscale/openmpi-1.2.6/include --libdir=/usr/mpi/pathscale/openmpi-1.2.6/lib64 --libexecdir=/usr/mpi/pathscale/openmpi-1.2.6/libexec --localstatedir=/var --sharedstatedir=/usr/mpi/pathscale/openmpi-1.2.6/com --mandir=/usr/mpi/pathscale/openmpi-1.2.6/share/man --infodir=/usr/share/info --with-openib=/usr --with-openib-libdir=/usr/lib64 CC=pathcc CXX=pathCC F77=pathf90 FC=pathf90 --with-psm-dir=/usr --enable-mpirun-prefix-by-default --with-mpi-f90-size=large It looks like there is a single MPI.mod generated upon compilation and installation. Is this normal? I have a user complaining that MPI1.mod, MPI2.mod, and the f90base directory among others are missing (and thus the installation is incomplete). Are these modules provided by OpenMPI? I see in the configure help that the f90 bindings are enabled by default so I didn't add the "--enable-mpi-f90" option. Scot -- Jeff Squyres Cisco Systems
Re: [OMPI users] Missing F90 modules
I have seen strange things with Fortran compilers and file suffixes. "use mpi" is a Fortran 90 thing, not Fortran 77; many compilers want Fortran 90 code to end in .f90 or .F90. Try renaming cartfoo.f to cartfoo.f90 and try again. I have attached a helloworld.f90 that uses "use mpi" and works on our openmpi installs. Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Jul 30, 2008, at 4:15 PM, Scott Beardsley wrote: Brock Palen wrote: On all MPI's I have always used there was only MPI use mpi; Please excuse my admittedly gross ignorance of all things Fortran but why does "include 'mpif.h'" work but "use mpi" does not? When I try the "use mpi" method I get errors like:

$ mpif90 -c cart.f

      call mpi_cart_get( igcomm,2,ivdimx,lvperx, mygrid, ierr)
               ^
pathf95-389 pathf90: ERROR CART, File = cart.f, Line = 34, Column = 12
   No specific match can be found for the generic subprogram call "MPI_CART_GET"

$ mpif90 -c cartfoo.f
$ diff cart.f cartfoo.f
3,4c3,4
< C     include 'mpif.h'
<       use mpi;
---
>       include 'mpif.h'
> C     use mpi;
$

From the googling I've done it seems like "use mpi" is preferred[1]. I've made sure that my $LD_LIBRARY_PATH has the directory that MPI.mod is in.

Scott

[1] http://www.mpi-forum.org/docs/mpi-20-html/node243.htm
Re: [OMPI users] Missing F90 modules
Scott,

include brings in a file; use brings in a module... kind of like an object file.

Joe

> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Scott Beardsley
> Sent: Wednesday, July 30, 2008 1:16 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Missing F90 modules
>
> Brock Palen wrote:
> > On all MPI's I have always used there was only MPI
> >
> > use mpi;
>
> Please excuse my admittedly gross ignorance of all things Fortran but
> why does "include 'mpif.h'" work but "use mpi" does not? When I try the
> "use mpi" method I get errors like:
>
> $ mpif90 -c cart.f
>
>    call mpi_cart_get( igcomm,2,ivdimx,lvperx, mygrid, ierr)
>             ^
> pathf95-389 pathf90: ERROR CART, File = cart.f, Line = 34, Column = 12
>    No specific match can be found for the generic subprogram call
> "MPI_CART_GET"
>
> $ mpif90 -c cartfoo.f
> $ diff cart.f cartfoo.f
> 3,4c3,4
> < C     include 'mpif.h'
> <       use mpi;
> ---
> >       include 'mpif.h'
> > C     use mpi;
> $
>
> From the googling I've done it seems like "use mpi" is preferred[1].
> I've made sure that my $LD_LIBRARY_PATH has the directory that MPI.mod
> is in.
>
> Scott
>
> [1] http://www.mpi-forum.org/docs/mpi-20-html/node243.htm
Re: [OMPI users] Missing F90 modules
On Wed, Jul 30, 2008 at 01:15:54PM -0700, Scott Beardsley wrote:
> Brock Palen wrote:
> > On all MPI's I have always used there was only MPI
> >
> > use mpi;
>
> Please excuse my admittedly gross ignorance of all things Fortran but
> why does "include 'mpif.h'" work but "use mpi" does not? When I try the
> "use mpi" method I get errors like:
>
> $ mpif90 -c cart.f
>
>    call mpi_cart_get( igcomm,2,ivdimx,lvperx, mygrid, ierr)
>             ^
> pathf95-389 pathf90: ERROR CART, File = cart.f, Line = 34, Column = 12
>    No specific match can be found for the generic subprogram call
> "MPI_CART_GET"
>
> $ mpif90 -c cartfoo.f
> $ diff cart.f cartfoo.f
> 3,4c3,4
> < C     include 'mpif.h'
> <       use mpi;
> ---
> >       include 'mpif.h'
> > C     use mpi;
> $
>
> From the googling I've done it seems like "use mpi" is preferred[1].
> I've made sure that my $LD_LIBRARY_PATH has the directory that MPI.mod
> is in.

Try adding the path to MPI.mod to the include path (e.g., -I/usr/local/openmpi/mod).

--
Ed[mund [Sumbar]]
AICT Research Support Group
esum...@ualberta.ca 780.492.9360
Re: [OMPI users] Missing F90 modules
"use mpi" basically gives you stronger type checking in Fortran 90 that you don't get with Fortran 77. So the error you're seeing is basically a compiler error telling you that you have the wrong types for MPI_CART_GET and that it doesn't match any of the functions provided by Open MPI. FWIW, the official declaration of the Fortran binding for MPI_CART_GET is:

MPI_CART_GET(COMM, MAXDIMS, DIMS, PERIODS, COORDS, IERROR)
    INTEGER COMM, MAXDIMS, DIMS(*), COORDS(*), IERROR
    LOGICAL PERIODS(*)

The real problem is that it looks like we have a bug in our F90 bindings. :-( We have the "periods" argument typed as an integer array, when it really should be a logical array. Doh! The patch below fixes the problem in the v1.2 series; I'll get it included in v1.2.7 and the upcoming v1.3 series.

Index: ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh
===================================================================
--- ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh    (revision 19099)
+++ ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh    (working copy)
@@ -1120,7 +1120,7 @@
     integer, intent(in) :: comm
     integer, intent(in) :: maxdims
     integer, dimension(*), intent(out) :: dims
-    integer, dimension(*), intent(out) :: periods
+    logical, dimension(*), intent(out) :: periods
     integer, dimension(*), intent(out) :: coords
     integer, intent(out) :: ierr
 end subroutine ${procedure}

On Jul 30, 2008, at 4:15 PM, Scott Beardsley wrote: Brock Palen wrote: On all MPI's I have always used there was only MPI use mpi; Please excuse my admittedly gross ignorance of all things Fortran but why does "include 'mpif.h'" work but "use mpi" does not?
When I try the "use mpi" method I get errors like:

$ mpif90 -c cart.f

      call mpi_cart_get( igcomm,2,ivdimx,lvperx, mygrid, ierr)
               ^
pathf95-389 pathf90: ERROR CART, File = cart.f, Line = 34, Column = 12
   No specific match can be found for the generic subprogram call "MPI_CART_GET"

$ mpif90 -c cartfoo.f
$ diff cart.f cartfoo.f
3,4c3,4
< C     include 'mpif.h'
<       use mpi;
---
>       include 'mpif.h'
> C     use mpi;
$

From the googling I've done it seems like "use mpi" is preferred[1]. I've made sure that my $LD_LIBRARY_PATH has the directory that MPI.mod is in.

Scott

[1] http://www.mpi-forum.org/docs/mpi-20-html/node243.htm

-- Jeff Squyres Cisco Systems
Re: [OMPI users] Missing F90 modules
The real problem is that it looks like we have a bug in our F90 bindings. :-( We have the "periods" argument typed as an integer array, when it really should be a logical array. Doh! Ahhh ha! I checked the manpage vs the user's code but I didn't check the OpenMPI code. I can confirm that the patch you sent fixes the problem for me (v1.2.6). Thanks all! Scott