I can run Open MPI perfectly from the command line, but I wanted a graphical interface for debugging because I was having problems. Thanks anyway.

Vincent
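(For what it's worth, one common low-tech way to get an interactive debugger on every rank, assuming an X server, xterm and gdb are available, is to let mpirun spawn one debugger window per process; "my_app" below is just a placeholder for the real executable:

  # one xterm running gdb per MPI rank; adjust -np and the executable name
  mpirun -np 4 xterm -e gdb ./my_app

This is only a sketch of the usual trick, not a full graphical MPI debugger.)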
2009/5/4 Warner Yuen <wy...@apple.com>:

Admittedly, I don't use Xcode to build Open MPI either.

You can just compile Open MPI from the command line and install everything in /usr/local/. Make sure that gfortran is set in your path and you should just be able to do a './configure --prefix=/usr/local'.

After the installation, just make sure that your path is set correctly when you go to use the newly installed Open MPI. If you don't set your path, it will always default to using the version of Open MPI that ships with Leopard.

Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wy...@apple.com
Tel: 408.718.2859

On May 4, 2009, at 9:13 AM, users-requ...@open-mpi.org wrote:

Message: 1
Date: Mon, 4 May 2009 18:13:45 +0200
From: Vicente Puig <vpui...@gmail.com>
Subject: Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

If I cannot make it work with Xcode, which one could I use? Which one do you use to compile and debug Open MPI?

Thanks

Vincent

2009/5/4 Jeff Squyres <jsquy...@cisco.com>:

Open MPI comes pre-installed in Leopard; as Warner noted, since Leopard doesn't ship with a Fortran compiler, the Open MPI that Apple ships has non-functional mpif77 and mpif90 wrapper compilers.

So the Open MPI that you installed manually will use your Fortran compilers, and therefore will have functional mpif77 and mpif90 wrapper compilers. Hence, you probably need to be sure to use the "right" wrapper compilers. It looks like you specified the full path in ExecPath, so I'm not sure why Xcode wouldn't work with that (like I mentioned, I unfortunately don't use Xcode myself, so I don't know why that wouldn't work).

--
Jeff Squyres
Cisco Systems

On May 4, 2009, at 11:53 AM, Vicente wrote:

Yes, I already have the gfortran compiler in /usr/local/bin, the same path as my mpif90 compiler. But I've seen that when I use the mpif90 in /usr/bin and in /Developer/usr/bin it says:

"Unfortunately, this installation of Open MPI was not compiled with Fortran 90 support. As such, the mpif90 compiler is non-functional."

That should be the problem; I will have to change the path to use the gfortran I have installed. How could I do it? (Sorry, I am a beginner.)

Thanks.

On 04/05/2009, at 17:38, Warner Yuen wrote:

Have you installed a Fortran compiler? Mac OS X's developer tools do not come with a Fortran compiler, so you'll need to install one if you haven't already done so. I routinely use the Intel IFORT compilers with success. However, I hear many good things about the gfortran compilers on Mac OS X, and you can't beat the price of gfortran!

Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wy...@apple.com
Tel: 408.718.2859
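(To make the build-and-path advice above concrete, here is a minimal sketch, assuming gfortran is already installed and /usr/local is the desired prefix; the version number in the directory name is only an example:

  # build and install Open MPI with Fortran support from an unpacked source tree
  cd openmpi-1.3.2
  ./configure --prefix=/usr/local
  make all
  sudo make install

  # make sure the new wrappers are found before Leopard's /usr/bin ones
  export PATH=/usr/local/bin:$PATH
  which mpif90        # should now print /usr/local/bin/mpif90

If "which mpif90" still prints /usr/bin/mpif90, the PATH ordering is the first thing to fix.)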
On May 4, 2009, at 7:28 AM, users-requ...@open-mpi.org wrote:

Today's Topics:

1. How do I compile OpenMPI in Xcode 3.1 (Vicente)
2. Re: 1.3.1 -rf rankfile behaviour ?? (Ralph Castain)

Message: 1
Date: Mon, 4 May 2009 16:12:44 +0200
From: Vicente <vpui...@gmail.com>
Subject: [OMPI users] How do I compile OpenMPI in Xcode 3.1

Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in Xcode", but it's only for MPICC. I am using MPIF90, so I did the same, but changing MPICC to MPIF90, and also the path, but it did not work:

Building target 'fortran' of project 'fortran' with configuration 'Debug'

Checking Dependencies
Invalid value 'MPIF90' for GCC_VERSION

The file "MPIF90.cpcompspec" looks like this:

/**
    Xcode Compiler Specification for MPIF90
*/
{   Type = Compiler;
    Identifier = com.apple.compilers.mpif90;
    BasedOn = com.apple.compilers.gcc.4_0;
    Name = "MPIF90";
    Version = "Default";
    Description = "MPI GNU C/C++ Compiler 4.0";
    ExecPath = "/usr/local/bin/mpif90";    // This gets converted to the g++ variant automatically
    PrecompStyle = pch;
}

and it is located in "/Developer/Library/Xcode/Plug-ins".

And when I do mpif90 -v in a terminal it works well:

Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure --prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/tmp/gfortran-20090321/gfortran_libs --enable-bootstrap
Thread model: posix
gcc version 4.4.0 20090321 (experimental) [trunk revision 144983] (GCC)

Any ideas?

Thanks.

Vincent
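(One way to check, outside of Xcode, exactly what that ExecPath would invoke is to ask the wrapper itself; --showme is the standard Open MPI wrapper-compiler option for this, and the comparison below is only a suggested diagnostic:

  # print the underlying compiler command line the wrapper would run
  /usr/local/bin/mpif90 --showme

  # compare with the wrapper Xcode may be picking up instead
  /usr/bin/mpif90 --showme

If the second one only produces the "not compiled with Fortran 90 support" message, the wrong mpif90 is being resolved.)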
Message: 2
Date: Mon, 4 May 2009 08:28:26 -0600
From: Ralph Castain <r...@open-mpi.org>
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

Unfortunately, I didn't write any of that code - I was just fixing the mapper so it would properly map the procs. From what I can tell, the proper things are happening there.

I'll have to dig into the code that specifically deals with parsing the results to bind the processes. Afraid that will take a while longer - pretty dark in that hole.

On Mon, May 4, 2009 at 8:04 AM, Geoffroy Pignot <geopig...@gmail.com> wrote:

Hi,

So, there are no more crashes with my "crazy" mpirun command, but the paffinity feature seems to be broken. Indeed, I am not able to pin my processes.

Simple test with a program using your plpa library:

r011n006% cat hostf
r011n006 slots=4

r011n006% cat rankf
rank 0=r011n006 slot=0        ----> bind to CPU 0, exact?

r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf --rankfile rankf --wdir /tmp -n 1 a.out
PLPA Number of processors online: 4
PLPA Number of processor sockets: 2
PLPA Socket 0 (ID 0): 2 cores
PLPA Socket 1 (ID 3): 2 cores

Ctrl+Z
r011n006% bg

r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot
R+   gpignot   3   9271   97.8   a.out

In fact, whatever slot number I put in my rankfile, a.out always runs on CPU 3. I was looking for it on CPU 0 according to my cpuinfo file (see below). The result is the same if I try another syntax (rank 0=r011n006 slot=0:0 ----> bind to socket 0 - core 0, exact?).
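(As a cross-check on the binding, independent of the PLPA output, the kernel's own view of the affinity mask can be queried for the already-running rank; taskset is part of util-linux on RHEL5, and the PID below is simply the one shown by the ps output above:

  # print the CPU affinity list of the running a.out (PID taken from ps above)
  taskset -cp 9271

  # a working "slot=0" binding should report just CPU 0;
  # an unbound process on this 4-CPU node would report 0-3
)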
Thanks in advance

Geoffroy

PS: I run on RHEL5.

r011n006% uname -a
Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39 CDT 2008 x86_64 x86_64 x86_64 GNU/Linux

My configure is:
./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64' --disable-dlopen --disable-mpi-cxx --enable-heterogeneous

r011n006% cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
stepping        : 6
cpu MHz         : 2660.007
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 5323.68
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
stepping        : 6
cpu MHz         : 2660.007
cache size      : 4096 KB
physical id     : 3
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 5320.03
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
stepping        : 6
cpu MHz         : 2660.007
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 5319.39
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
stepping        : 6
cpu MHz         : 2660.007
cache size      : 4096 KB
physical id     : 3
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 5320.03
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
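(For reading that dump, only a few fields matter for the socket/core mapping; pulling them out shows logical CPUs 0 and 2 on physical package 0 and CPUs 1 and 3 on package 3, which matches the PLPA socket report above and means the a.out observed on processor 3 is running on the second socket rather than the requested CPU 0:

  # show only the topology-relevant fields of /proc/cpuinfo
  grep -E 'processor|physical id|core id' /proc/cpuinfo
)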
Message: 2
Date: Mon, 4 May 2009 04:45:57 -0600
From: Ralph Castain <r...@open-mpi.org>
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

My apologies - I wasn't clear enough. You need a tarball from r21111 or greater, such as:

http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz

HTH
Ralph

On May 4, 2009, at 2:14 AM, Geoffroy Pignot wrote:

Hi,

I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately my command doesn't work.

cat rankf:
rank 0=node1 slot=*
rank 1=node2 slot=*

cat hostf:
node1 slots=2
node2 slots=2

mpirun --rankfile rankf --hostfile hostf --host node1 -n 1 hostname : --host node2 -n 1 hostname

Error, invalid rank (1) in the rankfile (rankf)
--------------------------------------------------------------------------
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 403
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 86
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 86
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 1016

Ralph, could you tell me if my command syntax is correct or not? If not, could you give me the expected one?

Regards

Geoffroy

2009/4/30 Geoffroy Pignot <geopig...@gmail.com>:

Immediately, Sir!!! :)

Thanks again Ralph

Geoffroy

Message: 2
Date: Thu, 30 Apr 2009 06:45:39 -0600
From: Ralph Castain <r...@open-mpi.org>
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

I believe this is fixed now in our development trunk - you can download any tarball starting from last night and give it a try, if you like. Any feedback would be appreciated.

Ralph

On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:

Ah now, I didn't say it -worked-, did I? :-) Clearly a bug exists in the program.
I'll try to take a look at it (if Lenny doesn't get to it first), but it won't be until later in the week.

On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:

I agree with you Ralph, and that's what I expect from Open MPI, but my second example shows that it's not working.

cat hostfile.0
r011n002 slots=4
r011n003 slots=4

cat rankfile.0
rank 0=r011n002 slot=0
rank 1=r011n003 slot=1

mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname    ### CRASHED

Error, invalid rank (1) in the rankfile (rankfile.0)
--------------------------------------------------------------------------
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 404
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 985
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
orterun noticed that the job aborted, but has no info as to the process that caused that situation.
--------------------------------------------------------------------------
orterun: clean termination accomplished

Message: 4
Date: Tue, 14 Apr 2009 06:55:58 -0600
From: Ralph Castain <r...@lanl.gov>
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

The rankfile cuts across the entire job - it isn't applied on an app_context basis.
So the ranks in your rankfile must correspond to the eventual rank of each process on the command line.

Unfortunately, that means you have to count ranks. In your case, you only have four, so that makes life easier. Your rankfile would look something like this:

rank 0=r001n001 slot=0
rank 1=r001n002 slot=1
rank 2=r001n001 slot=1
rank 3=r001n002 slot=2

HTH
Ralph

On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:

Hi,

I agree that my examples are not very clear. What I want to do is to launch a multi-exe application (masters-slaves) and benefit from processor affinity.

Could you show me how to convert this command using the -rf option (whatever the affinity is)?

mpirun -n 1 -host r001n001 master.x options1 : -n 1 -host r001n002 master.x options2 : -n 1 -host r001n001 slave.x options3 : -n 1 -host r001n002 slave.x options4

Thanks for your help

Geoffroy
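(Putting Ralph's two points together, the converted command would presumably keep the four app contexts and hand the whole job one rankfile, something along these lines; this is an untested sketch, rankfile.all is just an illustrative file name, and per the rest of this thread it presumably needs a build containing the trunk fix rather than stock 1.3.1:

  cat rankfile.all
  rank 0=r001n001 slot=0
  rank 1=r001n002 slot=1
  rank 2=r001n001 slot=1
  rank 3=r001n002 slot=2

  mpirun -rf rankfile.all \
         -n 1 -host r001n001 master.x options1 : \
         -n 1 -host r001n002 master.x options2 : \
         -n 1 -host r001n001 slave.x options3 : \
         -n 1 -host r001n002 slave.x options4

The rank numbers count across the whole command line in app-context order, which is why a single rankfile covering ranks 0-3 is needed rather than one rankfile per executable.)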
Message: 2
Date: Sun, 12 Apr 2009 18:26:35 +0300
From: Lenny Verkhovsky <lenny.verkhov...@gmail.com>
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

Hi,

The first "crash" is OK, since your rankfile has ranks 0 and 1 defined, while n=1, which means only rank 0 is present and can be allocated.

NP must be >= the largest rank in the rankfile.

What exactly are you trying to do?

I tried to recreate your segv but all I got was:

~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
[witch19:30798] mca: base: component_find: paffinity "mca_paffinity_linux" uses an MCA interface that is not recognized (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

opal_carto_base_select failed
--> Returned value -13 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
[witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../orte/runtime/orte_init.c at line 78
[witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../orte/orted/orted_main.c at line 344
--------------------------------------------------------------------------
A daemon (pid 11629) died unexpectedly with status 243 while attempting to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

Lenny.

On 4/10/09, Geoffroy Pignot <geopig...@gmail.com> wrote:

Hi,

I am currently testing the process affinity capabilities of Open MPI and I would like to know if the rankfile behaviour I will describe below is normal or not.
cat hostfile.0
r011n002 slots=4
r011n003 slots=4

cat rankfile.0
rank 0=r011n002 slot=0
rank 1=r011n003 slot=1

##################################################################################
mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname    ### OK
r011n002
r011n003

##################################################################################
but
mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname    ### CRASHED

--------------------------------------------------------------------------
Error, invalid rank (1) in the rankfile (rankfile.0)
--------------------------------------------------------------------------
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 404
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 985
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
orterun noticed that the job aborted, but has no info as to the process that caused that situation.
--------------------------------------------------------------------------
orterun: clean termination accomplished

It seems that the rankfile option is not propagated to the second command line; there is no global understanding of the ranking inside an mpirun command.
##################################################################################
Assuming that, I tried to provide a rankfile to each command line:

cat rankfile.0
rank 0=r011n002 slot=0

cat rankfile.1
rank 0=r011n003 slot=1

mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname    ### CRASHED

[r011n002:28778] *** Process received signal ***
[r011n002:28778] Signal: Segmentation fault (11)
[r011n002:28778] Signal code: Address not mapped (1)
[r011n002:28778] Failing at address: 0x34
[r011n002:28778] [ 0] [0xffffe600]
[r011n002:28778] [ 1] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x55d) [0x5557decd]
[r011n002:28778] [ 2] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x117) [0x555842a7]
[r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/mca_plm_rsh.so [0x556098c0]
[r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804aa27]
[r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804a022]
[r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc) [0x9f1dec]
[r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x8049f71]
[r011n002:28778] *** End of error message ***
Segmentation fault (core dumped)

I hope that I've found a bug, because it would be very important for me to have this kind of capability: launch a multi-exe mpirun command line and be able to bind my exes and sockets together.
Thanks in advance for your help

Geoffroy