If I cannot make it work with Xcode, which one could I use? Which one do you use to compile and debug Open MPI? Thanks
Vincent

2009/5/4 Jeff Squyres <jsquy...@cisco.com>:

Open MPI comes pre-installed in Leopard; as Warner noted, since Leopard doesn't ship with a Fortran compiler, the Open MPI that Apple ships has non-functional mpif77 and mpif90 wrapper compilers.

So the Open MPI that you installed manually will use your Fortran compilers, and therefore will have functional mpif77 and mpif90 wrapper compilers. Hence, you probably need to be sure to use the "right" wrapper compilers. It looks like you specified the full path to ExecPath, so I'm not sure why Xcode wouldn't work with that (like I mentioned, I unfortunately don't use Xcode myself, so I don't know why that wouldn't work).

On May 4, 2009, at 11:53 AM, Vicente wrote:

Yes, I already have the gfortran compiler in /usr/local/bin, the same path as my mpif90 compiler. But I've seen that when I use the mpif90 in /usr/bin or in /Developer/usr/bin it says:

"Unfortunately, this installation of Open MPI was not compiled with Fortran 90 support. As such, the mpif90 compiler is non-functional."

That should be the problem; I will have to change the path to use the gfortran I have installed. How could I do it? (Sorry, I am a beginner.)

Thanks.

On 04/05/2009, at 17:38, Warner Yuen wrote:

Have you installed a Fortran compiler? Mac OS X's developer tools do not come with a Fortran compiler, so you'll need to install one if you haven't already done so. I routinely use the Intel IFORT compilers with success. However, I hear many good things about the gfortran compilers on Mac OS X, and you can't beat the price of gfortran!

Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wy...@apple.com
Tel: 408.718.2859

On May 4, 2009, at 7:28 AM, users-requ...@open-mpi.org wrote:

Message: 1
Date: Mon, 4 May 2009 16:12:44 +0200
From: Vicente <vpui...@gmail.com>
Subject: [OMPI users] How do I compile OpenMPI in Xcode 3.1

Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in Xcode", but it's only for MPICC. I am using MPIF90, so I did the same, but changing MPICC to MPIF90, and also the path, but it did not work.

Building target "fortran" of project "fortran" with configuration "Debug"

Checking Dependencies
Invalid value 'MPIF90' for GCC_VERSION

The file "MPIF90.cpcompspec" looks like this:

/**
    Xcode Compiler Specification for MPIF90
*/

{   Type = Compiler;
    Identifier = com.apple.compilers.mpif90;
    BasedOn = com.apple.compilers.gcc.4_0;
    Name = "MPIF90";
    Version = "Default";
    Description = "MPI GNU C/C++ Compiler 4.0";
    ExecPath = "/usr/local/bin/mpif90";    // This gets converted to the g++ variant automatically
    PrecompStyle = pch;
}

and it is located in "/Developer/Library/Xcode/Plug-ins".

When I run mpif90 -v in the terminal it works well:

Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure --prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/tmp/gfortran-20090321/gfortran_libs --enable-bootstrap
Thread model: posix
gcc version 4.4.0 20090321 (experimental) [trunk revision 144983] (GCC)

Any idea?

Thanks.

Vincent
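(As a sanity check outside Xcode, here is a minimal sketch of confirming that the manually installed wrapper is the one being picked up and that it really drives gfortran. It assumes the manual Open MPI install lives under /usr/local/bin; the hello.f90 test program and its name are illustrative only.)

# Make the manually installed wrappers win over /usr/bin and /Developer/usr/bin
export PATH=/usr/local/bin:$PATH
which mpif90        # should print /usr/local/bin/mpif90
mpif90 --showme     # should print a gfortran command line, not the "non-functional" message

# Minimal end-to-end test of the wrapper and the runtime
cat > hello.f90 <<'EOF'
program hello
  implicit none
  include 'mpif.h'
  integer :: ierr, rank
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  print *, 'hello from rank', rank
  call MPI_FINALIZE(ierr)
end program hello
EOF
mpif90 hello.f90 -o hello
mpirun -np 2 ./hello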
Message: 2
Date: Mon, 4 May 2009 08:28:26 -0600
From: Ralph Castain <r...@open-mpi.org>
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

Unfortunately, I didn't write any of that code - I was just fixing the mapper so it would properly map the procs. From what I can tell, the proper things are happening there.

I'll have to dig into the code that specifically deals with parsing the results to bind the processes. Afraid that will take a while longer - pretty dark in that hole.

On Mon, May 4, 2009 at 8:04 AM, Geoffroy Pignot <geopig...@gmail.com> wrote:

Hi,

So, there are no more crashes with my "crazy" mpirun command, but the paffinity feature seems to be broken. Indeed, I am not able to pin my processes.

Simple test with a program using your PLPA library:

r011n006% cat hostf
r011n006 slots=4

r011n006% cat rankf
rank 0=r011n006 slot=0    ----> bind to CPU 0, exact?

r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf --rankfile rankf --wdir /tmp -n 1 a.out
PLPA Number of processors online: 4
PLPA Number of processor sockets: 2
PLPA Socket 0 (ID 0): 2 cores
PLPA Socket 1 (ID 3): 2 cores

Ctrl+Z
r011n006% bg

r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot
R+ gpignot 3 9271 97.8 a.out

In fact, whatever the slot number I put in my rankfile, a.out always runs on CPU 3. I was looking for it on CPU 0 according to my cpuinfo (see below). The result is the same if I try the other syntax (rank 0=r011n006 slot=0:0, i.e. bind to socket 0 - core 0, exact?).

Thanks in advance

Geoffroy

PS: I run on RHEL 5.

r011n006% uname -a
Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39 CDT 2008 x86_64 x86_64 x86_64 GNU/Linux

My configure is:
./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64' --disable-dlopen --disable-mpi-cxx --enable-heterogeneous

r011n006% cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
stepping        : 6
cpu MHz         : 2660.007
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 5323.68
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
stepping        : 6
cpu MHz         : 2660.007
cache size      : 4096 KB
physical id     : 3
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 5320.03
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
stepping        : 6
cpu MHz         : 2660.007
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 5319.39
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
stepping        : 6
cpu MHz         : 2660.007
cache size      : 4096 KB
physical id     : 3
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 5320.03
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
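(A further check that may help distinguish "not bound at all" from "bound to the wrong core": on Linux the affinity mask of the running process can be read back directly. A sketch, assuming util-linux's taskset and procps's pgrep are available on the node, a.out is still running, and gpignot is the owning user:)

# psr is the CPU the process last ran on; the mask is what it is allowed to run on
ps -o pid,psr,comm -C a.out
taskset -p $(pgrep -n -u gpignot a.out)
# "current affinity mask: f" would mean the process is not bound at all;
# "1" would mean it is pinned to CPU 0 as the rankfile requests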
Message: 2
Date: Mon, 4 May 2009 04:45:57 -0600
From: Ralph Castain <r...@open-mpi.org>
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

My apologies - I wasn't clear enough. You need a tarball from r21111 or greater, such as:

http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz

HTH
Ralph

On May 4, 2009, at 2:14 AM, Geoffroy Pignot wrote:

Hi,

I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately my command doesn't work:

cat rankf:
rank 0=node1 slot=*
rank 1=node2 slot=*

cat hostf:
node1 slots=2
node2 slots=2

mpirun --rankfile rankf --hostfile hostf --host node1 -n 1 hostname : --host node2 -n 1 hostname

Error, invalid rank (1) in the rankfile (rankf)
--------------------------------------------------------------------------
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 403
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 86
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 86
[r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 1016

Ralph, could you tell me if my command syntax is correct or not? If not, could you give me the expected one?

Regards

Geoffroy

2009/4/30 Geoffroy Pignot <geopig...@gmail.com>:

Immediately, Sir!!! :)

Thanks again Ralph

Geoffroy

Message: 2
Date: Thu, 30 Apr 2009 06:45:39 -0600
From: Ralph Castain <r...@open-mpi.org>
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

I believe this is fixed now in our development trunk - you can download any tarball starting from last night and give it a try, if you like. Any feedback would be appreciated.

Ralph

On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:

Ah now, I didn't say it -worked-, did I? :-)

Clearly a bug exists in the program. I'll try to take a look at it (if Lenny doesn't get to it first), but it won't be until later in the week.
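(For reference, a sketch of fetching and building the nightly tarball Ralph points to above, reusing the configure line Geoffroy quotes earlier in the thread. The tarball name changes from night to night, and make -j4 is just an example.)

wget http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz
tar xzf openmpi-1.4a1r21142.tar.gz
cd openmpi-1.4a1r21142
./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64' \
            --disable-dlopen --disable-mpi-cxx --enable-heterogeneous
make -j4 && make install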
On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:

I agree with you, Ralph, and that's what I expect from Open MPI, but my second example shows that it's not working:

cat hostfile.0
r011n002 slots=4
r011n003 slots=4

cat rankfile.0
rank 0=r011n002 slot=0
rank 1=r011n003 slot=1

mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname   ### CRASHED

Error, invalid rank (1) in the rankfile (rankfile.0)
--------------------------------------------------------------------------
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 404
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 985
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
orterun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
orterun: clean termination accomplished

Message: 4
Date: Tue, 14 Apr 2009 06:55:58 -0600
From: Ralph Castain <r...@lanl.gov>
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

The rankfile cuts across the entire job - it isn't applied on an app_context basis. So the ranks in your rankfile must correspond to the eventual rank of each process in the cmd line.

Unfortunately, that means you have to count ranks. In your case, you only have four, so that makes life easier. Your rankfile would look something like this:

rank 0=r001n001 slot=0
rank 1=r001n002 slot=1
rank 2=r001n001 slot=1
rank 3=r001n002 slot=2

HTH
Ralph
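(Putting Ralph's explanation together with the four-process command Geoffroy asks about below, a single rankfile covering every rank of the job might be used roughly like this. Hostnames, slot numbers, and executable names are taken from the quoted messages and are illustrative only; this is a sketch, not a tested command line.)

cat > rankfile.all <<'EOF'
rank 0=r001n001 slot=0
rank 1=r001n002 slot=1
rank 2=r001n001 slot=1
rank 3=r001n002 slot=2
EOF

# One rankfile for the whole job, with ranks numbered across all app contexts
mpirun -rf rankfile.all \
       -n 1 -host r001n001 master.x options1 : \
       -n 1 -host r001n002 master.x options2 : \
       -n 1 -host r001n001 slave.x options3 : \
       -n 1 -host r001n002 slave.x options4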
On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:

Hi,

I agree that my examples are not very clear. What I want to do is to launch a multi-executable application (masters and slaves) and benefit from processor affinity. Could you show me how to convert this command using the -rf option (whatever the affinity is)?

mpirun -n 1 -host r001n001 master.x options1 : -n 1 -host r001n002 master.x options2 : -n 1 -host r001n001 slave.x options3 : -n 1 -host r001n002 slave.x options4

Thanks for your help

Geoffroy

Message: 2
Date: Sun, 12 Apr 2009 18:26:35 +0300
From: Lenny Verkhovsky <lenny.verkhov...@gmail.com>
Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

Hi,

The first "crash" is OK, since your rankfile has ranks 0 and 1 defined, while n=1, which means only rank 0 is present and can be allocated.

NP must be >= the largest rank in the rankfile.

What exactly are you trying to do?

I tried to recreate your segv, but all I got was:

~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
[witch19:30798] mca: base: component_find: paffinity "mca_paffinity_linux" uses an MCA interface that is not recognized (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_carto_base_select failed
  --> Returned value -13 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
[witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../orte/runtime/orte_init.c at line 78
[witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../orte/orted/orted_main.c at line 344
--------------------------------------------------------------------------
A daemon (pid 11629) died unexpectedly with status 243 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

Lenny.

On 4/10/09, Geoffroy Pignot <geopig...@gmail.com> wrote:

Hi,

I am currently testing the process affinity capabilities of Open MPI, and I would like to know if the rankfile behaviour I describe below is normal or not.

cat hostfile.0
r011n002 slots=4
r011n003 slots=4

cat rankfile.0
rank 0=r011n002 slot=0
rank 1=r011n003 slot=1

##################################################################################

mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname   ### OK
r011n002
r011n003

##################################################################################

but

mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname   ### CRASHED

--------------------------------------------------------------------------
Error, invalid rank (1) in the rankfile (rankfile.0)
--------------------------------------------------------------------------
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 404
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
[r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 985
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node.
You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
orterun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
orterun: clean termination accomplished

It seems that the rankfile option is not propagated to the second command line; there is no global understanding of the ranking inside an mpirun command.

##################################################################################

Assuming that, I tried to provide a rankfile to each command line:

cat rankfile.0
rank 0=r011n002 slot=0

cat rankfile.1
rank 0=r011n003 slot=1

mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname   ### CRASHED

[r011n002:28778] *** Process received signal ***
[r011n002:28778] Signal: Segmentation fault (11)
[r011n002:28778] Signal code: Address not mapped (1)
[r011n002:28778] Failing at address: 0x34
[r011n002:28778] [ 0] [0xffffe600]
[r011n002:28778] [ 1] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x55d) [0x5557decd]
[r011n002:28778] [ 2] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x117) [0x555842a7]
[r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/mca_plm_rsh.so [0x556098c0]
[r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804aa27]
[r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804a022]
[r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc) [0x9f1dec]
[r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x8049f71]
[r011n002:28778] *** End of error message ***
Segmentation fault (core dumped)

I hope that I've found a bug, because it would be very important for me to have this kind of capability: launch a multi-executable mpirun command line and be able to bind my executables and sockets together.
Thanks in advance for your help

Geoffroy

--
Jeff Squyres
Cisco Systems