Open MPI comes pre-installed in Leopard; as Warner noted, since Leopard doesn't ship with a Fortran compiler, the Open MPI that Apple ships has non-functional mpif77 and mpif90 wrapper compilers.

So the Open MPI that you installed manually will use your Fortran compilers, and therefore will have functional mpif77 and mpif90 wrapper compilers. Hence, you need to be sure to use the "right" wrapper compilers. It looks like you specified the full path in ExecPath, so I'm not sure why Xcode wouldn't work with that (as I mentioned, I unfortunately don't use Xcode myself, so I can't say why it fails).
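
For example, a quick sanity check from Terminal would be to confirm which mpif90 each path resolves to and whether the underlying Open MPI was actually built with Fortran support (just a sketch; it assumes your manual install lives under /usr/local):

   # list every mpif90 on the PATH, in search order
   which -a mpif90

   # ask the wrapper what command it would really invoke
   /usr/local/bin/mpif90 --showme

   # check that this Open MPI was built with the F77/F90 bindings
   /usr/local/bin/ompi_info | grep -i fortran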



On May 4, 2009, at 11:53 AM, Vicente wrote:

Yes, I already have the gfortran compiler in /usr/local/bin, the same path
as my mpif90 compiler. But I've seen that when I use the mpif90 in /usr/bin
or in /Developer/usr/bin, it says:

"Unfortunately, this installation of Open MPI was not compiled with
Fortran 90 support.  As such, the mpif90 compiler is non-functional."


That must be the problem; I will have to change the path so that the mpif90
built with the gfortran I have installed is used.
How can I do that? (Sorry, I am a beginner.)

Thanks.
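
(One way to do that, assuming the working wrapper is the one in /usr/local/bin: put that directory ahead of /usr/bin in your PATH, e.g. in ~/.profile, and then check what gets picked up.)

   # prepend /usr/local/bin so its mpif90 is found before the Apple-shipped one
   export PATH=/usr/local/bin:$PATH
   which mpif90        # should now print /usr/local/bin/mpif90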


On May 4, 2009, at 17:38, Warner Yuen wrote:

> Have you installed a Fortran compiler? Mac OS X's developer tools do
> not come with a Fortran compiler, so you'll need to install one if
> you haven't already done so. I routinely use the Intel IFORT
> compilers with success. However, I hear many good things about the
> gfortran compilers on Mac OS X, and you can't beat the price of gfortran!
>
>
> Warner Yuen
> Scientific Computing
> Consulting Engineer
> Apple, Inc.
> email: wy...@apple.com
> Tel: 408.718.2859
>
>
>
>
> On May 4, 2009, at 7:28 AM, users-requ...@open-mpi.org wrote:
>
>>
>> Today's Topics:
>>
>>  1. How do I compile OpenMPI in Xcode 3.1 (Vicente)
>>  2. Re: 1.3.1 -rf rankfile behaviour ?? (Ralph Castain)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Mon, 4 May 2009 16:12:44 +0200
>> From: Vicente <vpui...@gmail.com>
>> Subject: [OMPI users] How do I compile OpenMPI in Xcode 3.1
>> To: us...@open-mpi.org
>> Message-ID: <1c2c0085-940f-43bb-910f-975871ae2...@gmail.com>
>> Content-Type: text/plain; charset="windows-1252"; Format="flowed";
>>      DelSp="yes"
>>
>> Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in
>> Xcode", but it's only for MPICC. I am using MPIF90, so I did the
>> same,
>> but changing MPICC for MPIF90, and also the path, but it did not
>> work.
>>
>> Building target "fortran" of project "fortran" with configuration
>> "Debug"
>>
>>
>> Checking Dependencies
>> Invalid value 'MPIF90' for GCC_VERSION
>>
>>
>> The file "MPIF90.cpcompspec" looks like this:
>>
>> /**
>>         Xcode Compiler Specification for MPIF90
>>
>> */
>>
>> {   Type = Compiler;
>>     Identifier = com.apple.compilers.mpif90;
>>     BasedOn = com.apple.compilers.gcc.4_0;
>>     Name = "MPIF90";
>>     Version = "Default";
>>     Description = "MPI GNU C/C++ Compiler 4.0";
>>     ExecPath = "/usr/local/bin/mpif90";      // This gets converted to the g++ variant automatically
>>     PrecompStyle = pch;
>> }
>>
>> and is located in "/Developer/Library/Xcode/Plug-ins"
>>
>> and when I do mpif90 -v on terminal it works well:
>>
>> Using built-in specs.
>> Target: i386-apple-darwin8.10.1
>> Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure
>> --prefix=/usr/local/gfortran --enable-languages=c,fortran
>> --with-gmp=/tmp/gfortran-20090321/gfortran_libs --enable-bootstrap
>> Thread model: posix
>> gcc version 4.4.0 20090321 (experimental) [trunk revision 144983] (GCC)
>>
>>
>> Any idea??
>>
>> Thanks.
>>
>> Vincent
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Mon, 4 May 2009 08:28:26 -0600
>> From: Ralph Castain <r...@open-mpi.org>
>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>> To: Open MPI Users <us...@open-mpi.org>
>> Message-ID:
>>      <71d2d8cc0905040728h2002f4d7s4c49219eee29e...@mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Unfortunately, I didn't write any of that code - I was just fixing the
>> mapper so it would properly map the procs. From what I can tell, the
>> proper things are happening there.
>>
>> I'll have to dig into the code that specifically deals with parsing the
>> results to bind the processes. Afraid that will take a while longer -
>> pretty dark in that hole.
>>
>>
>> On Mon, May 4, 2009 at 8:04 AM, Geoffroy Pignot
>> <geopig...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> So, there are no more crashes with my "crazy" mpirun command, but the
>>> paffinity feature seems to be broken: I am not able to pin my processes.
>>>
>>> Simple test with a program using your plpa library :
>>>
>>> r011n006% cat hostf
>>> r011n006 slots=4
>>>
>>> r011n006% cat rankf
>>> rank 0=r011n006 slot=0   ----> bind to CPU 0, correct?
>>>
>>> r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf --rankfile rankf --wdir /tmp -n 1 a.out
>>>>>> PLPA Number of processors online: 4
>>>>>> PLPA Number of processor sockets: 2
>>>>>> PLPA Socket 0 (ID 0): 2 cores
>>>>>> PLPA Socket 1 (ID 3): 2 cores
>>>
>>> Ctrl+Z
>>> r011n006%bg
>>>
>>> r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot
>>> R+   gpignot    3  9271 97.8 a.out
>>>
>>> In fact, whatever slot number I put in my rankfile, a.out always runs
>>> on CPU 3. I expected it on CPU 0 according to my cpuinfo file (see
>>> below). The result is the same if I try the other syntax (rank
>>> 0=r011n006 slot=0:0, i.e. bind to socket 0 - core 0, correct?).
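
(A quick way to cross-check the actual binding on the node, assuming util-linux's taskset is available on your RHEL5 box -- just a sketch, reusing the PID from the ps output above:)

   # print the CPU affinity list of the running a.out (PID 9271 in the ps output)
   taskset -cp 9271
   # with "rank 0=r011n006 slot=0" you would expect: pid 9271's current affinity list: 0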
>>>
>>> Thanks in advance
>>>
>>> Geoffroy
>>>
>>> PS: I run on rhel5
>>>
>>> r011n006% uname -a
>>> Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39
>>> CDT 2008
>>> x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> My configure is :
>>> ./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64'
>>> --disable-dlopen --disable-mpi-cxx --enable-heterogeneous
>>>
>>>
>>> r011n006% cat /proc/cpuinfo
>>> processor       : 0
>>> vendor_id       : GenuineIntel
>>> cpu family      : 6
>>> model           : 15
>>> model name      : Intel(R) Xeon(R) CPU            5150  @ 2.66GHz
>>> stepping        : 6
>>> cpu MHz         : 2660.007
>>> cache size      : 4096 KB
>>> physical id     : 0
>>> siblings        : 2
>>> core id         : 0
>>> cpu cores       : 2
>>> fpu             : yes
>>> fpu_exception   : yes
>>> cpuid level     : 10
>>> wp              : yes
>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>>> bogomips        : 5323.68
>>> clflush size    : 64
>>> cache_alignment : 64
>>> address sizes   : 36 bits physical, 48 bits virtual
>>> power management:
>>>
>>> processor       : 1
>>> vendor_id       : GenuineIntel
>>> cpu family      : 6
>>> model           : 15
>>> model name      : Intel(R) Xeon(R) CPU            5150  @ 2.66GHz
>>> stepping        : 6
>>> cpu MHz         : 2660.007
>>> cache size      : 4096 KB
>>> physical id     : 3
>>> siblings        : 2
>>> core id         : 0
>>> cpu cores       : 2
>>> fpu             : yes
>>> fpu_exception   : yes
>>> cpuid level     : 10
>>> wp              : yes
>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>>> bogomips        : 5320.03
>>> clflush size    : 64
>>> cache_alignment : 64
>>> address sizes   : 36 bits physical, 48 bits virtual
>>> power management:
>>>
>>> processor       : 2
>>> vendor_id       : GenuineIntel
>>> cpu family      : 6
>>> model           : 15
>>> model name      : Intel(R) Xeon(R) CPU            5150  @ 2.66GHz
>>> stepping        : 6
>>> cpu MHz         : 2660.007
>>> cache size      : 4096 KB
>>> physical id     : 0
>>> siblings        : 2
>>> core id         : 1
>>> cpu cores       : 2
>>> fpu             : yes
>>> fpu_exception   : yes
>>> cpuid level     : 10
>>> wp              : yes
>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>>> bogomips        : 5319.39
>>> clflush size    : 64
>>> cache_alignment : 64
>>> address sizes   : 36 bits physical, 48 bits virtual
>>> power management:
>>>
>>> processor       : 3
>>> vendor_id       : GenuineIntel
>>> cpu family      : 6
>>> model           : 15
>>> model name      : Intel(R) Xeon(R) CPU            5150  @ 2.66GHz
>>> stepping        : 6
>>> cpu MHz         : 2660.007
>>> cache size      : 4096 KB
>>> physical id     : 3
>>> siblings        : 2
>>> core id         : 1
>>> cpu cores       : 2
>>> fpu             : yes
>>> fpu_exception   : yes
>>> cpuid level     : 10
>>> wp              : yes
>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
>>> bogomips        : 5320.03
>>> clflush size    : 64
>>> cache_alignment : 64
>>> address sizes   : 36 bits physical, 48 bits virtual
>>> power management:
>>>
>>>
>>>> ------------------------------
>>>>
>>>> Message: 2
>>>> Date: Mon, 4 May 2009 04:45:57 -0600
>>>> From: Ralph Castain <r...@open-mpi.org>
>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>> To: Open MPI Users <us...@open-mpi.org>
>>>> Message-ID: <d01d7b16-4b47-46f3-ad41-d1a90b2e4...@open-mpi.org>
>>>>
>>>> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>>>>      DelSp="yes"
>>>>
>>>> My apologies - I wasn't clear enough. You need a tarball from
>>>> r21111
>>>> or greater...such as:
>>>>
>>>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz
>>>>
>>>> HTH
>>>> Ralph
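
(For reference, a minimal sketch of fetching and building that nightly tarball, reusing the configure options Geoffroy posted earlier in this digest; the install prefix and the -j value are just placeholders:)

   wget http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz
   tar xzf openmpi-1.4a1r21142.tar.gz && cd openmpi-1.4a1r21142
   ./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64' \
               --disable-dlopen --disable-mpi-cxx --enable-heterogeneous
   make -j4 && make install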
>>>>
>>>>
>>>> On May 4, 2009, at 2:14 AM, Geoffroy Pignot wrote:
>>>>
>>>>> Hi ,
>>>>>
>>>>> I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately my
>>>>> command doesn't work
>>>>>
>>>>> cat rankf:
>>>>> rank 0=node1 slot=*
>>>>> rank 1=node2 slot=*
>>>>>
>>>>> cat hostf:
>>>>> node1 slots=2
>>>>> node2 slots=2
>>>>>
>>>>> mpirun  --rankfile rankf --hostfile hostf  --host node1 -n 1
>>>>> hostname : --host node2 -n 1 hostname
>>>>>
>>>>> Error, invalid rank (1) in the rankfile (rankf)
>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>> rmaps_rank_file.c at line 403
>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>> base/rmaps_base_map_job.c at line 86
>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>> base/plm_base_launch_support.c at line 86
>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>> plm_rsh_module.c at line 1016
>>>>>
>>>>>
>>>>> Ralph, could you tell me whether my command syntax is correct or not?
>>>>> If not, could you give me the expected one?
>>>>>
>>>>> Regards
>>>>>
>>>>> Geoffroy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2009/4/30 Geoffroy Pignot <geopig...@gmail.com>
>>>>> Immediately Sir !!! :)
>>>>>
>>>>> Thanks again Ralph
>>>>>
>>>>> Geoffroy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Message: 2
>>>>> Date: Thu, 30 Apr 2009 06:45:39 -0600
>>>>> From: Ralph Castain <r...@open-mpi.org>
>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>> To: Open MPI Users <us...@open-mpi.org>
>>>>> Message-ID:
>>>>> <71d2d8cc0904300545v61a42fe1k50086d2704d0f...@mail.gmail.com >
>>>>> Content-Type: text/plain; charset="iso-8859-1"
>>>>>
>>>>> I believe this is fixed now in our development trunk - you can
>>>>> download any
>>>>> tarball starting from last night and give it a try, if you like.
>>>>> Any
>>>>> feedback would be appreciated.
>>>>>
>>>>> Ralph
>>>>>
>>>>>
>>>>> On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:
>>>>>
>>>>> Ah now, I didn't say it -worked-, did I? :-)
>>>>>
>>>>> Clearly a bug exists in the program. I'll try to take a look at it
>>>>> (if Lenny
>>>>> doesn't get to it first), but it won't be until later in the week.
>>>>>
>>>>> On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:
>>>>>
>>>>> I agree with you Ralph, and that's what I expect from Open MPI, but my
>>>>> second example shows that it's not working.
>>>>>
>>>>> cat hostfile.0
>>>>> r011n002 slots=4
>>>>> r011n003 slots=4
>>>>>
>>>>> cat rankfile.0
>>>>>  rank 0=r011n002 slot=0
>>>>>  rank 1=r011n003 slot=1
>>>>>
>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1
>>>>> hostname
>>>>> ### CRASHED
>>>>>
>>>>>>> Error, invalid rank (1) in the rankfile (rankfile.0)
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> rmaps_rank_file.c at line 404
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> base/rmaps_base_map_job.c at line 87
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> base/plm_base_launch_support.c at line 77
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> plm_rsh_module.c at line 985
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> A daemon (pid unknown) died unexpectedly on signal 1  while
>>>>>> attempting to
>>>>>>> launch so we are aborting.
>>>>>>>
>>>>>>> There may be more information reported by the environment (see
>>>>>> above).
>>>>>>>
>>>>>>> This may be because the daemon was unable to find all the needed
>>>>>> shared
>>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH
>>>>>>> to
>>>>>> have the
>>>>>>> location of the shared libraries on the remote nodes and this
>>>>>>> will
>>>>>>> automatically be forwarded to the remote nodes.
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> orterun noticed that the job aborted, but has no info as to the
>>>>>> process
>>>>>>> that caused that situation.
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> orterun: clean termination accomplished
>>>>>
>>>>>
>>>>>
>>>>> Message: 4
>>>>> Date: Tue, 14 Apr 2009 06:55:58 -0600
>>>>> From: Ralph Castain <r...@lanl.gov>
>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>> To: Open MPI Users <us...@open-mpi.org>
>>>>> Message-ID: <f6290ada-a196-43f0-a853-cbcb802d8...@lanl.gov>
>>>>> Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>>>>>     DelSp="yes"
>>>>>
>>>>> The rankfile cuts across the entire job - it isn't applied on an
>>>>> app_context basis. So the ranks in your rankfile must correspond
>>>>> to
>>>>> the eventual rank of each process in the cmd line.
>>>>>
>>>>> Unfortunately, that means you have to count ranks. In your case,
>>>>> you
>>>>> only have four, so that makes life easier. Your rankfile would
>>>>> look
>>>>> something like this:
>>>>>
>>>>> rank 0=r001n001 slot=0
>>>>> rank 1=r001n002 slot=1
>>>>> rank 2=r001n001 slot=1
>>>>> rank 3=r001n002 slot=2
>>>>>
>>>>> HTH
>>>>> Ralph
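
(To make the rank counting concrete -- a sketch assuming the four app contexts from Geoffroy's command quoted below; "myrankfile" is just a placeholder name. MPI_COMM_WORLD ranks are assigned left to right across the whole command line, which is what the rankfile above maps:)

   # rank 0 -> master.x on r001n001    rank 1 -> master.x on r001n002
   # rank 2 -> slave.x  on r001n001    rank 3 -> slave.x  on r001n002
   mpirun -rf myrankfile \
       -n 1 -host r001n001 master.x options1 : \
       -n 1 -host r001n002 master.x options2 : \
       -n 1 -host r001n001 slave.x options3 : \
       -n 1 -host r001n002 slave.x options4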
>>>>>
>>>>> On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I agree that my examples are not very clear. What I want to do
>>>>>> is to
>>>>>> launch a multiexes application (masters-slaves) and benefit
>>>>>> from the
>>>>>> processor affinity.
>>>>>> Could you show me how to convert this command , using -rf option
>>>>>> (whatever the affinity is)
>>>>>>
>>>>>> mpirun -n 1 -host r001n001 master.x options1  : -n 1 -host
>>>>>> r001n002
>>>>>> master.x options2 : -n 1 -host r001n001 slave.x options3 : -n 1 -
>>>>>> host r001n002 slave.x options4
>>>>>>
>>>>>> Thanks for your help
>>>>>>
>>>>>> Geoffroy
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Message: 2
>>>>>> Date: Sun, 12 Apr 2009 18:26:35 +0300
>>>>>> From: Lenny Verkhovsky <lenny.verkhov...@gmail.com>
>>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>>> To: Open MPI Users <us...@open-mpi.org>
>>>>>> Message-ID:
>>>>>>
>>>>>> <453d39990904120826t2e1d1d33l7bb1fe3de65b5...@mail.gmail.com>
>>>>>> Content-Type: text/plain; charset="iso-8859-1"
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The first "crash" is OK, since your rankfile has ranks 0 and 1
>>>>>> defined,
>>>>>> while n=1, which means only rank 0 is present and can be
>>>>>> allocated.
>>>>>>
>>>>>> NP must be >= the largest rank in rankfile.
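
(A concrete illustration of that rule, reusing hostfile.0 and rankfile.0 from Geoffroy's original post quoted further down: with ranks 0 and 1 defined in the rankfile, the command line has to launch at least two processes in total.)

   # rankfile.0 defines ranks 0 and 1 -> the job needs at least 2 processes
   mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname   # ranks 0 and 1 both exist
   mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname   # only rank 0 exists -> "invalid rank (1)" error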
>>>>>>
>>>>>> What exactly are you trying to do ?
>>>>>>
>>>>>> I tried to recreate your seqv but all I got was
>>>>>>
>>>>>> ~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun --hostfile hostfile.0
>>>>>> -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
>>>>>> [witch19:30798] mca: base: component_find: paffinity
>>>>>> "mca_paffinity_linux"
>>>>>> uses an MCA interface that is not recognized (component MCA
>>>>> v1.0.0 !=
>>>>>> supported MCA v2.0.0) -- ignored
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>> It looks like opal_init failed for some reason; your parallel
>>>>>> process is
>>>>>> likely to abort. There are many reasons that a parallel process
>>>>>> can
>>>>>> fail during opal_init; some of which are due to configuration or
>>>>>> environment problems. This failure appears to be an internal
>>>>> failure;
>>>>>> here's some additional information (which may only be relevant
>>>>>> to an
>>>>>> Open MPI developer):
>>>>>>
>>>>>> opal_carto_base_select failed
>>>>>> --> Returned value -13 instead of OPAL_SUCCESS
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>> [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in
>>>>> file
>>>>>> ../../orte/runtime/orte_init.c at line 78
>>>>>> [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in
>>>>> file
>>>>>> ../../orte/orted/orted_main.c at line 344
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>> A daemon (pid 11629) died unexpectedly with status 243 while
>>>>>> attempting
>>>>>> to launch so we are aborting.
>>>>>>
>>>>>> There may be more information reported by the environment (see
>>>>> above).
>>>>>>
>>>>>> This may be because the daemon was unable to find all the needed
>>>>>> shared
>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to
>>>>>> have the
>>>>>> location of the shared libraries on the remote nodes and this
>>>>>> will
>>>>>> automatically be forwarded to the remote nodes.
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>> process
>>>>>> that caused that situation.
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>> mpirun: clean termination accomplished
>>>>>>
>>>>>>
>>>>>> Lenny.
>>>>>>
>>>>>>
>>>>>> On 4/10/09, Geoffroy Pignot <geopig...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi ,
>>>>>>>
>>>>>>> I am currently testing the process affinity capabilities of Open MPI,
>>>>>>> and I would like to know whether the rankfile behaviour I describe
>>>>>>> below is normal or not.
>>>>>>>
>>>>>>> cat hostfile.0
>>>>>>> r011n002 slots=4
>>>>>>> r011n003 slots=4
>>>>>>>
>>>>>>> cat rankfile.0
>>>>>>> rank 0=r011n002 slot=0
>>>>>>> rank 1=r011n003 slot=1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> ##################################################################################
>>>>>>>
>>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2  hostname ###
>>>>>>> OK
>>>>>>> r011n002
>>>>>>> r011n003
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> ##################################################################################
>>>>>>> but
>>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1
>>>>>>> hostname
>>>>>>> ### CRASHED
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> Error, invalid rank (1) in the rankfile (rankfile.0)
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> rmaps_rank_file.c at line 404
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> base/rmaps_base_map_job.c at line 87
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> base/plm_base_launch_support.c at line 77
>>>>>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in
>>>>> file
>>>>>>> plm_rsh_module.c at line 985
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> A daemon (pid unknown) died unexpectedly on signal 1  while
>>>>>> attempting to
>>>>>>> launch so we are aborting.
>>>>>>>
>>>>>>> There may be more information reported by the environment (see
>>>>>> above).
>>>>>>>
>>>>>>> This may be because the daemon was unable to find all the needed
>>>>>> shared
>>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH
>>>>>>> to
>>>>>> have the
>>>>>>> location of the shared libraries on the remote nodes and this
>>>>>>> will
>>>>>>> automatically be forwarded to the remote nodes.
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> orterun noticed that the job aborted, but has no info as to the
>>>>>> process
>>>>>>> that caused that situation.
>>>>>>>
>>>>>>
>>>>>
>>>> --------------------------------------------------------------------------
>>>>>>> orterun: clean termination accomplished
>>>>>>> It seems that the rankfile option is not propagated to the second
>>>>>>> command line; there is no global understanding of the ranking inside
>>>>>>> an mpirun command.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> ##################################################################################
>>>>>>>
>>>>>>> Assuming that, I tried to provide a rankfile to each command line:
>>>>>>>
>>>>>>> cat rankfile.0
>>>>>>> rank 0=r011n002 slot=0
>>>>>>>
>>>>>>> cat rankfile.1
>>>>>>> rank 0=r011n003 slot=1
>>>>>>>
>>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1
>>>>>>> -n 1 hostname ### CRASHED
>>>>>>> [r011n002:28778] *** Process received signal ***
>>>>>>> [r011n002:28778] Signal: Segmentation fault (11)
>>>>>>> [r011n002:28778] Signal code: Address not mapped (1)
>>>>>>> [r011n002:28778] Failing at address: 0x34
>>>>>>> [r011n002:28778] [ 0] [0xffffe600]
>>>>>>> [r011n002:28778] [ 1]
>>>>>>> /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.
>>>>>> 0(orte_odls_base_default_get_add_procs_data+0x55d)
>>>>>>> [0x5557decd]
>>>>>>> [r011n002:28778] [ 2]
>>>>>>> /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.
>>>>>> 0(orte_plm_base_launch_apps+0x117)
>>>>>>> [0x555842a7]
>>>>>>> [r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/
>>>>>> mca_plm_rsh.so
>>>>>>> [0x556098c0]
>>>>>>> [r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun
>>>>>> [0x804aa27]
>>>>>>> [r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun
>>>>>> [0x804a022]
>>>>>>> [r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc)
>>>>>> [0x9f1dec]
>>>>>>> [r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun
>>>>>> [0x8049f71]
>>>>>>> [r011n002:28778] *** End of error message ***
>>>>>>> Segmentation fault (core dumped)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I hope that I've found a bug, because it would be very important for
>>>>>>> me to have this kind of capability: launching a multi-exe mpirun
>>>>>>> command line and being able to bind my executables to sockets.
>>>>>>>
>>>>>>> Thanks in advance for your help
>>>>>>>
>>>>>>> Geoffroy


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems

