Hi Ralph,

Thanks for your extra tests. Before leaving, I just spotted a problem that comes from running PLPA across different RHEL distributions (i.e. different Linux kernels). Indeed, I configure and compile Open MPI on rhel4, then I run it on rhel5. I think my problem comes from this mismatch. I'll do a few more tests tomorrow morning (France time) and keep you informed.
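In the meantime, here is the kind of check I plan to run tomorrow on the rhel5 node, to see whether the binding request actually reaches the kernel (just a sketch -- hostf, rankf and a.out are the same as in my previous mail, and <pid> is whatever ps reports for a.out):

r011n006% mpirun --hostfile hostf --rankfile rankf --wdir /tmp -n 1 a.out &
r011n006% ps axo user,psr,pid,comm | grep gpignot      # which CPU the process is currently running on
r011n006% taskset -pc <pid>                            # the affinity mask actually set, e.g. "pid 9271's current affinity list: 0"

If taskset reports the full mask (0-3) instead of the single CPU requested in the rankfile, then presumably sched_setaffinity was never applied, and the rhel4-build / rhel5-run mismatch becomes a good suspect.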
Regards Geoffroy > > > Message: 2 > Date: Mon, 4 May 2009 13:34:40 -0600 > From: Ralph Castain <r...@open-mpi.org> > Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ?? > To: Open MPI Users <us...@open-mpi.org> > Message-ID: > <71d2d8cc0905041234m76eb5a9dx57a773997779d...@mail.gmail.com> > Content-Type: text/plain; charset="iso-8859-1" > > Hmmm...I'm afraid I can't replicate the problem. All seems to be working > just fine on the RHEL systems available to me. The procs indeed bind to the > specified processors in every case. > > rhc@odin ~/trunk]$ cat rankfile > rank 0=odin001 slot=0 > rank 1=odin002 slot=1 > > [rhc@odin mpi]$ mpirun -rf ../../../rankfile -n 2 --leave-session-attached > -mca paffinity_base_verbose 5 ./mpi_spin > [odin001.cs.indiana.edu:09297 <http://odin001.cs.indiana.edu:9297/>] > paffinity slot assignment: slot_list == 0 > [odin001.cs.indiana.edu:09297 <http://odin001.cs.indiana.edu:9297/>] > paffinity slot assignment: rank 0 runs on cpu #0 (#0) > [odin002.cs.indiana.edu:13566] paffinity slot assignment: slot_list == 1 > [odin002.cs.indiana.edu:13566] paffinity slot assignment: rank 1 runs on > cpu > #1 (#1) > > Suspended > [rhc@odin mpi]$ ssh odin001 > [rhc@odin001 ~]$ ps axo stat,user,psr,pid,pcpu,comm | grep rhc > S rhc 0 9296 0.0 orted > RLl rhc 0 9297 100 mpi_spin > > [rhc@odin mpi]$ ssh odin002 > [rhc@odin002 ~]$ ps axo stat,user,psr,pid,pcpu,comm | grep rhc > S rhc 0 13562 0.0 orted > RLl rhc 1 13566 102 mpi_spin > > > Not sure where to go from here...perhaps someone else can spot the problem? > Ralph > > > On Mon, May 4, 2009 at 8:28 AM, Ralph Castain <r...@open-mpi.org> wrote: > > > Unfortunately, I didn't write any of that code - I was just fixing the > > mapper so it would properly map the procs. From what I can tell, the > proper > > things are happening there. > > > > I'll have to dig into the code that specifically deals with parsing the > > results to bind the processes. Afraid that will take awhile longer - > pretty > > dark in that hole. > > > > > > > > On Mon, May 4, 2009 at 8:04 AM, Geoffroy Pignot <geopig...@gmail.com > >wrote: > > > >> Hi, > >> > >> So, there are no more crashes with my "crazy" mpirun command. But the > >> paffinity feature seems to be broken. Indeed I am not able to pin my > >> processes. > >> > >> Simple test with a program using your plpa library : > >> > >> r011n006% cat hostf > >> r011n006 slots=4 > >> > >> r011n006% cat rankf > >> rank 0=r011n006 slot=0 ----> bind to CPU 0 , exact ? > >> > >> r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf > --rankfile > >> rankf --wdir /tmp -n 1 a.out > >> >>> PLPA Number of processors online: 4 > >> >>> PLPA Number of processor sockets: 2 > >> >>> PLPA Socket 0 (ID 0): 2 cores > >> >>> PLPA Socket 1 (ID 3): 2 cores > >> > >> Ctrl+Z > >> r011n006%bg > >> > >> r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot > >> R+ gpignot 3 9271 97.8 a.out > >> > >> In fact whatever the slot number I put in my rankfile , a.out always > runs > >> on the CPU 3. I was looking for it on CPU 0 accordind to my cpuinfo file > >> (see below) > >> The result is the same if I try another syntax (rank 0=r011n006 slot=0:0 > >> bind to socket 0 - core 0 , exact ? 
) > >> > >> Thanks in advance > >> > >> Geoffroy > >> > >> PS: I run on rhel5 > >> > >> r011n006% uname -a > >> Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39 CDT > >> 2008 x86_64 x86_64 x86_64 GNU/Linux > >> > >> My configure is : > >> ./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64' > >> --disable-dlopen --disable-mpi-cxx --enable-heterogeneous > >> > >> > >> r011n006% cat /proc/cpuinfo > >> processor : 0 > >> vendor_id : GenuineIntel > >> cpu family : 6 > >> model : 15 > >> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz > >> stepping : 6 > >> cpu MHz : 2660.007 > >> cache size : 4096 KB > >> physical id : 0 > >> siblings : 2 > >> core id : 0 > >> cpu cores : 2 > >> fpu : yes > >> fpu_exception : yes > >> cpuid level : 10 > >> wp : yes > >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > mca > >> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm > >> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > >> bogomips : 5323.68 > >> clflush size : 64 > >> cache_alignment : 64 > >> address sizes : 36 bits physical, 48 bits virtual > >> power management: > >> > >> processor : 1 > >> vendor_id : GenuineIntel > >> cpu family : 6 > >> model : 15 > >> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz > >> stepping : 6 > >> cpu MHz : 2660.007 > >> cache size : 4096 KB > >> physical id : 3 > >> siblings : 2 > >> core id : 0 > >> cpu cores : 2 > >> fpu : yes > >> fpu_exception : yes > >> cpuid level : 10 > >> wp : yes > >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > mca > >> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm > >> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > >> bogomips : 5320.03 > >> clflush size : 64 > >> cache_alignment : 64 > >> address sizes : 36 bits physical, 48 bits virtual > >> power management: > >> > >> processor : 2 > >> vendor_id : GenuineIntel > >> cpu family : 6 > >> model : 15 > >> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz > >> stepping : 6 > >> cpu MHz : 2660.007 > >> cache size : 4096 KB > >> physical id : 0 > >> siblings : 2 > >> core id : 1 > >> cpu cores : 2 > >> fpu : yes > >> fpu_exception : yes > >> cpuid level : 10 > >> wp : yes > >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > mca > >> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm > >> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > >> bogomips : 5319.39 > >> clflush size : 64 > >> cache_alignment : 64 > >> address sizes : 36 bits physical, 48 bits virtual > >> power management: > >> > >> processor : 3 > >> vendor_id : GenuineIntel > >> cpu family : 6 > >> model : 15 > >> model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz > >> stepping : 6 > >> cpu MHz : 2660.007 > >> cache size : 4096 KB > >> physical id : 3 > >> siblings : 2 > >> core id : 1 > >> cpu cores : 2 > >> fpu : yes > >> fpu_exception : yes > >> cpuid level : 10 > >> wp : yes > >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > mca > >> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm > >> constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm > >> bogomips : 5320.03 > >> clflush size : 64 > >> cache_alignment : 64 > >> address sizes : 36 bits physical, 48 bits virtual > >> power management: > >> > >> > >>> ------------------------------ > >>> > >>> Message: 2 > >>> Date: Mon, 4 May 2009 04:45:57 -0600 > >>> From: Ralph Castain <r...@open-mpi.org> > >>> Subject: Re: [OMPI users] 1.3.1 -rf 
rankfile behaviour ?? > >>> To: Open MPI Users <us...@open-mpi.org> > >>> Message-ID: <d01d7b16-4b47-46f3-ad41-d1a90b2e4...@open-mpi.org> > >>> > >>> Content-Type: text/plain; charset="us-ascii"; Format="flowed"; > >>> DelSp="yes" > >>> > >>> My apologies - I wasn't clear enough. You need a tarball from r21111 > >>> or greater...such as: > >>> > >>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz > >>> > >>> HTH > >>> Ralph > >>> > >>> > >>> On May 4, 2009, at 2:14 AM, Geoffroy Pignot wrote: > >>> > >>> > Hi , > >>> > > >>> > I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately my > >>> > command doesn't work > >>> > > >>> > cat rankf: > >>> > rank 0=node1 slot=* > >>> > rank 1=node2 slot=* > >>> > > >>> > cat hostf: > >>> > node1 slots=2 > >>> > node2 slots=2 > >>> > > >>> > mpirun --rankfile rankf --hostfile hostf --host node1 -n 1 > >>> > hostname : --host node2 -n 1 hostname > >>> > > >>> > Error, invalid rank (1) in the rankfile (rankf) > >>> > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file > >>> > rmaps_rank_file.c at line 403 > >>> > [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file > >>> > base/rmaps_base_map_job.c at line 86 > >>> > [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file > >>> > base/plm_base_launch_support.c at line 86 > >>> > [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file > >>> > plm_rsh_module.c at line 1016 > >>> > > >>> > > >>> > Ralph, could you tell me if my command syntax is correct or not ? if > >>> > not, give me the expected one ? > >>> > > >>> > Regards > >>> > > >>> > Geoffroy > >>> > > >>> > > >>> > > >>> > > >>> > 2009/4/30 Geoffroy Pignot <geopig...@gmail.com> > >>> > Immediately Sir !!! :) > >>> > > >>> > Thanks again Ralph > >>> > > >>> > Geoffroy > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > ------------------------------ > >>> > > >>> > Message: 2 > >>> > Date: Thu, 30 Apr 2009 06:45:39 -0600 > >>> > From: Ralph Castain <r...@open-mpi.org> > >>> > Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ?? > >>> > To: Open MPI Users <us...@open-mpi.org> > >>> > Message-ID: > >>> > <71d2d8cc0904300545v61a42fe1k50086d2704d0f...@mail.gmail.com> > >>> > Content-Type: text/plain; charset="iso-8859-1" > >>> > > >>> > I believe this is fixed now in our development trunk - you can > >>> > download any > >>> > tarball starting from last night and give it a try, if you like. Any > >>> > feedback would be appreciated. > >>> > > >>> > Ralph > >>> > > >>> > > >>> > On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote: > >>> > > >>> > Ah now, I didn't say it -worked-, did I? :-) > >>> > > >>> > Clearly a bug exists in the program. I'll try to take a look at it > >>> > (if Lenny > >>> > doesn't get to it first), but it won't be until later in the week. 
> >>> > > >>> > On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote: > >>> > > >>> > I agree with you Ralph , and that 's what I expect from openmpi but > my > >>> > second example shows that it's not working > >>> > > >>> > cat hostfile.0 > >>> > r011n002 slots=4 > >>> > r011n003 slots=4 > >>> > > >>> > cat rankfile.0 > >>> > rank 0=r011n002 slot=0 > >>> > rank 1=r011n003 slot=1 > >>> > > >>> > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 > >>> > hostname > >>> > ### CRASHED > >>> > > >>> > > > Error, invalid rank (1) in the rankfile (rankfile.0) > >>> > > > > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in > >>> > file > >>> > > > rmaps_rank_file.c at line 404 > >>> > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in > >>> > file > >>> > > > base/rmaps_base_map_job.c at line 87 > >>> > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in > >>> > file > >>> > > > base/plm_base_launch_support.c at line 77 > >>> > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in > >>> > file > >>> > > > plm_rsh_module.c at line 985 > >>> > > > > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > A daemon (pid unknown) died unexpectedly on signal 1 while > >>> > > attempting to > >>> > > > launch so we are aborting. > >>> > > > > >>> > > > There may be more information reported by the environment (see > >>> > > above). > >>> > > > > >>> > > > This may be because the daemon was unable to find all the needed > >>> > > shared > >>> > > > libraries on the remote node. You may set your LD_LIBRARY_PATH to > >>> > > have the > >>> > > > location of the shared libraries on the remote nodes and this > will > >>> > > > automatically be forwarded to the remote nodes. > >>> > > > > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > orterun noticed that the job aborted, but has no info as to the > >>> > > process > >>> > > > that caused that situation. > >>> > > > > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > orterun: clean termination accomplished > >>> > > >>> > > >>> > > >>> > Message: 4 > >>> > Date: Tue, 14 Apr 2009 06:55:58 -0600 > >>> > From: Ralph Castain <r...@lanl.gov> > >>> > Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ?? > >>> > To: Open MPI Users <us...@open-mpi.org> > >>> > Message-ID: <f6290ada-a196-43f0-a853-cbcb802d8...@lanl.gov> > >>> > Content-Type: text/plain; charset="us-ascii"; Format="flowed"; > >>> > DelSp="yes" > >>> > > >>> > The rankfile cuts across the entire job - it isn't applied on an > >>> > app_context basis. So the ranks in your rankfile must correspond to > >>> > the eventual rank of each process in the cmd line. > >>> > > >>> > Unfortunately, that means you have to count ranks. In your case, you > >>> > only have four, so that makes life easier. 
Your rankfile would look > >>> > something like this: > >>> > > >>> > rank 0=r001n001 slot=0 > >>> > rank 1=r001n002 slot=1 > >>> > rank 2=r001n001 slot=1 > >>> > rank 3=r001n002 slot=2 > >>> > > >>> > HTH > >>> > Ralph > >>> > > >>> > On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote: > >>> > > >>> > > Hi, > >>> > > > >>> > > I agree that my examples are not very clear. What I want to do is > to > >>> > > launch a multiexes application (masters-slaves) and benefit from > the > >>> > > processor affinity. > >>> > > Could you show me how to convert this command , using -rf option > >>> > > (whatever the affinity is) > >>> > > > >>> > > mpirun -n 1 -host r001n001 master.x options1 : -n 1 -host r001n002 > >>> > > master.x options2 : -n 1 -host r001n001 slave.x options3 : -n 1 - > >>> > > host r001n002 slave.x options4 > >>> > > > >>> > > Thanks for your help > >>> > > > >>> > > Geoffroy > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > Message: 2 > >>> > > Date: Sun, 12 Apr 2009 18:26:35 +0300 > >>> > > From: Lenny Verkhovsky <lenny.verkhov...@gmail.com> > >>> > > Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ?? > >>> > > To: Open MPI Users <us...@open-mpi.org> > >>> > > Message-ID: > >>> > > < > 453d39990904120826t2e1d1d33l7bb1fe3de65b5...@mail.gmail.com> > >>> > > Content-Type: text/plain; charset="iso-8859-1" > >>> > > > >>> > > Hi, > >>> > > > >>> > > The first "crash" is OK, since your rankfile has ranks 0 and 1 > >>> > > defined, > >>> > > while n=1, which means only rank 0 is present and can be allocated. > >>> > > > >>> > > NP must be >= the largest rank in rankfile. > >>> > > > >>> > > What exactly are you trying to do ? > >>> > > > >>> > > I tried to recreate your seqv but all I got was > >>> > > > >>> > > ~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun --hostfile > >>> > > hostfile.0 > >>> > > -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname > >>> > > [witch19:30798] mca: base: component_find: paffinity > >>> > > "mca_paffinity_linux" > >>> > > uses an MCA interface that is not recognized (component MCA > >>> > v1.0.0 != > >>> > > supported MCA v2.0.0) -- ignored > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > It looks like opal_init failed for some reason; your parallel > >>> > > process is > >>> > > likely to abort. There are many reasons that a parallel process can > >>> > > fail during opal_init; some of which are due to configuration or > >>> > > environment problems. This failure appears to be an internal > >>> > failure; > >>> > > here's some additional information (which may only be relevant to > an > >>> > > Open MPI developer): > >>> > > > >>> > > opal_carto_base_select failed > >>> > > --> Returned value -13 instead of OPAL_SUCCESS > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in > >>> > file > >>> > > ../../orte/runtime/orte_init.c at line 78 > >>> > > [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in > >>> > file > >>> > > ../../orte/orted/orted_main.c at line 344 > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > A daemon (pid 11629) died unexpectedly with status 243 while > >>> > > attempting > >>> > > to launch so we are aborting. > >>> > > > >>> > > There may be more information reported by the environment (see > >>> > above). 
> >>> > > > >>> > > This may be because the daemon was unable to find all the needed > >>> > > shared > >>> > > libraries on the remote node. You may set your LD_LIBRARY_PATH to > >>> > > have the > >>> > > location of the shared libraries on the remote nodes and this will > >>> > > automatically be forwarded to the remote nodes. > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > mpirun noticed that the job aborted, but has no info as to the > >>> > process > >>> > > that caused that situation. > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > mpirun: clean termination accomplished > >>> > > > >>> > > > >>> > > Lenny. > >>> > > > >>> > > > >>> > > On 4/10/09, Geoffroy Pignot <geopig...@gmail.com> wrote: > >>> > > > > >>> > > > Hi , > >>> > > > > >>> > > > I am currently testing the process affinity capabilities of > >>> > > openmpi and I > >>> > > > would like to know if the rankfile behaviour I will describe > below > >>> > > is normal > >>> > > > or not ? > >>> > > > > >>> > > > cat hostfile.0 > >>> > > > r011n002 slots=4 > >>> > > > r011n003 slots=4 > >>> > > > > >>> > > > cat rankfile.0 > >>> > > > rank 0=r011n002 slot=0 > >>> > > > rank 1=r011n003 slot=1 > >>> > > > > >>> > > > > >>> > > > > >>> > > > >>> > > >>> > ################################################################################## > >>> > > > > >>> > > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname ### OK > >>> > > > r011n002 > >>> > > > r011n003 > >>> > > > > >>> > > > > >>> > > > > >>> > > > >>> > > >>> > ################################################################################## > >>> > > > but > >>> > > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 > >>> > > hostname > >>> > > > ### CRASHED > >>> > > > * > >>> > > > > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > Error, invalid rank (1) in the rankfile (rankfile.0) > >>> > > > > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in > >>> > file > >>> > > > rmaps_rank_file.c at line 404 > >>> > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in > >>> > file > >>> > > > base/rmaps_base_map_job.c at line 87 > >>> > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in > >>> > file > >>> > > > base/plm_base_launch_support.c at line 77 > >>> > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in > >>> > file > >>> > > > plm_rsh_module.c at line 985 > >>> > > > > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > A daemon (pid unknown) died unexpectedly on signal 1 while > >>> > > attempting to > >>> > > > launch so we are aborting. > >>> > > > > >>> > > > There may be more information reported by the environment (see > >>> > > above). > >>> > > > > >>> > > > This may be because the daemon was unable to find all the needed > >>> > > shared > >>> > > > libraries on the remote node. You may set your LD_LIBRARY_PATH to > >>> > > have the > >>> > > > location of the shared libraries on the remote nodes and this > will > >>> > > > automatically be forwarded to the remote nodes. 
> >>> > > > > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > orterun noticed that the job aborted, but has no info as to the > >>> > > process > >>> > > > that caused that situation. > >>> > > > > >>> > > > >>> > > >>> > -------------------------------------------------------------------------- > >>> > > > orterun: clean termination accomplished > >>> > > > * > >>> > > > It seems that the rankfile option is not propagted to the second > >>> > > command > >>> > > > line ; there is no global understanding of the ranking inside a > >>> > > mpirun > >>> > > > command. > >>> > > > > >>> > > > > >>> > > > > >>> > > > >>> > > >>> > ################################################################################## > >>> > > > > >>> > > > Assuming that , I tried to provide a rankfile to each command > >>> > line: > >>> > > > > >>> > > > cat rankfile.0 > >>> > > > rank 0=r011n002 slot=0 > >>> > > > > >>> > > > cat rankfile.1 > >>> > > > rank 0=r011n003 slot=1 > >>> > > > > >>> > > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf > >>> > > rankfile.1 > >>> > > > -n 1 hostname ### CRASHED > >>> > > > *[r011n002:28778] *** Process received signal *** > >>> > > > [r011n002:28778] Signal: Segmentation fault (11) > >>> > > > [r011n002:28778] Signal code: Address not mapped (1) > >>> > > > [r011n002:28778] Failing at address: 0x34 > >>> > > > [r011n002:28778] [ 0] [0xffffe600] > >>> > > > [r011n002:28778] [ 1] > >>> > > > /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so. > >>> > > 0(orte_odls_base_default_get_add_procs_data+0x55d) > >>> > > > [0x5557decd] > >>> > > > [r011n002:28778] [ 2] > >>> > > > /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so. > >>> > > 0(orte_plm_base_launch_apps+0x117) > >>> > > > [0x555842a7] > >>> > > > [r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/ > >>> > > mca_plm_rsh.so > >>> > > > [0x556098c0] > >>> > > > [r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun > >>> > > [0x804aa27] > >>> > > > [r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun > >>> > > [0x804a022] > >>> > > > [r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc) > >>> > > [0x9f1dec] > >>> > > > [r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun > >>> > > [0x8049f71] > >>> > > > [r011n002:28778] *** End of error message *** > >>> > > > Segmentation fault (core dumped)* > >>> > > > > >>> > > > > >>> > > > > >>> > > > I hope that I've found a bug because it would be very important > >>> > > for me to > >>> > > > have this kind of capabiliy . > >>> > > > Launch a multiexe mpirun command line and be able to bind my exes > >>> > > and > >>> > > > sockets together. 
> >>> > > > > >>> > > > Thanks in advance for your help > >>> > > > > >>> > > > Geoffroy