No, it's not working as I expect, unless I'm expecting something wrong. (Sorry for the long PATH, I needed to provide it.)
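Just to make my expectation concrete, this is the kind of appfile I am assuming should also place one proc per host (only a guess on my side - I am assuming -H is accepted on each appfile line the same way it is on the mpirun command line):

$cat appfile
-np 1 -H witch1 hostname
-np 1 -H witch2 hostname
$mpirun -app appfile

I would expect that to print witch1 and witch2, the same as the plain hostname run below.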
$LD_LIBRARY_PATH=/hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/lib/ /hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun -np 2 -H witch1,witch2 hostname
witch1
witch2

$LD_LIBRARY_PATH=/hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/lib/ /hpc/home/USERS/lennyb/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun -np 2 -H witch1,witch2 -app appfile
dellix7
dellix7

$cat appfile
-np 1 hostname
-np 1 hostname

On Tue, Jul 14, 2009 at 7:08 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Run it without the appfile, just putting the apps on the cmd line - does it
> work right then?
>
> On Jul 14, 2009, at 10:04 AM, Lenny Verkhovsky wrote:
>
> additional info
> I am running mpirun on hostA, and providing a hostlist with hostB and hostC.
> I expect that each application would run on hostB and hostC, but I get all
> of them running on hostA.
> dellix7$cat appfile
> -np 1 hostname
> -np 1 hostname
> dellix7$mpirun -np 2 -H witch1,witch2 -app appfile
> dellix7
> dellix7
> Thanks
> Lenny.
>
> On Tue, Jul 14, 2009 at 4:59 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Strange - let me have a look at it later today. Probably something simple
>> that another pair of eyes might spot.
>>
>> On Jul 14, 2009, at 7:43 AM, Lenny Verkhovsky wrote:
>>
>> Seems like a connected problem:
>> I can't use a rankfile with an appfile, even after all those fixes (working
>> with trunk 1.4a1r21657).
>> This is my case:
>>
>> $cat rankfile
>> rank 0=+n1 slot=0
>> rank 1=+n0 slot=0
>> $cat appfile
>> -np 1 hostname
>> -np 1 hostname
>> $mpirun -np 2 -H witch1,witch2 -rf rankfile -app appfile
>> --------------------------------------------------------------------------
>> Rankfile claimed host +n1 by index that is bigger than number of allocated
>> hosts.
>> --------------------------------------------------------------------------
>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>> ../../../../../orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 422
>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>> ../../../../orte/mca/rmaps/base/rmaps_base_map_job.c at line 85
>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>> ../../../../orte/mca/plm/base/plm_base_launch_support.c at line 103
>> [dellix7:13414] [[10851,0],0] ORTE_ERROR_LOG: Bad parameter in file
>> ../../../../../orte/mca/plm/rsh/plm_rsh_module.c at line 1001
>>
>> The problem is that the rankfile mapper tries to find an appropriate host
>> in the partial (and not the full) hostlist.
>>
>> Any suggestions how to fix it?
>>
>> Thanks
>> Lenny.
>>
>> On Wed, May 13, 2009 at 1:55 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Okay, I fixed this today too....r21219
>>>
>>> On May 11, 2009, at 11:27 PM, Anton Starikov wrote:
>>>
>>>> Now there is another problem :)
>>>>
>>>> You can try to oversubscribe a node, at least by one task.
>>>> If your hostfile and rankfile limit you to N procs, you can ask mpirun
>>>> for N+1 and it will not be rejected.
>>>> Although in reality there will be N tasks.
>>>> So, if your hostfile limit is 4, then "mpirun -np 4" and "mpirun -np 5"
>>>> both work, but in both cases there are only 4 tasks. It isn't crucial,
>>>> because there is no real oversubscription, but there is still some bug
>>>> which can affect something in the future.
>>>>
>>>> --
>>>> Anton Starikov.
>>>>
>>>> On May 12, 2009, at 1:45 AM, Ralph Castain wrote:
>>>>
>>>>> This is fixed as of r21208.
>>>>>
>>>>> Thanks for reporting it!
>>>>> Ralph
>>>>>
>>>>> On May 11, 2009, at 12:51 PM, Anton Starikov wrote:
>>>>>
>>>>>> Although removing this check solves the problem of having more slots in
>>>>>> the rankfile than necessary, there is another problem.
>>>>>>
>>>>>> If I set rmaps_base_no_oversubscribe=1 then if, for example:
>>>>>>
>>>>>> hostfile:
>>>>>>
>>>>>> node01
>>>>>> node01
>>>>>> node02
>>>>>> node02
>>>>>>
>>>>>> rankfile:
>>>>>>
>>>>>> rank 0=node01 slot=1
>>>>>> rank 1=node01 slot=0
>>>>>> rank 2=node02 slot=1
>>>>>> rank 3=node02 slot=0
>>>>>>
>>>>>> mpirun -np 4 ./something
>>>>>>
>>>>>> complains with:
>>>>>>
>>>>>> "There are not enough slots available in the system to satisfy the 4 slots
>>>>>> that were requested by the application"
>>>>>>
>>>>>> but "mpirun -np 3 ./something" will work. It works when you ask for 1 CPU
>>>>>> less, and the behavior is the same in every case (shared nodes, non-shared
>>>>>> nodes, multi-node).
>>>>>>
>>>>>> If you switch off rmaps_base_no_oversubscribe, then it works and all
>>>>>> affinities are set as requested in the rankfile; there is no
>>>>>> oversubscription.
>>>>>>
>>>>>> Anton.
>>>>>>
>>>>>> On May 5, 2009, at 3:08 PM, Ralph Castain wrote:
>>>>>>
>>>>>>> Ah - thx for catching that, I'll remove that check. It no longer is
>>>>>>> required.
>>>>>>>
>>>>>>> Thx!
>>>>>>>
>>>>>>> On Tue, May 5, 2009 at 7:04 AM, Lenny Verkhovsky <
>>>>>>> lenny.verkhov...@gmail.com> wrote:
>>>>>>>
>>>>>>> According to the code it does care:
>>>>>>>
>>>>>>> $vi orte/mca/rmaps/rank_file/rmaps_rank_file.c +572
>>>>>>>
>>>>>>> ival = orte_rmaps_rank_file_value.ival;
>>>>>>> if ( ival > (np-1) ) {
>>>>>>>     orte_show_help("help-rmaps_rank_file.txt", "bad-rankfile", true, ival, rankfile);
>>>>>>>     rc = ORTE_ERR_BAD_PARAM;
>>>>>>>     goto unlock;
>>>>>>> }
>>>>>>>
>>>>>>> If I remember correctly, I used an array to map ranks, and since the
>>>>>>> length of the array is NP, the maximum index must be less than np; so if
>>>>>>> you have the number of a rank > NP, you have no place to put it inside
>>>>>>> the array.
>>>>>>>
>>>>>>> "Likewise, if you have more procs than the rankfile specifies, we map
>>>>>>> the additional procs either byslot (default) or bynode (if you specify
>>>>>>> that option). So the rankfile doesn't need to contain an entry for every
>>>>>>> proc." - Correct point.
>>>>>>>
>>>>>>> Lenny.
>>>>>>>
>>>>>>> On 5/5/09, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>
>>>>>>> Sorry Lenny, but that isn't correct. The rankfile mapper doesn't care if
>>>>>>> the rankfile contains additional info - it only maps up to the number of
>>>>>>> processes, and ignores anything beyond that number. So there is no need
>>>>>>> to remove the additional info.
>>>>>>>
>>>>>>> Likewise, if you have more procs than the rankfile specifies, we map
>>>>>>> the additional procs either byslot (default) or bynode (if you specify
>>>>>>> that option). So the rankfile doesn't need to contain an entry for every
>>>>>>> proc.
>>>>>>>
>>>>>>> Just don't want to confuse folks.
>>>>>>> Ralph
>>>>>>>
>>>>>>> On Tue, May 5, 2009 at 5:59 AM, Lenny Verkhovsky <
>>>>>>> lenny.verkhov...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>> the maximum rank number must be less than np.
>>>>>>> If np=1 then there is only rank 0 in the system, so rank 1 is invalid.
>>>>>>> Please remove "rank 1=node2 slot=*" from the rankfile.
>>>>>>> Best regards,
>>>>>>> Lenny.
>>>>>>>
>>>>>>> On Mon, May 4, 2009 at 11:14 AM, Geoffroy Pignot <
>>>>>>> geopig...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately my
>>>>>>> command doesn't work
>>>>>>>
>>>>>>> cat rankf:
>>>>>>> rank 0=node1 slot=*
>>>>>>> rank 1=node2 slot=*
>>>>>>>
>>>>>>> cat hostf:
>>>>>>> node1 slots=2
>>>>>>> node2 slots=2
>>>>>>>
>>>>>>> mpirun --rankfile rankf --hostfile hostf --host node1 -n 1 hostname
>>>>>>> : --host node2 -n 1 hostname
>>>>>>>
>>>>>>> Error, invalid rank (1) in the rankfile (rankf)
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> rmaps_rank_file.c at line 403
>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> base/rmaps_base_map_job.c at line 86
>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> base/plm_base_launch_support.c at line 86
>>>>>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> plm_rsh_module.c at line 1016
>>>>>>>
>>>>>>> Ralph, could you tell me if my command syntax is correct or not? If not,
>>>>>>> give me the expected one?
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Geoffroy
>>>>>>>
>>>>>>> 2009/4/30 Geoffroy Pignot <geopig...@gmail.com>
>>>>>>>
>>>>>>> Immediately Sir !!! :)
>>>>>>>
>>>>>>> Thanks again Ralph
>>>>>>>
>>>>>>> Geoffroy
>>>>>>>
>>>>>>> Date: Thu, 30 Apr 2009 06:45:39 -0600
>>>>>>> From: Ralph Castain <r...@open-mpi.org>
>>>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>>>>
>>>>>>> I believe this is fixed now in our development trunk - you can download
>>>>>>> any tarball starting from last night and give it a try, if you like. Any
>>>>>>> feedback would be appreciated.
>>>>>>>
>>>>>>> Ralph
>>>>>>>
>>>>>>> On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:
>>>>>>>
>>>>>>> Ah now, I didn't say it -worked-, did I? :-)
>>>>>>>
>>>>>>> Clearly a bug exists in the program. I'll try to take a look at it
>>>>>>> (if Lenny doesn't get to it first), but it won't be until later in the week.
>>>>>>>
>>>>>>> On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:
>>>>>>>
>>>>>>> I agree with you Ralph, and that's what I expect from openmpi but my
>>>>>>> second example shows that it's not working
>>>>>>>
>>>>>>> cat hostfile.0
>>>>>>> r011n002 slots=4
>>>>>>> r011n003 slots=4
>>>>>>>
>>>>>>> cat rankfile.0
>>>>>>> rank 0=r011n002 slot=0
>>>>>>> rank 1=r011n003 slot=1
>>>>>>>
>>>>>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname
>>>>>>> ### CRASHED
>>>>>>>
>>>>>>> > > Error, invalid rank (1) in the rankfile (rankfile.0)
>>>>>>> > > --------------------------------------------------------------------------
>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> > > rmaps_rank_file.c at line 404
>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> > > base/rmaps_base_map_job.c at line 87
>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> > > base/plm_base_launch_support.c at line 77
>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> > > plm_rsh_module.c at line 985
>>>>>>> > > --------------------------------------------------------------------------
>>>>>>> > > A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>>>>>>> > > launch so we are aborting.
>>>>>>> > >
>>>>>>> > > There may be more information reported by the environment (see above).
>>>>>>> > >
>>>>>>> > > This may be because the daemon was unable to find all the needed shared
>>>>>>> > > libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>>>>>> > > location of the shared libraries on the remote nodes and this will
>>>>>>> > > automatically be forwarded to the remote nodes.
>>>>>>> > > --------------------------------------------------------------------------
>>>>>>> > > orterun noticed that the job aborted, but has no info as to the process
>>>>>>> > > that caused that situation.
>>>>>>> > > --------------------------------------------------------------------------
>>>>>>> > > orterun: clean termination accomplished
>>>>>>>
>>>>>>> Date: Tue, 14 Apr 2009 06:55:58 -0600
>>>>>>> From: Ralph Castain <r...@lanl.gov>
>>>>>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>>>>
>>>>>>> The rankfile cuts across the entire job - it isn't applied on an
>>>>>>> app_context basis. So the ranks in your rankfile must correspond to
>>>>>>> the eventual rank of each process in the cmd line.
>>>>>>>
>>>>>>> Unfortunately, that means you have to count ranks. In your case, you
>>>>>>> only have four, so that makes life easier. Your rankfile would look
>>>>>>> something like this:
>>>>>>>
>>>>>>> rank 0=r001n001 slot=0
>>>>>>> rank 1=r001n002 slot=1
>>>>>>> rank 2=r001n001 slot=1
>>>>>>> rank 3=r001n002 slot=2
>>>>>>>
>>>>>>> HTH
>>>>>>> Ralph
>>>>>>>
>>>>>>> On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:
>>>>>>>
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > I agree that my examples are not very clear.
>>>>>>> > What I want to do is to launch a multiexes application
>>>>>>> > (masters-slaves) and benefit from the processor affinity.
>>>>>>> > Could you show me how to convert this command, using the -rf option
>>>>>>> > (whatever the affinity is)
>>>>>>> >
>>>>>>> > mpirun -n 1 -host r001n001 master.x options1 : -n 1 -host r001n002
>>>>>>> > master.x options2 : -n 1 -host r001n001 slave.x options3 : -n 1 -host
>>>>>>> > r001n002 slave.x options4
>>>>>>> >
>>>>>>> > Thanks for your help
>>>>>>> >
>>>>>>> > Geoffroy
>>>>>>> >
>>>>>>> > Date: Sun, 12 Apr 2009 18:26:35 +0300
>>>>>>> > From: Lenny Verkhovsky <lenny.verkhov...@gmail.com>
>>>>>>> > Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>>>>> >
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > The first "crash" is OK, since your rankfile has ranks 0 and 1 defined,
>>>>>>> > while n=1, which means only rank 0 is present and can be allocated.
>>>>>>> >
>>>>>>> > NP must be >= the largest rank in rankfile.
>>>>>>> >
>>>>>>> > What exactly are you trying to do?
>>>>>>> >
>>>>>>> > I tried to recreate your segv but all I got was
>>>>>>> >
>>>>>>> > ~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun --hostfile hostfile.0
>>>>>>> > -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
>>>>>>> > [witch19:30798] mca: base: component_find: paffinity "mca_paffinity_linux"
>>>>>>> > uses an MCA interface that is not recognized (component MCA v1.0.0 !=
>>>>>>> > supported MCA v2.0.0) -- ignored
>>>>>>> > --------------------------------------------------------------------------
>>>>>>> > It looks like opal_init failed for some reason; your parallel process is
>>>>>>> > likely to abort. There are many reasons that a parallel process can
>>>>>>> > fail during opal_init; some of which are due to configuration or
>>>>>>> > environment problems. This failure appears to be an internal failure;
>>>>>>> > here's some additional information (which may only be relevant to an
>>>>>>> > Open MPI developer):
>>>>>>> >
>>>>>>> >   opal_carto_base_select failed
>>>>>>> >   --> Returned value -13 instead of OPAL_SUCCESS
>>>>>>> > --------------------------------------------------------------------------
>>>>>>> > [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>>>>>>> > ../../orte/runtime/orte_init.c at line 78
>>>>>>> > [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>>>>>>> > ../../orte/orted/orted_main.c at line 344
>>>>>>> > --------------------------------------------------------------------------
>>>>>>> > A daemon (pid 11629) died unexpectedly with status 243 while attempting
>>>>>>> > to launch so we are aborting.
>>>>>>> >
>>>>>>> > There may be more information reported by the environment (see above).
>>>>>>> >
>>>>>>> > This may be because the daemon was unable to find all the needed shared
>>>>>>> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>>>>>> > location of the shared libraries on the remote nodes and this will
>>>>>>> > automatically be forwarded to the remote nodes.
>>>>>>> > --------------------------------------------------------------------------
>>>>>>> > --------------------------------------------------------------------------
>>>>>>> > mpirun noticed that the job aborted, but has no info as to the process
>>>>>>> > that caused that situation.
>>>>>>> > --------------------------------------------------------------------------
>>>>>>> > mpirun: clean termination accomplished
>>>>>>> >
>>>>>>> > Lenny.
>>>>>>> >
>>>>>>> > On 4/10/09, Geoffroy Pignot <geopig...@gmail.com> wrote:
>>>>>>> > >
>>>>>>> > > Hi,
>>>>>>> > >
>>>>>>> > > I am currently testing the process affinity capabilities of openmpi and I
>>>>>>> > > would like to know if the rankfile behaviour I will describe below is
>>>>>>> > > normal or not?
>>>>>>> > >
>>>>>>> > > cat hostfile.0
>>>>>>> > > r011n002 slots=4
>>>>>>> > > r011n003 slots=4
>>>>>>> > >
>>>>>>> > > cat rankfile.0
>>>>>>> > > rank 0=r011n002 slot=0
>>>>>>> > > rank 1=r011n003 slot=1
>>>>>>> > >
>>>>>>> > > ##################################################################################
>>>>>>> > >
>>>>>>> > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname ### OK
>>>>>>> > > r011n002
>>>>>>> > > r011n003
>>>>>>> > >
>>>>>>> > > ##################################################################################
>>>>>>> > > but
>>>>>>> > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname
>>>>>>> > > ### CRASHED
>>>>>>> > >
>>>>>>> > > --------------------------------------------------------------------------
>>>>>>> > > Error, invalid rank (1) in the rankfile (rankfile.0)
>>>>>>> > > --------------------------------------------------------------------------
>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> > > rmaps_rank_file.c at line 404
>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> > > base/rmaps_base_map_job.c at line 87
>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> > > base/plm_base_launch_support.c at line 77
>>>>>>> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>>>>> > > plm_rsh_module.c at line 985
>>>>>>> > > --------------------------------------------------------------------------
>>>>>>> > > A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>>>>>>> > > launch so we are aborting.
>>>>>>> > >
>>>>>>> > > There may be more information reported by the environment (see above).
>>>>>>> > >
>>>>>>> > > This may be because the daemon was unable to find all the needed shared
>>>>>>> > > libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>>>>>> > > location of the shared libraries on the remote nodes and this will
>>>>>>> > > automatically be forwarded to the remote nodes.
>>>>>>> > > --------------------------------------------------------------------------
>>>>>>> > > --------------------------------------------------------------------------
>>>>>>> > > orterun noticed that the job aborted, but has no info as to the process
>>>>>>> > > that caused that situation.
>>>>>>> > > --------------------------------------------------------------------------
>>>>>>> > > orterun: clean termination accomplished
>>>>>>> > >
>>>>>>> > > It seems that the rankfile option is not propagated to the second command
>>>>>>> > > line; there is no global understanding of the ranking inside an mpirun
>>>>>>> > > command.
>>>>>>> > >
>>>>>>> > > ##################################################################################
>>>>>>> > >
>>>>>>> > > Assuming that, I tried to provide a rankfile to each command line:
>>>>>>> > >
>>>>>>> > > cat rankfile.0
>>>>>>> > > rank 0=r011n002 slot=0
>>>>>>> > >
>>>>>>> > > cat rankfile.1
>>>>>>> > > rank 0=r011n003 slot=1
>>>>>>> > >
>>>>>>> > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1
>>>>>>> > > -n 1 hostname ### CRASHED
>>>>>>> > >
>>>>>>> > > [r011n002:28778] *** Process received signal ***
>>>>>>> > > [r011n002:28778] Signal: Segmentation fault (11)
>>>>>>> > > [r011n002:28778] Signal code: Address not mapped (1)
>>>>>>> > > [r011n002:28778] Failing at address: 0x34
>>>>>>> > > [r011n002:28778] [ 0] [0xffffe600]
>>>>>>> > > [r011n002:28778] [ 1] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x55d) [0x5557decd]
>>>>>>> > > [r011n002:28778] [ 2] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x117) [0x555842a7]
>>>>>>> > > [r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/mca_plm_rsh.so [0x556098c0]
>>>>>>> > > [r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804aa27]
>>>>>>> > > [r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804a022]
>>>>>>> > > [r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc) [0x9f1dec]
>>>>>>> > > [r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x8049f71]
>>>>>>> > > [r011n002:28778] *** End of error message ***
>>>>>>> > > Segmentation fault (core dumped)
>>>>>>> > >
>>>>>>> > > I hope that I've found a bug because it would be very important for me to
>>>>>>> > > have this kind of capability:
>>>>>>> > > launch a multiexe mpirun command line and be able to bind my exes and
>>>>>>> > > sockets together.
>>>>>>> > >
>>>>>>> > > Thanks in advance for your help
>>>>>>> > >
>>>>>>> > > Geoffroy