Thanks. I am not in a hurry, but it would be nice if I could benefit from this feature in the next release. Regards,
Geoffroy

2009/4/20 <users-requ...@open-mpi.org>
>
> Message: 1
> Date: Mon, 20 Apr 2009 05:59:52 -0600
> From: Ralph Castain <r...@open-mpi.org>
> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>
> Honestly haven't had time to look at it yet - hopefully in the next couple of days...
>
> Sorry for the delay.
>
> On Apr 20, 2009, at 2:58 AM, Geoffroy Pignot wrote:
>
> > Do you have any news about this bug?
> > Thanks,
> > Geoffroy
> >
> > Message: 1
> > Date: Tue, 14 Apr 2009 07:57:44 -0600
> > From: Ralph Castain <r...@lanl.gov>
> > Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
> >
> > Ah now, I didn't say it -worked-, did I? :-)
> >
> > Clearly a bug exists in the program. I'll try to take a look at it (if Lenny doesn't get to it first), but it won't be until later in the week.
> >
> > On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:
> >
> > > I agree with you, Ralph, and that's what I expect from openmpi, but my second example shows that it's not working:
> > >
> > > cat hostfile.0
> > > r011n002 slots=4
> > > r011n003 slots=4
> > >
> > > cat rankfile.0
> > > rank 0=r011n002 slot=0
> > > rank 1=r011n003 slot=1
> > >
> > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname   ### CRASHED
> > >
> > > --------------------------------------------------------------------------
> > > Error, invalid rank (1) in the rankfile (rankfile.0)
> > > --------------------------------------------------------------------------
> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 404
> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
> > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 985
> > > --------------------------------------------------------------------------
> > > A daemon (pid unknown) died unexpectedly on signal 1 while attempting to launch so we are aborting.
> > >
> > > There may be more information reported by the environment (see above).
> > >
> > > This may be because the daemon was unable to find all the needed shared libraries on the remote node.
> > > You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
> > > --------------------------------------------------------------------------
> > > --------------------------------------------------------------------------
> > > orterun noticed that the job aborted, but has no info as to the process that caused that situation.
> > > --------------------------------------------------------------------------
> > > orterun: clean termination accomplished
> >
> > > Message: 4
> > > Date: Tue, 14 Apr 2009 06:55:58 -0600
> > > From: Ralph Castain <r...@lanl.gov>
> > > Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
> > >
> > > The rankfile cuts across the entire job - it isn't applied on an app_context basis. So the ranks in your rankfile must correspond to the eventual rank of each process in the cmd line.
> > >
> > > Unfortunately, that means you have to count ranks. In your case, you only have four, so that makes life easier. Your rankfile would look something like this:
> > >
> > > rank 0=r001n001 slot=0
> > > rank 1=r001n002 slot=1
> > > rank 2=r001n001 slot=1
> > > rank 3=r001n002 slot=2
> > >
> > > HTH
> > > Ralph
> > >
> > > On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:
> > >
> > > > Hi,
> > > >
> > > > I agree that my examples are not very clear. What I want to do is to launch a multi-exe application (masters-slaves) and benefit from processor affinity. Could you show me how to convert this command using the -rf option (whatever the affinity is)?
> > > >
> > > > mpirun -n 1 -host r001n001 master.x options1 : -n 1 -host r001n002 master.x options2 : -n 1 -host r001n001 slave.x options3 : -n 1 -host r001n002 slave.x options4
> > > >
> > > > Thanks for your help,
> > > > Geoffroy
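Putting Ralph's four-rank rankfile together with the command quoted just above, the converted invocation would presumably look something like the sketch below. The hostfile name and its contents are illustrative (any hostfile covering r001n001 and r001n002 should do), the rank numbers follow the left-to-right order of the app contexts, and, per the rest of this thread, 1.3.1 still crashes on rankfile runs with several app contexts, so this shows the intended usage rather than a current workaround.

cat hostfile.all                 # hypothetical hostfile covering both nodes
r001n001 slots=4
r001n002 slots=4

cat rankfile.all                 # the four-rank file from Ralph's reply
rank 0=r001n001 slot=0
rank 1=r001n002 slot=1
rank 2=r001n001 slot=1
rank 3=r001n002 slot=2

mpirun --hostfile hostfile.all -rf rankfile.all \
    -n 1 -host r001n001 master.x options1 : \
    -n 1 -host r001n002 master.x options2 : \
    -n 1 -host r001n001 slave.x options3 : \
    -n 1 -host r001n002 slave.x options4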
> > > > Message: 2
> > > > Date: Sun, 12 Apr 2009 18:26:35 +0300
> > > > From: Lenny Verkhovsky <lenny.verkhov...@gmail.com>
> > > > Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
> > > >
> > > > Hi,
> > > >
> > > > The first "crash" is OK, since your rankfile has ranks 0 and 1 defined while n=1, which means only rank 0 is present and can be allocated. NP must be >= the largest rank in the rankfile.
> > > >
> > > > What exactly are you trying to do?
> > > >
> > > > I tried to recreate your segv but all I got was:
> > > >
> > > > ~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
> > > > [witch19:30798] mca: base: component_find: paffinity "mca_paffinity_linux" uses an MCA interface that is not recognized (component MCA v1.0.0 != supported MCA v2.0.0) -- ignored
> > > > --------------------------------------------------------------------------
> > > > It looks like opal_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):
> > > >
> > > >   opal_carto_base_select failed
> > > >   --> Returned value -13 instead of OPAL_SUCCESS
> > > > --------------------------------------------------------------------------
> > > > [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../orte/runtime/orte_init.c at line 78
> > > > [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../orte/orted/orted_main.c at line 344
> > > > --------------------------------------------------------------------------
> > > > A daemon (pid 11629) died unexpectedly with status 243 while attempting to launch so we are aborting.
> > > >
> > > > There may be more information reported by the environment (see above).
> > > >
> > > > This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
> > > > --------------------------------------------------------------------------
> > > > --------------------------------------------------------------------------
> > > > mpirun noticed that the job aborted, but has no info as to the process that caused that situation.
> > > > --------------------------------------------------------------------------
> > > > mpirun: clean termination accomplished
> > > >
> > > > Lenny.
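To make Lenny's rule concrete (NP must be >= the largest rank in the rankfile): with the two-rank rankfile.0 from the original report, quoted again below, the job as a whole has to contain at least two processes before rank 1 can be mapped. A minimal sketch, reusing those files:

mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname   # enough processes; this is the successful run
                                                            # that prints r011n002 and r011n003 below
mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname   # only rank 0 exists, so rank 1 in rankfile.0 has
                                                            # nothing to map to and the run is presumably
                                                            # rejected ("Error, invalid rank (1) in the rankfile")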
> > > > On 4/10/09, Geoffroy Pignot <geopig...@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I am currently testing the process affinity capabilities of openmpi and I would like to know if the rankfile behaviour I describe below is normal or not.
> > > > >
> > > > > cat hostfile.0
> > > > > r011n002 slots=4
> > > > > r011n003 slots=4
> > > > >
> > > > > cat rankfile.0
> > > > > rank 0=r011n002 slot=0
> > > > > rank 1=r011n003 slot=1
> > > > >
> > > > > ##################################################################################
> > > > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname   ### OK
> > > > > r011n002
> > > > > r011n003
> > > > >
> > > > > ##################################################################################
> > > > > but
> > > > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname   ### CRASHED
> > > > >
> > > > > --------------------------------------------------------------------------
> > > > > Error, invalid rank (1) in the rankfile (rankfile.0)
> > > > > --------------------------------------------------------------------------
> > > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 404
> > > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
> > > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
> > > > > [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 985
> > > > > --------------------------------------------------------------------------
> > > > > A daemon (pid unknown) died unexpectedly on signal 1 while attempting to launch so we are aborting.
> > > > >
> > > > > There may be more information reported by the environment (see above).
> > > > >
> > > > > This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
> > > > > --------------------------------------------------------------------------
> > > > > --------------------------------------------------------------------------
> > > > > orterun noticed that the job aborted, but has no info as to the process that caused that situation.
> > > > > --------------------------------------------------------------------------
> > > > > orterun: clean termination accomplished
> > > > >
> > > > > It seems that the rankfile option is not propagated to the second command line; there is no global understanding of the ranking inside an mpirun command.
> > > > >
> > > > > ##################################################################################
> > > > > Assuming that, I tried to provide a rankfile to each command line:
> > > > >
> > > > > cat rankfile.0
> > > > > rank 0=r011n002 slot=0
> > > > >
> > > > > cat rankfile.1
> > > > > rank 0=r011n003 slot=1
> > > > >
> > > > > mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname   ### CRASHED
> > > > >
> > > > > [r011n002:28778] *** Process received signal ***
> > > > > [r011n002:28778] Signal: Segmentation fault (11)
> > > > > [r011n002:28778] Signal code: Address not mapped (1)
> > > > > [r011n002:28778] Failing at address: 0x34
> > > > > [r011n002:28778] [ 0] [0xffffe600]
> > > > > [r011n002:28778] [ 1] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x55d) [0x5557decd]
> > > > > [r011n002:28778] [ 2] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x117) [0x555842a7]
> > > > > [r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/mca_plm_rsh.so [0x556098c0]
> > > > > [r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804aa27]
> > > > > [r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804a022]
> > > > > [r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc) [0x9f1dec]
> > > > > [r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x8049f71]
> > > > > [r011n002:28778] *** End of error message ***
> > > > > Segmentation fault (core dumped)
> > > > >
> > > > > I hope that I've found a bug, because it would be very important for me to have this kind of capability: launch a multi-exe mpirun command line and be able to bind my exes and sockets together.
> > > > >
> > > > > Thanks in advance for your help,
> > > > > Geoffroy
>
> > Message: 2
> > Date: Tue, 14 Apr 2009 10:30:58 -0400
> > From: Prentice Bisbal <prent...@ias.edu>
> > Subject: Re: [OMPI users] PGI Fortran pthread support
> >
> > Orion,
> >
> > I have no trouble getting thread support during configure with PGI 8.0-3.
> >
> > Are there any other compilers in your path before the PGI compilers?
> > Even if the PGI compilers come first, try specifying the PGI compilers explicitly with these environment variables (bash syntax shown):
> >
> > export CC=pgcc
> > export CXX=pgCC
> > export F77=pgf77
> > export FC=pgf90
> >
> > also check the value of CPPFLAGS and LDFLAGS, and make sure they are correct for your PGI compilers.
> >
> > --
> > Prentice
> >
> > Orion Poplawski wrote:
> > > Seeing the following building openmpi 1.3.1 on CentOS 5.3 with the PGI pgf90 8.0-5 Fortran compiler:
> > >
> > > checking if C compiler and POSIX threads work with -Kthread... no
> > > checking if C compiler and POSIX threads work with -kthread... no
> > > checking if C compiler and POSIX threads work with -pthread... yes
> > > checking if C++ compiler and POSIX threads work with -Kthread... no
> > > checking if C++ compiler and POSIX threads work with -kthread... no
> > > checking if C++ compiler and POSIX threads work with -pthread... yes
> > > checking if F77 compiler and POSIX threads work with -Kthread... no
> > > checking if F77 compiler and POSIX threads work with -kthread... no
> > > checking if F77 compiler and POSIX threads work with -pthread... no
> > > checking if F77 compiler and POSIX threads work with -pthreads... no
> > > checking if F77 compiler and POSIX threads work with -mt... no
> > > checking if F77 compiler and POSIX threads work with -mthreads... no
> > > checking if F77 compiler and POSIX threads work with -lpthreads... no
> > > checking if F77 compiler and POSIX threads work with -llthread... no
> > > checking if F77 compiler and POSIX threads work with -lpthread... no
> > > checking for PTHREAD_MUTEX_ERRORCHECK_NP... yes
> > > checking for PTHREAD_MUTEX_ERRORCHECK... yes
> > > checking for working POSIX threads package... no
> > > checking if C compiler and Solaris threads work... no
> > > checking if C++ compiler and Solaris threads work... no
> > > checking if F77 compiler and Solaris threads work... no
> > > checking for working Solaris threads package... no
> > > checking for type of thread support... none found
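Putting Prentice's suggestion into a complete command sequence, a minimal sketch: the install prefix and log file name are arbitrary examples, and the same variables can equally be passed as arguments on the ./configure line itself.

cd openmpi-1.3.1                                  # unpacked source tree
export CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90        # force the PGI compilers explicitly
./configure --prefix=$HOME/openmpi-1.3.1-pgi 2>&1 | tee configure.log
grep "type of thread support" configure.log       # should no longer report "none found"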