[OMPI users] Why are the static libs different if compiled with or without dynamic switch?
I am setting up Open MPI 1.8.4. The first time I compiled, I used the following:

  version=1.8.4.I1404211913
  ./configure \
    --disable-vt \
    --prefix=/apps/share/openmpi/$version \
    --disable-shared \
    --enable-static \
    --with-verbs \
    --enable-mpirun-prefix-by-default \
    --with-memory-manager=none \
    --with-hwloc \
    --with-lsf=/apps/share/LSF/9.1.3/9.1 \
    --with-lsf-libdir=/apps/share/LSF/9.1.3/9.1/linux2.6-glibc2.3-x86_64/lib \
    --with-wrapper-cflags="-shared-intel" \
    --with-wrapper-cxxflags="-shared-intel" \
    --with-wrapper-ldflags="-shared-intel" \
    --with-wrapper-fcflags="-shared-intel" \
    --enable-mpi-ext

and after installation I get (as a sample):

  -rw-r--r-- 1 tommy 460g3 6881702 Feb 19 14:58 libmpi.a

The second time I installed, the configure line was the same as above except that I took out the "--disable-shared" option. Again, as a sample:

  -rw-r--r-- 1 tommy 460g3 6641598 Feb 24 13:53 libmpi.a

Can someone tell me why the static libraries come out with different sizes depending on whether or not the dynamic ones are also compiled? It seems to me the static ones should be identical. Is this an issue?

thanks for any info
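ps: to see whether the two libraries actually contain different objects, or the same objects at different sizes, I was planning to compare them roughly like this (only a sketch; the two install prefixes below are placeholders for my static-only and static+shared builds):

  # list the object members of each archive and diff the lists
  ar t /apps/share/openmpi/1.8.4.static-only/lib/libmpi.a   | sort > members1.txt
  ar t /apps/share/openmpi/1.8.4.static+shared/lib/libmpi.a | sort > members2.txt
  diff members1.txt members2.txt

  # if the member lists match, compare defined symbols and their sizes
  nm --defined-only --print-size /apps/share/openmpi/1.8.4.static-only/lib/libmpi.a   > syms1.txt
  nm --defined-only --print-size /apps/share/openmpi/1.8.4.static+shared/lib/libmpi.a > syms2.txt
  diff syms1.txt syms2.txt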
[OMPI users] OpenMPI 1.10.5 oversubscribing cores
I posted this question last year and we ended up not upgrading to the newer Open MPI. Now I need to change to Open MPI 1.10.5 and have the same issue.

Specifically, using 1.4.2 I can run two 12-core jobs on a 24-core node and the processes will bind to cores with only one process per core, i.e. no oversubscription. What I used with 1.4.2 was:

  mpirun --mca mpi_paffinity_alone 1 --mca btl openib,tcp,sm,self ...

Now with 1.10.5 I have tried multiple combinations of --map-by core, --bind-to core, etc. and cannot run two jobs on the same node without oversubscribing cores. Is there a solution to this?

Thanks for any info
tom
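ps: is the expected approach something along these lines, where each job is handed an explicit, disjoint set of cores so the two jobs cannot overlap even though both bind to core? This is only a sketch; the core IDs and ./my_app are just examples, and I have not confirmed that --cpu-set is the intended mechanism here.

  # job 1: 12 ranks restricted to cores 0-11
  mpirun -np 12 --cpu-set 0-11  --bind-to core --report-bindings \
         --mca btl openib,tcp,sm,self ./my_app

  # job 2: 12 ranks restricted to cores 12-23
  mpirun -np 12 --cpu-set 12-23 --bind-to core --report-bindings \
         --mca btl openib,tcp,sm,self ./my_app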
Re: [OMPI users] Can't get OPENMPI to run parallel job with Myrinet/GM
Would you be able to send me the mpirun command and args that you use? How can I get more output to study? I added "--display-map -d -v" to my mpirun command, which gives more output, but not the reason for the failure. (See the PS at the bottom of this message for the sort of thing I was going to try next.)

From: George Bosilca (sent by users-bounces@open-mpi.org)
To: Open MPI Users
cc: t901...@rds4020.akr.goodyear.com
Date: 02/14/2008 10:18 PM
Subject: Re: [OMPI users] Can't get OPENMPI to run parallel job with Myrinet/GM
Please respond to Open MPI Users

I ran full testing of GM with 1.2.5 and with the trunk. Both of them run to completion without any errors. Moreover, the error message only says that one of the processes was terminated, which usually means that something bad happened somewhere else and the runtime decided to terminate the whole job. This might be a segfault or an abort. Without more information it will be difficult to help or to offer any advice.

  george.

On Feb 14, 2008, at 11:15 AM, Tom Wurgler wrote:

> I am trying to use openmpi 1.2.5 (I also tried 1.2.4) to run a parallel job
> using GM drivers. The only message I get is:
>
>   mpirun noticed that job rank 0 with PID 19508 on node node93 exited on
>   signal 15 (Terminated).
>
> I can run serially on one node (4 processors), it just dies when trying to use
> more than one node.
>
> Any help appreciated.
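PS: is something along these lines the right way to get more detail out of mpirun? Just a sketch; the verbosity level, host file, process count and executable name are placeholders.

  mpirun -np 8 --hostfile my_hosts \
         --mca btl gm,sm,self \
         --mca btl_base_verbose 100 \
         --debug-daemons \
         ./my_app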
Re: [OMPI users] Open MPI instructional videos
Jeff,

I started viewing some of these. I think this is great stuff.

thanks!

From: Jeff Squyres (sent by users-bounces@open-mpi.org)
To: Open MPI Users
Date: 05/27/2008 08:20 PM
Subject: Re: [OMPI users] Open MPI instructional videos
Please respond to Open MPI Users

About a dozen people e-mailed me after I sent the first mail asking where the videos were located. :-)

http://www.open-mpi.org/video/

Also, "Videos" is a link on the left-hand side navigation of the Open MPI web site, so there's no need to memorize the link.

On May 27, 2008, at 6:43 PM, Graham Jenkins wrote:

> Jeff Squyres wrote:
>> Over the past year or two, I have been slowly creating a large set of
>> Open MPI training material that I've used to present to my company's
>> customers and partners. I have just recently received permission to
>> release all of my slides to the greater HPC community. Woo hoo!
>
> Great idea Jeff, sounds really useful. But where do I find them?
> --
> Graham Jenkins
> Senior Software Specialist, eResearch
> Monash University (Clayton Campus, Bldg 11, Rm S503)
>
> Email: graham.jenk...@its.monash.edu.au
> Tel: +613 9905-5942 (office)  +614 4850-2491 (mobile)

--
Jeff Squyres
Cisco Systems
[OMPI users] ulimit question from video open-fabrics-concepts...
Hi, I am watching one of your MPI instructional videos and have a question. You said to make sure the registered memory ulimit is set to unlimited. When I type the command "ulimit -a" I don't see a registered memory entry. Is this perhaps the same as "max locked memory"? Or can you tell me where to check, and how to set, registered memory to unlimited?

Thanks!
tom

ps: the videos are very helpful
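pps: for reference, this is what I am checking and setting right now; "max locked memory" is reported in KB by ulimit. The limits.conf lines are only my guess at what the permanent, system-wide fix would look like and would presumably need a sysadmin:

  # show the locked-memory limit for the current shell, then raise it
  ulimit -l
  ulimit -l unlimited

  # guess at the permanent fix, in /etc/security/limits.conf (needs root):
  #   *  soft  memlock  unlimited
  #   *  hard  memlock  unlimited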
[OMPI users] locked memory problem
I get the locked memory error as follows:

--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[node10:10395] [0,0,0]-[0,1,6] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
The OpenIB BTL failed to initialize while trying to allocate some locked
memory. This typically can indicate that the memlock limits are set too
low. For most HPC installations, the memlock limits should be set to
"unlimited". The failure occured here:

  Host:          node10
  OMPI source:   btl_openib.c:830
  Function:      ibv_create_cq()
  Device:        mlx4_0
  Memlock limit: 32768

You may need to consult with your system administrator to get this
problem fixed. This FAQ entry on the Open MPI web site may also be
helpful:

  http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------

I've read the above FAQ and still have problems. Here is the scenario. All cluster nodes are (supposed) to be the same. I can run just fine on all except a few nodes.

For testing, I have closed all the nodes, so when I submit the job, LSF puts it in the PENDING state. Now if I use

  brun -m "node1 node10" jobid

to release the job, it runs fine. But if I use

  brun -m "node10 node1" jobid

it fails with the above Open MPI error.

I've checked ulimit -a on all nodes; max locked memory is set to unlimited. I've added a .bashrc file and set the ulimit in there, as well as in my .cshrc file (I start in a csh shell and the jobs run in sh). I've compared environment settings and everything else I can think of. Three nodes show the (bad) behaviour if they happen to be the lead node and run fine if they are not; the rest of the nodes run fine in either position.

Anyone have any ideas about this?

thanks!
tom
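ps: would checking the limit the way a non-interactive launch sees it be a sensible next step? Something like the loop below (the node names are just examples, and it assumes the remote login shell is sh/bash so that "ulimit" is the right builtin):

  # .bashrc/.cshrc may not be read by non-interactive launches,
  # so ask each node what limit a bare ssh command actually gets
  for n in node1 node10 node93; do
      printf "%s: " "$n"
      ssh "$n" 'ulimit -l'
  done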
[OMPI users] Using OPENMPI configured for MX, GM and OPENIB interconnects
I configure Open MPI (1.3.3, and previous versions as well) so that one executable is able to run on any cluster we have. I used:

  ./configure --with-mx --with-openib --with-gm

At the end of the day, the same executable does run on any of the clusters. The question I have is: when, for example, I run on an IB cluster, I get warning messages about not finding GM NICs and that another transport will be used, and warnings about mca btl mx components not being found, etc. It DOES run over IB, but it never says that in the output.

What I'd like is to not get warnings about interconnects the cluster in question doesn't have, and instead get a note saying that it found IB.

Is this already possible? Or can I at least suppress the warnings for the not-found interconnects?

thanks!
tom
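ps: if there is an MCA parameter for suppressing these, is it btl_base_warn_component_unused? i.e. something like the lines below (just a guess on my part; I have not verified that it silences every message I am seeing, and the -np count and ./my_app are placeholders):

  # try to silence the "NIC not found / component unused" warnings
  mpirun --mca btl_base_warn_component_unused 0 \
         --mca btl openib,mx,gm,tcp,sm,self -np 16 ./my_app

  # and turn up BTL verbosity to see which interconnect was actually selected
  mpirun --mca btl_base_verbose 30 \
         --mca btl openib,mx,gm,tcp,sm,self -np 16 ./my_app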
Re: [OMPI users] Using OPENMPI configured for MX, GM and OPENIB interconnects
I see. My one script for all clusters calls

  mpirun --mca btl openib,mx,gm,tcp,sm,self ...

so I'd need to add some logic above the mpirun line to figure out what cluster I am on and set up the correct mpirun line (a rough sketch of what I mean is in the PS at the bottom of this message). It still seems like I should be able to keep the mpirun line I have and just be told what it found, not what it can't find.

thanks for the workaround...
-tom

From: Scott Atchley (sent by users-bounces@open-mpi.org)
To: Open MPI Users
Date: 08/26/2009 03:57 PM
Subject: Re: [OMPI users] Using OPENMPI configured for MX, GM and OPENIB interconnects
Please respond to Open MPI Users

On Aug 26, 2009, at 3:41 PM, twu...@goodyear.com wrote:

> When, for example, I run on an IB cluster, I get warning messages about not
> finding GM NICS and another transport will be used etc.
> And warnings about mca btl mx components not found etc. It DOES run the
> IB, but it never says that in the output.
>
> What I'd like is to not get warnings about what I don't have on the cluster
> in question and instead get a note that says it found the IB.
>
> Is this already possible?
>
> Or can I at least suppress the warnings for the not-found interconnects?
>
> thanks!
> tom

You can use:

  $ mpirun -mca pml ob1 -mca btl mx,sm,self ...

when running with MX. Substitute gm or openib when running on those networks. It may still fail over to TCP. To avoid that, you could run:

  $ mpirun -mca pml ob1 -mca btl ^mx,openib,tcp ...

to tell it to run on anything (GM, shared memory and self) except MX, OpenIB, and TCP. You probably do not need -mca pml ob1, but that will prevent the MX MTL from trying to start as well.

Scott
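PS: the per-cluster logic I mentioned above would presumably look something like this. The device probes are guesses on my part (a real script would test whatever reliably distinguishes our clusters), and NPROCS and ./my_app are placeholders:

  #!/bin/sh
  # pick a BTL list based on which interconnect this node appears to have
  if [ -d /sys/class/infiniband ]; then
      BTLS="openib,sm,self"        # InfiniBand
  elif [ -c /dev/myri0 ]; then
      BTLS="mx,sm,self"            # Myrinet MX (device name is a guess)
  elif [ -c /dev/gm0 ]; then
      BTLS="gm,sm,self"            # Myrinet GM (device name is a guess)
  else
      BTLS="tcp,sm,self"           # fall back to TCP
  fi

  mpirun --mca btl "$BTLS" -np "$NPROCS" ./my_app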
[OMPI users] Openmpi 1.8.8 and affinity
In the past (v1.6.4 and earlier) we used mpirun args of

  --mca mpi_paffinity_alone 1 --mca btl openib,tcp,sm,self

with LSF 7.0.6, and this was enough to keep cores from being oversubscribed when submitting two or more jobs to the same node. Now I am using 1.8.8 and so far don't have the right combination of args to make sure cores don't get oversubscribed. I am currently using:

  -report-bindings --map-by core --bind-to core --nooversubscribe --mca btl openib,tcp,sm,self

Do I have the incorrect options, or am I missing an option?

Thanks for any info
tom