[OMPI users] Why are the static libs different if compiled with or without dynamic switch?

2015-02-24 Thread twurgl

I am setting up Openmpi 1.8.4.  The first time I compiled, I had the following:

version=1.8.4.I1404211913
./configure \
--disable-vt \
--prefix=/apps/share/openmpi/$version \
--disable-shared \
--enable-static \
--with-verbs \
--enable-mpirun-prefix-by-default \
--with-memory-manager=none \
--with-hwloc \
--with-lsf=/apps/share/LSF/9.1.3/9.1 \
--with-lsf-libdir=/apps/share/LSF/9.1.3/9.1/linux2.6-glibc2.3-x86_64/lib \
--with-wrapper-cflags="-shared-intel" \
--with-wrapper-cxxflags="-shared-intel" \
--with-wrapper-ldflags="-shared-intel" \
--with-wrapper-fcflags="-shared-intel" \
--enable-mpi-ext

And when installed I get (as a sample): 

  -rw-r--r-- 1 tommy 460g3 6881702 Feb 19 14:58 libmpi.a

The second time I installed, I used the same configure line as above, but
this time I removed the "--disable-shared" option.

And again, as a sample:

  -rw-r--r-- 1 tommy 460g3 6641598 Feb 24 13:53 libmpi.a

Can someone tell me why the static libraries differ in size depending on
whether the dynamic ones are also built?  It seems to me the static ones
should be identical.  Is this an issue?
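
One way to see where the size difference comes from (just a sketch on my part;
the install paths below are placeholders for the two builds) is to compare the
archive members and defined symbols of the two libmpi.a files:

  # member lists of the two archives (paths are placeholders)
  ar t /apps/share/openmpi/static-only/lib/libmpi.a   | sort > members.static
  ar t /apps/share/openmpi/static-shared/lib/libmpi.a | sort > members.both
  diff members.static members.both

  # defined global symbols
  nm -g --defined-only /apps/share/openmpi/static-only/lib/libmpi.a   | awk 'NF==3 {print $3}' | sort -u > syms.static
  nm -g --defined-only /apps/share/openmpi/static-shared/lib/libmpi.a | awk 'NF==3 {print $3}' | sort -u > syms.both
  diff syms.static syms.both

If the member and symbol lists match, the size difference is presumably in how
the objects were compiled rather than in what ends up in the library.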

thanks for any info


[OMPI users] OpenMPI 1.10.5 oversubscribing cores

2017-09-08 Thread twurgl

I posted this question last year and we ended up not upgrading to the newer
Open MPI.  Now I need to move to Open MPI 1.10.5 and have the same issue.

Specifically, with 1.4.2 I can run two 12-core jobs on a 24-core node and the
processes bind to cores with only one process per core, i.e. they do not
oversubscribe.

What I used with 1.4.2 was:
mpirun --mca mpi_paffinity_alone 1 --mca btl openib,tcp,sm,self ...

Now with 1.10.5, I have tried multiple combinations of --map-by core,
--bind-to core, etc., and cannot run 2 jobs on the same node without
oversubscribing.
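
For what it's worth, what I am after is something like the following sketch,
which gives each 12-core job its own disjoint set of cores (it assumes the
--cpu-set option of the 1.8/1.10 series, a 24-core node, and ./a.out as a
placeholder executable):

  # job 1: its 12 ranks bound to cores 0-11
  mpirun -np 12 --map-by core --bind-to core --cpu-set 0-11  --mca btl openib,tcp,sm,self ./a.out

  # job 2: its 12 ranks bound to cores 12-23
  mpirun -np 12 --map-by core --bind-to core --cpu-set 12-23 --mca btl openib,tcp,sm,self ./a.out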

Is there a solution to this?

Thanks for any info
tom


Re: [OMPI users] Can't get OPENMPI to run parallel job with Myrinet/GM

2008-02-19 Thread twurgl
Would you be able to send me the mpirun command and args that you use?

How can I get more output to study?  I added "--display-map -d -v" to my
mpirun command, which gives more output, but not the reason for the
failure.
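
Would something like the following give more useful detail?  This is just a
guess on my part; I have not confirmed how much btl_base_verbose actually
reports on 1.2.5:

  mpirun --mca btl gm,self --mca btl_base_verbose 30 --display-map -d -v ...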




On 02/14/2008 10:18 PM, George Bosilca wrote:

I ran full testing on GM with 1.2.5 and with the trunk.  Both runs completed
without any errors.

Moreover, the error message only says that one of the processes was
terminated, which usually means that something bad happened somewhere
else and the runtime decided to terminate the whole job.  This might
be a segfault or an abort.  Without more information it will be difficult
to help or to offer any advice.

   george.

On Feb 14, 2008, at 11:15 AM, Tom Wurgler wrote:

>
> I am trying to use openmpi 1.2.5 (I also tried 1.2.4) to run a
> parallel job
> using GM drivers.  The only message I get is:
>
> mpirun noticed that job rank 0 with PID 19508 on node node93 exited on
> signal 15 (Terminated).
>
> I can run serially on one node (4 processors); it just dies when trying
> to use more than one node.
>
> Any help appreciated.
>
>
>




Re: [OMPI users] Open MPI instructional videos

2008-05-28 Thread twurgl
Jeff,

I started viewing some of these.  I think this is great stuff.  thanks!




On 05/27/2008 08:20 PM, Jeff Squyres wrote:

About a dozen people e-mailed me after I sent the first mail asking
where the videos were located.  :-)

 http://www.open-mpi.org/video/

Also, "Videos" is a link on the left-hand side navigation of the Open
MPI web site, so there's no need to memorize the link.



On May 27, 2008, at 6:43 PM, Graham Jenkins wrote:

> Jeff Squyres wrote:
>> Over the past year or two, I have been slowly creating a large set of
>> Open MPI training material that I've used to present to my company's
>> customers and partners.  I have just recently received permission to
>> release all of my slides to the greater HPC community.  Woo hoo!
>
> Great idea Jeff, sounds really useful.  But where do I find them?
> --
> Graham Jenkins
> Senior Software Specialist, eResearch
> Monash University (Clayton Campus, Bldg 11, Rm S503)
>
> Email: graham.jenk...@its.monash.edu.au
> Tel:   +613 9905-5942 (office)   +614 4850-2491 (mobile)


--
Jeff Squyres
Cisco Systems





[OMPI users] ulimit question from video open-fabrics-concepts...

2008-05-29 Thread twurgl
HI,

I am watching one of your MPI instructional videos and have a question.  You
said to make sure the registered memory ulimit is set to unlimited.  I typed
the command "ulimit -a" and don't see a registered memory entry.  Is this
perhaps the same as "max locked memory"?  Or can you tell me where to check,
and how to set registered memory to unlimited?
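
In case it helps, this is what I am looking at now (my guess is that "max
locked memory", i.e. the -l limit, is what you mean, and that it would be
raised in limits.conf, but I am not sure):

  # current limit on the node ("max locked memory")
  ulimit -l

  # guessed setting in /etc/security/limits.conf on the compute nodes:
  # *  soft  memlock  unlimited
  # *  hard  memlock  unlimited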

Thanks!
tom

ps: the videos are very helpful



[OMPI users] locked memory problem

2008-06-11 Thread twurgl

I get the locked memory error as follows:

--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[node10:10395] [0,0,0]-[0,1,6] mca_oob_tcp_msg_recv: readv failed: 
Connection reset by peer (104)
--
The OpenIB BTL failed to initialize while trying to allocate some
locked memory.  This typically can indicate that the memlock limits
are set too low.  For most HPC installations, the memlock limits
should be set to "unlimited".  The failure occured here:

Host:  node10
OMPI source:   btl_openib.c:830
Function:  ibv_create_cq()
Device:mlx4_0
Memlock limit: 32768

You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--

I've read the above FAQ and still have problems.  Here is the scenario.  All
cluster nodes are (supposedly) identical.
I can run just fine on all except a few nodes.  For testing, I have closed all
the nodes, so when I submit a job, LSF puts it in PENDING state.

Now if I use

brun -m "node1 node10" jobid

to release the job, it runs fine.

But if I use

brun -m "node10 node1" jobid

it fails with the above OPENMPI error.

I've checked ulimit -a on all nodes; it is set to unlimited.  I've added the
ulimit setting to a .bashrc file as well as to my .cshrc (I start in a csh
shell and the jobs run in sh).

I've compared environment settings and everything else I can think of.  Three
nodes show the (bad) behaviour if they happen to be the lead node and run fine
if they are not; the rest of the nodes run fine in either position.
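
For what it's worth, this is roughly how I have been checking the limit on
each node; the loop runs the check in a non-interactive shell, which I believe
is what the launched daemons actually inherit (node names are just examples):

  for n in node1 node10; do             # node names are examples
      echo -n "$n: "; ssh $n 'sh -c "ulimit -l"'
  done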

Anyone have any ideas about this?

thanks!
tom



[OMPI users] Using OPENMPI configured for MX, GM and OPENIB interconnects

2009-08-26 Thread twurgl

I configure Open MPI (1.3.3, and previous versions as well) so that a single
executable can run on any cluster we have.

I used:   ./configure --with-mx  --with-openib --with-gm 

At the end of the day, the same executable does run on any of the clusters.

The question I have is:

When, for example, I run on an IB cluster, I get warning messages about not
finding GM NICs and that another transport will be used, and warnings that
the mca btl mx component was not found.  It DOES run over IB, but it never
says so in the output.

What I'd like is to not get warnings about what I don't have on the cluster
in question and instead get a note that says it found the IB.

Is this already possible?

Or can I at least suppress the warnings for the not-found interconnects?
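
(One thing I plan to try, though I have not confirmed it on 1.3.3: I believe
there is an MCA parameter that silences the "component unused" warnings,
something like this:)

mpirun --mca btl_base_warn_component_unused 0 --mca btl openib,mx,gm,tcp,sm,self ...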

thanks!
tom



Re: [OMPI users] Using OPENMPI configured for MX, GM and OPENIB interconnects

2009-08-26 Thread twurgl
I see.  My one script for all clusters calls

mpirun --mca btl openib,mx,gm,tcp,sm,self 

so I'd need to add some logic above the mpirun line to figure out which
cluster I am on and set up the correct mpirun line.

It still seems like I should be able to keep the mpirun line I have and just
be told what it found, not what it can't find.
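
A possible alternative I am considering (a sketch; it assumes the standard
openmpi-mca-params.conf file under each cluster's install prefix, and ./a.out
is a placeholder) is to put the per-cluster BTL list in the MCA parameter file
so the launch line stays identical everywhere:

  # on the IB cluster, in <prefix>/etc/openmpi-mca-params.conf:
  btl = openib,tcp,sm,self

  # on the MX cluster, in the same file under that cluster's prefix:
  btl = mx,tcp,sm,self

  # every cluster then uses the same launch line:
  mpirun ./a.out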

thanks for the workaround...

-tom



On 08/26/2009 03:57 PM, Scott Atchley wrote:

On Aug 26, 2009, at 3:41 PM, twu...@goodyear.com wrote:

> When, for example, I run on an IB cluster, I get warning messages
> about not
> finding GM NICS and another transport will be used etc.
> And warnings about mca btl mx components not found etc.  It DOES run
> the
> IB, but it never says that in the output.
>
> What I'd like is to not get warnings about what I don't have on the
> cluster
> in question and instead get a note that says it found the IB.
>
> Is this already possible?
>
> Or can I at least suppress the warnings for the not-found
> interconnects?
>
> thanks!
> tom

You can use:

$ mpirun -mca pml ob1 -mca btl mx,sm,self ...

when running with MX. Substitute gm or openib when running on those
networks.

It may still fail over to TCP. To avoid that, you could run:

$ mpirun -mca pml ob1 -mca btl ^mx,openib,tcp ...

(the leading "^" applies to the whole list) to tell it to run on anything
(GM, shared memory and self) except MX, IB, and TCP.

You probably do not need -mca pml ob1, but that will prevent the MX
MTL from trying to start as well.

Scott




[OMPI users] Openmpi 1.8.8 and affinity

2016-01-15 Thread twurgl
In the past (v1.6.4 and earlier) we used mpirun args of

--mca mpi_paffinity_alone 1 --mca btl openib,tcp,sm,self

with LSF 7.0.6, and this was enough to keep cores from being oversubscribed
when submitting two or more jobs to the same node.

Now I am using 1.8.8 and so far haven't found the right combination of args
to make sure cores aren't oversubscribed.

I am currently using:

-report-bindings --map-by core --bind-to core --nooversubscribe --mca btl
 openib,tcp,sm,self 

Do I have the incorrect options or am I missing an option? 
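
Another approach I have been sketching (not verified; the node name, the
2-socket/12-cores-per-socket layout, and the file name are all assumptions on
my part) is a rankfile that pins the second job to the second socket:

  # rankfile.job2 -- second 12-rank job pinned to socket 1
  rank  0=node01 slot=1:0
  rank  1=node01 slot=1:1
  rank  2=node01 slot=1:2
  rank  3=node01 slot=1:3
  rank  4=node01 slot=1:4
  rank  5=node01 slot=1:5
  rank  6=node01 slot=1:6
  rank  7=node01 slot=1:7
  rank  8=node01 slot=1:8
  rank  9=node01 slot=1:9
  rank 10=node01 slot=1:10
  rank 11=node01 slot=1:11

  # launched with:
  mpirun -np 12 -rf rankfile.job2 --report-bindings --mca btl openib,tcp,sm,self ...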

Thanks for any info

tom