Re: [OMPI users] HPMPI versus OpenMPI performance

2008-06-04 Thread Mukesh K Srivastava
Hi

Could you clarify a few things:

(a) The versions of the PG compilers being used for both HPMPI & OMPI. Are
the compilers and their versions the same?

(b) Could you share the configure command used for OMPI with the PG compilers?

(c) Could you compare whether threads are enabled or disabled in both cases?

(d) Is ptmalloc2 enabled for HPMPI on the IB side? Could you check the same
with OMPI too?

(e) Could you mail the log files for both?

BR
Mukesh
srimk...@gmail.com



[OMPI users] GCC extendability to OpenMPI Specification

2008-06-04 Thread Mukesh K Srivastava
Hi OMPI Community.


Is there any plan to extend GCC to support Open MPI, or to implement the
Open MPI specification in GCC for C, C++ & Fortran, making it generally
available on platforms that support POSIX?

Could the GCC community consider shipping a support library for Open MPI in
its releases?

BR
Mukesh


Re: [OMPI users] GCC extendability to OpenMPI Specification

2008-06-04 Thread Andreas Schäfer
Hi Mukesh,

Open MPI is an implementation of the MPI standard. Its API is therefore
that of a library, in contrast to, say, OpenMP, which requires changes to
the compiler.

Open MPI already supports C, C++ and Fortran for virtually any
compiler and platform.

For what it's worth, there is little room to modify a compiler to
improve MPI bindings (maybe except for this[1]) without breaking the
whole MPI interface. How would you envision such a change?

HTH
-Andreas


[1] 
@inproceedings{DBLP:conf/pvm/Renault07,
  author    = {{\'E}ric Renault},
  title     = {Extended MPICC to Generate MPI Derived Datatypes from C
               Datatypes Automatically},
  booktitle = {PVM/MPI},
  year      = {2007},
  pages     = {307-314},
  ee        = {http://dx.doi.org/10.1007/978-3-540-75416-9_42},
  crossref  = {DBLP:conf/pvm/2007},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

@proceedings{DBLP:conf/pvm/2007,
  editor    = {Franck Cappello and
               Thomas H{\'e}rault and
               Jack Dongarra},
  title     = {Recent Advances in Parallel Virtual Machine and Message
               Passing Interface, 14th European PVM/MPI User's Group Meeting,
               Paris, France, September 30 - October 3, 2007, Proceedings},
  booktitle = {PVM/MPI},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  volume    = {4757},
  year      = {2007},
  isbn      = {978-3-540-75415-2},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}


-- 

Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net


(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your 
signature to help him gain world domination!




Re: [OMPI users] Open MPI instructional videos

2008-06-04 Thread Andreas Schäfer
On 16:48 Tue 03 Jun , Jeff Squyres wrote:
> - more importantly, however, the audience likes to take the slides  
> away and when they actually look at them 6 weeks after the lecture,  
> they might actually remember the content better because they received  
> the same information via two forms of sensory input (audio + visual).

I consider him an authority on this subject: ;-)

http://isites.harvard.edu/fs/html/icb.topic58703/winston1.html

> Plus it *is* just the builtin microphone on my Mac, so it may not be
> the greatest sound quality to begin with.  :-)

*gasp* ;-)

> As for .mov, yes, this is definitely a compromise.  I tried uploading  
> the videos to YouTube and Google Video and a few others, but a) most  
> have a time or file size restriction (e.g., 10 mins max) -- I was not  
> willing to spend the extra work to split up the videos into multiple  
> segments, and b) they down-res'ed the videos so much as to make the  
> slides look crappy and/or unreadable.  So I had to go with the video  
> encoder that I could get for darn little money (Cisco's a big company,  
> but my budget is still tiny :-) ).  That turned out to be a fun little  
> program called iShowU for OS X that does screen scraping + audio  
> capture.  It outputs Quicktime movies, so that was really my only  
> choice.
> 
> Is it a real hardship for people to install the QT player?  Are there  
> easy-to-install convertors?  I'm not opposed to hosting it in multiple  
> formats if it's easy and free to convert them.

Well, it's not so hard to install QT, but then again: many people
won't do it because it takes those two extra minutes. There are a lot
of open source converters. I prefer transcode (www.transcoding.org)
and would suggest MPEG output (MPEG 4, or MPEG 2 if you really
must). But that's just what I prefer.

Cheers
-Andi


-- 

Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net




Re: [OMPI users] GCC extendability to OpenMPI Specification

2008-06-04 Thread Joe Landman

Mukesh K Srivastava wrote:

Hi OMPI Community.


Is there any plan to extend GCC to support Open MPI, or to implement the
Open MPI specification in GCC for C, C++ & Fortran, making it generally
available on platforms that support POSIX?


Hi Mukesh:

  Open MPI is already written in C, and compiles correctly under 
several different GCC flavors.  It works with C, C++, and Fortran90 
currently, on multiple POSIX platforms.


  Are you asking about including this natively with GCC, in the way OpenMP
is integrated into the compiler?


  Please remember that MPI is generally a set of function/method calls
that implement particular operations.  Even with a system as "simple" as
OpenMP (not Open MPI), automatic parallelization in the general case is
quite hard (and you don't get great performance).  MPI does force you to
think about communication topology as well as many other issues.  It would
be great if the compiler could handle this for you, but unfortunately, it
is far more than a "simple matter of programming".  This is a genuinely
hard problem.



Could the GCC community consider shipping a support library for Open MPI in
its releases?


  You can build the lib*.so for each compiler, but please remember that 
the builds can be different (just look at all those nice options in 
configure!).  The API should be the same, so as long as you build 
against the same version (Jeff and others, please correct me if this 
assumption is not correct), you should be able to move your runtime 
linked binaries around.




BR
Mukesh


Joe


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: land...@scalableinformatics.com
web  : http://www.scalableinformatics.com
   http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


Re: [OMPI users] eigenvalue problem

2008-06-04 Thread Adrian Knoth
On Fri, May 30, 2008 at 10:22:42PM +0200, Radovan Herchel wrote:

> Unfortunately, Arpack is suitable only to calculate a few eigenvalues,
> not all.

I don't know much about this math stuff, but people over here like SAGE:

   http://www.sagemath.org

It has MPI bindings, and programming can be done in Python.

As it comes with highly optimized libraries for linear algebra, it might
solve your problem.

A pointer to an example:

   
http://blog.mikael.johanssons.org/archive/2008/05/parallell-and-cluster-mpi4py/
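For a small dense matrix, plain NumPy (which SAGE builds on) already
computes all eigenvalues at once via LAPACK -- a minimal sketch, assuming
NumPy is installed:

```python
import numpy as np

# Small symmetric test matrix; eigvalsh returns *all* eigenvalues
# (dense LAPACK solver), unlike ARPACK's iterative few-eigenvalue approach.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

w = np.linalg.eigvalsh(A)
print(w)  # [1. 3.]
```

For matrices that fit in one node's memory, this is usually simpler than
any parallel route; the MPI bindings only start to pay off beyond that.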



-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI users] Open MPI instructional videos

2008-06-04 Thread Adrian Knoth
On Tue, Jun 03, 2008 at 04:48:50PM -0400, Jeff Squyres wrote:

> As for .mov, yes, this is definitely a compromise.  I tried uploading  
> the videos to YouTube and Google Video and a few others, but a) most  

QT sucks. Youtube (Flash) sucks.


> slides look crappy and/or unreadable.  So I had to go with the video  
> encoder that I could get for darn little money (Cisco's a big company,  
> but my budget is still tiny :-) ).  That turned out to be a fun little  
> program called iShowU for OS X that does screen scraping + audio  
> capture.  It outputs Quicktime movies, so that was really my only  
> choice.

People usually recommend ffmpegX for OSX. You might give it a whirl to
transcode your mov to something else, let's say H.264 in an AVI
container. (MP4/AVC, DivX, xvid, there are so many names for it)

You can also create flv (flash video) and use one of the free
flv players to get something like Youtube (i.e. playing right in the
browser), just without the 10-minute limitation.

There's also the SMIL format. It lets you reference images (your
slides), the timing between them and your audio stream. It's like
composing your presentation, though some sophisticated products exist
to capture video, the slides, additional material and notes written by
hand on a digital screen.

But I guess it's too complicated for your purpose.


> Is it a real hardship for people to install the QT player?

It's just because QT sucks. And the data streams are proprietary (very
big concern). 


PS: My private list of sophisticated players: mplayer, VLC, xine.
Braindead players: realplayer, QT, Windows Media Player


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI users] Open MPI instructional videos

2008-06-04 Thread Adrian Knoth
On Wed, Jun 04, 2008 at 11:19:48AM +0200, Adrian Knoth wrote:

> People usually recommend ffmpegX for OSX. You might give it a whirl to
> transcode your mov to something else, let's say H.264 in an AVI
> container. (MP4/AVC, DivX, xvid, there are so many names for it)

I've checked your files, they're quite good. They are already H.264 and
AAC (advanced audio coding), the only thing wrong is the mov container.

It's easy to repack this to avi:

   $ mencoder input.mov -ovc copy -oac copy -o output.avi

I've tested it with openib-btl-tuning:

adi@chopin:/tmp$ ls -l openib-btl-tuning-v1.2.mov 
-rw-r--r-- 1 adi adi 16249094 Jun  4 11:24 openib-btl-tuning-v1.2.mov

adi@chopin:/tmp$ ls -l ompi-test.avi 
-rw-r--r-- 1 adi adi 15964104 Jun  4 11:26 ompi-test.avi

(you can download it here: )

On the other hand, the files are way too large. The video doesn't
contain much inter-frame correlation, so it's a good idea to give the
encoder some hints:

adi@chopin:/tmp$ ls -l test*.avi
-rw-r--r-- 1 adi adi 36171648 Jun  4 11:47 test.avi
-rw-r--r-- 1 adi adi 35323842 Jun  4 11:58 testx264.avi

(from approx. 160MB to 35MB). The first one is MPEG4 with an MP3 audio
stream, the second is H.264. Both video encoders were forced to 100kbit/s
and keyframes every 300 frames (not for x264):

   $ mencoder input.mov -oac mp3lame -ovc lavc -lavcopts \
 vbitrate=100:keyint=300 -o output.avi

For testing purposes, try http://adi.loris.tv/ompi-optimized.avi

I'd like to hear if these files, especially the last one, are working for
other users.

If so, I'll take care of converting the movs to avi, probably MPEG4.

(in that case: Jeff, you could probably give me all the files in an archive
or point to a direct download link, so I don't have to click through the
website but can just fire up the encoder in a for loop)

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI users] Open MPI instructional videos

2008-06-04 Thread Jeff Squyres
FWIW: I tried the http://adi.loris.tv/ompi-optimized.avi URL on my Mac  
and got redirected to the Quicktime plugin page.  I had no idea which  
plugin would make it play AVI files, so I skipped it.  I tried the URL  
on a Windows machine and Windows Media Player (i.e., what came up by  
default) seemed to play the audio just fine, but it couldn't find a  
video codec.  Here's the error message that it showed:


The codec you are missing is not available for download from this  
site.
You might be able to find it on another site by searching the Web  
for

"FMP4" (this is the WaveFormat or FourCC identifier of the codec).

That being said, if we can get a format that works nicely on multiple  
platforms / is convenient for users, it would be great if you could  
convert them for me -- thanks!


The entire web site is in a SVN repository (our mirrors just run "svn  
up" every night): http://svn.open-mpi.org/svn/ompi-www/trunk.  The  
videos are under /video (the same directory structure of the web site).



On Jun 4, 2008, at 6:13 AM, Adrian Knoth wrote:


[snip]



--
Jeff Squyres
Cisco Systems




Re: [OMPI users] Open MPI instructional videos

2008-06-04 Thread Jeff Squyres

On Jun 4, 2008, at 3:54 AM, Andreas Schäfer wrote:


I consider him an authority on this subject: ;-)

http://isites.harvard.edu/fs/html/icb.topic58703/winston1.html



Thanks -- I'll have a look!

--
Jeff Squyres
Cisco Systems




Re: [OMPI users] Open MPI instructional videos

2008-06-04 Thread Scott Atchley

Jeff,

If I remember correctly, Microsoft dropped support for .AVI 3-4 years  
ago so it can no longer be played by their media player. It is also  
not native to QT, so you will have to download a plugin (I have it  
somewhere if you want me to look for it).


I do not know if there is a container format that all players are  
happy with. My guess would be that MP4 would be the closest.


You can convert .mov to .mp4 with a free command-line app, qt_export,  
which is part of qt_tools. It uses the QT libraries and can transcode  
to a number of video and audio formats (including AVI if you have the  
QT plugin).


Scott

On Jun 4, 2008, at 8:31 AM, Jeff Squyres wrote:


[snip]






[OMPI users] disabling tcp altogether

2008-06-04 Thread tayfun sen

Hello,

I would like to run an Open MPI application on a single node, and since I
think it would perform better, I want it to use shared memory for
communication instead of TCP. Is it possible to use shared memory not only
for MPI communication but also for control messages and other internal
MPI-related communication, so that no TCP communication is used at all?


I came up with the following parameters, but I am receiving an error when I
use them:

mpirun --host localhost --mca btl sm,self --mca oob ^tcp -n 2 hello

It runs a simple hello world application. I know I don't have to use the
host parameter, since by default it will run on localhost, but I included
it just to be on the safe side. I ask the btl to use sm and self (I guess
"self" is compulsory) and instruct oob not to use tcp (per the last lines
in http://www.open-mpi.org/faq/?category=tcp#tcp-selection ). Isn't this
correct?
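
For reference, the same selections can also be written in an MCA parameter
file instead of on the command line (a sketch following standard Open MPI
conventions; untested here):

```ini
# $HOME/.openmpi/mca-params.conf
# Restrict MPI point-to-point traffic to shared memory + self:
btl = sm,self
# The oob exclusion from the command line above would be:
# oob = ^tcp
```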


Here's the exact error:

# mpirun --host localhost --mca btl sm,self --mca oob ^tcp -n 2 hello
[myhost:08491] [NO-NAME] ORTE_ERROR_LOG: Not found in file 
runtime/orte_init_stage1.c at line 182

--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 orte_rml_base_select failed
 --> Returned value -13 instead of ORTE_SUCCESS

--
[peanutbutter:08491] [NO-NAME] ORTE_ERROR_LOG: Not found in file 
runtime/orte_system_init.c at line 42
[peanutbutter:08491] [NO-NAME] ORTE_ERROR_LOG: Not found in file 
runtime/orte_init.c at line 52

--
Open RTE was unable to initialize properly.  The error occured while
attempting to orte_init().  Returned value -13 instead of ORTE_SUCCESS.
--



[OMPI users] tg3 module

2008-06-04 Thread Leonardo Fialho

Hi All,

I'm experiencing a strange problem. I don't know if it has been reported,
but here it is:


when I run Open MPI on a specific cluster, the network card module (tg3)
goes down... and a few minutes later comes up again. Of course this results
in "[nodo22][[56833,1],3][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: No route to host (113)".


I run the same application on another cluster (with a different network
module) and I get no errors.
I run the same application on the same cluster using MPICH and I get no
errors.


Kernel (on "dead" node) logs it:
nfs: server 192.168.65.100 not responding, still trying
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
nfs: server 192.168.65.100 OK

Does anybody know what is happening? I'm using Open MPI defaults, without
any -mca or -am parameters.


Some info:

[lfialho@aoclsp ~]$ /sbin/modinfo tg3
filename:   /lib/modules/2.6.17-1.2142_FC4smp/kernel/drivers/net/tg3.ko
version:3.59
license:GPL
description:Broadcom Tigon3 ethernet driver
author: David S. Miller (da...@redhat.com) and Jeff Garzik 
(jgar...@pobox.com)

srcversion: CE9C9B036713CF38C2EE194
depends:
vermagic:   2.6.17-1.2142_FC4smp SMP mod_unload 686 REGPARM 4KSTACKS 
gcc-4.0
parm:   tg3_debug:Tigon3 bitmapped debugging message enable 
value (int)

[lfialho@aoclsp ~]$

[lfialho@aoclsp ~]$ uname -a
Linux aoclsp.uab.es 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 
2006 i686 i686 i386 GNU/Linux

[lfialho@aoclsp ~]$

[lfialho@aoclsp ~]$ gcc --version
gcc (GCC) 4.0.2 20051125 (Red Hat 4.0.2-8)
Copyright (C) 2005 Free Software Foundation, Inc.
[lfialho@aoclsp ~]$

[lfialho@aoclsp ~]$ /opt/radic-mpi/bin/ompi_info
Package: Open MPI lfia...@aoclsp.uab.es Distribution
   Open MPI: 1.3a1-1
  Open MPI SVN revision: -1
   Open RTE: 1.3a1-1
  Open RTE SVN revision: -1
   OPAL: 1.3a1-1
  OPAL SVN revision: -1
   Ident string: 1.3a1-1
 Prefix: /opt/radic-mpi/
Configured architecture: i686-pc-linux-gnu
 Configure host: aoclsp.uab.es
  Configured by: lfialho
  Configured on: Tue Jun  3 16:16:08 CEST 2008
 Configure host: aoclsp.uab.es
   Built by: lfialho
   Built on: mar jun  3 16:41:19 CEST 2008
 Built host: aoclsp.uab.es
 C bindings: yes
   C++ bindings: yes
 Fortran77 bindings: yes (all)
 Fortran90 bindings: yes
Fortran90 bindings size: small
 C compiler: gcc
C compiler absolute: /usr/bin/gcc
   C++ compiler: g++
  C++ compiler absolute: /usr/bin/g++
 Fortran77 compiler: gfortran
 Fortran77 compiler abs: /usr/bin/gfortran
 Fortran90 compiler: gfortran
 Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
  C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
 C++ exceptions: no
 Thread support: posix (mpi: no, progress: no)
  Sparse Groups: no
 Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
  Heterogeneous support: yes
mpirun default --prefix: no
MPI I/O support: yes
  MPI_WTIME support: gettimeofday
Symbol visibility support: yes
  FT Checkpoint support: yes  (checkpoint thread: no)
  MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.3)
 MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.3)
  MCA paffinity: linux (MCA v1.0, API v1.1, Component v1.3)
  MCA carto: auto_detect (MCA v1.0, API v1.0, Component v1.3)
  MCA carto: file (MCA v1.0, API v1.0, Component v1.3)
  MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.3)
  MCA timer: linux (MCA v1.0, API v1.0, Component v1.3)
MCA installdirs: env (MCA v1.0, API v1.0, Component v1.3)
MCA installdirs: config (MCA v1.0, API v1.0, Component v1.3)
MCA crs: blcr (MCA v1.0, API v1.0, Component v1.3)
MCA crs: self (MCA v1.0, API v1.0, Component v1.3)
MCA dpm: orte (MCA v1.0, API v1.0, Component v1.3)
 MCA pubsub: orte (MCA v1.0, API v1.0, Component v1.3)
  MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
  MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
   MCA coll: basic (MCA v1.0, API v1.1, Component v1.3)
   MCA coll: inter (MCA v1.0, API v1.1, Component v1.3)
   MCA coll: self (MCA v1.0, API v1.1, Component v1.3)
   MCA coll: sm (MCA v1.0, API v1.1, Component v1.3)
   MCA coll: tuned (MCA v

Re: [OMPI users] openmpi 32-bit g++ compilation issue

2008-06-04 Thread Jeff Squyres

Sorry for the delay in replying.

This looks like a problem on your system -- I think Doug is right:
your system seems to be picking the wrong libraries when you specify
-m32.  Can you compile any C++ libraries/binaries with -m32 successfully?



On May 19, 2008, at 5:48 PM, Arif Ali wrote:


Hi,

OS: SLES10 SP1
OFED: 1.3
openmpi: 1.2 1.2.5 1.2.6
compilers: gcc g++ gfortran

I am creating a 32-bit build of openmpi on an Infiniband cluster, and the
compilation gets stuck. If I manually use the
/usr/lib64/gcc/x86_64-suse-linux/4.1.2/32/libstdc++.so library, it compiles
that piece of code. I was wondering if anyone else has had this problem, or
if there is any other way of getting this to work. I feel that there may be
something very silly here that I have missed, but I can't seem to spot it.


I have also tried this on a fresh install of OFED 1.3 with openmpi  
1.2.6



libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT file.lo -MD -MP -MF .deps/file.Tpo -c file.cc -fPIC -DPIC -o .libs/file.o
depbase=`echo win.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT win.lo -MD -MP -MF $depbase.Tpo -c -o win.lo win.cc &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT win.lo -MD -MP -MF .deps/win.Tpo -c win.cc -fPIC -DPIC -o .libs/win.o
/bin/sh ../../../libtool --tag=CXX --mode=link g++ -O3 -DNDEBUG -m32 -finline-functions -pthread -export-dynamic -m32 -o libmpi_cxx.la -rpath /opt/openmpi/1.2.6/gnu_4.1.2/32/lib mpicxx.lo intercepts.lo comm.lo datatype.lo file.lo win.lo -lnsl -lutil -lm
libtool: link: g++ -shared -nostdlib /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib/crti.o /usr/lib64/gcc/x86_64-suse-linux/4.1.2/32/crtbeginS.o .libs/mpicxx.o .libs/intercepts.o .libs/comm.o .libs/datatype.o .libs/file.o .libs/win.o -Wl,-rpath -Wl,/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -Wl,-rpath -Wl,/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -lnsl -lutil -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/32 -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../x86_64-suse-linux/lib/../lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../.. /usr/lib64/gcc/x86_64-suse-linux/4.1.2/libstdc++.so -lm -lpthread -lc -lgcc_s /usr/lib64/gcc/x86_64-suse-linux/4.1.2/32/crtendS.o /usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../lib/crtn.o -m32 -pthread -m32 -pthread -Wl,-soname -Wl,libmpi_cxx.so.0 -o .libs/libmpi_cxx.so.0.0.0
/usr/lib64/gcc/x86_64-suse-linux/4.1.2/libstdc++.so: could not read symbols: File in wrong format
collect2: ld returned 1 exit status
--
Arif Ali
Software Engineer
OCF plc

Mobile: +44 (0)7970 148 122
DDI:+44 (0)114 257 2240
Office: +44 (0)114 257 2200
Fax:+44 (0)114 257 0022
Email:  a...@ocf.co.uk
Web:http://www.ocf.co.uk

Support Phone:   +44 (0)845 702 3829
Support E-mail:  supp...@ocf.co.uk

Skype:  arif_ali80
MSN:a...@ocf.co.uk

This email is confidential in that it is intended for the exclusive
attention of the addressee(s) indicated. If you are not the intended
recipient, this email should not be read or disclosed to any other
person. Please notify the sender immediately and delete this email from
your computer system. Any opinions expressed are not necessarily those
of the company from which this email was sent and, whilst to the best of
our knowledge no viruses or defects exist, no responsibility can be
accepted for any loss or damage arising from its receipt or subsequent
use of this email.



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] ORTE_ERROR_LOG Timeout

2008-06-04 Thread Jeff Squyres

James --

Sorry for the delay in replying.

Do you have any firewall software running on your nodes (e.g.,  
iptables)?  OMPI uses random TCP ports to connect between nodes for  
control messages.  If they can't reach each other because TCP ports  
are blocked, Bad Things will happen (potentially even a hang, because  
firewalls can cause packets to be silently dropped).



On May 20, 2008, at 12:17 PM, Rudd, James wrote:

I have been trying to compile a molecular dynamics program with the
Openmpi 1.2.5 included in OFED 1.3.  I am running Fedora Core 6; the
output of uname -r is 2.6.18-1.2798.fc6.  I've traced the problems
I've been having back to openmpi because I'm unable to run the test
programs such as glob on more than one node.  I currently have 2
nodes connected to an infiniband switch with opensm running on
node1.  The nodes can ping each other and I am able to ssh between
them without a password.  My openmpi-default-hostfile includes the
following:


node1 slots=2 max-slots=4
node2 slots=4 max-slots=4

When I run "mpirun -np 4 --debug-daemons ./glob" I get:
Daemon [0,0,1] checking in as pid 21341 on host node1
And the program appears to hang.  Once I CTRL+C it a couple of times
I get the contents of error.txt.


Per the instructions in the FAQ I've included the output of
"ibv_devinfo", "ifconfig", and "ulimit -l" in the
infiniband_info.txt file. The results of "ompi_info --all" are in the
ompi_info.txt file.


I've been tearing my hair out over this; any help would be greatly  
appreciated.


James Rudd
JLC-Biomedical/Biotechnology Research Institute
North Carolina Central University
700 George Street
Durham, NC 27707
Phone:  (919) 530-7015
Email:  jr...@nccu.edu
http://ariel.acc.nccu.edu/Academics/BBRI/personnel/rudd.htm





--
Jeff Squyres
Cisco Systems




Re: [OMPI users] --bynode vs --byslot

2008-06-04 Thread Jeff Squyres

On May 23, 2008, at 9:07 PM, Cally K wrote:

Hi, I have a question about --bynode and --byslot that i would like  
to clarify


Say, for example, I have a hostfile

#Hostfile

__
node0
node1 slots=2 max_slots=2
node2 slots=2 max_slots=2
node3 slots=4 max_slots=4
___

There are 4 nodes and 9 slots, how do I run my mpirun, for now I use

a) mpirun -np --bynode 4 ./abcd


I assume you mean "... -np 4 --bynode ..."

I know that the slot thingy is for SMPs, and I have tried running  
mpirun -np --byslot 9 ./abcd


and I noticed that its longer when I do --byslot when compared to -- 
bynode


According to your text, you're running 9 processes when using --byslot  
and 4 when using --bynode.  Is that a typo?  I'll assume that it is --  
that you meant to use 9 in both cases.


and I just read the FAQ that said by default the byslot option is  
used, so I don't have to specify it explicitly, right?


I'm not sure what your question is.  The actual performance may depend  
on your application and what its communication and computation  
patterns are.  It gets more difficult to model when you have a  
heterogeneous setup (like it looks like you have, per your hostfile).


Let's take your example of 9 processes.

- With --bynode, the MPI_COMM_WORLD ranks will be laid out as follows  
(MCWR = "MPI_COMM_WORLD rank")


node0: MCWR 0
node1: MCWR 1, MCWR 4
node2: MCWR 2, MCWR 5
node3: MCWR 3, MCWR 6, MCWR 7, MCWR 8

- With --byslot, it'll look like this:

node0: MCWR 0
node1: MCWR 1, MCWR 2
node2: MCWR 3, MCWR 4
node3: MCWR 5, MCWR 6, MCWR 7, MCWR 8

In short, OMPI is doing round-robin placement of your processes; the  
only difference is in which dimension is traversed first: by node or  
by slot.


As to why there's such a performance difference, it could depend on a  
lot of things: the difference in computational speed and/or RAM on  
your 4 nodes, the changing communication patterns between the two  
(shared memory is usually used for on-node communication, which is  
usually faster than most networks), etc.  It really depends on what  
your application is *doing*.


Sorry I can't be of more help...

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Infinite loop when tcp free list max reached

2008-06-04 Thread Jeff Squyres

On May 26, 2008, at 5:17 PM, Matt Hughes wrote:


With the TCP btl, when free list items are exhausted, OMPI 1.2.6 falls
into an infinite loop:

#3981 0x002a98b4e23f in opal_condition_wait (c=0x2a98c541d0,
   m=0x2a98c54180) at ../../../../opal/threads/condition.h:81


[snip]

Yoinks.


The call used to get a free list item is OMPI_FREE_LIST_WAIT(), which
is supposed to block until an item is available.  However, it calls
opal_condition_wait(), which in turn calls opal_progress(), which then
waits for a free list item.  It seems strange to me that
opal_condition_wait() calls opal_progress(), but I'm not that familiar
with the code.


We do that because OMPI is single-threaded.  Otherwise, there's no  
other way to make progress while waiting for the conditional variable  
to become true.



Is it possible that this has been fixed in 1.3?


It is possible -- there were some changes with regards to how free  
list waiting was done, etc.  Would it be possible to try your test  
with a trunk nightly tarball?


http://www.open-mpi.org/nightly/trunk/


I haven't tried 1.3 yet because I will have to file a truckload of
bugs against 1.3 first.


Do you have a truckload of bugs to file for v1.3?  If so, now is the  
time to do so -- we're gearing up for the v1.3 release...



Should I be posting this stuff to the devel list?



If your questions go beyond naive user-level questions, you might  
get a quicker response on the devel list.


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] tg3 module

2008-06-04 Thread Patrick Geoffray

Hi Leonardo,

Leonardo Fialho wrote:

NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.


The tg3 driver times out because the transmit is stuck. It can be an 
interrupt problem or bad hardware flow-control on the switch. Since it 
works after the driver resets the link, it looks like either the switch 
flow control is busted (try to turn it off or try between 2 nodes in 
back-to-back) or one other node stops consuming.


Open-MPI may generate enough contention to trigger the problem but I 
don't think it is directly related to Open-MPI.


Patrick


Re: [OMPI users] Open MPI instructional videos

2008-06-04 Thread Jeff Squyres

Thanks for the tip!

I downloaded and tried the qt_tools but all conversions that I did to  
the MP4 format looked absolutely horrid -- the resulting videos had  
"jagged" images and all kinds of weird artifacts that would appear and  
disappear.  The slides were quite readable, but they just looked "bad".


Does anyone else have any suggestions?  Perhaps I need to record them  
differently so that they can be converted to both .mov and .mp4 nicely  
at the end...?



On Jun 4, 2008, at 9:03 AM, Scott Atchley wrote:


Jeff,

If I remember correctly, Microsoft dropped support for .AVI 3-4 years
ago so it can no longer be played by their media player. It is also
not native to QT, so you will have to download a plugin (I have it
somewhere if you want me to look for it).

I do not know if there is a container format that all players are
happy with. My guess would be that MP4 would be the closest.

You can convert .mov to .mp4 with a free command-line app, qt_export,
which is part of qt_tools. It uses the QT libraries and can transcode
to a number of video and audio formats (including AVI if you have the
QT plugin).

Scott

On Jun 4, 2008, at 8:31 AM, Jeff Squyres wrote:

FWIW: I tried the http://adi.loris.tv/ompi-optimized.avi URL on my  
Mac

and got redirected to the Quicktime plugin page.  I had no idea which
plugin would make it play AVI files, so I skipped it.  I tried the  
URL

on a Windows machine and Windows Media Player (i.e., what came up by
default) seemed to play the audio just fine, but it couldn't find a
video codec.  Here's the error message that it showed:

   The codec you are missing is not available for download from this
site.
   You might be able to find it on another site by searching the Web
for
   "FMP4" (this is the WaveFormat or FourCC identifier of the codec).

That being said, if we can get a format that works nicely on multiple
platforms / is convenient for users, it would be great if you could
convert them for me -- thanks!

The entire web site is in a SVN repository (our mirrors just run "svn
up" every night): http://svn.open-mpi.org/svn/ompi-www/trunk.  The
videos are under /video (the same directory structure of the web
site).


On Jun 4, 2008, at 6:13 AM, Adrian Knoth wrote:


On Wed, Jun 04, 2008 at 11:19:48AM +0200, Adrian Knoth wrote:


People usually recommend ffmpegX for OSX. You might give it a whirl
to
transcode your mov to something else, let's say H.264 in an AVI
container. (MP4/AVC, DivX, xvid, there are so many names for it)


I've checked your files, they're quite good. They are already H.264
and
AAC (advanced audio coding), the only thing wrong is the mov
container.

It's easy to repack this to avi:

$ mencoder input.mov -ovc copy -oac copy -o output.avi

I've tested it with openib-btl-tuning:

adi@chopin:/tmp$ ls -l openib-btl-tuning-v1.2.mov
-rw-r--r-- 1 adi adi 16249094 Jun  4 11:24 openib-btl-tuning- 
v1.2.mov


adi@chopin:/tmp$ ls -l ompi-test.avi
-rw-r--r-- 1 adi adi 15964104 Jun  4 11:26 ompi-test.avi

(you can download it here: )

On the other hand, the files are way too large. The video doesn't
contain much inter-frame correlation, so it's a good idea to give  
the

encoder some hints:

adi@chopin:/tmp$ ls -l test*.avi
-rw-r--r-- 1 adi adi 36171648 Jun  4 11:47 test.avi
-rw-r--r-- 1 adi adi 35323842 Jun  4 11:58 testx264.avi

(from approx. 160MB to 35MB). The first one is MPEG4 with an MP3
audio
stream, the second is H.264. Both video encoders were forced to
100kbit/s
and keyframes every 300 frames (not for x264):

$ mencoder input.mov -oac mp3lame -ovc lavc -lavcopts \
  vbitrate=100:keyint=300 -o output.avi

For testing purposes, try http://adi.loris.tv/ompi-optimized.avi

I'd like to hear if these files, especially the last one, are
working for
other users.

If so, I'd take care to convert the movs to avi, probably MPEG4.

(in that case: Jeff, you could probably give me all files in an
archive
or point to a direct download link, so I don't have to click through
the
website but just fire up the encoder in the for loop)

--
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de



--
Jeff Squyres
Cisco Systems









--
Jeff Squyres
Cisco Systems




Re: [OMPI users] disabling tcp altogether

2008-06-04 Thread Jeff Squyres

On Jun 4, 2008, at 10:39 AM, tayfun sen wrote:

I would like to run an OpenMPI application on one node and since I  
think

it would be better performance wise I want it to use shared memory for
communication and not tcp. Is it possible to use shared memory not  
only

for MPI communication but also for control messages and other similar
inner MPI related communication? (so no tcp communication whatsoever  
is

used).


I'm afraid not -- the only "oob" component ("out of band", meaning  
"not user-level MPI communication") that we have written is the TCP  
component.  We've toyed with the idea of writing others, but have  
never done it.


The oob component is mainly used during process startup and shutdown.   
So it doesn't really affect your steady-state MPI performance.
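If the goal is simply a single-node run over shared memory, it is enough to restrict the BTLs and leave the oob alone; the TCP control traffic then just goes over loopback. A minimal sketch (same effect as passing --mca btl sm,self on the command line) for $HOME/.openmpi/mca-params.conf:

```conf
# MPI point-to-point traffic: shared memory plus loopback-to-self only.
# The tcp *oob* stays enabled; it is only used for startup/shutdown
# control messages, not for steady-state MPI traffic.
btl = sm,self
```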



# mpirun --host localhost --mca btl sm,self --mca oob ^tcp -n 2 hello
[myhost:08491] [NO-NAME] ORTE_ERROR_LOG: Not found in file
runtime/orte_init_stage1.c at line 182
--
It looks like orte_init failed for some reason; your parallel  
process is


[snip]

This is OMPI's way of telling you that by deselecting the tcp oob, it  
can't find any others to use.


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] HPMPI versus OpenMPI performance

2008-06-04 Thread Jeff Squyres

Thanks for all the detailed information!

It is quite likely that our bsend performance has never been tuned; we  
simply implemented it, verified that it works, and then moved on -- we  
hadn't considered that real applications would actually use it.  :-\


But that being said, 60% difference is a bit odd.  Have you tried  
running with "--mca mpi_leave_pinned 1"?  If all your sends are  
MPI_BSEND, it *may* not make a difference, but it could make a  
difference on the receive side.


What are the typical communication patterns for your application?



On Jun 2, 2008, at 3:39 PM, Ayer, Timothy C. wrote:




We are performing a comparison of HPMPI versus OpenMPI using  
Infiniband and
seeing a performance hit in the vicinity of 60% (OpenMPI is slower)  
on
controlled benchmarks.  Since everything else is similar, we  
suspect a

problem with the way we are using or have installed OpenMPI.

Please find attached the following info as requested from
http://www.open-mpi.org/community/help/


Application:  in house CFD solver using both point-point and  
collective
operations. Also, for historical reasons it makes extensive use of  
BSEND.
We recognize that BSEND's can be inefficient but it is not  
practical to
change them at this time.  We are trying to understand why the  
performance

is so significantly different from HPMPI.  The application is mixed
FORTRAN 90 and C built with Portland Group compilers.

HPMPI Version info:

mpirun: HP MPI 02.02.05.00 Linux x86-64
major version 202 minor version 5

OpenMPI Version info:

mpirun (Open MPI) 1.2.4
Report bugs to http://www.open-mpi.org/community/help/




Configuration info :

The benchmark was a 4-processor job run on a single dual-socket  
dual core

HP DL140G3 (Woodcrest 3.0) with 4 GB of memory.  Each rank requires
approximately 250MB of memory.

1) Output from ompi_info --all

See attached file ompi_info_output.txt
<< File: ompi_info_output.txt >>

Below is the output requested in the FAQ section:

In order for us to help you, it is most helpful if you can run a  
few steps
before sending an e-mail to both perform some basic troubleshooting  
and
provide us with enough information about your environment to help  
you.

Please include answers to the following questions in your e-mail:


1.	Which OpenFabrics version are you running? Please specify where  
you
got the software from (e.g., from the OpenFabrics community web  
site, from

a vendor, or it was already included in your Linux distribution).

We obtained the software from www.openfabrics.org


Output from ofed_info command:

OFED-1.1

openib-1.1 (REV=9905)
# User space
https://openib.org/svn/gen2/branches/1.1/src/userspace

Git:
ref: refs/heads/ofed_1_1
commit a083ec1174cb4b5a5052ef5de9a8175df82e864a

# MPI
mpi_osu-0.9.7-mlx2.2.0.tgz
openmpi-1.1.1-1.src.rpm
mpitests-2.0-0.src.rpm



2.  What distro and version of Linux are you running? What is your
kernel version?

Linux  2.6.9-64.EL.IT133935.jbtest.1smp #1 SMP Fri Oct 19  
11:28:12

EDT 2007 x86_64 x86_64 x86_64 GNU/Linux


3.  Which subnet manager are you running? (e.g., OpenSM, a
vendor-specific subnet manager, etc.)

We believe this to be HP or Voltaire but we are not certain how to
determine this.


4.	What is the output of the ibv_devinfo command on a known "good"  
node
and a known "bad" node? (NOTE: there must be at least one port  
listed as

"PORT_ACTIVE" for Open MPI to work. If there is not at least one
PORT_ACTIVE port, something is wrong with your OpenFabrics  
environment and

Open MPI will not be able to run).

hca_id: mthca0
   fw_ver: 1.2.0
   node_guid:  001a:4bff:ff0b:5f9c
   sys_image_guid: 001a:4bff:ff0b:5f9f
   vendor_id:  0x08f1
   vendor_part_id: 25204
   hw_ver: 0xA0
   board_id:   VLT0030010001
   phys_port_cnt:  1
   port:   1
   state:  PORT_ACTIVE (4)
   max_mtu:2048 (4)
   active_mtu: 2048 (4)
   sm_lid: 1
   port_lid:   161
   port_lmc:   0x00


5.  What is the output of the ifconfig command on a known "good" node
and a known "bad" node? (mainly relevant for IPoIB installations)  
Note
that some Linux distributions do not put ifconfig in the default  
path for

normal users; look for it in /sbin/ifconfig or /usr/sbin/ifconfig.

eth0  Link encap:Ethernet  HWaddr 00:XX:XX:XX:XX:XX
 inet addr:X.Y.Z.Q  Bcast:X.Y.Z.255  Mask:255.255.255.0
 inet6 addr: X::X:X:X:X/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX 

Re: [OMPI users] Open MPI instructional videos

2008-06-04 Thread Brock Palen
I really think just having them in a Flash container works well,  
YouTube style.  I do this with both SnapZ Pro (Mac only) and Jing  
(Windows & Mac).  If you want a higher-quality downloadable  
traditional video, though, they probably won't work.


Also these are screen+audio/voice capture tools not ones that will  
convert your existing videos.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Jun 4, 2008, at 1:45 PM, Jeff Squyres wrote:

Thanks for the tip!

I downloaded and tried the qt_tools but all conversions that I did to
the MP4 format looked absolutely horrid -- the resulting videos had
"jagged" images and all kinds of weird artifacts that would appear and
disappear.  The slides were quite readable, but they just looked  
"bad".


Does anyone else have any suggestions?  Perhaps I need to record them
differently so that they can be converted to both .mov and .mp4 nicely
at the end...?


On Jun 4, 2008, at 9:03 AM, Scott Atchley wrote:


Jeff,

If I remember correctly, Microsoft dropped support for .AVI 3-4 years
ago so it can no longer be played by their media player. It is also
not native to QT, so you will have to download a plugin (I have it
somewhere if you want me to look for it).

I do not know if there is a container format that all players are
happy with. My guess would be that MP4 would be the closest.

You can convert .mov to .mp4 with a free command-line app, qt_export,
which is part of qt_tools. It uses the QT libraries and can transcode
to a number of video and audio formats (including AVI if you have the
QT plugin).

Scott

On Jun 4, 2008, at 8:31 AM, Jeff Squyres wrote:


FWIW: I tried the http://adi.loris.tv/ompi-optimized.avi URL on my
Mac
and got redirected to the Quicktime plugin page.  I had no idea  
which

plugin would make it play AVI files, so I skipped it.  I tried the
URL
on a Windows machine and Windows Media Player (i.e., what came up by
default) seemed to play the audio just fine, but it couldn't find a
video codec.  Here's the error message that it showed:

   The codec you are missing is not available for download from this
site.
   You might be able to find it on another site by searching the Web
for
   "FMP4" (this is the WaveFormat or FourCC identifier of the  
codec).


That being said, if we can get a format that works nicely on  
multiple

platforms / is convenient for users, it would be great if you could
convert them for me -- thanks!

The entire web site is in a SVN repository (our mirrors just run  
"svn

up" every night): http://svn.open-mpi.org/svn/ompi-www/trunk.  The
videos are under /video (the same directory structure of the web
site).


On Jun 4, 2008, at 6:13 AM, Adrian Knoth wrote:


On Wed, Jun 04, 2008 at 11:19:48AM +0200, Adrian Knoth wrote:

People usually recommend ffmpegX for OSX. You might give it a  
whirl

to
transcode your mov to something else, let's say H.264 in an AVI
container. (MP4/AVC, DivX, xvid, there are so many names for it)


I've checked your files, they're quite good. They are already H.264
and
AAC (advanced audio coding), the only thing wrong is the mov
container.

It's easy to repack this to avi:

$ mencoder input.mov -ovc copy -oac copy -o output.avi

I've tested it with openib-btl-tuning:

adi@chopin:/tmp$ ls -l openib-btl-tuning-v1.2.mov
-rw-r--r-- 1 adi adi 16249094 Jun  4 11:24 openib-btl-tuning-
v1.2.mov

adi@chopin:/tmp$ ls -l ompi-test.avi
-rw-r--r-- 1 adi adi 15964104 Jun  4 11:26 ompi-test.avi

(you can download it here: )

On the other hand, the files are way too large. The video doesn't
contain much inter-frame correlation, so it's a good idea to give
the
encoder some hints:

adi@chopin:/tmp$ ls -l test*.avi
-rw-r--r-- 1 adi adi 36171648 Jun  4 11:47 test.avi
-rw-r--r-- 1 adi adi 35323842 Jun  4 11:58 testx264.avi

(from approx. 160MB to 35MB). The first one is MPEG4 with an MP3
audio
stream, the second is H.264. Both video encoders were forced to
100kbit/s
and keyframes every 300 frames (not for x264):

$ mencoder input.mov -oac mp3lame -ovc lavc -lavcopts \
  vbitrate=100:keyint=300 -o output.avi

For testing purposes, try http://adi.loris.tv/ompi-optimized.avi

I'd like to hear if these files, especially the last one, are
working for
other users.

If so, I'd take care to convert the movs to avi, probably MPEG4.

(in that case: Jeff, you could probably give me all files in an
archive
or point to a direct download link, so I don't have to click  
through

the
website but just fire up the encoder in the for loop)

--
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] OpenIB problem: error polling HP CQ...

2008-06-04 Thread Jeff Squyres
We have made a *lot* of changes to the run-time support for spawn and  
some changes to the FLUSH support in the openib BTL for the upcoming  
v1.3 series.


Would it be possible for you to try a trunk nightly tarball snapshot,  
perchance?


http://www.open-mpi.org/nightly/trunk/


On May 29, 2008, at 3:50 AM, Matt Hughes wrote:


I have a program which uses MPI::Comm::Spawn to start processes on
compute nodes (c0-0, c0-1, etc).  The communication between the
compute nodes consists of ISend and IRecv pairs, while communication
between the head node and the compute nodes consists of gather and bcast operations.
After executing ~80 successful loops (gather/bcast pairs), I get this
error message from the head node process during a gather call:

[0,1,0][btl_openib_component.c:1332:btl_openib_component_progress]
from headnode.local to: c0-0 error polling HP CQ with status WORK
REQUEST FLUSHED ERROR status number 5 for wr_id 18504944 opcode 1

The relevant environment variables:
OMPI_MCA_btl_openib_rd_num=128
OMPI_MCA_btl_openib_verbose=1
OMPI_MCA_btl_base_verbose=1
OMPI_MCA_btl_openib_rd_low=75
OMPI_MCA_btl_base_debug=1
OMPI_MCA_btl_openib_warn_no_hca_params_found=0
OMPI_MCA_btl_openib_warn_default_gid_prefix=0
OMPI_MCA_btl=self,openib

If rd_low and rd_num are left at their default values, the program
simply hangs in the gather call after about 20 iterations (a gather
and a bcast).

Can anyone shed any light on what this error message means or what
might be done about it?

Thanks,
mch



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Problem with X forwarding

2008-06-04 Thread Jeff Squyres
In general, Open MPI doesn't have anything to do with X forwarding.   
However, if you're using ssh to startup your processes, ssh may  
configure X forwarding for you (depending on your local system  
setup).  But OMPI closes down ssh channels once applications have  
launched (there's no need to keep them open), so any X forwarding that  
may have been setup will be closed down.


The *easiest* way to setup X forwarding is simply to allow X  
connections to your local host from the node(s) that will be running  
your application.  E.g., use the "xhost" command to add the target  
nodes into the access list.  And then have mpirun export a suitable  
DISPLAY variable, such as:


export DISPLAY=my_hostname:0
mpirun -x DISPLAY ...

The "-x DISPLAY" clause tells Open MPI to export the value of the  
DISPLAY variable to all nodes when running your application.


Hope this helps.


On May 30, 2008, at 1:24 PM, Cally K wrote:

Hi, I have a problem running DistributedData.cxx (it is a VTK  
file); I need to be able to see the rendering from my computer.


I, however, have a problem running the executable; I loaded the  
executable onto 2 machines


and I am accessing it from my computer (DHCP enabled)

after running the following command - I use OpenMPI

mpirun -hostfile myhostfile -np 2 -bynode ./DistributedData

and I keep getting these errors

ERROR: In /home/kalpanak/Installation_Files/VTKProject/VTK/Rendering/ 
vtkXOpenGLRenderWindow.cxx, line 326

vtkXOpenGLRenderWindow (0x8664438): bad X server connection.


ERROR: In /home/kalpanak/Installation_Files/VTKProject/VTK/Rendering/ 
vtkXOpenGLRenderWindow.cxx, line 169

vtkXOpenGLRenderWindow (0x8664438): bad X server connection.


[vrc1:27394] *** Process received signal ***
[vrc1:27394] Signal: Segmentation fault (11)
[vrc1:27394] Signal code: Address not mapped (1)
[vrc1:27394] Failing at address: 0x84
[vrc1:27394] [ 0] [0xe440]
[vrc1:27394] [ 1] ./ 
DistributedData(_ZN22vtkXOpenGLRenderWindow20GetDesiredVisualInfoEv 
+0x229) [0x8227e7d]
[vrc1:27394] [ 2] ./ 
DistributedData(_ZN22vtkXOpenGLRenderWindow16WindowInitializeEv 
+0x340) [0x8226812]
[vrc1:27394] [ 3] ./ 
DistributedData(_ZN22vtkXOpenGLRenderWindow10InitializeEv+0x29)  
[0x82234f9]
[vrc1:27394] [ 4] ./ 
DistributedData(_ZN22vtkXOpenGLRenderWindow5StartEv+0x29) [0x82235eb]
[vrc1:27394] [ 5] ./ 
DistributedData(_ZN15vtkRenderWindow14DoStereoRenderEv+0x1a)  
[0x82342ac]
[vrc1:27394] [ 6] ./ 
DistributedData(_ZN15vtkRenderWindow10DoFDRenderEv+0x427) [0x8234757]
[vrc1:27394] [ 7] ./ 
DistributedData(_ZN15vtkRenderWindow10DoAARenderEv+0x5b7) [0x8234d19]
[vrc1:27394] [ 8] ./DistributedData(_ZN15vtkRenderWindow6RenderEv 
+0x690) [0x82353b4]
[vrc1:27394] [ 9] ./ 
DistributedData(_ZN22vtkXOpenGLRenderWindow6RenderEv+0x52) [0x82245e2]

[vrc1:27394] [10] ./DistributedData [0x819e355]
[vrc1:27394] [11] ./ 
DistributedData(_ZN16vtkMPIController19SingleMethodExecuteEv+0x1ab)  
[0x837a447]

[vrc1:27394] [12] ./DistributedData(main+0x180) [0x819de78]
[vrc1:27394] [13] /lib/libc.so.6(__libc_start_main+0xe0) [0xb79c0fe0]
[vrc1:27394] [14] ./DistributedData [0x819dc21]
[vrc1:27394] *** End of error message ***
mpirun noticed that job rank 0 with PID 27394 on node  exited on  
signal 11 (Segmentation fault).



Maybe I am not doing the X forwarding properly, but has anyone ever  
encountered the same problem? It works fine on one PC, and I read  
the mailing list archives, but I just don't know if my problem is  
similar to theirs. I even tried changing the DISPLAY environment variable.



This is what I want to do

my mpirun should run on 2 machines (A and B) and I should be able  
to view the output (on my PC).

Are there any specific commands to use?












--
Jeff Squyres
Cisco Systems



Re: [OMPI users] OpenMPI scaling > 512 cores

2008-06-04 Thread Scott Shaw
Hi, I was wondering if anyone had any comments regarding my
posting of questions.  Am I off base with my questions, or is this the
wrong forum for these types of questions?

> 
> Hi, I hope this is the right forum for my questions.  I am running
into a
> problem when scaling >512 cores on a infiniband cluster which has
14,336
> cores. I am new to openmpi and trying to figure out the right -mca
options
> to pass to avoid the "mca_oob_tcp_peer_complete_connect: connection
> failed:" on a cluster which has infiniband HCAs and OFED v1.3GA
release.
> Other MPI implementation like Intel MPI and mvapich work fine using
uDAPL
> or VERBs IB layers for MPI communications.
> 
> I find it difficult to understand which network interface or IB layer
> being used. When I explicitly state not to use eth0,lo,ib1, or ib1:0
> interfaces with the cmdline option "-mca oob_tcp_exclude" openmpi will
> continue to probe these interfaces.  For all MPI traffic openmpi
should
> use IB0 which is the 10.148 network. But with debugging enabled I see
> references trying the 10.149 network which is IB1.  Below is the
ifconfig
> network device output for a compute node.
> 
> Questions:
> 
> 1. Is there away to determine which network device is being used and
not
> have openmpi fallback to another device? With Intel MPI or HP MPI you
can
> state not to use a fallback device.  I thought "-mca oob_tcp_exclude"
> would be the correct option to pass but I maybe wrong.
> 
> 2. How can I determine infiniband openib device is actually being
used?
> When running a MPI app I continue to see counters for in/out packets
at a
> tcp level increasing when it should be using the IB RDMA device for
all
> MPI comms over the IB0 or mtcha0 device? OpenMPI was bundled with OFED
> v1.3 so I am assuming the openib interface should work.  Running
ompi_info
> shows btl_open_* references.
> 
> /usr/mpi/openmpi-1.2-2/intel/bin/mpiexec -mca
> btl_openib_warn_default_gid_prefix 0 -mca oob_tcp_exclude
> eth0,lo,ib1,ib1:0  -mca btl openib,sm,self -machinefile mpd.hosts.$$
-np
> 1024 ~/bin/test_ompi < input1
> 
> 3. When trying to avoid the "mca_oob_tcp_peer_complete_connect:
connection
> failed:" message I tried using "-mca btl openib,sm,self" and "-mca btl
> ^tcp" but I still get these error messages.  In cases with using the
"-mca
> btl openib,sm,self" openmpi will retry to use the IB1 (10.149 net)
fabric
> to establish a connection with a node.  What are my options to avoid
these
> connection failed messages?  I suspect openmpi is overflowing the tcp
> buffer on the clients based on large core count of this job since I
see
> lots of tcp buffer errors based on netstat -s output. I reviewed all
of
> the online FAQs and I am not sure what options to pass to get around
this
> issue.
> 
> OBTW, I did check the /usr/mpi/openmpi-1.2-2/intel/etc/openmpi-mca-
> params.conf file and no defaults are being specified.
> 
> 
> 
> Ompi_info:
> Open MPI: 1.2.2
>Open MPI SVN revision: r14613
> Open RTE: 1.2.2
>Open RTE SVN revision: r14613
> OPAL: 1.2.2
>OPAL SVN revision: r14613
>   Prefix: /usr/mpi/openmpi-1.2-2/intel
>  Configured architecture: x86_64-suse-linux-gnu
> 
> --
> 
> Following is the cluster configuration:
> 1792 nodes with 8 cores per node = 14336 cores
> Ofed Rel: OFED-1.3-rc1
> IB Device(s): mthca0 FW=1.2.0 Rate=20 Gb/sec (4X DDR) mthca1 FW=1.2.0
> Rate=20 Gb/sec (4X DDR)
> Processors: 2 x 4 Cores Intel(R) Xeon(R) CPU X5365 @ 3.00GHz 8192KB
Cache
> FSB:1333MHz
> Total Mem: 16342776 KB
> OS Release: SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10 SP1
> Kernel Ver: 2.6.16.54-0.2.5-smp
> 
> --
> 
> Ifconfig output:
> eth0  Link encap:Ethernet  HWaddr 00:30:48:7B:A7:AC
>   inet addr:192.168.159.41  Bcast:192.168.159.255
> Mask:255.255.255.0
>   inet6 addr: fe80::230:48ff:fe7b:a7ac/64 Scope:Link
>   UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500
Metric:1
>   RX packets:1215826 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:1342035 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:787514337 (751.0 Mb)  TX bytes:170968505 (163.0 Mb)
>   Base address:0x2000 Memory:dfa0-dfa2
> 
> ib0   Link encap:UNSPEC  HWaddr
80-00-04-04-FE-80-00-00-00-00-00-00-
> 00-00-00-00
>   inet addr:10.148.3.73  Bcast:10.148.255.255
Mask:255.255.0.0
>   inet6 addr: fe80::230:487b:a7ac:1/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>   RX packets:20823896 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:19276836 errors:0 dropped:42 overruns:0 carrier:0
>   collisions:0 txqueuelen:256
>   RX bytes:176581223103 (168400.9 Mb)  TX bytes:182691213682
> (174227.9 Mb)
> 
> ib1   Link encap:UNSPEC  HWaddr
80-00-04-04-FE-80-00-00-00-00-00-00-
> 00-00-00-00
>   inet addr:10.149.1

Re: [OMPI users] Proper way to throw an error to all nodes?

2008-06-04 Thread Jeff Squyres
Yes -- MPI_Abort is the simplest way to get them all to die.  But  
you'll also get error message(s) from OMPI.  So you have [at least] 2  
options:


1. Exit with MPI error

-
  if (rank == process_who_does_the_checking && !exists(filename)) {
 print("bad!");
     MPI_Abort(MPI_COMM_WORLD, 1);
  }
-

2. Exit with your own error; MPI finalizes cleanly

-
  file_exists = 1;
  if (rank == process_who_does_the_checking && !exists(filename)) {
 print("bad!");
 file_exists = 0;
  }
  MPI_Bcast(&file_exists, 1, MPI_INT, process_who_does_the_checking,  
MPI_COMM_WORLD);

  if (!file_exists) {
 MPI_Finalize();
 exit(1);
  }
-

There's oodles of variants on this, of course, but you get the general  
idea.




On Jun 3, 2008, at 11:00 PM, David Singleton wrote:



This is exactly what MPI_Abort is for.

David

Terry Frankcombe wrote:

Calling MPI_Finalize in a single process won't ever do what you want.
You need to get all the processes to call MPI_Finalize for the end to be
graceful.

What you need to do is have some sort of special message to tell
everyone to die.  In my codes I have a rather dynamic master-slave model
with flags being broadcast by the master process to tell the slaves what
to expect next, so it's easy for me to send out an "it's all over,
please kill yourself" message.  For a more rigid communication pattern
you could embed the die message in the data: something like if the first
element of the received data is negative, then that's the sign things
have gone south and everyone should stop what they're doing and
MPI_Finalize.  The details depend on the details of your code.

Presumably you could also set something up using tags and message
polling.
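The poison-pill convention Terry describes can be sketched in a few lines (a hypothetical helper, independent of any MPI API — the actual send/receive of the message is left out):

```python
POISON = -1.0

def pack(payload, shutting_down=False):
    # Master side: prefix the payload with a status element; a negative
    # first element is the agreed "shut down and finalize" signal.
    return [POISON if shutting_down else 1.0] + list(payload)

def should_stop(message):
    # Slave side: check the first element before doing any work.
    return message[0] < 0

print(should_stop(pack([3.5, 7.2])))              # False
print(should_stop(pack([], shutting_down=True)))  # True
```

Every rank applies the same check to the same received data, so all ranks reach MPI_Finalize together.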

Hope this helps.


On Tue, 2008-06-03 at 19:57 +0900, 8mj6tc...@sneakemail.com wrote:
So I'm working on this program which has many ways it might possibly die
at runtime, but one of them that happens frequently is the user types a
wrong (non-existent) filename on the command prompt. As it is now, the
node looking for the file notices the file doesn't exist and tries to
terminate the program. It tries to call MPI_Finalize(), but the other
nodes are all waiting for a message from the node doing the file
reading, so MPI_Finalize waits forever until the user realizes the job
isn't doing anything and terminates it manually.

So, my question is: what's the "correct" graceful way to handle
situations like this? Is there some MPI function which can basically
throw an exception to all other nodes telling them bail out now? Or is
correct behaviour just to have the node that spotted the error die
quietly and wait for the others to notice?

Thanks for any suggestions!


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] OpenMPI scaling > 512 cores

2008-06-04 Thread Jeff Squyres
First and foremost: is it possible to upgrade your version of Open  
MPI?  The version you are using (1.2.2) is rather ancient -- many bug  
fixes have occurred since then (including TCP wireup issues).  Note  
that oob_tcp_in|exclude were renamed to be oob_tcp_if_in|exclude in  
1.2.3 to be symmetric with other _if_in|exclude params in other  
components.


More below.


On Jun 3, 2008, at 1:07 PM, Scott Shaw wrote:

Hi, I hope this is the right forum for my questions.  I am running into
a problem when scaling >512 cores on an infiniband cluster which has
14,336 cores. I am new to openmpi and trying to figure out the right
-mca options to pass to avoid the "mca_oob_tcp_peer_complete_connect:
connection failed:" on a cluster which has infiniband HCAs and OFED
v1.3GA release.  Other MPI implementations like Intel MPI and mvapich
work fine using uDAPL or VERBs IB layers for MPI communications.


The OMPI v1.2 series is a bit inefficient in its TCP wireup for  
control messages -- it creates TCP sockets between all MPI processes.   
Do you allow enough fd's per process to allow this to occur?


(this situation is considerably better in the upcoming v1.3 series)
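A quick way to check whether the per-process fd limit is the bottleneck (a sketch assuming a Unix system with Python available; the 4096 figure is purely illustrative):

```python
import resource

# Per-process limits on open file descriptors -- the cap on how many TCP
# control sockets one MPI process can hold during the all-to-all wireup.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft:", soft, "hard:", hard)

# Raising the soft limit toward the hard limit (what `ulimit -n` does):
# resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))
```

If the soft limit is near the default 1024, a >512-core job on the 1.2 series is already close to exhausting it.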


I find it difficult to understand which network interface or IB layer is
being used. When I explicitly state not to use eth0,lo,ib1, or ib1:0
interfaces with the cmdline option "-mca oob_tcp_exclude" openmpi will
continue to probe these interfaces.  For all MPI traffic openmpi should
use IB0 which is the 10.148 network. But with debugging enabled I see
references trying the 10.149 network which is IB1.  Below is the
ifconfig network device output for a compute node.


Just curious: does the oob_tcp_include parameter not work?


Questions:

1. Is there a way to determine which network device is being used and not
have openmpi fallback to another device? With Intel MPI or HP MPI you
can state not to use a fallback device.  I thought "-mca
oob_tcp_exclude" would be the correct option to pass but I may be wrong.


oob_tcp_in|exclude should be suitable for this purpose.  If they're  
not working, I'd be surprised (but it could have been a bug that was  
fixed in a later version...?).  Keep in mind that the "oob" traffic is  
just control messages -- it's not the actual MPI communication.  That  
will go over the verbs interfaces.


2. How can I determine the infiniband openib device is actually being used?
When running an MPI app I continue to see counters for in/out packets at
a tcp level increasing when it should be using the IB RDMA device for
all MPI comms over the IB0 or mtcha0 device? OpenMPI was bundled with
OFED v1.3 so I am assuming the openib interface should work.  Running
ompi_info shows btl_open_* references.

/usr/mpi/openmpi-1.2-2/intel/bin/mpiexec -mca
btl_openib_warn_default_gid_prefix 0 -mca oob_tcp_exclude
eth0,lo,ib1,ib1:0 -mca btl openib,sm,self -machinefile mpd.hosts.$$ -np
1024 ~/bin/test_ompi < input1


The "btl" is the component that controls point-to-point communication  
in Open MPI.  so if you specify "openib,sm,self", then Open MPI is  
definitely using the verbs stack for MPI communication (not a TCP  
stack).



3. When trying to avoid the "mca_oob_tcp_peer_complete_connect:
connection failed:" message I tried using "-mca btl openib,sm,self" and
"-mca btl ^tcp" but I still get these error messages.


Unfortunately, these are two different issues -- OMPI always uses TCP  
for wireup and out-of-band control messages.  That's where you're  
getting the errors from.  Specifically: giving values for the btl MCA  
parameter won't affect these messages / errors.



When using "-mca btl openib,sm,self", openmpi will retry to use the IB1
(10.149 net) fabric to establish a connection with a node.  What are my
options to avoid these connection failed messages?  I suspect openmpi is
overflowing the tcp buffer on the clients based on the large core count of
this job since I see lots of tcp buffer errors in netstat -s
output. I reviewed all of the online FAQs and I am not sure what options
to pass to get around this issue.


I think we made this much better in 1.2.5 -- I see notes about this  
issue in the NEWS file under the 1.2.5 release.


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] OpenMPI scaling > 512 cores

2008-06-04 Thread Åke Sandgren
On Wed, 2008-06-04 at 11:43 -0700, Scott Shaw wrote:
> Hi, I was wondering if anyone had any comments with regard to my
> posting of questions.  Am I off base with my questions or is this the
> wrong forum for these types of questions?   
> 
> > 
> > Hi, I hope this is the right forum for my questions.  I am running
> into a
> > problem when scaling >512 cores on a infiniband cluster which has
> 14,336
> > cores. I am new to openmpi and trying to figure out the right -mca
> options

I don't have any real answer to your question except that I have had no
problems running HPL on our 672 node dual quad core = 5376 cores with
infiniband.
We use verbs.
I wouldn't touch the oob parameters since it uses tcp over ethernet to
setup the environment.

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se



[OMPI users] libibverbs

2008-06-04 Thread Brock Palen

We have two installs of openmpi-1.2.3
One with the pgi compilers the other with gcc/Nagf90

The one with the pgi compilers does not link against libibverbs, but ompi_info
shows the openib btl and we see traffic on the fabric.


The other, built with Nagware, links against libibverbs.  It also shows
the openib btl in ompi_info.


What would cause this?  It just pointed out that our login nodes  
(most of our cluster does not have IB) don't have libibverbs making  
code not link.  Any insight from an OFED master would be great.



Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985





Re: [OMPI users] OpenMPI scaling > 512 cores

2008-06-04 Thread Jeff Squyres
One other parameter that I neglected to mention (and Scott pointed out  
to me is *not* documented in the FAQ) is the mpi_preconnect_oob MCA  
param.


This parameter will cause all the OOB connections to be created during  
MPI_INIT, and *may* help such kind of issues.  You *do* need to have  
enough fd's available per process to allow this to happen at scale, of  
course.  I'll try to add this information to the FAQ by the end of  
this week.


This kind of thing is much better in the v1.3 series -- the linear TCP  
wireup is no longer necessary (e.g., each MPI process only opens 1 TCP  
socket: to the daemon on its host, etc.).
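To put rough numbers on that difference (a back-of-the-envelope sketch; `wireup_fds` is a hypothetical helper, not an Open MPI API):

```python
def wireup_fds(nprocs, series="1.2"):
    """Rough fd demand per process for the TCP control channel.

    1.2-era: fully connected OOB mesh -> each process holds a socket to
    every other process.  1.3-era: each process talks only to its local
    daemon -> one socket.  Ignores stdio, listeners, and btl fds.
    """
    return nprocs - 1 if series == "1.2" else 1

# 1024 ranks on the 1.2 series already brushes the common 1024-fd default:
print(wireup_fds(1024, "1.2"))   # 1023
print(wireup_fds(1024, "1.3"))   # 1
```

Which is why raising the per-process fd limit matters at >512 cores on 1.2, and much less so on 1.3.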



On Jun 4, 2008, at 4:14 PM, Åke Sandgren wrote:


On Wed, 2008-06-04 at 11:43 -0700, Scott Shaw wrote:

Hi, I was wondering if anyone had any comments with regard to my
posting of questions.  Am I off base with my questions or is this the
wrong forum for these types of questions?



Hi, I hope this is the right forum for my questions.  I am running

into a

problem when scaling >512 cores on a infiniband cluster which has

14,336

cores. I am new to openmpi and trying to figure out the right -mca

options


I don't have any real answer to your question except that I have had no
problems running HPL on our 672 node dual quad core = 5376 cores with
infiniband.
We use verbs.
I wouldn't touch the oob parameters since it uses tcp over ethernet to
setup the environment.

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems




Re: [OMPI users] Problem with X forwarding

2008-06-04 Thread Allen Barnett
If you are using a recent version of Linux (as machine A), the X server
is probably started with its TCP network connection turned off. For
example, if you do:

$ ps auxw | grep X
/usr/bin/Xorg :0 -br -audit 0 -auth /var/gdm/:0.Xauth -nolisten tcp vt7

The "-nolisten tcp" option turns off the X server's remote connection
socket. Also, "netstat -atp" on A will show that nothing is listening on
port 6000. So, for example, from machine B:

[B]$ xlogo -display A:0

doesn't work.

The trick I've used: Before you run your MPI application, you can ssh to
the remote node with X forwarding enabled ("ssh -Y"). On the remote
system, do "echo $DISPLAY" to see what DISPLAY environment variable ssh
created. For example, it might be something like "localhost:10.0". Leave
this ssh connection open and then run your OMPI application in another
window and pass "-x DISPLAY=localhost:10.0" through MPI. X applications
on the remote node *should* now be able to connect back through the open
ssh connection. This probably won't scale very well, though.

Allen

On Wed, 2008-06-04 at 14:36 -0400, Jeff Squyres wrote:
> In general, Open MPI doesn't have anything to do with X forwarding.   
> However, if you're using ssh to startup your processes, ssh may  
> configure X forwarding for you (depending on your local system  
> setup).  But OMPI closes down ssh channels once applications have  
> launched (there's no need to keep them open), so any X forwarding that  
> may have been setup will be closed down.
> 
> The *easiest* way to setup X forwarding is simply to allow X  
> connections to your local host from the node(s) that will be running  
> your application.  E.g., use the "xhost" command to add the target  
> nodes into the access list.  And then have mpirun export a suitable  
> DISPLAY variable, such as:
> 
> export DISPLAY=my_hostname:0
> mpirun -x DISPLAY ...
> 
> The "-x DISPLAY" clause tells Open MPI to export the value of the  
> DISPLAY variable to all nodes when running your application.
> 
> Hope this helps.
> 
> 
> On May 30, 2008, at 1:24 PM, Cally K wrote:
> 
> > hi, I have some problem running DistributedData.cxx ( it is a VTK  
> > file ) , I need to be able to see the rendering from my computer
> >
> > I, however have problem running the executable, I loaded both the  
> > executabe into 2 machines
> >
> > and I am accesing it from my computer( DHCP enabled )
> >
> > after running the following command - I use OpenMPI
> >
> > mpirun -hostfile myhostfile -np 2 -bynode ./DistributedData
> >
> > and I keep getting these errors
> >
> > ERROR: In /home/kalpanak/Installation_Files/VTKProject/VTK/Rendering/vtkXOpenGLRenderWindow.cxx, line 326
> > vtkXOpenGLRenderWindow (0x8664438): bad X server connection.
> >
> >
> > ERROR: In /home/kalpanak/Installation_Files/VTKProject/VTK/Rendering/vtkXOpenGLRenderWindow.cxx, line 169
> > vtkXOpenGLRenderWindow (0x8664438): bad X server connection.
> >
> >
> > [vrc1:27394] *** Process received signal ***
> > [vrc1:27394] Signal: Segmentation fault (11)
> > [vrc1:27394] Signal code: Address not mapped (1)
> > [vrc1:27394] Failing at address: 0x84
> > [vrc1:27394] [ 0] [0xe440]
> > [vrc1:27394] [ 1] ./DistributedData(_ZN22vtkXOpenGLRenderWindow20GetDesiredVisualInfoEv+0x229) [0x8227e7d]
> > [vrc1:27394] [ 2] ./DistributedData(_ZN22vtkXOpenGLRenderWindow16WindowInitializeEv+0x340) [0x8226812]
> > [vrc1:27394] [ 3] ./DistributedData(_ZN22vtkXOpenGLRenderWindow10InitializeEv+0x29) [0x82234f9]
> > [vrc1:27394] [ 4] ./DistributedData(_ZN22vtkXOpenGLRenderWindow5StartEv+0x29) [0x82235eb]
> > [vrc1:27394] [ 5] ./DistributedData(_ZN15vtkRenderWindow14DoStereoRenderEv+0x1a) [0x82342ac]
> > [vrc1:27394] [ 6] ./DistributedData(_ZN15vtkRenderWindow10DoFDRenderEv+0x427) [0x8234757]
> > [vrc1:27394] [ 7] ./DistributedData(_ZN15vtkRenderWindow10DoAARenderEv+0x5b7) [0x8234d19]
> > [vrc1:27394] [ 8] ./DistributedData(_ZN15vtkRenderWindow6RenderEv+0x690) [0x82353b4]
> > [vrc1:27394] [ 9] ./DistributedData(_ZN22vtkXOpenGLRenderWindow6RenderEv+0x52) [0x82245e2]
> > [vrc1:27394] [10] ./DistributedData [0x819e355]
> > [vrc1:27394] [11] ./DistributedData(_ZN16vtkMPIController19SingleMethodExecuteEv+0x1ab) [0x837a447]
> > [vrc1:27394] [12] ./DistributedData(main+0x180) [0x819de78]
> > [vrc1:27394] [13] /lib/libc.so.6(__libc_start_main+0xe0) [0xb79c0fe0]
> > [vrc1:27394] [14] ./DistributedData [0x819dc21]
> > [vrc1:27394] *** End of error message ***
> > mpirun noticed that job rank 0 with PID 27394 on node  exited on signal 11 (Segmentation fault).
> >
> >
> > Maybe I am not doing the X forwarding properly, but has anyone ever
> > encountered the same problem, it works fine on one pc, and I read
> > the mailing list but I just don't know if my prob is similar to
> > theirs, I even tried changing the DISPLAY env
> >
> >
> > This is what I want to do
> >
> > my mpirun should run on 2 machine

Re: [OMPI users] OpenMPI scaling > 512 cores

2008-06-04 Thread Pavel Shamis (Pasha)

Scott Shaw wrote:

Hi, I hope this is the right forum for my questions.  I am running into
a problem when scaling >512 cores on an infiniband cluster which has
14,336 cores. I am new to openmpi and trying to figure out the right
-mca options to pass to avoid the "mca_oob_tcp_peer_complete_connect:
connection failed:" on a cluster which has infiniband HCAs and OFED
v1.3GA release.  Other MPI implementation like Intel MPI and mvapich
work fine using uDAPL or VERBs IB layers for MPI communications.
  
Did you have chance to see this FAQ - 
http://www.open-mpi.org/faq/?category=troubleshooting#large-job-tcp-oob-timeout

I find it difficult to understand which network interface or IB layer
being used. When I explicitly state not to use eth0,lo,ib1, or ib1:0
interfaces with the cmdline option "-mca oob_tcp_exclude" openmpi will
continue to probe these interfaces.  For all MPI traffic openmpi should
use IB0 which is the 10.148 network. But with debugging enabled I see
references trying the 10.149 network which is IB1.  Below is the
ifconfig network device output for a compute node.

Questions:

1. Is there a way to determine which network device is being used and not
have openmpi fallback to another device? With Intel MPI or HP MPI you
can state not to use a fallback device.  I thought "-mca
oob_tcp_exclude" would be the correct option to pass but I may be wrong.
  

If you want to use the IB verbs, you may specify:
-mca btl sm,self,openib
sm - shared memory
self - self communication
openib - IB communication (IB verbs)


2. How can I determine infiniband openib device is actually being used?
When running a MPI app I continue to see counters for in/out packets at
a tcp level increasing when it should be using the IB RDMA device for
all MPI comms over the IB0 or mtcha0 device? OpenMPI was bundled with
OFED v1.3 so I am assuming the openib interface should work.  Running
ompi_info shows btl_open_* references. 


/usr/mpi/openmpi-1.2-2/intel/bin/mpiexec -mca
btl_openib_warn_default_gid_prefix 0 -mca oob_tcp_exclude
eth0,lo,ib1,ib1:0  -mca btl openib,sm,self -machinefile mpd.hosts.$$ -np
1024 ~/bin/test_ompi < input1
  

http://www.open-mpi.org/community/lists/users/2008/05/5583.php




Re: [OMPI users] --bynode vs --byslot

2008-06-04 Thread Cally K
Thanks, that was actually a lot of help, I had very little understanding of
the bynode and byslot thingy, thanks

On 6/5/08, Jeff Squyres  wrote:
>
> On May 23, 2008, at 9:07 PM, Cally K wrote:
>
> > Hi, I have a question about --bynode and --byslot that i would like
> > to clarify
> >
> > Say, for example, I have a hostfile
> >
> > #Hostfile
> >
> > __
> > node0
> > node1 slots=2 max_slots=2
> > node2 slots=2 max_slots=2
> > node3 slots=4 max_slots=4
> > ___
> >
> > There are 4 nodes and 9 slots, how do I run my mpirun, for now I use
> >
> > a) mpirun -np --bynode 4 ./abcd
>
> I assume you mean "... -np 4 --bynode ..."
>
> > I know that the slot thingy is for SMPs, and I have tried running
> > mpirun -np --byslot 9 ./abcd
> >
> > and I noticed that its longer when I do --byslot when compared to --
> > bynode
>
> According to your text, you're running 9 processes when using --byslot
> and 4 when using --bynode.  Is that a typo?  I'll assume that it is --
> that you meant to use 9 in both cases.
>
> > and I just read the faq that said, by default the byslot option is
> > used, so I dun have to use it rite,,,
>
> I'm not sure what your question is.  The actual performance may depend
> on your application and what its communication and computation
> patterns are.  It gets more difficult to model when you have a
> heterogeneous setup (like it looks like you have, per your hostfile).
>
> Let's take your example of 9 processes.
>
> - With --bynode, the MPI_COMM_WORLD ranks will be laid out as follows
> (MCWR = "MPI_COMM_WORLD rank")
>
> node0: MCWR 0
> node1: MCWR 1, MCWR 4
> node2: MCWR 2, MCWR 5
> node3: MCWR 3, MCWR 6, MCWR 7, MCWR 8
>
> - With --byslot, it'll look like this:
>
> node0: MCWR 0
> node1: MCWR 1, MCWR 2
> node2: MCWR 3, MCWR 4
> node3: MCWR 5, MCWR 6, MCWR 7, MCWR 8
>
> In short, OMPI is doing round-robin placement of your processes; the
> only difference is in which dimension is traversed first: by node or
> by slot.
>
> As to why there's such a performance difference, it could depend on a
> lot of things: the difference in computational speed and/or RAM on
> your 4 nodes, the changing communication patterns between the two
> (shared memory is usually used for on-node communication, which is
> usually faster than most networks), etc.  It really depends on what
> your application is *doing*.
>
> Sorry I can't be of more help...
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
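The two layouts Jeff computed by hand can be reproduced with a short simulation (a sketch of the round-robin logic only; `place` is a hypothetical helper, not Open MPI's actual mapper):

```python
def place(slots, nprocs, policy):
    """Round-robin nprocs ranks over nodes with the given slot counts.

    policy "byslot" fills each node before moving on; "bynode" deals one
    rank per node per pass, skipping full nodes.  Assumes nprocs does not
    exceed the total slot count (no oversubscription handling).
    """
    nodes = [[] for _ in slots]
    rank = 0
    while rank < nprocs:
        progressed = False
        for i, cap in enumerate(slots):
            if policy == "byslot":
                while len(nodes[i]) < cap and rank < nprocs:
                    nodes[i].append(rank); rank += 1; progressed = True
            elif len(nodes[i]) < cap and rank < nprocs:
                nodes[i].append(rank); rank += 1; progressed = True
        if not progressed:
            break  # all slots full
    return nodes

slots = [1, 2, 2, 4]              # node0..node3 from the hostfile above
print(place(slots, 9, "bynode"))  # [[0], [1, 4], [2, 5], [3, 6, 7, 8]]
print(place(slots, 9, "byslot"))  # [[0], [1, 2], [3, 4], [5, 6, 7, 8]]
```

Both runs match the MCWR layouts in Jeff's message.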